[Bug 216359] [amdgpu] ring gfx timeout after waking from suspend and exiting X

2022-08-15 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=216359

--- Comment #1 from Shlomo (shl...@fastmail.com) ---
Reposted on GitLab:

https://gitlab.freedesktop.org/drm/amd/-/issues/2124


Re: [PATCH v6 1/6] drm/ttm: Add new callbacks to ttm res mgr

2022-08-15 Thread Arunpravin Paneer Selvam




On 8/15/2022 4:35 PM, Christian König wrote:

On 12.08.22 at 15:30, Arunpravin Paneer Selvam wrote:

We are adding two new callbacks to the ttm resource manager
functions to handle intersection and compatibility of
placements and resources.

v2: move the amdgpu and ttm_range_manager changes to
 separate patches (Christian)
v3: rename "intersect" to "intersects" (Matthew)
v4: move !place check to the !res if and return false
 in ttm_resource_compatible() function (Christian)
v5: move bits of code from patch number 6 to avoid
 temporary driver breakup (Christian)

Signed-off-by: Christian König 
Signed-off-by: Arunpravin Paneer Selvam 



Patch #6 could still be cleaned up more now that we have the 
workaround code in patch #1, but that's not really a must have.


Reviewed-by: Christian König  for the entire 
series.


Do you already have commit rights?


Hi Christian,
I applied for drm-misc commit rights, waiting for the project 
maintainers to approve my request.


Thanks,
Arun


Regards,
Christian.


---
  drivers/gpu/drm/ttm/ttm_bo.c   |  9 ++--
  drivers/gpu/drm/ttm/ttm_resource.c | 77 +-
  include/drm/ttm/ttm_resource.h | 40 
  3 files changed, 119 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index c1bd006a5525..f066e8124c50 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -518,6 +518,9 @@ static int ttm_bo_evict(struct ttm_buffer_object 
*bo,

  bool ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
    const struct ttm_place *place)
  {
+    struct ttm_resource *res = bo->resource;
+    struct ttm_device *bdev = bo->bdev;
+
  dma_resv_assert_held(bo->base.resv);
  if (bo->resource->mem_type == TTM_PL_SYSTEM)
  return true;
@@ -525,11 +528,7 @@ bool ttm_bo_eviction_valuable(struct 
ttm_buffer_object *bo,

  /* Don't evict this BO if it's outside of the
   * requested placement range
   */
-    if (place->fpfn >= (bo->resource->start + 
bo->resource->num_pages) ||

-    (place->lpfn && place->lpfn <= bo->resource->start))
-    return false;
-
-    return true;
+    return ttm_resource_intersects(bdev, res, place, bo->base.size);
  }
  EXPORT_SYMBOL(ttm_bo_eviction_valuable);
  diff --git a/drivers/gpu/drm/ttm/ttm_resource.c 
b/drivers/gpu/drm/ttm/ttm_resource.c

index 20f9adcc3235..0d1f862a582b 100644
--- a/drivers/gpu/drm/ttm/ttm_resource.c
+++ b/drivers/gpu/drm/ttm/ttm_resource.c
@@ -253,10 +253,84 @@ void ttm_resource_free(struct ttm_buffer_object 
*bo, struct ttm_resource **res)

  }
  EXPORT_SYMBOL(ttm_resource_free);
  +/**
+ * ttm_resource_intersects - test for intersection
+ *
+ * @bdev: TTM device structure
+ * @res: The resource to test
+ * @place: The placement to test
+ * @size: How many bytes the new allocation needs.
+ *
+ * Test if @res intersects with @place and @size. Used for testing if evictions
+ * are valuable or not.
+ *
+ * Returns true if the res placement intersects with @place and @size.
+ */
+bool ttm_resource_intersects(struct ttm_device *bdev,
+ struct ttm_resource *res,
+ const struct ttm_place *place,
+ size_t size)
+{
+    struct ttm_resource_manager *man;
+
+    if (!res)
+    return false;
+
+    if (!place)
+    return true;
+
+    man = ttm_manager_type(bdev, res->mem_type);
+    if (!man->func->intersects) {
+    if (place->fpfn >= (res->start + res->num_pages) ||
+    (place->lpfn && place->lpfn <= res->start))
+    return false;
+
+    return true;
+    }
+
+    return man->func->intersects(man, res, place, size);
+}
+
+/**
+ * ttm_resource_compatible - test for compatibility
+ *
+ * @bdev: TTM device structure
+ * @res: The resource to test
+ * @place: The placement to test
+ * @size: How many bytes the new allocation needs.
+ *
+ * Test if @res is compatible with @place and @size.
+ *
+ * Returns true if the res placement is compatible with @place and @size.
+ */
+bool ttm_resource_compatible(struct ttm_device *bdev,
+ struct ttm_resource *res,
+ const struct ttm_place *place,
+ size_t size)
+{
+    struct ttm_resource_manager *man;
+
+    if (!res || !place)
+    return false;
+
+    man = ttm_manager_type(bdev, res->mem_type);
+    if (!man->func->compatible) {
+    if (res->start < place->fpfn ||
+    (place->lpfn && (res->start + res->num_pages) > 
place->lpfn))

+    return false;
+
+    return true;
+    }
+
+    return man->func->compatible(man, res, place, size);
+}
+
  static bool ttm_resource_places_compat(struct ttm_resource *res,
 const struct ttm_place *places,
 unsigned num_placement)
  {
+    struct ttm_buffer_object *bo = res->bo;
+    struct ttm_device *bdev = bo->bdev;
  unsigned i;
    if (res->placement & TTM_PL_FLAG_TEMPORARY)
@@ 
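For illustration only (not part of the quoted patch): a minimal sketch of how a
driver-side resource manager could wire up the two new callbacks. The signatures
mirror the calls made from ttm_resource_intersects()/ttm_resource_compatible()
above; the my_mgr_* names are hypothetical and the other
ttm_resource_manager_func members are omitted.

#include <drm/ttm/ttm_placement.h>
#include <drm/ttm/ttm_resource.h>

static bool my_mgr_intersects(struct ttm_resource_manager *man,
			      struct ttm_resource *res,
			      const struct ttm_place *place,
			      size_t size)
{
	/* Same page-frame overlap test as the default path above. */
	if (place->fpfn >= (res->start + res->num_pages) ||
	    (place->lpfn && place->lpfn <= res->start))
		return false;

	return true;
}

static bool my_mgr_compatible(struct ttm_resource_manager *man,
			      struct ttm_resource *res,
			      const struct ttm_place *place,
			      size_t size)
{
	/* Resource must lie fully inside the requested placement range. */
	return res->start >= place->fpfn &&
	       (!place->lpfn || res->start + res->num_pages <= place->lpfn);
}

static const struct ttm_resource_manager_func my_mgr_func = {
	/* .alloc, .free, .debug omitted for brevity */
	.intersects = my_mgr_intersects,
	.compatible = my_mgr_compatible,
};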

Re: (subset) [PATCH v2 0/7] Devm helpers for regulator get and enable

2022-08-15 Thread Matti Vaittinen

Hi dee Ho Mark, Laurent, Stephen, all

On 8/16/22 01:55, Mark Brown wrote:

On Tue, Aug 16, 2022 at 12:17:17AM +0300, Laurent Pinchart wrote:

On Mon, Aug 15, 2022 at 01:58:55PM -0700, Stephen Boyd wrote:



You will very quickly see drivers doing this (either directly or
indirectly):



probe()
{
devm_clk_get_enabled();
devm_regulator_get_enable();
}



Without a devres-based get+enable API drivers can get the resources they
need in any order, possibly moving some of those resource acquisition
operations to different functions, and then have a clear block of code
that enables the resources in the right order.


I agree. And I think that drivers which do that should stick with it. 
Still, as you know, the devm unwinding is also done in a well-defined 
order. I believe that instead of fighting against devm we should try to 
educate people to pay attention to the order of unwinding (also when not 
handled by devm. Driver writers occasionally break things also w/o 
devm, for example by freeing resources needed by IRQ handlers prior to 
freeing the IRQ.)
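For illustration only (hypothetical my_* names, not from this thread), that
classic ordering bug looks like this, devm or not:

#include <linux/dma-mapping.h>
#include <linux/interrupt.h>
#include <linux/platform_device.h>

struct my_priv {
	int irq;
	size_t len;
	void *dma_buf;
	dma_addr_t dma_handle;
};

static int my_remove(struct platform_device *pdev)
{
	struct my_priv *priv = platform_get_drvdata(pdev);

	/*
	 * BROKEN: the interrupt handler may still run and dereference
	 * priv->dma_buf until free_irq() has returned.
	 */
	dma_free_coherent(&pdev->dev, priv->len, priv->dma_buf, priv->dma_handle);
	free_irq(priv->irq, priv);

	/*
	 * Correct order: free_irq() first, then release whatever the handler
	 * uses.  devm gets this right only if the resources were registered
	 * in the matching order - exactly the attention to unwinding order
	 * argued for above.
	 */
	return 0;
}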


If "purging" must not be done in reverse order compared to the 
aquisition - then one should not use devm. I know people have done 
errors with devm - OTOH, I've seen devm also fixing bunch of errors.



These devres helpers give
a false sense of security to driver authors and they will end up
introducing problems, the same way that devm_kzalloc() makes it
outrageously easy to crash the kernel by disconnecting a device that is
in use.


I think this is going a bit "off-topic" but I'd like to understand what 
is behind this statement? From a driver writer's perspective - I don't 
know of much better alternatives for freeing up the memory. I don't see how 
freeing stuff at .remove would be any better? As far as I understand - 
if someone is using the driver's resources after the device has gone and the 
driver is detached, then there is not much the driver could do to 
free up the stuff, be it devm or not? This sounds like a fundamentally 
different problem (to me).



TBH I think the problem you have here is with devm not with this
particular function.


I must say I kind of agree with Mark. If we stop for a second to think 
what Laurent's example would look like if there were no 
devm_regulator_get_enable() provided: I bet the poor driver author would 
have used devm_clk_get_enabled() - and then implemented a .remove for 
disabling the regulator. That would be even worse, right?
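A sketch of the two styles being compared (hypothetical foo_* names;
devm_regulator_get_enable() is the helper proposed in this series,
devm_clk_get_enabled() already exists):

#include <linux/clk.h>
#include <linux/err.h>
#include <linux/platform_device.h>
#include <linux/regulator/consumer.h>

/* Style 1: both resources handled by devres, no .remove needed. */
static int foo_probe_with_helpers(struct platform_device *pdev)
{
	int ret;

	ret = devm_regulator_get_enable(&pdev->dev, "vdd");
	if (ret)
		return ret;

	return PTR_ERR_OR_ZERO(devm_clk_get_enabled(&pdev->dev, "core"));
}

/* Style 2: devm for the clock, manual regulator handling plus a .remove. */
static int foo_probe_manual(struct platform_device *pdev)
{
	struct regulator *vdd;
	struct clk *clk;
	int ret;

	vdd = devm_regulator_get(&pdev->dev, "vdd");
	if (IS_ERR(vdd))
		return PTR_ERR(vdd);

	ret = regulator_enable(vdd);
	if (ret)
		return ret;

	clk = devm_clk_get_enabled(&pdev->dev, "core");
	if (IS_ERR(clk)) {
		regulator_disable(vdd);
		return PTR_ERR(clk);
	}

	platform_set_drvdata(pdev, vdd);
	return 0;
}

static int foo_remove_manual(struct platform_device *pdev)
{
	/*
	 * Note the hazard: devres disables the clock only after this returns,
	 * i.e. after the supply is already off - the "even worse" ordering.
	 */
	regulator_disable(platform_get_drvdata(pdev));
	return 0;
}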



That's a different conversation, and a totally
valid one especially when you start looking at things like implementing
userspace APIs which need to cope with hardware going away while still
visible to userspace.


This is interesting. It's not easy for me to spot how devm changes 
things here? If we consider some removable device - then I guess the 
.remove() is also run only after the HW has already gone? Yes, devm might 
make the window during which userspace can see hardware that is gone 
longer, but does it bring any new problem there? It seems to me devm can 
make hitting the spot more likely - but I don't think it brings 
completely new issues? (Well, I may be wrong here - wouldn't be the 
first time :])



It's *probably* more of a subsystem conversation
than a driver one though, or at least I think subsystems should try to
arrange to make it so.



--
Matti Vaittinen
Linux kernel developer at ROHM Semiconductors
Oulu Finland

~~ When things go utterly wrong vim users can always type :help! ~~

Discuss - Estimate - Plan - Report and finally accomplish this:
void do_work(int time) __attribute__ ((const));


[PATCH] drm/i915: Switch TGL-H DP-IN to dGFX when it's supported

2022-08-15 Thread Kai-Heng Feng
On mobile workstations like HP ZBook Fury G8, iGFX's DP-IN can switch to
dGFX so external monitors are routed to dGFX, and more monitors can be
supported as a result.

To switch the DP-IN to dGFX, the driver needs to invoke _DSM function 20
on intel_dsm_guid2. This method is described in Intel document 632107.

Signed-off-by: Kai-Heng Feng 
---
 drivers/gpu/drm/i915/display/intel_acpi.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/display/intel_acpi.c 
b/drivers/gpu/drm/i915/display/intel_acpi.c
index e78430001f077..3bd5930e2769b 100644
--- a/drivers/gpu/drm/i915/display/intel_acpi.c
+++ b/drivers/gpu/drm/i915/display/intel_acpi.c
@@ -20,6 +20,7 @@ static const guid_t intel_dsm_guid =
  0xa8, 0x54, 0x0f, 0x13, 0x17, 0xb0, 0x1c, 0x2c);
 
 #define INTEL_DSM_FN_GET_BIOS_DATA_FUNCS_SUPPORTED 0 /* No args */
+#define INTEL_DSM_FN_DP_IN_SWITCH_TO_DGFX 20 /* No args */
 
 static const guid_t intel_dsm_guid2 =
GUID_INIT(0x3e5b41c6, 0xeb1d, 0x4260,
@@ -187,6 +188,7 @@ void intel_dsm_get_bios_data_funcs_supported(struct 
drm_i915_private *i915)
struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
acpi_handle dhandle;
union acpi_object *obj;
+   int supported = 0;
 
	dhandle = ACPI_HANDLE(&pdev->dev);
if (!dhandle)
@@ -194,8 +196,22 @@ void intel_dsm_get_bios_data_funcs_supported(struct 
drm_i915_private *i915)
 
	obj = acpi_evaluate_dsm(dhandle, &intel_dsm_guid2, INTEL_DSM_REVISION_ID,
				INTEL_DSM_FN_GET_BIOS_DATA_FUNCS_SUPPORTED, NULL);
-   if (obj)
+   if (obj) {
+   if (obj->type == ACPI_TYPE_INTEGER)
+   supported = obj->integer.value;
+
ACPI_FREE(obj);
+   }
+
+   /* Tiger Lake H DP-IN Boot Time Switching from iGfx to dGfx */
+   if (supported & BIT(20)) {
+   obj = acpi_evaluate_dsm(dhandle, &intel_dsm_guid2,
+   INTEL_DSM_REVISION_ID,
+   INTEL_DSM_FN_DP_IN_SWITCH_TO_DGFX,
+   NULL);
+   if (obj)
+   ACPI_FREE(obj);
+   }
 }
 
 /*
-- 
2.36.1



[PATCH v2 2/2] drm/i915/dg2: Add additional tuning settings

2022-08-15 Thread Matt Roper
Some additional MMIO tuning settings have appeared in the bspec's
performance tuning guide section.

One of the tuning settings here is also documented as formal workaround
Wa_22012654132 for some steppings of DG2.  However the tuning setting
applies to all DG2 variants and steppings, making it a superset of the
workaround.

v2:
 - Move DRAW_WATERMARK to engine workaround section.  It only moves into
   the engine context on future platforms.  (Lucas)
 - CHICKEN_RASTER_2 needs to be handled as a masked register.  (Lucas)

Bspec: 68331
Cc: Lucas De Marchi 
Cc: Lionel Landwerlin 
Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/gt/intel_gt_regs.h |  8 ++
 drivers/gpu/drm/i915/gt/intel_workarounds.c | 27 ++---
 2 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h 
b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index b3b49f6d6d1c..f64fafe28f72 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -259,6 +259,9 @@
 #define   GEN9_PREEMPT_GPGPU_COMMAND_LEVEL GEN9_PREEMPT_GPGPU_LEVEL(1, 0)
 #define   GEN9_PREEMPT_GPGPU_LEVEL_MASK
GEN9_PREEMPT_GPGPU_LEVEL(1, 1)
 
+#define DRAW_WATERMARK _MMIO(0x26c0)
+#define   VERT_WM_VAL  REG_GENMASK(9, 0)
+
 #define GEN12_GLOBAL_MOCS(i)   _MMIO(0x4000 + (i) * 4) /* 
Global MOCS regs */
 
 #define RENDER_HWS_PGA_GEN7_MMIO(0x4080)
@@ -374,6 +377,9 @@
 #define CHICKEN_RASTER_1   _MMIO(0x6204)
 #define   DIS_SF_ROUND_NEAREST_EVENREG_BIT(8)
 
+#define CHICKEN_RASTER_2   _MMIO(0x6208)
+#define   TBIMR_FAST_CLIP  REG_BIT(5)
+
 #define VFLSKPD_MMIO(0x62a8)
 #define   DIS_OVER_FETCH_CACHE REG_BIT(1)
 #define   DIS_MULT_MISS_RD_SQUASH  REG_BIT(0)
@@ -1124,6 +1130,8 @@
 
 #define RT_CTRL_MMIO(0xe530)
 #define   DIS_NULL_QUERY   REG_BIT(10)
+#define   STACKID_CTRL REG_GENMASK(6, 5)
+#define   STACKID_CTRL_512 REG_FIELD_PREP(STACKID_CTRL, 
0x2)
 
 #define EU_PERF_CNTL1  _MMIO(0xe558)
 #define EU_PERF_CNTL5  _MMIO(0xe55c)
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index a68d279b01f0..31e129329fb0 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -568,6 +568,7 @@ static void icl_ctx_workarounds_init(struct intel_engine_cs 
*engine,
 static void dg2_ctx_gt_tuning_init(struct intel_engine_cs *engine,
   struct i915_wa_list *wal)
 {
+   wa_masked_en(wal, CHICKEN_RASTER_2, TBIMR_FAST_CLIP);
wa_write_clr_set(wal, GEN11_L3SQCREG5, L3_PWM_TIMER_INIT_VAL_MASK,
 REG_FIELD_PREP(L3_PWM_TIMER_INIT_VAL_MASK, 0x7f));
wa_add(wal,
@@ -2195,15 +2196,6 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, 
struct i915_wa_list *wal)
wa_write_or(wal, XEHP_L3NODEARBCFG, XEHP_LNESPARE);
}
 
-   if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_A0, STEP_C0) ||
-   IS_DG2_G11(i915)) {
-   /* Wa_22012654132:dg2 */
-   wa_add(wal, GEN10_CACHE_MODE_SS, 0,
-  _MASKED_BIT_ENABLE(ENABLE_PREFETCH_INTO_IC),
-  0 /* write-only, so skip validation */,
-  true);
-   }
-
/* Wa_14013202645:dg2 */
if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_B0, STEP_C0) ||
IS_DG2_GRAPHICS_STEP(i915, G11, STEP_A0, STEP_B0))
@@ -2692,6 +2684,23 @@ add_render_compute_tuning_settings(struct 
drm_i915_private *i915,
 
if (IS_DG2(i915)) {
wa_write_or(wal, XEHP_L3SCQREG7, BLEND_FILL_CACHING_OPT_DIS);
+   wa_write_clr_set(wal, RT_CTRL, STACKID_CTRL, STACKID_CTRL_512);
+   wa_write_clr_set(wal, DRAW_WATERMARK, VERT_WM_VAL,
+REG_FIELD_PREP(VERT_WM_VAL, 0x3FF));
+
+   /*
+* This is also listed as Wa_22012654132 for certain DG2
+* steppings, but the tuning setting programming is a superset
+* since it applies to all DG2 variants and steppings.
+*
+* Note that register 0xE420 is write-only and cannot be read
+* back for verification on DG2 (due to Wa_14012342262), so
+* we need to explicitly skip the readback.
+*/
+   wa_add(wal, GEN10_CACHE_MODE_SS, 0,
+  _MASKED_BIT_ENABLE(ENABLE_PREFETCH_INTO_IC),
+  0 /* write-only, so skip validation */,
+  true);
}
 }
 
-- 
2.37.1



[PATCH v2] drm/i915/guc/slpc: Allow SLPC to use efficient frequency

2022-08-15 Thread Vinay Belgaumkar
Host Turbo operates at efficient frequency when GT is not idle unless
the user or workload has forced it to a higher level. Replicate the same
behavior in SLPC by allowing the algorithm to use efficient frequency.
We had disabled it during boot due to concerns that it might break
kernel ABI for min frequency. However, this is not the case since
SLPC will still abide by the (min,max) range limits.

With this change, min freq will be at efficient frequency level at init
instead of fused min (RPn). If user chooses to reduce min freq below the
efficient freq, we will turn off usage of efficient frequency and honor
the user request. When a higher value is written, it will get toggled
back again.

The patch also corrects the register which needs to be read for obtaining
the correct efficient frequency for Gen9+.

We see much better perf numbers with benchmarks like glmark2 with
efficient frequency usage enabled as expected.

v2: Address review comments (Rodrigo)

BugLink: https://gitlab.freedesktop.org/drm/intel/-/issues/5468

Cc: Rodrigo Vivi 
Signed-off-by: Vinay Belgaumkar 
---
 drivers/gpu/drm/i915/gt/intel_rps.c |  7 ++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c | 66 -
 drivers/gpu/drm/i915/intel_mchbar_regs.h|  3 +
 3 files changed, 22 insertions(+), 54 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c 
b/drivers/gpu/drm/i915/gt/intel_rps.c
index c7d381ad90cf..8c289a032103 100644
--- a/drivers/gpu/drm/i915/gt/intel_rps.c
+++ b/drivers/gpu/drm/i915/gt/intel_rps.c
@@ -1107,7 +1107,12 @@ void gen6_rps_get_freq_caps(struct intel_rps *rps, 
struct intel_rps_freq_caps *c
caps->min_freq = (rp_state_cap >>  0) & 0xff;
} else {
caps->rp0_freq = (rp_state_cap >>  0) & 0xff;
-   caps->rp1_freq = (rp_state_cap >>  8) & 0xff;
+   if (GRAPHICS_VER(i915) >= 10)
+   caps->rp1_freq = REG_FIELD_GET(RPE_MASK,
+  
intel_uncore_read(to_gt(i915)->uncore,
+  GEN10_FREQ_INFO_REC));
+   else
+   caps->rp1_freq = (rp_state_cap >>  8) & 0xff;
caps->min_freq = (rp_state_cap >> 16) & 0xff;
}
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
index e1fa1f32f29e..9d49ccef03bb 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
@@ -137,17 +137,6 @@ static int guc_action_slpc_set_param(struct intel_guc 
*guc, u8 id, u32 value)
return ret > 0 ? -EPROTO : ret;
 }
 
-static int guc_action_slpc_unset_param(struct intel_guc *guc, u8 id)
-{
-   u32 request[] = {
-   GUC_ACTION_HOST2GUC_PC_SLPC_REQUEST,
-   SLPC_EVENT(SLPC_EVENT_PARAMETER_UNSET, 1),
-   id,
-   };
-
-   return intel_guc_send(guc, request, ARRAY_SIZE(request));
-}
-
 static bool slpc_is_running(struct intel_guc_slpc *slpc)
 {
return slpc_get_state(slpc) == SLPC_GLOBAL_STATE_RUNNING;
@@ -201,16 +190,6 @@ static int slpc_set_param(struct intel_guc_slpc *slpc, u8 
id, u32 value)
return ret;
 }
 
-static int slpc_unset_param(struct intel_guc_slpc *slpc,
-   u8 id)
-{
-   struct intel_guc *guc = slpc_to_guc(slpc);
-
-   GEM_BUG_ON(id >= SLPC_MAX_PARAM);
-
-   return guc_action_slpc_unset_param(guc, id);
-}
-
 static int slpc_force_min_freq(struct intel_guc_slpc *slpc, u32 freq)
 {
struct drm_i915_private *i915 = slpc_to_i915(slpc);
@@ -491,6 +470,16 @@ int intel_guc_slpc_set_min_freq(struct intel_guc_slpc 
*slpc, u32 val)
 
with_intel_runtime_pm(&i915->runtime_pm, wakeref) {
 
+   /* Ignore efficient freq if lower min freq is requested */
+   ret = slpc_set_param(slpc,
+SLPC_PARAM_IGNORE_EFFICIENT_FREQUENCY,
+val < slpc->rp1_freq);
+   if (unlikely(ret)) {
+   i915_probe_error(i915, "Failed to toggle efficient freq 
(%pe)\n",
+ERR_PTR(ret));
+   return ret;
+   }
+
ret = slpc_set_param(slpc,
 SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ,
 val);
@@ -587,7 +576,9 @@ static int slpc_set_softlimits(struct intel_guc_slpc *slpc)
return ret;
 
if (!slpc->min_freq_softlimit) {
-   slpc->min_freq_softlimit = slpc->min_freq;
+   ret = intel_guc_slpc_get_min_freq(slpc, &slpc->min_freq_softlimit);
+   if (unlikely(ret))
+   return ret;
slpc_to_gt(slpc)->defaults.min_freq = slpc->min_freq_softlimit;
} else if (slpc->min_freq_softlimit != slpc->min_freq) {
return 

Re: (subset) [PATCH v2 0/7] Devm helpers for regulator get and enable

2022-08-15 Thread Mark Brown
On Tue, Aug 16, 2022 at 12:17:17AM +0300, Laurent Pinchart wrote:
> On Mon, Aug 15, 2022 at 01:58:55PM -0700, Stephen Boyd wrote:

> > The basic idea is that drivers should be focused on what they're
> > driving, not navigating the (sometimes) complex integration that's
> > taking place around them. When a device driver probe function is called
> > the device should already be powered on.

> No. ACPI does that in many cases, and that's a real bad idea. There are
> devices that you do *not* want to power up on probe. I'm thinking, for
> example, about camera sensors that have a privacy LED that will light up
> when the sensor is powered up. You don't want it to flash on boot. There
> are also other use cases related to fault tolerance where you want
> drivers to initialize properly and only access the device later.

I don't think it's an either/or thing in terms of approach here - we
need a range of options to choose from.  ACPI is totally fine and solves
real problems for the systems it targets; the problems we see with it
are mainly that it has a very strong system abstraction and doesn't cope
well when things go outside that, coupled with the fact that Windows long
ago decided that board files were totally fine for papering over any
problems, so people haven't worked on standardisation where they should.
Some SoCs like to do similar things with their power controller cores.

Conversely for example with many (but not all) SoC IPs the mechanics of
the system integration and range of options available are such that
dealing with them is kind of out of scope of the driver, but they're
often very repetitive over any given SoC so there is a benefit in
pushing things into power domains rather than having the driver for the
IP manage everything.  We need to be able to be flexible so we can find
the best idioms to represent the different systems in front of us rather
than trying to force all systems into a single idiom.

> These devres helpers go in the exact opposite direction of what we
> should be doing, by telling driver authors it's totally fine to not
> implement power management. Why don't we just drop error handling and go
> back to the big kernel lock in that case ? That was much easier to
> program too.

Sometimes it's totally fine to not worry, at least at a first pass.
Perhaps you're more concerned with real time, perhaps your system
doesn't provide control for the relevant resources.  Sometimes the
savings are so negligible that it's questionable if doing the power
management is an overall power saving.

> You will very quickly see drivers doing this (either directly or
> indirectly):

> probe()
> {
>   devm_clk_get_enabled();
>   devm_regulator_get_enable();
> }

> Without a devres-based get+enable API drivers can get the resources they
> need in any order, possibly moving some of those resource acquisition
> operations to different functions, and then have a clear block of code
> that enables the resources in the right order. These devres helpers give
> a false sense of security to driver authors and they will end up
> introducing problems, the same way that devm_kzalloc() makes it
> outrageously easy to crash the kernel by disconnecting a device that is
> in use.

TBH I think the problem you have here is with devm not with this
particular function.  That's a different conversation, and a totally
valid one especially when you start looking at things like implementing
userspace APIs which need to cope with hardware going away while still
visible to userspace.  It's *probably* more of a subsystem conversation
than a driver one though, or at least I think subsystems should try to
arrange to make it so.




Re: [PATCH] drm/i915/guc: clear stalled request after a reset

2022-08-15 Thread John Harrison

On 8/11/2022 14:08, Daniele Ceraolo Spurio wrote:

If the GuC CTs are full and we need to stall the request submission
while waiting for space, we save the stalled request and where the stall
occurred; when the CTs have space again we pick up the request submission
from where we left off.

If a full GT reset occurs, the state of all contexts is cleared and all
non-guilty requests are unsubmitted, therefore we need to restart the
stalled request submission from scratch. To make sure that we do so,
clear the saved request after a reset.

Fixes note: the patch that introduced the bug is in 5.15, but no
officially supported platform had GuC submission enabled by default
in that kernel, so the backport to that particular version (and only
that one) can potentially be skipped.

Fixes: 925dc1cf58ed ("drm/i915/guc: Implement GuC submission tasklet")
Signed-off-by: Daniele Ceraolo Spurio 
Cc: Matthew Brost 
Cc: John Harrison 
Cc:  # v5.15+

Seems like a good thing to do.
Reviewed-by: John Harrison 


---
  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 0d17da77e787..0d56b615bf78 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -4002,6 +4002,13 @@ static inline void guc_init_lrc_mapping(struct intel_guc 
*guc)
/* make sure all descriptors are clean... */
	xa_destroy(&guc->context_lookup);
  
+	/*

+* A reset might have occurred while we had a pending stalled request,
+* so make sure we clean that up.
+*/
+   guc->stalled_request = NULL;
+   guc->submission_stall_reason = STALL_NONE;
+
/*
 * Some contexts might have been pinned before we enabled GuC
 * submission, so we need to add them to the GuC bookeeping.




Re: (subset) [PATCH v2 0/7] Devm helpers for regulator get and enable

2022-08-15 Thread Mark Brown
On Mon, Aug 15, 2022 at 01:58:55PM -0700, Stephen Boyd wrote:
> Quoting Laurent Pinchart (2022-08-15 11:52:36)
> > On Mon, Aug 15, 2022 at 05:33:06PM +0100, Mark Brown wrote:
> > > On Mon, Aug 15, 2022 at 06:54:45PM +0300, Laurent Pinchart wrote:

> > > > - With devres, you don't have full control over the order in which
> > > >   resources will be released, which means that you can't control the
> > > >   power off sequence, in particular if it needs to be sequenced with
> > > >   GPIOs and clocks. That's not a concern for all drivers, but this API
> > > >   will creep in in places where it shouldn't be used, driver authors
> > > >   should really pay attention to power management and not live with the
> > > >   false impression that everything will be handled automatically for
> > > >   them. In the worst cases, an incorrect power off sequence could lead
> > > >   to hardware damage.

> I think the main issue is that platform drivers are being asked to do
> too much. We've put the burden on platform driver authors to intimately
> understand how their devices are integrated, and as we all know they're

This is for the regulator API, it's mainly for off-SoC devices so it's
not a question of understanding the integration of a device into a piece
of silicon; it's a question of understanding the integration of a chip
into a board, which seems reasonably in scope for a chip driver and is
certainly the sort of thing that you'd be talking to your customers
about as a silicon vendor.

> The basic idea is that drivers should be focused on what they're
> driving, not navigating the (sometimes) complex integration that's
> taking place around them. When a device driver probe function is called
> the device should already be powered on. When the driver is
> removed/unbound, the power should be removed after the driver's remove
> function is called. We're only going to be able to solve the power
> sequencing and ordering problem by taking away power control and
> sequencing from drivers.

That is a sensible approach for most on-SoC things, but for something
shipped as a separate driver there's little point in separating the
power and clocking domain driver from the device since there's typically
a 1:1 mapping.  Usually it's extremely simple (eg, turn
everything on and remove reset), but some devices really need to manage
things.  There's obviously some edge cases in SoC integration as well
(eg, the need to manage card supplies for SD controllers, or knowing
exact clock rates for things like audio controllers) so you need some
flex.




Re: (subset) [PATCH v2 0/7] Devm helpers for regulator get and enable

2022-08-15 Thread Laurent Pinchart
Hi Stephen,

On Mon, Aug 15, 2022 at 01:58:55PM -0700, Stephen Boyd wrote:
> Quoting Laurent Pinchart (2022-08-15 11:52:36)
> > On Mon, Aug 15, 2022 at 05:33:06PM +0100, Mark Brown wrote:
> > > On Mon, Aug 15, 2022 at 06:54:45PM +0300, Laurent Pinchart wrote:
> > > 
> > > > - With devres, you don't have full control over the order in which
> > > >   resources will be released, which means that you can't control the
> > > >   power off sequence, in particular if it needs to be sequenced with
> > > >   GPIOs and clocks. That's not a concern for all drivers, but this API
> > > >   will creep in in places where it shouldn't be used, driver authors
> > > >   should really pay attention to power management and not live with the
> > > >   false impression that everything will be handled automatically for
> > > >   them. In the worst cases, an incorrect power off sequence could lead
> > > >   to hardware damage.
> 
> I think the main issue is that platform drivers are being asked to do
> too much. We've put the burden on platform driver authors to intimately
> understand how their devices are integrated, and as we all know they're
> not very interested in these details because they already have a hard
> time to write a driver just to make their latest gizmo whir. Throw in
> power management and you get these wrappers that try to compartmentalize
> power management logic away from the main part of the driver that's
> plugging into the driver subsystem because the SoC integration logic is
> constantly changing but the device core isn't.
> 
> We need to enhance the platform bus layer to make it SoC aware when the
> platform device is inside an SoC, or "board" aware when the device lives
> outside of an SoC, i.e. it's a discrete IC. The bus layer should manage
> power state transitions for the platform devices, and the platform
> drivers should only be able to request runtime power/performance state
> changes through device PM APIs (dev_pm_*). If this can all be done
> through genpds then it sounds great. We may need to write some generic
> code for discrete ICs that enables regulators and then clks before
> muxing out pins or something like that. Obviously, I don't have all the
> details figured out.
> 
> The basic idea is that drivers should be focused on what they're
> driving, not navigating the (sometimes) complex integration that's
> taking place around them. When a device driver probe function is called
> the device should already be powered on.

No. ACPI does that in many cases, and that's a real bad idea. There are
devices that you do *not* want to power up on probe. I'm thinking, for
example, about camera sensors that have a privacy LED that will light up
when the sensor is powered up. You don't want it to flash on boot. There
are also other use cases related to fault tolerance where you want
drivers to initialize properly and only access the device later.

> When the driver is
> removed/unbound, the power should be removed after the driver's remove
> function is called. We're only going to be able to solve the power
> sequencing and ordering problem by taking away power control and
> sequencing from drivers.

For SoC devices we may be able to achieve this to some extent, at least
for simple devices that don't have exotic requirements (by exotic I mean
requiring more than one clock for instance, so it's not *that* exotic).
For on-board devices that's impossible by nature, the power up/down
constraints are specific to the device, it's the job of the driver to
handle them, the same way it has to handle everything else that is
device-specific.

The right way to handle this in my opinion is to push for RPM support in
drivers. The regulator, reset, clock & co. handling then happens in the
RPM suspend and resume handlers, they're self-contained, well ordered,
and much easier to review. The rest of the driver then only uses the RPM
API. It's easy(er) and clean (at least when you don't throw ACPI to the
mix, with its "peculiar" idea of calling probe with the device powered
on *BUT* in the RPM suspended state - due to historical reasons I
believe - but I think that should be fixable on the ACPI side, albeit
perhaps not without substantial effort), and you get simple review
rules: if the driver doesn't implement RPM, or handles resource
enable/disable outside of the RPM suspend/resume handlers, it's most
likely getting it wrong.
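A rough sketch of that structure (hypothetical foo_* names; a single supply and
clock are assumed):

#include <linux/clk.h>
#include <linux/pm_runtime.h>
#include <linux/regulator/consumer.h>

struct foo {
	struct regulator *vdd;
	struct clk *clk;
};

static int foo_runtime_resume(struct device *dev)
{
	struct foo *foo = dev_get_drvdata(dev);
	int ret;

	ret = regulator_enable(foo->vdd);
	if (ret)
		return ret;

	ret = clk_prepare_enable(foo->clk);
	if (ret)
		regulator_disable(foo->vdd);

	return ret;
}

static int foo_runtime_suspend(struct device *dev)
{
	struct foo *foo = dev_get_drvdata(dev);

	/* Reverse order of resume: clock off before the supply. */
	clk_disable_unprepare(foo->clk);
	regulator_disable(foo->vdd);
	return 0;
}

static const struct dev_pm_ops foo_pm_ops = {
	SET_RUNTIME_PM_OPS(foo_runtime_suspend, foo_runtime_resume, NULL)
};

The rest of the driver then only brackets hardware access with
pm_runtime_resume_and_get()/pm_runtime_put().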

These devres helpers go in the exact opposite direction of what we
should be doing, by telling driver authors it's totally fine to not
implement power management. Why don't we just drop error handling and go
back to the big kernel lock in that case ? That was much easier to
program too.

> > > I basically agree with these concerns which is why I was only happy with
> > > this API when Matti suggested doing it in a way that meant that the
> > > callers are unable to access the regulator at runtime, this means that
> > > if anyone wants to do any kind of management of the power state outside

[PATCH v2 3/3] drm/msm/prime: Add mmap_info support

2022-08-15 Thread Rob Clark
From: Rob Clark 

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 1dee0d18abbb..1db53545ac40 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -1048,6 +1048,17 @@ static const struct vm_operations_struct vm_ops = {
.close = drm_gem_vm_close,
 };
 
+static enum dma_buf_map_info msm_gem_map_info(struct drm_gem_object *obj)
+{
+   struct msm_gem_object *msm_obj = to_msm_bo(obj);
+
+   switch (msm_obj->flags & MSM_BO_CACHE_MASK) {
+   case MSM_BO_WC:return DMA_BUF_COHERENT_WC;
+   case MSM_BO_CACHED_COHERENT:   return DMA_BUF_COHERENT_CACHED;
+   default:   return DMA_BUF_MAP_INCOHERENT;
+   }
+}
+
 static const struct drm_gem_object_funcs msm_gem_object_funcs = {
.free = msm_gem_free_object,
.pin = msm_gem_prime_pin,
@@ -1057,6 +1068,7 @@ static const struct drm_gem_object_funcs 
msm_gem_object_funcs = {
.vunmap = msm_gem_prime_vunmap,
.mmap = msm_gem_object_mmap,
	.vm_ops = &vm_ops,
+   .map_info = msm_gem_map_info,
 };
 
 static int msm_gem_new_impl(struct drm_device *dev,
-- 
2.36.1



[PATCH v2 2/3] drm/prime: Wire up mmap_info support

2022-08-15 Thread Rob Clark
From: Rob Clark 

Just plumbing the thing thru an extra layer.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/drm_prime.c |  3 +++
 include/drm/drm_gem.h   | 11 +++
 2 files changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index e3f09f18110c..4457fedde1ec 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -888,6 +888,9 @@ struct dma_buf *drm_gem_prime_export(struct drm_gem_object 
*obj,
.resv = obj->resv,
};
 
+   if (obj->funcs && obj->funcs->map_info)
+   exp_info.map_info = obj->funcs->map_info(obj);
+
	return drm_gem_dmabuf_export(dev, &exp_info);
 }
 EXPORT_SYMBOL(drm_gem_prime_export);
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index f28a48a6f846..a573ebfc529a 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -172,6 +172,17 @@ struct drm_gem_object_funcs {
 * This is optional but necessary for mmap support.
 */
const struct vm_operations_struct *vm_ops;
+
+   /**
+* @map_info:
+*
+* Return dma_buf_map_info indicating the coherency of an exported
+* dma-buf.
+*
+* This callback is optional.  If not provided, exported dma-bufs are
+* assumed to be DMA_BUF_MAP_INCOHERENT.
+*/
+   enum dma_buf_map_info (*map_info)(struct drm_gem_object *obj);
 };
 
 /**
-- 
2.36.1



[PATCH v2 1/3] dma-buf: Add ioctl to query mmap coherency/cache info

2022-08-15 Thread Rob Clark
From: Rob Clark 

This is a fairly narrowly focused interface, providing a way for a VMM
in userspace to tell the guest kernel what pgprot settings to use when
mapping a buffer to guest userspace.

For buffers that get mapped into guest userspace, virglrenderer returns
a dma-buf fd to the VMM (crosvm or qemu).  In addition to mapping the
pages into the guest VM, it needs to report to drm/virtio in the guest
the cache settings to use for guest userspace.  In particular, on some
architectures, creating aliased mappings with different cache attributes
is frowned upon, so it is important that the guest mappings have the
same cache attributes as any potential host mappings.

Signed-off-by: Rob Clark 
---
v2: Combine with coherency, as that is a related concept.. and it is
relevant to the VMM whether coherent access without the SYNC ioctl
is possible; set map_info at export time to make it more clear
that it applies for the lifetime of the dma-buf (for any mmap
created via the dma-buf)
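
For illustration, a userspace-side sketch of how a VMM could query the proposed
ioctl (names follow the uapi added by this patch; the exact field widths of
struct dma_buf_info are whatever the new header defines, error handling trimmed):

#include <sys/ioctl.h>
#include <linux/dma-buf.h>

static int query_map_info(int dmabuf_fd, unsigned long long *map_info)
{
	struct dma_buf_info arg = {
		.param = DMA_BUF_INFO_MAP_INFO,
	};

	if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_INFO, &arg) < 0)
		return -1;

	/* DMA_BUF_MAP_INCOHERENT, DMA_BUF_COHERENT_WC or DMA_BUF_COHERENT_CACHED */
	*map_info = arg.value;
	return 0;
}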

 drivers/dma-buf/dma-buf.c| 63 +++--
 include/linux/dma-buf.h  | 11 ++
 include/uapi/linux/dma-buf.h | 68 
 3 files changed, 132 insertions(+), 10 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 32f55640890c..262c4706f721 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -125,6 +125,32 @@ static struct file_system_type dma_buf_fs_type = {
.kill_sb = kill_anon_super,
 };
 
+static int __dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
+{
+   int ret;
+
+   /* check if buffer supports mmap */
+   if (!dmabuf->ops->mmap)
+   return -EINVAL;
+
+   ret = dmabuf->ops->mmap(dmabuf, vma);
+
+   /*
+* If the exporter claims to support coherent access, ensure the
+* pgprot flags match the claim.
+*/
+   if ((dmabuf->map_info != DMA_BUF_MAP_INCOHERENT) && !ret) {
+   pgprot_t wc_prot = pgprot_writecombine(vma->vm_page_prot);
+   if (dmabuf->map_info == DMA_BUF_COHERENT_WC) {
+   WARN_ON_ONCE(pgprot_val(vma->vm_page_prot) != 
pgprot_val(wc_prot));
+   } else {
+   WARN_ON_ONCE(pgprot_val(vma->vm_page_prot) == 
pgprot_val(wc_prot));
+   }
+   }
+
+   return ret;
+}
+
 static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma)
 {
struct dma_buf *dmabuf;
@@ -134,16 +160,12 @@ static int dma_buf_mmap_internal(struct file *file, 
struct vm_area_struct *vma)
 
dmabuf = file->private_data;
 
-   /* check if buffer supports mmap */
-   if (!dmabuf->ops->mmap)
-   return -EINVAL;
-
/* check for overflowing the buffer's size */
if (vma->vm_pgoff + vma_pages(vma) >
dmabuf->size >> PAGE_SHIFT)
return -EINVAL;
 
-   return dmabuf->ops->mmap(dmabuf, vma);
+   return __dma_buf_mmap(dmabuf, vma);
 }
 
 static loff_t dma_buf_llseek(struct file *file, loff_t offset, int whence)
@@ -326,6 +348,27 @@ static long dma_buf_set_name(struct dma_buf *dmabuf, const 
char __user *buf)
return 0;
 }
 
+static long dma_buf_info(struct dma_buf *dmabuf, void __user *uarg)
+{
+   struct dma_buf_info arg;
+
+   if (copy_from_user(&arg, uarg, sizeof(arg)))
+   return -EFAULT;
+
+   switch (arg.param) {
+   case DMA_BUF_INFO_MAP_INFO:
+   arg.value = dmabuf->map_info;
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   if (copy_to_user(uarg, &arg, sizeof(arg)))
+   return -EFAULT;
+
+   return 0;
+}
+
 static long dma_buf_ioctl(struct file *file,
  unsigned int cmd, unsigned long arg)
 {
@@ -369,6 +412,9 @@ static long dma_buf_ioctl(struct file *file,
case DMA_BUF_SET_NAME_B:
return dma_buf_set_name(dmabuf, (const char __user *)arg);
 
+   case DMA_BUF_IOCTL_INFO:
+   return dma_buf_info(dmabuf, (void __user *)arg);
+
default:
return -ENOTTY;
}
@@ -530,6 +576,7 @@ struct dma_buf *dma_buf_export(const struct 
dma_buf_export_info *exp_info)
dmabuf->priv = exp_info->priv;
dmabuf->ops = exp_info->ops;
dmabuf->size = exp_info->size;
+   dmabuf->map_info = exp_info->map_info;
dmabuf->exp_name = exp_info->exp_name;
dmabuf->owner = exp_info->owner;
	spin_lock_init(&dmabuf->name_lock);
@@ -1245,10 +1292,6 @@ int dma_buf_mmap(struct dma_buf *dmabuf, struct 
vm_area_struct *vma,
if (WARN_ON(!dmabuf || !vma))
return -EINVAL;
 
-   /* check if buffer supports mmap */
-   if (!dmabuf->ops->mmap)
-   return -EINVAL;
-
/* check for offset overflow */
if (pgoff + vma_pages(vma) < pgoff)
return -EOVERFLOW;
@@ -1262,7 +1305,7 @@ int dma_buf_mmap(struct 

[PATCH v2 0/3] dma-buf: map-info support

2022-08-15 Thread Rob Clark
From: Rob Clark 

See 1/3 for motivation.

Rob Clark (3):
  dma-buf: Add ioctl to query mmap coherency/cache info
  drm/prime: Wire up mmap_info support
  drm/msm/prime: Add mmap_info support

 drivers/dma-buf/dma-buf.c | 63 ++--
 drivers/gpu/drm/drm_prime.c   |  3 ++
 drivers/gpu/drm/msm/msm_gem.c | 12 +++
 include/drm/drm_gem.h | 11 ++
 include/linux/dma-buf.h   | 11 ++
 include/uapi/linux/dma-buf.h  | 68 +++
 6 files changed, 158 insertions(+), 10 deletions(-)

-- 
2.36.1



Re: [PATCH] drm/hyperv: Fix an error handling path in hyperv_vmbus_probe()

2022-08-15 Thread Christophe JAILLET

On 15/08/2022 at 17:56, Wei Liu wrote:


All that said, the fix looks good, so

Reviewed-by: Michael Kelley 


I made the two changes listed above and applied this patch to
hyperv-fixes.



Thanks a lot, that saves me a v2.

CJ


Thanks,
Wei.





Re: (subset) [PATCH v2 0/7] Devm helpers for regulator get and enable

2022-08-15 Thread Stephen Boyd
Quoting Laurent Pinchart (2022-08-15 11:52:36)
> Hi Mark,
> 
> On Mon, Aug 15, 2022 at 05:33:06PM +0100, Mark Brown wrote:
> > On Mon, Aug 15, 2022 at 06:54:45PM +0300, Laurent Pinchart wrote:
> > 
> > > - With devres, you don't have full control over the order in which
> > >   resources will be released, which means that you can't control the
> > >   power off sequence, in particular if it needs to be sequenced with
> > >   GPIOs and clocks. That's not a concern for all drivers, but this API
> > >   will creep in in places where it shouldn't be used, driver authors
> > >   should really pay attention to power management and not live with the
> > >   false impression that everything will be handled automatically for
> > >   them. In the worst cases, an incorrect power off sequence could lead
> > >   to hardware damage.

I think the main issue is that platform drivers are being asked to do
too much. We've put the burden on platform driver authors to intimately
understand how their devices are integrated, and as we all know they're
not very interested in these details because they already have a hard
time to write a driver just to make their latest gizmo whir. Throw in
power management and you get these wrappers that try to compartmentalize
power management logic away from the main part of the driver that's
plugging into the driver subsystem because the SoC integration logic is
constantly changing but the device core isn't.

We need to enhance the platform bus layer to make it SoC aware when the
platform device is inside an SoC, or "board" aware when the device lives
outside of an SoC, i.e. it's a discrete IC. The bus layer should manage
power state transitions for the platform devices, and the platform
drivers should only be able to request runtime power/performance state
changes through device PM APIs (dev_pm_*). If this can all be done
through genpds then it sounds great. We may need to write some generic
code for discrete ICs that enables regulators and then clks before
muxing out pins or something like that. Obviously, I don't have all the
details figured out.

The basic idea is that drivers should be focused on what they're
driving, not navigating the (sometimes) complex integration that's
taking place around them. When a device driver probe function is called
the device should already be powered on. When the driver is
removed/unbound, the power should be removed after the driver's remove
function is called. We're only going to be able to solve the power
sequencing and ordering problem by taking away power control and
sequencing from drivers.

> > 
> > I basically agree with these concerns which is why I was only happy with
> > this API when Matti suggested doing it in a way that meant that the
> > callers are unable to access the regulator at runtime, this means that
> > if anyone wants to do any kind of management of the power state outside
> > of probe and remove they are forced to convert to the full fat APIs.
> > The general ordering concern with devm is that the free happens too late
> > but for the most part this isn't such a concern with regulators, they
> > might have delayed power off anyway due to sharing - it's no worse than
> > memory allocation AFAICT.  Given all the other APIs using devm it's
> > probably going to end up fixing some bugs.
> > 
> > For sequencing I'm not convinced it's much worse than the bulk API is
> > anyway, and practically speaking I expect most devices that have
> > problems here will also need more control over power anyway - it's
> > certainly the common case that hardware has pretty basic requirements
> > and is fairly tolerant.
> 
> I'm not extremely concerned here at the moment, as power should be the
> last thing to be turned off, after clocks and reset signals. As clocks
> and GPIOs will still be controlled manually in the driver .remove()
> function, it means that power will go last, which should be fine.
> However, should a devm_clk_get_enable() or similar function be

This API is implemented now.

> implemented, we'll run into trouble. Supplying active high input signals
> to a device that is not powered can lead to latch-up, which tends to
> only manifest after a statistically significant number of occurrences of
> the condition, and can slowly damage the hardware over time. This is a
> real concern as it will typically not be caught during early
> development. I think we would still be better off with requiring drivers
> to manually handle powering off the device until we provide a mechanism
> that can do so safely in an automated way.

Can you describe the error scenario further? I think it's driver author
error that would lead to getting and enabling the regulator after
getting and enabling a clk that drives out a clock signal on some pins
that aren't powered yet. I'm not sure that's all that much easier to do
with these sorts of devm APIs, but if it is then I'm concerned.

> 
> > > - Powering regulators on at probe time and leaving them on is a very 

Re: [PATCH v3 4/4] drm/amdgpu: Document gfx_off members of struct amdgpu_gfx

2022-08-15 Thread Alex Deucher
Applied the series.  Thanks!

Alex

On Wed, Aug 10, 2022 at 7:30 PM André Almeida  wrote:
>
> Add comments to document gfx_off related members of struct amdgpu_gfx.
>
> Signed-off-by: André Almeida 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> index 1b8b4a5270c9..8abdf41d0f83 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> @@ -332,12 +332,12 @@ struct amdgpu_gfx {
> uint32_tsrbm_soft_reset;
>
> /* gfx off */
> -   boolgfx_off_state; /* true: enabled, 
> false: disabled */
> -   struct mutexgfx_off_mutex;
> -   uint32_tgfx_off_req_count; /* default 1, 
> enable gfx off: dec 1, disable gfx off: add 1 */
> -   struct delayed_work gfx_off_delay_work;
> -   uint32_tgfx_off_residency;
> -   uint64_tgfx_off_entrycount;
> +   boolgfx_off_state;  /* true: enabled, 
> false: disabled */
> +   struct mutexgfx_off_mutex;  /* mutex to 
> change gfxoff state */
> +   uint32_tgfx_off_req_count;  /* default 1, 
> enable gfx off: dec 1, disable gfx off: add 1 */
> +   struct delayed_work gfx_off_delay_work; /* async work to 
> set gfx block off */
> +   uint32_tgfx_off_residency;  /* last logged 
> residency */
> +   uint64_tgfx_off_entrycount; /* count of times 
> GPU has get into GFXOFF state */
>
> /* pipe reservation */
> struct mutexpipe_reserve_mutex;
> --
> 2.37.1
>


Re: [PATCH] drm/amd/display: Unneeded semicolon

2022-08-15 Thread Alex Deucher
Applied.  Thanks!

On Sat, Aug 13, 2022 at 11:35 AM min tang  wrote:
>
> There is no need for a semicolon after '}' in line 510.
>
> Signed-off-by: min tang 
> ---
>  drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_clk_mgr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_clk_mgr.c 
> b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_clk_mgr.c
> index 27501b735a9c..c87cf8771c6d 100644
> --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_clk_mgr.c
> +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_clk_mgr.c
> @@ -505,7 +505,7 @@ static void dcn315_clk_mgr_helper_populate_bw_params(
> bw_params->clk_table.entries[i].dispclk_mhz = 
> clock_table->DispClocks[i];
> bw_params->clk_table.entries[i].dppclk_mhz = 
> clock_table->DppClocks[i];
> bw_params->clk_table.entries[i].wck_ratio = 1;
> -   };
> +   }
>
> /* Make sure to include at least one entry and highest pstate */
> if (max_pstate != min_pstate || i == 0) {
> --
> 2.17.1
>


Re: [PATCH] drm/amd/display: Fix comment typo

2022-08-15 Thread Alex Deucher
Applied.  Thanks!

On Sat, Aug 13, 2022 at 11:13 AM min tang  wrote:
>
> The word `aligned' is duplicated in line 1070, remove one.
>
> Signed-off-by: min tang 
> ---
>  drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c 
> b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c
> index b1671b00ce40..0844b3eeb291 100644
> --- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c
> +++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c
> @@ -1066,7 +1066,7 @@ static void optc1_set_test_pattern(
> src_color[index] >> (src_bpc - dst_bpc);
> /* CRTC_TEST_PATTERN_DATA has 16 bits,
>  * lowest 6 are hardwired to ZERO
> -* color bits should be left aligned aligned to MSB
> +* color bits should be left aligned to MSB
>  * XXXXXXXXXX000000 for 10 bit,
>  * XXXXXXXX00000000 for 8 bit and XXXXXX0000000000 for 6
>  */
> --
> 2.17.1
>


Re: [PATCH] drm/amdgpu: remove useless condition in amdgpu_job_stop_all_jobs_on_sched()

2022-08-15 Thread Alex Deucher
Applied.  Thanks!

On Fri, Aug 12, 2022 at 7:13 AM Christian König
 wrote:
>
> @Alex was that one already picked up?
>
> On 25.07.22 at 18:40, Andrey Grodzovsky wrote:
> > Reviewed-by: Andrey Grodzovsky 
> >
> > Andrey
> >
> > On 2022-07-19 06:39, Andrey Strachuk wrote:
> >> Local variable 'rq' is initialized by an address
> >> of field of drm_sched_job, so it does not make
> >> sense to compare 'rq' with NULL.
> >>
> >> Found by Linux Verification Center (linuxtesting.org) with SVACE.
> >>
> >> Signed-off-by: Andrey Strachuk 
> >> Fixes: 7c6e68c777f1 ("drm/amdgpu: Avoid HW GPU reset for RAS.")
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 4 
> >>   1 file changed, 4 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >> index 67f66f2f1809..600401f2a98f 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >> @@ -285,10 +285,6 @@ void amdgpu_job_stop_all_jobs_on_sched(struct
> >> drm_gpu_scheduler *sched)
> >>   /* Signal all jobs not yet scheduled */
> >>   for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >=
> >> DRM_SCHED_PRIORITY_MIN; i--) {
> >>   struct drm_sched_rq *rq = &sched->sched_rq[i];
> >> -
> >> -if (!rq)
> >> -continue;
> >> -
> >>   spin_lock(&rq->lock);
> >>   list_for_each_entry(s_entity, &rq->entities, list) {
> >>   while ((s_job = to_drm_sched_job(spsc_queue_pop(&s_entity->job_queue)))) {
>


[PATCH 5.19 0310/1157] drm/mgag200: Acquire I/O lock while reading EDID

2022-08-15 Thread Greg Kroah-Hartman
From: Thomas Zimmermann 

[ Upstream commit 5913ab941d6ea782e841234c76958c6872ea752d ]

DDC operation conflicts with concurrent mode setting. Acquire the
driver's I/O lock in get_modes to prevent this. This change should
have been part of commit 931e3f3a0e99 ("drm/mgag200: Protect
concurrent access to I/O registers with lock"), but apparently got
lost somewhere.

v3:
* fix commit message to say 'drm/mgag200' (Jocelyn)

Signed-off-by: Thomas Zimmermann 
Fixes: 931e3f3a0e99 ("drm/mgag200: Protect concurrent access to I/O registers 
with lock")
Reviewed-by: Jocelyn Falempe 
Tested-by: Jocelyn Falempe 
Cc: Thomas Zimmermann 
Cc: Jocelyn Falempe 
Cc: Daniel Vetter 
Cc: Dave Airlie 
Cc: dri-devel@lists.freedesktop.org
Link: 
https://patchwork.freedesktop.org/patch/msgid/20220516134343.6085-2-tzimmerm...@suse.de
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/mgag200/mgag200_mode.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/mgag200/mgag200_mode.c 
b/drivers/gpu/drm/mgag200/mgag200_mode.c
index abde7655477d..4ad8d62c5631 100644
--- a/drivers/gpu/drm/mgag200/mgag200_mode.c
+++ b/drivers/gpu/drm/mgag200/mgag200_mode.c
@@ -667,16 +667,26 @@ static void mgag200_disable_display(struct mga_device 
*mdev)
 
 static int mga_vga_get_modes(struct drm_connector *connector)
 {
+   struct mga_device *mdev = to_mga_device(connector->dev);
struct mga_connector *mga_connector = to_mga_connector(connector);
struct edid *edid;
int ret = 0;
 
+   /*
+* Protect access to I/O registers from concurrent modesetting
+* by acquiring the I/O-register lock.
+*/
+   mutex_lock(&mdev->rmmio_lock);
+
	edid = drm_get_edid(connector, &mga_connector->i2c->adapter);
if (edid) {
drm_connector_update_edid_property(connector, edid);
ret = drm_add_edid_modes(connector, edid);
kfree(edid);
}
+
+   mutex_unlock(&mdev->rmmio_lock);
+
return ret;
 }
 
-- 
2.35.1





[PATCH 5.19 0305/1157] dt-bindings: display: bridge: ldb: Fill in reg property

2022-08-15 Thread Greg Kroah-Hartman
From: Marek Vasut 

[ Upstream commit 16c8d76abe83d75b578d72ee22d25a52c764e14a ]

Add missing reg and reg-names properties for both 'LDB_CTRL'
and 'LVDS_CTRL' registers.

Fixes: 463db5c2ed4ae ("drm: bridge: ldb: Implement simple Freescale i.MX8MP LDB 
bridge")
Signed-off-by: Marek Vasut 
Cc: Laurent Pinchart 
Cc: Lucas Stach 
Cc: Maxime Ripard 
Cc: Peng Fan 
Cc: Rob Herring 
Cc: Robby Cai 
Cc: Robert Foss 
Cc: Sam Ravnborg 
Cc: Thomas Zimmermann 
Cc: devicet...@vger.kernel.org
To: dri-devel@lists.freedesktop.org
Reviewed-by: Rob Herring 
Signed-off-by: Robert Foss 
Link: 
https://patchwork.freedesktop.org/patch/msgid/20220504012601.423644-1-ma...@denx.de
Signed-off-by: Sasha Levin 
---
 .../bindings/display/bridge/fsl,ldb.yaml | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/display/bridge/fsl,ldb.yaml 
b/Documentation/devicetree/bindings/display/bridge/fsl,ldb.yaml
index 77f174eee424..2ebaa43eb62e 100644
--- a/Documentation/devicetree/bindings/display/bridge/fsl,ldb.yaml
+++ b/Documentation/devicetree/bindings/display/bridge/fsl,ldb.yaml
@@ -24,6 +24,15 @@ properties:
   clock-names:
 const: ldb
 
+  reg:
+minItems: 2
+maxItems: 2
+
+  reg-names:
+items:
+  - const: ldb
+  - const: lvds
+
   ports:
 $ref: /schemas/graph.yaml#/properties/ports
 
@@ -56,10 +65,15 @@ examples:
 #include 
 
 blk-ctrl {
-bridge {
+#address-cells = <1>;
+#size-cells = <1>;
+
+bridge@5c {
 compatible = "fsl,imx8mp-ldb";
clocks = <&clk IMX8MP_CLK_MEDIA_LDB>;
 clock-names = "ldb";
+reg = <0x5c 0x4>, <0x128 0x4>;
+reg-names = "ldb", "lvds";
 
 ports {
 #address-cells = <1>;
-- 
2.35.1





[PATCH 5.19 0087/1157] drm/hyperv-drm: Include framebuffer and EDID headers

2022-08-15 Thread Greg Kroah-Hartman
From: Thomas Zimmermann 

commit 009a3a52791f31c57d755a73f6bc66fbdd8bd76c upstream.

Fix a number of compile errors by including the correct header
files. Examples are shown below.

  ../drivers/gpu/drm/hyperv/hyperv_drm_modeset.c: In function 
'hyperv_blit_to_vram_rect':
  ../drivers/gpu/drm/hyperv/hyperv_drm_modeset.c:25:48: error: invalid use of 
undefined type 'struct drm_framebuffer'
   25 | struct hyperv_drm_device *hv = to_hv(fb->dev);
  |^~

  ../drivers/gpu/drm/hyperv/hyperv_drm_modeset.c: In function 
'hyperv_connector_get_modes':
  ../drivers/gpu/drm/hyperv/hyperv_drm_modeset.c:59:17: error: implicit 
declaration of function 'drm_add_modes_noedid' 
[-Werror=implicit-function-declaration]
   59 | count = drm_add_modes_noedid(connector,
  | ^~~~

  ../drivers/gpu/drm/hyperv/hyperv_drm_modeset.c:62:9: error: implicit 
declaration of function 'drm_set_preferred_mode'; did you mean 
'drm_mm_reserve_node'? [-Werror=implicit-function-declaration]
   62 | drm_set_preferred_mode(connector, hv->preferred_width,
  | ^~

Signed-off-by: Thomas Zimmermann 
Fixes: 76c56a5affeb ("drm/hyperv: Add DRM driver for hyperv synthetic video 
device")
Fixes: 720cf96d8fec ("drm: Drop drm_framebuffer.h from drm_crtc.h")
Fixes: 255490f9150d ("drm: Drop drm_edid.h from drm_crtc.h")
Cc: Deepak Rawat 
Cc: Thomas Zimmermann 
Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: linux-hyp...@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc:  # v5.14+
Acked-by: Maxime Ripard 
Reviewed-by: Ville Syrjälä 
Link: 
https://patchwork.freedesktop.org/patch/msgid/20220622083413.12573-1-tzimmerm...@suse.de
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/gpu/drm/hyperv/hyperv_drm_modeset.c |2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/gpu/drm/hyperv/hyperv_drm_modeset.c
+++ b/drivers/gpu/drm/hyperv/hyperv_drm_modeset.c
@@ -7,9 +7,11 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 




[PATCH 09/10] drm/amdgpu: add gang submit backend v2

2022-08-15 Thread Christian König
Allows submitting jobs as a gang which needs to run on multiple
engines at the same time.

The basic idea is that we have a global gang submit fence representing when the
gang leader is finally pushed to run on the hardware last.

Jobs submitted as a gang are never re-submitted in case of a GPU reset since
this won't work and will just deadlock the hardware immediately again.

v2: fix logic inversion, improve documentation, fix rcu
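
As an illustration (not part of this patch), the backend is meant to be
consumed from the scheduler dependency callback roughly as sketched below.
The callback name and the surrounding job handling are assumptions here; the
real hook-up follows later in the series.

/* minimal sketch, assuming the gang fence is handed out as one more dependency */
static struct dma_fence *example_job_dependency(struct drm_sched_job *sched_job,
                                                struct drm_sched_entity *s_entity)
{
        struct amdgpu_ring *ring = to_amdgpu_ring(sched_job->sched);
        struct amdgpu_job *job = to_amdgpu_job(sched_job);

        /* ... existing sync object and VM dependencies are returned first ... */

        if (!job->gang_submit)
                return NULL;

        /*
         * NULL once our gang owns the hardware, otherwise the fence of the
         * currently running gang leader which the scheduler then waits on.
         */
        return amdgpu_device_switch_gang(ring->adev, job->gang_submit);
}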

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  3 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 35 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 28 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h|  3 ++
 4 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 5a639c857bd0..3ac1e4d05fcb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -885,6 +885,7 @@ struct amdgpu_device {
u64 fence_context;
unsignednum_rings;
struct amdgpu_ring  *rings[AMDGPU_MAX_RINGS];
+   struct dma_fence __rcu  *gang_submit;
boolib_pool_ready;
struct amdgpu_sa_managerib_pools[AMDGPU_IB_POOL_MAX];
struct amdgpu_sched 
gpu_sched[AMDGPU_HW_IP_NUM][AMDGPU_RING_PRIO_MAX];
@@ -1294,6 +1295,8 @@ u32 amdgpu_device_pcie_port_rreg(struct amdgpu_device 
*adev,
u32 reg);
 void amdgpu_device_pcie_port_wreg(struct amdgpu_device *adev,
u32 reg, u32 v);
+struct dma_fence *amdgpu_device_switch_gang(struct amdgpu_device *adev,
+   struct dma_fence *gang);
 
 /* atpx handler */
 #if defined(CONFIG_VGA_SWITCHEROO)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index c84fdef0ac45..23f2938a1fea 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3499,6 +3499,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
adev->gmc.gart_size = 512 * 1024 * 1024;
adev->accel_working = false;
adev->num_rings = 0;
+   RCU_INIT_POINTER(adev->gang_submit, dma_fence_get_stub());
adev->mman.buffer_funcs = NULL;
adev->mman.buffer_funcs_ring = NULL;
adev->vm_manager.vm_pte_funcs = NULL;
@@ -3979,6 +3980,7 @@ void amdgpu_device_fini_sw(struct amdgpu_device *adev)
release_firmware(adev->firmware.gpu_info_fw);
adev->firmware.gpu_info_fw = NULL;
adev->accel_working = false;
+   dma_fence_put(rcu_dereference_protected(adev->gang_submit, true));
 
amdgpu_reset_fini(adev);
 
@@ -5914,3 +5916,36 @@ void amdgpu_device_pcie_port_wreg(struct amdgpu_device 
*adev,
(void)RREG32(data);
spin_unlock_irqrestore(>pcie_idx_lock, flags);
 }
+
+/**
+ * amdgpu_device_switch_gang - switch to a new gang
+ * @adev: amdgpu_device pointer
+ * @gang: the gang to switch to
+ *
+ * Try to switch to a new gang.
+ * Returns: NULL if we switched to the new gang or a reference to the current
+ * gang leader.
+ */
+struct dma_fence *amdgpu_device_switch_gang(struct amdgpu_device *adev,
+   struct dma_fence *gang)
+{
+   struct dma_fence *old = NULL;
+
+   do {
+   dma_fence_put(old);
+   rcu_read_lock();
+   old = dma_fence_get_rcu_safe(>gang_submit);
+   rcu_read_unlock();
+
+   if (old == gang)
+   break;
+
+   if (!dma_fence_is_signaled(old))
+   return old;
+
+   } while (cmpxchg((struct dma_fence __force **)>gang_submit,
+old, gang) != old);
+
+   dma_fence_put(old);
+   return NULL;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 2348beea6a2e..e4b791cdda2c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -173,11 +173,29 @@ static void amdgpu_job_free_cb(struct drm_sched_job 
*s_job)
dma_fence_put(>hw_fence);
 }
 
+void amdgpu_job_set_gang_leader(struct amdgpu_job *job,
+   struct amdgpu_job *leader)
+{
+   struct dma_fence *fence = >base.s_fence->scheduled;
+
+   WARN_ON(job->gang_submit);
+
+   /*
+* Don't add a reference when we are the gang leader to avoid circle
+* dependency.
+*/
+   if (job != leader)
+   dma_fence_get(fence);
+   job->gang_submit = fence;
+}
+
 void amdgpu_job_free(struct amdgpu_job *job)
 {
amdgpu_job_free_resources(job);
amdgpu_sync_free(>sync);
amdgpu_sync_free(>sched_sync);
+   if (job->gang_submit != >base.s_fence->scheduled)
+   

[PATCH 06/10] drm/amdgpu: move setting the job resources

2022-08-15 Thread Christian König
Move setting the job resources into amdgpu_job.c

Signed-off-by: Christian König 
Reviewed-by: Andrey Grodzovsky 
Reviewed-by: Luben Tuikov 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 21 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 17 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |  2 ++
 3 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index dfb7b4f46bc3..88f491dc7ca2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -828,9 +828,6 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
struct amdgpu_vm *vm = >vm;
struct amdgpu_bo_list_entry *e;
struct list_head duplicates;
-   struct amdgpu_bo *gds;
-   struct amdgpu_bo *gws;
-   struct amdgpu_bo *oa;
int r;
 
INIT_LIST_HEAD(>validated);
@@ -947,22 +944,8 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
amdgpu_cs_report_moved_bytes(p->adev, p->bytes_moved,
 p->bytes_moved_vis);
 
-   gds = p->bo_list->gds_obj;
-   gws = p->bo_list->gws_obj;
-   oa = p->bo_list->oa_obj;
-
-   if (gds) {
-   p->job->gds_base = amdgpu_bo_gpu_offset(gds) >> PAGE_SHIFT;
-   p->job->gds_size = amdgpu_bo_size(gds) >> PAGE_SHIFT;
-   }
-   if (gws) {
-   p->job->gws_base = amdgpu_bo_gpu_offset(gws) >> PAGE_SHIFT;
-   p->job->gws_size = amdgpu_bo_size(gws) >> PAGE_SHIFT;
-   }
-   if (oa) {
-   p->job->oa_base = amdgpu_bo_gpu_offset(oa) >> PAGE_SHIFT;
-   p->job->oa_size = amdgpu_bo_size(oa) >> PAGE_SHIFT;
-   }
+   amdgpu_job_set_resources(p->job, p->bo_list->gds_obj,
+p->bo_list->gws_obj, p->bo_list->oa_obj);
 
if (p->uf_entry.tv.bo) {
struct amdgpu_bo *uf = ttm_to_amdgpu_bo(p->uf_entry.tv.bo);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index d5b737c6dbbf..2348beea6a2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -132,6 +132,23 @@ int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, 
unsigned size,
return r;
 }
 
+void amdgpu_job_set_resources(struct amdgpu_job *job, struct amdgpu_bo *gds,
+ struct amdgpu_bo *gws, struct amdgpu_bo *oa)
+{
+   if (gds) {
+   job->gds_base = amdgpu_bo_gpu_offset(gds) >> PAGE_SHIFT;
+   job->gds_size = amdgpu_bo_size(gds) >> PAGE_SHIFT;
+   }
+   if (gws) {
+   job->gws_base = amdgpu_bo_gpu_offset(gws) >> PAGE_SHIFT;
+   job->gws_size = amdgpu_bo_size(gws) >> PAGE_SHIFT;
+   }
+   if (oa) {
+   job->oa_base = amdgpu_bo_gpu_offset(oa) >> PAGE_SHIFT;
+   job->oa_size = amdgpu_bo_size(oa) >> PAGE_SHIFT;
+   }
+}
+
 void amdgpu_job_free_resources(struct amdgpu_job *job)
 {
struct amdgpu_ring *ring = to_amdgpu_ring(job->base.sched);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
index babc0af751c2..2a1961bf1194 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
@@ -76,6 +76,8 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned 
num_ibs,
 struct amdgpu_job **job, struct amdgpu_vm *vm);
 int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
enum amdgpu_ib_pool_type pool, struct amdgpu_job **job);
+void amdgpu_job_set_resources(struct amdgpu_job *job, struct amdgpu_bo *gds,
+ struct amdgpu_bo *gws, struct amdgpu_bo *oa);
 void amdgpu_job_free_resources(struct amdgpu_job *job);
 void amdgpu_job_free(struct amdgpu_job *job);
 int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
-- 
2.25.1



[PATCH 05/10] drm/amdgpu: remove SRIOV and MCBP dependencies from the CS

2022-08-15 Thread Christian König
We should not have any different CS constraints based
on the execution environment.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index b9de631a66a3..dfb7b4f46bc3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -323,8 +323,7 @@ static int amdgpu_cs_p2_ib(struct amdgpu_cs_parser *p,
return -EINVAL;
 
if (chunk_ib->ip_type == AMDGPU_HW_IP_GFX &&
-   chunk_ib->flags & AMDGPU_IB_FLAG_PREEMPT &&
-   (amdgpu_mcbp || amdgpu_sriov_vf(p->adev))) {
+   chunk_ib->flags & AMDGPU_IB_FLAG_PREEMPT) {
if (chunk_ib->flags & AMDGPU_IB_FLAG_CE)
(*ce_preempt)++;
else
@@ -1084,7 +1083,7 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser 
*p)
if (r)
return r;
 
-   if (amdgpu_mcbp || amdgpu_sriov_vf(adev)) {
+   if (fpriv->csa_va) {
bo_va = fpriv->csa_va;
r = amdgpu_vm_bo_update(adev, bo_va, false);
if (r)
-- 
2.25.1



[PATCH 07/10] drm/amdgpu: revert "fix limiting AV1 to the first instance on VCN3"

2022-08-15 Thread Christian König
This reverts commit 250195ff744f260c169f5427422b6f39c58cb883.

The job should now be initialized when we reach the parser functions.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
index 39405f0db824..3cabceee5f57 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
@@ -1761,21 +1761,23 @@ static const struct amdgpu_ring_funcs 
vcn_v3_0_dec_sw_ring_vm_funcs = {
.emit_reg_write_reg_wait = amdgpu_ring_emit_reg_write_reg_wait_helper,
 };
 
-static int vcn_v3_0_limit_sched(struct amdgpu_cs_parser *p)
+static int vcn_v3_0_limit_sched(struct amdgpu_cs_parser *p,
+   struct amdgpu_job *job)
 {
struct drm_gpu_scheduler **scheds;
 
/* The create msg must be in the first IB submitted */
-   if (atomic_read(>entity->fence_seq))
+   if (atomic_read(>base.entity->fence_seq))
return -EINVAL;
 
scheds = p->adev->gpu_sched[AMDGPU_HW_IP_VCN_DEC]
[AMDGPU_RING_PRIO_DEFAULT].sched;
-   drm_sched_entity_modify_sched(p->entity, scheds, 1);
+   drm_sched_entity_modify_sched(job->base.entity, scheds, 1);
return 0;
 }
 
-static int vcn_v3_0_dec_msg(struct amdgpu_cs_parser *p, uint64_t addr)
+static int vcn_v3_0_dec_msg(struct amdgpu_cs_parser *p, struct amdgpu_job *job,
+   uint64_t addr)
 {
struct ttm_operation_ctx ctx = { false, false };
struct amdgpu_bo_va_mapping *map;
@@ -1846,7 +1848,7 @@ static int vcn_v3_0_dec_msg(struct amdgpu_cs_parser *p, 
uint64_t addr)
if (create[0] == 0x7 || create[0] == 0x10 || create[0] == 0x11)
continue;
 
-   r = vcn_v3_0_limit_sched(p);
+   r = vcn_v3_0_limit_sched(p, job);
if (r)
goto out;
}
@@ -1860,7 +1862,7 @@ static int vcn_v3_0_ring_patch_cs_in_place(struct 
amdgpu_cs_parser *p,
   struct amdgpu_job *job,
   struct amdgpu_ib *ib)
 {
-   struct amdgpu_ring *ring = to_amdgpu_ring(p->entity->rq->sched);
+   struct amdgpu_ring *ring = to_amdgpu_ring(job->base.sched);
uint32_t msg_lo = 0, msg_hi = 0;
unsigned i;
int r;
@@ -1879,7 +1881,8 @@ static int vcn_v3_0_ring_patch_cs_in_place(struct 
amdgpu_cs_parser *p,
msg_hi = val;
} else if (reg == PACKET0(p->adev->vcn.internal.cmd, 0) &&
   val == 0) {
-   r = vcn_v3_0_dec_msg(p, ((u64)msg_hi) << 32 | msg_lo);
+   r = vcn_v3_0_dec_msg(p, job,
+((u64)msg_hi) << 32 | msg_lo);
if (r)
return r;
}
-- 
2.25.1



[PATCH 08/10] drm/amdgpu: cleanup instance limit on VCN4

2022-08-15 Thread Christian König
Similar to what we did for VCN3, use the job instead of the parser
entity. Clean up the coding style quite a bit as well.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 46 +++
 1 file changed, 25 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
index ca14c3ef742e..a59418ff9c65 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
@@ -1328,21 +1328,23 @@ static void vcn_v4_0_unified_ring_set_wptr(struct 
amdgpu_ring *ring)
}
 }
 
-static int vcn_v4_0_limit_sched(struct amdgpu_cs_parser *p)
+static int vcn_v4_0_limit_sched(struct amdgpu_cs_parser *p,
+   struct amdgpu_job *job)
 {
struct drm_gpu_scheduler **scheds;
 
/* The create msg must be in the first IB submitted */
-   if (atomic_read(>entity->fence_seq))
+   if (atomic_read(>base.entity->fence_seq))
return -EINVAL;
 
-   scheds = p->adev->gpu_sched[AMDGPU_HW_IP_VCN_ENC]
-   [AMDGPU_RING_PRIO_0].sched;
-   drm_sched_entity_modify_sched(p->entity, scheds, 1);
+   scheds = p->adev->gpu_sched[AMDGPU_HW_IP_VCN_DEC]
+   [AMDGPU_RING_PRIO_DEFAULT].sched;
+   drm_sched_entity_modify_sched(job->base.entity, scheds, 1);
return 0;
 }
 
-static int vcn_v4_0_dec_msg(struct amdgpu_cs_parser *p, uint64_t addr)
+static int vcn_v4_0_dec_msg(struct amdgpu_cs_parser *p, struct amdgpu_job *job,
+   uint64_t addr)
 {
struct ttm_operation_ctx ctx = { false, false };
struct amdgpu_bo_va_mapping *map;
@@ -1413,7 +1415,7 @@ static int vcn_v4_0_dec_msg(struct amdgpu_cs_parser *p, 
uint64_t addr)
if (create[0] == 0x7 || create[0] == 0x10 || create[0] == 0x11)
continue;
 
-   r = vcn_v4_0_limit_sched(p);
+   r = vcn_v4_0_limit_sched(p, job);
if (r)
goto out;
}
@@ -1426,32 +1428,34 @@ static int vcn_v4_0_dec_msg(struct amdgpu_cs_parser *p, 
uint64_t addr)
 #define RADEON_VCN_ENGINE_TYPE_DECODE 
(0x0003)
 
 static int vcn_v4_0_ring_patch_cs_in_place(struct amdgpu_cs_parser *p,
-   struct amdgpu_job *job,
-   struct amdgpu_ib *ib)
+  struct amdgpu_job *job,
+  struct amdgpu_ib *ib)
 {
-   struct amdgpu_ring *ring = to_amdgpu_ring(p->entity->rq->sched);
-   struct amdgpu_vcn_decode_buffer *decode_buffer = NULL;
+   struct amdgpu_ring *ring = to_amdgpu_ring(job->base.entity->rq->sched);
+   struct amdgpu_vcn_decode_buffer *decode_buffer;
+   uint64_t addr;
uint32_t val;
-   int r = 0;
 
/* The first instance can decode anything */
if (!ring->me)
-   return r;
+   return 0;
 
/* unified queue ib header has 8 double words. */
if (ib->length_dw < 8)
-   return r;
+   return 0;
 
val = amdgpu_ib_get_value(ib, 6); //RADEON_VCN_ENGINE_TYPE
+   if (val != RADEON_VCN_ENGINE_TYPE_DECODE)
+   return 0;
 
-   if (val == RADEON_VCN_ENGINE_TYPE_DECODE) {
-   decode_buffer = (struct amdgpu_vcn_decode_buffer *)>ptr[10];
+   decode_buffer = (struct amdgpu_vcn_decode_buffer *)>ptr[10];
 
-   if (decode_buffer->valid_buf_flag  & 0x1)
-   r = vcn_v4_0_dec_msg(p, 
((u64)decode_buffer->msg_buffer_address_hi) << 32 |
-   
decode_buffer->msg_buffer_address_lo);
-   }
-   return r;
+   if (!(decode_buffer->valid_buf_flag  & 0x1))
+   return 0;
+
+   addr = ((u64)decode_buffer->msg_buffer_address_hi) << 32 |
+   decode_buffer->msg_buffer_address_lo;
+   return vcn_v4_0_dec_msg(p, job, addr);
 }
 
 static const struct amdgpu_ring_funcs vcn_v4_0_unified_ring_vm_funcs = {
-- 
2.25.1



[PATCH 10/10] drm/amdgpu: add gang submit frontend v3

2022-08-15 Thread Christian König
Allows submitting jobs as a gang which needs to run on multiple engines at the
same time.

All members of the gang get the same implicit, explicit and VM dependencies. So
no gang member will start running until everything else is ready.

The last job is considered the gang leader (usually a submission to the GFX
ring) and is used for signaling output dependencies.

Each job is remembered individually as a user of a buffer object, so there is
no joining of work at the end.

v2: rebase and fix review comments from Andrey and Yogesh
v3: use READ instead of BOOKKEEP for now because of VM unmaps, set gang
leader only when necessary
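
For illustration, a gang is expressed from userspace simply by placing IB
chunks for different engines into one CS ioctl. The sketch below shows the
assumed typical usage; the VAs, sizes and ring numbers are placeholders, and
the packaging into drm_amdgpu_cs_chunk entries is omitted.

/* minimal sketch: two IBs that form a gang within a single submission */
struct drm_amdgpu_cs_chunk_ib compute_ib = {
        .ip_type  = AMDGPU_HW_IP_COMPUTE,
        .ring     = 0,
        .va_start = compute_ib_va,      /* placeholder GPU VA of the compute IB */
        .ib_bytes = compute_ib_size,    /* placeholder size */
};
struct drm_amdgpu_cs_chunk_ib gfx_ib = {
        .ip_type  = AMDGPU_HW_IP_GFX,
        .ring     = 0,
        .va_start = gfx_ib_va,          /* placeholder GPU VA of the gfx IB */
        .ib_bytes = gfx_ib_size,        /* placeholder size */
};
/* Both chunks go into the same drm_amdgpu_cs submission; the job created
 * last (the GFX one here) acts as gang leader for the output fence. */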

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 258 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h|  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h |  12 +-
 3 files changed, 185 insertions(+), 95 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 88f491dc7ca2..21f0a6c08eb4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -69,6 +69,7 @@ static int amdgpu_cs_p1_ib(struct amdgpu_cs_parser *p,
   unsigned int *num_ibs)
 {
struct drm_sched_entity *entity;
+   unsigned int i;
int r;
 
r = amdgpu_ctx_get_entity(p->ctx, chunk_ib->ip_type,
@@ -77,17 +78,28 @@ static int amdgpu_cs_p1_ib(struct amdgpu_cs_parser *p,
if (r)
return r;
 
-   /* Abort if there is no run queue associated with this entity.
-* Possibly because of disabled HW IP*/
+   /*
+* Abort if there is no run queue associated with this entity.
+* Possibly because of disabled HW IP.
+*/
if (entity->rq == NULL)
return -EINVAL;
 
-   /* Currently we don't support submitting to multiple entities */
-   if (p->entity && p->entity != entity)
+   /* Check if we can add this IB to some existing job */
+   for (i = 0; i < p->gang_size; ++i) {
+   if (p->entities[i] == entity)
+   goto found;
+   }
+
+   /* If not increase the gang size if possible */
+   if (i == AMDGPU_CS_GANG_SIZE)
return -EINVAL;
 
-   p->entity = entity;
-   ++(*num_ibs);
+   p->entities[i] = entity;
+   p->gang_size = i + 1;
+
+found:
+   ++(num_ibs[i]);
return 0;
 }
 
@@ -161,11 +173,12 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
   union drm_amdgpu_cs *cs)
 {
struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
+   unsigned int num_ibs[AMDGPU_CS_GANG_SIZE] = { };
struct amdgpu_vm *vm = >vm;
uint64_t *chunk_array_user;
uint64_t *chunk_array;
-   unsigned size, num_ibs = 0;
uint32_t uf_offset = 0;
+   unsigned int size;
int ret;
int i;
 
@@ -231,7 +244,7 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
if (size < sizeof(struct drm_amdgpu_cs_chunk_ib))
goto free_partial_kdata;
 
-   ret = amdgpu_cs_p1_ib(p, p->chunks[i].kdata, _ibs);
+   ret = amdgpu_cs_p1_ib(p, p->chunks[i].kdata, num_ibs);
if (ret)
goto free_partial_kdata;
break;
@@ -268,21 +281,28 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
}
}
 
-   ret = amdgpu_job_alloc(p->adev, num_ibs, >job, vm);
-   if (ret)
-   goto free_all_kdata;
+   if (!p->gang_size)
+   return -EINVAL;
 
-   ret = drm_sched_job_init(>job->base, p->entity, >vm);
-   if (ret)
-   goto free_all_kdata;
+   for (i = 0; i < p->gang_size; ++i) {
+   ret = amdgpu_job_alloc(p->adev, num_ibs[i], >jobs[i], vm);
+   if (ret)
+   goto free_all_kdata;
+
+   ret = drm_sched_job_init(>jobs[i]->base, p->entities[i],
+>vm);
+   if (ret)
+   goto free_all_kdata;
+   }
+   p->gang_leader = p->jobs[p->gang_size - 1];
 
-   if (p->ctx->vram_lost_counter != p->job->vram_lost_counter) {
+   if (p->ctx->vram_lost_counter != p->gang_leader->vram_lost_counter) {
ret = -ECANCELED;
goto free_all_kdata;
}
 
if (p->uf_entry.tv.bo)
-   p->job->uf_addr = uf_offset;
+   p->gang_leader->uf_addr = uf_offset;
kvfree(chunk_array);
 
/* Use this opportunity to fill in task info for the vm */
@@ -304,22 +324,18 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
return ret;
 }
 
-static int amdgpu_cs_p2_ib(struct amdgpu_cs_parser *p,
-  struct amdgpu_cs_chunk *chunk,
-  unsigned int *num_ibs,
-   

[PATCH 01/10] drm/sched: move calling drm_sched_entity_select_rq

2022-08-15 Thread Christian König
We already discussed that the call to drm_sched_entity_select_rq() needs
to move to drm_sched_job_arm() to be able to set a new scheduler list
between _init() and _arm(). This was just not applied for some reason.
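
For reference, the ordering this enables on the driver side looks roughly like
the sketch below. The helper and the narrowed scheduler list are assumptions
here; the amdgpu VCN patches later in this series use exactly this pattern.

/* minimal sketch: change the scheduler list between _init() and _arm() */
static int example_submit(struct drm_sched_job *job,
                          struct drm_sched_entity *entity,
                          struct drm_gpu_scheduler **scheds, void *owner)
{
        int r;

        r = drm_sched_job_init(job, entity, owner);
        if (r)
                return r;

        /* legal now: the run queue is only selected in drm_sched_job_arm() */
        drm_sched_entity_modify_sched(entity, scheds, 1);

        drm_sched_job_arm(job);
        return 0;
}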

Signed-off-by: Christian König 
Reviewed-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/scheduler/sched_main.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 68317d3a7a27..e0ab14e0fb6b 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -592,7 +592,6 @@ int drm_sched_job_init(struct drm_sched_job *job,
   struct drm_sched_entity *entity,
   void *owner)
 {
-   drm_sched_entity_select_rq(entity);
if (!entity->rq)
return -ENOENT;
 
@@ -628,7 +627,7 @@ void drm_sched_job_arm(struct drm_sched_job *job)
struct drm_sched_entity *entity = job->entity;
 
BUG_ON(!entity);
-
+   drm_sched_entity_select_rq(entity);
sched = entity->rq->sched;
 
job->sched = sched;
-- 
2.25.1



[PATCH 02/10] drm/amdgpu: revert "partial revert "remove ctx->lock" v2"

2022-08-15 Thread Christian König
This reverts commit 94f4c4965e5513ba624488f4b601d6b385635aec.

We found that the bo_list was missing protection for its list entries.
Since that is fixed now, this workaround can be removed again.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 21 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c |  2 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h |  1 -
 3 files changed, 6 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index d8f1335bc68f..a3b8400c914e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -128,8 +128,6 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser 
*p, union drm_amdgpu_cs
goto free_chunk;
}
 
-   mutex_lock(>ctx->lock);
-
/* skip guilty context job */
if (atomic_read(>ctx->guilty) == 1) {
ret = -ECANCELED;
@@ -708,7 +706,6 @@ static void amdgpu_cs_parser_fini(struct amdgpu_cs_parser 
*parser, int error,
dma_fence_put(parser->fence);
 
if (parser->ctx) {
-   mutex_unlock(>ctx->lock);
amdgpu_ctx_put(parser->ctx);
}
if (parser->bo_list)
@@ -1161,9 +1158,6 @@ static int amdgpu_cs_dependencies(struct amdgpu_device 
*adev,
 {
int i, r;
 
-   /* TODO: Investigate why we still need the context lock */
-   mutex_unlock(>ctx->lock);
-
for (i = 0; i < p->nchunks; ++i) {
struct amdgpu_cs_chunk *chunk;
 
@@ -1174,34 +1168,32 @@ static int amdgpu_cs_dependencies(struct amdgpu_device 
*adev,
case AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES:
r = amdgpu_cs_process_fence_dep(p, chunk);
if (r)
-   goto out;
+   return r;
break;
case AMDGPU_CHUNK_ID_SYNCOBJ_IN:
r = amdgpu_cs_process_syncobj_in_dep(p, chunk);
if (r)
-   goto out;
+   return r;
break;
case AMDGPU_CHUNK_ID_SYNCOBJ_OUT:
r = amdgpu_cs_process_syncobj_out_dep(p, chunk);
if (r)
-   goto out;
+   return r;
break;
case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_WAIT:
r = amdgpu_cs_process_syncobj_timeline_in_dep(p, chunk);
if (r)
-   goto out;
+   return r;
break;
case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_SIGNAL:
r = amdgpu_cs_process_syncobj_timeline_out_dep(p, 
chunk);
if (r)
-   goto out;
+   return r;
break;
}
}
 
-out:
-   mutex_lock(>ctx->lock);
-   return r;
+   return 0;
 }
 
 static void amdgpu_cs_post_dependencies(struct amdgpu_cs_parser *p)
@@ -1363,7 +1355,6 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, 
struct drm_file *filp)
goto out;
 
r = amdgpu_cs_submit(, cs);
-
 out:
amdgpu_cs_parser_fini(, r, reserved_buffers);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 8ee4e8491f39..168337d8d4cf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -315,7 +315,6 @@ static int amdgpu_ctx_init(struct amdgpu_ctx_mgr *mgr, 
int32_t priority,
kref_init(>refcount);
ctx->mgr = mgr;
spin_lock_init(>ring_lock);
-   mutex_init(>lock);
 
ctx->reset_counter = atomic_read(>adev->gpu_reset_counter);
ctx->reset_counter_query = ctx->reset_counter;
@@ -407,7 +406,6 @@ static void amdgpu_ctx_fini(struct kref *ref)
drm_dev_exit(idx);
}
 
-   mutex_destroy(>lock);
kfree(ctx);
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
index cc7c8afff414..0fa0e56daf67 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
@@ -53,7 +53,6 @@ struct amdgpu_ctx {
boolpreamble_presented;
int32_t init_priority;
int32_t override_priority;
-   struct mutexlock;
atomic_tguilty;
unsigned long   ras_counter_ce;
unsigned long   ras_counter_ue;
-- 
2.25.1



Latest gang submit patches

2022-08-15 Thread Christian König
So I think we can push this now. Alex, can you take a look at the
remaining patches which don't have any rb yet?

Thanks,
Christian.




[PATCH 5.18 0079/1095] drm/hyperv-drm: Include framebuffer and EDID headers

2022-08-15 Thread Greg Kroah-Hartman
From: Thomas Zimmermann 

commit 009a3a52791f31c57d755a73f6bc66fbdd8bd76c upstream.

Fix a number of compile errors by including the correct header
files. Examples are shown below.

  ../drivers/gpu/drm/hyperv/hyperv_drm_modeset.c: In function 
'hyperv_blit_to_vram_rect':
  ../drivers/gpu/drm/hyperv/hyperv_drm_modeset.c:25:48: error: invalid use of 
undefined type 'struct drm_framebuffer'
   25 | struct hyperv_drm_device *hv = to_hv(fb->dev);
  |^~

  ../drivers/gpu/drm/hyperv/hyperv_drm_modeset.c: In function 
'hyperv_connector_get_modes':
  ../drivers/gpu/drm/hyperv/hyperv_drm_modeset.c:59:17: error: implicit 
declaration of function 'drm_add_modes_noedid' 
[-Werror=implicit-function-declaration]
   59 | count = drm_add_modes_noedid(connector,
  | ^~~~

  ../drivers/gpu/drm/hyperv/hyperv_drm_modeset.c:62:9: error: implicit 
declaration of function 'drm_set_preferred_mode'; did you mean 
'drm_mm_reserve_node'? [-Werror=implicit-function-declaration]
   62 | drm_set_preferred_mode(connector, hv->preferred_width,
  | ^~

Signed-off-by: Thomas Zimmermann 
Fixes: 76c56a5affeb ("drm/hyperv: Add DRM driver for hyperv synthetic video device")
Fixes: 720cf96d8fec ("drm: Drop drm_framebuffer.h from drm_crtc.h")
Fixes: 255490f9150d ("drm: Drop drm_edid.h from drm_crtc.h")
Cc: Deepak Rawat 
Cc: Thomas Zimmermann 
Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: linux-hyp...@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc:  # v5.14+
Acked-by: Maxime Ripard 
Reviewed-by: Ville Syrjälä 
Link: https://patchwork.freedesktop.org/patch/msgid/20220622083413.12573-1-tzimmerm...@suse.de
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/gpu/drm/hyperv/hyperv_drm_modeset.c |2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/gpu/drm/hyperv/hyperv_drm_modeset.c
+++ b/drivers/gpu/drm/hyperv/hyperv_drm_modeset.c
@@ -7,9 +7,11 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 




Re: (subset) [PATCH v2 0/7] Devm helpers for regulator get and enable

2022-08-15 Thread Laurent Pinchart
Hi Mark,

On Mon, Aug 15, 2022 at 05:33:06PM +0100, Mark Brown wrote:
> On Mon, Aug 15, 2022 at 06:54:45PM +0300, Laurent Pinchart wrote:
> 
> > - With devres, you don't have full control over the order in which
> >   resources will be released, which means that you can't control the
> >   power off sequence, in particular if it needs to be sequenced with
> >   GPIOs and clocks. That's not a concern for all drivers, but this API
> >   will creep in in places where it shouldn't be used, driver authours
> >   should really pay attention to power management and not live with the
> >   false impression that everything will be handled automatically for
> >   them. In the worst cases, an incorrect power off sequence could lead
> >   to hardware damage.
> 
> I basically agree with these concerns which is why I was only happy with
> this API when Matti suggested doing it in a way that meant that the
> callers are unable to access the regulator at runtime, this means that
> if anyone wants to do any kind of management of the power state outside
> of probe and remove they are forced to convert to the full fat APIs.
> The general ordering concern with devm is that the free happens too late
> but for the most part this isn't such a concern with regulators, they
> might have delayed power off anyway due to sharing - it's no worse than
> memory allocation AFAICT.  Given all the other APIs using devm it's
> probably going to end up fixing some bugs.
> 
> For sequencing I'm not convinced it's much worse than the bulk API is
> anyway, and practically speaking I expect most devices that have
> problems here will also need more control over power anyway - it's
> certainly the common case that hardware has pretty basic requirements
> and is fairly tolerant.

I'm not extremely concerned here at the moment, as power should be the
last thing to be turned off, after clocks and reset signals. As clocks
and GPIOs will still be controlled manually in the driver .remove()
function, it means that power will go last, which should be fine.
However, should a devm_clk_get_enable() or similar function be
implemented, we'll run into trouble. Supplying active high input signals
to a device that is not powered can lead to latch-up, which tends to
only manifest after a statistically significant number of occurrences of
the condition, and can slowly damage the hardware over time. This is a
real concern as it will typically not be caught during early
development. I think we would still be better off with requiring drivers
to manually handle powering off the device until we provide a mechanism
that can do so safely in an automated way.
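
To make the ordering concern concrete, here is a minimal sketch assuming a
hypothetical driver with one clock, one reset GPIO and one supply that was
acquired and enabled at probe time through the proposed devm helper:

static int foo_remove(struct platform_device *pdev)
{
        struct foo *foo = platform_get_drvdata(pdev);

        /* The driver sequences the active signals explicitly ... */
        gpiod_set_value_cansleep(foo->reset_gpio, 1);
        clk_disable_unprepare(foo->clk);

        /*
         * ... and only after remove() returns does devres disable the supply,
         * so power really does go away last. A devm_clk_get_enable() style
         * helper would make that final ordering depend on devres registration
         * order instead.
         */
        return 0;
}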

> > - Powering regulators on at probe time and leaving them on is a very bad
> >   practice from a power management point of view, and should really be
> >   discouraged. Adding convenience helpers to make this easy is the wrong
> >   message, we should instead push driver authors to implement proper
> >   runtime PM.
> 
> The stick simply isn't working here as far as I can see.

Do you think there's no way we can get it to work, instead of giving up
and adding an API that goes in the wrong direction ? :-( I'll give a
talk about the dangers of devm_* at the kernel summit, this is something
I can mention to raise awareness of the issue among maintainers,
hopefully leading to improvements through better reviews.

-- 
Regards,

Laurent Pinchart


[RFC PATCH 00/14] QAIC DRM accelerator driver

2022-08-15 Thread Jeffrey Hugo
This patchset introduces a Linux Kernel driver (QAIC - Qualcomm AIC) for the
Qualcomm Cloud AI 100 product (AIC100).

Qualcomm Cloud AI 100 is a PCIe adapter card that hosts a dedicated machine
learning inference accelerator.  Tons of documentation in the first patch of
the series.

The driver was a misc device until recently.  In accordance with the 2021
Ksummit (per LWN), it has been converted to a DRM driver due to the use of
dma_buf.

For historical purposes, the last revision that was on list is:
https://lore.kernel.org/all/1589897645-17088-1-git-send-email-jh...@codeaurora.org/
The driver has evolved quite a bit in the two years since.

Regarding the open userspace, it is currently a work in progress (WIP) but will
be delivered.  The motivation for this RFC series is to get some early feedback
on the driver since Daniel Vetter and David Airlie indicated that would be a
good idea while the userspace is being worked on.

We are a bit new to the DRM area, and appreciate all guidance/feedback.

Questions we are hoping to get an answer to:

1. Does Qualcomm Cloud AI 100 fit in DRM?

2. Would a "QAIC" directory in the GPU documentation be acceptable?
   We'd like to split up the documentation into multiple files as we feel that
   would make it more organized.  It looks like only AMD has a directory,
   everyone else has a single file.

Things that are still a todo (in no particular order):

-Open userspace (see above)

-Figure out what to do with the device partitioning feature.  The uAPI for it
 is clunky.  Seems like perhaps it should fall under a cgroup.  The intent is
 to start a discussion over in the cgroup area to see what the experts say.

-Add proper documentation for our sysfs additions

-Extend the driver to export a few of the MHI channels to userspace.  We are
 currently using an old driver which was proposed and rejected.  Need to
 refactor and make something QAIC specific.

-Convert the documentation (patch 1) to proper rst syntax

Jeffrey Hugo (14):
  drm/qaic: Add documentation for AIC100 accelerator driver
  drm/qaic: Add uapi and core driver file
  drm/qaic: Add qaic.h internal header
  drm/qaic: Add MHI controller
  drm/qaic: Add control path
  drm/qaic: Add datapath
  drm/qaic: Add debugfs
  drm/qaic: Add RAS component
  drm/qaic: Add ssr component
  drm/qaic: Add sysfs
  drm/qaic: Add telemetry
  drm/qaic: Add tracepoints
  drm/qaic: Add qaic driver to the build system
  MAINTAINERS: Add entry for QAIC driver

 Documentation/gpu/drivers.rst |1 +
 Documentation/gpu/qaic.rst|  567 +
 MAINTAINERS   |7 +
 drivers/gpu/drm/Kconfig   |2 +
 drivers/gpu/drm/Makefile  |1 +
 drivers/gpu/drm/qaic/Kconfig  |   33 +
 drivers/gpu/drm/qaic/Makefile |   17 +
 drivers/gpu/drm/qaic/mhi_controller.c |  575 +
 drivers/gpu/drm/qaic/mhi_controller.h |   18 +
 drivers/gpu/drm/qaic/qaic.h   |  396 ++
 drivers/gpu/drm/qaic/qaic_control.c   | 1788 +++
 drivers/gpu/drm/qaic/qaic_data.c  | 2152 +
 drivers/gpu/drm/qaic/qaic_debugfs.c   |  335 +
 drivers/gpu/drm/qaic/qaic_debugfs.h   |   33 +
 drivers/gpu/drm/qaic/qaic_drv.c   |  825 +
 drivers/gpu/drm/qaic/qaic_ras.c   |  653 ++
 drivers/gpu/drm/qaic/qaic_ras.h   |   11 +
 drivers/gpu/drm/qaic/qaic_ssr.c   |  889 ++
 drivers/gpu/drm/qaic/qaic_ssr.h   |   13 +
 drivers/gpu/drm/qaic/qaic_sysfs.c |  113 ++
 drivers/gpu/drm/qaic/qaic_telemetry.c |  851 +
 drivers/gpu/drm/qaic/qaic_telemetry.h |   14 +
 drivers/gpu/drm/qaic/qaic_trace.h |  493 
 include/uapi/drm/qaic_drm.h   |  283 +
 24 files changed, 10070 insertions(+)
 create mode 100644 Documentation/gpu/qaic.rst
 create mode 100644 drivers/gpu/drm/qaic/Kconfig
 create mode 100644 drivers/gpu/drm/qaic/Makefile
 create mode 100644 drivers/gpu/drm/qaic/mhi_controller.c
 create mode 100644 drivers/gpu/drm/qaic/mhi_controller.h
 create mode 100644 drivers/gpu/drm/qaic/qaic.h
 create mode 100644 drivers/gpu/drm/qaic/qaic_control.c
 create mode 100644 drivers/gpu/drm/qaic/qaic_data.c
 create mode 100644 drivers/gpu/drm/qaic/qaic_debugfs.c
 create mode 100644 drivers/gpu/drm/qaic/qaic_debugfs.h
 create mode 100644 drivers/gpu/drm/qaic/qaic_drv.c
 create mode 100644 drivers/gpu/drm/qaic/qaic_ras.c
 create mode 100644 drivers/gpu/drm/qaic/qaic_ras.h
 create mode 100644 drivers/gpu/drm/qaic/qaic_ssr.c
 create mode 100644 drivers/gpu/drm/qaic/qaic_ssr.h
 create mode 100644 drivers/gpu/drm/qaic/qaic_sysfs.c
 create mode 100644 drivers/gpu/drm/qaic/qaic_telemetry.c
 create mode 100644 drivers/gpu/drm/qaic/qaic_telemetry.h
 create mode 100644 drivers/gpu/drm/qaic/qaic_trace.h
 create mode 100644 include/uapi/drm/qaic_drm.h

-- 
2.7.4



[RFC PATCH 11/14] drm/qaic: Add telemetry

2022-08-15 Thread Jeffrey Hugo
A QAIC device has a number of attributes like thermal limits which can be
read and, in some cases, controlled from the host.  Expose these attributes
via hwmon.  Use the pre-defined interface where possible, but define
custom interfaces where it is not possible.
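
Since the attributes come out through the standard hwmon sysfs interface, a
consumer needs nothing QAIC specific. A minimal userspace sketch, assuming the
card registers as hwmon0 and exposes the SoC temperature as temp1_input (both
names are assumptions):

#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/sys/class/hwmon/hwmon0/temp1_input", "r");
        long mdeg;

        if (!f)
                return 1;
        if (fscanf(f, "%ld", &mdeg) != 1) {
                fclose(f);
                return 1;
        }
        /* hwmon reports temperatures in millidegrees Celsius */
        printf("SoC temperature: %.3f degC\n", mdeg / 1000.0);
        fclose(f);
        return 0;
}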

Change-Id: I3b559baed4016e27457658c9286f4c529f95dbbb
Signed-off-by: Jeffrey Hugo 
---
 drivers/gpu/drm/qaic/qaic_telemetry.c | 851 ++
 drivers/gpu/drm/qaic/qaic_telemetry.h |  14 +
 2 files changed, 865 insertions(+)
 create mode 100644 drivers/gpu/drm/qaic/qaic_telemetry.c
 create mode 100644 drivers/gpu/drm/qaic/qaic_telemetry.h

diff --git a/drivers/gpu/drm/qaic/qaic_telemetry.c 
b/drivers/gpu/drm/qaic/qaic_telemetry.c
new file mode 100644
index 000..44950d1
--- /dev/null
+++ b/drivers/gpu/drm/qaic/qaic_telemetry.c
@@ -0,0 +1,851 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Copyright (c) 2020-2021, The Linux Foundation. All rights reserved. */
+/* Copyright (c) 2021-2022 Qualcomm Innovation Center, Inc. All rights 
reserved. */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "qaic.h"
+#include "qaic_telemetry.h"
+
+#if defined(CONFIG_QAIC_HWMON)
+
+#define MAGIC  0x55AA
+#define VERSION0x1
+#define RESP_TIMEOUT   (1 * HZ)
+
+enum cmds {
+   CMD_THERMAL_SOC_TEMP,
+   CMD_THERMAL_SOC_MAX_TEMP,
+   CMD_THERMAL_BOARD_TEMP,
+   CMD_THERMAL_BOARD_MAX_TEMP,
+   CMD_THERMAL_DDR_TEMP,
+   CMD_THERMAL_WARNING_TEMP,
+   CMD_THERMAL_SHUTDOWN_TEMP,
+   CMD_CURRENT_TDP,
+   CMD_BOARD_POWER,
+   CMD_POWER_STATE,
+   CMD_POWER_MAX,
+   CMD_THROTTLE_PERCENT,
+   CMD_THROTTLE_TIME,
+   CMD_UPTIME,
+   CMD_THERMAL_SOC_FLOOR_TEMP,
+   CMD_THERMAL_SOC_CEILING_TEMP,
+};
+
+enum cmd_type {
+   TYPE_READ,  /* read value from device */
+   TYPE_WRITE, /* write value to device */
+};
+
+enum msg_type {
+   MSG_PUSH, /* async push from device */
+   MSG_REQ,  /* sync request to device */
+   MSG_RESP, /* sync response from device */
+};
+
+struct telemetry_data {
+   u8  cmd;
+   u8  cmd_type;
+   u8  status;
+   __le64  val; /*signed*/
+} __packed;
+
+struct telemetry_header {
+   __le16  magic;
+   __le16  ver;
+   __le32  seq_num;
+   u8  type;
+   u8  id;
+   __le16  len;
+} __packed;
+
+struct telemetry_msg { /* little endian encoded */
+   struct telemetry_header hdr;
+   struct telemetry_data data;
+} __packed;
+
+struct wrapper_msg {
+   struct kref ref_count;
+   struct telemetry_msg msg;
+};
+
+struct xfer_queue_elem {
+   /*
+* Node in list of ongoing transfer request on telemetry channel.
+* Maintained by root device struct
+*/
+   struct list_head list;
+   /* Sequence number of this transfer request */
+   u32 seq_num;
+   /* This is used to wait on until completion of transfer request */
+   struct completion xfer_done;
+   /* Received data from device */
+   void *buf;
+};
+
+struct resp_work {
+   /* Work struct to schedule work coming on QAIC_TELEMETRY channel */
+   struct work_struct work;
+   /* Root struct of device, used to access device resources */
+   struct qaic_device *qdev;
+   /* Buffer used by MHI for transfer requests */
+   void *buf;
+};
+
+static void free_wrapper(struct kref *ref)
+{
+   struct wrapper_msg *wrapper = container_of(ref, struct wrapper_msg,
+  ref_count);
+
+   kfree(wrapper);
+}
+
+static int telemetry_request(struct qaic_device *qdev, u8 cmd, u8 cmd_type,
+s64 *val)
+{
+   struct wrapper_msg *wrapper;
+   struct xfer_queue_elem elem;
+   struct telemetry_msg *resp;
+   struct telemetry_msg *req;
+   long ret = 0;
+
+   wrapper = kzalloc(sizeof(*wrapper), GFP_KERNEL);
+   if (!wrapper)
+   return -ENOMEM;
+
+   kref_init(>ref_count);
+   req = >msg;
+
+   ret = mutex_lock_interruptible(>tele_mutex);
+   if (ret)
+   goto free_req;
+
+   req->hdr.magic = cpu_to_le16(MAGIC);
+   req->hdr.ver = cpu_to_le16(VERSION);
+   req->hdr.seq_num = cpu_to_le32(qdev->tele_next_seq_num++);
+   req->hdr.type = MSG_REQ;
+   req->hdr.id = 0;
+   req->hdr.len = cpu_to_le16(sizeof(req->data));
+
+   req->data.cmd = cmd;
+   req->data.cmd_type = cmd_type;
+   req->data.status = 0;
+   if (cmd_type == TYPE_READ)
+   req->data.val = cpu_to_le64(0);
+   else
+   req->data.val = cpu_to_le64(*val);
+
+   elem.seq_num = qdev->tele_next_seq_num - 1;
+   elem.buf = NULL;
+   init_completion(_done);
+   if (likely(!qdev->tele_lost_buf)) {
+   resp = kmalloc(sizeof(*resp), GFP_KERNEL);
+   if (!resp) {
+

[RFC PATCH 07/14] drm/qaic: Add debugfs

2022-08-15 Thread Jeffrey Hugo
Add debugfs entries that dump information about the dma_bridge fifo state
and also the SBL boot log.

Change-Id: Ib46b84c07c25afcf0ac2c73304cf6275689d002e
Signed-off-by: Jeffrey Hugo 
---
 drivers/gpu/drm/qaic/qaic_debugfs.c | 335 
 drivers/gpu/drm/qaic/qaic_debugfs.h |  33 
 2 files changed, 368 insertions(+)
 create mode 100644 drivers/gpu/drm/qaic/qaic_debugfs.c
 create mode 100644 drivers/gpu/drm/qaic/qaic_debugfs.h

diff --git a/drivers/gpu/drm/qaic/qaic_debugfs.c 
b/drivers/gpu/drm/qaic/qaic_debugfs.c
new file mode 100644
index 000..82478e3
--- /dev/null
+++ b/drivers/gpu/drm/qaic/qaic_debugfs.c
@@ -0,0 +1,335 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Copyright (c) 2020, The Linux Foundation. All rights reserved. */
+/* Copyright (c) 2021-2022 Qualcomm Innovation Center, Inc. All rights 
reserved. */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "qaic.h"
+#include "qaic_debugfs.h"
+
+#define BOOTLOG_POOL_SIZE 16
+#define BOOTLOG_MSG_SIZE  512
+
+struct bootlog_msg {
+   /* Buffer for bootlog messages */
+   char str[BOOTLOG_MSG_SIZE];
+   /* Root struct of device, used to access device resources */
+   struct qaic_device *qdev;
+   /* Work struct to schedule work coming on QAIC_LOGGING channel */
+   struct work_struct work;
+};
+
+struct bootlog_page {
+   /* Node in list of bootlog pages maintained by root device struct */
+   struct list_head node;
+   /* Total size of the buffer that holds the bootlogs. It is PAGE_SIZE */
+   unsigned int size;
+   /* Offset for the next bootlog */
+   unsigned int offset;
+};
+
+static int bootlog_show(struct seq_file *s, void *data)
+{
+   struct qaic_device *qdev = s->private;
+   struct bootlog_page *page;
+   void *log;
+   void *page_end;
+
+   mutex_lock(>bootlog_mutex);
+   list_for_each_entry(page, >bootlog, node) {
+   log = page + 1;
+   page_end = (void *)page + page->offset;
+   while (log < page_end) {
+   seq_printf(s, "%s", (char *)log);
+   log += strlen(log) + 1;
+   }
+   }
+   mutex_unlock(>bootlog_mutex);
+
+   return 0;
+}
+
+static int bootlog_open(struct inode *inode, struct file *file)
+{
+   struct qaic_device *qdev = inode->i_private;
+
+   return single_open(file, bootlog_show, qdev);
+}
+
+static const struct file_operations bootlog_fops = {
+   .owner = THIS_MODULE,
+   .open = bootlog_open,
+   .read = seq_read,
+   .llseek = seq_lseek,
+   .release = single_release,
+};
+
+static int read_dbc_fifo_size(void *data, u64 *value)
+{
+   struct dma_bridge_chan *dbc = (struct dma_bridge_chan *)data;
+
+   *value = dbc->nelem;
+   return 0;
+}
+
+static int read_dbc_queued(void *data, u64 *value)
+{
+   struct dma_bridge_chan *dbc = (struct dma_bridge_chan *)data;
+   u32 tail, head;
+
+   qaic_data_get_fifo_info(dbc, , );
+
+   if (head == U32_MAX || tail == U32_MAX)
+   *value = 0;
+   else if (head > tail)
+   *value = dbc->nelem - head + tail;
+   else
+   *value = tail - head;
+
+   return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(dbc_fifo_size_fops, read_dbc_fifo_size, NULL, 
"%llu\n");
+DEFINE_SIMPLE_ATTRIBUTE(dbc_queued_fops, read_dbc_queued, NULL, "%llu\n");
+
+static void qaic_debugfs_add_dbc_entry(struct qaic_device *qdev, uint16_t 
dbc_id,
+  struct dentry *parent)
+{
+   struct dma_bridge_chan *dbc = >dbc[dbc_id];
+   char name[16];
+
+   snprintf(name, 16, "%s%03u", QAIC_DEBUGFS_DBC_PREFIX, dbc_id);
+
+   dbc->debugfs_root = debugfs_create_dir(name, parent);
+
+   debugfs_create_file(QAIC_DEBUGFS_DBC_FIFO_SIZE, 0444, dbc->debugfs_root,
+   dbc, _fifo_size_fops);
+
+   debugfs_create_file(QAIC_DEBUGFS_DBC_QUEUED, 0444, dbc->debugfs_root,
+   dbc, _queued_fops);
+}
+
+void qaic_debugfs_init(struct drm_minor *minor)
+{
+   struct qaic_drm_device *qddev = minor->dev->dev_private;
+   struct qaic_device *qdev = qddev->qdev;
+   int i;
+
+   for (i = 0; i < qdev->num_dbc; ++i)
+   qaic_debugfs_add_dbc_entry(qdev, i, minor->debugfs_root);
+
+   debugfs_create_file("bootlog", 0444, minor->debugfs_root, qdev,
+   _fops);
+}
+
+static struct bootlog_page *alloc_bootlog_page(struct qaic_device *qdev)
+{
+   struct bootlog_page *page;
+
+   page = (struct bootlog_page *)__get_free_page(GFP_KERNEL);
+   if (!page)
+   return page;
+
+   page->size = PAGE_SIZE;
+   page->offset = sizeof(*page);
+   list_add_tail(>node, >bootlog);
+
+   return page;
+}
+
+static int reset_bootlog(struct qaic_device *qdev)
+{
+   struct bootlog_page 

[RFC PATCH 05/14] drm/qaic: Add control path

2022-08-15 Thread Jeffrey Hugo
Add the control path component that talks to the management processor to
load workloads onto the qaic device.  This implements the driver portion
of the NNC protocol.

Change-Id: Ic9c0be41a91532843b78e49b32cf1fcf39faeb9f
Signed-off-by: Jeffrey Hugo 
---
 drivers/gpu/drm/qaic/qaic_control.c | 1788 +++
 1 file changed, 1788 insertions(+)
 create mode 100644 drivers/gpu/drm/qaic/qaic_control.c

diff --git a/drivers/gpu/drm/qaic/qaic_control.c 
b/drivers/gpu/drm/qaic/qaic_control.c
new file mode 100644
index 000..9a8a6b6
--- /dev/null
+++ b/drivers/gpu/drm/qaic/qaic_control.c
@@ -0,0 +1,1788 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Copyright (c) 2019-2021, The Linux Foundation. All rights reserved. */
+/* Copyright (c) 2021-2022 Qualcomm Innovation Center, Inc. All rights 
reserved. */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "qaic.h"
+#include "qaic_trace.h"
+
+#define MANAGE_MAGIC_NUMBER ((__force __le32)0x43494151) /* "QAIC" in 
little endian */
+#define QAIC_DBC_Q_GAP0x100
+#define QAIC_DBC_Q_BUF_ALIGN  0x1000
+#define QAIC_MANAGE_EXT_MSG_LENGTH SZ_64K /* Max DMA message length */
+#define QAIC_WRAPPER_MAX_SIZE  SZ_4K
+#define QAIC_MHI_RETRY_WAIT_MS100
+#define QAIC_MHI_RETRY_MAX20
+
+static unsigned int control_resp_timeout = 60; /* 60 sec default */
+module_param(control_resp_timeout, uint, 0600);
+
+struct manage_msg {
+   u32 len;
+   u32 count;
+   u8 data[];
+};
+
+/*
+ * wire encoding structures for the manage protocol.
+ * All fields are little endian on the wire
+ */
+struct _msg_hdr {
+   __le32 crc32; /* crc of everything following this field in the message 
*/
+   __le32 magic_number;
+   __le32 sequence_number;
+   __le32 len; /* length of this message */
+   __le32 count; /* number of transactions in this message */
+   __le32 handle; /* unique id to track the resources consumed */
+   __le32 partition_id; /* partition id for the request (signed)*/
+   __le32 padding; /* must be 0 */
+} __packed;
+
+struct _msg {
+   struct _msg_hdr hdr;
+   u8 data[];
+} __packed;
+
+struct _trans_hdr {
+   __le32 type;
+   __le32 len;
+} __packed;
+
+/* Each message sent from driver to device are organized in a list of 
wrapper_msg */
+struct wrapper_msg {
+   struct list_head list;
+   struct kref ref_count;
+   u32 len; /* length of data to transfer */
+   struct wrapper_list *head;
+   union {
+   struct _msg msg;
+   struct _trans_hdr trans;
+   };
+};
+
+struct wrapper_list {
+   struct list_head list;
+   spinlock_t lock;
+};
+
+struct _trans_passthrough {
+   struct _trans_hdr hdr;
+   u8 data[];
+} __packed;
+
+struct _addr_size_pair {
+   __le64 addr;
+   __le64 size;
+} __packed;
+
+struct _trans_dma_xfer {
+   struct _trans_hdr hdr;
+   __le32 tag;
+   __le32 count;
+   __le32 dma_chunk_id;
+   __le32 padding;
+   struct _addr_size_pair data[];
+} __packed;
+
+/* Initiated by device to continue the DMA xfer of a large piece of data */
+struct _trans_dma_xfer_cont {
+   struct _trans_hdr hdr;
+   __le32 dma_chunk_id;
+   __le32 padding;
+   __le64 xferred_size;
+} __packed;
+
+struct _trans_activate_to_dev {
+   struct _trans_hdr hdr;
+   __le64 req_q_addr;
+   __le64 rsp_q_addr;
+   __le32 req_q_size;
+   __le32 rsp_q_size;
+   __le32 buf_len;
+   __le32 options; /* unused, but BIT(16) has meaning to the device */
+} __packed;
+
+struct _trans_activate_from_dev {
+   struct _trans_hdr hdr;
+   __le32 status;
+   __le32 dbc_id;
+   __le64 options; /* unused */
+} __packed;
+
+struct _trans_deactivate_from_dev {
+   struct _trans_hdr hdr;
+   __le32 status;
+   __le32 dbc_id;
+} __packed;
+
+struct _trans_terminate_to_dev {
+   struct _trans_hdr hdr;
+   __le32 handle;
+   __le32 padding;
+} __packed;
+
+struct _trans_terminate_from_dev {
+   struct _trans_hdr hdr;
+   __le32 status;
+   __le32 padding;
+} __packed;
+
+struct _trans_status_to_dev {
+   struct _trans_hdr hdr;
+} __packed;
+
+struct _trans_status_from_dev {
+   struct _trans_hdr hdr;
+   __le16 major;
+   __le16 minor;
+   __le32 status;
+   __le64 status_flags;
+} __packed;
+
+struct _trans_validate_part_to_dev {
+   struct _trans_hdr hdr;
+   __le32 part_id;
+   __le32 padding;
+} __packed;
+
+struct _trans_validate_part_from_dev {
+   struct _trans_hdr hdr;
+   __le32 status;
+   __le32 padding;
+} __packed;
+
+struct xfer_queue_elem {
+   /*
+* Node in list of ongoing transfer request on control channel.
+* Maintained by root device struct
+   

[RFC PATCH 14/14] MAINTAINERS: Add entry for QAIC driver

2022-08-15 Thread Jeffrey Hugo
Add MAINTAINERS entry for the Qualcomm Cloud AI 100 driver.

Change-Id: I149dbe34f1dbaeeca449b4ebf97f274c7484ed27
Signed-off-by: Jeffrey Hugo 
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index cd0f68d..695654c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15962,6 +15962,13 @@ F: Documentation/devicetree/bindings/clock/qcom,*
 F: drivers/clk/qcom/
 F: include/dt-bindings/clock/qcom,*
 
+QUALCOMM CLOUD AI (QAIC) DRIVER
+M: Jeffrey Hugo 
+L: linux-arm-...@vger.kernel.org
+S: Supported
+F: drivers/gpu/drm/qaic/
+F: include/uapi/drm/qaic_drm.h
+
 QUALCOMM CORE POWER REDUCTION (CPR) AVS DRIVER
 M: Niklas Cassel 
 L: linux...@vger.kernel.org
-- 
2.7.4



[RFC PATCH 12/14] drm/qaic: Add tracepoints

2022-08-15 Thread Jeffrey Hugo
Add QAIC specific tracepoints which can be useful in debugging issues.

Change-Id: I8cde015990d5a3482dbba142cf0a4bbb4512cb02
Signed-off-by: Jeffrey Hugo 
---
 drivers/gpu/drm/qaic/qaic_trace.h | 493 ++
 1 file changed, 493 insertions(+)
 create mode 100644 drivers/gpu/drm/qaic/qaic_trace.h

diff --git a/drivers/gpu/drm/qaic/qaic_trace.h 
b/drivers/gpu/drm/qaic/qaic_trace.h
new file mode 100644
index 000..0be824eb
--- /dev/null
+++ b/drivers/gpu/drm/qaic/qaic_trace.h
@@ -0,0 +1,493 @@
+/* SPDX-License-Identifier: GPL-2.0-only
+ *
+ * Copyright (c) 2020-2021, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2022 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#if !defined(_TRACE_QAIC_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_QAIC_H
+#include 
+#include 
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM qaic
+#define TRACE_INCLUDE_FILE qaic_trace
+#define TRACE_INCLUDE_PATH ../../drivers/gpu/drm/qaic
+
+TRACE_EVENT(qaic_ioctl,
+   TP_PROTO(struct qaic_device *qdev, struct qaic_user *usr,
+unsigned int cmd, bool in),
+   TP_ARGS(qdev, usr, cmd, in),
+   TP_STRUCT__entry(
+   __string(device, dev_name(>pdev->dev))
+   __field(unsigned int, user)
+   __field(unsigned int, cmd)
+   __field(unsigned int, type)
+   __field(unsigned int, nr)
+   __field(unsigned int, size)
+   __field(unsigned int, dir)
+   __field(bool, in)
+   ),
+   TP_fast_assign(
+   __assign_str(device, dev_name(>pdev->dev));
+   __entry->user = usr->handle;
+   __entry->cmd =  cmd;
+   __entry->type = _IOC_TYPE(cmd);
+   __entry->nr =   _IOC_NR(cmd);
+   __entry->size = _IOC_SIZE(cmd);
+   __entry->dir =  _IOC_DIR(cmd);
+   __entry->in =   in;
+   ),
+   TP_printk("%s:%s user:%d cmd:0x%x (%c nr=%d len=%d dir=%d)",
+   __entry->in ? "Entry" : "Exit", __get_str(device),
+   __entry->user, __entry->cmd, __entry->type, __entry->nr,
+   __entry->size, __entry->dir)
+);
+
+TRACE_EVENT(qaic_mhi_queue_error,
+   TP_PROTO(struct qaic_device *qdev, const char *msg, int ret),
+   TP_ARGS(qdev, msg, ret),
+   TP_STRUCT__entry(
+   __string(device, dev_name(>pdev->dev))
+   __string(msg, msg)
+   __field(int, ret)
+   ),
+   TP_fast_assign(
+   __assign_str(device, dev_name(>pdev->dev));
+   __assign_str(msg, msg);
+   __entry->ret = ret;
+   ),
+   TP_printk("%s %s %d",
+   __get_str(device), __get_str(msg), __entry->ret)
+);
+
+DECLARE_EVENT_CLASS(qaic_manage_error,
+   TP_PROTO(struct qaic_device *qdev, struct qaic_user *usr,
+const char *msg),
+   TP_ARGS(qdev, usr, msg),
+   TP_STRUCT__entry(
+   __string(device, dev_name(>pdev->dev))
+   __field(unsigned int, user)
+   __string(msg, msg)
+   ),
+   TP_fast_assign(
+   __assign_str(device, dev_name(>pdev->dev));
+   __entry->user = usr->handle;
+   __assign_str(msg, msg);
+   ),
+   TP_printk("%s user:%d %s",
+ __get_str(device), __entry->user, __get_str(msg))
+);
+
+DEFINE_EVENT(qaic_manage_error, manage_error,
+   TP_PROTO(struct qaic_device *qdev, struct qaic_user *usr,
+const char *msg),
+   TP_ARGS(qdev, usr, msg)
+);
+
+DECLARE_EVENT_CLASS(qaic_encdec_error,
+   TP_PROTO(struct qaic_device *qdev, const char *msg),
+   TP_ARGS(qdev, msg),
+   TP_STRUCT__entry(
+   __string(device, dev_name(>pdev->dev))
+   __string(msg, msg)
+   ),
+   TP_fast_assign(
+   __assign_str(device, dev_name(>pdev->dev));
+   __assign_str(msg, msg);
+   ),
+   TP_printk("%s %s", __get_str(device), __get_str(msg))
+);
+
+DEFINE_EVENT(qaic_encdec_error, encode_error,
+   TP_PROTO(struct qaic_device *qdev, const char *msg),
+   TP_ARGS(qdev, msg)
+);
+
+DEFINE_EVENT(qaic_encdec_error, decode_error,
+   TP_PROTO(struct qaic_device *qdev, const char *msg),
+   TP_ARGS(qdev, msg)
+);
+
+TRACE_EVENT(qaic_control_dbg,
+   TP_PROTO(struct qaic_device *qdev, const char *msg, int ret),
+   TP_ARGS(qdev, msg, ret),
+   TP_STRUCT__entry(
+   __string(device, dev_name(>pdev->dev))
+   __string(msg, msg)
+   __field(int, ret)
+   ),
+   TP_fast_assign(
+   __assign_str(device, dev_name(>pdev->dev));
+   __assign_str(msg, msg);
+   __entry->ret = ret;
+   ),
+   TP_printk("%s %s %d",
+ __get_str(device), __get_str(msg), __entry->ret)
+);
+
+TRACE_EVENT(qaic_encode_passthrough,
+   TP_PROTO(struct qaic_device *qdev,
+

[RFC PATCH 09/14] drm/qaic: Add ssr component

2022-08-15 Thread Jeffrey Hugo
A QAIC device supports the concept of subsystem restart (ssr).  If a
processing unit for a workload crashes, it is possible to reset that unit
instead of crashing the device.  Since such an error is likely related to
the workload code that was running, it is possible to collect a crashdump
of the workload for offline analysis.

Change-Id: I77aa21ecbf0f730d8736a7465285ce5290ed3745
Signed-off-by: Jeffrey Hugo 
---
 drivers/gpu/drm/qaic/qaic_ssr.c | 889 
 drivers/gpu/drm/qaic/qaic_ssr.h |  13 +
 2 files changed, 902 insertions(+)
 create mode 100644 drivers/gpu/drm/qaic/qaic_ssr.c
 create mode 100644 drivers/gpu/drm/qaic/qaic_ssr.h

diff --git a/drivers/gpu/drm/qaic/qaic_ssr.c b/drivers/gpu/drm/qaic/qaic_ssr.c
new file mode 100644
index 000..826361b
--- /dev/null
+++ b/drivers/gpu/drm/qaic/qaic_ssr.c
@@ -0,0 +1,889 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Copyright (c) 2020-2021, The Linux Foundation. All rights reserved. */
+/* Copyright (c) 2021-2022 Qualcomm Innovation Center, Inc. All rights 
reserved. */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "qaic.h"
+#include "qaic_ssr.h"
+#include "qaic_trace.h"
+
+#define MSG_BUF_SZ 32
+#define MAX_PAGE_DUMP_RESP 4 /* It should always be in powers of 2 */
+
+enum ssr_cmds {
+   DEBUG_TRANSFER_INFO =   BIT(0),
+   DEBUG_TRANSFER_INFO_RSP =   BIT(1),
+   MEMORY_READ =   BIT(2),
+   MEMORY_READ_RSP =   BIT(3),
+   DEBUG_TRANSFER_DONE =   BIT(4),
+   DEBUG_TRANSFER_DONE_RSP =   BIT(5),
+   SSR_EVENT = BIT(8),
+   SSR_EVENT_RSP = BIT(9),
+};
+
+enum ssr_events {
+   SSR_EVENT_NACK =BIT(0),
+   BEFORE_SHUTDOWN =   BIT(1),
+   AFTER_SHUTDOWN =BIT(2),
+   BEFORE_POWER_UP =   BIT(3),
+   AFTER_POWER_UP =BIT(4),
+};
+
+struct debug_info_table {
+   /* Save preferences. Default is mandatory */
+   u64 save_perf;
+   /* Base address of the debug region */
+   u64 mem_base;
+   /* Size of debug region in bytes */
+   u64 len;
+   /* Description */
+   char desc[20];
+   /* Filename of debug region */
+   char filename[20];
+};
+
+struct _ssr_hdr {
+   __le32 cmd;
+   __le32 len;
+   __le32 dbc_id;
+};
+
+struct ssr_hdr {
+   u32 cmd;
+   u32 len;
+   u32 dbc_id;
+};
+
+struct ssr_debug_transfer_info {
+   struct ssr_hdr hdr;
+   u32 resv;
+   u64 tbl_addr;
+   u64 tbl_len;
+} __packed;
+
+struct ssr_debug_transfer_info_rsp {
+   struct _ssr_hdr hdr;
+   __le32 ret;
+} __packed;
+
+struct ssr_memory_read {
+   struct _ssr_hdr hdr;
+   __le32 resv;
+   __le64 addr;
+   __le64 len;
+} __packed;
+
+struct ssr_memory_read_rsp {
+   struct _ssr_hdr hdr;
+   __le32 resv;
+   u8 data[];
+} __packed;
+
+struct ssr_debug_transfer_done {
+   struct _ssr_hdr hdr;
+   __le32 resv;
+} __packed;
+
+struct ssr_debug_transfer_done_rsp {
+   struct _ssr_hdr hdr;
+   __le32 ret;
+} __packed;
+
+struct ssr_event {
+   struct ssr_hdr hdr;
+   u32 event;
+} __packed;
+
+struct ssr_event_rsp {
+   struct _ssr_hdr hdr;
+   __le32 event;
+} __packed;
+
+struct ssr_resp {
+   /* Work struct to schedule work coming on QAIC_SSR channel */
+   struct work_struct work;
+   /* Root struct of device, used to access device resources */
+   struct qaic_device *qdev;
+   /* Buffer used by MHI for transfer requests */
+   u8 data[] __aligned(8);
+};
+
+/* SSR crashdump book keeping structure */
+struct ssr_dump_info {
+   /* DBC associated with this SSR crashdump */
+   struct dma_bridge_chan *dbc;
+   /*
+* It will be used when we complete the crashdump download and switch
+* to waiting on SSR events
+*/
+   struct ssr_resp *resp;
+   /* We use this buffer to queue Crashdump downloading requests */
+   struct ssr_resp *dump_resp;
+   /* TRUE: dump_resp is queued for MHI transaction. FALSE: Otherwise */
+   bool dump_resp_queued;
+   /* TRUE: mem_rd_buf is queued for MHI transaction. FALSE: Otherwise */
+   bool mem_rd_buf_queued;
+   /* MEMORY READ request MHI buffer.*/
+   struct ssr_memory_read *mem_rd_buf;
+   /* Address of table in host */
+   void *tbl_addr;
+   /* Ptr to the entire dump */
+   void *dump_addr;
+   /* Address of table in device/target */
+   u64 tbl_addr_dev;
+   /* Total size of table */
+   u64 tbl_len;
+   /* Entire crashdump size */
+   u64 dump_sz;
+   /* Size of the buffer queued in for MHI transfer */
+   u64 resp_buf_sz;
+   /*
+* Crashdump will be collected chunk by chunk and this is max size of
+* one chunk
+*/
+   u64 chunk_sz;
+   /* Offset of table(tbl_addr) where the new chunk will be dumped */

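For illustration, a minimal sketch (not part of the patch) of how a host-side
handler might dispatch on the SSR command header defined above. The helper name
and the way the buffer arrives are assumptions; only the header layout and the
ssr_cmds values come from the patch.

/* Hypothetical sketch only -- not from the patch above. */
static void ssr_dispatch(struct qaic_device *qdev, void *buf, size_t len)
{
	struct _ssr_hdr *hdr = buf;	/* wire header, __le32 fields */

	if (len < sizeof(*hdr) || le32_to_cpu(hdr->len) > len)
		return;			/* malformed message, drop it */

	switch (le32_to_cpu(hdr->cmd)) {
	case DEBUG_TRANSFER_INFO:
		/* device offers a crashdump table for this dbc_id */
		break;
	case MEMORY_READ_RSP:
		/* next chunk of the crashdump has arrived */
		break;
	case SSR_EVENT:
		/* BEFORE_SHUTDOWN / AFTER_POWER_UP style notification */
		break;
	default:
		break;
	}
}
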
[RFC PATCH 13/14] drm/qaic: Add qaic driver to the build system

2022-08-15 Thread Jeffrey Hugo
Add the infrastructure that allows the QAIC driver to be built.

Change-Id: I5b609b2e91b6a99939bdac35849813263ad874af
Signed-off-by: Jeffrey Hugo 
---
 drivers/gpu/drm/Kconfig   |  2 ++
 drivers/gpu/drm/Makefile  |  1 +
 drivers/gpu/drm/qaic/Kconfig  | 33 +
 drivers/gpu/drm/qaic/Makefile | 17 +
 4 files changed, 53 insertions(+)
 create mode 100644 drivers/gpu/drm/qaic/Kconfig
 create mode 100644 drivers/gpu/drm/qaic/Makefile

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index b1f22e4..b614940 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -390,6 +390,8 @@ source "drivers/gpu/drm/gud/Kconfig"
 
 source "drivers/gpu/drm/sprd/Kconfig"
 
+source "drivers/gpu/drm/qaic/Kconfig"
+
 config DRM_HYPERV
tristate "DRM Support for Hyper-V synthetic video device"
depends on DRM && PCI && MMU && HYPERV
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 301a44d..28b0f1b 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -135,3 +135,4 @@ obj-y   += xlnx/
 obj-y  += gud/
 obj-$(CONFIG_DRM_HYPERV) += hyperv/
 obj-$(CONFIG_DRM_SPRD) += sprd/
+obj-$(CONFIG_DRM_QAIC) += qaic/
diff --git a/drivers/gpu/drm/qaic/Kconfig b/drivers/gpu/drm/qaic/Kconfig
new file mode 100644
index 000..eca2bcb
--- /dev/null
+++ b/drivers/gpu/drm/qaic/Kconfig
@@ -0,0 +1,33 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Qualcomm Cloud AI accelerators driver
+#
+
+config DRM_QAIC
+   tristate "Qualcomm Cloud AI accelerators"
+   depends on PCI && HAS_IOMEM
+   depends on MHI_BUS
+   depends on DRM
+   depends on MMU
+   select CRC32
+   help
+ Enables driver for Qualcomm's Cloud AI accelerator PCIe cards that are
+ designed to accelerate Deep Learning inference workloads.
+
+ The driver manages the PCIe devices and provides an IOCTL interface
+ for users to submit workloads to the devices.
+
+ If unsure, say N.
+
+ To compile this driver as a module, choose M here: the
+ module will be called qaic.
+
+config QAIC_HWMON
+   bool "Qualcomm Cloud AI accelerator telemetry"
+   depends on DRM_QAIC
+   depends on HWMON
+   help
+ Enables telemetry via the HWMON interface for Qualcomm's Cloud AI
+ accelerator PCIe cards.
+
+ If unsure, say N.
diff --git a/drivers/gpu/drm/qaic/Makefile b/drivers/gpu/drm/qaic/Makefile
new file mode 100644
index 000..4a5daff
--- /dev/null
+++ b/drivers/gpu/drm/qaic/Makefile
@@ -0,0 +1,17 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for Qualcomm Cloud AI accelerators driver
+#
+
+obj-$(CONFIG_DRM_QAIC) := qaic.o
+
+qaic-y := \
+   qaic_drv.o \
+   mhi_controller.o \
+   qaic_control.o \
+   qaic_data.o \
+   qaic_debugfs.o \
+   qaic_telemetry.o \
+   qaic_ras.o \
+   qaic_ssr.o \
+   qaic_sysfs.o
-- 
2.7.4



[RFC PATCH 04/14] drm/qaic: Add MHI controller

2022-08-15 Thread Jeffrey Hugo
A QAIC device contains a MHI interface with a number of different channels
for controlling different aspects of the device.  The MHI controller works
with the MHI bus to enable and drive that interface.

Change-Id: I77363193b1a2dece7abab287a6acef3cac1b4e1b
Signed-off-by: Jeffrey Hugo 
---
 drivers/gpu/drm/qaic/mhi_controller.c | 575 ++
 drivers/gpu/drm/qaic/mhi_controller.h |  18 ++
 2 files changed, 593 insertions(+)
 create mode 100644 drivers/gpu/drm/qaic/mhi_controller.c
 create mode 100644 drivers/gpu/drm/qaic/mhi_controller.h

diff --git a/drivers/gpu/drm/qaic/mhi_controller.c 
b/drivers/gpu/drm/qaic/mhi_controller.c
new file mode 100644
index 000..e88e0fe
--- /dev/null
+++ b/drivers/gpu/drm/qaic/mhi_controller.c
@@ -0,0 +1,575 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Copyright (c) 2019-2021, The Linux Foundation. All rights reserved. */
+/* Copyright (c) 2021-2022 Qualcomm Innovation Center, Inc. All rights 
reserved. */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "mhi_controller.h"
+#include "qaic.h"
+
+#define MAX_RESET_TIME_SEC 25
+
+static unsigned int mhi_timeout = 2000; /* 2 sec default */
+module_param(mhi_timeout, uint, 0600);
+
+static struct mhi_channel_config aic100_channels[] = {
+   {
+   .name = "QAIC_LOOPBACK",
+   .num = 0,
+   .num_elements = 32,
+   .local_elements = 0,
+   .event_ring = 0,
+   .dir = DMA_TO_DEVICE,
+   .ee_mask = MHI_CH_EE_AMSS,
+   .pollcfg = 0,
+   .doorbell = MHI_DB_BRST_DISABLE,
+   .lpm_notify = false,
+   .offload_channel = false,
+   .doorbell_mode_switch = false,
+   .auto_queue = false,
+   .wake_capable = false,
+   },
+   {
+   .name = "QAIC_LOOPBACK",
+   .num = 1,
+   .num_elements = 32,
+   .local_elements = 0,
+   .event_ring = 0,
+   .dir = DMA_FROM_DEVICE,
+   .ee_mask = MHI_CH_EE_AMSS,
+   .pollcfg = 0,
+   .doorbell = MHI_DB_BRST_DISABLE,
+   .lpm_notify = false,
+   .offload_channel = false,
+   .doorbell_mode_switch = false,
+   .auto_queue = false,
+   .wake_capable = false,
+   },
+   {
+   .name = "QAIC_SAHARA",
+   .num = 2,
+   .num_elements = 32,
+   .local_elements = 0,
+   .event_ring = 0,
+   .dir = DMA_TO_DEVICE,
+   .ee_mask = MHI_CH_EE_SBL,
+   .pollcfg = 0,
+   .doorbell = MHI_DB_BRST_DISABLE,
+   .lpm_notify = false,
+   .offload_channel = false,
+   .doorbell_mode_switch = false,
+   .auto_queue = false,
+   .wake_capable = false,
+   },
+   {
+   .name = "QAIC_SAHARA",
+   .num = 3,
+   .num_elements = 32,
+   .local_elements = 0,
+   .event_ring = 0,
+   .dir = DMA_FROM_DEVICE,
+   .ee_mask = MHI_CH_EE_SBL,
+   .pollcfg = 0,
+   .doorbell = MHI_DB_BRST_DISABLE,
+   .lpm_notify = false,
+   .offload_channel = false,
+   .doorbell_mode_switch = false,
+   .auto_queue = false,
+   .wake_capable = false,
+   },
+   {
+   .name = "QAIC_DIAG",
+   .num = 4,
+   .num_elements = 32,
+   .local_elements = 0,
+   .event_ring = 0,
+   .dir = DMA_TO_DEVICE,
+   .ee_mask = MHI_CH_EE_AMSS,
+   .pollcfg = 0,
+   .doorbell = MHI_DB_BRST_DISABLE,
+   .lpm_notify = false,
+   .offload_channel = false,
+   .doorbell_mode_switch = false,
+   .auto_queue = false,
+   .wake_capable = false,
+   },
+   {
+   .name = "QAIC_DIAG",
+   .num = 5,
+   .num_elements = 32,
+   .local_elements = 0,
+   .event_ring = 0,
+   .dir = DMA_FROM_DEVICE,
+   .ee_mask = MHI_CH_EE_AMSS,
+   .pollcfg = 0,
+   .doorbell = MHI_DB_BRST_DISABLE,
+   .lpm_notify = false,
+   .offload_channel = false,
+   .doorbell_mode_switch = false,
+   .auto_queue = false,
+   .wake_capable = false,
+   },
+   {
+   .name = "QAIC_SSR",
+   .num = 6,
+   .num_elements = 32,
+   .local_elements = 0,
+   .event_ring = 0,
+   .dir = DMA_TO_DEVICE,
+   .ee_mask = MHI_CH_EE_AMSS,
+   .pollcfg = 0,
+   .doorbell = MHI_DB_BRST_DISABLE,
+   

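A rough sketch, under stated assumptions, of how a channel table like the one
above is typically tied into an MHI controller configuration and registered
with the MHI bus. The event-ring entry and the numeric values are placeholders
and not taken from the patch; only aic100_channels and the 2 second timeout
mirror it.

/* Hypothetical sketch -- values are placeholders, not from the patch. */
#include <linux/mhi.h>

static const struct mhi_event_config aic100_events[] = {
	{
		.num_elements = 32,
		.irq_moderation_ms = 0,
		.irq = 0,
		.ch_num = 0,
		.mode = MHI_DB_BRST_DISABLE,
		.data_type = MHI_ER_CTRL,
	},
};

static const struct mhi_controller_config aic100_config = {
	.max_channels = 128,
	.timeout_ms = 2000,		/* mirrors the mhi_timeout default above */
	.ch_cfg = aic100_channels,	/* the channel table from the patch */
	.num_channels = ARRAY_SIZE(aic100_channels),
	.event_cfg = aic100_events,
	.num_events = ARRAY_SIZE(aic100_events),
};

static int aic100_mhi_register(struct mhi_controller *mhi_cntrl)
{
	/* regs, IRQs and callbacks of mhi_cntrl are assumed set up already */
	return mhi_register_controller(mhi_cntrl, &aic100_config);
}
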
[RFC PATCH 02/14] drm/qaic: Add uapi and core driver file

2022-08-15 Thread Jeffrey Hugo
Add the QAIC driver uapi file and core driver file that binds to the PCIe
device.  The core driver file also creates the drm device and manages all
the interconnections between the different parts of the driver.

Change-Id: I28854e8a5dacda217439be2f65a4ab67d4dccd1e
Signed-off-by: Jeffrey Hugo 
---
 drivers/gpu/drm/qaic/qaic_drv.c | 825 
 include/uapi/drm/qaic_drm.h | 283 ++
 2 files changed, 1108 insertions(+)
 create mode 100644 drivers/gpu/drm/qaic/qaic_drv.c
 create mode 100644 include/uapi/drm/qaic_drm.h

diff --git a/drivers/gpu/drm/qaic/qaic_drv.c b/drivers/gpu/drm/qaic/qaic_drv.c
new file mode 100644
index 000..0e139e6
--- /dev/null
+++ b/drivers/gpu/drm/qaic/qaic_drv.c
@@ -0,0 +1,825 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Copyright (c) 2019-2021, The Linux Foundation. All rights reserved. */
+/* Copyright (c) 2021-2022 Qualcomm Innovation Center, Inc. All rights 
reserved. */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "mhi_controller.h"
+#include "qaic.h"
+#include "qaic_debugfs.h"
+#include "qaic_ras.h"
+#include "qaic_ssr.h"
+#include "qaic_telemetry.h"
+#define CREATE_TRACE_POINTS
+#include "qaic_trace.h"
+
+MODULE_IMPORT_NS(DMA_BUF);
+
+#define PCI_DEV_AIC100 0xa100
+#define QAIC_NAME  "qaic"
+#define STR2(s)#s
+#define STR(s) STR2(s)
+#define MAJOR_VER  1
+#define MINOR_VER  0
+#define PATCH_VER  0
+#define QAIC_DESC  "Qualcomm Cloud AI Accelerators"
+
+static unsigned int datapath_polling;
+module_param(datapath_polling, uint, 0400);
+bool poll_datapath;
+
+static u16 cntl_major = 5;
+static u16 cntl_minor;/* 0 */
+static bool link_up;
+
+static int qaic_create_drm_device(struct qaic_device *qdev, s32 partition_id,
+ struct qaic_user *owner);
+static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 partition_id,
+   struct qaic_user *owner);
+
+static void free_usr(struct kref *kref)
+{
+   struct qaic_user *usr = container_of(kref, struct qaic_user, ref_count);
+
+   cleanup_srcu_struct(&usr->qddev_lock);
+   kfree(usr);
+}
+
+static int qaic_open(struct drm_device *dev, struct drm_file *file)
+{
+   struct qaic_drm_device *qddev = dev->dev_private;
+   struct qaic_device *qdev = qddev->qdev;
+   struct qaic_user *usr;
+   int rcu_id;
+   int ret;
+
+   rcu_id = srcu_read_lock(&qdev->dev_lock);
+   if (qdev->in_reset) {
+   srcu_read_unlock(&qdev->dev_lock, rcu_id);
+   return -ENODEV;
+   }
+
+   usr = kmalloc(sizeof(*usr), GFP_KERNEL);
+   if (!usr) {
+   srcu_read_unlock(&qdev->dev_lock, rcu_id);
+   return -ENOMEM;
+   }
+
+   usr->handle = current->pid;
+   usr->qddev = qddev;
+   atomic_set(&usr->chunk_id, 0);
+   init_srcu_struct(&usr->qddev_lock);
+   kref_init(&usr->ref_count);
+
+   ret = mutex_lock_interruptible(&qddev->users_mutex);
+   if (ret) {
+   cleanup_srcu_struct(&usr->qddev_lock);
+   kfree(usr);
+   srcu_read_unlock(&qdev->dev_lock, rcu_id);
+   return ret;
+   }
+
+   list_add(&usr->node, &qddev->users);
+   mutex_unlock(&qddev->users_mutex);
+
+   file->driver_priv = usr;
+
+   srcu_read_unlock(&qdev->dev_lock, rcu_id);
+   return 0;
+}
+
+static void qaic_postclose(struct drm_device *dev, struct drm_file *file)
+{
+   struct qaic_user *usr = file->driver_priv;
+   struct qaic_drm_device *qddev;
+   struct qaic_device *qdev;
+   int qdev_rcu_id;
+   int usr_rcu_id;
+   int i;
+
+   qddev = usr->qddev;
+   usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
+   if (qddev) {
+   qdev = qddev->qdev;
+   qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
+   if (!qdev->in_reset) {
+   qaic_release_usr(qdev, usr);
+   for (i = 0; i < qdev->num_dbc; ++i)
+   if (qdev->dbc[i].usr &&
+   qdev->dbc[i].usr->handle == usr->handle)
+   release_dbc(qdev, i, true);
+
+   /* Remove child devices */
+   if (qddev->partition_id == QAIC_NO_PARTITION)
+   qaic_destroy_drm_device(qdev, 
QAIC_NO_PARTITION, usr);
+   }
+   srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+
+   mutex_lock(&qddev->users_mutex);
+   if (!list_empty(&usr->node))
+   list_del_init(&usr->node);
+   mutex_unlock(&qddev->users_mutex);
+   }
+
+   srcu_read_unlock(>qddev_lock, 

[RFC PATCH 10/14] drm/qaic: Add sysfs

2022-08-15 Thread Jeffrey Hugo
The QAIC driver can advertise the state of individual dma_bridge channels
to userspace.  Userspace can use this information to manage userspace
state when a channel crashes.

Change-Id: Ifc7435c53cec6aa326bdcd9bfcb77ea7f2a63bab
Signed-off-by: Jeffrey Hugo 
---
 drivers/gpu/drm/qaic/qaic_sysfs.c | 113 ++
 1 file changed, 113 insertions(+)
 create mode 100644 drivers/gpu/drm/qaic/qaic_sysfs.c

diff --git a/drivers/gpu/drm/qaic/qaic_sysfs.c 
b/drivers/gpu/drm/qaic/qaic_sysfs.c
new file mode 100644
index 000..5ee1696
--- /dev/null
+++ b/drivers/gpu/drm/qaic/qaic_sysfs.c
@@ -0,0 +1,113 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Copyright (c) 2020-2021, The Linux Foundation. All rights reserved. */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "qaic.h"
+
+#define NAME_LEN 14
+
+struct dbc_attribute {
+   struct device_attribute dev_attr;
+   u32 dbc_id;
+   char name[NAME_LEN];
+};
+
+static ssize_t dbc_state_show(struct device *dev,
+ struct device_attribute *a, char *buf)
+{
+   struct dbc_attribute *attr = container_of(a, struct dbc_attribute, 
dev_attr);
+   struct qaic_device *qdev = dev_get_drvdata(dev);
+
+   return sprintf(buf, "%d\n", qdev->dbc[attr->dbc_id].state);
+}
+
+void set_dbc_state(struct qaic_device *qdev, u32 dbc_id, unsigned int state)
+{
+   char id_str[12];
+   char state_str[16];
+   char *envp[] = { id_str, state_str, NULL };
+   struct qaic_drm_device *qddev;
+
+   if (state >= DBC_STATE_MAX) {
+   pci_dbg(qdev->pdev, "%s invalid state %d\n", __func__, state);
+   return;
+   }
+   if (dbc_id >= qdev->num_dbc) {
+   pci_dbg(qdev->pdev, "%s invalid dbc_id %d\n", __func__, dbc_id);
+   return;
+   }
+   if (state == qdev->dbc[dbc_id].state) {
+   pci_dbg(qdev->pdev, "%s already at state %d\n", __func__, 
state);
+   return;
+   }
+
+   snprintf(id_str, ARRAY_SIZE(id_str), "DBC_ID=%d", dbc_id);
+   snprintf(state_str, ARRAY_SIZE(state_str), "DBC_STATE=%d", state);
+
+   qdev->dbc[dbc_id].state = state;
+   mutex_lock(>qaic_drm_devices_mutex);
+   list_for_each_entry(qddev, >qaic_drm_devices, node)
+   kobject_uevent_env(&qddev->ddev->dev->kobj, KOBJ_CHANGE, envp);
+   mutex_unlock(>qaic_drm_devices_mutex);
+}
+
+int qaic_sysfs_init(struct qaic_drm_device *qddev)
+{
+   u32 num_dbc = qddev->qdev->num_dbc;
+   struct dbc_attribute *dbc_attrs;
+   int i, ret;
+
+   dbc_attrs = kcalloc(num_dbc, sizeof(*dbc_attrs), GFP_KERNEL);
+   if (!dbc_attrs)
+   return -ENOMEM;
+
+   qddev->sysfs_attrs = dbc_attrs;
+
+   for (i = 0; i < num_dbc; ++i) {
+   struct dbc_attribute *dbc = &dbc_attrs[i];
+
+   sysfs_attr_init(&dbc->dev_attr.attr);
+   dbc->dbc_id = i;
+   snprintf(dbc->name, NAME_LEN, "dbc%d_state", i);
+   dbc->dev_attr.attr.name = dbc->name;
+   dbc->dev_attr.attr.mode = 0444;
+   dbc->dev_attr.show = dbc_state_show;
+   ret = sysfs_create_file(&qddev->ddev->dev->kobj,
+   &dbc->dev_attr.attr);
+   if (ret) {
+   int j;
+
+   for (j = 0; j < i; ++j) {
+   dbc = &dbc_attrs[j];
+   sysfs_remove_file(&qddev->ddev->dev->kobj,
+ &dbc->dev_attr.attr);
+   }
+   break;
+   }
+   }
+
+   if (ret)
+   kfree(dbc_attrs);
+
+   return ret;
+}
+
+void qaic_sysfs_remove(struct qaic_drm_device *qddev)
+{
+   struct dbc_attribute *dbc_attrs = qddev->sysfs_attrs;
+   u32 num_dbc = qddev->qdev->num_dbc;
+   int i;
+
+   for (i = 0; i < num_dbc; ++i)
+   sysfs_remove_file(&qddev->ddev->dev->kobj,
+ &dbc_attrs[i].dev_attr.attr);
+
+   kfree(dbc_attrs);
+}
-- 
2.7.4

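To illustrate the interface from the consumer side, a minimal userspace sketch
that reads one of the dbcN_state attributes created above. The sysfs path is an
assumption based on where the attribute is attached (the DRM device's parent),
not something the patch spells out.

/* Hypothetical userspace sketch; the exact sysfs path is an assumption. */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/sys/class/drm/card0/device/dbc0_state", "r");
	int state;

	if (!f)
		return 1;
	if (fscanf(f, "%d", &state) == 1)
		printf("dbc0 state: %d\n", state);	/* one of the DBC_STATE_* values */
	fclose(f);
	return 0;
}

Since set_dbc_state() also emits a KOBJ_CHANGE uevent carrying DBC_ID= and
DBC_STATE=, a udev rule or netlink listener could react to channel crashes
without polling this file.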


[RFC PATCH 01/14] drm/qaic: Add documentation for AIC100 accelerator driver

2022-08-15 Thread Jeffrey Hugo
Add documentation covering both the QAIC driver, and the device that it
drives.

Change-Id: Iee519cc0a276249c4e8684507d27ae2c33e29aeb
Signed-off-by: Jeffrey Hugo 
---
 Documentation/gpu/drivers.rst |   1 +
 Documentation/gpu/qaic.rst| 567 ++
 2 files changed, 568 insertions(+)
 create mode 100644 Documentation/gpu/qaic.rst

diff --git a/Documentation/gpu/drivers.rst b/Documentation/gpu/drivers.rst
index 3a52f48..433dac5 100644
--- a/Documentation/gpu/drivers.rst
+++ b/Documentation/gpu/drivers.rst
@@ -18,6 +18,7 @@ GPU Driver Documentation
xen-front
afbc
komeda-kms
+   qaic
 
 .. only::  subproject and html
 
diff --git a/Documentation/gpu/qaic.rst b/Documentation/gpu/qaic.rst
new file mode 100644
index 000..3414f98
--- /dev/null
+++ b/Documentation/gpu/qaic.rst
@@ -0,0 +1,567 @@
+Overview
+
+QAIC is the driver for the Qualcomm Cloud AI 100/AIC100 and SA9000P (part of
+Snapdragon Ride) products.  Qualcomm Cloud AI 100 is a PCIe adapter card which
+contains a dedicated SoC ASIC for the purpose of efficiently running Artificial
+Intelligence (AI) Deep Learning inference workloads.
+
+The PCIe interface of Qualcomm Cloud AI 100 is capable of Gen4 x8.  An
+individual SoC on a card can have up to 16 NSPs for running workloads.  Each SoC
+has an A53 management CPU.  On card, there can be up to 32 GB of DDR.
+
+Multiple Qualcomm Cloud AI 100 cards can be hosted in a single system to scale
+overall performance.
+
+
+Hardware Description
+
+An AIC100 card consists of an AIC100 SoC, on-card DDR, and a set of misc
+peripherals (PMICs, etc).
+
+An AIC100 card can either be a PCIe HHHL form factor (a traditional PCIe card),
+or a Dual M.2 card.  Both use PCIe to connect to the host system.
+
+As a PCIe endpoint/adapter, AIC100 uses the standard VendorID(VID)/
+ProductID(PID) combination to uniquely identify itself to the host.  AIC100
+uses the standard Qualcomm VID (0x17cb).  All AIC100 instances use the same
+AIC100 PID (0xa100).
+
+AIC100 does not implement FLR (function level reset).
+
+AIC100 implements MSI but does not implement MSI-X.  AIC100 requires 17 MSIs to
+operate (1 for MHI, 16 for the DMA Bridge).
+
+As a PCIe device, AIC100 utilizes BARs to provide host interfaces to the device
+hardware.  AIC100 provides 3, 64-bit BARs.
+
+-The first BAR is 4K in size, and exposes the MHI interface to the host.
+
+-The second BAR is 2M in size, and exposes the DMA Bridge interface to the 
host.
+
+-The third BAR is variable in size based on an individual AIC100's
+   configuration, but defaults to 64K.  This BAR currently has no purpose.
+
+From the host perspective, AIC100 has several key hardware components-
+QSM (QAIC Service Manager)
+NSPs (Neural Signal Processor)
+DMA Bridge
+DDR
+MHI (Modem Host Interface)
+
+QSM - QAIC Service Manager.  This is an ARM A53 CPU that runs the primary
+firmware of the card and performs on-card management tasks.  It also
+communicates with the host (QAIC/userspace) via MHI.  Each AIC100 has one of
+these.
+
+NSP - Neural Signal Processor.  Each AIC100 has up to 16 of these.  These are
+the processors that run the workloads on AIC100.  Each NSP is a Qualcomm 
Hexagon
+(Q6) DSP with HVX and HMX.  Each NSP can only run one workload at a time, but
+multiple NSPs may be assigned to a single workload.  Since each NSP can only 
run
+one workload, AIC100 is limited to 16 concurrent workloads.  Workload
+"scheduling" is under the purview of the host.  AIC100 does not automatically
+timeslice.
+
+DMA Bridge - The DMA Bridge is custom DMA engine that manages the flow of data
+in and out of workloads.  AIC100 has one of these.  The DMA Bridge has 16
+channels, each consisting of a set of request/response FIFOs.  Each active
+workload is assigned a single DMA Bridge channel.  The DMA Bridge exposes
+hardware registers to manage the FIFOs (head/tail pointers), but requires host
+memory to store the FIFOs.
+
+DDR - AIC100 has on-card DDR.  In total, an AIC100 can have up to 32 GB of DDR.
+This DDR is used to store workloads, data for the workloads, and is used by the
+QSM for managing the device.  NSPs are granted access to sections of the DDR by
+the QSM.  The host does not have direct access to the DDR, and must make
+requests to the QSM to transfer data to the DDR.
+
+MHI - AIC100 has one MHI interface over PCIe.  MHI itself is documented at
+Documentation/mhi/index.rst  MHI is the mechanism the host (QAIC/userspace)
+uses to communicate with the QSM.  Except for workload data via the DMA Bridge,
+all interaction with the device occurs via MHI.
+
+
+High-level Use Flow
+---
+AIC100 is a programmable accelerator.  AIC100 is typically used for running
+neural networks in inferencing mode to efficiently perform AI operations.
+AIC100 is not intended for training neural networks.  AIC100 can be utilized
+for generic compute workloads.
+
+Assuming a user wants to utilize AIC100, they would 

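As a rough illustration of the enumeration facts documented above (VID 0x17cb,
PID 0xa100, 17 MSIs: one for MHI plus 16 for the DMA Bridge), a probe outline
follows. It is an assumption-laden sketch, not the actual qaic probe code, and
the BAR comment only restates the layout described in the text.

/* Hypothetical outline only; the real probe lives in qaic_drv.c. */
#include <linux/module.h>
#include <linux/pci.h>

#define AIC100_VID	0x17cb	/* standard Qualcomm VID per the doc */
#define AIC100_PID	0xa100	/* shared by all AIC100 instances */

static const struct pci_device_id aic100_ids[] = {
	{ PCI_DEVICE(AIC100_VID, AIC100_PID), },
	{ }
};
MODULE_DEVICE_TABLE(pci, aic100_ids);

static int aic100_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	int nvec;

	/* 1 MSI for MHI + 16 for the DMA Bridge channels */
	nvec = pci_alloc_irq_vectors(pdev, 17, 17, PCI_IRQ_MSI);
	if (nvec < 0)
		return nvec;

	/* 64-bit BARs: first (MHI, 4K) and second (DMA Bridge, 2M) get mapped here */
	return 0;
}
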
RE: [PATCH linux-next] drm/amdgpu/vcn: Return void from the stop_dbg_mode

2022-08-15 Thread Dong, Ruijing
[AMD Official Use Only - General]

This patch is

Reviewed-by: Ruijing Dong 

Thanks,
Ruijing

-Original Message-
From: Khalid Masum 
Sent: Monday, August 15, 2022 2:34 PM
To: Dong, Ruijing ; amd-...@lists.freedesktop.org; 
dri-devel@lists.freedesktop.org; linux-ker...@vger.kernel.org
Cc: Deucher, Alexander ; Koenig, Christian 
; Pan, Xinhui ; David Airlie 
; Daniel Vetter ; Zhu, James 
; Liu, Leo ; Jiang, Sonny 
; Wan Jiabing ; Greg Kroah-Hartman 
; Khalid Masum 
Subject: [PATCH linux-next] drm/amdgpu/vcn: Return void from the stop_dbg_mode

There is no point in returning an int here. It only returns 0 which the caller 
never uses. Therefore return void and remove the unnecessary assignment.

Addresses-Coverity: 1504988 ("Unused value")
Fixes: 8da1170a16e4 ("drm/amdgpu: add VCN4 ip block support")
Suggested-by: Ruijing Dong 
Suggested-by: Greg Kroah-Hartman 
Signed-off-by: Khalid Masum 
---
Past discussions:
- V1 Link: 
https://lore.kernel.org/lkml/20220815070056.10816-1-khalid.masum.92@gmail.com/

Changes since V1:
- Make stop_dbg_mode return void
- Update commit description

 drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
index ca14c3ef742e..fb2d74f30448 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
@@ -1115,7 +1115,7 @@ static int vcn_v4_0_start(struct amdgpu_device *adev)
  *
  * Stop VCN block with dpg mode
  */
-static int vcn_v4_0_stop_dpg_mode(struct amdgpu_device *adev, int inst_idx)
+static void vcn_v4_0_stop_dpg_mode(struct amdgpu_device *adev, int
+inst_idx)
 {
uint32_t tmp;

@@ -1133,7 +1133,6 @@ static int vcn_v4_0_stop_dpg_mode(struct amdgpu_device 
*adev, int inst_idx)
/* disable dynamic power gating mode */
WREG32_P(SOC15_REG_OFFSET(VCN, inst_idx, regUVD_POWER_STATUS), 0,
~UVD_POWER_STATUS__UVD_PG_MODE_MASK);
-   return 0;
 }

 /**
@@ -1154,7 +1153,7 @@ static int vcn_v4_0_stop(struct amdgpu_device *adev)
fw_shared->sq.queue_mode |= FW_QUEUE_DPG_HOLD_OFF;

if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) {
-   r = vcn_v4_0_stop_dpg_mode(adev, i);
+   vcn_v4_0_stop_dpg_mode(adev, i);
continue;
}

--
2.37.1



[PATCH linux-next] drm/amdgpu/vcn: Return void from the stop_dbg_mode

2022-08-15 Thread Khalid Masum
There is no point in returning an int here. It only returns 0 which
the caller never uses. Therefore return void and remove the unnecessary 
assignment.

Addresses-Coverity: 1504988 ("Unused value")
Fixes: 8da1170a16e4 ("drm/amdgpu: add VCN4 ip block support")
Suggested-by: Ruijing Dong 
Suggested-by: Greg Kroah-Hartman 
Signed-off-by: Khalid Masum 
---
Past discussions:
- V1 Link: 
https://lore.kernel.org/lkml/20220815070056.10816-1-khalid.masum...@gmail.com/

Changes since V1:
- Make stop_dbg_mode return void
- Update commit description

 drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
index ca14c3ef742e..fb2d74f30448 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
@@ -1115,7 +1115,7 @@ static int vcn_v4_0_start(struct amdgpu_device *adev)
  *
  * Stop VCN block with dpg mode
  */
-static int vcn_v4_0_stop_dpg_mode(struct amdgpu_device *adev, int inst_idx)
+static void vcn_v4_0_stop_dpg_mode(struct amdgpu_device *adev, int inst_idx)
 {
uint32_t tmp;
 
@@ -1133,7 +1133,6 @@ static int vcn_v4_0_stop_dpg_mode(struct amdgpu_device 
*adev, int inst_idx)
/* disable dynamic power gating mode */
WREG32_P(SOC15_REG_OFFSET(VCN, inst_idx, regUVD_POWER_STATUS), 0,
~UVD_POWER_STATUS__UVD_PG_MODE_MASK);
-   return 0;
 }
 
 /**
@@ -1154,7 +1153,7 @@ static int vcn_v4_0_stop(struct amdgpu_device *adev)
fw_shared->sq.queue_mode |= FW_QUEUE_DPG_HOLD_OFF;
 
if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) {
-   r = vcn_v4_0_stop_dpg_mode(adev, i);
+   vcn_v4_0_stop_dpg_mode(adev, i);
continue;
}
 
-- 
2.37.1



Re: [PATCH linux-next] drm/amdgpu/vcn: Remove unused assignment in vcn_v4_0_stop

2022-08-15 Thread Khalid Masum

On 8/15/22 22:00, Dong, Ruijing wrote:

[AMD Official Use Only - General]

Then please update commit message, this change is due to "value r is never used, and 
remove unnecessary assignment", that makes sense to me.

Thanks
Ruijing

Greg also pointed out that the function vcn_v4_0_stop_dpg_mode should 
return void. I shall send a patch shortly with these two changes. Thanks 
for your suggestion.


Thanks,
  -- Khalid Masum


[PATCH 5.15 063/779] drm/hyperv-drm: Include framebuffer and EDID headers

2022-08-15 Thread Greg Kroah-Hartman
From: Thomas Zimmermann 

commit 009a3a52791f31c57d755a73f6bc66fbdd8bd76c upstream.

Fix a number of compile errors by including the correct header
files. Examples are shown below.

  ../drivers/gpu/drm/hyperv/hyperv_drm_modeset.c: In function 
'hyperv_blit_to_vram_rect':
  ../drivers/gpu/drm/hyperv/hyperv_drm_modeset.c:25:48: error: invalid use of 
undefined type 'struct drm_framebuffer'
   25 | struct hyperv_drm_device *hv = to_hv(fb->dev);
  |^~

  ../drivers/gpu/drm/hyperv/hyperv_drm_modeset.c: In function 
'hyperv_connector_get_modes':
  ../drivers/gpu/drm/hyperv/hyperv_drm_modeset.c:59:17: error: implicit 
declaration of function 'drm_add_modes_noedid' 
[-Werror=implicit-function-declaration]
   59 | count = drm_add_modes_noedid(connector,
  | ^~~~

  ../drivers/gpu/drm/hyperv/hyperv_drm_modeset.c:62:9: error: implicit 
declaration of function 'drm_set_preferred_mode'; did you mean 
'drm_mm_reserve_node'? [-Werror=implicit-function-declaration]
   62 | drm_set_preferred_mode(connector, hv->preferred_width,
  | ^~

Signed-off-by: Thomas Zimmermann 
Fixes: 76c56a5affeb ("drm/hyperv: Add DRM driver for hyperv synthetic video 
device")
Fixes: 720cf96d8fec ("drm: Drop drm_framebuffer.h from drm_crtc.h")
Fixes: 255490f9150d ("drm: Drop drm_edid.h from drm_crtc.h")
Cc: Deepak Rawat 
Cc: Thomas Zimmermann 
Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: linux-hyp...@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc:  # v5.14+
Acked-by: Maxime Ripard 
Reviewed-by: Ville Syrjälä 
Link: 
https://patchwork.freedesktop.org/patch/msgid/20220622083413.12573-1-tzimmerm...@suse.de
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/gpu/drm/hyperv/hyperv_drm_modeset.c |2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/gpu/drm/hyperv/hyperv_drm_modeset.c
+++ b/drivers/gpu/drm/hyperv/hyperv_drm_modeset.c
@@ -7,9 +7,11 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 




Re: Build regressions/improvements in v6.0-rc1

2022-08-15 Thread Geert Uytterhoeven

On Mon, 15 Aug 2022, Geert Uytterhoeven wrote:

Below is the list of build error/warning regressions/improvements in
v6.0-rc1[1] compared to v5.19[2].

Summarized:
 - build errors: +26/-15


  + /kisskb/src/arch/parisc/kernel/vdso32/restart_syscall.S: Error: .cfi_endproc 
without corresponding .cfi_startproc:  => 32
  + /kisskb/src/arch/parisc/kernel/vdso32/restart_syscall.S: Error: bad or 
irreducible absolute expression:  => 16
  + /kisskb/src/arch/parisc/kernel/vdso32/restart_syscall.S: Error: junk at end of 
line, first unrecognized character is `:':  => 16
  + /kisskb/src/arch/parisc/kernel/vdso32/restart_syscall.S: Error: no such 
instruction: `be 0x100(%sr2,%r0)':  => 29
  + /kisskb/src/arch/parisc/kernel/vdso32/restart_syscall.S: Error: no such 
instruction: `ldi 0,%r20':  => 30
  + /kisskb/src/arch/parisc/kernel/vdso32/restart_syscall.S: Error: no such 
instruction: `ldw 0(%sp),%r31':  => 26
  + /kisskb/src/arch/parisc/kernel/vdso32/sigtramp.S: Error: no such instruction: 
`ble 0x100(%sr2,%r0)':  => 51, 46
  + /kisskb/src/arch/parisc/kernel/vdso32/sigtramp.S: Error: no such instruction: 
`ldi 0,%r25':  => 44
  + /kisskb/src/arch/parisc/kernel/vdso32/sigtramp.S: Error: no such instruction: 
`ldi 1,%r25':  => 49
  + /kisskb/src/arch/parisc/kernel/vdso32/sigtramp.S: Error: no such instruction: 
`ldi 173,%r20':  => 50, 45
  + /kisskb/src/arch/parisc/kernel/vdso32/sigtramp.S: Error: unknown pseudo-op: 
`.callinfo':  => 40
  + /kisskb/src/arch/parisc/kernel/vdso32/sigtramp.S: Error: unknown pseudo-op: 
`.entry':  => 41
  + /kisskb/src/arch/parisc/kernel/vdso32/sigtramp.S: Error: unknown pseudo-op: 
`.exit':  => 54
  + /kisskb/src/arch/parisc/kernel/vdso32/sigtramp.S: Error: unknown pseudo-op: 
`.proc':  => 39
  + /kisskb/src/arch/parisc/kernel/vdso32/sigtramp.S: Error: unknown pseudo-op: 
`.procend':  => 55
  + /kisskb/src/arch/parisc/kernel/vdso32/sigtramp.S: Error: unknown pseudo-op: 
`.stringz':  => 76

parisc64-gcc11/generic-64bit_defconfig
parisc-gcc11/generic-32bit_defconfig
parisc-gcc11/parisc-{allmod,allno,def}config

  + /kisskb/src/arch/sh/include/asm/io.h: error: cast to pointer from integer of 
different size [-Werror=int-to-pointer-cast]:  => 239:34

sh4-gcc11/sh-allmodconfig (drivers/staging/octeon/ethernet-mem.c)

  + 
/kisskb/src/drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn30/display_mode_vba_30.c:
 error: the frame size of 2096 bytes is larger than 2048 bytes 
[-Werror=frame-larger-than=]:  => 6806:1
  + 
/kisskb/src/drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_32.c:
 error: the frame size of 2160 bytes is larger than 2048 bytes 
[-Werror=frame-larger-than=]:  => 3778:1

x86_64-gcc8/x86-allmodconfig

  + /kisskb/src/include/linux/bitfield.h: error: call to '__field_overflow' 
declared with attribute error: value doesn't fit into mask:  => 151:3

mipsel-gcc5/mips-allmodconfig (net/mac80211/tx.c)

  + /kisskb/src/include/linux/compiler_types.h: error: call to 
'__compiletime_assert_603' declared with attribute error: FIELD_GET: mask is not 
constant:  => 354:38

arm64-gcc5/arm64-allmodconfig (arch/arm64/kvm/arm.c)

  + /kisskb/src/include/linux/random.h: error: 'latent_entropy' undeclared (first 
use in this function):  => 25:39

powerpc-gcc5/powerpc-all{mod,yes}config
powerpc-gcc5/ppc32_allmodconfig
powerpc-gcc5/ppc64_book3e_allmodconfig
powerpc-gcc5/ppc64le_allmodconfig

  + /kisskb/src/include/linux/random.h: error: 'latent_entropy' undeclared (first 
use in this function); did you mean 'add_latent_entropy'?:  => 25:46

powerpc-gcc11/powerpc-all{mod,yes}config
powerpc-gcc11/ppc64_book3e_allmodconfig

  + {standard input}: Error: displacement to undefined symbol .L377 overflows 
12-bit field:  => 2286
  + {standard input}: Error: displacement to undefined symbol .L378 overflows 
8-bit field :  => 2302
  + {standard input}: Error: displacement to undefined symbol .L382 overflows 
8-bit field :  => 2213

sh4-gcc11/sh-allmodconfig (seen before, root cause is internal compiler error)


[1] 
http://kisskb.ellerman.id.au/kisskb/branch/linus/head/568035b01cfb107af8d2e4bd2fb9aea22cf5b868/
 (all 135 configs)
[2] 
http://kisskb.ellerman.id.au/kisskb/branch/linus/head/3d7cb6b04c3f3115719235cc6866b10326de34cd/
 (all 135 configs)


Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


[PATCH 2/2] drm/i915: Add DSC support to MST path

2022-08-15 Thread Stanislav Lisovskiy
Whenever we are not able to get enough timeslots
for the required PBN, let's try to allocate them
using DSC, the same way as we do for SST.

v2: Removed intel_dp_mst_dsc_compute_config and refactored
intel_dp_dsc_compute_config to support timeslots as a
parameter(Ville Syrjälä)

v3: - Rebased
- Added a debug to see that we at least try reserving
  VCPI slots using DSC, because currently its not visible
  from the logs, thus making debugging more tricky.
- Moved timeslots to numerator, where it should be.

v4: - Call drm_dp_mst_atomic_check already during link
  config computation, because we need to know already
  by this moment if uncompressed amount of VCPI slots
  needed can fit, otherwise we need to use DSC.
  (thanks to Vinod Govindapillai for pointing this out)

Signed-off-by: Stanislav Lisovskiy 
---
 drivers/gpu/drm/i915/display/intel_dp.c |  76 --
 drivers/gpu/drm/i915/display/intel_dp.h |  17 +++
 drivers/gpu/drm/i915/display/intel_dp_mst.c | 157 
 3 files changed, 206 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dp.c 
b/drivers/gpu/drm/i915/display/intel_dp.c
index 32292c0be2bd..1f6dc52251c2 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -116,7 +116,6 @@ bool intel_dp_is_edp(struct intel_dp *intel_dp)
 }
 
 static void intel_dp_unset_edid(struct intel_dp *intel_dp);
-static int intel_dp_dsc_compute_bpp(struct intel_dp *intel_dp, u8 dsc_max_bpc);
 
 /* Is link rate UHBR and thus 128b/132b? */
 bool intel_dp_is_uhbr(const struct intel_crtc_state *crtc_state)
@@ -687,11 +686,12 @@ small_joiner_ram_size_bits(struct drm_i915_private *i915)
return 6144 * 8;
 }
 
-static u16 intel_dp_dsc_get_output_bpp(struct drm_i915_private *i915,
-  u32 link_clock, u32 lane_count,
-  u32 mode_clock, u32 mode_hdisplay,
-  bool bigjoiner,
-  u32 pipe_bpp)
+u16 intel_dp_dsc_get_output_bpp(struct drm_i915_private *i915,
+   u32 link_clock, u32 lane_count,
+   u32 mode_clock, u32 mode_hdisplay,
+   bool bigjoiner,
+   u32 pipe_bpp,
+   u32 timeslots)
 {
u32 bits_per_pixel, max_bpp_small_joiner_ram;
int i;
@@ -702,8 +702,9 @@ static u16 intel_dp_dsc_get_output_bpp(struct 
drm_i915_private *i915,
 * for SST -> TimeSlotsPerMTP is 1,
 * for MST -> TimeSlotsPerMTP has to be calculated
 */
-   bits_per_pixel = (link_clock * lane_count * 8) /
+   bits_per_pixel = (link_clock * lane_count * 8) * timeslots /
 intel_dp_mode_to_fec_clock(mode_clock);
+   drm_dbg_kms(&i915->drm, "Max link bpp: %u\n", bits_per_pixel);
 
/* Small Joiner Check: output bpp <= joiner RAM (bits) / Horiz. width */
max_bpp_small_joiner_ram = small_joiner_ram_size_bits(i915) /
@@ -752,9 +753,9 @@ static u16 intel_dp_dsc_get_output_bpp(struct 
drm_i915_private *i915,
return bits_per_pixel << 4;
 }
 
-static u8 intel_dp_dsc_get_slice_count(struct intel_dp *intel_dp,
-  int mode_clock, int mode_hdisplay,
-  bool bigjoiner)
+u8 intel_dp_dsc_get_slice_count(struct intel_dp *intel_dp,
+   int mode_clock, int mode_hdisplay,
+   bool bigjoiner)
 {
struct drm_i915_private *i915 = dp_to_i915(intel_dp);
u8 min_slice_count, i;
@@ -961,8 +962,8 @@ intel_dp_mode_valid_downstream(struct intel_connector 
*connector,
return MODE_OK;
 }
 
-static bool intel_dp_need_bigjoiner(struct intel_dp *intel_dp,
-   int hdisplay, int clock)
+bool intel_dp_need_bigjoiner(struct intel_dp *intel_dp,
+int hdisplay, int clock)
 {
struct drm_i915_private *i915 = dp_to_i915(intel_dp);
 
@@ -1049,7 +1050,7 @@ intel_dp_mode_valid(struct drm_connector *_connector,
target_clock,
mode->hdisplay,
bigjoiner,
-   pipe_bpp) >> 4;
+   pipe_bpp, 1) >> 4;
dsc_slice_count =
intel_dp_dsc_get_slice_count(intel_dp,
 target_clock,
@@ -1354,7 +1355,7 @@ intel_dp_compute_link_config_wide(struct intel_dp 
*intel_dp,
return -EINVAL;
 }
 
-static int intel_dp_dsc_compute_bpp(struct intel_dp *intel_dp, u8 max_req_bpc)
+int 

[PATCH 0/2] Add DP MST DSC support to i915

2022-08-15 Thread Stanislav Lisovskiy
Currently we have only DSC support for DP SST.

Stanislav Lisovskiy (2):
  drm: Add missing DP DSC extended capability definitions.
  drm/i915: Add DSC support to MST path

 drivers/gpu/drm/i915/display/intel_dp.c |  76 --
 drivers/gpu/drm/i915/display/intel_dp.h |  17 +++
 drivers/gpu/drm/i915/display/intel_dp_mst.c | 157 
 include/drm/display/drm_dp.h|  10 +-
 4 files changed, 215 insertions(+), 45 deletions(-)

-- 
2.24.1.485.gad05a3d8e5



[PATCH 1/2] drm: Add missing DP DSC extended capability definitions.

2022-08-15 Thread Stanislav Lisovskiy
Adding DP DSC register definitions that we might need for further
DSC implementation, supporting MST and DP branch pass-through mode.

v2: - Fixed checkpatch comment warning
v3: - Removed function which is not yet used(Jani Nikula)

Signed-off-by: Stanislav Lisovskiy 
---
 include/drm/display/drm_dp.h | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/drm/display/drm_dp.h b/include/drm/display/drm_dp.h
index 9e3aff7e68bb..0d05e3172f96 100644
--- a/include/drm/display/drm_dp.h
+++ b/include/drm/display/drm_dp.h
@@ -239,6 +239,9 @@
 
 #define DP_DSC_SUPPORT  0x060   /* DP 1.4 */
 # define DP_DSC_DECOMPRESSION_IS_SUPPORTED  (1 << 0)
+# define DP_DSC_PASS_THROUGH_IS_SUPPORTED   (1 << 1)
+# define DP_DSC_DYNAMIC_PPS_UPDATE_SUPPORT_COMP_TO_COMP(1 << 2)
+# define DP_DSC_DYNAMIC_PPS_UPDATE_SUPPORT_UNCOMP_TO_COMP  (1 << 3)
 
 #define DP_DSC_REV  0x061
 # define DP_DSC_MAJOR_MASK  (0xf << 0)
@@ -277,12 +280,15 @@
 
 #define DP_DSC_BLK_PREDICTION_SUPPORT   0x066
 # define DP_DSC_BLK_PREDICTION_IS_SUPPORTED (1 << 0)
+# define DP_DSC_RGB_COLOR_CONV_BYPASS_SUPPORT (1 << 1)
 
 #define DP_DSC_MAX_BITS_PER_PIXEL_LOW   0x067   /* eDP 1.4 */
 
 #define DP_DSC_MAX_BITS_PER_PIXEL_HI0x068   /* eDP 1.4 */
 # define DP_DSC_MAX_BITS_PER_PIXEL_HI_MASK  (0x3 << 0)
 # define DP_DSC_MAX_BITS_PER_PIXEL_HI_SHIFT 8
+# define DP_DSC_MAX_BPP_DELTA_VERSION_MASK  0x06
+# define DP_DSC_MAX_BPP_DELTA_AVAILABILITY  0x08
 
 #define DP_DSC_DEC_COLOR_FORMAT_CAP 0x069
 # define DP_DSC_RGB (1 << 0)
@@ -344,11 +350,13 @@
 # define DP_DSC_24_PER_DP_DSC_SINK  (1 << 2)
 
 #define DP_DSC_BITS_PER_PIXEL_INC   0x06F
+# define DP_DSC_RGB_YCbCr444_MAX_BPP_DELTA_MASK 0x1f
+# define DP_DSC_RGB_YCbCr420_MAX_BPP_DELTA_MASK 0xe0
 # define DP_DSC_BITS_PER_PIXEL_1_16 0x0
 # define DP_DSC_BITS_PER_PIXEL_1_8  0x1
 # define DP_DSC_BITS_PER_PIXEL_1_4  0x2
 # define DP_DSC_BITS_PER_PIXEL_1_2  0x3
-# define DP_DSC_BITS_PER_PIXEL_10x4
+# define DP_DSC_BITS_PER_PIXEL_1_1  0x4
 
 #define DP_PSR_SUPPORT  0x070   /* XXX 1.2? */
 # define DP_PSR_IS_SUPPORTED1
-- 
2.24.1.485.gad05a3d8e5

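As a usage illustration for the new pass-through bit, a short hedged sketch of
how a driver might probe it from the DSC DPCD range using the standard
drm_dp_dpcd_readb() helper; this is not code from the series.

/* Hedged sketch; not part of the series above. */
#include <drm/display/drm_dp_helper.h>

static bool sink_supports_dsc_passthrough(struct drm_dp_aux *aux)
{
	u8 dsc_cap = 0;

	if (drm_dp_dpcd_readb(aux, DP_DSC_SUPPORT, &dsc_cap) != 1)
		return false;

	/* bit 1 of DP_DSC_SUPPORT advertises pass-through on branch devices */
	return dsc_cap & DP_DSC_PASS_THROUGH_IS_SUPPORTED;
}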


Re: [PATCH] drm/i915/guc/slpc: Allow SLPC to use efficient frequency

2022-08-15 Thread Rodrigo Vivi
On Sun, Aug 14, 2022 at 04:46:54PM -0700, Vinay Belgaumkar wrote:
> Host Turbo operates at efficient frequency when GT is not idle unless
> the user or workload has forced it to a higher level. Replicate the same
> behavior in SLPC by allowing the algorithm to use efficient frequency.
> We had disabled it during boot due to concerns that it might break
> kernel ABI for min frequency. However, this is not the case since
> SLPC will still abide by the (min,max) range limits.
> 
> With this change, min freq will be at efficient frequency level at init
> instead of fused min (RPn). If user chooses to reduce min freq below the
> efficient freq, we will turn off usage of efficient frequency and honor
> the user request. When a higher value is written, it will get toggled
> back again.
> 
> The patch also corrects the register which needs to be read for obtaining
> the correct efficient frequency for Gen9+.
> 
> We see much better perf numbers with benchmarks like glmark2 with
> efficient frequency usage enabled as expected.
> 
> BugLink: https://gitlab.freedesktop.org/drm/intel/-/issues/5468
> 
> Cc: Rodrigo Vivi 

First of all sorry for looking to the old patch first... I was delayed in my 
inbox flow.

> Signed-off-by: Vinay Belgaumkar 
> ---
>  drivers/gpu/drm/i915/gt/intel_rps.c |  3 +
>  drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c | 66 +++--
>  drivers/gpu/drm/i915/intel_mchbar_regs.h|  3 +
>  3 files changed, 40 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c 
> b/drivers/gpu/drm/i915/gt/intel_rps.c
> index c7d381ad90cf..281a086fc265 100644
> --- a/drivers/gpu/drm/i915/gt/intel_rps.c
> +++ b/drivers/gpu/drm/i915/gt/intel_rps.c
> @@ -1108,6 +1108,9 @@ void gen6_rps_get_freq_caps(struct intel_rps *rps, 
> struct intel_rps_freq_caps *c
>   } else {
>   caps->rp0_freq = (rp_state_cap >>  0) & 0xff;
>   caps->rp1_freq = (rp_state_cap >>  8) & 0xff;
> + caps->rp1_freq = REG_FIELD_GET(RPE_MASK,
> +
> intel_uncore_read(to_gt(i915)->uncore,
> +GEN10_FREQ_INFO_REC));

This register is only gen10+ while the func is gen6+.
either we handle the platform properly or we create a new rpe_freq tracker 
somewhere
and if that's available we use this rpe, otherwise we use the hw fused rp1 
which is a good
enough, but it is not the actual one resolved by pcode, like this new RPe one.

>   caps->min_freq = (rp_state_cap >> 16) & 0xff;
>   }
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
> index e1fa1f32f29e..70a2af5f518d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
> @@ -465,6 +465,29 @@ int intel_guc_slpc_get_max_freq(struct intel_guc_slpc 
> *slpc, u32 *val)
>   return ret;
>  }
>  
> +static int slpc_ignore_eff_freq(struct intel_guc_slpc *slpc, bool ignore)

I know this code was already there, but I do have some questions around this
and maybe we can simplify now that we are touching this function.

> +{
> + int ret = 0;
> +
> + if (ignore) {
> + ret = slpc_set_param(slpc,
> +  SLPC_PARAM_IGNORE_EFFICIENT_FREQUENCY,
> +  ignore);
> + if (!ret)
> + return slpc_set_param(slpc,
> +   
> SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ,
> +   slpc->min_freq);

why do we need to touch this min request here?

> + } else {
> + ret = slpc_unset_param(slpc,
> +SLPC_PARAM_IGNORE_EFFICIENT_FREQUENCY);

do we really need the unset param?

for me using set_param(SLPC_PARAM_IGNORE_EFFICIENT_FREQUENCY, freq < rpe_freq)
was enough...

> + if (!ret)
> + return slpc_unset_param(slpc,
> + 
> SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ);
> + }
> +
> + return ret;
> +}
> +
>  /**
>   * intel_guc_slpc_set_min_freq() - Set min frequency limit for SLPC.
>   * @slpc: pointer to intel_guc_slpc.
> @@ -491,6 +514,14 @@ int intel_guc_slpc_set_min_freq(struct intel_guc_slpc 
> *slpc, u32 val)
>  
>   with_intel_runtime_pm(>runtime_pm, wakeref) {
>  
> + /* Ignore efficient freq if lower min freq is requested */
> + ret = slpc_ignore_eff_freq(slpc, val < slpc->rp1_freq);
> + if (unlikely(ret)) {
> + i915_probe_error(i915, "Failed to toggle efficient freq 
> (%pe)\n",
> +  ERR_PTR(ret));
> + return ret;
> + }
> +
>   ret = slpc_set_param(slpc,
>SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ,
>val);
> 

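For reference, a hedged sketch of the simplification suggested above, where the
driver only flips SLPC_PARAM_IGNORE_EFFICIENT_FREQUENCY based on whether the
requested min sits below RPe; it illustrates the review comment, not the posted
patch.

/* Hedged sketch of the reviewer's suggestion; not the posted patch. */
static int slpc_update_ignore_eff_freq(struct intel_guc_slpc *slpc, u32 min_freq)
{
	/*
	 * Ignore the efficient frequency only while the requested min is
	 * below RPe; otherwise let SLPC keep RPe as its idle floor.
	 */
	return slpc_set_param(slpc,
			      SLPC_PARAM_IGNORE_EFFICIENT_FREQUENCY,
			      min_freq < slpc->rp1_freq);
}
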
[PATCH] drm/nouveau/hwmon: use simplified HWMON_CHANNEL_INFO macro

2022-08-15 Thread Beniamin Sandu
This makes the code look cleaner and easier to read.

Signed-off-by: Beniamin Sandu 
---
 drivers/gpu/drm/nouveau/nouveau_hwmon.c | 85 +
 1 file changed, 17 insertions(+), 68 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_hwmon.c 
b/drivers/gpu/drm/nouveau/nouveau_hwmon.c
index 1c3104d20571..a7db7c31064b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_hwmon.c
+++ b/drivers/gpu/drm/nouveau/nouveau_hwmon.c
@@ -211,75 +211,24 @@ static const struct attribute_group 
temp1_auto_point_sensor_group = {
 
 #define N_ATTR_GROUPS   3
 
-static const u32 nouveau_config_chip[] = {
-   HWMON_C_UPDATE_INTERVAL,
-   0
-};
-
-static const u32 nouveau_config_in[] = {
-   HWMON_I_INPUT | HWMON_I_MIN | HWMON_I_MAX | HWMON_I_LABEL,
-   0
-};
-
-static const u32 nouveau_config_temp[] = {
-   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_MAX_HYST |
-   HWMON_T_CRIT | HWMON_T_CRIT_HYST | HWMON_T_EMERGENCY |
-   HWMON_T_EMERGENCY_HYST,
-   0
-};
-
-static const u32 nouveau_config_fan[] = {
-   HWMON_F_INPUT,
-   0
-};
-
-static const u32 nouveau_config_pwm[] = {
-   HWMON_PWM_INPUT | HWMON_PWM_ENABLE,
-   0
-};
-
-static const u32 nouveau_config_power[] = {
-   HWMON_P_INPUT | HWMON_P_CAP_MAX | HWMON_P_CRIT,
-   0
-};
-
-static const struct hwmon_channel_info nouveau_chip = {
-   .type = hwmon_chip,
-   .config = nouveau_config_chip,
-};
-
-static const struct hwmon_channel_info nouveau_temp = {
-   .type = hwmon_temp,
-   .config = nouveau_config_temp,
-};
-
-static const struct hwmon_channel_info nouveau_fan = {
-   .type = hwmon_fan,
-   .config = nouveau_config_fan,
-};
-
-static const struct hwmon_channel_info nouveau_in = {
-   .type = hwmon_in,
-   .config = nouveau_config_in,
-};
-
-static const struct hwmon_channel_info nouveau_pwm = {
-   .type = hwmon_pwm,
-   .config = nouveau_config_pwm,
-};
-
-static const struct hwmon_channel_info nouveau_power = {
-   .type = hwmon_power,
-   .config = nouveau_config_power,
-};
-
 static const struct hwmon_channel_info *nouveau_info[] = {
-   &nouveau_chip,
-   &nouveau_temp,
-   &nouveau_fan,
-   &nouveau_in,
-   &nouveau_pwm,
-   &nouveau_power,
+   HWMON_CHANNEL_INFO(chip,
+  HWMON_C_UPDATE_INTERVAL),
+   HWMON_CHANNEL_INFO(temp,
+  HWMON_T_INPUT |
+  HWMON_T_MAX | HWMON_T_MAX_HYST |
+  HWMON_T_CRIT | HWMON_T_CRIT_HYST |
+  HWMON_T_EMERGENCY | HWMON_T_EMERGENCY_HYST),
+   HWMON_CHANNEL_INFO(fan,
+  HWMON_F_INPUT),
+   HWMON_CHANNEL_INFO(in,
+  HWMON_I_INPUT |
+  HWMON_I_MIN | HWMON_I_MAX |
+  HWMON_I_LABEL),
+   HWMON_CHANNEL_INFO(pwm,
+  HWMON_PWM_INPUT | HWMON_PWM_ENABLE),
+   HWMON_CHANNEL_INFO(power,
+  HWMON_P_INPUT | HWMON_P_CAP_MAX | HWMON_P_CRIT),
NULL
 };
 
-- 
2.25.1

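For readers unfamiliar with the helper, HWMON_CHANNEL_INFO() in <linux/hwmon.h>
bundles the channel type and a zero-terminated config array into an anonymous
compound literal, which is why the hand-rolled arrays above can be dropped.
Roughly (paraphrased from the header, so double-check against your tree):

#define HWMON_CHANNEL_INFO(stype, ...)		\
	(&(struct hwmon_channel_info) {		\
		.type = hwmon_##stype,		\
		.config = (u32 []) {		\
			__VA_ARGS__, 0		\
		}				\
	})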


Re: [PATCH v3] drm/msm/dp: check hpd_state before push idle pattern at dp_bridge_disable()

2022-08-15 Thread Stephen Boyd
Quoting Kuogee Hsieh (2022-08-11 08:20:01)
>
> On 8/10/2022 6:00 PM, Abhinav Kumar wrote:
> >
> > Even then, you do have a valid point. DRM framework should not have
> > caused the disable path to happen without an enable.
> >
> > I went through the stack mentioned in the issue.
> >
> > Lets see this part of the stack:
> >
> > dp_ctrl_push_idle+0x40/0x88
> >  dp_bridge_disable+0x24/0x30
> >  drm_atomic_bridge_chain_disable+0x90/0xbc
> >  drm_atomic_helper_commit_modeset_disables+0x198/0x444
> >  msm_atomic_commit_tail+0x1d0/0x374
> >
> > In drm_atomic_helper_commit_modeset_disables(), we call
> > disable_outputs().
> >
> > AFAICT, this is the only place which has a protection to not call the
> > disable() flow if it was not enabled here:
> >
> > https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_helper.c#L1063
> >
> >
> > But that function is only checking crtc_state->active. That's set by
> > the usermode:
> >
> > https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_uapi.c#L407
> >
> >
> > Now, if usermode sets that to true and then crashes, it can bypass this
> > check and we will crash in the location Kuogee is trying to fix.

That seems bad, no? We don't want userspace to be able to crash and then
be able to call the disable path when enable never succeeded.

> >
> > From the issue mentioned in
> > https://gitlab.freedesktop.org/drm/msm/-/issues/17, the reporter did
> > mention the usermode crashed.
> >
> > So this is my tentative analysis of what's happening here.
> >
> > Ideally yes, we should have been protected by the location mentioned
> > above in disable_outputs(), but it looks to me like, due to the above
> > hypothesis, it's getting bypassed.

Can you fix the problem there? Not fixing it means that every driver out
there has to develop the same "fix", when it could be fixed in the core
one time.

Ideally drivers are simple. They configure the hardware for what the
function pointer is asking for. State management and things like that
should be pushed into the core framework so that we don't have to
duplicate that multiple times.

> >
> > Thanks
> >
> > Abhinav
> >
> >
> It sounds like there is a hole either in user space or DRM.
>
> But that should not cause dp_bridge_disable() in the DP driver to crash.

Agreed.

>
> Therefore it is proper to check the hpd_state condition at
> dp_bridge_disable() to prevent it from crashing.
>

Disagree. Userspace shouldn't be able to get drm into a wedged state.

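For concreteness, the driver-side guard being debated would look roughly like
the sketch below. The state value ST_CONNECTED and the bridge_to_dp_priv()
helper are assumptions for illustration only, since the patch itself is not
quoted in this thread.

/* Hypothetical sketch of the guard under discussion; names are assumed. */
static void dp_bridge_disable(struct drm_bridge *bridge)
{
	struct dp_display_private *dp = bridge_to_dp_priv(bridge); /* assumed helper */

	/* Don't push the idle pattern if the link was never brought up */
	if (dp->hpd_state != ST_CONNECTED)
		return;

	dp_ctrl_push_idle(dp->ctrl);
}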

[PATCH RFC 0/1] Imagination Technologies PowerVR DRM driver

2022-08-15 Thread Sarah Walker
This patch adds the initial DRM driver for Imagination Technologies PowerVR
GPUs, starting with those based on our Rogue architecture. It's worth pointing
out that this is a new driver, written from the ground up, rather than a
refactored version of our existing downstream driver (pvrsrvkm).

This new DRM driver supports:
- GEM shmem allocations
- dma-buf / PRIME
- Per-context (device node open) userspace managed virtual address space
- Implicit sync / reservation objects
- DRM sync objects
- Power management suspend / resume
- Render job submission
- Compute job submission
- META firmware processor
- MIPS firmware processor
- Basic GPU hang recovery

Still to do:
- Transfer job submission (needed for Vulkan)
- No-op job submission (needed for Vulkan)
- Support RISC-V firmware processor
- GPU hang detection
- Handling for running out of parameter buffer space
- DVFS

Currently our main focus is on our GX6250, AXE-1-16M and BXS-4-64 GPUs. Testing
so far has been done using an Acer Chromebook R13 (GX6250 GPU) and a TI SK-AM62
board (AXE-1-16M GPU). Firmware for the GX6250 and AXE-1-16M can be found here:
https://gitlab.freedesktop.org/frankbinns/linux-firmware/-/tree/powervr

A Vulkan driver that works with our downstream kernel driver has already been
merged into Mesa [1][2]. Support for this new DRM driver is being maintained in
a draft merge request [3], with the branch located here:
https://gitlab.freedesktop.org/frankbinns/mesa/-/tree/powervr-winsys

The Vulkan driver is progressing towards Vulkan 1.0. We've got several APIs left
to implement and a bunch that are queued up. We've mainly been running the
Sascha Willems 'triangle' example to verify that we've not regressed anything,
along with the 'pipelines' example and a simple compute example. We've not yet
done a full conformance run, so I don't have any numbers to share just yet.

The code in this patch, along with some of its history, can also be found here:
https://gitlab.freedesktop.org/frankbinns/powervr/-/tree/powervr-next

Sending this out now as it felt like a good point to get some feedback.

[1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15243
[2] https://gitlab.freedesktop.org/mesa/mesa/-/tree/main/src/imagination/vulkan
[3] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15507

Sarah Walker (1):
  drm/imagination: Add initial Imagination Technologies PowerVR driver

 drivers/gpu/drm/Kconfig   |2 +
 drivers/gpu/drm/Makefile  |1 +
 drivers/gpu/drm/imagination/Kconfig   |   11 +
 drivers/gpu/drm/imagination/Makefile  |   37 +
 drivers/gpu/drm/imagination/pvr_ccb.c |  348 +
 drivers/gpu/drm/imagination/pvr_ccb.h |   50 +
 drivers/gpu/drm/imagination/pvr_cccb.c|  326 +
 drivers/gpu/drm/imagination/pvr_cccb.h|  103 +
 drivers/gpu/drm/imagination/pvr_context.c |  607 ++
 drivers/gpu/drm/imagination/pvr_context.h |  176 +
 drivers/gpu/drm/imagination/pvr_debugfs.c |   53 +
 drivers/gpu/drm/imagination/pvr_debugfs.h |   29 +
 drivers/gpu/drm/imagination/pvr_device.c  |  759 ++
 drivers/gpu/drm/imagination/pvr_device.h  |  721 ++
 drivers/gpu/drm/imagination/pvr_device_info.c |  207 +
 drivers/gpu/drm/imagination/pvr_device_info.h |  125 +
 drivers/gpu/drm/imagination/pvr_drv.c | 1118 +++
 drivers/gpu/drm/imagination/pvr_drv.h |   19 +
 drivers/gpu/drm/imagination/pvr_fence.c   |  446 ++
 drivers/gpu/drm/imagination/pvr_fence.h   |  168 +
 drivers/gpu/drm/imagination/pvr_free_list.c   |  407 ++
 drivers/gpu/drm/imagination/pvr_free_list.h   |  142 +
 drivers/gpu/drm/imagination/pvr_fw.c  | 1028 +++
 drivers/gpu/drm/imagination/pvr_fw.h  |  329 +
 drivers/gpu/drm/imagination/pvr_fw_info.h |  106 +
 drivers/gpu/drm/imagination/pvr_fw_meta.c |  597 ++
 drivers/gpu/drm/imagination/pvr_fw_meta.h |   14 +
 drivers/gpu/drm/imagination/pvr_fw_mips.c |  276 +
 drivers/gpu/drm/imagination/pvr_fw_mips.h |   38 +
 .../gpu/drm/imagination/pvr_fw_startstop.c|  279 +
 .../gpu/drm/imagination/pvr_fw_startstop.h|   13 +
 drivers/gpu/drm/imagination/pvr_fw_trace.c|  505 ++
 drivers/gpu/drm/imagination/pvr_fw_trace.h|   78 +
 drivers/gpu/drm/imagination/pvr_gem.c | 1082 +++
 drivers/gpu/drm/imagination/pvr_gem.h |  374 +
 drivers/gpu/drm/imagination/pvr_hwrt.c|  548 ++
 drivers/gpu/drm/imagination/pvr_hwrt.h|  165 +
 drivers/gpu/drm/imagination/pvr_job.c | 1208 
 drivers/gpu/drm/imagination/pvr_job.h |   34 +
 drivers/gpu/drm/imagination/pvr_object.c  |  221 +
 drivers/gpu/drm/imagination/pvr_object.h  |   60 +
 drivers/gpu/drm/imagination/pvr_params.c  |  147 +
 drivers/gpu/drm/imagination/pvr_params.h  |   72 +
 drivers/gpu/drm/imagination/pvr_power.c   |  196 +
 drivers/gpu/drm/imagination/pvr_power.h   |   37 +
 .../gpu/drm/imagination/pvr_rogue_cr_defs.h   | 6193 

Re: [PATCH linux-next] drm/amdgpu/vcn: Remove unused assignment in vcn_v4_0_stop

2022-08-15 Thread Khalid Masum

On 8/15/22 22:12, Greg KH wrote:

On Mon, Aug 15, 2022 at 09:11:18PM +0600, Khalid Masum wrote:

On 8/15/22 20:15, Dong, Ruijing wrote:

[AMD Official Use Only - General]

Sorry, which "r" value was overwritten?  I didn't see the point of making this 
change.

Thanks
Ruijing

-Original Message-
From: Khalid Masum 
Sent: Monday, August 15, 2022 3:01 AM
To: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; 
linux-ker...@vger.kernel.org; linux-kernel-ment...@lists.linuxfoundation.org
Cc: Deucher, Alexander ; Koenig, Christian ; Pan, Xinhui 
; David Airlie ; Daniel Vetter ; Zhu, James ; 
Jiang, Sonny ; Dong, Ruijing ; Wan Jiabing ; Liu, Leo 
; Khalid Masum 
Subject: [PATCH linux-next] drm/amdgpu/vcn: Remove unused assignment in 
vcn_v4_0_stop

The value assigned from vcn_v4_0_stop_dbg_mode to r is overwritten before it 
can be used. Remove this assignment.

Addresses-Coverity: 1504988 ("Unused value")
Fixes: 8da1170a16e4 ("drm/amdgpu: add VCN4 ip block support")
Signed-off-by: Khalid Masum 
---
   drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
index ca14c3ef742e..80b8a2c66b36 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
@@ -1154,7 +1154,7 @@ static int vcn_v4_0_stop(struct amdgpu_device *adev)
  fw_shared->sq.queue_mode |= FW_QUEUE_DPG_HOLD_OFF;

  if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) {
-   r = vcn_v4_0_stop_dpg_mode(adev, i);
+   vcn_v4_0_stop_dpg_mode(adev, i);
  continue;
  }

--
2.37.1



The assigned value is overwritten soon after, right after the lines shown in the diff.

See:
drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c

static int vcn_v4_0_stop(struct amdgpu_device *adev)
{
 volatile struct amdgpu_vcn4_fw_shared *fw_shared;
...

 for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
 fw_shared = adev->vcn.inst[i].fw_shared.cpu_addr;
 fw_shared->sq.queue_mode |= FW_QUEUE_DPG_HOLD_OFF;

 if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) {
 r = vcn_v4_0_stop_dpg_mode(adev, i);
 continue;
 }

 /* wait for vcn idle */
 r = SOC15_WAIT_ON_RREG(VCN, i, regUVD_STATUS,
UVD_STATUS__IDLE, 0x7);

Here, any value assigned to r is overwritten before it could
be used. So the assignment in the true branch of the if statement
here can be removed.


Why not fix vcn_v4_0_stop_dpg_mode() to not return anything, as it does
not, and then remove this assignment as well, which would fix up
everything at once to be more obvious what is happening and why.


That makes sense. I shall send a v2 this way. Thanks for your suggestion.



thanks,

greg k-h


thanks,
  -- Khalid Masum



Re: [Intel-gfx] [PATCH] drm/i915/guc/slpc: Allow SLPC to use efficient frequency

2022-08-15 Thread Belgaumkar, Vinay



On 8/15/2022 9:51 AM, Rodrigo Vivi wrote:

On Tue, Aug 09, 2022 at 05:03:06PM -0700, Vinay Belgaumkar wrote:

Host Turbo operates at efficient frequency when GT is not idle unless
the user or workload has forced it to a higher level. Replicate the same
behavior in SLPC by allowing the algorithm to use efficient frequency.
We had disabled it during boot due to concerns that it might break
kernel ABI for min frequency. However, this is not the case, since
SLPC will still abide by the (min,max) range limits and pcode forces
frequency to 0 anyways when GT is in C6.

We also see much better perf numbers with benchmarks like glmark2 with
efficient frequency usage enabled.

Fixes: 025cb07bebfa ("drm/i915/guc/slpc: Cache platform frequency limits")

Signed-off-by: Vinay Belgaumkar 

I'm honestly surprised that our CI passed cleanly. What happens when the user
requests both min and max < RPe?

I'm sure that in this case GuC SLPC will put us at RPe, ignoring our requests.
Or is this good enough for the users' expectations because the soft limits
show the requested freq and we are not asking GuC what it currently has as
the minimum?

I just want to be sure that we are not causing any confusion for end users
out there in the case they request some min/max below RPe and start seeing
mismatches against their expectations because GuC is forcing the real min request
to RPe.

My suggestion is to ignore the RPe whenever we have a min request below it,
so GuC respects our (and users') chosen min, and restore it whenever the min request
is above RPe.


Yup, I already sent a patch yesterday with that change; it doesn't 
look like CI has run on it yet. This was the old version.


Thanks,

Vinay.



Thanks,
Rodrigo.


---
  drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c | 52 -
  1 file changed, 52 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
index e1fa1f32f29e..4b824da3048a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
@@ -137,17 +137,6 @@ static int guc_action_slpc_set_param(struct intel_guc 
*guc, u8 id, u32 value)
return ret > 0 ? -EPROTO : ret;
  }
  
-static int guc_action_slpc_unset_param(struct intel_guc *guc, u8 id)

-{
-   u32 request[] = {
-   GUC_ACTION_HOST2GUC_PC_SLPC_REQUEST,
-   SLPC_EVENT(SLPC_EVENT_PARAMETER_UNSET, 1),
-   id,
-   };
-
-   return intel_guc_send(guc, request, ARRAY_SIZE(request));
-}
-
  static bool slpc_is_running(struct intel_guc_slpc *slpc)
  {
return slpc_get_state(slpc) == SLPC_GLOBAL_STATE_RUNNING;
@@ -201,16 +190,6 @@ static int slpc_set_param(struct intel_guc_slpc *slpc, u8 
id, u32 value)
return ret;
  }
  
-static int slpc_unset_param(struct intel_guc_slpc *slpc,

-   u8 id)
-{
-   struct intel_guc *guc = slpc_to_guc(slpc);
-
-   GEM_BUG_ON(id >= SLPC_MAX_PARAM);
-
-   return guc_action_slpc_unset_param(guc, id);
-}
-
  static int slpc_force_min_freq(struct intel_guc_slpc *slpc, u32 freq)
  {
struct drm_i915_private *i915 = slpc_to_i915(slpc);
@@ -597,29 +576,6 @@ static int slpc_set_softlimits(struct intel_guc_slpc *slpc)
return 0;
  }
  
-static int slpc_ignore_eff_freq(struct intel_guc_slpc *slpc, bool ignore)

-{
-   int ret = 0;
-
-   if (ignore) {
-   ret = slpc_set_param(slpc,
-SLPC_PARAM_IGNORE_EFFICIENT_FREQUENCY,
-ignore);
-   if (!ret)
-   return slpc_set_param(slpc,
- 
SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ,
- slpc->min_freq);
-   } else {
-   ret = slpc_unset_param(slpc,
-  SLPC_PARAM_IGNORE_EFFICIENT_FREQUENCY);
-   if (!ret)
-   return slpc_unset_param(slpc,
-   
SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ);
-   }
-
-   return ret;
-}
-
  static int slpc_use_fused_rp0(struct intel_guc_slpc *slpc)
  {
/* Force SLPC to used platform rp0 */
@@ -679,14 +635,6 @@ int intel_guc_slpc_enable(struct intel_guc_slpc *slpc)
  
  	slpc_get_rp_values(slpc);
  
-	/* Ignore efficient freq and set min to platform min */

-   ret = slpc_ignore_eff_freq(slpc, true);
-   if (unlikely(ret)) {
-   i915_probe_error(i915, "Failed to set SLPC min to RPn (%pe)\n",
-ERR_PTR(ret));
-   return ret;
-   }
-
/* Set SLPC max limit to RP0 */
ret = slpc_use_fused_rp0(slpc);
if (unlikely(ret)) {
--
2.35.1



Re: [PATCH 23/33] drm/vc4: hdmi: Move HDMI reset to pm_resume

2022-08-15 Thread Florian Fainelli

On 8/15/22 07:12, Maxime Ripard wrote:

On Wed, Aug 10, 2022 at 10:33:48PM +0200, Stefan Wahren wrote:

Hi Florian,

Am 09.08.22 um 21:02 schrieb Florian Fainelli:

On 8/4/22 16:11, Florian Fainelli wrote:

On 6/13/22 07:47, Maxime Ripard wrote:

From: Dave Stevenson 

The BCM2835-37 found in the RaspberryPi 0 to 3 have a power domain
attached to the HDMI block, handled in Linux through runtime_pm.

That power domain is shared with the VEC block, so even if we put our
runtime_pm reference in the HDMI driver it would keep being on. If the
VEC is disabled though, the power domain would be disabled and we would
lose any initialization done in our bind implementation.

That initialization involves calling the reset function and
initializing
the CEC registers.

Let's move the initialization to our runtime_resume implementation so
that we initialize everything properly if we ever need to.

Fixes: c86b41214362 ("drm/vc4: hdmi: Move the HSM clock enable
to runtime_pm")
Signed-off-by: Dave Stevenson 
Signed-off-by: Maxime Ripard 


After seeing the same warning as Stefan reported in the link below,
but on the Raspberry Pi 4B:

https://www.spinics.net/lists/dri-devel/msg354170.html

a separate bisection effort led me to this commit, before is fine,
after produces 4 warnings during boot, see attached log.

Is there a fix that we can try that would also cover the Raspberry
Pi 4B? Is it possible that this series precipitates the problem:

https://www.spinics.net/lists/arm-kernel/msg984638.html


Maxime, Dave, anything you would want me to try? Still seeing these
warnings with net-next-6.0-11220-g15205c2829ca


At first this issue doesn't occur in Linux 5.19. So it's something new. I
was able to reproduce with todays linux-next, but interestingly it doesn't
occur in drm-misc-next.


Yeah, it should be fixed by
https://lore.kernel.org/all/20220629123510.1915022-38-max...@cerno.tech/
https://lore.kernel.org/all/20220629123510.1915022-39-max...@cerno.tech/

Both patches apparently didn't make the cut for the merge window, if it
works for you we can always queue them in drm-misc-fixes


Both of these patches eliminate the warning; however, I don't have a good set-up 
yet for ensuring that HDMI/V3D is functional:


Tested-by: Florian Fainelli 
--
Florian


Re: [Intel-gfx] [PATCH] drm/i915/guc/slpc: Allow SLPC to use efficient frequency

2022-08-15 Thread Rodrigo Vivi
On Tue, Aug 09, 2022 at 05:03:06PM -0700, Vinay Belgaumkar wrote:
> Host Turbo operates at efficient frequency when GT is not idle unless
> the user or workload has forced it to a higher level. Replicate the same
> behavior in SLPC by allowing the algorithm to use efficient frequency.
> We had disabled it during boot due to concerns that it might break
> kernel ABI for min frequency. However, this is not the case, since
> SLPC will still abide by the (min,max) range limits and pcode forces
> frequency to 0 anyways when GT is in C6.
> 
> We also see much better perf numbers with benchmarks like glmark2 with
> efficient frequency usage enabled.
> 
> Fixes: 025cb07bebfa ("drm/i915/guc/slpc: Cache platform frequency limits")
> 
> Signed-off-by: Vinay Belgaumkar 

I'm honestly surprised that our CI passed cleanly. What happens when the user
requests both min and max < RPe?

I'm sure that in this case GuC SLPC will put us at RPe, ignoring our requests.
Or is this good enough for the users' expectations because the soft limits
show the requested freq and we are not asking GuC what it currently has as
the minimum?

I just want to be sure that we are not causing any confusion for end users
out there in the case they request some min/max below RPe and start seeing
mismatches against their expectations because GuC is forcing the real min request
to RPe.

My suggestion is to ignore the RPe whenever we have a min request below it,
so GuC respects our (and users') chosen min, and restore it whenever the min request
is above RPe.

Thanks,
Rodrigo.

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c | 52 -
>  1 file changed, 52 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
> index e1fa1f32f29e..4b824da3048a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
> @@ -137,17 +137,6 @@ static int guc_action_slpc_set_param(struct intel_guc 
> *guc, u8 id, u32 value)
>   return ret > 0 ? -EPROTO : ret;
>  }
>  
> -static int guc_action_slpc_unset_param(struct intel_guc *guc, u8 id)
> -{
> - u32 request[] = {
> - GUC_ACTION_HOST2GUC_PC_SLPC_REQUEST,
> - SLPC_EVENT(SLPC_EVENT_PARAMETER_UNSET, 1),
> - id,
> - };
> -
> - return intel_guc_send(guc, request, ARRAY_SIZE(request));
> -}
> -
>  static bool slpc_is_running(struct intel_guc_slpc *slpc)
>  {
>   return slpc_get_state(slpc) == SLPC_GLOBAL_STATE_RUNNING;
> @@ -201,16 +190,6 @@ static int slpc_set_param(struct intel_guc_slpc *slpc, 
> u8 id, u32 value)
>   return ret;
>  }
>  
> -static int slpc_unset_param(struct intel_guc_slpc *slpc,
> - u8 id)
> -{
> - struct intel_guc *guc = slpc_to_guc(slpc);
> -
> - GEM_BUG_ON(id >= SLPC_MAX_PARAM);
> -
> - return guc_action_slpc_unset_param(guc, id);
> -}
> -
>  static int slpc_force_min_freq(struct intel_guc_slpc *slpc, u32 freq)
>  {
>   struct drm_i915_private *i915 = slpc_to_i915(slpc);
> @@ -597,29 +576,6 @@ static int slpc_set_softlimits(struct intel_guc_slpc 
> *slpc)
>   return 0;
>  }
>  
> -static int slpc_ignore_eff_freq(struct intel_guc_slpc *slpc, bool ignore)
> -{
> - int ret = 0;
> -
> - if (ignore) {
> - ret = slpc_set_param(slpc,
> -  SLPC_PARAM_IGNORE_EFFICIENT_FREQUENCY,
> -  ignore);
> - if (!ret)
> - return slpc_set_param(slpc,
> -   
> SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ,
> -   slpc->min_freq);
> - } else {
> - ret = slpc_unset_param(slpc,
> -SLPC_PARAM_IGNORE_EFFICIENT_FREQUENCY);
> - if (!ret)
> - return slpc_unset_param(slpc,
> - 
> SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ);
> - }
> -
> - return ret;
> -}
> -
>  static int slpc_use_fused_rp0(struct intel_guc_slpc *slpc)
>  {
>   /* Force SLPC to used platform rp0 */
> @@ -679,14 +635,6 @@ int intel_guc_slpc_enable(struct intel_guc_slpc *slpc)
>  
>   slpc_get_rp_values(slpc);
>  
> - /* Ignore efficient freq and set min to platform min */
> - ret = slpc_ignore_eff_freq(slpc, true);
> - if (unlikely(ret)) {
> - i915_probe_error(i915, "Failed to set SLPC min to RPn (%pe)\n",
> -  ERR_PTR(ret));
> - return ret;
> - }
> -
>   /* Set SLPC max limit to RP0 */
>   ret = slpc_use_fused_rp0(slpc);
>   if (unlikely(ret)) {
> -- 
> 2.35.1
> 


Re: [PATCH linux-next] drm/amdgpu/vcn: Remove unused assignment in vcn_v4_0_stop

2022-08-15 Thread Greg KH
On Mon, Aug 15, 2022 at 09:11:18PM +0600, Khalid Masum wrote:
> On 8/15/22 20:15, Dong, Ruijing wrote:
> > [AMD Official Use Only - General]
> > 
> > Sorry, which "r" value was overwritten?  I didn't see the point of making 
> > this change.
> > 
> > Thanks
> > Ruijing
> > 
> > -Original Message-
> > From: Khalid Masum 
> > Sent: Monday, August 15, 2022 3:01 AM
> > To: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; 
> > linux-ker...@vger.kernel.org; linux-kernel-ment...@lists.linuxfoundation.org
> > Cc: Deucher, Alexander ; Koenig, Christian 
> > ; Pan, Xinhui ; David Airlie 
> > ; Daniel Vetter ; Zhu, James 
> > ; Jiang, Sonny ; Dong, Ruijing 
> > ; Wan Jiabing ; Liu, Leo 
> > ; Khalid Masum 
> > Subject: [PATCH linux-next] drm/amdgpu/vcn: Remove unused assignment in 
> > vcn_v4_0_stop
> > 
> > The value assigned from vcn_v4_0_stop_dpg_mode to r is overwritten before 
> > it can be used. Remove this assignment.
> > 
> > Addresses-Coverity: 1504988 ("Unused value")
> > Fixes: 8da1170a16e4 ("drm/amdgpu: add VCN4 ip block support")
> > Signed-off-by: Khalid Masum 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c 
> > b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
> > index ca14c3ef742e..80b8a2c66b36 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
> > @@ -1154,7 +1154,7 @@ static int vcn_v4_0_stop(struct amdgpu_device *adev)
> >  fw_shared->sq.queue_mode |= FW_QUEUE_DPG_HOLD_OFF;
> > 
> >  if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) {
> > -   r = vcn_v4_0_stop_dpg_mode(adev, i);
> > +   vcn_v4_0_stop_dpg_mode(adev, i);
> >  continue;
> >  }
> > 
> > --
> > 2.37.1
> > 
> 
> The value is overwritten again soon after, right after the code in the diff.
> 
> See:
> drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
> 
> static int vcn_v4_0_stop(struct amdgpu_device *adev)
> {
> volatile struct amdgpu_vcn4_fw_shared *fw_shared;
> ...
> 
> for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
> fw_shared = adev->vcn.inst[i].fw_shared.cpu_addr;
> fw_shared->sq.queue_mode |= FW_QUEUE_DPG_HOLD_OFF;
> 
> if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) {
> r = vcn_v4_0_stop_dpg_mode(adev, i);
> continue;
> }
> 
> /* wait for vcn idle */
> r = SOC15_WAIT_ON_RREG(VCN, i, regUVD_STATUS,
> UVD_STATUS__IDLE, 0x7);
> 
> Here, any value assigned to r is overwritten before it could
> be used. So the assignment in the true branch of the if statement
> here can be removed.

Why not fix vcn_v4_0_stop_dpg_mode() to not return anything, as its return
value is never used, and then remove this assignment as well? That would fix up
everything at once and make it more obvious what is happening and why.

thanks,

greg k-h


RE: [PATCH linux-next] drm/amdgpu/vcn: Remove unused assignment in vcn_v4_0_stop

2022-08-15 Thread Dong, Ruijing
[AMD Official Use Only - General]

Then please update the commit message; wording it as "the value r is never 
used, so remove the unnecessary assignment" makes sense to me.

Thanks
Ruijing

-Original Message-
From: Khalid Masum 
Sent: Monday, August 15, 2022 11:54 AM
To: Dong, Ruijing ; amd-...@lists.freedesktop.org; 
dri-devel@lists.freedesktop.org; linux-ker...@vger.kernel.org; 
linux-kernel-ment...@lists.linuxfoundation.org
Cc: Deucher, Alexander ; Koenig, Christian 
; Pan, Xinhui ; David Airlie 
; Daniel Vetter ; Zhu, James 
; Jiang, Sonny ; Wan Jiabing 
; Liu, Leo 
Subject: Re: [PATCH linux-next] drm/amdgpu/vcn: Remove unused assignment in 
vcn_v4_0_stop

On 8/15/22 21:17, Dong, Ruijing wrote:
> [AMD Official Use Only - General]
>
> If the condition was met and it came to execute vcn_v4_0_stop_dpg_mode, then 
> it would never have a chance to go for /* wait for vcn idle */, would it?

Hypothetically, some other thread might set adev->pg_flags to 0, and in that 
case it will get the chance to go for /* wait for vcn idle */.


> I still didn't see an obvious purpose for this change.
>
>  if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) {
>   ==>  r = vcn_v4_0_stop_dpg_mode(adev, i);

Regardless of that, this assignment to r is unnecessary because the
value of r is never used. This patch simply removes the unnecessary
assignment.

>   continue;
>   }
>
>   /* wait for vcn idle */
>   r = SOC15_WAIT_ON_RREG(VCN, i, regUVD_STATUS, 
> UVD_STATUS__IDLE, 0x7);
>
> Thanks
> Ruijing
>

Thanks,
   -- Khalid Masum


Re: [PATCH] drm/hyperv: Fix an error handling path in hyperv_vmbus_probe()

2022-08-15 Thread Wei Liu
On Fri, Aug 05, 2022 at 06:35:01PM +, Michael Kelley (LINUX) wrote:
> From: Christophe JAILLET  Sent: Sunday, July 
> 31, 2022 1:02 PM
> > 
> > hyperv_setup_vram() calls vmbus_allocate_mmio().
> > This must be undone in the error handling path of the probe, as already
> > done in the remove function.
> > 
> > This patch depends on commit a0ab5abced55 ("drm/hyperv : Removing the
> > restruction of VRAM allocation with PCI bar size").
> > Without it, something like what is done in commit e048834c209a
> > ("drm/hyperv: Fix device removal on Gen1 VMs") should be done.
> 
> Should the above paragraph be below the '---' as a comment, rather than
> part of the commit message?  It's more about staging instructions than a
> long-term record of the actual functional/code change.
> 

I don't think this paragraph needs to be in the final commit message.

> > 
> > Fixes: 76c56a5affeb ("drm/hyperv: Add DRM driver for hyperv synthetic video 
> > device")
> 
> I wonder if the Fixes: dependency should be on a0ab5abced55.  As you noted,
> this patch won't apply cleanly on stable kernel versions that lack that 
> commit,
> so we'll need a separate patch for stable if we want to make the fix there.
> 

I think a0ab5abced55 is more appropriate.

> > Signed-off-by: Christophe JAILLET 
> 
> All that said, the fix looks good, so
> 
> Reviewed-by: Michael Kelley 

I made the two changes listed above and applied this patch to
hyperv-fixes.

Thanks,
Wei.


Re: (subset) [PATCH v2 0/7] Devm helpers for regulator get and enable

2022-08-15 Thread Laurent Pinchart
On Mon, Aug 15, 2022 at 04:44:44PM +0100, Mark Brown wrote:
> On Fri, 12 Aug 2022 13:08:17 +0300, Matti Vaittinen wrote:
> > Devm helpers for regulator get and enable
> > 
> > First patch in the series is actually just a simple documentation fix
> > which could be taken in as it is now.
> > 
> > A few* drivers seem to use pattern demonstrated by pseudocode:
> > 
> > [...]
> 
> Applied to
> 
>https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git 
> for-next
> 
> Thanks!
> 
> [1/7] docs: devres: regulator: Add missing devm_* functions to devres.rst
>   commit: 9b6744f60b6b47bc0757a1955adb4d2c3ab22e13
> [2/7] regulator: Add devm helpers for get and enable
>   (no commit info)

I didn't have time to reply to the series yet, but I think this isn't a
great idea. There are two issues:

- With devres, you don't have full control over the order in which
  resources will be released, which means that you can't control the
  power off sequence, in particular if it needs to be sequenced with
  GPIOs and clocks. That's not a concern for all drivers, but this API
  will creep in in places where it shouldn't be used, driver authors
  should really pay attention to power management and not live with the
  false impression that everything will be handled automatically for
  them. In the worst cases, an incorrect power off sequence could lead
  to hardware damage.

- Powering regulators on at probe time and leaving them on is a very bad
  practice from a power management point of view, and should really be
  discouraged. Adding convenience helpers to make this easy is the wrong
  message; we should instead push driver authors to implement proper
  runtime PM (a minimal sketch follows below).
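
Concretely, the kind of pattern I would rather see is the minimal sketch below
(a made-up "foo" platform driver with a single "vdd" supply, nothing taken from
the series): the get may be devres-managed, but the enable/disable stay
explicit, under runtime PM, so the driver keeps control of the power
sequencing.

#include <linux/err.h>
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/pm.h>
#include <linux/pm_runtime.h>
#include <linux/regulator/consumer.h>
#include <linux/slab.h>

struct foo {
        struct regulator *vdd;
};

static int foo_runtime_resume(struct device *dev)
{
        struct foo *foo = dev_get_drvdata(dev);

        /* power up only while the device is actually in use */
        return regulator_enable(foo->vdd);
}

static int foo_runtime_suspend(struct device *dev)
{
        struct foo *foo = dev_get_drvdata(dev);

        /* power down explicitly, ordered by the driver, not by devres teardown */
        return regulator_disable(foo->vdd);
}

static const struct dev_pm_ops foo_pm_ops = {
        SET_RUNTIME_PM_OPS(foo_runtime_suspend, foo_runtime_resume, NULL)
};

static int foo_probe(struct platform_device *pdev)
{
        struct foo *foo;

        foo = devm_kzalloc(&pdev->dev, sizeof(*foo), GFP_KERNEL);
        if (!foo)
                return -ENOMEM;

        /* getting the supply through devres is fine, it has no ordering impact */
        foo->vdd = devm_regulator_get(&pdev->dev, "vdd");
        if (IS_ERR(foo->vdd))
                return PTR_ERR(foo->vdd);

        platform_set_drvdata(pdev, foo);
        pm_runtime_enable(&pdev->dev);

        return 0;
}

static int foo_remove(struct platform_device *pdev)
{
        pm_runtime_disable(&pdev->dev);

        return 0;
}

static struct platform_driver foo_driver = {
        .probe = foo_probe,
        .remove = foo_remove,
        .driver = {
                .name = "foo",
                .pm = &foo_pm_ops,
        },
};
module_platform_driver(foo_driver);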

> All being well this means that it will be integrated into the linux-next
> tree (usually sometime in the next 24 hours) and sent to Linus during
> the next merge window (or sooner if it is a bug fix), however if
> problems are discovered then the patch may be dropped or reverted.
> 
> You may get further e-mails resulting from automated or manual testing
> and review of the tree, please engage with people reporting problems and
> send followup patches addressing any issues that are reported if needed.
> 
> If any updates are required or you are submitting further changes they
> should be sent as incremental updates against current git, existing
> patches will not be replaced.
> 
> Please add any relevant lists and maintainers to the CCs when replying
> to this mail.

-- 
Regards,

Laurent Pinchart


Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

2022-08-15 Thread Dmitry Osipenko
On 8/15/22 17:57, Dmitry Osipenko wrote:
> On 8/15/22 16:53, Christian König wrote:
>> Am 15.08.22 um 15:45 schrieb Dmitry Osipenko:
>>> [SNIP]
 Well that comment sounds like KVM is doing the right thing, so I'm
 wondering what exactly is going on here.
>>> KVM actually doesn't hold the page reference, it takes a temporary
>>> reference during the page fault and then drops the reference once the page is
>>> mapped, IIUC. Is it still illegal for TTM? Or is there a possibility for
>>> a race condition here?
>>>
>>
>> Well the question is why does KVM grab the page reference in the first
>> place?
>>
>> If that is to prevent the mapping from changing then yes that's illegal
>> and won't work. It can always happen that you grab the address, solve
>> the fault and then immediately fault again because the address you just
>> grabbed is invalidated.
>>
>> If it's for some other reason then we should probably investigate if we
>> shouldn't stop doing this.
> 
> CC: +Paolo Bonzini who introduced this code
> 
> commit add6a0cd1c5ba51b201e1361b05a5df817083618
> Author: Paolo Bonzini 
> Date:   Tue Jun 7 17:51:18 2016 +0200
> 
> KVM: MMU: try to fix up page faults before giving up
> 
> The vGPU folks would like to trap the first access to a BAR by setting
> vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault
> handler
> then can use remap_pfn_range to place some non-reserved pages in the
> VMA.
> 
> This kind of VM_PFNMAP mapping is not handled by KVM, but follow_pfn
> and fixup_user_fault together help supporting it.  The patch also
> supports
> VM_MIXEDMAP vmas where the pfns are not reserved and thus subject to
> reference counting.
> 
> @Paolo,
> https://lore.kernel.org/dri-devel/73e5ed8d-0d25-7d44-8fa2-e1d61b1f5...@amd.com/T/#m7647ce5f8c4749599d2c6bc15a2b45f8d8cf8154
> 

If we need to bump the refcount only for VM_MIXEDMAP and not for
VM_PFNMAP, then perhaps we could add a flag for that to the kvm_main
code that will denote to kvm_release_page_clean whether it needs to put
the page?
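
Purely as an illustration of that idea (the structure and helper below are
hypothetical, not existing KVM API; only put_page() is real):

struct kvm_pfn_ref {
        struct page *page;
        bool refcounted;        /* set only if the fault path actually took a reference */
};

static void kvm_pfn_ref_release(struct kvm_pfn_ref *ref)
{
        if (!ref->page)
                return;

        /* drop the reference only when one was taken, e.g. for VM_MIXEDMAP pfns */
        if (ref->refcounted)
                put_page(ref->page);
}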

-- 
Best regards,
Dmitry


Re: [PATCH linux-next] drm/amdgpu/vcn: Remove unused assignment in vcn_v4_0_stop

2022-08-15 Thread Khalid Masum

On 8/15/22 21:17, Dong, Ruijing wrote:

[AMD Official Use Only - General]

If the condition was met and it came to execute vcn_v4_0_stop_dpg_mode, then it 
would never have a chance to go for /* wait for vcn idle */, would it?


Hypothetically, some other thread might set adev->pg_flags to 0, and in 
that case it will get the chance to go for /* wait for vcn idle */.




I still didn't see obvious purpose of this change.

 if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) {
  ==>  r = vcn_v4_0_stop_dpg_mode(adev, i);


Regardless of that, this assignment to r is unnecessary because the 
value of r is never used. This patch simply removes the unnecessary
assignment.


  continue;
  }

  /* wait for vcn idle */
  r = SOC15_WAIT_ON_RREG(VCN, i, regUVD_STATUS, 
UVD_STATUS__IDLE, 0x7);

Thanks
Ruijing



Thanks,
  -- Khalid Masum


Re: [PATCH] spi/panel: dt-bindings: drop 3-wire from common properties

2022-08-15 Thread Mark Brown
On Wed, 10 Aug 2022 16:13:11 +0300, Krzysztof Kozlowski wrote:
> The spi-3wire property is device specific and should be accepted only if
> device really needs them.  Drop it from common spi-peripheral-props.yaml
> schema, mention in few panel drivers which use it and include instead in
> the SPI controller bindings.  The controller bindings will provide
> spi-3wire type validation and one place for description.  Each device
> schema must list the property if it is applicable.
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next

Thanks!

[1/1] spi/panel: dt-bindings: drop 3-wire from common properties
  commit: 41f53a65444997f55c82c67f71a9cff05c1dee31

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark


Re: (subset) [PATCH v2 0/7] Devm helpers for regulator get and enable

2022-08-15 Thread Mark Brown
On Fri, 12 Aug 2022 13:08:17 +0300, Matti Vaittinen wrote:
> Devm helpers for regulator get and enable
> 
> First patch in the series is actually just a simple documentation fix
> which could be taken in as it is now.
> 
> A few* drivers seem to use pattern demonstrated by pseudocode:
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git 
for-next

Thanks!

[1/7] docs: devres: regulator: Add missing devm_* functions to devres.rst
  commit: 9b6744f60b6b47bc0757a1955adb4d2c3ab22e13
[2/7] regulator: Add devm helpers for get and enable
  (no commit info)

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark


[PATCH v1 7/7] drm/vc4: Make sure we don't end up with a core clock too high

2022-08-15 Thread Maxime Ripard
Following the clock rate range improvements to the clock framework,
trying to set a disjoint range on a clock will now result in an error.

Thus, we can't set a minimum rate higher than the maximum reported by
the firmware, or clk_set_min_rate() will fail.

Thus we need to clamp the rate we are about to ask for to the maximum
rate possible on that clock.

Signed-off-by: Maxime Ripard 

diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c
index b45dcdfd7306..4794e7235bb0 100644
--- a/drivers/gpu/drm/vc4/vc4_kms.c
+++ b/drivers/gpu/drm/vc4/vc4_kms.c
@@ -22,6 +22,8 @@
 #include 
 #include 
 
+#include 
+
 #include "vc4_drv.h"
 #include "vc4_regs.h"
 
@@ -354,6 +356,7 @@ static void vc4_atomic_commit_tail(struct drm_atomic_state 
*state)
struct vc4_hvs_state *new_hvs_state;
struct drm_crtc *crtc;
struct vc4_hvs_state *old_hvs_state;
+   unsigned long max_clock_rate;
unsigned int channel;
int i;
 
@@ -394,11 +397,12 @@ static void vc4_atomic_commit_tail(struct 
drm_atomic_state *state)
old_hvs_state->fifo_state[channel].pending_commit = NULL;
}
 
+   max_clock_rate = rpi_firmware_clk_get_max_rate(hvs->core_clk);
if (vc4->is_vc5) {
unsigned long state_rate = max(old_hvs_state->core_clock_rate,
   new_hvs_state->core_clock_rate);
-   unsigned long core_rate = max_t(unsigned long,
-   5, state_rate);
+   unsigned long core_rate = clamp_t(unsigned long, state_rate,
+ 5, max_clock_rate);
 
drm_dbg(dev, "Raising the core clock at %lu Hz\n", core_rate);
 
@@ -432,14 +436,17 @@ static void vc4_atomic_commit_tail(struct 
drm_atomic_state *state)
drm_atomic_helper_cleanup_planes(dev, state);
 
if (vc4->is_vc5) {
-   drm_dbg(dev, "Running the core clock at %lu Hz\n",
-   new_hvs_state->core_clock_rate);
+   unsigned long core_rate = min_t(unsigned long,
+  max_clock_rate,
+  new_hvs_state->core_clock_rate);
+
+   drm_dbg(dev, "Running the core clock at %lu Hz\n", core_rate);
 
/*
 * Request a clock rate based on the current HVS
 * requirements.
 */
-   WARN_ON(clk_set_min_rate(hvs->core_clk, 
new_hvs_state->core_clock_rate));
+   WARN_ON(clk_set_min_rate(hvs->core_clk, core_rate));
 
drm_dbg(dev, "Core clock actual rate: %lu Hz\n",
clk_get_rate(hvs->core_clk));

-- 
b4 0.10.0-dev-a76f5


[PATCH v1 5/7] drm/vc4: hdmi: Rework hdmi_enable_4kp60 detection code

2022-08-15 Thread Maxime Ripard
In order to support higher HDMI frequencies, users have to set the
hdmi_enable_4kp60 parameter in their config.txt file.

This will have the side-effect of raising the maximum of the core clock,
tied to the HVS, and managed by the HVS driver.

However, we are querying this in the HDMI driver by poking into the HVS
structure to get our struct clk handle.

Let's make this part of the HVS bind implementation to have all the core
clock related setup in the same place.

Signed-off-by: Maxime Ripard 

diff --git a/drivers/gpu/drm/vc4/vc4_drv.h b/drivers/gpu/drm/vc4/vc4_drv.h
index 1beb96b77b8c..d48ef302af42 100644
--- a/drivers/gpu/drm/vc4/vc4_drv.h
+++ b/drivers/gpu/drm/vc4/vc4_drv.h
@@ -339,6 +339,14 @@ struct vc4_hvs {
struct drm_mm_node mitchell_netravali_filter;
 
struct debugfs_regset32 regset;
+
+   /*
+* Even if HDMI0 on the RPi4 can output modes requiring a pixel
+* rate higher than 297MHz, it needs some adjustments in the
+* config.txt file to be able to do so and thus won't always be
+* available.
+*/
+   bool vc5_hdmi_enable_scrambling;
 };
 
 struct vc4_plane {
diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c
index aa3ebda55e04..371fbc05bf5a 100644
--- a/drivers/gpu/drm/vc4/vc4_hdmi.c
+++ b/drivers/gpu/drm/vc4/vc4_hdmi.c
@@ -46,7 +46,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -277,6 +276,7 @@ static void vc4_hdmi_connector_destroy(struct drm_connector 
*connector)
 static int vc4_hdmi_connector_get_modes(struct drm_connector *connector)
 {
struct vc4_hdmi *vc4_hdmi = connector_to_vc4_hdmi(connector);
+   struct vc4_dev *vc4 = to_vc4_dev(connector->dev);
int ret = 0;
struct edid *edid;
 
@@ -293,7 +293,7 @@ static int vc4_hdmi_connector_get_modes(struct 
drm_connector *connector)
ret = drm_add_edid_modes(connector, edid);
kfree(edid);
 
-   if (vc4_hdmi->disable_4kp60) {
+   if (!vc4->hvs->vc5_hdmi_enable_scrambling) {
struct drm_device *drm = connector->dev;
struct drm_display_mode *mode;
 
@@ -1480,11 +1480,12 @@ vc4_hdmi_encoder_clock_valid(const struct vc4_hdmi 
*vc4_hdmi,
 {
const struct drm_connector *connector = _hdmi->connector;
const struct drm_display_info *info = >display_info;
+   struct vc4_dev *vc4 = to_vc4_dev(connector->dev);
 
if (clock > vc4_hdmi->variant->max_pixel_clock)
return MODE_CLOCK_HIGH;
 
-   if (vc4_hdmi->disable_4kp60 && clock > HDMI_14_MAX_TMDS_CLK)
+   if (!vc4->hvs->vc5_hdmi_enable_scrambling && clock > 
HDMI_14_MAX_TMDS_CLK)
return MODE_CLOCK_HIGH;
 
if (info->max_tmds_clock && clock > (info->max_tmds_clock * 1000))
@@ -2965,14 +2966,6 @@ static int vc4_hdmi_bind(struct device *dev, struct 
device *master, void *data)
vc4_hdmi->disable_wifi_frequencies =
of_property_read_bool(dev->of_node, "wifi-2.4ghz-coexistence");
 
-   if (variant->max_pixel_clock == 6) {
-   struct vc4_dev *vc4 = to_vc4_dev(drm);
-   unsigned long max_rate = 
rpi_firmware_clk_get_max_rate(vc4->hvs->core_clk);
-
-   if (max_rate < 55000)
-   vc4_hdmi->disable_4kp60 = true;
-   }
-
/*
 * We need to have the device powered up at this point to call
 * our reset hook and for the CEC init.
diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.h b/drivers/gpu/drm/vc4/vc4_hdmi.h
index c3ed2b07df23..7506943050cf 100644
--- a/drivers/gpu/drm/vc4/vc4_hdmi.h
+++ b/drivers/gpu/drm/vc4/vc4_hdmi.h
@@ -155,14 +155,6 @@ struct vc4_hdmi {
 */
bool disable_wifi_frequencies;
 
-   /*
-* Even if HDMI0 on the RPi4 can output modes requiring a pixel
-* rate higher than 297MHz, it needs some adjustments in the
-* config.txt file to be able to do so and thus won't always be
-* available.
-*/
-   bool disable_4kp60;
-
struct cec_adapter *cec_adap;
struct cec_msg cec_rx_msg;
bool cec_tx_ok;
diff --git a/drivers/gpu/drm/vc4/vc4_hvs.c b/drivers/gpu/drm/vc4/vc4_hvs.c
index fbaa741dda5f..3fdd2c4356f6 100644
--- a/drivers/gpu/drm/vc4/vc4_hvs.c
+++ b/drivers/gpu/drm/vc4/vc4_hvs.c
@@ -27,6 +27,8 @@
 #include 
 #include 
 
+#include 
+
 #include "vc4_drv.h"
 #include "vc4_regs.h"
 
@@ -671,12 +673,18 @@ static int vc4_hvs_bind(struct device *dev, struct device 
*master, void *data)
hvs->regset.nregs = ARRAY_SIZE(hvs_regs);
 
if (vc4->is_vc5) {
+   unsigned long max_rate;
+
hvs->core_clk = devm_clk_get(>dev, NULL);
if (IS_ERR(hvs->core_clk)) {
dev_err(>dev, "Couldn't get core clock\n");
return PTR_ERR(hvs->core_clk);
}
 
+   max_rate = rpi_firmware_clk_get_max_rate(hvs->core_clk);
+   if (max_rate >= 

[PATCH v1 6/7] drm/vc4: hdmi: Add more checks for 4k resolutions

2022-08-15 Thread Maxime Ripard
From: Dom Cobley 

At least the 4096x2160@60Hz mode requires some overclocking that isn't
available by default, even if hdmi_enable_4kp60 is enabled.

Let's add some logic to detect whether we can satisfy the core clock
requirements for that mode, and prevent it from being used otherwise.

Signed-off-by: Dom Cobley 
Signed-off-by: Maxime Ripard 

diff --git a/drivers/gpu/drm/vc4/vc4_drv.h b/drivers/gpu/drm/vc4/vc4_drv.h
index d48ef302af42..e05f62a7eed6 100644
--- a/drivers/gpu/drm/vc4/vc4_drv.h
+++ b/drivers/gpu/drm/vc4/vc4_drv.h
@@ -347,6 +347,12 @@ struct vc4_hvs {
 * available.
 */
bool vc5_hdmi_enable_scrambling;
+
+   /*
+* 4096x2160@60 requires a core overclock to work, so register
+* whether that is sufficient.
+*/
+   bool vc5_hdmi_enable_4096by2160;
 };
 
 struct vc4_plane {
diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c
index 371fbc05bf5a..5abbb2fe41ac 100644
--- a/drivers/gpu/drm/vc4/vc4_hdmi.c
+++ b/drivers/gpu/drm/vc4/vc4_hdmi.c
@@ -1476,6 +1476,7 @@ vc4_hdmi_sink_supports_format_bpc(const struct vc4_hdmi 
*vc4_hdmi,
 
 static enum drm_mode_status
 vc4_hdmi_encoder_clock_valid(const struct vc4_hdmi *vc4_hdmi,
+const struct drm_display_mode *mode,
 unsigned long long clock)
 {
const struct drm_connector *connector = _hdmi->connector;
@@ -1488,6 +1489,12 @@ vc4_hdmi_encoder_clock_valid(const struct vc4_hdmi 
*vc4_hdmi,
if (!vc4->hvs->vc5_hdmi_enable_scrambling && clock > 
HDMI_14_MAX_TMDS_CLK)
return MODE_CLOCK_HIGH;
 
+   /* 4096x2160@60 is not reliable without overclocking core */
+   if (!vc4->hvs->vc5_hdmi_enable_4096by2160 &&
+   mode->hdisplay > 3840 && mode->vdisplay >= 2160 &&
+   drm_mode_vrefresh(mode) >= 50)
+   return MODE_CLOCK_HIGH;
+
if (info->max_tmds_clock && clock > (info->max_tmds_clock * 1000))
return MODE_CLOCK_HIGH;
 
@@ -1522,7 +1529,7 @@ vc4_hdmi_encoder_compute_clock(const struct vc4_hdmi 
*vc4_hdmi,
unsigned long long clock;
 
clock = vc4_hdmi_encoder_compute_mode_clock(mode, bpc, fmt);
-   if (vc4_hdmi_encoder_clock_valid(vc4_hdmi, clock) != MODE_OK)
+   if (vc4_hdmi_encoder_clock_valid(vc4_hdmi, mode, clock) != MODE_OK)
return -EINVAL;
 
vc4_state->tmds_char_rate = clock;
@@ -1685,7 +1692,7 @@ vc4_hdmi_encoder_mode_valid(struct drm_encoder *encoder,
 (mode->hsync_end % 2) || (mode->htotal % 2)))
return MODE_H_ILLEGAL;
 
-   return vc4_hdmi_encoder_clock_valid(vc4_hdmi, mode->clock * 1000);
+   return vc4_hdmi_encoder_clock_valid(vc4_hdmi, mode, mode->clock * 1000);
 }
 
 static const struct drm_encoder_helper_funcs vc4_hdmi_encoder_helper_funcs = {
diff --git a/drivers/gpu/drm/vc4/vc4_hvs.c b/drivers/gpu/drm/vc4/vc4_hvs.c
index 3fdd2c4356f6..6cfc1a4e7161 100644
--- a/drivers/gpu/drm/vc4/vc4_hvs.c
+++ b/drivers/gpu/drm/vc4/vc4_hvs.c
@@ -673,6 +673,7 @@ static int vc4_hvs_bind(struct device *dev, struct device 
*master, void *data)
hvs->regset.nregs = ARRAY_SIZE(hvs_regs);
 
if (vc4->is_vc5) {
+   unsigned long min_rate;
unsigned long max_rate;
 
hvs->core_clk = devm_clk_get(>dev, NULL);
@@ -685,6 +686,10 @@ static int vc4_hvs_bind(struct device *dev, struct device 
*master, void *data)
if (max_rate >= 55000)
hvs->vc5_hdmi_enable_scrambling = true;
 
+   min_rate = rpi_firmware_clk_get_min_rate(hvs->core_clk);
+   if (min_rate >= 6)
+   hvs->vc5_hdmi_enable_4096by2160 = true;
+
ret = clk_prepare_enable(hvs->core_clk);
if (ret) {
dev_err(>dev, "Couldn't enable the core clock\n");

-- 
b4 0.10.0-dev-a76f5


[PATCH v1 4/7] drm/vc4: hdmi: Fix hdmi_enable_4kp60 detection

2022-08-15 Thread Maxime Ripard
In order to support higher HDMI frequencies, users have to set the
hdmi_enable_4kp60 parameter in their config.txt file.

We were detecting this so far by calling clk_round_rate() on the core
clock with the frequency we're supposed to run at when one of those
modes is enabled. Whether or not the parameter was enabled could then be
inferred by the returned rate since the maximum clock rate reported by
the firmware was one of the side effects of setting that parameter.

However, the recent clock rework we did changed what clk_round_rate()
was returning to always return the minimum allowed, and thus this test
wasn't reliable anymore.

Let's use the new rpi_firmware_clk_get_max_rate() function to reliably determine the
maximum rate allowed on that clock and fix the 4k@60Hz output.

Fixes: e9d6cea2af1c ("clk: bcm: rpi: Run some clocks at the minimum rate 
allowed")
Signed-off-by: Maxime Ripard 

diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c
index 592c3b5d03e6..aa3ebda55e04 100644
--- a/drivers/gpu/drm/vc4/vc4_hdmi.c
+++ b/drivers/gpu/drm/vc4/vc4_hdmi.c
@@ -46,6 +46,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2966,7 +2967,7 @@ static int vc4_hdmi_bind(struct device *dev, struct 
device *master, void *data)
 
if (variant->max_pixel_clock == 6) {
struct vc4_dev *vc4 = to_vc4_dev(drm);
-   long max_rate = clk_round_rate(vc4->hvs->core_clk, 55000);
+   unsigned long max_rate = 
rpi_firmware_clk_get_max_rate(vc4->hvs->core_clk);
 
if (max_rate < 55000)
vc4_hdmi->disable_4kp60 = true;

-- 
b4 0.10.0-dev-a76f5


[PATCH v1 2/7] clk: bcm: rpi: Add a function to retrieve the maximum

2022-08-15 Thread Maxime Ripard
The RaspberryPi firmware can be configured by the end user using the
config.txt file.

Some of these options will affect the kernel capabilities, and we thus
need to be able to detect it to operate reliably.

One such parameter is the hdmi_enable_4kp60 parameter, which will
set up the clocks in a way that is suitable to reach the pixel
frequencies required by the 4k at 60Hz and higher modes.

If the user forgot to enable it, then those modes will simply not work
but are still likely to be picked up by the userspace, which is a poor
user-experience.

The kernel can't access the config.txt file directly, but one of the
effect that parameter has is that the core clock frequency maximum will
be much higher. Thus we can infer whether it was enabled or not by
querying the firmware for that maximum, and if it isn't prevent any of
the modes that wouldn't work.

The HDMI driver is already doing this, but it was relying on a behaviour of
clk_round_rate() that changed recently and no longer returns the
result we would like.

We also considered introducing a CCF function to access the maximum of a
given struct clk, but that wouldn't work if the clock is further
constrained by another user.

It was thus suggested to create a small, ad-hoc function to query the
RaspberryPi firmware for the maximum rate a given clock has.

Suggested-by: Stephen Boyd 
Signed-off-by: Maxime Ripard 

diff --git a/drivers/clk/bcm/clk-raspberrypi.c 
b/drivers/clk/bcm/clk-raspberrypi.c
index 6c0a0fd6cd79..182e8817eac2 100644
--- a/drivers/clk/bcm/clk-raspberrypi.c
+++ b/drivers/clk/bcm/clk-raspberrypi.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 enum rpi_firmware_clk_id {
@@ -254,6 +255,33 @@ static int raspberrypi_fw_dumb_determine_rate(struct 
clk_hw *hw,
return 0;
 }
 
+unsigned long rpi_firmware_clk_get_max_rate(struct clk *clk)
+{
+   const struct raspberrypi_clk_data *data;
+   struct raspberrypi_clk *rpi;
+   struct clk_hw *hw;
+   u32 max_rate;
+   int ret;
+
+   if (!clk)
+   return 0;
+
+   hw =  __clk_get_hw(clk);
+   if (!hw)
+   return 0;
+
+   data = clk_hw_to_data(hw);
+   rpi = data->rpi;
+   ret = raspberrypi_clock_property(rpi->firmware, data,
+RPI_FIRMWARE_GET_MAX_CLOCK_RATE,
+_rate);
+   if (ret)
+   return 0;
+
+   return max_rate;
+}
+EXPORT_SYMBOL_GPL(rpi_firmware_clk_get_max_rate);
+
 static const struct clk_ops raspberrypi_firmware_clk_ops = {
.is_prepared= raspberrypi_fw_is_prepared,
.recalc_rate= raspberrypi_fw_get_rate,
diff --git a/include/soc/bcm2835/raspberrypi-clocks.h 
b/include/soc/bcm2835/raspberrypi-clocks.h
new file mode 100644
index ..ff0b608b51a8
--- /dev/null
+++ b/include/soc/bcm2835/raspberrypi-clocks.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __SOC_RASPBERRY_CLOCKS_H__
+#define __SOC_RASPBERRY_CLOCKS_H__
+
+#if IS_ENABLED(CONFIG_CLK_RASPBERRYPI)
+unsigned long rpi_firmware_clk_get_max_rate(struct clk *clk);
+#else
+static inline unsigned long rpi_firmware_clk_get_max_rate(struct clk *clk)
+{
+   return ULONG_MAX;
+}
+#endif
+
+#endif /* __SOC_RASPBERRY_CLOCKS_H__ */

-- 
b4 0.10.0-dev-a76f5


[PATCH v1 3/7] clk: bcm: rpi: Add a function to retrieve the minimum

2022-08-15 Thread Maxime Ripard
The RaspberryPi firmware can be configured by the end user using the
config.txt file.

Some of these options will affect the kernel capabilities, and we thus
need to be able to detect it to operate reliably.

One such parameter is the core_clock parameter, which allows users to
set up the clocks in a way that is suitable to reach the pixel
frequencies required by the 4096x2160 resolution at 60Hz and higher
modes.

If the user misconfigured it, then those modes will simply not work
but are still likely to be picked up by the userspace, which is a poor
user-experience.

The kernel can't access the config.txt file directly, but one of the
effects that parameter has is that the core clock frequency minimum will
be raised. Thus we can infer its setup by querying the firmware for that
minimum and, if it isn't raised, ignore any of the modes that wouldn't work.

We had a similar discussion in the past for the maximum, and it was suggested to
create a small, ad-hoc function to query the RaspberryPi firmware for
the minimum rate a given clock has, so let's do the same here.

Signed-off-by: Maxime Ripard 

diff --git a/drivers/clk/bcm/clk-raspberrypi.c 
b/drivers/clk/bcm/clk-raspberrypi.c
index 182e8817eac2..b81da5b1dd1e 100644
--- a/drivers/clk/bcm/clk-raspberrypi.c
+++ b/drivers/clk/bcm/clk-raspberrypi.c
@@ -282,6 +282,33 @@ unsigned long rpi_firmware_clk_get_max_rate(struct clk 
*clk)
 }
 EXPORT_SYMBOL_GPL(rpi_firmware_clk_get_max_rate);
 
+unsigned long rpi_firmware_clk_get_min_rate(struct clk *clk)
+{
+   const struct raspberrypi_clk_data *data;
+   struct raspberrypi_clk *rpi;
+   struct clk_hw *hw;
+   u32 min_rate;
+   int ret;
+
+   if (!clk)
+   return 0;
+
+   hw =  __clk_get_hw(clk);
+   if (!hw)
+   return 0;
+
+   data = clk_hw_to_data(hw);
+   rpi = data->rpi;
+   ret = raspberrypi_clock_property(rpi->firmware, data,
+RPI_FIRMWARE_GET_MIN_CLOCK_RATE,
+_rate);
+   if (ret)
+   return 0;
+
+   return min_rate;
+}
+EXPORT_SYMBOL_GPL(rpi_firmware_clk_get_min_rate);
+
 static const struct clk_ops raspberrypi_firmware_clk_ops = {
.is_prepared= raspberrypi_fw_is_prepared,
.recalc_rate= raspberrypi_fw_get_rate,
diff --git a/include/soc/bcm2835/raspberrypi-clocks.h 
b/include/soc/bcm2835/raspberrypi-clocks.h
index ff0b608b51a8..627535877964 100644
--- a/include/soc/bcm2835/raspberrypi-clocks.h
+++ b/include/soc/bcm2835/raspberrypi-clocks.h
@@ -5,11 +5,17 @@
 
 #if IS_ENABLED(CONFIG_CLK_RASPBERRYPI)
 unsigned long rpi_firmware_clk_get_max_rate(struct clk *clk);
+unsigned long rpi_firmware_clk_get_min_rate(struct clk *clk);
 #else
 static inline unsigned long rpi_firmware_clk_get_max_rate(struct clk *clk)
 {
return ULONG_MAX;
 }
+
+static inline unsigned long rpi_firmware_clk_get_min_rate(struct clk *clk)
+{
+   return 0;
+}
 #endif
 
 #endif /* __SOC_RASPBERRY_CLOCKS_H__ */

-- 
b4 0.10.0-dev-a76f5


[PATCH v1 1/7] clk: bcm: rpi: Create helper to retrieve private data

2022-08-15 Thread Maxime Ripard
The RaspberryPi firmware clocks driver uses in several instances a
container_of to retrieve the struct raspberrypi_clk_data from a pointer
to struct clk_hw. Let's create a small function to avoid duplicating it
all over the place.

Signed-off-by: Maxime Ripard 

diff --git a/drivers/clk/bcm/clk-raspberrypi.c 
b/drivers/clk/bcm/clk-raspberrypi.c
index 73518009a0f2..6c0a0fd6cd79 100644
--- a/drivers/clk/bcm/clk-raspberrypi.c
+++ b/drivers/clk/bcm/clk-raspberrypi.c
@@ -73,6 +73,12 @@ struct raspberrypi_clk_data {
struct raspberrypi_clk *rpi;
 };
 
+static inline
+const struct raspberrypi_clk_data *clk_hw_to_data(const struct clk_hw *hw)
+{
+   return container_of(hw, struct raspberrypi_clk_data, hw);
+}
+
 struct raspberrypi_clk_variant {
boolexport;
char*clkdev;
@@ -176,8 +182,7 @@ static int raspberrypi_clock_property(struct rpi_firmware 
*firmware,
 
 static int raspberrypi_fw_is_prepared(struct clk_hw *hw)
 {
-   struct raspberrypi_clk_data *data =
-   container_of(hw, struct raspberrypi_clk_data, hw);
+   const struct raspberrypi_clk_data *data = clk_hw_to_data(hw);
struct raspberrypi_clk *rpi = data->rpi;
u32 val = 0;
int ret;
@@ -194,8 +199,7 @@ static int raspberrypi_fw_is_prepared(struct clk_hw *hw)
 static unsigned long raspberrypi_fw_get_rate(struct clk_hw *hw,
 unsigned long parent_rate)
 {
-   struct raspberrypi_clk_data *data =
-   container_of(hw, struct raspberrypi_clk_data, hw);
+   const struct raspberrypi_clk_data *data = clk_hw_to_data(hw);
struct raspberrypi_clk *rpi = data->rpi;
u32 val = 0;
int ret;
@@ -211,8 +215,7 @@ static unsigned long raspberrypi_fw_get_rate(struct clk_hw 
*hw,
 static int raspberrypi_fw_set_rate(struct clk_hw *hw, unsigned long rate,
   unsigned long parent_rate)
 {
-   struct raspberrypi_clk_data *data =
-   container_of(hw, struct raspberrypi_clk_data, hw);
+   const struct raspberrypi_clk_data *data = clk_hw_to_data(hw);
struct raspberrypi_clk *rpi = data->rpi;
u32 _rate = rate;
int ret;
@@ -229,8 +232,7 @@ static int raspberrypi_fw_set_rate(struct clk_hw *hw, 
unsigned long rate,
 static int raspberrypi_fw_dumb_determine_rate(struct clk_hw *hw,
  struct clk_rate_request *req)
 {
-   struct raspberrypi_clk_data *data =
-   container_of(hw, struct raspberrypi_clk_data, hw);
+   const struct raspberrypi_clk_data *data = clk_hw_to_data(hw);
struct raspberrypi_clk_variant *variant = data->variant;
 
/*

-- 
b4 0.10.0-dev-a76f5


[PATCH v1 0/7] drm/vc4: Fix the core clock behaviour

2022-08-15 Thread Maxime Ripard
Hi,

Those patches used to be part of a larger clock fixes series:
https://lore.kernel.org/linux-clk/20220715160014.2623107-1-max...@cerno.tech/

However, that series doesn't seem to be getting anywhere, so I've split out
these patches that fix a regression that has been there since 5.18 and that
prevents the 4k output from working on the RaspberryPi4.

Hopefully, we will be able to merge those patches through the DRM tree to avoid
any further disruption.

Let me know what you think,
Maxime

---
Dom Cobley (1):
  drm/vc4: hdmi: Add more checks for 4k resolutions

Maxime Ripard (6):
  clk: bcm: rpi: Create helper to retrieve private data
  clk: bcm: rpi: Add a function to retrieve the maximum
  clk: bcm: rpi: Add a function to retrieve the minimum
  drm/vc4: hdmi: Fix hdmi_enable_4kp60 detection
  drm/vc4: hdmi: Rework hdmi_enable_4kp60 detection code
  drm/vc4: Make sure we don't end up with a core clock too high

 drivers/clk/bcm/clk-raspberrypi.c| 73 
 drivers/gpu/drm/vc4/vc4_drv.h| 14 ++
 drivers/gpu/drm/vc4/vc4_hdmi.c   | 25 +--
 drivers/gpu/drm/vc4/vc4_hdmi.h   |  8 
 drivers/gpu/drm/vc4/vc4_hvs.c| 13 ++
 drivers/gpu/drm/vc4/vc4_kms.c| 17 +---
 include/soc/bcm2835/raspberrypi-clocks.h | 21 +
 7 files changed, 138 insertions(+), 33 deletions(-)
---
base-commit: 568035b01cfb107af8d2e4bd2fb9aea22cf5b868
change-id: 20220815-rpi-fix-4k-60-17273650429d

Best regards,
-- 
Maxime Ripard 


RE: [PATCH linux-next] drm/amdgpu/vcn: Remove unused assignment in vcn_v4_0_stop

2022-08-15 Thread Dong, Ruijing
[AMD Official Use Only - General]

If the condition was met and it came to execute vcn_v4_0_stop_dpg_mode, then it 
would never have a chance to go for /* wait for vcn idle */, would it?
I still didn't see an obvious purpose for this change.

if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) {
 ==>  r = vcn_v4_0_stop_dpg_mode(adev, i);
 continue;
 }

 /* wait for vcn idle */
 r = SOC15_WAIT_ON_RREG(VCN, i, regUVD_STATUS, 
UVD_STATUS__IDLE, 0x7);

Thanks
Ruijing

-Original Message-
From: Khalid Masum 
Sent: Monday, August 15, 2022 11:11 AM
To: Dong, Ruijing ; amd-...@lists.freedesktop.org; 
dri-devel@lists.freedesktop.org; linux-ker...@vger.kernel.org; 
linux-kernel-ment...@lists.linuxfoundation.org
Cc: Deucher, Alexander ; Koenig, Christian 
; Pan, Xinhui ; David Airlie 
; Daniel Vetter ; Zhu, James 
; Jiang, Sonny ; Wan Jiabing 
; Liu, Leo 
Subject: Re: [PATCH linux-next] drm/amdgpu/vcn: Remove unused assignment in 
vcn_v4_0_stop

On 8/15/22 20:15, Dong, Ruijing wrote:
> [AMD Official Use Only - General]
>
> Sorry, which "r" value was overwritten?  I didn't see the point of making 
> this change.
>
> Thanks
> Ruijing
>
> -Original Message-
> From: Khalid Masum 
> Sent: Monday, August 15, 2022 3:01 AM
> To: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org;
> linux-ker...@vger.kernel.org;
> linux-kernel-ment...@lists.linuxfoundation.org
> Cc: Deucher, Alexander ; Koenig, Christian
> ; Pan, Xinhui ; David
> Airlie ; Daniel Vetter ; Zhu, James
> ; Jiang, Sonny ; Dong, Ruijing
> ; Wan Jiabing ; Liu, Leo
> ; Khalid Masum 
> Subject: [PATCH linux-next] drm/amdgpu/vcn: Remove unused assignment
> in vcn_v4_0_stop
>
> The value assigned from vcn_v4_0_stop_dpg_mode to r is overwritten before it 
> can be used. Remove this assignment.
>
> Addresses-Coverity: 1504988 ("Unused value")
> Fixes: 8da1170a16e4 ("drm/amdgpu: add VCN4 ip block support")
> Signed-off-by: Khalid Masum 
> ---
>   drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
> b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
> index ca14c3ef742e..80b8a2c66b36 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
> @@ -1154,7 +1154,7 @@ static int vcn_v4_0_stop(struct amdgpu_device *adev)
>  fw_shared->sq.queue_mode |= FW_QUEUE_DPG_HOLD_OFF;
>
>  if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) {
> -   r = vcn_v4_0_stop_dpg_mode(adev, i);
> +   vcn_v4_0_stop_dpg_mode(adev, i);
>  continue;
>  }
>
> --
> 2.37.1
>

The value is overwritten again soon after, right after the code in the diff.

See:
drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c

static int vcn_v4_0_stop(struct amdgpu_device *adev)
{
        volatile struct amdgpu_vcn4_fw_shared *fw_shared;
        ...

 for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
 fw_shared = adev->vcn.inst[i].fw_shared.cpu_addr;
 fw_shared->sq.queue_mode |= FW_QUEUE_DPG_HOLD_OFF;

 if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) {
 r = vcn_v4_0_stop_dpg_mode(adev, i);
 continue;
 }

 /* wait for vcn idle */
 r = SOC15_WAIT_ON_RREG(VCN, i, regUVD_STATUS, 
UVD_STATUS__IDLE, 0x7);

Here, any value assigned to r is overwritten before it could be used. So the 
assignment in the true branch of the if statement here can be removed.

Thanks,
   -- Khalid Masum


Re: [PATCH linux-next] drm/amdgpu/vcn: Remove unused assignment in vcn_v4_0_stop

2022-08-15 Thread Khalid Masum

On 8/15/22 20:15, Dong, Ruijing wrote:

[AMD Official Use Only - General]

Sorry, which "r" value was overwritten?  I didn't see the point of making this 
change.

Thanks
Ruijing

-Original Message-
From: Khalid Masum 
Sent: Monday, August 15, 2022 3:01 AM
To: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; 
linux-ker...@vger.kernel.org; linux-kernel-ment...@lists.linuxfoundation.org
Cc: Deucher, Alexander ; Koenig, Christian ; Pan, Xinhui 
; David Airlie ; Daniel Vetter ; Zhu, James ; 
Jiang, Sonny ; Dong, Ruijing ; Wan Jiabing ; Liu, Leo 
; Khalid Masum 
Subject: [PATCH linux-next] drm/amdgpu/vcn: Remove unused assignment in 
vcn_v4_0_stop

The value assigned from vcn_v4_0_stop_dpg_mode to r is overwritten before it 
can be used. Remove this assignment.

Addresses-Coverity: 1504988 ("Unused value")
Fixes: 8da1170a16e4 ("drm/amdgpu: add VCN4 ip block support")
Signed-off-by: Khalid Masum 
---
  drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
index ca14c3ef742e..80b8a2c66b36 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
@@ -1154,7 +1154,7 @@ static int vcn_v4_0_stop(struct amdgpu_device *adev)
 fw_shared->sq.queue_mode |= FW_QUEUE_DPG_HOLD_OFF;

 if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) {
-   r = vcn_v4_0_stop_dpg_mode(adev, i);
+   vcn_v4_0_stop_dpg_mode(adev, i);
 continue;
 }

--
2.37.1



The value is overwritten again soon after, right after the code in the diff.

See:
drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c

static int vcn_v4_0_stop(struct amdgpu_device *adev)
{
volatile struct amdgpu_vcn4_fw_shared *fw_shared;
...

for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
fw_shared = adev->vcn.inst[i].fw_shared.cpu_addr;
fw_shared->sq.queue_mode |= FW_QUEUE_DPG_HOLD_OFF;

if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) {
r = vcn_v4_0_stop_dpg_mode(adev, i);
continue;
}

/* wait for vcn idle */
r = SOC15_WAIT_ON_RREG(VCN, i, regUVD_STATUS, 
UVD_STATUS__IDLE, 0x7);


Here, any value assigned to r is overwritten before it could
be used. So the assignment in the true branch of the if statement
here can be removed.

Thanks,
  -- Khalid Masum


Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

2022-08-15 Thread Dmitry Osipenko
On 8/15/22 16:53, Christian König wrote:
> Am 15.08.22 um 15:45 schrieb Dmitry Osipenko:
>> [SNIP]
>>> Well that comment sounds like KVM is doing the right thing, so I'm
>>> wondering what exactly is going on here.
>> KVM actually doesn't hold the page reference, it takes a temporary
>> reference during the page fault and then drops the reference once the page is
>> mapped, IIUC. Is it still illegal for TTM? Or is there a possibility for
>> a race condition here?
>>
> 
> Well the question is why does KVM grab the page reference in the first
> place?
> 
> If that is to prevent the mapping from changing then yes that's illegal
> and won't work. It can always happen that you grab the address, solve
> the fault and then immediately fault again because the address you just
> grabbed is invalidated.
> 
> If it's for some other reason then we should probably investigate if we
> shouldn't stop doing this.

CC: +Paolo Bonzini who introduced this code

commit add6a0cd1c5ba51b201e1361b05a5df817083618
Author: Paolo Bonzini 
Date:   Tue Jun 7 17:51:18 2016 +0200

KVM: MMU: try to fix up page faults before giving up

The vGPU folks would like to trap the first access to a BAR by setting
vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault
handler
then can use remap_pfn_range to place some non-reserved pages in the
VMA.

This kind of VM_PFNMAP mapping is not handled by KVM, but follow_pfn
and fixup_user_fault together help supporting it.  The patch also
supports
VM_MIXEDMAP vmas where the pfns are not reserved and thus subject to
reference counting.

@Paolo,
https://lore.kernel.org/dri-devel/73e5ed8d-0d25-7d44-8fa2-e1d61b1f5...@amd.com/T/#m7647ce5f8c4749599d2c6bc15a2b45f8d8cf8154

-- 
Best regards,
Dmitry


Re: [PATCH] drm/amdgpu: Fix use-after-free on amdgpu_bo_list mutex

2022-08-15 Thread Melissa Wen
On 08/15, Maíra Canal wrote:
> If amdgpu_cs_vm_handling returns r != 0, then it will unlock the
> bo_list_mutex inside the function amdgpu_cs_vm_handling and again on
> amdgpu_cs_parser_fini. This results in the following
> use-after-free problem:
> 
> [ 220.280990] [ cut here ]
> [ 220.281000] refcount_t: underflow; use-after-free.
> [ 220.281019] WARNING: CPU: 1 PID: 3746 at lib/refcount.c:28 
> refcount_warn_saturate+0xba/0x110
> [ 220.281029] [ cut here ]
> [ 220.281415] CPU: 1 PID: 3746 Comm: chrome:cs0 Tainted: G W L --- --- 
> 5.20.0-0.rc0.20220812git7ebfc85e2cd7.10.fc38.x86_64 #1
> [ 220.281421] Hardware name: System manufacturer System Product Name/ROG 
> STRIX X570-I GAMING, BIOS 4403 04/27/2022
> [ 220.281426] RIP: 0010:refcount_warn_saturate+0xba/0x110
> [ 220.281431] Code: 01 01 e8 79 4a 6f 00 0f 0b e9 42 47 a5 00 80 3d de
> 7e be 01 00 75 85 48 c7 c7 f8 98 8e 98 c6 05 ce 7e be 01 01 e8 56 4a
> 6f 00 <0f> 0b e9 1f 47 a5 00 80 3d b9 7e be 01 00 0f 85 5e ff ff ff 48
> c7
> [ 220.281437] RSP: 0018:b4b0d18d7a80 EFLAGS: 00010282
> [ 220.281443] RAX: 0026 RBX: 0003 RCX: 
> 
> [ 220.281448] RDX: 0001 RSI: 988d06dc RDI: 
> 
> [ 220.281452] RBP:  R08:  R09: 
> b4b0d18d7930
> [ 220.281457] R10: 0003 R11: a0672e2fffe8 R12: 
> a058ca360400
> [ 220.281461] R13: a05846c50a18 R14: fe00 R15: 
> 0003
> [ 220.281465] FS: 7f82683e06c0() GS:a066e2e0() 
> knlGS:
> [ 220.281470] CS: 0010 DS:  ES:  CR0: 80050033
> [ 220.281475] CR2: 3590005cc000 CR3: 0001fca46000 CR4: 
> 00350ee0
> [ 220.281480] Call Trace:
> [ 220.281485] 
> [ 220.281490] amdgpu_cs_ioctl+0x4e2/0x2070 [amdgpu]
> [ 220.281806] ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
> [ 220.282028] drm_ioctl_kernel+0xa4/0x150
> [ 220.282043] drm_ioctl+0x21f/0x420
> [ 220.282053] ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
> [ 220.282275] ? lock_release+0x14f/0x460
> [ 220.282282] ? _raw_spin_unlock_irqrestore+0x30/0x60
> [ 220.282290] ? _raw_spin_unlock_irqrestore+0x30/0x60
> [ 220.282297] ? lockdep_hardirqs_on+0x7d/0x100
> [ 220.282305] ? _raw_spin_unlock_irqrestore+0x40/0x60
> [ 220.282317] amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
> [ 220.282534] __x64_sys_ioctl+0x90/0xd0
> [ 220.282545] do_syscall_64+0x5b/0x80
> [ 220.282551] ? futex_wake+0x6c/0x150
> [ 220.282568] ? lock_is_held_type+0xe8/0x140
> [ 220.282580] ? do_syscall_64+0x67/0x80
> [ 220.282585] ? lockdep_hardirqs_on+0x7d/0x100
> [ 220.282592] ? do_syscall_64+0x67/0x80
> [ 220.282597] ? do_syscall_64+0x67/0x80
> [ 220.282602] ? lockdep_hardirqs_on+0x7d/0x100
> [ 220.282609] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [ 220.282616] RIP: 0033:0x7f8282a4f8bf
> [ 220.282639] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10
> 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00
> 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00
> 00
> [ 220.282644] RSP: 002b:7f82683df410 EFLAGS: 0246 ORIG_RAX: 
> 0010
> [ 220.282651] RAX: ffda RBX: 7f82683df588 RCX: 
> 7f8282a4f8bf
> [ 220.282655] RDX: 7f82683df4d0 RSI: c0186444 RDI: 
> 0018
> [ 220.282659] RBP: 7f82683df4d0 R08: 7f82683df5e0 R09: 
> 7f82683df4b0
> [ 220.282663] R10: 1d04000a0600 R11: 0246 R12: 
> c0186444
> [ 220.282667] R13: 0018 R14: 7f82683df588 R15: 
> 0003
> [ 220.282689] 
> [ 220.282693] irq event stamp: 6232311
> [ 220.282697] hardirqs last enabled at (6232319): [] 
> __up_console_sem+0x5e/0x70
> [ 220.282704] hardirqs last disabled at (6232326): [] 
> __up_console_sem+0x43/0x70
> [ 220.282709] softirqs last enabled at (6232072): [] 
> __irq_exit_rcu+0xf9/0x170
> [ 220.282716] softirqs last disabled at (6232061): [] 
> __irq_exit_rcu+0xf9/0x170
> [ 220.282722] ---[ end trace  ]---
> 
> Therefore, remove the mutex_unlock from the amdgpu_cs_vm_handling
> function, so that amdgpu_cs_submit and amdgpu_cs_parser_fini can handle
> the unlock.
> 
> Fixes: 90af0ca047f3 ("drm/amdgpu: Protect the amdgpu_bo_list list with a 
> mutex v2")
> Reported-by: Mikhail Gavrilov 
> Signed-off-by: Maíra Canal 
> ---
> Thanks Melissa and Christian for the feedback on mutex_unlock.
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 8 ++--
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index d8f1335bc68f..b7bae833c804 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -837,16 +837,12 @@ static int amdgpu_cs_vm_handling(struct 
> amdgpu_cs_parser *p)
>   continue;
>  
>   r = amdgpu_vm_bo_update(adev, bo_va, 

RE: [PATCH linux-next] drm/amdgpu/vcn: Remove unused assignment in vcn_v4_0_stop

2022-08-15 Thread Dong, Ruijing
[AMD Official Use Only - General]

Sorry, which "r" value was overwritten?  I didn't see the point of making this 
change.

Thanks
Ruijing

-Original Message-
From: Khalid Masum 
Sent: Monday, August 15, 2022 3:01 AM
To: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; 
linux-ker...@vger.kernel.org; linux-kernel-ment...@lists.linuxfoundation.org
Cc: Deucher, Alexander ; Koenig, Christian 
; Pan, Xinhui ; David Airlie 
; Daniel Vetter ; Zhu, James 
; Jiang, Sonny ; Dong, Ruijing 
; Wan Jiabing ; Liu, Leo 
; Khalid Masum 
Subject: [PATCH linux-next] drm/amdgpu/vcn: Remove unused assignment in 
vcn_v4_0_stop

The value assigned to r by vcn_v4_0_stop_dpg_mode() is overwritten before it 
can be used. Remove this assignment.

Addresses-Coverity: 1504988 ("Unused value")
Fixes: 8da1170a16e4 ("drm/amdgpu: add VCN4 ip block support")
Signed-off-by: Khalid Masum 
---
 drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
index ca14c3ef742e..80b8a2c66b36 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
@@ -1154,7 +1154,7 @@ static int vcn_v4_0_stop(struct amdgpu_device *adev)
fw_shared->sq.queue_mode |= FW_QUEUE_DPG_HOLD_OFF;

if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) {
-   r = vcn_v4_0_stop_dpg_mode(adev, i);
+   vcn_v4_0_stop_dpg_mode(adev, i);
continue;
}

--
2.37.1
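To answer the question above about which "r" is overwritten: it is the value assigned immediately before the continue. A simplified sketch of the control flow (paraphrased for illustration, not the full vcn_v4_0_stop() body; the placeholder helper name is made up):

	for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
		if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) {
			r = vcn_v4_0_stop_dpg_mode(adev, i);	/* this r is never read */
			continue;	/* the next iteration, or the code after
					 * the loop, assigns r again before any use */
		}

		/* non-DPG path: every later use of r starts with a fresh
		 * assignment, so the DPG branch's return value is dead */
		r = wait_for_vcn_idle(adev, i);	/* placeholder name */
	}

So either drop the dead assignment, as this patch does, or, if the error should not be silently ignored, check and propagate it instead.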



[Bug 216143] [bisected] garbled screen when starting X + dmesg cluttered with "[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed in the dependencies handling -1431655766!"

2022-08-15 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=216143

--- Comment #11 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 301574
  --> https://bugzilla.kernel.org/attachment.cgi?id=301574&action=edit
kernel .config (kernel 6.0-rc1, AMD Ryzen 9 5950X)

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 216143] [bisected] garbled screen when starting X + dmesg cluttered with "[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed in the dependencies handling -1431655766!"

2022-08-15 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=216143

--- Comment #10 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 301573
  --> https://bugzilla.kernel.org/attachment.cgi?id=301573&action=edit
kernel dmesg (kernel 6.0-rc1, AMD Ryzen 9 5950X)

No change with v6.0-rc1.
[...]
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed in the dependencies handling
-1431655766!
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed in the dependencies handling
-1431655766!
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed in the dependencies handling -22!
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed in the dependencies handling
-1431655766!
[...]

Additionally I get:
[...]
[ cut here ]
refcount_t: underflow; use-after-free.
WARNING: CPU: 7 PID: 2120 at lib/refcount.c:28 refcount_warn_saturate+0x93/0xf0
Modules linked in: rfkill dm_crypt nhpoly1305_avx2 nhpoly1305 aes_generic
aesni_intel libaes crypto_simd cryptd chacha_generic chacha_x86_64 libchacha
adiantum libpoly1305 algif_skcipher joydev input_leds hid_generic usbhid hid
ext4 mbcache crc16 jbd2 sr_mod amdgpu cdrom snd_hda_codec_realtek
snd_hda_codec_generic ledtrig_audio dm_mod led_class mfd_core
snd_hda_codec_hdmi drm_buddy r8169 gpu_sched evdev wmi_bmof drm_ttm_helper
snd_hda_intel ttm snd_intel_dspcfg realtek i2c_algo_bit snd_hda_codec
drm_display_helper snd_hwdep mdio_devres drm_kms_helper snd_hda_core sysimgblt
syscopyarea snd_pcm sysfillrect libphy fb_sys_fops xhci_pci snd_timer ahci
xhci_hcd snd libahci soundcore usbcore libata k10temp usb_common i2c_piix4
gpio_amdpt gpio_generic button pkcs8_key_parser nct6775 hwmon_vid nct6775_core
wmi hwmon zram zsmalloc amd_pstate drm fuse drm_panel_orientation_quirks
backlight configfs efivarfs
CPU: 7 PID: 2120 Comm: X:cs0 Not tainted 6.0.0-rc1-Zen3 #1
Hardware name: To Be Filled By O.E.M. B450M Steel Legend/B450M Steel Legend,
BIOS P4.30 02/25/2022
RIP: 0010:refcount_warn_saturate+0x93/0xf0
Code: c7 c7 6d 4b e9 b2 e8 cc 13 bf ff 0f 0b c3 80 3d 5b fe da 00 00 75 af c6
05 52 fe da 00 01 48 c7 c7 ad 45 ea b2 e8 ad 13 bf ff <0f> 0b c3 80 3d 39 fe da
00 00 75 90 c6 05 30 fe da 00 01 48 c7 c7
RSP: 0018:bc8ac1b7fb38 EFLAGS: 00010246
RAX: d8250f016f21c100 RBX: 0038 RCX: 0027
RDX: bfff RSI: 0004 RDI: a0db5ebd71c8
RBP: 0003 R08:  R09: a0db5e8a
R10: 0419 R11:  R12: 
R13: a0d4f3e2 R14: a0d5a62ccc00 R15: 0003
FS:  7f879006c640() GS:a0db5ebc() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 563c67cfb000 CR3: 0002c3938000 CR4: 00350ee0
Call Trace:
 
 amdgpu_cs_ioctl+0x498/0xdd0 [amdgpu]
 ? amdgpu_cs_report_moved_bytes+0x60/0x60 [amdgpu]
 drm_ioctl_kernel+0xdb/0x150 [drm]
 drm_ioctl+0x301/0x440 [drm]
 ? amdgpu_cs_report_moved_bytes+0x60/0x60 [amdgpu]
 amdgpu_drm_ioctl+0x42/0x80 [amdgpu]
 __se_sys_ioctl+0x72/0xc0
 do_syscall_64+0x6a/0x90
 ? do_user_addr_fault+0x2da/0x410
 ? exc_page_fault+0x5f/0x90
 entry_SYSCALL_64_after_hwframe+0x4b/0xb5
RIP: 0033:0x7f879b42496b
Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24
08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff
77 1b 48 8b 44 24 18 64 48 2b 04 25 28 00
RSP: 002b:7f879006b590 EFLAGS: 0246 ORIG_RAX: 0010
RAX: ffda RBX: c0186444 RCX: 7f879b42496b
RDX: 7f879006b8a8 RSI: c0186444 RDI: 000d
RBP: 7f879006b8e0 R08: 7f879006b970 R09: 0003
R10: 560b4e0cdc40 R11: 0246 R12: 000d
R13: 560b4e175968 R14:  R15: 7f879006b8a8
 
---[ end trace  ]---

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

2022-08-15 Thread Christian König

Am 15.08.22 um 15:45 schrieb Dmitry Osipenko:

[SNIP]

Well that comment sounds like KVM is doing the right thing, so I'm
wondering what exactly is going on here.

KVM actually doesn't hold the page reference, it takes a temporary
reference during the page fault and then drops it once the page is
mapped, IIUC. Is it still illegal for TTM? Or is there a possibility of
a race condition here?



Well the question is why does KVM grab the page reference in the first 
place?


If that is to prevent the mapping from changing then yes that's illegal 
and won't work. It can always happen that you grab the address, solve 
the fault and then immediately fault again because the address you just 
grabbed is invalidated.


If it's for some other reason, then we should probably investigate 
whether we should stop doing this.


Regards,
Christian.


Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

2022-08-15 Thread Dmitry Osipenko
On 8/15/22 16:06, Christian König wrote:
> Am 15.08.22 um 13:50 schrieb Dmitry Osipenko:
>> On 8/15/22 14:28, Christian König wrote:
>> Maybe it was discussed privately? In this case I will be happy to get
>> more info from you about the root of the problem so I could start to
>> look at how to fix it properly. It's not apparent where the
>> problem is
>> to a TTM newbie like me.
>>
> Well this is completely unfixable. See, the whole purpose of TTM is to
> allow tracking where and what parts of a buffer object are mapped.
>
> If you circumvent that and increase the page reference yourself, then
> that whole functionality can't work correctly any more.
 Are you suggesting that the problem is that TTM doesn't see the KVM
 page
 faults/mappings?
>>> Yes, and no. It's one of the issues, but there is more behind that (e.g.
>>> what happens when TTM switches from pages to local memory for backing a
>>> BO).
>> If the KVM page fault could reach TTM, then it should be able to relocate
>> the BO. I see now where the problem is, thanks. Although I'm wondering
>> whether it already works somehow... I'll try to play with the AMDGPU
>> shrinker and see what happens on guest mapping of a relocated BO.
> 
> Well the page fault already somehow reaches TTM, otherwise the pfn
> couldn't be filled in in the first place.
> 
> The issue is more that KVM should never ever grab a page reference to
> pages mapped with VM_IO or VM_PFNMAP.
> 
> Essentially we need to apply the same restriction as with
> get_user_pages() here.
> 
>>> Another question is why is KVM accessing the page structure in the first
>>> place? The VMA is mapped with VM_PFNMAP and VM_IO, KVM should never ever
>>> touch any of those pages.
>> https://elixir.bootlin.com/linux/v5.19/source/virt/kvm/kvm_main.c#L2549
>>
> 
> Well that comment sounds like KVM is doing the right thing, so I'm
> wondering what exactly is going on here.

KVM actually doesn't hold the page reference, it takes a temporary
reference during the page fault and then drops it once the page is
mapped, IIUC. Is it still illegal for TTM? Or is there a possibility of
a race condition here?

-- 
Best regards,
Dmitry


[PATCH] drm/rockchip: vop2: Fix eDP/HDMI sync polarities

2022-08-15 Thread Sascha Hauer
The hsync/vsync polarities were not honoured for the eDP and HDMI ports.
Add the register settings to configure the polarities as requested by the
DRM_MODE_FLAG_PHSYNC/DRM_MODE_FLAG_PVSYNC flags.

Signed-off-by: Sascha Hauer 
---
 drivers/gpu/drm/rockchip/rockchip_drm_vop2.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c 
b/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
index e4631f515ba42..f9aa8b96c6952 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
@@ -1439,11 +1439,15 @@ static void rk3568_set_intf_mux(struct vop2_video_port 
*vp, int id,
die &= ~RK3568_SYS_DSP_INFACE_EN_HDMI_MUX;
die |= RK3568_SYS_DSP_INFACE_EN_HDMI |
   FIELD_PREP(RK3568_SYS_DSP_INFACE_EN_HDMI_MUX, 
vp->id);
+   dip &= ~RK3568_DSP_IF_POL__HDMI_PIN_POL;
+   dip |= FIELD_PREP(RK3568_DSP_IF_POL__HDMI_PIN_POL, polflags);
break;
case ROCKCHIP_VOP2_EP_EDP0:
die &= ~RK3568_SYS_DSP_INFACE_EN_EDP_MUX;
die |= RK3568_SYS_DSP_INFACE_EN_EDP |
   FIELD_PREP(RK3568_SYS_DSP_INFACE_EN_EDP_MUX, vp->id);
+   dip &= ~RK3568_DSP_IF_POL__EDP_PIN_POL;
+   dip |= FIELD_PREP(RK3568_DSP_IF_POL__EDP_PIN_POL, polflags);
break;
case ROCKCHIP_VOP2_EP_MIPI0:
die &= ~RK3568_SYS_DSP_INFACE_EN_MIPI0_MUX;
-- 
2.30.2
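For context, the polflags value consumed in the hunks above is derived from the adjusted mode elsewhere in the driver. A hedged sketch of that derivation follows; the bit names and the exact rk3568_set_intf_mux() call shape are placeholders for illustration, not taken from the actual RK3568 register definitions:

	/* Sketch only: translate DRM sync-polarity flags into the polflags
	 * word that rk3568_set_intf_mux() writes via the *_PIN_POL fields. */
	u32 polflags = 0;

	if (adjusted_mode->flags & DRM_MODE_FLAG_PHSYNC)
		polflags |= BIT(HSYNC_POSITIVE);	/* placeholder bit name */
	if (adjusted_mode->flags & DRM_MODE_FLAG_PVSYNC)
		polflags |= BIT(VSYNC_POSITIVE);	/* placeholder bit name */

	rk3568_set_intf_mux(vp, id, polflags);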



Re: [PATCH] drm/radeon: add a force flush to delay work when radeon

2022-08-15 Thread Christian König

Am 15.08.22 um 09:34 schrieb 李真能:


On 2022/8/12 18:55, Christian König wrote:

Am 11.08.22 um 09:25 schrieb Zhenneng Li:
Although the radeon driver fences and waits for the GPU to finish 
processing the current batch of rings, there is still a corner case where 
the radeon lockup work queue may not be fully flushed, while the 
radeon_suspend_kms() function has already called pci_set_power_state() to 
put the device into the D3hot state.


If I'm not completely mistaken the reset worker uses the 
suspend/resume functionality as well to get the hardware into a 
working state again.


So if I'm not completely mistaken this here would lead to a deadlock, 
please double check that.


We have tested this many times; there is no deadlock.


Testing doesn't tell you anything, you need to audit the call paths.


In which situation would this lead to a deadlock?


GPU resets.

Regards,
Christian.





Regards,
Christian.


Per the PCI spec rev 4.0, section 5.3.1.4.1 (D3hot State):
Configuration and Message requests are the only TLPs accepted by a 
Function in
the D3hot state. All other received Requests must be handled as 
Unsupported Requests,
and all received Completions may optionally be handled as 
Unexpected Completions.

This issue shows up in the following logs:
Unable to handle kernel paging request at virtual address 
8800e0008010

CPU 0 kworker/0:3(131): Oops 0
pc = []  ra = []  ps =  
Tainted: G    W

pc is at si_gpu_check_soft_reset+0x3c/0x240
ra is at si_dma_is_lockup+0x34/0xd0
v0 =   t0 = fff08800e0008010  t1 = 0001
t2 = 8010  t3 = fff7e3c0  t4 = fff7e3c00258
t5 =   t6 = 0001  t7 = fff7ef078000
s0 = fff7e3c016e8  s1 = fff7e3c0  s2 = fff7e3c00018
s3 = fff7e3c0  s4 = fff7fff59d80  s5 = 
s6 = fff7ef07bd98
a0 = fff7e3c0  a1 = fff7e3c016e8  a2 = 0008
a3 = 0001  a4 = 8f5c28f5c28f5c29  a5 = 810f4338
t8 = 0275  t9 = 809b66f8  t10 = ff6769c5d964b800
t11= b886  pv = 811bea20  at = 
gp = 81d89690  sp = aa814126
Disabling lock debugging due to kernel taint
Trace:
[] si_dma_is_lockup+0x34/0xd0
[] radeon_fence_check_lockup+0xd0/0x290
[] process_one_work+0x280/0x550
[] worker_thread+0x70/0x7c0
[] worker_thread+0x130/0x7c0
[] kthread+0x200/0x210
[] worker_thread+0x0/0x7c0
[] kthread+0x14c/0x210
[] ret_from_kernel_thread+0x18/0x20
[] kthread+0x0/0x210
  Code: ad3e0008  43f0074a  ad7e0018  ad9e0020  8c3001e8 40230101
  <8821> 4821ed21
So force lockup work queue flush to fix this problem.

Signed-off-by: Zhenneng Li 
---
  drivers/gpu/drm/radeon/radeon_device.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_device.c 
b/drivers/gpu/drm/radeon/radeon_device.c

index 15692cb241fc..e608ca26780a 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1604,6 +1604,9 @@ int radeon_suspend_kms(struct drm_device *dev, 
bool suspend,

  if (r) {
  /* delay GPU reset to resume */
  radeon_fence_driver_force_completion(rdev, i);
+    } else {
+    /* finish executing delayed work */
+ flush_delayed_work(>fence_drv[i].lockup_work);
  }
  }






[PATCH] drm/ttm: prevent grabbing page references

2022-08-15 Thread Christian König
TTM owns the pages it uses for backing buffer objects with system
memory. Because of this it is absolutely illegal to mess around with
the reference count of those pages.

So make sure that nobody ever tries to grab an extra reference on
pages allocated through the page pool.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/ttm/ttm_pool.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 1bba0a0ed3f9..cbca84dbd83f 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -93,8 +93,17 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool 
*pool, gfp_t gfp_flags,
 
if (!pool->use_dma_alloc) {
p = alloc_pages(gfp_flags, order);
-   if (p)
+   if (p) {
p->private = order;
+
+   /* The pages are fully owned by TTM and because of this
+* it's illegal to grab extra references to it or
+* otherwise we corrupt TTMs internal state. Make sure
+* nobody tries to ever increase the reference count of
+* those pages.
+*/
+   set_page_count(p, 0);
+   }
return p;
}
 
@@ -144,6 +153,9 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum 
ttm_caching caching,
 #endif
 
if (!pool || !pool->use_dma_alloc) {
+   /* See alloc why references to TTMs pages are illegal */
+   WARN_ON(page_count(p));
+   set_page_count(p, 1);
__free_pages(p, order);
return;
}
-- 
2.25.1



Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

2022-08-15 Thread Christian König

Am 15.08.22 um 13:50 schrieb Dmitry Osipenko:

On 8/15/22 14:28, Christian König wrote:

Maybe it was discussed privately? In this case I will be happy to get
more info from you about the root of the problem so I could start to
look at how to fix it properly. It's not apparent where the problem is
to a TTM newbie like me.


Well this is completely unfixable. See, the whole purpose of TTM is to
allow tracking where and what parts of a buffer object are mapped.

If you circumvent that and increase the page reference yourself, then
that whole functionality can't work correctly any more.

Are you suggesting that the problem is that TTM doesn't see the KVM page
faults/mappings?

Yes, and no. It's one of the issues, but there is more behind that (e.g.
what happens when TTM switches from pages to local memory for backing a
BO).

If the KVM page fault could reach TTM, then it should be able to relocate
the BO. I see now where the problem is, thanks. Although I'm wondering
whether it already works somehow... I'll try to play with the AMDGPU
shrinker and see what happens on guest mapping of a relocated BO.


Well the page fault already somehow reaches TTM, otherwise the pfn 
couldn't be filled in in the first place.


The issue is more that KVM should never ever grab a page reference to 
pages mapped with VM_IO or VM_PFNMAP.


Essentially we need to apply the same restriction as with 
get_user_pages() here.
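For readers unfamiliar with that restriction: get_user_pages() refuses to operate on VM_IO / VM_PFNMAP mappings (see check_vma_flags() in mm/gup.c), which is the rule being referred to. A simplified sketch of the check, not the actual mm code:

	/* Sketch only: the gup path rejects VMAs whose pages must not be
	 * reference-counted by callers. */
	static bool gup_allowed_on_vma(const struct vm_area_struct *vma)
	{
		if (vma->vm_flags & (VM_IO | VM_PFNMAP))
			return false;	/* the real code returns -EFAULT */
		return true;
	}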



Another question is why is KVM accessing the page structure in the first
place? The VMA is mapped with VM_PFNMAP and VM_IO, KVM should never ever
touch any of those pages.

https://elixir.bootlin.com/linux/v5.19/source/virt/kvm/kvm_main.c#L2549


Well that comment sounds like KVM is doing the right thing, so I'm 
wondering what exactly is going on here.


Regards,
Christian.







Re: [PATCH] drm/amdgpu: Fix use-after-free on amdgpu_bo_list mutex

2022-08-15 Thread Christian König




Am 15.08.22 um 13:39 schrieb Maíra Canal:

If amdgpu_cs_vm_handling returns r != 0, then it will unlock the
bo_list_mutex inside the function amdgpu_cs_vm_handling and again on
amdgpu_cs_parser_fini. This problem results in the following
use-after-free problem:

[ 220.280990] [ cut here ]
[ 220.281000] refcount_t: underflow; use-after-free.
[ 220.281019] WARNING: CPU: 1 PID: 3746 at lib/refcount.c:28 
refcount_warn_saturate+0xba/0x110
[ 220.281029] [ cut here ]
[ 220.281415] CPU: 1 PID: 3746 Comm: chrome:cs0 Tainted: G W L --- --- 
5.20.0-0.rc0.20220812git7ebfc85e2cd7.10.fc38.x86_64 #1
[ 220.281421] Hardware name: System manufacturer System Product Name/ROG STRIX 
X570-I GAMING, BIOS 4403 04/27/2022
[ 220.281426] RIP: 0010:refcount_warn_saturate+0xba/0x110
[ 220.281431] Code: 01 01 e8 79 4a 6f 00 0f 0b e9 42 47 a5 00 80 3d de
7e be 01 00 75 85 48 c7 c7 f8 98 8e 98 c6 05 ce 7e be 01 01 e8 56 4a
6f 00 <0f> 0b e9 1f 47 a5 00 80 3d b9 7e be 01 00 0f 85 5e ff ff ff 48
c7
[ 220.281437] RSP: 0018:b4b0d18d7a80 EFLAGS: 00010282
[ 220.281443] RAX: 0026 RBX: 0003 RCX: 
[ 220.281448] RDX: 0001 RSI: 988d06dc RDI: 
[ 220.281452] RBP:  R08:  R09: b4b0d18d7930
[ 220.281457] R10: 0003 R11: a0672e2fffe8 R12: a058ca360400
[ 220.281461] R13: a05846c50a18 R14: fe00 R15: 0003
[ 220.281465] FS: 7f82683e06c0() GS:a066e2e0() 
knlGS:
[ 220.281470] CS: 0010 DS:  ES:  CR0: 80050033
[ 220.281475] CR2: 3590005cc000 CR3: 0001fca46000 CR4: 00350ee0
[ 220.281480] Call Trace:
[ 220.281485] 
[ 220.281490] amdgpu_cs_ioctl+0x4e2/0x2070 [amdgpu]
[ 220.281806] ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
[ 220.282028] drm_ioctl_kernel+0xa4/0x150
[ 220.282043] drm_ioctl+0x21f/0x420
[ 220.282053] ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
[ 220.282275] ? lock_release+0x14f/0x460
[ 220.282282] ? _raw_spin_unlock_irqrestore+0x30/0x60
[ 220.282290] ? _raw_spin_unlock_irqrestore+0x30/0x60
[ 220.282297] ? lockdep_hardirqs_on+0x7d/0x100
[ 220.282305] ? _raw_spin_unlock_irqrestore+0x40/0x60
[ 220.282317] amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
[ 220.282534] __x64_sys_ioctl+0x90/0xd0
[ 220.282545] do_syscall_64+0x5b/0x80
[ 220.282551] ? futex_wake+0x6c/0x150
[ 220.282568] ? lock_is_held_type+0xe8/0x140
[ 220.282580] ? do_syscall_64+0x67/0x80
[ 220.282585] ? lockdep_hardirqs_on+0x7d/0x100
[ 220.282592] ? do_syscall_64+0x67/0x80
[ 220.282597] ? do_syscall_64+0x67/0x80
[ 220.282602] ? lockdep_hardirqs_on+0x7d/0x100
[ 220.282609] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 220.282616] RIP: 0033:0x7f8282a4f8bf
[ 220.282639] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10
00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00
0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00
00
[ 220.282644] RSP: 002b:7f82683df410 EFLAGS: 0246 ORIG_RAX: 
0010
[ 220.282651] RAX: ffda RBX: 7f82683df588 RCX: 7f8282a4f8bf
[ 220.282655] RDX: 7f82683df4d0 RSI: c0186444 RDI: 0018
[ 220.282659] RBP: 7f82683df4d0 R08: 7f82683df5e0 R09: 7f82683df4b0
[ 220.282663] R10: 1d04000a0600 R11: 0246 R12: c0186444
[ 220.282667] R13: 0018 R14: 7f82683df588 R15: 0003
[ 220.282689] 
[ 220.282693] irq event stamp: 6232311
[ 220.282697] hardirqs last enabled at (6232319): [] 
__up_console_sem+0x5e/0x70
[ 220.282704] hardirqs last disabled at (6232326): [] 
__up_console_sem+0x43/0x70
[ 220.282709] softirqs last enabled at (6232072): [] 
__irq_exit_rcu+0xf9/0x170
[ 220.282716] softirqs last disabled at (6232061): [] 
__irq_exit_rcu+0xf9/0x170
[ 220.282722] ---[ end trace  ]---

Therefore, remove the mutex_unlock from the amdgpu_cs_vm_handling
function, so that amdgpu_cs_submit and amdgpu_cs_parser_fini can handle
the unlock.

Fixes: 90af0ca047f3 ("drm/amdgpu: Protect the amdgpu_bo_list list with a mutex 
v2")
Reported-by: Mikhail Gavrilov 
Signed-off-by: Maíra Canal 


Reviewed-by: Christian König 


---
Thanks Melissa and Christian for the feedback on mutex_unlock.
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 8 ++--
  1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index d8f1335bc68f..b7bae833c804 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -837,16 +837,12 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser 
*p)
continue;
  
  		r = amdgpu_vm_bo_update(adev, bo_va, false);

-   if (r) {
-   mutex_unlock(>bo_list->bo_list_mutex);
+   if (r)
return r;
-   }
  
  		r = 

Re: [RESEND PATCH v8 06/12] lib: add linear range index macro

2022-08-15 Thread Matti Vaittinen

Hi ChiaEn,

On 8/15/22 12:01, ChiaEn Wu wrote:

From: ChiaEn Wu 

Add linear_range_idx macro for declaring the linear_range struct simply.

Signed-off-by: ChiaEn Wu 
---
  include/linux/linear_range.h | 8 
  1 file changed, 8 insertions(+)

diff --git a/include/linux/linear_range.h b/include/linux/linear_range.h
index fd3d0b358f22..fb53ea13c593 100644
--- a/include/linux/linear_range.h
+++ b/include/linux/linear_range.h
@@ -26,6 +26,14 @@ struct linear_range {
unsigned int step;
  };
  
+#define LINEAR_RANGE_IDX(_min, _min_sel, _max_sel, _step)	\

+   {   \
+   .min = _min,\
+   .min_sel = _min_sel,\
+   .max_sel = _max_sel,\
+   .step = _step,  \
+   }
+


I think this somewhat differs from what you had originally sketched. E.g., 
if I didn't misread the patch earlier, you had:


#define MT6370_CHG_LINEAR_RANGE(_rfd, _min, _min_sel, _max_sel, _step) \
[_rfd] = { \
...

instead of the
> +#define LINEAR_RANGE_IDX(_min, _min_sel, _max_sel, _step) \
> +  {   \

I think the latter (without the []-index) is more generic, and very 
welcome. However, the IDX suffix no longer makes much sense, right? 
I suggested the name LINEAR_RANGE_IDX for a macro taking the array index, 
as it would also be useful when dealing with arrays.


Do you think you could still drop the IDX from the macro name, or keep the 
array index as the original did?


Maybe ideally introduce both macros (unless Mark has objections): one 
with the [_rfd] and the IDX suffix, and the other without the suffix and 
without the [_rfd]? Something like the sketch below.
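A rough sketch of what I mean (illustrative only; the field names follow the struct linear_range layout quoted in this patch, and the usage values are made up):

	/* Plain initializer, no array index: */
	#define LINEAR_RANGE(_min, _min_sel, _max_sel, _step)		\
		{							\
			.min = _min,					\
			.min_sel = _min_sel,				\
			.max_sel = _max_sel,				\
			.step = _step,					\
		}

	/* Designated-index variant for arrays, as the mt6370 macro had it: */
	#define LINEAR_RANGE_IDX(_idx, _min, _min_sel, _max_sel, _step)	\
		[_idx] = LINEAR_RANGE(_min, _min_sel, _max_sel, _step)

	/* Usage example (values made up): */
	static const struct linear_range chg_ranges[] = {
		LINEAR_RANGE_IDX(0, 100000, 0x0, 0x1f, 25000),
		LINEAR_RANGE_IDX(1, 500000, 0x0, 0x0f, 50000),
	};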


Thanks for the improvements and the patience! ;)

Yours
  -- Matti

--
Matti Vaittinen
Linux kernel developer at ROHM Semiconductors
Oulu Finland

~~ When things go utterly wrong vim users can always type :help! ~~


Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

2022-08-15 Thread Dmitry Osipenko
On 8/15/22 14:28, Christian König wrote:
 Maybe it was discussed privately? In this case I will be happy to get
 more info from you about the root of the problem so I could start to
 look at how to fix it properly. It's not apparent where the problem is
 to a TTM newbie like me.

>>> Well this is completely unfixable. See, the whole purpose of TTM is to
>>> allow tracking where and what parts of a buffer object are mapped.
>>>
>>> If you circumvent that and increase the page reference yourself, then
>>> that whole functionality can't work correctly any more.
>> Are you suggesting that the problem is that TTM doesn't see the KVM page
>> faults/mappings?
> 
> Yes, and no. It's one of the issues, but there is more behind that (e.g.
> what happens when TTM switches from pages to local memory for backing a
> BO).

If the KVM page fault could reach TTM, then it should be able to relocate
the BO. I see now where the problem is, thanks. Although I'm wondering
whether it already works somehow... I'll try to play with the AMDGPU
shrinker and see what happens on guest mapping of a relocated BO.

> Another question is why is KVM accessing the page structure in the first
> place? The VMA is mapped with VM_PFNMAP and VM_IO, KVM should never ever
> touch any of those pages.

https://elixir.bootlin.com/linux/v5.19/source/virt/kvm/kvm_main.c#L2549

-- 
Best regards,
Dmitry


[PATCH] drm/amdgpu: Fix use-after-free on amdgpu_bo_list mutex

2022-08-15 Thread Maíra Canal
If amdgpu_cs_vm_handling returns r != 0, then it will unlock the
bo_list_mutex inside the function amdgpu_cs_vm_handling and again on
amdgpu_cs_parser_fini. This problem results in the following
use-after-free problem:

[ 220.280990] [ cut here ]
[ 220.281000] refcount_t: underflow; use-after-free.
[ 220.281019] WARNING: CPU: 1 PID: 3746 at lib/refcount.c:28 
refcount_warn_saturate+0xba/0x110
[ 220.281029] [ cut here ]
[ 220.281415] CPU: 1 PID: 3746 Comm: chrome:cs0 Tainted: G W L --- --- 
5.20.0-0.rc0.20220812git7ebfc85e2cd7.10.fc38.x86_64 #1
[ 220.281421] Hardware name: System manufacturer System Product Name/ROG STRIX 
X570-I GAMING, BIOS 4403 04/27/2022
[ 220.281426] RIP: 0010:refcount_warn_saturate+0xba/0x110
[ 220.281431] Code: 01 01 e8 79 4a 6f 00 0f 0b e9 42 47 a5 00 80 3d de
7e be 01 00 75 85 48 c7 c7 f8 98 8e 98 c6 05 ce 7e be 01 01 e8 56 4a
6f 00 <0f> 0b e9 1f 47 a5 00 80 3d b9 7e be 01 00 0f 85 5e ff ff ff 48
c7
[ 220.281437] RSP: 0018:b4b0d18d7a80 EFLAGS: 00010282
[ 220.281443] RAX: 0026 RBX: 0003 RCX: 
[ 220.281448] RDX: 0001 RSI: 988d06dc RDI: 
[ 220.281452] RBP:  R08:  R09: b4b0d18d7930
[ 220.281457] R10: 0003 R11: a0672e2fffe8 R12: a058ca360400
[ 220.281461] R13: a05846c50a18 R14: fe00 R15: 0003
[ 220.281465] FS: 7f82683e06c0() GS:a066e2e0() 
knlGS:
[ 220.281470] CS: 0010 DS:  ES:  CR0: 80050033
[ 220.281475] CR2: 3590005cc000 CR3: 0001fca46000 CR4: 00350ee0
[ 220.281480] Call Trace:
[ 220.281485] 
[ 220.281490] amdgpu_cs_ioctl+0x4e2/0x2070 [amdgpu]
[ 220.281806] ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
[ 220.282028] drm_ioctl_kernel+0xa4/0x150
[ 220.282043] drm_ioctl+0x21f/0x420
[ 220.282053] ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
[ 220.282275] ? lock_release+0x14f/0x460
[ 220.282282] ? _raw_spin_unlock_irqrestore+0x30/0x60
[ 220.282290] ? _raw_spin_unlock_irqrestore+0x30/0x60
[ 220.282297] ? lockdep_hardirqs_on+0x7d/0x100
[ 220.282305] ? _raw_spin_unlock_irqrestore+0x40/0x60
[ 220.282317] amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
[ 220.282534] __x64_sys_ioctl+0x90/0xd0
[ 220.282545] do_syscall_64+0x5b/0x80
[ 220.282551] ? futex_wake+0x6c/0x150
[ 220.282568] ? lock_is_held_type+0xe8/0x140
[ 220.282580] ? do_syscall_64+0x67/0x80
[ 220.282585] ? lockdep_hardirqs_on+0x7d/0x100
[ 220.282592] ? do_syscall_64+0x67/0x80
[ 220.282597] ? do_syscall_64+0x67/0x80
[ 220.282602] ? lockdep_hardirqs_on+0x7d/0x100
[ 220.282609] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 220.282616] RIP: 0033:0x7f8282a4f8bf
[ 220.282639] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10
00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00
0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00
00
[ 220.282644] RSP: 002b:7f82683df410 EFLAGS: 0246 ORIG_RAX: 
0010
[ 220.282651] RAX: ffda RBX: 7f82683df588 RCX: 7f8282a4f8bf
[ 220.282655] RDX: 7f82683df4d0 RSI: c0186444 RDI: 0018
[ 220.282659] RBP: 7f82683df4d0 R08: 7f82683df5e0 R09: 7f82683df4b0
[ 220.282663] R10: 1d04000a0600 R11: 0246 R12: c0186444
[ 220.282667] R13: 0018 R14: 7f82683df588 R15: 0003
[ 220.282689] 
[ 220.282693] irq event stamp: 6232311
[ 220.282697] hardirqs last enabled at (6232319): [] 
__up_console_sem+0x5e/0x70
[ 220.282704] hardirqs last disabled at (6232326): [] 
__up_console_sem+0x43/0x70
[ 220.282709] softirqs last enabled at (6232072): [] 
__irq_exit_rcu+0xf9/0x170
[ 220.282716] softirqs last disabled at (6232061): [] 
__irq_exit_rcu+0xf9/0x170
[ 220.282722] ---[ end trace  ]---

Therefore, remove the mutex_unlock from the amdgpu_cs_vm_handling
function, so that amdgpu_cs_submit and amdgpu_cs_parser_fini can handle
the unlock.

Fixes: 90af0ca047f3 ("drm/amdgpu: Protect the amdgpu_bo_list list with a mutex 
v2")
Reported-by: Mikhail Gavrilov 
Signed-off-by: Maíra Canal 
---
Thanks Melissa and Christian for the feedback on mutex_unlock.
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index d8f1335bc68f..b7bae833c804 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -837,16 +837,12 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser 
*p)
continue;
 
r = amdgpu_vm_bo_update(adev, bo_va, false);
-   if (r) {
-   mutex_unlock(>bo_list->bo_list_mutex);
+   if (r)
return r;
-   }
 
r = amdgpu_sync_fence(>job->sync, bo_va->last_pt_update);
-   

Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

2022-08-15 Thread Christian König

Am 15.08.22 um 13:19 schrieb Dmitry Osipenko:

[SNIP]

I'll try to dig out the older discussions, thank you for the quick
reply!

Are you sure it was really discussed in public previously? All I can
find is your two answers to similar patches where you're saying that
it's a wrong solution, without in-depth explanation or further
discussion.

Yeah, that's my problem as well, I can't find that offhand.

But yes it certainly was discussed in public.

If it was only CC'd to dri-devel, then could be that emails didn't pass
the spam moderation :/


That might be possible.


Maybe it was discussed privately? In this case I will be happy to get
more info from you about the root of the problem so I could start to
look at how to fix it properly. It's not apparent where the problem is
to a TTM newbie like me.


Well this is completely unfixable. See, the whole purpose of TTM is to
allow tracking where and what parts of a buffer object are mapped.

If you circumvent that and increase the page reference yourself, then
that whole functionality can't work correctly any more.

Are you suggesting that the problem is that TTM doesn't see the KVM page
faults/mappings?


Yes, and no. It's one of the issues, but there is more behind that (e.g. 
what happens when TTM switches from pages to local memory for backing a BO).


Another question is why is KVM accessing the page structure in the first 
place? The VMA is mapped with VM_PFNMAP and VM_IO, KVM should never ever 
touch any of those pages.


Regards,
Christian.


Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

2022-08-15 Thread Dmitry Osipenko
On 8/15/22 13:51, Christian König wrote:
> Am 15.08.22 um 12:47 schrieb Dmitry Osipenko:
>> On 8/15/22 13:18, Dmitry Osipenko wrote:
>>> On 8/15/22 13:14, Christian König wrote:
 Am 15.08.22 um 12:11 schrieb Christian König:
> Am 15.08.22 um 12:09 schrieb Dmitry Osipenko:
>> On 8/15/22 13:05, Christian König wrote:
>>> Am 15.08.22 um 11:54 schrieb Dmitry Osipenko:
 Higher order pages allocated using alloc_pages() aren't
 refcounted and
 they
 need to be refcounted, otherwise it's impossible to map them by
 KVM. This
 patch sets the refcount of the tail pages and fixes the KVM memory
 mapping
 faults.

 Without this change guest virgl driver can't map host buffers into
 guest
 and can't provide OpenGL 4.5 profile support to the guest. The host
 mappings are also needed for enabling the Venus driver using
 host GPU
 drivers that are utilizing TTM.

 Based on a patch proposed by Trigger Huang.
>>> Well I can't count how often I have repeated this: This is an
>>> absolutely
>>> clear NAK!
>>>
>>> TTM pages are not reference counted in the first place and
>>> because of
>>> this giving them to virgl is illegal.
>> A? The first page is refcounted when allocated, the tail pages are
>> not.
> No they aren't. The first page is just by coincidence initialized with
> a refcount of 1. This refcount is completely ignored and not used
> at all.
>
> Incrementing the reference count and by this mapping the page into
> some other address space is illegal and corrupts the internal state
> tracking of TTM.
 See this comment in the source code as well:

  /* Don't set the __GFP_COMP flag for higher order allocations.
   * Mapping pages directly into an userspace process and
 calling
   * put_page() on a TTM allocated page is illegal.
   */

 I have absolutely no idea how somebody had the idea he could do this.
>>> I saw this comment, but it doesn't make sense because it doesn't explain
>>> why it's illegal. Hence it looks like a bogus comment, since the
>>> refcounting certainly works, at least to some degree, because I haven't
>>> noticed any problems in practice, maybe by luck :)
>>>
>>> I'll try to dig out the older discussions, thank you for the quick
>>> reply!
>> Are you sure it was really discussed in public previously? All I can
>> find is your two answers to similar patches where you're saying that
>> it's a wrong solution, without in-depth explanation or further
>> discussion.
> 
> Yeah, that's my problem as well, I can't find that offhand.
> 
> But yes it certainly was discussed in public.

If it was only CC'd to dri-devel, then could be that emails didn't pass
the spam moderation :/

>> Maybe it was discussed privately? In this case I will be happy to get
>> more info from you about the root of the problem so I could start to
>> look at how to fix it properly. It's not apparent where the problem is
>> to a TTM newbie like me.
>>
> 
> Well this is completely unfixable. See, the whole purpose of TTM is to
> allow tracking where and what parts of a buffer object are mapped.
> 
> If you circumvent that and increase the page reference yourself, then
> that whole functionality can't work correctly any more.

Are you suggesting that the problem is that TTM doesn't see the KVM page
faults/mappings?

-- 
Best regards,
Dmitry


Re: [PATCH v6 1/6] drm/ttm: Add new callbacks to ttm res mgr

2022-08-15 Thread Christian König

Am 12.08.22 um 15:30 schrieb Arunpravin Paneer Selvam:

We are adding two new callbacks to ttm resource manager
function to handle intersection and compatibility of
placement and resources.

v2: move the amdgpu and ttm_range_manager changes to
 separate patches (Christian)
v3: rename "intersect" to "intersects" (Matthew)
v4: move !place check to the !res if and return false
 in ttm_resource_compatible() function (Christian)
v5: move bits of code from patch number 6 to avoid
 temporary driver breakup (Christian)

Signed-off-by: Christian König 
Signed-off-by: Arunpravin Paneer Selvam 


Patch #6 could still be cleaned up more now that we have the workaround 
code in patch #1, but that not really a must have.


Reviewed-by: Christian König  for the entire 
series.


Do you already have commit rights?

Regards,
Christian.


---
  drivers/gpu/drm/ttm/ttm_bo.c   |  9 ++--
  drivers/gpu/drm/ttm/ttm_resource.c | 77 +-
  include/drm/ttm/ttm_resource.h | 40 
  3 files changed, 119 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index c1bd006a5525..f066e8124c50 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -518,6 +518,9 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
  bool ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
  const struct ttm_place *place)
  {
+   struct ttm_resource *res = bo->resource;
+   struct ttm_device *bdev = bo->bdev;
+
dma_resv_assert_held(bo->base.resv);
if (bo->resource->mem_type == TTM_PL_SYSTEM)
return true;
@@ -525,11 +528,7 @@ bool ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
/* Don't evict this BO if it's outside of the
 * requested placement range
 */
-   if (place->fpfn >= (bo->resource->start + bo->resource->num_pages) ||
-   (place->lpfn && place->lpfn <= bo->resource->start))
-   return false;
-
-   return true;
+   return ttm_resource_intersects(bdev, res, place, bo->base.size);
  }
  EXPORT_SYMBOL(ttm_bo_eviction_valuable);
  
diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c

index 20f9adcc3235..0d1f862a582b 100644
--- a/drivers/gpu/drm/ttm/ttm_resource.c
+++ b/drivers/gpu/drm/ttm/ttm_resource.c
@@ -253,10 +253,84 @@ void ttm_resource_free(struct ttm_buffer_object *bo, 
struct ttm_resource **res)
  }
  EXPORT_SYMBOL(ttm_resource_free);
  
+/**

+ * ttm_resource_intersects - test for intersection
+ *
+ * @bdev: TTM device structure
+ * @res: The resource to test
+ * @place: The placement to test
+ * @size: How many bytes the new allocation needs.
+ *
+ * Test if @res intersects with @place and @size. Used for testing if evictions
+ * are valuable or not.
+ *
+ * Returns true if the res placement intersects with @place and @size.
+ */
+bool ttm_resource_intersects(struct ttm_device *bdev,
+struct ttm_resource *res,
+const struct ttm_place *place,
+size_t size)
+{
+   struct ttm_resource_manager *man;
+
+   if (!res)
+   return false;
+
+   if (!place)
+   return true;
+
+   man = ttm_manager_type(bdev, res->mem_type);
+   if (!man->func->intersects) {
+   if (place->fpfn >= (res->start + res->num_pages) ||
+   (place->lpfn && place->lpfn <= res->start))
+   return false;
+
+   return true;
+   }
+
+   return man->func->intersects(man, res, place, size);
+}
+
+/**
+ * ttm_resource_compatible - test for compatibility
+ *
+ * @bdev: TTM device structure
+ * @res: The resource to test
+ * @place: The placement to test
+ * @size: How many bytes the new allocation needs.
+ *
+ * Test if @res compatible with @place and @size.
+ *
+ * Returns true if the res placement compatible with @place and @size.
+ */
+bool ttm_resource_compatible(struct ttm_device *bdev,
+struct ttm_resource *res,
+const struct ttm_place *place,
+size_t size)
+{
+   struct ttm_resource_manager *man;
+
+   if (!res || !place)
+   return false;
+
+   man = ttm_manager_type(bdev, res->mem_type);
+   if (!man->func->compatible) {
+   if (res->start < place->fpfn ||
+   (place->lpfn && (res->start + res->num_pages) > 
place->lpfn))
+   return false;
+
+   return true;
+   }
+
+   return man->func->compatible(man, res, place, size);
+}
+
  static bool ttm_resource_places_compat(struct ttm_resource *res,
   const struct ttm_place *places,
   unsigned num_placement)
  {
+   struct ttm_buffer_object *bo = res->bo;
+   struct ttm_device *bdev 
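(The diff is truncated here in the archive.) For illustration, a hedged sketch of what a driver-side implementation of the two new callbacks could look like for a simple range-style manager, mirroring the fallback logic in ttm_resource_intersects()/ttm_resource_compatible() above. This is not taken from any real driver; size is simply unused in this minimal version:

	/* Sketch only: same semantics as the generic fallback paths. */
	static bool my_mgr_intersects(struct ttm_resource_manager *man,
				      struct ttm_resource *res,
				      const struct ttm_place *place,
				      size_t size)
	{
		if (place->fpfn >= (res->start + res->num_pages) ||
		    (place->lpfn && place->lpfn <= res->start))
			return false;

		return true;
	}

	static bool my_mgr_compatible(struct ttm_resource_manager *man,
				      struct ttm_resource *res,
				      const struct ttm_place *place,
				      size_t size)
	{
		if (res->start < place->fpfn ||
		    (place->lpfn && (res->start + res->num_pages) > place->lpfn))
			return false;

		return true;
	}

	static const struct ttm_resource_manager_func my_mgr_func = {
		/* .alloc, .free, .debug etc. omitted for brevity */
		.intersects = my_mgr_intersects,
		.compatible = my_mgr_compatible,
	};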

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-15 Thread Christian König

Am 15.08.22 um 12:55 schrieb Melissa Wen:

On 08/14, Maíra Canal wrote:

Hi Mikhail

Looks like this use-after-free problem was introduced by
90af0ca047f3049c4b46e902f432ad6ef1e2ded6. Checking this patch, it seems
like: if amdgpu_cs_vm_handling() returns r != 0, then it will unlock
bo_list_mutex inside amdgpu_cs_vm_handling() and again in
amdgpu_cs_parser_fini().

Maybe the following patch will help:

---
 From 71d718c0f53a334bb59bcd5dabd29bbe92c724af Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ma=C3=ADra=20Canal?= 
Date: Sun, 14 Aug 2022 21:12:24 -0300
Subject: [PATCH] drm/amdgpu: Fix use-after-free on amdgpu_bo_list mutex
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fixes: 90af0ca047f3 ("drm/amdgpu: Protect the amdgpu_bo_list list with a
mutex v2")
Reported-by: Mikhail Gavrilov 
Signed-off-by: Maíra Canal 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 9 +++--
  1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index d8f1335bc68f..a7fce7b14321 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -837,17 +837,14 @@ static int amdgpu_cs_vm_handling(struct
amdgpu_cs_parser *p)
continue;

r = amdgpu_vm_bo_update(adev, bo_va, false);
-   if (r) {
-   mutex_unlock(>bo_list->bo_list_mutex);
+   if (r)
return r;
-   }

r = amdgpu_sync_fence(>job->sync, bo_va->last_pt_update);
-   if (r) {
-   mutex_unlock(>bo_list->bo_list_mutex);
+   if (r)
return r;
-   }
}
+   mutex_unlock(>bo_list->bo_list_mutex);

I think we don't need to unlock the bo_list_mutex here. If return != 0
amdgpu_cs_parser_fini() will unlock it; otherwise, amdgpu_cs_submit()
unlocks it in the end.


Yeah, exactly that.

Apart from that the patch looks good to me. We moved the mutex unlocking 
around a few times during review. Probably just a fallout from that.
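To spell out the locking convention being settled here: the helper returns with the mutex still held on error, and exactly one caller path (parser_fini on error, submit on success) releases it. A tiny standalone illustration of that convention follows; this is plain userspace C with made-up names, not amdgpu code:

	#include <pthread.h>
	#include <stdio.h>

	/* The helper never unlocks; ownership of the unlock stays with the caller. */
	static pthread_mutex_t bo_list_lock = PTHREAD_MUTEX_INITIALIZER;

	static int vm_handling(int fail)
	{
		/* Called with bo_list_lock held. On error, return with it STILL
		 * held; the caller's error path does the single unlock. */
		return fail ? -1 : 0;
	}

	static void parser_fini(void) { pthread_mutex_unlock(&bo_list_lock); } /* error path */
	static void submit(void)      { pthread_mutex_unlock(&bo_list_lock); } /* happy path */

	static int cs_ioctl(int fail)
	{
		pthread_mutex_lock(&bo_list_lock);
		if (vm_handling(fail)) {
			parser_fini();	/* unlock happens exactly once, here */
			return -1;
		}
		submit();		/* ... or exactly once, here */
		return 0;
	}

	int main(void)
	{
		printf("ok=%d err=%d\n", cs_ioctl(0), cs_ioctl(1));
		return 0;
	}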


Thanks for fixing this,
Christian.



BR,

Melissa

r = amdgpu_vm_handle_moved(adev, vm);
if (r)
--
2.37.1
---
Best Regards,
- Maíra Canal

On 8/14/22 18:11, Mikhail Gavrilov wrote:

Hi folks.
Joined testing 5.20 today (7ebfc85e2cd7).
I encountered a frequently GPU freeze, after which a message appears
in the kernel logs:
[ 220.280990] [ cut here ]
[ 220.281000] refcount_t: underflow; use-after-free.
[ 220.281019] WARNING: CPU: 1 PID: 3746 at lib/refcount.c:28
refcount_warn_saturate+0xba/0x110
[ 220.281029] Modules linked in: uinput rfcomm snd_seq_dummy
snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
qrtr bnep sunrpc snd_seq_midi snd_seq_midi_event vfat intel_rapl_msr
fat intel_rapl_common snd_hda_codec_realtek mt76x2u
snd_hda_codec_generic snd_hda_codec_hdmi mt76x2_common iwlmvm
mt76x02_usb edac_mce_amd mt76_usb snd_hda_intel snd_intel_dspcfg
mt76x02_lib snd_intel_sdw_acpi snd_usb_audio snd_hda_codec mt76
kvm_amd uvcvideo mac80211 snd_hda_core btusb eeepc_wmi snd_usbmidi_lib
videobuf2_vmalloc videobuf2_memops kvm btrtl snd_rawmidi asus_wmi
snd_hwdep videobuf2_v4l2 btbcm iwlwifi ledtrig_audio libarc4 btintel
snd_seq videobuf2_common sparse_keymap btmtk irqbypass videodev
snd_seq_device joydev xpad iwlmei platform_profile bluetooth
ff_memless snd_pcm mc rapl
[ 220.281185] video snd_timer cfg80211 wmi_bmof snd pcspkr soundcore
k10temp i2c_piix4 rfkill mei asus_ec_sensors acpi_cpufreq zram
hid_logitech_hidpp amdgpu igb dca drm_ttm_helper ttm crct10dif_pclmul
iommu_v2 crc32_pclmul gpu_sched crc32c_intel ucsi_ccg drm_buddy nvme
typec_ucsi ghash_clmulni_intel drm_display_helper ccp nvme_core typec
sp5100_tco cec wmi ip6_tables ip_tables fuse
[ 220.281258] Unloaded tainted modules: amd64_edac():1 amd64_edac():1
amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
amd64_edac():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-15 Thread Melissa Wen
On 08/14, Maíra Canal wrote:
> Hi Mikhail
> 
> Looks like this use-after-free problem was introduced by
> 90af0ca047f3049c4b46e902f432ad6ef1e2ded6. Checking this patch, it seems
> like: if amdgpu_cs_vm_handling() returns r != 0, then it will unlock
> bo_list_mutex inside amdgpu_cs_vm_handling() and again in
> amdgpu_cs_parser_fini().
> 
> Maybe the following patch will help:
> 
> ---
> From 71d718c0f53a334bb59bcd5dabd29bbe92c724af Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Ma=C3=ADra=20Canal?= 
> Date: Sun, 14 Aug 2022 21:12:24 -0300
> Subject: [PATCH] drm/amdgpu: Fix use-after-free on amdgpu_bo_list mutex
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> Fixes: 90af0ca047f3 ("drm/amdgpu: Protect the amdgpu_bo_list list with a
> mutex v2")
> Reported-by: Mikhail Gavrilov 
> Signed-off-by: Maíra Canal 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index d8f1335bc68f..a7fce7b14321 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -837,17 +837,14 @@ static int amdgpu_cs_vm_handling(struct
> amdgpu_cs_parser *p)
>   continue;
> 
>   r = amdgpu_vm_bo_update(adev, bo_va, false);
> - if (r) {
> - mutex_unlock(>bo_list->bo_list_mutex);
> + if (r)
>   return r;
> - }
> 
>   r = amdgpu_sync_fence(>job->sync, bo_va->last_pt_update);
> - if (r) {
> - mutex_unlock(>bo_list->bo_list_mutex);
> + if (r)
>   return r;
> - }
>   }
> + mutex_unlock(>bo_list->bo_list_mutex);

I think we don't need to unlock the bo_list_mutex here. If return != 0
amdgpu_cs_parser_fini() will unlock it; otherwise, amdgpu_cs_submit()
unlocks it in the end.

BR,

Melissa
> 
>   r = amdgpu_vm_handle_moved(adev, vm);
>   if (r)
> -- 
> 2.37.1
> ---
> Best Regards,
> - Maíra Canal
> 
> On 8/14/22 18:11, Mikhail Gavrilov wrote:
> > Hi folks.
> > Joined testing 5.20 today (7ebfc85e2cd7).
> > I encountered a frequently GPU freeze, after which a message appears
> > in the kernel logs:
> > [ 220.280990] [ cut here ]
> > [ 220.281000] refcount_t: underflow; use-after-free.
> > [ 220.281019] WARNING: CPU: 1 PID: 3746 at lib/refcount.c:28
> > refcount_warn_saturate+0xba/0x110
> > [ 220.281029] Modules linked in: uinput rfcomm snd_seq_dummy
> > snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
> > nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
> > nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
> > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
> > qrtr bnep sunrpc snd_seq_midi snd_seq_midi_event vfat intel_rapl_msr
> > fat intel_rapl_common snd_hda_codec_realtek mt76x2u
> > snd_hda_codec_generic snd_hda_codec_hdmi mt76x2_common iwlmvm
> > mt76x02_usb edac_mce_amd mt76_usb snd_hda_intel snd_intel_dspcfg
> > mt76x02_lib snd_intel_sdw_acpi snd_usb_audio snd_hda_codec mt76
> > kvm_amd uvcvideo mac80211 snd_hda_core btusb eeepc_wmi snd_usbmidi_lib
> > videobuf2_vmalloc videobuf2_memops kvm btrtl snd_rawmidi asus_wmi
> > snd_hwdep videobuf2_v4l2 btbcm iwlwifi ledtrig_audio libarc4 btintel
> > snd_seq videobuf2_common sparse_keymap btmtk irqbypass videodev
> > snd_seq_device joydev xpad iwlmei platform_profile bluetooth
> > ff_memless snd_pcm mc rapl
> > [ 220.281185] video snd_timer cfg80211 wmi_bmof snd pcspkr soundcore
> > k10temp i2c_piix4 rfkill mei asus_ec_sensors acpi_cpufreq zram
> > hid_logitech_hidpp amdgpu igb dca drm_ttm_helper ttm crct10dif_pclmul
> > iommu_v2 crc32_pclmul gpu_sched crc32c_intel ucsi_ccg drm_buddy nvme
> > typec_ucsi ghash_clmulni_intel drm_display_helper ccp nvme_core typec
> > sp5100_tco cec wmi ip6_tables ip_tables fuse
> > [ 220.281258] Unloaded tainted modules: amd64_edac():1 amd64_edac():1
> > amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
> > amd64_edac():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
> > amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
> > pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
> > amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
> > pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
> > pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
> > pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
> > pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
> > pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
> > pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
> > amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
> > pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
> > 
