[PATCH] drm/xe/guc: Extract GuC error capture lists on G2H notification

2024-01-16 Thread Zhanjun Dong
Signed-off-by: Zhanjun Dong --- drivers/gpu/drm/xe/abi/guc_actions_abi.h | 7 + drivers/gpu/drm/xe/xe_guc_capture.c | 572 +++ drivers/gpu/drm/xe/xe_guc_ct.c | 2 + drivers/gpu/drm/xe/xe_guc_submit.c | 22 +- drivers/gpu/drm/xe/xe_guc_submit.h | 3

[PATCH] drm/xe/guc: Add XE_LP steered register lists

2024-01-16 Thread Zhanjun Dong
Add the ability for runtime allocation and freeing of steered register list extensions that depend on the detected HW config fuses. Signed-off-by: Zhanjun Dong --- drivers/gpu/drm/xe/xe_guc_capture.c | 187 +++- 1 file changed, 185 insertions(+), 2 deletions(-) diff

[PATCH] drm/xe/guc: Check sizing of guc_capture output

2024-01-16 Thread Zhanjun Dong
Add capture output size check function to provide a reasonable minimum size for error capture region before allocating the shared buffer. Signed-off-by: Zhanjun Dong --- drivers/gpu/drm/xe/xe_guc_capture.c | 76 + 1 file changed, 76 insertions(+) diff --git

[PATCH] drm/xe/guc: Expose dss per group for GuC error capture

2024-01-16 Thread Zhanjun Dong
Expose a helper for dss per group of mcr; the GuC error capture feature needs this info to prepare the required buffer. Signed-off-by: Zhanjun Dong --- drivers/gpu/drm/xe/xe_gt_mcr.c | 4 ++-- drivers/gpu/drm/xe/xe_gt_mcr.h | 1 + drivers/gpu/drm/xe/xe_gt_topology.c | 3

[PATCH] drm/xe/guc: Update GuC ADS size for error capture

2024-01-16 Thread Zhanjun Dong
for every engine-class type on the current hardware. Ensure we allocate a persistent store for the register lists that are populated into ADS so that we don't need to allocate memory during GT resets when GuC is reloaded and ADS population happens again. Signed-off-by: Zhanjun Dong --- drivers/gpu

[PATCH] drm/xe/guc: Add capture size check in GuC log buffer

2024-01-16 Thread Zhanjun Dong
The capture nodes are included in the GuC log buffer; add a size check for the capture region within the whole GuC log buffer. Signed-off-by: Zhanjun Dong --- drivers/gpu/drm/xe/xe_gt_printk.h | 3 + drivers/gpu/drm/xe/xe_guc_fwif.h | 48 +++ drivers/gpu/drm/xe/xe_guc_log.c | 179

[PATCH] drm/xe/guc: Pre-allocate output nodes for extraction

2024-01-16 Thread Zhanjun Dong
Pre-allocate a fixed number of empty nodes up front (at the time of ADS registration) that we can consume from or return to an internal cached list of nodes. Signed-off-by: Zhanjun Dong --- drivers/gpu/drm/xe/xe_guc_capture.c | 83 + 1 file changed, 83 insertions
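The pre-allocation idea in the snippet above can be illustrated with a small free-list cache. This is only a sketch in plain userspace C, not the patch's code: `POOL_SIZE`, `struct capture_node`, `struct node_cache`, and the `cache_*` helpers are hypothetical names, and the real node count and layout are not shown in this listing.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4 /* hypothetical count; the patch's real value is not shown */

/* Hypothetical stand-in for an error-capture output node. */
struct capture_node {
    struct capture_node *next;
    int in_use;
};

struct node_cache {
    struct capture_node storage[POOL_SIZE]; /* set up once, up front */
    struct capture_node *free_list;         /* nodes currently available */
};

/* Build the free list once so later extraction needs no allocation. */
static void cache_init(struct node_cache *c)
{
    c->free_list = NULL;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        c->storage[i].in_use = 0;
        c->storage[i].next = c->free_list;
        c->free_list = &c->storage[i];
    }
}

/* Consume a node from the cache; NULL means the pool is exhausted. */
static struct capture_node *cache_get(struct node_cache *c)
{
    struct capture_node *n = c->free_list;

    if (!n)
        return NULL;
    c->free_list = n->next;
    n->next = NULL;
    n->in_use = 1;
    return n;
}

/* Return a previously consumed node to the cache for reuse. */
static void cache_put(struct node_cache *c, struct capture_node *n)
{
    assert(n && n->in_use);
    n->in_use = 0;
    n->next = c->free_list;
    c->free_list = n;
}
```

Because the nodes live in fixed storage, `cache_get` never allocates, which is the property that matters when extraction runs in a context where allocation is undesirable.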

[PATCH] drm/xe/guc: Plumb GuC-capture into dev coredump

2024-01-16 Thread Zhanjun Dong
. This is reserved for future. Signed-off-by: Zhanjun Dong --- drivers/gpu/drm/xe/xe_guc_capture.c | 99 ++- drivers/gpu/drm/xe/xe_guc_capture.h | 10 +++ drivers/gpu/drm/xe/xe_hw_engine.c | 73 - drivers/gpu/drm/xe/xe_hw_engine_types.h | 103

[PATCH] drm/xe/guc: Add register defines for GuC based register capture

2024-01-16 Thread Zhanjun Dong
Add register defines and a list of registers for GuC based error state capture. Signed-off-by: Zhanjun Dong --- drivers/gpu/drm/xe/Kconfig | 11 +++ drivers/gpu/drm/xe/Makefile | 1 + drivers/gpu/drm/xe/regs/xe_engine_regs.h | 12 +++ drivers/gpu/drm/xe/regs

[PATCH v2] drm/xe/guc: Add GuC based register capture for error capture

2024-01-16 Thread Zhanjun Dong
. Signed-off-by: Zhanjun Dong Zhanjun Dong (9): drm/xe/guc: Add register defines for GuC based register capture drm/xe/guc: Expose dss per group for GuC error capture drm/xe/guc: Update GuC ADS size for error capture drm/xe/guc: Add XE_LP steered register lists drm/xe/guc: Add capture size

[PATCH v3] drm/i915: Skip pxp init if gt is wedged

2023-11-13 Thread Zhanjun Dong
The gt wedged could be triggered by missing guc firmware file, HW not working, etc. Once triggered, it means all gt usage is dead, therefore we can't enable pxp under this fatal error condition. v2: Updated commit message. v3: Updated return code check. Signed-off-by: Zhanjun Dong --- drivers
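The early-exit described in this commit message might look roughly like the following. It is a minimal userspace sketch under stated assumptions: `struct gt`, `struct pxp`, `pxp_init`, and `ERR_GT_WEDGED` are hypothetical names, and the return value the real patch uses is not visible in this listing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ERR_GT_WEDGED (-1) /* hypothetical error code, not the patch's value */

/* Hypothetical stand-ins for the driver's GT and PXP state. */
struct gt { bool wedged; };
struct pxp { bool initialized; };

/* Refuse to bring up PXP when the GT is already in the fatal wedged state. */
static int pxp_init(const struct gt *gt, struct pxp *pxp)
{
    assert(gt != NULL && pxp != NULL);

    if (gt->wedged)
        return ERR_GT_WEDGED; /* skip init: no GT usage is possible */

    pxp->initialized = true;
    return 0;
}
```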

[PATCH] drm/i915: Skip pxp init if gt is wedged

2023-11-01 Thread Zhanjun Dong
The gt wedged could be triggered by missing guc firmware file, HW not working, etc. Once triggered, it means all gt usage is dead, therefore we can't enable pxp under this fatal error condition. v2: Updated commit message. Signed-off-by: Zhanjun Dong --- drivers/gpu/drm/i915/pxp/intel_pxp.c

[PATCH] drm/i915: Skip pxp init if gt is wedged

2023-10-26 Thread Zhanjun Dong
gt wedged is a fatal error; skip the pxp init in this situation. Signed-off-by: Zhanjun Dong --- drivers/gpu/drm/i915/pxp/intel_pxp.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp.c b/drivers/gpu/drm/i915/pxp/intel_pxp.c index dc327cf40b5a

[PATCH v5] drm/i915: Avoid circular locking dependency when flush delayed work on gt reset

2023-08-11 Thread Zhanjun Dong
intel_gt_reset called, reset_in_progress flag will be set, add code to check the flag, call async version if reset is in progress. Signed-off-by: Zhanjun Dong Cc: John Harrison Cc: Andi Shyti Cc: Daniel Vetter --- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 11 ++- 1 file
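The flag check described above could be sketched as a small decision function. This is an illustration only: `struct gt_reset_state`, `enum cancel_mode`, and `pick_cancel_mode` are hypothetical names, and the actual patch is truncated in this listing. The assumption is that a synchronous cancel waits for the worker to finish, so it is avoided while the reset path holds its lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: how the delayed worker should be cancelled. */
enum cancel_mode { CANCEL_SYNC, CANCEL_ASYNC };

/* Hypothetical stand-in for the GT reset bookkeeping. */
struct gt_reset_state { bool reset_in_progress; };

/*
 * While a reset is in progress, choose the asynchronous cancel so the
 * caller does not wait on a worker that may need a lock the reset path
 * already holds. Otherwise the synchronous cancel is safe.
 */
static enum cancel_mode pick_cancel_mode(const struct gt_reset_state *s)
{
    assert(s != NULL);
    return s->reset_in_progress ? CANCEL_ASYNC : CANCEL_SYNC;
}
```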

[PATCH v4] drm/i915: Avoid circular locking dependency when flush delayed work on gt reset

2023-07-27 Thread Zhanjun Dong
set path calls asynchronous cancel. v4: Set to always sync from __uc_fini_hw path. Signed-off-by: Zhanjun Dong Cc: John Harrison Cc: Andi Shyti --- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 17 ++--- .../gpu/drm/i915/gt/uc/intel_guc_submission.h | 2 +- drivers/gpu/dr

[PATCH] drm/i915/mtl: Update cache coherency setting for context structure

2023-07-06 Thread Zhanjun Dong
As context structure is shared memory for CPU/GPU, Wa_22016122933 is needed for this memory block as well. Signed-off-by: Zhanjun Dong CC: Fei Yang --- drivers/gpu/drm/i915/gt/intel_lrc.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt

[PATCH] drm/i915/gt: Remove incorrect hard coded cache coherency setting

2023-06-22 Thread Zhanjun Dong
The previous i915_gem_object_create_internal call already set it to the proper value before returning. This hard coded setting is incorrect for platforms like MTL, thus needs to be removed. Signed-off-by: Zhanjun Dong --- drivers/gpu/drm/i915/gt/intel_timeline.c | 2 -- 1 file changed, 2

[PATCH] drm/i915/gt: Remove incorrect hard coded cache coherency setting

2023-06-16 Thread Zhanjun Dong
The previous i915_gem_object_create_internal call already set it to the proper value before returning. This hard coded setting is incorrect for platforms like MTL, thus needs to be removed. Signed-off-by: Zhanjun Dong --- drivers/gpu/drm/i915/gt/intel_timeline.c | 2 -- 1 file changed, 2

[PATCH] Remove incorrect hard coded cache coherency setting

2023-06-15 Thread Zhanjun Dong
The previous i915_gem_object_create_internal call already set it to the proper value before returning. This hard coded setting is incorrect for platforms like MTL, thus needs to be removed. Signed-off-by: Zhanjun Dong --- drivers/gpu/drm/i915/gt/intel_timeline.c | 2 -- 1 file changed, 2

[PATCH v3] drm/i915: Avoid circular locking dependency when flush delayed work on gt reset

2023-06-15 Thread Zhanjun Dong
}, at: simple_attr_write_xsigned.constprop.0+0x47/0x110 #2: 88813e6cce90 (&gt->reset.mutex){+.+.}-{3:3}, at: intel_gt_reset+0x19e/0x470 [i915] v2: Add sync flag to guc_cancel_busyness_worker to ensure reset path calls asynchronous cancel. v3: Add sync flag to intel_guc_submission_disable to ensure

[PATCH] drm/i915: Avoid circular locking dependency when flush delayed work on gt reset

2023-06-07 Thread Zhanjun Dong
}, at: simple_attr_write_xsigned.constprop.0+0x47/0x110 #2: 88813e6cce90 (&gt->reset.mutex){+.+.}-{3:3}, at: intel_gt_reset+0x19e/0x470 [i915] Signed-off-by: Zhanjun Dong --- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) diff

[PATCH] drm/i915: Avoid circular locking dependency when flush delayed work on gt reset

2023-06-05 Thread Zhanjun Dong
;mutex){+.+.}-{3:3}, at: simple_attr_write_xsigned.constprop.0+0x47/0x110 #2: 88813e6cce90 (&gt->reset.mutex){+.+.}-{3:3}, at: intel_gt_reset+0x19e/0x470 [i915] Signed-off-by: Zhanjun Dong --- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --

[PATCH] drm/i915/guc: Set wedged if enable guc communication failed

2023-04-26 Thread Zhanjun Dong
Add an error code check for enable_communication on the resume path. When resume fails, we can no longer use the GPU, so mark the GPU as wedged. Signed-off-by: Zhanjun Dong --- drivers/gpu/drm/i915/gt/intel_gt_pm.c | 7 ++- drivers/gpu/drm/i915/gt/intel_reset.c | 19 --- drivers

[PATCH] drm/i915: Set wedged if enable guc communication failed

2023-03-02 Thread Zhanjun Dong
Add an error code check for enable_communication on the resume path. When resume fails, we can no longer use the GPU, so mark the GPU as wedged. Signed-off-by: Zhanjun Dong --- drivers/gpu/drm/i915/gt/intel_gt_pm.c | 7 ++- drivers/gpu/drm/i915/gt/uc/intel_uc.c | 9 +++-- 2 files changed, 13

[PATCH] drm/i915: Set wedged if enable guc communication failed

2023-02-24 Thread Zhanjun Dong
Add an error code check for enable_communication on the resume path; set wedged if it fails. Signed-off-by: Zhanjun Dong --- drivers/gpu/drm/i915/gt/intel_gt_pm.c | 5 - drivers/gpu/drm/i915/gt/uc/intel_uc.c | 9 +++-- 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers/gpu

[PATCH] drm/i915/guc: Check for ct enabled while waiting for response

2022-07-16 Thread Zhanjun Dong
or message into debug message. Signed-off-by: Zhanjun Dong --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 27 +-- 1 file changed, 20 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index f0

[PATCH] drm/i915/guc: Check for ct enabled while waiting for response

2022-06-16 Thread Zhanjun Dong
or message into debug message. Signed-off-by: Zhanjun Dong --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 24 --- 1 file changed, 17 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index f0

[PATCH] drm/i915/guc: Check ctx while waiting for response

2022-06-02 Thread Zhanjun Dong
or message into debug message. Signed-off-by: Zhanjun Dong --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 21 ++--- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index f01325cd1b62..a3