Re: [PATCH v3 0/5] drm/amdgpu: Add new reset option and rework coredump

2023-07-28 Thread André Almeida

Hi Christian, gentle ping here.


[PATCH v3 0/5] drm/amdgpu: Add new reset option and rework coredump

2023-07-14 Thread André Almeida
Hi,

The goal of this patchset is to make device resets on amdgpu easier to debug.

The first patch creates a new module parameter to disable soft recoveries,
ensuring that every recovery goes through a full device reset. This makes it
easier to generate resets from userspace tools like [0] and [1], which is
important for validating how the stack behaves on resets, end to end.
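
To illustrate the intended behaviour, here is a minimal userspace sketch of
the decision the parameter influences. It is not the driver code: the names
disable_soft_recovery, recovery_kind and pick_recovery are hypothetical, and
the real parameter name and type are whatever patch 1 defines.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the new module parameter. The real name and
 * type come from patch 1 and are not shown in this cover letter.
 */
static bool disable_soft_recovery;

enum recovery_kind {
	RECOVERY_SOFT,       /* lightweight, per-ring recovery */
	RECOVERY_FULL_RESET, /* full device reset */
};

/*
 * soft_recovery_ok: whether a soft recovery would have succeeded.
 * With the parameter set, the soft path is never taken, so every
 * hang ends in a full device reset.
 */
static enum recovery_kind pick_recovery(bool soft_recovery_ok)
{
	if (!disable_soft_recovery && soft_recovery_ok)
		return RECOVERY_SOFT;

	return RECOVERY_FULL_RESET;
}
```

With the flag cleared, pick_recovery(true) takes the soft path; with it set,
the same call ends in a full reset, which is what the userspace tools need
in order to exercise the reset path reliably.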

The last patches rework devcoredump to allocate its memory dynamically and move
the code to a more suitable source file.
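
As a rough illustration of the dynamic allocation, the userspace sketch below
allocates the dump state on demand and skips the dump when the allocation
fails. The struct layout and function names are hypothetical, and calloc()
stands in for a kernel allocation with GFP_NOWAIT (the flag named in the
changelog); this is a sketch of the pattern, not the code in the patches.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical dump state; the real struct is defined by the patches. */
struct coredump_info {
	bool vram_lost;
	unsigned long reset_time_sec;
};

/*
 * Userspace stand-in for a nonblocking kernel allocation. In the driver
 * this would be an allocation using GFP_NOWAIT, which can fail instead
 * of sleeping. simulate_failure lets the failure path be exercised.
 */
static struct coredump_info *coredump_alloc(bool simulate_failure)
{
	if (simulate_failure)
		return NULL;

	return calloc(1, sizeof(struct coredump_info));
}

/*
 * Returns true if a dump was captured and false if it was skipped.
 * A failed allocation only loses the dump; the reset itself proceeds.
 */
static bool capture_coredump(bool vram_lost, unsigned long now_sec,
			     bool simulate_failure)
{
	struct coredump_info *info = coredump_alloc(simulate_failure);

	if (!info)
		return false;

	info->vram_lost = vram_lost;
	info->reset_time_sec = now_sec;

	/*
	 * In a driver, ownership of info would be handed to the coredump
	 * infrastructure at this point. This sketch frees it directly.
	 */
	free(info);
	return true;
}
```

The point of the pattern is that the dump state only exists while a dump is
in flight, and that an allocation failure degrades to "no dump" rather than
blocking the reset.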

I have dropped the patches that add more information to devcoredump for now,
until I figure out a better way to do so, like storing the IB address in the
fence.

Thanks,
André

[0] https://gitlab.freedesktop.org/andrealmeid/gpu-timeout
[1] https://github.com/andrealmeid/vulkan-triangle-v1

Changelog:

v2: https://lore.kernel.org/dri-devel/20230713213242.680944-1-andrealm...@igalia.com/
 - Drop the IB and ring patch
 - Drop patch that limited information from kernel threads
 - Add patch to move coredump to amdgpu_reset

v1: https://lore.kernel.org/dri-devel/20230711213501.526237-1-andrealm...@igalia.com/
 - Drop "Mark contexts guilty for causing soft recoveries" patch
 - Use GFP_NOWAIT for devcoredump allocation

André Almeida (5):
  drm/amdgpu: Create a module param to disable soft recovery
  drm/amdgpu: Allocate coredump memory in a nonblocking way
  drm/amdgpu: Rework coredump to use memory dynamically
  drm/amdgpu: Move coredump code to amdgpu_reset file
  drm/amdgpu: Create version number for coredumps

 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 67 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  9 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c  | 79 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h  | 14
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   |  6 +-
 6 files changed, 111 insertions(+), 70 deletions(-)

-- 
2.41.0