[PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset v4

2020-05-13 Thread jianzh
From: Jiange Zhao 

When the GPU gets a timeout, it notifies an interested party
of an opportunity to dump info before the actual GPU reset.

A usermode app opens the 'autodump' node under debugfs
and poll()s for readable/writable. When a GPU reset is due,
amdgpu notifies the usermode app through the wait_queue_head and gives
it 10 minutes to dump info.

After the usermode app has done its work, this 'autodump' node is closed.
On node closure, amdgpu gets to know the dump is done through
the completion that is triggered in release().

There is no write or read callback because the necessary info can be
obtained through dmesg and umr; messages back and forth between the
usermode app and amdgpu are unnecessary.

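For illustration, a minimal userspace consumer of this node could look
like the sketch below. This is not part of the patch; the debugfs path
and the error handling are assumptions.

  /* Hypothetical userspace sketch; assumes the node is exposed as
   * /sys/kernel/debug/dri/0/amdgpu_autodump. */
  #include <fcntl.h>
  #include <poll.h>
  #include <unistd.h>

  int main(void)
  {
          struct pollfd pfd = { .events = POLLIN | POLLRDNORM };

          pfd.fd = open("/sys/kernel/debug/dri/0/amdgpu_autodump", O_RDONLY);
          if (pfd.fd < 0)
                  return 1;

          /* blocks until amdgpu wakes gpu_hang ahead of a reset */
          if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLIN)) {
                  /* ... dump state via dmesg/umr here ... */
          }

          /* close() runs release(), completing autodump.dumping and
           * letting the pending GPU reset proceed */
          close(pfd.fd);
          return 0;
  }
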
v2: (1) changed 'registered' to 'app_listening'
(2) add a mutex in open() to prevent race condition

v3 (chk): grab the reset lock to avoid race in autodump_open,
  rename debugfs file to amdgpu_autodump,
  provide autodump_read as well,
  style and code cleanups

v4: add 'bool app_listening' to differentiate situations, so that
the node can be reopened; also, there is no need to wait for
completion when no app is waiting for a dump.

v5: change 'bool app_listening' to 'enum amdgpu_autodump_state'
add 'app_state_mutex' for race conditions:
(1) only 1 user can open this file node
(2) wait_dump() can only take effect after poll() is executed
(3) eliminated the race condition between release() and
    wait_dump()

v6: removed 'enum amdgpu_autodump_state' and 'app_state_mutex'
removed state checking in amdgpu_debugfs_wait_dump
Improve on top of version 3 so that the node can be reopened.

v7: move reinit_completion into open() so that only one user
can open it.

Signed-off-by: Jiange Zhao 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 79 +++++++++++++++++++++++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h |  6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  2 +
 4 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 2a806cb55b78..9e8eeddfe7ce 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -992,6 +992,8 @@ struct amdgpu_device {
	char	product_number[16];
	char	product_name[32];
	char	serial[16];
+
+   struct amdgpu_autodump  autodump;
 };
 
 static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 1a4894fa3693..efee3f1adecf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -27,7 +27,7 @@
 #include <linux/kthread.h>
 #include <linux/pci.h>
 #include <linux/uaccess.h>
-
+#include <linux/poll.h>
 #include <drm/drm_debugfs.h>
 
 #include "amdgpu.h"
@@ -74,8 +74,83 @@ int amdgpu_debugfs_add_files(struct amdgpu_device *adev,
return 0;
 }
 
+int amdgpu_debugfs_wait_dump(struct amdgpu_device *adev)
+{
+#if defined(CONFIG_DEBUG_FS)
+   unsigned long timeout = 600 * HZ;
+   int ret;
+
+   wake_up_interruptible(&adev->autodump.gpu_hang);
+
+   ret = wait_for_completion_interruptible_timeout(&adev->autodump.dumping,
+                                                    timeout);
+   complete_all(&adev->autodump.dumping);
+   if (ret == 0) {
+   pr_err("autodump: timeout, move on to gpu recovery\n");
+   return -ETIMEDOUT;
+   }
+#endif
+   return 0;
+}
+
 #if defined(CONFIG_DEBUG_FS)
 
+static int amdgpu_debugfs_autodump_open(struct inode *inode, struct file *file)
+{
+   struct amdgpu_device *adev = inode->i_private;
+   int ret;
+
+   file->private_data = adev;
+
+   mutex_lock(&adev->lock_reset);
+   if (adev->autodump.dumping.done) {
+   reinit_completion(&adev->autodump.dumping);
+   ret = 0;
+   } else {
+   ret = -EBUSY;
+   }
+   mutex_unlock(&adev->lock_reset);
+
+   return ret;
+}
+
+static int amdgpu_debugfs_autodump_release(struct inode *inode, struct file 
*file)
+{
+   struct amdgpu_device *adev = file->private_data;
+
+   complete_all(&adev->autodump.dumping);
+   return 0;
+}
+
+static unsigned int amdgpu_debugfs_autodump_poll(struct file *file, struct 
poll_table_struct *poll_table)
+{
+   struct amdgpu_device *adev = file->private_data;
+
+   poll_wait(file, &adev->autodump.gpu_hang, poll_table);
+
+   if (adev->in_gpu_reset)
+   return POLLIN | POLLRDNORM | POLLWRNORM;
+
+   return 0;
+}
+
+static const struct file_operations autodump_debug_fops = {
+   .owner = THIS_MODULE,
+   .open = amdgpu_debugfs_autodump_open,
+   .poll = amdgpu_debugfs_autodump_poll,
+   .release = amdgpu_debugfs_autodump_release,
+};
+
+static void amdgpu_debugfs_autodump_init(struct amdgpu_device *adev)
+{
+   init_completion(&adev->autodump.dumping);
+   complete_all(&adev->autodump.dumping);
+   init_waitqueue_head(&adev->autodump.gpu_hang);
+
+   debugfs_create_file("amdgpu_autodump", 0600,
+       adev->ddev->primary->debugfs_root,
+       adev, &autodump_debug_fops);
+}

[pull] amdgpu, amdkfd drm-fixes-5.7

2020-05-13 Thread Alex Deucher
Hi Dave, Daniel,

Fixes for 5.7.

The following changes since commit a9fe6f18cde03c20facbf75dc910a372c1c1025b:

  Merge tag 'drm-misc-fixes-2020-05-07' of 
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes (2020-05-08 15:04:25 
+1000)

are available in the Git repository at:

  git://people.freedesktop.org/~agd5f/linux tags/amd-drm-fixes-5.7-2020-05-13

for you to fetch changes up to 650e723cecf2738dee828564396f3239829aba83:

  drm/amd/amdgpu: Update update_config() logic (2020-05-12 08:40:06 -0400)


amd-drm-fixes-5.7-2020-05-13:

amdgpu:
- Clockgating fixes
- Fix fbdev with scatter/gather display
- S4 fix for navi
- Soft recovery for gfx10
- Freesync fixes
- Atomic check cursor fix
- Add a gfxoff quirk
- MST fix

amdkfd:
- Fix GEM reference counting


Alex Deucher (2):
  drm/amdgpu: force fbdev into vram
  drm/amdgpu: implement soft_recovery for gfx10

Evan Quan (4):
  drm/amdgpu: disable MGCG/MGLS also on gfx CG ungate
  drm/amdgpu: drop unnecessary cancel_delayed_work_sync on PG ungate
  drm/amd/powerplay: perform PG ungate prior to CG ungate
  drm/amdgpu: enable hibernate support on Navi1X

Felix Kuehling (1):
  drm/amdgpu: Use GEM obj reference for KFD BOs

Leo (Hanghong) Ma (1):
  drm/amd/amdgpu: Update update_config() logic

Nicholas Kazlauskas (1):
  drm/amd/display: Fix vblank and pageflip event handling for FreeSync

Simon Ser (1):
  drm/amd/display: add basic atomic check for cursor plane

Tom St Denis (1):
  drm/amd/amdgpu: add raven1 part to the gfxoff quirk list

 drivers/gpu/drm/amd/amdgpu/amdgpu.h|   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c   |   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c |   3 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c |  22 ++-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  |  14 +-
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c  | 163 ++---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.c |  10 +-
 drivers/gpu/drm/amd/powerplay/amd_powerplay.c  |   6 +-
 drivers/gpu/drm/amd/powerplay/amdgpu_smu.c |   8 +-
 10 files changed, 119 insertions(+), 115 deletions(-)


RE: [PATCH] drm/amdgpu: Updated XGMI power down control support check

2020-05-13 Thread Zhang, Hawking
[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Hawking Zhang 

Regards,
Hawking
From: Clements, John 
Sent: Thursday, May 14, 2020 11:23
To: amd-gfx@lists.freedesktop.org; Zhang, Hawking 
Subject: [PATCH] drm/amdgpu: Updated XGMI power down control support check


[AMD Official Use Only - Internal Distribution Only]

Updated SMC FW version check to determine if XGMI power down control is 
supported


[PATCH] drm/amdgpu: Updated XGMI power down control support check

2020-05-13 Thread Clements, John
[AMD Official Use Only - Internal Distribution Only]

Updated SMC FW version check to determine if XGMI power down control is 
supported


0001-drm-amdgpu-Updated-XGMI-power-down-control-support-c.patch
Description: 0001-drm-amdgpu-Updated-XGMI-power-down-control-support-c.patch


Re: [PATCH] drm/amdkfd: Provide SMI events watch

2020-05-13 Thread Felix Kuehling
On 2020-05-13 at 3:41 p.m., Amber Lin wrote:
> When the compute is malfunctioning or performance drops, the system admin
> will use the SMI (System Management Interface) tool to monitor/diagnose
> what went wrong. This patch provides an event watch interface for the
> user space to register devices and subscribe to events they are
> interested in. After registering, the user can use the anonymous file
> descriptor's poll function with a wait-time specified and wait for events
> to happen. Once an event happens, the user can use read() to retrieve
> information related to the event.
>
> VM fault event is done in this patch.
>
> v2: - remove UNREGISTER and add event ENABLE/DISABLE
> - correct kfifo usage
> - move event message API to kfd_ioctl.h
> v3: send the event msg in text rather than in binary
> v4: support multiple clients
> v5: move events enablement from ioctl to fd write
>
> Signed-off-by: Amber Lin 

Reviewed-by: Felix Kuehling 



[PATCH] drm/amdkfd: Provide SMI events watch

2020-05-13 Thread Amber Lin
When the compute is malfunctioning or performance drops, the system admin
will use the SMI (System Management Interface) tool to monitor/diagnose what
went wrong. This patch provides an event watch interface for the user
space to register devices and subscribe to events they are interested in.
After registering, the user can use the anonymous file descriptor's poll
function with a wait-time specified and wait for events to happen. Once an
event happens, the user can use read() to retrieve information related to
the event.

VM fault event is done in this patch.
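
For illustration only (not part of this patch), a client might drive the
new interface roughly as sketched below; the args layout,
AMDKFD_IOC_SMI_EVENTS and KFD_SMI_EVENT_VMFAULT are assumed to match the
kfd_ioctl.h additions in this series, the rest is guesswork.

  #include <fcntl.h>
  #include <poll.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <unistd.h>
  #include <linux/kfd_ioctl.h>

  /* hypothetical sketch: watch VM fault events on one GPU */
  int watch_vm_faults(uint32_t gpuid)
  {
          struct kfd_ioctl_smi_events_args args = { .gpuid = gpuid };
          uint64_t mask = KFD_SMI_EVENT_VMFAULT;
          struct pollfd pfd = { .events = POLLIN };
          char msg[128];
          ssize_t n;
          int kfd;

          kfd = open("/dev/kfd", O_RDWR);
          if (kfd < 0)
                  return -1;
          if (ioctl(kfd, AMDKFD_IOC_SMI_EVENTS, &args) < 0) {
                  close(kfd);
                  return -1;
          }

          pfd.fd = args.anon_fd;
          /* v5: events are enabled by writing the mask to the anonymous fd */
          if (write(pfd.fd, &mask, sizeof(mask)) == sizeof(mask) &&
              poll(&pfd, 1, -1) > 0) {
                  n = read(pfd.fd, msg, sizeof(msg) - 1); /* text message */
                  if (n > 0) {
                          msg[n] = '\0';
                          printf("%s\n", msg);
                  }
          }
          close(pfd.fd);
          close(kfd);
          return 0;
  }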

v2: - remove UNREGISTER and add event ENABLE/DISABLE
- correct kfifo usage
- move event message API to kfd_ioctl.h
v3: send the event msg in text rather than in binary
v4: support multiple clients
v5: move events enablement from ioctl to fd write

Signed-off-by: Amber Lin 
---
 drivers/gpu/drm/amd/amdkfd/Makefile  |   1 +
 drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c |   2 +
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  18 ++
 drivers/gpu/drm/amd/amdkfd/kfd_device.c  |   7 +
 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c  |   2 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|   4 +
 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c  | 214 +++
 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h  |  29 +++
 include/uapi/linux/kfd_ioctl.h   |  16 +-
 9 files changed, 292 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h

diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile 
b/drivers/gpu/drm/amd/amdkfd/Makefile
index 6147462..e1e4115 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -53,6 +53,7 @@ AMDKFD_FILES  := $(AMDKFD_PATH)/kfd_module.o \
$(AMDKFD_PATH)/kfd_int_process_v9.o \
$(AMDKFD_PATH)/kfd_dbgdev.o \
$(AMDKFD_PATH)/kfd_dbgmgr.o \
+   $(AMDKFD_PATH)/kfd_smi_events.o \
$(AMDKFD_PATH)/kfd_crat.o
 
 ifneq ($(CONFIG_AMD_IOMMU_V2),)
diff --git a/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c 
b/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
index 9f59ba9..24b4717 100644
--- a/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
+++ b/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
@@ -24,6 +24,7 @@
 #include "kfd_events.h"
 #include "cik_int.h"
 #include "amdgpu_amdkfd.h"
+#include "kfd_smi_events.h"
 
 static bool cik_event_interrupt_isr(struct kfd_dev *dev,
const uint32_t *ih_ring_entry,
@@ -107,6 +108,7 @@ static void cik_event_interrupt_wq(struct kfd_dev *dev,
ihre->source_id == CIK_INTSRC_GFX_MEM_PROT_FAULT) {
struct kfd_vm_fault_info info;
 
+   kfd_smi_event_update_vmfault(dev, pasid);
kfd_process_vm_fault(dev->dqm, pasid);
 
	memset(&info, 0, sizeof(info));
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index cf0017f..e9b96ad 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -39,6 +39,7 @@
 #include "kfd_device_queue_manager.h"
 #include "kfd_dbgmgr.h"
 #include "amdgpu_amdkfd.h"
+#include "kfd_smi_events.h"
 
 static long kfd_ioctl(struct file *, unsigned int, unsigned long);
 static int kfd_open(struct inode *, struct file *);
@@ -1740,6 +1741,20 @@ static int kfd_ioctl_import_dmabuf(struct file *filep,
return r;
 }
 
+/* Handle requests for watching SMI events */
+static int kfd_ioctl_smi_events(struct file *filep,
+   struct kfd_process *p, void *data)
+{
+   struct kfd_ioctl_smi_events_args *args = data;
+   struct kfd_dev *dev;
+
+   dev = kfd_device_by_id(args->gpuid);
+   if (!dev)
+   return -EINVAL;
+
+   return kfd_smi_event_open(dev, &args->anon_fd);
+}
+
 #define AMDKFD_IOCTL_DEF(ioctl, _func, _flags) \
[_IOC_NR(ioctl)] = {.cmd = ioctl, .func = _func, .flags = _flags, \
.cmd_drv = 0, .name = #ioctl}
@@ -1835,6 +1850,9 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = {
 
AMDKFD_IOCTL_DEF(AMDKFD_IOC_ALLOC_QUEUE_GWS,
kfd_ioctl_alloc_queue_gws, 0),
+
+   AMDKFD_IOCTL_DEF(AMDKFD_IOC_SMI_EVENTS,
+   kfd_ioctl_smi_events, 0),
 };
 
 #define AMDKFD_CORE_IOCTL_COUNT	ARRAY_SIZE(amdkfd_ioctls)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 0491ab2..2c030c2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -586,6 +586,11 @@ static int kfd_gws_init(struct kfd_dev *kfd)
return ret;
 }
 
+static void kfd_smi_init(struct kfd_dev *dev) {
+   INIT_LIST_HEAD(&dev->smi_clients);
+   spin_lock_init(&dev->smi_lock);
+}
+
 bool kgd2kfd_device_init(struct kfd_dev *kfd,
   

Re: [PATCH 0/6] RFC Support hot device unplug in amdgpu

2020-05-13 Thread Daniel Vetter
On Wed, May 13, 2020 at 10:32:56AM -0400, Andrey Grodzovsky wrote:
> 
> On 5/11/20 5:54 AM, Daniel Vetter wrote:
> > On Sat, May 09, 2020 at 02:51:44PM -0400, Andrey Grodzovsky wrote:
> > > This RFC is more of a proof of concept than a fully working solution,
> > > as there are a few unresolved issues we are hoping to get advice on
> > > from people on the mailing list.
> > > Until now, extracting a card either by physical extraction (e.g. an
> > > eGPU with a thunderbolt connection) or by emulation through sysfs ->
> > > /sys/bus/pci/devices/device_id/remove
> > > would cause random crashes in user apps. The random crashes in apps
> > > were mostly due to an app that had mapped a device-backed BO into its
> > > address space still trying to access the BO while the backing device
> > > was gone.
> > > To address this first problem, Christian suggested fixing the handling
> > > of mapped memory in the clients when the device goes away by forcibly
> > > unmapping all buffers the user processes have, clearing their
> > > respective VMAs mapping the device BOs. Then, when the VMAs try to fill
> > > in the page tables again, we check in the fault handler
> > > if the device is removed and if so, return an error. This will generate
> > > a SIGBUS to the application, which can then cleanly terminate.
> > > This indeed was done, but it in turn created a problem of kernel OOPSes:
> > > while the app was terminating because of the SIGBUS,
> > > it would trigger a use-after-free in the driver by accessing device
> > > structures that were already released in the pci remove sequence.
> > > This we handled by introducing a 'flush' sequence during device removal
> > > where we wait for the drm file reference to drop to 0, meaning all user
> > > clients directly using this device have terminated.
> > > With this I was able to cleanly emulate device unplug with X and
> > > glxgears running, and later emulate device plug-back and restart of X
> > > and glxgears.
> > > 
> > > But this use case is only partial; as I see it, all the use cases are
> > > as follows, along with the questions they raise.
> > > 
> > > 1) Application accesses a BO by opening a drm file
> > >   1.1) BO is mapped into the application's address space (BO is CPU
> > > visible) - this one we have a solution for, by invalidating the BO's
> > > CPU mapping, causing SIGBUS
> > >        and termination, and waiting for the drm file refcount to drop
> > > to 0 before releasing the device
> > >   1.2) BO is not mapped into the application's address space (BO is CPU
> > > invisible) - no solution yet, because how do we force the application
> > > to terminate in this case?
> > > 
> > > 2) Application accesses a BO by importing a DMA-BUF
> > >   2.1) BO is mapped into the application's address space (BO is CPU
> > > visible) - the solution is the same as 1.1, but instead of waiting for
> > > the drm file release we wait for the
> > > imported dma-buf's file release
> > >   2.2) BO is not mapped into the application's address space (BO is CPU
> > > invisible) - our solution is to invalidate GPUVM page tables and
> > > destroy the backing storage for
> > >        all exported BOs, which will in turn cause VM faults in the
> > > importing device; then, when the importing driver tries to re-attach
> > > the imported BO to
> > > update mappings, we return -ENODEV in the import hook, which
> > > hopefully will cause the user app to terminate.
> > > 
> > > 3) Application opens a drm file or imports a dma-buf and holds a
> > > reference but never accesses any BO, or does but never again after the
> > > device was unplugged - how would we
> > > force this application to terminate before proceeding with the device
> > > removal code? Otherwise the wait in pci remove just hangs forever.
> > > 
> > > The attached patches address 1.1, 2.1 and 2.2; for now only 1.1 is
> > > fully tested and I am still testing the others, but I will be happy for
> > > any advice on all the
> > > described use cases and maybe some alternative and better (more
> > > generic) approach to this, like maybe obtaining PIDs of relevant
> > > processes through some reverse
> > > mapping from device file and exported dma-buf files and sending them
> > > SIGKILL - would this make more sense, or any other method?
> > > 
> > > Patches 1-3 address 1.1
> > > Patch 4 addresses 2.1
> > > Patches 5-6 address 2.2
> > > 
> > > Reference:
> > > https://gitlab.freedesktop.org/drm/amd/-/issues/1081
> > So we've been working on this problem for a few years already (but it's
> > still not solved), I think you could have saved yourselves some typing.
> > 
> > Bunch of things:
> > - we can't wait for userspace in the hotunplug 

Re: Reg. Adaptive Sync feature in xf86-video-amdgpu

2020-05-13 Thread uday kiran pichika
Hello Michel and Team,

Can you please provide the below details on the Adaptive Sync
verification?
1. As you mentioned on IRC, AMD has verified on an Ubuntu machine with the
Unity/Compiz compositor. But when I look at
mesa/src/util/00-mesa-defaults.conf, the Mutter and Compiz compositors are
blacklisted.

I even tried keeping the Ubuntu machine in runlevel 3 and launching Xorg
manually, running a gfx app in full-screen mode. In this case the
modesetting driver does not get loaded because there is no gdm3 service
running.

Can you please tell me what setup you followed to verify this?

Thanks
Uday Kiran





On Mon, Apr 20, 2020 at 10:18 PM Michel Dänzer  wrote:

> On 2020-04-20 6:45 p.m., uday kiran pichika wrote:
> > On Mon, Apr 20, 2020 at 9:45 PM Michel Dänzer 
> wrote:
> >> On 2020-04-20 6:04 p.m., uday kiran pichika wrote:
> >>>
> >>> Even in amdgpu_present_flip(), there is a check
> >>> for amdgpu_window_has_variable_refresh(), which checks whether the
> >>> window has a variable_refresh property set from Mesa or not. This
> >>> check is failing in my case and never calls
> >>> amdgpu_present_set_screen_vrr.
> >>
> >> This should be set by
> >>
> >> get_window_priv(window)->variable_refresh = variable_refresh;
> >>
> >> in amdgpu_vrr_property_update.
> >>
> >
> > The amdgpu_vrr_property_update method gets called from
> > amdgpu_change_property when the property is being added to the window.
> > Though the variable_refresh property is updated by the call below, this
> > window is not the same as the one (info->flip_window) in
> > amdgpu_present_flip.
> > *get_window_priv(window)*->variable_refresh = variable_refresh;
>
> Then it's probably not the application's window which is page flipping,
> but e.g. the compositor's. Make sure your compositor supports
> unredirecting fullscreen windows.
>
>
> > Could you please let me know your IRC ID so we can chat for more
> > information.
>
> I'm MrCooper on IRC.
>
>
> --
> Earthling Michel Dänzer   |   https://redhat.com
> Libre software enthusiast | Mesa and X developer
>


Re: [PATCH 0/6] RFC Support hot device unplug in amdgpu

2020-05-13 Thread Andrey Grodzovsky


On 5/11/20 5:54 AM, Daniel Vetter wrote:

So we've been working on this problem for a few years already (but it's
still not solved), I think you could have saved yourselves some typing.

Bunch of things:
- we can't wait for userspace in the hotunplug handlers, that might never
   happen. The correct way is to untangle the lifetime of your hw driver
   for a specific struct pci_device from the drm_device lifetime.
   Infrastructure is all there now, see drm_dev_get/put, drm_dev_unplug and
   drm_dev_enter/exit.
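
A minimal sketch of that pattern (illustrative, not taken from these
patches):

   #include <drm/drm_drv.h>

   static int touch_hw(struct drm_device *ddev)
   {
           int idx;

           if (!drm_dev_enter(ddev, &idx))
                   return -ENODEV; /* drm_dev_unplug() already ran */

           /* ... hardware access is safe inside this section ... */

           drm_dev_exit(idx);
           return 0;
   }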


this

To be sure I understood you - do you mean that we should 
disable/shutdown any HW related stuff such as interrupts disable, any 
shutdown related device registers programming and io regions unmapping 
during pci remove sequence (in our case 

[PATCH v5 20/38] drm: radeon: fix common struct sg_table related issues

2020-05-13 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However, the subsequent calls to dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg() must be made with the original number of entries
passed to dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.
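
An illustrative fragment (not from this patch) of the misuse and of the
wrapper that avoids it:

  #include <linux/dma-mapping.h>
  #include <linux/scatterlist.h>

  /* dma_map_sg() may coalesce entries, so the mapped count (nents) can
   * be smaller than the CPU page count (orig_nents). */
  static int map_buf_open_coded(struct device *dev, struct sg_table *sgt)
  {
          int n = dma_map_sg(dev, sgt->sgl, sgt->orig_nents,
                             DMA_BIDIRECTIONAL);

          if (n <= 0)
                  return -ENOMEM;
          sgt->nents = n;         /* easy to forget */
          return 0;
  }

  static void unmap_buf_open_coded(struct device *dev, struct sg_table *sgt)
  {
          /* must pass orig_nents, NOT the count dma_map_sg() returned */
          dma_unmap_sg(dev, sgt->sgl, sgt->orig_nents, DMA_BIDIRECTIONAL);
  }

  /* the sg_table wrapper hides the distinction entirely */
  static int map_buf_wrapped(struct device *dev, struct sg_table *sgt)
  {
          return dma_map_sgtable(dev, sgt, DMA_BIDIRECTIONAL, 0);
  }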

Signed-off-by: Marek Szyprowski 
Reviewed-by: Christian König 
---
For more information, see '[PATCH v5 00/38] DRM: fix struct sg_table nents
vs. orig_nents misuse' thread:
https://lore.kernel.org/linux-iommu/20200513132114.6046-1-m.szyprow...@samsung.com/T/
---
 drivers/gpu/drm/radeon/radeon_ttm.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 5d50c9e..0e3eb0d 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -481,7 +481,7 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm)
 {
struct radeon_device *rdev = radeon_get_rdev(ttm->bdev);
struct radeon_ttm_tt *gtt = (void *)ttm;
-   unsigned pinned = 0, nents;
+   unsigned pinned = 0;
int r;
 
int write = !(gtt->userflags & RADEON_GEM_USERPTR_READONLY);
@@ -521,9 +521,8 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm)
if (r)
goto release_sg;
 
-   r = -ENOMEM;
-   nents = dma_map_sg(rdev->dev, ttm->sg->sgl, ttm->sg->nents, direction);
-   if (nents == 0)
+   r = dma_map_sgtable(rdev->dev, ttm->sg, direction, 0);
+   if (r)
goto release_sg;
 
drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages,
@@ -554,9 +553,9 @@ static void radeon_ttm_tt_unpin_userptr(struct ttm_tt *ttm)
return;
 
/* free the sg table and pages again */
-   dma_unmap_sg(rdev->dev, ttm->sg->sgl, ttm->sg->nents, direction);
+   dma_unmap_sgtable(rdev->dev, ttm->sg, direction, 0);
 
-   for_each_sg_page(ttm->sg->sgl, &sg_iter, ttm->sg->nents, 0) {
+   for_each_sgtable_page(ttm->sg, &sg_iter, 0) {
	struct page *page = sg_page_iter_page(&sg_iter);
if (!(gtt->userflags & RADEON_GEM_USERPTR_READONLY))
set_page_dirty(page);
-- 
1.9.1



[PATCH v5 07/38] drm: amdgpu: fix common struct sg_table related issues

2020-05-13 Thread Marek Szyprowski
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However, the subsequent calls to dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg() must be made with the original number of entries
passed to dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, let's use the common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.
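
As a small illustration (not from this patch), the iterator wrappers
likewise pick the right entry count internally:

  #include <linux/scatterlist.h>

  /* for_each_sgtable_sg() walks the CPU entries (orig_nents), while
   * for_each_sgtable_dma_sg() walks the DMA-mapped entries (nents). */
  static size_t total_dma_len(struct sg_table *sgt)
  {
          struct scatterlist *sg;
          size_t total = 0;
          int i;

          for_each_sgtable_dma_sg(sgt, sg, i)
                  total += sg_dma_len(sg);

          return total;
  }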

Signed-off-by: Marek Szyprowski 
Reviewed-by: Christian König 
---
For more information, see '[PATCH v5 00/38] DRM: fix struct sg_table nents
vs. orig_nents misuse' thread:
https://lore.kernel.org/linux-iommu/20200513132114.6046-1-m.szyprow...@samsung.com/T/
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c  | 6 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c  | 9 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 8 
 3 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index 43d8ed7..519ce44 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -307,8 +307,8 @@ static struct sg_table *amdgpu_dma_buf_map(struct 
dma_buf_attachment *attach,
if (IS_ERR(sgt))
return sgt;
 
-   if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir,
- DMA_ATTR_SKIP_CPU_SYNC))
+   if (dma_map_sgtable(attach->dev, sgt, dir,
+   DMA_ATTR_SKIP_CPU_SYNC))
goto error_free;
break;
 
@@ -349,7 +349,7 @@ static void amdgpu_dma_buf_unmap(struct dma_buf_attachment 
*attach,
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
 
if (sgt->sgl->page_link) {
-   dma_unmap_sg(attach->dev, sgt->sgl, sgt->nents, dir);
+   dma_unmap_sgtable(attach->dev, sgt, dir, 0);
sg_free_table(sgt);
kfree(sgt);
} else {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 9cbecd5..57a5d56 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1024,7 +1024,6 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_tt *ttm)
 {
struct amdgpu_device *adev = amdgpu_ttm_adev(ttm->bdev);
struct amdgpu_ttm_tt *gtt = (void *)ttm;
-   unsigned nents;
int r;
 
int write = !(gtt->userflags & AMDGPU_GEM_USERPTR_READONLY);
@@ -1039,9 +1038,8 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_tt *ttm)
goto release_sg;
 
/* Map SG to device */
-   r = -ENOMEM;
-   nents = dma_map_sg(adev->dev, ttm->sg->sgl, ttm->sg->nents, direction);
-   if (nents == 0)
+   r = dma_map_sgtable(adev->dev, ttm->sg, direction, 0);
+   if (r)
goto release_sg;
 
/* convert SG to linear array of pages and dma addresses */
@@ -1072,8 +1070,7 @@ static void amdgpu_ttm_tt_unpin_userptr(struct ttm_tt 
*ttm)
return;
 
/* unmap the pages mapped to the device */
-   dma_unmap_sg(adev->dev, ttm->sg->sgl, ttm->sg->nents, direction);
-
+   dma_unmap_sgtable(adev->dev, ttm->sg, direction, 0);
sg_free_table(ttm->sg);
 
 #if IS_ENABLED(CONFIG_DRM_AMDGPU_USERPTR)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index d399e58..75495a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -477,11 +477,11 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
if (r)
goto error_free;
 
-   for_each_sg((*sgt)->sgl, sg, num_entries, i)
+   for_each_sgtable_sg(*sgt, sg, i)
sg->length = 0;
 
node = mem->mm_node;
-   for_each_sg((*sgt)->sgl, sg, num_entries, i) {
+   for_each_sgtable_sg(*sgt, sg, i) {
phys_addr_t phys = (node->start << PAGE_SHIFT) +
adev->gmc.aper_base;
size_t size = node->size << PAGE_SHIFT;
@@ -501,7 +501,7 @@ int amdgpu_vram_mgr_alloc_sgt(struct 

Re: [PATCH 1/2] drm/radeon: disable AGP by default

2020-05-13 Thread Mathieu Malaterre
On Wed, May 13, 2020 at 1:21 PM Christian König
 wrote:
>
> Always use the PCI GART instead.

Reviewed-by: Mathieu Malaterre 

> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/radeon/radeon_drv.c | 5 -
>  1 file changed, 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon_drv.c 
> b/drivers/gpu/drm/radeon/radeon_drv.c
> index bbb0883e8ce6..a71f13116d6b 100644
> --- a/drivers/gpu/drm/radeon/radeon_drv.c
> +++ b/drivers/gpu/drm/radeon/radeon_drv.c
> @@ -171,12 +171,7 @@ int radeon_no_wb;
>  int radeon_modeset = -1;
>  int radeon_dynclks = -1;
>  int radeon_r4xx_atom = 0;
> -#ifdef __powerpc__
> -/* Default to PCI on PowerPC (fdo #95017) */
>  int radeon_agpmode = -1;
> -#else
> -int radeon_agpmode = 0;
> -#endif
>  int radeon_vram_limit = 0;
>  int radeon_gart_size = -1; /* auto */
>  int radeon_benchmarking = 0;
> --
> 2.17.1
>


-- 
Mathieu


Re: [PATCH 2/2] drm/ttm: deprecate AGP support

2020-05-13 Thread Christian König

On 13.05.20 at 14:34, Daniel Vetter wrote:

On Wed, May 13, 2020 at 01:03:13PM +0200, Christian König wrote:

Even when core AGP support is compiled in, Radeon and
Nouveau can also work with the PCI GART.

The AGP support was notoriously unstable and hard to
maintain, so deprecate it for now and only enable it if
there is a good reason to do so.

Signed-off-by: Christian König 

So a lot more work, and more risk (but hey it's agp, how busted can it
get) could be to demidlayer this. I.e. a small set of helpers to create a
TTM_PL_TT manager, backed by agp. With zero agp code remaining in ttm
itself, and all the ttm agp code moved out to a ttm-agp-helper.ko module
that drivers would call.


Yes, exactly that's the idea which I have in mind for quite a while as well.

Problem is I have exactly one old x86 Mac to test this. Currently trying 
to get another old system up and running again.


That is not even remotely sufficient to test anything as large as this.

Regards,
Christian.



But again a lot of work, so really only an option if we can't sunset agp
directly.
-Daniel



Re: [PATCH 2/2] drm/ttm: deprecate AGP support

2020-05-13 Thread Daniel Vetter
On Wed, May 13, 2020 at 01:03:13PM +0200, Christian König wrote:
> Even when core AGP support is compiled in, Radeon and
> Nouveau can also work with the PCI GART.
> 
> The AGP support was notoriously unstable and hard to
> maintain, so deprecate it for now and only enable it if
> there is a good reason to do so.
> 
> Signed-off-by: Christian König 

So a lot more work, and more risk (but hey it's agp, how busted can it
get) could be to demidlayer this. I.e. a small set of helpers to create a
TTM_PL_TT manager, backed by agp. With zero agp code remaining in ttm
itself, and all the ttm agp code moved out to a ttm-agp-helper.ko module
that drivers would call.

But again a lot of work, so really only an option if we can't sunset agp
directly.
-Daniel


[RFC] Deprecate AGP GART support for Radeon/Nouveau/TTM

2020-05-13 Thread Christian König
Unfortunately, AGP is still too widely used for us to just drop support for
using its GART.

Not using the AGP GART also doesn't mean a loss in functionality, since
drivers will just fall back to the driver-specific PCI GART.

For now just deprecate the code and don't enable the AGP GART in TTM even when 
general AGP support is available.

Please comment,
Christian.




[PATCH 2/2] drm/ttm: deprecate AGP support

2020-05-13 Thread Christian König
Even when core AGP support is compiled in, Radeon and
Nouveau can also work with the PCI GART.

The AGP support was notoriously unstable and hard to
maintain, so deprecate it for now and only enable it if
there is a good reason to do so.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/Kconfig   |  8 
 drivers/gpu/drm/nouveau/nouveau_bo.c  |  8 
 drivers/gpu/drm/nouveau/nvkm/subdev/pci/agp.h |  2 +-
 drivers/gpu/drm/radeon/radeon_agp.c   |  8 
 drivers/gpu/drm/radeon/radeon_ttm.c   | 10 +-
 drivers/gpu/drm/ttm/Makefile  |  2 +-
 6 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 4f4e7fa001c1..52d834303766 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -182,6 +182,14 @@ config DRM_TTM
  GPU memory types. Will be enabled automatically if a device driver
  uses it.
 
+config DRM_TTM_AGP
+   bool "TTM AGP GART support (deprecated)"
+   depends on DRM_TTM && AGP
+   default n
+   help
+ Enables deprecated AGP GART support in TTM.
+ Less reliable than PCI GART, but faster in some cases.
+
 config DRM_TTM_DMA_PAGE_POOL
bool
depends on DRM_TTM && (SWIOTLB || INTEL_IOMMU)
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index c40f127de3d0..c73d4ae48f5c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -635,7 +635,7 @@ nouveau_bo_wr32(struct nouveau_bo *nvbo, unsigned index, 
u32 val)
 static struct ttm_tt *
 nouveau_ttm_tt_create(struct ttm_buffer_object *bo, uint32_t page_flags)
 {
-#if IS_ENABLED(CONFIG_AGP)
+#if IS_ENABLED(CONFIG_DRM_TTM_AGP)
struct nouveau_drm *drm = nouveau_bdev(bo->bdev);
 
if (drm->agp.bridge) {
@@ -1448,7 +1448,7 @@ nouveau_ttm_io_mem_reserve(struct ttm_bo_device *bdev, 
struct ttm_mem_reg *reg)
/* System memory */
return 0;
case TTM_PL_TT:
-#if IS_ENABLED(CONFIG_AGP)
+#if IS_ENABLED(CONFIG_DRM_TTM_AGP)
if (drm->agp.bridge) {
reg->bus.offset = reg->start << PAGE_SHIFT;
reg->bus.base = drm->agp.base;
@@ -1603,7 +1603,7 @@ nouveau_ttm_tt_populate(struct ttm_tt *ttm, struct 
ttm_operation_ctx *ctx)
drm = nouveau_bdev(ttm->bdev);
dev = drm->dev->dev;
 
-#if IS_ENABLED(CONFIG_AGP)
+#if IS_ENABLED(CONFIG_DRM_TTM_AGP)
if (drm->agp.bridge) {
return ttm_agp_tt_populate(ttm, ctx);
}
@@ -1656,7 +1656,7 @@ nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm)
drm = nouveau_bdev(ttm->bdev);
dev = drm->dev->dev;
 
-#if IS_ENABLED(CONFIG_AGP)
+#if IS_ENABLED(CONFIG_DRM_TTM_AGP)
if (drm->agp.bridge) {
ttm_agp_tt_unpopulate(ttm);
return;
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/pci/agp.h 
b/drivers/gpu/drm/nouveau/nvkm/subdev/pci/agp.h
index ad4d3621d02b..d572528da852 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/pci/agp.h
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pci/agp.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: MIT */
 #include "priv.h"
-#if defined(CONFIG_AGP) || (defined(CONFIG_AGP_MODULE) && defined(MODULE))
+#if defined(CONFIG_DRM_TTM_AGP)
 #ifndef __NVKM_PCI_AGP_H__
 #define __NVKM_PCI_AGP_H__
 
diff --git a/drivers/gpu/drm/radeon/radeon_agp.c 
b/drivers/gpu/drm/radeon/radeon_agp.c
index 0aca7bdf54c7..294d19301708 100644
--- a/drivers/gpu/drm/radeon/radeon_agp.c
+++ b/drivers/gpu/drm/radeon/radeon_agp.c
@@ -33,7 +33,7 @@
 
 #include "radeon.h"
 
-#if IS_ENABLED(CONFIG_AGP)
+#if IS_ENABLED(CONFIG_DRM_TTM_AGP)
 
 struct radeon_agpmode_quirk {
u32 hostbridge_vendor;
@@ -131,7 +131,7 @@ static struct radeon_agpmode_quirk 
radeon_agpmode_quirk_list[] = {
 
 int radeon_agp_init(struct radeon_device *rdev)
 {
-#if IS_ENABLED(CONFIG_AGP)
+#if IS_ENABLED(CONFIG_DRM_TTM_AGP)
struct radeon_agpmode_quirk *p = radeon_agpmode_quirk_list;
struct drm_agp_mode mode;
struct drm_agp_info info;
@@ -265,7 +265,7 @@ int radeon_agp_init(struct radeon_device *rdev)
 
 void radeon_agp_resume(struct radeon_device *rdev)
 {
-#if IS_ENABLED(CONFIG_AGP)
+#if IS_ENABLED(CONFIG_DRM_TTM_AGP)
int r;
if (rdev->flags & RADEON_IS_AGP) {
r = radeon_agp_init(rdev);
@@ -277,7 +277,7 @@ void radeon_agp_resume(struct radeon_device *rdev)
 
 void radeon_agp_fini(struct radeon_device *rdev)
 {
-#if IS_ENABLED(CONFIG_AGP)
+#if IS_ENABLED(CONFIG_DRM_TTM_AGP)
if (rdev->ddev->agp && rdev->ddev->agp->acquired) {
drm_agp_release(rdev->ddev);
}
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 5d50c9edbe80..4f9c4e5f8263 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -86,7 +86,7 @@ static int 

[PATCH 1/2] drm/radeon: disable AGP by default

2020-05-13 Thread Christian König
Always use the PCI GART instead.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/radeon/radeon_drv.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_drv.c 
b/drivers/gpu/drm/radeon/radeon_drv.c
index bbb0883e8ce6..a71f13116d6b 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -171,12 +171,7 @@ int radeon_no_wb;
 int radeon_modeset = -1;
 int radeon_dynclks = -1;
 int radeon_r4xx_atom = 0;
-#ifdef __powerpc__
-/* Default to PCI on PowerPC (fdo #95017) */
 int radeon_agpmode = -1;
-#else
-int radeon_agpmode = 0;
-#endif
 int radeon_vram_limit = 0;
 int radeon_gart_size = -1; /* auto */
 int radeon_benchmarking = 0;
-- 
2.17.1



Re: [Nouveau] [RFC] Remove AGP support from Radeon/Nouveau/TTM

2020-05-13 Thread Thomas Zimmermann
Hi

On 13.05.20 at 11:27, Emil Velikov wrote:
> On Tue, 12 May 2020 at 20:48, Alex Deucher  wrote:
> 
>
> There's some AGP support code in the DRM core. Can some of that be declared
> as legacy?
>
> Specifically, what about these AGP-related ioctl calls? Can they be
> declared as legacy? It would appear to me that KMS-based drivers don't
> have to manage AGP by themselves. (?)

 The old drm core AGP code is mainly (only?) for the non-KMS drivers.
 E.g., mach64, r128, sis, savage, etc.
>>>
>>> Exactly my point. There's one drm_agp_init() call left in radeon. The
>>> rest of the AGP code is apparently for legacy non-KMS drivers. Should
>>> the related ioctl calls be declared as legacy (i.e., listed with
>>> DRM_LEGACY_IOCTL_DEF instead of DRM_IOCTL_DEF)? If so, much of the AGP
>>> core code could probably be moved behind CONFIG_DRM_LEGACY as well.
>>
>> Ah, I forgot about drm_agp_init().  I was just thinking of the other AGP
>> stuff.  Yeah, I think we could make it legacy.
>>
> Fwiw I've got local patches a) removing drm_agp_init() from radeon and
> b) making the core drm code legacy only.
> Will try to finish them over the weekend and send out.
> 
> Even if AGP GART gets dropped the b) patches will be good ;-)

Fantastic! Please see one of my other comments in this thread. There's
still drm_agp_init() somewhere in radeon_drv.c. So patch a) might still
be useful.

Best regards
Thomas

> 
> -Emil
> 

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer





Re: [Nouveau] [RFC] Remove AGP support from Radeon/Nouveau/TTM

2020-05-13 Thread Thomas Zimmermann
Hi

On 11.05.20 at 19:17, Christian König wrote:
> Hi guys,
> 
> Well, let's face it: AGP is a total headache to maintain and has been dead
> for at least 10+ years.
> 
> We have a lot of x86 specific stuff in the architecture independent graphics 
> memory management to get the caching right, abusing the DMA API on multiple 
> occasions, need to distinct between AGP and driver specific page tables etc 
> etc...
> 
> So the idea here is to just go ahead and remove the support from Radeon and 
> Nouveau and then drop the necessary code from TTM.
> 
> For Radeon this means that we just switch over to the driver specific page 
> tables and everything should more or less continue to work.
> 
> For Nouveau I'm not 100% sure, but offhand from the code it looks like we
> can do it similarly to Radeon.
> 
> Please comment what you think about this.

It's probably not much of a problem in practice.

I guess everyone who plays 3d games has upgraded to something newer.

Wrt desktops, I found that some components of modern desktops (KDE,
Gnome) now have a hard requirement on SSE2. [1][2] But AGP is mostly
used in old 32-bit systems, which don't have SSE2.* So remaining users
of the GART functionality probably don't use any of these desktops.

The simpler WMs are usually usable with only little VRAM. At least they
should not be hit by any performance penalty.

Best regards
Thomas

[1] https://bugzilla.opensuse.org/show_bug.cgi?id=1162283
[2] https://bugzilla.opensuse.org/show_bug.cgi?id=1077870

* First-generation Athlon 64 systems have SSE2 and AGP support, but there
are few of them; around that time AGP was being replaced by PCIe.

> 
> Regards,
> Christian.
> 
> 

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer





Re: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset v4

2020-05-13 Thread Christian König

> Since usermode app might open a file, do nothing and close it.

That case is unproblematic, since closing the debugfs file sets the state
of the struct completion to completed again no matter if we waited or not.


But when you don't reset in the open() callback, we open a small window
between open() and poll() where userspace could open the debugfs file twice.


Regards,
Christian.

On 13.05.20 at 11:37, Zhao, Jiange wrote:

[AMD Official Use Only - Internal Distribution Only]

Hi Christian,

amdgpu_debugfs_wait_dump() needs 'autodump.dumping.done == 0' to actually
stop and wait for the usermode app to dump.

Since a usermode app might open the file, do nothing and close it, I believe
the poll() function is a better indicator that the usermode app actually
wants to do a dump.

Also, a reset might happen between open() and poll(). In the worst case,
wait_dump() would wait until timeout and the usermode poll would always fail.

Jiange
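
For reference, a minimal sketch of the usermode side described above. The
debugfs path and the error handling are illustrative only; the flow is open,
block in poll() until amdgpu signals the hang, dump, then close:

#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

int main(void)
{
	/* illustrative path; the node lives in the device's debugfs dir */
	int fd = open("/sys/kernel/debug/dri/0/amdgpu_autodump", O_RDONLY);
	struct pollfd pfd = { .fd = fd, .events = POLLIN };

	if (fd < 0)		/* open() fails while another dump is pending */
		return 1;

	/* sleeps on the gpu_hang wait queue until a reset is due */
	while (poll(&pfd, 1, -1) <= 0)
		;

	/* ... gather state here, e.g. via dmesg and umr ... */

	close(fd);		/* release() completes autodump.dumping */
	return 0;
}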

-Original Message-
From: Christian König 
Sent: Wednesday, May 13, 2020 4:20 PM
To: Zhao, Jiange ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Pelloux-prayer, Pierre-eric 
; Zhao, Jiange ; Koenig, Christian 
; Liu, Monk 
Subject: Re: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset v4

On 09.05.20 at 11:45, jia...@amd.com wrote:

From: Jiange Zhao 

When GPU got timeout, it would notify an interested part of an
opportunity to dump info before actual GPU reset.

A usermode app would open 'autodump' node under debugfs system and
poll() for readable/writable. When a GPU reset is due, amdgpu would
notify usermode app through wait_queue_head and give it 10 minutes to
dump info.

After usermode app has done its work, this 'autodump' node is closed.
On node closure, amdgpu gets to know the dump is done through the
completion that is triggered in release().

There is no write or read callback because necessary info can be
obtained through dmesg and umr. Messages back and forth between
usermode app and amdgpu are unnecessary.

v2: (1) changed 'registered' to 'app_listening'
  (2) add a mutex in open() to prevent race condition

v3 (chk): grab the reset lock to avoid race in autodump_open,
rename debugfs file to amdgpu_autodump,
provide autodump_read as well,
style and code cleanups

v4: add 'bool app_listening' to differentiate situations, so that
  the node can be reopened; also, there is no need to wait for
  completion when no app is waiting for a dump.

v5: change 'bool app_listening' to 'enum amdgpu_autodump_state'
  add 'app_state_mutex' for race conditions:
(1)Only 1 user can open this file node
(2)wait_dump() can only take effect after poll() executed.
(3)eliminated the race condition between release() and
   wait_dump()

v6: removed 'enum amdgpu_autodump_state' and 'app_state_mutex'
  removed state checking in amdgpu_debugfs_wait_dump
  Improve on top of version 3 so that the node can be reopened.

Signed-off-by: Jiange Zhao 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu.h |  2 +
   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 78 -
   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h |  6 ++
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  2 +
   4 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 2a806cb55b78..9e8eeddfe7ce 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -992,6 +992,8 @@ struct amdgpu_device {
charproduct_number[16];
charproduct_name[32];
charserial[16];
+
+   struct amdgpu_autodump  autodump;
   };
   
   static inline struct amdgpu_device *amdgpu_ttm_adev(struct

ttm_bo_device *bdev) diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 1a4894fa3693..261b67ece7fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -27,7 +27,7 @@
   #include 
   #include 
   #include 
-
+#include 
   #include 
   
   #include "amdgpu.h"

@@ -74,8 +74,82 @@ int amdgpu_debugfs_add_files(struct amdgpu_device *adev,
return 0;
   }
   
+int amdgpu_debugfs_wait_dump(struct amdgpu_device *adev)
+{
+#if defined(CONFIG_DEBUG_FS)
+   unsigned long timeout = 600 * HZ;
+   int ret;
+
+   wake_up_interruptible(&adev->autodump.gpu_hang);
+
+   ret = wait_for_completion_interruptible_timeout(&adev->autodump.dumping, timeout);
+   complete_all(&adev->autodump.dumping);
+   if (ret == 0) {
+   pr_err("autodump: timeout, move on to gpu recovery\n");
+   return -ETIMEDOUT;
+   }
+#endif
+   return 0;
+}
+
   #if defined(CONFIG_DEBUG_FS)
   
+static int amdgpu_debugfs_autodump_open(struct inode *inode, struct file *file)
+{
+   

RE: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset v4

2020-05-13 Thread Zhao, Jiange
[AMD Official Use Only - Internal Distribution Only]

Hi Christian,

amdgpu_debugfs_wait_dump() needs 'autodump.dumping.done == 0' to actually
stop and wait for the usermode app to dump.

Since a usermode app might open the file, do nothing and close it, I believe
the poll() function is a better indicator that the usermode app actually
wants to do a dump.

Also, a reset might happen between open() and poll(). In the worst case,
wait_dump() would wait until timeout and the usermode poll would always fail.

Jiange

-Original Message-
From: Christian König  
Sent: Wednesday, May 13, 2020 4:20 PM
To: Zhao, Jiange ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Pelloux-prayer, Pierre-eric 
; Zhao, Jiange ; 
Koenig, Christian ; Liu, Monk 
Subject: Re: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset v4

On 09.05.20 at 11:45, jia...@amd.com wrote:
> From: Jiange Zhao 
>
> When GPU got timeout, it would notify an interested part of an 
> opportunity to dump info before actual GPU reset.
>
> A usermode app would open 'autodump' node under debugfs system and 
> poll() for readable/writable. When a GPU reset is due, amdgpu would 
> notify usermode app through wait_queue_head and give it 10 minutes to 
> dump info.
>
> After usermode app has done its work, this 'autodump' node is closed.
> On node closure, amdgpu gets to know the dump is done through the 
> completion that is triggered in release().
>
> There is no write or read callback because necessary info can be 
> obtained through dmesg and umr. Messages back and forth between 
> usermode app and amdgpu are unnecessary.
>
> v2: (1) changed 'registered' to 'app_listening'
>  (2) add a mutex in open() to prevent race condition
>
> v3 (chk): grab the reset lock to avoid race in autodump_open,
>rename debugfs file to amdgpu_autodump,
>provide autodump_read as well,
>style and code cleanups
>
> v4: add 'bool app_listening' to differentiate situations, so that
>  the node can be reopened; also, there is no need to wait for
>  completion when no app is waiting for a dump.
>
> v5: change 'bool app_listening' to 'enum amdgpu_autodump_state'
>  add 'app_state_mutex' for race conditions:
>   (1)Only 1 user can open this file node
>   (2)wait_dump() can only take effect after poll() executed.
>   (3)eliminated the race condition between release() and
>  wait_dump()
>
> v6: removed 'enum amdgpu_autodump_state' and 'app_state_mutex'
>  removed state checking in amdgpu_debugfs_wait_dump
>  Improve on top of version 3 so that the node can be reopened.
>
> Signed-off-by: Jiange Zhao 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h |  2 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 78 -
>   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h |  6 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  2 +
>   4 files changed, 87 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 2a806cb55b78..9e8eeddfe7ce 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -992,6 +992,8 @@ struct amdgpu_device {
>   charproduct_number[16];
>   charproduct_name[32];
>   charserial[16];
> +
> + struct amdgpu_autodump  autodump;
>   };
>   
>   static inline struct amdgpu_device *amdgpu_ttm_adev(struct 
> ttm_bo_device *bdev) diff --git 
> a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> index 1a4894fa3693..261b67ece7fb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> @@ -27,7 +27,7 @@
>   #include 
>   #include 
>   #include 
> -
> +#include 
>   #include 
>   
>   #include "amdgpu.h"
> @@ -74,8 +74,82 @@ int amdgpu_debugfs_add_files(struct amdgpu_device *adev,
>   return 0;
>   }
>   
> +int amdgpu_debugfs_wait_dump(struct amdgpu_device *adev)
> +{
> +#if defined(CONFIG_DEBUG_FS)
> + unsigned long timeout = 600 * HZ;
> + int ret;
> +
> +	wake_up_interruptible(&adev->autodump.gpu_hang);
> +
> +	ret = wait_for_completion_interruptible_timeout(&adev->autodump.dumping, timeout);
> +	complete_all(&adev->autodump.dumping);
> + if (ret == 0) {
> + pr_err("autodump: timeout, move on to gpu recovery\n");
> + return -ETIMEDOUT;
> + }
> +#endif
> + return 0;
> +}
> +
>   #if defined(CONFIG_DEBUG_FS)
>   
> +static int amdgpu_debugfs_autodump_open(struct inode *inode, struct file *file)
> +{
> + struct amdgpu_device *adev = inode->i_private;
> + int ret;
> +
> + file->private_data = adev;
> +
> +	mutex_lock(&adev->lock_reset);
> + if (adev->autodump.dumping.done)
> + ret = 0;
> + else
> + ret = -EBUSY;
> +	mutex_unlock(&adev->lock_reset);
> +
> 

Re: [Nouveau] [RFC] Remove AGP support from Radeon/Nouveau/TTM

2020-05-13 Thread Emil Velikov
On Tue, 12 May 2020 at 20:48, Alex Deucher  wrote:

> > >>
> > >> There's some AGP support code in the DRM core. Can some of that be
> > >> declared as legacy?
> > >>
> > >> Specifically, what about these AGP-related ioctl calls? Can they be
> > >> declared as legacy? It would appear to me that KMS-based drivers don't
> > >> have to manage AGP by themselves. (?)
> > >
> > > The old drm core AGP code is mainly (only?) for the non-KMS drivers.
> > > E.g., mach64, r128, sis, savage, etc.
> >
> > Exactly my point. There's one drm_agp_init() call left in radeon. The
> > rest of the AGP code is apparently for legacy non-KMS drivers. Should
> > the related ioctl calls be declared as legacy (i.e., listed with
> > DRM_LEGACY_IOCTL_DEF instead of DRM_IOCTL_DEF)? If so, much of the AGP
> > core code could probably be moved behind CONFIG_DRM_LEGACY as well.
>
> Ah, I forgot about drm_agp_init().  I was just thinking the other AGP
> stuff.  Yeah, I think we could make it legacy.
>
Fwiw I've got local patches a) removing drm_agp_init() from radeon and
b) making the core drm code legacy only.
Will try to finish them over the weekend and send out.

Even if AGP GART gets dropped the b) patches will be good ;-)

-Emil


Re: [RFC 02/17] dma-fence: basic lockdep annotations

2020-05-13 Thread Daniel Vetter
On Tue, May 12, 2020 at 11:19 AM Chris Wilson  wrote:
> Quoting Daniel Vetter (2020-05-12 10:08:47)
> > On Tue, May 12, 2020 at 10:04:22AM +0100, Chris Wilson wrote:
> > > Quoting Daniel Vetter (2020-05-12 09:59:29)
> > > > Design is similar to the lockdep annotations for workers, but with
> > > > some twists:
> > > >
> > > > - We use a read-lock for the execution/worker/completion side, so that
> > > >   this explicit annotation can be more liberally sprinkled around.
> > > >   With read locks lockdep isn't going to complain if the read-side
> > > >   isn't nested the same way under all circumstances, so ABBA deadlocks
> > > >   are ok. Which they are, since this is an annotation only.
> > > >
> > > > - We're using non-recursive lockdep read lock mode, since in recursive
> > > >   read lock mode lockdep does not catch read side hazards. And we
> > > >   _very_ much want read side hazards to be caught. For full details of
> > > >   this limitation see
> > > >
> > > >   commit e91498589746065e3ae95d9a00b068e525eec34f
> > > >   Author: Peter Zijlstra 
> > > >   Date:   Wed Aug 23 13:13:11 2017 +0200
> > > >
> > > >   locking/lockdep/selftests: Add mixed read-write ABBA tests
> > > >
> > > > - To allow nesting of the read-side explicit annotations we explicitly
> > > >   keep track of the nesting. lock_is_held() allows us to do that.
> > > >
> > > > - The wait-side annotation is a write lock, and entirely done within
> > > >   dma_fence_wait() for everyone by default.
> > > >
> > > > - To be able to freely annotate helper functions I want to make it ok
> > > >   to call dma_fence_begin/end_signalling from soft/hardirq context.
> > > >   First attempt was using the hardirq locking context for the write
> > > >   side in lockdep, but this forces all normal spinlocks nested within
> > > >   dma_fence_begin/end_signalling to be spinlocks. That's bollocks.
> > > >
> > > >   The approach now is to simply check in_atomic(), and for these cases
> > > >   entirely rely on the might_sleep() check in dma_fence_wait(). That
> > > >   will catch any wrong nesting against spinlocks from soft/hardirq
> > > >   contexts.
> > > >
> > > > The idea here is that every code path that's critical for eventually
> > > > signalling a dma_fence should be annotated with
> > > > dma_fence_begin/end_signalling. The annotation ideally starts right
> > > > after a dma_fence is published (added to a dma_resv, exposed as a
> > > > sync_file fd, attached to a drm_syncobj fd, or anything else that
> > > > makes the dma_fence visible to other kernel threads), up to and
> > > > including the dma_fence_wait(). Examples are irq handlers, the
> > > > scheduler rt threads, the tail of execbuf (after the corresponding
> > > > fences are visible), any workers that end up signalling dma_fences and
> > > > really anything else. Not annotated should be code paths that only
> > > > complete fences opportunistically as the gpu progresses, like e.g.
> > > > shrinker/eviction code.
> > > >
> > > > The main class of deadlocks this is supposed to catch are:
> > > >
> > > > Thread A:
> > > >
> > > > mutex_lock(A);
> > > > mutex_unlock(A);
> > > >
> > > > dma_fence_signal();
> > > >
> > > > Thread B:
> > > >
> > > > mutex_lock(A);
> > > > dma_fence_wait();
> > > > mutex_unlock(A);
> > > >
> > > > Thread B is blocked on A signalling the fence, but A never gets around
> > > > to that because it cannot acquire the lock A.
> > > >
> > > > Note that dma_fence_wait() is allowed to be nested within
> > > > dma_fence_begin/end_signalling sections. To allow this to happen the
> > > > read lock needs to be upgraded to a write lock, which means that if any
> > > > other lock is acquired between the dma_fence_begin_signalling() call and
> > > > the call to dma_fence_wait(), and still held, this will result in an
> > > > immediate lockdep complaint. The only other option would be to not
> > > > annotate such calls, defeating the point. Therefore these annotations
> > > > cannot be sprinkled over the code entirely mindlessly to avoid false
> > > > positives.
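
(As an aside for readers of the archive: a sketch of what an annotated
signalling path looks like with the two functions this patch adds; the fence
and the lock are stand-ins for driver state:)

#include <linux/dma-fence.h>
#include <linux/mutex.h>

/* sketch only: 'fence' was published (e.g. added to a dma_resv)
 * right before this runs */
static void driver_signal_path(struct dma_fence *fence, struct mutex *lock)
{
	bool cookie = dma_fence_begin_signalling();

	/* every lock taken in here is recorded by lockdep as nesting
	 * inside the signalling critical section */
	mutex_lock(lock);
	/* ... complete the work, poke the hardware ... */
	mutex_unlock(lock);

	dma_fence_signal(fence);
	dma_fence_end_signalling(cookie);
}

With that in place, Thread B's wait-under-lock pattern from the example above
produces an immediate lockdep splat instead of an occasional hang.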
> > > >
> > > > v2: handle soft/hardirq ctx better against write side and dont forget
> > > > EXPORT_SYMBOL, drivers can't use this otherwise.
> > > >
> > > > Cc: linux-me...@vger.kernel.org
> > > > Cc: linaro-mm-...@lists.linaro.org
> > > > Cc: linux-r...@vger.kernel.org
> > > > Cc: amd-gfx@lists.freedesktop.org
> > > > Cc: intel-...@lists.freedesktop.org
> > > > Cc: Chris Wilson 
> > > > Cc: Maarten Lankhorst 
> > > > Cc: Christian König 
> > > > Signed-off-by: Daniel Vetter 
> > > > ---
> > > >  drivers/dma-buf/dma-fence.c | 53 +
> > > >  include/linux/dma-fence.h   | 12 +
> > > >  2 files changed, 65 insertions(+)
> > > >
> > > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > > > index 6802125349fb..d5c0fd2efc70 100644
> > > > --- a/drivers/dma-buf/dma-fence.c
> > > > +++ b/drivers/dma-buf/dma-fence.c

Re: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset v4

2020-05-13 Thread Christian König
Thanks for the reminder, I had too much to do yesterday and just forgot
about it.


Christian.

On 13.05.20 at 10:16, Zhao, Jiange wrote:


[AMD Official Use Only - Internal Distribution Only]


Hi @Koenig, Christian,

I made some changes on top of version 3 and tested it. Can you help 
review?


Jiange

*From:* Zhao, Jiange 
*Sent:* Saturday, May 9, 2020 5:45 PM
*To:* amd-gfx@lists.freedesktop.org 
*Cc:* Koenig, Christian ; Pelloux-prayer, 
Pierre-eric ; Deucher, Alexander 
; Liu, Monk ; Zhao, 
Jiange 

*Subject:* [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset v4
From: Jiange Zhao 

When GPU got timeout, it would notify an interested part
of an opportunity to dump info before actual GPU reset.

A usermode app would open 'autodump' node under debugfs system
and poll() for readable/writable. When a GPU reset is due,
amdgpu would notify usermode app through wait_queue_head and give
it 10 minutes to dump info.

After usermode app has done its work, this 'autodump' node is closed.
On node closure, amdgpu gets to know the dump is done through
the completion that is triggered in release().

There is no write or read callback because necessary info can be
obtained through dmesg and umr. Messages back and forth between
usermode app and amdgpu are unnecessary.

v2: (1) changed 'registered' to 'app_listening'
    (2) add a mutex in open() to prevent race condition

v3 (chk): grab the reset lock to avoid race in autodump_open,
  rename debugfs file to amdgpu_autodump,
  provide autodump_read as well,
  style and code cleanups

v4: add 'bool app_listening' to differentiate situations, so that
    the node can be reopened; also, there is no need to wait for
    completion when no app is waiting for a dump.

v5: change 'bool app_listening' to 'enum amdgpu_autodump_state'
    add 'app_state_mutex' for race conditions:
    (1)Only 1 user can open this file node
    (2)wait_dump() can only take effect after poll() executed.
    (3)eliminated the race condition between release() and
   wait_dump()

v6: removed 'enum amdgpu_autodump_state' and 'app_state_mutex'
    removed state checking in amdgpu_debugfs_wait_dump
    Improve on top of version 3 so that the node can be reopened.

Signed-off-by: Jiange Zhao 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 78 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h |  6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  2 +
 4 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h

index 2a806cb55b78..9e8eeddfe7ce 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -992,6 +992,8 @@ struct amdgpu_device {
 char product_number[16];
 char product_name[32];
 char    serial[16];
+
+   struct amdgpu_autodump  autodump;
 };

 static inline struct amdgpu_device *amdgpu_ttm_adev(struct 
ttm_bo_device *bdev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c

index 1a4894fa3693..261b67ece7fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -27,7 +27,7 @@
 #include 
 #include 
 #include 
-
+#include 
 #include 

 #include "amdgpu.h"
@@ -74,8 +74,82 @@ int amdgpu_debugfs_add_files(struct amdgpu_device 
*adev,

 return 0;
 }

+int amdgpu_debugfs_wait_dump(struct amdgpu_device *adev)
+{
+#if defined(CONFIG_DEBUG_FS)
+   unsigned long timeout = 600 * HZ;
+   int ret;
+
+	wake_up_interruptible(&adev->autodump.gpu_hang);
+
+   ret = wait_for_completion_interruptible_timeout(&adev->autodump.dumping, timeout);
+   complete_all(&adev->autodump.dumping);
+   if (ret == 0) {
+   pr_err("autodump: timeout, move on to gpu recovery\n");
+   return -ETIMEDOUT;
+   }
+#endif
+   return 0;
+}
+
 #if defined(CONFIG_DEBUG_FS)

+static int amdgpu_debugfs_autodump_open(struct inode *inode, struct file *file)
+{
+   struct amdgpu_device *adev = inode->i_private;
+   int ret;
+
+   file->private_data = adev;
+
+   mutex_lock(&adev->lock_reset);
+   if (adev->autodump.dumping.done)
+   ret = 0;
+   else
+   ret = -EBUSY;
+   mutex_unlock(&adev->lock_reset);
+
+   return ret;
+}
+
+static int amdgpu_debugfs_autodump_release(struct inode *inode, struct file *file)
+{
+   struct amdgpu_device *adev = file->private_data;
+
+   complete_all(&adev->autodump.dumping);
+   return 0;
+}
+
+static unsigned int amdgpu_debugfs_autodump_poll(struct file *file, struct poll_table_struct *poll_table)
+{
+   struct amdgpu_device *adev = file->private_data;
+
+	reinit_completion(&adev->autodump.dumping);
+   

Re: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset v4

2020-05-13 Thread Christian König

On 09.05.20 at 11:45, jia...@amd.com wrote:

From: Jiange Zhao 

When GPU got timeout, it would notify an interested part
of an opportunity to dump info before actual GPU reset.

A usermode app would open 'autodump' node under debugfs system
and poll() for readable/writable. When a GPU reset is due,
amdgpu would notify usermode app through wait_queue_head and give
it 10 minutes to dump info.

After usermode app has done its work, this 'autodump' node is closed.
On node closure, amdgpu gets to know the dump is done through
the completion that is triggered in release().

There is no write or read callback because necessary info can be
obtained through dmesg and umr. Messages back and forth between
usermode app and amdgpu are unnecessary.

v2: (1) changed 'registered' to 'app_listening'
 (2) add a mutex in open() to prevent race condition

v3 (chk): grab the reset lock to avoid race in autodump_open,
   rename debugfs file to amdgpu_autodump,
   provide autodump_read as well,
   style and code cleanups

v4: add 'bool app_listening' to differentiate situations, so that
 the node can be reopened; also, there is no need to wait for
 completion when no app is waiting for a dump.

v5: change 'bool app_listening' to 'enum amdgpu_autodump_state'
 add 'app_state_mutex' for race conditions:
(1)Only 1 user can open this file node
(2)wait_dump() can only take effect after poll() executed.
(3)eliminated the race condition between release() and
   wait_dump()

v6: removed 'enum amdgpu_autodump_state' and 'app_state_mutex'
 removed state checking in amdgpu_debugfs_wait_dump
 Improve on top of version 3 so that the node can be reopened.

Signed-off-by: Jiange Zhao 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h |  2 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 78 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h |  6 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  2 +
  4 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 2a806cb55b78..9e8eeddfe7ce 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -992,6 +992,8 @@ struct amdgpu_device {
charproduct_number[16];
charproduct_name[32];
charserial[16];
+
+   struct amdgpu_autodump  autodump;
  };
  
  static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 1a4894fa3693..261b67ece7fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -27,7 +27,7 @@
  #include 
  #include 
  #include 
-
+#include 
  #include 
  
  #include "amdgpu.h"

@@ -74,8 +74,82 @@ int amdgpu_debugfs_add_files(struct amdgpu_device *adev,
return 0;
  }
  
+int amdgpu_debugfs_wait_dump(struct amdgpu_device *adev)

+{
+#if defined(CONFIG_DEBUG_FS)
+   unsigned long timeout = 600 * HZ;
+   int ret;
+
+   wake_up_interruptible(&adev->autodump.gpu_hang);
+
+   ret = wait_for_completion_interruptible_timeout(&adev->autodump.dumping, timeout);
+   complete_all(&adev->autodump.dumping);
+   if (ret == 0) {
+   pr_err("autodump: timeout, move on to gpu recovery\n");
+   return -ETIMEDOUT;
+   }
+#endif
+   return 0;
+}
+
  #if defined(CONFIG_DEBUG_FS)
  
+static int amdgpu_debugfs_autodump_open(struct inode *inode, struct file *file)

+{
+   struct amdgpu_device *adev = inode->i_private;
+   int ret;
+
+   file->private_data = adev;
+
+   mutex_lock(&adev->lock_reset);
+   if (adev->autodump.dumping.done)
+   ret = 0;
+   else
+   ret = -EBUSY;
+   mutex_unlock(&adev->lock_reset);
+
+   return ret;
+}
+
+static int amdgpu_debugfs_autodump_release(struct inode *inode, struct file *file)
+{
+   struct amdgpu_device *adev = file->private_data;
+
+   complete_all(&adev->autodump.dumping);
+   return 0;
+}
+
+static unsigned int amdgpu_debugfs_autodump_poll(struct file *file, struct poll_table_struct *poll_table)
+{
+   struct amdgpu_device *adev = file->private_data;
+
+   reinit_completion(&adev->autodump.dumping);


Why do you have the reinit_completion here and not in the open() callback?

Apart from that looks good to me.

Regards,
Christian.
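
(A sketch of one way to resolve this, and roughly what a later revision of
the patch ended up doing: arm the completion in open() under the reset lock,
so a second opener gets -EBUSY instead of racing between open() and poll():)

static int amdgpu_debugfs_autodump_open(struct inode *inode, struct file *file)
{
	struct amdgpu_device *adev = inode->i_private;
	int ret;

	file->private_data = adev;

	mutex_lock(&adev->lock_reset);
	if (adev->autodump.dumping.done) {
		/* arm the completion while nobody else can open the node */
		reinit_completion(&adev->autodump.dumping);
		ret = 0;
	} else {
		ret = -EBUSY;
	}
	mutex_unlock(&adev->lock_reset);

	return ret;
}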


+   poll_wait(file, &adev->autodump.gpu_hang, poll_table);
+
+   if (adev->in_gpu_reset)
+   return POLLIN | POLLRDNORM | POLLWRNORM;
+
+   return 0;
+}
+
+static const struct file_operations autodump_debug_fops = {
+   .owner = THIS_MODULE,
+   .open = amdgpu_debugfs_autodump_open,
+   .poll = amdgpu_debugfs_autodump_poll,
+   .release = amdgpu_debugfs_autodump_release,
+};
+
+static void 

Re: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset v4

2020-05-13 Thread Zhao, Jiange
[AMD Official Use Only - Internal Distribution Only]

Hi @Koenig, Christian,

I made some changes on top of version 3 and tested it. Can you help review?

Jiange

From: Zhao, Jiange 
Sent: Saturday, May 9, 2020 5:45 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Koenig, Christian ; Pelloux-prayer, Pierre-eric 
; Deucher, Alexander 
; Liu, Monk ; Zhao, Jiange 

Subject: [PATCH] drm/amdgpu: Add autodump debugfs node for gpu reset v4

From: Jiange Zhao 

When GPU got timeout, it would notify an interested part
of an opportunity to dump info before actual GPU reset.

A usermode app would open 'autodump' node under debugfs system
and poll() for readable/writable. When a GPU reset is due,
amdgpu would notify usermode app through wait_queue_head and give
it 10 minutes to dump info.

After usermode app has done its work, this 'autodump' node is closed.
On node closure, amdgpu gets to know the dump is done through
the completion that is triggered in release().

There is no write or read callback because necessary info can be
obtained through dmesg and umr. Messages back and forth between
usermode app and amdgpu are unnecessary.

v2: (1) changed 'registered' to 'app_listening'
(2) add a mutex in open() to prevent race condition

v3 (chk): grab the reset lock to avoid race in autodump_open,
  rename debugfs file to amdgpu_autodump,
  provide autodump_read as well,
  style and code cleanups

v4: add 'bool app_listening' to differentiate situations, so that
the node can be reopened; also, there is no need to wait for
completion when no app is waiting for a dump.

v5: change 'bool app_listening' to 'enum amdgpu_autodump_state'
add 'app_state_mutex' for race conditions:
(1)Only 1 user can open this file node
(2)wait_dump() can only take effect after poll() executed.
(3)eliminated the race condition between release() and
   wait_dump()

v6: removed 'enum amdgpu_autodump_state' and 'app_state_mutex'
removed state checking in amdgpu_debugfs_wait_dump
Improve on top of version 3 so that the node can be reopened.

Signed-off-by: Jiange Zhao 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 78 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h |  6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  2 +
 4 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 2a806cb55b78..9e8eeddfe7ce 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -992,6 +992,8 @@ struct amdgpu_device {
 charproduct_number[16];
 charproduct_name[32];
 charserial[16];
+
+   struct amdgpu_autodump  autodump;
 };

 static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 1a4894fa3693..261b67ece7fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -27,7 +27,7 @@
 #include 
 #include 
 #include 
-
+#include 
 #include 

 #include "amdgpu.h"
@@ -74,8 +74,82 @@ int amdgpu_debugfs_add_files(struct amdgpu_device *adev,
 return 0;
 }

+int amdgpu_debugfs_wait_dump(struct amdgpu_device *adev)
+{
+#if defined(CONFIG_DEBUG_FS)
+   unsigned long timeout = 600 * HZ;
+   int ret;
+
+   wake_up_interruptible(&adev->autodump.gpu_hang);
+
+   ret = wait_for_completion_interruptible_timeout(&adev->autodump.dumping, timeout);
+   complete_all(&adev->autodump.dumping);
+   if (ret == 0) {
+   pr_err("autodump: timeout, move on to gpu recovery\n");
+   return -ETIMEDOUT;
+   }
+#endif
+   return 0;
+}
+
 #if defined(CONFIG_DEBUG_FS)

+static int amdgpu_debugfs_autodump_open(struct inode *inode, struct file *file)
+{
+   struct amdgpu_device *adev = inode->i_private;
+   int ret;
+
+   file->private_data = adev;
+
+   mutex_lock(&adev->lock_reset);
+   if (adev->autodump.dumping.done)
+   ret = 0;
+   else
+   ret = -EBUSY;
+   mutex_unlock(&adev->lock_reset);
+
+   return ret;
+}
+
+static int amdgpu_debugfs_autodump_release(struct inode *inode, struct file *file)
+{
+   struct amdgpu_device *adev = file->private_data;
+
+   complete_all(&adev->autodump.dumping);
+   return 0;
+}
+
+static unsigned int amdgpu_debugfs_autodump_poll(struct file *file, struct poll_table_struct *poll_table)
+{
+   struct amdgpu_device *adev = file->private_data;
+
+   reinit_completion(&adev->autodump.dumping);
+   poll_wait(file, &adev->autodump.gpu_hang, poll_table);
+
+   if (adev->in_gpu_reset)
+   return POLLIN | POLLRDNORM | POLLWRNORM;
+

Re: [Nouveau] [PATCH 1/3] drm/radeon: remove AGP support

2020-05-13 Thread Michel Dänzer
On 2020-05-13 9:46 a.m., Christian König wrote:
> On 12.05.20 at 23:12, Alex Deucher wrote:
>> On Tue, May 12, 2020 at 4:52 PM Roy Spliet  wrote:
>>>
>>> I'll volunteer to be the one asking: how big is this performance
>>> difference? Have any benchmarks been run before and after removal of AGP
>>> GART code on affected nouveau/radeon systems? Or is this code being
>>> dropped _just_ because it's cumbersome, with no regard for metrics that
>>> determine the value of AGP GART support?
>>>
>> I don't think anyone has any solid numbers, just anecdotal from
>> memory.  I certainly don't have any functional AGP systems at this
>> point.  It's mostly just cumbersome and would allow us to clean ttm
>> and probably improve stability at the same time.  At least on the
>> radeon side, the only native AGP cards were r1xx, r2xx, and some of
>> the early r3xx boards.  Once we switched to pcie mid-way through r3xx,
>> everything was native pcie and the AGP cards used a pcie to AGP bridge
>> chip so they had a decent on chip MMU.  Those older cards topped out
>> at maybe 32 or 64 MB of vram, so they are going to be hard pressed to
>> deal with modern desktops anyway.  No idea what sort of GART
>> capabilities NV AGP hardware at this time had.
> 
> I could only test with an old x86 Mac and an r3xx generation hw and in
> this case making the switch didn't have any noticeable effect at all.
> 
> But I didn't do more than playing around with the desktop effects and
> playing a video.

Yeah, that's not enough to see a difference. Try an OpenGL game, or even
just glxgears.


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer


Re: [Nouveau] [PATCH 1/3] drm/radeon: remove AGP support

2020-05-13 Thread Christian König

On 12.05.20 at 23:12, Alex Deucher wrote:

On Tue, May 12, 2020 at 4:52 PM Roy Spliet  wrote:

On 12-05-2020 at 14:36, Alex Deucher wrote:

On Tue, May 12, 2020 at 4:16 AM Michel Dänzer  wrote:

On 2020-05-11 10:12 p.m., Alex Deucher wrote:

On Mon, May 11, 2020 at 1:17 PM Christian König
 wrote:

AGP has been deprecated for 10+ years now and is not used any more on modern hardware.

Old hardware should continue to work in PCI mode.

Might want to clarify that there is no loss of functionality here.
Something like:

"There is no loss of functionality here.  GPUs will continue to
function.  This just drops the use of the AGP MMU in the chipset in
favor of the MMU on the device which has proven to be much more
reliable.  Due to its unreliability, AGP support has been disabled on
PowerPC for years already so there is no change on PowerPC."

There's a difference between something being disabled by default and not
being available at all. We may decide it's worth it anyway, but let's do
it based on facts.


I didn't mean to imply that AGP GART support was already removed.  But
for the vast majority of users the end result is the same.  If you
knew enough re-enable AGP GART, you probably wouldn't have been as
confused about what this patch set does either.  To reiterate, this
patch set does not remove support for AGP cards, it only removes the
support for AGP GART.  The cards will still be functional using the
device GART.  There may be performance tradeoffs there in some cases.

I'll volunteer to be the one asking: how big is this performance
difference? Have any benchmarks been run before and after removal of AGP
GART code on affected nouveau/radeon systems? Or is this code being
dropped _just_ because it's cumbersome, with no regard for metrics that
determine the value of AGP GART support?


I don't think anyone has any solid numbers, just anecdotal from
memory.  I certainly don't have any functional AGP systems at this
point.  It's mostly just cumbersome, and removing it would allow us to
clean up ttm and probably improve stability at the same time.  At least on the
radeon side, the only native AGP cards were r1xx, r2xx, and some of
the early r3xx boards.  Once we switched to pcie mid-way through r3xx,
everything was native pcie and the AGP cards used a pcie to AGP bridge
chip so they had a decent on chip MMU.  Those older cards topped out
at maybe 32 or 64 MB of vram, so they are going to be hard pressed to
deal with modern desktops anyway.  No idea what sort of GART
capabilities NV AGP hardware at this time had.


I could only test with an old x86 Mac and an r3xx generation hw and in 
this case making the switch didn't have any noticeable effect at all.


But I didn't do more than playing around with the desktop effects and 
playing a video.


I do have a PC x86 AGP board lying around somewhere here; going to give
that one a try as well.


Christian.



Alex


Roy


Alex





Re: [RFC 09/17] drm/amdgpu: use dma-fence annotations in cs_submit()

2020-05-13 Thread Daniel Vetter
On Wed, May 13, 2020 at 9:02 AM Christian König
 wrote:
>
> On 12.05.20 at 10:59, Daniel Vetter wrote:
> > This is a bit tricky: since ->notifier_lock is held while calling
> > dma_fence_wait we must ensure that also the read side (i.e.
> > dma_fence_begin_signalling) is on the same side. If we mix this up,
> > lockdep complains, and that's again why we want to have these
> > annotations.
> >
> > A nice side effect of this is that because of the fs_reclaim priming
> > for dma_fence_enable, lockdep now automatically checks for us that
> > nothing in here allocates memory, without even running any userptr
> > workloads.
> >
> > Cc: linux-me...@vger.kernel.org
> > Cc: linaro-mm-...@lists.linaro.org
> > Cc: linux-r...@vger.kernel.org
> > Cc: amd-gfx@lists.freedesktop.org
> > Cc: intel-...@lists.freedesktop.org
> > Cc: Chris Wilson 
> > Cc: Maarten Lankhorst 
> > Cc: Christian König 
> > Signed-off-by: Daniel Vetter 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 +
> >   1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > index 7653f62b1b2d..6db3f3c629b0 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > @@ -1213,6 +1213,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser 
> > *p,
> >   struct amdgpu_job *job;
> >   uint64_t seq;
> >   int r;
> > + bool fence_cookie;
> >
> >   job = p->job;
> >   p->job = NULL;
> > @@ -1227,6 +1228,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser 
> > *p,
> >*/
> >   mutex_lock(&p->adev->notifier_lock);
> >
> > + fence_cookie = dma_fence_begin_signalling();
> > +
> >   /* If userptr are invalidated after amdgpu_cs_parser_bos(), return
> >* -EAGAIN, drmIoctl in libdrm will restart the amdgpu_cs_ioctl.
> >*/
> > @@ -1264,12 +1267,14 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser 
> > *p,
> >   amdgpu_vm_move_to_lru_tail(p->adev, &fpriv->vm);
> >
> >   ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
> > + dma_fence_end_signalling(fence_cookie);
>
> Mhm, this could come earlier in theory. E.g. after pushing the job to
> the scheduler.

Yeah, I have not much clue about how amdgpu works :-) In practice it
doesn't matter much, since the enclosing adev->notifier_lock is a lot
more strict about what it allows than the dma_fence signalling fake
lock.
-Daniel
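
(Sketched against the hunk above, the earlier placement Christian suggests
would look roughly like this; function names are taken from the surrounding
cs_submit() code and the exact point is untested:)

 	drm_sched_entity_push_job(&job->base, entity);
+	/* the fence is published and the scheduler can signal it from
+	 * here on, so the annotated section could end already here */
+	dma_fence_end_signalling(fence_cookie);
 
 	amdgpu_vm_move_to_lru_tail(p->adev, &fpriv->vm);
 
 	ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
-	dma_fence_end_signalling(fence_cookie);
 	mutex_unlock(&p->adev->notifier_lock);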

>
> Christian.
>
> >   mutex_unlock(&p->adev->notifier_lock);
> >
> >   return 0;
> >
> >   error_abort:
> >   drm_sched_job_cleanup(&job->base);
> > + dma_fence_end_signalling(fence_cookie);
> >   mutex_unlock(>adev->notifier_lock);
> >
> >   error_unlock:
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [RFC 09/17] drm/amdgpu: use dma-fence annotations in cs_submit()

2020-05-13 Thread Christian König

On 12.05.20 at 10:59, Daniel Vetter wrote:

This is a bit tricky: since ->notifier_lock is held while calling
dma_fence_wait we must ensure that also the read side (i.e.
dma_fence_begin_signalling) is on the same side. If we mix this up,
lockdep complains, and that's again why we want to have these
annotations.

A nice side effect of this is that because of the fs_reclaim priming
for dma_fence_enable, lockdep now automatically checks for us that
nothing in here allocates memory, without even running any userptr
workloads.

Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Cc: linux-r...@vger.kernel.org
Cc: amd-gfx@lists.freedesktop.org
Cc: intel-...@lists.freedesktop.org
Cc: Chris Wilson 
Cc: Maarten Lankhorst 
Cc: Christian König 
Signed-off-by: Daniel Vetter 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 7653f62b1b2d..6db3f3c629b0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1213,6 +1213,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
struct amdgpu_job *job;
uint64_t seq;
int r;
+   bool fence_cookie;
  
  	job = p->job;
  	p->job = NULL;
@@ -1227,6 +1228,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 */
	mutex_lock(&p->adev->notifier_lock);
  
+	fence_cookie = dma_fence_begin_signalling();
+
/* If userptr are invalidated after amdgpu_cs_parser_bos(), return
 * -EAGAIN, drmIoctl in libdrm will restart the amdgpu_cs_ioctl.
 */
@@ -1264,12 +1267,14 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
	amdgpu_vm_move_to_lru_tail(p->adev, &fpriv->vm);
  
  	ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);

+   dma_fence_end_signalling(fence_cookie);


Mhm, this could come earlier in theory. E.g. after pushing the job to 
the scheduler.


Christian.


	mutex_unlock(&p->adev->notifier_lock);
  
  	return 0;
  
  error_abort:

	drm_sched_job_cleanup(&job->base);
+   dma_fence_end_signalling(fence_cookie);
	mutex_unlock(&p->adev->notifier_lock);
  
  error_unlock:




Re: [RFC 16/17] drm/amdgpu: gpu recovery does full modesets

2020-05-13 Thread Daniel Vetter
On Tue, May 12, 2020 at 10:10 PM Kazlauskas, Nicholas
 wrote:
>
> On 2020-05-12 12:12 p.m., Daniel Vetter wrote:
> > On Tue, May 12, 2020 at 4:24 PM Alex Deucher  wrote:
> >>
> >> On Tue, May 12, 2020 at 9:45 AM Daniel Vetter  
> >> wrote:
> >>>
> >>> On Tue, May 12, 2020 at 3:29 PM Alex Deucher  
> >>> wrote:
> 
>  On Tue, May 12, 2020 at 9:17 AM Daniel Vetter  
>  wrote:
> >
> > On Tue, May 12, 2020 at 3:12 PM Alex Deucher  
> > wrote:
> >>
> >> On Tue, May 12, 2020 at 8:58 AM Daniel Vetter  wrote:
> >>>
> >>> On Tue, May 12, 2020 at 08:54:45AM -0400, Alex Deucher wrote:
>  On Tue, May 12, 2020 at 5:00 AM Daniel Vetter 
>   wrote:
> >
> > ...
> >
> > I think it's time to stop this little exercise.
> >
> > The lockdep splat, for the record:
> >
> > [  132.583381] 
> > ==
> > [  132.584091] WARNING: possible circular locking dependency 
> > detected
> > [  132.584775] 5.7.0-rc3+ #346 Tainted: GW
> > [  132.585461] 
> > --
> > [  132.586184] kworker/2:3/865 is trying to acquire lock:
> > [  132.586857] c9677c70 
> > (crtc_ww_class_acquire){+.+.}-{0:0}, at: 
> > drm_atomic_helper_suspend+0x38/0x120 [drm_kms_helper]
> > [  132.587569]
> > but task is already holding lock:
> > [  132.589044] 82318c80 (dma_fence_map){}-{0:0}, at: 
> > drm_sched_job_timedout+0x25/0xf0 [gpu_sched]
> > [  132.589803]
> > which lock already depends on the new lock.
> >
> > [  132.592009]
> > the existing dependency chain (in reverse order) is:
> > [  132.593507]
> > -> #2 (dma_fence_map){}-{0:0}:
> > [  132.595019]dma_fence_begin_signalling+0x50/0x60
> > [  132.595767]drm_atomic_helper_commit+0xa1/0x180 
> > [drm_kms_helper]
> > [  132.596567]drm_client_modeset_commit_atomic+0x1ea/0x250 
> > [drm]
> > [  132.597420]drm_client_modeset_commit_locked+0x55/0x190 
> > [drm]
> > [  132.598178]drm_client_modeset_commit+0x24/0x40 [drm]
> > [  132.598948]
> > drm_fb_helper_restore_fbdev_mode_unlocked+0x4b/0xa0 [drm_kms_helper]
> > [  132.599738]drm_fb_helper_set_par+0x30/0x40 
> > [drm_kms_helper]
> > [  132.600539]fbcon_init+0x2e8/0x660
> > [  132.601344]visual_init+0xce/0x130
> > [  132.602156]do_bind_con_driver+0x1bc/0x2b0
> > [  132.602970]do_take_over_console+0x115/0x180
> > [  132.603763]do_fbcon_takeover+0x58/0xb0
> > [  132.604564]register_framebuffer+0x1ee/0x300
> > [  132.605369]
> > __drm_fb_helper_initial_config_and_unlock+0x36e/0x520 
> > [drm_kms_helper]
> > [  132.606187]amdgpu_fbdev_init+0xb3/0xf0 [amdgpu]
> > [  132.607032]amdgpu_device_init.cold+0xe90/0x1677 [amdgpu]
> > [  132.607862]amdgpu_driver_load_kms+0x5a/0x200 [amdgpu]
> > [  132.608697]amdgpu_pci_probe+0xf7/0x180 [amdgpu]
> > [  132.609511]local_pci_probe+0x42/0x80
> > [  132.610324]pci_device_probe+0x104/0x1a0
> > [  132.611130]really_probe+0x147/0x3c0
> > [  132.611939]driver_probe_device+0xb6/0x100
> > [  132.612766]device_driver_attach+0x53/0x60
> > [  132.613593]__driver_attach+0x8c/0x150
> > [  132.614419]bus_for_each_dev+0x7b/0xc0
> > [  132.615249]bus_add_driver+0x14c/0x1f0
> > [  132.616071]driver_register+0x6c/0xc0
> > [  132.616902]do_one_initcall+0x5d/0x2f0
> > [  132.617731]do_init_module+0x5c/0x230
> > [  132.618560]load_module+0x2981/0x2bc0
> > [  132.619391]__do_sys_finit_module+0xaa/0x110
> > [  132.620228]do_syscall_64+0x5a/0x250
> > [  132.621064]entry_SYSCALL_64_after_hwframe+0x49/0xb3
> > [  132.621903]
> > -> #1 (crtc_ww_class_mutex){+.+.}-{3:3}:
> > [  132.623587]__ww_mutex_lock.constprop.0+0xcc/0x10c0
> > [  132.624448]ww_mutex_lock+0x43/0xb0
> > [  132.625315]drm_modeset_lock+0x44/0x120 [drm]
> > [  132.626184]drmm_mode_config_init+0x2db/0x8b0 [drm]
> > [  132.627098]amdgpu_device_init.cold+0xbd1/0x1677 [amdgpu]
> > [  132.628007]amdgpu_driver_load_kms+0x5a/0x200 [amdgpu]
> > [  132.628920]amdgpu_pci_probe+0xf7/0x180 [amdgpu]
> > [