Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer dereference

2018-11-30 Thread Sharma, Deepak
I did find a way to reproduce the issue consistently. After applying David's patch 
"0001-drm-amdgpu-fix-signaled-fence-isn-t-handled" with a minor change


-static struct dma_fence *drm_syncobj_get_stub_fence(void)
+struct dma_fence *drm_syncobj_get_stub_fence(void)


I was able to avoid the kernel panic due to the NULL pointer dereference, so it 
seems to resolve the problem I was trying to address.

Can you please post a revision of this patch?



Though one time I did get a reboot, with this dmesg:


[  213.714324] RIP: 0010:reservation_object_add_shared_fence+0x22e/0x245
[  213.714331] RSP: 0018:a03900b5ba80 EFLAGS: 00010246
[  213.714338] RAX: 0004 RBX: a03900b5bba8 RCX: 969421139d00
[  213.714343] RDX:  RSI: 9693639662e0 RDI: 9694203db230
[  213.714349] RBP: a03900b5bad0 R08: 002a R09: 969363966280
[  213.714354] R10: 000180800079 R11: a3637f66 R12: 96941ea5cf00
[  213.714360] R13:  R14: 9693639662e0 R15: 9694203db230
[  213.714366] FS:  786009e5f700() GS:96942ec0() 
knlGS:
[  213.714372] CS:  0010 DS:  ES:  CR0: 80050033
[  213.714377] CR2: 7860023a4000 CR3: 00011ea8a000 CR4: 001406f0
[  213.714382] Call Trace:
[  213.714397]  ttm_eu_fence_buffer_objects+0x53/0x89
[  213.714407]  amdgpu_cs_ioctl+0x194f/0x196c
[  213.714420]  ? amdgpu_cs_report_moved_bytes+0x5f/0x5f
[  213.714427]  drm_ioctl_kernel+0x98/0xac
[  213.714435]  drm_ioctl+0x235/0x357
[  213.714443]  ? amdgpu_cs_report_moved_bytes+0x5f/0x5f
[  213.714450]  ? __switch_to_asm+0x34/0x70
[  213.714456]  ? __switch_to_asm+0x40/0x70
[  213.714462]  ? __switch_to_asm+0x34/0x70
[  213.714468]  ? __switch_to_asm+0x40/0x70
[  213.714476]  amdgpu_drm_ioctl+0x46/0x78
[  213.714500]  vfs_ioctl+0x1b/0x30
[  213.714507]  do_vfs_ioctl+0x492/0x6c1
[  213.714530]  SyS_ioctl+0x52/0x77
[  213.714537]  do_syscall_64+0x6d/0x81
[  213.714544]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  213.714551] RIP: 0033:0x78600c9c6967
[  213.714556] RSP: 002b:786009e5ec08 EFLAGS: 0246 ORIG_RAX: 
0010
[  213.714563] RAX: ffda RBX: c0186444 RCX: 78600c9c6967
[  213.714568] RDX: 786009e5ec60 RSI: c0186444 RDI: 000d
[  213.714573] RBP: 786009e5ec40 R08: 786009e5ed00 R09: 786009e5ec50
[  213.714578] R10:  R11: 0246 R12: 000d
[  213.714583] R13: 3ba2573ff128 R14:  R15: 786009e5ec60



It's worth mentioning that I am not using the latest kernel. Relevant git log:


71a0556255de125b7e3fc0dc6171fb31fab2b9ad drm/syncobj: Don't leak fences when 
WAIT_FOR_SUBMIT is set

4034318d2afac8f7f43457a4b480b877a94ec651  drm/syncobj: Stop reusing the same 
struct file for all syncobj -> fd


So I have not taken any recent changes from the 
https://patchwork.kernel.org/patch/10575275/ series, only backported the two 
attached patches. I need to check if something missing there could cause the issue.


Thanks,

Deepak



From: Sharma, Deepak
Sent: Tuesday, November 27, 2018 3:12 PM
To: Zhou, David(ChunMing); Koenig, Christian; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer 
dereference


Thanks, David,

I did backport the drm patch to my kernel; after that I am no longer getting 
-ENOMEM from amdgpu_cs_ioctl while running tests, so I have not been able to 
test the patch that handles the signaled fence. As this issue is hard to 
reproduce, I will give it some more tries.

But yes, the problem is there and the NULL fence needs to be handled; your 
patch seems to handle it correctly.

-Deepak

From: Zhou, David(ChunMing)
Sent: Monday, November 26, 2018 6:40 PM
To: Sharma, Deepak; Zhou, David(ChunMing); Koenig, Christian; 
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer 
dereference

Yeah, you need another drm patch as well when you apply my patch. Attached.

-David


On 2018/11/27 08:40, Sharma, Deepak wrote:
>
> On 11/26/18 1:57 AM, Zhou, David(ChunMing) wrote:
>>
>>> -Original Message-
>>> From: Christian König 
>>> Sent: Monday, November 26, 2018 5:23 PM
>>> To: Sharma, Deepak ; Zhou, David(ChunMing)
>>> ; Koenig, Christian ;
>>> amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: add the checking to avoid NULL pointer
>>> dereference
>>>
>>> On 26.11.18 02:59, Sharma, Deepak wrote:
 On 2018/11/24 2:10, Koenig, Christian wrote:
> On 23.11.18 15:10, Zhou, David(ChunMing) wrote:
>> On 2018/11/23 21:30, Koenig, Christian wrote:
>>> On 23.11.18 14:27, Zhou, David(ChunMing) wrote:
 On 2018/11/22 19:25, Christian König wrote:
> On 22.11.18 07:56, Sharma, Deepak wrote:
>> when returned fence is not valid mostly due to userspace ignored
>> previous error causes NULL pointer dereference.
> Again, thi

Re: [Intel-gfx] [PATCH RFC 2/5] cgroup: Add mechanism to register vendor specific DRM devices

2018-11-30 Thread Matt Roper
On Wed, Nov 28, 2018 at 07:46:06PM +, Ho, Kenny wrote:
> 
> On Wed, Nov 28, 2018 at 4:14 AM Joonas Lahtinen 
>  wrote:
> > So we can only choose the lowest common denominator, right?
> >
> > Any core count out of total core count should translate nicely into a
> > fraction, so what would be the problem with percentage amounts?
> 
> I don't think having an abstracted resource necessarily equates to
> 'lowest'.  The issue with percentage is the lack of precision.  If you
> look at cpuset cgroup, you can see the specification can be very
> precise:
> 
> # /bin/echo 1-4,6 > cpuset.cpus -> set cpus list to cpus 1,2,3,4,6
> (https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.txt)
> 
> The driver can translate something like this to core count and then to
> percentage and handle accordingly while the reverse is not possible.
> (You can't tell which set of CUs/EUs a user want from a percentage
> request.)  It's also not clear to me, from
> user/application/admin/resource management perspective, how the base
> core counts of a GPU is relevant to the workload (since percentage is
> a 'relative' quantity.)  For example, let say a workload wants to use
> 256 'cores', does it matter if that workload is put on a GPU with 1024
> cores or a GPU with 4096 cores total?
> 
> I am not dismissing the possible need for percentage.  I just think
> there should be a way to accommodate more than just the 'lowest'. 
> 

As you noted, your proposal is similar to the cgroup-v1 "cpuset"
controller, which is sort of a way of partitioning your underlying
hardware resources; I think Joonas is describing something closer in
design to the cgroup-v2 "cpu" controller, which partitions the general
time/usage allocated to each cgroup; afaiu, "cpu" doesn't really care
which specific core the tasks run on, just the relative weights that
determine how much time they get to run on any of the cores.

It sounds like with your hardware, your kernel driver is able to specify
exactly which subset of GPU EU's a specific GPU context winds up running
on.  However I think there are a lot of platforms that don't allow that
kind of low-level control.  E.g., I don't think we can do that on Intel
hardware; we have a handful of high-level GPU engines that we can submit
different types of batchbuffers to (render, blitter, media, etc.).  What
we can do is use GPU preemption to limit how much time specific GPU
contexts get to run on the render engine before the engine is reclaimed
for use by a different context.

Using a %gputime approach like Joonas is suggesting could be handled in
a driver by reserving specific subsets of EU's on hardware like yours
that's capable of doing that, whereas it could be mostly handled on
other types of hardware via GPU engine preemption.

I think either approach "gpu_euset" or "%gputime" should map well to a
cgroup controller implementation.  Granted, neither one solves the
specific use case I was working on earlier this year where we need
unfair (starvation-okay) scheduling that will run contexts strictly
according to priority (i.e., lower priority contexts will never run at
all unless all higher priority contexts have completed all of their
submitted work), but that's a pretty specialized use case that we'll
probably need to handle in a different manner anyway.


Matt


> Regards,
> Kennny
> 
> 
> > > > That combined with the "GPU memory usable" property should be a good
> > > > starting point to start subdividing the GPU resources for multiple
> > > > users.
> > > >
> > > > Regards, Joonas
> > > >
> > > > >
> > > > > Your feedback is highly appreciated.
> > > > >
> > > > > Best Regards,
> > > > > Harish
> > > > >
> > > > >
> > > > >
> > > > > From: amd-gfx  on behalf of 
> > > > > Tejun Heo 
> > > > > Sent: Tuesday, November 20, 2018 5:30 PM
> > > > > To: Ho, Kenny
> > > > > Cc: cgro...@vger.kernel.org; intel-...@lists.freedesktop.org; 
> > > > > y2ke...@gmail.com; amd-gfx@lists.freedesktop.org; 
> > > > > dri-de...@lists.freedesktop.org
> > > > > Subject: Re: [PATCH RFC 2/5] cgroup: Add mechanism to register vendor 
> > > > > specific DRM devices
> > > > >
> > > > >
> > > > > Hello,
> > > > >
> > > > > On Tue, Nov 20, 2018 at 10:21:14PM +, Ho, Kenny wrote:
> > > > > > By this reply, are you suggesting that vendor specific resources
> > > > > > will never be acceptable to be managed under cgroup?  Let say a user
> > > > >
> > > > > I wouldn't say never but whatever which gets included as a cgroup
> > > > > controller should have clearly defined resource abstractions and the
> > > > > control schemes around them including support for delegation.  AFAICS,
> > > > > gpu side still seems to have a long way to go (and it's not clear
> > > > > whether that's somewhere it will or needs to end up).
> > > > >
> > > > > > want to have similar functionality as what cgroup is offering but to
> > > > > > manage vendor specific resources, what would you suggest as a
> > > > > > solution?  When you say keeping vendor specific resourc

[PATCH v3 2/3] drm/amdgpu: Handle xgmi device removal.

2018-11-30 Thread Andrey Grodzovsky
The XGMI hive has some resources allocated on device init which
need to be deallocated when the device is unregistered.

v2: Remove creation of dedicated wq for XGMI hive reset.
v3: Use the gmc.xgmi.supported flag

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  3 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   | 20 
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  1 +
 3 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index c75badf..bfd286c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1864,6 +1864,9 @@ static int amdgpu_device_ip_fini(struct amdgpu_device 
*adev)
 {
int i, r;
 
+   if (adev->gmc.xgmi.num_physical_nodes > 1)
+   amdgpu_xgmi_remove_device(adev);
+
amdgpu_amdkfd_device_fini(adev);
 
amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index f8c86d0..1b15ff3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -135,3 +135,23 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
mutex_unlock(&xgmi_mutex);
return ret;
 }
+
+void amdgpu_xgmi_remove_device(struct amdgpu_device *adev)
+{
+   struct amdgpu_hive_info *hive;
+
+   if (!adev->gmc.xgmi.supported)
+   return;
+
+   mutex_lock(&xgmi_mutex);
+
+   hive = amdgpu_get_xgmi_hive(adev);
+   if (!hive)
+   goto exit;
+
+   if (!(hive->number_devices--))
+   mutex_destroy(&hive->hive_lock);
+
+exit:
+   mutex_unlock(&xgmi_mutex);
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
index 6335bfd..6151eb9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
@@ -35,5 +35,6 @@ struct amdgpu_hive_info {
 struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev);
 int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, struct 
amdgpu_device *adev);
 int amdgpu_xgmi_add_device(struct amdgpu_device *adev);
+void amdgpu_xgmi_remove_device(struct amdgpu_device *adev);
 
 #endif
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v3 1/3] drm/amdgpu/psp: Update waiting in psp mode1 reset.

2018-11-30 Thread Andrey Grodzovsky
There is no point in using mdelay unless running from interrupt context (which we are not);
it is a busy wait which blocks the CPU for the entirety of the wait time.
Also, reduce the wait time to 500ms as is done in the reference code, because
1s might cause PSP FW timeout issues during XGMI hive reset.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/psp_v3_1.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
index 082093a..9c1569b 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
@@ -557,7 +557,7 @@ static int psp_v11_0_mode1_reset(struct psp_context *psp)
/*send the mode 1 reset command*/
WREG32(offset, GFX_CTRL_CMD_ID_MODE1_RST);
 
-   mdelay(1000);
+   msleep(500);
 
offset = SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_33);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c 
b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
index 7efb823..7357fd5 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
@@ -592,7 +592,7 @@ static int psp_v3_1_mode1_reset(struct psp_context *psp)
/*send the mode 1 reset command*/
WREG32(offset, GFX_CTRL_CMD_ID_MODE1_RST);
 
-   mdelay(1000);
+   msleep(500);
 
offset = SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_33);
 
-- 
2.7.4



[PATCH v3 3/3] drm/amdgpu: Implement concurrent asic reset for XGMI.

2018-11-30 Thread Andrey Grodzovsky
Use per hive wq to concurrently send reset commands to all nodes
in the hive.

v2:
Switch to system_highpri_wq after dropping the dedicated queue.
Fix a KASAN error in the non-XGMI code path.
Stop the per-node hive reset loop if there
is a reset failure on any of the nodes.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 44 ++
 2 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index c8ad6bf..6fc023b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -910,7 +910,9 @@ struct amdgpu_device {
boolin_gpu_reset;
struct mutex  lock_reset;
struct amdgpu_doorbell_index doorbell_index;
+
int asic_reset_res;
+   struct work_struct  xgmi_reset_work;
 };
 
 static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index bfd286c..9fd9f63 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2356,6 +2356,19 @@ bool amdgpu_device_has_dc_support(struct amdgpu_device 
*adev)
return amdgpu_device_asic_has_dc_support(adev->asic_type);
 }
 
+
+static void amdgpu_device_xgmi_reset_func(struct work_struct *__work)
+{
+   struct amdgpu_device *adev =
+   container_of(__work, struct amdgpu_device, xgmi_reset_work);
+
+   adev->asic_reset_res =  amdgpu_asic_reset(adev);
+   if (adev->asic_reset_res)
+   DRM_WARN("ASIC reset failed with err r, %d for drm dev, %s",
+adev->asic_reset_res, adev->ddev->unique);
+}
+
+
 /**
  * amdgpu_device_init - initialize the driver
  *
@@ -2454,6 +2467,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
INIT_DELAYED_WORK(&adev->gfx.gfx_off_delay_work,
  amdgpu_device_delay_enable_gfx_off);
 
+   INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
+
adev->gfx.gfx_off_req_count = 1;
adev->pm.ac_power = power_supply_is_system_supplied() > 0 ? true : 
false;
 
@@ -3331,10 +3346,31 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info 
*hive,
 */
if (need_full_reset) {
list_for_each_entry(tmp_adev, device_list_handle, 
gmc.xgmi.head) {
-   r = amdgpu_asic_reset(tmp_adev);
-   if (r)
-   DRM_WARN("ASIC reset failed with err r, %d for 
drm dev, %s",
+   /* For XGMI run all resets in parallel to speed up the 
process */
+   if (tmp_adev->gmc.xgmi.num_physical_nodes > 1) {
+   if (!queue_work(system_highpri_wq, 
&tmp_adev->xgmi_reset_work))
+   r = -EALREADY;
+   } else
+   r = amdgpu_asic_reset(tmp_adev);
+
+   if (r) {
+   DRM_ERROR("ASIC reset failed with err r, %d for 
drm dev, %s",
 r, tmp_adev->ddev->unique);
+   break;
+   }
+   }
+
+   /* For XGMI wait for all PSP resets to complete before proceed 
*/
+   if (!r) {
+   list_for_each_entry(tmp_adev, device_list_handle,
+   gmc.xgmi.head) {
+   if (tmp_adev->gmc.xgmi.num_physical_nodes > 1) {
+   flush_work(&tmp_adev->xgmi_reset_work);
+   r = tmp_adev->asic_reset_res;
+   if (r)
+   break;
+   }
+   }
}
}
 
@@ -3521,8 +3557,6 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
if (tmp_adev == adev)
continue;
 
-   dev_info(tmp_adev->dev, "GPU reset begin for drm dev %s!\n", 
adev->ddev->unique);
-
amdgpu_device_lock_adev(tmp_adev);
r = amdgpu_device_pre_asic_reset(tmp_adev,
 NULL,
-- 
2.7.4



RE: [PATCH 03/11] drm/amdgpu: remove VM fault_credit handling

2018-11-30 Thread Zeng, Oak
The credit was used to limit the number of VM (retry) faults processed for each VM. If 
this is removed, it is possible that you get flooded by an interrupt storm.

Even though you claim in the commit message that printk_ratelimit is a 
better solution, I didn't see you implement it in this patch. Are you planning 
a future patch?

Thanks,
Oak

-Original Message-
From: amd-gfx  On Behalf Of Christian 
König
Sent: Friday, November 30, 2018 7:36 AM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH 03/11] drm/amdgpu: remove VM fault_credit handling

printk_ratelimit() is much better suited to limit the number of reported VM 
faults.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 37 -  
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  5 
 drivers/gpu/drm/amd/amdgpu/cik_ih.c | 18 +---
 drivers/gpu/drm/amd/amdgpu/cz_ih.c  | 18 +---
 drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 18 +---
 drivers/gpu/drm/amd/amdgpu/tonga_ih.c   | 18 +---
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c  |  7 ++---
 7 files changed, 6 insertions(+), 115 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 2acb9838913e..a2f149da83f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -3057,7 +3057,6 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
}
 
INIT_KFIFO(vm->faults);
-   vm->fault_credit = 16;
 
return 0;
 
@@ -3269,42 +3268,6 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct 
amdgpu_vm *vm)
amdgpu_vmid_free_reserved(adev, vm, i);  }
 
-/**
- * amdgpu_vm_pasid_fault_credit - Check fault credit for given PASID
- *
- * @adev: amdgpu_device pointer
- * @pasid: PASID do identify the VM
- *
- * This function is expected to be called in interrupt context.
- *
- * Returns:
- * True if there was fault credit, false otherwise
- */
-bool amdgpu_vm_pasid_fault_credit(struct amdgpu_device *adev,
- unsigned int pasid)
-{
-   struct amdgpu_vm *vm;
-
-   spin_lock(&adev->vm_manager.pasid_lock);
-   vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
-   if (!vm) {
-   /* VM not found, can't track fault credit */
-   spin_unlock(&adev->vm_manager.pasid_lock);
-   return true;
-   }
-
-   /* No lock needed. only accessed by IRQ handler */
-   if (!vm->fault_credit) {
-   /* Too many faults in this VM */
-   spin_unlock(&adev->vm_manager.pasid_lock);
-   return false;
-   }
-
-   vm->fault_credit--;
-   spin_unlock(&adev->vm_manager.pasid_lock);
-   return true;
-}
-
 /**
  * amdgpu_vm_manager_init - init the VM manager
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 2a8898d19c8b..e8dcfd59fc93 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -229,9 +229,6 @@ struct amdgpu_vm {
/* Up to 128 pending retry page faults */
DECLARE_KFIFO(faults, u64, 128);
 
-   /* Limit non-retry fault storms */
-   unsigned intfault_credit;
-
/* Points to the KFD process VM info */
struct amdkfd_process_info *process_info;
 
@@ -299,8 +296,6 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,  int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct 
amdgpu_vm *vm, unsigned int pasid);  void amdgpu_vm_release_compute(struct 
amdgpu_device *adev, struct amdgpu_vm *vm);  void amdgpu_vm_fini(struct 
amdgpu_device *adev, struct amdgpu_vm *vm); -bool 
amdgpu_vm_pasid_fault_credit(struct amdgpu_device *adev,
- unsigned int pasid);
 void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
 struct list_head *validated,
 struct amdgpu_bo_list_entry *entry); diff --git 
a/drivers/gpu/drm/amd/amdgpu/cik_ih.c b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
index b5775c6a857b..3e6c8c4067cb 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
@@ -237,23 +237,7 @@ static u32 cik_ih_get_wptr(struct amdgpu_device *adev)
  */
 static bool cik_ih_prescreen_iv(struct amdgpu_device *adev)  {
-   u32 ring_index = adev->irq.ih.rptr >> 2;
-   u16 pasid;
-
-   switch (le32_to_cpu(adev->irq.ih.ring[ring_index]) & 0xff) {
-   case 146:
-   case 147:
-   pasid = le32_to_cpu(adev->irq.ih.ring[ring_index + 2]) >> 16;
-   if (!pasid || amdgpu_vm_pasid_fault_credit(adev, pasid))
-   return true;
-   break;
-   default:
-   /* Not a VM fault */
-   return true;
-   }
-
-   adev->irq.ih.rptr += 16;
-   return false;
+   return true;
 }
 
  /**
diff --git a/drivers/gpu/drm/amd/amdgpu/cz_ih.c 
b/driv

RE: [PATCH 02/11] drm/amdgpu: send IVs to the KFD only after processing them v2

2018-11-30 Thread Zeng, Oak
See comments inline, marked [Oak].

Thanks,
Oak

-Original Message-
From: amd-gfx  On Behalf Of Christian 
König
Sent: Friday, November 30, 2018 7:36 AM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH 02/11] drm/amdgpu: send IVs to the KFD only after processing 
them v2

This allows us to filter out VM faults in the GMC code.

v2: don't filter out all faults

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 29 +++--
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 6b6524f04ce0..6db4c58ddc13 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -149,9 +149,6 @@ static void amdgpu_irq_callback(struct amdgpu_device *adev,
if (!amdgpu_ih_prescreen_iv(adev))
return;
 
-   /* Before dispatching irq to IP blocks, send it to amdkfd */
-   amdgpu_amdkfd_interrupt(adev, (const void *) &ih->ring[ring_index]);
-
entry.iv_entry = (const uint32_t *)&ih->ring[ring_index];
amdgpu_ih_decode_iv(adev, &entry);
 
@@ -371,29 +368,31 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev,
unsigned client_id = entry->client_id;
unsigned src_id = entry->src_id;
struct amdgpu_irq_src *src;
+   bool handled = false;
int r;
 
trace_amdgpu_iv(entry);
 
if (client_id >= AMDGPU_IRQ_CLIENTID_MAX) {
-   DRM_DEBUG("Invalid client_id in IV: %d\n", client_id);
+   DRM_ERROR("Invalid client_id in IV: %d\n", client_id);
return;
}
 
if (src_id >= AMDGPU_MAX_IRQ_SRC_ID) {
-   DRM_DEBUG("Invalid src_id in IV: %d\n", src_id);
+   DRM_ERROR("Invalid src_id in IV: %d\n", src_id);
return;
}
 
if (adev->irq.virq[src_id]) {
generic_handle_irq(irq_find_mapping(adev->irq.domain, src_id));
-   } else {
-   if (!adev->irq.client[client_id].sources) {
-   DRM_DEBUG("Unregistered interrupt client_id: %d src_id: 
%d\n",
- client_id, src_id);
-   return;
-   }
+   return;
+   }
 
+   if (!adev->irq.client[client_id].sources) {
+   DRM_DEBUG("Unregistered interrupt client_id: %d src_id: %d\n",
+ client_id, src_id);
+   return;
+   } else {
[Oak]: you probably don't need an else here because of the return in the if above.
src = adev->irq.client[client_id].sources[src_id];
if (!src) {
DRM_DEBUG("Unhandled interrupt src_id: %d\n", src_id); 
@@ -401,9 +400,15 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev,
}
 
r = src->funcs->process(adev, src, entry);
-   if (r)
+   if (r < 0)
DRM_ERROR("error processing interrupt (%d)\n", r);
+   else if (r)
+   handled = true;
}
+
+   /* Send it to amdkfd as well if it isn't already handled */
+   if (!handled)
[Oak]: KFD cares about the following interrupt sources/clients. How can we guarantee 
that those are not handled by a registered interrupt handler at the amdgpu level?
source_id == SOC15_INTSRC_CP_END_OF_PIPE ||
source_id == SOC15_INTSRC_SDMA_TRAP ||
source_id == SOC15_INTSRC_SQ_INTERRUPT_MSG ||
source_id == SOC15_INTSRC_CP_BAD_OPCODE ||
client_id == SOC15_IH_CLIENTID_VMC ||
client_id == SOC15_IH_CLIENTID_UTCL2;

To completely avoid this, we should remove the amdgpu_amdkfd_interrupt 
function. Instead, the KFD interrupt handling functions should be split up and 
registered to adev->irq.client[].sources[].

+   amdgpu_amdkfd_interrupt(adev, entry->iv_entry);
 }
 
 /**
--
2.17.1



Re: [PATCH] drm/amdgpu: add a xgmi supported flag

2018-11-30 Thread Grodzovsky, Andrey
Reviewed-by: Andrey Grodzovsky 

Andrey


On 11/30/2018 03:36 PM, Alex Deucher wrote:
> On Fri, Nov 30, 2018 at 3:34 PM Grodzovsky, Andrey
>  wrote:
>>
>>
>> On 11/30/2018 03:30 PM, Alex Deucher wrote:
>>> Use this to track whether an asic supports xgmi rather than
>>> checking the asic type everywhere.
>>>
>>> Signed-off-by: Alex Deucher 
>>> ---
>>>drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h  | 1 +
>>>drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4 ++--
>>>drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c| 2 +-
>>>drivers/gpu/drm/amd/amdgpu/soc15.c   | 3 +++
>>>4 files changed, 7 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
>>> index 8c57924c075f..81e6070d255b 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
>>> @@ -99,6 +99,7 @@ struct amdgpu_xgmi {
>>>unsigned num_physical_nodes;
>>>/* gpu list in the same hive */
>>>struct list_head head;
>>> + bool supported;
>>>};
>>>
>>>struct amdgpu_gmc {
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>>> index fb37e69f1bba..f8c86d0593dd 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>>> @@ -94,9 +94,9 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
>>>
>>>int count = 0, ret = -EINVAL;
>>>
>>> - if ((adev->asic_type < CHIP_VEGA20) ||
>>> - (adev->flags & AMD_IS_APU) )
>>> + if (!adev->gmc.xgmi.supported)
>>>return 0;
>> What about the (adev->flags & AMD_IS_APU) part ?
> It's covered by the check.  Only vega20 has the flag set.
>
> Alex
>
>> Andrey
>>
>>> +
>>>adev->gmc.xgmi.node_id = psp_xgmi_get_node_id(&adev->psp);
>>>adev->gmc.xgmi.hive_id = psp_xgmi_get_hive_id(&adev->psp);
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> index 3a4e5d8d5162..ed3145b2a596 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> @@ -934,7 +934,7 @@ static int gmc_v9_0_sw_init(void *handle)
>>>}
>>>adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
>>>
>>> - if (adev->asic_type == CHIP_VEGA20) {
>>> + if (adev->gmc.xgmi.supported) {
>>>r = gfxhub_v1_1_get_xgmi_info(adev);
>>>if (r)
>>>return r;
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c 
>>> b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> index 83624e150ca7..8849b74078d6 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> @@ -507,6 +507,9 @@ int soc15_set_ip_blocks(struct amdgpu_device *adev)
>>>return -EINVAL;
>>>}
>>>
>>> + if (adev->asic_type == CHIP_VEGA20)
>>> + adev->gmc.xgmi.supported = true;
>>> +
>>>if (adev->flags & AMD_IS_APU)
>>>adev->nbio_funcs = &nbio_v7_0_funcs;
>>>else if (adev->asic_type == CHIP_VEGA20)



Re: [PATCH] drm/amdgpu: add a xgmi supported flag

2018-11-30 Thread Alex Deucher
On Fri, Nov 30, 2018 at 3:34 PM Grodzovsky, Andrey
 wrote:
>
>
>
> On 11/30/2018 03:30 PM, Alex Deucher wrote:
> > Use this to track whether an asic supports xgmi rather than
> > checking the asic type everywhere.
> >
> > Signed-off-by: Alex Deucher 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h  | 1 +
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4 ++--
> >   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c| 2 +-
> >   drivers/gpu/drm/amd/amdgpu/soc15.c   | 3 +++
> >   4 files changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> > index 8c57924c075f..81e6070d255b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> > @@ -99,6 +99,7 @@ struct amdgpu_xgmi {
> >   unsigned num_physical_nodes;
> >   /* gpu list in the same hive */
> >   struct list_head head;
> > + bool supported;
> >   };
> >
> >   struct amdgpu_gmc {
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> > index fb37e69f1bba..f8c86d0593dd 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> > @@ -94,9 +94,9 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
> >
> >   int count = 0, ret = -EINVAL;
> >
> > - if ((adev->asic_type < CHIP_VEGA20) ||
> > - (adev->flags & AMD_IS_APU) )
> > + if (!adev->gmc.xgmi.supported)
> >   return 0;
>
> What about the (adev->flags & AMD_IS_APU) part ?

It's covered by the check.  Only vega20 has the flag set.

Alex

>
> Andrey
>
> > +
> >   adev->gmc.xgmi.node_id = psp_xgmi_get_node_id(&adev->psp);
> >   adev->gmc.xgmi.hive_id = psp_xgmi_get_hive_id(&adev->psp);
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
> > b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> > index 3a4e5d8d5162..ed3145b2a596 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> > @@ -934,7 +934,7 @@ static int gmc_v9_0_sw_init(void *handle)
> >   }
> >   adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
> >
> > - if (adev->asic_type == CHIP_VEGA20) {
> > + if (adev->gmc.xgmi.supported) {
> >   r = gfxhub_v1_1_get_xgmi_info(adev);
> >   if (r)
> >   return r;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c 
> > b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > index 83624e150ca7..8849b74078d6 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > @@ -507,6 +507,9 @@ int soc15_set_ip_blocks(struct amdgpu_device *adev)
> >   return -EINVAL;
> >   }
> >
> > + if (adev->asic_type == CHIP_VEGA20)
> > + adev->gmc.xgmi.supported = true;
> > +
> >   if (adev->flags & AMD_IS_APU)
> >   adev->nbio_funcs = &nbio_v7_0_funcs;
> >   else if (adev->asic_type == CHIP_VEGA20)
>


Re: [PATCH] drm/amdgpu: add a xgmi supported flag

2018-11-30 Thread Grodzovsky, Andrey


On 11/30/2018 03:30 PM, Alex Deucher wrote:
> Use this to track whether an asic supports xgmi rather than
> checking the asic type everywhere.
>
> Signed-off-by: Alex Deucher 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h  | 1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4 ++--
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c| 2 +-
>   drivers/gpu/drm/amd/amdgpu/soc15.c   | 3 +++
>   4 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> index 8c57924c075f..81e6070d255b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> @@ -99,6 +99,7 @@ struct amdgpu_xgmi {
>   unsigned num_physical_nodes;
>   /* gpu list in the same hive */
>   struct list_head head;
> + bool supported;
>   };
>   
>   struct amdgpu_gmc {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> index fb37e69f1bba..f8c86d0593dd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> @@ -94,9 +94,9 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
>   
>   int count = 0, ret = -EINVAL;
>   
> - if ((adev->asic_type < CHIP_VEGA20) ||
> - (adev->flags & AMD_IS_APU) )
> + if (!adev->gmc.xgmi.supported)
>   return 0;

What about the (adev->flags & AMD_IS_APU) part?

Andrey

> +
>   adev->gmc.xgmi.node_id = psp_xgmi_get_node_id(&adev->psp);
>   adev->gmc.xgmi.hive_id = psp_xgmi_get_hive_id(&adev->psp);
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index 3a4e5d8d5162..ed3145b2a596 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -934,7 +934,7 @@ static int gmc_v9_0_sw_init(void *handle)
>   }
>   adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
>   
> - if (adev->asic_type == CHIP_VEGA20) {
> + if (adev->gmc.xgmi.supported) {
>   r = gfxhub_v1_1_get_xgmi_info(adev);
>   if (r)
>   return r;
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c 
> b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index 83624e150ca7..8849b74078d6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -507,6 +507,9 @@ int soc15_set_ip_blocks(struct amdgpu_device *adev)
>   return -EINVAL;
>   }
>   
> + if (adev->asic_type == CHIP_VEGA20)
> + adev->gmc.xgmi.supported = true;
> +
>   if (adev->flags & AMD_IS_APU)
>   adev->nbio_funcs = &nbio_v7_0_funcs;
>   else if (adev->asic_type == CHIP_VEGA20)



[PATCH] drm/amdgpu: add a xgmi supported flag

2018-11-30 Thread Alex Deucher
Use this to track whether an asic supports xgmi rather than
checking the asic type everywhere.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h  | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 4 ++--
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c| 2 +-
 drivers/gpu/drm/amd/amdgpu/soc15.c   | 3 +++
 4 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index 8c57924c075f..81e6070d255b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -99,6 +99,7 @@ struct amdgpu_xgmi {
unsigned num_physical_nodes;
/* gpu list in the same hive */
struct list_head head;
+   bool supported;
 };
 
 struct amdgpu_gmc {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index fb37e69f1bba..f8c86d0593dd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -94,9 +94,9 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
 
int count = 0, ret = -EINVAL;
 
-   if ((adev->asic_type < CHIP_VEGA20) ||
-   (adev->flags & AMD_IS_APU) )
+   if (!adev->gmc.xgmi.supported)
return 0;
+
adev->gmc.xgmi.node_id = psp_xgmi_get_node_id(&adev->psp);
adev->gmc.xgmi.hive_id = psp_xgmi_get_hive_id(&adev->psp);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 3a4e5d8d5162..ed3145b2a596 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -934,7 +934,7 @@ static int gmc_v9_0_sw_init(void *handle)
}
adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
 
-   if (adev->asic_type == CHIP_VEGA20) {
+   if (adev->gmc.xgmi.supported) {
r = gfxhub_v1_1_get_xgmi_info(adev);
if (r)
return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c 
b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 83624e150ca7..8849b74078d6 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -507,6 +507,9 @@ int soc15_set_ip_blocks(struct amdgpu_device *adev)
return -EINVAL;
}
 
+   if (adev->asic_type == CHIP_VEGA20)
+   adev->gmc.xgmi.supported = true;
+
if (adev->flags & AMD_IS_APU)
adev->nbio_funcs = &nbio_v7_0_funcs;
else if (adev->asic_type == CHIP_VEGA20)
-- 
2.13.6



Re: [PATCH v2 2/3] drm/amdgpu: Handle xgmi device removal.

2018-11-30 Thread Alex Deucher
On Fri, Nov 30, 2018 at 3:12 PM Grodzovsky, Andrey
 wrote:
>
>
> On 11/30/2018 03:08 PM, Alex Deucher wrote:
> > On Fri, Nov 30, 2018 at 3:06 PM Grodzovsky, Andrey
> >  wrote:
> >>
> >>
> >> On 11/30/2018 02:49 PM, Alex Deucher wrote:
> >>> On Fri, Nov 30, 2018 at 1:17 PM Andrey Grodzovsky
> >>>  wrote:
>  XGMI hive has some resources allocated on device init which
>  need to be deallocated when the device is unregistered.
> 
>  v2: Remove creation of dedicated wq for XGMI hive reset.
> 
>  Signed-off-by: Andrey Grodzovsky 
>  ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  3 +++
> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   | 20 
> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  1 +
> 3 files changed, 24 insertions(+)
> 
>  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>  b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>  index c75badf..bfd286c 100644
>  --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>  +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>  @@ -1864,6 +1864,9 @@ static int amdgpu_device_ip_fini(struct 
>  amdgpu_device *adev)
> {
>    int i, r;
> 
>  +   if (adev->gmc.xgmi.num_physical_nodes > 1)
>  +   amdgpu_xgmi_remove_device(adev);
>  +
>    amdgpu_amdkfd_device_fini(adev);
> 
>    amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
>  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
>  b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>  index fb37e69..38e1599 100644
>  --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>  +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>  @@ -135,3 +135,23 @@ int amdgpu_xgmi_add_device(struct amdgpu_device 
>  *adev)
>    mutex_unlock(&xgmi_mutex);
>    return ret;
> }
>  +
>  +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev)
>  +{
>  +   struct amdgpu_hive_info *hive;
>  +
>  +   if ((adev->asic_type < CHIP_VEGA20) || (adev->flags & 
>  AMD_IS_APU))
>  +   return;
> >>> It would be nice to have something better here to check against.  This
> >>> seems kind of fragile.  Can we check based on some xgmi related
> >>> structure?
> >>>
> >>> Alex
> >> This check is the same as in amdgpu_xgmi_add_device; I guess we could
> >> check whether adev->gmc.xgmi.head is empty, since it only gets added to
> >> hive->device_list if amdgpu_xgmi_add_device was called.
> >> Sounds OK?
> > Sounds good.  We should probably fix up the other case as well at some 
> > point.
> >
> > Thanks!
> >
> > Alex
>
> The other case is in amdgpu_xgmi_add_device, at the beginning; it just
> avoids the XGMI code path for non-Vega20 dGPUs. How can we avoid this check?
>

Maybe something like adev->gmc.xgmi.supported and set it in
soc15_set_ip_blocks() for vega20.  We should do something similar for
ras.  That way for new asics we only need to set the variable rather
than updating asic checks scattered around the code.

Alex

> Andrey
>
>
> >
> >> Andrey
> >>
>  +
>  +   mutex_lock(&xgmi_mutex);
>  +
>  +   hive = amdgpu_get_xgmi_hive(adev);
>  +   if (!hive)
>  +   goto exit;
>  +
>  +   if (!(hive->number_devices--))
>  +   mutex_destroy(&hive->hive_lock);
>  +
>  +exit:
>  +   mutex_unlock(&xgmi_mutex);
>  +}
>  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h 
>  b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>  index 6335bfd..6151eb9 100644
>  --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>  +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>  @@ -35,5 +35,6 @@ struct amdgpu_hive_info {
> struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device 
>  *adev);
> int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, struct 
>  amdgpu_device *adev);
> int amdgpu_xgmi_add_device(struct amdgpu_device *adev);
>  +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev);
> 
> #endif
>  --
>  2.7.4
> 
>  ___
>  amd-gfx mailing list
>  amd-gfx@lists.freedesktop.org
>  https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
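Alex's suggestion above — set a single `adev->gmc.xgmi.supported` flag once at IP-block setup time and test only that flag everywhere else — can be sketched in a few lines of userspace C. This is an illustrative model only: the enum values, struct layout, and function names below are simplified stand-ins, not the real amdgpu definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the capability-flag pattern:
 * decide the capability in one place instead of repeating asic-type
 * checks at every call site. Not the real amdgpu structures. */

enum chip_type { CHIP_VEGA10, CHIP_VEGA12, CHIP_VEGA20 };
#define AMD_IS_APU (1u << 0)

struct device {
	enum chip_type asic_type;
	unsigned int flags;
	struct { bool supported; } xgmi;
};

/* One place decides the capability... */
static void set_ip_blocks(struct device *dev)
{
	/* Vega20 is never an APU, but the guard makes the intent explicit. */
	if (dev->asic_type == CHIP_VEGA20 && !(dev->flags & AMD_IS_APU))
		dev->xgmi.supported = true;
}

/* ...and every consumer just tests the flag. */
static int xgmi_add_device(struct device *dev)
{
	if (!dev->xgmi.supported)
		return 0;	/* nothing to do on non-XGMI parts */
	return 1;		/* would join the hive here */
}
```

For a new XGMI-capable asic, only `set_ip_blocks()` changes; the scattered call sites stay untouched, which is the point of the refactor.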


Re: [PATCH v2 2/3] drm/amdgpu: Handle xgmi device removal.

2018-11-30 Thread Grodzovsky, Andrey

On 11/30/2018 03:08 PM, Alex Deucher wrote:
> On Fri, Nov 30, 2018 at 3:06 PM Grodzovsky, Andrey
>  wrote:
>>
>>
>> On 11/30/2018 02:49 PM, Alex Deucher wrote:
>>> On Fri, Nov 30, 2018 at 1:17 PM Andrey Grodzovsky
>>>  wrote:
 XGMI hive has some resources allocated on device init which
 need to be deallocated when the device is unregistered.

 v2: Remove creation of dedicated wq for XGMI hive reset.

 Signed-off-by: Andrey Grodzovsky 
 ---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  3 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   | 20 
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  1 +
3 files changed, 24 insertions(+)

 diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
 b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
 index c75badf..bfd286c 100644
 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
 +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
 @@ -1864,6 +1864,9 @@ static int amdgpu_device_ip_fini(struct 
 amdgpu_device *adev)
{
   int i, r;

 +   if (adev->gmc.xgmi.num_physical_nodes > 1)
 +   amdgpu_xgmi_remove_device(adev);
 +
   amdgpu_amdkfd_device_fini(adev);

   amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
 diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
 b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
 index fb37e69..38e1599 100644
 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
 +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
 @@ -135,3 +135,23 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
   mutex_unlock(&xgmi_mutex);
   return ret;
}
 +
 +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev)
 +{
 +   struct amdgpu_hive_info *hive;
 +
 +   if ((adev->asic_type < CHIP_VEGA20) || (adev->flags & AMD_IS_APU))
 +   return;
>>> It would be nice to have something better here to check against.  This
>>> seems kind of fragile.  Can we check based on some xgmi related
>>> structure?
>>>
>>> Alex
>> This check is the same as in amdgpu_xgmi_add_device; I guess we could
>> check whether adev->gmc.xgmi.head is empty, since it only gets added to
>> hive->device_list if amdgpu_xgmi_add_device was called.
>> Sounds OK?
> Sounds good.  We should probably fix up the other case as well at some point.
>
> Thanks!
>
> Alex

The other case is in amdgpu_xgmi_add_device, at the beginning; it just 
avoids the XGMI code path for non-Vega20 dGPUs. How can we avoid this check?

Andrey


>
>> Andrey
>>
 +
 +   mutex_lock(&xgmi_mutex);
 +
 +   hive = amdgpu_get_xgmi_hive(adev);
 +   if (!hive)
 +   goto exit;
 +
 +   if (!(hive->number_devices--))
 +   mutex_destroy(&hive->hive_lock);
 +
 +exit:
 +   mutex_unlock(&xgmi_mutex);
 +}
 diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h 
 b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
 index 6335bfd..6151eb9 100644
 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
 +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
 @@ -35,5 +35,6 @@ struct amdgpu_hive_info {
struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device 
 *adev);
int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, struct 
 amdgpu_device *adev);
int amdgpu_xgmi_add_device(struct amdgpu_device *adev);
 +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev);

#endif
 --
 2.7.4

 ___
 amd-gfx mailing list
 amd-gfx@lists.freedesktop.org
 https://lists.freedesktop.org/mailman/listinfo/amd-gfx



Re: [PATCH v2 2/3] drm/amdgpu: Handle xgmi device removal.

2018-11-30 Thread Alex Deucher
On Fri, Nov 30, 2018 at 3:06 PM Grodzovsky, Andrey
 wrote:
>
>
>
> On 11/30/2018 02:49 PM, Alex Deucher wrote:
> > On Fri, Nov 30, 2018 at 1:17 PM Andrey Grodzovsky
> >  wrote:
> >> XGMI hive has some resources allocated on device init which
> >> need to be deallocated when the device is unregistered.
> >>
> >> v2: Remove creation of dedicated wq for XGMI hive reset.
> >>
> >> Signed-off-by: Andrey Grodzovsky 
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  3 +++
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   | 20 
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  1 +
> >>   3 files changed, 24 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> index c75badf..bfd286c 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> @@ -1864,6 +1864,9 @@ static int amdgpu_device_ip_fini(struct 
> >> amdgpu_device *adev)
> >>   {
> >>  int i, r;
> >>
> >> +   if (adev->gmc.xgmi.num_physical_nodes > 1)
> >> +   amdgpu_xgmi_remove_device(adev);
> >> +
> >>  amdgpu_amdkfd_device_fini(adev);
> >>
> >>  amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> >> index fb37e69..38e1599 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> >> @@ -135,3 +135,23 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
> >>  mutex_unlock(&xgmi_mutex);
> >>  return ret;
> >>   }
> >> +
> >> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev)
> >> +{
> >> +   struct amdgpu_hive_info *hive;
> >> +
> >> +   if ((adev->asic_type < CHIP_VEGA20) || (adev->flags & AMD_IS_APU))
> >> +   return;
> > It would be nice to have something better here to check against.  This
> > seems kind of fragile.  Can we check based on some xgmi related
> > structure?
> >
> > Alex
>
> This check is the same as in amdgpu_xgmi_add_device; I guess we could
> check whether adev->gmc.xgmi.head is empty, since it only gets added to
> hive->device_list if amdgpu_xgmi_add_device was called.
> Sounds OK?

Sounds good.  We should probably fix up the other case as well at some point.

Thanks!

Alex

>
> Andrey
>
> >
> >> +
> >> +   mutex_lock(&xgmi_mutex);
> >> +
> >> +   hive = amdgpu_get_xgmi_hive(adev);
> >> +   if (!hive)
> >> +   goto exit;
> >> +
> >> +   if (!(hive->number_devices--))
> >> +   mutex_destroy(&hive->hive_lock);
> >> +
> >> +exit:
> >> +   mutex_unlock(&xgmi_mutex);
> >> +}
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h 
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
> >> index 6335bfd..6151eb9 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
> >> @@ -35,5 +35,6 @@ struct amdgpu_hive_info {
> >>   struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device 
> >> *adev);
> >>   int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, struct 
> >> amdgpu_device *adev);
> >>   int amdgpu_xgmi_add_device(struct amdgpu_device *adev);
> >> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev);
> >>
> >>   #endif
> >> --
> >> 2.7.4
> >>
> >> ___
> >> amd-gfx mailing list
> >> amd-gfx@lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>


Re: [PATCH v2 2/3] drm/amdgpu: Handle xgmi device removal.

2018-11-30 Thread Grodzovsky, Andrey


On 11/30/2018 02:49 PM, Alex Deucher wrote:
> On Fri, Nov 30, 2018 at 1:17 PM Andrey Grodzovsky
>  wrote:
>> XGMI hive has some resources allocated on device init which
>> need to be deallocated when the device is unregistered.
>>
>> v2: Remove creation of dedicated wq for XGMI hive reset.
>>
>> Signed-off-by: Andrey Grodzovsky 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  3 +++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   | 20 
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  1 +
>>   3 files changed, 24 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index c75badf..bfd286c 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -1864,6 +1864,9 @@ static int amdgpu_device_ip_fini(struct amdgpu_device 
>> *adev)
>>   {
>>  int i, r;
>>
>> +   if (adev->gmc.xgmi.num_physical_nodes > 1)
>> +   amdgpu_xgmi_remove_device(adev);
>> +
>>  amdgpu_amdkfd_device_fini(adev);
>>
>>  amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>> index fb37e69..38e1599 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>> @@ -135,3 +135,23 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
>>  mutex_unlock(&xgmi_mutex);
>>  return ret;
>>   }
>> +
>> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev)
>> +{
>> +   struct amdgpu_hive_info *hive;
>> +
>> +   if ((adev->asic_type < CHIP_VEGA20) || (adev->flags & AMD_IS_APU))
>> +   return;
> It would be nice to have something better here to check against.  This
> seems kind of fragile.  Can we check based on some xgmi related
> structure?
>
> Alex

This check is the same as in amdgpu_xgmi_add_device; I guess we could
check whether adev->gmc.xgmi.head is empty, since it only gets added to
hive->device_list if amdgpu_xgmi_add_device was called.
Sounds OK?

Andrey

>
>> +
>> +   mutex_lock(&xgmi_mutex);
>> +
>> +   hive = amdgpu_get_xgmi_hive(adev);
>> +   if (!hive)
>> +   goto exit;
>> +
>> +   if (!(hive->number_devices--))
>> +   mutex_destroy(&hive->hive_lock);
>> +
>> +exit:
>> +   mutex_unlock(&xgmi_mutex);
>> +}
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>> index 6335bfd..6151eb9 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>> @@ -35,5 +35,6 @@ struct amdgpu_hive_info {
>>   struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev);
>>   int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, struct 
>> amdgpu_device *adev);
>>   int amdgpu_xgmi_add_device(struct amdgpu_device *adev);
>> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev);
>>
>>   #endif
>> --
>> 2.7.4
>>
>> ___
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



Re: [PATCH v2 2/3] drm/amdgpu: Handle xgmi device removal.

2018-11-30 Thread Alex Deucher
On Fri, Nov 30, 2018 at 1:17 PM Andrey Grodzovsky
 wrote:
>
> XGMI hive has some resources allocated on device init which
> need to be deallocated when the device is unregistered.
>
> v2: Remove creation of dedicated wq for XGMI hive reset.
>
> Signed-off-by: Andrey Grodzovsky 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  3 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   | 20 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  1 +
>  3 files changed, 24 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index c75badf..bfd286c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -1864,6 +1864,9 @@ static int amdgpu_device_ip_fini(struct amdgpu_device 
> *adev)
>  {
> int i, r;
>
> +   if (adev->gmc.xgmi.num_physical_nodes > 1)
> +   amdgpu_xgmi_remove_device(adev);
> +
> amdgpu_amdkfd_device_fini(adev);
>
> amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> index fb37e69..38e1599 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> @@ -135,3 +135,23 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
> mutex_unlock(&xgmi_mutex);
> return ret;
>  }
> +
> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev)
> +{
> +   struct amdgpu_hive_info *hive;
> +
> +   if ((adev->asic_type < CHIP_VEGA20) || (adev->flags & AMD_IS_APU))
> +   return;

It would be nice to have something better here to check against.  This
seems kind of fragile.  Can we check based on some xgmi related
structure?

Alex

> +
> +   mutex_lock(&xgmi_mutex);
> +
> +   hive = amdgpu_get_xgmi_hive(adev);
> +   if (!hive)
> +   goto exit;
> +
> +   if (!(hive->number_devices--))
> +   mutex_destroy(&hive->hive_lock);
> +
> +exit:
> +   mutex_unlock(&xgmi_mutex);
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
> index 6335bfd..6151eb9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
> @@ -35,5 +35,6 @@ struct amdgpu_hive_info {
>  struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev);
>  int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, struct 
> amdgpu_device *adev);
>  int amdgpu_xgmi_add_device(struct amdgpu_device *adev);
> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev);
>
>  #endif
> --
> 2.7.4
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
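The teardown being reviewed here follows a last-one-out pattern: each departing device decrements the hive's device count under the global xgmi_mutex, and only the final one destroys the hive lock. A minimal userspace model (hypothetical types; the xgmi_mutex locking is elided to comments) makes the one subtlety visible: the patch tests `!(hive->number_devices--)`, i.e. the value *before* the decrement, whereas the sketch below destroys when the count *reaches* zero — exactly the kind of off-by-one worth double-checking in review.

```c
#include <assert.h>

/* Simplified stand-in for amdgpu_hive_info; lock_destroyed models
 * mutex_destroy(&hive->hive_lock). */
struct hive {
	int number_devices;
	int lock_destroyed;
};

static void hive_remove_device(struct hive *hive)
{
	/* xgmi_mutex would be held across this section in the driver. */
	if (--hive->number_devices == 0)	/* pre-decrement: tear down when the count hits 0 */
		hive->lock_destroyed = 1;
}
```

With a post-decrement test, the destroy would instead fire on the call where the count was already zero; which behavior is intended depends on how `number_devices` is initialized in `amdgpu_xgmi_add_device`.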


[pull] amdgpu, amdkfd, ttm, scheduler, radeon drm-next-4.21

2018-11-30 Thread Alex Deucher
Hi Dave,

More new features for 4.21:

amdgpu and amdkfd:
- Freesync support
- ABM support in DC
- KFD support for vega12 and polaris12
- Add sdma paging queue support for vega
- Use ACPI to query backlight range on supported platforms
- Clean up doorbell handling
- KFD fix for pasid handling under non-HWS
- Misc cleanups and fixes

scheduler:
- Revert "fix timeout handling v2"

radeon:
- Fix possible overflow on 32 bit

ttm:
- Fix for LRU handling for ghost objects

The following changes since commit 9235dd441af43599b9cdcce599a3da4083fcad3c:

  Merge branch 'drm-next-4.21' of git://people.freedesktop.org/~agd5f/linux 
into drm-next (2018-11-19 11:07:52 +1000)

are available in the git repository at:

  git://people.freedesktop.org/~agd5f/linux drm-next-4.21

for you to fetch changes up to 2c486cc4c2774df684d8a43ca7a20670c67ccd76:

  drm/amdgpu: wait for IB test on first device open (2018-11-30 12:01:35 -0500)


Alex Deucher (4):
  drm/amdgpu/gfx: use proper offset define for MEC doorbells
  drm/amdgpu/psp: use define rather than magic number for mode1 reset
  drm/amdgpu: don't expose fan attributes on APUs
  drm/amdgpu: add VCN JPEG support amdgpu_ctx_num_entities

Andrey Grodzovsky (3):
  drm/amdgpu: Refactor amdgpu_xgmi_add_device
  drm/amdgpu: Expose hive adev list and xgmi_mutex
  drm/amdgpu: Refactor GPU reset for XGMI hive case

Bhawanpreet Lakha (2):
  drm/amd/display: Set RMX_ASPECT as default
  drm/amd/display: Fix Scaling (RMX_*) for DC driver

Brajeswar Ghosh (7):
  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c: Remove duplicate header
  drm/amd/amdgpu/vce_v3_0.c: Remove duplicate header
  drm/amd/amdgpu: Remove duplicate header
  drm/amd/display/amdgpu_dm/amdgpu_dm.c: Remove duplicate header
  drm/amd/amdgpu: Remove duplicate header
  drm/amd/amdkfd: Remove duplicate header
  drm/amd/display: Remove duplicate header

Charlene Liu (1):
  drm/amd/display: expose surface confirm color function

Chengming Gui (1):
  Revert "drm/amdgpu: use GMC v9 KIQ workaround only for the GFXHUB" (v2)

Chris Wilson (1):
  drm/amdgpu: Reorder uvd ring init before uvd resume

Christian König (3):
  drm/sched: revert "fix timeout handling v2" v2
  drm/ttm: fix LRU handling in ttm_buffer_object_transfer
  drm/amdgpu: wait for IB test on first device open

Colin Ian King (3):
  drm/amd/display: fix dereference of pointer fs_params before it is null 
checked
  drm/amdgpu: fix spelling mistake "Pramater" -> "Parameter"
  drm/amd/pp: fix spelling mistake "dependancy" -> "dependency"

David Francis (10):
  drm/amd/display: Remove dc_stream_state->status
  drm/amd/display: Check for dmcu initialization before calling dmcu
  drm/amd/display: Clean up dp_blank functions
  drm/amd/display: Get backlight controller id from link
  drm/amd/display: Clean up DCN1 clock requests
  drm/amd/display: Load DMCU IRAM
  drm/amd: Add abm level drm property
  drm/amd: update ATIF functions in AMD ACPI header
  drm/amd: Query and use ACPI backlight caps
  drm/amd/display: Fix compile error with ACPI disabled

Dmytro Laktyushkin (2):
  drm/amd/display: redesign scaling rotation math
  drm/amd/display: fix pipe interdependent hubp programming

Emily Deng (1):
  drm/amd/amdgpu/sriov: Aligned the definition with libgv

Eric Bernstein (1):
  drm/amd/display: get tail pipe before aquire free pipe

Eric Huang (1):
  drm/amdkfd: change system memory overcommit limit

Felix Kuehling (2):
  drm/amdkfd: Fix and simplify sync object handling for KFD
  drm/amdgpu: Fix KFD doorbell SG BO mapping

Gang Ba (1):
  drm/amdkfd: Added Vega12 and Polaris12 for KFD.

Guttula, Suresh (2):
  drm/amd/powerplay:add hwmgr callback to update nbpstate on Carrizo
  drm/amd:Enable/Disable NBPSTATE on On/OFF of UVD

Harish Kasiviswanathan (2):
  drm/amdgpu: Remove explicit wait after VM validate
  drm/amdgpu: KFD Restore process: Optimize waiting

Jerry (Fangzhi) Zuo (1):
  drm/amd/display: Fix NULL ptr when calculating refresh rate

Joerg Roedel (1):
  drm/amd/powerplay: Ratelimit all "was not implemented" messages

Joshua Aberback (1):
  drm/amd/display: Adjust stream enable sequence

Jun Lei (2):
  drm/amd/display: make underflow status clear explicit
  drm/amd/display: clear underflow on optc unblank

Murton Liu (1):
  drm/amd/display: fix gamma not being applied correctly

Nevenko Stupar (1):
  drm/amd/display: expose dentist_get_divider_from_did

Nicholas Kazlauskas (9):
  drm/amdgpu: Add amdgpu "max bpc" connector property (v2)
  drm/amd/display: Support amdgpu "max bpc" connector property (v2)
  drm/amd/display: Use private obj helpers for dm_atomic_state
  drm: Add vrr_capable property to the drm connector
  drm: Add vrr_enabled property to drm CRTC
  drm: Document varia

[PATCH v2 3/3] drm/amdgpu: Implement concurrent asic reset for XGMI.

2018-11-30 Thread Andrey Grodzovsky
Use a per-hive wq to concurrently send reset commands to all nodes
in the hive.

v2:
Switch to system_highpri_wq after dropping the dedicated queue.
Fix a KASAN error in the non-XGMI code path.
Stop the per-node hive reset loop if there is a reset
failure on any of the nodes.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 44 ++
 2 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index c8ad6bf..6fc023b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -910,7 +910,9 @@ struct amdgpu_device {
boolin_gpu_reset;
struct mutex  lock_reset;
struct amdgpu_doorbell_index doorbell_index;
+
int asic_reset_res;
+   struct work_struct  xgmi_reset_work;
 };
 
 static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index bfd286c..9fd9f63 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2356,6 +2356,19 @@ bool amdgpu_device_has_dc_support(struct amdgpu_device 
*adev)
return amdgpu_device_asic_has_dc_support(adev->asic_type);
 }
 
+
+static void amdgpu_device_xgmi_reset_func(struct work_struct *__work)
+{
+   struct amdgpu_device *adev =
+   container_of(__work, struct amdgpu_device, xgmi_reset_work);
+
+   adev->asic_reset_res =  amdgpu_asic_reset(adev);
+   if (adev->asic_reset_res)
+   DRM_WARN("ASIC reset failed with err r, %d for drm dev, %s",
+adev->asic_reset_res, adev->ddev->unique);
+}
+
+
 /**
  * amdgpu_device_init - initialize the driver
  *
@@ -2454,6 +2467,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
INIT_DELAYED_WORK(&adev->gfx.gfx_off_delay_work,
  amdgpu_device_delay_enable_gfx_off);
 
+   INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
+
adev->gfx.gfx_off_req_count = 1;
adev->pm.ac_power = power_supply_is_system_supplied() > 0 ? true : 
false;
 
@@ -3331,10 +3346,31 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info 
*hive,
 */
if (need_full_reset) {
list_for_each_entry(tmp_adev, device_list_handle, 
gmc.xgmi.head) {
-   r = amdgpu_asic_reset(tmp_adev);
-   if (r)
-   DRM_WARN("ASIC reset failed with err r, %d for 
drm dev, %s",
+   /* For XGMI run all resets in parallel to speed up the 
process */
+   if (tmp_adev->gmc.xgmi.num_physical_nodes > 1) {
+   if (!queue_work(system_highpri_wq, 
&tmp_adev->xgmi_reset_work))
+   r = -EALREADY;
+   } else
+   r = amdgpu_asic_reset(tmp_adev);
+
+   if (r) {
+   DRM_ERROR("ASIC reset failed with err r, %d for 
drm dev, %s",
 r, tmp_adev->ddev->unique);
+   break;
+   }
+   }
+
+   /* For XGMI wait for all PSP resets to complete before proceed 
*/
+   if (!r) {
+   list_for_each_entry(tmp_adev, device_list_handle,
+   gmc.xgmi.head) {
+   if (tmp_adev->gmc.xgmi.num_physical_nodes > 1) {
+   flush_work(&tmp_adev->xgmi_reset_work);
+   r = tmp_adev->asic_reset_res;
+   if (r)
+   break;
+   }
+   }
}
}
 
@@ -3521,8 +3557,6 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
if (tmp_adev == adev)
continue;
 
-   dev_info(tmp_adev->dev, "GPU reset begin for drm dev %s!\n", 
adev->ddev->unique);
-
amdgpu_device_lock_adev(tmp_adev);
r = amdgpu_device_pre_asic_reset(tmp_adev,
 NULL,
-- 
2.7.4

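The fan-out/join structure of this patch — queue one reset work item per XGMI node, then flush them all before collecting the per-node results, so the PSP mode-1 resets run in parallel rather than back to back — can be modeled in userspace with threads. All names here are illustrative: pthread_create()/pthread_join() stand in for queue_work()/flush_work() on system_highpri_wq, and the worker merely records a fake result instead of calling amdgpu_asic_reset().

```c
#include <assert.h>
#include <pthread.h>

#define NUM_NODES 4

/* reset_res mirrors adev->asic_reset_res; worker stands in for
 * the per-device xgmi_reset_work item. */
struct node {
	int reset_res;
	pthread_t worker;
};

static void *xgmi_reset_func(void *arg)
{
	struct node *n = arg;

	n->reset_res = 0;	/* pretend the asic reset succeeded */
	return NULL;
}

static int reset_hive(struct node nodes[], int count)
{
	int i, r = 0;

	/* "queue_work": start every node's reset without waiting. */
	for (i = 0; i < count; i++)
		pthread_create(&nodes[i].worker, NULL, xgmi_reset_func,
			       &nodes[i]);

	/* "flush_work": wait for all resets, then check each result. */
	for (i = 0; i < count; i++) {
		pthread_join(nodes[i].worker, NULL);
		if (nodes[i].reset_res && !r)
			r = nodes[i].reset_res;
	}
	return r;
}
```

The key property the patch relies on is the same as here: nothing inspects a node's result until every queued reset has been flushed, so a slow node cannot be misread as unfinished work.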


[PATCH v2 1/3] drm/amdgpu/psp: Update waiting in psp mode1 reset.

2018-11-30 Thread Andrey Grodzovsky
There is no point in using mdelay unless we are running from interrupt
context (which we are not); mdelay is a busy wait that blocks the CPU for
the entire wait time. Also, reduce the wait time to 500ms, as done in the
reference code, because the longer wait might cause PSP FW timeout issues
during XGMI hive reset.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/psp_v3_1.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
index 082093a..9c1569b 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
@@ -557,7 +557,7 @@ static int psp_v11_0_mode1_reset(struct psp_context *psp)
/*send the mode 1 reset command*/
WREG32(offset, GFX_CTRL_CMD_ID_MODE1_RST);
 
-   mdelay(1000);
+   msleep(500);
 
offset = SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_33);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c 
b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
index 7efb823..7357fd5 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
@@ -592,7 +592,7 @@ static int psp_v3_1_mode1_reset(struct psp_context *psp)
/*send the mode 1 reset command*/
WREG32(offset, GFX_CTRL_CMD_ID_MODE1_RST);
 
-   mdelay(1000);
+   msleep(500);
 
offset = SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_33);
 
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v2 2/3] drm/amdgpu: Handle xgmi device removal.

2018-11-30 Thread Andrey Grodzovsky
XGMI hive has some resources allocated on device init which need
to be deallocated when the device is unregistered.

v2: Remove creation of dedicated wq for XGMI hive reset.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  3 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   | 20 
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  1 +
 3 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index c75badf..bfd286c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1864,6 +1864,9 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
 {
int i, r;
 
+   if (adev->gmc.xgmi.num_physical_nodes > 1)
+   amdgpu_xgmi_remove_device(adev);
+
amdgpu_amdkfd_device_fini(adev);
 
amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index fb37e69..38e1599 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -135,3 +135,23 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
mutex_unlock(&xgmi_mutex);
return ret;
 }
+
+void amdgpu_xgmi_remove_device(struct amdgpu_device *adev)
+{
+   struct amdgpu_hive_info *hive;
+
+   if ((adev->asic_type < CHIP_VEGA20) || (adev->flags & AMD_IS_APU))
+   return;
+
+   mutex_lock(&xgmi_mutex);
+
+   hive = amdgpu_get_xgmi_hive(adev);
+   if (!hive)
+   goto exit;
+
+   if (!(hive->number_devices--))
+   mutex_destroy(&hive->hive_lock);
+
+exit:
+   mutex_unlock(&xgmi_mutex);
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
index 6335bfd..6151eb9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
@@ -35,5 +35,6 @@ struct amdgpu_hive_info {
 struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev);
int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, struct amdgpu_device *adev);
 int amdgpu_xgmi_add_device(struct amdgpu_device *adev);
+void amdgpu_xgmi_remove_device(struct amdgpu_device *adev);
 
 #endif
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 02/11] drm/amdgpu: send IVs to the KFD only after processing them v2

2018-11-30 Thread Kuehling, Felix
Won't this break VM fault handling in KFD? I don't see a way with the current
code to leave some VM faults for KFD to process. If we could consider VM
faults with VMIDs 8-15 as not handled in amdgpu and leave them for KFD to
process, then this could work.
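A hedged sketch of what that suggestion might look like in the GMC fault handler — the VMID check and function body here are illustrative assumptions, not the actual amdgpu code; the return-value convention (<0 error, 0 not handled, >0 handled) follows the dispatch patch quoted below:

```c
/* Hypothetical sketch: report VM faults on KFD-reserved VMIDs (8-15) as
 * "not handled" so the dispatcher forwards them to amdkfd silently,
 * instead of printing an error for every high-frequency KFD interrupt.
 */
static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev,
				      struct amdgpu_irq_src *source,
				      struct amdgpu_iv_entry *entry)
{
	if (entry->vmid >= 8)	/* assumption: VMIDs 8-15 belong to KFD */
		return 0;	/* not handled - leave it for the KFD path */

	/* ... regular amdgpu VM fault handling for VMIDs 0-7 ... */
	return 1;		/* handled */
}
```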

As far as I can tell, the only code path that leaves IRQs unhandled and passes
them to KFD prints an error message in the kernel log. We can't have the kernel
log flooded with error messages every time there are IRQs for KFD; we can get
extremely high-frequency interrupts for HSA signals.

Regards,
  Felix

-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Friday, November 30, 2018 10:03 AM
To: Christian König 
Cc: amd-gfx list 
Subject: Re: [PATCH 02/11] drm/amdgpu: send IVs to the KFD only after 
processing them v2

On Fri, Nov 30, 2018 at 7:36 AM Christian König 
 wrote:
>
> This allows us to filter out VM faults in the GMC code.
>
> v2: don't filter out all faults
>
> Signed-off-by: Christian König 

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 29 
> +++--
>  1 file changed, 17 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> index 6b6524f04ce0..6db4c58ddc13 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> @@ -149,9 +149,6 @@ static void amdgpu_irq_callback(struct amdgpu_device 
> *adev,
> if (!amdgpu_ih_prescreen_iv(adev))
> return;
>
> -   /* Before dispatching irq to IP blocks, send it to amdkfd */
> -   amdgpu_amdkfd_interrupt(adev, (const void *) &ih->ring[ring_index]);
> -
> entry.iv_entry = (const uint32_t *)&ih->ring[ring_index];
> amdgpu_ih_decode_iv(adev, &entry);
>
> @@ -371,29 +368,31 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev,
> unsigned client_id = entry->client_id;
> unsigned src_id = entry->src_id;
> struct amdgpu_irq_src *src;
> +   bool handled = false;
> int r;
>
> trace_amdgpu_iv(entry);
>
> if (client_id >= AMDGPU_IRQ_CLIENTID_MAX) {
> -   DRM_DEBUG("Invalid client_id in IV: %d\n", client_id);
> +   DRM_ERROR("Invalid client_id in IV: %d\n", client_id);
> return;
> }
>
> if (src_id >= AMDGPU_MAX_IRQ_SRC_ID) {
> -   DRM_DEBUG("Invalid src_id in IV: %d\n", src_id);
> +   DRM_ERROR("Invalid src_id in IV: %d\n", src_id);
> return;
> }
>
> if (adev->irq.virq[src_id]) {
> generic_handle_irq(irq_find_mapping(adev->irq.domain, 
> src_id));
> -   } else {
> -   if (!adev->irq.client[client_id].sources) {
> -   DRM_DEBUG("Unregistered interrupt client_id: %d 
> src_id: %d\n",
> - client_id, src_id);
> -   return;
> -   }
> +   return;
> +   }
>
> +   if (!adev->irq.client[client_id].sources) {
> +   DRM_DEBUG("Unregistered interrupt client_id: %d src_id: %d\n",
> + client_id, src_id);
> +   return;
> +   } else {
> src = adev->irq.client[client_id].sources[src_id];
> if (!src) {
> DRM_DEBUG("Unhandled interrupt src_id: %d\n", 
> src_id); @@ -401,9 +400,15 @@ void amdgpu_irq_dispatch(struct amdgpu_device 
> *adev,
> }
>
> r = src->funcs->process(adev, src, entry);
> -   if (r)
> +   if (r < 0)
> DRM_ERROR("error processing interrupt (%d)\n", 
> r);
> +   else if (r)
> +   handled = true;
> }
> +
> +   /* Send it to amdkfd as well if it isn't already handled */
> +   if (!handled)
> +   amdgpu_amdkfd_interrupt(adev, entry->iv_entry);
>  }
>
>  /**
> --
> 2.17.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/2] drm/amdgpu: Handle xgmi device removal and add reset wq.

2018-11-30 Thread Grodzovsky, Andrey


On 11/30/2018 10:53 AM, Koenig, Christian wrote:
> Am 30.11.18 um 16:14 schrieb Grodzovsky, Andrey:
>> On 11/30/2018 04:03 AM, Christian König wrote:
>>> Am 29.11.18 um 21:36 schrieb Andrey Grodzovsky:
 XGMI hive has some resources allocated on device init which
 need to be deallocated when the device is unregistered.

 Add per hive wq to allow all the nodes in hive to run resets
 concurrently - this should speed up the total reset time to avoid
 breaching the PSP FW timeout.
>>> Do you really want a workqueue? That sounds like the exact opposite of
>>> what you describe here.
>>>
>>> E.g. a workqueue is there to serialize all work items scheduled to it.
>>>
>>> What you most likely rather want is a work item per device which are
>>> scheduled, run in parallel and are joined back together before
>>> dropping the per hive lock.
>>>
>>> Christian.
>> As you can see in the second patch, that's exactly how I use this wq. I
>> define a per-adev work item, all of which I enqueue during the GPU reset
>> routine. I believe the wq I created will process work items in parallel,
>> since it is served by a per-CPU worker thread pool, while a serializing
>> wq is created using the 'create_singlethread_workqueue' interface, which
>> creates an ordered wq that processes at most one item at a time.
> That is correct, but you are still limited to one work item processed
> per CPU which is not necessary what you want.
I was wrong to say it's a single worker thread per CPU; it's actually a worker
thread pool per CPU, so even with a single CPU in the system you get
concurrency of execution.
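For reference, the pattern Christian describes — one work item per device, scheduled so they run in parallel and joined back together before dropping the per-hive lock — could be sketched roughly like this (a hedged sketch reusing names from the patches in this thread, not the actual code):

```c
struct amdgpu_hive_info *hive = amdgpu_get_xgmi_hive(adev);
struct amdgpu_device *tmp_adev;
int r = 0;

/* Queue one reset work item per node; an unbound wq lets them run
 * concurrently instead of being serialized on a single worker.
 */
list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head)
	queue_work(hive->reset_queue, &tmp_adev->xgmi_reset_work);

/* Join: wait for every node's reset to finish before releasing the
 * hive lock, and collect the first error.
 */
list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head) {
	flush_work(&tmp_adev->xgmi_reset_work);
	if (!r)
		r = tmp_adev->asic_reset_res;
}
```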

>
>> I did have my doubts about creating a dedicated wq instead of reusing an
>> existing system-wide wq such as system_unbound_wq, because this adds new
>> kernel threads, but eventually I opted for my own queue because there is
>> a hard requirement from the PSP FW to complete all ASIC resets within a
>> defined finite amount of time (1s, I believe), otherwise some nodes in
>> the XGMI hive might switch to single GPU mode due to timeout.
>> If I rely on a system-wide wq, then during stress times this queue might
>> fail to process my work items within that time limit because it's
>> overloaded by work items from other queue clients.
>> Maybe I should squash the 2 changes into one to make all of this more
>> clear?
> No, that won't help. You would rather need to document that properly.
>
> But creating a wq just for a rarely used device reset is completely
> overkill.
>
> Stuff like that should go to the system_highpri_wq work queue which
> should make already sure that it never hits the PSP timeout.
>
> Christian.

I agree in general with that, but it looks like system_highpri_wq is a
'BOUND' wq, meaning its items are limited to the thread pool of the CPU
from which they were queued. Anyway, since I mostly just sleep inside the
PSP reset function, that should still be OK.
Will try that.

Andrey

>
>> Andrey
 Signed-off-by: Andrey Grodzovsky 
 ---
     drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 24 
     drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h |  3 +++
     2 files changed, 27 insertions(+)

 diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
 index fb37e69..9ac2dc5 100644
 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
 +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
 @@ -61,6 +61,8 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev)
     INIT_LIST_HEAD(&tmp->device_list);
     mutex_init(&tmp->hive_lock);
 +    tmp->reset_queue = alloc_workqueue("xgmi-hive", WQ_UNBOUND | WQ_HIGHPRI, 0);
 +
     return tmp;
     }
 @@ -135,3 +137,25 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
     mutex_unlock(&xgmi_mutex);
     return ret;
     }
 +
 +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev)
 +{
 +    struct amdgpu_hive_info *hive;
 +
 +    if ((adev->asic_type < CHIP_VEGA20) || (adev->flags & AMD_IS_APU))
 +    return;
 +
 +    mutex_lock(&xgmi_mutex);
 +
 +    hive = amdgpu_get_xgmi_hive(adev);
 +    if (!hive)
 +    goto exit;
 +
 +    if (!(hive->number_devices--)) {
 +    mutex_destroy(&hive->hive_lock);
 +    destroy_workqueue(hive->reset_queue);
 +    }
 +
 +exit:
 +    mutex_unlock(&xgmi_mutex);
 +}
 diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
 index 6335bfd..285ab93 100644
 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
 +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
 @@ -30,10 +30,13 @@ struct amdgpu_hive_info {
     struct psp_xgmi_topology_info    topology_info;
     int number_devices;
     struct mutex hive_lock;
 +    /* hive members reset wq */
 +    str

Re: [PATCH 11/11] drm/amdgpu: disable IH ring 1 & 2 WPTR overflow on Vega10

2018-11-30 Thread Alex Deucher
On Fri, Nov 30, 2018 at 7:36 AM Christian König
 wrote:
>
> That should add back pressure on the client.
>
> Signed-off-by: Christian König 

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c 
> b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
> index f5c5ea628fdf..dd7f52f08fd7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
> @@ -193,6 +193,8 @@ static int vega10_ih_irq_init(struct amdgpu_device *adev)
>
> ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1);
> ih_rb_cntl = vega10_ih_rb_cntl(ih, ih_rb_cntl);
> +   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
> +  WPTR_OVERFLOW_ENABLE, 0);
> WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1, ih_rb_cntl);
>
> /* set rptr, wptr to 0 */
> @@ -207,6 +209,8 @@ static int vega10_ih_irq_init(struct amdgpu_device *adev)
>
> ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1);
> ih_rb_cntl = vega10_ih_rb_cntl(ih, ih_rb_cntl);
> +   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2,
> +  WPTR_OVERFLOW_ENABLE, 0);
> WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2, ih_rb_cntl);
>
> /* set rptr, wptr to 0 */
> --
> 2.17.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 10/11] drm/amdgpu: add support for self irq on Vega10

2018-11-30 Thread Alex Deucher
On Fri, Nov 30, 2018 at 7:36 AM Christian König
 wrote:
>
> This finally enables processing of ring 1 & 2.
>
> Signed-off-by: Christian König 

Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 68 --
>  1 file changed, 63 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c 
> b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
> index 562fd036bb9e..f5c5ea628fdf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
> @@ -258,7 +258,7 @@ static void vega10_ih_irq_disable(struct amdgpu_device 
> *adev)
>  static u32 vega10_ih_get_wptr(struct amdgpu_device *adev,
>   struct amdgpu_ih_ring *ih)
>  {
> -   u32 wptr, tmp;
> +   u32 wptr, reg, tmp;
>
> wptr = le32_to_cpu(*ih->wptr_cpu);
>
> @@ -274,9 +274,18 @@ static u32 vega10_ih_get_wptr(struct amdgpu_device *adev,
>  wptr, ih->rptr, tmp);
> ih->rptr = tmp;
>
> -   tmp = RREG32_NO_KIQ(SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL));
> +   if (ih == &adev->irq.ih)
> +   reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL);
> +   else if (ih == &adev->irq.ih1)
> +   reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL_RING1);
> +   else if (ih == &adev->irq.ih2)
> +   reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL_RING2);
> +   else
> +   BUG();
> +
> +   tmp = RREG32_NO_KIQ(reg);
> tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
> -   WREG32_NO_KIQ(SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL), tmp);
> +   WREG32_NO_KIQ(reg, tmp);
> }
> return (wptr & ih->ptr_mask);
>  }
> @@ -338,9 +347,52 @@ static void vega10_ih_set_rptr(struct amdgpu_device 
> *adev,
> /* XXX check if swapping is necessary on BE */
> *ih->rptr_cpu = ih->rptr;
> WDOORBELL32(ih->doorbell_index, ih->rptr);
> -   } else {
> +   } else if (ih == &adev->irq.ih) {
> WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR, ih->rptr);
> +   } else if (ih == &adev->irq.ih1) {
> +   WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR_RING1, ih->rptr);
> +   } else if (ih == &adev->irq.ih2) {
> +   WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR_RING2, ih->rptr);
> +   }
> +}
> +
> +/**
> + * vega10_ih_self_irq - dispatch work for ring 1 and 2
> + *
> + * @adev: amdgpu_device pointer
> + * @source: irq source
> + * @entry: IV with WPTR update
> + *
> + * Update the WPTR from the IV and schedule work to handle the entries.
> + */
> +static int vega10_ih_self_irq(struct amdgpu_device *adev,
> + struct amdgpu_irq_src *source,
> + struct amdgpu_iv_entry *entry)
> +{
> +   uint32_t wptr = cpu_to_le32(entry->src_data[0]);
> +
> +   switch (entry->ring_id) {
> +   case 1:
> +   *adev->irq.ih1.wptr_cpu = wptr;
> +   schedule_work(&adev->irq.ih1_work);
> +   break;
> +   case 2:
> +   *adev->irq.ih2.wptr_cpu = wptr;
> +   schedule_work(&adev->irq.ih2_work);
> +   break;
> +   default: break;
> }
> +   return 0;
> +}
> +
> +static const struct amdgpu_irq_src_funcs vega10_ih_self_irq_funcs = {
> +   .process = vega10_ih_self_irq,
> +};
> +
> +static void vega10_ih_set_self_irq_funcs(struct amdgpu_device *adev)
> +{
> +   adev->irq.self_irq.num_types = 0;
> +   adev->irq.self_irq.funcs = &vega10_ih_self_irq_funcs;
>  }
>
>  static int vega10_ih_early_init(void *handle)
> @@ -348,13 +400,19 @@ static int vega10_ih_early_init(void *handle)
> struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>
> vega10_ih_set_interrupt_funcs(adev);
> +   vega10_ih_set_self_irq_funcs(adev);
> return 0;
>  }
>
>  static int vega10_ih_sw_init(void *handle)
>  {
> -   int r;
> struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> +   int r;
> +
> +   r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_IH, 0,
> + &adev->irq.self_irq);
> +   if (r)
> +   return r;
>
> r = amdgpu_ih_ring_init(adev, &adev->irq.ih, 256 * 1024, true);
> if (r)
> --
> 2.17.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 09/11] drm/amdgpu: add support for processing IH ring 1 & 2

2018-11-30 Thread Alex Deucher
On Fri, Nov 30, 2018 at 7:36 AM Christian König
 wrote:
>
> Previously we only added the ring buffer memory, now add the handling as
> well.
>
> Signed-off-by: Christian König 

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 33 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h |  4 ++-
>  2 files changed, 36 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> index 127ed7de841d..64ec92bd74fb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> @@ -176,6 +176,36 @@ irqreturn_t amdgpu_irq_handler(int irq, void *arg)
> return ret;
>  }
>
> +/**
> + * amdgpu_irq_handle_ih1 - kick off processing for IH1
> + *
> + * @work: work structure in struct amdgpu_irq
> + *
> + * Kick off processing IH ring 1.
> + */
> +static void amdgpu_irq_handle_ih1(struct work_struct *work)
> +{
> +   struct amdgpu_device *adev = container_of(work, struct amdgpu_device,
> + irq.ih1_work);
> +
> +   amdgpu_ih_process(adev, &adev->irq.ih1, amdgpu_irq_callback);
> +}
> +
> +/**
> + * amdgpu_irq_handle_ih2 - kick off processing for IH2
> + *
> + * @work: work structure in struct amdgpu_irq
> + *
> + * Kick off processing IH ring 2.
> + */
> +static void amdgpu_irq_handle_ih2(struct work_struct *work)
> +{
> +   struct amdgpu_device *adev = container_of(work, struct amdgpu_device,
> + irq.ih2_work);
> +
> +   amdgpu_ih_process(adev, &adev->irq.ih2, amdgpu_irq_callback);
> +}
> +
>  /**
>   * amdgpu_msi_ok - check whether MSI functionality is enabled
>   *
> @@ -240,6 +270,9 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
> amdgpu_hotplug_work_func);
> }
>
> +   INIT_WORK(&adev->irq.ih1_work, amdgpu_irq_handle_ih1);
> +   INIT_WORK(&adev->irq.ih2_work, amdgpu_irq_handle_ih2);
> +
> adev->irq.installed = true;
> r = drm_irq_install(adev->ddev, adev->ddev->pdev->irq);
> if (r) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> index 7e06fa64321a..c27decfda494 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> @@ -89,7 +89,9 @@ struct amdgpu_irq {
>
> /* interrupt rings */
> struct amdgpu_ih_ring   ih, ih1, ih2;
> -   const struct amdgpu_ih_funcs*ih_funcs;
> +   const struct amdgpu_ih_funcs*ih_funcs;
> +   struct work_struct  ih1_work, ih2_work;
> +   struct amdgpu_irq_src   self_irq;
>
> /* gen irq stuff */
> struct irq_domain   *domain; /* GPU irq controller domain 
> */
> --
> 2.17.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 05/11] drm/amdgpu: add IH ring to ih_get_wptr/ih_set_rptr v2

2018-11-30 Thread Alex Deucher
On Fri, Nov 30, 2018 at 7:36 AM Christian König
 wrote:
>
> Let's start to support multiple rings.
>
> v2: decode IV is needed as well
>
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c  |  6 +--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h  | 13 +++---
>  drivers/gpu/drm/amd/amdgpu/cik_ih.c | 29 +++--
>  drivers/gpu/drm/amd/amdgpu/cz_ih.c  | 31 +++---
>  drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 29 +++--
>  drivers/gpu/drm/amd/amdgpu/si_ih.c  | 31 +++---
>  drivers/gpu/drm/amd/amdgpu/tonga_ih.c   | 43 ++-
>  drivers/gpu/drm/amd/amdgpu/vega10_ih.c  | 56 +
>  8 files changed, 128 insertions(+), 110 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
> index 8af67f649660..fb8dd6179926 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
> @@ -137,7 +137,7 @@ int amdgpu_ih_process(struct amdgpu_device *adev, struct 
> amdgpu_ih_ring *ih,
> if (!ih->enabled || adev->shutdown)
> return IRQ_NONE;
>
> -   wptr = amdgpu_ih_get_wptr(adev);
> +   wptr = amdgpu_ih_get_wptr(adev, ih);
>
>  restart_ih:
> /* is somebody else already processing irqs? */
> @@ -154,11 +154,11 @@ int amdgpu_ih_process(struct amdgpu_device *adev, 
> struct amdgpu_ih_ring *ih,
> ih->rptr &= ih->ptr_mask;
> }
>
> -   amdgpu_ih_set_rptr(adev);
> +   amdgpu_ih_set_rptr(adev, ih);
> atomic_set(&ih->lock, 0);
>
> /* make sure wptr hasn't changed while processing */
> -   wptr = amdgpu_ih_get_wptr(adev);
> +   wptr = amdgpu_ih_get_wptr(adev, ih);
> if (wptr != ih->rptr)
> goto restart_ih;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
> index f877bb78d10a..d810fd73d574 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
> @@ -50,15 +50,16 @@ struct amdgpu_ih_ring {
>  /* provided by the ih block */
>  struct amdgpu_ih_funcs {
> /* ring read/write ptr handling, called from interrupt context */
> -   u32 (*get_wptr)(struct amdgpu_device *adev);
> -   void (*decode_iv)(struct amdgpu_device *adev,
> +   u32 (*get_wptr)(struct amdgpu_device *adev, struct amdgpu_ih_ring 
> *ih);
> +   void (*decode_iv)(struct amdgpu_device *adev, struct amdgpu_ih_ring 
> *ih,
>   struct amdgpu_iv_entry *entry);
> -   void (*set_rptr)(struct amdgpu_device *adev);
> +   void (*set_rptr)(struct amdgpu_device *adev, struct amdgpu_ih_ring 
> *ih);
>  };
>
> -#define amdgpu_ih_get_wptr(adev) (adev)->irq.ih_funcs->get_wptr((adev))
> -#define amdgpu_ih_decode_iv(adev, iv) 
> (adev)->irq.ih_funcs->decode_iv((adev), (iv))
> -#define amdgpu_ih_set_rptr(adev) (adev)->irq.ih_funcs->set_rptr((adev))
> +#define amdgpu_ih_get_wptr(adev, ih) (adev)->irq.ih_funcs->get_wptr((adev), 
> (ih))
> +#define amdgpu_ih_decode_iv(adev, iv) \
> +   (adev)->irq.ih_funcs->decode_iv((adev), (ih), (iv))
> +#define amdgpu_ih_set_rptr(adev, ih) (adev)->irq.ih_funcs->set_rptr((adev), 
> (ih))
>

Could simplify the interface and skip the adev pointer.  Just hang the
adev off the ih if we need it.  Either way:
Acked-by: Alex Deucher 


>  int amdgpu_ih_ring_init(struct amdgpu_device *adev, struct amdgpu_ih_ring 
> *ih,
> unsigned ring_size, bool use_bus_addr);
> diff --git a/drivers/gpu/drm/amd/amdgpu/cik_ih.c 
> b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
> index 8a8b4967a101..884aa9b81e86 100644
> --- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c
> +++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
> @@ -183,11 +183,12 @@ static void cik_ih_irq_disable(struct amdgpu_device 
> *adev)
>   * Used by cik_irq_process().
>   * Returns the value of the wptr.
>   */
> -static u32 cik_ih_get_wptr(struct amdgpu_device *adev)
> +static u32 cik_ih_get_wptr(struct amdgpu_device *adev,
> +  struct amdgpu_ih_ring *ih)
>  {
> u32 wptr, tmp;
>
> -   wptr = le32_to_cpu(adev->wb.wb[adev->irq.ih.wptr_offs]);
> +   wptr = le32_to_cpu(adev->wb.wb[ih->wptr_offs]);
>
> if (wptr & IH_RB_WPTR__RB_OVERFLOW_MASK) {
> wptr &= ~IH_RB_WPTR__RB_OVERFLOW_MASK;
> @@ -196,13 +197,13 @@ static u32 cik_ih_get_wptr(struct amdgpu_device *adev)
>  * this should allow us to catchup.
>  */
> dev_warn(adev->dev, "IH ring buffer overflow (0x%08X, 0x%08X, 
> 0x%08X)\n",
> -   wptr, adev->irq.ih.rptr, (wptr + 16) & 
> adev->irq.ih.ptr_mask);
> -   adev->irq.ih.rptr = (wptr + 16) & adev->irq.ih.ptr_mask;
> +wptr, ih->rptr, (wptr + 16) & ih->ptr_mask);
> +   ih->rptr = (wptr + 16) & ih->ptr_mask;
> tmp = RREG32(mmIH_RB_CNTL);
> tmp |= IH_RB_CNTL__WP

Re: [PATCH 04/11] drm/amdgpu: move IV prescreening into the GMC code

2018-11-30 Thread Alex Deucher
On Fri, Nov 30, 2018 at 7:36 AM Christian König
 wrote:
>
> The GMC/VM subsystem is causing the faults, so move the handling here as
> well.
>
> Signed-off-by: Christian König 

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h  |  2 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c |  4 --
>  drivers/gpu/drm/amd/amdgpu/cik_ih.c | 13 
>  drivers/gpu/drm/amd/amdgpu/cz_ih.c  | 13 
>  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   | 59 ++
>  drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 13 
>  drivers/gpu/drm/amd/amdgpu/si_ih.c  | 14 -
>  drivers/gpu/drm/amd/amdgpu/tonga_ih.c   | 13 
>  drivers/gpu/drm/amd/amdgpu/vega10_ih.c  | 82 -
>  9 files changed, 59 insertions(+), 154 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
> index 9ce8c93ec19b..f877bb78d10a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
> @@ -51,14 +51,12 @@ struct amdgpu_ih_ring {
>  struct amdgpu_ih_funcs {
> /* ring read/write ptr handling, called from interrupt context */
> u32 (*get_wptr)(struct amdgpu_device *adev);
> -   bool (*prescreen_iv)(struct amdgpu_device *adev);
> void (*decode_iv)(struct amdgpu_device *adev,
>   struct amdgpu_iv_entry *entry);
> void (*set_rptr)(struct amdgpu_device *adev);
>  };
>
>  #define amdgpu_ih_get_wptr(adev) (adev)->irq.ih_funcs->get_wptr((adev))
> -#define amdgpu_ih_prescreen_iv(adev) 
> (adev)->irq.ih_funcs->prescreen_iv((adev))
>  #define amdgpu_ih_decode_iv(adev, iv) 
> (adev)->irq.ih_funcs->decode_iv((adev), (iv))
>  #define amdgpu_ih_set_rptr(adev) (adev)->irq.ih_funcs->set_rptr((adev))
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> index 6db4c58ddc13..318874952bc2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> @@ -145,10 +145,6 @@ static void amdgpu_irq_callback(struct amdgpu_device 
> *adev,
> u32 ring_index = ih->rptr >> 2;
> struct amdgpu_iv_entry entry;
>
> -   /* Prescreening of high-frequency interrupts */
> -   if (!amdgpu_ih_prescreen_iv(adev))
> -   return;
> -
> entry.iv_entry = (const uint32_t *)&ih->ring[ring_index];
> amdgpu_ih_decode_iv(adev, &entry);
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/cik_ih.c 
> b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
> index 3e6c8c4067cb..8a8b4967a101 100644
> --- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c
> +++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
> @@ -228,18 +228,6 @@ static u32 cik_ih_get_wptr(struct amdgpu_device *adev)
>   * [127:96] - reserved
>   */
>
> -/**
> - * cik_ih_prescreen_iv - prescreen an interrupt vector
> - *
> - * @adev: amdgpu_device pointer
> - *
> - * Returns true if the interrupt vector should be further processed.
> - */
> -static bool cik_ih_prescreen_iv(struct amdgpu_device *adev)
> -{
> -   return true;
> -}
> -
>   /**
>   * cik_ih_decode_iv - decode an interrupt vector
>   *
> @@ -445,7 +433,6 @@ static const struct amd_ip_funcs cik_ih_ip_funcs = {
>
>  static const struct amdgpu_ih_funcs cik_ih_funcs = {
> .get_wptr = cik_ih_get_wptr,
> -   .prescreen_iv = cik_ih_prescreen_iv,
> .decode_iv = cik_ih_decode_iv,
> .set_rptr = cik_ih_set_rptr
>  };
> diff --git a/drivers/gpu/drm/amd/amdgpu/cz_ih.c 
> b/drivers/gpu/drm/amd/amdgpu/cz_ih.c
> index 447b3cbc47e5..9d3ea298e116 100644
> --- a/drivers/gpu/drm/amd/amdgpu/cz_ih.c
> +++ b/drivers/gpu/drm/amd/amdgpu/cz_ih.c
> @@ -207,18 +207,6 @@ static u32 cz_ih_get_wptr(struct amdgpu_device *adev)
> return (wptr & adev->irq.ih.ptr_mask);
>  }
>
> -/**
> - * cz_ih_prescreen_iv - prescreen an interrupt vector
> - *
> - * @adev: amdgpu_device pointer
> - *
> - * Returns true if the interrupt vector should be further processed.
> - */
> -static bool cz_ih_prescreen_iv(struct amdgpu_device *adev)
> -{
> -   return true;
> -}
> -
>  /**
>   * cz_ih_decode_iv - decode an interrupt vector
>   *
> @@ -426,7 +414,6 @@ static const struct amd_ip_funcs cz_ih_ip_funcs = {
>
>  static const struct amdgpu_ih_funcs cz_ih_funcs = {
> .get_wptr = cz_ih_get_wptr,
> -   .prescreen_iv = cz_ih_prescreen_iv,
> .decode_iv = cz_ih_decode_iv,
> .set_rptr = cz_ih_set_rptr
>  };
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index e329a23e1f99..8995fca47efb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -244,6 +244,62 @@ static int gmc_v9_0_vm_fault_interrupt_state(struct 
> amdgpu_device *adev,
> return 0;
>  }
>
> +/**
> + * vega10_ih_prescreen_iv - prescreen an interrupt vector
> + *
> + * @adev: amdgpu_device pointer
> + *
> + * Returns true if the interrupt vector should be further processed.
> + */
> +static bool

Re: [PATCH 07/11] drm/amdgpu: enable IH ring 1 and ring 2 v2

2018-11-30 Thread Christian König

Am 30.11.18 um 17:01 schrieb Alex Deucher:

On Fri, Nov 30, 2018 at 7:36 AM Christian König
 wrote:

The entries are ignored for now, but it at least stops crashing the
hardware when somebody tries to push something to the other IH rings.

v2: limit ring size, add TODO comment

Signed-off-by: Christian König 

We may want to guard this with some flag so we can easily disable it
(like we did for the paging queue) in case we run into issues on any
asics.


This should actually fix some issues, since currently when you enable
sources which go to ring 1 & 2 the hardware just crashes.


But I can add a check for whether the rings are actually allocated. That
way we can disable them by just commenting out two lines.


Christian.



Alex


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h |   4 +-
  drivers/gpu/drm/amd/amdgpu/vega10_ih.c  | 129 +++-
  2 files changed, 107 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
index f6ce171cb8aa..7e06fa64321a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
@@ -87,8 +87,8 @@ struct amdgpu_irq {
 /* status, etc. */
 boolmsi_enabled; /* msi enabled */

-   /* interrupt ring */
-   struct amdgpu_ih_ring   ih;
+   /* interrupt rings */
+   struct amdgpu_ih_ring   ih, ih1, ih2;
 const struct amdgpu_ih_funcs*ih_funcs;

 /* gen irq stuff */
diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c 
b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
index 3e9ebb0de94d..562fd036bb9e 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
@@ -50,6 +50,16 @@ static void vega10_ih_enable_interrupts(struct amdgpu_device 
*adev)
 ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, ENABLE_INTR, 1);
 WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL, ih_rb_cntl);
 adev->irq.ih.enabled = true;
+
+   ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1, RB_ENABLE, 1);
+   WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1, ih_rb_cntl);
+   adev->irq.ih1.enabled = true;
+
+   ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2, RB_ENABLE, 1);
+   WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2, ih_rb_cntl);
+   adev->irq.ih2.enabled = true;
  }

  /**
@@ -71,6 +81,47 @@ static void vega10_ih_disable_interrupts(struct 
amdgpu_device *adev)
 WREG32_SOC15(OSSSYS, 0, mmIH_RB_WPTR, 0);
 adev->irq.ih.enabled = false;
 adev->irq.ih.rptr = 0;
+
+   ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1, RB_ENABLE, 0);
+   WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1, ih_rb_cntl);
+   /* set rptr, wptr to 0 */
+   WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR_RING1, 0);
+   WREG32_SOC15(OSSSYS, 0, mmIH_RB_WPTR_RING1, 0);
+   adev->irq.ih1.enabled = false;
+   adev->irq.ih1.rptr = 0;
+
+   ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2, RB_ENABLE, 0);
+   WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2, ih_rb_cntl);
+   /* set rptr, wptr to 0 */
+   WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR_RING2, 0);
+   WREG32_SOC15(OSSSYS, 0, mmIH_RB_WPTR_RING2, 0);
+   adev->irq.ih2.enabled = false;
+   adev->irq.ih2.rptr = 0;
+}
+
+static uint32_t vega10_ih_rb_cntl(struct amdgpu_ih_ring *ih, uint32_t 
ih_rb_cntl)
+{
+   int rb_bufsz = order_base_2(ih->ring_size / 4);
+
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
+  MC_SPACE, ih->use_bus_addr ? 1 : 4);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
+  WPTR_OVERFLOW_CLEAR, 1);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
+  WPTR_OVERFLOW_ENABLE, 1);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, RB_SIZE, rb_bufsz);
+   /* Ring Buffer write pointer writeback. If enabled, IH_RB_WPTR register
+* value is written to memory
+*/
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
+  WPTR_WRITEBACK_ENABLE, 1);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, MC_SNOOP, 1);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, MC_RO, 0);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, MC_VMID, 0);
+
+   return ih_rb_cntl;
  }

  /**
@@ -86,9 +137,8 @@ static void vega10_ih_disable_interrupts(struct 
amdgpu_device *adev)
   */
  static int vega10_ih_irq_init(struct amdgpu_device *adev)
  {
-   struct amdgpu_ih_ring *ih = &adev->irq.ih;
+   struct amdgpu_ih_ring *ih;
 int ret = 0;
-   int rb_bufsz;
   

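The new vega10_ih_rb_cntl() helper above is built entirely from REG_SET_FIELD read-modify-write updates plus an order_base_2() ring-size computation. A rough Python model of that pattern, purely for illustration — the field shifts/widths below are made up and do not reflect the real IH_RB_CNTL register layout:

```python
# Hypothetical model of the REG_SET_FIELD read-modify-write pattern used in
# vega10_ih_rb_cntl(). Field shifts/widths here are illustrative only; the
# real IH_RB_CNTL layout comes from the generated register headers.

def set_field(reg, shift, width, value):
    """Clear a bitfield in reg and insert value, like REG_SET_FIELD."""
    mask = ((1 << width) - 1) << shift
    return (reg & ~mask) | ((value << shift) & mask)

def order_base_2(n):
    """Smallest x such that 2**x >= n (the kernel's order_base_2)."""
    return max(0, (n - 1).bit_length())

def ih_rb_cntl_model(ring_size_bytes, use_bus_addr, reg=0):
    rb_bufsz = order_base_2(ring_size_bytes // 4)         # ring entries are dwords
    reg = set_field(reg, 0, 6, rb_bufsz)                  # RB_SIZE (made-up shift)
    reg = set_field(reg, 8, 4, 1 if use_bus_addr else 4)  # MC_SPACE (made-up shift)
    reg = set_field(reg, 16, 1, 1)                        # WPTR_WRITEBACK_ENABLE
    return reg
```

Each REG_SET_FIELD call only touches its own field, so the helper can take a register value read back from hardware and rewrite just the bits it owns.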
Re: [PATCH 03/11] drm/amdgpu: remove VM fault_credit handling

2018-11-30 Thread Alex Deucher
On Fri, Nov 30, 2018 at 7:36 AM Christian König
 wrote:
>
> printk_ratelimit() is much better suited to limit the number of reported
> VM faults.
>
> Signed-off-by: Christian König 

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 37 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  5 
>  drivers/gpu/drm/amd/amdgpu/cik_ih.c | 18 +---
>  drivers/gpu/drm/amd/amdgpu/cz_ih.c  | 18 +---
>  drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 18 +---
>  drivers/gpu/drm/amd/amdgpu/tonga_ih.c   | 18 +---
>  drivers/gpu/drm/amd/amdgpu/vega10_ih.c  |  7 ++---
>  7 files changed, 6 insertions(+), 115 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 2acb9838913e..a2f149da83f8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -3057,7 +3057,6 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
> amdgpu_vm *vm,
> }
>
> INIT_KFIFO(vm->faults);
> -   vm->fault_credit = 16;
>
> return 0;
>
> @@ -3269,42 +3268,6 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct 
> amdgpu_vm *vm)
> amdgpu_vmid_free_reserved(adev, vm, i);
>  }
>
> -/**
> - * amdgpu_vm_pasid_fault_credit - Check fault credit for given PASID
> - *
> - * @adev: amdgpu_device pointer
> - * @pasid: PASID do identify the VM
> - *
> - * This function is expected to be called in interrupt context.
> - *
> - * Returns:
> - * True if there was fault credit, false otherwise
> - */
> -bool amdgpu_vm_pasid_fault_credit(struct amdgpu_device *adev,
> - unsigned int pasid)
> -{
> -   struct amdgpu_vm *vm;
> -
> -   spin_lock(&adev->vm_manager.pasid_lock);
> -   vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
> -   if (!vm) {
> -   /* VM not found, can't track fault credit */
> -   spin_unlock(&adev->vm_manager.pasid_lock);
> -   return true;
> -   }
> -
> -   /* No lock needed. only accessed by IRQ handler */
> -   if (!vm->fault_credit) {
> -   /* Too many faults in this VM */
> -   spin_unlock(&adev->vm_manager.pasid_lock);
> -   return false;
> -   }
> -
> -   vm->fault_credit--;
> -   spin_unlock(&adev->vm_manager.pasid_lock);
> -   return true;
> -}
> -
>  /**
>   * amdgpu_vm_manager_init - init the VM manager
>   *
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index 2a8898d19c8b..e8dcfd59fc93 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -229,9 +229,6 @@ struct amdgpu_vm {
> /* Up to 128 pending retry page faults */
> DECLARE_KFIFO(faults, u64, 128);
>
> -   /* Limit non-retry fault storms */
> -   unsigned intfault_credit;
> -
> /* Points to the KFD process VM info */
> struct amdkfd_process_info *process_info;
>
> @@ -299,8 +296,6 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
> amdgpu_vm *vm,
>  int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm, 
> unsigned int pasid);
>  void amdgpu_vm_release_compute(struct amdgpu_device *adev, struct amdgpu_vm 
> *vm);
>  void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm);
> -bool amdgpu_vm_pasid_fault_credit(struct amdgpu_device *adev,
> - unsigned int pasid);
>  void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
>  struct list_head *validated,
>  struct amdgpu_bo_list_entry *entry);
> diff --git a/drivers/gpu/drm/amd/amdgpu/cik_ih.c 
> b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
> index b5775c6a857b..3e6c8c4067cb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c
> +++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
> @@ -237,23 +237,7 @@ static u32 cik_ih_get_wptr(struct amdgpu_device *adev)
>   */
>  static bool cik_ih_prescreen_iv(struct amdgpu_device *adev)
>  {
> -   u32 ring_index = adev->irq.ih.rptr >> 2;
> -   u16 pasid;
> -
> -   switch (le32_to_cpu(adev->irq.ih.ring[ring_index]) & 0xff) {
> -   case 146:
> -   case 147:
> -   pasid = le32_to_cpu(adev->irq.ih.ring[ring_index + 2]) >> 16;
> -   if (!pasid || amdgpu_vm_pasid_fault_credit(adev, pasid))
> -   return true;
> -   break;
> -   default:
> -   /* Not a VM fault */
> -   return true;
> -   }
> -
> -   adev->irq.ih.rptr += 16;
> -   return false;
> +   return true;
>  }
>
>   /**
> diff --git a/drivers/gpu/drm/amd/amdgpu/cz_ih.c 
> b/drivers/gpu/drm/amd/amdgpu/cz_ih.c
> index df5ac4d85a00..447b3cbc47e5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/cz_ih.c
> +++ b/drivers/gpu/drm/amd/amdgpu/cz_ih.c
> @@ -216,23 +216,7 @@ static u32 cz_ih_get_wptr(struct amdgpu_de

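To make clear what printk_ratelimit() replaces, here is a sketch of the per-VM fault_credit scheme this patch removes: each VM starts with 16 credits and a fault is dropped once they run out. Names mirror the driver; the dict stands in for the pasid_idr lookup, and locking is omitted.

```python
# Sketch of the fault_credit mechanism being removed: a per-VM budget of 16
# reported faults, decremented from IRQ context. The dict stands in for the
# driver's pasid_idr; spinlocks are omitted for brevity.

class VM:
    def __init__(self):
        self.fault_credit = 16  # limit non-retry fault storms

def pasid_fault_credit(pasid_idr, pasid):
    """Return True if the fault should be reported, False to drop it."""
    vm = pasid_idr.get(pasid)
    if vm is None:
        return True            # VM not found, can't track fault credit
    if vm.fault_credit == 0:
        return False           # too many faults in this VM
    vm.fault_credit -= 1
    return True
```

After 16 reported faults for a VM, everything further is silently dropped — a simple time-based ratelimit on the log output achieves the same goal with far less bookkeeping.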
Re: [PATCH 02/11] drm/amdgpu: send IVs to the KFD only after processing them v2

2018-11-30 Thread Alex Deucher
On Fri, Nov 30, 2018 at 7:36 AM Christian König
 wrote:
>
> This allows us to filter out VM faults in the GMC code.
>
> v2: don't filter out all faults
>
> Signed-off-by: Christian König 

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 29 +++--
>  1 file changed, 17 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> index 6b6524f04ce0..6db4c58ddc13 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> @@ -149,9 +149,6 @@ static void amdgpu_irq_callback(struct amdgpu_device 
> *adev,
> if (!amdgpu_ih_prescreen_iv(adev))
> return;
>
> -   /* Before dispatching irq to IP blocks, send it to amdkfd */
> -   amdgpu_amdkfd_interrupt(adev, (const void *) &ih->ring[ring_index]);
> -
> entry.iv_entry = (const uint32_t *)&ih->ring[ring_index];
> amdgpu_ih_decode_iv(adev, &entry);
>
> @@ -371,29 +368,31 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev,
> unsigned client_id = entry->client_id;
> unsigned src_id = entry->src_id;
> struct amdgpu_irq_src *src;
> +   bool handled = false;
> int r;
>
> trace_amdgpu_iv(entry);
>
> if (client_id >= AMDGPU_IRQ_CLIENTID_MAX) {
> -   DRM_DEBUG("Invalid client_id in IV: %d\n", client_id);
> +   DRM_ERROR("Invalid client_id in IV: %d\n", client_id);
> return;
> }
>
> if (src_id >= AMDGPU_MAX_IRQ_SRC_ID) {
> -   DRM_DEBUG("Invalid src_id in IV: %d\n", src_id);
> +   DRM_ERROR("Invalid src_id in IV: %d\n", src_id);
> return;
> }
>
> if (adev->irq.virq[src_id]) {
> generic_handle_irq(irq_find_mapping(adev->irq.domain, 
> src_id));
> -   } else {
> -   if (!adev->irq.client[client_id].sources) {
> -   DRM_DEBUG("Unregistered interrupt client_id: %d 
> src_id: %d\n",
> - client_id, src_id);
> -   return;
> -   }
> +   return;
> +   }
>
> +   if (!adev->irq.client[client_id].sources) {
> +   DRM_DEBUG("Unregistered interrupt client_id: %d src_id: %d\n",
> + client_id, src_id);
> +   return;
> +   } else {
> src = adev->irq.client[client_id].sources[src_id];
> if (!src) {
> DRM_DEBUG("Unhandled interrupt src_id: %d\n", src_id);
> @@ -401,9 +400,15 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev,
> }
>
> r = src->funcs->process(adev, src, entry);
> -   if (r)
> +   if (r < 0)
> DRM_ERROR("error processing interrupt (%d)\n", r);
> +   else if (r)
> +   handled = true;
> }
> +
> +   /* Send it to amdkfd as well if it isn't already handled */
> +   if (!handled)
> +   amdgpu_amdkfd_interrupt(adev, entry->iv_entry);
>  }
>
>  /**
> --
> 2.17.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

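The reworked dispatch flow in the patch above can be sketched in a few lines: the IP block's ->process() callback returns a negative errno on error, a positive value if it consumed the IV, and 0 otherwise, and only unhandled IVs are forwarded to amdkfd. The callables below are stand-ins for the driver structures.

```python
# Minimal model of the reworked amdgpu_irq_dispatch() flow: process() returns
# <0 on error, >0 if it consumed the IV, 0 if not; only IVs that were not
# handled are forwarded to amdkfd. Callables stand in for driver structures.

def dispatch(entry, process, kfd_interrupt, log_error=print):
    handled = False
    r = process(entry)
    if r < 0:
        log_error("error processing interrupt (%d)" % r)
    elif r:
        handled = True
    # Send it to amdkfd as well if it isn't already handled
    if not handled:
        kfd_interrupt(entry)
    return handled
```

Note that the error path also falls through to the KFD forward, matching the patch: an error from ->process() leaves `handled` false.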

Re: [PATCH 08/11] drm/amdgpu: add the IH to the IV trace

2018-11-30 Thread Alex Deucher
On Fri, Nov 30, 2018 at 7:36 AM Christian König
 wrote:
>
> To distinguish on which IH ring an IV was found.
>
> Signed-off-by: Christian König 

Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c   |  4 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 11 +++
>  2 files changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> index 318874952bc2..127ed7de841d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> @@ -148,6 +148,8 @@ static void amdgpu_irq_callback(struct amdgpu_device 
> *adev,
> entry.iv_entry = (const uint32_t *)&ih->ring[ring_index];
> amdgpu_ih_decode_iv(adev, &entry);
>
> +   trace_amdgpu_iv(ih - &adev->irq.ih, &entry);
> +
> amdgpu_irq_dispatch(adev, &entry);
>  }
>
> @@ -367,8 +369,6 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev,
> bool handled = false;
> int r;
>
> -   trace_amdgpu_iv(entry);
> -
> if (client_id >= AMDGPU_IRQ_CLIENTID_MAX) {
> DRM_ERROR("Invalid client_id in IV: %d\n", client_id);
> return;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> index 626abca770a0..6e388d7753c1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
> @@ -76,9 +76,10 @@ TRACE_EVENT(amdgpu_mm_wreg,
>  );
>
>  TRACE_EVENT(amdgpu_iv,
> -   TP_PROTO(struct amdgpu_iv_entry *iv),
> -   TP_ARGS(iv),
> +   TP_PROTO(unsigned ih, struct amdgpu_iv_entry *iv),
> +   TP_ARGS(ih, iv),
> TP_STRUCT__entry(
> +__field(unsigned, ih)
>  __field(unsigned, client_id)
>  __field(unsigned, src_id)
>  __field(unsigned, ring_id)
> @@ -90,6 +91,7 @@ TRACE_EVENT(amdgpu_iv,
>  __array(unsigned, src_data, 4)
> ),
> TP_fast_assign(
> +  __entry->ih = ih;
>__entry->client_id = iv->client_id;
>__entry->src_id = iv->src_id;
>__entry->ring_id = iv->ring_id;
> @@ -103,8 +105,9 @@ TRACE_EVENT(amdgpu_iv,
>__entry->src_data[2] = iv->src_data[2];
>__entry->src_data[3] = iv->src_data[3];
>),
> -   TP_printk("client_id:%u src_id:%u ring:%u vmid:%u timestamp: %llu 
> pasid:%u src_data: %08x %08x %08x %08x",
> - __entry->client_id, __entry->src_id,
> +   TP_printk("ih: %u client_id:%u src_id:%u ring:%u vmid:%u "
> + "timestamp: %llu pasid:%u src_data: %08x %08x %08x 
> %08x",
> + __entry->ih, __entry->client_id, __entry->src_id,
>   __entry->ring_id, __entry->vmid,
>   __entry->timestamp, __entry->pasid,
>   __entry->src_data[0], __entry->src_data[1],
> --
> 2.17.1
>

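The `ih - &adev->irq.ih` expression in the trace call above works because ih, ih1 and ih2 are laid out consecutively in struct amdgpu_irq, so pointer subtraction yields 0, 1 or 2. A rough Python analog that identifies which of the three rings was passed in:

```python
# Analog of `ih - &adev->irq.ih`: since ih, ih1, ih2 are consecutive struct
# members, pointer subtraction gives the ring number. Here a tuple lookup by
# identity plays the same role.

class IRQ:
    def __init__(self):
        self.ih, self.ih1, self.ih2 = object(), object(), object()

    def ring_index(self, ih):
        """0, 1 or 2 depending on which IH ring `ih` is."""
        return (self.ih, self.ih1, self.ih2).index(ih)
```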

Re: [PATCH 07/11] drm/amdgpu: enable IH ring 1 and ring 2 v2

2018-11-30 Thread Alex Deucher
On Fri, Nov 30, 2018 at 7:36 AM Christian König
 wrote:
>
> The entries are ignored for now, but it at least stops crashing the
> hardware when somebody tries to push something to the other IH rings.
>
> v2: limit ring size, add TODO comment
>
> Signed-off-by: Christian König 

We may want to guard this with some flag so we can easily disable it
(like we did for the paging queue) in case we run into issues on any
asics.

Alex

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h |   4 +-
>  drivers/gpu/drm/amd/amdgpu/vega10_ih.c  | 129 +++-
>  2 files changed, 107 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> index f6ce171cb8aa..7e06fa64321a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> @@ -87,8 +87,8 @@ struct amdgpu_irq {
> /* status, etc. */
> boolmsi_enabled; /* msi enabled */
>
> -   /* interrupt ring */
> -   struct amdgpu_ih_ring   ih;
> +   /* interrupt rings */
> +   struct amdgpu_ih_ring   ih, ih1, ih2;
> const struct amdgpu_ih_funcs*ih_funcs;
>
> /* gen irq stuff */
> diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c 
> b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
> index 3e9ebb0de94d..562fd036bb9e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
> @@ -50,6 +50,16 @@ static void vega10_ih_enable_interrupts(struct 
> amdgpu_device *adev)
> ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, ENABLE_INTR, 1);
> WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL, ih_rb_cntl);
> adev->irq.ih.enabled = true;
> +
> +   ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1);
> +   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1, RB_ENABLE, 
> 1);
> +   WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1, ih_rb_cntl);
> +   adev->irq.ih1.enabled = true;
> +
> +   ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2);
> +   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2, RB_ENABLE, 
> 1);
> +   WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2, ih_rb_cntl);
> +   adev->irq.ih2.enabled = true;
>  }
>
>  /**
> @@ -71,6 +81,47 @@ static void vega10_ih_disable_interrupts(struct 
> amdgpu_device *adev)
> WREG32_SOC15(OSSSYS, 0, mmIH_RB_WPTR, 0);
> adev->irq.ih.enabled = false;
> adev->irq.ih.rptr = 0;
> +
> +   ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1);
> +   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1, RB_ENABLE, 
> 0);
> +   WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1, ih_rb_cntl);
> +   /* set rptr, wptr to 0 */
> +   WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR_RING1, 0);
> +   WREG32_SOC15(OSSSYS, 0, mmIH_RB_WPTR_RING1, 0);
> +   adev->irq.ih1.enabled = false;
> +   adev->irq.ih1.rptr = 0;
> +
> +   ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2);
> +   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2, RB_ENABLE, 
> 0);
> +   WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2, ih_rb_cntl);
> +   /* set rptr, wptr to 0 */
> +   WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR_RING2, 0);
> +   WREG32_SOC15(OSSSYS, 0, mmIH_RB_WPTR_RING2, 0);
> +   adev->irq.ih2.enabled = false;
> +   adev->irq.ih2.rptr = 0;
> +}
> +
> +static uint32_t vega10_ih_rb_cntl(struct amdgpu_ih_ring *ih, uint32_t 
> ih_rb_cntl)
> +{
> +   int rb_bufsz = order_base_2(ih->ring_size / 4);
> +
> +   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
> +  MC_SPACE, ih->use_bus_addr ? 1 : 4);
> +   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
> +  WPTR_OVERFLOW_CLEAR, 1);
> +   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
> +  WPTR_OVERFLOW_ENABLE, 1);
> +   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, RB_SIZE, rb_bufsz);
> +   /* Ring Buffer write pointer writeback. If enabled, IH_RB_WPTR 
> register
> +* value is written to memory
> +*/
> +   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
> +  WPTR_WRITEBACK_ENABLE, 1);
> +   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, MC_SNOOP, 1);
> +   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, MC_RO, 0);
> +   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, MC_VMID, 0);
> +
> +   return ih_rb_cntl;
>  }
>
>  /**
> @@ -86,9 +137,8 @@ static void vega10_ih_disable_interrupts(struct 
> amdgpu_device *adev)
>   */
>  static int vega10_ih_irq_init(struct amdgpu_device *adev)
>  {
> -   struct amdgpu_ih_ring *ih = &adev->irq.ih;
> +   struct amdgpu_ih_ring *ih;
> int ret = 0;
> -   int rb_bufsz;
> u32 ih_rb_cntl, ih_doorbell_rtpr;
> u32 tmp;
>
> @@ -97,26 +147,15 @@ static int vega10_ih_irq_init(s

Re: [PATCH 1/2] drm/amdgpu: Handle xgmi device removal and add reset wq.

2018-11-30 Thread Koenig, Christian
On 30.11.18 at 16:14, Grodzovsky, Andrey wrote:
>
> On 11/30/2018 04:03 AM, Christian König wrote:
>> Am 29.11.18 um 21:36 schrieb Andrey Grodzovsky:
>>> XGMI hive has some resources allocated on device init which
>>> need to be deallocated when the device is unregistered.
>>>
>>> Add a per-hive wq to allow all the nodes in the hive to run resets
>>> concurrently - this should speed up the total reset time to avoid
>>> breaching the PSP FW timeout.
>> Do you really want a workqueue? That sounds like the exact opposite of
>> what you describe here.
>>
>> E.g. a workqueue is there to serialize all work items scheduled to it.
>>
>> What you most likely rather want is a work item per device which are
>> scheduled, run in parallel and are joined back together before
>> dropping the per hive lock.
>>
>> Christian.
> As you can see in the second patch, that is exactly how I use this wq. I
> define a per-adev work item, all of which I enqueue during the GPU reset
> routine. I believe the wq I created will process work items in parallel
> since it is served by per-CPU worker threads (a thread pool), while a
> serializing wq is created using the 'create_singlethread_workqueue'
> interface, which creates an ordered wq that processes at most one item at
> a time.

That is correct, but you are still limited to one work item processed 
per CPU, which is not necessarily what you want.

> I did have my doubts about creating a dedicated wq instead of reusing an
> existing system-wide wq such as system_unbound_wq, because this adds new
> kernel threads, but eventually I opted for my own queue because there is
> a hard requirement from the PSP FW to complete all ASIC resets in a
> defined finite amount of time (1s, I believe), otherwise some nodes in
> the XGMI hive might switch to single GPU mode due to timeout.
> If I rely on a system-wide wq then during stress times this queue might
> fail to process my work items within that time limit because it is
> overloaded by work items from other queue clients.
> Maybe I should squash the 2 changes into one to make all of this more
> clear?

No, that won't help. You would rather need to document that properly.

But creating a wq just for a rarely used device reset is completely 
overkill.

Stuff like that should go to the system_highpri_wq work queue, which 
should already make sure that it never hits the PSP timeout.

Christian.
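The scheme being discussed — one work item per device, all run in parallel, joined back together before dropping the per-hive lock — can be sketched with Python threads standing in for kernel work items. The device names and callback are made up for illustration.

```python
# Sketch of "one work item per device, run in parallel, joined before the
# per-hive lock drops". Threads stand in for kernel work items; the lock
# stands in for hive->hive_lock. Names are illustrative only.
import threading

def reset_hive(devices, reset_one):
    hive_lock = threading.Lock()
    results = {}

    def work(dev):
        results[dev] = reset_one(dev)

    with hive_lock:                      # hold the hive lock across the reset
        workers = [threading.Thread(target=work, args=(d,)) for d in devices]
        for w in workers:
            w.start()                    # all node resets run concurrently
        for w in workers:
            w.join()                     # joined back before the lock drops
    return results
```

Running the resets concurrently rather than serially is what keeps the total hive reset time within the PSP FW window.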

>
> Andrey
>>> Signed-off-by: Andrey Grodzovsky 
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 24 
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h |  3 +++
>>>    2 files changed, 27 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>>> index fb37e69..9ac2dc5 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>>> @@ -61,6 +61,8 @@ struct amdgpu_hive_info
>>> *amdgpu_get_xgmi_hive(struct amdgpu_device *adev)
>>>    INIT_LIST_HEAD(&tmp->device_list);
>>>    mutex_init(&tmp->hive_lock);
>>>    +    tmp->reset_queue = alloc_workqueue("xgmi-hive", WQ_UNBOUND |
>>> WQ_HIGHPRI, 0);
>>> +
>>>    return tmp;
>>>    }
>>>    @@ -135,3 +137,25 @@ int amdgpu_xgmi_add_device(struct
>>> amdgpu_device *adev)
>>>    mutex_unlock(&xgmi_mutex);
>>>    return ret;
>>>    }
>>> +
>>> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev)
>>> +{
>>> +    struct amdgpu_hive_info *hive;
>>> +
>>> +    if ((adev->asic_type < CHIP_VEGA20) || (adev->flags & AMD_IS_APU))
>>> +    return;
>>> +
>>> +    mutex_lock(&xgmi_mutex);
>>> +
>>> +    hive = amdgpu_get_xgmi_hive(adev);
>>> +    if (!hive)
>>> +    goto exit;
>>> +
>>> +    if (!(hive->number_devices--)) {
>>> +    mutex_destroy(&hive->hive_lock);
>>> +    destroy_workqueue(hive->reset_queue);
>>> +    }
>>> +
>>> +exit:
>>> +    mutex_unlock(&xgmi_mutex);
>>> +}
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>>> index 6335bfd..285ab93 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>>> @@ -30,10 +30,13 @@ struct amdgpu_hive_info {
>>>    struct psp_xgmi_topology_info    topology_info;
>>>    int number_devices;
>>>    struct mutex hive_lock;
>>> +    /* hive members reset wq */
>>> +    struct workqueue_struct *reset_queue;
>>>    };
>>>      struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct
>>> amdgpu_device *adev);
>>>    int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive,
>>> struct amdgpu_device *adev);
>>>    int amdgpu_xgmi_add_device(struct amdgpu_device *adev);
>>> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev);
>>>      #endif



Re: [PATCH 06/11] drm/amdgpu: simplify IH programming

2018-11-30 Thread Alex Deucher
On Fri, Nov 30, 2018 at 7:36 AM Christian König
 wrote:
>
> Calculate all the addresses and pointers in amdgpu_ih.c
>
> Signed-off-by: Christian König 

Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c  | 34 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h  | 23 +---
>  drivers/gpu/drm/amd/amdgpu/cik_ih.c |  9 +++
>  drivers/gpu/drm/amd/amdgpu/cz_ih.c  | 11 
>  drivers/gpu/drm/amd/amdgpu/iceland_ih.c |  9 +++
>  drivers/gpu/drm/amd/amdgpu/si_ih.c  |  9 +++
>  drivers/gpu/drm/amd/amdgpu/tonga_ih.c   | 27 +--
>  drivers/gpu/drm/amd/amdgpu/vega10_ih.c  | 36 +
>  8 files changed, 73 insertions(+), 85 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
> index fb8dd6179926..d0a5db777b6d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
> @@ -52,6 +52,8 @@ int amdgpu_ih_ring_init(struct amdgpu_device *adev, struct 
> amdgpu_ih_ring *ih,
> ih->use_bus_addr = use_bus_addr;
>
> if (use_bus_addr) {
> +   dma_addr_t dma_addr;
> +
> if (ih->ring)
> return 0;
>
> @@ -59,21 +61,26 @@ int amdgpu_ih_ring_init(struct amdgpu_device *adev, 
> struct amdgpu_ih_ring *ih,
>  * add them to the end of the ring allocation.
>  */
> ih->ring = dma_alloc_coherent(adev->dev, ih->ring_size + 8,
> - &ih->rb_dma_addr, GFP_KERNEL);
> + &dma_addr, GFP_KERNEL);
> if (ih->ring == NULL)
> return -ENOMEM;
>
> memset((void *)ih->ring, 0, ih->ring_size + 8);
> -   ih->wptr_offs = (ih->ring_size / 4) + 0;
> -   ih->rptr_offs = (ih->ring_size / 4) + 1;
> +   ih->gpu_addr = dma_addr;
> +   ih->wptr_addr = dma_addr + ih->ring_size;
> +   ih->wptr_cpu = &ih->ring[ih->ring_size / 4];
> +   ih->rptr_addr = dma_addr + ih->ring_size + 4;
> +   ih->rptr_cpu = &ih->ring[(ih->ring_size / 4) + 1];
> } else {
> -   r = amdgpu_device_wb_get(adev, &ih->wptr_offs);
> +   unsigned wptr_offs, rptr_offs;
> +
> +   r = amdgpu_device_wb_get(adev, &wptr_offs);
> if (r)
> return r;
>
> -   r = amdgpu_device_wb_get(adev, &ih->rptr_offs);
> +   r = amdgpu_device_wb_get(adev, &rptr_offs);
> if (r) {
> -   amdgpu_device_wb_free(adev, ih->wptr_offs);
> +   amdgpu_device_wb_free(adev, wptr_offs);
> return r;
> }
>
> @@ -82,10 +89,15 @@ int amdgpu_ih_ring_init(struct amdgpu_device *adev, 
> struct amdgpu_ih_ring *ih,
> &ih->ring_obj, &ih->gpu_addr,
> (void **)&ih->ring);
> if (r) {
> -   amdgpu_device_wb_free(adev, ih->rptr_offs);
> -   amdgpu_device_wb_free(adev, ih->wptr_offs);
> +   amdgpu_device_wb_free(adev, rptr_offs);
> +   amdgpu_device_wb_free(adev, wptr_offs);
> return r;
> }
> +
> +   ih->wptr_addr = adev->wb.gpu_addr + wptr_offs * 4;
> +   ih->wptr_cpu = &adev->wb.wb[wptr_offs];
> +   ih->rptr_addr = adev->wb.gpu_addr + rptr_offs * 4;
> +   ih->rptr_cpu = &adev->wb.wb[rptr_offs];
> }
> return 0;
>  }
> @@ -109,13 +121,13 @@ void amdgpu_ih_ring_fini(struct amdgpu_device *adev, 
> struct amdgpu_ih_ring *ih)
>  * add them to the end of the ring allocation.
>  */
> dma_free_coherent(adev->dev, ih->ring_size + 8,
> - (void *)ih->ring, ih->rb_dma_addr);
> + (void *)ih->ring, ih->gpu_addr);
> ih->ring = NULL;
> } else {
> amdgpu_bo_free_kernel(&ih->ring_obj, &ih->gpu_addr,
>   (void **)&ih->ring);
> -   amdgpu_device_wb_free(adev, ih->wptr_offs);
> -   amdgpu_device_wb_free(adev, ih->rptr_offs);
> +   amdgpu_device_wb_free(adev, (ih->wptr_addr - ih->gpu_addr) / 
> 4);
> +   amdgpu_device_wb_free(adev, (ih->rptr_addr - ih->gpu_addr) / 
> 4);
> }
>  }
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
> index d810fd73d574..1ccb1831382a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
> @@ -31,20 +31,25 @@ struct amdgpu_iv_entry;
>   * R6xx+ IH ring
>   */
>  struct amdgpu_ih_ring

Re: [PATCH 01/11] drm/amdgpu: add missing error handling

2018-11-30 Thread Alex Deucher
On Fri, Nov 30, 2018 at 7:36 AM Christian König
 wrote:
>
> We ignored the return code here.
>
> Signed-off-by: Christian König 

Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index 3a4e5d8d5162..e329a23e1f99 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -902,6 +902,9 @@ static int gmc_v9_0_sw_init(void *handle)
> /* This interrupt is VMC page fault.*/
> r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_VMC, 
> VMC_1_0__SRCID__VM_FAULT,
> &adev->gmc.vm_fault);
> +   if (r)
> +   return r;
> +
> r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_UTCL2, 
> UTCL2_1_0__SRCID__FAULT,
> &adev->gmc.vm_fault);
>
> --
> 2.17.1
>


Re: [PATCH] drm/amdgpu: Fix num_doorbell calculation issue

2018-11-30 Thread Deucher, Alexander
Reviewed-by: Alex Deucher 


From: amd-gfx  on behalf of Oak Zeng 

Sent: Friday, November 30, 2018 10:39:21 AM
To: amd-gfx@lists.freedesktop.org
Cc: Yang, Philip; Zeng, Oak; Yin, Tianci (Rico)
Subject: [PATCH] drm/amdgpu: Fix num_doorbell calculation issue

When the paging queue is enabled, it uses the second page of the doorbell
BAR. The AMDGPU_DOORBELL64_MAX_ASSIGNMENT definition assumes all the
kernel doorbells are in the first page. So with the paging queue enabled,
the total kernel doorbell range should be the original num_doorbell plus
one page (0x400 in dwords), not *2.

Change-Id: I62023bb91da33ca5532b22b263771b587b796d59
Signed-off-by: Oak Zeng 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 8eaa40e..c75badf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -539,10 +539,13 @@ static int amdgpu_device_doorbell_init(struct 
amdgpu_device *adev)
 return -EINVAL;

 /* For Vega, reserve and map two pages on doorbell BAR since SDMA
-* paging queue doorbell use the second page
+* paging queue doorbell use the second page. The
+* AMDGPU_DOORBELL64_MAX_ASSIGNMENT definition assumes all the
+* doorbells are in the first page. So with paging queue enabled,
+* the max num_doorbells should + 1 page (0x400 in dword)
  */
 if (adev->asic_type >= CHIP_VEGA10)
-   adev->doorbell.num_doorbells *= 2;
+   adev->doorbell.num_doorbells += 0x400;

 adev->doorbell.ptr = ioremap(adev->doorbell.base,
  adev->doorbell.num_doorbells *
--
2.7.4
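The arithmetic of the fix: instead of doubling num_doorbells, reserve exactly one extra page of dwords for the SDMA paging queue doorbell. A short sketch — the num_doorbells value in the test is illustrative, not the real AMDGPU_DOORBELL64_MAX_ASSIGNMENT:

```python
# The fix replaces "num_doorbells *= 2" with "num_doorbells += 0x400":
# one 4 KiB page of 4-byte doorbells for the SDMA paging queue in the
# second page, regardless of how many kernel doorbells the first page uses.

DOORBELL_PAGE_DWORDS = 0x400  # 4 KiB page / 4 bytes per doorbell

def doorbells_to_map(num_doorbells, has_paging_queue):
    if has_paging_queue:               # Vega and newer
        return num_doorbells + DOORBELL_PAGE_DWORDS
    return num_doorbells
```

Doubling only happens to be correct when the kernel doorbells fill exactly one page; adding a fixed page keeps the paging-queue doorbell mapped in every case.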



[PATCH] drm/amdgpu: Fix num_doorbell calculation issue

2018-11-30 Thread Oak Zeng
When the paging queue is enabled, it uses the second page of the doorbell
BAR. The AMDGPU_DOORBELL64_MAX_ASSIGNMENT definition assumes all the
kernel doorbells are in the first page. So with the paging queue enabled,
the total kernel doorbell range should be the original num_doorbell plus
one page (0x400 in dwords), not *2.

Change-Id: I62023bb91da33ca5532b22b263771b587b796d59
Signed-off-by: Oak Zeng 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 8eaa40e..c75badf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -539,10 +539,13 @@ static int amdgpu_device_doorbell_init(struct 
amdgpu_device *adev)
return -EINVAL;
 
/* For Vega, reserve and map two pages on doorbell BAR since SDMA
-* paging queue doorbell use the second page
+* paging queue doorbell use the second page. The
+* AMDGPU_DOORBELL64_MAX_ASSIGNMENT definition assumes all the
+* doorbells are in the first page. So with paging queue enabled,
+* the max num_doorbells should + 1 page (0x400 in dword)
 */
if (adev->asic_type >= CHIP_VEGA10)
-   adev->doorbell.num_doorbells *= 2;
+   adev->doorbell.num_doorbells += 0x400;
 
adev->doorbell.ptr = ioremap(adev->doorbell.base,
 adev->doorbell.num_doorbells *
-- 
2.7.4



Re: [PATCH] drm/amd/display: Fix NULL ptr deref for commit_planes_to_stream

2018-11-30 Thread Wentland, Harry
On 2018-11-30 10:09 a.m., Nicholas Kazlauskas wrote:
> [Why]
> With scaling, underscan and abm changes we can end up calling
> commit_planes_to_stream in commit_tail. This call uses dm_state->context
> which can be NULL if the commit was a fast update.
> 
> [How]
> Use dc_state instead since that can't be NULL unless the system ran
> out of memory.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108912
> Fixes: e64abff2f133 ("drm/amd/display: Use private obj helpers for 
> dm_atomic_state")
> 
> Cc: Leo Li 
> Cc: Harry Wentland 
> Signed-off-by: Nicholas Kazlauskas 

Reviewed-by: Harry Wentland 

Harry

> ---
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 76b1aebdca0c..8ecd78657d43 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -5011,7 +5011,7 @@ static void amdgpu_dm_atomic_commit_tail(struct 
> drm_atomic_state *state)
>   status->plane_count,
>   dm_new_crtc_state,
>   to_dm_crtc_state(old_crtc_state),
> - dm_state->context))
> + dc_state))
>   dm_error("%s: Failed to update stream scaling!\n", 
> __func__);
>   }
>  
> 


Re: [PATCH] drm/amd/display: Fix NULL ptr deref for commit_planes_to_stream

2018-11-30 Thread Li, Sun peng (Leo)


On 2018-11-30 10:13 a.m., Deucher, Alexander wrote:
> Acked-by: Alex Deucher 

Reviewed-by: Leo Li 

> 
> 
> *From:* amd-gfx  on behalf of 
> Nicholas Kazlauskas 
> *Sent:* Friday, November 30, 2018 10:09:28 AM
> *To:* amd-gfx@lists.freedesktop.org
> *Cc:* Li, Sun peng (Leo); Wentland, Harry; Kazlauskas, Nicholas
> *Subject:* [PATCH] drm/amd/display: Fix NULL ptr deref for 
> commit_planes_to_stream
> [Why]
> With scaling, underscan and abm changes we can end up calling
> commit_planes_to_stream in commit_tail. This call uses dm_state->context
> which can be NULL if the commit was a fast update.
> 
> [How]
> Use dc_state instead since that can't be NULL unless the system ran
> out of memory.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108912
> Fixes: e64abff2f133 ("drm/amd/display: Use private obj helpers for 
> dm_atomic_state")
> 
> Cc: Leo Li 
> Cc: Harry Wentland 
> Signed-off-by: Nicholas Kazlauskas 
> ---
>   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 76b1aebdca0c..8ecd78657d43 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -5011,7 +5011,7 @@ static void amdgpu_dm_atomic_commit_tail(struct 
> drm_atomic_state *state)
>   status->plane_count,
>   dm_new_crtc_state,
>   to_dm_crtc_state(old_crtc_state),
> -   dm_state->context))
> +   dc_state))
>   dm_error("%s: Failed to update stream 
> scaling!\n", __func__);
>   }
> 
> -- 
> 2.17.1
> 


Re: [PATCH 1/2] drm/amdgpu: Handle xgmi device removal and add reset wq.

2018-11-30 Thread Grodzovsky, Andrey


On 11/30/2018 04:03 AM, Christian König wrote:
> Am 29.11.18 um 21:36 schrieb Andrey Grodzovsky:
>> XGMI hive has some resources allocated on device init which
>> need to be deallocated when the device is unregistered.
>>
>> Add a per-hive wq to allow all the nodes in a hive to run resets
>> concurrently - this should speed up the total reset time to avoid
>> breaching the PSP FW timeout.
>
> Do you really want a workqueue? That sounds like the exact opposite of 
> what you describe here.
>
> E.g. a workqueue is there to serialize all work items scheduled to it.
>
> What you most likely want instead is a work item per device, all of
> which are scheduled, run in parallel and are joined back together
> before dropping the per-hive lock.
>
> Christian.

As you can see in the second patch, that is exactly how I use this wq. I
define a per-adev work item, all of which I enqueue during the GPU reset
routine.
I believe the wq I created will process work items in parallel, since it
is served by per-CPU worker threads (a thread pool), whereas a
serializing wq is
created using the 'create_singlethread_workqueue' interface, which
creates an ordered wq that processes at most one item at a time.
I did have my doubts about creating a dedicated wq instead of reusing an
existing system-wide wq such as system_unbound_wq, because this adds new
kernel threads, but eventually I opted for my own queue because there is
a hard requirement from the PSP FW to complete all ASIC resets in a
defined finite amount
of time (1s, I believe), otherwise some nodes in the XGMI hive might
switch to single-GPU mode due to timeout.
If I relied on a system-wide wq, then under stress this queue might
fail to process my work items within that time limit because it is
overloaded by work items from other queue clients.
Maybe I should squash the 2 changes into one to make all of this
clearer?

Andrey
>>
>> Signed-off-by: Andrey Grodzovsky 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 24 
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h |  3 +++
>>   2 files changed, 27 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>> index fb37e69..9ac2dc5 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
>> @@ -61,6 +61,8 @@ struct amdgpu_hive_info 
>> *amdgpu_get_xgmi_hive(struct amdgpu_device *adev)
>>   INIT_LIST_HEAD(&tmp->device_list);
>>   mutex_init(&tmp->hive_lock);
>>   +    tmp->reset_queue = alloc_workqueue("xgmi-hive", WQ_UNBOUND | 
>> WQ_HIGHPRI, 0);
>> +
>>   return tmp;
>>   }
>>   @@ -135,3 +137,25 @@ int amdgpu_xgmi_add_device(struct 
>> amdgpu_device *adev)
>>   mutex_unlock(&xgmi_mutex);
>>   return ret;
>>   }
>> +
>> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev)
>> +{
>> +    struct amdgpu_hive_info *hive;
>> +
>> +    if ((adev->asic_type < CHIP_VEGA20) || (adev->flags & AMD_IS_APU))
>> +    return;
>> +
>> +    mutex_lock(&xgmi_mutex);
>> +
>> +    hive = amdgpu_get_xgmi_hive(adev);
>> +    if (!hive)
>> +    goto exit;
>> +
>> +    if (!(hive->number_devices--)) {
>> +    mutex_destroy(&hive->hive_lock);
>> +    destroy_workqueue(hive->reset_queue);
>> +    }
>> +
>> +exit:
>> +    mutex_unlock(&xgmi_mutex);
>> +}
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>> index 6335bfd..285ab93 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
>> @@ -30,10 +30,13 @@ struct amdgpu_hive_info {
>>   struct psp_xgmi_topology_info    topology_info;
>>   int number_devices;
>>   struct mutex hive_lock;
>> +    /* hive members reset wq */
>> +    struct workqueue_struct *reset_queue;
>>   };
>>     struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct 
>> amdgpu_device *adev);
>>   int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, 
>> struct amdgpu_device *adev);
>>   int amdgpu_xgmi_add_device(struct amdgpu_device *adev);
>> +void amdgpu_xgmi_remove_device(struct amdgpu_device *adev);
>>     #endif
>



Re: [PATCH] drm/amd/display: Fix NULL ptr deref for commit_planes_to_stream

2018-11-30 Thread Deucher, Alexander
Acked-by: Alex Deucher 


From: amd-gfx  on behalf of Nicholas 
Kazlauskas 
Sent: Friday, November 30, 2018 10:09:28 AM
To: amd-gfx@lists.freedesktop.org
Cc: Li, Sun peng (Leo); Wentland, Harry; Kazlauskas, Nicholas
Subject: [PATCH] drm/amd/display: Fix NULL ptr deref for commit_planes_to_stream

[Why]
With scaling, underscan and abm changes we can end up calling
commit_planes_to_stream in commit_tail. This call uses dm_state->context
which can be NULL if the commit was a fast update.

[How]
Use dc_state instead since that can't be NULL unless the system ran
out of memory.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108912
Fixes: e64abff2f133 ("drm/amd/display: Use private obj helpers for 
dm_atomic_state")

Cc: Leo Li 
Cc: Harry Wentland 
Signed-off-by: Nicholas Kazlauskas 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 76b1aebdca0c..8ecd78657d43 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -5011,7 +5011,7 @@ static void amdgpu_dm_atomic_commit_tail(struct 
drm_atomic_state *state)
 status->plane_count,
 dm_new_crtc_state,
 to_dm_crtc_state(old_crtc_state),
-   dm_state->context))
+   dc_state))
 dm_error("%s: Failed to update stream scaling!\n", 
__func__);
 }

--
2.17.1



[PATCH] drm/amd/display: Fix NULL ptr deref for commit_planes_to_stream

2018-11-30 Thread Nicholas Kazlauskas
[Why]
With scaling, underscan and abm changes we can end up calling
commit_planes_to_stream in commit_tail. This call uses dm_state->context
which can be NULL if the commit was a fast update.

[How]
Use dc_state instead since that can't be NULL unless the system ran
out of memory.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108912
Fixes: e64abff2f133 ("drm/amd/display: Use private obj helpers for 
dm_atomic_state")

Cc: Leo Li 
Cc: Harry Wentland 
Signed-off-by: Nicholas Kazlauskas 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 76b1aebdca0c..8ecd78657d43 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -5011,7 +5011,7 @@ static void amdgpu_dm_atomic_commit_tail(struct 
drm_atomic_state *state)
status->plane_count,
dm_new_crtc_state,
to_dm_crtc_state(old_crtc_state),
-   dm_state->context))
+   dc_state))
dm_error("%s: Failed to update stream scaling!\n", 
__func__);
}
 
-- 
2.17.1



Re: [PATCH] drm/amdgpu: remove amdgpu_bo_backup_to_shadow

2018-11-30 Thread Deucher, Alexander
Reviewed-by: Alex Deucher 


From: amd-gfx  on behalf of Christian 
König 
Sent: Friday, November 30, 2018 7:45:17 AM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH] drm/amdgpu: remove amdgpu_bo_backup_to_shadow

It is unused.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 47 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  5 ---
 2 files changed, 52 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index cf768acb51dc..cc50cb65c212 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -607,53 +607,6 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
 return r;
 }

-/**
- * amdgpu_bo_backup_to_shadow - Backs up an &amdgpu_bo buffer object
- * @adev: amdgpu device object
- * @ring: amdgpu_ring for the engine handling the buffer operations
- * @bo: &amdgpu_bo buffer to be backed up
- * @resv: reservation object with embedded fence
- * @fence: dma_fence associated with the operation
- * @direct: whether to submit the job directly
- *
- * Copies an &amdgpu_bo buffer object to its shadow object.
- * Not used for now.
- *
- * Returns:
- * 0 for success or a negative error code on failure.
- */
-int amdgpu_bo_backup_to_shadow(struct amdgpu_device *adev,
-  struct amdgpu_ring *ring,
-  struct amdgpu_bo *bo,
-  struct reservation_object *resv,
-  struct dma_fence **fence,
-  bool direct)
-
-{
-   struct amdgpu_bo *shadow = bo->shadow;
-   uint64_t bo_addr, shadow_addr;
-   int r;
-
-   if (!shadow)
-   return -EINVAL;
-
-   bo_addr = amdgpu_bo_gpu_offset(bo);
-   shadow_addr = amdgpu_bo_gpu_offset(bo->shadow);
-
-   r = reservation_object_reserve_shared(bo->tbo.resv, 1);
-   if (r)
-   goto err;
-
-   r = amdgpu_copy_buffer(ring, bo_addr, shadow_addr,
-  amdgpu_bo_size(bo), resv, fence,
-  direct, false);
-   if (!r)
-   amdgpu_bo_fence(bo, *fence, true);
-
-err:
-   return r;
-}
-
 /**
  * amdgpu_bo_validate - validate an &amdgpu_bo buffer object
  * @bo: pointer to the buffer object
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 7d3312d0da11..9291c2f837e9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -267,11 +267,6 @@ int amdgpu_bo_fault_reserve_notify(struct 
ttm_buffer_object *bo);
 void amdgpu_bo_fence(struct amdgpu_bo *bo, struct dma_fence *fence,
  bool shared);
 u64 amdgpu_bo_gpu_offset(struct amdgpu_bo *bo);
-int amdgpu_bo_backup_to_shadow(struct amdgpu_device *adev,
-  struct amdgpu_ring *ring,
-  struct amdgpu_bo *bo,
-  struct reservation_object *resv,
-  struct dma_fence **fence, bool direct);
 int amdgpu_bo_validate(struct amdgpu_bo *bo);
 int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow,
  struct dma_fence **fence);
--
2.17.1



[PATCH 04/16 v2] drm/amd/display: Add tracing to dc

2018-11-30 Thread David Francis
[Why]
Tracing is a useful and cheap debug functionality

[How]
This creates a new trace system amdgpu_dm, currently with
three trace events

amdgpu_dc_rreg and amdgpu_dc_wreg report the address and value
of any dc register reads and writes

amdgpu_dc_performance requires at least one of those two to be
enabled.  It counts the register reads and writes since the
last entry

v2: Don't check for NULL before kfree

Signed-off-by: David Francis 
Reviewed-by: Harry Wentland 
Acked-by: Leo Li 
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |   3 +
 .../amd/display/amdgpu_dm/amdgpu_dm_trace.h   | 104 ++
 drivers/gpu/drm/amd/display/dc/core/dc.c  |  19 
 drivers/gpu/drm/amd/display/dc/dc_types.h |   8 ++
 .../amd/display/dc/dcn10/dcn10_cm_common.c|   4 +-
 drivers/gpu/drm/amd/display/dc/dm_services.h  |  12 +-
 6 files changed, 146 insertions(+), 4 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_trace.h

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 76b1aebdca0c..376927c8bcc6 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -23,6 +23,9 @@
  *
  */
 
+/* The caprices of the preprocessor require that this be declared right here */
+#define CREATE_TRACE_POINTS
+
 #include "dm_services_types.h"
 #include "dc.h"
 #include "dc/inc/core_types.h"
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_trace.h 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_trace.h
new file mode 100644
index ..d898981684d5
--- /dev/null
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_trace.h
@@ -0,0 +1,104 @@
+/*
+ * Copyright 2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: AMD
+ *
+ */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM amdgpu_dm
+
+#if !defined(_AMDGPU_DM_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _AMDGPU_DM_TRACE_H_
+
+#include 
+
+TRACE_EVENT(amdgpu_dc_rreg,
+   TP_PROTO(unsigned long *read_count, uint32_t reg, uint32_t value),
+   TP_ARGS(read_count, reg, value),
+   TP_STRUCT__entry(
+   __field(uint32_t, reg)
+   __field(uint32_t, value)
+   ),
+   TP_fast_assign(
+   __entry->reg = reg;
+   __entry->value = value;
+   *read_count = *read_count + 1;
+   ),
+   TP_printk("reg=0x%08lx, value=0x%08lx",
+   (unsigned long)__entry->reg,
+   (unsigned long)__entry->value)
+);
+
+TRACE_EVENT(amdgpu_dc_wreg,
+   TP_PROTO(unsigned long *write_count, uint32_t reg, uint32_t value),
+   TP_ARGS(write_count, reg, value),
+   TP_STRUCT__entry(
+   __field(uint32_t, reg)
+   __field(uint32_t, value)
+   ),
+   TP_fast_assign(
+   __entry->reg = reg;
+   __entry->value = value;
+   *write_count = *write_count + 1;
+   ),
+   TP_printk("reg=0x%08lx, value=0x%08lx",
+   (unsigned long)__entry->reg,
+   (unsigned long)__entry->value)
+);
+
+
+TRACE_EVENT(amdgpu_dc_performance,
+   TP_PROTO(unsigned long read_count, unsigned long write_count,
+   unsigned long *last_read, unsigned long *last_write,
+   const char *func, unsigned int line),
+   TP_ARGS(read_count, write_count, last_read, last_write, func, line),
+   TP_STRUCT__entry(
+   __field(uint32_t, reads)
+   __field(uint32_t, writes)
+   __field(uint32_t, read_delta)
+   __field(uint32_t, write_delta)
+   __string(func, func)
+   __field(uint3

[PATCH] drm/amdgpu: remove amdgpu_bo_backup_to_shadow

2018-11-30 Thread Christian König
It is unused.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 47 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  5 ---
 2 files changed, 52 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index cf768acb51dc..cc50cb65c212 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -607,53 +607,6 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
return r;
 }
 
-/**
- * amdgpu_bo_backup_to_shadow - Backs up an &amdgpu_bo buffer object
- * @adev: amdgpu device object
- * @ring: amdgpu_ring for the engine handling the buffer operations
- * @bo: &amdgpu_bo buffer to be backed up
- * @resv: reservation object with embedded fence
- * @fence: dma_fence associated with the operation
- * @direct: whether to submit the job directly
- *
- * Copies an &amdgpu_bo buffer object to its shadow object.
- * Not used for now.
- *
- * Returns:
- * 0 for success or a negative error code on failure.
- */
-int amdgpu_bo_backup_to_shadow(struct amdgpu_device *adev,
-  struct amdgpu_ring *ring,
-  struct amdgpu_bo *bo,
-  struct reservation_object *resv,
-  struct dma_fence **fence,
-  bool direct)
-
-{
-   struct amdgpu_bo *shadow = bo->shadow;
-   uint64_t bo_addr, shadow_addr;
-   int r;
-
-   if (!shadow)
-   return -EINVAL;
-
-   bo_addr = amdgpu_bo_gpu_offset(bo);
-   shadow_addr = amdgpu_bo_gpu_offset(bo->shadow);
-
-   r = reservation_object_reserve_shared(bo->tbo.resv, 1);
-   if (r)
-   goto err;
-
-   r = amdgpu_copy_buffer(ring, bo_addr, shadow_addr,
-  amdgpu_bo_size(bo), resv, fence,
-  direct, false);
-   if (!r)
-   amdgpu_bo_fence(bo, *fence, true);
-
-err:
-   return r;
-}
-
 /**
  * amdgpu_bo_validate - validate an &amdgpu_bo buffer object
  * @bo: pointer to the buffer object
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 7d3312d0da11..9291c2f837e9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -267,11 +267,6 @@ int amdgpu_bo_fault_reserve_notify(struct 
ttm_buffer_object *bo);
 void amdgpu_bo_fence(struct amdgpu_bo *bo, struct dma_fence *fence,
 bool shared);
 u64 amdgpu_bo_gpu_offset(struct amdgpu_bo *bo);
-int amdgpu_bo_backup_to_shadow(struct amdgpu_device *adev,
-  struct amdgpu_ring *ring,
-  struct amdgpu_bo *bo,
-  struct reservation_object *resv,
-  struct dma_fence **fence, bool direct);
 int amdgpu_bo_validate(struct amdgpu_bo *bo);
 int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow,
 struct dma_fence **fence);
-- 
2.17.1



[PATCH 05/11] drm/amdgpu: add IH ring to ih_get_wptr/ih_set_rptr v2

2018-11-30 Thread Christian König
Let's start to support multiple rings.

v2: decode IV is needed as well

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c  |  6 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h  | 13 +++---
 drivers/gpu/drm/amd/amdgpu/cik_ih.c | 29 +++--
 drivers/gpu/drm/amd/amdgpu/cz_ih.c  | 31 +++---
 drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 29 +++--
 drivers/gpu/drm/amd/amdgpu/si_ih.c  | 31 +++---
 drivers/gpu/drm/amd/amdgpu/tonga_ih.c   | 43 ++-
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c  | 56 +
 8 files changed, 128 insertions(+), 110 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
index 8af67f649660..fb8dd6179926 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
@@ -137,7 +137,7 @@ int amdgpu_ih_process(struct amdgpu_device *adev, struct 
amdgpu_ih_ring *ih,
if (!ih->enabled || adev->shutdown)
return IRQ_NONE;
 
-   wptr = amdgpu_ih_get_wptr(adev);
+   wptr = amdgpu_ih_get_wptr(adev, ih);
 
 restart_ih:
/* is somebody else already processing irqs? */
@@ -154,11 +154,11 @@ int amdgpu_ih_process(struct amdgpu_device *adev, struct 
amdgpu_ih_ring *ih,
ih->rptr &= ih->ptr_mask;
}
 
-   amdgpu_ih_set_rptr(adev);
+   amdgpu_ih_set_rptr(adev, ih);
atomic_set(&ih->lock, 0);
 
/* make sure wptr hasn't changed while processing */
-   wptr = amdgpu_ih_get_wptr(adev);
+   wptr = amdgpu_ih_get_wptr(adev, ih);
if (wptr != ih->rptr)
goto restart_ih;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
index f877bb78d10a..d810fd73d574 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
@@ -50,15 +50,16 @@ struct amdgpu_ih_ring {
 /* provided by the ih block */
 struct amdgpu_ih_funcs {
/* ring read/write ptr handling, called from interrupt context */
-   u32 (*get_wptr)(struct amdgpu_device *adev);
-   void (*decode_iv)(struct amdgpu_device *adev,
+   u32 (*get_wptr)(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih);
+   void (*decode_iv)(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih,
  struct amdgpu_iv_entry *entry);
-   void (*set_rptr)(struct amdgpu_device *adev);
+   void (*set_rptr)(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih);
 };
 
-#define amdgpu_ih_get_wptr(adev) (adev)->irq.ih_funcs->get_wptr((adev))
-#define amdgpu_ih_decode_iv(adev, iv) (adev)->irq.ih_funcs->decode_iv((adev), 
(iv))
-#define amdgpu_ih_set_rptr(adev) (adev)->irq.ih_funcs->set_rptr((adev))
+#define amdgpu_ih_get_wptr(adev, ih) (adev)->irq.ih_funcs->get_wptr((adev), 
(ih))
+#define amdgpu_ih_decode_iv(adev, iv) \
+   (adev)->irq.ih_funcs->decode_iv((adev), (ih), (iv))
+#define amdgpu_ih_set_rptr(adev, ih) (adev)->irq.ih_funcs->set_rptr((adev), 
(ih))
 
 int amdgpu_ih_ring_init(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih,
unsigned ring_size, bool use_bus_addr);
diff --git a/drivers/gpu/drm/amd/amdgpu/cik_ih.c 
b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
index 8a8b4967a101..884aa9b81e86 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
@@ -183,11 +183,12 @@ static void cik_ih_irq_disable(struct amdgpu_device *adev)
  * Used by cik_irq_process().
  * Returns the value of the wptr.
  */
-static u32 cik_ih_get_wptr(struct amdgpu_device *adev)
+static u32 cik_ih_get_wptr(struct amdgpu_device *adev,
+  struct amdgpu_ih_ring *ih)
 {
u32 wptr, tmp;
 
-   wptr = le32_to_cpu(adev->wb.wb[adev->irq.ih.wptr_offs]);
+   wptr = le32_to_cpu(adev->wb.wb[ih->wptr_offs]);
 
if (wptr & IH_RB_WPTR__RB_OVERFLOW_MASK) {
wptr &= ~IH_RB_WPTR__RB_OVERFLOW_MASK;
@@ -196,13 +197,13 @@ static u32 cik_ih_get_wptr(struct amdgpu_device *adev)
 * this should allow us to catchup.
 */
dev_warn(adev->dev, "IH ring buffer overflow (0x%08X, 0x%08X, 
0x%08X)\n",
-   wptr, adev->irq.ih.rptr, (wptr + 16) & 
adev->irq.ih.ptr_mask);
-   adev->irq.ih.rptr = (wptr + 16) & adev->irq.ih.ptr_mask;
+wptr, ih->rptr, (wptr + 16) & ih->ptr_mask);
+   ih->rptr = (wptr + 16) & ih->ptr_mask;
tmp = RREG32(mmIH_RB_CNTL);
tmp |= IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK;
WREG32(mmIH_RB_CNTL, tmp);
}
-   return (wptr & adev->irq.ih.ptr_mask);
+   return (wptr & ih->ptr_mask);
 }
 
 /*CIK IV Ring
@@ -237,16 +238,17 @@ static u32 cik_ih_get_wptr(struct amdgpu_device *adev)
  * position and also advance the position.
  */
 static void cik_ih_decode_iv(struct amdgpu_device *adev,
+struct am

[PATCH 09/11] drm/amdgpu: add support for processing IH ring 1 & 2

2018-11-30 Thread Christian König
Previously we only added the ring buffer memory, now add the handling as
well.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 33 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h |  4 ++-
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 127ed7de841d..64ec92bd74fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -176,6 +176,36 @@ irqreturn_t amdgpu_irq_handler(int irq, void *arg)
return ret;
 }
 
+/**
+ * amdgpu_irq_handle_ih1 - kick off processing for IH1
+ *
+ * @work: work structure in struct amdgpu_irq
+ *
+ * Kick off processing of IH ring 1.
+ */
+static void amdgpu_irq_handle_ih1(struct work_struct *work)
+{
+   struct amdgpu_device *adev = container_of(work, struct amdgpu_device,
+ irq.ih1_work);
+
+   amdgpu_ih_process(adev, &adev->irq.ih1, amdgpu_irq_callback);
+}
+
+/**
+ * amdgpu_irq_handle_ih2 - kick off processing for IH2
+ *
+ * @work: work structure in struct amdgpu_irq
+ *
+ * Kick off processing of IH ring 2.
+ */
+static void amdgpu_irq_handle_ih2(struct work_struct *work)
+{
+   struct amdgpu_device *adev = container_of(work, struct amdgpu_device,
+ irq.ih2_work);
+
+   amdgpu_ih_process(adev, &adev->irq.ih2, amdgpu_irq_callback);
+}
+
 /**
  * amdgpu_msi_ok - check whether MSI functionality is enabled
  *
@@ -240,6 +270,9 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
amdgpu_hotplug_work_func);
}
 
+   INIT_WORK(&adev->irq.ih1_work, amdgpu_irq_handle_ih1);
+   INIT_WORK(&adev->irq.ih2_work, amdgpu_irq_handle_ih2);
+
adev->irq.installed = true;
r = drm_irq_install(adev->ddev, adev->ddev->pdev->irq);
if (r) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
index 7e06fa64321a..c27decfda494 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
@@ -89,7 +89,9 @@ struct amdgpu_irq {
 
/* interrupt rings */
struct amdgpu_ih_ring   ih, ih1, ih2;
-   const struct amdgpu_ih_funcs*ih_funcs;
+   const struct amdgpu_ih_funcs*ih_funcs;
+   struct work_struct  ih1_work, ih2_work;
+   struct amdgpu_irq_src   self_irq;
 
/* gen irq stuff */
struct irq_domain   *domain; /* GPU irq controller domain */
-- 
2.17.1



[PATCH 06/11] drm/amdgpu: simplify IH programming

2018-11-30 Thread Christian König
Calculate all the addresses and pointers in amdgpu_ih.c

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c  | 34 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h  | 23 +---
 drivers/gpu/drm/amd/amdgpu/cik_ih.c |  9 +++
 drivers/gpu/drm/amd/amdgpu/cz_ih.c  | 11 
 drivers/gpu/drm/amd/amdgpu/iceland_ih.c |  9 +++
 drivers/gpu/drm/amd/amdgpu/si_ih.c  |  9 +++
 drivers/gpu/drm/amd/amdgpu/tonga_ih.c   | 27 +--
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c  | 36 +
 8 files changed, 73 insertions(+), 85 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
index fb8dd6179926..d0a5db777b6d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
@@ -52,6 +52,8 @@ int amdgpu_ih_ring_init(struct amdgpu_device *adev, struct 
amdgpu_ih_ring *ih,
ih->use_bus_addr = use_bus_addr;
 
if (use_bus_addr) {
+   dma_addr_t dma_addr;
+
if (ih->ring)
return 0;
 
@@ -59,21 +61,26 @@ int amdgpu_ih_ring_init(struct amdgpu_device *adev, struct 
amdgpu_ih_ring *ih,
 * add them to the end of the ring allocation.
 */
ih->ring = dma_alloc_coherent(adev->dev, ih->ring_size + 8,
- &ih->rb_dma_addr, GFP_KERNEL);
+ &dma_addr, GFP_KERNEL);
if (ih->ring == NULL)
return -ENOMEM;
 
memset((void *)ih->ring, 0, ih->ring_size + 8);
-   ih->wptr_offs = (ih->ring_size / 4) + 0;
-   ih->rptr_offs = (ih->ring_size / 4) + 1;
+   ih->gpu_addr = dma_addr;
+   ih->wptr_addr = dma_addr + ih->ring_size;
+   ih->wptr_cpu = &ih->ring[ih->ring_size / 4];
+   ih->rptr_addr = dma_addr + ih->ring_size + 4;
+   ih->rptr_cpu = &ih->ring[(ih->ring_size / 4) + 1];
} else {
-   r = amdgpu_device_wb_get(adev, &ih->wptr_offs);
+   unsigned wptr_offs, rptr_offs;
+
+   r = amdgpu_device_wb_get(adev, &wptr_offs);
if (r)
return r;
 
-   r = amdgpu_device_wb_get(adev, &ih->rptr_offs);
+   r = amdgpu_device_wb_get(adev, &rptr_offs);
if (r) {
-   amdgpu_device_wb_free(adev, ih->wptr_offs);
+   amdgpu_device_wb_free(adev, wptr_offs);
return r;
}
 
@@ -82,10 +89,15 @@ int amdgpu_ih_ring_init(struct amdgpu_device *adev, struct 
amdgpu_ih_ring *ih,
&ih->ring_obj, &ih->gpu_addr,
(void **)&ih->ring);
if (r) {
-   amdgpu_device_wb_free(adev, ih->rptr_offs);
-   amdgpu_device_wb_free(adev, ih->wptr_offs);
+   amdgpu_device_wb_free(adev, rptr_offs);
+   amdgpu_device_wb_free(adev, wptr_offs);
return r;
}
+
+   ih->wptr_addr = adev->wb.gpu_addr + wptr_offs * 4;
+   ih->wptr_cpu = &adev->wb.wb[wptr_offs];
+   ih->rptr_addr = adev->wb.gpu_addr + rptr_offs * 4;
+   ih->rptr_cpu = &adev->wb.wb[rptr_offs];
}
return 0;
 }
@@ -109,13 +121,13 @@ void amdgpu_ih_ring_fini(struct amdgpu_device *adev, 
struct amdgpu_ih_ring *ih)
 * add them to the end of the ring allocation.
 */
dma_free_coherent(adev->dev, ih->ring_size + 8,
- (void *)ih->ring, ih->rb_dma_addr);
+ (void *)ih->ring, ih->gpu_addr);
ih->ring = NULL;
} else {
amdgpu_bo_free_kernel(&ih->ring_obj, &ih->gpu_addr,
  (void **)&ih->ring);
-   amdgpu_device_wb_free(adev, ih->wptr_offs);
-   amdgpu_device_wb_free(adev, ih->rptr_offs);
+   amdgpu_device_wb_free(adev, (ih->wptr_addr - ih->gpu_addr) / 4);
+   amdgpu_device_wb_free(adev, (ih->rptr_addr - ih->gpu_addr) / 4);
}
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
index d810fd73d574..1ccb1831382a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
@@ -31,20 +31,25 @@ struct amdgpu_iv_entry;
  * R6xx+ IH ring
  */
 struct amdgpu_ih_ring {
-   struct amdgpu_bo*ring_obj;
-   volatile uint32_t   *ring;
-   unsignedrptr;
unsignedring_size;
-   uint64_tgpu_addr;
uint32_tptr_mask;
-   atomic_tlock;
-   b

[PATCH 11/11] drm/amdgpu: disable IH ring 1 & 2 WPTR overflow on Vega10

2018-11-30 Thread Christian König
That should add back pressure on the client.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c 
b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
index f5c5ea628fdf..dd7f52f08fd7 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
@@ -193,6 +193,8 @@ static int vega10_ih_irq_init(struct amdgpu_device *adev)
 
ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1);
ih_rb_cntl = vega10_ih_rb_cntl(ih, ih_rb_cntl);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
+  WPTR_OVERFLOW_ENABLE, 0);
WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1, ih_rb_cntl);
 
/* set rptr, wptr to 0 */
@@ -207,6 +209,8 @@ static int vega10_ih_irq_init(struct amdgpu_device *adev)
 
ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2);
ih_rb_cntl = vega10_ih_rb_cntl(ih, ih_rb_cntl);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2,
+  WPTR_OVERFLOW_ENABLE, 0);
WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2, ih_rb_cntl);
 
/* set rptr, wptr to 0 */
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 07/11] drm/amdgpu: enable IH ring 1 and ring 2 v2

2018-11-30 Thread Christian König
The entries are ignored for now, but it at least stops crashing the
hardware when somebody tries to push something to the other IH rings.

v2: limit ring size, add TODO comment

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h |   4 +-
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c  | 129 +++-
 2 files changed, 107 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
index f6ce171cb8aa..7e06fa64321a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
@@ -87,8 +87,8 @@ struct amdgpu_irq {
/* status, etc. */
boolmsi_enabled; /* msi enabled */
 
-   /* interrupt ring */
-   struct amdgpu_ih_ring   ih;
+   /* interrupt rings */
+   struct amdgpu_ih_ring   ih, ih1, ih2;
const struct amdgpu_ih_funcs*ih_funcs;
 
/* gen irq stuff */
diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c 
b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
index 3e9ebb0de94d..562fd036bb9e 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
@@ -50,6 +50,16 @@ static void vega10_ih_enable_interrupts(struct amdgpu_device 
*adev)
ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, ENABLE_INTR, 1);
WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL, ih_rb_cntl);
adev->irq.ih.enabled = true;
+
+   ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1, RB_ENABLE, 1);
+   WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1, ih_rb_cntl);
+   adev->irq.ih1.enabled = true;
+
+   ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2, RB_ENABLE, 1);
+   WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2, ih_rb_cntl);
+   adev->irq.ih2.enabled = true;
 }
 
 /**
@@ -71,6 +81,47 @@ static void vega10_ih_disable_interrupts(struct 
amdgpu_device *adev)
WREG32_SOC15(OSSSYS, 0, mmIH_RB_WPTR, 0);
adev->irq.ih.enabled = false;
adev->irq.ih.rptr = 0;
+
+   ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1, RB_ENABLE, 0);
+   WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1, ih_rb_cntl);
+   /* set rptr, wptr to 0 */
+   WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR_RING1, 0);
+   WREG32_SOC15(OSSSYS, 0, mmIH_RB_WPTR_RING1, 0);
+   adev->irq.ih1.enabled = false;
+   adev->irq.ih1.rptr = 0;
+
+   ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2, RB_ENABLE, 0);
+   WREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2, ih_rb_cntl);
+   /* set rptr, wptr to 0 */
+   WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR_RING2, 0);
+   WREG32_SOC15(OSSSYS, 0, mmIH_RB_WPTR_RING2, 0);
+   adev->irq.ih2.enabled = false;
+   adev->irq.ih2.rptr = 0;
+}
+
+static uint32_t vega10_ih_rb_cntl(struct amdgpu_ih_ring *ih, uint32_t 
ih_rb_cntl)
+{
+   int rb_bufsz = order_base_2(ih->ring_size / 4);
+
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
+  MC_SPACE, ih->use_bus_addr ? 1 : 4);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
+  WPTR_OVERFLOW_CLEAR, 1);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
+  WPTR_OVERFLOW_ENABLE, 1);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, RB_SIZE, rb_bufsz);
+   /* Ring Buffer write pointer writeback. If enabled, IH_RB_WPTR register
+* value is written to memory
+*/
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
+  WPTR_WRITEBACK_ENABLE, 1);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, MC_SNOOP, 1);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, MC_RO, 0);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, MC_VMID, 0);
+
+   return ih_rb_cntl;
 }
 
 /**
@@ -86,9 +137,8 @@ static void vega10_ih_disable_interrupts(struct 
amdgpu_device *adev)
  */
 static int vega10_ih_irq_init(struct amdgpu_device *adev)
 {
-   struct amdgpu_ih_ring *ih = &adev->irq.ih;
+   struct amdgpu_ih_ring *ih;
int ret = 0;
-   int rb_bufsz;
u32 ih_rb_cntl, ih_doorbell_rtpr;
u32 tmp;
 
@@ -97,26 +147,15 @@ static int vega10_ih_irq_init(struct amdgpu_device *adev)
 
adev->nbio_funcs->ih_control(adev);
 
-   ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL);
+   ih = &adev->irq.ih;
/* Ring Buffer base. [39:8] of 40-bit address of the beginning of the 
ring buffer*/
-   WREG32_SOC15(OSSSYS, 0, mmIH_RB_BASE, adev->irq.ih.gpu_addr >> 8);
-   WREG32_SOC15(OSSSYS, 0, mmIH_RB_BASE_HI,
-(adev->irq.ih.gpu_addr >> 40) & 0xff);

[PATCH 02/11] drm/amdgpu: send IVs to the KFD only after processing them v2

2018-11-30 Thread Christian König
This allows us to filter out VM faults in the GMC code.

v2: don't filter out all faults

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 29 +++--
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 6b6524f04ce0..6db4c58ddc13 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -149,9 +149,6 @@ static void amdgpu_irq_callback(struct amdgpu_device *adev,
if (!amdgpu_ih_prescreen_iv(adev))
return;
 
-   /* Before dispatching irq to IP blocks, send it to amdkfd */
-   amdgpu_amdkfd_interrupt(adev, (const void *) &ih->ring[ring_index]);
-
entry.iv_entry = (const uint32_t *)&ih->ring[ring_index];
amdgpu_ih_decode_iv(adev, &entry);
 
@@ -371,29 +368,31 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev,
unsigned client_id = entry->client_id;
unsigned src_id = entry->src_id;
struct amdgpu_irq_src *src;
+   bool handled = false;
int r;
 
trace_amdgpu_iv(entry);
 
if (client_id >= AMDGPU_IRQ_CLIENTID_MAX) {
-   DRM_DEBUG("Invalid client_id in IV: %d\n", client_id);
+   DRM_ERROR("Invalid client_id in IV: %d\n", client_id);
return;
}
 
if (src_id >= AMDGPU_MAX_IRQ_SRC_ID) {
-   DRM_DEBUG("Invalid src_id in IV: %d\n", src_id);
+   DRM_ERROR("Invalid src_id in IV: %d\n", src_id);
return;
}
 
if (adev->irq.virq[src_id]) {
generic_handle_irq(irq_find_mapping(adev->irq.domain, src_id));
-   } else {
-   if (!adev->irq.client[client_id].sources) {
-   DRM_DEBUG("Unregistered interrupt client_id: %d src_id: 
%d\n",
- client_id, src_id);
-   return;
-   }
+   return;
+   }
 
+   if (!adev->irq.client[client_id].sources) {
+   DRM_DEBUG("Unregistered interrupt client_id: %d src_id: %d\n",
+ client_id, src_id);
+   return;
+   } else {
src = adev->irq.client[client_id].sources[src_id];
if (!src) {
DRM_DEBUG("Unhandled interrupt src_id: %d\n", src_id);
@@ -401,9 +400,15 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev,
}
 
r = src->funcs->process(adev, src, entry);
-   if (r)
+   if (r < 0)
DRM_ERROR("error processing interrupt (%d)\n", r);
+   else if (r)
+   handled = true;
}
+
+   /* Send it to amdkfd as well if it isn't already handled */
+   if (!handled)
+   amdgpu_amdkfd_interrupt(adev, entry->iv_entry);
 }
 
 /**
-- 
2.17.1



[PATCH 10/11] drm/amdgpu: add support for self irq on Vega10

2018-11-30 Thread Christian König
This finally enables processing of ring 1 & 2.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 68 --
 1 file changed, 63 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c 
b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
index 562fd036bb9e..f5c5ea628fdf 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
@@ -258,7 +258,7 @@ static void vega10_ih_irq_disable(struct amdgpu_device 
*adev)
 static u32 vega10_ih_get_wptr(struct amdgpu_device *adev,
  struct amdgpu_ih_ring *ih)
 {
-   u32 wptr, tmp;
+   u32 wptr, reg, tmp;
 
wptr = le32_to_cpu(*ih->wptr_cpu);
 
@@ -274,9 +274,18 @@ static u32 vega10_ih_get_wptr(struct amdgpu_device *adev,
 wptr, ih->rptr, tmp);
ih->rptr = tmp;
 
-   tmp = RREG32_NO_KIQ(SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL));
+   if (ih == &adev->irq.ih)
+   reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL);
+   else if (ih == &adev->irq.ih1)
+   reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL_RING1);
+   else if (ih == &adev->irq.ih2)
+   reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL_RING2);
+   else
+   BUG();
+
+   tmp = RREG32_NO_KIQ(reg);
tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
-   WREG32_NO_KIQ(SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL), tmp);
+   WREG32_NO_KIQ(reg, tmp);
}
return (wptr & ih->ptr_mask);
 }
@@ -338,9 +347,52 @@ static void vega10_ih_set_rptr(struct amdgpu_device *adev,
/* XXX check if swapping is necessary on BE */
*ih->rptr_cpu = ih->rptr;
WDOORBELL32(ih->doorbell_index, ih->rptr);
-   } else {
+   } else if (ih == &adev->irq.ih) {
WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR, ih->rptr);
+   } else if (ih == &adev->irq.ih1) {
+   WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR_RING1, ih->rptr);
+   } else if (ih == &adev->irq.ih2) {
+   WREG32_SOC15(OSSSYS, 0, mmIH_RB_RPTR_RING2, ih->rptr);
+   }
+}
+
+/**
+ * vega10_ih_self_irq - dispatch work for ring 1 and 2
+ *
+ * @adev: amdgpu_device pointer
+ * @source: irq source
+ * @entry: IV with WPTR update
+ *
+ * Update the WPTR from the IV and schedule work to handle the entries.
+ */
+static int vega10_ih_self_irq(struct amdgpu_device *adev,
+ struct amdgpu_irq_src *source,
+ struct amdgpu_iv_entry *entry)
+{
+   uint32_t wptr = cpu_to_le32(entry->src_data[0]);
+
+   switch (entry->ring_id) {
+   case 1:
+   *adev->irq.ih1.wptr_cpu = wptr;
+   schedule_work(&adev->irq.ih1_work);
+   break;
+   case 2:
+   *adev->irq.ih2.wptr_cpu = wptr;
+   schedule_work(&adev->irq.ih2_work);
+   break;
+   default: break;
}
+   return 0;
+}
+
+static const struct amdgpu_irq_src_funcs vega10_ih_self_irq_funcs = {
+   .process = vega10_ih_self_irq,
+};
+
+static void vega10_ih_set_self_irq_funcs(struct amdgpu_device *adev)
+{
+   adev->irq.self_irq.num_types = 0;
+   adev->irq.self_irq.funcs = &vega10_ih_self_irq_funcs;
 }
 
 static int vega10_ih_early_init(void *handle)
@@ -348,13 +400,19 @@ static int vega10_ih_early_init(void *handle)
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
vega10_ih_set_interrupt_funcs(adev);
+   vega10_ih_set_self_irq_funcs(adev);
return 0;
 }
 
 static int vega10_ih_sw_init(void *handle)
 {
-   int r;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+   int r;
+
+   r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_IH, 0,
+ &adev->irq.self_irq);
+   if (r)
+   return r;
 
r = amdgpu_ih_ring_init(adev, &adev->irq.ih, 256 * 1024, true);
if (r)
-- 
2.17.1



[PATCH 01/11] drm/amdgpu: add missing error handling

2018-11-30 Thread Christian König
We ignored the return code here.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 3a4e5d8d5162..e329a23e1f99 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -902,6 +902,9 @@ static int gmc_v9_0_sw_init(void *handle)
/* This interrupt is VMC page fault.*/
r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_VMC, 
VMC_1_0__SRCID__VM_FAULT,
&adev->gmc.vm_fault);
+   if (r)
+   return r;
+
r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_UTCL2, 
UTCL2_1_0__SRCID__FAULT,
&adev->gmc.vm_fault);
 
-- 
2.17.1



[PATCH 04/11] drm/amdgpu: move IV prescreening into the GMC code

2018-11-30 Thread Christian König
The GMC/VM subsystem is causing the faults, so move the handling here as
well.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h  |  2 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c |  4 --
 drivers/gpu/drm/amd/amdgpu/cik_ih.c | 13 
 drivers/gpu/drm/amd/amdgpu/cz_ih.c  | 13 
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   | 59 ++
 drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 13 
 drivers/gpu/drm/amd/amdgpu/si_ih.c  | 14 -
 drivers/gpu/drm/amd/amdgpu/tonga_ih.c   | 13 
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c  | 82 -
 9 files changed, 59 insertions(+), 154 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
index 9ce8c93ec19b..f877bb78d10a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
@@ -51,14 +51,12 @@ struct amdgpu_ih_ring {
 struct amdgpu_ih_funcs {
/* ring read/write ptr handling, called from interrupt context */
u32 (*get_wptr)(struct amdgpu_device *adev);
-   bool (*prescreen_iv)(struct amdgpu_device *adev);
void (*decode_iv)(struct amdgpu_device *adev,
  struct amdgpu_iv_entry *entry);
void (*set_rptr)(struct amdgpu_device *adev);
 };
 
 #define amdgpu_ih_get_wptr(adev) (adev)->irq.ih_funcs->get_wptr((adev))
-#define amdgpu_ih_prescreen_iv(adev) (adev)->irq.ih_funcs->prescreen_iv((adev))
 #define amdgpu_ih_decode_iv(adev, iv) (adev)->irq.ih_funcs->decode_iv((adev), 
(iv))
 #define amdgpu_ih_set_rptr(adev) (adev)->irq.ih_funcs->set_rptr((adev))
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 6db4c58ddc13..318874952bc2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -145,10 +145,6 @@ static void amdgpu_irq_callback(struct amdgpu_device *adev,
u32 ring_index = ih->rptr >> 2;
struct amdgpu_iv_entry entry;
 
-   /* Prescreening of high-frequency interrupts */
-   if (!amdgpu_ih_prescreen_iv(adev))
-   return;
-
entry.iv_entry = (const uint32_t *)&ih->ring[ring_index];
amdgpu_ih_decode_iv(adev, &entry);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/cik_ih.c 
b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
index 3e6c8c4067cb..8a8b4967a101 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
@@ -228,18 +228,6 @@ static u32 cik_ih_get_wptr(struct amdgpu_device *adev)
  * [127:96] - reserved
  */
 
-/**
- * cik_ih_prescreen_iv - prescreen an interrupt vector
- *
- * @adev: amdgpu_device pointer
- *
- * Returns true if the interrupt vector should be further processed.
- */
-static bool cik_ih_prescreen_iv(struct amdgpu_device *adev)
-{
-   return true;
-}
-
  /**
  * cik_ih_decode_iv - decode an interrupt vector
  *
@@ -445,7 +433,6 @@ static const struct amd_ip_funcs cik_ih_ip_funcs = {
 
 static const struct amdgpu_ih_funcs cik_ih_funcs = {
.get_wptr = cik_ih_get_wptr,
-   .prescreen_iv = cik_ih_prescreen_iv,
.decode_iv = cik_ih_decode_iv,
.set_rptr = cik_ih_set_rptr
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/cz_ih.c 
b/drivers/gpu/drm/amd/amdgpu/cz_ih.c
index 447b3cbc47e5..9d3ea298e116 100644
--- a/drivers/gpu/drm/amd/amdgpu/cz_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/cz_ih.c
@@ -207,18 +207,6 @@ static u32 cz_ih_get_wptr(struct amdgpu_device *adev)
return (wptr & adev->irq.ih.ptr_mask);
 }
 
-/**
- * cz_ih_prescreen_iv - prescreen an interrupt vector
- *
- * @adev: amdgpu_device pointer
- *
- * Returns true if the interrupt vector should be further processed.
- */
-static bool cz_ih_prescreen_iv(struct amdgpu_device *adev)
-{
-   return true;
-}
-
 /**
  * cz_ih_decode_iv - decode an interrupt vector
  *
@@ -426,7 +414,6 @@ static const struct amd_ip_funcs cz_ih_ip_funcs = {
 
 static const struct amdgpu_ih_funcs cz_ih_funcs = {
.get_wptr = cz_ih_get_wptr,
-   .prescreen_iv = cz_ih_prescreen_iv,
.decode_iv = cz_ih_decode_iv,
.set_rptr = cz_ih_set_rptr
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index e329a23e1f99..8995fca47efb 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -244,6 +244,62 @@ static int gmc_v9_0_vm_fault_interrupt_state(struct 
amdgpu_device *adev,
return 0;
 }
 
+/**
+ * gmc_v9_0_prescreen_iv - prescreen an interrupt vector
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * Returns true if the interrupt vector should be further processed.
+ */
+static bool gmc_v9_0_prescreen_iv(struct amdgpu_device *adev,
+ struct amdgpu_iv_entry *entry,
+ uint64_t addr)
+{
+   struct amdgpu_vm *vm;
+   u64 key;
+   int r;
+
+   /* No PASID, can't identify faulting process */
+   if (!entry->pasid)
+   return true;

[PATCH 03/11] drm/amdgpu: remove VM fault_credit handling

2018-11-30 Thread Christian König
printk_ratelimit() is much better suited to limit the number of reported
VM faults.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 37 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  5 
 drivers/gpu/drm/amd/amdgpu/cik_ih.c | 18 +---
 drivers/gpu/drm/amd/amdgpu/cz_ih.c  | 18 +---
 drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 18 +---
 drivers/gpu/drm/amd/amdgpu/tonga_ih.c   | 18 +---
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c  |  7 ++---
 7 files changed, 6 insertions(+), 115 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 2acb9838913e..a2f149da83f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -3057,7 +3057,6 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
}
 
INIT_KFIFO(vm->faults);
-   vm->fault_credit = 16;
 
return 0;
 
@@ -3269,42 +3268,6 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct 
amdgpu_vm *vm)
amdgpu_vmid_free_reserved(adev, vm, i);
 }
 
-/**
- * amdgpu_vm_pasid_fault_credit - Check fault credit for given PASID
- *
- * @adev: amdgpu_device pointer
- * @pasid: PASID do identify the VM
- *
- * This function is expected to be called in interrupt context.
- *
- * Returns:
- * True if there was fault credit, false otherwise
- */
-bool amdgpu_vm_pasid_fault_credit(struct amdgpu_device *adev,
- unsigned int pasid)
-{
-   struct amdgpu_vm *vm;
-
-   spin_lock(&adev->vm_manager.pasid_lock);
-   vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
-   if (!vm) {
-   /* VM not found, can't track fault credit */
-   spin_unlock(&adev->vm_manager.pasid_lock);
-   return true;
-   }
-
-   /* No lock needed. only accessed by IRQ handler */
-   if (!vm->fault_credit) {
-   /* Too many faults in this VM */
-   spin_unlock(&adev->vm_manager.pasid_lock);
-   return false;
-   }
-
-   vm->fault_credit--;
-   spin_unlock(&adev->vm_manager.pasid_lock);
-   return true;
-}
-
 /**
  * amdgpu_vm_manager_init - init the VM manager
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 2a8898d19c8b..e8dcfd59fc93 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -229,9 +229,6 @@ struct amdgpu_vm {
/* Up to 128 pending retry page faults */
DECLARE_KFIFO(faults, u64, 128);
 
-   /* Limit non-retry fault storms */
-   unsigned intfault_credit;
-
/* Points to the KFD process VM info */
struct amdkfd_process_info *process_info;
 
@@ -299,8 +296,6 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
 int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm, 
unsigned int pasid);
 void amdgpu_vm_release_compute(struct amdgpu_device *adev, struct amdgpu_vm 
*vm);
 void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm);
-bool amdgpu_vm_pasid_fault_credit(struct amdgpu_device *adev,
- unsigned int pasid);
 void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
 struct list_head *validated,
 struct amdgpu_bo_list_entry *entry);
diff --git a/drivers/gpu/drm/amd/amdgpu/cik_ih.c 
b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
index b5775c6a857b..3e6c8c4067cb 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c
@@ -237,23 +237,7 @@ static u32 cik_ih_get_wptr(struct amdgpu_device *adev)
  */
 static bool cik_ih_prescreen_iv(struct amdgpu_device *adev)
 {
-   u32 ring_index = adev->irq.ih.rptr >> 2;
-   u16 pasid;
-
-   switch (le32_to_cpu(adev->irq.ih.ring[ring_index]) & 0xff) {
-   case 146:
-   case 147:
-   pasid = le32_to_cpu(adev->irq.ih.ring[ring_index + 2]) >> 16;
-   if (!pasid || amdgpu_vm_pasid_fault_credit(adev, pasid))
-   return true;
-   break;
-   default:
-   /* Not a VM fault */
-   return true;
-   }
-
-   adev->irq.ih.rptr += 16;
-   return false;
+   return true;
 }
 
  /**
diff --git a/drivers/gpu/drm/amd/amdgpu/cz_ih.c 
b/drivers/gpu/drm/amd/amdgpu/cz_ih.c
index df5ac4d85a00..447b3cbc47e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/cz_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/cz_ih.c
@@ -216,23 +216,7 @@ static u32 cz_ih_get_wptr(struct amdgpu_device *adev)
  */
 static bool cz_ih_prescreen_iv(struct amdgpu_device *adev)
 {
-   u32 ring_index = adev->irq.ih.rptr >> 2;
-   u16 pasid;
-
-   switch (le32_to_cpu(adev->irq.ih.ring[ring_index]) & 0xff) {
-   case 146:
-   case 147:
-   pasid = le32_to_cpu(adev->irq.ih.ring[ring_index + 2]) >> 16;
-

[PATCH 08/11] drm/amdgpu: add the IH to the IV trace

2018-11-30 Thread Christian König
To distinguish which IH ring an IV was found on.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c   |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 11 +++
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 318874952bc2..127ed7de841d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -148,6 +148,8 @@ static void amdgpu_irq_callback(struct amdgpu_device *adev,
entry.iv_entry = (const uint32_t *)&ih->ring[ring_index];
amdgpu_ih_decode_iv(adev, &entry);
 
+   trace_amdgpu_iv(ih - &adev->irq.ih, &entry);
+
amdgpu_irq_dispatch(adev, &entry);
 }
 
@@ -367,8 +369,6 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev,
bool handled = false;
int r;
 
-   trace_amdgpu_iv(entry);
-
if (client_id >= AMDGPU_IRQ_CLIENTID_MAX) {
DRM_ERROR("Invalid client_id in IV: %d\n", client_id);
return;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
index 626abca770a0..6e388d7753c1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
@@ -76,9 +76,10 @@ TRACE_EVENT(amdgpu_mm_wreg,
 );
 
 TRACE_EVENT(amdgpu_iv,
-   TP_PROTO(struct amdgpu_iv_entry *iv),
-   TP_ARGS(iv),
+   TP_PROTO(unsigned ih, struct amdgpu_iv_entry *iv),
+   TP_ARGS(ih, iv),
TP_STRUCT__entry(
+__field(unsigned, ih)
 __field(unsigned, client_id)
 __field(unsigned, src_id)
 __field(unsigned, ring_id)
@@ -90,6 +91,7 @@ TRACE_EVENT(amdgpu_iv,
 __array(unsigned, src_data, 4)
),
TP_fast_assign(
+  __entry->ih = ih;
   __entry->client_id = iv->client_id;
   __entry->src_id = iv->src_id;
   __entry->ring_id = iv->ring_id;
@@ -103,8 +105,9 @@ TRACE_EVENT(amdgpu_iv,
   __entry->src_data[2] = iv->src_data[2];
   __entry->src_data[3] = iv->src_data[3];
   ),
-   TP_printk("client_id:%u src_id:%u ring:%u vmid:%u timestamp: %llu 
pasid:%u src_data: %08x %08x %08x %08x",
- __entry->client_id, __entry->src_id,
+   TP_printk("ih: %u client_id:%u src_id:%u ring:%u vmid:%u "
+ "timestamp: %llu pasid:%u src_data: %08x %08x %08x %08x",
+ __entry->ih, __entry->client_id, __entry->src_id,
  __entry->ring_id, __entry->vmid,
  __entry->timestamp, __entry->pasid,
  __entry->src_data[0], __entry->src_data[1],
-- 
2.17.1



Re: [PATCH] drm/amdgpu: enlarge maximum waiting time of KIQ

2018-11-30 Thread Christian König

On 30.11.18 at 10:48, wentalou wrote:

SWDEV-171843: KIQ in VF’s init delayed by another VF’s reset.
late_init failed occasionally when it overlapped with another VF’s reset.
Enlarging MAX_KIQ_REG_TRY from 20 to 80 fixes this issue.

Change-Id: I841774bdd9ebf125c5aa2046b1dcebd65e07
Signed-off-by: wentalou 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index c8ad6bf..26e2455 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -235,7 +235,7 @@ enum amdgpu_kiq_irq {
  
  #define MAX_KIQ_REG_WAIT   5000 /* in usecs, 5ms */

  #define MAX_KIQ_REG_BAILOUT_INTERVAL   5 /* in msecs, 5ms */
-#define MAX_KIQ_REG_TRY 20
+#define MAX_KIQ_REG_TRY 80 /* SWDEV-171843: 20 -> 80 */


Please don't add any internal ticket numbers to public source code.

Christian.

  
  int amdgpu_device_ip_set_clockgating_state(void *dev,

   enum amd_ip_block_type block_type,




[PATCH] drm/amdgpu: enlarge maximum waiting time of KIQ

2018-11-30 Thread wentalou
SWDEV-171843: KIQ in VF’s init delayed by another VF’s reset.
late_init failed occasionally when it overlapped with another VF’s reset.
Enlarging MAX_KIQ_REG_TRY from 20 to 80 fixes this issue.

Change-Id: I841774bdd9ebf125c5aa2046b1dcebd65e07
Signed-off-by: wentalou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index c8ad6bf..26e2455 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -235,7 +235,7 @@ enum amdgpu_kiq_irq {
 
 #define MAX_KIQ_REG_WAIT   5000 /* in usecs, 5ms */
 #define MAX_KIQ_REG_BAILOUT_INTERVAL   5 /* in msecs, 5ms */
-#define MAX_KIQ_REG_TRY 20
+#define MAX_KIQ_REG_TRY 80 /* SWDEV-171843: 20 -> 80 */
 
 int amdgpu_device_ip_set_clockgating_state(void *dev,
   enum amd_ip_block_type block_type,
-- 
2.7.4



RE: [PATCH libdrm 4/5] wrap syncobj timeline query/wait APIs for amdgpu v3

2018-11-30 Thread Zhou, David(ChunMing)


> -Original Message-
> From: Christian König 
> Sent: Friday, November 30, 2018 5:15 PM
> To: Zhou, David(ChunMing) ; dri-
> de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH libdrm 4/5] wrap syncobj timeline query/wait APIs for
> amdgpu v3
> 
[snip]
> >> +drm_public int amdgpu_cs_syncobj_query(amdgpu_device_handle dev,
> >> +   uint32_t *handles, uint64_t *points,
> > This interface is public to the UMD; I think they would like "uint64_t
> > **points" for batch query. I've verified this before, it works well and
> > is more convenient.
> > If num_handles were removed, that would mean only one syncobj to query,
> > and then I would agree with "uint64_t *point".
> 
> "handles" as well as "points" are an array of objects. If the UMD wants to
> write the points to separate locations it can do so manually after calling the
> function.

Ok, it doesn't matter.

-David
> 
> It doesn't make any sense that libdrm or the kernel does the extra
> indirection, the transferred pointers are 64bit as well (even on a 32bit
> system) so the overhead is identical.
> 
> Adding another indirection just makes the implementation unnecessary
> complex.


> 
> Christian.
> 
> >
> > -David
> >> +   unsigned num_handles) {
> >> +    if (NULL == dev)
> >> +    return -EINVAL;
> >> +
> >> +    return drmSyncobjQuery(dev->fd, handles, points, num_handles); }
> >> +
> >>   drm_public int amdgpu_cs_export_syncobj(amdgpu_device_handle dev,
> >>   uint32_t handle,
> >>   int *shared_fd)
> >



Re: [PATCH libdrm 4/5] wrap syncobj timeline query/wait APIs for amdgpu v3

2018-11-30 Thread Christian König

On 30.11.18 at 08:35, zhoucm1 wrote:



On 2018-11-28 22:50, Christian König wrote:

From: Chunming Zhou 

v2: symbols are stored in lexical order.
v3: drop export/import and extra query indirection

Signed-off-by: Chunming Zhou 
Signed-off-by: Christian König 
---
  amdgpu/amdgpu-symbol-check |  2 ++
  amdgpu/amdgpu.h    | 39 
+++

  amdgpu/amdgpu_cs.c | 23 +++
  3 files changed, 64 insertions(+)

diff --git a/amdgpu/amdgpu-symbol-check b/amdgpu/amdgpu-symbol-check
index 6f5e0f95..4553736f 100755
--- a/amdgpu/amdgpu-symbol-check
+++ b/amdgpu/amdgpu-symbol-check
@@ -49,8 +49,10 @@ amdgpu_cs_submit
  amdgpu_cs_submit_raw
  amdgpu_cs_syncobj_export_sync_file
  amdgpu_cs_syncobj_import_sync_file
+amdgpu_cs_syncobj_query
  amdgpu_cs_syncobj_reset
  amdgpu_cs_syncobj_signal
+amdgpu_cs_syncobj_timeline_wait
  amdgpu_cs_syncobj_wait
  amdgpu_cs_wait_fences
  amdgpu_cs_wait_semaphore
diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index dc51659a..330658a0 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1489,6 +1489,45 @@ int 
amdgpu_cs_syncobj_wait(amdgpu_device_handle dev,

 int64_t timeout_nsec, unsigned flags,
 uint32_t *first_signaled);
  +/**
+ *  Wait for one or all sync objects on their points to signal.
+ *
+ * \param   dev    - \c [in] self-explanatory
+ * \param   handles - \c [in] array of sync object handles
+ * \param   points - \c [in] array of sync points to wait
+ * \param   num_handles - \c [in] self-explanatory
+ * \param   timeout_nsec - \c [in] self-explanatory
+ * \param   flags   - \c [in] a bitmask of DRM_SYNCOBJ_WAIT_FLAGS_*
+ * \param   first_signaled - \c [in] self-explanatory
+ *
+ * \return   0 on success\n
+ *  -ETIME - Timeout
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_timeline_wait(amdgpu_device_handle dev,
+    uint32_t *handles, uint64_t *points,
+    unsigned num_handles,
+    int64_t timeout_nsec, unsigned flags,
+    uint32_t *first_signaled);
+/**
+ *  Query sync objects payloads.
+ *
+ * \param   dev    - \c [in] self-explanatory
+ * \param   handles - \c [in] array of sync object handles
+ * \param   points - \c [out] array of sync points returned, which 
presents

+ * syncobj payload.
+ * \param   num_handles - \c [in] self-explanatory
+ *
+ * \return   0 on success\n
+ *  -ETIME - Timeout
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_query(amdgpu_device_handle dev,
+    uint32_t *handles, uint64_t *points,
+    unsigned num_handles);
+
  /**
   *  Export kernel sync object to shareable fd.
   *
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 3b8231aa..e4a547c6 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -661,6 +661,29 @@ drm_public int 
amdgpu_cs_syncobj_wait(amdgpu_device_handle dev,

    flags, first_signaled);
  }
  +drm_public int 
amdgpu_cs_syncobj_timeline_wait(amdgpu_device_handle dev,

+   uint32_t *handles, uint64_t *points,
+   unsigned num_handles,
+   int64_t timeout_nsec, unsigned flags,
+   uint32_t *first_signaled)
+{
+    if (NULL == dev)
+    return -EINVAL;
+
+    return drmSyncobjTimelineWait(dev->fd, handles, points, 
num_handles,

+  timeout_nsec, flags, first_signaled);
+}
+
+drm_public int amdgpu_cs_syncobj_query(amdgpu_device_handle dev,
+   uint32_t *handles, uint64_t *points,
This interface is public to the UMD; I think they would like "uint64_t 
**points" for batch query. I've verified this before, it works well and 
is more convenient.
If num_handles were removed, that would mean only one syncobj to query, 
and then I would agree with "uint64_t *point".


"handles" as well as "points" are an array of objects. If the UMD wants 
to write the points to separate locations it can do so manually after 
calling the function.


It doesn't make any sense that libdrm or the kernel does the extra 
indirection, the transferred pointers are 64bit as well (even on a 32bit 
system) so the overhead is identical.


Adding another indirection just makes the implementation unnecessary 
complex.


Christian.



-David

+   unsigned num_handles)
+{
+    if (NULL == dev)
+    return -EINVAL;
+
+    return drmSyncobjQuery(dev->fd, handles, points, num_handles);
+}
+
  drm_public int amdgpu_cs_export_syncobj(amdgpu_device_handle dev,
  uint32_t handle,
  int *shared_fd)




___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: add VCN JPEG support amdgpu_ctx_num_entities

2018-11-30 Thread Christian König

Am 30.11.18 um 03:00 schrieb Alex Deucher:

Looks like it was missed when setting support was added.

Signed-off-by: Alex Deucher 


Reviewed-by: Christian König 


---

This is a legit bug fix.  The rest of this series needs more work.

drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index f9b54236102d..95f4c4139fc6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -39,6 +39,7 @@ const unsigned int amdgpu_ctx_num_entities[AMDGPU_HW_IP_NUM] = {
[AMDGPU_HW_IP_UVD_ENC]  =   1,
[AMDGPU_HW_IP_VCN_DEC]  =   1,
[AMDGPU_HW_IP_VCN_ENC]  =   1,
+   [AMDGPU_HW_IP_VCN_JPEG] =   1,
  };
  
  static int amdgput_ctx_total_num_entities(void)




Re: [PATCH 1/2] drm/amdgpu: Handle xgmi device removal and add reset wq.

2018-11-30 Thread Christian König

Am 29.11.18 um 21:36 schrieb Andrey Grodzovsky:

XGMI hive has some resources allocated on device init which need
to be deallocated when the device is unregistered.

Add a per-hive wq to allow all the nodes in the hive to run resets
concurrently; this should speed up the total reset time to avoid
breaching the PSP FW timeout.


Do you really want a workqueue? That sounds like the exact opposite of 
what you describe here.


E.g. a workqueue is there to serialize all work items scheduled to it.

What you most likely want instead is a work item per device; these are 
scheduled, run in parallel, and are joined back together before dropping 
the per-hive lock.


Christian.



Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 24 ++++++++++++++++++++++++
  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h |  3 +++
  2 files changed, 27 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index fb37e69..9ac2dc5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -61,6 +61,8 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev)
INIT_LIST_HEAD(&tmp->device_list);
mutex_init(&tmp->hive_lock);
  
+	tmp->reset_queue = alloc_workqueue("xgmi-hive", WQ_UNBOUND | WQ_HIGHPRI, 0);
+
return tmp;
  }
  
@@ -135,3 +137,25 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)

mutex_unlock(&xgmi_mutex);
return ret;
  }
+
+void amdgpu_xgmi_remove_device(struct amdgpu_device *adev)
+{
+   struct amdgpu_hive_info *hive;
+
+   if ((adev->asic_type < CHIP_VEGA20) || (adev->flags & AMD_IS_APU))
+   return;
+
+   mutex_lock(&xgmi_mutex);
+
+   hive = amdgpu_get_xgmi_hive(adev);
+   if (!hive)
+   goto exit;
+
+   if (!(hive->number_devices--)) {
+   mutex_destroy(&hive->hive_lock);
+   destroy_workqueue(hive->reset_queue);
+   }
+
+exit:
+   mutex_unlock(&xgmi_mutex);
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
index 6335bfd..285ab93 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
@@ -30,10 +30,13 @@ struct amdgpu_hive_info {
struct psp_xgmi_topology_info   topology_info;
int number_devices;
struct mutex hive_lock;
+   /* hive members reset wq */
+   struct workqueue_struct *reset_queue;
  };
  
  struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev);

  int amdgpu_xgmi_update_topology(struct amdgpu_hive_info *hive, struct amdgpu_device *adev);
  int amdgpu_xgmi_add_device(struct amdgpu_device *adev);
+void amdgpu_xgmi_remove_device(struct amdgpu_device *adev);
  
  #endif

