RE: [PATCH 4/8] drm/amdgpu: convert kiq ring_mutex to a spinlock

2017-05-08 Thread Liu, Monk
>Can you explain your reasoning behind your current position that the KIQ >shouldn't be used by baremetal amdgpu? [ML] I didn't mean KIQ shouldn't leveraged by bare-metal, instead how it is used by bare-metal is none of my interest ... I mean it better not be used under SR-IOV case by other

Re: [RFC] Problems with SRBM select on KIQ

2017-05-08 Thread zhoucm1
On 2017-05-06 06:57, Felix Kuehling wrote: We ran into a similar problem when we played with priorities on KFD queues. You can't change an MQD of a currently mapped queue. To change a queue priority we need to unmap it, update the MQD, and then map it again. I wonder if there is similar

Re: Soliciting DRM feedback on latest DC rework

2017-05-08 Thread Harry Wentland
On 2017-05-08 03:07 PM, Dave Airlie wrote: On 9 May 2017 at 04:54, Harry Wentland wrote: Hi Daniel, Thanks for taking the time to look at DC. I had a couple more questions/comments in regard to the patch you posted on IRC: http://paste.debian.net/plain/930704 My

Re: [PATCH] drm/amdgpu: remove unsed amdgpu_gem_handle_lockup

2017-05-08 Thread Alex Deucher
On Mon, May 8, 2017 at 9:25 AM, Christian König wrote: > From: Christian König > > This kind of reset handling was removed a long time ago. > > Signed-off-by: Christian König Reviewed-by: Alex Deucher

Re: [PATCH 1/2] drm/amdgpu/atomfirmware: add function to update engine hang status

2017-05-08 Thread Alex Deucher
On Fri, May 5, 2017 at 10:27 AM, Alex Deucher wrote: > Update the scratch reg for when the engine is hung. > > Signed-off-by: Alex Deucher ping on this series. Alex > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c | 13 + >

Re: [PATCH] iommu/amd: flush IOTLB for specific domains only

2017-05-08 Thread Daniel Drake
On Wed, Apr 5, 2017 at 9:01 AM, Nath, Arindam wrote: > > >-Original Message- > >From: Daniel Drake [mailto:dr...@endlessm.com] > >Sent: Thursday, March 30, 2017 7:15 PM > >To: Nath, Arindam > >Cc: j...@8bytes.org; Deucher, Alexander; Bridgman, John; amd- >

[PATCH] gpu: drm: amd: amdgpu: remove dead code

2017-05-08 Thread Gustavo A. R. Silva
Local variable use_doorbell is assigned to a constant value and it is never updated again. Remove this variable and the dead code it guards. Addresses-Coverity-ID: 1401828 Signed-off-by: Gustavo A. R. Silva --- drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 53

RE: [PATCH] drm/amdgpu: fix errors in comments.

2017-05-08 Thread Deucher, Alexander
> -Original Message- > From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf > Of Alex Xie > Sent: Monday, May 08, 2017 11:32 AM > To: amd-gfx@lists.freedesktop.org > Cc: Xie, AlexBin > Subject: [PATCH] drm/amdgpu: fix errors in comments. > > Signed-off-by: Alex Xie

Re: [PATCH] drm/amdgpu/gfx6: flush caches after IB with the correct vmid

2017-05-08 Thread Nicolai Hähnle
Unfortunately, further testing shows that this doesn't actually fix the problem. FWIW, that test runs very reliably on SI with the radeon drm, but with the amdgpu drm it fails. VI is fine on amdgpu, which is why I was sent down this road. Anyway, back to trying to figure this out :/ Cheers,

RE: [PATCH 4/8] drm/amdgpu: convert kiq ring_mutex to a spinlock

2017-05-08 Thread Andres Rodriguez
On 2017-05-08 02:08 AM, Liu, Monk wrote: > Andres > > Some previous patches like move KIQ mutex-lock from amdgpu_virt to common > place jumped my NAK, but from technique perspective it's no matter anyway, > But this patch and the following patches are go to a dead end, > > 1, Don't use KIQ

[PATCH] drm/amdgpu: fix errors in comments.

2017-05-08 Thread Alex Xie
Signed-off-by: Alex Xie --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 66bb60e..aab3206 100644 ---

Re: [PATCH 4/4] drm/amdgpu/SRIOV:implement guilty job TDR (V2)

2017-05-08 Thread Christian König
On 2017-05-08 at 09:01, Liu, Monk wrote: @Christian This one is changed to guilty job scheme accordingly with your response BR Monk -Original Message- From: Monk Liu [mailto:monk@amd.com] Sent: Monday, May 08, 2017 3:00 PM To: amd-gfx@lists.freedesktop.org Cc: Liu, Monk

Re: [PATCH 1/4] drm/amdgpu:don't invoke srio-gpu-reset in gpu-reset

2017-05-08 Thread Christian König
Because we can always rely on TDR and HYPERVISOR to detect GPU hang and resubmit malicious jobs or even kick them out later, and the gpu reset will eventually be invoked, so there is no reason to manually and voluntarily call gpu reset under SRIOV case. Well there is a rather good reason, we

RE: [PATCH 1/4] drm/amdgpu:don't invoke srio-gpu-reset in gpu-reset

2017-05-08 Thread Liu, Monk
The VM fault interrupt or illegal instruction will be delivered to GPU no matter it's SR-IOV or bare-metal case, And I removed them from invoking GPU reset is due to the same reason: Don't trigger gpu reset for sriov case if possible, always beware that trigger GPU reset under SR-IOV is a heavy

Re: [PATCH 1/4] drm/amdgpu:don't invoke srio-gpu-reset in gpu-reset

2017-05-08 Thread Christian König
Sounds good, but what do we do with the amdgpu_irq_reset_work_func? Please note that I find that calling amdgpu_gpu_reset() here is a bad idea in the first place. Instead we should consider the scheduler as faulting and let the scheduler handle that as in the same way as a job timeout. But

Re: [PATCH] drm/amdgpu:no debugfs_gpu_reset for SRIOV

2017-05-08 Thread Christian König
On 2017-05-08 at 11:28, Monk Liu wrote: Change-Id: Ie9730852da54ceb8b4c2c44acac2df3556a32d17 Signed-off-by: Monk Liu --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git

[PATCH] drm/amdgpu/gfx6: flush caches after IB with the correct vmid

2017-05-08 Thread Nicolai Hähnle
From: Nicolai Hähnle Bring the code in line with what the radeon module does. Without this change, the fence following the IB may be signalled to the CPU even though some data written by shaders may not have been written back yet. This change fixes the OpenGL CTS test

[PATCH] drm/amdgpu:no debugfs_gpu_reset for SRIOV

2017-05-08 Thread Monk Liu
Change-Id: Ie9730852da54ceb8b4c2c44acac2df3556a32d17 Signed-off-by: Monk Liu --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c

RE: [PATCH 1/4] drm/amdgpu:don't invoke srio-gpu-reset in gpu-reset

2017-05-08 Thread Liu, Monk
I agree with disabling debugfs for amdgpu_reset when SRIOV detected. -Original Message- From: Christian König [mailto:deathsim...@vodafone.de] Sent: Monday, May 08, 2017 5:20 PM To: Liu, Monk ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 1/4] drm/amdgpu:don't

Re: [PATCH 1/4] drm/amdgpu:don't invoke srio-gpu-reset in gpu-reset

2017-05-08 Thread Christian König
You know that gpu reset under SR-IOV will have very big impact on all other VFs ... Mhm, good argument. But in this case we need to give at least some warning message instead of doing nothing. Or even better disable creating the amdgpu_reste debugfs file altogether. This way nobody will

RE: [PATCH 3/4] drm/amdgpu:only call flr_work under infinite timeout

2017-05-08 Thread Liu, Monk
yeah my mistake, thanks for catch -Original Message- From: Christian König [mailto:deathsim...@vodafone.de] Sent: Monday, May 08, 2017 5:11 PM To: Liu, Monk ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 3/4] drm/amdgpu:only call flr_work under infinite timeout

Re: [PATCH 3/4] drm/amdgpu:only call flr_work under infinite timeout

2017-05-08 Thread Christian König
On 2017-05-08 at 08:51, Monk Liu wrote: Change-Id: I541aa5109f4fcab06ece4761a09dc7e053ec6837 Signed-off-by: Monk Liu --- drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) diff --git

RE: [PATCH 1/4] drm/amdgpu:don't invoke srio-gpu-reset in gpu-reset

2017-05-08 Thread Liu, Monk
For SR-IOV use case, we call gpu reset under the case we have no choice ... So many places like debug fs shouldn't a good reason to trigger gpu reset You know that gpu reset under SR-IOV will have very big impact on all other VFs ... BR Monk -Original Message- From: Christian König

Re: [PATCH 2/4] drm/amdgpu:use job* to replace voluntary

2017-05-08 Thread Christian König
On 2017-05-08 at 08:51, Monk Liu wrote: that way we can know which job cause hang and can do per sched reset/recovery instead of all sched. Change-Id: Ifc98cd74b2d93823c489de6a89087ba188957eff Signed-off-by: Monk Liu Reviewed-by: Christian König

Re: [PATCH 1/4] drm/amdgpu:don't invoke srio-gpu-reset in gpu-reset

2017-05-08 Thread Christian König
On 2017-05-08 at 08:51, Monk Liu wrote: because we don't want to do sriov-gpu-reset under certain cases, so just split those two functions and don't invoke sr-iov one from bare-metal one. Change-Id: I641126c241e2ee2dfd54e6d16c389b159f99cfe0 Signed-off-by: Monk Liu ---

[PATCH 4/4] drm/amdgpu/SRIOV:implement guilty job TDR (V2)

2017-05-08 Thread Liu, Monk
@Christian This one is changed to guilty job scheme accordingly with your response BR Monk -Original Message- From: Monk Liu [mailto:monk@amd.com] Sent: Monday, May 08, 2017 3:00 PM To: amd-gfx@lists.freedesktop.org Cc: Liu, Monk Subject: [PATCH]

[PATCH] drm/amdgpu/SRIOV:implement guilty job TDR (V2)

2017-05-08 Thread Monk Liu
1. TDR will kick out a guilty job if its hang count exceeds the threshold set by the kernel parameter "job_hang_limit"; that way a bad command stream will not cause GPU hangs indefinitely. By default this threshold is 1, so a job will be kicked out after it hangs. 2. If a job times out, the TDR routine will not

[PATCH 1/4] drm/amdgpu:don't invoke srio-gpu-reset in gpu-reset

2017-05-08 Thread Monk Liu
because we don't want to do sriov-gpu-reset under certain cases, so just split those two functions and don't invoke sr-iov one from bare-metal one. Change-Id: I641126c241e2ee2dfd54e6d16c389b159f99cfe0 Signed-off-by: Monk Liu --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ---

[PATCH 0/4] TDR guilty job feature

2017-05-08 Thread Monk Liu
for SRIOV gpu reset: this feature allows driver to judge how much time can a job hang for and will kickout this job from ring_mirror list when doing recover if the threshold is exceeded. Monk Liu (4): drm/amdgpu:don't invoke srio-gpu-reset in gpu-reset drm/amdgpu:use job* to replace

RE: [PATCH 4/8] drm/amdgpu: convert kiq ring_mutex to a spinlock

2017-05-08 Thread Liu, Monk
Andres Some previous patches like move KIQ mutex-lock from amdgpu_virt to common place jumped my NAK, but from technique perspective it's no matter anyway, But this patch and the following patches are go to a dead end, 1, Don't use KIQ to access register inside INTR context 2, Don't