Re: [amdgpu] deadlock

2021-02-03 Thread Daniel Vetter
expect it'll come and bite us pretty bad. > > Fundamentally you can't have indefinite fences with implicit synced > CS, and long running jobs is just one of these indefinite fences. Adding Jason, he's been involved in a lot of the indefinite fence discussion, especially around vk. -Daniel

Re: [amdgpu] deadlock

2021-02-03 Thread Daniel Vetter
> John > > > Original Message > > > From: Daniel Vetter > > > Sent: Wednesday, February 3, 2021 9:27 AM > > > To: Alex Deucher > > > Cc: Linux Kernel Mailing List; dri-devel; amd-gfx list; Deucher, > > > Alexander; Daniel Gomez; Koenig, Christian

Re: [amdgpu] deadlock

2021-02-03 Thread Alex Deucher
y, February 3, 2021 9:27 AM > > To: Alex Deucher > > Cc: Linux Kernel Mailing List; dri-devel; amd-gfx list; Deucher, Alexander; > > Daniel Gomez; Koenig, Christian > > Subject: Re: [amdgpu] deadlock > > > > > > On Wed, Feb 03, 2021 at 08:56:17AM -0500, Al

Re: [amdgpu] deadlock

2021-02-03 Thread Daniel Gomez
On Wed, 3 Feb 2021 at 15:37, Christian König wrote: > > Hi Daniel, > > I've talked a bit with our internal team. > > The problem is that the 20.20 release still uses the older OpenCL stack > which obviously has a bug here and causes a hang. > > The best approach I can give you is to switch to the

Re: [amdgpu] deadlock

2021-02-03 Thread Daniel Vetter
Mailing List; dri-devel; amd-gfx list; Deucher, Alexander; > Daniel Gomez; Koenig, Christian > Subject: Re: [amdgpu] deadlock > > > On Wed, Feb 03, 2021 at 08:56:17AM -0500, Alex Deucher wrote: > > On Wed, Feb 3, 2021 at 7:30 AM Christian König > > wrote: > >

Re: [amdgpu] deadlock

2021-02-03 Thread Christian König
Hi Daniel, I've talked a bit with our internal team. The problem is that the 20.20 release still uses the older OpenCL stack which obviously has a bug here and causes a hang. The best approach I can give you is to switch to the ROCm stack instead. Regards, Christian. Am 03.02.21 um 09:33 sc

Re: [amdgpu] deadlock

2021-02-03 Thread Bridgman, John
m: Daniel Vetter Sent: Wednesday, February 3, 2021 9:27 AM To: Alex Deucher Cc: Linux Kernel Mailing List; dri-devel; amd-gfx list; Deucher, Alexander; Daniel Gomez; Koenig, Christian Subject: Re: [amdgpu] deadlock On Wed, Feb 03, 2021 at 08:56:17AM -0500, Alex Deucher wrote: > On Wed, Feb 3, 2021 at

Re: [amdgpu] deadlock

2021-02-03 Thread Daniel Vetter
On Wed, Feb 03, 2021 at 08:56:17AM -0500, Alex Deucher wrote: > On Wed, Feb 3, 2021 at 7:30 AM Christian König > wrote: > > > > On 03.02.21 at 13:24, Daniel Vetter wrote: > > > On Wed, Feb 03, 2021 at 01:21:20PM +0100, Christian König wrote: > > >> On 03.02.21 at 12:45, Daniel Gomez wrote: > >

Re: [amdgpu] deadlock

2021-02-03 Thread Alex Deucher
On Wed, Feb 3, 2021 at 7:30 AM Christian König wrote: > > On 03.02.21 at 13:24, Daniel Vetter wrote: > > On Wed, Feb 03, 2021 at 01:21:20PM +0100, Christian König wrote: > >> On 03.02.21 at 12:45, Daniel Gomez wrote: > >>> On Wed, 3 Feb 2021 at 10:47, Daniel Gomez wrote: > On Wed, 3 Feb 20

Re: [amdgpu] deadlock

2021-02-03 Thread Christian König
On 03.02.21 at 13:24, Daniel Vetter wrote: On Wed, Feb 03, 2021 at 01:21:20PM +0100, Christian König wrote: On 03.02.21 at 12:45, Daniel Gomez wrote: On Wed, 3 Feb 2021 at 10:47, Daniel Gomez wrote: On Wed, 3 Feb 2021 at 10:17, Daniel Vetter wrote: On Wed, Feb 3, 2021 at 9:51 AM Christian

Re: [amdgpu] deadlock

2021-02-03 Thread Daniel Vetter
On Wed, Feb 03, 2021 at 01:21:20PM +0100, Christian König wrote: > On 03.02.21 at 12:45, Daniel Gomez wrote: > > On Wed, 3 Feb 2021 at 10:47, Daniel Gomez wrote: > > > On Wed, 3 Feb 2021 at 10:17, Daniel Vetter wrote: > > > > On Wed, Feb 3, 2021 at 9:51 AM Christian König > > > > wrote: > > >

Re: [amdgpu] deadlock

2021-02-03 Thread Christian König
On 03.02.21 at 12:45, Daniel Gomez wrote: On Wed, 3 Feb 2021 at 10:47, Daniel Gomez wrote: On Wed, 3 Feb 2021 at 10:17, Daniel Vetter wrote: On Wed, Feb 3, 2021 at 9:51 AM Christian König wrote: On 03.02.21 at 09:48, Daniel Vetter wrote: On Wed, Feb 3, 2021 at 9:36 AM Christian König wr

Re: [amdgpu] deadlock

2021-02-03 Thread Daniel Gomez
On Wed, 3 Feb 2021 at 10:47, Daniel Gomez wrote: > > On Wed, 3 Feb 2021 at 10:17, Daniel Vetter wrote: > > > > On Wed, Feb 3, 2021 at 9:51 AM Christian König > > wrote: > > > > > > On 03.02.21 at 09:48, Daniel Vetter wrote: > > > > On Wed, Feb 3, 2021 at 9:36 AM Christian König > > > > wrote

Re: [amdgpu] deadlock

2021-02-03 Thread Daniel Gomez
On Wed, 3 Feb 2021 at 10:17, Daniel Vetter wrote: > > On Wed, Feb 3, 2021 at 9:51 AM Christian König > wrote: > > > > On 03.02.21 at 09:48, Daniel Vetter wrote: > > > On Wed, Feb 3, 2021 at 9:36 AM Christian König > > > wrote: > > >> Hi Daniel, > > >> > > >> this is not a deadlock, but rather

Re: [amdgpu] deadlock

2021-02-03 Thread Daniel Vetter
On Wed, Feb 3, 2021 at 9:51 AM Christian König wrote: > > On 03.02.21 at 09:48, Daniel Vetter wrote: > > On Wed, Feb 3, 2021 at 9:36 AM Christian König > > wrote: > >> Hi Daniel, > >> > >> this is not a deadlock, but rather a hardware lockup. > > Are you sure? IME, getting stuck in dma_fence_wai

Re: [amdgpu] deadlock

2021-02-03 Thread Daniel Gomez
On Wed, 3 Feb 2021 at 09:51, Christian König wrote: > > On 03.02.21 at 09:48, Daniel Vetter wrote: > > On Wed, Feb 3, 2021 at 9:36 AM Christian König > > wrote: > >> Hi Daniel, > >> > >> this is not a deadlock, but rather a hardware lockup. > > Are you sure? IME, getting stuck in dma_fence_wait

Re: [amdgpu] deadlock

2021-02-03 Thread Christian König
On 03.02.21 at 09:48, Daniel Vetter wrote: On Wed, Feb 3, 2021 at 9:36 AM Christian König wrote: Hi Daniel, this is not a deadlock, but rather a hardware lockup. Are you sure? IME, getting stuck in dma_fence_wait generally has a good chance of being a dma_fence deadlock. A GPU hang should never r

Re: [amdgpu] deadlock

2021-02-03 Thread Daniel Vetter
On Wed, Feb 3, 2021 at 9:36 AM Christian König wrote: > > Hi Daniel, > > this is not a deadlock, but rather a hardware lockup. Are you sure? IME, getting stuck in dma_fence_wait generally has a good chance of being a dma_fence deadlock. A GPU hang should never result in a forever stuck dma_fence. Dan

Re: [amdgpu] deadlock

2021-02-03 Thread Christian König
Hi Daniel, this is not a deadlock, but rather a hardware lockup. Which OpenCL stack are you using? Regards, Christian. On 03.02.21 at 09:33, Daniel Gomez wrote: Hi all, I have a deadlock with the amdgpu mainline driver when running two OpenCL applications in parallel. So far, we've been abl

[amdgpu] deadlock

2021-02-03 Thread Daniel Gomez
Hi all, I have a deadlock with the amdgpu mainline driver when running two OpenCL applications in parallel. So far, we've been able to replicate it easily by executing clinfo and MatrixMultiplication (from AMD opencl-samples). The opencl-samples are quite old so, if you have any other suggestion