Re: [RFC v2 4/8] drm/amdgpu: Serialize non TDR gpu recovery with TDRs

2022-02-06 Thread Grodzovsky, Andrey
21:41 To: Grodzovsky, Andrey ; Christian König ; Koenig, Christian ; Lazar, Lijo ; dri-devel@lists.freedesktop.org ; amd-...@lists.freedesktop.org ; Chen, JingWen Cc: Chen, Horace ; Liu, Monk Subject: Re: [RFC v2 4/8] drm/amdgpu: Serialize non TDR gpu recovery with TDRs Hi Andrey, I don't

Re: [RFC v2 4/8] drm/amdgpu: Serialize non TDR gpu recovery with TDRs

2022-01-28 Thread Grodzovsky, Andrey
Just a gentle ping. Andrey From: Grodzovsky, Andrey Sent: 26 January 2022 10:52 To: Christian König ; Koenig, Christian ; Lazar, Lijo ; dri-devel@lists.freedesktop.org ; amd-...@lists.freedesktop.org ; Chen, JingWen Cc: Chen, Horace ; Liu, Monk Subject: Re

Re: [PATCH 1/2] drm/sched: fix the bug of time out calculation(v4)

2021-09-14 Thread Grodzovsky, Andrey
AFAIK this one is independent. Christian, can you confirm ? Andrey From: amd-gfx on behalf of Alex Deucher Sent: 14 September 2021 15:33 To: Christian König Cc: Liu, Monk ; amd-gfx list ; Maling list - DRI developers Subject: Re: [PATCH 1/2] drm/sched: fix

Re: [PATCH 1/2] drm/sched: fix the bug of time out calculation(v3)

2021-08-31 Thread Grodzovsky, Andrey
What about removing (kthread_should_park()) ? We decided it's useless as far as I remember. Andrey From: amd-gfx on behalf of Liu, Monk Sent: 31 August 2021 20:24 To: Liu, Monk ; amd-...@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Subject: RE:

Re: [PATCH 0/7] libdrm tests for hot-unplug feature

2021-06-03 Thread Grodzovsky, Andrey
Is libdrm on gitlab ? I wasn't aware of this. I assumed code reviews still go through dri-devel. Andrey From: Alex Deucher Sent: 03 June 2021 17:20 To: Grodzovsky, Andrey Cc: Maling list - DRI developers ; amd-gfx list ; Deucher, Alexander ; Christian König

Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object

2021-01-08 Thread Grodzovsky, Andrey
Ok then, I guess I will proceed with the dummy pages list implementation then. Andrey From: Koenig, Christian Sent: 08 January 2021 09:52 To: Grodzovsky, Andrey ; Daniel Vetter Cc: amd-...@lists.freedesktop.org ; dri-devel@lists.freedesktop.org ; daniel.vet

Re: [PATCH v3 10/12] drm/amdgpu: Avoid sysfs dirs removal post device unplug

2020-11-27 Thread Grodzovsky, Andrey
Hey, just a ping on my comments/question below. Andrey From: Grodzovsky, Andrey Sent: 25 November 2020 12:39 To: Daniel Vetter Cc: amd-gfx list ; dri-devel ; Christian König ; Rob Herring ; Lucas Stach ; Qiang Yu ; Anholt, Eric ; Pekka Paalanen ; Deucher

Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use

2020-11-27 Thread Grodzovsky, Andrey
Hey Daniel, just a ping on a bunch of questions I posted below. Andrey From: Grodzovsky, Andrey Sent: 25 November 2020 14:34 To: Daniel Vetter ; Koenig, Christian Cc: r...@kernel.org ; daniel.vet...@ffwll.ch ; dri-devel@lists.freedesktop.org ; e

Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.

2019-11-25 Thread Grodzovsky, Andrey
the issue Emily reported can be avoided. Andrey From: Deng, Emily Sent: 25 November 2019 16:44:36 To: Grodzovsky, Andrey Cc: dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org; Koenig, Christian; steven.pr...@arm.com; Grodzovsky, Andrey Subject: RE

Re: [PATCH] drm/sched: Fix passing zero to 'PTR_ERR' warning

2019-10-29 Thread Grodzovsky, Andrey
On 10/29/19 2:03 PM, Dan Carpenter wrote: > On Tue, Oct 29, 2019 at 11:04:44AM -0400, Andrey Grodzovsky wrote: >> Fix a static code checker warning. >> >> Signed-off-by: Andrey Grodzovsky >> --- >> drivers/gpu/drm/scheduler/sched_main.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2

Re: [PATCH 1/2] drm/sched: Set error to s_fence if HW job submission failed.

2019-10-25 Thread Grodzovsky, Andrey
On 10/25/19 11:55 AM, Koenig, Christian wrote: > Am 25.10.19 um 16:57 schrieb Grodzovsky, Andrey: >> On 10/25/19 4:44 AM, Christian König wrote: >>> Am 24.10.19 um 21:57 schrieb Andrey Grodzovsky: >>>> Problem: >>>> When run_job fails and HW fence returne

Re: [PATCH 1/2] drm/sched: Set error to s_fence if HW job submission failed.

2019-10-25 Thread Grodzovsky, Andrey
On 10/25/19 4:44 AM, Christian König wrote: > Am 24.10.19 um 21:57 schrieb Andrey Grodzovsky: >> Problem: >> When run_job fails and HW fence returned is NULL we still signal >> the s_fence to avoid hangs but the user has no way of knowing if >> the actual HW job was ran and finished. >> >> Fix:

Re: drm_sched with panfrost crash on T820

2019-10-04 Thread Grodzovsky, Andrey
On 10/3/19 4:34 AM, Neil Armstrong wrote: > Hi Andrey, > > Le 02/10/2019 à 16:40, Grodzovsky, Andrey a écrit : >> On 9/30/19 10:52 AM, Hillf Danton wrote: >>> On Mon, 30 Sep 2019 11:17:45 +0200 Neil Armstrong wrote: >>>> Did a new run from 5.3:

Re: drm_sched with panfrost crash on T820

2019-10-02 Thread Grodzovsky, Andrey
On 9/30/19 5:17 AM, Neil Armstrong wrote: > Hi Andrey, > > On 27/09/2019 22:55, Grodzovsky, Andrey wrote: >> Can you please use addr2line or gdb to pinpoint where in >> drm_sched_increase_karma you hit the NULL ptr ? It looks like the guilty >> job, but to be sur

Re: drm_sched with panfrost crash on T820

2019-10-02 Thread Grodzovsky, Andrey
On 9/30/19 10:52 AM, Hillf Danton wrote: > On Mon, 30 Sep 2019 11:17:45 +0200 Neil Armstrong wrote: >> Did a new run from 5.3: >> >> [ 35.971972] Call trace: >> [ 35.974391] drm_sched_increase_karma+0x5c/0xf0 >> 10667f3810667F94 >>

Re: drm_sched with panfrost crash on T820

2019-09-27 Thread Grodzovsky, Andrey
Can you please use addr2line or gdb to pinpoint where in drm_sched_increase_karma you hit the NULL ptr ? It looks like the guilty job, but to be sure. Andrey On 9/27/19 4:12 AM, Neil Armstrong wrote: > Hi Christian, > > In v5.3, running dEQP triggers the following kernel crash : > > [

Re: [PATCH v4] drm: Don't free jobs in wait_event_interruptible()

2019-09-26 Thread Grodzovsky, Andrey
On 9/26/19 11:59 AM, Steven Price wrote: > On 26/09/2019 16:48, Grodzovsky, Andrey wrote: >> On 9/26/19 11:23 AM, Steven Price wrote: >>> On 26/09/2019 16:14, Grodzovsky, Andrey wrote: >>>> On 9/26/19 10:16 AM, Steven Price wrote: >>>>> drm_sched_

Re: [PATCH v4] drm: Don't free jobs in wait_event_interruptible()

2019-09-26 Thread Grodzovsky, Andrey
On 9/26/19 11:23 AM, Steven Price wrote: > On 26/09/2019 16:14, Grodzovsky, Andrey wrote: >> On 9/26/19 10:16 AM, Steven Price wrote: >>> drm_sched_cleanup_jobs() attempts to free finished jobs, however because >>> it is called as the condition of wait_event_interrupti

Re: [PATCH v4] drm: Don't free jobs in wait_event_interruptible()

2019-09-26 Thread Grodzovsky, Andrey
On 9/26/19 10:16 AM, Steven Price wrote: > drm_sched_cleanup_jobs() attempts to free finished jobs, however because > it is called as the condition of wait_event_interruptible() it must not > sleep. Unfortuantly some free callbacks (notibly for Panfrost) do sleep. > > Instead let's rename

Re: [PATCH] drm: Don't free jobs in wait_event_interruptible()

2019-09-26 Thread Grodzovsky, Andrey
On 9/26/19 3:07 AM, Koenig, Christian wrote: > Am 25.09.19 um 17:14 schrieb Steven Price: >> drm_sched_cleanup_jobs() attempts to free finished jobs, however because >> it is called as the condition of wait_event_interruptible() it must not >> sleep. Unfortunately some free callbacks (notably for

Re: [PATCH] drm: Don't free jobs in wait_event_interruptible()

2019-09-26 Thread Grodzovsky, Andrey
On 9/26/19 5:41 AM, Steven Price wrote: > On 25/09/2019 21:09, Grodzovsky, Andrey wrote: >> On 9/25/19 12:07 PM, Andrey Grodzovsky wrote: >>> On 9/25/19 12:00 PM, Steven Price wrote: >>> >>>> On 25/09/2019 16:56, Grodzovsky, Andrey wrote: >

Re: [PATCH] drm: Don't free jobs in wait_event_interruptible()

2019-09-25 Thread Grodzovsky, Andrey
On 9/25/19 12:07 PM, Andrey Grodzovsky wrote: > On 9/25/19 12:00 PM, Steven Price wrote: > >> On 25/09/2019 16:56, Grodzovsky, Andrey wrote: >>> On 9/25/19 11:14 AM, Steven Price wrote: >>> >>>> drm_sched_cleanup_jobs() attempts to free finished job

Re: [PATCH] drm: Don't free jobs in wait_event_interruptible()

2019-09-25 Thread Grodzovsky, Andrey
On 9/25/19 12:00 PM, Steven Price wrote: > On 25/09/2019 16:56, Grodzovsky, Andrey wrote: >> On 9/25/19 11:14 AM, Steven Price wrote: >> >>> drm_sched_cleanup_jobs() attempts to free finished jobs, however because >>> it is called as the condition of w

Re: [PATCH] drm: Don't free jobs in wait_event_interruptible()

2019-09-25 Thread Grodzovsky, Andrey
On 9/25/19 11:14 AM, Steven Price wrote: > drm_sched_cleanup_jobs() attempts to free finished jobs, however because > it is called as the condition of wait_event_interruptible() it must not > sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep. > > Instead let's rename
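The "Don't free jobs in wait_event_interruptible()" threads hinge on one rule: the condition passed to wait_event_interruptible() may be re-evaluated while the task is preparing to sleep, so it must not sleep itself. A rough userspace analogue, assuming a pthread condition variable in place of the kernel wait queue, keeps the predicate side-effect free and frees a finished job only after the wait returns. All names here (sketch_sched, sched_has_work, sched_get_cleanup_job, sched_worker_iteration) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct sketch_job {
	struct sketch_job *next;
	bool finished; /* set when the job's finished fence signals */
};

struct sketch_sched {
	pthread_mutex_t lock;
	pthread_cond_t wake;        /* stands in for the scheduler wait queue */
	struct sketch_job *pending; /* oldest job first */
	bool stop;
};

/* Wait predicate: reads state only; never frees, never sleeps. */
static bool sched_has_work(const struct sketch_sched *s)
{
	return s->stop || (s->pending && s->pending->finished);
}

/* Detach the oldest job if it has finished; caller holds s->lock. */
static struct sketch_job *sched_get_cleanup_job(struct sketch_sched *s)
{
	struct sketch_job *job = s->pending;

	if (!job || !job->finished)
		return NULL;
	s->pending = job->next;
	return job;
}

/* One worker iteration. Returns true if a finished job was freed. */
static bool sched_worker_iteration(struct sketch_sched *s)
{
	struct sketch_job *job;
	bool freed;

	pthread_mutex_lock(&s->lock);
	while (!sched_has_work(s))
		pthread_cond_wait(&s->wake, &s->lock);
	job = sched_get_cleanup_job(s);
	pthread_mutex_unlock(&s->lock);

	/* Freeing happens after the wait and outside the lock, where the
	 * kernel equivalent (a free_job callback) would be allowed to sleep. */
	freed = job != NULL;
	assert(!freed || job->finished);
	free(job);
	return freed;
}
```

In a multi-threaded setup, whoever marks a job finished would do so under s->lock and then call pthread_cond_signal(&s->wake).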

Re: [PATCH] drm/scheduler: use job count instead of peek

2019-08-12 Thread Grodzovsky, Andrey
Acked-by: Andrey Grodzovsky Andrey On 8/9/19 11:31 AM, Christian König wrote: > The spsc_queue_peek function is accessing queue->head which belongs to > the consumer thread and shouldn't be accessed by the producer > > This is fixing a rare race condition when destroying entit
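The spsc_queue patch above replaces a producer-side peek at queue->head (owned by the consumer) with a job count. The following is a userspace C11 sketch loosely modeled on that layout (consumer-owned head, producer-owned tail, shared job_count); it is not the kernel's spsc_queue, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct spsc_node {
	_Atomic(struct spsc_node *) next;
};

struct spsc_queue_sketch {
	_Atomic(struct spsc_node *) head;             /* consumer side */
	_Atomic(_Atomic(struct spsc_node *) *) tail;  /* producer side */
	atomic_int job_count;                         /* safe for both sides */
};

static void spsc_init(struct spsc_queue_sketch *q)
{
	atomic_init(&q->head, NULL);
	atomic_init(&q->tail, &q->head);
	atomic_init(&q->job_count, 0);
}

/* Producer only. Returns true if the queue was empty before this push. */
static bool spsc_push(struct spsc_queue_sketch *q, struct spsc_node *node)
{
	_Atomic(struct spsc_node *) *prev;

	atomic_store(&node->next, NULL);
	/* Count before publishing, so a pop never sees an uncounted node. */
	atomic_fetch_add(&q->job_count, 1);
	prev = atomic_exchange(&q->tail, &node->next);
	atomic_store(prev, node); /* links the very first node through head */
	return prev == &q->head;
}

/* Consumer only. */
static struct spsc_node *spsc_pop(struct spsc_queue_sketch *q)
{
	struct spsc_node *node = atomic_load(&q->head);
	struct spsc_node *next;

	if (!node)
		return NULL;
	next = atomic_load(&node->next);
	atomic_store(&q->head, next);
	if (!next) {
		_Atomic(struct spsc_node *) *expected = &node->next;

		/* Last element: try to point tail back at head. */
		if (!atomic_compare_exchange_strong(&q->tail, &expected, &q->head)) {
			/* A push raced with us: wait until it links node->next. */
			while (!(next = atomic_load(&node->next)))
				;
			atomic_store(&q->head, next);
		}
	}
	assert(atomic_load(&q->job_count) > 0);
	atomic_fetch_sub(&q->job_count, 1);
	return node;
}

/* Safe from the producer: reads the shared counter, never q->head. */
static int spsc_count(struct spsc_queue_sketch *q)
{
	return atomic_load(&q->job_count);
}
```

A producer-side idle check then becomes spsc_count(&q) == 0 instead of peeking at the consumer's head pointer.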

Re: [PATCH] drm/scheduler: put killed job cleanup to worker

2019-07-03 Thread Grodzovsky, Andrey
On 7/3/19 10:53 AM, Lucas Stach wrote: > Am Mittwoch, den 03.07.2019, 14:41 + schrieb Grodzovsky, Andrey: >> On 7/3/19 10:32 AM, Lucas Stach wrote: >>> Am Mittwoch, den 03.07.2019, 14:23 + schrieb Grodzovsky, Andrey: >>>> On 7/3

Re: [PATCH] drm/scheduler: put killed job cleanup to worker

2019-07-03 Thread Grodzovsky, Andrey
On 7/3/19 10:32 AM, Lucas Stach wrote: > Am Mittwoch, den 03.07.2019, 14:23 + schrieb Grodzovsky, Andrey: >> On 7/3/19 6:28 AM, Lucas Stach wrote: >>> drm_sched_entity_kill_jobs_cb() is called right from the last scheduled >>> job finished fence signaling. As

Re: [PATCH] drm/sched: Fix make htmldocs warnings.

2019-06-03 Thread Grodzovsky, Andrey
On 6/3/19 3:24 AM, Daniel Vetter wrote: > On Thu, May 30, 2019 at 05:04:20PM +0200, Christian König wrote: >> Am 29.05.19 um 21:36 schrieb Daniel Vetter: >>> On Wed, May 29, 2019 at 04:43:45PM +, Grodzovsky, Andrey wrote: >>>> I don't, sorry. >>> Shoul

Re: [PATCH] drm/sched: Fix make htmldocs warnings.

2019-05-29 Thread Grodzovsky, Andrey
I don't, sorry. Andrey On 5/29/19 12:42 PM, Alex Deucher wrote: > On Wed, May 29, 2019 at 10:29 AM Andrey Grodzovsky > wrote: >> Signed-off-by: Andrey Grodzovsky > Reviewed-by: Alex Deucher > > I'll push it to drm-misc in a minute unless you have commit rights. > > Alex > >> --- >>

Re: [bug report] drm/scheduler: rework job destruction

2019-05-22 Thread Grodzovsky, Andrey
Thanks for letting me know, I will send a fix soon. Andrey On 5/22/19 9:07 AM, Dan Carpenter wrote: > Hello Christian König, > > The patch 5918045c4ed4: "drm/scheduler: rework job destruction" from > Apr 18, 2019, leads to the following static checker warning: > >

Re: lima_bo memory leak after drm_sched job destruction rework

2019-05-17 Thread Grodzovsky, Andrey
17:42:48 To: Grodzovsky, Andrey Cc: Deucher, Alexander; Koenig, Christian; Zhou, David(ChunMing); David Airlie; Daniel Vetter; Lucas Stach; Russell King; Christian Gmeiner; Qiang Yu; Rob Herring; Tomeu Vizoso; Eric Anholt; Rex Zhu; Huang, Ray; Deng, Emily; Nayan Deshmukh; Sharat Masetty; amd

Re: lima_bo memory leak after drm_sched job destruction rework

2019-05-17 Thread Grodzovsky, Andrey
On 5/17/19 3:35 PM, Erico Nunes wrote: > Hello, > > I have recently observed a memory leak issue with lima using > drm-misc-next, which I initially reported here: > https://gitlab.freedesktop.org/lima/linux/issues/24 > It is an easily reproduceable memory leak which

Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.

2019-04-29 Thread Grodzovsky, Andrey
cannot fully judge patch #4, #5, #6. -David From: amd-gfx On Behalf Of Grodzovsky, Andrey Sent: Friday, April 26, 2019 10:09 PM To: Koenig, Christian ; Zhou, David(ChunMing) ...

Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.

2019-04-26 Thread Grodzovsky, Andrey
that we don't do any processing any more and then start with our reset procedure including forcing all hw fences to complete. Christian. -David From: amd-gfx On Behalf Of Grodzovsky, Andrey Sent: Wednesday, April 24, 2019 12:00 AM To: Zhou, Da

Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.

2019-04-23 Thread Grodzovsky, Andrey
: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled. From: "Grodzovsky, Andrey" To: "Zhou, David(ChunMing)" ,dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org,e...@anholt.net,etna...@lists.freedesktop.org,ckoenig.leichtzumer...@gmail.

Re: [PATCH v5 4/6] drm/sched: Keep s_fence->parent pointer

2019-04-23 Thread Grodzovsky, Andrey
On 4/22/19 8:59 AM, Zhou, David(ChunMing) wrote: > +Monk to response this patch. > > > 在 2019/4/18 23:00, Andrey Grodzovsky 写道: >> For later driver's reference to see if the fence is signaled. >> >> v2: Move parent fence put to resubmit jobs. >> >> Signed-off-by: Andrey Grodzovsky >>

Re: [PATCH v5 3/6] drm/scheduler: rework job destruction

2019-04-23 Thread Grodzovsky, Andrey
. Andrey Original Message Subject: Re: [PATCH v5 3/6] drm/scheduler: rework job destruction From: "Grodzovsky, Andrey" To: "Zhou, David(ChunMing)" ,dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org,e...@anholt.net,etna...@lists.freedesktop.org,cko

Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.

2019-04-23 Thread Grodzovsky, Andrey
On 4/22/19 9:09 AM, Zhou, David(ChunMing) wrote: > +Monk. > > GPU reset is used widely in SRIOV, so need virtulizatino guy take a look. > > But out of curious, why guilty job can signal more if the job is already > set to guilty? set it wrongly? > > > -David It's possible that the job does

Re: [PATCH v5 3/6] drm/scheduler: rework job destruction

2019-04-23 Thread Grodzovsky, Andrey
On 4/22/19 8:48 AM, Chunming Zhou wrote: > Hi Andrey, > > static void drm_sched_process_job(struct dma_fence *f, struct > dma_fence_cb *cb) > { > ... >     spin_lock_irqsave(&sched->job_list_lock, flags); >     /* remove job from ring_mirror_list */ >     list_del_init(&s_job->node); >

Re: [PATCH] drm/sched: Fix description of drm_sched_stop

2019-04-23 Thread Grodzovsky, Andrey
Reviewed-by: Andrey Grodzovsky Andrey On 4/20/19 8:50 AM, Jonathan Neuschäfer wrote: > Since commit 222b5f044159 ("drm/sched: Refactor ring mirror list > handling."), drm_sched_hw_job_reset is no longer there, so let's adjust > the doc comment accordingly. > > Signed-of

Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.

2019-04-23 Thread Grodzovsky, Andrey
ing the mail and the KASAN dump. Andrey > > And we should probably commit patch #1 and #2. > > Christian. > > Am 22.04.19 um 13:54 schrieb Grodzovsky, Andrey: >> Ping for patches 3, new patch 5 and patch 6. >> >> Andrey >> >> On 4/18/19 11:00 AM,

Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.

2019-04-23 Thread Grodzovsky, Andrey
Koenig, Christian wrote: >> Well you at least have to give me time till after the holidays to get >> going again :) >> >> Not sure exactly jet why we need patch number 5. >> >> And we should probably commit patch #1 and #2. >> >> Christian. >>

Re: [PATCH v5 1/6] drm/amd/display: wait for fence without holding reservation lock

2019-04-23 Thread Grodzovsky, Andrey
This series is on top of drm-misc because of the panfrost and lima drivers, which are missing from amd-staging-drm-next. Once I land it in drm-misc I will merge and push it into drm-next. Andrey On 4/22/19 10:35 PM, Dieter Nützel wrote: > Hello Andrey, > > this series can't apply (brake on #3) on

Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.

2019-04-22 Thread Grodzovsky, Andrey
Ping for patches 3, new patch 5 and patch 6. Andrey On 4/18/19 11:00 AM, Andrey Grodzovsky wrote: > Also reject TDRs if another one already running. > > v2: > Stop all schedulers across device and entire XGMI hive before > force signaling HW fences. > Avoid passing job_signaled to helper
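The "Also reject TDRs if another one already running" note can be illustrated with an atomic compare-and-exchange on a per-device (or per-XGMI-hive) in-reset flag: the first caller claims the domain, and any concurrent caller backs off instead of starting a second reset. This is a hypothetical sketch; reset_domain_sketch and try_gpu_recover are invented names, and the amdgpu patch may use a different primitive (for example a mutex trylock).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reset domain: one per device, or one shared by a hive. */
struct reset_domain_sketch {
	atomic_bool in_reset;
	int resets_done; /* only touched by the winner of the cmpxchg */
};

/* Returns true if this caller performed the recovery, false if it was
 * rejected because another recovery already owned the domain. */
static bool try_gpu_recover(struct reset_domain_sketch *d)
{
	bool expected = false;

	if (!atomic_compare_exchange_strong(&d->in_reset, &expected, true))
		return false; /* another TDR is running: reject this one */

	assert(atomic_load(&d->in_reset));
	/* Placeholder for the real work: stop all schedulers in the domain,
	 * force-signal HW fences, reset the ASIC, restart the schedulers. */
	d->resets_done++;

	atomic_store(&d->in_reset, false);
	return true;
}
```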

Re: [PATCH v3 1/5] drm/scheduler: rework job destruction

2019-04-17 Thread Grodzovsky, Andrey
On 4/16/19 12:00 PM, Koenig, Christian wrote: > Am 16.04.19 um 17:42 schrieb Grodzovsky, Andrey: >> On 4/16/19 10:58 AM, Grodzovsky, Andrey wrote: >>> On 4/16/19 10:43 AM, Koenig, Christian wrote: >>>> Am 16.04.19 um 16:36 schrieb Grodzovsky, Andrey: >>>

Re: [PATCH v4 3/5] drm/scheduler: rework job destruction

2019-04-17 Thread Grodzovsky, Andrey
On 4/17/19 2:01 PM, Koenig, Christian wrote: > Am 17.04.19 um 19:59 schrieb Christian König: >> Am 17.04.19 um 19:53 schrieb Grodzovsky, Andrey: >>> On 4/17/19 1:17 PM, Christian König wrote: >>>> I can't review this patch, since I'm one of the authors of it, but in

Re: [PATCH v4 3/5] drm/scheduler: rework job destruction

2019-04-17 Thread Grodzovsky, Andrey
and keep it all in one place which is amdgpu_device_gpu_recover. Andrey > > Regards, > Christian. > > Am 17.04.19 um 16:36 schrieb Grodzovsky, Andrey: >> Ping on this patch and patch 5. The rest already RBed. >> >> Andrey >> >> On 4/16/19 2:23 PM, Andrey

Re: [PATCH v4 3/5] drm/scheduler: rework job destruction

2019-04-17 Thread Grodzovsky, Andrey
Ping on this patch and patch 5. The rest already RBed. Andrey On 4/16/19 2:23 PM, Andrey Grodzovsky wrote: > From: Christian König > > We now destroy finished jobs from the worker thread to make sure that > we never destroy a job currently in timeout processing. > By this we avoid holding lock

Re: [PATCH v3 1/5] drm/scheduler: rework job destruction

2019-04-16 Thread Grodzovsky, Andrey
On 4/16/19 10:58 AM, Grodzovsky, Andrey wrote: > On 4/16/19 10:43 AM, Koenig, Christian wrote: >> Am 16.04.19 um 16:36 schrieb Grodzovsky, Andrey: >>> On 4/16/19 5:47 AM, Christian König wrote: >>>> Am 15.04.19 um 23:17 schrieb Eric Anholt: >>>>>

Re: [PATCH v3 1/5] drm/scheduler: rework job destruction

2019-04-16 Thread Grodzovsky, Andrey
On 4/16/19 10:43 AM, Koenig, Christian wrote: > Am 16.04.19 um 16:36 schrieb Grodzovsky, Andrey: >> On 4/16/19 5:47 AM, Christian König wrote: >>> Am 15.04.19 um 23:17 schrieb Eric Anholt: >>>> Andrey Grodzovsky writes: >>>> >>>>> From:

Re: [PATCH v3 1/5] drm/scheduler: rework job destruction

2019-04-16 Thread Grodzovsky, Andrey
On 4/16/19 5:47 AM, Christian König wrote: > Am 15.04.19 um 23:17 schrieb Eric Anholt: >> Andrey Grodzovsky writes: >> >>> From: Christian König >>> >>> We now destroy finished jobs from the worker thread to make sure that >>> we never destroy a job currently in timeout processing. >>> By this

Re: [PATCH 3/4] drm/amdgpu: Avoid HW reset if guilty job already signaled.

2019-04-15 Thread Grodzovsky, Andrey
On 4/15/19 2:46 AM, Koenig, Christian wrote: I agree this would be good in case of amdgpu_device_pre_asic_reset because we can totally skip this function if guilty job already signaled, but for amdgpu_device_post_asic_reset it creates complications because drm_sched_start is right in the middle

Re: [PATCH 3/4] drm/amdgpu: Avoid HW reset if guilty job already signaled.

2019-04-12 Thread Grodzovsky, Andrey
On 4/12/19 3:39 AM, Christian König wrote: > Am 11.04.19 um 18:03 schrieb Andrey Grodzovsky: >> Also reject TDRs if another one already running. >> >> v2: >> Stop all schedulers across device and entire XGMI hive before >> force signaling HW fences. >> >> Signed-off-by: Andrey Grodzovsky >> ---

Re: [PATCH 4/4] drm/amd/display: Restore deleted patch to resolve reset deadlock.

2019-04-12 Thread Grodzovsky, Andrey
On 4/12/19 3:40 AM, Christian König wrote: > Am 11.04.19 um 18:03 schrieb Andrey Grodzovsky: >> Patch '5edb0c9b Fix deadlock with display during hanged ring recovery' >> was accidentaly removed during one of DALs code merges. >> >> Signed-off-by: Andrey Grodzovsky >> --- >>  

Re: [PATCH 4/4] drm/amd/display: Restore deleted patch to resolve reset deadlock.

2019-04-11 Thread Grodzovsky, Andrey
On 4/11/19 12:41 PM, Kazlauskas, Nicholas wrote: > On 4/11/19 12:03 PM, Andrey Grodzovsky wrote: >> Patch '5edb0c9b Fix deadlock with display during hanged ring recovery' >> was accidentaly removed during one of DALs code merges. >> >> Signed-off-by: Andrey Grodzovsky > Reviewed-by: Nicholas

Re: [PATCH 3/3] drm/amdgpu: Avoid HW reset if guilty job already signaled.

2019-04-10 Thread Grodzovsky, Andrey
On 4/10/19 10:41 AM, Christian König wrote: > Am 10.04.19 um 16:28 schrieb Grodzovsky, Andrey: >> On 4/10/19 10:06 AM, Christian König wrote: >>> Am 09.04.19 um 18:42 schrieb Grodzovsky, Andrey: >>>> On 4/9/19 10:50 AM, Christian König wrote: >>>>> A

Re: [PATCH 3/3] drm/amdgpu: Avoid HW reset if guilty job already signaled.

2019-04-10 Thread Grodzovsky, Andrey
On 4/10/19 10:06 AM, Christian König wrote: > Am 09.04.19 um 18:42 schrieb Grodzovsky, Andrey: >> On 4/9/19 10:50 AM, Christian König wrote: >>> Am 08.04.19 um 18:08 schrieb Andrey Grodzovsky: >>>> Also reject TDRs if another one already running. >>>

Re: [PATCH 3/3] drm/amdgpu: Avoid HW reset if guilty job already signaled.

2019-04-09 Thread Grodzovsky, Andrey
On 4/9/19 10:50 AM, Christian König wrote: > Am 08.04.19 um 18:08 schrieb Andrey Grodzovsky: >> Also reject TDRs if another one already running. >> >> Signed-off-by: Andrey Grodzovsky >> --- >>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 94 >> +- >>   1 file

Re: [PATCH] drm/v3d: Fix calling drm_sched_resubmit_jobs for same sched.

2019-03-13 Thread Grodzovsky, Andrey
np Andrey On 3/13/19 1:53 PM, Eric Anholt wrote: > "Grodzovsky, Andrey" writes: > >> On 3/13/19 12:13 PM, Eric Anholt wrote: >>> "Grodzovsky, Andrey" writes: >>> >>>> They are not the same, but the guilty job belongs to only o

Re: [PATCH] drm/v3d: Fix calling drm_sched_resubmit_jobs for same sched.

2019-03-13 Thread Grodzovsky, Andrey
On 3/13/19 12:13 PM, Eric Anholt wrote: > "Grodzovsky, Andrey" writes: > >> They are not the same, but the guilty job belongs to only one {entity, >> scheduler} pair and so we mark as guilty only for that particular >> entity in the context of that schedule

Re: [PATCH] drm/v3d: Fix calling drm_sched_resubmit_jobs for same sched.

2019-03-12 Thread Grodzovsky, Andrey
To: Grodzovsky, Andrey; dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org; to...@tomeuvizoso.net Cc: Grodzovsky, Andrey Subject: Re: [PATCH] drm/v3d: Fix calling drm_sched_resubmit_jobs for same sched. Andrey Grodzovsky writes: > Also stop calling drm_sched_increase_karma multiple ti

Re: [PATCH v6 1/2] drm/sched: Refactor ring mirror list handling.

2019-03-12 Thread Grodzovsky, Andrey
On 3/12/19 3:43 AM, Tomeu Vizoso wrote: > On Thu, 27 Dec 2018 at 20:28, Andrey Grodzovsky > wrote: >> Decauple sched threads stop and start and ring mirror >> list handling from the policy of what to do about the >> guilty jobs. >> When stoppping the sched thread and detaching sched fences >>

Re: [PATCH v2] tests/amdgpu: add deadlock test for sdma

2019-03-06 Thread Grodzovsky, Andrey
On 3/6/19 1:37 AM, Cui, Flora wrote: > deadlock test for sdma will cause gpu recoverty. > disable the test for now until GPU reset recovery could survive at least > 1000 times test. Can you specify what issues you see and on what ASIC ? Andrey > > v2: add modprobe parameter > > Change-Id:

Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.

2019-01-24 Thread Grodzovsky, Andrey
er actually isn't used any more, isn't it? > >> +retry_wait: > Not used any more. > > But apart from that at least patch #1 and #2 look like they can have my > rb now. > > Patch #3 looks also like it should work after a bit of polishing. > > Thanks, > Christia

Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.

2019-01-18 Thread Grodzovsky, Andrey
, Christian wrote: > Am 18.01.19 um 18:34 schrieb Grodzovsky, Andrey: >> On 01/18/2019 12:10 PM, Koenig, Christian wrote: >>> Am 18.01.19 um 16:21 schrieb Grodzovsky, Andrey: >>>> On 01/18/2019 04:25 AM, Koenig, Christian wrote: >>>>> [SNIP] >>

Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.

2019-01-18 Thread Grodzovsky, Andrey
On 01/18/2019 12:10 PM, Koenig, Christian wrote: > Am 18.01.19 um 16:21 schrieb Grodzovsky, Andrey: >> On 01/18/2019 04:25 AM, Koenig, Christian wrote: >>> [SNIP] >>>>>>> Re-arming the timeout should probably have a much reduced value >>>>

Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.

2019-01-18 Thread Grodzovsky, Andrey
On 01/18/2019 04:25 AM, Koenig, Christian wrote: > [SNIP] > Re-arming the timeout should probably have a much reduced value > when the job hasn't changed. E.g. something like a few ms. >> Now i got thinking about non hanged job in progress (job A) and let's >> say it's a long job , it
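Christian's suggestion quoted here, re-arming the timeout with a much reduced value when the job hasn't changed, can be sketched as a small policy function. It assumes, hypothetically, that each job has a sequence number and that the caller passes the sequence number of the job at the head of the ring mirror list on each timeout; the "few ms" short value is left to the caller. The names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct timeout_state_sketch {
	bool have_last;           /* false until the first timeout is armed */
	uint64_t last_head_seqno; /* head job seen at the previous arm */
};

/* Full timeout when the head job changed (progress was made),
 * short timeout when the same job is still at the head. */
static unsigned int next_timeout_ms(struct timeout_state_sketch *st,
				    uint64_t head_seqno,
				    unsigned int full_ms, unsigned int short_ms)
{
	bool unchanged;

	assert(short_ms <= full_ms);
	unchanged = st->have_last && st->last_head_seqno == head_seqno;
	st->have_last = true;
	st->last_head_seqno = head_seqno;
	return unchanged ? short_ms : full_ms;
}
```

As the reply in this thread goes on to point out, a long-running but healthy job would keep hitting the short value under this rule, so the policy alone does not distinguish "slow" from "hung".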

Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.

2019-01-17 Thread Grodzovsky, Andrey
On 01/17/2019 10:29 AM, Koenig, Christian wrote: Am 17.01.19 um 16:22 schrieb Grodzovsky, Andrey: On 01/17/2019 02:45 AM, Christian König wrote: Am 16.01.19 um 18:17 schrieb Grodzovsky, Andrey: On 01/16/2019 11:02 AM, Koenig, Christian wrote: Am 16.01.19 um 16:45 schrieb Grodzovsky, Andrey

Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.

2019-01-17 Thread Grodzovsky, Andrey
On 01/17/2019 10:29 AM, Koenig, Christian wrote: Am 17.01.19 um 16:22 schrieb Grodzovsky, Andrey: On 01/17/2019 02:45 AM, Christian König wrote: Am 16.01.19 um 18:17 schrieb Grodzovsky, Andrey: On 01/16/2019 11:02 AM, Koenig, Christian wrote: Am 16.01.19 um 16:45 schrieb Grodzovsky, Andrey

Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.

2019-01-17 Thread Grodzovsky, Andrey
On 01/17/2019 02:45 AM, Christian König wrote: Am 16.01.19 um 18:17 schrieb Grodzovsky, Andrey: On 01/16/2019 11:02 AM, Koenig, Christian wrote: Am 16.01.19 um 16:45 schrieb Grodzovsky, Andrey: On 01/16/2019 02:46 AM, Christian König wrote: Am 15.01.19 um 23:01 schrieb Grodzovsky, Andrey

Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.

2019-01-16 Thread Grodzovsky, Andrey
On 01/16/2019 11:02 AM, Koenig, Christian wrote: Am 16.01.19 um 16:45 schrieb Grodzovsky, Andrey: On 01/16/2019 02:46 AM, Christian König wrote: Am 15.01.19 um 23:01 schrieb Grodzovsky, Andrey: On 01/11/2019 05:03 PM, Andrey Grodzovsky wrote: On 01/11/2019 02:11 PM, Koenig, Christian wrote

Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.

2019-01-16 Thread Grodzovsky, Andrey
On 01/16/2019 02:46 AM, Christian König wrote: Am 15.01.19 um 23:01 schrieb Grodzovsky, Andrey: On 01/11/2019 05:03 PM, Andrey Grodzovsky wrote: On 01/11/2019 02:11 PM, Koenig, Christian wrote: Am 11.01.19 um 16:37 schrieb Grodzovsky, Andrey: On 01/11/2019 04:42 AM, Koenig, Christian

Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.

2019-01-15 Thread Grodzovsky, Andrey
On 01/11/2019 05:03 PM, Andrey Grodzovsky wrote: > > > On 01/11/2019 02:11 PM, Koenig, Christian wrote: >> Am 11.01.19 um 16:37 schrieb Grodzovsky, Andrey: >>> On 01/11/2019 04:42 AM, Koenig, Christian wrote: >>>> Am 10.01.19 um 16:56 schrieb Grodzovsky,

Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.

2019-01-11 Thread Grodzovsky, Andrey
On 01/11/2019 02:11 PM, Koenig, Christian wrote: > Am 11.01.19 um 16:37 schrieb Grodzovsky, Andrey: >> On 01/11/2019 04:42 AM, Koenig, Christian wrote: >>> Am 10.01.19 um 16:56 schrieb Grodzovsky, Andrey: >>>> [SNIP] >>>>>>> But we will not be a

Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.

2019-01-11 Thread Grodzovsky, Andrey
On 01/11/2019 04:42 AM, Koenig, Christian wrote: > Am 10.01.19 um 16:56 schrieb Grodzovsky, Andrey: >> [SNIP] >>>>> But we will not be adding the cb back in drm_sched_stop anymore, now we >>>>> are only going to add back the cb in drm_sched_

Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.

2019-01-10 Thread Grodzovsky, Andrey
Just a ping. Andrey On 01/09/2019 10:18 AM, Andrey Grodzovsky wrote: > > > On 01/09/2019 05:22 AM, Christian König wrote: >> Am 07.01.19 um 20:47 schrieb Grodzovsky, Andrey: >>> >>> On 01/07/2019 09:13 AM, Christian König wrote: >>>> Am 03.01.19 um

Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.

2019-01-09 Thread Grodzovsky, Andrey
On 01/09/2019 05:22 AM, Christian König wrote: > Am 07.01.19 um 20:47 schrieb Grodzovsky, Andrey: >> >> On 01/07/2019 09:13 AM, Christian König wrote: >>> Am 03.01.19 um 18:42 schrieb Grodzovsky, Andrey: >>>> On 01/03/2019 11:20 AM, Grodzovsky, Andrey wrote:

Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.

2019-01-07 Thread Grodzovsky, Andrey
On 01/07/2019 09:13 AM, Christian König wrote: > Am 03.01.19 um 18:42 schrieb Grodzovsky, Andrey: >> >> On 01/03/2019 11:20 AM, Grodzovsky, Andrey wrote: >>> On 01/03/2019 03:54 AM, Koenig, Christian wrote: >>>> Am 21.12.18 um 21:36 schrieb Grodzovsky,

Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.

2019-01-03 Thread Grodzovsky, Andrey
On 01/03/2019 11:20 AM, Grodzovsky, Andrey wrote: > > On 01/03/2019 03:54 AM, Koenig, Christian wrote: >> Am 21.12.18 um 21:36 schrieb Grodzovsky, Andrey: >>> On 12/21/2018 01:37 PM, Christian König wrote: >>>> Am 20.12.18 um 20:23 schrieb Andrey Grodzovsky:

Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.

2019-01-03 Thread Grodzovsky, Andrey
On 01/03/2019 03:54 AM, Koenig, Christian wrote: > Am 21.12.18 um 21:36 schrieb Grodzovsky, Andrey: >> On 12/21/2018 01:37 PM, Christian König wrote: >>> Am 20.12.18 um 20:23 schrieb Andrey Grodzovsky: >>>> Decauple sched threads stop and start and ring mirror >

Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.

2018-12-21 Thread Grodzovsky, Andrey
On 12/21/2018 01:37 PM, Christian König wrote: > Am 20.12.18 um 20:23 schrieb Andrey Grodzovsky: >> Decauple sched threads stop and start and ring mirror >> list handling from the policy of what to do about the >> guilty jobs. >> When stoppping the sched thread and detaching sched fences >> from

Re: [PATCH] drm: Block fb changes for async plane updates

2018-12-21 Thread Grodzovsky, Andrey
As far as we discussed this internally looks good to me, but obviously we need to wait for some feedback from non AMD people. Acked-by: Andrey Grodzovsky Andrey On 12/21/2018 09:33 AM, Nicholas Kazlauskas wrote: > The behavior of drm_atomic_helper_cleanup_planes differs depend

Re: [PATCH v4 1/2] drm/sched: Refactor ring mirror list handling.

2018-12-19 Thread Grodzovsky, Andrey
On 12/19/2018 11:21 AM, Christian König wrote: > Am 17.12.18 um 20:51 schrieb Andrey Grodzovsky: >> Decauple sched threads stop and start and ring mirror >> list handling from the policy of what to do about the >> guilty jobs. >> When stoppping the sched thread and detaching sched fences >> from

Re: [PATCH v3 1/2] drm/sched: Refactor ring mirror list handling.

2018-12-17 Thread Grodzovsky, Andrey
On 12/17/2018 10:27 AM, Christian König wrote: > Am 10.12.18 um 22:43 schrieb Andrey Grodzovsky: >> Decauple sched threads stop and start and ring mirror >> list handling from the policy of what to do about the >> guilty jobs. >> When stoppping the sched thread and detaching sched fences >> from

Re: [PATCH v3 2/2] drm/sched: Rework HW fence processing.

2018-12-14 Thread Grodzovsky, Andrey
Just a reminder. Any new comments in light of all the discussion ? Andrey On 12/12/2018 08:08 AM, Grodzovsky, Andrey wrote: > BTW, the problem I pointed out with drm_sched_entity_kill_jobs_cb is not > an issue with this patch set since it removes the cb from > s_fence->finished in g

Re: [PATCH v3 2/2] drm/sched: Rework HW fence processing.

2018-12-12 Thread Grodzovsky, Andrey
ote: > Yeah, completely correct explained. > > I was unfortunately really busy today, but going to give that a look > as soon as I have time. > > Christian. > > Am 11.12.18 um 17:01 schrieb Grodzovsky, Andrey: >> As I understand you say that by the time the fence callback r

Re: [PATCH libdrm] amdgpu/test: Enable deadlock test for CI family (gfx7)

2018-12-11 Thread Grodzovsky, Andrey
np Andrey On 12/11/2018 03:18 PM, Alex Deucher wrote: > On Tue, Dec 11, 2018 at 3:13 PM Andrey Grodzovsky > wrote: >> I retested GPU recovery with Bonaire ASIC and it works. >> >> Signed-off-by: Andrey Grodzovsky > Reviewed-by: Alex Deucher > > Care to enable it in the kernel as well? > >

Re: [PATCH v3 2/2] drm/sched: Rework HW fence processing.

2018-12-11 Thread Grodzovsky, Andrey
Tuesday, December 11, 2018 5:44 AM >> To: dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org; >> ckoenig.leichtzumer...@gmail.com; e...@anholt.net; >> etna...@lists.freedesktop.org >> Cc: Zhou, David(ChunMing) ; Liu, Monk >> ; Grodzovsky, Andrey >> >> Su

Re: [PATCH 2/2] drm/sched: Rework HW fence processing.

2018-12-07 Thread Grodzovsky, Andrey
On 12/07/2018 03:19 AM, Christian König wrote: > Am 07.12.18 um 04:18 schrieb Zhou, David(ChunMing): >> >>> -Original Message- >>> From: dri-devel On Behalf Of >>> Andrey Grodzovsky >>> Sent: Friday, December 07, 2018 1:41 AM >>> To: dri-devel@lists.freedesktop.org;

Re: [PATCH 1/2] drm/sched: Refactor ring mirror list handling.

2018-12-06 Thread Grodzovsky, Andrey
On 12/06/2018 01:33 PM, Christian König wrote: > Am 06.12.18 um 18:41 schrieb Andrey Grodzovsky: >> Decouple sched threads stop and start and ring mirror >> list handling from the policy of what to do about the >> guilty jobs. >> When stopping the sched thread and detaching sched fences >> from

Re: [PATCH 2/2] drm/sched: Rework HW fence processing.

2018-12-06 Thread Grodzovsky, Andrey
On 12/06/2018 12:41 PM, Andrey Grodzovsky wrote: > Expedite job deletion from ring mirror list to the HW fence signal > callback instead of from finish_work; together with waiting for all > such fences to signal in drm_sched_stop, we guarantee that > an already signaled job will not be processed twice. >

Re: [PATCH libdrm] amdgpu/test: Add illegal register and memory access test.

2018-11-02 Thread Grodzovsky, Andrey
There is currently a pplib messaging-related failure during GPU reset. I will put this issue on my TODO list for a later time, after handling higher-priority items, and will disable the deadlock test suite for all non-dGPU gfx8/9 ASICs until then. Andrey On 11/02/2018 02:14 PM, Grodzovsky

Re: [PATCH libdrm] amdgpu/test: Add illegal register and memory access test.

2018-11-02 Thread Grodzovsky, Andrey
On 11/02/2018 02:12 PM, Alex Deucher wrote: > On Fri, Nov 2, 2018 at 11:59 AM Grodzovsky, Andrey > wrote: >> >> >> On 11/02/2018 10:24 AM, Michel Dänzer wrote: >>> On 2018-10-31 7:33 p.m., Andrey Grodzovsky wrote: >>>> Illegal access will cause CP h

Re: [PATCH libdrm] amdgpu/test: Add illegal register and memory access test.

2018-11-02 Thread Grodzovsky, Andrey
On 11/02/2018 10:24 AM, Michel Dänzer wrote: > On 2018-10-31 7:33 p.m., Andrey Grodzovsky wrote: >> Illegal access will cause CP hang followed by job timeout and >> recovery kicking in. >> Also, disable the suite for all APU ASICs until GPU >> reset issues for them are resolved and GPU reset

Re: [PATCH libdrm] amdgpu/test: Add illegal register and memory access test.

2018-10-31 Thread Grodzovsky, Andrey
On 10/31/2018 03:49 PM, Alex Deucher wrote: > On Wed, Oct 31, 2018 at 2:33 PM Andrey Grodzovsky > wrote: >> Illegal access will cause CP hang followed by job timeout and >> recovery kicking in. >> Also, disable the suite for all APU ASICs until GPU >> reset issues for them are resolved and

Re: [PATCH v2] drm/scheduler: Add drm_sched_job_cleanup

2018-10-29 Thread Grodzovsky, Andrey
Acked-by: Andrey Grodzovsky Andrey On 10/29/2018 05:32 AM, Sharat Masetty wrote: > This patch adds a new API to clean up the scheduler job resources. This > is primarily needed in cases where the job was created but was not queued to > the scheduler queue. Additionally with this change,

Re: [PATCH v2 1/2] drm/sched: Add boolean to mark if sched is ready to work v2

2018-10-23 Thread Grodzovsky, Andrey
On 10/22/2018 05:33 AM, Koenig, Christian wrote: > Am 19.10.18 um 22:52 schrieb Andrey Grodzovsky: >> Problem: >> A particular scheduler may become unusable (underlying HW) after >> some event (e.g. GPU reset). If it's later chosen by >> the get free sched. policy a command will fail to be >>

Re: [PATCH v3 2/2] drm/amdgpu: Retire amdgpu_ring.ready flag v3

2018-10-23 Thread Grodzovsky, Andrey
On 10/23/2018 05:23 AM, Christian König wrote: > Am 22.10.18 um 22:46 schrieb Andrey Grodzovsky: >> Start using drm_gpu_scheduler.ready instead. >> >> v3: >> Add helper function to run ring test and set >> sched.ready flag status accordingly, clean explicit >> sched.ready sets from the IP

Re: [PATCH 3/3] drm/amdgpu: Refresh rq selection for job after ASIC reset

2018-10-19 Thread Grodzovsky, Andrey
That's my next step. Andrey On 10/19/2018 12:28 PM, Christian König wrote: From my testing it looks like we can: compute ring 0 is dead but IB tests pass on the other compute rings. Interesting, but I would rather investigate why compute ring 0 is dead while the others still work.
