Re: [PATCH] drm/amd/display: Fix deadlock with display during hanged ring recovery.

2019-02-14 Thread Christian König via amd-gfx

Am 14.02.19 um 20:03 schrieb Grodzovsky, Andrey:

On 2/14/19 12:54 PM, Kazlauskas, Nicholas wrote:

On 2/14/19 12:47 PM, Grodzovsky, Andrey wrote:

On 2/14/19 11:57 AM, Kazlauskas, Nicholas wrote:

On 2/14/19 11:47 AM, Grodzovsky, Andrey wrote:

On 2/14/19 11:07 AM, Michel Dänzer wrote:

On 2019-02-14 4:54 p.m., Kazlauskas, Nicholas wrote:

On 2/14/19 10:42 AM, Grodzovsky, Andrey wrote:

On 2/14/19 4:05 AM, Christian König wrote:

Am 13.02.19 um 19:58 schrieb Andrey Grodzovsky:

When ring hang happens amdgpu_dm_commit_planes during flip is holding
the BO reserved and then stack waiting for fences to signal in
reservation_object_wait_timeout_rcu (which won't signal because there
was a hnag). Then when we try to shutdown display block during reset
recovery from drm_atomic_helper_suspend we also try to reserve the BO
from dm_plane_helper_cleanup_fb ending in deadlock.
Also remove useless WARN_ON

Well it is good that you pointed this out, but there are more problems
than just waiting wile the BO is reserved here.


Signed-off-by: Andrey Grodzovsky 
---
     drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19
+--
     1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index acc4ff8..f8dec36 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct
drm_atomic_state *state,
      */
     abo = gem_to_amdgpu_bo(fb->obj[0]);
     r = amdgpu_bo_reserve(abo, true);

Well why do we reserve the BO in the first place? As the name
indicates reservation_object_wait_timeout_rcu() just uses rcu to wait
for the BO to be idle, no need to actually reserve it at all.

That a good point, I honestly can't remember any more why it's here...
It does look unnecessary given that we are not validating the BO here.

I think we reserve it to grab the tiling flags to update the plane
address (and some other bits of the plane state with a patch I'm
currently working on). I'm guessing that we still need the reservation
so nothing else tries to grab it after it becomes idle, but I'm not
overly familiar with that bit.

The BO is already pinned at this point, isn't it? That's enough to
prevent it from moving, no reservation needed.

Yes, agree.

Andrey



I'm not worried about the buffer moving so much as I am about userspace
changing the tiling flags by calling the IOCTL. Can that happen when
it's pinned?

Nicholas Kazlauskas

In any case this has nothing to do with reserving the BO.

Andrey



Technically it does? See the function:

void amdgpu_bo_get_tiling_flags(struct amdgpu_bo *bo, u64 *tiling_flags)
{
lockdep_assert_held(>tbo.resv->lock.base);

if (tiling_flags)
*tiling_flags = bo->tiling_flags;
}

Nicholas Kazlauskas

What I meant that it has nothing to do with this BO reservation in the
flip code while waiting for fences. Seems it's dafe to remove it.


Yeah, agree. That looks actually quite superfluous to me.

Following up on Michel's argument you would actually need to keep the BO 
reserved until the scanout is completed when you really want to prevent 
that the tilling flags don't change.


I would just completely drop that function and access bo->tilling_flags 
directly here.


Christian.



Andrey


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amd/display: Fix deadlock with display during hanged ring recovery.

2019-02-14 Thread Grodzovsky, Andrey

On 2/14/19 12:54 PM, Kazlauskas, Nicholas wrote:
> On 2/14/19 12:47 PM, Grodzovsky, Andrey wrote:
>> On 2/14/19 11:57 AM, Kazlauskas, Nicholas wrote:
>>> On 2/14/19 11:47 AM, Grodzovsky, Andrey wrote:
 On 2/14/19 11:07 AM, Michel Dänzer wrote:
> On 2019-02-14 4:54 p.m., Kazlauskas, Nicholas wrote:
>> On 2/14/19 10:42 AM, Grodzovsky, Andrey wrote:
>>> On 2/14/19 4:05 AM, Christian König wrote:
 Am 13.02.19 um 19:58 schrieb Andrey Grodzovsky:
> When ring hang happens amdgpu_dm_commit_planes during flip is holding
> the BO reserved and then stack waiting for fences to signal in
> reservation_object_wait_timeout_rcu (which won't signal because there
> was a hnag). Then when we try to shutdown display block during reset
> recovery from drm_atomic_helper_suspend we also try to reserve the BO
> from dm_plane_helper_cleanup_fb ending in deadlock.
> Also remove useless WARN_ON
 Well it is good that you pointed this out, but there are more problems
 than just waiting wile the BO is reserved here.

> Signed-off-by: Andrey Grodzovsky 
> ---
>     drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19
> +--
>     1 file changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index acc4ff8..f8dec36 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct
> drm_atomic_state *state,
>      */
>     abo = gem_to_amdgpu_bo(fb->obj[0]);
>     r = amdgpu_bo_reserve(abo, true);
 Well why do we reserve the BO in the first place? As the name
 indicates reservation_object_wait_timeout_rcu() just uses rcu to wait
 for the BO to be idle, no need to actually reserve it at all.
>>> That a good point, I honestly can't remember any more why it's here...
>>> It does look unnecessary given that we are not validating the BO here.
>> I think we reserve it to grab the tiling flags to update the plane
>> address (and some other bits of the plane state with a patch I'm
>> currently working on). I'm guessing that we still need the reservation
>> so nothing else tries to grab it after it becomes idle, but I'm not
>> overly familiar with that bit.
> The BO is already pinned at this point, isn't it? That's enough to
> prevent it from moving, no reservation needed.
 Yes, agree.

 Andrey


>>> I'm not worried about the buffer moving so much as I am about userspace
>>> changing the tiling flags by calling the IOCTL. Can that happen when
>>> it's pinned?
>>>
>>> Nicholas Kazlauskas
>> In any case this has nothing to do with reserving the BO.
>>
>> Andrey
>>
>>
> Technically it does? See the function:
>
> void amdgpu_bo_get_tiling_flags(struct amdgpu_bo *bo, u64 *tiling_flags)
> {
>   lockdep_assert_held(>tbo.resv->lock.base);
>
>   if (tiling_flags)
>   *tiling_flags = bo->tiling_flags;
> }
>
> Nicholas Kazlauskas

What I meant that it has nothing to do with this BO reservation in the 
flip code while waiting for fences. Seems it's dafe to remove it.

Andrey


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amd/display: Fix deadlock with display during hanged ring recovery.

2019-02-14 Thread Kazlauskas, Nicholas
On 2/14/19 12:47 PM, Grodzovsky, Andrey wrote:
> 
> On 2/14/19 11:57 AM, Kazlauskas, Nicholas wrote:
>> On 2/14/19 11:47 AM, Grodzovsky, Andrey wrote:
>>> On 2/14/19 11:07 AM, Michel Dänzer wrote:
 On 2019-02-14 4:54 p.m., Kazlauskas, Nicholas wrote:
> On 2/14/19 10:42 AM, Grodzovsky, Andrey wrote:
>> On 2/14/19 4:05 AM, Christian König wrote:
>>> Am 13.02.19 um 19:58 schrieb Andrey Grodzovsky:
 When ring hang happens amdgpu_dm_commit_planes during flip is holding
 the BO reserved and then stack waiting for fences to signal in
 reservation_object_wait_timeout_rcu (which won't signal because there
 was a hnag). Then when we try to shutdown display block during reset
 recovery from drm_atomic_helper_suspend we also try to reserve the BO
 from dm_plane_helper_cleanup_fb ending in deadlock.
 Also remove useless WARN_ON
>>> Well it is good that you pointed this out, but there are more problems
>>> than just waiting wile the BO is reserved here.
>>>
 Signed-off-by: Andrey Grodzovsky 
 ---
    drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19
 +--
    1 file changed, 13 insertions(+), 6 deletions(-)

 diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
 b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
 index acc4ff8..f8dec36 100644
 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
 +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
 @@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct
 drm_atomic_state *state,
     */
    abo = gem_to_amdgpu_bo(fb->obj[0]);
    r = amdgpu_bo_reserve(abo, true);
>>> Well why do we reserve the BO in the first place? As the name
>>> indicates reservation_object_wait_timeout_rcu() just uses rcu to wait
>>> for the BO to be idle, no need to actually reserve it at all.
>> That a good point, I honestly can't remember any more why it's here...
>> It does look unnecessary given that we are not validating the BO here.
> I think we reserve it to grab the tiling flags to update the plane
> address (and some other bits of the plane state with a patch I'm
> currently working on). I'm guessing that we still need the reservation
> so nothing else tries to grab it after it becomes idle, but I'm not
> overly familiar with that bit.
 The BO is already pinned at this point, isn't it? That's enough to
 prevent it from moving, no reservation needed.
>>>
>>> Yes, agree.
>>>
>>> Andrey
>>>
>>>

>> I'm not worried about the buffer moving so much as I am about userspace
>> changing the tiling flags by calling the IOCTL. Can that happen when
>> it's pinned?
>>
>> Nicholas Kazlauskas
> 
> In any case this has nothing to do with reserving the BO.
> 
> Andrey
> 
> 

Technically it does? See the function:

void amdgpu_bo_get_tiling_flags(struct amdgpu_bo *bo, u64 *tiling_flags)
{
lockdep_assert_held(>tbo.resv->lock.base);

if (tiling_flags)
*tiling_flags = bo->tiling_flags;
}

Nicholas Kazlauskas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amd/display: Fix deadlock with display during hanged ring recovery.

2019-02-14 Thread Grodzovsky, Andrey

On 2/14/19 11:57 AM, Kazlauskas, Nicholas wrote:
> On 2/14/19 11:47 AM, Grodzovsky, Andrey wrote:
>> On 2/14/19 11:07 AM, Michel Dänzer wrote:
>>> On 2019-02-14 4:54 p.m., Kazlauskas, Nicholas wrote:
 On 2/14/19 10:42 AM, Grodzovsky, Andrey wrote:
> On 2/14/19 4:05 AM, Christian König wrote:
>> Am 13.02.19 um 19:58 schrieb Andrey Grodzovsky:
>>> When ring hang happens amdgpu_dm_commit_planes during flip is holding
>>> the BO reserved and then stack waiting for fences to signal in
>>> reservation_object_wait_timeout_rcu (which won't signal because there
>>> was a hnag). Then when we try to shutdown display block during reset
>>> recovery from drm_atomic_helper_suspend we also try to reserve the BO
>>> from dm_plane_helper_cleanup_fb ending in deadlock.
>>> Also remove useless WARN_ON
>> Well it is good that you pointed this out, but there are more problems
>> than just waiting wile the BO is reserved here.
>>
>>> Signed-off-by: Andrey Grodzovsky 
>>> ---
>>>   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19
>>> +--
>>>   1 file changed, 13 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> index acc4ff8..f8dec36 100644
>>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> @@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct
>>> drm_atomic_state *state,
>>>    */
>>>   abo = gem_to_amdgpu_bo(fb->obj[0]);
>>>   r = amdgpu_bo_reserve(abo, true);
>> Well why do we reserve the BO in the first place? As the name
>> indicates reservation_object_wait_timeout_rcu() just uses rcu to wait
>> for the BO to be idle, no need to actually reserve it at all.
> That a good point, I honestly can't remember any more why it's here...
> It does look unnecessary given that we are not validating the BO here.
 I think we reserve it to grab the tiling flags to update the plane
 address (and some other bits of the plane state with a patch I'm
 currently working on). I'm guessing that we still need the reservation
 so nothing else tries to grab it after it becomes idle, but I'm not
 overly familiar with that bit.
>>> The BO is already pinned at this point, isn't it? That's enough to
>>> prevent it from moving, no reservation needed.
>>
>> Yes, agree.
>>
>> Andrey
>>
>>
>>>
> I'm not worried about the buffer moving so much as I am about userspace
> changing the tiling flags by calling the IOCTL. Can that happen when
> it's pinned?
>
> Nicholas Kazlauskas

In any case this has nothing to do with reserving the BO.

Andrey


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amd/display: Fix deadlock with display during hanged ring recovery.

2019-02-14 Thread Michel Dänzer
On 2019-02-14 5:57 p.m., Kazlauskas, Nicholas wrote:
> On 2/14/19 11:47 AM, Grodzovsky, Andrey wrote:
>>
>> On 2/14/19 11:07 AM, Michel Dänzer wrote:
>>> On 2019-02-14 4:54 p.m., Kazlauskas, Nicholas wrote:
 On 2/14/19 10:42 AM, Grodzovsky, Andrey wrote:
> On 2/14/19 4:05 AM, Christian König wrote:
>> Am 13.02.19 um 19:58 schrieb Andrey Grodzovsky:
>>> When ring hang happens amdgpu_dm_commit_planes during flip is holding
>>> the BO reserved and then stack waiting for fences to signal in
>>> reservation_object_wait_timeout_rcu (which won't signal because there
>>> was a hnag). Then when we try to shutdown display block during reset
>>> recovery from drm_atomic_helper_suspend we also try to reserve the BO
>>> from dm_plane_helper_cleanup_fb ending in deadlock.
>>> Also remove useless WARN_ON
>> Well it is good that you pointed this out, but there are more problems
>> than just waiting wile the BO is reserved here.
>>
>>> Signed-off-by: Andrey Grodzovsky 
>>> ---
>>>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19
>>> +--
>>>  1 file changed, 13 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> index acc4ff8..f8dec36 100644
>>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> @@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct
>>> drm_atomic_state *state,
>>>   */
>>>  abo = gem_to_amdgpu_bo(fb->obj[0]);
>>>  r = amdgpu_bo_reserve(abo, true);
>> Well why do we reserve the BO in the first place? As the name
>> indicates reservation_object_wait_timeout_rcu() just uses rcu to wait
>> for the BO to be idle, no need to actually reserve it at all.
> That a good point, I honestly can't remember any more why it's here...
> It does look unnecessary given that we are not validating the BO here.
 I think we reserve it to grab the tiling flags to update the plane
 address (and some other bits of the plane state with a patch I'm
 currently working on). I'm guessing that we still need the reservation
 so nothing else tries to grab it after it becomes idle, but I'm not
 overly familiar with that bit.
>>> The BO is already pinned at this point, isn't it? That's enough to
>>> prevent it from moving, no reservation needed.
>>
>>
>> Yes, agree.
>>
>> Andrey
>>
>>
>>>
>>>
> 
> I'm not worried about the buffer moving so much as I am about userspace 
> changing the tiling flags by calling the IOCTL. Can that happen when 
> it's pinned?

Even assuming it can theoretically, that would be a "doctor, it hurts if
I do this" kind of userspace problem. :) Userspace has to set the
correct tiling flags before submitting the flip, and then can't change
them until the BO stops being scanned out.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amd/display: Fix deadlock with display during hanged ring recovery.

2019-02-14 Thread Kazlauskas, Nicholas
On 2/14/19 11:47 AM, Grodzovsky, Andrey wrote:
> 
> On 2/14/19 11:07 AM, Michel Dänzer wrote:
>> On 2019-02-14 4:54 p.m., Kazlauskas, Nicholas wrote:
>>> On 2/14/19 10:42 AM, Grodzovsky, Andrey wrote:
 On 2/14/19 4:05 AM, Christian König wrote:
> Am 13.02.19 um 19:58 schrieb Andrey Grodzovsky:
>> When ring hang happens amdgpu_dm_commit_planes during flip is holding
>> the BO reserved and then stack waiting for fences to signal in
>> reservation_object_wait_timeout_rcu (which won't signal because there
>> was a hnag). Then when we try to shutdown display block during reset
>> recovery from drm_atomic_helper_suspend we also try to reserve the BO
>> from dm_plane_helper_cleanup_fb ending in deadlock.
>> Also remove useless WARN_ON
> Well it is good that you pointed this out, but there are more problems
> than just waiting wile the BO is reserved here.
>
>> Signed-off-by: Andrey Grodzovsky 
>> ---
>>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19
>> +--
>>  1 file changed, 13 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> index acc4ff8..f8dec36 100644
>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> @@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct
>> drm_atomic_state *state,
>>   */
>>  abo = gem_to_amdgpu_bo(fb->obj[0]);
>>  r = amdgpu_bo_reserve(abo, true);
> Well why do we reserve the BO in the first place? As the name
> indicates reservation_object_wait_timeout_rcu() just uses rcu to wait
> for the BO to be idle, no need to actually reserve it at all.
 That a good point, I honestly can't remember any more why it's here...
 It does look unnecessary given that we are not validating the BO here.
>>> I think we reserve it to grab the tiling flags to update the plane
>>> address (and some other bits of the plane state with a patch I'm
>>> currently working on). I'm guessing that we still need the reservation
>>> so nothing else tries to grab it after it becomes idle, but I'm not
>>> overly familiar with that bit.
>> The BO is already pinned at this point, isn't it? That's enough to
>> prevent it from moving, no reservation needed.
> 
> 
> Yes, agree.
> 
> Andrey
> 
> 
>>
>>

I'm not worried about the buffer moving so much as I am about userspace 
changing the tiling flags by calling the IOCTL. Can that happen when 
it's pinned?

Nicholas Kazlauskas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amd/display: Fix deadlock with display during hanged ring recovery.

2019-02-14 Thread Grodzovsky, Andrey

On 2/14/19 11:07 AM, Michel Dänzer wrote:
> On 2019-02-14 4:54 p.m., Kazlauskas, Nicholas wrote:
>> On 2/14/19 10:42 AM, Grodzovsky, Andrey wrote:
>>> On 2/14/19 4:05 AM, Christian König wrote:
 Am 13.02.19 um 19:58 schrieb Andrey Grodzovsky:
> When ring hang happens amdgpu_dm_commit_planes during flip is holding
> the BO reserved and then stack waiting for fences to signal in
> reservation_object_wait_timeout_rcu (which won't signal because there
> was a hnag). Then when we try to shutdown display block during reset
> recovery from drm_atomic_helper_suspend we also try to reserve the BO
> from dm_plane_helper_cleanup_fb ending in deadlock.
> Also remove useless WARN_ON
 Well it is good that you pointed this out, but there are more problems
 than just waiting wile the BO is reserved here.

> Signed-off-by: Andrey Grodzovsky 
> ---
>     drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19
> +--
>     1 file changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index acc4ff8..f8dec36 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct
> drm_atomic_state *state,
>      */
>     abo = gem_to_amdgpu_bo(fb->obj[0]);
>     r = amdgpu_bo_reserve(abo, true);
 Well why do we reserve the BO in the first place? As the name
 indicates reservation_object_wait_timeout_rcu() just uses rcu to wait
 for the BO to be idle, no need to actually reserve it at all.
>>> That a good point, I honestly can't remember any more why it's here...
>>> It does look unnecessary given that we are not validating the BO here.
>> I think we reserve it to grab the tiling flags to update the plane
>> address (and some other bits of the plane state with a patch I'm
>> currently working on). I'm guessing that we still need the reservation
>> so nothing else tries to grab it after it becomes idle, but I'm not
>> overly familiar with that bit.
> The BO is already pinned at this point, isn't it? That's enough to
> prevent it from moving, no reservation needed.


Yes, agree.

Andrey


>
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amd/display: Fix deadlock with display during hanged ring recovery.

2019-02-14 Thread Michel Dänzer
On 2019-02-14 4:54 p.m., Kazlauskas, Nicholas wrote:
> On 2/14/19 10:42 AM, Grodzovsky, Andrey wrote:
>> On 2/14/19 4:05 AM, Christian König wrote:
>>> Am 13.02.19 um 19:58 schrieb Andrey Grodzovsky:
 When ring hang happens amdgpu_dm_commit_planes during flip is holding
 the BO reserved and then stack waiting for fences to signal in
 reservation_object_wait_timeout_rcu (which won't signal because there
 was a hnag). Then when we try to shutdown display block during reset
 recovery from drm_atomic_helper_suspend we also try to reserve the BO
 from dm_plane_helper_cleanup_fb ending in deadlock.
 Also remove useless WARN_ON
>>>
>>> Well it is good that you pointed this out, but there are more problems
>>> than just waiting wile the BO is reserved here.
>>>

 Signed-off-by: Andrey Grodzovsky 
 ---
    drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19
 +--
    1 file changed, 13 insertions(+), 6 deletions(-)

 diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
 b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
 index acc4ff8..f8dec36 100644
 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
 +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
 @@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct
 drm_atomic_state *state,
     */
    abo = gem_to_amdgpu_bo(fb->obj[0]);
    r = amdgpu_bo_reserve(abo, true);
>>>
>>> Well why do we reserve the BO in the first place? As the name
>>> indicates reservation_object_wait_timeout_rcu() just uses rcu to wait
>>> for the BO to be idle, no need to actually reserve it at all.
>>
>> That a good point, I honestly can't remember any more why it's here...
>> It does look unnecessary given that we are not validating the BO here.
> 
> I think we reserve it to grab the tiling flags to update the plane 
> address (and some other bits of the plane state with a patch I'm 
> currently working on). I'm guessing that we still need the reservation 
> so nothing else tries to grab it after it becomes idle, but I'm not 
> overly familiar with that bit.

The BO is already pinned at this point, isn't it? That's enough to
prevent it from moving, no reservation needed.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amd/display: Fix deadlock with display during hanged ring recovery.

2019-02-14 Thread Grodzovsky, Andrey

On 2/14/19 10:54 AM, Kazlauskas, Nicholas wrote:
> On 2/14/19 10:42 AM, Grodzovsky, Andrey wrote:
>> On 2/14/19 4:05 AM, Christian König wrote:
>>> Am 13.02.19 um 19:58 schrieb Andrey Grodzovsky:
 When ring hang happens amdgpu_dm_commit_planes during flip is holding
 the BO reserved and then stack waiting for fences to signal in
 reservation_object_wait_timeout_rcu (which won't signal because there
 was a hnag). Then when we try to shutdown display block during reset
 recovery from drm_atomic_helper_suspend we also try to reserve the BO
 from dm_plane_helper_cleanup_fb ending in deadlock.
 Also remove useless WARN_ON
>>> Well it is good that you pointed this out, but there are more problems
>>> than just waiting wile the BO is reserved here.
>>>
 Signed-off-by: Andrey Grodzovsky 
 ---
     drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19
 +--
     1 file changed, 13 insertions(+), 6 deletions(-)

 diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
 b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
 index acc4ff8..f8dec36 100644
 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
 +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
 @@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct
 drm_atomic_state *state,
      */
     abo = gem_to_amdgpu_bo(fb->obj[0]);
     r = amdgpu_bo_reserve(abo, true);
>>> Well why do we reserve the BO in the first place? As the name
>>> indicates reservation_object_wait_timeout_rcu() just uses rcu to wait
>>> for the BO to be idle, no need to actually reserve it at all.
>> That a good point, I honestly can't remember any more why it's here...
>> It does look unnecessary given that we are not validating the BO here.
> I think we reserve it to grab the tiling flags to update the plane
> address (and some other bits of the plane state with a patch I'm
> currently working on). I'm guessing that we still need the reservation
> so nothing else tries to grab it after it becomes idle, but I'm not
> overly familiar with that bit.
>
> Nicholas Kazlauskas

I don't think so, first of all we only hold the reservation during the 
wait for fences and then release it, secondly, BO reservation is a thin 
wrapper around ttm_bo_reserve which just protects the ttm BO internal 
structures and removes the BO from eviction list.

I think you can try to remove it and run some stress testing to see 
there is no visible corruption.

Andrey


>
>>
 -    if (unlikely(r != 0)) {
 +    if (unlikely(r != 0))
     DRM_ERROR("failed to reserve buffer before flip\n");
 -    WARN_ON(1);
 -    }
     -    /* Wait for all fences on this FB */
 - WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true,
 false,
 - MAX_SCHEDULE_TIMEOUT) < 0);
 +    /*
 + * Wait for all fences on this FB. Do limited wait to avoid
 + * deadlock during GPU reset when this fence will not
 signal
 + * but we hold reservation lock for the BO.
 + */
 +    r = reservation_object_wait_timeout_rcu(abo->tbo.resv,
 +    true, false,
>>> Does this waiting happen in a work item or process context? If it's
>>> process context we should actually try to wait interruptible here.
>>
>> Work item only.
>>
>> Andrey
>>
>>
>>> Regards,
>>> Christian.
>>>
 + msecs_to_jiffies(5000));
 +    if (unlikely(r == 0))
 +    DRM_ERROR("Waiting for fences timed out.");
 +
 +
       amdgpu_bo_get_tiling_flags(abo, _flags);
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amd/display: Fix deadlock with display during hanged ring recovery.

2019-02-14 Thread Kazlauskas, Nicholas
On 2/14/19 10:42 AM, Grodzovsky, Andrey wrote:
> 
> On 2/14/19 4:05 AM, Christian König wrote:
>> Am 13.02.19 um 19:58 schrieb Andrey Grodzovsky:
>>> When ring hang happens amdgpu_dm_commit_planes during flip is holding
>>> the BO reserved and then stack waiting for fences to signal in
>>> reservation_object_wait_timeout_rcu (which won't signal because there
>>> was a hnag). Then when we try to shutdown display block during reset
>>> recovery from drm_atomic_helper_suspend we also try to reserve the BO
>>> from dm_plane_helper_cleanup_fb ending in deadlock.
>>> Also remove useless WARN_ON
>>
>> Well it is good that you pointed this out, but there are more problems
>> than just waiting wile the BO is reserved here.
>>
>>>
>>> Signed-off-by: Andrey Grodzovsky 
>>> ---
>>>    drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19
>>> +--
>>>    1 file changed, 13 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> index acc4ff8..f8dec36 100644
>>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> @@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct
>>> drm_atomic_state *state,
>>>     */
>>>    abo = gem_to_amdgpu_bo(fb->obj[0]);
>>>    r = amdgpu_bo_reserve(abo, true);
>>
>> Well why do we reserve the BO in the first place? As the name
>> indicates reservation_object_wait_timeout_rcu() just uses rcu to wait
>> for the BO to be idle, no need to actually reserve it at all.
> 
> That a good point, I honestly can't remember any more why it's here...
> It does look unnecessary given that we are not validating the BO here.

I think we reserve it to grab the tiling flags to update the plane 
address (and some other bits of the plane state with a patch I'm 
currently working on). I'm guessing that we still need the reservation 
so nothing else tries to grab it after it becomes idle, but I'm not 
overly familiar with that bit.

Nicholas Kazlauskas

> 
> 
>>
>>> -    if (unlikely(r != 0)) {
>>> +    if (unlikely(r != 0))
>>>    DRM_ERROR("failed to reserve buffer before flip\n");
>>> -    WARN_ON(1);
>>> -    }
>>>    -    /* Wait for all fences on this FB */
>>> - WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true,
>>> false,
>>> - MAX_SCHEDULE_TIMEOUT) < 0);
>>> +    /*
>>> + * Wait for all fences on this FB. Do limited wait to avoid
>>> + * deadlock during GPU reset when this fence will not
>>> signal
>>> + * but we hold reservation lock for the BO.
>>> + */
>>> +    r = reservation_object_wait_timeout_rcu(abo->tbo.resv,
>>> +    true, false,
>>
>> Does this waiting happen in a work item or process context? If it's
>> process context we should actually try to wait interruptible here.
> 
> 
> Work item only.
> 
> Andrey
> 
> 
>>
>> Regards,
>> Christian.
>>
>>> + msecs_to_jiffies(5000));
>>> +    if (unlikely(r == 0))
>>> +    DRM_ERROR("Waiting for fences timed out.");
>>> +
>>> +
>>>      amdgpu_bo_get_tiling_flags(abo, _flags);
>>

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amd/display: Fix deadlock with display during hanged ring recovery.

2019-02-14 Thread Grodzovsky, Andrey

On 2/14/19 4:05 AM, Christian König wrote:
> Am 13.02.19 um 19:58 schrieb Andrey Grodzovsky:
>> When ring hang happens amdgpu_dm_commit_planes during flip is holding
>> the BO reserved and then stack waiting for fences to signal in
>> reservation_object_wait_timeout_rcu (which won't signal because there
>> was a hnag). Then when we try to shutdown display block during reset
>> recovery from drm_atomic_helper_suspend we also try to reserve the BO
>> from dm_plane_helper_cleanup_fb ending in deadlock.
>> Also remove useless WARN_ON
>
> Well it is good that you pointed this out, but there are more problems 
> than just waiting wile the BO is reserved here.
>
>>
>> Signed-off-by: Andrey Grodzovsky 
>> ---
>>   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19 
>> +--
>>   1 file changed, 13 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> index acc4ff8..f8dec36 100644
>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> @@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct 
>> drm_atomic_state *state,
>>    */
>>   abo = gem_to_amdgpu_bo(fb->obj[0]);
>>   r = amdgpu_bo_reserve(abo, true);
>
> Well why do we reserve the BO in the first place? As the name 
> indicates reservation_object_wait_timeout_rcu() just uses rcu to wait 
> for the BO to be idle, no need to actually reserve it at all.

That a good point, I honestly can't remember any more why it's here... 
It does look unnecessary given that we are not validating the BO here.


>
>> -    if (unlikely(r != 0)) {
>> +    if (unlikely(r != 0))
>>   DRM_ERROR("failed to reserve buffer before flip\n");
>> -    WARN_ON(1);
>> -    }
>>   -    /* Wait for all fences on this FB */
>> - WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true, 
>> false,
>> - MAX_SCHEDULE_TIMEOUT) < 0);
>> +    /*
>> + * Wait for all fences on this FB. Do limited wait to avoid
>> + * deadlock during GPU reset when this fence will not 
>> signal
>> + * but we hold reservation lock for the BO.
>> + */
>> +    r = reservation_object_wait_timeout_rcu(abo->tbo.resv,
>> +    true, false,
>
> Does this waiting happen in a work item or process context? If it's 
> process context we should actually try to wait interruptible here.


Work item only.

Andrey


>
> Regards,
> Christian.
>
>> + msecs_to_jiffies(5000));
>> +    if (unlikely(r == 0))
>> +    DRM_ERROR("Waiting for fences timed out.");
>> +
>> +
>>     amdgpu_bo_get_tiling_flags(abo, _flags);
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amd/display: Fix deadlock with display during hanged ring recovery.

2019-02-14 Thread Christian König via amd-gfx

Am 13.02.19 um 19:58 schrieb Andrey Grodzovsky:

When ring hang happens amdgpu_dm_commit_planes during flip is holding
the BO reserved and then stack waiting for fences to signal in
reservation_object_wait_timeout_rcu (which won't signal because there
was a hnag). Then when we try to shutdown display block during reset
recovery from drm_atomic_helper_suspend we also try to reserve the BO
from dm_plane_helper_cleanup_fb ending in deadlock.
Also remove useless WARN_ON


Well it is good that you pointed this out, but there are more problems 
than just waiting wile the BO is reserved here.




Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19 +--
  1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index acc4ff8..f8dec36 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct 
drm_atomic_state *state,
 */
abo = gem_to_amdgpu_bo(fb->obj[0]);
r = amdgpu_bo_reserve(abo, true);


Well why do we reserve the BO in the first place? As the name indicates 
reservation_object_wait_timeout_rcu() just uses rcu to wait for the BO 
to be idle, no need to actually reserve it at all.



-   if (unlikely(r != 0)) {
+   if (unlikely(r != 0))
DRM_ERROR("failed to reserve buffer before 
flip\n");
-   WARN_ON(1);
-   }
  
-			/* Wait for all fences on this FB */

-   
WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true, false,
-  
 MAX_SCHEDULE_TIMEOUT) < 0);
+   /*
+* Wait for all fences on this FB. Do limited wait to 
avoid
+* deadlock during GPU reset when this fence will not 
signal
+* but we hold reservation lock for the BO.
+*/
+   r = reservation_object_wait_timeout_rcu(abo->tbo.resv,
+   true, false,


Does this waiting happen in a work item or process context? If it's 
process context we should actually try to wait interruptible here.


Regards,
Christian.


+   
msecs_to_jiffies(5000));
+   if (unlikely(r == 0))
+   DRM_ERROR("Waiting for fences timed out.");
+
+
  
  			amdgpu_bo_get_tiling_flags(abo, _flags);
  


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amd/display: Fix deadlock with display during hanged ring recovery.

2019-02-13 Thread Kazlauskas, Nicholas
On 2/13/19 2:21 PM, Grodzovsky, Andrey wrote:
> 
> On 2/13/19 2:16 PM, Kazlauskas, Nicholas wrote:
>> On 2/13/19 2:10 PM, Grodzovsky, Andrey wrote:
>>> On 2/13/19 2:00 PM, Kazlauskas, Nicholas wrote:
 On 2/13/19 1:58 PM, Andrey Grodzovsky wrote:
> When ring hang happens amdgpu_dm_commit_planes during flip is holding
> the BO reserved and then stack waiting for fences to signal in
> reservation_object_wait_timeout_rcu (which won't signal because there
> was a hnag). Then when we try to shutdown display block during reset
> recovery from drm_atomic_helper_suspend we also try to reserve the BO
> from dm_plane_helper_cleanup_fb ending in deadlock.
> Also remove useless WARN_ON
>
> Signed-off-by: Andrey Grodzovsky 
> ---
>   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19 
> +--
>   1 file changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index acc4ff8..f8dec36 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct 
> drm_atomic_state *state,
>*/
>   abo = gem_to_amdgpu_bo(fb->obj[0]);
>   r = amdgpu_bo_reserve(abo, true);
> - if (unlikely(r != 0)) {
> + if (unlikely(r != 0))
>   DRM_ERROR("failed to reserve buffer 
> before flip\n");
> - WARN_ON(1);
> - }
>   
> - /* Wait for all fences on this FB */
> - 
> WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true, false,
> - 
> MAX_SCHEDULE_TIMEOUT) < 0);
> + /*
> +  * Wait for all fences on this FB. Do limited wait to 
> avoid
> +  * deadlock during GPU reset when this fence will not 
> signal
> +  * but we hold reservation lock for the BO.
> +  */
> + r = reservation_object_wait_timeout_rcu(abo->tbo.resv,
> + true, false,
> + 
> msecs_to_jiffies(5000));
> + if (unlikely(r == 0))
> + DRM_ERROR("Waiting for fences timed out.");
> +
> +
>   
>   amdgpu_bo_get_tiling_flags(abo, _flags);
>   
>
 Is it safe that we're just continuing like this? It's probably better to
 just unreserve the buffer and continue to the next plane, no?

 Nicholas Kazlauskas
>>> As far as I see it should be safe as you are simply flipping to a buffer
>>> for which rendering hasn't finished (or stack actually in this case) so
>>> you might see visual corruption but that the least of your problems if
>>> after 5s the BO still not finalized for presentation, the system is
>>> already probably in very bad shape. Also, in case we do want to  do
>>> error handling we should also take care of  amdgpu_bo_reserve failure
>>> just before that.
>>>
>>> Andrey
>>>
>>>
>> Yeah, I guess this whole blocks needs to be cleaned up in that case.
>> This is a good first step at least. Technically
>> reservation_object_wait_timeout_rcu will return < 0 when it's been
>> interrupted too as an error code but I guess that will just be silently
>> ignored here.
>>
>> If you want you can change the condition to:
>>
>> if (unlikely(r >= 0))
>> DRM_ERROR("Waiting for FB fence failed: id=%d res=%d\n",
>> plane->id, r);
> 
> 
> Note that reservation_object_wait_timeout_rcu has a flag 'bool intr: if
> true, do interruptible wait', we set it to false since the code in case
> of flip runs form the kernel worker thread and not from IOCTL meaning we
> are not in user mode context and hence are not going to recieve user
> signals (cannot be interrupted). So the only values we can recieve here
> are either 0 for time out or val > 0 for wait that completed before time
> out value.
> 
> Andrey

Oh, right.

This patch is good as-is then.

Reviewed-by: Nicholas Kazlauskas 

> 
>>
>> But with or without that change this patch is:
>>
>> Reviewed-by: Nicholas Kazlauskas 
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> 

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amd/display: Fix deadlock with display during hanged ring recovery.

2019-02-13 Thread Grodzovsky, Andrey

On 2/13/19 2:16 PM, Kazlauskas, Nicholas wrote:
> On 2/13/19 2:10 PM, Grodzovsky, Andrey wrote:
>> On 2/13/19 2:00 PM, Kazlauskas, Nicholas wrote:
>>> On 2/13/19 1:58 PM, Andrey Grodzovsky wrote:
 When ring hang happens amdgpu_dm_commit_planes during flip is holding
 the BO reserved and then stack waiting for fences to signal in
 reservation_object_wait_timeout_rcu (which won't signal because there
 was a hnag). Then when we try to shutdown display block during reset
 recovery from drm_atomic_helper_suspend we also try to reserve the BO
 from dm_plane_helper_cleanup_fb ending in deadlock.
 Also remove useless WARN_ON

 Signed-off-by: Andrey Grodzovsky 
 ---
  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19 
 +--
  1 file changed, 13 insertions(+), 6 deletions(-)

 diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
 b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
 index acc4ff8..f8dec36 100644
 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
 +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
 @@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct 
 drm_atomic_state *state,
 */
abo = gem_to_amdgpu_bo(fb->obj[0]);
r = amdgpu_bo_reserve(abo, true);
 -  if (unlikely(r != 0)) {
 +  if (unlikely(r != 0))
DRM_ERROR("failed to reserve buffer 
 before flip\n");
 -  WARN_ON(1);
 -  }
  
 -  /* Wait for all fences on this FB */
 -  
 WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true, false,
 -  
 MAX_SCHEDULE_TIMEOUT) < 0);
 +  /*
 +   * Wait for all fences on this FB. Do limited wait to 
 avoid
 +   * deadlock during GPU reset when this fence will not 
 signal
 +   * but we hold reservation lock for the BO.
 +   */
 +  r = reservation_object_wait_timeout_rcu(abo->tbo.resv,
 +  true, false,
 +  
 msecs_to_jiffies(5000));
 +  if (unlikely(r == 0))
 +  DRM_ERROR("Waiting for fences timed out.");
 +
 +
  
amdgpu_bo_get_tiling_flags(abo, _flags);
  

>>> Is it safe that we're just continuing like this? It's probably better to
>>> just unreserve the buffer and continue to the next plane, no?
>>>
>>> Nicholas Kazlauskas
>> As far as I see it should be safe as you are simply flipping to a buffer
>> for which rendering hasn't finished (or stack actually in this case) so
>> you might see visual corruption but that the least of your problems if
>> after 5s the BO still not finalized for presentation, the system is
>> already probably in very bad shape. Also, in case we do want to  do
>> error handling we should also take care of  amdgpu_bo_reserve failure
>> just before that.
>>
>> Andrey
>>
>>
> Yeah, I guess this whole blocks needs to be cleaned up in that case.
> This is a good first step at least. Technically
> reservation_object_wait_timeout_rcu will return < 0 when it's been
> interrupted too as an error code but I guess that will just be silently
> ignored here.
>
> If you want you can change the condition to:
>
> if (unlikely(r >= 0))
>DRM_ERROR("Waiting for FB fence failed: id=%d res=%d\n",
> plane->id, r);


Note that reservation_object_wait_timeout_rcu has a flag 'bool intr: if 
true, do interruptible wait', we set it to false since the code in case 
of flip runs form the kernel worker thread and not from IOCTL meaning we 
are not in user mode context and hence are not going to recieve user 
signals (cannot be interrupted). So the only values we can recieve here 
are either 0 for time out or val > 0 for wait that completed before time 
out value.

Andrey

>
> But with or without that change this patch is:
>
> Reviewed-by: Nicholas Kazlauskas 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amd/display: Fix deadlock with display during hanged ring recovery.

2019-02-13 Thread Kazlauskas, Nicholas
On 2/13/19 2:10 PM, Grodzovsky, Andrey wrote:
> 
> On 2/13/19 2:00 PM, Kazlauskas, Nicholas wrote:
>> On 2/13/19 1:58 PM, Andrey Grodzovsky wrote:
>>> When ring hang happens amdgpu_dm_commit_planes during flip is holding
>>> the BO reserved and then stack waiting for fences to signal in
>>> reservation_object_wait_timeout_rcu (which won't signal because there
>>> was a hnag). Then when we try to shutdown display block during reset
>>> recovery from drm_atomic_helper_suspend we also try to reserve the BO
>>> from dm_plane_helper_cleanup_fb ending in deadlock.
>>> Also remove useless WARN_ON
>>>
>>> Signed-off-by: Andrey Grodzovsky 
>>> ---
>>> drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19 
>>> +--
>>> 1 file changed, 13 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
>>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> index acc4ff8..f8dec36 100644
>>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> @@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct 
>>> drm_atomic_state *state,
>>>  */
>>> abo = gem_to_amdgpu_bo(fb->obj[0]);
>>> r = amdgpu_bo_reserve(abo, true);
>>> -   if (unlikely(r != 0)) {
>>> +   if (unlikely(r != 0))
>>> DRM_ERROR("failed to reserve buffer 
>>> before flip\n");
>>> -   WARN_ON(1);
>>> -   }
>>> 
>>> -   /* Wait for all fences on this FB */
>>> -   
>>> WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true, false,
>>> -   
>>> MAX_SCHEDULE_TIMEOUT) < 0);
>>> +   /*
>>> +* Wait for all fences on this FB. Do limited wait to 
>>> avoid
>>> +* deadlock during GPU reset when this fence will not 
>>> signal
>>> +* but we hold reservation lock for the BO.
>>> +*/
>>> +   r = reservation_object_wait_timeout_rcu(abo->tbo.resv,
>>> +   true, false,
>>> +   
>>> msecs_to_jiffies(5000));
>>> +   if (unlikely(r == 0))
>>> +   DRM_ERROR("Waiting for fences timed out.");
>>> +
>>> +
>>> 
>>> amdgpu_bo_get_tiling_flags(abo, _flags);
>>> 
>>>
>> Is it safe that we're just continuing like this? It's probably better to
>> just unreserve the buffer and continue to the next plane, no?
>>
>> Nicholas Kazlauskas
> 
> As far as I see it should be safe as you are simply flipping to a buffer
> for which rendering hasn't finished (or stack actually in this case) so
> you might see visual corruption but that the least of your problems if
> after 5s the BO still not finalized for presentation, the system is
> already probably in very bad shape. Also, in case we do want to  do
> error handling we should also take care of  amdgpu_bo_reserve failure
> just before that.
> 
> Andrey
> 
> 

Yeah, I guess this whole blocks needs to be cleaned up in that case. 
This is a good first step at least. Technically 
reservation_object_wait_timeout_rcu will return < 0 when it's been 
interrupted too as an error code but I guess that will just be silently 
ignored here.

If you want you can change the condition to:

if (unlikely(r >= 0))
  DRM_ERROR("Waiting for FB fence failed: id=%d res=%d\n", 
plane->id, r);

But with or without that change this patch is:

Reviewed-by: Nicholas Kazlauskas 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amd/display: Fix deadlock with display during hanged ring recovery.

2019-02-13 Thread Grodzovsky, Andrey

On 2/13/19 2:00 PM, Kazlauskas, Nicholas wrote:
> On 2/13/19 1:58 PM, Andrey Grodzovsky wrote:
>> When ring hang happens amdgpu_dm_commit_planes during flip is holding
>> the BO reserved and then stack waiting for fences to signal in
>> reservation_object_wait_timeout_rcu (which won't signal because there
>> was a hnag). Then when we try to shutdown display block during reset
>> recovery from drm_atomic_helper_suspend we also try to reserve the BO
>> from dm_plane_helper_cleanup_fb ending in deadlock.
>> Also remove useless WARN_ON
>>
>> Signed-off-by: Andrey Grodzovsky 
>> ---
>>drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19 +--
>>1 file changed, 13 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> index acc4ff8..f8dec36 100644
>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> @@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct 
>> drm_atomic_state *state,
>>   */
>>  abo = gem_to_amdgpu_bo(fb->obj[0]);
>>  r = amdgpu_bo_reserve(abo, true);
>> -if (unlikely(r != 0)) {
>> +if (unlikely(r != 0))
>>  DRM_ERROR("failed to reserve buffer before 
>> flip\n");
>> -WARN_ON(1);
>> -}
>>
>> -/* Wait for all fences on this FB */
>> -
>> WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true, false,
>> -
>> MAX_SCHEDULE_TIMEOUT) < 0);
>> +/*
>> + * Wait for all fences on this FB. Do limited wait to 
>> avoid
>> + * deadlock during GPU reset when this fence will not 
>> signal
>> + * but we hold reservation lock for the BO.
>> + */
>> +r = reservation_object_wait_timeout_rcu(abo->tbo.resv,
>> +true, false,
>> +
>> msecs_to_jiffies(5000));
>> +if (unlikely(r == 0))
>> +DRM_ERROR("Waiting for fences timed out.");
>> +
>> +
>>
>>  amdgpu_bo_get_tiling_flags(abo, _flags);
>>
>>
> Is it safe that we're just continuing like this? It's probably better to
> just unreserve the buffer and continue to the next plane, no?
>
> Nicholas Kazlauskas

As far as I see it should be safe as you are simply flipping to a buffer 
for which rendering hasn't finished (or stack actually in this case) so 
you might see visual corruption but that the least of your problems if 
after 5s the BO still not finalized for presentation, the system is 
already probably in very bad shape. Also, in case we do want to  do 
error handling we should also take care of  amdgpu_bo_reserve failure 
just before that.

Andrey


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amd/display: Fix deadlock with display during hanged ring recovery.

2019-02-13 Thread Kazlauskas, Nicholas
On 2/13/19 1:58 PM, Andrey Grodzovsky wrote:
> When ring hang happens amdgpu_dm_commit_planes during flip is holding
> the BO reserved and then stack waiting for fences to signal in
> reservation_object_wait_timeout_rcu (which won't signal because there
> was a hnag). Then when we try to shutdown display block during reset
> recovery from drm_atomic_helper_suspend we also try to reserve the BO
> from dm_plane_helper_cleanup_fb ending in deadlock.
> Also remove useless WARN_ON
> 
> Signed-off-by: Andrey Grodzovsky 
> ---
>   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19 +--
>   1 file changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index acc4ff8..f8dec36 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct 
> drm_atomic_state *state,
>*/
>   abo = gem_to_amdgpu_bo(fb->obj[0]);
>   r = amdgpu_bo_reserve(abo, true);
> - if (unlikely(r != 0)) {
> + if (unlikely(r != 0))
>   DRM_ERROR("failed to reserve buffer before 
> flip\n");
> - WARN_ON(1);
> - }
>   
> - /* Wait for all fences on this FB */
> - 
> WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true, false,
> - 
> MAX_SCHEDULE_TIMEOUT) < 0);
> + /*
> +  * Wait for all fences on this FB. Do limited wait to 
> avoid
> +  * deadlock during GPU reset when this fence will not 
> signal
> +  * but we hold reservation lock for the BO.
> +  */
> + r = reservation_object_wait_timeout_rcu(abo->tbo.resv,
> + true, false,
> + 
> msecs_to_jiffies(5000));
> + if (unlikely(r == 0))
> + DRM_ERROR("Waiting for fences timed out.");
> +
> +
>   
>   amdgpu_bo_get_tiling_flags(abo, _flags);
>   
> 

Is it safe that we're just continuing like this? It's probably better to 
just unreserve the buffer and continue to the next plane, no?

Nicholas Kazlauskas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amd/display: Fix deadlock with display during hanged ring recovery.

2019-02-13 Thread Andrey Grodzovsky
When ring hang happens amdgpu_dm_commit_planes during flip is holding
the BO reserved and then stack waiting for fences to signal in
reservation_object_wait_timeout_rcu (which won't signal because there
was a hnag). Then when we try to shutdown display block during reset
recovery from drm_atomic_helper_suspend we also try to reserve the BO
from dm_plane_helper_cleanup_fb ending in deadlock.
Also remove useless WARN_ON

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index acc4ff8..f8dec36 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4802,14 +4802,21 @@ static void amdgpu_dm_commit_planes(struct 
drm_atomic_state *state,
 */
abo = gem_to_amdgpu_bo(fb->obj[0]);
r = amdgpu_bo_reserve(abo, true);
-   if (unlikely(r != 0)) {
+   if (unlikely(r != 0))
DRM_ERROR("failed to reserve buffer before 
flip\n");
-   WARN_ON(1);
-   }
 
-   /* Wait for all fences on this FB */
-   
WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true, false,
-   
MAX_SCHEDULE_TIMEOUT) < 0);
+   /*
+* Wait for all fences on this FB. Do limited wait to 
avoid
+* deadlock during GPU reset when this fence will not 
signal
+* but we hold reservation lock for the BO.
+*/
+   r = reservation_object_wait_timeout_rcu(abo->tbo.resv,
+   true, false,
+   
msecs_to_jiffies(5000));
+   if (unlikely(r == 0))
+   DRM_ERROR("Waiting for fences timed out.");
+
+
 
amdgpu_bo_get_tiling_flags(abo, _flags);
 
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx