Re: [PATCH v2] drm: Get ref on CRTC commit object when waiting for flip_done

2018-10-18 Thread Wentland, Harry
On 2018-10-18 1:38 p.m., Wentland, Harry wrote:
> On 2018-10-16 11:48 a.m., Alex Deucher wrote:
>> On Tue, Oct 16, 2018 at 11:00 AM Li, Sun peng (Leo)  
>> wrote:
>>>
>>>
>>>
>>> On 2018-10-16 08:33 AM, Daniel Vetter wrote:
 On Mon, Oct 15, 2018 at 09:46:40AM -0400, sunpeng...@amd.com wrote:
> From: Leo Li 
>
> This fixes a general protection fault, caused by accessing the contents
> of a flip_done completion object that has already been freed. It occurs
> due to the preemption of a non-blocking commit worker thread W by
> another commit thread X. X continues to clear its atomic state at the
> end, destroying the CRTC commit object that W still needs. Switching
> back to W and accessing the commit objects then leads to bad results.
>
> Worker W becomes preemptable when waiting for flip_done to complete. At
> this point, a frequently occurring commit thread X can take over. Here's
> an example where W is a worker thread that flips on both CRTCs, and X
> does a legacy cursor update on both CRTCs:
>
>  ...
>   1. W does flip work
>   2. W runs commit_hw_done()
>   3. W waits for flip_done on CRTC 1
>   4. > flip_done for CRTC 1 completes
>   5. W finishes waiting for CRTC 1
>   6. W waits for flip_done on CRTC 2
>
>   7. > Preempted by X
>   8. > flip_done for CRTC 2 completes
>   9. X atomic_check: hw_done and flip_done are complete on all CRTCs
>  10. X updates cursor on both CRTCs
>  11. X destroys atomic state
>  12. X done
>
>  13. > Switch back to W
>  14. W waits for flip_done on CRTC 2
>  15. W raises general protection fault
>
> The error looks like so:
>
>  general protection fault:  [#1] PREEMPT SMP PTI
>  **snip**
>  Call Trace:
>   lock_acquire+0xa2/0x1b0
>   _raw_spin_lock_irq+0x39/0x70
>   wait_for_completion_timeout+0x31/0x130
>   drm_atomic_helper_wait_for_flip_done+0x64/0x90 [drm_kms_helper]
>   amdgpu_dm_atomic_commit_tail+0xcae/0xdd0 [amdgpu]
>   commit_tail+0x3d/0x70 [drm_kms_helper]
>   process_one_work+0x212/0x650
>   worker_thread+0x49/0x420
>   kthread+0xfb/0x130
>   ret_from_fork+0x3a/0x50
>  Modules linked in: x86_pkg_temp_thermal amdgpu(O) chash(O)
>  gpu_sched(O) drm_kms_helper(O) syscopyarea sysfillrect sysimgblt
>  fb_sys_fops ttm(O) drm(O)
>
> Note that i915 has this issue masked, since hw_done is signaled after
> waiting for flip_done. Doing so will block the cursor update from
> happening until hw_done is signaled, preventing the cursor commit from
> destroying the state.
>
> v2: The reference on the commit object needs to be obtained before
>  hw_done() is signaled, since that's the point where another commit
>  is allowed to modify the state. Assuming that the
>  new_crtc_state->commit object still exists within flip_done() is
>  incorrect.
>
>  Fix by getting a reference in setup_commit(), and releasing it
>  during default_clear().
>
> Signed-off-by: Leo Li 
> ---
>
> Sending again, this time to the correct mailing list :)
>
> Leo

 Reviewed-by: Daniel Vetter 
 Cc: sta...@vger.kernel.org

 I think it'd be really good if you could get intel-gfx-ci to test this
 patch. Simply submit it to intel-...@lists.freedesktop.org. I'll leave
 applying to one of the amd drm-misc committers, once it's passed CI.
>>
>> Leo, do you or Harry have drm-misc commit access yet?  If not, you should.
>>
> 
> I believe I do and will push the patch. Leo's getting the ball rolling to get 
> access (fdo account, etc).
> 

... and pushed to drm-misc-fixes.

Harry

> Harry
> 
>> Alex
>>

 Cheers, Daniel
>>>
>>> Thanks for the rb!
>>>
>>> On the topic of CI, is it possible to write a test (maybe one already
>>> exists) for this issue? I've attempted to do one here:
>>>
>>> https://patchwork.freedesktop.org/patch/256319/
>>>
>>> The problem is finding a surefire way to trigger the sequence, I'm not
>>> sure how that can be done. If you have any ideas, I would love to hear them.
>>>
>>> Leo
>>>

>
>   drivers/gpu/drm/drm_atomic.c|  5 +
>   drivers/gpu/drm/drm_atomic_helper.c | 12 
>   include/drm/drm_atomic.h| 11 +++
>   3 files changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
> index 3eb061e..12ae917 100644
> --- a/drivers/gpu/drm/drm_atomic.c
> +++ b/drivers/gpu/drm/drm_atomic.c
> @@ -174,6 +174,11 @@ void drm_atomic_state_default_clear(struct 
> drm_atomic_state *state)
>  state->crtcs[i].state = NULL;
>  

Re: [PATCH v2] drm: Get ref on CRTC commit object when waiting for flip_done

2018-10-18 Thread Wentland, Harry
On 2018-10-16 11:48 a.m., Alex Deucher wrote:
> On Tue, Oct 16, 2018 at 11:00 AM Li, Sun peng (Leo)  
> wrote:
>>
>>
>>
>> On 2018-10-16 08:33 AM, Daniel Vetter wrote:
>>> On Mon, Oct 15, 2018 at 09:46:40AM -0400, sunpeng...@amd.com wrote:
 From: Leo Li 

 This fixes a general protection fault, caused by accessing the contents
 of a flip_done completion object that has already been freed. It occurs
 due to the preemption of a non-blocking commit worker thread W by
 another commit thread X. X continues to clear its atomic state at the
 end, destroying the CRTC commit object that W still needs. Switching
 back to W and accessing the commit objects then leads to bad results.

 Worker W becomes preemptable when waiting for flip_done to complete. At
 this point, a frequently occurring commit thread X can take over. Here's
 an example where W is a worker thread that flips on both CRTCs, and X
 does a legacy cursor update on both CRTCs:

  ...
   1. W does flip work
   2. W runs commit_hw_done()
   3. W waits for flip_done on CRTC 1
   4. > flip_done for CRTC 1 completes
   5. W finishes waiting for CRTC 1
   6. W waits for flip_done on CRTC 2

   7. > Preempted by X
   8. > flip_done for CRTC 2 completes
   9. X atomic_check: hw_done and flip_done are complete on all CRTCs
  10. X updates cursor on both CRTCs
  11. X destroys atomic state
  12. X done

  13. > Switch back to W
  14. W waits for flip_done on CRTC 2
  15. W raises general protection fault

 The error looks like so:

  general protection fault:  [#1] PREEMPT SMP PTI
  **snip**
  Call Trace:
   lock_acquire+0xa2/0x1b0
   _raw_spin_lock_irq+0x39/0x70
   wait_for_completion_timeout+0x31/0x130
   drm_atomic_helper_wait_for_flip_done+0x64/0x90 [drm_kms_helper]
   amdgpu_dm_atomic_commit_tail+0xcae/0xdd0 [amdgpu]
   commit_tail+0x3d/0x70 [drm_kms_helper]
   process_one_work+0x212/0x650
   worker_thread+0x49/0x420
   kthread+0xfb/0x130
   ret_from_fork+0x3a/0x50
  Modules linked in: x86_pkg_temp_thermal amdgpu(O) chash(O)
  gpu_sched(O) drm_kms_helper(O) syscopyarea sysfillrect sysimgblt
  fb_sys_fops ttm(O) drm(O)

 Note that i915 has this issue masked, since hw_done is signaled after
 waiting for flip_done. Doing so will block the cursor update from
 happening until hw_done is signaled, preventing the cursor commit from
 destroying the state.

 v2: The reference on the commit object needs to be obtained before
  hw_done() is signaled, since that's the point where another commit
  is allowed to modify the state. Assuming that the
  new_crtc_state->commit object still exists within flip_done() is
  incorrect.

  Fix by getting a reference in setup_commit(), and releasing it
  during default_clear().

 Signed-off-by: Leo Li 
 ---

 Sending again, this time to the correct mailing list :)

 Leo
>>>
>>> Reviewed-by: Daniel Vetter 
>>> Cc: sta...@vger.kernel.org
>>>
>>> I think it'd be really good if you could get intel-gfx-ci to test this
>>> patch. Simply submit it to intel-...@lists.freedesktop.org. I'll leave
>>> applying to one of the amd drm-misc committers, once it's passed CI.
> 
> Leo, do you or Harry have drm-misc commit access yet?  If not, you should.
> 

I believe I do and will push the patch. Leo's getting the ball rolling to get 
access (fdo account, etc).

Harry

> Alex
> 
>>>
>>> Cheers, Daniel
>>
>> Thanks for the rb!
>>
>> On the topic of CI, is it possible to write a test (maybe one already
>> exists) for this issue? I've attempted to do one here:
>>
>> https://patchwork.freedesktop.org/patch/256319/
>>
>> The problem is finding a surefire way to trigger the sequence, I'm not
>> sure how that can be done. If you have any ideas, I would love to hear them.
>>
>> Leo
>>
>>>

   drivers/gpu/drm/drm_atomic.c|  5 +
   drivers/gpu/drm/drm_atomic_helper.c | 12 
   include/drm/drm_atomic.h| 11 +++
   3 files changed, 24 insertions(+), 4 deletions(-)

 diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
 index 3eb061e..12ae917 100644
 --- a/drivers/gpu/drm/drm_atomic.c
 +++ b/drivers/gpu/drm/drm_atomic.c
 @@ -174,6 +174,11 @@ void drm_atomic_state_default_clear(struct 
 drm_atomic_state *state)
  state->crtcs[i].state = NULL;
  state->crtcs[i].old_state = NULL;
  state->crtcs[i].new_state = NULL;
 +
 +if (state->crtcs[i].commit) {
 +drm_crtc_commit_put(state->crtcs[i].commit);
 +

Re: [PATCH v2] drm: Get ref on CRTC commit object when waiting for flip_done

2018-10-16 Thread Alex Deucher
On Tue, Oct 16, 2018 at 11:00 AM Li, Sun peng (Leo)  wrote:
>
>
>
> On 2018-10-16 08:33 AM, Daniel Vetter wrote:
> > On Mon, Oct 15, 2018 at 09:46:40AM -0400, sunpeng...@amd.com wrote:
> >> From: Leo Li 
> >>
> >> This fixes a general protection fault, caused by accessing the contents
> >> of a flip_done completion object that has already been freed. It occurs
> >> due to the preemption of a non-blocking commit worker thread W by
> >> another commit thread X. X continues to clear its atomic state at the
> >> end, destroying the CRTC commit object that W still needs. Switching
> >> back to W and accessing the commit objects then leads to bad results.
> >>
> >> Worker W becomes preemptable when waiting for flip_done to complete. At
> >> this point, a frequently occurring commit thread X can take over. Here's
> >> an example where W is a worker thread that flips on both CRTCs, and X
> >> does a legacy cursor update on both CRTCs:
> >>
> >>  ...
> >>   1. W does flip work
> >>   2. W runs commit_hw_done()
> >>   3. W waits for flip_done on CRTC 1
> >>   4. > flip_done for CRTC 1 completes
> >>   5. W finishes waiting for CRTC 1
> >>   6. W waits for flip_done on CRTC 2
> >>
> >>   7. > Preempted by X
> >>   8. > flip_done for CRTC 2 completes
> >>   9. X atomic_check: hw_done and flip_done are complete on all CRTCs
> >>  10. X updates cursor on both CRTCs
> >>  11. X destroys atomic state
> >>  12. X done
> >>
> >>  13. > Switch back to W
> >>  14. W waits for flip_done on CRTC 2
> >>  15. W raises general protection fault
> >>
> >> The error looks like so:
> >>
> >>  general protection fault:  [#1] PREEMPT SMP PTI
> >>  **snip**
> >>  Call Trace:
> >>   lock_acquire+0xa2/0x1b0
> >>   _raw_spin_lock_irq+0x39/0x70
> >>   wait_for_completion_timeout+0x31/0x130
> >>   drm_atomic_helper_wait_for_flip_done+0x64/0x90 [drm_kms_helper]
> >>   amdgpu_dm_atomic_commit_tail+0xcae/0xdd0 [amdgpu]
> >>   commit_tail+0x3d/0x70 [drm_kms_helper]
> >>   process_one_work+0x212/0x650
> >>   worker_thread+0x49/0x420
> >>   kthread+0xfb/0x130
> >>   ret_from_fork+0x3a/0x50
> >>  Modules linked in: x86_pkg_temp_thermal amdgpu(O) chash(O)
> >>  gpu_sched(O) drm_kms_helper(O) syscopyarea sysfillrect sysimgblt
> >>  fb_sys_fops ttm(O) drm(O)
> >>
> >> Note that i915 has this issue masked, since hw_done is signaled after
> >> waiting for flip_done. Doing so will block the cursor update from
> >> happening until hw_done is signaled, preventing the cursor commit from
> >> destroying the state.
> >>
> >> v2: The reference on the commit object needs to be obtained before
> >>  hw_done() is signaled, since that's the point where another commit
> >>  is allowed to modify the state. Assuming that the
> >>  new_crtc_state->commit object still exists within flip_done() is
> >>  incorrect.
> >>
> >>  Fix by getting a reference in setup_commit(), and releasing it
> >>  during default_clear().
> >>
> >> Signed-off-by: Leo Li 
> >> ---
> >>
> >> Sending again, this time to the correct mailing list :)
> >>
> >> Leo
> >
> > Reviewed-by: Daniel Vetter 
> > Cc: sta...@vger.kernel.org
> >
> > I think it'd be really good if you could get intel-gfx-ci to test this
> > patch. Simply submit it to intel-...@lists.freedesktop.org. I'll leave
> > applying to one of the amd drm-misc committers, once it's passed CI.

Leo, do you or Harry have drm-misc commit access yet?  If not, you should.

Alex

> >
> > Cheers, Daniel
>
> Thanks for the rb!
>
> On the topic of CI, is it possible to write a test (maybe one already
> exists) for this issue? I've attempted to do one here:
>
> https://patchwork.freedesktop.org/patch/256319/
>
> The problem is finding a surefire way to trigger the sequence, I'm not
> sure how that can be done. If you have any ideas, I would love to hear them.
>
> Leo
>
> >
> >>
> >>   drivers/gpu/drm/drm_atomic.c|  5 +
> >>   drivers/gpu/drm/drm_atomic_helper.c | 12 
> >>   include/drm/drm_atomic.h| 11 +++
> >>   3 files changed, 24 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
> >> index 3eb061e..12ae917 100644
> >> --- a/drivers/gpu/drm/drm_atomic.c
> >> +++ b/drivers/gpu/drm/drm_atomic.c
> >> @@ -174,6 +174,11 @@ void drm_atomic_state_default_clear(struct 
> >> drm_atomic_state *state)
> >>  state->crtcs[i].state = NULL;
> >>  state->crtcs[i].old_state = NULL;
> >>  state->crtcs[i].new_state = NULL;
> >> +
> >> +if (state->crtcs[i].commit) {
> >> +drm_crtc_commit_put(state->crtcs[i].commit);
> >> +state->crtcs[i].commit = NULL;
> >> +}
> >>  }
> >>
> >>  for (i = 0; i < config->num_total_plane; i++) {
> >> diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
> >> 

Re: [PATCH v2] drm: Get ref on CRTC commit object when waiting for flip_done

2018-10-16 Thread Li, Sun peng (Leo)


On 2018-10-16 08:33 AM, Daniel Vetter wrote:
> On Mon, Oct 15, 2018 at 09:46:40AM -0400, sunpeng...@amd.com wrote:
>> From: Leo Li 
>>
>> This fixes a general protection fault, caused by accessing the contents
>> of a flip_done completion object that has already been freed. It occurs
>> due to the preemption of a non-blocking commit worker thread W by
>> another commit thread X. X continues to clear its atomic state at the
>> end, destroying the CRTC commit object that W still needs. Switching
>> back to W and accessing the commit objects then leads to bad results.
>>
>> Worker W becomes preemptable when waiting for flip_done to complete. At
>> this point, a frequently occurring commit thread X can take over. Here's
>> an example where W is a worker thread that flips on both CRTCs, and X
>> does a legacy cursor update on both CRTCs:
>>
>>  ...
>>   1. W does flip work
>>   2. W runs commit_hw_done()
>>   3. W waits for flip_done on CRTC 1
>>   4. > flip_done for CRTC 1 completes
>>   5. W finishes waiting for CRTC 1
>>   6. W waits for flip_done on CRTC 2
>>
>>   7. > Preempted by X
>>   8. > flip_done for CRTC 2 completes
>>   9. X atomic_check: hw_done and flip_done are complete on all CRTCs
>>  10. X updates cursor on both CRTCs
>>  11. X destroys atomic state
>>  12. X done
>>
>>  13. > Switch back to W
>>  14. W waits for flip_done on CRTC 2
>>  15. W raises general protection fault
>>
>> The error looks like so:
>>
>>  general protection fault:  [#1] PREEMPT SMP PTI
>>  **snip**
>>  Call Trace:
>>   lock_acquire+0xa2/0x1b0
>>   _raw_spin_lock_irq+0x39/0x70
>>   wait_for_completion_timeout+0x31/0x130
>>   drm_atomic_helper_wait_for_flip_done+0x64/0x90 [drm_kms_helper]
>>   amdgpu_dm_atomic_commit_tail+0xcae/0xdd0 [amdgpu]
>>   commit_tail+0x3d/0x70 [drm_kms_helper]
>>   process_one_work+0x212/0x650
>>   worker_thread+0x49/0x420
>>   kthread+0xfb/0x130
>>   ret_from_fork+0x3a/0x50
>>  Modules linked in: x86_pkg_temp_thermal amdgpu(O) chash(O)
>>  gpu_sched(O) drm_kms_helper(O) syscopyarea sysfillrect sysimgblt
>>  fb_sys_fops ttm(O) drm(O)
>>
>> Note that i915 has this issue masked, since hw_done is signaled after
>> waiting for flip_done. Doing so will block the cursor update from
>> happening until hw_done is signaled, preventing the cursor commit from
>> destroying the state.
>>
>> v2: The reference on the commit object needs to be obtained before
>>  hw_done() is signaled, since that's the point where another commit
>>  is allowed to modify the state. Assuming that the
>>  new_crtc_state->commit object still exists within flip_done() is
>>  incorrect.
>>
>>  Fix by getting a reference in setup_commit(), and releasing it
>>  during default_clear().
>>
>> Signed-off-by: Leo Li 
>> ---
>>
>> Sending again, this time to the correct mailing list :)
>>
>> Leo
> 
> Reviewed-by: Daniel Vetter 
> Cc: sta...@vger.kernel.org
> 
> I think it'd be really good if you could get intel-gfx-ci to test this
> patch. Simply submit it to intel-...@lists.freedesktop.org. I'll leave
> applying to one of the amd drm-misc committers, once it's passed CI.
> 
> Cheers, Daniel

Thanks for the rb!

On the topic of CI, is it possible to write a test (maybe one already
exists) for this issue? I've attempted to do one here:

https://patchwork.freedesktop.org/patch/256319/

The problem is finding a surefire way to trigger the sequence, I'm not
sure how that can be done. If you have any ideas, I would love to hear them.

Leo

> 
>>
>>   drivers/gpu/drm/drm_atomic.c|  5 +
>>   drivers/gpu/drm/drm_atomic_helper.c | 12 
>>   include/drm/drm_atomic.h| 11 +++
>>   3 files changed, 24 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
>> index 3eb061e..12ae917 100644
>> --- a/drivers/gpu/drm/drm_atomic.c
>> +++ b/drivers/gpu/drm/drm_atomic.c
>> @@ -174,6 +174,11 @@ void drm_atomic_state_default_clear(struct 
>> drm_atomic_state *state)
>>  state->crtcs[i].state = NULL;
>>  state->crtcs[i].old_state = NULL;
>>  state->crtcs[i].new_state = NULL;
>> +
>> +if (state->crtcs[i].commit) {
>> +drm_crtc_commit_put(state->crtcs[i].commit);
>> +state->crtcs[i].commit = NULL;
>> +}
>>  }
>>   
>>  for (i = 0; i < config->num_total_plane; i++) {
>> diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
>> b/drivers/gpu/drm/drm_atomic_helper.c
>> index 80be74d..1bb4c31 100644
>> --- a/drivers/gpu/drm/drm_atomic_helper.c
>> +++ b/drivers/gpu/drm/drm_atomic_helper.c
>> @@ -1408,15 +1408,16 @@ EXPORT_SYMBOL(drm_atomic_helper_wait_for_vblanks);
>>   void drm_atomic_helper_wait_for_flip_done(struct drm_device *dev,
>>struct drm_atomic_state *old_state)
>>  

Re: [PATCH v2] drm: Get ref on CRTC commit object when waiting for flip_done

2018-10-16 Thread Daniel Vetter
On Mon, Oct 15, 2018 at 09:46:40AM -0400, sunpeng...@amd.com wrote:
> From: Leo Li 
> 
> This fixes a general protection fault, caused by accessing the contents
> of a flip_done completion object that has already been freed. It occurs
> due to the preemption of a non-blocking commit worker thread W by
> another commit thread X. X continues to clear its atomic state at the
> end, destroying the CRTC commit object that W still needs. Switching
> back to W and accessing the commit objects then leads to bad results.
> 
> Worker W becomes preemptable when waiting for flip_done to complete. At
> this point, a frequently occurring commit thread X can take over. Here's
> an example where W is a worker thread that flips on both CRTCs, and X
> does a legacy cursor update on both CRTCs:
> 
> ...
>  1. W does flip work
>  2. W runs commit_hw_done()
>  3. W waits for flip_done on CRTC 1
>  4. > flip_done for CRTC 1 completes
>  5. W finishes waiting for CRTC 1
>  6. W waits for flip_done on CRTC 2
> 
>  7. > Preempted by X
>  8. > flip_done for CRTC 2 completes
>  9. X atomic_check: hw_done and flip_done are complete on all CRTCs
> 10. X updates cursor on both CRTCs
> 11. X destroys atomic state
> 12. X done
> 
> 13. > Switch back to W
> 14. W waits for flip_done on CRTC 2
> 15. W raises general protection fault
> 
> The error looks like so:
> 
> general protection fault:  [#1] PREEMPT SMP PTI
> **snip**
> Call Trace:
>  lock_acquire+0xa2/0x1b0
>  _raw_spin_lock_irq+0x39/0x70
>  wait_for_completion_timeout+0x31/0x130
>  drm_atomic_helper_wait_for_flip_done+0x64/0x90 [drm_kms_helper]
>  amdgpu_dm_atomic_commit_tail+0xcae/0xdd0 [amdgpu]
>  commit_tail+0x3d/0x70 [drm_kms_helper]
>  process_one_work+0x212/0x650
>  worker_thread+0x49/0x420
>  kthread+0xfb/0x130
>  ret_from_fork+0x3a/0x50
> Modules linked in: x86_pkg_temp_thermal amdgpu(O) chash(O)
> gpu_sched(O) drm_kms_helper(O) syscopyarea sysfillrect sysimgblt
> fb_sys_fops ttm(O) drm(O)
> 
> Note that i915 has this issue masked, since hw_done is signaled after
> waiting for flip_done. Doing so will block the cursor update from
> happening until hw_done is signaled, preventing the cursor commit from
> destroying the state.
> 
> v2: The reference on the commit object needs to be obtained before
> hw_done() is signaled, since that's the point where another commit
> is allowed to modify the state. Assuming that the
> new_crtc_state->commit object still exists within flip_done() is
> incorrect.
> 
> Fix by getting a reference in setup_commit(), and releasing it
> during default_clear().
> 
> Signed-off-by: Leo Li 
> ---
> 
> Sending again, this time to the correct mailing list :)
> 
> Leo

Reviewed-by: Daniel Vetter 
Cc: sta...@vger.kernel.org

I think it'd be really good if you could get intel-gfx-ci to test this
patch. Simply submit it to intel-...@lists.freedesktop.org. I'll leave
applying to one of the amd drm-misc committers, once it's passed CI.

Cheers, Daniel

> 
>  drivers/gpu/drm/drm_atomic.c|  5 +
>  drivers/gpu/drm/drm_atomic_helper.c | 12 
>  include/drm/drm_atomic.h| 11 +++
>  3 files changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
> index 3eb061e..12ae917 100644
> --- a/drivers/gpu/drm/drm_atomic.c
> +++ b/drivers/gpu/drm/drm_atomic.c
> @@ -174,6 +174,11 @@ void drm_atomic_state_default_clear(struct 
> drm_atomic_state *state)
>   state->crtcs[i].state = NULL;
>   state->crtcs[i].old_state = NULL;
>   state->crtcs[i].new_state = NULL;
> +
> + if (state->crtcs[i].commit) {
> + drm_crtc_commit_put(state->crtcs[i].commit);
> + state->crtcs[i].commit = NULL;
> + }
>   }
>  
>   for (i = 0; i < config->num_total_plane; i++) {
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
> b/drivers/gpu/drm/drm_atomic_helper.c
> index 80be74d..1bb4c31 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -1408,15 +1408,16 @@ EXPORT_SYMBOL(drm_atomic_helper_wait_for_vblanks);
>  void drm_atomic_helper_wait_for_flip_done(struct drm_device *dev,
> struct drm_atomic_state *old_state)
>  {
> - struct drm_crtc_state *new_crtc_state;
>   struct drm_crtc *crtc;
>   int i;
>  
> - for_each_new_crtc_in_state(old_state, crtc, new_crtc_state, i) {
> - struct drm_crtc_commit *commit = new_crtc_state->commit;
> + for (i = 0; i < dev->mode_config.num_crtc; i++) {
> + struct drm_crtc_commit *commit = old_state->crtcs[i].commit;
>   int ret;
>  
> - if (!commit)
> + crtc = old_state->crtcs[i].ptr;
> +
> + if (!crtc || !commit)
> 

[PATCH v2] drm: Get ref on CRTC commit object when waiting for flip_done

2018-10-15 Thread sunpeng.li
From: Leo Li 

This fixes a general protection fault, caused by accessing the contents
of a flip_done completion object that has already been freed. It occurs
due to the preemption of a non-blocking commit worker thread W by
another commit thread X. X continues to clear its atomic state at the
end, destroying the CRTC commit object that W still needs. Switching
back to W and accessing the commit objects then leads to bad results.

Worker W becomes preemptable when waiting for flip_done to complete. At
this point, a frequently occurring commit thread X can take over. Here's
an example where W is a worker thread that flips on both CRTCs, and X
does a legacy cursor update on both CRTCs:

...
 1. W does flip work
 2. W runs commit_hw_done()
 3. W waits for flip_done on CRTC 1
 4. > flip_done for CRTC 1 completes
 5. W finishes waiting for CRTC 1
 6. W waits for flip_done on CRTC 2

 7. > Preempted by X
 8. > flip_done for CRTC 2 completes
 9. X atomic_check: hw_done and flip_done are complete on all CRTCs
10. X updates cursor on both CRTCs
11. X destroys atomic state
12. X done

13. > Switch back to W
14. W waits for flip_done on CRTC 2
15. W raises general protection fault

The error looks like so:

general protection fault:  [#1] PREEMPT SMP PTI
**snip**
Call Trace:
 lock_acquire+0xa2/0x1b0
 _raw_spin_lock_irq+0x39/0x70
 wait_for_completion_timeout+0x31/0x130
 drm_atomic_helper_wait_for_flip_done+0x64/0x90 [drm_kms_helper]
 amdgpu_dm_atomic_commit_tail+0xcae/0xdd0 [amdgpu]
 commit_tail+0x3d/0x70 [drm_kms_helper]
 process_one_work+0x212/0x650
 worker_thread+0x49/0x420
 kthread+0xfb/0x130
 ret_from_fork+0x3a/0x50
Modules linked in: x86_pkg_temp_thermal amdgpu(O) chash(O)
gpu_sched(O) drm_kms_helper(O) syscopyarea sysfillrect sysimgblt
fb_sys_fops ttm(O) drm(O)

Note that i915 has this issue masked, since hw_done is signaled after
waiting for flip_done. Doing so will block the cursor update from
happening until hw_done is signaled, preventing the cursor commit from
destroying the state.

v2: The reference on the commit object needs to be obtained before
hw_done() is signaled, since that's the point where another commit
is allowed to modify the state. Assuming that the
new_crtc_state->commit object still exists within flip_done() is
incorrect.

Fix by getting a reference in setup_commit(), and releasing it
during default_clear().

Signed-off-by: Leo Li 
---

Sending again, this time to the correct mailing list :)

Leo

 drivers/gpu/drm/drm_atomic.c|  5 +
 drivers/gpu/drm/drm_atomic_helper.c | 12 
 include/drm/drm_atomic.h| 11 +++
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index 3eb061e..12ae917 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -174,6 +174,11 @@ void drm_atomic_state_default_clear(struct 
drm_atomic_state *state)
state->crtcs[i].state = NULL;
state->crtcs[i].old_state = NULL;
state->crtcs[i].new_state = NULL;
+
+   if (state->crtcs[i].commit) {
+   drm_crtc_commit_put(state->crtcs[i].commit);
+   state->crtcs[i].commit = NULL;
+   }
}
 
for (i = 0; i < config->num_total_plane; i++) {
diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
b/drivers/gpu/drm/drm_atomic_helper.c
index 80be74d..1bb4c31 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1408,15 +1408,16 @@ EXPORT_SYMBOL(drm_atomic_helper_wait_for_vblanks);
 void drm_atomic_helper_wait_for_flip_done(struct drm_device *dev,
  struct drm_atomic_state *old_state)
 {
-   struct drm_crtc_state *new_crtc_state;
struct drm_crtc *crtc;
int i;
 
-   for_each_new_crtc_in_state(old_state, crtc, new_crtc_state, i) {
-   struct drm_crtc_commit *commit = new_crtc_state->commit;
+   for (i = 0; i < dev->mode_config.num_crtc; i++) {
+   struct drm_crtc_commit *commit = old_state->crtcs[i].commit;
int ret;
 
-   if (!commit)
+   crtc = old_state->crtcs[i].ptr;
+
+   if (!crtc || !commit)
continue;
 
ret = wait_for_completion_timeout(>flip_done, 10 * HZ);
@@ -1934,6 +1935,9 @@ int drm_atomic_helper_setup_commit(struct 
drm_atomic_state *state,
drm_crtc_commit_get(commit);
 
commit->abort_completion = true;
+
+   state->crtcs[i].commit = commit;
+   drm_crtc_commit_get(commit);
}
 
for_each_oldnew_connector_in_state(state, conn, old_conn_state, 
new_conn_state, i) {
diff --git a/include/drm/drm_atomic.h b/include/drm/drm_atomic.h

[PATCH v2] drm: Get ref on CRTC commit object when waiting for flip_done

2018-10-12 Thread sunpeng.li
From: Leo Li 

This fixes a general protection fault, caused by accessing the contents
of a flip_done completion object that has already been freed. It occurs
due to the preemption of a non-blocking commit worker thread W by
another commit thread X. X continues to clear its atomic state at the
end, destroying the CRTC commit object that W still needs. Switching
back to W and accessing the commit objects then leads to bad results.

Worker W becomes preemptable when waiting for flip_done to complete. At
this point, a frequently occurring commit thread X can take over. Here's
an example where W is a worker thread that flips on both CRTCs, and X
does a legacy cursor update on both CRTCs:

...
 1. W does flip work
 2. W runs commit_hw_done()
 3. W waits for flip_done on CRTC 1
 4. > flip_done for CRTC 1 completes
 5. W finishes waiting for CRTC 1
 6. W waits for flip_done on CRTC 2

 7. > Preempted by X
 8. > flip_done for CRTC 2 completes
 9. X atomic_check: hw_done and flip_done are complete on all CRTCs
10. X updates cursor on both CRTCs
11. X destroys atomic state
12. X done

13. > Switch back to W
14. W waits for flip_done on CRTC 2
15. W raises general protection fault

The error looks like so:

general protection fault:  [#1] PREEMPT SMP PTI
**snip**
Call Trace:
 lock_acquire+0xa2/0x1b0
 _raw_spin_lock_irq+0x39/0x70
 wait_for_completion_timeout+0x31/0x130
 drm_atomic_helper_wait_for_flip_done+0x64/0x90 [drm_kms_helper]
 amdgpu_dm_atomic_commit_tail+0xcae/0xdd0 [amdgpu]
 commit_tail+0x3d/0x70 [drm_kms_helper]
 process_one_work+0x212/0x650
 worker_thread+0x49/0x420
 kthread+0xfb/0x130
 ret_from_fork+0x3a/0x50
Modules linked in: x86_pkg_temp_thermal amdgpu(O) chash(O)
gpu_sched(O) drm_kms_helper(O) syscopyarea sysfillrect sysimgblt
fb_sys_fops ttm(O) drm(O)

Note that i915 has this issue masked, since hw_done is signaled after
waiting for flip_done. Doing so will block the cursor update from
happening until hw_done is signaled, preventing the cursor commit from
destroying the state.

v2: The reference on the commit object needs to be obtained before
hw_done() is signaled, since that's the point where another commit
is allowed to modify the state. Assuming that the
new_crtc_state->commit object still exists within flip_done() is
incorrect.

Fix by getting a reference in setup_commit(), and releasing it
during default_clear().

Signed-off-by: Leo Li 
---
 drivers/gpu/drm/drm_atomic.c|  5 +
 drivers/gpu/drm/drm_atomic_helper.c | 12 
 include/drm/drm_atomic.h| 11 +++
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index 3eb061e..12ae917 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -174,6 +174,11 @@ void drm_atomic_state_default_clear(struct 
drm_atomic_state *state)
state->crtcs[i].state = NULL;
state->crtcs[i].old_state = NULL;
state->crtcs[i].new_state = NULL;
+
+   if (state->crtcs[i].commit) {
+   drm_crtc_commit_put(state->crtcs[i].commit);
+   state->crtcs[i].commit = NULL;
+   }
}
 
for (i = 0; i < config->num_total_plane; i++) {
diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
b/drivers/gpu/drm/drm_atomic_helper.c
index 80be74d..1bb4c31 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1408,15 +1408,16 @@ EXPORT_SYMBOL(drm_atomic_helper_wait_for_vblanks);
 void drm_atomic_helper_wait_for_flip_done(struct drm_device *dev,
  struct drm_atomic_state *old_state)
 {
-   struct drm_crtc_state *new_crtc_state;
struct drm_crtc *crtc;
int i;
 
-   for_each_new_crtc_in_state(old_state, crtc, new_crtc_state, i) {
-   struct drm_crtc_commit *commit = new_crtc_state->commit;
+   for (i = 0; i < dev->mode_config.num_crtc; i++) {
+   struct drm_crtc_commit *commit = old_state->crtcs[i].commit;
int ret;
 
-   if (!commit)
+   crtc = old_state->crtcs[i].ptr;
+
+   if (!crtc || !commit)
continue;
 
ret = wait_for_completion_timeout(>flip_done, 10 * HZ);
@@ -1934,6 +1935,9 @@ int drm_atomic_helper_setup_commit(struct 
drm_atomic_state *state,
drm_crtc_commit_get(commit);
 
commit->abort_completion = true;
+
+   state->crtcs[i].commit = commit;
+   drm_crtc_commit_get(commit);
}
 
for_each_oldnew_connector_in_state(state, conn, old_conn_state, 
new_conn_state, i) {
diff --git a/include/drm/drm_atomic.h b/include/drm/drm_atomic.h
index da9d95a..1e71315 100644
--- a/include/drm/drm_atomic.h