Re: [PATCH] drm/amd/display: Skip fast cursor updates for fb changes

2018-12-14 Thread Grodzovsky, Andrey


On 12/14/2018 02:17 PM, Kazlauskas, Nicholas wrote:
> On 12/14/18 2:06 PM, Grodzovsky, Andrey wrote:
>> In general I agree with Michel that  DRM solution is required to
>> properly address this but since now it's not really obvious what is the
>> proper solution it seems to me OK to go with this fix until it's found.
>>
>> Reviewed-by: Andrey Grodzovsky 
>>
>> Andrey
> Thanks for the review.
>
> FWIW, we're not the only ones with the fb change check like this - msm
> does it too. The only other user of atomic_async_check and
> atomic_async_update is vc4 and they don't have it but I'd imagine they
> see a similar bug with interleaving framebuffers.
>
> It may be difficult to develop a "fix" for the behavior in DRM given the
> semantics of the function (in place update vs full swap). The
> old_plane_state is technically plane->state in this case, so even just
> adding cleanup_fb(plane, old_plane_state) after atomic_async_update
> isn't enough. What *should* be done here is the full state swap like in
> a regular atomic commit but that may cause breakages in other drivers.

Your change effectively does that for AMDGPU by forcing a non-async update
whenever new_plane_state->fb != the current plane->state->fb.
But after looking into it more, it seems to me that this fix is the generic
solution; I don't see any way around it. If you swap the fb to a new one,
you must not unreference it until a new fb arrives and is set as current,
swapping this one out (in some following commit).
Why do you think that making this the generic solution, by moving the check
from dm_plane_atomic_async_check to drm_atomic_helper_async_check, would
break other drivers?

Andrey

>
> Copying plane->state and calling cleanup_fb on that would also work to
> fix it, but the behavior is certainly unintuitive and asking for worse
> bugs than this one to pop up since nothing else in DRM works like that.
>
> Nicholas Kazlauskas
>
>>
>> On 12/14/2018 12:51 PM, Kazlauskas, Nicholas wrote:
>>> On 12/14/18 12:47 PM, Grodzovsky, Andrey wrote:
 On 12/14/2018 12:41 PM, Kazlauskas, Nicholas wrote:
> On 12/14/18 12:34 PM, Grodzovsky, Andrey wrote:
>> On 12/14/2018 12:26 PM, Nicholas Kazlauskas wrote:
>>> [Why]
>>> The behavior of drm_atomic_helper_cleanup_planes differs depending on
>>> whether the commit was asynchronous or not. When it's called from
>>> amdgpu_dm_atomic_commit_tail during a typical atomic commit the
>>> plane state has been swapped so it calls cleanup_fb on the old plane
>>> state.
>>>
>>> However, in the asynchronous commit codepath the call to
>>> drm_atomic_helper_commit also calls dm_plane_helper_cleanup_fb after
>>> atomic_async_update has been called. Since the plane state is updated
>>> in place and has not been swapped the cleanup_fb call affects the new
>>> plane state.
>>>
>>> This results in a use after free for the given sequence:
>>>
>>> - Fast update, fb1 pin/ref, fb1 unpin/unref
>>> - Fast update, fb2 pin/ref, fb2 unpin/unref
>>> - Slow update, fb1 pin/ref, fb2 unpin/unref
>> Shouldn't you have use after free already at this point ?
>>
>> Andrey
> This assumes there was 1 reference on the bo. You would be getting use
> after free on every single update (be it fast or slow) if this wasn't
> the case.
 Why would I get it on every single update if it's all balanced - first
 pin/ref then unpin/unref ?
>>> It's balanced, but has a reference not from the atomic code path, ie.
>>> from creation.
>>>
>>> So this is actually:
>>>
>>> fb1 pin, refcount=2, fb1 unpin, refcount=1
>>>
> So in the case where there was 1 reference on fb2 it's actually freed at
> the end of slow update since the ref count is now 0. Then the use after
> free happens:
 I still don't understand where exactly the 1 reference on fb2 during
 slow update comes from
 if all I see before that is 'Fast update, fb2 pin/ref, fb2 unpin/unref'
 - can you clarify that ?

 Andrey
>>> There isn't any pin/ref on fb2 during the slow update. The pin/ref
>>> happens during a prepare_fb call - which is *always* called on
>>> new_plane_state. So the state looks like this in commit tail:
>>>
>>> old_plane_state->fb = fb2
>>> new_plane_state->fb = fb1
>>>
>>> Then at the end during cleanup planes, cleanup_fb is called on
>>> old_plane_state (fb2).
>>>
>>> Nicholas Kazlauskas
>>>
>>> - Fast update, fb2 pin/ref -> use after free. bug
> ...here.
>
> You can see this exact sequence happening in Michel's log.
>
> Nicholas Kazlauskas
>
>>> [How]
>>> Disallow framebuffer changes in the fast path. Since this includes
>>> a NULL framebuffer, this means that only framebuffers that have
>>> been previously pin+ref at least once will be used, preventing a
>>> use after free.
>>>
>>> This has a significant throughput reduction for cursor updates where
>>> the framebuffer changes. For most 

Re: [PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v5

2018-12-14 Thread Kuehling, Felix
Except for the GEM and CS parts, the series is Reviewed-by: Felix
Kuehling 

Regards,
  Felix

On 2018-12-14 4:10 p.m., Yang, Philip wrote:
> Use HMM helper function hmm_vma_fault() to get physical pages backing
> userptr and start CPU page table update track of those pages. Then use
> hmm_vma_range_done() to check if those pages are updated before
> amdgpu_cs_submit for gfx or before user queues are resumed for kfd.
>
> If userptr pages are updated, for gfx, amdgpu_cs_ioctl will restart
> from scratch, for kfd, restore worker is rescheduled to retry.
>
> HMM simplify the CPU page table concurrent update check, so remove
> guptasklock, mmu_invalidations, last_set_pages fields from
> amdgpu_ttm_tt struct.
>
> HMM does not pin the page (increase page ref count), so remove related
> operations like release_pages(), put_page(), mark_page_dirty().
>
> Change-Id: I2e8c0c6f0d8c21e5596a32d7fc91762778bc9e67
> Signed-off-by: Philip Yang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|   1 -
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  95 +++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c   |   2 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h   |   3 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 185 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  14 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c|  25 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h|   4 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   | 168 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h   |   4 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|   1 -
>  11 files changed, 209 insertions(+), 293 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index 70429f7aa9a8..717791d4fa45 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -62,7 +62,6 @@ struct kgd_mem {
>  
>   atomic_t invalid;
>   struct amdkfd_process_info *process_info;
> - struct page **user_pages;
>  
>   struct amdgpu_sync sync;
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index be1ab43473c6..2897793600f7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -582,28 +582,12 @@ static int init_user_pages(struct kgd_mem *mem, struct 
> mm_struct *mm,
>   goto out;
>   }
>  
> - /* If no restore worker is running concurrently, user_pages
> -  * should not be allocated
> -  */
> - WARN(mem->user_pages, "Leaking user_pages array");
> -
> - mem->user_pages = kvmalloc_array(bo->tbo.ttm->num_pages,
> -sizeof(struct page *),
> -GFP_KERNEL | __GFP_ZERO);
> - if (!mem->user_pages) {
> - pr_err("%s: Failed to allocate pages array\n", __func__);
> - ret = -ENOMEM;
> - goto unregister_out;
> - }
> -
> - ret = amdgpu_ttm_tt_get_user_pages(bo->tbo.ttm, mem->user_pages);
> + ret = amdgpu_ttm_tt_get_user_pages(bo->tbo.ttm, bo->tbo.ttm->pages);
>   if (ret) {
>   pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
> - goto free_out;
> + goto unregister_out;
>   }
>  
> - amdgpu_ttm_tt_set_user_pages(bo->tbo.ttm, mem->user_pages);
> -
>   ret = amdgpu_bo_reserve(bo, true);
>   if (ret) {
>   pr_err("%s: Failed to reserve BO\n", __func__);
> @@ -616,11 +600,7 @@ static int init_user_pages(struct kgd_mem *mem, struct 
> mm_struct *mm,
>   amdgpu_bo_unreserve(bo);
>  
>  release_out:
> - if (ret)
> - release_pages(mem->user_pages, bo->tbo.ttm->num_pages);
> -free_out:
> - kvfree(mem->user_pages);
> - mem->user_pages = NULL;
> + amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm);
>  unregister_out:
>   if (ret)
>   amdgpu_mn_unregister(bo);
> @@ -679,7 +659,6 @@ static int reserve_bo_and_vm(struct kgd_mem *mem,
>   ctx->kfd_bo.priority = 0;
>   ctx->kfd_bo.tv.bo = >tbo;
>   ctx->kfd_bo.tv.num_shared = 1;
> - ctx->kfd_bo.user_pages = NULL;
>   list_add(>kfd_bo.tv.head, >list);
>  
>   amdgpu_vm_get_pd_bo(vm, >list, >vm_pd[0]);
> @@ -743,7 +722,6 @@ static int reserve_bo_and_cond_vms(struct kgd_mem *mem,
>   ctx->kfd_bo.priority = 0;
>   ctx->kfd_bo.tv.bo = >tbo;
>   ctx->kfd_bo.tv.num_shared = 1;
> - ctx->kfd_bo.user_pages = NULL;
>   list_add(>kfd_bo.tv.head, >list);
>  
>   i = 0;
> @@ -1371,15 +1349,6 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>   list_del(_list_entry->head);
>   mutex_unlock(_info->lock);
>  
> - /* Free user pages if necessary */
> - if (mem->user_pages) {
> - pr_debug("%s: Freeing user_pages array\n", __func__);
> - if 

Re: [PATCH 1/3] drm/amdgpu: Relocate kgd2kfd function declaration

2018-12-14 Thread Kuehling, Felix
The series is Reviewed-by: Felix Kuehling 

On 2018-12-14 10:22 a.m., Lin, Amber wrote:
> Since amdkfd is merged into amdgpu module and amdgpu can access amdkfd
> directly, move declaration of kgd2kfd functions from kfd_priv.h to
> amdgpu_amdkfd.h
>
> Signed-off-by: Amber Lin 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c   | 43 
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h   | 20 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  2 ++
>  drivers/gpu/drm/amd/amdkfd/kfd_module.c  |  2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 22 
>  drivers/gpu/drm/amd/include/kgd_kfd_interface.h  |  3 --
>  7 files changed, 66 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> index 2dfaf15..358f690 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> @@ -619,4 +619,47 @@ struct kfd2kgd_calls 
> *amdgpu_amdkfd_gfx_9_0_get_functions(void)
>  {
>   return NULL;
>  }
> +
> +struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct pci_dev *pdev,
> +   const struct kfd2kgd_calls *f2g)
> +{
> + return NULL;
> +}
> +
> +bool kgd2kfd_device_init(struct kfd_dev *kfd,
> +  const struct kgd2kfd_shared_resources *gpu_resources)
> +{
> + return false;
> +}
> +
> +void kgd2kfd_device_exit(struct kfd_dev *kfd)
> +{
> +}
> +
> +void kgd2kfd_exit(void)
> +{
> +}
> +
> +void kgd2kfd_suspend(struct kfd_dev *kfd)
> +{
> +}
> +
> +int kgd2kfd_resume(struct kfd_dev *kfd)
> +{
> + return 0;
> +}
> +
> +int kgd2kfd_pre_reset(struct kfd_dev *kfd)
> +{
> + return 0;
> +}
> +
> +int kgd2kfd_post_reset(struct kfd_dev *kfd)
> +{
> + return 0;
> +}
> +
> +void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
> +{
> +}
>  #endif
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index 70429f7..3214d31 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -33,7 +33,6 @@
>  #include "amdgpu_sync.h"
>  #include "amdgpu_vm.h"
>  
> -extern const struct kgd2kfd_calls *kgd2kfd;
>  extern uint64_t amdgpu_amdkfd_total_mem_size;
>  
>  struct amdgpu_device;
> @@ -214,4 +213,23 @@ int amdgpu_amdkfd_gpuvm_import_dmabuf(struct kgd_dev 
> *kgd,
>  void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
>  void amdgpu_amdkfd_unreserve_memory_limit(struct amdgpu_bo *bo);
>  
> +/* KGD2KFD callbacks */
> +int kgd2kfd_init(unsigned interface_version,
> +  const struct kgd2kfd_calls **g2f);
> +void kgd2kfd_exit(void);
> +struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct pci_dev *pdev,
> +   const struct kfd2kgd_calls *f2g);
> +bool kgd2kfd_device_init(struct kfd_dev *kfd,
> +  const struct kgd2kfd_shared_resources *gpu_resources);
> +void kgd2kfd_device_exit(struct kfd_dev *kfd);
> +void kgd2kfd_suspend(struct kfd_dev *kfd);
> +int kgd2kfd_resume(struct kfd_dev *kfd);
> +int kgd2kfd_pre_reset(struct kfd_dev *kfd);
> +int kgd2kfd_post_reset(struct kfd_dev *kfd);
> +void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry);
> +int kgd2kfd_quiesce_mm(struct mm_struct *mm);
> +int kgd2kfd_resume_mm(struct mm_struct *mm);
> +int kgd2kfd_schedule_evict_and_restore_process(struct mm_struct *mm,
> +struct dma_fence *fence);
> +
>  #endif /* AMDGPU_AMDKFD_H_INCLUDED */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
> index 574c118..3c7055e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
> @@ -31,6 +31,7 @@
>  
>  static const struct dma_fence_ops amdkfd_fence_ops;
>  static atomic_t fence_seq = ATOMIC_INIT(0);
> +extern const struct kgd2kfd_calls *kgd2kfd;
>  
>  /* Eviction Fence
>   * Fence helper functions to deal with KFD memory eviction.
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index be1ab43..3fc2618 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -44,6 +44,8 @@
>   */
>  #define AMDGPU_USERPTR_RESTORE_DELAY_MS 1
>  
> +extern const struct kgd2kfd_calls *kgd2kfd;
> +
>  /* Impose limit on how much memory KFD can use */
>  static struct {
>   uint64_t max_system_mem_limit;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
> index 8018163..030b39d 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
> @@ -23,6 +23,7 @@
>  #include 
>  #include 
>  #include "kfd_priv.h"
> 

Re: [PATCH 04/16 v2] drm/amd/display: Add tracing to dc

2018-12-14 Thread Kuehling, Felix

On 2018-12-14 2:31 p.m., Wentland, Harry wrote:
> On 2018-12-07 3:32 p.m., Kuehling, Felix wrote:
>> On 2018-12-07 9:46 a.m., Wentland, Harry wrote:
>>> On 2018-12-07 9:41 a.m., Wentland, Harry wrote:
 On 2018-12-07 12:40 a.m., Kuehling, Felix wrote:
> This change seems to be breaking the build for me. I'm getting errors 
> like this:
>
> CC [M]  drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.o
> In file included from ../include/trace/events/tlb.h:9:0,
>from ../arch/x86/include/asm/mmu_context.h:10,
>from ../include/linux/mmu_context.h:5,
>from ../drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h:30,
>from ../drivers/gpu/drm/amd/amdgpu/amdgpu.h:85,
>from 
> ../drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:34:
> ../include/linux/tracepoint.h:285:20: error: redefinition of 
> '__tpstrtab_amdgpu_dc_rreg'
> static const char __tpstrtab_##name[] \
>   ^
> ../include/linux/tracepoint.h:293:2: note: in expansion of macro 
> 'DEFINE_TRACE_FN'
> DEFINE_TRACE_FN(name, NULL, NULL);
> ^
> ../include/trace/define_trace.h:28:2: note: in expansion of macro 
> 'DEFINE_TRACE'
> DEFINE_TRACE(name)
> ^
> ../drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/./amdgpu_dm_trace.h:34:1:
>  note: in expansion of macro 'TRACE_EVENT'
>TRACE_EVENT(amdgpu_dc_rreg,
>^
>
> I have a local change that adds #include  to amdgpu.h, 
> which pulls in include/trace/events/tlb.h. That seems to create some kind 
> of conflict with trace definitions. Any ideas how to fix this? It seems a 
> bit fragile if adding some random include can break the build like this.
>
 That's the trace subsystem for you. David and I are trying to understand 
 what's going on. I think the problem is that both tlb.h and 
 amdgpu_dm_trace.h define trace events and we now include both here.

 I think we'd want to include neither trace events from amdgpu.h to avoid 
 this problem in the future, so we'll probably have to (a) clean up the 
 dc.h include to only contain what amdgpu.h needs and (b) clean up 
 amdgpu_amdkfd.h to only contain what amdgpu.h needs. At the end amdgpu.h 
 doesn't care about the tracer. The problem seems that dc.h and 
 amdgpu_amdkfd.h are the main includes for our respective drivers but are 
 also wholesale included in amdgpu.h.

>>> Apologies for the spam.
>>>
>>> Just noticed that amdgpu.h includes only amdgpu_dm.h which is clean. The 
>>> problem is that including amdgpu.h from amdgpu_dm.c now pulls in your trace 
>>> events (from tlb.h) which we don't expect or care about. I think we should 
>>> make sure amdgpu.h won't include anything that defines TRACE_EVENTs.
>> amdgpu_amdkfd.h defines a macro that uses use_mm and unuse_mm. Therefore
>> it needs to include mmu_context.h, which pulls in the conflicting trace
>> events. Maybe we can move that into a different header file that doesn't
>> get included in amdgpu.h. Or find another way to avoid including
>> amdgpu_amdkfd.h in amdgpu.h.
>>
> It's defined but not used in amdgpu_amdkfd.h (at least in the amd-stg tree). 
> Couldn't you include mmu_context.h in the files that use it 
> (amdgpu_amdkfd_gfx_v7/8/9.c) instead and avoid the problem that way?
Yes, that's what I ended up doing. I already fixed it in this commit:

commit 2a90860cfb895ff55d961a1b727d58bb68e213ba
Author: Felix Kuehling 
Date:   Fri Dec 7 17:01:27 2018 -0500

    drm/amdgpu: Workaround build failure due to trace conflict
   
    Avoid including mmu_context.h in amdgpu_amdkfd.h since that may be
    included in other header files that define traces. This leads to
    conflicts due to traces defined in other headers included via
    mmu_context.h.
   
    Signed-off-by: Felix Kuehling 
    Acked-by: Alex Deucher 

Regards,
  Felix
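
For reference, the shape of that workaround is roughly the following (an
illustrative sketch only, not the literal commit; the gfx_v7/8/9 file names
follow Harry's suggestion above):

	/* amdgpu_amdkfd.h: the <linux/mmu_context.h> include is dropped, so
	 * pulling this header into amdgpu.h no longer drags in the tlb.h
	 * trace events that clash with amdgpu_dm_trace.h. */

	/* amdgpu_amdkfd_gfx_v9.c (likewise _v7/_v8), where use_mm() and
	 * unuse_mm() are actually called: */
	#include <linux/mmu_context.h>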


>
> Harry
>
>> Thanks,
>>   Felix
>>
>>
>>
>>> Harry
>>>
 Harry

> Thanks,
> Felix
>
> -Original Message-
> From: amd-gfx  On Behalf Of David 
> Francis
> Sent: Friday, November 30, 2018 9:57 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Francis, David 
> Subject: [PATCH 04/16 v2] drm/amd/display: Add tracing to dc
>
> [Why]
> Tracing is a useful and cheap debug functionality
>
> [How]
> This creates a new trace system amdgpu_dm, currently with three trace 
> events
>
> amdgpu_dc_rreg and amdgpu_dc_wreg report the address and value of any dc 
> register reads and writes
>
> amdgpu_dc_performance requires at least one of those two to be enabled.  
> It counts the register reads and writes since the last entry
>
> v2: Don't check for NULL before kfree
>
> Signed-off-by: David Francis 
> Reviewed-by: Harry Wentland 
> 

[PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v5

2018-12-14 Thread Yang, Philip
Use the HMM helper function hmm_vma_fault() to get the physical pages
backing a userptr and to start tracking CPU page table updates on those
pages. Then use hmm_vma_range_done() to check whether those pages were
updated before amdgpu_cs_submit for gfx, or before user queues are
resumed for kfd.

If the userptr pages were updated, amdgpu_cs_ioctl restarts from scratch
for gfx; for kfd, the restore worker is rescheduled to retry.

HMM simplifies the concurrent CPU page table update check, so remove the
guptasklock, mmu_invalidations and last_set_pages fields from the
amdgpu_ttm_tt struct.

HMM does not pin pages (increase the page ref count), so remove related
operations such as release_pages(), put_page() and mark_page_dirty().
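
The usage pattern described above is roughly the following (a simplified
sketch for illustration only, not the actual amdgpu_ttm.c/amdkfd code;
the hmm_range setup and most error handling are omitted):

	struct hmm_range range;
	int ret;

	/* range is filled with the userptr VMA, its start/end addresses and
	 * a pfns array that receives the snapshot (details omitted). */

	/* Fault in the pages backing the userptr and register the range so
	 * that concurrent CPU page table updates can be detected later. */
	ret = hmm_vma_fault(&range, true);
	if (ret)
		return ret;

	/* ... set up the CS / user queue using the snapshotted pages ... */

	/* Right before amdgpu_cs_submit (gfx) or resuming the user queues
	 * (kfd): if the range was invalidated in the meantime the snapshot
	 * is stale, so the caller restarts from scratch / reschedules. */
	if (!hmm_vma_range_done(&range))
		return -EAGAIN;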

Change-Id: I2e8c0c6f0d8c21e5596a32d7fc91762778bc9e67
Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|   1 -
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  95 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c   |   2 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h   |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 185 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  14 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c|  25 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h|   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   | 168 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h   |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|   1 -
 11 files changed, 209 insertions(+), 293 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 70429f7aa9a8..717791d4fa45 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -62,7 +62,6 @@ struct kgd_mem {
 
atomic_t invalid;
struct amdkfd_process_info *process_info;
-   struct page **user_pages;
 
struct amdgpu_sync sync;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index be1ab43473c6..2897793600f7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -582,28 +582,12 @@ static int init_user_pages(struct kgd_mem *mem, struct 
mm_struct *mm,
goto out;
}
 
-   /* If no restore worker is running concurrently, user_pages
-* should not be allocated
-*/
-   WARN(mem->user_pages, "Leaking user_pages array");
-
-   mem->user_pages = kvmalloc_array(bo->tbo.ttm->num_pages,
-  sizeof(struct page *),
-  GFP_KERNEL | __GFP_ZERO);
-   if (!mem->user_pages) {
-   pr_err("%s: Failed to allocate pages array\n", __func__);
-   ret = -ENOMEM;
-   goto unregister_out;
-   }
-
-   ret = amdgpu_ttm_tt_get_user_pages(bo->tbo.ttm, mem->user_pages);
+   ret = amdgpu_ttm_tt_get_user_pages(bo->tbo.ttm, bo->tbo.ttm->pages);
if (ret) {
pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
-   goto free_out;
+   goto unregister_out;
}
 
-   amdgpu_ttm_tt_set_user_pages(bo->tbo.ttm, mem->user_pages);
-
ret = amdgpu_bo_reserve(bo, true);
if (ret) {
pr_err("%s: Failed to reserve BO\n", __func__);
@@ -616,11 +600,7 @@ static int init_user_pages(struct kgd_mem *mem, struct 
mm_struct *mm,
amdgpu_bo_unreserve(bo);
 
 release_out:
-   if (ret)
-   release_pages(mem->user_pages, bo->tbo.ttm->num_pages);
-free_out:
-   kvfree(mem->user_pages);
-   mem->user_pages = NULL;
+   amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm);
 unregister_out:
if (ret)
amdgpu_mn_unregister(bo);
@@ -679,7 +659,6 @@ static int reserve_bo_and_vm(struct kgd_mem *mem,
ctx->kfd_bo.priority = 0;
ctx->kfd_bo.tv.bo = >tbo;
ctx->kfd_bo.tv.num_shared = 1;
-   ctx->kfd_bo.user_pages = NULL;
list_add(>kfd_bo.tv.head, >list);
 
amdgpu_vm_get_pd_bo(vm, >list, >vm_pd[0]);
@@ -743,7 +722,6 @@ static int reserve_bo_and_cond_vms(struct kgd_mem *mem,
ctx->kfd_bo.priority = 0;
ctx->kfd_bo.tv.bo = >tbo;
ctx->kfd_bo.tv.num_shared = 1;
-   ctx->kfd_bo.user_pages = NULL;
list_add(>kfd_bo.tv.head, >list);
 
i = 0;
@@ -1371,15 +1349,6 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
list_del(_list_entry->head);
mutex_unlock(_info->lock);
 
-   /* Free user pages if necessary */
-   if (mem->user_pages) {
-   pr_debug("%s: Freeing user_pages array\n", __func__);
-   if (mem->user_pages[0])
-   release_pages(mem->user_pages,
-   mem->bo->tbo.ttm->num_pages);
-   kvfree(mem->user_pages);
-   }
-
ret = reserve_bo_and_cond_vms(mem, NULL, BO_VM_ALL, );
if 

Re: [PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v4

2018-12-14 Thread Yang, Philip
Hi Felix,

Thanks, I will submit another patch to fix those KFD part issues.

Philip

On 2018-12-14 2:54 p.m., Kuehling, Felix wrote:
> Looks good for the KFD part. One nit-pick inline. And two more
> suggestions about amdgpu_ttm.c to make our handling of
> get_user_pages_done more robust.
> 
> Christian, would you review the CS and GEM parts? And maybe take a look
> you see nothing wrong with the amdgpu_ttm changes either.
> 
> On 2018-12-13 4:01 p.m., Yang, Philip wrote:
>> Use HMM helper function hmm_vma_fault() to get physical pages backing
>> userptr and start CPU page table update track of those pages. Then use
>> hmm_vma_range_done() to check if those pages are updated before
>> amdgpu_cs_submit for gfx or before user queues are resumed for kfd.
>>
>> If userptr pages are updated, for gfx, amdgpu_cs_ioctl will restart
>> from scratch, for kfd, restore worker is rescheduled to retry.
>>
>> HMM simplify the CPU page table concurrent update check, so remove
>> guptasklock, mmu_invalidations, last_set_pages fields from
>> amdgpu_ttm_tt struct.
>>
>> HMM does not pin the page (increase page ref count), so remove related
>> operations like release_pages(), put_page(), mark_page_dirty().
>>
>> Change-Id: I2e8c0c6f0d8c21e5596a32d7fc91762778bc9e67
>> Signed-off-by: Philip Yang 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|   1 -
>>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  95 +++--
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c   |   2 -
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h   |   3 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 185 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  14 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c|  27 ++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h|   4 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   | 166 ++--
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h   |   4 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|   1 -
>>   11 files changed, 208 insertions(+), 294 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> index 70429f7aa9a8..717791d4fa45 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> @@ -62,7 +62,6 @@ struct kgd_mem {
>>   
>>  atomic_t invalid;
>>  struct amdkfd_process_info *process_info;
>> -struct page **user_pages;
>>   
>>  struct amdgpu_sync sync;
>>   
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index be1ab43473c6..2897793600f7 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -582,28 +582,12 @@ static int init_user_pages(struct kgd_mem *mem, struct 
>> mm_struct *mm,
>>  goto out;
>>  }
>>   
>> -/* If no restore worker is running concurrently, user_pages
>> - * should not be allocated
>> - */
>> -WARN(mem->user_pages, "Leaking user_pages array");
>> -
>> -mem->user_pages = kvmalloc_array(bo->tbo.ttm->num_pages,
>> -   sizeof(struct page *),
>> -   GFP_KERNEL | __GFP_ZERO);
>> -if (!mem->user_pages) {
>> -pr_err("%s: Failed to allocate pages array\n", __func__);
>> -ret = -ENOMEM;
>> -goto unregister_out;
>> -}
>> -
>> -ret = amdgpu_ttm_tt_get_user_pages(bo->tbo.ttm, mem->user_pages);
>> +ret = amdgpu_ttm_tt_get_user_pages(bo->tbo.ttm, bo->tbo.ttm->pages);
>>  if (ret) {
>>  pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
>> -goto free_out;
>> +goto unregister_out;
>>  }
>>   
>> -amdgpu_ttm_tt_set_user_pages(bo->tbo.ttm, mem->user_pages);
>> -
>>  ret = amdgpu_bo_reserve(bo, true);
>>  if (ret) {
>>  pr_err("%s: Failed to reserve BO\n", __func__);
>> @@ -616,11 +600,7 @@ static int init_user_pages(struct kgd_mem *mem, struct 
>> mm_struct *mm,
>>  amdgpu_bo_unreserve(bo);
>>   
>>   release_out:
>> -if (ret)
>> -release_pages(mem->user_pages, bo->tbo.ttm->num_pages);
>> -free_out:
>> -kvfree(mem->user_pages);
>> -mem->user_pages = NULL;
>> +amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm);
>>   unregister_out:
>>  if (ret)
>>  amdgpu_mn_unregister(bo);
>> @@ -679,7 +659,6 @@ static int reserve_bo_and_vm(struct kgd_mem *mem,
>>  ctx->kfd_bo.priority = 0;
>>  ctx->kfd_bo.tv.bo = >tbo;
>>  ctx->kfd_bo.tv.num_shared = 1;
>> -ctx->kfd_bo.user_pages = NULL;
>>  list_add(>kfd_bo.tv.head, >list);
>>   
>>  amdgpu_vm_get_pd_bo(vm, >list, >vm_pd[0]);
>> @@ -743,7 +722,6 @@ static int reserve_bo_and_cond_vms(struct kgd_mem *mem,
>>  ctx->kfd_bo.priority = 0;
>>  ctx->kfd_bo.tv.bo = >tbo;
>>  ctx->kfd_bo.tv.num_shared = 1;
>> 

Re: [PATCH 3/3] drm/amdgpu: replace get_user_pages with HMM address mirror helpers v4

2018-12-14 Thread Kuehling, Felix
Looks good for the KFD part. One nit-pick inline. And two more
suggestions about amdgpu_ttm.c to make our handling of
get_user_pages_done more robust.

Christian, would you review the CS and GEM parts? And maybe take a look to
make sure you see nothing wrong with the amdgpu_ttm changes either.

On 2018-12-13 4:01 p.m., Yang, Philip wrote:
> Use HMM helper function hmm_vma_fault() to get physical pages backing
> userptr and start CPU page table update track of those pages. Then use
> hmm_vma_range_done() to check if those pages are updated before
> amdgpu_cs_submit for gfx or before user queues are resumed for kfd.
>
> If userptr pages are updated, for gfx, amdgpu_cs_ioctl will restart
> from scratch, for kfd, restore worker is rescheduled to retry.
>
> HMM simplify the CPU page table concurrent update check, so remove
> guptasklock, mmu_invalidations, last_set_pages fields from
> amdgpu_ttm_tt struct.
>
> HMM does not pin the page (increase page ref count), so remove related
> operations like release_pages(), put_page(), mark_page_dirty().
>
> Change-Id: I2e8c0c6f0d8c21e5596a32d7fc91762778bc9e67
> Signed-off-by: Philip Yang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|   1 -
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  95 +++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c   |   2 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h   |   3 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 185 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |  14 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c|  27 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h|   4 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   | 166 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h   |   4 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|   1 -
>  11 files changed, 208 insertions(+), 294 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index 70429f7aa9a8..717791d4fa45 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -62,7 +62,6 @@ struct kgd_mem {
>  
>   atomic_t invalid;
>   struct amdkfd_process_info *process_info;
> - struct page **user_pages;
>  
>   struct amdgpu_sync sync;
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index be1ab43473c6..2897793600f7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -582,28 +582,12 @@ static int init_user_pages(struct kgd_mem *mem, struct 
> mm_struct *mm,
>   goto out;
>   }
>  
> - /* If no restore worker is running concurrently, user_pages
> -  * should not be allocated
> -  */
> - WARN(mem->user_pages, "Leaking user_pages array");
> -
> - mem->user_pages = kvmalloc_array(bo->tbo.ttm->num_pages,
> -sizeof(struct page *),
> -GFP_KERNEL | __GFP_ZERO);
> - if (!mem->user_pages) {
> - pr_err("%s: Failed to allocate pages array\n", __func__);
> - ret = -ENOMEM;
> - goto unregister_out;
> - }
> -
> - ret = amdgpu_ttm_tt_get_user_pages(bo->tbo.ttm, mem->user_pages);
> + ret = amdgpu_ttm_tt_get_user_pages(bo->tbo.ttm, bo->tbo.ttm->pages);
>   if (ret) {
>   pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
> - goto free_out;
> + goto unregister_out;
>   }
>  
> - amdgpu_ttm_tt_set_user_pages(bo->tbo.ttm, mem->user_pages);
> -
>   ret = amdgpu_bo_reserve(bo, true);
>   if (ret) {
>   pr_err("%s: Failed to reserve BO\n", __func__);
> @@ -616,11 +600,7 @@ static int init_user_pages(struct kgd_mem *mem, struct 
> mm_struct *mm,
>   amdgpu_bo_unreserve(bo);
>  
>  release_out:
> - if (ret)
> - release_pages(mem->user_pages, bo->tbo.ttm->num_pages);
> -free_out:
> - kvfree(mem->user_pages);
> - mem->user_pages = NULL;
> + amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm);
>  unregister_out:
>   if (ret)
>   amdgpu_mn_unregister(bo);
> @@ -679,7 +659,6 @@ static int reserve_bo_and_vm(struct kgd_mem *mem,
>   ctx->kfd_bo.priority = 0;
>   ctx->kfd_bo.tv.bo = >tbo;
>   ctx->kfd_bo.tv.num_shared = 1;
> - ctx->kfd_bo.user_pages = NULL;
>   list_add(>kfd_bo.tv.head, >list);
>  
>   amdgpu_vm_get_pd_bo(vm, >list, >vm_pd[0]);
> @@ -743,7 +722,6 @@ static int reserve_bo_and_cond_vms(struct kgd_mem *mem,
>   ctx->kfd_bo.priority = 0;
>   ctx->kfd_bo.tv.bo = >tbo;
>   ctx->kfd_bo.tv.num_shared = 1;
> - ctx->kfd_bo.user_pages = NULL;
>   list_add(>kfd_bo.tv.head, >list);
>  
>   i = 0;
> @@ -1371,15 +1349,6 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>   list_del(_list_entry->head);
>   

Re: [PATCH] drm/amd/display: Skip fast cursor updates for fb changes

2018-12-14 Thread Wentland, Harry
On 2018-12-14 12:26 p.m., Nicholas Kazlauskas wrote:
> [Why]
> The behavior of drm_atomic_helper_cleanup_planes differs depending on
> whether the commit was asynchronous or not. When it's called from
> amdgpu_dm_atomic_commit_tail during a typical atomic commit the
> plane state has been swapped so it calls cleanup_fb on the old plane
> state.
> 
> However, in the asynchronous commit codepath the call to
> drm_atomic_helper_commit also calls dm_plane_helper_cleanup_fb after
> atomic_async_update has been called. Since the plane state is updated
> in place and has not been swapped the cleanup_fb call affects the new
> plane state.
> 
> This results in a use after free for the given sequence:
> 
> - Fast update, fb1 pin/ref, fb1 unpin/unref
> - Fast update, fb2 pin/ref, fb2 unpin/unref
> - Slow update, fb1 pin/ref, fb2 unpin/unref
> - Fast update, fb2 pin/ref -> use after free. bug
> 
> [How]
> Disallow framebuffer changes in the fast path. Since this includes
> a NULL framebuffer, this means that only framebuffers that have
> been previously pin+ref at least once will be used, preventing a
> use after free.
> 
> This has a significant throughput reduction for cursor updates where
> the framebuffer changes. For most desktop usage this isn't a problem,
> but it does introduce performance regressions for two specific IGT
> tests:
> 
> - cursor-vs-flip-toggle
> - cursor-vs-flip-varying-size
> 
> Cc: Leo Li 
> Cc: Harry Wentland 
> Cc: Michel Dänzer 
> Signed-off-by: Nicholas Kazlauskas 

Reviewed-by: Harry Wentland 

Harry

> ---
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index d01315965af0..dc1eb9ec2c38 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -3628,10 +3628,20 @@ static int dm_plane_atomic_check(struct drm_plane 
> *plane,
>  static int dm_plane_atomic_async_check(struct drm_plane *plane,
>  struct drm_plane_state *new_plane_state)
>  {
> + struct drm_plane_state *old_plane_state =
> + drm_atomic_get_old_plane_state(new_plane_state->state, plane);
> +
>   /* Only support async updates on cursor planes. */
>   if (plane->type != DRM_PLANE_TYPE_CURSOR)
>   return -EINVAL;
>  
> + /*
> +  * DRM calls prepare_fb and cleanup_fb on new_plane_state for
> +  * async commits so don't allow fb changes.
> +  */
> + if (old_plane_state->fb != new_plane_state->fb)
> + return -EINVAL;
> +
>   return 0;
>  }
>  
> 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: After Vega 56/64 GPU hang I unable reboot system

2018-12-14 Thread Wentland, Harry
On 2018-12-04 5:18 p.m., Mikhail Gavrilov wrote:
> Hi guys.
> I'm having trouble: when the Vega GPU hangs, I'm unable to reboot the system.
> 
> Can somebody help me please?
> 
> I tried entering the following commands:
> # init 6
> # reboot
> But neither of them succeeded.
> 
> SysRq-W showing to us this:
> 

Looks like there's an error before this happens that might get us into this 
mess:

[  229.741741] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, 
signaled seq=28686, emitted seq=28688
[  229.741806] [drm] GPU recovery disabled.

Harry

> [  613.397629] sysrq: SysRq : Show Blocked State
> [  613.397638]   taskPC stack   pid father
> [  613.397729] kworker/u32:4   D12912   252  2 0x8000
> [  613.397748] Workqueue: events_unbound commit_work [drm_kms_helper]
> [  613.397752] Call Trace:
> [  613.397764]  ? __schedule+0x2f3/0xb90
> [  613.397770]  ? __lock_acquire+0x279/0x1650
> [  613.397778]  ? dma_fence_default_wait+0x242/0x330
> [  613.397784]  schedule+0x2f/0x90
> [  613.397788]  schedule_timeout+0x31c/0x4f0
> [  613.397792]  ? find_held_lock+0x34/0xa0
> [  613.397796]  ? find_held_lock+0x34/0xa0
> [  613.397802]  ? mark_held_locks+0x57/0x80
> [  613.397807]  ? _raw_spin_unlock_irqrestore+0x4b/0x60
> [  613.397813]  ? dma_fence_default_wait+0x242/0x330
> [  613.397817]  dma_fence_default_wait+0x26e/0x330
> [  613.397822]  ? dma_fence_release+0x120/0x120
> [  613.397829]  dma_fence_wait_timeout+0x182/0x200
> [  613.397835]  reservation_object_wait_timeout_rcu+0x236/0x4e0
> [  613.397911]  amdgpu_dm_do_flip+0x112/0x380 [amdgpu]
> [  613.397993]  amdgpu_dm_atomic_commit_tail+0x6d0/0xd30 [amdgpu]
> [  613.398001]  ? _raw_spin_unlock_irq+0x29/0x40
> [  613.398005]  ? wait_for_completion_timeout+0x73/0x1a0
> [  613.398022]  commit_tail+0x3d/0x70 [drm_kms_helper]
> [  613.398028]  process_one_work+0x27d/0x600
> [  613.398039]  worker_thread+0x3c/0x390
> [  613.398045]  ? drain_workqueue+0x180/0x180
> [  613.398049]  kthread+0x120/0x140
> [  613.398054]  ? kthread_park+0x80/0x80
> [  613.398060]  ret_from_fork+0x27/0x50
> [  613.398204] gdbus   D13416  2138  1 0x8006
> [  613.398211] Call Trace:
> [  613.398219]  ? __schedule+0x2f3/0xb90
> [  613.398224]  ? sched_clock+0x5/0x10
> [  613.398283]  ? amdgpu_ctx_mgr_entity_flush+0x3c/0xc0 [amdgpu]
> [  613.398288]  schedule+0x2f/0x90
> [  613.398292]  schedule_preempt_disabled+0x11/0x20
> [  613.398296]  __mutex_lock+0x25b/0x9c0
> [  613.398360]  ? amdgpu_ctx_mgr_entity_flush+0x3c/0xc0 [amdgpu]
> [  613.398415]  amdgpu_ctx_mgr_entity_flush+0x3c/0xc0 [amdgpu]
> [  613.398468]  amdgpu_flush+0x1f/0x30 [amdgpu]
> [  613.398474]  filp_close+0x34/0x70
> [  613.398481]  put_files_struct.part.13+0x5b/0xc0
> [  613.398489]  do_exit+0x3df/0xca0
> [  613.398495]  ? sched_clock+0x5/0x10
> [  613.398500]  ? sched_clock_cpu+0xc/0xb0
> [  613.398507]  do_group_exit+0x47/0xc0
> [  613.398513]  get_signal+0x28f/0x850
> [  613.398521]  do_signal+0x36/0x650
> [  613.398535]  exit_to_usermode_loop+0x7b/0xe0
> [  613.398540]  do_syscall_64+0x1df/0x1f0
> [  613.398546]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [  613.398551] RIP: 0033:0x7f07a2e551f1
> [  613.398558] Code: Bad RIP value.
> [  613.398562] RSP: 002b:7f079eb25860 EFLAGS: 0293 ORIG_RAX:
> 0007
> [  613.398566] RAX: 0003 RBX: 7f077c0689c0 RCX: 
> 7f07a2e551f1
> [  613.398569] RDX:  RSI: 0004 RDI: 
> 7f077c0689c0
> [  613.398573] RBP: 0004 R08:  R09: 
> 7f0784031d70
> [  613.398576] R10: 55a328098420 R11: 0293 R12: 
> 
> [  613.398579] R13: 7f07a3bb60f0 R14:  R15: 
> 0004
> [  613.399425] (sd-sync)   D13944  6221  1 0x
> [  613.399432] Call Trace:
> [  613.399440]  ? __schedule+0x2f3/0xb90
> [  613.399446]  ? prepare_to_wait_event+0xd2/0x180
> [  613.399453]  schedule+0x2f/0x90
> [  613.399459]  drm_sched_entity_flush+0x1df/0x1f0 [gpu_sched]
> [  613.399464]  ? finish_wait+0x80/0x80
> [  613.399521]  amdgpu_ctx_mgr_entity_flush+0x7c/0xc0 [amdgpu]
> [  613.399574]  amdgpu_flush+0x1f/0x30 [amdgpu]
> [  613.399580]  filp_close+0x34/0x70
> [  613.399585]  __x64_sys_close+0x1e/0x50
> [  613.399589]  do_syscall_64+0x60/0x1f0
> [  613.399594]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [  613.399598] RIP: 0033:0x7f8d537d3ec7
> [  613.399603] Code: Bad RIP value.
> [  613.399606] RSP: 002b:7ffc9df8e580 EFLAGS: 0293 ORIG_RAX:
> 0003
> [  613.399610] RAX: ffda RBX: 007b RCX: 
> 7f8d537d3ec7
> [  613.399614] RDX:  RSI:  RDI: 
> 007b
> [  613.399617] RBP: 555bc87c5780 R08: 555bc87c628e R09: 
> 
> [  613.399620] R10: 7f8d5376cae0 R11: 0293 R12: 
> 7f8d52aea750
> [  613.399623] R13:  R14:  R15: 
> 
> [  613.399636] (-restore)  

RX 540 GPU Hardware bug

2018-12-14 Thread UTKU HELVACI
4.16 and older kernels work great with the hardware: no performance 
drops or glitching/freezing.

Kernels 4.17, 4.18, 4.19 and 4.20 do not work with the hardware, 
causing heavy performance drops and system instability.

The commit responsible is: 
https://github.com/torvalds/linux/commit/320b164abb32db876866a4ff8c2cb710524ac6ea
 


Bugzilla bug page: https://bugzilla.kernel.org/show_bug.cgi?id=201077

I found another person having this problem (see the Bugzilla page).


Utku Helvacı

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: unify Vega20 PSP SOS firmwares for A0 and A1

2018-12-14 Thread Grodzovsky, Andrey
With this change in the latest drm-next, and the related commit in the latest FW, I get


[  148.887374] [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed
[  148.887535] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP 
block  failed -22


Had to revert to be able to boot.


Andrey

On 12/13/2018 10:05 AM, Deucher, Alexander wrote:

Acked-by: Alex Deucher 



From: amd-gfx 

 on behalf of Evan Quan 
Sent: Thursday, December 13, 2018 1:14:51 AM
To: amd-gfx@lists.freedesktop.org
Cc: Quan, Evan
Subject: [PATCH] drm/amdgpu: unify Vega20 PSP SOS firmwares for A0 and A1

The new PSP SOS firmware can support both A0 and A1.

Change-Id: I9bf85eb77b183a4403667c77e291e32689aed0af
---
 drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
index 61cf2f6954e7..f3f5d4dd4631 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
@@ -34,14 +34,11 @@
 #include "nbio/nbio_7_4_offset.h"

 MODULE_FIRMWARE("amdgpu/vega20_sos.bin");
-MODULE_FIRMWARE("amdgpu/vega20_sos_old.bin");
 MODULE_FIRMWARE("amdgpu/vega20_ta.bin");

 /* address block */
 #define smnMP1_FIRMWARE_FLAGS   0x3010024

-#define VEGA20_BL_VERSION_VAR_NEW 0xA1
-
 static int
 psp_v11_0_get_fw_type(struct amdgpu_firmware_info *ucode, enum psp_gfx_fw_type 
*type)
 {
@@ -104,7 +101,6 @@ static int psp_v11_0_init_microcode(struct psp_context *psp)
 int err = 0;
 const struct psp_firmware_header_v1_0 *sos_hdr;
 const struct ta_firmware_header_v1_0 *ta_hdr;
-   uint32_t bl_version;

 DRM_DEBUG("\n");

@@ -116,13 +112,7 @@ static int psp_v11_0_init_microcode(struct psp_context 
*psp)
 BUG();
 }

-   bl_version = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_100);
-   bl_version = (bl_version & 0xFF) >> 16;
-
-   if (bl_version == VEGA20_BL_VERSION_VAR_NEW)
-   snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_sos.bin", 
chip_name);
-   else
-   snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_sos_old.bin", 
chip_name);
+   snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_sos.bin", chip_name);
 err = request_firmware(&adev->psp.sos_fw, fw_name, adev->dev);
 if (err)
 goto out;
--
2.19.2

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH i-g-t] igt: add timeline test cases v2

2018-12-14 Thread Wentland, Harry
On 2018-12-11 5:37 a.m., Chunming Zhou wrote:
> v2: adapt to new transfer ioctl
> 
> Signed-off-by: Chunming Zhou 

+igt-dev

I think intel-gfx still works for IGT development but most of the IGT work 
happens on igt-...@lists.freedesktop.org now.

Harry

> ---
>  include/drm-uapi/drm.h   |   33 ++
>  lib/igt_syncobj.c|  206 
>  lib/igt_syncobj.h|   19 +
>  tests/meson.build|1 +
>  tests/syncobj_timeline.c | 1032 ++
>  5 files changed, 1291 insertions(+)
>  create mode 100644 tests/syncobj_timeline.c
> 
> diff --git a/include/drm-uapi/drm.h b/include/drm-uapi/drm.h
> index 85c685a2..dcd245d9 100644
> --- a/include/drm-uapi/drm.h
> +++ b/include/drm-uapi/drm.h
> @@ -731,6 +731,8 @@ struct drm_syncobj_handle {
>  
>  #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL (1 << 0)
>  #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT (1 << 1)
> +/* wait for time point to become available */
> +#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE (1 << 2)
>  struct drm_syncobj_wait {
>   __u64 handles;
>   /* absolute timeout */
> @@ -741,11 +743,38 @@ struct drm_syncobj_wait {
>   __u32 pad;
>  };
>  
> +struct drm_syncobj_timeline_wait {
> +__u64 handles;
> +/* wait on specific timeline point for every handles*/
> +__u64 points;
> +/* absolute timeout */
> +__s64 timeout_nsec;
> +__u32 count_handles;
> +__u32 flags;
> +__u32 first_signaled; /* only valid when not waiting all */
> +__u32 pad;
> +};
> +
>  struct drm_syncobj_array {
>   __u64 handles;
>   __u32 count_handles;
>   __u32 pad;
>  };
> +struct drm_syncobj_timeline_array {
> +   __u64 handles;
> +   __u64 points;
> +   __u32 count_handles;
> +   __u32 pad;
> +};
> +
> +struct drm_syncobj_transfer {
> +__u32 src_handle;
> +__u32 dst_handle;
> +__u64 src_point;
> +__u64 dst_point;
> +__u32 flags;
> +__u32 pad;
> +};
>  
>  /* Query current scanout sequence number */
>  struct drm_crtc_get_sequence {
> @@ -902,6 +931,10 @@ extern "C" {
>  #define DRM_IOCTL_MODE_LIST_LESSEES  DRM_IOWR(0xC7, struct 
> drm_mode_list_lessees)
>  #define DRM_IOCTL_MODE_GET_LEASE DRM_IOWR(0xC8, struct 
> drm_mode_get_lease)
>  #define DRM_IOCTL_MODE_REVOKE_LEASE  DRM_IOWR(0xC9, struct 
> drm_mode_revoke_lease)
> +#define DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT DRM_IOWR(0xCA, struct 
> drm_syncobj_timeline_wait)
> +#define DRM_IOCTL_SYNCOBJ_QUERY DRM_IOWR(0xCB, struct 
> drm_syncobj_timeline_array)
> +#define DRM_IOCTL_SYNCOBJ_TRANSFERDRM_IOWR(0xCC, struct 
> drm_syncobj_transfer)
> +#define DRM_IOCTL_SYNCOBJ_TIMELINE_SIGNALDRM_IOWR(0xCD, struct 
> drm_syncobj_timeline_array)
>  
>  /**
>   * Device specific ioctls should only be in their respective headers
> diff --git a/lib/igt_syncobj.c b/lib/igt_syncobj.c
> index d9114ca8..efa2adc4 100644
> --- a/lib/igt_syncobj.c
> +++ b/lib/igt_syncobj.c
> @@ -286,3 +286,209 @@ syncobj_signal(int fd, uint32_t *handles, uint32_t 
> count)
>  {
>   igt_assert_eq(__syncobj_signal(fd, handles, count), 0);
>  }
> +
> +static int
> +__syncobj_timeline_signal(int fd, uint32_t *handles, uint64_t *points, 
> uint32_t count)
> +{
> + struct drm_syncobj_timeline_array array = { 0 };
> + int err = 0;
> +
> + array.handles = to_user_pointer(handles);
> + array.points = to_user_pointer(points);
> + array.count_handles = count;
> + if (drmIoctl(fd, DRM_IOCTL_SYNCOBJ_TIMELINE_SIGNAL, &array))
> + err = -errno;
> + return err;
> +}
> +
> +/**
> + * syncobj_signal:
> + * @fd: The DRM file descriptor.
> + * @handles: Array of syncobj handles to signal
> + * @points: List of point of handles to signal.
> + * @count: Count of syncobj handles.
> + *
> + * Signal a set of syncobjs.
> + */
> +void
> +syncobj_timeline_signal(int fd, uint32_t *handles, uint64_t *points, 
> uint32_t count)
> +{
> + igt_assert_eq(__syncobj_timeline_signal(fd, handles, points, count), 0);
> +}
> +int
> +__syncobj_timeline_wait_ioctl(int fd, struct drm_syncobj_timeline_wait *args)
> +{
> + int err = 0;
> + if (drmIoctl(fd, DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT, args))
> + err = -errno;
> + return err;
> +}
> +static int
> +__syncobj_timeline_wait(int fd, uint32_t *handles, uint64_t *points,
> + unsigned num_handles,
> + int64_t timeout_nsec, unsigned flags,
> + uint32_t *first_signaled)
> +{
> + struct drm_syncobj_timeline_wait args;
> + int ret;
> +
> + args.handles = to_user_pointer(handles);
> + args.points = (uint64_t)to_user_pointer(points);
> + args.timeout_nsec = timeout_nsec;
> + args.count_handles = num_handles;
> + args.flags = flags;
> + args.first_signaled = 0;
> + args.pad = 0;
> +
> + ret = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT, &args);
> + if (ret < 0)
> + return -errno;
> +
> 

Re: [PATCH 1/1] drm/amdkfd: Fix handling of return code of dma_buf_get

2018-12-14 Thread Deucher, Alexander
Reviewed-by: Alex Deucher 


From: amd-gfx  on behalf of Kuehling, 
Felix 
Sent: Friday, December 14, 2018 2:05:11 PM
To: amd-gfx@lists.freedesktop.org
Cc: Kuehling, Felix
Subject: [PATCH 1/1] drm/amdkfd: Fix handling of return code of dma_buf_get

On errors, dma_buf_get returns a negative error code, rather than NULL.

Reported-by: Dan Carpenter 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 3623538..db6f27f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1629,8 +1629,8 @@ static int kfd_ioctl_import_dmabuf(struct file *filep,
 return -EINVAL;

 dmabuf = dma_buf_get(args->dmabuf_fd);
-   if (!dmabuf)
-   return -EINVAL;
+   if (IS_ERR(dmabuf))
+   return PTR_ERR(dmabuf);

 mutex_lock(>mutex);

--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amd/display: Skip fast cursor updates for fb changes

2018-12-14 Thread Kazlauskas, Nicholas
On 12/14/18 2:06 PM, Grodzovsky, Andrey wrote:
> In general I agree with Michel that  DRM solution is required to
> properly address this but since now it's not really obvious what is the
> proper solution it seems to me OK to go with this fix until it's found.
> 
> Reviewed-by: Andrey Grodzovsky 
> 
> Andrey

Thanks for the review.

FWIW, we're not the only ones with the fb change check like this - msm 
does it too. The only other user of atomic_async_check and 
atomic_async_update is vc4 and they don't have it but I'd imagine they 
see a similar bug with interleaving framebuffers.

It may be difficult to develop a "fix" for the behavior in DRM given the 
semantics of the function (in place update vs full swap). The 
old_plane_state is technically plane->state in this case, so even just 
adding cleanup_fb(plane, old_plane_state) after atomic_async_update 
isn't enough. What *should* be done here is the full state swap like in 
a regular atomic commit but that may cause breakages in other drivers.

Copying plane->state and calling cleanup_fb on that would also work to 
fix it, but the behavior is certainly unintuitive and asking for worse 
bugs than this one to pop up since nothing else in DRM works like that.
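
For illustration, that "copy plane->state and clean it up" alternative would 
look roughly like this (a rough sketch only, not a patch proposed here, 
assuming the standard drm_plane funcs/helper-funcs callbacks):

	const struct drm_plane_helper_funcs *helper = plane->helper_private;
	struct drm_plane_state *old_copy;

	/* Snapshot the current state (this also takes a reference on its fb)
	 * before atomic_async_update() overwrites plane->state in place. */
	old_copy = plane->funcs->atomic_duplicate_state(plane);

	/* ... helper->atomic_async_update(plane, new_state) runs here ... */

	if (old_copy) {
		/* Drop the prepare_fb pin/ref that was taken on the old fb. */
		if (helper->cleanup_fb)
			helper->cleanup_fb(plane, old_copy);
		plane->funcs->atomic_destroy_state(plane, old_copy);
	}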

Nicholas Kazlauskas

> 
> 
> On 12/14/2018 12:51 PM, Kazlauskas, Nicholas wrote:
>> On 12/14/18 12:47 PM, Grodzovsky, Andrey wrote:
>>>
>>> On 12/14/2018 12:41 PM, Kazlauskas, Nicholas wrote:
 On 12/14/18 12:34 PM, Grodzovsky, Andrey wrote:
> On 12/14/2018 12:26 PM, Nicholas Kazlauskas wrote:
>> [Why]
>> The behavior of drm_atomic_helper_cleanup_planes differs depending on
>> whether the commit was asynchronous or not. When it's called from
>> amdgpu_dm_atomic_commit_tail during a typical atomic commit the
>> plane state has been swapped so it calls cleanup_fb on the old plane
>> state.
>>
>> However, in the asynchronous commit codepath the call to
>> drm_atomic_helper_commit also calls dm_plane_helper_cleanup_fb after
>> atomic_async_update has been called. Since the plane state is updated
>> in place and has not been swapped the cleanup_fb call affects the new
>> plane state.
>>
>> This results in a use after free for the given sequence:
>>
>> - Fast update, fb1 pin/ref, fb1 unpin/unref
>> - Fast update, fb2 pin/ref, fb2 unpin/unref
>> - Slow update, fb1 pin/ref, fb2 unpin/unref
> Shouldn't you have use after free already at this point ?
>
> Andrey
 This assumes there was 1 reference on the bo. You would be getting use
 after free on every single update (be it fast or slow) if this wasn't
 the case.
>>> Why would I get it on every single update if it's all balanced - first
>>> pin/ref then unpin/unref ?
>> It's balanced, but has a reference not from the atomic code path, ie.
>> from creation.
>>
>> So this is actually:
>>
>> fb1 pin, refcount=2, fb1 unpin, refcount=1
>>
 So in the case where there was 1 reference on fb2 it's actually freed at
 the end of slow update since the ref count is now 0. Then the use after
 free happens:
>>> I still don't understand where exactly the 1 reference on fb2 during
>>> slow update comes from
>>> if all I see before that is 'Fast update, fb2 pin/ref, fb2 unpin/unref'
>>> - can you clarify that ?
>>>
>>> Andrey
>> There isn't any pin/ref on fb2 during the slow update. The pin/ref
>> happens during a prepare_fb call - which is *always* called on
>> new_plane_state. So the state looks like this in commit tail:
>>
>> old_plane_state->fb = fb2
>> new_plane_state->fb = fb1
>>
>> Then at the end during cleanup planes, cleanup_fb is called on
>> old_plane_state (fb2).
>>
>> Nicholas Kazlauskas
>>
>> - Fast update, fb2 pin/ref -> use after free. bug
 ...here.

 You can see this exact sequence happening in Michel's log.

 Nicholas Kazlauskas

>> [How]
>> Disallow framebuffer changes in the fast path. Since this includes
>> a NULL framebuffer, this means that only framebuffers that have
>> been previously pin+ref at least once will be used, preventing a
>> use after free.
>>
>> This has a significant throughput reduction for cursor updates where
>> the framebuffer changes. For most desktop usage this isn't a problem,
>> but it does introduce performance regressions for two specific IGT
>> tests:
>>
>> - cursor-vs-flip-toggle
>> - cursor-vs-flip-varying-size
>>
>> Cc: Leo Li 
>> Cc: Harry Wentland 
>> Cc: Michel Dänzer 
>> Signed-off-by: Nicholas Kazlauskas 
>> ---
>>drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 ++
>>1 file changed, 10 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> index d01315965af0..dc1eb9ec2c38 100644
>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> +++ 

Re: [PATCH] drm/amd/display: Skip fast cursor updates for fb changes

2018-12-14 Thread Grodzovsky, Andrey
In general I agree with Michel that a DRM-level solution is required to 
properly address this, but since it's not really obvious right now what the 
proper solution is, it seems OK to me to go with this fix until one is found.

Reviewed-by: Andrey Grodzovsky 

Andrey


On 12/14/2018 12:51 PM, Kazlauskas, Nicholas wrote:
> On 12/14/18 12:47 PM, Grodzovsky, Andrey wrote:
>>
>> On 12/14/2018 12:41 PM, Kazlauskas, Nicholas wrote:
>>> On 12/14/18 12:34 PM, Grodzovsky, Andrey wrote:
 On 12/14/2018 12:26 PM, Nicholas Kazlauskas wrote:
> [Why]
> The behavior of drm_atomic_helper_cleanup_planes differs depending on
> whether the commit was asynchronous or not. When it's called from
> amdgpu_dm_atomic_commit_tail during a typical atomic commit the
> plane state has been swapped so it calls cleanup_fb on the old plane
> state.
>
> However, in the asynchronous commit codepath the call to
> drm_atomic_helper_commit also calls dm_plane_helper_cleanup_fb after
> atomic_async_update has been called. Since the plane state is updated
> in place and has not been swapped the cleanup_fb call affects the new
> plane state.
>
> This results in a use after free for the given sequence:
>
> - Fast update, fb1 pin/ref, fb1 unpin/unref
> - Fast update, fb2 pin/ref, fb2 unpin/unref
> - Slow update, fb1 pin/ref, fb2 unpin/unref
 Shouldn't you have use after free already at this point ?

 Andrey
>>> This assumes there was 1 reference on the bo. You would be getting use
>>> after free on every single update (be it fast or slow) if this wasn't
>>> the case.
>> Why would I get it on every single update if it's all balanced - first
>> pin/ref then unpin/unref ?
> It's balanced, but has a reference not from the atomic code path, ie.
> from creation.
>
> So this is actually:
>
> fb1 pin, refcount=2, fb1 unpin, refcount=1
>
>>> So in the case where there was 1 reference on fb2 it's actually freed at
>>> the end of slow update since the ref count is now 0. Then the use after
>>> free happens:
>> I still don't understand where exactly the 1 reference on fb2 during
>> slow update comes from
>> if all I see before that is 'Fast update, fb2 pin/ref, fb2 unpin/unref'
>> - can you clarify that ?
>>
>> Andrey
> There isn't any pin/ref on fb2 during the slow update. The pin/ref
> happens during a prepare_fb call - which is *always* called on
> new_plane_state. So the state looks like this in commit tail:
>
> old_plane_state->fb = fb2
> new_plane_state->fb = fb1
>
> Then at the end during cleanup planes, cleanup_fb is called on
> old_plane_state (fb2).
>
> Nicholas Kazlauskas
>
> - Fast update, fb2 pin/ref -> use after free. bug
>>> ...here.
>>>
>>> You can see this exact sequence happening in Michel's log.
>>>
>>> Nicholas Kazlauskas
>>>
> [How]
> Disallow framebuffer changes in the fast path. Since this includes
> a NULL framebuffer, this means that only framebuffers that have
> been previously pin+ref at least once will be used, preventing a
> use after free.
>
> This has a significant throughput reduction for cursor updates where
> the framebuffer changes. For most desktop usage this isn't a problem,
> but it does introduce performance regressions for two specific IGT
> tests:
>
> - cursor-vs-flip-toggle
> - cursor-vs-flip-varying-size
>
> Cc: Leo Li 
> Cc: Harry Wentland 
> Cc: Michel Dänzer 
> Signed-off-by: Nicholas Kazlauskas 
> ---
>   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 ++
>   1 file changed, 10 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index d01315965af0..dc1eb9ec2c38 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -3628,10 +3628,20 @@ static int dm_plane_atomic_check(struct drm_plane 
> *plane,
>   static int dm_plane_atomic_async_check(struct drm_plane *plane,
>  struct drm_plane_state 
> *new_plane_state)
>   {
> + struct drm_plane_state *old_plane_state =
> + drm_atomic_get_old_plane_state(new_plane_state->state, plane);
> +
>   /* Only support async updates on cursor planes. */
>   if (plane->type != DRM_PLANE_TYPE_CURSOR)
>   return -EINVAL;
>   
> + /*
> +  * DRM calls prepare_fb and cleanup_fb on new_plane_state for
> +  * async commits so don't allow fb changes.
> +  */
> + if (old_plane_state->fb != new_plane_state->fb)
> + return -EINVAL;
> +
>   return 0;
>   }
>   



[PATCH 1/1] drm/amdkfd: Fix handling of return code of dma_buf_get

2018-12-14 Thread Kuehling, Felix
On errors, dma_buf_get returns a negative error code, rather than NULL.

Reported-by: Dan Carpenter 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 3623538..db6f27f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1629,8 +1629,8 @@ static int kfd_ioctl_import_dmabuf(struct file *filep,
return -EINVAL;
 
dmabuf = dma_buf_get(args->dmabuf_fd);
-   if (!dmabuf)
-   return -EINVAL;
+   if (IS_ERR(dmabuf))
+   return PTR_ERR(dmabuf);
 
	mutex_lock(&p->mutex);
 
-- 
2.7.4
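As background for the fix above, here is a small stand-alone sketch of the
error-pointer convention involved. The IS_ERR/ERR_PTR/PTR_ERR macros below are
simplified user-space stand-ins for the kernel's <linux/err.h> helpers, and
fake_dma_buf_get() is a made-up function, not the real dma_buf_get():

#include <errno.h>
#include <stdio.h>

/* Simplified user-space stand-ins for the kernel's <linux/err.h> macros. */
#define MAX_ERRNO 4095
#define ERR_PTR(err)  ((void *)(long)(err))
#define PTR_ERR(ptr)  ((long)(ptr))
#define IS_ERR(ptr)   ((unsigned long)(ptr) >= (unsigned long)-MAX_ERRNO)

/* Made-up stand-in: like dma_buf_get(), it returns an error pointer on
 * failure and never NULL, so a !ptr check can never catch the error. */
static void *fake_dma_buf_get(int fd)
{
	return (fd < 0) ? ERR_PTR(-EBADF) : (void *)"dmabuf-object";
}

int main(void)
{
	void *buf = fake_dma_buf_get(-1);

	if (!buf)		/* the old, broken check: never triggers */
		printf("NULL check caught the error\n");
	if (IS_ERR(buf))	/* the check the patch switches to */
		printf("IS_ERR caught error %ld\n", PTR_ERR(buf));
	return 0;
}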



Re: [PATCH] drm/amd/display: Skip fast cursor updates for fb changes

2018-12-14 Thread Kazlauskas, Nicholas
On 12/14/18 12:47 PM, Grodzovsky, Andrey wrote:
> 
> 
> On 12/14/2018 12:41 PM, Kazlauskas, Nicholas wrote:
>> On 12/14/18 12:34 PM, Grodzovsky, Andrey wrote:
>>>
>>> On 12/14/2018 12:26 PM, Nicholas Kazlauskas wrote:
 [Why]
 The behavior of drm_atomic_helper_cleanup_planes differs depending on
 whether the commit was asynchronous or not. When it's called from
 amdgpu_dm_atomic_commit_tail during a typical atomic commit the
 plane state has been swapped so it calls cleanup_fb on the old plane
 state.

 However, in the asynchronous commit codepath the call to
 drm_atomic_helper_commit also calls dm_plane_helper_cleanup_fb after
 atomic_async_update has been called. Since the plane state is updated
 in place and has not been swapped the cleanup_fb call affects the new
 plane state.

 This results in a use after free for the given sequence:

 - Fast update, fb1 pin/ref, fb1 unpin/unref
 - Fast update, fb2 pin/ref, fb2 unpin/unref
 - Slow update, fb1 pin/ref, fb2 unpin/unref
>>> Shouldn't you have use after free already at this point ?
>>>
>>> Andrey
>> This assumes there was 1 reference on the bo. You would be getting use
>> after free on every single update (be it fast or slow) if this wasn't
>> the case.
> 
> Why would I get it on every single update if it's all balanced - first
> pin/ref then unpin/unref ?

It's balanced, but has a reference not from the atomic code path, ie. 
from creation.

So this is actually:

fb1 pin, refcount=2, b1 unpin refcount=1

> 
>>
>> So in the case where there was 1 reference on fb2 it's actually freed at
>> the end of slow update since the ref count is now 0. Then the use after
>> free happens:
> 
> I still don't understand where exactly the 1 reference on fb2 during
> slow update comes from
> if all I see before that is 'Fast update, fb2 pin/ref, fb2 unpin/unref'
> - can you clarify that ?
> 
> Andrey

There isn't any pin/ref on fb2 during the slow update. The pin/ref 
happens during a prepare_fb call - which is *always* called on 
new_plane_state. So the state looks like this in commit tail:

old_plane_state->fb = fb2
new_plane_state->fb = fb1

Then at the end during cleanup planes, cleanup_fb is called on 
old_plane_state (fb2).
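To make that bookkeeping easier to follow, here is a minimal user-space model
(plain C with illustrative names; prepare_fb/cleanup_fb here are just print
stubs, not the DRM helpers) of which state a cleanup call ends up seeing in the
swapped versus in-place paths:

#include <stdio.h>

struct fb { const char *name; };
struct plane_state { struct fb *fb; };

static void prepare_fb(struct plane_state *s) { printf("prepare_fb(%s)\n", s->fb->name); }
static void cleanup_fb(struct plane_state *s) { printf("cleanup_fb(%s)\n", s->fb->name); }

int main(void)
{
	struct fb fb1 = { "fb1" }, fb2 = { "fb2" };
	struct plane_state cur = { &fb2 };	/* plane->state, currently showing fb2 */
	struct plane_state new = { &fb1 };	/* new_plane_state for this commit */
	struct plane_state old;

	/* Regular commit: prepare the new fb, swap states, clean up the old one. */
	prepare_fb(&new);
	old = cur;			/* state swap: old now holds fb2 */
	cur = new;
	cleanup_fb(&old);		/* hits fb2, the fb that really went away */

	/* Async commit: the state is updated in place and never swapped, so the
	 * later cleanup call on new_plane_state hits the fb that is still
	 * active on the plane. */
	new.fb = &fb2;
	prepare_fb(&new);
	cur.fb = new.fb;		/* in-place update, no swap */
	cleanup_fb(&new);		/* hits fb2, the currently active fb */
	return 0;
}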

Nicholas Kazlauskas

> 
>>
 - Fast update, fb2 pin/ref -> use after free. bug
>> ...here.
>>
>> You can see this exact sequence happening in Michel's log.
>>
>> Nicholas Kazlauskas
>>
 [How]
 Disallow framebuffer changes in the fast path. Since this includes
 a NULL framebuffer, this means that only framebuffers that have
 been previously pin+ref at least once will be used, preventing a
 use after free.

 This has a significant throughput reduction for cursor updates where
 the framebuffer changes. For most desktop usage this isn't a problem,
 but it does introduce performance regressions for two specific IGT
 tests:

 - cursor-vs-flip-toggle
 - cursor-vs-flip-varying-size

 Cc: Leo Li 
 Cc: Harry Wentland 
 Cc: Michel Dänzer 
 Signed-off-by: Nicholas Kazlauskas 
 ---
  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 ++
  1 file changed, 10 insertions(+)

 diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
 b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
 index d01315965af0..dc1eb9ec2c38 100644
 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
 +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
 @@ -3628,10 +3628,20 @@ static int dm_plane_atomic_check(struct drm_plane 
 *plane,
  static int dm_plane_atomic_async_check(struct drm_plane *plane,
   struct drm_plane_state 
 *new_plane_state)
  {
 +  struct drm_plane_state *old_plane_state =
 +  drm_atomic_get_old_plane_state(new_plane_state->state, plane);
 +
/* Only support async updates on cursor planes. */
if (plane->type != DRM_PLANE_TYPE_CURSOR)
return -EINVAL;
  
 +  /*
 +   * DRM calls prepare_fb and cleanup_fb on new_plane_state for
 +   * async commits so don't allow fb changes.
 +   */
 +  if (old_plane_state->fb != new_plane_state->fb)
 +  return -EINVAL;
 +
return 0;
  }
  
> 



Re: [PATCH] drm/amd/display: Skip fast cursor updates for fb changes

2018-12-14 Thread Grodzovsky, Andrey


On 12/14/2018 12:41 PM, Kazlauskas, Nicholas wrote:
> On 12/14/18 12:34 PM, Grodzovsky, Andrey wrote:
>>
>> On 12/14/2018 12:26 PM, Nicholas Kazlauskas wrote:
>>> [Why]
>>> The behavior of drm_atomic_helper_cleanup_planes differs depending on
>>> whether the commit was asynchronous or not. When it's called from
>>> amdgpu_dm_atomic_commit_tail during a typical atomic commit the
>>> plane state has been swapped so it calls cleanup_fb on the old plane
>>> state.
>>>
>>> However, in the asynchronous commit codepath the call to
>>> drm_atomic_helper_commit also calls dm_plane_helper_cleanup_fb after
>>> atomic_async_update has been called. Since the plane state is updated
>>> in place and has not been swapped the cleanup_fb call affects the new
>>> plane state.
>>>
>>> This results in a use after free for the given sequence:
>>>
>>> - Fast update, fb1 pin/ref, fb1 unpin/unref
>>> - Fast update, fb2 pin/ref, fb2 unpin/unref
>>> - Slow update, fb1 pin/ref, fb2 unpin/unref
>> Shouldn't you have use after free already at this point ?
>>
>> Andrey
> This assumes there was 1 reference on the bo. You would be getting use
> after free on every single update (be it fast or slow) if this wasn't
> the case.

Why would I get it on every single update if it's all balanced - first 
pin/ref then unpin/unref ?

>
> So in the case where there was 1 reference on fb2 it's actually freed at
> the end of slow update since the ref count is now 0. Then the use after
> free happens:

I still don't understand where exactly the 1 reference on fb2 during 
slow update comes from
if all I see before that is 'Fast update, fb2 pin/ref, fb2 unpin/unref' 
- can you clarify that ?

Andrey

>
>>> - Fast update, fb2 pin/ref -> use after free. bug
> ...here.
>
> You can see this exact sequence happening in Michel's log.
>
> Nicholas Kazlauskas
>
>>> [How]
>>> Disallow framebuffer changes in the fast path. Since this includes
>>> a NULL framebuffer, this means that only framebuffers that have
>>> been previously pin+ref at least once will be used, preventing a
>>> use after free.
>>>
>>> This has a significant throughput reduction for cursor updates where
>>> the framebuffer changes. For most desktop usage this isn't a problem,
>>> but it does introduce performance regressions for two specific IGT
>>> tests:
>>>
>>> - cursor-vs-flip-toggle
>>> - cursor-vs-flip-varying-size
>>>
>>> Cc: Leo Li 
>>> Cc: Harry Wentland 
>>> Cc: Michel Dänzer 
>>> Signed-off-by: Nicholas Kazlauskas 
>>> ---
>>> drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 ++
>>> 1 file changed, 10 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
>>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> index d01315965af0..dc1eb9ec2c38 100644
>>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> @@ -3628,10 +3628,20 @@ static int dm_plane_atomic_check(struct drm_plane 
>>> *plane,
>>> static int dm_plane_atomic_async_check(struct drm_plane *plane,
>>>struct drm_plane_state 
>>> *new_plane_state)
>>> {
>>> +   struct drm_plane_state *old_plane_state =
>>> +   drm_atomic_get_old_plane_state(new_plane_state->state, plane);
>>> +
>>> /* Only support async updates on cursor planes. */
>>> if (plane->type != DRM_PLANE_TYPE_CURSOR)
>>> return -EINVAL;
>>> 
>>> +   /*
>>> +* DRM calls prepare_fb and cleanup_fb on new_plane_state for
>>> +* async commits so don't allow fb changes.
>>> +*/
>>> +   if (old_plane_state->fb != new_plane_state->fb)
>>> +   return -EINVAL;
>>> +
>>> return 0;
>>> }
>>> 



Re: [PATCH] drm/amd/display: Skip fast cursor updates for fb changes

2018-12-14 Thread Kazlauskas, Nicholas
On 12/14/18 12:34 PM, Grodzovsky, Andrey wrote:
> 
> 
> On 12/14/2018 12:26 PM, Nicholas Kazlauskas wrote:
>> [Why]
>> The behavior of drm_atomic_helper_cleanup_planes differs depending on
>> whether the commit was asynchronous or not. When it's called from
>> amdgpu_dm_atomic_commit_tail during a typical atomic commit the
>> plane state has been swapped so it calls cleanup_fb on the old plane
>> state.
>>
>> However, in the asynchronous commit codepath the call to
>> drm_atomic_helper_commit also calls dm_plane_helper_cleanup_fb after
>> atomic_async_update has been called. Since the plane state is updated
>> in place and has not been swapped the cleanup_fb call affects the new
>> plane state.
>>
>> This results in a use after free for the given sequence:
>>
>> - Fast update, fb1 pin/ref, fb1 unpin/unref
>> - Fast update, fb2 pin/ref, fb2 unpin/unref
>> - Slow update, fb1 pin/ref, fb2 unpin/unref
> 
> Shouldn't you have use after free already at this point ?
> 
> Andrey

This assumes there was 1 reference on the bo. You would be getting use 
after free on every single update (be it fast or slow) if this wasn't 
the case.

So in the case where there was 1 reference on fb2 it's actually freed at 
the end of slow update since the ref count is now 0. Then the use after 
free happens:

> 
>> - Fast update, fb2 pin/ref -> use after free. bug

...here.

You can see this exact sequence happening in Michel's log.
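A stand-alone model of that walkthrough (plain C; pin/unpin here just adjust a
counter and are not the amdgpu pin/unpin calls, and each fb starts with one
creation reference as in the discussion):

#include <stdio.h>

struct fb { const char *name; int refcount; };

static void pin(struct fb *fb)
{
	fb->refcount++;
	printf("pin   %s -> %d\n", fb->name, fb->refcount);
}

static void unpin(struct fb *fb)
{
	fb->refcount--;
	printf("unpin %s -> %d%s\n", fb->name, fb->refcount,
	       fb->refcount == 0 ? " (freed)" : "");
}

int main(void)
{
	struct fb fb1 = { "fb1", 1 }, fb2 = { "fb2", 1 };

	pin(&fb1); unpin(&fb1);	/* fast update: balanced, creation ref keeps fb1 alive */
	pin(&fb2); unpin(&fb2);	/* fast update: balanced, creation ref keeps fb2 alive */
	pin(&fb1); unpin(&fb2);	/* slow update: prepare_fb on fb1, cleanup_fb on the
				 * old state (fb2), so fb2 drops to 0 and is freed   */
	pin(&fb2);		/* next fast update touches the freed fb2: the bug */
	return 0;
}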

Nicholas Kazlauskas

>>
>> [How]
>> Disallow framebuffer changes in the fast path. Since this includes
>> a NULL framebuffer, this means that only framebuffers that have
>> been previously pin+ref at least once will be used, preventing a
>> use after free.
>>
>> This has a significant throughput reduction for cursor updates where
>> the framebuffer changes. For most desktop usage this isn't a problem,
>> but it does introduce performance regressions for two specific IGT
>> tests:
>>
>> - cursor-vs-flip-toggle
>> - cursor-vs-flip-varying-size
>>
>> Cc: Leo Li 
>> Cc: Harry Wentland 
>> Cc: Michel Dänzer 
>> Signed-off-by: Nicholas Kazlauskas 
>> ---
>>drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 ++
>>1 file changed, 10 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> index d01315965af0..dc1eb9ec2c38 100644
>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> @@ -3628,10 +3628,20 @@ static int dm_plane_atomic_check(struct drm_plane 
>> *plane,
>>static int dm_plane_atomic_async_check(struct drm_plane *plane,
>> struct drm_plane_state *new_plane_state)
>>{
>> +struct drm_plane_state *old_plane_state =
>> +drm_atomic_get_old_plane_state(new_plane_state->state, plane);
>> +
>>  /* Only support async updates on cursor planes. */
>>  if (plane->type != DRM_PLANE_TYPE_CURSOR)
>>  return -EINVAL;
>>
>> +/*
>> + * DRM calls prepare_fb and cleanup_fb on new_plane_state for
>> + * async commits so don't allow fb changes.
>> + */
>> +if (old_plane_state->fb != new_plane_state->fb)
>> +return -EINVAL;
>> +
>>  return 0;
>>}
>>
> 



Re: [PATCH] drm/amd/display: Skip fast cursor updates for fb changes

2018-12-14 Thread Grodzovsky, Andrey


On 12/14/2018 12:26 PM, Nicholas Kazlauskas wrote:
> [Why]
> The behavior of drm_atomic_helper_cleanup_planes differs depending on
> whether the commit was asynchronous or not. When it's called from
> amdgpu_dm_atomic_commit_tail during a typical atomic commit the
> plane state has been swapped so it calls cleanup_fb on the old plane
> state.
>
> However, in the asynchronous commit codepath the call to
> drm_atomic_helper_commit also calls dm_plane_helper_cleanup_fb after
> atomic_async_update has been called. Since the plane state is updated
> in place and has not been swapped the cleanup_fb call affects the new
> plane state.
>
> This results in a use after free for the given sequence:
>
> - Fast update, fb1 pin/ref, fb1 unpin/unref
> - Fast update, fb2 pin/ref, fb2 unpin/unref
> - Slow update, fb1 pin/ref, fb2 unpin/unref

Shouldn't you have use after free already at this point ?

Andrey

> - Fast update, fb2 pin/ref -> use after free. bug
>
> [How]
> Disallow framebuffer changes in the fast path. Since this includes
> a NULL framebuffer, this means that only framebuffers that have
> been previously pin+ref at least once will be used, preventing a
> use after free.
>
> This has a significant throughput reduction for cursor updates where
> the framebuffer changes. For most desktop usage this isn't a problem,
> but it does introduce performance regressions for two specific IGT
> tests:
>
> - cursor-vs-flip-toggle
> - cursor-vs-flip-varying-size
>
> Cc: Leo Li 
> Cc: Harry Wentland 
> Cc: Michel Dänzer 
> Signed-off-by: Nicholas Kazlauskas 
> ---
>   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 ++
>   1 file changed, 10 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index d01315965af0..dc1eb9ec2c38 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -3628,10 +3628,20 @@ static int dm_plane_atomic_check(struct drm_plane 
> *plane,
>   static int dm_plane_atomic_async_check(struct drm_plane *plane,
>  struct drm_plane_state *new_plane_state)
>   {
> + struct drm_plane_state *old_plane_state =
> + drm_atomic_get_old_plane_state(new_plane_state->state, plane);
> +
>   /* Only support async updates on cursor planes. */
>   if (plane->type != DRM_PLANE_TYPE_CURSOR)
>   return -EINVAL;
>   
> + /*
> +  * DRM calls prepare_fb and cleanup_fb on new_plane_state for
> +  * async commits so don't allow fb changes.
> +  */
> + if (old_plane_state->fb != new_plane_state->fb)
> + return -EINVAL;
> +
>   return 0;
>   }
>   



[PATCH] drm/amd/display: Skip fast cursor updates for fb changes

2018-12-14 Thread Nicholas Kazlauskas
[Why]
The behavior of drm_atomic_helper_cleanup_planes differs depending on
whether the commit was asynchronous or not. When it's called from
amdgpu_dm_atomic_commit_tail during a typical atomic commit the
plane state has been swapped so it calls cleanup_fb on the old plane
state.

However, in the asynchronous commit codepath the call to
drm_atomic_helper_commit also calls dm_plane_helper_cleanup_fb after
atomic_async_update has been called. Since the plane state is updated
in place and has not been swapped the cleanup_fb call affects the new
plane state.

This results in a use after free for the given sequence:

- Fast update, fb1 pin/ref, fb1 unpin/unref
- Fast update, fb2 pin/ref, fb2 unpin/unref
- Slow update, fb1 pin/ref, fb2 unpin/unref
- Fast update, fb2 pin/ref -> use after free. bug

[How]
Disallow framebuffer changes in the fast path. Since this includes
a NULL framebuffer, this means that only framebuffers that have
been previously pin+ref at least once will be used, preventing a
use after free.

This has a significant throughput reduction for cursor updates where
the framebuffer changes. For most desktop usage this isn't a problem,
but it does introduce performance regressions for two specific IGT
tests:

- cursor-vs-flip-toggle
- cursor-vs-flip-varying-size

Cc: Leo Li 
Cc: Harry Wentland 
Cc: Michel Dänzer 
Signed-off-by: Nicholas Kazlauskas 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index d01315965af0..dc1eb9ec2c38 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -3628,10 +3628,20 @@ static int dm_plane_atomic_check(struct drm_plane 
*plane,
 static int dm_plane_atomic_async_check(struct drm_plane *plane,
   struct drm_plane_state *new_plane_state)
 {
+   struct drm_plane_state *old_plane_state =
+   drm_atomic_get_old_plane_state(new_plane_state->state, plane);
+
/* Only support async updates on cursor planes. */
if (plane->type != DRM_PLANE_TYPE_CURSOR)
return -EINVAL;
 
+   /*
+* DRM calls prepare_fb and cleanup_fb on new_plane_state for
+* async commits so don't allow fb changes.
+*/
+   if (old_plane_state->fb != new_plane_state->fb)
+   return -EINVAL;
+
return 0;
 }
 
-- 
2.17.1



Re: [PATCH v2 6/6] drm/amdgpu/vcn:Remove bit 31 for scratch2 to indicate the WA is active

2018-12-14 Thread Liu, Leo
The series are:

Acked-by: Leo Liu 

On 12/13/18 1:08 PM, Zhu, James wrote:
> Remove bit 31 from scratch2, which is used to indicate that the hardware bug workaround is active.
>
> Signed-off-by: James Zhu 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> index 564ed94..eb6b783 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> @@ -279,7 +279,7 @@ static int amdgpu_vcn_pause_dpg_mode(struct amdgpu_device *adev,
>   
>   ring = &adev->vcn.ring_dec;
>   WREG32_SOC15(UVD, 0, mmUVD_RBC_RB_WPTR,
> -RREG32_SOC15(UVD, 0, mmUVD_SCRATCH2));
> +RREG32_SOC15(UVD, 0, mmUVD_SCRATCH2) & 0x7FFFFFFF);
>   SOC15_WAIT_ON_RREG(UVD, 0, mmUVD_POWER_STATUS,
>  UVD_PGFSM_CONFIG__UVDM_UVDU_PWR_ON,
>  UVD_POWER_STATUS__UVD_POWER_STATUS_MASK, ret_code);
> @@ -339,7 +339,7 @@ static int amdgpu_vcn_pause_dpg_mode(struct amdgpu_device *adev,
>   
>   ring = &adev->vcn.ring_dec;
>   WREG32_SOC15(UVD, 0, mmUVD_RBC_RB_WPTR,
> -RREG32_SOC15(UVD, 0, mmUVD_SCRATCH2));
> +RREG32_SOC15(UVD, 0, mmUVD_SCRATCH2) & 0x7FFFFFFF);
>   SOC15_WAIT_ON_RREG(UVD, 0, mmUVD_POWER_STATUS,
>  UVD_PGFSM_CONFIG__UVDM_UVDU_PWR_ON,
>  UVD_POWER_STATUS__UVD_POWER_STATUS_MASK, ret_code);


Re: [PATCH] drm/amd/display: Don't cleanup new cursor fb in fast cursor update

2018-12-14 Thread Kazlauskas, Nicholas
On 12/14/18 10:19 AM, Kazlauskas, Nicholas wrote:
> On 12/14/18 10:01 AM, Michel Dänzer wrote:
>> On 2018-12-14 3:48 p.m., Nicholas Kazlauskas wrote:
>>> [Why]
>>> The behavior of dm_plane_helper_cleanup_fb differs depending on
>>> whether the commit was asynchronous or not. When it's called from
>>> amdgpu_dm_atomic_commit_tail during a typical atomic commit the
>>> plane state has been swapped so it calls cleanup_fb on the old plane
>>> state.
>>>
>>> However, in the asynchronous commit codepath the call to
>>> drm_atomic_helper_commit also calls dm_plane_helper_cleanup_fb after
>>> atomic_async_update has been called. Since the plane state has not
>>> been swapped, the cleanup_fb call affects the new plane state with the
>>> active fb.
>>>
>>> The slow, full path is only ever used for the cursor update when the
>>> cursor CRTC changes. If there was previously a fb in the state before
>>> that had gone through the fast asynchronous path then cleanup_fb will
>>> be called a second time on an fb that was previously unpinned/unref'd.
>>>
>>> This results in a use after free on the next cursor update.
>>>
>>> [How]
>>> Regardless of whether this was intentional behavior in DRM or not,
>>> the fix is to manually call cleanup_fb on the old plane state's fb.
>>>
>>> Since the pointer to the old_plane_state is actually the current
>>> plane->state in this function, a backup is needed. This has a minor
>>> change to behavior for handle_cursor_update since it now correctly uses
>>> the old state.
>>>
>>> The next step is to prevent the cleanup_fb call from DRM from unpinning
>>> the active new_state->fb. Setting new_state->fb to NULL would be
>>> enough to handle this, but DRM has a warning in the
>>> drm_atomic_helper_async_commit helper if
>>> plane->state->fb != new_state->fb.
>>>
>>> A new field was added (cursor_update) to dm_plane_state that's only
>>> ever set to true during the fast path. If it's true, then cleanup_fb
>>> doesn't unpin and unref the fb.
>>>
>>> Cc: Leo Li 
>>> Cc: Harry Wentland 
>>> Cc: Michel Dänzer 
>>> Signed-off-by: Nicholas Kazlauskas 
>>>
>>> [...]
>>>
>>> @@ -3653,7 +3654,15 @@ static void dm_plane_atomic_async_update(struct 
>>> drm_plane *plane,
>>> plane->state->crtc_w = new_state->crtc_w;
>>> plane->state->crtc_h = new_state->crtc_h;
>>>
>>> -   handle_cursor_update(plane, old_state);
>>> +   handle_cursor_update(plane, _state);
>>> +
>>> +   /*
>>> +* While DRM already takes care of drm_atomic_helper_cleanup_planes,
>>> +* it does it on the wrong state (new_state instead of old_state).
>>> +* This is a workaround for that behavior.
>>> +*/
>>> +   dm_plane_helper_cleanup_fb(plane, _state);
>>> +   dm_plane_state->cursor_update = true;
>>
>> We shouldn't work around DRM core code in driver code. Please work on a
>> solution with the core code developers on the dri-devel list, and revert
>> the cursor fast path for now.
>>
>>
> 
> I'd rather not revert the patch given how much it improves the user
> experience. Some compositors (and wine games) are unusable or unplayable
> without it.
> 
> In general I agree with not doing core workarounds, but I'm not even
> sure if this is a bug in core DRM to begin with. Since the plane state
> modification is in place if you free the old_state then you're freeing
> what's currently active on plane->state. This works for us since we only
> work on the ->fb during prepare/cleanup, but this isn't true in general.
> 
> The only other way to do this is not using
> async_commit_check/async_commit_update and reimplementing atomic commit
> behavior for the fast path. Intel does this with their legacy cursor
> update fast path - it's a lot more code and it involves using
> "deprecated" elements of the atomic API.
> 
> Nicholas Kazlauskas
> 

A simpler solution that I would still be fine with is to just disallow 
framebuffer changes in the fast path until DRM behavior is clarified or 
fixed. This would work with DRM freeing either the new_plane_state or 
the old_plane_state - it's the same FB in either case.

This will still resolve most cursor stuttering problems while being 
significantly less hacky and easier to revert.

It unfortunately does affect performance quite a bit for cases where 
the cursor is changing the framebuffer, but you don't see this in most 
desktop usage. Just in the cursor-vs-flip-toggle and 
cursor-vs-flip-varying-size igt tests.

Nicholas Kazlauskas


Re: [WIP PATCH 13/15] drm/dp_mst: Start tracking per-port VCPI allocations

2018-12-14 Thread Daniel Vetter
On Thu, Dec 13, 2018 at 08:25:42PM -0500, Lyude Paul wrote:
> There has been a TODO waiting for quite a long time in
> drm_dp_mst_topology.c:
> 
>   /* We cannot rely on port->vcpi.num_slots to update
>* topology_state->avail_slots as the port may not exist if the parent
>* branch device was unplugged. This should be fixed by tracking
>* per-port slot allocation in drm_dp_mst_topology_state instead of
>* depending on the caller to tell us how many slots to release.
>*/
> 
> That's not the only reason we should fix this: forcing the driver to
> track the VCPI allocations throughout a state's atomic check is
> error prone, because it means that extra care has to be taken with the
> order that drm_dp_atomic_find_vcpi_slots() and
> drm_dp_atomic_release_vcpi_slots() are called in in order to ensure
> idempotency. Currently the only driver actually using these helpers,
> i915, doesn't even do this correctly: multiple ->best_encoder() checks
> with i915's current implementation would not be idempotent and would
> over-allocate VCPI slots, something I learned trying to implement
> fallback retraining in MST.
> 
> So: simplify this whole mess, and teach drm_dp_atomic_find_vcpi_slots()
> and drm_dp_atomic_release_vcpi_slots() to track the VCPI allocations for
> each port. This allows us to ensure idempotency without having to rely
> on the driver as much. Additionally: the driver doesn't need to do any
> kind of VCPI slot tracking anymore if it doesn't need it for it's own
> internal state.
> 
> Additionally; this adds a new drm_dp_mst_atomic_check() helper which
> must be used by atomic drivers to perform validity checks for the new
> VCPI allocations incurred by a state.
> 
> Also: update the documentation and make it more obvious that these
> /must/ be called by /all/ atomic drivers supporting MST.
> 
> Changes since v6:
>  - Keep a kref to all of the ports we have allocations on. This required
>a good bit of changing to when we call drm_dp_find_vcpi_slots(),
>mainly that we need to ensure that we only redo VCPI allocations on
>actual mode or CRTC changes, not crtc_state->active changes.
>Additionally, we no longer take the registration of the DRM connector
>for each port into account because so long as we have a kref to the
>port in the new or previous atomic state, the connector will stay
>registered.
>  - Use the small changes to drm_dp_put_port() to add even more error
>checking to make misusage of the helpers more obvious. I added this
>after having to chase down various use-after-free conditions that
>started popping up from the new helpers so no one else has to
>troubleshoot that.
>  - Move some accidental DRM_DEBUG_KMS() calls to DRM_DEBUG_ATOMIC()
>  - Update documentation again, note that find/release() should both not be
>called on the same port in a single atomic check phase (but multiple
>calls to one or the other is OK)
> 
> Changes since v4:
>  - Don't skip the atomic checks for VCPI allocations if no new VCPI
>allocations happen in a state. This makes the next change I'm about
>to list here a lot easier to implement.
>  - Don't ignore VCPI allocations on destroyed ports, instead ensure that
>when ports are destroyed and still have VCPI allocations in the
>topology state, the only state changes allowed are releasing said
>ports' VCPI. This prevents a state with a mix of VCPI allocations
>from destroyed ports, and allocations from valid ports.
> 
> Changes since v3:
>  - Don't release VCPI allocations in the topology state immediately in
>drm_dp_atomic_release_vcpi_slots(), instead mark them as 0 and skip
>over them in drm_dp_mst_duplicate_state(). This makes it so
>drm_dp_atomic_release_vcpi_slots() is still idempotent while also
>throwing warnings if the driver messes up it's book keeping and tries
>to release VCPI slots on a port that doesn't have any pre-existing
>VCPI allocation - danvet
>  - Change mst_state/state in some debugging messages to "mst state"
> 
> Changes since v2:
>  - Use kmemdup() for duplicating MST state - danvet
>  - Move port validation out of duplicate state callback - danvet
>  - Handle looping through MST topology states in
>drm_dp_mst_atomic_check() so the driver doesn't have to do it
>  - Fix documentation in drm_dp_atomic_find_vcpi_slots()
>  - Move the atomic check for each individual topology state into it's
>own function, reduces indenting
>  - Don't consider "stale" MST ports when calculating the bandwidth
>requirements. This is needed because originally we relied on the
>state duplication functions to prune any stale ports from the new
>state, which would prevent us from incorrectly considering their
>bandwidth requirements alongside legitimate new payloads.
>  - Add function references in drm_dp_atomic_release_vcpi_slots() - danvet
>  - Annotate atomic VCPI and atomic check functions with 

Re: [PATCH] drm/amdgpu: fix IH overflow on Vega10

2018-12-14 Thread Deucher, Alexander
Reviewed-by: Alex Deucher 


From: Christian König 
Sent: Friday, December 14, 2018 9:37:23 AM
To: Deucher, Alexander; alexdeuc...@gmail.com; amd-gfx@lists.freedesktop.org
Subject: [PATCH] drm/amdgpu: fix IH overflow on Vega10

When a ring buffer overflow happens, the appropriate bit is set in the WPTR
register which is also written back to memory. But clearing the bit in the
WPTR doesn't trigger another memory writeback.

So what can happen is that we end up processing the buffer overflow over and
over again because the bit is never cleared, resulting in a random system
lockup caused by an infinite loop in an interrupt handler.

This is 100% reproducible on Vega10, but it's most likely an issue we have
in the driver over all generations all the way back to radeon.
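As a rough illustration of the pattern the patch below adopts (plain C; the two
variables stand in for the write-back copy in memory and the live IH_RB_WPTR
register, this is not the actual amdgpu code):

#include <stdbool.h>
#include <stdio.h>

#define RB_OVERFLOW (1u << 31)

static unsigned int wptr_writeback;	/* copy the HW wrote back to memory */
static unsigned int wptr_register;	/* authoritative MMIO register value */

static bool overflow_pending(void)
{
	unsigned int wptr = wptr_writeback;

	if (!(wptr & RB_OVERFLOW))
		return false;

	/* The write-back copy can carry a stale overflow bit forever, because
	 * clearing the bit in the register does not trigger another write-back.
	 * Double check the register itself before treating this as a new
	 * overflow. */
	wptr = wptr_register;
	return (wptr & RB_OVERFLOW) != 0;
}

int main(void)
{
	wptr_writeback = 0x10 | RB_OVERFLOW;	/* stale bit left in memory */
	wptr_register  = 0x10;			/* already cleared in hardware */
	printf("overflow pending: %d\n", overflow_pending());
	return 0;
}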

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 68 --
 1 file changed, 43 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c 
b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
index 992c8a8b8f77..0ab7785079c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
@@ -276,31 +276,49 @@ static u32 vega10_ih_get_wptr(struct amdgpu_device *adev,

 wptr = le32_to_cpu(*ih->wptr_cpu);

-   if (REG_GET_FIELD(wptr, IH_RB_WPTR, RB_OVERFLOW)) {
-   wptr = REG_SET_FIELD(wptr, IH_RB_WPTR, RB_OVERFLOW, 0);
-
-   /* When a ring buffer overflow happen start parsing interrupt
-* from the last not overwritten vector (wptr + 32). Hopefully
-* this should allow us to catchup.
-*/
-   tmp = (wptr + 32) & ih->ptr_mask;
-   dev_warn(adev->dev, "IH ring buffer overflow (0x%08X, 0x%08X, 
0x%08X)\n",
-wptr, ih->rptr, tmp);
-   ih->rptr = tmp;
-
-   if (ih == &adev->irq.ih)
-   reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL);
-   else if (ih == &adev->irq.ih1)
-   reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL_RING1);
-   else if (ih == &adev->irq.ih2)
-   reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL_RING2);
-   else
-   BUG();
-
-   tmp = RREG32_NO_KIQ(reg);
-   tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
-   WREG32_NO_KIQ(reg, tmp);
-   }
+   if (!REG_GET_FIELD(wptr, IH_RB_WPTR, RB_OVERFLOW))
+   goto out;
+
+   /* Double check that the overflow wasn't already cleared. */
+   if (ih == &adev->irq.ih)
+   reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_WPTR);
+   else if (ih == &adev->irq.ih1)
+   reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_WPTR_RING1);
+   else if (ih == &adev->irq.ih2)
+   reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_WPTR_RING2);
+   else
+   BUG();
+
+   wptr = RREG32_NO_KIQ(reg);
+   if (!REG_GET_FIELD(wptr, IH_RB_WPTR, RB_OVERFLOW))
+   goto out;
+
+   wptr = REG_SET_FIELD(wptr, IH_RB_WPTR, RB_OVERFLOW, 0);
+
+   /* When a ring buffer overflow happen start parsing interrupt
+* from the last not overwritten vector (wptr + 32). Hopefully
+* this should allow us to catchup.
+*/
+   tmp = (wptr + 32) & ih->ptr_mask;
+   dev_warn(adev->dev, "IH ring buffer overflow "
+"(0x%08X, 0x%08X, 0x%08X)\n",
+wptr, ih->rptr, tmp);
+   ih->rptr = tmp;
+
+   if (ih == &adev->irq.ih)
+   reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL);
+   else if (ih == &adev->irq.ih1)
+   reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL_RING1);
+   else if (ih == &adev->irq.ih2)
+   reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL_RING2);
+   else
+   BUG();
+
+   tmp = RREG32_NO_KIQ(reg);
+   tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
+   WREG32_NO_KIQ(reg, tmp);
+
+out:
 return (wptr & ih->ptr_mask);
 }

--
2.17.1



[PATCH 2/3] drm/amdgpu: Simplify kgd2kfd interface

2018-12-14 Thread Lin, Amber
After amdkfd is merged into amdgpu module, amdgpu can call amdkfd
functions directly.
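For illustration, a tiny sketch (hypothetical names, not the real kgd2kfd
interface) of what the simplification amounts to: replacing an indirection
through a function-pointer table with ordinary direct calls once both sides
live in the same module:

#include <stdio.h>

/* Before: the caller only had an ops table handed over at init time. */
struct kfd_ops { void (*suspend)(void); };
static void kfd_suspend_impl(void) { puts("kfd suspend"); }
static const struct kfd_ops ops = { .suspend = kfd_suspend_impl };

/* After: the function is declared in a shared header and called directly. */
static void kfd_suspend(void) { puts("kfd suspend"); }

int main(void)
{
	if (ops.suspend)	/* indirect call, needs a NULL-able table */
		ops.suspend();
	kfd_suspend();		/* direct call, no table or NULL check needed */
	return 0;
}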

Signed-off-by: Amber Lin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c   | 26 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c |  3 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  6 ++
 3 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 358f690..fc926e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -61,17 +61,13 @@ int amdgpu_amdkfd_init(void)
 
 void amdgpu_amdkfd_fini(void)
 {
-   if (kgd2kfd)
-   kgd2kfd->exit();
+   kgd2kfd_exit();
 }
 
 void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev)
 {
const struct kfd2kgd_calls *kfd2kgd;
 
-   if (!kgd2kfd)
-   return;
-
switch (adev->asic_type) {
 #ifdef CONFIG_DRM_AMDGPU_CIK
case CHIP_KAVERI:
@@ -98,8 +94,8 @@ void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev)
return;
}
 
-   adev->kfd.dev = kgd2kfd->probe((struct kgd_dev *)adev,
-  adev->pdev, kfd2kgd);
+   adev->kfd.dev = kgd2kfd_probe((struct kgd_dev *)adev,
+ adev->pdev, kfd2kgd);
 
if (adev->kfd.dev)
amdgpu_amdkfd_total_mem_size += adev->gmc.real_vram_size;
@@ -182,7 +178,7 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
	&gpu_resources.doorbell_start_offset);
 
if (adev->asic_type < CHIP_VEGA10) {
-   kgd2kfd->device_init(adev->kfd.dev, &gpu_resources);
+   kgd2kfd_device_init(adev->kfd.dev, &gpu_resources);
return;
}
 
@@ -211,14 +207,14 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
gpu_resources.reserved_doorbell_mask = 0x1e0;
gpu_resources.reserved_doorbell_val  = 0x0e0;
 
-   kgd2kfd->device_init(adev->kfd.dev, &gpu_resources);
+   kgd2kfd_device_init(adev->kfd.dev, &gpu_resources);
}
 }
 
 void amdgpu_amdkfd_device_fini(struct amdgpu_device *adev)
 {
if (adev->kfd.dev) {
-   kgd2kfd->device_exit(adev->kfd.dev);
+   kgd2kfd_device_exit(adev->kfd.dev);
adev->kfd.dev = NULL;
}
 }
@@ -227,13 +223,13 @@ void amdgpu_amdkfd_interrupt(struct amdgpu_device *adev,
const void *ih_ring_entry)
 {
if (adev->kfd.dev)
-   kgd2kfd->interrupt(adev->kfd.dev, ih_ring_entry);
+   kgd2kfd_interrupt(adev->kfd.dev, ih_ring_entry);
 }
 
 void amdgpu_amdkfd_suspend(struct amdgpu_device *adev)
 {
if (adev->kfd.dev)
-   kgd2kfd->suspend(adev->kfd.dev);
+   kgd2kfd_suspend(adev->kfd.dev);
 }
 
 int amdgpu_amdkfd_resume(struct amdgpu_device *adev)
@@ -241,7 +237,7 @@ int amdgpu_amdkfd_resume(struct amdgpu_device *adev)
int r = 0;
 
if (adev->kfd.dev)
-   r = kgd2kfd->resume(adev->kfd.dev);
+   r = kgd2kfd_resume(adev->kfd.dev);
 
return r;
 }
@@ -251,7 +247,7 @@ int amdgpu_amdkfd_pre_reset(struct amdgpu_device *adev)
int r = 0;
 
if (adev->kfd.dev)
-   r = kgd2kfd->pre_reset(adev->kfd.dev);
+   r = kgd2kfd_pre_reset(adev->kfd.dev);
 
return r;
 }
@@ -261,7 +257,7 @@ int amdgpu_amdkfd_post_reset(struct amdgpu_device *adev)
int r = 0;
 
if (adev->kfd.dev)
-   r = kgd2kfd->post_reset(adev->kfd.dev);
+   r = kgd2kfd_post_reset(adev->kfd.dev);
 
return r;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
index 3c7055e..3107b95 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
@@ -31,7 +31,6 @@
 
 static const struct dma_fence_ops amdkfd_fence_ops;
 static atomic_t fence_seq = ATOMIC_INIT(0);
-extern const struct kgd2kfd_calls *kgd2kfd;
 
 /* Eviction Fence
  * Fence helper functions to deal with KFD memory eviction.
@@ -123,7 +122,7 @@ static bool amdkfd_fence_enable_signaling(struct dma_fence 
*f)
if (dma_fence_is_signaled(f))
return true;
 
-   if (!kgd2kfd->schedule_evict_and_restore_process(fence->mm, f))
+   if (!kgd2kfd_schedule_evict_and_restore_process(fence->mm, f))
return true;
 
return false;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 3fc2618..d7b10d7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -44,8 +44,6 @@
  */
 #define AMDGPU_USERPTR_RESTORE_DELAY_MS 1
 
-extern const struct kgd2kfd_calls *kgd2kfd;
-
 /* Impose 

[PATCH 3/3] drm/amdgpu: Remove kgd2kfd function pointers

2018-12-14 Thread Lin, Amber
kgd2kfd function pointers and global kgd2kfd pointer are no longer in use.

Signed-off-by: Amber Lin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c  |  7 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h  |  3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_module.c | 29 +-
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 50 -
 4 files changed, 4 insertions(+), 85 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index fc926e2..612887c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -28,8 +28,6 @@
 #include 
 #include 
 
-const struct kgd2kfd_calls *kgd2kfd;
-
 static const unsigned int compute_vmid_bitmap = 0xFF00;
 
 /* Total memory size in system memory and all GPU VRAM. Used to
@@ -47,12 +45,9 @@ int amdgpu_amdkfd_init(void)
amdgpu_amdkfd_total_mem_size *= si.mem_unit;
 
 #ifdef CONFIG_HSA_AMD
-   ret = kgd2kfd_init(KFD_INTERFACE_VERSION, &kgd2kfd);
-   if (ret)
-   kgd2kfd = NULL;
+   ret = kgd2kfd_init();
amdgpu_amdkfd_gpuvm_init_mem_limits();
 #else
-   kgd2kfd = NULL;
ret = -ENOENT;
 #endif
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 3214d31..0b31a18 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -214,8 +214,7 @@ void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
 void amdgpu_amdkfd_unreserve_memory_limit(struct amdgpu_bo *bo);
 
 /* KGD2KFD callbacks */
-int kgd2kfd_init(unsigned interface_version,
-const struct kgd2kfd_calls **g2f);
+int kgd2kfd_init(void);
 void kgd2kfd_exit(void);
 struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct pci_dev *pdev,
  const struct kfd2kgd_calls *f2g);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
index 030b39d..932007e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
@@ -25,22 +25,6 @@
 #include "kfd_priv.h"
 #include "amdgpu_amdkfd.h"
 
-static const struct kgd2kfd_calls kgd2kfd = {
-   .exit   = kgd2kfd_exit,
-   .probe  = kgd2kfd_probe,
-   .device_init= kgd2kfd_device_init,
-   .device_exit= kgd2kfd_device_exit,
-   .interrupt  = kgd2kfd_interrupt,
-   .suspend= kgd2kfd_suspend,
-   .resume = kgd2kfd_resume,
-   .quiesce_mm = kgd2kfd_quiesce_mm,
-   .resume_mm  = kgd2kfd_resume_mm,
-   .schedule_evict_and_restore_process =
- kgd2kfd_schedule_evict_and_restore_process,
-   .pre_reset  = kgd2kfd_pre_reset,
-   .post_reset = kgd2kfd_post_reset,
-};
-
 static int kfd_init(void)
 {
int err;
@@ -92,18 +76,9 @@ static void kfd_exit(void)
kfd_chardev_exit();
 }
 
-int kgd2kfd_init(unsigned int interface_version,
-   const struct kgd2kfd_calls **g2f)
+int kgd2kfd_init()
 {
-   int err;
-
-   err = kfd_init();
-   if (err)
-   return err;
-
-   *g2f = &kgd2kfd;
-
-   return 0;
+   return kfd_init();
 }
 
 void kgd2kfd_exit(void)
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h 
b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 3d5c3b0..83d9601 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -34,7 +34,6 @@
 
 struct pci_dev;
 
-#define KFD_INTERFACE_VERSION 2
 #define KGD_MAX_QUEUES 128
 
 struct kfd_dev;
@@ -330,53 +329,4 @@ struct kfd2kgd_calls {
 
 };
 
-/**
- * struct kgd2kfd_calls
- *
- * @exit: Notifies amdkfd that kgd module is unloaded
- *
- * @probe: Notifies amdkfd about a probe done on a device in the kgd driver.
- *
- * @device_init: Initialize the newly probed device (if it is a device that
- * amdkfd supports)
- *
- * @device_exit: Notifies amdkfd about a removal of a kgd device
- *
- * @suspend: Notifies amdkfd about a suspend action done to a kgd device
- *
- * @resume: Notifies amdkfd about a resume action done to a kgd device
- *
- * @quiesce_mm: Quiesce all user queue access to specified MM address space
- *
- * @resume_mm: Resume user queue access to specified MM address space
- *
- * @schedule_evict_and_restore_process: Schedules work queue that will prepare
- * for safe eviction of KFD BOs that belong to the specified process.
- *
- * @pre_reset: Notifies amdkfd that amdgpu about to reset the gpu
- *
- * @post_reset: Notify amdkfd that amgpu successfully reseted the gpu
- *
- * This structure contains function callback pointers so the kgd driver
- * will notify to the amdkfd about certain status changes.
- *
- */
-struct kgd2kfd_calls {
-   void (*exit)(void);
-   struct kfd_dev* (*probe)(struct kgd_dev *kgd, struct pci_dev *pdev,
-   const struct kfd2kgd_calls *f2g);
-   bool 

[PATCH 1/3] drm/amdgpu: Relocate kgd2kfd function declaration

2018-12-14 Thread Lin, Amber
Since amdkfd is merged into amdgpu module and amdgpu can access amdkfd
directly, move declaration of kgd2kfd functions from kfd_priv.h to
amdgpu_amdkfd.h

Signed-off-by: Amber Lin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c   | 43 
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h   | 20 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_module.c  |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 22 
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h  |  3 --
 7 files changed, 66 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 2dfaf15..358f690 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -619,4 +619,47 @@ struct kfd2kgd_calls 
*amdgpu_amdkfd_gfx_9_0_get_functions(void)
 {
return NULL;
 }
+
+struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct pci_dev *pdev,
+ const struct kfd2kgd_calls *f2g)
+{
+   return NULL;
+}
+
+bool kgd2kfd_device_init(struct kfd_dev *kfd,
+const struct kgd2kfd_shared_resources *gpu_resources)
+{
+   return false;
+}
+
+void kgd2kfd_device_exit(struct kfd_dev *kfd)
+{
+}
+
+void kgd2kfd_exit(void)
+{
+}
+
+void kgd2kfd_suspend(struct kfd_dev *kfd)
+{
+}
+
+int kgd2kfd_resume(struct kfd_dev *kfd)
+{
+   return 0;
+}
+
+int kgd2kfd_pre_reset(struct kfd_dev *kfd)
+{
+   return 0;
+}
+
+int kgd2kfd_post_reset(struct kfd_dev *kfd)
+{
+   return 0;
+}
+
+void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
+{
+}
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 70429f7..3214d31 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -33,7 +33,6 @@
 #include "amdgpu_sync.h"
 #include "amdgpu_vm.h"
 
-extern const struct kgd2kfd_calls *kgd2kfd;
 extern uint64_t amdgpu_amdkfd_total_mem_size;
 
 struct amdgpu_device;
@@ -214,4 +213,23 @@ int amdgpu_amdkfd_gpuvm_import_dmabuf(struct kgd_dev *kgd,
 void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
 void amdgpu_amdkfd_unreserve_memory_limit(struct amdgpu_bo *bo);
 
+/* KGD2KFD callbacks */
+int kgd2kfd_init(unsigned interface_version,
+const struct kgd2kfd_calls **g2f);
+void kgd2kfd_exit(void);
+struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct pci_dev *pdev,
+ const struct kfd2kgd_calls *f2g);
+bool kgd2kfd_device_init(struct kfd_dev *kfd,
+const struct kgd2kfd_shared_resources *gpu_resources);
+void kgd2kfd_device_exit(struct kfd_dev *kfd);
+void kgd2kfd_suspend(struct kfd_dev *kfd);
+int kgd2kfd_resume(struct kfd_dev *kfd);
+int kgd2kfd_pre_reset(struct kfd_dev *kfd);
+int kgd2kfd_post_reset(struct kfd_dev *kfd);
+void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry);
+int kgd2kfd_quiesce_mm(struct mm_struct *mm);
+int kgd2kfd_resume_mm(struct mm_struct *mm);
+int kgd2kfd_schedule_evict_and_restore_process(struct mm_struct *mm,
+  struct dma_fence *fence);
+
 #endif /* AMDGPU_AMDKFD_H_INCLUDED */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
index 574c118..3c7055e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
@@ -31,6 +31,7 @@
 
 static const struct dma_fence_ops amdkfd_fence_ops;
 static atomic_t fence_seq = ATOMIC_INIT(0);
+extern const struct kgd2kfd_calls *kgd2kfd;
 
 /* Eviction Fence
  * Fence helper functions to deal with KFD memory eviction.
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index be1ab43..3fc2618 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -44,6 +44,8 @@
  */
 #define AMDGPU_USERPTR_RESTORE_DELAY_MS 1
 
+extern const struct kgd2kfd_calls *kgd2kfd;
+
 /* Impose limit on how much memory KFD can use */
 static struct {
uint64_t max_system_mem_limit;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
index 8018163..030b39d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include "kfd_priv.h"
+#include "amdgpu_amdkfd.h"
 
 static const struct kgd2kfd_calls kgd2kfd = {
.exit   = kgd2kfd_exit,
@@ -104,7 +105,6 @@ int kgd2kfd_init(unsigned int interface_version,
 
return 0;
 }
-EXPORT_SYMBOL(kgd2kfd_init);
 
 void kgd2kfd_exit(void)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 

Re: [PATCH] drm/amd/display: Don't cleanup new cursor fb in fast cursor update

2018-12-14 Thread Kazlauskas, Nicholas
On 12/14/18 10:01 AM, Michel Dänzer wrote:
> On 2018-12-14 3:48 p.m., Nicholas Kazlauskas wrote:
>> [Why]
>> The behavior of dm_plane_helper_cleanup_fb differs depending on
>> whether the commit was asynchronous or not. When it's called from
>> amdgpu_dm_atomic_commit_tail during a typical atomic commit the
>> plane state has been swapped so it calls cleanup_fb on the old plane
>> state.
>>
>> However, in the asynchronous commit codepath the call to
>> drm_atomic_helper_commit also calls dm_plane_helper_cleanup_fb after
>> atomic_async_update has been called. Since the plane state has not
>> been swapped, the cleanup_fb call affects the new plane state with the
>> active fb.
>>
>> The slow, full path is only ever used for the cursor update when the
>> cursor CRTC changes. If there was previously a fb in the state before
>> that had gone through the fast asynchronous path then cleanup_fb will
>> be called a second time on an fb that was previously unpinned/unref'd.
>>
>> This results in a use after free on the next cursor update.
>>
>> [How]
>> Regardless of whether this was intentional behavior in DRM or not,
>> the fix is to manually call cleanup_fb on the old plane state's fb.
>>
>> Since the pointer to the old_plane_state is actually the current
>> plane->state in this function, a backup is needed. This has a minor
>> change to behavior for handle_cursor_update since it now correctly uses
>> the old state.
>>
>> The next step is to prevent the cleanup_fb call from DRM from unpinning
>> the active new_state->fb. Setting new_state->fb to NULL would be
>> enough to handle this, but DRM has a warning in the
>> drm_atomic_helper_async_commit helper if
>> plane->state->fb != new_state->fb.
>>
>> A new field was added (cursor_update) to dm_plane_state that's only
>> ever set to true during the fast path. If it's true, then cleanup_fb
>> doesn't unpin and unref the fb.
>>
>> Cc: Leo Li 
>> Cc: Harry Wentland 
>> Cc: Michel Dänzer 
>> Signed-off-by: Nicholas Kazlauskas 
>>
>> [...]
>>   
>> @@ -3653,7 +3654,15 @@ static void dm_plane_atomic_async_update(struct 
>> drm_plane *plane,
>>  plane->state->crtc_w = new_state->crtc_w;
>>  plane->state->crtc_h = new_state->crtc_h;
>>   
>> -handle_cursor_update(plane, old_state);
>> +handle_cursor_update(plane, _state);
>> +
>> +/*
>> + * While DRM already takes care of drm_atomic_helper_cleanup_planes,
>> + * it does it on the wrong state (new_state instead of old_state).
>> + * This is a workaround for that behavior.
>> + */
>> +dm_plane_helper_cleanup_fb(plane, _state);
>> +dm_plane_state->cursor_update = true;
> 
> We shouldn't work around DRM core code in driver code. Please work on a
> solution with the core code developers on the dri-devel list, and revert
> the cursor fast path for now.
> 
> 

I'd rather not revert the patch given how much it improves the user 
experience. Some compositors (and wine games) are unusable or unplayable 
without it.

In general I agree with not doing core workarounds, but I'm not even 
sure if this is a bug in core DRM to begin with. Since the plane state 
modification is in place, if you free the old_state then you're freeing 
what's currently active on plane->state. This works for us since we only 
work on the ->fb during prepare/cleanup, but this isn't true in general.

The only other way to do this is not using 
async_commit_check/async_commit_update and reimplementing atomic commit 
behavior for the fast path. Intel does this with their legacy cursor 
update fast path - it's a lot more code and it involves using 
"deprecated" elements of the atomic API.

Nicholas Kazlauskas


Re: [PATCH v3 2/2] drm/sched: Rework HW fence processing.

2018-12-14 Thread Grodzovsky, Andrey
Just a reminder. Any new comments in light of all the discussion ?

Andrey


On 12/12/2018 08:08 AM, Grodzovsky, Andrey wrote:
> BTW, the problem I pointed out with drm_sched_entity_kill_jobs_cb is not
> an issue with this patch set since it removes the cb from
> s_fence->finished in general so we only free the job once - directly
> from drm_sched_entity_kill_jobs_cb.
>
> Andrey
>
>
> On 12/11/2018 11:20 AM, Christian König wrote:
>> Yeah, completely correct explained.
>>
>> I was unfortunately really busy today, but going to give that a look
>> as soon as I have time.
>>
>> Christian.
>>
>> Am 11.12.18 um 17:01 schrieb Grodzovsky, Andrey:
>>> As I understand it, you say that by the time the fence callback runs the job
>>> might have already been released,
>>>
>>> but how so if the job gets released from drm_sched_job_finish work
>>> handler in the normal flow - so, after the HW
>>>
>>> fence (s_fence->parent) cb is executed. Other 2 flows are error use
>>> cases where amdgpu_job_free is called directly in which
>>>
>>> cases I assume the job wasn't submitted to HW. Last flow I see is
>>> drm_sched_entity_kill_jobs_cb and here I actually see a problem
>>>
>>> with the code as it's today - drm_sched_fence_finished is called which
>>> will trigger s_fence->finished callback to run which today
>>>
>>> schedules drm_sched_job_finish which releases the job, but we don't even
>>> wait for that and call free_job cb directly from
>>>
>>> after that which seems wrong to me.
>>>
>>> Andrey
>>>
>>>
>>> On 12/10/2018 09:45 PM, Zhou, David(ChunMing) wrote:
 I don't think adding the cb to the sched job would work, since its
 lifetime is different from the fence's.
 Unless you make the sched job reference counted, we will get into
 trouble sooner or later.

 -David

> -Original Message-
> From: amd-gfx  On Behalf Of
> Andrey Grodzovsky
> Sent: Tuesday, December 11, 2018 5:44 AM
> To: dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org;
> ckoenig.leichtzumer...@gmail.com; e...@anholt.net;
> etna...@lists.freedesktop.org
> Cc: Zhou, David(ChunMing) ; Liu, Monk
> ; Grodzovsky, Andrey
> 
> Subject: [PATCH v3 2/2] drm/sched: Rework HW fence processing.
>
> Expedite job deletion from ring mirror list to the HW fence signal
> callback
> instead from finish_work, together with waiting for all such fences
> to signal in
> drm_sched_stop we garantee that already signaled job will not be
> processed
> twice.
> Remove the sched finish fence callback and just submit finish_work
> directly
> from the HW fence callback.
>
> v2: Fix comments.
>
> v3: Attach  hw fence cb to sched_job
>
> Suggested-by: Christian Koenig 
> Signed-off-by: Andrey Grodzovsky 
> ---
>     drivers/gpu/drm/scheduler/sched_main.c | 58
> --
> 
>     include/drm/gpu_scheduler.h    |  6 ++--
>     2 files changed, 30 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index cdf95e2..f0c1f32 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -284,8 +284,6 @@ static void drm_sched_job_finish(struct
> work_struct
> *work)
>     cancel_delayed_work_sync(>work_tdr);
>
>     spin_lock_irqsave(>job_list_lock, flags);
> -    /* remove job from ring_mirror_list */
> -    list_del_init(_job->node);
>     /* queue TDR for next job */
>     drm_sched_start_timeout(sched);
>     spin_unlock_irqrestore(>job_list_lock, flags); @@
> -293,22
> +291,11 @@ static void drm_sched_job_finish(struct work_struct *work)
>     sched->ops->free_job(s_job);
>     }
>
> -static void drm_sched_job_finish_cb(struct dma_fence *f,
> -    struct dma_fence_cb *cb)
> -{
> -    struct drm_sched_job *job = container_of(cb, struct
> drm_sched_job,
> - finish_cb);
> -    schedule_work(>finish_work);
> -}
> -
>     static void drm_sched_job_begin(struct drm_sched_job *s_job)  {
>     struct drm_gpu_scheduler *sched = s_job->sched;
>     unsigned long flags;
>
> - dma_fence_add_callback(_job->s_fence->finished, _job-
>> finish_cb,
> -   drm_sched_job_finish_cb);
> -
>     spin_lock_irqsave(>job_list_lock, flags);
>     list_add_tail(_job->node, >ring_mirror_list);
>     drm_sched_start_timeout(sched);
> @@ -359,12 +346,11 @@ void drm_sched_stop(struct drm_gpu_scheduler
> *sched, struct drm_sched_job *bad,
>     list_for_each_entry_reverse(s_job, &sched->ring_mirror_list,
> node)
> {
>     if (s_job->s_fence->parent &&
> dma_fence_remove_callback(s_job->s_fence->parent,

Re: [PATCH] drm/amd/display: Don't cleanup new cursor fb in fast cursor update

2018-12-14 Thread Michel Dänzer
On 2018-12-14 3:48 p.m., Nicholas Kazlauskas wrote:
> [Why]
> The behavior of dm_plane_helper_cleanup_fb differs depending on
> whether the commit was asynchronous or not. When it's called from
> amdgpu_dm_atomic_commit_tail during a typical atomic commit the
> plane state has been swapped so it calls cleanup_fb on the old plane
> state.
> 
> However, in the asynchronous commit codepath the call to
> drm_atomic_helper_commit also calls dm_plane_helper_cleanup_fb after
> atomic_async_update has been called. Since the plane state has not
> been swapped, the cleanup_fb call affects the new plane state with the
> active fb.
> 
> The slow, full path is only ever used for the cursor update when the
> cursor CRTC changes. If there was previously a fb in the state before
> that had gone through the fast asynchronous path then cleanup_fb will
> be called a second time on an fb that was previously unpinned/unref'd.
> 
> This results in a use after free on the next cursor update.
> 
> [How]
> Regardless of whether this was intentional behavior in DRM or not,
> the fix is to manually call cleanup_fb on the old plane state's fb.
> 
> Since the pointer to the old_plane_state is actually the current
> plane->state in this function, a backup is needed. This has a minor
> change to behavior for handle_cursor_update since it now correctly uses
> the old state.
> 
> The next step is to prevent the cleanup_fb call from DRM from unpinning
> the active new_state->fb. Setting new_state->fb to NULL would be
> enough to handle this, but DRM has a warning in the
> drm_atomic_helper_async_commit helper if
> plane->state->fb != new_state->fb.
> 
> A new field was added (cursor_update) to dm_plane_state that's only
> ever set to true during the fast path. If it's true, then cleanup_fb
> doesn't unpin and unref the fb.
> 
> Cc: Leo Li 
> Cc: Harry Wentland 
> Cc: Michel Dänzer 
> Signed-off-by: Nicholas Kazlauskas 
> 
> [...]
>  
> @@ -3653,7 +3654,15 @@ static void dm_plane_atomic_async_update(struct 
> drm_plane *plane,
>   plane->state->crtc_w = new_state->crtc_w;
>   plane->state->crtc_h = new_state->crtc_h;
>  
> - handle_cursor_update(plane, old_state);
> + handle_cursor_update(plane, &old_state);
> +
> + /*
> +  * While DRM already takes care of drm_atomic_helper_cleanup_planes,
> +  * it does it on the wrong state (new_state instead of old_state).
> +  * This is a workaround for that behavior.
> +  */
> + dm_plane_helper_cleanup_fb(plane, &old_state);
> + dm_plane_state->cursor_update = true;

We shouldn't work around DRM core code in driver code. Please work on a
solution with the core code developers on the dri-devel list, and revert
the cursor fast path for now.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


[bug report] drm/amdkfd: Add DMABuf import functionality

2018-12-14 Thread Dan Carpenter
Hello Felix Kuehling,

The patch 1dde0ea95b78: "drm/amdkfd: Add DMABuf import functionality"
from Nov 20, 2018, leads to the following static checker warning:

drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_chardev.c:1643 
kfd_ioctl_import_dmabuf()
error: 'dmabuf' dereferencing possible ERR_PTR()

drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_chardev.c
  1615  static int kfd_ioctl_import_dmabuf(struct file *filep,
  1616 struct kfd_process *p, void *data)
  1617  {
  1618  struct kfd_ioctl_import_dmabuf_args *args = data;
  1619  struct kfd_process_device *pdd;
  1620  struct dma_buf *dmabuf;
  1621  struct kfd_dev *dev;
  1622  int idr_handle;
  1623  uint64_t size;
  1624  void *mem;
  1625  int r;
  1626  
  1627  dev = kfd_device_by_id(args->gpu_id);
  1628  if (!dev)
  1629  return -EINVAL;
  1630  
  1631  dmabuf = dma_buf_get(args->dmabuf_fd);
 
returns error pointers and maybe NULL as well?

  1632  if (!dmabuf)
  1633  return -EINVAL;
  1634  
  1635  mutex_lock(&p->mutex);
  1636  
  1637  pdd = kfd_bind_process_to_device(dev, p);
  1638  if (IS_ERR(pdd)) {
  1639  r = PTR_ERR(pdd);
  1640  goto err_unlock;
  1641  }
  1642  
  1643  r = amdgpu_amdkfd_gpuvm_import_dmabuf(dev->kgd, dmabuf,
^^
Dereferenced inside function.

  1644  args->va_addr, pdd->vm,
  1645  (struct kgd_mem **)&mem, &size,
  1646  NULL);
  1647  if (r)
  1648  goto err_unlock;
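
For reference, the shape of the check the warning is asking for -
dma_buf_get() hands back an ERR_PTR() on failure rather than NULL, so the
current NULL test lets an error pointer through to be dereferenced later
(a sketch of the fix, not a committed change):

	dmabuf = dma_buf_get(args->dmabuf_fd);
	if (IS_ERR(dmabuf))
		return PTR_ERR(dmabuf);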

regards,
dan carpenter


[PATCH] drm/amd/display: Don't cleanup new cursor fb in fast cursor update

2018-12-14 Thread Nicholas Kazlauskas
[Why]
The behavior of dm_plane_helper_cleanup_fb differs depending on
whether the commit was asynchronous or not. When it's called from
amdgpu_dm_atomic_commit_tail during a typical atomic commit the
plane state has been swapped so it calls cleanup_fb on the old plane
state.

However, in the asynchronous commit codepath the call to
drm_atomic_helper_commit also calls dm_plane_helper_cleanup_fb after
atomic_async_update has been called. Since the plane state has not
been swapped, the cleanup_fb call affects the new plane state with the
active fb.

The slow, full path is only ever used for the cursor update when the
cursor CRTC changes. If there was previously a fb in the state before
that had gone through the fast asynchronous path then cleanup_fb will
be called a second time on an fb that was previously unpinned/unref'd.

This results in a use after free on the next cursor update.

[How]
Regardless of whether this was intentional behavior in DRM or not,
the fix is to manually call cleanup_fb on the old plane state's fb.

Since the pointer to the old_plane_state is actually the current
plane->state in this function, a backup is needed. This has a minor
change to behavior for handle_cursor_update since it now correctly uses
the old state.

The next step is to prevent the cleanup_fb call from DRM from unpinning
the active new_state->fb. Setting new_state->fb to NULL would be
enough to handle this, but DRM has a warning in the
drm_atomic_helper_async_commit helper if
plane->state->fb != new_state->fb.

A new field was added (cursor_update) to dm_plane_state that's only
ever set to true during the fast path. If it's true, then cleanup_fb
doesn't unpin and unref the fb.

Cc: Leo Li 
Cc: Harry Wentland 
Cc: Michel Dänzer 
Signed-off-by: Nicholas Kazlauskas 
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c   | 17 +
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h   |  2 ++
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index d01315965af0..b71c834a959d 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -3589,9 +3589,10 @@ static void dm_plane_helper_cleanup_fb(struct drm_plane 
*plane,
   struct drm_plane_state *old_state)
 {
struct amdgpu_bo *rbo;
+   struct dm_plane_state *dm_plane_state = to_dm_plane_state(old_state);
int r;
 
-   if (!old_state->fb)
+   if (!old_state->fb || dm_plane_state->cursor_update)
return;
 
rbo = gem_to_amdgpu_bo(old_state->fb->obj[0]);
@@ -3638,8 +3639,8 @@ static int dm_plane_atomic_async_check(struct drm_plane 
*plane,
 static void dm_plane_atomic_async_update(struct drm_plane *plane,
 struct drm_plane_state *new_state)
 {
-   struct drm_plane_state *old_state =
-   drm_atomic_get_old_plane_state(new_state->state, plane);
+   struct drm_plane_state old_state = *plane->state;
+   struct dm_plane_state *dm_plane_state = to_dm_plane_state(new_state);
 
if (plane->state->fb != new_state->fb)
drm_atomic_set_fb_for_plane(plane->state, new_state->fb);
@@ -3653,7 +3654,15 @@ static void dm_plane_atomic_async_update(struct 
drm_plane *plane,
plane->state->crtc_w = new_state->crtc_w;
plane->state->crtc_h = new_state->crtc_h;
 
-   handle_cursor_update(plane, old_state);
+   handle_cursor_update(plane, &old_state);
+
+   /*
+* While DRM already takes care of drm_atomic_helper_cleanup_planes,
+* it does it on the wrong state (new_state instead of old_state).
+* This is a workaround for that behavior.
+*/
+   dm_plane_helper_cleanup_fb(plane, &old_state);
+   dm_plane_state->cursor_update = true;
 }
 
 static const struct drm_plane_helper_funcs dm_plane_helper_funcs = {
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
index 25bb91ee80ba..ecac91036864 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
@@ -254,6 +254,8 @@ struct dc_plane_state;
 struct dm_plane_state {
struct drm_plane_state base;
struct dc_plane_state *dc_state;
+
+   bool cursor_update;
 };
 
 struct dm_crtc_state {
-- 
2.17.1



[PATCH] drm/amdgpu: fix IH overflow on Vega10

2018-12-14 Thread Christian König
When a ring buffer overflow happens, the appropriate bit is set in the WPTR
register, which is also written back to memory. But clearing the bit in the
WPTR doesn't trigger another memory writeback.

So what can happen is that we end up processing the buffer overflow over and
over again because the bit is never cleared, resulting in a random system
lockup caused by an infinite loop in an interrupt handler.

This is 100% reproducible on Vega10, but it's most likely an issue we have
in the driver over all generations all the way back to radeon.
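
Pseudocode of the failure mode (illustration only - the helper names are
made up, this is not the driver's actual IRQ loop):

	for (;;) {
		wptr = read_wptr_writeback();		/* RB_OVERFLOW bit still set */
		if (wptr & RB_OVERFLOW) {
			rptr = (wptr + 32) & ptr_mask;	/* skip "overwritten" entries */
			clear_overflow_in_rb_cntl();	/* clears the register, but no new
							 * writeback is triggered, so the
							 * copy in memory stays stale */
		}
		process_ring(rptr, wptr);		/* same "overflow" handled again
							 * on the next pass */
	}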

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 68 --
 1 file changed, 43 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c 
b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
index 992c8a8b8f77..0ab7785079c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
@@ -276,31 +276,49 @@ static u32 vega10_ih_get_wptr(struct amdgpu_device *adev,
 
wptr = le32_to_cpu(*ih->wptr_cpu);
 
-   if (REG_GET_FIELD(wptr, IH_RB_WPTR, RB_OVERFLOW)) {
-   wptr = REG_SET_FIELD(wptr, IH_RB_WPTR, RB_OVERFLOW, 0);
-
-   /* When a ring buffer overflow happen start parsing interrupt
-* from the last not overwritten vector (wptr + 32). Hopefully
-* this should allow us to catchup.
-*/
-   tmp = (wptr + 32) & ih->ptr_mask;
-   dev_warn(adev->dev, "IH ring buffer overflow (0x%08X, 0x%08X, 
0x%08X)\n",
-wptr, ih->rptr, tmp);
-   ih->rptr = tmp;
-
-   if (ih == &adev->irq.ih)
-   reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL);
-   else if (ih == &adev->irq.ih1)
-   reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL_RING1);
-   else if (ih == &adev->irq.ih2)
-   reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL_RING2);
-   else
-   BUG();
-
-   tmp = RREG32_NO_KIQ(reg);
-   tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
-   WREG32_NO_KIQ(reg, tmp);
-   }
+   if (!REG_GET_FIELD(wptr, IH_RB_WPTR, RB_OVERFLOW))
+   goto out;
+
+   /* Double check that the overflow wasn't already cleared. */
+   if (ih == &adev->irq.ih)
+   reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_WPTR);
+   else if (ih == &adev->irq.ih1)
+   reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_WPTR_RING1);
+   else if (ih == &adev->irq.ih2)
+   reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_WPTR_RING2);
+   else
+   BUG();
+
+   wptr = RREG32_NO_KIQ(reg);
+   if (!REG_GET_FIELD(wptr, IH_RB_WPTR, RB_OVERFLOW))
+   goto out;
+
+   wptr = REG_SET_FIELD(wptr, IH_RB_WPTR, RB_OVERFLOW, 0);
+
+   /* When a ring buffer overflow happen start parsing interrupt
+* from the last not overwritten vector (wptr + 32). Hopefully
+* this should allow us to catchup.
+*/
+   tmp = (wptr + 32) & ih->ptr_mask;
+   dev_warn(adev->dev, "IH ring buffer overflow "
+"(0x%08X, 0x%08X, 0x%08X)\n",
+wptr, ih->rptr, tmp);
+   ih->rptr = tmp;
+
+   if (ih == &adev->irq.ih)
+   reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL);
+   else if (ih == &adev->irq.ih1)
+   reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL_RING1);
+   else if (ih == &adev->irq.ih2)
+   reg = SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL_RING2);
+   else
+   BUG();
+
+   tmp = RREG32_NO_KIQ(reg);
+   tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
+   WREG32_NO_KIQ(reg, tmp);
+
+out:
return (wptr & ih->ptr_mask);
 }
 
-- 
2.17.1



Re: [PATCH] drm/amd/display: Add fast path for cursor plane updates

2018-12-14 Thread Kazlauskas, Nicholas
On 12/14/18 3:46 AM, Michel Dänzer wrote:
> On 2018-12-13 9:59 p.m., Kazlauskas, Nicholas wrote:
>> On 12/13/18 2:21 PM, Kazlauskas, Nicholas wrote:
>>> On 12/13/18 11:01 AM, Kazlauskas, Nicholas wrote:
 On 12/13/18 10:48 AM, Michel Dänzer wrote:
> On 2018-12-05 8:59 p.m., Nicholas Kazlauskas wrote:
>> [Why]
>> Legacy cursor plane updates from drm helpers go through the full
>> atomic codepath. A high volume of cursor updates through this slow
>> code path can cause subsequent page-flips to skip vblank intervals
>> since each individual update is slow.
>>
>> This problem is particularly noticeable for the compton compositor.
>>
>> [How]
>> A fast path for cursor plane updates is added by using DRM asynchronous
>> commit support provided by async_check and async_update. These don't do
>> a full state/flip_done dependency stall and they don't block other
>> commit work.
>>
>> However, DC still expects itself to be single-threaded for anything
>> that can issue register writes. Screen corruption or hangs can occur
>> if write sequences overlap. Every call that potentially performs
>> register writes needs to be guarded for asynchronous updates to work.
>> The dc_lock mutex was added for this.
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106175
>>
>> Cc: Leo Li 
>> Cc: Harry Wentland 
>> Signed-off-by: Nicholas Kazlauskas 
>
> Looks like this change introduced (or at least exposed) a reference
> counting bug resulting in use-after-free when Xorg shuts down[0]. See
> the attached dmesg excerpt (note that I wrapped the !bo->pin_count check
> in amdgpu_bo_unpin in WARN_ON_ONCE).
>
>
> [0] Only with
> https://gitlab.freedesktop.org/xorg/driver/xf86-video-amdgpu/commit/0d60233d26ec70d4e1faa343b438e33829c6d5e4
> , i.e. alternating between two BOs for the HW cursor, instead of always
> using the same one.
>
>

 Thanks for the heads up, I don't think I had that patch in my
 xf86-video-amdgpu when testing the desktop stack.

 The async atomic helper does the:

 drm_atomic_helper_prepare_planes
 drm_atomic_helper_async_commit
 drm_atomic_helper_cleanup_planes

 ...sequence correctly from what I can tell, so maybe it's something with
 dm_plane_helper_prepare_fb or dm_plane_helper_cleanup_fb itself.

 One case where unref could be called (not following a ref) is during
 drm_atomic_helper_prepare_planes - if prepare_fb fails then cleanup_fb
 gets called regardless, and we only ref the fb if prepare_fb is in the
 success path.
>>>
>>> The prepare_fb/cleanup_fb calls are actually fine since cleanup_fb only
>>> gets called on planes that had prepare_fb succeed in all cases as far as
>>> I can tell.
>>>
>>> I think the bug here might be forgetting to set the plane->state to the
>>> new_state. The cleanup fb callback decides whether to call it on the old
>>> plane state or new plane state depending on if the commit was aborted or
>>> not. I think every fast plane update might be treated as aborted in this
>>> case.
>>
>> This is a bug with DRM, actually.
>>
>> Typically for a regular atomic commit the prepare_fb callback is called
>> for the new_plane_state and cleanup_fb is called for the old_plane_state
>> at the end of commit tail.
>>
>> However, for asynchronous commits this isn't the same - prepare_fb is
>> called for new_plane_state and cleanup_fb is then immediately called
>> after also for the new_plane_state.
>>
>> Looking at your stack trace I can see that this is exactly what causes
>> the use after free,
>>
>> The CRTC has changed so it's stuck in the slow path (commit_tail is in
>> the trace). However, the plane->state->fb has already been unpinned and
>> unref. But the plane->state->fb is *not* NULL from the previous fast
>> update, so when it gets to cleanup planes it tries to free the
>> old_plane_state it unpins and unrefs the bo a second time.
>>
>> Then a new fast cursor update comes along (and the fb hasn't changed) so
>> it tries to prepare_fb on the same freed bo.
> 
> Do you have an idea for a fix? If not, I'm afraid we need to revert this
> change again for now, as the consequences can be severe (in one case,
> ext4 code started complaining, I couldn't reboot cleanly and had to fsck
> afterwards).
> 
> 

Yeah, I have a workaround that I will post for this.

I'm not sure if this is intended behavior or not for DRM - it doesn't 
feel correct to me, but I can understand why it'd make sense in some 
cases. It really depends on what the cleanup_fb is intending to do. It 
certainly doesn't work with the pin/unpin model though.
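
Spelled out as a rough timeline (pseudocode, assuming the BO starts with a
single reference taken by the fast path; the call names only stand in for
the prepare_fb/cleanup_fb helpers):

	/* fast cursor update */
	prepare_fb(plane, new_state);	/* pin + ref fb1 */
	cleanup_fb(plane, new_state);	/* unpin + unref fb1 - wrong state, fb1 gone */

	/* later: CRTC changes, cursor takes the slow commit_tail path */
	prepare_fb(plane, new_state);	/* pin + ref fb2 */
	/* ... commit_tail swaps the plane states ... */
	cleanup_fb(plane, old_state);	/* old_state->fb is still fb1, already
					 * unpinned/unref'd -> use after free */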

Nicholas Kazlauskas


Re: [WIP PATCH 04/15] drm/dp_mst: Stop releasing VCPI when removing ports from topology

2018-12-14 Thread Daniel Vetter
On Thu, Dec 13, 2018 at 08:25:33PM -0500, Lyude Paul wrote:
> This has never actually worked, and isn't needed anyway: the driver's
> always going to try to deallocate VCPI when it tears down the display
> that the VCPI belongs to.
> 
> Signed-off-by: Lyude Paul 

Reviewed-by: Daniel Vetter 

> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 8 
>  1 file changed, 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c 
> b/drivers/gpu/drm/drm_dp_mst_topology.c
> index c196fb580beb..ae9d019af9f2 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -1084,8 +1084,6 @@ static void drm_dp_destroy_port(struct kref *kref)
>   struct drm_dp_mst_topology_mgr *mgr = port->mgr;
>  
>   if (!port->input) {
> - port->vcpi.num_slots = 0;
> -
>   kfree(port->cached_edid);
>  
>   /*
> @@ -3381,12 +3379,6 @@ static void drm_dp_destroy_connector_work(struct 
> work_struct *work)
>   drm_dp_port_teardown_pdt(port, port->pdt);
>   port->pdt = DP_PEER_DEVICE_NONE;
>  
> - if (!port->input && port->vcpi.vcpi > 0) {
> - drm_dp_mst_reset_vcpi_slots(mgr, port);
> - drm_dp_update_payload_part1(mgr);
> - drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
> - }
> -
>   drm_dp_mst_put_port_malloc(port);
>   send_hotplug = true;
>   }
> -- 
> 2.19.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [WIP PATCH 05/15] drm/dp_mst: Fix payload deallocation on hotplugs using malloc refs

2018-12-14 Thread Daniel Vetter
On Thu, Dec 13, 2018 at 08:25:34PM -0500, Lyude Paul wrote:
> Up until now, freeing payloads on remote MST hubs that just had ports
> removed has almost never worked because we've been relying on port
> validation in order to stop us from accessing ports that have already
> been freed from memory, but ports which need their payloads released due
> to being removed will never be a valid part of the topology after
> they've been removed.
> 
> Since we've introduced malloc refs, we can replace all of the validation
> logic in payload helpers which are used for deallocation with some
> well-placed malloc krefs. This ensures that regardless of whether or not
> the ports are still valid and in the topology, any port which has an
> allocated payload will remain allocated in memory until its payloads
> have been removed - finally allowing us to actually release said
> payloads correctly.
> 
> Signed-off-by: Lyude Paul 

I think with this we can also remove the int return value (that everyone
ignored except for some debug output) from drm_dp_update_payload_part1/2.
Follow-up cleanup patch ofc.
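
The follow-up being suggested would just be a signature change, roughly
(a sketch of the cleanup, not a committed patch):

void drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr);
void drm_dp_update_payload_part2(struct drm_dp_mst_topology_mgr *mgr);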

This looks good.

Reviewed-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 54 +++
>  1 file changed, 30 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c 
> b/drivers/gpu/drm/drm_dp_mst_topology.c
> index ae9d019af9f2..93f08bfd2ab3 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -1989,10 +1989,6 @@ static int drm_dp_payload_send_msg(struct 
> drm_dp_mst_topology_mgr *mgr,
>   u8 sinks[DRM_DP_MAX_SDP_STREAMS];
>   int i;
>  
> - port = drm_dp_mst_topology_get_port_validated(mgr, port);
> - if (!port)
> - return -EINVAL;
> -
>   port_num = port->port_num;
>   mstb = drm_dp_mst_topology_get_mstb_validated(mgr, port->parent);
>   if (!mstb) {
> @@ -2000,10 +1996,8 @@ static int drm_dp_payload_send_msg(struct 
> drm_dp_mst_topology_mgr *mgr,
>  port->parent,
>  &port_num);
>  
> - if (!mstb) {
> - drm_dp_mst_topology_put_port(port);
> + if (!mstb)
>   return -EINVAL;
> - }
>   }
>  
>   txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL);
> @@ -2032,7 +2026,6 @@ static int drm_dp_payload_send_msg(struct 
> drm_dp_mst_topology_mgr *mgr,
>   kfree(txmsg);
>  fail_put:
>   drm_dp_mst_topology_put_mstb(mstb);
> - drm_dp_mst_topology_put_port(port);
>   return ret;
>  }
>  
> @@ -2137,15 +2130,16 @@ static int drm_dp_destroy_payload_step2(struct 
> drm_dp_mst_topology_mgr *mgr,
>   */
>  int drm_dp_update_payload_part1(struct drm_dp_mst_topology_mgr *mgr)
>  {
> - int i, j;
> - int cur_slots = 1;
>   struct drm_dp_payload req_payload;
>   struct drm_dp_mst_port *port;
> + int i, j;
> + int cur_slots = 1;
>  
>   mutex_lock(&mgr->payload_lock);
>   for (i = 0; i < mgr->max_payloads; i++) {
>   struct drm_dp_vcpi *vcpi = mgr->proposed_vcpis[i];
>   struct drm_dp_payload *payload = &mgr->payloads[i];
> + bool put_port = false;
>  
>   /* solve the current payloads - compare to the hw ones
>  - update the hw view */
> @@ -2153,12 +2147,20 @@ int drm_dp_update_payload_part1(struct 
> drm_dp_mst_topology_mgr *mgr)
>   if (vcpi) {
>   port = container_of(vcpi, struct drm_dp_mst_port,
>   vcpi);
> - port = drm_dp_mst_topology_get_port_validated(mgr,
> -   port);
> - if (!port) {
> - mutex_unlock(&mgr->payload_lock);
> - return -EINVAL;
> +
> + /* Validated ports don't matter if we're releasing
> +  * VCPI
> +  */
> + if (vcpi->num_slots) {
> + port = drm_dp_mst_topology_get_port_validated(
> + mgr, port);
> + if (!port) {
> + mutex_unlock(&mgr->payload_lock);
> + return -EINVAL;
> + }
> + put_port = true;
>   }
> +
>   req_payload.num_slots = vcpi->num_slots;
>   req_payload.vcpi = vcpi->vcpi;
>   } else {
> @@ -2190,7 +2192,7 @@ int drm_dp_update_payload_part1(struct 
> drm_dp_mst_topology_mgr *mgr)
>   }
>   cur_slots += req_payload.num_slots;
>  
> - if (port)
> + if (put_port)
>   drm_dp_mst_topology_put_port(port);
>   }
>  

Re: [WIP PATCH 06/15] drm/i915: Keep malloc references to MST ports

2018-12-14 Thread Daniel Vetter
On Thu, Dec 13, 2018 at 08:25:35PM -0500, Lyude Paul wrote:
> So that the ports stay around until we've destroyed the connectors, in
> order to ensure that we don't pass an invalid pointer to any MST helpers
> once we introduce the new MST VCPI helpers.
> 
> Signed-off-by: Lyude Paul 
> ---
>  drivers/gpu/drm/i915/intel_connector.c | 4 
>  drivers/gpu/drm/i915/intel_dp_mst.c| 2 ++
>  2 files changed, 6 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_connector.c 
> b/drivers/gpu/drm/i915/intel_connector.c
> index 18e370f607bc..37d2c644f4b8 100644
> --- a/drivers/gpu/drm/i915/intel_connector.c
> +++ b/drivers/gpu/drm/i915/intel_connector.c
> @@ -95,6 +95,10 @@ void intel_connector_destroy(struct drm_connector 
> *connector)
>   intel_panel_fini(&intel_connector->panel);
>  
>   drm_connector_cleanup(connector);
> +
> + if (intel_connector->port)

We set this as the first thing in intel_dp_add_mst_connector, so this
check isn't doing anything.

> + drm_dp_mst_put_port_malloc(intel_connector->port);
> +
>   kfree(connector);
>  }
>  
> diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c 
> b/drivers/gpu/drm/i915/intel_dp_mst.c
> index f05427b74e34..4d6ced34d465 100644
> --- a/drivers/gpu/drm/i915/intel_dp_mst.c
> +++ b/drivers/gpu/drm/i915/intel_dp_mst.c
> @@ -484,6 +484,8 @@ static struct drm_connector 
> *intel_dp_add_mst_connector(struct drm_dp_mst_topolo
>   if (ret)
>   goto err;
>  
> + drm_dp_mst_get_port_malloc(port);

Needs to be moved up to where we assign intel_connector->port, or it'll
underflow on cleanup on error paths.
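
The ordering being asked for would look roughly like this (a sketch
against the quoted function; only the placement relative to the existing
error paths matters):

	intel_connector->port = port;
	drm_dp_mst_get_port_malloc(port);	/* taken as soon as the pointer is stored */

	/* ... everything that can still fail and jump to the err: label follows ... */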

> +
>   return connector;
>  
>  err:
> -- 
> 2.19.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [WIP PATCH 03/15] drm/dp_mst: Introduce new refcounting scheme for mstbs and ports

2018-12-14 Thread Daniel Vetter
On Thu, Dec 13, 2018 at 08:25:32PM -0500, Lyude Paul wrote:
> The current way of handling refcounting in the DP MST helpers is really
> confusing and probably just plain wrong because it's been hacked up many
> times over the years without anyone actually going over the code and
> seeing if things could be simplified.
> 
> To the best of my understanding, the current scheme works like this:
> drm_dp_mst_port and drm_dp_mst_branch both have a single refcount. When
> this refcount hits 0 for either of the two, they're removed from the
> topology state, but not immediately freed. Both ports and branch devices
> will reinitialize their kref once it's hit 0 before actually destroying
> themselves. The intended purpose behind this is so that we can avoid
> problems like not being able to free a remote payload that might still
> be active, due to us having removed all of the port/branch device
> structures in memory, as per:
> 
> 91a25e463130 ("drm/dp/mst: deallocate payload on port destruction")
> 
> Which may have worked, but then it caused use-after-free errors. Being
> new to MST at the time, I tried fixing it;
> 
> 263efde31f97 ("drm/dp/mst: Get validated port ref in 
> drm_dp_update_payload_part1()")
> 
> But, that was broken: both drm_dp_mst_port and drm_dp_mst_branch structs
> are validated in almost every DP MST helper function. Simply put, this
> means we go through the topology and try to see if the given
> drm_dp_mst_branch or drm_dp_mst_port is still attached to something
> before trying to use it in order to avoid dereferencing freed memory
> (something that has happened a LOT in the past with this library).
> Because of this it doesn't actually matter whether or not we keep
> the ports and branches around in memory as that's not enough, because
> any function that validates the branches and ports passed to it will
> still reject them anyway since they're no longer in the topology
> structure. So, use-after-free errors were fixed but payload deallocation
> was completely broken.
> 
> Two years later, AMD informed me about this issue and I attempted to
> come up with a temporary fix, pending a long-overdue cleanup of this
> library:
> 
> c54c7374ff44 ("drm/dp_mst: Skip validating ports during destruction, just 
> ref")
> 
> But then that introduced use-after-free errors, so I quickly reverted
> it:
> 
> 9765635b3075 ("Revert "drm/dp_mst: Skip validating ports during destruction, 
> just ref"")
> 
> And in the process, learned that there is just no simple fix for this:
> the design is just broken. Unfortunately, the usage of these helpers is
> quite broken as well. Some drivers like i915 have been smart enough to
> avoid accessing any kind of information from MST port structures, but
> others like nouveau have assumed, understandably so, that
> drm_dp_mst_port structures are normal and can just be accessed at any
> time without worrying about use-after-free errors.
> 
> After a lot of discussion, me and Daniel Vetter came up with a better
> idea to replace all of this.
> 
> To summarize, since this is documented far more in depth in the
> documentation this patch introduces, we make it so that drm_dp_mst_port
> and drm_dp_mst_branch structures have two different classes of
> refcounts: topology_kref, and malloc_kref. topology_kref corresponds to
> the lifetime of the given drm_dp_mst_port or drm_dp_mst_branch in its
> given topology. Once it hits zero, any associated connectors are removed
> and the branch or port can no longer be validated. malloc_kref
> corresponds to the lifetime of the memory allocation for the actual
> structure, and will always be non-zero so long as the topology_kref is
> non-zero. This gives us a way to allow callers to hold onto port and
> branch device structures past their topology lifetime, and dramatically
> simplifies the lifetimes of both structures. This also finally fixes the
> port deallocation problem, properly.
> 
> Additionally: since this now means that we can keep ports and branch
> devices allocated in memory for however long we need, we no longer need
> a significant amount of the port validation that we currently do.
> 
> Additionally, there is one last scenario that this fixes, which couldn't
> have been fixed properly beforehand:
> 
> - CPU1 unrefs port from topology (refcount 1->0)
> - CPU2 refs port in topology (refcount 0->1)
> 
> Since we now can guarantee memory safety for ports and branches
> as-needed, we also can make our main reference counting functions fix
> this problem by using kref_get_unless_zero() internally so that topology
> refcounts can only ever reach 0 once.
> 
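
A minimal sketch of the two-refcount scheme described above - the struct
and helper names are illustrative, not the full helpers added by the patch:

#include <linux/kref.h>
#include <linux/types.h>

struct example_mst_port {
	struct kref topology_kref;	/* lifetime of the port in the topology */
	struct kref malloc_kref;	/* lifetime of the memory allocation    */
	/* ... */
};

/* Memory reference: always safe to take while you already hold one. */
static void example_get_port_malloc(struct example_mst_port *port)
{
	kref_get(&port->malloc_kref);
}

/* Topology reference: fails once the port has left the topology, which is
 * what prevents the 1->0 followed by 0->1 race mentioned above. */
static bool example_topology_try_get_port(struct example_mst_port *port)
{
	return kref_get_unless_zero(&port->topology_kref);
}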
> Signed-off-by: Lyude Paul 
> Cc: Daniel Vetter 
> Cc: David Airlie 
> Cc: Jerry Zuo 
> Cc: Harry Wentland 
> Cc: Juston Li 
> ---
>  .../gpu/dp-mst/topology-figure-1.dot  |  31 ++
>  .../gpu/dp-mst/topology-figure-2.dot  |  37 ++
>  .../gpu/dp-mst/topology-figure-3.dot  |  40 ++
>  Documentation/gpu/drm-kms-helpers.rst | 125 -
>  

Re: [PATCH] drm/amd/display: Add fast path for cursor plane updates

2018-12-14 Thread Michel Dänzer
On 2018-12-13 9:59 p.m., Kazlauskas, Nicholas wrote:
> On 12/13/18 2:21 PM, Kazlauskas, Nicholas wrote:
>> On 12/13/18 11:01 AM, Kazlauskas, Nicholas wrote:
>>> On 12/13/18 10:48 AM, Michel Dänzer wrote:
 On 2018-12-05 8:59 p.m., Nicholas Kazlauskas wrote:
> [Why]
> Legacy cursor plane updates from drm helpers go through the full
> atomic codepath. A high volume of cursor updates through this slow
> code path can cause subsequent page-flips to skip vblank intervals
> since each individual update is slow.
>
> This problem is particularly noticeable for the compton compositor.
>
> [How]
> A fast path for cursor plane updates is added by using DRM asynchronous
> commit support provided by async_check and async_update. These don't do
> a full state/flip_done dependency stall and they don't block other
> commit work.
>
> However, DC still expects itself to be single-threaded for anything
> that can issue register writes. Screen corruption or hangs can occur
> if write sequences overlap. Every call that potentially performs
> register writes needs to be guarded for asynchronous updates to work.
> The dc_lock mutex was added for this.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106175
>
> Cc: Leo Li 
> Cc: Harry Wentland 
> Signed-off-by: Nicholas Kazlauskas 

 Looks like this change introduced (or at least exposed) a reference
 counting bug resulting in use-after-free when Xorg shuts down[0]. See
 the attached dmesg excerpt (note that I wrapped the !bo->pin_count check
 in amdgpu_bo_unpin in WARN_ON_ONCE).


 [0] Only with
 https://gitlab.freedesktop.org/xorg/driver/xf86-video-amdgpu/commit/0d60233d26ec70d4e1faa343b438e33829c6d5e4
 , i.e. alternating between two BOs for the HW cursor, instead of always
 using the same one.


>>>
>>> Thanks for the heads up, I don't think I had that patch in my
>>> xf86-video-amdgpu when testing the desktop stack.
>>>
>>> The async atomic helper does the:
>>>
>>> drm_atomic_helper_prepare_planes
>>> drm_atomic_helper_async_commit
>>> drm_atomic_helper_cleanup_planes
>>>
>>> ...sequence correctly from what I can tell, so maybe it's something with
>>> dm_plane_helper_prepare_fb or dm_plane_helper_cleanup_fb itself.
>>>
>>> One case where unref could be called (not following a ref) is during
>>> drm_atomic_helper_prepare_planes - if prepare_fb fails then cleanup_fb
>>> gets called regardless, and we only ref the fb if prepare_fb is in the
>>> success path.
>>
>> The prepare_fb/cleanup_fb calls are actually fine since cleanup_fb only
>> gets called on planes that had prepare_fb succeed in all cases as far as
>> I can tell.
>>
>> I think the bug here might be forgetting to set the plane->state to the
>> new_state. The cleanup fb callback decides whether to call it on the old
>> plane state or new plane state depending on if the commit was aborted or
>> not. I think every fast plane update might be treated as aborted in this
>> case.
> 
> This is a bug with DRM, actually.
> 
> Typically for a regular atomic commit the prepare_fb callback is called 
> for the new_plane_state and cleanup_fb is called for the old_plane_state 
> at the end of commit tail.
> 
> However, for asynchronous commits this isn't the same - prepare_fb is 
> called for new_plane_state and cleanup_fb is then immediately called 
> after also for the new_plane_state.
> 
> Looking at your stack trace I can see that this is exactly what causes 
> the use after free,
> 
> The CRTC has changed so it's stuck in the slow path (commit_tail is in 
> the trace). However, the plane->state->fb has already been unpinned and 
> unref'd. But the plane->state->fb is *not* NULL from the previous fast 
> update, so when it gets to cleanup planes and tries to free the 
> old_plane_state, it unpins and unrefs the bo a second time.
> 
> Then a new fast cursor update comes along (and the fb hasn't changed) so 
> it tries to prepare_fb on the same freed bo.

Do you have an idea for a fix? If not, I'm afraid we need to revert this
change again for now, as the consequences can be severe (in one case,
ext4 code started complaining, I couldn't reboot cleanly and had to fsck
afterwards).


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [WIP PATCH 02/15] drm/dp_mst: Refactor drm_dp_update_payload_part1()

2018-12-14 Thread Daniel Vetter
On Thu, Dec 13, 2018 at 08:25:31PM -0500, Lyude Paul wrote:
> There should be no functional changes here

Would be good to explain what you did refactor here, instead of me trying
to reconstruct it from the patch. Especially pre-coffee that helps :-)
> 
> Signed-off-by: Lyude Paul 
> Cc: Juston Li 
> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 71 ---
>  1 file changed, 42 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c 
> b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 9b1b5c9b1fa0..2ab16c9e6243 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -1879,39 +1879,48 @@ int drm_dp_update_payload_part1(struct 
> drm_dp_mst_topology_mgr *mgr)
>  
>   mutex_lock(&mgr->payload_lock);
>   for (i = 0; i < mgr->max_payloads; i++) {
> + struct drm_dp_vcpi *vcpi = mgr->proposed_vcpis[i];
> + struct drm_dp_payload *payload = &mgr->payloads[i];
> +
>   /* solve the current payloads - compare to the hw ones
>  - update the hw view */
>   req_payload.start_slot = cur_slots;
> - if (mgr->proposed_vcpis[i]) {
> - port = container_of(mgr->proposed_vcpis[i], struct 
> drm_dp_mst_port, vcpi);
> + if (vcpi) {
> + port = container_of(vcpi, struct drm_dp_mst_port,
> + vcpi);
>   port = drm_dp_get_validated_port_ref(mgr, port);
>   if (!port) {
>   mutex_unlock(&mgr->payload_lock);
>   return -EINVAL;
>   }
> - req_payload.num_slots = 
> mgr->proposed_vcpis[i]->num_slots;
> - req_payload.vcpi = mgr->proposed_vcpis[i]->vcpi;
> + req_payload.num_slots = vcpi->num_slots;
> + req_payload.vcpi = vcpi->vcpi;
>   } else {
>   port = NULL;
>   req_payload.num_slots = 0;
>   }
>  
> - mgr->payloads[i].start_slot = req_payload.start_slot;
> + payload->start_slot = req_payload.start_slot;
>   /* work out what is required to happen with this payload */
> - if (mgr->payloads[i].num_slots != req_payload.num_slots) {
> + if (payload->num_slots != req_payload.num_slots) {
>  
>   /* need to push an update for this payload */
>   if (req_payload.num_slots) {
> - drm_dp_create_payload_step1(mgr, 
> mgr->proposed_vcpis[i]->vcpi, &req_payload);
> - mgr->payloads[i].num_slots = 
> req_payload.num_slots;
> - mgr->payloads[i].vcpi = req_payload.vcpi;
> - } else if (mgr->payloads[i].num_slots) {
> - mgr->payloads[i].num_slots = 0;
> - drm_dp_destroy_payload_step1(mgr, port, 
> mgr->payloads[i].vcpi, &mgr->payloads[i]);
> - req_payload.payload_state = 
> mgr->payloads[i].payload_state;
> - mgr->payloads[i].start_slot = 0;
> + drm_dp_create_payload_step1(mgr, vcpi->vcpi,
> + &req_payload);
> + payload->num_slots = req_payload.num_slots;
> + payload->vcpi = req_payload.vcpi;
> +
> + } else if (payload->num_slots) {
> + payload->num_slots = 0;
> + drm_dp_destroy_payload_step1(mgr, port,
> +  payload->vcpi,
> +  payload);
> + req_payload.payload_state =
> + payload->payload_state;
> + payload->start_slot = 0;
>   }
> - mgr->payloads[i].payload_state = 
> req_payload.payload_state;
> + payload->payload_state = req_payload.payload_state;
>   }
>   cur_slots += req_payload.num_slots;
>  
> @@ -1920,22 +1929,26 @@ int drm_dp_update_payload_part1(struct 
> drm_dp_mst_topology_mgr *mgr)
>   }
>  
>   for (i = 0; i < mgr->max_payloads; i++) {
> - if (mgr->payloads[i].payload_state == DP_PAYLOAD_DELETE_LOCAL) {
> - DRM_DEBUG_KMS("removing payload %d\n", i);
> - for (j = i; j < mgr->max_payloads - 1; j++) {
> - memcpy(&mgr->payloads[j], &mgr->payloads[j + 
> 1], sizeof(struct drm_dp_payload));
> - mgr->proposed_vcpis[j] = mgr->proposed_vcpis[j 
> + 1];
> - if (mgr->proposed_vcpis[j] && 
> 

Re: [WIP PATCH 01/15] drm/dp_mst: Remove bogus conditional in drm_dp_update_payload_part1()

2018-12-14 Thread Daniel Vetter
On Thu, Dec 13, 2018 at 08:25:30PM -0500, Lyude Paul wrote:
> There's no reason we need this, it's just confusing looking.
> 
> Signed-off-by: Lyude Paul 
> Cc: Juston Li 
> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c 
> b/drivers/gpu/drm/drm_dp_mst_topology.c
> index ad0fb6d003be..9b1b5c9b1fa0 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -1896,9 +1896,7 @@ int drm_dp_update_payload_part1(struct 
> drm_dp_mst_topology_mgr *mgr)
>   req_payload.num_slots = 0;
>   }
>  
> - if (mgr->payloads[i].start_slot != req_payload.start_slot) {
> - mgr->payloads[i].start_slot = req_payload.start_slot;
> - }
> + mgr->payloads[i].start_slot = req_payload.start_slot;

Entertaining!

Reviewed-by: Daniel Vetter 

>   /* work out what is required to happen with this payload */
>   if (mgr->payloads[i].num_slots != req_payload.num_slots) {
>  
> -- 
> 2.19.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch