Re: kfdtest failures for amdkfd (amd-staging-drm-next)

2018-09-14 Thread Felix Kuehling
You need ROCm 1.9 to work with the upstream KFD. libhsakmt from ROCm 1.8
is incompatible with the upstream KFD ABI.

Where did you get KFDTest? It's part of the same repository on GitHub as
libhsakmt. It's new on the 1.9 branch. You need libhsakmt from the same
branch. The ROCm 1.9 binaries are planned to be released later today if
all goes well.

Regards,
  Felix


On 2018-09-14 07:50 AM, Alexander Frolov wrote:
> Hi!
>
> I am trying to use amd-staging-drm-next to work with amdkfd (built
> into amdgpu) for the AMD Instinct MI25 device.
>
> As a first step I compiled libhsakmt 1.8.x and tried to run kfdtest.
> But it produces lots of failures (see below).
> Here are the results:
>
> ...
> [==========] 76 tests from 14 test cases ran. (80250 ms total)
> [  PASSED  ] 39 tests.
> [  FAILED  ] 37 tests, listed below:
> [  FAILED  ] KFDEvictTest.QueueTest
> [  FAILED  ] KFDGraphicsInterop.RegisterGraphicsHandle
> [  FAILED  ] KFDIPCTest.BasicTest
> [  FAILED  ] KFDIPCTest.CrossMemoryAttachTest
> [  FAILED  ] KFDIPCTest.CMABasicTest
> [  FAILED  ] KFDLocalMemoryTest.BasicTest
> [  FAILED  ] KFDLocalMemoryTest.VerifyContentsAfterUnmapAndMap
> [  FAILED  ] KFDLocalMemoryTest.CheckZeroInitializationVram
> [  FAILED  ] KFDMemoryTest.MapUnmapToNodes
> [  FAILED  ] KFDMemoryTest.MemoryRegisterSamePtr
> [  FAILED  ] KFDMemoryTest.FlatScratchAccess
> [  FAILED  ] KFDMemoryTest.MMBench
> [  FAILED  ] KFDMemoryTest.QueryPointerInfo
> [  FAILED  ] KFDMemoryTest.PtraceAccessInvisibleVram
> [  FAILED  ] KFDMemoryTest.SignalHandling
> [  FAILED  ] KFDQMTest.CreateCpQueue
> [  FAILED  ] KFDQMTest.CreateMultipleSdmaQueues
> [  FAILED  ] KFDQMTest.SdmaConcurrentCopies
> [  FAILED  ] KFDQMTest.CreateMultipleCpQueues
> [  FAILED  ] KFDQMTest.DisableSdmaQueueByUpdateWithNullAddress
> [  FAILED  ] KFDQMTest.DisableCpQueueByUpdateWithZeroPercentage
> [  FAILED  ] KFDQMTest.OverSubscribeCpQueues
> [  FAILED  ] KFDQMTest.BasicCuMaskingEven
> [  FAILED  ] KFDQMTest.QueuePriorityOnDifferentPipe
> [  FAILED  ] KFDQMTest.QueuePriorityOnSamePipe
> [  FAILED  ] KFDQMTest.EmptyDispatch
> [  FAILED  ] KFDQMTest.SimpleWriteDispatch
> [  FAILED  ] KFDQMTest.MultipleCpQueuesStressDispatch
> [  FAILED  ] KFDQMTest.CpuWriteCoherence
> [  FAILED  ] KFDQMTest.CreateAqlCpQueue
> [  FAILED  ] KFDQMTest.QueueLatency
> [  FAILED  ] KFDQMTest.CpQueueWraparound
> [  FAILED  ] KFDQMTest.SdmaQueueWraparound
> [  FAILED  ] KFDQMTest.Atomics
> [  FAILED  ] KFDQMTest.P2PTest
> [  FAILED  ] KFDQMTest.SdmaEventInterrupt
> [  FAILED  ] KFDTopologyTest.BasicTest
>
> Does it mean that the current amdkfd from the kernel can't be used with
> libhsakmt 1.8.x? Or am I doing something wrong...
> Thank you!
>
> Best,
>    Alexander
>



Re: [PATCH v2 05/17] compat_ioctl: move more drivers to generic_compat_ioctl_ptrarg

2018-09-14 Thread Darren Hart
On Wed, Sep 12, 2018 at 05:08:52PM +0200, Arnd Bergmann wrote:
> The .ioctl and .compat_ioctl file operations have the same prototype so
> they can both point to the same function, which works great almost all
> the time when all the commands are compatible.
> 
> One exception is the s390 architecture, where a compat pointer is only
> 31 bit wide, and converting it into a 64-bit pointer requires calling
> compat_ptr(). Most drivers here will never run on s390, but since we now
> have a generic helper for it, it's easy enough to use it consistently.
> 
> I double-checked all these drivers to ensure that all ioctl arguments
> are used as pointers or are ignored, but are not interpreted as integer
> values.
> 
> Signed-off-by: Arnd Bergmann 
> ---
...
>  drivers/platform/x86/wmi.c  | 2 +-
...
>  static void link_event_work(struct work_struct *work)
> diff --git a/drivers/platform/x86/wmi.c b/drivers/platform/x86/wmi.c
> index 04791ea5d97b..e4d0697e07d6 100644
> --- a/drivers/platform/x86/wmi.c
> +++ b/drivers/platform/x86/wmi.c
> @@ -886,7 +886,7 @@ static const struct file_operations wmi_fops = {
>   .read   = wmi_char_read,
>   .open   = wmi_char_open,
>   .unlocked_ioctl = wmi_ioctl,
> - .compat_ioctl   = wmi_ioctl,
> + .compat_ioctl   = generic_compat_ioctl_ptrarg,
>  };

For drivers/platform/x86:

Acked-by: Darren Hart (VMware) 

As for a longer term solution, would it be possible to init fops in such
a way that the compat_ioctl call defaults to generic_compat_ioctl_ptrarg
so we don't have to duplicate this boilerplate for every ioctl fops
structure?
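
For context, such a helper can be sketched roughly as follows (an
approximation of the shape of this series' generic_compat_ioctl_ptrarg,
not a verbatim copy of Arnd's patch):

static long generic_compat_ioctl_ptrarg(struct file *file, unsigned int cmd,
					unsigned long arg)
{
	if (!file->f_op->unlocked_ioctl)
		return -ENOIOCTLCMD;
	/* compat_ptr() widens the 31-bit s390 compat pointer into a proper
	 * 64-bit user pointer; on other architectures it is a plain cast. */
	return file->f_op->unlocked_ioctl(file, cmd,
					  (unsigned long)compat_ptr(arg));
}

A default along the lines suggested above would then let most drivers drop
the explicit .compat_ioctl assignment entirely.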

-- 
Darren Hart
VMware Open Source Technology Center


Re: [PATCH 2/7] drm/amdgpu: fix up GDS/GWS/OA shifting

2018-09-14 Thread Alex Deucher
On Fri, Sep 14, 2018 at 3:12 PM Christian König
 wrote:
>
> That only worked by pure coincidence. Completely remove the shifting and
> always apply correct PAGE_SHIFT.
>
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 12 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h|  7 ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c| 12 +++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 14 +++---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  6 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 15 +++
>  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c  |  9 -
>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  |  9 -
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 12 +---
>  9 files changed, 25 insertions(+), 71 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index d762d78e5102..8836186eb5ef 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -721,16 +721,16 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser 
> *p,
> e->bo_va = amdgpu_vm_bo_find(vm, ttm_to_amdgpu_bo(e->tv.bo));
>
> if (gds) {
> -   p->job->gds_base = amdgpu_bo_gpu_offset(gds);
> -   p->job->gds_size = amdgpu_bo_size(gds);
> +   p->job->gds_base = amdgpu_bo_gpu_offset(gds) >> PAGE_SHIFT;
> +   p->job->gds_size = amdgpu_bo_size(gds) >> PAGE_SHIFT;
> }
> if (gws) {
> -   p->job->gws_base = amdgpu_bo_gpu_offset(gws);
> -   p->job->gws_size = amdgpu_bo_size(gws);
> +   p->job->gws_base = amdgpu_bo_gpu_offset(gws) >> PAGE_SHIFT;
> +   p->job->gws_size = amdgpu_bo_size(gws) >> PAGE_SHIFT;
> }
> if (oa) {
> -   p->job->oa_base = amdgpu_bo_gpu_offset(oa);
> -   p->job->oa_size = amdgpu_bo_size(oa);
> +   p->job->oa_base = amdgpu_bo_gpu_offset(oa) >> PAGE_SHIFT;
> +   p->job->oa_size = amdgpu_bo_size(oa) >> PAGE_SHIFT;
> }
>
> if (!r && p->uf_entry.tv.bo) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
> index e73728d90388..ecbcefe49a98 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
> @@ -24,13 +24,6 @@
>  #ifndef __AMDGPU_GDS_H__
>  #define __AMDGPU_GDS_H__
>
> -/* Because TTM request that alloacted buffer should be PAGE_SIZE aligned,
> - * we should report GDS/GWS/OA size as PAGE_SIZE aligned
> - * */
> -#define AMDGPU_GDS_SHIFT   2
> -#define AMDGPU_GWS_SHIFT   PAGE_SHIFT
> -#define AMDGPU_OA_SHIFTPAGE_SHIFT
> -
>  struct amdgpu_ring;
>  struct amdgpu_bo;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index d30a0838851b..7b3d1ebda9df 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -244,16 +244,10 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, 
> void *data,
> return -EINVAL;
> }
> flags |= AMDGPU_GEM_CREATE_NO_CPU_ACCESS;
> -   if (args->in.domains == AMDGPU_GEM_DOMAIN_GDS)
> -   size = size << AMDGPU_GDS_SHIFT;
> -   else if (args->in.domains == AMDGPU_GEM_DOMAIN_GWS)
> -   size = size << AMDGPU_GWS_SHIFT;
> -   else if (args->in.domains == AMDGPU_GEM_DOMAIN_OA)
> -   size = size << AMDGPU_OA_SHIFT;
> -   else
> -   return -EINVAL;
> +   /* GDS allocations must be DW aligned */
> +   if (args->in.domains & AMDGPU_GEM_DOMAIN_GDS)
> +   size = ALIGN(size, 4);
> }
> -   size = roundup(size, PAGE_SIZE);
>
> if (flags & AMDGPU_GEM_CREATE_VM_ALWAYS_VALID) {
> r = amdgpu_bo_reserve(vm->root.base.bo, false);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index b766270d86cb..64cc483db973 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -528,13 +528,13 @@ static int amdgpu_info_ioctl(struct drm_device *dev, 
> void *data, struct drm_file
> struct drm_amdgpu_info_gds gds_info;
>
> memset(&gds_info, 0, sizeof(gds_info));
> -   gds_info.gds_gfx_partition_size = adev->gds.mem.gfx_partition_size >> AMDGPU_GDS_SHIFT;
> -   gds_info.compute_partition_size = adev->gds.mem.cs_partition_size >> AMDGPU_GDS_SHIFT;
> -   gds_info.gds_total_size = adev->gds.mem.total_size >> AMDGPU_GDS_SHIFT;
> -   gds_info.gws_per_gfx_partition = adev->gds.gws.gfx_partition_size >> AMDGPU_GWS_SHIFT;
> -   

Re: [PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier v4

2018-09-14 Thread Felix Kuehling
On 2018-09-14 01:52 PM, Christian König wrote:
> On 2018-09-14 19:47, Philip Yang wrote:
>> On 2018-09-14 03:51 AM, Christian König wrote:
>>> On 2018-09-13 23:51, Felix Kuehling wrote:
 On 2018-09-13 04:52 PM, Philip Yang wrote:
> Replace our MMU notifier with
> hmm_mirror_ops.sync_cpu_device_pagetables
> callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
> DRM_AMDGPU_USERPTR Kconfig.
>
> It supports both KFD userptr and gfx userptr paths.
>
> This depends on several HMM patchsets from Jérôme Glisse queued for
> upstream.
>
> Change-Id: Ie62c3c5e3c5b8521ab3b438d1eff2aa2a003835e
> Signed-off-by: Philip Yang 
> ---
>   drivers/gpu/drm/amd/amdgpu/Kconfig |   6 +-
>   drivers/gpu/drm/amd/amdgpu/Makefile    |   2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 121
> ++---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h |   2 +-
>   4 files changed, 56 insertions(+), 75 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig
> b/drivers/gpu/drm/amd/amdgpu/Kconfig
> index 9221e54..960a633 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Kconfig
> +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
> @@ -26,10 +26,10 @@ config DRM_AMDGPU_CIK
>   config DRM_AMDGPU_USERPTR
>   bool "Always enable userptr write support"
>   depends on DRM_AMDGPU
> -    select MMU_NOTIFIER
> +    select HMM_MIRROR
>   help
> -  This option selects CONFIG_MMU_NOTIFIER if it isn't already
> -  selected to enabled full userptr support.
> +  This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it
> +  isn't already selected to enable full userptr support.
>     config DRM_AMDGPU_GART_DEBUGFS
>   bool "Allow GART access through debugfs"
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile
> b/drivers/gpu/drm/amd/amdgpu/Makefile
> index 138cb78..c1e5d43 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -171,7 +171,7 @@ endif
>   amdgpu-$(CONFIG_COMPAT) += amdgpu_ioc32.o
>   amdgpu-$(CONFIG_VGA_SWITCHEROO) += amdgpu_atpx_handler.o
>   amdgpu-$(CONFIG_ACPI) += amdgpu_acpi.o
> -amdgpu-$(CONFIG_MMU_NOTIFIER) += amdgpu_mn.o
> +amdgpu-$(CONFIG_HMM) += amdgpu_mn.o
>     include $(FULL_AMD_PATH)/powerplay/Makefile
>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> index e55508b..ad52f34 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> @@ -45,7 +45,7 @@
>     #include 
>   #include 
> -#include <linux/mmu_notifier.h>
> +#include <linux/hmm.h>
>   #include 
>   #include 
>   #include 
> @@ -66,6 +66,7 @@
 Need to remove @mn documentation.

>    * @objects: interval tree containing amdgpu_mn_nodes
>    * @read_lock: mutex for recursive locking of @lock
>    * @recursion: depth of recursion
> + * @mirror: HMM mirror function support
>    *
>    * Data for each amdgpu device and process address space.
>    */
> @@ -73,7 +74,6 @@ struct amdgpu_mn {
>   /* constant after initialisation */
>   struct amdgpu_device    *adev;
>   struct mm_struct    *mm;
> -    struct mmu_notifier    mn;
>   enum amdgpu_mn_type    type;
>     /* only used on destruction */
> @@ -87,6 +87,9 @@ struct amdgpu_mn {
>   struct rb_root_cached    objects;
>   struct mutex    read_lock;
>   atomic_t    recursion;
> +
> +    /* HMM mirror */
> +    struct hmm_mirror    mirror;
>   };
>     /**
> @@ -103,7 +106,7 @@ struct amdgpu_mn_node {
>   };
>     /**
> - * amdgpu_mn_destroy - destroy the MMU notifier
> + * amdgpu_mn_destroy - destroy the HMM mirror
>    *
>    * @work: previously sheduled work item
>    *
> @@ -129,28 +132,26 @@ static void amdgpu_mn_destroy(struct
> work_struct *work)
>   }
>   up_write(&amn->lock);
>   mutex_unlock(&adev->mn_lock);
> -    mmu_notifier_unregister_no_release(&amn->mn, amn->mm);
> +    hmm_mirror_unregister(&amn->mirror);
> +
>   kfree(amn);
>   }
>     /**
>    * amdgpu_mn_release - callback to notify about mm destruction
 Update the function name in the comment.

>    *
> - * @mn: our notifier
> - * @mm: the mm this callback is about
> + * @mirror: the HMM mirror (mm) this callback is about
>    *
> - * Shedule a work item to lazy destroy our notifier.
> + * Schedule a work item to lazy destroy HMM mirror.
>    */
> -static void amdgpu_mn_release(struct mmu_notifier *mn,
> -  struct mm_struct *mm)
> +static void amdgpu_hmm_mirror_release(struct hmm_mirror *mirror)
>   {
> -    struct amdgpu_mn *amn = 

Re: [PATCH 7/7] drm/amdgpu: move reserving GDS/GWS/OA into common code

2018-09-14 Thread Alex Deucher
On Fri, Sep 14, 2018 at 3:13 PM Christian König
 wrote:
>
> We don't need that in the per ASIC code.
>
> Signed-off-by: Christian König 

Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 18 ++
>  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c   | 19 ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   | 19 ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 19 ---
>  4 files changed, 18 insertions(+), 57 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 438390fce714..cf93a9831318 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -1848,6 +1848,12 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
> return r;
> }
>
> +   r = amdgpu_bo_create_kernel(adev, adev->gds.mem.gfx_partition_size,
> +   PAGE_SIZE, AMDGPU_GEM_DOMAIN_GDS,
> +   &adev->gds.gds_gfx_bo, NULL, NULL);
> +   if (r)
> +   return r;
> +
> r = ttm_bo_init_mm(&adev->mman.bdev, AMDGPU_PL_GWS,
>adev->gds.gws.total_size);
> if (r) {
> @@ -1855,6 +1861,12 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
> return r;
> }
>
> +   r = amdgpu_bo_create_kernel(adev, adev->gds.gws.gfx_partition_size,
> +   PAGE_SIZE, AMDGPU_GEM_DOMAIN_GWS,
> +   &adev->gds.gws_gfx_bo, NULL, NULL);
> +   if (r)
> +   return r;
> +
> r = ttm_bo_init_mm(&adev->mman.bdev, AMDGPU_PL_OA,
>adev->gds.oa.total_size);
> if (r) {
> @@ -1862,6 +1874,12 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
> return r;
> }
>
> +   r = amdgpu_bo_create_kernel(adev, adev->gds.oa.gfx_partition_size,
> +   PAGE_SIZE, AMDGPU_GEM_DOMAIN_OA,
> +   &adev->gds.oa_gfx_bo, NULL, NULL);
> +   if (r)
> +   return r;
> +
> /* Register debugfs entries for amdgpu_ttm */
> r = amdgpu_ttm_debugfs_init(adev);
> if (r) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> index c0f9732cbaf7..fc39ebbc9d9f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> @@ -4582,25 +4582,6 @@ static int gfx_v7_0_sw_init(void *handle)
> }
> }
>
> -   /* reserve GDS, GWS and OA resource for gfx */
> -   r = amdgpu_bo_create_kernel(adev, adev->gds.mem.gfx_partition_size,
> -   PAGE_SIZE, AMDGPU_GEM_DOMAIN_GDS,
> -   &adev->gds.gds_gfx_bo, NULL, NULL);
> -   if (r)
> -   return r;
> -
> -   r = amdgpu_bo_create_kernel(adev, adev->gds.gws.gfx_partition_size,
> -   PAGE_SIZE, AMDGPU_GEM_DOMAIN_GWS,
> -   &adev->gds.gws_gfx_bo, NULL, NULL);
> -   if (r)
> -   return r;
> -
> -   r = amdgpu_bo_create_kernel(adev, adev->gds.oa.gfx_partition_size,
> -   PAGE_SIZE, AMDGPU_GEM_DOMAIN_OA,
> -   &adev->gds.oa_gfx_bo, NULL, NULL);
> -   if (r)
> -   return r;
> -
> adev->gfx.ce_ram_size = 0x8000;
>
> gfx_v7_0_gpu_early_init(adev);
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> index 57e4b14e3bd1..5d9fd2c2c244 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> @@ -2161,25 +2161,6 @@ static int gfx_v8_0_sw_init(void *handle)
> if (r)
> return r;
>
> -   /* reserve GDS, GWS and OA resource for gfx */
> -   r = amdgpu_bo_create_kernel(adev, adev->gds.mem.gfx_partition_size,
> -   PAGE_SIZE, AMDGPU_GEM_DOMAIN_GDS,
> -   &adev->gds.gds_gfx_bo, NULL, NULL);
> -   if (r)
> -   return r;
> -
> -   r = amdgpu_bo_create_kernel(adev, adev->gds.gws.gfx_partition_size,
> -   PAGE_SIZE, AMDGPU_GEM_DOMAIN_GWS,
> -   &adev->gds.gws_gfx_bo, NULL, NULL);
> -   if (r)
> -   return r;
> -
> -   r = amdgpu_bo_create_kernel(adev, adev->gds.oa.gfx_partition_size,
> -   PAGE_SIZE, AMDGPU_GEM_DOMAIN_OA,
> -   &adev->gds.oa_gfx_bo, NULL, NULL);
> -   if (r)
> -   return r;
> -
> adev->gfx.ce_ram_size = 0x8000;
>
> r = gfx_v8_0_gpu_early_init(adev);
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index d31a2bc00d61..c075c0b6bb2b 100644
> --- 

Re: [PATCH 6/7] drm/amdgpu: drop size check

2018-09-14 Thread Alex Deucher
On Fri, Sep 14, 2018 at 3:13 PM Christian König
 wrote:
>
> We don't allocate zero-sized kernel BOs any longer.
>
> Signed-off-by: Christian König 

Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 14 ++
>  1 file changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 710e7751c567..438390fce714 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -1809,14 +1809,12 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
>  * This is used for VGA emulation and pre-OS scanout buffers to
>  * avoid display artifacts while transitioning between pre-OS
>  * and driver.  */
> -   if (adev->gmc.stolen_size) {
> -   r = amdgpu_bo_create_kernel(adev, adev->gmc.stolen_size, 
> PAGE_SIZE,
> -   AMDGPU_GEM_DOMAIN_VRAM,
> -   &adev->stolen_vga_memory,
> -   NULL, NULL);
> -   if (r)
> -   return r;
> -   }
> +   r = amdgpu_bo_create_kernel(adev, adev->gmc.stolen_size, PAGE_SIZE,
> +   AMDGPU_GEM_DOMAIN_VRAM,
> +   &adev->stolen_vga_memory,
> +   NULL, NULL);
> +   if (r)
> +   return r;
> DRM_INFO("amdgpu: %uM of VRAM memory ready\n",
>  (unsigned) (adev->gmc.real_vram_size / (1024 * 1024)));
>
> --
> 2.14.1
>


Re: [PATCH 5/7] drm/amdgpu: don't allocate zero sized kernel BOs

2018-09-14 Thread Alex Deucher
On Fri, Sep 14, 2018 at 3:12 PM Christian König
 wrote:
>
> Just free the BO if the size should be zero.
>
> Signed-off-by: Christian König 

Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index e1f32a196f6d..d282e923d1b4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -250,6 +250,11 @@ int amdgpu_bo_create_reserved(struct amdgpu_device *adev,
> bool free = false;
> int r;
>
> +   if (!size) {
> +   amdgpu_bo_unref(bo_ptr);
> +   return 0;
> +   }
> +
> memset(&bp, 0, sizeof(bp));
> bp.size = size;
> bp.byte_align = align;
> --
> 2.14.1
>


Re: [PATCH 4/7] drm/amdgpu: initialize GDS/GWS/OA domains even when they are zero sized

2018-09-14 Thread Alex Deucher
On Fri, Sep 14, 2018 at 3:12 PM Christian König
 wrote:
>
> Stops crashing on SI.
>
> Signed-off-by: Christian König 

Presumably ttm allows this?
Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 48 
> +
>  1 file changed, 18 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 3e450159fe1f..710e7751c567 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -1843,34 +1843,25 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
>  (unsigned)(gtt_size / (1024 * 1024)));
>
> /* Initialize various on-chip memory pools */
> -   /* GDS Memory */
> -   if (adev->gds.mem.total_size) {
> -   r = ttm_bo_init_mm(&adev->mman.bdev, AMDGPU_PL_GDS,
> -  adev->gds.mem.total_size);
> -   if (r) {
> -   DRM_ERROR("Failed initializing GDS heap.\n");
> -   return r;
> -   }
> +   r = ttm_bo_init_mm(&adev->mman.bdev, AMDGPU_PL_GDS,
> +  adev->gds.mem.total_size);
> +   if (r) {
> +   DRM_ERROR("Failed initializing GDS heap.\n");
> +   return r;
> }
>
> -   /* GWS */
> -   if (adev->gds.gws.total_size) {
> -   r = ttm_bo_init_mm(&adev->mman.bdev, AMDGPU_PL_GWS,
> -  adev->gds.gws.total_size);
> -   if (r) {
> -   DRM_ERROR("Failed initializing gws heap.\n");
> -   return r;
> -   }
> +   r = ttm_bo_init_mm(&adev->mman.bdev, AMDGPU_PL_GWS,
> +  adev->gds.gws.total_size);
> +   if (r) {
> +   DRM_ERROR("Failed initializing gws heap.\n");
> +   return r;
> }
>
> -   /* OA */
> -   if (adev->gds.oa.total_size) {
> -   r = ttm_bo_init_mm(&adev->mman.bdev, AMDGPU_PL_OA,
> -  adev->gds.oa.total_size);
> -   if (r) {
> -   DRM_ERROR("Failed initializing oa heap.\n");
> -   return r;
> -   }
> +   r = ttm_bo_init_mm(&adev->mman.bdev, AMDGPU_PL_OA,
> +  adev->gds.oa.total_size);
> +   if (r) {
> +   DRM_ERROR("Failed initializing oa heap.\n");
> +   return r;
> }
>
> /* Register debugfs entries for amdgpu_ttm */
> @@ -1907,12 +1898,9 @@ void amdgpu_ttm_fini(struct amdgpu_device *adev)
>
> ttm_bo_clean_mm(&adev->mman.bdev, TTM_PL_VRAM);
> ttm_bo_clean_mm(&adev->mman.bdev, TTM_PL_TT);
> -   if (adev->gds.mem.total_size)
> -   ttm_bo_clean_mm(&adev->mman.bdev, AMDGPU_PL_GDS);
> -   if (adev->gds.gws.total_size)
> -   ttm_bo_clean_mm(&adev->mman.bdev, AMDGPU_PL_GWS);
> -   if (adev->gds.oa.total_size)
> -   ttm_bo_clean_mm(&adev->mman.bdev, AMDGPU_PL_OA);
> +   ttm_bo_clean_mm(&adev->mman.bdev, AMDGPU_PL_GDS);
> +   ttm_bo_clean_mm(&adev->mman.bdev, AMDGPU_PL_GWS);
> +   ttm_bo_clean_mm(&adev->mman.bdev, AMDGPU_PL_OA);
> ttm_bo_device_release(&adev->mman.bdev);
> amdgpu_ttm_global_fini(adev);
> adev->mman.initialized = false;
> --
> 2.14.1
>


Re: [PATCH 3/7] drm/amdgpu: stop crashing on GDS/GWS/OA eviction

2018-09-14 Thread Alex Deucher
On Fri, Sep 14, 2018 at 3:13 PM Christian König
 wrote:
>
> Simply ignore any copying here.
>
> Signed-off-by: Christian König 

Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 18 ++
>  1 file changed, 18 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index c691275cd1f0..3e450159fe1f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -256,6 +256,13 @@ static void amdgpu_evict_flags(struct ttm_buffer_object 
> *bo,
>
> abo = ttm_to_amdgpu_bo(bo);
> switch (bo->mem.mem_type) {
> +   case AMDGPU_PL_GDS:
> +   case AMDGPU_PL_GWS:
> +   case AMDGPU_PL_OA:
> +   placement->num_placement = 0;
> +   placement->num_busy_placement = 0;
> +   return;
> +
> case TTM_PL_VRAM:
> if (!adev->mman.buffer_funcs_enabled) {
> /* Move to system memory */
> @@ -283,6 +290,7 @@ static void amdgpu_evict_flags(struct ttm_buffer_object 
> *bo,
> case TTM_PL_TT:
> default:
> amdgpu_bo_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_CPU);
> +   break;
> }
> *placement = abo->placement;
>  }
> @@ -675,6 +683,16 @@ static int amdgpu_bo_move(struct ttm_buffer_object *bo, 
> bool evict,
> amdgpu_move_null(bo, new_mem);
> return 0;
> }
> +   if (old_mem->mem_type == AMDGPU_PL_GDS ||
> +   old_mem->mem_type == AMDGPU_PL_GWS ||
> +   old_mem->mem_type == AMDGPU_PL_OA ||
> +   new_mem->mem_type == AMDGPU_PL_GDS ||
> +   new_mem->mem_type == AMDGPU_PL_GWS ||
> +   new_mem->mem_type == AMDGPU_PL_OA) {
> +   /* Nothing to save here */
> +   amdgpu_move_null(bo, new_mem);
> +   return 0;
> +   }
>
> if (!adev->mman.buffer_funcs_enabled)
> goto memcpy;
> --
> 2.14.1
>


Re: [PATCH 1/7] drm/amdgpu: add GDS, GWS and OA debugfs files

2018-09-14 Thread Alex Deucher
On Fri, Sep 14, 2018 at 3:12 PM Christian König
 wrote:
>
> Additional to the existing files for VRAM and GTT.
>
> Signed-off-by: Christian König 

Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index f12ae6b525b9..1565344cc139 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -2208,7 +2208,7 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
>  static int amdgpu_mm_dump_table(struct seq_file *m, void *data)
>  {
> struct drm_info_node *node = (struct drm_info_node *)m->private;
> -   unsigned ttm_pl = *(int *)node->info_ent->data;
> +   unsigned ttm_pl = (uintptr_t)node->info_ent->data;
> struct drm_device *dev = node->minor->dev;
> struct amdgpu_device *adev = dev->dev_private;
> struct ttm_mem_type_manager *man = &adev->mman.bdev.man[ttm_pl];
> @@ -2218,12 +2218,12 @@ static int amdgpu_mm_dump_table(struct seq_file *m, 
> void *data)
> return 0;
>  }
>
> -static int ttm_pl_vram = TTM_PL_VRAM;
> -static int ttm_pl_tt = TTM_PL_TT;
> -
>  static const struct drm_info_list amdgpu_ttm_debugfs_list[] = {
> -   {"amdgpu_vram_mm", amdgpu_mm_dump_table, 0, _pl_vram},
> -   {"amdgpu_gtt_mm", amdgpu_mm_dump_table, 0, _pl_tt},
> +   {"amdgpu_vram_mm", amdgpu_mm_dump_table, 0, (void *)TTM_PL_VRAM},
> +   {"amdgpu_gtt_mm", amdgpu_mm_dump_table, 0, (void *)TTM_PL_TT},
> +   {"amdgpu_gds_mm", amdgpu_mm_dump_table, 0, (void *)AMDGPU_PL_GDS},
> +   {"amdgpu_gws_mm", amdgpu_mm_dump_table, 0, (void *)AMDGPU_PL_GWS},
> +   {"amdgpu_oa_mm", amdgpu_mm_dump_table, 0, (void *)AMDGPU_PL_OA},
> {"ttm_page_pool", ttm_page_alloc_debugfs, 0, NULL},
>  #ifdef CONFIG_SWIOTLB
> {"ttm_dma_page_pool", ttm_dma_page_alloc_debugfs, 0, NULL}
> --
> 2.14.1
>


[PATCH 5/7] drm/amdgpu: don't allocate zero sized kernel BOs

2018-09-14 Thread Christian König
Just free the BO if the size should be zero.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index e1f32a196f6d..d282e923d1b4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -250,6 +250,11 @@ int amdgpu_bo_create_reserved(struct amdgpu_device *adev,
bool free = false;
int r;
 
+   if (!size) {
+   amdgpu_bo_unref(bo_ptr);
+   return 0;
+   }
+
memset(&bp, 0, sizeof(bp));
bp.size = size;
bp.byte_align = align;
-- 
2.14.1



[PATCH 7/7] drm/amdgpu: move reserving GDS/GWS/OA into common code

2018-09-14 Thread Christian König
We don't need that in the per ASIC code.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 18 ++
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c   | 19 ---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   | 19 ---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 19 ---
 4 files changed, 18 insertions(+), 57 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 438390fce714..cf93a9831318 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1848,6 +1848,12 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
return r;
}
 
+   r = amdgpu_bo_create_kernel(adev, adev->gds.mem.gfx_partition_size,
+   PAGE_SIZE, AMDGPU_GEM_DOMAIN_GDS,
+   &adev->gds.gds_gfx_bo, NULL, NULL);
+   if (r)
+   return r;
+
r = ttm_bo_init_mm(&adev->mman.bdev, AMDGPU_PL_GWS,
   adev->gds.gws.total_size);
if (r) {
@@ -1855,6 +1861,12 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
return r;
}
 
+   r = amdgpu_bo_create_kernel(adev, adev->gds.gws.gfx_partition_size,
+   PAGE_SIZE, AMDGPU_GEM_DOMAIN_GWS,
+   &adev->gds.gws_gfx_bo, NULL, NULL);
+   if (r)
+   return r;
+
r = ttm_bo_init_mm(&adev->mman.bdev, AMDGPU_PL_OA,
   adev->gds.oa.total_size);
if (r) {
@@ -1862,6 +1874,12 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
return r;
}
 
+   r = amdgpu_bo_create_kernel(adev, adev->gds.oa.gfx_partition_size,
+   PAGE_SIZE, AMDGPU_GEM_DOMAIN_OA,
+   &adev->gds.oa_gfx_bo, NULL, NULL);
+   if (r)
+   return r;
+
/* Register debugfs entries for amdgpu_ttm */
r = amdgpu_ttm_debugfs_init(adev);
if (r) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index c0f9732cbaf7..fc39ebbc9d9f 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -4582,25 +4582,6 @@ static int gfx_v7_0_sw_init(void *handle)
}
}
 
-   /* reserve GDS, GWS and OA resource for gfx */
-   r = amdgpu_bo_create_kernel(adev, adev->gds.mem.gfx_partition_size,
-   PAGE_SIZE, AMDGPU_GEM_DOMAIN_GDS,
-   &adev->gds.gds_gfx_bo, NULL, NULL);
-   if (r)
-   return r;
-
-   r = amdgpu_bo_create_kernel(adev, adev->gds.gws.gfx_partition_size,
-   PAGE_SIZE, AMDGPU_GEM_DOMAIN_GWS,
-   &adev->gds.gws_gfx_bo, NULL, NULL);
-   if (r)
-   return r;
-
-   r = amdgpu_bo_create_kernel(adev, adev->gds.oa.gfx_partition_size,
-   PAGE_SIZE, AMDGPU_GEM_DOMAIN_OA,
-   &adev->gds.oa_gfx_bo, NULL, NULL);
-   if (r)
-   return r;
-
adev->gfx.ce_ram_size = 0x8000;
 
gfx_v7_0_gpu_early_init(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 57e4b14e3bd1..5d9fd2c2c244 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -2161,25 +2161,6 @@ static int gfx_v8_0_sw_init(void *handle)
if (r)
return r;
 
-   /* reserve GDS, GWS and OA resource for gfx */
-   r = amdgpu_bo_create_kernel(adev, adev->gds.mem.gfx_partition_size,
-   PAGE_SIZE, AMDGPU_GEM_DOMAIN_GDS,
-   &adev->gds.gds_gfx_bo, NULL, NULL);
-   if (r)
-   return r;
-
-   r = amdgpu_bo_create_kernel(adev, adev->gds.gws.gfx_partition_size,
-   PAGE_SIZE, AMDGPU_GEM_DOMAIN_GWS,
-   &adev->gds.gws_gfx_bo, NULL, NULL);
-   if (r)
-   return r;
-
-   r = amdgpu_bo_create_kernel(adev, adev->gds.oa.gfx_partition_size,
-   PAGE_SIZE, AMDGPU_GEM_DOMAIN_OA,
-   &adev->gds.oa_gfx_bo, NULL, NULL);
-   if (r)
-   return r;
-
adev->gfx.ce_ram_size = 0x8000;
 
r = gfx_v8_0_gpu_early_init(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index d31a2bc00d61..c075c0b6bb2b 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -1700,25 +1700,6 @@ static int gfx_v9_0_sw_init(void *handle)
if (r)
return r;
 
-   /* reserve GDS, GWS and OA resource for gfx */
-   r = amdgpu_bo_create_kernel(adev, adev->gds.mem.gfx_partition_size,
-   

[PATCH 3/7] drm/amdgpu: stop crashing on GDS/GWS/OA eviction

2018-09-14 Thread Christian König
Simply ignore any copying here.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index c691275cd1f0..3e450159fe1f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -256,6 +256,13 @@ static void amdgpu_evict_flags(struct ttm_buffer_object 
*bo,
 
abo = ttm_to_amdgpu_bo(bo);
switch (bo->mem.mem_type) {
+   case AMDGPU_PL_GDS:
+   case AMDGPU_PL_GWS:
+   case AMDGPU_PL_OA:
+   placement->num_placement = 0;
+   placement->num_busy_placement = 0;
+   return;
+
case TTM_PL_VRAM:
if (!adev->mman.buffer_funcs_enabled) {
/* Move to system memory */
@@ -283,6 +290,7 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
case TTM_PL_TT:
default:
amdgpu_bo_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_CPU);
+   break;
}
*placement = abo->placement;
 }
@@ -675,6 +683,16 @@ static int amdgpu_bo_move(struct ttm_buffer_object *bo, 
bool evict,
amdgpu_move_null(bo, new_mem);
return 0;
}
+   if (old_mem->mem_type == AMDGPU_PL_GDS ||
+   old_mem->mem_type == AMDGPU_PL_GWS ||
+   old_mem->mem_type == AMDGPU_PL_OA ||
+   new_mem->mem_type == AMDGPU_PL_GDS ||
+   new_mem->mem_type == AMDGPU_PL_GWS ||
+   new_mem->mem_type == AMDGPU_PL_OA) {
+   /* Nothing to save here */
+   amdgpu_move_null(bo, new_mem);
+   return 0;
+   }
 
if (!adev->mman.buffer_funcs_enabled)
goto memcpy;
-- 
2.14.1



[PATCH 6/7] drm/amdgpu: drop size check

2018-09-14 Thread Christian König
We don't allocate zero-sized kernel BOs any longer.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 710e7751c567..438390fce714 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1809,14 +1809,12 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
 * This is used for VGA emulation and pre-OS scanout buffers to
 * avoid display artifacts while transitioning between pre-OS
 * and driver.  */
-   if (adev->gmc.stolen_size) {
-   r = amdgpu_bo_create_kernel(adev, adev->gmc.stolen_size, 
PAGE_SIZE,
-   AMDGPU_GEM_DOMAIN_VRAM,
-   &adev->stolen_vga_memory,
-   NULL, NULL);
-   if (r)
-   return r;
-   }
+   r = amdgpu_bo_create_kernel(adev, adev->gmc.stolen_size, PAGE_SIZE,
+   AMDGPU_GEM_DOMAIN_VRAM,
+   &adev->stolen_vga_memory,
+   NULL, NULL);
+   if (r)
+   return r;
DRM_INFO("amdgpu: %uM of VRAM memory ready\n",
 (unsigned) (adev->gmc.real_vram_size / (1024 * 1024)));
 
-- 
2.14.1



[PATCH 4/7] drm/amdgpu: initialize GDS/GWS/OA domains even when they are zero sized

2018-09-14 Thread Christian König
Stops crashing on SI.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 48 +
 1 file changed, 18 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 3e450159fe1f..710e7751c567 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1843,34 +1843,25 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
 (unsigned)(gtt_size / (1024 * 1024)));
 
/* Initialize various on-chip memory pools */
-   /* GDS Memory */
-   if (adev->gds.mem.total_size) {
-   r = ttm_bo_init_mm(&adev->mman.bdev, AMDGPU_PL_GDS,
-  adev->gds.mem.total_size);
-   if (r) {
-   DRM_ERROR("Failed initializing GDS heap.\n");
-   return r;
-   }
+   r = ttm_bo_init_mm(&adev->mman.bdev, AMDGPU_PL_GDS,
+  adev->gds.mem.total_size);
+   if (r) {
+   DRM_ERROR("Failed initializing GDS heap.\n");
+   return r;
}
 
-   /* GWS */
-   if (adev->gds.gws.total_size) {
-   r = ttm_bo_init_mm(&adev->mman.bdev, AMDGPU_PL_GWS,
-  adev->gds.gws.total_size);
-   if (r) {
-   DRM_ERROR("Failed initializing gws heap.\n");
-   return r;
-   }
+   r = ttm_bo_init_mm(&adev->mman.bdev, AMDGPU_PL_GWS,
+  adev->gds.gws.total_size);
+   if (r) {
+   DRM_ERROR("Failed initializing gws heap.\n");
+   return r;
}
 
-   /* OA */
-   if (adev->gds.oa.total_size) {
-   r = ttm_bo_init_mm(&adev->mman.bdev, AMDGPU_PL_OA,
-  adev->gds.oa.total_size);
-   if (r) {
-   DRM_ERROR("Failed initializing oa heap.\n");
-   return r;
-   }
+   r = ttm_bo_init_mm(&adev->mman.bdev, AMDGPU_PL_OA,
+  adev->gds.oa.total_size);
+   if (r) {
+   DRM_ERROR("Failed initializing oa heap.\n");
+   return r;
}
 
/* Register debugfs entries for amdgpu_ttm */
@@ -1907,12 +1898,9 @@ void amdgpu_ttm_fini(struct amdgpu_device *adev)
 
ttm_bo_clean_mm(&adev->mman.bdev, TTM_PL_VRAM);
ttm_bo_clean_mm(&adev->mman.bdev, TTM_PL_TT);
-   if (adev->gds.mem.total_size)
-   ttm_bo_clean_mm(&adev->mman.bdev, AMDGPU_PL_GDS);
-   if (adev->gds.gws.total_size)
-   ttm_bo_clean_mm(&adev->mman.bdev, AMDGPU_PL_GWS);
-   if (adev->gds.oa.total_size)
-   ttm_bo_clean_mm(&adev->mman.bdev, AMDGPU_PL_OA);
+   ttm_bo_clean_mm(&adev->mman.bdev, AMDGPU_PL_GDS);
+   ttm_bo_clean_mm(&adev->mman.bdev, AMDGPU_PL_GWS);
+   ttm_bo_clean_mm(&adev->mman.bdev, AMDGPU_PL_OA);
ttm_bo_device_release(&adev->mman.bdev);
amdgpu_ttm_global_fini(adev);
adev->mman.initialized = false;
-- 
2.14.1



[PATCH 1/7] drm/amdgpu: add GDS, GWS and OA debugfs files

2018-09-14 Thread Christian König
Additional to the existing files for VRAM and GTT.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index f12ae6b525b9..1565344cc139 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2208,7 +2208,7 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
 static int amdgpu_mm_dump_table(struct seq_file *m, void *data)
 {
struct drm_info_node *node = (struct drm_info_node *)m->private;
-   unsigned ttm_pl = *(int *)node->info_ent->data;
+   unsigned ttm_pl = (uintptr_t)node->info_ent->data;
struct drm_device *dev = node->minor->dev;
struct amdgpu_device *adev = dev->dev_private;
struct ttm_mem_type_manager *man = &adev->mman.bdev.man[ttm_pl];
@@ -2218,12 +2218,12 @@ static int amdgpu_mm_dump_table(struct seq_file *m, 
void *data)
return 0;
 }
 
-static int ttm_pl_vram = TTM_PL_VRAM;
-static int ttm_pl_tt = TTM_PL_TT;
-
 static const struct drm_info_list amdgpu_ttm_debugfs_list[] = {
-   {"amdgpu_vram_mm", amdgpu_mm_dump_table, 0, _pl_vram},
-   {"amdgpu_gtt_mm", amdgpu_mm_dump_table, 0, _pl_tt},
+   {"amdgpu_vram_mm", amdgpu_mm_dump_table, 0, (void *)TTM_PL_VRAM},
+   {"amdgpu_gtt_mm", amdgpu_mm_dump_table, 0, (void *)TTM_PL_TT},
+   {"amdgpu_gds_mm", amdgpu_mm_dump_table, 0, (void *)AMDGPU_PL_GDS},
+   {"amdgpu_gws_mm", amdgpu_mm_dump_table, 0, (void *)AMDGPU_PL_GWS},
+   {"amdgpu_oa_mm", amdgpu_mm_dump_table, 0, (void *)AMDGPU_PL_OA},
{"ttm_page_pool", ttm_page_alloc_debugfs, 0, NULL},
 #ifdef CONFIG_SWIOTLB
{"ttm_dma_page_pool", ttm_dma_page_alloc_debugfs, 0, NULL}
-- 
2.14.1



[PATCH 2/7] drm/amdgpu: fix up GDS/GWS/OA shifting

2018-09-14 Thread Christian König
That only worked by pure coincidence. Completely remove the shifting and
always apply correct PAGE_SHIFT.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 12 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h|  7 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c| 12 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 14 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 15 +++
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c  |  9 -
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  |  9 -
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 12 +---
 9 files changed, 25 insertions(+), 71 deletions(-)
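
The convention after this change: amdgpu_bo_size() and
amdgpu_bo_gpu_offset() stay in bytes everywhere, and the few consumers
that need page units shift at the point of use. A small illustration,
with a hypothetical buffer size:

	u64 bytes = amdgpu_bo_size(gws);	/* e.g. 4096 bytes */
	u64 pages = bytes >> PAGE_SHIFT;	/* 1 page with 4 KiB pages */

One obvious unit per layer, instead of the mismatched AMDGPU_*_SHIFT
constants previously hidden in a header.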

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index d762d78e5102..8836186eb5ef 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -721,16 +721,16 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser 
*p,
e->bo_va = amdgpu_vm_bo_find(vm, ttm_to_amdgpu_bo(e->tv.bo));
 
if (gds) {
-   p->job->gds_base = amdgpu_bo_gpu_offset(gds);
-   p->job->gds_size = amdgpu_bo_size(gds);
+   p->job->gds_base = amdgpu_bo_gpu_offset(gds) >> PAGE_SHIFT;
+   p->job->gds_size = amdgpu_bo_size(gds) >> PAGE_SHIFT;
}
if (gws) {
-   p->job->gws_base = amdgpu_bo_gpu_offset(gws);
-   p->job->gws_size = amdgpu_bo_size(gws);
+   p->job->gws_base = amdgpu_bo_gpu_offset(gws) >> PAGE_SHIFT;
+   p->job->gws_size = amdgpu_bo_size(gws) >> PAGE_SHIFT;
}
if (oa) {
-   p->job->oa_base = amdgpu_bo_gpu_offset(oa);
-   p->job->oa_size = amdgpu_bo_size(oa);
+   p->job->oa_base = amdgpu_bo_gpu_offset(oa) >> PAGE_SHIFT;
+   p->job->oa_size = amdgpu_bo_size(oa) >> PAGE_SHIFT;
}
 
if (!r && p->uf_entry.tv.bo) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
index e73728d90388..ecbcefe49a98 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
@@ -24,13 +24,6 @@
 #ifndef __AMDGPU_GDS_H__
 #define __AMDGPU_GDS_H__
 
-/* Because TTM request that alloacted buffer should be PAGE_SIZE aligned,
- * we should report GDS/GWS/OA size as PAGE_SIZE aligned
- * */
-#define AMDGPU_GDS_SHIFT   2
-#define AMDGPU_GWS_SHIFT   PAGE_SHIFT
-#define AMDGPU_OA_SHIFTPAGE_SHIFT
-
 struct amdgpu_ring;
 struct amdgpu_bo;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index d30a0838851b..7b3d1ebda9df 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -244,16 +244,10 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void 
*data,
return -EINVAL;
}
flags |= AMDGPU_GEM_CREATE_NO_CPU_ACCESS;
-   if (args->in.domains == AMDGPU_GEM_DOMAIN_GDS)
-   size = size << AMDGPU_GDS_SHIFT;
-   else if (args->in.domains == AMDGPU_GEM_DOMAIN_GWS)
-   size = size << AMDGPU_GWS_SHIFT;
-   else if (args->in.domains == AMDGPU_GEM_DOMAIN_OA)
-   size = size << AMDGPU_OA_SHIFT;
-   else
-   return -EINVAL;
+   /* GDS allocations must be DW aligned */
+   if (args->in.domains & AMDGPU_GEM_DOMAIN_GDS)
+   size = ALIGN(size, 4);
}
-   size = roundup(size, PAGE_SIZE);
 
if (flags & AMDGPU_GEM_CREATE_VM_ALWAYS_VALID) {
r = amdgpu_bo_reserve(vm->root.base.bo, false);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index b766270d86cb..64cc483db973 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -528,13 +528,13 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void 
*data, struct drm_file
struct drm_amdgpu_info_gds gds_info;
 
memset(&gds_info, 0, sizeof(gds_info));
-   gds_info.gds_gfx_partition_size = adev->gds.mem.gfx_partition_size >> AMDGPU_GDS_SHIFT;
-   gds_info.compute_partition_size = adev->gds.mem.cs_partition_size >> AMDGPU_GDS_SHIFT;
-   gds_info.gds_total_size = adev->gds.mem.total_size >> AMDGPU_GDS_SHIFT;
-   gds_info.gws_per_gfx_partition = adev->gds.gws.gfx_partition_size >> AMDGPU_GWS_SHIFT;
-   gds_info.gws_per_compute_partition = adev->gds.gws.cs_partition_size >> AMDGPU_GWS_SHIFT;
-   gds_info.oa_per_gfx_partition = adev->gds.oa.gfx_partition_size >> AMDGPU_OA_SHIFT;
-   gds_info.oa_per_compute_partition = adev->gds.oa.cs_partition_size >> 

Re: [PATCH] [RFC]drm: add syncobj timeline support v5

2018-09-14 Thread Christian König

On 2018-09-14 20:24, Daniel Vetter wrote:

On Fri, Sep 14, 2018 at 6:43 PM, Christian König
 wrote:

On 2018-09-14 18:10, Daniel Vetter wrote:

On Fri, Sep 14, 2018 at 12:49:45PM +0200, Christian König wrote:

On 2018-09-14 12:37, Chunming Zhou wrote:

This patch is for VK_KHR_timeline_semaphore extension, semaphore is
called syncobj in kernel side:
This extension introduces a new type of syncobj that has an integer
payload
identifying a point in a timeline. Such timeline syncobjs support the
following operations:
  * CPU query - A host operation that allows querying the payload of
the
timeline syncobj.
  * CPU wait - A host operation that allows a blocking wait for a
timeline syncobj to reach a specified value.
  * Device wait - A device operation that allows waiting for a
timeline syncobj to reach a specified value.
  * Device signal - A device operation that allows advancing the
timeline syncobj to a specified value.

Since it's a timeline, that means the front time point(PT) always is
signaled before the late PT.
a. signal PT design:
Signal PT fence N depends on PT[N-1] fence and signal operation fence,
when PT[N] fence is signaled,
the timeline will increase to value of PT[N].
b. wait PT design:
Wait PT fence is signaled by reaching timeline point value, when
timeline is increasing, will compare
wait PTs value with new timeline value, if PT value is lower than
timeline value, then wait PT will be
signaled, otherwise keep in list. syncobj wait operation can wait on any
point of timeline,
so need an RB tree to order them. And wait PT could be ahead of signal PT,
we need a submission fence to
perform that.
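
The rule in (b) reduces to one invariant: a wait point may signal as soon
as the timeline payload reaches its value. A standalone sketch of just
that rule in illustrative C follows; this is not the proposed drm_syncobj
API, and all names here are hypothetical:

struct wait_pt {
	u64 value;		/* timeline point this waiter blocks on */
	bool signaled;
};

/* Called when a signal point completes and the payload advances. */
static void timeline_advance(u64 *payload, u64 new_value,
			     struct wait_pt *waiters, unsigned int count)
{
	unsigned int i;

	if (new_value <= *payload)
		return;		/* points signal in order, never backwards */
	*payload = new_value;
	for (i = 0; i < count; i++)
		if (!waiters[i].signaled && waiters[i].value <= *payload)
			waiters[i].signaled = true;	/* wake this waiter */
}

The RB tree mentioned above exists so that such a scan only has to visit
waiters at or below the new payload value.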

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate patch.
(Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate
patch.
5. drop the submission_fence implementation and instead use wait_event()
for that. (Christian)
6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with timeline implementation. (Vetter and
Christian)
   a. normal syncobj signal op will create a signal PT to tail of
signal pt list.
   b. normal syncobj wait op will create a wait pt with last signal
point, and this wait PT is only signaled by related signal point PT.
2. many bug fix and clean up
3. stub fence moving is moved to other patch.

v4:
1. fix RB tree loop with while(node=rb_first(...)). (Christian)
2. fix syncobj lifecycle. (Christian)
3. only enable_signaling when there is wait_pt. (Christian)
4. fix timeline path issues.
5. write a timeline test in libdrm

v5: (Christian)
1. semaphore is called syncobj in kernel side.
2. don't need 'timeline' characters in some function name.
3. keep syncobj cb

normal syncobj is tested by ./deqp-vk -n dEQP-VK*semaphore*
timeline syncobj is tested by ./amdgpu_test -s 9

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 

At least on first glance that looks like it should work, going to do a
detailed review on Monday.

Just for my understanding, it's all condensed down to 1 patch now? I kinda
didn't follow the detailed discussion last few days at all :-/


I've already committed all the cleanup/fix prerequisites to drm-misc-next.

The driver specific implementation needs to come on top and maybe a new CPU
wait IOCTL.

But essentially this patch is just the core of the kernel implementation.

Ah cool, missed that.


Also, is there a testcase, igt highly preferred (because then we'll run it
in our intel-gfx CI, and a bunch of people outside of intel have already
discovered that and are using it).


libdrm patches and I think amdgpu based test cases were already published
as well.

Not sure about igt testcases.

I guess we can write them when the intel implementation shows up. Just
kinda still hoping that we'd have a more unified test suite. And not
really well-kept secret: We do have an amdgpu in our CI, in the form
of kbl-g :-) But unfortunately it's not running the full test set for
patches (only for drm-tip). But we could perhaps run more of the
amdgpu tests somehow, if there's serious interest.


Well I wouldn't mind if we sooner or later get rid of the amdgpu unit 
tests in libdrm.


They are more or less just a really bloody mess.

Christian.



Cheers, Daniel



Christian.



Thanks, Daniel


Christian.


---
drivers/gpu/drm/drm_syncobj.c  | 294
++---
drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
include/drm/drm_syncobj.h  |  62 +++--
include/uapi/drm/drm.h |   1 +
4 files changed, 292 insertions(+), 69 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c
b/drivers/gpu/drm/drm_syncobj.c
index e9ce623d049e..e78d076f2703 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ 

Re: [PATCH] [RFC]drm: add syncobj timeline support v5

2018-09-14 Thread Daniel Vetter
On Fri, Sep 14, 2018 at 6:43 PM, Christian König
 wrote:
> On 2018-09-14 18:10, Daniel Vetter wrote:
>>
>> On Fri, Sep 14, 2018 at 12:49:45PM +0200, Christian König wrote:
>>>
>>> On 2018-09-14 12:37, Chunming Zhou wrote:

 This patch is for VK_KHR_timeline_semaphore extension, semaphore is
 called syncobj in kernel side:
 This extension introduces a new type of syncobj that has an integer
 payload
 identifying a point in a timeline. Such timeline syncobjs support the
 following operations:
  * CPU query - A host operation that allows querying the payload of
 the
timeline syncobj.
  * CPU wait - A host operation that allows a blocking wait for a
timeline syncobj to reach a specified value.
  * Device wait - A device operation that allows waiting for a
timeline syncobj to reach a specified value.
  * Device signal - A device operation that allows advancing the
timeline syncobj to a specified value.

 Since it's a timeline, that means the front time point(PT) always is
 signaled before the late PT.
 a. signal PT design:
 Signal PT fence N depends on PT[N-1] fence and signal operation fence,
 when PT[N] fence is signaled,
 the timeline will increase to value of PT[N].
 b. wait PT design:
 Wait PT fence is signaled by reaching timeline point value, when
 timeline is increasing, will compare
 wait PTs value with new timeline value, if PT value is lower than
 timeline value, then wait PT will be
 signaled, otherwise keep in list. syncobj wait operation can wait on any
 point of timeline,
 so need an RB tree to order them. And wait PT could be ahead of signal PT,
 we need a submission fence to
 perform that.

 v2:
 1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
 2. move unexposed definitions to .c file. (Daniel Vetter)
 3. split up the change to drm_syncobj_find_fence() in a separate patch.
 (Christian)
 4. split up the change to drm_syncobj_replace_fence() in a separate
 patch.
 5. drop the submission_fence implementation and instead use wait_event()
 for that. (Christian)
 6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

 v3:
 1. replace normal syncobj with timeline implementation. (Vetter and
 Christian)
   a. normal syncobj signal op will create a signal PT to tail of
 signal pt list.
   b. normal syncobj wait op will create a wait pt with last signal
 point, and this wait PT is only signaled by related signal point PT.
 2. many bug fix and clean up
 3. stub fence moving is moved to other patch.

 v4:
 1. fix RB tree loop with while(node=rb_first(...)). (Christian)
 2. fix syncobj lifecycle. (Christian)
 3. only enable_signaling when there is wait_pt. (Christian)
 4. fix timeline path issues.
 5. write a timeline test in libdrm

 v5: (Christian)
 1. semaphore is called syncobj in kernel side.
 2. don't need 'timeline' characters in some function name.
 3. keep syncobj cb

 normal syncobj is tested by ./deqp-vk -n dEQP-VK*semaphore*
 timeline syncobj is tested by ./amdgpu_test -s 9

 Signed-off-by: Chunming Zhou 
 Cc: Christian Konig 
 Cc: Dave Airlie 
 Cc: Daniel Rakos 
 Cc: Daniel Vetter 
>>>
>>> At least on first glance that looks like it should work, going to do a
>>> detailed review on Monday.
>>
>> Just for my understanding, it's all condensed down to 1 patch now? I kinda
>> didn't follow the detailed discussion last few days at all :-/
>
>
> I've already committed all the cleanup/fix prerequisites to drm-misc-next.
>
> The driver specific implementation needs to come on top and maybe a new CPU
> wait IOCTL.
>
> But essentially this patch is just the core of the kernel implementation.

Ah cool, missed that.

>> Also, is there a testcase, igt highly preferred (because then we'll run it
>> in our intel-gfx CI, and a bunch of people outside of intel have already
>> discovered that and are using it).
>
>
> libdrm patches and I think amdgpu based test cases were already published
> as well.
>
> Not sure about igt testcases.

I guess we can write them when the intel implementation shows up. Just
kinda still hoping that we'd have a more unified test suite. And not
really well-kept secret: We do have an amdgpu in our CI, in the form
of kbl-g :-) But unfortunately it's not running the full test set for
patches (only for drm-tip). But we could perhaps run more of the
amdgpu tests somehow, if there's serious interest.

Cheers, Daniel


> Christian.
>
>
>>
>> Thanks, Daniel
>>
>>> Christian.
>>>
 ---
drivers/gpu/drm/drm_syncobj.c  | 294
 ++---
drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
include/drm/drm_syncobj.h  |  62 +++--

Re: [PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier v4

2018-09-14 Thread Christian König

On 2018-09-14 19:47, Philip Yang wrote:

On 2018-09-14 03:51 AM, Christian König wrote:

On 2018-09-13 23:51, Felix Kuehling wrote:

On 2018-09-13 04:52 PM, Philip Yang wrote:
Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
DRM_AMDGPU_USERPTR Kconfig.

It supports both KFD userptr and gfx userptr paths.

This depends on several HMM patchsets from Jérôme Glisse queued for
upstream.

Change-Id: Ie62c3c5e3c5b8521ab3b438d1eff2aa2a003835e
Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/amd/amdgpu/Kconfig |   6 +-
  drivers/gpu/drm/amd/amdgpu/Makefile    |   2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 121 
++---

  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h |   2 +-
  4 files changed, 56 insertions(+), 75 deletions(-)
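
For readers following the diff below: the core of the rewiring is an
hmm_mirror_ops table replacing the mmu_notifier ops. A minimal sketch,
assuming the HMM mirror API from the queued patches (the invalidation
callback name is hypothetical; amdgpu_hmm_mirror_release appears in the
patch itself):

static const struct hmm_mirror_ops amdgpu_hmm_mirror_ops = {
	/* invalidation hook, replacing invalidate_range_start */
	.sync_cpu_device_pagetables = amdgpu_mn_sync_pagetables,
	/* mm teardown, replacing the mmu_notifier release callback */
	.release = amdgpu_hmm_mirror_release,
};

/* at init time, instead of mmu_notifier_register(): */
amn->mirror.ops = &amdgpu_hmm_mirror_ops;
r = hmm_mirror_register(&amn->mirror, mm);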

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
b/drivers/gpu/drm/amd/amdgpu/Kconfig

index 9221e54..960a633 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -26,10 +26,10 @@ config DRM_AMDGPU_CIK
  config DRM_AMDGPU_USERPTR
  bool "Always enable userptr write support"
  depends on DRM_AMDGPU
-    select MMU_NOTIFIER
+    select HMM_MIRROR
  help
-  This option selects CONFIG_MMU_NOTIFIER if it isn't already
-  selected to enabled full userptr support.
+  This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it
+  isn't already selected to enable full userptr support.
    config DRM_AMDGPU_GART_DEBUGFS
  bool "Allow GART access through debugfs"
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile

index 138cb78..c1e5d43 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -171,7 +171,7 @@ endif
  amdgpu-$(CONFIG_COMPAT) += amdgpu_ioc32.o
  amdgpu-$(CONFIG_VGA_SWITCHEROO) += amdgpu_atpx_handler.o
  amdgpu-$(CONFIG_ACPI) += amdgpu_acpi.o
-amdgpu-$(CONFIG_MMU_NOTIFIER) += amdgpu_mn.o
+amdgpu-$(CONFIG_HMM) += amdgpu_mn.o
    include $(FULL_AMD_PATH)/powerplay/Makefile
  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c

index e55508b..ad52f34 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -45,7 +45,7 @@
    #include 
  #include 
-#include <linux/mmu_notifier.h>
+#include <linux/hmm.h>
  #include 
  #include 
  #include 
@@ -66,6 +66,7 @@

Need to remove @mn documentation.


   * @objects: interval tree containing amdgpu_mn_nodes
   * @read_lock: mutex for recursive locking of @lock
   * @recursion: depth of recursion
+ * @mirror: HMM mirror function support
   *
   * Data for each amdgpu device and process address space.
   */
@@ -73,7 +74,6 @@ struct amdgpu_mn {
  /* constant after initialisation */
  struct amdgpu_device    *adev;
  struct mm_struct    *mm;
-    struct mmu_notifier    mn;
  enum amdgpu_mn_type    type;
    /* only used on destruction */
@@ -87,6 +87,9 @@ struct amdgpu_mn {
  struct rb_root_cached    objects;
  struct mutex    read_lock;
  atomic_t    recursion;
+
+    /* HMM mirror */
+    struct hmm_mirror    mirror;
  };
    /**
@@ -103,7 +106,7 @@ struct amdgpu_mn_node {
  };
    /**
- * amdgpu_mn_destroy - destroy the MMU notifier
+ * amdgpu_mn_destroy - destroy the HMM mirror
   *
   * @work: previously scheduled work item
   *
@@ -129,28 +132,26 @@ static void amdgpu_mn_destroy(struct 
work_struct *work)

  }
  up_write(&amn->lock);
  mutex_unlock(&adev->mn_lock);
-    mmu_notifier_unregister_no_release(&amn->mn, amn->mm);
+    hmm_mirror_unregister(&amn->mirror);
+
  kfree(amn);
  }
    /**
   * amdgpu_mn_release - callback to notify about mm destruction

Update the function name in the comment.


   *
- * @mn: our notifier
- * @mm: the mm this callback is about
+ * @mirror: the HMM mirror (mm) this callback is about
   *
- * Shedule a work item to lazy destroy our notifier.
+ * Schedule a work item to lazily destroy the HMM mirror.
   */
-static void amdgpu_mn_release(struct mmu_notifier *mn,
-  struct mm_struct *mm)
+static void amdgpu_hmm_mirror_release(struct hmm_mirror *mirror)
  {
-    struct amdgpu_mn *amn = container_of(mn, struct amdgpu_mn, mn);
+    struct amdgpu_mn *amn = container_of(mirror, struct amdgpu_mn, 
mirror);

    INIT_WORK(&amn->work, amdgpu_mn_destroy);
  schedule_work(&amn->work);
  }
  -
  /**
   * amdgpu_mn_lock - take the write side lock for this notifier
   *
@@ -237,21 +238,19 @@ static void amdgpu_mn_invalidate_node(struct 
amdgpu_mn_node *node,

  /**
   * amdgpu_mn_invalidate_range_start_gfx - callback to notify 
about mm change

   *
- * @mn: our notifier
- * @mm: the mm this callback is about
- * @start: start of updated range
- * @end: end of updated range
+ * @mirror: the hmm_mirror (mm) is about to update
+ * @update: the update start, end address
   *
   * Block for operations on BOs to finish and mark pages as 
accessed and

   * potentially dirty.
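
For context, the hmm_mirror pattern this patch adopts boils down to
registering a mirror with a callback that is invoked whenever the CPU page
tables change. The following is only a sketch, not the amdgpu code: the my_*
names are hypothetical, and the callback signature assumes the
hmm_update-based interface from the queued HMM patchsets mentioned in the
commit message.

#include <linux/hmm.h>

struct my_mirror {
	struct hmm_mirror mirror;
	/* driver state for the mirrored address space ... */
};

/* CPU page tables changed: stop device access to the range and
 * invalidate the device's copies before the CPU update proceeds. */
static int my_sync_cpu_device_pagetables(struct hmm_mirror *mirror,
					 const struct hmm_update *update)
{
	struct my_mirror *m = container_of(mirror, struct my_mirror, mirror);

	/* invalidate [update->start, update->end) in m's device mappings */
	pr_debug("invalidate %p: %lx-%lx\n", m, update->start, update->end);
	return 0;
}

/* The mm is going away: tear down (or schedule teardown of) mirror state. */
static void my_mirror_release(struct hmm_mirror *mirror)
{
}

static const struct hmm_mirror_ops my_mirror_ops = {
	.sync_cpu_device_pagetables = my_sync_cpu_device_pagetables,
	.release = my_mirror_release,
};

static int my_mirror_register(struct my_mirror *m, struct mm_struct *mm)
{
	m->mirror.ops = &my_mirror_ops;
	/* pairs with hmm_mirror_unregister() on teardown */
	return hmm_mirror_register(&m->mirror, mm);
}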

Re: [PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier v4

2018-09-14 Thread Philip Yang

On 2018-09-14 03:51 AM, Christian König wrote:

Am 13.09.2018 um 23:51 schrieb Felix Kuehling:

On 2018-09-13 04:52 PM, Philip Yang wrote:

Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
DRM_AMDGPU_USERPTR Kconfig.

It supports both KFD userptr and gfx userptr paths.

This depends on several HMM patchsets from Jérôme Glisse queued for
upstream.

Change-Id: Ie62c3c5e3c5b8521ab3b438d1eff2aa2a003835e
Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/amd/amdgpu/Kconfig |   6 +-
  drivers/gpu/drm/amd/amdgpu/Makefile    |   2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 121 
++---

  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h |   2 +-
  4 files changed, 56 insertions(+), 75 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
b/drivers/gpu/drm/amd/amdgpu/Kconfig

index 9221e54..960a633 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -26,10 +26,10 @@ config DRM_AMDGPU_CIK
  config DRM_AMDGPU_USERPTR
  bool "Always enable userptr write support"
  depends on DRM_AMDGPU
-    select MMU_NOTIFIER
+    select HMM_MIRROR
  help
-  This option selects CONFIG_MMU_NOTIFIER if it isn't already
-  selected to enabled full userptr support.
+  This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it
+  isn't already selected to enable full userptr support.
    config DRM_AMDGPU_GART_DEBUGFS
  bool "Allow GART access through debugfs"
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile

index 138cb78..c1e5d43 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -171,7 +171,7 @@ endif
  amdgpu-$(CONFIG_COMPAT) += amdgpu_ioc32.o
  amdgpu-$(CONFIG_VGA_SWITCHEROO) += amdgpu_atpx_handler.o
  amdgpu-$(CONFIG_ACPI) += amdgpu_acpi.o
-amdgpu-$(CONFIG_MMU_NOTIFIER) += amdgpu_mn.o
+amdgpu-$(CONFIG_HMM) += amdgpu_mn.o
    include $(FULL_AMD_PATH)/powerplay/Makefile
  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c

index e55508b..ad52f34 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -45,7 +45,7 @@
    #include 
  #include 
-#include <linux/mmu_notifier.h>
+#include <linux/hmm.h>
  #include 
  #include 
  #include 
@@ -66,6 +66,7 @@

Need to remove @mn documentation.


   * @objects: interval tree containing amdgpu_mn_nodes
   * @read_lock: mutex for recursive locking of @lock
   * @recursion: depth of recursion
+ * @mirror: HMM mirror function support
   *
   * Data for each amdgpu device and process address space.
   */
@@ -73,7 +74,6 @@ struct amdgpu_mn {
  /* constant after initialisation */
  struct amdgpu_device    *adev;
  struct mm_struct    *mm;
-    struct mmu_notifier    mn;
  enum amdgpu_mn_type    type;
    /* only used on destruction */
@@ -87,6 +87,9 @@ struct amdgpu_mn {
  struct rb_root_cached    objects;
  struct mutex    read_lock;
  atomic_t    recursion;
+
+    /* HMM mirror */
+    struct hmm_mirror    mirror;
  };
    /**
@@ -103,7 +106,7 @@ struct amdgpu_mn_node {
  };
    /**
- * amdgpu_mn_destroy - destroy the MMU notifier
+ * amdgpu_mn_destroy - destroy the HMM mirror
   *
   * @work: previously scheduled work item
   *
@@ -129,28 +132,26 @@ static void amdgpu_mn_destroy(struct 
work_struct *work)

  }
  up_write(&amn->lock);
  mutex_unlock(&adev->mn_lock);
-    mmu_notifier_unregister_no_release(&amn->mn, amn->mm);
+    hmm_mirror_unregister(&amn->mirror);
+
  kfree(amn);
  }
    /**
   * amdgpu_mn_release - callback to notify about mm destruction

Update the function name in the comment.


   *
- * @mn: our notifier
- * @mm: the mm this callback is about
+ * @mirror: the HMM mirror (mm) this callback is about
   *
- * Shedule a work item to lazy destroy our notifier.
+ * Schedule a work item to lazily destroy the HMM mirror.
   */
-static void amdgpu_mn_release(struct mmu_notifier *mn,
-  struct mm_struct *mm)
+static void amdgpu_hmm_mirror_release(struct hmm_mirror *mirror)
  {
-    struct amdgpu_mn *amn = container_of(mn, struct amdgpu_mn, mn);
+    struct amdgpu_mn *amn = container_of(mirror, struct amdgpu_mn, 
mirror);

    INIT_WORK(&amn->work, amdgpu_mn_destroy);
  schedule_work(&amn->work);
  }
  -
  /**
   * amdgpu_mn_lock - take the write side lock for this notifier
   *
@@ -237,21 +238,19 @@ static void amdgpu_mn_invalidate_node(struct 
amdgpu_mn_node *node,

  /**
   * amdgpu_mn_invalidate_range_start_gfx - callback to notify about 
mm change

   *
- * @mn: our notifier
- * @mm: the mm this callback is about
- * @start: start of updated range
- * @end: end of updated range
+ * @mirror: the hmm_mirror (mm) is about to update
+ * @update: the update start, end address
   *
   * Block for operations on BOs to finish and mark pages as 
accessed and

   * potentially dirty.
   */
-static int 

Re: [PATCH] [RFC]drm: add syncobj timeline support v5

2018-09-14 Thread Christian König

Am 14.09.2018 um 18:10 schrieb Daniel Vetter:

On Fri, Sep 14, 2018 at 12:49:45PM +0200, Christian König wrote:

Am 14.09.2018 um 12:37 schrieb Chunming Zhou:

This patch is for the VK_KHR_timeline_semaphore extension; a semaphore is
called a syncobj on the kernel side:
This extension introduces a new type of syncobj that has an integer payload
identifying a point in a timeline. Such timeline syncobjs support the
following operations:
 * CPU query - A host operation that allows querying the payload of the
   timeline syncobj.
 * CPU wait - A host operation that allows a blocking wait for a
   timeline syncobj to reach a specified value.
 * Device wait - A device operation that allows waiting for a
   timeline syncobj to reach a specified value.
 * Device signal - A device operation that allows advancing the
   timeline syncobj to a specified value.

Since it's a timeline, the earlier time point (PT) is always signaled before
the later PT.
a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal operation
fence; when the PT[N] fence is signaled, the timeline increases to the value
of PT[N].
b. wait PT design:
A wait PT fence is signaled when the timeline reaches the point's value. As
the timeline increases, each wait PT's value is compared with the new
timeline value; if the PT value is lower than the timeline value, the wait PT
is signaled, otherwise it is kept in the list. A syncobj wait operation can
wait on any point of the timeline, so an RB tree is needed to order them. A
wait PT can also be ahead of any signal PT, so we need a submission fence to
handle that.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate patch. 
(Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate patch.
5. drop the submission_fence implementation and instead use wait_event() for 
that. (Christian)
6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with timeline implementation. (Vetter and Christian)
  a. a normal syncobj signal op will create a signal PT at the tail of the
signal PT list.
  b. a normal syncobj wait op will create a wait PT with the last signal
point, and this wait PT is only signaled by the related signal point PT.
2. many bug fix and clean up
3. stub fence moving is moved to other patch.

v4:
1. fix RB tree loop with while(node=rb_first(...)). (Christian)
2. fix syncobj lifecycle. (Christian)
3. only enable_signaling when there is wait_pt. (Christian)
4. fix timeline path issues.
5. write a timeline test in libdrm

v5: (Christian)
1. a semaphore is called a syncobj on the kernel side.
2. 'timeline' is not needed in some function names.
3. keep syncobj cb

normal syncobj is tested by ./deqp-vk -n dEQP-VK*semaphore*
timeline syncobj is tested by ./amdgpu_test -s 9

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 

At least on first glance that looks like it should work, going to do a
detailed review on Monday.

Just for my understanding, it's all condensed down to 1 patch now? I kinda
didn't follow the detailed discussion last few days at all :-/


I've already committed all the cleanup/fix prerequisites to drm-misc-next.

The driver-specific implementation needs to come on top, and maybe a new
CPU wait IOCTL.


But essentially this patch is just the core of the kernel implementation.


Also, is there a testcase, igt highly preferred (because then we'll run it
in our intel-gfx CI, and a bunch of people outside of intel have already
discovered that and are using it).


libdrm patches and I think amdgpu-based test cases were already
published as well.


Not sure about igt testcases.

Christian.



Thanks, Daniel


Christian.


---
   drivers/gpu/drm/drm_syncobj.c  | 294 ++---
   drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
   include/drm/drm_syncobj.h  |  62 +++--
   include/uapi/drm/drm.h |   1 +
   4 files changed, 292 insertions(+), 69 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index e9ce623d049e..e78d076f2703 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,9 @@either
   #include "drm_internal.h"
   #include 
+/* merge normal syncobj to timeline syncobj, the point interval is 1 */
+#define DRM_SYNCOBJ_NORMAL_POINT 1
+
   struct drm_syncobj_stub_fence {
struct dma_fence base;
spinlock_t lock;
@@ -82,6 +85,11 @@ static const struct dma_fence_ops drm_syncobj_stub_fence_ops 
= {
.release = drm_syncobj_stub_fence_release,
   };
+struct drm_syncobj_signal_pt {
+   struct dma_fence_array *base;
+   u64 value;
+   struct list_head list;
+};
   /**
* drm_syncobj_find - lookup and reference a sync object.
@@ -124,7 +132,7 @@ static int 
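
As an aside, the timeline semantics described in the commit message can be
modeled with a monotonic 64-bit payload. This is only a toy user-space
illustration of the signal/wait rules, not the kernel implementation (which
additionally chains signal PT fences and orders wait PTs in an RB tree):

#include <pthread.h>
#include <stdint.h>

struct timeline {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	uint64_t value;	/* highest signaled point */
};

/* Signal: advance the payload; earlier points are implicitly signaled. */
static void timeline_signal(struct timeline *t, uint64_t point)
{
	pthread_mutex_lock(&t->lock);
	if (point > t->value) {
		t->value = point;
		/* wake waiters so pending "wait PTs" re-check their value */
		pthread_cond_broadcast(&t->cond);
	}
	pthread_mutex_unlock(&t->lock);
}

/* Wait: a wait point stays pending until the payload reaches its value. */
static void timeline_wait(struct timeline *t, uint64_t point)
{
	pthread_mutex_lock(&t->lock);
	while (t->value < point)
		pthread_cond_wait(&t->cond, &t->lock);
	pthread_mutex_unlock(&t->lock);
}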

Re: [PATCH] [RFC]drm: add syncobj timeline support v5

2018-09-14 Thread Daniel Vetter
On Fri, Sep 14, 2018 at 12:49:45PM +0200, Christian König wrote:
> Am 14.09.2018 um 12:37 schrieb Chunming Zhou:
> > This patch is for the VK_KHR_timeline_semaphore extension; a semaphore is
> > called a syncobj on the kernel side:
> > This extension introduces a new type of syncobj that has an integer payload
> > identifying a point in a timeline. Such timeline syncobjs support the
> > following operations:
> > * CPU query - A host operation that allows querying the payload of the
> >   timeline syncobj.
> > * CPU wait - A host operation that allows a blocking wait for a
> >   timeline syncobj to reach a specified value.
> > * Device wait - A device operation that allows waiting for a
> >   timeline syncobj to reach a specified value.
> > * Device signal - A device operation that allows advancing the
> >   timeline syncobj to a specified value.
> > 
> > Since it's a timeline, the earlier time point (PT) is always signaled
> > before the later PT.
> > a. signal PT design:
> > Signal PT fence N depends on the PT[N-1] fence and the signal operation
> > fence; when the PT[N] fence is signaled, the timeline increases to the
> > value of PT[N].
> > b. wait PT design:
> > A wait PT fence is signaled when the timeline reaches the point's value.
> > As the timeline increases, each wait PT's value is compared with the new
> > timeline value; if the PT value is lower than the timeline value, the
> > wait PT is signaled, otherwise it is kept in the list. A syncobj wait
> > operation can wait on any point of the timeline, so an RB tree is needed
> > to order them. A wait PT can also be ahead of any signal PT, so we need
> > a submission fence to handle that.
> > 
> > v2:
> > 1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
> > 2. move unexposed definitions to .c file. (Daniel Vetter)
> > 3. split up the change to drm_syncobj_find_fence() in a separate patch. 
> > (Christian)
> > 4. split up the change to drm_syncobj_replace_fence() in a separate patch.
> > 5. drop the submission_fence implementation and instead use wait_event() 
> > for that. (Christian)
> > 6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)
> > 
> > v3:
> > 1. replace normal syncobj with timeline implementation. (Vetter and
> > Christian)
> >  a. a normal syncobj signal op will create a signal PT at the tail of
> > the signal PT list.
> >  b. a normal syncobj wait op will create a wait PT with the last signal
> > point, and this wait PT is only signaled by the related signal point PT.
> > 2. many bug fix and clean up
> > 3. stub fence moving is moved to other patch.
> > 
> > v4:
> > 1. fix RB tree loop with while(node=rb_first(...)). (Christian)
> > 2. fix syncobj lifecycle. (Christian)
> > 3. only enable_signaling when there is wait_pt. (Christian)
> > 4. fix timeline path issues.
> > 5. write a timeline test in libdrm
> > 
> > v5: (Christian)
> > 1. a semaphore is called a syncobj on the kernel side.
> > 2. 'timeline' is not needed in some function names.
> > 3. keep syncobj cb
> > 
> > normal syncobj is tested by ./deqp-vk -n dEQP-VK*semaphore*
> > timeline syncobj is tested by ./amdgpu_test -s 9
> > 
> > Signed-off-by: Chunming Zhou 
> > Cc: Christian Konig 
> > Cc: Dave Airlie 
> > Cc: Daniel Rakos 
> > Cc: Daniel Vetter 
> 
> At least on first glance that looks like it should work, going to do a
> detailed review on Monday.

Just for my understanding, it's all condensed down to 1 patch now? I kinda
didn't follow the detailed discussion last few days at all :-/

Also, is there a testcase, igt highly preferred (because then we'll run it
in our intel-gfx CI, and a bunch of people outside of intel have already
discovered that and are using it).

Thanks, Daniel

> 
> Christian.
> 
> > ---
> >   drivers/gpu/drm/drm_syncobj.c  | 294 ++---
> >   drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
> >   include/drm/drm_syncobj.h  |  62 +++--
> >   include/uapi/drm/drm.h |   1 +
> >   4 files changed, 292 insertions(+), 69 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> > index e9ce623d049e..e78d076f2703 100644
> > --- a/drivers/gpu/drm/drm_syncobj.c
> > +++ b/drivers/gpu/drm/drm_syncobj.c
> > @@ -56,6 +56,9 @@either
> >   #include "drm_internal.h"
> >   #include 
> > +/* merge normal syncobj to timeline syncobj, the point interval is 1 */
> > +#define DRM_SYNCOBJ_NORMAL_POINT 1
> > +
> >   struct drm_syncobj_stub_fence {
> > struct dma_fence base;
> > spinlock_t lock;
> > @@ -82,6 +85,11 @@ static const struct dma_fence_ops 
> > drm_syncobj_stub_fence_ops = {
> > .release = drm_syncobj_stub_fence_release,
> >   };
> > +struct drm_syncobj_signal_pt {
> > +   struct dma_fence_array *base;
> > +   u64 value;
> > +   struct list_head list;
> > +};
> >   /**
> >* drm_syncobj_find - lookup and reference a sync object.
> > @@ -124,7 +132,7 @@ static int drm_syncobj_fence_get_or_add_callback(struct 
> > drm_syncobj *syncobj,
> >   {
> >  

[ANNOUNCE] xf86-video-ati 18.1.0

2018-09-14 Thread Michel Dänzer

I'm pleased to announce the 18.1.0 release of xf86-video-ati, the Xorg
driver for ATI/AMD Radeon GPUs supported by the radeon kernel driver.
This release supports xserver versions 1.13-1.20.

Highlights:

* Fixed random screen corruption and crashes when using GLAMOR with Xorg
  1.20.
* Support for leasing RandR outputs to clients.
* Various robustness fixes for TearFree. In particular, fixed several
  cases in which disabling TearFree at runtime would result in the Xorg
  process freezing or crashing.
* Fixed some m4 related build issues with older versions of autotools.

Plus other improvements and fixes. Thanks to everybody who contributed
to this release in any way!


Emil Velikov (1):
  Do not export the DriverRec RADEON

Jammy Zhou (1):
  Remove throttling from radeon_dri2_copy_region2

Jim Qu (1):
  Wait for pending scanout update before calling drmmode_crtc_scanout_free

Keith Packard (3):
  modesetting: Record non-desktop kernel property at PreInit time
  modesetting: Create CONNECTOR_ID properties for outputs [v2]
  Add RandR leases support

Michel Dänzer (55):
  Bail from dri2_create_buffer2 if we can't get a pixmap
  glamor: Bail CreatePixmap on unsupported pixmap depth
  Drop unused drmmode_create_bo_pixmap surface parameter
  EXA: Remove old RADEONEXACreatePixmap hook
  Only initialize libdrm_radeon surface manager for >= R600
  glamor: Don't store radeon_surfaces in pixmaps
  Factor out radeon_surface_initialize helper
  Move flush from radeon_scanout_do_update to its callers
  Refactor radeon_finish helper
  Add struct radeon_buffer
  glamor: Use GBM for BO allocation when possible
  Swap pixmap privates in radeon_dri2_exchange_buffers
  Ignore RADEON_DRM_QUEUE_ERROR (0) in radeon_drm_abort_entry
  Track DRM event queue sequence number in scanout_update_pending
  Abort scanout_update_pending event when possible
  Update RandR CRTC state if set_mode_major fails in set_desired_modes
  Simplify drmmode_crtc_scanout_update
  Don't call scanout_flip/update with a legacy RandR scanout buffer
  Simplify drmmode_handle_transform
  Set drmmode_crtc->scanout_id = 0 when TearFree is disabled
  Refactor drmmode_output_set_tear_free helper
  Wait for pending flips in drmmode_output_set_tear_free
  Replace 'foo == NULL' with '!foo'
  Call drmmode_do_crtc_dpms from drmmode_crtc_dpms as well
  Use drmmode_crtc_dpms in drmmode_set_desired_modes
  Check dimensions passed to drmmode_xf86crtc_resize
  Remove #if 0'd code
  Call drmmode_crtc_gamma_do_set from drmmode_setup_colormap
  glamor: Fix glamor_block_handler argument in radeon_glamor_finish
  glamor: Invalidate cached GEM handle in radeon_set_pixmap_bo
  Don't allocate drmmode_output->props twice
  Hardcode "non-desktop" RandR property name
  Remove drmmode_terminate_leases
  Use strcpy for RandR output property names
  Bump version to 18.0.99
  glamor: Use glamor_egl_create_textured_pixmap_from_gbm_bo when possible
  glamor: Set RADEON_CREATE_PIXMAP_DRI2 for DRI3 pixmaps
  Store FB for each CRTC in drmmode_flipdata_rec
  Use correct FB handle in radeon_do_pageflip
  Move DRM event queue related initialization to radeon_drm_queue_init
  Add radeon_drm_wait_pending_flip function
  Add radeon_drm_handle_event wrapper for drmHandleEvent
  Defer vblank event handling while waiting for a pending flip
  Remove drmmode_crtc_private_rec::present_vblank_* related code
  Add m4 directory
  Use AC_CONFIG_MACRO_DIR instead of AC_CONFIG_MACRO_DIRS
  EXA: Handle NULL BO pointer in radeon_set_pixmap_bo
  Handle ihandle == -1 in radeon_set_shared_pixmap_backing
  EXA: Handle ihandle == -1 in RADEONEXASharePixmapBacking
  glamor: Handle ihandle == -1 in radeon_glamor_set_shared_pixmap_backing
  Always delete entry from list in drm_queue_handler
  Don't use xorg_list_for_each_entry_safe for signalled flips
  Bail early from drm_wait_pending_flip if there's no pending flip
  Fix uninitialized use of local variable pitch in radeon_setup_kernel_mem
  Bump version for 18.1.0 release

git tag: xf86-video-ati-18.1.0

https://xorg.freedesktop.org/archive/individual/driver/xf86-video-ati-18.1.0.tar.bz2
MD5:  7910883fff7f4a462efac0fe059ed7e3  xf86-video-ati-18.1.0.tar.bz2
SHA1: 87beb7d09f5b722570adda9a5a1822cbd19e7059  xf86-video-ati-18.1.0.tar.bz2
SHA256: 6c335f423c1dc3d904550d41cb871ca4130ba7037dda67d82e3f1555e1bfb9ac  
xf86-video-ati-18.1.0.tar.bz2
SHA512: 
7a58c9a6cb4876bd2ff37d837372b4e360e81fec7de6a6c7a48d70a5338d62745f734f5d4207f30aa368ff2d9ef44f5f1ef36afd73802a618998c16fe395ed53
  xf86-video-ati-18.1.0.tar.bz2
PGP:  
https://xorg.freedesktop.org/archive/individual/driver/xf86-video-ati-18.1.0.tar.bz2.sig

https://xorg.freedesktop.org/archive/individual/driver/xf86-video-ati-18.1.0.tar.gz
MD5:  1c87fce3ebf10a0704a01433bfbf  

[pull] amdgpu/kfd, radeon, ttm, scheduler drm-next-4.20

2018-09-14 Thread Alex Deucher
Hi Dave,

First pull for 4.20 for amdgpu/kfd, radeon, ttm, and the GPU scheduler.
amdgpu/kfd:
- Picasso (new APU) support
- Raven2 (new APU) support
- Vega20 enablement
- ACP powergating improvements
- Add ABGR/XBGR display support
- VCN JPEG engine support
- Initial xGMI support
- Use load balancing for engine scheduling
- Lots of new documentation
- Rework and clean up i2c and aux handling in DC
- Add DP YCbCr 4:2:0 support in DC
- Add DMCU firmware loading for Raven (used for ABM and PSR)
- New debugfs features in DC
- LVDS support in DC
- Implement wave kill for gfx/compute (lightweight reset for shaders)
- Use AGP aperture to avoid gart mappings when possible
- GPUVM performance improvements
- Bulk moves for more efficient GPUVM LRU handling
- Merge amdgpu and amdkfd into one module
- Enable gfxoff and stutter mode on Raven
- Misc cleanups

Scheduler:
- Load balancing support
- Bug fixes

ttm:
- Bulk move functionality
- Bug fixes

radeon:
- Misc cleanups

The following changes since commit 5b394b2ddf0347bef56e50c69a58773c94343ff3:

  Linux 4.19-rc1 (2018-08-26 14:11:59 -0700)

are available in the git repository at:

  git://people.freedesktop.org/~agd5f/linux drm-next-4.20

for you to fetch changes up to 0957dc7097a3f462f6cedb45cf9b9785cc29e5bb:

  drm/amdgpu: revert "stop using gart_start as offset for the GTT domain" 
(2018-09-14 10:05:42 -0500)


Alex Deucher (22):
  drm/amdgpu/pp: endian fixes for process_pptables_v1_0.c
  drm/amdgpu/pp: endian fixes for processpptables.c
  drm/amdgpu/powerplay: check vrefresh when when changing displays
  drm/amdgpu: add AVFS control to PP_FEATURE_MASK
  drm/amdgpu/powerplay/smu7: enable AVFS control via ppfeaturemask
  drm/amdgpu/powerplay/vega10: enable AVFS control via ppfeaturemask
  Revert "drm/amdgpu: Add nbio support for vega20 (v2)"
  drm/amdgpu: remove experimental flag for vega20
  drm/amdgpu/display: add support for LVDS (v5)
  drm/amdgpu: add missing CHIP_HAINAN in amdgpu_ucode_get_load_type
  drm/amdgpu/gmc9: rework stolen vga memory handling
  drm/amdgpu/gmc9: don't keep stolen memory on Raven
  drm/amdgpu/gmc9: don't keep stolen memory on vega12
  drm/amdgpu/gmc9: don't keep stolen memory on vega20
  drm/amdgpu/gmc: add initial xgmi structure to amdgpu_gmc structure
  drm/amdgpu/gmc9: add a new gfxhub 1.1 helper for xgmi
  drm/amdgpu/gmc9: Adjust GART and AGP location with xgmi offset (v2)
  drm/amdgpu: use IP presence to free uvd and vce handles
  drm/amdgpu: set external rev id for raven2
  drm/amdgpu/soc15: clean up picasso support
  drm/amdgpu: simplify Raven, Raven2, and Picasso handling
  drm/amdgpu/display: return proper error codes in dm

Alvin lee (2):
  drm/amd/display: Enable Stereo in Dal3
  drm/amd/display: Program vsc_infopacket in commit_planes_for_stream

Amber Lin (4):
  drm/amdgpu: Merge amdkfd into amdgpu
  drm/amdgpu: Remove CONFIG_HSA_AMD_MODULE
  drm/amdgpu: Move KFD parameters to amdgpu (v3)
  drm/amdgpu: Relocate some definitions v2

Andrey Grodzovsky (8):
  drm/amdgpu: Fix page fault and kasan warning on pci device remove.
  drm/scheduler: Add job dependency trace.
  drm/amdgpu: Add job pipe sync dependecy trace
  drm/scheduler: Add stopped flag to drm_sched_entity
  drm/amdgpu: Refine gmc9 VM fault print.
  drm/amdgpu: Use drm_dev_unplug in PCI .remove
  drm/amdgpu: Fix SDMA TO after GPU reset v3
  drm/amd/display: Fix pflip IRQ status after gpu reset.

Anthony Koo (10):
  drm/amd/display: Refactor FreeSync module
  drm/amd/display: add method to check for supported range
  drm/amd/display: Fix bug where refresh rate becomes fixed
  drm/amd/display: Fix bug that causes black screen
  drm/amd/display: Add back code to allow for rounding error
  drm/amd/display: fix LFC tearing at top of screen
  drm/amd/display: refactor vupdate interrupt registration
  drm/amd/display: Correct rounding calcs in mod_freesync_is_valid_range
  drm/amd/display: add config for sending VSIF
  drm/amd/display: move edp fast boot optimization flag to stream

Bhawanpreet Lakha (3):
  drm/amd/display: Build stream update and plane updates in dm
  drm/amd/display: Add Raven2 definitions in dc
  drm/amd/display: Add DC config flag for Raven2 (v2)

Boyuan Zhang (6):
  drm/amdgpu: add emit reg write reg wait for vcn jpeg
  drm/amdgpu: add system interrupt register offset header
  drm/amdgpu: add system interrupt mask for jrbc
  drm/amdgpu: enable system interrupt for jrbc
  drm/amdgpu: add emit trap for vcn jpeg
  drm/amdgpu: fix emit frame size and comments for jpeg

Charlene Liu (2):
  drm/amd/display: pass compat_level to hubp
  drm/amd/display: add retimer log for HWQ tuning use.

Chiawen Huang (2):
  drm/amd/display: add aux transition event log.
  

[ANNOUNCE] xf86-video-amdgpu 18.1.0

2018-09-14 Thread Michel Dänzer

I'm pleased to announce the 18.1.0 release of xf86-video-amdgpu, the
Xorg driver for AMD Radeon GPUs supported by the amdgpu kernel driver.
This release supports xserver versions 1.13-1.20.

Highlights:

* When using DC as of Linux 4.17:
  - Support advanced colour management functionality.
  - Support gamma correction and X11 colormaps when Xorg runs at depth
30 as well.
* Support for leasing RandR outputs to clients.
* Various robustness fixes for TearFree. In particular, fixed several
  cases in which disabling TearFree at runtime would result in the Xorg
  process freezing or crashing.
* Fixed some m4 related build issues with older versions of autotools.

Plus other improvements and fixes. Thanks to everybody who contributed
to this release in any way!


Emil Velikov (3):
  Move amdgpu_bus_id/amgpu_kernel_mode within amdgpu_kernel_open_fd
  Do not export the DriverRec AMDGPU
  Remove set but unused amdgpu_dri2::pKernelDRMVersion

Jim Qu (1):
  Wait for pending scanout update before calling drmmode_crtc_scanout_free

Keith Packard (3):
  modesetting: Record non-desktop kernel property at PreInit time
  modesetting: Create CONNECTOR_ID properties for outputs [v2]
  Add RandR leases support

Leo Li (Sunpeng) (7):
  Cache color property IDs and LUT sizes during pre-init
  Initialize color properties on CRTC during CRTC init
  Configure color properties when creating output resources
  Update color properties on output_get_property
  Enable setting of color properties via RandR
  Compose non-legacy with legacy regamma LUT
  Also compose LUT when setting legacy gamma

Michel Dänzer (48):
  Post-release version bump
  Ignore AMDGPU_DRM_QUEUE_ERROR (0) in amdgpu_drm_abort_entry
  Track DRM event queue sequence number in scanout_update_pending
  Abort scanout_update_pending event when possible
  Update RandR CRTC state if set_mode_major fails in set_desired_modes
  Simplify drmmode_crtc_scanout_update
  Don't call scanout_flip/update with a legacy RandR scanout buffer
  Simplify drmmode_handle_transform
  Set drmmode_crtc->scanout_id = 0 when TearFree is disabled
  Refactor drmmode_output_set_tear_free helper
  Wait for pending flips in drmmode_output_set_tear_free
  Replace 'foo == NULL' with '!foo'
  Call drmmode_do_crtc_dpms from drmmode_crtc_dpms as well
  Use drmmode_crtc_dpms in drmmode_set_desired_modes
  Check dimensions passed to drmmode_xf86crtc_resize
  Don't apply gamma to HW cursor data if colour management is enabled
  Remove #if 0'd code
  Call drmmode_crtc_gamma_do_set from drmmode_setup_colormap
  Bail from dri2_create_buffer2 if we can't get a pixmap
  glamor: Bail CreatePixmap on unsupported pixmap depth
  Move flush from radeon_scanout_do_update to its callers
  Support gamma correction & colormaps at depth 30 as well
  Hardcode "non-desktop" RandR property name
  Free previous xf86CrtcRec gamma LUT memory
  Don't use DRM_IOCTL_GEM_FLINK in create_pixmap_for_fbcon
  Remove AMDGPUInfoRec::fbcon_pixmap
  Remove drmmode_terminate_leases
  Use strcpy for RandR output property names
  glamor: Set AMDGPU_CREATE_PIXMAP_DRI2 for DRI3 pixmaps
  Store FB for each CRTC in drmmode_flipdata_rec
  glamor: Use glamor_egl_create_textured_pixmap_from_gbm_bo when possible
  glamor: Check glamor module version for depth 30 support
  Move DRM event queue related initialization to amdgpu_drm_queue_init
  Add amdgpu_drm_wait_pending_flip function
  Add amdgpu_drm_handle_event wrapper for drmHandleEvent
  Defer vblank event handling while waiting for a pending flip
  Remove drmmode_crtc_private_rec::present_vblank_* related code
  Use correct FB handle in amdgpu_do_pageflip
  Add m4 directory
  Use AC_CONFIG_MACRO_DIR instead of AC_CONFIG_MACRO_DIRS
  Handle ihandle == -1 in amdgpu_set_shared_pixmap_backing
  glamor: Handle ihandle == -1 in amdgpu_glamor_set_shared_pixmap_backing
  Always delete entry from list in drm_queue_handler
  Don't use xorg_list_for_each_entry_safe for signalled flips
  Do not push the CM_GAMMA_LUT property values in drmmode_crtc_cm_init
  Bail early from drm_wait_pending_flip if there's no pending flip
  Bail from drmmode_cm_init if there's no CRTC
  Bump version for the 18.1.0 release

Slava Grigorev (1):
  Include xf86platformBus.h unconditionally

git tag: xf86-video-amdgpu-18.1.0

https://xorg.freedesktop.org/archive/individual/driver/xf86-video-amdgpu-18.1.0.tar.bz2
MD5:  5d75f5993cda5e013cd851c5947ec450  xf86-video-amdgpu-18.1.0.tar.bz2
SHA1: d3097af7da3b56396721e214f348e7ceb5f3a358  xf86-video-amdgpu-18.1.0.tar.bz2
SHA256: e11f25bb51d718b8ea938ad2b8095323c0ab16f4ddffd92091d80f9a445a9672  
xf86-video-amdgpu-18.1.0.tar.bz2
SHA512: 

Re: [PATCH 0/9] KFD upstreaming September 2018

2018-09-14 Thread Alex Deucher
On Wed, Sep 12, 2018 at 9:44 PM Felix Kuehling  wrote:
>
> This patch series is based on amd-staging-drm-next.
>
> Patches 1-3 are important fixes that would be good to be included in
> drm-fixes for 4.19.
>
> Patches 3-8 are small feature enhancements.
>
> Patch 9 is random cleanup.
>
> I'll send a separate patch series that adds Vega20 support to KFD on top of 
> this.
>
> Amber Lin (1):
>   drm/amdgpu: Fix SDMA HQD destroy error on gfx_v7
>
> Emily Deng (1):
>   drm/amdkfd: KFD doesn't support TONGA SRIOV
>
> Eric Huang (1):
>   drm/amdkfd: reflect atomic support in IO link properties
>
> Felix Kuehling (2):
>   drm/amdkfd: Report SDMA firmware version in the topology
>   drm/amdgpu: remove unnecessary forward declaration
>
> Harish Kasiviswanathan (1):
>   drm/amdgpu: Enable BAD_OPCODE intr for gfx8
>
> Jay Cornwall (1):
>   drm/amdkfd: Add wavefront context save state retrieval ioctl
>
> Yong Zhao (2):
>   drm/amdkfd: Change the control stack MTYPE from UC to NC on GFX9
>   drm/amdkfd: Fix ATS capablity was not reported correctly on some APUs
>

Series is:
Acked-by: Alex Deucher 


>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |  6 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c  |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c  |  5 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   | 21 +++
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c| 35 +---
>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 37 +
>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |  8 +++
>  drivers/gpu/drm/amd/amdkfd/kfd_iommu.c | 13 -
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h   |  8 +++
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c| 25 -
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c| 23 
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h  | 12 
>  .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 22 
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c  | 64 
> +-
>  drivers/gpu/drm/amd/include/kgd_kfd_interface.h|  2 +-
>  include/uapi/linux/kfd_ioctl.h | 13 -
>  17 files changed, 251 insertions(+), 47 deletions(-)
>
> --
> 2.7.4
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 0/6] Initial Vega20 support for KFD

2018-09-14 Thread Alex Deucher
On Wed, Sep 12, 2018 at 9:44 PM Felix Kuehling  wrote:
>
> This patch series is based on amd-staging-drm-next + the patch series
> "KFD upstreaming September 2018".
>
> Emily Deng (1):
>   drm/amdgpu/sriov: Correct the setting about sdma doorbell offset of
> Vega10
>
> Shaoyun Liu (5):
>   drm/amdgpu: Doorbell assignment for 8 sdma user queue per engine
>   drm/amdkfd: Make the number of SDMA queues variable
>   drm/amd: Interface change to support 64 bit page_table_base
>   drm/amdgpu: Add vega20 support on kfd probe
>   drm/amdkfd: Vega20 bring up on amdkfd side
>

Series is:
Acked-by: Alex Deucher 

>  drivers/gpu/drm/amd/amdgpu/amdgpu.h| 23 +++---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 50 
> +++---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c  |  7 +--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c  |  7 +--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c  |  7 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c   |  8 +++-
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 12 --
>  drivers/gpu/drm/amd/amdkfd/kfd_crat.c  |  1 +
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c| 33 ++
>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 18 +---
>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |  1 -
>  drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c   |  1 +
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c  |  1 +
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c   |  3 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c   |  1 +
>  drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c|  1 +
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h  |  3 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c  |  1 +
>  drivers/gpu/drm/amd/include/kgd_kfd_interface.h| 10 ++---
>  20 files changed, 136 insertions(+), 54 deletions(-)
>
> --
> 2.7.4
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2 05/17] compat_ioctl: move more drivers to generic_compat_ioctl_ptrarg

2018-09-14 Thread David Sterba
On Wed, Sep 12, 2018 at 05:08:52PM +0200, Arnd Bergmann wrote:
> The .ioctl and .compat_ioctl file operations have the same prototype so
> they can both point to the same function, which works great almost all
> the time when all the commands are compatible.
> 
> One exception is the s390 architecture, where a compat pointer is only
> 31 bit wide, and converting it into a 64-bit pointer requires calling
> compat_ptr(). Most drivers here will never run on s390, but since we now
> have a generic helper for it, it's easy enough to use it consistently.
> 
> I double-checked all these drivers to ensure that all ioctl arguments
> are used as pointers or are ignored, but are not interpreted as integer
> values.
> 
> Signed-off-by: Arnd Bergmann 
> ---

>  fs/btrfs/super.c| 2 +-

Acked-by: David Sterba 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
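
For reference, such a generic helper essentially just converts the compat
pointer and forwards to the native handler. A sketch along the lines of the
series (see the actual patch for the exact implementation):

static long generic_compat_ioctl_ptrarg(struct file *file, unsigned int cmd,
					unsigned long arg)
{
	if (!file->f_op->unlocked_ioctl)
		return -ENOIOCTLCMD;

	/* compat_ptr() widens the 31-bit s390 compat pointer into a real
	 * 64-bit kernel pointer; on other architectures it is a plain cast. */
	return file->f_op->unlocked_ioctl(file, cmd,
					  (unsigned long)compat_ptr(arg));
}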


Re: [PATCH 1/2] drm/amdgpu: fix mask in GART location calculation

2018-09-14 Thread James Zhu

This series is:

Acked-by: James Zhu 
Tested-by: James Zhu 


On 2018-09-14 06:57 AM, Christian König wrote:

We need to mask off the lower bits, not the upper ones.

Fixes: ec210e3226dc0 ("drm/amdgpu: put GART away from VRAM v2")

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index ae4467113240..9a5b252784a1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -166,7 +166,7 @@ void amdgpu_gmc_gart_location(struct amdgpu_device *adev, 
struct amdgpu_gmc *mc)
else
mc->gart_start = mc->mc_mask - mc->gart_size + 1;
  
-	mc->gart_start &= four_gb - 1;
+	mc->gart_start &= ~(four_gb - 1);
mc->gart_end = mc->gart_start + mc->gart_size - 1;
dev_info(adev->dev, "GART: %lluM 0x%016llX - 0x%016llX\n",
mc->gart_size >> 20, mc->gart_start, mc->gart_end);


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
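
To see why, here is a standalone sketch (not driver code) contrasting the
two masks: four_gb - 1 keeps only the offset within a 4GB window, while
~(four_gb - 1) aligns the address down to a 4GB boundary, which is what the
GART placement needs:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t four_gb = 0x100000000ULL;
	uint64_t start = 0x123456789ULL;	/* arbitrary example address */

	/* old code: masks away the upper bits -> 0x23456789 */
	printf("0x%llx\n", (unsigned long long)(start & (four_gb - 1)));
	/* fixed code: clears the lower bits -> 0x100000000 */
	printf("0x%llx\n", (unsigned long long)(start & ~(four_gb - 1)));
	return 0;
}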


Re: [PATCH libdrm 3/3] test/amdgpu: add GDS, GWS and OA tests

2018-09-14 Thread Deucher, Alexander
Series is:

Reviewed-by: Alex Deucher 


From: amd-gfx  on behalf of Christian 
König 
Sent: Friday, September 14, 2018 9:09:06 AM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH libdrm 3/3] test/amdgpu: add GDS, GWS and OA tests

Add allocation tests for GDS, GWS and OA.

Signed-off-by: Christian König 
---
 tests/amdgpu/amdgpu_test.h | 48 +-
 tests/amdgpu/bo_tests.c| 21 
 2 files changed, 47 insertions(+), 22 deletions(-)

diff --git a/tests/amdgpu/amdgpu_test.h b/tests/amdgpu/amdgpu_test.h
index d1e14e23..af3041e5 100644
--- a/tests/amdgpu/amdgpu_test.h
+++ b/tests/amdgpu/amdgpu_test.h
@@ -207,11 +207,9 @@ static inline amdgpu_bo_handle gpu_mem_alloc(
 amdgpu_va_handle *va_handle)
 {
 struct amdgpu_bo_alloc_request req = {0};
-   amdgpu_bo_handle buf_handle;
+   amdgpu_bo_handle buf_handle = NULL;
 int r;

-   CU_ASSERT_NOT_EQUAL(vmc_addr, NULL);
-
 req.alloc_size = size;
 req.phys_alignment = alignment;
 req.preferred_heap = type;
@@ -222,16 +220,19 @@ static inline amdgpu_bo_handle gpu_mem_alloc(
 if (r)
 return NULL;

-   r = amdgpu_va_range_alloc(device_handle,
- amdgpu_gpu_va_range_general,
- size, alignment, 0, vmc_addr,
- va_handle, 0);
-   CU_ASSERT_EQUAL(r, 0);
-   if (r)
-   goto error_free_bo;
-
-   r = amdgpu_bo_va_op(buf_handle, 0, size, *vmc_addr, 0, 
AMDGPU_VA_OP_MAP);
-   CU_ASSERT_EQUAL(r, 0);
+   if (vmc_addr && va_handle) {
+   r = amdgpu_va_range_alloc(device_handle,
+ amdgpu_gpu_va_range_general,
+ size, alignment, 0, vmc_addr,
+ va_handle, 0);
+   CU_ASSERT_EQUAL(r, 0);
+   if (r)
+   goto error_free_bo;
+
+   r = amdgpu_bo_va_op(buf_handle, 0, size, *vmc_addr, 0,
+   AMDGPU_VA_OP_MAP);
+   CU_ASSERT_EQUAL(r, 0);
+   }

 return buf_handle;

@@ -256,15 +257,18 @@ static inline int gpu_mem_free(amdgpu_bo_handle bo,
 if (!bo)
 return 0;

-   r = amdgpu_bo_va_op(bo, 0, size, vmc_addr, 0, AMDGPU_VA_OP_UNMAP);
-   CU_ASSERT_EQUAL(r, 0);
-   if (r)
-   return r;
-
-   r = amdgpu_va_range_free(va_handle);
-   CU_ASSERT_EQUAL(r, 0);
-   if (r)
-   return r;
+   if (va_handle) {
+   r = amdgpu_bo_va_op(bo, 0, size, vmc_addr, 0,
+   AMDGPU_VA_OP_UNMAP);
+   CU_ASSERT_EQUAL(r, 0);
+   if (r)
+   return r;
+
+   r = amdgpu_va_range_free(va_handle);
+   CU_ASSERT_EQUAL(r, 0);
+   if (r)
+   return r;
+   }

 r = amdgpu_bo_free(bo);
 CU_ASSERT_EQUAL(r, 0);
diff --git a/tests/amdgpu/bo_tests.c b/tests/amdgpu/bo_tests.c
index dc2de9b7..7cff4cf7 100644
--- a/tests/amdgpu/bo_tests.c
+++ b/tests/amdgpu/bo_tests.c
@@ -242,6 +242,27 @@ static void amdgpu_memory_alloc(void)

 r = gpu_mem_free(bo, va_handle, bo_mc, 4096);
 CU_ASSERT_EQUAL(r, 0);
+
+   /* Test GDS */
+   bo = gpu_mem_alloc(device_handle, 1024, 0,
+   AMDGPU_GEM_DOMAIN_GDS, 0,
+   NULL, NULL);
+   r = gpu_mem_free(bo, NULL, 0, 4096);
+   CU_ASSERT_EQUAL(r, 0);
+
+   /* Test GWS */
+   bo = gpu_mem_alloc(device_handle, 1, 0,
+   AMDGPU_GEM_DOMAIN_GWS, 0,
+   NULL, NULL);
+   r = gpu_mem_free(bo, NULL, 0, 4096);
+   CU_ASSERT_EQUAL(r, 0);
+
+   /* Test OA */
+   bo = gpu_mem_alloc(device_handle, 1, 0,
+   AMDGPU_GEM_DOMAIN_OA, 0,
+   NULL, NULL);
+   r = gpu_mem_free(bo, NULL, 0, 4096);
+   CU_ASSERT_EQUAL(r, 0);
 }

 static void amdgpu_mem_fail_alloc(void)
--
2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/2] drm/amdgpu: fix mask in GART location calculation

2018-09-14 Thread Deucher, Alexander
Reviewed-by: Alex Deucher 


From: amd-gfx  on behalf of Christian 
König 
Sent: Friday, September 14, 2018 6:57 AM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH 1/2] drm/amdgpu: fix mask in GART location calculation

We need to mask off the lower bits, not the upper ones.

Fixes: ec210e3226dc0 ("drm/amdgpu: put GART away from VRAM v2")

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index ae4467113240..9a5b252784a1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -166,7 +166,7 @@ void amdgpu_gmc_gart_location(struct amdgpu_device *adev, 
struct amdgpu_gmc *mc)
 else
 mc->gart_start = mc->mc_mask - mc->gart_size + 1;

-   mc->gart_start &= four_gb - 1;
+   mc->gart_start &= ~(four_gb - 1);
 mc->gart_end = mc->gart_start + mc->gart_size - 1;
 dev_info(adev->dev, "GART: %lluM 0x%016llX - 0x%016llX\n",
 mc->gart_size >> 20, mc->gart_start, mc->gart_end);
--
2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/2] drm/amdgpu: revert "stop using gart_start as offset for the GTT domain"

2018-09-14 Thread Deucher, Alexander
Acked-by: Alex Deucher 


From: amd-gfx  on behalf of Christian 
König 
Sent: Friday, September 14, 2018 6:57:28 AM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH 2/2] drm/amdgpu: revert "stop using gart_start as offset for 
the GTT domain"

Turned out the commit is incomplete, and since we removed the use of the AGP
mapping from the GTT manager it is no longer necessary anyway.

This reverts commit 22d8bfafcc12dfa17b91d2e8ae4e1898e782003a.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 3 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 7 +++
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index c2539f6821c0..da7b1b92d9cf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -143,8 +143,7 @@ static int amdgpu_gtt_mgr_alloc(struct ttm_mem_type_manager 
*man,
 spin_unlock(&mgr->lock);

 if (!r)
-   mem->start = node->node.start +
-   (adev->gmc.gart_start >> PAGE_SHIFT);
+   mem->start = node->node.start;

 return r;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 8a158ee922f7..f12ae6b525b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -188,7 +188,7 @@ static int amdgpu_init_mem_type(struct ttm_bo_device *bdev, 
uint32_t type,
 case TTM_PL_TT:
 /* GTT memory  */
 man->func = &amdgpu_gtt_mgr_func;
-   man->gpu_offset = 0;
+   man->gpu_offset = adev->gmc.gart_start;
 man->available_caching = TTM_PL_MASK_CACHING;
 man->default_caching = TTM_PL_FLAG_CACHED;
 man->flags = TTM_MEMTYPE_FLAG_MAPPABLE | TTM_MEMTYPE_FLAG_CMA;
@@ -1060,7 +1060,7 @@ static int amdgpu_ttm_backend_bind(struct ttm_tt *ttm,
 flags = amdgpu_ttm_tt_pte_flags(adev, ttm, bo_mem);

 /* bind pages into GART page tables */
-   gtt->offset = ((u64)bo_mem->start << PAGE_SHIFT) - adev->gmc.gart_start;
+   gtt->offset = (u64)bo_mem->start << PAGE_SHIFT;
 r = amdgpu_gart_bind(adev, gtt->offset, ttm->num_pages,
 ttm->pages, gtt->ttm.dma_address, flags);

@@ -1112,8 +1112,7 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
 flags = amdgpu_ttm_tt_pte_flags(adev, bo->ttm, &tmp);

 /* Bind pages */
-   gtt->offset = ((u64)tmp.start << PAGE_SHIFT) -
-   adev->gmc.gart_start;
+   gtt->offset = (u64)tmp.start << PAGE_SHIFT;
 r = amdgpu_ttm_gart_bind(adev, bo, flags);
 if (unlikely(r)) {
 ttm_bo_mem_put(bo, );
--
2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 4/5] drm/amdgpu: always recover VRAM during GPU recovery

2018-09-14 Thread Christian König
It shouldn't add much overhead and we should make sure that critical
VRAM content is always restored.

Signed-off-by: Christian König 
Acked-by: Junwei Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 762dc5f886cd..899342c6dfad 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3003,7 +3003,7 @@ static int amdgpu_device_recover_vram_from_shadow(struct 
amdgpu_device *adev,
 }
 
 /**
- * amdgpu_device_handle_vram_lost - Handle the loss of VRAM contents
+ * amdgpu_device_recover_vram - Recover some VRAM contents
  *
  * @adev: amdgpu_device pointer
  *
@@ -3012,7 +3012,7 @@ static int amdgpu_device_recover_vram_from_shadow(struct 
amdgpu_device *adev,
  * the contents of VRAM might be lost.
  * Returns 0 on success, 1 on failure.
  */
-static int amdgpu_device_handle_vram_lost(struct amdgpu_device *adev)
+static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
 {
struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
struct amdgpu_bo *bo, *tmp;
@@ -3139,8 +3139,8 @@ static int amdgpu_device_reset(struct amdgpu_device *adev)
}
}
 
-   if (!r && ((need_full_reset && !(adev->flags & AMD_IS_APU)) || 
vram_lost))
-   r = amdgpu_device_handle_vram_lost(adev);
+   if (!r)
+   r = amdgpu_device_recover_vram(adev);
 
return r;
 }
@@ -3186,7 +3186,7 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device 
*adev,
amdgpu_virt_release_full_gpu(adev, true);
if (!r && adev->virt.gim_feature & AMDGIM_FEATURE_GIM_FLR_VRAMLOST) {
 atomic_inc(&adev->vram_lost_counter);
-   r = amdgpu_device_handle_vram_lost(adev);
+   r = amdgpu_device_recover_vram(adev);
}
 
return r;
-- 
2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 5/5] drm/amdgpu: fix shadow BO restoring

2018-09-14 Thread Christian König
Don't grab the reservation lock any more and simplify the handling quite
a bit.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 109 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  46 
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |   8 +--
 3 files changed, 43 insertions(+), 120 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 899342c6dfad..1cbc372964f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2954,54 +2954,6 @@ static int amdgpu_device_ip_post_soft_reset(struct 
amdgpu_device *adev)
return 0;
 }
 
-/**
- * amdgpu_device_recover_vram_from_shadow - restore shadowed VRAM buffers
- *
- * @adev: amdgpu_device pointer
- * @ring: amdgpu_ring for the engine handling the buffer operations
- * @bo: amdgpu_bo buffer whose shadow is being restored
- * @fence: dma_fence associated with the operation
- *
- * Restores the VRAM buffer contents from the shadow in GTT.  Used to
- * restore things like GPUVM page tables after a GPU reset where
- * the contents of VRAM might be lost.
- * Returns 0 on success, negative error code on failure.
- */
-static int amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev,
- struct amdgpu_ring *ring,
- struct amdgpu_bo *bo,
- struct dma_fence **fence)
-{
-   uint32_t domain;
-   int r;
-
-   if (!bo->shadow)
-   return 0;
-
-   r = amdgpu_bo_reserve(bo, true);
-   if (r)
-   return r;
-   domain = amdgpu_mem_type_to_domain(bo->tbo.mem.mem_type);
-   /* if bo has been evicted, then no need to recover */
-   if (domain == AMDGPU_GEM_DOMAIN_VRAM) {
-   r = amdgpu_bo_validate(bo->shadow);
-   if (r) {
-   DRM_ERROR("bo validate failed!\n");
-   goto err;
-   }
-
-   r = amdgpu_bo_restore_from_shadow(adev, ring, bo,
-NULL, fence, true);
-   if (r) {
-   DRM_ERROR("recover page table failed!\n");
-   goto err;
-   }
-   }
-err:
-   amdgpu_bo_unreserve(bo);
-   return r;
-}
-
 /**
  * amdgpu_device_recover_vram - Recover some VRAM contents
  *
@@ -3010,16 +2962,15 @@ static int 
amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev,
  * Restores the contents of VRAM buffers from the shadows in GTT.  Used to
  * restore things like GPUVM page tables after a GPU reset where
  * the contents of VRAM might be lost.
- * Returns 0 on success, 1 on failure.
+ *
+ * Returns:
+ * 0 on success, negative error code on failure.
  */
 static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
 {
-   struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
-   struct amdgpu_bo *bo, *tmp;
struct dma_fence *fence = NULL, *next = NULL;
-   long r = 1;
-   int i = 0;
-   long tmo;
+   struct amdgpu_bo *shadow;
+   long r = 1, tmo;
 
if (amdgpu_sriov_runtime(adev))
tmo = msecs_to_jiffies(8000);
@@ -3028,44 +2979,40 @@ static int amdgpu_device_recover_vram(struct 
amdgpu_device *adev)
 
DRM_INFO("recover vram bo from shadow start\n");
 mutex_lock(&adev->shadow_list_lock);
-   list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) {
-   next = NULL;
-   amdgpu_device_recover_vram_from_shadow(adev, ring, bo, &next);
+   list_for_each_entry(shadow, &adev->shadow_list, shadow_list) {
+
+   /* No need to recover an evicted BO */
+   if (shadow->tbo.mem.mem_type != TTM_PL_TT ||
+   shadow->parent->tbo.mem.mem_type != TTM_PL_VRAM)
+   continue;
+
+   r = amdgpu_bo_restore_shadow(shadow, &next);
+   if (r)
+   break;
+
if (fence) {
r = dma_fence_wait_timeout(fence, false, tmo);
-   if (r == 0)
-   pr_err("wait fence %p[%d] timeout\n", fence, i);
-   else if (r < 0)
-   pr_err("wait fence %p[%d] interrupted\n", 
fence, i);
-   if (r < 1) {
-   dma_fence_put(fence);
-   fence = next;
+   dma_fence_put(fence);
+   fence = next;
+   if (r <= 0)
break;
-   }
-   i++;
+   } else {
+   fence = next;
}
-
-   dma_fence_put(fence);
-   fence = next;
}

[PATCH 1/5] drm/amdgpu: stop pipelining VM PDs/PTs moves

2018-09-14 Thread Christian König
We are going to need this for recoverable page fault handling and it
makes shadow handling during GPU reset much easier.

Signed-off-by: Christian König 
Acked-by: Junwei Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 6 +-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index e6909252aefa..e6e5e5e50c98 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1360,7 +1360,7 @@ u64 amdgpu_bo_gpu_offset(struct amdgpu_bo *bo)
 {
WARN_ON_ONCE(bo->tbo.mem.mem_type == TTM_PL_SYSTEM);
	WARN_ON_ONCE(!ww_mutex_is_locked(&bo->tbo.resv->lock) &&
-!bo->pin_count);
+!bo->pin_count && bo->tbo.type != ttm_bo_type_kernel);
WARN_ON_ONCE(bo->tbo.mem.start == AMDGPU_BO_INVALID_OFFSET);
WARN_ON_ONCE(bo->tbo.mem.mem_type == TTM_PL_VRAM &&
 !(bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS));
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 8a158ee922f7..9e7991b1c8ff 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -524,7 +524,11 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
if (r)
goto error;
 
-   r = ttm_bo_pipeline_move(bo, fence, evict, new_mem);
+   /* Always block for VM page tables before committing the new location */
+   if (bo->type == ttm_bo_type_kernel)
+   r = ttm_bo_move_accel_cleanup(bo, fence, true, new_mem);
+   else
+   r = ttm_bo_pipeline_move(bo, fence, evict, new_mem);
dma_fence_put(fence);
return r;
 
-- 
2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 3/5] drm/amdgpu: shadow BOs don't need any alignment

2018-09-14 Thread Christian König
They aren't directly used by the hardware.

Signed-off-by: Christian König 
Reviewed-by: Junwei Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index d8e8d653d518..650c45c896f0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -526,7 +526,7 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
 }
 
 static int amdgpu_bo_create_shadow(struct amdgpu_device *adev,
-  unsigned long size, int byte_align,
+  unsigned long size,
   struct amdgpu_bo *bo)
 {
struct amdgpu_bo_param bp;
@@ -537,7 +537,6 @@ static int amdgpu_bo_create_shadow(struct amdgpu_device 
*adev,
 
	memset(&bp, 0, sizeof(bp));
bp.size = size;
-   bp.byte_align = byte_align;
bp.domain = AMDGPU_GEM_DOMAIN_GTT;
bp.flags = AMDGPU_GEM_CREATE_CPU_GTT_USWC |
AMDGPU_GEM_CREATE_SHADOW;
@@ -586,7 +585,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
WARN_ON(reservation_object_lock((*bo_ptr)->tbo.resv,
NULL));
 
-   r = amdgpu_bo_create_shadow(adev, bp->size, bp->byte_align, 
(*bo_ptr));
+   r = amdgpu_bo_create_shadow(adev, bp->size, *bo_ptr);
 
if (!bp->resv)
reservation_object_unlock((*bo_ptr)->tbo.resv);
-- 
2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 2/5] drm/amdgpu: always enable shadow BOs v2

2018-09-14 Thread Christian König
Even when GPU recovery is disabled we could run into a manually
triggered recovery.

v2: keep accidental removed comments

Signed-off-by: Christian König 
Acked-by: Emily Deng 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 14 +-
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index e6e5e5e50c98..d8e8d653d518 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -51,18 +51,6 @@
  *
  */
 
-static bool amdgpu_bo_need_backup(struct amdgpu_device *adev)
-{
-   if (adev->flags & AMD_IS_APU)
-   return false;
-
-   if (amdgpu_gpu_recovery == 0 ||
-   (amdgpu_gpu_recovery == -1  && !amdgpu_sriov_vf(adev)))
-   return false;
-
-   return true;
-}
-
 /**
  * amdgpu_bo_subtract_pin_size - Remove BO from pin_size accounting
  *
@@ -593,7 +581,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
if (r)
return r;
 
-   if ((flags & AMDGPU_GEM_CREATE_SHADOW) && amdgpu_bo_need_backup(adev)) {
+   if ((flags & AMDGPU_GEM_CREATE_SHADOW) && !(adev->flags & AMD_IS_APU)) {
if (!bp->resv)
WARN_ON(reservation_object_lock((*bo_ptr)->tbo.resv,
NULL));
-- 
2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amd/dc: Trigger set power state task when display configuration changes

2018-09-14 Thread Deucher, Alexander
Acked-by: Alex Deucher 


From: amd-gfx  on behalf of Rex Zhu 

Sent: Friday, September 14, 2018 1:57:07 AM
To: amd-gfx@lists.freedesktop.org
Cc: Zhu, Rex
Subject: [PATCH] drm/amd/dc: Trigger set power state task when display 
configuration changes

Revert "drm/amd/display: Remove call to amdgpu_pm_compute_clocks"

This reverts commit dcd473770e86517543691bdb227103d6c781cd0a.

When the display configuration changes, DC needs to propagate the changes
to powerplay and also needs to trigger a power state task.
amdgpu_pm_compute_clocks is the interface that triggers the power state
task, whether legacy dpm or powerplay is enabled.

Signed-off-by: Rex Zhu 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
index 6d16b4a..0fab64a 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
@@ -105,6 +105,8 @@ bool dm_pp_apply_display_requirements(
 adev->powerplay.pp_funcs->display_configuration_change(
 adev->powerplay.pp_handle,
 &adev->pm.pm_display_cfg);
+
+   amdgpu_pm_compute_clocks(adev);
 }

 return true;
--
1.9.1



[PATCH libdrm 2/3] test/amdgpu: add proper error handling

2018-09-14 Thread Christian König
Otherwise the calling function won't notice that something is wrong.

Signed-off-by: Christian König 
---
 tests/amdgpu/amdgpu_test.h | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/tests/amdgpu/amdgpu_test.h b/tests/amdgpu/amdgpu_test.h
index f2ece3c3..d1e14e23 100644
--- a/tests/amdgpu/amdgpu_test.h
+++ b/tests/amdgpu/amdgpu_test.h
@@ -219,17 +219,31 @@ static inline amdgpu_bo_handle gpu_mem_alloc(
 
	r = amdgpu_bo_alloc(device_handle, &req, &buf_handle);
CU_ASSERT_EQUAL(r, 0);
+   if (r)
+   return NULL;
 
r = amdgpu_va_range_alloc(device_handle,
  amdgpu_gpu_va_range_general,
  size, alignment, 0, vmc_addr,
  va_handle, 0);
CU_ASSERT_EQUAL(r, 0);
+   if (r)
+   goto error_free_bo;
 
r = amdgpu_bo_va_op(buf_handle, 0, size, *vmc_addr, 0, 
AMDGPU_VA_OP_MAP);
CU_ASSERT_EQUAL(r, 0);
 
return buf_handle;
+
+error_free_va:
+   r = amdgpu_va_range_free(*va_handle);
+   CU_ASSERT_EQUAL(r, 0);
+
+error_free_bo:
+   r = amdgpu_bo_free(buf_handle);
+   CU_ASSERT_EQUAL(r, 0);
+
+   return NULL;
 }
 
 static inline int gpu_mem_free(amdgpu_bo_handle bo,
@@ -239,16 +253,23 @@ static inline int gpu_mem_free(amdgpu_bo_handle bo,
 {
int r;
 
+   if (!bo)
+   return 0;
+
r = amdgpu_bo_va_op(bo, 0, size, vmc_addr, 0, AMDGPU_VA_OP_UNMAP);
CU_ASSERT_EQUAL(r, 0);
+   if (r)
+   return r;
 
r = amdgpu_va_range_free(va_handle);
CU_ASSERT_EQUAL(r, 0);
+   if (r)
+   return r;
 
r = amdgpu_bo_free(bo);
CU_ASSERT_EQUAL(r, 0);
 
-   return 0;
+   return r;
 }
 
 static inline int
-- 
2.14.1
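
As an illustration (not part of the patch; the test body below is
hypothetical), a caller can now detect a failed allocation by checking
for a NULL handle:

	amdgpu_bo_handle bo;
	amdgpu_va_handle va_handle;
	uint64_t vmc_addr;
	int r;

	/* gpu_mem_alloc() now returns NULL on any failure */
	bo = gpu_mem_alloc(device_handle, 4096, 4096, AMDGPU_GEM_DOMAIN_GTT,
			   0, &vmc_addr, &va_handle);
	CU_ASSERT_PTR_NOT_NULL(bo);

	if (bo) {
		/* gpu_mem_free() now propagates the first error it hits */
		r = gpu_mem_free(bo, va_handle, vmc_addr, 4096);
		CU_ASSERT_EQUAL(r, 0);
	}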



[PATCH libdrm 3/3] test/amdgpu: add GDS, GWS and OA tests

2018-09-14 Thread Christian König
Add allocation tests for GDS, GWS and OA.

Signed-off-by: Christian König 
---
 tests/amdgpu/amdgpu_test.h | 48 +-
 tests/amdgpu/bo_tests.c| 21 
 2 files changed, 47 insertions(+), 22 deletions(-)

diff --git a/tests/amdgpu/amdgpu_test.h b/tests/amdgpu/amdgpu_test.h
index d1e14e23..af3041e5 100644
--- a/tests/amdgpu/amdgpu_test.h
+++ b/tests/amdgpu/amdgpu_test.h
@@ -207,11 +207,9 @@ static inline amdgpu_bo_handle gpu_mem_alloc(
amdgpu_va_handle *va_handle)
 {
struct amdgpu_bo_alloc_request req = {0};
-   amdgpu_bo_handle buf_handle;
+   amdgpu_bo_handle buf_handle = NULL;
int r;
 
-   CU_ASSERT_NOT_EQUAL(vmc_addr, NULL);
-
req.alloc_size = size;
req.phys_alignment = alignment;
req.preferred_heap = type;
@@ -222,16 +220,19 @@ static inline amdgpu_bo_handle gpu_mem_alloc(
if (r)
return NULL;
 
-   r = amdgpu_va_range_alloc(device_handle,
- amdgpu_gpu_va_range_general,
- size, alignment, 0, vmc_addr,
- va_handle, 0);
-   CU_ASSERT_EQUAL(r, 0);
-   if (r)
-   goto error_free_bo;
-
-   r = amdgpu_bo_va_op(buf_handle, 0, size, *vmc_addr, 0, 
AMDGPU_VA_OP_MAP);
-   CU_ASSERT_EQUAL(r, 0);
+   if (vmc_addr && va_handle) {
+   r = amdgpu_va_range_alloc(device_handle,
+ amdgpu_gpu_va_range_general,
+ size, alignment, 0, vmc_addr,
+ va_handle, 0);
+   CU_ASSERT_EQUAL(r, 0);
+   if (r)
+   goto error_free_bo;
+
+   r = amdgpu_bo_va_op(buf_handle, 0, size, *vmc_addr, 0,
+   AMDGPU_VA_OP_MAP);
+   CU_ASSERT_EQUAL(r, 0);
+   }
 
return buf_handle;
 
@@ -256,15 +257,18 @@ static inline int gpu_mem_free(amdgpu_bo_handle bo,
if (!bo)
return 0;
 
-   r = amdgpu_bo_va_op(bo, 0, size, vmc_addr, 0, AMDGPU_VA_OP_UNMAP);
-   CU_ASSERT_EQUAL(r, 0);
-   if (r)
-   return r;
-
-   r = amdgpu_va_range_free(va_handle);
-   CU_ASSERT_EQUAL(r, 0);
-   if (r)
-   return r;
+   if (va_handle) {
+   r = amdgpu_bo_va_op(bo, 0, size, vmc_addr, 0,
+   AMDGPU_VA_OP_UNMAP);
+   CU_ASSERT_EQUAL(r, 0);
+   if (r)
+   return r;
+
+   r = amdgpu_va_range_free(va_handle);
+   CU_ASSERT_EQUAL(r, 0);
+   if (r)
+   return r;
+   }
 
r = amdgpu_bo_free(bo);
CU_ASSERT_EQUAL(r, 0);
diff --git a/tests/amdgpu/bo_tests.c b/tests/amdgpu/bo_tests.c
index dc2de9b7..7cff4cf7 100644
--- a/tests/amdgpu/bo_tests.c
+++ b/tests/amdgpu/bo_tests.c
@@ -242,6 +242,27 @@ static void amdgpu_memory_alloc(void)
 
r = gpu_mem_free(bo, va_handle, bo_mc, 4096);
CU_ASSERT_EQUAL(r, 0);
+
+   /* Test GDS */
+   bo = gpu_mem_alloc(device_handle, 1024, 0,
+   AMDGPU_GEM_DOMAIN_GDS, 0,
+   NULL, NULL);
+   r = gpu_mem_free(bo, NULL, 0, 4096);
+   CU_ASSERT_EQUAL(r, 0);
+
+   /* Test GWS */
+   bo = gpu_mem_alloc(device_handle, 1, 0,
+   AMDGPU_GEM_DOMAIN_GWS, 0,
+   NULL, NULL);
+   r = gpu_mem_free(bo, NULL, 0, 4096);
+   CU_ASSERT_EQUAL(r, 0);
+
+   /* Test OA */
+   bo = gpu_mem_alloc(device_handle, 1, 0,
+   AMDGPU_GEM_DOMAIN_OA, 0,
+   NULL, NULL);
+   r = gpu_mem_free(bo, NULL, 0, 4096);
+   CU_ASSERT_EQUAL(r, 0);
 }
 
 static void amdgpu_mem_fail_alloc(void)
-- 
2.14.1



[PATCH libdrm 1/3] amdgpu: remove invalid check in amdgpu_bo_alloc

2018-09-14 Thread Christian König
The heap is checked by the kernel, not by libdrm; to make it even worse,
the check prevented allocating resources other than VRAM and GTT.

Signed-off-by: Christian König 
---
 amdgpu/amdgpu_bo.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index 6a95929c..34904e38 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -74,19 +74,14 @@ int amdgpu_bo_alloc(amdgpu_device_handle dev,
amdgpu_bo_handle *buf_handle)
 {
union drm_amdgpu_gem_create args;
-   unsigned heap = alloc_buffer->preferred_heap;
-   int r = 0;
-
-   /* It's an error if the heap is not specified */
-   if (!(heap & (AMDGPU_GEM_DOMAIN_GTT | AMDGPU_GEM_DOMAIN_VRAM)))
-   return -EINVAL;
+   int r;
 
	memset(&args, 0, sizeof(args));
args.in.bo_size = alloc_buffer->alloc_size;
args.in.alignment = alloc_buffer->phys_alignment;
 
/* Set the placement. */
-   args.in.domains = heap;
+   args.in.domains = alloc_buffer->preferred_heap;
args.in.domain_flags = alloc_buffer->flags;
 
/* Allocate the buffer with the preferred heap. */
-- 
2.14.1



Re: [PATCH 5/5] drm/amdgpu: fix shadow BO restoring

2018-09-14 Thread Christian König

Am 13.09.2018 um 11:29 schrieb Zhang, Jerry(Junwei):

On 09/11/2018 05:56 PM, Christian König wrote:

Don't grab the reservation lock any more and simplify the handling quite
a bit.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 109 
-

  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  46 
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |   8 +--
  3 files changed, 43 insertions(+), 120 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 5eba66ecf668..20bb702f5c7f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2940,54 +2940,6 @@ static int 
amdgpu_device_ip_post_soft_reset(struct amdgpu_device *adev)

  return 0;
  }
  -/**
- * amdgpu_device_recover_vram_from_shadow - restore shadowed VRAM 
buffers

- *
- * @adev: amdgpu_device pointer
- * @ring: amdgpu_ring for the engine handling the buffer operations
- * @bo: amdgpu_bo buffer whose shadow is being restored
- * @fence: dma_fence associated with the operation
- *
- * Restores the VRAM buffer contents from the shadow in GTT. Used to
- * restore things like GPUVM page tables after a GPU reset where
- * the contents of VRAM might be lost.
- * Returns 0 on success, negative error code on failure.
- */
-static int amdgpu_device_recover_vram_from_shadow(struct 
amdgpu_device *adev,

-  struct amdgpu_ring *ring,
-  struct amdgpu_bo *bo,
-  struct dma_fence **fence)
-{
-    uint32_t domain;
-    int r;
-
-    if (!bo->shadow)
-    return 0;
-
-    r = amdgpu_bo_reserve(bo, true);
-    if (r)
-    return r;
-    domain = amdgpu_mem_type_to_domain(bo->tbo.mem.mem_type);
-    /* if bo has been evicted, then no need to recover */
-    if (domain == AMDGPU_GEM_DOMAIN_VRAM) {
-    r = amdgpu_bo_validate(bo->shadow);
-    if (r) {
-    DRM_ERROR("bo validate failed!\n");
-    goto err;
-    }
-
-    r = amdgpu_bo_restore_from_shadow(adev, ring, bo,
- NULL, fence, true);
-    if (r) {
-    DRM_ERROR("recover page table failed!\n");
-    goto err;
-    }
-    }
-err:
-    amdgpu_bo_unreserve(bo);
-    return r;
-}
-
  /**
   * amdgpu_device_recover_vram - Recover some VRAM contents
   *
@@ -2996,16 +2948,15 @@ static int 
amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev,
   * Restores the contents of VRAM buffers from the shadows in GTT.  
Used to

   * restore things like GPUVM page tables after a GPU reset where
   * the contents of VRAM might be lost.
- * Returns 0 on success, 1 on failure.
+ *
+ * Returns:
+ * 0 on success, negative error code on failure.
   */
  static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
  {
-    struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
-    struct amdgpu_bo *bo, *tmp;
  struct dma_fence *fence = NULL, *next = NULL;
-    long r = 1;
-    int i = 0;
-    long tmo;
+    struct amdgpu_bo *shadow;
+    long r = 1, tmo;
    if (amdgpu_sriov_runtime(adev))
  tmo = msecs_to_jiffies(8000);
@@ -3014,44 +2965,40 @@ static int amdgpu_device_recover_vram(struct 
amdgpu_device *adev)

    DRM_INFO("recover vram bo from shadow start\n");
  mutex_lock(&adev->shadow_list_lock);
-    list_for_each_entry_safe(bo, tmp, &adev->shadow_list, 
shadow_list) {

-    next = NULL;
-    amdgpu_device_recover_vram_from_shadow(adev, ring, bo, &next);
+    list_for_each_entry(shadow, &adev->shadow_list, shadow_list) {
+
+    /* No need to recover an evicted BO */
+    if (shadow->tbo.mem.mem_type != TTM_PL_TT ||
+    shadow->parent->tbo.mem.mem_type != TTM_PL_VRAM)

is there a chance that the shadow bo is evicted to another domain,
like SYSTEM?


Yes, that's why I test "!= TTM_PL_TT" here.

What can happen is that either the shadow or the page table or page 
directory is evicted.


But in this case we don't need to restore anything because of patch #1 
in this series.


Regards,
Christian.



Regards,
Jerry

+    continue;
+
+    r = amdgpu_bo_restore_shadow(shadow, &next);
+    if (r)
+    break;
+
  if (fence) {
  r = dma_fence_wait_timeout(fence, false, tmo);
-    if (r == 0)
-    pr_err("wait fence %p[%d] timeout\n", fence, i);
-    else if (r < 0)
-    pr_err("wait fence %p[%d] interrupted\n", fence, i);
-    if (r < 1) {
-    dma_fence_put(fence);
-    fence = next;
+    dma_fence_put(fence);
+    fence = next;
+    if (r <= 0)
  break;
-    }
-    i++;
+    } else {
+    fence = next;
  }
-
-    dma_fence_put(fence);
-    fence = next;
  }
  mutex_unlock(&adev->shadow_list_lock);
  -    if (fence) {
-    r = dma_fence_wait_timeout(fence, 

kfdtest failures for amdkfd (amd-staging-drm-next)

2018-09-14 Thread Alexander Frolov

Hi!

I am trying to use amd-staging-drm-next to work with amdkfd (built into 
amdgpu) for the AMD Instinct MI25 device.


As a first step I compiled libhsakmt 1.8.x and tried to run kfdtest. But 
it produces lots of failures (see below).

Here are the results:

...
[==] 76 tests from 14 test cases ran. (80250 ms total)
[  PASSED  ] 39 tests.
[  FAILED  ] 37 tests, listed below:
[  FAILED  ] KFDEvictTest.QueueTest
[  FAILED  ] KFDGraphicsInterop.RegisterGraphicsHandle
[  FAILED  ] KFDIPCTest.BasicTest
[  FAILED  ] KFDIPCTest.CrossMemoryAttachTest
[  FAILED  ] KFDIPCTest.CMABasicTest
[  FAILED  ] KFDLocalMemoryTest.BasicTest
[  FAILED  ] KFDLocalMemoryTest.VerifyContentsAfterUnmapAndMap
[  FAILED  ] KFDLocalMemoryTest.CheckZeroInitializationVram
[  FAILED  ] KFDMemoryTest.MapUnmapToNodes
[  FAILED  ] KFDMemoryTest.MemoryRegisterSamePtr
[  FAILED  ] KFDMemoryTest.FlatScratchAccess
[  FAILED  ] KFDMemoryTest.MMBench
[  FAILED  ] KFDMemoryTest.QueryPointerInfo
[  FAILED  ] KFDMemoryTest.PtraceAccessInvisibleVram
[  FAILED  ] KFDMemoryTest.SignalHandling
[  FAILED  ] KFDQMTest.CreateCpQueue
[  FAILED  ] KFDQMTest.CreateMultipleSdmaQueues
[  FAILED  ] KFDQMTest.SdmaConcurrentCopies
[  FAILED  ] KFDQMTest.CreateMultipleCpQueues
[  FAILED  ] KFDQMTest.DisableSdmaQueueByUpdateWithNullAddress
[  FAILED  ] KFDQMTest.DisableCpQueueByUpdateWithZeroPercentage
[  FAILED  ] KFDQMTest.OverSubscribeCpQueues
[  FAILED  ] KFDQMTest.BasicCuMaskingEven
[  FAILED  ] KFDQMTest.QueuePriorityOnDifferentPipe
[  FAILED  ] KFDQMTest.QueuePriorityOnSamePipe
[  FAILED  ] KFDQMTest.EmptyDispatch
[  FAILED  ] KFDQMTest.SimpleWriteDispatch
[  FAILED  ] KFDQMTest.MultipleCpQueuesStressDispatch
[  FAILED  ] KFDQMTest.CpuWriteCoherence
[  FAILED  ] KFDQMTest.CreateAqlCpQueue
[  FAILED  ] KFDQMTest.QueueLatency
[  FAILED  ] KFDQMTest.CpQueueWraparound
[  FAILED  ] KFDQMTest.SdmaQueueWraparound
[  FAILED  ] KFDQMTest.Atomics
[  FAILED  ] KFDQMTest.P2PTest
[  FAILED  ] KFDQMTest.SdmaEventInterrupt
[  FAILED  ] KFDTopologyTest.BasicTest

Does it mean that the current amdkfd from the kernel can't be used with 
libhsakmt 1.8.x? Or am I doing something wrong...

Thank you!

Best,
   Alexander



[PATCH 1/2] drm/amdgpu: fix mask in GART location calculation

2018-09-14 Thread Christian König
We need to mask the lower bits, not the upper ones.

Fixes: ec210e3226dc0 ("drm/amdgpu: put GART away from VRAM v2")
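
As an illustration (not part of the commit message), with four_gb = 1ULL << 32
the difference between the two masks is:

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		const uint64_t four_gb = 1ULL << 32;
		const uint64_t start = 0x123456789000ULL; /* arbitrary example */

		/* old, buggy mask: keeps only the offset inside a 4GB window */
		printf("0x%llx\n", (unsigned long long)(start & (four_gb - 1)));
		/* prints 0x56789000 */

		/* fixed mask: rounds the start down to a 4GB boundary */
		printf("0x%llx\n", (unsigned long long)(start & ~(four_gb - 1)));
		/* prints 0x123400000000 */
		return 0;
	}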

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index ae4467113240..9a5b252784a1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -166,7 +166,7 @@ void amdgpu_gmc_gart_location(struct amdgpu_device *adev, 
struct amdgpu_gmc *mc)
else
mc->gart_start = mc->mc_mask - mc->gart_size + 1;
 
-   mc->gart_start &= four_gb - 1;
+   mc->gart_start &= ~(four_gb - 1);
mc->gart_end = mc->gart_start + mc->gart_size - 1;
dev_info(adev->dev, "GART: %lluM 0x%016llX - 0x%016llX\n",
mc->gart_size >> 20, mc->gart_start, mc->gart_end);
-- 
2.14.1



[PATCH 2/2] drm/amdgpu: revert "stop using gart_start as offset for the GTT domain"

2018-09-14 Thread Christian König
It turned out the commit is incomplete, and since we removed the use of the
AGP mapping from the GTT manager it is no longer necessary anyway.

This reverts commit 22d8bfafcc12dfa17b91d2e8ae4e1898e782003a.
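
As a sketch of the relationship the revert restores (names follow the diff
below; this is an illustration, not driver code): a GTT placement at page
mem->start is bound at GART table offset (mem->start << PAGE_SHIFT), and
its GPU virtual address adds the window base man->gpu_offset = gart_start:

	static uint64_t gtt_gpu_address(uint64_t gart_start, uint64_t page,
					unsigned int page_shift)
	{
		uint64_t offset = page << page_shift;	/* GART table offset */

		/* GPU virtual address = GART window base + offset */
		return gart_start + offset;
	}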

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 3 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 7 +++
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index c2539f6821c0..da7b1b92d9cf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -143,8 +143,7 @@ static int amdgpu_gtt_mgr_alloc(struct ttm_mem_type_manager 
*man,
	spin_unlock(&mgr->lock);
 
if (!r)
-   mem->start = node->node.start +
-   (adev->gmc.gart_start >> PAGE_SHIFT);
+   mem->start = node->node.start;
 
return r;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 8a158ee922f7..f12ae6b525b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -188,7 +188,7 @@ static int amdgpu_init_mem_type(struct ttm_bo_device *bdev, 
uint32_t type,
case TTM_PL_TT:
/* GTT memory  */
	man->func = &amdgpu_gtt_mgr_func;
-   man->gpu_offset = 0;
+   man->gpu_offset = adev->gmc.gart_start;
man->available_caching = TTM_PL_MASK_CACHING;
man->default_caching = TTM_PL_FLAG_CACHED;
man->flags = TTM_MEMTYPE_FLAG_MAPPABLE | TTM_MEMTYPE_FLAG_CMA;
@@ -1060,7 +1060,7 @@ static int amdgpu_ttm_backend_bind(struct ttm_tt *ttm,
flags = amdgpu_ttm_tt_pte_flags(adev, ttm, bo_mem);
 
/* bind pages into GART page tables */
-   gtt->offset = ((u64)bo_mem->start << PAGE_SHIFT) - adev->gmc.gart_start;
+   gtt->offset = (u64)bo_mem->start << PAGE_SHIFT;
r = amdgpu_gart_bind(adev, gtt->offset, ttm->num_pages,
ttm->pages, gtt->ttm.dma_address, flags);
 
@@ -1112,8 +1112,7 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
	flags = amdgpu_ttm_tt_pte_flags(adev, bo->ttm, &tmp);
 
/* Bind pages */
-   gtt->offset = ((u64)tmp.start << PAGE_SHIFT) -
-   adev->gmc.gart_start;
+   gtt->offset = (u64)tmp.start << PAGE_SHIFT;
r = amdgpu_ttm_gart_bind(adev, bo, flags);
if (unlikely(r)) {
	ttm_bo_mem_put(bo, &tmp);
-- 
2.14.1



Re: [PATCH] [RFC]drm: add syncobj timeline support v5

2018-09-14 Thread Christian König

Am 14.09.2018 um 12:37 schrieb Chunming Zhou:

This patch is for the VK_KHR_timeline_semaphore extension; the semaphore is
called a syncobj on the kernel side:
This extension introduces a new type of syncobj that has an integer payload
identifying a point in a timeline. Such timeline syncobjs support the
following operations:
* CPU query - A host operation that allows querying the payload of the
  timeline syncobj.
* CPU wait - A host operation that allows a blocking wait for a
  timeline syncobj to reach a specified value.
* Device wait - A device operation that allows waiting for a
  timeline syncobj to reach a specified value.
* Device signal - A device operation that allows advancing the
  timeline syncobj to a specified value.

Since it's a timeline, an earlier time point (PT) is always signaled
before a later PT.
a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal operation
fence; when the PT[N] fence is signaled,
the timeline will increase to the value of PT[N].
b. wait PT design:
A wait PT fence is signaled when the timeline reaches its point value.
While the timeline is increasing, the wait PT values are compared with the
new timeline value; if a PT value is lower than the timeline value, the
wait PT is signaled, otherwise it stays in the list. A syncobj wait
operation can wait on any point of the timeline, so an RB tree is needed
to order them. A wait PT can also be ahead of any signal PT, for which we
need a submission fence.
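
As an illustration of these semantics (pseudocode, not the actual kernel
interface):

	/* The timeline payload only moves forward, and a wait on point N
	 * completes once payload >= N. */
	struct timeline {
		uint64_t payload;	/* value of the last signaled point */
	};

	static void timeline_signal(struct timeline *t, uint64_t point)
	{
		/* PT[N-1] signals before PT[N], so the payload is monotonic */
		if (point > t->payload)
			t->payload = point;
	}

	static bool timeline_wait_done(const struct timeline *t, uint64_t point)
	{
		/* a wait PT is signaled once the timeline reaches its value */
		return t->payload >= point;
	}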

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to the .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate patch. 
(Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate patch.
5. drop the submission_fence implementation and instead use wait_event() for 
that. (Christian)
6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with the timeline implementation. (Vetter and Christian)
 a. normal syncobj signal op will create a signal PT to tail of signal pt 
list.
 b. normal syncobj wait op will create a wait pt with last signal point, 
and this wait PT is only signaled by related signal point PT.
2. many bug fix and clean up
3. stub fence moving is moved to other patch.

v4:
1. fix RB tree loop with while(node=rb_first(...)). (Christian)
2. fix syncobj lifecycle. (Christian)
3. only enable_signaling when there is wait_pt. (Christian)
4. fix timeline path issues.
5. write a timeline test in libdrm

v5: (Christian)
1. the semaphore is called a syncobj on the kernel side.
2. don't need 'timeline' characters in some function name.
3. keep syncobj cb

normal syncobj is tested by ./deqp-vk -n dEQP-VK*semaphore*
timeline syncobj is tested by ./amdgpu_test -s 9

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 


At least on first glance that looks like it should work, going to do a 
detailed review on Monday.


Christian.


---
  drivers/gpu/drm/drm_syncobj.c  | 294 ++---
  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
  include/drm/drm_syncobj.h  |  62 +++--
  include/uapi/drm/drm.h |   1 +
  4 files changed, 292 insertions(+), 69 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index e9ce623d049e..e78d076f2703 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,9 @@
  #include "drm_internal.h"
  #include 
  
+/* merge normal syncobj to timeline syncobj, the point interval is 1 */

+#define DRM_SYNCOBJ_NORMAL_POINT 1
+
  struct drm_syncobj_stub_fence {
struct dma_fence base;
spinlock_t lock;
@@ -82,6 +85,11 @@ static const struct dma_fence_ops drm_syncobj_stub_fence_ops 
= {
.release = drm_syncobj_stub_fence_release,
  };
  
+struct drm_syncobj_signal_pt {

+   struct dma_fence_array *base;
+   u64value;
+   struct list_head list;
+};
  
  /**

   * drm_syncobj_find - lookup and reference a sync object.
@@ -124,7 +132,7 @@ static int drm_syncobj_fence_get_or_add_callback(struct 
drm_syncobj *syncobj,
  {
int ret;
  
-	*fence = drm_syncobj_fence_get(syncobj);

+   ret = drm_syncobj_search_fence(syncobj, 0, 0, fence);
if (*fence)
return 1;
  
@@ -133,10 +141,10 @@ static int drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj,

 * have the lock, try one more time just to be sure we don't add a
 * callback when a fence has already been set.
 */
-   if (syncobj->fence) {
-   *fence = dma_fence_get(rcu_dereference_protected(syncobj->fence,
-
lockdep_is_held(&syncobj->lock)));
-   ret = 1;
+   if (fence) {
+   drm_syncobj_search_fence(syncobj, 0, 0, fence);
+   if (*fence)
+   ret = 1;
} 

Re: [PATCH] drm/ttm: once more fix ttm_bo_bulk_move_lru_tail

2018-09-14 Thread Christian König

Am 14.09.2018 um 11:22 schrieb Michel Dänzer:

On 2018-09-14 10:22 a.m., Huang Rui wrote:

On Thu, Sep 13, 2018 at 07:32:24PM +0800, Christian König wrote:

Am 13.09.2018 um 10:31 schrieb Huang Rui:

On Wed, Sep 12, 2018 at 09:23:55PM +0200, Christian König wrote:

While cutting the lists we sometimes accidentally added a list_head from
the stack to the LRUs, effectively corrupting the list.

Remove the list cutting and use explicit list manipulation instead.

This patch actually fixes the corruption bug. Was it a defect of
list_cut_position or list_splice handlers?

We somehow did something illegal with list_cut_position. I haven't
narrowed it down till the end, but we ended up with list_heads from the
stack to the lru.

I am confused; in theory, however we manipulate the list with the list
helpers, it should not trigger list corruption. Those helpers should
ensure the list operations are safe...

There's nothing the helpers can do about being passed in pointers to
stack memory. It's a bug in the code using the helpers.


Actually I'm not 100% sure of that. To me it looks like we hit a corner 
case list_cut_position doesn't support.


Or we indeed had a logic error in how we called it. Anyway, the explicit 
implementation only uses 6 assignments and so is much easier to handle.
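
For reference, those six assignments in isolation (an annotated sketch of
the helper from the patch, not additional code):

	/* Move the sub-chain [first, last] to the tail of @list. */
	static void move_bulk_tail(struct list_head *list,
				   struct list_head *first,
				   struct list_head *last)
	{
		/* 1+2: unlink the sub-chain from its current list */
		first->prev->next = last->next;
		last->next->prev = first->prev;

		/* 3+4: hook the sub-chain behind the current tail of @list */
		list->prev->next = first;
		first->prev = list->prev;

		/* 5+6: close the ring so @last becomes the new tail */
		last->next = list;
		list->prev = last;
	}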


Christian.


[PATCH] [RFC]drm: add syncobj timeline support v5

2018-09-14 Thread Chunming Zhou
This patch is for the VK_KHR_timeline_semaphore extension; the semaphore is
called a syncobj on the kernel side:
This extension introduces a new type of syncobj that has an integer payload
identifying a point in a timeline. Such timeline syncobjs support the
following operations:
   * CPU query - A host operation that allows querying the payload of the
 timeline syncobj.
   * CPU wait - A host operation that allows a blocking wait for a
 timeline syncobj to reach a specified value.
   * Device wait - A device operation that allows waiting for a
 timeline syncobj to reach a specified value.
   * Device signal - A device operation that allows advancing the
 timeline syncobj to a specified value.

Since it's a timeline, an earlier time point (PT) is always signaled
before a later PT.
a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal operation
fence; when the PT[N] fence is signaled,
the timeline will increase to the value of PT[N].
b. wait PT design:
A wait PT fence is signaled when the timeline reaches its point value.
While the timeline is increasing, the wait PT values are compared with the
new timeline value; if a PT value is lower than the timeline value, the
wait PT is signaled, otherwise it stays in the list. A syncobj wait
operation can wait on any point of the timeline, so an RB tree is needed
to order them. A wait PT can also be ahead of any signal PT, for which we
need a submission fence.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to the .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate patch. 
(Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate patch.
5. drop the submission_fence implementation and instead use wait_event() for 
that. (Christian)
6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with the timeline implementation. (Vetter and Christian)
a. normal syncobj signal op will create a signal PT to tail of signal pt 
list.
b. normal syncobj wait op will create a wait pt with last signal point, and 
this wait PT is only signaled by related signal point PT.
2. many bug fix and clean up
3. stub fence moving is moved to other patch.

v4:
1. fix RB tree loop with while(node=rb_first(...)). (Christian)
2. fix syncobj lifecycle. (Christian)
3. only enable_signaling when there is wait_pt. (Christian)
4. fix timeline path issues.
5. write a timeline test in libdrm

v5: (Christian)
1. the semaphore is called a syncobj on the kernel side.
2. don't need 'timeline' characters in some function name.
3. keep syncobj cb

normal syncobj is tested by ./deqp-vk -n dEQP-VK*semaphore*
timeline syncobj is tested by ./amdgpu_test -s 9

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
---
 drivers/gpu/drm/drm_syncobj.c  | 294 ++---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
 include/drm/drm_syncobj.h  |  62 +++--
 include/uapi/drm/drm.h |   1 +
 4 files changed, 292 insertions(+), 69 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index e9ce623d049e..e78d076f2703 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,9 @@
 #include "drm_internal.h"
 #include 
 
+/* merge normal syncobj to timeline syncobj, the point interval is 1 */
+#define DRM_SYNCOBJ_NORMAL_POINT 1
+
 struct drm_syncobj_stub_fence {
struct dma_fence base;
spinlock_t lock;
@@ -82,6 +85,11 @@ static const struct dma_fence_ops drm_syncobj_stub_fence_ops 
= {
.release = drm_syncobj_stub_fence_release,
 };
 
+struct drm_syncobj_signal_pt {
+   struct dma_fence_array *base;
+   u64value;
+   struct list_head list;
+};
 
 /**
  * drm_syncobj_find - lookup and reference a sync object.
@@ -124,7 +132,7 @@ static int drm_syncobj_fence_get_or_add_callback(struct 
drm_syncobj *syncobj,
 {
int ret;
 
-   *fence = drm_syncobj_fence_get(syncobj);
+   ret = drm_syncobj_search_fence(syncobj, 0, 0, fence);
if (*fence)
return 1;
 
@@ -133,10 +141,10 @@ static int drm_syncobj_fence_get_or_add_callback(struct 
drm_syncobj *syncobj,
 * have the lock, try one more time just to be sure we don't add a
 * callback when a fence has already been set.
 */
-   if (syncobj->fence) {
-   *fence = dma_fence_get(rcu_dereference_protected(syncobj->fence,
-
lockdep_is_held(&syncobj->lock)));
-   ret = 1;
+   if (fence) {
+   drm_syncobj_search_fence(syncobj, 0, 0, fence);
+   if (*fence)
+   ret = 1;
} else {
*fence = NULL;
drm_syncobj_add_callback_locked(syncobj, cb, func);
@@ -164,6 +172,151 @@ void drm_syncobj_remove_callback(struct drm_syncobj 
*syncobj,
 

Re: [PATCH] drm/ttm: once more fix ttm_bo_bulk_move_lru_tail

2018-09-14 Thread Michel Dänzer
On 2018-09-14 10:22 a.m., Huang Rui wrote:
> On Thu, Sep 13, 2018 at 07:32:24PM +0800, Christian König wrote:
>> Am 13.09.2018 um 10:31 schrieb Huang Rui:
>>> On Wed, Sep 12, 2018 at 09:23:55PM +0200, Christian König wrote:
 While cutting the lists we sometimes accidentally added a list_head from
 the stack to the LRUs, effectively corrupting the list.

 Remove the list cutting and use explicit list manipulation instead.
>>> This patch actually fixes the corruption bug. Was it a defect of
>>> list_cut_position or list_splice handlers?
>>
>> We somehow did something illegal with list_cut_position. I haven't 
>> narrowed it down till the end, but we ended up with list_heads from the 
>> stack to the lru.
> 
> I am confused; in theory, however we manipulate the list with the list
> helpers, it should not trigger list corruption. Those helpers should
> ensure the list operations are safe...

There's nothing the helpers can do about being passed in pointers to
stack memory. It's a bug in the code using the helpers.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [PATCH libdrm] tests/amdgpu: add unaligned VM test

2018-09-14 Thread Zhang, Jerry(Junwei)

On 09/13/2018 08:20 PM, Christian König wrote:

Am 11.09.2018 um 04:06 schrieb Zhang, Jerry (Junwei):

On 09/10/2018 05:33 PM, Christian König wrote:

Am 10.09.2018 um 04:44 schrieb Zhang, Jerry (Junwei):

On 09/10/2018 02:04 AM, Christian König wrote:

Make a VM mapping which is as unaligned as possible.


Is it going to test unaligned address between BO allocation and BO 
mapping

and skip huge page mapping?


Yes and no.

Huge page handling works by mapping at least 2MB of contiguous 
memory at a 2MB-aligned address.


What I do here is I allocate 4GB of VRAM and try to map it to an 
address which is aligned to 1GB + 4KB.


In other words the VM subsystem will add a single PTE to align the 
entry to 8KB, then it adds two PTEs to align it to 16KB, then four to 
get to 32KB, and so on until we reach the maximum alignment of 2GB,
which Vega/Raven support in the L1.
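
(As an illustration of that progression, not driver code: each run of PTEs
doubles both the run length and the achievable alignment until the 2GB L1
limit is reached.)

	unsigned long pages = 1;		/* first run: a single 4KB PTE */
	unsigned long alignment = 8192;		/* alignment reached after it */

	while (alignment < (2UL << 30)) {	/* up to the 2GB L1 alignment */
		pages <<= 1;			/* next run maps twice as much */
		alignment <<= 1;		/* and doubles the alignment */
	}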


Thanks for explaining that.

From the trace log, it will map 1*4KB, 2*4KB, ..., 256*4KB, then back 
to 1*4KB.


 amdgpu_test-1384  [005]    110.634466: amdgpu_vm_bo_update: 
soffs=11, eoffs=1f, flags=70
 amdgpu_test-1384  [005]    110.634467: amdgpu_vm_set_ptes: 
pe=f5feffd008, addr=01fec0, incr=4096, flags=71, count=1
 amdgpu_test-1384  [005]    110.634468: amdgpu_vm_set_ptes: 
pe=f5feffd010, addr=01fec01000, incr=4096, flags=f1, count=2
 amdgpu_test-1384  [005]    110.634468: amdgpu_vm_set_ptes: 
pe=f5feffd020, addr=01fec03000, incr=4096, flags=171, count=4
 amdgpu_test-1384  [005]    110.634468: amdgpu_vm_set_ptes: 
pe=f5feffd040, addr=01fec07000, incr=4096, flags=1f1, count=8
 amdgpu_test-1384  [005]    110.634468: amdgpu_vm_set_ptes: 
pe=f5feffd080, addr=01fec0f000, incr=4096, flags=271, count=16
 amdgpu_test-1384  [005]    110.634468: amdgpu_vm_set_ptes: 
pe=f5feffd100, addr=01fec1f000, incr=4096, flags=2f1, count=32
 amdgpu_test-1384  [005]    110.634469: amdgpu_vm_set_ptes: 
pe=f5feffd200, addr=01fec3f000, incr=4096, flags=371, count=64
 amdgpu_test-1384  [005]    110.634469: amdgpu_vm_set_ptes: 
pe=f5feffd400, addr=01fec7f000, incr=4096, flags=3f1, count=128
 amdgpu_test-1384  [005]    110.634469: amdgpu_vm_set_ptes: 
pe=f5feffd800, addr=01fecff000, incr=4096, flags=471, count=256
 amdgpu_test-1384  [005]    110.634469: amdgpu_vm_set_ptes: 
pe=f5feffc000, addr=01fedff000, incr=4096, flags=71, count=1
 amdgpu_test-1384  [005]    110.634470: amdgpu_vm_set_ptes: 
pe=f5feffc008, addr=01fea0, incr=4096, flags=71, count=1
 amdgpu_test-1384  [005]    110.634470: amdgpu_vm_set_ptes: 
pe=f5feffc010, addr=01fea01000, incr=4096, flags=f1, count=2


Yes, that is exactly the expected result with the old code.



And it sounds like a performance test for Vega and later.
If so, shall we add a timestamp to the log?


Well, I used it as a performance test, but the resulting numbers are not 
very comparable.


It is useful to push to libdrm because it also exercises the VM code 
and makes sure that the code doesn't crash on corner cases.

Thanks for your info.
That's fine for me.

Reviewed-by: Junwei Zhang 

BTW, I still think adding a print here is a good choice.
+ /* Don't let the test fail if the device doesn't have enough VRAM */
+ if (r)
+ return;

Regards,
Jerry


Regards,
Christian.



Regards,
Jerry



Regards,
Christian.





Signed-off-by: Christian König 
---
  tests/amdgpu/vm_tests.c | 45 
-

  1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/tests/amdgpu/vm_tests.c b/tests/amdgpu/vm_tests.c
index 7b6dc5d6..fada2987 100644
--- a/tests/amdgpu/vm_tests.c
+++ b/tests/amdgpu/vm_tests.c
@@ -31,8 +31,8 @@ static  amdgpu_device_handle device_handle;
  static  uint32_t  major_version;
  static  uint32_t  minor_version;

-
  static void amdgpu_vmid_reserve_test(void);
+static void amdgpu_vm_unaligned_map(void);

  CU_BOOL suite_vm_tests_enable(void)
  {
@@ -84,6 +84,7 @@ int suite_vm_tests_clean(void)

  CU_TestInfo vm_tests[] = {
  { "resere vmid test",  amdgpu_vmid_reserve_test },
+    { "unaligned map",  amdgpu_vm_unaligned_map },
  CU_TEST_INFO_NULL,
  };

@@ -167,3 +168,45 @@ static void amdgpu_vmid_reserve_test(void)
  r = amdgpu_cs_ctx_free(context_handle);
  CU_ASSERT_EQUAL(r, 0);
  }
+
+static void amdgpu_vm_unaligned_map(void)
+{
+    const uint64_t map_size = (4ULL << 30) - (2 << 12);
+    struct amdgpu_bo_alloc_request request = {};
+    amdgpu_bo_handle buf_handle;
+    amdgpu_va_handle handle;
+    uint64_t vmc_addr;
+    int r;
+
+    request.alloc_size = 4ULL << 30;
+    request.phys_alignment = 4096;
+    request.preferred_heap = AMDGPU_GEM_DOMAIN_VRAM;
+    request.flags = AMDGPU_GEM_CREATE_NO_CPU_ACCESS;
+
+    r = amdgpu_bo_alloc(device_handle, &request, &buf_handle);
+    /* Don't let the test fail if the device doesn't have enough 
VRAM */


We may print some info to the console here.

Regards,
Jerry


+    if (r)
+    return;
+
+    r = 

RE: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

2018-09-14 Thread Zhou, David(ChunMing)


> -Original Message-
> From: Koenig, Christian
> Sent: Friday, September 14, 2018 3:27 PM
> To: Zhou, David(ChunMing) ; Zhou,
> David(ChunMing) ; dri-
> de...@lists.freedesktop.org
> Cc: Dave Airlie ; Rakos, Daniel
> ; amd-gfx@lists.freedesktop.org; Daniel Vetter
> 
> Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4
> 
> Am 14.09.2018 um 05:59 schrieb zhoucm1:
> >
> >
> > On 2018年09月14日 11:14, zhoucm1 wrote:
> >>
> >>
> >> On 2018年09月13日 18:22, Christian König wrote:
> >>> Am 13.09.2018 um 11:35 schrieb Zhou, David(ChunMing):
> 
> > -Original Message-
> > From: Koenig, Christian
> > Sent: Thursday, September 13, 2018 5:20 PM
> > To: Zhou, David(ChunMing) ; dri-
> > de...@lists.freedesktop.org
> > Cc: Dave Airlie ; Rakos, Daniel
> > ; amd-gfx@lists.freedesktop.org
> > Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4
> >
> > Am 13.09.2018 um 11:11 schrieb Zhou, David(ChunMing):
> >>> -Original Message-
> >>> From: Christian König 
> >>> Sent: Thursday, September 13, 2018 4:50 PM
> >>> To: Zhou, David(ChunMing) ; Koenig,
> >>> Christian ;
> >>> dri-de...@lists.freedesktop.org
> >>> Cc: Dave Airlie ; Rakos, Daniel
> >>> ; amd-gfx@lists.freedesktop.org
> >>> Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support
> >>> v4
> >>>
> >>> Am 13.09.2018 um 09:43 schrieb Zhou, David(ChunMing):
> > -Original Message-
> > From: Koenig, Christian
> > Sent: Thursday, September 13, 2018 2:56 PM
> > To: Zhou, David(ChunMing) ; Zhou,
> > David(ChunMing) ; dri-
> > de...@lists.freedesktop.org
> > Cc: Dave Airlie ; Rakos, Daniel
> > ; amd-gfx@lists.freedesktop.org
> > Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline
> > support v4
> >
> > Am 13.09.2018 um 04:15 schrieb zhoucm1:
> >> On 2018年09月12日 19:05, Christian König wrote:
> >>> [SNIP]
> >>> +static void
> >>> +drm_syncobj_find_signal_pt_for_wait_pt(struct
> >>> drm_syncobj *syncobj,
> >>> +   struct drm_syncobj_wait_pt
> >>> +*wait_pt) {
> >> That whole approach still looks horrible complicated to me.
>  It's already very close to what you said before.
> 
> >> Especially the separation of signal and wait pt is
> >> completely unnecessary as far as I can see.
> >> When a wait pt is requested we just need to search for
> >> the signal point which it will trigger.
>  Yeah, I tried this, but when I implement cpu wait ioctl on
>  specific point, we need a advanced wait pt fence,
>  otherwise, we could still need old syncobj cb.
> >>> Why? I mean you just need to call drm_syncobj_find_fence()
> >>> and
> >>> when
> >>> that one returns NULL you use wait_event_*() to wait for a
> >>> signal point >= your wait point to appear and try again.
> >> e.g. when there are 3 syncobjs(A,B,C) to wait, all syncobjABC
> >> have no fence yet, as you said, during
> >> drm_syncobj_find_fence(A) is working on wait_event,
> syncobjB
> >> and syncobjC could already be signaled, then we don't know
> >> which one is first signaled, which is need when wait ioctl
> >> returns.
> > I don't really see a problem with that. When you wait for the
> > first one you need to wait for A,B,C at the same time anyway.
> >
> > So what you do is to register a fence callback on the fences
> > you already have and for the syncobj which doesn't yet have a
> > fence you make sure that they wake up your thread when they
> > get one.
> >
> > So essentially exactly what
> > drm_syncobj_fence_get_or_add_callback()
> > already does today.
>  So do you mean we need still use old syncobj CB for that?
> >>> Yes, as far as I can see it should work.
> >>>
>      Advanced wait pt is bad?
> >>> Well it isn't bad, I just don't see any advantage in it.
> >> The advantage is to replace old syncobj cb.
> >>
> >>> The existing mechanism
> >>> should already be able to handle that.
> >> I thought more a bit, we don't that mechanism at all, if use
> >> advanced wait
> > pt, we can easily use fence array to achieve it for wait ioctl, we
> > should use kernel existing feature as much as possible, not invent
> > another, shouldn't we?
> > I remember  you said  it before.
> >
> > Yeah, but the syncobj cb is an existing feature.
>  This is obviously a workaround when doing for wait ioctl, Do you
>  see it used in other place?
> 
> > And I absolutely don't see a
> > need to modify that and 

Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

2018-09-14 Thread Christian König

Am 14.09.2018 um 05:59 schrieb zhoucm1:



On 2018年09月14日 11:14, zhoucm1 wrote:



On 2018年09月13日 18:22, Christian König wrote:

Am 13.09.2018 um 11:35 schrieb Zhou, David(ChunMing):



-Original Message-
From: Koenig, Christian
Sent: Thursday, September 13, 2018 5:20 PM
To: Zhou, David(ChunMing) ; dri-
de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

Am 13.09.2018 um 11:11 schrieb Zhou, David(ChunMing):

-Original Message-
From: Christian König 
Sent: Thursday, September 13, 2018 4:50 PM
To: Zhou, David(ChunMing) ; Koenig, Christian
; dri-de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

Am 13.09.2018 um 09:43 schrieb Zhou, David(ChunMing):

-Original Message-
From: Koenig, Christian
Sent: Thursday, September 13, 2018 2:56 PM
To: Zhou, David(ChunMing) ; Zhou,
David(ChunMing) ; dri-
de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline 
support v4


Am 13.09.2018 um 04:15 schrieb zhoucm1:

On 2018年09月12日 19:05, Christian König wrote:

[SNIP]
+static void drm_syncobj_find_signal_pt_for_wait_pt(struct
drm_syncobj *syncobj,
+   struct drm_syncobj_wait_pt
+*wait_pt) {

That whole approach still looks horribly complicated to me.

It's already very close to what you said before.

Especially the separation of signal and wait pt is 
completely

unnecessary as far as I can see.
When a wait pt is requested we just need to search for the
signal point which it will trigger.

Yeah, I tried this, but when I implement cpu wait ioctl on
specific point, we need a advanced wait pt fence, 
otherwise, we

could still need old syncobj cb.

Why? I mean you just need to call drm_syncobj_find_fence() and

when
that one returns NULL you use wait_event_*() to wait for a 
signal

point >= your wait point to appear and try again.
e.g. when there are 3 syncobjs(A,B,C) to wait, all syncobjABC 
have

no fence yet, as you said, during drm_syncobj_find_fence(A) is
working on wait_event, syncobjB and syncobjC could already be
signaled, then we don't know which one is first signaled, 
which is

need when wait ioctl returns.
I don't really see a problem with that. When you wait for the 
first

one you need to wait for A,B,C at the same time anyway.

So what you do is to register a fence callback on the fences you
already have and for the syncobj which doesn't yet have a 
fence you

make sure that they wake up your thread when they get one.

So essentially exactly what 
drm_syncobj_fence_get_or_add_callback()

already does today.

So do you mean we still need to use the old syncobj CB for that?

Yes, as far as I can see it should work.


    Advanced wait pt is bad?

Well it isn't bad, I just don't see any advantage in it.

The advantage is to replace old syncobj cb.


The existing mechanism
should already be able to handle that.
I thought about it a bit more; we don't need that mechanism at all. If we 
use advanced wait
pt, we can easily use a fence array to achieve it for the wait ioctl. We 
should use
existing kernel features as much as possible, not invent another one, 
shouldn't we?

I remember you said it before.

Yeah, but the syncobj cb is an existing feature.
This is obviously a workaround for the wait ioctl. Do you 
see it used anywhere else?



And I absolutely don't see a
need to modify that and replace it with something far more complex.
The wait ioctl is simplified much more by the fence array, not made more 
complex, and we just need to allocate a wait pt. If we keep the old 
syncobj cb workaround, all the wait pt logic is still there; we only save 
the allocation and wait pt handling, which in fact isn't the complex part 
at all. Compared with the ugly syncobj cb, this is simpler.


I strongly disagree on that. You just need to extend the syncobj cb 
with the sequence number and you are done.


We could clean that up in the long term by adding some wait_multi 
event macro, but for now just adding the sequence number should do 
the trick.


Quote from Daniel Vetter comment when v1, "

Specifically for this stuff here having unified future fence semantics
will allow drivers to do clever stuff with them.

"
I think the advanced wait pt is a similar concept to the 'future fence' 
that Daniel Vetter mentioned before, which is obviously the right direction.



Anyway, I will change the patch as you like if there are no other 
comments, so that the patch can land soon.
When I try to remove the wait pt future fence, I run into another 
problem: drm_syncobj_find_fence cannot get a fence if the signal pt 
has already been garbage collected, and then CS reports an error. Any 
idea for that?


Well when the signal pt is already garbage collected you know that it is 
already signaled. So you can just return a dummy fence.
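
As a sketch of that idea (reusing the drm_syncobj_stub_fence from the
patch; illustration only, the exact init values are assumptions):

	static struct dma_fence *syncobj_get_stub_fence(void)
	{
		struct drm_syncobj_stub_fence *stub;

		stub = kzalloc(sizeof(*stub), GFP_KERNEL);
		if (!stub)
			return NULL;

		spin_lock_init(&stub->lock);
		dma_fence_init(&stub->base, &drm_syncobj_stub_fence_ops,
			       &stub->lock, 0, 0);
		/* already signaled: a garbage-collected point is done */
		dma_fence_signal(&stub->base);
		return &stub->base;
	}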


I actually thought that this was the intention of 

Re: [PATCH] drm/ttm: once more fix ttm_bo_bulk_move_lru_tail

2018-09-14 Thread Huang Rui
On Thu, Sep 13, 2018 at 07:32:24PM +0800, Christian König wrote:
> Am 13.09.2018 um 10:31 schrieb Huang Rui:
> > On Wed, Sep 12, 2018 at 09:23:55PM +0200, Christian König wrote:
> >> While cutting the lists we sometimes accidentally added a list_head from
> >> the stack to the LRUs, effectively corrupting the list.
> >>
> >> Remove the list cutting and use explicit list manipulation instead.
> > This patch actually fixes the corruption bug. Was it a defect of
> > list_cut_position or list_splice handlers?
> 
> We somehow did something illegal with list_cut_position. I haven't 
> narrowed it down till the end, but we ended up with list_heads from the 
> stack to the lru.
> 

I am confused; in theory, however we manipulate the list with the list
helpers, it should not trigger list corruption. Those helpers should
ensure the list operations are safe...

> Anyway adding a specialized list bulk move function is much simpler and 
> avoids the issue.
> 
> I've just split that up as Michel suggested and send it out to the 
> mailing lists, please review that version once more.
> 

Sure, already reviewed.

> Thanks,
> Christian.
> 
> >
> > Reviewed-and-Tested: Huang Rui 
> >
> >> Signed-off-by: Christian König 
> >> ---
> >>   drivers/gpu/drm/ttm/ttm_bo.c | 51 
> >> ++--
> >>   1 file changed, 30 insertions(+), 21 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> >> index 138c98902033..b2a33bf1ef10 100644
> >> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> >> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> >> @@ -247,23 +247,18 @@ void ttm_bo_move_to_lru_tail(struct 
> >> ttm_buffer_object *bo,
> >>   }
> >>   EXPORT_SYMBOL(ttm_bo_move_to_lru_tail);
> >>   
> >> -static void ttm_bo_bulk_move_helper(struct ttm_lru_bulk_move_pos *pos,
> >> -  struct list_head *lru, bool is_swap)
> >> +static void ttm_list_move_bulk_tail(struct list_head *list,
> >> +  struct list_head *first,
> >> +  struct list_head *last)
> >>   {
> >> -  struct list_head *list;
> >> -  LIST_HEAD(entries);
> >> -  LIST_HEAD(before);
> >> +  first->prev->next = last->next;
> >> +  last->next->prev = first->prev;
> >>   
> >> -  reservation_object_assert_held(pos->last->resv);
> >> -  list = is_swap ? &pos->last->swap : &pos->last->lru;
> >> -  list_cut_position(&entries, lru, list);
> >> +  list->prev->next = first;
> >> +  first->prev = list->prev;
> >>   
> >> -  reservation_object_assert_held(pos->first->resv);
> >> -  list = is_swap ? pos->first->swap.prev : pos->first->lru.prev;
> >> -  list_cut_position(&before, &entries, list);
> >> -
> >> -  list_splice(&entries, lru);
> >> -  list_splice_tail(&before, lru);
> >> +  last->next = list;
> >> +  list->prev = last;
> >>   }
> >>   
> >>   void ttm_bo_bulk_move_lru_tail(struct ttm_lru_bulk_move *bulk)
> >> @@ -271,23 +266,33 @@ void ttm_bo_bulk_move_lru_tail(struct 
> >> ttm_lru_bulk_move *bulk)
> >>unsigned i;
> >>   
> >>for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
> >> +  struct ttm_lru_bulk_move_pos *pos = &bulk->tt[i];
> >>struct ttm_mem_type_manager *man;
> >>   
> >> -  if (!bulk->tt[i].first)
> >> +  if (!pos->first)
> >>continue;
> >>   
> >> -  man = &bulk->tt[i].first->bdev->man[TTM_PL_TT];
> >> -  ttm_bo_bulk_move_helper(&bulk->tt[i], &man->lru[i], false);
> >> +  reservation_object_assert_held(pos->first->resv);
> >> +  reservation_object_assert_held(pos->last->resv);
> >> +
> >> +  man = &pos->first->bdev->man[TTM_PL_TT];
> >> +  ttm_list_move_bulk_tail(&man->lru[i], &pos->first->lru,
> >> +  &pos->last->lru);
> >>}
> >>   
> >>for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
> >> +  struct ttm_lru_bulk_move_pos *pos = &bulk->vram[i];
> >>struct ttm_mem_type_manager *man;
> >>   
> >> -  if (!bulk->vram[i].first)
> >> +  if (!pos->first)
> >>continue;
> >>   
> >> -  man = &bulk->vram[i].first->bdev->man[TTM_PL_VRAM];
> >> -  ttm_bo_bulk_move_helper(&bulk->vram[i], &man->lru[i], false);
> >> +  reservation_object_assert_held(pos->first->resv);
> >> +  reservation_object_assert_held(pos->last->resv);
> >> +
> >> +  man = &pos->first->bdev->man[TTM_PL_VRAM];
> >> +  ttm_list_move_bulk_tail(&man->lru[i], &pos->first->lru,
> >> +  &pos->last->lru);
> >>}
> >>   
> >>for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
> >> @@ -297,8 +302,12 @@ void ttm_bo_bulk_move_lru_tail(struct 
> >> ttm_lru_bulk_move *bulk)
> >>if (!pos->first)
> >>continue;
> >>   
> >> +  reservation_object_assert_held(pos->first->resv);
> >> +  reservation_object_assert_held(pos->last->resv);
> >> +
> >>lru = &pos->first->bdev->glob->swap_lru[i];
> >> -  ttm_bo_bulk_move_helper(&bulk->swap[i], lru, true);
> >> +  ttm_list_move_bulk_tail(lru, 

Re: [PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier v4

2018-09-14 Thread Christian König

Am 13.09.2018 um 23:51 schrieb Felix Kuehling:

On 2018-09-13 04:52 PM, Philip Yang wrote:

Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
DRM_AMDGPU_USERPTR Kconfig.

It supports both KFD userptr and gfx userptr paths.

This depends on several HMM patchsets from Jérôme Glisse queued for
upstream.
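
As a sketch of the structure this moves to (the callback signature follows
the kdoc in the diff below and is otherwise an assumption, not a verified
kernel definition):

	static void sketch_mirror_release(struct hmm_mirror *mirror)
	{
		/* lazy teardown, like amdgpu_hmm_mirror_release() below */
	}

	static int sketch_sync_cpu_device_pagetables(struct hmm_mirror *mirror,
						     const struct hmm_update *update)
	{
		/* invalidate device mappings for [update->start, update->end) */
		return 0;
	}

	static const struct hmm_mirror_ops sketch_mirror_ops = {
		.sync_cpu_device_pagetables = sketch_sync_cpu_device_pagetables,
		.release = sketch_mirror_release,
	};

	/* registration replaces mmu_notifier_register():
	 *	amn->mirror.ops = &sketch_mirror_ops;
	 *	r = hmm_mirror_register(&amn->mirror, mm);
	 */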

Change-Id: Ie62c3c5e3c5b8521ab3b438d1eff2aa2a003835e
Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/amd/amdgpu/Kconfig |   6 +-
  drivers/gpu/drm/amd/amdgpu/Makefile|   2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 121 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h |   2 +-
  4 files changed, 56 insertions(+), 75 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
b/drivers/gpu/drm/amd/amdgpu/Kconfig
index 9221e54..960a633 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -26,10 +26,10 @@ config DRM_AMDGPU_CIK
  config DRM_AMDGPU_USERPTR
bool "Always enable userptr write support"
depends on DRM_AMDGPU
-   select MMU_NOTIFIER
+   select HMM_MIRROR
help
- This option selects CONFIG_MMU_NOTIFIER if it isn't already
- selected to enabled full userptr support.
+ This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it
+ isn't already selected to enabled full userptr support.
  
  config DRM_AMDGPU_GART_DEBUGFS

bool "Allow GART access through debugfs"
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 138cb78..c1e5d43 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -171,7 +171,7 @@ endif
  amdgpu-$(CONFIG_COMPAT) += amdgpu_ioc32.o
  amdgpu-$(CONFIG_VGA_SWITCHEROO) += amdgpu_atpx_handler.o
  amdgpu-$(CONFIG_ACPI) += amdgpu_acpi.o
-amdgpu-$(CONFIG_MMU_NOTIFIER) += amdgpu_mn.o
+amdgpu-$(CONFIG_HMM) += amdgpu_mn.o
  
  include $(FULL_AMD_PATH)/powerplay/Makefile
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c

index e55508b..ad52f34 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -45,7 +45,7 @@
  
  #include 

  #include 
-#include 
+#include 
  #include 
  #include 
  #include 
@@ -66,6 +66,7 @@

Need to remove @mn documentation.


   * @objects: interval tree containing amdgpu_mn_nodes
   * @read_lock: mutex for recursive locking of @lock
   * @recursion: depth of recursion
+ * @mirror: HMM mirror function support
   *
   * Data for each amdgpu device and process address space.
   */
@@ -73,7 +74,6 @@ struct amdgpu_mn {
/* constant after initialisation */
struct amdgpu_device*adev;
struct mm_struct*mm;
-   struct mmu_notifier mn;
enum amdgpu_mn_type type;
  
  	/* only used on destruction */

@@ -87,6 +87,9 @@ struct amdgpu_mn {
struct rb_root_cached   objects;
struct mutexread_lock;
atomic_trecursion;
+
+   /* HMM mirror */
+   struct hmm_mirror   mirror;
  };
  
  /**

@@ -103,7 +106,7 @@ struct amdgpu_mn_node {
  };
  
  /**

- * amdgpu_mn_destroy - destroy the MMU notifier
+ * amdgpu_mn_destroy - destroy the HMM mirror
   *
   * @work: previously sheduled work item
   *
@@ -129,28 +132,26 @@ static void amdgpu_mn_destroy(struct work_struct *work)
}
	up_write(&amn->lock);
	mutex_unlock(&adev->mn_lock);
-   mmu_notifier_unregister_no_release(&amn->mn, amn->mm);
+   hmm_mirror_unregister(&amn->mirror);
+
kfree(amn);
  }
  
  /**

   * amdgpu_mn_release - callback to notify about mm destruction

Update the function name in the comment.


   *
- * @mn: our notifier
- * @mm: the mm this callback is about
+ * @mirror: the HMM mirror (mm) this callback is about
   *
- * Shedule a work item to lazy destroy our notifier.
+ * Shedule a work item to lazy destroy HMM mirror.
   */
-static void amdgpu_mn_release(struct mmu_notifier *mn,
- struct mm_struct *mm)
+static void amdgpu_hmm_mirror_release(struct hmm_mirror *mirror)
  {
-   struct amdgpu_mn *amn = container_of(mn, struct amdgpu_mn, mn);
+   struct amdgpu_mn *amn = container_of(mirror, struct amdgpu_mn, mirror);
  
  	INIT_WORK(&amn->work, amdgpu_mn_destroy);

	schedule_work(&amn->work);
  }
  
-

  /**
   * amdgpu_mn_lock - take the write side lock for this notifier
   *
@@ -237,21 +238,19 @@ static void amdgpu_mn_invalidate_node(struct 
amdgpu_mn_node *node,
  /**
   * amdgpu_mn_invalidate_range_start_gfx - callback to notify about mm change
   *
- * @mn: our notifier
- * @mm: the mm this callback is about
- * @start: start of updated range
- * @end: end of updated range
+ * @mirror: the hmm_mirror (mm) is about to update
+ * @update: the update start, end address
   *
   * Block for operations on BOs to finish and mark pages as accessed and
   * potentially 

[PATCH] drm/amd/dc: Trigger set power state task when display configuration changes

2018-09-14 Thread Rex Zhu
Revert "drm/amd/display: Remove call to amdgpu_pm_compute_clocks"

This reverts commit dcd473770e86517543691bdb227103d6c781cd0a.

When the display configuration changes, DC needs to propagate the changes
to powerplay and also needs to trigger a power state task.
amdgpu_pm_compute_clocks is the interface that triggers the power state
task, whether legacy dpm or powerplay is enabled.

Signed-off-by: Rex Zhu 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
index 6d16b4a..0fab64a 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
@@ -105,6 +105,8 @@ bool dm_pp_apply_display_requirements(
adev->powerplay.pp_funcs->display_configuration_change(
adev->powerplay.pp_handle,
&adev->pm.pm_display_cfg);
+
+   amdgpu_pm_compute_clocks(adev);
}
 
return true;
-- 
1.9.1



Re: [PATCH] drm/amdgpu: simplify Raven, Raven2, and Picasso handling

2018-09-14 Thread Huang Rui
On Thu, Sep 13, 2018 at 03:45:27PM -0500, Alex Deucher wrote:
> Treat them all as Raven rather than adding a new picasso
> asic type.  This simplifies a lot of code and also handles the
> case of rv2 chips with the 0x15d8 pci id.  It also fixes dmcu
> fw handling for picasso.

We drop the Picasso asic type, and keep the separate ucode.
It's fine. We can also support the RV2 PCO refresh with the change.

Acked-by: Huang Rui 

> 
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|  3 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c|  1 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c  |  1 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c|  7 +--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  4 +-
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 32 ++-
>  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 --
>  drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c| 11 ++--
>  drivers/gpu/drm/amd/amdgpu/psp_v10_0.c |  5 +-
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 11 +---
>  drivers/gpu/drm/amd/amdgpu/soc15.c | 66 
> ++
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c  |  8 +--
>  drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c|  1 -
>  .../gpu/drm/amd/powerplay/hwmgr/processpptables.c  |  8 +--
>  include/drm/amd_asic_type.h|  1 -
>  16 files changed, 60 insertions(+), 113 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 762dc5f886cd..354f0557d697 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -91,7 +91,6 @@ static const char *amdgpu_asic_name[] = {
>   "VEGA12",
>   "VEGA20",
>   "RAVEN",
> - "PICASSO",
>   "LAST",
>  };
>  
> @@ -1337,12 +1336,11 @@ static int amdgpu_device_parse_gpu_info_fw(struct 
> amdgpu_device *adev)
>   case CHIP_RAVEN:
>   if (adev->rev_id >= 8)
>   chip_name = "raven2";
> + else if (adev->pdev->device == 0x15d8)
> + chip_name = "picasso";
>   else
>   chip_name = "raven";
>   break;
> - case CHIP_PICASSO:
> - chip_name = "picasso";
> - break;
>   }
>  
>   snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_gpu_info.bin", chip_name);
> @@ -1468,8 +1466,7 @@ static int amdgpu_device_ip_early_init(struct 
> amdgpu_device *adev)
>   case CHIP_VEGA12:
>   case CHIP_VEGA20:
>   case CHIP_RAVEN:
> - case CHIP_PICASSO:
> - if ((adev->asic_type == CHIP_RAVEN) || (adev->asic_type == 
> CHIP_PICASSO))
> + if (adev->asic_type == CHIP_RAVEN)
>   adev->family = AMDGPU_FAMILY_RV;
>   else
>   adev->family = AMDGPU_FAMILY_AI;
> @@ -2183,7 +2180,6 @@ bool amdgpu_device_asic_has_dc_support(enum 
> amd_asic_type asic_type)
>   case CHIP_VEGA20:
>  #if defined(CONFIG_DRM_AMD_DC_DCN1_0)
>   case CHIP_RAVEN:
> - case CHIP_PICASSO:
>  #endif
>   return amdgpu_dc != 0;
>  #endif
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 33e1856fb8cc..ff10df4f50d3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -874,8 +874,7 @@ static const struct pci_device_id pciidlist[] = {
>   {0x1002, 0x66AF, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_VEGA20},
>   /* Raven */
>   {0x1002, 0x15dd, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RAVEN|AMD_IS_APU},
> - /* Picasso */
> - {0x1002, 0x15d8, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_PICASSO|AMD_IS_APU},
> + {0x1002, 0x15d8, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RAVEN|AMD_IS_APU},
>  
>   {0, 0, 0}
>  };
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> index 611c06d3600a..bd397d2916fb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -56,7 +56,6 @@ static int psp_sw_init(void *handle)
>   psp_v3_1_set_psp_funcs(psp);
>   break;
>   case CHIP_RAVEN:
> - case CHIP_PICASSO:
>   psp_v10_0_set_psp_funcs(psp);
>   break;
>   case CHIP_VEGA20:
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> index acb4c66fe89b..1fa8bc337859 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> @@ -303,7 +303,6 @@ amdgpu_ucode_get_load_type(struct amdgpu_device *adev, 
> int load_type)
>   return AMDGPU_FW_LOAD_SMU;
>   case CHIP_VEGA10:
>   case CHIP_RAVEN:
> - case CHIP_PICASSO:
>   case CHIP_VEGA12:
> 

Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

2018-09-14 Thread Christian König

On 2018-09-14 09:46, Zhou, David(ChunMing) wrote:



-Original Message-
From: Koenig, Christian
Sent: Friday, September 14, 2018 3:27 PM
To: Zhou, David(ChunMing) ; Zhou,
David(ChunMing) ; dri-
de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-gfx@lists.freedesktop.org; Daniel Vetter

Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

On 2018-09-14 05:59, zhoucm1 wrote:


On 2018-09-14 11:14, zhoucm1 wrote:


On 2018-09-13 18:22, Christian König wrote:

On 2018-09-13 11:35, Zhou, David(ChunMing) wrote:

-Original Message-
From: Koenig, Christian
Sent: Thursday, September 13, 2018 5:20 PM
To: Zhou, David(ChunMing) ; dri-
de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

Am 13.09.2018 um 11:11 schrieb Zhou, David(ChunMing):

-Original Message-
From: Christian König 
Sent: Thursday, September 13, 2018 4:50 PM
To: Zhou, David(ChunMing) ; Koenig,
Christian ;
dri-de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support
v4

Am 13.09.2018 um 09:43 schrieb Zhou, David(ChunMing):

-Original Message-
From: Koenig, Christian
Sent: Thursday, September 13, 2018 2:56 PM
To: Zhou, David(ChunMing) ; Zhou,
David(ChunMing) ; dri-
de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline
support v4

On 2018-09-13 04:15, zhoucm1 wrote:

On 2018-09-12 19:05, Christian König wrote:

[SNIP]
+static void
+drm_syncobj_find_signal_pt_for_wait_pt(struct drm_syncobj *syncobj,
+					struct drm_syncobj_wait_pt *wait_pt) {

That whole approach still looks horribly complicated to me.

It's already very close to what you said before.


Especially the separation of signal and wait pt is
completely unnecessary as far as I can see.
When a wait pt is requested we just need to search for
the signal point which it will trigger.

Yeah, I tried this, but when I implement the cpu wait ioctl on a
specific point, we need an advanced wait pt fence;
otherwise, we would still need the old syncobj cb.

Why? I mean you just need to call drm_syncobj_find_fence() and when
that one returns NULL you use wait_event_*() to wait for a
signal point >= your wait point to appear and try again.

e.g. when there are 3 syncobjs (A,B,C) to wait on and none of them
has a fence yet, then as you said, while drm_syncobj_find_fence(A)
is working on wait_event, syncobjB and syncobjC could already be
signaled; then we don't know which one was signaled first, which is
needed when the wait ioctl returns.

I don't really see a problem with that. When you wait for the
first one you need to wait for A,B,C at the same time anyway.

So what you do is to register a fence callback on the fences
you already have and for the syncobj which doesn't yet have a
fence you make sure that they wake up your thread when they
get one.

So essentially exactly what
drm_syncobj_fence_get_or_add_callback()
already does today.
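
For illustration, a minimal sketch of the callback pattern described
above, using the existing dma_fence API. The wait_ctx structure and
its names are illustrative, not from the patch:

	#include <linux/dma-fence.h>
	#include <linux/wait.h>

	struct wait_ctx {
		struct dma_fence_cb cb;
		wait_queue_head_t *wq;	/* where the wait ioctl sleeps */
		bool *woken;		/* set when any fence signals */
	};

	static void wait_ctx_wake(struct dma_fence *fence,
				  struct dma_fence_cb *cb)
	{
		struct wait_ctx *ctx = container_of(cb, struct wait_ctx, cb);

		WRITE_ONCE(*ctx->woken, true);
		wake_up(ctx->wq);	/* wake the sleeping waiter */
	}

	/* install on each fence we already have; returns -ENOENT if the
	 * fence is already signaled, in which case no callback is added */
	r = dma_fence_add_callback(fence, &ctx->cb, wait_ctx_wake);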

So do you mean we still need to use the old syncobj CB for that?

Yes, as far as I can see it should work.


     Advanced wait pt is bad?

Well it isn't bad, I just don't see any advantage in it.

The advantage is to replace old syncobj cb.


The existing mechanism
should already be able to handle that.

I thought about it a bit more: we don't need that mechanism at all.
If we use an advanced wait pt, we can easily use a fence array to
achieve it for the wait ioctl. We should use existing kernel features
as much as possible, not invent another, shouldn't we?
I remember you said that before.

Yeah, but the syncobj cb is an existing feature.

This is obviously a workaround when used for the wait ioctl. Do you
see it used anywhere else?


And I absolutely don't see a
need to modify that and replace it with something far more complex.

The wait ioctl is simplified much more by the fence array, not made
more complex, and we just need to allocate a wait pt. If we keep the
old syncobj cb workaround, all the wait pt logic is still there; it
only saves the allocation and wait pt handling, which in fact isn't
the complex part at all. Compared with the ugly syncobj cb, this is
the simpler option.

I strongly disagree on that. You just need to extend the syncobj cb
with the sequence number and you are done.

We could clean that up in the long term by adding some wait_multi
event macro, but for now just adding the sequence number should do
the trick.
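
As a hedged sketch of that suggestion (illustrative only, not from
the patch), the existing callback from drm_syncobj.h could simply
gain the timeline point the waiter cares about:

	struct drm_syncobj_wait_cb {
		struct drm_syncobj_cb base;	/* existing list node + func */
		u64 point;			/* timeline point this waiter needs */
	};

The signal path would then only wake waiters whose point has been
reached.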

Quoting Daniel Vetter's comment on v1: "

Specifically for this stuff here having unified future fence
semantics will allow drivers to do clever stuff with them.

"
I think the advanced wait pt is a similar concept to the 'future fence'
that Daniel Vetter described before, which is obviously the right direction.


Anyway, I will change the patch as you like if no other comment, so
that the patch can pass soon.

When I try to remove the wait pt future fence, I encounter another
problem: drm_syncobj_find_fence cannot get a fence if the signal pt
has already been collected as garbage, and then the CS will report an
error.

Re: [PATCH] drm/amdgpu: reserve GDS resources statically

2018-09-14 Thread Christian König
Well as long as we don't need to save any content it should be trivial 
to implement resource management with the existing code.


I will take a look at why allocating GDS BOs fails at the moment; if it
is something trivial we could still fix it.


Christian.

On 2018-09-13 23:01, Marek Olšák wrote:

To be fair, since we have only 7 user VMIDs and 8 chunks of GDS, we
can make the 8th GDS chunk global and allocatable and use it based on
a CS flag. It would need more work and a lot of testing though. I
don't think we can do the testing part now because of the complexity
of interactions between per-VMID GDS and global GDS, but it's
certainly something that people could add in the future.

Marek

On Thu, Sep 13, 2018 at 3:04 PM, Marek Olšák  wrote:

I was thinking about that too, but it would be too much trouble for
something we don't need.

Marek

On Thu, Sep 13, 2018 at 2:57 PM, Deucher, Alexander
 wrote:

Why don't we just fix up the current GDS code so it works the same as vram
and then we can add a new CS or context flag to ignore the current static
allocation for gfx.  We can ignore data persistence if it's too much
trouble.  Assume you always have to init the memory before you use it.
That's already the case.


Alex
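
For illustration, a minimal sketch of what allocating GDS like VRAM
could look like, assuming GDS ends up managed by TTM the same way as
VRAM. Size, flags and error handling are illustrative only:

	struct amdgpu_bo_param bp;
	struct amdgpu_bo *gds_bo;
	int r;

	memset(&bp, 0, sizeof(bp));
	bp.size = 4096;				/* one GDS chunk, illustrative */
	bp.byte_align = 4;
	bp.domain = AMDGPU_GEM_DOMAIN_GDS;	/* place the BO in GDS */
	bp.flags = AMDGPU_GEM_CREATE_NO_CPU_ACCESS;
	bp.type = ttm_bo_type_device;
	bp.resv = NULL;

	r = amdgpu_bo_create(adev, &bp, &gds_bo);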

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier v3

2018-09-14 Thread Christian König

On 2018-09-13 22:45, Philip Yang wrote:

On 2018-09-13 02:24 PM, Christian König wrote:

On 2018-09-13 20:00, Philip Yang wrote:

Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
DRM_AMDGPU_USERPTR Kconfig.

It supports both KFD userptr and gfx userptr paths.

This depends on several HMM patchsets from Jérôme Glisse queued for
upstream. See
http://172.27.226.38/root/kernel_amd/commits/hmm-dev-v01 (AMD
intranet only)
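
A hedged sketch of the direction this takes: the per-process amdgpu_mn
embeds a struct hmm_mirror and registers hmm_mirror_ops instead of an
mmu_notifier. The exact sync_cpu_device_pagetables() signature depends
on Jérôme's queued HMM patchset, so the details below are assumptions:

	#include <linux/hmm.h>

	static int amdgpu_mn_sync_pagetables_gfx(struct hmm_mirror *mirror,
						 const struct hmm_update *update)
	{
		struct amdgpu_mn *amn =
			container_of(mirror, struct amdgpu_mn, mirror);

		/* invalidate userptr BOs overlapping
		 * [update->start, update->end) */
		return 0;
	}

	static const struct hmm_mirror_ops amdgpu_hmm_mirror_ops = {
		.sync_cpu_device_pagetables = amdgpu_mn_sync_pagetables_gfx,
		.release = amdgpu_hmm_mirror_release,
	};

	/* registration, replacing mmu_notifier_register(): */
	amn->mirror.ops = &amdgpu_hmm_mirror_ops;
	r = hmm_mirror_register(&amn->mirror, mm);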


Change-Id: Ie62c3c5e3c5b8521ab3b438d1eff2aa2a003835e
Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/amd/amdgpu/Kconfig |  6 +--
  drivers/gpu/drm/amd/amdgpu/Makefile    |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 88 
+++---

  3 files changed, 75 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
b/drivers/gpu/drm/amd/amdgpu/Kconfig

index 9221e54..960a633 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -26,10 +26,10 @@ config DRM_AMDGPU_CIK
  config DRM_AMDGPU_USERPTR
  bool "Always enable userptr write support"
  depends on DRM_AMDGPU
-    select MMU_NOTIFIER
+    select HMM_MIRROR
  help
-  This option selects CONFIG_MMU_NOTIFIER if it isn't already
-  selected to enabled full userptr support.
+  This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it
+  isn't already selected to enabled full userptr support.
    config DRM_AMDGPU_GART_DEBUGFS
  bool "Allow GART access through debugfs"
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile

index 138cb78..c1e5d43 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -171,7 +171,7 @@ endif
  amdgpu-$(CONFIG_COMPAT) += amdgpu_ioc32.o
  amdgpu-$(CONFIG_VGA_SWITCHEROO) += amdgpu_atpx_handler.o
  amdgpu-$(CONFIG_ACPI) += amdgpu_acpi.o
-amdgpu-$(CONFIG_MMU_NOTIFIER) += amdgpu_mn.o
+amdgpu-$(CONFIG_HMM) += amdgpu_mn.o
    include $(FULL_AMD_PATH)/powerplay/Makefile
  diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c

index e55508b..ea8671f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -46,6 +46,7 @@
  #include 
  #include 
  #include 
+#include <linux/hmm.h>


Can we now drop including linux/mmu_notifier.h?

Yes, I will use hmm_mirror_ops to replace the gfx and kfd mmu_notifier_ops.


Please drop that and implement the gfx and kfd operations directly.

Thanks,
Christian.


  #include 
  #include 
  #include 
@@ -66,6 +67,7 @@
   * @objects: interval tree containing amdgpu_mn_nodes
   * @read_lock: mutex for recursive locking of @lock
   * @recursion: depth of recursion
+ * @mirror: HMM mirror function support
   *
   * Data for each amdgpu device and process address space.
   */
@@ -87,6 +89,9 @@ struct amdgpu_mn {
  struct rb_root_cached    objects;
  struct mutex    read_lock;
  atomic_t    recursion;
+
+    /* HMM mirror */
+    struct hmm_mirror    mirror;
  };
    /**
@@ -103,7 +108,7 @@ struct amdgpu_mn_node {
  };
    /**
- * amdgpu_mn_destroy - destroy the MMU notifier
+ * amdgpu_mn_destroy - destroy the HMM mirror
   *
   * @work: previously sheduled work item
   *
@@ -129,28 +134,27 @@ static void amdgpu_mn_destroy(struct work_struct *work)

  }
  up_write(&amn->lock);
  mutex_unlock(&adev->mn_lock);
-    mmu_notifier_unregister_no_release(&amn->mn, amn->mm);
+
+    hmm_mirror_unregister(&amn->mirror);
+
  kfree(amn);
  }
    /**
   * amdgpu_mn_release - callback to notify about mm destruction
   *
- * @mn: our notifier
- * @mm: the mm this callback is about
+ * @mirror: the HMM mirror (mm) this callback is about
   *
- * Shedule a work item to lazy destroy our notifier.
+ * Shedule a work item to lazy destroy HMM mirror.
   */
-static void amdgpu_mn_release(struct mmu_notifier *mn,
-  struct mm_struct *mm)
+static void amdgpu_hmm_mirror_release(struct hmm_mirror *mirror)
  {
-    struct amdgpu_mn *amn = container_of(mn, struct amdgpu_mn, mn);
+    struct amdgpu_mn *amn = container_of(mirror, struct amdgpu_mn, mirror);

    INIT_WORK(&amn->work, amdgpu_mn_destroy);
  schedule_work(&amn->work);
  }
  -
  /**
   * amdgpu_mn_lock - take the write side lock for this notifier
   *
@@ -355,12 +359,10 @@ static void amdgpu_mn_invalidate_range_end(struct mmu_notifier *mn,

    static const struct mmu_notifier_ops amdgpu_mn_ops[] = {
  [AMDGPU_MN_TYPE_GFX] = {
-    .release = amdgpu_mn_release,
  .invalidate_range_start = amdgpu_mn_invalidate_range_start_gfx,
  .invalidate_range_end = amdgpu_mn_invalidate_range_end,
  },
  [AMDGPU_MN_TYPE_HSA] = {
-    .release = amdgpu_mn_release,
  .invalidate_range_start = amdgpu_mn_invalidate_range_start_hsa,
  .invalidate_range_end = amdgpu_mn_invalidate_range_end,
  },
@@ -373,12 +375,63 @@ static const struct 

Re: [PATCH] drm/amdgpu/display: return proper error codes in dm

2018-09-14 Thread Huang Rui
On Thu, Sep 13, 2018 at 11:29:28AM -0500, Alex Deucher wrote:
> Replace -1 with proper error codes.
> 
> Signed-off-by: Alex Deucher 

Acked-by: Huang Rui 

> ---
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index eccae63d3ef1..541f33749961 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -493,7 +493,7 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
>  error:
>   amdgpu_dm_fini(adev);
>  
> - return -1;
> + return -EINVAL;
>  }
>  
>  static void amdgpu_dm_fini(struct amdgpu_device *adev)
> @@ -548,7 +548,7 @@ static int load_dmcu_fw(struct amdgpu_device *adev)
>   break;
>   default:
>   DRM_ERROR("Unsupported ASIC type: 0x%X\n", adev->asic_type);
> - return -1;
> + return -EINVAL;
>   }
>  
>   if (adev->firmware.load_type != AMDGPU_FW_LOAD_PSP) {
> @@ -1537,7 +1537,7 @@ static int amdgpu_dm_initialize_drm_device(struct 
> amdgpu_device *adev)
>   link_cnt = dm->dc->caps.max_links;
>   if (amdgpu_dm_mode_config_init(dm->adev)) {
>   DRM_ERROR("DM: Failed to initialize mode config\n");
> - return -1;
> + return -EINVAL;
>   }
>  
>   /* Identify the number of planes to be initialized */
> @@ -1652,7 +1652,7 @@ static int amdgpu_dm_initialize_drm_device(struct 
> amdgpu_device *adev)
>   kfree(aconnector);
>   for (i = 0; i < dm->dc->caps.max_planes; i++)
>   kfree(mode_info->planes[i]);
> - return -1;
> + return -EINVAL;
>  }
>  
>  static void amdgpu_dm_destroy_drm_device(struct amdgpu_display_manager *dm)
> -- 
> 2.13.6
> 
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

2018-09-14 Thread zhoucm1



On 2018-09-13 18:22, Christian König wrote:

On 2018-09-13 11:35, Zhou, David(ChunMing) wrote:



-Original Message-
From: Koenig, Christian
Sent: Thursday, September 13, 2018 5:20 PM
To: Zhou, David(ChunMing) ; dri-
de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

Am 13.09.2018 um 11:11 schrieb Zhou, David(ChunMing):

-Original Message-
From: Christian König 
Sent: Thursday, September 13, 2018 4:50 PM
To: Zhou, David(ChunMing) ; Koenig, Christian
; dri-de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

Am 13.09.2018 um 09:43 schrieb Zhou, David(ChunMing):

-Original Message-
From: Koenig, Christian
Sent: Thursday, September 13, 2018 2:56 PM
To: Zhou, David(ChunMing) ; Zhou,
David(ChunMing) ; dri-
de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

On 2018-09-13 04:15, zhoucm1 wrote:

On 2018-09-12 19:05, Christian König wrote:

[SNIP]
+static void drm_syncobj_find_signal_pt_for_wait_pt(struct drm_syncobj *syncobj,
+					struct drm_syncobj_wait_pt *wait_pt) {

That whole approach still looks horribly complicated to me.

It's already very close to what you said before.


Especially the separation of signal and wait pt is completely
unnecessary as far as I can see.
When a wait pt is requested we just need to search for the
signal point which it will trigger.

Yeah, I tried this, but when I implement the cpu wait ioctl on a
specific point, we need an advanced wait pt fence; otherwise, we
would still need the old syncobj cb.

Why? I mean you just need to call drm_syncobj_find_fence() and when
that one returns NULL you use wait_event_*() to wait for a signal
point >= your wait point to appear and try again.

e.g. when there are 3 syncobjs (A,B,C) to wait on and none of them has
a fence yet, then as you said, while drm_syncobj_find_fence(A) is
working on wait_event, syncobjB and syncobjC could already be
signaled; then we don't know which one was signaled first, which is
needed when the wait ioctl returns.

I don't really see a problem with that. When you wait for the first
one you need to wait for A,B,C at the same time anyway.

So what you do is to register a fence callback on the fences you
already have and for the syncobj which doesn't yet have a fence you
make sure that they wake up your thread when they get one.

So essentially exactly what drm_syncobj_fence_get_or_add_callback()
already does today.

So do you mean we still need to use the old syncobj CB for that?

Yes, as far as I can see it should work.


    Advanced wait pt is bad?

Well it isn't bad, I just don't see any advantage in it.

The advantage is to replace old syncobj cb.


The existing mechanism
should already be able to handle that.
I thought about it a bit more: we don't need that mechanism at all. If
we use an advanced wait pt, we can easily use a fence array to achieve
it for the wait ioctl. We should use existing kernel features as much
as possible, not invent another, shouldn't we?

I remember you said that before.

Yeah, but the syncobj cb is an existing feature.
This is obviously a workaround when used for the wait ioctl. Do you see
it used anywhere else?



And I absolutely don't see a
need to modify that and replace it with something far more complex.
The wait ioctl is simplified much more by the fence array, not made
more complex, and we just need to allocate a wait pt. If we keep the
old syncobj cb workaround, all the wait pt logic is still there; it
only saves the allocation and wait pt handling, which in fact isn't
the complex part at all. Compared with the ugly syncobj cb, this is
the simpler option.


I strongly disagree on that. You just need to extend the syncobj cb 
with the sequence number and you are done.


We could clean that up in the long term by adding some wait_multi 
event macro, but for now just adding the sequence number should do the 
trick.


Quoting Daniel Vetter's comment on v1: "

Specifically for this stuff here having unified future fence semantics
will allow drivers to do clever stuff with them.

"
I think the advanced wait pt is a similar concept to the 'future fence'
that Daniel Vetter described before, which is obviously the right direction.



Anyway, I will change the patch as you like if no other comment, so that 
the patch can pass soon.


Thanks,
David Zhou


Regards,
Christian.



Thanks,
David Zhou

Regards,
Christian.


Thanks,
David Zhou

Christian.


Thanks,
David Zhou

Regards,
Christian.


Back to my implementation: it already fixes all the concerns you raised
before, and it can easily be used in the wait ioctl. If you feel it is
complicated, I guess that is because we merged all the logic and much
cleanup into one patch. In fact, it already is very simple:
timeline_init/fini, create signal/wait_pt, find

Re: [PATCH] drm/amdgpu/soc15: clean up picasso support

2018-09-14 Thread Huang Rui
On Thu, Sep 13, 2018 at 03:07:57PM -0500, Alex Deucher wrote:
> It's the same as raven so remove the duplicate case.
> 
> Signed-off-by: Alex Deucher 

Acked-by: Huang Rui 

> ---
>  drivers/gpu/drm/amd/amdgpu/soc15.c | 17 -
>  1 file changed, 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c 
> b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index f5a44d1fe5da..f930e09071d4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -546,23 +546,6 @@ int soc15_set_ip_blocks(struct amdgpu_device *adev)
>   amdgpu_device_ip_block_add(adev, &vce_v4_0_ip_block);
>   break;
>   case CHIP_RAVEN:
> - amdgpu_device_ip_block_add(adev, &vega10_common_ip_block);
> - amdgpu_device_ip_block_add(adev, &gmc_v9_0_ip_block);
> - amdgpu_device_ip_block_add(adev, &vega10_ih_ip_block);
> - amdgpu_device_ip_block_add(adev, &psp_v10_0_ip_block);
> - amdgpu_device_ip_block_add(adev, &pp_smu_ip_block);
> - if (adev->enable_virtual_display || amdgpu_sriov_vf(adev))
> - amdgpu_device_ip_block_add(adev, &dce_virtual_ip_block);
> -#if defined(CONFIG_DRM_AMD_DC)
> - else if (amdgpu_device_has_dc_support(adev))
> - amdgpu_device_ip_block_add(adev, &dm_ip_block);
> -#else
> -#warning "Enable CONFIG_DRM_AMD_DC for display support on SOC15."
> -#endif
> - amdgpu_device_ip_block_add(adev, &gfx_v9_0_ip_block);
> - amdgpu_device_ip_block_add(adev, &sdma_v4_0_ip_block);
> - amdgpu_device_ip_block_add(adev, &vcn_v1_0_ip_block);
> - break;
>   case CHIP_PICASSO:
>   amdgpu_device_ip_block_add(adev, &vega10_common_ip_block);
>   amdgpu_device_ip_block_add(adev, &gmc_v9_0_ip_block);
> -- 
> 2.13.6
> 
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

2018-09-14 Thread zhoucm1



On 2018-09-14 11:14, zhoucm1 wrote:



On 2018-09-13 18:22, Christian König wrote:

On 2018-09-13 11:35, Zhou, David(ChunMing) wrote:



-Original Message-
From: Koenig, Christian
Sent: Thursday, September 13, 2018 5:20 PM
To: Zhou, David(ChunMing) ; dri-
de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

Am 13.09.2018 um 11:11 schrieb Zhou, David(ChunMing):

-Original Message-
From: Christian König 
Sent: Thursday, September 13, 2018 4:50 PM
To: Zhou, David(ChunMing) ; Koenig, Christian
; dri-de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

Am 13.09.2018 um 09:43 schrieb Zhou, David(ChunMing):

-Original Message-
From: Koenig, Christian
Sent: Thursday, September 13, 2018 2:56 PM
To: Zhou, David(ChunMing) ; Zhou,
David(ChunMing) ; dri-
de...@lists.freedesktop.org
Cc: Dave Airlie ; Rakos, Daniel
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

On 2018-09-13 04:15, zhoucm1 wrote:

On 2018-09-12 19:05, Christian König wrote:

[SNIP]
+static void drm_syncobj_find_signal_pt_for_wait_pt(struct drm_syncobj *syncobj,
+					struct drm_syncobj_wait_pt *wait_pt) {

That whole approach still looks horribly complicated to me.

It's already very close to what you said before.


Especially the separation of signal and wait pt is completely
unnecessary as far as I can see.
When a wait pt is requested we just need to search for the
signal point which it will trigger.

Yeah, I tried this, but when I implement the cpu wait ioctl on a
specific point, we need an advanced wait pt fence; otherwise, we
would still need the old syncobj cb.

Why? I mean you just need to call drm_syncobj_find_fence() and when
that one returns NULL you use wait_event_*() to wait for a signal
point >= your wait point to appear and try again.

e.g. when there are 3 syncobjs (A,B,C) to wait on and none of them has
a fence yet, then as you said, while drm_syncobj_find_fence(A) is
working on wait_event, syncobjB and syncobjC could already be
signaled; then we don't know which one was signaled first, which is
needed when the wait ioctl returns.
I don't really see a problem with that. When you wait for the first
one you need to wait for A,B,C at the same time anyway.

So what you do is to register a fence callback on the fences you
already have and for the syncobj which doesn't yet have a fence you
make sure that they wake up your thread when they get one.

So essentially exactly what drm_syncobj_fence_get_or_add_callback()
already does today.

So do you mean we still need to use the old syncobj CB for that?

Yes, as far as I can see it should work.


    Advanced wait pt is bad?

Well it isn't bad, I just don't see any advantage in it.

The advantage is to replace old syncobj cb.


The existing mechanism
should already be able to handle that.
I thought about it a bit more: we don't need that mechanism at all. If
we use an advanced wait pt, we can easily use a fence array to achieve
it for the wait ioctl. We should use existing kernel features as much
as possible, not invent another, shouldn't we?

I remember you said that before.

Yeah, but the syncobj cb is an existing feature.
This is obviously a workaround when used for the wait ioctl. Do you see
it used anywhere else?



And I absolutely don't see a
need to modify that and replace it with something far more complex.
The wait ioctl is simplified much more by the fence array, not made
more complex, and we just need to allocate a wait pt. If we keep the
old syncobj cb workaround, all the wait pt logic is still there; it
only saves the allocation and wait pt handling, which in fact isn't
the complex part at all. Compared with the ugly syncobj cb, this is
the simpler option.


I strongly disagree on that. You just need to extend the syncobj cb 
with the sequence number and you are done.


We could clean that up in the long term by adding some wait_multi 
event macro, but for now just adding the sequence number should do 
the trick.


Quoting Daniel Vetter's comment on v1: "

Specifically for this stuff here having unified future fence semantics
will allow drivers to do clever stuff with them.

"
I think the advanced wait pt is a similar concept to the 'future fence'
that Daniel Vetter described before, which is obviously the right direction.



Anyway, I will change the patch as you like if no other comment, so 
that the patch can pass soon.
When I try to remove the wait pt future fence, I encounter another
problem: drm_syncobj_find_fence cannot get a fence if the signal pt has
already been collected as garbage, and then the CS will report an
error. Any idea for that?
I still think the future fence is the right thing. Could you give it
further thought? Otherwise, we could need various workarounds.


Thanks,
David Zhou


Thanks,
David Zhou


Regards,
Christian.



Thanks,
David Zhou


Re: [PATCH 0/2] DMCU firmware version storing and access

2018-09-14 Thread Huang Rui
On Thu, Sep 13, 2018 at 03:45:12PM -0400, David Francis wrote:
> David Francis (2):
>   drm/amd/display: Add DMCU firmware version
>   drm/amdgpu: Add DMCU to firmware query interface

Thanks David. With these patches, we can also monitor the ucode via the
amdgpu_firmware_info interface.
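
For illustration, a hedged userspace sketch of reading the new version
back through libdrm once the query lands. AMDGPU_INFO_FW_DMCU is the
query id added by this series; its exact name here follows the
existing AMDGPU_INFO_FW_* convention and is an assumption on my side:

	#include <amdgpu.h>
	#include <amdgpu_drm.h>
	#include <stdio.h>

	static void print_dmcu_fw(amdgpu_device_handle dev)
	{
		uint32_t version = 0, feature = 0;

		/* 0, 0: single DMCU instance, index 0 */
		if (!amdgpu_query_firmware_version(dev, AMDGPU_INFO_FW_DMCU,
						   0, 0, &version, &feature))
			printf("DMCU firmware: 0x%08x, feature: %u\n",
			       version, feature);
	}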

Series are Reviewed-by: Huang Rui 

> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   | 12 
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  2 ++
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  1 +
>  include/uapi/drm/amdgpu_drm.h |  2 ++
>  4 files changed, 17 insertions(+)
> 
> -- 
> 2.17.1
> 
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/2] list: introduce list_bulk_move_tail helper

2018-09-14 Thread Huang Rui
On Thu, Sep 13, 2018 at 01:22:07PM +0200, Christian König wrote:
> Move all entries between @first and including @last before @head.
> 
> This is useful for LRU lists where a whole block of entries should be
> moved to the end of a list.
> 
> Signed-off-by: Christian König 

The bulk move helper is useful for the TTM driver to improve LRU moving
efficiency. Please go ahead with my RB.

Series are Reviewed-and-Tested-by: Huang Rui 

> ---
>  include/linux/list.h | 23 +++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/include/linux/list.h b/include/linux/list.h
> index de04cc5ed536..edb7628e46ed 100644
> --- a/include/linux/list.h
> +++ b/include/linux/list.h
> @@ -183,6 +183,29 @@ static inline void list_move_tail(struct list_head *list,
>   list_add_tail(list, head);
>  }
>  
> +/**
> + * list_bulk_move_tail - move a subsection of a list to its tail
> + * @head: the head that will follow our entry
> + * @first: first entry to move
> + * @last: last entry to move, can be the same as first
> + *
> + * Move all entries between @first and including @last before @head.
> + * All three entries must belong to the same linked list.
> + */
> +static inline void list_bulk_move_tail(struct list_head *head,
> +struct list_head *first,
> +struct list_head *last)
> +{
> + first->prev->next = last->next;
> + last->next->prev = first->prev;
> +
> + head->prev->next = first;
> + first->prev = head->prev;
> +
> + last->next = head;
> + head->prev = last;
> +}
> +
>  /**
>   * list_is_last - tests whether @list is the last entry in list @head
>   * @list: the entry to test
> -- 
> 2.14.1
> 
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
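
For reference, a minimal usage sketch of the new helper; the entry
names are illustrative:

	LIST_HEAD(lru);
	struct list_head e1, e2, e3, e4;

	list_add_tail(&e1, &lru);
	list_add_tail(&e2, &lru);
	list_add_tail(&e3, &lru);
	list_add_tail(&e4, &lru);

	/* lru: e1 e2 e3 e4  ->  e1 e4 e2 e3 */
	list_bulk_move_tail(&lru, &e2, &e3);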
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx