Re: kfdtest failures for amdkfd (amd-staging-drm-next)
You need ROCm 1.9 to work with the upstream KFD. libhsakmt from ROCm 1.8
is incompatible with the upstream KFD ABI.

Where did you get KFDTest? It's part of the same repository on GitHub as
libhsakmt. It's new on the 1.9 branch. You need libhsakmt from the same
branch. The ROCm 1.9 binaries are planned to be released later today if
all goes well.

Regards,
  Felix

On 2018-09-14 07:50 AM, Alexander Frolov wrote:
> Hi!
>
> I am trying to use amd-staging-drm-next to work with amdkfd (built
> into amdgpu) for the AMD Instinct MI25 device.
>
> As a first step I compiled libhsakmt 1.8.x and tried to run kfdtest.
> But it produces lots of failures (see below).
> Here are the results:
>
> ...
> [==========] 76 tests from 14 test cases ran. (80250 ms total)
> [  PASSED  ] 39 tests.
> [  FAILED  ] 37 tests, listed below:
> [  FAILED  ] KFDEvictTest.QueueTest
> [  FAILED  ] KFDGraphicsInterop.RegisterGraphicsHandle
> [  FAILED  ] KFDIPCTest.BasicTest
> [  FAILED  ] KFDIPCTest.CrossMemoryAttachTest
> [  FAILED  ] KFDIPCTest.CMABasicTest
> [  FAILED  ] KFDLocalMemoryTest.BasicTest
> [  FAILED  ] KFDLocalMemoryTest.VerifyContentsAfterUnmapAndMap
> [  FAILED  ] KFDLocalMemoryTest.CheckZeroInitializationVram
> [  FAILED  ] KFDMemoryTest.MapUnmapToNodes
> [  FAILED  ] KFDMemoryTest.MemoryRegisterSamePtr
> [  FAILED  ] KFDMemoryTest.FlatScratchAccess
> [  FAILED  ] KFDMemoryTest.MMBench
> [  FAILED  ] KFDMemoryTest.QueryPointerInfo
> [  FAILED  ] KFDMemoryTest.PtraceAccessInvisibleVram
> [  FAILED  ] KFDMemoryTest.SignalHandling
> [  FAILED  ] KFDQMTest.CreateCpQueue
> [  FAILED  ] KFDQMTest.CreateMultipleSdmaQueues
> [  FAILED  ] KFDQMTest.SdmaConcurrentCopies
> [  FAILED  ] KFDQMTest.CreateMultipleCpQueues
> [  FAILED  ] KFDQMTest.DisableSdmaQueueByUpdateWithNullAddress
> [  FAILED  ] KFDQMTest.DisableCpQueueByUpdateWithZeroPercentage
> [  FAILED  ] KFDQMTest.OverSubscribeCpQueues
> [  FAILED  ] KFDQMTest.BasicCuMaskingEven
> [  FAILED  ] KFDQMTest.QueuePriorityOnDifferentPipe
> [  FAILED  ] KFDQMTest.QueuePriorityOnSamePipe
> [  FAILED  ] KFDQMTest.EmptyDispatch
> [  FAILED  ] KFDQMTest.SimpleWriteDispatch
> [  FAILED  ] KFDQMTest.MultipleCpQueuesStressDispatch
> [  FAILED  ] KFDQMTest.CpuWriteCoherence
> [  FAILED  ] KFDQMTest.CreateAqlCpQueue
> [  FAILED  ] KFDQMTest.QueueLatency
> [  FAILED  ] KFDQMTest.CpQueueWraparound
> [  FAILED  ] KFDQMTest.SdmaQueueWraparound
> [  FAILED  ] KFDQMTest.Atomics
> [  FAILED  ] KFDQMTest.P2PTest
> [  FAILED  ] KFDQMTest.SdmaEventInterrupt
> [  FAILED  ] KFDTopologyTest.BasicTest
>
> Does it mean that the current amdkfd from the kernel can't be used with
> libhsakmt 1.8.x? Or am I doing something wrong...
> Thank you!
>
> Best,
>   Alexander

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH v2 05/17] compat_ioctl: move more drivers to generic_compat_ioctl_ptrarg
On Wed, Sep 12, 2018 at 05:08:52PM +0200, Arnd Bergmann wrote:
> The .ioctl and .compat_ioctl file operations have the same prototype so
> they can both point to the same function, which works great almost all
> the time when all the commands are compatible.
>
> One exception is the s390 architecture, where a compat pointer is only
> 31 bit wide, and converting it into a 64-bit pointer requires calling
> compat_ptr(). Most drivers here will never run on s390, but since we now
> have a generic helper for it, it's easy enough to use it consistently.
>
> I double-checked all these drivers to ensure that all ioctl arguments
> are used as pointers or are ignored, but are not interpreted as integer
> values.
>
> Signed-off-by: Arnd Bergmann
> ---
> ...
>  drivers/platform/x86/wmi.c | 2 +-
> ...
>  static void link_event_work(struct work_struct *work)
> diff --git a/drivers/platform/x86/wmi.c b/drivers/platform/x86/wmi.c
> index 04791ea5d97b..e4d0697e07d6 100644
> --- a/drivers/platform/x86/wmi.c
> +++ b/drivers/platform/x86/wmi.c
> @@ -886,7 +886,7 @@ static const struct file_operations wmi_fops = {
>  	.read = wmi_char_read,
>  	.open = wmi_char_open,
>  	.unlocked_ioctl = wmi_ioctl,
> -	.compat_ioctl = wmi_ioctl,
> +	.compat_ioctl = generic_compat_ioctl_ptrarg,
>  };

For platform/drivers/x86:

Acked-by: Darren Hart (VMware)

As for a longer term solution, would it be possible to init fops in such
a way that the compat_ioctl call defaults to generic_compat_ioctl_ptrarg
so we don't have to duplicate this boilerplate for every ioctl fops
structure?

-- 
Darren Hart
VMware Open Source Technology Center
Re: [PATCH 2/7] drm/amdgpu: fix up GDS/GWS/OA shifting
On Fri, Sep 14, 2018 at 3:12 PM Christian König wrote: > > That only worked by pure coincident. Completely remove the shifting and > always apply correct PAGE_SHIFT. > > Signed-off-by: Christian König > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 12 ++-- > drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h| 7 --- > drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c| 12 +++- > drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 14 +++--- > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 15 +++ > drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 9 - > drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 9 - > drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 12 +--- > 9 files changed, 25 insertions(+), 71 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > index d762d78e5102..8836186eb5ef 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > @@ -721,16 +721,16 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser > *p, > e->bo_va = amdgpu_vm_bo_find(vm, ttm_to_amdgpu_bo(e->tv.bo)); > > if (gds) { > - p->job->gds_base = amdgpu_bo_gpu_offset(gds); > - p->job->gds_size = amdgpu_bo_size(gds); > + p->job->gds_base = amdgpu_bo_gpu_offset(gds) >> PAGE_SHIFT; > + p->job->gds_size = amdgpu_bo_size(gds) >> PAGE_SHIFT; > } > if (gws) { > - p->job->gws_base = amdgpu_bo_gpu_offset(gws); > - p->job->gws_size = amdgpu_bo_size(gws); > + p->job->gws_base = amdgpu_bo_gpu_offset(gws) >> PAGE_SHIFT; > + p->job->gws_size = amdgpu_bo_size(gws) >> PAGE_SHIFT; > } > if (oa) { > - p->job->oa_base = amdgpu_bo_gpu_offset(oa); > - p->job->oa_size = amdgpu_bo_size(oa); > + p->job->oa_base = amdgpu_bo_gpu_offset(oa) >> PAGE_SHIFT; > + p->job->oa_size = amdgpu_bo_size(oa) >> PAGE_SHIFT; > } > > if (!r && p->uf_entry.tv.bo) { > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h > index e73728d90388..ecbcefe49a98 100644 > --- 
a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h > @@ -24,13 +24,6 @@ > #ifndef __AMDGPU_GDS_H__ > #define __AMDGPU_GDS_H__ > > -/* Because TTM request that alloacted buffer should be PAGE_SIZE aligned, > - * we should report GDS/GWS/OA size as PAGE_SIZE aligned > - * */ > -#define AMDGPU_GDS_SHIFT 2 > -#define AMDGPU_GWS_SHIFT PAGE_SHIFT > -#define AMDGPU_OA_SHIFTPAGE_SHIFT > - > struct amdgpu_ring; > struct amdgpu_bo; > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c > index d30a0838851b..7b3d1ebda9df 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c > @@ -244,16 +244,10 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, > void *data, > return -EINVAL; > } > flags |= AMDGPU_GEM_CREATE_NO_CPU_ACCESS; > - if (args->in.domains == AMDGPU_GEM_DOMAIN_GDS) > - size = size << AMDGPU_GDS_SHIFT; > - else if (args->in.domains == AMDGPU_GEM_DOMAIN_GWS) > - size = size << AMDGPU_GWS_SHIFT; > - else if (args->in.domains == AMDGPU_GEM_DOMAIN_OA) > - size = size << AMDGPU_OA_SHIFT; > - else > - return -EINVAL; > + /* GDS allocations must be DW aligned */ > + if (args->in.domains & AMDGPU_GEM_DOMAIN_GDS) > + size = ALIGN(size, 4); > } > - size = roundup(size, PAGE_SIZE); > > if (flags & AMDGPU_GEM_CREATE_VM_ALWAYS_VALID) { > r = amdgpu_bo_reserve(vm->root.base.bo, false); > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > index b766270d86cb..64cc483db973 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > @@ -528,13 +528,13 @@ static int amdgpu_info_ioctl(struct drm_device *dev, > void *data, struct drm_file > struct drm_amdgpu_info_gds gds_info; > > memset(_info, 0, sizeof(gds_info)); > - gds_info.gds_gfx_partition_size = > adev->gds.mem.gfx_partition_size >> AMDGPU_GDS_SHIFT; > - gds_info.compute_partition_size = > 
adev->gds.mem.cs_partition_size >> AMDGPU_GDS_SHIFT; > - gds_info.gds_total_size = adev->gds.mem.total_size >> > AMDGPU_GDS_SHIFT; > - gds_info.gws_per_gfx_partition = > adev->gds.gws.gfx_partition_size >> AMDGPU_GWS_SHIFT; > -
Re: [PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier v4
On 2018-09-14 01:52 PM, Christian König wrote: > Am 14.09.2018 um 19:47 schrieb Philip Yang: >> On 2018-09-14 03:51 AM, Christian König wrote: >>> Am 13.09.2018 um 23:51 schrieb Felix Kuehling: On 2018-09-13 04:52 PM, Philip Yang wrote: > Replace our MMU notifier with > hmm_mirror_ops.sync_cpu_device_pagetables > callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in > DRM_AMDGPU_USERPTR Kconfig. > > It supports both KFD userptr and gfx userptr paths. > > This depends on several HMM patchset from Jérôme Glisse queued for > upstream. > > Change-Id: Ie62c3c5e3c5b8521ab3b438d1eff2aa2a003835e > Signed-off-by: Philip Yang > --- > drivers/gpu/drm/amd/amdgpu/Kconfig | 6 +- > drivers/gpu/drm/amd/amdgpu/Makefile | 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 121 > ++--- > drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h | 2 +- > 4 files changed, 56 insertions(+), 75 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig > b/drivers/gpu/drm/amd/amdgpu/Kconfig > index 9221e54..960a633 100644 > --- a/drivers/gpu/drm/amd/amdgpu/Kconfig > +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig > @@ -26,10 +26,10 @@ config DRM_AMDGPU_CIK > config DRM_AMDGPU_USERPTR > bool "Always enable userptr write support" > depends on DRM_AMDGPU > - select MMU_NOTIFIER > + select HMM_MIRROR > help > - This option selects CONFIG_MMU_NOTIFIER if it isn't already > - selected to enabled full userptr support. > + This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it > + isn't already selected to enabled full userptr support. 
> config DRM_AMDGPU_GART_DEBUGFS > bool "Allow GART access through debugfs" > diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile > b/drivers/gpu/drm/amd/amdgpu/Makefile > index 138cb78..c1e5d43 100644 > --- a/drivers/gpu/drm/amd/amdgpu/Makefile > +++ b/drivers/gpu/drm/amd/amdgpu/Makefile > @@ -171,7 +171,7 @@ endif > amdgpu-$(CONFIG_COMPAT) += amdgpu_ioc32.o > amdgpu-$(CONFIG_VGA_SWITCHEROO) += amdgpu_atpx_handler.o > amdgpu-$(CONFIG_ACPI) += amdgpu_acpi.o > -amdgpu-$(CONFIG_MMU_NOTIFIER) += amdgpu_mn.o > +amdgpu-$(CONFIG_HMM) += amdgpu_mn.o > include $(FULL_AMD_PATH)/powerplay/Makefile > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c > index e55508b..ad52f34 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c > @@ -45,7 +45,7 @@ > #include > #include > -#include > +#include > #include > #include > #include > @@ -66,6 +66,7 @@ Need to remove @mn documentation. > * @objects: interval tree containing amdgpu_mn_nodes > * @read_lock: mutex for recursive locking of @lock > * @recursion: depth of recursion > + * @mirror: HMM mirror function support > * > * Data for each amdgpu device and process address space. 
> */ > @@ -73,7 +74,6 @@ struct amdgpu_mn { > /* constant after initialisation */ > struct amdgpu_device *adev; > struct mm_struct *mm; > - struct mmu_notifier mn; > enum amdgpu_mn_type type; > /* only used on destruction */ > @@ -87,6 +87,9 @@ struct amdgpu_mn { > struct rb_root_cached objects; > struct mutex read_lock; > atomic_t recursion; > + > + /* HMM mirror */ > + struct hmm_mirror mirror; > }; > /** > @@ -103,7 +106,7 @@ struct amdgpu_mn_node { > }; > /** > - * amdgpu_mn_destroy - destroy the MMU notifier > + * amdgpu_mn_destroy - destroy the HMM mirror > * > * @work: previously sheduled work item > * > @@ -129,28 +132,26 @@ static void amdgpu_mn_destroy(struct > work_struct *work) > } > up_write(>lock); > mutex_unlock(>mn_lock); > - mmu_notifier_unregister_no_release(>mn, amn->mm); > + hmm_mirror_unregister(>mirror); > + > kfree(amn); > } > /** > * amdgpu_mn_release - callback to notify about mm destruction Update the function name in the comment. > * > - * @mn: our notifier > - * @mm: the mm this callback is about > + * @mirror: the HMM mirror (mm) this callback is about > * > - * Shedule a work item to lazy destroy our notifier. > + * Shedule a work item to lazy destroy HMM mirror. > */ > -static void amdgpu_mn_release(struct mmu_notifier *mn, > - struct mm_struct *mm) > +static void amdgpu_hmm_mirror_release(struct hmm_mirror *mirror) > { > - struct amdgpu_mn *amn =
Re: [PATCH 7/7] drm/amdgpu: move reserving GDS/GWS/OA into common code
On Fri, Sep 14, 2018 at 3:13 PM Christian König wrote: > > We don't need that in the per ASIC code. > > Signed-off-by: Christian König Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 18 ++ > drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 19 --- > drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 19 --- > drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 19 --- > 4 files changed, 18 insertions(+), 57 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > index 438390fce714..cf93a9831318 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > @@ -1848,6 +1848,12 @@ int amdgpu_ttm_init(struct amdgpu_device *adev) > return r; > } > > + r = amdgpu_bo_create_kernel(adev, adev->gds.mem.gfx_partition_size, > + PAGE_SIZE, AMDGPU_GEM_DOMAIN_GDS, > + >gds.gds_gfx_bo, NULL, NULL); > + if (r) > + return r; > + > r = ttm_bo_init_mm(>mman.bdev, AMDGPU_PL_GWS, >adev->gds.gws.total_size); > if (r) { > @@ -1855,6 +1861,12 @@ int amdgpu_ttm_init(struct amdgpu_device *adev) > return r; > } > > + r = amdgpu_bo_create_kernel(adev, adev->gds.gws.gfx_partition_size, > + PAGE_SIZE, AMDGPU_GEM_DOMAIN_GWS, > + >gds.gws_gfx_bo, NULL, NULL); > + if (r) > + return r; > + > r = ttm_bo_init_mm(>mman.bdev, AMDGPU_PL_OA, >adev->gds.oa.total_size); > if (r) { > @@ -1862,6 +1874,12 @@ int amdgpu_ttm_init(struct amdgpu_device *adev) > return r; > } > > + r = amdgpu_bo_create_kernel(adev, adev->gds.oa.gfx_partition_size, > + PAGE_SIZE, AMDGPU_GEM_DOMAIN_OA, > + >gds.oa_gfx_bo, NULL, NULL); > + if (r) > + return r; > + > /* Register debugfs entries for amdgpu_ttm */ > r = amdgpu_ttm_debugfs_init(adev); > if (r) { > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c > b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c > index c0f9732cbaf7..fc39ebbc9d9f 100644 > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c > @@ -4582,25 +4582,6 @@ static int gfx_v7_0_sw_init(void 
*handle) > } > } > > - /* reserve GDS, GWS and OA resource for gfx */ > - r = amdgpu_bo_create_kernel(adev, adev->gds.mem.gfx_partition_size, > - PAGE_SIZE, AMDGPU_GEM_DOMAIN_GDS, > - >gds.gds_gfx_bo, NULL, NULL); > - if (r) > - return r; > - > - r = amdgpu_bo_create_kernel(adev, adev->gds.gws.gfx_partition_size, > - PAGE_SIZE, AMDGPU_GEM_DOMAIN_GWS, > - >gds.gws_gfx_bo, NULL, NULL); > - if (r) > - return r; > - > - r = amdgpu_bo_create_kernel(adev, adev->gds.oa.gfx_partition_size, > - PAGE_SIZE, AMDGPU_GEM_DOMAIN_OA, > - >gds.oa_gfx_bo, NULL, NULL); > - if (r) > - return r; > - > adev->gfx.ce_ram_size = 0x8000; > > gfx_v7_0_gpu_early_init(adev); > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c > b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c > index 57e4b14e3bd1..5d9fd2c2c244 100644 > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c > @@ -2161,25 +2161,6 @@ static int gfx_v8_0_sw_init(void *handle) > if (r) > return r; > > - /* reserve GDS, GWS and OA resource for gfx */ > - r = amdgpu_bo_create_kernel(adev, adev->gds.mem.gfx_partition_size, > - PAGE_SIZE, AMDGPU_GEM_DOMAIN_GDS, > - >gds.gds_gfx_bo, NULL, NULL); > - if (r) > - return r; > - > - r = amdgpu_bo_create_kernel(adev, adev->gds.gws.gfx_partition_size, > - PAGE_SIZE, AMDGPU_GEM_DOMAIN_GWS, > - >gds.gws_gfx_bo, NULL, NULL); > - if (r) > - return r; > - > - r = amdgpu_bo_create_kernel(adev, adev->gds.oa.gfx_partition_size, > - PAGE_SIZE, AMDGPU_GEM_DOMAIN_OA, > - >gds.oa_gfx_bo, NULL, NULL); > - if (r) > - return r; > - > adev->gfx.ce_ram_size = 0x8000; > > r = gfx_v8_0_gpu_early_init(adev); > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c > b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c > index d31a2bc00d61..c075c0b6bb2b 100644 > ---
Re: [PATCH 6/7] drm/amdgpu: drop size check
On Fri, Sep 14, 2018 at 3:13 PM Christian König wrote: > > We no don't allocate zero sized kernel BOs any longer. > > Signed-off-by: Christian König Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 14 ++ > 1 file changed, 6 insertions(+), 8 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > index 710e7751c567..438390fce714 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > @@ -1809,14 +1809,12 @@ int amdgpu_ttm_init(struct amdgpu_device *adev) > * This is used for VGA emulation and pre-OS scanout buffers to > * avoid display artifacts while transitioning between pre-OS > * and driver. */ > - if (adev->gmc.stolen_size) { > - r = amdgpu_bo_create_kernel(adev, adev->gmc.stolen_size, > PAGE_SIZE, > - AMDGPU_GEM_DOMAIN_VRAM, > - >stolen_vga_memory, > - NULL, NULL); > - if (r) > - return r; > - } > + r = amdgpu_bo_create_kernel(adev, adev->gmc.stolen_size, PAGE_SIZE, > + AMDGPU_GEM_DOMAIN_VRAM, > + >stolen_vga_memory, > + NULL, NULL); > + if (r) > + return r; > DRM_INFO("amdgpu: %uM of VRAM memory ready\n", > (unsigned) (adev->gmc.real_vram_size / (1024 * 1024))); > > -- > 2.14.1 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 5/7] drm/amdgpu: don't allocate zero sized kernel BOs
On Fri, Sep 14, 2018 at 3:12 PM Christian König wrote: > > Just free the BO if the size is should be zero. > > Signed-off-by: Christian König Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > index e1f32a196f6d..d282e923d1b4 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > @@ -250,6 +250,11 @@ int amdgpu_bo_create_reserved(struct amdgpu_device *adev, > bool free = false; > int r; > > + if (!size) { > + amdgpu_bo_unref(bo_ptr); > + return 0; > + } > + > memset(, 0, sizeof(bp)); > bp.size = size; > bp.byte_align = align; > -- > 2.14.1 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 4/7] drm/amdgpu: initialize GDS/GWS/OA domains even when they are zero sized
On Fri, Sep 14, 2018 at 3:12 PM Christian König wrote: > > Stops crashing on SI. > > Signed-off-by: Christian König Presumably ttm allows this? Acked-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 48 > + > 1 file changed, 18 insertions(+), 30 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > index 3e450159fe1f..710e7751c567 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > @@ -1843,34 +1843,25 @@ int amdgpu_ttm_init(struct amdgpu_device *adev) > (unsigned)(gtt_size / (1024 * 1024))); > > /* Initialize various on-chip memory pools */ > - /* GDS Memory */ > - if (adev->gds.mem.total_size) { > - r = ttm_bo_init_mm(>mman.bdev, AMDGPU_PL_GDS, > - adev->gds.mem.total_size); > - if (r) { > - DRM_ERROR("Failed initializing GDS heap.\n"); > - return r; > - } > + r = ttm_bo_init_mm(>mman.bdev, AMDGPU_PL_GDS, > + adev->gds.mem.total_size); > + if (r) { > + DRM_ERROR("Failed initializing GDS heap.\n"); > + return r; > } > > - /* GWS */ > - if (adev->gds.gws.total_size) { > - r = ttm_bo_init_mm(>mman.bdev, AMDGPU_PL_GWS, > - adev->gds.gws.total_size); > - if (r) { > - DRM_ERROR("Failed initializing gws heap.\n"); > - return r; > - } > + r = ttm_bo_init_mm(>mman.bdev, AMDGPU_PL_GWS, > + adev->gds.gws.total_size); > + if (r) { > + DRM_ERROR("Failed initializing gws heap.\n"); > + return r; > } > > - /* OA */ > - if (adev->gds.oa.total_size) { > - r = ttm_bo_init_mm(>mman.bdev, AMDGPU_PL_OA, > - adev->gds.oa.total_size); > - if (r) { > - DRM_ERROR("Failed initializing oa heap.\n"); > - return r; > - } > + r = ttm_bo_init_mm(>mman.bdev, AMDGPU_PL_OA, > + adev->gds.oa.total_size); > + if (r) { > + DRM_ERROR("Failed initializing oa heap.\n"); > + return r; > } > > /* Register debugfs entries for amdgpu_ttm */ > @@ -1907,12 +1898,9 @@ void amdgpu_ttm_fini(struct amdgpu_device *adev) > > ttm_bo_clean_mm(>mman.bdev, TTM_PL_VRAM); > 
ttm_bo_clean_mm(>mman.bdev, TTM_PL_TT); > - if (adev->gds.mem.total_size) > - ttm_bo_clean_mm(>mman.bdev, AMDGPU_PL_GDS); > - if (adev->gds.gws.total_size) > - ttm_bo_clean_mm(>mman.bdev, AMDGPU_PL_GWS); > - if (adev->gds.oa.total_size) > - ttm_bo_clean_mm(>mman.bdev, AMDGPU_PL_OA); > + ttm_bo_clean_mm(>mman.bdev, AMDGPU_PL_GDS); > + ttm_bo_clean_mm(>mman.bdev, AMDGPU_PL_GWS); > + ttm_bo_clean_mm(>mman.bdev, AMDGPU_PL_OA); > ttm_bo_device_release(>mman.bdev); > amdgpu_ttm_global_fini(adev); > adev->mman.initialized = false; > -- > 2.14.1 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 3/7] drm/amdgpu: stop crashing on GDS/GWS/OA eviction
On Fri, Sep 14, 2018 at 3:13 PM Christian König wrote: > > Simply ignore any copying here. > > Signed-off-by: Christian König Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 18 ++ > 1 file changed, 18 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > index c691275cd1f0..3e450159fe1f 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > @@ -256,6 +256,13 @@ static void amdgpu_evict_flags(struct ttm_buffer_object > *bo, > > abo = ttm_to_amdgpu_bo(bo); > switch (bo->mem.mem_type) { > + case AMDGPU_PL_GDS: > + case AMDGPU_PL_GWS: > + case AMDGPU_PL_OA: > + placement->num_placement = 0; > + placement->num_busy_placement = 0; > + return; > + > case TTM_PL_VRAM: > if (!adev->mman.buffer_funcs_enabled) { > /* Move to system memory */ > @@ -283,6 +290,7 @@ static void amdgpu_evict_flags(struct ttm_buffer_object > *bo, > case TTM_PL_TT: > default: > amdgpu_bo_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_CPU); > + break; > } > *placement = abo->placement; > } > @@ -675,6 +683,16 @@ static int amdgpu_bo_move(struct ttm_buffer_object *bo, > bool evict, > amdgpu_move_null(bo, new_mem); > return 0; > } > + if (old_mem->mem_type == AMDGPU_PL_GDS || > + old_mem->mem_type == AMDGPU_PL_GWS || > + old_mem->mem_type == AMDGPU_PL_OA || > + new_mem->mem_type == AMDGPU_PL_GDS || > + new_mem->mem_type == AMDGPU_PL_GWS || > + new_mem->mem_type == AMDGPU_PL_OA) { > + /* Nothing to save here */ > + amdgpu_move_null(bo, new_mem); > + return 0; > + } > > if (!adev->mman.buffer_funcs_enabled) > goto memcpy; > -- > 2.14.1 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 1/7] drm/amdgpu: add GDS, GWS and OA debugfs files
On Fri, Sep 14, 2018 at 3:12 PM Christian König wrote: > > Additional to the existing files for VRAM and GTT. > > Signed-off-by: Christian König Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 ++-- > 1 file changed, 6 insertions(+), 6 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > index f12ae6b525b9..1565344cc139 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > @@ -2208,7 +2208,7 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo, > static int amdgpu_mm_dump_table(struct seq_file *m, void *data) > { > struct drm_info_node *node = (struct drm_info_node *)m->private; > - unsigned ttm_pl = *(int *)node->info_ent->data; > + unsigned ttm_pl = (uintptr_t)node->info_ent->data; > struct drm_device *dev = node->minor->dev; > struct amdgpu_device *adev = dev->dev_private; > struct ttm_mem_type_manager *man = >mman.bdev.man[ttm_pl]; > @@ -2218,12 +2218,12 @@ static int amdgpu_mm_dump_table(struct seq_file *m, > void *data) > return 0; > } > > -static int ttm_pl_vram = TTM_PL_VRAM; > -static int ttm_pl_tt = TTM_PL_TT; > - > static const struct drm_info_list amdgpu_ttm_debugfs_list[] = { > - {"amdgpu_vram_mm", amdgpu_mm_dump_table, 0, _pl_vram}, > - {"amdgpu_gtt_mm", amdgpu_mm_dump_table, 0, _pl_tt}, > + {"amdgpu_vram_mm", amdgpu_mm_dump_table, 0, (void *)TTM_PL_VRAM}, > + {"amdgpu_gtt_mm", amdgpu_mm_dump_table, 0, (void *)TTM_PL_TT}, > + {"amdgpu_gds_mm", amdgpu_mm_dump_table, 0, (void *)AMDGPU_PL_GDS}, > + {"amdgpu_gws_mm", amdgpu_mm_dump_table, 0, (void *)AMDGPU_PL_GWS}, > + {"amdgpu_oa_mm", amdgpu_mm_dump_table, 0, (void *)AMDGPU_PL_OA}, > {"ttm_page_pool", ttm_page_alloc_debugfs, 0, NULL}, > #ifdef CONFIG_SWIOTLB > {"ttm_dma_page_pool", ttm_dma_page_alloc_debugfs, 0, NULL} > -- > 2.14.1 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing 
list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 5/7] drm/amdgpu: don't allocate zero sized kernel BOs
Just free the BO if the size should be zero.

Signed-off-by: Christian König
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index e1f32a196f6d..d282e923d1b4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -250,6 +250,11 @@ int amdgpu_bo_create_reserved(struct amdgpu_device *adev,
 	bool free = false;
 	int r;
 
+	if (!size) {
+		amdgpu_bo_unref(bo_ptr);
+		return 0;
+	}
+
 	memset(&bp, 0, sizeof(bp));
 	bp.size = size;
 	bp.byte_align = align;
-- 
2.14.1
[PATCH 7/7] drm/amdgpu: move reserving GDS/GWS/OA into common code
We don't need that in the per ASIC code. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 18 ++ drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 19 --- drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 19 --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 19 --- 4 files changed, 18 insertions(+), 57 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 438390fce714..cf93a9831318 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -1848,6 +1848,12 @@ int amdgpu_ttm_init(struct amdgpu_device *adev) return r; } + r = amdgpu_bo_create_kernel(adev, adev->gds.mem.gfx_partition_size, + PAGE_SIZE, AMDGPU_GEM_DOMAIN_GDS, + >gds.gds_gfx_bo, NULL, NULL); + if (r) + return r; + r = ttm_bo_init_mm(>mman.bdev, AMDGPU_PL_GWS, adev->gds.gws.total_size); if (r) { @@ -1855,6 +1861,12 @@ int amdgpu_ttm_init(struct amdgpu_device *adev) return r; } + r = amdgpu_bo_create_kernel(adev, adev->gds.gws.gfx_partition_size, + PAGE_SIZE, AMDGPU_GEM_DOMAIN_GWS, + >gds.gws_gfx_bo, NULL, NULL); + if (r) + return r; + r = ttm_bo_init_mm(>mman.bdev, AMDGPU_PL_OA, adev->gds.oa.total_size); if (r) { @@ -1862,6 +1874,12 @@ int amdgpu_ttm_init(struct amdgpu_device *adev) return r; } + r = amdgpu_bo_create_kernel(adev, adev->gds.oa.gfx_partition_size, + PAGE_SIZE, AMDGPU_GEM_DOMAIN_OA, + >gds.oa_gfx_bo, NULL, NULL); + if (r) + return r; + /* Register debugfs entries for amdgpu_ttm */ r = amdgpu_ttm_debugfs_init(adev); if (r) { diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c index c0f9732cbaf7..fc39ebbc9d9f 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c @@ -4582,25 +4582,6 @@ static int gfx_v7_0_sw_init(void *handle) } } - /* reserve GDS, GWS and OA resource for gfx */ - r = amdgpu_bo_create_kernel(adev, adev->gds.mem.gfx_partition_size, - PAGE_SIZE, AMDGPU_GEM_DOMAIN_GDS, - >gds.gds_gfx_bo, NULL, NULL); 
- if (r) - return r; - - r = amdgpu_bo_create_kernel(adev, adev->gds.gws.gfx_partition_size, - PAGE_SIZE, AMDGPU_GEM_DOMAIN_GWS, - >gds.gws_gfx_bo, NULL, NULL); - if (r) - return r; - - r = amdgpu_bo_create_kernel(adev, adev->gds.oa.gfx_partition_size, - PAGE_SIZE, AMDGPU_GEM_DOMAIN_OA, - >gds.oa_gfx_bo, NULL, NULL); - if (r) - return r; - adev->gfx.ce_ram_size = 0x8000; gfx_v7_0_gpu_early_init(adev); diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c index 57e4b14e3bd1..5d9fd2c2c244 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c @@ -2161,25 +2161,6 @@ static int gfx_v8_0_sw_init(void *handle) if (r) return r; - /* reserve GDS, GWS and OA resource for gfx */ - r = amdgpu_bo_create_kernel(adev, adev->gds.mem.gfx_partition_size, - PAGE_SIZE, AMDGPU_GEM_DOMAIN_GDS, - >gds.gds_gfx_bo, NULL, NULL); - if (r) - return r; - - r = amdgpu_bo_create_kernel(adev, adev->gds.gws.gfx_partition_size, - PAGE_SIZE, AMDGPU_GEM_DOMAIN_GWS, - >gds.gws_gfx_bo, NULL, NULL); - if (r) - return r; - - r = amdgpu_bo_create_kernel(adev, adev->gds.oa.gfx_partition_size, - PAGE_SIZE, AMDGPU_GEM_DOMAIN_OA, - >gds.oa_gfx_bo, NULL, NULL); - if (r) - return r; - adev->gfx.ce_ram_size = 0x8000; r = gfx_v8_0_gpu_early_init(adev); diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c index d31a2bc00d61..c075c0b6bb2b 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c @@ -1700,25 +1700,6 @@ static int gfx_v9_0_sw_init(void *handle) if (r) return r; - /* reserve GDS, GWS and OA resource for gfx */ - r = amdgpu_bo_create_kernel(adev, adev->gds.mem.gfx_partition_size, -
[PATCH 3/7] drm/amdgpu: stop crashing on GDS/GWS/OA eviction
Simply ignore any copying here.

Signed-off-by: Christian König
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index c691275cd1f0..3e450159fe1f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -256,6 +256,13 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
 
 	abo = ttm_to_amdgpu_bo(bo);
 	switch (bo->mem.mem_type) {
+	case AMDGPU_PL_GDS:
+	case AMDGPU_PL_GWS:
+	case AMDGPU_PL_OA:
+		placement->num_placement = 0;
+		placement->num_busy_placement = 0;
+		return;
+
 	case TTM_PL_VRAM:
 		if (!adev->mman.buffer_funcs_enabled) {
 			/* Move to system memory */
@@ -283,6 +290,7 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
 	case TTM_PL_TT:
 	default:
 		amdgpu_bo_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_CPU);
+		break;
 	}
 	*placement = abo->placement;
 }
@@ -675,6 +683,16 @@ static int amdgpu_bo_move(struct ttm_buffer_object *bo, bool evict,
 		amdgpu_move_null(bo, new_mem);
 		return 0;
 	}
+	if (old_mem->mem_type == AMDGPU_PL_GDS ||
+	    old_mem->mem_type == AMDGPU_PL_GWS ||
+	    old_mem->mem_type == AMDGPU_PL_OA ||
+	    new_mem->mem_type == AMDGPU_PL_GDS ||
+	    new_mem->mem_type == AMDGPU_PL_GWS ||
+	    new_mem->mem_type == AMDGPU_PL_OA) {
+		/* Nothing to save here */
+		amdgpu_move_null(bo, new_mem);
+		return 0;
+	}
 
 	if (!adev->mman.buffer_funcs_enabled)
 		goto memcpy;
-- 
2.14.1
[PATCH 6/7] drm/amdgpu: drop size check
We don't allocate zero-sized kernel BOs any longer. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 710e7751c567..438390fce714 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -1809,14 +1809,12 @@ int amdgpu_ttm_init(struct amdgpu_device *adev) * This is used for VGA emulation and pre-OS scanout buffers to * avoid display artifacts while transitioning between pre-OS * and driver. */ - if (adev->gmc.stolen_size) { - r = amdgpu_bo_create_kernel(adev, adev->gmc.stolen_size, PAGE_SIZE, - AMDGPU_GEM_DOMAIN_VRAM, - &adev->stolen_vga_memory, - NULL, NULL); - if (r) - return r; - } + r = amdgpu_bo_create_kernel(adev, adev->gmc.stolen_size, PAGE_SIZE, + AMDGPU_GEM_DOMAIN_VRAM, + &adev->stolen_vga_memory, + NULL, NULL); + if (r) + return r; DRM_INFO("amdgpu: %uM of VRAM memory ready\n", (unsigned) (adev->gmc.real_vram_size / (1024 * 1024))); -- 2.14.1
[PATCH 4/7] drm/amdgpu: initialize GDS/GWS/OA domains even when they are zero sized
Stops crashing on SI. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 48 + 1 file changed, 18 insertions(+), 30 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 3e450159fe1f..710e7751c567 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -1843,34 +1843,25 @@ int amdgpu_ttm_init(struct amdgpu_device *adev) (unsigned)(gtt_size / (1024 * 1024))); /* Initialize various on-chip memory pools */ - /* GDS Memory */ - if (adev->gds.mem.total_size) { - r = ttm_bo_init_mm(&adev->mman.bdev, AMDGPU_PL_GDS, - adev->gds.mem.total_size); - if (r) { - DRM_ERROR("Failed initializing GDS heap.\n"); - return r; - } + r = ttm_bo_init_mm(&adev->mman.bdev, AMDGPU_PL_GDS, + adev->gds.mem.total_size); + if (r) { + DRM_ERROR("Failed initializing GDS heap.\n"); + return r; } - /* GWS */ - if (adev->gds.gws.total_size) { - r = ttm_bo_init_mm(&adev->mman.bdev, AMDGPU_PL_GWS, - adev->gds.gws.total_size); - if (r) { - DRM_ERROR("Failed initializing gws heap.\n"); - return r; - } + r = ttm_bo_init_mm(&adev->mman.bdev, AMDGPU_PL_GWS, + adev->gds.gws.total_size); + if (r) { + DRM_ERROR("Failed initializing gws heap.\n"); + return r; } - /* OA */ - if (adev->gds.oa.total_size) { - r = ttm_bo_init_mm(&adev->mman.bdev, AMDGPU_PL_OA, - adev->gds.oa.total_size); - if (r) { - DRM_ERROR("Failed initializing oa heap.\n"); - return r; - } + r = ttm_bo_init_mm(&adev->mman.bdev, AMDGPU_PL_OA, + adev->gds.oa.total_size); + if (r) { + DRM_ERROR("Failed initializing oa heap.\n"); + return r; } /* Register debugfs entries for amdgpu_ttm */ @@ -1907,12 +1898,9 @@ void amdgpu_ttm_fini(struct amdgpu_device *adev) ttm_bo_clean_mm(&adev->mman.bdev, TTM_PL_VRAM); ttm_bo_clean_mm(&adev->mman.bdev, TTM_PL_TT); - if (adev->gds.mem.total_size) - ttm_bo_clean_mm(&adev->mman.bdev, AMDGPU_PL_GDS); - if (adev->gds.gws.total_size) - ttm_bo_clean_mm(&adev->mman.bdev, AMDGPU_PL_GWS); - if (adev->gds.oa.total_size) - ttm_bo_clean_mm(&adev->mman.bdev, AMDGPU_PL_OA); + ttm_bo_clean_mm(&adev->mman.bdev, AMDGPU_PL_GDS); + ttm_bo_clean_mm(&adev->mman.bdev, AMDGPU_PL_GWS); + ttm_bo_clean_mm(&adev->mman.bdev, AMDGPU_PL_OA); ttm_bo_device_release(&adev->mman.bdev); amdgpu_ttm_global_fini(adev); adev->mman.initialized = false; -- 2.14.1
[PATCH 1/7] drm/amdgpu: add GDS, GWS and OA debugfs files
Additional to the existing files for VRAM and GTT. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index f12ae6b525b9..1565344cc139 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -2208,7 +2208,7 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo, static int amdgpu_mm_dump_table(struct seq_file *m, void *data) { struct drm_info_node *node = (struct drm_info_node *)m->private; - unsigned ttm_pl = *(int *)node->info_ent->data; + unsigned ttm_pl = (uintptr_t)node->info_ent->data; struct drm_device *dev = node->minor->dev; struct amdgpu_device *adev = dev->dev_private; struct ttm_mem_type_manager *man = &adev->mman.bdev.man[ttm_pl]; @@ -2218,12 +2218,12 @@ static int amdgpu_mm_dump_table(struct seq_file *m, void *data) return 0; } -static int ttm_pl_vram = TTM_PL_VRAM; -static int ttm_pl_tt = TTM_PL_TT; - static const struct drm_info_list amdgpu_ttm_debugfs_list[] = { - {"amdgpu_vram_mm", amdgpu_mm_dump_table, 0, &ttm_pl_vram}, - {"amdgpu_gtt_mm", amdgpu_mm_dump_table, 0, &ttm_pl_tt}, + {"amdgpu_vram_mm", amdgpu_mm_dump_table, 0, (void *)TTM_PL_VRAM}, + {"amdgpu_gtt_mm", amdgpu_mm_dump_table, 0, (void *)TTM_PL_TT}, + {"amdgpu_gds_mm", amdgpu_mm_dump_table, 0, (void *)AMDGPU_PL_GDS}, + {"amdgpu_gws_mm", amdgpu_mm_dump_table, 0, (void *)AMDGPU_PL_GWS}, + {"amdgpu_oa_mm", amdgpu_mm_dump_table, 0, (void *)AMDGPU_PL_OA}, {"ttm_page_pool", ttm_page_alloc_debugfs, 0, NULL}, #ifdef CONFIG_SWIOTLB {"ttm_dma_page_pool", ttm_dma_page_alloc_debugfs, 0, NULL} -- 2.14.1
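The cleanup in patch 1/7 stops pointing the debugfs `data` field at static ints and instead stores the pool index directly in the `void *` via `uintptr_t`, reading it back with a cast instead of a dereference. A small sketch of that round trip (the constant is an illustrative stand-in, not the real TTM value):

```c
#include <stdint.h>
#include <assert.h>

#define TTM_PL_VRAM_EX 2U  /* illustrative stand-in for TTM_PL_VRAM */

/* Pack a small integer into a pointer-sized value. */
static void *encode_pool(unsigned pool)
{
	return (void *)(uintptr_t)pool;
}

/* Recover the integer: cast back, no dereference needed. */
static unsigned decode_pool(void *data)
{
	return (unsigned)(uintptr_t)data;
}
```

This avoids keeping one static variable per debugfs file just to hold an index; the only requirement is that the integer fits in a `uintptr_t`, which a pool enum trivially does.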
[PATCH 2/7] drm/amdgpu: fix up GDS/GWS/OA shifting
That only worked by pure coincidence. Completely remove the shifting and always apply the correct PAGE_SHIFT. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 12 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h| 7 --- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c| 12 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 14 +++--- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 15 +++ drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 9 - drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 9 - drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 12 +--- 9 files changed, 25 insertions(+), 71 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index d762d78e5102..8836186eb5ef 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -721,16 +721,16 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p, e->bo_va = amdgpu_vm_bo_find(vm, ttm_to_amdgpu_bo(e->tv.bo)); if (gds) { - p->job->gds_base = amdgpu_bo_gpu_offset(gds); - p->job->gds_size = amdgpu_bo_size(gds); + p->job->gds_base = amdgpu_bo_gpu_offset(gds) >> PAGE_SHIFT; + p->job->gds_size = amdgpu_bo_size(gds) >> PAGE_SHIFT; } if (gws) { - p->job->gws_base = amdgpu_bo_gpu_offset(gws); - p->job->gws_size = amdgpu_bo_size(gws); + p->job->gws_base = amdgpu_bo_gpu_offset(gws) >> PAGE_SHIFT; + p->job->gws_size = amdgpu_bo_size(gws) >> PAGE_SHIFT; } if (oa) { - p->job->oa_base = amdgpu_bo_gpu_offset(oa); - p->job->oa_size = amdgpu_bo_size(oa); + p->job->oa_base = amdgpu_bo_gpu_offset(oa) >> PAGE_SHIFT; + p->job->oa_size = amdgpu_bo_size(oa) >> PAGE_SHIFT; } if (!r && p->uf_entry.tv.bo) { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h index e73728d90388..ecbcefe49a98 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h @@ -24,13 +24,6 @@ #ifndef __AMDGPU_GDS_H__ #define __AMDGPU_GDS_H__ -/* Because TTM request that 
allocated buffer should be PAGE_SIZE aligned, - * we should report GDS/GWS/OA size as PAGE_SIZE aligned - * */ -#define AMDGPU_GDS_SHIFT 2 -#define AMDGPU_GWS_SHIFT PAGE_SHIFT -#define AMDGPU_OA_SHIFT PAGE_SHIFT - struct amdgpu_ring; struct amdgpu_bo; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c index d30a0838851b..7b3d1ebda9df 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c @@ -244,16 +244,10 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void *data, return -EINVAL; } flags |= AMDGPU_GEM_CREATE_NO_CPU_ACCESS; - if (args->in.domains == AMDGPU_GEM_DOMAIN_GDS) - size = size << AMDGPU_GDS_SHIFT; - else if (args->in.domains == AMDGPU_GEM_DOMAIN_GWS) - size = size << AMDGPU_GWS_SHIFT; - else if (args->in.domains == AMDGPU_GEM_DOMAIN_OA) - size = size << AMDGPU_OA_SHIFT; - else - return -EINVAL; + /* GDS allocations must be DW aligned */ + if (args->in.domains & AMDGPU_GEM_DOMAIN_GDS) + size = ALIGN(size, 4); } - size = roundup(size, PAGE_SIZE); if (flags & AMDGPU_GEM_CREATE_VM_ALWAYS_VALID) { r = amdgpu_bo_reserve(vm->root.base.bo, false); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index b766270d86cb..64cc483db973 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -528,13 +528,13 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file struct drm_amdgpu_info_gds gds_info; memset(&gds_info, 0, sizeof(gds_info)); - gds_info.gds_gfx_partition_size = adev->gds.mem.gfx_partition_size >> AMDGPU_GDS_SHIFT; - gds_info.compute_partition_size = adev->gds.mem.cs_partition_size >> AMDGPU_GDS_SHIFT; - gds_info.gds_total_size = adev->gds.mem.total_size >> AMDGPU_GDS_SHIFT; - gds_info.gws_per_gfx_partition = adev->gds.gws.gfx_partition_size >> AMDGPU_GWS_SHIFT; - gds_info.gws_per_compute_partition = adev->gds.gws.cs_partition_size >> AMDGPU_GWS_SHIFT; -
gds_info.oa_per_gfx_partition = adev->gds.oa.gfx_partition_size >> AMDGPU_OA_SHIFT; - gds_info.oa_per_compute_partition = adev->gds.oa.cs_partition_size >>
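Patch 2/7 boils down to two conversions: sizes and offsets are turned into page units with a plain `>> PAGE_SHIFT`, and GDS sizes are rounded up to dword (4-byte) alignment with `ALIGN(size, 4)` instead of the old per-domain shift macros. Illustrative user-space versions of those kernel macros (the names with `_EX` suffixes are stand-ins, and 4 KiB pages are assumed):

```c
#include <stdint.h>
#include <assert.h>

#define PAGE_SHIFT_EX 12UL                        /* 4 KiB pages, as on x86 */
#define ALIGN_EX(x, a) (((x) + (a) - 1) & ~((uint64_t)(a) - 1))

/* Byte count -> page count, as done for gds/gws/oa base and size. */
static uint64_t bytes_to_pages(uint64_t bytes)
{
	return bytes >> PAGE_SHIFT_EX;
}

/* GDS allocations must be dword aligned. */
static uint64_t dw_align(uint64_t size)
{
	return ALIGN_EX(size, 4);
}
```

The point of the fix is that only one consistent unit conversion remains, instead of domain-specific shifts that happened to cancel out.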
Re: [PATCH] [RFC]drm: add syncobj timeline support v5
Am 14.09.2018 um 20:24 schrieb Daniel Vetter: On Fri, Sep 14, 2018 at 6:43 PM, Christian König wrote: Am 14.09.2018 um 18:10 schrieb Daniel Vetter: On Fri, Sep 14, 2018 at 12:49:45PM +0200, Christian König wrote: Am 14.09.2018 um 12:37 schrieb Chunming Zhou: This patch is for VK_KHR_timeline_semaphore extension, semaphore is called syncobj in kernel side: This extension introduces a new type of syncobj that has an integer payload identifying a point in a timeline. Such timeline syncobjs support the following operations: * CPU query - A host operation that allows querying the payload of the timeline syncobj. * CPU wait - A host operation that allows a blocking wait for a timeline syncobj to reach a specified value. * Device wait - A device operation that allows waiting for a timeline syncobj to reach a specified value. * Device signal - A device operation that allows advancing the timeline syncobj to a specified value. Since it's a timeline, that means the front time point(PT) always is signaled before the late PT. a. signal PT design: Signal PT fence N depends on PT[N-1] fence and signal opertion fence, when PT[N] fence is signaled, the timeline will increase to value of PT[N]. b. wait PT design: Wait PT fence is signaled by reaching timeline point value, when timeline is increasing, will compare wait PTs value with new timeline value, if PT value is lower than timeline value, then wait PT will be signaled, otherwise keep in list. syncobj wait operation can wait on any point of timeline, so need a RB tree to order them. And wait PT could ahead of signal PT, we need a sumission fence to perform that. v2: 1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian) 2. move unexposed denitions to .c file. (Daniel Vetter) 3. split up the change to drm_syncobj_find_fence() in a separate patch. (Christian) 4. split up the change to drm_syncobj_replace_fence() in a separate patch. 5. drop the submission_fence implementation and instead use wait_event() for that. 
(Christian) 6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter) v3: 1. replace normal syncobj with timeline implemenation. (Vetter and Christian) a. normal syncobj signal op will create a signal PT to tail of signal pt list. b. normal syncobj wait op will create a wait pt with last signal point, and this wait PT is only signaled by related signal point PT. 2. many bug fix and clean up 3. stub fence moving is moved to other patch. v4: 1. fix RB tree loop with while(node=rb_first(...)). (Christian) 2. fix syncobj lifecycle. (Christian) 3. only enable_signaling when there is wait_pt. (Christian) 4. fix timeline path issues. 5. write a timeline test in libdrm v5: (Christian) 1. semaphore is called syncobj in kernel side. 2. don't need 'timeline' characters in some function name. 3. keep syncobj cb normal syncobj is tested by ./deqp-vk -n dEQP-VK*semaphore* timeline syncobj is tested by ./amdgpu_test -s 9 Signed-off-by: Chunming Zhou Cc: Christian Konig Cc: Dave Airlie Cc: Daniel Rakos Cc: Daniel Vetter At least on first glance that looks like it should work, going to do a detailed review on Monday. Just for my understanding, it's all condensed down to 1 patch now? I kinda didn't follow the detailed discussion last few days at all :-/ I've already committed all the cleanup/fix prerequisites to drm-misc-next. The driver specific implementation needs to come on top and maybe a new CPU wait IOCTL. But essentially this patch is just the core of the kernel implementation. Ah cool, missed that. Also, is there a testcase, igt highly preferred (because then we'll run it in our intel-gfx CI, and a bunch of people outside of intel have already discovered that and are using it). libdrm patches and I think amdgpu based test cases where already published as well. Not sure about igt testcases. I guess we can write them when the intel implementation shows up. Just kinda still hoping that we'd have a more unfified test suite. 
And not really well-kept secret: We do have an amdgpu in our CI, in the form of kbl-g :-) But unfortunately it's not running the full test set for patches (only for drm-tip). But we could perhaps run more of the amdgpu tests somehow, if there's serious interest. Well I wouldn't mind if we sooner or later get rid of the amdgpu unit tests in libdrm. They are more or less just a really bloody mess. Christian. Cheers, Daniel Christian. Thanks, Daniel Christian. --- drivers/gpu/drm/drm_syncobj.c | 294 ++--- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 4 +- include/drm/drm_syncobj.h | 62 +++-- include/uapi/drm/drm.h | 1 + 4 files changed, 292 insertions(+), 69 deletions(-) diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index e9ce623d049e..e78d076f2703 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++
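The timeline-semaphore semantics spelled out in the patch description reduce to a simple rule: the payload is a monotonically increasing integer, a signal on point N advances it to at least N, and a wait on point N completes once the payload has reached N. A toy model of just that rule (this is not the drm_syncobj API, only the invariant the extension text describes):

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

struct timeline {
	uint64_t payload;   /* current timeline value, only ever grows */
};

/* Signaling an earlier point than the current payload is a no-op,
 * since all points up to the payload are already signaled. */
static void timeline_signal(struct timeline *t, uint64_t point)
{
	if (point > t->payload)
		t->payload = point;
}

/* A wait on point N is satisfied once payload >= N. */
static bool timeline_wait_done(const struct timeline *t, uint64_t point)
{
	return t->payload >= point;
}
```

In the real implementation the interesting part is exactly what this toy omits: fences for each point, ordering wait points in an RB tree, and handling waits submitted ahead of their signal.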
Re: [PATCH] [RFC]drm: add syncobj timeline support v5
On Fri, Sep 14, 2018 at 6:43 PM, Christian König wrote: > Am 14.09.2018 um 18:10 schrieb Daniel Vetter: >> >> On Fri, Sep 14, 2018 at 12:49:45PM +0200, Christian König wrote: >>> >>> Am 14.09.2018 um 12:37 schrieb Chunming Zhou: This patch is for VK_KHR_timeline_semaphore extension, semaphore is called syncobj in kernel side: This extension introduces a new type of syncobj that has an integer payload identifying a point in a timeline. Such timeline syncobjs support the following operations: * CPU query - A host operation that allows querying the payload of the timeline syncobj. * CPU wait - A host operation that allows a blocking wait for a timeline syncobj to reach a specified value. * Device wait - A device operation that allows waiting for a timeline syncobj to reach a specified value. * Device signal - A device operation that allows advancing the timeline syncobj to a specified value. Since it's a timeline, that means the front time point(PT) always is signaled before the late PT. a. signal PT design: Signal PT fence N depends on PT[N-1] fence and signal opertion fence, when PT[N] fence is signaled, the timeline will increase to value of PT[N]. b. wait PT design: Wait PT fence is signaled by reaching timeline point value, when timeline is increasing, will compare wait PTs value with new timeline value, if PT value is lower than timeline value, then wait PT will be signaled, otherwise keep in list. syncobj wait operation can wait on any point of timeline, so need a RB tree to order them. And wait PT could ahead of signal PT, we need a sumission fence to perform that. v2: 1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian) 2. move unexposed denitions to .c file. (Daniel Vetter) 3. split up the change to drm_syncobj_find_fence() in a separate patch. (Christian) 4. split up the change to drm_syncobj_replace_fence() in a separate patch. 5. drop the submission_fence implementation and instead use wait_event() for that. (Christian) 6. 
WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter) v3: 1. replace normal syncobj with timeline implemenation. (Vetter and Christian) a. normal syncobj signal op will create a signal PT to tail of signal pt list. b. normal syncobj wait op will create a wait pt with last signal point, and this wait PT is only signaled by related signal point PT. 2. many bug fix and clean up 3. stub fence moving is moved to other patch. v4: 1. fix RB tree loop with while(node=rb_first(...)). (Christian) 2. fix syncobj lifecycle. (Christian) 3. only enable_signaling when there is wait_pt. (Christian) 4. fix timeline path issues. 5. write a timeline test in libdrm v5: (Christian) 1. semaphore is called syncobj in kernel side. 2. don't need 'timeline' characters in some function name. 3. keep syncobj cb normal syncobj is tested by ./deqp-vk -n dEQP-VK*semaphore* timeline syncobj is tested by ./amdgpu_test -s 9 Signed-off-by: Chunming Zhou Cc: Christian Konig Cc: Dave Airlie Cc: Daniel Rakos Cc: Daniel Vetter >>> >>> At least on first glance that looks like it should work, going to do a >>> detailed review on Monday. >> >> Just for my understanding, it's all condensed down to 1 patch now? I kinda >> didn't follow the detailed discussion last few days at all :-/ > > > I've already committed all the cleanup/fix prerequisites to drm-misc-next. > > The driver specific implementation needs to come on top and maybe a new CPU > wait IOCTL. > > But essentially this patch is just the core of the kernel implementation. Ah cool, missed that. >> Also, is there a testcase, igt highly preferred (because then we'll run it >> in our intel-gfx CI, and a bunch of people outside of intel have already >> discovered that and are using it). > > > libdrm patches and I think amdgpu based test cases where already published > as well. > > Not sure about igt testcases. I guess we can write them when the intel implementation shows up. Just kinda still hoping that we'd have a more unfified test suite. 
And not really well-kept secret: We do have an amdgpu in our CI, in the form of kbl-g :-) But unfortunately it's not running the full test set for patches (only for drm-tip). But we could perhaps run more of the amdgpu tests somehow, if there's serious interest. Cheers, Daniel > Christian. > > >> >> Thanks, Daniel >> >>> Christian. >>> --- drivers/gpu/drm/drm_syncobj.c | 294 ++--- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 4 +- include/drm/drm_syncobj.h | 62 +++--
Re: [PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier v4
Am 14.09.2018 um 19:47 schrieb Philip Yang: On 2018-09-14 03:51 AM, Christian König wrote: Am 13.09.2018 um 23:51 schrieb Felix Kuehling: On 2018-09-13 04:52 PM, Philip Yang wrote: Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in DRM_AMDGPU_USERPTR Kconfig. It supports both KFD userptr and gfx userptr paths. This depends on several HMM patchset from Jérôme Glisse queued for upstream. Change-Id: Ie62c3c5e3c5b8521ab3b438d1eff2aa2a003835e Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/Kconfig | 6 +- drivers/gpu/drm/amd/amdgpu/Makefile | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 121 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h | 2 +- 4 files changed, 56 insertions(+), 75 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig index 9221e54..960a633 100644 --- a/drivers/gpu/drm/amd/amdgpu/Kconfig +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig @@ -26,10 +26,10 @@ config DRM_AMDGPU_CIK config DRM_AMDGPU_USERPTR bool "Always enable userptr write support" depends on DRM_AMDGPU - select MMU_NOTIFIER + select HMM_MIRROR help - This option selects CONFIG_MMU_NOTIFIER if it isn't already - selected to enabled full userptr support. + This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it + isn't already selected to enabled full userptr support. 
config DRM_AMDGPU_GART_DEBUGFS bool "Allow GART access through debugfs" diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile index 138cb78..c1e5d43 100644 --- a/drivers/gpu/drm/amd/amdgpu/Makefile +++ b/drivers/gpu/drm/amd/amdgpu/Makefile @@ -171,7 +171,7 @@ endif amdgpu-$(CONFIG_COMPAT) += amdgpu_ioc32.o amdgpu-$(CONFIG_VGA_SWITCHEROO) += amdgpu_atpx_handler.o amdgpu-$(CONFIG_ACPI) += amdgpu_acpi.o -amdgpu-$(CONFIG_MMU_NOTIFIER) += amdgpu_mn.o +amdgpu-$(CONFIG_HMM) += amdgpu_mn.o include $(FULL_AMD_PATH)/powerplay/Makefile diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c index e55508b..ad52f34 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c @@ -45,7 +45,7 @@ #include #include -#include +#include #include #include #include @@ -66,6 +66,7 @@ Need to remove @mn documentation. * @objects: interval tree containing amdgpu_mn_nodes * @read_lock: mutex for recursive locking of @lock * @recursion: depth of recursion + * @mirror: HMM mirror function support * * Data for each amdgpu device and process address space. 
*/ @@ -73,7 +74,6 @@ struct amdgpu_mn { /* constant after initialisation */ struct amdgpu_device *adev; struct mm_struct *mm; - struct mmu_notifier mn; enum amdgpu_mn_type type; /* only used on destruction */ @@ -87,6 +87,9 @@ struct amdgpu_mn { struct rb_root_cached objects; struct mutex read_lock; atomic_t recursion; + + /* HMM mirror */ + struct hmm_mirror mirror; }; /** @@ -103,7 +106,7 @@ struct amdgpu_mn_node { }; /** - * amdgpu_mn_destroy - destroy the MMU notifier + * amdgpu_mn_destroy - destroy the HMM mirror * * @work: previously sheduled work item * @@ -129,28 +132,26 @@ static void amdgpu_mn_destroy(struct work_struct *work) } up_write(>lock); mutex_unlock(>mn_lock); - mmu_notifier_unregister_no_release(>mn, amn->mm); + hmm_mirror_unregister(>mirror); + kfree(amn); } /** * amdgpu_mn_release - callback to notify about mm destruction Update the function name in the comment. * - * @mn: our notifier - * @mm: the mm this callback is about + * @mirror: the HMM mirror (mm) this callback is about * - * Shedule a work item to lazy destroy our notifier. + * Shedule a work item to lazy destroy HMM mirror. 
*/ -static void amdgpu_mn_release(struct mmu_notifier *mn, - struct mm_struct *mm) +static void amdgpu_hmm_mirror_release(struct hmm_mirror *mirror) { - struct amdgpu_mn *amn = container_of(mn, struct amdgpu_mn, mn); + struct amdgpu_mn *amn = container_of(mirror, struct amdgpu_mn, mirror); INIT_WORK(>work, amdgpu_mn_destroy); schedule_work(>work); } - /** * amdgpu_mn_lock - take the write side lock for this notifier * @@ -237,21 +238,19 @@ static void amdgpu_mn_invalidate_node(struct amdgpu_mn_node *node, /** * amdgpu_mn_invalidate_range_start_gfx - callback to notify about mm change * - * @mn: our notifier - * @mm: the mm this callback is about - * @start: start of updated range - * @end: end of updated range + * @mirror: the hmm_mirror (mm) is about to update + * @update: the update start, end address * * Block for operations on BOs to finish and mark pages as accessed and * potentially dirty.
Re: [PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier v4
On 2018-09-14 03:51 AM, Christian König wrote: Am 13.09.2018 um 23:51 schrieb Felix Kuehling: On 2018-09-13 04:52 PM, Philip Yang wrote: Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in DRM_AMDGPU_USERPTR Kconfig. It supports both KFD userptr and gfx userptr paths. This depends on several HMM patchset from Jérôme Glisse queued for upstream. Change-Id: Ie62c3c5e3c5b8521ab3b438d1eff2aa2a003835e Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/Kconfig | 6 +- drivers/gpu/drm/amd/amdgpu/Makefile | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 121 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h | 2 +- 4 files changed, 56 insertions(+), 75 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig index 9221e54..960a633 100644 --- a/drivers/gpu/drm/amd/amdgpu/Kconfig +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig @@ -26,10 +26,10 @@ config DRM_AMDGPU_CIK config DRM_AMDGPU_USERPTR bool "Always enable userptr write support" depends on DRM_AMDGPU - select MMU_NOTIFIER + select HMM_MIRROR help - This option selects CONFIG_MMU_NOTIFIER if it isn't already - selected to enabled full userptr support. + This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it + isn't already selected to enabled full userptr support. 
config DRM_AMDGPU_GART_DEBUGFS bool "Allow GART access through debugfs" diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile index 138cb78..c1e5d43 100644 --- a/drivers/gpu/drm/amd/amdgpu/Makefile +++ b/drivers/gpu/drm/amd/amdgpu/Makefile @@ -171,7 +171,7 @@ endif amdgpu-$(CONFIG_COMPAT) += amdgpu_ioc32.o amdgpu-$(CONFIG_VGA_SWITCHEROO) += amdgpu_atpx_handler.o amdgpu-$(CONFIG_ACPI) += amdgpu_acpi.o -amdgpu-$(CONFIG_MMU_NOTIFIER) += amdgpu_mn.o +amdgpu-$(CONFIG_HMM) += amdgpu_mn.o include $(FULL_AMD_PATH)/powerplay/Makefile diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c index e55508b..ad52f34 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c @@ -45,7 +45,7 @@ #include #include -#include +#include #include #include #include @@ -66,6 +66,7 @@ Need to remove @mn documentation. * @objects: interval tree containing amdgpu_mn_nodes * @read_lock: mutex for recursive locking of @lock * @recursion: depth of recursion + * @mirror: HMM mirror function support * * Data for each amdgpu device and process address space. 
*/ @@ -73,7 +74,6 @@ struct amdgpu_mn { /* constant after initialisation */ struct amdgpu_device *adev; struct mm_struct *mm; - struct mmu_notifier mn; enum amdgpu_mn_type type; /* only used on destruction */ @@ -87,6 +87,9 @@ struct amdgpu_mn { struct rb_root_cached objects; struct mutex read_lock; atomic_t recursion; + + /* HMM mirror */ + struct hmm_mirror mirror; }; /** @@ -103,7 +106,7 @@ struct amdgpu_mn_node { }; /** - * amdgpu_mn_destroy - destroy the MMU notifier + * amdgpu_mn_destroy - destroy the HMM mirror * * @work: previously sheduled work item * @@ -129,28 +132,26 @@ static void amdgpu_mn_destroy(struct work_struct *work) } up_write(>lock); mutex_unlock(>mn_lock); - mmu_notifier_unregister_no_release(>mn, amn->mm); + hmm_mirror_unregister(>mirror); + kfree(amn); } /** * amdgpu_mn_release - callback to notify about mm destruction Update the function name in the comment. * - * @mn: our notifier - * @mm: the mm this callback is about + * @mirror: the HMM mirror (mm) this callback is about * - * Shedule a work item to lazy destroy our notifier. + * Shedule a work item to lazy destroy HMM mirror. 
*/ -static void amdgpu_mn_release(struct mmu_notifier *mn, - struct mm_struct *mm) +static void amdgpu_hmm_mirror_release(struct hmm_mirror *mirror) { - struct amdgpu_mn *amn = container_of(mn, struct amdgpu_mn, mn); + struct amdgpu_mn *amn = container_of(mirror, struct amdgpu_mn, mirror); INIT_WORK(>work, amdgpu_mn_destroy); schedule_work(>work); } - /** * amdgpu_mn_lock - take the write side lock for this notifier * @@ -237,21 +238,19 @@ static void amdgpu_mn_invalidate_node(struct amdgpu_mn_node *node, /** * amdgpu_mn_invalidate_range_start_gfx - callback to notify about mm change * - * @mn: our notifier - * @mm: the mm this callback is about - * @start: start of updated range - * @end: end of updated range + * @mirror: the hmm_mirror (mm) is about to update + * @update: the update start, end address * * Block for operations on BOs to finish and mark pages as accessed and * potentially dirty. */ -static int
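The conversion above repeatedly recovers the containing `struct amdgpu_mn` from the embedded `hmm_mirror` with `container_of()`. A user-space sketch of that pattern, with stand-in struct names instead of the kernel's:

```c
#include <stddef.h>
#include <assert.h>

/* Same idea as the kernel's container_of(): subtract the member's
 * offset from the member's address to get the enclosing struct. */
#define container_of_ex(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct mirror { int dummy; };

struct mn {
	int type;
	struct mirror mirror;   /* embedded, like hmm_mirror in amdgpu_mn */
};

static struct mn *mn_from_mirror(struct mirror *m)
{
	return container_of_ex(m, struct mn, mirror);
}
```

This is why the callback signatures can shrink from `(struct mmu_notifier *mn, struct mm_struct *mm)` to just `(struct hmm_mirror *mirror)`: the mirror pointer alone is enough to find all the per-device state.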
Re: [PATCH] [RFC]drm: add syncobj timeline support v5
Am 14.09.2018 um 18:10 schrieb Daniel Vetter: On Fri, Sep 14, 2018 at 12:49:45PM +0200, Christian König wrote: Am 14.09.2018 um 12:37 schrieb Chunming Zhou: This patch is for VK_KHR_timeline_semaphore extension, semaphore is called syncobj in kernel side: This extension introduces a new type of syncobj that has an integer payload identifying a point in a timeline. Such timeline syncobjs support the following operations: * CPU query - A host operation that allows querying the payload of the timeline syncobj. * CPU wait - A host operation that allows a blocking wait for a timeline syncobj to reach a specified value. * Device wait - A device operation that allows waiting for a timeline syncobj to reach a specified value. * Device signal - A device operation that allows advancing the timeline syncobj to a specified value. Since it's a timeline, that means the front time point(PT) always is signaled before the late PT. a. signal PT design: Signal PT fence N depends on PT[N-1] fence and signal opertion fence, when PT[N] fence is signaled, the timeline will increase to value of PT[N]. b. wait PT design: Wait PT fence is signaled by reaching timeline point value, when timeline is increasing, will compare wait PTs value with new timeline value, if PT value is lower than timeline value, then wait PT will be signaled, otherwise keep in list. syncobj wait operation can wait on any point of timeline, so need a RB tree to order them. And wait PT could ahead of signal PT, we need a sumission fence to perform that. v2: 1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian) 2. move unexposed denitions to .c file. (Daniel Vetter) 3. split up the change to drm_syncobj_find_fence() in a separate patch. (Christian) 4. split up the change to drm_syncobj_replace_fence() in a separate patch. 5. drop the submission_fence implementation and instead use wait_event() for that. (Christian) 6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter) v3: 1. 
replace normal syncobj with timeline implemenation. (Vetter and Christian) a. normal syncobj signal op will create a signal PT to tail of signal pt list. b. normal syncobj wait op will create a wait pt with last signal point, and this wait PT is only signaled by related signal point PT. 2. many bug fix and clean up 3. stub fence moving is moved to other patch. v4: 1. fix RB tree loop with while(node=rb_first(...)). (Christian) 2. fix syncobj lifecycle. (Christian) 3. only enable_signaling when there is wait_pt. (Christian) 4. fix timeline path issues. 5. write a timeline test in libdrm v5: (Christian) 1. semaphore is called syncobj in kernel side. 2. don't need 'timeline' characters in some function name. 3. keep syncobj cb normal syncobj is tested by ./deqp-vk -n dEQP-VK*semaphore* timeline syncobj is tested by ./amdgpu_test -s 9 Signed-off-by: Chunming Zhou Cc: Christian Konig Cc: Dave Airlie Cc: Daniel Rakos Cc: Daniel Vetter At least on first glance that looks like it should work, going to do a detailed review on Monday. Just for my understanding, it's all condensed down to 1 patch now? I kinda didn't follow the detailed discussion last few days at all :-/ I've already committed all the cleanup/fix prerequisites to drm-misc-next. The driver specific implementation needs to come on top and maybe a new CPU wait IOCTL. But essentially this patch is just the core of the kernel implementation. Also, is there a testcase, igt highly preferred (because then we'll run it in our intel-gfx CI, and a bunch of people outside of intel have already discovered that and are using it). libdrm patches and I think amdgpu based test cases where already published as well. Not sure about igt testcases. Christian. Thanks, Daniel Christian. 
---
 drivers/gpu/drm/drm_syncobj.c              | 294 ++---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
 include/drm/drm_syncobj.h                  |  62 +++--
 include/uapi/drm/drm.h                     |   1 +
 4 files changed, 292 insertions(+), 69 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index e9ce623d049e..e78d076f2703 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,9 @@
 #include "drm_internal.h"
 #include 

+/* merge normal syncobj to timeline syncobj, the point interval is 1 */
+#define DRM_SYNCOBJ_NORMAL_POINT 1
+
 struct drm_syncobj_stub_fence {
 	struct dma_fence base;
 	spinlock_t lock;
@@ -82,6 +85,11 @@ static const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
 	.release = drm_syncobj_stub_fence_release,
 };

+struct drm_syncobj_signal_pt {
+	struct dma_fence_array *base;
+	u64 value;
+	struct list_head list;
+};

 /**
  * drm_syncobj_find - lookup and reference a sync object.
@@ -124,7 +132,7 @@ static int
Re: [PATCH] [RFC]drm: add syncobj timeline support v5
On Fri, Sep 14, 2018 at 12:49:45PM +0200, Christian König wrote: > Am 14.09.2018 um 12:37 schrieb Chunming Zhou: > > This patch is for VK_KHR_timeline_semaphore extension, semaphore is called > > syncobj in kernel side: > > This extension introduces a new type of syncobj that has an integer payload > > identifying a point in a timeline. Such timeline syncobjs support the > > following operations: > > * CPU query - A host operation that allows querying the payload of the > > timeline syncobj. > > * CPU wait - A host operation that allows a blocking wait for a > > timeline syncobj to reach a specified value. > > * Device wait - A device operation that allows waiting for a > > timeline syncobj to reach a specified value. > > * Device signal - A device operation that allows advancing the > > timeline syncobj to a specified value. > > > > Since it's a timeline, that means the front time point(PT) always is > > signaled before the late PT. > > a. signal PT design: > > Signal PT fence N depends on PT[N-1] fence and signal operation fence, when > > PT[N] fence is signaled, > > the timeline will increase to value of PT[N]. > > b. wait PT design: > > Wait PT fence is signaled by reaching timeline point value, when timeline > > is increasing, will compare > > wait PTs value with new timeline value, if PT value is lower than timeline > > value, then wait PT will be > > signaled, otherwise keep in list. syncobj wait operation can wait on any > > point of timeline, > > so need a RB tree to order them. And wait PT could be ahead of signal PT, we > > need a submission fence to > > perform that. > > > > v2: > > 1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian) > > 2. move unexposed definitions to .c file. (Daniel Vetter) > > 3. split up the change to drm_syncobj_find_fence() in a separate patch. > > (Christian) > > 4. split up the change to drm_syncobj_replace_fence() in a separate patch. > > 5. drop the submission_fence implementation and instead use wait_event() > > for that. 
(Christian) > > 6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter) > > > > v3: > > 1. replace normal syncobj with timeline implementation. (Vetter and > > Christian) > > a. normal syncobj signal op will create a signal PT to tail of signal > > pt list. > > b. normal syncobj wait op will create a wait pt with last signal > > point, and this wait PT is only signaled by related signal point PT. > > 2. many bug fix and clean up > > 3. stub fence moving is moved to other patch. > > > > v4: > > 1. fix RB tree loop with while(node=rb_first(...)). (Christian) > > 2. fix syncobj lifecycle. (Christian) > > 3. only enable_signaling when there is wait_pt. (Christian) > > 4. fix timeline path issues. > > 5. write a timeline test in libdrm > > > > v5: (Christian) > > 1. semaphore is called syncobj in kernel side. > > 2. don't need 'timeline' characters in some function name. > > 3. keep syncobj cb > > > > normal syncobj is tested by ./deqp-vk -n dEQP-VK*semaphore* > > timeline syncobj is tested by ./amdgpu_test -s 9 > > > > Signed-off-by: Chunming Zhou > > Cc: Christian König > > Cc: Dave Airlie > > Cc: Daniel Rakos > > Cc: Daniel Vetter > > At least on first glance that looks like it should work, going to do a > detailed review on Monday. Just for my understanding, it's all condensed down to 1 patch now? I kinda didn't follow the detailed discussion last few days at all :-/ Also, is there a testcase, igt highly preferred (because then we'll run it in our intel-gfx CI, and a bunch of people outside of intel have already discovered that and are using it). Thanks, Daniel > > Christian. 
> > > --- > > drivers/gpu/drm/drm_syncobj.c | 294 ++--- > > drivers/gpu/drm/i915/i915_gem_execbuffer.c | 4 +- > > include/drm/drm_syncobj.h | 62 +++-- > > include/uapi/drm/drm.h | 1 + > > 4 files changed, 292 insertions(+), 69 deletions(-) > > > > diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c > > index e9ce623d049e..e78d076f2703 100644 > > --- a/drivers/gpu/drm/drm_syncobj.c > > +++ b/drivers/gpu/drm/drm_syncobj.c > > @@ -56,6 +56,9 @@ > > #include "drm_internal.h" > > #include > > +/* merge normal syncobj to timeline syncobj, the point interval is 1 */ > > +#define DRM_SYNCOBJ_NORMAL_POINT 1 > > + > > struct drm_syncobj_stub_fence { > > struct dma_fence base; > > spinlock_t lock; > > @@ -82,6 +85,11 @@ static const struct dma_fence_ops > > drm_syncobj_stub_fence_ops = { > > .release = drm_syncobj_stub_fence_release, > > }; > > +struct drm_syncobj_signal_pt { > > + struct dma_fence_array *base; > > + u64 value; > > + struct list_head list; > > +}; > > /** > > * drm_syncobj_find - lookup and reference a sync object. > > @@ -124,7 +132,7 @@ static int drm_syncobj_fence_get_or_add_callback(struct > > drm_syncobj *syncobj, > > {
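The timeline semantics discussed above — the payload only moves forward, and a wait at point N is satisfied once the payload reaches N — can be illustrated with a minimal single-threaded user-space model. This is a sketch of the concept only; the names and types are illustrative and none of this is the kernel's drm_syncobj code (which deals in dma_fences, locking, and blocking waits):

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a timeline syncobj: a monotonically advancing payload. */
struct timeline {
	uint64_t payload; /* last signaled point */
};

/* "Device signal": advance the timeline to 'point'; it never moves backwards. */
static void timeline_signal(struct timeline *t, uint64_t point)
{
	if (point > t->payload)
		t->payload = point;
}

/* "CPU query": read the current payload. */
static uint64_t timeline_query(const struct timeline *t)
{
	return t->payload;
}

/* "CPU/device wait": the kernel blocks; this model just reports whether a
 * wait on 'point' would already be satisfied. */
static bool timeline_wait_done(const struct timeline *t, uint64_t point)
{
	return t->payload >= point;
}
```

Because signaling is a max() operation, signaling point 3 implicitly satisfies every wait at points 1–3, which is the ordering guarantee ("an earlier PT is always signaled before a later PT") the patch description relies on.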
[ANNOUNCE] xf86-video-ati 18.1.0
I'm pleased to announce the 18.1.0 release of xf86-video-ati, the Xorg driver for ATI/AMD Radeon GPUs supported by the radeon kernel driver. This release supports xserver versions 1.13-1.20. Highlights: * Fixed random screen corruption and crashes when using GLAMOR with Xorg 1.20. * Support for leasing RandR outputs to clients. * Various robustness fixes for TearFree. In particular, fixed several cases in which disabling TearFree at runtime would result in the Xorg process freezing or crashing. * Fixed some m4 related build issues with older versions of autotools. Plus other improvements and fixes. Thanks to everybody who contributed to this release in any way! Emil Velikov (1): Do not export the DriverRec RADEON Jammy Zhou (1): Remove throttling from radeon_dri2_copy_region2 Jim Qu (1): Wait for pending scanout update before calling drmmode_crtc_scanout_free Keith Packard (3): modesetting: Record non-desktop kernel property at PreInit time modesetting: Create CONNECTOR_ID properties for outputs [v2] Add RandR leases support Michel Dänzer (55): Bail from dri2_create_buffer2 if we can't get a pixmap glamor: Bail CreatePixmap on unsupported pixmap depth Drop unused drmmode_create_bo_pixmap surface parameter EXA: Remove old RADEONEXACreatePixmap hook Only initialize libdrm_radeon surface manager for >= R600 glamor: Don't store radeon_surfaces in pixmaps Factor out radeon_surface_initialize helper Move flush from radeon_scanout_do_update to its callers Refactor radeon_finish helper Add struct radeon_buffer glamor: Use GBM for BO allocation when possible Swap pixmap privates in radeon_dri2_exchange_buffers Ignore RADEON_DRM_QUEUE_ERROR (0) in radeon_drm_abort_entry Track DRM event queue sequence number in scanout_update_pending Abort scanout_update_pending event when possible Update RandR CRTC state if set_mode_major fails in set_desired_modes Simplify drmmode_crtc_scanout_update Don't call scanout_flip/update with a legacy RandR scanout buffer Simplify 
drmmode_handle_transform Set drmmode_crtc->scanout_id = 0 when TearFree is disabled Refactor drmmode_output_set_tear_free helper Wait for pending flips in drmmode_output_set_tear_free Replace 'foo == NULL' with '!foo' Call drmmode_do_crtc_dpms from drmmode_crtc_dpms as well Use drmmode_crtc_dpms in drmmode_set_desired_modes Check dimensions passed to drmmode_xf86crtc_resize Remove #if 0'd code Call drmmode_crtc_gamma_do_set from drmmode_setup_colormap glamor: Fix glamor_block_handler argument in radeon_glamor_finish glamor: Invalidate cached GEM handle in radeon_set_pixmap_bo Don't allocate drmmode_output->props twice Hardcode "non-desktop" RandR property name Remove drmmode_terminate_leases Use strcpy for RandR output property names Bump version to 18.0.99 glamor: Use glamor_egl_create_textured_pixmap_from_gbm_bo when possible glamor: Set RADEON_CREATE_PIXMAP_DRI2 for DRI3 pixmaps Store FB for each CRTC in drmmode_flipdata_rec Use correct FB handle in radeon_do_pageflip Move DRM event queue related initialization to radeon_drm_queue_init Add radeon_drm_wait_pending_flip function Add radeon_drm_handle_event wrapper for drmHandleEvent Defer vblank event handling while waiting for a pending flip Remove drmmode_crtc_private_rec::present_vblank_* related code Add m4 directory Use AC_CONFIG_MACRO_DIR instead of AC_CONFIG_MACRO_DIRS EXA: Handle NULL BO pointer in radeon_set_pixmap_bo Handle ihandle == -1 in radeon_set_shared_pixmap_backing EXA: Handle ihandle == -1 in RADEONEXASharePixmapBacking glamor: Handle ihandle == -1 in radeon_glamor_set_shared_pixmap_backing Always delete entry from list in drm_queue_handler Don't use xorg_list_for_each_entry_safe for signalled flips Bail early from drm_wait_pending_flip if there's no pending flip Fix uninitialized use of local variable pitch in radeon_setup_kernel_mem Bump version for 18.1.0 release git tag: xf86-video-ati-18.1.0 https://xorg.freedesktop.org/archive/individual/driver/xf86-video-ati-18.1.0.tar.bz2 MD5: 
7910883fff7f4a462efac0fe059ed7e3 xf86-video-ati-18.1.0.tar.bz2 SHA1: 87beb7d09f5b722570adda9a5a1822cbd19e7059 xf86-video-ati-18.1.0.tar.bz2 SHA256: 6c335f423c1dc3d904550d41cb871ca4130ba7037dda67d82e3f1555e1bfb9ac xf86-video-ati-18.1.0.tar.bz2 SHA512: 7a58c9a6cb4876bd2ff37d837372b4e360e81fec7de6a6c7a48d70a5338d62745f734f5d4207f30aa368ff2d9ef44f5f1ef36afd73802a618998c16fe395ed53 xf86-video-ati-18.1.0.tar.bz2 PGP: https://xorg.freedesktop.org/archive/individual/driver/xf86-video-ati-18.1.0.tar.bz2.sig https://xorg.freedesktop.org/archive/individual/driver/xf86-video-ati-18.1.0.tar.gz MD5: 1c87fce3ebf10a0704a01433bfbf
[pull] amdgpu/kfd, radeon, ttm, scheduler drm-next-4.20
Hi Dave, First pull for 4.20 for amdgpu/kfd, radeon, ttm, and the GPU scheduler. amdgpu/kfd: - Picasso (new APU) support - Raven2 (new APU) support - Vega20 enablement - ACP powergating improvements - Add ABGR/XBGR display support - VCN JPEG engine support - Initial xGMI support - Use load balancing for engine scheduling - Lots of new documentation - Rework and clean up i2c and aux handling in DC - Add DP YCbCr 4:2:0 support in DC - Add DMCU firmware loading for Raven (used for ABM and PSR) - New debugfs features in DC - LVDS support in DC - Implement wave kill for gfx/compute (light weight reset for shaders) - Use AGP aperture to avoid gart mappings when possible - GPUVM performance improvements - Bulk moves for more efficient GPUVM LRU handling - Merge amdgpu and amdkfd into one module - Enable gfxoff and stutter mode on Raven - Misc cleanups Scheduler: - Load balancing support - Bug fixes ttm: - Bulk move functionality - Bug fixes radeon: - Misc cleanups The following changes since commit 5b394b2ddf0347bef56e50c69a58773c94343ff3: Linux 4.19-rc1 (2018-08-26 14:11:59 -0700) are available in the git repository at: git://people.freedesktop.org/~agd5f/linux drm-next-4.20 for you to fetch changes up to 0957dc7097a3f462f6cedb45cf9b9785cc29e5bb: drm/amdgpu: revert "stop using gart_start as offset for the GTT domain" (2018-09-14 10:05:42 -0500) Alex Deucher (22): drm/amdgpu/pp: endian fixes for process_pptables_v1_0.c drm/amdgpu/pp: endian fixes for processpptables.c drm/amdgpu/powerplay: check vrefresh when when changing displays drm/amdgpu: add AVFS control to PP_FEATURE_MASK drm/amdgpu/powerplay/smu7: enable AVFS control via ppfeaturemask drm/amdgpu/powerplay/vega10: enable AVFS control via ppfeaturemask Revert "drm/amdgpu: Add nbio support for vega20 (v2)" drm/amdgpu: remove experimental flag for vega20 drm/amdgpu/display: add support for LVDS (v5) drm/amdgpu: add missing CHIP_HAINAN in amdgpu_ucode_get_load_type drm/amdgpu/gmc9: rework stolen vga memory handling 
drm/amdgpu/gmc9: don't keep stolen memory on Raven drm/amdgpu/gmc9: don't keep stolen memory on vega12 drm/amdgpu/gmc9: don't keep stolen memory on vega20 drm/amdgpu/gmc: add initial xgmi structure to amdgpu_gmc structure drm/amdgpu/gmc9: add a new gfxhub 1.1 helper for xgmi drm/amdgpu/gmc9: Adjust GART and AGP location with xgmi offset (v2) drm/amdgpu: use IP presence to free uvd and vce handles drm/amdgpu: set external rev id for raven2 drm/amdgpu/soc15: clean up picasso support drm/amdgpu: simplify Raven, Raven2, and Picasso handling drm/amdgpu/display: return proper error codes in dm Alvin lee (2): drm/amd/display: Enable Stereo in Dal3 drm/amd/display: Program vsc_infopacket in commit_planes_for_stream Amber Lin (4): drm/amdgpu: Merge amdkfd into amdgpu drm/amdgpu: Remove CONFIG_HSA_AMD_MODULE drm/amdgpu: Move KFD parameters to amdgpu (v3) drm/amdgpu: Relocate some definitions v2 Andrey Grodzovsky (8): drm/amdgpu: Fix page fault and kasan warning on pci device remove. drm/scheduler: Add job dependency trace. drm/amdgpu: Add job pipe sync dependecy trace drm/scheduler: Add stopped flag to drm_sched_entity drm/amdgpu: Refine gmc9 VM fault print. drm/amdgpu: Use drm_dev_unplug in PCI .remove drm/amdgpu: Fix SDMA TO after GPU reset v3 drm/amd/display: Fix pflip IRQ status after gpu reset. 
Anthony Koo (10): drm/amd/display: Refactor FreeSync module drm/amd/display: add method to check for supported range drm/amd/display: Fix bug where refresh rate becomes fixed drm/amd/display: Fix bug that causes black screen drm/amd/display: Add back code to allow for rounding error drm/amd/display: fix LFC tearing at top of screen drm/amd/display: refactor vupdate interrupt registration drm/amd/display: Correct rounding calcs in mod_freesync_is_valid_range drm/amd/display: add config for sending VSIF drm/amd/display: move edp fast boot optimization flag to stream Bhawanpreet Lakha (3): drm/amd/display: Build stream update and plane updates in dm drm/amd/display: Add Raven2 definitions in dc drm/amd/display: Add DC config flag for Raven2 (v2) Boyuan Zhang (6): drm/amdgpu: add emit reg write reg wait for vcn jpeg drm/amdgpu: add system interrupt register offset header drm/amdgpu: add system interrupt mask for jrbc drm/amdgpu: enable system interrupt for jrbc drm/amdgpu: add emit trap for vcn jpeg drm/amdgpu: fix emit frame size and comments for jpeg Charlene Liu (2): drm/amd/display: pass compat_level to hubp drm/amd/display: add retimer log for HWQ tuning use. Chiawen Huang (2): drm/amd/display: add aux transition event log.
[ANNOUNCE] xf86-video-amdgpu 18.1.0
I'm pleased to announce the 18.1.0 release of xf86-video-amdgpu, the Xorg driver for AMD Radeon GPUs supported by the amdgpu kernel driver. This release supports xserver versions 1.13-1.20. Highlights: * When using DC as of Linux 4.17: - Support advanced colour management functionality. - Support gamma correction and X11 colormaps when Xorg runs at depth 30 as well. * Support for leasing RandR outputs to clients. * Various robustness fixes for TearFree. In particular, fixed several cases in which disabling TearFree at runtime would result in the Xorg process freezing or crashing. * Fixed some m4 related build issues with older versions of autotools. Plus other improvements and fixes. Thanks to everybody who contributed to this release in any way! Emil Velikov (3): Move amdgpu_bus_id/amgpu_kernel_mode within amdgpu_kernel_open_fd Do not export the DriverRec AMDGPU Remove set but unused amdgpu_dri2::pKernelDRMVersion Jim Qu (1): Wait for pending scanout update before calling drmmode_crtc_scanout_free Keith Packard (3): modesetting: Record non-desktop kernel property at PreInit time modesetting: Create CONNECTOR_ID properties for outputs [v2] Add RandR leases support Leo Li (Sunpeng) (7): Cache color property IDs and LUT sizes during pre-init Initialize color properties on CRTC during CRTC init Configure color properties when creating output resources Update color properties on output_get_property Enable setting of color properties via RandR Compose non-legacy with legacy regamma LUT Also compose LUT when setting legacy gamma Michel Dänzer (48): Post-release version bump Ignore AMDGPU_DRM_QUEUE_ERROR (0) in amdgpu_drm_abort_entry Track DRM event queue sequence number in scanout_update_pending Abort scanout_update_pending event when possible Update RandR CRTC state if set_mode_major fails in set_desired_modes Simplify drmmode_crtc_scanout_update Don't call scanout_flip/update with a legacy RandR scanout buffer Simplify drmmode_handle_transform Set 
drmmode_crtc->scanout_id = 0 when TearFree is disabled Refactor drmmode_output_set_tear_free helper Wait for pending flips in drmmode_output_set_tear_free Replace 'foo == NULL' with '!foo' Call drmmode_do_crtc_dpms from drmmode_crtc_dpms as well Use drmmode_crtc_dpms in drmmode_set_desired_modes Check dimensions passed to drmmode_xf86crtc_resize Don't apply gamma to HW cursor data if colour management is enabled Remove #if 0'd code Call drmmode_crtc_gamma_do_set from drmmode_setup_colormap Bail from dri2_create_buffer2 if we can't get a pixmap glamor: Bail CreatePixmap on unsupported pixmap depth Move flush from radeon_scanout_do_update to its callers Support gamma correction & colormaps at depth 30 as well Hardcode "non-desktop" RandR property name Free previous xf86CrtcRec gamma LUT memory Don't use DRM_IOCTL_GEM_FLINK in create_pixmap_for_fbcon Remove AMDGPUInfoRec::fbcon_pixmap Remove drmmode_terminate_leases Use strcpy for RandR output property names glamor: Set AMDGPU_CREATE_PIXMAP_DRI2 for DRI3 pixmaps Store FB for each CRTC in drmmode_flipdata_rec glamor: Use glamor_egl_create_textured_pixmap_from_gbm_bo when possible glamor: Check glamor module version for depth 30 support Move DRM event queue related initialization to amdgpu_drm_queue_init Add amdgpu_drm_wait_pending_flip function Add amdgpu_drm_handle_event wrapper for drmHandleEvent Defer vblank event handling while waiting for a pending flip Remove drmmode_crtc_private_rec::present_vblank_* related code Use correct FB handle in amdgpu_do_pageflip Add m4 directory Use AC_CONFIG_MACRO_DIR instead of AC_CONFIG_MACRO_DIRS Handle ihandle == -1 in amdgpu_set_shared_pixmap_backing glamor: Handle ihandle == -1 in amdgpu_glamor_set_shared_pixmap_backing Always delete entry from list in drm_queue_handler Don't use xorg_list_for_each_entry_safe for signalled flips Do not push the CM_GAMMA_LUT property values in drmmode_crtc_cm_init Bail early from drm_wait_pending_flip if there's no pending flip Bail from 
drmmode_cm_init if there's no CRTC Bump version for the 18.1.0 release Slava Grigorev (1): Include xf86platformBus.h unconditionally git tag: xf86-video-amdgpu-18.1.0 https://xorg.freedesktop.org/archive/individual/driver/xf86-video-amdgpu-18.1.0.tar.bz2 MD5: 5d75f5993cda5e013cd851c5947ec450 xf86-video-amdgpu-18.1.0.tar.bz2 SHA1: d3097af7da3b56396721e214f348e7ceb5f3a358 xf86-video-amdgpu-18.1.0.tar.bz2 SHA256: e11f25bb51d718b8ea938ad2b8095323c0ab16f4ddffd92091d80f9a445a9672 xf86-video-amdgpu-18.1.0.tar.bz2 SHA512:
Re: [PATCH 0/9] KFD upstreaming September 2018
On Wed, Sep 12, 2018 at 9:44 PM Felix Kuehling wrote: > > This patch series is based on amd-staging-drm-next. > > Patches 1-3 are important fixes that would be good to be included in > drm-fixes for 4.19. > > Patches 3-8 are small feature enhancements. > > Patch 9 is random cleanup. > > I'll send a separate patch series that adds Vega20 support to KFD on top of > this. > > Amber Lin (1): > drm/amdgpu: Fix SDMA HQD destroy error on gfx_v7 > > Emily Deng (1): > drm/amdkfd: KFD doesn't support TONGA SRIOV > > Eric Huang (1): > drm/amdkfd: reflect atomic support in IO link properties > > Felix Kuehling (2): > drm/amdkfd: Report SDMA firmware version in the topology > drm/amdgpu: remove unnecessary forward declaration > > Harish Kasiviswanathan (1): > drm/amdgpu: Enable BAD_OPCODE intr for gfx8 > > Jay Cornwall (1): > drm/amdkfd: Add wavefront context save state retrieval ioctl > > Yong Zhao (2): > drm/amdkfd: Change the control stack MTYPE from UC to NC on GFX9 > drm/amdkfd: Fix ATS capablity was not reported correctly on some APUs > Series is: Acked-by: Alex Deucher > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 6 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c | 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c | 5 +- > drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 21 +++ > drivers/gpu/drm/amd/amdkfd/kfd_device.c| 35 +--- > .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 37 + > .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 8 +++ > drivers/gpu/drm/amd/amdkfd/kfd_iommu.c | 13 - > drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h | 8 +++ > drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c| 25 - > drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c| 23 > drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 12 > .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 22 > drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 64 > +- > drivers/gpu/drm/amd/include/kgd_kfd_interface.h| 2 +- > include/uapi/linux/kfd_ioctl.h | 13 - > 17 
files changed, 251 insertions(+), 47 deletions(-) > > -- > 2.7.4 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 0/6] Initial Vega20 support for KFD
On Wed, Sep 12, 2018 at 9:44 PM Felix Kuehling wrote: > > This patch series is based on amd-staging-drm-next + the patch series > "KFD upstreaming September 2018". > > Emily Deng (1): > drm/amdgpu/sriov: Correct the setting about sdma doorbell offset of > Vega10 > > Shaoyun Liu (5): > drm/amdgpu: Doorbell assignment for 8 sdma user queue per engine > drm/amdkfd: Make the number of SDMA queues variable > drm/amd: Interface change to support 64 bit page_table_base > drm/amdgpu: Add vega20 support on kfd probe > drm/amdkfd: Vega20 bring up on amdkfd side > Series is: Acked-by: Alex Deucher > drivers/gpu/drm/amd/amdgpu/amdgpu.h| 23 +++--- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 50 > +++--- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c | 7 +-- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c | 7 +-- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 7 ++- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 8 +++- > drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 12 -- > drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 1 + > drivers/gpu/drm/amd/amdkfd/kfd_device.c| 33 ++ > .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 18 +--- > .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 1 - > drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 1 + > drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 1 + > drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c | 3 +- > drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c | 1 + > drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c| 1 + > drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 3 +- > drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 1 + > drivers/gpu/drm/amd/include/kgd_kfd_interface.h| 10 ++--- > 20 files changed, 136 insertions(+), 54 deletions(-) > > -- > 2.7.4 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH v2 05/17] compat_ioctl: move more drivers to generic_compat_ioctl_ptrarg
On Wed, Sep 12, 2018 at 05:08:52PM +0200, Arnd Bergmann wrote: > The .ioctl and .compat_ioctl file operations have the same prototype so > they can both point to the same function, which works great almost all > the time when all the commands are compatible. > > One exception is the s390 architecture, where a compat pointer is only > 31 bit wide, and converting it into a 64-bit pointer requires calling > compat_ptr(). Most drivers here will ever run in s390, but since we now > have a generic helper for it, it's easy enough to use it consistently. > > I double-checked all these drivers to ensure that all ioctl arguments > are used as pointers or are ignored, but are not interpreted as integer > values. > > Signed-off-by: Arnd Bergmann > --- > fs/btrfs/super.c| 2 +- Acked-by: David Sterba ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
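The s390 quirk Arnd describes — a compat user pointer is only 31 bits wide, so widening it to 64 bits must clear the top bit — can be sketched as follows. This is an illustrative analogue of the idea behind the kernel's compat_ptr() helper, not the kernel code itself (the real helper also involves the `__user` annotation and per-architecture definitions):

```c
#include <stdint.h>

typedef uint32_t compat_uptr_t; /* a 32-bit user pointer from a compat task */

/* s390-style conversion: user addresses in 31-bit compat mode only use the
 * low 31 bits, so the top bit must be masked off before widening. */
static uint64_t compat_ptr_s390_style(compat_uptr_t uptr)
{
	return (uint64_t)(uptr & 0x7fffffffu);
}

/* On most other architectures plain zero-extension is sufficient, which is
 * why ioctl handlers that skip the helper happen to work everywhere else. */
static uint64_t compat_ptr_generic_style(compat_uptr_t uptr)
{
	return (uint64_t)uptr;
}
```

This is why a shared generic_compat_ioctl_ptrarg helper is safer than pointing .compat_ioctl directly at the native handler: it funnels every pointer argument through the conversion, so the s390 case is handled consistently.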
Re: [PATCH 1/2] drm/amdgpu: fix mask in GART location calculation
This series is: Acked-by: James Zhu Tested-by: James Zhu

On 2018-09-14 06:57 AM, Christian König wrote:

We need to mask the lower bits, not the upper ones.

Fixes: ec210e3226dc0 drm/amdgpu: put GART away from VRAM v2

Signed-off-by: Christian König
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index ae4467113240..9a5b252784a1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -166,7 +166,7 @@ void amdgpu_gmc_gart_location(struct amdgpu_device *adev, struct amdgpu_gmc *mc)
 	else
 		mc->gart_start = mc->mc_mask - mc->gart_size + 1;

-	mc->gart_start &= four_gb - 1;
+	mc->gart_start &= ~(four_gb - 1);
 	mc->gart_end = mc->gart_start + mc->gart_size - 1;
 	dev_info(adev->dev, "GART: %lluM 0x%016llX - 0x%016llX\n",
 		 mc->gart_size >> 20, mc->gart_start, mc->gart_end);

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
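The one-character difference in the patch is easy to misread, so here is the masking behavior in isolation. For a power-of-two `size`, `x & (size - 1)` keeps only the offset *within* a block (the bug), while `x & ~(size - 1)` rounds `x` *down* to a block boundary (the fix). A minimal standalone sketch, using an illustrative 4 GB constant like the `four_gb` variable in the patch:

```c
#include <stdint.h>

static const uint64_t four_gb = 0x100000000ULL; /* 4 GB, a power of two */

/* The buggy mask: discards everything above the 4 GB boundary. */
static uint64_t keep_low_bits(uint64_t x)
{
	return x & (four_gb - 1);
}

/* The fixed mask: aligns the address down to a 4 GB boundary,
 * which is what the GART start address actually needs. */
static uint64_t align_down_4gb(uint64_t x)
{
	return x & ~(four_gb - 1);
}
```

With a GART start of 0x1C0000000, the buggy mask yields 0xC0000000 (the start truncated into the first 4 GB), while the fixed mask yields 0x100000000 (the start aligned down, upper bits preserved) — hence "mask the lower bits, not the upper ones."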
Re: [PATCH libdrm 3/3] test/amdgpu: add GDS, GWS and OA tests
Series is: Reviewed-by: Alex Deucher From: amd-gfx on behalf of Christian König Sent: Friday, September 14, 2018 9:09:06 AM To: amd-gfx@lists.freedesktop.org Subject: [PATCH libdrm 3/3] test/amdgpu: add GDS, GWS and OA tests Add allocation tests for GDW, GWS and OA. Signed-off-by: Christian König --- tests/amdgpu/amdgpu_test.h | 48 +- tests/amdgpu/bo_tests.c| 21 2 files changed, 47 insertions(+), 22 deletions(-) diff --git a/tests/amdgpu/amdgpu_test.h b/tests/amdgpu/amdgpu_test.h index d1e14e23..af3041e5 100644 --- a/tests/amdgpu/amdgpu_test.h +++ b/tests/amdgpu/amdgpu_test.h @@ -207,11 +207,9 @@ static inline amdgpu_bo_handle gpu_mem_alloc( amdgpu_va_handle *va_handle) { struct amdgpu_bo_alloc_request req = {0}; - amdgpu_bo_handle buf_handle; + amdgpu_bo_handle buf_handle = NULL; int r; - CU_ASSERT_NOT_EQUAL(vmc_addr, NULL); - req.alloc_size = size; req.phys_alignment = alignment; req.preferred_heap = type; @@ -222,16 +220,19 @@ static inline amdgpu_bo_handle gpu_mem_alloc( if (r) return NULL; - r = amdgpu_va_range_alloc(device_handle, - amdgpu_gpu_va_range_general, - size, alignment, 0, vmc_addr, - va_handle, 0); - CU_ASSERT_EQUAL(r, 0); - if (r) - goto error_free_bo; - - r = amdgpu_bo_va_op(buf_handle, 0, size, *vmc_addr, 0, AMDGPU_VA_OP_MAP); - CU_ASSERT_EQUAL(r, 0); + if (vmc_addr && va_handle) { + r = amdgpu_va_range_alloc(device_handle, + amdgpu_gpu_va_range_general, + size, alignment, 0, vmc_addr, + va_handle, 0); + CU_ASSERT_EQUAL(r, 0); + if (r) + goto error_free_bo; + + r = amdgpu_bo_va_op(buf_handle, 0, size, *vmc_addr, 0, + AMDGPU_VA_OP_MAP); + CU_ASSERT_EQUAL(r, 0); + } return buf_handle; @@ -256,15 +257,18 @@ static inline int gpu_mem_free(amdgpu_bo_handle bo, if (!bo) return 0; - r = amdgpu_bo_va_op(bo, 0, size, vmc_addr, 0, AMDGPU_VA_OP_UNMAP); - CU_ASSERT_EQUAL(r, 0); - if (r) - return r; - - r = amdgpu_va_range_free(va_handle); - CU_ASSERT_EQUAL(r, 0); - if (r) - return r; + if (va_handle) { + r = amdgpu_bo_va_op(bo, 0, size, vmc_addr, 0, + 
AMDGPU_VA_OP_UNMAP); + CU_ASSERT_EQUAL(r, 0); + if (r) + return r; + + r = amdgpu_va_range_free(va_handle); + CU_ASSERT_EQUAL(r, 0); + if (r) + return r; + } r = amdgpu_bo_free(bo); CU_ASSERT_EQUAL(r, 0); diff --git a/tests/amdgpu/bo_tests.c b/tests/amdgpu/bo_tests.c index dc2de9b7..7cff4cf7 100644 --- a/tests/amdgpu/bo_tests.c +++ b/tests/amdgpu/bo_tests.c @@ -242,6 +242,27 @@ static void amdgpu_memory_alloc(void) r = gpu_mem_free(bo, va_handle, bo_mc, 4096); CU_ASSERT_EQUAL(r, 0); + + /* Test GDS */ + bo = gpu_mem_alloc(device_handle, 1024, 0, + AMDGPU_GEM_DOMAIN_GDS, 0, + NULL, NULL); + r = gpu_mem_free(bo, NULL, 0, 4096); + CU_ASSERT_EQUAL(r, 0); + + /* Test GWS */ + bo = gpu_mem_alloc(device_handle, 1, 0, + AMDGPU_GEM_DOMAIN_GWS, 0, + NULL, NULL); + r = gpu_mem_free(bo, NULL, 0, 4096); + CU_ASSERT_EQUAL(r, 0); + + /* Test OA */ + bo = gpu_mem_alloc(device_handle, 1, 0, + AMDGPU_GEM_DOMAIN_OA, 0, + NULL, NULL); + r = gpu_mem_free(bo, NULL, 0, 4096); + CU_ASSERT_EQUAL(r, 0); } static void amdgpu_mem_fail_alloc(void) -- 2.14.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
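The core change in the patch above is making gpu_mem_alloc/gpu_mem_free tolerate NULL for `vmc_addr`/`va_handle`, since GDS, GWS and OA buffer objects are allocated without a GPU virtual-address mapping. The general pattern — an allocator that only performs the mapping step when the caller actually asks for an address — can be sketched generically; the names and the placeholder address here are illustrative, not the libdrm API:

```c
#include <stddef.h>
#include <stdint.h>

struct buf {
	uint64_t va;  /* GPU virtual address, if mapped */
	int mapped;   /* whether a VA mapping was created */
};

/* Allocate a buffer; map it into the GPU address space only when the
 * caller passes a non-NULL out-parameter for the address. */
static struct buf alloc_buf(uint64_t size, uint64_t *out_va)
{
	struct buf b = { 0, 0 };
	(void)size;             /* size handling elided in this sketch */
	if (out_va) {           /* VRAM/GTT-style: caller wants a GPU VA */
		b.va = 0x1000;  /* stand-in for a real VA range allocation */
		b.mapped = 1;
		*out_va = b.va;
	}
	/* GDS/GWS/OA-style: out_va == NULL, no mapping is created */
	return b;
}
```

Keeping one helper with optional out-parameters, rather than a second NULL-free variant, is what lets the new GDS/GWS/OA tests reuse the existing allocation path.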
Re: [PATCH 1/2] drm/amdgpu: fix mask in GART location calculation
Reviewed-by: Alex Deucher

From: amd-gfx on behalf of Christian König
Sent: Friday, September 14, 2018 6:57 AM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH 1/2] drm/amdgpu: fix mask in GART location calculation

We need to mask the lower bits, not the upper ones.

Fixes: ec210e3226dc0 drm/amdgpu: put GART away from VRAM v2

Signed-off-by: Christian König
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index ae4467113240..9a5b252784a1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -166,7 +166,7 @@ void amdgpu_gmc_gart_location(struct amdgpu_device *adev, struct amdgpu_gmc *mc)
 	else
 		mc->gart_start = mc->mc_mask - mc->gart_size + 1;

-	mc->gart_start &= four_gb - 1;
+	mc->gart_start &= ~(four_gb - 1);
 	mc->gart_end = mc->gart_start + mc->gart_size - 1;
 	dev_info(adev->dev, "GART: %lluM 0x%016llX - 0x%016llX\n",
 		 mc->gart_size >> 20, mc->gart_start, mc->gart_end);
-- 
2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 2/2] drm/amdgpu: revert "stop using gart_start as offset for the GTT domain"
Acked-by: Alex Deucher

From: amd-gfx on behalf of Christian König
Sent: Friday, September 14, 2018 6:57:28 AM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH 2/2] drm/amdgpu: revert "stop using gart_start as offset for the GTT domain"

Turned out the commit is incomplete, and since we removed using the AGP mapping from the GTT manager it is also not necessary any more.

This reverts commit 22d8bfafcc12dfa17b91d2e8ae4e1898e782003a.

Signed-off-by: Christian König
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 3 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 7 +++
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index c2539f6821c0..da7b1b92d9cf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -143,8 +143,7 @@ static int amdgpu_gtt_mgr_alloc(struct ttm_mem_type_manager *man,
 	spin_unlock(&mgr->lock);

 	if (!r)
-		mem->start = node->node.start +
-			(adev->gmc.gart_start >> PAGE_SHIFT);
+		mem->start = node->node.start;

 	return r;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 8a158ee922f7..f12ae6b525b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -188,7 +188,7 @@ static int amdgpu_init_mem_type(struct ttm_bo_device *bdev, uint32_t type,
 	case TTM_PL_TT:
 		/* GTT memory */
 		man->func = &amdgpu_gtt_mgr_func;
-		man->gpu_offset = 0;
+		man->gpu_offset = adev->gmc.gart_start;
 		man->available_caching = TTM_PL_MASK_CACHING;
 		man->default_caching = TTM_PL_FLAG_CACHED;
 		man->flags = TTM_MEMTYPE_FLAG_MAPPABLE | TTM_MEMTYPE_FLAG_CMA;
@@ -1060,7 +1060,7 @@ static int amdgpu_ttm_backend_bind(struct ttm_tt *ttm,
 	flags = amdgpu_ttm_tt_pte_flags(adev, ttm, bo_mem);

 	/* bind pages into GART page tables */
-	gtt->offset = ((u64)bo_mem->start << PAGE_SHIFT) - adev->gmc.gart_start;
+	gtt->offset = (u64)bo_mem->start << PAGE_SHIFT;
 	r = amdgpu_gart_bind(adev, gtt->offset, ttm->num_pages, ttm->pages,
 			     gtt->ttm.dma_address, flags);
@@ -1112,8 +1112,7 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
 	flags = amdgpu_ttm_tt_pte_flags(adev, bo->ttm, &tmp);

 	/* Bind pages */
-	gtt->offset = ((u64)tmp.start << PAGE_SHIFT) -
-		      adev->gmc.gart_start;
+	gtt->offset = (u64)tmp.start << PAGE_SHIFT;
 	r = amdgpu_ttm_gart_bind(adev, bo, flags);
 	if (unlikely(r)) {
 		ttm_bo_mem_put(bo, &tmp);
-- 
2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 4/5] drm/amdgpu: always recover VRAM during GPU recovery
It shouldn't add much overhead and we should make sure that critical VRAM content is always restored. Signed-off-by: Christian König Acked-by: Junwei Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 762dc5f886cd..899342c6dfad 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3003,7 +3003,7 @@ static int amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev, } /** - * amdgpu_device_handle_vram_lost - Handle the loss of VRAM contents + * amdgpu_device_recover_vram - Recover some VRAM contents * * @adev: amdgpu_device pointer * @@ -3012,7 +3012,7 @@ static int amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev, * the contents of VRAM might be lost. * Returns 0 on success, 1 on failure. */ -static int amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) +static int amdgpu_device_recover_vram(struct amdgpu_device *adev) { struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring; struct amdgpu_bo *bo, *tmp; @@ -3139,8 +3139,8 @@ static int amdgpu_device_reset(struct amdgpu_device *adev) } } - if (!r && ((need_full_reset && !(adev->flags & AMD_IS_APU)) || vram_lost)) - r = amdgpu_device_handle_vram_lost(adev); + if (!r) + r = amdgpu_device_recover_vram(adev); return r; } @@ -3186,7 +3186,7 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev, amdgpu_virt_release_full_gpu(adev, true); if (!r && adev->virt.gim_feature & AMDGIM_FEATURE_GIM_FLR_VRAMLOST) { atomic_inc(>vram_lost_counter); - r = amdgpu_device_handle_vram_lost(adev); + r = amdgpu_device_recover_vram(adev); } return r; -- 2.14.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 5/5] drm/amdgpu: fix shadow BO restoring
Don't grab the reservation lock any more and simplify the handling quite a bit. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 109 - drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 46 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 8 +-- 3 files changed, 43 insertions(+), 120 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 899342c6dfad..1cbc372964f8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -2954,54 +2954,6 @@ static int amdgpu_device_ip_post_soft_reset(struct amdgpu_device *adev) return 0; } -/** - * amdgpu_device_recover_vram_from_shadow - restore shadowed VRAM buffers - * - * @adev: amdgpu_device pointer - * @ring: amdgpu_ring for the engine handling the buffer operations - * @bo: amdgpu_bo buffer whose shadow is being restored - * @fence: dma_fence associated with the operation - * - * Restores the VRAM buffer contents from the shadow in GTT. Used to - * restore things like GPUVM page tables after a GPU reset where - * the contents of VRAM might be lost. - * Returns 0 on success, negative error code on failure. 
- */ -static int amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev, - struct amdgpu_ring *ring, - struct amdgpu_bo *bo, - struct dma_fence **fence) -{ - uint32_t domain; - int r; - - if (!bo->shadow) - return 0; - - r = amdgpu_bo_reserve(bo, true); - if (r) - return r; - domain = amdgpu_mem_type_to_domain(bo->tbo.mem.mem_type); - /* if bo has been evicted, then no need to recover */ - if (domain == AMDGPU_GEM_DOMAIN_VRAM) { - r = amdgpu_bo_validate(bo->shadow); - if (r) { - DRM_ERROR("bo validate failed!\n"); - goto err; - } - - r = amdgpu_bo_restore_from_shadow(adev, ring, bo, -NULL, fence, true); - if (r) { - DRM_ERROR("recover page table failed!\n"); - goto err; - } - } -err: - amdgpu_bo_unreserve(bo); - return r; -} - /** * amdgpu_device_recover_vram - Recover some VRAM contents * @@ -3010,16 +2962,15 @@ static int amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev, * Restores the contents of VRAM buffers from the shadows in GTT. Used to * restore things like GPUVM page tables after a GPU reset where * the contents of VRAM might be lost. - * Returns 0 on success, 1 on failure. + * + * Returns: + * 0 on success, negative error code on failure. 
*/ static int amdgpu_device_recover_vram(struct amdgpu_device *adev) { - struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring; - struct amdgpu_bo *bo, *tmp; struct dma_fence *fence = NULL, *next = NULL; - long r = 1; - int i = 0; - long tmo; + struct amdgpu_bo *shadow; + long r = 1, tmo; if (amdgpu_sriov_runtime(adev)) tmo = msecs_to_jiffies(8000); @@ -3028,44 +2979,40 @@ static int amdgpu_device_recover_vram(struct amdgpu_device *adev) DRM_INFO("recover vram bo from shadow start\n"); mutex_lock(>shadow_list_lock); - list_for_each_entry_safe(bo, tmp, >shadow_list, shadow_list) { - next = NULL; - amdgpu_device_recover_vram_from_shadow(adev, ring, bo, ); + list_for_each_entry(shadow, >shadow_list, shadow_list) { + + /* No need to recover an evicted BO */ + if (shadow->tbo.mem.mem_type != TTM_PL_TT || + shadow->parent->tbo.mem.mem_type != TTM_PL_VRAM) + continue; + + r = amdgpu_bo_restore_shadow(shadow, ); + if (r) + break; + if (fence) { r = dma_fence_wait_timeout(fence, false, tmo); - if (r == 0) - pr_err("wait fence %p[%d] timeout\n", fence, i); - else if (r < 0) - pr_err("wait fence %p[%d] interrupted\n", fence, i); - if (r < 1) { - dma_fence_put(fence); - fence = next; + dma_fence_put(fence); + fence = next; + if (r <= 0) break; - } - i++; + } else { + fence = next; } - - dma_fence_put(fence); - fence = next; }
[PATCH 1/5] drm/amdgpu: stop pipelining VM PDs/PTs moves
We are going to need this for recoverable page fault handling, and it makes shadow handling during GPU reset much easier.

Signed-off-by: Christian König
Acked-by: Junwei Zhang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    | 6 +++++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index e6909252aefa..e6e5e5e50c98 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1360,7 +1360,7 @@ u64 amdgpu_bo_gpu_offset(struct amdgpu_bo *bo)
 {
 	WARN_ON_ONCE(bo->tbo.mem.mem_type == TTM_PL_SYSTEM);
 	WARN_ON_ONCE(!ww_mutex_is_locked(&bo->tbo.resv->lock) &&
-		     !bo->pin_count);
+		     !bo->pin_count && bo->tbo.type != ttm_bo_type_kernel);
 	WARN_ON_ONCE(bo->tbo.mem.start == AMDGPU_BO_INVALID_OFFSET);
 	WARN_ON_ONCE(bo->tbo.mem.mem_type == TTM_PL_VRAM &&
 		     !(bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS));
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 8a158ee922f7..9e7991b1c8ff 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -524,7 +524,11 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
 	if (r)
 		goto error;
 
-	r = ttm_bo_pipeline_move(bo, fence, evict, new_mem);
+	/* Always block for VM page tables before committing the new location */
+	if (bo->type == ttm_bo_type_kernel)
+		r = ttm_bo_move_accel_cleanup(bo, fence, true, new_mem);
+	else
+		r = ttm_bo_pipeline_move(bo, fence, evict, new_mem);
 	dma_fence_put(fence);
 	return r;
-- 
2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 3/5] drm/amdgpu: shadow BOs don't need any alignment
They aren't directly used by the hardware.

Signed-off-by: Christian König
Reviewed-by: Junwei Zhang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index d8e8d653d518..650c45c896f0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -526,7 +526,7 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
 }
 
 static int amdgpu_bo_create_shadow(struct amdgpu_device *adev,
-				   unsigned long size, int byte_align,
+				   unsigned long size,
 				   struct amdgpu_bo *bo)
 {
 	struct amdgpu_bo_param bp;
@@ -537,7 +537,6 @@ static int amdgpu_bo_create_shadow(struct amdgpu_device *adev,
 	memset(&bp, 0, sizeof(bp));
 	bp.size = size;
-	bp.byte_align = byte_align;
 	bp.domain = AMDGPU_GEM_DOMAIN_GTT;
 	bp.flags = AMDGPU_GEM_CREATE_CPU_GTT_USWC |
 		AMDGPU_GEM_CREATE_SHADOW;
@@ -586,7 +585,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
 			WARN_ON(reservation_object_lock((*bo_ptr)->tbo.resv,
 							NULL));
 
-		r = amdgpu_bo_create_shadow(adev, bp->size, bp->byte_align, (*bo_ptr));
+		r = amdgpu_bo_create_shadow(adev, bp->size, *bo_ptr);
 
 		if (!bp->resv)
 			reservation_object_unlock((*bo_ptr)->tbo.resv);
-- 
2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 2/5] drm/amdgpu: always enable shadow BOs v2
Even when GPU recovery is disabled we could run into a manually triggered recovery.

v2: keep accidentally removed comments

Signed-off-by: Christian König
Acked-by: Emily Deng
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index e6e5e5e50c98..d8e8d653d518 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -51,18 +51,6 @@
  *
  */
 
-static bool amdgpu_bo_need_backup(struct amdgpu_device *adev)
-{
-	if (adev->flags & AMD_IS_APU)
-		return false;
-
-	if (amdgpu_gpu_recovery == 0 ||
-	    (amdgpu_gpu_recovery == -1 && !amdgpu_sriov_vf(adev)))
-		return false;
-
-	return true;
-}
-
 /**
  * amdgpu_bo_subtract_pin_size - Remove BO from pin_size accounting
  *
@@ -593,7 +581,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
 	if (r)
 		return r;
 
-	if ((flags & AMDGPU_GEM_CREATE_SHADOW) && amdgpu_bo_need_backup(adev)) {
+	if ((flags & AMDGPU_GEM_CREATE_SHADOW) && !(adev->flags & AMD_IS_APU)) {
 		if (!bp->resv)
 			WARN_ON(reservation_object_lock((*bo_ptr)->tbo.resv,
 							NULL));
-- 
2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amd/dc: Trigger set power state task when display configuration changes
Acked-by: Alex Deucher From: amd-gfx on behalf of Rex Zhu Sent: Friday, September 14, 2018 1:57:07 AM To: amd-gfx@lists.freedesktop.org Cc: Zhu, Rex Subject: [PATCH] drm/amd/dc: Trigger set power state task when display configuration changes Revert "drm/amd/display: Remove call to amdgpu_pm_compute_clocks" This reverts commit dcd473770e86517543691bdb227103d6c781cd0a. when display configuration changes, dc need to update the changes to powerplay, also need to trigger a power state task. amdgpu_pm_compute_clocks is the interface to set power state task either dpm enabled or powerplay enabled Signed-off-by: Rex Zhu --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c index 6d16b4a..0fab64a 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c @@ -105,6 +105,8 @@ bool dm_pp_apply_display_requirements( adev->powerplay.pp_funcs->display_configuration_change( adev->powerplay.pp_handle, >pm.pm_display_cfg); + + amdgpu_pm_compute_clocks(adev); } return true; -- 1.9.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH libdrm 2/3] test/amdgpu: add proper error handling
Otherwise the calling function won't notice that something is wrong. Signed-off-by: Christian König --- tests/amdgpu/amdgpu_test.h | 23 ++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/tests/amdgpu/amdgpu_test.h b/tests/amdgpu/amdgpu_test.h index f2ece3c3..d1e14e23 100644 --- a/tests/amdgpu/amdgpu_test.h +++ b/tests/amdgpu/amdgpu_test.h @@ -219,17 +219,31 @@ static inline amdgpu_bo_handle gpu_mem_alloc( r = amdgpu_bo_alloc(device_handle, , _handle); CU_ASSERT_EQUAL(r, 0); + if (r) + return NULL; r = amdgpu_va_range_alloc(device_handle, amdgpu_gpu_va_range_general, size, alignment, 0, vmc_addr, va_handle, 0); CU_ASSERT_EQUAL(r, 0); + if (r) + goto error_free_bo; r = amdgpu_bo_va_op(buf_handle, 0, size, *vmc_addr, 0, AMDGPU_VA_OP_MAP); CU_ASSERT_EQUAL(r, 0); return buf_handle; + +error_free_va: + r = amdgpu_va_range_free(*va_handle); + CU_ASSERT_EQUAL(r, 0); + +error_free_bo: + r = amdgpu_bo_free(buf_handle); + CU_ASSERT_EQUAL(r, 0); + + return NULL; } static inline int gpu_mem_free(amdgpu_bo_handle bo, @@ -239,16 +253,23 @@ static inline int gpu_mem_free(amdgpu_bo_handle bo, { int r; + if (!bo) + return 0; + r = amdgpu_bo_va_op(bo, 0, size, vmc_addr, 0, AMDGPU_VA_OP_UNMAP); CU_ASSERT_EQUAL(r, 0); + if (r) + return r; r = amdgpu_va_range_free(va_handle); CU_ASSERT_EQUAL(r, 0); + if (r) + return r; r = amdgpu_bo_free(bo); CU_ASSERT_EQUAL(r, 0); - return 0; + return r; } static inline int -- 2.14.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH libdrm 3/3] test/amdgpu: add GDS, GWS and OA tests
Add allocation tests for GDW, GWS and OA. Signed-off-by: Christian König --- tests/amdgpu/amdgpu_test.h | 48 +- tests/amdgpu/bo_tests.c| 21 2 files changed, 47 insertions(+), 22 deletions(-) diff --git a/tests/amdgpu/amdgpu_test.h b/tests/amdgpu/amdgpu_test.h index d1e14e23..af3041e5 100644 --- a/tests/amdgpu/amdgpu_test.h +++ b/tests/amdgpu/amdgpu_test.h @@ -207,11 +207,9 @@ static inline amdgpu_bo_handle gpu_mem_alloc( amdgpu_va_handle *va_handle) { struct amdgpu_bo_alloc_request req = {0}; - amdgpu_bo_handle buf_handle; + amdgpu_bo_handle buf_handle = NULL; int r; - CU_ASSERT_NOT_EQUAL(vmc_addr, NULL); - req.alloc_size = size; req.phys_alignment = alignment; req.preferred_heap = type; @@ -222,16 +220,19 @@ static inline amdgpu_bo_handle gpu_mem_alloc( if (r) return NULL; - r = amdgpu_va_range_alloc(device_handle, - amdgpu_gpu_va_range_general, - size, alignment, 0, vmc_addr, - va_handle, 0); - CU_ASSERT_EQUAL(r, 0); - if (r) - goto error_free_bo; - - r = amdgpu_bo_va_op(buf_handle, 0, size, *vmc_addr, 0, AMDGPU_VA_OP_MAP); - CU_ASSERT_EQUAL(r, 0); + if (vmc_addr && va_handle) { + r = amdgpu_va_range_alloc(device_handle, + amdgpu_gpu_va_range_general, + size, alignment, 0, vmc_addr, + va_handle, 0); + CU_ASSERT_EQUAL(r, 0); + if (r) + goto error_free_bo; + + r = amdgpu_bo_va_op(buf_handle, 0, size, *vmc_addr, 0, + AMDGPU_VA_OP_MAP); + CU_ASSERT_EQUAL(r, 0); + } return buf_handle; @@ -256,15 +257,18 @@ static inline int gpu_mem_free(amdgpu_bo_handle bo, if (!bo) return 0; - r = amdgpu_bo_va_op(bo, 0, size, vmc_addr, 0, AMDGPU_VA_OP_UNMAP); - CU_ASSERT_EQUAL(r, 0); - if (r) - return r; - - r = amdgpu_va_range_free(va_handle); - CU_ASSERT_EQUAL(r, 0); - if (r) - return r; + if (va_handle) { + r = amdgpu_bo_va_op(bo, 0, size, vmc_addr, 0, + AMDGPU_VA_OP_UNMAP); + CU_ASSERT_EQUAL(r, 0); + if (r) + return r; + + r = amdgpu_va_range_free(va_handle); + CU_ASSERT_EQUAL(r, 0); + if (r) + return r; + } r = amdgpu_bo_free(bo); CU_ASSERT_EQUAL(r, 0); diff --git 
a/tests/amdgpu/bo_tests.c b/tests/amdgpu/bo_tests.c index dc2de9b7..7cff4cf7 100644 --- a/tests/amdgpu/bo_tests.c +++ b/tests/amdgpu/bo_tests.c @@ -242,6 +242,27 @@ static void amdgpu_memory_alloc(void) r = gpu_mem_free(bo, va_handle, bo_mc, 4096); CU_ASSERT_EQUAL(r, 0); + + /* Test GDS */ + bo = gpu_mem_alloc(device_handle, 1024, 0, + AMDGPU_GEM_DOMAIN_GDS, 0, + NULL, NULL); + r = gpu_mem_free(bo, NULL, 0, 4096); + CU_ASSERT_EQUAL(r, 0); + + /* Test GWS */ + bo = gpu_mem_alloc(device_handle, 1, 0, + AMDGPU_GEM_DOMAIN_GWS, 0, + NULL, NULL); + r = gpu_mem_free(bo, NULL, 0, 4096); + CU_ASSERT_EQUAL(r, 0); + + /* Test OA */ + bo = gpu_mem_alloc(device_handle, 1, 0, + AMDGPU_GEM_DOMAIN_OA, 0, + NULL, NULL); + r = gpu_mem_free(bo, NULL, 0, 4096); + CU_ASSERT_EQUAL(r, 0); } static void amdgpu_mem_fail_alloc(void) -- 2.14.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH libdrm 1/3] amdgpu: remove invalid check in amdgpu_bo_alloc
The heap is checked by the kernel and not libdrm, to make it even worse it prevented allocating resources other than VRAM and GTT. Signed-off-by: Christian König --- amdgpu/amdgpu_bo.c | 9 ++--- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c index 6a95929c..34904e38 100644 --- a/amdgpu/amdgpu_bo.c +++ b/amdgpu/amdgpu_bo.c @@ -74,19 +74,14 @@ int amdgpu_bo_alloc(amdgpu_device_handle dev, amdgpu_bo_handle *buf_handle) { union drm_amdgpu_gem_create args; - unsigned heap = alloc_buffer->preferred_heap; - int r = 0; - - /* It's an error if the heap is not specified */ - if (!(heap & (AMDGPU_GEM_DOMAIN_GTT | AMDGPU_GEM_DOMAIN_VRAM))) - return -EINVAL; + int r; memset(, 0, sizeof(args)); args.in.bo_size = alloc_buffer->alloc_size; args.in.alignment = alloc_buffer->phys_alignment; /* Set the placement. */ - args.in.domains = heap; + args.in.domains = alloc_buffer->preferred_heap; args.in.domain_flags = alloc_buffer->flags; /* Allocate the buffer with the preferred heap. */ -- 2.14.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 5/5] drm/amdgpu: fix shadow BO restoring
Am 13.09.2018 um 11:29 schrieb Zhang, Jerry(Junwei): On 09/11/2018 05:56 PM, Christian König wrote: Don't grab the reservation lock any more and simplify the handling quite a bit. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 109 - drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 46 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 8 +-- 3 files changed, 43 insertions(+), 120 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 5eba66ecf668..20bb702f5c7f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -2940,54 +2940,6 @@ static int amdgpu_device_ip_post_soft_reset(struct amdgpu_device *adev) return 0; } -/** - * amdgpu_device_recover_vram_from_shadow - restore shadowed VRAM buffers - * - * @adev: amdgpu_device pointer - * @ring: amdgpu_ring for the engine handling the buffer operations - * @bo: amdgpu_bo buffer whose shadow is being restored - * @fence: dma_fence associated with the operation - * - * Restores the VRAM buffer contents from the shadow in GTT. Used to - * restore things like GPUVM page tables after a GPU reset where - * the contents of VRAM might be lost. - * Returns 0 on success, negative error code on failure. 
- */ -static int amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev, - struct amdgpu_ring *ring, - struct amdgpu_bo *bo, - struct dma_fence **fence) -{ - uint32_t domain; - int r; - - if (!bo->shadow) - return 0; - - r = amdgpu_bo_reserve(bo, true); - if (r) - return r; - domain = amdgpu_mem_type_to_domain(bo->tbo.mem.mem_type); - /* if bo has been evicted, then no need to recover */ - if (domain == AMDGPU_GEM_DOMAIN_VRAM) { - r = amdgpu_bo_validate(bo->shadow); - if (r) { - DRM_ERROR("bo validate failed!\n"); - goto err; - } - - r = amdgpu_bo_restore_from_shadow(adev, ring, bo, - NULL, fence, true); - if (r) { - DRM_ERROR("recover page table failed!\n"); - goto err; - } - } -err: - amdgpu_bo_unreserve(bo); - return r; -} - /** * amdgpu_device_recover_vram - Recover some VRAM contents * @@ -2996,16 +2948,15 @@ static int amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev, * Restores the contents of VRAM buffers from the shadows in GTT. Used to * restore things like GPUVM page tables after a GPU reset where * the contents of VRAM might be lost. - * Returns 0 on success, 1 on failure. + * + * Returns: + * 0 on success, negative error code on failure. 
*/ static int amdgpu_device_recover_vram(struct amdgpu_device *adev) { - struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring; - struct amdgpu_bo *bo, *tmp; struct dma_fence *fence = NULL, *next = NULL; - long r = 1; - int i = 0; - long tmo; + struct amdgpu_bo *shadow; + long r = 1, tmo; if (amdgpu_sriov_runtime(adev)) tmo = msecs_to_jiffies(8000); @@ -3014,44 +2965,40 @@ static int amdgpu_device_recover_vram(struct amdgpu_device *adev) DRM_INFO("recover vram bo from shadow start\n"); mutex_lock(>shadow_list_lock); - list_for_each_entry_safe(bo, tmp, >shadow_list, shadow_list) { - next = NULL; - amdgpu_device_recover_vram_from_shadow(adev, ring, bo, ); + list_for_each_entry(shadow, >shadow_list, shadow_list) { + + /* No need to recover an evicted BO */ + if (shadow->tbo.mem.mem_type != TTM_PL_TT || + shadow->parent->tbo.mem.mem_type != TTM_PL_VRAM) is there a change that shadow bo evicted to other domain? like SYSTEM? Yes, that's why I test "!= TTM_PL_TT" here. What can happen is that either the shadow or the page table or page directory is evicted. But in this case we don't need to restore anything because of patch #1 in this series. Regards, Christian. Regards, Jerry + continue; + + r = amdgpu_bo_restore_shadow(shadow, ); + if (r) + break; + if (fence) { r = dma_fence_wait_timeout(fence, false, tmo); - if (r == 0) - pr_err("wait fence %p[%d] timeout\n", fence, i); - else if (r < 0) - pr_err("wait fence %p[%d] interrupted\n", fence, i); - if (r < 1) { - dma_fence_put(fence); - fence = next; + dma_fence_put(fence); + fence = next; + if (r <= 0) break; - } - i++; + } else { + fence = next; } - - dma_fence_put(fence); - fence = next; } mutex_unlock(>shadow_list_lock); - if (fence) { - r = dma_fence_wait_timeout(fence,
kfdtest failures for amdkfd (amd-staging-drm-next)
Hi!

I am trying to use amd-staging-drm-next to work with amdkfd (built into amdgpu) for the AMD Instinct MI25 device.

As a first step I compiled libhsakmt 1.8.x and tried to run kfdtest, but it produces lots of failures (see below). Here are the results:

...
[==] 76 tests from 14 test cases ran. (80250 ms total)
[ PASSED ] 39 tests.
[ FAILED ] 37 tests, listed below:
[ FAILED ] KFDEvictTest.QueueTest
[ FAILED ] KFDGraphicsInterop.RegisterGraphicsHandle
[ FAILED ] KFDIPCTest.BasicTest
[ FAILED ] KFDIPCTest.CrossMemoryAttachTest
[ FAILED ] KFDIPCTest.CMABasicTest
[ FAILED ] KFDLocalMemoryTest.BasicTest
[ FAILED ] KFDLocalMemoryTest.VerifyContentsAfterUnmapAndMap
[ FAILED ] KFDLocalMemoryTest.CheckZeroInitializationVram
[ FAILED ] KFDMemoryTest.MapUnmapToNodes
[ FAILED ] KFDMemoryTest.MemoryRegisterSamePtr
[ FAILED ] KFDMemoryTest.FlatScratchAccess
[ FAILED ] KFDMemoryTest.MMBench
[ FAILED ] KFDMemoryTest.QueryPointerInfo
[ FAILED ] KFDMemoryTest.PtraceAccessInvisibleVram
[ FAILED ] KFDMemoryTest.SignalHandling
[ FAILED ] KFDQMTest.CreateCpQueue
[ FAILED ] KFDQMTest.CreateMultipleSdmaQueues
[ FAILED ] KFDQMTest.SdmaConcurrentCopies
[ FAILED ] KFDQMTest.CreateMultipleCpQueues
[ FAILED ] KFDQMTest.DisableSdmaQueueByUpdateWithNullAddress
[ FAILED ] KFDQMTest.DisableCpQueueByUpdateWithZeroPercentage
[ FAILED ] KFDQMTest.OverSubscribeCpQueues
[ FAILED ] KFDQMTest.BasicCuMaskingEven
[ FAILED ] KFDQMTest.QueuePriorityOnDifferentPipe
[ FAILED ] KFDQMTest.QueuePriorityOnSamePipe
[ FAILED ] KFDQMTest.EmptyDispatch
[ FAILED ] KFDQMTest.SimpleWriteDispatch
[ FAILED ] KFDQMTest.MultipleCpQueuesStressDispatch
[ FAILED ] KFDQMTest.CpuWriteCoherence
[ FAILED ] KFDQMTest.CreateAqlCpQueue
[ FAILED ] KFDQMTest.QueueLatency
[ FAILED ] KFDQMTest.CpQueueWraparound
[ FAILED ] KFDQMTest.SdmaQueueWraparound
[ FAILED ] KFDQMTest.Atomics
[ FAILED ] KFDQMTest.P2PTest
[ FAILED ] KFDQMTest.SdmaEventInterrupt
[ FAILED ] KFDTopologyTest.BasicTest

Does it mean that the current amdkfd from the kernel can't be used with libhsakmt 1.8.x? Or am I doing something wrong...

Thank you!

Best,
Alexander

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 1/2] drm/amdgpu: fix mask in GART location calculation
We need to mask the lower bits, not the upper ones.

Fixes: ec210e3226dc0 ("drm/amdgpu: put GART away from VRAM v2")
Signed-off-by: Christian König
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index ae4467113240..9a5b252784a1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -166,7 +166,7 @@ void amdgpu_gmc_gart_location(struct amdgpu_device *adev, struct amdgpu_gmc *mc)
 	else
 		mc->gart_start = mc->mc_mask - mc->gart_size + 1;
 
-	mc->gart_start &= four_gb - 1;
+	mc->gart_start &= ~(four_gb - 1);
 	mc->gart_end = mc->gart_start + mc->gart_size - 1;
 	dev_info(adev->dev, "GART: %lluM 0x%016llX - 0x%016llX\n",
 		 mc->gart_size >> 20, mc->gart_start, mc->gart_end);
-- 
2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 2/2] drm/amdgpu: revert "stop using gart_start as offset for the GTT domain"
Turned out the commit is incomplete, and since we removed using the AGP mapping from the GTT manager it is also not necessary any more.

This reverts commit 22d8bfafcc12dfa17b91d2e8ae4e1898e782003a.

Signed-off-by: Christian König
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 3 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c     | 7 +++----
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index c2539f6821c0..da7b1b92d9cf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -143,8 +143,7 @@ static int amdgpu_gtt_mgr_alloc(struct ttm_mem_type_manager *man,
 	spin_unlock(&mgr->lock);
 
 	if (!r)
-		mem->start = node->node.start +
-			(adev->gmc.gart_start >> PAGE_SHIFT);
+		mem->start = node->node.start;
 
 	return r;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 8a158ee922f7..f12ae6b525b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -188,7 +188,7 @@ static int amdgpu_init_mem_type(struct ttm_bo_device *bdev, uint32_t type,
 	case TTM_PL_TT:
 		/* GTT memory */
 		man->func = &amdgpu_gtt_mgr_func;
-		man->gpu_offset = 0;
+		man->gpu_offset = adev->gmc.gart_start;
 		man->available_caching = TTM_PL_MASK_CACHING;
 		man->default_caching = TTM_PL_FLAG_CACHED;
 		man->flags = TTM_MEMTYPE_FLAG_MAPPABLE | TTM_MEMTYPE_FLAG_CMA;
@@ -1060,7 +1060,7 @@ static int amdgpu_ttm_backend_bind(struct ttm_tt *ttm,
 	flags = amdgpu_ttm_tt_pte_flags(adev, ttm, bo_mem);
 
 	/* bind pages into GART page tables */
-	gtt->offset = ((u64)bo_mem->start << PAGE_SHIFT) - adev->gmc.gart_start;
+	gtt->offset = (u64)bo_mem->start << PAGE_SHIFT;
 	r = amdgpu_gart_bind(adev, gtt->offset, ttm->num_pages,
 		ttm->pages, gtt->ttm.dma_address, flags);
@@ -1112,8 +1112,7 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
 	flags = amdgpu_ttm_tt_pte_flags(adev, bo->ttm, &tmp);
 
 	/* Bind pages */
-	gtt->offset = ((u64)tmp.start << PAGE_SHIFT) -
-		adev->gmc.gart_start;
+	gtt->offset = (u64)tmp.start << PAGE_SHIFT;
 	r = amdgpu_ttm_gart_bind(adev, bo, flags);
 	if (unlikely(r)) {
 		ttm_bo_mem_put(bo, &tmp);
-- 
2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] [RFC]drm: add syncobj timeline support v5
Am 14.09.2018 um 12:37 schrieb Chunming Zhou:

This patch is for the VK_KHR_timeline_semaphore extension; a semaphore is called a syncobj on the kernel side. This extension introduces a new type of syncobj that has an integer payload identifying a point in a timeline. Such timeline syncobjs support the following operations:
* CPU query - a host operation that allows querying the payload of the timeline syncobj.
* CPU wait - a host operation that allows a blocking wait for a timeline syncobj to reach a specified value.
* Device wait - a device operation that allows waiting for a timeline syncobj to reach a specified value.
* Device signal - a device operation that allows advancing the timeline syncobj to a specified value.

Since it's a timeline, an earlier time point (PT) is always signaled before a later PT.

a. Signal PT design: signal PT fence N depends on the PT[N-1] fence and the signal operation fence; when the PT[N] fence is signaled, the timeline increases to the value of PT[N].
b. Wait PT design: a wait PT fence is signaled by the timeline reaching its point value. When the timeline increases, wait PT values are compared with the new timeline value; if a PT value is lower than the timeline value, that wait PT is signaled, otherwise it stays in the list. The syncobj wait operation can wait on any point of the timeline, so an RB tree is needed to order them. A wait PT can also be submitted ahead of its signal PT, so a submission fence is needed to handle that.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate patch. (Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate patch.
5. drop the submission_fence implementation and instead use wait_event() for that. (Christian)
6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with timeline implementation. (Vetter and Christian)
a.
normal syncobj signal op will create a signal PT to tail of signal pt list. b. normal syncobj wait op will create a wait pt with last signal point, and this wait PT is only signaled by related signal point PT. 2. many bug fix and clean up 3. stub fence moving is moved to other patch. v4: 1. fix RB tree loop with while(node=rb_first(...)). (Christian) 2. fix syncobj lifecycle. (Christian) 3. only enable_signaling when there is wait_pt. (Christian) 4. fix timeline path issues. 5. write a timeline test in libdrm v5: (Christian) 1. semaphore is called syncobj in kernel side. 2. don't need 'timeline' characters in some function name. 3. keep syncobj cb normal syncobj is tested by ./deqp-vk -n dEQP-VK*semaphore* timeline syncobj is tested by ./amdgpu_test -s 9 Signed-off-by: Chunming Zhou Cc: Christian Konig Cc: Dave Airlie Cc: Daniel Rakos Cc: Daniel Vetter At least on first glance that looks like it should work, going to do a detailed review on Monday. Christian. --- drivers/gpu/drm/drm_syncobj.c | 294 ++--- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 4 +- include/drm/drm_syncobj.h | 62 +++-- include/uapi/drm/drm.h | 1 + 4 files changed, 292 insertions(+), 69 deletions(-) diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index e9ce623d049e..e78d076f2703 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -56,6 +56,9 @@either #include "drm_internal.h" #include +/* merge normal syncobj to timeline syncobj, the point interval is 1 */ +#define DRM_SYNCOBJ_NORMAL_POINT 1 + struct drm_syncobj_stub_fence { struct dma_fence base; spinlock_t lock; @@ -82,6 +85,11 @@ static const struct dma_fence_ops drm_syncobj_stub_fence_ops = { .release = drm_syncobj_stub_fence_release, }; +struct drm_syncobj_signal_pt { + struct dma_fence_array *base; + u64value; + struct list_head list; +}; /** * drm_syncobj_find - lookup and reference a sync object. 
@@ -124,7 +132,7 @@ static int drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj, { int ret; - *fence = drm_syncobj_fence_get(syncobj); + ret = drm_syncobj_search_fence(syncobj, 0, 0, fence); if (*fence) return 1; @@ -133,10 +141,10 @@ static int drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj, * have the lock, try one more time just to be sure we don't add a * callback when a fence has already been set. */ - if (syncobj->fence) { - *fence = dma_fence_get(rcu_dereference_protected(syncobj->fence, - lockdep_is_held(>lock))); - ret = 1; + if (fence) { + drm_syncobj_search_fence(syncobj, 0, 0, fence); + if (*fence) + ret = 1; }
Re: [PATCH] drm/ttm: once more fix ttm_bo_bulk_move_lru_tail
Am 14.09.2018 um 11:22 schrieb Michel Dänzer: On 2018-09-14 10:22 a.m., Huang Rui wrote: On Thu, Sep 13, 2018 at 07:32:24PM +0800, Christian König wrote: Am 13.09.2018 um 10:31 schrieb Huang Rui: On Wed, Sep 12, 2018 at 09:23:55PM +0200, Christian König wrote: While cutting the lists we sometimes accidentally added a list_head from the stack to the LRUs, effectively corrupting the list. Remove the list cutting and use explicit list manipulation instead. This patch actually fixes the corruption bug. Was it a defect of list_cut_position or list_splice handlers? We somehow did something illegal with list_cut_position. I haven't narrowed it down till the end, but we ended up with list_heads from the stack to the lru. I am confused, in theory, even we do any manipulation with list helper, it should not trigger the list corruption. The usage of those helpers should ensure the list operation safely... There's nothing the helpers can do about being passed in pointers to stack memory. It's a bug in the code using the helpers. Actually I'm not 100% sure of that. To me it looks like we hit a corner case list_cut_position doesn't support. Or we indeed had a logic error in how we called it, anyway the explicit implementation only uses 6 assignments and so is much easier to handle. Christian. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] [RFC]drm: add syncobj timeline support v5
This patch is for the VK_KHR_timeline_semaphore extension; a semaphore is called a syncobj on the kernel side: This extension introduces a new type of syncobj that has an integer payload identifying a point in a timeline. Such timeline syncobjs support the following operations: * CPU query - A host operation that allows querying the payload of the timeline syncobj. * CPU wait - A host operation that allows a blocking wait for a timeline syncobj to reach a specified value. * Device wait - A device operation that allows waiting for a timeline syncobj to reach a specified value. * Device signal - A device operation that allows advancing the timeline syncobj to a specified value. Since it's a timeline, the earlier time point (PT) is always signaled before a later PT. a. signal PT design: Signal PT fence N depends on the PT[N-1] fence and the signal operation fence; when the PT[N] fence is signaled, the timeline advances to the value of PT[N]. b. wait PT design: A wait PT fence is signaled when the timeline reaches its point value; whenever the timeline increases, the wait PTs' values are compared with the new timeline value: if a PT value is lower than the timeline value, that wait PT is signaled, otherwise it stays in the list. A syncobj wait operation can wait on any point of the timeline, so an RB tree is needed to order them. And a wait PT can be ahead of a signal PT, so we need a submission fence to handle that. v2: 1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian) 2. move unexposed definitions to .c file. (Daniel Vetter) 3. split up the change to drm_syncobj_find_fence() in a separate patch. (Christian) 4. split up the change to drm_syncobj_replace_fence() in a separate patch. 5. drop the submission_fence implementation and instead use wait_event() for that. (Christian) 6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter) v3: 1. replace normal syncobj with timeline implementation. (Vetter and Christian) a. normal syncobj signal op will create a signal PT at the tail of the signal pt list. b.
normal syncobj wait op will create a wait pt with the last signal point, and this wait PT is only signaled by the related signal point PT. 2. many bug fixes and cleanups 3. stub fence moving is moved to another patch. v4: 1. fix RB tree loop with while(node=rb_first(...)). (Christian) 2. fix syncobj lifecycle. (Christian) 3. only enable_signaling when there is a wait_pt. (Christian) 4. fix timeline path issues. 5. write a timeline test in libdrm v5: (Christian) 1. semaphore is called syncobj in kernel side. 2. don't need 'timeline' characters in some function names. 3. keep syncobj cb. The normal syncobj path is tested by ./deqp-vk -n dEQP-VK*semaphore*; the timeline syncobj is tested by ./amdgpu_test -s 9. Signed-off-by: Chunming Zhou Cc: Christian König Cc: Dave Airlie Cc: Daniel Rakos Cc: Daniel Vetter --- drivers/gpu/drm/drm_syncobj.c | 294 ++--- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 4 +- include/drm/drm_syncobj.h | 62 +++-- include/uapi/drm/drm.h | 1 + 4 files changed, 292 insertions(+), 69 deletions(-) diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index e9ce623d049e..e78d076f2703 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -56,6 +56,9 @@ #include "drm_internal.h" #include +/* merge normal syncobj to timeline syncobj, the point interval is 1 */ +#define DRM_SYNCOBJ_NORMAL_POINT 1 + struct drm_syncobj_stub_fence { struct dma_fence base; spinlock_t lock; @@ -82,6 +85,11 @@ static const struct dma_fence_ops drm_syncobj_stub_fence_ops = { .release = drm_syncobj_stub_fence_release, }; +struct drm_syncobj_signal_pt { + struct dma_fence_array *base; + u64 value; + struct list_head list; +}; /** * drm_syncobj_find - lookup and reference a sync object.
@@ -124,7 +132,7 @@ static int drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj, { int ret; - *fence = drm_syncobj_fence_get(syncobj); + ret = drm_syncobj_search_fence(syncobj, 0, 0, fence); if (*fence) return 1; @@ -133,10 +141,10 @@ static int drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj, * have the lock, try one more time just to be sure we don't add a * callback when a fence has already been set. */ - if (syncobj->fence) { - *fence = dma_fence_get(rcu_dereference_protected(syncobj->fence, - lockdep_is_held(&syncobj->lock))); - ret = 1; + if (fence) { + drm_syncobj_search_fence(syncobj, 0, 0, fence); + if (*fence) + ret = 1; } else { *fence = NULL; drm_syncobj_add_callback_locked(syncobj, cb, func); @@ -164,6 +172,151 @@ void drm_syncobj_remove_callback(struct drm_syncobj *syncobj,
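The signal/wait point semantics described in the commit message can be modeled in a few lines of C. The sketch below illustrates only the timeline invariant (a wait on point P is satisfied once the timeline value reaches P, and the timeline only moves forward); the struct and function names are made up for illustration and are not the kernel's drm_syncobj API:

```c
#include <stdint.h>
#include <stdbool.h>

/* Toy model of a timeline syncobj: just the last signaled point. */
struct timeline {
    uint64_t value;
};

/* Signal operation: advance the timeline to @point.  Points must be
 * signaled in order, so moving backwards is rejected. */
bool timeline_signal(struct timeline *t, uint64_t point)
{
    if (point <= t->value)
        return false;   /* timeline never moves backwards */
    t->value = point;
    return true;
}

/* Wait predicate: a wait PT on @point is signaled once value >= point.
 * Wait PTs with a larger point than the current value stay pending,
 * matching the "otherwise keep in list" behavior described above. */
bool timeline_wait_done(const struct timeline *t, uint64_t point)
{
    return t->value >= point;
}
```

In the real implementation the pending waits are kept ordered in an RB tree so that each timeline advance only has to examine the smallest outstanding wait points; the model above captures only the satisfaction condition.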
Re: [PATCH] drm/ttm: once more fix ttm_bo_bulk_move_lru_tail
On 2018-09-14 10:22 a.m., Huang Rui wrote: > On Thu, Sep 13, 2018 at 07:32:24PM +0800, Christian König wrote: >> Am 13.09.2018 um 10:31 schrieb Huang Rui: >>> On Wed, Sep 12, 2018 at 09:23:55PM +0200, Christian König wrote: While cutting the lists we sometimes accidentally added a list_head from the stack to the LRUs, effectively corrupting the list. Remove the list cutting and use explicit list manipulation instead. >>> This patch actually fixes the corruption bug. Was it a defect of >>> list_cut_position or list_splice handlers? >> >> We somehow did something illegal with list_cut_position. I haven't >> narrowed it down till the end, but we ended up with list_heads from the >> stack to the lru. > > I am confused, in theory, even we do any manipulation with list helper, it > should not trigger the list corruption. The usage of those helpers should > ensure the list operation safely... There's nothing the helpers can do about being passed in pointers to stack memory. It's a bug in the code using the helpers. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer
Re: [PATCH libdrm] tests/amdgpu: add unaligned VM test
On 09/13/2018 08:20 PM, Christian König wrote: Am 11.09.2018 um 04:06 schrieb Zhang, Jerry (Junwei): On 09/10/2018 05:33 PM, Christian König wrote: Am 10.09.2018 um 04:44 schrieb Zhang, Jerry (Junwei): On 09/10/2018 02:04 AM, Christian König wrote: Make a VM mapping which is as unaligned as possible. Is it going to test unaligned address between BO allocation and BO mapping and skip huge page mapping? Yes and no. Huge page handling works by mapping at least 2MB of continuous memory on a 2MB aligned address. What I do here is I allocate 4GB of VRAM and try to map it to an address which is aligned to 1GB + 4KB. In other words the VM subsystem will add a single PTE to align the entry to 8KB, then it add two PTEs to align it to 16KB, then four to get to 32KB and so on until we have the maximum alignment of 2GB which Vega/Raven support in the L1. Thanks to explain that. From the trace log, it will map 1*4KB, 2*4KB, ..., 256*4KB, then back to 1*4KB. amdgpu_test-1384 [005] 110.634466: amdgpu_vm_bo_update: soffs=11, eoffs=1f, flags=70 amdgpu_test-1384 [005] 110.634467: amdgpu_vm_set_ptes: pe=f5feffd008, addr=01fec0, incr=4096, flags=71, count=1 amdgpu_test-1384 [005] 110.634468: amdgpu_vm_set_ptes: pe=f5feffd010, addr=01fec01000, incr=4096, flags=f1, count=2 amdgpu_test-1384 [005] 110.634468: amdgpu_vm_set_ptes: pe=f5feffd020, addr=01fec03000, incr=4096, flags=171, count=4 amdgpu_test-1384 [005] 110.634468: amdgpu_vm_set_ptes: pe=f5feffd040, addr=01fec07000, incr=4096, flags=1f1, count=8 amdgpu_test-1384 [005] 110.634468: amdgpu_vm_set_ptes: pe=f5feffd080, addr=01fec0f000, incr=4096, flags=271, count=16 amdgpu_test-1384 [005] 110.634468: amdgpu_vm_set_ptes: pe=f5feffd100, addr=01fec1f000, incr=4096, flags=2f1, count=32 amdgpu_test-1384 [005] 110.634469: amdgpu_vm_set_ptes: pe=f5feffd200, addr=01fec3f000, incr=4096, flags=371, count=64 amdgpu_test-1384 [005] 110.634469: amdgpu_vm_set_ptes: pe=f5feffd400, addr=01fec7f000, incr=4096, flags=3f1, count=128 amdgpu_test-1384 
[005] 110.634469: amdgpu_vm_set_ptes: pe=f5feffd800, addr=01fecff000, incr=4096, flags=471, count=256 amdgpu_test-1384 [005] 110.634469: amdgpu_vm_set_ptes: pe=f5feffc000, addr=01fedff000, incr=4096, flags=71, count=1 amdgpu_test-1384 [005] 110.634470: amdgpu_vm_set_ptes: pe=f5feffc008, addr=01fea0, incr=4096, flags=71, count=1 amdgpu_test-1384 [005] 110.634470: amdgpu_vm_set_ptes: pe=f5feffc010, addr=01fea01000, incr=4096, flags=f1, count=2 Yes, that it is exactly the expected result with the old code. And it sounds like a performance test for Vega and later. If so, shall we add some time stamp in the log? Well I used it as performance test, but the resulting numbers are not very comparable. It is useful to push to libdrm because it also exercises the VM code and makes sure that the code doesn't crash on corner cases. Thanks for your info. That's fine for me. Reviewed-by: Junwei Zhang BTW, still think adding a print here is a good choice. + /* Don't let the test fail if the device doesn't have enough VRAM */ + if (r) + return; Regards, Jerry Regards, Christian. Regards, Jerry Regards, Christian. 
Signed-off-by: Christian König --- tests/amdgpu/vm_tests.c | 45 - 1 file changed, 44 insertions(+), 1 deletion(-) diff --git a/tests/amdgpu/vm_tests.c b/tests/amdgpu/vm_tests.c index 7b6dc5d6..fada2987 100644 --- a/tests/amdgpu/vm_tests.c +++ b/tests/amdgpu/vm_tests.c @@ -31,8 +31,8 @@ static amdgpu_device_handle device_handle; static uint32_t major_version; static uint32_t minor_version; - static void amdgpu_vmid_reserve_test(void); +static void amdgpu_vm_unaligned_map(void); CU_BOOL suite_vm_tests_enable(void) { @@ -84,6 +84,7 @@ int suite_vm_tests_clean(void) CU_TestInfo vm_tests[] = { { "resere vmid test", amdgpu_vmid_reserve_test }, + { "unaligned map", amdgpu_vm_unaligned_map }, CU_TEST_INFO_NULL, }; @@ -167,3 +168,45 @@ static void amdgpu_vmid_reserve_test(void) r = amdgpu_cs_ctx_free(context_handle); CU_ASSERT_EQUAL(r, 0); } + +static void amdgpu_vm_unaligned_map(void) +{ + const uint64_t map_size = (4ULL << 30) - (2 << 12); + struct amdgpu_bo_alloc_request request = {}; + amdgpu_bo_handle buf_handle; + amdgpu_va_handle handle; + uint64_t vmc_addr; + int r; + + request.alloc_size = 4ULL << 30; + request.phys_alignment = 4096; + request.preferred_heap = AMDGPU_GEM_DOMAIN_VRAM; + request.flags = AMDGPU_GEM_CREATE_NO_CPU_ACCESS; + + r = amdgpu_bo_alloc(device_handle, &request, &buf_handle); + /* Don't let the test fail if the device doesn't have enough VRAM */ We may print some info to the console here. Regards, Jerry + if (r) + return; + + r =
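The doubling run lengths visible in the trace above (count=1, 2, 4, ... 256) fall out of alignment arithmetic: each write covers the largest power-of-two block that is naturally aligned at the current page. Below is a small userspace model of that climb (my own helper, not libdrm or kernel code; it reproduces the initial growth pattern but not the restart the kernel does at page-table boundaries):

```c
#include <stdint.h>

/* Fill runs[] with the page counts used to map num_pages 4KB pages
 * starting at page start_page, and return the number of runs.  Each run
 * is the largest power-of-two block naturally aligned at the current
 * position, capped at max_run pages (a stand-in for the largest
 * fragment the hardware supports). */
int unaligned_run_lengths(uint64_t start_page, uint64_t num_pages,
                          uint64_t max_run, uint64_t *runs, int max_runs)
{
    int n = 0;
    uint64_t p = start_page;

    while (num_pages && n < max_runs) {
        /* lowest set bit == natural alignment of p (max_run if p == 0) */
        uint64_t run = p ? (p & (~p + 1)) : max_run;

        if (run > max_run)
            run = max_run;
        if (run > num_pages)
            run = num_pages;

        runs[n++] = run;
        p += run;
        num_pages -= run;
    }
    return n;
}
```

Starting one page past an aligned address, this yields runs of 1, 2, 4, ..., each run doubling the alignment available to the next write, which is the behavior Christian describes for the VM subsystem.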
RE: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4
> -Original Message- > From: Koenig, Christian > Sent: Friday, September 14, 2018 3:27 PM > To: Zhou, David(ChunMing) ; Zhou, > David(ChunMing) ; dri- > de...@lists.freedesktop.org > Cc: Dave Airlie ; Rakos, Daniel > ; amd-gfx@lists.freedesktop.org; Daniel Vetter > > Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4 > > Am 14.09.2018 um 05:59 schrieb zhoucm1: > > > > > > On 2018年09月14日 11:14, zhoucm1 wrote: > >> > >> > >> On 2018年09月13日 18:22, Christian König wrote: > >>> Am 13.09.2018 um 11:35 schrieb Zhou, David(ChunMing): > > > -Original Message- > > From: Koenig, Christian > > Sent: Thursday, September 13, 2018 5:20 PM > > To: Zhou, David(ChunMing) ; dri- > > de...@lists.freedesktop.org > > Cc: Dave Airlie ; Rakos, Daniel > > ; amd-gfx@lists.freedesktop.org > > Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4 > > > > Am 13.09.2018 um 11:11 schrieb Zhou, David(ChunMing): > >>> -Original Message- > >>> From: Christian König > >>> Sent: Thursday, September 13, 2018 4:50 PM > >>> To: Zhou, David(ChunMing) ; Koenig, > >>> Christian ; > >>> dri-de...@lists.freedesktop.org > >>> Cc: Dave Airlie ; Rakos, Daniel > >>> ; amd-gfx@lists.freedesktop.org > >>> Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support > >>> v4 > >>> > >>> Am 13.09.2018 um 09:43 schrieb Zhou, David(ChunMing): > > -Original Message- > > From: Koenig, Christian > > Sent: Thursday, September 13, 2018 2:56 PM > > To: Zhou, David(ChunMing) ; Zhou, > > David(ChunMing) ; dri- > > de...@lists.freedesktop.org > > Cc: Dave Airlie ; Rakos, Daniel > > ; amd-gfx@lists.freedesktop.org > > Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline > > support v4 > > > > Am 13.09.2018 um 04:15 schrieb zhoucm1: > >> On 2018年09月12日 19:05, Christian König wrote: > >>> [SNIP] > >>> +static void > >>> +drm_syncobj_find_signal_pt_for_wait_pt(struct > >>> drm_syncobj *syncobj, > >>> + struct drm_syncobj_wait_pt > >>> +*wait_pt) { > >> That whole approach still looks horrible 
complicated to me. > It's already very close to what you said before. > > >> Especially the separation of signal and wait pt is > >> completely unnecessary as far as I can see. > >> When a wait pt is requested we just need to search for > >> the signal point which it will trigger. > Yeah, I tried this, but when I implement cpu wait ioctl on > specific point, we need a advanced wait pt fence, > otherwise, we could still need old syncobj cb. > >>> Why? I mean you just need to call drm_syncobj_find_fence() > >>> and > >>> when > >>> that one returns NULL you use wait_event_*() to wait for a > >>> signal point >= your wait point to appear and try again. > >> e.g. when there are 3 syncobjs(A,B,C) to wait, all syncobjABC > >> have no fence yet, as you said, during > >> drm_syncobj_find_fence(A) is working on wait_event, > syncobjB > >> and syncobjC could already be signaled, then we don't know > >> which one is first signaled, which is need when wait ioctl > >> returns. > > I don't really see a problem with that. When you wait for the > > first one you need to wait for A,B,C at the same time anyway. > > > > So what you do is to register a fence callback on the fences > > you already have and for the syncobj which doesn't yet have a > > fence you make sure that they wake up your thread when they > > get one. > > > > So essentially exactly what > > drm_syncobj_fence_get_or_add_callback() > > already does today. > So do you mean we need still use old syncobj CB for that? > >>> Yes, as far as I can see it should work. > >>> > Advanced wait pt is bad? > >>> Well it isn't bad, I just don't see any advantage in it. > >> The advantage is to replace old syncobj cb. > >> > >>> The existing mechanism > >>> should already be able to handle that. 
> >> I thought more a bit, we don't that mechanism at all, if use > >> advanced wait > > pt, we can easily use fence array to achieve it for wait ioctl, we > > should use kernel existing feature as much as possible, not invent > > another, shouldn't we? > > I remember you said it before. > > > > Yeah, but the syncobj cb is an existing feature. > This is obviously a workaround when doing for wait ioctl, Do you > see it used in other place? > > > And I absolutely don't see a > > need to modify that and
Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4
Am 14.09.2018 um 05:59 schrieb zhoucm1: On 2018年09月14日 11:14, zhoucm1 wrote: On 2018年09月13日 18:22, Christian König wrote: Am 13.09.2018 um 11:35 schrieb Zhou, David(ChunMing): -Original Message- From: Koenig, Christian Sent: Thursday, September 13, 2018 5:20 PM To: Zhou, David(ChunMing) ; dri- de...@lists.freedesktop.org Cc: Dave Airlie ; Rakos, Daniel ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4 Am 13.09.2018 um 11:11 schrieb Zhou, David(ChunMing): -Original Message- From: Christian König Sent: Thursday, September 13, 2018 4:50 PM To: Zhou, David(ChunMing) ; Koenig, Christian ; dri-de...@lists.freedesktop.org Cc: Dave Airlie ; Rakos, Daniel ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4 Am 13.09.2018 um 09:43 schrieb Zhou, David(ChunMing): -Original Message- From: Koenig, Christian Sent: Thursday, September 13, 2018 2:56 PM To: Zhou, David(ChunMing) ; Zhou, David(ChunMing) ; dri- de...@lists.freedesktop.org Cc: Dave Airlie ; Rakos, Daniel ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4 Am 13.09.2018 um 04:15 schrieb zhoucm1: On 2018年09月12日 19:05, Christian König wrote: [SNIP] +static void drm_syncobj_find_signal_pt_for_wait_pt(struct drm_syncobj *syncobj, + struct drm_syncobj_wait_pt +*wait_pt) { That whole approach still looks horrible complicated to me. It's already very close to what you said before. Especially the separation of signal and wait pt is completely unnecessary as far as I can see. When a wait pt is requested we just need to search for the signal point which it will trigger. Yeah, I tried this, but when I implement cpu wait ioctl on specific point, we need a advanced wait pt fence, otherwise, we could still need old syncobj cb. Why? 
I mean you just need to call drm_syncobj_find_fence() and when that one returns NULL you use wait_event_*() to wait for a signal point >= your wait point to appear and try again. e.g. when there are 3 syncobjs(A,B,C) to wait, all syncobjABC have no fence yet, as you said, during drm_syncobj_find_fence(A) is working on wait_event, syncobjB and syncobjC could already be signaled, then we don't know which one is first signaled, which is need when wait ioctl returns. I don't really see a problem with that. When you wait for the first one you need to wait for A,B,C at the same time anyway. So what you do is to register a fence callback on the fences you already have and for the syncobj which doesn't yet have a fence you make sure that they wake up your thread when they get one. So essentially exactly what drm_syncobj_fence_get_or_add_callback() already does today. So do you mean we need still use old syncobj CB for that? Yes, as far as I can see it should work. Advanced wait pt is bad? Well it isn't bad, I just don't see any advantage in it. The advantage is to replace old syncobj cb. The existing mechanism should already be able to handle that. I thought more a bit, we don't that mechanism at all, if use advanced wait pt, we can easily use fence array to achieve it for wait ioctl, we should use kernel existing feature as much as possible, not invent another, shouldn't we? I remember you said it before. Yeah, but the syncobj cb is an existing feature. This is obviously a workaround when doing for wait ioctl, Do you see it used in other place? And I absolutely don't see a need to modify that and replace it with something far more complex. The wait ioctl is simplified much more by fence array, not complex, and we just need to allocate a wait pt. If keeping old syncobj cb workaround, all wait pt logic still is there, just save allocation and wait pt handling, in fact, which part isn't complex at all. But compare with ugly syncobj cb, which is simpler. 
I strongly disagree on that. You just need to extend the syncobj cb with the sequence number and you are done. We could clean that up in the long term by adding some wait_multi event macro, but for now just adding the sequence number should do the trick. Quote from Daniel Vetter comment when v1, " Specifically for this stuff here having unified future fence semantics will allow drivers to do clever stuff with them. " I think the advanced wait pt is a similar concept as 'future fence' what Daniel Vetter said before, which obviously a right direction. Anyway, I will change the patch as you like if no other comment, so that the patch can pass soon. When I try to remove wait pt future fence, I encounter another problem, drm_syncobj_find_fence cannot get a fence if signal pt already is collected as garbage, then CS will report error, any idea for that? Well when the signal pt is already garbage collected you know that it is already signaled. So you can just return a dummy fence. I actually thought that this was the intention of
Re: [PATCH] drm/ttm: once more fix ttm_bo_bulk_move_lru_tail
On Thu, Sep 13, 2018 at 07:32:24PM +0800, Christian König wrote: > Am 13.09.2018 um 10:31 schrieb Huang Rui: > > On Wed, Sep 12, 2018 at 09:23:55PM +0200, Christian König wrote: > >> While cutting the lists we sometimes accidentally added a list_head from > >> the stack to the LRUs, effectively corrupting the list. > >> > >> Remove the list cutting and use explicit list manipulation instead. > > This patch actually fixes the corruption bug. Was it a defect of > > list_cut_position or list_splice handlers? > > We somehow did something illegal with list_cut_position. I haven't > narrowed it down till the end, but we ended up with list_heads from the > stack to the lru. > I am confused, in theory, even we do any manipulation with list helper, it should not trigger the list corruption. The usage of those helpers should ensure the list operation safely... > Anyway adding a specialized list bulk move function is much simpler and > avoids the issue. > > I've just split that up as Michel suggested and send it out to the > mailing lists, please review that version once more. > Sure, already reviewed. > Thanks, > Christian. 
> > > > > Reviewed-and-Tested: Huang Rui > > > >> Signed-off-by: Christian König > >> --- > >> drivers/gpu/drm/ttm/ttm_bo.c | 51 > >> ++-- > >> 1 file changed, 30 insertions(+), 21 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c > >> index 138c98902033..b2a33bf1ef10 100644 > >> --- a/drivers/gpu/drm/ttm/ttm_bo.c > >> +++ b/drivers/gpu/drm/ttm/ttm_bo.c > >> @@ -247,23 +247,18 @@ void ttm_bo_move_to_lru_tail(struct > >> ttm_buffer_object *bo, > >> } > >> EXPORT_SYMBOL(ttm_bo_move_to_lru_tail); > >> > >> -static void ttm_bo_bulk_move_helper(struct ttm_lru_bulk_move_pos *pos, > >> - struct list_head *lru, bool is_swap) > >> +static void ttm_list_move_bulk_tail(struct list_head *list, > >> + struct list_head *first, > >> + struct list_head *last) > >> { > >> - struct list_head *list; > >> - LIST_HEAD(entries); > >> - LIST_HEAD(before); > >> + first->prev->next = last->next; > >> + last->next->prev = first->prev; > >> > >> - reservation_object_assert_held(pos->last->resv); > >> - list = is_swap ? &pos->last->swap : &pos->last->lru; > >> - list_cut_position(&entries, lru, list); > >> + list->prev->next = first; > >> + first->prev = list->prev; > >> > >> - reservation_object_assert_held(pos->first->resv); > >> - list = is_swap ? pos->first->swap.prev : pos->first->lru.prev; > >> - list_cut_position(&before, &entries, list); > >> - > >> - list_splice(&before, lru); > >> - list_splice_tail(&entries, lru); > >> + last->next = list; > >> + list->prev = last; > >> } > >> > >> void ttm_bo_bulk_move_lru_tail(struct ttm_lru_bulk_move *bulk) > >> @@ -271,23 +266,33 @@ void ttm_bo_bulk_move_lru_tail(struct > >> ttm_lru_bulk_move *bulk) > >>unsigned i; > >> > >>for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) { > >> + struct ttm_lru_bulk_move_pos *pos = &bulk->tt[i]; > >>struct ttm_mem_type_manager *man; > >> > >> - if (!bulk->tt[i].first) > >> + if (!pos->first) > >>continue; > >> > >> - man = &bulk->tt[i].first->bdev->man[TTM_PL_TT]; > >> - ttm_bo_bulk_move_helper(&bulk->tt[i], &man->lru[i], false); > >> + reservation_object_assert_held(pos->first->resv); > >> + reservation_object_assert_held(pos->last->resv); > >> + > >> + man = &pos->first->bdev->man[TTM_PL_TT]; > >> + ttm_list_move_bulk_tail(&man->lru[i], &pos->first->lru, > >> + &pos->last->lru); > >>} > >> > >>for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) { > >> + struct ttm_lru_bulk_move_pos *pos = &bulk->vram[i]; > >>struct ttm_mem_type_manager *man; > >> > >> - if (!bulk->vram[i].first) > >> + if (!pos->first) > >>continue; > >> > >> - man = &bulk->vram[i].first->bdev->man[TTM_PL_VRAM]; > >> - ttm_bo_bulk_move_helper(&bulk->vram[i], &man->lru[i], false); > >> + reservation_object_assert_held(pos->first->resv); > >> + reservation_object_assert_held(pos->last->resv); > >> + > >> + man = &pos->first->bdev->man[TTM_PL_VRAM]; > >> + ttm_list_move_bulk_tail(&man->lru[i], &pos->first->lru, > >> + &pos->last->lru); > >>} > >> > >>for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) { > >> @@ -297,8 +302,12 @@ void ttm_bo_bulk_move_lru_tail(struct > >> ttm_lru_bulk_move *bulk) > >>if (!pos->first) > >>continue; > >> > >> + reservation_object_assert_held(pos->first->resv); > >> + reservation_object_assert_held(pos->last->resv); > >> + > >>lru = &pos->first->bdev->glob->swap_lru[i]; > >> - ttm_bo_bulk_move_helper(&bulk->swap[i], lru, true); > >> + ttm_list_move_bulk_tail(lru,
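The explicit bulk move the patch introduces boils down to six pointer assignments on a circular doubly linked list. Here is a standalone userspace sketch of the same manipulation (toy list_head implementation, not the kernel's <linux/list.h>), which also shows why no stack-allocated list_head can end up on the destination list: only the nodes between first and last are touched.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, kernel style. */
struct list_head {
    struct list_head *prev, *next;
};

void list_init(struct list_head *h)
{
    h->prev = h->next = h;
}

void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Same six assignments as ttm_list_move_bulk_tail() in the patch:
 * splice the sublist [first..last] out of wherever it currently lives
 * and append it to the tail of @list. */
void list_move_bulk_tail(struct list_head *list,
                         struct list_head *first,
                         struct list_head *last)
{
    /* unlink [first..last] from its current list */
    first->prev->next = last->next;
    last->next->prev = first->prev;

    /* append it to the tail of @list */
    list->prev->next = first;
    first->prev = list->prev;
    last->next = list;
    list->prev = last;
}
```

Unlike the list_cut_position/list_splice combination it replaces, this version never routes the moved nodes through a temporary list_head, so there is no opportunity for a stack address to leak into the LRU.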
Re: [PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier v4
Am 13.09.2018 um 23:51 schrieb Felix Kuehling: On 2018-09-13 04:52 PM, Philip Yang wrote: Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in DRM_AMDGPU_USERPTR Kconfig. It supports both KFD userptr and gfx userptr paths. This depends on several HMM patchset from Jérôme Glisse queued for upstream. Change-Id: Ie62c3c5e3c5b8521ab3b438d1eff2aa2a003835e Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/Kconfig | 6 +- drivers/gpu/drm/amd/amdgpu/Makefile| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 121 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h | 2 +- 4 files changed, 56 insertions(+), 75 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig index 9221e54..960a633 100644 --- a/drivers/gpu/drm/amd/amdgpu/Kconfig +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig @@ -26,10 +26,10 @@ config DRM_AMDGPU_CIK config DRM_AMDGPU_USERPTR bool "Always enable userptr write support" depends on DRM_AMDGPU - select MMU_NOTIFIER + select HMM_MIRROR help - This option selects CONFIG_MMU_NOTIFIER if it isn't already - selected to enabled full userptr support. + This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it + isn't already selected to enabled full userptr support. 
config DRM_AMDGPU_GART_DEBUGFS bool "Allow GART access through debugfs" diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile index 138cb78..c1e5d43 100644 --- a/drivers/gpu/drm/amd/amdgpu/Makefile +++ b/drivers/gpu/drm/amd/amdgpu/Makefile @@ -171,7 +171,7 @@ endif amdgpu-$(CONFIG_COMPAT) += amdgpu_ioc32.o amdgpu-$(CONFIG_VGA_SWITCHEROO) += amdgpu_atpx_handler.o amdgpu-$(CONFIG_ACPI) += amdgpu_acpi.o -amdgpu-$(CONFIG_MMU_NOTIFIER) += amdgpu_mn.o +amdgpu-$(CONFIG_HMM) += amdgpu_mn.o include $(FULL_AMD_PATH)/powerplay/Makefile diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c index e55508b..ad52f34 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c @@ -45,7 +45,7 @@ #include #include -#include +#include #include #include #include @@ -66,6 +66,7 @@ Need to remove @mn documentation. * @objects: interval tree containing amdgpu_mn_nodes * @read_lock: mutex for recursive locking of @lock * @recursion: depth of recursion + * @mirror: HMM mirror function support * * Data for each amdgpu device and process address space. 
*/ @@ -73,7 +74,6 @@ struct amdgpu_mn { /* constant after initialisation */ struct amdgpu_device *adev; struct mm_struct *mm; - struct mmu_notifier mn; enum amdgpu_mn_type type; /* only used on destruction */ @@ -87,6 +87,9 @@ struct amdgpu_mn { struct rb_root_cached objects; struct mutex read_lock; atomic_t recursion; + + /* HMM mirror */ + struct hmm_mirror mirror; }; /** @@ -103,7 +106,7 @@ struct amdgpu_mn_node { }; /** - * amdgpu_mn_destroy - destroy the MMU notifier + * amdgpu_mn_destroy - destroy the HMM mirror * * @work: previously sheduled work item * @@ -129,28 +132,26 @@ static void amdgpu_mn_destroy(struct work_struct *work) } up_write(&amn->lock); mutex_unlock(&adev->mn_lock); - mmu_notifier_unregister_no_release(&amn->mn, amn->mm); + hmm_mirror_unregister(&amn->mirror); + kfree(amn); } /** * amdgpu_mn_release - callback to notify about mm destruction Update the function name in the comment. * - * @mn: our notifier - * @mm: the mm this callback is about + * @mirror: the HMM mirror (mm) this callback is about * - * Shedule a work item to lazy destroy our notifier. + * Shedule a work item to lazy destroy HMM mirror.
*/ -static void amdgpu_mn_release(struct mmu_notifier *mn, - struct mm_struct *mm) +static void amdgpu_hmm_mirror_release(struct hmm_mirror *mirror) { - struct amdgpu_mn *amn = container_of(mn, struct amdgpu_mn, mn); + struct amdgpu_mn *amn = container_of(mirror, struct amdgpu_mn, mirror); INIT_WORK(&amn->work, amdgpu_mn_destroy); schedule_work(&amn->work); } - /** * amdgpu_mn_lock - take the write side lock for this notifier * @@ -237,21 +238,19 @@ static void amdgpu_mn_invalidate_node(struct amdgpu_mn_node *node, /** * amdgpu_mn_invalidate_range_start_gfx - callback to notify about mm change * - * @mn: our notifier - * @mm: the mm this callback is about - * @start: start of updated range - * @end: end of updated range + * @mirror: the hmm_mirror (mm) is about to update + * @update: the update start, end address * * Block for operations on BOs to finish and mark pages as accessed and * potentially
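The conversion above leans on one pattern throughout: struct amdgpu_mn embeds a struct hmm_mirror, and each HMM callback hands back only that embedded member, from which the driver recovers its private structure with container_of. A toy userspace illustration of that pattern (simplified types, not the kernel's hmm_mirror API):

```c
#include <stddef.h>

/* Same trick as the kernel's container_of: step back from the address
 * of an embedded member to the address of the enclosing struct. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for struct hmm_mirror. */
struct mirror {
    int dummy;
};

/* Stand-in for struct amdgpu_mn, which embeds the mirror. */
struct amdgpu_mn_like {
    int type;
    struct mirror mirror;   /* embedded member handed to callbacks */
};

/* A callback, like amdgpu_hmm_mirror_release(), receives only the
 * embedded member but can reach every field of the outer struct. */
int release_cb(struct mirror *m)
{
    struct amdgpu_mn_like *amn =
        container_of(m, struct amdgpu_mn_like, mirror);
    return amn->type;
}
```

This is why the patch can drop the struct mmu_notifier member and replace it with struct hmm_mirror without changing how the per-device state is located from within callbacks.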
[PATCH] drm/amd/dc: Trigger set power state task when display configuration changes
Revert "drm/amd/display: Remove call to amdgpu_pm_compute_clocks" This reverts commit dcd473770e86517543691bdb227103d6c781cd0a. When the display configuration changes, DC needs to propagate the changes to powerplay and also trigger a power state task. amdgpu_pm_compute_clocks is the interface to set the power state task whether DPM or powerplay is enabled. Signed-off-by: Rex Zhu --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c index 6d16b4a..0fab64a 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c @@ -105,6 +105,8 @@ bool dm_pp_apply_display_requirements( adev->powerplay.pp_funcs->display_configuration_change( adev->powerplay.pp_handle, &adev->pm.pm_display_cfg); + + amdgpu_pm_compute_clocks(adev); } return true; -- 1.9.1
Re: [PATCH] drm/amdgpu: simplify Raven, Raven2, and Picasso handling
On Thu, Sep 13, 2018 at 03:45:27PM -0500, Alex Deucher wrote: > Treat them all as Raven rather than adding a new picasso > asic type. This simplifies a lot of code and also handles the > case of rv2 chips with the 0x15d8 pci id. It also fixes dmcu > fw handling for picasso. We drop the Picasso asic type, and keep the separate ucode. It's fine. We can also support the RV2 PCO refresh with the change. Acked-by: Huang Rui > > Signed-off-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +--- > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 3 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c| 1 - > drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 1 - > drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c| 7 +-- > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 +- > drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 32 ++- > drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 4 -- > drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c| 11 ++-- > drivers/gpu/drm/amd/amdgpu/psp_v10_0.c | 5 +- > drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 11 +--- > drivers/gpu/drm/amd/amdgpu/soc15.c | 66 > ++ > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 8 +-- > drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c| 1 - > .../gpu/drm/amd/powerplay/hwmgr/processpptables.c | 8 +-- > include/drm/amd_asic_type.h| 1 - > 16 files changed, 60 insertions(+), 113 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index 762dc5f886cd..354f0557d697 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -91,7 +91,6 @@ static const char *amdgpu_asic_name[] = { > "VEGA12", > "VEGA20", > "RAVEN", > - "PICASSO", > "LAST", > }; > > @@ -1337,12 +1336,11 @@ static int amdgpu_device_parse_gpu_info_fw(struct > amdgpu_device *adev) > case CHIP_RAVEN: > if (adev->rev_id >= 8) > chip_name = "raven2"; > + else if (adev->pdev->device == 0x15d8) > + chip_name = "picasso"; > else > chip_name = "raven"; > break; > - case CHIP_PICASSO: > - 
chip_name = "picasso"; > - break; > } > > snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_gpu_info.bin", chip_name); > @@ -1468,8 +1466,7 @@ static int amdgpu_device_ip_early_init(struct > amdgpu_device *adev) > case CHIP_VEGA12: > case CHIP_VEGA20: > case CHIP_RAVEN: > - case CHIP_PICASSO: > - if ((adev->asic_type == CHIP_RAVEN) || (adev->asic_type == > CHIP_PICASSO)) > + if (adev->asic_type == CHIP_RAVEN) > adev->family = AMDGPU_FAMILY_RV; > else > adev->family = AMDGPU_FAMILY_AI; > @@ -2183,7 +2180,6 @@ bool amdgpu_device_asic_has_dc_support(enum > amd_asic_type asic_type) > case CHIP_VEGA20: > #if defined(CONFIG_DRM_AMD_DC_DCN1_0) > case CHIP_RAVEN: > - case CHIP_PICASSO: > #endif > return amdgpu_dc != 0; > #endif > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > index 33e1856fb8cc..ff10df4f50d3 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > @@ -874,8 +874,7 @@ static const struct pci_device_id pciidlist[] = { > {0x1002, 0x66AF, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_VEGA20}, > /* Raven */ > {0x1002, 0x15dd, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RAVEN|AMD_IS_APU}, > - /* Picasso */ > - {0x1002, 0x15d8, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_PICASSO|AMD_IS_APU}, > + {0x1002, 0x15d8, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RAVEN|AMD_IS_APU}, > > {0, 0, 0} > }; > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > index 611c06d3600a..bd397d2916fb 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > @@ -56,7 +56,6 @@ static int psp_sw_init(void *handle) > psp_v3_1_set_psp_funcs(psp); > break; > case CHIP_RAVEN: > - case CHIP_PICASSO: > psp_v10_0_set_psp_funcs(psp); > break; > case CHIP_VEGA20: > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c > index acb4c66fe89b..1fa8bc337859 100644 > --- 
a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c > @@ -303,7 +303,6 @@ amdgpu_ucode_get_load_type(struct amdgpu_device *adev, > int load_type) > return AMDGPU_FW_LOAD_SMU; > case CHIP_VEGA10: > case CHIP_RAVEN: > - case CHIP_PICASSO: > case CHIP_VEGA12: >
Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4
Am 14.09.2018 um 09:46 schrieb Zhou, David(ChunMing): -Original Message- From: Koenig, Christian Sent: Friday, September 14, 2018 3:27 PM To: Zhou, David(ChunMing) ; Zhou, David(ChunMing) ; dri- de...@lists.freedesktop.org Cc: Dave Airlie ; Rakos, Daniel ; amd-gfx@lists.freedesktop.org; Daniel Vetter Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4 Am 14.09.2018 um 05:59 schrieb zhoucm1: On 2018年09月14日 11:14, zhoucm1 wrote: On 2018年09月13日 18:22, Christian König wrote: Am 13.09.2018 um 11:35 schrieb Zhou, David(ChunMing): -Original Message- From: Koenig, Christian Sent: Thursday, September 13, 2018 5:20 PM To: Zhou, David(ChunMing) ; dri- de...@lists.freedesktop.org Cc: Dave Airlie ; Rakos, Daniel ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4 Am 13.09.2018 um 11:11 schrieb Zhou, David(ChunMing): -Original Message- From: Christian König Sent: Thursday, September 13, 2018 4:50 PM To: Zhou, David(ChunMing) ; Koenig, Christian ; dri-de...@lists.freedesktop.org Cc: Dave Airlie ; Rakos, Daniel ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4 Am 13.09.2018 um 09:43 schrieb Zhou, David(ChunMing): -Original Message- From: Koenig, Christian Sent: Thursday, September 13, 2018 2:56 PM To: Zhou, David(ChunMing) ; Zhou, David(ChunMing) ; dri- de...@lists.freedesktop.org Cc: Dave Airlie ; Rakos, Daniel ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4 Am 13.09.2018 um 04:15 schrieb zhoucm1: On 2018年09月12日 19:05, Christian König wrote: [SNIP] +static void +drm_syncobj_find_signal_pt_for_wait_pt(struct drm_syncobj *syncobj, + struct drm_syncobj_wait_pt +*wait_pt) { That whole approach still looks horrible complicated to me. It's already very close to what you said before. Especially the separation of signal and wait pt is completely unnecessary as far as I can see. 
When a wait pt is requested we just need to search for the signal point which it will trigger. Yeah, I tried this, but when I implement cpu wait ioctl on specific point, we need a advanced wait pt fence, otherwise, we could still need old syncobj cb. Why? I mean you just need to call drm_syncobj_find_fence() and when that one returns NULL you use wait_event_*() to wait for a signal point >= your wait point to appear and try again. e.g. when there are 3 syncobjs(A,B,C) to wait, all syncobjABC have no fence yet, as you said, during drm_syncobj_find_fence(A) is working on wait_event, syncobjB and syncobjC could already be signaled, then we don't know which one is first signaled, which is need when wait ioctl returns. I don't really see a problem with that. When you wait for the first one you need to wait for A,B,C at the same time anyway. So what you do is to register a fence callback on the fences you already have and for the syncobj which doesn't yet have a fence you make sure that they wake up your thread when they get one. So essentially exactly what drm_syncobj_fence_get_or_add_callback() already does today. So do you mean we need still use old syncobj CB for that? Yes, as far as I can see it should work. Advanced wait pt is bad? Well it isn't bad, I just don't see any advantage in it. The advantage is to replace old syncobj cb. The existing mechanism should already be able to handle that. I thought more a bit, we don't that mechanism at all, if use advanced wait pt, we can easily use fence array to achieve it for wait ioctl, we should use kernel existing feature as much as possible, not invent another, shouldn't we? I remember you said it before. Yeah, but the syncobj cb is an existing feature. This is obviously a workaround when doing for wait ioctl, Do you see it used in other place? And I absolutely don't see a need to modify that and replace it with something far more complex. 
The wait ioctl is simplified much more by fence array, not complex, and we just need to allocate a wait pt. If keeping old syncobj cb workaround, all wait pt logic still is there, just save allocation and wait pt handling, in fact, which part isn't complex at all. But compare with ugly syncobj cb, which is simpler. I strongly disagree on that. You just need to extend the syncobj cb with the sequence number and you are done. We could clean that up in the long term by adding some wait_multi event macro, but for now just adding the sequence number should do the trick. Quote from Daniel Vetter comment when v1, " Specifically for this stuff here having unified future fence semantics will allow drivers to do clever stuff with them. " I think the advanced wait pt is a similar concept as 'future fence' what Daniel Vetter said before, which obviously a right direction. Anyway, I will change the patch as you like if no other comment, so that the patch can pass soon. When I try to remove wait pt future
Re: [PATCH] drm/amdgpu: reserve GDS resources statically
Well as long as we don't need to save any content it should be trivial to implement resource management with the existing code.

I will take a look at why allocating GDS BOs fails at the moment; if it is something trivial we could still fix it.

Christian.

Am 13.09.2018 um 23:01 schrieb Marek Olšák:

To be fair, since we have only 7 user VMIDs and 8 chunks of GDS, we can make the 8th GDS chunk global and allocatable and use it based on a CS flag. It would need more work and a lot of testing though. I don't think we can do the testing part now because of the complexity of interactions between per-VMID GDS and global GDS, but it's certainly something that people could add in the future.

Marek

On Thu, Sep 13, 2018 at 3:04 PM, Marek Olšák wrote:

I was thinking about that too, but it would be too much trouble for something we don't need.

Marek

On Thu, Sep 13, 2018 at 2:57 PM, Deucher, Alexander wrote:

Why don't we just fix up the current GDS code so it works the same as vram, and then we can add a new CS or context flag to ignore the current static allocation for gfx? We can ignore data persistence if it's too much trouble. Assume you always have to init the memory before you use it. That's already the case.

Alex
Re: [PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier v3
Am 13.09.2018 um 22:45 schrieb Philip Yang: On 2018-09-13 02:24 PM, Christian König wrote: Am 13.09.2018 um 20:00 schrieb Philip Yang: Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in DRM_AMDGPU_USERPTR Kconfig. It supports both KFD userptr and gfx userptr paths. This depends on several HMM patchset from Jérôme Glisse queued for upstream. See http://172.27.226.38/root/kernel_amd/commits/hmm-dev-v01 (for AMD intranet) Change-Id: Ie62c3c5e3c5b8521ab3b438d1eff2aa2a003835e Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/Kconfig | 6 +-- drivers/gpu/drm/amd/amdgpu/Makefile | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 88 +++--- 3 files changed, 75 insertions(+), 21 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig index 9221e54..960a633 100644 --- a/drivers/gpu/drm/amd/amdgpu/Kconfig +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig @@ -26,10 +26,10 @@ config DRM_AMDGPU_CIK config DRM_AMDGPU_USERPTR bool "Always enable userptr write support" depends on DRM_AMDGPU - select MMU_NOTIFIER + select HMM_MIRROR help - This option selects CONFIG_MMU_NOTIFIER if it isn't already - selected to enabled full userptr support. + This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it + isn't already selected to enabled full userptr support. 
config DRM_AMDGPU_GART_DEBUGFS bool "Allow GART access through debugfs" diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile index 138cb78..c1e5d43 100644 --- a/drivers/gpu/drm/amd/amdgpu/Makefile +++ b/drivers/gpu/drm/amd/amdgpu/Makefile @@ -171,7 +171,7 @@ endif amdgpu-$(CONFIG_COMPAT) += amdgpu_ioc32.o amdgpu-$(CONFIG_VGA_SWITCHEROO) += amdgpu_atpx_handler.o amdgpu-$(CONFIG_ACPI) += amdgpu_acpi.o -amdgpu-$(CONFIG_MMU_NOTIFIER) += amdgpu_mn.o +amdgpu-$(CONFIG_HMM) += amdgpu_mn.o include $(FULL_AMD_PATH)/powerplay/Makefile diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c index e55508b..ea8671f6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c @@ -46,6 +46,7 @@ #include #include #include +#include Can we now drop including linux/mmu_notifier.h? Yes, will use hmm_mirror_ops to replace gfx and kfd mmu_notifier_ops Please drop that and implement the gfx and kfd operations directly. Thanks, Christian. #include #include #include @@ -66,6 +67,7 @@ * @objects: interval tree containing amdgpu_mn_nodes * @read_lock: mutex for recursive locking of @lock * @recursion: depth of recursion + * @mirror: HMM mirror function support * * Data for each amdgpu device and process address space. 
*/ @@ -87,6 +89,9 @@ struct amdgpu_mn { struct rb_root_cached objects; struct mutex read_lock; atomic_t recursion; + + /* HMM mirror */ + struct hmm_mirror mirror; }; /** @@ -103,7 +108,7 @@ struct amdgpu_mn_node { }; /** - * amdgpu_mn_destroy - destroy the MMU notifier + * amdgpu_mn_destroy - destroy the HMM mirror * * @work: previously sheduled work item * @@ -129,28 +134,27 @@ static void amdgpu_mn_destroy(struct work_struct *work) } up_write(>lock); mutex_unlock(>mn_lock); - mmu_notifier_unregister_no_release(>mn, amn->mm); + + hmm_mirror_unregister(>mirror); + kfree(amn); } /** * amdgpu_mn_release - callback to notify about mm destruction * - * @mn: our notifier - * @mm: the mm this callback is about + * @mirror: the HMM mirror (mm) this callback is about * - * Shedule a work item to lazy destroy our notifier. + * Shedule a work item to lazy destroy HMM mirror. */ -static void amdgpu_mn_release(struct mmu_notifier *mn, - struct mm_struct *mm) +static void amdgpu_hmm_mirror_release(struct hmm_mirror *mirror) { - struct amdgpu_mn *amn = container_of(mn, struct amdgpu_mn, mn); + struct amdgpu_mn *amn = container_of(mirror, struct amdgpu_mn, mirror); INIT_WORK(>work, amdgpu_mn_destroy); schedule_work(>work); } - /** * amdgpu_mn_lock - take the write side lock for this notifier * @@ -355,12 +359,10 @@ static void amdgpu_mn_invalidate_range_end(struct mmu_notifier *mn, static const struct mmu_notifier_ops amdgpu_mn_ops[] = { [AMDGPU_MN_TYPE_GFX] = { - .release = amdgpu_mn_release, .invalidate_range_start = amdgpu_mn_invalidate_range_start_gfx, .invalidate_range_end = amdgpu_mn_invalidate_range_end, }, [AMDGPU_MN_TYPE_HSA] = { - .release = amdgpu_mn_release, .invalidate_range_start = amdgpu_mn_invalidate_range_start_hsa, .invalidate_range_end = amdgpu_mn_invalidate_range_end, }, @@ -373,12 +375,63 @@ static const struct
Re: [PATCH] drm/amdgpu/display: return proper error codes in dm
On Thu, Sep 13, 2018 at 11:29:28AM -0500, Alex Deucher wrote:
> Replace -1 with proper error codes.
>
> Signed-off-by: Alex Deucher

Acked-by: Huang Rui

> ---
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 8
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index eccae63d3ef1..541f33749961 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -493,7 +493,7 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
>  error:
>  	amdgpu_dm_fini(adev);
>
> -	return -1;
> +	return -EINVAL;
>  }
>
>  static void amdgpu_dm_fini(struct amdgpu_device *adev)
> @@ -548,7 +548,7 @@ static int load_dmcu_fw(struct amdgpu_device *adev)
>  		break;
>  	default:
>  		DRM_ERROR("Unsupported ASIC type: 0x%X\n", adev->asic_type);
> -		return -1;
> +		return -EINVAL;
>  	}
>
>  	if (adev->firmware.load_type != AMDGPU_FW_LOAD_PSP) {
> @@ -1537,7 +1537,7 @@ static int amdgpu_dm_initialize_drm_device(struct amdgpu_device *adev)
>  	link_cnt = dm->dc->caps.max_links;
>  	if (amdgpu_dm_mode_config_init(dm->adev)) {
>  		DRM_ERROR("DM: Failed to initialize mode config\n");
> -		return -1;
> +		return -EINVAL;
>  	}
>
>  	/* Identify the number of planes to be initialized */
> @@ -1652,7 +1652,7 @@ static int amdgpu_dm_initialize_drm_device(struct amdgpu_device *adev)
>  		kfree(aconnector);
>  	for (i = 0; i < dm->dc->caps.max_planes; i++)
>  		kfree(mode_info->planes[i]);
> -	return -1;
> +	return -EINVAL;
>  }
>
>  static void amdgpu_dm_destroy_drm_device(struct amdgpu_display_manager *dm)
> --
> 2.13.6
Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4
On 2018年09月13日 18:22, Christian König wrote: Am 13.09.2018 um 11:35 schrieb Zhou, David(ChunMing): -Original Message- From: Koenig, Christian Sent: Thursday, September 13, 2018 5:20 PM To: Zhou, David(ChunMing) ; dri- de...@lists.freedesktop.org Cc: Dave Airlie ; Rakos, Daniel ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4 Am 13.09.2018 um 11:11 schrieb Zhou, David(ChunMing): -Original Message- From: Christian König Sent: Thursday, September 13, 2018 4:50 PM To: Zhou, David(ChunMing) ; Koenig, Christian ; dri-de...@lists.freedesktop.org Cc: Dave Airlie ; Rakos, Daniel ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4 Am 13.09.2018 um 09:43 schrieb Zhou, David(ChunMing): -Original Message- From: Koenig, Christian Sent: Thursday, September 13, 2018 2:56 PM To: Zhou, David(ChunMing) ; Zhou, David(ChunMing) ; dri- de...@lists.freedesktop.org Cc: Dave Airlie ; Rakos, Daniel ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4 Am 13.09.2018 um 04:15 schrieb zhoucm1: On 2018年09月12日 19:05, Christian König wrote: [SNIP] +static void drm_syncobj_find_signal_pt_for_wait_pt(struct drm_syncobj *syncobj, + struct drm_syncobj_wait_pt +*wait_pt) { That whole approach still looks horrible complicated to me. It's already very close to what you said before. Especially the separation of signal and wait pt is completely unnecessary as far as I can see. When a wait pt is requested we just need to search for the signal point which it will trigger. Yeah, I tried this, but when I implement cpu wait ioctl on specific point, we need a advanced wait pt fence, otherwise, we could still need old syncobj cb. Why? I mean you just need to call drm_syncobj_find_fence() and when that one returns NULL you use wait_event_*() to wait for a signal point >= your wait point to appear and try again. e.g. 
when there are 3 syncobjs(A,B,C) to wait, all syncobjABC have no fence yet, as you said, during drm_syncobj_find_fence(A) is working on wait_event, syncobjB and syncobjC could already be signaled, then we don't know which one is first signaled, which is need when wait ioctl returns. I don't really see a problem with that. When you wait for the first one you need to wait for A,B,C at the same time anyway. So what you do is to register a fence callback on the fences you already have and for the syncobj which doesn't yet have a fence you make sure that they wake up your thread when they get one. So essentially exactly what drm_syncobj_fence_get_or_add_callback() already does today. So do you mean we need still use old syncobj CB for that? Yes, as far as I can see it should work. Advanced wait pt is bad? Well it isn't bad, I just don't see any advantage in it. The advantage is to replace old syncobj cb. The existing mechanism should already be able to handle that. I thought more a bit, we don't that mechanism at all, if use advanced wait pt, we can easily use fence array to achieve it for wait ioctl, we should use kernel existing feature as much as possible, not invent another, shouldn't we? I remember you said it before. Yeah, but the syncobj cb is an existing feature. This is obviously a workaround when doing for wait ioctl, Do you see it used in other place? And I absolutely don't see a need to modify that and replace it with something far more complex. The wait ioctl is simplified much more by fence array, not complex, and we just need to allocate a wait pt. If keeping old syncobj cb workaround, all wait pt logic still is there, just save allocation and wait pt handling, in fact, which part isn't complex at all. But compare with ugly syncobj cb, which is simpler. I strongly disagree on that. You just need to extend the syncobj cb with the sequence number and you are done. 
We could clean that up in the long term by adding some wait_multi event macro, but for now just adding the sequence number should do the trick. Quote from Daniel Vetter comment when v1, " Specifically for this stuff here having unified future fence semantics will allow drivers to do clever stuff with them. " I think the advanced wait pt is a similar concept as 'future fence' what Daniel Vetter said before, which obviously a right direction. Anyway, I will change the patch as you like if no other comment, so that the patch can pass soon. Thanks, David Zhou Regards, Christian. Thanks, David Zhou Regards, Christian. Thanks, David Zhou Christian. Thanks, David Zhou Regards, Christian. Back to my implementation, it already fixes all your concerns before, and can be able to easily used in wait_ioctl. When you feel that is complicated, I guess that is because we merged all logic to that and much clean up in one patch. In fact, it already is very simple, timeline_init/fini, create signal/wait_pt, find
Re: [PATCH] drm/amdgpu/soc15: clean up picasso support
On Thu, Sep 13, 2018 at 03:07:57PM -0500, Alex Deucher wrote:
> It's the same as raven so remove the duplicate case.
>
> Signed-off-by: Alex Deucher

Acked-by: Huang Rui

> ---
>  drivers/gpu/drm/amd/amdgpu/soc15.c | 17 -
>  1 file changed, 17 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index f5a44d1fe5da..f930e09071d4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -546,23 +546,6 @@ int soc15_set_ip_blocks(struct amdgpu_device *adev)
>  		amdgpu_device_ip_block_add(adev, _v4_0_ip_block);
>  		break;
>  	case CHIP_RAVEN:
> -		amdgpu_device_ip_block_add(adev, _common_ip_block);
> -		amdgpu_device_ip_block_add(adev, _v9_0_ip_block);
> -		amdgpu_device_ip_block_add(adev, _ih_ip_block);
> -		amdgpu_device_ip_block_add(adev, _v10_0_ip_block);
> -		amdgpu_device_ip_block_add(adev, _smu_ip_block);
> -		if (adev->enable_virtual_display || amdgpu_sriov_vf(adev))
> -			amdgpu_device_ip_block_add(adev, _virtual_ip_block);
> -#if defined(CONFIG_DRM_AMD_DC)
> -		else if (amdgpu_device_has_dc_support(adev))
> -			amdgpu_device_ip_block_add(adev, _ip_block);
> -#else
> -#warning "Enable CONFIG_DRM_AMD_DC for display support on SOC15."
> -#endif
> -		amdgpu_device_ip_block_add(adev, _v9_0_ip_block);
> -		amdgpu_device_ip_block_add(adev, _v4_0_ip_block);
> -		amdgpu_device_ip_block_add(adev, _v1_0_ip_block);
> -		break;
>  	case CHIP_PICASSO:
>  		amdgpu_device_ip_block_add(adev, _common_ip_block);
>  		amdgpu_device_ip_block_add(adev, _v9_0_ip_block);
> --
> 2.13.6
Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4
On 2018年09月14日 11:14, zhoucm1 wrote: On 2018年09月13日 18:22, Christian König wrote: Am 13.09.2018 um 11:35 schrieb Zhou, David(ChunMing): -Original Message- From: Koenig, Christian Sent: Thursday, September 13, 2018 5:20 PM To: Zhou, David(ChunMing) ; dri- de...@lists.freedesktop.org Cc: Dave Airlie ; Rakos, Daniel ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4 Am 13.09.2018 um 11:11 schrieb Zhou, David(ChunMing): -Original Message- From: Christian König Sent: Thursday, September 13, 2018 4:50 PM To: Zhou, David(ChunMing) ; Koenig, Christian ; dri-de...@lists.freedesktop.org Cc: Dave Airlie ; Rakos, Daniel ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4 Am 13.09.2018 um 09:43 schrieb Zhou, David(ChunMing): -Original Message- From: Koenig, Christian Sent: Thursday, September 13, 2018 2:56 PM To: Zhou, David(ChunMing) ; Zhou, David(ChunMing) ; dri- de...@lists.freedesktop.org Cc: Dave Airlie ; Rakos, Daniel ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4 Am 13.09.2018 um 04:15 schrieb zhoucm1: On 2018年09月12日 19:05, Christian König wrote: [SNIP] +static void drm_syncobj_find_signal_pt_for_wait_pt(struct drm_syncobj *syncobj, + struct drm_syncobj_wait_pt +*wait_pt) { That whole approach still looks horrible complicated to me. It's already very close to what you said before. Especially the separation of signal and wait pt is completely unnecessary as far as I can see. When a wait pt is requested we just need to search for the signal point which it will trigger. Yeah, I tried this, but when I implement cpu wait ioctl on specific point, we need a advanced wait pt fence, otherwise, we could still need old syncobj cb. Why? I mean you just need to call drm_syncobj_find_fence() and when that one returns NULL you use wait_event_*() to wait for a signal point >= your wait point to appear and try again. e.g. 
when there are 3 syncobjs(A,B,C) to wait, all syncobjABC have no fence yet, as you said, during drm_syncobj_find_fence(A) is working on wait_event, syncobjB and syncobjC could already be signaled, then we don't know which one is first signaled, which is need when wait ioctl returns. I don't really see a problem with that. When you wait for the first one you need to wait for A,B,C at the same time anyway. So what you do is to register a fence callback on the fences you already have and for the syncobj which doesn't yet have a fence you make sure that they wake up your thread when they get one. So essentially exactly what drm_syncobj_fence_get_or_add_callback() already does today. So do you mean we need still use old syncobj CB for that? Yes, as far as I can see it should work. Advanced wait pt is bad? Well it isn't bad, I just don't see any advantage in it. The advantage is to replace old syncobj cb. The existing mechanism should already be able to handle that. I thought more a bit, we don't that mechanism at all, if use advanced wait pt, we can easily use fence array to achieve it for wait ioctl, we should use kernel existing feature as much as possible, not invent another, shouldn't we? I remember you said it before. Yeah, but the syncobj cb is an existing feature. This is obviously a workaround when doing for wait ioctl, Do you see it used in other place? And I absolutely don't see a need to modify that and replace it with something far more complex. The wait ioctl is simplified much more by fence array, not complex, and we just need to allocate a wait pt. If keeping old syncobj cb workaround, all wait pt logic still is there, just save allocation and wait pt handling, in fact, which part isn't complex at all. But compare with ugly syncobj cb, which is simpler. I strongly disagree on that. You just need to extend the syncobj cb with the sequence number and you are done. 
We could clean that up in the long term by adding some wait_multi event macro, but for now just adding the sequence number should do the trick. Quote from Daniel Vetter comment when v1, " Specifically for this stuff here having unified future fence semantics will allow drivers to do clever stuff with them. " I think the advanced wait pt is a similar concept as 'future fence' what Daniel Vetter said before, which obviously a right direction. Anyway, I will change the patch as you like if no other comment, so that the patch can pass soon. When I try to remove wait pt future fence, I encounter another problem, drm_syncobj_find_fence cannot get a fence if signal pt already is collected as garbage, then CS will report error, any idea for that? I still think the future fence is right thing, Could you give futher thought on it again? Otherwise, we could need various workarounds. Thanks, David Zhou Thanks, David Zhou Regards, Christian. Thanks, David Zhou
Re: [PATCH 0/2] DMCU firmware version storing and access
On Thu, Sep 13, 2018 at 03:45:12PM -0400, David Francis wrote:
> David Francis (2):
>   drm/amd/display: Add DMCU firmware version
>   drm/amdgpu: Add DMCU to firmware query interface

Thanks David. With these patches, we can also monitor this ucode via the amdgpu_firmware_info interface.

Series are Reviewed-by: Huang Rui

>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c           | 12
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  2 ++
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  1 +
>  include/uapi/drm/amdgpu_drm.h                     |  2 ++
>  4 files changed, 17 insertions(+)
>
> --
> 2.17.1
Re: [PATCH 1/2] list: introduce list_bulk_move_tail helper
On Thu, Sep 13, 2018 at 01:22:07PM +0200, Christian König wrote:
> Move all entries between @first and including @last before @head.
>
> This is useful for LRU lists where a whole block of entries should be
> moved to the end of a list.
>
> Signed-off-by: Christian König

The bulk move helper is useful for the TTM driver to improve LRU move efficiency. Please go ahead with my R-b.

Series are Reviewed-and-Tested-by: Huang Rui

> ---
>  include/linux/list.h | 23 +++
>  1 file changed, 23 insertions(+)
>
> diff --git a/include/linux/list.h b/include/linux/list.h
> index de04cc5ed536..edb7628e46ed 100644
> --- a/include/linux/list.h
> +++ b/include/linux/list.h
> @@ -183,6 +183,29 @@ static inline void list_move_tail(struct list_head *list,
>  	list_add_tail(list, head);
>  }
>
> +/**
> + * list_bulk_move_tail - move a subsection of a list to its tail
> + * @head: the head that will follow our entry
> + * @first: first entry to move
> + * @last: last entry to move, can be the same as first
> + *
> + * Move all entries between @first and including @last before @head.
> + * All three entries must belong to the same linked list.
> + */
> +static inline void list_bulk_move_tail(struct list_head *head,
> +				       struct list_head *first,
> +				       struct list_head *last)
> +{
> +	first->prev->next = last->next;
> +	last->next->prev = first->prev;
> +
> +	head->prev->next = first;
> +	first->prev = head->prev;
> +
> +	last->next = head;
> +	head->prev = last;
> +}
> +
>  /**
>   * list_is_last - tests whether @list is the last entry in list @head
>   * @list: the entry to test
> --
> 2.14.1