Re: [PATCH xf86-video-amdgpu 02/19] Guard ODEV_ATTRIB_FD usage with the correct ifdef
On 10 April 2018 at 09:27, Michel Dänzer wrote:
> On 2018-04-04 04:29 PM, Emil Velikov wrote:
>> From: Emil Velikov
>>
>> Signed-off-by: Emil Velikov
>> ---
>>  src/amdgpu_probe.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/amdgpu_probe.c b/src/amdgpu_probe.c
>> index 075e5c1..e65c83b 100644
>> --- a/src/amdgpu_probe.c
>> +++ b/src/amdgpu_probe.c
>> @@ -120,7 +120,7 @@ static int amdgpu_kernel_open_fd(ScrnInfoPtr pScrn,
>>      char *busid;
>>      int fd;
>>
>> -#ifdef XF86_PDEV_SERVER_FD
>> +#ifdef ODEV_ATTRIB_FD
>>      if (platform_dev) {
>>          fd = xf86_get_platform_device_int_attrib(platform_dev,
>>                                                   ODEV_ATTRIB_FD, -1);
>>
>
> ODEV_ATTRIB_FD doesn't seem obviously more "correct" than
> XF86_PDEV_SERVER_FD, since both were added in the same xserver commit,
> and the latter might be helpful for understanding this is related to the
> other code guarded by XF86_PDEV_SERVER_FD.
>
All the XF86_PDEV_SERVER_FD code is dropped with a later commit ;-)
I could move this patch just after said commit, or you prefer to keep
the original guard?

-Emil
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH xf86-video-amdgpu 11/19] Don't leak a AMDGPUEntRec instance if amdgpu_device_setup fails
On 10 April 2018 at 10:58, Michel Dänzer wrote:
> On 2018-04-10 11:47 AM, Emil Velikov wrote:
>> On 10 April 2018 at 09:28, Michel Dänzer wrote:
>>> On 2018-04-04 04:29 PM, Emil Velikov wrote:
>>>> From: Emil Velikov
>>>>
>>>> Seems like we've been leaking this for years. It became more obvious
>>>> with the recent refactoring.
>>>>
>>>> Signed-off-by: Emil Velikov
>>>> ---
>>>>  src/amdgpu_probe.c | 2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/src/amdgpu_probe.c b/src/amdgpu_probe.c
>>>> index 537d44c..588891c 100644
>>>> --- a/src/amdgpu_probe.c
>>>> +++ b/src/amdgpu_probe.c
>>>> @@ -243,6 +243,8 @@ amdgpu_probe(ScrnInfoPtr pScrn, int entity_num,
>>>>      return TRUE;
>>>>
>>>>  error:
>>>> +    free(pPriv->ptr);
>>>> +    pPriv->ptr = NULL;
>>>>      return FALSE;
>>>>  }
>>>
>>> valgrind doesn't report a leak if I force this error path; presumably
>>> Xorg frees the private after returning FALSE here.
>>>
>> Just double-checked and Xorg does not know anything about ptr. The
>> only one who clears it up is AMDGPUFreeScreen_KMS.
>>
>> The magic (for this and the other 'leak') seems to be happening in
>> xf86platformAddDevice. Namely:
>>  - ::platformProbe is called via doPlatformProbe
>>  - the driver explicitly calls xf86AllocateScreen, yet fails later on
>>  - back in Xorg, the "if (old_screens == xf86NumGPUDrivers)" is false
>>  - ::PreInit fails, ::configured is false
>>  - xf86DeleteScreen() gets called, which dives into ::FreeScreen (aka
>>    AMDGPUFreeScreen_KMS)
>>
>> Eventually, I could unwrap all that although it makes sense to keep
>> things simpler. As effectively done by the patch.
>>
>> I believe you'll agree?
>
> I'm afraid not. There's no leak because it's getting cleaned up as
> designed, so there's no need for this change.
>
Fair enough. I'll swap the commit for a comment-only one in v2. This way,
the next person will be less tempted to send the same patch.

Something like:

  pPriv->ptr is freed in our ::FreeScreen callback. The latter gets
  called by xf86DeleteScreen() when the driver's ::*Probe call fails.

-Emil
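The ownership rule discussed in this thread can be sketched in isolation. The following is a minimal model only, not Xorg or driver code: every `fake_*` name is hypothetical, and the server-side sequence is reduced to "if the probe fails, call the free-screen callback". It illustrates why an extra free() in the probe error path is not needed when a single callback owns the cleanup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver private that holds the pointer. */
struct fake_priv {
    void *ptr;
};

/* Counts how often the cleanup callback actually released memory. */
static int fake_free_count;

/* Hypothetical ::FreeScreen analogue: the only place that frees ptr. */
static void fake_free_screen(struct fake_priv *priv)
{
    if (priv->ptr) {
        free(priv->ptr);
        priv->ptr = NULL;
        fake_free_count++;
    }
}

/* Hypothetical probe: allocates, then may fail WITHOUT freeing. */
static bool fake_probe(struct fake_priv *priv, bool setup_ok)
{
    priv->ptr = malloc(16);
    if (!priv->ptr)
        return false;
    return setup_ok;
}

/* Hypothetical server side: a failed probe leads to the screen being
 * deleted, which in turn invokes the free-screen callback. */
static bool fake_add_device(struct fake_priv *priv, bool setup_ok)
{
    if (!fake_probe(priv, setup_ok)) {
        fake_free_screen(priv);
        return false;
    }
    return true;
}
```

In this model a failing probe leaves the pointer allocated, and the caller's cleanup releases it exactly once, which matches the "cleaned up as designed" argument above.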
Re: [PATCH xf86-video-amdgpu 17/19] Store device_name as AMDGPUEntRec::master_node
On 2018-04-10 11:51 AM, Emil Velikov wrote:
> On 10 April 2018 at 09:29, Michel Dänzer wrote:
>> On 2018-04-04 04:29 PM, Emil Velikov wrote:
>>> From: Emil Velikov
>>>
>>> Rename the variable to reflect what it is. Plus move it out of the dri2
>>> section - it's used in dri2 and dri3.
>>>
>>> Signed-off-by: Emil Velikov
>>
>> [...]
>>
>>> diff --git a/src/amdgpu_probe.c b/src/amdgpu_probe.c
>>> index 4959bd6..e9afe42 100644
>>> --- a/src/amdgpu_probe.c
>>> +++ b/src/amdgpu_probe.c
>>> @@ -178,6 +178,10 @@ static Bool amdgpu_device_setup(ScrnInfoPtr pScrn,
>>>      if (pAMDGPUEnt->fd < 0)
>>>          return FALSE;
>>>
>>> +    pAMDGPUEnt->master_node = drmGetDeviceNameFromFd2(pAMDGPUEnt->fd);
>>> +    if (pAMDGPUEnt->master_node)
>>> +        goto error_amdgpu;
>>
>> This should be
>>
>>     if (!pAMDGPUEnt->master_node)
>>
>> shouldn't it?
>>
>>
>> ... Which raises the question: How did you test these patches? :)
>>
> I mentioned it in the cover letter, but seems to have dropped it -
> they are untested.
> There's a r600 card close-by I could test with, but no amdgpu one :-\

Okay. I can probably test this series, but in general it's preferable
for patches to be tested before sending them out for review.

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
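The inverted check pointed out in the review can be shown with a small stand-alone sketch. This is an illustration under stated assumptions, not the driver code: `fake_get_device_name()` is a hypothetical stand-in for a lookup that returns NULL on failure, and `fake_ent` / `fake_device_setup()` are made-up names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a name lookup that returns NULL on failure.
 * It is NOT the real libdrm function. */
static const char *fake_get_device_name(int fd)
{
    return fd >= 0 ? "/dev/dri/card0" : NULL;
}

/* Hypothetical entity record, reduced to the two fields used here. */
struct fake_ent {
    int fd;
    const char *master_node;
};

/* Returns true on success, false if the node name could not be fetched. */
static bool fake_device_setup(struct fake_ent *ent)
{
    ent->master_node = fake_get_device_name(ent->fd);
    if (!ent->master_node)  /* take the error path only when the lookup failed */
        return false;
    return true;
}
```

Without the `!`, the function would take the error path on every successful lookup, which is the behaviour the review comment flags.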
Re: [PATCH 1/2] drm/amdgpu/gmc: steal the appropriate amount of vram for fw hand-over (v2)
On Mon, Apr 09, 2018 at 09:50:19PM +0800, Christian König wrote:
> Hi Andrey,
>
> I think the problem Ray wants to point to is that we now release the
> stolen memory after device initialization.
>
> So during S3 we might run into issues because the first 8MB of VRAM are
> corrupted after startup.
>

Yes. Andrey, Christian. That's why I reserve an 8M stolen-size bo at the
start of the VRAM. I will forward the history information to you ;-)

And never mind, let me find a vega10 card to test whether these two patches
impact the case that I encountered before. Will let you know the result
later.

Thanks,
Ray

> Christian.
>
> On 09.04.2018 at 15:26, Grodzovsky, Andrey wrote:
> > Top posting (mobile)
> >
> > I tested S3 with DC enabled only. Even if I disable DC I need a device with
> > less than 8M VRAM to reproduce it, don't I? Otherwise we just gonna
> > reserve pre OS FB size of VRAM and not corrupt it. Right? Should probably
> > test it with forcing VRAM size to less than 8M...
> >
> > Andrey
> >
> >
> > From: Huang Rui
> > Sent: 09 April 2018 04:23:06
> > To: Alex Deucher; Grodzovsky, Andrey
> > Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> > Subject: Re: [PATCH 1/2] drm/amdgpu/gmc: steal the appropriate amount of
> > vram for fw hand-over (v2)
> >
> > On Fri, Apr 06, 2018 at 02:54:09PM -0500, Alex Deucher wrote:
> >> Steal 9 MB for vga emulation and fb if vga is enabled, otherwise,
> >> steal enough to cover the current display size as set by the vbios.
> >>
> >> If no memory is used (e.g., secondary or headless card), skip
> >> stolen memory reserve.
> >>
> >> v2: skip reservation if vram is limited, address Christian's comments
> >>
> >> Reviewed-and-Tested-by: Andrey Grodzovsky (v1)
> >> Signed-off-by: Alex Deucher
> >> ---
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 14 +
> >>  drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c   | 23 +--
> >>  drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c   | 23 +--
> >>  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c   | 23 +--
> >>  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   | 51 +
> >>  5 files changed, 116 insertions(+), 18 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >> index 205da3ff9cd0..46c69ad34461 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >> @@ -1454,12 +1454,14 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
> >>          return r;
> >>      }
> >>
> >> -    r = amdgpu_bo_create_kernel(adev, adev->gmc.stolen_size, PAGE_SIZE,
> >> -                                AMDGPU_GEM_DOMAIN_VRAM,
> >> -                                &adev->stolen_vga_memory,
> >> -                                NULL, NULL);
> >> -    if (r)
> >> -        return r;
> >> +    if (adev->gmc.stolen_size) {
> >> +        r = amdgpu_bo_create_kernel(adev, adev->gmc.stolen_size, PAGE_SIZE,
> >> +                                    AMDGPU_GEM_DOMAIN_VRAM,
> >> +                                    &adev->stolen_vga_memory,
> >> +                                    NULL, NULL);
> >> +        if (r)
> >> +            return r;
> >> +    }
> >>      DRM_INFO("amdgpu: %uM of VRAM memory ready\n",
> >>               (unsigned) (adev->gmc.real_vram_size / (1024 * 1024)));
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> >> index 5617cf62c566..24e1ea36b454 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> >> @@ -825,6 +825,25 @@ static int gmc_v6_0_late_init(void *handle)
> >>      return 0;
> >>  }
> >>
> >> +static unsigned gmc_v6_0_get_vbios_fb_size(struct amdgpu_device *adev)
> >> +{
> >> +    u32 d1vga_control = RREG32(mmD1VGA_CONTROL);
> >> +    unsigned size;
> >> +
> >> +    if (REG_GET_FIELD(d1vga_control, D1VGA_CONTROL, D1VGA_MODE_ENABLE)) {
> >> +        size = 9 * 1024 * 1024; /* reserve 8MB for vga emulator and 1 MB for FB */
> >> +    } else {
> >> +        u32 viewport = RREG32(mmVIEWPORT_SIZE);
> >> +        size = (REG_GET_FIELD(viewport, VIEWPORT_SIZE, VIEWPORT_HEIGHT) *
> >> +                REG_GET_FIELD(viewport, VIEWPORT_SIZE, VIEWPORT_WIDTH) *
> >> +                4);
> >> +    }
> >> +    /* return 0 if the pre-OS buffer uses up most of vram */
> >> +    if ((adev->gmc.real_vram_size - size) < (8 * 1024 * 1024))
> >> +        return 0;
> >> +    return size;
> >> +}
> >> +
> >>  static int gmc_v6_0_sw_init(void *handle)
> >>  {
> >>      int r;
> >> @@ -851,8 +870,6 @@ static int gmc_v6_0_sw_init(void *handle)
> >>
> >>      adev->gmc.mc_mask = 0xffULL;
> >>
> >> -
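The size calculation in the quoted gmc_v6_0 hunk can be restated as a stand-alone sketch with plain parameters in place of register reads. All names here (`fake_vbios_fb_size`, `FAKE_MB`) are hypothetical; the 9 MB figure, the 4 bytes per pixel, and the 8 MB threshold are taken from the quoted patch. The `real_vram_size < size` guard is an addition for this sketch only, so the subtraction cannot wrap; it is not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FAKE_MB (1024ULL * 1024ULL)

/* Hypothetical restatement of the quoted logic: how much VRAM to reserve
 * for the pre-OS framebuffer, given the VGA state and viewport size. */
static uint64_t fake_vbios_fb_size(bool vga_enabled, uint32_t width,
                                   uint32_t height, uint64_t real_vram_size)
{
    uint64_t size;

    if (vga_enabled)
        size = 9 * FAKE_MB;                   /* VGA emulation plus FB */
    else
        size = (uint64_t)height * width * 4;  /* 4 bytes per pixel */

    /* Reserve nothing if the pre-OS buffer would use up most of VRAM.
     * The first comparison is an extra guard added in this sketch. */
    if (real_vram_size < size || real_vram_size - size < 8 * FAKE_MB)
        return 0;
    return size;
}
```

For example, with VGA disabled and a 1920x1080 viewport on a 256 MB card, the sketch returns 1920 * 1080 * 4 bytes; with VGA enabled on a 16 MB card it returns 0 because fewer than 8 MB would remain.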
[PATCH] drm/amdgpu: defer initing UVD & VCE IP blocks
UVD & VCE blocks take up around 1200 msecs of boot time.
This patch adds them to the late init work function so as to
reduce boot time.

Signed-off-by: Shirish S
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 0e798b3..54f1320 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1589,7 +1589,9 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
     for (i = 0; i < adev->num_ip_blocks; i++) {
         if (!adev->ip_blocks[i].status.sw)
             continue;
-        if (adev->ip_blocks[i].status.hw)
+        if (adev->ip_blocks[i].status.hw ||
+            adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_UVD ||
+            adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_VCE)
             continue;
         r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev);
         if (r) {
@@ -1639,17 +1641,18 @@ static bool amdgpu_device_check_vram_lost(struct amdgpu_device *adev)
 }

 /**
- * amdgpu_device_ip_late_set_cg_state - late init for clockgating
+ * amdgpu_late_init_ip_blocks - late init of some IP blocks and clockgating
  *
  * @adev: amdgpu_device pointer
  *
- * Late initialization pass enabling clockgating for hardware IPs.
+ * Late initialization pass for high time consuming IP blocks like UVD & VCE
+ * along with enabling clockgating for hardware IPs.
  * The list of all the hardware IPs that make up the asic is walked and the
  * set_clockgating_state callbacks are run. This stage is run late
  * in the init process.
  * Returns 0 on success, negative error code on failure.
  */
-static int amdgpu_device_ip_late_set_cg_state(struct amdgpu_device *adev)
+static int amdgpu_late_init_ip_blocks(struct amdgpu_device *adev)
 {
     int i = 0, r;

@@ -1657,6 +1660,19 @@ static int amdgpu_device_ip_late_set_cg_state(struct amdgpu_device *adev)
         return 0;

     for (i = 0; i < adev->num_ip_blocks; i++) {
+        if (!adev->ip_blocks[i].status.hw &&
+            (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_UVD ||
+             adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_VCE)) {
+            r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev);
+            if (r) {
+                DRM_ERROR("hw_init of IP block <%s> failed %d\n",
+                          adev->ip_blocks[i].version->funcs->name, r);
+                return r;
+            }
+
+            adev->ip_blocks[i].status.hw = true;
+        }
+
         if (!adev->ip_blocks[i].status.valid)
             continue;
         /* skip CG for VCE/UVD, it's handled specially */
@@ -1823,7 +1839,7 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
  *
  * @work: work_struct
  *
- * Work handler for amdgpu_device_ip_late_set_cg_state. We put the
+ * Work handler for amdgpu_late_init_ip_blocks. We put the
  * clockgating setup into a worker thread to speed up driver init and
  * resume from suspend.
  */
@@ -1831,7 +1847,7 @@ static void amdgpu_device_ip_late_init_func_handler(struct work_struct *work)
 {
     struct amdgpu_device *adev =
         container_of(work, struct amdgpu_device, late_init_work.work);
-    amdgpu_device_ip_late_set_cg_state(adev);
+    amdgpu_late_init_ip_blocks(adev);
 }

 /**
-- 
2.7.4
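The two-pass scheme the patch proposes can be sketched without any kernel types. This is a simplified model under assumptions: the `fake_*` names and the enum are hypothetical, hw_init is reduced to setting a flag, and error handling is left out. It only shows how the early pass skips the slow blocks and the late pass picks up exactly those that are still uninitialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical block types; UVD and VCE are the slow ones in the patch. */
enum fake_type { FAKE_GFX, FAKE_UVD, FAKE_VCE };

struct fake_block {
    enum fake_type type;
    bool hw;  /* set once the (simulated) hw_init has run */
};

static bool fake_is_slow(const struct fake_block *b)
{
    return b->type == FAKE_UVD || b->type == FAKE_VCE;
}

/* Early pass: skip blocks that are already initialized or slow.
 * Returns the number of blocks initialized in this pass. */
static int fake_early_init(struct fake_block *blocks, size_t n)
{
    int done = 0;

    for (size_t i = 0; i < n; i++) {
        if (blocks[i].hw || fake_is_slow(&blocks[i]))
            continue;
        blocks[i].hw = true;
        done++;
    }
    return done;
}

/* Late pass (run from deferred work in the patch): initialize only the
 * slow blocks that have not been brought up yet. */
static int fake_late_init(struct fake_block *blocks, size_t n)
{
    int done = 0;

    for (size_t i = 0; i < n; i++) {
        if (!blocks[i].hw && fake_is_slow(&blocks[i])) {
            blocks[i].hw = true;
            done++;
        }
    }
    return done;
}
```

Because the late pass checks the `hw` flag first, running it a second time initializes nothing, which mirrors the `!status.hw` condition in the patch.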
Re: [PATCH 3/3] drm/amdgpu: remove AMDGPU_GEM_CREATE_NO_FALLBACK handling from CS again
On 2018-04-10 17:00, Christian König wrote:
> On 10.04.2018 at 04:43, zhoucm1 wrote:
>> On 2018-04-09 18:19, Christian König wrote:
>>> That should purely be handled by preferred/allowed domains.
>>
>> Although this flag isn't exported to user space yet, I'm curious how
>> preferred/allowed domains handle no_fallback?
>> IIRC, currently, our driver will always add GTT fallback for VRAM bo.
>
> And that is intentional.
>
> Going a step further back I think moving the fallback handling into
> amdgpu_bo_do_create() and adding the flag was a mistake to begin with.
>
> Going to send patches to revert all this and further clean the stuff up.

If you are able to not change the preferred domain on fallback, that's
no problem for me.

David Zhou

> Regards,
> Christian.
>
>> Regards,
>> David Zhou
>>
>>> Signed-off-by: Christian König
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 +--
>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> index 68af2f878bc9..e1756b68a17b 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> @@ -385,8 +385,7 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p,
>>>          amdgpu_bo_in_cpu_visible_vram(bo))
>>>          p->bytes_moved_vis += ctx.bytes_moved;
>>>
>>> -    if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains &&
>>> -        !(bo->flags & AMDGPU_GEM_CREATE_NO_FALLBACK)) {
>>> +    if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
>>>          domain = bo->allowed_domains;
>>>          goto retry;
>>>      }
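The retry logic left after the patch can be modelled in a few lines. This is a sketch under assumptions, not the amdgpu implementation: the `fake_*` names, the domain bit values, and the placement function are all hypothetical. It shows one reading of "handled by preferred/allowed domains": a buffer that must not fall back simply has allowed domains equal to its preferred domains, so the retry condition never fires.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical domain bits for this sketch only. */
#define FAKE_DOMAIN_GTT  0x2u
#define FAKE_DOMAIN_VRAM 0x4u

struct fake_bo {
    uint32_t preferred_domains;
    uint32_t allowed_domains;
};

/* Hypothetical placement: a VRAM-only request fails when VRAM is full. */
static int fake_place(uint32_t domain, int vram_full)
{
    if (domain == FAKE_DOMAIN_VRAM && vram_full)
        return -ENOMEM;
    return 0;
}

/* Try the preferred domains first; on -ENOMEM, widen to the allowed
 * domains once and retry. On success, *placed holds the domain used. */
static int fake_validate(const struct fake_bo *bo, int vram_full,
                         uint32_t *placed)
{
    uint32_t domain = bo->preferred_domains;
    int r;

retry:
    r = fake_place(domain, vram_full);
    if (r == -ENOMEM && domain != bo->allowed_domains) {
        domain = bo->allowed_domains;
        goto retry;
    }
    if (r == 0)
        *placed = domain;
    return r;
}
```

With preferred = VRAM and allowed = VRAM | GTT the sketch falls back when VRAM is full; with allowed = VRAM only it returns -ENOMEM, with no separate flag involved.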
Re: [PATCH xf86-video-amdgpu 04/19] Remove drmCheckModesettingSupported and kernel module loading
On 2018-04-10 11:20 AM, Emil Velikov wrote:
> On 10 April 2018 at 09:26, Michel Dänzer wrote:
>> On 2018-04-04 04:29 PM, Emil Velikov wrote:
>>> From: Emil Velikov
>>>
>>> The former of these is a UMS artefact which gives incorrect and
>>> misleading promise whether "KMS" is supported. Not to mention that
>>> AMDGPU is a only KMS driver.
>>>
>>> In a similar fashion xf86LoadKernelModule() is a relic of the times,
>>> where platforms had no scheme of detecting and loading the appropriate
>>> kernel module.
>>>
>>> Cc: Robert Millan
>>> Signed-off-by: Emil Velikov
>>> ---
>>> Robert, off the top of my head this should work with FreeBSD. Admittedly
>>> I'm not an expert on the platform. Please give it a test.
>>
>> I want to get confirmation from Robert that this will work on FreeBSD
>> now, since he explicitly restored the kernel module loading code in
>> https://cgit.freedesktop.org/xorg/driver/xf86-video-ati/commit/?id=bfbff3b246db509c820df17b8fcf5899882ffcfa
>> .
>>
> Fully agreed!. That's why I added him to the CC list.
>
> Throwing some ideas:
>  - If it's still needed can we keep it !Linux only?

The first drmCheckModesettingSupported call? Fine with me.

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
Re: [PATCH xf86-video-amdgpu 17/19] Store device_name as AMDGPUEntRec::master_node
On 10 April 2018 at 09:29, Michel Dänzer wrote:
> On 2018-04-04 04:29 PM, Emil Velikov wrote:
>> From: Emil Velikov
>>
>> Rename the variable to reflect what it is. Plus move it out of the dri2
>> section - it's used in dri2 and dri3.
>>
>> Signed-off-by: Emil Velikov
>
> [...]
>
>> diff --git a/src/amdgpu_probe.c b/src/amdgpu_probe.c
>> index 4959bd6..e9afe42 100644
>> --- a/src/amdgpu_probe.c
>> +++ b/src/amdgpu_probe.c
>> @@ -178,6 +178,10 @@ static Bool amdgpu_device_setup(ScrnInfoPtr pScrn,
>>      if (pAMDGPUEnt->fd < 0)
>>          return FALSE;
>>
>> +    pAMDGPUEnt->master_node = drmGetDeviceNameFromFd2(pAMDGPUEnt->fd);
>> +    if (pAMDGPUEnt->master_node)
>> +        goto error_amdgpu;
>
> This should be
>
>     if (!pAMDGPUEnt->master_node)
>
> shouldn't it?
>
>
> ... Which raises the question: How did you test these patches? :)
>
I mentioned it in the cover letter, but seems to have dropped it - they
are untested.
There's a r600 card close-by I could test with, but no amdgpu one :-\

-Emil
Re: [PATCH xf86-video-amdgpu 11/19] Don't leak a AMDGPUEntRec instance if amdgpu_device_setup fails
On 2018-04-10 11:47 AM, Emil Velikov wrote:
> On 10 April 2018 at 09:28, Michel Dänzer wrote:
>> On 2018-04-04 04:29 PM, Emil Velikov wrote:
>>> From: Emil Velikov
>>>
>>> Seems like we've been leaking this for years. It became more obvious
>>> with the recent refactoring.
>>>
>>> Signed-off-by: Emil Velikov
>>> ---
>>>  src/amdgpu_probe.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/src/amdgpu_probe.c b/src/amdgpu_probe.c
>>> index 537d44c..588891c 100644
>>> --- a/src/amdgpu_probe.c
>>> +++ b/src/amdgpu_probe.c
>>> @@ -243,6 +243,8 @@ amdgpu_probe(ScrnInfoPtr pScrn, int entity_num,
>>>      return TRUE;
>>>
>>>  error:
>>> +    free(pPriv->ptr);
>>> +    pPriv->ptr = NULL;
>>>      return FALSE;
>>>  }
>>>
>>>
>>
>> valgrind doesn't report a leak if I force this error path; presumably
>> Xorg frees the private after returning FALSE here.
>>
> Just double-checked and Xorg does not know anything about ptr. The
> only one who clears it up is AMDGPUFreeScreen_KMS.
>
> The magic (for this and the other 'leak') seems to be happening in
> xf86platformAddDevice. Namely:
>  - ::platformProbe is called via doPlatformProbe
>  - the driver explicitly calls xf86AllocateScreen, yet fails later on
>  - back in Xorg, the "if (old_screens == xf86NumGPUDrivers)" is false
>  - ::PreInit fails, ::configured is false
>  - xf86DeleteScreen() gets called, which dives into ::FreeScreen (aka
>    AMDGPUFreeScreen_KMS)
>
> Eventually, I could unwrap all that although it makes sense to keep
> things simpler. As effectively done by the patch.
>
> I believe you'll agree?

I'm afraid not. There's no leak because it's getting cleaned up as
designed, so there's no need for this change.

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
Recall: [PATCH] drm/amdgpu: defer initing UVD & VCE IP blocks
S, Shirish would like to recall the message, "[PATCH] drm/amdgpu: defer initing UVD & VCE IP blocks".
Re: [PATCH xf86-video-amdgpu 19/19] TODO
On 10 April 2018 at 09:30, Michel Dänzer wrote:
> On 2018-04-04 04:29 PM, Emil Velikov wrote:
>> From: Emil Velikov
>>
>> Signed-off-by: Emil Velikov
>> ---
>>  todo | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>  create mode 100644 todo
>>
>> diff --git a/todo b/todo
>> new file mode 100644
>> index 000..10c1ad5
>> --- /dev/null
>> +++ b/todo
>> @@ -0,0 +1,9 @@
>> + - on amdgpu_probe failure, the pScrn entry is leaked - missing api/examples?
>
> Might be similar to patch 11; does valgrind actually report a leak if
> you force this?
>
>
>> + - introduce xf86ConfigEntity and use it
>> + - remove embedded AMDGPUInfoRec::pEnt
>> + - consistently use gAMDGPUEntityIndex or getAMDGPUEntityIndex
>> + - consistently use of pEnt/entity_num -> pScrn->list[], AMDPRIV
>> + - kill off DRI_1_ DRICreatePCIBusID - demote again to DRI1 only in X codebase
>> + - compose bus string early & strcmp instead of device_match?
>> + - remove embedded AMDGPUInfoRec::PciInfo - reuse EntityInfoRec::chipset, GDevRec::chiIDi, amdgpu_gpu_info::asic_id or ...
>> + - use odev to fetch render_node?
>
> I'm afraid I don't really see these as important enough to be tracked
> like this.
>
Agreed - no reason to keep these in-tree. Idea was to gather feedback
on the topics.

One example:
Do we need the getAMDGPUEntityIndex helper, considering ~half of the
existing codebase uses it. Yet other half references
gAMDGPUEntityIndex directly.

Most of the above, seem to be a copy/paste from the radeon driver,
which in turn is a copy from (?) and the original commit lacks any
information :-\

-Emil
Re: [PATCH xf86-video-amdgpu 04/19] Remove drmCheckModesettingSupported and kernel module loading
On 10 April 2018 at 09:26, Michel Dänzer wrote:
> On 2018-04-04 04:29 PM, Emil Velikov wrote:
>> From: Emil Velikov
>>
>> The former of these is a UMS artefact which gives incorrect and
>> misleading promise whether "KMS" is supported. Not to mention that
>> AMDGPU is a only KMS driver.
>>
>> In a similar fashion xf86LoadKernelModule() is a relic of the times,
>> where platforms had no scheme of detecting and loading the appropriate
>> kernel module.
>>
>> Cc: Robert Millan
>> Signed-off-by: Emil Velikov
>> ---
>> Robert, off the top of my head this should work with FreeBSD. Admittedly
>> I'm not an expert on the platform. Please give it a test.
>
> I want to get confirmation from Robert that this will work on FreeBSD
> now, since he explicitly restored the kernel module loading code in
> https://cgit.freedesktop.org/xorg/driver/xf86-video-ati/commit/?id=bfbff3b246db509c820df17b8fcf5899882ffcfa
> .
>
Fully agreed!. That's why I added him to the CC list.

Throwing some ideas:
 - If it's still needed can we keep it !Linux only?
 - Wayland does not have a kernel module loading mechanism.
 - To prevent fan noise and/or card overheating, one really wants to
   load the kernel module early. Not when X starts ;-)

-Emil
Re: [PATCH xf86-video-amdgpu 02/19] Guard ODEV_ATTRIB_FD usage with the correct ifdef
On 2018-04-10 11:24 AM, Emil Velikov wrote:
> On 10 April 2018 at 09:27, Michel Dänzer wrote:
>> On 2018-04-04 04:29 PM, Emil Velikov wrote:
>>> From: Emil Velikov
>>>
>>> Signed-off-by: Emil Velikov
>>> ---
>>>  src/amdgpu_probe.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/src/amdgpu_probe.c b/src/amdgpu_probe.c
>>> index 075e5c1..e65c83b 100644
>>> --- a/src/amdgpu_probe.c
>>> +++ b/src/amdgpu_probe.c
>>> @@ -120,7 +120,7 @@ static int amdgpu_kernel_open_fd(ScrnInfoPtr pScrn,
>>>      char *busid;
>>>      int fd;
>>>
>>> -#ifdef XF86_PDEV_SERVER_FD
>>> +#ifdef ODEV_ATTRIB_FD
>>>      if (platform_dev) {
>>>          fd = xf86_get_platform_device_int_attrib(platform_dev,
>>>                                                   ODEV_ATTRIB_FD, -1);
>>>
>>
>> ODEV_ATTRIB_FD doesn't seem obviously more "correct" than
>> XF86_PDEV_SERVER_FD, since both were added in the same xserver commit,
>> and the latter might be helpful for understanding this is related to the
>> other code guarded by XF86_PDEV_SERVER_FD.
>>
> All the XF86_PDEV_SERVER_FD code is dropped with a later commit ;-)
> I could move this patch just after said commit, or you prefer to keep
> the original guard?

The latter, less churn. :)

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
Re: [PATCH 3/3] drm/amdgpu: remove AMDGPU_GEM_CREATE_NO_FALLBACK handling from CS again
On 10.04.2018 at 04:43, zhoucm1 wrote:
> On 2018-04-09 18:19, Christian König wrote:
>> That should purely be handled by preferred/allowed domains.
>
> Although this flag isn't exported to user space yet, I'm curious how
> preferred/allowed domains handle no_fallback?
> IIRC, currently, our driver will always add GTT fallback for VRAM bo.

And that is intentional.

Going a step further back I think moving the fallback handling into
amdgpu_bo_do_create() and adding the flag was a mistake to begin with.

Going to send patches to revert all this and further clean the stuff up.

Regards,
Christian.

> Regards,
> David Zhou
>
>> Signed-off-by: Christian König
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index 68af2f878bc9..e1756b68a17b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -385,8 +385,7 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p,
>>          amdgpu_bo_in_cpu_visible_vram(bo))
>>          p->bytes_moved_vis += ctx.bytes_moved;
>>
>> -    if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains &&
>> -        !(bo->flags & AMDGPU_GEM_CREATE_NO_FALLBACK)) {
>> +    if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
>>          domain = bo->allowed_domains;
>>          goto retry;
>>      }
Re: [PATCH xf86-video-amdgpu 11/19] Don't leak a AMDGPUEntRec instance if amdgpu_device_setup fails
On 10 April 2018 at 09:28, Michel Dänzer wrote:
> On 2018-04-04 04:29 PM, Emil Velikov wrote:
>> From: Emil Velikov
>>
>> Seems like we've been leaking this for years. It became more obvious
>> with the recent refactoring.
>>
>> Signed-off-by: Emil Velikov
>> ---
>>  src/amdgpu_probe.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/src/amdgpu_probe.c b/src/amdgpu_probe.c
>> index 537d44c..588891c 100644
>> --- a/src/amdgpu_probe.c
>> +++ b/src/amdgpu_probe.c
>> @@ -243,6 +243,8 @@ amdgpu_probe(ScrnInfoPtr pScrn, int entity_num,
>>      return TRUE;
>>
>>  error:
>> +    free(pPriv->ptr);
>> +    pPriv->ptr = NULL;
>>      return FALSE;
>>  }
>>
>>
>
> valgrind doesn't report a leak if I force this error path; presumably
> Xorg frees the private after returning FALSE here.
>
Just double-checked and Xorg does not know anything about ptr. The
only one who clears it up is AMDGPUFreeScreen_KMS.

The magic (for this and the other 'leak') seems to be happening in
xf86platformAddDevice. Namely:
 - ::platformProbe is called via doPlatformProbe
 - the driver explicitly calls xf86AllocateScreen, yet fails later on
 - back in Xorg, the "if (old_screens == xf86NumGPUDrivers)" is false
 - ::PreInit fails, ::configured is false
 - xf86DeleteScreen() gets called, which dives into ::FreeScreen (aka
   AMDGPUFreeScreen_KMS)

Eventually, I could unwrap all that although it makes sense to keep
things simpler. As effectively done by the patch.

I believe you'll agree?

-Emil
[PATCH umr] Fix VMID of chained IBs
We were using the VMID field literally when inside an IB it's
inherited instead

Signed-off-by: Tom St Denis
---
 src/lib/dump_ib.c     | 8 ++++----
 src/lib/ring_decode.c | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/lib/dump_ib.c b/src/lib/dump_ib.c
index 74a3241c309e..cbf859e3633b 100644
--- a/src/lib/dump_ib.c
+++ b/src/lib/dump_ib.c
@@ -39,10 +39,10 @@ void umr_dump_ib(struct umr_asic *asic, struct umr_ring_decoder *decoder)
     if (decoder->src.ib_addr == 0)
         printf("ring[%s%u%s]", BLUE, (unsigned)decoder->src.addr, RST);
     else
-        printf("IB[%s%u%s] at %s%d%s@%s0x%llx%s",
-            BLUE, (unsigned)decoder->src.addr, RST,
-            YELLOW, (int)decoder->src.vmid, RST,
-            YELLOW, (unsigned long long)decoder->src.ib_addr, RST);
+        printf("IB[%s%u%s@%s0x%llx%s + %s0x%x%s]",
+            BLUE, (int)decoder->src.vmid, RST,
+            YELLOW, (unsigned long long)decoder->src.ib_addr, RST,
+            YELLOW, (unsigned)decoder->src.addr * 4, RST);

     printf("\n");

diff --git a/src/lib/ring_decode.c b/src/lib/ring_decode.c
index c1d6bcb98bae..42265e0a74c9 100644
--- a/src/lib/ring_decode.c
+++ b/src/lib/ring_decode.c
@@ -540,7 +540,7 @@ static void print_decode_pm4_pkt3(struct umr_asic *asic, struct umr_ring_decoder
         break;
     case 2: printf("IB_SIZE:%s%lu%s, VMID: %s%lu%s", BLUE, BITS(ib, 0, 20), RST, BLUE, BITS(ib, 24, 32), RST);
         decoder->pm4.next_ib_state.ib_size = BITS(ib, 0, 20) * 4;
-        decoder->pm4.next_ib_state.ib_vmid = BITS(ib, 24, 32);
+        decoder->pm4.next_ib_state.ib_vmid = decoder->next_ib_info.vmid ? decoder->next_ib_info.vmid : BITS(ib, 24, 32);
         add_ib_pm4(decoder);
         break;
     default: printf("Invalid word for opcode 0x%02lx", (unsigned long)decoder->pm4.cur_opcode);
-- 
2.14.3
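The VMID selection in the ring_decode.c hunk can be isolated into a small sketch. The names below are hypothetical, and `FAKE_BITS` is an assumed helper that extracts bits [lo, hi) of a word; it is meant to behave like the BITS() macro used in the patch, but that equivalence is an assumption. The field positions (size in bits 0..19, VMID in bits 24..31) and the multiplication by 4 come from the quoted code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed helper: extract bits [lo, hi) of x. */
#define FAKE_BITS(x, lo, hi) \
    (((uint32_t)(x) >> (lo)) & ((1ULL << ((hi) - (lo))) - 1ULL))

/* IB size in bytes: the packet stores the size in dwords in bits 0..19. */
static uint32_t fake_ib_size_bytes(uint32_t ib_word)
{
    return (uint32_t)FAKE_BITS(ib_word, 0, 20) * 4;
}

/* VMID for the next IB: inside an IB the VMID is inherited, so a
 * non-zero inherited value takes precedence over the packet field. */
static uint32_t fake_next_ib_vmid(uint32_t inherited_vmid, uint32_t ib_word)
{
    uint32_t field = (uint32_t)FAKE_BITS(ib_word, 24, 32);

    return inherited_vmid ? inherited_vmid : field;
}
```

A word with VMID field 5 and size 100 decodes to 400 bytes; the VMID used is 5 when nothing is inherited and the inherited value otherwise.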
Re: [PATCH 1/2] drm/amdgpu/gmc: steal the appropriate amount of vram for fw hand-over (v2)
So with my and Alex's patches you still observe corruption? What if you remove my patch and keep Alex's patch?

Andrey

On 04/10/2018 06:53 AM, Huang Rui wrote:
On Mon, Apr 09, 2018 at 11:17:58AM -0400, Andrey Grodzovsky wrote:
OK, tested with DC disabled, no issues on resume (no visible corruption on display or errors in log). Now the display itself freezes after amdgpu is loaded with DC disabled, this happens only when BIOS in VGA mode, in console mode no such problem. Happens before my and Alex patches, looks like a separate issue. So anyway, if corruption would be there (beginning of VRAM and hence scanout FB corrupted), I should have seen it with grub in console mode where display is fine and not freezing.

Reproduce steps:
1. sudo modprobe amdgpu dc=0 ip_block_mask=0x7f
2. pm-suspend/resume two times.

You will see the start of vram is corrupted after S3 resume.

[ 570.343635] [drm] PCIE GART of 512M enabled (table at 0x00F4).
[ 570.343642] [drm] PSP is resuming...
[ 570.343713] gmc_v9_0_process_interrupt: 12 callbacks suppressed
[ 570.343715] amdgpu :03:00.0: [mmhub] VMC page fault (src_id:0 ring:0 vmid:0 pasid:0)
[ 570.343716] amdgpu :03:00.0: at page 0x00f60070 from 18
[ 570.343716] amdgpu :03:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010
[ 570.525510] [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
[ 570.525523] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block failed -62
[ 570.525536] [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-62).
[ 570.536704] e1000e: enp0s31f6 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
[ 570.540496] dpm_run_callback(): pci_pm_resume+0x0/0xa0 returns -62
[ 570.547879] e1000e :00:1f.6 enp0s31f6: 10/100 speed: disabling TSO
[ 570.555434] call :03:00.0+ returned -62 after 1973202 usecs
[ 570.689812] PM: Device :03:00.0 failed to resume async: error -62

I attached the whole dmesg.
Thanks,
Ray
Re: RFC for a render API to support adaptive sync and VRR
Adding Anthony and Aric who've been working on Freesync with DC on other OSes for a while. On 2018-04-09 05:45 PM, Manasi Navare wrote: > Thanks for initiating the discussion. Find my comments below: > > On Mon, Apr 09, 2018 at 04:00:21PM -0400, Harry Wentland wrote: >> Adding dri-devel, which I should've included from the start. >> >> On 2018-04-09 03:56 PM, Harry Wentland wrote: >>> === What is adaptive sync and VRR? === >>> >>> Adaptive sync has been part of the DisplayPort spec for a while now and >>> allows graphics adapters to drive displays with varying frame timings. VRR >>> (variable refresh rate) is essentially the same, but defined for HDMI. >>> >>> >>> >>> === Why allow variable frame timings? === >>> >>> Variable render times don't align with fixed refresh rates, leading to >>> stuttering, tearing, and/or input lag. >>> >>> e.g. (rc = render completion, dr = display refresh) >>> >>> rc B CDE F >>> dr A B C C D E F >>> >>> ^ ^ >>> frame missed >>> repeated display >>> twice refresh >>> >>> >>> >>> === Other use cases of adaptive sync >>> >>> Beside the variable render case, adaptive sync also allows adjustment of >>> refresh rates without a mode change. One such use case would be 24 Hz video. >>> > > One of the the advantages here when the render speed is slower than the > display refresh rate, since we are stretching the vertical blanking interval > the display adapters will follow "draw fast and then go idle" approach. This > gives power savings when render rate is lower than the display refresh rate. Are you talking about a use case, such as an idle desktop, where the renders are quite sporadic? > >>> >>> >>> === A DRM render API to support variable refresh rates === >>> >>> In order to benefit from adaptive sync and VRR userland needs a way to let >>> us know whether to vary frame timings or to target a different frame time. 
>>> These can be provided as atomic properties on a CRTC: >>> * bool variable_refresh_compatible >>> * int target_frame_duration_ns (nanosecond frame duration) >>> >>> This gives us the following cases: >>> >>> variable_refresh_compatible = 0, target_frame_duration_ns = 0 >>> * drive monitor at timing's normal refresh rate >>> >>> variable_refresh_compatible = 1, target_frame_duration_ns = 0 >>> * send new frame to monitor as soon as it's available, if within min/max >>> of monitor's reported capabilities >>> >>> variable_refresh_compatible = 0/1, target_frame_duration_ns = > 0 >>> * send new frame to monitor with the specified target_frame_duration_ns >>> >>> When a target_frame_duration_ns or variable_refresh_compatible cannot be >>> supported the atomic check will reject the commit. >>> > > What I would like is two sets of properties on a CRTC or preferably on a > connector: > > KMD properties that UMD can query: > * vrr_capable - This will be an immutable property for exposing hardware's > capability of supporting VRR. This will be set by the kernel after > reading the EDID mode information and monitor range capabilities. > * vrr_vrefresh_max, vrr_vrefresh_min - To expose the min and max refresh > rates supported. > These properties are optional and will be created and attached to the DP/eDP > connector when the connector > is getting intialized. > If we're talking about the properties from the EDID these might not necessarily align with a currently selected mode, which might have a refresh rate lower than the vrr_refresh_max, requiring us to cap it at that. In some scenarios we also might do low framerate compensation [1] where we do magic to allow the framerate to drop below the supported range. I think if a vrr_refresh_max/min are exposed to UMD these should really be only for informational purposes, in which case it might make more sense to expose them through sysfs or even debugfs entries. 
[1] https://www.amd.com/Documents/freesync-lfc.pdf > Properties that you mentioned above that the UMD can set before kernel can > enable VRR functionality > *bool vrr_enable or vrr_compatible > target_frame_duration_ns > > The monitor only specifies the monitor range through EDID. Apart from this > should we also need to scan the modes and check > if there are modes that have the same pixel clock and horizontal timings but > variable vertical totals? > I'm not sure about the VRR spec, but for adaptive sync we should only consider the range limits specified in the EDID and allow adaptive sync for modes within that range. > I have RFC patches for all the above mentioned. If we get a > concensus/agreement on the above properties and method to check > monitor's VRR capability, I can submit those patches atleast as RFC. > That sounds great. I wouldn't mind trying those patches and then working together to arrive at
Re: [PATCH umr] Fix VMID of chained IBs
On 10.04.2018 at 17:23, Tom St Denis wrote:
> We were using the VMID field literally when inside an IB it's inherited instead
>
> Signed-off-by: Tom St Denis

Acked-by: Christian König

> ---
>  src/lib/dump_ib.c     | 8 ++++----
>  src/lib/ring_decode.c | 2 +-
>  2 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/src/lib/dump_ib.c b/src/lib/dump_ib.c
> index 74a3241c309e..cbf859e3633b 100644
> --- a/src/lib/dump_ib.c
> +++ b/src/lib/dump_ib.c
> @@ -39,10 +39,10 @@ void umr_dump_ib(struct umr_asic *asic, struct umr_ring_decoder *decoder)
>      if (decoder->src.ib_addr == 0)
>          printf("ring[%s%u%s]", BLUE, (unsigned)decoder->src.addr, RST);
>      else
> -        printf("IB[%s%u%s] at %s%d%s@%s0x%llx%s",
> -            BLUE, (unsigned)decoder->src.addr, RST,
> -            YELLOW, (int)decoder->src.vmid, RST,
> -            YELLOW, (unsigned long long)decoder->src.ib_addr, RST);
> +        printf("IB[%s%u%s@%s0x%llx%s + %s0x%x%s]",
> +            BLUE, (int)decoder->src.vmid, RST,
> +            YELLOW, (unsigned long long)decoder->src.ib_addr, RST,
> +            YELLOW, (unsigned)decoder->src.addr * 4, RST);
>      printf("\n");
> diff --git a/src/lib/ring_decode.c b/src/lib/ring_decode.c
> index c1d6bcb98bae..42265e0a74c9 100644
> --- a/src/lib/ring_decode.c
> +++ b/src/lib/ring_decode.c
> @@ -540,7 +540,7 @@ static void print_decode_pm4_pkt3(struct umr_asic *asic, struct umr_ring_decoder
>                  break;
>          case 2: printf("IB_SIZE:%s%lu%s, VMID: %s%lu%s", BLUE, BITS(ib, 0, 20), RST, BLUE, BITS(ib, 24, 32), RST);
>                  decoder->pm4.next_ib_state.ib_size = BITS(ib, 0, 20) * 4;
> -                decoder->pm4.next_ib_state.ib_vmid = BITS(ib, 24, 32);
> +                decoder->pm4.next_ib_state.ib_vmid = decoder->next_ib_info.vmid ? decoder->next_ib_info.vmid : BITS(ib, 24, 32);
>                  add_ib_pm4(decoder);
>                  break;
>          default: printf("Invalid word for opcode 0x%02lx", (unsigned long)decoder->pm4.cur_opcode);
Re: RFC for a render API to support adaptive sync and VRR
Am 10.04.2018 um 17:08 schrieb Harry Wentland: On 2018-04-10 03:37 AM, Michel Dänzer wrote: On 2018-04-10 08:45 AM, Christian König wrote: Am 09.04.2018 um 23:45 schrieb Manasi Navare: Thanks for initiating the discussion. Find my comments below: On Mon, Apr 09, 2018 at 04:00:21PM -0400, Harry Wentland wrote: On 2018-04-09 03:56 PM, Harry Wentland wrote: === A DRM render API to support variable refresh rates === In order to benefit from adaptive sync and VRR userland needs a way to let us know whether to vary frame timings or to target a different frame time. These can be provided as atomic properties on a CRTC: * bool variable_refresh_compatible * int target_frame_duration_ns (nanosecond frame duration) This gives us the following cases: variable_refresh_compatible = 0, target_frame_duration_ns = 0 * drive monitor at timing's normal refresh rate variable_refresh_compatible = 1, target_frame_duration_ns = 0 * send new frame to monitor as soon as it's available, if within min/max of monitor's reported capabilities variable_refresh_compatible = 0/1, target_frame_duration_ns = > 0 * send new frame to monitor with the specified target_frame_duration_ns When a target_frame_duration_ns or variable_refresh_compatible cannot be supported the atomic check will reject the commit. What I would like is two sets of properties on a CRTC or preferably on a connector: KMD properties that UMD can query: * vrr_capable - This will be an immutable property for exposing hardware's capability of supporting VRR. This will be set by the kernel after reading the EDID mode information and monitor range capabilities. * vrr_vrefresh_max, vrr_vrefresh_min - To expose the min and max refresh rates supported. These properties are optional and will be created and attached to the DP/eDP connector when the connector is getting intialized. Mhm, aren't those properties actually per mode and not per CRTC/connector? 
Properties that you mentioned above that the UMD can set before kernel can enable VRR functionality *bool vrr_enable or vrr_compatible target_frame_duration_ns Yeah, that certainly makes sense. But target_frame_duration_ns is a bad name/semantics. We should use an absolute timestamp where the frame should be presented, otherwise you could run into a bunch of trouble with IOCTL restarts or missed blanks. Also, a fixed target frame duration isn't suitable even for video playback, due to drift between the video and audio clocks. Time-based presentation seems to be the right approach for preventing micro-stutter in games as well, Croteam developers have been researching this. I'm not sure if the driver can ever give a guarantee of the exact time a flip occurs. What we have control over with our HW is frame duration. Sounds like you misunderstood what we mean here. The driver does not need to give an exact guarantee that a flip happens at that time. It should just not flip before that specific time. E.g. when we missed a VBLANK your approach would still wait for the specific amount of time, while an absolute timestamp would mean to flip as soon as possible after that timestamp passed. As Michel noted that is also exactly what video players need. Are Croteam devs trying to predict render times? I'm not sure how that would work. We've had bad experience in the past with games that try to do framepacing as that's usually not accurate and tends to lead to more problems than benefits. As far as I understand that is just a regulated feedback system, e.g. the application records the timestamps of the last three frames (or so) and then uses that + margin to as world time for the 3D rendering. When the application has finished sending all rendering commands it sends the frame to be displayed exactly with that timestamp as well. The timestamp when the frame was actually displayed is then used again as input to the algorithm. Regards, Christian. 
Harry
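[Editorial sketch] The difference Christian describes between a relative frame duration and an absolute "do not flip before" timestamp can be shown with two small functions. This is a simplified model with hypothetical names: it ignores the monitor's min/max refresh limits and treats a flip as possible at any instant once the frame is ready.

```c
#include <stdint.h>

static int64_t max_i64(int64_t a, int64_t b)
{
    return a > b ? a : b;
}

/* Relative semantics: wait a full duration after the previous flip actually
 * happened, so a late flip pushes every following flip later as well. */
static int64_t next_flip_relative(int64_t prev_flip_ns, int64_t duration_ns,
                                  int64_t ready_ns)
{
    return max_i64(prev_flip_ns + duration_ns, ready_ns);
}

/* Absolute semantics: never flip before target_ns, but flip as soon as
 * possible once that timestamp has passed and the frame is ready. */
static int64_t next_flip_absolute(int64_t target_ns, int64_t ready_ns)
{
    return max_i64(target_ns, ready_ns);
}
```

With a 40 ms cadence, a frame that becomes ready 10 ms late flips at 50 ms under both models. The following frame (ready at 70 ms) then flips at 90 ms under the relative model but at its scheduled 80 ms under the absolute one, which is the catch-up behaviour argued for in the message.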
[PATCH 1/3] drm/amdgpu: revert "add new bo flag that indicates BOs don't need fallback (v2)"
This reverts commit 6f51d28bfe8e1a676de5cd877639245bed3cc818.

Makes fallback handling too complicated. This is just a feature for the GEM interface and shouldn't leak into the core BO create function.

Signed-off-by: Christian König
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c     | 3 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 +----
 include/uapi/drm/amdgpu_drm.h              | 2 --
 3 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 68af2f878bc9..e1756b68a17b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -385,8 +385,7 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p,
         amdgpu_bo_in_cpu_visible_vram(bo))
         p->bytes_moved_vis += ctx.bytes_moved;

-    if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains &&
-        !(bo->flags & AMDGPU_GEM_CREATE_NO_FALLBACK)) {
+    if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
         domain = bo->allowed_domains;
         goto retry;
     }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 9e23d6f6f3f3..04d6830347ec 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -388,8 +388,6 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev, unsigned long size,
     drm_gem_private_object_init(adev->ddev, &bo->gem_base, size);
     INIT_LIST_HEAD(&bo->shadow_list);
     INIT_LIST_HEAD(&bo->va);
-    bo->preferred_domains = preferred_domains;
-    bo->allowed_domains = allowed_domains;

     bo->flags = flags;

@@ -426,8 +424,7 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev, unsigned long size,
     r = ttm_bo_init_reserved(&adev->mman.bdev, &bo->tbo, size, type,
                  &bo->placement, page_align, &ctx, acc_size,
                  NULL, resv, &amdgpu_ttm_bo_destroy);
-    if (unlikely(r && r != -ERESTARTSYS) && type == ttm_bo_type_device &&
-        !(flags & AMDGPU_GEM_CREATE_NO_FALLBACK)) {
+    if (unlikely(r && r != -ERESTARTSYS) && type == ttm_bo_type_device) {
         if (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
             flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
             goto retry;
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 80665715e651..0087799962cf 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -98,8 +98,6 @@ extern "C" {
 #define AMDGPU_GEM_CREATE_VM_ALWAYS_VALID   (1 << 6)
 /* Flag that BO sharing will be explicitly synchronized */
 #define AMDGPU_GEM_CREATE_EXPLICIT_SYNC     (1 << 7)
-/* Flag that BO doesn't need fallback */
-#define AMDGPU_GEM_CREATE_NO_FALLBACK       (1 << 8)

 struct drm_amdgpu_gem_create_in {
     /** the requested memory size */
--
2.14.1
Re: RFC for a render API to support adaptive sync and VRR
On 2018-04-10 03:37 AM, Michel Dänzer wrote: > On 2018-04-10 08:45 AM, Christian König wrote: >> Am 09.04.2018 um 23:45 schrieb Manasi Navare: >>> Thanks for initiating the discussion. Find my comments below: >>> On Mon, Apr 09, 2018 at 04:00:21PM -0400, Harry Wentland wrote: On 2018-04-09 03:56 PM, Harry Wentland wrote: > > === A DRM render API to support variable refresh rates === > > In order to benefit from adaptive sync and VRR userland needs a way > to let us know whether to vary frame timings or to target a > different frame time. These can be provided as atomic properties on > a CRTC: > * bool variable_refresh_compatible > * int target_frame_duration_ns (nanosecond frame duration) > > This gives us the following cases: > > variable_refresh_compatible = 0, target_frame_duration_ns = 0 > * drive monitor at timing's normal refresh rate > > variable_refresh_compatible = 1, target_frame_duration_ns = 0 > * send new frame to monitor as soon as it's available, if within > min/max of monitor's reported capabilities > > variable_refresh_compatible = 0/1, target_frame_duration_ns = > 0 > * send new frame to monitor with the specified > target_frame_duration_ns > > When a target_frame_duration_ns or variable_refresh_compatible > cannot be supported the atomic check will reject the commit. > >>> What I would like is two sets of properties on a CRTC or preferably on >>> a connector: >>> >>> KMD properties that UMD can query: >>> * vrr_capable - This will be an immutable property for exposing >>> hardware's capability of supporting VRR. This will be set by the >>> kernel after >>> reading the EDID mode information and monitor range capabilities. >>> * vrr_vrefresh_max, vrr_vrefresh_min - To expose the min and max >>> refresh rates supported. >>> These properties are optional and will be created and attached to the >>> DP/eDP connector when the connector >>> is getting intialized. >> >> Mhm, aren't those properties actually per mode and not per CRTC/connector? 
>> >>> Properties that you mentioned above that the UMD can set before kernel >>> can enable VRR functionality >>> *bool vrr_enable or vrr_compatible >>> target_frame_duration_ns >> >> Yeah, that certainly makes sense. But target_frame_duration_ns is a bad >> name/semantics. >> >> We should use an absolute timestamp where the frame should be presented, >> otherwise you could run into a bunch of trouble with IOCTL restarts or >> missed blanks. > > Also, a fixed target frame duration isn't suitable even for video > playback, due to drift between the video and audio clocks. > > Time-based presentation seems to be the right approach for preventing > micro-stutter in games as well, Croteam developers have been researching > this. > I'm not sure if the driver can ever give a guarantee of the exact time a flip occurs. What we have control over with our HW is frame duration. Are Croteam devs trying to predict render times? I'm not sure how that would work. We've had bad experience in the past with games that try to do framepacing as that's usually not accurate and tends to lead to more problems than benefits. Harry > ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
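[Editorial sketch] The three cases listed in the quoted RFC for `variable_refresh_compatible` and `target_frame_duration_ns` reduce to a small decision function. The enum and function names are hypothetical; only the two property names come from the proposal, and the sketch does not model the atomic check that may reject an unsupported combination.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the three behaviours described in the RFC. */
enum refresh_policy {
    REFRESH_FIXED_NOMINAL,   /* drive monitor at the timing's normal refresh rate */
    REFRESH_VARIABLE_ASAP,   /* send each new frame as soon as it is available */
    REFRESH_TARGET_DURATION  /* present frames with the requested duration */
};

/* Map the two proposed CRTC properties onto a policy.
 * A non-zero target duration wins regardless of the boolean, matching the
 * "variable_refresh_compatible = 0/1, target_frame_duration_ns > 0" case. */
static enum refresh_policy pick_policy(bool variable_refresh_compatible,
                                       uint64_t target_frame_duration_ns)
{
    if (target_frame_duration_ns > 0)
        return REFRESH_TARGET_DURATION;
    return variable_refresh_compatible ? REFRESH_VARIABLE_ASAP
                                       : REFRESH_FIXED_NOMINAL;
}
```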
[PATCH 2/3] drm/amdgpu: revert "Don't change preferred domian when fallback GTT v6"
This reverts commit 7d1ca1325260a9e9329b10a21e3692e6f188936f. Makes fallback handling to complicated. This is just a feature for the GEM interface and shouldn't leak into the core BO create function. The intended change to preserve the preferred domains is implemented in a follow up patch. Signed-off-by: Christian König--- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c| 16 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 37 +++--- 2 files changed, 27 insertions(+), 26 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c index 28c2706e48d7..46b9ea4e6103 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c @@ -56,11 +56,23 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size, alignment = PAGE_SIZE; } +retry: r = amdgpu_bo_create(adev, size, alignment, initial_domain, flags, type, resv, ); if (r) { - DRM_DEBUG("Failed to allocate GEM object (%ld, %d, %u, %d)\n", - size, initial_domain, alignment, r); + if (r != -ERESTARTSYS) { + if (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) { + flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED; + goto retry; + } + + if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) { + initial_domain |= AMDGPU_GEM_DOMAIN_GTT; + goto retry; + } + DRM_DEBUG("Failed to allocate GEM object (%ld, %d, %u, %d)\n", + size, initial_domain, alignment, r); + } return r; } *obj = >gem_base; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 04d6830347ec..6d08cde8443c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -356,7 +356,6 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev, unsigned long size, struct amdgpu_bo *bo; unsigned long page_align; size_t acc_size; - u32 domains, preferred_domains, allowed_domains; int r; page_align = roundup(byte_align, PAGE_SIZE) >> PAGE_SHIFT; @@ -370,24 +369,22 @@ static int 
amdgpu_bo_do_create(struct amdgpu_device *adev, unsigned long size, acc_size = ttm_bo_dma_acc_size(>mman.bdev, size, sizeof(struct amdgpu_bo)); - preferred_domains = domain & (AMDGPU_GEM_DOMAIN_VRAM | - AMDGPU_GEM_DOMAIN_GTT | - AMDGPU_GEM_DOMAIN_CPU | - AMDGPU_GEM_DOMAIN_GDS | - AMDGPU_GEM_DOMAIN_GWS | - AMDGPU_GEM_DOMAIN_OA); - allowed_domains = preferred_domains; - if (type != ttm_bo_type_kernel && - allowed_domains == AMDGPU_GEM_DOMAIN_VRAM) - allowed_domains |= AMDGPU_GEM_DOMAIN_GTT; - domains = preferred_domains; -retry: bo = kzalloc(sizeof(struct amdgpu_bo), GFP_KERNEL); if (bo == NULL) return -ENOMEM; drm_gem_private_object_init(adev->ddev, >gem_base, size); INIT_LIST_HEAD(>shadow_list); INIT_LIST_HEAD(>va); + bo->preferred_domains = domain & (AMDGPU_GEM_DOMAIN_VRAM | +AMDGPU_GEM_DOMAIN_GTT | +AMDGPU_GEM_DOMAIN_CPU | +AMDGPU_GEM_DOMAIN_GDS | +AMDGPU_GEM_DOMAIN_GWS | +AMDGPU_GEM_DOMAIN_OA); + bo->allowed_domains = bo->preferred_domains; + if (type != ttm_bo_type_kernel && + bo->allowed_domains == AMDGPU_GEM_DOMAIN_VRAM) + bo->allowed_domains |= AMDGPU_GEM_DOMAIN_GTT; bo->flags = flags; @@ -420,20 +417,12 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev, unsigned long size, #endif bo->tbo.bdev = >mman.bdev; - amdgpu_ttm_placement_from_domain(bo, domains); + amdgpu_ttm_placement_from_domain(bo, domain); + r = ttm_bo_init_reserved(>mman.bdev, >tbo, size, type, >placement, page_align, , acc_size, NULL, resv, _ttm_bo_destroy); - if (unlikely(r && r != -ERESTARTSYS) && type == ttm_bo_type_device) { - if (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) { - flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED; - goto retry; - } else if (domains !=
[PATCH] drm/amdgpu/gfx9: cache DB_DEBUG2 and make it available to userspace
Userspace needs to query this value to work around a hw bug in certain cases.

Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   | 2 ++
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 1 +
 drivers/gpu/drm/amd/amdgpu/soc15.c    | 3 +++
 3 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index ed5c22bfa3e5..09fa37e9a840 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -867,6 +867,8 @@ struct amdgpu_gfx_config {

     /* gfx configure feature */
     uint32_t double_offchip_lds_buf;
+    /* cached value of DB_DEBUG2 */
+    uint32_t db_debug2;
 };

 struct amdgpu_cu_info {
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 9d39fd5b1822..66bd6c1c82c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -1600,6 +1600,7 @@ static void gfx_v9_0_gpu_init(struct amdgpu_device *adev)

     gfx_v9_0_setup_rb(adev);
     gfx_v9_0_get_cu_info(adev, &adev->gfx.cu_info);
+    adev->gfx.config.db_debug2 = RREG32_SOC15(GC, 0, mmDB_DEBUG2);

     /* XXX SH_MEM regs */
     /* where to put LDS, scratch, GPUVM in FSA64 space */
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 2e9ebe8db5cc..65e781f05c24 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -287,6 +287,7 @@ static struct soc15_allowed_register_entry soc15_allowed_read_registers[] = {
     { SOC15_REG_ENTRY(GC, 0, mmCP_CPC_STALLED_STAT1)},
     { SOC15_REG_ENTRY(GC, 0, mmCP_CPC_STATUS)},
     { SOC15_REG_ENTRY(GC, 0, mmGB_ADDR_CONFIG)},
+    { SOC15_REG_ENTRY(GC, 0, mmDB_DEBUG2)},
 };

 static uint32_t soc15_read_indexed_register(struct amdgpu_device *adev, u32 se_num,
@@ -315,6 +316,8 @@ static uint32_t soc15_get_register_value(struct amdgpu_device *adev,
     } else {
         if (reg_offset == SOC15_REG_OFFSET(GC, 0, mmGB_ADDR_CONFIG))
             return adev->gfx.config.gb_addr_config;
+        else if (reg_offset == SOC15_REG_OFFSET(GC, 0, mmDB_DEBUG2))
+            return adev->gfx.config.db_debug2;
         return RREG32(reg_offset);
     }
 }
--
2.13.6
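[Editorial sketch] The shape of this patch (add the register to an allow-list, cache its value at init, and serve the cached copy on query) can be illustrated in isolation. Everything below is hypothetical: the register offsets, the `mmio_read()` stub and the function names are placeholders and not the amdgpu API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register offsets, for illustration only. */
enum {
    REG_GB_ADDR_CONFIG = 0x10,
    REG_DB_DEBUG2 = 0x20,
    REG_STATUS = 0x30
};

/* Values cached once at init time. */
struct gfx_config {
    uint32_t gb_addr_config;
    uint32_t db_debug2;
};

/* Registers that a query from userspace is allowed to read. */
static const uint32_t allowed_read_registers[] = {
    REG_STATUS,
    REG_GB_ADDR_CONFIG,
    REG_DB_DEBUG2,
};

/* Stub standing in for a real hardware register read. */
static uint32_t mmio_read(uint32_t reg_offset)
{
    return 0xdead0000u | reg_offset;
}

/* Returns 0 and stores the value on success, -1 if the register is not
 * on the allow-list. Cached registers never touch the hardware path. */
static int read_register(const struct gfx_config *cfg, uint32_t reg_offset,
                         uint32_t *value)
{
    size_t n = sizeof(allowed_read_registers) / sizeof(allowed_read_registers[0]);
    size_t i;

    for (i = 0; i < n; i++) {
        if (allowed_read_registers[i] != reg_offset)
            continue;
        if (reg_offset == REG_GB_ADDR_CONFIG)
            *value = cfg->gb_addr_config;
        else if (reg_offset == REG_DB_DEBUG2)
            *value = cfg->db_debug2;
        else
            *value = mmio_read(reg_offset);
        return 0;
    }
    return -1;
}
```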
Re: RFC for a render API to support adaptive sync and VRR
On 2018-04-10 07:44 AM, Chris Wilson wrote:
> Quoting Christian König (2018-04-10 07:45:04)
>> On 09.04.2018 at 23:45, Manasi Navare wrote:
>>> Properties that you mentioned above that the UMD can set before kernel can
>>> enable VRR functionality
>>> *bool vrr_enable or vrr_compatible
>>> target_frame_duration_ns
>>
>> Yeah, that certainly makes sense. But target_frame_duration_ns is a bad
>> name/semantics.
>>
>> We should use an absolute timestamp where the frame should be presented,
>> otherwise you could run into a bunch of trouble with IOCTL restarts or
>> missed blanks.
>
> Hear, hear. I was disappointed not to see this be the starting point of
> the conversation. Imo, the uABI should be in terms of absolutes with the
> drivers mapping that onto HW and reporting back the discrepancies.

I think it's just that some of us that work on KMD display drivers have had our work primarily guided by different use cases, such as gaming, which has then been extended to provide a better experience for video as well. We might not be as intimately aware of some of the work that's been done on video APIs and the pains involved in it but are always happy to learn and work together toward the best solution.

Harry

> -Chris
>
Re: RFC for a render API to support adaptive sync and VRR
On 2018-04-10 12:37 PM, Nicolai Hähnle wrote:
> On 10.04.2018 18:26, Cyr, Aric wrote:
>> That presentation time doesn’t need to come to kernel as such and actually
>> is fine as-is completely decoupled from adaptive sync. As long as the video
>> player provides the new target_frame_duration_ns on the flip, then the
>> driver/HW will target the correct refresh rate to match the source content.
>> This simply means that more often than not the video presents will align
>> very close to the monitor’s refresh rate, resulting in a smooth video
>> experience. For example, if you have 24Hz content, and an adaptive sync
>> monitor with a range of 40-60Hz, once the target_frame_duration_ns is
>> provided, driver can configure the monitor to a fixed refresh rate of 48Hz
>> causing all video presents to be frame-doubled in hardware without further
>> application intervention.
>
> What about multi-monitor displays, where you want to play an animation that
> spans multiple monitors. You really want all monitors to flip at the same
> time.
>

Syncing two monitors is what we currently do with our timing sync feature where we drive two monitors from the same clock source if they use the same timing. That, along with VSync, guarantees all monitors flip at the same time. I'm not sure if it works with adaptive sync.

Are you suggesting to use adaptive sync to do an in-SW sync of multiple displays?

> I understand where you're coming from, but the perspective of refusing a
> target presentation time is a rather selfish one of "we're the display, we're
> the most important, everybody else has to adjust to us" (e.g. to get perfect
> sync between video and audio). I admit I'm phrasing it in a bit of an extreme
> way, but perhaps this phrasing helps to see why that's just not a very good
> attitude to have.
>

I really dislike arguing on an emotional basis and would rather not use words such as "selfish" in this discussion. I believe all of us want to come to the best possible solution based on technical merit.

> All devices (whether video or audio or whatever) should be able to receive a
> target presentation time.
>

I'm not sure I understand the full extent of the problem as I'm not really familiar with how this is currently done, but isn't the problem the same without variable refresh rates (or targeted refresh rates)? A Video API would still have to somehow synchronize audio and video to 60Hz on most monitors today. What would change if we gave user mode the ability to suggest we flip at video frame rates (24/48Hz)?

Harry

> If the application can make your life a bit easier by providing the targeted
> refresh rate as additional *hint-only* parameter (like in your 24 Hz --> 48
> Hz doubling example), then maybe we should indeed consider that.
>
> Cheers,
> Nicolai
>
>
>>
>>
>> For video games we have a similar situation where a frame is rendered for a
>> certain world time and in the ideal case we would actually display the frame
>> at this world time.
>>
>> That seems like it would be a poorly written game that flips like that,
>> unless they are explicitly trying to throttle the framerate for some reason.
>> When a game presents a completed frame, they’d like that to happen as soon
>> as possible. This is why non-VSYNC modes of flipping exist and many games
>> leverage this. Adaptive sync gives you the lower latency of immediate flips
>> without the tearing imposed by using non-VSYNC flipping.
>>
>>
>> I mean we have the guys from Valve on this mailing list so I think we should
>> just get the feedback from them and see what they prefer.
>>
>> We have thousands of Steam games on other OSes that work great already, but
>> we’d certainly be interested in any additional feedback. My guess is they
>> prefer to “do nothing” and let driver/HW manage it, otherwise you exempt all
>> existing games from supporting adaptive sync without a rewrite or update.
>>
>>
>> Regards,
>> Christian.
>>
>>
>> -Aric
>>
>
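[Editorial sketch] The 24 Hz example quoted in this message (a 40-60 Hz adaptive sync monitor driven at 48 Hz so every video frame is shown twice) amounts to picking an integer multiple of the content rate that falls inside the monitor's range. The function below is a hypothetical illustration of that arithmetic, assuming the lowest fitting multiple is preferred; it is not taken from any driver.

```c
#include <stdint.h>

/* Return the lowest integer multiple of content_hz that lies within
 * [min_hz, max_hz], or 0 if no multiple fits the monitor's range. */
static uint32_t pick_refresh_hz(uint32_t content_hz, uint32_t min_hz,
                                uint32_t max_hz)
{
    uint32_t mult;

    if (content_hz == 0 || min_hz > max_hz)
        return 0;

    for (mult = 1; ; mult++) {
        uint64_t hz = (uint64_t)content_hz * mult;

        if (hz > max_hz)
            return 0;           /* overshot the range without a match */
        if (hz >= min_hz)
            return (uint32_t)hz; /* each frame is shown mult times */
    }
}
```

For 24 Hz content on a 40-60 Hz panel this yields 48, matching the frame-doubling case described above; content at 35 Hz on the same panel has no fitting multiple and yields 0.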
Re: RFC for a render API to support adaptive sync and VRR
Am 10.04.2018 um 17:35 schrieb Cyr, Aric: -Original Message- From: Wentland, Harry Sent: Tuesday, April 10, 2018 11:08 To: Michel Dänzer; Koenig, Christian ; Manasi Navare Cc: Haehnle, Nicolai ; Daniel Vetter ; Daenzer, Michel ; dri-devel ; amd-gfx mailing list ; Deucher, Alexander ; Cyr, Aric ; Koo, Anthony Subject: Re: RFC for a render API to support adaptive sync and VRR On 2018-04-10 03:37 AM, Michel Dänzer wrote: On 2018-04-10 08:45 AM, Christian König wrote: Am 09.04.2018 um 23:45 schrieb Manasi Navare: Thanks for initiating the discussion. Find my comments below: On Mon, Apr 09, 2018 at 04:00:21PM -0400, Harry Wentland wrote: On 2018-04-09 03:56 PM, Harry Wentland wrote: === A DRM render API to support variable refresh rates === In order to benefit from adaptive sync and VRR userland needs a way to let us know whether to vary frame timings or to target a different frame time. These can be provided as atomic properties on a CRTC: * bool variable_refresh_compatible * int target_frame_duration_ns (nanosecond frame duration) This gives us the following cases: variable_refresh_compatible = 0, target_frame_duration_ns = 0 * drive monitor at timing's normal refresh rate variable_refresh_compatible = 1, target_frame_duration_ns = 0 * send new frame to monitor as soon as it's available, if within min/max of monitor's reported capabilities variable_refresh_compatible = 0/1, target_frame_duration_ns = > 0 * send new frame to monitor with the specified target_frame_duration_ns When a target_frame_duration_ns or variable_refresh_compatible cannot be supported the atomic check will reject the commit. What I would like is two sets of properties on a CRTC or preferably on a connector: KMD properties that UMD can query: * vrr_capable - This will be an immutable property for exposing hardware's capability of supporting VRR. This will be set by the kernel after reading the EDID mode information and monitor range capabilities. 
* vrr_vrefresh_max, vrr_vrefresh_min - To expose the min and max refresh rates supported. These properties are optional and will be created and attached to the DP/eDP connector when the connector is getting intialized. Mhm, aren't those properties actually per mode and not per CRTC/connector? Properties that you mentioned above that the UMD can set before kernel can enable VRR functionality *bool vrr_enable or vrr_compatible target_frame_duration_ns Yeah, that certainly makes sense. But target_frame_duration_ns is a bad name/semantics. We should use an absolute timestamp where the frame should be presented, otherwise you could run into a bunch of trouble with IOCTL restarts or missed blanks. Also, a fixed target frame duration isn't suitable even for video playback, due to drift between the video and audio clocks. Why? Even if they drift, you know you want to show your 24Hz video frame for 41.ms and adaptive sync can ensure that with reasonable accuracy. All we're doing is eliminating the need for frame rate converters from the application and offloading that to hardware. Time-based presentation seems to be the right approach for preventing micro-stutter in games as well, Croteam developers have been researching this. I'm not sure if the driver can ever give a guarantee of the exact time a flip occurs. What we have control over with our HW is frame duration. Are Croteam devs trying to predict render times? I'm not sure how that would work. We've had bad experience in the past with games that try to do framepacing as that's usually not accurate and tends to lead to more problems than benefits. For gaming, it doesn't make sense nor is it feasible to know how exactly how long a render will take with microsecond precision, very coarse guesses at best. The point of adaptive sync is that it works *transparently* for the majority of cases, within the capability of the HW and driver. 
We don't want to have every game re-write their engine to support this, but we do want the majority to "just work". The only exception is the video case where an application may want to request a fixed frame duration aligned to the video content. This requires an explicit interface for the video app, and our proposal is to keep it simple: app knows how long a frame should be presented for, and we try to honour that. Well I strongly disagree on that. See VDPAU for example: https://http.download.nvidia.com/XFree86/vdpau/doxygen/html/group___vdp_presentation_queue.html#ga5bd61ca8ef5d1bc54ca6921aa57f835a [in] earliest_presentation_time The timestamp associated with the surface. The presentation queue will not display the surface until the presentation queue's
Re: [PATCH] drm/amdgpu/gfx9: cache DB_DEBUG2 and make it available to userspace
Thanks!

Acked-by: Nicolai Hähnle

On 10.04.2018 17:18, Alex Deucher wrote:
> Userspace needs to query this value to work around a hw bug in certain
> cases.
>
> Signed-off-by: Alex Deucher
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h   | 2 ++
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 1 +
>  drivers/gpu/drm/amd/amdgpu/soc15.c    | 3 +++
>  3 files changed, 6 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index ed5c22bfa3e5..09fa37e9a840 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -867,6 +867,8 @@ struct amdgpu_gfx_config {
>  	/* gfx configure feature */
>  	uint32_t double_offchip_lds_buf;
> +	/* cached value of DB_DEBUG2 */
> +	uint32_t db_debug2;
>  };
>
>  struct amdgpu_cu_info {
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 9d39fd5b1822..66bd6c1c82c0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -1600,6 +1600,7 @@ static void gfx_v9_0_gpu_init(struct amdgpu_device *adev)
>  	gfx_v9_0_setup_rb(adev);
>  	gfx_v9_0_get_cu_info(adev, &adev->gfx.cu_info);
> +	adev->gfx.config.db_debug2 = RREG32_SOC15(GC, 0, mmDB_DEBUG2);
>
>  	/* XXX SH_MEM regs */
>  	/* where to put LDS, scratch, GPUVM in FSA64 space */
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index 2e9ebe8db5cc..65e781f05c24 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -287,6 +287,7 @@ static struct soc15_allowed_register_entry soc15_allowed_read_registers[] = {
>  	{ SOC15_REG_ENTRY(GC, 0, mmCP_CPC_STALLED_STAT1)},
>  	{ SOC15_REG_ENTRY(GC, 0, mmCP_CPC_STATUS)},
>  	{ SOC15_REG_ENTRY(GC, 0, mmGB_ADDR_CONFIG)},
> +	{ SOC15_REG_ENTRY(GC, 0, mmDB_DEBUG2)},
>  };
>
>  static uint32_t soc15_read_indexed_register(struct amdgpu_device *adev, u32 se_num,
> @@ -315,6 +316,8 @@ static uint32_t soc15_get_register_value(struct amdgpu_device *adev,
>  	} else {
>  		if (reg_offset == SOC15_REG_OFFSET(GC, 0, mmGB_ADDR_CONFIG))
>  			return adev->gfx.config.gb_addr_config;
> +		else if (reg_offset == SOC15_REG_OFFSET(GC, 0, mmDB_DEBUG2))
> +			return adev->gfx.config.db_debug2;
>  		return RREG32(reg_offset);
>  	}
>  }

-- 
Learn how the world really is,
but never forget how it ought to be.
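The shape of the change above (an allow-list of registers that userspace may read, plus a cached copy returned for selected offsets) can be sketched in isolation. This is only an illustration under stated assumptions, not the amdgpu code: the register offsets, `struct device`, `demo_mmio_read()` and `read_register()` are hypothetical stand-ins for `SOC15_REG_OFFSET()`, `struct amdgpu_device` and `RREG32()`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register offsets, for illustration only. */
#define REG_CP_STATUS      0x10u
#define REG_GB_ADDR_CONFIG 0x20u
#define REG_DB_DEBUG2      0x30u
#define REG_NOT_LISTED     0x40u /* deliberately absent from the allow-list */

struct gfx_config {
	uint32_t gb_addr_config;
	uint32_t db_debug2; /* cached once at init, like the new field */
};

struct device {
	struct gfx_config config;
	uint32_t (*mmio_read)(uint32_t reg_offset); /* stand-in for RREG32() */
};

/* Registers that may be queried; everything else is rejected. */
static const uint32_t allowed_read_registers[] = {
	REG_CP_STATUS,
	REG_GB_ADDR_CONFIG,
	REG_DB_DEBUG2,
};

/* Fake hardware read so the sketch runs without a GPU. */
static uint32_t demo_mmio_read(uint32_t reg_offset)
{
	return 0xA0000000u | reg_offset;
}

/* Return the cached copy where one exists, else read the hardware. */
static uint32_t get_register_value(const struct device *dev, uint32_t reg_offset)
{
	if (reg_offset == REG_GB_ADDR_CONFIG)
		return dev->config.gb_addr_config;
	else if (reg_offset == REG_DB_DEBUG2)
		return dev->config.db_debug2;
	return dev->mmio_read(reg_offset);
}

/* Returns 0 and fills *value on success, -1 if the register is not allowed. */
static int read_register(const struct device *dev, uint32_t reg_offset,
			 uint32_t *value)
{
	size_t n = sizeof(allowed_read_registers) / sizeof(allowed_read_registers[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		if (allowed_read_registers[i] == reg_offset) {
			*value = get_register_value(dev, reg_offset);
			return 0;
		}
	}
	return -1;
}
```

In this sketch a caller fills a `struct device` with cached values and a read callback, then calls `read_register()`; an offset that is not on the list is refused before any read happens.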
Re: RFC for a render API to support adaptive sync and VRR
On 10.04.2018 18:26, Cyr, Aric wrote:
> That presentation time doesn’t need to come to kernel as such and actually is fine as-is completely decoupled from adaptive sync. As long as the video player provides the new target_frame_duration_ns on the flip, then the driver/HW will target the correct refresh rate to match the source content. This simply means that more often than not the video presents will align very close to the monitor’s refresh rate, resulting in a smooth video experience.
>
> For example, if you have 24Hz content, and an adaptive sync monitor with a range of 40-60Hz, once the target_frame_duration_ns is provided, driver can configure the monitor to a fixed refresh rate of 48Hz causing all video presents to be frame-doubled in hardware without further application intervention.

What about multi-monitor displays, where you want to play an animation that spans multiple monitors? You really want all monitors to flip at the same time.

I understand where you're coming from, but the perspective of refusing a target presentation time is a rather selfish one of "we're the display, we're the most important, everybody else has to adjust to us" (e.g. to get perfect sync between video and audio). I admit I'm phrasing it in a bit of an extreme way, but perhaps this phrasing helps to see why that's just not a very good attitude to have.

All devices (whether video or audio or whatever) should be able to receive a target presentation time.

If the application can make your life a bit easier by providing the targeted refresh rate as an additional *hint-only* parameter (like in your 24 Hz --> 48 Hz doubling example), then maybe we should indeed consider that.

Cheers,
Nicolai

>> For video games we have a similar situation where a frame is rendered for a certain world time and in the ideal case we would actually display the frame at this world time.
>
> That seems like it would be a poorly written game that flips like that, unless they are explicitly trying to throttle the framerate for some reason. When a game presents a completed frame, they’d like that to happen as soon as possible. This is why non-VSYNC modes of flipping exist and many games leverage this. Adaptive sync gives you the lower latency of immediate flips without the tearing imposed by using non-VSYNC flipping.
>
>> I mean we have the guys from Valve on this mailing list so I think we should just get the feedback from them and see what they prefer.
>
> We have thousands of Steam games on other OSes that work great already, but we’d certainly be interested in any additional feedback. My guess is they prefer to “do nothing” and let driver/HW manage it, otherwise you exempt all existing games from supporting adaptive sync without a rewrite or update.
>
>> Regards,
>> Christian.
>
> -Aric
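The 24 Hz on a 40-60 Hz monitor example quoted in this message comes down to finding the smallest whole number of refreshes per content frame whose period fits the monitor's range. A minimal sketch of that arithmetic follows; `refreshes_per_frame()` is a hypothetical helper written for this illustration, not part of any driver or of the proposed API.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ull

/*
 * Return the smallest whole number of refreshes per content frame that puts
 * the refresh period inside the monitor's range, or 0 if no multiple fits.
 */
static uint64_t refreshes_per_frame(uint64_t target_frame_duration_ns,
				    unsigned int min_hz, unsigned int max_hz)
{
	uint64_t shortest_ns, longest_ns, k;

	if (!target_frame_duration_ns || !min_hz || max_hz < min_hz ||
	    max_hz > NSEC_PER_SEC)
		return 0;

	shortest_ns = NSEC_PER_SEC / max_hz; /* period at the fastest refresh */
	longest_ns = NSEC_PER_SEC / min_hz;  /* period at the slowest refresh */

	for (k = 1; ; k++) {
		uint64_t refresh_ns = target_frame_duration_ns / k;

		if (refresh_ns < shortest_ns)
			return 0; /* larger multiples only get shorter */
		if (refresh_ns <= longest_ns)
			return k;
	}
}
```

For a 24 Hz frame duration of 41666666 ns and a 40-60 Hz range, the first multiple gives a period that is too long and the second gives 20833333 ns (48 Hz), which is the frame-doubling case described above. With a 50-60 Hz range no multiple fits, so a caller would have to fall back to some other strategy.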
[PATCH umr] Add 'disasm_early_term' option.
For UMDs that don't use the 0xBF9F shader terminator marker this option
allows the disassembler to stop once it hits the first s_endpgm opcode.

Signed-off-by: Tom St Denis
---
 doc/sphinx/source/basic.rst | 69 +++--
 doc/umr.1                   |  5
 src/app/main.c              |  4 ++-
 src/lib/umr_llvm_disasm.c   |  4 ++-
 src/umr.h                   |  3 +-
 5 files changed, 49 insertions(+), 36 deletions(-)

diff --git a/doc/sphinx/source/basic.rst b/doc/sphinx/source/basic.rst
index 8d6db65aa88a..84bf35f38b39 100644
--- a/doc/sphinx/source/basic.rst
+++ b/doc/sphinx/source/basic.rst
@@ -104,39 +104,42 @@ comma separator.
 The options available are:

-+--+--+
-| **Option** | **Description** |
-+--+--+
-| quiet | Disable various informative outputs that are not required for functionality. |
-+--+--+
-| read_smc | Enable scanning of SMC registers when issuing a --scan command |
-+--+--+
-| bits | Enables the display of bitfields when registers are presented |
-+--+--+
-| bitsfull | When printing bits use the full path to the bitfield |
-+--+--+
-| empty_log | Empty MMIO tracer after reading it |
-+--+--+
-| follow | Tells --logscan to continually read the MMIO tracer |
-+--+--+
-| no_follow_ib | Instructs the --ring command to not follow IBs pointed to by the ring |
-+--+--+
-| named | Tells --read to print out the register name along with the value |
-+--+--+
-| many | Allows matching of register names openly. Used with --read and implies the |
-| | *named* option. For instance: '\*.dce100.CRTC' will match any register that |
-| | contains the fragment 'CRTC' in it. |
-+--+--+
-| use_pci | Enables direct PCI access bypassing the kernels debugfs entries. |
-+--+--+
-| use_colour | Enables colourful output in various commands. Also accepts use_color |
-+--+--+
-| no_kernel | Attempts to avoid kernel access methods. Implies *use_pci*. |
-+--+--+
-| verbose | Enables verbose output, for instance in VM decoding |
-+--+--+
-| halt_waves | Halt active waves while reading wave status data |
-+--+--+
++---+-+
+| **Option** | **Description** |
++---+-+
+| quiet | Disable various informative outputs that are not required for |
+| | functionality. |
++---+-+
+| read_smc | Enable scanning of SMC registers when issuing a --scan command |
++---+-+
+| bits | Enables the display of bitfields when registers are presented |
++---+-+
+| bitsfull | When printing bits use the full path to the
Re: [PATCH xf86-video-amdgpu 2/5] Hook up CRTC color management functions
On 2018-04-09 11:03 AM, Michel Dänzer wrote:
> On 2018-03-26 10:00 PM, sunpeng...@amd.com wrote:
>> From: "Leo (Sunpeng) Li"
>>
>> The functions insert into the output resource creation, and property change functions. CRTC destroy is also hooked-up for proper cleanup of the CRTC property list.
>>
>> Signed-off-by: Leo (Sunpeng) Li
>
> [...]
>
>> @@ -1933,6 +1933,9 @@ static void drmmode_output_create_resources(xf86OutputPtr output)
>>  			}
>>  		}
>>  	}
>> +
>> +	if (output->crtc)
>> +		drmmode_crtc_create_resources(output->crtc, output);
>
> output->crtc is only non-NULL here for outputs which are enabled at Xorg startup; other outputs won't have the new properties.

Is it necessary to have the CRTC properties on an output if the CRTC is disabled for that output? I've tested hot-plugging with this, and the properties do initialize on hot-plug. Though they stay on the output on hot-unplug... Haven't dug into this just yet.

Leo
Re: [PATCH xf86-video-amdgpu 1/5] Add functions for changing CRTC color management properties
On 2018-04-09 11:03 AM, Michel Dänzer wrote: On 2018-03-26 10:00 PM, sunpeng...@amd.com wrote: From: "Leo (Sunpeng) Li"This change adds a few functions in preparation of enabling CRTC color managment via the randr interface. The driver-private CRTC object now contains a list of properties, mirroring the driver-private output object. The lifecycle of the CRTC properties will also mirror the output. Since color managment properties are all DRM blobs, we'll expose the ability to change the blob ID. The user can create blobs via libdrm (which can be done without ownership of DRM master), then set the ID via xrandr. The user will then have to ensure proper cleanup by subsequently releasing the blob. That sounds a bit clunky. :) When changing a blob ID, the change only takes effect on the next atomic commit, doesn't it? How does the client trigger the atomic commit? From the perspective of a client that wishes to change a property, the process between regular properties and blob properties should be essentially the same. Both will trigger an atomic commit when the DRM set property ioctl is called from our DDX driver. The only difference is that DRM property blobs can be arbitrary in size, and needs to be passed by reference through its DRM-defined blob ID. Because of this, the client has to create the blob, save it's id, call libXrandr to change it, then destroy the blob after it's been committed. The client has to call libXrandr due to DRM permissions. IIRC, there can be only one DRM master. And since xserver is DRM master, an external application cannot set DRM properties unless it goes through X. However, creating and destroying DRM property blobs and can be done by anyone. Was this the source of the clunkiness? I've thought about having DDX create and destroy the blob instead, but that needs an interface for the client to get arbitrarily sized data to DDX. I'm not aware of any good ways to do so. Don't think the kernel can do this for us either. 
It does create the blob for legacy gamma, but that's because there's a dedicated ioctl for it. @@ -1604,6 +1623,18 @@ static void drmmode_output_dpms(xf86OutputPtr output, int mode) } } +static Bool drmmode_crtc_property_ignore(drmModePropertyPtr prop) +{ + if (!prop) + return TRUE; + /* Ignore CRTC gamma lut sizes */ + if (!strcmp(prop->name, "GAMMA_LUT_SIZE") || + !strcmp(prop->name, "DEGAMMA_LUT_SIZE")) + return TRUE; Without these properties, how can a client know the LUT sizes? Good point, I originally thought the sizes are fixed and did not need exposing. But who knows if they may change, or even be different per asic. @@ -1618,6 +1649,163 @@ static Bool drmmode_property_ignore(drmModePropertyPtr prop) return FALSE; } +/** +* Configure and change the given output property through randr. Currently "RandR" +* ignores DRM_MODE_PROP_ENU property types. Used as part of create_resources. DRM_MODE_PROP_ENUM is missing the final M. +* +* Return: 0 on success, X-defined error codes on failure. +*/ +static int __rr_configure_and_change_property(xf86OutputPtr output, + drmmode_prop_ptr pmode_prop) No leading underscores in function names please. > + } + else if (mode_prop->flags & DRM_MODE_PROP_RANGE) { The else should be on the same line as }. +static void drmmode_crtc_create_resources(xf86CrtcPtr crtc, + xf86OutputPtr output) +{ + AMDGPUEntPtr pAMDGPUEnt = AMDGPUEntPriv(crtc->scrn); + int i, j; + + /* 'p' prefix for driver private objects */ + drmmode_crtc_private_ptr pmode_crtc = crtc->driver_private; Existing code refers to this as drmmode_crtc, please stick to that. + drmModeCrtcPtr mode_crtc = pmode_crtc->mode_crtc; + + drmmode_prop_ptr pmode_prop; + drmModePropertyPtr mode_prop; + + /* Get list of DRM CRTC properties, and their values */ + drmModeObjectPropertiesPtr mode_props; All local variable declarations should be in a single block, with no blank lines between them, and generally sorted from longer lines to shorter ones. 
+ mode_props = drmModeObjectGetProperties(pAMDGPUEnt->fd, + mode_crtc->crtc_id, + DRM_MODE_OBJECT_CRTC); + if (!mode_props) + goto err_allocs; + + /* Allocate, then populate the driver-private CRTC property list */ + pmode_crtc->props = calloc(mode_props->count_props + 1, +sizeof(drmmode_prop_rec)); Continuation lines should be aligned to opening parens. Any editor which supports EditorConfig should do this automagically. + if (!pmode_crtc->props) + goto err_allocs; + + pmode_crtc->num_props = 0; + + /* Filter through drm crtc
Re: [PATCH xf86-video-amdgpu 3/5] Keep CRTC properties consistent
On 2018-04-09 11:03 AM, Michel Dänzer wrote: On 2018-03-26 10:00 PM, sunpeng...@amd.com wrote: From: "Leo (Sunpeng) Li"In cases where CRTC properties are updated without going through RRChangeOutputProperty, we don't update the properties in user land. Consider setting legacy gamma. It doesn't go through RRChangeOutputProperty, but modifies the CRTC's color management properties. Unless they are updated, the user properties will remain stale. Can you describe a bit more how the legacy gamma and the new properties interact? Sure thing, I'll include this in the message for v2: In kernel, the legacy set gamma interface is essentially an adapter to the non-legacy set properties interface. In the end, they both set the same property to a DRM property blob, which contains the gamma lookup table. The key difference between them is how this blob is created. For legacy gamma, the kernel takes 3 arrays from user-land, and creates the blob using them. Note that a blob is identified by it's blob_id. For non-legacy gamma, the kernel takes a blob_id from user-land that references the blob. This means user-land is responsible for creating the blob. From the perspective of RandR, this presents some problems. Since both paths modify the same property, RandR must keep the reported property value up-to-date with which ever path is used: 1. Legacy gamma via xrandr --output --gamma x:x:x 2. Non-legacy color properties via xrandr --output --set GAMMA_LUT Keeping the value up-to-date isn't a problem for 2, since RandR updates it for us as part of changing output properties. But if 1 is used, the property blob is created within kernel, and RandR is unaware of the new blob_id. To update it, we need to ask kernel about it. --- continue with rest of message --- Therefore, add a function to update user CRTC properties by querying DRM, and call it whenever legacy gamma is changed. Note that drmmode_crtc_gamma_do_set is called from drmmode_set_mode_major, i.e. 
on every modeset or under some circumstances when a DRI3 client stops page flipping. The property will have to be updated each time the legacy set gamma ioctl is called, since a new blob (with a new blob_id) is created each time. Not sure if this is a good idea, but perhaps we can have a flag that explicitly enable one or the other, depending on user preference? A user-only property with something like: 0: Use legacy gamma, calls to change non-legacy properties are ignored. 1: Use non-legacy, calls to legacy gamma will be ignored. On 0, we can remove/disable all non-legacy properties from the property list, and avoid having to update them. On 1, we'll enable the properties, and won't have to update them either since legacy gamma is "disabled". It has the added benefit of avoiding unexpected legacy gamma sets when using non-legacy, and vice versa. diff --git a/src/drmmode_display.c b/src/drmmode_display.c index 1966fd2..45457c4 100644 --- a/src/drmmode_display.c +++ b/src/drmmode_display.c @@ -61,8 +61,13 @@ #define DEFAULT_NOMINAL_FRAME_RATE 60 +/* Forward declarations */ + static Bool drmmode_xf86crtc_resize(ScrnInfoPtr scrn, int width, int height); +static void drmmode_crtc_update_resources(xf86CrtcPtr crtc); Can you move the drmmode_crtc_update_resources such that the forward declaration isn't necessary? Seems possible. It uses the rr_configure_and_change helper, so I'll pull both of them up. static Bool AMDGPUZaphodStringMatches(ScrnInfoPtr pScrn, const char *s, char *output_name) { @@ -768,6 +773,7 @@ drmmode_crtc_gamma_do_set(xf86CrtcPtr crtc, uint16_t *red, uint16_t *green, drmModeCrtcSetGamma(pAMDGPUEnt->fd, drmmode_crtc->mode_crtc->crtc_id, size, red, green, blue); + drmmode_crtc_update_resources(crtc); } Bool @@ -1653,10 +1659,15 @@ static Bool drmmode_property_ignore(drmModePropertyPtr prop) * Configure and change the given output property through randr. Currently * ignores DRM_MODE_PROP_ENU property types. Used as part of create_resources. 
 *
+* @output: The output to configure and change the property on.
+* @pmode_prop: The driver-private property object.

These two should have been added in patch 1.

Yep, will move.

Leo
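The difference between the two gamma paths described in this message is mostly about who builds the blob. As a rough sketch of what turning the three legacy ramps into one lookup table amounts to, the helper below packs them into an array of entries. `struct color_lut_entry` is a local mirror of the kernel's `struct drm_color_lut` layout (four 16-bit fields), and `pack_legacy_gamma()` is a hypothetical name used only here; this is not the kernel's or the DDX's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the struct drm_color_lut layout: four 16-bit fields. */
struct color_lut_entry {
	uint16_t red;
	uint16_t green;
	uint16_t blue;
	uint16_t reserved;
};

/*
 * Pack three legacy gamma ramps of 'size' entries each into one LUT array.
 * Returns the number of bytes a blob holding the table would occupy.
 */
static size_t pack_legacy_gamma(const uint16_t *red, const uint16_t *green,
				const uint16_t *blue, size_t size,
				struct color_lut_entry *lut)
{
	size_t i;

	for (i = 0; i < size; i++) {
		lut[i].red = red[i];
		lut[i].green = green[i];
		lut[i].blue = blue[i];
		lut[i].reserved = 0;
	}
	return size * sizeof(*lut);
}
```

On the non-legacy path a client would pass an array like this, with the returned byte count, to libdrm's `drmModeCreatePropertyBlob()` to obtain a blob ID, and release it afterwards with `drmModeDestroyPropertyBlob()`. The table length a given CRTC expects is an assumption the client has to check, for instance against the GAMMA_LUT_SIZE property discussed in the review of patch 1.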
Re: [PATCH xf86-video-amdgpu 0/5] Implementing non-legacy color management
On 2018-04-09 10:10 AM, Michel Dänzer wrote:
> Hi Leo,
>
> apologies for the late follow-up; I was on vacation and then backlogged.

No worries, thanks for the review :)

> On 2018-03-26 10:00 PM, sunpeng...@amd.com wrote:
>> From: "Leo (Sunpeng) Li"
>>
>> These patches will enable modification of non-legacy color management properties via xrandr. On top of the current legacy gamma, DRM allows the setting of three color management tables: the degamma LUT, the color transform matrix (CTM), and the regamma LUT.
>>
>> To user land, all of them are stored as DRM blobs, and are referenced by CRTC properties via their blob IDs. Therefore, in order to allow setting color management via xrandr, we have to:
>>
>> 1. Enable modification of CRTC properties via xrandr
>> 2. Allow configuring and changing DRM blob properties via their IDs
>> 3. Ensure compatibility with legacy gamma
>>
>> The first three patches do the above, while the last two do some refactoring work to remove repetitive code.
>>
>> A note to reviewers, I'm a little unclear on whether this works when one CRTC is connected to multiple outputs. I expect that changing a CRTC property via one of its outputs will update for that output only, since randr still understands it as an "output property". In which case, there needs to be a v2.
>
> Yes, I suspect so.
>
>> However, I'm not sure how I can set up a test for this. Let me know if you have tips.
>
> Something like
>
>  xrandr --output DVI-D-1 --off --output DVI-D-2 --off
>  xrandr --output DVI-D-1 --crtc 0 --mode 1920x1080 --output DVI-D-2 --crtc 0 --mode 1920x1080
>
> and then verify with xrandr --verbose that both outputs are actually using the same CRTC.
>
> Note that I'm getting an error on the second step when trying this right now, so there may be something preventing using the same CRTC for multiple outputs. But AFAIK at least in theory it's possible.

I'll give this a shot.

Leo

> I will follow up with some comments on individual patches.
RE: RFC for a render API to support adaptive sync and VRR
> -Original Message- > From: Wentland, Harry > Sent: Tuesday, April 10, 2018 11:08 > To: Michel Dänzer; Koenig, Christian > ; Manasi Navare > > Cc: Haehnle, Nicolai ; Daniel Vetter > ; Daenzer, Michel > ; dri-devel ; > amd-gfx mailing list ; > Deucher, Alexander ; Cyr, Aric ; > Koo, Anthony > Subject: Re: RFC for a render API to support adaptive sync and VRR > > On 2018-04-10 03:37 AM, Michel Dänzer wrote: > > On 2018-04-10 08:45 AM, Christian König wrote: > >> Am 09.04.2018 um 23:45 schrieb Manasi Navare: > >>> Thanks for initiating the discussion. Find my comments below: > >>> On Mon, Apr 09, 2018 at 04:00:21PM -0400, Harry Wentland wrote: > On 2018-04-09 03:56 PM, Harry Wentland wrote: > > > > === A DRM render API to support variable refresh rates === > > > > In order to benefit from adaptive sync and VRR userland needs a way > > to let us know whether to vary frame timings or to target a > > different frame time. These can be provided as atomic properties on > > a CRTC: > > * bool variable_refresh_compatible > > * int target_frame_duration_ns (nanosecond frame duration) > > > > This gives us the following cases: > > > > variable_refresh_compatible = 0, target_frame_duration_ns = 0 > > * drive monitor at timing's normal refresh rate > > > > variable_refresh_compatible = 1, target_frame_duration_ns = 0 > > * send new frame to monitor as soon as it's available, if within > > min/max of monitor's reported capabilities > > > > variable_refresh_compatible = 0/1, target_frame_duration_ns = > 0 > > * send new frame to monitor with the specified > > target_frame_duration_ns > > > > When a target_frame_duration_ns or variable_refresh_compatible > > cannot be supported the atomic check will reject the commit. > > > >>> What I would like is two sets of properties on a CRTC or preferably on > >>> a connector: > >>> > >>> KMD properties that UMD can query: > >>> * vrr_capable - This will be an immutable property for exposing > >>> hardware's capability of supporting VRR. 
This will be set by the > >>> kernel after > >>> reading the EDID mode information and monitor range capabilities. > >>> * vrr_vrefresh_max, vrr_vrefresh_min - To expose the min and max > >>> refresh rates supported. > >>> These properties are optional and will be created and attached to the > >>> DP/eDP connector when the connector > >>> is getting intialized. > >> > >> Mhm, aren't those properties actually per mode and not per CRTC/connector? > >> > >>> Properties that you mentioned above that the UMD can set before kernel > >>> can enable VRR functionality > >>> *bool vrr_enable or vrr_compatible > >>> target_frame_duration_ns > >> > >> Yeah, that certainly makes sense. But target_frame_duration_ns is a bad > >> name/semantics. > >> > >> We should use an absolute timestamp where the frame should be presented, > >> otherwise you could run into a bunch of trouble with IOCTL restarts or > >> missed blanks. > > > > Also, a fixed target frame duration isn't suitable even for video > > playback, due to drift between the video and audio clocks. Why? Even if they drift, you know you want to show your 24Hz video frame for 41.ms and adaptive sync can ensure that with reasonable accuracy. All we're doing is eliminating the need for frame rate converters from the application and offloading that to hardware. > > Time-based presentation seems to be the right approach for preventing > > micro-stutter in games as well, Croteam developers have been researching > > this. > > > > I'm not sure if the driver can ever give a guarantee of the exact time a flip > occurs. What we have control over with our HW is frame > duration. > > Are Croteam devs trying to predict render times? I'm not sure how that would > work. We've had bad experience in the past with > games that try to do framepacing as that's usually not accurate and tends to > lead to more problems than benefits. 
For gaming, it doesn't make sense nor is it feasible to know how exactly how long a render will take with microsecond precision, very coarse guesses at best. The point of adaptive sync is that it works *transparently* for the majority of cases, within the capability of the HW and driver. We don't want to have every game re-write their engine to support this, but we do want the majority to "just work". The only exception is the video case where an application may want to request a fixed frame duration aligned to the video content. This requires an explicit interface for the video app, and our proposal is to keep it simple: app knows how long a frame should be
Re: [PATCH 1/2] drm/amdgpu/gmc: steal the appropriate amount of vram for fw hand-over (v2)
Indeed :( After two tries I see the problem; if I remove "drm/amdgpu: Free VGA stolen memory as soon as possible." the problem goes away.

Andrey

On 04/10/2018 06:53 AM, Huang Rui wrote:
> On Mon, Apr 09, 2018 at 11:17:58AM -0400, Andrey Grodzovsky wrote:
>> OK, tested with DC disabled, no issues on resume (no visible corruption on display or errors in log).
>>
>> Now the display itself freezes after amdgpu is loaded with DC disabled. This happens only when the BIOS is in VGA mode; in console mode there is no such problem. It happens before my and Alex's patches, so it looks like a separate issue. So anyway, if corruption would be there (beginning of VRAM and hence scanout FB corrupted), I should have seen it with grub in console mode, where the display is fine and not freezing.
>
> Reproduce steps:
>
> 1. sudo modprobe amdgpu dc=0 ip_block_mask=0x7f
> 2. pm-suspend/resume two times.
>
> You will see the start of vram is corrupted after S3 resume.
>
> [ 570.343635] [drm] PCIE GART of 512M enabled (table at 0x00F4).
> [ 570.343642] [drm] PSP is resuming...
> [ 570.343713] gmc_v9_0_process_interrupt: 12 callbacks suppressed
> [ 570.343715] amdgpu :03:00.0: [mmhub] VMC page fault (src_id:0 ring:0 vmid:0 pasid:0)
> [ 570.343716] amdgpu :03:00.0: at page 0x00f60070 from 18
> [ 570.343716] amdgpu :03:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010
> [ 570.525510] [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
> [ 570.525523] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block failed -62
> [ 570.525536] [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-62).
> [ 570.536704] e1000e: enp0s31f6 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
> [ 570.540496] dpm_run_callback(): pci_pm_resume+0x0/0xa0 returns -62
> [ 570.547879] e1000e :00:1f.6 enp0s31f6: 10/100 speed: disabling TSO
> [ 570.555434] call :03:00.0+ returned -62 after 1973202 usecs
> [ 570.689812] PM: Device :03:00.0 failed to resume async: error -62
>
> I attached the whole dmesg.
>
> Thanks,
> Ray
Re: RFC for a render API to support adaptive sync and VRR
On 2018-04-10 06:26 PM, Cyr, Aric wrote:
> From: Koenig, Christian Sent: Tuesday, April 10, 2018 11:43
>
>> For video games we have a similar situation where a frame is rendered
>> for a certain world time and in the ideal case we would actually
>> display the frame at this world time.
>
> That seems like it would be a poorly written game that flips like
> that, unless they are explicitly trying to throttle the framerate for
> some reason. When a game presents a completed frame, they’d like
> that to happen as soon as possible.

What you're describing is what most games have been doing traditionally.
Croteam's research shows that this results in micro-stuttering, because
frames may be presented too early. To avoid that, they want to
explicitly time each presentation as described by Christian.

Maybe we should try getting the Croteam guys researching this involved
directly here.

-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
Re: RFC for a render API to support adaptive sync and VRR
On 2018-04-10 05:35 PM, Cyr, Aric wrote: >> On 2018-04-10 03:37 AM, Michel Dänzer wrote: >>> On 2018-04-10 08:45 AM, Christian König wrote: Am 09.04.2018 um 23:45 schrieb Manasi Navare: > Thanks for initiating the discussion. Find my comments > below: On Mon, Apr 09, 2018 at 04:00:21PM -0400, Harry > Wentland wrote: >> On 2018-04-09 03:56 PM, Harry Wentland wrote: >>> >>> === A DRM render API to support variable refresh rates >>> === >>> >>> In order to benefit from adaptive sync and VRR userland >>> needs a way to let us know whether to vary frame timings >>> or to target a different frame time. These can be >>> provided as atomic properties on a CRTC: * bool >>> variable_refresh_compatible * int >>> target_frame_duration_ns (nanosecond frame duration) >>> >>> This gives us the following cases: >>> >>> variable_refresh_compatible = 0, target_frame_duration_ns >>> = 0 * drive monitor at timing's normal refresh rate >>> >>> variable_refresh_compatible = 1, target_frame_duration_ns >>> = 0 * send new frame to monitor as soon as it's >>> available, if within min/max of monitor's reported >>> capabilities >>> >>> variable_refresh_compatible = 0/1, >>> target_frame_duration_ns = > 0 * send new frame to >>> monitor with the specified target_frame_duration_ns >>> >>> When a target_frame_duration_ns or >>> variable_refresh_compatible cannot be supported the >>> atomic check will reject the commit. >>> > What I would like is two sets of properties on a CRTC or > preferably on a connector: > > KMD properties that UMD can query: * vrr_capable - This will > be an immutable property for exposing hardware's capability > of supporting VRR. This will be set by the kernel after > reading the EDID mode information and monitor range > capabilities. * vrr_vrefresh_max, vrr_vrefresh_min - To > expose the min and max refresh rates supported. These > properties are optional and will be created and attached to > the DP/eDP connector when the connector is getting > intialized. 
Mhm, aren't those properties actually per mode and not per CRTC/connector? > Properties that you mentioned above that the UMD can set > before kernel can enable VRR functionality *bool vrr_enable > or vrr_compatible target_frame_duration_ns Yeah, that certainly makes sense. But target_frame_duration_ns is a bad name/semantics. We should use an absolute timestamp where the frame should be presented, otherwise you could run into a bunch of trouble with IOCTL restarts or missed blanks. >>> >>> Also, a fixed target frame duration isn't suitable even for >>> video playback, due to drift between the video and audio clocks. > > Why? Even if they drift, you know you want to show your 24Hz video > frame for 41.ms and adaptive sync can ensure that with reasonable > accuracy. Due to the drift, the video player has to occasionally either skip a frame or present it twice to prevent audio and video going out of sync, resulting in visual artifacts. With time-based presentation and variable refresh rate, audio and video can stay in sync without occasional visual artifacts. It would be a pity to create a "variable refresh rate API" which doesn't allow harnessing this strength of variable refresh rate. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
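The drift argument above can be put into numbers. The following is a minimal sketch, not part of the proposed API: it assumes a hypothetical audio clock that runs 100 ppm fast relative to the display clock, and compares a fixed per-frame duration with presentation times derived from absolute timestamps. All function names and the drift figure are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Nominal duration of one frame in nanoseconds, e.g. 24 Hz -> 41666666 ns. */
static int64_t frame_duration_ns(int64_t rate_hz)
{
	assert(rate_hz > 0);
	return NSEC_PER_SEC / rate_hz;
}

/* Fixed duration: frame n is shown n * duration after the start (display clock). */
static int64_t fixed_duration_present_ns(int64_t n, int64_t rate_hz)
{
	return n * frame_duration_ns(rate_hz);
}

/*
 * Timestamp based: the player converts the media time of frame n to the
 * display clock. drift_ppm > 0 means the audio clock runs fast, so each
 * frame has to be shown slightly earlier (first-order approximation).
 */
static int64_t timestamp_present_ns(int64_t n, int64_t rate_hz, int64_t drift_ppm)
{
	int64_t media_ns = n * frame_duration_ns(rate_hz);

	return media_ns - media_ns * drift_ppm / 1000000;
}

/* A/V error that a fixed-duration schedule has accumulated at frame n. */
static int64_t av_error_ns(int64_t n, int64_t rate_hz, int64_t drift_ppm)
{
	return fixed_duration_present_ns(n, rate_hz) -
	       timestamp_present_ns(n, rate_hz, drift_ppm);
}
```

Under these assumptions, one hour of 24 Hz video (86400 frames) accumulates roughly 360 ms of error, i.e. more than eight frame durations, which a fixed-duration player can only correct by skipping or repeating frames. With absolute timestamps the same correction is spread over every frame as a sub-microsecond adjustment.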
RE: RFC for a render API to support adaptive sync and VRR
> -----Original Message-----
> From: Michel Dänzer [mailto:mic...@daenzer.net]
> Sent: Tuesday, April 10, 2018 13:06
> On 2018-04-10 06:26 PM, Cyr, Aric wrote:
> > From: Koenig, Christian Sent: Tuesday, April 10, 2018 11:43
> >
> >> For video games we have a similar situation where a frame is rendered
> >> for a certain world time and in the ideal case we would actually
> >> display the frame at this world time.
> >
> > That seems like it would be a poorly written game that flips like
> > that, unless they are explicitly trying to throttle the framerate for
> > some reason. When a game presents a completed frame, they’d like
> > that to happen as soon as possible.
>
> What you're describing is what most games have been doing traditionally.
> Croteam's research shows that this results in micro-stuttering, because
> frames may be presented too early. To avoid that, they want to
> explicitly time each presentation as described by Christian.

Yes, I agree completely. However, that's only truly relevant for fixed refresh rate displays. This is the primary reason for having Adaptive Sync. There is no perfect way to solve this without Adaptive Sync, but yes, they can come up with better algorithms to improve fixed refresh rate displays.

> Maybe we should try getting the Croteam guys researching this involved
> directly here.

I'd be interested in any research they could share, for sure. We also have years of experience and research here, but not distilled into any readily available format.

> --
> Earthling Michel Dänzer | http://www.amd.com
> Libre software enthusiast | Mesa and X developer
Re: RFC for a render API to support adaptive sync and VRR
On 10.04.2018 19:25, Cyr, Aric wrote:
>> -----Original Message-----
>> From: Michel Dänzer [mailto:mic...@daenzer.net]
>> Sent: Tuesday, April 10, 2018 13:16
>> On 2018-04-10 07:13 PM, Cyr, Aric wrote:
>>>> -----Original Message-----
>>>> From: Michel Dänzer [mailto:mic...@daenzer.net]
>>>> Sent: Tuesday, April 10, 2018 13:06
>>>> On 2018-04-10 06:26 PM, Cyr, Aric wrote:
>>>>> From: Koenig, Christian Sent: Tuesday, April 10, 2018 11:43
>>>>>
>>>>>> For video games we have a similar situation where a frame is rendered
>>>>>> for a certain world time and in the ideal case we would actually
>>>>>> display the frame at this world time.
>>>>>
>>>>> That seems like it would be a poorly written game that flips like
>>>>> that, unless they are explicitly trying to throttle the framerate for
>>>>> some reason. When a game presents a completed frame, they’d like
>>>>> that to happen as soon as possible.
>>>>
>>>> What you're describing is what most games have been doing traditionally.
>>>> Croteam's research shows that this results in micro-stuttering, because
>>>> frames may be presented too early. To avoid that, they want to
>>>> explicitly time each presentation as described by Christian.
>>>
>>> Yes, I agree completely. However that's only truly relevant for fixed
>>> refresh rate displays.
>>
>> No, it also affects variable refresh; possibly even more in some cases,
>> because the presentation time is less predictable.
>
> Yes, and that's why you don't want to do it when you have variable
> refresh. The hardware in the monitor and GPU will do it for you, so why
> bother?

I think Michel's point is that the monitor and GPU hardware *cannot* really do this, because there's synchronization with audio to take into account, which the GPU or monitor don't know about.

Also, as I wrote separately, there's the case of synchronizing multiple monitors.

> The input to their algorithms will be noisy, causing worse estimates. If
> you just present as fast as you can, it'll just work (within reason).
> The majority of gamers want maximum FPS for their games, and there's
> quite frequently outrage at a particular game when they are limited to
> something lower than what their monitor could otherwise support (i.e. I
> don't want my game limited to 30Hz if I have a shiny 144Hz gaming
> display I paid good money for). Of course, there are always exceptions...
> but in our experience those are few and far between.

I agree that games most likely shouldn't try to be smart. I'm curious about the Croteam findings, but even if they did a really clever thing that works better than just telling the display driver "display ASAP please", chances are that *most* developers won't do that. And they'll most likely get it wrong, so our guidance should really be "games should ask for ASAP presentation, and nothing else".

However, there *are* legitimate use cases for requesting a specific presentation time, and there *is* precedent of APIs that expose such features.

Are there any real problems with exposing an absolute target present time?

Cheers,
Nicolai
Re: RFC for a render API to support adaptive sync and VRR
On 2018-04-10 01:52 PM, Harry Wentland wrote: > On 2018-04-10 12:37 PM, Nicolai Hähnle wrote: >> On 10.04.2018 18:26, Cyr, Aric wrote: >>> That presentation time doesn’t need to come to kernel as such and actually >>> is fine as-is completely decoupled from adaptive sync. As long as the >>> video player provides the new target_frame_duration_ns on the flip, then >>> the driver/HW will target the correct refresh rate to match the source >>> content. This simply means that more often than not the video presents >>> will align very close to the monitor’s refresh rate, resulting in a smooth >>> video experience. For example, if you have 24Hz content, and an adaptive >>> sync monitor with a range of 40-60Hz, once the target_frame_duration_ns is >>> provided, driver can configure the monitor to a fixed refresh rate of 48Hz >>> causing all video presents to be frame-doubled in hardware without further >>> application intervention. >> >> What about multi-monitor displays, where you want to play an animation that >> spans multiple monitors. You really want all monitors to flip at the same >> time. >> > > Syncing two monitors is what we currently do with our timing sync feature > where we drive two monitors from the same clock source if they use the same > timing. That, along with VSync, guarantees all monitors flip at the same > time. I'm not sure if it works with adaptive sync. > > Are you suggesting to use adaptive sync to do an in-SW sync of multiple > displays? > >> I understand where you're coming from, but the perspective of refusing a >> target presentation time is a rather selfish one of "we're the display, >> we're the most important, everybody else has to adjust to us" (e.g. to get >> perfect sync between video and audio). I admit I'm phrasing it in a bit of >> an extreme way, but perhaps this phrasing helps to see why that's just not a >> very good attitude to have. 
>> > > I really dislike arguing on an emotional basis and would rather not use words > such as "selfish" in this discussion. I believe all of us want to come to the > best possible solution based on technical merit. > >> All devices (whether video or audio or whatever) should be able to receive a >> target presentation time. >> > > I'm not sure I understand the full extent of the problem as I'm not really > familiar with how this is currently done, but isn't the problem the same > without variable refresh rates (or targeted refresh rates)? A Video API would > still have to somehow synchronize audio and video to 60Hz on most monitors > today. What would change if we gave user mode the ability to suggest we flip > at video frame rates (24/48Hz)? > Never mind. Just saw Michel's reply to an earlier message. Harry > Harry > >> If the application can make your life a bit easier by providing the >> targetted refresh rate as additional *hint-only* parameter (like in your 24 >> Hz --> 48 Hz doubling example), then maybe we should indeed consider that. >> >> Cheers, >> Nicolai >> >> >>> >>> >>> For video games we have a similar situation where a frame is rendered for a >>> certain world time and in the ideal case we would actually display the >>> frame at this world time. >>> >>> That seems like it would be a poorly written game that flips like that, >>> unless they are explicitly trying to throttle the framerate for some >>> reason. When a game presents a completed frame, they’d like that to happen >>> as soon as possible. This is why non-VSYNC modes of flipping exist and >>> many games leverage this. Adaptive sync gives you the lower latency of >>> immediate flips without the tearing imposed by using non-VSYNC flipping. >>> >>> >>> I mean we have the guys from Valve on this mailing list so I think we >>> should just get the feedback from them and see what they prefer. 
>>> >>> We have thousands of Steam games on other OSes that work great already, but >>> we’d certainly be interested in any additional feedback. My guess is they >>> prefer to “do nothing” and let driver/HW manage it, otherwise you exempt >>> all existing games from supporting adaptive sync without a rewrite or >>> update. >>> >>> >>> Regards, >>> Christian. >>> >>> >>> -Aric >>> >> > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel > ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
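The 24 Hz to 48 Hz frame-doubling example quoted above comes down to picking an integer multiple of the content rate that falls inside the monitor's supported range. A minimal sketch of that selection, assuming integer rates in Hz; pick_refresh_hz() is a hypothetical helper for illustration and not a function from the driver or the RFC:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the lowest integer multiple of content_hz that lies within
 * [min_hz, max_hz], or 0 if no multiple fits the monitor's range.
 */
static uint32_t pick_refresh_hz(uint32_t content_hz, uint32_t min_hz,
				uint32_t max_hz)
{
	uint64_t rate;
	uint64_t mult;

	assert(content_hz > 0 && min_hz <= max_hz);

	for (mult = 1; (rate = (uint64_t)content_hz * mult) <= max_hz; mult++) {
		if (rate >= min_hz)
			return (uint32_t)rate;
	}

	return 0;
}
```

For 24 Hz content on a 40-60 Hz panel this yields 48 Hz, so every source frame is scanned out twice. A 50-60 Hz panel has no integer multiple of 24 in range and the helper returns 0, in which case a driver would need some other policy.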
Re: [PATCH 2/2] drm/amd/pp: remove unnecessary forward declaration
On Tue, Apr 10, 2018 at 1:18 AM, Rex Zhuwrote: > Signed-off-by: Rex Zhu Series is: Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c | 84 > +++--- > 1 file changed, 41 insertions(+), 43 deletions(-) > > diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c > b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c > index e8ded22..ac44f9c 100644 > --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c > +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c > @@ -75,8 +75,6 @@ > #define DF_CS_AON0_DramBaseAddress0__IntLvNumChan_MASK > 0x00F0L > #define DF_CS_AON0_DramBaseAddress0__IntLvAddrSel_MASK > 0x0700L > #define DF_CS_AON0_DramBaseAddress0__DramBaseAddr_MASK > 0xF000L > -static int vega10_force_clock_level(struct pp_hwmgr *hwmgr, > - enum pp_clock_type type, uint32_t mask); > > static const ULONG PhwVega10_Magic = (ULONG)(PHM_VIslands_Magic); > > @@ -4106,6 +4104,47 @@ static void vega10_set_fan_control_mode(struct > pp_hwmgr *hwmgr, uint32_t mode) > } > } > > +static int vega10_force_clock_level(struct pp_hwmgr *hwmgr, > + enum pp_clock_type type, uint32_t mask) > +{ > + struct vega10_hwmgr *data = hwmgr->backend; > + > + switch (type) { > + case PP_SCLK: > + data->smc_state_table.gfx_boot_level = mask ? (ffs(mask) - 1) > : 0; > + data->smc_state_table.gfx_max_level = mask ? (fls(mask) - 1) > : 0; > + > + PP_ASSERT_WITH_CODE(!vega10_upload_dpm_bootup_level(hwmgr), > + "Failed to upload boot level to lowest!", > + return -EINVAL); > + > + PP_ASSERT_WITH_CODE(!vega10_upload_dpm_max_level(hwmgr), > + "Failed to upload dpm max level to highest!", > + return -EINVAL); > + break; > + > + case PP_MCLK: > + data->smc_state_table.mem_boot_level = mask ? (ffs(mask) - 1) > : 0; > + data->smc_state_table.mem_max_level = mask ? 
(fls(mask) - 1) > : 0; > + > + PP_ASSERT_WITH_CODE(!vega10_upload_dpm_bootup_level(hwmgr), > + "Failed to upload boot level to lowest!", > + return -EINVAL); > + > + PP_ASSERT_WITH_CODE(!vega10_upload_dpm_max_level(hwmgr), > + "Failed to upload dpm max level to highest!", > + return -EINVAL); > + > + break; > + > + case PP_PCIE: > + default: > + break; > + } > + > + return 0; > +} > + > static int vega10_dpm_force_dpm_level(struct pp_hwmgr *hwmgr, > enum amd_dpm_forced_level level) > { > @@ -4392,47 +4431,6 @@ static int > vega10_set_watermarks_for_clocks_ranges(struct pp_hwmgr *hwmgr, > return result; > } > > -static int vega10_force_clock_level(struct pp_hwmgr *hwmgr, > - enum pp_clock_type type, uint32_t mask) > -{ > - struct vega10_hwmgr *data = hwmgr->backend; > - > - switch (type) { > - case PP_SCLK: > - data->smc_state_table.gfx_boot_level = mask ? (ffs(mask) - 1) > : 0; > - data->smc_state_table.gfx_max_level = mask ? (fls(mask) - 1) > : 0; > - > - PP_ASSERT_WITH_CODE(!vega10_upload_dpm_bootup_level(hwmgr), > - "Failed to upload boot level to lowest!", > - return -EINVAL); > - > - PP_ASSERT_WITH_CODE(!vega10_upload_dpm_max_level(hwmgr), > - "Failed to upload dpm max level to highest!", > - return -EINVAL); > - break; > - > - case PP_MCLK: > - data->smc_state_table.mem_boot_level = mask ? (ffs(mask) - 1) > : 0; > - data->smc_state_table.mem_max_level = mask ? (fls(mask) - 1) > : 0; > - > - PP_ASSERT_WITH_CODE(!vega10_upload_dpm_bootup_level(hwmgr), > - "Failed to upload boot level to lowest!", > - return -EINVAL); > - > - PP_ASSERT_WITH_CODE(!vega10_upload_dpm_max_level(hwmgr), > - "Failed to upload dpm max level to highest!", > - return -EINVAL); > - > - break; > - > - case PP_PCIE: > - default: > - break; > - } > - > - return 0; > -} > - > static int vega10_print_clock_levels(struct pp_hwmgr *hwmgr, > enum pp_clock_type type, char *buf) > { > -- > 1.9.1 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org >
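For readers unfamiliar with the mask handling in vega10_force_clock_level(): the lowest set bit of the mask selects the boot level and the highest set bit selects the max level. The sketch below reproduces that arithmetic in plain C, assuming the kernel's 1-based ffs()/fls() contract (0 for an empty mask). sketch_ffs(), sketch_fls() and struct level_range are stand-ins written for this example, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* 1-based index of the lowest set bit, 0 if mask is empty. */
static int sketch_ffs(uint32_t mask)
{
	int pos = 1;

	if (!mask)
		return 0;
	while (!(mask & 1)) {
		mask >>= 1;
		pos++;
	}
	return pos;
}

/* 1-based index of the highest set bit, 0 if mask is empty. */
static int sketch_fls(uint32_t mask)
{
	int pos = 0;

	while (mask) {
		mask >>= 1;
		pos++;
	}
	return pos;
}

struct level_range {
	uint32_t boot_level;	/* lowest DPM level allowed by the mask */
	uint32_t max_level;	/* highest DPM level allowed by the mask */
};

/* Same expressions as in the patch: empty mask falls back to level 0. */
static struct level_range levels_from_mask(uint32_t mask)
{
	struct level_range r;

	r.boot_level = mask ? (uint32_t)(sketch_ffs(mask) - 1) : 0;
	r.max_level = mask ? (uint32_t)(sketch_fls(mask) - 1) : 0;
	assert(r.boot_level <= r.max_level);
	return r;
}
```

A mask of 0x1c (levels 2, 3 and 4 enabled) therefore gives boot level 2 and max level 4.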
Re: [PATCH] drm/amdkfd: Remove vla
Thanks Christian for catching that. I'm working on a patch series to upstream Vega10 support, about 95% done. It will add this ASIC info for Vega10:

static const struct kfd_device_info vega10_device_info = {
	.asic_family = CHIP_VEGA10,
	.max_pasid_bits = 16,
	.max_no_of_hqd = 24,
	.doorbell_size = 8,
	.ih_ring_entry_size = 8 * sizeof(uint32_t), /* !!! IH ring entry size is bigger on Vega10 !!! */
	.event_interrupt_class = &event_interrupt_class_v9,
	.num_of_watch_points = 4,
	.mqd_size_aligned = MQD_SIZE_ALIGNED,
	.supports_cwsr = true,
	.needs_iommu_device = false,
	.needs_pci_atomics = false,
};

If you change it to uint32_t ih_ring_entry[8] and update the check, it should be reasonably future proof.

Regards,
  Felix

On 2018-04-10 02:38 AM, Christian König wrote:
> On 09.04.2018 at 23:06, Laura Abbott wrote:
>> There's an ongoing effort to remove VLAs[1] from the kernel to
>> eventually
>> turn on -Wvla. The single VLA usage in the amdkfd driver is actually
>> constant across all current platforms.
>
> Actually that isn't correct.
>
> Could be that we haven't upstreamed KFD support for them, but Vega10
> has a different interrupt ring entry size and so would cause the
> error message here.
>
>> Switch to a constant size array
>> instead.
>
> I would say to just make the array bigger.
>
> Regards,
> Christian.
> >> >> [1] https://lkml.org/lkml/2018/3/7/621 >> >> Signed-off-by: Laura Abbott>> --- >> drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c | 8 +--- >> 1 file changed, 5 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c >> b/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c >> index 035c351f47c5..c9863858f343 100644 >> --- a/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c >> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c >> @@ -139,10 +139,12 @@ static void interrupt_wq(struct work_struct *work) >> { >> struct kfd_dev *dev = container_of(work, struct kfd_dev, >> interrupt_work); >> + uint32_t ih_ring_entry[4]; >> - uint32_t ih_ring_entry[DIV_ROUND_UP( >> - dev->device_info->ih_ring_entry_size, >> - sizeof(uint32_t))]; >> + if (dev->device_info->ih_ring_entry_size > (4 * >> sizeof(uint32_t))) { >> + dev_err(kfd_chardev(), "Ring entry too small\n"); >> + return; >> + } >> while (dequeue_ih_ring_entry(dev, ih_ring_entry)) >> dev->device_info->event_interrupt_class->interrupt_wq(dev, > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdkfd: Remove vla
On 04/09/2018 11:38 PM, Christian König wrote: Am 09.04.2018 um 23:06 schrieb Laura Abbott: There's an ongoing effort to remove VLAs[1] from the kernel to eventually turn on -Wvla. The single VLA usage in the amdkfd driver is actually constant across all current platforms. Actually that isn't correct. Could be that we haven't upstreamed KFD support for them, but Vega10 have a different interrupt ring entry size and so would cause the error message here. Switch to a constant size array instead. I would say to just make make the array bigger. Regards, Christian. What array size would accommodate future chips? [1] https://lkml.org/lkml/2018/3/7/621 Signed-off-by: Laura Abbott--- drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c b/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c index 035c351f47c5..c9863858f343 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c @@ -139,10 +139,12 @@ static void interrupt_wq(struct work_struct *work) { struct kfd_dev *dev = container_of(work, struct kfd_dev, interrupt_work); + uint32_t ih_ring_entry[4]; - uint32_t ih_ring_entry[DIV_ROUND_UP( - dev->device_info->ih_ring_entry_size, - sizeof(uint32_t))]; + if (dev->device_info->ih_ring_entry_size > (4 * sizeof(uint32_t))) { + dev_err(kfd_chardev(), "Ring entry too small\n"); + return; + } while (dequeue_ih_ring_entry(dev, ih_ring_entry)) dev->device_info->event_interrupt_class->interrupt_wq(dev, ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
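The pattern under discussion (replace the VLA with a fixed array sized for the largest known entry, and reject anything bigger at runtime) can be sketched in plain C as follows. This is only an illustration: the 8-dword bound follows Felix's suggestion in this thread, and SKETCH_MAX_IH_ENTRY_DWORDS and copy_ih_entry() are hypothetical names, not the driver's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 8 dwords covers the largest entry mentioned in the thread. */
#define SKETCH_MAX_IH_ENTRY_DWORDS 8

/*
 * Copy one ring entry of entry_size bytes into a fixed-size buffer.
 * Returns 0 on success, -1 if the entry would not fit.
 */
static int copy_ih_entry(uint32_t dst[SKETCH_MAX_IH_ENTRY_DWORDS],
			 const void *src, size_t entry_size)
{
	assert(dst && src);

	if (entry_size > SKETCH_MAX_IH_ENTRY_DWORDS * sizeof(uint32_t))
		return -1;

	memcpy(dst, src, entry_size);
	return 0;
}
```

The bound lives in one constant, so a future chip with a larger entry only needs that constant raised, and until then it fails the check instead of overrunning the stack buffer.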
Re: [PATCH] drm/amdgpu: limit DMA size to PAGE_SIZE for scatter-gather buffers
On 10.04.2018 at 20:25, Sinan Kaya wrote:
> Code is expecting to observe the same number of buffers returned from
> dma_map_sg() function compared to sg_alloc_table_from_pages(). This
> doesn't hold true universally especially for systems with IOMMU.
>
> IOMMU driver tries to combine buffers into a single DMA address as much
> as it can. The right thing is to tell the DMA layer how much combining
> IOMMU can do.

Good catch, but wrong place to set this. Please move it into the device initialization functions.

Regards,
Christian.

> Signed-off-by: Sinan Kaya
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index e4bb435..02465cd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -787,6 +787,8 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_tt *ttm)
>  	enum dma_data_direction direction = write ?
>  		DMA_BIDIRECTIONAL : DMA_TO_DEVICE;
>
> +	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
> +
>  	r = sg_alloc_table_from_pages(ttm->sg, ttm->pages, ttm->num_pages, 0,
>  				      ttm->num_pages << PAGE_SHIFT,
>  				      GFP_KERNEL);
[PATCH] drm/amdgpu: limit DMA size to PAGE_SIZE for scatter-gather buffers
Code is expecting to observe the same number of buffers returned from dma_map_sg() function compared to sg_alloc_table_from_pages(). This doesn't hold true universally especially for systems with IOMMU.

IOMMU driver tries to combine buffers into a single DMA address as much as it can. The right thing is to tell the DMA layer how much combining IOMMU can do.

Signed-off-by: Sinan Kaya
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index e4bb435..02465cd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -787,6 +787,8 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_tt *ttm)
 	enum dma_data_direction direction = write ?
 		DMA_BIDIRECTIONAL : DMA_TO_DEVICE;

+	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
+
 	r = sg_alloc_table_from_pages(ttm->sg, ttm->pages, ttm->num_pages, 0,
 				      ttm->num_pages << PAGE_SHIFT,
 				      GFP_KERNEL);
--
2.7.4
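To illustrate why a segment cap matters, here is a toy userspace model of segment combining. It is not the kernel's scatterlist or IOMMU code: it only assumes that adjacent pages may be merged into one segment as long as the segment stays within max_seg_size. SKETCH_PAGE_SIZE and count_segments() are names made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/*
 * Count the segments produced when physically contiguous pages are merged,
 * with each segment limited to max_seg_size bytes.
 */
static size_t count_segments(const uint64_t *page_addr, size_t num_pages,
			     uint64_t max_seg_size)
{
	size_t segs = 0;
	uint64_t seg_len = 0;
	size_t i;

	assert(max_seg_size >= SKETCH_PAGE_SIZE);

	for (i = 0; i < num_pages; i++) {
		int contiguous = i > 0 &&
			page_addr[i] == page_addr[i - 1] + SKETCH_PAGE_SIZE;

		if (contiguous && seg_len + SKETCH_PAGE_SIZE <= max_seg_size) {
			seg_len += SKETCH_PAGE_SIZE;	/* extend current segment */
		} else {
			segs++;				/* start a new segment */
			seg_len = SKETCH_PAGE_SIZE;
		}
	}

	return segs;
}
```

In this model, four pages of which three are contiguous collapse into two segments under a 64 KiB cap, but stay at four segments when the cap is one page, which is the one-entry-per-page assumption the commit message describes.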
Re: RFC for a render API to support adaptive sync and VRR
On 2018-04-10 07:13 PM, Cyr, Aric wrote:
>> -----Original Message-----
>> From: Michel Dänzer [mailto:mic...@daenzer.net]
>> Sent: Tuesday, April 10, 2018 13:06
>> On 2018-04-10 06:26 PM, Cyr, Aric wrote:
>>> From: Koenig, Christian Sent: Tuesday, April 10, 2018 11:43
>>>
>>>> For video games we have a similar situation where a frame is rendered
>>>> for a certain world time and in the ideal case we would actually
>>>> display the frame at this world time.
>>>
>>> That seems like it would be a poorly written game that flips like
>>> that, unless they are explicitly trying to throttle the framerate for
>>> some reason. When a game presents a completed frame, they’d like
>>> that to happen as soon as possible.
>>
>> What you're describing is what most games have been doing traditionally.
>> Croteam's research shows that this results in micro-stuttering, because
>> frames may be presented too early. To avoid that, they want to
>> explicitly time each presentation as described by Christian.
>
> Yes, I agree completely. However that's only truly relevant for fixed
> refresh rate displays.

No, it also affects variable refresh; possibly even more in some cases, because the presentation time is less predictable.

I have to leave for today, I'll look up the Croteam video on YouTube explaining this tomorrow if nobody beats me to it.

--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
RE: RFC for a render API to support adaptive sync and VRR
> -----Original Message-----
> From: Michel Dänzer [mailto:mic...@daenzer.net]
> Sent: Tuesday, April 10, 2018 13:16
>
> On 2018-04-10 07:13 PM, Cyr, Aric wrote:
> >> -----Original Message-----
> >> From: Michel Dänzer [mailto:mic...@daenzer.net]
> >> Sent: Tuesday, April 10, 2018 13:06
> >> On 2018-04-10 06:26 PM, Cyr, Aric wrote:
> >>> From: Koenig, Christian Sent: Tuesday, April 10, 2018 11:43
> >>>
> >>>> For video games we have a similar situation where a frame is rendered
> >>>> for a certain world time and in the ideal case we would actually
> >>>> display the frame at this world time.
> >>>
> >>> That seems like it would be a poorly written game that flips like
> >>> that, unless they are explicitly trying to throttle the framerate for
> >>> some reason. When a game presents a completed frame, they’d like
> >>> that to happen as soon as possible.
> >>
> >> What you're describing is what most games have been doing traditionally.
> >> Croteam's research shows that this results in micro-stuttering, because
> >> frames may be presented too early. To avoid that, they want to
> >> explicitly time each presentation as described by Christian.
> >
> > Yes, I agree completely. However that's only truly relevant for fixed
> > refresh rate displays.
>
> No, it also affects variable refresh; possibly even more in some cases,
> because the presentation time is less predictable.

Yes, and that's why you don't want to do it when you have variable refresh. The hardware in the monitor and GPU will do it for you, so why bother? The input to their algorithms will be noisy, causing worse estimates. If you just present as fast as you can, it'll just work (within reason).

The majority of gamers want maximum FPS for their games, and there's quite frequently outrage at a particular game when they are limited to something lower than what their monitor could otherwise support (i.e. I don't want my game limited to 30Hz if I have a shiny 144Hz gaming display I paid good money for). Of course, there are always exceptions... but in our experience those are few and far between.
Re: [PATCH] drm/amd/display: Fix 64-bit division in hwss_edp_power_control
On Tue, Apr 10, 2018 at 4:10 PM, Harry Wentland wrote:
> Signed-off-by: Harry Wentland

Reviewed-by: Alex Deucher

> ---
>  drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
> index f32ccdfd18a3..3ba057e2a467 100644
> --- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
> +++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
> @@ -860,7 +860,7 @@ void hwss_edp_power_control(
>  			dm_get_elapse_time_in_ns(
>  				ctx,
>  				current_ts,
> -				link->link_trace.time_stamp.edp_poweroff) / 100;
> +				div64_u64(link->link_trace.time_stamp.edp_poweroff, 100));
>  	unsigned long long wait_time_ms = 0;
>
>  	/* max 500ms from LCDVDD off to on */
> --
> 2.15.1
[PATCH 07/21] drm/amd/display: fix segfault on insufficient TG during validation
From: Dmytro Laktyushkin

Signed-off-by: Dmytro Laktyushkin
Reviewed-by: Dmytro Laktyushkin
Acked-by: Harry Wentland
---
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 667fac8749b9..faaba0ea0ace 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -1700,7 +1700,7 @@ enum dc_status resource_map_pool_resources(
 		pipe_idx = acquire_first_split_pipe(&context->res_ctx, pool, stream);
 #endif

-	if (pipe_idx < 0)
+	if (pipe_idx < 0 || context->res_ctx.pipe_ctx[pipe_idx].stream_res.tg == NULL)
 		return DC_NO_CONTROLLER_RESOURCE;

 	pipe_ctx = &context->res_ctx.pipe_ctx[pipe_idx];

--
2.15.1
[PATCH 11/21] drm/amd/display: Check lid state to determine fast boot optimization.
From: Yongqiang SunFor legacy enable boot up with lid closed, eDP information couldn't be read correctly via SBIOS_SCRATCH_3 results in eDP cannot be light up properly when open lid. Check lid state instead can resolve the issue. Signed-off-by: Yongqiang Sun Reviewed-by: Eric Yang Acked-by: Harry Wentland --- drivers/gpu/drm/amd/display/dc/dc_stream.h | 1 + .../amd/display/dc/dce110/dce110_hw_sequencer.c| 24 ++ 2 files changed, 17 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dc_stream.h b/drivers/gpu/drm/amd/display/dc/dc_stream.h index 046e87aa699a..5f215ca38c07 100644 --- a/drivers/gpu/drm/amd/display/dc/dc_stream.h +++ b/drivers/gpu/drm/amd/display/dc/dc_stream.h @@ -98,6 +98,7 @@ struct dc_stream_state { int phy_pix_clk; enum signal_type signal; bool dpms_off; + bool lid_state_closed; struct dc_stream_status status; diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c index 0ff2a8092782..3ba057e2a467 100644 --- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c +++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c @@ -1481,6 +1481,17 @@ static void disable_vga_and_power_gate_all_controllers( } } +static bool is_eDP_lid_closed(struct dc_state *context) +{ + int i; + + for (i = 0; i < context->stream_count; i++) { + if (context->streams[i]->signal == SIGNAL_TYPE_EDP) + return context->streams[i]->lid_state_closed; + } + return false; +} + static struct dc_link *get_link_for_edp_not_in_use( struct dc *dc, struct dc_state *context) @@ -1515,20 +1526,17 @@ static struct dc_link *get_link_for_edp_not_in_use( */ void dce110_enable_accelerated_mode(struct dc *dc, struct dc_state *context) { - struct dc_bios *dcb = dc->ctx->dc_bios; - - /* vbios already light up eDP, so we can leverage vbios and skip eDP + /* check eDP lid state: +* If lid is open, vbios already light up eDP, so we can leverage vbios and skip eDP * programming */ - bool 
can_eDP_fast_boot_optimize = - (dcb->funcs->get_vga_enabled_displays(dc->ctx->dc_bios) == ATOM_DISPLAY_LCD1_ACTIVE); - - /* if OS doesn't light up eDP and eDP link is available, we want to disable */ + bool lid_state_closed = is_eDP_lid_closed(context); struct dc_link *edp_link_to_turnoff = NULL; - if (can_eDP_fast_boot_optimize) { + if (!lid_state_closed) { edp_link_to_turnoff = get_link_for_edp_not_in_use(dc, context); + /* if OS doesn't light up eDP and eDP link is available, we want to disable */ if (!edp_link_to_turnoff) dc->apply_edp_fast_boot_optimization = true; } -- 2.15.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH 03/21] drm/amd/display: Move dp_pixel_encoding_type to stream_encoder include
From: Eric Bernstein

Signed-off-by: Eric Bernstein
Reviewed-by: Nikola Cornij
Acked-by: Harry Wentland
---
 drivers/gpu/drm/amd/display/dc/inc/hw/hw_shared.h | 17 -
 .../gpu/drm/amd/display/dc/inc/hw/stream_encoder.h | 19 +++
 2 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/hw_shared.h b/drivers/gpu/drm/amd/display/dc/inc/hw/hw_shared.h
index 9fe73028d588..cf7433ebf91a 100644
--- a/drivers/gpu/drm/amd/display/dc/inc/hw/hw_shared.h
+++ b/drivers/gpu/drm/amd/display/dc/inc/hw/hw_shared.h
@@ -186,23 +186,6 @@ enum controller_dp_test_pattern {
 	CONTROLLER_DP_TEST_PATTERN_COLORSQUARES_CEA
 };
 
-enum dp_pixel_encoding_type {
-	DP_PIXEL_ENCODING_TYPE_RGB444 = 0x,
-	DP_PIXEL_ENCODING_TYPE_YCBCR422 = 0x0001,
-	DP_PIXEL_ENCODING_TYPE_YCBCR444 = 0x0002,
-	DP_PIXEL_ENCODING_TYPE_RGB_WIDE_GAMUT = 0x0003,
-	DP_PIXEL_ENCODING_TYPE_Y_ONLY = 0x0004,
-	DP_PIXEL_ENCODING_TYPE_YCBCR420 = 0x0005
-};
-
-enum dp_component_depth {
-	DP_COMPONENT_PIXEL_DEPTH_6BPC = 0x,
-	DP_COMPONENT_PIXEL_DEPTH_8BPC = 0x0001,
-	DP_COMPONENT_PIXEL_DEPTH_10BPC = 0x0002,
-	DP_COMPONENT_PIXEL_DEPTH_12BPC = 0x0003,
-	DP_COMPONENT_PIXEL_DEPTH_16BPC = 0x0004
-};
-
 enum dc_lut_mode {
 	LUT_BYPASS,
 	LUT_RAM_A,
diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/stream_encoder.h b/drivers/gpu/drm/amd/display/dc/inc/hw/stream_encoder.h
index 5c21336cae4c..cfa7ec9517ae 100644
--- a/drivers/gpu/drm/amd/display/dc/inc/hw/stream_encoder.h
+++ b/drivers/gpu/drm/amd/display/dc/inc/hw/stream_encoder.h
@@ -29,11 +29,29 @@
 #define STREAM_ENCODER_H_
 
 #include "audio_types.h"
+#include "hw_shared.h"
 
 struct dc_bios;
 struct dc_context;
 struct dc_crtc_timing;
 
+enum dp_pixel_encoding_type {
+	DP_PIXEL_ENCODING_TYPE_RGB444 = 0x,
+	DP_PIXEL_ENCODING_TYPE_YCBCR422 = 0x0001,
+	DP_PIXEL_ENCODING_TYPE_YCBCR444 = 0x0002,
+	DP_PIXEL_ENCODING_TYPE_RGB_WIDE_GAMUT = 0x0003,
+	DP_PIXEL_ENCODING_TYPE_Y_ONLY = 0x0004,
+	DP_PIXEL_ENCODING_TYPE_YCBCR420 = 0x0005
+};
+
+enum dp_component_depth {
+	DP_COMPONENT_PIXEL_DEPTH_6BPC = 0x,
+	DP_COMPONENT_PIXEL_DEPTH_8BPC = 0x0001,
+	DP_COMPONENT_PIXEL_DEPTH_10BPC = 0x0002,
+	DP_COMPONENT_PIXEL_DEPTH_12BPC = 0x0003,
+	DP_COMPONENT_PIXEL_DEPTH_16BPC = 0x0004
+};
+
 struct encoder_info_frame {
 	/* auxiliary video information */
 	struct dc_info_packet avi;
@@ -138,6 +156,7 @@ struct stream_encoder_funcs {
 	void (*set_avmute)(
 		struct stream_encoder *enc, bool enable);
+
 };
 
 #endif /* STREAM_ENCODER_H_ */
-- 
2.15.1
[PATCH 18/21] drm/amd/display: add rq/dlg/ttu to dtn log
From: Dmytro LaktyushkinSigned-off-by: Dmytro Laktyushkin Reviewed-by: Dmytro Laktyushkin Acked-by: Harry Wentland --- drivers/gpu/drm/amd/display/dc/dc_helper.c | 59 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c | 153 - drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.h | 19 +-- .../drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 114 ++- drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h | 20 +++ drivers/gpu/drm/amd/display/dc/inc/reg_helper.h| 56 6 files changed, 401 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dc_helper.c b/drivers/gpu/drm/amd/display/dc/dc_helper.c index 48e1fcf53d43..bd0fda0ceb91 100644 --- a/drivers/gpu/drm/amd/display/dc/dc_helper.c +++ b/drivers/gpu/drm/amd/display/dc/dc_helper.c @@ -117,6 +117,65 @@ uint32_t generic_reg_get5(const struct dc_context *ctx, uint32_t addr, return reg_val; } +uint32_t generic_reg_get6(const struct dc_context *ctx, uint32_t addr, + uint8_t shift1, uint32_t mask1, uint32_t *field_value1, + uint8_t shift2, uint32_t mask2, uint32_t *field_value2, + uint8_t shift3, uint32_t mask3, uint32_t *field_value3, + uint8_t shift4, uint32_t mask4, uint32_t *field_value4, + uint8_t shift5, uint32_t mask5, uint32_t *field_value5, + uint8_t shift6, uint32_t mask6, uint32_t *field_value6) +{ + uint32_t reg_val = dm_read_reg(ctx, addr); + *field_value1 = get_reg_field_value_ex(reg_val, mask1, shift1); + *field_value2 = get_reg_field_value_ex(reg_val, mask2, shift2); + *field_value3 = get_reg_field_value_ex(reg_val, mask3, shift3); + *field_value4 = get_reg_field_value_ex(reg_val, mask4, shift4); + *field_value5 = get_reg_field_value_ex(reg_val, mask5, shift5); + *field_value6 = get_reg_field_value_ex(reg_val, mask6, shift6); + return reg_val; +} + +uint32_t generic_reg_get7(const struct dc_context *ctx, uint32_t addr, + uint8_t shift1, uint32_t mask1, uint32_t *field_value1, + uint8_t shift2, uint32_t mask2, uint32_t *field_value2, + uint8_t shift3, uint32_t mask3, uint32_t *field_value3, + uint8_t 
shift4, uint32_t mask4, uint32_t *field_value4, + uint8_t shift5, uint32_t mask5, uint32_t *field_value5, + uint8_t shift6, uint32_t mask6, uint32_t *field_value6, + uint8_t shift7, uint32_t mask7, uint32_t *field_value7) +{ + uint32_t reg_val = dm_read_reg(ctx, addr); + *field_value1 = get_reg_field_value_ex(reg_val, mask1, shift1); + *field_value2 = get_reg_field_value_ex(reg_val, mask2, shift2); + *field_value3 = get_reg_field_value_ex(reg_val, mask3, shift3); + *field_value4 = get_reg_field_value_ex(reg_val, mask4, shift4); + *field_value5 = get_reg_field_value_ex(reg_val, mask5, shift5); + *field_value6 = get_reg_field_value_ex(reg_val, mask6, shift6); + *field_value7 = get_reg_field_value_ex(reg_val, mask7, shift7); + return reg_val; +} + +uint32_t generic_reg_get8(const struct dc_context *ctx, uint32_t addr, + uint8_t shift1, uint32_t mask1, uint32_t *field_value1, + uint8_t shift2, uint32_t mask2, uint32_t *field_value2, + uint8_t shift3, uint32_t mask3, uint32_t *field_value3, + uint8_t shift4, uint32_t mask4, uint32_t *field_value4, + uint8_t shift5, uint32_t mask5, uint32_t *field_value5, + uint8_t shift6, uint32_t mask6, uint32_t *field_value6, + uint8_t shift7, uint32_t mask7, uint32_t *field_value7, + uint8_t shift8, uint32_t mask8, uint32_t *field_value8) +{ + uint32_t reg_val = dm_read_reg(ctx, addr); + *field_value1 = get_reg_field_value_ex(reg_val, mask1, shift1); + *field_value2 = get_reg_field_value_ex(reg_val, mask2, shift2); + *field_value3 = get_reg_field_value_ex(reg_val, mask3, shift3); + *field_value4 = get_reg_field_value_ex(reg_val, mask4, shift4); + *field_value5 = get_reg_field_value_ex(reg_val, mask5, shift5); + *field_value6 = get_reg_field_value_ex(reg_val, mask6, shift6); + *field_value7 = get_reg_field_value_ex(reg_val, mask7, shift7); + *field_value8 = get_reg_field_value_ex(reg_val, mask8, shift8); + return reg_val; +} /* note: va version of this is pretty bad idea, since there is a output parameter pass by pointer * compiler 
won't be able to check for size match and is prone to stack corruption type of bugs diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c index 4ca9b6e9a824..58062172cf3f 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c +++
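For readers following the generic_reg_get6/7/8 helpers above: each one reads the register once and then decodes several fields from that single value with a mask and a shift. The sketch below shows the pattern for two fields. It is not the driver code; reg_field() and reg_get2() are hypothetical names, and the mask-then-shift definition is an assumption about what get_reg_field_value_ex() does.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed behaviour of the field helper: keep the masked bits,
 * then shift the field down so it starts at bit 0. */
static uint32_t reg_field(uint32_t reg_val, uint32_t mask, uint8_t shift)
{
	assert(shift < 32);
	return (reg_val & mask) >> shift;
}

/* Hypothetical two-field variant of the generic_reg_getN pattern.
 * The raw value is passed in here instead of being read from hardware,
 * so the sketch can run anywhere. */
static uint32_t reg_get2(uint32_t reg_val,
		uint8_t shift1, uint32_t mask1, uint32_t *field_value1,
		uint8_t shift2, uint32_t mask2, uint32_t *field_value2)
{
	*field_value1 = reg_field(reg_val, mask1, shift1);
	*field_value2 = reg_field(reg_val, mask2, shift2);
	return reg_val;
}
```

With a raw value of 0xABCD1234, a low field (mask 0x0000FFFF, shift 0) decodes to 0x1234 and a high field (mask 0xFFFF0000, shift 16) decodes to 0xABCD. Writing one fixed-arity function per field count, as the patch does, keeps every output pointer visible to the compiler, which is the concern raised in the note about a va_list version.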
[PATCH 13/21] drm/amd/display: Move DCC support functions into dchubbub
From: Eric BernsteinAdded dchububu.h header file for common enum/struct definitions. Added new interface functions get_dcc_compression_cap, dcc_support_swizzle, dcc_support_pixel_format. Signed-off-by: Eric Bernstein Reviewed-by: Dmytro Laktyushkin Acked-by: Harry Wentland --- .../gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c| 221 +++- .../gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.h| 7 +- .../gpu/drm/amd/display/dc/dcn10/dcn10_resource.c | 231 + drivers/gpu/drm/amd/display/dc/inc/hw/dchubbub.h | 64 ++ 4 files changed, 291 insertions(+), 232 deletions(-) create mode 100644 drivers/gpu/drm/amd/display/dc/inc/hw/dchubbub.h diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c index 738f67ffd1b4..b9fb14a3224b 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c +++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c @@ -476,8 +476,227 @@ void hubbub1_toggle_watermark_change_req(struct hubbub *hubbub) DCHUBBUB_ARB_WATERMARK_CHANGE_REQUEST, watermark_change_req); } +static bool hubbub1_dcc_support_swizzle( + enum swizzle_mode_values swizzle, + unsigned int bytes_per_element, + enum segment_order *segment_order_horz, + enum segment_order *segment_order_vert) +{ + bool standard_swizzle = false; + bool display_swizzle = false; + + switch (swizzle) { + case DC_SW_4KB_S: + case DC_SW_64KB_S: + case DC_SW_VAR_S: + case DC_SW_4KB_S_X: + case DC_SW_64KB_S_X: + case DC_SW_VAR_S_X: + standard_swizzle = true; + break; + case DC_SW_4KB_D: + case DC_SW_64KB_D: + case DC_SW_VAR_D: + case DC_SW_4KB_D_X: + case DC_SW_64KB_D_X: + case DC_SW_VAR_D_X: + display_swizzle = true; + break; + default: + break; + } + + if (bytes_per_element == 1 && standard_swizzle) { + *segment_order_horz = segment_order__contiguous; + *segment_order_vert = segment_order__na; + return true; + } + if (bytes_per_element == 2 && standard_swizzle) { + *segment_order_horz = segment_order__non_contiguous; + *segment_order_vert = 
segment_order__contiguous; + return true; + } + if (bytes_per_element == 4 && standard_swizzle) { + *segment_order_horz = segment_order__non_contiguous; + *segment_order_vert = segment_order__contiguous; + return true; + } + if (bytes_per_element == 8 && standard_swizzle) { + *segment_order_horz = segment_order__na; + *segment_order_vert = segment_order__contiguous; + return true; + } + if (bytes_per_element == 8 && display_swizzle) { + *segment_order_horz = segment_order__contiguous; + *segment_order_vert = segment_order__non_contiguous; + return true; + } + + return false; +} + +static bool hubbub1_dcc_support_pixel_format( + enum surface_pixel_format format, + unsigned int *bytes_per_element) +{ + /* DML: get_bytes_per_element */ + switch (format) { + case SURFACE_PIXEL_FORMAT_GRPH_ARGB1555: + case SURFACE_PIXEL_FORMAT_GRPH_RGB565: + *bytes_per_element = 2; + return true; + case SURFACE_PIXEL_FORMAT_GRPH_ARGB: + case SURFACE_PIXEL_FORMAT_GRPH_ABGR: + case SURFACE_PIXEL_FORMAT_GRPH_ARGB2101010: + case SURFACE_PIXEL_FORMAT_GRPH_ABGR2101010: + *bytes_per_element = 4; + return true; + case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616: + case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616F: + case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616F: + *bytes_per_element = 8; + return true; + default: + return false; + } +} + +static void hubbub1_get_blk256_size(unsigned int *blk256_width, unsigned int *blk256_height, + unsigned int bytes_per_element) +{ + /* copied from DML. might want to refactor DML to leverage from DML */ + /* DML : get_blk256_size */ + if (bytes_per_element == 1) { + *blk256_width = 16; + *blk256_height = 16; + } else if (bytes_per_element == 2) { + *blk256_width = 16; + *blk256_height = 8; + } else if (bytes_per_element == 4) { + *blk256_width = 8; + *blk256_height = 8; + } else if (bytes_per_element == 8) { + *blk256_width = 8; + *blk256_height = 4; + } +} + +static void hubbub1_det_request_size( +
[PATCH 02/21] drm/amd/display: fix brightness level after resume from suspend
From: Roman Li

Add the missing call to cache the current backlight values.
Otherwise the brightness resets to its default value on resume.

Signed-off-by: Roman Li
Reviewed-by: Charlene Liu
Acked-by: Harry Wentland
---
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 13 +
 drivers/gpu/drm/amd/display/dc/dc_link.h | 2 ++
 drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c | 4 +++-
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
index 0cd286f8eaa0..b44cf52090a5 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
@@ -2018,6 +2018,19 @@ bool dc_link_set_backlight_level(const struct dc_link *link, uint32_t level,
 	return true;
 }
 
+bool dc_link_set_abm_disable(const struct dc_link *link)
+{
+	struct dc *core_dc = link->ctx->dc;
+	struct abm *abm = core_dc->res_pool->abm;
+
+	if ((abm == NULL) || (abm->funcs->set_backlight_level == NULL))
+		return false;
+
+	abm->funcs->set_abm_immediate_disable(abm);
+
+	return true;
+}
+
 bool dc_link_set_psr_enable(const struct dc_link *link, bool enable, bool wait)
 {
 	struct dc *core_dc = link->ctx->dc;
diff --git a/drivers/gpu/drm/amd/display/dc/dc_link.h b/drivers/gpu/drm/amd/display/dc/dc_link.h
index eeff98741293..8a716baa1203 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_link.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_link.h
@@ -141,6 +141,8 @@ static inline struct dc_link *dc_get_link_at_index(struct dc *dc, uint32_t link_
 bool dc_link_set_backlight_level(const struct dc_link *dc_link, uint32_t level,
 		uint32_t frame_ramp, const struct dc_stream_state *stream);
 
+bool dc_link_set_abm_disable(const struct dc_link *dc_link);
+
 bool dc_link_set_psr_enable(const struct dc_link *dc_link, bool enable, bool wait);
 
 bool dc_link_get_psr_state(const struct dc_link *dc_link, uint32_t *psr_state);
diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
index 308a1989fb94..71e4812217bb 100644
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
@@ -1046,8 +1046,10 @@ void dce110_blank_stream(struct pipe_ctx *pipe_ctx)
 	struct dc_stream_state *stream = pipe_ctx->stream;
 	struct dc_link *link = stream->sink->link;
 
-	if (link->local_sink && link->local_sink->sink_signal == SIGNAL_TYPE_EDP)
+	if (link->local_sink && link->local_sink->sink_signal == SIGNAL_TYPE_EDP) {
 		link->dc->hwss.edp_backlight_control(link, false);
+		dc_link_set_abm_disable(link);
+	}
 
 	if (dc_is_dp_signal(pipe_ctx->stream->signal))
 		pipe_ctx->stream_res.stream_enc->funcs->dp_blank(pipe_ctx->stream_res.stream_enc);
-- 
2.15.1
[PATCH 17/21] drm/amd/display: Check SCRATCH reg to determine S3 resume.
From: Yongqiang Sun

Lid state alone is not enough to decide whether to apply the fast boot
optimization. On S3 resume the BIOS is not involved in boot, so eDP has
not been lit up even though the lid is open. Applying the fast boot
optimization in that case makes the eDP panel skip link enable and
results in a black screen. Because the BIOS is not involved,
BIOS_SCRATCH_3 should read 0 for both UEFI and legacy boot, so use it to
detect this case.

Signed-off-by: Yongqiang Sun
Reviewed-by: Charlene Liu
Acked-by: Harry Wentland
---
 .../amd/display/dc/dce110/dce110_hw_sequencer.c | 33 ++
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
index 3ba057e2a467..9e1a8823d3d8 100644
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
@@ -1526,18 +1526,41 @@ static struct dc_link *get_link_for_edp_not_in_use(
  */
 void dce110_enable_accelerated_mode(struct dc *dc, struct dc_state *context)
 {
-	/* check eDP lid state:
-	 * If lid is open, vbios already light up eDP, so we can leverage vbios and skip eDP
-	 * programming
+	/* check eDP lid state and BIOS_SCRATCH_3 to determine fast boot optimization
+	 * UEFI boot
+	 *			edp_active_status_from_scratch	fast boot optimization
+	 * S4/S5 resume:
+	 * Lid Open		true				true
+	 * Lid Close		false				false
+	 *
+	 * S3/ resume:
+	 * Lid Open		false				false
+	 * Lid Close		false				false
+	 *
+	 * Legacy boot:
+	 *			edp_active_status_from_scratch	fast boot optimization
+	 * S4/S resume:
+	 * Lid Open		true				true
+	 * Lid Close		true				false
+	 *
+	 * S3/ resume:
+	 * Lid Open		false				false
+	 * Lid Close		false				false
 	 */
+	struct dc_bios *dcb = dc->ctx->dc_bios;
 	bool lid_state_closed = is_eDP_lid_closed(context);
 	struct dc_link *edp_link_to_turnoff = NULL;
+	bool edp_active_status_from_scratch =
+		(dcb->funcs->get_vga_enabled_displays(dc->ctx->dc_bios) == ATOM_DISPLAY_LCD1_ACTIVE);
 
+	/*Lid open*/
 	if (!lid_state_closed) {
 		edp_link_to_turnoff = get_link_for_edp_not_in_use(dc, context);
 
-		/* if OS doesn't light up eDP and eDP link is available, we want to disable */
-		if (!edp_link_to_turnoff)
+		/* if OS doesn't light up eDP and eDP link is available, we want to disable
+		 * If resume from S4/S5, should optimization.
+		 */
+		if (!edp_link_to_turnoff && edp_active_status_from_scratch)
 			dc->apply_edp_fast_boot_optimization = true;
 	}
 
-- 
2.15.1
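The decision this patch ends up with can be written as a pure function of three booleans, which makes the table in the comment easy to check by hand. The sketch below is only an illustration of that logic; want_edp_fast_boot() and its parameter names are hypothetical, and the real code derives the inputs from is_eDP_lid_closed(), get_link_for_edp_not_in_use() and get_vga_enabled_displays().

```c
#include <stdbool.h>

/* Fast boot optimization is applied only when all of these hold:
 *  - the lid is open,
 *  - there is no unused eDP link that should be turned off,
 *  - the scratch register reports the eDP panel as already active. */
static bool want_edp_fast_boot(bool lid_closed,
		bool have_edp_link_to_turn_off,
		bool edp_active_from_scratch)
{
	if (lid_closed)
		return false;

	return !have_edp_link_to_turn_off && edp_active_from_scratch;
}
```

Reading the rows of the table against this function: S4/S5 resume with the lid open and the scratch status true yields true; any S3 resume row has the scratch status false and yields false; the legacy boot row with the lid closed yields false even though the scratch status is true.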
[PATCH 06/21] drm/amd/display: Fix bug where refresh rate becomes fixed
From: Anthony KooThis issue occurs if refresh rate range is very small and lfc is not used. When frame spikes occur, refresh rate becomes fixed and will not restore properly Signed-off-by: Anthony Koo Reviewed-by: Aric Cyr Acked-by: Harry Wentland --- .../drm/amd/display/modules/freesync/freesync.c| 43 -- .../gpu/drm/amd/display/modules/inc/mod_freesync.h | 3 ++ 2 files changed, 26 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c index 4af73a72b9a9..be6a6c63b4cc 100644 --- a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c +++ b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c @@ -168,21 +168,6 @@ static unsigned int calc_v_total_from_duration( return v_total; } -static unsigned long long calc_nominal_field_rate(const struct dc_stream_state *stream) -{ - unsigned long long nominal_field_rate_in_uhz = 0; - - /* Calculate nominal field rate for stream */ - nominal_field_rate_in_uhz = stream->timing.pix_clk_khz; - nominal_field_rate_in_uhz *= 1000ULL * 1000ULL * 1000ULL; - nominal_field_rate_in_uhz = div_u64(nominal_field_rate_in_uhz, - stream->timing.h_total); - nominal_field_rate_in_uhz = div_u64(nominal_field_rate_in_uhz, - stream->timing.v_total); - - return nominal_field_rate_in_uhz; -} - static void update_v_total_for_static_ramp( struct core_freesync *core_freesync, const struct dc_stream_state *stream, @@ -441,10 +426,11 @@ static void apply_fixed_refresh(struct core_freesync *core_freesync, in_out_vrr->adjust.v_total_min; } else { in_out_vrr->adjust.v_total_min = - calc_v_total_from_refresh( - stream, in_out_vrr->max_refresh_in_uhz); + calc_v_total_from_refresh(stream, + in_out_vrr->max_refresh_in_uhz); in_out_vrr->adjust.v_total_max = - in_out_vrr->adjust.v_total_min; + calc_v_total_from_refresh(stream, + in_out_vrr->min_refresh_in_uhz); } } } @@ -638,7 +624,8 @@ void mod_freesync_build_vrr_params(struct mod_freesync *mod_freesync, 
core_freesync = MOD_FREESYNC_TO_CORE(mod_freesync); /* Calculate nominal field rate for stream */ - nominal_field_rate_in_uhz = calc_nominal_field_rate(stream); + nominal_field_rate_in_uhz = + mod_freesync_calc_nominal_field_rate(stream); min_refresh_in_uhz = in_config->min_refresh_in_uhz; max_refresh_in_uhz = in_config->max_refresh_in_uhz; @@ -888,6 +875,22 @@ void mod_freesync_get_settings(struct mod_freesync *mod_freesync, } } +unsigned long long mod_freesync_calc_nominal_field_rate( + const struct dc_stream_state *stream) +{ + unsigned long long nominal_field_rate_in_uhz = 0; + + /* Calculate nominal field rate for stream */ + nominal_field_rate_in_uhz = stream->timing.pix_clk_khz; + nominal_field_rate_in_uhz *= 1000ULL * 1000ULL * 1000ULL; + nominal_field_rate_in_uhz = div_u64(nominal_field_rate_in_uhz, + stream->timing.h_total); + nominal_field_rate_in_uhz = div_u64(nominal_field_rate_in_uhz, + stream->timing.v_total); + + return nominal_field_rate_in_uhz; +} + bool mod_freesync_is_valid_range(struct mod_freesync *mod_freesync, const struct dc_stream_state *stream, uint32_t min_refresh_cap_in_uhz, @@ -897,7 +900,7 @@ bool mod_freesync_is_valid_range(struct mod_freesync *mod_freesync, { /* Calculate nominal field rate for stream */ unsigned long long nominal_field_rate_in_uhz = - calc_nominal_field_rate(stream); + mod_freesync_calc_nominal_field_rate(stream); // Check nominal is within range if (nominal_field_rate_in_uhz > max_refresh_cap_in_uhz || diff --git a/drivers/gpu/drm/amd/display/modules/inc/mod_freesync.h b/drivers/gpu/drm/amd/display/modules/inc/mod_freesync.h index e7d77bb6209f..85c98afe9375 100644 --- a/drivers/gpu/drm/amd/display/modules/inc/mod_freesync.h +++ b/drivers/gpu/drm/amd/display/modules/inc/mod_freesync.h @@ -159,6 +159,9 @@ void mod_freesync_handle_v_update(struct mod_freesync *mod_freesync, const struct dc_stream_state *stream, struct mod_vrr_params *in_out_vrr); +unsigned long long
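As a worked example of the nominal field rate calculation that this patch moves into mod_freesync_calc_nominal_field_rate(): the pixel clock in kHz is scaled to micro-hertz (a factor of 10^9) and then divided by h_total and v_total. The sketch below mirrors that arithmetic outside the kernel. nominal_field_rate_uhz() is a hypothetical name, plain 64-bit division stands in for div_u64(), and the 1080p timing numbers in the note are an assumed example, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Nominal field rate in micro-hertz from pixel clock and total timings.
 * The multiply cannot overflow: a 32-bit kHz value times 10^9 is below
 * 2^64. */
static uint64_t nominal_field_rate_uhz(uint32_t pix_clk_khz,
		uint32_t h_total, uint32_t v_total)
{
	uint64_t rate;

	assert(h_total != 0 && v_total != 0);

	rate = pix_clk_khz;
	rate *= 1000ULL * 1000ULL * 1000ULL;	/* kHz to uHz */
	rate /= h_total;			/* lines per second, in uHz */
	rate /= v_total;			/* fields per second, in uHz */

	return rate;
}
```

For a 148500 kHz pixel clock with h_total = 2200 and v_total = 1125, this gives 148500 * 10^9 / 2200 / 1125 = 60000000 uHz, that is 60 Hz.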
[PATCH 15/21] drm/amd/display: HDMI has no sound after Panel power off/on
From: Charlene Liu

Signed-off-by: Charlene Liu
Reviewed-by: Krunoslav Kovac
Acked-by: Harry Wentland
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/amd/display/dc/dce/dce_stream_encoder.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_stream_encoder.c b/drivers/gpu/drm/amd/display/dc/dce/dce_stream_encoder.c
index 07c32421c226..84e26c894046 100644
--- a/drivers/gpu/drm/amd/display/dc/dce/dce_stream_encoder.c
+++ b/drivers/gpu/drm/amd/display/dc/dce/dce_stream_encoder.c
@@ -718,6 +718,8 @@ static void dce110_stream_encoder_update_hdmi_info_packets(
 	if (info_frame->avi.valid) {
 		const uint32_t *content =
 			(const uint32_t *) &info_frame->avi.sb[0];
+		/*we need turn on clock before programming AFMT block*/
+		REG_UPDATE(AFMT_CNTL, AFMT_AUDIO_CLOCK_EN, 1);
 
 		REG_WRITE(AFMT_AVI_INFO0, content[0]);
 
-- 
2.15.1
[PATCH 19/21] drm/amd/display: add calculated clock logging to DTN
From: Dmytro Laktyushkin

Signed-off-by: Dmytro Laktyushkin
Reviewed-by: Dmytro Laktyushkin
Acked-by: Harry Wentland
---
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
index c9d4e96084b7..468113d49c95 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
@@ -311,7 +311,16 @@ void dcn10_log_hw_state(struct dc *dc)
 		print_rq_dlg_ttu_regs(dc_ctx, );
 		DTN_INFO("\n");
 	}
-	DTN_INFO("\n");
+
+	DTN_INFO("\nCALCULATED Clocks: dcfclk_khz:%d dcfclk_deep_sleep_khz:%d dispclk_khz:%d\n"
+		"dppclk_khz:%d max_supported_dppclk_khz:%d fclk_khz:%d socclk_khz:%d\n\n",
+			dc->current_state->bw.dcn.calc_clk.dcfclk_khz,
+			dc->current_state->bw.dcn.calc_clk.dcfclk_deep_sleep_khz,
+			dc->current_state->bw.dcn.calc_clk.dispclk_khz,
+			dc->current_state->bw.dcn.calc_clk.dppclk_khz,
+			dc->current_state->bw.dcn.calc_clk.max_supported_dppclk_khz,
+			dc->current_state->bw.dcn.calc_clk.fclk_khz,
+			dc->current_state->bw.dcn.calc_clk.socclk_khz);
 
 	log_mpc_crc(dc);
 
-- 
2.15.1
[PATCH 00/21] DC Patches Apr 10, 2018
* Fix audio enablement on HDMI after panel power off/on
* Fix brightness after resume

Anthony Koo (7):
  drm/amd/display: add method to check for supported range
  drm/amd/display: Fix bug where refresh rate becomes fixed
  drm/amd/display: Fix bug that causes black screen
  drm/amd/display: Add back code to allow for rounding error
  drm/amd/display: Do not create memory allocation if stats not enabled
  drm/amd/display: fix LFC tearing at top of screen
  drm/amd/display: refactor vupdate interrupt registration

Charlene Liu (1):
  drm/amd/display: HDMI has no sound after Panel power off/on

Dmytro Laktyushkin (4):
  drm/amd/display: fix segfault on insufficient TG during validation
  drm/amd/display: change dml init to use default structs
  drm/amd/display: add rq/dlg/ttu to dtn log
  drm/amd/display: add calculated clock logging to DTN

Eric Bernstein (2):
  drm/amd/display: Move dp_pixel_encoding_type to stream_encoder include
  drm/amd/display: Move DCC support functions into dchubbub

Eric Yang (1):
  drm/amd/display: dal 3.1.42

Leo (Sunpeng) Li (1):
  drm/amd/display: Fix regamma not affecting full-intensity color values

Roman Li (1):
  drm/amd/display: fix brightness level after resume from suspend

Yongqiang Sun (3):
  drm/amd/display: Check lid state to determine fast boot optimization.
  drm/amd/display: Check SCRATCH reg to determine S3 resume.
  drm/amd/display: Use dig enable to determine fast boot optimization.

Yue Hin Lau (1):
  drm/amd/display: add missing colorspace for set black color

 .../gpu/drm/amd/display/dc/core/dc_hw_sequencer.c | 21 +-
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 13 ++
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 2 +-
 drivers/gpu/drm/amd/display/dc/dc.h | 2 +-
 drivers/gpu/drm/amd/display/dc/dc_helper.c | 59 ++
 drivers/gpu/drm/amd/display/dc/dc_link.h | 2 +
 .../gpu/drm/amd/display/dc/dce/dce_link_encoder.c | 6 +-
 .../gpu/drm/amd/display/dc/dce/dce_link_encoder.h | 2 +
 .../drm/amd/display/dc/dce/dce_stream_encoder.c | 2 +
 .../amd/display/dc/dce110/dce110_hw_sequencer.c | 43 ++--
 .../gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c | 221 +++-
 .../gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.h | 7 +-
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c | 153 +-
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.h | 19 +-
 .../drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 123 ++-
 .../gpu/drm/amd/display/dc/dcn10/dcn10_resource.c | 231 +
 .../gpu/drm/amd/display/dc/dml/display_mode_lib.c | 138 ++--
 drivers/gpu/drm/amd/display/dc/inc/hw/dchubbub.h | 64 ++
 drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h | 20 ++
 drivers/gpu/drm/amd/display/dc/inc/hw/hw_shared.h | 17 --
 .../gpu/drm/amd/display/dc/inc/hw/link_encoder.h | 1 +
 .../gpu/drm/amd/display/dc/inc/hw/stream_encoder.h | 19 ++
 drivers/gpu/drm/amd/display/dc/inc/reg_helper.h | 56 +
 .../drm/amd/display/modules/freesync/freesync.c | 127 +++
 .../gpu/drm/amd/display/modules/inc/mod_freesync.h | 10 +
 drivers/gpu/drm/amd/display/modules/stats/stats.c | 26 ++-
 26 files changed, 986 insertions(+), 398 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/display/dc/inc/hw/dchubbub.h

-- 
2.15.1
[PATCH 04/21] drm/amd/display: Fix regamma not affecting full-intensity color values
From: "Leo (Sunpeng) Li"

Hardware understands the regamma LUT as a piecewise linear function,
with points spaced exponentially along the range. We previously
programmed the LUT for range [2^-10, 2^0). This causes (normalized)
color values of 1 (=2^0) to miss the programmed LUT and fall into the
end region.

For DCE, the end region is extrapolated using a single (base, slope)
pair, using the max y-value from the last point in the curve as base.
This presents a problem, since this value affects all three color
channels. Scaling down the intensity of, say, the blue regamma curve
will not affect its end region. This is especially noticeable when
using RedShift. It scales down the blue and green channels, but leaves
full-intensity colors unshifted.

Therefore, extend the range to cover [2^-10, 2^1) by programming another
hardware segment, containing only one point. That way, we won't be
hitting the end region.

Note that things are a bit different for DCN, since the end region can
be set per-channel.

Signed-off-by: Leo (Sunpeng) Li
Reviewed-by: Krunoslav Kovac
Acked-by: Harry Wentland
---
 drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
index 71e4812217bb..0ff2a8092782 100644
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
@@ -456,10 +456,13 @@ dce110_translate_regamma_to_hw_format(const struct dc_transfer_func *output_tf,
 	} else {
 		/* 10 segments
-		 * segment is from 2^-10 to 2^0
+		 * segment is from 2^-10 to 2^1
+		 * We include an extra segment for range [2^0, 2^1). This is to
+		 * ensure that colors with normalized values of 1 don't miss the
+		 * LUT.
 		 */
 		region_start = -10;
-		region_end = 0;
+		region_end = 1;
 
 		seg_distr[0] = 4;
 		seg_distr[1] = 4;
@@ -471,7 +474,7 @@ dce110_translate_regamma_to_hw_format(const struct dc_transfer_func *output_tf,
 		seg_distr[7] = 4;
 		seg_distr[8] = 4;
 		seg_distr[9] = 4;
-		seg_distr[10] = -1;
+		seg_distr[10] = 0;
 		seg_distr[11] = -1;
 		seg_distr[12] = -1;
 		seg_distr[13] = -1;
-- 
2.15.1
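To make the effect of the extra segment concrete, here is a rough sketch that counts the hardware points implied by a seg_distr[] table. Two things are assumptions rather than facts from the patch: that seg_distr[k] holds log2 of the number of points in segment k (consistent with the commit message, where the new entry of 0 means a segment "containing only one point"), and that seg_distr[2] through seg_distr[6], which the diff context does not show, are also 4. count_hw_points() is a hypothetical helper name.

```c
#include <assert.h>

/* Sum the points over all segments between region_start and region_end.
 * Assumption: seg_distr[k] is log2(points in segment k); a negative
 * value marks an unused segment and ends the scan. */
static int count_hw_points(int region_start, int region_end,
		const int *seg_distr)
{
	int k;
	int total = 0;

	assert(region_end > region_start);

	for (k = 0; k < region_end - region_start; k++) {
		if (seg_distr[k] < 0)
			break;
		total += 1 << seg_distr[k];
	}

	return total;
}
```

Under those assumptions, the old table (ten segments of 16 points over [2^-10, 2^0)) has 160 points, and the new one (the same ten plus one single-point segment for [2^0, 2^1)) has 161. The extra point is what lets a normalized value of 1 land inside the programmed LUT instead of the shared end region.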
[PATCH 21/21] drm/amd/display: Use dig enable to determine fast boot optimization.
From: Yongqiang SunLinux doesn't know lid state, better to check dig enable value from register. Signed-off-by: Yongqiang Sun Reviewed-by: Tony Cheng Acked-by: Harry Wentland --- drivers/gpu/drm/amd/display/dc/dc_stream.h | 1 - .../gpu/drm/amd/display/dc/dce/dce_link_encoder.c | 6 ++- .../gpu/drm/amd/display/dc/dce/dce_link_encoder.h | 2 + .../amd/display/dc/dce110/dce110_hw_sequencer.c| 47 +++--- .../gpu/drm/amd/display/dc/inc/hw/link_encoder.h | 1 + 5 files changed, 21 insertions(+), 36 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dc_stream.h b/drivers/gpu/drm/amd/display/dc/dc_stream.h index 5f215ca38c07..046e87aa699a 100644 --- a/drivers/gpu/drm/amd/display/dc/dc_stream.h +++ b/drivers/gpu/drm/amd/display/dc/dc_stream.h @@ -98,7 +98,6 @@ struct dc_stream_state { int phy_pix_clk; enum signal_type signal; bool dpms_off; - bool lid_state_closed; struct dc_stream_status status; diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_link_encoder.c b/drivers/gpu/drm/amd/display/dc/dce/dce_link_encoder.c index 8167cad7bcf7..dbe3b26b6d9e 100644 --- a/drivers/gpu/drm/amd/display/dc/dce/dce_link_encoder.c +++ b/drivers/gpu/drm/amd/display/dc/dce/dce_link_encoder.c @@ -113,6 +113,7 @@ static const struct link_encoder_funcs dce110_lnk_enc_funcs = { .connect_dig_be_to_fe = dce110_link_encoder_connect_dig_be_to_fe, .enable_hpd = dce110_link_encoder_enable_hpd, .disable_hpd = dce110_link_encoder_disable_hpd, + .is_dig_enabled = dce110_is_dig_enabled, .destroy = dce110_link_encoder_destroy }; @@ -535,8 +536,9 @@ void dce110_psr_program_secondary_packet(struct link_encoder *enc, DP_SEC_GSP0_PRIORITY, 1); } -static bool is_dig_enabled(const struct dce110_link_encoder *enc110) +bool dce110_is_dig_enabled(struct link_encoder *enc) { + struct dce110_link_encoder *enc110 = TO_DCE110_LINK_ENC(enc); uint32_t value; REG_GET(DIG_BE_EN_CNTL, DIG_ENABLE, ); @@ -1031,7 +1033,7 @@ void dce110_link_encoder_disable_output( struct bp_transmitter_control cntl = { 0 }; enum bp_result 
result; - if (!is_dig_enabled(enc110)) { + if (!dce110_is_dig_enabled(enc)) { /* OF_SKIP_POWER_DOWN_INACTIVE_ENCODER */ return; } diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_link_encoder.h b/drivers/gpu/drm/amd/display/dc/dce/dce_link_encoder.h index 0ec3433d34b6..347069461a22 100644 --- a/drivers/gpu/drm/amd/display/dc/dce/dce_link_encoder.h +++ b/drivers/gpu/drm/amd/display/dc/dce/dce_link_encoder.h @@ -263,4 +263,6 @@ void dce110_psr_program_dp_dphy_fast_training(struct link_encoder *enc, void dce110_psr_program_secondary_packet(struct link_encoder *enc, unsigned int sdp_transmit_line_num_deadline); +bool dce110_is_dig_enabled(struct link_encoder *enc); + #endif /* __DC_LINK_ENCODER__DCE110_H__ */ diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c index 9e1a8823d3d8..5dbd4335cd6e 100644 --- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c +++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c @@ -1481,15 +1481,15 @@ static void disable_vga_and_power_gate_all_controllers( } } -static bool is_eDP_lid_closed(struct dc_state *context) +static struct dc_link *get_link_for_edp(struct dc *dc) { int i; - for (i = 0; i < context->stream_count; i++) { - if (context->streams[i]->signal == SIGNAL_TYPE_EDP) - return context->streams[i]->lid_state_closed; + for (i = 0; i < dc->link_count; i++) { + if (dc->links[i]->connector_signal == SIGNAL_TYPE_EDP) + return dc->links[i]; } - return false; + return NULL; } static struct dc_link *get_link_for_edp_not_in_use( @@ -1526,41 +1526,22 @@ static struct dc_link *get_link_for_edp_not_in_use( */ void dce110_enable_accelerated_mode(struct dc *dc, struct dc_state *context) { - /* check eDP lid state and BIOS_SCRATCH_3 to determine fast boot optimization -* UEFI boot -* edp_active_status_from_scratch fast boot optimization -* S4/S5 resume: -* Lid Open true true -* Lid Closefalse false -* -* S3/ resume: -* Lid Open false 
false -* Lid Closefalse false -* -* Legacy boot: -*
[PATCH 20/21] drm/amd/display: add missing colorspace for set black color
From: Yue Hin Lau

Signed-off-by: Yue Hin Lau
Reviewed-by: Tony Cheng
Acked-by: Harry Wentland
---
 .../gpu/drm/amd/display/dc/core/dc_hw_sequencer.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/core/dc_hw_sequencer.c
index ebc96b720083..ab50b5f0745c 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_hw_sequencer.c
@@ -208,6 +208,7 @@ void color_space_to_black_color(
 	case COLOR_SPACE_YCBCR709:
 	case COLOR_SPACE_YCBCR601_LIMITED:
 	case COLOR_SPACE_YCBCR709_LIMITED:
+	case COLOR_SPACE_2020_YCBCR:
 		*black_color = black_color_format[BLACK_COLOR_FORMAT_YUV_CV];
 		break;
 
@@ -216,7 +217,25 @@ void color_space_to_black_color(
 			black_color_format[BLACK_COLOR_FORMAT_RGB_LIMITED];
 		break;
 
-	default:
+	/**
+	 * Remove default and add case for all color space
+	 * so when we forget to add new color space
+	 * compiler will give a warning
+	 */
+	case COLOR_SPACE_UNKNOWN:
+	case COLOR_SPACE_SRGB:
+	case COLOR_SPACE_XR_RGB:
+	case COLOR_SPACE_MSREF_SCRGB:
+	case COLOR_SPACE_XV_YCC_709:
+	case COLOR_SPACE_XV_YCC_601:
+	case COLOR_SPACE_2020_RGB_FULLRANGE:
+	case COLOR_SPACE_2020_RGB_LIMITEDRANGE:
+	case COLOR_SPACE_ADOBERGB:
+	case COLOR_SPACE_DCIP3:
+	case COLOR_SPACE_DISPLAYNATIVE:
+	case COLOR_SPACE_DOLBYVISION:
+	case COLOR_SPACE_APPCTRL:
+	case COLOR_SPACE_CUSTOMPOINTS:
 		/* fefault is sRGB black (full range). */
 		*black_color =
 			black_color_format[BLACK_COLOR_FORMAT_RGB_FULLRANGE];
-- 
2.15.1
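The reason for dropping the default label, as the new comment in the patch says, is to get a compiler warning when a color space is added to the enum but not to the switch. A minimal sketch of the idea follows, using a small made-up enum; every demo_ and DEMO_ name is hypothetical and only stands in for the real color space and black color types.

```c
/* Hypothetical stand-ins for the driver's enums. */
enum demo_color_space {
	DEMO_CS_SRGB,
	DEMO_CS_SRGB_LIMITED,
	DEMO_CS_YCBCR709,
	DEMO_CS_2020_YCBCR
};

enum demo_black {
	DEMO_BLACK_RGB_FULL,
	DEMO_BLACK_RGB_LIMITED,
	DEMO_BLACK_YUV
};

static enum demo_black demo_black_for(enum demo_color_space cs)
{
	/* No default label: with GCC or Clang and -Wall (which enables
	 * -Wswitch), adding an enumerator to demo_color_space without a
	 * matching case here produces a warning. */
	switch (cs) {
	case DEMO_CS_YCBCR709:
	case DEMO_CS_2020_YCBCR:
		return DEMO_BLACK_YUV;
	case DEMO_CS_SRGB_LIMITED:
		return DEMO_BLACK_RGB_LIMITED;
	case DEMO_CS_SRGB:
		return DEMO_BLACK_RGB_FULL;
	}

	/* Only reached for a value outside the enum; fall back to
	 * full-range RGB black. */
	return DEMO_BLACK_RGB_FULL;
}
```

With a default label present, the same omission compiles silently and the new color space quietly falls back to full-range RGB black, which appears to be what happened to COLOR_SPACE_2020_YCBCR before this patch.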
[PATCH 05/21] drm/amd/display: add method to check for supported range
From: Anthony KooSigned-off-by: Anthony Koo Reviewed-by: Aric Cyr Acked-by: Harry Wentland --- .../drm/amd/display/modules/freesync/freesync.c| 64 -- .../gpu/drm/amd/display/modules/inc/mod_freesync.h | 7 +++ 2 files changed, 65 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c index 5e12e463c06a..4af73a72b9a9 100644 --- a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c +++ b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c @@ -168,6 +168,21 @@ static unsigned int calc_v_total_from_duration( return v_total; } +static unsigned long long calc_nominal_field_rate(const struct dc_stream_state *stream) +{ + unsigned long long nominal_field_rate_in_uhz = 0; + + /* Calculate nominal field rate for stream */ + nominal_field_rate_in_uhz = stream->timing.pix_clk_khz; + nominal_field_rate_in_uhz *= 1000ULL * 1000ULL * 1000ULL; + nominal_field_rate_in_uhz = div_u64(nominal_field_rate_in_uhz, + stream->timing.h_total); + nominal_field_rate_in_uhz = div_u64(nominal_field_rate_in_uhz, + stream->timing.v_total); + + return nominal_field_rate_in_uhz; +} + static void update_v_total_for_static_ramp( struct core_freesync *core_freesync, const struct dc_stream_state *stream, @@ -623,12 +638,7 @@ void mod_freesync_build_vrr_params(struct mod_freesync *mod_freesync, core_freesync = MOD_FREESYNC_TO_CORE(mod_freesync); /* Calculate nominal field rate for stream */ - nominal_field_rate_in_uhz = stream->timing.pix_clk_khz; - nominal_field_rate_in_uhz *= 1000ULL * 1000ULL * 1000ULL; - nominal_field_rate_in_uhz = div_u64(nominal_field_rate_in_uhz, - stream->timing.h_total); - nominal_field_rate_in_uhz = div_u64(nominal_field_rate_in_uhz, - stream->timing.v_total); + nominal_field_rate_in_uhz = calc_nominal_field_rate(stream); min_refresh_in_uhz = in_config->min_refresh_in_uhz; max_refresh_in_uhz = in_config->max_refresh_in_uhz; @@ -878,3 +888,45 @@ void 
mod_freesync_get_settings(struct mod_freesync *mod_freesync, } } +bool mod_freesync_is_valid_range(struct mod_freesync *mod_freesync, + const struct dc_stream_state *stream, + uint32_t min_refresh_cap_in_uhz, + uint32_t max_refresh_cap_in_uhz, + uint32_t min_refresh_request_in_uhz, + uint32_t max_refresh_request_in_uhz) +{ + /* Calculate nominal field rate for stream */ + unsigned long long nominal_field_rate_in_uhz = + calc_nominal_field_rate(stream); + + // Check nominal is within range + if (nominal_field_rate_in_uhz > max_refresh_cap_in_uhz || + nominal_field_rate_in_uhz < min_refresh_cap_in_uhz) + return false; + + // If nominal is less than max, limit the max allowed refresh rate + if (nominal_field_rate_in_uhz < max_refresh_cap_in_uhz) + max_refresh_cap_in_uhz = nominal_field_rate_in_uhz; + + // Don't allow min > max + if (min_refresh_request_in_uhz > max_refresh_request_in_uhz) + return false; + + // Check min is within range + if (min_refresh_request_in_uhz > max_refresh_cap_in_uhz || + min_refresh_request_in_uhz < min_refresh_cap_in_uhz) + return false; + + // Check max is within range + if (max_refresh_request_in_uhz > max_refresh_cap_in_uhz || + max_refresh_request_in_uhz < min_refresh_cap_in_uhz) + return false; + + // For variable range, check for at least 10 Hz range + if ((max_refresh_request_in_uhz != min_refresh_request_in_uhz) && + (max_refresh_request_in_uhz - min_refresh_request_in_uhz < 1000)) + return false; + + return true; +} + diff --git a/drivers/gpu/drm/amd/display/modules/inc/mod_freesync.h b/drivers/gpu/drm/amd/display/modules/inc/mod_freesync.h index bd75ca5f1cd3..e7d77bb6209f 100644 --- a/drivers/gpu/drm/amd/display/modules/inc/mod_freesync.h +++ b/drivers/gpu/drm/amd/display/modules/inc/mod_freesync.h @@ -159,4 +159,11 @@ void mod_freesync_handle_v_update(struct mod_freesync *mod_freesync, const struct dc_stream_state *stream, struct mod_vrr_params *in_out_vrr); +bool mod_freesync_is_valid_range(struct mod_freesync *mod_freesync, + 
const struct dc_stream_state *stream, + uint32_t min_refresh_cap_in_uhz, + uint32_t max_refresh_cap_in_uhz, + uint32_t
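The nominal field rate that calc_nominal_field_rate() factors out is the pixel clock divided by the total pixels per frame, scaled to micro-hertz. A user-space sketch of the same arithmetic is shown here for illustration. It takes the three timing values as plain arguments instead of a struct dc_stream_state, and ordinary 64-bit division stands in for the kernel's div_u64(); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Refresh rate in micro-hertz = pixel clock / (h_total * v_total).
 * The pixel clock is given in kHz, so kHz -> uHz is a factor of 10^9.
 */
static uint64_t demo_nominal_field_rate_in_uhz(uint32_t pix_clk_khz,
					       uint32_t h_total,
					       uint32_t v_total)
{
	uint64_t rate;

	assert(h_total != 0 && v_total != 0);

	/* Widen before multiplying so the product cannot overflow 32 bits. */
	rate = (uint64_t)pix_clk_khz * 1000ULL * 1000ULL * 1000ULL;
	rate /= h_total;
	rate /= v_total;

	return rate;
}
```

For a 148500 kHz pixel clock with h_total 2200 and v_total 1125 (the common 1080p60 timing), this gives 60000000 uHz, that is 60 Hz.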
[PATCH 09/21] drm/amd/display: change dml init to use default structs
From: Dmytro LaktyushkinSigned-off-by: Dmytro Laktyushkin Reviewed-by: Eric Bernstein Acked-by: Harry Wentland --- .../gpu/drm/amd/display/dc/dml/display_mode_lib.c | 138 - 1 file changed, 76 insertions(+), 62 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dml/display_mode_lib.c b/drivers/gpu/drm/amd/display/dc/dml/display_mode_lib.c index c109b2c34c8f..fd9d97aab071 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/display_mode_lib.c +++ b/drivers/gpu/drm/amd/display/dc/dml/display_mode_lib.c @@ -26,75 +26,89 @@ #include "display_mode_lib.h" #include "dc_features.h" +static const struct _vcs_dpi_ip_params_st dcn1_0_ip = { + .rob_buffer_size_kbytes = 64, + .det_buffer_size_kbytes = 164, + .dpte_buffer_size_in_pte_reqs = 42, + .dpp_output_buffer_pixels = 2560, + .opp_output_buffer_lines = 1, + .pixel_chunk_size_kbytes = 8, + .pte_enable = 1, + .pte_chunk_size_kbytes = 2, + .meta_chunk_size_kbytes = 2, + .writeback_chunk_size_kbytes = 2, + .line_buffer_size_bits = 589824, + .max_line_buffer_lines = 12, + .IsLineBufferBppFixed = 0, + .LineBufferFixedBpp = -1, + .writeback_luma_buffer_size_kbytes = 12, + .writeback_chroma_buffer_size_kbytes = 8, + .max_num_dpp = 4, + .max_num_wb = 2, + .max_dchub_pscl_bw_pix_per_clk = 4, + .max_pscl_lb_bw_pix_per_clk = 2, + .max_lb_vscl_bw_pix_per_clk = 4, + .max_vscl_hscl_bw_pix_per_clk = 4, + .max_hscl_ratio = 4, + .max_vscl_ratio = 4, + .hscl_mults = 4, + .vscl_mults = 4, + .max_hscl_taps = 8, + .max_vscl_taps = 8, + .dispclk_ramp_margin_percent = 1, + .underscan_factor = 1.10, + .min_vblank_lines = 14, + .dppclk_delay_subtotal = 90, + .dispclk_delay_subtotal = 42, + .dcfclk_cstate_latency = 10, + .max_inter_dcn_tile_repeaters = 8, + .can_vstartup_lines_exceed_vsync_plus_back_porch_lines_minus_one = 0, + .bug_forcing_LC_req_same_size_fixed = 0, +}; + +static const struct _vcs_dpi_soc_bounding_box_st dcn1_0_soc = { + .sr_exit_time_us = 9.0, + .sr_enter_plus_exit_time_us = 11.0, + .urgent_latency_us = 4.0, + 
.writeback_latency_us = 12.0, + .ideal_dram_bw_after_urgent_percent = 80.0, + .max_request_size_bytes = 256, + .downspread_percent = 0.5, + .dram_page_open_time_ns = 50.0, + .dram_rw_turnaround_time_ns = 17.5, + .dram_return_buffer_per_channel_bytes = 8192, + .round_trip_ping_latency_dcfclk_cycles = 128, + .urgent_out_of_order_return_per_channel_bytes = 256, + .channel_interleave_bytes = 256, + .num_banks = 8, + .num_chans = 2, + .vmm_page_size_bytes = 4096, + .dram_clock_change_latency_us = 17.0, + .writeback_dram_clock_change_latency_us = 23.0, + .return_bus_width_bytes = 64, +}; + static void set_soc_bounding_box(struct _vcs_dpi_soc_bounding_box_st *soc, enum dml_project project) { - if (project == DML_PROJECT_RAVEN1) { - soc->sr_exit_time_us = 9.0; - soc->sr_enter_plus_exit_time_us = 11.0; - soc->urgent_latency_us = 4.0; - soc->writeback_latency_us = 12.0; - soc->ideal_dram_bw_after_urgent_percent = 80.0; - soc->max_request_size_bytes = 256; - soc->downspread_percent = 0.5; - soc->dram_page_open_time_ns = 50.0; - soc->dram_rw_turnaround_time_ns = 17.5; - soc->dram_return_buffer_per_channel_bytes = 8192; - soc->round_trip_ping_latency_dcfclk_cycles = 128; - soc->urgent_out_of_order_return_per_channel_bytes = 256; - soc->channel_interleave_bytes = 256; - soc->num_banks = 8; - soc->num_chans = 2; - soc->vmm_page_size_bytes = 4096; - soc->dram_clock_change_latency_us = 17.0; - soc->writeback_dram_clock_change_latency_us = 23.0; - soc->return_bus_width_bytes = 64; - } else { - BREAK_TO_DEBUGGER(); /* Invalid Project Specified */ + switch (project) { + case DML_PROJECT_RAVEN1: + *soc = dcn1_0_soc; + break; + default: + ASSERT(0); + break; } } static void set_ip_params(struct _vcs_dpi_ip_params_st *ip, enum dml_project project) { - if (project == DML_PROJECT_RAVEN1) { - ip->rob_buffer_size_kbytes = 64; - ip->det_buffer_size_kbytes = 164; - ip->dpte_buffer_size_in_pte_reqs = 42; - ip->dpp_output_buffer_pixels = 2560; - ip->opp_output_buffer_lines = 1; - 
ip->pixel_chunk_size_kbytes = 8; - ip->pte_enable = 1; -
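The refactoring in this patch replaces a long run of per-field assignments with one struct copy from a static const default. A cut-down sketch of the pattern, with hypothetical type and function names and only a few of the fields from the patch, might look like this. Unlike the patch, which uses ASSERT(0) for an unknown project, the sketch returns an error code so it can be exercised in user space.

```c
#include <assert.h>

/* Hypothetical, cut-down parameter block; field names mirror the patch. */
struct demo_soc_bounding_box {
	double sr_exit_time_us;
	double urgent_latency_us;
	unsigned int num_banks;
	unsigned int num_chans;
};

/* Hypothetical stand-in for enum dml_project. */
enum demo_project {
	DEMO_PROJECT_UNDEFINED,
	DEMO_PROJECT_RAVEN1,
};

/* Read-only defaults; any field not named here is zero-initialised. */
static const struct demo_soc_bounding_box demo_dcn1_0_soc = {
	.sr_exit_time_us = 9.0,
	.urgent_latency_us = 4.0,
	.num_banks = 8,
	.num_chans = 2,
};

/* Returns 0 on success, -1 for a project without defaults. */
static int demo_set_soc_bounding_box(struct demo_soc_bounding_box *soc,
				     enum demo_project project)
{
	assert(soc != 0);

	switch (project) {
	case DEMO_PROJECT_RAVEN1:
		/* One struct copy replaces the per-field stores. */
		*soc = demo_dcn1_0_soc;
		return 0;
	default:
		return -1;
	}
}
```

Keeping the defaults in a const table also makes it straightforward to add a second project later: a new table plus one more case.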
[PATCH 16/21] drm/amd/display: refactor vupdate interrupt registration
From: Anthony Koo

We only need to register once OS calls the interrupt control.
Also, if we are entering static screen mode, disable after ramping
is done. Disable shall be done via timer of 2 seconds regardless of
ramping complete or not, just to simplify.

Also, ramp to mid instead of min, due to better flicker performance...

Signed-off-by: Anthony Koo
Reviewed-by: Aric Cyr
Acked-by: Harry Wentland
---
 .../gpu/drm/amd/display/modules/freesync/freesync.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
index daad60ec1ce3..349387eb9fe6 100644
--- a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
+++ b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
@@ -109,12 +109,6 @@ static unsigned int calc_duration_in_us_from_v_total(
 				* 1000) * stream->timing.h_total,
 					stream->timing.pix_clk_khz));
 
-	if (duration_in_us < in_vrr->min_duration_in_us)
-		duration_in_us = in_vrr->min_duration_in_us;
-
-	if (duration_in_us > in_vrr->max_duration_in_us)
-		duration_in_us = in_vrr->max_duration_in_us;
-
 	return duration_in_us;
 }
 
@@ -230,10 +224,9 @@ static void update_v_total_for_static_ramp(
 		}
 	}
 
-	v_total = calc_v_total_from_duration(stream,
-			in_out_vrr,
-			current_duration_in_us);
-
+	v_total = div64_u64(div64_u64(((unsigned long long)(
+			current_duration_in_us) * stream->timing.pix_clk_khz),
+			stream->timing.h_total), 1000);
 
 	in_out_vrr->adjust.v_total_min = v_total;
 	in_out_vrr->adjust.v_total_max = v_total;
@@ -702,7 +695,11 @@ void mod_freesync_build_vrr_params(struct mod_freesync *mod_freesync,
 	} else if (in_out_vrr->state == VRR_STATE_ACTIVE_FIXED) {
 		in_out_vrr->fixed.target_refresh_in_uhz =
 				in_out_vrr->min_refresh_in_uhz;
-		if (in_out_vrr->fixed.ramping_active) {
+		if (in_out_vrr->fixed.ramping_active &&
+				in_out_vrr->fixed.fixed_active) {
+			/* Do not update vtotals if ramping is already active
+			 * in order to continue ramp from current refresh.
+			 */
 			in_out_vrr->fixed.fixed_active = true;
 		} else {
 			in_out_vrr->fixed.fixed_active = true;
--
2.15.1
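The second hunk of this patch open-codes the conversion from a frame duration to a vertical total. As a rough illustration of the units involved: microseconds times kHz gives pixel counts scaled by 1000, so dividing by h_total and then by 1000 yields lines. The sketch below is a user-space approximation with a hypothetical function name, using plain 64-bit division where the kernel uses div64_u64().

```c
#include <assert.h>
#include <stdint.h>

/*
 * v_total = duration_in_us * pix_clk_khz / h_total / 1000
 *
 * us * kHz = 10^-6 s * 10^3 1/s = 10^-3 pixels, hence the final / 1000.
 */
static uint32_t demo_v_total_from_duration_us(uint32_t duration_in_us,
					      uint32_t pix_clk_khz,
					      uint32_t h_total)
{
	uint64_t v;

	assert(h_total != 0);

	/* Widen first: the product easily exceeds 32 bits. */
	v = (uint64_t)duration_in_us * pix_clk_khz;
	v /= h_total;	/* pixels -> lines, still scaled by 1000 */
	v /= 1000;

	return (uint32_t)v;
}
```

With a 148500 kHz pixel clock and h_total 2200, a 16667 us frame maps to 1125 lines and a 20000 us frame to 1350 lines. Note that this form applies no clamping to the min/max duration, which matches the open-coded expression in the hunk.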
[PATCH 01/21] drm/amd/display: dal 3.1.42
From: Eric Yang

Signed-off-by: Eric Yang
Reviewed-by: Anthony Koo
Acked-by: Harry Wentland
---
 drivers/gpu/drm/amd/display/dc/dc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dc.h b/drivers/gpu/drm/amd/display/dc/dc.h
index 0f566a1ba35b..7ac8a1bee5ac 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -38,7 +38,7 @@
 #include "inc/compressor.h"
 #include "dml/display_mode_lib.h"
 
-#define DC_VER "3.1.41"
+#define DC_VER "3.1.42"
 
 #define MAX_SURFACES 3
 #define MAX_STREAMS 6
--
2.15.1
[PATCH 14/21] drm/amd/display: fix LFC tearing at top of screen
From: Anthony Koo

Tearing occurred because new VTOTAL MIN/MAX was being programmed
too early.

The flip can happen within the VUPDATE high region, and the new min/max
would take effect immediately. But this means that frame is not variable
anymore, and tearing would occur when the flip actually happens.

The fixed insert duration should be programmed on the first VUPDATE
interrupt instead.

Signed-off-by: Anthony Koo
Reviewed-by: Aric Cyr
Acked-by: Harry Wentland
---
 drivers/gpu/drm/amd/display/modules/freesync/freesync.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
index abd5c9374eb3..daad60ec1ce3 100644
--- a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
+++ b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
@@ -371,12 +371,6 @@ static void apply_below_the_range(struct core_freesync *core_freesync,
 				inserted_frame_duration_in_us;
 		in_out_vrr->btr.frames_to_insert = frames_to_insert;
 		in_out_vrr->btr.frame_counter = frames_to_insert;
-
-		in_out_vrr->adjust.v_total_min =
-			calc_v_total_from_duration(stream, in_out_vrr,
-				in_out_vrr->btr.inserted_duration_in_us);
-		in_out_vrr->adjust.v_total_max =
-			in_out_vrr->adjust.v_total_min;
 	}
 }
 
--
2.15.1
[PATCH 10/21] drm/amd/display: Add back code to allow for rounding error
From: Anthony Koo

Signed-off-by: Anthony Koo
Reviewed-by: Aric Cyr
Acked-by: Harry Wentland
---
 drivers/gpu/drm/amd/display/modules/freesync/freesync.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
index 4887c888bbe7..abd5c9374eb3 100644
--- a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
+++ b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
@@ -896,6 +896,17 @@ bool mod_freesync_is_valid_range(struct mod_freesync *mod_freesync,
 	unsigned long long nominal_field_rate_in_uhz =
 			mod_freesync_calc_nominal_field_rate(stream);
 
+	/* Allow for some rounding error of actual video timing by taking ceil.
+	 * For example, 144 Hz mode timing may actually be 143.xxx Hz when
+	 * calculated from pixel rate and vertical/horizontal totals, but
+	 * this should be allowed instead of blocking FreeSync.
+	 */
+	nominal_field_rate_in_uhz = div_u64(nominal_field_rate_in_uhz, 100);
+	min_refresh_cap_in_uhz /= 100;
+	max_refresh_cap_in_uhz /= 100;
+	min_refresh_request_in_uhz /= 100;
+	max_refresh_request_in_uhz /= 100;
+
 	// Check nominal is within range
 	if (nominal_field_rate_in_uhz > max_refresh_cap_in_uhz ||
 		nominal_field_rate_in_uhz < min_refresh_cap_in_uhz)
@@ -921,7 +932,7 @@ bool mod_freesync_is_valid_range(struct mod_freesync *mod_freesync,
 
 	// For variable range, check for at least 10 Hz range
 	if ((max_refresh_request_in_uhz != min_refresh_request_in_uhz) &&
-		(max_refresh_request_in_uhz - min_refresh_request_in_uhz < 1000))
+		(max_refresh_request_in_uhz - min_refresh_request_in_uhz < 10))
 		return false;
 
 	return true;
--
2.15.1
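The problem described in the patch comment can be shown with a small example: a mode advertised as 144 Hz may compute to roughly 143.86 Hz, so a strict micro-hertz comparison rejects a 144 Hz request against it. The sketch below is only an illustration of the "taking ceil" idea in the comment. It assumes a comparison in whole hertz with a ceiling division, which is not the divisor shown in the archived hunk, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed unit for this sketch: 1 Hz = 1000000 uHz. */
#define DEMO_UHZ_PER_HZ 1000000ULL

/* Round a rate in micro-hertz up to the next whole hertz. */
static uint32_t demo_uhz_to_hz_ceil(uint64_t rate_in_uhz)
{
	return (uint32_t)((rate_in_uhz + DEMO_UHZ_PER_HZ - 1) / DEMO_UHZ_PER_HZ);
}

/* Strict check: the request must not exceed the nominal rate at all. */
static bool demo_max_request_ok_strict(uint64_t nominal_uhz,
				       uint32_t max_request_uhz)
{
	return max_request_uhz <= nominal_uhz;
}

/* Tolerant check: compare in whole hertz after rounding nominal up. */
static bool demo_max_request_ok_rounded(uint64_t nominal_uhz,
					uint32_t max_request_uhz)
{
	assert(nominal_uhz != 0);
	return max_request_uhz / DEMO_UHZ_PER_HZ <=
	       demo_uhz_to_hz_ceil(nominal_uhz);
}
```

With a nominal rate of 143856000 uHz and a requested maximum of 144000000 uHz, the strict check fails while the rounded check passes. A 144 Hz request against a 119.88 Hz mode is still rejected by both.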
[PATCH 12/21] drm/amd/display: Do not create memory allocation if stats not enabled
From: Anthony Koo

Signed-off-by: Anthony Koo
Reviewed-by: Aric Cyr
Acked-by: Harry Wentland
---
 drivers/gpu/drm/amd/display/modules/stats/stats.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/modules/stats/stats.c b/drivers/gpu/drm/amd/display/modules/stats/stats.c
index ed5f6809a64e..48e02197919f 100644
--- a/drivers/gpu/drm/amd/display/modules/stats/stats.c
+++ b/drivers/gpu/drm/amd/display/modules/stats/stats.c
@@ -115,18 +115,22 @@ struct mod_stats *mod_stats_create(struct dc *dc)
 			_data, sizeof(unsigned int), ))
 		core_stats->enabled = reg_data;
 
-	core_stats->entries = DAL_STATS_ENTRIES_REGKEY_DEFAULT;
-	if (dm_read_persistent_data(dc->ctx, NULL, NULL,
-			DAL_STATS_ENTRIES_REGKEY,
-			_data, sizeof(unsigned int), )) {
-		if (reg_data > DAL_STATS_ENTRIES_REGKEY_MAX)
-			core_stats->entries = DAL_STATS_ENTRIES_REGKEY_MAX;
-		else
-			core_stats->entries = reg_data;
-	}
+	if (core_stats->enabled) {
+		core_stats->entries = DAL_STATS_ENTRIES_REGKEY_DEFAULT;
+		if (dm_read_persistent_data(dc->ctx, NULL, NULL,
+				DAL_STATS_ENTRIES_REGKEY,
+				_data, sizeof(unsigned int), )) {
+			if (reg_data > DAL_STATS_ENTRIES_REGKEY_MAX)
+				core_stats->entries = DAL_STATS_ENTRIES_REGKEY_MAX;
+			else
+				core_stats->entries = reg_data;
+		}
 
-	core_stats->time = kzalloc(sizeof(struct stats_time_cache) * core_stats->entries,
-					GFP_KERNEL);
+		core_stats->time = kzalloc(sizeof(struct stats_time_cache) * core_stats->entries,
+						GFP_KERNEL);
+	} else {
+		core_stats->entries = 0;
+	}
 
 	if (core_stats->time == NULL)
 		goto fail_construct;
--
2.15.1
[PATCH] drm/amdkfd: fix clock counter retrieval for node without GPU
From: Andres Rodriguez

Currently if a user requests clock counters for a node without a GPU
resource we will always return EINVAL.

Instead if no GPU resource is attached, fill the gpu_clock_counter
argument with zeroes so that we may proceed and return valid CPU
counters.

Signed-off-by: Andres Rodriguez
Signed-off-by: Felix Kuehling
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index cd679cf..b5e5f0e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -749,12 +749,13 @@ static int kfd_ioctl_get_clock_counters(struct file *filep,
 	struct timespec64 time;
 
 	dev = kfd_device_by_id(args->gpu_id);
-	if (dev == NULL)
-		return -EINVAL;
-
-	/* Reading GPU clock counter from KGD */
-	args->gpu_clock_counter =
-		dev->kfd2kgd->get_gpu_clock_counter(dev->kgd);
+	if (dev)
+		/* Reading GPU clock counter from KGD */
+		args->gpu_clock_counter =
+			dev->kfd2kgd->get_gpu_clock_counter(dev->kgd);
+	else
+		/* Node without GPU resource */
+		args->gpu_clock_counter = 0;
 
 	/* No access to rdtsc. Using raw monotonic time */
 	getrawmonotonic64();
--
2.7.4
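The behavioural change in this patch is that a missing device no longer aborts the call: the GPU counter is zeroed and the CPU counters are still filled in. A stand-alone sketch of that control flow is given below. The structs, the callback and the cpu_now parameter are hypothetical simplifications made for illustration; they are not the real KFD types or ioctl arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a KFD device with a clock-counter callback. */
struct demo_dev {
	uint64_t (*get_gpu_clock_counter)(const struct demo_dev *dev);
	uint64_t fake_counter;
};

/* Hypothetical stand-in for the ioctl argument block. */
struct demo_clock_args {
	uint64_t gpu_clock_counter;
	uint64_t cpu_clock_counter;
};

/* Test helper: returns whatever counter value the device was given. */
static uint64_t demo_read_counter(const struct demo_dev *dev)
{
	return dev->fake_counter;
}

/* dev may be NULL for a node without a GPU resource. */
static int demo_get_clock_counters(const struct demo_dev *dev,
				   uint64_t cpu_now,
				   struct demo_clock_args *args)
{
	assert(args != NULL);

	if (dev)
		args->gpu_clock_counter = dev->get_gpu_clock_counter(dev);
	else
		args->gpu_clock_counter = 0;	/* no GPU: report zero */

	/* CPU-side counters are valid either way. */
	args->cpu_clock_counter = cpu_now;
	return 0;
}
```

A caller that previously treated the error as "no such node" would, under this scheme, have to look at a zero gpu_clock_counter instead.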
[PATCH] drm/amd/display: Fix 64-bit division in hwss_edp_power_control
Signed-off-by: Harry Wentland
---
 drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
index f32ccdfd18a3..3ba057e2a467 100644
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
@@ -860,7 +860,7 @@ void hwss_edp_power_control(
 					dm_get_elapse_time_in_ns(
 							ctx,
 							current_ts,
-							link->link_trace.time_stamp.edp_poweroff) / 100;
+							div64_u64(link->link_trace.time_stamp.edp_poweroff, 100));
 			unsigned long long wait_time_ms = 0;
 
 			/* max 500ms from LCDVDD off to on */
--
2.15.1
RE: RFC for a render API to support adaptive sync and VRR
> From: Haehnle, Nicolai > Sent: Tuesday, April 10, 2018 13:48 > On 10.04.2018 19:25, Cyr, Aric wrote: > >> -Original Message- > >> From: Michel Dänzer [mailto:mic...@daenzer.net] > >> Sent: Tuesday, April 10, 2018 13:16 > >> > >> On 2018-04-10 07:13 PM, Cyr, Aric wrote: > -Original Message- > From: Michel Dänzer [mailto:mic...@daenzer.net] > Sent: Tuesday, April 10, 2018 13:06 > On 2018-04-10 06:26 PM, Cyr, Aric wrote: > > From: Koenig, Christian Sent: Tuesday, April 10, 2018 11:43 > > > >> For video games we have a similar situation where a frame is rendered > >> for a certain world time and in the ideal case we would actually > >> display the frame at this world time. > > > > That seems like it would be a poorly written game that flips like > > that, unless they are explicitly trying to throttle the framerate for > > some reason. When a game presents a completed frame, they’d like > > that to happen as soon as possible. > > What you're describing is what most games have been doing traditionally. > Croteam's research shows that this results in micro-stuttering, because > frames may be presented too early. To avoid that, they want to > explicitly time each presentation as described by Christian. > >>> > >>> Yes, I agree completely. However that's only truly relevant for fixed > >>> refreshed rate displays. > >> > >> No, it also affects variable refresh; possibly even more in some cases, > >> because the presentation time is less predictable. > > > > Yes, and that's why you don't want to do it when you have variable refresh. > > The hardware in the monitor and GPU will do it for you, > so why bother? > > I think Michel's point is that the monitor and GPU hardware *cannot* > really do this, because there's synchronization with audio to take into > account, which the GPU or monitor don't know about. How does it work fine today given that all kernel seems to know is 'current' or 'current+1' vsyncs. Presumably the applications somehow schedule all this just fine. 
If this works without variable refresh for 60Hz, will it not work for a
fixed-rate "48Hz" monitor (assuming a 24Hz video)?

> Also, as I wrote separately, there's the case of synchronizing multiple
> monitors.

For multimonitor to work with VRR, they'll have to be timing and flip
synchronized. This is impossible for an application to manage; it needs
driver/HW control or you end up with one display flipping before the
other and it looks terrible. And definitely forget about multiGPU
without the professional workstation-type support needed to sync the
displays across adapters.

> > The input to their algorithms will be noisy, causing worse estimations.
> > If you just present as fast as you can, it'll just work (within reason).
> > The majority of gamers want maximum FPS for their games, and there's
> > quite frequently outrage at a particular game when they are limited to
> > something lower than what their monitor could otherwise support (i.e.
> > I don't want my game limited to 30Hz if I have a shiny 144Hz gaming
> > display I paid good money for). Of course, there's always
> > exceptions... but in our experience those are few and far between.
>
> I agree that games most likely shouldn't try to be smart. I'm curious
> about the Croteam findings, but even if they did a really clever thing
> that works better than just telling the display driver "display ASAP
> please", chances are that *most* developers won't do that. And they'll
> most likely get it wrong, so our guidance should really be "games should
> ask for ASAP presentation, and nothing else".

Right, I think this is the 'easy' case and is covered in Harry's initial
proposal when target_frame_duration_ns = 0.

> However, there *are* legitimate use cases for requesting a specific
> presentation time, and there *is* precedent of APIs that expose such
> features.
>
> Are there any real problems with exposing an absolute target present time?

Realistically, how far into the future are you requesting a presentation
time? Won't it almost always be something like
current_time+1000/video_frame_rate? If so, why not just tell the driver
to set 1000/video_frame_rate and have the GPU/monitor create nicely
spaced VSYNCs for you that match the source content?

In fact, you probably wouldn't even need to change your video player at
all, other than having it pass the target_frame_duration_ns. You could
consider this a 'hint' as you suggested, since it cannot be guaranteed
in cases where your driver or HW doesn't support variable refresh. If
the target_frame_duration_ns hint is supported/applied, then the video
app should have nothing extra to do that it wouldn't already do for any
arbitrary fixed-refresh rate display. If not supported (say the
drm_atomic_check fails with -EINVAL or something), the video app can
stop requesting a fixed target_frame_duration_ns.

A fundamental problem
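For reference, the arithmetic behind the 1000/video_frame_rate suggestion in this message can be sketched as below, expressed in nanoseconds to match the target_frame_duration_ns name used in the thread. The helper name and the rational frame-rate parameters are assumptions made for illustration; no real DRM property or ioctl is implied.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Frame duration in nanoseconds for a frame rate of rate_num / rate_den
 * frames per second, e.g. 24/1 or 24000/1001. The result is rounded down.
 */
static uint64_t demo_target_frame_duration_ns(uint32_t rate_num,
					      uint32_t rate_den)
{
	assert(rate_num != 0);

	return (1000000000ULL * rate_den) / rate_num;
}
```

Under these assumptions a 24 fps video gives 41666666 ns, 24000/1001 fps gives 41708333 ns, and 60 fps gives 16666666 ns. A player would compute the value once per stream rather than once per frame.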
Re: [PATCH] drm/amd/display: Don't spam debug messages
On 2018-04-10 04:44 PM, Harry Wentland wrote:

Ping

On 2018-04-09 02:06 PM, Harry Wentland wrote:

Signed-off-by: Harry Wentland

Reviewed-by: Leo (Sunpeng) Li

---
 drivers/gpu/drm/amd/display/include/logger_types.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/include/logger_types.h b/drivers/gpu/drm/amd/display/include/logger_types.h
index 4f332e80cecc..b608a0830801 100644
--- a/drivers/gpu/drm/amd/display/include/logger_types.h
+++ b/drivers/gpu/drm/amd/display/include/logger_types.h
@@ -32,7 +32,7 @@
 
 #define DC_LOG_ERROR(...) DRM_ERROR(__VA_ARGS__)
 #define DC_LOG_WARNING(...) DRM_WARN(__VA_ARGS__)
-#define DC_LOG_DEBUG(...) DRM_INFO(__VA_ARGS__)
+#define DC_LOG_DEBUG(...) DRM_DEBUG_KMS(__VA_ARGS__)
 #define DC_LOG_DC(...) DRM_DEBUG_KMS(__VA_ARGS__)
 #define DC_LOG_DTN(...) DRM_DEBUG_KMS(__VA_ARGS__)
 #define DC_LOG_SURFACE(...) pr_debug("[SURFACE]:"__VA_ARGS__)
[PATCH 19/21] drm/amdkfd: Add GFXv9 CWSR trap handler
Signed-off-by: Shaoyun LiuSigned-off-by: Jay Cornwall Signed-off-by: Felix Kuehling --- .../gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 1495 drivers/gpu/drm/amd/amdkfd/kfd_device.c| 13 +- 2 files changed, 1505 insertions(+), 3 deletions(-) create mode 100644 drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm new file mode 100644 index 000..da09794 --- /dev/null +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm @@ -0,0 +1,1495 @@ +/* + * Copyright 2016 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + */ + +#if 0 +HW (GFX9) source code for CWSR trap handler +#Version 18 + multiple trap handler + +// this performance-optimal version was originally from Seven Xu at SRDC + +// Revison #18 --... +/* Rev History +** #1. Branch from gc dv. 
//gfxip/gfx9/main/src/test/suites/block/cs/sr/cs_trap_handler.sp3#1,#50, #51, #52-53(Skip, Already Fixed by PV), #54-56(merged),#57-58(mergerd, skiped-already fixed by PV) +** #4. SR Memory Layout: +** 1. VGPR-SGPR-HWREG-{LDS} +** 2. tba_hi.bits.26 - reconfigured as the first wave in tg bits, for defer Save LDS for a threadgroup.. performance concern.. +** #5. Update: 1. Accurate g8sr_ts_save_d timestamp +** #6. Update: 1. Fix s_barrier usage; 2. VGPR s/r using swizzle buffer?(NoNeed, already matched the swizzle pattern, more investigation) +** #7. Update: 1. don't barrier if noLDS +** #8. Branch: 1. Branch to ver#0, which is very similar to gc dv version +**2. Fix SQ issue by s_sleep 2 +** #9. Update: 1. Fix scc restore failed issue, restore wave_status at last +**2. optimize s_buffer save by burst 16sgprs... +** #10. Update 1. Optimize restore sgpr by busrt 16 sgprs. +** #11. Update 1. Add 2 more timestamp for debug version +** #12. Update 1. Add VGPR SR using DWx4, some case improve and some case drop performance +** #13. Integ 1. Always use MUBUF for PV trap shader... +** #14. Update 1. s_buffer_store soft clause... +** #15. Update 1. PERF - sclar write with glc:0/mtype0 to allow L2 combine. perf improvement a lot. +** #16. Update 1. PRRF - UNROLL LDS_DMA got 2500cycle save in IP tree +** #17. Update 1. FUNC - LDS_DMA has issues while ATC, replace with ds_read/buffer_store for save part[TODO restore part] +**2. PERF - Save LDS before save VGPR to cover LDS save long latency... +** #18. Update 1. FUNC - Implicitly estore STATUS.VCCZ, which is not writable by s_setreg_b32 +**2. FUNC - Handle non-CWSR traps +*/ + +var G8SR_WDMEM_HWREG_OFFSET = 0 +var G8SR_WDMEM_SGPR_OFFSET = 128 // in bytes + +// Keep definition same as the app shader, These 2 time stamps are part of the app shader... Should before any Save and after restore. 
+ +var G8SR_DEBUG_TIMESTAMP = 0 +var G8SR_DEBUG_TS_SAVE_D_OFFSET = 40*4 // ts_save_d timestamp offset relative to SGPR_SR_memory_offset +var s_g8sr_ts_save_s = s[34:35] // save start +var s_g8sr_ts_sq_save_msg = s[36:37] // The save shader send SAVEWAVE msg to spi +var s_g8sr_ts_spi_wrexec = s[38:39] // the SPI write the sr address to SQ +var s_g8sr_ts_save_d = s[40:41] // save end +var s_g8sr_ts_restore_s = s[42:43] // restore start +var s_g8sr_ts_restore_d = s[44:45] // restore end + +var G8SR_VGPR_SR_IN_DWX4 = 0 +var G8SR_SAVE_BUF_RSRC_WORD1_STRIDE_DWx4 = 0x0010 // DWx4 stride is 4*4Bytes +var G8SR_RESTORE_BUF_RSRC_WORD1_STRIDE_DWx4 = G8SR_SAVE_BUF_RSRC_WORD1_STRIDE_DWx4 + + +/*/ +/* control on how to run the shader */ +/*/ +//any
[PATCH 15/21] drm/amdkfd: Fix kernel queue rollback_packet
kq->queue->properties.write_ptr is a GPU address which can't be
dereferenced in the kernel. Use kq->wptr_kernel instead, which is the
kernel CPU address of the same buffer.

Signed-off-by: Felix Kuehling
---
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
index 23e586b..9f38161 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
@@ -279,7 +279,7 @@ static void submit_packet(struct kernel_queue *kq)
 
 static void rollback_packet(struct kernel_queue *kq)
 {
-	kq->pending_wptr = *kq->queue->properties.write_ptr;
+	kq->pending_wptr = *kq->wptr_kernel;
 }
 
 struct kernel_queue *kernel_queue_init(struct kfd_dev *dev,
--
2.7.4
[PATCH 16/21] drm/amdkfd: Add 64-bit doorbell and wptr support to kernel queue
Signed-off-by: Felix Kuehling--- drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 10 + drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 25 +-- drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h | 7 ++- drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c | 9 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c | 9 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c | 9 drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 1 + 7 files changed, 63 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c index 36c9269e..5d7 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c @@ -214,6 +214,16 @@ void write_kernel_doorbell(void __iomem *db, u32 value) } } +void write_kernel_doorbell64(void __iomem *db, u64 value) +{ + if (db) { + WARN(((unsigned long)db & 7) != 0, +"Unaligned 64-bit doorbell"); + writeq(value, (u64 __iomem *)db); + pr_debug("writing %llu to doorbell address 0x%p\n", value, db); + } +} + unsigned int kfd_doorbell_id_to_offset(struct kfd_dev *kfd, struct kfd_process *process, unsigned int doorbell_id) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c index 9f38161..476951d 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c @@ -99,7 +99,7 @@ static bool initialize(struct kernel_queue *kq, struct kfd_dev *dev, kq->rptr_kernel = kq->rptr_mem->cpu_ptr; kq->rptr_gpu_addr = kq->rptr_mem->gpu_addr; - retval = kfd_gtt_sa_allocate(dev, sizeof(*kq->wptr_kernel), + retval = kfd_gtt_sa_allocate(dev, dev->device_info->doorbell_size, >wptr_mem); if (retval != 0) @@ -208,6 +208,7 @@ static int acquire_packet_buffer(struct kernel_queue *kq, size_t available_size; size_t queue_size_dwords; uint32_t wptr, rptr; + uint64_t wptr64; unsigned int *queue_address; /* When rptr == wptr, the buffer is empty. 
@@ -216,7 +217,8 @@ static int acquire_packet_buffer(struct kernel_queue *kq, * the opposite. So we can only use up to queue_size_dwords - 1 dwords. */ rptr = *kq->rptr_kernel; - wptr = *kq->wptr_kernel; + wptr = kq->pending_wptr; + wptr64 = kq->pending_wptr64; queue_address = (unsigned int *)kq->pq_kernel_addr; queue_size_dwords = kq->queue->properties.queue_size / 4; @@ -246,11 +248,13 @@ static int acquire_packet_buffer(struct kernel_queue *kq, while (wptr > 0) { queue_address[wptr] = kq->nop_packet; wptr = (wptr + 1) % queue_size_dwords; + wptr64++; } } *buffer_ptr = _address[wptr]; kq->pending_wptr = wptr + packet_size_in_dwords; + kq->pending_wptr64 = wptr64 + packet_size_in_dwords; return 0; @@ -272,14 +276,18 @@ static void submit_packet(struct kernel_queue *kq) pr_debug("\n"); #endif - *kq->wptr_kernel = kq->pending_wptr; - write_kernel_doorbell(kq->queue->properties.doorbell_ptr, - kq->pending_wptr); + kq->ops_asic_specific.submit_packet(kq); } static void rollback_packet(struct kernel_queue *kq) { - kq->pending_wptr = *kq->wptr_kernel; + if (kq->dev->device_info->doorbell_size == 8) { + kq->pending_wptr64 = *kq->wptr64_kernel; + kq->pending_wptr = *kq->wptr_kernel % + (kq->queue->properties.queue_size / 4); + } else { + kq->pending_wptr = *kq->wptr_kernel; + } } struct kernel_queue *kernel_queue_init(struct kfd_dev *dev, @@ -310,6 +318,11 @@ struct kernel_queue *kernel_queue_init(struct kfd_dev *dev, case CHIP_HAWAII: kernel_queue_init_cik(>ops_asic_specific); break; + + case CHIP_VEGA10: + case CHIP_RAVEN: + kernel_queue_init_v9(>ops_asic_specific); + break; default: WARN(1, "Unexpected ASIC family %u", dev->device_info->asic_family); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h index 5940531..97aff20 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h @@ -72,6 +72,7 @@ struct kernel_queue { struct kfd_dev *dev; struct mqd_manager 
*mqd; struct queue*queue; + uint64_tpending_wptr64; uint32_t
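
For readers following the 64-bit wptr change in the patch above, here is a minimal user-space sketch of the bookkeeping it appears to introduce: the wrapping `pending_wptr` indexes into the ring, while `pending_wptr64` only ever increases, including across the NOP fill at the end of the ring. The type `kq_sketch`, the function `kq_sketch_acquire` and the `SKETCH_NOP` value are hypothetical stand-ins, and the free-space check is an assumption, since that part of `acquire_packet_buffer()` is not shown in the diff.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NOP 0xffff1000u /* hypothetical filler dword, not a real PM4 NOP */

/* Hypothetical, simplified stand-in for struct kernel_queue. */
struct kq_sketch {
	uint32_t *ring;          /* ring buffer storage */
	uint32_t size_dwords;    /* ring size in dwords */
	uint32_t rptr;           /* consumer position, in dwords */
	uint32_t pending_wptr;   /* write position, wraps at size_dwords */
	uint64_t pending_wptr64; /* write position, never wraps */
};

/* Reserve ndw dwords. Returns 0 and sets *out, or -1 if there is no space. */
static int kq_sketch_acquire(struct kq_sketch *kq, uint32_t ndw, uint32_t **out)
{
	uint32_t wptr = kq->pending_wptr;
	uint64_t wptr64 = kq->pending_wptr64;
	/* Assumed free-space rule: one dword always stays unused. */
	uint32_t avail = (kq->rptr + kq->size_dwords - 1 - wptr) % kq->size_dwords;

	if (ndw > avail)
		return -1;

	if (wptr + ndw >= kq->size_dwords) {
		/* The packet must also fit at position 0, before rptr. */
		if (ndw >= kq->rptr)
			return -1;
		/* Fill the tail with NOPs; both write pointers advance together. */
		while (wptr > 0) {
			kq->ring[wptr] = SKETCH_NOP;
			wptr = (wptr + 1) % kq->size_dwords;
			wptr64++;
		}
	}

	*out = &kq->ring[wptr];
	kq->pending_wptr = wptr + ndw;
	kq->pending_wptr64 = wptr64 + ndw;
	return 0;
}
```

In this model `pending_wptr64 % size_dwords` always equals `pending_wptr`, which matches the way the patch's `rollback_packet()` derives the 32-bit value from the 64-bit one on ASICs with 8-byte doorbells.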
[PATCH 12/21] drm/amdkfd: Add GFXv9 device queue manager
Signed-off-by: John BridgmanSigned-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/Makefile| 2 +- .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 10 ++- .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 2 + .../drm/amd/amdkfd/kfd_device_queue_manager_v9.c | 84 ++ drivers/gpu/drm/amd/amdkfd/kfd_module.c| 5 ++ drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 5 ++ 6 files changed, 106 insertions(+), 2 deletions(-) create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile index 094b591..ff8b5aa 100644 --- a/drivers/gpu/drm/amd/amdkfd/Makefile +++ b/drivers/gpu/drm/amd/amdkfd/Makefile @@ -35,7 +35,7 @@ amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \ kfd_kernel_queue_vi.o kfd_kernel_queue_v9.o \ kfd_packet_manager.o kfd_process_queue_manager.o \ kfd_device_queue_manager.o kfd_device_queue_manager_cik.o \ - kfd_device_queue_manager_vi.o \ + kfd_device_queue_manager_vi.o kfd_device_queue_manager_v9.o \ kfd_interrupt.o kfd_events.o cik_event_interrupt.o \ kfd_dbgdev.o kfd_dbgmgr.o kfd_crat.o diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index 500f022..9af94b1 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -1386,7 +1386,10 @@ static bool set_cache_memory_policy(struct device_queue_manager *dqm, void __user *alternate_aperture_base, uint64_t alternate_aperture_size) { - bool retval; + bool retval = true; + + if (!dqm->asic_ops.set_cache_memory_policy) + return retval; mutex_lock(>lock); @@ -1655,6 +1658,11 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev) case CHIP_POLARIS11: device_queue_manager_init_vi_tonga(>asic_ops); break; + + case CHIP_VEGA10: + case CHIP_RAVEN: + device_queue_manager_init_v9(>asic_ops); + break; default: WARN(1, "Unexpected ASIC family 
%u", dev->device_info->asic_family); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h index 412beff..59a6b19 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h @@ -200,6 +200,8 @@ void device_queue_manager_init_vi( struct device_queue_manager_asic_ops *asic_ops); void device_queue_manager_init_vi_tonga( struct device_queue_manager_asic_ops *asic_ops); +void device_queue_manager_init_v9( + struct device_queue_manager_asic_ops *asic_ops); void program_sh_mem_settings(struct device_queue_manager *dqm, struct qcm_process_device *qpd); unsigned int get_queues_num(struct device_queue_manager *dqm); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c new file mode 100644 index 000..79e5bcf --- /dev/null +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c @@ -0,0 +1,84 @@ +/* + * Copyright 2016-2018 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + * + */ + +#include "kfd_device_queue_manager.h" +#include "vega10_enum.h" +#include "gc/gc_9_0_offset.h" +#include "gc/gc_9_0_sh_mask.h" +#include "sdma0/sdma0_4_0_sh_mask.h" + +static int update_qpd_v9(struct device_queue_manager *dqm, +
[PATCH 10/21] drm/amdkfd: Add GFXv9 PM4 packet writer functions
Signed-off-by: Shaoyun LiuSigned-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/Makefile | 7 +- drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c | 331 + drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c | 18 +- drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c | 4 + drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h | 583 +++ drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 6 + 6 files changed, 937 insertions(+), 12 deletions(-) create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile index 0d02422..52b3c1b 100644 --- a/drivers/gpu/drm/amd/amdkfd/Makefile +++ b/drivers/gpu/drm/amd/amdkfd/Makefile @@ -31,9 +31,10 @@ amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \ kfd_process.o kfd_queue.o kfd_mqd_manager.o \ kfd_mqd_manager_cik.o kfd_mqd_manager_vi.o \ kfd_kernel_queue.o kfd_kernel_queue_cik.o \ - kfd_kernel_queue_vi.o kfd_packet_manager.o \ - kfd_process_queue_manager.o kfd_device_queue_manager.o \ - kfd_device_queue_manager_cik.o kfd_device_queue_manager_vi.o \ + kfd_kernel_queue_vi.o kfd_kernel_queue_v9.o \ + kfd_packet_manager.o kfd_process_queue_manager.o \ + kfd_device_queue_manager.o kfd_device_queue_manager_cik.o \ + kfd_device_queue_manager_vi.o \ kfd_interrupt.o kfd_events.o cik_event_interrupt.o \ kfd_dbgdev.o kfd_dbgmgr.o kfd_crat.o diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c new file mode 100644 index 000..ece7d59 --- /dev/null +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c @@ -0,0 +1,331 @@ +/* + * Copyright 2016-2018 Advanced Micro Devices, Inc. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. 
+ * + */ + +#include "kfd_kernel_queue.h" +#include "kfd_device_queue_manager.h" +#include "kfd_pm4_headers_ai.h" +#include "kfd_pm4_opcodes.h" + +static bool initialize_v9(struct kernel_queue *kq, struct kfd_dev *dev, + enum kfd_queue_type type, unsigned int queue_size); +static void uninitialize_v9(struct kernel_queue *kq); + +void kernel_queue_init_v9(struct kernel_queue_ops *ops) +{ + ops->initialize = initialize_v9; + ops->uninitialize = uninitialize_v9; +} + +static bool initialize_v9(struct kernel_queue *kq, struct kfd_dev *dev, + enum kfd_queue_type type, unsigned int queue_size) +{ + int retval; + + retval = kfd_gtt_sa_allocate(dev, PAGE_SIZE, >eop_mem); + if (retval) + return false; + + kq->eop_gpu_addr = kq->eop_mem->gpu_addr; + kq->eop_kernel_addr = kq->eop_mem->cpu_ptr; + + memset(kq->eop_kernel_addr, 0, PAGE_SIZE); + + return true; +} + +static void uninitialize_v9(struct kernel_queue *kq) +{ + kfd_gtt_sa_free(kq->dev, kq->eop_mem); +} + +static int pm_map_process_v9(struct packet_manager *pm, + uint32_t *buffer, struct qcm_process_device *qpd) +{ + struct pm4_mes_map_process *packet; + uint64_t vm_page_table_base_addr = + (uint64_t)(qpd->page_table_base) << 12; + + packet = (struct pm4_mes_map_process *)buffer; + memset(buffer, 0, sizeof(struct pm4_mes_map_process)); + + packet->header.u32All = pm_build_pm4_header(IT_MAP_PROCESS, + sizeof(struct pm4_mes_map_process)); + packet->bitfields2.diq_enable = (qpd->is_debug) ? 1 : 0; + packet->bitfields2.process_quantum = 1; + packet->bitfields2.pasid = qpd->pqm->process->pasid; + packet->bitfields14.gds_size = qpd->gds_size;
[PATCH 20/21] drm/amdkfd: Try to enable atomics for all GPUs
From: welu

Report failure to enable atomics only on GPUs that require them. This
allows GPUs that don't require atomics to function, but can benefit if
they are available.

This is the case for Vega10, which doesn't use atomics for basic
functioning of the MEC, AQL and HWS microcode. So it can work without
atomics. But shader programs can still use atomic instructions on
systems that support PCIe atomics.

Signed-off-by: welu
Signed-off-by: Felix Kuehling
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 053f1d0..ea95f3b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -290,7 +290,7 @@ struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd,
 	struct pci_dev *pdev, const struct kfd2kgd_calls *f2g)
 {
 	struct kfd_dev *kfd;
-
+	int ret;
 	const struct kfd_device_info *device_info =
 					lookup_device_info(pdev->device);
 
@@ -299,19 +299,18 @@ struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd,
 		return NULL;
 	}
 
-	if (device_info->needs_pci_atomics) {
-		/* Allow BIF to recode atomics to PCIe 3.0
-		 * AtomicOps. 32 and 64-bit requests are possible and
-		 * must be supported.
-		 */
-		if (pci_enable_atomic_ops_to_root(pdev,
-				PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
-				PCI_EXP_DEVCAP2_ATOMIC_COMP64) < 0) {
-			dev_info(kfd_device,
-				"skipped device %x:%x, PCI rejects atomics",
-				 pdev->vendor, pdev->device);
-			return NULL;
-		}
+	/* Allow BIF to recode atomics to PCIe 3.0 AtomicOps.
+	 * 32 and 64-bit requests are possible and must be
+	 * supported.
+	 */
+	ret = pci_enable_atomic_ops_to_root(pdev,
+			PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
+			PCI_EXP_DEVCAP2_ATOMIC_COMP64);
+	if (device_info->needs_pci_atomics && ret < 0) {
+		dev_info(kfd_device,
+			 "skipped device %x:%x, PCI rejects atomics\n",
+			 pdev->vendor, pdev->device);
+		return NULL;
 	}
 
 	kfd = kzalloc(sizeof(*kfd), GFP_KERNEL);
-- 
2.7.4
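
The policy change in this patch can be sketched in isolation, assuming only what the diff shows: enabling atomics is now always attempted, and a failure aborts the probe only when the device info says atomics are required. The struct `device_info_sketch` and the function `probe_may_continue` are hypothetical names for illustration; `enable_ret` models the value returned by `pci_enable_atomic_ops_to_root()`, where a negative value means failure.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the one kfd_device_info field used here. */
struct device_info_sketch {
	bool needs_pci_atomics;
};

/*
 * Returns true if probing may continue. enable_ret models the result of
 * the attempt to enable PCIe atomics: negative on failure, 0 on success.
 */
static bool probe_may_continue(const struct device_info_sketch *info,
			       int enable_ret)
{
	/* Only devices that require atomics are skipped on failure. */
	if (info->needs_pci_atomics && enable_ret < 0)
		return false;
	return true;
}
```

With this rule, a device such as Vega10 (whose device info in this series sets `needs_pci_atomics = false`) continues probing even when the platform rejects atomics, while a device that requires them is still skipped.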
[PATCH 11/21] drm/amdkfd: Add GFXv9 MQD manager
Signed-off-by: John BridgmanSigned-off-by: Jay Cornwall Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/Makefile | 1 + drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c| 3 + drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c | 443 drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 3 + 5 files changed, 451 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile index 52b3c1b..094b591 100644 --- a/drivers/gpu/drm/amd/amdkfd/Makefile +++ b/drivers/gpu/drm/amd/amdkfd/Makefile @@ -30,6 +30,7 @@ amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \ kfd_pasid.o kfd_doorbell.o kfd_flat_memory.o \ kfd_process.o kfd_queue.o kfd_mqd_manager.o \ kfd_mqd_manager_cik.o kfd_mqd_manager_vi.o \ + kfd_mqd_manager_v9.o \ kfd_kernel_queue.o kfd_kernel_queue_cik.o \ kfd_kernel_queue_vi.o kfd_kernel_queue_v9.o \ kfd_packet_manager.o kfd_process_queue_manager.o \ diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index f563acb..c368ce3 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -700,7 +700,7 @@ int kfd_gtt_sa_allocate(struct kfd_dev *kfd, unsigned int size, if (size > kfd->gtt_sa_num_of_chunks * kfd->gtt_sa_chunk_size) return -ENOMEM; - *mem_obj = kmalloc(sizeof(struct kfd_mem_obj), GFP_NOIO); + *mem_obj = kzalloc(sizeof(struct kfd_mem_obj), GFP_NOIO); if ((*mem_obj) == NULL) return -ENOMEM; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c index ee7061e..4b8eb50 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c @@ -38,6 +38,9 @@ struct mqd_manager *mqd_manager_init(enum KFD_MQD_TYPE type, case CHIP_POLARIS10: case CHIP_POLARIS11: return mqd_manager_init_vi_tonga(type, dev); + case CHIP_VEGA10: 
+ case CHIP_RAVEN: + return mqd_manager_init_v9(type, dev); default: WARN(1, "Unexpected ASIC family %u", dev->device_info->asic_family); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c new file mode 100644 index 000..684054f --- /dev/null +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c @@ -0,0 +1,443 @@ +/* + * Copyright 2016-2018 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. 
+ * + */ + +#include +#include +#include +#include "kfd_priv.h" +#include "kfd_mqd_manager.h" +#include "v9_structs.h" +#include "gc/gc_9_0_offset.h" +#include "gc/gc_9_0_sh_mask.h" +#include "sdma0/sdma0_4_0_sh_mask.h" + +static inline struct v9_mqd *get_mqd(void *mqd) +{ + return (struct v9_mqd *)mqd; +} + +static inline struct v9_sdma_mqd *get_sdma_mqd(void *mqd) +{ + return (struct v9_sdma_mqd *)mqd; +} + +static int init_mqd(struct mqd_manager *mm, void **mqd, + struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr, + struct queue_properties *q) +{ + int retval; + uint64_t addr; + struct v9_mqd *m; + struct kfd_dev *kfd = mm->dev; + + /* From V9, for CWSR, the control stack is located on the next page +* boundary after the mqd, we will use the gtt allocation function +* instead of sub-allocation function. +*/ + if (kfd->cwsr_enabled && (q->type == KFD_QUEUE_TYPE_COMPUTE)) { +
[PATCH 08/21] drm/amdkfd: Implement doorbell allocation for SOC15
Allocate doorbells according to the doorbell routing information on SOC15 ASICs (Vega10 and later). On older ASICs we continue to use the queue_id as the doorbell ID to maintain compatibility with the Thunk. Signed-off-by: Shaoyun LiuSigned-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 7 ++ .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 82 -- drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 12 ++-- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 11 ++- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 32 + .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 12 +++- 6 files changed, 139 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index f6b35f4..1a4d8dc 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -295,6 +295,13 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p, args->doorbell_offset = KFD_MMAP_TYPE_DOORBELL; args->doorbell_offset |= KFD_MMAP_GPU_ID(args->gpu_id); args->doorbell_offset <<= PAGE_SHIFT; + if (KFD_IS_SOC15(dev->device_info->asic_family)) + /* On SOC15 ASICs, doorbell allocation must be +* per-device, and independent from the per-process +* queue_id. Return the doorbell offset within the +* doorbell aperture to user mode. 
+*/ + args->doorbell_offset |= q_properties.doorbell_off; mutex_unlock(>mutex); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index d55d29d..e9c72d8 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -110,6 +110,57 @@ void program_sh_mem_settings(struct device_queue_manager *dqm, qpd->sh_mem_bases); } +static int allocate_doorbell(struct qcm_process_device *qpd, struct queue *q) +{ + struct kfd_dev *dev = qpd->dqm->dev; + + if (!KFD_IS_SOC15(dev->device_info->asic_family)) { + /* On pre-SOC15 chips we need to use the queue ID to +* preserve the user mode ABI. +*/ + q->doorbell_id = q->properties.queue_id; + } else if (q->properties.type == KFD_QUEUE_TYPE_SDMA) { + /* For SDMA queues on SOC15, use static doorbell +* assignments based on the engine and queue. +*/ + q->doorbell_id = dev->shared_resources.sdma_doorbell + [q->properties.sdma_engine_id] + [q->properties.sdma_queue_id]; + } else { + /* For CP queues on SOC15 reserve a free doorbell ID */ + unsigned int found; + + found = find_first_zero_bit(qpd->doorbell_bitmap, + KFD_MAX_NUM_OF_QUEUES_PER_PROCESS); + if (found >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS) { + pr_debug("No doorbells available"); + return -EBUSY; + } + set_bit(found, qpd->doorbell_bitmap); + q->doorbell_id = found; + } + + q->properties.doorbell_off = + kfd_doorbell_id_to_offset(dev, q->process, + q->doorbell_id); + + return 0; +} + +static void deallocate_doorbell(struct qcm_process_device *qpd, + struct queue *q) +{ + unsigned int old; + struct kfd_dev *dev = qpd->dqm->dev; + + if (!KFD_IS_SOC15(dev->device_info->asic_family) || + q->properties.type == KFD_QUEUE_TYPE_SDMA) + return; + + old = test_and_clear_bit(q->doorbell_id, qpd->doorbell_bitmap); + WARN_ON(!old); +} + static int allocate_vmid(struct device_queue_manager *dqm, struct qcm_process_device *qpd, struct queue *q) @@ -301,10 
+352,14 @@ static int create_compute_queue_nocpsch(struct device_queue_manager *dqm, if (retval) return retval; + retval = allocate_doorbell(qpd, q); + if (retval) + goto out_deallocate_hqd; + retval = mqd->init_mqd(mqd, >mqd, >mqd_mem_obj, >gart_mqd_addr, >properties); if (retval) - goto out_deallocate_hqd; + goto out_deallocate_doorbell; pr_debug("Loading mqd to hqd on pipe %d, queue %d\n", q->pipe, q->queue); @@ -324,6 +379,8 @@ static int create_compute_queue_nocpsch(struct device_queue_manager *dqm, out_uninit_mqd: mqd->uninit_mqd(mqd, q->mqd, q->mqd_mem_obj); +out_deallocate_doorbell: +
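
As a rough illustration of the CP-queue branch of `allocate_doorbell()` and of `deallocate_doorbell()` in the patch above, the following user-space sketch reserves the lowest free doorbell ID from a per-process bitmap and releases it again. It assumes nothing beyond the diff: the kernel code uses `find_first_zero_bit()`, `set_bit()` and `test_and_clear_bit()`, which are replaced here by plain bit operations. All names (`doorbell_bitmap_sketch`, `doorbell_sketch_alloc`, `doorbell_sketch_free`, `SKETCH_MAX_QUEUES`) are hypothetical, and the limit of 128 is an arbitrary value chosen for the example, not the value of `KFD_MAX_NUM_OF_QUEUES_PER_PROCESS`.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_QUEUES 128 /* hypothetical per-process limit */
#define SKETCH_WORDS (SKETCH_MAX_QUEUES / 64)

/* Hypothetical stand-in for the per-process doorbell bitmap. */
struct doorbell_bitmap_sketch {
	uint64_t bits[SKETCH_WORDS];
};

/* Returns the lowest free doorbell ID and marks it used, or -EBUSY. */
static int doorbell_sketch_alloc(struct doorbell_bitmap_sketch *bm)
{
	for (unsigned int id = 0; id < SKETCH_MAX_QUEUES; id++) {
		uint64_t mask = UINT64_C(1) << (id % 64);

		if (!(bm->bits[id / 64] & mask)) {
			bm->bits[id / 64] |= mask;
			return (int)id;
		}
	}
	return -EBUSY;
}

/* Clears the ID; returns 1 if it was allocated, 0 if it was already free. */
static int doorbell_sketch_free(struct doorbell_bitmap_sketch *bm,
				unsigned int id)
{
	uint64_t mask = UINT64_C(1) << (id % 64);
	int was_set = (bm->bits[id / 64] & mask) != 0;

	bm->bits[id / 64] &= ~mask;
	return was_set;
}
```

A return value of 0 from `doorbell_sketch_free()` corresponds to the case the patch flags with `WARN_ON(!old)`. The pre-SOC15 and SDMA branches need no bitmap at all, since they derive the doorbell ID from the queue ID or from a static table.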
[PATCH 17/21] drm/amdkfd: Remove limit on number of GPUs (follow-up)
This condition was missed in a previous commit with the same title.

Signed-off-by: Felix Kuehling
---
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index 66852de..f16ac2b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -307,9 +307,7 @@ int kfd_init_apertures(struct kfd_process *process)
 	struct kfd_process_device *pdd;
 
 	/*Iterating over all devices*/
-	while (kfd_topology_enum_kfd_devices(id, &dev) == 0 &&
-		id < NUM_OF_SUPPORTED_GPUS) {
-
+	while (kfd_topology_enum_kfd_devices(id, &dev) == 0) {
 		if (!dev) {
 			id++; /* Skip non GPU devices */
 			continue;
-- 
2.7.4
[PATCH 21/21] drm/amdkfd: Add Vega10 topology and device info
* Report 64-bit doorbells as HSA_CAP_DOORBELL_TYPE_2_0 in topology
* Report cache information in topology (duplicates GFXv8 info for now)
* Add device info for Vega10 support in KFD

Raven is not enabled at this time as it needs additional changes in DQM
to work with a single SDMA engine.

Signed-off-by: Felix Kuehling
---
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 11 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   | 37 +++
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  6 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  1 +
 4 files changed, 55 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 4f126ef..296b3f2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -132,6 +132,9 @@ static struct kfd_gpu_cache_info carrizo_cache_info[] = {
 #define fiji_cache_info carrizo_cache_info
 #define polaris10_cache_info carrizo_cache_info
 #define polaris11_cache_info carrizo_cache_info
+/* TODO - check & update Vega10 cache details */
+#define vega10_cache_info carrizo_cache_info
+#define raven_cache_info carrizo_cache_info
 
 static void kfd_populated_cu_info_cpu(struct kfd_topology_device *dev,
 		struct crat_subtype_computeunit *cu)
@@ -603,6 +606,14 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
 		pcache_info = polaris11_cache_info;
 		num_of_cache_types = ARRAY_SIZE(polaris11_cache_info);
 		break;
+	case CHIP_VEGA10:
+		pcache_info = vega10_cache_info;
+		num_of_cache_types = ARRAY_SIZE(vega10_cache_info);
+		break;
+	case CHIP_RAVEN:
+		pcache_info = raven_cache_info;
+		num_of_cache_types = ARRAY_SIZE(raven_cache_info);
+		break;
 	default:
 		return -EINVAL;
 	}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index ea95f3b..fb4a72d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -182,6 +182,34 @@ static const struct kfd_device_info polaris11_device_info = {
 	.needs_pci_atomics = true,
 };
 
+static const struct kfd_device_info vega10_device_info = {
+	.asic_family = CHIP_VEGA10,
+	.max_pasid_bits = 16,
+	.max_no_of_hqd = 24,
+	.doorbell_size = 8,
+	.ih_ring_entry_size = 8 * sizeof(uint32_t),
+	.event_interrupt_class = &event_interrupt_class_v9,
+	.num_of_watch_points = 4,
+	.mqd_size_aligned = MQD_SIZE_ALIGNED,
+	.supports_cwsr = true,
+	.needs_iommu_device = false,
+	.needs_pci_atomics = false,
+};
+
+static const struct kfd_device_info vega10_vf_device_info = {
+	.asic_family = CHIP_VEGA10,
+	.max_pasid_bits = 16,
+	.max_no_of_hqd = 24,
+	.doorbell_size = 8,
+	.ih_ring_entry_size = 8 * sizeof(uint32_t),
+	.event_interrupt_class = &event_interrupt_class_v9,
+	.num_of_watch_points = 4,
+	.mqd_size_aligned = MQD_SIZE_ALIGNED,
+	.supports_cwsr = true,
+	.needs_iommu_device = false,
+	.needs_pci_atomics = false,
+};
+
 
 struct kfd_deviceid {
 	unsigned short did;
@@ -261,6 +289,15 @@ static const struct kfd_deviceid supported_devices[] = {
 	{ 0x67EB, &polaris11_device_info },	/* Polaris11 */
 	{ 0x67EF, &polaris11_device_info },	/* Polaris11 */
 	{ 0x67FF, &polaris11_device_info },	/* Polaris11 */
+	{ 0x6860, &vega10_device_info },	/* Vega10 */
+	{ 0x6861, &vega10_device_info },	/* Vega10 */
+	{ 0x6862, &vega10_device_info },	/* Vega10 */
+	{ 0x6863, &vega10_device_info },	/* Vega10 */
+	{ 0x6864, &vega10_device_info },	/* Vega10 */
+	{ 0x6867, &vega10_device_info },	/* Vega10 */
+	{ 0x6868, &vega10_device_info },	/* Vega10 */
+	{ 0x686C, &vega10_vf_device_info },	/* Vega10 vf*/
+	{ 0x687F, &vega10_device_info },	/* Vega10 */
 };
 
 static int kfd_gtt_sa_init(struct kfd_dev *kfd, unsigned int buf_size,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index ac28abc..bc95d4df 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1239,6 +1239,12 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 			HSA_CAP_DOORBELL_TYPE_TOTALBITS_SHIFT) &
 			HSA_CAP_DOORBELL_TYPE_TOTALBITS_MASK);
 		break;
+	case CHIP_VEGA10:
+	case CHIP_RAVEN:
+		dev->node_props.capability |= ((HSA_CAP_DOORBELL_TYPE_2_0 <<
+			HSA_CAP_DOORBELL_TYPE_TOTALBITS_SHIFT) &
+			HSA_CAP_DOORBELL_TYPE_TOTALBITS_MASK);
+		break;
 	default:
 		WARN(1, "Unexpected ASIC family %u",
 		     dev->gpu->device_info->asic_family);
diff --git
Re: RFC for a render API to support adaptive sync and VRR
On Tue, Apr 10, 2018 at 11:03:02AM -0400, Harry Wentland wrote: > Adding Anthony and Aric who've been working on Freesync with DC on other OSes > for a while. > > On 2018-04-09 05:45 PM, Manasi Navare wrote: > > Thanks for initiating the discussion. Find my comments below: > > > > On Mon, Apr 09, 2018 at 04:00:21PM -0400, Harry Wentland wrote: > >> Adding dri-devel, which I should've included from the start. > >> > >> On 2018-04-09 03:56 PM, Harry Wentland wrote: > >>> === What is adaptive sync and VRR? === > >>> > >>> Adaptive sync has been part of the DisplayPort spec for a while now and > >>> allows graphics adapters to drive displays with varying frame timings. > >>> VRR (variable refresh rate) is essentially the same, but defined for HDMI. > >>> > >>> > >>> > >>> === Why allow variable frame timings? === > >>> > >>> Variable render times don't align with fixed refresh rates, leading to > >>> stuttering, tearing, and/or input lag. > >>> > >>> e.g. (rc = render completion, dr = display refresh) > >>> > >>> rc B CDE F > >>> drA B C C D E F > >>> > >>> ^ ^ > >>> frame missed > >>>repeated display > >>> twice refresh > >>> > >>> > >>> > >>> === Other use cases of adaptive sync > >>> > >>> Beside the variable render case, adaptive sync also allows adjustment of > >>> refresh rates without a mode change. One such use case would be 24 Hz > >>> video. > >>> > > > > One of the the advantages here when the render speed is slower than the > > display refresh rate, since we are stretching the vertical blanking interval > > the display adapters will follow "draw fast and then go idle" approach. > > This gives power savings when render rate is lower than the display refresh > > rate. > > Are you talking about a use case, such as an idle desktop, where the renders > are quite sporadic? > I was refering to a case where the render rate is lower say 24Hz but the display rate is fixed 60Hz, that means we are pretty much displaying the same frame twice. 
But with Adaptive Sync, the display rate would be lowered to 24hz and the vertical blanking time will be stretched where instead of drawing the same frame twice, the system is now idle in that extra blanking time thus giving some power savings. > > > >>> > >>> > >>> === A DRM render API to support variable refresh rates === > >>> > >>> In order to benefit from adaptive sync and VRR userland needs a way to > >>> let us know whether to vary frame timings or to target a different frame > >>> time. These can be provided as atomic properties on a CRTC: > >>> * bool variable_refresh_compatible > >>> * inttarget_frame_duration_ns (nanosecond frame duration) > >>> > >>> This gives us the following cases: > >>> > >>> variable_refresh_compatible = 0, target_frame_duration_ns = 0 > >>> * drive monitor at timing's normal refresh rate > >>> > >>> variable_refresh_compatible = 1, target_frame_duration_ns = 0 > >>> * send new frame to monitor as soon as it's available, if within min/max > >>> of monitor's reported capabilities > >>> > >>> variable_refresh_compatible = 0/1, target_frame_duration_ns = > 0 > >>> * send new frame to monitor with the specified target_frame_duration_ns > >>> > >>> When a target_frame_duration_ns or variable_refresh_compatible cannot be > >>> supported the atomic check will reject the commit. > >>> > > > > What I would like is two sets of properties on a CRTC or preferably on a > > connector: > > > > KMD properties that UMD can query: > > * vrr_capable - This will be an immutable property for exposing hardware's > > capability of supporting VRR. This will be set by the kernel after > > reading the EDID mode information and monitor range capabilities. > > * vrr_vrefresh_max, vrr_vrefresh_min - To expose the min and max refresh > > rates supported. > > These properties are optional and will be created and attached to the > > DP/eDP connector when the connector > > is getting intialized. 
> > > > If we're talking about the properties from the EDID these might not > necessarily align with a currently selected mode, which might have a refresh > rate lower than the vrr_refresh_max, requiring us to cap it at that. In some > scenarios we also might do low framerate compensation [1] where we do magic > to allow the framerate to drop below the supported range. Actually the way I have coded that currently is span through all the EDID modes and for each mode with the same resolution but different refresh rates supported, create a VRR field part of drm_mode_config structure that would have refresh_max and min. So that way we store the max and min per mode as opposed to a per crtc/connector property. > > I think if a vrr_refresh_max/min are exposed to UMD these should really be > only for informational purposes, in
Re: [PATCH 00/21] GFXv9/Vega10 support for KFD
Hi Felix,

Just to let you know that I am currently on vacation and will be back
home only on 4/21, so all patch reviews from my side will be done after
that date.

Thanks,
Oded

On Tue, 10 Apr 2018, 17:33 Felix Kuehling wrote:
> This patch series adds support for GFXv9 GPUs to KFD. In this series it
> enables support for Vega10. Raven support requires some extra work that
> will follow shortly, but Raven support is already included and I didn't
> go out of my way to keep it out.
>
> [...]

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
[PATCH 08/21] drm/amd/display: Fix bug that causes black screen
From: Anthony Koo

Ignore MSA bit on DP display is usually set during SetTimings, but
there was a case where the module thought refresh rate was not valid
and ignore MSA bit was not set. Later, a valid refresh rate range was
requested, but since the ignore MSA bit was not set, it caused a black
screen.

The issue is with how the module checked for VRR support. Fix up that
logic. DM should call the new valid_range function to determine if
timing is supported.

Signed-off-by: Anthony Koo
Reviewed-by: Aric Cyr
Acked-by: Harry Wentland
---
 .../gpu/drm/amd/display/modules/freesync/freesync.c | 18 ++
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
index be6a6c63b4cc..4887c888bbe7 100644
--- a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
+++ b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
@@ -613,7 +613,6 @@ void mod_freesync_build_vrr_params(struct mod_freesync *mod_freesync,
 {
 	struct core_freesync *core_freesync = NULL;
 	unsigned long long nominal_field_rate_in_uhz = 0;
-	bool nominal_field_rate_in_range = true;
 	unsigned int refresh_range = 0;
 	unsigned int min_refresh_in_uhz = 0;
 	unsigned int max_refresh_in_uhz = 0;
@@ -638,15 +637,6 @@ void mod_freesync_build_vrr_params(struct mod_freesync *mod_freesync,
 	if (max_refresh_in_uhz > nominal_field_rate_in_uhz)
 		max_refresh_in_uhz = nominal_field_rate_in_uhz;

-	/* Allow for some rounding error of actual video timing by taking ceil.
-	 * For example, 144 Hz mode timing may actually be 143.xxx Hz when
-	 * calculated from pixel rate and vertical/horizontal totals, but
-	 * this should be allowed instead of blocking FreeSync.
-	 */
-	if ((min_refresh_in_uhz / 100) >
-		((nominal_field_rate_in_uhz + 100 - 1) / 100))
-		nominal_field_rate_in_range = false;
-
 	// Full range may be larger than current video timing, so cap at nominal
 	if (min_refresh_in_uhz > nominal_field_rate_in_uhz)
 		min_refresh_in_uhz = nominal_field_rate_in_uhz;
@@ -658,10 +648,14 @@ void mod_freesync_build_vrr_params(struct mod_freesync *mod_freesync,

 	in_out_vrr->state = in_config->state;

-	if ((in_config->state == VRR_STATE_UNSUPPORTED) ||
-			(!nominal_field_rate_in_range)) {
+	if (in_config->state == VRR_STATE_UNSUPPORTED) {
 		in_out_vrr->state = VRR_STATE_UNSUPPORTED;
 		in_out_vrr->supported = false;
+		in_out_vrr->adjust.v_total_min = stream->timing.v_total;
+		in_out_vrr->adjust.v_total_max = stream->timing.v_total;
+
+		return;
+
 	} else {
 		in_out_vrr->min_refresh_in_uhz = min_refresh_in_uhz;
 		in_out_vrr->max_duration_in_us =
-- 
2.15.1
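For readers following the removed hunk: the "taking ceil" tolerance it describes boils down to an integer round-up division. Below is a minimal sketch of that idea only. `ceil_div_u64`, `min_refresh_in_range` and the `unit` parameter are hypothetical names made up for illustration, not part of the freesync module.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: integer division that rounds up instead of down. */
static uint64_t ceil_div_u64(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/*
 * Hypothetical check with the same shape as the removed test: the requested
 * minimum refresh rate is accepted if, after scaling both values by `unit`,
 * it does not exceed the nominal rate rounded up.
 */
static bool min_refresh_in_range(uint64_t min_uhz, uint64_t nominal_uhz,
				 uint64_t unit)
{
	return (min_uhz / unit) <= ceil_div_u64(nominal_uhz, unit);
}
```

With `unit` = 1000000 (whole Hz), a 144 Hz minimum is accepted against a nominal rate of 143.856 Hz, because the nominal rate rounds up to 144. With `unit` = 100, the divisor used in the removed code, the same pair is rejected.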
[PATCH 18/21] drm/amdkfd: Support flat memory apertures for GFXv9
Signed-off-by: Felix Kuehling
---
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 115 ---
 1 file changed, 87 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index f16ac2b..97d5423 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -275,23 +275,35 @@
  * for FLAT_* / S_LOAD operations.
  */

-#define MAKE_GPUVM_APP_BASE(gpu_num) \
+#define MAKE_GPUVM_APP_BASE_VI(gpu_num) \
 	(((uint64_t)(gpu_num) << 61) + 0x1000000000000L)

 #define MAKE_GPUVM_APP_LIMIT(base, size) \
 	(((uint64_t)(base) & 0xFFFFFF0000000000UL) + (size) - 1)

-#define MAKE_SCRATCH_APP_BASE() \
+#define MAKE_SCRATCH_APP_BASE_VI() \
 	(((uint64_t)(0x1UL) << 61) + 0x100000000L)

 #define MAKE_SCRATCH_APP_LIMIT(base) \
 	(((uint64_t)base & 0xFFFFFFFF00000000UL) | 0xFFFFFFFF)

-#define MAKE_LDS_APP_BASE() \
+#define MAKE_LDS_APP_BASE_VI() \
 	(((uint64_t)(0x1UL) << 61) + 0x0)
 #define MAKE_LDS_APP_LIMIT(base) \
 	(((uint64_t)(base) & 0xFFFFFFFF00000000UL) | 0xFFFFFFFF)

+/* On GFXv9 the LDS and scratch apertures are programmed independently
+ * using the high 16 bits of the 64-bit virtual address. They must be
+ * in the hole, which will be the case as long as the high 16 bits are
+ * not 0.
+ *
+ * The aperture sizes are still 4GB implicitly.
+ *
+ * A GPUVM aperture is not applicable on GFXv9.
+ */
+#define MAKE_LDS_APP_BASE_V9() ((uint64_t)(0x1UL) << 48)
+#define MAKE_SCRATCH_APP_BASE_V9() ((uint64_t)(0x2UL) << 48)
+
 /* User mode manages most of the SVM aperture address space. The low
  * 16MB are reserved for kernel use (CWSR trap handler and kernel IB
  * for now).
@@ -300,6 +312,55 @@
 #define SVM_CWSR_BASE (SVM_USER_BASE - KFD_CWSR_TBA_TMA_SIZE)
 #define SVM_IB_BASE   (SVM_CWSR_BASE - PAGE_SIZE)

+static void kfd_init_apertures_vi(struct kfd_process_device *pdd, uint8_t id)
+{
+	/*
+	 * node id couldn't be 0 - the three MSB bits of
+	 * aperture shoudn't be 0
+	 */
+	pdd->lds_base = MAKE_LDS_APP_BASE_VI();
+	pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base);
+
+	if (!pdd->dev->device_info->needs_iommu_device) {
+		/* dGPUs: SVM aperture starting at 0
+		 * with small reserved space for kernel.
+		 * Set them to CANONICAL addresses.
+		 */
+		pdd->gpuvm_base = SVM_USER_BASE;
+		pdd->gpuvm_limit =
+			pdd->dev->shared_resources.gpuvm_size - 1;
+	} else {
+		/* set them to non CANONICAL addresses, and no SVM is
+		 * allocated.
+		 */
+		pdd->gpuvm_base = MAKE_GPUVM_APP_BASE_VI(id + 1);
+		pdd->gpuvm_limit = MAKE_GPUVM_APP_LIMIT(pdd->gpuvm_base,
+				pdd->dev->shared_resources.gpuvm_size);
+	}
+
+	pdd->scratch_base = MAKE_SCRATCH_APP_BASE_VI();
+	pdd->scratch_limit = MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
+}
+
+static void kfd_init_apertures_v9(struct kfd_process_device *pdd, uint8_t id)
+{
+	pdd->lds_base = MAKE_LDS_APP_BASE_V9();
+	pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base);
+
+	/* Raven needs SVM to support graphic handle, etc. Leave the small
+	 * reserved space before SVM on Raven as well, even though we don't
+	 * have to.
+	 * Set gpuvm_base and gpuvm_limit to CANONICAL addresses so that they
+	 * are used in Thunk to reserve SVM.
+	 */
+	pdd->gpuvm_base = SVM_USER_BASE;
+	pdd->gpuvm_limit =
+		pdd->dev->shared_resources.gpuvm_size - 1;
+
+	pdd->scratch_base = MAKE_SCRATCH_APP_BASE_V9();
+	pdd->scratch_limit = MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
+}
+
 int kfd_init_apertures(struct kfd_process *process)
 {
 	uint8_t id = 0;
@@ -316,7 +377,7 @@ int kfd_init_apertures(struct kfd_process *process)
 		pdd = kfd_create_process_device_data(dev, process);
 		if (!pdd) {
 			pr_err("Failed to create process device data\n");
-			return -1;
+			return -ENOMEM;
 		}
 		/*
 		 * For 64 bit process apertures will be statically reserved in
@@ -328,32 +389,30 @@ int kfd_init_apertures(struct kfd_process *process)
 			pdd->gpuvm_base = pdd->gpuvm_limit = 0;
 			pdd->scratch_base = pdd->scratch_limit = 0;
 		} else {
-			/* Same LDS and scratch apertures can be used
-			 * on all GPUs. This allows using more dGPUs
-			 * than placement options for apertures.
-			 */
-			pdd->lds_base = MAKE_LDS_APP_BASE();
-			pdd->lds_limit =
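To make the GFXv9 aperture comment concrete, here is a small standalone sketch of the address arithmetic. The two base macros are copied from the patch. `aperture_limit_4gb` and `high16_nonzero` are hypothetical helpers, and the limit mask is an assumption derived from the "still 4GB implicitly" remark rather than a quote of the driver.

```c
#include <stdint.h>

/* Base macros as they appear in the patch. */
#define MAKE_LDS_APP_BASE_V9()     ((uint64_t)(0x1UL) << 48)
#define MAKE_SCRATCH_APP_BASE_V9() ((uint64_t)(0x2UL) << 48)

/*
 * Assumption: a 4GB aperture covers every address that shares the upper
 * 32 bits of its base, so the limit is the base with the low 32 bits set.
 */
static uint64_t aperture_limit_4gb(uint64_t base)
{
	return (base & 0xFFFFFFFF00000000ULL) | 0xFFFFFFFFULL;
}

/* Per the patch comment: in the hole as long as bits 63:48 are not 0. */
static int high16_nonzero(uint64_t addr)
{
	return (addr >> 48) != 0;
}
```

Under these assumptions the LDS aperture spans 0x0001000000000000 to 0x00010000FFFFFFFF and the scratch aperture spans 0x0002000000000000 to 0x00020000FFFFFFFF, both with non-zero high 16 bits.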
[PATCH 09/21] drm/amdkfd: Move packet writer functions into ASIC-specific file
This is in preparation for GFXv9 (Vega10), which uses PM4 packet formats
that are incompatible with previous ASIC generations.

Signed-off-by: Shaoyun Liu
Signed-off-by: Felix Kuehling
---
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c |  10 +-
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c  | 310 +
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c   | 381 -
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h             |  35 +-
 4 files changed, 420 insertions(+), 316 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index e9c72d8..500f022 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -196,15 +196,19 @@ static int allocate_vmid(struct device_queue_manager *dqm,
 static int flush_texture_cache_nocpsch(struct kfd_dev *kdev,
 				struct qcm_process_device *qpd)
 {
-	uint32_t len;
+	const struct packet_manager_funcs *pmf = qpd->dqm->packets.pmf;
+	int ret;

 	if (!qpd->ib_kaddr)
 		return -ENOMEM;

-	len = pm_create_release_mem(qpd->ib_base, (uint32_t *)qpd->ib_kaddr);
+	ret = pmf->release_mem(qpd->ib_base, (uint32_t *)qpd->ib_kaddr);
+	if (ret)
+		return ret;

 	return kdev->kfd2kgd->submit_ib(kdev->kgd, KGD_ENGINE_MEC1, qpd->vmid,
-				qpd->ib_base, (uint32_t *)qpd->ib_kaddr, len);
+				qpd->ib_base, (uint32_t *)qpd->ib_kaddr,
+				pmf->release_mem_size / sizeof(uint32_t));
 }

 static void deallocate_vmid(struct device_queue_manager *dqm,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
index f1d4828..7ee326f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
@@ -22,6 +22,9 @@
  */

 #include "kfd_kernel_queue.h"
+#include "kfd_device_queue_manager.h"
+#include "kfd_pm4_headers_vi.h"
+#include "kfd_pm4_opcodes.h"

 static bool initialize_vi(struct kernel_queue *kq, struct kfd_dev *dev,
 			enum kfd_queue_type type, unsigned int queue_size);
@@ -54,3 +57,310 @@ static void uninitialize_vi(struct kernel_queue *kq)
 {
 	kfd_gtt_sa_free(kq->dev, kq->eop_mem);
 }
+
+static unsigned int build_pm4_header(unsigned int opcode, size_t packet_size)
+{
+	union PM4_MES_TYPE_3_HEADER header;
+
+	header.u32All = 0;
+	header.opcode = opcode;
+	header.count = packet_size / 4 - 2;
+	header.type = PM4_TYPE_3;
+
+	return header.u32All;
+}
+
+static int pm_map_process_vi(struct packet_manager *pm, uint32_t *buffer,
+				struct qcm_process_device *qpd)
+{
+	struct pm4_mes_map_process *packet;
+
+	packet = (struct pm4_mes_map_process *)buffer;
+
+	memset(buffer, 0, sizeof(struct pm4_mes_map_process));
+
+	packet->header.u32All = build_pm4_header(IT_MAP_PROCESS,
+					sizeof(struct pm4_mes_map_process));
+	packet->bitfields2.diq_enable = (qpd->is_debug) ? 1 : 0;
+	packet->bitfields2.process_quantum = 1;
+	packet->bitfields2.pasid = qpd->pqm->process->pasid;
+	packet->bitfields3.page_table_base = qpd->page_table_base;
+	packet->bitfields10.gds_size = qpd->gds_size;
+	packet->bitfields10.num_gws = qpd->num_gws;
+	packet->bitfields10.num_oac = qpd->num_oac;
+	packet->bitfields10.num_queues = (qpd->is_debug) ? 0 : qpd->queue_count;
+
+	packet->sh_mem_config = qpd->sh_mem_config;
+	packet->sh_mem_bases = qpd->sh_mem_bases;
+	packet->sh_mem_ape1_base = qpd->sh_mem_ape1_base;
+	packet->sh_mem_ape1_limit = qpd->sh_mem_ape1_limit;
+
+	packet->sh_hidden_private_base_vmid = qpd->sh_hidden_private_base;
+
+	packet->gds_addr_lo = lower_32_bits(qpd->gds_context_area);
+	packet->gds_addr_hi = upper_32_bits(qpd->gds_context_area);
+
+	return 0;
+}
+
+static int pm_runlist_vi(struct packet_manager *pm, uint32_t *buffer,
+			uint64_t ib, size_t ib_size_in_dwords, bool chain)
+{
+	struct pm4_mes_runlist *packet;
+	int concurrent_proc_cnt = 0;
+	struct kfd_dev *kfd = pm->dqm->dev;
+
+	if (WARN_ON(!ib))
+		return -EFAULT;
+
+	/* Determine the number of processes to map together to HW:
+	 * it can not exceed the number of VMIDs available to the
+	 * scheduler, and it is determined by the smaller of the number
+	 * of processes in the runlist and kfd module parameter
+	 * hws_max_conc_proc.
+	 * Note: the arbitration between the number of VMIDs and
+	 * hws_max_conc_proc has been done in
+	 * kgd2kfd_device_init().
+	 */
+	concurrent_proc_cnt =
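As a side note on build_pm4_header(): the count field is the packet length in dwords minus two. The sketch below shows the same computation with plain shifts instead of the `PM4_MES_TYPE_3_HEADER` union. The bit positions (type in 31:30, count in 29:16, opcode in 15:8) follow the usual PM4 type-3 header convention and are an assumption here, as are the `SKETCH_` and `sketch_` names and the opcode value in the note below.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PM4_TYPE_3 3u	/* assumed value of the type-3 packet type */

/*
 * Assumed layout: type in bits 31:30, count in bits 29:16, opcode in
 * bits 15:8. As in the patch, count is the size in dwords minus two.
 */
static uint32_t sketch_build_pm4_header(uint32_t opcode, size_t packet_size)
{
	uint32_t count = (uint32_t)(packet_size / 4 - 2);

	return (SKETCH_PM4_TYPE_3 << 30) |
	       ((count & 0x3FFFu) << 16) |
	       ((opcode & 0xFFu) << 8);
}
```

For a 60-byte packet (15 dwords) the count is 13, so with an illustrative opcode of 0xA1 the header comes out as 0xC00DA100 under the assumed layout.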
[PATCH 13/21] drm/amdkfd: Add SOC15 interrupt processing support
Signed-off-by: Shaoyun Liu
Signed-off-by: Oak Zeng
Signed-off-by: Felix Kuehling
---
 drivers/gpu/drm/amd/amdkfd/Makefile             |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 84 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h           |  2 +
 drivers/gpu/drm/amd/amdkfd/soc15_int.h          | 47 ++
 4 files changed, 134 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/soc15_int.h

diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile
index ff8b5aa..ffd096f 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -37,7 +37,7 @@ amdkfd-y	:= kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
 		kfd_device_queue_manager.o kfd_device_queue_manager_cik.o \
 		kfd_device_queue_manager_vi.o kfd_device_queue_manager_v9.o \
 		kfd_interrupt.o kfd_events.o cik_event_interrupt.o \
-		kfd_dbgdev.o kfd_dbgmgr.o kfd_crat.o
+		kfd_int_process_v9.o kfd_dbgdev.o kfd_dbgmgr.o kfd_crat.o

 ifneq ($(CONFIG_AMD_IOMMU_V2),)
 amdkfd-y += kfd_iommu.o
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
new file mode 100644
index 0000000..39d4115
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
@@ -0,0 +1,84 @@
+/*
+ * Copyright 2016-2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include "kfd_priv.h"
+#include "kfd_events.h"
+#include "soc15_int.h"
+
+
+static bool event_interrupt_isr_v9(struct kfd_dev *dev,
+					const uint32_t *ih_ring_entry)
+{
+	uint16_t source_id, client_id, pasid, vmid;
+
+	source_id = SOC15_SOURCE_ID_FROM_IH_ENTRY(ih_ring_entry);
+	client_id = SOC15_CLIENT_ID_FROM_IH_ENTRY(ih_ring_entry);
+	pasid = SOC15_PASID_FROM_IH_ENTRY(ih_ring_entry);
+	vmid = SOC15_VMID_FROM_IH_ENTRY(ih_ring_entry);
+
+	if (pasid) {
+		const uint32_t *data = ih_ring_entry;
+
+		pr_debug("client id 0x%x, source id %d, pasid 0x%x. raw data:\n",
+			 client_id, source_id, pasid);
+		pr_debug("%8X, %8X, %8X, %8X, %8X, %8X, %8X, %8X.\n",
+			 data[0], data[1], data[2], data[3],
+			 data[4], data[5], data[6], data[7]);
+	}
+
+	return (pasid != 0) &&
+		(source_id == SOC15_INTSRC_CP_END_OF_PIPE ||
+		 source_id == SOC15_INTSRC_SDMA_TRAP ||
+		 source_id == SOC15_INTSRC_SQ_INTERRUPT_MSG ||
+		 source_id == SOC15_INTSRC_CP_BAD_OPCODE);
+}
+
+static void event_interrupt_wq_v9(struct kfd_dev *dev,
+					const uint32_t *ih_ring_entry)
+{
+	uint16_t source_id, client_id, pasid, vmid;
+	uint32_t context_id;
+
+	source_id = SOC15_SOURCE_ID_FROM_IH_ENTRY(ih_ring_entry);
+	client_id = SOC15_CLIENT_ID_FROM_IH_ENTRY(ih_ring_entry);
+	pasid = SOC15_PASID_FROM_IH_ENTRY(ih_ring_entry);
+	vmid = SOC15_VMID_FROM_IH_ENTRY(ih_ring_entry);
+	context_id = SOC15_CONTEXT_ID0_FROM_IH_ENTRY(ih_ring_entry);
+
+	if (source_id == SOC15_INTSRC_CP_END_OF_PIPE)
+		kfd_signal_event_interrupt(pasid, context_id, 32);
+	else if (source_id == SOC15_INTSRC_SDMA_TRAP)
+		kfd_signal_event_interrupt(pasid, context_id & 0xfffffff, 28);
+	else if (source_id == SOC15_INTSRC_SQ_INTERRUPT_MSG)
+		kfd_signal_event_interrupt(pasid, context_id & 0xffffff, 24);
+	else if (source_id == SOC15_INTSRC_CP_BAD_OPCODE)
+		kfd_signal_hw_exception_event(pasid);
+	else if (client_id == SOC15_IH_CLIENTID_VMC ||
+
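The work-queue handler above passes a masked context ID together with the number of valid bits (32, 28 or 24) depending on the interrupt source. A standalone sketch of that selection follows. The enum values and all `sketch_` names are hypothetical (the real source IDs live in soc15_int.h), and the sketch assumes the mask simply keeps the low `valid_bits` bits.

```c
#include <stdint.h>

/* Hypothetical source IDs; the real values come from soc15_int.h. */
enum sketch_intsrc {
	SKETCH_INTSRC_CP_END_OF_PIPE,
	SKETCH_INTSRC_SDMA_TRAP,
	SKETCH_INTSRC_SQ_INTERRUPT_MSG,
	SKETCH_INTSRC_OTHER,
};

struct sketch_partial_id {
	uint32_t id;		/* context ID reduced to its valid bits */
	uint32_t valid_bits;	/* 0: this source signals no event */
};

static struct sketch_partial_id
sketch_decode_event_id(enum sketch_intsrc src, uint32_t context_id)
{
	struct sketch_partial_id r = { 0, 0 };

	switch (src) {
	case SKETCH_INTSRC_CP_END_OF_PIPE:
		r.valid_bits = 32;
		break;
	case SKETCH_INTSRC_SDMA_TRAP:
		r.valid_bits = 28;
		break;
	case SKETCH_INTSRC_SQ_INTERRUPT_MSG:
		r.valid_bits = 24;
		break;
	default:
		return r;
	}

	/* Shifting a 32-bit value by 32 is undefined, so special-case it. */
	if (r.valid_bits == 32)
		r.id = context_id;
	else
		r.id = context_id & ((1u << r.valid_bits) - 1u);

	return r;
}
```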
[PATCH 14/21] drm/amdkfd: Fix goto usage
Missed a spot in previous cleanup commit:
Remove gotos that do not feature any common cleanup, and use gotos
instead of repeating cleanup commands.

According to kernel.org: "The goto statement comes in handy when a
function exits from multiple locations and some common work such as
cleanup has to be done. If there is no cleanup needed then just return
directly."

Signed-off-by: Kent Russell
Signed-off-by: Felix Kuehling
---
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
index 69f4964..23e586b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
@@ -232,18 +232,16 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
 		 * make sure calling functions know
 		 * acquire_packet_buffer() failed
 		 */
-		*buffer_ptr = NULL;
-		return -ENOMEM;
+		goto err_no_space;
 	}

 	if (wptr + packet_size_in_dwords >= queue_size_dwords) {
 		/* make sure after rolling back to position 0, there is
 		 * still enough space.
 		 */
-		if (packet_size_in_dwords >= rptr) {
-			*buffer_ptr = NULL;
-			return -ENOMEM;
-		}
+		if (packet_size_in_dwords >= rptr)
+			goto err_no_space;
+
 		/* fill nops, roll back and start at position 0 */
 		while (wptr > 0) {
 			queue_address[wptr] = kq->nop_packet;
@@ -255,6 +253,10 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
 	kq->pending_wptr = wptr + packet_size_in_dwords;

 	return 0;
+
+err_no_space:
+	*buffer_ptr = NULL;
+	return -ENOMEM;
 }

 static void submit_packet(struct kernel_queue *kq)
-- 
2.7.4
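The rule quoted in the commit message can be shown in isolation. The function below is a made-up ring-buffer reservation, not the KFD code: `sketch_reserve` and its parameters are hypothetical. One exit needs no cleanup and returns directly, while the two out-of-space exits share a single label.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical reservation helper: hands out `want_dwords` dwords behind
 * the `used_dwords` already taken from `ring`, or fails with *out = NULL.
 */
static int sketch_reserve(uint32_t *ring, size_t ring_dwords,
			  size_t used_dwords, size_t want_dwords,
			  uint32_t **out)
{
	/* Nothing to undo here, so return directly. */
	if (want_dwords == 0)
		return -EINVAL;

	if (used_dwords > ring_dwords)
		goto err_no_space;

	if (want_dwords > ring_dwords - used_dwords)
		goto err_no_space;

	*out = ring + used_dwords;
	return 0;

err_no_space:
	/* Common cleanup for both failure paths above. */
	*out = NULL;
	return -ENOMEM;
}
```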
[PATCH 07/21] drm/amdkfd: Clean up KFD_MMAP_ offset handling
From: Harish Kasiviswanathan

Use bit-rotate for better clarity and remove _MASK from the #defines as
these represent mmap types.

Centralize all the parsing of the mmap offset in kfd_mmap and add device
parameter to doorbell and reserved_mem map functions.

Encode gpu_id into upper bits of vm_pgoff. This frees up the lower bits
for encoding the doorbell ID on Vega10.

Signed-off-by: Harish Kasiviswanathan
Signed-off-by: Felix Kuehling
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 35 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c |  9 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_events.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h     | 38 ---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  8 +++
 5 files changed, 59 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index b5e5f0e..f6b35f4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -292,7 +292,8 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,


 	/* Return gpu_id as doorbell offset for mmap usage */
-	args->doorbell_offset = (KFD_MMAP_DOORBELL_MASK | args->gpu_id);
+	args->doorbell_offset = KFD_MMAP_TYPE_DOORBELL;
+	args->doorbell_offset |= KFD_MMAP_GPU_ID(args->gpu_id);
 	args->doorbell_offset <<= PAGE_SHIFT;

 	mutex_unlock(&p->mutex);
@@ -1645,23 +1646,33 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 static int kfd_mmap(struct file *filp, struct vm_area_struct *vma)
 {
 	struct kfd_process *process;
+	struct kfd_dev *dev = NULL;
+	unsigned long vm_pgoff;
+	unsigned int gpu_id;

 	process = kfd_get_process(current);
 	if (IS_ERR(process))
 		return PTR_ERR(process);

-	if ((vma->vm_pgoff & KFD_MMAP_DOORBELL_MASK) ==
-			KFD_MMAP_DOORBELL_MASK) {
-		vma->vm_pgoff = vma->vm_pgoff ^ KFD_MMAP_DOORBELL_MASK;
-		return kfd_doorbell_mmap(process, vma);
-	} else if ((vma->vm_pgoff & KFD_MMAP_EVENTS_MASK) ==
-			KFD_MMAP_EVENTS_MASK) {
-		vma->vm_pgoff = vma->vm_pgoff ^ KFD_MMAP_EVENTS_MASK;
+	vm_pgoff = vma->vm_pgoff;
+	vma->vm_pgoff = KFD_MMAP_OFFSET_VALUE_GET(vm_pgoff);
+	gpu_id = KFD_MMAP_GPU_ID_GET(vm_pgoff);
+	if (gpu_id)
+		dev = kfd_device_by_id(gpu_id);
+
+	switch (vm_pgoff & KFD_MMAP_TYPE_MASK) {
+	case KFD_MMAP_TYPE_DOORBELL:
+		if (!dev)
+			return -ENODEV;
+		return kfd_doorbell_mmap(dev, process, vma);
+
+	case KFD_MMAP_TYPE_EVENTS:
 		return kfd_event_mmap(process, vma);
-	} else if ((vma->vm_pgoff & KFD_MMAP_RESERVED_MEM_MASK) ==
-			KFD_MMAP_RESERVED_MEM_MASK) {
-		vma->vm_pgoff = vma->vm_pgoff ^ KFD_MMAP_RESERVED_MEM_MASK;
-		return kfd_reserved_mem_mmap(process, vma);
+
+	case KFD_MMAP_TYPE_RESERVED_MEM:
+		if (!dev)
+			return -ENODEV;
+		return kfd_reserved_mem_mmap(dev, process, vma);
 	}

 	return -EFAULT;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
index 4840314..efc59de 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
@@ -126,15 +126,10 @@ void kfd_doorbell_fini(struct kfd_dev *kfd)
 	iounmap(kfd->doorbell_kernel_ptr);
 }

-int kfd_doorbell_mmap(struct kfd_process *process, struct vm_area_struct *vma)
+int kfd_doorbell_mmap(struct kfd_dev *dev, struct kfd_process *process,
+		      struct vm_area_struct *vma)
 {
 	phys_addr_t address;
-	struct kfd_dev *dev;
-
-	/* Find kfd device according to gpu id */
-	dev = kfd_device_by_id(vma->vm_pgoff);
-	if (!dev)
-		return -EINVAL;

 	/*
 	 * For simplicitly we only allow mapping of the entire doorbell
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 4890a90..bccf2f7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -345,7 +345,7 @@ int kfd_event_create(struct file *devkfd, struct kfd_process *p,
 	case KFD_EVENT_TYPE_DEBUG:
 		ret = create_signal_event(devkfd, p, ev);
 		if (!ret) {
-			*event_page_offset = KFD_MMAP_EVENTS_MASK;
+			*event_page_offset = KFD_MMAP_TYPE_EVENTS;
 			*event_page_offset <<= PAGE_SHIFT;
 			*event_slot_index = ev->event_id;
 		}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
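The kfd_priv.h part of this patch is cut off above, so the real macro definitions are not visible here. The sketch below only illustrates the encoding idea from the commit message (mmap type and gpu_id in the upper bits of the page offset, remaining low bits free for other use). All `SK_` names, shift values and field widths are hypothetical.

```c
#include <stdint.h>

/* Hypothetical layout, chosen only for illustration. */
#define SK_TYPE_SHIFT		50
#define SK_TYPE_MASK		(UINT64_C(0x3) << SK_TYPE_SHIFT)
#define SK_TYPE_DOORBELL	(UINT64_C(0x3) << SK_TYPE_SHIFT)
#define SK_TYPE_EVENTS		(UINT64_C(0x2) << SK_TYPE_SHIFT)

#define SK_GPU_ID_SHIFT		34
#define SK_GPU_ID_MASK		(UINT64_C(0xFFFF) << SK_GPU_ID_SHIFT)
#define SK_GPU_ID(id) \
	(((uint64_t)(id) << SK_GPU_ID_SHIFT) & SK_GPU_ID_MASK)
#define SK_GPU_ID_GET(off) \
	(((off) & SK_GPU_ID_MASK) >> SK_GPU_ID_SHIFT)

/* Everything that is neither type nor gpu_id is the plain offset value. */
#define SK_OFFSET_VALUE_GET(off) \
	((off) & ~(SK_TYPE_MASK | SK_GPU_ID_MASK))

/* Pack type, gpu_id and a low offset value into one page offset. */
static uint64_t sk_encode(uint64_t type, uint32_t gpu_id, uint64_t value)
{
	return type | SK_GPU_ID(gpu_id) | SK_OFFSET_VALUE_GET(value);
}
```

Decoding then mirrors the new kfd_mmap() flow: mask out the type for the switch, extract the gpu_id to look up the device, and keep only the offset value in vm_pgoff.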
[PATCH 04/21] drm/amdgpu: Add GFXv9 kfd2kgd interface functions
Signed-off-by: John Bridgman
Signed-off-by: Shaoyun Liu
Signed-off-by: Jay Cornwall
Signed-off-by: Yong Zhao
Signed-off-by: Felix Kuehling
---
 MAINTAINERS                                       |    1 +
 drivers/gpu/drm/amd/amdgpu/Makefile               |    3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c        |    4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h        |    1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 1043 +
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c             |    1 +
 6 files changed, 1052 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 6804170..9bfb765 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -766,6 +766,7 @@ F:	drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
 F:	drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
 F:	drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
 F:	drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+F:	drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
 F:	drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
 F:	drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
 F:	drivers/gpu/drm/amd/amdkfd/
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 2ca2b51..f300202 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -130,7 +130,8 @@ amdgpu-y += \
 	 amdgpu_amdkfd.o \
 	 amdgpu_amdkfd_fence.o \
 	 amdgpu_amdkfd_gpuvm.o \
-	 amdgpu_amdkfd_gfx_v8.o
+	 amdgpu_amdkfd_gfx_v8.o \
+	 amdgpu_amdkfd_gfx_v9.o

 # add cgs
 amdgpu-y += amdgpu_cgs.o
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 4d36203..fcd10db 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -92,6 +92,10 @@ void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev)
 	case CHIP_POLARIS11:
 		kfd2kgd = amdgpu_amdkfd_gfx_8_0_get_functions();
 		break;
+	case CHIP_VEGA10:
+	case CHIP_RAVEN:
+		kfd2kgd = amdgpu_amdkfd_gfx_9_0_get_functions();
+		break;
 	default:
 		dev_dbg(adev->dev, "kfd not supported on this ASIC\n");
 		return;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index c3024b1..12367a9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -122,6 +122,7 @@ int amdgpu_amdkfd_submit_ib(struct kgd_dev *kgd, enum kgd_engine_type engine,

 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_7_get_functions(void);
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_8_0_get_functions(void);
+struct kfd2kgd_calls *amdgpu_amdkfd_gfx_9_0_get_functions(void);

 bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid);

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
new file mode 100644
index 0000000..8f37991
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -0,0 +1,1043 @@
+/*
+ * Copyright 2014-2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#define pr_fmt(fmt) "kfd2kgd: " fmt
+
+#include
+#include
+#include
+#include
+#include
+#include "amdgpu.h"
+#include "amdgpu_amdkfd.h"
+#include "amdgpu_ucode.h"
+#include "soc15_hw_ip.h"
+#include "gc/gc_9_0_offset.h"
+#include "gc/gc_9_0_sh_mask.h"
+#include "vega10_enum.h"
+#include "sdma0/sdma0_4_0_offset.h"
+#include "sdma0/sdma0_4_0_sh_mask.h"
+#include "sdma1/sdma1_4_0_offset.h"
+#include "sdma1/sdma1_4_0_sh_mask.h"
+#include "athub/athub_1_0_offset.h"
+#include "athub/athub_1_0_sh_mask.h"
+#include
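The amdgpu_amdkfd_device_probe() hunk above selects one function table per ASIC family, with Vega10 and Raven sharing the GFXv9 table. A toy version of that dispatch is sketched below. Every `sketch_` and `SKETCH_` name is hypothetical, and the table holds a single number where the real `kfd2kgd_calls` holds function pointers.

```c
#include <stddef.h>

/* Hypothetical chip list; the real driver switches on adev->asic_type. */
enum sketch_chip {
	SKETCH_CHIP_POLARIS11,
	SKETCH_CHIP_VEGA10,
	SKETCH_CHIP_RAVEN,
	SKETCH_CHIP_UNKNOWN,
};

/* Stand-in for a per-generation function table. */
struct sketch_calls {
	int gfx_major;
};

static const struct sketch_calls sketch_gfx8_calls = { 8 };
static const struct sketch_calls sketch_gfx9_calls = { 9 };

/* Returns the table for a chip, or NULL if the chip is not supported. */
static const struct sketch_calls *sketch_pick_calls(enum sketch_chip chip)
{
	switch (chip) {
	case SKETCH_CHIP_POLARIS11:
		return &sketch_gfx8_calls;
	case SKETCH_CHIP_VEGA10:
	case SKETCH_CHIP_RAVEN:
		return &sketch_gfx9_calls;
	default:
		return NULL;
	}
}
```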
[PATCH 03/21] drm/amdgpu: Add GFXv9 TLB invalidation packet definition
Signed-off-by: Shaoyun Liu
Signed-off-by: Jay Cornwall
Signed-off-by: Felix Kuehling
---
 drivers/gpu/drm/amd/amdgpu/soc15d.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15d.h b/drivers/gpu/drm/amd/amdgpu/soc15d.h
index 7f408f8..f22f7a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15d.h
+++ b/drivers/gpu/drm/amd/amdgpu/soc15d.h
@@ -268,6 +268,11 @@
  * x=1: tmz_end
  */

+#define	PACKET3_INVALIDATE_TLBS				0x98
+#              define PACKET3_INVALIDATE_TLBS_DST_SEL(x)     ((x) << 0)
+#              define PACKET3_INVALIDATE_TLBS_ALL_HUB(x)     ((x) << 4)
+#              define PACKET3_INVALIDATE_TLBS_PASID(x)       ((x) << 5)
+#              define PACKET3_INVALIDATE_TLBS_FLUSH_TYPE(x)  ((x) << 29)
 #define PACKET3_SET_RESOURCES				0xA0
 /* 1. header
  * 2. CONTROL
-- 
2.7.4
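For illustration, the new field macros combine into a single payload dword by OR-ing the shifted values. In the sketch below the four field macros are copied from the patch, while `sketch_invalidate_tlbs_dword` is a hypothetical helper that assumes each argument already fits its field. It does no range checking and says nothing about what the hardware does with the values.

```c
#include <stdint.h>

/* Field macros as added by the patch. */
#define PACKET3_INVALIDATE_TLBS_DST_SEL(x)	((x) << 0)
#define PACKET3_INVALIDATE_TLBS_ALL_HUB(x)	((x) << 4)
#define PACKET3_INVALIDATE_TLBS_PASID(x)	((x) << 5)
#define PACKET3_INVALIDATE_TLBS_FLUSH_TYPE(x)	((x) << 29)

/* Hypothetical helper: OR the four shifted fields into one dword. */
static uint32_t sketch_invalidate_tlbs_dword(uint32_t dst_sel,
					     uint32_t all_hub,
					     uint32_t pasid,
					     uint32_t flush_type)
{
	return PACKET3_INVALIDATE_TLBS_DST_SEL(dst_sel) |
	       PACKET3_INVALIDATE_TLBS_ALL_HUB(all_hub) |
	       PACKET3_INVALIDATE_TLBS_PASID(pasid) |
	       PACKET3_INVALIDATE_TLBS_FLUSH_TYPE(flush_type);
}
```

For example, dst_sel = 1, all_hub = 1, pasid = 0x8001 and flush_type = 0 yield 0x00100031.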
[PATCH 00/21] GFXv9/Vega10 support for KFD
This patch series adds support for GFXv9 GPUs to KFD. In this series it
enables support for Vega10. Raven support requires some extra work that
will follow shortly, but Raven support is already included and I didn't
go out of my way to keep it out.

Felix Kuehling (19):
  drm/amdgpu: Remove unused interface from kfd2kgd interface
  drm/amd: Update GFXv9 SDMA MQD structure
  drm/amdgpu: Add GFXv9 TLB invalidation packet definition
  drm/amdgpu: Add GFXv9 kfd2kgd interface functions
  drm/amdgpu: Add doorbell routing info to kgd2kfd_shared_resources
  drm/amdkfd: Make doorbell size ASIC-dependent
  drm/amdkfd: Implement doorbell allocation for SOC15
  drm/amdkfd: Move packet writer functions into ASIC-specific file
  drm/amdkfd: Add GFXv9 PM4 packet writer functions
  drm/amdkfd: Add GFXv9 MQD manager
  drm/amdkfd: Add GFXv9 device queue manager
  drm/amdkfd: Add SOC15 interrupt processing support
  drm/amdkfd: Fix goto usage
  drm/amdkfd: Fix kernel queue rollback_packet
  drm/amdkfd: Add 64-bit doorbell and wptr support to kernel queue
  drm/amdkfd: Remove limit on number of GPUs (follow-up)
  drm/amdkfd: Support flat memory apertures for GFXv9
  drm/amdkfd: Add GFXv9 CWSR trap handler
  drm/amdkfd: Add Vega10 topology and device info

Harish Kasiviswanathan (1):
  drm/amdkfd: Clean up KFD_MMAP_ offset handling

welu (1):
  drm/amdkfd: Try to enable atomics for all GPUs

 MAINTAINERS                                        |    2 +
 drivers/gpu/drm/amd/amdgpu/Makefile                |    3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c         |   26 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h         |    1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c  |   10 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c  |   10 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c  | 1043 ++
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c              |    1 +
 drivers/gpu/drm/amd/amdgpu/soc15d.h                |    5 +
 drivers/gpu/drm/amd/amdkfd/Makefile                |   10 +-
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm  | 1495
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c           |   42 +-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c              |   11 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c            |   89 +-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |  102 +-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |    2 +
 .../drm/amd/amdkfd/kfd_device_queue_manager_v9.c   |   84 ++
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c          |   65 +-
 drivers/gpu/drm/amd/amdkfd/kfd_events.c            |    2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c       |  119 +-
 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c    |   84 ++
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c      |   39 +-
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h      |    7 +-
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c  |    9 +
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c   |  340 +
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c   |  319 +
 drivers/gpu/drm/amd/amdkfd/kfd_module.c            |    5 +
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c       |    3 +
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c    |  443 ++
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c    |  385 +
 drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h    |  583
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |  106 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c           |   40 +-
 .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c |   12 +-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c          |    6 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h          |    1 +
 drivers/gpu/drm/amd/amdkfd/soc15_int.h             |   47 +
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h    |   20 +-
 drivers/gpu/drm/amd/include/v9_structs.h           |   48 +-
 39 files changed, 5118 insertions(+), 501 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h
 create mode 100644 drivers/gpu/drm/amd/amdkfd/soc15_int.h

-- 
2.7.4
[PATCH 01/21] drm/amdgpu: Remove unused interface from kfd2kgd interface
Signed-off-by: Felix Kuehling
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c | 10 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c | 10 --
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h   |  5 -
 3 files changed, 25 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index ea54e53..0ff36d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -98,8 +98,6 @@ static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid,
 static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
 					unsigned int vmid);

-static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id,
-				uint32_t hpd_size, uint64_t hpd_gpu_addr);
 static int kgd_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id);
 static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
 			uint32_t queue_id, uint32_t __user *wptr,
@@ -183,7 +181,6 @@ static const struct kfd2kgd_calls kfd2kgd = {
 	.free_pasid = amdgpu_pasid_free,
 	.program_sh_mem_settings = kgd_program_sh_mem_settings,
 	.set_pasid_vmid_mapping = kgd_set_pasid_vmid_mapping,
-	.init_pipeline = kgd_init_pipeline,
 	.init_interrupts = kgd_init_interrupts,
 	.hqd_load = kgd_hqd_load,
 	.hqd_sdma_load = kgd_hqd_sdma_load,
@@ -309,13 +306,6 @@ static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
 	return 0;
 }

-static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id,
-				uint32_t hpd_size, uint64_t hpd_gpu_addr)
-{
-	/* amdgpu owns the per-pipe state */
-	return 0;
-}
-
 static int kgd_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id)
 {
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index 89264c9..6ef9762 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
@@ -57,8 +57,6 @@ static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid,
 		uint32_t sh_mem_bases);
 static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
 		unsigned int vmid);
-static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id,
-		uint32_t hpd_size, uint64_t hpd_gpu_addr);
 static int kgd_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id);
 static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
 		uint32_t queue_id, uint32_t __user *wptr,
@@ -141,7 +139,6 @@ static const struct kfd2kgd_calls kfd2kgd = {
 	.free_pasid = amdgpu_pasid_free,
 	.program_sh_mem_settings = kgd_program_sh_mem_settings,
 	.set_pasid_vmid_mapping = kgd_set_pasid_vmid_mapping,
-	.init_pipeline = kgd_init_pipeline,
 	.init_interrupts = kgd_init_interrupts,
 	.hqd_load = kgd_hqd_load,
 	.hqd_sdma_load = kgd_hqd_sdma_load,
@@ -270,13 +267,6 @@ static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
 	return 0;
 }

-static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id,
-		uint32_t hpd_size, uint64_t hpd_gpu_addr)
-{
-	/* amdgpu owns the per-pipe state */
-	return 0;
-}
-
 static int kgd_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id)
 {
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 286cfe7..7cf3506 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -173,8 +173,6 @@ struct tile_config {
  * @set_pasid_vmid_mapping: Exposes pasid/vmid pair to the H/W for no cp
  * scheduling mode. Only used for no cp scheduling mode.
  *
- * @init_pipeline: Initialized the compute pipelines.
- *
  * @hqd_load: Loads the mqd structure to a H/W hqd slot. used only for no cp
  * sceduling mode.
  *
@@ -274,9 +272,6 @@ struct kfd2kgd_calls {
 	int (*set_pasid_vmid_mapping)(struct kgd_dev *kgd, unsigned int pasid,
 					unsigned int vmid);

-	int (*init_pipeline)(struct kgd_dev *kgd, uint32_t pipe_id,
-				uint32_t hpd_size, uint64_t hpd_gpu_addr);
-
 	int (*init_interrupts)(struct kgd_dev *kgd, uint32_t pipe_id);

 	int (*hqd_load)(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
--
2.7.4
[PATCH 02/21] drm/amd: Update GFXv9 SDMA MQD structure
This matches what the HWS firmware expects on GFXv9 chips.

Signed-off-by: Felix Kuehling
---
 MAINTAINERS                              |  1 +
 drivers/gpu/drm/amd/include/v9_structs.h | 48
 2 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 004d2c1..6804170 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -772,6 +772,7 @@ F:	drivers/gpu/drm/amd/amdkfd/
 F:	drivers/gpu/drm/amd/include/cik_structs.h
 F:	drivers/gpu/drm/amd/include/kgd_kfd_interface.h
 F:	drivers/gpu/drm/amd/include/vi_structs.h
+F:	drivers/gpu/drm/amd/include/v9_structs.h
 F:	include/uapi/linux/kfd_ioctl.h

 AMD SEATTLE DEVICE TREE SUPPORT
diff --git a/drivers/gpu/drm/amd/include/v9_structs.h b/drivers/gpu/drm/amd/include/v9_structs.h
index 2fb25ab..ceaf493 100644
--- a/drivers/gpu/drm/amd/include/v9_structs.h
+++ b/drivers/gpu/drm/amd/include/v9_structs.h
@@ -29,10 +29,10 @@ struct v9_sdma_mqd {
 	uint32_t sdmax_rlcx_rb_base;
 	uint32_t sdmax_rlcx_rb_base_hi;
 	uint32_t sdmax_rlcx_rb_rptr;
+	uint32_t sdmax_rlcx_rb_rptr_hi;
 	uint32_t sdmax_rlcx_rb_wptr;
+	uint32_t sdmax_rlcx_rb_wptr_hi;
 	uint32_t sdmax_rlcx_rb_wptr_poll_cntl;
-	uint32_t sdmax_rlcx_rb_wptr_poll_addr_hi;
-	uint32_t sdmax_rlcx_rb_wptr_poll_addr_lo;
 	uint32_t sdmax_rlcx_rb_rptr_addr_hi;
 	uint32_t sdmax_rlcx_rb_rptr_addr_lo;
 	uint32_t sdmax_rlcx_ib_cntl;
@@ -44,29 +44,29 @@ struct v9_sdma_mqd {
 	uint32_t sdmax_rlcx_skip_cntl;
 	uint32_t sdmax_rlcx_context_status;
 	uint32_t sdmax_rlcx_doorbell;
-	uint32_t sdmax_rlcx_virtual_addr;
-	uint32_t sdmax_rlcx_ape1_cntl;
+	uint32_t sdmax_rlcx_status;
 	uint32_t sdmax_rlcx_doorbell_log;
-	uint32_t reserved_22;
-	uint32_t reserved_23;
-	uint32_t reserved_24;
-	uint32_t reserved_25;
-	uint32_t reserved_26;
-	uint32_t reserved_27;
-	uint32_t reserved_28;
-	uint32_t reserved_29;
-	uint32_t reserved_30;
-	uint32_t reserved_31;
-	uint32_t reserved_32;
-	uint32_t reserved_33;
-	uint32_t reserved_34;
-	uint32_t reserved_35;
-	uint32_t reserved_36;
-	uint32_t reserved_37;
-	uint32_t reserved_38;
-	uint32_t reserved_39;
-	uint32_t reserved_40;
-	uint32_t reserved_41;
+	uint32_t sdmax_rlcx_watermark;
+	uint32_t sdmax_rlcx_doorbell_offset;
+	uint32_t sdmax_rlcx_csa_addr_lo;
+	uint32_t sdmax_rlcx_csa_addr_hi;
+	uint32_t sdmax_rlcx_ib_sub_remain;
+	uint32_t sdmax_rlcx_preempt;
+	uint32_t sdmax_rlcx_dummy_reg;
+	uint32_t sdmax_rlcx_rb_wptr_poll_addr_hi;
+	uint32_t sdmax_rlcx_rb_wptr_poll_addr_lo;
+	uint32_t sdmax_rlcx_rb_aql_cntl;
+	uint32_t sdmax_rlcx_minor_ptr_update;
+	uint32_t sdmax_rlcx_midcmd_data0;
+	uint32_t sdmax_rlcx_midcmd_data1;
+	uint32_t sdmax_rlcx_midcmd_data2;
+	uint32_t sdmax_rlcx_midcmd_data3;
+	uint32_t sdmax_rlcx_midcmd_data4;
+	uint32_t sdmax_rlcx_midcmd_data5;
+	uint32_t sdmax_rlcx_midcmd_data6;
+	uint32_t sdmax_rlcx_midcmd_data7;
+	uint32_t sdmax_rlcx_midcmd_data8;
+	uint32_t sdmax_rlcx_midcmd_cntl;
 	uint32_t reserved_42;
 	uint32_t reserved_43;
 	uint32_t reserved_44;
--
2.7.4
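Editorial sketch, not part of the patch: because the MQD is consumed by firmware as a fixed sequence of 32-bit words, inserting the two `*_hi` words shifts every later field. The two structs below are hypothetical excerpts that contain only fields visible in the first hunk (the real structure has more members before and after), and the `example_*` names are invented for illustration. The compile-time check uses C11 `_Static_assert`; kernel code would more likely use `BUILD_BUG_ON`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical excerpt of the old layout, fields taken from the hunk. */
struct example_sdma_mqd_old {
	uint32_t sdmax_rlcx_rb_base;
	uint32_t sdmax_rlcx_rb_base_hi;
	uint32_t sdmax_rlcx_rb_rptr;
	uint32_t sdmax_rlcx_rb_wptr;
	uint32_t sdmax_rlcx_rb_wptr_poll_cntl;
};

/* Hypothetical excerpt of the new layout with the added *_hi words. */
struct example_sdma_mqd_new {
	uint32_t sdmax_rlcx_rb_base;
	uint32_t sdmax_rlcx_rb_base_hi;
	uint32_t sdmax_rlcx_rb_rptr;
	uint32_t sdmax_rlcx_rb_rptr_hi;
	uint32_t sdmax_rlcx_rb_wptr;
	uint32_t sdmax_rlcx_rb_wptr_hi;
	uint32_t sdmax_rlcx_rb_wptr_poll_cntl;
};

/* Dword index of rb_wptr_poll_cntl within the old excerpt. */
size_t example_old_poll_cntl_dword(void)
{
	return offsetof(struct example_sdma_mqd_old,
			sdmax_rlcx_rb_wptr_poll_cntl) / sizeof(uint32_t);
}

/* Dword index of rb_wptr_poll_cntl within the new excerpt. */
size_t example_new_poll_cntl_dword(void)
{
	return offsetof(struct example_sdma_mqd_new,
			sdmax_rlcx_rb_wptr_poll_cntl) / sizeof(uint32_t);
}

/* Pin one offset at build time so an accidental reorder fails to compile. */
_Static_assert(offsetof(struct example_sdma_mqd_new, sdmax_rlcx_rb_wptr) ==
	       4 * sizeof(uint32_t),
	       "rb_wptr must be the fifth dword of the excerpt");
```

In this excerpt the poll control word moves from dword 4 to dword 6, which is why the structure has to be updated in lockstep with what the firmware reads.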
[PATCH 06/21] drm/amdkfd: Make doorbell size ASIC-dependent
This prepares for GFXv9 (Vega10), which has 64-bit doorbells.

Signed-off-by: Felix Kuehling
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   | 10 +++
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 48 ---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |  7 +++--
 3 files changed, 39 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 7b57995..f563acb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -41,6 +41,7 @@ static const struct kfd_device_info kaveri_device_info = {
 	.max_pasid_bits = 16,
 	/* max num of queues for KV.TODO should be a dynamic value */
 	.max_no_of_hqd = 24,
+	.doorbell_size = 4,
 	.ih_ring_entry_size = 4 * sizeof(uint32_t),
 	.event_interrupt_class = &event_interrupt_class_cik,
 	.num_of_watch_points = 4,
@@ -55,6 +56,7 @@ static const struct kfd_device_info carrizo_device_info = {
 	.max_pasid_bits = 16,
 	/* max num of queues for CZ.TODO should be a dynamic value */
 	.max_no_of_hqd = 24,
+	.doorbell_size = 4,
 	.ih_ring_entry_size = 4 * sizeof(uint32_t),
 	.event_interrupt_class = &event_interrupt_class_cik,
 	.num_of_watch_points = 4,
@@ -70,6 +72,7 @@ static const struct kfd_device_info hawaii_device_info = {
 	.max_pasid_bits = 16,
 	/* max num of queues for KV.TODO should be a dynamic value */
 	.max_no_of_hqd = 24,
+	.doorbell_size = 4,
 	.ih_ring_entry_size = 4 * sizeof(uint32_t),
 	.event_interrupt_class = &event_interrupt_class_cik,
 	.num_of_watch_points = 4,
@@ -83,6 +86,7 @@ static const struct kfd_device_info tonga_device_info = {
 	.asic_family = CHIP_TONGA,
 	.max_pasid_bits = 16,
 	.max_no_of_hqd = 24,
+	.doorbell_size = 4,
 	.ih_ring_entry_size = 4 * sizeof(uint32_t),
 	.event_interrupt_class = &event_interrupt_class_cik,
 	.num_of_watch_points = 4,
@@ -96,6 +100,7 @@ static const struct kfd_device_info tonga_vf_device_info = {
 	.asic_family = CHIP_TONGA,
 	.max_pasid_bits = 16,
 	.max_no_of_hqd = 24,
+	.doorbell_size = 4,
 	.ih_ring_entry_size = 4 * sizeof(uint32_t),
 	.event_interrupt_class = &event_interrupt_class_cik,
 	.num_of_watch_points = 4,
@@ -109,6 +114,7 @@ static const struct kfd_device_info fiji_device_info = {
 	.asic_family = CHIP_FIJI,
 	.max_pasid_bits = 16,
 	.max_no_of_hqd = 24,
+	.doorbell_size = 4,
 	.ih_ring_entry_size = 4 * sizeof(uint32_t),
 	.event_interrupt_class = &event_interrupt_class_cik,
 	.num_of_watch_points = 4,
@@ -122,6 +128,7 @@ static const struct kfd_device_info fiji_vf_device_info = {
 	.asic_family = CHIP_FIJI,
 	.max_pasid_bits = 16,
 	.max_no_of_hqd = 24,
+	.doorbell_size = 4,
 	.ih_ring_entry_size = 4 * sizeof(uint32_t),
 	.event_interrupt_class = &event_interrupt_class_cik,
 	.num_of_watch_points = 4,
@@ -136,6 +143,7 @@ static const struct kfd_device_info polaris10_device_info = {
 	.asic_family = CHIP_POLARIS10,
 	.max_pasid_bits = 16,
 	.max_no_of_hqd = 24,
+	.doorbell_size = 4,
 	.ih_ring_entry_size = 4 * sizeof(uint32_t),
 	.event_interrupt_class = &event_interrupt_class_cik,
 	.num_of_watch_points = 4,
@@ -149,6 +157,7 @@ static const struct kfd_device_info polaris10_vf_device_info = {
 	.asic_family = CHIP_POLARIS10,
 	.max_pasid_bits = 16,
 	.max_no_of_hqd = 24,
+	.doorbell_size = 4,
 	.ih_ring_entry_size = 4 * sizeof(uint32_t),
 	.event_interrupt_class = &event_interrupt_class_cik,
 	.num_of_watch_points = 4,
@@ -162,6 +171,7 @@ static const struct kfd_device_info polaris11_device_info = {
 	.asic_family = CHIP_POLARIS11,
 	.max_pasid_bits = 16,
 	.max_no_of_hqd = 24,
+	.doorbell_size = 4,
 	.ih_ring_entry_size = 4 * sizeof(uint32_t),
 	.event_interrupt_class = &event_interrupt_class_cik,
 	.num_of_watch_points = 4,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
index ebb4da14..4840314 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
@@ -33,7 +33,6 @@

 static DEFINE_IDA(doorbell_ida);
 static unsigned int max_doorbell_slices;
-#define KFD_SIZE_OF_DOORBELL_IN_BYTES 4

 /*
  * Each device exposes a doorbell aperture, a PCI MMIO aperture that
@@ -50,9 +49,9 @@ static unsigned int max_doorbell_slices;
  */

 /* # of doorbell bytes allocated for each process. */
-static inline size_t doorbell_process_allocation(void)
+static size_t kfd_doorbell_process_slice(struct kfd_dev *kfd)
 {
-	return roundup(KFD_SIZE_OF_DOORBELL_IN_BYTES *
+	return roundup(kfd->device_info->doorbell_size *
 				KFD_MAX_NUM_OF_QUEUES_PER_PROCESS,
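Editorial sketch, not part of the patch: the effect of making the doorbell size per-ASIC can be seen by computing the per-process doorbell slice for 4-byte and 8-byte doorbells. The hunk above is cut off before the second argument of `roundup()`, so rounding up to the page size is an assumption here, as are the constant values (1024 queues per process, 4096-byte pages) and all `example_*` names.

```c
#include <stddef.h>

/* Assumed values for illustration only; the real ones come from kernel headers. */
#define EXAMPLE_PAGE_SIZE              4096u
#define EXAMPLE_MAX_QUEUES_PER_PROCESS 1024u

/* Round x up to the next multiple of step (step must be non-zero). */
size_t example_roundup(size_t x, size_t step)
{
	return ((x + step - 1) / step) * step;
}

/*
 * Bytes of doorbell aperture reserved for one process, given the
 * per-ASIC doorbell size in bytes (4 on the older ASICs in the patch,
 * 8 for 64-bit doorbells).
 */
size_t example_doorbell_process_slice(size_t doorbell_size)
{
	return example_roundup(doorbell_size * EXAMPLE_MAX_QUEUES_PER_PROCESS,
			       EXAMPLE_PAGE_SIZE);
}
```

Under these assumptions a 4-byte doorbell gives a one-page slice per process and an 8-byte doorbell gives two pages, which is why the slice size can no longer be a compile-time constant.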
RE: RFC for a render API to support adaptive sync and VRR
> From: Manasi Navare [mailto:manasi.d.nav...@intel.com]
> Sent: Tuesday, April 10, 2018 17:37
> To: Wentland, Harry
> Cc: amd-gfx mailing list; Daniel Vetter; Haehnle, Nicolai; Daenzer, Michel;
>     Deucher, Alexander; Koenig, Christian; dri-devel; Cyr, Aric; Koo, Anthony
> Subject: Re: RFC for a render API to support adaptive sync and VRR
>
> On Tue, Apr 10, 2018 at 11:03:02AM -0400, Harry Wentland wrote:
> > Adding Anthony and Aric who've been working on Freesync with DC on other
> > OSes for a while.
> >
> > On 2018-04-09 05:45 PM, Manasi Navare wrote:
> > > Thanks for initiating the discussion. Find my comments below:
> > >
> > > On Mon, Apr 09, 2018 at 04:00:21PM -0400, Harry Wentland wrote:
> > >> Adding dri-devel, which I should've included from the start.
> > >>
> > >> On 2018-04-09 03:56 PM, Harry Wentland wrote:
> > >>> === What is adaptive sync and VRR? ===
> > >>>
> > >>> Adaptive sync has been part of the DisplayPort spec for a while now and
> > >>> allows graphics adapters to drive displays with varying frame timings.
> > >>> VRR (variable refresh rate) is essentially the same, but defined for
> > >>> HDMI.
> > >>>
> > >>> === Why allow variable frame timings? ===
> > >>>
> > >>> Variable render times don't align with fixed refresh rates, leading to
> > >>> stuttering, tearing, and/or input lag.
> > >>>
> > >>> e.g. (rc = render completion, dr = display refresh)
> > >>>
> > >>> rc       B       CDE     F
> > >>> dr   A   B   C   C   D   E   F
> > >>>
> > >>>              ^           ^
> > >>>          frame           missed
> > >>>          repeated        display
> > >>>          twice           refresh
> > >>>
> > >>> === Other use cases of adaptive sync ===
> > >>>
> > >>> Beside the variable render case, adaptive sync also allows adjustment
> > >>> of refresh rates without a mode change. One such use case would be
> > >>> 24 Hz video.
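Editorial sketch, not part of the thread: the 24 Hz video case can be made concrete by counting how many refreshes each content frame occupies on a fixed-rate display. The model assumed here is that every refresh shows the most recent frame whose presentation time has arrived; the function name and parameters are hypothetical.

```c
#include <stdint.h>

/*
 * Number of display refreshes during which content frame `frame_index`
 * is on screen, for content at `content_fps` shown on a display with a
 * fixed `refresh_hz`. Assumes content_fps > 0 and refresh_hz >= content_fps.
 * Refresh k shows frame floor(k * content_fps / refresh_hz), so frame i
 * covers refreshes [ceil(i * R / F), ceil((i + 1) * R / F)).
 */
uint32_t example_refreshes_for_frame(uint32_t frame_index,
				     uint32_t content_fps,
				     uint32_t refresh_hz)
{
	uint64_t first = ((uint64_t)frame_index * refresh_hz +
			  content_fps - 1) / content_fps;
	uint64_t next = (((uint64_t)frame_index + 1) * refresh_hz +
			 content_fps - 1) / content_fps;

	return (uint32_t)(next - first);
}
```

With 24 fps content on a fixed 60 Hz display the frames alternate between three and two refreshes (50 ms and about 33 ms), which is the uneven cadence a fixed rate forces. At a refresh rate that is an integer multiple of the content rate, such as 24 Hz or 48 Hz, every frame is held for the same number of refreshes.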
> > >>>
> > >
> > > One of the advantages here when the render speed is slower than the
> > > display refresh rate, since we are stretching the vertical blanking
> > > interval the display adapters will follow "draw fast and then go idle"
> > > approach. This gives power savings when render rate is lower than the
> > > display refresh rate.
> >
> > Are you talking about a use case, such as an idle desktop, where the
> > renders are quite sporadic?
> >
>
> I was referring to a case where the render rate is lower say 24Hz but the
> display rate is fixed 60Hz, that means we are pretty much displaying the
> same frame twice. But with Adaptive Sync, the display rate would be lowered
> to 24hz and the vertical blanking time will be stretched where instead of
> drawing the same frame twice, the system is now idle in that extra blanking
> time thus giving some power savings.

Hi Manasi,

Assuming the panel could go down to 24Hz, this would be possible. If it was
a game, it'd naturally do this since the refresh rate would track the render
rate. For a video where you have an adaptive sync capable player, it could
request a fixed duration to achieve the same thing. Most panels do not
support as low as 24Hz however, so usually in the video case at least you'd
end up with say 48Hz with the driver/HW providing automatic frame doubling.

> > >
> > >>>
> > >>> === A DRM render API to support variable refresh rates ===
> > >>>
> > >>> In order to benefit from adaptive sync and VRR userland needs a way to
> > >>> let us know whether to vary frame timings or to target a different
> > >>> frame time.
> > >>> These can be provided as atomic properties on a CRTC:
> > >>> * bool variable_refresh_compatible
> > >>> * int target_frame_duration_ns (nanosecond frame duration)
> > >>>
> > >>> This gives us the following cases:
> > >>>
> > >>> variable_refresh_compatible = 0, target_frame_duration_ns = 0
> > >>> * drive monitor at timing's normal refresh rate
> > >>>
> > >>> variable_refresh_compatible = 1, target_frame_duration_ns = 0
> > >>> * send new frame to monitor as soon as it's available, if within
> > >>>   min/max of monitor's reported capabilities
> > >>>
> > >>> variable_refresh_compatible = 0/1, target_frame_duration_ns > 0
> > >>> * send new frame to monitor with the specified target_frame_duration_ns
> > >>>
> > >>> When a target_frame_duration_ns or variable_refresh_compatible cannot
> > >>> be supported the atomic check will reject the commit.
> > >>>
> > >
> > > What I would like is two sets of properties on a CRTC or preferably on a
> > > connector:
> > >
> > > KMD properties that UMD can query:
> > > * vrr_capable - This will be an immutable property for exposing >