Re: [PATCH xf86-video-amdgpu 02/19] Guard ODEV_ATTRIB_FD usage with the correct ifdef

2018-04-10 Thread Emil Velikov
On 10 April 2018 at 09:27, Michel Dänzer  wrote:
> On 2018-04-04 04:29 PM, Emil Velikov wrote:
>> From: Emil Velikov 
>>
>> Signed-off-by: Emil Velikov 
>> ---
>>  src/amdgpu_probe.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/amdgpu_probe.c b/src/amdgpu_probe.c
>> index 075e5c1..e65c83b 100644
>> --- a/src/amdgpu_probe.c
>> +++ b/src/amdgpu_probe.c
>> @@ -120,7 +120,7 @@ static int amdgpu_kernel_open_fd(ScrnInfoPtr pScrn,
>>   char *busid;
>>   int fd;
>>
>> -#ifdef XF86_PDEV_SERVER_FD
>> +#ifdef ODEV_ATTRIB_FD
>>   if (platform_dev) {
>>   fd = xf86_get_platform_device_int_attrib(platform_dev,
>>ODEV_ATTRIB_FD, -1);
>>
>
> ODEV_ATTRIB_FD doesn't seem obviously more "correct" than
> XF86_PDEV_SERVER_FD, since both were added in the same xserver commit,
> and the latter might be helpful for understanding this is related to the
> other code guarded by XF86_PDEV_SERVER_FD.
>
All the XF86_PDEV_SERVER_FD code is dropped with a later commit ;-)
I could move this patch just after said commit, or do you prefer to keep
the original guard?
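A minimal sketch of the guard pattern under discussion. It is not the driver's code: both macros are defined locally as hypothetical stand-ins (with arbitrary values) for what the xserver headers provide, and get_int_attrib() and open_fd_ourselves() are stubs standing in for xf86_get_platform_device_int_attrib() and the driver's own open path. Since both macros come from the same xserver commit, either one works as the guard; the choice only affects how obviously the block relates to the other XF86_PDEV_SERVER_FD code.

```c
/* Hypothetical stand-ins: in a real build both macros come from the
 * xserver headers; the values here are arbitrary. */
#define XF86_PDEV_SERVER_FD 1
#define ODEV_ATTRIB_FD 4

/* Hypothetical stub for xf86_get_platform_device_int_attrib():
 * pretend the server handed us fd 7. */
static int get_int_attrib(int attrib, int def)
{
    return attrib == ODEV_ATTRIB_FD ? 7 : def;
}

/* Hypothetical stub for the driver opening the DRM node itself. */
static int open_fd_ourselves(void)
{
    return 3;
}

/* Roughly the shape of the guarded code: prefer a server-provided fd
 * when the headers advertise support, otherwise open one ourselves. */
static int open_kernel_fd(int have_platform_dev)
{
    int fd = -1;

#ifdef XF86_PDEV_SERVER_FD
    if (have_platform_dev)
        fd = get_int_attrib(ODEV_ATTRIB_FD, -1);
#endif
    if (fd < 0)
        fd = open_fd_ourselves();
    return fd;
}
```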

-Emil
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH xf86-video-amdgpu 11/19] Don't leak a AMDGPUEntRec instance if amdgpu_device_setup fails

2018-04-10 Thread Emil Velikov
On 10 April 2018 at 10:58, Michel Dänzer  wrote:
> On 2018-04-10 11:47 AM, Emil Velikov wrote:
>> On 10 April 2018 at 09:28, Michel Dänzer  wrote:
>>> On 2018-04-04 04:29 PM, Emil Velikov wrote:
>>>> From: Emil Velikov 
>>>>
>>>> Seems like we've been leaking this for years. It became more obvious
>>>> with the recent refactoring.
>>>>
>>>> Signed-off-by: Emil Velikov 
>>>> ---
>>>>  src/amdgpu_probe.c | 2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/src/amdgpu_probe.c b/src/amdgpu_probe.c
>>>> index 537d44c..588891c 100644
>>>> --- a/src/amdgpu_probe.c
>>>> +++ b/src/amdgpu_probe.c
>>>> @@ -243,6 +243,8 @@ amdgpu_probe(ScrnInfoPtr pScrn, int entity_num,
>>>>   return TRUE;
>>>>
>>>>  error:
>>>> + free(pPriv->ptr);
>>>> + pPriv->ptr = NULL;
>>>>   return FALSE;
>>>>  }
>>>>

>>>
>>> valgrind doesn't report a leak if I force this error path; presumably
>>> Xorg frees the private after returning FALSE here.
>>>
>> Just double-checked and Xorg does not know anything about ptr. The
>> only one who clears it up is AMDGPUFreeScreen_KMS.
>>
>> The magic (for this and the other 'leak') seems to be happening in
>> xf86platformAddDevice. Namely:
>>  - ::platformProbe is called via doPlatformProbe
>>  - the driver explicitly calls xf86AllocateScreen, yet fails later on
>>  - back in Xorg, the "if (old_screens == xf86NumGPUDrivers)" is false
>>  - ::PreInit fails, ::configured is false
>>  - xf86DeleteScreen() gets called, which dives into ::FreeScreen (aka
>> AMDGPUFreeScreen_KMS)
>>
>> Alternatively, I could unwrap all that, although it makes sense to keep
>> things simpler, as the patch effectively does.
>>
>> I believe you'll agree?
>
> I'm afraid not. There's no leak because it's getting cleaned up as
> designed, so there's no need for this change.
>
Fair enough. I'll replace the commit with a comment-only one for v2.
This way, the next person will be less tempted to send the same patch.

Something like:

pPriv->ptr is freed in our ::FreeScreen callback, which
xf86DeleteScreen() calls when the driver's ::*Probe call fails.
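A minimal sketch of where such a comment could sit, assuming a simplified probe: struct ent_priv is a hypothetical stand-in for the real entity private, calloc() stands in for the AMDGPUEntRec allocation, and setup_ok simulates amdgpu_device_setup() succeeding or failing. The error path deliberately leaves pPriv->ptr untouched.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the entity private. */
struct ent_priv {
    void *ptr;
};

/* Sketch of a probe whose error path intentionally does not free
 * pPriv->ptr; setup_ok simulates amdgpu_device_setup()'s result. */
static bool probe_sketch(struct ent_priv *pPriv, bool setup_ok)
{
    if (!pPriv->ptr)
        pPriv->ptr = calloc(1, 64); /* stands in for the AMDGPUEntRec */
    if (!pPriv->ptr)
        return false;

    if (!setup_ok)
        goto error;

    return true;

error:
    /* pPriv->ptr is freed in our ::FreeScreen callback, which
     * xf86DeleteScreen() calls when the driver's ::*Probe call fails. */
    return false;
}
```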

-Emil


Re: [PATCH xf86-video-amdgpu 17/19] Store device_name as AMDGPUEntRec::master_node

2018-04-10 Thread Michel Dänzer
On 2018-04-10 11:51 AM, Emil Velikov wrote:
> On 10 April 2018 at 09:29, Michel Dänzer  wrote:
>> On 2018-04-04 04:29 PM, Emil Velikov wrote:
>>> From: Emil Velikov 
>>>
>>> Rename the variable to reflect what it is. Plus move it out of the dri2
>>> section - it's used in dri2 and dri3.
>>>
>>> Signed-off-by: Emil Velikov 
>>
>> [...]
>>
>>> diff --git a/src/amdgpu_probe.c b/src/amdgpu_probe.c
>>> index 4959bd6..e9afe42 100644
>>> --- a/src/amdgpu_probe.c
>>> +++ b/src/amdgpu_probe.c
>>> @@ -178,6 +178,10 @@ static Bool amdgpu_device_setup(ScrnInfoPtr pScrn,
>>>   if (pAMDGPUEnt->fd < 0)
>>>   return FALSE;
>>>
>>> + pAMDGPUEnt->master_node = drmGetDeviceNameFromFd2(pAMDGPUEnt->fd);
>>> + if (pAMDGPUEnt->master_node)
>>> +goto error_amdgpu;
>>
>> This should be
>>
>> if (!pAMDGPUEnt->master_node)
>>
>> shouldn't it?
>>
>>
>> ... Which raises the question: How did you test these patches? :)
>>
> I mentioned it in the cover letter, but it seems to have been dropped -
> they are untested.
> There's an r600 card close by I could test with, but no amdgpu one :-\

Okay. I can probably test this series, but in general it's preferable
for patches to be tested before sending them out for review.
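Even without amdgpu hardware, a check like the inverted one above can be exercised against a small stub. A minimal sketch, assuming drmGetDeviceNameFromFd2() returns a heap-allocated node name, or NULL on failure; get_device_name_stub() and struct ent_sketch are hypothetical stand-ins for libdrm and AMDGPUEntRec.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stub for drmGetDeviceNameFromFd2(): heap-allocated name
 * for a valid fd, NULL to simulate failure. */
static char *get_device_name_stub(int fd)
{
    if (fd < 0)
        return NULL;
    char *name = malloc(sizeof("/dev/dri/card0"));
    if (name)
        strcpy(name, "/dev/dri/card0");
    return name;
}

/* Hypothetical, simplified entity record. */
struct ent_sketch {
    int fd;
    char *master_node;
};

/* Corrected check: bail out only when the lookup failed. */
static bool device_setup_sketch(struct ent_sketch *ent)
{
    ent->master_node = get_device_name_stub(ent->fd);
    if (!ent->master_node)
        return false; /* the posted patch had this condition inverted */
    return true;
}
```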


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [PATCH 1/2] drm/amdgpu/gmc: steal the appropriate amount of vram for fw hand-over (v2)

2018-04-10 Thread Huang Rui
On Mon, Apr 09, 2018 at 09:50:19PM +0800, Christian König wrote:
> Hi Andrey,
> 
> I think the problem Ray wants to point to is that we now release the 
> stolen memory after device initialization.
> 
> So during S3 we might run into issues because the first 8MB of VRAM are 
> corrupted after startup.
> 

Yes, Andrey, Christian. That's why I reserved an 8M stolen-size BO at the
start of VRAM. I will forward the history to you ;-)

Anyway, let me find a Vega10 card to test whether these two patches
affect the case that I encountered before. I will let you know the result
later.

Thanks,
Ray

> Christian.
> 
> On 09.04.2018 at 15:26, Grodzovsky, Andrey wrote:
> > Top posting (mobile)
> >
> > I tested S3 with DC enabled only. Even if I disable DC, I need a device with
> > less than 8M VRAM to reproduce it, don't I? Otherwise we're just going to
> > reserve the pre-OS FB size of VRAM and not corrupt it, right? I should probably
> > test it by forcing the VRAM size to less than 8M...
> >
> > Andrey
> >
> > 
> > From: Huang Rui 
> > Sent: 09 April 2018 04:23:06
> > To: Alex Deucher; Grodzovsky, Andrey
> > Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> > Subject: Re: [PATCH 1/2] drm/amdgpu/gmc: steal the appropriate amount of 
> > vram for fw hand-over (v2)
> >
> > On Fri, Apr 06, 2018 at 02:54:09PM -0500, Alex Deucher wrote:
> >> Steal 9 MB for vga emulation and fb if vga is enabled, otherwise,
> >> steal enough to cover the current display size as set by the vbios.
> >>
> >> If no memory is used (e.g., secondary or headless card), skip
> >> stolen memory reserve.
> >>
> >> v2: skip reservation if vram is limited, address Christian's comments
> >>
> >> Reviewed-and-Tested-by: Andrey Grodzovsky  (v1)
> >> Signed-off-by: Alex Deucher 
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 14 +
> >>   drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c   | 23 +--
> >>   drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c   | 23 +--
> >>   drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c   | 23 +--
> >>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   | 51 +
> >>   5 files changed, 116 insertions(+), 18 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >> index 205da3ff9cd0..46c69ad34461 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >> @@ -1454,12 +1454,14 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
> >>return r;
> >>}
> >>
> >> - r = amdgpu_bo_create_kernel(adev, adev->gmc.stolen_size, PAGE_SIZE,
> >> - AMDGPU_GEM_DOMAIN_VRAM,
> >> - &adev->stolen_vga_memory,
> >> - NULL, NULL);
> >> - if (r)
> >> - return r;
> >> + if (adev->gmc.stolen_size) {
> >> + r = amdgpu_bo_create_kernel(adev, adev->gmc.stolen_size, PAGE_SIZE,
> >> + AMDGPU_GEM_DOMAIN_VRAM,
> >> + &adev->stolen_vga_memory,
> >> + NULL, NULL);
> >> + if (r)
> >> + return r;
> >> + }
> >>DRM_INFO("amdgpu: %uM of VRAM memory ready\n",
> >> (unsigned) (adev->gmc.real_vram_size / (1024 * 1024)));
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> >> index 5617cf62c566..24e1ea36b454 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> >> @@ -825,6 +825,25 @@ static int gmc_v6_0_late_init(void *handle)
> >>return 0;
> >>   }
> >>
> >> +static unsigned gmc_v6_0_get_vbios_fb_size(struct amdgpu_device *adev)
> >> +{
> >> + u32 d1vga_control = RREG32(mmD1VGA_CONTROL);
> >> + unsigned size;
> >> +
> >> + if (REG_GET_FIELD(d1vga_control, D1VGA_CONTROL, D1VGA_MODE_ENABLE)) {
> >> + size = 9 * 1024 * 1024; /* reserve 8MB for vga emulator and 1 MB for FB */
> >> + } else {
> >> + u32 viewport = RREG32(mmVIEWPORT_SIZE);
> >> + size = (REG_GET_FIELD(viewport, VIEWPORT_SIZE, VIEWPORT_HEIGHT) *
> >> + REG_GET_FIELD(viewport, VIEWPORT_SIZE, VIEWPORT_WIDTH) *
> >> + 4);
> >> + }
> >> + /* return 0 if the pre-OS buffer uses up most of vram */
> >> + if ((adev->gmc.real_vram_size - size) < (8 * 1024 * 1024))
> >> + return 0;
> >> + return size;
> >> +}
> >> +
> >>   static int gmc_v6_0_sw_init(void *handle)
> >>   {
> >>int r;
> >> @@ -851,8 +870,6 @@ static int gmc_v6_0_sw_init(void *handle)
> >>
> >>adev->gmc.mc_mask = 0xffULL;
> >>
> >> - 
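The sizing rule in gmc_v6_0_get_vbios_fb_size() can be sketched in isolation, which also bears on Andrey's question about when the reservation is skipped. This is a hypothetical, register-free version: vga_mode, width and height stand in for the D1VGA_CONTROL and VIEWPORT_SIZE fields, and it assumes the result ends up in adev->gmc.stolen_size (that part of the diff is cut off here), so a return of 0 corresponds to the amdgpu_ttm.c hunk skipping amdgpu_bo_create_kernel(). The extra size > real_vram_size test is an addition of this sketch, not part of the patch; it avoids unsigned wrap-around in the subtraction.

```c
#include <stdint.h>

#define MB (1024ULL * 1024ULL)

/* Hypothetical sketch of the patch's sizing rule; plain parameters
 * replace the register reads. */
static uint64_t vbios_fb_size(int vga_mode, uint32_t width, uint32_t height,
                              uint64_t real_vram_size)
{
    uint64_t size;

    if (vga_mode)
        size = 9 * MB; /* 8MB for the VGA emulator + 1MB for the FB */
    else
        size = (uint64_t)width * height * 4; /* 4 bytes per pixel */

    /* Skip the reservation (return 0) if the pre-OS buffer would leave
     * less than 8MB of VRAM. The size > real_vram_size check is this
     * sketch's own guard against unsigned wrap-around. */
    if (size > real_vram_size || real_vram_size - size < 8 * MB)
        return 0;
    return size;
}
```

With VGA mode enabled, the 9MB reservation is therefore dropped only when real VRAM is below 17MB (9MB + 8MB); a headless card with a zero viewport yields 0 and no reservation at all.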

[PATCH] drm/amdgpu: defer initing UVD & VCE IP blocks

2018-04-10 Thread Shirish S
UVD & VCE blocks take up around 1200 msecs of boot time.
This patch adds them to the late init work function
so as to reduce boot time.

Signed-off-by: Shirish S 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 28 ++--
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 0e798b3..54f1320 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1589,7 +1589,9 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
for (i = 0; i < adev->num_ip_blocks; i++) {
if (!adev->ip_blocks[i].status.sw)
continue;
-   if (adev->ip_blocks[i].status.hw)
+   if (adev->ip_blocks[i].status.hw ||
+   adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_UVD ||
+   adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_VCE)
continue;
r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev);
if (r) {
@@ -1639,17 +1641,18 @@ static bool amdgpu_device_check_vram_lost(struct amdgpu_device *adev)
 }
 
 /**
- * amdgpu_device_ip_late_set_cg_state - late init for clockgating
+ * amdgpu_late_init_ip_blocks - late init of some IP blocks and clockgating
  *
  * @adev: amdgpu_device pointer
  *
- * Late initialization pass enabling clockgating for hardware IPs.
+ * Late initialization pass for high time consuming IP blocks like UVD & VCE
+ * along with  enabling clockgating for hardware IPs.
  * The list of all the hardware IPs that make up the asic is walked and the
  * set_clockgating_state callbacks are run.  This stage is run late
  * in the init process.
  * Returns 0 on success, negative error code on failure.
  */
-static int amdgpu_device_ip_late_set_cg_state(struct amdgpu_device *adev)
+static int amdgpu_late_init_ip_blocks(struct amdgpu_device *adev)
 {
int i = 0, r;
 
@@ -1657,6 +1660,19 @@ static int amdgpu_device_ip_late_set_cg_state(struct amdgpu_device *adev)
return 0;
 
for (i = 0; i < adev->num_ip_blocks; i++) {
+   if (!adev->ip_blocks[i].status.hw &&
+   (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_UVD ||
+   adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_VCE)) {
+   r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev);
+   if (r) {
+   DRM_ERROR("hw_init of IP block <%s> failed %d\n",
+ adev->ip_blocks[i].version->funcs->name, r);
+   return r;
+   }
+
+   adev->ip_blocks[i].status.hw = true;
+   }
+
if (!adev->ip_blocks[i].status.valid)
continue;
/* skip CG for VCE/UVD, it's handled specially */
@@ -1823,7 +1839,7 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
  *
  * @work: work_struct
  *
- * Work handler for amdgpu_device_ip_late_set_cg_state.  We put the
+ * Work handler for amdgpu_late_init_ip_blocks.  We put the
  * clockgating setup into a worker thread to speed up driver init and
  * resume from suspend.
  */
@@ -1831,7 +1847,7 @@ static void amdgpu_device_ip_late_init_func_handler(struct work_struct *work)
 {
struct amdgpu_device *adev =
container_of(work, struct amdgpu_device, late_init_work.work);
-   amdgpu_device_ip_late_set_cg_state(adev);
+   amdgpu_late_init_ip_blocks(adev);
 }
 
 /**
-- 
2.7.4
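The two-pass flow this patch sets up can be modelled in a few lines. This is a hypothetical simplification: the ip_block struct, the enum and both pass functions are illustrative, hw_init() is reduced to setting the hw flag, and the delayed worker is modelled as an ordinary function call rather than a work item.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the IP block list. */
enum ip_type { IP_GMC, IP_GFX, IP_UVD, IP_VCE };

struct ip_block {
    enum ip_type type;
    bool hw; /* mirrors status.hw: hardware initialised */
};

static bool is_deferred(const struct ip_block *b)
{
    return b->type == IP_UVD || b->type == IP_VCE;
}

/* First pass (cf. amdgpu_device_ip_init): skip blocks that are already
 * up and the slow UVD/VCE blocks. */
static void ip_init_pass(struct ip_block *blocks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (blocks[i].hw || is_deferred(&blocks[i]))
            continue;
        blocks[i].hw = true; /* stands in for ->funcs->hw_init() */
    }
}

/* Second pass (cf. the late-init work handler): bring up whatever was
 * deferred. Returns how many blocks were initialised here. */
static size_t late_init_pass(struct ip_block *blocks, size_t n)
{
    size_t done = 0;

    for (size_t i = 0; i < n; i++) {
        if (!blocks[i].hw && is_deferred(&blocks[i])) {
            blocks[i].hw = true;
            done++;
        }
    }
    return done;
}
```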



Re: [PATCH 3/3] drm/amdgpu: remove AMDGPU_GEM_CREATE_NO_FALLBACK handling from CS again

2018-04-10 Thread zhoucm1



On 2018-04-10 17:00, Christian König wrote:
> On 10.04.2018 at 04:43, zhoucm1 wrote:
>> On 2018-04-09 18:19, Christian König wrote:
>>> That should purely be handled by preferred/allowed domains.
>> Although this flag isn't exported to user space yet, I'm curious
>> how preferred/allowed domains handle no_fallback.
>>
>> IIRC, currently, our driver will always add GTT fallback for VRAM bo.
>
> And that is intentional. Going a step further back I think moving the
> fallback handling into amdgpu_bo_do_create() and adding the flag was a
> mistake to begin with.
>
> Going to send patches to revert all this and further clean the stuff up.
If you are able to avoid changing the preferred domain on fallback, that's
no problem for me.
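A minimal sketch of the preferred/allowed split being discussed, assuming (as stated in the thread) that a VRAM-only BO gets GTT added as a fallback while its preferred domain stays as requested. The DOM_* values, struct bo_sketch and bo_init_domains() are hypothetical, not the driver's definitions.

```c
#include <stdint.h>

/* Hypothetical domain bits. */
#define DOM_GTT  0x2u
#define DOM_VRAM 0x4u

struct bo_sketch {
    uint32_t preferred_domains;
    uint32_t allowed_domains;
};

/* Assumed rule: a VRAM-only request gets GTT added to the allowed
 * domains; preferred_domains is never widened. */
static void bo_init_domains(struct bo_sketch *bo, uint32_t requested)
{
    bo->preferred_domains = requested;
    bo->allowed_domains = requested;
    if (requested == DOM_VRAM)
        bo->allowed_domains |= DOM_GTT;
}
```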


David Zhou


> Regards,
> Christian.
>
>> Regards,
>> David Zhou
>>
>>> Signed-off-by: Christian König 
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 +--
>>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> index 68af2f878bc9..e1756b68a17b 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> @@ -385,8 +385,7 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p,
>>>   amdgpu_bo_in_cpu_visible_vram(bo))
>>>   p->bytes_moved_vis += ctx.bytes_moved;
>>> -    if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains &&
>>> -    !(bo->flags & AMDGPU_GEM_CREATE_NO_FALLBACK)) {
>>> +    if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
>>>   domain = bo->allowed_domains;
>>>   goto retry;
>>>   }








Re: [PATCH xf86-video-amdgpu 04/19] Remove drmCheckModesettingSupported and kernel module loading

2018-04-10 Thread Michel Dänzer
On 2018-04-10 11:20 AM, Emil Velikov wrote:
> On 10 April 2018 at 09:26, Michel Dänzer  wrote:
>> On 2018-04-04 04:29 PM, Emil Velikov wrote:
>>> From: Emil Velikov 
>>>
>>> The former of these is a UMS artefact which gives an incorrect and
>>> misleading promise about whether "KMS" is supported. Not to mention that
>>> AMDGPU is a KMS-only driver.
>>>
>>> In a similar fashion xf86LoadKernelModule() is a relic of the times,
>>> where platforms had no scheme of detecting and loading the appropriate
>>> kernel module.
>>>
>>> Cc: Robert Millan 
>>> Signed-off-by: Emil Velikov 
>>> ---
>>> Robert, off the top of my head this should work with FreeBSD. Admittedly
>>> I'm not an expert on the platform. Please give it a test.
>>
>> I want to get confirmation from Robert that this will work on FreeBSD
>> now, since he explicitly restored the kernel module loading code in
>> https://cgit.freedesktop.org/xorg/driver/xf86-video-ati/commit/?id=bfbff3b246db509c820df17b8fcf5899882ffcfa
>> .
>>
> Fully agreed! That's why I added him to the CC list.
> 
> Throwing out some ideas:
>  - If it's still needed, can we keep it !Linux only?

The first drmCheckModesettingSupported call? Fine with me.




Re: [PATCH xf86-video-amdgpu 17/19] Store device_name as AMDGPUEntRec::master_node

2018-04-10 Thread Emil Velikov
On 10 April 2018 at 09:29, Michel Dänzer  wrote:
> On 2018-04-04 04:29 PM, Emil Velikov wrote:
>> From: Emil Velikov 
>>
>> Rename the variable to reflect what it is. Plus move it out of the dri2
>> section - it's used in dri2 and dri3.
>>
>> Signed-off-by: Emil Velikov 
>
> [...]
>
>> diff --git a/src/amdgpu_probe.c b/src/amdgpu_probe.c
>> index 4959bd6..e9afe42 100644
>> --- a/src/amdgpu_probe.c
>> +++ b/src/amdgpu_probe.c
>> @@ -178,6 +178,10 @@ static Bool amdgpu_device_setup(ScrnInfoPtr pScrn,
>>   if (pAMDGPUEnt->fd < 0)
>>   return FALSE;
>>
>> + pAMDGPUEnt->master_node = drmGetDeviceNameFromFd2(pAMDGPUEnt->fd);
>> + if (pAMDGPUEnt->master_node)
>> +goto error_amdgpu;
>
> This should be
>
> if (!pAMDGPUEnt->master_node)
>
> shouldn't it?
>
>
> ... Which raises the question: How did you test these patches? :)
>
I mentioned it in the cover letter, but it seems to have been dropped -
they are untested.
There's an r600 card close by I could test with, but no amdgpu one :-\

-Emil


Re: [PATCH xf86-video-amdgpu 11/19] Don't leak a AMDGPUEntRec instance if amdgpu_device_setup fails

2018-04-10 Thread Michel Dänzer
On 2018-04-10 11:47 AM, Emil Velikov wrote:
> On 10 April 2018 at 09:28, Michel Dänzer  wrote:
>> On 2018-04-04 04:29 PM, Emil Velikov wrote:
>>> From: Emil Velikov 
>>>
>>> Seems like we've been leaking this for years. It became more obvious
>>> with the recent refactoring.
>>>
>>> Signed-off-by: Emil Velikov 
>>> ---
>>>  src/amdgpu_probe.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/src/amdgpu_probe.c b/src/amdgpu_probe.c
>>> index 537d44c..588891c 100644
>>> --- a/src/amdgpu_probe.c
>>> +++ b/src/amdgpu_probe.c
>>> @@ -243,6 +243,8 @@ amdgpu_probe(ScrnInfoPtr pScrn, int entity_num,
>>>   return TRUE;
>>>
>>>  error:
>>> + free(pPriv->ptr);
>>> + pPriv->ptr = NULL;
>>>   return FALSE;
>>>  }
>>>
>>>
>>
>> valgrind doesn't report a leak if I force this error path; presumably
>> Xorg frees the private after returning FALSE here.
>>
> Just double-checked and Xorg does not know anything about ptr. The
> only one who clears it up is AMDGPUFreeScreen_KMS.
> 
> The magic (for this and the other 'leak') seems to be happening in
> xf86platformAddDevice. Namely:
>  - ::platformProbe is called via doPlatformProbe
>  - the driver explicitly calls xf86AllocateScreen, yet fails later on
>  - back in Xorg, the "if (old_screens == xf86NumGPUDrivers)" is false
>  - ::PreInit fails, ::configured is false
>  - xf86DeleteScreen() gets called, which dives into ::FreeScreen (aka
> AMDGPUFreeScreen_KMS)
> 
> Alternatively, I could unwrap all that, although it makes sense to keep
> things simpler, as the patch effectively does.
> 
> I believe you'll agree?

I'm afraid not. There's no leak because it's getting cleaned up as
designed, so there's no need for this change.




Recall: [PATCH] drm/amdgpu: defer initing UVD & VCE IP blocks

2018-04-10 Thread S, Shirish
S, Shirish would like to recall the message, "[PATCH] drm/amdgpu: defer initing 
UVD & VCE IP blocks".


Re: [PATCH xf86-video-amdgpu 19/19] TODO

2018-04-10 Thread Emil Velikov
On 10 April 2018 at 09:30, Michel Dänzer  wrote:
> On 2018-04-04 04:29 PM, Emil Velikov wrote:
>> From: Emil Velikov 
>>
>> Signed-off-by: Emil Velikov 
>> ---
>>  todo | 9 +
>>  1 file changed, 9 insertions(+)
>>  create mode 100644 todo
>>
>> diff --git a/todo b/todo
>> new file mode 100644
>> index 000..10c1ad5
>> --- /dev/null
>> +++ b/todo
>> @@ -0,0 +1,9 @@
>> + - on amdgpu_probe failure, the pScrn entry is leaked - missing api/examples?
>
> Might be similar to patch 11; does valgrind actually report a leak if
> you force this?
>
>
>> + - introduce xf86ConfigEntity and use it
>> + - remove embedded AMDGPUInfoRec::pEnt
>> + - consistently use gAMDGPUEntityIndex or getAMDGPUEntityIndex
>> + - consistently use of pEnt/entity_num -> pScrn->list[], AMDPRIV
>> + - kill off DRI_1_ DRICreatePCIBusID - demote again to DRI1 only in X codebase
>> + - compose bus string early & strcmp instead of device_match?
>> + - remove embedded AMDGPUInfoRec::PciInfo - reuse EntityInfoRec::chipset, GDevRec::chiIDi, amdgpu_gpu_info::asic_id or ...
>> + - use odev to fetch render_node?
>
> I'm afraid I don't really see these as important enough to be tracked
> like this.
>
Agreed - no reason to keep these in-tree.

The idea was to gather feedback on these topics. One example:
do we need the getAMDGPUEntityIndex helper, considering that about half of
the existing codebase uses it, yet the other half references
gAMDGPUEntityIndex directly?

Most of the above seems to be copy/pasted from the radeon driver,
which in turn was copied from (?), and the original commit lacks any
information :-\
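To make the question concrete: if the helper does nothing more than return the global (an assumption here, not checked against the driver), the two styles are interchangeable and the helper only earns its keep as a single place for later checks. The names below are hypothetical.

```c
/* Hypothetical sketch of the two access styles; names are illustrative. */
static int g_entity_index = -1; /* set once during probe */

/* Accessor: one place to add checks or lazy allocation later. */
static int get_entity_index(void)
{
    return g_entity_index;
}

/* Stand-in for the one-time registration done during probe;
 * later calls are ignored. */
static void register_entity_index(int idx)
{
    if (g_entity_index < 0)
        g_entity_index = idx;
}
```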

-Emil


Re: [PATCH xf86-video-amdgpu 04/19] Remove drmCheckModesettingSupported and kernel module loading

2018-04-10 Thread Emil Velikov
On 10 April 2018 at 09:26, Michel Dänzer  wrote:
> On 2018-04-04 04:29 PM, Emil Velikov wrote:
>> From: Emil Velikov 
>>
>> The former of these is a UMS artefact which gives an incorrect and
>> misleading promise about whether "KMS" is supported. Not to mention that
>> AMDGPU is a KMS-only driver.
>>
>> In a similar fashion xf86LoadKernelModule() is a relic of the times,
>> where platforms had no scheme of detecting and loading the appropriate
>> kernel module.
>>
>> Cc: Robert Millan 
>> Signed-off-by: Emil Velikov 
>> ---
>> Robert, off the top of my head this should work with FreeBSD. Admittedly
>> I'm not an expert on the platform. Please give it a test.
>
> I want to get confirmation from Robert that this will work on FreeBSD
> now, since he explicitly restored the kernel module loading code in
> https://cgit.freedesktop.org/xorg/driver/xf86-video-ati/commit/?id=bfbff3b246db509c820df17b8fcf5899882ffcfa
> .
>
Fully agreed! That's why I added him to the CC list.

Throwing out some ideas:
 - If it's still needed, can we keep it !Linux only?
 - Wayland does not have a kernel module loading mechanism.
 - To prevent fan noise and/or card overheating, one really wants to
load the kernel module early, not when X starts ;-)
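The first idea could look roughly like the sketch below. It is hypothetical: load_kernel_module_stub() stands in for xf86LoadKernelModule(), and it assumes that on Linux the module has already been loaded by the system before X starts, so the driver does nothing there.

```c
#include <stdbool.h>

/* Counts explicit load attempts so the behaviour can be checked. */
static int load_attempts;

#if !defined(__linux__)
/* Hypothetical stub for xf86LoadKernelModule(). */
static bool load_kernel_module_stub(const char *name)
{
    (void)name;
    load_attempts++;
    return true;
}
#endif

/* Only non-Linux builds try to load the module from the X driver. */
static bool ensure_kernel_module(const char *name)
{
#if defined(__linux__)
    (void)name; /* assumed already loaded by the system */
    return true;
#else
    return load_kernel_module_stub(name);
#endif
}
```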

-Emil


Re: [PATCH xf86-video-amdgpu 02/19] Guard ODEV_ATTRIB_FD usage with the correct ifdef

2018-04-10 Thread Michel Dänzer
On 2018-04-10 11:24 AM, Emil Velikov wrote:
> On 10 April 2018 at 09:27, Michel Dänzer  wrote:
>> On 2018-04-04 04:29 PM, Emil Velikov wrote:
>>> From: Emil Velikov 
>>>
>>> Signed-off-by: Emil Velikov 
>>> ---
>>>  src/amdgpu_probe.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/src/amdgpu_probe.c b/src/amdgpu_probe.c
>>> index 075e5c1..e65c83b 100644
>>> --- a/src/amdgpu_probe.c
>>> +++ b/src/amdgpu_probe.c
>>> @@ -120,7 +120,7 @@ static int amdgpu_kernel_open_fd(ScrnInfoPtr pScrn,
>>>   char *busid;
>>>   int fd;
>>>
>>> -#ifdef XF86_PDEV_SERVER_FD
>>> +#ifdef ODEV_ATTRIB_FD
>>>   if (platform_dev) {
>>>   fd = xf86_get_platform_device_int_attrib(platform_dev,
>>>ODEV_ATTRIB_FD, -1);
>>>
>>
>> ODEV_ATTRIB_FD doesn't seem obviously more "correct" than
>> XF86_PDEV_SERVER_FD, since both were added in the same xserver commit,
>> and the latter might be helpful for understanding this is related to the
>> other code guarded by XF86_PDEV_SERVER_FD.
>>
> All the XF86_PDEV_SERVER_FD code is dropped with a later commit ;-)
> I could move this patch just after said commit, or do you prefer to keep
> the original guard?

The latter, less churn. :)




Re: [PATCH 3/3] drm/amdgpu: remove AMDGPU_GEM_CREATE_NO_FALLBACK handling from CS again

2018-04-10 Thread Christian König

On 10.04.2018 at 04:43, zhoucm1 wrote:
> On 2018-04-09 18:19, Christian König wrote:
>> That should purely be handled by preferred/allowed domains.
> Although this flag isn't exported to user space yet, I'm curious
> how preferred/allowed domains handle no_fallback.
>
> IIRC, currently, our driver will always add GTT fallback for VRAM bo.

And that is intentional. Going a step further back I think moving the
fallback handling into amdgpu_bo_do_create() and adding the flag was a
mistake to begin with.

Going to send patches to revert all this and further clean the stuff up.

Regards,
Christian.

> Regards,
> David Zhou
>
>> Signed-off-by: Christian König 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 +--
>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index 68af2f878bc9..e1756b68a17b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -385,8 +385,7 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p,
>>   amdgpu_bo_in_cpu_visible_vram(bo))
>>   p->bytes_moved_vis += ctx.bytes_moved;
>> -    if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains &&
>> -    !(bo->flags & AMDGPU_GEM_CREATE_NO_FALLBACK)) {
>> +    if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
>>   domain = bo->allowed_domains;
>>   goto retry;
>>   }
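After this patch, the retry path in amdgpu_cs_bo_validate() no longer looks at AMDGPU_GEM_CREATE_NO_FALLBACK. A simplified, hypothetical model of that path: try_place() pretends VRAM is full while GTT always fits, the DOM_* bits are illustrative, and the sketch always starts from the preferred domains.

```c
#include <errno.h>
#include <stdint.h>

#define DOM_GTT  0x2u /* hypothetical domain bits */
#define DOM_VRAM 0x4u

/* Hypothetical placement: VRAM is "full", GTT always fits. */
static int try_place(uint32_t domain)
{
    return (domain & DOM_GTT) ? 0 : -ENOMEM;
}

/* On -ENOMEM, retry once with the wider allowed domains; no flag can
 * suppress the fallback any more. */
static int validate_sketch(uint32_t preferred, uint32_t allowed,
                           uint32_t *placed_in)
{
    uint32_t domain = preferred;
    int r;

retry:
    r = try_place(domain);
    if (r == -ENOMEM && domain != allowed) {
        domain = allowed;
        goto retry;
    }
    if (r == 0)
        *placed_in = domain;
    return r;
}
```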






Re: [PATCH xf86-video-amdgpu 11/19] Don't leak a AMDGPUEntRec instance if amdgpu_device_setup fails

2018-04-10 Thread Emil Velikov
On 10 April 2018 at 09:28, Michel Dänzer  wrote:
> On 2018-04-04 04:29 PM, Emil Velikov wrote:
>> From: Emil Velikov 
>>
>> Seems like we've been leaking this for years. It became more obvious
>> with the recent refactoring.
>>
>> Signed-off-by: Emil Velikov 
>> ---
>>  src/amdgpu_probe.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/src/amdgpu_probe.c b/src/amdgpu_probe.c
>> index 537d44c..588891c 100644
>> --- a/src/amdgpu_probe.c
>> +++ b/src/amdgpu_probe.c
>> @@ -243,6 +243,8 @@ amdgpu_probe(ScrnInfoPtr pScrn, int entity_num,
>>   return TRUE;
>>
>>  error:
>> + free(pPriv->ptr);
>> + pPriv->ptr = NULL;
>>   return FALSE;
>>  }
>>
>>
>
> valgrind doesn't report a leak if I force this error path; presumably
> Xorg frees the private after returning FALSE here.
>
Just double-checked and Xorg does not know anything about ptr. The
only one who clears it up is AMDGPUFreeScreen_KMS.

The magic (for this and the other 'leak') seems to be happening in
xf86platformAddDevice. Namely:
 - ::platformProbe is called via doPlatformProbe
 - the driver explicitly calls xf86AllocateScreen, yet fails later on
 - back in Xorg, the "if (old_screens == xf86NumGPUDrivers)" is false
 - ::PreInit fails, ::configured is false
 - xf86DeleteScreen() gets called, which dives into ::FreeScreen (aka
AMDGPUFreeScreen_KMS)

Alternatively, I could unwrap all that, although it makes sense to keep
things simpler, as the patch effectively does.

I believe you'll agree?
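The cleanup path described above can be modelled in a heavily simplified form. Everything here is hypothetical: screens[], add_device() and free_screen_hook() stand in for the server's screen list, xf86platformAddDevice() and AMDGPUFreeScreen_KMS respectively, and the PreInit step is folded into the probe result.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a screen and its driver private. */
struct screen_sketch {
    void *driver_private; /* stands in for pPriv->ptr */
};

static struct screen_sketch screens[4];
static int num_screens;

/* cf. AMDGPUFreeScreen_KMS: the driver hook that frees the private. */
static void free_screen_hook(struct screen_sketch *s)
{
    free(s->driver_private);
    s->driver_private = NULL;
}

/* Driver probe: allocates a screen and its private, then fails. */
static bool probe_that_fails(void)
{
    if (num_screens >= 4)
        return false;
    struct screen_sketch *s = &screens[num_screens++];
    s->driver_private = malloc(32);
    return false;
}

/* Server side: a screen left behind by a failed probe is deleted,
 * which calls the driver's FreeScreen hook. */
static void add_device(void)
{
    int old_screens = num_screens;
    bool ok = probe_that_fails();

    if (!ok && old_screens != num_screens) {
        free_screen_hook(&screens[num_screens - 1]);
        num_screens--;
    }
}
```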
-Emil


[PATCH umr] Fix VMID of chained IBs

2018-04-10 Thread Tom St Denis
We were using the VMID field of chained IBs literally, when inside an IB the VMID is inherited instead.

Signed-off-by: Tom St Denis 
---
 src/lib/dump_ib.c | 8 
 src/lib/ring_decode.c | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/lib/dump_ib.c b/src/lib/dump_ib.c
index 74a3241c309e..cbf859e3633b 100644
--- a/src/lib/dump_ib.c
+++ b/src/lib/dump_ib.c
@@ -39,10 +39,10 @@ void umr_dump_ib(struct umr_asic *asic, struct umr_ring_decoder *decoder)
if (decoder->src.ib_addr == 0)
printf("ring[%s%u%s]", BLUE, (unsigned)decoder->src.addr, RST);
else
-   printf("IB[%s%u%s] at %s%d%s@%s0x%llx%s",
-   BLUE, (unsigned)decoder->src.addr, RST,
-   YELLOW, (int)decoder->src.vmid, RST,
-   YELLOW, (unsigned long long)decoder->src.ib_addr, RST);
+   printf("IB[%s%u%s@%s0x%llx%s + %s0x%x%s]",
+   BLUE, (int)decoder->src.vmid, RST,
+   YELLOW, (unsigned long long)decoder->src.ib_addr, RST,
+   YELLOW, (unsigned)decoder->src.addr * 4, RST);
 
printf("\n");
 
diff --git a/src/lib/ring_decode.c b/src/lib/ring_decode.c
index c1d6bcb98bae..42265e0a74c9 100644
--- a/src/lib/ring_decode.c
+++ b/src/lib/ring_decode.c
@@ -540,7 +540,7 @@ static void print_decode_pm4_pkt3(struct umr_asic *asic, struct umr_ring_decoder
break;
case 2: printf("IB_SIZE:%s%lu%s, VMID: %s%lu%s", BLUE, BITS(ib, 0, 20), RST, BLUE, BITS(ib, 24, 32), RST);
decoder->pm4.next_ib_state.ib_size = BITS(ib, 0, 20) * 4;
-   decoder->pm4.next_ib_state.ib_vmid = BITS(ib, 24, 32);
+   decoder->pm4.next_ib_state.ib_vmid = decoder->next_ib_info.vmid ? decoder->next_ib_info.vmid : BITS(ib, 24, 32);
add_ib_pm4(decoder);
break;
default: printf("Invalid word for opcode 0x%02lx", (unsigned long)decoder->pm4.cur_opcode);
-- 
2.14.3
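The VMID selection in the ring_decode.c hunk can be illustrated with a small standalone C sketch. It assumes, as the patch's ternary suggests, that a nonzero VMID already tracked for the enclosing IB wins over the VMID field of a nested INDIRECT_BUFFER packet. `bits_sketch()` is a local helper that extracts bits [lo, hi), matching how the patch calls BITS(ib, 0, 20) and BITS(ib, 24, 32); it is not umr's actual macro, and `decode_ib_dword_sketch()` and `IbInfoSketch` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Local helper: extract bits [lo, hi) of a 32-bit word, matching how the
 * patch uses BITS(ib, 0, 20) and BITS(ib, 24, 32). Not umr's macro. */
static uint32_t bits_sketch(uint32_t word, unsigned lo, unsigned hi)
{
    assert(lo < hi && hi <= 32);
    return (uint32_t)((word >> lo) & ((1ULL << (hi - lo)) - 1));
}

/* Hypothetical decoded form of the INDIRECT_BUFFER size/VMID dword. */
typedef struct {
    uint32_t size_bytes; /* IB_SIZE counts dwords, hence * 4 */
    uint32_t vmid;
} IbInfoSketch;

/* inherited_vmid: VMID already in effect for the enclosing IB, or 0 when
 * none is known (e.g. decoding the ring itself). A nonzero inherited VMID
 * takes precedence over the packet's own VMID field, as in the patch. */
IbInfoSketch decode_ib_dword_sketch(uint32_t ib, uint32_t inherited_vmid)
{
    IbInfoSketch info;
    info.size_bytes = bits_sketch(ib, 0, 20) * 4;
    info.vmid = inherited_vmid ? inherited_vmid : bits_sketch(ib, 24, 32);
    return info;
}
```

For example, the dword 0x03000010 decodes to a 64-byte IB with VMID 3 at the top level, but keeps VMID 5 when it appears inside an IB that already runs under VMID 5.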



Re: [PATCH 1/2] drm/amdgpu/gmc: steal the appropriate amount of vram for fw hand-over (v2)

2018-04-10 Thread Andrey Grodzovsky
So with my and Alex's patches you still observe corruption? What if you 
remove my patch and keep Alex's patch?


Andrey


On 04/10/2018 06:53 AM, Huang Rui wrote:

On Mon, Apr 09, 2018 at 11:17:58AM -0400, Andrey Grodzovsky wrote:

OK, tested with DC disabled: no issues on resume (no visible
corruption on the display and no errors in the log). However, the display
itself freezes after amdgpu is loaded with DC disabled; this happens only
when the BIOS is in VGA mode, not in console mode. It also happens
without my and Alex's patches, so it looks like a separate issue.

So anyway, if there were corruption (the beginning of VRAM, and hence the
scanout FB, corrupted), I should have seen it with grub in console
mode, where the display is fine and does not freeze.


Reproduce steps:
1. sudo modprobe amdgpu dc=0 ip_block_mask=0x7f
2. pm-suspend/resume two times.

You will see that the start of VRAM is corrupted after S3 resume.

[  570.343635] [drm] PCIE GART of 512M enabled (table at 0x00F4).
[  570.343642] [drm] PSP is resuming...
[  570.343713] gmc_v9_0_process_interrupt: 12 callbacks suppressed
[  570.343715] amdgpu :03:00.0: [mmhub] VMC page fault (src_id:0 ring:0 vmid:0 pasid:0)
[  570.343716] amdgpu :03:00.0:   at page 0x00f60070 from 18
[  570.343716] amdgpu :03:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010
[  570.525510] [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
[  570.525523] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block  failed -62
[  570.525536] [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-62).
[  570.536704] e1000e: enp0s31f6 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
[  570.540496] dpm_run_callback(): pci_pm_resume+0x0/0xa0 returns -62
[  570.547879] e1000e :00:1f.6 enp0s31f6: 10/100 speed: disabling TSO
[  570.555434] call :03:00.0+ returned -62 after 1973202 usecs
[  570.689812] PM: Device :03:00.0 failed to resume async: error -62

I attached the whole dmesg.

Thanks,
Ray




Re: RFC for a render API to support adaptive sync and VRR

2018-04-10 Thread Harry Wentland
Adding Anthony and Aric who've been working on Freesync with DC on other OSes 
for a while.

On 2018-04-09 05:45 PM, Manasi Navare wrote:
> Thanks for initiating the discussion. Find my comments below:
> 
> On Mon, Apr 09, 2018 at 04:00:21PM -0400, Harry Wentland wrote:
>> Adding dri-devel, which I should've included from the start.
>>
>> On 2018-04-09 03:56 PM, Harry Wentland wrote:
>>> === What is adaptive sync and VRR? ===
>>>
>>> Adaptive sync has been part of the DisplayPort spec for a while now and 
>>> allows graphics adapters to drive displays with varying frame timings. VRR 
>>> (variable refresh rate) is essentially the same, but defined for HDMI.
>>>
>>>
>>>
>>> === Why allow variable frame timings? ===
>>>
>>> Variable render times don't align with fixed refresh rates, leading to
>>> stuttering, tearing, and/or input lag.
>>>
>>> e.g. (rc = render completion, dr = display refresh)
>>>
>>> rc   B  CDE  F
>>> dr  A   B   C   C   D   E   F
>>>
>>>        ^                 ^
>>>  frame repeated twice   missed display refresh
>>>
>>>
>>>
>>> === Other use cases of adaptive sync ===
>>>
>>> Besides the variable render case, adaptive sync also allows adjustment of 
>>> refresh rates without a mode change. One such use case would be 24 Hz video.
>>>
> 
> One of the advantages here is that when the render speed is slower than the 
> display refresh rate, since we are stretching the vertical blanking interval,
> the display adapters will follow a "draw fast and then go idle" approach. This 
> gives power savings when the render rate is lower than the display refresh rate.

Are you talking about a use case, such as an idle desktop, where the renders 
are quite sporadic?

>  
>>>
>>>
>>> === A DRM render API to support variable refresh rates ===
>>>
>>> In order to benefit from adaptive sync and VRR userland needs a way to let 
>>> us know whether to vary frame timings or to target a different frame time. 
>>> These can be provided as atomic properties on a CRTC:
>>>  * bool variable_refresh_compatible
>>>  * int  target_frame_duration_ns (nanosecond frame duration)
>>>
>>> This gives us the following cases:
>>>
>>> variable_refresh_compatible = 0, target_frame_duration_ns = 0
>>>  * drive monitor at timing's normal refresh rate
>>>
>>> variable_refresh_compatible = 1, target_frame_duration_ns = 0
>>>  * send new frame to monitor as soon as it's available, if within min/max 
>>> of monitor's reported capabilities
>>>
>>> variable_refresh_compatible = 0/1, target_frame_duration_ns = > 0
>>>  * send new frame to monitor with the specified target_frame_duration_ns
>>>
>>> When a target_frame_duration_ns or variable_refresh_compatible cannot be 
>>> supported the atomic check will reject the commit.
>>>
> 
> What I would like is two sets of properties on a CRTC or preferably on a 
> connector:
> 
> KMD properties that UMD can query:
> * vrr_capable -  This will be an immutable property for exposing hardware's 
> capability of supporting VRR. This will be set by the kernel after 
> reading the EDID mode information and monitor range capabilities.
> * vrr_vrefresh_max, vrr_vrefresh_min - To expose the min and max refresh 
> rates supported.
> These properties are optional and will be created and attached to the DP/eDP 
> connector when the connector
> is being initialized.
> 

If we're talking about the properties from the EDID, these might not necessarily 
align with the currently selected mode, which might have a refresh rate lower 
than vrr_vrefresh_max, requiring us to cap it at that. In some scenarios we 
also might do low framerate compensation [1], where we do magic to allow the 
framerate to drop below the supported range.

I think if vrr_vrefresh_max/min are exposed to UMD, these should really be only 
for informational purposes, in which case it might make more sense to expose 
them through sysfs or even debugfs entries.

[1] https://www.amd.com/Documents/freesync-lfc.pdf
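The capping described above can be sketched as a small C helper, assuming refresh rates in whole Hz. `VrrRangeSketch` and `effective_vrr_range_sketch()` are hypothetical names, not DRM API; the sketch also ignores low framerate compensation and simply reports no usable range when the mode is slower than the EDID minimum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refresh range in whole Hz; not a DRM structure. */
typedef struct {
    unsigned min_hz;
    unsigned max_hz;
} VrrRangeSketch;

/* Clamp the EDID range to the selected mode: the mode's nominal refresh
 * rate caps the usable maximum. Returns 1 and fills *out on success,
 * 0 if the mode is slower than the EDID minimum (LFC not modelled). */
int effective_vrr_range_sketch(VrrRangeSketch edid, unsigned mode_refresh_hz,
                               VrrRangeSketch *out)
{
    assert(out != NULL);
    unsigned max_hz = edid.max_hz < mode_refresh_hz ? edid.max_hz : mode_refresh_hz;
    if (max_hz < edid.min_hz)
        return 0;
    out->min_hz = edid.min_hz;
    out->max_hz = max_hz;
    return 1;
}
```

With an EDID range of 40-144 Hz and a 60 Hz mode selected, the usable range becomes 40-60 Hz.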

> Properties that you mentioned above that the UMD can set before the kernel can 
> enable VRR functionality
> *bool vrr_enable or vrr_compatible
> target_frame_duration_ns
> 
> The monitor only specifies the monitor range through EDID. Apart from this, 
> do we also need to scan the modes and check
> if there are modes that have the same pixel clock and horizontal timings but 
> variable vertical totals?
> 

I'm not sure about the VRR spec, but for adaptive sync we should only consider 
the range limits specified in the EDID and allow adaptive sync for modes within 
that range.
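The property semantics from the RFC, combined with the EDID range limits, can be sketched as an atomic-check style decision function. Everything here is hypothetical: the enum, `vrr_check_sketch()` and its parameters only model the proposed properties (variable_refresh_compatible, target_frame_duration_ns) and are not an existing kernel interface. A target duration is accepted only if it corresponds to a rate within [min_hz, max_hz].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    VRR_FIXED_REFRESH,   /* drive the mode's normal refresh rate */
    VRR_AS_AVAILABLE,    /* flip as soon as a frame is ready, within range */
    VRR_TARGET_DURATION, /* hold each frame for target_frame_duration_ns */
    VRR_REJECT           /* the atomic check fails the commit */
} VrrDecisionSketch;

/* Sketch of the proposed (unmerged) property semantics.
 * vrr_capable, min_hz, max_hz: sink support and EDID range. */
VrrDecisionSketch vrr_check_sketch(bool variable_refresh_compatible,
                                   int64_t target_frame_duration_ns,
                                   bool vrr_capable,
                                   unsigned min_hz, unsigned max_hz)
{
    const int64_t ns_per_s = 1000000000;

    if (target_frame_duration_ns < 0)
        return VRR_REJECT;

    /* Case 1: neither property set, behave as today. */
    if (!variable_refresh_compatible && target_frame_duration_ns == 0)
        return VRR_FIXED_REFRESH;

    /* Cases 2 and 3 need a sink that can vary its timing. */
    if (!vrr_capable || min_hz == 0 || max_hz < min_hz)
        return VRR_REJECT;

    /* Case 2: present frames as they complete. */
    if (target_frame_duration_ns == 0)
        return VRR_AS_AVAILABLE;

    /* Durations above one second mean a rate below 1 Hz, which the
     * min_hz check would reject anyway; testing first also keeps the
     * products below from overflowing. */
    if (target_frame_duration_ns > ns_per_s)
        return VRR_REJECT;

    /* Case 3: need 1e9/max_hz <= duration <= 1e9/min_hz. This is an exact
     * integer comparison; a real check would likely allow some rounding. */
    if (target_frame_duration_ns * max_hz < ns_per_s ||
        target_frame_duration_ns * min_hz > ns_per_s)
        return VRR_REJECT;
    return VRR_TARGET_DURATION;
}
```

For example, a 48 Hz target (20833333 ns) passes on a 40-60 Hz monitor, while a 24 Hz target (41666667 ns) is rejected, which is where frame doubling or LFC would have to come in.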

> I have RFC patches for all the above mentioned. If we get a 
> consensus/agreement on the above properties and method to check
> monitor's VRR capability, I can submit those patches at least as RFC.
> 

That sounds great. I wouldn't mind trying those patches and then working 
together to arrive at 

Re: [PATCH umr] Fix VMID of chained IBs

2018-04-10 Thread Christian König

On 10.04.2018 at 17:23, Tom St Denis wrote:

We were using the VMID field literally, but inside an IB it is inherited.

Signed-off-by: Tom St Denis 


Acked-by: Christian König 


---
  src/lib/dump_ib.c | 8 
  src/lib/ring_decode.c | 2 +-
  2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/lib/dump_ib.c b/src/lib/dump_ib.c
index 74a3241c309e..cbf859e3633b 100644
--- a/src/lib/dump_ib.c
+++ b/src/lib/dump_ib.c
@@ -39,10 +39,10 @@ void umr_dump_ib(struct umr_asic *asic, struct umr_ring_decoder *decoder)
if (decoder->src.ib_addr == 0)
printf("ring[%s%u%s]", BLUE, (unsigned)decoder->src.addr, RST);
else
-   printf("IB[%s%u%s] at %s%d%s@%s0x%llx%s",
-   BLUE, (unsigned)decoder->src.addr, RST,
-   YELLOW, (int)decoder->src.vmid, RST,
-   YELLOW, (unsigned long long)decoder->src.ib_addr, RST);
+   printf("IB[%s%u%s@%s0x%llx%s + %s0x%x%s]",
+   BLUE, (int)decoder->src.vmid, RST,
+   YELLOW, (unsigned long long)decoder->src.ib_addr, RST,
+   YELLOW, (unsigned)decoder->src.addr * 4, RST);
  
  	printf("\n");
  
diff --git a/src/lib/ring_decode.c b/src/lib/ring_decode.c
index c1d6bcb98bae..42265e0a74c9 100644
--- a/src/lib/ring_decode.c
+++ b/src/lib/ring_decode.c
@@ -540,7 +540,7 @@ static void print_decode_pm4_pkt3(struct umr_asic *asic, struct umr_ring_decoder
break;
case 2: printf("IB_SIZE:%s%lu%s, VMID: %s%lu%s", BLUE, BITS(ib, 0, 20), RST, BLUE, BITS(ib, 24, 32), RST);
decoder->pm4.next_ib_state.ib_size = BITS(ib, 0, 20) * 4;
-   decoder->pm4.next_ib_state.ib_vmid = BITS(ib, 24, 32);
+   decoder->pm4.next_ib_state.ib_vmid = decoder->next_ib_info.vmid ? decoder->next_ib_info.vmid : BITS(ib, 24, 32);
add_ib_pm4(decoder);
break;
default: printf("Invalid word for opcode 0x%02lx", (unsigned long)decoder->pm4.cur_opcode);




Re: RFC for a render API to support adaptive sync and VRR

2018-04-10 Thread Christian König

On 10.04.2018 at 17:08, Harry Wentland wrote:

> On 2018-04-10 03:37 AM, Michel Dänzer wrote:
>
>> On 2018-04-10 08:45 AM, Christian König wrote:
>>
>>> On 09.04.2018 at 23:45, Manasi Navare wrote:
>>>
>>>> Thanks for initiating the discussion. Find my comments below:
>>>> On Mon, Apr 09, 2018 at 04:00:21PM -0400, Harry Wentland wrote:
>>>>
>>>>> On 2018-04-09 03:56 PM, Harry Wentland wrote:
>>>>>
>>>>>> === A DRM render API to support variable refresh rates ===
>>>>>>
>>>>>> In order to benefit from adaptive sync and VRR userland needs a way
>>>>>> to let us know whether to vary frame timings or to target a
>>>>>> different frame time. These can be provided as atomic properties on
>>>>>> a CRTC:
>>>>>>    * bool    variable_refresh_compatible
>>>>>>    * int    target_frame_duration_ns (nanosecond frame duration)
>>>>>>
>>>>>> This gives us the following cases:
>>>>>>
>>>>>> variable_refresh_compatible = 0, target_frame_duration_ns = 0
>>>>>>    * drive monitor at timing's normal refresh rate
>>>>>>
>>>>>> variable_refresh_compatible = 1, target_frame_duration_ns = 0
>>>>>>    * send new frame to monitor as soon as it's available, if within
>>>>>> min/max of monitor's reported capabilities
>>>>>>
>>>>>> variable_refresh_compatible = 0/1, target_frame_duration_ns = > 0
>>>>>>    * send new frame to monitor with the specified
>>>>>> target_frame_duration_ns
>>>>>>
>>>>>> When a target_frame_duration_ns or variable_refresh_compatible
>>>>>> cannot be supported the atomic check will reject the commit.
>>>>
>>>> What I would like is two sets of properties on a CRTC or preferably on
>>>> a connector:
>>>>
>>>> KMD properties that UMD can query:
>>>> * vrr_capable -  This will be an immutable property for exposing
>>>> hardware's capability of supporting VRR. This will be set by the
>>>> kernel after
>>>> reading the EDID mode information and monitor range capabilities.
>>>> * vrr_vrefresh_max, vrr_vrefresh_min - To expose the min and max
>>>> refresh rates supported.
>>>> These properties are optional and will be created and attached to the
>>>> DP/eDP connector when the connector
>>>> is being initialized.
>>>
>>> Mhm, aren't those properties actually per mode and not per CRTC/connector?
>>>
>>>> Properties that you mentioned above that the UMD can set before kernel
>>>> can enable VRR functionality
>>>> *bool vrr_enable or vrr_compatible
>>>> target_frame_duration_ns
>>>
>>> Yeah, that certainly makes sense. But target_frame_duration_ns is a bad
>>> name/semantics.
>>>
>>> We should use an absolute timestamp where the frame should be presented,
>>> otherwise you could run into a bunch of trouble with IOCTL restarts or
>>> missed blanks.
>>
>> Also, a fixed target frame duration isn't suitable even for video
>> playback, due to drift between the video and audio clocks.
>>
>> Time-based presentation seems to be the right approach for preventing
>> micro-stutter in games as well, Croteam developers have been researching
>> this.
>
> I'm not sure if the driver can ever give a guarantee of the exact time a flip 
> occurs. What we have control over with our HW is frame duration.


Sounds like you misunderstood what we mean here.

The driver does not need to give an exact guarantee that a flip happens 
at that time. It should just not flip before that specific time.


E.g. when we miss a VBLANK, your approach would still wait for the 
specified amount of time, while an absolute timestamp would mean flipping 
as soon as possible after that timestamp has passed.


As Michel noted that is also exactly what video players need.
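The "not before" semantics described above can be sketched in C for the simple case of a fixed refresh period. `flip_time_not_before()` is a hypothetical helper, not a DRM or amdgpu function, and the fixed-period assumption ignores VRR, where the flip could land closer to the target. The key property is that a target which has already passed results in the next possible vblank instead of a further full-duration wait.

```c
#include <assert.h>
#include <stdint.h>

/* Earliest vblank at or after max(target_ns, now_ns), assuming vblanks
 * every period_ns counted from last_vblank_ns (fixed refresh only).
 * A target that has already passed maps to the next vblank, i.e.
 * "flip as soon as possible", instead of waiting a further duration. */
int64_t flip_time_not_before(int64_t last_vblank_ns, int64_t period_ns,
                             int64_t target_ns, int64_t now_ns)
{
    assert(period_ns > 0);
    int64_t earliest = target_ns > now_ns ? target_ns : now_ns;
    if (earliest <= last_vblank_ns)
        return last_vblank_ns;
    /* Round up to a whole number of periods after last_vblank_ns. */
    int64_t periods = (earliest - last_vblank_ns + period_ns - 1) / period_ns;
    return last_vblank_ns + periods * period_ns;
}
```

With last_vblank_ns = 0 and period_ns = 16, a target of 20 flips at 32; if the request only arrives at 40, it flips at 48, the first vblank after it arrived.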



> Are Croteam devs trying to predict render times? I'm not sure how that would 
> work. We've had bad experience in the past with games that try to do 
> framepacing as that's usually not accurate and tends to lead to more problems 
> than benefits.


As far as I understand, that is just a regulated feedback system, e.g. 
the application records the timestamps of the last three frames (or so) 
and then uses that plus a margin as the world time for the 3D rendering.


When the application has finished sending all rendering commands it 
sends the frame to be displayed exactly with that timestamp as well.


The timestamp when the frame was actually displayed is then used again 
as input to the algorithm.
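The regulated feedback loop described above could look roughly like this in C. The history length of three follows "the last three frames (or so)"; how the margin is combined with the average interval is an assumption, and `PacingSketch`, `pacing_record()` and `pacing_next_target()` are hypothetical names, not taken from any real engine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PACING_HISTORY 3 /* "the last three frames (or so)" */

/* Hypothetical pacing state: actual display timestamps, oldest first. */
typedef struct {
    int64_t present_ns[PACING_HISTORY];
    int count;
} PacingSketch;

/* Feed back the time at which a frame was actually displayed. */
void pacing_record(PacingSketch *p, int64_t displayed_ns)
{
    assert(p != NULL);
    if (p->count < PACING_HISTORY) {
        p->present_ns[p->count++] = displayed_ns;
        return;
    }
    for (int i = 1; i < PACING_HISTORY; i++)
        p->present_ns[i - 1] = p->present_ns[i];
    p->present_ns[PACING_HISTORY - 1] = displayed_ns;
}

/* World time to render the next frame for (and to request presentation
 * at): last display time + average recent interval + margin.
 * Returns -1 until at least two samples have been recorded. */
int64_t pacing_next_target(const PacingSketch *p, int64_t margin_ns)
{
    assert(p != NULL);
    if (p->count < 2)
        return -1;
    int64_t last = p->present_ns[p->count - 1];
    int64_t avg = (last - p->present_ns[0]) / (p->count - 1);
    return last + avg + margin_ns;
}
```

Each displayed timestamp is fed back through pacing_record(), so a frame that is shown late shifts the next target instead of being ignored.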


Regards,
Christian.



> Harry





[PATCH 1/3] drm/amdgpu: revert "add new bo flag that indicates BOs don't need fallback (v2)"

2018-04-10 Thread Christian König
This reverts commit 6f51d28bfe8e1a676de5cd877639245bed3cc818.

Makes fallback handling too complicated. This is just a feature for the
GEM interface and shouldn't leak into the core BO create function.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 +
 include/uapi/drm/amdgpu_drm.h  | 2 --
 3 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 68af2f878bc9..e1756b68a17b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -385,8 +385,7 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p,
amdgpu_bo_in_cpu_visible_vram(bo))
p->bytes_moved_vis += ctx.bytes_moved;
 
-   if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains &&
-   !(bo->flags & AMDGPU_GEM_CREATE_NO_FALLBACK)) {
+   if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
domain = bo->allowed_domains;
goto retry;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 9e23d6f6f3f3..04d6830347ec 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -388,8 +388,6 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev, unsigned long size,
drm_gem_private_object_init(adev->ddev, &bo->gem_base, size);
INIT_LIST_HEAD(&bo->shadow_list);
INIT_LIST_HEAD(&bo->va);
-   bo->preferred_domains = preferred_domains;
-   bo->allowed_domains = allowed_domains;
 
bo->flags = flags;
 
@@ -426,8 +424,7 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev, unsigned long size,
r = ttm_bo_init_reserved(&adev->mman.bdev, &bo->tbo, size, type,
 &bo->placement, page_align, &ctx, acc_size,
 NULL, resv, &amdgpu_ttm_bo_destroy);
-   if (unlikely(r && r != -ERESTARTSYS) && type == ttm_bo_type_device &&
-   !(flags & AMDGPU_GEM_CREATE_NO_FALLBACK)) {
+   if (unlikely(r && r != -ERESTARTSYS) && type == ttm_bo_type_device) {
if (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
goto retry;
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 80665715e651..0087799962cf 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -98,8 +98,6 @@ extern "C" {
 #define AMDGPU_GEM_CREATE_VM_ALWAYS_VALID  (1 << 6)
 /* Flag that BO sharing will be explicitly synchronized */
 #define AMDGPU_GEM_CREATE_EXPLICIT_SYNC    (1 << 7)
-/* Flag that BO doesn't need fallback */
-#define AMDGPU_GEM_CREATE_NO_FALLBACK  (1 << 8)
 
 struct drm_amdgpu_gem_create_in  {
/** the requested memory size */
-- 
2.14.1
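For reference, the command-submission fallback that remains after this revert can be modelled in a few lines of userspace C. The `SK_` domain constants are local stand-ins rather than the uapi definitions, and `validate_with_fallback()` with its callback is a hypothetical simplification of amdgpu_cs_bo_validate(): on -ENOMEM it retries once with the BO's allowed domains, and there is no longer a NO_FALLBACK opt-out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local stand-ins for the uapi domain bits (not the real header). */
#define SK_DOMAIN_GTT  0x2u
#define SK_DOMAIN_VRAM 0x4u

/* Hypothetical validation callback: 0 on success or a negative errno. */
typedef int (*validate_fn)(uint32_t domain, void *ctx);

/* After the revert: on -ENOMEM, retry with the BO's allowed domains
 * unless we are already using them. There is no per-BO opt-out flag. */
int validate_with_fallback(validate_fn validate, void *ctx,
                           uint32_t domain, uint32_t allowed_domains)
{
    int r;

    for (;;) {
        r = validate(domain, ctx);
        if (r == -ENOMEM && domain != allowed_domains) {
            domain = allowed_domains;
            continue;
        }
        return r;
    }
}

/* Example callback: VRAM is exhausted, so only placements that allow GTT
 * succeed. ctx points to an attempt counter. */
int vram_exhausted_validate(uint32_t domain, void *ctx)
{
    ++*(int *)ctx;
    return (domain & SK_DOMAIN_GTT) ? 0 : -ENOMEM;
}
```

A VRAM-only BO whose allowed set also includes GTT succeeds on the second attempt; a BO whose allowed set is VRAM only fails with -ENOMEM after one attempt.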



Re: RFC for a render API to support adaptive sync and VRR

2018-04-10 Thread Harry Wentland
On 2018-04-10 03:37 AM, Michel Dänzer wrote:
> On 2018-04-10 08:45 AM, Christian König wrote:
>> On 09.04.2018 at 23:45, Manasi Navare wrote:
>>> Thanks for initiating the discussion. Find my comments below:
>>> On Mon, Apr 09, 2018 at 04:00:21PM -0400, Harry Wentland wrote:
>>>> On 2018-04-09 03:56 PM, Harry Wentland wrote:
>>>>>
>>>>> === A DRM render API to support variable refresh rates ===
>>>>>
>>>>> In order to benefit from adaptive sync and VRR userland needs a way
>>>>> to let us know whether to vary frame timings or to target a
>>>>> different frame time. These can be provided as atomic properties on
>>>>> a CRTC:
>>>>>   * bool    variable_refresh_compatible
>>>>>   * int    target_frame_duration_ns (nanosecond frame duration)
>>>>>
>>>>> This gives us the following cases:
>>>>>
>>>>> variable_refresh_compatible = 0, target_frame_duration_ns = 0
>>>>>   * drive monitor at timing's normal refresh rate
>>>>>
>>>>> variable_refresh_compatible = 1, target_frame_duration_ns = 0
>>>>>   * send new frame to monitor as soon as it's available, if within
>>>>> min/max of monitor's reported capabilities
>>>>>
>>>>> variable_refresh_compatible = 0/1, target_frame_duration_ns = > 0
>>>>>   * send new frame to monitor with the specified
>>>>> target_frame_duration_ns
>>>>>
>>>>> When a target_frame_duration_ns or variable_refresh_compatible
>>>>> cannot be supported the atomic check will reject the commit.
>>>>>
>>> What I would like is two sets of properties on a CRTC or preferably on
>>> a connector:
>>>
>>> KMD properties that UMD can query:
>>> * vrr_capable -  This will be an immutable property for exposing
>>> hardware's capability of supporting VRR. This will be set by the
>>> kernel after
>>> reading the EDID mode information and monitor range capabilities.
>>> * vrr_vrefresh_max, vrr_vrefresh_min - To expose the min and max
>>> refresh rates supported.
>>> These properties are optional and will be created and attached to the
>>> DP/eDP connector when the connector
>>> is being initialized.
>>
>> Mhm, aren't those properties actually per mode and not per CRTC/connector?
>>
>>> Properties that you mentioned above that the UMD can set before kernel
>>> can enable VRR functionality
>>> *bool vrr_enable or vrr_compatible
>>> target_frame_duration_ns
>>
>> Yeah, that certainly makes sense. But target_frame_duration_ns is a bad
>> name/semantics.
>>
>> We should use an absolute timestamp where the frame should be presented,
>> otherwise you could run into a bunch of trouble with IOCTL restarts or
>> missed blanks.
> 
> Also, a fixed target frame duration isn't suitable even for video
> playback, due to drift between the video and audio clocks.
> 
> Time-based presentation seems to be the right approach for preventing
> micro-stutter in games as well, Croteam developers have been researching
> this.
> 

I'm not sure if the driver can ever give a guarantee of the exact time a flip 
occurs. What we have control over with our HW is frame duration.

Are Croteam devs trying to predict render times? I'm not sure how that would 
work. We've had bad experience in the past with games that try to do 
framepacing as that's usually not accurate and tends to lead to more problems 
than benefits.

Harry

> 


[PATCH 2/3] drm/amdgpu: revert "Don't change preferred domian when fallback GTT v6"

2018-04-10 Thread Christian König
This reverts commit 7d1ca1325260a9e9329b10a21e3692e6f188936f.

Makes fallback handling too complicated. This is just a feature for the
GEM interface and shouldn't leak into the core BO create function.

The intended change to preserve the preferred domains is implemented in
a follow up patch.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c| 16 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 37 +++---
 2 files changed, 27 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 28c2706e48d7..46b9ea4e6103 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -56,11 +56,23 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
alignment = PAGE_SIZE;
}
 
+retry:
r = amdgpu_bo_create(adev, size, alignment, initial_domain,
 flags, type, resv, &bo);
if (r) {
-   DRM_DEBUG("Failed to allocate GEM object (%ld, %d, %u, %d)\n",
- size, initial_domain, alignment, r);
+   if (r != -ERESTARTSYS) {
+   if (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
+   flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
+   goto retry;
+   }
+
+   if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
+   initial_domain |= AMDGPU_GEM_DOMAIN_GTT;
+   goto retry;
+   }
+   DRM_DEBUG("Failed to allocate GEM object (%ld, %d, %u, %d)\n",
+ size, initial_domain, alignment, r);
+   }
return r;
}
*obj = &bo->gem_base;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 04d6830347ec..6d08cde8443c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -356,7 +356,6 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev, unsigned long size,
struct amdgpu_bo *bo;
unsigned long page_align;
size_t acc_size;
-   u32 domains, preferred_domains, allowed_domains;
int r;
 
page_align = roundup(byte_align, PAGE_SIZE) >> PAGE_SHIFT;
@@ -370,24 +369,22 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev, unsigned long size,
acc_size = ttm_bo_dma_acc_size(&adev->mman.bdev, size,
   sizeof(struct amdgpu_bo));
 
-   preferred_domains = domain & (AMDGPU_GEM_DOMAIN_VRAM |
- AMDGPU_GEM_DOMAIN_GTT |
- AMDGPU_GEM_DOMAIN_CPU |
- AMDGPU_GEM_DOMAIN_GDS |
- AMDGPU_GEM_DOMAIN_GWS |
- AMDGPU_GEM_DOMAIN_OA);
-   allowed_domains = preferred_domains;
-   if (type != ttm_bo_type_kernel &&
-   allowed_domains == AMDGPU_GEM_DOMAIN_VRAM)
-   allowed_domains |= AMDGPU_GEM_DOMAIN_GTT;
-   domains = preferred_domains;
-retry:
bo = kzalloc(sizeof(struct amdgpu_bo), GFP_KERNEL);
if (bo == NULL)
return -ENOMEM;
drm_gem_private_object_init(adev->ddev, &bo->gem_base, size);
INIT_LIST_HEAD(&bo->shadow_list);
INIT_LIST_HEAD(&bo->va);
+   bo->preferred_domains = domain & (AMDGPU_GEM_DOMAIN_VRAM |
+AMDGPU_GEM_DOMAIN_GTT |
+AMDGPU_GEM_DOMAIN_CPU |
+AMDGPU_GEM_DOMAIN_GDS |
+AMDGPU_GEM_DOMAIN_GWS |
+AMDGPU_GEM_DOMAIN_OA);
+   bo->allowed_domains = bo->preferred_domains;
+   if (type != ttm_bo_type_kernel &&
+   bo->allowed_domains == AMDGPU_GEM_DOMAIN_VRAM)
+   bo->allowed_domains |= AMDGPU_GEM_DOMAIN_GTT;
 
bo->flags = flags;
 
@@ -420,20 +417,12 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev, unsigned long size,
 #endif
 
bo->tbo.bdev = &adev->mman.bdev;
-   amdgpu_ttm_placement_from_domain(bo, domains);
+   amdgpu_ttm_placement_from_domain(bo, domain);
+
r = ttm_bo_init_reserved(&adev->mman.bdev, &bo->tbo, size, type,
 &bo->placement, page_align, &ctx, acc_size,
 NULL, resv, &amdgpu_ttm_bo_destroy);
-   if (unlikely(r && r != -ERESTARTSYS) && type == ttm_bo_type_device) {
-   if (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
-   flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
-   goto retry;
-   } else if (domains != 

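The retry logic this revert moves back into amdgpu_gem_object_create(), together with the allowed-domain rule restored in amdgpu_bo_do_create(), can be modelled as a small userspace C sketch. The `SK_` constants are local stand-ins rather than the uapi values, SK_ERESTARTSYS uses the kernel's internal value of 512, and the allocator callback and function names are hypothetical. The retry order follows the hunk: drop AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED first, then widen a VRAM-only request to VRAM|GTT, and never retry -ERESTARTSYS.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins; not the uapi/kernel headers. */
#define SK_DOMAIN_GTT          0x2u
#define SK_DOMAIN_VRAM         0x4u
#define SK_CPU_ACCESS_REQUIRED (1ull << 0)
#define SK_ERESTARTSYS         512 /* kernel-internal errno value */

/* Allowed domains as derived in amdgpu_bo_do_create(): a VRAM-only,
 * non-kernel BO may also be placed in GTT. */
uint32_t allowed_domains_sketch(uint32_t preferred, bool kernel_bo)
{
    uint32_t allowed = preferred;
    if (!kernel_bo && allowed == SK_DOMAIN_VRAM)
        allowed |= SK_DOMAIN_GTT;
    return allowed;
}

/* Hypothetical allocator: 0 on success or a negative errno. */
typedef int (*alloc_fn)(uint32_t domain, uint64_t flags, void *ctx);

/* Retry order of the restored amdgpu_gem_object_create() logic. */
int gem_create_with_fallback(alloc_fn alloc, void *ctx,
                             uint32_t *domain, uint64_t *flags)
{
    for (;;) {
        int r = alloc(*domain, *flags, ctx);
        if (r == 0 || r == -SK_ERESTARTSYS)
            return r;
        if (*flags & SK_CPU_ACCESS_REQUIRED) {
            *flags &= ~SK_CPU_ACCESS_REQUIRED;
            continue;
        }
        if (*domain == SK_DOMAIN_VRAM) {
            *domain |= SK_DOMAIN_GTT;
            continue;
        }
        return r;
    }
}

/* Example allocator: VRAM is full, so only requests allowing GTT succeed.
 * ctx points to an attempt counter. */
int vram_full_alloc(uint32_t domain, uint64_t flags, void *ctx)
{
    (void)flags;
    ++*(int *)ctx;
    return (domain & SK_DOMAIN_GTT) ? 0 : -ENOMEM;
}
```

Starting from a CPU-visible VRAM request with VRAM full, the sketch succeeds on the third attempt, with the CPU-access flag dropped and GTT added to the domain.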
[PATCH] drm/amdgpu/gfx9: cache DB_DEBUG2 and make it available to userspace

2018-04-10 Thread Alex Deucher
Userspace needs to query this value to work around a hw bug in
certain cases.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   | 2 ++
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 1 +
 drivers/gpu/drm/amd/amdgpu/soc15.c| 3 +++
 3 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index ed5c22bfa3e5..09fa37e9a840 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -867,6 +867,8 @@ struct amdgpu_gfx_config {
 
/* gfx configure feature */
uint32_t double_offchip_lds_buf;
+   /* cached value of DB_DEBUG2 */
+   uint32_t db_debug2;
 };
 
 struct amdgpu_cu_info {
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 9d39fd5b1822..66bd6c1c82c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -1600,6 +1600,7 @@ static void gfx_v9_0_gpu_init(struct amdgpu_device *adev)
 
gfx_v9_0_setup_rb(adev);
gfx_v9_0_get_cu_info(adev, &adev->gfx.cu_info);
+   adev->gfx.config.db_debug2 = RREG32_SOC15(GC, 0, mmDB_DEBUG2);
 
/* XXX SH_MEM regs */
/* where to put LDS, scratch, GPUVM in FSA64 space */
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 2e9ebe8db5cc..65e781f05c24 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -287,6 +287,7 @@ static struct soc15_allowed_register_entry soc15_allowed_read_registers[] = {
{ SOC15_REG_ENTRY(GC, 0, mmCP_CPC_STALLED_STAT1)},
{ SOC15_REG_ENTRY(GC, 0, mmCP_CPC_STATUS)},
{ SOC15_REG_ENTRY(GC, 0, mmGB_ADDR_CONFIG)},
+   { SOC15_REG_ENTRY(GC, 0, mmDB_DEBUG2)},
 };
 
 static uint32_t soc15_read_indexed_register(struct amdgpu_device *adev, u32 se_num,
@@ -315,6 +316,8 @@ static uint32_t soc15_get_register_value(struct amdgpu_device *adev,
} else {
if (reg_offset == SOC15_REG_OFFSET(GC, 0, mmGB_ADDR_CONFIG))
return adev->gfx.config.gb_addr_config;
+   else if (reg_offset == SOC15_REG_OFFSET(GC, 0, mmDB_DEBUG2))
+   return adev->gfx.config.db_debug2;
return RREG32(reg_offset);
}
 }
-- 
2.13.6
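How userspace register queries end up being served from the cached value can be sketched as follows. The register offsets, `DevSketch` and `read_register_sketch()` are hypothetical stand-ins for the soc15_allowed_read_registers[] table and soc15_get_register_value(); the real code uses SOC15_REG_OFFSET() and RREG32(). The sketch only shows the shape of the logic: reject registers that are not on the allow-list, return cached values for GB_ADDR_CONFIG and DB_DEBUG2, and read everything else live.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register offsets, not the real SOC15 values. */
#define SK_CP_STATUS      0x100u
#define SK_GB_ADDR_CONFIG 0x200u
#define SK_DB_DEBUG2      0x300u

typedef struct {
    uint32_t gb_addr_config;             /* cached at init */
    uint32_t db_debug2;                  /* cached at init, as the patch adds */
    uint32_t (*mmio_read)(uint32_t reg); /* hypothetical live register read */
} DevSketch;

static const uint32_t allowed_regs_sketch[] = {
    SK_CP_STATUS, SK_GB_ADDR_CONFIG, SK_DB_DEBUG2,
};

/* Returns false for registers that are not on the allow-list; otherwise
 * stores the value, preferring the cached copies over a live read. */
bool read_register_sketch(const DevSketch *dev, uint32_t reg, uint32_t *value)
{
    size_t n = sizeof(allowed_regs_sketch) / sizeof(allowed_regs_sketch[0]);

    for (size_t i = 0; i < n; i++) {
        if (allowed_regs_sketch[i] != reg)
            continue;
        if (reg == SK_GB_ADDR_CONFIG)
            *value = dev->gb_addr_config;
        else if (reg == SK_DB_DEBUG2)
            *value = dev->db_debug2;
        else
            *value = dev->mmio_read(reg);
        return true;
    }
    return false;
}

/* Example live read used for registers that are not cached. */
uint32_t fake_mmio_read(uint32_t reg)
{
    (void)reg;
    return 0xdeadbeefu;
}
```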



Re: RFC for a render API to support adaptive sync and VRR

2018-04-10 Thread Harry Wentland
On 2018-04-10 07:44 AM, Chris Wilson wrote:
> Quoting Christian König (2018-04-10 07:45:04)
>> On 09.04.2018 at 23:45, Manasi Navare wrote:
>>> Properties that you mentioned above that the UMD can set before kernel can 
>>> enable VRR functionality
>>> *bool vrr_enable or vrr_compatible
>>> target_frame_duration_ns
>>
>> Yeah, that certainly makes sense. But target_frame_duration_ns is a bad 
>> name/semantics.
>>
>> We should use an absolute timestamp where the frame should be presented, 
>> otherwise you could run into a bunch of trouble with IOCTL restarts or 
>> missed blanks.
> 
> Hear, hear. I was disappointed not to see this be the starting point of
> the conversation. Imo, the uABI should be in terms of absolutes, with the
> drivers mapping that onto HW and reporting back the discrepancies.

I think it's just that some of us that work on KMD display drivers have had our 
work primarily guided by different use cases, such as gaming, which has then been 
extended to provide a better experience for video as well. We might not be as 
intimately aware of some of the work that's been done on video APIs and the 
pains involved in it but are always happy to learn and work together toward the 
best solution.

Harry

> -Chris
> 


Re: RFC for a render API to support adaptive sync and VRR

2018-04-10 Thread Harry Wentland
On 2018-04-10 12:37 PM, Nicolai Hähnle wrote:
> On 10.04.2018 18:26, Cyr, Aric wrote:
>> That presentation time doesn’t need to come to the kernel as such and actually 
>> is fine as-is completely decoupled from adaptive sync.  As long as the video 
>> player provides the new target_frame_duration_ns on the flip, then the 
>> driver/HW will target the correct refresh rate to match the source content.  
>> This simply means that more often than not the video presents will align 
>> very close to the monitor’s refresh rate, resulting in a smooth video 
>> experience.  For example, if you have 24Hz content, and an adaptive sync 
>> monitor with a range of 40-60Hz, once the target_frame_duration_ns is 
>> provided, the driver can configure the monitor to a fixed refresh rate of 48Hz 
>> causing all video presents to be frame-doubled in hardware without further 
>> application intervention.
> 
> What about multi-monitor displays, where you want to play an animation that 
> spans multiple monitors. You really want all monitors to flip at the same 
> time.
> 

Syncing two monitors is what we currently do with our timing sync feature where 
we drive two monitors from the same clock source if they use the same timing. 
That, along with VSync, guarantees all monitors flip at the same time. I'm not 
sure if it works with adaptive sync.

Are you suggesting to use adaptive sync to do an in-SW sync of multiple 
displays?

> I understand where you're coming from, but the perspective of refusing a 
> target presentation time is a rather selfish one of "we're the display, we're 
> the most important, everybody else has to adjust to us" (e.g. to get perfect 
> sync between video and audio). I admit I'm phrasing it in a bit of an extreme 
> way, but perhaps this phrasing helps to see why that's just not a very good 
> attitude to have.
> 

I really dislike arguing on an emotional basis and would rather not use words 
such as "selfish" in this discussion. I believe all of us want to come to the 
best possible solution based on technical merit.

> All devices (whether video or audio or whatever) should be able to receive a 
> target presentation time.
> 

I'm not sure I understand the full extent of the problem as I'm not really 
familiar with how this is currently done, but isn't the problem the same 
without variable refresh rates (or targeted refresh rates)? A Video API would 
still have to somehow synchronize audio and video to 60Hz on most monitors 
today. What would change if we gave user mode the ability to suggest we flip at 
video frame rates (24/48Hz)?
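The 24 Hz content on a 40-60 Hz monitor example from this thread (frame doubling to 48 Hz) amounts to picking the smallest repeat factor that lands inside the monitor's range. A hypothetical C sketch, with rates in millihertz so that 23.976 fps content is representable; `frame_repeat_factor()` is not taken from any driver, and real low framerate compensation may choose differently.

```c
#include <assert.h>

/* Smallest repeat factor k such that content_mhz * k lies inside
 * [min_mhz, max_mhz]. Rates are in millihertz (23.976 fps = 23976).
 * Returns 0 when no integer multiple fits the range. */
unsigned frame_repeat_factor(unsigned content_mhz, unsigned min_mhz,
                             unsigned max_mhz)
{
    assert(content_mhz > 0);
    for (unsigned k = 1; (unsigned long long)content_mhz * k <= max_mhz; k++) {
        if ((unsigned long long)content_mhz * k >= min_mhz)
            return k;
    }
    return 0;
}
```

For the example in the thread, 24 Hz content on a 40-60 Hz panel gives k = 2, i.e. a 48 Hz refresh with every frame shown twice.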

Harry

> If the application can make your life a bit easier by providing the targeted 
> refresh rate as additional *hint-only* parameter (like in your 24 Hz --> 48 
> Hz doubling example), then maybe we should indeed consider that.
> 
> Cheers,
> Nicolai
> 
> 
>>
>>
>>> For video games we have a similar situation where a frame is rendered for a 
>>> certain world time and in the ideal case we would actually display the frame 
>>> at this world time.
>>
>> That seems like it would be a poorly written game that flips like that, 
>> unless they are explicitly trying to throttle the framerate for some reason. 
>>  When a game presents a completed frame, they’d like that to happen as soon 
>> as possible.  This is why non-VSYNC modes of flipping exist and many games 
>> leverage this.  Adaptive sync gives you the lower latency of immediate flips 
>> without the tearing imposed by using non-VSYNC flipping.
>>
>>
>> I mean we have the guys from Valve on this mailing list so I think we should 
>> just get the feedback from them and see what they prefer.
>>
>> We have thousands of Steam games on other OSes that work great already, but 
>> we’d certainly be interested in any additional feedback.  My guess is they 
>> prefer to “do nothing” and let driver/HW manage it, otherwise you exempt all 
>> existing games from supporting adaptive sync without a rewrite or update.
>>
>>
>> Regards,
>> Christian.
>>
>>
>>     -Aric
>>
> 


Re: RFC for a render API to support adaptive sync and VRR

2018-04-10 Thread Christian König

Am 10.04.2018 um 17:35 schrieb Cyr, Aric:

-Original Message-
From: Wentland, Harry
Sent: Tuesday, April 10, 2018 11:08
To: Michel Dänzer ; Koenig, Christian 
; Manasi Navare

Cc: Haehnle, Nicolai ; Daniel Vetter 
; Daenzer, Michel
; dri-devel ; amd-gfx 
mailing list ;
Deucher, Alexander ; Cyr, Aric ; Koo, 
Anthony 
Subject: Re: RFC for a render API to support adaptive sync and VRR

On 2018-04-10 03:37 AM, Michel Dänzer wrote:

On 2018-04-10 08:45 AM, Christian König wrote:

Am 09.04.2018 um 23:45 schrieb Manasi Navare:

Thanks for initiating the discussion. Find my comments below:
On Mon, Apr 09, 2018 at 04:00:21PM -0400, Harry Wentland wrote:

On 2018-04-09 03:56 PM, Harry Wentland wrote:

=== A DRM render API to support variable refresh rates ===

In order to benefit from adaptive sync and VRR userland needs a way
to let us know whether to vary frame timings or to target a
different frame time. These can be provided as atomic properties on
a CRTC:
   * bool    variable_refresh_compatible
   * int    target_frame_duration_ns (nanosecond frame duration)

This gives us the following cases:

variable_refresh_compatible = 0, target_frame_duration_ns = 0
   * drive monitor at timing's normal refresh rate

variable_refresh_compatible = 1, target_frame_duration_ns = 0
   * send new frame to monitor as soon as it's available, if within
min/max of monitor's reported capabilities

variable_refresh_compatible = 0/1, target_frame_duration_ns = > 0
   * send new frame to monitor with the specified
target_frame_duration_ns

When a target_frame_duration_ns or variable_refresh_compatible
cannot be supported the atomic check will reject the commit.
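To make the case table above concrete, here is a minimal C sketch of how an atomic check might classify a request built from the two proposed properties. The structs, the `vrr_atomic_check()` name and the `capable`/`min_hz`/`max_hz` fields are hypothetical (they loosely mirror the vrr_capable and vrr_vrefresh_min/max properties suggested in the reply); this is not an existing kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CRTC request carrying the two proposed properties. */
struct vrr_request {
    bool variable_refresh_compatible;
    int64_t target_frame_duration_ns; /* 0 means "no fixed target" */
};

/* Hypothetical monitor limits, e.g. parsed from the EDID range descriptor. */
struct vrr_range {
    bool capable;
    uint32_t min_hz;
    uint32_t max_hz;
};

enum vrr_mode {
    VRR_NOMINAL,        /* drive the timing's normal refresh rate */
    VRR_WHEN_READY,     /* flip as soon as a frame is ready, within min/max */
    VRR_FIXED_DURATION, /* hold each frame for target_frame_duration_ns */
    VRR_REJECT          /* atomic check fails the commit */
};

static enum vrr_mode vrr_atomic_check(const struct vrr_request *req,
                                      const struct vrr_range *range)
{
    int64_t min_ns, max_ns;

    assert(range->min_hz > 0 && range->min_hz <= range->max_hz);

    if (req->target_frame_duration_ns < 0)
        return VRR_REJECT;

    if (req->target_frame_duration_ns == 0) {
        if (!req->variable_refresh_compatible)
            return VRR_NOMINAL;
        return range->capable ? VRR_WHEN_READY : VRR_REJECT;
    }

    if (!range->capable)
        return VRR_REJECT;

    /* Shortest and longest frame the monitor can hold, in nanoseconds. */
    min_ns = 1000000000LL / range->max_hz;
    max_ns = 1000000000LL / range->min_hz;

    if (req->target_frame_duration_ns < min_ns ||
        req->target_frame_duration_ns > max_ns)
        return VRR_REJECT;

    return VRR_FIXED_DURATION;
}
```

With a 40-60 Hz monitor, a 48 Hz target (20833333 ns) passes, while a raw 24 Hz target (about 41.67 ms) falls outside the 16.67-25 ms window and is rejected.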


What I would like is two sets of properties on a CRTC or preferably on
a connector:

KMD properties that UMD can query:
* vrr_capable -  This will be an immutable property for exposing
hardware's capability of supporting VRR. This will be set by the
kernel after
reading the EDID mode information and monitor range capabilities.
* vrr_vrefresh_max, vrr_vrefresh_min - To expose the min and max
refresh rates supported.
These properties are optional and will be created and attached to the
DP/eDP connector when the connector
is getting initialized.

Mhm, aren't those properties actually per mode and not per CRTC/connector?


Properties that you mentioned above that the UMD can set before kernel
can enable VRR functionality
*bool vrr_enable or vrr_compatible
target_frame_duration_ns

Yeah, that certainly makes sense. But target_frame_duration_ns is a bad
name/semantics.

We should use an absolute timestamp where the frame should be presented,
otherwise you could run into a bunch of trouble with IOCTL restarts or
missed blanks.
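A minimal sketch of why an absolute timestamp is more robust than a duration: given the last vblank timestamp and a refresh period, the first vblank at or after the target time can be recomputed at any point, so an interrupted and restarted ioctl, or one that runs after a missed vblank, arrives at the same answer. The helper name and the fixed-period assumption are hypothetical; with adaptive sync the period itself would vary.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the timestamp of the first vblank at or
 * after target_ns, given the last observed vblank and a fixed period.
 */
static uint64_t first_vblank_at_or_after(uint64_t last_vblank_ns,
                                         uint64_t period_ns,
                                         uint64_t target_ns)
{
    uint64_t delta, cycles;

    assert(period_ns > 0);

    /* Target already passed: the earliest option is the next vblank. */
    if (target_ns <= last_vblank_ns)
        return last_vblank_ns + period_ns;

    delta = target_ns - last_vblank_ns;
    cycles = (delta + period_ns - 1) / period_ns; /* round up */
    return last_vblank_ns + cycles * period_ns;
}
```

A relative duration, by contrast, would be measured from whenever the (possibly restarted) request is processed, adding delay each time.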

Also, a fixed target frame duration isn't suitable even for video
playback, due to drift between the video and audio clocks.

Why?  Even if they drift, you know you want to show your 24Hz video frame for 
41.67 ms and adaptive sync can ensure that with reasonable accuracy.
All we're doing is eliminating the need for frame rate converters from the 
application and offloading that to hardware.


Time-based presentation seems to be the right approach for preventing
micro-stutter in games as well, Croteam developers have been researching
this.


I'm not sure if the driver can ever give a guarantee of the exact time a flip 
occurs. What we have control over with our HW is frame
duration.

Are Croteam devs trying to predict render times? I'm not sure how that would 
work. We've had bad experience in the past with
games that try to do framepacing as that's usually not accurate and tends to 
lead to more problems than benefits.

For gaming, it doesn't make sense, nor is it feasible, to know exactly how long a 
render will take with microsecond precision, very coarse guesses at best.  The point of 
adaptive sync is that it works *transparently* for the majority of cases, within the 
capability of the HW and driver.  We don't want to have every game re-write their engine 
to support this, but we do want the majority to "just work".

The only exception is the video case where an application may want to request a 
fixed frame duration aligned to the video content.  This requires an explicit 
interface for the video app, and our proposal is to keep it simple:  app knows 
how long a frame should be presented for, and we try to honour that.


Well I strongly disagree on that.

See VDPAU for example: 
https://http.download.nvidia.com/XFree86/vdpau/doxygen/html/group___vdp_presentation_queue.html#ga5bd61ca8ef5d1bc54ca6921aa57f835a

[in] earliest_presentation_time: The timestamp associated with the
surface. The presentation queue will not display the surface until the
presentation queue's

Re: [PATCH] drm/amdgpu/gfx9: cache DB_DEBUG2 and make it available to userspace

2018-04-10 Thread Nicolai Hähnle

Thanks!

Acked-by: Nicolai Hähnle 


On 10.04.2018 17:18, Alex Deucher wrote:

Userspace needs to query this value to work around a hw bug in
certain cases.

Signed-off-by: Alex Deucher 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h   | 2 ++
  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 1 +
  drivers/gpu/drm/amd/amdgpu/soc15.c| 3 +++
  3 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index ed5c22bfa3e5..09fa37e9a840 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -867,6 +867,8 @@ struct amdgpu_gfx_config {
  
  	/* gfx configure feature */

uint32_t double_offchip_lds_buf;
+   /* cached value of DB_DEBUG2 */
+   uint32_t db_debug2;
  };
  
  struct amdgpu_cu_info {

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 9d39fd5b1822..66bd6c1c82c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -1600,6 +1600,7 @@ static void gfx_v9_0_gpu_init(struct amdgpu_device *adev)
  
  	gfx_v9_0_setup_rb(adev);

gfx_v9_0_get_cu_info(adev, &adev->gfx.cu_info);
+   adev->gfx.config.db_debug2 = RREG32_SOC15(GC, 0, mmDB_DEBUG2);
  
  	/* XXX SH_MEM regs */

/* where to put LDS, scratch, GPUVM in FSA64 space */
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 2e9ebe8db5cc..65e781f05c24 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -287,6 +287,7 @@ static struct soc15_allowed_register_entry soc15_allowed_read_registers[] = {
{ SOC15_REG_ENTRY(GC, 0, mmCP_CPC_STALLED_STAT1)},
{ SOC15_REG_ENTRY(GC, 0, mmCP_CPC_STATUS)},
{ SOC15_REG_ENTRY(GC, 0, mmGB_ADDR_CONFIG)},
+   { SOC15_REG_ENTRY(GC, 0, mmDB_DEBUG2)},
  };
  
  static uint32_t soc15_read_indexed_register(struct amdgpu_device *adev, u32 se_num,

@@ -315,6 +316,8 @@ static uint32_t soc15_get_register_value(struct amdgpu_device *adev,
} else {
if (reg_offset == SOC15_REG_OFFSET(GC, 0, mmGB_ADDR_CONFIG))
return adev->gfx.config.gb_addr_config;
+   else if (reg_offset == SOC15_REG_OFFSET(GC, 0, mmDB_DEBUG2))
+   return adev->gfx.config.db_debug2;
return RREG32(reg_offset);
}
  }
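The patch follows an allowlist-plus-cache pattern: userspace may only read registers listed in `soc15_allowed_read_registers`, and some of them are served from values cached at init instead of a live read. A self-contained C sketch of that pattern, with made-up register offsets and a stand-in `mmio_read()` in place of `RREG32()`; it illustrates the idea and is not the driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical offsets; the real ones come from the SOC15 register headers. */
enum {
    REG_GB_ADDR_CONFIG = 0x10,
    REG_DB_DEBUG2 = 0x20,
    REG_CP_STATUS = 0x30,
    REG_OTHER = 0x40 /* not on the allowlist */
};

struct gfx_config {
    uint32_t gb_addr_config;
    uint32_t db_debug2; /* cached once at init, as in the patch */
};

/* Offsets userspace may read (mirrors soc15_allowed_read_registers). */
static const uint32_t allowed_regs[] = {
    REG_GB_ADDR_CONFIG, REG_DB_DEBUG2, REG_CP_STATUS
};

/* Stand-in for RREG32(): fabricates a "live" value from the offset. */
static uint32_t mmio_read(uint32_t offset) { return 0xdead0000u | offset; }

/* Returns 0 and fills *value on success, -1 if the register is not allowed. */
static int read_register(const struct gfx_config *cfg, uint32_t offset,
                         uint32_t *value)
{
    size_t i;

    assert(cfg && value);
    for (i = 0; i < sizeof(allowed_regs) / sizeof(allowed_regs[0]); i++) {
        if (allowed_regs[i] != offset)
            continue;
        /* Serve cached copies where available, otherwise read the register. */
        if (offset == REG_GB_ADDR_CONFIG)
            *value = cfg->gb_addr_config;
        else if (offset == REG_DB_DEBUG2)
            *value = cfg->db_debug2;
        else
            *value = mmio_read(offset);
        return 0;
    }
    return -1;
}
```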




--
Learn what the world is really like,
But never forget what it ought to be like.


Re: RFC for a render API to support adaptive sync and VRR

2018-04-10 Thread Nicolai Hähnle

On 10.04.2018 18:26, Cyr, Aric wrote:
That presentation time doesn’t need to come to kernel as such and 
actually is fine as-is completely decoupled from adaptive sync.  As long 
as the video player provides the new target_frame_duration_ns on the 
flip, then the driver/HW will target the correct refresh rate to match 
the source content.  This simply means that more often than not the 
video presents will align very close to the monitor’s refresh rate, 
resulting in a smooth video experience.  For example, if you have 24Hz 
content, and an adaptive sync monitor with a range of 40-60Hz, once the 
target_frame_duration_ns is provided, driver can configure the monitor 
to a fixed refresh rate of 48Hz causing all video presents to be 
frame-doubled in hardware without further application intervention.
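The frame-doubling example can be sketched as picking the smallest integer multiple of the content rate that fits inside the monitor range. This is an illustrative helper, not driver code; rates are in millihertz so that 23.976 Hz content works, and the matching frame duration in nanoseconds would be 10^12 / rate_mhz (about 20833333 ns for 48 Hz).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the smallest multiple of content_mhz that
 * lies within [min_hz, max_hz] (in mHz), storing the multiplier in
 * *multiple if non-NULL. Returns 0 if no multiple fits.
 */
static unsigned refresh_for_content(unsigned content_mhz, unsigned min_hz,
                                    unsigned max_hz, unsigned *multiple)
{
    unsigned min_mhz = min_hz * 1000, max_mhz = max_hz * 1000;
    unsigned n;

    assert(content_mhz > 0 && min_hz <= max_hz);

    for (n = 1; content_mhz * n <= max_mhz; n++) {
        if (content_mhz * n >= min_mhz) {
            if (multiple)
                *multiple = n;
            return content_mhz * n;
        }
    }
    return 0;
}
```

For 24 Hz content on a 40-60 Hz panel this yields 48 Hz with each frame shown twice; on a 50-60 Hz panel no integer multiple fits, so the driver would have to fall back to something else.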


What about multi-monitor displays, where you want to play an animation 
that spans multiple monitors. You really want all monitors to flip at 
the same time.


I understand where you're coming from, but the perspective of refusing a 
target presentation time is a rather selfish one of "we're the display, 
we're the most important, everybody else has to adjust to us" (e.g. to 
get perfect sync between video and audio). I admit I'm phrasing it in a 
bit of an extreme way, but perhaps this phrasing helps to see why that's 
just not a very good attitude to have.


All devices (whether video or audio or whatever) should be able to 
receive a target presentation time.


If the application can make your life a bit easier by providing the 
targeted refresh rate as additional *hint-only* parameter (like in your 
24 Hz --> 48 Hz doubling example), then maybe we should indeed consider 
that.


Cheers,
Nicolai





For video games we have a similar situation where a frame is rendered 
for a certain world time and in the ideal case we would actually display 
the frame at this world time.


That seems like it would be a poorly written game that flips like that, 
unless they are explicitly trying to throttle the framerate for some 
reason.  When a game presents a completed frame, they’d like that to 
happen as soon as possible.  This is why non-VSYNC modes of flipping 
exist and many games leverage this.  Adaptive sync gives you the lower 
latency of immediate flips without the tearing imposed by using 
non-VSYNC flipping.



I mean we have the guys from Valve on this mailing list so I think we 
should just get the feedback from them and see what they prefer.


We have thousands of Steam games on other OSes that work great already, 
but we’d certainly be interested in any additional feedback.  My guess 
is they prefer to “do nothing” and let driver/HW manage it, otherwise 
you exempt all existing games from supporting adaptive sync without a 
rewrite or update.



Regards,
Christian.


-Aric





[PATCH umr] Add 'disasm_early_term' option.

2018-04-10 Thread Tom St Denis
For UMDs that don't use the 0xBF9F shader terminator marker, this
option allows the disassembler to stop once it hits the first
s_endpgm opcode.
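A rough C sketch of the two stop conditions, assuming `s_endpgm` encodes as 0xBF810000 (SOPP opcode 1 on GCN) and that the 0xBF9F marker occupies the upper 16 bits of a dword. The function is hypothetical; umr itself disassembles through LLVM, and this naive dword scan ignores 64-bit encodings and literal constants that could happen to match either pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define S_ENDPGM 0xBF810000u /* assumed GCN encoding of s_endpgm */
#define IS_TERMINATOR(w) (((w) >> 16) == 0xBF9Fu) /* assumed marker layout */

/*
 * Return how many dwords would be disassembled. With early_term set,
 * stop after the first s_endpgm (inclusive); in all cases stop before
 * the terminator marker or at the end of the buffer.
 */
static size_t shader_length(const uint32_t *code, size_t ndw, int early_term)
{
    size_t i;

    assert(code != NULL || ndw == 0);
    for (i = 0; i < ndw; i++) {
        if (IS_TERMINATOR(code[i]))
            return i;
        if (early_term && code[i] == S_ENDPGM)
            return i + 1;
    }
    return ndw;
}
```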

Signed-off-by: Tom St Denis 
---
 doc/sphinx/source/basic.rst | 69 +++--
 doc/umr.1   |  5 
 src/app/main.c  |  4 ++-
 src/lib/umr_llvm_disasm.c   |  4 ++-
 src/umr.h   |  3 +-
 5 files changed, 49 insertions(+), 36 deletions(-)

diff --git a/doc/sphinx/source/basic.rst b/doc/sphinx/source/basic.rst
index 8d6db65aa88a..84bf35f38b39 100644
--- a/doc/sphinx/source/basic.rst
+++ b/doc/sphinx/source/basic.rst
@@ -104,39 +104,42 @@ comma separator.
 
 The options available are:
 
-+--+--+
-| **Option**   | **Description**   
   |
-+--+--+
-| quiet| Disable various informative outputs that are not required for 
functionality. |
-+--+--+
-| read_smc | Enable scanning of SMC registers when issuing a --scan 
command   |
-+--+--+
-| bits | Enables the display of bitfields when registers are presented 
   |
-+--+--+
-| bitsfull | When printing bits use the full path to the bitfield  
   |
-+--+--+
-| empty_log| Empty MMIO tracer after reading it
   |
-+--+--+
-| follow   | Tells --logscan to continually read the MMIO tracer   
   |
-+--+--+
-| no_follow_ib | Instructs the --ring command to not follow IBs pointed to by 
the ring|
-+--+--+
-| named| Tells --read to print out the register name along with the 
value |
-+--+--+
-| many | Allows matching of register names openly.  Used with --read 
and implies the  |
-|  | *named* option.  For instance: '\*.dce100.CRTC' will match 
any register that |
-|  | contains the fragment 'CRTC' in it.   
   |
-+--+--+
-| use_pci  | Enables direct PCI access bypassing the kernels debugfs 
entries. |
-+--+--+
-| use_colour   | Enables colourful output in various commands.  Also accepts 
use_color|
-+--+--+
-| no_kernel| Attempts to avoid kernel access methods.  Implies *use_pci*.  
   |
-+--+--+
-| verbose  | Enables verbose output, for instance in VM decoding   
   |
-+--+--+
-| halt_waves   | Halt active waves while reading wave status data  
   |
-+--+--+
++---+-+
+| **Option**| **Description**  
   |
++---+-+
+| quiet | Disable various informative outputs that are not 
required for   |
+|   | functionality.   
   |
++---+-+
+| read_smc  | Enable scanning of SMC registers when issuing a --scan 
command  |
++---+-+
+| bits  | Enables the display of bitfields when registers are 
presented   |
++---+-+
+| bitsfull  | When printing bits use the full path to the 

Re: [PATCH xf86-video-amdgpu 2/5] Hook up CRTC color management functions

2018-04-10 Thread Leo Li



On 2018-04-09 11:03 AM, Michel Dänzer wrote:

On 2018-03-26 10:00 PM, sunpeng...@amd.com wrote:

From: "Leo (Sunpeng) Li" 

The functions insert into the output resource creation, and property
change functions. CRTC destroy is also hooked-up for proper cleanup of
the CRTC property list.

Signed-off-by: Leo (Sunpeng) Li 


[...]


@@ -1933,6 +1933,9 @@ static void drmmode_output_create_resources(xf86OutputPtr output)
}
}
}
+
+   if (output->crtc)
+   drmmode_crtc_create_resources(output->crtc, output);


output->crtc is only non-NULL here for outputs which are enabled at Xorg
startup; other outputs won't have the new properties.


Is it necessary to have the CRTC properties on a output if the CRTC is
disabled for that output?

I've tested hot-plugging with this, and the properties do initialize on
hot-plug. Though they stay on the output on hot-unplug... Haven't dug
into this just yet.

Leo







Re: [PATCH xf86-video-amdgpu 1/5] Add functions for changing CRTC color management properties

2018-04-10 Thread Leo Li



On 2018-04-09 11:03 AM, Michel Dänzer wrote:

On 2018-03-26 10:00 PM, sunpeng...@amd.com wrote:

From: "Leo (Sunpeng) Li" 

This change adds a few functions in preparation of enabling CRTC color
management via the randr interface.

The driver-private CRTC object now contains a list of properties,
mirroring the driver-private output object. The lifecycle of the CRTC
properties will also mirror the output.

Since color management properties are all DRM blobs, we'll expose the
ability to change the blob ID. The user can create blobs via libdrm
(which can be done without ownership of DRM master), then set the ID via
xrandr. The user will then have to ensure proper cleanup by subsequently
releasing the blob.


That sounds a bit clunky. :)

When changing a blob ID, the change only takes effect on the next atomic
commit, doesn't it? How does the client trigger the atomic commit?



From the perspective of a client that wishes to change a property, the
process between regular properties and blob properties should be
essentially the same. Both will trigger an atomic commit when the DRM
set property ioctl is called from our DDX driver.

The only difference is that DRM property blobs can be arbitrary in size,
and need to be passed by reference through their DRM-defined blob IDs.
Because of this, the client has to create the blob, save its ID, call
libXrandr to change it, then destroy the blob after it's been committed.
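For the GAMMA_LUT property the blob payload is an array of `struct drm_color_lut` entries (16-bit red, green and blue plus a reserved field), one per LUT slot. A minimal sketch of building an identity ramp that a client could then pass to libdrm's `drmModeCreatePropertyBlob()`; the local struct is assumed to match the kernel layout so the example compiles without libdrm headers, and the entry count would come from GAMMA_LUT_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match struct drm_color_lut in include/uapi/drm/drm_mode.h. */
struct color_lut_entry {
    uint16_t red;
    uint16_t green;
    uint16_t blue;
    uint16_t reserved;
};

/* Fill a LUT of `size` entries with a linear ramp from 0 to 0xFFFF. */
static void fill_linear_lut(struct color_lut_entry *lut, size_t size)
{
    size_t i;

    assert(lut && size >= 2);
    for (i = 0; i < size; i++) {
        uint16_t v = (uint16_t)((i * 0xFFFFu) / (size - 1));

        lut[i].red = lut[i].green = lut[i].blue = v;
        lut[i].reserved = 0;
    }
}
```

The blob would then be size * sizeof(struct color_lut_entry) bytes; after the property change has been committed, the client releases it with `drmModeDestroyPropertyBlob()`.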

The client has to call libXrandr due to DRM permissions. IIRC, there can
be only one DRM master. And since xserver is DRM master, an external
application cannot set DRM properties unless it goes through X. However,
creating and destroying DRM property blobs can be done by anyone.

Was this the source of the clunkiness? I've thought about having DDX
create and destroy the blob instead, but that needs an interface for the
client to get arbitrarily sized data to DDX. I'm not aware of any good
ways to do so. Don't think the kernel can do this for us either. It does
create the blob for legacy gamma, but that's because there's a dedicated
ioctl for it.




@@ -1604,6 +1623,18 @@ static void drmmode_output_dpms(xf86OutputPtr output, int mode)
}
  }
  
+static Bool drmmode_crtc_property_ignore(drmModePropertyPtr prop)

+{
+   if (!prop)
+   return TRUE;
+   /* Ignore CRTC gamma lut sizes */
+   if (!strcmp(prop->name, "GAMMA_LUT_SIZE") ||
+   !strcmp(prop->name, "DEGAMMA_LUT_SIZE"))
+   return TRUE;


Without these properties, how can a client know the LUT sizes?


Good point, I originally thought the sizes are fixed and did not need
exposing. But who knows whether they may change, or even differ per ASIC.





@@ -1618,6 +1649,163 @@ static Bool drmmode_property_ignore(drmModePropertyPtr prop)
return FALSE;
  }
  
+/**

+* Configure and change the given output property through randr. Currently


"RandR"


+* ignores DRM_MODE_PROP_ENU property types. Used as part of create_resources.


DRM_MODE_PROP_ENUM is missing the final M.


+*
+* Return: 0 on success, X-defined error codes on failure.
+*/
+static int __rr_configure_and_change_property(xf86OutputPtr output,
+ drmmode_prop_ptr pmode_prop)


No leading underscores in function names please. >


+   }
+   else if (mode_prop->flags & DRM_MODE_PROP_RANGE) {


The else should be on the same line as }.



+static void drmmode_crtc_create_resources(xf86CrtcPtr crtc,
+ xf86OutputPtr output)
+{
+   AMDGPUEntPtr pAMDGPUEnt = AMDGPUEntPriv(crtc->scrn);
+   int i, j;
+
+   /* 'p' prefix for driver private objects */
+   drmmode_crtc_private_ptr pmode_crtc = crtc->driver_private;


Existing code refers to this as drmmode_crtc, please stick to that.



+   drmModeCrtcPtr mode_crtc = pmode_crtc->mode_crtc;
+
+   drmmode_prop_ptr pmode_prop;
+   drmModePropertyPtr mode_prop;
+
+   /* Get list of DRM CRTC properties, and their values */
+   drmModeObjectPropertiesPtr mode_props;


All local variable declarations should be in a single block, with no
blank lines between them, and generally sorted from longer lines to
shorter ones.



+   mode_props = drmModeObjectGetProperties(pAMDGPUEnt->fd,
+   mode_crtc->crtc_id,
+   DRM_MODE_OBJECT_CRTC);
+   if (!mode_props)
+   goto err_allocs;
+
+   /* Allocate, then populate the driver-private CRTC property list */
+   pmode_crtc->props = calloc(mode_props->count_props + 1,
+sizeof(drmmode_prop_rec));


Continuation lines should be aligned to opening parens. Any editor which
supports EditorConfig should do this automagically.



+   if (!pmode_crtc->props)
+   goto err_allocs;
+
+   pmode_crtc->num_props = 0;
+
+   /* Filter through drm crtc 

Re: [PATCH xf86-video-amdgpu 3/5] Keep CRTC properties consistent

2018-04-10 Thread Leo Li



On 2018-04-09 11:03 AM, Michel Dänzer wrote:

On 2018-03-26 10:00 PM, sunpeng...@amd.com wrote:

From: "Leo (Sunpeng) Li" 

In cases where CRTC properties are updated without going through
RRChangeOutputProperty, we don't update the properties in user land.

Consider setting legacy gamma. It doesn't go through
RRChangeOutputProperty, but modifies the CRTC's color management
properties. Unless they are updated, the user properties will remain
stale.


Can you describe a bit more how the legacy gamma and the new properties
interact?



Sure thing, I'll include this in the message for v2:

In the kernel, the legacy set gamma interface is essentially an adapter to
the non-legacy set properties interface. In the end, they both set the
same property to a DRM property blob, which contains the gamma lookup
table. The key difference between them is how this blob is created.

For legacy gamma, the kernel takes 3 arrays from user-land, and creates
the blob using them. Note that a blob is identified by its blob_id.

For non-legacy gamma, the kernel takes a blob_id from user-land that
references the blob. This means user-land is responsible for creating
the blob.
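The legacy path described above boils down to interleaving three separate arrays into one array of LUT entries before the blob is created. A sketch of that step, modelled loosely on what the kernel's legacy gamma helper does; the function name and the local struct (assumed to match `struct drm_color_lut`) are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match struct drm_color_lut in include/uapi/drm/drm_mode.h. */
struct color_lut_entry {
    uint16_t red;
    uint16_t green;
    uint16_t blue;
    uint16_t reserved;
};

/* Interleave legacy per-channel arrays into LUT entries (hypothetical). */
static void pack_legacy_gamma(struct color_lut_entry *lut,
                              const uint16_t *red, const uint16_t *green,
                              const uint16_t *blue, size_t size)
{
    size_t i;

    assert(lut && red && green && blue);
    for (i = 0; i < size; i++) {
        lut[i].red = red[i];
        lut[i].green = green[i];
        lut[i].blue = blue[i];
        lut[i].reserved = 0;
    }
}
```

Because the kernel creates a fresh blob from this data on every legacy call, the resulting blob ID is never seen by RandR.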

From the perspective of RandR, this presents some problems. Since both
paths modify the same property, RandR must keep the reported property
value up-to-date with whichever path is used:

1. Legacy gamma via
xrandr --output  --gamma x:x:x
2. Non-legacy color properties via
xrandr --output  --set GAMMA_LUT 

Keeping the value up-to-date isn't a problem for 2, since RandR updates
it for us as part of changing output properties.

But if 1 is used, the property blob is created within the kernel, and RandR
is unaware of the new blob_id. To update it, we need to ask the kernel about it.

--- continue with rest of message ---



Therefore, add a function to update user CRTC properties by querying DRM,
and call it whenever legacy gamma is changed.


Note that drmmode_crtc_gamma_do_set is called from
drmmode_set_mode_major, i.e. on every modeset or under some
circumstances when a DRI3 client stops page flipping.



The property will have to be updated each time the legacy set gamma
ioctl is called, since a new blob (with a new blob_id) is created each time.

Not sure if this is a good idea, but perhaps we can have a flag that
explicitly enables one or the other, depending on user preference? A
user-only property with something like:

0: Use legacy gamma, calls to change non-legacy properties are ignored.
1: Use non-legacy, calls to legacy gamma will be ignored.

On 0, we can remove/disable all non-legacy properties from the property
list, and avoid having to update them. On 1, we'll enable the
properties, and won't have to update them either since legacy gamma is
"disabled". It has the added benefit of avoiding unexpected legacy gamma
sets when using non-legacy, and vice versa.
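The proposed switch could be as small as a gate in front of both paths. A hypothetical sketch (none of these names exist in the driver) of how the DDX might decide whether to forward a request to DRM and whether to expose the non-legacy properties at all:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-facing switch proposed above. */
enum gamma_mode { GAMMA_MODE_LEGACY = 0, GAMMA_MODE_NONLEGACY = 1 };

/* Kind of color request arriving at the DDX. */
enum gamma_request { REQ_LEGACY_GAMMA, REQ_COLOR_PROPERTY };

/* Returns true if the request should be forwarded to DRM, false if ignored. */
static bool gamma_request_allowed(enum gamma_mode mode, enum gamma_request req)
{
    switch (mode) {
    case GAMMA_MODE_LEGACY:
        return req == REQ_LEGACY_GAMMA;
    case GAMMA_MODE_NONLEGACY:
        return req == REQ_COLOR_PROPERTY;
    }
    assert(!"unknown gamma mode");
    return false;
}

/* Non-legacy properties are only listed when they cannot go stale. */
static bool expose_color_properties(enum gamma_mode mode)
{
    return mode == GAMMA_MODE_NONLEGACY;
}
```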




diff --git a/src/drmmode_display.c b/src/drmmode_display.c
index 1966fd2..45457c4 100644
--- a/src/drmmode_display.c
+++ b/src/drmmode_display.c
@@ -61,8 +61,13 @@
  
  #define DEFAULT_NOMINAL_FRAME_RATE 60
  
+/* Forward declarations */

+
  static Bool drmmode_xf86crtc_resize(ScrnInfoPtr scrn, int width, int height);
  
+static void drmmode_crtc_update_resources(xf86CrtcPtr crtc);


Can you move the drmmode_crtc_update_resources such that the forward
declaration isn't necessary?



Seems possible. It uses the rr_configure_and_change helper, so I'll pull
both of them up.




  static Bool
  AMDGPUZaphodStringMatches(ScrnInfoPtr pScrn, const char *s, char *output_name)
  {
@@ -768,6 +773,7 @@ drmmode_crtc_gamma_do_set(xf86CrtcPtr crtc, uint16_t *red, uint16_t *green,
  
  	drmModeCrtcSetGamma(pAMDGPUEnt->fd, drmmode_crtc->mode_crtc->crtc_id,

size, red, green, blue);
+   drmmode_crtc_update_resources(crtc);
  }
  
  Bool

@@ -1653,10 +1659,15 @@ static Bool drmmode_property_ignore(drmModePropertyPtr prop)
  * Configure and change the given output property through randr. Currently
  * ignores DRM_MODE_PROP_ENU property types. Used as part of create_resources.
  *
+* @output: The output to configure and change the property on.
+* @pmode_prop: The driver-private property object.


These two should have been added in patch 1.


Yep, will move.

Leo







Re: [PATCH xf86-video-amdgpu 0/5] Implementing non-legacy color management

2018-04-10 Thread Leo Li



On 2018-04-09 10:10 AM, Michel Dänzer wrote:


Hi Leo,


apologies for the late follow-up; I was on vacation and then backlogged.


No worries, thanks for the review :)




On 2018-03-26 10:00 PM, sunpeng...@amd.com wrote:

From: "Leo (Sunpeng) Li" 

These patches will enable modification of non-legacy color management
properties via xrandr.

On top of the current legacy gamma, DRM allows the setting of three color
management tables: the degamma LUT, the color transform matrix (CTM), and the
regamma LUT. To user land, all of them are stored as DRM blobs, and are
referenced by CRTC properties via their blob IDs.

Therefore, in order to allow setting color management via xrandr, we have to:

1. Enable modification of CRTC properties via xrandr
2. Allow configuring and changing DRM blob properties via their IDs
3. Ensure compatibility with legacy gamma

The first three patches do the above, while the last two do some
refactoring work to remove repetitive code.

A note to reviewers: I'm a little unclear on whether this works when one CRTC is
connected to multiple outputs. I expect that changing a CRTC property via one
of its outputs will update it for that output only, since RandR still understands
it as an "output property". In which case, there needs to be a v2.


Yes, I suspect so.


However, I'm not sure how I can setup a test for this. Let me know if you have 
tips.


Something like

xrandr --output DVI-D-1 --off --output DVI-D-2 --off
xrandr --output DVI-D-1 --crtc 0 --mode 1920x1080 --output DVI-D-2
  --crtc 0 --mode 1920x1080

and then verify with xrandr --verbose that both outputs are actually
using the same CRTC. Note that I'm getting an error on the second step
when trying this right now, so there may be something preventing using
the same CRTC for multiple outputs. But AFAIK at least in theory it's
possible.


I'll give this a shot.

Leo




I will follow up with some comments on individual patches.





RE: RFC for a render API to support adaptive sync and VRR

2018-04-10 Thread Cyr, Aric
> -Original Message-
> From: Wentland, Harry
> Sent: Tuesday, April 10, 2018 11:08
> To: Michel Dänzer ; Koenig, Christian 
> ; Manasi Navare
> 
> Cc: Haehnle, Nicolai ; Daniel Vetter 
> ; Daenzer, Michel
> ; dri-devel ; 
> amd-gfx mailing list ;
> Deucher, Alexander ; Cyr, Aric ; 
> Koo, Anthony 
> Subject: Re: RFC for a render API to support adaptive sync and VRR
> 
> On 2018-04-10 03:37 AM, Michel Dänzer wrote:
> > On 2018-04-10 08:45 AM, Christian König wrote:
> >> Am 09.04.2018 um 23:45 schrieb Manasi Navare:
> >>> Thanks for initiating the discussion. Find my comments below:
> >>> On Mon, Apr 09, 2018 at 04:00:21PM -0400, Harry Wentland wrote:
>  On 2018-04-09 03:56 PM, Harry Wentland wrote:
> >
> > === A DRM render API to support variable refresh rates ===
> >
> > In order to benefit from adaptive sync and VRR userland needs a way
> > to let us know whether to vary frame timings or to target a
> > different frame time. These can be provided as atomic properties on
> > a CRTC:
> >   * bool    variable_refresh_compatible
> >   * int    target_frame_duration_ns (nanosecond frame duration)
> >
> > This gives us the following cases:
> >
> > variable_refresh_compatible = 0, target_frame_duration_ns = 0
> >   * drive monitor at timing's normal refresh rate
> >
> > variable_refresh_compatible = 1, target_frame_duration_ns = 0
> >   * send new frame to monitor as soon as it's available, if within
> > min/max of monitor's reported capabilities
> >
> > variable_refresh_compatible = 0/1, target_frame_duration_ns = > 0
> >   * send new frame to monitor with the specified
> > target_frame_duration_ns
> >
> > When a target_frame_duration_ns or variable_refresh_compatible
> > cannot be supported the atomic check will reject the commit.
> >
> >>> What I would like is two sets of properties on a CRTC or preferably on
> >>> a connector:
> >>>
> >>> KMD properties that UMD can query:
> >>> * vrr_capable -  This will be an immutable property for exposing
> >>> hardware's capability of supporting VRR. This will be set by the
> >>> kernel after
> >>> reading the EDID mode information and monitor range capabilities.
> >>> * vrr_vrefresh_max, vrr_vrefresh_min - To expose the min and max
> >>> refresh rates supported.
> >>> These properties are optional and will be created and attached to the
> >>> DP/eDP connector when the connector
> >>> is getting initialized.
> >>
> >> Mhm, aren't those properties actually per mode and not per CRTC/connector?
> >>
> >>> Properties that you mentioned above that the UMD can set before kernel
> >>> can enable VRR functionality
> >>> *bool vrr_enable or vrr_compatible
> >>> target_frame_duration_ns
> >>
> >> Yeah, that certainly makes sense. But target_frame_duration_ns is a bad
> >> name/semantics.
> >>
> >> We should use an absolute timestamp where the frame should be presented,
> >> otherwise you could run into a bunch of trouble with IOCTL restarts or
> >> missed blanks.
> >
> > Also, a fixed target frame duration isn't suitable even for video
> > playback, due to drift between the video and audio clocks.

Why?  Even if they drift, you know you want to show your 24Hz video frame for 
41.67 ms and adaptive sync can ensure that with reasonable accuracy.  
All we're doing is eliminating the need for frame rate converters from the 
application and offloading that to hardware.

> > Time-based presentation seems to be the right approach for preventing
> > micro-stutter in games as well, Croteam developers have been researching
> > this.
> >
> 
> I'm not sure if the driver can ever give a guarantee of the exact time a flip 
> occurs. What we have control over with our HW is frame
> duration.
> 
> Are Croteam devs trying to predict render times? I'm not sure how that would 
> work. We've had bad experience in the past with
> games that try to do framepacing as that's usually not accurate and tends to 
> lead to more problems than benefits.

For gaming, it doesn't make sense, nor is it feasible, to know exactly how 
long a render will take with microsecond precision, very coarse guesses at 
best.  The point of adaptive sync is that it works *transparently* for the 
majority of cases, within the capability of the HW and driver.  We don't want 
to have every game re-write their engine to support this, but we do want the 
majority to "just work".

The only exception is the video case where an application may want to request a 
fixed frame duration aligned to the video content.  This requires an explicit 
interface for the video app, and our proposal is to keep it simple:  app knows 
how long a frame should be 

Re: [PATCH 1/2] drm/amdgpu/gmc: steal the appropriate amount of vram for fw hand-over (v2)

2018-04-10 Thread Andrey Grodzovsky
Indeed :( After two tries I see the problem; if I remove "drm/amdgpu: Free 
VGA stolen memory as soon as possible." the problem goes away.


Andrey


On 04/10/2018 06:53 AM, Huang Rui wrote:

On Mon, Apr 09, 2018 at 11:17:58AM -0400, Andrey Grodzovsky wrote:

OK, tested with DC disabled, no issues on resume (no visible
corruption on display or errors in log). Now the display itself
freezes after amdgpu is loaded with DC disabled; this happens only
when the BIOS is in VGA mode, in console mode there is no such problem.
It happens before my and Alex's patches, so it looks like a separate issue.

So anyway, if corruption were there (beginning of VRAM and hence
scanout FB corrupted), I should have seen it with grub in console
mode, where the display is fine and not freezing.


Reproduce steps:
1. sudo modprobe amdgpu dc=0 ip_block_mask=0x7f
2. pm-suspend/resume two times.

You will see the start of vram is corrupted after S3 resume.

[  570.343635] [drm] PCIE GART of 512M enabled (table at 0x00F4).
[  570.343642] [drm] PSP is resuming...
[  570.343713] gmc_v9_0_process_interrupt: 12 callbacks suppressed
[  570.343715] amdgpu :03:00.0: [mmhub] VMC page fault (src_id:0 ring:0 vmid:0 pasid:0)
[  570.343716] amdgpu :03:00.0:   at page 0x00f60070 from 18
[  570.343716] amdgpu :03:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010
[  570.525510] [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
[  570.525523] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block  failed -62
[  570.525536] [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-62).
[  570.536704] e1000e: enp0s31f6 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
[  570.540496] dpm_run_callback(): pci_pm_resume+0x0/0xa0 returns -62
[  570.547879] e1000e :00:1f.6 enp0s31f6: 10/100 speed: disabling TSO
[  570.555434] call :03:00.0+ returned -62 after 1973202 usecs
[  570.689812] PM: Device :03:00.0 failed to resume async: error -62

I attached the whole dmesg.

Thanks,
Ray


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

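The ip_block_mask=0x7f module parameter in the reproduction steps above is a bitmask over the driver's IP blocks. A minimal sketch of how such a mask is interpreted, assuming bit i gates the i-th IP block in the driver's initialization order (so 0x7f keeps blocks 0 to 6 and disables block 7 and above). The helper name is hypothetical, and which hardware block sits at each index depends on the ASIC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: IP block 'index' stays enabled only if bit
 * 'index' of the ip_block_mask module parameter is set. */
static bool ip_block_enabled(uint32_t mask, unsigned int index)
{
	assert(index < 32); /* the mask is 32 bits wide */
	return (mask & (1u << index)) != 0;
}
```

For example, ip_block_enabled(0x7f, 6) is true, while ip_block_enabled(0x7f, 7) is false.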

Re: RFC for a render API to support adaptive sync and VRR

2018-04-10 Thread Michel Dänzer
On 2018-04-10 06:26 PM, Cyr, Aric wrote:
> From: Koenig, Christian Sent: Tuesday, April 10, 2018 11:43
> 
>> For video games we have a similar situation where a frame is rendered
>> for a certain world time and in the ideal case we would actually
>> display the frame at this world time.
> 
> That seems like it would be a poorly written game that flips like
> that, unless they are explicitly trying to throttle the framerate for
> some reason.  When a game presents a completed frame, they’d like
> that to happen as soon as possible.

What you're describing is what most games have been doing traditionally.
Croteam's research shows that this results in micro-stuttering, because
frames may be presented too early. To avoid that, they want to
explicitly time each presentation as described by Christian.


Maybe we should try getting the Croteam guys researching this involved
directly here.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: RFC for a render API to support adaptive sync and VRR

2018-04-10 Thread Michel Dänzer
On 2018-04-10 05:35 PM, Cyr, Aric wrote:
>> On 2018-04-10 03:37 AM, Michel Dänzer wrote:
>>> On 2018-04-10 08:45 AM, Christian König wrote:
>>>> Am 09.04.2018 um 23:45 schrieb Manasi Navare:
>>>>> Thanks for initiating the discussion. Find my comments
>>>>> below: On Mon, Apr 09, 2018 at 04:00:21PM -0400, Harry
>>>>> Wentland wrote:
>>>>>> On 2018-04-09 03:56 PM, Harry Wentland wrote:
>>>>>>> 
>>>>>>> === A DRM render API to support variable refresh rates ===
>>>>>>> 
>>>>>>> In order to benefit from adaptive sync and VRR userland
>>>>>>> needs a way to let us know whether to vary frame timings
>>>>>>> or to target a different frame time. These can be
>>>>>>> provided as atomic properties on a CRTC:
>>>>>>>  * bool variable_refresh_compatible
>>>>>>>  * int target_frame_duration_ns (nanosecond frame duration)
>>>>>>> 
>>>>>>> This gives us the following cases:
>>>>>>> 
>>>>>>> variable_refresh_compatible = 0, target_frame_duration_ns = 0
>>>>>>>  * drive monitor at timing's normal refresh rate
>>>>>>> 
>>>>>>> variable_refresh_compatible = 1, target_frame_duration_ns = 0
>>>>>>>  * send new frame to monitor as soon as it's available, if
>>>>>>>    within min/max of monitor's reported capabilities
>>>>>>> 
>>>>>>> variable_refresh_compatible = 0/1, target_frame_duration_ns > 0
>>>>>>>  * send new frame to monitor with the specified
>>>>>>>    target_frame_duration_ns
>>>>>>> 
>>>>>>> When a target_frame_duration_ns or
>>>>>>> variable_refresh_compatible cannot be supported the
>>>>>>> atomic check will reject the commit.
>>>>>>> 
>>>>> What I would like is two sets of properties on a CRTC or
>>>>> preferably on a connector:
>>>>> 
>>>>> KMD properties that UMD can query:
>>>>>  * vrr_capable - This will be an immutable property for
>>>>>    exposing hardware's capability of supporting VRR. This will
>>>>>    be set by the kernel after reading the EDID mode information
>>>>>    and monitor range capabilities.
>>>>>  * vrr_vrefresh_max, vrr_vrefresh_min - To expose the min and
>>>>>    max refresh rates supported. These properties are optional
>>>>>    and will be created and attached to the DP/eDP connector
>>>>>    when the connector is getting initialized.
>>>> 
>>>> Mhm, aren't those properties actually per mode and not per
>>>> CRTC/connector?
>>>> 
>>>>> Properties that you mentioned above that the UMD can set
>>>>> before kernel can enable VRR functionality:
>>>>>  * bool vrr_enable or vrr_compatible
>>>>>  * target_frame_duration_ns
>>>> 
>>>> Yeah, that certainly makes sense. But target_frame_duration_ns
>>>> is a bad name/semantics.
>>>> 
>>>> We should use an absolute timestamp where the frame should be
>>>> presented, otherwise you could run into a bunch of trouble with
>>>> IOCTL restarts or missed blanks.
>>> 
>>> Also, a fixed target frame duration isn't suitable even for
>>> video playback, due to drift between the video and audio clocks.
> 
> Why?  Even if they drift, you know you want to show your 24Hz video
> frame for 41.67ms and adaptive sync can ensure that with reasonable
> accuracy.

Due to the drift, the video player has to occasionally either skip a
frame or present it twice to prevent audio and video going out of sync,
resulting in visual artifacts.

With time-based presentation and variable refresh rate, audio and video
can stay in sync without occasional visual artifacts.

It would be a pity to create a "variable refresh rate API" which doesn't
allow harnessing this strength of variable refresh rate.
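To make the drift argument above concrete, here is a small sketch comparing the two scheduling models, assuming a hypothetical audio clock that runs a fixed number of parts per million faster than the display clock. With a fixed target frame duration, frame n is shown at n times the duration, so the error against the audio clock grows linearly; deriving an absolute target time for each frame from the audio clock leaves no accumulated error. All function names are illustrative and not part of any proposed API.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Absolute target for frame n on the display clock, derived from the
 * audio clock (the master). audio_ppm: how much faster the audio clock
 * runs than the display clock, in parts per million (hypothetical). */
static int64_t target_from_audio_clock(int64_t n, int64_t fps, int64_t audio_ppm)
{
	int64_t audio_ns = n * NSEC_PER_SEC / fps; /* audio time of frame n */

	/* audio time runs (1 + ppm / 1e6) times as fast as display time */
	return audio_ns * 1000000 / (1000000 + audio_ppm);
}

/* Fixed-duration scheduling: frame n is shown n * duration after frame 0. */
static int64_t target_fixed_duration(int64_t n, int64_t duration_ns)
{
	return n * duration_ns;
}

/* A/V offset after n frames when only a fixed duration is available.
 * A positive result means the video lags behind the audio. */
static int64_t av_drift_ns(int64_t n, int64_t fps, int64_t audio_ppm)
{
	return target_fixed_duration(n, NSEC_PER_SEC / fps) -
	       target_from_audio_clock(n, fps, audio_ppm);
}
```

With a hypothetical 100 ppm mismatch, av_drift_ns(86400, 24, 100) (one hour of 24 fps video) comes out at roughly 360 ms, which is why a player limited to fixed durations must eventually skip or repeat a frame.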


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer

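The property combinations in Harry's proposal, quoted in the message above, can be modeled as a small decision function. This is a sketch under stated assumptions, not an existing DRM interface: vrr_check() and struct vrr_caps are hypothetical, the monitor range is expressed as minimum/maximum frame durations, and an unsupported combination maps to the atomic check rejecting the commit, as the proposal describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of the atomic check for the proposed properties. */
enum vrr_mode {
	VRR_FIXED_NATIVE,    /* drive monitor at the timing's normal refresh rate */
	VRR_ASAP,            /* send each frame as soon as it is available */
	VRR_TARGET_DURATION, /* hold each frame for target_frame_duration_ns */
	VRR_REJECT,          /* atomic check rejects the commit */
};

/* Hypothetical monitor capabilities. */
struct vrr_caps {
	bool vrr_capable;
	uint64_t min_frame_ns; /* frame duration at the max refresh rate */
	uint64_t max_frame_ns; /* frame duration at the min refresh rate */
};

static enum vrr_mode vrr_check(const struct vrr_caps *caps,
			       bool variable_refresh_compatible,
			       uint64_t target_frame_duration_ns)
{
	/* variable_refresh_compatible = 0/1, target_frame_duration_ns > 0 */
	if (target_frame_duration_ns > 0) {
		if (!caps->vrr_capable ||
		    target_frame_duration_ns < caps->min_frame_ns ||
		    target_frame_duration_ns > caps->max_frame_ns)
			return VRR_REJECT;
		return VRR_TARGET_DURATION;
	}
	/* variable_refresh_compatible = 1, target_frame_duration_ns = 0 */
	if (variable_refresh_compatible)
		return caps->vrr_capable ? VRR_ASAP : VRR_REJECT;
	/* both 0 */
	return VRR_FIXED_NATIVE;
}
```

For a hypothetical 40 to 60 Hz monitor (frame durations of 16666667 to 25000000 ns), a 48 Hz target (20833333 ns) is accepted, while a 24 Hz target (41666667 ns) falls outside the range and is rejected.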

RE: RFC for a render API to support adaptive sync and VRR

2018-04-10 Thread Cyr, Aric
> -Original Message-
> From: Michel Dänzer [mailto:mic...@daenzer.net]
> Sent: Tuesday, April 10, 2018 13:06
> On 2018-04-10 06:26 PM, Cyr, Aric wrote:
> > From: Koenig, Christian Sent: Tuesday, April 10, 2018 11:43
> >
> >> For video games we have a similar situation where a frame is rendered
> >> for a certain world time and in the ideal case we would actually
> >> display the frame at this world time.
> >
> > That seems like it would be a poorly written game that flips like
> > that, unless they are explicitly trying to throttle the framerate for
> > some reason.  When a game presents a completed frame, they’d like
> > that to happen as soon as possible.
> 
> What you're describing is what most games have been doing traditionally.
> Croteam's research shows that this results in micro-stuttering, because
> frames may be presented too early. To avoid that, they want to
> explicitly time each presentation as described by Christian.

Yes, I agree completely.  However that's only truly relevant for fixed 
refresh rate displays.
This is the primary reason for having Adaptive Sync.  
There is no perfect way to solve this without Adaptive Sync, but yes, they can 
come up with better algorithms to improve fixed refresh rate displays.

> 
> Maybe we should try getting the Croteam guys researching this involved
> directly here.

I'd be interested in any research they could share, for sure.  
We also have years of experience and research here, but not distilled into any 
readily available format.

> 
> 
> --
> Earthling Michel Dänzer   |   http://www.amd.com
> Libre software enthusiast | Mesa and X developer


Re: RFC for a render API to support adaptive sync and VRR

2018-04-10 Thread Nicolai Hähnle

On 10.04.2018 19:25, Cyr, Aric wrote:

-Original Message-
From: Michel Dänzer [mailto:mic...@daenzer.net]
Sent: Tuesday, April 10, 2018 13:16

On 2018-04-10 07:13 PM, Cyr, Aric wrote:

-Original Message-
From: Michel Dänzer [mailto:mic...@daenzer.net]
Sent: Tuesday, April 10, 2018 13:06
On 2018-04-10 06:26 PM, Cyr, Aric wrote:

From: Koenig, Christian Sent: Tuesday, April 10, 2018 11:43


For video games we have a similar situation where a frame is rendered
for a certain world time and in the ideal case we would actually
display the frame at this world time.


That seems like it would be a poorly written game that flips like
that, unless they are explicitly trying to throttle the framerate for
some reason.  When a game presents a completed frame, they’d like
that to happen as soon as possible.


What you're describing is what most games have been doing traditionally.
Croteam's research shows that this results in micro-stuttering, because
frames may be presented too early. To avoid that, they want to
explicitly time each presentation as described by Christian.


Yes, I agree completely.  However that's only truly relevant for fixed
refresh rate displays.


No, it also affects variable refresh; possibly even more in some cases,
because the presentation time is less predictable.


Yes, and that's why you don't want to do it when you have variable refresh.  
The hardware in the monitor and GPU will do it for you, so why bother?


I think Michel's point is that the monitor and GPU hardware *cannot* 
really do this, because there's synchronization with audio to take into 
account, which the GPU or monitor don't know about.


Also, as I wrote separately, there's the case of synchronizing multiple 
monitors.




The input to their algorithms will be noisy, causing worse estimations.  If you 
just present as fast as you can, it'll just work (within reason).
The majority of gamers want maximum FPS for their games, and there's quite 
frequently outrage at a particular game when they are limited to something 
lower than what their monitor could otherwise support (e.g. I don't want my 
game limited to 30Hz if I have a shiny 144Hz gaming display I paid good money 
for).   Of course, there are always exceptions... but in our experience those are 
few and far between.


I agree that games most likely shouldn't try to be smart. I'm curious 
about the Croteam findings, but even if they did a really clever thing 
that works better than just telling the display driver "display ASAP 
please", chances are that *most* developers won't do that. And they'll 
most likely get it wrong, so our guidance should really be "games should 
ask for ASAP presentation, and nothing else".


However, there *are* legitimate use cases for requesting a specific 
presentation time, and there *is* precedent of APIs that expose such 
features.


Are there any real problems with exposing an absolute target present time?

Cheers,
Nicolai



Re: RFC for a render API to support adaptive sync and VRR

2018-04-10 Thread Harry Wentland
On 2018-04-10 01:52 PM, Harry Wentland wrote:
> On 2018-04-10 12:37 PM, Nicolai Hähnle wrote:
>> On 10.04.2018 18:26, Cyr, Aric wrote:
>>> That presentation time doesn’t need to come to kernel as such and actually 
>>> is fine as-is completely decoupled from adaptive sync.  As long as the 
>>> video player provides the new target_frame_duration_ns on the flip, then 
>>> the driver/HW will target the correct refresh rate to match the source 
>>> content.  This simply means that more often than not the video presents 
>>> will  align very close to the monitor’s refresh rate, resulting in a smooth 
>>> video experience.  For example, if you have 24Hz content, and an adaptive 
>>> sync monitor with a range of 40-60Hz, once the target_frame_duration_ns is 
>>> provided, driver can configure the monitor to a fixed refresh rate of 48Hz 
>>> causing all video presents to be frame-doubled in hardware without further 
>>> application intervention.
>>
>> What about multi-monitor displays, where you want to play an animation that 
>> spans multiple monitors. You really want all monitors to flip at the same 
>> time.
>>
> 
> Syncing two monitors is what we currently do with our timing sync feature 
> where we drive two monitors from the same clock source if they use the same 
> timing. That, along with VSync, guarantees all monitors flip at the same 
> time. I'm not sure if it works with adaptive sync.
> 
> Are you suggesting to use adaptive sync to do an in-SW sync of multiple 
> displays?
> 
>> I understand where you're coming from, but the perspective of refusing a 
>> target presentation time is a rather selfish one of "we're the display, 
>> we're the most important, everybody else has to adjust to us" (e.g. to get 
>> perfect sync between video and audio). I admit I'm phrasing it in a bit of 
>> an extreme way, but perhaps this phrasing helps to see why that's just not a 
>> very good attitude to have.
>>
> 
> I really dislike arguing on an emotional basis and would rather not use words 
> such as "selfish" in this discussion. I believe all of us want to come to the 
> best possible solution based on technical merit.
> 
>> All devices (whether video or audio or whatever) should be able to receive a 
>> target presentation time.
>>
> 
> I'm not sure I understand the full extent of the problem as I'm not really 
> familiar with how this is currently done, but isn't the problem the same 
> without variable refresh rates (or targeted refresh rates)? A Video API would 
> still have to somehow synchronize audio and video to 60Hz on most monitors 
> today. What would change if we gave user mode the ability to suggest we flip 
> at video frame rates (24/48Hz)?
> 

Never mind. Just saw Michel's reply to an earlier message.

Harry

> Harry
> 
>> If the application can make your life a bit easier by providing the 
>> targeted refresh rate as an additional *hint-only* parameter (like in your 24 
>> Hz --> 48 Hz doubling example), then maybe we should indeed consider that.
>>
>> Cheers,
>> Nicolai
>>
>>
>>>
>>>
>>> For video games we have a similar situation where a frame is rendered for a 
>>> certain world time and in the ideal case we would actually display the 
>>> frame at this world time.
>>>
>>> That seems like it would be a poorly written game that flips like that, 
>>> unless they are explicitly trying to throttle the framerate for some 
>>> reason.  When a game presents a completed frame, they’d like that to happen 
>>> as soon as possible.  This is why non-VSYNC modes of flipping exist and 
>>> many games leverage this.  Adaptive sync gives you the lower latency of 
>>> immediate flips without the tearing imposed by using non-VSYNC flipping.
>>>
>>>
>>> I mean we have the guys from Valve on this mailing list so I think we 
>>> should just get the feedback from them and see what they prefer.
>>>
>>> We have thousands of Steam games on other OSes that work great already, but 
>>> we’d certainly be interested in any additional feedback.  My guess is they 
>>> prefer to “do nothing” and let driver/HW manage it, otherwise you exclude 
>>> all existing games from supporting adaptive sync without a rewrite or 
>>> update.
>>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>
>>>     -Aric
>>>
>>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 

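The 24 Hz to 48 Hz frame-doubling example in Aric's quoted message above amounts to picking an integer repeat factor that brings the content's frame duration into the monitor's range. A minimal sketch, assuming the driver prefers the smallest factor that fits (i.e. the lowest usable refresh rate); the function name is hypothetical, and a real driver may use a different policy.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: choose how many times each content frame is
 * scanned out so that the resulting refresh interval lies within the
 * monitor's range [min_frame_ns, max_frame_ns]. Returns the repeat
 * factor and stores the interval, or returns 0 if no integer multiple
 * fits. */
static unsigned int frame_repeat_factor(uint64_t content_frame_ns,
					uint64_t min_frame_ns,
					uint64_t max_frame_ns,
					uint64_t *refresh_frame_ns)
{
	assert(max_frame_ns > 0 && min_frame_ns <= max_frame_ns);

	/* smallest k with content_frame_ns / k <= max_frame_ns */
	uint64_t k = (content_frame_ns + max_frame_ns - 1) / max_frame_ns;
	if (k == 0)
		return 0; /* content_frame_ns == 0 */

	uint64_t interval = content_frame_ns / k;
	if (interval < min_frame_ns)
		return 0; /* content is faster than the monitor's max refresh */

	*refresh_frame_ns = interval;
	return (unsigned int)k;
}
```

For 24 Hz content (41666667 ns) on a 40 to 60 Hz monitor the factor is 2 and each refresh lasts 20833333 ns, i.e. 48 Hz; 120 Hz content on the same monitor returns 0 because no integer multiple fits.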

Re: [PATCH 2/2] drm/amd/pp: remove unnecessary forward declaration

2018-04-10 Thread Alex Deucher
On Tue, Apr 10, 2018 at 1:18 AM, Rex Zhu  wrote:
> Signed-off-by: Rex Zhu 

Series is:
Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c | 84 +++---
>  1 file changed, 41 insertions(+), 43 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
> index e8ded22..ac44f9c 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
> @@ -75,8 +75,6 @@
>  #define DF_CS_AON0_DramBaseAddress0__IntLvNumChan_MASK   0x00F0L
>  #define DF_CS_AON0_DramBaseAddress0__IntLvAddrSel_MASK   0x0700L
>  #define DF_CS_AON0_DramBaseAddress0__DramBaseAddr_MASK   0xF000L
> -static int vega10_force_clock_level(struct pp_hwmgr *hwmgr,
> -   enum pp_clock_type type, uint32_t mask);
>
>  static const ULONG PhwVega10_Magic = (ULONG)(PHM_VIslands_Magic);
>
> @@ -4106,6 +4104,47 @@ static void vega10_set_fan_control_mode(struct pp_hwmgr *hwmgr, uint32_t mode)
> }
>  }
>
> +static int vega10_force_clock_level(struct pp_hwmgr *hwmgr,
> +   enum pp_clock_type type, uint32_t mask)
> +{
> +   struct vega10_hwmgr *data = hwmgr->backend;
> +
> +   switch (type) {
> +   case PP_SCLK:
> +   data->smc_state_table.gfx_boot_level = mask ? (ffs(mask) - 1) : 0;
> +   data->smc_state_table.gfx_max_level = mask ? (fls(mask) - 1) : 0;
> +
> +   PP_ASSERT_WITH_CODE(!vega10_upload_dpm_bootup_level(hwmgr),
> +   "Failed to upload boot level to lowest!",
> +   return -EINVAL);
> +
> +   PP_ASSERT_WITH_CODE(!vega10_upload_dpm_max_level(hwmgr),
> +   "Failed to upload dpm max level to highest!",
> +   return -EINVAL);
> +   break;
> +
> +   case PP_MCLK:
> +   data->smc_state_table.mem_boot_level = mask ? (ffs(mask) - 1) : 0;
> +   data->smc_state_table.mem_max_level = mask ? (fls(mask) - 1) : 0;
> +
> +   PP_ASSERT_WITH_CODE(!vega10_upload_dpm_bootup_level(hwmgr),
> +   "Failed to upload boot level to lowest!",
> +   return -EINVAL);
> +
> +   PP_ASSERT_WITH_CODE(!vega10_upload_dpm_max_level(hwmgr),
> +   "Failed to upload dpm max level to highest!",
> +   return -EINVAL);
> +
> +   break;
> +
> +   case PP_PCIE:
> +   default:
> +   break;
> +   }
> +
> +   return 0;
> +}
> +
>  static int vega10_dpm_force_dpm_level(struct pp_hwmgr *hwmgr,
> enum amd_dpm_forced_level level)
>  {
> @@ -4392,47 +4431,6 @@ static int vega10_set_watermarks_for_clocks_ranges(struct pp_hwmgr *hwmgr,
> return result;
>  }
>
> -static int vega10_force_clock_level(struct pp_hwmgr *hwmgr,
> -   enum pp_clock_type type, uint32_t mask)
> -{
> -   struct vega10_hwmgr *data = hwmgr->backend;
> -
> -   switch (type) {
> -   case PP_SCLK:
> -   data->smc_state_table.gfx_boot_level = mask ? (ffs(mask) - 1) : 0;
> -   data->smc_state_table.gfx_max_level = mask ? (fls(mask) - 1) : 0;
> -
> -   PP_ASSERT_WITH_CODE(!vega10_upload_dpm_bootup_level(hwmgr),
> -   "Failed to upload boot level to lowest!",
> -   return -EINVAL);
> -
> -   PP_ASSERT_WITH_CODE(!vega10_upload_dpm_max_level(hwmgr),
> -   "Failed to upload dpm max level to highest!",
> -   return -EINVAL);
> -   break;
> -
> -   case PP_MCLK:
> -   data->smc_state_table.mem_boot_level = mask ? (ffs(mask) - 1) : 0;
> -   data->smc_state_table.mem_max_level = mask ? (fls(mask) - 1) : 0;
> -
> -   PP_ASSERT_WITH_CODE(!vega10_upload_dpm_bootup_level(hwmgr),
> -   "Failed to upload boot level to lowest!",
> -   return -EINVAL);
> -
> -   PP_ASSERT_WITH_CODE(!vega10_upload_dpm_max_level(hwmgr),
> -   "Failed to upload dpm max level to highest!",
> -   return -EINVAL);
> -
> -   break;
> -
> -   case PP_PCIE:
> -   default:
> -   break;
> -   }
> -
> -   return 0;
> -}
> -
>  static int vega10_print_clock_levels(struct pp_hwmgr *hwmgr,
> enum pp_clock_type type, char *buf)
>  {
> --
> 1.9.1
>

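The moved vega10_force_clock_level() derives the DPM boot and max levels from the lowest and highest set bits of the mask. The same arithmetic can be tried in userspace with stand-ins for the kernel's ffs()/fls(); this sketch assumes GCC or Clang for __builtin_ctz()/__builtin_clz(), and mask_to_levels() is a hypothetical helper, not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ffs()/fls(): 1-based index of
 * the lowest / highest set bit, or 0 if no bit is set. */
static int ffs32(uint32_t x)
{
	return x ? __builtin_ctz(x) + 1 : 0;
}

static int fls32(uint32_t x)
{
	return x ? 32 - __builtin_clz(x) : 0;
}

/* Mirrors the level selection in vega10_force_clock_level(): the lowest
 * set bit of the mask becomes the boot level, the highest the max level,
 * and an empty mask selects level 0 for both. */
static void mask_to_levels(uint32_t mask, uint32_t *boot_level,
			   uint32_t *max_level)
{
	*boot_level = mask ? (uint32_t)(ffs32(mask) - 1) : 0;
	*max_level = mask ? (uint32_t)(fls32(mask) - 1) : 0;
}
```

For example, a mask of 0x6 (levels 1 and 2 selected) gives a boot level of 1 and a max level of 2.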
Re: [PATCH] drm/amdkfd: Remove vla

2018-04-10 Thread Felix Kuehling
Thanks Christian for catching that. I'm working on a patch series to
upstream Vega10 support, about 95% done. It will add this ASIC info for
Vega10:

static const struct kfd_device_info vega10_device_info = {
.asic_family = CHIP_VEGA10,
.max_pasid_bits = 16,
.max_no_of_hqd  = 24,
.doorbell_size  = 8,
.ih_ring_entry_size = 8 * sizeof(uint32_t), /* !!! IH ring entry size is bigger on Vega10 !!! */
.event_interrupt_class = &event_interrupt_class_v9,
.num_of_watch_points = 4,
.mqd_size_aligned = MQD_SIZE_ALIGNED,
.supports_cwsr = true,
.needs_iommu_device = false,
.needs_pci_atomics = false,
};

If you change it to uint32_t ih_ring_entry[8] and update the check, it
should be reasonably future proof.

Regards,
  Felix


On 2018-04-10 02:38 AM, Christian König wrote:
> Am 09.04.2018 um 23:06 schrieb Laura Abbott:
>> There's an ongoing effort to remove VLAs[1] from the kernel to eventually
>> turn on -Wvla. The single VLA usage in the amdkfd driver is actually
>> constant across all current platforms.
>
> Actually that isn't correct.
>
> Could be that we haven't upstreamed KFD support for them, but Vega10
> has a different interrupt ring entry size and so would cause the
> error message here.
>
>> Switch to a constant size array
>> instead.
>
> I would say to just make the array bigger.
>
> Regards,
> Christian.
>
>>
>> [1] https://lkml.org/lkml/2018/3/7/621
>>
>> Signed-off-by: Laura Abbott 
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c | 8 +---
>>   1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c b/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
>> index 035c351f47c5..c9863858f343 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
>> @@ -139,10 +139,12 @@ static void interrupt_wq(struct work_struct *work)
>>   {
>>   struct kfd_dev *dev = container_of(work, struct kfd_dev,
>>   interrupt_work);
>> +    uint32_t ih_ring_entry[4];
>>   -    uint32_t ih_ring_entry[DIV_ROUND_UP(
>> -    dev->device_info->ih_ring_entry_size,
>> -    sizeof(uint32_t))];
>> +    if (dev->device_info->ih_ring_entry_size > (4 * sizeof(uint32_t))) {
>> +    dev_err(kfd_chardev(), "Ring entry too small\n");
>> +    return;
>> +    }
>>     while (dequeue_ih_ring_entry(dev, ih_ring_entry))
>>   dev->device_info->event_interrupt_class->interrupt_wq(dev,
>


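Felix's suggestion of a fixed eight-dword buffer plus an updated check can be sketched as follows. The constant name is hypothetical; the point of comparing against sizeof(ih_ring_entry) rather than a hard-coded 4 * sizeof(uint32_t) is that the check stays correct if the array size changes later. In the kernel the failure path would log an error and return from interrupt_wq(), as in the patch; here it simply returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the fixed capacity, in dwords, that Felix
 * suggests (large enough for Vega10's 8-dword IH ring entries). */
#define KFD_MAX_RING_ENTRY_SIZE 8

/* Returns false if the device's IH ring entry (size in bytes) would not
 * fit the fixed-size stack buffer that replaces the VLA. */
static bool ih_entry_fits(unsigned int ih_ring_entry_size_bytes)
{
	uint32_t ih_ring_entry[KFD_MAX_RING_ENTRY_SIZE];

	/* tie the check to the buffer itself, not to a repeated constant */
	return ih_ring_entry_size_bytes <= sizeof(ih_ring_entry);
}
```

With this sizing, the existing four-dword entries and Vega10's eight-dword entries (8 * sizeof(uint32_t), per the vega10_device_info above) both fit, while a larger entry is refused rather than overrunning the buffer.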

Re: [PATCH] drm/amdkfd: Remove vla

2018-04-10 Thread Laura Abbott

On 04/09/2018 11:38 PM, Christian König wrote:

Am 09.04.2018 um 23:06 schrieb Laura Abbott:

There's an ongoing effort to remove VLAs[1] from the kernel to eventually
turn on -Wvla. The single VLA usage in the amdkfd driver is actually
constant across all current platforms.


Actually that isn't correct.

Could be that we haven't upstreamed KFD support for them, but Vega10 has a 
different interrupt ring entry size and so would cause the error message here.


Switch to a constant size array
instead.


I would say to just make the array bigger.

Regards,
Christian.



What array size would accommodate future chips?



[1] https://lkml.org/lkml/2018/3/7/621

Signed-off-by: Laura Abbott 
---
  drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c b/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
index 035c351f47c5..c9863858f343 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
@@ -139,10 +139,12 @@ static void interrupt_wq(struct work_struct *work)
  {
  struct kfd_dev *dev = container_of(work, struct kfd_dev,
  interrupt_work);
+    uint32_t ih_ring_entry[4];
-    uint32_t ih_ring_entry[DIV_ROUND_UP(
-    dev->device_info->ih_ring_entry_size,
-    sizeof(uint32_t))];
+    if (dev->device_info->ih_ring_entry_size > (4 * sizeof(uint32_t))) {
+    dev_err(kfd_chardev(), "Ring entry too small\n");
+    return;
+    }
  while (dequeue_ih_ring_entry(dev, ih_ring_entry))
  dev->device_info->event_interrupt_class->interrupt_wq(dev,






Re: [PATCH] drm/amdgpu: limit DMA size to PAGE_SIZE for scatter-gather buffers

2018-04-10 Thread Christian König

Am 10.04.2018 um 20:25 schrieb Sinan Kaya:

Code is expecting to observe the same number of buffers returned from
dma_map_sg() function compared to sg_alloc_table_from_pages(). This
doesn't hold true universally especially for systems with IOMMU.

IOMMU driver tries to combine buffers into a single DMA address as much
as it can. The right thing is to tell the DMA layer how much combining
IOMMU can do.


Good catch, but wrong place to set this.

Please move it into the device initialization functions.

Regards,
Christian.



Signed-off-by: Sinan Kaya 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index e4bb435..02465cd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -787,6 +787,8 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_tt *ttm)
enum dma_data_direction direction = write ?
DMA_BIDIRECTIONAL : DMA_TO_DEVICE;
  
+	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
+
r = sg_alloc_table_from_pages(ttm->sg, ttm->pages, ttm->num_pages, 0,
  ttm->num_pages << PAGE_SHIFT,
  GFP_KERNEL);




[PATCH] drm/amdgpu: limit DMA size to PAGE_SIZE for scatter-gather buffers

2018-04-10 Thread Sinan Kaya
Code is expecting to observe the same number of buffers returned from
dma_map_sg() function compared to sg_alloc_table_from_pages(). This
doesn't hold true universally especially for systems with IOMMU.

IOMMU driver tries to combine buffers into a single DMA address as much
as it can. The right thing is to tell the DMA layer how much combining
IOMMU can do.

Signed-off-by: Sinan Kaya 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index e4bb435..02465cd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -787,6 +787,8 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_tt *ttm)
enum dma_data_direction direction = write ?
DMA_BIDIRECTIONAL : DMA_TO_DEVICE;
 
+   dma_set_max_seg_size(adev->dev, PAGE_SIZE);
+
r = sg_alloc_table_from_pages(ttm->sg, ttm->pages, ttm->num_pages, 0,
  ttm->num_pages << PAGE_SHIFT,
  GFP_KERNEL);
-- 
2.7.4


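The commit message's point about the IOMMU combining buffers can be illustrated with a simplified model: if the mapping layer coalesces consecutive scatterlist entries up to the device's maximum segment size, dma_map_sg() can return fewer DMA segments than the table has entries, and capping the segment size at PAGE_SIZE keeps the two counts equal when every entry is a single page. This is a toy model under those assumptions (hypothetical function name, no alignment or boundary constraints), not how any particular IOMMU driver is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of an IOMMU-backed dma_map_sg(): consecutive entries are
 * coalesced into one DMA segment as long as the merged length does not
 * exceed max_seg_size. Returns the number of DMA segments produced.
 * Real IOMMU drivers have further constraints (alignment, boundaries). */
static size_t model_mapped_segments(const size_t *entry_len, size_t nents,
				    size_t max_seg_size)
{
	size_t segments = 0;
	size_t cur = 0; /* length of the segment being built */

	for (size_t i = 0; i < nents; i++) {
		assert(entry_len[i] <= max_seg_size); /* entries are never split */
		if (segments == 0 || cur + entry_len[i] > max_seg_size) {
			segments++;          /* start a new DMA segment */
			cur = entry_len[i];
		} else {
			cur += entry_len[i]; /* merge into the current segment */
		}
	}
	return segments;
}
```

With four single-page (4096-byte) entries, a 64 KiB limit yields one segment, an 8 KiB limit yields two, and a 4096-byte limit yields four, matching the entry count.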

Re: RFC for a render API to support adaptive sync and VRR

2018-04-10 Thread Michel Dänzer
On 2018-04-10 07:13 PM, Cyr, Aric wrote:
>> -Original Message-
>> From: Michel Dänzer [mailto:mic...@daenzer.net]
>> Sent: Tuesday, April 10, 2018 13:06
>> On 2018-04-10 06:26 PM, Cyr, Aric wrote:
>>> From: Koenig, Christian Sent: Tuesday, April 10, 2018 11:43
>>>
 For video games we have a similar situation where a frame is rendered
 for a certain world time and in the ideal case we would actually
 display the frame at this world time.
>>>
>>> That seems like it would be a poorly written game that flips like
>>> that, unless they are explicitly trying to throttle the framerate for
>>> some reason.  When a game presents a completed frame, they’d like
>>> that to happen as soon as possible.
>>
>> What you're describing is what most games have been doing traditionally.
>> Croteam's research shows that this results in micro-stuttering, because
>> frames may be presented too early. To avoid that, they want to
>> explicitly time each presentation as described by Christian.
> 
> Yes, I agree completely.  However that's only truly relevant for fixed
> refresh rate displays.

No, it also affects variable refresh; possibly even more in some cases,
because the presentation time is less predictable.


I have to leave for today; I'll look up the Croteam video on YouTube
explaining this tomorrow if nobody beats me to it.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


RE: RFC for a render API to support adaptive sync and VRR

2018-04-10 Thread Cyr, Aric
> -Original Message-
> From: Michel Dänzer [mailto:mic...@daenzer.net]
> Sent: Tuesday, April 10, 2018 13:16
> 
> On 2018-04-10 07:13 PM, Cyr, Aric wrote:
> >> -Original Message-
> >> From: Michel Dänzer [mailto:mic...@daenzer.net]
> >> Sent: Tuesday, April 10, 2018 13:06
> >> On 2018-04-10 06:26 PM, Cyr, Aric wrote:
> >>> From: Koenig, Christian Sent: Tuesday, April 10, 2018 11:43
> >>>
>  For video games we have a similar situation where a frame is rendered
>  for a certain world time and in the ideal case we would actually
>  display the frame at this world time.
> >>>
> >>> That seems like it would be a poorly written game that flips like
> >>> that, unless they are explicitly trying to throttle the framerate for
> >>> some reason.  When a game presents a completed frame, they’d like
> >>> that to happen as soon as possible.
> >>
> >> What you're describing is what most games have been doing traditionally.
> >> Croteam's research shows that this results in micro-stuttering, because
> >> frames may be presented too early. To avoid that, they want to
> >> explicitly time each presentation as described by Christian.
> >
> > Yes, I agree completely.  However that's only truly relevant for fixed
> > refresh rate displays.
> 
> No, it also affects variable refresh; possibly even more in some cases,
> because the presentation time is less predictable.

Yes, and that's why you don't want to do it when you have variable refresh.  
The hardware in the monitor and GPU will do it for you, so why bother?
The input to their algorithms will be noisy, causing worse estimations.  If you 
just present as fast as you can, it'll just work (within reason).
The majority of gamers want maximum FPS for their games, and there's quite 
frequently outrage at a particular game when they are limited to something 
lower than what their monitor could otherwise support (e.g. I don't want my 
game limited to 30Hz if I have a shiny 144Hz gaming display I paid good money 
for).   Of course, there are always exceptions... but in our experience those are 
few and far between.



Re: [PATCH] drm/amd/display: Fix 64-bit division in hwss_edp_power_control

2018-04-10 Thread Alex Deucher
On Tue, Apr 10, 2018 at 4:10 PM, Harry Wentland  wrote:
> Signed-off-by: Harry Wentland 

Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
> index f32ccdfd18a3..3ba057e2a467 100644
> --- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
> +++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
> @@ -860,7 +860,7 @@ void hwss_edp_power_control(
> dm_get_elapse_time_in_ns(
> ctx,
> current_ts,
> -   link->link_trace.time_stamp.edp_poweroff) / 100;
> +   div64_u64(link->link_trace.time_stamp.edp_poweroff, 100));
> unsigned long long wait_time_ms = 0;
>
> /* max 500ms from LCDVDD off to on */
> --
> 2.15.1
>


[PATCH 07/21] drm/amd/display: fix segfault on insufficient TG during validation

2018-04-10 Thread Harry Wentland
From: Dmytro Laktyushkin 

Signed-off-by: Dmytro Laktyushkin 
Reviewed-by: Dmytro Laktyushkin 
Acked-by: Harry Wentland 
---
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 667fac8749b9..faaba0ea0ace 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -1700,7 +1700,7 @@ enum dc_status resource_map_pool_resources(
pipe_idx = acquire_first_split_pipe(&context->res_ctx, pool, stream);
 #endif
 
-   if (pipe_idx < 0)
+   if (pipe_idx < 0 || context->res_ctx.pipe_ctx[pipe_idx].stream_res.tg == NULL)
return DC_NO_CONTROLLER_RESOURCE;
 
pipe_ctx = &context->res_ctx.pipe_ctx[pipe_idx];
-- 
2.15.1
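The new check relies on C's short-circuit `||`: when `pipe_idx` is negative, the pipe array is never indexed, and a pipe without a timing generator is rejected instead of being dereferenced later. A minimal sketch of the same guard, using simplified hypothetical stand-ins (`struct pipe_ctx_sketch`, `map_pipe()`) rather than the real DC types:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the DC structures. */
struct timing_generator_sketch { int inst; };

struct pipe_ctx_sketch {
	struct timing_generator_sketch *tg;
};

enum status_sketch { STATUS_OK, STATUS_NO_CONTROLLER_RESOURCE };

static enum status_sketch map_pipe(struct pipe_ctx_sketch *pipes,
				   int pipe_idx,
				   struct pipe_ctx_sketch **out)
{
	/* Reject both "no free pipe" and "pipe without a timing generator".
	 * pipes[pipe_idx] is only evaluated when pipe_idx >= 0. */
	if (pipe_idx < 0 || pipes[pipe_idx].tg == NULL)
		return STATUS_NO_CONTROLLER_RESOURCE;

	*out = &pipes[pipe_idx];
	return STATUS_OK;
}
```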


[PATCH 11/21] drm/amd/display: Check lid state to determine fast boot optimization.

2018-04-10 Thread Harry Wentland
From: Yongqiang Sun 

When booting in legacy mode with the lid closed, the eDP information
could not be read correctly via SBIOS_SCRATCH_3, so the eDP panel could
not be lit up properly once the lid was opened.
Checking the lid state instead resolves the issue.

Signed-off-by: Yongqiang Sun 
Reviewed-by: Eric Yang 
Acked-by: Harry Wentland 
---
 drivers/gpu/drm/amd/display/dc/dc_stream.h |  1 +
 .../amd/display/dc/dce110/dce110_hw_sequencer.c| 24 ++
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dc_stream.h b/drivers/gpu/drm/amd/display/dc/dc_stream.h
index 046e87aa699a..5f215ca38c07 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_stream.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_stream.h
@@ -98,6 +98,7 @@ struct dc_stream_state {
int phy_pix_clk;
enum signal_type signal;
bool dpms_off;
+   bool lid_state_closed;
 
struct dc_stream_status status;
 
diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
index 0ff2a8092782..3ba057e2a467 100644
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
@@ -1481,6 +1481,17 @@ static void disable_vga_and_power_gate_all_controllers(
}
 }
 
+static bool is_eDP_lid_closed(struct dc_state *context)
+{
+   int i;
+
+   for (i = 0; i < context->stream_count; i++) {
+   if (context->streams[i]->signal == SIGNAL_TYPE_EDP)
+   return context->streams[i]->lid_state_closed;
+   }
+   return false;
+}
+
 static struct dc_link *get_link_for_edp_not_in_use(
struct dc *dc,
struct dc_state *context)
@@ -1515,20 +1526,17 @@ static struct dc_link *get_link_for_edp_not_in_use(
  */
 void dce110_enable_accelerated_mode(struct dc *dc, struct dc_state *context)
 {
-   struct dc_bios *dcb = dc->ctx->dc_bios;
-
-   /* vbios already light up eDP, so we can leverage vbios and skip eDP
+   /* check eDP lid state:
+* If lid is open, vbios already light up eDP, so we can leverage vbios and skip eDP
 * programming
 */
-   bool can_eDP_fast_boot_optimize =
-   (dcb->funcs->get_vga_enabled_displays(dc->ctx->dc_bios) == ATOM_DISPLAY_LCD1_ACTIVE);
-
-   /* if OS doesn't light up eDP and eDP link is available, we want to disable */
+   bool lid_state_closed = is_eDP_lid_closed(context);
struct dc_link *edp_link_to_turnoff = NULL;
 
-   if (can_eDP_fast_boot_optimize) {
+   if (!lid_state_closed) {
edp_link_to_turnoff = get_link_for_edp_not_in_use(dc, context);
 
+   /* if OS doesn't light up eDP and eDP link is available, we want to disable */
if (!edp_link_to_turnoff)
dc->apply_edp_fast_boot_optimization = true;
}
-- 
2.15.1
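`is_eDP_lid_closed()` returns the lid flag of the first eDP stream it finds and treats a state with no eDP stream as lid-open. A standalone sketch of that lookup, using hypothetical simplified types (`struct stream_sketch`, `struct state_sketch`, `edp_lid_closed()`) in place of `dc_stream_state` and `dc_state`:

```c
#include <assert.h>
#include <stdbool.h>

enum signal_type_sketch { SIG_HDMI, SIG_DP, SIG_EDP };

struct stream_sketch {
	enum signal_type_sketch signal;
	bool lid_state_closed;
};

struct state_sketch {
	int stream_count;
	struct stream_sketch *streams[4];
};

/* Same shape as the patch's loop: the first eDP stream's lid flag wins;
 * with no eDP stream the lid is reported as open. */
static bool edp_lid_closed(const struct state_sketch *ctx)
{
	int i;

	for (i = 0; i < ctx->stream_count; i++) {
		if (ctx->streams[i]->signal == SIG_EDP)
			return ctx->streams[i]->lid_state_closed;
	}
	return false;
}
```

Note that a non-eDP stream's `lid_state_closed` value is ignored, so only the eDP panel's flag affects the fast boot decision.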


[PATCH 03/21] drm/amd/display: Move dp_pixel_encoding_type to stream_encoder include

2018-04-10 Thread Harry Wentland
From: Eric Bernstein 

Signed-off-by: Eric Bernstein 
Reviewed-by: Nikola Cornij 
Acked-by: Harry Wentland 
---
 drivers/gpu/drm/amd/display/dc/inc/hw/hw_shared.h | 17 -
 .../gpu/drm/amd/display/dc/inc/hw/stream_encoder.h| 19 +++
 2 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/hw_shared.h b/drivers/gpu/drm/amd/display/dc/inc/hw/hw_shared.h
index 9fe73028d588..cf7433ebf91a 100644
--- a/drivers/gpu/drm/amd/display/dc/inc/hw/hw_shared.h
+++ b/drivers/gpu/drm/amd/display/dc/inc/hw/hw_shared.h
@@ -186,23 +186,6 @@ enum controller_dp_test_pattern {
CONTROLLER_DP_TEST_PATTERN_COLORSQUARES_CEA
 };
 
-enum dp_pixel_encoding_type {
-   DP_PIXEL_ENCODING_TYPE_RGB444   = 0x0000,
-   DP_PIXEL_ENCODING_TYPE_YCBCR422 = 0x0001,
-   DP_PIXEL_ENCODING_TYPE_YCBCR444 = 0x0002,
-   DP_PIXEL_ENCODING_TYPE_RGB_WIDE_GAMUT   = 0x0003,
-   DP_PIXEL_ENCODING_TYPE_Y_ONLY   = 0x0004,
-   DP_PIXEL_ENCODING_TYPE_YCBCR420 = 0x0005
-};
-
-enum dp_component_depth {
-   DP_COMPONENT_PIXEL_DEPTH_6BPC   = 0x0000,
-   DP_COMPONENT_PIXEL_DEPTH_8BPC   = 0x0001,
-   DP_COMPONENT_PIXEL_DEPTH_10BPC  = 0x0002,
-   DP_COMPONENT_PIXEL_DEPTH_12BPC  = 0x0003,
-   DP_COMPONENT_PIXEL_DEPTH_16BPC  = 0x0004
-};
-
 enum dc_lut_mode {
LUT_BYPASS,
LUT_RAM_A,
diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/stream_encoder.h b/drivers/gpu/drm/amd/display/dc/inc/hw/stream_encoder.h
index 5c21336cae4c..cfa7ec9517ae 100644
--- a/drivers/gpu/drm/amd/display/dc/inc/hw/stream_encoder.h
+++ b/drivers/gpu/drm/amd/display/dc/inc/hw/stream_encoder.h
@@ -29,11 +29,29 @@
 #define STREAM_ENCODER_H_
 
 #include "audio_types.h"
+#include "hw_shared.h"
 
 struct dc_bios;
 struct dc_context;
 struct dc_crtc_timing;
 
+enum dp_pixel_encoding_type {
+   DP_PIXEL_ENCODING_TYPE_RGB444   = 0x0000,
+   DP_PIXEL_ENCODING_TYPE_YCBCR422 = 0x0001,
+   DP_PIXEL_ENCODING_TYPE_YCBCR444 = 0x0002,
+   DP_PIXEL_ENCODING_TYPE_RGB_WIDE_GAMUT   = 0x0003,
+   DP_PIXEL_ENCODING_TYPE_Y_ONLY   = 0x0004,
+   DP_PIXEL_ENCODING_TYPE_YCBCR420 = 0x0005
+};
+
+enum dp_component_depth {
+   DP_COMPONENT_PIXEL_DEPTH_6BPC   = 0x0000,
+   DP_COMPONENT_PIXEL_DEPTH_8BPC   = 0x0001,
+   DP_COMPONENT_PIXEL_DEPTH_10BPC  = 0x0002,
+   DP_COMPONENT_PIXEL_DEPTH_12BPC  = 0x0003,
+   DP_COMPONENT_PIXEL_DEPTH_16BPC  = 0x0004
+};
+
 struct encoder_info_frame {
/* auxiliary video information */
struct dc_info_packet avi;
@@ -138,6 +156,7 @@ struct stream_encoder_funcs {
 
void (*set_avmute)(
struct stream_encoder *enc, bool enable);
+
 };
 
 #endif /* STREAM_ENCODER_H_ */
-- 
2.15.1


[PATCH 18/21] drm/amd/display: add rq/dlg/ttu to dtn log

2018-04-10 Thread Harry Wentland
From: Dmytro Laktyushkin 

Signed-off-by: Dmytro Laktyushkin 
Reviewed-by: Dmytro Laktyushkin 
Acked-by: Harry Wentland 
---
 drivers/gpu/drm/amd/display/dc/dc_helper.c |  59 
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c  | 153 -
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.h  |  19 +--
 .../drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c  | 114 ++-
 drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h   |  20 +++
 drivers/gpu/drm/amd/display/dc/inc/reg_helper.h|  56 
 6 files changed, 401 insertions(+), 20 deletions(-)
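Each `generic_reg_getN()` helper added in dc_helper.c reads the register once with `dm_read_reg()` and then decodes N fields from that single value. Below is a minimal sketch of the same read-once, decode-many idea for three fields. It assumes field extraction is `(value & mask) >> shift` (the body of `get_reg_field_value_ex()` is not shown in this patch) and takes the register value as a parameter instead of reading hardware. `field_get()` and `reg_get3_sketch()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field extraction: mask is pre-shifted to the field's position. */
static uint32_t field_get(uint32_t reg_val, uint32_t mask, uint8_t shift)
{
	return (reg_val & mask) >> shift;
}

/* Hypothetical three-field variant: one register value, three fields.
 * The real helpers obtain reg_val from dm_read_reg() instead. */
static uint32_t reg_get3_sketch(uint32_t reg_val,
		uint8_t shift1, uint32_t mask1, uint32_t *field1,
		uint8_t shift2, uint32_t mask2, uint32_t *field2,
		uint8_t shift3, uint32_t mask3, uint32_t *field3)
{
	*field1 = field_get(reg_val, mask1, shift1);
	*field2 = field_get(reg_val, mask2, shift2);
	*field3 = field_get(reg_val, mask3, shift3);
	return reg_val;
}
```

The fixed-arity signatures match the reasoning in the context comment in dc_helper.c: with a variadic version, the compiler could not check the size of each output pointer.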

diff --git a/drivers/gpu/drm/amd/display/dc/dc_helper.c b/drivers/gpu/drm/amd/display/dc/dc_helper.c
index 48e1fcf53d43..bd0fda0ceb91 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_helper.c
+++ b/drivers/gpu/drm/amd/display/dc/dc_helper.c
@@ -117,6 +117,65 @@ uint32_t generic_reg_get5(const struct dc_context *ctx, uint32_t addr,
return reg_val;
 }
 
+uint32_t generic_reg_get6(const struct dc_context *ctx, uint32_t addr,
+   uint8_t shift1, uint32_t mask1, uint32_t *field_value1,
+   uint8_t shift2, uint32_t mask2, uint32_t *field_value2,
+   uint8_t shift3, uint32_t mask3, uint32_t *field_value3,
+   uint8_t shift4, uint32_t mask4, uint32_t *field_value4,
+   uint8_t shift5, uint32_t mask5, uint32_t *field_value5,
+   uint8_t shift6, uint32_t mask6, uint32_t *field_value6)
+{
+   uint32_t reg_val = dm_read_reg(ctx, addr);
+   *field_value1 = get_reg_field_value_ex(reg_val, mask1, shift1);
+   *field_value2 = get_reg_field_value_ex(reg_val, mask2, shift2);
+   *field_value3 = get_reg_field_value_ex(reg_val, mask3, shift3);
+   *field_value4 = get_reg_field_value_ex(reg_val, mask4, shift4);
+   *field_value5 = get_reg_field_value_ex(reg_val, mask5, shift5);
+   *field_value6 = get_reg_field_value_ex(reg_val, mask6, shift6);
+   return reg_val;
+}
+
+uint32_t generic_reg_get7(const struct dc_context *ctx, uint32_t addr,
+   uint8_t shift1, uint32_t mask1, uint32_t *field_value1,
+   uint8_t shift2, uint32_t mask2, uint32_t *field_value2,
+   uint8_t shift3, uint32_t mask3, uint32_t *field_value3,
+   uint8_t shift4, uint32_t mask4, uint32_t *field_value4,
+   uint8_t shift5, uint32_t mask5, uint32_t *field_value5,
+   uint8_t shift6, uint32_t mask6, uint32_t *field_value6,
+   uint8_t shift7, uint32_t mask7, uint32_t *field_value7)
+{
+   uint32_t reg_val = dm_read_reg(ctx, addr);
+   *field_value1 = get_reg_field_value_ex(reg_val, mask1, shift1);
+   *field_value2 = get_reg_field_value_ex(reg_val, mask2, shift2);
+   *field_value3 = get_reg_field_value_ex(reg_val, mask3, shift3);
+   *field_value4 = get_reg_field_value_ex(reg_val, mask4, shift4);
+   *field_value5 = get_reg_field_value_ex(reg_val, mask5, shift5);
+   *field_value6 = get_reg_field_value_ex(reg_val, mask6, shift6);
+   *field_value7 = get_reg_field_value_ex(reg_val, mask7, shift7);
+   return reg_val;
+}
+
+uint32_t generic_reg_get8(const struct dc_context *ctx, uint32_t addr,
+   uint8_t shift1, uint32_t mask1, uint32_t *field_value1,
+   uint8_t shift2, uint32_t mask2, uint32_t *field_value2,
+   uint8_t shift3, uint32_t mask3, uint32_t *field_value3,
+   uint8_t shift4, uint32_t mask4, uint32_t *field_value4,
+   uint8_t shift5, uint32_t mask5, uint32_t *field_value5,
+   uint8_t shift6, uint32_t mask6, uint32_t *field_value6,
+   uint8_t shift7, uint32_t mask7, uint32_t *field_value7,
+   uint8_t shift8, uint32_t mask8, uint32_t *field_value8)
+{
+   uint32_t reg_val = dm_read_reg(ctx, addr);
+   *field_value1 = get_reg_field_value_ex(reg_val, mask1, shift1);
+   *field_value2 = get_reg_field_value_ex(reg_val, mask2, shift2);
+   *field_value3 = get_reg_field_value_ex(reg_val, mask3, shift3);
+   *field_value4 = get_reg_field_value_ex(reg_val, mask4, shift4);
+   *field_value5 = get_reg_field_value_ex(reg_val, mask5, shift5);
+   *field_value6 = get_reg_field_value_ex(reg_val, mask6, shift6);
+   *field_value7 = get_reg_field_value_ex(reg_val, mask7, shift7);
+   *field_value8 = get_reg_field_value_ex(reg_val, mask8, shift8);
+   return reg_val;
+}
 /* note:  va version of this is pretty bad idea, since there is a output parameter pass by pointer
  * compiler won't be able to check for size match and is prone to stack corruption type of bugs
 
diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c
index 4ca9b6e9a824..58062172cf3f 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c
+++ 

[PATCH 13/21] drm/amd/display: Move DCC support functions into dchubbub

2018-04-10 Thread Harry Wentland
From: Eric Bernstein 

Added dchubbub.h header file for common enum/struct definitions.
Added new interface functions get_dcc_compression_cap,
dcc_support_swizzle, dcc_support_pixel_format.

Signed-off-by: Eric Bernstein 
Reviewed-by: Dmytro Laktyushkin 
Acked-by: Harry Wentland 
---
 .../gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c| 221 +++-
 .../gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.h|   7 +-
 .../gpu/drm/amd/display/dc/dcn10/dcn10_resource.c  | 231 +
 drivers/gpu/drm/amd/display/dc/inc/hw/dchubbub.h   |  64 ++
 4 files changed, 291 insertions(+), 232 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/display/dc/inc/hw/dchubbub.h
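The DCC helpers this patch moves into dcn10_hubbub.c encode two small tables. The first gives the width and height of a 256-byte block for 1, 2, 4 and 8 bytes per element. The second records which element sizes each swizzle class supports: standard swizzle handles all four, display swizzle handles 8 bytes only. Below is a reduced sketch of those tables with hypothetical names (`blk256_size()`, `dcc_swizzle_supported()`, `enum swizzle_class`). Unlike `hubbub1_get_blk256_size()`, which leaves its outputs untouched for other sizes, this version reports failure.

```c
#include <assert.h>
#include <stdbool.h>

/* 256-byte block dimensions per element size, as in the patch's table. */
static bool blk256_size(unsigned int bpe, unsigned int *w, unsigned int *h)
{
	switch (bpe) {
	case 1: *w = 16; *h = 16; return true;
	case 2: *w = 16; *h = 8;  return true;
	case 4: *w = 8;  *h = 8;  return true;
	case 8: *w = 8;  *h = 4;  return true;
	default: return false;
	}
}

/* Hypothetical collapse of the DC_SW_* modes into classes:
 * *_S* modes -> standard, *_D* modes -> display, others unsupported. */
enum swizzle_class { SWIZZLE_STANDARD, SWIZZLE_DISPLAY, SWIZZLE_OTHER };

static bool dcc_swizzle_supported(enum swizzle_class cls, unsigned int bpe)
{
	if (cls == SWIZZLE_STANDARD)
		return bpe == 1 || bpe == 2 || bpe == 4 || bpe == 8;
	if (cls == SWIZZLE_DISPLAY)
		return bpe == 8;
	return false;
}
```

For every supported size, width × height × bytes-per-element comes out to 256.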

diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c
index 738f67ffd1b4..b9fb14a3224b 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c
@@ -476,8 +476,227 @@ void hubbub1_toggle_watermark_change_req(struct hubbub *hubbub)
DCHUBBUB_ARB_WATERMARK_CHANGE_REQUEST, watermark_change_req);
 }
 
+static bool hubbub1_dcc_support_swizzle(
+   enum swizzle_mode_values swizzle,
+   unsigned int bytes_per_element,
+   enum segment_order *segment_order_horz,
+   enum segment_order *segment_order_vert)
+{
+   bool standard_swizzle = false;
+   bool display_swizzle = false;
+
+   switch (swizzle) {
+   case DC_SW_4KB_S:
+   case DC_SW_64KB_S:
+   case DC_SW_VAR_S:
+   case DC_SW_4KB_S_X:
+   case DC_SW_64KB_S_X:
+   case DC_SW_VAR_S_X:
+   standard_swizzle = true;
+   break;
+   case DC_SW_4KB_D:
+   case DC_SW_64KB_D:
+   case DC_SW_VAR_D:
+   case DC_SW_4KB_D_X:
+   case DC_SW_64KB_D_X:
+   case DC_SW_VAR_D_X:
+   display_swizzle = true;
+   break;
+   default:
+   break;
+   }
+
+   if (bytes_per_element == 1 && standard_swizzle) {
+   *segment_order_horz = segment_order__contiguous;
+   *segment_order_vert = segment_order__na;
+   return true;
+   }
+   if (bytes_per_element == 2 && standard_swizzle) {
+   *segment_order_horz = segment_order__non_contiguous;
+   *segment_order_vert = segment_order__contiguous;
+   return true;
+   }
+   if (bytes_per_element == 4 && standard_swizzle) {
+   *segment_order_horz = segment_order__non_contiguous;
+   *segment_order_vert = segment_order__contiguous;
+   return true;
+   }
+   if (bytes_per_element == 8 && standard_swizzle) {
+   *segment_order_horz = segment_order__na;
+   *segment_order_vert = segment_order__contiguous;
+   return true;
+   }
+   if (bytes_per_element == 8 && display_swizzle) {
+   *segment_order_horz = segment_order__contiguous;
+   *segment_order_vert = segment_order__non_contiguous;
+   return true;
+   }
+
+   return false;
+}
+
+static bool hubbub1_dcc_support_pixel_format(
+   enum surface_pixel_format format,
+   unsigned int *bytes_per_element)
+{
+   /* DML: get_bytes_per_element */
+   switch (format) {
+   case SURFACE_PIXEL_FORMAT_GRPH_ARGB1555:
+   case SURFACE_PIXEL_FORMAT_GRPH_RGB565:
+   *bytes_per_element = 2;
+   return true;
+   case SURFACE_PIXEL_FORMAT_GRPH_ARGB:
+   case SURFACE_PIXEL_FORMAT_GRPH_ABGR:
+   case SURFACE_PIXEL_FORMAT_GRPH_ARGB2101010:
+   case SURFACE_PIXEL_FORMAT_GRPH_ABGR2101010:
+   *bytes_per_element = 4;
+   return true;
+   case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616:
+   case SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616F:
+   case SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616F:
+   *bytes_per_element = 8;
+   return true;
+   default:
+   return false;
+   }
+}
+
+static void hubbub1_get_blk256_size(unsigned int *blk256_width, unsigned int *blk256_height,
+   unsigned int bytes_per_element)
+{
+   /* copied from DML.  might want to refactor DML to leverage from DML */
+   /* DML : get_blk256_size */
+   if (bytes_per_element == 1) {
+   *blk256_width = 16;
+   *blk256_height = 16;
+   } else if (bytes_per_element == 2) {
+   *blk256_width = 16;
+   *blk256_height = 8;
+   } else if (bytes_per_element == 4) {
+   *blk256_width = 8;
+   *blk256_height = 8;
+   } else if (bytes_per_element == 8) {
+   *blk256_width = 8;
+   *blk256_height = 4;
+   }
+}
+
+static void hubbub1_det_request_size(
+  

[PATCH 02/21] drm/amd/display: fix brightness level after resume from suspend

2018-04-10 Thread Harry Wentland
From: Roman Li 

Add the missing call to cache the current backlight values.
Otherwise the brightness resets to the default value on resume.

Signed-off-by: Roman Li 
Reviewed-by: Charlene Liu 
Acked-by: Harry Wentland 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link.c   | 13 +
 drivers/gpu/drm/amd/display/dc/dc_link.h|  2 ++
 drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c |  4 +++-
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
index 0cd286f8eaa0..b44cf52090a5 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
@@ -2018,6 +2018,19 @@ bool dc_link_set_backlight_level(const struct dc_link *link, uint32_t level,
return true;
 }
 
+bool dc_link_set_abm_disable(const struct dc_link *link)
+{
+   struct dc  *core_dc = link->ctx->dc;
+   struct abm *abm = core_dc->res_pool->abm;
+
+   if ((abm == NULL) || (abm->funcs->set_backlight_level == NULL))
+   return false;
+
+   abm->funcs->set_abm_immediate_disable(abm);
+
+   return true;
+}
+
 bool dc_link_set_psr_enable(const struct dc_link *link, bool enable, bool wait)
 {
struct dc  *core_dc = link->ctx->dc;
diff --git a/drivers/gpu/drm/amd/display/dc/dc_link.h b/drivers/gpu/drm/amd/display/dc/dc_link.h
index eeff98741293..8a716baa1203 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_link.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_link.h
@@ -141,6 +141,8 @@ static inline struct dc_link *dc_get_link_at_index(struct dc *dc, uint32_t link_
 bool dc_link_set_backlight_level(const struct dc_link *dc_link, uint32_t level,
uint32_t frame_ramp, const struct dc_stream_state *stream);
 
+bool dc_link_set_abm_disable(const struct dc_link *dc_link);
+
 bool dc_link_set_psr_enable(const struct dc_link *dc_link, bool enable, bool wait);
 
 bool dc_link_get_psr_state(const struct dc_link *dc_link, uint32_t *psr_state);
diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
index 308a1989fb94..71e4812217bb 100644
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
@@ -1046,8 +1046,10 @@ void dce110_blank_stream(struct pipe_ctx *pipe_ctx)
struct dc_stream_state *stream = pipe_ctx->stream;
struct dc_link *link = stream->sink->link;
 
-   if (link->local_sink && link->local_sink->sink_signal == SIGNAL_TYPE_EDP)
+   if (link->local_sink && link->local_sink->sink_signal == SIGNAL_TYPE_EDP) {
link->dc->hwss.edp_backlight_control(link, false);
+   dc_link_set_abm_disable(link);
+   }
 
if (dc_is_dp_signal(pipe_ctx->stream->signal))
pipe_ctx->stream_res.stream_enc->funcs->dp_blank(pipe_ctx->stream_res.stream_enc);
-- 
2.15.1
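`dc_link_set_abm_disable()` follows the usual ops-table pattern: bail out if the object or callback is missing, otherwise call through the function pointer. As posted, it tests `abm->funcs->set_backlight_level` but then calls `abm->funcs->set_abm_immediate_disable`. Below is a minimal sketch of the pattern that checks the exact callback it invokes. The types and names (`struct abm_sketch`, `abm_disable()`) are hypothetical and are not the DC `abm_funcs`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct abm_sketch;

/* Hypothetical ops table with a single callback. */
struct abm_funcs_sketch {
	void (*immediate_disable)(struct abm_sketch *abm);
};

struct abm_sketch {
	const struct abm_funcs_sketch *funcs;
	bool enabled;
};

static void immediate_disable_impl(struct abm_sketch *abm)
{
	abm->enabled = false;
}

/* Guard on the same pointer that is about to be called. */
static bool abm_disable(struct abm_sketch *abm)
{
	if (abm == NULL || abm->funcs == NULL ||
	    abm->funcs->immediate_disable == NULL)
		return false;

	abm->funcs->immediate_disable(abm);
	return true;
}
```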


[PATCH 17/21] drm/amd/display: Check SCRATCH reg to determine S3 resume.

2018-04-10 Thread Harry Wentland
From: Yongqiang Sun 

Using the lid state alone to determine fast boot optimization is not
enough. On S3 resume the BIOS is not involved in the boot, so the eDP
panel has not been lit up even though the lid is open. If fast boot
optimization is applied in that case, enabling the eDP link is skipped,
resulting in a black screen after boot.
Because the BIOS is not involved, BIOS_SCRATCH_3 should read 0 on S3
resume for both UEFI and legacy boot, so use it to detect this case.

Signed-off-by: Yongqiang Sun 
Reviewed-by: Charlene Liu 
Acked-by: Harry Wentland 
---
 .../amd/display/dc/dce110/dce110_hw_sequencer.c| 33 ++
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
index 3ba057e2a467..9e1a8823d3d8 100644
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
@@ -1526,18 +1526,41 @@ static struct dc_link *get_link_for_edp_not_in_use(
  */
 void dce110_enable_accelerated_mode(struct dc *dc, struct dc_state *context)
 {
-   /* check eDP lid state:
-* If lid is open, vbios already light up eDP, so we can leverage vbios and skip eDP
-* programming
+   /* check eDP lid state and BIOS_SCRATCH_3 to determine fast boot optimization
+* UEFI boot
+*                 edp_active_status_from_scratch    fast boot optimization
+* S4/S5 resume:
+* Lid Open        true                              true
+* Lid Close       false                             false
+*
+* S3/ resume:
+* Lid Open        false                             false
+* Lid Close       false                             false
+*
+* Legacy boot:
+*                 edp_active_status_from_scratch    fast boot optimization
+* S4/S resume:
+* Lid Open        true                              true
+* Lid Close       true                              false
+*
+* S3/ resume:
+* Lid Open        false                             false
+* Lid Close       false                             false
 */
+   struct dc_bios *dcb = dc->ctx->dc_bios;
bool lid_state_closed = is_eDP_lid_closed(context);
struct dc_link *edp_link_to_turnoff = NULL;
+   bool edp_active_status_from_scratch =
+   (dcb->funcs->get_vga_enabled_displays(dc->ctx->dc_bios) == ATOM_DISPLAY_LCD1_ACTIVE);
 
+   /*Lid open*/
if (!lid_state_closed) {
edp_link_to_turnoff = get_link_for_edp_not_in_use(dc, context);
 
-   /* if OS doesn't light up eDP and eDP link is available, we want to disable */
-   if (!edp_link_to_turnoff)
+   /* if OS doesn't light up eDP and eDP link is available, we want to disable
+* If resume from S4/S5, should optimization.
+*/
+   if (!edp_link_to_turnoff && edp_active_status_from_scratch)
dc->apply_edp_fast_boot_optimization = true;
}
 
-- 
2.15.1
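In the patched code, the table in the comment reduces to one condition. The optimization is applied only when the lid is open, no unused eDP link needs turning off, and BIOS_SCRATCH_3 reports LCD1 as active. A sketch of that decision as a pure function (hypothetical name `apply_edp_fast_boot()`), checked against rows of the table:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical condensation of the patched logic in
 * dce110_enable_accelerated_mode(). */
static bool apply_edp_fast_boot(bool lid_closed,
				bool have_edp_link_to_turnoff,
				bool edp_active_status_from_scratch)
{
	return !lid_closed && !have_edp_link_to_turnoff &&
	       edp_active_status_from_scratch;
}
```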


[PATCH 06/21] drm/amd/display: Fix bug where refresh rate becomes fixed

2018-04-10 Thread Harry Wentland
From: Anthony Koo 

This issue occurs if the refresh rate range is very small and LFC is not used.
When frame spikes occur, the refresh rate becomes fixed and is not restored
properly.

Signed-off-by: Anthony Koo 
Reviewed-by: Aric Cyr 
Acked-by: Harry Wentland 
---
 .../drm/amd/display/modules/freesync/freesync.c| 43 --
 .../gpu/drm/amd/display/modules/inc/mod_freesync.h |  3 ++
 2 files changed, 26 insertions(+), 20 deletions(-)
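`mod_freesync_calc_nominal_field_rate()`, exported by this patch, multiplies the pixel clock in kHz by 10^9 to get micro-hertz and then divides by the horizontal and vertical totals. Below is a userspace sketch of the same arithmetic, with plain 64-bit `/` standing in for `div_u64()`. The function name `nominal_field_rate_uhz()` is hypothetical. For the standard 1080p60 timing (148500 kHz, 2200 x 1125 total), it gives 60,000,000 uHz, i.e. 60 Hz.

```c
#include <assert.h>
#include <stdint.h>

/* Same steps as the patch: scale kHz to uHz, then divide by h_total
 * and v_total. Plain '/' is fine in userspace. */
static uint64_t nominal_field_rate_uhz(uint32_t pix_clk_khz,
				       uint32_t h_total, uint32_t v_total)
{
	uint64_t rate = pix_clk_khz;

	rate *= 1000ULL * 1000ULL * 1000ULL;	/* kHz -> uHz */
	rate /= h_total;
	rate /= v_total;
	return rate;
}
```

Multiplying before dividing keeps the fractional part of the frame rate that an early division would truncate.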

diff --git a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
index 4af73a72b9a9..be6a6c63b4cc 100644
--- a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
+++ b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
@@ -168,21 +168,6 @@ static unsigned int calc_v_total_from_duration(
return v_total;
 }
 
-static unsigned long long calc_nominal_field_rate(const struct dc_stream_state *stream)
-{
-   unsigned long long nominal_field_rate_in_uhz = 0;
-
-   /* Calculate nominal field rate for stream */
-   nominal_field_rate_in_uhz = stream->timing.pix_clk_khz;
-   nominal_field_rate_in_uhz *= 1000ULL * 1000ULL * 1000ULL;
-   nominal_field_rate_in_uhz = div_u64(nominal_field_rate_in_uhz,
-   stream->timing.h_total);
-   nominal_field_rate_in_uhz = div_u64(nominal_field_rate_in_uhz,
-   stream->timing.v_total);
-
-   return nominal_field_rate_in_uhz;
-}
-
 static void update_v_total_for_static_ramp(
struct core_freesync *core_freesync,
const struct dc_stream_state *stream,
@@ -441,10 +426,11 @@ static void apply_fixed_refresh(struct core_freesync *core_freesync,
in_out_vrr->adjust.v_total_min;
} else {
in_out_vrr->adjust.v_total_min =
-   calc_v_total_from_refresh(
-   stream, in_out_vrr->max_refresh_in_uhz);
+   calc_v_total_from_refresh(stream,
+   in_out_vrr->max_refresh_in_uhz);
in_out_vrr->adjust.v_total_max =
-   in_out_vrr->adjust.v_total_min;
+   calc_v_total_from_refresh(stream,
+   in_out_vrr->min_refresh_in_uhz);
}
}
 }
@@ -638,7 +624,8 @@ void mod_freesync_build_vrr_params(struct mod_freesync *mod_freesync,
core_freesync = MOD_FREESYNC_TO_CORE(mod_freesync);
 
/* Calculate nominal field rate for stream */
-   nominal_field_rate_in_uhz = calc_nominal_field_rate(stream);
+   nominal_field_rate_in_uhz =
+   mod_freesync_calc_nominal_field_rate(stream);
 
min_refresh_in_uhz = in_config->min_refresh_in_uhz;
max_refresh_in_uhz = in_config->max_refresh_in_uhz;
@@ -888,6 +875,22 @@ void mod_freesync_get_settings(struct mod_freesync *mod_freesync,
}
 }
 
+unsigned long long mod_freesync_calc_nominal_field_rate(
+   const struct dc_stream_state *stream)
+{
+   unsigned long long nominal_field_rate_in_uhz = 0;
+
+   /* Calculate nominal field rate for stream */
+   nominal_field_rate_in_uhz = stream->timing.pix_clk_khz;
+   nominal_field_rate_in_uhz *= 1000ULL * 1000ULL * 1000ULL;
+   nominal_field_rate_in_uhz = div_u64(nominal_field_rate_in_uhz,
+   stream->timing.h_total);
+   nominal_field_rate_in_uhz = div_u64(nominal_field_rate_in_uhz,
+   stream->timing.v_total);
+
+   return nominal_field_rate_in_uhz;
+}
+
 bool mod_freesync_is_valid_range(struct mod_freesync *mod_freesync,
const struct dc_stream_state *stream,
uint32_t min_refresh_cap_in_uhz,
@@ -897,7 +900,7 @@ bool mod_freesync_is_valid_range(struct mod_freesync *mod_freesync,
 {
/* Calculate nominal field rate for stream */
unsigned long long nominal_field_rate_in_uhz =
-   calc_nominal_field_rate(stream);
+   mod_freesync_calc_nominal_field_rate(stream);
 
// Check nominal is within range
if (nominal_field_rate_in_uhz > max_refresh_cap_in_uhz ||
diff --git a/drivers/gpu/drm/amd/display/modules/inc/mod_freesync.h b/drivers/gpu/drm/amd/display/modules/inc/mod_freesync.h
index e7d77bb6209f..85c98afe9375 100644
--- a/drivers/gpu/drm/amd/display/modules/inc/mod_freesync.h
+++ b/drivers/gpu/drm/amd/display/modules/inc/mod_freesync.h
@@ -159,6 +159,9 @@ void mod_freesync_handle_v_update(struct mod_freesync *mod_freesync,
const struct dc_stream_state *stream,
struct mod_vrr_params *in_out_vrr);
 
+unsigned long long 

[PATCH 15/21] drm/amd/display: HDMI has no sound after Panel power off/on

2018-04-10 Thread Harry Wentland
From: Charlene Liu 

Signed-off-by: Charlene Liu 
Reviewed-by: Krunoslav Kovac 
Acked-by: Harry Wentland 
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/amd/display/dc/dce/dce_stream_encoder.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_stream_encoder.c b/drivers/gpu/drm/amd/display/dc/dce/dce_stream_encoder.c
index 07c32421c226..84e26c894046 100644
--- a/drivers/gpu/drm/amd/display/dc/dce/dce_stream_encoder.c
+++ b/drivers/gpu/drm/amd/display/dc/dce/dce_stream_encoder.c
@@ -718,6 +718,8 @@ static void dce110_stream_encoder_update_hdmi_info_packets(
if (info_frame->avi.valid) {
const uint32_t *content =
(const uint32_t *) &info_frame->avi.sb[0];
+   /*we need turn on clock before programming AFMT block*/
+   REG_UPDATE(AFMT_CNTL, AFMT_AUDIO_CLOCK_EN, 1);
 
REG_WRITE(AFMT_AVI_INFO0, content[0]);
 
-- 
2.15.1
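The fix sets `AFMT_AUDIO_CLOCK_EN` in `AFMT_CNTL` before the AFMT info-frame registers are written. `REG_UPDATE` is meant to change one field while leaving the rest of the register intact. Below is a minimal sketch of that read-modify-write on a plain value, using a hypothetical helper `reg_update_field()` and a made-up bit position. The real mask and shift of `AFMT_AUDIO_CLOCK_EN` are not given in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical read-modify-write of one field in a 32-bit register value:
 * clear the field, then OR in the new value, clipped to the field mask. */
static uint32_t reg_update_field(uint32_t reg_val, uint32_t mask,
				 uint8_t shift, uint32_t field_val)
{
	reg_val &= ~mask;
	reg_val |= (field_val << shift) & mask;
	return reg_val;
}
```

Clipping to the mask keeps an oversized `field_val` from spilling into neighbouring fields.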


[PATCH 19/21] drm/amd/display: add calculated clock logging to DTN

2018-04-10 Thread Harry Wentland
From: Dmytro Laktyushkin 

Signed-off-by: Dmytro Laktyushkin 
Reviewed-by: Dmytro Laktyushkin 
Acked-by: Harry Wentland 
---
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
index c9d4e96084b7..468113d49c95 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
@@ -311,7 +311,16 @@ void dcn10_log_hw_state(struct dc *dc)
print_rq_dlg_ttu_regs(dc_ctx, );
DTN_INFO("\n");
}
-   DTN_INFO("\n");
+
+   DTN_INFO("\nCALCULATED Clocks: dcfclk_khz:%d  dcfclk_deep_sleep_khz:%d  dispclk_khz:%d\n"
+   "dppclk_khz:%d  max_supported_dppclk_khz:%d  fclk_khz:%d  socclk_khz:%d\n\n",
+   dc->current_state->bw.dcn.calc_clk.dcfclk_khz,
+   dc->current_state->bw.dcn.calc_clk.dcfclk_deep_sleep_khz,
+   dc->current_state->bw.dcn.calc_clk.dispclk_khz,
+   dc->current_state->bw.dcn.calc_clk.dppclk_khz,
+   dc->current_state->bw.dcn.calc_clk.max_supported_dppclk_khz,
+   dc->current_state->bw.dcn.calc_clk.fclk_khz,
+   dc->current_state->bw.dcn.calc_clk.socclk_khz);
 
log_mpc_crc(dc);
 
-- 
2.15.1


[PATCH 00/21] DC Patches Apr 10, 2018

2018-04-10 Thread Harry Wentland
 * Fix audio enablement on HDMI after panel power off/on
 * Fix brightness after resume

Anthony Koo (7):
  drm/amd/display: add method to check for supported range
  drm/amd/display: Fix bug where refresh rate becomes fixed
  drm/amd/display: Fix bug that causes black screen
  drm/amd/display: Add back code to allow for rounding error
  drm/amd/display: Do not create memory allocation if stats not enabled
  drm/amd/display: fix LFC tearing at top of screen
  drm/amd/display: refactor vupdate interrupt registration

Charlene Liu (1):
  drm/amd/display: HDMI has no sound after Panel power off/on

Dmytro Laktyushkin (4):
  drm/amd/display: fix segfault on insufficient TG during validation
  drm/amd/display: change dml init to use default structs
  drm/amd/display: add rq/dlg/ttu to dtn log
  drm/amd/display: add calculated clock logging to DTN

Eric Bernstein (2):
  drm/amd/display: Move dp_pixel_encoding_type to stream_encoder include
  drm/amd/display: Move DCC support functions into dchubbub

Eric Yang (1):
  drm/amd/display: dal 3.1.42

Leo (Sunpeng) Li (1):
  drm/amd/display: Fix regamma not affecting full-intensity color values

Roman Li (1):
  drm/amd/display: fix brightness level after resume from suspend

Yongqiang Sun (3):
  drm/amd/display: Check lid state to determine fast boot optimization.
  drm/amd/display: Check SCRATCH reg to determine S3 resume.
  drm/amd/display: Use dig enable to determine fast boot optimization.

Yue Hin Lau (1):
  drm/amd/display: add missing colorspace for set black color

 .../gpu/drm/amd/display/dc/core/dc_hw_sequencer.c  |  21 +-
 drivers/gpu/drm/amd/display/dc/core/dc_link.c  |  13 ++
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c  |   2 +-
 drivers/gpu/drm/amd/display/dc/dc.h|   2 +-
 drivers/gpu/drm/amd/display/dc/dc_helper.c |  59 ++
 drivers/gpu/drm/amd/display/dc/dc_link.h   |   2 +
 .../gpu/drm/amd/display/dc/dce/dce_link_encoder.c  |   6 +-
 .../gpu/drm/amd/display/dc/dce/dce_link_encoder.h  |   2 +
 .../drm/amd/display/dc/dce/dce_stream_encoder.c|   2 +
 .../amd/display/dc/dce110/dce110_hw_sequencer.c|  43 ++--
 .../gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.c| 221 +++-
 .../gpu/drm/amd/display/dc/dcn10/dcn10_hubbub.h|   7 +-
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.c  | 153 +-
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hubp.h  |  19 +-
 .../drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c  | 123 ++-
 .../gpu/drm/amd/display/dc/dcn10/dcn10_resource.c  | 231 +
 .../gpu/drm/amd/display/dc/dml/display_mode_lib.c  | 138 ++--
 drivers/gpu/drm/amd/display/dc/inc/hw/dchubbub.h   |  64 ++
 drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h   |  20 ++
 drivers/gpu/drm/amd/display/dc/inc/hw/hw_shared.h  |  17 --
 .../gpu/drm/amd/display/dc/inc/hw/link_encoder.h   |   1 +
 .../gpu/drm/amd/display/dc/inc/hw/stream_encoder.h |  19 ++
 drivers/gpu/drm/amd/display/dc/inc/reg_helper.h|  56 +
 .../drm/amd/display/modules/freesync/freesync.c| 127 +++
 .../gpu/drm/amd/display/modules/inc/mod_freesync.h |  10 +
 drivers/gpu/drm/amd/display/modules/stats/stats.c  |  26 ++-
 26 files changed, 986 insertions(+), 398 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/display/dc/inc/hw/dchubbub.h

-- 
2.15.1


[PATCH 04/21] drm/amd/display: Fix regamma not affecting full-intensity color values

2018-04-10 Thread Harry Wentland
From: "Leo (Sunpeng) Li" 

Hardware understands the regamma LUT as a piecewise linear function,
with points spaced exponentially along the range. We previously
programmed the LUT for the range [2^-10, 2^0). This causes (normalized)
color values of 1 (=2^0) to miss the programmed LUT and fall into the
end region.

For DCE, the end region is extrapolated using a single (base, slope)
pair, with the max y-value from the last point in the curve as the base.
This presents a problem, since this value affects all three color
channels. Scaling down the intensity of, say, the blue regamma curve
will not affect its end region. This is especially noticeable when
using RedShift, which scales down the blue and green channels but leaves
full-intensity colors unshifted.

Therefore, extend the range to cover [2^-10, 2^1) by programming another
hardware segment, containing only one point. That way, we won't be
hitting the end region.

Note that things are a bit different for DCN, since the end region can
be set per-channel.

Signed-off-by: Leo (Sunpeng) Li 
Reviewed-by: Krunoslav Kovac 
Acked-by: Harry Wentland 
---
 drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
index 71e4812217bb..0ff2a8092782 100644
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
@@ -456,10 +456,13 @@ dce110_translate_regamma_to_hw_format(const struct dc_transfer_func *output_tf,
 
} else {
/* 10 segments
-* segment is from 2^-10 to 2^0
+* segment is from 2^-10 to 2^1
+* We include an extra segment for range [2^0, 2^1). This is to
+* ensure that colors with normalized values of 1 don't miss the
+* LUT.
 */
region_start = -10;
-   region_end = 0;
+   region_end = 1;
 
seg_distr[0] = 4;
seg_distr[1] = 4;
@@ -471,7 +474,7 @@ dce110_translate_regamma_to_hw_format(const struct dc_transfer_func *output_tf,
seg_distr[7] = 4;
seg_distr[8] = 4;
seg_distr[9] = 4;
-   seg_distr[10] = -1;
+   seg_distr[10] = 0;
seg_distr[11] = -1;
seg_distr[12] = -1;
seg_distr[13] = -1;
-- 
2.15.1


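To make the segment arithmetic in PATCH 04/21 concrete, here is a small sketch, assuming each hardware segment covers one power of two, [2^k, 2^(k+1)), as the commit message describes. The helper regamma_segment() is hypothetical and not part of the driver; it only shows that a normalized value of exactly 1.0 lies outside [2^-10, 2^0) but inside [2^-10, 2^1).

```c
#include <assert.h>

/*
 * Hypothetical helper: return the index (counted from region_start) of the
 * exponential segment [2^k, 2^(k+1)) that contains x, or -1 when x lies
 * outside [2^region_start, 2^region_end) and would fall into the end region.
 */
static int regamma_segment(double x, int region_start, int region_end)
{
	int k = 0;

	assert(x > 0.0);

	/* Normalize x into [1, 2) while tracking the exponent, so that
	 * k == floor(log2(x)) afterwards. Exact for powers of two. */
	while (x >= 2.0) {
		x /= 2.0;
		k++;
	}
	while (x < 1.0) {
		x *= 2.0;
		k--;
	}

	if (k < region_start || k >= region_end)
		return -1;	/* not covered by the programmed LUT */
	return k - region_start;
}
```

With region_end = 0, regamma_segment(1.0, -10, 0) returns -1 (end region); with region_end = 1 the same value lands in segment 10, the extra single-point segment the patch enables with seg_distr[10] = 0.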

[PATCH 21/21] drm/amd/display: Use dig enable to determine fast boot optimization.

2018-04-10 Thread Harry Wentland
From: Yongqiang Sun 

Linux doesn't know the lid state, so it is better to check the DIG
enable value from the register.

Signed-off-by: Yongqiang Sun 
Reviewed-by: Tony Cheng 
Acked-by: Harry Wentland 
---
 drivers/gpu/drm/amd/display/dc/dc_stream.h |  1 -
 .../gpu/drm/amd/display/dc/dce/dce_link_encoder.c  |  6 ++-
 .../gpu/drm/amd/display/dc/dce/dce_link_encoder.h  |  2 +
 .../amd/display/dc/dce110/dce110_hw_sequencer.c| 47 +++---
 .../gpu/drm/amd/display/dc/inc/hw/link_encoder.h   |  1 +
 5 files changed, 21 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dc_stream.h b/drivers/gpu/drm/amd/display/dc/dc_stream.h
index 5f215ca38c07..046e87aa699a 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_stream.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_stream.h
@@ -98,7 +98,6 @@ struct dc_stream_state {
int phy_pix_clk;
enum signal_type signal;
bool dpms_off;
-   bool lid_state_closed;
 
struct dc_stream_status status;
 
diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_link_encoder.c b/drivers/gpu/drm/amd/display/dc/dce/dce_link_encoder.c
index 8167cad7bcf7..dbe3b26b6d9e 100644
--- a/drivers/gpu/drm/amd/display/dc/dce/dce_link_encoder.c
+++ b/drivers/gpu/drm/amd/display/dc/dce/dce_link_encoder.c
@@ -113,6 +113,7 @@ static const struct link_encoder_funcs dce110_lnk_enc_funcs = {
.connect_dig_be_to_fe = dce110_link_encoder_connect_dig_be_to_fe,
.enable_hpd = dce110_link_encoder_enable_hpd,
.disable_hpd = dce110_link_encoder_disable_hpd,
+   .is_dig_enabled = dce110_is_dig_enabled,
.destroy = dce110_link_encoder_destroy
 };
 
@@ -535,8 +536,9 @@ void dce110_psr_program_secondary_packet(struct link_encoder *enc,
DP_SEC_GSP0_PRIORITY, 1);
 }
 
-static bool is_dig_enabled(const struct dce110_link_encoder *enc110)
+bool dce110_is_dig_enabled(struct link_encoder *enc)
 {
+   struct dce110_link_encoder *enc110 = TO_DCE110_LINK_ENC(enc);
uint32_t value;
 
REG_GET(DIG_BE_EN_CNTL, DIG_ENABLE, &value);
@@ -1031,7 +1033,7 @@ void dce110_link_encoder_disable_output(
struct bp_transmitter_control cntl = { 0 };
enum bp_result result;
 
-   if (!is_dig_enabled(enc110)) {
+   if (!dce110_is_dig_enabled(enc)) {
/* OF_SKIP_POWER_DOWN_INACTIVE_ENCODER */
return;
}
diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_link_encoder.h b/drivers/gpu/drm/amd/display/dc/dce/dce_link_encoder.h
index 0ec3433d34b6..347069461a22 100644
--- a/drivers/gpu/drm/amd/display/dc/dce/dce_link_encoder.h
+++ b/drivers/gpu/drm/amd/display/dc/dce/dce_link_encoder.h
@@ -263,4 +263,6 @@ void dce110_psr_program_dp_dphy_fast_training(struct link_encoder *enc,
 void dce110_psr_program_secondary_packet(struct link_encoder *enc,
unsigned int sdp_transmit_line_num_deadline);
 
+bool dce110_is_dig_enabled(struct link_encoder *enc);
+
 #endif /* __DC_LINK_ENCODER__DCE110_H__ */
diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
index 9e1a8823d3d8..5dbd4335cd6e 100644
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
@@ -1481,15 +1481,15 @@ static void disable_vga_and_power_gate_all_controllers(
}
 }
 
-static bool is_eDP_lid_closed(struct dc_state *context)
+static struct dc_link *get_link_for_edp(struct dc *dc)
 {
int i;
 
-   for (i = 0; i < context->stream_count; i++) {
-   if (context->streams[i]->signal == SIGNAL_TYPE_EDP)
-   return context->streams[i]->lid_state_closed;
+   for (i = 0; i < dc->link_count; i++) {
+   if (dc->links[i]->connector_signal == SIGNAL_TYPE_EDP)
+   return dc->links[i];
}
-   return false;
+   return NULL;
 }
 
 static struct dc_link *get_link_for_edp_not_in_use(
@@ -1526,41 +1526,22 @@ static struct dc_link *get_link_for_edp_not_in_use(
  */
 void dce110_enable_accelerated_mode(struct dc *dc, struct dc_state *context)
 {
-   /* check eDP lid state and BIOS_SCRATCH_3 to determine fast boot optimization
-* UEFI boot
-*  edp_active_status_from_scratch  fast boot optimization
-* S4/S5 resume:
-* Lid Open   true    true
-* Lid Close  false   false
-*
-* S3/ resume:
-* Lid Open   false   false
-* Lid Close  false   false
-*
-* Legacy boot:
-*  

[PATCH 20/21] drm/amd/display: add missing colorspace for set black color

2018-04-10 Thread Harry Wentland
From: Yue Hin Lau 

Signed-off-by: Yue Hin Lau 
Reviewed-by: Tony Cheng 
Acked-by: Harry Wentland 
---
 .../gpu/drm/amd/display/dc/core/dc_hw_sequencer.c   | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/core/dc_hw_sequencer.c
index ebc96b720083..ab50b5f0745c 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_hw_sequencer.c
@@ -208,6 +208,7 @@ void color_space_to_black_color(
case COLOR_SPACE_YCBCR709:
case COLOR_SPACE_YCBCR601_LIMITED:
case COLOR_SPACE_YCBCR709_LIMITED:
+   case COLOR_SPACE_2020_YCBCR:
*black_color = black_color_format[BLACK_COLOR_FORMAT_YUV_CV];
break;
 
@@ -216,7 +217,25 @@ void color_space_to_black_color(
black_color_format[BLACK_COLOR_FORMAT_RGB_LIMITED];
break;
 
-   default:
+   /**
+* Remove default and add case for all color space
+* so when we forget to add new color space
+* compiler will give a warning
+*/
+   case COLOR_SPACE_UNKNOWN:
+   case COLOR_SPACE_SRGB:
+   case COLOR_SPACE_XR_RGB:
+   case COLOR_SPACE_MSREF_SCRGB:
+   case COLOR_SPACE_XV_YCC_709:
+   case COLOR_SPACE_XV_YCC_601:
+   case COLOR_SPACE_2020_RGB_FULLRANGE:
+   case COLOR_SPACE_2020_RGB_LIMITEDRANGE:
+   case COLOR_SPACE_ADOBERGB:
+   case COLOR_SPACE_DCIP3:
+   case COLOR_SPACE_DISPLAYNATIVE:
+   case COLOR_SPACE_DOLBYVISION:
+   case COLOR_SPACE_APPCTRL:
+   case COLOR_SPACE_CUSTOMPOINTS:
/* fefault is sRGB black (full range). */
*black_color =
black_color_format[BLACK_COLOR_FORMAT_RGB_FULLRANGE];
-- 
2.15.1


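PATCH 20/21 replaces the default label with an explicit case for every color space so that the compiler warns when a new one is added. With GCC, -Wswitch (enabled by -Wall) warns when a switch on an enum value lacks a case for one of its enumerators, and the presence of a default label suppresses that warning. A minimal sketch of the pattern, using hypothetical demo_ names rather than the real DC enums:

```c
#include <assert.h>

/* Hypothetical, trimmed-down stand-ins for the DC color space enum. */
enum demo_color_space {
	DEMO_CS_UNKNOWN,
	DEMO_CS_SRGB,
	DEMO_CS_YCBCR709,
	DEMO_CS_2020_YCBCR,
};

enum demo_black_format {
	DEMO_BLACK_RGB_FULLRANGE,
	DEMO_BLACK_YUV_CV,
};

static enum demo_black_format demo_black_format(enum demo_color_space cs)
{
	/* No default label: if an enumerator is added to demo_color_space
	 * and not listed here, -Wswitch reports it at compile time. */
	switch (cs) {
	case DEMO_CS_YCBCR709:
	case DEMO_CS_2020_YCBCR:
		return DEMO_BLACK_YUV_CV;
	case DEMO_CS_UNKNOWN:
	case DEMO_CS_SRGB:
		return DEMO_BLACK_RGB_FULLRANGE;
	}

	/* Only reachable with an out-of-range value. */
	assert(0 && "invalid color space");
	return DEMO_BLACK_RGB_FULLRANGE;
}
```

The trailing return after the switch keeps the function well-defined for out-of-range values without reintroducing a default label.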

[PATCH 05/21] drm/amd/display: add method to check for supported range

2018-04-10 Thread Harry Wentland
From: Anthony Koo 

Signed-off-by: Anthony Koo 
Reviewed-by: Aric Cyr 
Acked-by: Harry Wentland 
---
 .../drm/amd/display/modules/freesync/freesync.c| 64 --
 .../gpu/drm/amd/display/modules/inc/mod_freesync.h |  7 +++
 2 files changed, 65 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
index 5e12e463c06a..4af73a72b9a9 100644
--- a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
+++ b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
@@ -168,6 +168,21 @@ static unsigned int calc_v_total_from_duration(
return v_total;
 }
 
+static unsigned long long calc_nominal_field_rate(const struct dc_stream_state *stream)
+{
+   unsigned long long nominal_field_rate_in_uhz = 0;
+
+   /* Calculate nominal field rate for stream */
+   nominal_field_rate_in_uhz = stream->timing.pix_clk_khz;
+   nominal_field_rate_in_uhz *= 1000ULL * 1000ULL * 1000ULL;
+   nominal_field_rate_in_uhz = div_u64(nominal_field_rate_in_uhz,
+   stream->timing.h_total);
+   nominal_field_rate_in_uhz = div_u64(nominal_field_rate_in_uhz,
+   stream->timing.v_total);
+
+   return nominal_field_rate_in_uhz;
+}
+
 static void update_v_total_for_static_ramp(
struct core_freesync *core_freesync,
const struct dc_stream_state *stream,
@@ -623,12 +638,7 @@ void mod_freesync_build_vrr_params(struct mod_freesync *mod_freesync,
core_freesync = MOD_FREESYNC_TO_CORE(mod_freesync);
 
/* Calculate nominal field rate for stream */
-   nominal_field_rate_in_uhz = stream->timing.pix_clk_khz;
-   nominal_field_rate_in_uhz *= 1000ULL * 1000ULL * 1000ULL;
-   nominal_field_rate_in_uhz = div_u64(nominal_field_rate_in_uhz,
-   stream->timing.h_total);
-   nominal_field_rate_in_uhz = div_u64(nominal_field_rate_in_uhz,
-   stream->timing.v_total);
+   nominal_field_rate_in_uhz = calc_nominal_field_rate(stream);
 
min_refresh_in_uhz = in_config->min_refresh_in_uhz;
max_refresh_in_uhz = in_config->max_refresh_in_uhz;
@@ -878,3 +888,45 @@ void mod_freesync_get_settings(struct mod_freesync *mod_freesync,
}
 }
 
+bool mod_freesync_is_valid_range(struct mod_freesync *mod_freesync,
+   const struct dc_stream_state *stream,
+   uint32_t min_refresh_cap_in_uhz,
+   uint32_t max_refresh_cap_in_uhz,
+   uint32_t min_refresh_request_in_uhz,
+   uint32_t max_refresh_request_in_uhz)
+{
+   /* Calculate nominal field rate for stream */
+   unsigned long long nominal_field_rate_in_uhz =
+   calc_nominal_field_rate(stream);
+
+   // Check nominal is within range
+   if (nominal_field_rate_in_uhz > max_refresh_cap_in_uhz ||
+   nominal_field_rate_in_uhz < min_refresh_cap_in_uhz)
+   return false;
+
+   // If nominal is less than max, limit the max allowed refresh rate
+   if (nominal_field_rate_in_uhz < max_refresh_cap_in_uhz)
+   max_refresh_cap_in_uhz = nominal_field_rate_in_uhz;
+
+   // Don't allow min > max
+   if (min_refresh_request_in_uhz > max_refresh_request_in_uhz)
+   return false;
+
+   // Check min is within range
+   if (min_refresh_request_in_uhz > max_refresh_cap_in_uhz ||
+   min_refresh_request_in_uhz < min_refresh_cap_in_uhz)
+   return false;
+
+   // Check max is within range
+   if (max_refresh_request_in_uhz > max_refresh_cap_in_uhz ||
+   max_refresh_request_in_uhz < min_refresh_cap_in_uhz)
+   return false;
+
+   // For variable range, check for at least 10 Hz range
+   if ((max_refresh_request_in_uhz != min_refresh_request_in_uhz) &&
+   (max_refresh_request_in_uhz - min_refresh_request_in_uhz < 10000000))
+   return false;
+
+   return true;
+}
+
diff --git a/drivers/gpu/drm/amd/display/modules/inc/mod_freesync.h b/drivers/gpu/drm/amd/display/modules/inc/mod_freesync.h
index bd75ca5f1cd3..e7d77bb6209f 100644
--- a/drivers/gpu/drm/amd/display/modules/inc/mod_freesync.h
+++ b/drivers/gpu/drm/amd/display/modules/inc/mod_freesync.h
@@ -159,4 +159,11 @@ void mod_freesync_handle_v_update(struct mod_freesync *mod_freesync,
const struct dc_stream_state *stream,
struct mod_vrr_params *in_out_vrr);
 
+bool mod_freesync_is_valid_range(struct mod_freesync *mod_freesync,
+   const struct dc_stream_state *stream,
+   uint32_t min_refresh_cap_in_uhz,
+   uint32_t max_refresh_cap_in_uhz,
+   uint32_t 

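PATCH 05/21 factors the nominal field rate into calc_nominal_field_rate(): the pixel clock in kHz is scaled by 10^9 to micro-Hz, then divided by h_total and v_total. The sketch below mirrors that arithmetic and the sequence of checks in mod_freesync_is_valid_range() in portable C, with plain 64-bit division in place of the kernel's div_u64() and uint64_t for every rate (the patch uses uint32_t for the limits). Both function names are hypothetical, and the minimum variable span is written as 10,000,000 µHz to match the "at least 10 Hz" comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 10 Hz expressed in micro-Hz: minimum span for a variable range. */
#define MIN_VARIABLE_SPAN_UHZ 10000000ULL

/* Pixel clock (kHz) / (h_total * v_total), in micro-Hz. */
static uint64_t nominal_field_rate_uhz(uint32_t pix_clk_khz,
				       uint32_t h_total, uint32_t v_total)
{
	uint64_t rate = (uint64_t)pix_clk_khz * 1000ULL * 1000ULL * 1000ULL;

	assert(h_total != 0 && v_total != 0);
	rate /= h_total;
	rate /= v_total;
	return rate;
}

/* Same order of checks as mod_freesync_is_valid_range(), all in uHz. */
static bool range_is_valid(uint64_t nominal_uhz,
			   uint64_t min_cap, uint64_t max_cap,
			   uint64_t min_req, uint64_t max_req)
{
	/* Nominal rate must lie within the sink's capabilities. */
	if (nominal_uhz > max_cap || nominal_uhz < min_cap)
		return false;
	/* If nominal is below the cap, it becomes the effective maximum. */
	if (nominal_uhz < max_cap)
		max_cap = nominal_uhz;
	if (min_req > max_req)
		return false;
	if (min_req > max_cap || min_req < min_cap)
		return false;
	if (max_req > max_cap || max_req < min_cap)
		return false;
	/* A variable (non-fixed) request must span at least 10 Hz. */
	if (max_req != min_req && max_req - min_req < MIN_VARIABLE_SPAN_UHZ)
		return false;
	return true;
}
```

With the CEA-861 1080p60 timing (148,500 kHz pixel clock, 2200 x 1125 total) the nominal rate comes out at exactly 60,000,000 µHz, so a 40 to 60 Hz request against 40 to 75 Hz caps passes, while 40 to 70 Hz exceeds the nominal rate and 55 to 60 Hz is narrower than 10 Hz.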
[PATCH 09/21] drm/amd/display: change dml init to use default structs

2018-04-10 Thread Harry Wentland
From: Dmytro Laktyushkin 

Signed-off-by: Dmytro Laktyushkin 
Reviewed-by: Eric Bernstein 
Acked-by: Harry Wentland 
---
 .../gpu/drm/amd/display/dc/dml/display_mode_lib.c  | 138 -
 1 file changed, 76 insertions(+), 62 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/display_mode_lib.c b/drivers/gpu/drm/amd/display/dc/dml/display_mode_lib.c
index c109b2c34c8f..fd9d97aab071 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/display_mode_lib.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/display_mode_lib.c
@@ -26,75 +26,89 @@
 #include "display_mode_lib.h"
 #include "dc_features.h"
 
+static const struct _vcs_dpi_ip_params_st dcn1_0_ip = {
+   .rob_buffer_size_kbytes = 64,
+   .det_buffer_size_kbytes = 164,
+   .dpte_buffer_size_in_pte_reqs = 42,
+   .dpp_output_buffer_pixels = 2560,
+   .opp_output_buffer_lines = 1,
+   .pixel_chunk_size_kbytes = 8,
+   .pte_enable = 1,
+   .pte_chunk_size_kbytes = 2,
+   .meta_chunk_size_kbytes = 2,
+   .writeback_chunk_size_kbytes = 2,
+   .line_buffer_size_bits = 589824,
+   .max_line_buffer_lines = 12,
+   .IsLineBufferBppFixed = 0,
+   .LineBufferFixedBpp = -1,
+   .writeback_luma_buffer_size_kbytes = 12,
+   .writeback_chroma_buffer_size_kbytes = 8,
+   .max_num_dpp = 4,
+   .max_num_wb = 2,
+   .max_dchub_pscl_bw_pix_per_clk = 4,
+   .max_pscl_lb_bw_pix_per_clk = 2,
+   .max_lb_vscl_bw_pix_per_clk = 4,
+   .max_vscl_hscl_bw_pix_per_clk = 4,
+   .max_hscl_ratio = 4,
+   .max_vscl_ratio = 4,
+   .hscl_mults = 4,
+   .vscl_mults = 4,
+   .max_hscl_taps = 8,
+   .max_vscl_taps = 8,
+   .dispclk_ramp_margin_percent = 1,
+   .underscan_factor = 1.10,
+   .min_vblank_lines = 14,
+   .dppclk_delay_subtotal = 90,
+   .dispclk_delay_subtotal = 42,
+   .dcfclk_cstate_latency = 10,
+   .max_inter_dcn_tile_repeaters = 8,
+   .can_vstartup_lines_exceed_vsync_plus_back_porch_lines_minus_one = 0,
+   .bug_forcing_LC_req_same_size_fixed = 0,
+};
+
+static const struct _vcs_dpi_soc_bounding_box_st dcn1_0_soc = {
+   .sr_exit_time_us = 9.0,
+   .sr_enter_plus_exit_time_us = 11.0,
+   .urgent_latency_us = 4.0,
+   .writeback_latency_us = 12.0,
+   .ideal_dram_bw_after_urgent_percent = 80.0,
+   .max_request_size_bytes = 256,
+   .downspread_percent = 0.5,
+   .dram_page_open_time_ns = 50.0,
+   .dram_rw_turnaround_time_ns = 17.5,
+   .dram_return_buffer_per_channel_bytes = 8192,
+   .round_trip_ping_latency_dcfclk_cycles = 128,
+   .urgent_out_of_order_return_per_channel_bytes = 256,
+   .channel_interleave_bytes = 256,
+   .num_banks = 8,
+   .num_chans = 2,
+   .vmm_page_size_bytes = 4096,
+   .dram_clock_change_latency_us = 17.0,
+   .writeback_dram_clock_change_latency_us = 23.0,
+   .return_bus_width_bytes = 64,
+};
+
 static void set_soc_bounding_box(struct _vcs_dpi_soc_bounding_box_st *soc, enum dml_project project)
 {
-   if (project == DML_PROJECT_RAVEN1) {
-   soc->sr_exit_time_us = 9.0;
-   soc->sr_enter_plus_exit_time_us = 11.0;
-   soc->urgent_latency_us = 4.0;
-   soc->writeback_latency_us = 12.0;
-   soc->ideal_dram_bw_after_urgent_percent = 80.0;
-   soc->max_request_size_bytes = 256;
-   soc->downspread_percent = 0.5;
-   soc->dram_page_open_time_ns = 50.0;
-   soc->dram_rw_turnaround_time_ns = 17.5;
-   soc->dram_return_buffer_per_channel_bytes = 8192;
-   soc->round_trip_ping_latency_dcfclk_cycles = 128;
-   soc->urgent_out_of_order_return_per_channel_bytes = 256;
-   soc->channel_interleave_bytes = 256;
-   soc->num_banks = 8;
-   soc->num_chans = 2;
-   soc->vmm_page_size_bytes = 4096;
-   soc->dram_clock_change_latency_us = 17.0;
-   soc->writeback_dram_clock_change_latency_us = 23.0;
-   soc->return_bus_width_bytes = 64;
-   } else {
-   BREAK_TO_DEBUGGER(); /* Invalid Project Specified */
+   switch (project) {
+   case DML_PROJECT_RAVEN1:
+   *soc = dcn1_0_soc;
+   break;
+   default:
+   ASSERT(0);
+   break;
}
 }
 
 static void set_ip_params(struct _vcs_dpi_ip_params_st *ip, enum dml_project project)
 {
-   if (project == DML_PROJECT_RAVEN1) {
-   ip->rob_buffer_size_kbytes = 64;
-   ip->det_buffer_size_kbytes = 164;
-   ip->dpte_buffer_size_in_pte_reqs = 42;
-   ip->dpp_output_buffer_pixels = 2560;
-   ip->opp_output_buffer_lines = 1;
-   ip->pixel_chunk_size_kbytes = 8;
-   ip->pte_enable = 1;
-   

[PATCH 16/21] drm/amd/display: refactor vupdate interrupt registration

2018-04-10 Thread Harry Wentland
From: Anthony Koo 

We only need to register once the OS calls the interrupt control.
Also, if we are entering static screen mode, disable after ramping is done.
To keep things simple, the disable is done via a 2-second timer,
regardless of whether ramping has completed.

Also, ramp to mid instead of min, for better flicker performance.

Signed-off-by: Anthony Koo 
Reviewed-by: Aric Cyr 
Acked-by: Harry Wentland 
---
 .../gpu/drm/amd/display/modules/freesync/freesync.c   | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
index daad60ec1ce3..349387eb9fe6 100644
--- a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
+++ b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
@@ -109,12 +109,6 @@ static unsigned int calc_duration_in_us_from_v_total(
* 1000) * stream->timing.h_total,
stream->timing.pix_clk_khz));
 
-   if (duration_in_us < in_vrr->min_duration_in_us)
-   duration_in_us = in_vrr->min_duration_in_us;
-
-   if (duration_in_us > in_vrr->max_duration_in_us)
-   duration_in_us = in_vrr->max_duration_in_us;
-
return duration_in_us;
 }
 
@@ -230,10 +224,9 @@ static void update_v_total_for_static_ramp(
}
}
 
-   v_total = calc_v_total_from_duration(stream,
-   in_out_vrr,
-   current_duration_in_us);
-
+   v_total = div64_u64(div64_u64(((unsigned long long)(
+   current_duration_in_us) * stream->timing.pix_clk_khz),
+   stream->timing.h_total), 1000);
 
in_out_vrr->adjust.v_total_min = v_total;
in_out_vrr->adjust.v_total_max = v_total;
@@ -702,7 +695,11 @@ void mod_freesync_build_vrr_params(struct mod_freesync *mod_freesync,
} else if (in_out_vrr->state == VRR_STATE_ACTIVE_FIXED) {
in_out_vrr->fixed.target_refresh_in_uhz =
in_out_vrr->min_refresh_in_uhz;
-   if (in_out_vrr->fixed.ramping_active) {
+   if (in_out_vrr->fixed.ramping_active &&
+   in_out_vrr->fixed.fixed_active) {
+   /* Do not update vtotals if ramping is already active
+* in order to continue ramp from current refresh.
+*/
in_out_vrr->fixed.fixed_active = true;
} else {
in_out_vrr->fixed.fixed_active = true;
-- 
2.15.1


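PATCH 16/21 replaces the call to calc_v_total_from_duration() with an inline expression: the duration in microseconds times the pixel clock in kHz, divided by h_total and then by 1000. Microseconds times kilohertz is a pixel count scaled by 1000, which the final division undoes. A portable sketch of the same arithmetic, with plain 64-bit division standing in for div64_u64() and a hypothetical function name; the example timing (148.5 MHz pixel clock, h_total 2200) is the standard 1080p60 timing, used only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Lines needed to fill duration_us at the given pixel clock and h_total:
 * duration_us * pix_clk_khz / h_total / 1000, in the same order. */
static uint32_t v_total_from_duration_us(uint32_t duration_us,
					 uint32_t pix_clk_khz,
					 uint32_t h_total)
{
	uint64_t lines;

	assert(h_total != 0);
	/* us * kHz == pixels * 1000; widen first so the product cannot
	 * overflow 32 bits. */
	lines = (uint64_t)duration_us * pix_clk_khz;
	lines /= h_total;
	lines /= 1000;
	return (uint32_t)lines;
}
```

A 16,667 µs duration (about 60 Hz) gives 1125 lines, the nominal v_total of that mode, and 20,833 µs (about 48 Hz) gives 1406 lines.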

[PATCH 01/21] drm/amd/display: dal 3.1.42

2018-04-10 Thread Harry Wentland
From: Eric Yang 

Signed-off-by: Eric Yang 
Reviewed-by: Anthony Koo 
Acked-by: Harry Wentland 
---
 drivers/gpu/drm/amd/display/dc/dc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dc.h b/drivers/gpu/drm/amd/display/dc/dc.h
index 0f566a1ba35b..7ac8a1bee5ac 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -38,7 +38,7 @@
 #include "inc/compressor.h"
 #include "dml/display_mode_lib.h"
 
-#define DC_VER "3.1.41"
+#define DC_VER "3.1.42"
 
 #define MAX_SURFACES 3
 #define MAX_STREAMS 6
-- 
2.15.1



[PATCH 14/21] drm/amd/display: fix LFC tearing at top of screen

2018-04-10 Thread Harry Wentland
From: Anthony Koo 

Tearing occurred because the new VTOTAL MIN/MAX was being programmed
too early.
The flip can happen within the VUPDATE high region, and the new min/max
would take effect immediately. But this means the frame is no longer
variable, and tearing occurs when the flip actually happens.

The fixed insert duration should be programmed on the first VUPDATE
interrupt instead.

Signed-off-by: Anthony Koo 
Reviewed-by: Aric Cyr 
Acked-by: Harry Wentland 
---
 drivers/gpu/drm/amd/display/modules/freesync/freesync.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
index abd5c9374eb3..daad60ec1ce3 100644
--- a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
+++ b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
@@ -371,12 +371,6 @@ static void apply_below_the_range(struct core_freesync *core_freesync,
inserted_frame_duration_in_us;
in_out_vrr->btr.frames_to_insert = frames_to_insert;
in_out_vrr->btr.frame_counter = frames_to_insert;
-
-   in_out_vrr->adjust.v_total_min =
-   calc_v_total_from_duration(stream, in_out_vrr,
-   in_out_vrr->btr.inserted_duration_in_us);
-   in_out_vrr->adjust.v_total_max =
-   in_out_vrr->adjust.v_total_min;
}
 }
 
-- 
2.15.1



[PATCH 10/21] drm/amd/display: Add back code to allow for rounding error

2018-04-10 Thread Harry Wentland
From: Anthony Koo 

Signed-off-by: Anthony Koo 
Reviewed-by: Aric Cyr 
Acked-by: Harry Wentland 
---
 drivers/gpu/drm/amd/display/modules/freesync/freesync.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
index 4887c888bbe7..abd5c9374eb3 100644
--- a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
+++ b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
@@ -896,6 +896,17 @@ bool mod_freesync_is_valid_range(struct mod_freesync *mod_freesync,
unsigned long long nominal_field_rate_in_uhz =
mod_freesync_calc_nominal_field_rate(stream);
 
+   /* Allow for some rounding error of actual video timing by taking ceil.
+* For example, 144 Hz mode timing may actually be 143.xxx Hz when
+* calculated from pixel rate and vertical/horizontal totals, but
+* this should be allowed instead of blocking FreeSync.
+*/
+   nominal_field_rate_in_uhz = div_u64(nominal_field_rate_in_uhz, 1000000);
+   min_refresh_cap_in_uhz /= 1000000;
+   max_refresh_cap_in_uhz /= 1000000;
+   min_refresh_request_in_uhz /= 1000000;
+   max_refresh_request_in_uhz /= 1000000;
+
// Check nominal is within range
if (nominal_field_rate_in_uhz > max_refresh_cap_in_uhz ||
nominal_field_rate_in_uhz < min_refresh_cap_in_uhz)
@@ -921,7 +932,7 @@ bool mod_freesync_is_valid_range(struct mod_freesync *mod_freesync,
 
// For variable range, check for at least 10 Hz range
if ((max_refresh_request_in_uhz != min_refresh_request_in_uhz) &&
-   (max_refresh_request_in_uhz - min_refresh_request_in_uhz < 10000000))
+   (max_refresh_request_in_uhz - min_refresh_request_in_uhz < 10))
return false;
 
return true;
-- 
2.15.1


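PATCH 10/21 divides the nominal rate and all four limits down to whole hertz before comparing them. Unsigned integer division truncates, so each value is effectively floored rather than rounded up. A minimal sketch of one case this tolerance absorbs: a nominal rate a fraction of a hertz above a 144 Hz cap. The helper name and the sample values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UHZ_PER_HZ 1000000ULL

/* Compare a nominal rate against a maximum cap after truncating both
 * from micro-Hz to whole Hz. */
static bool nominal_within_max_whole_hz(uint64_t nominal_uhz,
					uint64_t max_cap_uhz)
{
	assert(UHZ_PER_HZ != 0);
	return nominal_uhz / UHZ_PER_HZ <= max_cap_uhz / UHZ_PER_HZ;
}
```

A nominal rate of 144,000,500 µHz fails a direct micro-Hz comparison against a 144,000,000 µHz cap, but both truncate to 144 Hz and pass; a full hertz over (145 Hz) is still rejected.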

[PATCH 12/21] drm/amd/display: Do not create memory allocation if stats not enabled

2018-04-10 Thread Harry Wentland
From: Anthony Koo 

Signed-off-by: Anthony Koo 
Reviewed-by: Aric Cyr 
Acked-by: Harry Wentland 
---
 drivers/gpu/drm/amd/display/modules/stats/stats.c | 26 +--
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/modules/stats/stats.c b/drivers/gpu/drm/amd/display/modules/stats/stats.c
index ed5f6809a64e..48e02197919f 100644
--- a/drivers/gpu/drm/amd/display/modules/stats/stats.c
+++ b/drivers/gpu/drm/amd/display/modules/stats/stats.c
@@ -115,18 +115,22 @@ struct mod_stats *mod_stats_create(struct dc *dc)
_data, sizeof(unsigned int), ))
core_stats->enabled = reg_data;
 
-   core_stats->entries = DAL_STATS_ENTRIES_REGKEY_DEFAULT;
-   if (dm_read_persistent_data(dc->ctx, NULL, NULL,
-   DAL_STATS_ENTRIES_REGKEY,
-   _data, sizeof(unsigned int), )) {
-   if (reg_data > DAL_STATS_ENTRIES_REGKEY_MAX)
-   core_stats->entries = DAL_STATS_ENTRIES_REGKEY_MAX;
-   else
-   core_stats->entries = reg_data;
-   }
+   if (core_stats->enabled) {
+   core_stats->entries = DAL_STATS_ENTRIES_REGKEY_DEFAULT;
+   if (dm_read_persistent_data(dc->ctx, NULL, NULL,
+   DAL_STATS_ENTRIES_REGKEY,
+   _data, sizeof(unsigned int), )) {
+   if (reg_data > DAL_STATS_ENTRIES_REGKEY_MAX)
+   core_stats->entries = DAL_STATS_ENTRIES_REGKEY_MAX;
+   else
+   core_stats->entries = reg_data;
+   }
 
-   core_stats->time = kzalloc(sizeof(struct stats_time_cache) * core_stats->entries,
-   GFP_KERNEL);
+   core_stats->time = kzalloc(sizeof(struct stats_time_cache) * core_stats->entries,
+   GFP_KERNEL);
+   } else {
+   core_stats->entries = 0;
+   }
 
if (core_stats->time == NULL)
goto fail_construct;
-- 
2.15.1



[PATCH] drm/amdkfd: fix clock counter retrieval for node without GPU

2018-04-10 Thread Felix Kuehling
From: Andres Rodriguez 

Currently, if a user requests clock counters for a node without a GPU
resource, we always return EINVAL.

Instead, if no GPU resource is attached, fill the gpu_clock_counter
argument with zero so that we may proceed and return valid CPU
counters.

Signed-off-by: Andres Rodriguez 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index cd679cf..b5e5f0e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -749,12 +749,13 @@ static int kfd_ioctl_get_clock_counters(struct file *filep,
struct timespec64 time;
 
dev = kfd_device_by_id(args->gpu_id);
-   if (dev == NULL)
-   return -EINVAL;
-
-   /* Reading GPU clock counter from KGD */
-   args->gpu_clock_counter =
-   dev->kfd2kgd->get_gpu_clock_counter(dev->kgd);
+   if (dev)
+   /* Reading GPU clock counter from KGD */
+   args->gpu_clock_counter =
+   dev->kfd2kgd->get_gpu_clock_counter(dev->kgd);
+   else
+   /* Node without GPU resource */
+   args->gpu_clock_counter = 0;
 
/* No access to rdtsc. Using raw monotonic time */
getrawmonotonic64(&time);
-- 
2.7.4



[PATCH] drm/amd/display: Fix 64-bit division in hwss_edp_power_control

2018-04-10 Thread Harry Wentland
Signed-off-by: Harry Wentland 
---
 drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
index f32ccdfd18a3..3ba057e2a467 100644
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
@@ -860,7 +860,7 @@ void hwss_edp_power_control(
dm_get_elapse_time_in_ns(
ctx,
current_ts,
-   link->link_trace.time_stamp.edp_poweroff) / 1000000;
+   div64_u64(link->link_trace.time_stamp.edp_poweroff, 1000000));
unsigned long long wait_time_ms = 0;
 
/* max 500ms from LCDVDD off to on */
-- 
2.15.1



RE: RFC for a render API to support adaptive sync and VRR

2018-04-10 Thread Cyr, Aric
> From: Haehnle, Nicolai
> Sent: Tuesday, April 10, 2018 13:48
> On 10.04.2018 19:25, Cyr, Aric wrote:
> >> -----Original Message-----
> >> From: Michel Dänzer [mailto:mic...@daenzer.net]
> >> Sent: Tuesday, April 10, 2018 13:16
> >>
> >> On 2018-04-10 07:13 PM, Cyr, Aric wrote:
>  -----Original Message-----
>  From: Michel Dänzer [mailto:mic...@daenzer.net]
>  Sent: Tuesday, April 10, 2018 13:06
>  On 2018-04-10 06:26 PM, Cyr, Aric wrote:
> > From: Koenig, Christian Sent: Tuesday, April 10, 2018 11:43
> >
> >> For video games we have a similar situation where a frame is rendered
> >> for a certain world time and in the ideal case we would actually
> >> display the frame at this world time.
> >
> > That seems like it would be a poorly written game that flips like
> > that, unless they are explicitly trying to throttle the framerate for
> > some reason.  When a game presents a completed frame, they’d like
> > that to happen as soon as possible.
> 
>  What you're describing is what most games have been doing traditionally.
>  Croteam's research shows that this results in micro-stuttering, because
>  frames may be presented too early. To avoid that, they want to
>  explicitly time each presentation as described by Christian.
> >>>
> >>> Yes, I agree completely.  However that's only truly relevant for fixed
> >>> refresh rate displays.
> >>
> >> No, it also affects variable refresh; possibly even more in some cases,
> >> because the presentation time is less predictable.
> >
> > Yes, and that's why you don't want to do it when you have variable refresh.
> > The hardware in the monitor and GPU will do it for you, so why bother?
> 
> I think Michel's point is that the monitor and GPU hardware *cannot*
> really do this, because there's synchronization with audio to take into
> account, which the GPU or monitor don't know about.

How does it work fine today, given that all the kernel seems to know is
'current' or 'current+1' vsyncs?
Presumably the applications somehow schedule all this just fine.
If this works without variable refresh for 60Hz, will it not work for a
fixed-rate "48Hz" monitor (assuming a 24Hz video)?

> Also, as I wrote separately, there's the case of synchronizing multiple
> monitors.

For multimonitor to work with VRR, the displays will have to be timing and
flip synchronized.
This is impossible for an application to manage; it needs driver/HW control,
or you end up with one display flipping before the other, and it looks
terrible.
And definitely forget about multiGPU without professional workstation-type
support to sync the displays across adapters.

> > The input to their algorithms will be noisy, causing worse estimations. If
> > you just present as fast as you can, it'll just work (within reason).
> > The majority of gamers want maximum FPS for their games, and there's quite
> > frequently outrage at a particular game when they are limited to something
> > lower than what their monitor could otherwise support (i.e. I don't want my
> > game limited to 30Hz if I have a shiny 144Hz gaming display I paid good
> > money for). Of course, there are always exceptions... but in our experience
> > those are few and far between.
> 
> I agree that games most likely shouldn't try to be smart. I'm curious
> about the Croteam findings, but even if they did a really clever thing
> that works better than just telling the display driver "display ASAP
> please", chances are that *most* developers won't do that. And they'll
> most likely get it wrong, so our guidance should really be "games should
> ask for ASAP presentation, and nothing else".

Right, I think this is the 'easy' case and is covered in Harry's initial
proposal when target_frame_duration_ns = 0.

> However, there *are* legitimate use cases for requesting a specific
> presentation time, and there *is* precedent of APIs that expose such
> features.
>
> Are there any real problems with exposing an absolute target present time?

Realistically, how far into the future are you requesting a presentation time?
Won't it almost always be something like current_time+1000/video_frame_rate?
If so, why not just tell the driver to set 1000/video_frame_rate and have the
GPU/monitor create nicely spaced VSYNCs for you that match the source content?

In fact, you probably wouldn't even need to change your video player at all,
other than having it pass the target_frame_duration_ns. You could consider
this a 'hint', as you suggested, since it cannot be guaranteed in cases where
your driver or HW doesn't support variable refresh. If the
target_frame_duration_ns hint is supported/applied, then the video app should
have nothing extra to do that it wouldn't already do for any arbitrary
fixed-refresh rate display. If it is not supported (say the drm_atomic_check
fails with -EINVAL or something), the video app can stop requesting a fixed
target_frame_duration_ns.

A fundamental problem 

Re: [PATCH] drm/amd/display: Don't spam debug messages

2018-04-10 Thread Leo Li



On 2018-04-10 04:44 PM, Harry Wentland wrote:

Ping

On 2018-04-09 02:06 PM, Harry Wentland wrote:

Signed-off-by: Harry Wentland 


Reviewed-by: Leo (Sunpeng) Li 


---
  drivers/gpu/drm/amd/display/include/logger_types.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/include/logger_types.h b/drivers/gpu/drm/amd/display/include/logger_types.h
index 4f332e80cecc..b608a0830801 100644
--- a/drivers/gpu/drm/amd/display/include/logger_types.h
+++ b/drivers/gpu/drm/amd/display/include/logger_types.h
@@ -32,7 +32,7 @@
  
  #define DC_LOG_ERROR(...) DRM_ERROR(__VA_ARGS__)
  #define DC_LOG_WARNING(...) DRM_WARN(__VA_ARGS__)
-#define DC_LOG_DEBUG(...) DRM_INFO(__VA_ARGS__)
+#define DC_LOG_DEBUG(...) DRM_DEBUG_KMS(__VA_ARGS__)
  #define DC_LOG_DC(...) DRM_DEBUG_KMS(__VA_ARGS__)
  #define DC_LOG_DTN(...) DRM_DEBUG_KMS(__VA_ARGS__)
  #define DC_LOG_SURFACE(...) pr_debug("[SURFACE]:"__VA_ARGS__)


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 19/21] drm/amdkfd: Add GFXv9 CWSR trap handler

2018-04-10 Thread Felix Kuehling
Signed-off-by: Shaoyun Liu 
Signed-off-by: Jay Cornwall 
Signed-off-by: Felix Kuehling 
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm  | 1495 
 drivers/gpu/drm/amd/amdkfd/kfd_device.c|   13 +-
 2 files changed, 1505 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
new file mode 100644
index 000..da09794
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
@@ -0,0 +1,1495 @@
+/*
+ * Copyright 2016 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#if 0
+HW (GFX9) source code for CWSR trap handler
+#Version 18 + multiple trap handler
+
+// this performance-optimal version was originally from Seven Xu at SRDC
+
+// Revison #18  --...
+/* Rev History
+** #1. Branch from gc dv.   //gfxip/gfx9/main/src/test/suites/block/cs/sr/cs_trap_handler.sp3#1,#50, #51, #52-53(Skip, Already Fixed by PV), #54-56(merged),#57-58(mergerd, skiped-already fixed by PV)
+** #4. SR Memory Layout:
+**  1. VGPR-SGPR-HWREG-{LDS}
+**  2. tba_hi.bits.26 - reconfigured as the first wave in tg bits, for defer Save LDS for a threadgroup.. performance concern..
+** #5. Update: 1. Accurate g8sr_ts_save_d timestamp
+** #6. Update: 1. Fix s_barrier usage; 2. VGPR s/r using swizzle buffer?(NoNeed, already matched the swizzle pattern, more investigation)
+** #7. Update: 1. don't barrier if noLDS
+** #8. Branch: 1. Branch to ver#0, which is very similar to gc dv version
+**2. Fix SQ issue by s_sleep 2
+** #9. Update: 1. Fix scc restore failed issue, restore wave_status at last
+**2. optimize s_buffer save by burst 16sgprs...
+** #10. Update 1. Optimize restore sgpr by busrt 16 sgprs.
+** #11. Update 1. Add 2 more timestamp for debug version
+** #12. Update 1. Add VGPR SR using DWx4, some case improve and some case drop performance
+** #13. Integ  1. Always use MUBUF for PV trap shader...
+** #14. Update 1. s_buffer_store soft clause...
+** #15. Update 1. PERF - sclar write with glc:0/mtype0 to allow L2 combine. perf improvement a lot.
+** #16. Update 1. PRRF - UNROLL LDS_DMA got 2500cycle save in IP tree
+** #17. Update 1. FUNC - LDS_DMA has issues while ATC, replace with ds_read/buffer_store for save part[TODO restore part]
+**2. PERF - Save LDS before save VGPR to cover LDS save long latency...
+** #18. Update 1. FUNC - Implicitly estore STATUS.VCCZ, which is not writable by s_setreg_b32
+**2. FUNC - Handle non-CWSR traps
+*/
+
+var G8SR_WDMEM_HWREG_OFFSET = 0
+var G8SR_WDMEM_SGPR_OFFSET  = 128  // in bytes
+
+// Keep definition same as the app shader, These 2 time stamps are part of the app shader... Should before any Save and after restore.
+
+var G8SR_DEBUG_TIMESTAMP = 0
+var G8SR_DEBUG_TS_SAVE_D_OFFSET = 40*4 // ts_save_d timestamp offset relative to SGPR_SR_memory_offset
+var s_g8sr_ts_save_s   = s[34:35]   // save start
+var s_g8sr_ts_sq_save_msg  = s[36:37]  // The save shader send SAVEWAVE msg to spi
+var s_g8sr_ts_spi_wrexec   = s[38:39]  // the SPI write the sr address to SQ
+var s_g8sr_ts_save_d   = s[40:41]   // save end
+var s_g8sr_ts_restore_s = s[42:43]   // restore start
+var s_g8sr_ts_restore_d = s[44:45]   // restore end
+
+var G8SR_VGPR_SR_IN_DWX4 = 0
+var G8SR_SAVE_BUF_RSRC_WORD1_STRIDE_DWx4 = 0x0010   // DWx4 stride is 4*4Bytes
+var G8SR_RESTORE_BUF_RSRC_WORD1_STRIDE_DWx4  = G8SR_SAVE_BUF_RSRC_WORD1_STRIDE_DWx4
+
+
+/*/
+/* control on how to run the shader */
+/*/
+//any 

[PATCH 15/21] drm/amdkfd: Fix kernel queue rollback_packet

2018-04-10 Thread Felix Kuehling
kq->queue->properties.write_ptr is a GPU address which can't be
dereferenced in the kernel. Use kq->wptr_kernel instead, which is the
kernel CPU address of the same buffer.

Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
index 23e586b..9f38161 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
@@ -279,7 +279,7 @@ static void submit_packet(struct kernel_queue *kq)
 
 static void rollback_packet(struct kernel_queue *kq)
 {
-   kq->pending_wptr = *kq->queue->properties.write_ptr;
+   kq->pending_wptr = *kq->wptr_kernel;
 }
 
 struct kernel_queue *kernel_queue_init(struct kfd_dev *dev,
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 16/21] drm/amdkfd: Add 64-bit doorbell and wptr support to kernel queue

2018-04-10 Thread Felix Kuehling
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 10 +
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 25 +--
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h |  7 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c |  9 
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c  |  9 
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c  |  9 
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  1 +
 7 files changed, 63 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
index 36c9269e..5d7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
@@ -214,6 +214,16 @@ void write_kernel_doorbell(void __iomem *db, u32 value)
}
 }
 
+void write_kernel_doorbell64(void __iomem *db, u64 value)
+{
+   if (db) {
+   WARN(((unsigned long)db & 7) != 0,
+"Unaligned 64-bit doorbell");
+   writeq(value, (u64 __iomem *)db);
+   pr_debug("writing %llu to doorbell address 0x%p\n", value, db);
+   }
+}
+
 unsigned int kfd_doorbell_id_to_offset(struct kfd_dev *kfd,
struct kfd_process *process,
unsigned int doorbell_id)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
index 9f38161..476951d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
@@ -99,7 +99,7 @@ static bool initialize(struct kernel_queue *kq, struct kfd_dev *dev,
kq->rptr_kernel = kq->rptr_mem->cpu_ptr;
kq->rptr_gpu_addr = kq->rptr_mem->gpu_addr;
 
-   retval = kfd_gtt_sa_allocate(dev, sizeof(*kq->wptr_kernel),
+   retval = kfd_gtt_sa_allocate(dev, dev->device_info->doorbell_size,
&kq->wptr_mem);
 
if (retval != 0)
@@ -208,6 +208,7 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
size_t available_size;
size_t queue_size_dwords;
uint32_t wptr, rptr;
+   uint64_t wptr64;
unsigned int *queue_address;
 
/* When rptr == wptr, the buffer is empty.
@@ -216,7 +217,8 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
 * the opposite. So we can only use up to queue_size_dwords - 1 dwords.
 */
rptr = *kq->rptr_kernel;
-   wptr = *kq->wptr_kernel;
+   wptr = kq->pending_wptr;
+   wptr64 = kq->pending_wptr64;
queue_address = (unsigned int *)kq->pq_kernel_addr;
queue_size_dwords = kq->queue->properties.queue_size / 4;
 
@@ -246,11 +248,13 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
while (wptr > 0) {
queue_address[wptr] = kq->nop_packet;
wptr = (wptr + 1) % queue_size_dwords;
+   wptr64++;
}
}
 
*buffer_ptr = &queue_address[wptr];
kq->pending_wptr = wptr + packet_size_in_dwords;
+   kq->pending_wptr64 = wptr64 + packet_size_in_dwords;
 
return 0;
 
@@ -272,14 +276,18 @@ static void submit_packet(struct kernel_queue *kq)
pr_debug("\n");
 #endif
 
-   *kq->wptr_kernel = kq->pending_wptr;
-   write_kernel_doorbell(kq->queue->properties.doorbell_ptr,
-   kq->pending_wptr);
+   kq->ops_asic_specific.submit_packet(kq);
 }
 
 static void rollback_packet(struct kernel_queue *kq)
 {
-   kq->pending_wptr = *kq->wptr_kernel;
+   if (kq->dev->device_info->doorbell_size == 8) {
+   kq->pending_wptr64 = *kq->wptr64_kernel;
+   kq->pending_wptr = *kq->wptr_kernel %
+   (kq->queue->properties.queue_size / 4);
+   } else {
+   kq->pending_wptr = *kq->wptr_kernel;
+   }
 }
 
 struct kernel_queue *kernel_queue_init(struct kfd_dev *dev,
@@ -310,6 +318,11 @@ struct kernel_queue *kernel_queue_init(struct kfd_dev *dev,
case CHIP_HAWAII:
kernel_queue_init_cik(&kq->ops_asic_specific);
break;
+
+   case CHIP_VEGA10:
+   case CHIP_RAVEN:
+   kernel_queue_init_v9(&kq->ops_asic_specific);
+   break;
default:
WARN(1, "Unexpected ASIC family %u",
 dev->device_info->asic_family);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h
index 5940531..97aff20 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h
@@ -72,6 +72,7 @@ struct kernel_queue {
struct kfd_dev  *dev;
struct mqd_manager  *mqd;
struct queue*queue;
+   uint64_tpending_wptr64;
uint32_t   

[PATCH 12/21] drm/amdkfd: Add GFXv9 device queue manager

2018-04-10 Thread Felix Kuehling
Signed-off-by: John Bridgman 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/Makefile|  2 +-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 10 ++-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |  2 +
 .../drm/amd/amdkfd/kfd_device_queue_manager_v9.c   | 84 ++
 drivers/gpu/drm/amd/amdkfd/kfd_module.c|  5 ++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h  |  5 ++
 6 files changed, 106 insertions(+), 2 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c

diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile
index 094b591..ff8b5aa 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -35,7 +35,7 @@ amdkfd-y  := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
kfd_kernel_queue_vi.o kfd_kernel_queue_v9.o \
kfd_packet_manager.o kfd_process_queue_manager.o \
kfd_device_queue_manager.o kfd_device_queue_manager_cik.o \
-   kfd_device_queue_manager_vi.o \
+   kfd_device_queue_manager_vi.o kfd_device_queue_manager_v9.o \
kfd_interrupt.o kfd_events.o cik_event_interrupt.o \
kfd_dbgdev.o kfd_dbgmgr.o kfd_crat.o
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 500f022..9af94b1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1386,7 +1386,10 @@ static bool set_cache_memory_policy(struct device_queue_manager *dqm,
   void __user *alternate_aperture_base,
   uint64_t alternate_aperture_size)
 {
-   bool retval;
+   bool retval = true;
+
+   if (!dqm->asic_ops.set_cache_memory_policy)
+   return retval;
 
mutex_lock(&dqm->lock);
 
@@ -1655,6 +1658,11 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
case CHIP_POLARIS11:
device_queue_manager_init_vi_tonga(&dqm->asic_ops);
break;
+
+   case CHIP_VEGA10:
+   case CHIP_RAVEN:
+   device_queue_manager_init_v9(&dqm->asic_ops);
+   break;
default:
WARN(1, "Unexpected ASIC family %u",
 dev->device_info->asic_family);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index 412beff..59a6b19 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -200,6 +200,8 @@ void device_queue_manager_init_vi(
struct device_queue_manager_asic_ops *asic_ops);
 void device_queue_manager_init_vi_tonga(
struct device_queue_manager_asic_ops *asic_ops);
+void device_queue_manager_init_v9(
+   struct device_queue_manager_asic_ops *asic_ops);
 void program_sh_mem_settings(struct device_queue_manager *dqm,
struct qcm_process_device *qpd);
 unsigned int get_queues_num(struct device_queue_manager *dqm);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c
new file mode 100644
index 000..79e5bcf
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c
@@ -0,0 +1,84 @@
+/*
+ * Copyright 2016-2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "kfd_device_queue_manager.h"
+#include "vega10_enum.h"
+#include "gc/gc_9_0_offset.h"
+#include "gc/gc_9_0_sh_mask.h"
+#include "sdma0/sdma0_4_0_sh_mask.h"
+
+static int update_qpd_v9(struct device_queue_manager *dqm,
+

[PATCH 10/21] drm/amdkfd: Add GFXv9 PM4 packet writer functions

2018-04-10 Thread Felix Kuehling
Signed-off-by: Shaoyun Liu 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/Makefile  |   7 +-
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c | 331 +
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c |  18 +-
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c  |   4 +
 drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h  | 583 +++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|   6 +
 6 files changed, 937 insertions(+), 12 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h

diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile
index 0d02422..52b3c1b 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -31,9 +31,10 @@ amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
kfd_process.o kfd_queue.o kfd_mqd_manager.o \
kfd_mqd_manager_cik.o kfd_mqd_manager_vi.o \
kfd_kernel_queue.o kfd_kernel_queue_cik.o \
-   kfd_kernel_queue_vi.o kfd_packet_manager.o \
-   kfd_process_queue_manager.o kfd_device_queue_manager.o \
-   kfd_device_queue_manager_cik.o kfd_device_queue_manager_vi.o \
+   kfd_kernel_queue_vi.o kfd_kernel_queue_v9.o \
+   kfd_packet_manager.o kfd_process_queue_manager.o \
+   kfd_device_queue_manager.o kfd_device_queue_manager_cik.o \
+   kfd_device_queue_manager_vi.o \
kfd_interrupt.o kfd_events.o cik_event_interrupt.o \
kfd_dbgdev.o kfd_dbgmgr.o kfd_crat.o
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
new file mode 100644
index 000..ece7d59
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
@@ -0,0 +1,331 @@
+/*
+ * Copyright 2016-2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "kfd_kernel_queue.h"
+#include "kfd_device_queue_manager.h"
+#include "kfd_pm4_headers_ai.h"
+#include "kfd_pm4_opcodes.h"
+
+static bool initialize_v9(struct kernel_queue *kq, struct kfd_dev *dev,
+   enum kfd_queue_type type, unsigned int queue_size);
+static void uninitialize_v9(struct kernel_queue *kq);
+
+void kernel_queue_init_v9(struct kernel_queue_ops *ops)
+{
+   ops->initialize = initialize_v9;
+   ops->uninitialize = uninitialize_v9;
+}
+
+static bool initialize_v9(struct kernel_queue *kq, struct kfd_dev *dev,
+   enum kfd_queue_type type, unsigned int queue_size)
+{
+   int retval;
+
+   retval = kfd_gtt_sa_allocate(dev, PAGE_SIZE, &kq->eop_mem);
+   if (retval)
+   return false;
+
+   kq->eop_gpu_addr = kq->eop_mem->gpu_addr;
+   kq->eop_kernel_addr = kq->eop_mem->cpu_ptr;
+
+   memset(kq->eop_kernel_addr, 0, PAGE_SIZE);
+
+   return true;
+}
+
+static void uninitialize_v9(struct kernel_queue *kq)
+{
+   kfd_gtt_sa_free(kq->dev, kq->eop_mem);
+}
+
+static int pm_map_process_v9(struct packet_manager *pm,
+   uint32_t *buffer, struct qcm_process_device *qpd)
+{
+   struct pm4_mes_map_process *packet;
+   uint64_t vm_page_table_base_addr =
+   (uint64_t)(qpd->page_table_base) << 12;
+
+   packet = (struct pm4_mes_map_process *)buffer;
+   memset(buffer, 0, sizeof(struct pm4_mes_map_process));
+
+   packet->header.u32All = pm_build_pm4_header(IT_MAP_PROCESS,
+   sizeof(struct pm4_mes_map_process));
+   packet->bitfields2.diq_enable = (qpd->is_debug) ? 1 : 0;
+   packet->bitfields2.process_quantum = 1;
+   packet->bitfields2.pasid = qpd->pqm->process->pasid;
+   packet->bitfields14.gds_size = qpd->gds_size;

[PATCH 20/21] drm/amdkfd: Try to enable atomics for all GPUs

2018-04-10 Thread Felix Kuehling
From: welu 

Report failure to enable atomics only on GPUs that require them.
This allows GPUs that don't require atomics to function without them
while still benefiting when they are available. This is the case for
Vega10, which doesn't use atomics for basic functioning of the MEC,
AQL and HWS microcode, so it can work without atomics. Shader programs
can still use atomic instructions on systems that support PCIe atomics.

Signed-off-by: welu 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 053f1d0..ea95f3b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -290,7 +290,7 @@ struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd,
struct pci_dev *pdev, const struct kfd2kgd_calls *f2g)
 {
struct kfd_dev *kfd;
-
+   int ret;
const struct kfd_device_info *device_info =
lookup_device_info(pdev->device);
 
@@ -299,19 +299,18 @@ struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd,
return NULL;
}
 
-   if (device_info->needs_pci_atomics) {
-   /* Allow BIF to recode atomics to PCIe 3.0
-* AtomicOps. 32 and 64-bit requests are possible and
-* must be supported.
-*/
-   if (pci_enable_atomic_ops_to_root(pdev,
-   PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
-   PCI_EXP_DEVCAP2_ATOMIC_COMP64) < 0) {
-   dev_info(kfd_device,
-   "skipped device %x:%x, PCI rejects atomics",
-pdev->vendor, pdev->device);
-   return NULL;
-   }
+   /* Allow BIF to recode atomics to PCIe 3.0 AtomicOps.
+* 32 and 64-bit requests are possible and must be
+* supported.
+*/
+   ret = pci_enable_atomic_ops_to_root(pdev,
+   PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
+   PCI_EXP_DEVCAP2_ATOMIC_COMP64);
+   if (device_info->needs_pci_atomics && ret  < 0) {
+   dev_info(kfd_device,
+"skipped device %x:%x, PCI rejects atomics\n",
+pdev->vendor, pdev->device);
+   return NULL;
}
 
kfd = kzalloc(sizeof(*kfd), GFP_KERNEL);
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 11/21] drm/amdkfd: Add GFXv9 MQD manager

2018-04-10 Thread Felix Kuehling
Signed-off-by: John Bridgman 
Signed-off-by: Jay Cornwall 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/Makefile |   1 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c|   3 +
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c | 443 
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h   |   3 +
 5 files changed, 451 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c

diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile
index 52b3c1b..094b591 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -30,6 +30,7 @@ amdkfd-y  := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
kfd_pasid.o kfd_doorbell.o kfd_flat_memory.o \
kfd_process.o kfd_queue.o kfd_mqd_manager.o \
kfd_mqd_manager_cik.o kfd_mqd_manager_vi.o \
+   kfd_mqd_manager_v9.o \
kfd_kernel_queue.o kfd_kernel_queue_cik.o \
kfd_kernel_queue_vi.o kfd_kernel_queue_v9.o \
kfd_packet_manager.o kfd_process_queue_manager.o \
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index f563acb..c368ce3 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -700,7 +700,7 @@ int kfd_gtt_sa_allocate(struct kfd_dev *kfd, unsigned int size,
if (size > kfd->gtt_sa_num_of_chunks * kfd->gtt_sa_chunk_size)
return -ENOMEM;
 
-   *mem_obj = kmalloc(sizeof(struct kfd_mem_obj), GFP_NOIO);
+   *mem_obj = kzalloc(sizeof(struct kfd_mem_obj), GFP_NOIO);
if ((*mem_obj) == NULL)
return -ENOMEM;
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
index ee7061e..4b8eb50 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
@@ -38,6 +38,9 @@ struct mqd_manager *mqd_manager_init(enum KFD_MQD_TYPE type,
case CHIP_POLARIS10:
case CHIP_POLARIS11:
return mqd_manager_init_vi_tonga(type, dev);
+   case CHIP_VEGA10:
+   case CHIP_RAVEN:
+   return mqd_manager_init_v9(type, dev);
default:
WARN(1, "Unexpected ASIC family %u",
 dev->device_info->asic_family);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
new file mode 100644
index 000..684054f
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
@@ -0,0 +1,443 @@
+/*
+ * Copyright 2016-2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include "kfd_priv.h"
+#include "kfd_mqd_manager.h"
+#include "v9_structs.h"
+#include "gc/gc_9_0_offset.h"
+#include "gc/gc_9_0_sh_mask.h"
+#include "sdma0/sdma0_4_0_sh_mask.h"
+
+static inline struct v9_mqd *get_mqd(void *mqd)
+{
+   return (struct v9_mqd *)mqd;
+}
+
+static inline struct v9_sdma_mqd *get_sdma_mqd(void *mqd)
+{
+   return (struct v9_sdma_mqd *)mqd;
+}
+
+static int init_mqd(struct mqd_manager *mm, void **mqd,
+   struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr,
+   struct queue_properties *q)
+{
+   int retval;
+   uint64_t addr;
+   struct v9_mqd *m;
+   struct kfd_dev *kfd = mm->dev;
+
+   /* From V9,  for CWSR, the control stack is located on the next page
+* boundary after the mqd, we will use the gtt allocation function
+* instead of sub-allocation function.
+*/
+   if (kfd->cwsr_enabled && (q->type == KFD_QUEUE_TYPE_COMPUTE)) {
+   

[PATCH 08/21] drm/amdkfd: Implement doorbell allocation for SOC15

2018-04-10 Thread Felix Kuehling
Allocate doorbells according to the doorbell routing information on
SOC15 ASICs (Vega10 and later). On older ASICs we continue to use the
queue_id as the doorbell ID to maintain compatibility with the Thunk.

Signed-off-by: Shaoyun Liu 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   |  7 ++
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 82 --
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c  | 12 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h  | 11 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c   | 32 +
 .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 12 +++-
 6 files changed, 139 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index f6b35f4..1a4d8dc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -295,6 +295,13 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
args->doorbell_offset = KFD_MMAP_TYPE_DOORBELL;
args->doorbell_offset |= KFD_MMAP_GPU_ID(args->gpu_id);
args->doorbell_offset <<= PAGE_SHIFT;
+   if (KFD_IS_SOC15(dev->device_info->asic_family))
+   /* On SOC15 ASICs, doorbell allocation must be
+* per-device, and independent from the per-process
+* queue_id. Return the doorbell offset within the
+* doorbell aperture to user mode.
+*/
+   args->doorbell_offset |= q_properties.doorbell_off;
 
mutex_unlock(&p->mutex);
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index d55d29d..e9c72d8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -110,6 +110,57 @@ void program_sh_mem_settings(struct device_queue_manager *dqm,
qpd->sh_mem_bases);
 }
 
+static int allocate_doorbell(struct qcm_process_device *qpd, struct queue *q)
+{
+   struct kfd_dev *dev = qpd->dqm->dev;
+
+   if (!KFD_IS_SOC15(dev->device_info->asic_family)) {
+   /* On pre-SOC15 chips we need to use the queue ID to
+* preserve the user mode ABI.
+*/
+   q->doorbell_id = q->properties.queue_id;
+   } else if (q->properties.type == KFD_QUEUE_TYPE_SDMA) {
+   /* For SDMA queues on SOC15, use static doorbell
+* assignments based on the engine and queue.
+*/
+   q->doorbell_id = dev->shared_resources.sdma_doorbell
+   [q->properties.sdma_engine_id]
+   [q->properties.sdma_queue_id];
+   } else {
+   /* For CP queues on SOC15 reserve a free doorbell ID */
+   unsigned int found;
+
+   found = find_first_zero_bit(qpd->doorbell_bitmap,
+   KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
+   if (found >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS) {
+   pr_debug("No doorbells available");
+   return -EBUSY;
+   }
+   set_bit(found, qpd->doorbell_bitmap);
+   q->doorbell_id = found;
+   }
+
+   q->properties.doorbell_off =
+   kfd_doorbell_id_to_offset(dev, q->process,
+ q->doorbell_id);
+
+   return 0;
+}
+
+static void deallocate_doorbell(struct qcm_process_device *qpd,
+   struct queue *q)
+{
+   unsigned int old;
+   struct kfd_dev *dev = qpd->dqm->dev;
+
+   if (!KFD_IS_SOC15(dev->device_info->asic_family) ||
+   q->properties.type == KFD_QUEUE_TYPE_SDMA)
+   return;
+
+   old = test_and_clear_bit(q->doorbell_id, qpd->doorbell_bitmap);
+   WARN_ON(!old);
+}
+
 static int allocate_vmid(struct device_queue_manager *dqm,
struct qcm_process_device *qpd,
struct queue *q)
@@ -301,10 +352,14 @@ static int create_compute_queue_nocpsch(struct device_queue_manager *dqm,
if (retval)
return retval;
 
+   retval = allocate_doorbell(qpd, q);
+   if (retval)
+   goto out_deallocate_hqd;
+
retval = mqd->init_mqd(mqd, &q->mqd, &q->mqd_mem_obj,
&q->gart_mqd_addr, &q->properties);
if (retval)
-   goto out_deallocate_hqd;
+   goto out_deallocate_doorbell;
 
pr_debug("Loading mqd to hqd on pipe %d, queue %d\n",
q->pipe, q->queue);
@@ -324,6 +379,8 @@ static int create_compute_queue_nocpsch(struct device_queue_manager *dqm,
 
 out_uninit_mqd:
mqd->uninit_mqd(mqd, q->mqd, q->mqd_mem_obj);
+out_deallocate_doorbell:
+  

[PATCH 17/21] drm/amdkfd: Remove limit on number of GPUs (follow-up)

2018-04-10 Thread Felix Kuehling
This condition was missed in a previous commit with the same title.

Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index 66852de..f16ac2b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -307,9 +307,7 @@ int kfd_init_apertures(struct kfd_process *process)
struct kfd_process_device *pdd;
 
/*Iterating over all devices*/
-   while (kfd_topology_enum_kfd_devices(id, ) == 0 &&
-   id < NUM_OF_SUPPORTED_GPUS) {
-
+   while (kfd_topology_enum_kfd_devices(id, ) == 0) {
if (!dev) {
id++; /* Skip non GPU devices */
continue;
-- 
2.7.4



[PATCH 21/21] drm/amdkfd: Add Vega10 topology and device info

2018-04-10 Thread Felix Kuehling
* Report 64-bit doorbells as HSA_CAP_DOORBELL_TYPE_2_0 in topology
* Report cache information in topology (duplicates GFXv8 info for now)
* Add device info for Vega10 support in KFD

Raven is not enabled at this time as it needs additional changes in
DQM to work with a single SDMA engine.

Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 11 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   | 37 +++
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  6 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  1 +
 4 files changed, 55 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 4f126ef..296b3f2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -132,6 +132,9 @@ static struct kfd_gpu_cache_info carrizo_cache_info[] = {
 #define fiji_cache_info  carrizo_cache_info
 #define polaris10_cache_info carrizo_cache_info
 #define polaris11_cache_info carrizo_cache_info
+/* TODO - check & update Vega10 cache details */
+#define vega10_cache_info carrizo_cache_info
+#define raven_cache_info carrizo_cache_info
 
 static void kfd_populated_cu_info_cpu(struct kfd_topology_device *dev,
struct crat_subtype_computeunit *cu)
@@ -603,6 +606,14 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
pcache_info = polaris11_cache_info;
num_of_cache_types = ARRAY_SIZE(polaris11_cache_info);
break;
+   case CHIP_VEGA10:
+   pcache_info = vega10_cache_info;
+   num_of_cache_types = ARRAY_SIZE(vega10_cache_info);
+   break;
+   case CHIP_RAVEN:
+   pcache_info = raven_cache_info;
+   num_of_cache_types = ARRAY_SIZE(raven_cache_info);
+   break;
default:
return -EINVAL;
}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index ea95f3b..fb4a72d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -182,6 +182,34 @@ static const struct kfd_device_info polaris11_device_info = {
.needs_pci_atomics = true,
 };
 
+static const struct kfd_device_info vega10_device_info = {
+   .asic_family = CHIP_VEGA10,
+   .max_pasid_bits = 16,
+   .max_no_of_hqd  = 24,
+   .doorbell_size  = 8,
+   .ih_ring_entry_size = 8 * sizeof(uint32_t),
+   .event_interrupt_class = _interrupt_class_v9,
+   .num_of_watch_points = 4,
+   .mqd_size_aligned = MQD_SIZE_ALIGNED,
+   .supports_cwsr = true,
+   .needs_iommu_device = false,
+   .needs_pci_atomics = false,
+};
+
+static const struct kfd_device_info vega10_vf_device_info = {
+   .asic_family = CHIP_VEGA10,
+   .max_pasid_bits = 16,
+   .max_no_of_hqd  = 24,
+   .doorbell_size  = 8,
+   .ih_ring_entry_size = 8 * sizeof(uint32_t),
+   .event_interrupt_class = _interrupt_class_v9,
+   .num_of_watch_points = 4,
+   .mqd_size_aligned = MQD_SIZE_ALIGNED,
+   .supports_cwsr = true,
+   .needs_iommu_device = false,
+   .needs_pci_atomics = false,
+};
+
 
 struct kfd_deviceid {
unsigned short did;
@@ -261,6 +289,15 @@ static const struct kfd_deviceid supported_devices[] = {
{ 0x67EB, _device_info }, /* Polaris11 */
{ 0x67EF, _device_info }, /* Polaris11 */
{ 0x67FF, _device_info }, /* Polaris11 */
+   { 0x6860, _device_info },/* Vega10 */
+   { 0x6861, _device_info },/* Vega10 */
+   { 0x6862, _device_info },/* Vega10 */
+   { 0x6863, _device_info },/* Vega10 */
+   { 0x6864, _device_info },/* Vega10 */
+   { 0x6867, _device_info },/* Vega10 */
+   { 0x6868, _device_info },/* Vega10 */
+   { 0x686C, _vf_device_info }, /* Vega10  vf*/
+   { 0x687F, _device_info },/* Vega10 */
 };
 
 static int kfd_gtt_sa_init(struct kfd_dev *kfd, unsigned int buf_size,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index ac28abc..bc95d4df 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1239,6 +1239,12 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
HSA_CAP_DOORBELL_TYPE_TOTALBITS_SHIFT) &
HSA_CAP_DOORBELL_TYPE_TOTALBITS_MASK);
break;
+   case CHIP_VEGA10:
+   case CHIP_RAVEN:
+   dev->node_props.capability |= ((HSA_CAP_DOORBELL_TYPE_2_0 <<
+   HSA_CAP_DOORBELL_TYPE_TOTALBITS_SHIFT) &
+   HSA_CAP_DOORBELL_TYPE_TOTALBITS_MASK);
+   break;
default:
WARN(1, "Unexpected ASIC family %u",
 dev->gpu->device_info->asic_family);
diff --git 

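The `supported_devices[]` entries added in the Vega10 patch map PCI device IDs to a `kfd_device_info`, with 0x686C pointing at a separate virtual-function (VF) variant. Below is a simplified, self-contained sketch of such a table lookup. The structure holds only a name string as a stand-in for `struct kfd_device_info`, and `lookup_device()` is a hypothetical helper, not the amdkfd function.

```c
#include <stddef.h>

/* Stand-in for struct kfd_device_info; only a name for illustration. */
struct dev_info {
	const char *name;
};

struct deviceid {
	unsigned short did;
	const struct dev_info *info;
};

static const struct dev_info vega10_info = { "vega10" };
static const struct dev_info vega10_vf_info = { "vega10_vf" };

/* Device IDs taken from the patch; 0x686C is the VF variant. */
static const struct deviceid supported[] = {
	{ 0x6860, &vega10_info },
	{ 0x6861, &vega10_info },
	{ 0x6862, &vega10_info },
	{ 0x6863, &vega10_info },
	{ 0x6864, &vega10_info },
	{ 0x6867, &vega10_info },
	{ 0x6868, &vega10_info },
	{ 0x686C, &vega10_vf_info },
	{ 0x687F, &vega10_info },
};

/* Linear search by PCI device ID; NULL means the device is unsupported. */
static const struct dev_info *lookup_device(unsigned short did)
{
	for (size_t i = 0; i < sizeof(supported) / sizeof(supported[0]); i++)
		if (supported[i].did == did)
			return supported[i].info;
	return NULL;
}
```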
Re: RFC for a render API to support adaptive sync and VRR

2018-04-10 Thread Manasi Navare
On Tue, Apr 10, 2018 at 11:03:02AM -0400, Harry Wentland wrote:
> Adding Anthony and Aric who've been working on Freesync with DC on other OSes 
> for a while.
> 
> On 2018-04-09 05:45 PM, Manasi Navare wrote:
> > Thanks for initiating the discussion. Find my comments below:
> > 
> > On Mon, Apr 09, 2018 at 04:00:21PM -0400, Harry Wentland wrote:
> >> Adding dri-devel, which I should've included from the start.
> >>
> >> On 2018-04-09 03:56 PM, Harry Wentland wrote:
> >>> === What is adaptive sync and VRR? ===
> >>>
> >>> Adaptive sync has been part of the DisplayPort spec for a while now and 
> >>> allows graphics adapters to drive displays with varying frame timings. 
> >>> VRR (variable refresh rate) is essentially the same, but defined for HDMI.
> >>>
> >>>
> >>>
> >>> === Why allow variable frame timings? ===
> >>>
> >>> Variable render times don't align with fixed refresh rates, leading to
> >>> stuttering, tearing, and/or input lag.
> >>>
> >>> e.g. (rc = render completion, dr = display refresh)
> >>>
> >>> rc:     B, C, D, E, F
> >>> dr:  A, B, C, C, D, E, F
> >>>
> >>> (diagram annotations: "frame repeated twice" and "missed display refresh")
> >>>
> >>>
> >>>
> >>> === Other use cases of adaptive sync ===
> >>>
> >>> Besides the variable render case, adaptive sync also allows adjustment of 
> >>> refresh rates without a mode change. One such use case would be 24 Hz 
> >>> video.
> >>>
> > 
> > One of the advantages here is that when the render speed is slower than the
> > display refresh rate, since we are stretching the vertical blanking interval,
> > the display adapters will follow a "draw fast and then go idle" approach.
> > This gives power savings when the render rate is lower than the display
> > refresh rate.
> 
> Are you talking about a use case, such as an idle desktop, where the renders 
> are quite sporadic?
>

I was referring to a case where the render rate is lower, say 24Hz, but the
display rate is fixed at 60Hz, which means we are pretty much displaying the
same frame twice. With Adaptive Sync, the display rate would be lowered to 24Hz
and the vertical blanking time would be stretched, so instead of drawing the
same frame twice, the system is now idle during that extra blanking time, thus
giving some power savings.
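The arithmetic behind the 24 Hz example can be made concrete. The sketch below converts a refresh rate into a frame duration in nanoseconds (the unit of the proposed `target_frame_duration_ns` property) and shows how many fixed 60 Hz refresh periods a 24 Hz frame spans on average. The helper names are illustrative only and not part of any proposed API.

```c
#include <stdint.h>

/* Frame duration in ns for a refresh rate in Hz, rounded to nearest. */
static uint64_t frame_duration_ns(uint32_t hz)
{
	return (1000000000ULL + hz / 2) / hz;
}

/* Average display refreshes per content frame, in tenths (25 == 2.5). */
static uint32_t refreshes_per_frame_x10(uint32_t display_hz, uint32_t content_hz)
{
	return display_hz * 10 / content_hz;
}
```

With a fixed 60 Hz refresh (16666667 ns periods), each 24 Hz frame is held for 2 or 3 refreshes, 2.5 on average. With adaptive sync the display can instead hold each frame for one stretched period of about 41666667 ns.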
 
> >  
> >>>
> >>>
> >>> === A DRM render API to support variable refresh rates ===
> >>>
> >>> In order to benefit from adaptive sync and VRR, userland needs a way to 
> >>> let us know whether to vary frame timings or to target a different frame 
> >>> time. These can be provided as atomic properties on a CRTC:
> >>>  * bool   variable_refresh_compatible
> >>>  * int    target_frame_duration_ns (nanosecond frame duration)
> >>>
> >>> This gives us the following cases:
> >>>
> >>> variable_refresh_compatible = 0, target_frame_duration_ns = 0
> >>>  * drive monitor at timing's normal refresh rate
> >>>
> >>> variable_refresh_compatible = 1, target_frame_duration_ns = 0
> >>>  * send new frame to monitor as soon as it's available, if within min/max 
> >>> of monitor's reported capabilities
> >>>
> >>> variable_refresh_compatible = 0/1, target_frame_duration_ns > 0
> >>>  * send new frame to monitor with the specified target_frame_duration_ns
> >>>
> >>> When a target_frame_duration_ns or variable_refresh_compatible value cannot be 
> >>> supported, the atomic check will reject the commit.
> >>>
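The three property cases in the RFC can be summarised as a small decision function. This is only a sketch of the proposed semantics: the enum, `vrr_check()`, and the monitor-range parameters (expressed as frame durations, so the minimum duration corresponds to the maximum refresh rate) are hypothetical, and the real check would live in the driver's atomic check. Low framerate compensation is deliberately ignored here.

```c
#include <stdbool.h>
#include <stdint.h>

enum vrr_mode {
	VRR_FIXED,        /* drive the timing's normal refresh rate */
	VRR_VARIABLE,     /* present each frame as soon as it is available */
	VRR_FIXED_TARGET, /* present at target_frame_duration_ns */
	VRR_REJECT,       /* atomic check rejects the commit */
};

static enum vrr_mode vrr_check(bool variable_refresh_compatible,
			       uint64_t target_frame_duration_ns,
			       bool monitor_vrr_capable,
			       uint64_t min_duration_ns, /* 1 / max refresh */
			       uint64_t max_duration_ns) /* 1 / min refresh */
{
	/* Case 3: an explicit target wins, but must fit the monitor range. */
	if (target_frame_duration_ns > 0) {
		if (!monitor_vrr_capable ||
		    target_frame_duration_ns < min_duration_ns ||
		    target_frame_duration_ns > max_duration_ns)
			return VRR_REJECT;
		return VRR_FIXED_TARGET;
	}
	/* Case 2: variable timing only if the monitor supports it. */
	if (variable_refresh_compatible)
		return monitor_vrr_capable ? VRR_VARIABLE : VRR_REJECT;
	/* Case 1: both properties unset. */
	return VRR_FIXED;
}
```

For a hypothetical 40-144 Hz monitor (6944444-25000000 ns), a 24 Hz target of 41666667 ns falls outside the range and is rejected.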
> > 
> > What I would like is two sets of properties on a CRTC or preferably on a 
> > connector:
> > 
> > KMD properties that UMD can query:
> > * vrr_capable -  This will be an immutable property for exposing hardware's 
> > capability of supporting VRR. This will be set by the kernel after 
> > reading the EDID mode information and monitor range capabilities.
> > * vrr_vrefresh_max, vrr_vrefresh_min - To expose the min and max refresh 
> > rates supported.
> > These properties are optional and will be created and attached to the
> > DP/eDP connector when the connector is being initialized.
> > 
> 
> If we're talking about the properties from the EDID, these might not 
> necessarily align with the currently selected mode, which might have a refresh 
> rate lower than vrr_refresh_max, requiring us to cap it at that. In some 
> scenarios we might also do low framerate compensation [1], where we do magic 
> to allow the framerate to drop below the supported range.

Actually, the way I have currently coded it is to go through all the EDID modes
and, for each mode with the same resolution but different supported refresh
rates, create a VRR field as part of the drm_mode_config structure that holds
refresh_max and min. That way we store the max and min per mode, as opposed
to a per-CRTC/connector property.

> 
> I think if a vrr_refresh_max/min are exposed to UMD these should really be 
> only for informational purposes, in 

Re: [PATCH 00/21] GFXv9/Vega10 support for KFD

2018-04-10 Thread Oded Gabbay
Hi Felix,
Just to let you know that I am currently on vacation and will be back home
only on 4/21, so all patch reviews from my side will be done after that
date.

Thanks,
Oded

On Tue, 10 Apr 2018, 17:33 Felix Kuehling  wrote:

> This patch series adds support for GFXv9 GPUs to KFD and enables Vega10.
> Raven support requires some extra work that will follow shortly, but some
> Raven-related code is already included and I didn't go out of my way to
> keep it out.
>
> Felix Kuehling (19):
>   drm/amdgpu: Remove unused interface from kfd2kgd interface
>   drm/amd: Update GFXv9 SDMA MQD structure
>   drm/amdgpu: Add GFXv9 TLB invalidation packet definition
>   drm/amdgpu: Add GFXv9 kfd2kgd interface functions
>   drm/amdgpu: Add doorbell routing info to kgd2kfd_shared_resources
>   drm/amdkfd: Make doorbell size ASIC-dependent
>   drm/amdkfd: Implement doorbell allocation for SOC15
>   drm/amdkfd: Move packet writer functions into ASIC-specific file
>   drm/amdkfd: Add GFXv9 PM4 packet writer functions
>   drm/amdkfd: Add GFXv9 MQD manager
>   drm/amdkfd: Add GFXv9 device queue manager
>   drm/amdkfd: Add SOC15 interrupt processing support
>   drm/amdkfd: Fix goto usage
>   drm/amdkfd: Fix kernel queue rollback_packet
>   drm/amdkfd: Add 64-bit doorbell and wptr support to kernel queue
>   drm/amdkfd: Remove limit on number of GPUs (follow-up)
>   drm/amdkfd: Support flat memory apertures for GFXv9
>   drm/amdkfd: Add GFXv9 CWSR trap handler
>   drm/amdkfd: Add Vega10 topology and device info
>
> Harish Kasiviswanathan (1):
>   drm/amdkfd: Clean up KFD_MMAP_ offset handling
>
> welu (1):
>   drm/amdkfd: Try to enable atomics for all GPUs
>
>  MAINTAINERS|2 +
>  drivers/gpu/drm/amd/amdgpu/Makefile|3 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |   26 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c  |   10 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c  |   10 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c  | 1043 ++
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  |1 +
>  drivers/gpu/drm/amd/amdgpu/soc15d.h|5 +
>  drivers/gpu/drm/amd/amdkfd/Makefile|   10 +-
>  .../gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm  | 1495
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   |   42 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_crat.c  |   11 +
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c|   89 +-
>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |  102 +-
>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |2 +
>  .../drm/amd/amdkfd/kfd_device_queue_manager_v9.c   |   84 ++
>  drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c  |   65 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_events.c|2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c   |  119 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c|   84 ++
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c  |   39 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h  |7 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c  |9 +
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c   |  340 +
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c   |  319 +
>  drivers/gpu/drm/amd/amdkfd/kfd_module.c|5 +
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c   |3 +
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c|  443 ++
>  drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c|  385 +
>  drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h|  583 
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h  |  106 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_process.c   |   40 +-
>  .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c |   12 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c  |6 +
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.h  |1 +
>  drivers/gpu/drm/amd/amdkfd/soc15_int.h |   47 +
>  drivers/gpu/drm/amd/include/kgd_kfd_interface.h|   20 +-
>  drivers/gpu/drm/amd/include/v9_structs.h   |   48 +-
>  39 files changed, 5118 insertions(+), 501 deletions(-)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/soc15_int.h
>
> --
> 2.7.4
>
>

[PATCH 08/21] drm/amd/display: Fix bug that causes black screen

2018-04-10 Thread Harry Wentland
From: Anthony Koo 

The Ignore MSA bit on DP displays is usually set during SetTimings, but
there was a case where the module thought the refresh rate was not valid,
so the Ignore MSA bit was not set.

Later, a valid refresh rate range was requested, but since the Ignore MSA
bit was not set, it caused a black screen.

The issue is with how the module checked for VRR support. Fix up that logic.
DM should call the new valid_range function to determine if a timing is supported.

Signed-off-by: Anthony Koo 
Reviewed-by: Aric Cyr 
Acked-by: Harry Wentland 
---
 .../gpu/drm/amd/display/modules/freesync/freesync.c| 18 ++
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
index be6a6c63b4cc..4887c888bbe7 100644
--- a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
+++ b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
@@ -613,7 +613,6 @@ void mod_freesync_build_vrr_params(struct mod_freesync *mod_freesync,
 {
struct core_freesync *core_freesync = NULL;
unsigned long long nominal_field_rate_in_uhz = 0;
-   bool nominal_field_rate_in_range = true;
unsigned int refresh_range = 0;
unsigned int min_refresh_in_uhz = 0;
unsigned int max_refresh_in_uhz = 0;
@@ -638,15 +637,6 @@ void mod_freesync_build_vrr_params(struct mod_freesync *mod_freesync,
if (max_refresh_in_uhz > nominal_field_rate_in_uhz)
max_refresh_in_uhz = nominal_field_rate_in_uhz;
 
-   /* Allow for some rounding error of actual video timing by taking ceil.
-* For example, 144 Hz mode timing may actually be 143.xxx Hz when
-* calculated from pixel rate and vertical/horizontal totals, but
-* this should be allowed instead of blocking FreeSync.
-*/
-   if ((min_refresh_in_uhz / 100) >
-   ((nominal_field_rate_in_uhz + 100 - 1) / 100))
-   nominal_field_rate_in_range = false;
-
// Full range may be larger than current video timing, so cap at nominal
if (min_refresh_in_uhz > nominal_field_rate_in_uhz)
min_refresh_in_uhz = nominal_field_rate_in_uhz;
@@ -658,10 +648,14 @@ void mod_freesync_build_vrr_params(struct mod_freesync *mod_freesync,
 
in_out_vrr->state = in_config->state;
 
-   if ((in_config->state == VRR_STATE_UNSUPPORTED) ||
-   (!nominal_field_rate_in_range)) {
+   if (in_config->state == VRR_STATE_UNSUPPORTED) {
in_out_vrr->state = VRR_STATE_UNSUPPORTED;
in_out_vrr->supported = false;
+   in_out_vrr->adjust.v_total_min = stream->timing.v_total;
+   in_out_vrr->adjust.v_total_max = stream->timing.v_total;
+
+   return;
+
} else {
in_out_vrr->min_refresh_in_uhz = min_refresh_in_uhz;
in_out_vrr->max_duration_in_us =
-- 
2.15.1
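This patch keeps the logic that clamps the monitor's FreeSync range to the nominal refresh rate of the current timing, and drops the separate rounding check that could mark the timing as out of range. A standalone sketch of that clamping in micro-hertz follows. The function names are hypothetical, and `uhz_to_hz_ceil()` only illustrates the "take the ceiling" idea from the removed comment (a 143.xxx Hz timing treated as 144 Hz).

```c
#include <stdint.h>

struct vrr_range {
	uint64_t min_uhz;
	uint64_t max_uhz;
};

/* Cap the monitor's range so it never exceeds the current timing's rate. */
static struct vrr_range cap_to_nominal(uint64_t mon_min_uhz,
				       uint64_t mon_max_uhz,
				       uint64_t nominal_uhz)
{
	struct vrr_range r = { mon_min_uhz, mon_max_uhz };

	if (r.max_uhz > nominal_uhz)
		r.max_uhz = nominal_uhz;
	/* Full range may be larger than the current timing, so cap at nominal */
	if (r.min_uhz > nominal_uhz)
		r.min_uhz = nominal_uhz;
	return r;
}

/* Round a rate in uHz up to whole Hz. */
static uint64_t uhz_to_hz_ceil(uint64_t uhz)
{
	return (uhz + 1000000 - 1) / 1000000;
}
```

For a 40-144 Hz monitor driven with a 60 Hz mode, the usable range becomes 40-60 Hz, which is the "cap at nominal" behaviour also discussed in the VRR RFC thread.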



[PATCH 18/21] drm/amdkfd: Support flat memory apertures for GFXv9

2018-04-10 Thread Felix Kuehling
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 115 ---
 1 file changed, 87 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index f16ac2b..97d5423 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -275,23 +275,35 @@
  * for FLAT_* / S_LOAD operations.
  */
 
-#define MAKE_GPUVM_APP_BASE(gpu_num) \
+#define MAKE_GPUVM_APP_BASE_VI(gpu_num) \
(((uint64_t)(gpu_num) << 61) + 0x1L)
 
 #define MAKE_GPUVM_APP_LIMIT(base, size) \
(((uint64_t)(base) & 0xFF00UL) + (size) - 1)
 
-#define MAKE_SCRATCH_APP_BASE() \
+#define MAKE_SCRATCH_APP_BASE_VI() \
(((uint64_t)(0x1UL) << 61) + 0x1L)
 
 #define MAKE_SCRATCH_APP_LIMIT(base) \
(((uint64_t)base & 0xUL) | 0x)
 
-#define MAKE_LDS_APP_BASE() \
+#define MAKE_LDS_APP_BASE_VI() \
(((uint64_t)(0x1UL) << 61) + 0x0)
 #define MAKE_LDS_APP_LIMIT(base) \
(((uint64_t)(base) & 0xUL) | 0x)
 
+/* On GFXv9 the LDS and scratch apertures are programmed independently
+ * using the high 16 bits of the 64-bit virtual address. They must be
+ * in the hole, which will be the case as long as the high 16 bits are
+ * not 0.
+ *
+ * The aperture sizes are still 4GB implicitly.
+ *
+ * A GPUVM aperture is not applicable on GFXv9.
+ */
+#define MAKE_LDS_APP_BASE_V9() ((uint64_t)(0x1UL) << 48)
+#define MAKE_SCRATCH_APP_BASE_V9() ((uint64_t)(0x2UL) << 48)
+
 /* User mode manages most of the SVM aperture address space. The low
  * 16MB are reserved for kernel use (CWSR trap handler and kernel IB
  * for now).
@@ -300,6 +312,55 @@
 #define SVM_CWSR_BASE (SVM_USER_BASE - KFD_CWSR_TBA_TMA_SIZE)
 #define SVM_IB_BASE   (SVM_CWSR_BASE - PAGE_SIZE)
 
+static void kfd_init_apertures_vi(struct kfd_process_device *pdd, uint8_t id)
+{
+   /*
+* node id couldn't be 0 - the three MSB bits of
+* aperture shouldn't be 0
+*/
+   pdd->lds_base = MAKE_LDS_APP_BASE_VI();
+   pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base);
+
+   if (!pdd->dev->device_info->needs_iommu_device) {
+   /* dGPUs: SVM aperture starting at 0
+* with small reserved space for kernel.
+* Set them to CANONICAL addresses.
+*/
+   pdd->gpuvm_base = SVM_USER_BASE;
+   pdd->gpuvm_limit =
+   pdd->dev->shared_resources.gpuvm_size - 1;
+   } else {
+   /* set them to non CANONICAL addresses, and no SVM is
+* allocated.
+*/
+   pdd->gpuvm_base = MAKE_GPUVM_APP_BASE_VI(id + 1);
+   pdd->gpuvm_limit = MAKE_GPUVM_APP_LIMIT(pdd->gpuvm_base,
+   pdd->dev->shared_resources.gpuvm_size);
+   }
+
+   pdd->scratch_base = MAKE_SCRATCH_APP_BASE_VI();
+   pdd->scratch_limit = MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
+}
+
+static void kfd_init_apertures_v9(struct kfd_process_device *pdd, uint8_t id)
+{
+   pdd->lds_base = MAKE_LDS_APP_BASE_V9();
+   pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base);
+
+   /* Raven needs SVM to support graphic handle, etc. Leave the small
+* reserved space before SVM on Raven as well, even though we don't
+* have to.
+* Set gpuvm_base and gpuvm_limit to CANONICAL addresses so that they
+* are used in Thunk to reserve SVM.
+*/
+   pdd->gpuvm_base = SVM_USER_BASE;
+   pdd->gpuvm_limit =
+   pdd->dev->shared_resources.gpuvm_size - 1;
+
+   pdd->scratch_base = MAKE_SCRATCH_APP_BASE_V9();
+   pdd->scratch_limit = MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
+}
+
 int kfd_init_apertures(struct kfd_process *process)
 {
uint8_t id  = 0;
@@ -316,7 +377,7 @@ int kfd_init_apertures(struct kfd_process *process)
pdd = kfd_create_process_device_data(dev, process);
if (!pdd) {
pr_err("Failed to create process device data\n");
-   return -1;
+   return -ENOMEM;
}
/*
 * For 64 bit process apertures will be statically reserved in
@@ -328,32 +389,30 @@ int kfd_init_apertures(struct kfd_process *process)
pdd->gpuvm_base = pdd->gpuvm_limit = 0;
pdd->scratch_base = pdd->scratch_limit = 0;
} else {
-   /* Same LDS and scratch apertures can be used
-* on all GPUs. This allows using more dGPUs
-* than placement options for apertures.
-*/
-   pdd->lds_base = MAKE_LDS_APP_BASE();
-   pdd->lds_limit = 

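On GFXv9 the patch places the LDS and scratch apertures by setting only the high 16 bits of the address (`1 << 48` and `2 << 48`), and its comment states that the aperture sizes are implicitly 4GB and that the bases must lie in the non-canonical hole. The mask constants in the `MAKE_*_APP_LIMIT` macros are garbled in this archive, so the sketch below assumes the limit is the base with its low 32 bits set (consistent with the 4GB size) and assumes a 48-bit virtual address space for the hole check. The helper names are hypothetical.

```c
#include <stdint.h>

#define LDS_BASE_V9     ((uint64_t)0x1 << 48)
#define SCRATCH_BASE_V9 ((uint64_t)0x2 << 48)

/* Assumed limit: keep the upper 32 bits of base, fill the low 32 (4GB). */
static uint64_t aperture_limit_4gb(uint64_t base)
{
	return (base & 0xFFFFFFFF00000000ULL) | 0xFFFFFFFFULL;
}

/*
 * With a 48-bit VA, canonical addresses have bits 63..47 all equal.
 * Anything else falls in the "hole" that user pointers never use.
 */
static int in_va_hole(uint64_t addr)
{
	uint64_t top = addr >> 47; /* the 17 bits 63..47 */

	return top != 0 && top != 0x1FFFFULL;
}
```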
[PATCH 09/21] drm/amdkfd: Move packet writer functions into ASIC-specific file

2018-04-10 Thread Felix Kuehling
This is in preparation for GFXv9 (Vega10), which uses PM4 packet formats
that are incompatible with previous ASIC generations.

Signed-off-by: Shaoyun Liu 
Signed-off-by: Felix Kuehling 
---
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |  10 +-
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c   | 310 +
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c| 381 -
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h  |  35 +-
 4 files changed, 420 insertions(+), 316 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index e9c72d8..500f022 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -196,15 +196,19 @@ static int allocate_vmid(struct device_queue_manager *dqm,
 static int flush_texture_cache_nocpsch(struct kfd_dev *kdev,
struct qcm_process_device *qpd)
 {
-   uint32_t len;
+   const struct packet_manager_funcs *pmf = qpd->dqm->packets.pmf;
+   int ret;
 
if (!qpd->ib_kaddr)
return -ENOMEM;
 
-   len = pm_create_release_mem(qpd->ib_base, (uint32_t *)qpd->ib_kaddr);
+   ret = pmf->release_mem(qpd->ib_base, (uint32_t *)qpd->ib_kaddr);
+   if (ret)
+   return ret;
 
return kdev->kfd2kgd->submit_ib(kdev->kgd, KGD_ENGINE_MEC1, qpd->vmid,
-   qpd->ib_base, (uint32_t *)qpd->ib_kaddr, len);
+   qpd->ib_base, (uint32_t *)qpd->ib_kaddr,
+   pmf->release_mem_size / sizeof(uint32_t));
 }
 
 static void deallocate_vmid(struct device_queue_manager *dqm,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
index f1d4828..7ee326f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
@@ -22,6 +22,9 @@
  */
 
 #include "kfd_kernel_queue.h"
+#include "kfd_device_queue_manager.h"
+#include "kfd_pm4_headers_vi.h"
+#include "kfd_pm4_opcodes.h"
 
 static bool initialize_vi(struct kernel_queue *kq, struct kfd_dev *dev,
enum kfd_queue_type type, unsigned int queue_size);
@@ -54,3 +57,310 @@ static void uninitialize_vi(struct kernel_queue *kq)
 {
kfd_gtt_sa_free(kq->dev, kq->eop_mem);
 }
+
+static unsigned int build_pm4_header(unsigned int opcode, size_t packet_size)
+{
+   union PM4_MES_TYPE_3_HEADER header;
+
+   header.u32All = 0;
+   header.opcode = opcode;
+   header.count = packet_size / 4 - 2;
+   header.type = PM4_TYPE_3;
+
+   return header.u32All;
+}
+
+static int pm_map_process_vi(struct packet_manager *pm, uint32_t *buffer,
+   struct qcm_process_device *qpd)
+{
+   struct pm4_mes_map_process *packet;
+
+   packet = (struct pm4_mes_map_process *)buffer;
+
+   memset(buffer, 0, sizeof(struct pm4_mes_map_process));
+
+   packet->header.u32All = build_pm4_header(IT_MAP_PROCESS,
+   sizeof(struct pm4_mes_map_process));
+   packet->bitfields2.diq_enable = (qpd->is_debug) ? 1 : 0;
+   packet->bitfields2.process_quantum = 1;
+   packet->bitfields2.pasid = qpd->pqm->process->pasid;
+   packet->bitfields3.page_table_base = qpd->page_table_base;
+   packet->bitfields10.gds_size = qpd->gds_size;
+   packet->bitfields10.num_gws = qpd->num_gws;
+   packet->bitfields10.num_oac = qpd->num_oac;
+   packet->bitfields10.num_queues = (qpd->is_debug) ? 0 : qpd->queue_count;
+
+   packet->sh_mem_config = qpd->sh_mem_config;
+   packet->sh_mem_bases = qpd->sh_mem_bases;
+   packet->sh_mem_ape1_base = qpd->sh_mem_ape1_base;
+   packet->sh_mem_ape1_limit = qpd->sh_mem_ape1_limit;
+
+   packet->sh_hidden_private_base_vmid = qpd->sh_hidden_private_base;
+
+   packet->gds_addr_lo = lower_32_bits(qpd->gds_context_area);
+   packet->gds_addr_hi = upper_32_bits(qpd->gds_context_area);
+
+   return 0;
+}
+
+static int pm_runlist_vi(struct packet_manager *pm, uint32_t *buffer,
+   uint64_t ib, size_t ib_size_in_dwords, bool chain)
+{
+   struct pm4_mes_runlist *packet;
+   int concurrent_proc_cnt = 0;
+   struct kfd_dev *kfd = pm->dqm->dev;
+
+   if (WARN_ON(!ib))
+   return -EFAULT;
+
+   /* Determine the number of processes to map together to HW:
+* it can not exceed the number of VMIDs available to the
+* scheduler, and it is determined by the smaller of the number
+* of processes in the runlist and kfd module parameter
+* hws_max_conc_proc.
+* Note: the arbitration between the number of VMIDs and
+* hws_max_conc_proc has been done in
+* kgd2kfd_device_init().
+*/
+   concurrent_proc_cnt = 

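`build_pm4_header()` in this patch packs a PM4 type-3 header through a bitfield union and sets `count` to the packet size in dwords minus 2 (one dword for the header, and the body length encoded as N-1). The sketch below does the same packing with explicit shifts, which avoids relying on compiler bitfield layout. It assumes the usual type-3 layout (type in bits 31:30, count in bits 29:16, opcode in bits 15:8); the opcode values in the test are arbitrary, not real `IT_*` constants.

```c
#include <stddef.h>
#include <stdint.h>

#define PM4_TYPE_3 3u

static uint32_t pm4_type3_header(uint32_t opcode, size_t packet_size_bytes)
{
	/* dwords in the whole packet minus 2: header + "N-1" body encoding */
	uint32_t count = (uint32_t)(packet_size_bytes / 4 - 2);

	return (PM4_TYPE_3 << 30) |
	       ((count & 0x3FFFu) << 16) |
	       ((opcode & 0xFFu) << 8);
}
```

A 24-byte packet is 6 dwords, so its count field is 4.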
[PATCH 13/21] drm/amdkfd: Add SOC15 interrupt processing support

2018-04-10 Thread Felix Kuehling
Signed-off-by: Shaoyun Liu 
Signed-off-by: Oak Zeng 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/Makefile |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 84 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h   |  2 +
 drivers/gpu/drm/amd/amdkfd/soc15_int.h  | 47 ++
 4 files changed, 134 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/soc15_int.h

diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile
index ff8b5aa..ffd096f 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -37,7 +37,7 @@ amdkfd-y  := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
kfd_device_queue_manager.o kfd_device_queue_manager_cik.o \
kfd_device_queue_manager_vi.o kfd_device_queue_manager_v9.o \
kfd_interrupt.o kfd_events.o cik_event_interrupt.o \
-   kfd_dbgdev.o kfd_dbgmgr.o kfd_crat.o
+   kfd_int_process_v9.o kfd_dbgdev.o kfd_dbgmgr.o kfd_crat.o
 
 ifneq ($(CONFIG_AMD_IOMMU_V2),)
 amdkfd-y += kfd_iommu.o
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
new file mode 100644
index 000..39d4115
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
@@ -0,0 +1,84 @@
+/*
+ * Copyright 2016-2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include "kfd_priv.h"
+#include "kfd_events.h"
+#include "soc15_int.h"
+
+
+static bool event_interrupt_isr_v9(struct kfd_dev *dev,
+   const uint32_t *ih_ring_entry)
+{
+   uint16_t source_id, client_id, pasid, vmid;
+
+   source_id = SOC15_SOURCE_ID_FROM_IH_ENTRY(ih_ring_entry);
+   client_id = SOC15_CLIENT_ID_FROM_IH_ENTRY(ih_ring_entry);
+   pasid = SOC15_PASID_FROM_IH_ENTRY(ih_ring_entry);
+   vmid = SOC15_VMID_FROM_IH_ENTRY(ih_ring_entry);
+
+   if (pasid) {
+   const uint32_t *data = ih_ring_entry;
+
+   pr_debug("client id 0x%x, source id %d, pasid 0x%x. raw data:\n",
+client_id, source_id, pasid);
+   pr_debug("%8X, %8X, %8X, %8X, %8X, %8X, %8X, %8X.\n",
+data[0], data[1], data[2], data[3],
+data[4], data[5], data[6], data[7]);
+   }
+
+   return (pasid != 0) &&
+   (source_id == SOC15_INTSRC_CP_END_OF_PIPE ||
+source_id == SOC15_INTSRC_SDMA_TRAP ||
+source_id == SOC15_INTSRC_SQ_INTERRUPT_MSG ||
+source_id == SOC15_INTSRC_CP_BAD_OPCODE);
+}
+
+static void event_interrupt_wq_v9(struct kfd_dev *dev,
+   const uint32_t *ih_ring_entry)
+{
+   uint16_t source_id, client_id, pasid, vmid;
+   uint32_t context_id;
+
+   source_id = SOC15_SOURCE_ID_FROM_IH_ENTRY(ih_ring_entry);
+   client_id = SOC15_CLIENT_ID_FROM_IH_ENTRY(ih_ring_entry);
+   pasid = SOC15_PASID_FROM_IH_ENTRY(ih_ring_entry);
+   vmid = SOC15_VMID_FROM_IH_ENTRY(ih_ring_entry);
+   context_id = SOC15_CONTEXT_ID0_FROM_IH_ENTRY(ih_ring_entry);
+
+   if (source_id == SOC15_INTSRC_CP_END_OF_PIPE)
+   kfd_signal_event_interrupt(pasid, context_id, 32);
+   else if (source_id == SOC15_INTSRC_SDMA_TRAP)
+   kfd_signal_event_interrupt(pasid, context_id & 0xfff, 28);
+   else if (source_id == SOC15_INTSRC_SQ_INTERRUPT_MSG)
+   kfd_signal_event_interrupt(pasid, context_id & 0xff, 24);
+   else if (source_id == SOC15_INTSRC_CP_BAD_OPCODE)
+   kfd_signal_hw_exception_event(pasid);
+   else if (client_id == SOC15_IH_CLIENTID_VMC ||
+

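`event_interrupt_wq_v9()` passes `kfd_signal_event_interrupt()` a context ID together with the number of valid ID bits (32, 28 or 24, depending on the interrupt source). The hex masks on those lines look inconsistent with the bit counts in this archive copy, so the sketch below simply derives the mask from the bit count. It illustrates the idea only; `valid_bits_mask()` and `partial_id()` are hypothetical names, not amdkfd code.

```c
#include <stdint.h>

/* Mask covering the low `bits` bits of a 32-bit context ID. */
static uint32_t valid_bits_mask(unsigned int bits)
{
	/* Shifting a 32-bit value by 32 is undefined, so handle it explicitly. */
	return bits >= 32 ? 0xFFFFFFFFu : (1u << bits) - 1u;
}

/* Keep only the bits of the context ID that the source guarantees. */
static uint32_t partial_id(uint32_t context_id, unsigned int valid_bits)
{
	return context_id & valid_bits_mask(valid_bits);
}
```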
[PATCH 14/21] drm/amdkfd: Fix goto usage

2018-04-10 Thread Felix Kuehling
Missed a spot in previous cleanup commit:
Remove gotos that do not feature any common cleanup, and use gotos
instead of repeating cleanup commands.

According to kernel.org: "The goto statement comes in handy when a
function exits from multiple locations and some common work such as
cleanup has to be done. If there is no cleanup needed then just return
directly."

Signed-off-by: Kent Russell 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
index 69f4964..23e586b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
@@ -232,18 +232,16 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
 * make sure calling functions know
 * acquire_packet_buffer() failed
 */
-   *buffer_ptr = NULL;
-   return -ENOMEM;
+   goto err_no_space;
}
 
if (wptr + packet_size_in_dwords >= queue_size_dwords) {
/* make sure after rolling back to position 0, there is
 * still enough space.
 */
-   if (packet_size_in_dwords >= rptr) {
-   *buffer_ptr = NULL;
-   return -ENOMEM;
-   }
+   if (packet_size_in_dwords >= rptr)
+   goto err_no_space;
+
/* fill nops, roll back and start at position 0 */
while (wptr > 0) {
queue_address[wptr] = kq->nop_packet;
@@ -255,6 +253,10 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
kq->pending_wptr = wptr + packet_size_in_dwords;
 
return 0;
+
+err_no_space:
+   *buffer_ptr = NULL;
+   return -ENOMEM;
 }
 
 static void submit_packet(struct kernel_queue *kq)
-- 
2.7.4



[PATCH 07/21] drm/amdkfd: Clean up KFD_MMAP_ offset handling

2018-04-10 Thread Felix Kuehling
From: Harish Kasiviswanathan 

Use bit-rotate for better clarity and remove _MASK from the #defines as
these represent mmap types.

Centralize all the parsing of the mmap offset in kfd_mmap and add device
parameter to doorbell and reserved_mem map functions.

Encode gpu_id into upper bits of vm_pgoff. This frees up the lower bits
for encoding the doorbell ID on Vega10.

Signed-off-by: Harish Kasiviswanathan 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 35 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c |  9 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_events.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 38 ---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  8 +++
 5 files changed, 59 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index b5e5f0e..f6b35f4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -292,7 +292,8 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
 
 
/* Return gpu_id as doorbell offset for mmap usage */
-   args->doorbell_offset = (KFD_MMAP_DOORBELL_MASK | args->gpu_id);
+   args->doorbell_offset = KFD_MMAP_TYPE_DOORBELL;
+   args->doorbell_offset |= KFD_MMAP_GPU_ID(args->gpu_id);
args->doorbell_offset <<= PAGE_SHIFT;
 
mutex_unlock(&p->mutex);
@@ -1645,23 +1646,33 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 static int kfd_mmap(struct file *filp, struct vm_area_struct *vma)
 {
struct kfd_process *process;
+   struct kfd_dev *dev = NULL;
+   unsigned long vm_pgoff;
+   unsigned int gpu_id;
 
process = kfd_get_process(current);
if (IS_ERR(process))
return PTR_ERR(process);
 
-   if ((vma->vm_pgoff & KFD_MMAP_DOORBELL_MASK) ==
-   KFD_MMAP_DOORBELL_MASK) {
-   vma->vm_pgoff = vma->vm_pgoff ^ KFD_MMAP_DOORBELL_MASK;
-   return kfd_doorbell_mmap(process, vma);
-   } else if ((vma->vm_pgoff & KFD_MMAP_EVENTS_MASK) ==
-   KFD_MMAP_EVENTS_MASK) {
-   vma->vm_pgoff = vma->vm_pgoff ^ KFD_MMAP_EVENTS_MASK;
+   vm_pgoff = vma->vm_pgoff;
+   vma->vm_pgoff = KFD_MMAP_OFFSET_VALUE_GET(vm_pgoff);
+   gpu_id = KFD_MMAP_GPU_ID_GET(vm_pgoff);
+   if (gpu_id)
+   dev = kfd_device_by_id(gpu_id);
+
+   switch (vm_pgoff & KFD_MMAP_TYPE_MASK) {
+   case KFD_MMAP_TYPE_DOORBELL:
+   if (!dev)
+   return -ENODEV;
+   return kfd_doorbell_mmap(dev, process, vma);
+
+   case KFD_MMAP_TYPE_EVENTS:
return kfd_event_mmap(process, vma);
-   } else if ((vma->vm_pgoff & KFD_MMAP_RESERVED_MEM_MASK) ==
-   KFD_MMAP_RESERVED_MEM_MASK) {
-   vma->vm_pgoff = vma->vm_pgoff ^ KFD_MMAP_RESERVED_MEM_MASK;
-   return kfd_reserved_mem_mmap(process, vma);
+
+   case KFD_MMAP_TYPE_RESERVED_MEM:
+   if (!dev)
+   return -ENODEV;
+   return kfd_reserved_mem_mmap(dev, process, vma);
}
 
return -EFAULT;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
index 4840314..efc59de 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
@@ -126,15 +126,10 @@ void kfd_doorbell_fini(struct kfd_dev *kfd)
iounmap(kfd->doorbell_kernel_ptr);
 }
 
-int kfd_doorbell_mmap(struct kfd_process *process, struct vm_area_struct *vma)
+int kfd_doorbell_mmap(struct kfd_dev *dev, struct kfd_process *process,
+ struct vm_area_struct *vma)
 {
phys_addr_t address;
-   struct kfd_dev *dev;
-
-   /* Find kfd device according to gpu id */
-   dev = kfd_device_by_id(vma->vm_pgoff);
-   if (!dev)
-   return -EINVAL;
 
/*
 * For simplicitly we only allow mapping of the entire doorbell
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 4890a90..bccf2f7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -345,7 +345,7 @@ int kfd_event_create(struct file *devkfd, struct kfd_process *p,
case KFD_EVENT_TYPE_DEBUG:
ret = create_signal_event(devkfd, p, ev);
if (!ret) {
-   *event_page_offset = KFD_MMAP_EVENTS_MASK;
+   *event_page_offset = KFD_MMAP_TYPE_EVENTS;
*event_page_offset <<= PAGE_SHIFT;
*event_slot_index = ev->event_id;
}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 

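The commit message above describes packing an mmap type and the gpu_id into the high bits of the mmap offset, leaving the low bits free for values such as a doorbell ID. The kfd_priv.h hunk that defines the real macros is cut off in this archive, so the bit positions below (type in offset bits 62-63, a 16-bit gpu_id in bits 46-61, 4 KiB pages) are assumptions chosen for illustration, and every `SKETCH_*` name and `sketch_*` function is hypothetical. The constants are pre-shifted by the page shift because the kernel receives `vm_pgoff`, i.e. the byte offset shifted right by PAGE_SHIFT.

```c
#include <stdint.h>

/*
 * Hypothetical layout, assuming 4 KiB pages: mmap type in byte-offset bits
 * 62-63, a 16-bit gpu_id in bits 46-61, and a per-type value (for example a
 * doorbell ID) in the bits between the page offset and the gpu_id.
 */
#define SKETCH_PAGE_SHIFT     12
#define SKETCH_TYPE_SHIFT     (62 - SKETCH_PAGE_SHIFT)
#define SKETCH_TYPE_MASK      (0x3ULL << SKETCH_TYPE_SHIFT)
#define SKETCH_TYPE_DOORBELL  (0x3ULL << SKETCH_TYPE_SHIFT)
#define SKETCH_TYPE_EVENTS    (0x2ULL << SKETCH_TYPE_SHIFT)
#define SKETCH_GPU_ID_SHIFT   (46 - SKETCH_PAGE_SHIFT)
#define SKETCH_GPU_ID_MASK    (0xFFFFULL << SKETCH_GPU_ID_SHIFT)
#define SKETCH_VALUE_MASK     ((1ULL << SKETCH_GPU_ID_SHIFT) - 1)

/* User side: build the byte offset passed to mmap(). */
static uint64_t sketch_mmap_offset(uint64_t type, uint32_t gpu_id, uint64_t value)
{
	uint64_t pgoff = type;

	pgoff |= ((uint64_t)gpu_id << SKETCH_GPU_ID_SHIFT) & SKETCH_GPU_ID_MASK;
	pgoff |= value & SKETCH_VALUE_MASK;
	return pgoff << SKETCH_PAGE_SHIFT;
}

/* Kernel side: split vm_pgoff back into its three fields. */
static uint64_t sketch_type(uint64_t pgoff)
{
	return pgoff & SKETCH_TYPE_MASK;
}

static uint32_t sketch_gpu_id(uint64_t pgoff)
{
	return (uint32_t)((pgoff & SKETCH_GPU_ID_MASK) >> SKETCH_GPU_ID_SHIFT);
}

static uint64_t sketch_value(uint64_t pgoff)
{
	return pgoff & SKETCH_VALUE_MASK;
}
```

Decoding everything once at the top of the mmap handler, as the new kfd_mmap() does, means the individual map functions receive the device pointer and a clean `vm_pgoff` instead of each re-parsing the offset.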
[PATCH 04/21] drm/amdgpu: Add GFXv9 kfd2kgd interface functions

2018-04-10 Thread Felix Kuehling
Signed-off-by: John Bridgman 
Signed-off-by: Shaoyun Liu 
Signed-off-by: Jay Cornwall 
Signed-off-by: Yong Zhao 
Signed-off-by: Felix Kuehling 
---
 MAINTAINERS   |1 +
 drivers/gpu/drm/amd/amdgpu/Makefile   |3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c|4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 1043 +
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c |1 +
 6 files changed, 1052 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 6804170..9bfb765 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -766,6 +766,7 @@ F:  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
 F: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
 F: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
 F: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+F: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
 F: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
 F: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
 F: drivers/gpu/drm/amd/amdkfd/
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 2ca2b51..f300202 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -130,7 +130,8 @@ amdgpu-y += \
 amdgpu_amdkfd.o \
 amdgpu_amdkfd_fence.o \
 amdgpu_amdkfd_gpuvm.o \
-amdgpu_amdkfd_gfx_v8.o
+amdgpu_amdkfd_gfx_v8.o \
+amdgpu_amdkfd_gfx_v9.o
 
 # add cgs
 amdgpu-y += amdgpu_cgs.o
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 4d36203..fcd10db 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -92,6 +92,10 @@ void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev)
case CHIP_POLARIS11:
kfd2kgd = amdgpu_amdkfd_gfx_8_0_get_functions();
break;
+   case CHIP_VEGA10:
+   case CHIP_RAVEN:
+   kfd2kgd = amdgpu_amdkfd_gfx_9_0_get_functions();
+   break;
default:
dev_dbg(adev->dev, "kfd not supported on this ASIC\n");
return;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index c3024b1..12367a9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -122,6 +122,7 @@ int amdgpu_amdkfd_submit_ib(struct kgd_dev *kgd, enum kgd_engine_type engine,
 
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_7_get_functions(void);
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_8_0_get_functions(void);
+struct kfd2kgd_calls *amdgpu_amdkfd_gfx_9_0_get_functions(void);
 
 bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
new file mode 100644
index 000..8f37991
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -0,0 +1,1043 @@
+/*
+ * Copyright 2014-2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#define pr_fmt(fmt) "kfd2kgd: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "amdgpu.h"
+#include "amdgpu_amdkfd.h"
+#include "amdgpu_ucode.h"
+#include "soc15_hw_ip.h"
+#include "gc/gc_9_0_offset.h"
+#include "gc/gc_9_0_sh_mask.h"
+#include "vega10_enum.h"
+#include "sdma0/sdma0_4_0_offset.h"
+#include "sdma0/sdma0_4_0_sh_mask.h"
+#include "sdma1/sdma1_4_0_offset.h"
+#include "sdma1/sdma1_4_0_sh_mask.h"
+#include "athub/athub_1_0_offset.h"
+#include "athub/athub_1_0_sh_mask.h"
+#include 

[PATCH 03/21] drm/amdgpu: Add GFXv9 TLB invalidation packet definition

2018-04-10 Thread Felix Kuehling
Signed-off-by: Shaoyun Liu 
Signed-off-by: Jay Cornwall 
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/soc15d.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15d.h b/drivers/gpu/drm/amd/amdgpu/soc15d.h
index 7f408f8..f22f7a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15d.h
+++ b/drivers/gpu/drm/amd/amdgpu/soc15d.h
@@ -268,6 +268,11 @@
 * x=1: tmz_end
 */
 
+#define PACKET3_INVALIDATE_TLBS 0x98
+#  define PACKET3_INVALIDATE_TLBS_DST_SEL(x) ((x) << 0)
+#  define PACKET3_INVALIDATE_TLBS_ALL_HUB(x) ((x) << 4)
+#  define PACKET3_INVALIDATE_TLBS_PASID(x)   ((x) << 5)
+#  define PACKET3_INVALIDATE_TLBS_FLUSH_TYPE(x)  ((x) << 29)
 #define PACKET3_SET_RESOURCES  0xA0
 /* 1. header
  * 2. CONTROL
-- 
2.7.4
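To show how the new field macros combine into a packet, here is a hedged sketch that fills a two-dword INVALIDATE_TLBS packet for one PASID. The `PACKET3_INVALIDATE_TLBS*` macros are copied from the hunk above. The header helper `SKETCH_PACKET3` is a stand-in written for this example and assumes the usual PM4 type-3 header layout (type in bits 30-31, count in bits 16-29, opcode in bits 8-15); the DST_SEL, ALL_HUB and FLUSH_TYPE values passed in are illustrative assumptions, not values taken from this series.

```c
#include <stdint.h>

/* Assumed PM4 type-3 header layout; unsigned to avoid shifting into the sign bit. */
#define SKETCH_PACKET_TYPE3   3U
#define SKETCH_PACKET3(op, n) ((SKETCH_PACKET_TYPE3 << 30) |   \
			       (((op) & 0xFFU) << 8) |         \
			       (((n) & 0x3FFFU) << 16))

/* Field macros as added to soc15d.h by this patch. */
#define PACKET3_INVALIDATE_TLBS               0x98
#define PACKET3_INVALIDATE_TLBS_DST_SEL(x)    ((x) << 0)
#define PACKET3_INVALIDATE_TLBS_ALL_HUB(x)    ((x) << 4)
#define PACKET3_INVALIDATE_TLBS_PASID(x)      ((x) << 5)
#define PACKET3_INVALIDATE_TLBS_FLUSH_TYPE(x) ((x) << 29)

/* Hypothetical helper: header plus one control dword (count field = 0). */
static void sketch_invalidate_tlbs(uint32_t pkt[2], uint32_t pasid)
{
	pkt[0] = SKETCH_PACKET3(PACKET3_INVALIDATE_TLBS, 0);
	pkt[1] = PACKET3_INVALIDATE_TLBS_DST_SEL(1U) |
		 PACKET3_INVALIDATE_TLBS_ALL_HUB(1U) |
		 PACKET3_INVALIDATE_TLBS_PASID(pasid) |
		 PACKET3_INVALIDATE_TLBS_FLUSH_TYPE(0U);
}
```

For PASID 5 this yields a header of 0xC0009800 and a control dword of 0xB1 (DST_SEL bit 0, ALL_HUB bit 4, PASID starting at bit 5).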



[PATCH 00/21] GFXv9/Vega10 support for KFD

2018-04-10 Thread Felix Kuehling
This patch series adds support for GFXv9 GPUs to KFD. In this series it
enables support for Vega10. Full Raven support requires some extra work
that will follow shortly, but some Raven support is already included,
because I didn't go out of my way to keep it out.

Felix Kuehling (19):
  drm/amdgpu: Remove unused interface from kfd2kgd interface
  drm/amd: Update GFXv9 SDMA MQD structure
  drm/amdgpu: Add GFXv9 TLB invalidation packet definition
  drm/amdgpu: Add GFXv9 kfd2kgd interface functions
  drm/amdgpu: Add doorbell routing info to kgd2kfd_shared_resources
  drm/amdkfd: Make doorbell size ASIC-dependent
  drm/amdkfd: Implement doorbell allocation for SOC15
  drm/amdkfd: Move packet writer functions into ASIC-specific file
  drm/amdkfd: Add GFXv9 PM4 packet writer functions
  drm/amdkfd: Add GFXv9 MQD manager
  drm/amdkfd: Add GFXv9 device queue manager
  drm/amdkfd: Add SOC15 interrupt processing support
  drm/amdkfd: Fix goto usage
  drm/amdkfd: Fix kernel queue rollback_packet
  drm/amdkfd: Add 64-bit doorbell and wptr support to kernel queue
  drm/amdkfd: Remove limit on number of GPUs (follow-up)
  drm/amdkfd: Support flat memory apertures for GFXv9
  drm/amdkfd: Add GFXv9 CWSR trap handler
  drm/amdkfd: Add Vega10 topology and device info

Harish Kasiviswanathan (1):
  drm/amdkfd: Clean up KFD_MMAP_ offset handling

welu (1):
  drm/amdkfd: Try to enable atomics for all GPUs

 MAINTAINERS|2 +
 drivers/gpu/drm/amd/amdgpu/Makefile|3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |   26 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c  |   10 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c  |   10 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c  | 1043 ++
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  |1 +
 drivers/gpu/drm/amd/amdgpu/soc15d.h|5 +
 drivers/gpu/drm/amd/amdkfd/Makefile|   10 +-
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm  | 1495 
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   |   42 +-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c  |   11 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c|   89 +-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |  102 +-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |2 +
 .../drm/amd/amdkfd/kfd_device_queue_manager_v9.c   |   84 ++
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c  |   65 +-
 drivers/gpu/drm/amd/amdkfd/kfd_events.c|2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c   |  119 +-
 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c|   84 ++
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c  |   39 +-
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h  |7 +-
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c  |9 +
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c   |  340 +
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c   |  319 +
 drivers/gpu/drm/amd/amdkfd/kfd_module.c|5 +
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c   |3 +
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c|  443 ++
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c|  385 +
 drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h|  583 
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h  |  106 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c   |   40 +-
 .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c |   12 +-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c  |6 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h  |1 +
 drivers/gpu/drm/amd/amdkfd/soc15_int.h |   47 +
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h|   20 +-
 drivers/gpu/drm/amd/include/v9_structs.h   |   48 +-
 39 files changed, 5118 insertions(+), 501 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h
 create mode 100644 drivers/gpu/drm/amd/amdkfd/soc15_int.h

-- 
2.7.4



[PATCH 01/21] drm/amdgpu: Remove unused interface from kfd2kgd interface

2018-04-10 Thread Felix Kuehling
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c | 10 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c | 10 --
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h   |  5 -
 3 files changed, 25 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index ea54e53..0ff36d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -98,8 +98,6 @@ static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid,
 static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
unsigned int vmid);
 
-static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id,
-   uint32_t hpd_size, uint64_t hpd_gpu_addr);
 static int kgd_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id);
 static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
uint32_t queue_id, uint32_t __user *wptr,
@@ -183,7 +181,6 @@ static const struct kfd2kgd_calls kfd2kgd = {
.free_pasid = amdgpu_pasid_free,
.program_sh_mem_settings = kgd_program_sh_mem_settings,
.set_pasid_vmid_mapping = kgd_set_pasid_vmid_mapping,
-   .init_pipeline = kgd_init_pipeline,
.init_interrupts = kgd_init_interrupts,
.hqd_load = kgd_hqd_load,
.hqd_sdma_load = kgd_hqd_sdma_load,
@@ -309,13 +306,6 @@ static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
return 0;
 }
 
-static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id,
-   uint32_t hpd_size, uint64_t hpd_gpu_addr)
-{
-   /* amdgpu owns the per-pipe state */
-   return 0;
-}
-
 static int kgd_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id)
 {
struct amdgpu_device *adev = get_amdgpu_device(kgd);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index 89264c9..6ef9762 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
@@ -57,8 +57,6 @@ static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid,
uint32_t sh_mem_bases);
 static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
unsigned int vmid);
-static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id,
-   uint32_t hpd_size, uint64_t hpd_gpu_addr);
 static int kgd_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id);
 static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
uint32_t queue_id, uint32_t __user *wptr,
@@ -141,7 +139,6 @@ static const struct kfd2kgd_calls kfd2kgd = {
.free_pasid = amdgpu_pasid_free,
.program_sh_mem_settings = kgd_program_sh_mem_settings,
.set_pasid_vmid_mapping = kgd_set_pasid_vmid_mapping,
-   .init_pipeline = kgd_init_pipeline,
.init_interrupts = kgd_init_interrupts,
.hqd_load = kgd_hqd_load,
.hqd_sdma_load = kgd_hqd_sdma_load,
@@ -270,13 +267,6 @@ static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
return 0;
 }
 
-static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id,
-   uint32_t hpd_size, uint64_t hpd_gpu_addr)
-{
-   /* amdgpu owns the per-pipe state */
-   return 0;
-}
-
 static int kgd_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id)
 {
struct amdgpu_device *adev = get_amdgpu_device(kgd);
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 286cfe7..7cf3506 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -173,8 +173,6 @@ struct tile_config {
  * @set_pasid_vmid_mapping: Exposes pasid/vmid pair to the H/W for no cp
  * scheduling mode. Only used for no cp scheduling mode.
  *
- * @init_pipeline: Initialized the compute pipelines.
- *
  * @hqd_load: Loads the mqd structure to a H/W hqd slot. used only for no cp
  * sceduling mode.
  *
@@ -274,9 +272,6 @@ struct kfd2kgd_calls {
int (*set_pasid_vmid_mapping)(struct kgd_dev *kgd, unsigned int pasid,
unsigned int vmid);
 
-   int (*init_pipeline)(struct kgd_dev *kgd, uint32_t pipe_id,
-   uint32_t hpd_size, uint64_t hpd_gpu_addr);
-
int (*init_interrupts)(struct kgd_dev *kgd, uint32_t pipe_id);
 
int (*hqd_load)(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 02/21] drm/amd: Update GFXv9 SDMA MQD structure

2018-04-10 Thread Felix Kuehling
This matches what the HWS firmware expects on GFXv9 chips.

Signed-off-by: Felix Kuehling 
---
 MAINTAINERS  |  1 +
 drivers/gpu/drm/amd/include/v9_structs.h | 48 
 2 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 004d2c1..6804170 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -772,6 +772,7 @@ F:  drivers/gpu/drm/amd/amdkfd/
 F: drivers/gpu/drm/amd/include/cik_structs.h
 F: drivers/gpu/drm/amd/include/kgd_kfd_interface.h
 F: drivers/gpu/drm/amd/include/vi_structs.h
+F: drivers/gpu/drm/amd/include/v9_structs.h
 F: include/uapi/linux/kfd_ioctl.h
 
 AMD SEATTLE DEVICE TREE SUPPORT
diff --git a/drivers/gpu/drm/amd/include/v9_structs.h b/drivers/gpu/drm/amd/include/v9_structs.h
index 2fb25ab..ceaf493 100644
--- a/drivers/gpu/drm/amd/include/v9_structs.h
+++ b/drivers/gpu/drm/amd/include/v9_structs.h
@@ -29,10 +29,10 @@ struct v9_sdma_mqd {
uint32_t sdmax_rlcx_rb_base;
uint32_t sdmax_rlcx_rb_base_hi;
uint32_t sdmax_rlcx_rb_rptr;
+   uint32_t sdmax_rlcx_rb_rptr_hi;
uint32_t sdmax_rlcx_rb_wptr;
+   uint32_t sdmax_rlcx_rb_wptr_hi;
uint32_t sdmax_rlcx_rb_wptr_poll_cntl;
-   uint32_t sdmax_rlcx_rb_wptr_poll_addr_hi;
-   uint32_t sdmax_rlcx_rb_wptr_poll_addr_lo;
uint32_t sdmax_rlcx_rb_rptr_addr_hi;
uint32_t sdmax_rlcx_rb_rptr_addr_lo;
uint32_t sdmax_rlcx_ib_cntl;
@@ -44,29 +44,29 @@ struct v9_sdma_mqd {
uint32_t sdmax_rlcx_skip_cntl;
uint32_t sdmax_rlcx_context_status;
uint32_t sdmax_rlcx_doorbell;
-   uint32_t sdmax_rlcx_virtual_addr;
-   uint32_t sdmax_rlcx_ape1_cntl;
+   uint32_t sdmax_rlcx_status;
uint32_t sdmax_rlcx_doorbell_log;
-   uint32_t reserved_22;
-   uint32_t reserved_23;
-   uint32_t reserved_24;
-   uint32_t reserved_25;
-   uint32_t reserved_26;
-   uint32_t reserved_27;
-   uint32_t reserved_28;
-   uint32_t reserved_29;
-   uint32_t reserved_30;
-   uint32_t reserved_31;
-   uint32_t reserved_32;
-   uint32_t reserved_33;
-   uint32_t reserved_34;
-   uint32_t reserved_35;
-   uint32_t reserved_36;
-   uint32_t reserved_37;
-   uint32_t reserved_38;
-   uint32_t reserved_39;
-   uint32_t reserved_40;
-   uint32_t reserved_41;
+   uint32_t sdmax_rlcx_watermark;
+   uint32_t sdmax_rlcx_doorbell_offset;
+   uint32_t sdmax_rlcx_csa_addr_lo;
+   uint32_t sdmax_rlcx_csa_addr_hi;
+   uint32_t sdmax_rlcx_ib_sub_remain;
+   uint32_t sdmax_rlcx_preempt;
+   uint32_t sdmax_rlcx_dummy_reg;
+   uint32_t sdmax_rlcx_rb_wptr_poll_addr_hi;
+   uint32_t sdmax_rlcx_rb_wptr_poll_addr_lo;
+   uint32_t sdmax_rlcx_rb_aql_cntl;
+   uint32_t sdmax_rlcx_minor_ptr_update;
+   uint32_t sdmax_rlcx_midcmd_data0;
+   uint32_t sdmax_rlcx_midcmd_data1;
+   uint32_t sdmax_rlcx_midcmd_data2;
+   uint32_t sdmax_rlcx_midcmd_data3;
+   uint32_t sdmax_rlcx_midcmd_data4;
+   uint32_t sdmax_rlcx_midcmd_data5;
+   uint32_t sdmax_rlcx_midcmd_data6;
+   uint32_t sdmax_rlcx_midcmd_data7;
+   uint32_t sdmax_rlcx_midcmd_data8;
+   uint32_t sdmax_rlcx_midcmd_cntl;
uint32_t reserved_42;
uint32_t reserved_43;
uint32_t reserved_44;
-- 
2.7.4



[PATCH 06/21] drm/amdkfd: Make doorbell size ASIC-dependent

2018-04-10 Thread Felix Kuehling
This prepares for GFXv9 (Vega10), which has 64-bit doorbells.
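A small sketch of why the doorbell size has to become a per-ASIC parameter: the per-process doorbell slice scales with it. The helper mirrors the idea of kfd_doorbell_process_slice() in the diff below (doorbell size times the maximum number of queues per process, rounded up), but the page size of 4096 bytes, the limit of 1024 queues per process, and the rounding granularity are assumptions for illustration, and all `SKETCH_*`/`sketch_*` names are hypothetical.

```c
#include <stddef.h>

/* Assumed values for illustration: 4 KiB pages, 1024 queues per process. */
#define SKETCH_PAGE_SIZE            4096u
#define SKETCH_MAX_QUEUES_PER_PROC  1024u

/* Round x up to the next multiple of y (same idea as the kernel's roundup()). */
static size_t sketch_roundup(size_t x, size_t y)
{
	return ((x + y - 1) / y) * y;
}

/* Bytes of doorbell aperture reserved for each process. */
static size_t sketch_process_slice(size_t doorbell_size)
{
	return sketch_roundup(doorbell_size * SKETCH_MAX_QUEUES_PER_PROC,
			      SKETCH_PAGE_SIZE);
}
```

Under these assumptions a 4-byte doorbell (all the ASICs updated in this patch) needs one page per process, while an 8-byte doorbell needs two, so a hard-coded `KFD_SIZE_OF_DOORBELL_IN_BYTES` of 4 would undersize the slice on Vega10.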

Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   | 10 +++
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 48 ---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  7 +++--
 3 files changed, 39 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 7b57995..f563acb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -41,6 +41,7 @@ static const struct kfd_device_info kaveri_device_info = {
.max_pasid_bits = 16,
/* max num of queues for KV.TODO should be a dynamic value */
.max_no_of_hqd  = 24,
+   .doorbell_size  = 4,
.ih_ring_entry_size = 4 * sizeof(uint32_t),
.event_interrupt_class = &event_interrupt_class_cik,
.num_of_watch_points = 4,
@@ -55,6 +56,7 @@ static const struct kfd_device_info carrizo_device_info = {
.max_pasid_bits = 16,
/* max num of queues for CZ.TODO should be a dynamic value */
.max_no_of_hqd  = 24,
+   .doorbell_size  = 4,
.ih_ring_entry_size = 4 * sizeof(uint32_t),
.event_interrupt_class = &event_interrupt_class_cik,
.num_of_watch_points = 4,
@@ -70,6 +72,7 @@ static const struct kfd_device_info hawaii_device_info = {
.max_pasid_bits = 16,
/* max num of queues for KV.TODO should be a dynamic value */
.max_no_of_hqd  = 24,
+   .doorbell_size  = 4,
.ih_ring_entry_size = 4 * sizeof(uint32_t),
.event_interrupt_class = &event_interrupt_class_cik,
.num_of_watch_points = 4,
@@ -83,6 +86,7 @@ static const struct kfd_device_info tonga_device_info = {
.asic_family = CHIP_TONGA,
.max_pasid_bits = 16,
.max_no_of_hqd  = 24,
+   .doorbell_size  = 4,
.ih_ring_entry_size = 4 * sizeof(uint32_t),
.event_interrupt_class = &event_interrupt_class_cik,
.num_of_watch_points = 4,
@@ -96,6 +100,7 @@ static const struct kfd_device_info tonga_vf_device_info = {
.asic_family = CHIP_TONGA,
.max_pasid_bits = 16,
.max_no_of_hqd  = 24,
+   .doorbell_size  = 4,
.ih_ring_entry_size = 4 * sizeof(uint32_t),
.event_interrupt_class = &event_interrupt_class_cik,
.num_of_watch_points = 4,
@@ -109,6 +114,7 @@ static const struct kfd_device_info fiji_device_info = {
.asic_family = CHIP_FIJI,
.max_pasid_bits = 16,
.max_no_of_hqd  = 24,
+   .doorbell_size  = 4,
.ih_ring_entry_size = 4 * sizeof(uint32_t),
.event_interrupt_class = &event_interrupt_class_cik,
.num_of_watch_points = 4,
@@ -122,6 +128,7 @@ static const struct kfd_device_info fiji_vf_device_info = {
.asic_family = CHIP_FIJI,
.max_pasid_bits = 16,
.max_no_of_hqd  = 24,
+   .doorbell_size  = 4,
.ih_ring_entry_size = 4 * sizeof(uint32_t),
.event_interrupt_class = &event_interrupt_class_cik,
.num_of_watch_points = 4,
@@ -136,6 +143,7 @@ static const struct kfd_device_info polaris10_device_info = {
.asic_family = CHIP_POLARIS10,
.max_pasid_bits = 16,
.max_no_of_hqd  = 24,
+   .doorbell_size  = 4,
.ih_ring_entry_size = 4 * sizeof(uint32_t),
.event_interrupt_class = &event_interrupt_class_cik,
.num_of_watch_points = 4,
@@ -149,6 +157,7 @@ static const struct kfd_device_info polaris10_vf_device_info = {
.asic_family = CHIP_POLARIS10,
.max_pasid_bits = 16,
.max_no_of_hqd  = 24,
+   .doorbell_size  = 4,
.ih_ring_entry_size = 4 * sizeof(uint32_t),
.event_interrupt_class = &event_interrupt_class_cik,
.num_of_watch_points = 4,
@@ -162,6 +171,7 @@ static const struct kfd_device_info polaris11_device_info = {
.asic_family = CHIP_POLARIS11,
.max_pasid_bits = 16,
.max_no_of_hqd  = 24,
+   .doorbell_size  = 4,
.ih_ring_entry_size = 4 * sizeof(uint32_t),
.event_interrupt_class = _interrupt_class_cik,
.num_of_watch_points = 4,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
index ebb4da14..4840314 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
@@ -33,7 +33,6 @@
 
 static DEFINE_IDA(doorbell_ida);
 static unsigned int max_doorbell_slices;
-#define KFD_SIZE_OF_DOORBELL_IN_BYTES 4
 
 /*
  * Each device exposes a doorbell aperture, a PCI MMIO aperture that
@@ -50,9 +49,9 @@ static unsigned int max_doorbell_slices;
  */
 
 /* # of doorbell bytes allocated for each process. */
-static inline size_t doorbell_process_allocation(void)
+static size_t kfd_doorbell_process_slice(struct kfd_dev *kfd)
 {
-   return roundup(KFD_SIZE_OF_DOORBELL_IN_BYTES *
+   return roundup(kfd->device_info->doorbell_size *
KFD_MAX_NUM_OF_QUEUES_PER_PROCESS,

RE: RFC for a render API to support adaptive sync and VRR

2018-04-10 Thread Cyr, Aric
> From: Manasi Navare [mailto:manasi.d.nav...@intel.com]
> Sent: Tuesday, April 10, 2018 17:37
> To: Wentland, Harry 
> Cc: amd-gfx mailing list; Daniel Vetter; Haehnle, Nicolai; Daenzer, Michel;
>     Deucher, Alexander; Koenig, Christian; dri-devel; Cyr, Aric; Koo, Anthony
> Subject: Re: RFC for a render API to support adaptive sync and VRR
> 
> On Tue, Apr 10, 2018 at 11:03:02AM -0400, Harry Wentland wrote:
> > Adding Anthony and Aric who've been working on Freesync with DC on other 
> > OSes for a while.
> >
> > On 2018-04-09 05:45 PM, Manasi Navare wrote:
> > > Thanks for initiating the discussion. Find my comments below:
> > >
> > > On Mon, Apr 09, 2018 at 04:00:21PM -0400, Harry Wentland wrote:
> > >> Adding dri-devel, which I should've included from the start.
> > >>
> > >> On 2018-04-09 03:56 PM, Harry Wentland wrote:
> > >>> === What is adaptive sync and VRR? ===
> > >>>
> > >>> Adaptive sync has been part of the DisplayPort spec for a while now and
> > >>> allows graphics adapters to drive displays with varying frame timings.
> > >>> VRR (variable refresh rate) is essentially the same, but defined for HDMI.
> > >>>
> > >>>
> > >>>
> > >>> === Why allow variable frame timings? ===
> > >>>
> > >>> Variable render times don't align with fixed refresh rates, leading to
> > >>> stuttering, tearing, and/or input lag.
> > >>>
> > >>> e.g. (rc = render completion, dr = display refresh)
> > >>>
> > >>> rc     B   C       D  E   F
> > >>> dr  A   B   C   C   D   E   F
> > >>>                 ^  ^
> > >>>                 |  missed display refresh
> > >>>                 frame repeated twice
> > >>>
> > >>>
> > >>>
> > >>> === Other use cases of adaptive sync ===
> > >>>
> > >>> Besides the variable render case, adaptive sync also allows adjustment
> > >>> of refresh rates without a mode change. One such use case would be
> > >>> 24 Hz video.
> > >>>
> > >
> > > One of the advantages here is that when the render speed is slower than
> > > the display refresh rate, we are stretching the vertical blanking
> > > interval, so the display adapters can follow a "draw fast and then go
> > > idle" approach. This gives power savings when the render rate is lower
> > > than the display refresh rate.
> >
> > Are you talking about a use case, such as an idle desktop, where the 
> > renders are quite sporadic?
> >
> 
> I was referring to a case where the render rate is lower, say 24Hz, but the
> display rate is fixed at 60Hz, which means we are pretty much displaying the
> same frame twice. But with Adaptive Sync, the display rate would be lowered
> to 24Hz and the vertical blanking time would be stretched, so instead of
> drawing the same frame twice, the system is idle in that extra blanking
> time, giving some power savings.

Hi Manasi,

Assuming the panel could go down to 24Hz, this would be possible.  
If it was a game, it'd naturally do this since the refresh rate would track the 
render rate. 

For a video where you have an adaptive sync capable player, it could request a 
fixed duration to achieve the same thing.
Most panels do not support as low as 24Hz however, so usually in the video case 
at least you'd end up with say 48Hz with the driver/HW providing automatic 
frame doubling.
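The 24Hz-content-on-a-48Hz-minimum-panel case above amounts to picking an integer repeat count. A minimal sketch, assuming the driver simply chooses the smallest multiple of the content rate that lands inside the panel's reported range; the function name and the use of millihertz are illustrative choices, and real driver/hardware frame-doubling logic is not described in this thread.

```c
#include <stdint.h>

/*
 * Hypothetical helper: smallest integer m such that content_mhz * m lies
 * inside the panel's [min_mhz, max_mhz] refresh range, or 0 if no multiple
 * fits. Rates are in millihertz so 23.976 Hz content is exact (23976).
 */
static uint32_t sketch_frame_multiplier(uint32_t content_mhz,
					uint32_t min_mhz, uint32_t max_mhz)
{
	if (content_mhz == 0)
		return 0;
	for (uint32_t m = 1; (uint64_t)content_mhz * m <= max_mhz; m++) {
		if ((uint64_t)content_mhz * m >= min_mhz)
			return m;
	}
	return 0;
}
```

With a 48-144Hz panel, 24Hz content gets m = 2 (each frame shown twice at 48Hz), while a panel that reaches 24Hz needs no repetition (m = 1).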

> > >
> > >>>
> > >>>
> > >>> === A DRM render API to support variable refresh rates ===
> > >>>
> > >>> In order to benefit from adaptive sync and VRR, userland needs a way to
> > >>> let us know whether to vary frame timings or to target a different frame
> > >>> time. These can be provided as atomic properties on a CRTC:
> > >>>  * bool variable_refresh_compatible
> > >>>  * int  target_frame_duration_ns (nanosecond frame duration)
> > >>>
> > >>> This gives us the following cases:
> > >>>
> > >>> variable_refresh_compatible = 0, target_frame_duration_ns = 0
> > >>>  * drive monitor at timing's normal refresh rate
> > >>>
> > >>> variable_refresh_compatible = 1, target_frame_duration_ns = 0
> > >>>  * send new frame to monitor as soon as it's available, if within 
> > >>> min/max of monitor's reported capabilities
> > >>>
> > >>> variable_refresh_compatible = 0/1, target_frame_duration_ns = > 0
> > >>>  * send new frame to monitor with the specified target_frame_duration_ns
> > >>>
> > >>> When a target_frame_duration_ns or variable_refresh_compatible cannot
> > >>> be supported, the atomic check will reject the commit.
> > >>>
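The three property combinations proposed above map naturally onto an atomic-check decision. The sketch below is one hypothetical reading of them: the enum, struct and function names are invented, the monitor range is passed in as minimum/maximum frame durations in nanoseconds, and rejecting a non-zero target on a fixed-refresh sink is an assumption about what "cannot be supported" means.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of an atomic check for the two proposed properties. */
enum sketch_vrr_mode {
	SKETCH_FIXED_REFRESH,   /* drive the timing's normal refresh rate */
	SKETCH_VARIABLE,        /* send each frame as soon as it is ready */
	SKETCH_FIXED_DURATION,  /* hold every frame for target_frame_duration_ns */
	SKETCH_REJECT           /* atomic check fails the commit */
};

struct sketch_crtc_state {
	bool variable_refresh_compatible;
	int64_t target_frame_duration_ns;
};

/*
 * min_ns/max_ns: shortest and longest frame durations the monitor reports
 * (1e9 / max_hz and 1e9 / min_hz); monitor_vrr is false for a fixed-refresh sink.
 */
static enum sketch_vrr_mode
sketch_vrr_check(const struct sketch_crtc_state *s, bool monitor_vrr,
		 int64_t min_ns, int64_t max_ns)
{
	int64_t t = s->target_frame_duration_ns;

	if (t < 0)
		return SKETCH_REJECT;
	if (t > 0) /* applies whether variable_refresh_compatible is 0 or 1 */
		return (monitor_vrr && t >= min_ns && t <= max_ns) ?
		       SKETCH_FIXED_DURATION : SKETCH_REJECT;
	if (s->variable_refresh_compatible)
		return monitor_vrr ? SKETCH_VARIABLE : SKETCH_REJECT;
	return SKETCH_FIXED_REFRESH;
}
```

For a 48-144Hz monitor, a 60Hz target (about 16.7 ms) passes, while a 24Hz target (about 41.7 ms) is outside the range and is rejected, matching the rule that unsupported requests fail the atomic check.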
> > >
> > > What I would like is two sets of properties on a CRTC or preferably on a 
> > > connector:
> > >
> > > KMD properties that UMD can query:
> > > * vrr_capable -  This will be an immutable property for exposing 
> 
