Re: [PATCH] drm/amdgpu: add VCN to firmware query interface

2018-03-19 Thread Huang Rui
On Fri, Mar 16, 2018 at 11:07:08AM -0500, Alex Deucher wrote:
> Need to be able to query the VCN firmware version from
> userspace to determine supported features, etc.
> 

Reviewed-by: Huang Rui 

> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 12 ++++++++++++
>  include/uapi/drm/amdgpu_drm.h   |  2 ++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index 04a2219ce25e..9290a7b81950 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -213,6 +213,10 @@ static int amdgpu_firmware_info(struct drm_amdgpu_info_firmware *fw_info,
>   fw_info->ver = adev->uvd.fw_version;
>   fw_info->feature = 0;
>   break;
> + case AMDGPU_INFO_FW_VCN:
> + fw_info->ver = adev->vcn.fw_version;
> + fw_info->feature = 0;
> + break;
>   case AMDGPU_INFO_FW_GMC:
>   fw_info->ver = adev->gmc.fw_version;
>   fw_info->feature = 0;
> @@ -1314,6 +1318,14 @@ static int amdgpu_debugfs_firmware_info(struct seq_file *m, void *data)
>  i, fw_info.feature, fw_info.ver);
>   }
>  
> + /* VCN */
> + query_fw.fw_type = AMDGPU_INFO_FW_VCN;
> + ret = amdgpu_firmware_info(&fw_info, &query_fw, adev);
> + if (ret)
> + return ret;
> + seq_printf(m, "VCN feature version: %u, firmware version: 0x%08x\n",
> +fw_info.feature, fw_info.ver);
> +
>   return 0;
>  }
>  
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index a967e8de5973..68d93b482f42 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -687,6 +687,8 @@ struct drm_amdgpu_cs_chunk_data {
>   #define AMDGPU_INFO_FW_SOS  0x0c
>   /* Subquery id: Query PSP ASD firmware version */
>   #define AMDGPU_INFO_FW_ASD  0x0d
> + /* Subquery id: Query VCN firmware version */
> + #define AMDGPU_INFO_FW_VCN  0x0e
>  /* number of bytes moved for TTM migration */
>  #define AMDGPU_INFO_NUM_BYTES_MOVED  0x0f
>  /* the used VRAM size */
> -- 
> 2.13.6
> 
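
For userspace, the new subquery rides on the existing firmware-version query path. A minimal sketch using libdrm's amdgpu wrapper (assumes the patched amdgpu_drm.h with AMDGPU_INFO_FW_VCN; amdgpu_query_firmware_version() is the libdrm helper around the INFO ioctl; error handling trimmed):

#include <stdint.h>
#include <stdio.h>
#include <amdgpu.h>
#include <amdgpu_drm.h>

/* Print the VCN firmware version of the device behind fd
 * (an open DRM render node). */
static int print_vcn_fw_version(int fd)
{
	amdgpu_device_handle dev;
	uint32_t drm_major, drm_minor, version, feature;
	int r;

	r = amdgpu_device_initialize(fd, &drm_major, &drm_minor, &dev);
	if (r)
		return r;

	r = amdgpu_query_firmware_version(dev, AMDGPU_INFO_FW_VCN,
					  0 /* ip_instance */, 0 /* index */,
					  &version, &feature);
	if (!r)
		printf("VCN firmware version: 0x%08x, feature: %u\n",
		       version, feature);

	amdgpu_device_deinitialize(dev);
	return r;
}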


Re: [PATCH 1/2] drm/amdgpu: Don't change preferred domain when fallback GTT v3

2018-03-19 Thread zhoucm1



On 2018-03-19 18:03, Christian König wrote:

On 19.03.2018 at 11:00, Chunming Zhou wrote:

v2: add sanity checking
v3: make code open

Change-Id: I2cf672ad36b8b4cc1a6b2e704f786bf6a155d9ce
Signed-off-by: Chunming Zhou 


Reviewed-by: Christian König 

On further thought, this change breaks the fallback order:
visible to invisible should come first.
So I will make a v4 to handle the visible-to-invisible fallback
(the intended order is sketched after the patch below).

Regards,
David Zhou



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    |  5 -----
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 16 ++++++++++++----
  2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c

index 6e6570ff9f8b..660f5e44b1a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -85,11 +85,6 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,

  flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
  goto retry;
  }
-
-    if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
-    initial_domain |= AMDGPU_GEM_DOMAIN_GTT;
-    goto retry;
-    }
  DRM_DEBUG("Failed to allocate GEM object (%ld, %d, %u, 
%d)\n",

    size, initial_domain, alignment, r);
  }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c

index b3310219e0ac..e167e98c746c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -440,13 +440,21 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,

  #endif
  bo->tbo.bdev = &adev->mman.bdev;
-    amdgpu_ttm_placement_from_domain(bo, domain);
-
+    amdgpu_ttm_placement_from_domain(bo, bo->preferred_domains);
  r = ttm_bo_init_reserved(&adev->mman.bdev, &bo->tbo, size, type,
   &bo->placement, page_align, &ctx, acc_size,
   sg, resv, &amdgpu_ttm_bo_destroy);
-    if (unlikely(r != 0))
-    return r;
+
+    if (unlikely(r && r != -ERESTARTSYS) && bo->allowed_domains !=
+    bo->preferred_domains) {
+    amdgpu_ttm_placement_from_domain(bo, bo->allowed_domains);
+    r = ttm_bo_init_reserved(&adev->mman.bdev, &bo->tbo, size, type,
+ &bo->placement, page_align, &ctx,
+ acc_size, sg, resv,
+ &amdgpu_ttm_bo_destroy);
+    if (unlikely(r != 0))
+    return r;
+    }
    if (adev->gmc.visible_vram_size < adev->gmc.real_vram_size &&
  bo->tbo.mem.mem_type == TTM_PL_VRAM &&




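
The fallback order under discussion, spelled out as a sketch (illustrative only, in the style of amdgpu_bo_do_create(); this is not the v4 patch itself):

/* Intended allocation order:
 *  1) the preferred domains as requested, e.g. CPU-visible VRAM when
 *     AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED is set;
 *  2) the same domain without the CPU-access flag, i.e. the
 *     visible to invisible fallback, which should come first;
 *  3) only then the full allowed domains, e.g. VRAM|GTT.
 */
amdgpu_ttm_placement_from_domain(bo, bo->preferred_domains);
r = ttm_bo_init_reserved(&adev->mman.bdev, &bo->tbo, size, type,
			 &bo->placement, page_align, &ctx, acc_size,
			 sg, resv, &amdgpu_ttm_bo_destroy);
/* step 2: on failure, retry the same domain minus the CPU-access flag */
/* step 3: on failure, retry with bo->allowed_domains as in the patch
 * above, skipping -ERESTARTSYS so that signals still abort the wait */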


RE: [PATCH] drm/amdgpu: disable job timeout on GPU reset disabled

2018-03-19 Thread Quan, Evan
Hi Christian,

The messages printed on timeout are errors, not just warnings, although we did
not see any real problem (for the dgemm special case). That's why we call it
confusing.
And I suppose you want a fix like my previous patch (see attachment).

Regards,
Evan
> -Original Message-
> From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com]
> Sent: Monday, March 19, 2018 5:42 PM
> To: Quan, Evan ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: Re: [PATCH] drm/amdgpu: disable job timeout on GPU reset
> disabled
> 
> On 19.03.2018 at 07:08, Evan Quan wrote:
> > Under some heavy compute environments (dgemm test), it can take the
> > asic over 10 seconds to finish a single dispatched job, which will
> > trigger the timeout. This is quite confusing, although it does not
> > seem to cause any real problems.
> > As a quick workaround, we choose to disable the timeout when GPU
> > reset is disabled.
> 
> NAK, I enabled those warning intentionally even when the GPU recovery is
> disabled to have a hint in the logs what goes wrong.
> 
> Please only increase the timeout for the compute queue and/or add a
> separate timeout for them.
> 
> Regards,
> Christian.
> 
> 
> >
> > Change-Id: I3a95d856ba4993094dc7b6269649e470c5b053d2
> > Signed-off-by: Evan Quan 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++++++
> >   1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 8bd9c3f..9d6a775 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -861,6 +861,13 @@ static void amdgpu_device_check_arguments(struct amdgpu_device *adev)
> > amdgpu_lockup_timeout = 1;
> > }
> >
> > +   /*
> > +* Disable timeout when GPU reset is disabled to avoid confusing
> > +* timeout messages in the kernel log.
> > +*/
> > +   if (amdgpu_gpu_recovery == 0 || amdgpu_gpu_recovery == -1)
> > +   amdgpu_lockup_timeout = INT_MAX;
> > +
> > adev->firmware.load_type = amdgpu_ucode_get_load_type(adev, amdgpu_fw_load_type);
> >   }
> >

--- Begin Message ---
Under some heavy compute test (dgemm) environments, it may take
the asic over 50 seconds to finish a single dispatched job,
which will trigger the timeout. It's quite annoying although it
does not seem to cause any real problems.
As a quick workaround, we choose not to enforce the timeout
setting on compute queues.

Change-Id: I210011a90898617367e897a90e9f8fb2639281a3
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 008e198..455a81e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -435,7 +435,9 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
if (ring->funcs->type != AMDGPU_RING_TYPE_KIQ) {
r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
   num_hw_submission, amdgpu_job_hang_limit,
-  msecs_to_jiffies(amdgpu_lockup_timeout), ring->name);
+  (ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE) ?
+  MAX_SCHEDULE_TIMEOUT : msecs_to_jiffies(amdgpu_lockup_timeout),
+  ring->name);
if (r) {
DRM_ERROR("Failed to create scheduler on ring %s.\n",
  ring->name);
--
2.7.4

--- End Message ---
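
Christian's alternative, a separate timeout just for compute rings, could look roughly like the following (a sketch only; amdgpu_compute_timeout is a hypothetical module parameter, not an existing one):

/* Hypothetical knob: a distinct lockup timeout for compute rings. */
int amdgpu_compute_timeout = 60000;	/* in ms; 0 = use lockup_timeout */
module_param_named(compute_timeout, amdgpu_compute_timeout, int, 0444);
MODULE_PARM_DESC(compute_timeout,
		 "Job timeout for compute rings in ms (default 60000)");

/* In amdgpu_fence_driver_init_ring(), pick the timeout per ring type
 * instead of disabling it outright: */
long timeout = msecs_to_jiffies(
	(ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE &&
	 amdgpu_compute_timeout) ?
	amdgpu_compute_timeout : amdgpu_lockup_timeout);

r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
		   num_hw_submission, amdgpu_job_hang_limit,
		   timeout, ring->name);

This keeps the hang warnings Christian wants in the logs while giving long-running compute dispatches more headroom.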


Re: Documentation about AMD's HSA implementation?

2018-03-19 Thread Ming Yang
Thanks, John!

On Sat, Mar 17, 2018 at 4:17 PM, Bridgman, John  wrote:
>
>>-Original Message-
>>From: Ming Yang [mailto:minos.fut...@gmail.com]
>>Sent: Saturday, March 17, 2018 12:35 PM
>>To: Kuehling, Felix; Bridgman, John
>>Cc: amd-gfx@lists.freedesktop.org
>>Subject: Re: Documentation about AMD's HSA implementation?
>>
>>Hi,
>>
>>After digging into documents and code, our previous discussion about GPU
>>workload scheduling (mainly HWS and ACE scheduling) makes a lot more
>>sense to me now.  Thanks a lot!  I'm writing this email to ask more questions.
>>Before asking, I first share a few links to the documents that are most 
>>helpful
>>to me.
>>
>>GCN (1st gen.?) architecture whitepaper
>>https://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf
>>Notes: ACE scheduling.
>>
>>Polaris architecture whitepaper (4th gen. GCN)
>>http://radeon.com/_downloads/polaris-whitepaper-4.8.16.pdf
>>Notes: ACE scheduling; HWS; quick response queue (priority assignment);
>>compute units reservation.
>>
>>AMDKFD patch cover letters:
>>v5: https://lwn.net/Articles/619581/
>>v1: https://lwn.net/Articles/605153/
>>
>>A comprehensive performance analysis of HSA and OpenCL 2.0:
>>http://ieeexplore.ieee.org/document/7482093/
>>
>>Partitioning resources of a processor (AMD patent)
>>https://patents.google.com/patent/US8933942B2/
>>Notes: Compute resources are allocated according to the resource
>>requirement percentage of the command.
>>
>>Here come my questions about ACE scheduling:
>>Many of my questions are about ACE scheduling because the firmware is
>>closed-source and how ACE schedules commands (queues) is not detailed
>>enough in these documents.  I'm not able to run experiments on Raven Ridge
>>yet.
>>
>>1. Wavefronts of one command scheduled by an ACE can be spread out to
>>multiple compute engines (shader arrays)?  This seems confirmed by the
>>cu_mask setting, as cu_mask for one queue can cover CUs over multiple
>>compute engines.
>
> Correct, assuming the work associated with the command is not trivially small
> and so generates enough wavefronts to require multiple CU's.
>
>>
>>2.  If so, how is the competition resolved between commands scheduled by
>>ACEs?  What's the scheduling scheme?  For example, when each ACE has a
>>command ready to occupy 50% of compute resources, do these 4 commands
>>each occupy 25%, or do they execute round-robin with 50% of resources at
>>a time?  Or do just the first two scheduled commands execute while the
>>later two wait?
>
> Depends on how you measure compute resources, since each SIMD in a CU can
> have up to 10 separate wavefronts running on it as long as total register 
> usage
> for all the threads does not exceed the number available in HW.
>
> If each ACE (let's say pipe for clarity) has enough work to put a single 
> wavefront
> on 50% of the SIMDs then all of the work would get scheduled to the SIMDs (4
> SIMDs per CU) and run in a round-robin-ish manner as each wavefront was
> blocked waiting for memory access.
>
> If each pipe has enough work to fill 50% of the CUs and all pipes/queues were
> assigned the same priority (see below) then the behaviour would be more like
> "each one would get 25% and each time a wavefront finished another one would
> be started".
>

This makes sense to me.  I will try some experiments once Raven Ridge is ready.

>>
>>3. If the barrier bit of the AQL packet is not set, does ACE schedule the
>>following command using the same scheduling scheme in #2?
>
> Not sure, barrier behaviour has paged so far out of my head that I'll have to 
> skip
> this one.
>

This barrier bit is defined in HSA.  If it is set, the following
packet should wait until the current packet finishes.  It's probably
key to implementing out-of-order execution in OpenCL, but I'm not
sure.  I should be able to use the profiler to find out the answer
once I can run OpenCL on Raven Ridge.

>>
>>4. ACE takes 3 pipe priorities: low, medium, and high, even though AQL queue
>>has 7 priority levels, right?
>
> Yes-ish. Remember that there are multiple levels of scheduling going on here. 
> At
> any given time a pipe is only processing work from one of the queues; queue
> priorities affect the pipe's round-robin-ing between queues in a way that I 
> have
> managed to forget (but will try to find). There is a separate pipe priority, 
> which
> IIRC is actually programmed per queue and takes effect when the pipe is active
> on that queue. There is also a global (IIRC) setting which adjusts how compute
> work and graphics work are prioritized against each other, giving options like
> making all compute lower priority than graphics or making only high priority
> compute get ahead of graphics.
>
> I believe the pipe priority is also referred to as SPI priority, since it 
> affects
> the way SPI decides which pipe (graphics/compute) to accept work from
> next.
>
> This is all a bit complicated by a separate (global IIRC) option which 
> randomizes
> priority 
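
The cu_mask setting discussed earlier in this thread is exposed per queue through the KFD ioctl interface. A minimal sketch (assumes AMDKFD_IOC_SET_CU_MASK and struct kfd_ioctl_set_cu_mask_args from linux/kfd_ioctl.h; this ioctl was not yet upstream at the time of this thread, so check your kernel's headers):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kfd_ioctl.h>

/* Restrict a KFD user queue to the CUs set in a bitmask (one bit per
 * CU, spanning the shader engines/arrays of the device). */
static int kfd_set_cu_mask(int kfd_fd, uint32_t queue_id,
			   uint32_t *mask, uint32_t num_bits)
{
	struct kfd_ioctl_set_cu_mask_args args;

	memset(&args, 0, sizeof(args));
	args.queue_id = queue_id;
	args.num_cu_mask = num_bits;	/* number of bits in *mask */
	args.cu_mask_ptr = (uintptr_t)mask;

	return ioctl(kfd_fd, AMDKFD_IOC_SET_CU_MASK, &args);
}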

Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

2018-03-19 Thread Marek Olšák
In theory, Mesa doesn't have to do anything. It can continue setting VRAM
and if the kernel has to put a display buffer into GTT, it doesn't matter
(for Mesa). Whether the VRAM placement is really used is largely determined
by BO priorities.

The way I understand scatter/gather is that it only allows the GTT
placement. It doesn't force the GTT placement. Mesa also doesn't force
the GTT placement.

Marek

On Mon, Mar 19, 2018 at 5:12 PM, Alex Deucher  wrote:

> On Mon, Mar 19, 2018 at 4:29 PM, Li, Samuel  wrote:
> >>to my earlier point, there may be cases where it is advantageous to put
> >> display buffers in vram even if s/g display is supported
> >
> > Agreed. That is also why the patch has the options to let user select
> where
> > to put display buffers.
> >
> > As whether to put the option in Mesa or kernel, it seems the difference
> is
> > not much. Also, since amdgpufb can request even without mesa, kernel
> might
> > be a better choice. In addition, putting in the kernel can save client’s
> > duplicate work(mesa, ogl, vulkan, 2d, kernel…)
>
> Why do we even expose different memory pools to the UMDs in the first
> place ;)  Each pool has performance characteristics that may be
> relevant for a particular work load.  Only the UMDs really know the
> finer points of those workloads. In general, you don't want the kernel
> dictating policy if you can avoid it.  The kernel exposes
> functionality and userspace sets the policy.  With the location set in
> userspace, each app/user can have whatever policy makes sense for
> their use case all at the same time without needing to tweak their
> kernel for every use case.
>
> Alex
>
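
For reference, the userspace side of this is just the initial domain in the allocation request. A libdrm sketch of a buffer that allows both placements, under the usual amdgpu_bo_alloc() usage (error handling trimmed):

#include <string.h>
#include <amdgpu.h>
#include <amdgpu_drm.h>

/* Allocate a BO the kernel may place in either VRAM or GTT: userspace
 * states the allowed domains, the kernel picks within them. */
static int alloc_scanout_bo(amdgpu_device_handle dev, uint64_t size,
			    amdgpu_bo_handle *bo)
{
	struct amdgpu_bo_alloc_request req;

	memset(&req, 0, sizeof(req));
	req.alloc_size = size;
	req.phys_alignment = 4096;
	req.preferred_heap = AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT;
	req.flags = AMDGPU_GEM_CREATE_CPU_GTT_USWC;

	return amdgpu_bo_alloc(dev, &req, bo);
}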


Re: [PATCH] drm/amd/powerplay: Fix NULL pointer deref on driver unbind.

2018-03-19 Thread Deucher, Alexander
Reviewed-by: Alex Deucher 



From: amd-gfx  on behalf of Andrey 
Grodzovsky 
Sent: Monday, March 19, 2018 4:55 PM
To: amd-gfx@lists.freedesktop.org
Cc: Grodzovsky, Andrey; Zhu, Rex
Subject: [PATCH] drm/amd/powerplay: Fix NULL pointer deref on driver unbind.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/powerplay/amd_powerplay.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
index 3da3dcc..dbb0e69 100644
--- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
@@ -286,6 +286,12 @@ static int pp_resume(void *handle)
 return hwmgr_hw_resume(hwmgr);
 }

+static int pp_set_clockgating_state(void *handle,
+ enum amd_clockgating_state state)
+{
+   return 0;
+}
+
 static const struct amd_ip_funcs pp_ip_funcs = {
 .name = "powerplay",
 .early_init = pp_early_init,
@@ -300,7 +306,7 @@ static const struct amd_ip_funcs pp_ip_funcs = {
 .is_idle = pp_is_idle,
 .wait_for_idle = pp_wait_for_idle,
 .soft_reset = pp_sw_reset,
-   .set_clockgating_state = NULL,
+   .set_clockgating_state = pp_set_clockgating_state,
 .set_powergating_state = pp_set_powergating_state,
 };

--
2.7.4

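
An alternative to the empty stub would be a NULL check at the call sites in amdgpu_device.c; a sketch of that shape (call-site code abbreviated, under the assumption that the IP-block walk looks like the existing clockgating helpers):

/* Skip IP blocks that leave the hook NULL instead of requiring
 * every block to provide a do-nothing stub. */
if (adev->ip_blocks[i].version->funcs->set_clockgating_state) {
	r = adev->ip_blocks[i].version->funcs->set_clockgating_state(
			(void *)adev, state);
	if (r)
		DRM_ERROR("set_clockgating_state of IP block <%s> failed %d\n",
			  adev->ip_blocks[i].version->funcs->name, r);
}

The stub has the advantage of keeping the callers unconditional, which is presumably why the patch takes that route.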


Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

2018-03-19 Thread Alex Deucher
On Mon, Mar 19, 2018 at 4:29 PM, Li, Samuel  wrote:
>>to my earlier point, there may be cases where it is advantageous to put
>> display buffers in vram even if s/g display is supported
>
> Agreed. That is also why the patch has the options to let user select where
> to put display buffers.
>
> As whether to put the option in Mesa or kernel, it seems the difference is
> not much. Also, since amdgpufb can request even without mesa, kernel might
> be a better choice. In addition, putting in the kernel can save client’s
> duplicate work(mesa, ogl, vulkan, 2d, kernel…)

Why do we even expose different memory pools to the UMDs in the first
place ;)  Each pool has performance characteristics that may be
relevant for a particular work load.  Only the UMDs really know the
finer points of those workloads. In general, you don't want the kernel
dictating policy if you can avoid it.  The kernel exposes
functionality and userspace sets the policy.  With the location set in
userspace, each app/user can have whatever policy makes sense for
their use case all at the same time without needing to tweak their
kernel for every use case.

Alex

>
>
>
> Regards,
>
> Samuel Li
>
>
>
> From: Marek Olšák [mailto:mar...@gmail.com]
> Sent: Monday, March 19, 2018 4:27 PM
> To: Deucher, Alexander 
> Cc: Li, Samuel ; Koenig, Christian
> ; Alex Deucher ; Michel
> Dänzer ; amd-gfx list 
>
>
> Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support
>
>
>
> When Mesa wants a buffer in VRAM, it always sets VRAM. It relies on BO move
> throttling to prevent unnecessary BO moves.
>
>
>
> My questions are:
>
> - what should Mesa do differently for tiny VRAM?
>
> - what is a tiny VRAM?
>
> - if VRAM is tiny, which allocations should we put there?
>
>
>
> Marek
>
>
>
> On Mon, Mar 19, 2018 at 4:15 PM, Deucher, Alexander
>  wrote:
>
> s/not/now/.  I meant to say, “we have NOW excluded the possibility of ever
> setting displays anywhere else without a kernel update”.
>
>
>
> Alex
>
>
>
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of
> Deucher, Alexander
> Sent: Monday, March 19, 2018 4:13 PM
>
>
> To: Li, Samuel ; Koenig, Christian
> ; Marek Olšák 
> Cc: Alex Deucher ; Michel Dänzer
> ; amd-gfx list 
> Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support
>
>
>
> I'm not sure what you mean by the 3 scenarios.  Generally userspace selects
> what domains it wants a buffer to be in, vram, gtt, or both (don't care).
> I'd rather not have the kernel second guess the UMDs if we can help it.  I'd
> rather leave the kernel for cases where we have to force things due to hw
> bugs, or hw restrictions, etc.  If we force all display buffers to be in gtt
> in the kernel, we have not excluded the possibility of ever setting displays
> anywhere else without a kernel update.  E.g., to my earlier point, there may
> be cases where it is advantageous to put display buffers in vram even if s/g
> display is supported.  That was the point I was trying to make about user
> mode selecting the domain (vram of gtt or vram|gtt).  Say you have a board
> with 2 GB of ram and 1 GB is carved out for "vram".  In that case, it would
> make sense to put the buffer in vram because otherwise you are wasting a
> comparatively scarce resource.
>
>
>
> Alex
>
> 
>
> From: Li, Samuel
> Sent: Monday, March 19, 2018 3:58:52 PM
> To: Deucher, Alexander; Koenig, Christian; Marek Olšák
> Cc: Alex Deucher; Michel Dänzer; amd-gfx list
> Subject: RE: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support
>
>
>
> Alex,
>
>
>
> I assume you are talking the three scenarios here, 1)VRAM, 2)GTT,
> 3)VRAM/GTT.
>
> But kernel will need the decision too(amdgpufb). I think it shall be better
> to do it in kernel, instead of different clients(mesa, ddx, kernel …)
>
>
>
> Regards,
>
> Samuel Li
>
>
>
> From: Deucher, Alexander
> Sent: Monday, March 19, 2018 3:54 PM
> To: Li, Samuel ; Koenig, Christian
> ; Marek Olšák 
> Cc: Alex Deucher ; Michel Dänzer
> ; amd-gfx list 
> Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support
>
>
>
> My personal preference is still to plumb this through to mesa rather than
> forcing it in the kernel.
>
>
>
> Alex
>
> 
>
> From: amd-gfx  on behalf of Li,
> Samuel 
> Sent: Monday, March 19, 2018 3:50:34 PM
> To: Koenig, Christian; Marek Olšák
> Cc: Alex Deucher; Michel Dänzer; amd-gfx list
> Subject: 

[PATCH] drm/amd/powerplay: Fix NULL pointer deref on driver unbind.

2018-03-19 Thread Andrey Grodzovsky
Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/powerplay/amd_powerplay.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
index 3da3dcc..dbb0e69 100644
--- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
@@ -286,6 +286,12 @@ static int pp_resume(void *handle)
return hwmgr_hw_resume(hwmgr);
 }
 
+static int pp_set_clockgating_state(void *handle,
+ enum amd_clockgating_state state)
+{
+   return 0;
+}
+
 static const struct amd_ip_funcs pp_ip_funcs = {
.name = "powerplay",
.early_init = pp_early_init,
@@ -300,7 +306,7 @@ static const struct amd_ip_funcs pp_ip_funcs = {
.is_idle = pp_is_idle,
.wait_for_idle = pp_wait_for_idle,
.soft_reset = pp_sw_reset,
-   .set_clockgating_state = NULL,
+   .set_clockgating_state = pp_set_clockgating_state,
.set_powergating_state = pp_set_powergating_state,
 };
 
-- 
2.7.4



Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

2018-03-19 Thread Marek Olšák
On Mon, Mar 19, 2018 at 4:53 PM, Alex Deucher  wrote:

> On Mon, Mar 19, 2018 at 4:26 PM, Marek Olšák  wrote:
> > When Mesa wants a buffer in VRAM, it always sets VRAM. It relies on BO
> move
> > throttling to prevent unnecessary BO moves.
> >
> > My questions are:
> > - what should Mesa do differently for tiny VRAM?
> > - what is a tiny VRAM?
> > - if VRAM is tiny, which allocations should we put there?
>
> Right now, all mesa would have to do is query the kernel to see if the
> device supports s/g display.  If it does, it can do whatever makes
> sense in the long term once we've had more time to benchmark, etc.,
> but to start off with, I'd say just make it GTT or VRAM|GTT.
> Otherwise we have the kernel second guessing mesa and I'd like to
> avoid that.  I want the kernel to respect that userspace asks for as
> much as possible.
>

Sounds good.

Marek
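
The capability query Alex describes would surface as one more device-info bit. Sketched below with a hypothetical AMDGPU_IDS_FLAGS_SG_DISPLAY flag (an assumption for illustration; no such UAPI flag existed at the time of this thread, only the amdgpu_query_gpu_info() mechanism is existing libdrm API):

#include <amdgpu.h>
#include <amdgpu_drm.h>

/* Hypothetical flag for illustration only -- not in amdgpu_drm.h. */
#define AMDGPU_IDS_FLAGS_SG_DISPLAY	(1ULL << 2)

/* Check whether the device reports scatter/gather display support. */
static int supports_sg_display(amdgpu_device_handle dev)
{
	struct amdgpu_gpu_info info;

	if (amdgpu_query_gpu_info(dev, &info))
		return 0;
	return !!(info.ids_flags & AMDGPU_IDS_FLAGS_SG_DISPLAY);
}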


Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

2018-03-19 Thread Alex Deucher
On Mon, Mar 19, 2018 at 4:26 PM, Marek Olšák  wrote:
> When Mesa wants a buffer in VRAM, it always sets VRAM. It relies on BO move
> throttling to prevent unnecessary BO moves.
>
> My questions are:
> - what should Mesa do differently for tiny VRAM?
> - what is a tiny VRAM?
> - if VRAM is tiny, which allocations should we put there?

Right now, all mesa would have to do is query the kernel to see if the
device supports s/g display.  If it does, it can do whatever makes
sense in the long term once we've had more time to benchmark, etc.,
but to start off with, I'd say just make it GTT or VRAM|GTT.
Otherwise we have the kernel second guessing mesa and I'd like to
avoid that.  I want the kernel to respect that userspace asks for as
much as possible.

Alex

>
> Marek
>
> On Mon, Mar 19, 2018 at 4:15 PM, Deucher, Alexander
>  wrote:
>>
>> s/not/now/.  I meant to say, “we have NOW excluded the possibility of ever
>> setting displays anywhere else without a kernel update”.
>>
>>
>>
>> Alex
>>
>>
>>
>> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of
>> Deucher, Alexander
>> Sent: Monday, March 19, 2018 4:13 PM
>>
>>
>> To: Li, Samuel ; Koenig, Christian
>> ; Marek Olšák 
>> Cc: Alex Deucher ; Michel Dänzer
>> ; amd-gfx list 
>> Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support
>>
>>
>>
>> I'm not sure what you mean by the 3 scenarios.  Generally userspace
>> selects what domains it wants a buffer to be in, vram, gtt, or both (don't
>> care).  I'd rather not have the kernel second guess the UMDs if we can help
>> it.  I'd rather leave the kernel for cases where we have to force things due
>> to hw bugs, or hw restrictions, etc.  If we force all display buffers to be
>> in gtt in the kernel, we have not excluded the possibility of ever setting
>> displays anywhere else without a kernel update.  E.g., to my earlier point,
>> there may be cases where it is advantageous to put display buffers in vram
>> even if s/g display is supported.  That was the point I was trying to make
>> about user mode selecting the domain (vram of gtt or vram|gtt).  Say you
>> have a board with 2 GB of ram and 1 GB is carved out for "vram".  In that
>> case, it would make sense to put the buffer in vram because otherwise you
>> are wasting a comparatively scarce resource.
>>
>>
>>
>> Alex
>>
>> 
>>
>> From: Li, Samuel
>> Sent: Monday, March 19, 2018 3:58:52 PM
>> To: Deucher, Alexander; Koenig, Christian; Marek Olšák
>> Cc: Alex Deucher; Michel Dänzer; amd-gfx list
>> Subject: RE: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support
>>
>>
>>
>> Alex,
>>
>>
>>
>> I assume you are talking the three scenarios here, 1)VRAM, 2)GTT,
>> 3)VRAM/GTT.
>>
>> But kernel will need the decision too(amdgpufb). I think it shall be
>> better to do it in kernel, instead of different clients(mesa, ddx, kernel …)
>>
>>
>>
>> Regards,
>>
>> Samuel Li
>>
>>
>>
>> From: Deucher, Alexander
>> Sent: Monday, March 19, 2018 3:54 PM
>> To: Li, Samuel ; Koenig, Christian
>> ; Marek Olšák 
>> Cc: Alex Deucher ; Michel Dänzer
>> ; amd-gfx list 
>> Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support
>>
>>
>>
>> My personal preference is still to plumb this through to mesa rather than
>> forcing it in the kernel.
>>
>>
>>
>> Alex
>>
>> 
>>
>> From: amd-gfx  on behalf of Li,
>> Samuel 
>> Sent: Monday, March 19, 2018 3:50:34 PM
>> To: Koenig, Christian; Marek Olšák
>> Cc: Alex Deucher; Michel Dänzer; amd-gfx list
>> Subject: RE: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support
>>
>>
>>
>> Christian,
>>
>>
>>
>> You misunderstood Alex’s comments,
>>
>>
>>
>> >Regardless of which scenarios we need to support, I think we also need
>>
>> >to really plumb this through to mesa however since user space is who
>>
>> >ultimately requests the location.  Overriding it in the kernel gets
>>
>> >tricky and can lead to ping-ponging as others have noted.  Better to
>>
>>
>>
>> Here Alex mentioned the scenarios is 1)VRAM, 2)GTT, 3)VRAM/GTT.
>>
>> His concern is this might cause ping-pong, not about preferred domain.
>> Since preferred domain can solve the ping-pong issue, it shall address his
>> concern here.
>>
>>
>>
>> Regards,
>>
>> Samuel Li
>>
>>
>>
>> From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com]
>> Sent: Monday, March 19, 2018 3:45 PM
>> To: Li, Samuel ; Marek Olšák ;
>> Koenig, Christian 
>> Cc: Alex Deucher ; Michel Dänzer
>> ; 

RE: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

2018-03-19 Thread Li, Samuel
>to my earlier point, there may be cases where it is advantageous to put 
>display buffers in vram even if s/g display is supported
Agreed. That is also why the patch has options to let the user select where to
put display buffers.
As for whether to put the option in Mesa or the kernel, the difference seems
small. Also, since amdgpufb can make requests even without Mesa, the kernel
might be a better choice. In addition, putting it in the kernel saves clients
duplicating the work (mesa, ogl, vulkan, 2d, kernel…)

Regards,
Samuel Li

From: Marek Olšák [mailto:mar...@gmail.com]
Sent: Monday, March 19, 2018 4:27 PM
To: Deucher, Alexander 
Cc: Li, Samuel ; Koenig, Christian 
; Alex Deucher ; Michel Dänzer 
; amd-gfx list 
Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

When Mesa wants a buffer in VRAM, it always sets VRAM. It relies on BO move 
throttling to prevent unnecessary BO moves.

My questions are:
- what should Mesa do differently for tiny VRAM?
- what is a tiny VRAM?
- if VRAM is tiny, which allocations should we put there?

Marek

On Mon, Mar 19, 2018 at 4:15 PM, Deucher, Alexander 
> wrote:
s/not/now/.  I meant to say, “we have NOW excluded the possibility of ever 
setting displays anywhere else without a kernel update”.

Alex

From: amd-gfx 
[mailto:amd-gfx-boun...@lists.freedesktop.org]
 On Behalf Of Deucher, Alexander
Sent: Monday, March 19, 2018 4:13 PM

To: Li, Samuel >; Koenig, Christian 
>; Marek Olšák 
>
Cc: Alex Deucher >; Michel 
Dänzer >; amd-gfx list 
>
Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support


I'm not sure what you mean by the 3 scenarios.  Generally userspace selects 
what domains it wants a buffer to be in, vram, gtt, or both (don't care).  I'd 
rather not have the kernel second guess the UMDs if we can help it.  I'd rather 
leave the kernel for cases where we have to force things due to hw bugs, or hw 
restrictions, etc.  If we force all display buffers to be in gtt in the kernel, 
we have not excluded the possibility of ever setting displays anywhere else 
without a kernel update.  E.g., to my earlier point, there may be cases where 
it is advantageous to put display buffers in vram even if s/g display is 
supported.  That was the point I was trying to make about user mode selecting 
the domain (vram of gtt or vram|gtt).  Say you have a board with 2 GB of ram 
and 1 GB is carved out for "vram".  In that case, it would make sense to put 
the buffer in vram because otherwise you are wasting a comparatively scarce 
resource.



Alex


From: Li, Samuel
Sent: Monday, March 19, 2018 3:58:52 PM
To: Deucher, Alexander; Koenig, Christian; Marek Olšák
Cc: Alex Deucher; Michel Dänzer; amd-gfx list
Subject: RE: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support


Alex,



I assume you are talking the three scenarios here, 1)VRAM, 2)GTT, 3)VRAM/GTT.

But kernel will need the decision too(amdgpufb). I think it shall be better to 
do it in kernel, instead of different clients(mesa, ddx, kernel …)



Regards,

Samuel Li



From: Deucher, Alexander
Sent: Monday, March 19, 2018 3:54 PM
To: Li, Samuel >; Koenig, Christian 
>; Marek Olšák 
>
Cc: Alex Deucher >; Michel 
Dänzer >; amd-gfx list 
>
Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support



My personal preference is still to plumb this through to mesa rather than 
forcing it in the kernel.



Alex



From: amd-gfx 
>
 on behalf of Li, Samuel >
Sent: Monday, March 19, 2018 3:50:34 PM
To: Koenig, Christian; Marek Olšák
Cc: Alex Deucher; Michel Dänzer; amd-gfx list
Subject: RE: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support



Christian,



You misunderstood Alex’s comments,



>Regardless of which scenarios we need to support, I think we also need

>to really plumb this through to mesa however since user space is who

>ultimately requests the location.  Overriding it in the kernel gets

>tricky and can 

Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

2018-03-19 Thread Marek Olšák
When Mesa wants a buffer in VRAM, it always sets VRAM. It relies on BO move
throttling to prevent unnecessary BO moves.

My questions are:
- what should Mesa do differently for tiny VRAM?
- what is a tiny VRAM?
- if VRAM is tiny, which allocations should we put there?

Marek

On Mon, Mar 19, 2018 at 4:15 PM, Deucher, Alexander <
alexander.deuc...@amd.com> wrote:

> s/not/now/.  I meant to say, “we have NOW excluded the possibility of
> ever setting displays anywhere else without a kernel update”.
>
>
>
> Alex
>
>
>
> *From:* amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] *On Behalf
> Of *Deucher, Alexander
> *Sent:* Monday, March 19, 2018 4:13 PM
>
> *To:* Li, Samuel ; Koenig, Christian <
> christian.koe...@amd.com>; Marek Olšák 
> *Cc:* Alex Deucher ; Michel Dänzer <
> mic...@daenzer.net>; amd-gfx list 
> *Subject:* Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display
> support
>
>
>
> I'm not sure what you mean by the 3 scenarios.  Generally userspace
> selects what domains it wants a buffer to be in, vram, gtt, or both (don't
> care).  I'd rather not have the kernel second guess the UMDs if we can help
> it.  I'd rather leave the kernel for cases where we have to force things
> due to hw bugs, or hw restrictions, etc.  If we force all display buffers
> to be in gtt in the kernel, we have not excluded the possibility of ever
> setting displays anywhere else without a kernel update.  E.g., to my
> earlier point, there may be cases where it is advantageous to put display
> buffers in vram even if s/g display is supported.  That was the point I was
> trying to make about user mode selecting the domain (vram of gtt or
> vram|gtt).  Say you have a board with 2 GB of ram and 1 GB is carved out
> for "vram".  In that case, it would make sense to put the buffer in vram
> because otherwise you are wasting a comparatively scarce resource.
>
>
>
> Alex
> --
>
> *From:* Li, Samuel
> *Sent:* Monday, March 19, 2018 3:58:52 PM
> *To:* Deucher, Alexander; Koenig, Christian; Marek Olšák
> *Cc:* Alex Deucher; Michel Dänzer; amd-gfx list
> *Subject:* RE: [PATCH 1/2] drm/amdgpu: Enable scatter gather display
> support
>
>
>
> Alex,
>
>
>
> I assume you are talking the three scenarios here, 1)VRAM, 2)GTT,
> 3)VRAM/GTT.
>
> But kernel will need the decision too(amdgpufb). I think it shall be
> better to do it in kernel, instead of different clients(mesa, ddx, kernel …)
>
>
>
> Regards,
>
> Samuel Li
>
>
>
> *From:* Deucher, Alexander
> *Sent:* Monday, March 19, 2018 3:54 PM
> *To:* Li, Samuel ; Koenig, Christian <
> christian.koe...@amd.com>; Marek Olšák 
> *Cc:* Alex Deucher ; Michel Dänzer <
> mic...@daenzer.net>; amd-gfx list 
> *Subject:* Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display
> support
>
>
>
> My personal preference is still to plumb this through to mesa rather than
> forcing it in the kernel.
>
>
>
> Alex
> --
>
> *From:* amd-gfx  on behalf of Li,
> Samuel 
> *Sent:* Monday, March 19, 2018 3:50:34 PM
> *To:* Koenig, Christian; Marek Olšák
> *Cc:* Alex Deucher; Michel Dänzer; amd-gfx list
> *Subject:* RE: [PATCH 1/2] drm/amdgpu: Enable scatter gather display
> support
>
>
>
> Christian,
>
>
>
> You misunderstood Alex’s comments,
>
>
>
> >Regardless of which scenarios we need to support, I think we also need
>
> >to really plumb this through to mesa however since user space is who
>
> >ultimately requests the location.  Overriding it in the kernel gets
>
> >tricky and can lead to ping-ponging as others have noted.  Better to
>
>
>
> Here Alex mentioned the scenarios is 1)VRAM, 2)GTT, 3)VRAM/GTT.
>
> His concern is this might cause ping-pong, not about preferred domain.
> Since preferred domain can solve the ping-pong issue, it shall address his
> concern here.
>
>
>
> Regards,
>
> Samuel Li
>
>
>
> *From:* Christian König [mailto:ckoenig.leichtzumer...@gmail.com
> ]
> *Sent:* Monday, March 19, 2018 3:45 PM
> *To:* Li, Samuel ; Marek Olšák ;
> Koenig, Christian 
> *Cc:* Alex Deucher ; Michel Dänzer <
> mic...@daenzer.net>; amd-gfx list 
> *Subject:* Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display
> support
>
>
>
> Quoting Alex:
>
> Regardless of which scenarios we need to support, I think we also need
>
> to really plumb this through to mesa however since user space is who
>
> ultimately requests the location.  Overriding it in the kernel gets
>
> tricky and can lead to ping-ponging as others have noted.  Better to
>
> have user space know what chips support it or not and request display
>
> buffers in GTT or VRAM from the start.
>
> And I 

RE: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

2018-03-19 Thread Deucher, Alexander
s/not/now/.  I meant to say, "we have NOW excluded the possibility of ever 
setting displays anywhere else without a kernel update".

Alex

From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of 
Deucher, Alexander
Sent: Monday, March 19, 2018 4:13 PM
To: Li, Samuel ; Koenig, Christian 
; Marek Olšák 
Cc: Alex Deucher ; Michel Dänzer ; 
amd-gfx list 
Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support


I'm not sure what you mean by the 3 scenarios.  Generally userspace selects 
what domains it wants a buffer to be in, vram, gtt, or both (don't care).  I'd 
rather not have the kernel second guess the UMDs if we can help it.  I'd rather 
leave the kernel for cases where we have to force things due to hw bugs, or hw 
restrictions, etc.  If we force all display buffers to be in gtt in the kernel, 
we have not excluded the possibility of ever setting displays anywhere else 
without a kernel update.  E.g., to my earlier point, there may be cases where 
it is advantageous to put display buffers in vram even if s/g display is 
supported.  That was the point I was trying to make about user mode selecting 
the domain (vram of gtt or vram|gtt).  Say you have a board with 2 GB of ram 
and 1 GB is carved out for "vram".  In that case, it would make sense to put 
the buffer in vram because otherwise you are wasting a comparatively scarce 
resource.



Alex


From: Li, Samuel
Sent: Monday, March 19, 2018 3:58:52 PM
To: Deucher, Alexander; Koenig, Christian; Marek Olšák
Cc: Alex Deucher; Michel Dänzer; amd-gfx list
Subject: RE: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support


Alex,



I assume you are talking the three scenarios here, 1)VRAM, 2)GTT, 3)VRAM/GTT.

But kernel will need the decision too(amdgpufb). I think it shall be better to 
do it in kernel, instead of different clients(mesa, ddx, kernel ...)



Regards,

Samuel Li



From: Deucher, Alexander
Sent: Monday, March 19, 2018 3:54 PM
To: Li, Samuel ; Koenig, Christian 
; Marek Olšák 
Cc: Alex Deucher ; Michel Dänzer ; 
amd-gfx list 
Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support



My personal preference is still to plumb this through to mesa rather than 
forcing it in the kernel.



Alex



From: amd-gfx 
>
 on behalf of Li, Samuel >
Sent: Monday, March 19, 2018 3:50:34 PM
To: Koenig, Christian; Marek Olšák
Cc: Alex Deucher; Michel Dänzer; amd-gfx list
Subject: RE: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support



Christian,



You misunderstood Alex's comments,



>Regardless of which scenarios we need to support, I think we also need

>to really plumb this through to mesa however since user space is who

>ultimately requests the location.  Overriding it in the kernel gets

>tricky and can lead to ping-ponging as others have noted.  Better to



Here Alex mentioned the scenarios is 1)VRAM, 2)GTT, 3)VRAM/GTT.

His concern is this might cause ping-pong, not about preferred domain. Since 
preferred domain can solve the ping-pong issue, it shall address his concern 
here.



Regards,

Samuel Li



From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com]
Sent: Monday, March 19, 2018 3:45 PM
To: Li, Samuel >; Marek Olšák 
>; Koenig, Christian 
>
Cc: Alex Deucher >; Michel 
Dänzer >; amd-gfx list 
>
Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support



Quoting Alex:

Regardless of which scenarios we need to support, I think we also need

to really plumb this through to mesa however since user space is who

ultimately requests the location.  Overriding it in the kernel gets

tricky and can lead to ping-ponging as others have noted.  Better to

have user space know what chips support it or not and request display

buffers in GTT or VRAM from the start.

And I completely agree with Alex here. So overriding the domain in the kernel 
is a serious NAK from my side as well.

Please implement the necessary bits in Mesa, shouldn't be more than a few lines 
of code anyway.

Regards,
Christian.

On 19.03.2018 at 20:42, Li, Samuel wrote:

Agreed.



>I think that the consensus with Alex and me is that we should avoid exactly 
>that.
Christian, Alex's concern is about ping-pong, not 

Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

2018-03-19 Thread Deucher, Alexander
I'm not sure what you mean by the 3 scenarios.  Generally userspace selects 
what domains it wants a buffer to be in, vram, gtt, or both (don't care).  I'd 
rather not have the kernel second guess the UMDs if we can help it.  I'd rather 
leave the kernel for cases where we have to force things due to hw bugs, or hw 
restrictions, etc.  If we force all display buffers to be in gtt in the kernel, 
we have not excluded the possibility of ever setting displays anywhere else 
without a kernel update.  E.g., to my earlier point, there may be cases where 
it is advantageous to put display buffers in vram even if s/g display is 
supported.  That was the point I was trying to make about user mode selecting 
the domain (vram or gtt or vram|gtt).  Say you have a board with 2 GB of ram 
and 1 GB is carved out for "vram".  In that case, it would make sense to put 
the buffer in vram because otherwise you are wasting a comparatively scarce 
resource.


Alex


From: Li, Samuel
Sent: Monday, March 19, 2018 3:58:52 PM
To: Deucher, Alexander; Koenig, Christian; Marek Olšák
Cc: Alex Deucher; Michel Dänzer; amd-gfx list
Subject: RE: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support


Alex,



I assume you are talking the three scenarios here, 1)VRAM, 2)GTT, 3)VRAM/GTT.

But kernel will need the decision too(amdgpufb). I think it shall be better to 
do it in kernel, instead of different clients(mesa, ddx, kernel …)



Regards,

Samuel Li



From: Deucher, Alexander
Sent: Monday, March 19, 2018 3:54 PM
To: Li, Samuel ; Koenig, Christian 
; Marek Olšák 
Cc: Alex Deucher ; Michel Dänzer ; 
amd-gfx list 
Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support



My personal preference is still to plumb this through to mesa rather than 
forcing it in the kernel.



Alex



From: amd-gfx 
>
 on behalf of Li, Samuel >
Sent: Monday, March 19, 2018 3:50:34 PM
To: Koenig, Christian; Marek Olšák
Cc: Alex Deucher; Michel Dänzer; amd-gfx list
Subject: RE: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support



Christian,



You misunderstood Alex’s comments,



>Regardless of which scenarios we need to support, I think we also need

>to really plumb this through to mesa however since user space is who

>ultimately requests the location.  Overriding it in the kernel gets

>tricky and can lead to ping-ponging as others have noted.  Better to



Here Alex mentioned the scenarios is 1)VRAM, 2)GTT, 3)VRAM/GTT.

His concern is this might cause ping-pong, not about preferred domain. Since 
preferred domain can solve the ping-pong issue, it shall address his concern 
here.



Regards,

Samuel Li



From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com]
Sent: Monday, March 19, 2018 3:45 PM
To: Li, Samuel >; Marek Olšák 
>; Koenig, Christian 
>
Cc: Alex Deucher >; Michel 
Dänzer >; amd-gfx list 
>
Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support



Quoting Alex:

Regardless of which scenarios we need to support, I think we also need

to really plumb this through to mesa however since user space is who

ultimately requests the location.  Overriding it in the kernel gets

tricky and can lead to ping-ponging as others have noted.  Better to

have user space know what chips support it or not and request display

buffers in GTT or VRAM from the start.

And I completely agree with Alex here. So overriding the domain in the kernel 
is a serious NAK from my side as well.

Please implement the necessary bits in Mesa, shouldn't be more than a few lines 
of code anyway.

Regards,
Christian.

On 19.03.2018 at 20:42, Li, Samuel wrote:

Agreed.



>I think that the consensus with Alex and me is that we should avoid exactly 
>that.
Christian, Alex’s concern is about ping-pong, not about the preferred domain.


Regards,

Samuel Li



From: Marek Olšák [mailto:mar...@gmail.com]
Sent: Monday, March 19, 2018 3:39 PM
To: Koenig, Christian 

Cc: Li, Samuel ; Michel Dänzer 
; Alex Deucher 
; amd-gfx list 

Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support



On Mon, Mar 19, 

Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

2018-03-19 Thread Christian König

Hi Sam,

no, that is seriously a no-go. The placement decision is made by 
userspace, not the kernel.


The only time the kernel overrides that decision is a) hardware 
requirements (for example old UVD on AGP systems) and b) memory pressure.


We have actually pushed quite hard on avoiding and removing places 
where the kernel overrides the placement decision of userspace, so I 
don't think we should add any new ones.


Regards,
Christian.

On 19.03.2018 at 20:58, Li, Samuel wrote:


Alex,

I assume you are talking the three scenarios here, 1)VRAM, 2)GTT, 
3)VRAM/GTT.


But kernel will need the decision too(amdgpufb). I think it shall be 
better to do it in kernel, instead of different clients(mesa, ddx, 
kernel …)


Regards,

Samuel Li

*From:*Deucher, Alexander
*Sent:* Monday, March 19, 2018 3:54 PM
*To:* Li, Samuel ; Koenig, Christian 
; Marek Olšák 
*Cc:* Alex Deucher ; Michel Dänzer 
; amd-gfx list 
*Subject:* Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display 
support


My personal preference is still to plumb this through to mesa rather 
than forcing it in the kernel.


Alex



*From:*amd-gfx > on behalf of Li, 
Samuel >

*Sent:* Monday, March 19, 2018 3:50:34 PM
*To:* Koenig, Christian; Marek Olšák
*Cc:* Alex Deucher; Michel Dänzer; amd-gfx list
*Subject:* RE: [PATCH 1/2] drm/amdgpu: Enable scatter gather display 
support


Christian,

You misunderstood Alex’s comments,

>Regardless of which scenarios we need to support, I think we also need
>to really plumb this through to mesa however since user space is who
>ultimately requests the location.  Overriding it in the kernel gets
>tricky and can lead to ping-ponging as others have noted.  Better to

Here Alex mentioned the scenarios is 1)VRAM, 2)GTT, 3)VRAM/GTT.

His concern is this might cause ping-pong, not about preferred domain. 
Since preferred domain can solve the ping-pong issue, it shall address 
his concern here.


Regards,

Samuel Li

*From:*Christian König [mailto:ckoenig.leichtzumer...@gmail.com]
*Sent:* Monday, March 19, 2018 3:45 PM
*To:* Li, Samuel >; Marek 
Olšák >; Koenig, Christian 
>
*Cc:* Alex Deucher >; Michel Dänzer >; amd-gfx list 
>
*Subject:* Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display 
support


Quoting Alex:

Regardless of which scenarios we need to support, I think we also need

to really plumb this through to mesa however since user space is who

ultimately requests the location.  Overriding it in the kernel gets

tricky and can lead to ping-ponging as others have noted.  Better to

have user space know what chips support it or not and request display

buffers in GTT or VRAM from the start.

And I completely agree with Alex here. So overriding the domain in the 
kernel is a serious NAK from my side as well.


Please implement the necessary bits in Mesa, shouldn't be more than a 
few lines of code anyway.


Regards,
Christian.

On 19.03.2018 at 20:42, Li, Samuel wrote:

Agreed.

>I think that the consensus with Alex and me is that we should
avoid exactly that.
Christian, Alex’s concern is about ping-pong, not about the
preferred domain.

Regards,

Samuel Li

*From:*Marek Olšák [mailto:mar...@gmail.com]
*Sent:* Monday, March 19, 2018 3:39 PM
*To:* Koenig, Christian 

*Cc:* Li, Samuel  ;
Michel Dänzer  ;
Alex Deucher 
; amd-gfx list
 
*Subject:* Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather
display support

On Mon, Mar 19, 2018 at 3:27 PM, Christian König
> wrote:

I think that the consensus with Alex and me is that we should
avoid exactly that.

Overriding the preferred domain in the kernel is a no-go for
that patch set, so please implement the discussed changes in Mesa.

I don't see how Mesa can make a smarter decision than the kernel.
If you overwrite the preferred domain of the buffer in the kernel,
there will be no ping-ponging between domains. Mesa never changes
the 

RE: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

2018-03-19 Thread Li, Samuel
Alex,

I assume you are talking about the three scenarios here: 1) VRAM, 2) GTT, 3) VRAM/GTT.
But the kernel will need the decision too (amdgpufb). I think it is better to
do it in the kernel, instead of in the different clients (mesa, ddx, kernel ...)

Regards,
Samuel Li

From: Deucher, Alexander
Sent: Monday, March 19, 2018 3:54 PM
To: Li, Samuel ; Koenig, Christian 
; Marek Olšák 
Cc: Alex Deucher ; Michel Dänzer ; 
amd-gfx list 
Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support


My personal preference is still to plumb this through to mesa rather than 
forcing it in the kernel.



Alex


From: amd-gfx 
>
 on behalf of Li, Samuel >
Sent: Monday, March 19, 2018 3:50:34 PM
To: Koenig, Christian; Marek Olšák
Cc: Alex Deucher; Michel Dänzer; amd-gfx list
Subject: RE: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support


Christian,



You misunderstood Alex's comments,



>Regardless of which scenarios we need to support, I think we also need

>to really plumb this through to mesa however since user space is who

>ultimately requests the location.  Overriding it in the kernel gets

>tricky and can lead to ping-ponging as others have noted.  Better to



Here Alex mentioned the scenarios is 1)VRAM, 2)GTT, 3)VRAM/GTT.

His concern is this might cause ping-pong, not about preferred domain. Since 
preferred domain can solve the ping-pong issue, it shall address his concern 
here.



Regards,

Samuel Li



From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com]
Sent: Monday, March 19, 2018 3:45 PM
To: Li, Samuel >; Marek Olšák 
>; Koenig, Christian 
>
Cc: Alex Deucher >; Michel 
Dänzer >; amd-gfx list 
>
Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support



Quoting Alex:

Regardless of which scenarios we need to support, I think we also need

to really plumb this through to mesa however since user space is who

ultimately requests the location.  Overriding it in the kernel gets

tricky and can lead to ping-ponging as others have noted.  Better to

have user space know what chips support it or not and request display

buffers in GTT or VRAM from the start.

And I completely agree with Alex here. So overriding the domain in the kernel 
is a serious NAK from my side as well.

Please implement the necessary bits in Mesa, shouldn't be more than a few lines 
of code anyway.

Regards,
Christian.

On 19.03.2018 at 20:42, Li, Samuel wrote:

Agreed.



>I think that the consensus with Alex and me is that we should avoid exactly 
>that.
Christian, Alex's concern is about ping-pong, not about the preferred domain.


Regards,

Samuel Li



From: Marek Olšák [mailto:mar...@gmail.com]
Sent: Monday, March 19, 2018 3:39 PM
To: Koenig, Christian 

Cc: Li, Samuel ; Michel Dänzer 
; Alex Deucher 
; amd-gfx list 

Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support



On Mon, Mar 19, 2018 at 3:27 PM, Christian König 
> wrote:

I think that the consensus with Alex and me is that we should avoid exactly 
that.

Overriding the preferred domain in the kernel is a no-go for that patch set, so 
please implement the discussed changes in Mesa.



I don't see how Mesa can make a smarter decision than the kernel. If you 
overwrite the preferred domain of the buffer in the kernel, there will be no 
ping-ponging between domains. Mesa never changes the initial preferred domain.



Marek





Regards,
Christian.


On 19.03.2018 at 20:22, Li, Samuel wrote:

I agree with Marek/Michel: since kernel sets the domain before scanning out, it 
shall update the preferred domain here too.

Regards,
Samuel Li


-Original Message-
From: Koenig, Christian
Sent: Thursday, March 08, 2018 4:07 AM
To: Michel Dänzer >; Li, Samuel
>; Alex Deucher 
>
Cc: amd-gfx list 
>
Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

Am 08.03.2018 

Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

2018-03-19 Thread Deucher, Alexander
My personal preference is still to plumb this through to mesa rather than 
forcing it in the kernel.


Alex


From: amd-gfx  on behalf of Li, Samuel 

Sent: Monday, March 19, 2018 3:50:34 PM
To: Koenig, Christian; Marek Olšák
Cc: Alex Deucher; Michel Dänzer; amd-gfx list
Subject: RE: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support


Christian,



You misunderstood Alex’s comments,



>Regardless of which scenarios we need to support, I think we also need

>to really plumb this through to mesa however since user space is who

>ultimately requests the location.  Overriding it in the kernel gets

>tricky and can lead to ping-ponging as others have noted.  Better to



Here Alex mentioned the scenarios is 1)VRAM, 2)GTT, 3)VRAM/GTT.

His concern is this might cause ping-pong, not about preferred domain. Since 
preferred domain can solve the ping-pong issue, it shall address his concern 
here.



Regards,

Samuel Li



From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com]
Sent: Monday, March 19, 2018 3:45 PM
To: Li, Samuel ; Marek Olšák ; Koenig, 
Christian 
Cc: Alex Deucher ; Michel Dänzer ; 
amd-gfx list 
Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support



Quoting Alex:


Regardless of which scenarios we need to support, I think we also need

to really plumb this through to mesa however since user space is who

ultimately requests the location.  Overriding it in the kernel gets

tricky and can lead to ping-ponging as others have noted.  Better to

have user space know what chips support it or not and request display

buffers in GTT or VRAM from the start.

And I completely agree with Alex here. So overriding the domain in the kernel 
is a serious NAK from my side as well.

Please implement the necessary bits in Mesa, shouldn't be more than a few lines 
of code anyway.

Regards,
Christian.

On 19.03.2018 at 20:42, Li, Samuel wrote:

Agreed.



>I think that the consensus with Alex and me is that we should avoid exactly 
>that.
Christian, Alex’s concern is about ping-pong, not about the preferred domain.



Regards,

Samuel Li



From: Marek Olšák [mailto:mar...@gmail.com]
Sent: Monday, March 19, 2018 3:39 PM
To: Koenig, Christian 

Cc: Li, Samuel ; Michel Dänzer 
; Alex Deucher 
; amd-gfx list 

Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support



On Mon, Mar 19, 2018 at 3:27 PM, Christian König 
> wrote:

I think that the consensus with Alex and me is that we should avoid exactly 
that.

Overriding the preferred domain in the kernel is a no-go for that patch set, so 
please implement the discussed changes in Mesa.



I don't see how Mesa can make a smarter decision than the kernel. If you 
overwrite the preferred domain of the buffer in the kernel, there will be no 
ping-ponging between domains. Mesa never changes the initial preferred domain.



Marek





Regards,
Christian.


On 19.03.2018 at 20:22, Li, Samuel wrote:

I agree with Marek/Michel: since kernel sets the domain before scanning out, it 
shall update the preferred domain here too.

Regards,
Samuel Li



-Original Message-
From: Koenig, Christian
Sent: Thursday, March 08, 2018 4:07 AM
To: Michel Dänzer >; Li, Samuel
>; Alex Deucher 
>
Cc: amd-gfx list 
>
Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

On 08.03.2018 at 09:35, Michel Dänzer wrote:

On 2018-03-07 10:47 AM, Christian König wrote:

Am 07.03.2018 um 09:42 schrieb Michel Dänzer:

On 2018-03-06 07:23 PM, Christian König wrote:

E.g. the last time I tested it placing things into GTT still
resulted in quite a performance penalty for rendering.

FWIW, I think the penalty is most likely IOMMU related. Last time I
tested, I couldn't measure a big difference with IOMMU disabled.

No, the penalty I'm talking about came from the ping/pong we did with
the scanout buffers.

See, when I tested this the DDX and Mesa were unmodified, so both
still assumed VRAM as placement for scanout BOs, but the kernel
forced scanout BOs into GTT for testing.

So what happened was that on scanout we moved the VRAM BO to GTT and
after unpinning it on the first command submission which used the BO
we moved it back to VRAM again.

In the meantime, I've had the same idea as Marek: Can't the kernel driver
simply change the BO's preferred domain to GTT when scanning out from it?
Then it won't move back to VRAM.

RE: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

2018-03-19 Thread Li, Samuel
Christian,

You misunderstood Alex’s comments,


>Regardless of which scenarios we need to support, I think we also need

>to really plumb this through to mesa however since user space is who

>ultimately requests the location.  Overriding it in the kernel gets

>tricky and can lead to ping-ponging as others have noted.  Better to

Here the scenarios Alex mentioned are 1) VRAM, 2) GTT, 3) VRAM/GTT.
His concern is that this might cause ping-pong, not the preferred domain
itself. Since the preferred domain can solve the ping-pong issue, it should
address his concern here.

Regards,
Samuel Li

From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com]
Sent: Monday, March 19, 2018 3:45 PM
To: Li, Samuel ; Marek Olšák ; Koenig, 
Christian 
Cc: Alex Deucher ; Michel Dänzer ; 
amd-gfx list 
Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

Quoting Alex:


Regardless of which scenarios we need to support, I think we also need

to really plumb this through to mesa however since user space is who

ultimately requests the location.  Overriding it in the kernel gets

tricky and can lead to ping-ponging as others have noted.  Better to

have user space know what chips support it or not and request display

buffers in GTT or VRAM from the start.
And I completely agree with Alex here. So overriding the domain in the kernel 
is a serious NAK from my side as well.

Please implement the necessary bits in Mesa, shouldn't be more than a few lines 
of code anyway.

Regards,
Christian.

Am 19.03.2018 um 20:42 schrieb Li, Samuel:
Agreed.

>I think that the consensus with Alex and me is that we should avoid exactly 
>that.
Christian, Alex’s concern is about ping-pong, not about the preferred domain.


Regards,
Samuel Li

From: Marek Olšák [mailto:mar...@gmail.com]
Sent: Monday, March 19, 2018 3:39 PM
To: Koenig, Christian 

Cc: Li, Samuel ; Michel Dänzer 
; Alex Deucher 
; amd-gfx list 

Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

On Mon, Mar 19, 2018 at 3:27 PM, Christian König 
> wrote:
I think that the consensus with Alex and me is that we should avoid exactly 
that.

Overriding the preferred domain in the kernel is a no-go for that patch set, so 
please implement the discussed changes in Mesa.

I don't see how Mesa can make a smarter decision than the kernel. If you 
overwrite the preferred domain of the buffer in the kernel, there will be no 
ping-ponging between domains. Mesa never changes the initial preferred domain.

Marek



Regards,
Christian.


Am 19.03.2018 um 20:22 schrieb Li, Samuel:
I agree with Marek/Michel: since the kernel sets the domain before scanning
out, it should update the preferred domain here too.

Regards,
Samuel Li


-Original Message-
From: Koenig, Christian
Sent: Thursday, March 08, 2018 4:07 AM
To: Michel Dänzer >; Li, Samuel
>; Alex Deucher 
>
Cc: amd-gfx list 
>
Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

Am 08.03.2018 um 09:35 schrieb Michel Dänzer:
On 2018-03-07 10:47 AM, Christian König wrote:
Am 07.03.2018 um 09:42 schrieb Michel Dänzer:
On 2018-03-06 07:23 PM, Christian König wrote:
E.g. the last time I tested it placing things into GTT still
resulted in quite a performance penalty for rendering.
FWIW, I think the penalty is most likely IOMMU related. Last time I
tested, I couldn't measure a big difference with IOMMU disabled.
No, the penalty I'm talking about came from the ping/pong we did with
the scanout buffers.

See, when I tested this the DDX and Mesa were unmodified, so both
still assumed VRAM as placement for scanout BOs, but the kernel
forced scanout BOs into GTT for testing.

So what happened was that on scanout we moved the VRAM BO to GTT
and
after unpinning it on the first command submission which used the BO
we moved it back to VRAM again.
In the meantime, I've had the same idea as Marek: Can't the kernel
driver simply change the BO's preferred domain to GTT when scanning
out from it? Then it won't move back to VRAM.
Yes, I've considered this as well.

But I think making the decision in Mesa is the cleaner approach.

E.g. so far we only override the placement decision of userspace for two
reasons:
1. We were running out of memory in VRAM.
2. We have a hardware restriction which makes VRAM usage mandatory.

And even then we never adjust the placement permanently; we just temporarily
moved the buffer where it was needed and moved it back after the operation
completed.

Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

2018-03-19 Thread Christian König

Quoting Alex:

Regardless of which scenarios we need to support, I think we also need
to really plumb this through to mesa however since user space is who
ultimately requests the location.  Overriding it in the kernel gets
tricky and can lead to ping-ponging as others have noted.  Better to
have user space know what chips support it or not and request display
buffers in GTT or VRAM from the start.
And I completely agree with Alex here. So overriding the domain in the 
kernel is a serious NAK from my side as well.


Please implement the necessary bits in Mesa, shouldn't be more than a 
few lines of code anyway.


Regards,
Christian.

Am 19.03.2018 um 20:42 schrieb Li, Samuel:


Agreed.

>I think that the consensus with Alex and me is that we should avoid 
exactly that.
Christian, Alex’s concern is about ping-pong, not about the preferred 
domain.


Regards,

Samuel Li

*From:*Marek Olšák [mailto:mar...@gmail.com]
*Sent:* Monday, March 19, 2018 3:39 PM
*To:* Koenig, Christian 
*Cc:* Li, Samuel ; Michel Dänzer 
; Alex Deucher ; amd-gfx 
list 
*Subject:* Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display 
support


On Mon, Mar 19, 2018 at 3:27 PM, Christian König 
> wrote:


I think that the consensus with Alex and me is that we should
avoid exactly that.

Overriding the preferred domain in the kernel is a no-go for that
patch set, so please implement the discussed changes in Mesa.

I don't see how Mesa can make a smarter decision than the kernel. If 
you overwrite the preferred domain of the buffer in the kernel, there 
will be no ping-ponging between domains. Mesa never changes the 
initial preferred domain.


Marek


Regards,
Christian.



Am 19.03.2018 um 20:22 schrieb Li, Samuel:

I agree with Marek/Michel: since the kernel sets the domain before
scanning out, it should update the preferred domain here too.

Regards,
Samuel Li

-Original Message-
From: Koenig, Christian
Sent: Thursday, March 08, 2018 4:07 AM
To: Michel Dänzer >; Li, Samuel
>; Alex
Deucher >
Cc: amd-gfx list >
Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather
display support

Am 08.03.2018 um 09:35 schrieb Michel Dänzer:

On 2018-03-07 10:47 AM, Christian König wrote:

Am 07.03.2018 um 09:42 schrieb Michel Dänzer:

On 2018-03-06 07:23 PM, Christian König wrote:

E.g. the last time I tested it placing
things into GTT still
resulted in quite a performance penalty
for rendering.

FWIW, I think the penalty is most likely IOMMU
related. Last time I
tested, I couldn't measure a big difference
with IOMMU disabled.

No, the penalty I'm talking about came from the
ping/pong we did with
the scanout buffers.

See, when I tested this the DDX and Mesa were
unmodified, so both
still assumed VRAM as placement for scanout BOs,
but the kernel
forced scanout BOs into GTT for testing.

So what happened was that on scanout we moved the
VRAM BO to GTT and
after unpinning it on the first command submission
which used the BO
we moved it back to VRAM again.

In the meantime, I've had the same idea as Marek:
Can't the kernel
driver simply change the BO's preferred domain to GTT
when scanning
out from it? Then it won't move back to VRAM.

Yes, I've considered this as well.

But I think making the decision in Mesa is the cleaner
approach.

E.g. so far we only override the placement decision of
userspace for two
reasons:
1. We were running out of memory in VRAM.
2. We have a hardware restriction which makes VRAM usage
mandatory.

And even then we never adjust the placement permanently,
we just
temporarily moved the buffer where it was needed and moved
it back after
the operation completed.

Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

2018-03-19 Thread Christian König

Am 19.03.2018 um 20:39 schrieb Marek Olšák:
On Mon, Mar 19, 2018 at 3:27 PM, Christian König 
> wrote:


I think that the consensus with Alex and me is that we should
avoid exactly that.

Overriding the preferred domain in the kernel is a no-go for that
patch set, so please implement the discussed changes in Mesa.


I don't see how Mesa can make a smarter decision than the kernel. If 
you overwrite the preferred domain of the buffer in the kernel, there 
will be no ping-ponging between domains. Mesa never changes the 
initial preferred domain.


Yeah, but it can set the initial domain based on what it knows about how
the buffer will be used.


E.g. when scanout from GTT is supported we would like to always set the 
initial domain as GTT instead of VRAM.


Christian.
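
To make the Mesa-side change concrete, here is a minimal sketch of choosing
the initial domain at allocation time through libdrm. The has_gtt_scanout
flag is hypothetical; Mesa would derive it from a chip/kernel capability
query.

#include <stdbool.h>
#include <stdint.h>
#include <amdgpu.h>
#include <amdgpu_drm.h>

/* Pick the initial domain up front, so the kernel never has to migrate
 * the BO for scanout and no ping-pong between domains can occur. */
static int alloc_scanout_bo(amdgpu_device_handle dev, uint64_t size,
			    bool has_gtt_scanout, amdgpu_bo_handle *bo)
{
	struct amdgpu_bo_alloc_request req = {
		.alloc_size = size,
		.phys_alignment = 4096,
		.preferred_heap = has_gtt_scanout ? AMDGPU_GEM_DOMAIN_GTT
						  : AMDGPU_GEM_DOMAIN_VRAM,
	};

	return amdgpu_bo_alloc(dev, &req, bo);
}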



Marek


Regards,
Christian.


Am 19.03.2018 um 20:22 schrieb Li, Samuel:

I agree with Marek/Michel: since the kernel sets the domain before
scanning out, it should update the preferred domain here too.

Regards,
Samuel Li


-Original Message-
From: Koenig, Christian
Sent: Thursday, March 08, 2018 4:07 AM
To: Michel Dänzer >; Li, Samuel
>; Alex
Deucher >
Cc: amd-gfx list >
Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather
display support

Am 08.03.2018 um 09:35 schrieb Michel Dänzer:

On 2018-03-07 10:47 AM, Christian König wrote:

Am 07.03.2018 um 09:42 schrieb Michel Dänzer:

On 2018-03-06 07:23 PM, Christian König wrote:

E.g. the last time I tested it placing
things into GTT still
resulted in quite a performance penalty
for rendering.

FWIW, I think the penalty is most likely IOMMU
related. Last time I
tested, I couldn't measure a big difference
with IOMMU disabled.

No, the penalty I'm talking about came from the
ping/pong we did with
the scanout buffers.

See, when I tested this the DDX and Mesa were
unmodified, so both
still assumed VRAM as placement for scanout BOs,
but the kernel
forced scanout BOs into GTT for testing.

So what happened was that on scanout we moved the
VRAM BO to GTT and
after unpinning it on the first command submission
which used the BO
we moved it back to VRAM again.

In the meantime, I've had the same idea as Marek:
Can't the kernel
driver simply change the BO's preferred domain to GTT
when scanning
out from it? Then it won't move back to VRAM.

Yes, I've considered this as well.

But I think making the decision in Mesa is the cleaner
approach.

E.g. so far we only override the placement decision of
userspace for two
reasons:
1. We were running out of memory in VRAM.
2. We have a hardware restriction which makes VRAM usage
mandatory.

And even then we never adjust the placement permanently,
we just
temporarily moved the buffer where it was needed and moved
it back after
the operation completed.

In addition to that, Mesa might want to set even more flags
and/or change its behavior, e.g. use a tiling mode which both
importer and exporter in an A+A laptop understand, etc...

Regards,
Christian.


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org 
https://lists.freedesktop.org/mailman/listinfo/amd-gfx





___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

2018-03-19 Thread Li, Samuel
Agreed.

>I think that the consensus with Alex and me is that we should avoid exactly 
>that.
Christian, Alex’s concern is about ping-pong, not about the preferred domain.

Regards,
Samuel Li

From: Marek Olšák [mailto:mar...@gmail.com]
Sent: Monday, March 19, 2018 3:39 PM
To: Koenig, Christian 
Cc: Li, Samuel ; Michel Dänzer ; Alex 
Deucher ; amd-gfx list 
Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

On Mon, Mar 19, 2018 at 3:27 PM, Christian König 
> wrote:
I think that the consensus with Alex and me is that we should avoid exactly 
that.

Overriding the preferred domain in the kernel is a no-go for that patch set, so 
please implement the discussed changes in Mesa.

I don't see how Mesa can make a smarter decision than the kernel. If you 
overwrite the preferred domain of the buffer in the kernel, there will be no 
ping-ponging between domains. Mesa never changes the initial preferred domain.

Marek



Regards,
Christian.


Am 19.03.2018 um 20:22 schrieb Li, Samuel:
I agree with Marek/Michel: since the kernel sets the domain before scanning
out, it should update the preferred domain here too.

Regards,
Samuel Li

-Original Message-
From: Koenig, Christian
Sent: Thursday, March 08, 2018 4:07 AM
To: Michel Dänzer >; Li, Samuel
>; Alex Deucher 
>
Cc: amd-gfx list 
>
Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

Am 08.03.2018 um 09:35 schrieb Michel Dänzer:
On 2018-03-07 10:47 AM, Christian König wrote:
Am 07.03.2018 um 09:42 schrieb Michel Dänzer:
On 2018-03-06 07:23 PM, Christian König wrote:
E.g. the last time I tested it placing things into GTT still
resulted in quite a performance penalty for rendering.
FWIW, I think the penalty is most likely IOMMU related. Last time I
tested, I couldn't measure a big difference with IOMMU disabled.
No, the penalty I'm talking about came from the ping/pong we did with
the scanout buffers.

See, when I tested this the DDX and Mesa were unmodified, so both
still assumed VRAM as placement for scanout BOs, but the kernel
forced scanout BOs into GTT for testing.

So what happened was that on scanout we moved the VRAM BO to GTT
and
after unpinning it on the first command submission which used the BO
we moved it back to VRAM again.
In the meantime, I've had the same idea as Marek: Can't the kernel
driver simply change the BO's preferred domain to GTT when scanning
out from it? Then it won't move back to VRAM.
Yes, I've considered this as well.

But I think making the decision in Mesa is the cleaner approach.

E.g. so far we only override the placement decision of userspace for two
reasons:
1. We were running out of memory in VRAM.
2. We have a hardware restriction which makes VRAM usage mandatory.

And even then we never adjust the placement permanently, we just
temporarily moved the buffer where it was needed and moved it back after
the operation completed.

In addition to that, Mesa might want to set even more flags and/or change
its behavior, e.g. use a tiling mode which both importer and exporter in an
A+A laptop understand, etc...

Regards,
Christian.

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

2018-03-19 Thread Marek Olšák
On Mon, Mar 19, 2018 at 3:27 PM, Christian König 
wrote:

> I think that the consensus with Alex and me is that we should avoid
> exactly that.
>
> Overriding the preferred domain in the kernel is a no-go for that patch
> set, so please implement the discussed changes in Mesa.
>

I don't see how Mesa can make a smarter decision than the kernel. If you
overwrite the preferred domain of the buffer in the kernel, there will be
no ping-ponging between domains. Mesa never changes the initial preferred
domain.

Marek



>
> Regards,
> Christian.
>
>
> Am 19.03.2018 um 20:22 schrieb Li, Samuel:
>
>> I agree with Marek/Michel: since the kernel sets the domain before scanning
>> out, it should update the preferred domain here too.
>>
>> Regards,
>> Samuel Li
>>
>>
>> -Original Message-
>>> From: Koenig, Christian
>>> Sent: Thursday, March 08, 2018 4:07 AM
>>> To: Michel Dänzer ; Li, Samuel
>>> ; Alex Deucher 
>>> Cc: amd-gfx list 
>>> Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display
>>> support
>>>
>>> Am 08.03.2018 um 09:35 schrieb Michel Dänzer:
>>>
 On 2018-03-07 10:47 AM, Christian König wrote:

> Am 07.03.2018 um 09:42 schrieb Michel Dänzer:
>
>> On 2018-03-06 07:23 PM, Christian König wrote:
>>
>> E.g. the last time I tested it placing things into GTT still
>>> resulted in quite a performance penalty for rendering.
>>>
>> FWIW, I think the penalty is most likely IOMMU related. Last time I
>> tested, I couldn't measure a big difference with IOMMU disabled.
>>
> No, the penalty I'm talking about came from the ping/pong we did with
> the scanout buffers.
>
> See, when I tested this the DDX and Mesa were unmodified, so both
> still assumed VRAM as placement for scanout BOs, but the kernel
> forced scanout BOs into GTT for testing.
>
> So what happened was that on scanout we moved the VRAM BO to GTT and
> after unpinning it on the first command submission which used the BO
> we moved it back to VRAM again.
>
 In the meantime, I've had the same idea as Marek: Can't the kernel
 driver simply change the BO's preferred domain to GTT when scanning
 out from it? Then it won't move back to VRAM.

>>> Yes, I've considered this as well.
>>>
>>> But I think making the decision in Mesa is the cleaner approach.
>>>
>>> E.g. so far we only override the placement decision of userspace for two
>>> reasons:
>>> 1. We were running out of memory in VRAM.
>>> 2. We have a hardware restriction which makes VRAM usage mandatory.
>>>
>>> And even then we never adjust the placement permanently, we just
>>> temporarily moved the buffer where it was needed and moved it back after
>>> the operation completed.
>>>
>>> In addition to that, Mesa might want to set even more flags and/or change
>>> its behavior, e.g. use a tiling mode which both importer and exporter in
>>> an A+A laptop understand, etc...
>>>
>>> Regards,
>>> Christian.
>>>
>>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

2018-03-19 Thread Christian König
I think that the consensus with Alex and me is that we should avoid 
exactly that.


Overriding the preferred domain in the kernel is a no-go for that patch 
set, so please implement the discussed changes in Mesa.


Regards,
Christian.

Am 19.03.2018 um 20:22 schrieb Li, Samuel:

I agree with Marek/Michel: since the kernel sets the domain before scanning
out, it should update the preferred domain here too.

Regards,
Samuel Li



-Original Message-
From: Koenig, Christian
Sent: Thursday, March 08, 2018 4:07 AM
To: Michel Dänzer ; Li, Samuel
; Alex Deucher 
Cc: amd-gfx list 
Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

Am 08.03.2018 um 09:35 schrieb Michel Dänzer:

On 2018-03-07 10:47 AM, Christian König wrote:

Am 07.03.2018 um 09:42 schrieb Michel Dänzer:

On 2018-03-06 07:23 PM, Christian König wrote:


E.g. the last time I tested it placing things into GTT still
resulted in quite a performance penalty for rendering.

FWIW, I think the penalty is most likely IOMMU related. Last time I
tested, I couldn't measure a big difference with IOMMU disabled.

No, the penalty I'm talking about came from the ping/pong we did with
the scanout buffers.

See, when I tested this the DDX and Mesa were unmodified, so both
still assumed VRAM as placement for scanout BOs, but the kernel
forced scanout BOs into GTT for testing.

So what happened was that on scanout we moved the VRAM BO to GTT

and

after unpinning it on the first command submission which used the BO
we moved it back to VRAM again.

In the meantime, I've had the same idea as Marek: Can't the kernel
driver simply change the BO's preferred domain to GTT when scanning
out from it? Then it won't move back to VRAM.

Yes, I've considered this as well.

But I think making the decision in Mesa is the cleaner approach.

E.g. so far we only override the placement decision of userspace for two
reasons:
1. We were running out of memory in VRAM.
2. We have a hardware restriction which makes VRAM usage mandatory.

And even then we never adjust the placement permanently, we just
temporarily moved the buffer where it was needed and moved it back after
the operation completed.

In addition to that, Mesa might want to set even more flags and/or change
its behavior, e.g. use a tiling mode which both importer and exporter in an
A+A laptop understand, etc...

Regards,
Christian.


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: add documentation for amdgpu_device.c

2018-03-19 Thread Christian König

Am 19.03.2018 um 19:27 schrieb Alex Deucher:

On Fri, Mar 16, 2018 at 4:19 PM, Alex Deucher  wrote:

Add kernel doc for the functions in amdgpu_device.c

Signed-off-by: Alex Deucher 

Ping?


Acked-by: Christian König 



Alex


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 439 -
  1 file changed, 427 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 5bb88aa0a58e..06b6d70c4210 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -89,6 +89,14 @@ static const char *amdgpu_asic_name[] = {

  static void amdgpu_device_get_pcie_info(struct amdgpu_device *adev);

+/**
+ * amdgpu_device_is_px - Is the device a dGPU with HG/PX power control
+ *
+ * @dev: drm_device pointer
+ *
+ * Returns true if the device is a dGPU with HG/PX power control,
+ * otherwise return false.
+ */
  bool amdgpu_device_is_px(struct drm_device *dev)
  {
 struct amdgpu_device *adev = dev->dev_private;
@@ -101,6 +109,15 @@ bool amdgpu_device_is_px(struct drm_device *dev)
  /*
   * MMIO register access helper functions.
   */
+/**
+ * amdgpu_mm_rreg - read a memory mapped IO register
+ *
+ * @adev: amdgpu_device pointer
+ * @reg: dword aligned register offset
+ * @acc_flags: access flags which require special behavior
+ *
+ * Returns the 32 bit value from the offset specified.
+ */
  uint32_t amdgpu_mm_rreg(struct amdgpu_device *adev, uint32_t reg,
 uint32_t acc_flags)
  {
@@ -129,6 +146,14 @@ uint32_t amdgpu_mm_rreg(struct amdgpu_device *adev, 
uint32_t reg,
   *
  */

+/**
+ * amdgpu_mm_rreg8 - read a memory mapped IO register
+ *
+ * @adev: amdgpu_device pointer
+ * @offset: byte aligned register offset
+ *
+ * Returns the 8 bit value from the offset specified.
+ */
  uint8_t amdgpu_mm_rreg8(struct amdgpu_device *adev, uint32_t offset) {
 if (offset < adev->rmmio_size)
 return (readb(adev->rmmio + offset));
@@ -141,6 +166,15 @@ uint8_t amdgpu_mm_rreg8(struct amdgpu_device *adev, 
uint32_t offset) {
   * @value: the value want to be written to the register
   *
  */
+/**
+ * amdgpu_mm_wreg8 - write to a memory mapped IO register
+ *
+ * @adev: amdgpu_device pointer
+ * @offset: byte aligned register offset
+ * @value: 8 bit value to write
+ *
+ * Writes the value specified to the offset specified.
+ */
  void amdgpu_mm_wreg8(struct amdgpu_device *adev, uint32_t offset, uint8_t 
value) {
 if (offset < adev->rmmio_size)
 writeb(value, adev->rmmio + offset);
@@ -148,7 +182,16 @@ void amdgpu_mm_wreg8(struct amdgpu_device *adev, uint32_t 
offset, uint8_t value)
 BUG();
  }

-
+/**
+ * amdgpu_mm_wreg - write to a memory mapped IO register
+ *
+ * @adev: amdgpu_device pointer
+ * @reg: dword aligned register offset
+ * @v: 32 bit value to write to the register
+ * @acc_flags: access flags which require special behavior
+ *
+ * Writes the value specified to the offset specified.
+ */
  void amdgpu_mm_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v,
 uint32_t acc_flags)
  {
@@ -177,6 +220,14 @@ void amdgpu_mm_wreg(struct amdgpu_device *adev, uint32_t 
reg, uint32_t v,
 }
  }

+/**
+ * amdgpu_io_rreg - read an IO register
+ *
+ * @adev: amdgpu_device pointer
+ * @reg: dword aligned register offset
+ *
+ * Returns the 32 bit value from the offset specified.
+ */
  u32 amdgpu_io_rreg(struct amdgpu_device *adev, u32 reg)
  {
 if ((reg * 4) < adev->rio_mem_size)
@@ -187,6 +238,15 @@ u32 amdgpu_io_rreg(struct amdgpu_device *adev, u32 reg)
 }
  }

+/**
+ * amdgpu_io_wreg - write to an IO register
+ *
+ * @adev: amdgpu_device pointer
+ * @reg: dword aligned register offset
+ * @v: 32 bit value to write to the register
+ *
+ * Writes the value specified to the offset specified.
+ */
  void amdgpu_io_wreg(struct amdgpu_device *adev, u32 reg, u32 v)
  {
 if (adev->asic_type >= CHIP_VEGA10 && reg == 0) {
@@ -355,6 +415,14 @@ static void amdgpu_block_invalid_wreg(struct amdgpu_device 
*adev,
 BUG();
  }

+/**
+ * amdgpu_device_vram_scratch_init - allocate the VRAM scratch page
+ *
+ * @adev: amdgpu device pointer
+ *
+ * Allocates a scratch page of VRAM for use by various things in the
+ * driver.
+ */
  static int amdgpu_device_vram_scratch_init(struct amdgpu_device *adev)
  {
 return amdgpu_bo_create_kernel(adev, AMDGPU_GPU_PAGE_SIZE,
@@ -364,6 +432,13 @@ static int amdgpu_device_vram_scratch_init(struct 
amdgpu_device *adev)
(void **)&adev->vram_scratch.ptr);
  }

+/**
+ * amdgpu_device_vram_scratch_fini - Free the VRAM scratch page
+ *
+ * @adev: amdgpu device pointer
+ *
+ * Frees the VRAM scratch page.
+ */
  static void amdgpu_device_vram_scratch_fini(struct amdgpu_device *adev)
  {
 

RE: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support

2018-03-19 Thread Li, Samuel
I agree with Marek/Michel: since the kernel sets the domain before scanning
out, it should update the preferred domain here too.

Regards,
Samuel Li


> -Original Message-
> From: Koenig, Christian
> Sent: Thursday, March 08, 2018 4:07 AM
> To: Michel Dänzer ; Li, Samuel
> ; Alex Deucher 
> Cc: amd-gfx list 
> Subject: Re: [PATCH 1/2] drm/amdgpu: Enable scatter gather display support
> 
> Am 08.03.2018 um 09:35 schrieb Michel Dänzer:
> > On 2018-03-07 10:47 AM, Christian König wrote:
> >> Am 07.03.2018 um 09:42 schrieb Michel Dänzer:
> >>> On 2018-03-06 07:23 PM, Christian König wrote:
> >>>
>  E.g. the last time I tested it placing things into GTT still
>  resulted in quite a performance penalty for rendering.
> >>> FWIW, I think the penalty is most likely IOMMU related. Last time I
> >>> tested, I couldn't measure a big difference with IOMMU disabled.
> >> No, the penalty I'm talking about came from the ping/pong we did with
> >> the scanout buffers.
> >>
> >> See, when I tested this the DDX and Mesa were unmodified, so both
> >> still assumed VRAM as placement for scanout BOs, but the kernel
> >> forced scanout BOs into GTT for testing.
> >>
> >> So what happened was that on scanout we moved the VRAM BO to GTT
> and
> >> after unpinning it on the first command submission which used the BO
> >> we moved it back to VRAM again.
> > In the meantime, I've had the same idea as Marek: Can't the kernel
> > driver simply change the BO's preferred domain to GTT when scanning
> > out from it? Then it won't move back to VRAM.
> 
> Yes, I've considered this as well.
> 
> But I think making the decision in Mesa is the cleaner approach.
> 
> E.g. so far we only override the placement decision of userspace for two
> reasons:
> 1. We were running out of memory in VRAM.
> 2. We have a hardware restriction which makes VRAM usage mandatory.
> 
> And even then we never adjust the placement permanently, we just
> temporarily moved the buffer where it was needed and moved it back after
> the operation completed.
> 
> In addition to that, Mesa might want to set even more flags and/or change
> its behavior, e.g. use a tiling mode which both importer and exporter in an
> A+A laptop understand, etc...
> 
> Regards,
> Christian.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
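
For comparison, the kernel-side override discussed in this thread would only
be a few lines in the scanout path. A rough sketch against amdgpu follows;
the SG-capability helper is hypothetical, and this is exactly the override
that was NAKed above, shown only to illustrate the trade-off:

/* Make the scanout placement sticky by updating the BO's preferred
 * domains, so later validations don't migrate the buffer back to VRAM. */
static u32 amdgpu_display_scanout_domains(struct amdgpu_device *adev,
					  struct amdgpu_bo *abo)
{
	u32 domain = AMDGPU_GEM_DOMAIN_VRAM;

	if (amdgpu_display_supports_sg(adev))	/* hypothetical capability check */
		domain = AMDGPU_GEM_DOMAIN_GTT;

	/* The contested part: silently overriding what userspace asked for. */
	abo->preferred_domains = domain;

	return domain;
}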


Re: [PATCH 00/20] Add KFD GPUVM support for dGPUs v4

2018-03-19 Thread Felix Kuehling
On 2018-03-19 12:39 PM, Christian König wrote:
> So coming back to this series once more.
>
> Patch #1, #3 are Reviewed-by: Christian König .
>
> Patch #2, #4 - #13 and #18-#19 are Acked-by: Christian König
> .
>
> Patch #14: What's the difference compared to setting vramlimit=$size_of_bar?

The debug_largebar option only affects KFD. Graphics can still use all
memory.

>
> Patch #15 & #20: Why is that actually still needed? I thought we have
> fixed all dependencies and can now use the "standard" way of attaching
> fences to reservation objects to do this.

Patch 15 adds a KFD-specific MMU notifier, because the graphics MMU
notifier deals with amdgpu_cs command submission. We need a completely
different treatment of MMU notifiers for KFD. We need to stop user mode
queues.
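
A rough sketch of that synchronous flow (helper names are illustrative,
not the exact functions from this series; pre-4.19 notifier signature):

/* KFD user mode queues access memory directly, so there is no fence to
 * wait on; the queues must be preempted before the notifier returns. */
static void kfd_mn_invalidate_range_start(struct mmu_notifier *mn,
					  struct mm_struct *mm,
					  unsigned long start,
					  unsigned long end)
{
	struct kfd_process *p = container_of(mn, struct kfd_process,
					     mmu_notifier);

	kfd_process_evict_queues(p);	/* stop user mode queues synchronously */

	/* Re-validate userptr BOs and restart the queues from a delayed
	 * worker; the restore delay can be much shorter than for TTM
	 * evictions. */
	schedule_delayed_work(&p->restore_userptr_work,
			      msecs_to_jiffies(1));
}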

Patch 20 implements the user mode queue preemption mechanism, and the
corresponding restore function that re-validates userptr BOs before
restarting user mode queues.

I think you're implying that the graphics MMU notifier would wait for
the eviction fence and trigger a regular eviction in KFD. I haven't
tried that. The MMU notifier and userptr eviction mechanism was
implemented in KFD before we had the TTM evictions. We didn't go back
and revisit it after that. There are a few major differences that make
me want to keep the two types of evictions separate, though:

  * TTM evictions are the result of memory pressure (triggered by
another process)
  o Most MMU notifiers are triggered by something in the same
process (fork, mprotect, etc.)
  o Thus the restore delay after MMU notifiers can be much shorter
  * TTM evictions are done asynchronously in a delayed worker
  o MMU notifiers are synchronous, queues need to be stopped before
returning
  * KFD's TTM eviction/restore code doesn't handle userptrs
(get_user_pages, etc)
  o MMU notifier restore worker is specialized to handle just userptrs


>
> That said, I still need to take a closer look at patch #20.
>
> Patch #16: Looks good to me in general, but I think it would be safer
> if we grab a reference to the task structure. Otherwise grabbing pages
> from a mm_struct sounds a bit scary to me.

You're right. I've never seen it cause problems when testing process
termination, probably because during process termination KFD cancels the
delayed worker that calls this function. But I'll fix this and take a
proper reference.

>
> Patch #17: I think it would be better to allocate the node when the
> locks are not held and free it when we find that it isn't used, but no
> big deal.

OK. I'll change that.

Thanks,
  Felix

>
> Regards,
> Christian.
>
> Am 15.03.2018 um 22:27 schrieb Felix Kuehling:
>> Rebased and integrated review feedback from v3:
>> * Removed vm->vm_context field
>> * Use uninterruptible waiting in initial PD validation to avoid
>> ERESTARTSYS
>> * Return number of successful map/unmap operations in failure cases
>> * Facilitate partial retry after failed map/unmap
>> * Added comments with parameter descriptions to new APIs
>> * Defined AMDKFD_IOC_FREE_MEMORY_OF_GPU write-only
>>
>> This patch series also adds Userptr support in patches 15-20.
>>
>> Felix Kuehling (19):
>>    drm/amdgpu: Move KFD-specific fields into struct amdgpu_vm
>>    drm/amdgpu: Fix initial validation of PD BO for KFD VMs
>>    drm/amdgpu: Add helper to turn an existing VM into a compute VM
>>    drm/amdgpu: Add kfd2kgd interface to acquire an existing VM
>>    drm/amdkfd: Create KFD VMs on demand
>>    drm/amdkfd: Remove limit on number of GPUs
>>    drm/amdkfd: Aperture setup for dGPUs
>>    drm/amdkfd: Add per-process IDR for buffer handles
>>    drm/amdkfd: Allocate CWSR trap handler memory for dGPUs
>>    drm/amdkfd: Add TC flush on VMID deallocation for Hawaii
>>    drm/amdkfd: Add ioctls for GPUVM memory management
>>    drm/amdkfd: Kmap event page for dGPUs
>>    drm/amdkfd: Add module option for testing large-BAR functionality
>>    drm/amdgpu: Add MMU notifier type for KFD userptr
>>    drm/amdgpu: Enable amdgpu_ttm_tt_get_user_pages in worker threads
>>    drm/amdgpu: GFP_NOIO while holding locks taken in MMU notifier
>>    drm/amdkfd: GFP_NOIO while holding locks taken in MMU notifier
>>    drm/amdkfd: Add quiesce_mm and resume_mm to kgd2kfd_calls
>>    drm/amdgpu: Add userptr support for KFD
>>
>> Oak Zeng (1):
>>    drm/amdkfd: Populate DRM render device minor
>>
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |  37 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c  |   1 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c  |   1 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c   | 818
>> ++---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |   2 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c |  96 ++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h |  11 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    |  30 +-
>>   

Re: [PATCH] drm/amdgpu: add VCN to firmware query interface

2018-03-19 Thread Leo Liu

Acked-by: Leo Liu 


On 03/19/2018 02:26 PM, Alex Deucher wrote:

On Fri, Mar 16, 2018 at 12:07 PM, Alex Deucher  wrote:

Need to be able to query the VCN firmware version from
userspace to determine supported features, etc.

Signed-off-by: Alex Deucher 

Ping?

Alex


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 12 
  include/uapi/drm/amdgpu_drm.h   |  2 ++
  2 files changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 04a2219ce25e..9290a7b81950 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -213,6 +213,10 @@ static int amdgpu_firmware_info(struct 
drm_amdgpu_info_firmware *fw_info,
 fw_info->ver = adev->uvd.fw_version;
 fw_info->feature = 0;
 break;
+   case AMDGPU_INFO_FW_VCN:
+   fw_info->ver = adev->vcn.fw_version;
+   fw_info->feature = 0;
+   break;
 case AMDGPU_INFO_FW_GMC:
 fw_info->ver = adev->gmc.fw_version;
 fw_info->feature = 0;
@@ -1314,6 +1318,14 @@ static int amdgpu_debugfs_firmware_info(struct seq_file 
*m, void *data)
i, fw_info.feature, fw_info.ver);
 }

+   /* VCN */
+   query_fw.fw_type = AMDGPU_INFO_FW_VCN;
+   ret = amdgpu_firmware_info(&fw_info, &query_fw, adev);
+   if (ret)
+   return ret;
+   seq_printf(m, "VCN feature version: %u, firmware version: 0x%08x\n",
+  fw_info.feature, fw_info.ver);
+
 return 0;
  }

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index a967e8de5973..68d93b482f42 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -687,6 +687,8 @@ struct drm_amdgpu_cs_chunk_data {
 #define AMDGPU_INFO_FW_SOS  0x0c
 /* Subquery id: Query PSP ASD firmware version */
 #define AMDGPU_INFO_FW_ASD  0x0d
+   /* Subquery id: Query VCN firmware version */
+   #define AMDGPU_INFO_FW_VCN  0x0e
  /* number of bytes moved for TTM migration */
  #define AMDGPU_INFO_NUM_BYTES_MOVED0x0f
  /* the used VRAM size */
--
2.13.6


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
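
For userspace consumers, the new subquery goes through the existing
DRM_AMDGPU_INFO path. With libdrm that looks roughly as follows (a sketch,
assuming headers updated with the AMDGPU_INFO_FW_VCN define from this patch):

#include <stdio.h>
#include <stdint.h>
#include <amdgpu.h>
#include <amdgpu_drm.h>

/* Query the VCN firmware version through libdrm's wrapper around the
 * AMDGPU_INFO_FW_VERSION query with the new fw_type subquery. */
static int print_vcn_fw_version(amdgpu_device_handle dev)
{
	uint32_t version = 0, feature = 0;
	int r;

	r = amdgpu_query_firmware_version(dev, AMDGPU_INFO_FW_VCN,
					  0 /* ip_instance */, 0 /* index */,
					  &version, &feature);
	if (r)
		return r;

	printf("VCN feature version: %u, firmware version: 0x%08x\n",
	       feature, version);
	return 0;
}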


[PATCH 4.9 121/241] drm/radeon: Fail fb creation from imported dma-bufs.

2018-03-19 Thread Greg Kroah-Hartman
4.9-stable review patch.  If anyone has any objections, please let me know.

--

From: Christopher James Halse Rogers 


[ Upstream commit a294043b2fbd8de69d161457ed0c7a4026bbfa5a ]

Any use of the framebuffer will migrate it to VRAM, which is not sensible for
an imported dma-buf.

v2: Use DRM_DEBUG_KMS to prevent userspace accidentally spamming dmesg.

Reviewed-by: Michel Dänzer 
Reviewed-by: Christian König 
Signed-off-by: Christopher James Halse Rogers 

CC: amd-gfx@lists.freedesktop.org
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/gpu/drm/radeon/radeon_display.c |6 ++
 1 file changed, 6 insertions(+)

--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -1352,6 +1352,12 @@ radeon_user_framebuffer_create(struct dr
return ERR_PTR(-ENOENT);
}
 
+   /* Handle is imported dma-buf, so cannot be migrated to VRAM for 
scanout */
+   if (obj->import_attach) {
+   DRM_DEBUG_KMS("Cannot create framebuffer from imported 
dma_buf\n");
+   return ERR_PTR(-EINVAL);
+   }
+
radeon_fb = kzalloc(sizeof(*radeon_fb), GFP_KERNEL);
if (radeon_fb == NULL) {
drm_gem_object_unreference_unlocked(obj);


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 3.18 33/68] drm/radeon: Fail fb creation from imported dma-bufs.

2018-03-19 Thread Greg Kroah-Hartman
3.18-stable review patch.  If anyone has any objections, please let me know.

--

From: Christopher James Halse Rogers 


[ Upstream commit a294043b2fbd8de69d161457ed0c7a4026bbfa5a ]

Any use of the framebuffer will migrate it to VRAM, which is not sensible for
an imported dma-buf.

v2: Use DRM_DEBUG_KMS to prevent userspace accidentally spamming dmesg.

Reviewed-by: Michel Dänzer 
Reviewed-by: Christian König 
Signed-off-by: Christopher James Halse Rogers 

CC: amd-gfx@lists.freedesktop.org
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/gpu/drm/radeon/radeon_display.c |6 ++
 1 file changed, 6 insertions(+)

--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -1320,6 +1320,12 @@ radeon_user_framebuffer_create(struct dr
return ERR_PTR(-ENOENT);
}
 
+   /* Handle is imported dma-buf, so cannot be migrated to VRAM for 
scanout */
+   if (obj->import_attach) {
+   DRM_DEBUG_KMS("Cannot create framebuffer from imported 
dma_buf\n");
+   return ERR_PTR(-EINVAL);
+   }
+
radeon_fb = kzalloc(sizeof(*radeon_fb), GFP_KERNEL);
if (radeon_fb == NULL) {
drm_gem_object_unreference_unlocked(obj);


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 4.9 122/241] drm/amdgpu: Fail fb creation from imported dma-bufs. (v2)

2018-03-19 Thread Greg Kroah-Hartman
4.9-stable review patch.  If anyone has any objections, please let me know.

--

From: Christopher James Halse Rogers 


[ Upstream commit 1769152ac64b0b07583f696b621624df2ca4c840 ]

Any use of the framebuffer will migrate it to VRAM, which is not sensible for
an imported dma-buf.

v2: Use DRM_DEBUG_KMS to prevent userspace accidentally spamming dmesg.

Reviewed-by: Michel Dänzer 
Reviewed-by: Christian König 
Signed-off-by: Christopher James Halse Rogers 

CC: amd-gfx@lists.freedesktop.org
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_display.c |6 ++
 1 file changed, 6 insertions(+)

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
@@ -533,6 +533,12 @@ amdgpu_user_framebuffer_create(struct dr
return ERR_PTR(-ENOENT);
}
 
+   /* Handle is imported dma-buf, so cannot be migrated to VRAM for 
scanout */
+   if (obj->import_attach) {
+   DRM_DEBUG_KMS("Cannot create framebuffer from imported 
dma_buf\n");
+   return ERR_PTR(-EINVAL);
+   }
+
amdgpu_fb = kzalloc(sizeof(*amdgpu_fb), GFP_KERNEL);
if (amdgpu_fb == NULL) {
drm_gem_object_unreference_unlocked(obj);


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 4.4 060/134] drm/amdgpu: Fail fb creation from imported dma-bufs. (v2)

2018-03-19 Thread Greg Kroah-Hartman
4.4-stable review patch.  If anyone has any objections, please let me know.

--

From: Christopher James Halse Rogers 


[ Upstream commit 1769152ac64b0b07583f696b621624df2ca4c840 ]

Any use of the framebuffer will migrate it to VRAM, which is not sensible for
an imported dma-buf.

v2: Use DRM_DEBUG_KMS to prevent userspace accidentally spamming dmesg.

Reviewed-by: Michel Dänzer 
Reviewed-by: Christian König 
Signed-off-by: Christopher James Halse Rogers 

CC: amd-gfx@lists.freedesktop.org
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_display.c |6 ++
 1 file changed, 6 insertions(+)

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
@@ -560,6 +560,12 @@ amdgpu_user_framebuffer_create(struct dr
return ERR_PTR(-ENOENT);
}
 
+   /* Handle is imported dma-buf, so cannot be migrated to VRAM for 
scanout */
+   if (obj->import_attach) {
+   DRM_DEBUG_KMS("Cannot create framebuffer from imported 
dma_buf\n");
+   return ERR_PTR(-EINVAL);
+   }
+
amdgpu_fb = kzalloc(sizeof(*amdgpu_fb), GFP_KERNEL);
if (amdgpu_fb == NULL) {
drm_gem_object_unreference_unlocked(obj);


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: add documentation for amdgpu_device.c

2018-03-19 Thread Alex Deucher
On Fri, Mar 16, 2018 at 4:19 PM, Alex Deucher  wrote:
> Add kernel doc for the functions in amdgpu_device.c
>
> Signed-off-by: Alex Deucher 

Ping?

Alex

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 439 
> -
>  1 file changed, 427 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 5bb88aa0a58e..06b6d70c4210 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -89,6 +89,14 @@ static const char *amdgpu_asic_name[] = {
>
>  static void amdgpu_device_get_pcie_info(struct amdgpu_device *adev);
>
> +/**
> + * amdgpu_device_is_px - Is the device a dGPU with HG/PX power control
> + *
> + * @dev: drm_device pointer
> + *
> + * Returns true if the device is a dGPU with HG/PX power control,
> + * otherwise return false.
> + */
>  bool amdgpu_device_is_px(struct drm_device *dev)
>  {
> struct amdgpu_device *adev = dev->dev_private;
> @@ -101,6 +109,15 @@ bool amdgpu_device_is_px(struct drm_device *dev)
>  /*
>   * MMIO register access helper functions.
>   */
> +/**
> + * amdgpu_mm_rreg - read a memory mapped IO register
> + *
> + * @adev: amdgpu_device pointer
> + * @reg: dword aligned register offset
> + * @acc_flags: access flags which require special behavior
> + *
> + * Returns the 32 bit value from the offset specified.
> + */
>  uint32_t amdgpu_mm_rreg(struct amdgpu_device *adev, uint32_t reg,
> uint32_t acc_flags)
>  {
> @@ -129,6 +146,14 @@ uint32_t amdgpu_mm_rreg(struct amdgpu_device *adev, 
> uint32_t reg,
>   *
>  */
>
> +/**
> + * amdgpu_mm_rreg8 - read a memory mapped IO register
> + *
> + * @adev: amdgpu_device pointer
> + * @offset: byte aligned register offset
> + *
> + * Returns the 8 bit value from the offset specified.
> + */
>  uint8_t amdgpu_mm_rreg8(struct amdgpu_device *adev, uint32_t offset) {
> if (offset < adev->rmmio_size)
> return (readb(adev->rmmio + offset));
> @@ -141,6 +166,15 @@ uint8_t amdgpu_mm_rreg8(struct amdgpu_device *adev, 
> uint32_t offset) {
>   * @value: the value want to be written to the register
>   *
>  */
> +/**
> + * amdgpu_mm_wreg8 - write to a memory mapped IO register
> + *
> + * @adev: amdgpu_device pointer
> + * @offset: byte aligned register offset
> + * @value: 8 bit value to write
> + *
> + * Writes the value specified to the offset specified.
> + */
>  void amdgpu_mm_wreg8(struct amdgpu_device *adev, uint32_t offset, uint8_t 
> value) {
> if (offset < adev->rmmio_size)
> writeb(value, adev->rmmio + offset);
> @@ -148,7 +182,16 @@ void amdgpu_mm_wreg8(struct amdgpu_device *adev, 
> uint32_t offset, uint8_t value)
> BUG();
>  }
>
> -
> +/**
> + * amdgpu_mm_wreg - write to a memory mapped IO register
> + *
> + * @adev: amdgpu_device pointer
> + * @reg: dword aligned register offset
> + * @v: 32 bit value to write to the register
> + * @acc_flags: access flags which require special behavior
> + *
> + * Writes the value specified to the offset specified.
> + */
>  void amdgpu_mm_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v,
> uint32_t acc_flags)
>  {
> @@ -177,6 +220,14 @@ void amdgpu_mm_wreg(struct amdgpu_device *adev, uint32_t 
> reg, uint32_t v,
> }
>  }
>
> +/**
> + * amdgpu_io_rreg - read an IO register
> + *
> + * @adev: amdgpu_device pointer
> + * @reg: dword aligned register offset
> + *
> + * Returns the 32 bit value from the offset specified.
> + */
>  u32 amdgpu_io_rreg(struct amdgpu_device *adev, u32 reg)
>  {
> if ((reg * 4) < adev->rio_mem_size)
> @@ -187,6 +238,15 @@ u32 amdgpu_io_rreg(struct amdgpu_device *adev, u32 reg)
> }
>  }
>
> +/**
> + * amdgpu_io_wreg - write to an IO register
> + *
> + * @adev: amdgpu_device pointer
> + * @reg: dword aligned register offset
> + * @v: 32 bit value to write to the register
> + *
> + * Writes the value specified to the offset specified.
> + */
>  void amdgpu_io_wreg(struct amdgpu_device *adev, u32 reg, u32 v)
>  {
> if (adev->asic_type >= CHIP_VEGA10 && reg == 0) {
> @@ -355,6 +415,14 @@ static void amdgpu_block_invalid_wreg(struct 
> amdgpu_device *adev,
> BUG();
>  }
>
> +/**
> + * amdgpu_device_vram_scratch_init - allocate the VRAM scratch page
> + *
> + * @adev: amdgpu device pointer
> + *
> + * Allocates a scratch page of VRAM for use by various things in the
> + * driver.
> + */
>  static int amdgpu_device_vram_scratch_init(struct amdgpu_device *adev)
>  {
> return amdgpu_bo_create_kernel(adev, AMDGPU_GPU_PAGE_SIZE,
> @@ -364,6 +432,13 @@ static int amdgpu_device_vram_scratch_init(struct 
> amdgpu_device *adev)
>(void **)>vram_scratch.ptr);
>  }
>
> +/**
> + * amdgpu_device_vram_scratch_fini - Free the VRAM scratch page
> + *
> + * @adev: amdgpu 

Re: [PATCH] drm/amdgpu: add VCN to firmware query interface

2018-03-19 Thread Alex Deucher
On Fri, Mar 16, 2018 at 12:07 PM, Alex Deucher  wrote:
> Need to be able to query the VCN firmware version from
> userspace to determine supported features, etc.
>
> Signed-off-by: Alex Deucher 

Ping?

Alex

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 12 
>  include/uapi/drm/amdgpu_drm.h   |  2 ++
>  2 files changed, 14 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index 04a2219ce25e..9290a7b81950 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -213,6 +213,10 @@ static int amdgpu_firmware_info(struct 
> drm_amdgpu_info_firmware *fw_info,
> fw_info->ver = adev->uvd.fw_version;
> fw_info->feature = 0;
> break;
> +   case AMDGPU_INFO_FW_VCN:
> +   fw_info->ver = adev->vcn.fw_version;
> +   fw_info->feature = 0;
> +   break;
> case AMDGPU_INFO_FW_GMC:
> fw_info->ver = adev->gmc.fw_version;
> fw_info->feature = 0;
> @@ -1314,6 +1318,14 @@ static int amdgpu_debugfs_firmware_info(struct 
> seq_file *m, void *data)
>i, fw_info.feature, fw_info.ver);
> }
>
> +   /* VCN */
> +   query_fw.fw_type = AMDGPU_INFO_FW_VCN;
> +   ret = amdgpu_firmware_info(&fw_info, &query_fw, adev);
> +   if (ret)
> +   return ret;
> +   seq_printf(m, "VCN feature version: %u, firmware version: 0x%08x\n",
> +  fw_info.feature, fw_info.ver);
> +
> return 0;
>  }
>
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index a967e8de5973..68d93b482f42 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -687,6 +687,8 @@ struct drm_amdgpu_cs_chunk_data {
> #define AMDGPU_INFO_FW_SOS  0x0c
> /* Subquery id: Query PSP ASD firmware version */
> #define AMDGPU_INFO_FW_ASD  0x0d
> +   /* Subquery id: Query VCN firmware version */
> +   #define AMDGPU_INFO_FW_VCN  0x0e
>  /* number of bytes moved for TTM migration */
>  #define AMDGPU_INFO_NUM_BYTES_MOVED0x0f
>  /* the used VRAM size */
> --
> 2.13.6
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Add CM_TEST_DEBUG regs for DCN

2018-03-19 Thread Harry Wentland
On 2018-03-19 01:45 PM, Harry Wentland wrote:
> We'd like to use them for reading DCN debug status.
> 
> Signed-off-by: Harry Wentland 
> ---
> 
> See patch for where we use them http://git.amd.com:8080/#/c/137151/1
> 
> Tony can comment on details.
> 

Whoops, didn't mean to include this message. Patch that uses these is coming 
shortly.

Harry

> Harry
> 
>  .../gpu/drm/amd/include/asic_reg/dcn/dcn_1_0_offset.h | 19 
> ---
>  .../drm/amd/include/asic_reg/dcn/dcn_1_0_sh_mask.h|  8 
>  2 files changed, 24 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_1_0_offset.h 
> b/drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_1_0_offset.h
> index 4ccf9681c45d..721c61171045 100644
> --- a/drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_1_0_offset.h
> +++ b/drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_1_0_offset.h
> @@ -3895,6 +3895,10 @@
>  #define mmCM0_CM_MEM_PWR_CTRL_BASE_IDX   
>   2
>  #define mmCM0_CM_MEM_PWR_STATUS  
>   0x0d33
>  #define mmCM0_CM_MEM_PWR_STATUS_BASE_IDX 
>   2
> +#define mmCM0_CM_TEST_DEBUG_INDEX
>   0x0d35
> +#define mmCM0_CM_TEST_DEBUG_INDEX_BASE_IDX   
>   2
> +#define mmCM0_CM_TEST_DEBUG_DATA 
>   0x0d36
> +#define mmCM0_CM_TEST_DEBUG_DATA_BASE_IDX
>   2
>  
>  
>  // addressBlock: dce_dc_dpp0_dispdec_dpp_dcperfmon_dc_perfmon_dispdec
> @@ -4367,7 +4371,10 @@
>  #define mmCM1_CM_MEM_PWR_CTRL_BASE_IDX   
>   2
>  #define mmCM1_CM_MEM_PWR_STATUS  
>   0x0e4e
>  #define mmCM1_CM_MEM_PWR_STATUS_BASE_IDX 
>   2
> -
> +#define mmCM1_CM_TEST_DEBUG_INDEX
>   0x0e50
> +#define mmCM1_CM_TEST_DEBUG_INDEX_BASE_IDX   
>   2
> +#define mmCM1_CM_TEST_DEBUG_DATA 
>   0x0e51
> +#define mmCM1_CM_TEST_DEBUG_DATA_BASE_IDX
>   2
>  
>  // addressBlock: dce_dc_dpp1_dispdec_dpp_dcperfmon_dc_perfmon_dispdec
>  // base address: 0x399c
> @@ -4839,7 +4846,10 @@
>  #define mmCM2_CM_MEM_PWR_CTRL_BASE_IDX   
>   2
>  #define mmCM2_CM_MEM_PWR_STATUS  
>   0x0f69
>  #define mmCM2_CM_MEM_PWR_STATUS_BASE_IDX 
>   2
> -
> +#define mmCM2_CM_TEST_DEBUG_INDEX
>   0x0f6b
> +#define mmCM2_CM_TEST_DEBUG_INDEX_BASE_IDX   
>   2
> +#define mmCM2_CM_TEST_DEBUG_DATA 
>   0x0f6c
> +#define mmCM2_CM_TEST_DEBUG_DATA_BASE_IDX
>   2
>  
>  // addressBlock: dce_dc_dpp2_dispdec_dpp_dcperfmon_dc_perfmon_dispdec
>  // base address: 0x3e08
> @@ -5311,7 +5321,10 @@
>  #define mmCM3_CM_MEM_PWR_CTRL_BASE_IDX   
>   2
>  #define mmCM3_CM_MEM_PWR_STATUS  
>   0x1084
>  #define mmCM3_CM_MEM_PWR_STATUS_BASE_IDX 
>   2
> -
> +#define mmCM3_CM_TEST_DEBUG_INDEX
>   0x1086
> +#define mmCM3_CM_TEST_DEBUG_INDEX_BASE_IDX   
>   2
> +#define mmCM3_CM_TEST_DEBUG_DATA 
>   0x1087
> +#define mmCM3_CM_TEST_DEBUG_DATA_BASE_IDX
>   2
>  
>  // addressBlock: dce_dc_dpp3_dispdec_dpp_dcperfmon_dc_perfmon_dispdec
>  // base address: 0x4274
> diff --git a/drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_1_0_sh_mask.h 
> b/drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_1_0_sh_mask.h
> index e2a2f114bd8e..e7c0cad41081 100644
> --- a/drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_1_0_sh_mask.h
> +++ b/drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_1_0_sh_mask.h
> @@ -14049,6 +14049,14 @@
>  #define CM0_CM_MEM_PWR_STATUS__RGAM_MEM_PWR_STATE__SHIFT

Re: [PATCH 00/20] Add KFD GPUVM support for dGPUs v4

2018-03-19 Thread Christian König

So coming back to this series once more.

Patch #1, #3 are Reviewed-by: Christian König .

Patch #2, #4 - #13 and #18-#19 are Acked-by: Christian König 
.


Patch #14: What's the difference compared to setting vramlimit=$size_of_bar?

Patch #15 & #20: Why is that actually still needed? I thought we have 
fixed all dependencies and can now use the "standard" way of attaching 
fences to reservation objects to do this.


That said, I still need to take a closer look at patch #20.

Patch #16: Looks good to me in general, but I think it would be safer if 
we grab a reference to the task structure. Otherwise grabbing pages from 
a mm_struct sounds a bit scary to me.


Patch #17: I think it would be better to allocate the node when the 
locks are not held and free it when we find that it isn't used, but no 
big deal.


Regards,
Christian.

Am 15.03.2018 um 22:27 schrieb Felix Kuehling:

Rebased and integrated review feedback from v3:
* Removed vm->vm_context field
* Use uninterruptible waiting in initial PD validation to avoid ERESTARTSYS
* Return number of successful map/unmap operations in failure cases
* Facilitate partial retry after failed map/unmap
* Added comments with parameter descriptions to new APIs
* Defined AMDKFD_IOC_FREE_MEMORY_OF_GPU write-only

This patch series also adds Userptr support in patches 15-20.

Felix Kuehling (19):
   drm/amdgpu: Move KFD-specific fields into struct amdgpu_vm
   drm/amdgpu: Fix initial validation of PD BO for KFD VMs
   drm/amdgpu: Add helper to turn an existing VM into a compute VM
   drm/amdgpu: Add kfd2kgd interface to acquire an existing VM
   drm/amdkfd: Create KFD VMs on demand
   drm/amdkfd: Remove limit on number of GPUs
   drm/amdkfd: Aperture setup for dGPUs
   drm/amdkfd: Add per-process IDR for buffer handles
   drm/amdkfd: Allocate CWSR trap handler memory for dGPUs
   drm/amdkfd: Add TC flush on VMID deallocation for Hawaii
   drm/amdkfd: Add ioctls for GPUVM memory management
   drm/amdkfd: Kmap event page for dGPUs
   drm/amdkfd: Add module option for testing large-BAR functionality
   drm/amdgpu: Add MMU notifier type for KFD userptr
   drm/amdgpu: Enable amdgpu_ttm_tt_get_user_pages in worker threads
   drm/amdgpu: GFP_NOIO while holding locks taken in MMU notifier
   drm/amdkfd: GFP_NOIO while holding locks taken in MMU notifier
   drm/amdkfd: Add quiesce_mm and resume_mm to kgd2kfd_calls
   drm/amdgpu: Add userptr support for KFD

Oak Zeng (1):
   drm/amdkfd: Populate DRM render device minor

  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |  37 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c  |   1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c  |   1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c   | 818 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |   2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c |  96 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h |  11 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|  30 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  70 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  10 +
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   | 532 ++
  drivers/gpu/drm/amd/amdkfd/kfd_crat.c  |   3 +
  drivers/gpu/drm/amd/amdkfd/kfd_device.c|  40 +-
  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |  22 +-
  drivers/gpu/drm/amd/amdkfd/kfd_events.c|  31 +-
  drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c   |  59 +-
  drivers/gpu/drm/amd/amdkfd/kfd_module.c|   7 +
  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c   |   2 +-
  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c|   2 +-
  drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c|  37 +
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h  |  41 ++
  drivers/gpu/drm/amd/amdkfd/kfd_process.c   | 314 +++-
  drivers/gpu/drm/amd/amdkfd/kfd_topology.c  |   4 +
  drivers/gpu/drm/amd/amdkfd/kfd_topology.h  |   1 +
  drivers/gpu/drm/amd/include/kgd_kfd_interface.h|  10 +
  include/uapi/linux/kfd_ioctl.h | 122 ++-
  26 files changed, 2090 insertions(+), 213 deletions(-)



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/5] dma-buf: add optional invalidate_mappings callback v2

2018-03-19 Thread Christian König

Am 19.03.2018 um 16:53 schrieb Chris Wilson:

Quoting Christian König (2018-03-16 14:22:32)
[snip, probably lost too much context]

This allows for full grown pipelining, e.g. the exporter can say I need
to move the buffer for some operation. Then let the move operation wait
for all existing fences in the reservation object and install the fence
of the move operation as exclusive fence.

Ok, the situation I have in mind is the non-pipelined case: revoking
dma-buf for mmu_invalidate_range or shrink_slab. I would need a
completion event that can be waited on the cpu for all the invalidate
callbacks. (Essentially an atomic_t counter plus struct completion; a
lighter version of dma_fence, I wonder where I've seen that before ;)


Actually that is harmless.

When you need to unmap a DMA-buf because of mmu_invalidate_range or 
shrink_slab you need to wait for its reservation object anyway.


This needs to be done to make sure that the backing memory is now idle, 
it doesn't matter if the jobs were submitted by DMA-buf importers or 
your own driver.


The sg tables pointing to the now released memory might live a bit 
longer, but that is unproblematic and actually intended.


When we would try to destroy the sg tables in an mmu_invalidate_range or 
shrink_slab callback we would run into a lockdep horror.
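
A minimal sketch of that ordering (not from the patch set; it assumes the
4.16-era reservation API, where reservation_object_wait_timeout_rcu() waits
for the fences in a reservation object):

#include <linux/reservation.h>
#include <linux/jiffies.h>

/* Make sure the backing memory of an exported buffer is idle before
 * unmapping it from a notifier; the sg tables are freed later, outside
 * the notifier, to avoid the lockdep problem described above.
 */
static int exporter_wait_idle(struct reservation_object *resv)
{
	long ret;

	/* wait_all = true: wait for shared and exclusive fences */
	ret = reservation_object_wait_timeout_rcu(resv, true, false,
						  msecs_to_jiffies(100));
	if (ret < 0)
		return ret;		/* wait failed */

	return ret ? 0 : -ETIMEDOUT;	/* 0 jiffies left means timeout */
}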


Regards,
Christian.



Even so, it basically means passing a fence object down to the async
callbacks for them to signal when they are complete. Just to handle the
non-pipelined version. :|
-Chris
___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/5] dma-buf: add optional invalidate_mappings callback v2

2018-03-19 Thread Chris Wilson
Quoting Christian König (2018-03-16 14:22:32)
[snip, probably lost too much context]
> This allows for full grown pipelining, e.g. the exporter can say I need 
> to move the buffer for some operation. Then let the move operation wait 
> for all existing fences in the reservation object and install the fence 
> of the move operation as exclusive fence.

Ok, the situation I have in mind is the non-pipelined case: revoking
dma-buf for mmu_invalidate_range or shrink_slab. I would need a
completion event that can be waited on the cpu for all the invalidate
callbacks. (Essentially an atomic_t counter plus struct completion; a
lighter version of dma_fence, I wonder where I've seen that before ;)

Even so, it basically means passing a fence object down to the async
callbacks for them to signal when they are complete. Just to handle the
non-pipelined version. :|
-Chris
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amd/amdgpu: passing i2s instance value as platform data

2018-03-19 Thread Deucher, Alexander
> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Vijendar Mukunda
> Sent: Monday, March 19, 2018 2:47 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Mukunda,
> Vijendar ; Agrawal, Akshu
> 
> Subject: [PATCH] drm/amd/amdgpu: passing i2s instance value as platform
> data
> 
> The i2s instance value is passed as platform data to the dwc driver.
> This parameter is useful to distinguish the current i2s instance when
> multiple i2s controller instances are created.
> 
> Signed-off-by: Vijendar Mukunda 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c
> index 6cca4d1..61d6cb9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c
> @@ -83,6 +83,8 @@
>  #define ACP_TIMEOUT_LOOP 0x00FF
>  #define ACP_DEVS 4
>  #define ACP_SRC_ID   162
> +#define I2S_SP_INSTANCE  1
> +#define I2S_BT_INSTANCE  2
> 

Please add the defines for I2S_SP_INSTANCE and I2S_BT_INSTANCE to the dws 
platform data header so you don't have to define them locally here.
With that fixed, the patch is:
Reviewed-by: Alex Deucher 
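
A hedged sketch of that move (the header location include/sound/designware_i2s.h
matches where struct i2s_platform_data lives, but the exact field set shown
here is trimmed for illustration):

/* include/sound/designware_i2s.h */
#define I2S_SP_INSTANCE		1
#define I2S_BT_INSTANCE		2

struct i2s_platform_data {
	/* ... existing capability/quirk fields ... */
	unsigned int snd_rates;
	u32 i2s_reg_comp1;
	u32 i2s_reg_comp2;
	unsigned int i2s_instance;	/* new: SP or BT controller instance */
};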

>  enum {
>   ACP_TILE_P1 = 0,
> @@ -347,6 +349,7 @@ static int acp_hw_init(void *handle)
>   i2s_pdata[0].snd_rates = SNDRV_PCM_RATE_8000_96000;
>   i2s_pdata[0].i2s_reg_comp1 = ACP_I2S_COMP1_PLAY_REG_OFFSET;
>   i2s_pdata[0].i2s_reg_comp2 = ACP_I2S_COMP2_PLAY_REG_OFFSET;
> + i2s_pdata[0].i2s_instance = I2S_SP_INSTANCE;
>   switch (adev->asic_type) {
>   case CHIP_STONEY:
>   i2s_pdata[1].quirks = DW_I2S_QUIRK_COMP_REG_OFFSET |
> @@ -362,6 +365,7 @@ static int acp_hw_init(void *handle)
>   i2s_pdata[1].snd_rates = SNDRV_PCM_RATE_8000_96000;
>   i2s_pdata[1].i2s_reg_comp1 = ACP_I2S_COMP1_CAP_REG_OFFSET;
>   i2s_pdata[1].i2s_reg_comp2 = ACP_I2S_COMP2_CAP_REG_OFFSET;
> + i2s_pdata[1].i2s_instance = I2S_SP_INSTANCE;
> 
>   i2s_pdata[2].quirks = DW_I2S_QUIRK_COMP_REG_OFFSET;
>   switch (adev->asic_type) {
> @@ -376,6 +380,7 @@ static int acp_hw_init(void *handle)
>   i2s_pdata[2].snd_rates = SNDRV_PCM_RATE_8000_96000;
>   i2s_pdata[2].i2s_reg_comp1 = ACP_BT_COMP1_REG_OFFSET;
>   i2s_pdata[2].i2s_reg_comp2 = ACP_BT_COMP2_REG_OFFSET;
> + i2s_pdata[2].i2s_instance = I2S_BT_INSTANCE;
> 
>   adev->acp.acp_res[0].name = "acp2x_dma";
>   adev->acp.acp_res[0].flags = IORESOURCE_MEM;
> --
> 2.7.4
> 
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: RFC: unpinned DMA-buf exporting v2

2018-03-19 Thread Daniel Vetter
On Fri, Mar 16, 2018 at 02:20:44PM +0100, Christian König wrote:
> Hi everybody,
> 
> since I've got positive feedback from Daniel I continued working on this 
> approach.
> 
> A few issues are still open:
> 1. Daniel suggested that I make the invalidate_mappings callback a parameter 
> of dma_buf_attach().
> 
> This approach unfortunately won't work because when the attachment is
> created the importer is not necessarily ready to handle invalidation
> events.

Why do you have this constraint? This sounds a bit like inverted
create/teardown sequence troubles, where you make an object "live" before
the thing is fully set up.

Can't we fix this by creating the entire ttm scaffolding you'll need for a
dma-buf upfront, and only once you have everything we grab the dma_buf
attachment? At that point you really should be able to evict buffers
again.

Not requiring invalidate_mapping to be set together with the attachment
means we can't ever require importers to support it (e.g. to address your
concern with the userspace dma-buf userptr magic).

> E.g. in the amdgpu example we first need to set up the imported GEM/TTM
> objects and install that in the attachment.
> 
> My solution is to introduce a separate function to grab the locks and
> set the callback, this function could then be used to pin the buffer
> later on if that turns out to be necessary after all.
> 
> 2. With my example setup this currently results in a ping/pong situation
> because the exporter prefers a VRAM placement while the importer prefers
> a GTT placement.
> 
> This results in quite a performance drop, but can be fixed by a simple
mesa patch which allows shared BOs to be placed in both VRAM and GTT.
> 
> Question is what should we do in the meantime? Accept the performance
> drop or only allow unpinned sharing with new Mesa?

Maybe the exporter should not try to move stuff back into VRAM as long as
there's an active dma-buf? I mean it's really cool that it works, but
maybe let's just do this for a tech demo :-)

Of course if it then runs out of TT then it could still try to move it
back in. And "let's not move it when it's imported" is probably too stupid
too, and will need to be improved again with more heuristics, but would at
least get it off the ground.

Long term you might want to move perhaps once per 10 seconds or so, to get
idle importers to detach. Adjust 10s to match whatever benchmark/workload
you care about.
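
A rough sketch of that heuristic (the structure and its fields are invented
here purely for illustration; nothing like this exists in TTM or amdgpu):

#include <linux/list.h>
#include <linux/jiffies.h>

struct example_bo {
	struct list_head attachments;		/* active dma-buf attachments */
	unsigned long last_vram_attempt;	/* jiffies of the last VRAM try */
};

/* Move an exported BO back into VRAM only when nobody is attached, or
 * when the retry interval has elapsed, so idle importers eventually
 * get invalidated and detach.
 */
static bool exporter_may_move_to_vram(struct example_bo *bo)
{
	if (list_empty(&bo->attachments))
		return true;	/* no importers, move freely */

	return time_after(jiffies,
			  bo->last_vram_attempt + msecs_to_jiffies(10 * 1000));
}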
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/5] dma-buf: add optional invalidate_mappings callback v2

2018-03-19 Thread Daniel Vetter
On Fri, Mar 16, 2018 at 02:20:45PM +0100, Christian König wrote:
> Each importer can now provide an invalidate_mappings callback.
> 
> This allows the exporter to provide the mappings without the need to pin
> the backing store.
> 
> v2: don't try to invalidate mappings when the callback is NULL,
> lock the reservation obj while using the attachments,
> add helper to set the callback
> 
> Signed-off-by: Christian König 

Replying here to avoid thread split, but not entirely related.

I thought some more about the lockdep splat discussion, and specifically
that amdgpu needs the reservations for the vm objects when doing a gpu
reset.

Since they're in the same ww_class as all other dma-buf reservations I'm
pretty sure lockdep will complain, at least when cross-release lockdep and
cross-release annotations for dma_fence are merged.

And as long as there's some case where amdgpu needs both the vm object
reservation and other reservations (CS?) then we must have them in the
same class, and in that case the deadlock is real. It'll require an
impressive set of circumstances most likely (the minimal number of threads
we generally needed to actually hit the cross-release stuff was 4+ or
something nuts like that, all doing something else), but it'll be real.

Otoh I think the invalidate stuff here doesn't actually make this worse,
so we can bang our heads against the gpu reset problem at leisure :-)

This stuff here has much more potential to interact badly with core mm
paths, and anything related to that (which ime also means all the cpu
hotplug machinery, which includes suspend/resume, and anything related to
the vfs because someone always manages to drag sysfs into the picture).

It's going to be fun times.

Cheers, Daniel
> ---
>  drivers/dma-buf/dma-buf.c | 60 
> +++
>  include/linux/dma-buf.h   | 38 ++
>  2 files changed, 98 insertions(+)
> 
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index d78d5fc173dc..ed2b3559ba25 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -572,7 +572,9 @@ struct dma_buf_attachment *dma_buf_attach(struct dma_buf 
> *dmabuf,
>   if (ret)
>   goto err_attach;
>   }
> + reservation_object_lock(dmabuf->resv, NULL);
>   list_add(>node, >attachments);
> + reservation_object_unlock(dmabuf->resv);
>  
> 	mutex_unlock(&dmabuf->lock);
>   return attach;
> @@ -598,7 +600,9 @@ void dma_buf_detach(struct dma_buf *dmabuf, struct 
> dma_buf_attachment *attach)
>   return;
>  
> 	mutex_lock(&dmabuf->lock);
> + reservation_object_lock(dmabuf->resv, NULL);
>   list_del(>node);
> + reservation_object_unlock(dmabuf->resv);
>   if (dmabuf->ops->detach)
>   dmabuf->ops->detach(dmabuf, attach);
>  
> @@ -632,10 +636,23 @@ struct sg_table *dma_buf_map_attachment(struct 
> dma_buf_attachment *attach,
>   if (WARN_ON(!attach || !attach->dmabuf))
>   return ERR_PTR(-EINVAL);
>  
> + /*
> +  * Mapping a DMA-buf can trigger its invalidation, prevent sending this
> +  * event to the caller by temporary removing this attachment from the
> +  * event to the caller by temporarily removing this attachment from the
> +  */
> + if (attach->invalidate_mappings) {
> + reservation_object_assert_held(attach->dmabuf->resv);
> + list_del(&attach->node);
> + }
> +
>   sg_table = attach->dmabuf->ops->map_dma_buf(attach, direction);
>   if (!sg_table)
>   sg_table = ERR_PTR(-ENOMEM);
>  
> + if (attach->invalidate_mappings)
> + list_add(&attach->node, &attach->dmabuf->attachments);
> +
>   return sg_table;
>  }
>  EXPORT_SYMBOL_GPL(dma_buf_map_attachment);
> @@ -656,6 +673,9 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment 
> *attach,
>  {
>   might_sleep();
>  
> + if (attach->invalidate_mappings)
> + reservation_object_assert_held(attach->dmabuf->resv);
> +
>   if (WARN_ON(!attach || !attach->dmabuf || !sg_table))
>   return;
>  
> @@ -664,6 +684,44 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment 
> *attach,
>  }
>  EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment);
>  
> +/**
> + * dma_buf_set_invalidate_callback - set the invalidate_mappings callback
> + *
> + * @attach:  [in]attachment where to set the callback
> + * @cb:  [in]the callback to set
> + *
> + * Makes sure to take the appropriate locks when updating the invalidate
> + * mappings callback.
> + */
> +void dma_buf_set_invalidate_callback(struct dma_buf_attachment *attach,
> +  void (*cb)(struct dma_buf_attachment *))
> +{
> + reservation_object_lock(attach->dmabuf->resv, NULL);
> + attach->invalidate_mappings = cb;
> + reservation_object_unlock(attach->dmabuf->resv);
> +}
> +EXPORT_SYMBOL_GPL(dma_buf_set_invalidate_callback);
> +
> +/**
> + * dma_buf_invalidate_mappings - invalidate 

Re: [PATCH] drm/amdgpu/nbio6: Correct PCIE_INDEX/DATA pair used for smn register accessing

2018-03-19 Thread Alex Deucher
On Mon, Mar 19, 2018 at 5:32 AM, Hawking Zhang  wrote:
> The PCIE_INDEX2/DATA2 pair will be used for smn register access starting
> from Vega.
> The PCIE_INDEX/DATA pair should be reserved for the SMU.
>
> Change-Id: Ie597d89001e706225521c94161d2b40443ec3c48
> Signed-off-by: Hawking Zhang 

Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c 
> b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
> index 1cf3424..6f9c549 100644
> --- a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
> +++ b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
> @@ -220,12 +220,12 @@ static u32 nbio_v6_1_get_hdp_flush_done_offset(struct 
> amdgpu_device *adev)
>
>  static u32 nbio_v6_1_get_pcie_index_offset(struct amdgpu_device *adev)
>  {
> -   return SOC15_REG_OFFSET(NBIO, 0, mmPCIE_INDEX);
> +   return SOC15_REG_OFFSET(NBIO, 0, mmPCIE_INDEX2);
>  }
>
>  static u32 nbio_v6_1_get_pcie_data_offset(struct amdgpu_device *adev)
>  {
> -   return SOC15_REG_OFFSET(NBIO, 0, mmPCIE_DATA);
> +   return SOC15_REG_OFFSET(NBIO, 0, mmPCIE_DATA2);
>  }
>
>  static const struct nbio_hdp_flush_reg nbio_v6_1_hdp_flush_reg = {
> --
> 2.7.4
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: After upgrading LLVM to 7.0.0-0.1.r326462 version and MESA to 18.1.0-0.4.git56dc9f9 the games which are used Vulkan driver have begun crashing at start

2018-03-19 Thread mikhail . v . gavrilov

Has no one looked at why the Vulkan driver started crashing?

Thread 1 "vulkandriverque" received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) thread apply all bt

Thread 3 (Thread 0x7fffef06c700 (LWP 15668)):
#0  0x76e4f82d in futex_wait_cancelable (private=<optimized out>, 
expected=0, futex_word=0x55b2a188) at 
../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x55b2a138, 
cond=0x55b2a160) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x55b2a160, mutex=0x55b2a138) at 
pthread_cond_wait.c:655
#3  0x74615aab in util_queue_thread_func.lto_priv () from 
/usr/lib64/libvulkan_radeon.so
#4  0x74615977 in impl_thrd_routine.lto_priv () from 
/usr/lib64/libvulkan_radeon.so
#5  0x76e4950b in start_thread (arg=0x7fffef06c700) at 
pthread_create.c:465
#6  0x7735916f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7fffef9ae700 (LWP 15667)):
#0  0x76e4f82d in futex_wait_cancelable (private=<optimized out>, 
expected=0, futex_word=0x55a15a08) at 
../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x55a159b8, 
cond=0x55a159e0) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x55a159e0, mutex=0x55a159b8) at 
pthread_cond_wait.c:655
#3  0x74615aab in util_queue_thread_func.lto_priv () from 
/usr/lib64/libvulkan_radeon.so
#4  0x74615977 in impl_thrd_routine.lto_priv () from 
/usr/lib64/libvulkan_radeon.so
#5  0x76e4950b in start_thread (arg=0x7fffef9ae700) at 
pthread_create.c:465
#6  0x7735916f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x77fce740 (LWP 15663)):
#0  0x0000000000000000 in ?? ()
#1  0x5562f4c9 in VulkanSwapChain::initSurface(xcb_connection_t*, 
unsigned int) ()
#2  0x5562dae3 in VulkanExampleBase::initSwapchain() ()
#3  0x55627ab4 in VulkanExampleBase::prepare() ()
#4  0x556145f4 in VulkanExample::prepare() ()
#5  0x556112a3 in main ()
(gdb) 
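
The program counter of 0x0 in frame #0, called from
VulkanSwapChain::initSurface(), is the classic symptom of invoking an
instance-level entry point that the loader returned as NULL. This is only a
guess at the root cause, but a defensive check like the following sketch
(names are illustrative) would at least fail with a message instead of a
segfault:

#define VK_USE_PLATFORM_XCB_KHR
#include <vulkan/vulkan.h>
#include <stdio.h>
#include <stdlib.h>

static PFN_vkCreateXcbSurfaceKHR load_create_xcb_surface(VkInstance instance)
{
	PFN_vkCreateXcbSurfaceKHR fp = (PFN_vkCreateXcbSurfaceKHR)
		vkGetInstanceProcAddr(instance, "vkCreateXcbSurfaceKHR");

	if (!fp) {
		/* Fail loudly instead of jumping to address 0. */
		fprintf(stderr, "vkCreateXcbSurfaceKHR unavailable - was "
			"VK_KHR_xcb_surface enabled at instance creation?\n");
		exit(EXIT_FAILURE);
	}
	return fp;
}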


On Sun, 2018-03-11 at 16:16 +0500, Mikhail Gavrilov wrote:
> After upgrading LLVM to 7.0.0-0.1.r326462 version and MESA to
> 18.1.0-0.4.git56dc9f9 the games which are used Vulkan driver have
> begun crashing at start.
> Latest workable LLVM was 7.0.0-0.1.r323994 and MESA 18.1.0-0.2.gita5053ba
> 
> For example:
> 
> - steam client itself
> 
> Core was generated by
> `/home/mikhail/.local/share/Steam/ubuntu12_32/../ubuntu12_64/vulkandriverquery'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x0000000000000000 in ?? ()
> [Current thread is 1 (Thread 0x7f4df789bdc0 (LWP 4363))]
> (gdb) thread apply all bt
> 
> Thread 3 (Thread 0x7f4defc6a700 (LWP 4365)):
> #0  0x7f4df6bef4e0 in __libc_open64 (file=0x7f4de8000d60
> "/home/mikhail/.local/share/Steam/shader_cache_temp_dir_vk_64/mesa_shader_cache/c6/48238cc37fea62b1bf7d4008f17f84f7eeede2.tmp",
> oflag=<optimized out>) at ../sysdeps/unix/sysv/linux/open64.c:46
> #1  0x7f4df3eb55eb in cache_put.lto_priv () from
> /usr/lib64/libvulkan_radeon.so
> #2  0x7f4df3eab49e in util_queue_thread_func.lto_priv () from
> /usr/lib64/libvulkan_radeon.so
> #3  0x7f4df3eab2d7 in impl_thrd_routine.lto_priv () from
> /usr/lib64/libvulkan_radeon.so
> #4  0x7f4df66ee50b in start_thread (arg=0x7f4defc6a700) at
> pthread_create.c:465
> #5  0x7f4df6bfe16f in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> 
> Thread 2 (Thread 0x7f4def328700 (LWP 4366)):
> #0  0x7f4df66f482d in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x5636767a57d8) at
> ../sysdeps/unix/sysv/linux/futex-internal.h:88
> #1  __pthread_cond_wait_common (abstime=0x0, mutex=0x5636767a5788,
> cond=0x5636767a57b0) at pthread_cond_wait.c:502
> #2  __pthread_cond_wait (cond=0x5636767a57b0, mutex=0x5636767a5788) at
> pthread_cond_wait.c:655
> #3  0x7f4df3eab40b in util_queue_thread_func.lto_priv () from
> /usr/lib64/libvulkan_radeon.so
> #4  0x7f4df3eab2d7 in impl_thrd_routine.lto_priv () from
> /usr/lib64/libvulkan_radeon.so
> #5  0x7f4df66ee50b in start_thread (arg=0x7f4def328700) at
> pthread_create.c:465
> #6  0x7f4df6bfe16f in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> 
> Thread 1 (Thread 0x7f4df789bdc0 (LWP 4363)):
> #0  0x0000000000000000 in ?? ()
> #1  0x5636759414c9 in
> VulkanSwapChain::initSurface(xcb_connection_t*, unsigned int) ()
> #2  0x56367593fae3 in VulkanExampleBase::initSwapchain() ()
> #3  0x563675939ab4 in VulkanExampleBase::prepare() ()
> #4  0x5636759265f4 in VulkanExample::prepare() ()
> #5  0x5636759232a3 in main ()
> (gdb)
> 
> 
> - game F1 2017.
> 
> Thread 58 "WinMain" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fff6e085700 (LWP 27024)]
0x0000000000000000 in ?? ()
> (gdb) thread apply all bt
> 
> Thread 69 (Thread 0x7fff8d3dc700 (LWP 27035)):
> #0  0x76ca0106 in 

Re: [Outreachy kernel] Re: [PATCH] gpu: drm: Use list_first_entry instead of list_entry

2018-03-19 Thread Julia Lawall


On Mon, 19 Mar 2018, Christian König wrote:

> Mhm, actually that patch isn't correct. What we grab here is the next
> entry, not the first one.
>
> Don't we have an alias list_next_entry for list_first_entry?

As compared to the semantic patch I proposed earlier today, it would seem
that list_first_entry is useful when the types are different?  One would
have to check the result of course, but a list element with the same type
as the structure that contains the list might be unlikely?

julia

>
> Regards,
> Christian.
>
> Am 18.03.2018 um 22:51 schrieb Arushi Singhal:
> > This patch replaces list_entry with list_first_entry as it makes the
> > code more clear.
> > Done using coccinelle:
> >
> > @@
> > expression e;
> > @@
> > (
> > - list_entry(e->next,
> > + list_first_entry(e,
> >...)
> > |
> > - list_entry(e->prev,
> > + list_last_entry(e,
> >...)
> > )
> >
> > Signed-off-by: Arushi Singhal 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c | 4 ++--
> >   drivers/gpu/drm/omapdrm/dss/display.c  | 4 ++--
> >   drivers/gpu/drm/radeon/radeon_sa.c | 4 ++--
> >   3 files changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
> > index 3144400..646f593 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
> > @@ -158,7 +158,7 @@ static void amdgpu_sa_bo_try_free(struct
> > amdgpu_sa_manager *sa_manager)
> > if (sa_manager->hole->next == &sa_manager->olist)
> > return;
> >   - sa_bo = list_entry(sa_manager->hole->next, struct amdgpu_sa_bo,
> > olist);
> > +   sa_bo = list_first_entry(sa_manager->hole, struct amdgpu_sa_bo,
> > olist);
> > list_for_each_entry_safe_from(sa_bo, tmp, &sa_manager->olist, olist) {
> > if (sa_bo->fence == NULL ||
> > !dma_fence_is_signaled(sa_bo->fence)) {
> > @@ -183,7 +183,7 @@ static inline unsigned amdgpu_sa_bo_hole_eoffset(struct
> > amdgpu_sa_manager *sa_ma
> > struct list_head *hole = sa_manager->hole;
> > if (hole->next != &sa_manager->olist) {
> > -   return list_entry(hole->next, struct amdgpu_sa_bo,
> > olist)->soffset;
> > +   return list_first_entry(hole, struct amdgpu_sa_bo,
> > olist)->soffset;
> > }
> > return sa_manager->size;
> >   }
> > diff --git a/drivers/gpu/drm/omapdrm/dss/display.c
> > b/drivers/gpu/drm/omapdrm/dss/display.c
> > index 0c9480b..fb9ecae 100644
> > --- a/drivers/gpu/drm/omapdrm/dss/display.c
> > +++ b/drivers/gpu/drm/omapdrm/dss/display.c
> > @@ -158,8 +158,8 @@ struct omap_dss_device *omap_dss_get_next_device(struct
> > omap_dss_device *from)
> > goto out;
> > }
> >   - dssdev = list_entry(l->next, struct omap_dss_device,
> > -   panel_list);
> > +   dssdev = list_first_entry(l, struct omap_dss_device,
> > + panel_list);
> > omap_dss_get_device(dssdev);
> > goto out;
> > }
> > diff --git a/drivers/gpu/drm/radeon/radeon_sa.c
> > b/drivers/gpu/drm/radeon/radeon_sa.c
> > index 197b157..66c0482 100644
> > --- a/drivers/gpu/drm/radeon/radeon_sa.c
> > +++ b/drivers/gpu/drm/radeon/radeon_sa.c
> > @@ -158,7 +158,7 @@ static void radeon_sa_bo_try_free(struct
> > radeon_sa_manager *sa_manager)
> > if (sa_manager->hole->next == &sa_manager->olist)
> > return;
> >   - sa_bo = list_entry(sa_manager->hole->next, struct radeon_sa_bo,
> > olist);
> > +   sa_bo = list_first_entry(sa_manager->hole, struct radeon_sa_bo,
> > olist);
> > list_for_each_entry_safe_from(sa_bo, tmp, &sa_manager->olist, olist) {
> > if (sa_bo->fence == NULL ||
> > !radeon_fence_signaled(sa_bo->fence)) {
> > return;
> > @@ -182,7 +182,7 @@ static inline unsigned radeon_sa_bo_hole_eoffset(struct
> > radeon_sa_manager *sa_ma
> > struct list_head *hole = sa_manager->hole;
> > if (hole->next != &sa_manager->olist) {
> > -   return list_entry(hole->next, struct radeon_sa_bo,
> > olist)->soffset;
> > +   return list_first_entry(hole, struct radeon_sa_bo,
> > olist)->soffset;
> > }
> > return sa_manager->size;
> >   }
>
> --
> You received this message because you are subscribed to the Google Groups
> "outreachy-kernel" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to outreachy-kernel+unsubscr...@googlegroups.com.
> To post to this group, send email to outreachy-ker...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/outreachy-kernel/8b1e22f8-7a05-b66b-8825-7d4d97e46dac%40amd.com.
> For more options, visit https://groups.google.com/d/optout.
>___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org

Re: [Outreachy kernel] Re: [PATCH] gpu: drm: Use list_first_entry instead of list_entry

2018-03-19 Thread Christian König

Am 19.03.2018 um 12:06 schrieb Julia Lawall:


On Mon, 19 Mar 2018, Christian König wrote:


Mhm, actually that patch isn't correct. What we grab here is the next
entry, not the first one.

Don't we have an alias list_next_entry for list_first_entry?

As compared to the semantic patch I proposed earlier today, it would seem
that list_first_entry is useful when the types are different?  One would
have to check the result of course, but a list element with the same type
as the structure that contains the list might be unlikely?


The list element and the list head have different types in this case:

if (sa_manager->hole->next == &sa_manager->olist)
return;
   -sa_bo = list_entry(sa_manager->hole->next, struct amdgpu_sa_bo, olist);
+   sa_bo = list_first_entry(sa_manager->hole, struct amdgpu_sa_bo, olist);


sa_manager->olist is the head of the list and sa_manager->hole can point 
to both the head and any element in the list.


The statement "if (sa_manager->hole->next == &sa_manager->olist)" now 
checks if the next pointer points to the head and if so aborts the function.


Then the statement "sa_bo = list_entry(sa_manager->hole->next, struct 
amdgpu_sa_bo, olist);" returns the next element after the hole to try to 
release it.


So with the automated change the code is still correct, but NOT easier 
to understand because we actually don't grab the first element here.
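
For reference, the two helpers from include/linux/list.h:

#define list_first_entry(ptr, type, member) \
	list_entry((ptr)->next, type, member)

#define list_next_entry(pos, member) \
	list_entry((pos)->member.next, typeof(*(pos)), member)

list_next_entry() needs a typed entry as its starting point, while
sa_manager->hole is a bare struct list_head * that may sit anywhere in the
list, so the original list_entry(hole->next, ...) form is the one that
matches the intent.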


Regards,
Christian.



julia


Regards,
Christian.

Am 18.03.2018 um 22:51 schrieb Arushi Singhal:

This patch replaces list_entry with list_first_entry as it makes the
code more clear.
Done using coccinelle:

@@
expression e;
@@
(
- list_entry(e->next,
+ list_first_entry(e,
...)
|
- list_entry(e->prev,
+ list_last_entry(e,
...)
)

Signed-off-by: Arushi Singhal 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c | 4 ++--
   drivers/gpu/drm/omapdrm/dss/display.c  | 4 ++--
   drivers/gpu/drm/radeon/radeon_sa.c | 4 ++--
   3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
index 3144400..646f593 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
@@ -158,7 +158,7 @@ static void amdgpu_sa_bo_try_free(struct
amdgpu_sa_manager *sa_manager)
	if (sa_manager->hole->next == &sa_manager->olist)
return;
   -sa_bo = list_entry(sa_manager->hole->next, struct amdgpu_sa_bo,
olist);
+   sa_bo = list_first_entry(sa_manager->hole, struct amdgpu_sa_bo,
olist);
	list_for_each_entry_safe_from(sa_bo, tmp, &sa_manager->olist, olist) {
if (sa_bo->fence == NULL ||
!dma_fence_is_signaled(sa_bo->fence)) {
@@ -183,7 +183,7 @@ static inline unsigned amdgpu_sa_bo_hole_eoffset(struct
amdgpu_sa_manager *sa_ma
struct list_head *hole = sa_manager->hole;
	if (hole->next != &sa_manager->olist) {
-   return list_entry(hole->next, struct amdgpu_sa_bo,
olist)->soffset;
+   return list_first_entry(hole, struct amdgpu_sa_bo,
olist)->soffset;
}
return sa_manager->size;
   }
diff --git a/drivers/gpu/drm/omapdrm/dss/display.c
b/drivers/gpu/drm/omapdrm/dss/display.c
index 0c9480b..fb9ecae 100644
--- a/drivers/gpu/drm/omapdrm/dss/display.c
+++ b/drivers/gpu/drm/omapdrm/dss/display.c
@@ -158,8 +158,8 @@ struct omap_dss_device *omap_dss_get_next_device(struct
omap_dss_device *from)
goto out;
}
   -dssdev = list_entry(l->next, struct omap_dss_device,
-   panel_list);
+   dssdev = list_first_entry(l, struct omap_dss_device,
+ panel_list);
omap_dss_get_device(dssdev);
goto out;
}
diff --git a/drivers/gpu/drm/radeon/radeon_sa.c
b/drivers/gpu/drm/radeon/radeon_sa.c
index 197b157..66c0482 100644
--- a/drivers/gpu/drm/radeon/radeon_sa.c
+++ b/drivers/gpu/drm/radeon/radeon_sa.c
@@ -158,7 +158,7 @@ static void radeon_sa_bo_try_free(struct
radeon_sa_manager *sa_manager)
	if (sa_manager->hole->next == &sa_manager->olist)
return;
   -sa_bo = list_entry(sa_manager->hole->next, struct radeon_sa_bo,
olist);
+   sa_bo = list_first_entry(sa_manager->hole, struct radeon_sa_bo,
olist);
	list_for_each_entry_safe_from(sa_bo, tmp, &sa_manager->olist, olist) {
if (sa_bo->fence == NULL ||
!radeon_fence_signaled(sa_bo->fence)) {
return;
@@ -182,7 +182,7 @@ static inline unsigned radeon_sa_bo_hole_eoffset(struct
radeon_sa_manager *sa_ma
struct list_head *hole = sa_manager->hole;
	if (hole->next != &sa_manager->olist) {
-   return list_entry(hole->next, struct radeon_sa_bo,
olist)->soffset;
+   return list_first_entry(hole, struct radeon_sa_bo,

[PATCH] drm/amdgpu: re-validate per VM BOs if required

2018-03-19 Thread Christian König
If a per-VM BO ends up in an allowed domain, it never moves back into the
preferred domain.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 24474294c92a..e8b515dd032c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1770,14 +1770,16 @@ int amdgpu_vm_handle_moved(struct amdgpu_device *adev,
 
	spin_lock(&vm->status_lock);
	while (!list_empty(&vm->moved)) {
-   struct amdgpu_bo_va *bo_va;
struct reservation_object *resv;
+   struct amdgpu_bo_va *bo_va;
+   struct amdgpu_bo *bo;
 
		bo_va = list_first_entry(&vm->moved,
struct amdgpu_bo_va, base.vm_status);
		spin_unlock(&vm->status_lock);
 
-   resv = bo_va->base.bo->tbo.resv;
+   bo = bo_va->base.bo;
+   resv = bo->tbo.resv;
 
/* Per VM BOs never need to bo cleared in the page tables */
if (resv == vm->root.base.bo->tbo.resv)
@@ -1797,6 +1799,15 @@ int amdgpu_vm_handle_moved(struct amdgpu_device *adev,
reservation_object_unlock(resv);
 
		spin_lock(&vm->status_lock);
+
+   /* If the BO prefers to be in VRAM, but currently isn't, add it
+* back to the evicted list so that it gets validated again on
+* the next command submission.
+*/
+   if (resv == vm->root.base.bo->tbo.resv &&
+   bo->preferred_domains == AMDGPU_GEM_DOMAIN_VRAM &&
+   bo->tbo.mem.mem_type != TTM_PL_VRAM)
+   list_add_tail(&bo_va->base.vm_status, &vm->evicted);
}
	spin_unlock(&vm->status_lock);
 
-- 
2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/2] drm/amdgpu: validate fallback BOs

2018-03-19 Thread Christian König

Am 19.03.2018 um 11:00 schrieb Chunming Zhou:

issue:
Game F1 performance drops 13% when per vm bo is enabled.

root cause:
if some BOs fall back to the allowed domain, they will never be validated
if no eviction happens;
that means they stay in the allowed domain forever.

Fix:
maintain a per-VM list of BOs in the allowed domain, then try to validate
them with the preferred domain.


The idea sounds good, but I find the implementation a bit overkill.

Give me a second to hack something together,
Christian.



Change-Id: I4335470bf867b46ac93c8e2531eac3f8ba9ac2da
Signed-off-by: Chunming Zhou 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 15 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 49 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  7 -
  3 files changed, 63 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 383bf2d31c92..7509b6bd2047 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -359,7 +359,7 @@ void amdgpu_cs_report_moved_bytes(struct amdgpu_device 
*adev, u64 num_bytes,
  }
  
  static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p,

-struct amdgpu_bo *bo)
+struct amdgpu_bo *bo, bool *allowed)
  {
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
struct ttm_operation_ctx ctx = {
@@ -374,6 +374,8 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p,
if (bo->pin_count)
return 0;
  
+	if (allowed)

+   *allowed = false;
/* Don't move this buffer if we have depleted our allowance
 * to move it. Don't move anything if the threshold is zero.
 */
@@ -396,6 +398,9 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p,
}
  
  retry:

+   if (domain != bo->preferred_domains && domain == bo->allowed_domains &&
+   allowed)
+   *allowed = true;
amdgpu_ttm_placement_from_domain(bo, domain);
	r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
  
@@ -479,19 +484,19 @@ static bool amdgpu_cs_try_evict(struct amdgpu_cs_parser *p,

return false;
  }
  
-static int amdgpu_cs_validate(void *param, struct amdgpu_bo *bo)

+static int amdgpu_cs_validate(void *param, struct amdgpu_bo *bo, bool *allowed)
  {
struct amdgpu_cs_parser *p = param;
int r;
  
  	do {

-   r = amdgpu_cs_bo_validate(p, bo);
+   r = amdgpu_cs_bo_validate(p, bo, allowed);
} while (r == -ENOMEM && amdgpu_cs_try_evict(p, bo));
if (r)
return r;
  
  	if (bo->shadow)

-   r = amdgpu_cs_bo_validate(p, bo->shadow);
+   r = amdgpu_cs_bo_validate(p, bo->shadow, NULL);
  
  	return r;

  }
@@ -528,7 +533,7 @@ static int amdgpu_cs_list_validate(struct amdgpu_cs_parser 
*p,
if (p->evictable == lobj)
p->evictable = NULL;
  
-		r = amdgpu_cs_validate(p, bo);

+   r = amdgpu_cs_validate(p, bo, NULL);
if (r)
return r;
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

index e9a41dd05345..365e8dca05f5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -186,6 +186,35 @@ void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
	list_add(&entry->tv.head, validated);
  }
  
+static int amdgpu_vm_try_validate_allowed(struct amdgpu_device *adev,

+ struct amdgpu_vm *vm,
+ int (*validate)(void *p,
+ struct amdgpu_bo *bo,
+ bool *allowed),
+ void *param)
+{
+   struct amdgpu_vm_bo_base *bo_base, *tmp;
+   int r;
+   bool allowed;
+
+   spin_lock(&vm->status_lock);
+   list_for_each_entry_safe(bo_base, tmp, &vm->allowed_domain,
+                            allowed_domain_list) {
+           spin_unlock(&vm->status_lock);
+           r = validate(param, bo_base->bo, &allowed);
+           if (r)
+                   return r;
+           spin_lock(&vm->status_lock);
+           if (!allowed)
+                   list_del_init(&bo_base->allowed_domain_list);
+           if (bo_base->bo->tbo.type != ttm_bo_type_kernel)
+                   list_move(&bo_base->vm_status, &vm->moved);
+           else
+                   list_move(&bo_base->vm_status, &vm->relocated);
+   }
+   spin_unlock(&vm->status_lock);
+   return 0;
+}
  /**
   * amdgpu_vm_validate_pt_bos - validate the page table BOs
   *
@@ -197,16 +226,19 @@ void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
   * Validate the page table BOs on command submission if neccessary.
   */
  int 

Re: [PATCH 1/2] drm/amdgpu: Don't change preferred domian when fallback GTT v3

2018-03-19 Thread Christian König

Am 19.03.2018 um 11:00 schrieb Chunming Zhou:

v2: add sanity checking
v3: make code open

Change-Id: I2cf672ad36b8b4cc1a6b2e704f786bf6a155d9ce
Signed-off-by: Chunming Zhou 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c|  5 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 16 
  2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 6e6570ff9f8b..660f5e44b1a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -85,11 +85,6 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, 
unsigned long size,
flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
goto retry;
}
-
-   if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
-   initial_domain |= AMDGPU_GEM_DOMAIN_GTT;
-   goto retry;
-   }
DRM_DEBUG("Failed to allocate GEM object (%ld, %d, %u, 
%d)\n",
  size, initial_domain, alignment, r);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index b3310219e0ac..e167e98c746c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -440,13 +440,21 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
  #endif
  
	bo->tbo.bdev = &adev->mman.bdev;

-   amdgpu_ttm_placement_from_domain(bo, domain);
-
+   amdgpu_ttm_placement_from_domain(bo, bo->preferred_domains);
	r = ttm_bo_init_reserved(&adev->mman.bdev, &bo->tbo, size, type,
				 &bo->placement, page_align, &ctx, acc_size,
				 sg, resv, &amdgpu_ttm_bo_destroy);
-   if (unlikely(r != 0))
-   return r;
+
+   if (unlikely(r && r != -ERESTARTSYS) && bo->allowed_domains !=
+   bo->preferred_domains) {
+   amdgpu_ttm_placement_from_domain(bo, bo->allowed_domains);
+           r = ttm_bo_init_reserved(&adev->mman.bdev, &bo->tbo, size, type,
+                                    &bo->placement, page_align, &ctx,
+                                    acc_size, sg, resv,
+                                    &amdgpu_ttm_bo_destroy);
+   if (unlikely(r != 0))
+   return r;
+   }
  
  	if (adev->gmc.visible_vram_size < adev->gmc.real_vram_size &&

bo->tbo.mem.mem_type == TTM_PL_VRAM &&


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 2/2] drm/amdgpu: validate fallback BOs

2018-03-19 Thread Chunming Zhou
issue:
Game F1 performance drops 13% when per vm bo is enabled.

root cause:
if some BOs fall back to the allowed domain, they will never be validated
if no eviction happens;
that means they stay in the allowed domain forever.

Fix:
maintain a per-VM list of BOs in the allowed domain, then try to validate
them with the preferred domain.

Change-Id: I4335470bf867b46ac93c8e2531eac3f8ba9ac2da
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 15 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 49 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  7 -
 3 files changed, 63 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 383bf2d31c92..7509b6bd2047 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -359,7 +359,7 @@ void amdgpu_cs_report_moved_bytes(struct amdgpu_device 
*adev, u64 num_bytes,
 }
 
 static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p,
-struct amdgpu_bo *bo)
+struct amdgpu_bo *bo, bool *allowed)
 {
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
struct ttm_operation_ctx ctx = {
@@ -374,6 +374,8 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p,
if (bo->pin_count)
return 0;
 
+   if (allowed)
+   *allowed = false;
/* Don't move this buffer if we have depleted our allowance
 * to move it. Don't move anything if the threshold is zero.
 */
@@ -396,6 +398,9 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p,
}
 
 retry:
+   if (domain != bo->preferred_domains && domain == bo->allowed_domains &&
+   allowed)
+   *allowed = true;
amdgpu_ttm_placement_from_domain(bo, domain);
	r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
 
@@ -479,19 +484,19 @@ static bool amdgpu_cs_try_evict(struct amdgpu_cs_parser 
*p,
return false;
 }
 
-static int amdgpu_cs_validate(void *param, struct amdgpu_bo *bo)
+static int amdgpu_cs_validate(void *param, struct amdgpu_bo *bo, bool *allowed)
 {
struct amdgpu_cs_parser *p = param;
int r;
 
do {
-   r = amdgpu_cs_bo_validate(p, bo);
+   r = amdgpu_cs_bo_validate(p, bo, allowed);
} while (r == -ENOMEM && amdgpu_cs_try_evict(p, bo));
if (r)
return r;
 
if (bo->shadow)
-   r = amdgpu_cs_bo_validate(p, bo->shadow);
+   r = amdgpu_cs_bo_validate(p, bo->shadow, NULL);
 
return r;
 }
@@ -528,7 +533,7 @@ static int amdgpu_cs_list_validate(struct amdgpu_cs_parser 
*p,
if (p->evictable == lobj)
p->evictable = NULL;
 
-   r = amdgpu_cs_validate(p, bo);
+   r = amdgpu_cs_validate(p, bo, NULL);
if (r)
return r;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index e9a41dd05345..365e8dca05f5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -186,6 +186,35 @@ void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
	list_add(&entry->tv.head, validated);
 }
 
+static int amdgpu_vm_try_validate_allowed(struct amdgpu_device *adev,
+ struct amdgpu_vm *vm,
+ int (*validate)(void *p,
+ struct amdgpu_bo *bo,
+ bool *allowed),
+ void *param)
+{
+   struct amdgpu_vm_bo_base *bo_base, *tmp;
+   int r;
+   bool allowed;
+
+   spin_lock(&vm->status_lock);
+   list_for_each_entry_safe(bo_base, tmp, &vm->allowed_domain,
+                            allowed_domain_list) {
+           spin_unlock(&vm->status_lock);
+           r = validate(param, bo_base->bo, &allowed);
+           if (r)
+                   return r;
+           spin_lock(&vm->status_lock);
+           if (!allowed)
+                   list_del_init(&bo_base->allowed_domain_list);
+           if (bo_base->bo->tbo.type != ttm_bo_type_kernel)
+                   list_move(&bo_base->vm_status, &vm->moved);
+           else
+                   list_move(&bo_base->vm_status, &vm->relocated);
+   }
+   spin_unlock(&vm->status_lock);
+   return 0;
+}
 /**
  * amdgpu_vm_validate_pt_bos - validate the page table BOs
  *
@@ -197,16 +226,19 @@ void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
  * Validate the page table BOs on command submission if neccessary.
  */
 int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm,
- int (*validate)(void *p, struct amdgpu_bo *bo),
+ int 

[PATCH 1/2] drm/amdgpu: Don't change preferred domian when fallback GTT v3

2018-03-19 Thread Chunming Zhou
v2: add sanity checking
v3: make code open

Change-Id: I2cf672ad36b8b4cc1a6b2e704f786bf6a155d9ce
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c|  5 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 16 
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 6e6570ff9f8b..660f5e44b1a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -85,11 +85,6 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, 
unsigned long size,
flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
goto retry;
}
-
-   if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
-   initial_domain |= AMDGPU_GEM_DOMAIN_GTT;
-   goto retry;
-   }
DRM_DEBUG("Failed to allocate GEM object (%ld, %d, %u, 
%d)\n",
  size, initial_domain, alignment, r);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index b3310219e0ac..e167e98c746c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -440,13 +440,21 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
 #endif
 
	bo->tbo.bdev = &adev->mman.bdev;
-   amdgpu_ttm_placement_from_domain(bo, domain);
-
+   amdgpu_ttm_placement_from_domain(bo, bo->preferred_domains);
	r = ttm_bo_init_reserved(&adev->mman.bdev, &bo->tbo, size, type,
				 &bo->placement, page_align, &ctx, acc_size,
				 sg, resv, &amdgpu_ttm_bo_destroy);
-   if (unlikely(r != 0))
-   return r;
+
+   if (unlikely(r && r != -ERESTARTSYS) && bo->allowed_domains !=
+   bo->preferred_domains) {
+   amdgpu_ttm_placement_from_domain(bo, bo->allowed_domains);
+           r = ttm_bo_init_reserved(&adev->mman.bdev, &bo->tbo, size, type,
+                                    &bo->placement, page_align, &ctx,
+                                    acc_size, sg, resv,
+                                    &amdgpu_ttm_bo_destroy);
+   if (unlikely(r != 0))
+   return r;
+   }
 
if (adev->gmc.visible_vram_size < adev->gmc.real_vram_size &&
bo->tbo.mem.mem_type == TTM_PL_VRAM &&
-- 
2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/2] drm/amdgpu: Don't change preferred domian when fallback GTT v2

2018-03-19 Thread Christian König

Am 19.03.2018 um 09:19 schrieb Chunming Zhou:

v2: add sanity checking

Change-Id: I2cf672ad36b8b4cc1a6b2e704f786bf6a155d9ce
Signed-off-by: Chunming Zhou 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c|  5 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 15 +--
  2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 6e6570ff9f8b..660f5e44b1a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -85,11 +85,6 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, 
unsigned long size,
flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
goto retry;
}
-
-   if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
-   initial_domain |= AMDGPU_GEM_DOMAIN_GTT;
-   goto retry;
-   }
DRM_DEBUG("Failed to allocate GEM object (%ld, %d, %u, 
%d)\n",
  size, initial_domain, alignment, r);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index b3310219e0ac..6e714c015a79 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -371,6 +371,7 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
enum ttm_bo_type type;
unsigned long page_align;
size_t acc_size;
+   u32 setting_domain;
int r;
  
  	page_align = roundup(byte_align, PAGE_SIZE) >> PAGE_SHIFT;

@@ -440,13 +441,23 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
  #endif
  
	bo->tbo.bdev = &adev->mman.bdev;

-   amdgpu_ttm_placement_from_domain(bo, domain);
+   setting_domain = bo->preferred_domains;
+retry:
+   amdgpu_ttm_placement_from_domain(bo, setting_domain);
  
	r = ttm_bo_init_reserved(&adev->mman.bdev, &bo->tbo, size, type,
				 &bo->placement, page_align, &ctx, acc_size,
				 sg, resv, &amdgpu_ttm_bo_destroy);
-   if (unlikely(r != 0))
+   if (unlikely(r != 0)) {
+   if (r != -ERESTARTSYS &&
+   (ctx.flags & TTM_OPT_FLAG_ALLOW_RES_EVICT) == 0 &&
+   setting_domain != bo->allowed_domains &&
+   type == ttm_bo_type_device){
+   setting_domain = bo->allowed_domains;
+   goto retry;
+   }
return r;
+   }


That still doesn't look good to me, please just open code it as I 
suggested.


In addition to that, why do you check TTM_OPT_FLAG_ALLOW_RES_EVICT here? 
That doesn't seem to make sense.


Regards,
Christian.

  
  	if (adev->gmc.visible_vram_size < adev->gmc.real_vram_size &&

bo->tbo.mem.mem_type == TTM_PL_VRAM &&


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: disable job timeout on GPU reset disabled

2018-03-19 Thread Christian König

Am 19.03.2018 um 07:08 schrieb Evan Quan:

Under some heavy compute environments (dgemm test), it takes
the ASIC over 10 seconds to finish a single dispatched job,
which will trigger the timeout. It's quite confusing although it
does not seem to cause any real problems.
As a quick workaround, we choose to disable the timeout when GPU reset
is disabled.


NAK, I enabled those warnings intentionally, even when GPU recovery is 
disabled, to have a hint in the logs about what goes wrong.


Please only increase the timeout for the compute queue and/or add a 
separate timeout for them.
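
A hedged sketch of the second option (the parameter name and default below
are invented for illustration, this is not an existing amdgpu option):

#include <linux/module.h>

/* Separate, much larger job timeout for compute rings only. */
static int amdgpu_compute_timeout_ms = 60000;
module_param_named(compute_timeout, amdgpu_compute_timeout_ms, int, 0444);
MODULE_PARM_DESC(compute_timeout,
		 "Job timeout for compute rings in milliseconds (default 60000)");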


Regards,
Christian.




Change-Id: I3a95d856ba4993094dc7b6269649e470c5b053d2
Signed-off-by: Evan Quan 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 8bd9c3f..9d6a775 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -861,6 +861,13 @@ static void amdgpu_device_check_arguments(struct 
amdgpu_device *adev)
amdgpu_lockup_timeout = 1;
}
  
+	/*

+* Disable timeout when GPU reset is disabled to avoid confusing
+* timeout messages in the kernel log.
+*/
+   if (amdgpu_gpu_recovery == 0 || amdgpu_gpu_recovery == -1)
+   amdgpu_lockup_timeout = INT_MAX;
+
adev->firmware.load_type = amdgpu_ucode_get_load_type(adev, 
amdgpu_fw_load_type);
  }
  


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] gpu: drm: Use list_first_entry instead of list_entry

2018-03-19 Thread Christian König
Mhm, actually that patch isn't correct. What we grab here is the 
next entry, not the first one.


Don't we have an alias list_next_entry for list_first_entry?

Regards,
Christian.

Am 18.03.2018 um 22:51 schrieb Arushi Singhal:

This patch replaces list_entry with list_first_entry as it makes the
code more clear.
Done using coccinelle:

@@
expression e;
@@
(
- list_entry(e->next,
+ list_first_entry(e,
   ...)
|
- list_entry(e->prev,
+ list_last_entry(e,
   ...)
)

Signed-off-by: Arushi Singhal 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c | 4 ++--
  drivers/gpu/drm/omapdrm/dss/display.c  | 4 ++--
  drivers/gpu/drm/radeon/radeon_sa.c | 4 ++--
  3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
index 3144400..646f593 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
@@ -158,7 +158,7 @@ static void amdgpu_sa_bo_try_free(struct amdgpu_sa_manager 
*sa_manager)
	if (sa_manager->hole->next == &sa_manager->olist)
return;
  
-	sa_bo = list_entry(sa_manager->hole->next, struct amdgpu_sa_bo, olist);

+   sa_bo = list_first_entry(sa_manager->hole, struct amdgpu_sa_bo, olist);
	list_for_each_entry_safe_from(sa_bo, tmp, &sa_manager->olist, olist) {
if (sa_bo->fence == NULL ||
!dma_fence_is_signaled(sa_bo->fence)) {
@@ -183,7 +183,7 @@ static inline unsigned amdgpu_sa_bo_hole_eoffset(struct 
amdgpu_sa_manager *sa_ma
struct list_head *hole = sa_manager->hole;
  
	if (hole->next != &sa_manager->olist) {

-   return list_entry(hole->next, struct amdgpu_sa_bo, 
olist)->soffset;
+   return list_first_entry(hole, struct amdgpu_sa_bo, 
olist)->soffset;
}
return sa_manager->size;
  }
diff --git a/drivers/gpu/drm/omapdrm/dss/display.c 
b/drivers/gpu/drm/omapdrm/dss/display.c
index 0c9480b..fb9ecae 100644
--- a/drivers/gpu/drm/omapdrm/dss/display.c
+++ b/drivers/gpu/drm/omapdrm/dss/display.c
@@ -158,8 +158,8 @@ struct omap_dss_device *omap_dss_get_next_device(struct 
omap_dss_device *from)
goto out;
}
  
-			dssdev = list_entry(l->next, struct omap_dss_device,

-   panel_list);
+   dssdev = list_first_entry(l, struct omap_dss_device,
+ panel_list);
omap_dss_get_device(dssdev);
goto out;
}
diff --git a/drivers/gpu/drm/radeon/radeon_sa.c 
b/drivers/gpu/drm/radeon/radeon_sa.c
index 197b157..66c0482 100644
--- a/drivers/gpu/drm/radeon/radeon_sa.c
+++ b/drivers/gpu/drm/radeon/radeon_sa.c
@@ -158,7 +158,7 @@ static void radeon_sa_bo_try_free(struct radeon_sa_manager 
*sa_manager)
	if (sa_manager->hole->next == &sa_manager->olist)
return;
  
-	sa_bo = list_entry(sa_manager->hole->next, struct radeon_sa_bo, olist);

+   sa_bo = list_first_entry(sa_manager->hole, struct radeon_sa_bo, olist);
	list_for_each_entry_safe_from(sa_bo, tmp, &sa_manager->olist, olist) {
if (sa_bo->fence == NULL || 
!radeon_fence_signaled(sa_bo->fence)) {
return;
@@ -182,7 +182,7 @@ static inline unsigned radeon_sa_bo_hole_eoffset(struct 
radeon_sa_manager *sa_ma
struct list_head *hole = sa_manager->hole;
  
	if (hole->next != &sa_manager->olist) {

-   return list_entry(hole->next, struct radeon_sa_bo, 
olist)->soffset;
+   return list_first_entry(hole, struct radeon_sa_bo, 
olist)->soffset;
}
return sa_manager->size;
  }


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu/nbio6: Correct PCIE_INDEX/DATA pair used for smn register accessing

2018-03-19 Thread Hawking Zhang
The PCIE_INDEX2/DATA2 pair will be used for smn register access starting from Vega.
The PCIE_INDEX/DATA pair should be reserved for the SMU.

Change-Id: Ie597d89001e706225521c94161d2b40443ec3c48
Signed-off-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c 
b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
index 1cf3424..6f9c549 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
@@ -220,12 +220,12 @@ static u32 nbio_v6_1_get_hdp_flush_done_offset(struct 
amdgpu_device *adev)
 
 static u32 nbio_v6_1_get_pcie_index_offset(struct amdgpu_device *adev)
 {
-   return SOC15_REG_OFFSET(NBIO, 0, mmPCIE_INDEX);
+   return SOC15_REG_OFFSET(NBIO, 0, mmPCIE_INDEX2);
 }
 
 static u32 nbio_v6_1_get_pcie_data_offset(struct amdgpu_device *adev)
 {
-   return SOC15_REG_OFFSET(NBIO, 0, mmPCIE_DATA);
+   return SOC15_REG_OFFSET(NBIO, 0, mmPCIE_DATA2);
 }
 
 static const struct nbio_hdp_flush_reg nbio_v6_1_hdp_flush_reg = {
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 1/2] drm/amdgpu: Don't change preferred domian when fallback GTT v2

2018-03-19 Thread Chunming Zhou
v2: add sanity checking

Change-Id: I2cf672ad36b8b4cc1a6b2e704f786bf6a155d9ce
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c|  5 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 15 +--
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 6e6570ff9f8b..660f5e44b1a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -85,11 +85,6 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, 
unsigned long size,
flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
goto retry;
}
-
-   if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
-   initial_domain |= AMDGPU_GEM_DOMAIN_GTT;
-   goto retry;
-   }
DRM_DEBUG("Failed to allocate GEM object (%ld, %d, %u, 
%d)\n",
  size, initial_domain, alignment, r);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index b3310219e0ac..6e714c015a79 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -371,6 +371,7 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
enum ttm_bo_type type;
unsigned long page_align;
size_t acc_size;
+   u32 setting_domain;
int r;
 
page_align = roundup(byte_align, PAGE_SIZE) >> PAGE_SHIFT;
@@ -440,13 +441,23 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
 #endif
 
 	bo->tbo.bdev = &adev->mman.bdev;
-	amdgpu_ttm_placement_from_domain(bo, domain);
+	setting_domain = bo->preferred_domains;
+retry:
+	amdgpu_ttm_placement_from_domain(bo, setting_domain);
 
 	r = ttm_bo_init_reserved(&adev->mman.bdev, &bo->tbo, size, type,
 				 &bo->placement, page_align, &ctx, acc_size,
 				 sg, resv, &amdgpu_ttm_bo_destroy);
-   if (unlikely(r != 0))
+   if (unlikely(r != 0)) {
+   if (r != -ERESTARTSYS &&
+   (ctx.flags & TTM_OPT_FLAG_ALLOW_RES_EVICT) == 0 &&
+   setting_domain != bo->allowed_domains &&
+	    type == ttm_bo_type_device) {
+   setting_domain = bo->allowed_domains;
+   goto retry;
+   }
return r;
+   }
 
if (adev->gmc.visible_vram_size < adev->gmc.real_vram_size &&
bo->tbo.mem.mem_type == TTM_PL_VRAM &&
-- 
2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
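
A minimal self-contained sketch of the fallback policy in the patch
above, assuming hypothetical place_bo()/create_bo() helpers: creation is
retried once with the wider allowed_domains set, and preferred_domains
itself is never modified, so later validation still targets what the
caller originally asked for:

#include <stdio.h>

#define DOMAIN_VRAM 0x1
#define DOMAIN_GTT  0x2

struct bo { unsigned int preferred_domains, allowed_domains; };

/* hypothetical backend: pretend VRAM is exhausted */
static int place_bo(struct bo *bo, unsigned int domain)
{
	(void)bo;
	return (domain & DOMAIN_GTT) ? 0 : -12 /* -ENOMEM */;
}

static int create_bo(struct bo *bo)
{
	unsigned int domain = bo->preferred_domains;
	int r;

retry:
	r = place_bo(bo, domain);
	if (r && domain != bo->allowed_domains) {
		domain = bo->allowed_domains;	/* fall back exactly once */
		goto retry;
	}
	return r;	/* bo->preferred_domains was left untouched */
}

int main(void)
{
	struct bo bo = { DOMAIN_VRAM, DOMAIN_VRAM | DOMAIN_GTT };

	printf("create: %d, preferred still 0x%x\n",
	       create_bo(&bo), bo.preferred_domains);
	return 0;
}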


[PATCH] drm/amd/amdgpu: pass i2s instance value as platform data

2018-03-19 Thread Vijendar Mukunda
The i2s instance value is passed as platform data to the DWC driver.
This parameter is useful for distinguishing the current i2s instance
when multiple i2s controller instances are created.

Signed-off-by: Vijendar Mukunda 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c
index 6cca4d1..61d6cb9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c
@@ -83,6 +83,8 @@
 #define ACP_TIMEOUT_LOOP   0x00FF
 #define ACP_DEVS   4
 #define ACP_SRC_ID 162
+#define I2S_SP_INSTANCE1
+#define I2S_BT_INSTANCE2
 
 enum {
ACP_TILE_P1 = 0,
@@ -347,6 +349,7 @@ static int acp_hw_init(void *handle)
i2s_pdata[0].snd_rates = SNDRV_PCM_RATE_8000_96000;
i2s_pdata[0].i2s_reg_comp1 = ACP_I2S_COMP1_PLAY_REG_OFFSET;
i2s_pdata[0].i2s_reg_comp2 = ACP_I2S_COMP2_PLAY_REG_OFFSET;
+   i2s_pdata[0].i2s_instance = I2S_SP_INSTANCE;
switch (adev->asic_type) {
case CHIP_STONEY:
i2s_pdata[1].quirks = DW_I2S_QUIRK_COMP_REG_OFFSET |
@@ -362,6 +365,7 @@ static int acp_hw_init(void *handle)
i2s_pdata[1].snd_rates = SNDRV_PCM_RATE_8000_96000;
i2s_pdata[1].i2s_reg_comp1 = ACP_I2S_COMP1_CAP_REG_OFFSET;
i2s_pdata[1].i2s_reg_comp2 = ACP_I2S_COMP2_CAP_REG_OFFSET;
+   i2s_pdata[1].i2s_instance = I2S_SP_INSTANCE;
 
i2s_pdata[2].quirks = DW_I2S_QUIRK_COMP_REG_OFFSET;
switch (adev->asic_type) {
@@ -376,6 +380,7 @@ static int acp_hw_init(void *handle)
i2s_pdata[2].snd_rates = SNDRV_PCM_RATE_8000_96000;
i2s_pdata[2].i2s_reg_comp1 = ACP_BT_COMP1_REG_OFFSET;
i2s_pdata[2].i2s_reg_comp2 = ACP_BT_COMP2_REG_OFFSET;
+   i2s_pdata[2].i2s_instance = I2S_BT_INSTANCE;
 
adev->acp.acp_res[0].name = "acp2x_dma";
adev->acp.acp_res[0].flags = IORESOURCE_MEM;
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
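
A hypothetical consumer-side illustration (the real DWC driver code is
not shown in this thread; only the i2s_instance field and the two
instance values come from the patch above) of how a driver could tell
the SP and BT controller instances apart:

#include <stdio.h>

#define I2S_SP_INSTANCE 1
#define I2S_BT_INSTANCE 2

struct i2s_platform_data { int i2s_instance; };

static const char *i2s_instance_name(const struct i2s_platform_data *pdata)
{
	switch (pdata->i2s_instance) {
	case I2S_SP_INSTANCE:
		return "SP instance";
	case I2S_BT_INSTANCE:
		return "BT instance";
	default:
		return "unknown instance";
	}
}

int main(void)
{
	struct i2s_platform_data sp = { I2S_SP_INSTANCE };
	struct i2s_platform_data bt = { I2S_BT_INSTANCE };

	printf("%s\n%s\n", i2s_instance_name(&sp), i2s_instance_name(&bt));
	return 0;
}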


RE: [PATCH] drm/amdgpu: disable job timeout when GPU reset is disabled

2018-03-19 Thread Quan, Evan
Hi Alex,

INT_MAX is used instead of MAX_SCHEDULE_TIMEOUT (which we discussed in
another mail thread) because amdgpu_lockup_timeout has data type int.
Using MAX_SCHEDULE_TIMEOUT (data type: long) would produce compile warnings.

Regards,
Evan
-Original Message-
From: Evan Quan [mailto:evan.q...@amd.com] 
Sent: Monday, March 19, 2018 2:08 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Quan, Evan 

Subject: [PATCH] drm/amdgpu: disable job timeout when GPU reset is disabled

Under some heavy compute workloads (a dgemm test, for example), the ASIC
can take more than 10 seconds to finish a single dispatched job, which
triggers the timeout. The resulting messages are confusing, although
they do not seem to indicate any real problem.
As a quick workaround, we choose to disable the timeout when GPU reset
is disabled.

Change-Id: I3a95d856ba4993094dc7b6269649e470c5b053d2
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 8bd9c3f..9d6a775 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -861,6 +861,13 @@ static void amdgpu_device_check_arguments(struct amdgpu_device *adev)
amdgpu_lockup_timeout = 1;
}
 
+   /*
+* Disable timeout when GPU reset is disabled to avoid confusing
+* timeout messages in the kernel log.
+*/
+   if (amdgpu_gpu_recovery == 0 || amdgpu_gpu_recovery == -1)
+   amdgpu_lockup_timeout = INT_MAX;
+
 	adev->firmware.load_type = amdgpu_ucode_get_load_type(adev, amdgpu_fw_load_type);
 }
 
--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
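
A small self-contained illustration of the type concern discussed above:
MAX_SCHEDULE_TIMEOUT is defined as LONG_MAX in the kernel, so assigning
it to an int variable truncates on 64-bit targets and draws conversion
warnings, while INT_MAX fits an int exactly:

#include <limits.h>
#include <stdio.h>

int main(void)
{
	int timeout;

	timeout = INT_MAX;	/* fits exactly: no warning, no truncation */
	printf("timeout = %d\n", timeout);

	/*
	 * timeout = LONG_MAX;
	 * On an LP64 target this would truncate the value and typically
	 * trigger an overflow/conversion warning, which is why INT_MAX
	 * was chosen for the int variable amdgpu_lockup_timeout.
	 */
	return 0;
}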


[PATCH] drm/amdgpu: disable job timeout when GPU reset is disabled

2018-03-19 Thread Evan Quan
Under some heavy compute workloads (a dgemm test, for example), the ASIC
can take more than 10 seconds to finish a single dispatched job, which
triggers the timeout. The resulting messages are confusing, although
they do not seem to indicate any real problem.
As a quick workaround, we choose to disable the timeout when GPU reset
is disabled.

Change-Id: I3a95d856ba4993094dc7b6269649e470c5b053d2
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 8bd9c3f..9d6a775 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -861,6 +861,13 @@ static void amdgpu_device_check_arguments(struct amdgpu_device *adev)
amdgpu_lockup_timeout = 1;
}
 
+   /*
+* Disable timeout when GPU reset is disabled to avoid confusing
+* timeout messages in the kernel log.
+*/
+   if (amdgpu_gpu_recovery == 0 || amdgpu_gpu_recovery == -1)
+   amdgpu_lockup_timeout = INT_MAX;
+
 	adev->firmware.load_type = amdgpu_ucode_get_load_type(adev, amdgpu_fw_load_type);
 }
 
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
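
As a usage note (module parameter names as exposed by the driver at the
time; behavior as described by the patch above): loading the driver with
GPU recovery explicitly disabled, e.g.

    modprobe amdgpu gpu_recovery=0

or leaving gpu_recovery at its -1 auto default would, with this change
applied, also force amdgpu_lockup_timeout to INT_MAX, effectively
disabling the job-timeout messages.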