Which branch to test latest support for Raven Ridge?

2018-04-13 Thread Bráulio Bhavamitra
Hi all,

I've been testing and trying to mitigate the Raven Ridge crashes. I've just
compiled the kernel from https://cgit.freedesktop.org/~agd5f/linux/ at branch
drm-next-4.18-wip to see if it would have the necessary fix for the video
hangs/freezes. Unfortunately, it still crashes in the same situations.

Which branch should I compile for the latest Raven Ridge support?

Cheers,
Bráulio

---------- Forwarded message ---------
From: Bráulio Bhavamitra 
Date: Fri, Apr 13, 2018 at 7:24 PM
Subject: Re: Raven Ridge Ryzen 2500U hang reproduced
To: 


It ALWAYS crashes on shader15 of
http://www.graphicsfuzz.com/benchmark/android-v1.html.

Also reported at https://bugzilla.redhat.com/show_bug.cgi?id=1562530

Using kernel 4.16 with options rcu_nocbs=0-15 and amdgpu.dpm=0

Cheers,
Bráulio


On Mon, Mar 26, 2018 at 8:30 PM Bráulio Bhavamitra 
wrote:

> Hi all,
>
> Following the random crashes happening with many users (e.g.
> https://www.phoronix.com/scan.php?page=news_item=Raven-Ridge-March-Update),
> not only on Linux but also Windows, I've been struggling to reproduce and
> generate any error log.
>
> After discovering that the error only happened with KDE and games (at
> least for me, see https://bugs.kde.org/show_bug.cgi?id=392378), I could
> reproduce it after a failing suspend.
>
> The crash usually still allows the mouse to keep moving, but nothing
> else responds. Except this time the keyboard worked, so I could switch
> ttys and save the dmesg messages. After this I had to force a reboot as it got
> stuck trying to kill the lightdm service (GPU hang?).
>
> The errors are as follows (the full dmesg is attached):
> [ 2899.525650] amdgpu :03:00.0: couldn't schedule ib on ring 
> [ 2899.525769] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs
> (-22)
> [ 2909.125047] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
> timeout, last signaled seq=174624, last emitted seq=174627
> [ 2909.125060] [drm] IP block:psp is hung!
> [ 2909.125063] [drm] GPU recovery disabled.
> [ 2914.756931] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR*
> amdgpu_cs_list_validate(validated) failed.
> [ 2914.756997] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process
> the buffer list -16!
> [ 2914.997372] amdgpu :03:00.0: couldn't schedule ib on ring 
> [ 2914.997498] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs
> (-22)
> [ 2930.117275] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR*
> amdgpu_cs_list_validate(validated) failed.
> [ 2930.117405] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process
> the buffer list -16!
> [ 2930.152015] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear
> memory with ring turned off.
> [ 2930.157940] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear
> memory with ring turned off.
> [ 2930.180535] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear
> memory with ring turned off.
> [ 2933.781692] IPv6: ADDRCONF(NETDEV_CHANGE): wlp2s0: link becomes ready
> [ 2945.477205] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR*
> amdgpu_cs_list_validate(validated) failed.
> [ 2945.477348] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process
> the buffer list -16!
>
> System details:
> HP Envy x360 Ryzen 2500U
> ArchLinux, kernel 4.16rc6 and 4.15.12
>
> Cheers,
> bráulio
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCHv3] drm/amdkfd: Remove vla

2018-04-13 Thread Laura Abbott

There's an ongoing effort to remove VLAs[1] from the kernel to eventually
turn on -Wvla. Switch to a constant value that covers all hardware.

[1] https://lkml.org/lkml/2018/3/7/621

Reviewed-by: Felix Kuehling 
Acked-by: Christian König 
Signed-off-by: Laura Abbott 
---
v3: Introduced a #define for the max value, switched to pr_err_once to
avoid log flood, switched to sizeof(array) per private suggestion.
---
 drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c | 8 +++++---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h      | 2 ++
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
index 035c351f47c5..db6d9336b80d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
@@ -139,10 +139,12 @@ static void interrupt_wq(struct work_struct *work)
 {
struct kfd_dev *dev = container_of(work, struct kfd_dev,
interrupt_work);
+   uint32_t ih_ring_entry[KFD_MAX_RING_ENTRY_SIZE];
 
-   uint32_t ih_ring_entry[DIV_ROUND_UP(
-   dev->device_info->ih_ring_entry_size,
-   sizeof(uint32_t))];
+   if (dev->device_info->ih_ring_entry_size > sizeof(ih_ring_entry)) {
+   dev_err_once(kfd_chardev(), "Ring entry too small\n");
+   return;
+   }
 
while (dequeue_ih_ring_entry(dev, ih_ring_entry))
dev->device_info->event_interrupt_class->interrupt_wq(dev,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 96a9cc0f02c9..a90db05dfe61 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -39,6 +39,8 @@
 
 #include "amd_shared.h"
 
+#define KFD_MAX_RING_ENTRY_SIZE 8
+
 #define KFD_SYSFS_FILE_MODE 0444
 
 #define KFD_MMAP_DOORBELL_MASK 0x8ull
-- 
2.14.3
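
For readers less familiar with the pattern, the same idea in a standalone form
(hypothetical names, not the kfd code): the runtime-sized stack array becomes a
worst-case fixed array plus an explicit bounds check.

    /* Illustration of the VLA-removal pattern only. */
    #include <errno.h>
    #include <stdint.h>
    #include <string.h>

    #define MAX_ENTRY_WORDS 8   /* worst case across all hardware */

    static int process_entry(const void *src, size_t entry_size)
    {
            uint32_t entry[MAX_ENTRY_WORDS];

            /* The old form, "uint32_t entry[entry_size / 4];", sized
             * the stack at runtime and is rejected under -Wvla. */
            if (entry_size > sizeof(entry))
                    return -EINVAL;

            memcpy(entry, src, entry_size);
            /* ... consume entry ... */
            return 0;
    }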

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/gpu-sched: fix force APP kill hang(v4)

2018-04-13 Thread Andrey Grodzovsky



On 04/13/2018 10:31 AM, Christian König wrote:

On 13.04.2018 at 09:01, Emily Deng wrote:

issue:
VMC page faults occur if an app is force-killed during the
3dmark test. The cause is that in entity_fini we manually signal
all the jobs in the entity's queue, which confuses the sync/dep
mechanism:

1) A page fault occurs in SDMA's clear job, which operates on the
shadow buffer: the shadow buffer's GART table is cleaned by
ttm_bo_release since the fence in its reservation was fake-signaled
by entity_fini() in the case where SIGKILL was received.

2) A page fault occurs in a GFX job because, during the lifetime
of the GFX job, we manually fake-signal all jobs from its entity
in entity_fini(); the unmapping/PTE-clear jobs that depend on those
result fences are thus satisfied, so SDMA starts clearing the PTEs,
leading to the GFX page fault.

fix:
1) entity_fini() should at least wait for all already-scheduled jobs
to complete, even in the SIGKILL case.

2) If a signaled fence is about to clear some entity's dependency,
mark that entity guilty to prevent its jobs from really running,
since the dependency was only fake-signaled.

v2:
split drm_sched_entity_fini() into two functions:
1) The first one does the waiting, removes the entity from the
runqueue and returns an error when the process was killed.
2) The second one then goes over the entity, installs it as a
completion signal for the remaining jobs and signals all jobs
with an error code.

v3:
1) Replace the fini1 and fini2 names with better ones
2) Call the first part before the VM teardown in
amdgpu_driver_postclose_kms() and the second part
after the VM teardown
3) Keep the original function drm_sched_entity_fini to
refine the code.

v4:
1) Rename entity->finished to entity->last_scheduled;
2) Rename drm_sched_entity_fini_job_cb() to
drm_sched_entity_kill_jobs_cb();
3) Pass NULL to drm_sched_entity_fini_job_cb() if -ENOENT;
4) Replace the type of entity->fini_status with "int";
5) Remove the check of entity->finished.

Signed-off-by: Monk Liu 
Signed-off-by: Emily Deng 


At least off hand that looks really good.

Patch is Reviewed-by: Christian König .

Andrey, David, you guys also recently hacked on the scheduler, so can you
please take a look as well?


Took a look at the patch and previous discussions; as far as I can
understand, it looks good to me.


Andrey



Thanks,
Christian.


---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h       |  2 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   | 64 ++++++++++++++++++++-----
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  5 ++-
  drivers/gpu/drm/scheduler/gpu_scheduler.c | 71 +++++++++++++++++++------
  include/drm/gpu_scheduler.h               |  7 +++
  5 files changed, 128 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 5734871..b3d047d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -681,6 +681,8 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
 int amdgpu_ctx_wait_prev_fence(struct amdgpu_ctx *ctx, unsigned ring_id);
 
 void amdgpu_ctx_mgr_init(struct amdgpu_ctx_mgr *mgr);
+void amdgpu_ctx_mgr_entity_cleanup(struct amdgpu_ctx_mgr *mgr);
+void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr);
 void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr *mgr);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 09d35051..eb80edf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -111,8 +111,9 @@ static int amdgpu_ctx_init(struct amdgpu_device *adev,
 	return r;
 }
 
-static void amdgpu_ctx_fini(struct amdgpu_ctx *ctx)
+static void amdgpu_ctx_fini(struct kref *ref)
 {
+	struct amdgpu_ctx *ctx = container_of(ref, struct amdgpu_ctx, refcount);
 	struct amdgpu_device *adev = ctx->adev;
 	unsigned i, j;
 
@@ -125,13 +126,11 @@ static void amdgpu_ctx_fini(struct amdgpu_ctx *ctx)
 	kfree(ctx->fences);
 	ctx->fences = NULL;
 
-	for (i = 0; i < adev->num_rings; i++)
-		drm_sched_entity_fini(&adev->rings[i]->sched,
-				      &ctx->rings[i].entity);
-
 	amdgpu_queue_mgr_fini(adev, &ctx->queue_mgr);
 
 	mutex_destroy(&ctx->lock);
+
+	kfree(ctx);
 }
 
 static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
@@ -170,12 +169,15 @@ static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
 static void amdgpu_ctx_do_release(struct kref *ref)
 {
 	struct amdgpu_ctx *ctx;
+	u32 i;
 
 	ctx = container_of(ref, struct amdgpu_ctx, refcount);
 
-	amdgpu_ctx_fini(ctx);
+	for (i = 0; i < ctx->adev->num_rings; i++)
+		drm_sched_entity_fini(&ctx->adev->rings[i]->sched,
+				      &ctx->rings[i].entity);
 
-	kfree(ctx);
+	amdgpu_ctx_fini(ref);
 }
 
 static int amdgpu_ctx_free(struct amdgpu_fpriv *fpriv, uint32_t id)
@@ -435,16 +437,62 @@ void amdgpu_ctx_mgr_init(struct amdgpu_ctx_mgr *mgr)
 	idr_init(&mgr->ctx_handles);
 }
 
+void 
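
Since the diff is cut off above, a compact model of the two-phase teardown
described in v2 may help; every name below is hypothetical, not the actual
scheduler API:

    #include <stdbool.h>

    /* Phase 1 waits for already-scheduled jobs (skipped on SIGKILL);
     * phase 2 force-signals the leftovers with an error so dependent
     * jobs don't mistake the fake signal for a real result. */
    struct job { bool scheduled, signaled; int error; };
    struct entity { struct job jobs[8]; int njobs; };

    static int entity_release(struct entity *e, bool sigkilled)
    {
            if (sigkilled)
                    return -1;  /* caller marks the entity guilty */
            for (int i = 0; i < e->njobs; i++)
                    if (e->jobs[i].scheduled)
                            e->jobs[i].signaled = true; /* "wait" */
            return 0;
    }

    static void entity_cleanup(struct entity *e)
    {
            for (int i = 0; i < e->njobs; i++)
                    if (!e->jobs[i].signaled) {
                            e->jobs[i].signaled = true;
                            e->jobs[i].error = -125; /* -ECANCELED */
                    }
    }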

Re: RFC for a render API to support adaptive sync and VRR

2018-04-13 Thread Harry Wentland
On 2018-04-12 05:38 PM, Stéphane Marchesin wrote:
> On Tue, Apr 10, 2018 at 12:37 AM, Michel Dänzer  wrote:
>> On 2018-04-10 08:45 AM, Christian König wrote:
>>> On 09.04.2018 at 23:45, Manasi Navare wrote:
 Thanks for initiating the discussion. Find my comments below:
 On Mon, Apr 09, 2018 at 04:00:21PM -0400, Harry Wentland wrote:
> On 2018-04-09 03:56 PM, Harry Wentland wrote:
>>
>> === A DRM render API to support variable refresh rates ===
>>
>> In order to benefit from adaptive sync and VRR userland needs a way
>> to let us know whether to vary frame timings or to target a
>> different frame time. These can be provided as atomic properties on
>> a CRTC:
>>   * bool variable_refresh_compatible
>>   * int  target_frame_duration_ns (nanosecond frame duration)
>>
>> This gives us the following cases:
>>
>> variable_refresh_compatible = 0, target_frame_duration_ns = 0
>>   * drive monitor at timing's normal refresh rate
>>
>> variable_refresh_compatible = 1, target_frame_duration_ns = 0
>>   * send new frame to monitor as soon as it's available, if within
>> min/max of monitor's reported capabilities
>>
>> variable_refresh_compatible = 0/1, target_frame_duration_ns > 0
>>   * send new frame to monitor with the specified
>> target_frame_duration_ns
>>
>> When a target_frame_duration_ns or variable_refresh_compatible
>> cannot be supported the atomic check will reject the commit.
>>
 What I would like is two sets of properties on a CRTC or preferably on
 a connector:

 KMD properties that UMD can query:
 * vrr_capable -  This will be an immutable property for exposing
 hardware's capability of supporting VRR. This will be set by the
 kernel after
 reading the EDID mode information and monitor range capabilities.
 * vrr_vrefresh_max, vrr_vrefresh_min - To expose the min and max
 refresh rates supported.
 These properties are optional and will be created and attached to the
 DP/eDP connector when the connector
 is getting initialized.
>>>
>>> Mhm, aren't those properties actually per mode and not per CRTC/connector?
>>>
 Properties that you mentioned above that the UMD can set before kernel
 can enable VRR functionality
 *bool vrr_enable or vrr_compatible
 target_frame_duration_ns
>>>
>>> Yeah, that certainly makes sense. But target_frame_duration_ns is a bad
>>> name/semantics.
>>>
>>> We should use an absolute timestamp where the frame should be presented,
>>> otherwise you could run into a bunch of trouble with IOCTL restarts or
>>> missed blanks.
>>
>> Also, a fixed target frame duration isn't suitable even for video
>> playback, due to drift between the video and audio clocks.
>>
>> Time-based presentation seems to be the right approach for preventing
>> micro-stutter in games as well, Croteam developers have been researching
>> this.
> 
> Another case that you can handle with time-based presentation but not
> with refresh-based API is the use of per-scanline flips in conjunction
> with damage rects. For example if you know that the damage rect covers
> a certain Y range, you can flip when you're outside that range if the
> time that you were given allows it. That's even independent from VRR
> displays.
> 

That's an interesting use-case. I don't think we have given much thought to 
damage rects before.

Harry
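
For what it's worth, a client driving the proposed properties through the
atomic API could look roughly like the sketch below. Neither property exists
yet and find_crtc_prop_id() is an assumed helper (it would walk
drmModeObjectGetProperties() and match by name), so this is for discussion
only:

    #include <stdint.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    extern uint32_t find_crtc_prop_id(int fd, uint32_t crtc_id,
                                      const char *name);

    static int request_vrr(int fd, uint32_t crtc_id, uint64_t frame_ns)
    {
            drmModeAtomicReq *req = drmModeAtomicAlloc();
            int ret;

            if (!req)
                    return -1;

            drmModeAtomicAddProperty(req, crtc_id,
                    find_crtc_prop_id(fd, crtc_id,
                            "variable_refresh_compatible"), 1);
            drmModeAtomicAddProperty(req, crtc_id,
                    find_crtc_prop_id(fd, crtc_id,
                            "target_frame_duration_ns"), frame_ns);

            /* TEST_ONLY first: per the RFC, the atomic check rejects
             * the commit when the timing cannot be honoured. */
            ret = drmModeAtomicCommit(fd, req,
                                      DRM_MODE_ATOMIC_TEST_ONLY, NULL);
            if (ret == 0)
                    ret = drmModeAtomicCommit(fd, req, 0, NULL);

            drmModeAtomicFree(req);
            return ret;
    }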

> Stéphane
> 
> 
>>
>>
>> --
>> Earthling Michel Dänzer   |   http://www.amd.com
>> Libre software enthusiast | Mesa and X developer
>> ___
>> dri-devel mailing list
>> dri-de...@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: RFC for a render API to support adaptive sync and VRR

2018-04-13 Thread Harry Wentland
On 2018-04-13 12:04 PM, Daniel Vetter wrote:
> On Mon, Apr 09, 2018 at 04:00:21PM -0400, Harry Wentland wrote:
>> Adding dri-devel, which I should've included from the start.
> 
> Top posting, because I'm lazy and was out sick ...
> 
> Few observations:
> - Stéphane has a great point which seems to have been ignored thus far.
> - Where's the VK extension for this - there must be one :-) Starting with
>   a full implementation for that (based on radv or anv or something like
>   that) might help.

Good point. The intention of this very early RFC was to understand if we're 
sort of thinking along the same lines as the rest of the community before going 
ahead and prototyping something that's not going to work well in the end. The 
focus here was on the kernel API. We haven't done any investigation of VK, GL, 
or MM APIs on this yet and were hoping for some guidance on that. That guidance 
seems to be that from VK and MM API perspectives frame_duration doesn't cut it 
and we should rather pursue an absolute presentation time.

Harry

> - Imo if we do a conversion between the vk api and what we feed into the
>   hw, then let's not do a midlayer mistake: That conversion should happen
>   at the bottom, in the kernel driver, maybe assisted with some helpers.
>   Not somewhere in-between, like in libdrm of all places!
> 
> Cheers, Daniel
> 
>>
>> On 2018-04-09 03:56 PM, Harry Wentland wrote:
>>> === What is adaptive sync and VRR? ===
>>>
>>> Adaptive sync has been part of the DisplayPort spec for a while now and 
>>> allows graphics adapters to drive displays with varying frame timings. VRR 
>>> (variable refresh rate) is essentially the same, but defined for HDMI.
>>>
>>>
>>>
>>> === Why allow variable frame timings? ===
>>>
>>> Variable render times don't align with fixed refresh rates, leading to
>>> stuttering, tearing, and/or input lag.
>>>
>>> e.g. (rc = render completion, dr = display refresh)
>>>
>>> rc    B      C        D     E      F
>>> dr  A    B    C    C    D    E    F
>>>
>>>          ^          ^
>>>        frame      missed
>>>      repeated     display
>>>        twice      refresh
>>>
>>>
>>>
>>> === Other use cases of adaptive sync ===
>>>
>>> Besides the variable render case, adaptive sync also allows adjustment of 
>>> refresh rates without a mode change. One such use case would be 24 Hz video.
>>>
>>>
>>>
>>> === A DRM render API to support variable refresh rates ===
>>>
>>> In order to benefit from adaptive sync and VRR userland needs a way to let 
>>> us know whether to vary frame timings or to target a different frame time. 
>>> These can be provided as atomic properties on a CRTC:
>>>  * bool variable_refresh_compatible
>>>  * int  target_frame_duration_ns (nanosecond frame duration)
>>>
>>> This gives us the following cases:
>>>
>>> variable_refresh_compatible = 0, target_frame_duration_ns = 0
>>>  * drive monitor at timing's normal refresh rate
>>>
>>> variable_refresh_compatible = 1, target_frame_duration_ns = 0
>>>  * send new frame to monitor as soon as it's available, if within min/max 
>>> of monitor's reported capabilities
>>>
>>> variable_refresh_compatible = 0/1, target_frame_duration_ns > 0
>>>  * send new frame to monitor with the specified target_frame_duration_ns
>>>
>>> When a target_frame_duration_ns or variable_refresh_compatible cannot be 
>>> supported the atomic check will reject the commit.
>>>
>>>
>>>
>>> === Previous discussions ===
>>>
>>> https://lists.freedesktop.org/archives/dri-devel/2017-October/155207.html
>>>
>>>
>>>
>>> === Feedback and moving forward ===
>>>
>>> I'm hoping to get some feedback on this or continue the discussion on how 
>>> adaptive sync / VRR might look in the DRM ecosystem. Once there are no 
>>> major concerns or objections left we'll probably start creating some 
>>> patches to sketch this out a bit better and see how it looks in practice.
>>>
>>>
>>>
>>> Cheers,
>>> Harry
>>> ___
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
> 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: defer test IBs on the rings at boot (V2)

2018-04-13 Thread Alex Deucher
On Fri, Apr 13, 2018 at 9:25 AM, Christian König
 wrote:
> On 13.04.2018 at 10:31, Shirish S wrote:
>>
>> amdgpu_ib_ring_tests() runs test IBs on the rings at boot and
>> contributes ~500 ms to the amdgpu driver's boot time.
>>
>> This patch defers it and ensures that it's executed
>> in amdgpu_info_ioctl() if it wasn't scheduled.
>>
>> V2: Use queue_delayed_work() & flush_delayed_work().
>>
>> Signed-off-by: Shirish S 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  2 ++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 20 +++++++++++---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    |  4 ++++
>>   3 files changed, 23 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index 5734871..ae8f722 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -1611,6 +1611,8 @@ struct amdgpu_device {
>> /* delayed work_func for deferring clockgating during resume */
>> struct delayed_work late_init_work;
>> +   /* delayed work_func to defer testing IB's on rings during boot */
>> +   struct delayed_work late_init_test_ib_work;
>
>
> That still has the chance of running the late init in parallel with the IB
> tests and that really doesn't look like a good idea to me.

Yeah, at least on older chips we run into problems if we power or
clock gate some engines while they are in use.  Even on engines that
support dynamic gating, you usually have to set it up while the engine
is idle.  Make sure the IB tests run before we enable gating.

Alex

>
> Is there any issue with putting the IB test into the late init work handler
> as well?
>
>
>> struct amdgpu_virt  virt;
>> /* firmware VRAM reservation */
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 1762eb4..ee84058 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -63,6 +63,7 @@ MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
>>   MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
>> #define AMDGPU_RESUME_MS        2000
>> +#define AMDGPU_IB_TEST_SCHED_MS 2000
>> static const char *amdgpu_asic_name[] = {
>> "TAHITI",
>> @@ -2105,6 +2106,16 @@ bool amdgpu_device_asic_has_dc_support(enum
>> amd_asic_type asic_type)
>> }
>>   }
>>   +static void amdgpu_device_late_init_test_ib_func_handler(struct
>> work_struct *work)
>> +{
>> +   struct amdgpu_device *adev =
>> +   container_of(work, struct amdgpu_device,
>> late_init_test_ib_work.work);
>> +   int r = amdgpu_ib_ring_tests(adev);
>> +
>> +   if (r)
>> +   DRM_ERROR("ib ring test failed (%d).\n", r);
>> +}
>> +
>>   /**
>>* amdgpu_device_has_dc_support - check if dc is supported
>>*
>> @@ -2212,6 +2223,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>> INIT_LIST_HEAD(&adev->ring_lru_list);
>> spin_lock_init(&adev->ring_lru_list_lock);
>>   + INIT_DELAYED_WORK(&adev->late_init_test_ib_work,
>> + amdgpu_device_late_init_test_ib_func_handler);
>> INIT_DELAYED_WORK(&adev->late_init_work,
>>   amdgpu_device_ip_late_init_func_handler);
>>   @@ -2374,9 +2387,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>> goto failed;
>> }
>>   - r = amdgpu_ib_ring_tests(adev);
>> -   if (r)
>> -   DRM_ERROR("ib ring test failed (%d).\n", r);
>> +   /* Schedule amdgpu_ib_ring_tests() */
>> +   queue_delayed_work(system_wq, &adev->late_init_test_ib_work,
>> +   msecs_to_jiffies(AMDGPU_IB_TEST_SCHED_MS));
>> if (amdgpu_sriov_vf(adev))
>> amdgpu_virt_init_data_exchange(adev);
>> @@ -2469,6 +2482,7 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
>> }
>> adev->accel_working = false;
>> cancel_delayed_work_sync(&adev->late_init_work);
>> +   cancel_delayed_work_sync(&adev->late_init_test_ib_work);
>> /* free i2c buses */
>> if (!amdgpu_device_has_dc_support(adev))
>> amdgpu_i2c_fini(adev);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> index 487d39e..6fa326b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> @@ -279,6 +279,10 @@ static int amdgpu_info_ioctl(struct drm_device *dev,
>> void *data, struct drm_file
>> if (!info->return_size || !info->return_pointer)
>> return -EINVAL;
>>   + /* Ensure IB tests on ring are executed */
>> +   if (delayed_work_pending(&adev->late_init_test_ib_work))
>> +   flush_delayed_work(&adev->late_init_test_ib_work);
>> +
>
>
> You just need to call flush_delayed_work() here without the if.
>
> Regards,
> Christian.
>
>> 
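
For reference on Christian's last point: flush_delayed_work() already copes
with work that was never queued. It kicks a still-pending timer and waits for
the handler to finish, and returns immediately if nothing is queued or
running. The delayed_work_pending() test, by contrast, can race with the work
changing state between the check and the flush, so the hunk reduces to:

    /* Ensure IB tests on the rings have executed */
    flush_delayed_work(&adev->late_init_test_ib_work);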

[PATCH] drm/amdgpu/acp: Fix slab-out-of-bounds in mfd_add_device in acp_hw_init

2018-04-13 Thread Daniel Kurtz
Commit 51f7415039d4 ("drm/amd/amdgpu: creating two I2S instances for
stoney/cz") added support for the "BT_I2S" ACP i2s channel.  As part of
this change, one additional acp resource was added, but the "num_resource"
count was accidentally incremented by 2.

This incorrect count eventually causes mfd_add_device() to try to access
an invalid memory address (the location of the non-existent resource 5).

This fault was detected by running a KASAN-enabled kernel, which produced
the following splat at boot:

[6.612987] 
==
[6.613509] BUG: KASAN: slab-out-of-bounds in mfd_add_device+0x4bc/0x7a7
[6.613509] Read of size 8 at addr 880107d4dc58 by task swapper/0/1
[6.613509]
[6.613509] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.14.33 #349
[6.613509] Hardware name: Google Grunt/Grunt, BIOS 
Google_Grunt.10543.0.2018_04_03_1812 04/02/2018
[6.613509] Call Trace:
[6.613509]  dump_stack+0x4d/0x63
[6.613509]  print_address_description+0x80/0x2d6
[6.613509]  ? mfd_add_device+0x4bc/0x7a7
[6.613509]  kasan_report+0x255/0x295
[6.613509]  mfd_add_device+0x4bc/0x7a7
[6.613509]  ? kasan_kmalloc+0x99/0xa8
[6.613509]  ? mfd_add_devices+0x58/0xe4
[6.613509]  ? __kmalloc+0x154/0x178
[6.613509]  mfd_add_devices+0xa5/0xe4
[6.613509]  acp_hw_init+0x92e/0xc4a
[6.613509]  amdgpu_device_init+0x1dfb/0x22a2
[6.613509]  ? kmalloc_order+0x53/0x5d
[6.613509]  ? kmalloc_order_trace+0x23/0xb3
[6.613509]  amdgpu_driver_load_kms+0xce/0x267
[6.613509]  drm_dev_register+0x169/0x2fb
[6.613509]  amdgpu_pci_probe+0x217/0x242
[6.613509]  pci_device_probe+0x101/0x18e
[6.613509]  driver_probe_device+0x1dd/0x419
[6.613509]  ? ___might_sleep+0x80/0x1b6
[6.613509]  __driver_attach+0x9f/0xc9
[6.613509]  ? driver_probe_device+0x419/0x419
[6.613509]  bus_for_each_dev+0xbc/0xe1
[6.613509]  bus_add_driver+0x189/0x2c0
[6.613509]  driver_register+0x108/0x156
[6.613509]  ? ttm_init+0x67/0x67
[6.613509]  do_one_initcall+0xb2/0x161
[6.613509]  kernel_init_freeable+0x25a/0x308
[6.613509]  ? rest_init+0xcc/0xcc
[6.613509]  kernel_init+0x11/0x10d
[6.613509]  ? rest_init+0xcc/0xcc
[6.613509]  ret_from_fork+0x22/0x40
[6.613509]
[6.613509] Allocated by task 1:
[6.613509]  save_stack+0x46/0xce
[6.613509]  kasan_kmalloc+0x99/0xa8
[6.613509]  kmem_cache_alloc_trace+0x11a/0x13e
[6.613509]  acp_hw_init+0x210/0xc4a
[6.613509]  amdgpu_device_init+0x1dfb/0x22a2
[6.613509]  amdgpu_driver_load_kms+0xce/0x267
[6.613509]  drm_dev_register+0x169/0x2fb
[6.613509]  amdgpu_pci_probe+0x217/0x242
[6.613509]  pci_device_probe+0x101/0x18e
[6.613509]  driver_probe_device+0x1dd/0x419
[6.613509]  __driver_attach+0x9f/0xc9
[6.613509]  bus_for_each_dev+0xbc/0xe1
[6.613509]  bus_add_driver+0x189/0x2c0
[6.613509]  driver_register+0x108/0x156
[6.613509]  do_one_initcall+0xb2/0x161
[6.613509]  kernel_init_freeable+0x25a/0x308
[6.613509]  kernel_init+0x11/0x10d
[6.613509]  ret_from_fork+0x22/0x40
[6.613509]
[6.613509] Freed by task 0:
[6.613509] (stack is not available)
[6.613509]
[6.613509] The buggy address belongs to the object at 880107d4db08
[6.613509]  which belongs to the cache kmalloc-512 of size 512
[6.613509] The buggy address is located 336 bytes inside of
[6.613509]  512-byte region [880107d4db08, 880107d4dd08)
[6.613509] The buggy address belongs to the page:
[6.613509] page:ea00041f5300 count:1 mapcount:0 mapping:  
(null) index:0x0 compound_mapcount: 0
[6.613509] flags: 0x80008100(slab|head)
[6.613509] raw: 80008100   
000100120012
[6.613509] raw: ea0004208520 88010b001680 88010b002cc0 

[6.613509] page dumped because: kasan: bad access detected
[6.613509]
[6.613509] Memory state around the buggy address:
[6.613509]  880107d4db00: fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00
[6.613509]  880107d4db80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00
[6.613509] >880107d4dc00: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc 
fc
[6.613509] ^
[6.613509]  880107d4dc80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 
fc
[6.613509]  880107d4dd00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 
fc
[6.613509] 
==

Fixes: 51f7415039d4 ("drm/amd/amdgpu: creating two I2S instances for stoney/cz")
Signed-off-by: Daniel Kurtz 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c
index 20f20079935b..42f0d60cf3f4 100644
--- 
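
The diff itself is cut off above, but the shape of the bug is a plain
count/array mismatch; a hypothetical illustration (not the actual acp code):

    #include <stdio.h>

    struct resource { const char *name; };

    int main(void)
    {
            /* one new entry was added to the array... */
            struct resource res[5] = {
                    { "acp_mem" }, { "acp_irq" },
                    { "i2s_sp" }, { "i2s_capture" }, { "i2s_bt" },
            };
            /* ...but the count was bumped by two instead of one */
            int num_resources = 6;

            for (int i = 0; i < num_resources; i++)
                    printf("%s\n", res[i].name); /* i == 5 is OOB */
            return 0;
    }

Deriving the count with something like ARRAY_SIZE(res) would rule out this
kind of mismatch entirely.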

Re: RFC for a render API to support adaptive sync and VRR

2018-04-13 Thread Daniel Vetter
On Mon, Apr 09, 2018 at 04:00:21PM -0400, Harry Wentland wrote:
> Adding dri-devel, which I should've included from the start.

Top posting, because I'm lazy and was out sick ...

Few observations:
- Stéphane has a great point which seems to have been ignored thus far.
- Where's the VK extension for this - there must be one :-) Starting with
  a full implementation for that (based on radv or anv or something like
  that) might help.
- Imo if we do a conversion between the vk api and what we feed into the
  hw, then let's not do a midlayer mistake: That conversion should happen
  at the bottom, in the kernel driver, maybe assisted with some helpers.
  Not somewhere in-between, like in libdrm of all places!

Cheers, Daniel

> 
> On 2018-04-09 03:56 PM, Harry Wentland wrote:
> > === What is adaptive sync and VRR? ===
> > 
> > Adaptive sync has been part of the DisplayPort spec for a while now and 
> > allows graphics adapters to drive displays with varying frame timings. VRR 
> > (variable refresh rate) is essentially the same, but defined for HDMI.
> > 
> > 
> > 
> > === Why allow variable frame timings? ===
> > 
> > Variable render times don't align with fixed refresh rates, leading to
> > stuttering, tearing, and/or input lag.
> > 
> > e.g. (rc = render completion, dr = display refresh)
> > 
> > rc    B      C        D     E      F
> > dr  A    B    C    C    D    E    F
> >
> >          ^          ^
> >        frame      missed
> >      repeated     display
> >        twice      refresh
> > 
> > 
> > 
> > === Other use cases of adaptive sync ===
> > 
> > Besides the variable render case, adaptive sync also allows adjustment of 
> > refresh rates without a mode change. One such use case would be 24 Hz video.
> > 
> > 
> > 
> > === A DRM render API to support variable refresh rates ===
> > 
> > In order to benefit from adaptive sync and VRR userland needs a way to let 
> > us know whether to vary frame timings or to target a different frame time. 
> > These can be provided as atomic properties on a CRTC:
> >  * bool variable_refresh_compatible
> >  * int  target_frame_duration_ns (nanosecond frame duration)
> > 
> > This gives us the following cases:
> > 
> > variable_refresh_compatible = 0, target_frame_duration_ns = 0
> >  * drive monitor at timing's normal refresh rate
> > 
> > variable_refresh_compatible = 1, target_frame_duration_ns = 0
> >  * send new frame to monitor as soon as it's available, if within min/max 
> > of monitor's reported capabilities
> > 
> > variable_refresh_compatible = 0/1, target_frame_duration_ns > 0
> >  * send new frame to monitor with the specified target_frame_duration_ns
> > 
> > When a target_frame_duration_ns or variable_refresh_compatible cannot be 
> > supported the atomic check will reject the commit.
> > 
> > 
> > 
> > === Previous discussions ===
> > 
> > https://lists.freedesktop.org/archives/dri-devel/2017-October/155207.html
> > 
> > 
> > 
> > === Feedback and moving forward ===
> > 
> > I'm hoping to get some feedback on this or continue the discussion on how 
> > adaptive sync / VRR might look in the DRM ecosystem. Once there are no 
> > major concerns or objections left we'll probably start creating some 
> > patches to sketch this out a bit better and see how it looks in practice.
> > 
> > 
> > 
> > Cheers,
> > Harry
> > ___
> > amd-gfx mailing list
> > amd-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v5 1/2] drm/amdgpu/gmc: steal the appropriate amount of vram for fw hand-over (v2)

2018-04-13 Thread Andrey Grodzovsky
From: Alex Deucher 

Steal 9 MB for VGA emulation and the FB if VGA is enabled; otherwise,
steal enough to cover the current display size as set by the vbios.

If no memory is used (e.g., a secondary or headless card), skip the
stolen memory reservation.

v2: skip reservation if vram is limited, address Christian's comments

Reviewed-and-Tested-by: Andrey Grodzovsky  (v1)
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 14 ++++++++------
 drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c   | 23 +++++++++++++++++--
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c   | 23 +++++++++++++++++--
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c   | 23 +++++++++++++++++--
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   | 51 +++++++++++++++++++++++
 5 files changed, 116 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index ab73300e..2be04ac 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1441,12 +1441,14 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
return r;
}
 
-   r = amdgpu_bo_create_kernel(adev, adev->gmc.stolen_size, PAGE_SIZE,
-   AMDGPU_GEM_DOMAIN_VRAM,
-   &adev->stolen_vga_memory,
-   NULL, NULL);
-   if (r)
-   return r;
+   if (adev->gmc.stolen_size) {
+   r = amdgpu_bo_create_kernel(adev, adev->gmc.stolen_size, 
PAGE_SIZE,
+   AMDGPU_GEM_DOMAIN_VRAM,
+   &adev->stolen_vga_memory,
+   NULL, NULL);
+   if (r)
+   return r;
+   }
DRM_INFO("amdgpu: %uM of VRAM memory ready\n",
 (unsigned) (adev->gmc.real_vram_size / (1024 * 1024)));
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index 5617cf6..24e1ea3 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -825,6 +825,25 @@ static int gmc_v6_0_late_init(void *handle)
return 0;
 }
 
+static unsigned gmc_v6_0_get_vbios_fb_size(struct amdgpu_device *adev)
+{
+   u32 d1vga_control = RREG32(mmD1VGA_CONTROL);
+   unsigned size;
+
+   if (REG_GET_FIELD(d1vga_control, D1VGA_CONTROL, D1VGA_MODE_ENABLE)) {
+   size = 9 * 1024 * 1024; /* reserve 8MB for vga emulator and 1 
MB for FB */
+   } else {
+   u32 viewport = RREG32(mmVIEWPORT_SIZE);
+   size = (REG_GET_FIELD(viewport, VIEWPORT_SIZE, VIEWPORT_HEIGHT) 
*
+   REG_GET_FIELD(viewport, VIEWPORT_SIZE, VIEWPORT_WIDTH) *
+   4);
+   }
+   /* return 0 if the pre-OS buffer uses up most of vram */
+   if ((adev->gmc.real_vram_size - size) < (8 * 1024 * 1024))
+   return 0;
+   return size;
+}
+
 static int gmc_v6_0_sw_init(void *handle)
 {
int r;
@@ -851,8 +870,6 @@ static int gmc_v6_0_sw_init(void *handle)
 
adev->gmc.mc_mask = 0xffULL;
 
-   adev->gmc.stolen_size = 256 * 1024;
-
adev->need_dma32 = false;
dma_bits = adev->need_dma32 ? 32 : 40;
r = pci_set_dma_mask(adev->pdev, DMA_BIT_MASK(dma_bits));
@@ -878,6 +895,8 @@ static int gmc_v6_0_sw_init(void *handle)
if (r)
return r;
 
+   adev->gmc.stolen_size = gmc_v6_0_get_vbios_fb_size(adev);
+
r = amdgpu_bo_init(adev);
if (r)
return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index 80054f3..93861f9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -964,6 +964,25 @@ static int gmc_v7_0_late_init(void *handle)
return 0;
 }
 
+static unsigned gmc_v7_0_get_vbios_fb_size(struct amdgpu_device *adev)
+{
+   u32 d1vga_control = RREG32(mmD1VGA_CONTROL);
+   unsigned size;
+
+   if (REG_GET_FIELD(d1vga_control, D1VGA_CONTROL, D1VGA_MODE_ENABLE)) {
+   size = 9 * 1024 * 1024; /* reserve 8MB for vga emulator and 1 
MB for FB */
+   } else {
+   u32 viewport = RREG32(mmVIEWPORT_SIZE);
+   size = (REG_GET_FIELD(viewport, VIEWPORT_SIZE, VIEWPORT_HEIGHT) 
*
+   REG_GET_FIELD(viewport, VIEWPORT_SIZE, VIEWPORT_WIDTH) *
+   4);
+   }
+   /* return 0 if the pre-OS buffer uses up most of vram */
+   if ((adev->gmc.real_vram_size - size) < (8 * 1024 * 1024))
+   return 0;
+   return size;
+}
+
 static int gmc_v7_0_sw_init(void *handle)
 {
int r;
@@ -998,8 +1017,6 @@ static int gmc_v7_0_sw_init(void *handle)
 */
adev->gmc.mc_mask = 0xffULL; /* 40 bit MC */
 
-   adev->gmc.stolen_size = 256 * 1024;
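
(As a worked example of the viewport path above: a vbios-set 1920x1080 mode
reserves 1920 * 1080 * 4 bytes, roughly 7.9 MB, slightly under the 9 MB (8 MB
VGA emulator plus 1 MB FB) reserved when VGA is enabled; and when
real_vram_size minus that size would leave less than 8 MB, the function
returns 0 and the reservation is skipped.)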

[PATCH v5 2/2] drm/amdgpu: Free VGA stolen memory as soon as possible.

2018-04-13 Thread Andrey Grodzovsky
Reserved VRAM is used to avoid overriding the pre-OS FB.
Once our display stack takes over we don't need the reserved
VRAM anymore.

v2:
Remove the comment; we actually know why we need to reserve the stolen VRAM.
Fix the return type of amdgpu_ttm_late_init.
v3:
Return 0 in amdgpu_bo_late_init, rebase on changes to the previous patch
v4: rebase
v5:
For GMC9, always reserve just 9 MB and keep the stolen memory around
until the GART table corruption on S3 resume is resolved.
Rebase.

Signed-off-by: Andrey Grodzovsky 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  7 +++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    |  6 +++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h    |  1 +
 drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c      |  2 ++
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c      |  2 ++
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c      |  2 ++
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c      | 26 ++++++++++++++++++
 8 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 9e23d6f..a160ef0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -852,6 +852,13 @@ int amdgpu_bo_init(struct amdgpu_device *adev)
return amdgpu_ttm_init(adev);
 }
 
+int amdgpu_bo_late_init(struct amdgpu_device *adev)
+{
+   amdgpu_ttm_late_init(adev);
+
+   return 0;
+}
+
 void amdgpu_bo_fini(struct amdgpu_device *adev)
 {
amdgpu_ttm_fini(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 3bee133..1e9fe85 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -251,6 +251,7 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 
domain,
 int amdgpu_bo_unpin(struct amdgpu_bo *bo);
 int amdgpu_bo_evict_vram(struct amdgpu_device *adev);
 int amdgpu_bo_init(struct amdgpu_device *adev);
+int amdgpu_bo_late_init(struct amdgpu_device *adev);
 void amdgpu_bo_fini(struct amdgpu_device *adev);
 int amdgpu_bo_fbdev_mmap(struct amdgpu_bo *bo,
struct vm_area_struct *vma);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 2be04ac..29efaac 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1517,13 +1517,17 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
return 0;
 }
 
+void amdgpu_ttm_late_init(struct amdgpu_device *adev)
+{
+   amdgpu_bo_free_kernel(&adev->stolen_vga_memory, NULL, NULL);
+}
+
 void amdgpu_ttm_fini(struct amdgpu_device *adev)
 {
if (!adev->mman.initialized)
return;
 
amdgpu_ttm_debugfs_fini(adev);
-   amdgpu_bo_free_kernel(&adev->stolen_vga_memory, NULL, NULL);
amdgpu_ttm_fw_reserve_vram_fini(adev);
if (adev->mman.aper_base_kaddr)
iounmap(adev->mman.aper_base_kaddr);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index 6ea7de8..e969c87 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -77,6 +77,7 @@ uint64_t amdgpu_vram_mgr_usage(struct ttm_mem_type_manager 
*man);
 uint64_t amdgpu_vram_mgr_vis_usage(struct ttm_mem_type_manager *man);
 
 int amdgpu_ttm_init(struct amdgpu_device *adev);
+void amdgpu_ttm_late_init(struct amdgpu_device *adev);
 void amdgpu_ttm_fini(struct amdgpu_device *adev);
 void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev,
bool enable);
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index 24e1ea3..79f9ac2 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -819,6 +819,8 @@ static int gmc_v6_0_late_init(void *handle)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
+   amdgpu_bo_late_init(adev);
+
if (amdgpu_vm_fault_stop != AMDGPU_VM_FAULT_STOP_ALWAYS)
	return amdgpu_irq_get(adev, &adev->gmc.vm_fault, 0);
else
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index 93861f9..7147bfe 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -958,6 +958,8 @@ static int gmc_v7_0_late_init(void *handle)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
+   amdgpu_bo_late_init(adev);
+
if (amdgpu_vm_fault_stop != AMDGPU_VM_FAULT_STOP_ALWAYS)
	return amdgpu_irq_get(adev, &adev->gmc.vm_fault, 0);
else
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index fbd8f56..4d970da 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ 

Re: [PATCH] drm/amdgpu: defer test IBs on the rings at boot (V2)

2018-04-13 Thread Christian König

On 13.04.2018 at 10:31, Shirish S wrote:

amdgpu_ib_ring_tests() runs test IBs on the rings at boot and
contributes ~500 ms to the amdgpu driver's boot time.

This patch defers it and ensures that it's executed
in amdgpu_info_ioctl() if it wasn't scheduled.

V2: Use queue_delayed_work() & flush_delayed_work().

Signed-off-by: Shirish S 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 20 +++++++++++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    |  4 ++++
  3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 5734871..ae8f722 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1611,6 +1611,8 @@ struct amdgpu_device {
  
  	/* delayed work_func for deferring clockgating during resume */

struct delayed_work late_init_work;
+   /* delayed work_func to defer testing IB's on rings during boot */
+   struct delayed_work late_init_test_ib_work;


That still has the chance of running the late init in parallel with the
IB tests and that really doesn't look like a good idea to me.


Is there any issue with putting the IB test into the late init work 
handler as well?


  
  	struct amdgpu_virt	virt;

/* firmware VRAM reservation */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1762eb4..ee84058 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -63,6 +63,7 @@ MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
  MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
  
  #define AMDGPU_RESUME_MS		2000

+#define AMDGPU_IB_TEST_SCHED_MS        2000
  
  static const char *amdgpu_asic_name[] = {

"TAHITI",
@@ -2105,6 +2106,16 @@ bool amdgpu_device_asic_has_dc_support(enum 
amd_asic_type asic_type)
}
  }
  
+static void amdgpu_device_late_init_test_ib_func_handler(struct work_struct *work)

+{
+   struct amdgpu_device *adev =
+   container_of(work, struct amdgpu_device, 
late_init_test_ib_work.work);
+   int r = amdgpu_ib_ring_tests(adev);
+
+   if (r)
+   DRM_ERROR("ib ring test failed (%d).\n", r);
+}
+
  /**
   * amdgpu_device_has_dc_support - check if dc is supported
   *
@@ -2212,6 +2223,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
	INIT_LIST_HEAD(&adev->ring_lru_list);
	spin_lock_init(&adev->ring_lru_list_lock);
  
+	INIT_DELAYED_WORK(&adev->late_init_test_ib_work,

+ amdgpu_device_late_init_test_ib_func_handler);
	INIT_DELAYED_WORK(&adev->late_init_work,
  amdgpu_device_ip_late_init_func_handler);
  
@@ -2374,9 +2387,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,

goto failed;
}
  
-	r = amdgpu_ib_ring_tests(adev);

-   if (r)
-   DRM_ERROR("ib ring test failed (%d).\n", r);
+   /* Schedule amdgpu_ib_ring_tests() */
+   queue_delayed_work(system_wq, &adev->late_init_test_ib_work,
+   msecs_to_jiffies(AMDGPU_IB_TEST_SCHED_MS));
  
  	if (amdgpu_sriov_vf(adev))

amdgpu_virt_init_data_exchange(adev);
@@ -2469,6 +2482,7 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
}
adev->accel_working = false;
	cancel_delayed_work_sync(&adev->late_init_work);
+   cancel_delayed_work_sync(&adev->late_init_test_ib_work);
/* free i2c buses */
if (!amdgpu_device_has_dc_support(adev))
amdgpu_i2c_fini(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 487d39e..6fa326b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -279,6 +279,10 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void 
*data, struct drm_file
if (!info->return_size || !info->return_pointer)
return -EINVAL;
  
+	/* Ensure IB tests on ring are executed */

+   if (delayed_work_pending(&adev->late_init_test_ib_work))
+   flush_delayed_work(&adev->late_init_test_ib_work);
+


You just need to call flush_delayed_work() here without the if.

Regards,
Christian.


switch (info->query) {
case AMDGPU_INFO_ACCEL_WORKING:
ui32 = adev->accel_working;


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 10/21] drm/amd/display: Add back code to allow for rounding error

2018-04-13 Thread Koo, Anthony
Thanks for the comments.

The input parameters are intended as uHz, but the units do get translated
within the function. We did not want to create new unnecessary parameters on
the stack, which led to the slightly confusing units. Maybe a better comment
here would also help.
And indeed, the comment about ceil needs to be fixed up.

Thanks,
Anthony
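
To make the rounding point concrete before the quoted patch below: div_u64()
truncates, so a true ceiling needs the usual add-divisor-minus-one form. A
generic sketch:

    /* 143.975 Hz expressed in uHz; '/' behaves like div_u64() here */
    uint64_t rate_uhz = 143975000ULL;
    uint64_t floor_hz = rate_uhz / 1000000;                  /* 143 */
    uint64_t ceil_hz  = (rate_uhz + 1000000 - 1) / 1000000;  /* 144 */

Only the ceiling form lets a 143.xxx Hz timing pass a check against a 144 Hz
cap, which is the intent stated in the patch's comment.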

From: Nils Wallménius [mailto:nils.wallmen...@gmail.com]
Sent: Friday, April 13, 2018 1:05 AM
To: Wentland, Harry 
Cc: amd-gfx@lists.freedesktop.org; Koo, Anthony 
Subject: Re: [PATCH 10/21] drm/amd/display: Add back code to allow for rounding 
error


On Tue, Apr 10, 2018, 23:11 Harry Wentland wrote:

From: Anthony Koo

Signed-off-by: Anthony Koo
Reviewed-by: Aric Cyr
Acked-by: Harry Wentland
---
 drivers/gpu/drm/amd/display/modules/freesync/freesync.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c 
b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
index 4887c888bbe7..abd5c9374eb3 100644
--- a/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
+++ b/drivers/gpu/drm/amd/display/modules/freesync/freesync.c
@@ -896,6 +896,17 @@ bool mod_freesync_is_valid_range(struct mod_freesync 
*mod_freesync,
unsigned long long nominal_field_rate_in_uhz =
mod_freesync_calc_nominal_field_rate(stream);

+   /* Allow for some rounding error of actual video timing by taking ceil.
+* For example, 144 Hz mode timing may actually be 143.xxx Hz when
+* calculated from pixel rate and vertical/horizontal totals, but
+* this should be allowed instead of blocking FreeSync.
+*/

Hi, with this change the var names ending in *_in_uhz are a bit confusing;
also, the integer division will not round up (take the ceil) as seems to be the
intention from the comment above. Perhaps the comment needs to be improved?

BR
Nils


+   nominal_field_rate_in_uhz = div_u64(nominal_field_rate_in_uhz, 1000000);
+   min_refresh_cap_in_uhz /= 1000000;
+   max_refresh_cap_in_uhz /= 1000000;
+   min_refresh_request_in_uhz /= 1000000;
+   max_refresh_request_in_uhz /= 1000000;
+
// Check nominal is within range
if (nominal_field_rate_in_uhz > max_refresh_cap_in_uhz ||
nominal_field_rate_in_uhz < min_refresh_cap_in_uhz)
@@ -921,7 +932,7 @@ bool mod_freesync_is_valid_range(struct mod_freesync 
*mod_freesync,

// For variable range, check for at least 10 Hz range
if ((max_refresh_request_in_uhz != min_refresh_request_in_uhz) &&
-   (max_refresh_request_in_uhz - min_refresh_request_in_uhz < 10000000))
+   (max_refresh_request_in_uhz - min_refresh_request_in_uhz < 10))
return false;

return true;
--
2.15.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH xf86-video-amdgpu 3/5] Keep CRTC properties consistent

2018-04-13 Thread Michel Dänzer
On 2018-04-12 06:48 PM, Leo Li wrote:
>
> A side question: How does xrandr display lengthy properties, like LUTs?

Presumably it would be displayed the same way as the EDID property, e.g.:

EDID:
000022f06f3201010101
181b0103803420782a4ca5a7554da226
105054210800b30095008100d1c0a9c0
81c0a9408180283c80a070b023403020
36000644211a00fd00323c1e
5011000a20202020202000fc0048
5020453234320a202020202000ff
00434e43373234305a37470a20200111
020318b148101f04130302121167030c
001022e2002b023a801871382d40
582c45000644211e023a80d07238
2d40102c45800644211e011d0072
51d01e206e2855000644211e011d
00bc52d01e20b82855400644211e
8c0ad08a20e02d10103e960006442100
0018002f


> Can users set it via --set still?

Not sure, maybe not. If not, maybe you can write a simple
proof-of-concept client; that should also be useful for review and
testing of these patches.
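
One possible shape for such a client, assuming the property ends up exposed
on a RandR output; untested, for discussion only:

    #include <X11/Xlib.h>
    #include <X11/Xatom.h>
    #include <X11/extensions/Xrandr.h>

    static void set_blob_property(Display *dpy, RROutput output,
                                  const char *name,
                                  const unsigned char *data, int len)
    {
            Atom prop = XInternAtom(dpy, name, False);

            /* format=8 means byte-sized elements, which suits lengthy
             * blobs such as LUTs */
            XRRChangeOutputProperty(dpy, output, prop, XA_INTEGER, 8,
                                    PropModeReplace, data, len);
            XFlush(dpy);
    }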


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 2/2] drm/amd/pp: Remove dead interface

2018-04-13 Thread Quan, Evan
Reviewed-by: Evan Quan 


From: amd-gfx on behalf of Rex Zhu

Sent: April 13, 2018 16:19:14
To: amd-gfx@lists.freedesktop.org
Cc: Zhu, Rex
Subject: [PATCH 2/2] drm/amd/pp: Remove dead interface

Signed-off-by: Rex Zhu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h        |  6 ------
 drivers/gpu/drm/amd/include/kgd_pp_interface.h |  5 -----
 drivers/gpu/drm/amd/powerplay/amd_powerplay.c  | 30 ----------
 3 files changed, 41 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h
index 19d8bf5..354c6dc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h
@@ -349,12 +349,6 @@ enum amdgpu_pcie_gen {
 ((adev)->powerplay.pp_funcs->set_clockgating_by_smu(\
 (adev)->powerplay.pp_handle, msg_id))

-#define amdgpu_dpm_notify_smu_memory_info(adev, virtual_addr_low, \
-   virtual_addr_hi, mc_addr_low, mc_addr_hi, size) \
-   ((adev)->powerplay.pp_funcs->notify_smu_memory_info)( \
-   (adev)->powerplay.pp_handle, virtual_addr_low, \
-   virtual_addr_hi, mc_addr_low, mc_addr_hi, size)
-
 #define amdgpu_dpm_get_power_profile_mode(adev, buf) \
 ((adev)->powerplay.pp_funcs->get_power_profile_mode(\
 (adev)->powerplay.pp_handle, buf))
diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h 
b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
index 1bec907..01969b1 100644
--- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
@@ -239,11 +239,6 @@ struct amd_pm_funcs {
 int (*load_firmware)(void *handle);
 int (*wait_for_fw_loading_complete)(void *handle);
 int (*set_clockgating_by_smu)(void *handle, uint32_t msg_id);
-   int (*notify_smu_memory_info)(void *handle, uint32_t virtual_addr_low,
-   uint32_t virtual_addr_hi,
-   uint32_t mc_addr_low,
-   uint32_t mc_addr_hi,
-   uint32_t size);
 int (*set_power_limit)(void *handle, uint32_t n);
 int (*get_power_limit)(void *handle, uint32_t *limit, bool 
default_limit);
 /* export to DC */
diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c 
b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
index 0cdfa41..b1e916f 100644
--- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
@@ -935,35 +935,6 @@ static int pp_dpm_switch_power_profile(void *handle,
 return 0;
 }

-static int pp_dpm_notify_smu_memory_info(void *handle,
-   uint32_t virtual_addr_low,
-   uint32_t virtual_addr_hi,
-   uint32_t mc_addr_low,
-   uint32_t mc_addr_hi,
-   uint32_t size)
-{
-   struct pp_hwmgr *hwmgr = handle;
-   int ret = 0;
-
-   if (!hwmgr || !hwmgr->pm_en)
-   return -EINVAL;
-
-   if (hwmgr->hwmgr_func->notify_cac_buffer_info == NULL) {
-   pr_info("%s was not implemented.\n", __func__);
-   return -EINVAL;
-   }
-
-   mutex_lock(&hwmgr->smu_lock);
-
-   ret = hwmgr->hwmgr_func->notify_cac_buffer_info(hwmgr, virtual_addr_low,
-   virtual_addr_hi, mc_addr_low, 
mc_addr_hi,
-   size);
-
-   mutex_unlock(&hwmgr->smu_lock);
-
-   return ret;
-}
-
 static int pp_set_power_limit(void *handle, uint32_t limit)
 {
 struct pp_hwmgr *hwmgr = handle;
@@ -1230,7 +1201,6 @@ static int pp_set_mmhub_powergating_by_smu(void *handle)
 .get_vce_clock_state = pp_dpm_get_vce_clock_state,
 .switch_power_profile = pp_dpm_switch_power_profile,
 .set_clockgating_by_smu = pp_set_clockgating_by_smu,
-   .notify_smu_memory_info = pp_dpm_notify_smu_memory_info,
 .get_power_profile_mode = pp_get_power_profile_mode,
 .set_power_profile_mode = pp_set_power_profile_mode,
 .odn_edit_dpm_table = pp_odn_edit_dpm_table,
--
1.9.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 1/2] drm/amdgpu: Reserved vram for smu to save debug info.

2018-04-13 Thread Quan, Evan
Do we need to add a check for adev->pm.smu_prv_buffer_size before calling
pp_reserve_vram_for_smu()? I mean, when adev->pm.smu_prv_buffer_size == 0
there is no need to proceed with pp_reserve_vram_for_smu().


Regards,

Evan
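
Something along these lines (sketch):

    if (adev->pm.smu_prv_buffer_size == 0)
            return; /* nothing reserved, skip pp_reserve_vram_for_smu() */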


From: amd-gfx on behalf of Rex Zhu

Sent: April 13, 2018 16:19:13
To: amd-gfx@lists.freedesktop.org
Cc: Zhu, Rex
Subject: [PATCH 1/2] drm/amdgpu: Reserved vram for smu to save debug info.

Signed-off-by: Rex Zhu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 44 ++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h       |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  6 ++++
 drivers/gpu/drm/amd/powerplay/amd_powerplay.c | 36 +++++++++++
 5 files changed, 89 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index c688361..5ef0bc8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -129,6 +129,7 @@
 extern int amdgpu_compute_multipipe;
 extern int amdgpu_gpu_recovery;
 extern int amdgpu_emu_mode;
+extern uint amdgpu_smu_memory_pool_size;

 #ifdef CONFIG_DRM_AMDGPU_SI
 extern int amdgpu_si_support;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1762eb4..bcbbd23 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -690,6 +690,8 @@ void amdgpu_device_gart_location(struct amdgpu_device *adev,
 {
 u64 size_af, size_bf;

+   mc->gart_size += adev->pm.smu_prv_buffer_size;
+
 size_af = adev->gmc.mc_mask - mc->vram_end;
 size_bf = mc->vram_start;
 if (size_bf > size_af) {
@@ -907,6 +909,46 @@ static void amdgpu_device_check_vm_size(struct 
amdgpu_device *adev)
 }
 }

+static void amdgpu_device_check_smu_prv_buffer_size(struct amdgpu_device *adev)
+{
+   struct sysinfo si;
+   bool is_os_64 = (sizeof(void *) == 8) ? true : false;
+   uint64_t total_memory;
+   uint64_t dram_size_seven_GB = 0x1B800;
+   uint64_t dram_size_three_GB = 0xB800;
+
+   if (amdgpu_smu_memory_pool_size == 0)
+   return;
+
+   if (!is_os_64) {
+   DRM_WARN("Not 64-bit OS, feature not supported\n");
+   goto def_value;
+   }
   si_meminfo(&si);
+   total_memory = (uint64_t)si.totalram * si.mem_unit;
+
+   if ((amdgpu_smu_memory_pool_size == 1) ||
+   (amdgpu_smu_memory_pool_size == 2)) {
+   if (total_memory < dram_size_three_GB)
+   goto def_value1;
+   } else if ((amdgpu_smu_memory_pool_size == 4) ||
+   (amdgpu_smu_memory_pool_size == 8)) {
+   if (total_memory < dram_size_seven_GB)
+   goto def_value1;
+   } else {
+   DRM_WARN("Smu memory pool size not supported\n");
+   goto def_value;
+   }
+   adev->pm.smu_prv_buffer_size = amdgpu_smu_memory_pool_size << 28;
+
+   return;
+
+def_value1:
+   DRM_WARN("No enough system memory\n");
+def_value:
+   adev->pm.smu_prv_buffer_size = 0;
+}
+
 /**
  * amdgpu_device_check_arguments - validate module params
  *
@@ -948,6 +990,8 @@ static void amdgpu_device_check_arguments(struct 
amdgpu_device *adev)
 amdgpu_vm_fragment_size = -1;
 }

+   amdgpu_device_check_smu_prv_buffer_size(adev);
+
 amdgpu_device_check_vm_size(adev);

 amdgpu_device_check_block_size(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h
index b8c5177..19d8bf5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h
@@ -445,6 +445,8 @@ struct amdgpu_pm {
 uint32_tpcie_gen_mask;
 uint32_tpcie_mlw_mask;
 struct amd_pp_display_configuration pm_display_cfg;/* set by dc */
+   uint32_tsmu_prv_buffer_size;
+   struct amdgpu_bo*smu_prv_buffer;
 };

 #define R600_SSTU_DFLT   0
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 0b19482..5c0567a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -132,6 +132,7 @@
 int amdgpu_compute_multipipe = -1;
 int amdgpu_gpu_recovery = -1; /* auto */
 int amdgpu_emu_mode = 0;
+uint amdgpu_smu_memory_pool_size = 0;

 MODULE_PARM_DESC(vramlimit, "Restrict VRAM for testing, in megabytes");
 module_param_named(vramlimit, amdgpu_vram_limit, int, 0600);
@@ -316,6 +317,11 @@
 module_param_named(cik_support, amdgpu_cik_support, int, 0444);
 #endif

+MODULE_PARM_DESC(smu_memory_pool_size,
+   "reserve gtt for smu debug usage, 0 = disable,"
+   "0x1 = 256Mbyte, 0x2 = 512Mbyte, 

[PATCH 1/2] drm/amdgpu: Reserved vram for smu to save debug info.

2018-04-13 Thread Rex Zhu
Signed-off-by: Rex Zhu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 44 ++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h       |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  6 ++++
 drivers/gpu/drm/amd/powerplay/amd_powerplay.c | 36 +++++++++++
 5 files changed, 89 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index c688361..5ef0bc8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -129,6 +129,7 @@
 extern int amdgpu_compute_multipipe;
 extern int amdgpu_gpu_recovery;
 extern int amdgpu_emu_mode;
+extern uint amdgpu_smu_memory_pool_size;
 
 #ifdef CONFIG_DRM_AMDGPU_SI
 extern int amdgpu_si_support;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1762eb4..bcbbd23 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -690,6 +690,8 @@ void amdgpu_device_gart_location(struct amdgpu_device *adev,
 {
u64 size_af, size_bf;
 
+   mc->gart_size += adev->pm.smu_prv_buffer_size;
+
size_af = adev->gmc.mc_mask - mc->vram_end;
size_bf = mc->vram_start;
if (size_bf > size_af) {
@@ -907,6 +909,46 @@ static void amdgpu_device_check_vm_size(struct 
amdgpu_device *adev)
}
 }
 
+static void amdgpu_device_check_smu_prv_buffer_size(struct amdgpu_device *adev)
+{
+   struct sysinfo si;
+   bool is_os_64 = (sizeof(void *) == 8) ? true : false;
+   uint64_t total_memory;
+   uint64_t dram_size_seven_GB = 0x1B8000000;
+   uint64_t dram_size_three_GB = 0xB8000000;
+
+   if (amdgpu_smu_memory_pool_size == 0)
+   return;
+
+   if (!is_os_64) {
+   DRM_WARN("Not 64-bit OS, feature not supported\n");
+   goto def_value;
+   }
+   si_meminfo(&si);
+   total_memory = (uint64_t)si.totalram * si.mem_unit;
+
+   if ((amdgpu_smu_memory_pool_size == 1) ||
+   (amdgpu_smu_memory_pool_size == 2)) {
+   if (total_memory < dram_size_three_GB)
+   goto def_value1;
+   } else if ((amdgpu_smu_memory_pool_size == 4) ||
+   (amdgpu_smu_memory_pool_size == 8)) {
+   if (total_memory < dram_size_seven_GB)
+   goto def_value1;
+   } else {
+   DRM_WARN("Smu memory pool size not supported\n");
+   goto def_value;
+   }
+   adev->pm.smu_prv_buffer_size = amdgpu_smu_memory_pool_size << 28;
+
+   return;
+
+def_value1:
+   DRM_WARN("No enough system memory\n");
+def_value:
+   adev->pm.smu_prv_buffer_size = 0;
+}
+
 /**
  * amdgpu_device_check_arguments - validate module params
  *
@@ -948,6 +990,8 @@ static void amdgpu_device_check_arguments(struct 
amdgpu_device *adev)
amdgpu_vm_fragment_size = -1;
}
 
+   amdgpu_device_check_smu_prv_buffer_size(adev);
+
amdgpu_device_check_vm_size(adev);
 
amdgpu_device_check_block_size(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h
index b8c5177..19d8bf5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h
@@ -445,6 +445,8 @@ struct amdgpu_pm {
uint32_tpcie_gen_mask;
uint32_tpcie_mlw_mask;
struct amd_pp_display_configuration pm_display_cfg;/* set by dc */
+   uint32_tsmu_prv_buffer_size;
+   struct amdgpu_bo*smu_prv_buffer;
 };
 
 #define R600_SSTU_DFLT   0
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 0b19482..5c0567a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -132,6 +132,7 @@
 int amdgpu_compute_multipipe = -1;
 int amdgpu_gpu_recovery = -1; /* auto */
 int amdgpu_emu_mode = 0;
+uint amdgpu_smu_memory_pool_size = 0;
 
 MODULE_PARM_DESC(vramlimit, "Restrict VRAM for testing, in megabytes");
 module_param_named(vramlimit, amdgpu_vram_limit, int, 0600);
@@ -316,6 +317,11 @@
 module_param_named(cik_support, amdgpu_cik_support, int, 0444);
 #endif
 
+MODULE_PARM_DESC(smu_memory_pool_size,
+   "reserve gtt for smu debug usage, 0 = disable,"
+   "0x1 = 256Mbyte, 0x2 = 512Mbyte, 0x4 = 1 Gbyte, 0x8 = 2GByte");
+module_param_named(smu_memory_pool_size, amdgpu_smu_memory_pool_size, uint, 
0444);
+
 static const struct pci_device_id pciidlist[] = {
 #ifdef  CONFIG_DRM_AMDGPU_SI
{0x1002, 0x6780, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_TAHITI},
diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c 
b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
index 66c49b8..0cdfa41 100644
--- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
+++ 
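For reference, the pool size encoding in amdgpu_device_check_smu_prv_buffer_size()
is a plain left shift by 28 bits; a quick sanity check of the mapping (my
arithmetic, not part of the patch):

	1 << 28 = 0x10000000 bytes = 256 MiB
	2 << 28 = 0x20000000 bytes = 512 MiB
	4 << 28 = 0x40000000 bytes =   1 GiB
	8 << 28 = 0x80000000 bytes =   2 GiB

So, for example, booting a machine with more than 3 GB of RAM and
amdgpu.smu_memory_pool_size=2 should grow the GART by 512 MiB for the SMU debug
pool (the kernel command line spelling is an assumption based on the
module_param_named() call above).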

[PATCH 2/2] drm/amd/pp: Remove dead interface

2018-04-13 Thread Rex Zhu
Signed-off-by: Rex Zhu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h|  6 --
 drivers/gpu/drm/amd/include/kgd_pp_interface.h |  5 -
 drivers/gpu/drm/amd/powerplay/amd_powerplay.c  | 29 --
 3 files changed, 40 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h
index 19d8bf5..354c6dc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h
@@ -349,12 +349,6 @@ enum amdgpu_pcie_gen {
((adev)->powerplay.pp_funcs->set_clockgating_by_smu(\
(adev)->powerplay.pp_handle, msg_id))
 
-#define amdgpu_dpm_notify_smu_memory_info(adev, virtual_addr_low, \
-   virtual_addr_hi, mc_addr_low, mc_addr_hi, size) \
-   ((adev)->powerplay.pp_funcs->notify_smu_memory_info)( \
-   (adev)->powerplay.pp_handle, virtual_addr_low, \
-   virtual_addr_hi, mc_addr_low, mc_addr_hi, size)
-
 #define amdgpu_dpm_get_power_profile_mode(adev, buf) \
((adev)->powerplay.pp_funcs->get_power_profile_mode(\
(adev)->powerplay.pp_handle, buf))
diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h 
b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
index 1bec907..01969b1 100644
--- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
@@ -239,11 +239,6 @@ struct amd_pm_funcs {
int (*load_firmware)(void *handle);
int (*wait_for_fw_loading_complete)(void *handle);
int (*set_clockgating_by_smu)(void *handle, uint32_t msg_id);
-   int (*notify_smu_memory_info)(void *handle, uint32_t virtual_addr_low,
-   uint32_t virtual_addr_hi,
-   uint32_t mc_addr_low,
-   uint32_t mc_addr_hi,
-   uint32_t size);
int (*set_power_limit)(void *handle, uint32_t n);
int (*get_power_limit)(void *handle, uint32_t *limit, bool 
default_limit);
 /* export to DC */
diff --git a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c 
b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
index 0cdfa41..6dd06d7 100644
--- a/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/powerplay/amd_powerplay.c
@@ -935,35 +935,6 @@ static int pp_dpm_switch_power_profile(void *handle,
return 0;
 }
 
-static int pp_dpm_notify_smu_memory_info(void *handle,
-   uint32_t virtual_addr_low,
-   uint32_t virtual_addr_hi,
-   uint32_t mc_addr_low,
-   uint32_t mc_addr_hi,
-   uint32_t size)
-{
-   struct pp_hwmgr *hwmgr = handle;
-   int ret = 0;
-
-   if (!hwmgr || !hwmgr->pm_en)
-   return -EINVAL;
-
-   if (hwmgr->hwmgr_func->notify_cac_buffer_info == NULL) {
-   pr_info("%s was not implemented.\n", __func__);
-   return -EINVAL;
-   }
-
-   mutex_lock(&hwmgr->smu_lock);
-
-   ret = hwmgr->hwmgr_func->notify_cac_buffer_info(hwmgr, virtual_addr_low,
-   virtual_addr_hi, mc_addr_low, mc_addr_hi,
-   size);
-
-   mutex_unlock(&hwmgr->smu_lock);
-
-   return ret;
-}
-
 static int pp_set_power_limit(void *handle, uint32_t limit)
 {
struct pp_hwmgr *hwmgr = handle;
-- 
1.9.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: defer test IBs on the rings at boot (V2)

2018-04-13 Thread Shirish S
amdgpu_ib_ring_tests() runs test IBs on the rings at boot and
contributes ~500 ms to the amdgpu driver's boot time.

This patch defers it and ensures that it is executed
in amdgpu_info_ioctl() if it wasn't scheduled yet.

V2: Use queue_delayed_work() & flush_delayed_work().

Signed-off-by: Shirish S 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 20 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c|  4 
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 5734871..ae8f722 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1611,6 +1611,8 @@ struct amdgpu_device {
 
/* delayed work_func for deferring clockgating during resume */
struct delayed_work late_init_work;
+   /* delayed work_func to defer testing IB's on rings during boot */
+   struct delayed_work late_init_test_ib_work;
 
struct amdgpu_virt  virt;
/* firmware VRAM reservation */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1762eb4..ee84058 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -63,6 +63,7 @@ MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
 MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
 
 #define AMDGPU_RESUME_MS   2000
+#define AMDGPU_IB_TEST_SCHED_MS2000
 
 static const char *amdgpu_asic_name[] = {
"TAHITI",
@@ -2105,6 +2106,16 @@ bool amdgpu_device_asic_has_dc_support(enum 
amd_asic_type asic_type)
}
 }
 
+static void amdgpu_device_late_init_test_ib_func_handler(struct work_struct 
*work)
+{
+   struct amdgpu_device *adev =
+   container_of(work, struct amdgpu_device, 
late_init_test_ib_work.work);
+   int r = amdgpu_ib_ring_tests(adev);
+
+   if (r)
+   DRM_ERROR("ib ring test failed (%d).\n", r);
+}
+
 /**
  * amdgpu_device_has_dc_support - check if dc is supported
  *
@@ -2212,6 +2223,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
	INIT_LIST_HEAD(&adev->ring_lru_list);
	spin_lock_init(&adev->ring_lru_list_lock);
 
+   INIT_DELAYED_WORK(&adev->late_init_test_ib_work,
+ amdgpu_device_late_init_test_ib_func_handler);
	INIT_DELAYED_WORK(&adev->late_init_work,
	  amdgpu_device_ip_late_init_func_handler);
 
@@ -2374,9 +2387,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
goto failed;
}
 
-   r = amdgpu_ib_ring_tests(adev);
-   if (r)
-   DRM_ERROR("ib ring test failed (%d).\n", r);
+   /* Schedule amdgpu_ib_ring_tests() */
+   queue_delayed_work(system_wq, &adev->late_init_test_ib_work,
+   msecs_to_jiffies(AMDGPU_IB_TEST_SCHED_MS));
 
if (amdgpu_sriov_vf(adev))
amdgpu_virt_init_data_exchange(adev);
@@ -2469,6 +2482,7 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
}
adev->accel_working = false;
	cancel_delayed_work_sync(&adev->late_init_work);
+   cancel_delayed_work_sync(&adev->late_init_test_ib_work);
/* free i2c buses */
if (!amdgpu_device_has_dc_support(adev))
amdgpu_i2c_fini(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 487d39e..6fa326b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -279,6 +279,10 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void 
*data, struct drm_file
if (!info->return_size || !info->return_pointer)
return -EINVAL;
 
+   /* Ensure IB tests on ring are executed */
+   if (delayed_work_pending(&adev->late_init_test_ib_work))
+   flush_delayed_work(&adev->late_init_test_ib_work);
+
switch (info->query) {
case AMDGPU_INFO_ACCEL_WORKING:
ui32 = adev->accel_working;
-- 
2.7.4
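
The ioctl-side hunk above is the usual defer-then-flush pattern: anything that
depends on the deferred step forces it to run and waits before proceeding. A
generic sketch of the pattern (illustrative only, not the exact driver code):

	static void slow_init_fn(struct work_struct *work)
	{
		/* expensive one-time initialisation */
	}
	static DECLARE_DELAYED_WORK(slow_init_work, slow_init_fn);

	static int some_ioctl(void)
	{
		/* if the deferred init hasn't run yet, run it now and wait */
		if (delayed_work_pending(&slow_init_work))
			flush_delayed_work(&slow_init_work);
		/* safe to rely on the deferred init from here on */
		return 0;
	}

flush_delayed_work() also covers the case where the handler is already running:
it waits for it to complete rather than starting a second instance.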

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: defer test IBs on the rings at boot

2018-04-13 Thread S, Shirish



On 4/13/2018 12:38 PM, Christian König wrote:

On 13.04.2018 at 09:01, S, Shirish wrote:



On 4/13/2018 11:53 AM, Christian König wrote:

On 13.04.2018 at 06:07, Shirish S wrote:

amdgpu_ib_ring_tests() runs test IB's on rings at boot
contributes to ~500 ms of amdgpu driver's boot time.

This patch defers it and adds a check to report
in amdgpu_info_ioctl() if it was scheduled or not.


That is rather suboptimal, but see below.

Which part is sub-optimal, deferring or checking if the work is 
scheduled?


That was about the check. We should wait for the test to finish 
instead of printing an error and continuing.



Done. Have made this change in V2.


Signed-off-by: Shirish S 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h    |  2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 20 +---
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    |  3 +++
  3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h

index 5734871..ae8f722 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1611,6 +1611,8 @@ struct amdgpu_device {
    /* delayed work_func for deferring clockgating during 
resume */

  struct delayed_work late_init_work;
+    /* delayed work_func to defer testing IB's on rings during 
boot */

+    struct delayed_work late_init_test_ib_work;


You must put the IB test into the late_init_work as well, otherwise 
the two delayed workers can race with each other.


I thought that from the comment above the declaration, it's clear why I am 
creating 2 work structures.
late_init_work is to optimize resume time and late_init_test_ib_work 
is to optimize the boot time.
There can't be a race, as the contexts in which they are called are 
totally different.


Late init enables power and clock gating. If I'm not completely 
mistaken we don't do the power/clock gating earlier because we had to 
wait for the IB test to finish.


Could be that modern ASICs have additional logic to prevent that, but 
the last time I worked on this, power gating a block while you run 
something on it could even crash the whole system.



    struct amdgpu_virt    virt;
  /* firmware VRAM reservation */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 1762eb4..e65a5e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -63,6 +63,7 @@ MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
  MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
    #define AMDGPU_RESUME_MS    2000
+#define AMDGPU_IB_TEST_SCHED_MS    2000
    static const char *amdgpu_asic_name[] = {
  "TAHITI",
@@ -2105,6 +2106,16 @@ bool amdgpu_device_asic_has_dc_support(enum 
amd_asic_type asic_type)

  }
  }
  +static void amdgpu_device_late_init_test_ib_func_handler(struct 
work_struct *work)

+{
+    struct amdgpu_device *adev =
+    container_of(work, struct amdgpu_device, 
late_init_test_ib_work.work);

+    int r = amdgpu_ib_ring_tests(adev);
+
+    if (r)
+    DRM_ERROR("ib ring test failed (%d).\n", r);
+}
+
  /**
   * amdgpu_device_has_dc_support - check if dc is supported
   *
@@ -2212,6 +2223,8 @@ int amdgpu_device_init(struct amdgpu_device 
*adev,

  INIT_LIST_HEAD(&adev->ring_lru_list);
  spin_lock_init(&adev->ring_lru_list_lock);
  + INIT_DELAYED_WORK(&adev->late_init_test_ib_work,
+ amdgpu_device_late_init_test_ib_func_handler);
  INIT_DELAYED_WORK(&adev->late_init_work,
    amdgpu_device_ip_late_init_func_handler);
  @@ -2374,9 +2387,9 @@ int amdgpu_device_init(struct amdgpu_device 
*adev,

  goto failed;
  }
  -    r = amdgpu_ib_ring_tests(adev);
-    if (r)
-    DRM_ERROR("ib ring test failed (%d).\n", r);
+    /* Schedule amdgpu_ib_ring_tests() */
+    mod_delayed_work(system_wq, &adev->late_init_test_ib_work,
+    msecs_to_jiffies(AMDGPU_IB_TEST_SCHED_MS));


That doesn't work like you intended. mod_delayed_work() overrides 
the existing handler.


What you wanted to use is queue_delayed_work(), but as I said we 
should only have one delayed worker.
mod_delayed_work() is a safer and more optimal method that replaces 
cancel_delayed_work() followed by queue_delayed_work().

(https://lkml.org/lkml/2011/2/3/175)
But if you strongly insist i don't mind changing it.


Well, mod_delayed_work() does NOT replace queue_delayed_work(). Those 
two functions are for different use cases.


The link you posted actually explains it quite well:

So, cancel_delayed_work() followed by queue_delayed_work() schedules
the work to be executed at the specified time regardless of the
current pending state while queue_delayed_work() takes effect iff
currently the work item is not pending.


queue_delayed_work() takes only effect if the work item is not already 
pending/executing.


In other words with queue_delayed_work() you don't risk executing 
things twice when the work is already executing.

Re: [PATCH] drm/radeon: change function signature to pass full range

2018-04-13 Thread Huang Rui
On Thu, Apr 12, 2018 at 09:33:33PM +0200, Mathieu Malaterre wrote:
> In function ‘radeon_process_i2c_ch’ a comparison of a u8 value against
> 255 is done. Since it is always false, change the signature of this
> function to use an `int` instead, which match the type used in caller:
> `radeon_atom_hw_i2c_xfer`.
> 
> Fix the following warning triggered with W=1:
> 
>   CC [M]  drivers/gpu/drm/radeon/atombios_i2c.o
>   drivers/gpu/drm/radeon/atombios_i2c.c: In function ‘radeon_process_i2c_ch’:
>   drivers/gpu/drm/radeon/atombios_i2c.c:71:11: warning: comparison is always 
> false due to limited range of data type [-Wtype-limits]
>if (num > ATOM_MAX_HW_I2C_READ) {
>^
> 
> Signed-off-by: Mathieu Malaterre 

Reviewed-by: Huang Rui 

> ---
>  drivers/gpu/drm/radeon/atombios_i2c.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/radeon/atombios_i2c.c 
> b/drivers/gpu/drm/radeon/atombios_i2c.c
> index 4157780585a0..9022e9af11a0 100644
> --- a/drivers/gpu/drm/radeon/atombios_i2c.c
> +++ b/drivers/gpu/drm/radeon/atombios_i2c.c
> @@ -35,7 +35,7 @@
>  
>  static int radeon_process_i2c_ch(struct radeon_i2c_chan *chan,
>u8 slave_addr, u8 flags,
> -  u8 *buf, u8 num)
> +  u8 *buf, int num)
>  {
>   struct drm_device *dev = chan->dev;
>   struct radeon_device *rdev = dev->dev_private;
> -- 
> 2.11.0
> 
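For reference, a minimal standalone reproducer of the -Wtype-limits warning being
fixed here (my own example, assuming ATOM_MAX_HW_I2C_READ is 255 as the warning
text implies):

	/* gcc -Wtype-limits -c example.c */
	#define ATOM_MAX_HW_I2C_READ 255

	static int check(unsigned char num) /* u8 in kernel terms */
	{
		/* num can never exceed 255, so this branch is dead code
		 * and gcc warns that the comparison is always false */
		if (num > ATOM_MAX_HW_I2C_READ)
			return -1;
		return 0;
	}

Widening the parameter to int, as the patch does, makes the range check
meaningful again.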
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amd/pp: Move common code to smu_help.c

2018-04-13 Thread Huang Rui
On Fri, Apr 13, 2018 at 02:08:15PM +0800, Rex Zhu wrote:
> Signed-off-by: Rex Zhu 

Reviewed-by: Huang Rui 

> ---
>  drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c   | 56 
> ++
>  drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.h   | 21 
>  drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c | 43 +
>  3 files changed, 78 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c 
> b/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c
> index e6178b0..7c23741 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c
> @@ -650,3 +650,59 @@ int smu_get_voltage_dependency_table_ppt_v1(
>  
>   return 0;
>  }
> +
> +int smu_set_watermarks_for_clocks_ranges(void *wt_table,
> + struct pp_wm_sets_with_clock_ranges_soc15 *wm_with_clock_ranges)
> +{
> + uint32_t i;
> + struct watermarks *table = wt_table;
> +
> + if (!table || !wm_with_clock_ranges)
> + return -EINVAL;
> +
> + if (wm_with_clock_ranges->num_wm_sets_dmif > 4 || 
> wm_with_clock_ranges->num_wm_sets_mcif > 4)
> + return -EINVAL;
> +
> + for (i = 0; i < wm_with_clock_ranges->num_wm_sets_dmif; i++) {
> + table->WatermarkRow[1][i].MinClock =
> + cpu_to_le16((uint16_t)
> + 
> (wm_with_clock_ranges->wm_sets_dmif[i].wm_min_dcefclk_in_khz) /
> + 100);
> + table->WatermarkRow[1][i].MaxClock =
> + cpu_to_le16((uint16_t)
> + 
> (wm_with_clock_ranges->wm_sets_dmif[i].wm_max_dcefclk_in_khz) /
> + 100);
> + table->WatermarkRow[1][i].MinUclk =
> + cpu_to_le16((uint16_t)
> + 
> (wm_with_clock_ranges->wm_sets_dmif[i].wm_min_memclk_in_khz) /
> + 100);
> + table->WatermarkRow[1][i].MaxUclk =
> + cpu_to_le16((uint16_t)
> + 
> (wm_with_clock_ranges->wm_sets_dmif[i].wm_max_memclk_in_khz) /
> + 100);
> + table->WatermarkRow[1][i].WmSetting = (uint8_t)
> + wm_with_clock_ranges->wm_sets_dmif[i].wm_set_id;
> + }
> +
> + for (i = 0; i < wm_with_clock_ranges->num_wm_sets_mcif; i++) {
> + table->WatermarkRow[0][i].MinClock =
> + cpu_to_le16((uint16_t)
> + 
> (wm_with_clock_ranges->wm_sets_mcif[i].wm_min_socclk_in_khz) /
> + 100);
> + table->WatermarkRow[0][i].MaxClock =
> + cpu_to_le16((uint16_t)
> + 
> (wm_with_clock_ranges->wm_sets_mcif[i].wm_max_socclk_in_khz) /
> + 100);
> + table->WatermarkRow[0][i].MinUclk =
> + cpu_to_le16((uint16_t)
> + 
> (wm_with_clock_ranges->wm_sets_mcif[i].wm_min_memclk_in_khz) /
> + 100);
> + table->WatermarkRow[0][i].MaxUclk =
> + cpu_to_le16((uint16_t)
> + 
> (wm_with_clock_ranges->wm_sets_mcif[i].wm_max_memclk_in_khz) /
> + 100);
> + table->WatermarkRow[0][i].WmSetting = (uint8_t)
> + wm_with_clock_ranges->wm_sets_mcif[i].wm_set_id;
> + }
> + return 0;
> +}
> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.h 
> b/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.h
> index 2243e29..916cc01 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.h
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.h
> @@ -26,10 +26,27 @@
>  struct pp_atomctrl_voltage_table;
>  struct pp_hwmgr;
>  struct phm_ppt_v1_voltage_lookup_table;
> +struct Watermarks_t;
> +struct pp_wm_sets_with_clock_ranges_soc15;
>  
>  uint8_t convert_to_vid(uint16_t vddc);
>  uint16_t convert_to_vddc(uint8_t vid);
>  
> +struct watermark_row_generic_t {
> + uint16_t MinClock;
> + uint16_t MaxClock;
> + uint16_t MinUclk;
> + uint16_t MaxUclk;
> +
> + uint8_t  WmSetting;
> + uint8_t  Padding[3];
> +};
> +
> +struct watermarks {
> + struct watermark_row_generic_t WatermarkRow[2][4];
> + uint32_t padding[7];
> +};
> +
>  extern int phm_wait_for_register_unequal(struct pp_hwmgr *hwmgr,
>   uint32_t index,
>   uint32_t value, uint32_t mask);
> @@ -88,6 +105,10 @@ void *smu_atom_get_data_table(void *dev, uint32_t table, 
> uint16_t *size,
>  int smu_get_voltage_dependency_table_ppt_v1(
>   const struct phm_ppt_v1_clock_voltage_dependency_table 
> *allowed_dep_table,
>   struct phm_ppt_v1_clock_voltage_dependency_table *dep_table);
> +
> +int smu_set_watermarks_for_clocks_ranges(void *wt_table,
> + struct pp_wm_sets_with_clock_ranges_soc15 
> 

Re: [PATCH 2/2] Revert "drm/amd/display: disable CRTCs with NULL FB on their primary plane (V2)"

2018-04-13 Thread Michel Dänzer
On 2018-04-13 04:55 AM, S, Shirish wrote:
> Hi Harry, Alex,
> 
> The solution given while reviewing my patch was that "DC should support 
> enabling a CRTC without a framebuffer."

I think it's weird that an enabled CRTC would end up having a NULL FB
during normal operation in the first place. Any idea how that happens?


> Since the revert is a temporary workaround to address the issue at hand, and 
> considering the bigger regression it will cause on ChromeOS (explained below),
> I would strongly recommend that the revert not be mainlined (to Linus' 
> tree) until a proper fix for both issues, i.e. the flickering and the BUG hit on 
> atomic commit, is found.
> 
> For the sake of everyone's understanding in the list below is a brief 
> background.
> 
> Mainline patch from intel folks, "846c7df drm/atomic: Try to preserve the 
> crtc enabled state in drm_atomic_remove_fb, v2." 
> introduces a slight behavioral change to rmfb. Instead of disabling a crtc 
> when the primary plane is disabled, it now preserves it.
> 
> This change leads to a BUG being hit while performing an atomic commit in the 
> amd driver, leading to reboot/system instability on ChromeOS, which has enabled
> the drm atomic way of rendering. I also remember it causing some issue on other 
> OSes as well.

Can you provide more information about "some issue on other OS"?
Otherwise I'm afraid this is a no-brainer, since nobody's using ChromeOS
with an AMD GPU in the wild, whereas many people have been running into
issues with these commits in the wild, especially since they're in the
4.16 release.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: defer test IBs on the rings at boot

2018-04-13 Thread Christian König

On 13.04.2018 at 09:01, S, Shirish wrote:



On 4/13/2018 11:53 AM, Christian König wrote:

On 13.04.2018 at 06:07, Shirish S wrote:

amdgpu_ib_ring_tests() runs test IB's on rings at boot
contributes to ~500 ms of amdgpu driver's boot time.

This patch defers it and adds a check to report
in amdgpu_info_ioctl() if it was scheduled or not.


That is rather suboptimal, but see below.

Which part is sub-optimal, deferring or checking if the work is 
scheduled?


That was about the check. We should wait for the test to finish instead 
of printing an error and continuing.




Signed-off-by: Shirish S 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h    |  2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 20 +---
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    |  3 +++
  3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h

index 5734871..ae8f722 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1611,6 +1611,8 @@ struct amdgpu_device {
    /* delayed work_func for deferring clockgating during resume */
  struct delayed_work late_init_work;
+    /* delayed work_func to defer testing IB's on rings during boot */
+    struct delayed_work late_init_test_ib_work;


You must put the IB test into the late_init_work as well, otherwise 
the two delayed workers can race with each other.


I thought that from the comment above the declaration, it's clear why I am 
creating 2 work structures.
late_init_work is to optimize resume time and late_init_test_ib_work 
is to optimize the boot time.
There can't be a race, as the contexts in which they are called are totally 
different.


Late init enables power and clock gating. If I'm not completely mistaken 
we don't do the power/clock gating earlier because we had to wait for 
the IB test to finish.


Could be that modern ASICs have additional logic to prevent that, but 
the last time I worked on this, power gating a block while you run 
something on it could even crash the whole system.



    struct amdgpu_virt    virt;
  /* firmware VRAM reservation */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 1762eb4..e65a5e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -63,6 +63,7 @@ MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
  MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
    #define AMDGPU_RESUME_MS    2000
+#define AMDGPU_IB_TEST_SCHED_MS    2000
    static const char *amdgpu_asic_name[] = {
  "TAHITI",
@@ -2105,6 +2106,16 @@ bool amdgpu_device_asic_has_dc_support(enum 
amd_asic_type asic_type)

  }
  }
  +static void amdgpu_device_late_init_test_ib_func_handler(struct 
work_struct *work)

+{
+    struct amdgpu_device *adev =
+    container_of(work, struct amdgpu_device, 
late_init_test_ib_work.work);

+    int r = amdgpu_ib_ring_tests(adev);
+
+    if (r)
+    DRM_ERROR("ib ring test failed (%d).\n", r);
+}
+
  /**
   * amdgpu_device_has_dc_support - check if dc is supported
   *
@@ -2212,6 +2223,8 @@ int amdgpu_device_init(struct amdgpu_device 
*adev,

  INIT_LIST_HEAD(&adev->ring_lru_list);
  spin_lock_init(&adev->ring_lru_list_lock);
  +    INIT_DELAYED_WORK(&adev->late_init_test_ib_work,
+  amdgpu_device_late_init_test_ib_func_handler);
  INIT_DELAYED_WORK(&adev->late_init_work,
    amdgpu_device_ip_late_init_func_handler);
  @@ -2374,9 +2387,9 @@ int amdgpu_device_init(struct amdgpu_device 
*adev,

  goto failed;
  }
  -    r = amdgpu_ib_ring_tests(adev);
-    if (r)
-    DRM_ERROR("ib ring test failed (%d).\n", r);
+    /* Schedule amdgpu_ib_ring_tests() */
+    mod_delayed_work(system_wq, &adev->late_init_test_ib_work,
+    msecs_to_jiffies(AMDGPU_IB_TEST_SCHED_MS));


That doesn't work like you intended. mod_delayed_work() overrides the 
existing handler.


What you wanted to use is queue_delayed_work(), but as I said we 
should only have one delayed worker.
mod_delayed_work() is a safer and more optimal method that replaces 
cancel_delayed_work() followed by queue_delayed_work().

(https://lkml.org/lkml/2011/2/3/175)
But if you strongly insist i don't mind changing it.


Well, mod_delayed_work() does NOT replace queue_delayed_work(). Those 
two functions are for different use cases.


The link you posted actually explains it quite well:

So, cancel_delayed_work() followed by queue_delayed_work() schedules
the work to be executed at the specified time regardless of the
current pending state while queue_delayed_work() takes effect iff
currently the work item is not pending.


queue_delayed_work() takes only effect if the work item is not already 
pending/executing.


In other words with queue_delayed_work() you don't risk executing things 
twice when the work is already executing.


While with mod_delayed_work() you absolutely 
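
A minimal sketch of the two behaviours side by side (illustrative only, not the
driver code):

	static void ib_test_fn(struct work_struct *work) { /* ... */ }
	static DECLARE_DELAYED_WORK(ib_test_work, ib_test_fn);

	static void arm_once(void)
	{
		/* no-op if ib_test_work is already pending: the first expiry wins */
		queue_delayed_work(system_wq, &ib_test_work,
				   msecs_to_jiffies(2000));
	}

	static void rearm(void)
	{
		/* always (re)arms the timer to now + 2s, pending or not */
		mod_delayed_work(system_wq, &ib_test_work,
				 msecs_to_jiffies(2000));
	}

For a one-shot boot-time test that must never be re-armed, the
queue_delayed_work() semantics are the ones wanted here.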

[PATCH] drm/gpu-sched: fix force APP kill hang(v4)

2018-04-13 Thread Emily Deng
issue:
VMC page faults occur if an app is force-killed during a
3dmark test; the cause is that in entity_fini we manually signal
all the jobs in the entity's queue, which confuses the sync/dep
mechanism:

1)a page fault occurred in sdma's clear job, which operates on the
shadow buffer; the shadow buffer's GART table is cleaned by
ttm_bo_release, since the fence in its reservation was fake-signaled
by entity_fini() when SIGKILL was received.

2)a page fault occurred in a gfx job because, during the lifetime
of the gfx job, we manually fake-signal all jobs from its entity
in entity_fini(); thus the unmapping/clear-PTE jobs that depend on those
result fences are satisfied, and sdma starts clearing the PTEs, leading
to the GFX page fault.

fix:
1)we should at least wait for all already-scheduled jobs to complete in
entity_fini() if SIGKILL is the case.

2)if a fence is signaled and tries to clear some entity's dependency, we should
set this entity guilty to prevent its job from really running, since the
dependency was fake-signaled.

v2:
splitting drm_sched_entity_fini() into two functions:
1)The first one does the waiting, removes the entity from the
runqueue and returns an error when the process was killed.
2)The second one then goes over the entity, install it as
completion signal for the remaining jobs and signals all jobs
with an error code.

v3:
1)Replace the fini1 and fini2 with better name
2)Call the first part before the VM teardown in
amdgpu_driver_postclose_kms() and the second part
after the VM teardown
3)Keep the original function drm_sched_entity_fini to
refine the code.

v4:
1)Rename entity->finished to entity->last_scheduled;
2)Rename drm_sched_entity_fini_job_cb() to
drm_sched_entity_kill_jobs_cb();
3)Pass NULL to drm_sched_entity_fini_job_cb() if -ENOENT;
4)Replace the type of entity->fini_status with "int";
5)Remove the check about entity->finished.

Signed-off-by: Monk Liu 
Signed-off-by: Emily Deng 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   | 64 
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  5 ++-
 drivers/gpu/drm/scheduler/gpu_scheduler.c | 71 ++-
 include/drm/gpu_scheduler.h   |  7 +++
 5 files changed, 128 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 5734871..b3d047d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -681,6 +681,8 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
 int amdgpu_ctx_wait_prev_fence(struct amdgpu_ctx *ctx, unsigned ring_id);
 
 void amdgpu_ctx_mgr_init(struct amdgpu_ctx_mgr *mgr);
+void amdgpu_ctx_mgr_entity_cleanup(struct amdgpu_ctx_mgr *mgr);
+void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr);
 void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr *mgr);
 
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 09d35051..eb80edf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -111,8 +111,9 @@ static int amdgpu_ctx_init(struct amdgpu_device *adev,
return r;
 }
 
-static void amdgpu_ctx_fini(struct amdgpu_ctx *ctx)
+static void amdgpu_ctx_fini(struct kref *ref)
 {
+   struct amdgpu_ctx *ctx = container_of(ref, struct amdgpu_ctx, refcount);
struct amdgpu_device *adev = ctx->adev;
unsigned i, j;
 
@@ -125,13 +126,11 @@ static void amdgpu_ctx_fini(struct amdgpu_ctx *ctx)
kfree(ctx->fences);
ctx->fences = NULL;
 
-   for (i = 0; i < adev->num_rings; i++)
-   drm_sched_entity_fini(&adev->rings[i]->sched,
- &ctx->rings[i].entity);
-
	amdgpu_queue_mgr_fini(adev, &ctx->queue_mgr);
 
	mutex_destroy(&ctx->lock);
+
+   kfree(ctx);
 }
 
 static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
@@ -170,12 +169,15 @@ static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
 static void amdgpu_ctx_do_release(struct kref *ref)
 {
struct amdgpu_ctx *ctx;
+   u32 i;
 
ctx = container_of(ref, struct amdgpu_ctx, refcount);
 
-   amdgpu_ctx_fini(ctx);
+   for (i = 0; i < ctx->adev->num_rings; i++)
+   drm_sched_entity_fini(&ctx->adev->rings[i]->sched,
+   &ctx->rings[i].entity);
 
-   kfree(ctx);
+   amdgpu_ctx_fini(ref);
 }
 
 static int amdgpu_ctx_free(struct amdgpu_fpriv *fpriv, uint32_t id)
@@ -435,16 +437,62 @@ void amdgpu_ctx_mgr_init(struct amdgpu_ctx_mgr *mgr)
	idr_init(&mgr->ctx_handles);
 }
 
+void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
+{
+   struct amdgpu_ctx *ctx;
+   struct idr *idp;
+   uint32_t id, i;
+
+   idp = &mgr->ctx_handles;
+
+   idr_for_each_entry(idp, ctx, id) {
+
+   if (!ctx->adev)
+   return;
+
+   for (i = 0; i < ctx->adev->num_rings; i++)
+   if (kref_read(&ctx->refcount) == 
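
Based on the v3 changelog above ("Call the first part before the VM teardown ...
and the second part after the VM teardown"), the intended ordering in
amdgpu_driver_postclose_kms() is sketched below. Mapping
amdgpu_ctx_mgr_entity_fini() to the waiting half and
amdgpu_ctx_mgr_entity_cleanup() to the signalling half is my reading of the
header hunk; the kms hunk itself is not shown in this excerpt:

	/* part 1: wait for already-scheduled jobs, remove entities from the rq */
	amdgpu_ctx_mgr_entity_fini(&fpriv->ctx_mgr);

	/* VM teardown in between: waiting first ensures no scheduled job
	 * still dereferences the VM being torn down */
	amdgpu_vm_fini(adev, &fpriv->vm);

	/* part 2: signal the remaining, never-scheduled jobs with an error */
	amdgpu_ctx_mgr_entity_cleanup(&fpriv->ctx_mgr);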

Re: [PATCH] drm/amdgpu: defer test IBs on the rings at boot

2018-04-13 Thread S, Shirish



On 4/13/2018 11:53 AM, Christian König wrote:

On 13.04.2018 at 06:07, Shirish S wrote:

amdgpu_ib_ring_tests() runs test IB's on rings at boot
contributes to ~500 ms of amdgpu driver's boot time.

This patch defers it and adds a check to report
in amdgpu_info_ioctl() if it was scheduled or not.


That is rather suboptimal, but see below.


Which part is sub-optimal, deferring or checking if the work is scheduled?


Signed-off-by: Shirish S 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h    |  2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 20 +---
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    |  3 +++
  3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h

index 5734871..ae8f722 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1611,6 +1611,8 @@ struct amdgpu_device {
    /* delayed work_func for deferring clockgating during resume */
  struct delayed_work late_init_work;
+    /* delayed work_func to defer testing IB's on rings during boot */
+    struct delayed_work late_init_test_ib_work;


You must put the IB test into the late_init_work as well, otherwise 
the two delayed workers can race with each other.


I thought that from the comment above the declaration, it's clear why I am 
creating 2 work structures.
late_init_work is to optimize resume time and late_init_test_ib_work is 
to optimize the boot time.
There can't be a race, as the contexts in which they are called are totally 
different.

    struct amdgpu_virt    virt;
  /* firmware VRAM reservation */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 1762eb4..e65a5e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -63,6 +63,7 @@ MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
  MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
    #define AMDGPU_RESUME_MS    2000
+#define AMDGPU_IB_TEST_SCHED_MS    2000
    static const char *amdgpu_asic_name[] = {
  "TAHITI",
@@ -2105,6 +2106,16 @@ bool amdgpu_device_asic_has_dc_support(enum 
amd_asic_type asic_type)

  }
  }
  +static void amdgpu_device_late_init_test_ib_func_handler(struct 
work_struct *work)

+{
+    struct amdgpu_device *adev =
+    container_of(work, struct amdgpu_device, 
late_init_test_ib_work.work);

+    int r = amdgpu_ib_ring_tests(adev);
+
+    if (r)
+    DRM_ERROR("ib ring test failed (%d).\n", r);
+}
+
  /**
   * amdgpu_device_has_dc_support - check if dc is supported
   *
@@ -2212,6 +2223,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
  INIT_LIST_HEAD(&adev->ring_lru_list);
  spin_lock_init(&adev->ring_lru_list_lock);
  +    INIT_DELAYED_WORK(&adev->late_init_test_ib_work,
+  amdgpu_device_late_init_test_ib_func_handler);
  INIT_DELAYED_WORK(&adev->late_init_work,
    amdgpu_device_ip_late_init_func_handler);
  @@ -2374,9 +2387,9 @@ int amdgpu_device_init(struct amdgpu_device 
*adev,

  goto failed;
  }
  -    r = amdgpu_ib_ring_tests(adev);
-    if (r)
-    DRM_ERROR("ib ring test failed (%d).\n", r);
+    /* Schedule amdgpu_ib_ring_tests() */
+    mod_delayed_work(system_wq, &adev->late_init_test_ib_work,
+    msecs_to_jiffies(AMDGPU_IB_TEST_SCHED_MS));


That doesn't work like you intended. mod_delayed_work() overrides the 
existing handler.


What you wanted to use is queue_delayed_work(), but as I said we 
should only have one delayed worker.
mod_delayed_work() is a safer and more optimal method that replaces 
cancel_delayed_work() followed by queue_delayed_work().

(https://lkml.org/lkml/2011/2/3/175)
But if you strongly insist i don't mind changing it.



    if (amdgpu_sriov_vf(adev))
  amdgpu_virt_init_data_exchange(adev);
@@ -2469,6 +2482,7 @@ void amdgpu_device_fini(struct amdgpu_device 
*adev)

  }
  adev->accel_working = false;
  cancel_delayed_work_sync(&adev->late_init_work);
+ cancel_delayed_work_sync(&adev->late_init_test_ib_work);
  /* free i2c buses */
  if (!amdgpu_device_has_dc_support(adev))
  amdgpu_i2c_fini(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c

index 487d39e..057bd9a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -279,6 +279,9 @@ static int amdgpu_info_ioctl(struct drm_device 
*dev, void *data, struct drm_file

  if (!info->return_size || !info->return_pointer)
  return -EINVAL;
+    if (delayed_work_pending(&adev->late_init_test_ib_work))
+    DRM_ERROR("IB test on ring not executed\n");
+


Please use flush_delayed_work() instead of issuing and error here.


Agree, wasn't sure of what to do here :).
So I will re-spin with the flush part added. Hope this reply clarifies 
your comments.

Thanks.
Regards,
Shirish S

Regards,
Christian.


  

[PATCH] drm/amd/pp: Move common code to smu_help.c

2018-04-13 Thread Rex Zhu
Signed-off-by: Rex Zhu 
---
 drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c   | 56 ++
 drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.h   | 21 
 drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c | 43 +
 3 files changed, 78 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c
index e6178b0..7c23741 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c
@@ -650,3 +650,59 @@ int smu_get_voltage_dependency_table_ppt_v1(
 
return 0;
 }
+
+int smu_set_watermarks_for_clocks_ranges(void *wt_table,
+   struct pp_wm_sets_with_clock_ranges_soc15 *wm_with_clock_ranges)
+{
+   uint32_t i;
+   struct watermarks *table = wt_table;
+
+   if (!table || !wm_with_clock_ranges)
+   return -EINVAL;
+
+   if (wm_with_clock_ranges->num_wm_sets_dmif > 4 || 
wm_with_clock_ranges->num_wm_sets_mcif > 4)
+   return -EINVAL;
+
+   for (i = 0; i < wm_with_clock_ranges->num_wm_sets_dmif; i++) {
+   table->WatermarkRow[1][i].MinClock =
+   cpu_to_le16((uint16_t)
+   
(wm_with_clock_ranges->wm_sets_dmif[i].wm_min_dcefclk_in_khz) /
+   100);
+   table->WatermarkRow[1][i].MaxClock =
+   cpu_to_le16((uint16_t)
+   
(wm_with_clock_ranges->wm_sets_dmif[i].wm_max_dcefclk_in_khz) /
+   100);
+   table->WatermarkRow[1][i].MinUclk =
+   cpu_to_le16((uint16_t)
+   
(wm_with_clock_ranges->wm_sets_dmif[i].wm_min_memclk_in_khz) /
+   100);
+   table->WatermarkRow[1][i].MaxUclk =
+   cpu_to_le16((uint16_t)
+   
(wm_with_clock_ranges->wm_sets_dmif[i].wm_max_memclk_in_khz) /
+   100);
+   table->WatermarkRow[1][i].WmSetting = (uint8_t)
+   wm_with_clock_ranges->wm_sets_dmif[i].wm_set_id;
+   }
+
+   for (i = 0; i < wm_with_clock_ranges->num_wm_sets_mcif; i++) {
+   table->WatermarkRow[0][i].MinClock =
+   cpu_to_le16((uint16_t)
+   
(wm_with_clock_ranges->wm_sets_mcif[i].wm_min_socclk_in_khz) /
+   100);
+   table->WatermarkRow[0][i].MaxClock =
+   cpu_to_le16((uint16_t)
+   
(wm_with_clock_ranges->wm_sets_mcif[i].wm_max_socclk_in_khz) /
+   100);
+   table->WatermarkRow[0][i].MinUclk =
+   cpu_to_le16((uint16_t)
+   
(wm_with_clock_ranges->wm_sets_mcif[i].wm_min_memclk_in_khz) /
+   100);
+   table->WatermarkRow[0][i].MaxUclk =
+   cpu_to_le16((uint16_t)
+   
(wm_with_clock_ranges->wm_sets_mcif[i].wm_max_memclk_in_khz) /
+   100);
+   table->WatermarkRow[0][i].WmSetting = (uint8_t)
+   wm_with_clock_ranges->wm_sets_mcif[i].wm_set_id;
+   }
+   return 0;
+}
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.h 
b/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.h
index 2243e29..916cc01 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.h
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.h
@@ -26,10 +26,27 @@
 struct pp_atomctrl_voltage_table;
 struct pp_hwmgr;
 struct phm_ppt_v1_voltage_lookup_table;
+struct Watermarks_t;
+struct pp_wm_sets_with_clock_ranges_soc15;
 
 uint8_t convert_to_vid(uint16_t vddc);
 uint16_t convert_to_vddc(uint8_t vid);
 
+struct watermark_row_generic_t {
+   uint16_t MinClock;
+   uint16_t MaxClock;
+   uint16_t MinUclk;
+   uint16_t MaxUclk;
+
+   uint8_t  WmSetting;
+   uint8_t  Padding[3];
+};
+
+struct watermarks {
+   struct watermark_row_generic_t WatermarkRow[2][4];
+   uint32_t padding[7];
+};
+
 extern int phm_wait_for_register_unequal(struct pp_hwmgr *hwmgr,
uint32_t index,
uint32_t value, uint32_t mask);
@@ -88,6 +105,10 @@ void *smu_atom_get_data_table(void *dev, uint32_t table, 
uint16_t *size,
 int smu_get_voltage_dependency_table_ppt_v1(
const struct phm_ppt_v1_clock_voltage_dependency_table 
*allowed_dep_table,
struct phm_ppt_v1_clock_voltage_dependency_table *dep_table);
+
+int smu_set_watermarks_for_clocks_ranges(void *wt_table,
+   struct pp_wm_sets_with_clock_ranges_soc15 
*wm_with_clock_ranges);
+
 #define PHM_FIELD_SHIFT(reg, field) reg##__##field##__SHIFT
 #define PHM_FIELD_MASK(reg, field) reg##__##field##_MASK
 
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c 
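For what it's worth, a hypothetical caller sketch for the new helper (the local
names are mine; per the diffstat the real call site is vega10_hwmgr.c):

	struct watermarks wm_table = {};
	int r;

	r = smu_set_watermarks_for_clocks_ranges(&wm_table, wm_with_clock_ranges);
	if (r)
		return r; /* NULL table/ranges, or more than 4 sets per row */

Taking a void *wt_table keeps the helper independent of each ASIC's generated
watermark struct layout, at the cost of some type safety.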

Re: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)

2018-04-13 Thread Christian König

Hi Monk/Emily,

give me the weekend to take a closer look since I'm very busy this morning.

In general the order of ctx_fini and vm fini is very important because we 
otherwise dereference invalid pointers here.


Regards,
Christian.

On 13.04.2018 at 08:18, Deng, Emily wrote:

Hi Monk,
 Another consideration: it would be better to put amdgpu_ctx_mgr_fini()
beneath amdgpu_vm_fini; in
that case, it will first set all ctx and vm entity run queues to NULL, and then
signal the fences of all the jobs that were not scheduled.

Best Wishes,
Emily Deng


-Original Message-
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
Of Deng, Emily
Sent: Friday, April 13, 2018 2:08 PM
To: Liu, Monk ; Christian König

Cc: amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)

Hi Monk,
  Thanks your review, I will refine the code as your suggestion 1),2),3), 
4),5).
  About 6): I think it is no relation to put the amdgpu_ctx_mgr_fini before 
or
after amdgpu_vm_fini.

Hi Christian,
  Do you have any thoughts about 6)?

Best Wishes,
Emily Deng


-Original Message-
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
Of monk
Sent: Friday, April 13, 2018 1:01 PM
To: Deng, Emily ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)

Hi Christian & Emily

This v3 version looks pretty good to me, but still some parts need to
improve:

e.g.

1)entity->finished doesn't present what it really means; better 
rename it to entity->last_scheduled.

2)drm_sched_entity_fini_job_cb() better renamed to
drm_sched_entity_kill_jobs_cb()

3) no need to pass "entity->finished" (line275 of gpu_schedulers.c) to
drm_sched_entity_fini_job_cb() if -ENOENT returned after
dma_fence_add_callback since this parm is not needed at all in this
callback routine

4)better change type of entity->fini_status to "int" instead of
"uint32_t" ... it should be aligned with the return type of
wait_event_killable()

5)

+   if (entity->finished) {
+   dma_fence_put(entity->finished);
+   entity->finished = NULL;
}

no need to check entity->finished because dma_fence_put() will do it

inside...



and the same here in job_recovery:

+
+   if (s_job->entity->finished)
+   dma_fence_put(s_job->entity->finished);

and the same here in sched_main:

+   if (entity->finished)
+   dma_fence_put(entity->finished);


6) why put amdgpu_ctx_mgr_fini() beneath amdgpu_vm_fini() ? any reason
for that ?


thanks

/Monk
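
On point 5): dma_fence_put() already tolerates a NULL fence, along the lines of
(simplified from include/linux/dma-fence.h):

	static inline void dma_fence_put(struct dma_fence *fence)
	{
		/* silently ignores NULL, then drops the reference */
		if (fence)
			kref_put(&fence->refcount, dma_fence_release);
	}

so the explicit NULL checks before the put are redundant; only the reset to NULL
afterwards is needed where the pointer is reused.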






On 04/13/2018 10:06 AM, Deng, Emily wrote:

Ping

Best Wishes,
Emily Deng





-Original Message-
From: Emily Deng [mailto:emily.d...@amd.com]
Sent: Thursday, April 12, 2018 6:22 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deng, Emily ; Liu, Monk



Subject: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)

issue:
there are VMC page fault occurred if force APP kill during 3dmark
test, the cause is in entity_fini we manually signal all those jobs
in entity's queue which confuse the sync/dep
mechanism:

1)page fault occurred in sdma's clear job which operate on shadow
buffer, and shadow buffer's Gart table is cleaned by ttm_bo_release
since the fence in its reservation was fake signaled by
entity_fini() under the case of SIGKILL received.

2)page fault occurred in gfx' job because during the lifetime of
gfx job we manually fake signal all jobs from its entity in
entity_fini(), thus the unmapping/clear PTE job depend on those
result fence is satisfied and sdma start clearing the PTE and lead
to GFX

page fault.

fix:
1)should at least wait all jobs already scheduled complete in
entity_fini() if SIGKILL is the case.

2)if a fence signaled and try to clear some entity's dependency,
should set this entity guilty to prevent its job really run since
the dependency is fake signaled.

v2:
splitting drm_sched_entity_fini() into two functions:
1)The first one does the waiting, removes the entity from the
runqueue and returns an error when the process was killed.
2)The second one then goes over the entity, install it as
completion signal for the remaining jobs and signals all jobs with an

error code.

v3:
1)Replace the fini1 and fini2 with better name 2)Call the first
part before the VM teardown in
amdgpu_driver_postclose_kms() and the second part after the VM
teardown 3)Keep the original function drm_sched_entity_fini to
refine the

code.

Signed-off-by: Monk Liu 
Signed-off-by: Emily Deng 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 +
   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   | 64
++
   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  5 ++-
   drivers/gpu/drm/scheduler/gpu_scheduler.c | 74
++-
   

Re: [PATCH] drm/amdgpu: defer test IBs on the rings at boot

2018-04-13 Thread Christian König

On 13.04.2018 at 06:07, Shirish S wrote:

amdgpu_ib_ring_tests() runs test IB's on rings at boot
contributes to ~500 ms of amdgpu driver's boot time.

This patch defers it and adds a check to report
in amdgpu_info_ioctl() if it was scheduled or not.


That is rather suboptimal, but see below.



Signed-off-by: Shirish S 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h|  2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 20 +---
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c|  3 +++
  3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 5734871..ae8f722 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1611,6 +1611,8 @@ struct amdgpu_device {
  
  	/* delayed work_func for deferring clockgating during resume */

struct delayed_work late_init_work;
+   /* delayed work_func to defer testing IB's on rings during boot */
+   struct delayed_work late_init_test_ib_work;


You must put the IB test into the late_init_work as well, otherwise the 
two delayed workers can race with each other.


  
  	struct amdgpu_virt	virt;

/* firmware VRAM reservation */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1762eb4..e65a5e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -63,6 +63,7 @@ MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
  MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
  
  #define AMDGPU_RESUME_MS		2000

+#define AMDGPU_IB_TEST_SCHED_MS2000
  
  static const char *amdgpu_asic_name[] = {

"TAHITI",
@@ -2105,6 +2106,16 @@ bool amdgpu_device_asic_has_dc_support(enum 
amd_asic_type asic_type)
}
  }
  
+static void amdgpu_device_late_init_test_ib_func_handler(struct work_struct *work)

+{
+   struct amdgpu_device *adev =
+   container_of(work, struct amdgpu_device, 
late_init_test_ib_work.work);
+   int r = amdgpu_ib_ring_tests(adev);
+
+   if (r)
+   DRM_ERROR("ib ring test failed (%d).\n", r);
+}
+
  /**
   * amdgpu_device_has_dc_support - check if dc is supported
   *
@@ -2212,6 +2223,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
	INIT_LIST_HEAD(&adev->ring_lru_list);
	spin_lock_init(&adev->ring_lru_list_lock);
  
+	INIT_DELAYED_WORK(&adev->late_init_test_ib_work,

+ amdgpu_device_late_init_test_ib_func_handler);
	INIT_DELAYED_WORK(&adev->late_init_work,
  amdgpu_device_ip_late_init_func_handler);
  
@@ -2374,9 +2387,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,

goto failed;
}
  
-	r = amdgpu_ib_ring_tests(adev);

-   if (r)
-   DRM_ERROR("ib ring test failed (%d).\n", r);
+   /* Schedule amdgpu_ib_ring_tests() */
+   mod_delayed_work(system_wq, &adev->late_init_test_ib_work,
+   msecs_to_jiffies(AMDGPU_IB_TEST_SCHED_MS));


That doesn't work like you intended. mod_delayed_work() overrides the 
existing handler.


What you wanted to use is queue_delayed_work(), but as I said we should 
only have one delayed worker.


  
  	if (amdgpu_sriov_vf(adev))

amdgpu_virt_init_data_exchange(adev);
@@ -2469,6 +2482,7 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
}
adev->accel_working = false;
	cancel_delayed_work_sync(&adev->late_init_work);
+   cancel_delayed_work_sync(&adev->late_init_test_ib_work);
/* free i2c buses */
if (!amdgpu_device_has_dc_support(adev))
amdgpu_i2c_fini(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 487d39e..057bd9a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -279,6 +279,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void 
*data, struct drm_file
if (!info->return_size || !info->return_pointer)
return -EINVAL;
  
+	if (delayed_work_pending(&adev->late_init_test_ib_work))

+   DRM_ERROR("IB test on ring not executed\n");
+


Please use flush_delayed_work() instead of issuing and error here.

Regards,
Christian.


switch (info->query) {
case AMDGPU_INFO_ACCEL_WORKING:
ui32 = adev->accel_working;


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)

2018-04-13 Thread Deng, Emily
Hi Monk,
Another consideration: it would be better to put amdgpu_ctx_mgr_fini()
beneath amdgpu_vm_fini; in
that case, it will first set all ctx and vm entity run queues to NULL, and then
signal the fences of all the jobs that were not scheduled.

Best Wishes,
Emily Deng

> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Deng, Emily
> Sent: Friday, April 13, 2018 2:08 PM
> To: Liu, Monk ; Christian König
> 
> Cc: amd-gfx@lists.freedesktop.org
> Subject: RE: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)
> 
> Hi Monk,
>  Thanks for your review, I will refine the code as per your suggestions 1), 2), 3),
> 4), 5).
>  About 6): I think it makes no difference whether amdgpu_ctx_mgr_fini is put
> before or
> after amdgpu_vm_fini.
> 
> Hi Christian,
>  Do you have any thoughts about 6)?
> 
> Best Wishes,
> Emily Deng
> 
> > -Original Message-
> > From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> > Of monk
> > Sent: Friday, April 13, 2018 1:01 PM
> > To: Deng, Emily ; amd-gfx@lists.freedesktop.org
> > Subject: Re: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)
> >
> > Hi Christian & Emily
> >
> > This v3 version looks pretty good to me, but still some parts need to
> > improve:
> >
> > e.g.
> >
> > 1) entity->finished doesn't convey what it really means; better to
> > rename it to entity->last_scheduled.
> >
> > 2) drm_sched_entity_fini_job_cb() is better renamed to
> > drm_sched_entity_kill_jobs_cb().
> >
> > 3) No need to pass "entity->finished" (line 275 of gpu_scheduler.c) to
> > drm_sched_entity_fini_job_cb() when -ENOENT is returned by
> > dma_fence_add_callback(), since this parameter is not needed at all in
> > that callback routine.
> >
> > 4) Better to change the type of entity->fini_status from "uint32_t" to
> > "int"; it should match the return type of wait_event_killable().
> >
> > 5)
> >
> > +   if (entity->finished) {
> > +   dma_fence_put(entity->finished);
> > +   entity->finished = NULL;
> > }
> >
> > no need to check entity->finished for NULL, because dma_fence_put()
> > already does that internally (see the note after this list)...
> >
> >
> >
> > and the same here in job_recovery:
> >
> > +
> > +   if (s_job->entity->finished)
> > +   dma_fence_put(s_job->entity->finished);
> >
> > and the same here in sched_main:
> >
> > +   if (entity->finished)
> > +   dma_fence_put(entity->finished);
> >
> >
> > 6) Why put amdgpu_ctx_mgr_fini() beneath amdgpu_vm_fini()? Any reason
> > for that?
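
Regarding 5): dma_fence_put() is indeed NULL-safe (it returns early on a
NULL fence), so those checks can simply go. A minimal illustration with a
hypothetical fence pointer:

	#include <linux/dma-fence.h>

	/* e.g. inside any function: */
	struct dma_fence *f = NULL;

	dma_fence_put(f);	/* no-op: dma_fence_put() checks for NULL itself */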
> >
> >
> > thanks
> >
> > /Monk
> >
> >
> >
> >
> >
> >
> > On 04/13/2018 10:06 AM, Deng, Emily wrote:
> > > Ping
> > >
> > > Best Wishes,
> > > Emily Deng
> > >
> > >
> > >
> > >
> > >> -Original Message-
> > >> From: Emily Deng [mailto:emily.d...@amd.com]
> > >> Sent: Thursday, April 12, 2018 6:22 PM
> > >> To: amd-gfx@lists.freedesktop.org
> > >> Cc: Deng, Emily ; Liu, Monk
> > 
> > >> Subject: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)
> > >>
> > >> issue:
> > >> VMC page faults occur if an app is force-killed during a 3DMark
> > >> test. The cause is that in entity_fini() we manually signal all the
> > >> jobs in the entity's queue, which confuses the sync/dep
> > >> mechanism:
> > >>
> > >> 1) A page fault occurs in SDMA's clear job, which operates on a
> > >> shadow buffer: the shadow buffer's GART table is cleaned by
> > >> ttm_bo_release() because the fence in its reservation was
> > >> fake-signaled by entity_fini() in the case where SIGKILL was received.
> > >>
> > >> 2) A page fault occurs in a GFX job because, during the lifetime of
> > >> the GFX job, we manually fake-signal all jobs from its entity in
> > >> entity_fini(); the unmap/clear-PTE jobs that depend on those result
> > >> fences are thus satisfied, so SDMA starts clearing the PTEs, leading
> > >> to a GFX page fault.
> > >>
> > >> fix:
> > >> 1) We should at least wait in entity_fini() for all already scheduled
> > >> jobs to complete in the SIGKILL case.
> > >>
> > >> 2) If a signaled fence tries to clear some entity's dependency, that
> > >> entity should be marked guilty to prevent its jobs from really
> > >> running, since the dependency was fake-signaled.
> > >>
> > >> v2:
> > >> split drm_sched_entity_fini() into two functions:
> > >> 1) The first one does the waiting, removes the entity from the
> > >> run queue and returns an error when the process was killed.
> > >> 2) The second one then goes over the entity, installs it as the
> > >> completion signal for the remaining jobs and signals all jobs with an
> > >> error code.
> > >>
> > >> v3:
> > >> 1) Replace fini1 and fini2 with better names. 2) Call the first
> > >> part before the VM teardown in amdgpu_driver_postclose_kms() and the
> > >> second part after the VM teardown. 3) Keep the original function
> > >> drm_sched_entity_fini() to refine the code.
> > >>
> > >> Signed-off-by: Monk Liu 
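
For reference, the split described in v2/v3 amounts to an interface along
these lines (sketch only; the final names are not given in this thread, so
the ones below are placeholders):

	/* Part 1: wait (killable) for already scheduled jobs and remove the
	 * entity from its run queue; returns an error if the process died. */
	int drm_sched_entity_do_release(struct drm_gpu_scheduler *sched,
					struct drm_sched_entity *entity);

	/* Part 2: install the entity's last scheduled fence as completion
	 * signal for the remaining jobs and signal them with an error code. */
	void drm_sched_entity_cleanup(struct drm_gpu_scheduler *sched,
				      struct drm_sched_entity *entity);

	/* drm_sched_entity_fini() is kept and simply calls both parts. */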

RE: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)

2018-04-13 Thread Deng, Emily
Hi Monk,
 Thanks for your review; I will refine the code per your suggestions 1), 2),
3), 4) and 5).
 About 6): I think it makes no difference whether amdgpu_ctx_mgr_fini() is put
before or after amdgpu_vm_fini().

Hi Christian, 
 Do you have any thoughts about 6)?

Best Wishes,
Emily Deng

> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of monk
> Sent: Friday, April 13, 2018 1:01 PM
> To: Deng, Emily ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)
> 
> Hi Christian & Emily
> 
> This v3 version looks pretty good to me, but still some parts need to
> improve:
> 
> e.g.
> 
> 1) entity->finished doesn't convey what it really means; better to rename
> it to entity->last_scheduled.
> 
> 2) drm_sched_entity_fini_job_cb() is better renamed to
> drm_sched_entity_kill_jobs_cb().
> 
> 3) No need to pass "entity->finished" (line 275 of gpu_scheduler.c) to
> drm_sched_entity_fini_job_cb() when -ENOENT is returned by
> dma_fence_add_callback(), since this parameter is not needed at all in
> that callback routine.
> 
> 4) Better to change the type of entity->fini_status from "uint32_t" to
> "int"; it should match the return type of
> wait_event_killable().
> 
> 5)
> 
> + if (entity->finished) {
> + dma_fence_put(entity->finished);
> + entity->finished = NULL;
>   }
> 
> no need to check entity->finished for NULL, because dma_fence_put()
> already does that internally...
> 
> 
> 
> and the same here in job_recovery:
> 
> +
> + if (s_job->entity->finished)
> + dma_fence_put(s_job->entity->finished);
> 
> and the same here in sched_main:
> 
> + if (entity->finished)
> + dma_fence_put(entity->finished);
> 
> 
> 6) Why put amdgpu_ctx_mgr_fini() beneath amdgpu_vm_fini()? Any reason
> for that?
> 
> 
> thanks
> 
> /Monk
> 
> 
> 
> 
> 
> 
> On 04/13/2018 10:06 AM, Deng, Emily wrote:
> > Ping
> >
> > Best Wishes,
> > Emily Deng
> >
> >
> >
> >
> >> -Original Message-
> >> From: Emily Deng [mailto:emily.d...@amd.com]
> >> Sent: Thursday, April 12, 2018 6:22 PM
> >> To: amd-gfx@lists.freedesktop.org
> >> Cc: Deng, Emily ; Liu, Monk
> 
> >> Subject: [PATCH] drm/gpu-sched: fix force APP kill hang(v3)
> >>
> >> issue:
> >> VMC page faults occur if an app is force-killed during a 3DMark
> >> test. The cause is that in entity_fini() we manually signal all the
> >> jobs in the entity's queue, which confuses the sync/dep
> >> mechanism:
> >>
> >> 1) A page fault occurs in SDMA's clear job, which operates on a shadow
> >> buffer: the shadow buffer's GART table is cleaned by ttm_bo_release()
> >> because the fence in its reservation was fake-signaled by entity_fini()
> >> in the case where SIGKILL was received.
> >>
> >> 2) A page fault occurs in a GFX job because, during the lifetime of
> >> the GFX job, we manually fake-signal all jobs from its entity in
> >> entity_fini(); the unmap/clear-PTE jobs that depend on those result
> >> fences are thus satisfied, so SDMA starts clearing the PTEs, leading
> >> to a GFX page fault.
> >>
> >> fix:
> >> 1) We should at least wait in entity_fini() for all already scheduled
> >> jobs to complete in the SIGKILL case.
> >>
> >> 2) If a signaled fence tries to clear some entity's dependency, that
> >> entity should be marked guilty to prevent its jobs from really running,
> >> since the dependency was fake-signaled.
> >>
> >> v2:
> >> split drm_sched_entity_fini() into two functions:
> >> 1) The first one does the waiting, removes the entity from the
> >> run queue and returns an error when the process was killed.
> >> 2) The second one then goes over the entity, installs it as the
> >> completion signal for the remaining jobs and signals all jobs with an
> >> error code.
> >>
> >> v3:
> >> 1) Replace fini1 and fini2 with better names. 2) Call the first part
> >> before the VM teardown in amdgpu_driver_postclose_kms() and the second
> >> part after the VM teardown. 3) Keep the original function
> >> drm_sched_entity_fini() to refine the code.
> >>
> >> Signed-off-by: Monk Liu 
> >> Signed-off-by: Emily Deng 
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 +
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   | 64
> >> ++
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  5 ++-
> >>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 74
> >> ++-
> >>   include/drm/gpu_scheduler.h   |  7 +++
> >>   5 files changed, 131 insertions(+), 21 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >> index 2babfad..200db73 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >> @@ -681,6 +681,8 @@ int amdgpu_ctx_ioctl(struct drm_device *dev,
> void
> >> *data,  int amdgpu_ctx_wait_prev_fence(struct