Re: After Vega 56/64 GPU hang I unable reboot system

2018-12-19 Thread Mikhail Gavrilov
On Thu, 20 Dec 2018 at 03:41, StDenis, Tom wrote: > sudo strace umr -R gfx[.] 2>&1 | tee strace.log > > will capture everything. > > In the mean time I can fix at least the segfault. > > The issue is why can't it open "amdgpu_ring_gfx". > > Tom > strace file is attached here. -- Best Regards,

RE: [PATCH 2/2] drm/amdgpu/nbio7.4: add hw bug workaround for vega20

2018-12-19 Thread Xu, Feifei
Series Acked-by: Feifei Xu -----Original Message----- From: amd-gfx On Behalf Of Alex Deucher Sent: Thursday, December 20, 2018 7:09 AM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: [PATCH 2/2] drm/amdgpu/nbio7.4: add hw bug workaround for vega20 Configure PCIE_CI_CNTL to

[PATCH v2 06/16] drm/i915: Keep malloc references to MST ports

2018-12-19 Thread Lyude Paul
So that the ports stay around until we've destroyed the connectors, in order to ensure that we don't pass an invalid pointer to any MST helpers once we introduce the new MST VCPI helpers. Changes since v1: * Move drm_dp_mst_get_port_malloc() to where we assign intel_connector->port - danvet

[PATCH v2 09/16] drm/nouveau: Remove unnecessary VCPI checks in nv50_msto_cleanup()

2018-12-19 Thread Lyude Paul
There is no need to look at the port's VCPI allocation before calling drm_dp_mst_deallocate_vcpi(), as we already have msto->disabled to let us avoid cleaning up an msto more than once. The DP MST core will never call drm_dp_mst_deallocate_vcpi() on its own, which is presumably what these checks

[PATCH v2 12/16] drm/nouveau: Grab payload lock in nv50_msto_payload()

2018-12-19 Thread Lyude Paul
Going through the currently programmed payloads isn't safe without holding mgr->payload_lock, so actually do that and warn if anyone tries calling nv50_msto_payload() in the future without grabbing the right locks. Signed-off-by: Lyude Paul Cc: Daniel Vetter Cc: David Airlie Cc: Jerry Zuo Cc:

[PATCH v2 10/16] drm/nouveau: Keep malloc references to MST ports

2018-12-19 Thread Lyude Paul
Now that we finally have a sane way to keep port allocations around, use it to fix the potential unchecked ->port accesses that nouveau makes by making sure we keep the mst port allocated for as long as its drm_connector is accessible. Additionally, now that we've guaranteed that mstc->port is

[PATCH v2 11/16] drm/nouveau: Stop unsetting mstc->port, use malloc refs

2018-12-19 Thread Lyude Paul
Same as we did for i915, but for nouveau this time. Additionally, we grab a malloc reference to the port that lasts for the entire lifetime of nv50_mstc, which gives us the guarantee that mstc->port will always point to valid memory for as long as the mstc stays around. Signed-off-by: Lyude Paul
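
The idea in the snippets above, that a driver can hold a "malloc reference" to keep a port's memory valid even after the port has left the topology, can be illustrated with a small userspace sketch. This is not the DRM helpers' code: `port_sketch`, its fields, and every function below are hypothetical names, plain integers stand in for the kernel's reference counters, and the sketch assumes the topology itself holds one malloc reference that it drops when the last topology reference goes away.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified port with two independent counters. */
struct port_sketch {
    int topology_refs; /* port counts as part of the topology while > 0 */
    int malloc_refs;   /* memory stays allocated while > 0 */
    int *freed_flag;   /* set to 1 when the memory is released (demo only) */
};

/* Create a port with one topology ref and one malloc ref (the latter is
 * assumed to be held on behalf of the topology). */
struct port_sketch *port_sketch_new(int *freed_flag)
{
    struct port_sketch *port = calloc(1, sizeof(*port));

    assert(port != NULL);
    port->topology_refs = 1;
    port->malloc_refs = 1;
    port->freed_flag = freed_flag;
    *freed_flag = 0;
    return port;
}

/* Take a malloc ref: guarantees the pointer stays valid, nothing more. */
void port_sketch_get_malloc(struct port_sketch *port)
{
    port->malloc_refs++;
}

/* Drop a malloc ref; the memory is released when the last one goes away. */
void port_sketch_put_malloc(struct port_sketch *port)
{
    if (--port->malloc_refs == 0) {
        *port->freed_flag = 1;
        free(port);
    }
}

/* Drop a topology ref; at zero the port leaves the topology and the
 * topology's own malloc ref is dropped. */
void port_sketch_put_topology(struct port_sketch *port)
{
    if (--port->topology_refs == 0)
        port_sketch_put_malloc(port);
}
```

In this sketch a connector that called port_sketch_get_malloc() at creation can still dereference its port pointer after the port has been unplugged; the memory only goes away once the connector calls port_sketch_put_malloc() in its own teardown path. Real kernel code would use atomic reference counters rather than plain integers.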

[PATCH v2 15/16] drm/dp_mst: Check payload count in drm_dp_mst_atomic_check()

2018-12-19 Thread Lyude Paul
It occurred to me that we never actually check this! So let's start doing that. Signed-off-by: Lyude Paul Reviewed-by: Daniel Vetter Cc: David Airlie Cc: Jerry Zuo Cc: Harry Wentland Cc: Juston Li --- drivers/gpu/drm/drm_dp_mst_topology.c | 11 +++ 1 file changed, 7 insertions(+),

[PATCH v2 08/16] drm/nouveau: Remove bogus cleanup in nv50_mstm_add_connector()

2018-12-19 Thread Lyude Paul
Trying to destroy the connector using mstc->connector.funcs->destroy() if connector initialization fails is wrong: there is no possible codepath in nv50_mstc_new where nv50_mstm_add_connector() would return <0 and mstc would be non-NULL. Signed-off-by: Lyude Paul Cc: Daniel Vetter Cc: David

[PATCH v2 05/16] drm/dp_mst: Fix payload deallocation on hotplugs using malloc refs

2018-12-19 Thread Lyude Paul
Up until now, freeing payloads on remote MST hubs that just had ports removed has almost never worked because we've been relying on port validation in order to stop us from accessing ports that have already been freed from memory, but ports which need their payloads released due to being removed

[PATCH v2 14/16] drm/dp_mst: Start tracking per-port VCPI allocations

2018-12-19 Thread Lyude Paul
There has been a TODO waiting for quite a long time in drm_dp_mst_topology.c: /* We cannot rely on port->vcpi.num_slots to update * topology_state->avail_slots as the port may not exist if the parent * branch device was unplugged. This should be fixed by tracking

[PATCH v2 16/16] drm/nouveau: Use atomic VCPI helpers for MST

2018-12-19 Thread Lyude Paul
Currently, nouveau uses the yolo method of setting up MST displays: it uses the old VCPI helpers (drm_dp_find_vcpi_slots()) for computing the display configuration. These helpers don't take care to make sure they take a reference to the mstb port that they're checking, and additionally don't

[PATCH v2 04/16] drm/dp_mst: Stop releasing VCPI when removing ports from topology

2018-12-19 Thread Lyude Paul
This has never actually worked, and isn't needed anyway: the driver's always going to try to deallocate VCPI when it tears down the display that the VCPI belongs to. Signed-off-by: Lyude Paul Reviewed-by: Daniel Vetter Cc: David Airlie Cc: Jerry Zuo Cc: Harry Wentland Cc: Juston Li ---

[PATCH v2 13/16] drm/dp_mst: Add some atomic state iterator macros

2018-12-19 Thread Lyude Paul
Changes since v6: - Move EXPORT_SYMBOL() for drm_dp_mst_topology_state_funcs to this commit - Document __drm_dp_mst_state_iter_get() and note that it shouldn't be called directly Signed-off-by: Lyude Paul Reviewed-by: Daniel Vetter Cc: David Airlie Cc: Jerry Zuo Cc: Harry Wentland

[PATCH v2 03/16] drm/dp_mst: Restart last_connected_port_and_mstb() if topology ref fails

2018-12-19 Thread Lyude Paul
While this isn't a complete fix, this will improve the reliability of drm_dp_get_last_connected_port_and_mstb() pretty significantly during hotplug events, since there's a chance that the in-memory topology tree may not be fully updated when drm_dp_get_last_connected_port_and_mstb() is called and

[PATCH v2 07/16] drm/amdgpu/display: Keep malloc ref to MST port

2018-12-19 Thread Lyude Paul
Just like i915 and nouveau, it's a good idea for us to hold a malloc reference to the port here so that we never pass a freed pointer to any of the DP MST helper functions. Also, we stop unsetting aconnector->port in dm_dp_destroy_mst_connector(). There's literally no point to that assignment

[PATCH v2 02/16] drm/dp_mst: Introduce new refcounting scheme for mstbs and ports

2018-12-19 Thread Lyude Paul
The current way of handling refcounting in the DP MST helpers is really confusing and probably just plain wrong because it's been hacked up many times over the years without anyone actually going over the code and seeing if things could be simplified. To the best of my understanding, the current

[PATCH v2 00/16] MST refcounting/atomic helpers cleanup

2018-12-19 Thread Lyude Paul
This is the series I've been working on for a while now to get all of the atomic DRM drivers in the tree to use the atomic MST helpers, and to make the atomic MST helpers actually idempotent. Turns out it's a lot more difficult to do that without also fixing how port and branch device refcounting

[PATCH v2 01/16] drm/dp_mst: Rename drm_dp_mst_get_validated_(port|mstb)_ref and friends

2018-12-19 Thread Lyude Paul
s/drm_dp_get_validated_port_ref/drm_dp_mst_topology_get_port_validated/ s/drm_dp_put_port/drm_dp_mst_topology_put_port/ s/drm_dp_get_validated_mstb_ref/drm_dp_mst_topology_get_mstb_validated/ s/drm_dp_put_mst_branch_device/drm_dp_mst_topology_put_mstb/ This is a much more consistent naming

[PATCH 2/2] drm/amdgpu/nbio7.4: add hw bug workaround for vega20

2018-12-19 Thread Alex Deucher
Configure PCIE_CI_CNTL to work around a hw bug that affects some multi-GPU compute workloads. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c

[PATCH 1/2] drm/amdgpu/nbio6.1: add hw bug workaround for vega10/12

2018-12-19 Thread Alex Deucher
Configure PCIE_CI_CNTL to work around a hw bug that affects some multi-GPU compute workloads. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c

Re: After Vega 56/64 GPU hang I unable reboot system

2018-12-19 Thread Mikhail Gavrilov
On Thu, 20 Dec 2018 at 01:56, StDenis, Tom wrote: > > Sorry missed the gfx ring in the reply. > > Um what kernel version? 4.20.0-0.rc6 > Is this the latest umr? yes, master branch, commit 546c30a71f7b87f97f2a96eab184c3973b014711 > Maybe capture a trace of umr to see what is happening. Cannot

Re: After Vega 56/64 GPU hang I unable reboot system

2018-12-19 Thread Mikhail Gavrilov
I see that the backtrace in my previous message is borked. I have placed the backtrace in a text file in this message for easier reading. -- Best Regards, Mike Gavrilov. Cannot seek to MMIO address: Bad file descriptor [ERROR]: Could not open ring debugfs file Program received signal SIGSEGV,

Re: After Vega 56/64 GPU hang I unable reboot system

2018-12-19 Thread StDenis, Tom
On 2018-12-19 4:21 p.m., Mikhail Gavrilov wrote: > I see that the backtrace in my previous message is borked. > I have placed the backtrace in a text file in this message for easier reading. The backtrace points to the segfault in umr caused when it fails to read the file. We want to know why it can't

Re: [PATCH] drm/amd/display: Fix 64-bit division for 32-bit builds

2018-12-19 Thread Wentland, Harry
On 2018-12-19 3:28 p.m., sunpeng...@amd.com wrote: > From: Ken Chalmers > > [Why] > 32-bit builds break when doing 64-bit division directly. > > [How] > Use the div_u64() function instead to perform the division. > > Fixes: >

Re: After Vega 56/64 GPU hang I unable reboot system

2018-12-19 Thread StDenis, Tom
No gfx ring? You can specify a ring name for --waves; it should be in the docs. It's not on the web docs but in the help text https://cgit.freedesktop.org/amd/umr/tree/src/app/main.c#n643 I'll fix the web docs when I'm in next. Tom On December 19, 2018 3:21:25 PM EST, "Grodzovsky, Andrey"

[PATCH] drm/amd/display: Fix 64-bit division for 32-bit builds

2018-12-19 Thread sunpeng.li
From: Ken Chalmers [Why] 32-bit builds break when doing 64-bit division directly. [How] Use the div_u64() function instead to perform the division. Fixes: https://lists.freedesktop.org/archives/dri-devel/2018-December/201008.html Signed-off-by: Ken Chalmers Reviewed-by: Leo Li ---

Re: After Vega 56/64 GPU hang I unable reboot system

2018-12-19 Thread Grodzovsky, Andrey
+Tom Andrey On 12/19/2018 01:35 PM, Mikhail Gavrilov wrote: > On Tue, 18 Dec 2018 at 00:08, Grodzovsky, Andrey > wrote: >> Please install UMR and dump gfx ring content and waves after the hang is >> happening. >> >> UMR at - https://cgit.freedesktop.org/amd/umr/ >> Waves dump >> sudo umr -O

Re: [PATCH] drm/amd/display: Fix MST dp_blank REG_WAIT timeout

2018-12-19 Thread Deucher, Alexander
Acked-by: Alex Deucher From: amd-gfx on behalf of Harry Wentland Sent: Wednesday, December 19, 2018 2:14:11 PM To: amd-gfx@lists.freedesktop.org Cc: Zuo, Jerry Subject: [PATCH] drm/amd/display: Fix MST dp_blank REG_WAIT timeout From: "Jerry (Fangzhi) Zuo"

Re: After Vega 56/64 GPU hang I unable reboot system

2018-12-19 Thread Mikhail Gavrilov
On Tue, 18 Dec 2018 at 00:08, Grodzovsky, Andrey wrote: > > Please install UMR and dump gfx ring content and waves after the hang is > happening. > > UMR at - https://cgit.freedesktop.org/amd/umr/ > Waves dump > sudo umr -O verbose,halt_waves -wa > GFX ring dump > sudo umr -O verbose,follow -R

[PATCH] drm/amd/display: Fix MST dp_blank REG_WAIT timeout

2018-12-19 Thread Harry Wentland
From: "Jerry (Fangzhi) Zuo" Need to blank the stream before deallocating the MST payload. [drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 10us * 3000 tries - dce110_stream_encoder_dp_blank line:944 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 2201 at

Re: [WIP PATCH 03/15] drm/dp_mst: Introduce new refcounting scheme for mstbs and ports

2018-12-19 Thread Lyude Paul
On Wed, 2018-12-19 at 13:48 +0100, Daniel Vetter wrote: > On Tue, Dec 18, 2018 at 04:27:58PM -0500, Lyude Paul wrote: > > On Fri, 2018-12-14 at 10:29 +0100, Daniel Vetter wrote: > > > On Thu, Dec 13, 2018 at 08:25:32PM -0500, Lyude Paul wrote: > > > > The current way of handling refcounting in the

Re: Powerplay clock information missing on Polaris 12 (msi RX550 LP OC) on armhf

2018-12-19 Thread Luís Mendes
Ok, great! Thanks! Luís On Wed, Dec 19, 2018 at 6:27 PM Deucher, Alexander wrote: > > Those messages are harmless and can be ignored. I think they have been > removed in newer kernels. > > > Alex > > > From: amd-gfx on behalf of Luís > Mendes > Sent:

Re: Powerplay clock information missing on Polaris 12 (msi RX550 LP OC) on armhf

2018-12-19 Thread Deucher, Alexander
Those messages are harmless and can be ignored. I think they have been removed in newer kernels. Alex From: amd-gfx on behalf of Luís Mendes Sent: Wednesday, December 19, 2018 12:31:44 PM To: Koenig, Christian; amd-gfx list Subject: Powerplay clock

Re: [PATCH v4 1/2] drm/sched: Refactor ring mirror list handling.

2018-12-19 Thread Grodzovsky, Andrey
On 12/19/2018 11:21 AM, Christian König wrote: > On 17.12.18 at 20:51, Andrey Grodzovsky wrote: >> Decouple sched threads stop and start and ring mirror >> list handling from the policy of what to do about the >> guilty jobs. >> When stopping the sched thread and detaching sched fences >> from

Powerplay clock information missing on Polaris 12 (msi RX550 LP OC) on armhf

2018-12-19 Thread Luís Mendes
Hi, Just want to report this issue for the Polaris 12, RX550, with kernel 4.19.9 [ 20.272719] amdgpu: [powerplay] Failed to retrieve minimum clocks. [ 20.272723] amdgpu: [powerplay] Error in phm_get_clock_info dmesg excerpt of relevant amdgpu initialization follows below. Regards, Luís

Re: [PATCH amdgpu] Fix crash when page flipping in multi-X-Screen/Zaphod mode

2018-12-19 Thread Michel Dänzer
On 2018-12-19 7:56 a.m., Mario Kleiner wrote: > amdgpu_do_pageflip() indexed the flipdata->fb[] array > indexing over config->num_crtc, but the flip completion > routines, e.g., drmmode_flip_handler(), index that array > via the crtc hw id from drmmode_get_crtc_id(crtc). > > This is mismatched

Re: [PATCH] Fix crash when page flipping in multi-X-Screen/Zaphod mode

2018-12-19 Thread Michel Dänzer
On 2018-12-18 8:09 p.m., Mario Kleiner wrote: > On Tue, Dec 18, 2018 at 3:42 PM Michel Dänzer wrote: >> >> Good catch, thanks! Pushed with >> >> Fixes: 740f0850f1e4 "Store FB for each CRTC in drmmode_flipdata_rec" >> Reviewed-by: Michel Dänzer >> >> >> Do you want to make an xf86-video-amdgpu

Re: [PATCH v4 1/2] drm/sched: Refactor ring mirror list handling.

2018-12-19 Thread Christian König
On 17.12.18 at 20:51, Andrey Grodzovsky wrote: Decouple sched threads stop and start and ring mirror list handling from the policy of what to do about the guilty jobs. When stopping the sched thread and detaching sched fences from non signaled HW fences wait for all signaled HW fences to

Re: [PATCH 5/5] drm/amd/display: Move the dm update dance to crtc->atomic_check

2018-12-19 Thread Grodzovsky, Andrey
On 12/19/2018 08:54 AM, Kazlauskas, Nicholas wrote: > On 12/18/18 3:12 PM, Grodzovsky, Andrey wrote: >> >> On 12/18/2018 10:26 AM, sunpeng...@amd.com wrote: >>> From: Leo Li >>> >>> drm_atomic_helper_check_planes() calls the crtc atomic check helpers. In >>> an attempt to better align with the

Re: [PATCH 5/5] drm/amd/display: Move the dm update dance to crtc->atomic_check

2018-12-19 Thread Li, Sun peng (Leo)
On 2018-12-18 3:33 p.m., Grodzovsky, Andrey wrote: > > > On 12/18/2018 12:09 PM, Kazlauskas, Nicholas wrote: >> On 12/18/18 10:26 AM, sunpeng...@amd.com wrote: >>> From: Leo Li >>> >>> drm_atomic_helper_check_planes() calls the crtc atomic check helpers. In >>> an attempt to better align

[PATCH] drm/amd/display: Remove duplicate header

2018-12-19 Thread Brajeswar Ghosh
Remove custom_float.h which is included more than once Signed-off-by: Brajeswar Ghosh --- drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c

Re: [PATCH] drm/amd/display: Remove duplicate header

2018-12-19 Thread Souptick Joarder
On Wed, Dec 19, 2018 at 1:27 PM Brajeswar Ghosh wrote: > > Remove custom_float.h which is included more than once > > Signed-off-by: Brajeswar Ghosh > --- Acked-by: Souptick Joarder > drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 1 - > 1 file changed, 1 deletion(-) > > diff

Re: [PATCH] drm/amd/display: Use div_u64 for flip timestamp ns to ms

2018-12-19 Thread Deucher, Alexander
Could also use do_div. Either way: Reviewed-by: Alex Deucher From: amd-gfx on behalf of Nicholas Kazlauskas Sent: Wednesday, December 19, 2018 9:12:09 AM To: amd-gfx@lists.freedesktop.org Cc: Li, Sun peng (Leo); Wentland, Harry; Kazlauskas, Nicholas Subject:

[PATCH] drm/amd/display: Use div_u64 for flip timestamp ns to ms

2018-12-19 Thread Nicholas Kazlauskas
Resolves __udivdi3 missing errors when building for i386. Fixes: 6378ef012ddc ("drm/amd/display: Add below the range support for FreeSync") Change-Id: I4ded5790160054e6908367f20a63257225517714 Cc: Leo Li Cc: Harry Wentland Signed-off-by: Nicholas Kazlauskas ---
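
As background for the two div_u64 patches in this listing: on a 32-bit target, a plain `/` on a 64-bit value is compiled into a call to a helper such as __udivdi3, which is what the build error above refers to. The sketch below is a userspace illustration only. `div_u64_sketch()` and `flip_ns_to_ms_sketch()` are hypothetical names, and the shift-and-subtract loop is a stand-in chosen for clarity; it is not how the kernel's div_u64() (declared in <linux/math64.h>) is implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for div_u64(): divide a 64-bit dividend by a
 * 32-bit divisor without ever applying '/' to a 64-bit operand. */
uint64_t div_u64_sketch(uint64_t dividend, uint32_t divisor)
{
    uint64_t quotient = 0;
    uint64_t remainder = 0;

    assert(divisor != 0);

    /* Shift-and-subtract long division, one bit per iteration. The
     * remainder is always below the divisor before the shift, so it
     * never overflows 64 bits. */
    for (int bit = 63; bit >= 0; bit--) {
        remainder = (remainder << 1) | ((dividend >> bit) & 1);
        if (remainder >= divisor) {
            remainder -= divisor;
            quotient |= (uint64_t)1 << bit;
        }
    }
    return quotient;
}

/* Convert a nanosecond timestamp to whole milliseconds (truncating). */
uint64_t flip_ns_to_ms_sketch(uint64_t timestamp_ns)
{
    return div_u64_sketch(timestamp_ns, 1000000u);
}
```

In kernel code the equivalent change would presumably be a one-line replacement of `ns / 1000000` with `div_u64(ns, 1000000)`; the exact variable names in the patch are not visible in the truncated snippet.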

Re: [PATCH 5/5] drm/amd/display: Move the dm update dance to crtc->atomic_check

2018-12-19 Thread Kazlauskas, Nicholas
On 12/18/18 3:12 PM, Grodzovsky, Andrey wrote: > > > On 12/18/2018 10:26 AM, sunpeng...@amd.com wrote: >> From: Leo Li >> >> drm_atomic_helper_check_planes() calls the crtc atomic check helpers. In >> an attempt to better align with the DRM framework, we can move the >> entire dm_update dance

Re: [WIP PATCH 06/15] drm/i915: Keep malloc references to MST ports

2018-12-19 Thread Daniel Vetter
On Tue, Dec 18, 2018 at 04:52:24PM -0500, Lyude Paul wrote: > On Fri, 2018-12-14 at 10:32 +0100, Daniel Vetter wrote: > > On Thu, Dec 13, 2018 at 08:25:35PM -0500, Lyude Paul wrote: > > > So that the ports stay around until we've destroyed the connectors, in > > > order to ensure that we don't

Re: [WIP PATCH 03/15] drm/dp_mst: Introduce new refcounting scheme for mstbs and ports

2018-12-19 Thread Daniel Vetter
On Tue, Dec 18, 2018 at 04:27:58PM -0500, Lyude Paul wrote: > On Fri, 2018-12-14 at 10:29 +0100, Daniel Vetter wrote: > > On Thu, Dec 13, 2018 at 08:25:32PM -0500, Lyude Paul wrote: > > > The current way of handling refcounting in the DP MST helpers is really > > > confusing and probably just

Re: [PATCH] drm/amdgpu/uvd:Change uvd ring name convention

2018-12-19 Thread Christian König
Reviewed-by: Christian König On 18.12.18 at 23:06, Deucher, Alexander wrote: Reviewed-by: Alex Deucher *From:* amd-gfx on behalf of Zhu, James *Sent:* Tuesday, December 18, 2018 4:07:21 PM *To:*