from:"Johannes Hirte"

Re: [PATCH] drm/amdgpu: fix the hw hang during perform system reboot and reset

2020-04-13 Thread Johannes Hirte

On 2020 Apr 13, Prike Liang wrote:
> Unify setting the device CG/PG to the ungated state before entering poweroff or reset.
> 
> Signed-off-by: Prike Liang 
> Tested-by: Mengbing Wang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 87f7c12..bbe090a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2413,6 +2413,8 @@ static int amdgpu_device_ip_suspend_phase1(struct amdgpu_device *adev)
>  {
>   int i, r;
>  
> + amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
> + amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
>  
>   for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
>   if (!adev->ip_blocks[i].status.valid)
> -- 
> 2.7.4
> 

I can confirm that this fixes the shutdown/reboot hang on my raven.
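
For readers following the thread, here is a minimal user-space sketch of the ordering the patch establishes: ungate power gating and clock gating first, then walk the IP blocks in reverse. Every name in it (fake_device, fake_ip_block, fake_suspend_phase1) is hypothetical and only models the idea; it is not the amdgpu code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these types are hypothetical, not amdgpu structures. */
enum gate_state { GATED, UNGATED };

struct fake_ip_block {
    const char *name;
    int valid;      /* mirrors the status.valid check in the quoted hunk */
    int suspended;
};

struct fake_device {
    enum gate_state pg;             /* power gating */
    enum gate_state cg;             /* clock gating */
    struct fake_ip_block *blocks;
    int num_blocks;
    int suspended_while_gated;      /* how often a block was touched while gated */
};

static void fake_suspend_block(struct fake_device *dev, int i)
{
    /* Touching a block while it is still gated is the situation to avoid. */
    if (dev->pg == GATED || dev->cg == GATED)
        dev->suspended_while_gated++;
    dev->blocks[i].suspended = 1;
}

/* Returns the number of blocks that were suspended while still gated. */
static int fake_suspend_phase1(struct fake_device *dev, int ungate_first)
{
    assert(dev != NULL && dev->blocks != NULL);

    if (ungate_first) {
        dev->pg = UNGATED;
        dev->cg = UNGATED;
    }

    /* Same reverse walk over the blocks as in the quoted hunk. */
    for (int i = dev->num_blocks - 1; i >= 0; i--) {
        if (!dev->blocks[i].valid)
            continue;
        fake_suspend_block(dev, i);
    }
    return dev->suspended_while_gated;
}
```

Calling fake_suspend_phase1() with ungate_first set returns 0 in this model, while calling it without ungating counts every valid block as having been touched while gated.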

-- 
Regards,
  Johannes Hirte

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH v2] drm/amdgpu: fix gfx hang during suspend with video playback (v2)

2020-04-12 Thread Johannes Hirte

On 2020 Apr 12, Liang, Prike wrote:
> Thanks for the update and verification. Could you give more detailed
> information and the error log messages for the issue you observed?
> 
> Thanks,
> Prike

There is no error log; the system just doesn't power off or reboot.

lspci:

00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:01.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus A
00:08.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus B
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 7
01:00.0 Network controller: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter (rev 32)
02:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5762 Gigabit Ethernet PCIe (rev 10)
03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader (rev 01)
04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] (rev d1)
04:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Raven/Raven2/Fenghuang HDMI/DP Audio Controller
04:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
04:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
04:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
04:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller
04:00.7 Non-VGA unclassified device: Advanced Micro Devices, Inc. [AMD] Raven/Raven2/Renoir Non-Sensor Fusion Hub KMDF driver
05:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 61)

-- 
Regards,
  Johannes Hirte


Re: [PATCH v2] drm/amdgpu: fix gfx hang during suspend with video playback (v2)

2020-04-11 Thread Johannes Hirte

On 2020 Apr 07, Prike Liang wrote:
> The system hangs during S3 suspend because the SMU is left waiting when
> GC does not respond to the CP_HQD_ACTIVE register access request. The
> root cause is accessing the GC register while GFX CGPG is entered; it
> can be fixed by disabling GFX CGPG before performing suspend.
> 
> v2: Disable GFX CGPG instead of using the RLC safe mode guard.
> 
> Signed-off-by: Prike Liang 
> Tested-by: Mengbing Wang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 2e1f955..bf8735b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2440,8 +2440,6 @@ static int amdgpu_device_ip_suspend_phase1(struct amdgpu_device *adev)
>  {
>   int i, r;
>  
> - amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
> - amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
>  
>   for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
>   if (!adev->ip_blocks[i].status.valid)
> @@ -3470,6 +3468,9 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
>   }
>   }
>  
> + amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
> + amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
> +
>   amdgpu_amdkfd_suspend(adev, !fbcon);
>  
>   amdgpu_ras_suspend(adev);


This breaks shutdown/reboot on my system (Dell Latitude 5495).

-- 
Regards,
  Johannes Hirte


Re: [PATCH xf86-video-amdgpu] Store FB for each CRTC in drmmode_flipdata_rec

2018-08-17 Thread Johannes Hirte

On 2018 Aug 16, Michel Dänzer wrote:
> On 2018-08-10 09:06 AM, Johannes Hirte wrote:
> > On 2018 Jul 27, Michel Dänzer wrote:
> >> From: Michel Dänzer 
> >>
> >> We were only storing the FB provided by the client, but on CRTCs with
> >> TearFree enabled, we use a separate FB. This could cause
> >> drmmode_flip_handler to fail to clear drmmode_crtc->flip_pending, which
> >> could result in a hang when waiting for the pending flip to complete. We
> >> were trying to avoid that by always clearing drmmode_crtc->flip_pending
> >> when TearFree is enabled, but that wasn't reliable, because
> >> drmmode_crtc->tear_free can already be FALSE at this point when
> >> disabling TearFree.
> >>
> >> Now that we're keeping track of each CRTC's flip FB separately,
> >> drmmode_flip_handler can reliably clear flip_pending, and we no longer
> >> need the TearFree hack.
> >>
> >> Signed-off-by: Michel Dänzer 
> > 
> > Since this change I get a black screen when logging into KDE Plasma. I
> > have to switch to the Linux console and back to get the X11 screen.
> > Additionally, the Xorg.log is spammed with:
> > 
> > [   189.744] (WW) AMDGPU(0): get vblank counter failed: Invalid argument
> > [   189.828] (WW) AMDGPU(0): flip queue failed in amdgpu_scanout_flip: Device or resource busy, TearFree inactive until next modeset
> > [   189.828] (WW) AMDGPU(0): drmmode_wait_vblank failed for scanout update: Invalid argument
> > [   189.828] (WW) AMDGPU(0): drmmode_wait_vblank failed for scanout update: Invalid argument
> > 
> > The "flip queue failed" message appears only once; the other two appear
> > much more often.
> > 
> > System is a Carrizo A10-8700B, kernel 4.17.13 + this patch:
> > https://bugzilla.kernel.org/attachment.cgi?id=276173
> 
> Does https://patchwork.freedesktop.org/patch/244860/ fix it?
> 
Yes, this fixed it.

-- 
Regards,
  Johannes


Re: [PATCH xf86-video-amdgpu] Store FB for each CRTC in drmmode_flipdata_rec

2018-08-10 Thread Johannes Hirte

On 2018 Jul 27, Michel Dänzer wrote:
> From: Michel Dänzer 
> 
> We were only storing the FB provided by the client, but on CRTCs with
> TearFree enabled, we use a separate FB. This could cause
> drmmode_flip_handler to fail to clear drmmode_crtc->flip_pending, which
> could result in a hang when waiting for the pending flip to complete. We
> were trying to avoid that by always clearing drmmode_crtc->flip_pending
> when TearFree is enabled, but that wasn't reliable, because
> drmmode_crtc->tear_free can already be FALSE at this point when
> disabling TearFree.
> 
> Now that we're keeping track of each CRTC's flip FB separately,
> drmmode_flip_handler can reliably clear flip_pending, and we no longer
> need the TearFree hack.
> 
> Signed-off-by: Michel Dänzer 

Since this change I get a black screen when logging into KDE Plasma. I
have to switch to the Linux console and back to get the X11 screen.
Additionally, the Xorg.log is spammed with:

[   189.744] (WW) AMDGPU(0): get vblank counter failed: Invalid argument
[   189.828] (WW) AMDGPU(0): flip queue failed in amdgpu_scanout_flip: Device or resource busy, TearFree inactive until next modeset
[   189.828] (WW) AMDGPU(0): drmmode_wait_vblank failed for scanout update: Invalid argument
[   189.828] (WW) AMDGPU(0): drmmode_wait_vblank failed for scanout update: Invalid argument

The "flip queue failed" message appears only once; the other two appear
much more often.

System is a Carrizo A10-8700B, kernel 4.17.13 + this patch:
https://bugzilla.kernel.org/attachment.cgi?id=276173


-- 
Regards,
  Johannes


Re: BUG: KASAN: use-after-free in amdgpu_job_free_cb

2018-01-14 Thread Johannes Hirte

On 2018 Jan 14, Grodzovsky, Andrey wrote:
> To be sure it was inserted at the correct place please send me output of git 
> diff on your modified branch.
> 
> Thanks,
> Andrey
> 

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index bb5fa895fb64..bc2050a5a5c6 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4802,7 +4802,7 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev,
 * synchronization events.
 */

-   if (lock_and_validation_needed) {
+   if (lock_and_validation_needed || state->legacy_cursor_update == true) {

ret = do_aquire_global_lock(dev, state);
if (ret)

In case it matters, I've applied the patch on top of 4.15-rc7 together with
your "Fix: Save job's priority on it's creation instead of accessing it from
s_entity later on." patch. That one is still not upstream, but without it I
see the other use-after-free.

-- 
Regards,
  Johannes


Re: BUG: KASAN: use-after-free in amdgpu_job_free_cb

2018-01-13 Thread Johannes Hirte

On 2018 Jan 12, Andrey Grodzovsky wrote:
> Yea, I know , just dumped diff of one file into it, please search in 
> code for
> 
> "ret = do_aquire_global_lock(dev, state);" it appears only in one place 
> in entire code base, and manually apply the one line change.
>

with patch applied:

[ 6887.679618] [drm] {1920x1080, 2250x1132@152840Khz}
[ 6887.806430] [drm] HBRx2 pass VS=1, PE=0
[12432.070076] [drm] {1920x1080, 2250x1132@152840Khz}
[12432.194472] [drm] HBRx2 pass VS=1, PE=0
[13677.257767] 
==
[13677.257812] BUG: KASAN: use-after-free in drm_atomic_helper_wait_for_flip_done+0x24f/0x270
[13677.257820] Read of size 8 at addr 8803f0533388 by task kworker/u8:6/22172

[13677.257832] CPU: 2 PID: 22172 Comm: kworker/u8:6 Not tainted 4.15.0-rc7-2-g617b2907a7aa #445
[13677.257837] Hardware name: HP HP ProBook 645 G2/80FE, BIOS N77 Ver. 01.10 10/12/2017
[13677.257848] Workqueue: events_unbound commit_work
[13677.257853] Call Trace:
[13677.257867]  dump_stack+0x99/0x11e
[13677.257874]  ? _atomic_dec_and_lock+0x152/0x152
[13677.257886]  print_address_description+0x65/0x270
[13677.257892]  kasan_report+0x272/0x360
[13677.257898]  ? drm_atomic_helper_wait_for_flip_done+0x24f/0x270
[13677.257903]  drm_atomic_helper_wait_for_flip_done+0x24f/0x270
[13677.257913]  amdgpu_dm_atomic_commit_tail+0x185e/0x2b90
[13677.257923]  ? dm_crtc_duplicate_state+0x130/0x130
[13677.257931]  ? trace_raw_output_rcu_utilization+0xa0/0xa0
[13677.257939]  ? drm_atomic_helper_wait_for_dependencies+0x3f2/0x800
[13677.257945]  commit_tail+0x92/0xe0
[13677.257953]  process_one_work+0x84b/0x1600
[13677.257961]  ? tick_nohz_dep_clear_signal+0x20/0x20
[13677.257969]  ? _raw_spin_unlock_irq+0xbe/0x120
[13677.257973]  ? _raw_spin_unlock+0x120/0x120
[13677.257977]  ? pwq_dec_nr_in_flight+0x3c0/0x3c0
[13677.257984]  ? arch_vtime_task_switch+0xee/0x190
[13677.257991]  ? finish_task_switch+0x27d/0x7f0
[13677.257995]  ? wq_worker_waking_up+0xc0/0xc0
[13677.258000]  ? copy_overflow+0x20/0x20
[13677.258010]  ? pci_mmcfg_check_reserved+0x100/0x100
[13677.258014]  ? pci_mmcfg_check_reserved+0x100/0x100
[13677.258022]  ? schedule+0xfb/0x3b0
[13677.258027]  ? __schedule+0x19b0/0x19b0
[13677.258031]  ? preempt_schedule_common+0x30/0xb0
[13677.258038]  ? ___preempt_schedule+0x16/0x18
[13677.258043]  ? _raw_spin_unlock_irq+0xfa/0x120
[13677.258047]  ? _raw_spin_unlock+0x120/0x120
[13677.258052]  worker_thread+0x211/0x1790
[13677.258060]  ? pick_next_task_fair+0x313/0x10f0
[13677.258065]  ? trace_event_raw_event_workqueue_work+0x170/0x170
[13677.258073]  ? cyc2ns_read_end+0x20/0x20
[13677.258078]  ? tick_nohz_dep_clear_signal+0x20/0x20
[13677.258083]  ? get_vtime_delta+0x16/0xd0
[13677.258087]  ? _raw_spin_unlock_irq+0xbe/0x120
[13677.258091]  ? _raw_spin_unlock+0x120/0x120
[13677.258098]  ? finish_task_switch+0x27d/0x7f0
[13677.258104]  ? sched_clock_cpu+0x18/0x1e0
[13677.258110]  ? ret_from_fork+0x1f/0x30
[13677.258116]  ? pci_mmcfg_check_reserved+0x100/0x100
[13677.258120]  ? get_vtime_delta+0x16/0xd0
[13677.258125]  ? cyc2ns_read_end+0x20/0x20
[13677.258131]  ? schedule+0xfb/0x3b0
[13677.258136]  ? __schedule+0x19b0/0x19b0
[13677.258141]  ? remove_wait_queue+0x2b0/0x2b0
[13677.258146]  ? arch_vtime_task_switch+0xee/0x190
[13677.258151]  ? _raw_spin_unlock_irqrestore+0xc2/0x130
[13677.258156]  ? _raw_spin_unlock_irq+0x120/0x120
[13677.258162]  ? trace_event_raw_event_workqueue_work+0x170/0x170
[13677.258167]  kthread+0x2d4/0x390
[13677.258172]  ? kthread_create_worker+0xd0/0xd0
[13677.258177]  ret_from_fork+0x1f/0x30

[13677.258188] Allocated by task 2377:
[13677.258196]  kasan_kmalloc+0xa0/0xd0
[13677.258202]  kmem_cache_alloc_trace+0xd1/0x1e0
[13677.258208]  dm_crtc_duplicate_state+0x73/0x130
[13677.258214]  drm_atomic_get_crtc_state+0x13c/0x400
[13677.258218]  page_flip_common+0x52/0x230
[13677.258223]  drm_atomic_helper_page_flip+0xa1/0x100
[13677.258230]  drm_mode_page_flip_ioctl+0xc10/0x1030
[13677.258236]  drm_ioctl_kernel+0x1b5/0x2c0
[13677.258240]  drm_ioctl+0x709/0xa00
[13677.258245]  amdgpu_drm_ioctl+0x118/0x280
[13677.258250]  do_vfs_ioctl+0x18a/0x1260
[13677.258254]  SyS_ioctl+0x6f/0x80
[13677.258258]  do_syscall_64+0x220/0x670
[13677.258262]  return_from_SYSCALL_64+0x0/0x65

[13677.258267] Freed by task 2523:
[13677.258273]  kasan_slab_free+0x71/0xc0
[13677.258276]  kfree+0x88/0x1b0
[13677.258280]  drm_atomic_state_default_clear+0x2c8/0xa00
[13677.258285]  __drm_atomic_state_free+0x30/0xd0
[13677.258289]  drm_atomic_helper_update_plane+0xb6/0x350
[13677.258293]  __setplane_internal+0x5b4/0x9d0
[13677.258297]  drm_mode_cursor_universal+0x412/0xc60
[13677.258301]  drm_mode_cursor_common+0x4b6/0x890
[13677.258305]  drm_mode_cursor_ioctl+0xd3/0x120
[13677.258309]  drm_ioctl_kernel+0x1b5/0x2c0
[13677.258313]  drm_ioctl+0x709/0xa00
[13677.258316]  amdgpu_drm_ioctl+0x118/0x280
[13677.258319]  do_vfs_ioctl+0x18a/0x1260
[13677.258323]  SyS_ioctl+0x6f/0x80
[13677.258326]

Re: BUG: KASAN: use-after-free in amdgpu_job_free_cb

2018-01-12 Thread Johannes Hirte

On 2018 Jan 12, Andrey Grodzovsky wrote:
> Hi, this looks to me like a different (unrelated) issue than the one
> Johannes reports. Your issue was already reported by someone (I can't
> remember the thread offhand) and looks like a shader hang or a GPU
> scheduler synchronization issue, while Johannes's use-after-free is a pure
> software logic issue in either the KMS atomic framework or, more probably,
> in AMDGPU/DC (DAL).
> 
> 
> Johannes, I attached a debug patch which forces the cursor update to wait
> for any page flip in progress. Can you give it a try and see if the
> issue is gone? This is not an actual fix, just a way to evaluate the cause.
> 

> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 5a70682..323d020 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -4908,7 +4908,7 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev,
>  * synchronization events.
>  */
>  
> -   if (lock_and_validation_needed) {
> +   if (lock_and_validation_needed || state->legacy_cursor_update == true) {
>  
> ret = do_aquire_global_lock(dev, state);
> if (ret)
> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c
> index a1a751b..6d6ffdf 100644
> --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c

The patch seems incomplete. 

-- 
Regards,
  Johannes


Re: BUG: KASAN: use-after-free in amdgpu_job_free_cb

2018-01-12 Thread Johannes Hirte

On 2018 Jan 11, Andrey Grodzovsky wrote:
> Thanks for the dmesg, unfortunately nothing suspicious from there.
> 
> Looking again at KASAN it hints at a race between cursor update and non 
> blocking part of flip with regard to accessing CRTC states, maybe cursor 
> update is not properly synchronized against a flip in flight on same CRTC...
> 
> P.S What is your setup ? How many displays ?
> 

It's a Carrizo A10-8700B R6 with 16G RAM, 512M assigned to the graphics
card. Only the laptop display (1920x1080) is connected via eDP, so nothing
special.
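
To make the suspected race quoted above concrete, here is a toy, single-threaded model of it. It is only a sketch of the hypothesis (a cursor update releasing a CRTC state that a queued flip commit still reads), not a confirmed root cause, and all names (toy_crtc_state, toy_commit, toy_run) are hypothetical. The freed flag stands in for kfree() so the model itself never touches freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a CRTC state object; 'freed' models kfree(). */
struct toy_crtc_state {
    bool freed;
};

/* Hypothetical stand-in for a queued, non-blocking flip commit. */
struct toy_commit {
    struct toy_crtc_state *state;   /* state captured when the flip was queued */
    bool done;
};

/* Tail of the flip commit: reads the state it was queued with.
 * Returns true if that state had already been released. */
static bool toy_commit_tail(struct toy_commit *c)
{
    assert(c != NULL && c->state != NULL);
    bool use_after_free = c->state->freed;
    c->done = true;
    return use_after_free;
}

/* Cursor update that releases the old state. With wait_for_flip set, the
 * pending flip is allowed to finish first, which is the behaviour the
 * debug patch in this thread tries to force. */
static bool toy_cursor_update(struct toy_commit *pending,
                              struct toy_crtc_state *old_state,
                              bool wait_for_flip)
{
    bool uaf = false;

    if (wait_for_flip && !pending->done)
        uaf = toy_commit_tail(pending);
    old_state->freed = true;
    return uaf;
}

/* Runs one flip plus one cursor update; true means a stale read was seen. */
static bool toy_run(bool wait_for_flip)
{
    struct toy_crtc_state state = { .freed = false };
    struct toy_commit flip = { .state = &state, .done = false };
    bool uaf = toy_cursor_update(&flip, &state, wait_for_flip);

    if (!flip.done)
        uaf = toy_commit_tail(&flip) || uaf;
    return uaf;
}
```

In this model toy_run(false) reports the stale read and toy_run(true) does not, which is the difference the debug patch is meant to expose.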

-- 
Regards,
  Johannes


Re: BUG: KASAN: use-after-free in amdgpu_job_free_cb

2018-01-11 Thread Johannes Hirte

On 2018 Jan 10, Andrey Grodzovsky wrote:
> 
> Hi, is there a particular scenario when this happens , 

Unfortunately no, I'm still searching for a reproducer. Sometimes it takes
several days until the next use-after-free.

> can you add dmesg with echo 0x10 > /sys/module/drm/parameters/debug?

I assume you want the debug output when a use-after-free happened. Here
it is:

Jan 11 23:21:33 probook kernel: [drm:drm_atomic_state_init] Allocated atomic state a67d7f62
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_get_plane_state] Added [PLANE:40:plane-4] 9b693a40 state to a67d7f62
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_get_crtc_state] Added [CRTC:41:crtc-0] fd68d0e6 state to a67d7f62
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_set_crtc_for_plane] Link plane state 9b693a40 to [CRTC:41:crtc-0]
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_set_fb_for_plane] Set [FB:48] for plane state 9b693a40
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_check_only] checking a67d7f62
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_commit] committing a67d7f62
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_state_default_clear] Clearing atomic state a67d7f62
Jan 11 23:21:33 probook kernel: [drm:__drm_atomic_state_free] Freeing atomic state a67d7f62
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_state_init] Allocated atomic state aff36e64
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_get_plane_state] Added [PLANE:40:plane-4] bef4ac0a state to aff36e64
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_get_crtc_state] Added [CRTC:41:crtc-0] 487e5e13 state to aff36e64
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_set_crtc_for_plane] Link plane state bef4ac0a to [CRTC:41:crtc-0]
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_set_fb_for_plane] Set [FB:48] for plane state bef4ac0a
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_check_only] checking aff36e64
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_commit] committing aff36e64
Jan 11 23:21:33 probook kernel: [drm:drm_atomic_state_default_clear] Clearing atomic state aff36e64
Jan 11 23:21:33 probook kernel: [drm:__drm_atomic_state_free] Freeing atomic state aff36e64
Jan 11 23:21:33 probook kernel: 
==
Jan 11 23:21:33 probook kernel: BUG: KASAN: use-after-free in 
drm_atomic_helper_wait_for_flip_done+0x24f/0x270
Jan 11 23:21:33 probook kernel: Read of size 8 at addr 8801e020d788 by task 
kworker/u8:6/18738
Jan 11 23:21:33 probook kernel: 
Jan 11 23:21:33 probook kernel: CPU: 2 PID: 18738 Comm: kworker/u8:6 Not 
tainted 4.15.0-rc7-1-gd24b113b5c00 #444
Jan 11 23:21:33 probook kernel: Hardware name: HP HP ProBook 645 G2/80FE, BIOS 
N77 Ver. 01.10 10/12/2017
Jan 11 23:21:33 probook kernel: Workqueue: events_unbound commit_work
Jan 11 23:21:33 probook kernel: Call Trace:
Jan 11 23:21:33 probook kernel:  dump_stack+0x99/0x11e
Jan 11 23:21:33 probook kernel:  ? _atomic_dec_and_lock+0x152/0x152
Jan 11 23:21:33 probook kernel:  print_address_description+0x65/0x270
Jan 11 23:21:33 probook kernel:  kasan_report+0x272/0x360
Jan 11 23:21:33 probook kernel:  ? 
drm_atomic_helper_wait_for_flip_done+0x24f/0x270
Jan 11 23:21:33 probook kernel:  
drm_atomic_helper_wait_for_flip_done+0x24f/0x270
Jan 11 23:21:33 probook kernel:  amdgpu_dm_atomic_commit_tail+0x185e/0x2b90
Jan 11 23:21:33 probook kernel:  ? dm_crtc_duplicate_state+0x130/0x130
Jan 11 23:21:33 probook kernel:  ? 
drm_atomic_helper_wait_for_dependencies+0x3f2/0x800
Jan 11 23:21:33 probook kernel:  commit_tail+0x92/0xe0
Jan 11 23:21:33 probook kernel:  process_one_work+0x84b/0x1600
Jan 11 23:21:33 probook kernel:  ? tick_nohz_dep_clear_signal+0x20/0x20
Jan 11 23:21:33 probook kernel:  ? _raw_spin_unlock_irq+0xbe/0x120
Jan 11 23:21:33 probook kernel:  ? _raw_spin_unlock+0x120/0x120
Jan 11 23:21:33 probook kernel:  ? pwq_dec_nr_in_flight+0x3c0/0x3c0
Jan 11 23:21:33 probook kernel:  ? arch_vtime_task_switch+0xee/0x190
Jan 11 23:21:33 probook kernel:  ? finish_task_switch+0x27d/0x7f0
Jan 11 23:21:33 probook kernel:  ? wq_worker_waking_up+0xc0/0xc0
Jan 11 23:21:33 probook kernel:  ? copy_overflow+0x20/0x20
Jan 11 23:21:33 probook kernel:  ? sched_clock_cpu+0x18/0x1e0
Jan 11 23:21:33 probook kernel:  ? pci_mmcfg_check_reserved+0x100/0x100
Jan 11 23:21:33 probook kernel:  ? preempt_schedule_irq+0x4e/0xb0
Jan 11 23:21:33 probook kernel:  ? schedule+0xfb/0x3b0
Jan 11 23:21:33 probook kernel:  ? __schedule+0x19b0/0x19b0
Jan 11 23:21:33 probook kernel:  ? _raw_spin_unlock_irq+0xb9/0x120
Jan 11 23:21:33 probook kernel:  ? _raw_spin_unlock_irq+0xbe/0x120
Jan 11 23:21:33 probook kernel:  ? _raw_spin_unlock+0x120/0x120
Jan 11 23:21:33 probook kernel:  worker_thread+0x211/0x1790
Jan 11 23:21:33 probook kernel:  ? 
trace_event_raw_event_workqueue_work+0x170/0x170
Jan 11 23:21:33
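
For reference, the 0x10 written to /sys/module/drm/parameters/debug in the message above is a bitmask, and bit 0x10 selects the atomic-modeset messages, which matches the drm_atomic_* lines in the log. The small user-space helper below decodes such a mask. The category values are the DRM_UT_* bits as I recall them from kernels of that era and should be checked against the tree in use; the helper name drm_debug_describe is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Debug categories of the drm.debug bitmask (DRM_UT_* in the kernel).
 * Assumption: values as of the 4.15 time frame; verify against your tree. */
struct drm_debug_bit {
    unsigned int mask;
    const char *name;
};

static const struct drm_debug_bit drm_debug_bits[] = {
    { 0x01, "core" },
    { 0x02, "driver" },
    { 0x04, "kms" },
    { 0x08, "prime" },
    { 0x10, "atomic" },
    { 0x20, "vbl" },
    { 0x40, "state" },
    { 0x80, "lease" },
};

/* Hypothetical helper: writes the comma-separated category names selected by
 * 'value' into 'buf' and returns how many categories are enabled. */
static int drm_debug_describe(unsigned int value, char *buf, size_t len)
{
    int count = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(drm_debug_bits) / sizeof(drm_debug_bits[0]); i++) {
        if (!(value & drm_debug_bits[i].mask))
            continue;
        /* strncat always terminates; the bound keeps room for the NUL. */
        if (count > 0)
            strncat(buf, ",", len - strlen(buf) - 1);
        strncat(buf, drm_debug_bits[i].name, len - strlen(buf) - 1);
        count++;
    }
    return count;
}
```

Passing 0x10 with a 64-byte buffer yields the single category "atomic"; a combined value such as 0x14 would additionally enable the KMS messages.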

Re: BUG: KASAN: use-after-free in amdgpu_job_free_cb

2018-01-09 Thread Johannes Hirte

On 2018 Jan 03, Johannes Hirte wrote:
> On 2018 Jan 03, Johannes Hirte wrote:
> > This should be fixed already with 
> > https://lists.freedesktop.org/archives/amd-gfx/2017-October/014932.html
> > but it's still missing upstream.
> > 
> 
> With this patch, the use-after-free in amdgpu_job_free_cb seems to be
> gone. But now I get a use-after-free in
> drm_atomic_helper_wait_for_flip_done:
> 
> [89387.069387] 
> ==
> [89387.069407] BUG: KASAN: use-after-free in 
> drm_atomic_helper_wait_for_flip_done+0x24f/0x270
> [89387.069413] Read of size 8 at addr 880124df0688 by task 
> kworker/u8:3/31426
> 
> [89387.069423] CPU: 1 PID: 31426 Comm: kworker/u8:3 Not tainted 
> 4.15.0-rc6-1-ge0895ba8d88e #442
> [89387.069427] Hardware name: HP HP ProBook 645 G2/80FE, BIOS N77 Ver. 01.10 
> 10/12/2017
> [89387.069435] Workqueue: events_unbound commit_work
> [89387.069440] Call Trace:
> [89387.069448]  dump_stack+0x99/0x11e
> [89387.069453]  ? _atomic_dec_and_lock+0x152/0x152
> [89387.069460]  print_address_description+0x65/0x270
> [89387.069465]  kasan_report+0x272/0x360
> [89387.069470]  ? drm_atomic_helper_wait_for_flip_done+0x24f/0x270
> [89387.069475]  drm_atomic_helper_wait_for_flip_done+0x24f/0x270
> [89387.069483]  amdgpu_dm_atomic_commit_tail+0x185e/0x2b90
> [89387.069492]  ? dm_crtc_duplicate_state+0x130/0x130
> [89387.069498]  ? drm_atomic_helper_wait_for_dependencies+0x3f2/0x800
> [89387.069504]  commit_tail+0x92/0xe0
> [89387.069511]  process_one_work+0x84b/0x1600
> [89387.069517]  ? tick_nohz_dep_clear_signal+0x20/0x20
> [89387.069522]  ? _raw_spin_unlock_irq+0xbe/0x120
> [89387.069525]  ? _raw_spin_unlock+0x120/0x120
> [89387.069529]  ? pwq_dec_nr_in_flight+0x3c0/0x3c0
> [89387.069534]  ? arch_vtime_task_switch+0xee/0x190
> [89387.069539]  ? finish_task_switch+0x27d/0x7f0
> [89387.069542]  ? wq_worker_waking_up+0xc0/0xc0
> [89387.069547]  ? copy_overflow+0x20/0x20
> [89387.069550]  ? sched_clock_cpu+0x18/0x1e0
> [89387.069558]  ? pci_mmcfg_check_reserved+0x100/0x100
> [89387.069562]  ? pci_mmcfg_check_reserved+0x100/0x100
> [89387.069569]  ? schedule+0xfb/0x3b0
> [89387.069574]  ? __schedule+0x19b0/0x19b0
> [89387.069578]  ? _raw_spin_unlock_irq+0xb9/0x120
> [89387.069582]  ? _raw_spin_unlock_irq+0xbe/0x120
> [89387.069585]  ? _raw_spin_unlock+0x120/0x120
> [89387.069590]  worker_thread+0x211/0x1790
> [89387.069597]  ? pick_next_task_fair+0x313/0x10f0
> [89387.069601]  ? trace_event_raw_event_workqueue_work+0x170/0x170
> [89387.069606]  ? __read_once_size_nocheck.constprop.6+0x10/0x10
> [89387.069612]  ? tick_nohz_dep_clear_signal+0x20/0x20
> [89387.069616]  ? account_idle_time+0x94/0x1f0
> [89387.069620]  ? _raw_spin_unlock_irq+0xbe/0x120
> [89387.069623]  ? _raw_spin_unlock+0x120/0x120
> [89387.069628]  ? finish_task_switch+0x27d/0x7f0
> [89387.069633]  ? sched_clock_cpu+0x18/0x1e0
> [89387.069639]  ? ret_from_fork+0x1f/0x30
> [89387.069644]  ? pci_mmcfg_check_reserved+0x100/0x100
> [89387.069650]  ? cyc2ns_read_end+0x20/0x20
> [89387.069657]  ? schedule+0xfb/0x3b0
> [89387.069662]  ? __schedule+0x19b0/0x19b0
> [89387.069666]  ? remove_wait_queue+0x2b0/0x2b0
> [89387.069670]  ? arch_vtime_task_switch+0xee/0x190
> [89387.069675]  ? _raw_spin_unlock_irqrestore+0xc2/0x130
> [89387.069679]  ? _raw_spin_unlock_irq+0x120/0x120
> [89387.069683]  ? trace_event_raw_event_workqueue_work+0x170/0x170
> [89387.069688]  kthread+0x2d4/0x390
> [89387.069693]  ? kthread_create_worker+0xd0/0xd0
> [89387.069697]  ret_from_fork+0x1f/0x30
> 
> [89387.069705] Allocated by task 2387:
> [89387.069712]  kasan_kmalloc+0xa0/0xd0
> [89387.069717]  kmem_cache_alloc_trace+0xd1/0x1e0
> [89387.069722]  dm_crtc_duplicate_state+0x73/0x130
> [89387.069726]  drm_atomic_get_crtc_state+0x13c/0x400
> [89387.069730]  page_flip_common+0x52/0x230
> [89387.069734]  drm_atomic_helper_page_flip+0xa1/0x100
> [89387.069739]  drm_mode_page_flip_ioctl+0xc10/0x1030
> [89387.069744]  drm_ioctl_kernel+0x1b5/0x2c0
> [89387.069748]  drm_ioctl+0x709/0xa00
> [89387.069752]  amdgpu_drm_ioctl+0x118/0x280
> [89387.069756]  do_vfs_ioctl+0x18a/0x1260
> [89387.069760]  SyS_ioctl+0x6f/0x80
> [89387.069764]  do_syscall_64+0x220/0x670
> [89387.069768]  return_from_SYSCALL_64+0x0/0x65
> 
> [89387.069772] Freed by task 2533:
> [89387.069776]  kasan_slab_free+0x71/0xc0
> [89387.069780]  kfree+0x88/0x1b0
> [89387.069784]  drm_atomic_state_default_clear+0x2c8/0xa00
> [89387.069787]  __drm_atomic_state_free+0x30/0xd0
> [89387.069791]  drm_atomic_helper_update_plane+0xb6/0x350
> [89387.069794]  __setplane_internal+0x5b4/0x9d0
> [89387.069798]  drm_mode_cursor_universal+0x412/0xc60
> [89387.0

Re: BUG: KASAN: use-after-free in amdgpu_job_free_cb

2018-01-03 Thread Johannes Hirte

On 2018 Jan 03, Johannes Hirte wrote:
> This should be fixed already with 
> https://lists.freedesktop.org/archives/amd-gfx/2017-October/014932.html
> but it's still missing upstream.
> 

With this patch, the use-after-free in amdgpu_job_free_cb seems to be
gone. But now I get a use-after-free in
drm_atomic_helper_wait_for_flip_done:

[89387.069387] 
==
[89387.069407] BUG: KASAN: use-after-free in 
drm_atomic_helper_wait_for_flip_done+0x24f/0x270
[89387.069413] Read of size 8 at addr 880124df0688 by task 
kworker/u8:3/31426

[89387.069423] CPU: 1 PID: 31426 Comm: kworker/u8:3 Not tainted 
4.15.0-rc6-1-ge0895ba8d88e #442
[89387.069427] Hardware name: HP HP ProBook 645 G2/80FE, BIOS N77 Ver. 01.10 
10/12/2017
[89387.069435] Workqueue: events_unbound commit_work
[89387.069440] Call Trace:
[89387.069448]  dump_stack+0x99/0x11e
[89387.069453]  ? _atomic_dec_and_lock+0x152/0x152
[89387.069460]  print_address_description+0x65/0x270
[89387.069465]  kasan_report+0x272/0x360
[89387.069470]  ? drm_atomic_helper_wait_for_flip_done+0x24f/0x270
[89387.069475]  drm_atomic_helper_wait_for_flip_done+0x24f/0x270
[89387.069483]  amdgpu_dm_atomic_commit_tail+0x185e/0x2b90
[89387.069492]  ? dm_crtc_duplicate_state+0x130/0x130
[89387.069498]  ? drm_atomic_helper_wait_for_dependencies+0x3f2/0x800
[89387.069504]  commit_tail+0x92/0xe0
[89387.069511]  process_one_work+0x84b/0x1600
[89387.069517]  ? tick_nohz_dep_clear_signal+0x20/0x20
[89387.069522]  ? _raw_spin_unlock_irq+0xbe/0x120
[89387.069525]  ? _raw_spin_unlock+0x120/0x120
[89387.069529]  ? pwq_dec_nr_in_flight+0x3c0/0x3c0
[89387.069534]  ? arch_vtime_task_switch+0xee/0x190
[89387.069539]  ? finish_task_switch+0x27d/0x7f0
[89387.069542]  ? wq_worker_waking_up+0xc0/0xc0
[89387.069547]  ? copy_overflow+0x20/0x20
[89387.069550]  ? sched_clock_cpu+0x18/0x1e0
[89387.069558]  ? pci_mmcfg_check_reserved+0x100/0x100
[89387.069562]  ? pci_mmcfg_check_reserved+0x100/0x100
[89387.069569]  ? schedule+0xfb/0x3b0
[89387.069574]  ? __schedule+0x19b0/0x19b0
[89387.069578]  ? _raw_spin_unlock_irq+0xb9/0x120
[89387.069582]  ? _raw_spin_unlock_irq+0xbe/0x120
[89387.069585]  ? _raw_spin_unlock+0x120/0x120
[89387.069590]  worker_thread+0x211/0x1790
[89387.069597]  ? pick_next_task_fair+0x313/0x10f0
[89387.069601]  ? trace_event_raw_event_workqueue_work+0x170/0x170
[89387.069606]  ? __read_once_size_nocheck.constprop.6+0x10/0x10
[89387.069612]  ? tick_nohz_dep_clear_signal+0x20/0x20
[89387.069616]  ? account_idle_time+0x94/0x1f0
[89387.069620]  ? _raw_spin_unlock_irq+0xbe/0x120
[89387.069623]  ? _raw_spin_unlock+0x120/0x120
[89387.069628]  ? finish_task_switch+0x27d/0x7f0
[89387.069633]  ? sched_clock_cpu+0x18/0x1e0
[89387.069639]  ? ret_from_fork+0x1f/0x30
[89387.069644]  ? pci_mmcfg_check_reserved+0x100/0x100
[89387.069650]  ? cyc2ns_read_end+0x20/0x20
[89387.069657]  ? schedule+0xfb/0x3b0
[89387.069662]  ? __schedule+0x19b0/0x19b0
[89387.069666]  ? remove_wait_queue+0x2b0/0x2b0
[89387.069670]  ? arch_vtime_task_switch+0xee/0x190
[89387.069675]  ? _raw_spin_unlock_irqrestore+0xc2/0x130
[89387.069679]  ? _raw_spin_unlock_irq+0x120/0x120
[89387.069683]  ? trace_event_raw_event_workqueue_work+0x170/0x170
[89387.069688]  kthread+0x2d4/0x390
[89387.069693]  ? kthread_create_worker+0xd0/0xd0
[89387.069697]  ret_from_fork+0x1f/0x30

[89387.069705] Allocated by task 2387:
[89387.069712]  kasan_kmalloc+0xa0/0xd0
[89387.069717]  kmem_cache_alloc_trace+0xd1/0x1e0
[89387.069722]  dm_crtc_duplicate_state+0x73/0x130
[89387.069726]  drm_atomic_get_crtc_state+0x13c/0x400
[89387.069730]  page_flip_common+0x52/0x230
[89387.069734]  drm_atomic_helper_page_flip+0xa1/0x100
[89387.069739]  drm_mode_page_flip_ioctl+0xc10/0x1030
[89387.069744]  drm_ioctl_kernel+0x1b5/0x2c0
[89387.069748]  drm_ioctl+0x709/0xa00
[89387.069752]  amdgpu_drm_ioctl+0x118/0x280
[89387.069756]  do_vfs_ioctl+0x18a/0x1260
[89387.069760]  SyS_ioctl+0x6f/0x80
[89387.069764]  do_syscall_64+0x220/0x670
[89387.069768]  return_from_SYSCALL_64+0x0/0x65

[89387.069772] Freed by task 2533:
[89387.069776]  kasan_slab_free+0x71/0xc0
[89387.069780]  kfree+0x88/0x1b0
[89387.069784]  drm_atomic_state_default_clear+0x2c8/0xa00
[89387.069787]  __drm_atomic_state_free+0x30/0xd0
[89387.069791]  drm_atomic_helper_update_plane+0xb6/0x350
[89387.069794]  __setplane_internal+0x5b4/0x9d0
[89387.069798]  drm_mode_cursor_universal+0x412/0xc60
[89387.069801]  drm_mode_cursor_common+0x4b6/0x890
[89387.069805]  drm_mode_cursor_ioctl+0xd3/0x120
[89387.069809]  drm_ioctl_kernel+0x1b5/0x2c0
[89387.069813]  drm_ioctl+0x709/0xa00
[89387.069816]  amdgpu_drm_ioctl+0x118/0x280
[89387.069819]  do_vfs_ioctl+0x18a/0x1260
[89387.069822]  SyS_ioctl+0x6f/0x80
[89387.069824]  do_syscall_64+0x220/0x670
[89387.069828]  return_from_SYSCALL_64+0x0/0x65

[89387.069834] The buggy address belongs to the object at 880124df0480
[89387.069839] The buggy address is located 520 bytes inside of
[89387.06984

BUG: KASAN: use-after-free in amdgpu_job_free_cb

2018-01-03 Thread Johannes Hirte

I still get a use-after-free with linux-4.15-rc6:

[   16.788943] 
==
[   16.788968] BUG: KASAN: use-after-free in amdgpu_job_free_cb+0x140/0x150
[   16.788975] Read of size 8 at addr 8803dfe4b3c8 by task kworker/0:2/1355

[   16.788986] CPU: 0 PID: 1355 Comm: kworker/0:2 Not tainted 4.15.0-rc6 #438
[   16.788990] Hardware name: HP HP ProBook 645 G2/80FE, BIOS N77 Ver. 01.10 10/12/2017
[   16.788998] Workqueue: events amd_sched_job_finish
[   16.789003] Call Trace:
[   16.789012]  dump_stack+0x99/0x11e
[   16.789018]  ? _atomic_dec_and_lock+0x152/0x152
[   16.789026]  print_address_description+0x65/0x270
[   16.789032]  kasan_report+0x272/0x360
[   16.789038]  ? amdgpu_job_free_cb+0x140/0x150
[   16.789043]  amdgpu_job_free_cb+0x140/0x150
[   16.789049]  amd_sched_job_finish+0x288/0x560
[   16.789055]  ? amd_sched_process_job+0x220/0x220
[   16.789061]  ? __queue_delayed_work+0x211/0x360
[   16.789067]  ? pick_next_task_fair+0xcff/0x10f0
[   16.789073]  ? _raw_spin_unlock_irq+0xbe/0x120
[   16.789077]  ? _raw_spin_unlock+0x120/0x120
[   16.789082]  process_one_work+0x84b/0x1600
[   16.789088]  ? tick_nohz_dep_clear_signal+0x20/0x20
[   16.789093]  ? _raw_spin_unlock_irq+0xbe/0x120
[   16.789097]  ? _raw_spin_unlock+0x120/0x120
[   16.789101]  ? pwq_dec_nr_in_flight+0x3c0/0x3c0
[   16.789107]  ? compat_start_thread+0x70/0x70
[   16.789111]  ? cyc2ns_read_end+0x20/0x20
[   16.789117]  ? finish_task_switch+0x27d/0x7f0
[   16.789121]  ? wq_worker_waking_up+0xc0/0xc0
[   16.789127]  ? sched_clock_cpu+0x18/0x1e0
[   16.789133]  ? task_change_group_fair+0x7e0/0x7e0
[   16.789139]  ? pci_mmcfg_check_reserved+0x100/0x100
[   16.789143]  ? load_balance+0x3120/0x3120
[   16.789148]  ? perf_event_exit_task+0x91f/0xe20
[   16.789156]  ? schedule+0xfb/0x3b0
[   16.789160]  ? __schedule+0x19b0/0x19b0
[   16.789165]  ? _raw_spin_unlock_irq+0xb9/0x120
[   16.789169]  ? _raw_spin_unlock_irq+0xbe/0x120
[   16.789172]  ? _raw_spin_unlock+0x120/0x120
[   16.789177]  worker_thread+0x211/0x1790
[   16.789184]  ? pick_next_task_fair+0x97d/0x10f0
[   16.789188]  ? trace_event_raw_event_workqueue_work+0x170/0x170
[   16.789194]  ? tick_nohz_dep_clear_signal+0x20/0x20
[   16.789199]  ? _raw_spin_unlock_irq+0xbe/0x120
[   16.789202]  ? _raw_spin_unlock+0x120/0x120
[   16.789207]  ? compat_start_thread+0x70/0x70
[   16.789212]  ? finish_task_switch+0x27d/0x7f0
[   16.789217]  ? sched_clock_cpu+0x18/0x1e0
[   16.789223]  ? ret_from_fork+0x1f/0x30
[   16.789228]  ? pci_mmcfg_check_reserved+0x100/0x100
[   16.789233]  ? get_task_cred+0x210/0x210
[   16.789238]  ? cyc2ns_read_end+0x20/0x20
[   16.789245]  ? schedule+0xfb/0x3b0
[   16.789249]  ? __schedule+0x19b0/0x19b0
[   16.789254]  ? remove_wait_queue+0x2b0/0x2b0
[   16.789258]  ? arch_vtime_task_switch+0xee/0x190
[   16.789263]  ? _raw_spin_unlock_irqrestore+0xc2/0x130
[   16.789267]  ? _raw_spin_unlock_irq+0x120/0x120
[   16.789273]  ? trace_event_raw_event_workqueue_work+0x170/0x170
[   16.789277]  kthread+0x2d4/0x390
[   16.789282]  ? kthread_create_worker+0xd0/0xd0
[   16.789286]  ? umh_complete+0x60/0x60
[   16.789290]  ret_from_fork+0x1f/0x30

[   16.789298] Allocated by task 2385:
[   16.789304]  kasan_kmalloc+0xa0/0xd0
[   16.789309]  kmem_cache_alloc_trace+0xd1/0x1e0
[   16.789314]  amdgpu_driver_open_kms+0x12b/0x4d0
[   16.789320]  drm_open+0x7c3/0x1100
[   16.789324]  drm_stub_open+0x2a8/0x400
[   16.789329]  chrdev_open+0x1eb/0x5a0
[   16.789333]  do_dentry_open+0x5a1/0xc50
[   16.789337]  path_openat+0x11d3/0x4e90
[   16.789341]  do_filp_open+0x239/0x3c0
[   16.789344]  do_sys_open+0x402/0x630
[   16.789349]  do_syscall_64+0x220/0x670
[   16.789353]  return_from_SYSCALL_64+0x0/0x65

[   16.789357] Freed by task 2541:
[   16.789362]  kasan_slab_free+0x71/0xc0
[   16.789365]  kfree+0x88/0x1b0
[   16.789369]  amdgpu_driver_postclose_kms+0x469/0x860
[   16.789373]  drm_release+0x8a8/0x1180
[   16.789377]  __fput+0x2ab/0x730
[   16.789380]  task_work_run+0x14b/0x200
[   16.789384]  exit_to_usermode_loop+0x151/0x180
[   16.789387]  do_syscall_64+0x4ed/0x670
[   16.789391]  return_from_SYSCALL_64+0x0/0x65

[   16.789397] The buggy address belongs to the object at 8803dfe4b300
[   16.789403] The buggy address is located 200 bytes inside of
[   16.789406] The buggy address belongs to the page:
[   16.789413] page:4ccd276f count:1 mapcount:0 mapping:  (null) index:0x0 compound_mapcount: 0
[   16.789421] flags: 0x20008100(slab|head)
[   16.789428] raw: 20008100   
0001000f000f
[   16.789433] raw: dead0100 dead0200 8803f3002a80 

[   16.789436] page dumped because: kasan: bad access detected

[   16.789441] Memory state around the buggy address:
[   16.789445]  8803dfe4b280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   16.789449]  8803dfe4b300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Re: Fixes for 4.15-rc1

2017-11-28 Thread Johannes Hirte

On 2017 Nov 28, Harry Wentland wrote:
> Hi Alex,
> 
> I cherry-picked a bunch of fixes for 4.15. These can be found at 
> hwentlan/4.15-rc1-fixes.
> 
> Of the changes the highlighted ones (with *) in particular are highly 
> recommended, but even the other ones are probably good to have.
> 
> * af54c36e0c30 drm/amd/display: Do not put drm_atomic_state on resume

This one is really needed, because it fixes a use-after-free. See this
thread: https://lists.freedesktop.org/archives/amd-gfx/2017-November/016236.html

Additionally, another use-after-free is still waiting for a fix in 4.15-rc:
https://lists.freedesktop.org/archives/amd-gfx/2017-October/014827.html

-- 
Regards,
  Johannes

Re: Kernel crash/Null pointer dereference on vblank

2017-11-23 Thread Johannes Hirte

On 2017 Nov 23, Leo Li wrote:
> Hi Johannes,
> 
> The s3 resume issue looks to be a problem with amdgpu/display. Could you 
> give the attached patch a try?
> 
> Thanks,
> Leo
> 
> On 2017-11-23 07:27 AM, Johannes Hirte wrote:
> > On 2017 Nov 23, Chunming Zhou wrote:
> >> See the attached email, they fixed same issue, each of them is ok to fix
> >> your issue, your calltrace is  same as the second.
> >>
> >> We should already push the first patch in early time, could you check if
> >> the first patch is in your branch?
> >>
> > 
> > > This patch (series) is not upstream yet. Just tested it, but this doesn't fix
> > > the use-after-free on S3 resume with dc enabled.
> > 

> From 8656ef112d53f8c08f6571dd0d093f03d2e6cc30 Mon Sep 17 00:00:00 2001
> From: "Leo (Sunpeng) Li" <sunpeng...@amd.com>
> Date: Thu, 16 Nov 2017 15:17:27 -0500
> Subject: [PATCH] drm/amdgpu/display: Do not put drm_atomic_state on resume
> 
> drm_atomic_helper_resume now puts it for us. See relevant patch here:
> https://lists.freedesktop.org/archives/dri-devel/2017-October/154268.html
> 
> Change-Id: Ief246492f721a1cf281d48e9d1a7029e5cefc2da
> Signed-off-by: Leo (Sunpeng) Li <sunpeng...@amd.com>
> ---
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 5731167..951ea77 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -688,7 +688,6 @@ int amdgpu_dm_display_resume(struct amdgpu_device *adev)
>  
>   ret = drm_atomic_helper_resume(ddev, adev->dm.cached_state);
>  
> - drm_atomic_state_put(adev->dm.cached_state);
>   adev->dm.cached_state = NULL;
>  
>   amdgpu_dm_irq_resume_late(adev);
> -- 
> 2.7.4
> 

Looks good. With this patch the use-after-free is gone and S3 resume works as
expected.

You can add my Tested-by.
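
For readers following the thread, here is a minimal sketch of why the extra put matters. It assumes, as the commit message above states, that the resume helper now drops the reference itself. It is a toy model and not kernel code: `toy_state`, `toy_state_put`, `toy_helper_resume` and the other `toy_*` names are hypothetical stand-ins, and the object is only flagged as released rather than actually freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a refcounted object such as the cached atomic state.
 * All names are hypothetical; nothing here is the real DRM API. */
struct toy_state {
	int refcount;            /* outstanding references */
	bool released;           /* set once the last reference is dropped */
	int puts_after_release;  /* puts that arrived after release */
};

static void toy_state_init(struct toy_state *s)
{
	s->refcount = 1;         /* the caller holds the only reference */
	s->released = false;
	s->puts_after_release = 0;
}

static void toy_state_put(struct toy_state *s)
{
	if (s->released) {
		/* In a real kernel this put would touch freed memory. */
		s->puts_after_release++;
		return;
	}
	if (--s->refcount == 0)
		s->released = true;  /* a real object would be freed here */
}

/* Models a resume helper that consumes the caller's reference. */
static int toy_helper_resume(struct toy_state *s)
{
	assert(s != NULL);
	toy_state_put(s);
	return 0;
}

/* Pre-patch flow: the caller drops the reference a second time. */
static int toy_resume_buggy(struct toy_state *s)
{
	int ret = toy_helper_resume(s);

	toy_state_put(s);
	return ret;
}

/* Post-patch flow: the caller leaves the put to the helper. */
static int toy_resume_fixed(struct toy_state *s)
{
	return toy_helper_resume(s);
}

/* Runs one flow and reports how many puts hit a released object. */
static int toy_count_stray_puts(bool apply_fix)
{
	struct toy_state s;

	toy_state_init(&s);
	if (apply_fix)
		toy_resume_fixed(&s);
	else
		toy_resume_buggy(&s);
	return s.puts_after_release;
}
```

Calling `toy_count_stray_puts(false)` models the flow before the patch, and `toy_count_stray_puts(true)` the flow with the `drm_atomic_state_put()` line removed. In this model only the former records a put on an already released object.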

-- 
Regards,
  Johannes

Re: Kernel crash/Null pointer dereference on vblank

2017-11-23 Thread Johannes Hirte

On 2017 Nov 23, Chunming Zhou wrote:
> See the attached email, they fixed same issue, each of them is ok to fix 
> your issue, your calltrace is  same as the second.
> 
> We should already push the first patch in early time, could you check if 
> the first patch is in your branch?
>

This patch (series) is not upstream yet. Just tested it, but this doesn't fix
the use-after-free on S3 resume with dc enabled.

-- 
Regards,
  Johannes

Re: Kernel crash/Null pointer dereference on vblank

2017-11-23 Thread Johannes Hirte

On 2017 Nov 23, Chunming Zhou wrote:
> Which driver are you using?
> 
> I guess your driver is a bit old, the issue should be fixed before.
> 

This was with git master from Linus. But even with the latest changes
from agd5f/drm-next-4.15, both use-after-frees still persist. If there are
fixes for this, they're not available upstream.

-- 
Regards,
  Johannes

Re: Kernel crash/Null pointer dereference on vblank

2017-11-22 Thread Johannes Hirte

OK, now I have another use-after-free report, this time without dc. I
don't know if this is related, but until now I hadn't seen any runtime
errors without dc.

KASAN report:

[22697.845475] 
==
[22697.845495] BUG: KASAN: use-after-free in amdgpu_job_free_cb+0x140/0x150
[22697.845500] Read of size 8 at addr 8801c02e91c8 by task kworker/0:2/22547

[22697.845509] CPU: 0 PID: 22547 Comm: kworker/0:2 Not tainted 4.14.0-11095-g0c86a6bd85ff #404
[22697.845513] Hardware name: HP HP ProBook 645 G2/80FE, BIOS N77 Ver. 01.09 06/09/2017
[22697.845520] Workqueue: events amd_sched_job_finish
[22697.845525] Call Trace:
[22697.845534]  dump_stack+0x99/0x11e
[22697.845541]  ? _atomic_dec_and_lock+0x152/0x152
[22697.845548]  print_address_description+0x65/0x270
[22697.845553]  kasan_report+0x272/0x360
[22697.845557]  ? amdgpu_job_free_cb+0x140/0x150
[22697.845562]  amdgpu_job_free_cb+0x140/0x150
[22697.845566]  amd_sched_job_finish+0x288/0x560
[22697.845571]  ? amd_sched_process_job+0x220/0x220
[22697.845576]  ? amdgpu_unpin_work_func+0x266/0x460
[22697.845582]  ? _raw_spin_unlock_irq+0xbe/0x120
[22697.845587]  ? _raw_spin_unlock+0x120/0x120
[22697.845593]  process_one_work+0x84b/0x1600
[22697.845599]  ? tick_nohz_dep_clear_signal+0x20/0x20
[22697.845603]  ? _raw_spin_unlock_irq+0xbe/0x120
[22697.845607]  ? _raw_spin_unlock+0x120/0x120
[22697.845611]  ? pwq_dec_nr_in_flight+0x3c0/0x3c0
[22697.845617]  ? release_thread+0xa0/0xe0
[22697.845621]  ? cyc2ns_read_end+0x20/0x20
[22697.845626]  ? finish_task_switch+0x27d/0x7f0
[22697.845630]  ? wq_worker_waking_up+0xc0/0xc0
[22697.845640]  ? pci_mmcfg_check_reserved+0x100/0x100
[22697.845644]  ? pci_mmcfg_check_reserved+0x100/0x100
[22697.845648]  ? preempt_schedule_irq+0x4e/0xb0
[22697.845653]  ? retint_kernel+0x1b/0x1d
[22697.845659]  ? schedule+0xfb/0x3b0
[22697.845663]  ? __schedule+0x19b0/0x19b0
[22697.845669]  ? _raw_spin_unlock_irq+0xb9/0x120
[22697.845674]  ? _raw_spin_unlock_irq+0xbe/0x120
[22697.845678]  ? _raw_spin_unlock+0x120/0x120
[22697.845683]  worker_thread+0x211/0x1790
[22697.845692]  ? pick_next_task_fair+0x97d/0x10f0
[22697.845697]  ? trace_event_raw_event_workqueue_work+0x170/0x170
[22697.845703]  ? tick_nohz_dep_clear_signal+0x20/0x20
[22697.845708]  ? _raw_spin_unlock_irq+0xbe/0x120
[22697.845713]  ? _raw_spin_unlock+0x120/0x120
[22697.845718]  ? compat_start_thread+0x70/0x70
[22697.845722]  ? finish_task_switch+0x27d/0x7f0
[22697.845727]  ? sched_clock_cpu+0x18/0x1e0
[22697.845733]  ? ret_from_fork+0x1f/0x30
[22697.845739]  ? pci_mmcfg_check_reserved+0x100/0x100
[22697.845744]  ? unix_write_space+0x410/0x410
[22697.845749]  ? cyc2ns_read_end+0x20/0x20
[22697.845755]  ? schedule+0xfb/0x3b0
[22697.845759]  ? __schedule+0x19b0/0x19b0
[22697.845765]  ? remove_wait_queue+0x2b0/0x2b0
[22697.845770]  ? arch_vtime_task_switch+0xee/0x190
[22697.845774]  ? _raw_spin_unlock_irqrestore+0xc2/0x130
[22697.845778]  ? _raw_spin_unlock_irq+0x120/0x120
[22697.845783]  ? trace_event_raw_event_workqueue_work+0x170/0x170
[22697.845788]  kthread+0x2d4/0x390
[22697.845793]  ? kthread_create_worker+0xd0/0xd0
[22697.845797]  ret_from_fork+0x1f/0x30

[22697.845809] Allocated by task 2378:
[22697.845817]  kasan_kmalloc+0xa0/0xd0
[22697.845822]  kmem_cache_alloc_trace+0xd1/0x1e0
[22697.845829]  amdgpu_driver_open_kms+0x12b/0x4d0
[22697.845839]  drm_open+0x7c3/0x1100
[22697.845843]  drm_stub_open+0x2a8/0x400
[22697.845851]  chrdev_open+0x1eb/0x5a0
[22697.845857]  do_dentry_open+0x5a1/0xc50
[22697.845865]  path_openat+0x11d3/0x4e90
[22697.845868]  do_filp_open+0x239/0x3c0
[22697.845872]  do_sys_open+0x402/0x630
[22697.845878]  do_syscall_64+0x220/0x670
[22697.845881]  return_from_SYSCALL_64+0x0/0x65

[22697.845887] Freed by task 24090:
[22697.845892]  kasan_slab_free+0x71/0xc0
[22697.845895]  kfree+0x88/0x1b0
[22697.845900]  amdgpu_driver_postclose_kms+0x469/0x860
[22697.845904]  drm_release+0x8a8/0x1180
[22697.845909]  __fput+0x2ab/0x730
[22697.845913]  task_work_run+0x14b/0x200
[22697.845919]  do_exit+0x7c6/0x13a0
[22697.845922]  do_group_exit+0x121/0x340
[22697.845926]  SyS_exit_group+0x14/0x20
[22697.845929]  do_syscall_64+0x220/0x670
[22697.845932]  return_from_SYSCALL_64+0x0/0x65

[22697.845940] The buggy address belongs to the object at 8801c02e9100
[22697.845946] The buggy address is located 200 bytes inside of
[22697.845949] The buggy address belongs to the page:
[22697.845958] page:ea000700ba00 count:1 mapcount:0 mapping:  (null) index:0x0 compound_mapcount: 0
[22697.845967] flags: 0x20008100(slab|head)
[22697.845977] raw: 20008100   
0001000f000f
[22697.845982] raw: dead0100 dead0200 8803f3402a80 

[22697.845985] page dumped because: kasan: bad access detected

[22697.845990] Memory state around the buggy address:
[22697.845995]  8801c02e9080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[22697.845999]

Re: Kernel crash/Null pointer dereference on vblank

2017-11-22 Thread Johannes Hirte

On 2017 Nov 22, Martin Babutzka wrote:
>Dear AMD Developers,
>At first congratulations for the DC code submission to the 4.15 kernel.
>Unfortunately the major regression which I reported on 29.09., 06.10.,
>02.11. and 05.11. still exists. But this time I got additional
>debugging information maybe this helps to fix it.
>
>Summary: I am running Xubuntu 17.10 with the amd-staging-drm-next
>kernel patched to 4.14.0. The latest build I tested includes all commits
>up to now (including the 2017-11-17 19:51:57 (GMT) commit
>85d09ce5e5039644487e9508d6359f9f4cf64427).
>
>Some vblank operations make the kernel crash and hang up the whole
>system. The error is reproducible by enabling the screen lock or the
>suspend mode. The system can not return to proper state from either of
>these (after all I am not 100% sure it is the same error). Debugging is
>easier with screen lock. Attached you can find the kernel crash and
>the dce110_vblank_set function modified by some kernel prints. It looks
>like the function is called twice and does not work the second time.
>The whole code around dce110_vblank_set also looks interrupt-ish -
>could this be a race condition or timing problem? Objects being cleared
>from memory and then accessed by dce110_vblank_set?
>
>Bug reports on this issue:
>https://github.com/M-Bab/linux-kernel-amdgpu-binaries/issues/37
>https://github.com/M-Bab/linux-kernel-amdgpu-binaries/issues/29
>
>Many regards,
>Martin (M-bab)

I'm having the same problem on Carrizo. The system crashes when resuming
from S3 with dc on. With dc off, everything works fine. I was able to
capture some debug info with KASAN:

Nov 22 15:52:19 probook kernel: PM: suspend entry (deep)
Nov 22 15:52:19 probook kernel: PM: Syncing filesystems ... done.
Nov 22 15:52:28 probook kernel: Freezing user space processes ... (elapsed 0.002 seconds) done.
Nov 22 15:52:28 probook kernel: OOM killer disabled.
Nov 22 15:52:28 probook kernel: Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
Nov 22 15:52:28 probook kernel: Suspending console(s) (use no_console_suspend to debug)
Nov 22 15:52:28 probook kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Nov 22 15:52:28 probook kernel: sd 0:0:0:0: [sda] Stopping disk
Nov 22 15:52:28 probook kernel: amdgpu :00:01.0: 8803e8075500 unpin not necessary
Nov 22 15:52:28 probook kernel: ACPI: Preparing to enter system sleep state S3
Nov 22 15:52:28 probook kernel: ACPI: EC: event blocked
Nov 22 15:52:28 probook kernel: ACPI: EC: EC stopped
Nov 22 15:52:28 probook kernel: PM: Saving platform NVS memory
Nov 22 15:52:28 probook kernel: Disabling non-boot CPUs ...
Nov 22 15:52:28 probook kernel: smpboot: CPU 1 is now offline
Nov 22 15:52:28 probook kernel: smpboot: CPU 2 is now offline
Nov 22 15:52:28 probook kernel: smpboot: CPU 3 is now offline
Nov 22 15:52:28 probook kernel: ACPI: Low-level resume complete
Nov 22 15:52:28 probook kernel: ACPI: EC: EC started
Nov 22 15:52:28 probook kernel: PM: Restoring platform NVS memory
Nov 22 15:52:28 probook kernel: LVT offset 0 assigned for vector 0x400
Nov 22 15:52:28 probook kernel: Enabling non-boot CPUs ...
Nov 22 15:52:28 probook kernel: x86: Booting SMP configuration:
Nov 22 15:52:28 probook kernel: smpboot: Booting Node 0 Processor 1 APIC 0x11
Nov 22 15:52:28 probook kernel:  cache: parent cpu1 should not be sleeping
Nov 22 15:52:28 probook kernel: CPU1 is up
Nov 22 15:52:28 probook kernel: smpboot: Booting Node 0 Processor 2 APIC 0x12
Nov 22 15:52:28 probook kernel:  cache: parent cpu2 should not be sleeping
Nov 22 15:52:28 probook kernel: CPU2 is up
Nov 22 15:52:28 probook kernel: smpboot: Booting Node 0 Processor 3 APIC 0x13
Nov 22 15:52:28 probook kernel:  cache: parent cpu3 should not be sleeping
Nov 22 15:52:28 probook kernel: CPU3 is up
Nov 22 15:52:28 probook kernel: ACPI: Waking up from system sleep state S3
Nov 22 15:52:28 probook kernel: ACPI: EC: event unblocked
Nov 22 15:52:28 probook kernel: [drm] PCIE GART of 1024M enabled (table at 0x00F40004).
Nov 22 15:52:28 probook kernel: sd 0:0:0:0: [sda] Starting disk
Nov 22 15:52:28 probook kernel: r8169 :01:00.0 enp1s0: link down
Nov 22 15:52:28 probook kernel: ACPI: button: The lid device is not compliant to SW_LID.
Nov 22 15:52:28 probook kernel: usb 3-1.1: reset high-speed USB device number 3 using ehci-pci
Nov 22 15:52:28 probook kernel: [drm:hwss_wait_for_blank_complete] *ERROR* DC: failed to blank crtc!
Nov 22 15:52:28 probook kernel: [drm] ring test on 0 succeeded in 11 usecs
Nov 22 15:52:28 probook kernel: [drm] ring test on 9 succeeded in 8 usecs
Nov 22 15:52:28 probook kernel: [drm] ring test on 1 succeeded in 4 usecs
Nov 22 15:52:28 probook kernel: [drm] ring test on 2 succeeded in 2 usecs
Nov 22 15:52:28 probook kernel: [drm] ring test on 3 succeeded in 2 usecs
Nov 22 15:52:28 probook kernel: [drm] ring test on 4 succeeded in 2 usecs
Nov 22 15:52:28 probook kernel: [drm] ring test on 5 succeeded in 7 usecs
Nov 22 15:52:28 probook

[BUG] X broken on Carrizo when GFX_PG enabled

2017-08-22 Thread Johannes Hirte

Since nobody has responded to the bug report, I'm trying it this way.

As mentioned in https://bugzilla.kernel.org/show_bug.cgi?id=196337, my
system becomes unusable with GFX_PG enabled, because X no longer starts.

[drm-next-4.9-wip] this function not implement on Carizzo

2016-08-15 Thread Johannes Hirte

With commit fad2af195f1abaada473f4f9e9a554c1e4db768b, PowerPlay was
enabled by default for Carrizo, so I assumed it was complete and tested
again. But I still get this in dmesg:

[ powerplay ] this function not implement!
[ powerplay ] min_core_set_clock not set

There are multiple entries at startup, and it happens occasionally during
runtime. Is PowerPlay on Carrizo still incomplete, or is this a problem
specific to my system (HP ProBook 645 G2)?

regards,
  Johannes
