[Bug 109978] Unprivileged user mode program can cause GPU reset

2019-11-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=109978

Martin Peres  changed:

      What|Removed |Added
----------------------------
    Status|NEW     |RESOLVED
Resolution|---     |MOVED

--- Comment #1 from Martin Peres  ---
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been
closed from further activity.

You can subscribe and participate further through the new bug through this link
to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/5.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

[Bug 109978] Unprivileged user mode program can cause GPU reset

2019-03-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=109978

Andre Klapper  changed:

  What|Removed |Added
----------------------
Blocks|110099  |


Referenced Bugs:

https://bugs.freedesktop.org/show_bug.cgi?id=110099
[Bug 110099] Unprivileged user mode program can cause GPU reset

[Bug 109978] Unprivileged user mode program can cause GPU reset

2019-03-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=109978

baigshakira...@gmail.com changed:

  What|Removed |Added
----------------------
Blocks|        |110099


Referenced Bugs:

https://bugs.freedesktop.org/show_bug.cgi?id=110099
[Bug 110099] Unprivileged user mode program can cause GPU reset

[Bug 109978] Unprivileged user mode program can cause GPU reset

2019-03-12 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=109978

    Bug ID: 109978
   Summary: Unprivileged user mode program can cause GPU reset
   Product: DRI
   Version: XOrg git
  Hardware: x86-64 (AMD64)
        OS: Linux (All)
    Status: NEW
  Severity: major
  Priority: medium
 Component: DRM/amdkfd
  Assignee: dri-devel@lists.freedesktop.org
  Reporter: sudols...@gmail.com

https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/issues/72

Sample program which causes this (needs ROCm):

> #include <hc.hpp>
> int main()
> {
>   parallel_for_each(hc::extent<1>(1), [=]() [[hc]]
>   {
>   asm("s_trap 2");
>   });
>   return 0;
> }

> hcc -hc main.cpp
> ./a.out

The process never exits, and pressing CTRL-C causes a GPU reset, which breaks
all other processes using ROCm on that GPU. It seems the trap handler expects
the queue handle in s[0:1], which is set when using __builtin_trap(); without
it, the trap handler raises further exceptions.
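
To check whether a given run hit the reset described here, the kernel log can
be scanned for the messages quoted in this report. The following is only a
rough sketch in standard C++17: the function name find_reset_lines and the
choice of marker strings are my own assumptions, the strings are copied from
the log in this report, and other kernel or driver versions may word them
differently.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Kernel messages copied from the system log in this report. Matching on
// these exact strings is an assumption: other kernel or driver versions may
// use different wording.
constexpr std::array<std::string_view, 3> kResetMarkers = {
    "qcm fence wait loop timeout expired",
    "GPU reset begin!",
    "VRAM is lost!",
};

// Hypothetical helper: returns the lines of `log` that contain any marker.
std::vector<std::string> find_reset_lines(const std::vector<std::string>& log) {
    std::vector<std::string> hits;
    for (const std::string& line : log) {
        for (std::string_view marker : kResetMarkers) {
            if (line.find(marker) != std::string::npos) {
                hits.push_back(line);
                break;  // record each matching line only once
            }
        }
    }
    return hits;
}
```

Feeding it the lines of dmesg output after running the sample should return
the timeout, reset and VRAM-lost lines if the reset occurred, and an empty
vector otherwise.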

System logs:

[  247.428727] qcm fence wait loop timeout expired
[  247.428730] The cp might be in an unrecoverable state due to an unsuccessful
queues preemption
[  247.428736] amdgpu :0b:00.0: GPU reset begin!
[  247.619440] amdgpu :0b:00.0: GPU reset
[  248.152762] [drm] psp mode1 reset succeed 
[  248.279461] amdgpu :0b:00.0: GPU reset succeeded, trying to resume
[  248.279584] [drm] PCIE GART of 512M enabled (table at 0x00F40090).
[  248.279639] [drm:amdgpu_device_gpu_recover [amdgpu]] *ERROR* VRAM is lost!
[  248.279769] [drm] PSP is resuming...
[  248.428305] [drm] reserve 0x40 from 0xf400d0 for PSP TMR SIZE
[  248.472774] WARNING: CPU: 23 PID: 21634 at
/build/linux-uQJ2um/linux-4.15.0/kernel/kthread.c:498 kthread_park+0x67/0x80
[  248.472775] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs
msr nls_utf8 cifs ccm fscache cmac bnep binfmt_misc nls_iso8859_1 edac_mce_amd
arc4 snd_hda_codec_realtek snd_hda_codec_generic kvm_amd snd_hda_codec_hdmi kvm
snd_seq_midi irqbypass snd_hda_intel snd_seq_midi_event snd_hda_codec btusb
snd_hda_core btrtl wmi_bmof snd_rawmidi iwlmvm snd_hwdep btbcm btintel snd_pcm
snd_seq bluetooth mac80211 snd_seq_device ecdh_generic snd_timer iwlwifi ccp
snd cfg80211 soundcore k10temp shpchp mac_hid sch_fq_codel ib_iser rdma_cm
iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
nct6775 hwmon_vid parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
async_tx xor raid6_pq libcrc32c raid1
[  248.472823]  multipath linear raid0 amdgpu(OE) amdchash(OE) amdttm(OE)
amd_sched(OE) mxm_wmi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc
aesni_intel aes_x86_64 amdkcl(OE) crypto_simd glue_helper amd_iommu_v2 cryptd
drm_kms_helper syscopyarea sysfillrect sysimgblt igb fb_sys_fops drm dca nvme
i2c_algo_bit i2c_piix4 nvme_core ptp ahci atlantic libahci pps_core gpio_amdpt
wmi gpio_generic
[  248.472846] CPU: 23 PID: 21634 Comm: a.out Tainted: G   OE   
4.15.0-45-generic #48-Ubuntu
[  248.472847] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./X399 Professional Gaming, BIOS P3.30 08/14/2018
[  248.472849] RIP: 0010:kthread_park+0x67/0x80
[  248.472850] RSP: 0018:b44fc7e27ad0 EFLAGS: 00010202
[  248.472852] RAX: 0004 RBX: 9ec63f49e480 RCX:

[  248.472853] RDX: 9ec63c717198 RSI: 9ec63ea0c0c0 RDI:
9ec63dd38000
[  248.472854] RBP: b44fc7e27ae0 R08: 0051 R09:

[  248.472855] R10:  R11: 0056 R12:
9ec63ea0c0c0
[  248.472855] R13: 9ec64f4f4200 R14: 9ec63c71 R15:

[  248.472857] FS:  7fd52a286c00() GS:9ec65cdc()
knlGS:
[  248.472858] CS:  0010 DS:  ES:  CR0: 80050033
[  248.472859] CR2: 7f0c07687a98 CR3: 00081b5b6000 CR4:
003406e0
[  248.472860] Call Trace:
[  248.472865]  amddrm_sched_entity_fini+0x44/0x1b0 [amd_sched]
[  248.472868]  amddrm_sched_entity_destroy+0x1f/0x30 [amd_sched]
[  248.472907]  amdgpu_vm_fini+0xbb/0x4f0 [amdgpu]
[  248.472942]  amdgpu_driver_postclose_kms+0x15b/0x2b0 [amdgpu]
[  248.472952]  drm_release+0x26b/0x390 [drm]
[  248.472955]  __fput+0xea/0x220
[  248.472957]  fput+0xe/0x10
[  248.472959]  task_work_run+0x9d/0xc0
[  248.472961]  do_exit+0x2ec/0xb40
[  248.472963]  do_group_exit+0x43/0xb0
[  248.472965]  get_signal+0x27b/0x590
[  248.472968]  do_signal+0x37/0x730
[  248.472971]  ? __switch_to_asm+0x34/0x70
[  248.472973]  ? __switch_to_asm+0x40/0x70
[  248.472976]  ? do_vfs_ioctl+0xa8/0x630
[  248.472978]  ? __schedule+0x299/0x8a0
[  248.472980]  exit_to_usermode_loop+0x73/0xd0
[  248.472982]  do_syscall_64+0x115/0x130
[  248.472984]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  248.472986] RIP: 0033:0x7fd528bdd5d7
[  248.472987] RSP: 002b:7ffe830d4778 EFLAGS: