Re: unkillable process consuming 100% cpu
On Mon, Nov 11, 2019 at 01:22:09PM +0100, Hans Petter Selasky wrote:
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index a6e0a16ae..0697d70f4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c

Are you using ports/graphics/drm-devel-kmod? This file does not exist
in drm-current-kmod.

> @@ -236,6 +238,12 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,

Using 'nm *.ko | grep eviction_fence' in /boot/modules shows that none
of the modules contain amdgpu_amdkfd_remove_eviction_fence().

-- 
Steve
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: unkillable process consuming 100% cpu
On Wed, Nov 13, 2019 at 04:22:19PM +0100, Hans Petter Selasky wrote:
> On 2019-11-13 15:52, Steve Kargl wrote:
> >     at /usr/src/sys/amd64/amd64/trap.c:743
> > #7  0x808b0468 in trap (frame=0xfe00b460e0c0)
> >     at /usr/src/sys/amd64/amd64/trap.c:407
> > #8
> > #9  0x in ?? ()
> > #10 0x817d2c0f in radeon_ttm_tt_to_gtt (ttm=0xf80061eeb248)
> >     at /usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/radeon/radeon_ttm.c:720
> > #11 radeon_ttm_tt_set_userptr (ttm=0xf80061eeb248, addr=1, flags=2147483647)
>
> Hi,
>
> I don't see any function call here. Can you try to double check the
> backtrace?
>
> Which version of FreeBSD is this?
>

% uname -a
(trimmed) FreeBSD 13.0-CURRENT r353571
% kgdb /usr/lib/debug/boot/kernel/kernel.debug vmcore.2
% bt
...
#7  0x808b0468 in trap (frame=0xfe00b460e0c0)
    at /usr/src/sys/amd64/amd64/trap.c:407
#8
#9  0x in ?? ()
#10 0x817d2c0f in radeon_ttm_tt_to_gtt (ttm=0xf80061eeb248)
    at /usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/radeon/radeon_ttm.c:720
#11 radeon_ttm_tt_set_userptr (ttm=0xf80061eeb248, addr=1, flags=2147483647)
    at /usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/radeon/radeon_ttm.c:804
#12 0x817adc9b in radeon_is_px (dev=0xf8017fe84e00)
    at /usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/radeon/radeon_device.c:156

Looking at radeon_ttm.c, line 720 is the if-stmt in this function

static struct radeon_ttm_tt *radeon_ttm_tt_to_gtt(struct ttm_tt *ttm)
{
	if (!ttm || ttm->func != _backend_func)
		return NULL;
	return (struct radeon_ttm_tt *)ttm;
}

(kgdb) p ttm->func
$2 = (struct ttm_backend_func *) 0x231
(kgdb) p _backend_func
$4 = (struct ttm_backend_func *) 0x8186d870

AFAIK, 0x231 is not a valid address.

(kgdb) p *ttm
$5 = {bdev = 0x819021ef, func = 0x231, dummy_read_page = 0x0,
  pages = 0xf800612c, page_flags = 2173789980, num_pages = 0,
  sg = 0x0, glob = 0x2a, swap_storage = 0xf8017fe84e00,
  caching_state = (unknown: 145613312),
  state = (tt_unbound | tt_unpopulated | unknown: 4294965248)}

Moving to frame 12 suggests that the stack is corrupt (whether by the
dump or the crash I don't know)

(kgdb) frame 12
#12 0x817adc9b in radeon_is_px (dev=0xf8017fe84e00)
    at /usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/radeon/radeon_device.c:156
156             if (rdev->flags & RADEON_IS_PX)
(kgdb) p *dev
Cannot access memory at address 0xf8017fe84e00
(kgdb) p rdev
$25 = (struct radeon_device *) 0x0

-- 
Steve
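The check at radeon_ttm.c:720 quoted above can be modelled in user space. The sketch below is a hypothetical stand-in, not the driver's code: demo_tt, demo_backend_func, demo_backend, demo_tt_to_gtt and demo_tt_with_raw_func are made-up names. It only illustrates that the helper identifies "its" objects by comparing the function-table pointer, so a func value such as 0x231 fails the comparison and yields NULL without func ever being dereferenced.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct ttm_backend_func / struct ttm_tt. */
struct demo_backend_func {
	int (*bind)(void *tt);
};

struct demo_tt {
	const struct demo_backend_func *func;
	unsigned long num_pages;
};

static int
demo_bind(void *tt)
{
	(void)tt;
	return (0);
}

/* The one table that marks an object as belonging to this backend. */
static const struct demo_backend_func demo_backend = { .bind = demo_bind };

/*
 * Same shape as the quoted helper: accept the object only if its
 * function table is exactly ours, otherwise return NULL.
 * Only the pointer value is compared; func is never dereferenced.
 */
static struct demo_tt *
demo_tt_to_gtt(struct demo_tt *tt)
{
	if (tt == NULL || tt->func != &demo_backend)
		return (NULL);
	return (tt);
}

/* Build an object whose func field holds an arbitrary raw value. */
static struct demo_tt
demo_tt_with_raw_func(uintptr_t raw)
{
	struct demo_tt tt = {
		.func = (const struct demo_backend_func *)raw,
		.num_pages = 0
	};
	return (tt);
}
```

With this model, demo_tt_to_gtt() returns its argument for an object initialised with &demo_backend and NULL for one built with demo_tt_with_raw_func(0x231). The check itself cannot tell a foreign object from a corrupted one.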
Re: unkillable process consuming 100% cpu
On 2019-11-13 15:52, Steve Kargl wrote:
>     at /usr/src/sys/amd64/amd64/trap.c:743
> #7  0x808b0468 in trap (frame=0xfe00b460e0c0)
>     at /usr/src/sys/amd64/amd64/trap.c:407
> #8
> #9  0x in ?? ()
> #10 0x817d2c0f in radeon_ttm_tt_to_gtt (ttm=0xf80061eeb248)
>     at /usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/radeon/radeon_ttm.c:720
> #11 radeon_ttm_tt_set_userptr (ttm=0xf80061eeb248, addr=1, flags=2147483647)

Hi,

I don't see any function call here. Can you try to double check the
backtrace?

Which version of FreeBSD is this?

--HPS
Re: unkillable process consuming 100% cpu
On Wed, Nov 13, 2019 at 09:10:06AM +0100, Hans Petter Selasky wrote:
> On 2019-11-13 01:30, Steve Kargl wrote:
> >
> > I installed the 2nd seqlock.diff, rebuilt drm-current-kmod-4.16.g20191023,
> > rebooted, and have been pounding on the system with workloads that are
> > similar to what the system was doing during the lockups. So far, I
> > cannot get the system to lock up. Looks like your patch fixes (or at
> > least helps). Thanks for taking a look at the problem.
> >
>
> Can you apply the kdb.diff on top and check dmesg for prints?
>

I could not find the amdgpu_amdkfd_gpuvm.c file when I went looking.
Is it autogenerated?

I also spoke too soon. I got a panic after my reply above.

Fatal trap 12: page fault while in kernel mode
cpuid = 5; apic id = 15
fault virtual address   = 0x0
fault code              = supervisor read instruction, page not present
instruction pointer     = 0x20:0x0
stack pointer           = 0x28:0xfe00b460e188
frame pointer           = 0x28:0xfe00b460e1c0
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 877 (X:rcs0)
trap number             = 12
panic: page fault
cpuid = 5
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00b460dde0
vpanic() at vpanic+0x17e/frame 0xfe00b460de40
panic() at panic+0x43/frame 0xfe00b460dea0
trap_fatal() at trap_fatal+0x388/frame 0xfe00b460df10
trap_pfault() at trap_pfault+0x4f/frame 0xfe00b460df80
trap() at trap+0x288/frame 0xfe00b460e0b0
calltrap() at calltrap+0x8/frame 0xfe00b460e0b0
--- trap 0xc, rip = 0, rsp = 0xfe00b460e188, rbp = 0xfe00b460e1c0 ---
??() at 0/frame 0xfe00b460e1c0
radeon_cs_ioctl() at radeon_cs_ioctl+0xa0b/frame 0xfe00b460e640
drm_ioctl_kernel() at drm_ioctl_kernel+0xf1/frame 0xfe00b460e680
drm_ioctl() at drm_ioctl+0x279/frame 0xfe00b460e770
linux_file_ioctl() at linux_file_ioctl+0x298/frame 0xfe00b460e7d0
kern_ioctl() at kern_ioctl+0x284/frame 0xfe00b460e840
sys_ioctl() at sys_ioctl+0x157/frame 0xfe00b460e910
amd64_syscall() at amd64_syscall+0x273/frame 0xfe00b460ea30
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfe00b460ea30
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x200cc6bfa, rsp = 0x7fffbfffde98, rbp = 0x7fffbfffdec0 ---
Uptime: 5h9m5s
Dumping 1472 out of 16327 MB:..2%..11%..21%..31%..41%..52%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
warning: Source file is more recent than executable.
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:392
#2  0x805de452 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:479
#3  0x805de8a6 in vpanic (fmt=, ap=)
    at /usr/src/sys/kern/kern_shutdown.c:908
#4  0x805de6c3 in panic (fmt=)
    at /usr/src/sys/kern/kern_shutdown.c:835
#5  0x808b0d58 in trap_fatal (frame=0xfe00b460e0c0, eva=0)
    at /usr/src/sys/amd64/amd64/trap.c:925
#6  0x808b0daf in trap_pfault (frame=0xfe00b460e0c0, usermode=, signo=, ucode=)
    at /usr/src/sys/amd64/amd64/trap.c:743
#7  0x808b0468 in trap (frame=0xfe00b460e0c0)
    at /usr/src/sys/amd64/amd64/trap.c:407
#8
#9  0x in ?? ()
#10 0x817d2c0f in radeon_ttm_tt_to_gtt (ttm=0xf80061eeb248)
    at /usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/radeon/radeon_ttm.c:720
#11 radeon_ttm_tt_set_userptr (ttm=0xf80061eeb248, addr=1, flags=2147483647)
    at /usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/radeon/radeon_ttm.c:804
#12 0x817adc9b in radeon_is_px (dev=0xf8017fe84e00)
    at /usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/radeon/radeon_device.c:156
#13 0x818a9e81 in drm_ioctl_kernel (linux_file=, func=0xfe00b460e428, kdata=0xfe00b31eb000, flags=1521620552)
    at /usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/drm_ioctl.c:760
#14 0x818aa129 in drm_ioctl (filp=0xf80061198e00, cmd=, arg=65536)
    at /usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/drm_ioctl.c:856
#15 0x807c8098 in linux_file_ioctl_sub (fp=, filp=, fop=, cmd=, data=, td=)
    at /usr/src/sys/compat/linuxkpi/common/src/linux_compat.c:965
#16 linux_file_ioctl (fp=, cmd=, data=, cred=, td=0xf800612c)
    at /usr/src/sys/compat/linuxkpi/common/src/linux_compat.c:1558
#17 0x8063ed34 in fo_ioctl (fp=, com=3223348326, data=0x7fff, active_cred=0xfe001f7e6250, td=0xf800612c)
    at /usr/src/sys/sys/file.h:340
#18 kern_ioctl (td=, fd=9, com=3223348326, data=0x7fff )
    at /usr/src/sys/kern/sys_generic.c:801
#19 0x8063ea37 in sys_ioctl
Re: unkillable process consuming 100% cpu
On 2019-11-13 01:30, Steve Kargl wrote:
> On Tue, Nov 12, 2019 at 06:48:22PM +0100, Hans Petter Selasky wrote:
> > On 2019-11-12 18:31, Steve Kargl wrote:
> > > > Can you open the radeonkms.ko in gdb83 from ports and type:
> > > >
> > > > l *(radeon_gem_busy_ioctl+0x30)
> > > >
> > > % /boot/modules/radeonkms.ko
> > > (gdb) l *(radeon_gem_busy_ioctl+0x30)
> > > 0xa12b0 is in radeon_gem_busy_ioctl
> > > (/usr/ports/graphics/drm-current-kmod/work/kms-drm-2d2852e/drivers/gpu/drm/radeon/radeon_gem.c:453).
> > > 448
> > > /usr/ports/graphics/drm-current-kmod/work/kms-drm-2d2852e/drivers/gpu/drm/radeon/radeon_gem.c:
> > > No such file or directory.
> > > (gdb)
> >
> > Like expected.
>
> I installed the 2nd seqlock.diff, rebuilt drm-current-kmod-4.16.g20191023,
> rebooted, and have been pounding on the system with workloads that are
> similar to what the system was doing during the lockups. So far, I
> cannot get the system to lock up. Looks like your patch fixes (or at
> least helps). Thanks for taking a look at the problem.

Can you apply the kdb.diff on top and check dmesg for prints?

--HPS
Re: unkillable process consuming 100% cpu
On Tue, Nov 12, 2019 at 06:48:22PM +0100, Hans Petter Selasky wrote:
> On 2019-11-12 18:31, Steve Kargl wrote:
> > > Can you open the radeonkms.ko in gdb83 from ports and type:
> > >
> > > l *(radeon_gem_busy_ioctl+0x30)
> > >
> > % /boot/modules/radeonkms.ko
> > (gdb) l *(radeon_gem_busy_ioctl+0x30)
> > 0xa12b0 is in radeon_gem_busy_ioctl
> > (/usr/ports/graphics/drm-current-kmod/work/kms-drm-2d2852e/drivers/gpu/drm/radeon/radeon_gem.c:453).
> > 448
> > /usr/ports/graphics/drm-current-kmod/work/kms-drm-2d2852e/drivers/gpu/drm/radeon/radeon_gem.c:
> > No such file or directory.
> > (gdb)
>
> Like expected.
>

I installed the 2nd seqlock.diff, rebuilt drm-current-kmod-4.16.g20191023,
rebooted, and have been pounding on the system with workloads that are
similar to what the system was doing during the lockups. So far, I
cannot get the system to lock up. Looks like your patch fixes (or at
least helps). Thanks for taking a look at the problem.

-- 
Steve
Re: unkillable process consuming 100% cpu
On 2019-11-12 18:31, Steve Kargl wrote:
> > Can you open the radeonkms.ko in gdb83 from ports and type:
> >
> > l *(radeon_gem_busy_ioctl+0x30)
> >
> % /boot/modules/radeonkms.ko
> (gdb) l *(radeon_gem_busy_ioctl+0x30)
> 0xa12b0 is in radeon_gem_busy_ioctl
> (/usr/ports/graphics/drm-current-kmod/work/kms-drm-2d2852e/drivers/gpu/drm/radeon/radeon_gem.c:453).
> 448
> /usr/ports/graphics/drm-current-kmod/work/kms-drm-2d2852e/drivers/gpu/drm/radeon/radeon_gem.c:
> No such file or directory.
> (gdb)

Like expected.

--HPS
Re: unkillable process consuming 100% cpu
On Mon, Nov 11, 2019 at 10:34:23AM +0100, Hans Petter Selasky wrote:
> Hi,
>
> Can you open the radeonkms.ko in gdb83 from ports and type:
>
> l *(radeon_gem_busy_ioctl+0x30)
>

% /boot/modules/radeonkms.ko
(gdb) l *(radeon_gem_busy_ioctl+0x30)
0xa12b0 is in radeon_gem_busy_ioctl
(/usr/ports/graphics/drm-current-kmod/work/kms-drm-2d2852e/drivers/gpu/drm/radeon/radeon_gem.c:453).
448
/usr/ports/graphics/drm-current-kmod/work/kms-drm-2d2852e/drivers/gpu/drm/radeon/radeon_gem.c:
No such file or directory.
(gdb)

-- 
Steve
Re: unkillable process consuming 100% cpu
On Mon, Nov 11, 2019 at 02:22:55PM +0100, Hans Petter Selasky wrote:
> On 2019-11-08 23:09, Steve Kargl wrote:
> > Here's 'procstat -kk' for the stuck process with the long line wrapped.
>
> Can you run this command a couple of times and see if the backtrace changes?
>
> --HPS

I was AFK for a few days. I'll try all your suggestions tomorrow.
The two lock ups occurred while using chrome to watch/listen to
youtube and using libreoffice to prepare a presentation. I'll
see if I can reproduce the issue.

-- 
Steve
Re: unkillable process consuming 100% cpu
On 2019-11-08 23:09, Steve Kargl wrote:
> Here's 'procstat -kk' for the stuck process with the long line wrapped.

Can you run this command a couple of times and see if the backtrace changes?

--HPS
Re: unkillable process consuming 100% cpu
On Mon, Nov 11, 2019 at 01:22:09PM +0100, Hans Petter Selasky wrote:
> On 2019-11-11 11:44, Hans Petter Selasky wrote:
> > Seems like we can optimise away one more write memory barrier.
> >
> > If you are building from ports, simply:
> >
> > cd work/kms-drm*
> > cat seqlock.diff | patch -p1
> >
>
> Hi,
>
> Here is one more debug patch you can try. See if you get that print
> added in the patch in dmesg.
>
> --HPS
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index a6e0a16ae..0697d70f4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -31,6 +31,8 @@
>  #include "amdgpu_vm.h"
>  #include "amdgpu_amdkfd.h"
>
> +#include
> +
>  /* Special VM and GART address alignment needed for VI pre-Fiji due to
>   * a HW bug.
>   */
> @@ -236,6 +238,12 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
>  		*ef_count = 0;
>  	}
>
> +	if (resv != NULL &&
> +	    (struct thread *)SX_OWNER(resv->lock.base.sx.sx_lock) != curthread) {

This really should be spelled as sx_xlocked().

> +		printf("Called unlocked\n");
> +		kdb_backtrace();
> +	}
> +
>  	old = reservation_object_get_list(resv);
>  	if (!old)
>  		return 0;
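The debug check in the quoted patch asks whether the owner stored in the sx lock word is the current thread. A user-space model of that idea is sketched below. Every name in it (demo_lock, DEMO_FLAG_*, demo_xlock, demo_slock_first, demo_mark_waiters, demo_xlocked) is hypothetical, the bit layout is an assumption chosen for illustration, and the code is single-threaded and not a working lock. In kernel code the real definitions in sys/sx.h should be used through sx_xlocked(), as suggested above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: low bits of the lock word are flags. */
#define	DEMO_FLAG_SHARED	((uintptr_t)0x01)
#define	DEMO_FLAG_WAITERS	((uintptr_t)0x02)
#define	DEMO_FLAGMASK		(DEMO_FLAG_SHARED | DEMO_FLAG_WAITERS)
#define	DEMO_SHARERS_SHIFT	2

struct demo_lock {
	uintptr_t word;	/* owner token, or sharer count plus SHARED flag */
};

/* Owner tokens must have the flag bits clear (e.g. aligned addresses). */
static void
demo_xlock(struct demo_lock *l, uintptr_t me)
{
	l->word = me;
}

/* Model the first shared acquisition: SHARED flag plus a count of one. */
static void
demo_slock_first(struct demo_lock *l)
{
	l->word = DEMO_FLAG_SHARED | ((uintptr_t)1 << DEMO_SHARERS_SHIFT);
}

/* A contending thread would set a flag bit without changing the owner. */
static void
demo_mark_waiters(struct demo_lock *l)
{
	l->word |= DEMO_FLAG_WAITERS;
}

/*
 * "Is the lock exclusively held by me?" Mask off the flag bits except
 * SHARED, so that a shared-locked word can never compare equal to an
 * owner token.
 */
static bool
demo_xlocked(const struct demo_lock *l, uintptr_t me)
{
	return ((l->word & ~(DEMO_FLAGMASK & ~DEMO_FLAG_SHARED)) == me);
}
```

Wrapping the comparison in one predicate keeps the masking rules in a single place, which is the readability argument for calling a helper instead of open-coding the owner test at each call site.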
Re: unkillable process consuming 100% cpu
On 2019-11-11 11:44, Hans Petter Selasky wrote:
> Seems like we can optimise away one more write memory barrier.
>
> If you are building from ports, simply:
>
> cd work/kms-drm*
> cat seqlock.diff | patch -p1
>

Hi,

Here is one more debug patch you can try. See if you get that print
added in the patch in dmesg.

--HPS

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a6e0a16ae..0697d70f4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -31,6 +31,8 @@
 #include "amdgpu_vm.h"
 #include "amdgpu_amdkfd.h"
 
+#include
+
 /* Special VM and GART address alignment needed for VI pre-Fiji due to
  * a HW bug.
  */
@@ -236,6 +238,12 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
 		*ef_count = 0;
 	}
 
+	if (resv != NULL &&
+	    (struct thread *)SX_OWNER(resv->lock.base.sx.sx_lock) != curthread) {
+		printf("Called unlocked\n");
+		kdb_backtrace();
+	}
+
 	old = reservation_object_get_list(resv);
 	if (!old)
 		return 0;
Re: unkillable process consuming 100% cpu
Seems like we can optimise away one more write memory barrier.

If you are building from ports, simply:

cd work/kms-drm*
cat seqlock.diff | patch -p1

--HPS

diff --git a/linuxkpi/gplv2/include/linux/reservation.h b/linuxkpi/gplv2/include/linux/reservation.h
index b975f792c..0ce922a0e 100644
--- a/linuxkpi/gplv2/include/linux/reservation.h
+++ b/linuxkpi/gplv2/include/linux/reservation.h
@@ -94,7 +94,7 @@ reservation_object_init(struct reservation_object *obj)
 {
 	ww_mutex_init(>lock, _ww_class);
 
-	__seqcount_init(>seq, reservation_seqcount_string, _seqcount_class);
+	seqcount_init(>seq);
 	RCU_INIT_POINTER(obj->fence, NULL);
 	RCU_INIT_POINTER(obj->fence_excl, NULL);
 	obj->staged = NULL;
diff --git a/linuxkpi/gplv2/include/linux/seqlock.h b/linuxkpi/gplv2/include/linux/seqlock.h
index e86351810..115ad5e68 100644
--- a/linuxkpi/gplv2/include/linux/seqlock.h
+++ b/linuxkpi/gplv2/include/linux/seqlock.h
@@ -1,410 +1,148 @@
 #ifndef __LINUX_SEQLOCK_H
-#define __LINUX_SEQLOCK_H
-/*
- * Reader/writer consistent mechanism without starving writers. This type of
- * lock for data where the reader wants a consistent set of information
- * and is willing to retry if the information changes. There are two types
- * of readers:
- * 1. Sequence readers which never block a writer but they may have to retry
- *    if a writer is in progress by detecting change in sequence number.
- *    Writers do not wait for a sequence reader.
- * 2. Locking readers which will wait if a writer or another locking reader
- *    is in progress. A locking reader in progress will also block a writer
- *    from going forward. Unlike the regular rwlock, the read lock here is
- *    exclusive so that only one locking reader can get it.
- *
- * This is not as cache friendly as brlock. Also, this may not work well
- * for data that contains pointers, because any writer could
- * invalidate a pointer that a reader was following.
- *
- * Expected non-blocking reader usage:
- *	do {
- *	    seq = read_seqbegin();
- *	...
- *	} while (read_seqretry(, seq));
- *
- *
- * On non-SMP the spin locks disappear but the writer still needs
- * to increment the sequence variables because an interrupt routine could
- * change the state of the data.
- *
- * Based on x86_64 vsyscall gettimeofday
- * by Keith Owens and Andrea Arcangeli
- */
+#define __LINUX_SEQLOCK_H
 
 #include
 #include
-#include
 #include
 #include
+#include
 #include
-
-/*
- * Version using sequence counter only.
- * This can be used when code has its own mutex protecting the
- * updating starting before the write_seqcountbeqin() and ending
- * after the write_seqcount_end().
- */
 typedef struct seqcount {
-	unsigned sequence;
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-	struct lockdep_map dep_map;
-#endif
+	volatile unsigned sequence;
 } seqcount_t;
-
-
-#define lockdep_init_map(a, b, c, d)
-
-static inline void __seqcount_init(seqcount_t *s, const char *name,
-					  struct lock_class_key *key)
+static inline void
+seqcount_init(seqcount_t *s)
 {
-	/*
-	 * Make sure we are not reinitializing a held lock:
-	 */
-	lockdep_init_map(>dep_map, name, key, 0);
 	s->sequence = 0;
 }
 
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define SEQCOUNT_DEP_MAP_INIT(lockname) \
-		.dep_map = { .name = #lockname } \
-
-# define seqcount_init(s)\
-	do { \
-		static struct lock_class_key __key; \
-		__seqcount_init((s), #s, &__key); \
-	} while (0)
+#define __seqcount_init(a,b,c) \
+	seqcount_init(a)
 
-static inline void seqcount_lockdep_reader_access(seqcount_t *s)
-{
-	seqcount_t *l = (seqcount_t *)s;
-	unsigned long flags;
-
-	local_irq_save(flags);
-	seqcount_acquire_read(>dep_map, 0, 0, _RET_IP_);
-	seqcount_release(>dep_map, 1, _RET_IP_);
-	local_irq_restore(flags);
+#define SEQCNT_ZERO(lockname) { \
+	.sequence = 0\
 }
 
-#else
-# define SEQCOUNT_DEP_MAP_INIT(lockname)
-# define seqcount_init(s) __seqcount_init(s, NULL, NULL)
-# define seqcount_lockdep_reader_access(x)
-#endif
-
-#define SEQCNT_ZERO(lockname) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(lockname)}
-
-
-/**
- * __read_seqcount_begin - begin a seq-read critical section (without barrier)
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
- *
- * __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()
- * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
- * provided before actually loading any of the variables that are to be
- * protected in this critical section.
- *
- * Use carefully, only in critical code, and comment how the barrier is
- * provided.
- */
-static inline unsigned __read_seqcount_begin(seqcount_t *s)
+static inline unsigned
+__read_seqcount_begin(seqcount_t *s)
 {
 	unsigned ret;
 
 repeat:
-	ret = READ_ONCE(s->sequence);
+	ret = s->sequence;
 	if (unlikely(ret & 1)) {
 		cpu_relax();
 		goto repeat;
 	}
-	return ret;
+	return (ret);
 }
 
-/**
- * raw_read_seqcount - Read the raw seqcount
- * @s: pointer to seqcount_t
- * Returns: count
Re: unkillable process consuming 100% cpu
On 2019-11-11 10:34, Hans Petter Selasky wrote:
> Hi,
>
> Can you open the radeonkms.ko in gdb83 from ports and type:
>
> l *(radeon_gem_busy_ioctl+0x30)

Hi,

I suspect there is a memory race in the seqlock framework.

Can you try the attached patch and re-build?

Is this issue easily reproducible?

--HPS

diff --git a/linuxkpi/gplv2/include/linux/reservation.h b/linuxkpi/gplv2/include/linux/reservation.h
index b975f792c..0ce922a0e 100644
--- a/linuxkpi/gplv2/include/linux/reservation.h
+++ b/linuxkpi/gplv2/include/linux/reservation.h
@@ -94,7 +94,7 @@ reservation_object_init(struct reservation_object *obj)
 {
 	ww_mutex_init(>lock, _ww_class);
 
-	__seqcount_init(>seq, reservation_seqcount_string, _seqcount_class);
+	seqcount_init(>seq);
 	RCU_INIT_POINTER(obj->fence, NULL);
 	RCU_INIT_POINTER(obj->fence_excl, NULL);
 	obj->staged = NULL;
diff --git a/linuxkpi/gplv2/include/linux/seqlock.h b/linuxkpi/gplv2/include/linux/seqlock.h
index e86351810..940bd8e90 100644
--- a/linuxkpi/gplv2/include/linux/seqlock.h
+++ b/linuxkpi/gplv2/include/linux/seqlock.h
@@ -1,410 +1,149 @@
 #ifndef __LINUX_SEQLOCK_H
-#define __LINUX_SEQLOCK_H
-/*
- * Reader/writer consistent mechanism without starving writers. This type of
- * lock for data where the reader wants a consistent set of information
- * and is willing to retry if the information changes. There are two types
- * of readers:
- * 1. Sequence readers which never block a writer but they may have to retry
- *    if a writer is in progress by detecting change in sequence number.
- *    Writers do not wait for a sequence reader.
- * 2. Locking readers which will wait if a writer or another locking reader
- *    is in progress. A locking reader in progress will also block a writer
- *    from going forward. Unlike the regular rwlock, the read lock here is
- *    exclusive so that only one locking reader can get it.
- *
- * This is not as cache friendly as brlock. Also, this may not work well
- * for data that contains pointers, because any writer could
- * invalidate a pointer that a reader was following.
- *
- * Expected non-blocking reader usage:
- *	do {
- *	    seq = read_seqbegin();
- *	...
- *	} while (read_seqretry(, seq));
- *
- *
- * On non-SMP the spin locks disappear but the writer still needs
- * to increment the sequence variables because an interrupt routine could
- * change the state of the data.
- *
- * Based on x86_64 vsyscall gettimeofday
- * by Keith Owens and Andrea Arcangeli
- */
+#define __LINUX_SEQLOCK_H
 
 #include
 #include
-#include
 #include
 #include
+#include
 #include
-
-/*
- * Version using sequence counter only.
- * This can be used when code has its own mutex protecting the
- * updating starting before the write_seqcountbeqin() and ending
- * after the write_seqcount_end().
- */
 typedef struct seqcount {
-	unsigned sequence;
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-	struct lockdep_map dep_map;
-#endif
+	volatile unsigned sequence;
 } seqcount_t;
-
-
-#define lockdep_init_map(a, b, c, d)
-
-static inline void __seqcount_init(seqcount_t *s, const char *name,
-					  struct lock_class_key *key)
+static inline void
+seqcount_init(seqcount_t *s)
 {
-	/*
-	 * Make sure we are not reinitializing a held lock:
-	 */
-	lockdep_init_map(>dep_map, name, key, 0);
 	s->sequence = 0;
 }
 
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define SEQCOUNT_DEP_MAP_INIT(lockname) \
-		.dep_map = { .name = #lockname } \
-
-# define seqcount_init(s)\
-	do { \
-		static struct lock_class_key __key; \
-		__seqcount_init((s), #s, &__key); \
-	} while (0)
+#define __seqcount_init(a,b,c) \
+	seqcount_init(a)
 
-static inline void seqcount_lockdep_reader_access(seqcount_t *s)
-{
-	seqcount_t *l = (seqcount_t *)s;
-	unsigned long flags;
-
-	local_irq_save(flags);
-	seqcount_acquire_read(>dep_map, 0, 0, _RET_IP_);
-	seqcount_release(>dep_map, 1, _RET_IP_);
-	local_irq_restore(flags);
+#define SEQCNT_ZERO(lockname) { \
+	.sequence = 0\
 }
 
-#else
-# define SEQCOUNT_DEP_MAP_INIT(lockname)
-# define seqcount_init(s) __seqcount_init(s, NULL, NULL)
-# define seqcount_lockdep_reader_access(x)
-#endif
-
-#define SEQCNT_ZERO(lockname) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(lockname)}
-
-
-/**
- * __read_seqcount_begin - begin a seq-read critical section (without barrier)
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
- *
- * __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()
- * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
- * provided before actually loading any of the variables that are to be
- * protected in this critical section.
- *
- * Use carefully, only in critical code, and comment how the barrier is
- * provided.
- */
-static inline unsigned __read_seqcount_begin(seqcount_t *s)
+static inline unsigned
+__read_seqcount_begin(seqcount_t *s)
 {
 	unsigned ret;
 
 repeat:
-	ret = READ_ONCE(s->sequence);
+	ret = s->sequence;
 	if (unlikely(ret & 1)) {
 		cpu_relax();
 		goto
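For readers unfamiliar with the pattern the patch above touches, here is a minimal user-space sketch of a sequence counter. It is not the linuxkpi code: seq_demo and the seq_demo_* functions are hypothetical names, a single writer is assumed, and default sequentially consistent C11 atomics stand in for the volatile accesses and explicit barriers used in the patch. It shows the property that matters for this thread: a reader makes no progress while the counter is odd, which is the loop __read_seqcount_begin() spins in.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space analogue of seqcount_t plus protected data. */
struct seq_demo {
	atomic_uint sequence;	/* even: stable, odd: write in progress */
	atomic_int a;
	atomic_int b;
};

static void
seq_demo_init(struct seq_demo *s)
{
	atomic_init(&s->sequence, 0);
	atomic_init(&s->a, 0);
	atomic_init(&s->b, 0);
}

/* Writers must be serialized by the caller (e.g. by a mutex). */
static void
seq_demo_write(struct seq_demo *s, int a, int b)
{
	atomic_fetch_add(&s->sequence, 1);	/* counter becomes odd */
	atomic_store(&s->a, a);
	atomic_store(&s->b, b);
	atomic_fetch_add(&s->sequence, 1);	/* counter is even again */
}

/* One read attempt; returns false if the snapshot must be retried. */
static bool
seq_demo_try_read(struct seq_demo *s, int *a, int *b)
{
	unsigned begin = atomic_load(&s->sequence);
	int va, vb;

	if (begin & 1u)
		return (false);		/* writer active */
	va = atomic_load(&s->a);
	vb = atomic_load(&s->b);
	if (atomic_load(&s->sequence) != begin)
		return (false);		/* a write happened in between */
	*a = va;
	*b = vb;
	return (true);
}

/* Retry until a consistent snapshot is obtained. */
static void
seq_demo_read(struct seq_demo *s, int *a, int *b)
{
	while (!seq_demo_try_read(s, a, b))
		;
}
```

If a writer's second increment were ever lost or not observed, seq_demo_read() would loop indefinitely while consuming CPU, so the counter's memory ordering has to be right for both sides.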
Re: unkillable process consuming 100% cpu
Hi,

Can you open the radeonkms.ko in gdb83 from ports and type:

l *(radeon_gem_busy_ioctl+0x30)

--HPS
Re: unkillable process consuming 100% cpu
On Thu, Nov 07, 2019 at 03:32:23PM -0500, Mark Johnston wrote:
> On Thu, Nov 07, 2019 at 12:29:19PM -0800, Steve Kargl wrote:
> > I haven't seen anyone post about an unkillable process
> > (even by root), which consumes 100% cpu.
> >
> > last pid: 4592; load averages: 1.24, 1.08, 0.74 up 13+20:21:20 12:26:29
> > 68 processes: 2 running, 66 sleeping
> > CPU: 0.1% user, 0.0% nice, 12.6% system, 0.0% interrupt, 87.2% idle
> > Mem: 428M Active, 11G Inact, 138M Laundry, 2497M Wired, 1525M Buf, 2377M Free
> > Swap: 16G Total, 24M Used, 16G Free
> >
> >   PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
> > 69092 kargl         2  45    0   342M   148M CPU2     2  12:51 100.07% chrome
> >
> >
> > Neither of these have an effect.
> >
> > kill -1 69092
> > kill -9 69069
> >
> > Attempts to attach gdb831 to -p 69092 leads to hung xterm.
>
> Could you please show us the output of "procstat -kk 69092"?

Just had another lock-up. A forced 'shutdown -r now' from a
remote terminal led to a console message about an unkillable
process. Here's 'procstat -kk' for the stuck process with the
long line wrapped.

  PID    TID COMM  TDNAME  KSTACK
  877 100161 Xorg  -       radeon_gem_busy_ioctl+0x30
      drm_ioctl_kernel+0xf1 drm_ioctl+0x279 linux_file_ioctl+0x298
      kern_ioctl+0x284 sys_ioctl+0x157 amd64_syscall+0x273
      fast_syscall_common+0x101
  877 100344 Xorg  X:rcs0  mi_switch+0xcb
      sleepq_catch_signals+0x35d sleepq_wait_sig+0xc _sleep+0x1bd
      umtxq_sleep+0x132 do_wait+0x3d6 __umtx_op_wait_uint_private+0x7e
      amd64_syscall+0x273 fast_syscall_common+0x101

It looks like radeonkms+drm is getting stuck.

-- 
Steve
Re: unkillable process consuming 100% cpu
On Thu, Nov 07, 2019 at 03:32:23PM -0500, Mark Johnston wrote:
> On Thu, Nov 07, 2019 at 12:29:19PM -0800, Steve Kargl wrote:
> > I haven't seen anyone post about an unkillable process
> > (even by root), which consumes 100% cpu.
> >
> > last pid: 4592; load averages: 1.24, 1.08, 0.74 up 13+20:21:20 12:26:29
> > 68 processes: 2 running, 66 sleeping
> > CPU: 0.1% user, 0.0% nice, 12.6% system, 0.0% interrupt, 87.2% idle
> > Mem: 428M Active, 11G Inact, 138M Laundry, 2497M Wired, 1525M Buf, 2377M Free
> > Swap: 16G Total, 24M Used, 16G Free
> >
> >   PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
> > 69092 kargl         2  45    0   342M   148M CPU2     2  12:51 100.07% chrome
> >
> >
> > Neither of these have an effect.
> >
> > kill -1 69092
> > kill -9 69069
> >
> > Attempts to attach gdb831 to -p 69092 leads to hung xterm.
>
> Could you please show us the output of "procstat -kk 69092"?

Unfortunately, no. I just rebooted the system to kill 69092.
During 'shutdown -r now', a message appeared on the console
warning that some processes would not die. Then 'shutdown -r now'
hung the console. :(

Before rebooting I did try a number of ps and procstat commands,
69092 was

chrome: --type=gpu-process --field-trial-handle=long-string-of-number
  --gpu-preferences=long-string-with-IAs

So, it seems that drm-current-kmod may not be happy.

For the record, uname gives FreeBSD 13.0-CURRENT r353571

-- 
Steve
Re: unkillable process consuming 100% cpu
On Thu, Nov 07, 2019 at 12:29:19PM -0800, Steve Kargl wrote:
> I haven't seen anyone post about an unkillable process
> (even by root), which consumes 100% cpu.
>
> last pid: 4592; load averages: 1.24, 1.08, 0.74 up 13+20:21:20 12:26:29
> 68 processes: 2 running, 66 sleeping
> CPU: 0.1% user, 0.0% nice, 12.6% system, 0.0% interrupt, 87.2% idle
> Mem: 428M Active, 11G Inact, 138M Laundry, 2497M Wired, 1525M Buf, 2377M Free
> Swap: 16G Total, 24M Used, 16G Free
>
>   PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
> 69092 kargl         2  45    0   342M   148M CPU2     2  12:51 100.07% chrome
>
>
> Neither of these have an effect.
>
> kill -1 69092
> kill -9 69069
>
> Attempts to attach gdb831 to -p 69092 leads to hung xterm.

Could you please show us the output of "procstat -kk 69092"?