Re: unkillable process consuming 100% cpu

2019-11-13 Thread Steve Kargl

On Mon, Nov 11, 2019 at 01:22:09PM +0100, Hans Petter Selasky wrote:

> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index a6e0a16ae..0697d70f4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c

Are you using ports/graphics/drm-devel-kmod?
This file does not exist in drm-current-kmod.

> @@ -236,6 +238,12 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct 
> amdgpu_bo *bo,

Using 'nm *.ko | grep eviction_fence' in /boot/modules shows
that none of the modules contain amdgpu_amdkfd_remove_eviction_fence().

-- 
Steve
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: unkillable process consuming 100% cpu

2019-11-13 Thread Steve Kargl

On Wed, Nov 13, 2019 at 04:22:19PM +0100, Hans Petter Selasky wrote:
> On 2019-11-13 15:52, Steve Kargl wrote:
> >  at /usr/src/sys/amd64/amd64/trap.c:743
> > #7  0x808b0468 in trap (frame=0xfe00b460e0c0)
> >  at /usr/src/sys/amd64/amd64/trap.c:407
> > #8  
> > #9  0x in ?? ()
> > #10 0x817d2c0f in radeon_ttm_tt_to_gtt (ttm=0xf80061eeb248)
> >  at 
> > /usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/radeon/radeon_ttm.c:720
> > #11 radeon_ttm_tt_set_userptr (ttm=0xf80061eeb248, addr=1,
> >  flags=2147483647)
> 
> Hi,
> 
> I don't see any function call here. Can you try to double check the 
> backtrace?
> 
> Which version of FreeBSD is this?
> 

% uname -a (trimmed)
FreeBSD 13.0-CURRENT r353571

% kgdb /usr/lib/debug/boot/kernel/kernel.debug vmcore.2
% bt
...
#7  0x808b0468 in trap (frame=0xfe00b460e0c0)
at /usr/src/sys/amd64/amd64/trap.c:407
#8  
#9  0x in ?? ()
#10 0x817d2c0f in radeon_ttm_tt_to_gtt (ttm=0xf80061eeb248)
at 
/usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/radeon/radeon_ttm.c:720
#11 radeon_ttm_tt_set_userptr (ttm=0xf80061eeb248, addr=1, 
flags=2147483647)
at 
/usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/radeon/radeon_ttm.c:804
#12 0x817adc9b in radeon_is_px (dev=0xf8017fe84e00)
at 
/usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/radeon/radeon_device.c:156

Looking at radeon_ttm.c, line 720 is the if statement in this function:

static struct radeon_ttm_tt *radeon_ttm_tt_to_gtt(struct ttm_tt *ttm)
{
 if (!ttm || ttm->func != &radeon_backend_func)
  return NULL;
 return (struct radeon_ttm_tt *)ttm;
}

(kgdb) p ttm->func
$2 = (struct ttm_backend_func *) 0x231
(kgdb) p &radeon_backend_func
$4 = (struct ttm_backend_func *) 0x8186d870 

AFAIK, 0x231 is not a valid address.
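
For readers unfamiliar with the pattern, the check in radeon_ttm_tt_to_gtt()
quoted above is an identity test on an ops-table pointer: it compares the
pointer value and never dereferences it. The sketch below is a userland
illustration only; every name in it (backend_ops, base_obj, derived_obj,
to_derived) is hypothetical and none of it is driver code. Under that
assumption, a garbage value such as 0x231 in the func field would make the
helper return NULL rather than fault, which may be why a fault attributed to
this line looks suspicious.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver types; not the real DRM structures. */
struct backend_ops {
	int (*bind)(void *obj);
};

struct base_obj {
	const struct backend_ops *func;	/* doubles as a type tag */
};

struct derived_obj {
	struct base_obj base;	/* must stay the first member for the cast */
	int private_state;
};

static int
derived_bind(void *obj)
{
	(void)obj;
	return 0;
}

/* The one ops table that identifies a derived_obj. */
static const struct backend_ops derived_ops = { .bind = derived_bind };

/*
 * Downcast only when the ops pointer matches. The comparison reads
 * b->func but never follows it, so a bogus func value is harmless here.
 */
static struct derived_obj *
to_derived(struct base_obj *b)
{
	if (b == NULL || b->func != &derived_ops)
		return NULL;
	return (struct derived_obj *)b;
}
```

With this shape, the only load the helper performs is of the base object
itself, so the helper can fault only if that object is unmapped.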

(kgdb) p *ttm
$5 = {bdev = 0x819021ef, func = 0x231, dummy_read_page = 0x0, 
  pages = 0xf800612c, page_flags = 2173789980, num_pages = 0, 
  sg = 0x0, glob = 0x2a, swap_storage = 0xf8017fe84e00, 
  caching_state = (unknown: 145613312), 
  state = (tt_unbound | tt_unpopulated | unknown: 4294965248)}

Moving to frame 12 suggests that the stack is corrupt (whether
by the dump or by the crash, I don't know):

(kgdb) frame 12
#12 0x817adc9b in radeon_is_px (dev=0xf8017fe84e00)
at 
/usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/radeon/radeon_device.c:156
156 if (rdev->flags & RADEON_IS_PX)
(kgdb) p *dev
Cannot access memory at address 0xf8017fe84e00
(kgdb) p rdev
$25 = (struct radeon_device *) 0x0


-- 
Steve

Re: unkillable process consuming 100% cpu

2019-11-13 Thread Hans Petter Selasky


On 2019-11-13 15:52, Steve Kargl wrote:
>  at /usr/src/sys/amd64/amd64/trap.c:743
> #7  0x808b0468 in trap (frame=0xfe00b460e0c0)
>  at /usr/src/sys/amd64/amd64/trap.c:407
> #8  
> #9  0x in ?? ()
> #10 0x817d2c0f in radeon_ttm_tt_to_gtt (ttm=0xf80061eeb248)
>  at 
> /usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/radeon/radeon_ttm.c:720
> #11 radeon_ttm_tt_set_userptr (ttm=0xf80061eeb248, addr=1,
>  flags=2147483647)


Hi,

I don't see any function call here. Can you try to double check the 
backtrace?


Which version of FreeBSD is this?

--HPS

Re: unkillable process consuming 100% cpu

2019-11-13 Thread Steve Kargl

On Wed, Nov 13, 2019 at 09:10:06AM +0100, Hans Petter Selasky wrote:
> On 2019-11-13 01:30, Steve Kargl wrote:
> > 
> > I installed the 2nd seqlock.diff, rebuilt drm-current-kmod-4.16.g20191023,
> > rebooted, and have been pounding on the system with workloads that are
> > similar to what the system was doing during the lockups.  So far, I
> > cannot get the system to lock up.  Looks like your patch fixes it (or at
> > least helps).  Thanks for taking a look at the problem.
> > 
> 
> Can you apply the kdb.diff on top and check dmesg for prints?
> 

I could not find the amdgpu_amdkfd_gpuvm.c file when I went looking.
Is it autogenerated?

I also spoke too soon. I got a panic after my reply above.

Fatal trap 12: page fault while in kernel mode
cpuid = 5; apic id = 15
fault virtual address   = 0x0
fault code              = supervisor read instruction, page not present
instruction pointer     = 0x20:0x0
stack pointer           = 0x28:0xfe00b460e188
frame pointer           = 0x28:0xfe00b460e1c0
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 877 (X:rcs0)
trap number             = 12
panic: page fault
cpuid = 5

db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00b460dde0
vpanic() at vpanic+0x17e/frame 0xfe00b460de40
panic() at panic+0x43/frame 0xfe00b460dea0
trap_fatal() at trap_fatal+0x388/frame 0xfe00b460df10
trap_pfault() at trap_pfault+0x4f/frame 0xfe00b460df80
trap() at trap+0x288/frame 0xfe00b460e0b0
calltrap() at calltrap+0x8/frame 0xfe00b460e0b0
--- trap 0xc, rip = 0, rsp = 0xfe00b460e188, rbp = 0xfe00b460e1c0 ---
??() at 0/frame 0xfe00b460e1c0
radeon_cs_ioctl() at radeon_cs_ioctl+0xa0b/frame 0xfe00b460e640
drm_ioctl_kernel() at drm_ioctl_kernel+0xf1/frame 0xfe00b460e680
drm_ioctl() at drm_ioctl+0x279/frame 0xfe00b460e770
linux_file_ioctl() at linux_file_ioctl+0x298/frame 0xfe00b460e7d0
kern_ioctl() at kern_ioctl+0x284/frame 0xfe00b460e840
sys_ioctl() at sys_ioctl+0x157/frame 0xfe00b460e910
amd64_syscall() at amd64_syscall+0x273/frame 0xfe00b460ea30
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfe00b460ea30
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x200cc6bfa, rsp = 
0x7fffbfffde98, rbp = 0x7fffbfffdec0 ---
Uptime: 5h9m5s
Dumping 1472 out of 16327 MB:..2%..11%..21%..31%..41%..52%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
warning: Source file is more recent than executable.
55  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct 
pcpu,
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:392
#2  0x805de452 in kern_reboot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:479
#3  0x805de8a6 in vpanic (fmt=, ap=)
at /usr/src/sys/kern/kern_shutdown.c:908
#4  0x805de6c3 in panic (fmt=)
at /usr/src/sys/kern/kern_shutdown.c:835
#5  0x808b0d58 in trap_fatal (frame=0xfe00b460e0c0, eva=0)
at /usr/src/sys/amd64/amd64/trap.c:925
#6  0x808b0daf in trap_pfault (frame=0xfe00b460e0c0, 
usermode=, signo=, ucode=)
at /usr/src/sys/amd64/amd64/trap.c:743
#7  0x808b0468 in trap (frame=0xfe00b460e0c0)
at /usr/src/sys/amd64/amd64/trap.c:407
#8  
#9  0x in ?? ()
#10 0x817d2c0f in radeon_ttm_tt_to_gtt (ttm=0xf80061eeb248)
at 
/usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/radeon/radeon_ttm.c:720
#11 radeon_ttm_tt_set_userptr (ttm=0xf80061eeb248, addr=1, 
flags=2147483647)
at 
/usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/radeon/radeon_ttm.c:804
#12 0x817adc9b in radeon_is_px (dev=0xf8017fe84e00)
at 
/usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/radeon/radeon_device.c:156
#13 0x818a9e81 in drm_ioctl_kernel (linux_file=, 
func=0xfe00b460e428, kdata=0xfe00b31eb000, flags=1521620552)
at /usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/drm_ioctl.c:760
#14 0x818aa129 in drm_ioctl (filp=0xf80061198e00, 
cmd=, arg=65536)
at /usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/drm_ioctl.c:856
#15 0x807c8098 in linux_file_ioctl_sub (fp=, 
filp=, fop=, cmd=, 
data=, td=)
at /usr/src/sys/compat/linuxkpi/common/src/linux_compat.c:965
#16 linux_file_ioctl (fp=, cmd=, 
data=, cred=, td=0xf800612c)
at /usr/src/sys/compat/linuxkpi/common/src/linux_compat.c:1558
#17 0x8063ed34 in fo_ioctl (fp=, com=3223348326, 
data=0x7fff, active_cred=0xfe001f7e6250, td=0xf800612c)
at /usr/src/sys/sys/file.h:340
#18 kern_ioctl (td=, fd=9, com=3223348326, 
data=0x7fff )
at /usr/src/sys/kern/sys_generic.c:801
#19 0x8063ea37 in sys_ioctl

Re: unkillable process consuming 100% cpu

2019-11-13 Thread Hans Petter Selasky


On 2019-11-13 01:30, Steve Kargl wrote:
> On Tue, Nov 12, 2019 at 06:48:22PM +0100, Hans Petter Selasky wrote:
> > On 2019-11-12 18:31, Steve Kargl wrote:
> > > > Can you open the radeonkms.ko in gdb83 from ports and type:
> > > >
> > > > l *(radeon_gem_busy_ioctl+0x30)
> > > >
> > > % /boot/modules/radeonkms.ko
> > > (gdb) l  *(radeon_gem_busy_ioctl+0x30)
> > > 0xa12b0 is in radeon_gem_busy_ioctl 
> > > (/usr/ports/graphics/drm-current-kmod/work/kms-drm-2d2852e/drivers/gpu/drm/radeon/radeon_gem.c:453).
> > > 448 
> > > /usr/ports/graphics/drm-current-kmod/work/kms-drm-2d2852e/drivers/gpu/drm/radeon/radeon_gem.c:
> > >  No such file or directory.
> > > (gdb)
> > 
> > As expected.
> > 
> 
> I installed the 2nd seqlock.diff, rebuilt drm-current-kmod-4.16.g20191023,
> rebooted, and have been pounding on the system with workloads that are
> similar to what the system was doing during the lockups.  So far, I
> cannot get the system to lock up.  Looks like your patch fixes it (or at
> least helps).  Thanks for taking a look at the problem.
> 

Can you apply the kdb.diff on top and check dmesg for prints?

--HPS

Re: unkillable process consuming 100% cpu

2019-11-12 Thread Steve Kargl

On Tue, Nov 12, 2019 at 06:48:22PM +0100, Hans Petter Selasky wrote:
> On 2019-11-12 18:31, Steve Kargl wrote:
> >> Can you open the radeonkms.ko in gdb83 from ports and type:
> >>
> >> l *(radeon_gem_busy_ioctl+0x30)
> >>
> > % /boot/modules/radeonkms.ko
> > (gdb) l  *(radeon_gem_busy_ioctl+0x30)
> > 0xa12b0 is in radeon_gem_busy_ioctl 
> > (/usr/ports/graphics/drm-current-kmod/work/kms-drm-2d2852e/drivers/gpu/drm/radeon/radeon_gem.c:453).
> > 448 
> > /usr/ports/graphics/drm-current-kmod/work/kms-drm-2d2852e/drivers/gpu/drm/radeon/radeon_gem.c:
> >  No such file or directory.
> > (gdb)
> 
> As expected.
> 

I installed the 2nd seqlock.diff, rebuilt drm-current-kmod-4.16.g20191023,
rebooted, and have been pounding on the system with workloads that are
similar to what the system was doing during the lockups.  So far, I
cannot get the system to lock up.  Looks like your patch fixes it (or at
least helps).  Thanks for taking a look at the problem.

-- 
Steve

Re: unkillable process consuming 100% cpu

2019-11-12 Thread Hans Petter Selasky


On 2019-11-12 18:31, Steve Kargl wrote:
> > Can you open the radeonkms.ko in gdb83 from ports and type:
> >
> > l *(radeon_gem_busy_ioctl+0x30)
> >
> % /boot/modules/radeonkms.ko
> (gdb) l  *(radeon_gem_busy_ioctl+0x30)
> 0xa12b0 is in radeon_gem_busy_ioctl 
> (/usr/ports/graphics/drm-current-kmod/work/kms-drm-2d2852e/drivers/gpu/drm/radeon/radeon_gem.c:453).
> 448 
> /usr/ports/graphics/drm-current-kmod/work/kms-drm-2d2852e/drivers/gpu/drm/radeon/radeon_gem.c:
>  No such file or directory.
> (gdb)

As expected.

--HPS

Re: unkillable process consuming 100% cpu

2019-11-12 Thread Steve Kargl

On Mon, Nov 11, 2019 at 10:34:23AM +0100, Hans Petter Selasky wrote:
> Hi,
> 
> Can you open the radeonkms.ko in gdb83 from ports and type:
> 
> l *(radeon_gem_busy_ioctl+0x30)
> 

% /boot/modules/radeonkms.ko
(gdb) l  *(radeon_gem_busy_ioctl+0x30)
0xa12b0 is in radeon_gem_busy_ioctl 
(/usr/ports/graphics/drm-current-kmod/work/kms-drm-2d2852e/drivers/gpu/drm/radeon/radeon_gem.c:453).
448 
/usr/ports/graphics/drm-current-kmod/work/kms-drm-2d2852e/drivers/gpu/drm/radeon/radeon_gem.c:
 No such file or directory.
(gdb) 

-- 
Steve

Re: unkillable process consuming 100% cpu

2019-11-11 Thread Steve Kargl

On Mon, Nov 11, 2019 at 02:22:55PM +0100, Hans Petter Selasky wrote:
> On 2019-11-08 23:09, Steve Kargl wrote:
> > Here's 'procstat -kk' for the stuck process with the long line wrapped.
> 
> Can you run this command a couple of times and see if the backtrace changes?
> 
> --HPS

I was AFK for a few days.  I'll try all your suggestions
tomorrow.  The two lock ups occurred while using chrome
to watch/listen to youtube and using libreoffice to prepare
a presentation.  I'll see if I can reproduce the issue.

-- 
Steve

Re: unkillable process consuming 100% cpu

2019-11-11 Thread Hans Petter Selasky


On 2019-11-08 23:09, Steve Kargl wrote:
> Here's 'procstat -kk' for the stuck process with the long line wrapped.

Can you run this command a couple of times and see if the backtrace changes?

--HPS

Re: unkillable process consuming 100% cpu

2019-11-11 Thread Konstantin Belousov

On Mon, Nov 11, 2019 at 01:22:09PM +0100, Hans Petter Selasky wrote:
> On 2019-11-11 11:44, Hans Petter Selasky wrote:
> > Seems like we can optimise away one more write memory barrier.
> > 
> > If you are building from ports, simply:
> > 
> > cd work/kms-drm*
> > cat seqlock.diff | patch -p1
> > 
> 
> Hi,
> 
> Here is one more debug patch you can try. Check whether the print 
> added by the patch shows up in dmesg.
> 
> --HPS
> 

> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index a6e0a16ae..0697d70f4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -31,6 +31,8 @@
>  #include "amdgpu_vm.h"
>  #include "amdgpu_amdkfd.h"
>  
> +#include 
> +
>  /* Special VM and GART address alignment needed for VI pre-Fiji due to
>   * a HW bug.
>   */
> @@ -236,6 +238,12 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct 
> amdgpu_bo *bo,
>   *ef_count = 0;
>   }
>  
> + if (resv != NULL &&
> + (struct thread *)SX_OWNER(resv->lock.base.sx.sx_lock) != curthread) 
> {
This really should be spelled as sx_xlocked().

> + printf("Called unlocked\n");
> + kdb_backtrace();
> + }
> +
>   old = reservation_object_get_list(resv);
>   if (!old)
>   return 0;


Re: unkillable process consuming 100% cpu

2019-11-11 Thread Hans Petter Selasky


On 2019-11-11 11:44, Hans Petter Selasky wrote:
> Seems like we can optimise away one more write memory barrier.
> 
> If you are building from ports, simply:
> 
> cd work/kms-drm*
> cat seqlock.diff | patch -p1

Hi,

Here is one more debug patch you can try. Check whether the print 
added by the patch shows up in dmesg.


--HPS

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a6e0a16ae..0697d70f4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -31,6 +31,8 @@
 #include "amdgpu_vm.h"
 #include "amdgpu_amdkfd.h"
 
+#include 
+
 /* Special VM and GART address alignment needed for VI pre-Fiji due to
  * a HW bug.
  */
@@ -236,6 +238,12 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
 		*ef_count = 0;
 	}
 
+	if (resv != NULL &&
+	(struct thread *)SX_OWNER(resv->lock.base.sx.sx_lock) != curthread) {
+		printf("Called unlocked\n");
+		kdb_backtrace();
+	}
+
 	old = reservation_object_get_list(resv);
 	if (!old)
 		return 0;

Re: unkillable process consuming 100% cpu

2019-11-11 Thread Hans Petter Selasky


Seems like we can optimise away one more write memory barrier.

If you are building from ports, simply:

cd work/kms-drm*
cat seqlock.diff | patch -p1

--HPS
diff --git a/linuxkpi/gplv2/include/linux/reservation.h b/linuxkpi/gplv2/include/linux/reservation.h
index b975f792c..0ce922a0e 100644
--- a/linuxkpi/gplv2/include/linux/reservation.h
+++ b/linuxkpi/gplv2/include/linux/reservation.h
@@ -94,7 +94,7 @@ reservation_object_init(struct reservation_object *obj)
 {
 	ww_mutex_init(&obj->lock, &reservation_ww_class);
 
-	__seqcount_init(&obj->seq, reservation_seqcount_string, &reservation_seqcount_class);
+	seqcount_init(&obj->seq);
 	RCU_INIT_POINTER(obj->fence, NULL);
 	RCU_INIT_POINTER(obj->fence_excl, NULL);
 	obj->staged = NULL;
diff --git a/linuxkpi/gplv2/include/linux/seqlock.h b/linuxkpi/gplv2/include/linux/seqlock.h
index e86351810..115ad5e68 100644
--- a/linuxkpi/gplv2/include/linux/seqlock.h
+++ b/linuxkpi/gplv2/include/linux/seqlock.h
@@ -1,410 +1,148 @@
 #ifndef __LINUX_SEQLOCK_H
-#define __LINUX_SEQLOCK_H
-/*
- * Reader/writer consistent mechanism without starving writers. This type of
- * lock for data where the reader wants a consistent set of information
- * and is willing to retry if the information changes. There are two types
- * of readers:
- * 1. Sequence readers which never block a writer but they may have to retry
- *if a writer is in progress by detecting change in sequence number.
- *Writers do not wait for a sequence reader.
- * 2. Locking readers which will wait if a writer or another locking reader
- *is in progress. A locking reader in progress will also block a writer
- *from going forward. Unlike the regular rwlock, the read lock here is
- *exclusive so that only one locking reader can get it.
- *
- * This is not as cache friendly as brlock. Also, this may not work well
- * for data that contains pointers, because any writer could
- * invalidate a pointer that a reader was following.
- *
- * Expected non-blocking reader usage:
- * 	do {
- *	seq = read_seqbegin(&foo);
- * 	...
- *  } while (read_seqretry(&foo, seq));
- *
- *
- * On non-SMP the spin locks disappear but the writer still needs
- * to increment the sequence variables because an interrupt routine could
- * change the state of the data.
- *
- * Based on x86_64 vsyscall gettimeofday 
- * by Keith Owens and Andrea Arcangeli
- */
+#define	__LINUX_SEQLOCK_H
 
 #include 
 #include 
-#include 
 #include 
 #include 
+#include 
 #include 
 
-
-/*
- * Version using sequence counter only.
- * This can be used when code has its own mutex protecting the
- * updating starting before the write_seqcountbeqin() and ending
- * after the write_seqcount_end().
- */
 typedef struct seqcount {
-	unsigned sequence;
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-	struct lockdep_map dep_map;
-#endif
+	volatile unsigned sequence;
 } seqcount_t;
 
-
-#define lockdep_init_map(a, b, c, d)
-
-static inline void __seqcount_init(seqcount_t *s, const char *name,
-	  struct lock_class_key *key)
+static inline void
+seqcount_init(seqcount_t *s)
 {
-	/*
-	 * Make sure we are not reinitializing a held lock:
-	 */
-	lockdep_init_map(&s->dep_map, name, key, 0);
 	s->sequence = 0;
 }
 
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define SEQCOUNT_DEP_MAP_INIT(lockname) \
-		.dep_map = { .name = #lockname } \
-
-# define seqcount_init(s)\
-	do {		\
-		static struct lock_class_key __key;	\
-		__seqcount_init((s), #s, &__key);	\
-	} while (0)
+#define	__seqcount_init(a,b,c) \
+	seqcount_init(a)
 
-static inline void seqcount_lockdep_reader_access(seqcount_t *s)
-{
-	seqcount_t *l = (seqcount_t *)s;
-	unsigned long flags;
-
-	local_irq_save(flags);
-	seqcount_acquire_read(&l->dep_map, 0, 0, _RET_IP_);
-	seqcount_release(&l->dep_map, 1, _RET_IP_);
-	local_irq_restore(flags);
+#define	SEQCNT_ZERO(lockname) {			\
+	.sequence = 0\
 }
 
-#else
-# define SEQCOUNT_DEP_MAP_INIT(lockname)
-# define seqcount_init(s) __seqcount_init(s, NULL, NULL)
-# define seqcount_lockdep_reader_access(x)
-#endif
-
-#define SEQCNT_ZERO(lockname) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(lockname)}
-
-
-/**
- * __read_seqcount_begin - begin a seq-read critical section (without barrier)
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
- *
- * __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()
- * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
- * provided before actually loading any of the variables that are to be
- * protected in this critical section.
- *
- * Use carefully, only in critical code, and comment how the barrier is
- * provided.
- */
-static inline unsigned __read_seqcount_begin(seqcount_t *s)
+static inline unsigned
+__read_seqcount_begin(seqcount_t *s)
 {
 	unsigned ret;
 
 repeat:
-	ret = READ_ONCE(s->sequence);
+	ret = s->sequence;
 	if (unlikely(ret & 1)) {
 		cpu_relax();
 		goto repeat;
 	}
-	return ret;
+	return (ret);
 }
 
-/**
- * raw_read_seqcount - Read the raw seqcount
- * @s: pointer to seqcount_t
- * Returns: count

Re: unkillable process consuming 100% cpu

2019-11-11 Thread Hans Petter Selasky


On 2019-11-11 10:34, Hans Petter Selasky wrote:
> Hi,
> 
> Can you open the radeonkms.ko in gdb83 from ports and type:
> 
> l *(radeon_gem_busy_ioctl+0x30)



Hi,

I suspect there is a memory race in the seqlock framework. Can you try 
the attached patch and re-build?
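
For context on what a seqcount has to guarantee, here is a minimal userland
sketch of the reader/writer protocol written with C11 atomics. It is not the
linuxkpi code and not the attached patch; the names seq_pair, seq_pair_write
and seq_pair_read are made up for illustration, and it assumes writers are
already serialized by some other lock. The point it shows is that the
sequence updates and the data accesses need explicit ordering; if either
fence is missing, a reader can pair a stable-looking sequence number with
half-updated data.

```c
#include <stdatomic.h>

/* Hypothetical userland analogue of a seqcount-protected pair of values. */
struct seq_pair {
	atomic_uint sequence;	/* odd while a write is in progress */
	atomic_int a;
	atomic_int b;
};

/* Writers are assumed to be serialized by an external lock. */
static void
seq_pair_write(struct seq_pair *p, int a, int b)
{
	unsigned seq = atomic_load_explicit(&p->sequence, memory_order_relaxed);

	atomic_store_explicit(&p->sequence, seq + 1, memory_order_relaxed);
	/* Keep the odd sequence store ordered before the data stores. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->a, a, memory_order_relaxed);
	atomic_store_explicit(&p->b, b, memory_order_relaxed);
	/* Publish the data: the even sequence store is a release. */
	atomic_store_explicit(&p->sequence, seq + 2, memory_order_release);
}

/* Readers never block the writer; they retry until they see a stable snapshot. */
static void
seq_pair_read(struct seq_pair *p, int *a, int *b)
{
	unsigned begin, end;

	do {
		begin = atomic_load_explicit(&p->sequence, memory_order_acquire);
		*a = atomic_load_explicit(&p->a, memory_order_relaxed);
		*b = atomic_load_explicit(&p->b, memory_order_relaxed);
		/* Keep the data loads ordered before the sequence re-read. */
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&p->sequence, memory_order_relaxed);
	} while ((begin & 1) != 0 || begin != end);
}
```

A reader that keeps seeing an odd or changing sequence simply keeps looping,
so a lost or misordered sequence update is one plausible way for a thread to
spin in the kernel indefinitely.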


Is this issue easily reproducible?

--HPS
diff --git a/linuxkpi/gplv2/include/linux/reservation.h b/linuxkpi/gplv2/include/linux/reservation.h
index b975f792c..0ce922a0e 100644
--- a/linuxkpi/gplv2/include/linux/reservation.h
+++ b/linuxkpi/gplv2/include/linux/reservation.h
@@ -94,7 +94,7 @@ reservation_object_init(struct reservation_object *obj)
 {
 	ww_mutex_init(&obj->lock, &reservation_ww_class);
 
-	__seqcount_init(&obj->seq, reservation_seqcount_string, &reservation_seqcount_class);
+	seqcount_init(&obj->seq);
 	RCU_INIT_POINTER(obj->fence, NULL);
 	RCU_INIT_POINTER(obj->fence_excl, NULL);
 	obj->staged = NULL;
diff --git a/linuxkpi/gplv2/include/linux/seqlock.h b/linuxkpi/gplv2/include/linux/seqlock.h
index e86351810..940bd8e90 100644
--- a/linuxkpi/gplv2/include/linux/seqlock.h
+++ b/linuxkpi/gplv2/include/linux/seqlock.h
@@ -1,410 +1,149 @@
 #ifndef __LINUX_SEQLOCK_H
-#define __LINUX_SEQLOCK_H
-/*
- * Reader/writer consistent mechanism without starving writers. This type of
- * lock for data where the reader wants a consistent set of information
- * and is willing to retry if the information changes. There are two types
- * of readers:
- * 1. Sequence readers which never block a writer but they may have to retry
- *if a writer is in progress by detecting change in sequence number.
- *Writers do not wait for a sequence reader.
- * 2. Locking readers which will wait if a writer or another locking reader
- *is in progress. A locking reader in progress will also block a writer
- *from going forward. Unlike the regular rwlock, the read lock here is
- *exclusive so that only one locking reader can get it.
- *
- * This is not as cache friendly as brlock. Also, this may not work well
- * for data that contains pointers, because any writer could
- * invalidate a pointer that a reader was following.
- *
- * Expected non-blocking reader usage:
- * 	do {
- *	seq = read_seqbegin(&foo);
- * 	...
- *  } while (read_seqretry(&foo, seq));
- *
- *
- * On non-SMP the spin locks disappear but the writer still needs
- * to increment the sequence variables because an interrupt routine could
- * change the state of the data.
- *
- * Based on x86_64 vsyscall gettimeofday 
- * by Keith Owens and Andrea Arcangeli
- */
+#define	__LINUX_SEQLOCK_H
 
 #include 
 #include 
-#include 
 #include 
 #include 
+#include 
 #include 
 
-
-/*
- * Version using sequence counter only.
- * This can be used when code has its own mutex protecting the
- * updating starting before the write_seqcountbeqin() and ending
- * after the write_seqcount_end().
- */
 typedef struct seqcount {
-	unsigned sequence;
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-	struct lockdep_map dep_map;
-#endif
+	volatile unsigned sequence;
 } seqcount_t;
 
-
-#define lockdep_init_map(a, b, c, d)
-
-static inline void __seqcount_init(seqcount_t *s, const char *name,
-	  struct lock_class_key *key)
+static inline void
+seqcount_init(seqcount_t *s)
 {
-	/*
-	 * Make sure we are not reinitializing a held lock:
-	 */
-	lockdep_init_map(&s->dep_map, name, key, 0);
 	s->sequence = 0;
 }
 
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define SEQCOUNT_DEP_MAP_INIT(lockname) \
-		.dep_map = { .name = #lockname } \
-
-# define seqcount_init(s)\
-	do {		\
-		static struct lock_class_key __key;	\
-		__seqcount_init((s), #s, &__key);	\
-	} while (0)
+#define	__seqcount_init(a,b,c) \
+	seqcount_init(a)
 
-static inline void seqcount_lockdep_reader_access(seqcount_t *s)
-{
-	seqcount_t *l = (seqcount_t *)s;
-	unsigned long flags;
-
-	local_irq_save(flags);
-	seqcount_acquire_read(&l->dep_map, 0, 0, _RET_IP_);
-	seqcount_release(&l->dep_map, 1, _RET_IP_);
-	local_irq_restore(flags);
+#define	SEQCNT_ZERO(lockname) {			\
+	.sequence = 0\
 }
 
-#else
-# define SEQCOUNT_DEP_MAP_INIT(lockname)
-# define seqcount_init(s) __seqcount_init(s, NULL, NULL)
-# define seqcount_lockdep_reader_access(x)
-#endif
-
-#define SEQCNT_ZERO(lockname) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(lockname)}
-
-
-/**
- * __read_seqcount_begin - begin a seq-read critical section (without barrier)
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
- *
- * __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()
- * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
- * provided before actually loading any of the variables that are to be
- * protected in this critical section.
- *
- * Use carefully, only in critical code, and comment how the barrier is
- * provided.
- */
-static inline unsigned __read_seqcount_begin(seqcount_t *s)
+static inline unsigned
+__read_seqcount_begin(seqcount_t *s)
 {
 	unsigned ret;
 
 repeat:
-	ret = READ_ONCE(s->sequence);
+	ret = s->sequence;
 	if (unlikely(ret & 1)) {
 		cpu_relax();
 		goto

Re: unkillable process consuming 100% cpu

2019-11-11 Thread Hans Petter Selasky


Hi,

Can you open the radeonkms.ko in gdb83 from ports and type:

l *(radeon_gem_busy_ioctl+0x30)

--HPS

Re: unkillable process consuming 100% cpu

2019-11-08 Thread Steve Kargl

On Thu, Nov 07, 2019 at 03:32:23PM -0500, Mark Johnston wrote:
> On Thu, Nov 07, 2019 at 12:29:19PM -0800, Steve Kargl wrote:
> > I haven't seen anyone post about an unkillable process
> > (even by root), which consumes 100% cpu.
> > 
> > last pid:  4592;  load averages:  1.24,  1.08,  0.74   up 13+20:21:20  12:26:29
> > 68 processes:  2 running, 66 sleeping
> > CPU:  0.1% user,  0.0% nice, 12.6% system,  0.0% interrupt, 87.2% idle
> > Mem: 428M Active, 11G Inact, 138M Laundry, 2497M Wired, 1525M Buf, 2377M Free
> > Swap: 16G Total, 24M Used, 16G Free
> > 
> >   PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
> > 69092 kargl         2  45    0   342M   148M CPU2     2  12:51 100.07% chrome
> > 
> > 
> > Neither of these have an effect.
> > 
> > kill -1 69092
> > kill -9 69069
> > 
> > Attempts to attach gdb831 to -p 69092 leads to hung xterm.
> 
> Could you please show us the output of "procstat -kk 69092"?

Just had another lock-up.  A forced 'shutdown -r now' from a
remote terminal led to a console message about an unkillable
process.

Here's 'procstat -kk' for the stuck process with the long line wrapped.

  PID    TID COMM   TDNAME  KSTACK
  877 100161 Xorg   -   radeon_gem_busy_ioctl+0x30
drm_ioctl_kernel+0xf1
drm_ioctl+0x279
linux_file_ioctl+0x298
kern_ioctl+0x284
sys_ioctl+0x157
amd64_syscall+0x273
fast_syscall_common+0x101 
  877 100344 Xorg   X:rcs0  mi_switch+0xcb
sleepq_catch_signals+0x35d
sleepq_wait_sig+0xc
_sleep+0x1bd
umtxq_sleep+0x132
do_wait+0x3d6
__umtx_op_wait_uint_private+0x7e
amd64_syscall+0x273
fast_syscall_common+0x101 


It looks like radeonkms+drm is getting stuck.

-- 
Steve

Re: unkillable process consuming 100% cpu

2019-11-07 Thread Steve Kargl

On Thu, Nov 07, 2019 at 03:32:23PM -0500, Mark Johnston wrote:
> On Thu, Nov 07, 2019 at 12:29:19PM -0800, Steve Kargl wrote:
> > I haven't seen anyone post about an unkillable process
> > (even by root), which consumes 100% cpu.
> > 
> > last pid:  4592;  load averages:  1.24,  1.08,  0.74   up 13+20:21:20  12:26:29
> > 68 processes:  2 running, 66 sleeping
> > CPU:  0.1% user,  0.0% nice, 12.6% system,  0.0% interrupt, 87.2% idle
> > Mem: 428M Active, 11G Inact, 138M Laundry, 2497M Wired, 1525M Buf, 2377M Free
> > Swap: 16G Total, 24M Used, 16G Free
> > 
> >   PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
> > 69092 kargl         2  45    0   342M   148M CPU2     2  12:51 100.07% chrome
> > 
> > 
> > Neither of these have an effect.
> > 
> > kill -1 69092
> > kill -9 69069
> > 
> > Attempts to attach gdb831 to -p 69092 leads to hung xterm.
> 
> Could you please show us the output of "procstat -kk 69092"?

Unfortunately, no.  I just rebooted the system to kill 69092.
During 'shutdown -r now', a message appeared on the console
warning that some processes would not die.  Then 'shutdown
-r now' hung the console. :(

Before rebooting, I did try a number of ps and procstat commands; 69092 was

chrome: --type=gpu-process --field-trial-handle=long-string-of-number
--gpu-preferences=long-string-with-IAs

So, it seems that drm-current-kmod may not be happy.

For the record, uname gives FreeBSD 13.0-CURRENT r353571



-- 
Steve

Re: unkillable process consuming 100% cpu

2019-11-07 Thread Mark Johnston

On Thu, Nov 07, 2019 at 12:29:19PM -0800, Steve Kargl wrote:
> I haven't seen anyone post about an unkillable process
> (even by root), which consumes 100% cpu.
> 
> last pid:  4592;  load averages:  1.24,  1.08,  0.74   up 13+20:21:20  
> 12:26:29
> 68 processes:  2 running, 66 sleeping
> CPU:  0.1% user,  0.0% nice, 12.6% system,  0.0% interrupt, 87.2% idle
> Mem: 428M Active, 11G Inact, 138M Laundry, 2497M Wired, 1525M Buf, 2377M Free
> Swap: 16G Total, 24M Used, 16G Free
> 
>   PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
> 69092 kargl         2  45    0   342M   148M CPU2     2  12:51 100.07% chrome
> 
> 
> Neither of these have an effect.
> 
> kill -1 69092
> kill -9 69069
> 
> Attempts to attach gdb831 to -p 69092 leads to hung xterm.

Could you please show us the output of "procstat -kk 69092"?
