[Kernel-packages] [Bug 1867155] Re: P8 node modoc will reboot automatically when running the sru_misc test suite

2020-06-24 Thread Po-Hsu Lin
This issue is striking us again with 5.3.0-60.54 on modoc.

** Tags added: sru-20200608

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1867155

Title:
  P8 node modoc will reboot automatically when running the sru_misc test
  suite

Status in ubuntu-kernel-tests:
  Triaged
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Tested with 5 attempts; 4 of them hung around the following test in the
  ubuntu_kernel_selftests net sub-category:
   # selftests: net: reuseport_bpf_cpu

  First attempt:
  23:21:32 DEBUG| [stdout] ok 2 selftests: net: reuseport_bpf_cpu
  23:21:32 DEBUG| [stdout] # selftests: net: reuseport_bpf_numa
  23:21:32 DEBUG| [stdout] #  IPv4 UDP 
  (hang here)

  Second attempt:
  10:17:35 DEBUG| [stdout] ok 1 selftests: net: reuseport_bpf
  10:17:35 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu
  10:17:35 DEBUG| [stdout] #  IPv4 UDP 
  10:17:35 DEBUG| [stdout] # send cpu 0, receive socket 0
  (line skipped)
  10:17:35 DEBUG| [stdout] # send cpu 159, receive socket 159
  10:17:35 DEBUG| [stdout] #  IPv6 TCP 
  (hang here)

  Third attempt failed because of test timeout:
  12:46:16 DEBUG| [stdout] # [FAIL]
  12:46:16 DEBUG| [stdout] # 
  12:46:16 DEBUG| [stdout] # running psock_tpacket test
  12:46:16 DEBUG| [stdout] # 
  13:14:13 INFO | Timer expired (1800 sec.), nuking pid 161853

  Fourth attempt:
  07:41:51 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu
  07:41:51 DEBUG| [stdout] #  IPv4 UDP 
  07:41:51 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  07:41:51 DEBUG| [stdout] # send cpu 159, receive socket 159
  07:41:51 DEBUG| [stdout] #  IPv6 UDP 
  07:41:51 DEBUG| [stdout] # send cpu 0, receive socket 0
  07:41:51 DEBUG| [stdout] # send cpu 1, receive socket 1
  (lines skipped)
  07:41:51 DEBUG| [stdout] # send cpu 157, receive socket 157
  07:41:51 DEBUG| [stdout] # send cpu 159, receive socket 159
  07:41:51 DEBUG| [stdout] #  IPv4 TCP 
  (test hang here)

  Fifth attempt:
  04:29:17 DEBUG| [stdout] ok 1 selftests: net: reuseport_bpf
  04:29:17 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu
  04:29:17 DEBUG| [stdout] #  IPv4 UDP 
  04:29:17 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  04:29:17 DEBUG| [stdout] # send cpu 159, receive socket 159
  04:29:17 DEBUG| [stdout] #  IPv6 UDP 
  04:29:17 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  04:29:17 DEBUG| [stdout] # send cpu 159, receive socket 159
  04:29:17 DEBUG| [stdout] #  IPv4 TCP 
  04:29:17 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  04:29:17 DEBUG| [stdout] # send cpu 15, receive socket 15
  (test hang here)
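
  To compare the attempts above, it can help to pull out the last test,
  protocol section and CPU index that reached the log before the node went
  down. The following is a minimal sketch and not part of the original
  report: the function name last_progress and the SAMPLE text are
  illustrative assumptions, and the patterns only cover the log line shapes
  quoted in this bug.

```python
import re

# Patterns for the autotest log lines quoted in this report.
TEST_RE = re.compile(r"# selftests: (\S+): (\S+)")
SECTION_RE = re.compile(r"#\s+(IPv[46] (?:UDP|TCP))\s*$")
CPU_RE = re.compile(r"# send cpu (\d+), receive socket (\d+)")


def last_progress(lines):
    """Return (test, section, cpu) for the last progress seen in the log."""
    test = section = cpu = None
    for line in lines:
        m = TEST_RE.search(line)
        if m:
            # A new test started: forget the previous section and CPU.
            test, section, cpu = m.group(2), None, None
            continue
        m = SECTION_RE.search(line)
        if m:
            # A new protocol section started: forget the previous CPU.
            section, cpu = m.group(1), None
            continue
        m = CPU_RE.search(line)
        if m:
            cpu = int(m.group(1))
    return test, section, cpu


# Abbreviated copy of the fifth attempt (skipped lines left out).
SAMPLE = """\
04:29:17 DEBUG| [stdout] ok 1 selftests: net: reuseport_bpf
04:29:17 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu
04:29:17 DEBUG| [stdout] #  IPv4 UDP
04:29:17 DEBUG| [stdout] # send cpu 0, receive socket 0
04:29:17 DEBUG| [stdout] # send cpu 159, receive socket 159
04:29:17 DEBUG| [stdout] #  IPv4 TCP
04:29:17 DEBUG| [stdout] # send cpu 0, receive socket 0
04:29:17 DEBUG| [stdout] # send cpu 15, receive socket 15
"""

if __name__ == "__main__":
    print(last_progress(SAMPLE.splitlines()))
```

  Running the same function over each attempt's log gives one tuple per run,
  which makes it easier to see that the hang point moves between protocol
  sections rather than staying at a fixed CPU.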

  I tried to run tests in this sru-misc suite in the following order:
  'hwclock',
  'ubuntu_bpf',
  'ubuntu_bpf_jit',
  'ubuntu_kernel_selftests',
  'ubuntu_lxc',
  'ubuntu_seccomp',
  'ubuntu_unionmount_ovlfs',
  'ubuntu_cts_kernel',
  'ubuntu_kvm_unit_tests',
  One by one on this node, but I can't reproduce this issue.

  I tried to watch dmesg when this happens, but there is no information
  there; the system just reboots silently.

  This is what you can see from syslog after reboot:
  Mar 12 04:27:39 modoc kernel: [  536.668305] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.684547] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.700907] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.717246] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.719288] page:c00c00c4f000 refcount:1 mapcount:0 mapping:c00f8cfe0fd1 index:0x7611c3e
  Mar 12 04:27:39 modoc kernel: [  536.719289] anon
  Mar 12 04:27:39 modoc kernel: [  536.719291] flags: 0x3800080024(uptodate|active|swapbacked)
  Mar 12 04:27:39 modoc kernel: [  536.719294] raw: 003800080024 5deadbeef100 5deadbeef122 c00f8cfe0fd1
  Mar 12 04:27:39 modoc kernel: [  536.719295] raw: 07611c3e  0001 c00fcfd1c000
  Mar 12 04:27:39 modoc kernel: [  536.719296] page dumped because: unmovable page
  Mar 12 04:27:39 modoc kernel: [  536.719296] page->mem_cgroup:c00fcfd1c000
  Mar 12 04:27:39 modoc kernel: [  536.735465] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.751848] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.768210] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.784450] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.800756] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.817006] Injecting error (-12) to MEM_GOING_OFFLINE
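
  The "Injecting error" lines in the syslog excerpt above carry a negative
  errno value and the name of the notifier event being failed. As a reading
  aid only, the sketch below decodes one such line; the helper name
  decode_injection is hypothetical, and the mapping of 12 to ENOMEM assumes
  Linux errno numbering.

```python
import errno
import re

# Matches lines such as "Injecting error (-12) to MEM_GOING_OFFLINE".
INJECT_RE = re.compile(r"Injecting error \((-?\d+)\) to (\w+)")


def decode_injection(line):
    """Return (errno_name, event) for an injection line, or None."""
    m = INJECT_RE.search(line)
    if m is None:
        return None
    code = abs(int(m.group(1)))
    # errno.errorcode maps numeric codes to names for the running platform.
    return errno.errorcode.get(code, "UNKNOWN"), m.group(2)


LINE = ("Mar 12 04:27:39 modoc kernel: [  536.668305] "
        "Injecting error (-12) to MEM_GOING_OFFLINE")

if __name__ == "__main__":
    print(decode_injection(LINE))
```

  Under that assumption the excerpt shows repeated ENOMEM failures being
  injected into memory-offline notifications, which by themselves are
  expected output of an error-injection test and not the crash itself.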

[Kernel-packages] [Bug 1867155] Re: P8 node modoc will reboot automatically when running the sru_misc test suite

2020-03-12 Thread Po-Hsu Lin
** Tags removed: kqa-blocker


[Kernel-packages] [Bug 1867155] Re: P8 node modoc will reboot automatically when running the sru_misc test suite

2020-03-12 Thread Po-Hsu Lin
I can reproduce this with 5.3.0-40-generic, so removing the kqa-blocker
tag.


[Kernel-packages] [Bug 1867155] Re: P8 node modoc will reboot automatically when running the sru_misc test suite

2020-03-12 Thread Sean Feole
** Changed in: ubuntu-kernel-tests
   Status: New => Triaged


[Kernel-packages] [Bug 1867155] Re: P8 node modoc will reboot automatically when running the sru_misc test suite

2020-03-12 Thread Po-Hsu Lin
I managed to catch this issue with IPMI console:

[  673.975988] BUG: Unable to handle kernel instruction fetch
[  673.976017] Faulting instruction address: 0x7fe87fe8
[  673.976025] Oops: Kernel access of bad area, sig: 11 [#1]
[  673.976032] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
[  673.976040] Modules linked in: binfmt_misc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua leds_powernv ipmi_powernv ipmi_devintf ipmi_msghandler ibmpowernv uio_pdrv_genirq vmx_crypto uio powernv_rng powernv_op_panel sch_fq_codel ip_tables x_tables autofs4 ses enclosure scsi_transport_sas btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_vpmsum crc32c_vpmsum tg3 ipr [last unloaded: notifier_error_inject]
[  673.976101] CPU: 0 PID: 76667 Comm: reuseport_bpf_c Not tainted 5.3.0-42-generic #34-Ubuntu
[  673.976109] NIP:  7fe87fe8 LR: c0ca9d10 CTR: 7fe87fe8
[  673.976117] REGS: c00cf580 TRAP: 0480   Not tainted  (5.3.0-42-generic)
[  673.976124] MSR:  90009033   CR: 24002488  XER: 2000
[  673.976135] CFAR: c0ca9d0c IRQMASK: 0
[  673.976135] GPR00: c0e4094c c00cf810 c19c9000 c00a2208b6e0
[  673.976135] GPR04: c00807c70038 c00a2208b6e0 0028 00012e2e5506
[  673.976135] GPR08: a2be9417
[  673.976135] GPR12: 7fe87fe8 c1d6 0003 0101
[  673.976135] GPR16: c3fe00e8 22b8 2788 c18fb600
[  673.976135] GPR20: 000a  22b8 0001
[  673.976135] GPR24:  0028 00a0 c3fe00e8
[  673.976135] GPR28: c00807c7 147b1819 c00a2208b6e0 c01e2d8b
[  673.976195] NIP [7fe87fe8] 0x7fe87fe8
[  673.976206] LR [c0ca9d10] reuseport_select_sock+0x100/0x400
[  673.976212] Call Trace:
[  673.976216] [c00cf810] [c00fe9daf0f0] 0xc00fe9daf0f0 (unreliable)
[  673.976226] [c00cf8b0] [c0e4094c] inet6_lhash2_lookup+0x1ec/0x220
[  673.976234] [c00cf930] [c0e40c80] inet6_lookup_listener+0x300/0x3f0
[  673.976244] [c00cf9d0] [c0e1ec38] tcp_v6_rcv+0x7d8/0xe10
[  673.976252] [c00cfb00] [c0dd7e70] ip6_protocol_deliver_rcu+0x110/0x6e0
[  673.976261] [c00cfb80] [c0dd846c] ip6_input_finish+0x2c/0x40
[  673.976269] [c00cfba0] [c0dd8560] ip6_input+0xe0/0xf0
[  673.976276] [c00cfc10] [c0dd7008] ip6_rcv_finish+0xa8/0xe0
[  673.976283] [c00cfc40] [c0dd7b90] ipv6_rcv+0x100/0x110
[  673.976291] [c00cfcc0] [c0c6d040] __netif_receive_skb_one_core+0x70/0xb0
[  673.976299] [c00cfd00] [c0c6ee70] process_backlog+0xd0/0x230
[  673.976307] [c00cfd70] [c0c6db88] net_rx_action+0x1e8/0x510
[  673.976315] [c00cfe90] [c0e9c8a0] __do_softirq+0x160/0x3f4
[  673.976324] [c00cff90] [c0030be8] call_do_softirq+0x14/0x24
[  673.976332] [c4a4f6b0] [c001c2e8] do_softirq_own_stack+0x38/0x50
[  673.976341] [c4a4f6d0] [c01315a0] do_softirq.part.0+0x80/0xb0
[  673.976349] [c4a4f700] [c0131698] __local_bh_enable_ip+0xc8/0xf0
[  673.976357] [c4a4f720] [c0dd19b8] ip6_finish_output2+0x298/0x790
[  673.976365] [c4a4f7c0] [c0dd6124] ip6_output+0x84/0x1c0
[  673.976372] [c4a4f840] [c0dd222c] ip6_xmit+0x37c/0x7b0
[  673.976379] [c4a4f960] [c0e272e4] inet6_csk_xmit+0xb4/0x120
[  673.976387] [c4a4fa00] [c0d4231c] __tcp_transmit_skb+0x55c/0xd80
[  673.976396] [c4a4fab0] [c0d43a18] tcp_connect+0x918/0xab0
[  673.976403] [c4a4fb70] [c0e1c25c] tcp_v6_connect+0x5ac/0x740
[  673.976411] [c4a4fc50] [c0d7271c] __inet_stream_connect+0x12c/0x4b0
[  673.976419] [c4a4fcf0] [c0d72afc] inet_stream_connect+0x5c/0x90
[  673.976428] [c4a4fd30] [c0c37a0c] __sys_connect+0x11c/0x160
[  673.976435] [c4a4fe00] [c0c37a78] sys_connect+0x28/0x40
[  673.976443] [c4a4fe20] [c000b388] system_call+0x5c/0x70
[  673.976449] Instruction dump:
[  673.976454]
[  673.976462]
[  673.976472] ---[ end trace 0961698a90363364 ]---
[  673.976948] 
[  674.977035] Kernel panic - not syncing: Aiee, killing interrupt handler!
[  674.978187] Rebooting in 10 seconds..
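
When comparing this oops with later occurrences, the symbol+offset/size
entries are the part worth diffing, since the stack addresses change from
boot to boot. The sketch below is an illustration only and not a tool used in
this bug: the name extract_frames is hypothetical, and the regular expression
simply matches the symbol+0xoff/0xsize shape seen in the trace above, even
when the mail client wrapped a frame onto the next line.

```python
import re

# symbol+0xOFFSET/0xSIZE, as printed in the LR line and the call trace.
FRAME_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def extract_frames(text):
    """Return a list of (symbol, offset, size) tuples found in the text."""
    return [(sym, int(off, 16), int(size, 16))
            for sym, off, size in FRAME_RE.findall(text)]


# A few lines copied from the console output above; one is left wrapped.
SAMPLE = """\
[  673.976206] LR [c0ca9d10] reuseport_select_sock+0x100/0x400
[  673.976226] [c00cf8b0] [c0e4094c]
inet6_lhash2_lookup+0x1ec/0x220
[  673.976244] [c00cf9d0] [c0e1ec38] tcp_v6_rcv+0x7d8/0xe10
[  673.976341] [c4a4f6d0] [c01315a0] do_softirq.part.0+0x80/0xb0
"""

if __name__ == "__main__":
    for sym, off, size in extract_frames(SAMPLE):
        print(f"{sym} +{off:#x} of {size:#x}")
```

Frames without a symbol, such as the raw 0x7fe87fe8 address in NIP, are
deliberately skipped by this pattern.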


[Kernel-packages] [Bug 1867155] Re: P8 node modoc will reboot automatically when running the sru_misc test suite

2020-03-12 Thread Po-Hsu Lin
I can see this message on a freshly deployed modoc:
[  396.856011] ipr 0001:08:00.0: 9076: Configuration error, missing remote IOA
[  396.856042] ipr 0001:08:00.0: Attached Adapter not discovered within allotted time [PRC: 17101541]
[  396.856051] ipr 0001:08:00.0: Remote IOA VPID/SN:   
[  396.856058] ipr 0001:08:00.0: Remote IOA WWN: 
[  396.856065] ipr: : 0100 0100 FC22 0840
[  396.856072] ipr: 0010: 4C494344 49424D20 20202020 35374438
[  396.856078] ipr: 0020: 30303153 4953494F 41202020 
[  396.856084] ipr: 0030: 30303431 57303132 50050760 5EC33900
[  396.856091] ipr: 0040: 0001  02020101 00FF

With 5.3.0-40


[Kernel-packages] [Bug 1867155] Re: P8 node modoc will reboot automatically when running the sru_misc test suite

2020-03-12 Thread Po-Hsu Lin
Please find the syslog attached.

** Attachment added: "modoc-syslog.log"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1867155/+attachment/5336156/+files/modoc-syslog.log

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1867155

Title:
  P8 node modoc will reboot automatically when running the sru_misc test
  suite

Status in ubuntu-kernel-tests:
  New
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Tested with 5 attempts, 4 hangs around the following test in 
ubuntu_kernel_selftests net sub-category:
   # selftests: net: reuseport_bpf_cpu

  First attempt:
  23:21:32 DEBUG| [stdout] ok 2 selftests: net: reuseport_bpf_cpu
  23:21:32 DEBUG| [stdout] # selftests: net: reuseport_bpf_numa
  23:21:32 DEBUG| [stdout] #  IPv4 UDP 
  (hang here)

  Second attempt:
  10:17:35 DEBUG| [stdout] ok 1 selftests: net: reuseport_bpf
  10:17:35 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu
  10:17:35 DEBUG| [stdout] #  IPv4 UDP 
  10:17:35 DEBUG| [stdout] # send cpu 0, receive socket 0
  (line skipped)
  10:17:35 DEBUG| [stdout] # send cpu 159, receive socket 159
  10:17:35 DEBUG| [stdout] #  IPv6 TCP 
  (hang here)

  Third attempt failed because of test timeout:
  12:46:16 DEBUG| [stdout] # [FAIL]
  12:46:16 DEBUG| [stdout] # 
  12:46:16 DEBUG| [stdout] # running psock_tpacket test
  12:46:16 DEBUG| [stdout] # 
  13:14:13 INFO | Timer expired (1800 sec.), nuking pid 161853

  Fourth attempt:
  07:41:51 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu
  07:41:51 DEBUG| [stdout] #  IPv4 UDP 
  07:41:51 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  07:41:51 DEBUG| [stdout] # send cpu 159, receive socket 159
  07:41:51 DEBUG| [stdout] #  IPv6 UDP 
  07:41:51 DEBUG| [stdout] # send cpu 0, receive socket 0
  07:41:51 DEBUG| [stdout] # send cpu 1, receive socket 1
  (lines skipped)
  07:41:51 DEBUG| [stdout] # send cpu 157, receive socket 157
  07:41:51 DEBUG| [stdout] # send cpu 159, receive socket 159
  07:41:51 DEBUG| [stdout] #  IPv4 TCP 
  (test hang here)

  Fifth attempt:
  04:29:17 DEBUG| [stdout] ok 1 selftests: net: reuseport_bpf
  04:29:17 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu
  04:29:17 DEBUG| [stdout] #  IPv4 UDP 
  04:29:17 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  04:29:17 DEBUG| [stdout] # send cpu 159, receive socket 159
  04:29:17 DEBUG| [stdout] #  IPv6 UDP 
  04:29:17 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  04:29:17 DEBUG| [stdout] # send cpu 159, receive socket 159
  04:29:17 DEBUG| [stdout] #  IPv4 TCP 
  04:29:17 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  04:29:17 DEBUG| [stdout] # send cpu 15, receive socket 15
  (test hang here)

  I tried to run tests in this sru-misc suite in the following order:
  'hwclock',
  'ubuntu_bpf',
  'ubuntu_bpf_jit',
  'ubuntu_kernel_selftests',
  'ubuntu_lxc',
  'ubuntu_seccomp',
  'ubuntu_unionmount_ovlfs',
  'ubuntu_cts_kernel',
  'ubuntu_kvm_unit_tests',
  One by one on this node, but I can't reproduce this issue.

  I tried to watch dmesg when this happens, but there is no information
  there, the system will be reboot automatically silently.

  This is what you can see from syslog after reboot:
  Mar 12 04:27:39 modoc kernel: [  536.668305] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.684547] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.700907] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.717246] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.719288] page:c00c00c4f000 refcount:1 mapcount:0 mapping:c00f8cfe0fd1 index:0x7611c3e
  Mar 12 04:27:39 modoc kernel: [  536.719289] anon
  Mar 12 04:27:39 modoc kernel: [  536.719291] flags: 0x3800080024(uptodate|active|swapbacked)
  Mar 12 04:27:39 modoc kernel: [  536.719294] raw: 003800080024 5deadbeef100 5deadbeef122 c00f8cfe0fd1
  Mar 12 04:27:39 modoc kernel: [  536.719295] raw: 07611c3e 0001 c00fcfd1c000
  Mar 12 04:27:39 modoc kernel: [  536.719296] page dumped because: unmovable page
  Mar 12 04:27:39 modoc kernel: [  536.719296] page->mem_cgroup:c00fcfd1c000
  Mar 12 04:27:39 modoc kernel: [  536.735465] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.751848] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.768210] Injecting error (-12) to MEM_GOING_OFFLINE
  Mar 12 04:27:39 modoc kernel: [  536.784450] Injecting error (-12) to 
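
Kernel records in a syslog excerpt like the one above can end up wrapped
onto a second line when relayed by mail. The following is only a minimal
sketch, not part of the original report, of how such an excerpt could be
rejoined and tallied to see which message was logged last before the
reboot. The names join_wrapped and summarize are hypothetical helpers, and
the regular expression assumes the "Mon DD HH:MM:SS host kernel: [ts] msg"
layout shown here.

```python
import re
from collections import Counter

# Start of a kernel syslog record, optionally prefixed by a diff "+ " marker.
# Assumed layout: "Mar 12 04:27:39 modoc kernel: [  536.668305] message"
RECORD = re.compile(
    r"^\s*(?:\+ )?\w{3} +\d+ [\d:]{8} \S+ kernel: \[\s*(\d+\.\d+)\] ?(.*)$"
)


def join_wrapped(lines):
    """Return (timestamp, message) pairs, re-joining wrapped continuations."""
    records = []
    for line in lines:
        match = RECORD.match(line)
        if match:
            records.append([float(match.group(1)), match.group(2).strip()])
        elif records and line.strip():
            # A non-record, non-blank line continues the previous record.
            joined = records[-1][1] + " " + line.strip()
            records[-1][1] = joined.strip()
    return [(ts, msg) for ts, msg in records]


def summarize(lines):
    """Count each distinct kernel message and report the last record seen."""
    records = join_wrapped(lines)
    counts = Counter(msg for _, msg in records)
    last = records[-1] if records else None
    return counts, last
```

Feeding it the saved syslog (for example, open("syslog").readlines(), where
the file name is an assumption) gives a per-message count and the final
record before the log stops.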

[Kernel-packages] [Bug 1867155] Re: P8 node modoc will reboot automatically when running the sru_misc test suite

2020-03-12 Thread Po-Hsu Lin
** Description changed:

  Tested with 5 attempts, 4 hangs around the following test in ubuntu_kernel_selftests net sub-category:
   # selftests: net: reuseport_bpf_cpu
  
  First attempt:
  23:21:32 DEBUG| [stdout] ok 2 selftests: net: reuseport_bpf_cpu
  23:21:32 DEBUG| [stdout] # selftests: net: reuseport_bpf_numa
  23:21:32 DEBUG| [stdout] #  IPv4 UDP 
  (hang here)
  
  Second attempt:
  10:17:35 DEBUG| [stdout] ok 1 selftests: net: reuseport_bpf
  10:17:35 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu
  10:17:35 DEBUG| [stdout] #  IPv4 UDP 
  10:17:35 DEBUG| [stdout] # send cpu 0, receive socket 0
  (line skipped)
  10:17:35 DEBUG| [stdout] # send cpu 159, receive socket 159
  10:17:35 DEBUG| [stdout] #  IPv6 TCP 
  (hang here)
  
  Third attempt failed because of test timeout:
  12:46:16 DEBUG| [stdout] # [FAIL]
  12:46:16 DEBUG| [stdout] # 
  12:46:16 DEBUG| [stdout] # running psock_tpacket test
  12:46:16 DEBUG| [stdout] # 
  13:14:13 INFO | Timer expired (1800 sec.), nuking pid 161853
  
  Fourth attempt:
  07:41:51 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu
  07:41:51 DEBUG| [stdout] #  IPv4 UDP 
  07:41:51 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  07:41:51 DEBUG| [stdout] # send cpu 159, receive socket 159
  07:41:51 DEBUG| [stdout] #  IPv6 UDP 
  07:41:51 DEBUG| [stdout] # send cpu 0, receive socket 0
  07:41:51 DEBUG| [stdout] # send cpu 1, receive socket 1
  (lines skipped)
  07:41:51 DEBUG| [stdout] # send cpu 157, receive socket 157
  07:41:51 DEBUG| [stdout] # send cpu 159, receive socket 159
  07:41:51 DEBUG| [stdout] #  IPv4 TCP 
  (test hang here)
  
  Fifth attempt:
  04:29:17 DEBUG| [stdout] ok 1 selftests: net: reuseport_bpf
  04:29:17 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu
  04:29:17 DEBUG| [stdout] #  IPv4 UDP 
  04:29:17 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  04:29:17 DEBUG| [stdout] # send cpu 159, receive socket 159
  04:29:17 DEBUG| [stdout] #  IPv6 UDP 
  04:29:17 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  04:29:17 DEBUG| [stdout] # send cpu 159, receive socket 159
  04:29:17 DEBUG| [stdout] #  IPv4 TCP 
  04:29:17 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  04:29:17 DEBUG| [stdout] # send cpu 15, receive socket 15
  (test hang here)
  
  I tried to run tests in this sru-misc suite in the following order:
- 'hwclock',
- 'ubuntu_bpf',
- 'ubuntu_bpf_jit',
- 'ubuntu_kernel_selftests',
- 'ubuntu_lxc',
- 'ubuntu_seccomp',
- 'ubuntu_unionmount_ovlfs',
- 'ubuntu_cts_kernel',
- 'ubuntu_kvm_unit_tests',
+ 'hwclock',
+ 'ubuntu_bpf',
+ 'ubuntu_bpf_jit',
+ 'ubuntu_kernel_selftests',
+ 'ubuntu_lxc',
+ 'ubuntu_seccomp',
+ 'ubuntu_unionmount_ovlfs',
+ 'ubuntu_cts_kernel',
+ 'ubuntu_kvm_unit_tests',
  One by one on this node, but I can't reproduce this issue.
  
  I tried to watch dmesg when this happens, but there is no information
  there, the system will be reboot automatically silently.
+ 
+ This is what you can see from syslog after reboot:
+ Mar 12 04:27:39 modoc kernel: [  536.668305] Injecting error (-12) to MEM_GOING_OFFLINE
+ Mar 12 04:27:39 modoc kernel: [  536.684547] Injecting error (-12) to MEM_GOING_OFFLINE
+ Mar 12 04:27:39 modoc kernel: [  536.700907] Injecting error (-12) to MEM_GOING_OFFLINE
+ Mar 12 04:27:39 modoc kernel: [  536.717246] Injecting error (-12) to MEM_GOING_OFFLINE
+ Mar 12 04:27:39 modoc kernel: [  536.719288] page:c00c00c4f000 refcount:1 mapcount:0 mapping:c00f8cfe0fd1 index:0x7611c3e
+ Mar 12 04:27:39 modoc kernel: [  536.719289] anon
+ Mar 12 04:27:39 modoc kernel: [  536.719291] flags: 0x3800080024(uptodate|active|swapbacked)
+ Mar 12 04:27:39 modoc kernel: [  536.719294] raw: 003800080024 5deadbeef100 5deadbeef122 c00f8cfe0fd1
+ Mar 12 04:27:39 modoc kernel: [  536.719295] raw: 07611c3e 0001 c00fcfd1c000
+ Mar 12 04:27:39 modoc kernel: [  536.719296] page dumped because: unmovable page
+ Mar 12 04:27:39 modoc kernel: [  536.719296] page->mem_cgroup:c00fcfd1c000
+ Mar 12 04:27:39 modoc kernel: [  536.735465] Injecting error (-12) to MEM_GOING_OFFLINE
+ Mar 12 04:27:39 modoc kernel: [  536.751848] Injecting error (-12) to MEM_GOING_OFFLINE
+ Mar 12 04:27:39 modoc kernel: [  536.768210] Injecting error (-12) to MEM_GOING_OFFLINE
+ Mar 12 04:27:39 modoc kernel: [  536.784450] Injecting error (-12) to MEM_GOING_OFFLINE
+ Mar 12 04:27:39 modoc kernel: [  536.800756] Injecting error (-12) to MEM_GOING_OFFLINE
+ Mar 12 04:27:39 modoc kernel: [  536.817006] Injecting error (-12) to MEM_GOING_OFFLINE
+ Mar 12 04:27:39 modoc kernel: [  536.833133] Injecting error (-12) to MEM_GOING_OFFLINE
+ Mar 

[Kernel-packages] [Bug 1867155] Re: P8 node modoc will reboot automatically when running the sru_misc test suite

2020-03-12 Thread Po-Hsu Lin
** Tags removed: sru-20200217
** Tags added: kqa-blocker

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1867155

Title:
  P8 node modoc will reboot automatically when running the sru_misc test
  suite

Status in ubuntu-kernel-tests:
  New
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Tested with 5 attempts; 4 hung around the following test in the
  ubuntu_kernel_selftests net sub-category:
   # selftests: net: reuseport_bpf_cpu

  First attempt:
  23:21:32 DEBUG| [stdout] ok 2 selftests: net: reuseport_bpf_cpu
  23:21:32 DEBUG| [stdout] # selftests: net: reuseport_bpf_numa
  23:21:32 DEBUG| [stdout] #  IPv4 UDP 
  (hang here)

  Second attempt:
  10:17:35 DEBUG| [stdout] ok 1 selftests: net: reuseport_bpf
  10:17:35 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu
  10:17:35 DEBUG| [stdout] #  IPv4 UDP 
  10:17:35 DEBUG| [stdout] # send cpu 0, receive socket 0
  (line skipped)
  10:17:35 DEBUG| [stdout] # send cpu 159, receive socket 159
  10:17:35 DEBUG| [stdout] #  IPv6 TCP 
  (hang here)

  Third attempt failed because of test timeout:
  12:46:16 DEBUG| [stdout] # [FAIL]
  12:46:16 DEBUG| [stdout] # 
  12:46:16 DEBUG| [stdout] # running psock_tpacket test
  12:46:16 DEBUG| [stdout] # 
  13:14:13 INFO | Timer expired (1800 sec.), nuking pid 161853

  Fourth attempt:
  07:41:51 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu
  07:41:51 DEBUG| [stdout] #  IPv4 UDP 
  07:41:51 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  07:41:51 DEBUG| [stdout] # send cpu 159, receive socket 159
  07:41:51 DEBUG| [stdout] #  IPv6 UDP 
  07:41:51 DEBUG| [stdout] # send cpu 0, receive socket 0
  07:41:51 DEBUG| [stdout] # send cpu 1, receive socket 1
  (lines skipped)
  07:41:51 DEBUG| [stdout] # send cpu 157, receive socket 157
  07:41:51 DEBUG| [stdout] # send cpu 159, receive socket 159
  07:41:51 DEBUG| [stdout] #  IPv4 TCP 
  (test hang here)

  Fifth attempt:
  04:29:17 DEBUG| [stdout] ok 1 selftests: net: reuseport_bpf
  04:29:17 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu
  04:29:17 DEBUG| [stdout] #  IPv4 UDP 
  04:29:17 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  04:29:17 DEBUG| [stdout] # send cpu 159, receive socket 159
  04:29:17 DEBUG| [stdout] #  IPv6 UDP 
  04:29:17 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  04:29:17 DEBUG| [stdout] # send cpu 159, receive socket 159
  04:29:17 DEBUG| [stdout] #  IPv4 TCP 
  04:29:17 DEBUG| [stdout] # send cpu 0, receive socket 0
  (lines skipped)
  04:29:17 DEBUG| [stdout] # send cpu 15, receive socket 15
  (test hang here)

  I tried to run tests in this sru-misc suite in the following order:
  'hwclock',
  'ubuntu_bpf',
  'ubuntu_bpf_jit',
  'ubuntu_kernel_selftests',
  'ubuntu_lxc',
  'ubuntu_seccomp',
  'ubuntu_unionmount_ovlfs',
  'ubuntu_cts_kernel',
  'ubuntu_kvm_unit_tests',
  one by one on this node, but I could not reproduce this issue that way.

  I tried to watch dmesg when this happens, but there is no information
  there; the system just reboots silently.

  Maybe we need to use IPMI to see if there is anything on the console.

  ProblemType: Bug
  DistroRelease: Ubuntu 19.10
  Package: linux-image-5.3.0-42-generic 5.3.0-42.34
  ProcVersionSignature: Ubuntu 5.3.0-42.34-generic 5.3.18
  Uname: Linux 5.3.0-42-generic ppc64le
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Mar 12 04:33 seq
   crw-rw 1 root audio 116, 33 Mar 12 04:33 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.11-0ubuntu8.5
  Architecture: ppc64el
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  Date: Thu Mar 12 09:42:24 2020
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
  Lsusb:
   Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
   Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
   Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  PciMultimedia:

  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=C.UTF-8
   SHELL=/bin/bash
  ProcFB:

  ProcKernelCmdLine: root=UUID=b2a867ce-7813-4785-8861-4e7de2ac39b4 ro console=hvc0
  ProcLoadAvg: 0.07 0.02 0.00 1/1461 86637
  ProcLocks:
   1: POSIX  ADVISORY  WRITE 3799 00:18:841 0 EOF
   2: POSIX  ADVISORY  WRITE 3526 00:18:743 0 EOF
   3: FLOCK  ADVISORY  WRITE 3720 00:18:837 0 EOF
  ProcSwaps:
   Filename Type Size Used Priority