[Bug 1840650] Re: System crashes under nfs heavy load

2019-08-27 dmn42
Installed 5.3.0-050300rc5-generic. So far it looks better than any other kernel
I've tried (see the attached uptime graph). I'll keep an eye on it.

** Attachment added: "storage_uptime.png"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840650/+attachment/5284921/+files/storage_uptime.png

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1840650

Title:
  System crashes under nfs heavy load

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840650/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1840650] Re: System crashes under nfs heavy load

2019-08-21 dmn42
Fresh log:

      KERNEL: /usr/lib/debug/boot/vmlinux-4.15.0-55-generic
    DUMPFILE: /var/crash_/201908201708/dump.201908201708  [PARTIAL DUMP]
        CPUS: 32
        DATE: Tue Aug 20 17:08:36 2019
      UPTIME: 03:32:12
LOAD AVERAGE: 5.43, 6.39, 7.70
       TASKS: 653
    NODENAME: storage-gce-be-1.project.domain.net
     RELEASE: 4.15.0-55-generic
     VERSION: #60-Ubuntu SMP Tue Jul 2 18:22:20 UTC 2019
     MACHINE: x86_64  (2300 Mhz)
      MEMORY: 120 GB
       PANIC: "BUG: unable to handle kernel paging request at 976688e89010"
         PID: 112
     COMMAND: "ksoftirqd/17"
        TASK: 97843fccad80  [THREAD_INFO: 97843fccad80]
         CPU: 17
       STATE: TASK_RUNNING (PANIC)

[50031.661288] WARNING: CPU: 17 PID: 10720 at /build/linux-aAn8fZ/linux-4.15.0/lib/radix-tree.c:783 delete_node+0x87/0x1f0
[50031.661291] Modules linked in: binfmt_misc tcp_diag inet_diag 
ip6table_filter ip6_tables iptable_filter sch_fq_codel ib_iser rdma_cm iw_cm 
ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 
nls_iso8859_1 sb_edac intel_rapl_perf input_leds serio_raw mac_hid pvpanic nfsd 
netconsole auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 
btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 
crypto_simd glue_helper cryptd psmouse virtio_net virtio_scsi i2c_piix4
[50031.661341] CPU: 17 PID: 10720 Comm: kworker/u64:4 Not tainted 4.15.0-55-generic #60-Ubuntu
[50031.661342] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[50031.661359] Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
[50031.661362] RIP: 0010:delete_node+0x87/0x1f0
[50031.661363] RSP: 0018:ae6808dcfd78 EFLAGS: 00010297
[50031.661364] RAX: 9766958a6498 RBX: 976689cb36c0 RCX: 
[50031.661365] RDX:  RSI: 976688e88ff8 RDI: 976688e89010
[50031.661366] RBP: ae6808dcfda0 R08:  R09: 0001
[50031.661367] R10: 976688e89020 R11: 0002 R12: 9783f3c37840
[50031.661368] R13:  R14: b7b83630 R15: 
[50031.661370] FS:  () GS:97844724() knlGS:
[50031.661371] CS:  0010 DS:  ES:  CR0: 80050033
[50031.661371] CR2: 7fda62d297f0 CR3: 001ba7a0a001 CR4: 001606e0
[50031.661377] DR0:  DR1:  DR2: 
[50031.661378] DR3:  DR6: fffe0ff0 DR7: 0400
[50031.661378] Call Trace:
[50031.661384]  __radix_tree_delete+0x7f/0xa0
[50031.661387]  radix_tree_delete_item+0x6a/0xc0
[50031.661393]  nfs4_put_stid+0x3d/0x90 [nfsd]
[50031.661398]  nfsd4_cb_recall_release+0x15/0x20 [nfsd]
[50031.661403]  nfsd4_run_cb_work+0xd4/0xf0 [nfsd]
[50031.661408]  process_one_work+0x1de/0x410
[50031.661410]  worker_thread+0x32/0x410
[50031.661413]  kthread+0x121/0x140
[50031.661414]  ? process_one_work+0x410/0x410
[50031.661416]  ? kthread_create_worker_on_cpu+0x70/0x70
[50031.661421]  ret_from_fork+0x35/0x40
[50031.661422] Code: c2 41 8b 04 24 a9 00 00 00 02 75 09 25 ff ff ff 03 41 89 
04 24 49 c7 44 24 08 00 00 00 00 48 8b 46 18 48 39 f8 0f 84 2d 01 00 00 <0f> 0b 
4c 89 f6 e8 2f f3 77 ff 48 85 db 75 ab 41 bf 01 00 00 00 
[50031.661446] ---[ end trace 157b93e1b360abc7 ]---
[50031.715701] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[50031.723796] BUG: unable to handle kernel paging request at 976688e89010
[50031.732281] IP: 0x976688e89010
[50031.737185] PGD 1ba813f067 P4D 1ba813f067 PUD 1ba8140067 PMD 88e000e3
[50031.744546] Oops: 0011 [#1] SMP PTI
[50031.748178] Modules linked in: binfmt_misc tcp_diag inet_diag 
ip6table_filter ip6_tables iptable_filter sch_fq_codel ib_iser rdma_cm iw_cm 
ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 
nls_iso8859_1 sb_edac intel_rapl_perf input_leds serio_raw mac_hid pvpanic nfsd 
netconsole auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 
btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 
crypto_simd glue_helper cryptd psmouse virtio_net virtio_scsi i2c_piix4
[50031.807432] CPU: 17 PID: 112 Comm: ksoftirqd/17 Tainted: GW 4.15.0-55-generic #60-Ubuntu
[50031.818702] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[50031.828051] RIP: 0010:0x976688e89010
[50031.833470] RSP: 0018:ae68066b7df0 EFLAGS: 00010296
[50031.838829] RAX: 976688e89010 RBX: 978447263640 RCX: 0001001c001a
[50031.847498] RDX: 976688e89010 RSI: db4bfff6fc00 RDI: 976688e89010

[Bug 1840650] Re: System crashes under nfs heavy load

2019-08-19 dmn42
Due to the nature of the data processed on this box, I'm unable to run
apport-collect.

** Changed in: linux (Ubuntu)
   Status: Incomplete => Confirmed


[Bug 1840650] [NEW] System crashes under nfs heavy load

2019-08-19 dmn42
Public bug reported:

Hi all.

We have an NFSv4 server hosted in GCP under relatively heavy load (reads
over NFS, writes over ssh/rsync). After upgrading from 14.04 to 18.04,
this server started crashing roughly once every day or two. We have a
similar box with the same configuration but without NFS load (a cold
spare), and it is not affected by this problem.

I've tried the kernels listed below; none of them helped.
linux-image-4.15.0-1026-gcp
linux-image-4.15.0-55-generic
linux-image-4.18.0-1008-gcp
linux-image-4.18.0-1015-gcp
linux-image-4.19.36
linux-image-5.0.0-1011-gcp
linux-image-5.2.5

NFS packages:
ii  libnfsidmap2:amd64  0.25-5.1              amd64  NFS idmapping library
ii  nfs-common          1:1.3.4-2.1ubuntu5.2  amd64  NFS support files common to client and server
ii  nfs-kernel-server   1:1.3.4-2.1ubuntu5.2  amd64  support for NFS kernel server

Crash info:

      KERNEL: /usr/lib/debug/boot/vmlinux-4.15.0-55-generic
    DUMPFILE: /var/crash_/201908190304/dump.201908190304  [PARTIAL DUMP]
        CPUS: 32
        DATE: Mon Aug 19 03:04:28 2019
      UPTIME: 06:05:53
LOAD AVERAGE: 7.56, 6.82, 6.77
       TASKS: 627
    NODENAME: storage-gce-be-1.project.domain.net
     RELEASE: 4.15.0-55-generic
     VERSION: #60-Ubuntu SMP Tue Jul 2 18:22:20 UTC 2019
     MACHINE: x86_64  (2300 Mhz)
      MEMORY: 120 GB
       PANIC: "BUG: unable to handle kernel paging request at 9e9119d39b78"
         PID: 112
     COMMAND: "ksoftirqd/17"
        TASK: 9e96bfcc8000  [THREAD_INFO: 9e96bfcc8000]
         CPU: 17
       STATE: TASK_RUNNING (PANIC)

Part of the crash log:

[86915.179808] WARNING: CPU: 17 PID: 3917 at /build/linux-aAn8fZ/linux-4.15.0/lib/radix-tree.c:783 delete_node+0x87/0x1f0
[86915.179810] Modules linked in: tcp_diag inet_diag binfmt_misc 
ip6table_filter ip6_tables iptable_filter sch_fq_codel ib_iser rdma_cm iw_cm 
ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 
nls_iso8859_1 sb_edac intel_rapl_perf input_leds mac_hid serio_raw pvpanic nfsd 
auth_rpcgss nfs_acl lockd grace sunrpc netconsole ip_tables x_tables autofs4 
btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 
crypto_simd glue_helper cryptd psmouse virtio_net virtio_scsi i2c_piix4
[86915.179859] CPU: 17 PID: 3917 Comm: kworker/u64:3 Not tainted 4.15.0-55-generic #60-Ubuntu
[86915.179860] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[86915.179877] Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
[86915.179880] RIP: 0010:delete_node+0x87/0x1f0
[86915.179881] RSP: 0018:b97087e0fd78 EFLAGS: 00010206
[86915.179882] RAX: 9e81839df6d8 RBX: 9e81839df6c0 RCX: 
[86915.179883] RDX:  RSI: 9e9119d39b60 RDI: 9e9119d39b78
[86915.179884] RBP: b97087e0fda0 R08:  R09: 0034
[86915.179884] R10: 9e9119d39b88 R11: 0035 R12: 9e96b6304840
[86915.179885] R13:  R14: 84583630 R15: 
[86915.179887] FS:  () GS:9e96c724() knlGS:
[86915.179887] CS:  0010 DS:  ES:  CR0: 80050033
[86915.179888] CR2: 7fdbe3df9ba8 CR3: 0002fba0a006 CR4: 001606e0
[86915.179893] DR0:  DR1:  DR2: 
[86915.179894] DR3:  DR6: fffe0ff0 DR7: 0400
[86915.179895] Call Trace:
[86915.179901]  __radix_tree_delete+0x7f/0xa0
[86915.179903]  radix_tree_delete_item+0x6a/0xc0
[86915.179910]  nfs4_put_stid+0x3d/0x90 [nfsd]
[86915.179915]  nfsd4_cb_recall_release+0x15/0x20 [nfsd]
[86915.179920]  nfsd4_run_cb_work+0xd4/0xf0 [nfsd]
[86915.179924]  process_one_work+0x1de/0x410
[86915.179926]  worker_thread+0x32/0x410
[86915.179928]  kthread+0x121/0x140
[86915.179930]  ? process_one_work+0x410/0x410
[86915.179932]  ? kthread_create_worker_on_cpu+0x70/0x70
[86915.179935]  ret_from_fork+0x35/0x40
[86915.179936] Code: c2 41 8b 04 24 a9 00 00 00 02 75 09 25 ff ff ff 03 41 89 
04 24 49 c7 44 24 08 00 00 00 00 48 8b 46 18 48 39 f8 0f 84 2d 01 00 00 <0f> 0b 
4c 89 f6 e8 2f f3 77 ff 48 85 db 75 ab 41 bf 01 00 00 00 
[86915.179961] ---[ end trace d94747d62f40d46c ]---
[86915.220219] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[86915.229005] BUG: unable to handle kernel paging request at 9e9119d39b78
[86915.236105] IP: 0x9e9119d39b78
[86915.241016] PGD 2fc143067 P4D 2fc143067 PUD 756c2f063 PMD 801819c000e3 
[86915.249604] Oops: 0011 [#1] SMP PTI
[86915.253220] Modules linked in: tcp_diag inet_diag binfmt_misc 
ip6table_filter ip6_tables iptable_filter sch_fq_codel