[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2024-01-22 Thread Brian Corriveau
I was able to reproduce the hang several times using the script
mdhang.sh (modified for my RAID device location) on 22.04 with kernel
5.15, usually within 10-15 minutes.

Using the 22.04 HWE kernel 6.5, I was not able to reproduce the problem
in a 25+ hour run of the same script. It seems something has been fixed
in the newer HWE kernel.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-hwe-5.15 in Ubuntu.
https://bugs.launchpad.net/bugs/1942935

Title:
  kernel io hangs during mdcheck/resync

Status in linux package in Ubuntu:
  Confirmed
Status in linux-signed-hwe-5.11 package in Ubuntu:
  Confirmed
Status in linux-signed-hwe-5.15 package in Ubuntu:
  Confirmed
Status in linux-signed-hwe-5.4 package in Ubuntu:
  Confirmed

Bug description:
  The hang seems to always occur during an mdcheck/resync. If I am logged
  in via SSH, the system is still somewhat responsive and basic utilities
  like dmesg work, but any write I/O appears to hang the terminal and
  nothing is written to syslog (presumably because it is blocked).

  Below is the output of dmesg and cat /proc/mdstat. The data check was
  interrupted while /proc/mdstat still shows progress, and there is a
  whole slew of hung tasks, including md1_resync itself.

  [756484.534293] md: data-check of RAID array md0
  [756484.628039] md: delaying data-check of md1 until md0 has finished (they share one or more physical units)
  [756493.808773] md: md0: data-check done.
  [756493.829760] md: data-check of RAID array md1
  [778112.446410] md: md1: data-check interrupted.
  [810654.608102] md: data-check of RAID array md1
  [832291.201064] md: md1: data-check interrupted.
  [899745.389485] md: data-check of RAID array md1
  [921395.835305] md: md1: data-check interrupted.
  [921588.558834] INFO: task systemd-journal:376 blocked for more than 120 seconds.
  [921588.558846]   Not tainted 5.11.0-27-generic #29~20.04.1-Ubuntu
  [921588.558850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [921588.558854] task:systemd-journal state:D stack:0 pid:  376 ppid: 1 flags:0x0220
  [921588.558859] Call Trace:
  [921588.558864]  __schedule+0x44c/0x8a0
  [921588.558872]  schedule+0x4f/0xc0
  [921588.558876]  md_write_start+0x150/0x240
  [921588.558880]  ? wait_woken+0x80/0x80
  [921588.558886]  raid5_make_request+0x88/0x890 [raid456]
  [921588.558898]  ? wait_woken+0x80/0x80
  [921588.558901]  ? mempool_kmalloc+0x17/0x20
  [921588.558904]  md_handle_request+0x12d/0x1a0
  [921588.558907]  ? __part_start_io_acct+0x51/0xf0
  [921588.558912]  md_submit_bio+0xca/0x100
  [921588.558915]  submit_bio_noacct+0x112/0x4f0
  [921588.558918]  ? ext4_fc_reserve_space+0x110/0x230
  [921588.558922]  submit_bio+0x51/0x1a0
  [921588.558925]  ? _cond_resched+0x19/0x30
  [921588.558928]  ? kmem_cache_alloc+0x38e/0x440
  [921588.558932]  ? ext4_init_io_end+0x1f/0x50
  [921588.558936]  ext4_io_submit+0x4d/0x60
  [921588.558940]  ext4_writepages+0x2c6/0xcd0
  [921588.558944]  do_writepages+0x43/0xd0
  [921588.558948]  ? do_writepages+0x43/0xd0
  [921588.558951]  ? fault_dirty_shared_page+0xa5/0x110
  [921588.558955]  __filemap_fdatawrite_range+0xcc/0x110
  [921588.558960]  file_write_and_wait_range+0x74/0xc0
  [921588.558962]  ext4_sync_file+0xf5/0x350
  [921588.558967]  vfs_fsync_range+0x49/0x80
  [921588.558970]  do_fsync+0x3d/0x70
  [921588.558973]  __x64_sys_fsync+0x14/0x20
  [921588.558976]  do_syscall_64+0x38/0x90
  [921588.558980]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [921588.558984] RIP: 0033:0x7f4c97ee832b
  [921588.558987] RSP: 002b:7ffdceb29e50 EFLAGS: 0293 ORIG_RAX: 004a
  [921588.558991] RAX: ffda RBX: 55ced34b0fa0 RCX: 7f4c97ee832b
  [921588.558993] RDX: 7f4c97fc8000 RSI: 55ced3487b70 RDI: 0021
  [921588.558995] RBP: 0001 R08:  R09: 7ffdceb29fa8
  [921588.558996] R10: 7f4c97d2c848 R11: 0293 R12: 7ffdceb29fa8
  [921588.558998] R13: 7ffdceb29fa0 R14: 55ced34b0fa0 R15: 55ced34bcf90
  [921588.559014] INFO: task mysqld:1505 blocked for more than 120 seconds.
  [921588.559018]   Not tainted 5.11.0-27-generic #29~20.04.1-Ubuntu
  [921588.559022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [921588.559025] task:mysqld  state:D stack:0 pid: 1505 ppid: 1 flags:0x
  [921588.559030] Call Trace:
  [921588.559032]  __schedule+0x44c/0x8a0
  [921588.559036]  schedule+0x4f/0xc0
  [921588.559040]  md_write_start+0x150/0x240
  [921588.559044]  ? wait_woken+0x80/0x80
  [921588.559047]  raid5_make_request+0x88/0x890 [raid456]
  [921588.559056]  ? wait_woken+0x80/0x80
  [921588.559059]  ? mempool_kmalloc+0x17/0x20
  [921588.559062]  md_handle_request+0x12d/0x1a0
  [921588.559065]  ? __part_start_io_acct+0x51/0xf0
  [921588.559068]  md_submit_bio+0xca/0x100
  [921588.559071]  submit_bio_noacct+0x112/0x4f0

[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2024-01-18 Thread alex
Hi. I have the same problem on two RAID5 arrays assembled in a stripe.
The total volume of the array is 97 TB, with 16 GB free.
Do other users who hit this bug also have little free space?
I couldn’t reproduce this condition on a test bench under a synthetic load.
Ubuntu 22.04.3 LTS
kernel 5.15.0-72-generic


[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2024-01-11 Thread Lucas Magnien
Hello, I got the same issue with Ubuntu 22.04.3 LTS (GNU/Linux
5.15.0-91-generic x86_64). I also tried the 5.19 kernel and hit the same
problem.

Jan  7 02:28:26 cache4 systemd[1]: Starting MD array scrubbing...
Jan  7 02:28:26 cache4 root: mdcheck start checking /dev/md0
Jan  7 08:28:44 cache4 kernel: [2914434.326024] md: md0: data-check interrupted.
Jan  7 08:32:08 cache4 kernel: [2914638.397357] INFO: task jbd2/md0-8:1337 blocked for more than 120 seconds.
Jan  7 08:32:08 cache4 kernel: [2914638.397420]   Not tainted 5.15.0-91-generic #99-Ubuntu
Jan  7 08:32:08 cache4 kernel: [2914638.397457] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  7 08:32:08 cache4 kernel: [2914638.397505] task:jbd2/md0-8  state:D stack:0 pid: 1337 ppid: 2 flags:0x4000
Jan  7 08:32:08 cache4 kernel: [2914638.397512] Call Trace:
Jan  7 08:32:08 cache4 kernel: [2914638.397515]  
Jan  7 08:32:08 cache4 kernel: [2914638.397520]  __schedule+0x24e/0x590
Jan  7 08:32:08 cache4 kernel: [2914638.397530]  schedule+0x69/0x110
Jan  7 08:32:08 cache4 kernel: [2914638.397535]  md_write_start.part.0+0x174/0x220
Jan  7 08:32:08 cache4 kernel: [2914638.397540]  ? wait_woken+0x70/0x70
Jan  7 08:32:08 cache4 kernel: [2914638.397547]  md_write_start+0x14/0x30
Jan  7 08:32:08 cache4 kernel: [2914638.397553]  raid5_make_request+0x77/0x540 [raid456]
Jan  7 08:32:08 cache4 kernel: [2914638.397566]  ? jbd2_transaction_committed+0x1b/0x60
Jan  7 08:32:08 cache4 kernel: [2914638.397573]  ? ext4_set_iomap+0x5a/0x1d0
Jan  7 08:32:08 cache4 kernel: [2914638.397579]  ? wait_woken+0x70/0x70
Jan  7 08:32:08 cache4 kernel: [2914638.397584]  md_handle_request+0x12d/0x1b0
Jan  7 08:32:08 cache4 kernel: [2914638.397589]  ? submit_bio_checks+0x1a5/0x560
Jan  7 08:32:08 cache4 kernel: [2914638.397595]  md_submit_bio+0x76/0xc0
Jan  7 08:32:08 cache4 kernel: [2914638.397600]  __submit_bio+0x1a5/0x220
Jan  7 08:32:08 cache4 kernel: [2914638.397603]  ? mempool_alloc_slab+0x17/0x20
Jan  7 08:32:08 cache4 kernel: [2914638.397611]  __submit_bio_noacct+0x85/0x200
Jan  7 08:32:08 cache4 kernel: [2914638.397614]  ? kmem_cache_alloc+0x1ab/0x2f0
Jan  7 08:32:08 cache4 kernel: [2914638.397619]  submit_bio_noacct+0x4e/0x120
Jan  7 08:32:08 cache4 kernel: [2914638.397623]  submit_bio+0x4a/0x130
Jan  7 08:32:08 cache4 kernel: [2914638.397627]  submit_bh_wbc+0x18d/0x1c0
Jan  7 08:32:08 cache4 kernel: [2914638.397632]  submit_bh+0x13/0x20
Jan  7 08:32:08 cache4 kernel: [2914638.397635]  jbd2_journal_commit_transaction+0x861/0x17a0
Jan  7 08:32:08 cache4 kernel: [2914638.397640]  ? __update_idle_core+0x93/0x120
Jan  7 08:32:08 cache4 kernel: [2914638.397649]  kjournald2+0xa9/0x280
Jan  7 08:32:08 cache4 kernel: [2914638.397653]  ? wait_woken+0x70/0x70
Jan  7 08:32:08 cache4 kernel: [2914638.397657]  ? load_superblock.part.0+0xc0/0xc0
Jan  7 08:32:08 cache4 kernel: [2914638.397662]  kthread+0x12a/0x150
Jan  7 08:32:08 cache4 kernel: [2914638.397667]  ? set_kthread_struct+0x50/0x50
Jan  7 08:32:08 cache4 kernel: [2914638.397672]  ret_from_fork+0x22/0x30
Jan  7 08:32:08 cache4 kernel: [2914638.397680]  


# cat /sys/block/md0/md/array_state
write-pending

This is happening on all our servers with NVMe devices.


[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2023-12-19 Thread Jan Kratochvil
Linux version 6.4.12-200.fc38.x86_64
(mockbuild@30894952d3244f1ab967aeda9ed417f6) (gcc (GCC) 13.2.1 20230728
(Red Hat 13.2.1-1), GNU ld version 2.39-9.fc38) #1 SMP PREEMPT_DYNAMIC
Wed Aug 23 17:46:49 UTC 2023

    230 ?        I<      0:00  \_ [md]
   1377 ?        S       6:15  \_ [md0_raid1]
   1565 ?        D    4955:37  \_ [md4_raid6]
2772538 ?        DN    111:11  \_ [md4_resync]

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sde2[0]
  13671170048 blocks super 1.2 [1/1] [U]
  bitmap: 1/102 pages [4KB], 65536KB chunk

md4 : active raid6 sdd1[2] sdb1[5] sde1[4] sdc1[1]
  11720779776 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] []
  [=>...]  check = 89.0% (5217054876/5860389888) finish=852549.6min speed=12K/sec
  bitmap: 9/44 pages [36KB], 65536KB chunk

unused devices: 
# cat /proc/2772538/stack # md4_resync
[<0>] raid5_get_active_stripe+0x271/0x540 [raid456]
[<0>] raid5_sync_request+0x3ad/0x3d0 [raid456]
[<0>] md_do_sync+0x7be/0x11c0
[<0>] md_thread+0xae/0x190
[<0>] kthread+0xe8/0x120
[<0>] ret_from_fork+0x2c/0x50
# cat /proc/1565/stack # md4_raid6
[<0>] raid5d+0x524/0x750 [raid456]
[<0>] md_thread+0xae/0x190
[<0>] kthread+0xe8/0x120
[<0>] ret_from_fork+0x2c/0x50
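
The check line in the /proc/mdstat output above is where the stall shows
up: 89.0% done, but speed=12K/sec and finish=852549.6min. As a rough
sketch (not something posted in this thread), the progress line could be
parsed in Python to flag a check that has effectively stopped. The names
parse_sync_progress and looks_stalled, and the 100K/sec threshold, are
illustrative assumptions rather than anything md defines.

```python
import re

# Matches the progress line shown in /proc/mdstat while a
# check/resync/recovery/reshape is running, for example:
#   [=>...]  check = 89.0% (5217054876/5860389888) finish=852549.6min speed=12K/sec
# \s* between fields also tolerates a line that was wrapped by a mailer.
PROGRESS_RE = re.compile(
    r"(?P<action>check|resync|recovery|reshape)\s*=\s*(?P<pct>[\d.]+)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\)\s*"
    r"finish=(?P<finish>[\d.]+)min\s*speed=(?P<speed>\d+)K/sec"
)


def parse_sync_progress(mdstat_text):
    """Return one dict per progress line found in mdstat_text."""
    results = []
    for m in PROGRESS_RE.finditer(mdstat_text):
        results.append({
            "action": m.group("action"),
            "percent": float(m.group("pct")),
            "done_blocks": int(m.group("done")),
            "total_blocks": int(m.group("total")),
            "finish_min": float(m.group("finish")),
            "speed_kps": int(m.group("speed")),
        })
    return results


def looks_stalled(progress, min_speed_kps=100):
    """Heuristic only (assumed threshold): a very low speed suggests a stall."""
    return progress["speed_kps"] < min_speed_kps
```

Given the progress line from the output above, this yields action
"check", 89.0 percent and a speed of 12K/sec, which the heuristic flags
as stalled. On a live system the input would be the contents of
/proc/mdstat.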

Worked around by:
# cat /sys/block/md4/md/array_state
write-pending
# echo active > /sys/block/md4/md/array_state 
# cat /sys/block/md4/md/array_state
active
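
The workaround above is a single sysfs write. A minimal Python sketch of
the same step, applied to every md array that reports write-pending,
might look like the following. It only automates the command shown
above; this thread does not establish that it is safe or sufficient on
other systems. The function names and the dry_run default are
assumptions for illustration, and writing to array_state requires root.

```python
import glob
import os


def read_array_state(md_dir):
    """Read the array_state attribute from an md sysfs directory."""
    with open(os.path.join(md_dir, "array_state")) as f:
        return f.read().strip()


def kick_write_pending(sysfs_block="/sys/block", dry_run=True):
    """Write 'active' to every md array found in the write-pending state.

    Returns the names of the arrays that were changed (or, with
    dry_run=True, that would have been changed).
    """
    changed = []
    for md_dir in sorted(glob.glob(os.path.join(sysfs_block, "md*", "md"))):
        name = os.path.basename(os.path.dirname(md_dir))
        if read_array_state(md_dir) != "write-pending":
            continue
        if not dry_run:
            # Same effect as: echo active > /sys/block/mdX/md/array_state
            with open(os.path.join(md_dir, "array_state"), "w") as f:
                f.write("active\n")
        changed.append(name)
    return changed
```

Calling kick_write_pending() with the defaults only reports which arrays
are in write-pending; passing dry_run=False performs the write.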


[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2023-10-02 Thread Patrick Hampson
I noticed this issue on Ubuntu 20.04 with an md RAID device. The system
exhibited the same behavior other users have noted: high CPU usage and a
terminal that locks up until the system is rebooted.

[14715252.569157] INFO: task md1_raid4:1763945 blocked for more than 120 seconds.
[14715252.570228]   Not tainted 5.4.0-146-generic #163-Ubuntu
[14715252.571277] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[14715252.572347] md1_raid4       D    0 1763945      2 0x80004000
[14715252.572357] Call Trace:
[14715252.572360]  __schedule+0x2e3/0x740
[14715252.572363]  schedule+0x42/0xb0
[14715252.572369]  raid5d+0x3e6/0x5f0 [raid456]
[14715252.572376]  ? schedule_timeout+0x10e/0x160
[14715252.572381]  ? __wake_up_pollfree+0x40/0x40
[14715252.572384]  md_thread+0x97/0x160
[14715252.572392]  ? __wake_up_pollfree+0x40/0x40
[14715252.572394]  kthread+0x104/0x140
[14715252.572399]  ? md_start_sync+0x60/0x60
[14715252.572403]  ? kthread_park+0x90/0x90
[14715252.572405]  ret_from_fork+0x35/0x40
[14715252.572430] INFO: task kworker/u64:1:3189415 blocked for more than 120 seconds.
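
Logs like the one above can list many blocked tasks. As a hedged sketch
(the function names are made up for illustration), the hung-task reports
can be pulled out of dmesg or syslog text in Python to see which tasks
are stuck and how often each was reported. It relies only on the
"INFO: task NAME:PID blocked for more than N seconds" wording visible in
the logs in this thread.

```python
import re
from collections import Counter

# "INFO: task NAME:PID blocked for more than N seconds."
# A task name may itself contain colons (e.g. kworker/u64:1), so the
# greedy name group leaves the PID as the digits after the last colon.
# \s+ before "seconds" tolerates a line wrapped by a mailer.
HUNG_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+)\s+seconds"
)


def hung_tasks(log_text):
    """Return (name, pid, seconds) tuples, one per hung-task report."""
    return [
        (m.group("name"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_RE.finditer(log_text)
    ]


def summarize(log_text):
    """Count hung-task reports per task name."""
    return Counter(name for name, _pid, _secs in hung_tasks(log_text))
```

For the two reports above this gives md1_raid4 (PID 1763945) and
kworker/u64:1 (PID 3189415), each blocked for more than 120 seconds.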


[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2023-10-02 Thread Silas Horton
Our storage system ran into this bug on Ubuntu 22.04 with kernel
5.15.0-83-generic #92-Ubuntu. mdcheck started its scheduled run on the
1st of the month and hung about 6 hours later.


Oct  1 06:52:13 server1 systemd[1]: Starting MD array scrubbing...
Oct  1 06:52:13 server1 root: mdcheck start checking /dev/md0
Oct  1 06:52:13 server1 kernel: [2129098.393495] md: data-check of RAID array md0


Oct  1 12:57:49 server1 kernel: [2151034.623372] INFO: task dmcrypt_write/2:1783 blocked for more than 241 seconds.
Oct  1 12:57:49 server1 kernel: [2151034.623446]   Tainted: G S    5.15.0-83-generic #92-Ubuntu
Oct  1 12:57:49 server1 kernel: [2151034.623498] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  1 12:57:49 server1 kernel: [2151034.623559] task:dmcrypt_write/2 state:D stack:0 pid: 1783 ppid: 2 flags:0x4000
Oct  1 12:57:49 server1 kernel: [2151034.623566] Call Trace:
Oct  1 12:57:49 server1 kernel: [2151034.623570]  
Oct  1 12:57:49 server1 kernel: [2151034.623574]  __schedule+0x24e/0x590
Oct  1 12:57:49 server1 kernel: [2151034.623585]  ? __schedule+0x256/0x590
Oct  1 12:57:49 server1 kernel: [2151034.623590]  schedule+0x69/0x110
Oct  1 12:57:49 server1 kernel: [2151034.623596]  md_write_start.part.0+0x174/0x220
Oct  1 12:57:49 server1 kernel: [2151034.623601]  ? wait_woken+0x70/0x70
Oct  1 12:57:49 server1 kernel: [2151034.623610]  md_write_start+0x14/0x30
Oct  1 12:57:49 server1 kernel: [2151034.623615]  raid5_make_request+0x77/0x540 [raid456]
Oct  1 12:57:49 server1 kernel: [2151034.623633]  ? cgroup_rstat_updated+0x11c/0x1e0
Oct  1 12:57:49 server1 kernel: [2151034.623642]  ? wait_woken+0x70/0x70
Oct  1 12:57:49 server1 kernel: [2151034.623648]  md_handle_request+0x12d/0x1b0
Oct  1 12:57:49 server1 kernel: [2151034.623657]  ? submit_bio_checks+0x1a5/0x560
Oct  1 12:57:49 server1 kernel: [2151034.623664]  md_submit_bio+0x76/0xc0
Oct  1 12:57:49 server1 kernel: [2151034.623670]  __submit_bio+0x1a5/0x220
Oct  1 12:57:49 server1 kernel: [2151034.623675]  ? psi_task_switch+0xc6/0x220
Oct  1 12:57:49 server1 kernel: [2151034.623682]  __submit_bio_noacct+0x85/0x200
Oct  1 12:57:49 server1 kernel: [2151034.623687]  submit_bio_noacct+0x4e/0x120
Oct  1 12:57:49 server1 kernel: [2151034.623691]  ? schedule+0x69/0x110
Oct  1 12:57:49 server1 kernel: [2151034.623698]  dmcrypt_write+0x104/0x130 [dm_crypt]
Oct  1 12:57:49 server1 kernel: [2151034.623708]  ? crypt_ctr+0x600/0x600 [dm_crypt]
Oct  1 12:57:49 server1 kernel: [2151034.623715]  kthread+0x12a/0x150
Oct  1 12:57:49 server1 kernel: [2151034.623723]  ? set_kthread_struct+0x50/0x50
Oct  1 12:57:49 server1 kernel: [2151034.623730]  ret_from_fork+0x22/0x30
Oct  1 12:57:49 server1 kernel: [2151034.623739]  
Oct  1 12:57:49 server1 kernel: [2151034.623778] INFO: task mdcheck:2323903 blocked for more than 241 seconds.
Oct  1 12:57:49 server1 kernel: [2151034.623833]   Tainted: G S    5.15.0-83-generic #92-Ubuntu
Oct  1 12:57:49 server1 kernel: [2151034.625482] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  1 12:57:49 server1 kernel: [2151034.627098] task:mdcheck state:D stack:0 pid:2323903 ppid: 1 flags:0x0002
Oct  1 12:57:49 server1 kernel: [2151034.627104] Call Trace:
Oct  1 12:57:49 server1 kernel: [2151034.627106]  
Oct  1 12:57:49 server1 kernel: [2151034.627109]  __schedule+0x24e/0x590
Oct  1 12:57:49 server1 kernel: [2151034.627114]  ? select_idle_sibling+0x2b/0xa60
Oct  1 12:57:49 server1 kernel: [2151034.627124]  schedule+0x69/0x110
Oct  1 12:57:49 server1 kernel: [2151034.627129]  schedule_timeout+0x103/0x140
Oct  1 12:57:49 server1 kernel: [2151034.627135]  ? ttwu_queue_wakelist+0x131/0x1c0
Oct  1 12:57:49 server1 kernel: [2151034.627142]  __wait_for_common+0xae/0x150
Oct  1 12:57:49 server1 kernel: [2151034.627148]  ? usleep_range_state+0x90/0x90
Oct  1 12:57:49 server1 kernel: [2151034.627155]  wait_for_completion+0x24/0x30
Oct  1 12:57:49 server1 kernel: [2151034.627160]  kthread_stop+0x6d/0x170
Oct  1 12:57:49 server1 kernel: [2151034.627168]  md_unregister_thread+0x44/0x90
Oct  1 12:57:49 server1 kernel: [2151034.627172]  md_reap_sync_thread+0x24/0x230
Oct  1 12:57:49 server1 kernel: [2151034.627177]  action_store+0x16f/0x300
Oct  1 12:57:49 server1 kernel: [2151034.627182]  md_attr_store+0x95/0xf0
Oct  1 12:57:49 server1 kernel: [2151034.627187]  sysfs_kf_write+0x3e/0x50
Oct  1 12:57:49 server1 kernel: [2151034.627194]  kernfs_fop_write_iter+0x13b/0x1c0
Oct  1 12:57:49 server1 kernel: [2151034.627199]  new_sync_write+0x114/0x1a0
Oct  1 12:57:49 server1 kernel: [2151034.627207]  vfs_write+0x1d5/0x270
Oct  1 12:57:49 server1 kernel: [2151034.627212]  ksys_write+0x67/0xf0
Oct  1 12:57:49 server1 kernel: [2151034.627219]  __x64_sys_write+0x19/0x20
Oct  1 12:57:49 server1 kernel: [2151034.627225]  do_syscall_64+0x5c/0xc0
Oct  1 12:57:49 server1 kernel: [2151034.627232]  ? do_syscall_64+0x69/0xc0
Oct  1 12:57:49 server1 kernel: 

[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2023-07-13 Thread brjhaverkamp
I can't add much in terms of data, but this is a +1. My symptoms were
virtually identical to Chad Wagner's.
This happened on July 1st, 2023 on my machine, the day the check started.
It got stuck at 98%.
The only cure I could find was to reboot the system.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-hwe-5.15 in Ubuntu.
https://bugs.launchpad.net/bugs/1942935

Title:
  kernel io hangs during mdcheck/resync

Status in linux package in Ubuntu:
  Confirmed
Status in linux-signed-hwe-5.11 package in Ubuntu:
  Confirmed
Status in linux-signed-hwe-5.15 package in Ubuntu:
  New
Status in linux-signed-hwe-5.4 package in Ubuntu:
  Confirmed


[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2023-07-04 Thread Filip Hruska
** Also affects: linux-signed-hwe-5.15 (Ubuntu)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-hwe-5.15 in Ubuntu.
https://bugs.launchpad.net/bugs/1942935

Title:
  kernel io hangs during mdcheck/resync

Status in linux package in Ubuntu:
  Confirmed
Status in linux-signed-hwe-5.11 package in Ubuntu:
  Confirmed
Status in linux-signed-hwe-5.15 package in Ubuntu:
  New
Status in linux-signed-hwe-5.4 package in Ubuntu:
  Confirmed


[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2023-07-04 Thread Filip Hruska
Seeing this bug on Ubuntu 20.04 and Ubuntu 22.04 as well, with both
normal and HWE kernels.

To add some more information, this bug seems to randomly appear during
the initial RAID 6 creation process as well, where the array is mounted
but completely empty and not accessed - so it's likely to originate
within the mdadm resync process itself, unrelated to other system I/O
operations. Around 1 in 15 arrays will "freeze" during the initial
resync process in my experience, so it is not that uncommon
unfortunately.

The symptoms are always the same - at some point during resync, the
speeds will radically drop to single MB/s levels and continue degrading
over time until "echo active > /sys/block/mdX/md/array_state" is issued.
A minute or two after running that, the speeds ramp back up to normal.
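
A minimal sketch of how the state described above could be inspected
before and after that write, assuming the standard md sysfs attributes
(array_state, sync_action, sync_speed, sync_completed). The function
names and the SYSFS_ROOT variable are hypothetical, introduced only so
the functions can be pointed at a test directory; none of this comes
from the bug report itself.

```shell
#!/bin/sh
# Sketch only: report md resync state from sysfs.
# SYSFS_ROOT is a hypothetical override; on a real system it is /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

md_attr() {
    # $1 = array name (e.g. md1), $2 = attribute name under md/
    cat "$SYSFS_ROOT/block/$1/md/$2" 2>/dev/null || echo "unknown"
}

md_report() {
    # Print one summary line for the array named in $1.
    printf '%s: state=%s action=%s speed=%s completed=%s\n' \
        "$1" \
        "$(md_attr "$1" array_state)" \
        "$(md_attr "$1" sync_action)" \
        "$(md_attr "$1" sync_speed)" \
        "$(md_attr "$1" sync_completed)"
}

md_kick() {
    # The write quoted above; requires root on a real system.
    echo active > "$SYSFS_ROOT/block/$1/md/array_state"
}
```

For example, "md_report md1; md_kick md1; sleep 120; md_report md1"
would show whether sync_speed recovers after the write.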

This bug seems unrelated to hardware configuration, as I've seen it
happen across multiple systems with different CPU vendors, HBA models
and with different HDD sizes and vendors. Systems which were previously
stable under Ubuntu 18.04 started exhibiting freezes after upgrading to
20.04 as well.

It would also seem that disabling mdcheck_start and mdcheck_continue is
not necessarily the magic bullet for fixing this; it certainly doesn't
seem to help with freezing during the initial resync. I have also seen
instances of mdadm scheduled resyncs freezing when triggered using the
old cronjob method, with both systemd services and timers masked off.

Dmesg from a freshly installed system where the initial resync "froze" 
approximately 15 hours after the array was created:
mdadm --create /dev/md1 --level=6 --raid-devices=6 /dev/sd[cdefgh]
mkfs.ext4 /dev/md1
mount -o errors=remount-ro /dev/md1 /srv


[Jul 4 00:18] md/raid:md1: not clean -- starting background reconstruction
[  +0.62] md/raid:md1: device sdh operational as raid disk 5
[  +0.02] md/raid:md1: device sdg operational as raid disk 4
[  +0.01] md/raid:md1: device sdf operational as raid disk 3
[  +0.01] md/raid:md1: device sde operational as raid disk 2
[  +0.01] md/raid:md1: device sdd operational as raid disk 1
[  +0.01] md/raid:md1: device sdc operational as raid disk 0
[  +0.002104] md/raid:md1: raid level 6 active with 6 out of 6 devices, algorithm 2
[  +0.048241] md1: detected capacity change from 0 to 72000290684928
[  +0.66] md: resync of RAID array md1
[Jul 4 00:32] EXT4-fs (md1): mounted filesystem with ordered data mode. Opts: errors=remount-ro
[Jul 4 02:39] perf: interrupt took too long (2522 > 2500), lowering kernel.perf_event_max_sample_rate to 79250
[Jul 4 04:36] perf: interrupt took too long (3155 > 3152), lowering kernel.perf_event_max_sample_rate to 63250
[Jul 4 15:22] INFO: task md1_raid6:5688 blocked for more than 120 seconds.
[  +0.59]   Not tainted 5.4.0-153-generic #170-Ubuntu
[  +0.32] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.45] md1_raid6   D0  5688  2 0x80004000
[  +0.04] Call Trace:
[  +0.11]  __schedule+0x2e3/0x740
[  +0.05]  schedule+0x42/0xb0
[  +0.11]  raid5d+0x3e6/0x5f0 [raid456]
[  +0.05]  ? try_to_del_timer_sync+0x54/0x80
[  +0.05]  ? schedule_timeout+0x92/0x160
[  +0.04]  ? __wake_up_pollfree+0x40/0x40
[  +0.04]  md_thread+0x97/0x160
[  +0.03]  ? __wake_up_pollfree+0x40/0x40
[  +0.04]  kthread+0x104/0x140
[  +0.03]  ? md_start_sync+0x60/0x60
[  +0.03]  ? kthread_park+0x90/0x90
[  +0.02]  ret_from_fork+0x1f/0x40
[  +0.05] INFO: task md1_resync:5724 blocked for more than 120 seconds.
[  +0.39]   Not tainted 5.4.0-153-generic #170-Ubuntu
[  +0.31] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.44] md1_resync  D0  5724  2 0x80004000
[  +0.02] Call Trace:
[  +0.05]  __schedule+0x2e3/0x740
[  +0.04]  schedule+0x42/0xb0
[  +0.07]  raid5_get_active_stripe+0x459/0x610 [raid456]
[  +0.03]  ? __wake_up_pollfree+0x40/0x40
[  +0.07]  raid5_sync_request+0x38b/0x3b0 [raid456]
[  +0.04]  ? cpumask_next+0x1b/0x20
[  +0.03]  ? is_mddev_idle+0xc1/0x11e
[  +0.04]  md_do_sync.cold+0x3ef/0x992
[  +0.05]  ? sched_clock+0x9/0x10
[  +0.03]  ? __wake_up_pollfree+0x40/0x40
[  +0.04]  md_thread+0x97/0x160
[  +0.04]  kthread+0x104/0x140
[  +0.02]  ? md_start_sync+0x60/0x60
[  +0.03]  ? kthread_park+0x90/0x90
[  +0.03]  ret_from_fork+0x1f/0x40
[  +0.03] INFO: task jbd2/md1-8:6099 blocked for more than 120 seconds.
[  +0.38]   Not tainted 5.4.0-153-generic #170-Ubuntu
[  +0.31] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.43] jbd2/md1-8  D0  6099  2 0x80004000
[  +0.01] Call Trace:
[  +0.04]  __schedule+0x2e3/0x740
[  +0.03]  ? __wake_up_common_lock+0x8a/0xc0
[  +0.04]  schedule+0x42/0xb0
[  +0.05]  jbd2_journal_commit_transaction+0x24e/0x18b0
[  +0.04]  ? dequeue_entity+0x118/0x460
[  +0.02]  ? 

[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2023-06-13 Thread Brian Corriveau
I hit this bug as well in Ubuntu 22.04 with kernel 5.15.0-67-generic.

We have a single RAID 5 on 3 drives for 28T. I'm switching to the
workaround from comment #5.

Jun  4 12:48:11 server1 kernel: [1622699.548591] INFO: task md0_raid5:406 blocked for more than 120 seconds.
Jun  4 12:48:11 server1 kernel: [1622699.556202]   Tainted: G   OE     5.15.0-67-generic #74-Ubuntu
Jun  4 12:48:11 server1 kernel: [1622699.564101] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun  4 12:48:11 server1 kernel: [1622699.573063] task:md0_raid5   state:D stack:0 pid:  406 ppid: 2 flags:0x4000
Jun  4 12:48:11 server1 kernel: [1622699.573077] Call Trace:
Jun  4 12:48:11 server1 kernel: [1622699.573081]
Jun  4 12:48:11 server1 kernel: [1622699.573087]  __schedule+0x24e/0x590
Jun  4 12:48:11 server1 kernel: [1622699.573103]  schedule+0x69/0x110
Jun  4 12:48:11 server1 kernel: [1622699.573115]  raid5d+0x3d9/0x5f0 [raid456]
Jun  4 12:48:11 server1 kernel: [1622699.573140]  ? wait_woken+0x70/0x70
Jun  4 12:48:11 server1 kernel: [1622699.573151]  md_thread+0xad/0x170
Jun  4 12:48:11 server1 kernel: [1622699.573162]  ? wait_woken+0x70/0x70
Jun  4 12:48:11 server1 kernel: [1622699.573169]  ? md_write_inc+0x60/0x60
Jun  4 12:48:11 server1 kernel: [1622699.573176]  kthread+0x12a/0x150
Jun  4 12:48:11 server1 kernel: [1622699.573187]  ? set_kthread_struct+0x50/0x50
Jun  4 12:48:11 server1 kernel: [1622699.573197]  ret_from_fork+0x22/0x30
Jun  4 12:48:11 server1 kernel: [1622699.573212]
Jun  4 12:48:11 server1 kernel: [1622699.573231] INFO: task jbd2/dm-0-8:1375 blocked for more than 120 seconds.
Jun  4 12:48:11 server1 kernel: [1622699.581119]   Tainted: G   OE     5.15.0-67-generic #74-Ubuntu
Jun  4 12:48:11 server1 kernel: [1622699.589004] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun  4 12:48:11 server1 kernel: [1622699.597959] task:jbd2/dm-0-8 state:D stack:0 pid: 1375 ppid: 2 flags:0x4000
Jun  4 12:48:11 server1 kernel: [1622699.597968] Call Trace:
Jun  4 12:48:11 server1 kernel: [1622699.597970]
Jun  4 12:48:11 server1 kernel: [1622699.597973]  __schedule+0x24e/0x590
Jun  4 12:48:11 server1 kernel: [1622699.597984]  schedule+0x69/0x110
Jun  4 12:48:11 server1 kernel: [1622699.597992]  md_write_start.part.0+0x174/0x220
Jun  4 12:48:11 server1 kernel: [1622699.598002]  ? wait_woken+0x70/0x70
Jun  4 12:48:11 server1 kernel: [1622699.598024]  md_write_start+0x14/0x30
Jun  4 12:48:11 server1 kernel: [1622699.598032]  raid5_make_request+0x77/0x540 [raid456]
Jun  4 12:48:11 server1 kernel: [1622699.598051]  ? wait_woken+0x70/0x70
Jun  4 12:48:11 server1 kernel: [1622699.598058]  md_handle_request+0x12d/0x1b0
Jun  4 12:48:11 server1 kernel: [1622699.598065]  ? __blk_queue_split+0xfe/0x200
Jun  4 12:48:11 server1 kernel: [1622699.598075]  md_submit_bio+0x71/0xc0
Jun  4 12:48:11 server1 kernel: [1622699.598082]  __submit_bio+0x1a5/0x220
Jun  4 12:48:11 server1 kernel: [1622699.598091]  ? mempool_alloc_slab+0x17/0x20
Jun  4 12:48:11 server1 kernel: [1622699.598102]  __submit_bio_noacct+0x85/0x200
Jun  4 12:48:11 server1 kernel: [1622699.598110]  ? kmem_cache_alloc+0x1ab/0x2f0
Jun  4 12:48:11 server1 kernel: [1622699.598122]  submit_bio_noacct+0x4e/0x120
Jun  4 12:48:11 server1 kernel: [1622699.598131]  submit_bio+0x4a/0x130
Jun  4 12:48:11 server1 kernel: [1622699.598139]  submit_bh_wbc+0x18d/0x1c0
Jun  4 12:48:11 server1 kernel: [1622699.598151]  submit_bh+0x13/0x20
Jun  4 12:48:11 server1 kernel: [1622699.598160]  jbd2_journal_commit_transaction+0x861/0x17a0
Jun  4 12:48:11 server1 kernel: [1622699.598170]  ? __update_idle_core+0x93/0x120
Jun  4 12:48:11 server1 kernel: [1622699.598184]  kjournald2+0xa9/0x280
Jun  4 12:48:11 server1 kernel: [1622699.598190]  ? wait_woken+0x70/0x70
Jun  4 12:48:11 server1 kernel: [1622699.598197]  ? load_superblock.part.0+0xc0/0xc0
Jun  4 12:48:11 server1 kernel: [1622699.598202]  kthread+0x12a/0x150
Jun  4 12:48:11 server1 kernel: [1622699.598210]  ? set_kthread_struct+0x50/0x50
Jun  4 12:48:11 server1 kernel: [1622699.598218]  ret_from_fork+0x22/0x30
Jun  4 12:48:11 server1 kernel: [1622699.598229]
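
The hung-task reports in a log like the one above can be condensed into
a list of blocked tasks. The following is only a sketch, assuming the
"INFO: task NAME:PID blocked for more than ..." wording shown in these
logs; the function name hung_tasks is hypothetical. It matches on the
"INFO: task NAME:PID" part only, so it also works when the mail
archive has wrapped the rest of the line.

```shell
#!/bin/sh
# Sketch only: list tasks reported by the kernel hung-task detector.
hung_tasks() {
    # Reads kernel log text on stdin and prints "name pid" once per task.
    # NAME is taken up to the last ':' that is followed by digits, so
    # names that themselves contain ':' are kept whole.
    sed -n 's/.*INFO: task \([^ ]*\):\([0-9][0-9]*\).*/\1 \2/p' |
        LC_ALL=C sort -u
}
```

Typical use would be "dmesg | hung_tasks" or
"hung_tasks < /var/log/kern.log".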

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-hwe-5.11 in Ubuntu.
https://bugs.launchpad.net/bugs/1942935

Title:
  kernel io hangs during mdcheck/resync

Status in linux package in Ubuntu:
  Confirmed
Status in linux-signed-hwe-5.11 package in Ubuntu:
  Confirmed
Status in linux-signed-hwe-5.4 package in Ubuntu:
  Confirmed


[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2023-06-13 Thread Launchpad Bug Tracker
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: linux-signed-hwe-5.4 (Ubuntu)
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-hwe-5.11 in Ubuntu.
https://bugs.launchpad.net/bugs/1942935

Title:
  kernel io hangs during mdcheck/resync

Status in linux package in Ubuntu:
  Confirmed
Status in linux-signed-hwe-5.11 package in Ubuntu:
  Confirmed
Status in linux-signed-hwe-5.4 package in Ubuntu:
  Confirmed


[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2023-01-10 Thread Florian Klemenz
** Also affects: linux-signed-hwe-5.4 (Ubuntu)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-hwe-5.11 in Ubuntu.
https://bugs.launchpad.net/bugs/1942935

Title:
  kernel io hangs during mdcheck/resync

Status in linux package in Ubuntu:
  Confirmed
Status in linux-signed-hwe-5.11 package in Ubuntu:
  Confirmed
Status in linux-signed-hwe-5.4 package in Ubuntu:
  New


[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2023-01-05 Thread Kilian Felder
I hit this bug with Ubuntu 22.04 (Jammy) on Kernel 5.15.0-56

For testing, I have set up a RAID1 and a RAID5 in a VM. I put the disks
for each RAID on a separate controller. Based on the 'mdhang' script
(see comment #11), I was able to reproduce the error easily.

I ran 2 'mdhang' scripts at the same time, one for RAID1 and one for
RAID5. The RAID5 blocked after a short time. The RAID1 continued to run
without problems. So it probably only affects the RAID5.
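
To see which arrays on a machine run at which RAID level (levels 4, 5
and 6 are served by the raid456 module that appears in the traces), the
level can be read from /proc/mdstat. This is a sketch under the
assumption of the usual "mdN : active raidX ..." line layout; the
function name md_levels is hypothetical.

```shell
#!/bin/sh
# Sketch only: print "array level" for each array in mdstat-style input.
md_levels() {
    # Reads /proc/mdstat-style text on stdin. Arrays without a level on
    # their line (for example inactive ones) are printed with "-".
    awk '/^md[^ ]* : / {
        level = "-"
        for (i = 3; i <= NF; i++)
            if ($i ~ /^(raid[0-9]+|linear)$/) { level = $i; break }
        print $1, level
    }'
}
```

On a live system this would be run as "md_levels < /proc/mdstat".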

On my production system I have now activated the workaround (see comment
#5).

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-hwe-5.11 in Ubuntu.
https://bugs.launchpad.net/bugs/1942935

Title:
  kernel io hangs during mdcheck/resync

Status in linux package in Ubuntu:
  Confirmed
Status in linux-signed-hwe-5.11 package in Ubuntu:
  Confirmed


[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2022-12-23 Thread Chad Wagner
Comment #5
(https://bugs.launchpad.net/ubuntu/+source/linux-signed-hwe-5.11/+bug/1942935/comments/5)
has been a stable workaround for me (basically reverting to a continuous
resync, as in 18.04).

My newer machines are using ZFS with raidz2 pools.


[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2022-12-12 Thread Robert Lippmann
It turns out my issue was a faulty drive: the system would lock up when
mdadm hit the bad sectors on resync. It looked like a deadlock lower down
in the blockdev code.

I replaced the drive and the problem went away.


[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2022-12-08 Thread Janne Blomqvist
The second of the patches mentioned in #27 (with git SHA 1e2677...) has,
I believe, been backported to Ubuntu kernels 5.15.0-48 and 5.4.0-126.

We've still hit this with Ubuntu Jammy on 5.15.0-53, so I guess the
first commit needs to be backported as well.
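
The version reasoning above can be checked mechanically. The following is a minimal Python sketch, assuming Ubuntu-style release strings such as `5.15.0-53-generic`. The function names are illustrative, and the thresholds are only the versions quoted in this comment, not an authoritative list of fixed kernels. As reported above, a kernel at or past these versions can still hit the hang.

```python
import re

# Versions quoted in the comment above as carrying the second patch.
# Assumption: taken from this thread only, not from an official changelog.
BACKPORT_MIN = {(5, 15): (5, 15, 0, 48), (5, 4): (5, 4, 0, 126)}


def parse_release(release: str) -> tuple:
    """Turn a release string like '5.15.0-53-generic' into (5, 15, 0, 53)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def has_second_patch(release: str):
    """Return True or False for a listed kernel series, None otherwise."""
    version = parse_release(release)
    minimum = BACKPORT_MIN.get(version[:2])
    if minimum is None:
        return None  # series not mentioned in the comment; nothing to conclude
    return version >= minimum
```

On a live system the input could come from `platform.release()`. A `None` result only means the series was not mentioned in the comment.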


[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2022-07-10 Thread Robert Lippmann
It won't let me change the state back to active.

Every time I try, nothing happens, and array_status is always idle.


[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2022-07-10 Thread Chad Wagner
Yeah, that's the same issue as this one.  The raid is doing a consistency
check (mdcheck), is transitioned to an "idle" state, and hits a deadlock
that causes all I/O through the md device to block.  The workaround is to
change the array state back to active.

I made the changes in #5 almost a year ago and have had no problems since.
Before that, it hung almost every month when the scheduled consistency
check was triggered, ever since upgrading to 20.04.
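
The workaround described above amounts to reading and writing two md sysfs attributes. Here is a minimal Python sketch, assuming the array exposes `array_state` and `sync_action` under `/sys/block/<device>/md/`. The helper names and the `sysfs_root` parameter are illustrative, and the write needs root. Reports in this thread differ on whether the write actually clears the hang.

```python
from pathlib import Path


def md_dir(device: str, sysfs_root: str = "/sys/block") -> Path:
    """Directory holding the md attributes for a device such as 'md1'."""
    return Path(sysfs_root) / device / "md"


def read_md_state(device: str, sysfs_root: str = "/sys/block") -> dict:
    """Return the current array_state and sync_action as stripped strings."""
    base = md_dir(device, sysfs_root)
    return {
        name: (base / name).read_text().strip()
        for name in ("array_state", "sync_action")
    }


def set_array_active(device: str, sysfs_root: str = "/sys/block") -> None:
    """Equivalent of: echo active > /sys/block/<device>/md/array_state"""
    (md_dir(device, sysfs_root) / "array_state").write_text("active\n")
```

Passing a different `sysfs_root` lets the helpers be exercised against a scratch directory before touching a real array.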


[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2022-07-09 Thread Robert Lippmann
Digging further, I think I might be running into this bug:

https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc...@molgen.mpg.de/T/


[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2022-07-09 Thread Robert Lippmann
I'm running checkarray manually; I removed all the start and stop stuff
like you did.

echo active > /sys/block/md0/md/array_state doesn't fix it.

I must not have captured the whole trace last time.  I've attached it here.


** Attachment added: "kernel log snippet"
   
https://bugs.launchpad.net/ubuntu/+source/linux-signed-hwe-5.11/+bug/1942935/+attachment/5602089/+files/newkern.log
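
When comparing a kernel log snippet like the one attached, it can help to list which tasks the hung-task detector reported. Below is a minimal Python sketch, assuming the log uses the "INFO: task NAME:PID blocked for more than N seconds" wording seen elsewhere in this bug. The function name is illustrative.

```python
import re

# Matches the hung-task report header. It stops at the number of seconds,
# so lines wrapped before the word "seconds" are still recognised.
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+)"
)


def hung_tasks(log_text: str) -> list:
    """Return (task name, pid, seconds) for every hung-task report found."""
    return [
        (m["comm"], int(m["pid"]), int(m["secs"]))
        for m in HUNG_TASK.finditer(log_text)
    ]
```

Feeding it the text of a saved log gives a quick list of blocked tasks to compare between runs.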

[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2022-07-08 Thread Chad Wagner
I believe that to resolve the deadlock you want to do:
echo active > /sys/block/md1/md/array_state

Not "idle". You should see a hung task for mdcheck in there somewhere as well,
and it only occurs when the raid is resyncing (md_resync should be running). At
least for me, I used the workaround in comment 5:
https://bugs.launchpad.net/ubuntu/+source/linux-signed-hwe-5.11/+bug/1942935/comments/5
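
For anyone trying the array_state write above, here is a minimal sketch of it as a shell function. The function name nudge_md_array and the SYSFS_ROOT override are made up for illustration (the override only exists so the logic can be tried against a scratch directory); array_state and sync_action are the md sysfs files. This is not a fix, and it has been reported elsewhere in this thread not to help in every case.

```shell
# Hypothetical helper, not part of mdadm.
# SYSFS_ROOT is an assumption for testing; it defaults to the real /sys.
nudge_md_array() {
    dev="$1"
    md_dir="${SYSFS_ROOT:-/sys}/block/$dev/md"

    if [ ! -r "$md_dir/array_state" ]; then
        echo "$dev: no md sysfs directory at $md_dir" >&2
        return 1
    fi

    # Record what the array looked like before touching it.
    echo "$dev: array_state=$(cat "$md_dir/array_state") sync_action=$(cat "$md_dir/sync_action" 2>/dev/null)"

    # The write suggested in this thread: "active", not "idle".
    echo active > "$md_dir/array_state"
}
```

Run as root, usage would look like `nudge_md_array md1`.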

I haven't had a problem since, and have upgraded to 22.04 with 5.15 in the
meantime.  I am pretty sure the problem is still there; it surfaced in 20.04
when they changed the raid consistency check.  The new check pauses after
8 hours, which triggers the deadlock; I just let it run to completion like
it did in 18.04.
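
To see how long a check actually runs before it is paused on a given machine, the "data-check of RAID array" and "data-check interrupted" lines in dmesg can be paired up. A rough sketch, assuming dmesg lines in the format quoted in the bug description; the function name check_durations is made up for illustration.

```shell
# Hypothetical helper: reads dmesg text on stdin and prints, per array,
# the time between a data-check starting and being interrupted or finishing.
check_durations() {
    awk '
        {
            # Pull the seconds-since-boot value out of the leading "[...]".
            ts = $0
            sub(/^[ \t]*\[[ \t]*/, "", ts)
            sub(/\].*/, "", ts)
        }
        /md: data-check of RAID array/ {
            start[$NF] = ts            # last field is the array name
        }
        /md: .*data-check (interrupted|done)/ {
            dev = $0
            sub(/.*md: /, "", dev)     # "md1: data-check interrupted."
            sub(/:.*/, "", dev)        # "md1"
            if (dev in start) {
                printf "%s %.2f hours\n", dev, (ts - start[dev]) / 3600
                delete start[dev]
            }
        }
    '
}
```

Usage would be `dmesg | check_durations`. On my reading of the timestamps quoted in the bug description, each of the three md1 runs there lasted roughly 6 hours before being interrupted.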

I have not checked the patches, but it's possible you have a different
problem, because I don't see any calls to md_ code in either of your two
hung-process traces.
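
One way to check a saved kernel log for this is to list the hung tasks whose call trace contains an md_ frame. A rough sketch, assuming the hung-task format shown in the bug description; the function name hung_tasks_in_md is made up for illustration, and speculative frames (the ones prefixed with "?") are deliberately ignored.

```shell
# Hypothetical helper: reads kernel log text on stdin and prints one line per
# hung task that has a non-speculative md_* frame in its call trace.
hung_tasks_in_md() {
    awk '
        /INFO: task .* blocked for more than/ {
            task = $0
            sub(/.*INFO: task /, "", task)
            sub(/ blocked for more than.*/, "", task)
            seen = 0
            next
        }
        task != "" && !seen {
            # Drop the leading "[timestamp]" so the frame starts the line.
            frame = $0
            sub(/^[ \t]*\[[^]]*\][ \t]*/, "", frame)
            if (frame ~ /^md_/) {
                sub(/ .*/, "", frame)
                print task ": " frame
                seen = 1
            }
        }
    '
}
```

Usage would be `hung_tasks_in_md < newkern.log`. With the trace from the bug description, systemd-journal:376 and mysqld:1505 would both be listed with md_write_start as their first md_ frame; no output at all would suggest a different problem.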

[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2022-05-30 Thread Chad Wagner
Looks like two patches are landing in linux-next to resolve this:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20220527&id=8b48ec23cc51a4e7c8dbaef5f34ebe67e1a80934
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20220527&id=1e267742283a4b5a8ca65755c44166be27e9aa0f

[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2021-09-17 Thread Ubuntu Foundations Team Bug Bot
** Tags added: patch

[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2021-09-17 Thread Chad Wagner
Same issue on impish 5.13.13 kernel, running in VBox.

[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2021-09-17 Thread Chad Wagner
** Patch added: "md-reap-sync-thread.patch"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1942935/+attachment/5526028/+files/md-reap-sync-thread.patch

** Tags added: apport-collected impish

** Description changed:

  It seems to always occur during an mdcheck/resync. If I am logged in via
  SSH it is still somewhat responsive and basic utilities like dmesg will
  work.  But it appears any write I/O will hang the terminal and nothing
  is written to syslog (presumably because it is blocked).
  
  Below is the output of dmesg and cat /proc/mdstat. It appears the data
  check was interrupted while /proc/mdstat still shows progress, and there
  is a whole slew of hung tasks including md1_resync itself.
  
  [756484.534293] md: data-check of RAID array md0
  [756484.628039] md: delaying data-check of md1 until md0 has finished (they share one or more physical units)
  [756493.808773] md: md0: data-check done.
  [756493.829760] md: data-check of RAID array md1
  [778112.446410] md: md1: data-check interrupted.
  [810654.608102] md: data-check of RAID array md1
  [832291.201064] md: md1: data-check interrupted.
  [899745.389485] md: data-check of RAID array md1
  [921395.835305] md: md1: data-check interrupted.
  [921588.558834] INFO: task systemd-journal:376 blocked for more than 120 seconds.
  [921588.558846]   Not tainted 5.11.0-27-generic #29~20.04.1-Ubuntu
  [921588.558850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [921588.558854] task:systemd-journal state:D stack:0 pid:  376 ppid: 1 flags:0x0220
  [921588.558859] Call Trace:
  [921588.558864]  __schedule+0x44c/0x8a0
  [921588.558872]  schedule+0x4f/0xc0
  [921588.558876]  md_write_start+0x150/0x240
  [921588.558880]  ? wait_woken+0x80/0x80
  [921588.558886]  raid5_make_request+0x88/0x890 [raid456]
  [921588.558898]  ? wait_woken+0x80/0x80
  [921588.558901]  ? mempool_kmalloc+0x17/0x20
  [921588.558904]  md_handle_request+0x12d/0x1a0
  [921588.558907]  ? __part_start_io_acct+0x51/0xf0
  [921588.558912]  md_submit_bio+0xca/0x100
  [921588.558915]  submit_bio_noacct+0x112/0x4f0
  [921588.558918]  ? ext4_fc_reserve_space+0x110/0x230
  [921588.558922]  submit_bio+0x51/0x1a0
  [921588.558925]  ? _cond_resched+0x19/0x30
  [921588.558928]  ? kmem_cache_alloc+0x38e/0x440
  [921588.558932]  ? ext4_init_io_end+0x1f/0x50
  [921588.558936]  ext4_io_submit+0x4d/0x60
  [921588.558940]  ext4_writepages+0x2c6/0xcd0
  [921588.558944]  do_writepages+0x43/0xd0
  [921588.558948]  ? do_writepages+0x43/0xd0
  [921588.558951]  ? fault_dirty_shared_page+0xa5/0x110
  [921588.558955]  __filemap_fdatawrite_range+0xcc/0x110
  [921588.558960]  file_write_and_wait_range+0x74/0xc0
  [921588.558962]  ext4_sync_file+0xf5/0x350
  [921588.558967]  vfs_fsync_range+0x49/0x80
  [921588.558970]  do_fsync+0x3d/0x70
  [921588.558973]  __x64_sys_fsync+0x14/0x20
  [921588.558976]  do_syscall_64+0x38/0x90
  [921588.558980]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [921588.558984] RIP: 0033:0x7f4c97ee832b
  [921588.558987] RSP: 002b:7ffdceb29e50 EFLAGS: 0293 ORIG_RAX: 004a
  [921588.558991] RAX: ffda RBX: 55ced34b0fa0 RCX: 7f4c97ee832b
  [921588.558993] RDX: 7f4c97fc8000 RSI: 55ced3487b70 RDI: 0021
  [921588.558995] RBP: 0001 R08:  R09: 7ffdceb29fa8
  [921588.558996] R10: 7f4c97d2c848 R11: 0293 R12: 7ffdceb29fa8
  [921588.558998] R13: 7ffdceb29fa0 R14: 55ced34b0fa0 R15: 55ced34bcf90
  [921588.559014] INFO: task mysqld:1505 blocked for more than 120 seconds.
  [921588.559018]   Not tainted 5.11.0-27-generic #29~20.04.1-Ubuntu
  [921588.559022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [921588.559025] task:mysqld  state:D stack:0 pid: 1505 ppid: 1 flags:0x
  [921588.559030] Call Trace:
  [921588.559032]  __schedule+0x44c/0x8a0
  [921588.559036]  schedule+0x4f/0xc0
  [921588.559040]  md_write_start+0x150/0x240
  [921588.559044]  ? wait_woken+0x80/0x80
  [921588.559047]  raid5_make_request+0x88/0x890 [raid456]
  [921588.559056]  ? wait_woken+0x80/0x80
  [921588.559059]  ? mempool_kmalloc+0x17/0x20
  [921588.559062]  md_handle_request+0x12d/0x1a0
  [921588.559065]  ? __part_start_io_acct+0x51/0xf0
  [921588.559068]  md_submit_bio+0xca/0x100
  [921588.559071]  submit_bio_noacct+0x112/0x4f0
  [921588.559075]  submit_bio+0x51/0x1a0
  [921588.559077]  ? _cond_resched+0x19/0x30
  [921588.559081]  ? kmem_cache_alloc+0x38e/0x440
  [921588.559084]  ? ext4_init_io_end+0x1f/0x50
  [921588.559088]  ext4_io_submit+0x4d/0x60
  [921588.559091]  ext4_writepages+0x2c6/0xcd0
  [921588.559094]  ? __schedule+0x454/0x8a0
  [921588.559097]  ? hrtimer_start_range_ns+0x1aa/0x2f0
  [921588.559100]  ? timerqueue_del+0x24/0x50
  [921588.559105]  ? futex_wait+0x1ed/0x270
  [921588.559109]  do_writepages+0x43/0xd0
  [921588.559112]  ? do_writepages+0x43/0xd0
  [921588.559115]  ? 

[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2021-09-17 Thread Chad Wagner
** Changed in: linux (Ubuntu)
   Status: Incomplete => Confirmed

[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2021-09-13 Thread Chad Wagner
** Tags removed: hirsute

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-hwe-5.11 in Ubuntu.
https://bugs.launchpad.net/bugs/1942935

Title:
  kernel io hangs during mdcheck/resync

Status in linux package in Ubuntu:
  Incomplete
Status in linux-signed-hwe-5.11 package in Ubuntu:
  Confirmed

Bug description:
  It seems to always occur during an mdcheck/resync.  If I am logged in
  via SSH, the system is still somewhat responsive and basic utilities
  like dmesg will work, but it appears any write I/O will hang the
  terminal, and nothing is written to syslog (presumably because it is
  blocked).

  Below is the output of dmesg and cat /proc/mdstat.  It appears the data
  check was interrupted while /proc/mdstat still shows progress, and
  there is a whole slew of hung tasks, including md1_resync itself.

  [756484.534293] md: data-check of RAID array md0
  [756484.628039] md: delaying data-check of md1 until md0 has finished (they share one or more physical units)
  [756493.808773] md: md0: data-check done.
  [756493.829760] md: data-check of RAID array md1
  [778112.446410] md: md1: data-check interrupted.
  [810654.608102] md: data-check of RAID array md1
  [832291.201064] md: md1: data-check interrupted.
  [899745.389485] md: data-check of RAID array md1
  [921395.835305] md: md1: data-check interrupted.
  [921588.558834] INFO: task systemd-journal:376 blocked for more than 120 seconds.
  [921588.558846]   Not tainted 5.11.0-27-generic #29~20.04.1-Ubuntu
  [921588.558850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [921588.558854] task:systemd-journal state:D stack:0 pid:  376 ppid: 1 flags:0x0220
  [921588.558859] Call Trace:
  [921588.558864]  __schedule+0x44c/0x8a0
  [921588.558872]  schedule+0x4f/0xc0
  [921588.558876]  md_write_start+0x150/0x240
  [921588.558880]  ? wait_woken+0x80/0x80
  [921588.558886]  raid5_make_request+0x88/0x890 [raid456]
  [921588.558898]  ? wait_woken+0x80/0x80
  [921588.558901]  ? mempool_kmalloc+0x17/0x20
  [921588.558904]  md_handle_request+0x12d/0x1a0
  [921588.558907]  ? __part_start_io_acct+0x51/0xf0
  [921588.558912]  md_submit_bio+0xca/0x100
  [921588.558915]  submit_bio_noacct+0x112/0x4f0
  [921588.558918]  ? ext4_fc_reserve_space+0x110/0x230
  [921588.558922]  submit_bio+0x51/0x1a0
  [921588.558925]  ? _cond_resched+0x19/0x30
  [921588.558928]  ? kmem_cache_alloc+0x38e/0x440
  [921588.558932]  ? ext4_init_io_end+0x1f/0x50
  [921588.558936]  ext4_io_submit+0x4d/0x60
  [921588.558940]  ext4_writepages+0x2c6/0xcd0
  [921588.558944]  do_writepages+0x43/0xd0
  [921588.558948]  ? do_writepages+0x43/0xd0
  [921588.558951]  ? fault_dirty_shared_page+0xa5/0x110
  [921588.558955]  __filemap_fdatawrite_range+0xcc/0x110
  [921588.558960]  file_write_and_wait_range+0x74/0xc0
  [921588.558962]  ext4_sync_file+0xf5/0x350
  [921588.558967]  vfs_fsync_range+0x49/0x80
  [921588.558970]  do_fsync+0x3d/0x70
  [921588.558973]  __x64_sys_fsync+0x14/0x20
  [921588.558976]  do_syscall_64+0x38/0x90
  [921588.558980]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [921588.558984] RIP: 0033:0x7f4c97ee832b
  [921588.558987] RSP: 002b:7ffdceb29e50 EFLAGS: 0293 ORIG_RAX: 004a
  [921588.558991] RAX: ffda RBX: 55ced34b0fa0 RCX: 7f4c97ee832b
  [921588.558993] RDX: 7f4c97fc8000 RSI: 55ced3487b70 RDI: 0021
  [921588.558995] RBP: 0001 R08:  R09: 7ffdceb29fa8
  [921588.558996] R10: 7f4c97d2c848 R11: 0293 R12: 7ffdceb29fa8
  [921588.558998] R13: 7ffdceb29fa0 R14: 55ced34b0fa0 R15: 55ced34bcf90
  [921588.559014] INFO: task mysqld:1505 blocked for more than 120 seconds.
  [921588.559018]   Not tainted 5.11.0-27-generic #29~20.04.1-Ubuntu
  [921588.559022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [921588.559025] task:mysqld  state:D stack:0 pid: 1505 ppid: 1 flags:0x
  [921588.559030] Call Trace:
  [921588.559032]  __schedule+0x44c/0x8a0
  [921588.559036]  schedule+0x4f/0xc0
  [921588.559040]  md_write_start+0x150/0x240
  [921588.559044]  ? wait_woken+0x80/0x80
  [921588.559047]  raid5_make_request+0x88/0x890 [raid456]
  [921588.559056]  ? wait_woken+0x80/0x80
  [921588.559059]  ? mempool_kmalloc+0x17/0x20
  [921588.559062]  md_handle_request+0x12d/0x1a0
  [921588.559065]  ? __part_start_io_acct+0x51/0xf0
  [921588.559068]  md_submit_bio+0xca/0x100
  [921588.559071]  submit_bio_noacct+0x112/0x4f0
  [921588.559075]  submit_bio+0x51/0x1a0
  [921588.559077]  ? _cond_resched+0x19/0x30
  [921588.559081]  ? kmem_cache_alloc+0x38e/0x440
  [921588.559084]  ? ext4_init_io_end+0x1f/0x50
  [921588.559088]  ext4_io_submit+0x4d/0x60
  [921588.559091]  ext4_writepages+0x2c6/0xcd0
  [921588.559094]  ? __schedule+0x454/0x8a0
  [921588.559097]  ? hrtimer_start_range_ns+0x1aa/0x2f0
  [921588.559100]  ? timerqueue_del+0x24/0x50
  

[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2021-09-13 Thread Chad Wagner
** Also affects: linux (Ubuntu)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-hwe-5.11 in Ubuntu.
https://bugs.launchpad.net/bugs/1942935

Title:
  kernel io hangs during mdcheck/resync

Status in linux package in Ubuntu:
  Incomplete
Status in linux-signed-hwe-5.11 package in Ubuntu:
  Confirmed


[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2021-09-13 Thread Chad Wagner
Here is Donald Buczek's reproducer script.  I set up an Ubuntu 20.04 VM
with the latest linux-image-generic and was able to reproduce the hang
within maybe 10 or 15 minutes.  Exactly the same issue.

Filesystem layout built as follows:

# assemble raid devices
mdadm --create /dev/md0 --level=1 --raid-devices=2 --spare-devices=1 \
    /dev/sda2 /dev/sdb2 /dev/sdc2
mdadm --create /dev/md1 --level=5 --raid-devices=5 \
    /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3


# create PVs, VGs, LVs
pvcreate /dev/md1
vgcreate vg1 /dev/md1
lvcreate --name root --extents 100%FREE vg1


# create filesystems
mkfs.ext4 /dev/md0
mkfs.ext4 /dev/vg1/root


** Attachment added: "mdhang.sh"
   https://bugs.launchpad.net/ubuntu/+source/linux-signed-hwe-5.11/+bug/1942935/+attachment/5525050/+files/mdhang.sh

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-hwe-5.11 in Ubuntu.
https://bugs.launchpad.net/bugs/1942935

Title:
  kernel io hangs during mdcheck/resync

Status in linux-signed-hwe-5.11 package in Ubuntu:
  Confirmed


[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2021-09-13 Thread Chad Wagner
From what I have seen, the patch hasn't made it into mainline; it looks
like it died back in March waiting for feedback from additional kernel
developers.  From what I have gathered, this is a deadlock scenario
directly caused by pausing the resync while the system is under heavy
write activity.
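
As a rough illustration of that trigger (this is not the attached
mdhang.sh, only a minimal sketch), a scrub can be started and interrupted
repeatedly through the md sysfs file sync_action.  The function name
toggle_check and the MD_SYSFS override are made up for this example.
Something like it should only be run on a disposable VM, with heavy write
load going in another shell, since on affected kernels it may hang all
writes to the array.

```shell
# Hypothetical sketch: repeatedly start and interrupt a data-check.
# MD_SYSFS defaults to /sys/block; point it at a scratch directory to
# dry-run the logic without touching a real array.
toggle_check() {
    md_dir="${MD_SYSFS:-/sys/block}/$1/md"
    cycles="${2:-10}"   # number of start/interrupt rounds
    pause="${3:-5}"     # seconds to wait between state changes
    i=0
    while [ "$i" -lt "$cycles" ]; do
        # Start a data-check on the array.
        echo check > "$md_dir/sync_action" || return 1
        sleep "$pause"
        # Interrupt it again ("data-check interrupted" in dmesg).
        echo idle > "$md_dir/sync_action" || return 1
        sleep "$pause"
        i=$((i + 1))
    done
}

# Example (needs root, assumes the array is md1):
#   toggle_check md1 100 10
```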

Donald Buczek provided a reproducer which is a shell script that
generates a lot of write activity and pauses/resumes the raid scrubbing.
And he also provided a workaround to get the stuck system running
without reboot:

echo active > /sys/block/md1/md/array_state
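
A slightly more defensive way to apply that workaround, as a sketch only:
print the array's state before and after the write, so it is clear
whether anything changed.  The helper name md_kick and the MD_SYSFS
override are assumptions for illustration; only the array_state and
sync_action files under /sys/block/<dev>/md/ come from the kernel.

```shell
# Hypothetical helper around the "echo active" workaround.
# MD_SYSFS defaults to /sys/block; override it to test against a scratch dir.
md_kick() {
    md_dir="${MD_SYSFS:-/sys/block}/$1/md"
    if [ ! -r "$md_dir/array_state" ]; then
        echo "no such md array: $1" >&2
        return 1
    fi
    # Record what the array looked like while stuck.
    echo "before: array_state=$(cat "$md_dir/array_state")" \
         "sync_action=$(cat "$md_dir/sync_action" 2>/dev/null)"
    # The workaround itself.
    echo active > "$md_dir/array_state" || return 1
    echo "after:  array_state=$(cat "$md_dir/array_state")"
}

# Example (needs root, assumes the stuck array is md1):
#   md_kick md1
```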


I haven't tried the patch or any of the above.  Instead I eliminated the
trigger, which is mdcheck_start and mdcheck_continue, and went back to
the 18.04 LTS way of scrubbing arrays (basically, don't pause or
interrupt the check once it starts).  I ran checkarray through to 100%
yesterday with no problems.  Meanwhile, since upgrading to 20.04 LTS it
has hung almost every single time, across the 5.4, 5.8, and 5.11
kernels.
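
For anyone wanting to make the same change, it could look roughly like
the sketch below.  It assumes the triggers are the systemd timers
mdcheck_start.timer and mdcheck_continue.timer from the mdadm package and
that checkarray lives at /usr/share/mdadm/checkarray; both should be
verified on the system in question.  The helper names are made up, and by
default the commands are only printed, not executed.

```shell
# Hypothetical helpers. Commands are printed unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Stop the pause/resume style scrub.
# Unit names are an assumption; check with: systemctl list-timers
stop_mdcheck_timers() {
    for unit in mdcheck_start.timer mdcheck_continue.timer; do
        run systemctl disable --now "$unit"
    done
}

# Run one uninterrupted check of every array instead.
# The checkarray path is an assumption (Debian/Ubuntu mdadm layout).
full_scrub() {
    run /usr/share/mdadm/checkarray --all
}

# Example (as root): RUN=1 stop_mdcheck_timers && RUN=1 full_scrub
```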

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-hwe-5.11 in Ubuntu.
https://bugs.launchpad.net/bugs/1942935

Title:
  kernel io hangs during mdcheck/resync

Status in linux-signed-hwe-5.11 package in Ubuntu:
  Confirmed


[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2021-09-13 Thread Mikko Rantalainen
One of the systems was using the linux-generic package, which in
practice meant this kernel:

Linux pedabackup 5.4.0-80-generic #90-Ubuntu SMP Fri Jul 9 22:49:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

$ cat /proc/version_signature 
Ubuntu 5.4.0-80.90-generic 5.4.124

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-hwe-5.11 in Ubuntu.
https://bugs.launchpad.net/bugs/1942935

Title:
  kernel io hangs during mdcheck/resync

Status in linux-signed-hwe-5.11 package in Ubuntu:
  Confirmed


[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2021-09-13 Thread Launchpad Bug Tracker
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: linux-signed-hwe-5.11 (Ubuntu)
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-hwe-5.11 in Ubuntu.
https://bugs.launchpad.net/bugs/1942935

Title:
  kernel io hangs during mdcheck/resync

Status in linux-signed-hwe-5.11 package in Ubuntu:
  Confirmed


[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2021-09-13 Thread Mikko Rantalainen
I think I've seen this issue once on each of two different systems, so I
think this is a software issue. Does anybody know if the patch in
comment #6 is going to be included in Ubuntu 20.04 LTS?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-hwe-5.11 in Ubuntu.
https://bugs.launchpad.net/bugs/1942935

Title:
  kernel io hangs during mdcheck/resync

Status in linux-signed-hwe-5.11 package in Ubuntu:
  Confirmed

Bug description:
  It seems to always occur during an mdcheck/resync.  If I am logged in
  via SSH, the system is still somewhat responsive and basic utilities
  like dmesg will work, but it appears any write I/O will hang the
  terminal, and nothing is written to syslog (presumably because it is
  blocked).

  Below is the output of dmesg and cat /proc/mdstat.  It appears the data
  check was interrupted while /proc/mdstat still shows progress, and
  there is a whole slew of hung tasks, including md1_resync itself.

  [756484.534293] md: data-check of RAID array md0
  [756484.628039] md: delaying data-check of md1 until md0 has finished (they share one or more physical units)
  [756493.808773] md: md0: data-check done.
  [756493.829760] md: data-check of RAID array md1
  [778112.446410] md: md1: data-check interrupted.
  [810654.608102] md: data-check of RAID array md1
  [832291.201064] md: md1: data-check interrupted.
  [899745.389485] md: data-check of RAID array md1
  [921395.835305] md: md1: data-check interrupted.
  [921588.558834] INFO: task systemd-journal:376 blocked for more than 120 seconds.
  [921588.558846]   Not tainted 5.11.0-27-generic #29~20.04.1-Ubuntu
  [921588.558850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [921588.558854] task:systemd-journal state:D stack:0 pid:  376 ppid: 1 flags:0x0220
  [921588.558859] Call Trace:
  [921588.558864]  __schedule+0x44c/0x8a0
  [921588.558872]  schedule+0x4f/0xc0
  [921588.558876]  md_write_start+0x150/0x240
  [921588.558880]  ? wait_woken+0x80/0x80
  [921588.558886]  raid5_make_request+0x88/0x890 [raid456]
  [921588.558898]  ? wait_woken+0x80/0x80
  [921588.558901]  ? mempool_kmalloc+0x17/0x20
  [921588.558904]  md_handle_request+0x12d/0x1a0
  [921588.558907]  ? __part_start_io_acct+0x51/0xf0
  [921588.558912]  md_submit_bio+0xca/0x100
  [921588.558915]  submit_bio_noacct+0x112/0x4f0
  [921588.558918]  ? ext4_fc_reserve_space+0x110/0x230
  [921588.558922]  submit_bio+0x51/0x1a0
  [921588.558925]  ? _cond_resched+0x19/0x30
  [921588.558928]  ? kmem_cache_alloc+0x38e/0x440
  [921588.558932]  ? ext4_init_io_end+0x1f/0x50
  [921588.558936]  ext4_io_submit+0x4d/0x60
  [921588.558940]  ext4_writepages+0x2c6/0xcd0
  [921588.558944]  do_writepages+0x43/0xd0
  [921588.558948]  ? do_writepages+0x43/0xd0
  [921588.558951]  ? fault_dirty_shared_page+0xa5/0x110
  [921588.558955]  __filemap_fdatawrite_range+0xcc/0x110
  [921588.558960]  file_write_and_wait_range+0x74/0xc0
  [921588.558962]  ext4_sync_file+0xf5/0x350
  [921588.558967]  vfs_fsync_range+0x49/0x80
  [921588.558970]  do_fsync+0x3d/0x70
  [921588.558973]  __x64_sys_fsync+0x14/0x20
  [921588.558976]  do_syscall_64+0x38/0x90
  [921588.558980]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [921588.558984] RIP: 0033:0x7f4c97ee832b
  [921588.558987] RSP: 002b:7ffdceb29e50 EFLAGS: 0293 ORIG_RAX: 004a
  [921588.558991] RAX: ffda RBX: 55ced34b0fa0 RCX: 7f4c97ee832b
  [921588.558993] RDX: 7f4c97fc8000 RSI: 55ced3487b70 RDI: 0021
  [921588.558995] RBP: 0001 R08:  R09: 7ffdceb29fa8
  [921588.558996] R10: 7f4c97d2c848 R11: 0293 R12: 7ffdceb29fa8
  [921588.558998] R13: 7ffdceb29fa0 R14: 55ced34b0fa0 R15: 55ced34bcf90
  [921588.559014] INFO: task mysqld:1505 blocked for more than 120 seconds.
  [921588.559018]   Not tainted 5.11.0-27-generic #29~20.04.1-Ubuntu
  [921588.559022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [921588.559025] task:mysqld  state:D stack:0 pid: 1505 ppid: 1 flags:0x
  [921588.559030] Call Trace:
  [921588.559032]  __schedule+0x44c/0x8a0
  [921588.559036]  schedule+0x4f/0xc0
  [921588.559040]  md_write_start+0x150/0x240
  [921588.559044]  ? wait_woken+0x80/0x80
  [921588.559047]  raid5_make_request+0x88/0x890 [raid456]
  [921588.559056]  ? wait_woken+0x80/0x80
  [921588.559059]  ? mempool_kmalloc+0x17/0x20
  [921588.559062]  md_handle_request+0x12d/0x1a0
  [921588.559065]  ? __part_start_io_acct+0x51/0xf0
  [921588.559068]  md_submit_bio+0xca/0x100
  [921588.559071]  submit_bio_noacct+0x112/0x4f0
  [921588.559075]  submit_bio+0x51/0x1a0
  [921588.559077]  ? _cond_resched+0x19/0x30
  [921588.559081]  ? kmem_cache_alloc+0x38e/0x440
  [921588.559084]  ? ext4_init_io_end+0x1f/0x50
  [921588.559088]  ext4_io_submit+0x4d/0x60
  [921588.559091]  ext4_writepages+0x2c6/0xcd0
  [921588.559094]  ? __schedule+0x454/0x8a0
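
When a log like the one above runs to many traces, it can help to reduce it to the list of tasks the hung-task detector reported. The following is only a minimal sketch: the function name hung_tasks is hypothetical, and it assumes the "INFO: task NAME:PID blocked for more than" wording seen in the log above.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print one
# "name pid" pair for each hung-task report found.
# Assumes the "INFO: task NAME:PID blocked for more than ..." wording
# that appears in the log quoted in this report.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than.*/\1 \2/p'
}

# Typical use on a live system (reading the kernel log may need root):
#   dmesg | hung_tasks | sort -u
```

Matching only up to "blocked for more than" means the function also works on logs where a mail client has wrapped the line before "seconds."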
  

[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2021-09-12 Thread Chad Wagner
Here is the proposed patch. It doesn't appear to have been applied. The
last report was with 5.11-rc5.

https://lore.kernel.org/linux-raid/1613177399-22024-1-git-send-email-guoqing.ji...@cloud.ionos.com/

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-hwe-5.11 in Ubuntu.
https://bugs.launchpad.net/bugs/1942935

Title:
  kernel io hangs during mdcheck/resync

Status in linux-signed-hwe-5.11 package in Ubuntu:
  New

[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2021-09-12 Thread Chad Wagner
Similar report here on 5.10.0-rc4:
https://www.spinics.net/lists/raid/msg66654.html


I ended up masking the mdcheck services introduced with 20.04 LTS and
switched back to the crontab:

systemctl mask mdcheck_continue.service mdcheck_continue.timer mdcheck_start.service mdcheck_start.timer
cat > /etc/cron.d/mdadm << 'EOF'
#
# cron.d/mdadm -- schedules periodic redundancy checks of MD devices
#
# Copyright © martin f. krafft 
# distributed under the terms of the Artistic Licence 2.0
#

# By default, run at 00:57 on every Sunday, but do nothing unless the day of
# the month is less than or equal to 7. Thus, only run on the first Sunday of
# each month. crontab(5) sucks, unfortunately, in this regard; therefore this
# hack (see #380425).
57 0 * * 0 root if [ -x /usr/share/mdadm/checkarray ] && [ $(date +\%d) -le 7 ]; then /usr/share/mdadm/checkarray --cron --all --idle --quiet; fi
EOF


The pausing and resuming of the integrity check was an annoyance for me
anyways.
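
The first-Sunday guard in that cron entry can be looked at on its own. This is a minimal sketch, not part of the mdadm package: the function name in_first_week is hypothetical. It shows why a job scheduled for every Sunday only acts once a month when it also requires the day of the month to be 7 or less.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the given day of month (as printed by
# "date +%d", e.g. 01..31) falls within the first seven days of the month.
# A job scheduled for every Sunday that also passes this check therefore
# runs only on the first Sunday of each month.
in_first_week() {
    day=${1#0}           # drop one leading zero, so "07" becomes "7"
    [ "$day" -le 7 ]
}

# Example use from an interactive shell:
#   in_first_week "$(date +%d)" && echo "checkarray would run today"
```

Inside a crontab the percent sign has to be written as \% because cron treats an unescaped % as a line break, which is why the entry above uses date +\%d while a plain shell uses date +%d.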

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-hwe-5.11 in Ubuntu.
https://bugs.launchpad.net/bugs/1942935

Title:
  kernel io hangs during mdcheck/resync

Status in linux-signed-hwe-5.11 package in Ubuntu:
  New

[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2021-09-08 Thread Chad Wagner
Hi Kleber,
I installed it later yesterday, but I won't know until the next resync.
This has been a problem since at least the Linux 5.4 kernel that shipped
with Ubuntu 20.04. I don't think I had these problems on Ubuntu 18.04 LTS
on the same hardware, running linux-image-generic at that time.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-hwe-5.11 in Ubuntu.
https://bugs.launchpad.net/bugs/1942935

Title:
  kernel io hangs during mdcheck/resync

Status in linux-signed-hwe-5.11 package in Ubuntu:
  New

[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2021-09-08 Thread Kleber Sacilotto de Souza
Hello Chad Wagner,

Thank you for reporting this issue. Could you please try installing the
latest 20.04 HWE kernel and check whether the problem persists? The
version currently in focal-updates is 5.11.0-34.36~20.04.1.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-hwe-5.11 in Ubuntu.
https://bugs.launchpad.net/bugs/1942935

Title:
  kernel io hangs during mdcheck/resync

Status in linux-signed-hwe-5.11 package in Ubuntu:
  New

[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync

2021-09-07 Thread Chad Wagner
** Attachment added: "screenlog.txt"
   https://bugs.launchpad.net/ubuntu/+source/linux-signed-hwe-5.11/+bug/1942935/+attachment/5523575/+files/screenlog.txt

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-hwe-5.11 in Ubuntu.
https://bugs.launchpad.net/bugs/1942935

Title:
  kernel io hangs during mdcheck/resync

Status in linux-signed-hwe-5.11 package in Ubuntu:
  New
