[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
I was able to reproduce the hang using the script mdhang.sh (modified for my RAID device location) on 22.04 with kernel 5.15 several times, usually within 10-15 minutes. Using the 22.04 HWE kernel 6.5, I was not able to reproduce the problem in a 25+ hour run of the same script. It seems something has been fixed in the newer HWE kernel.

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-signed-hwe-5.15 in Ubuntu.
https://bugs.launchpad.net/bugs/1942935

Title:
  kernel io hangs during mdcheck/resync

Status in linux package in Ubuntu:
  Confirmed
Status in linux-signed-hwe-5.11 package in Ubuntu:
  Confirmed
Status in linux-signed-hwe-5.15 package in Ubuntu:
  Confirmed
Status in linux-signed-hwe-5.4 package in Ubuntu:
  Confirmed

Bug description:
  It seems to always occur during an mdcheck/resync. If I am logged in via SSH it is still somewhat responsive, and basic utilities like dmesg will work. But it appears any write I/O will hang the terminal, and nothing is written to syslog (presumably because it is blocked).

  Below is the output of dmesg and cat /proc/mdstat. It appears the data check was interrupted and /proc/mdstat still shows progress, and there is a whole slew of hung tasks, including md1_resync itself.

  [756484.534293] md: data-check of RAID array md0
  [756484.628039] md: delaying data-check of md1 until md0 has finished (they share one or more physical units)
  [756493.808773] md: md0: data-check done.
  [756493.829760] md: data-check of RAID array md1
  [778112.446410] md: md1: data-check interrupted.
  [810654.608102] md: data-check of RAID array md1
  [832291.201064] md: md1: data-check interrupted.
  [899745.389485] md: data-check of RAID array md1
  [921395.835305] md: md1: data-check interrupted.
  [921588.558834] INFO: task systemd-journal:376 blocked for more than 120 seconds.
  [921588.558846] Not tainted 5.11.0-27-generic #29~20.04.1-Ubuntu
  [921588.558850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [921588.558854] task:systemd-journal state:D stack:0 pid: 376 ppid: 1 flags:0x0220
  [921588.558859] Call Trace:
  [921588.558864] __schedule+0x44c/0x8a0
  [921588.558872] schedule+0x4f/0xc0
  [921588.558876] md_write_start+0x150/0x240
  [921588.558880] ? wait_woken+0x80/0x80
  [921588.558886] raid5_make_request+0x88/0x890 [raid456]
  [921588.558898] ? wait_woken+0x80/0x80
  [921588.558901] ? mempool_kmalloc+0x17/0x20
  [921588.558904] md_handle_request+0x12d/0x1a0
  [921588.558907] ? __part_start_io_acct+0x51/0xf0
  [921588.558912] md_submit_bio+0xca/0x100
  [921588.558915] submit_bio_noacct+0x112/0x4f0
  [921588.558918] ? ext4_fc_reserve_space+0x110/0x230
  [921588.558922] submit_bio+0x51/0x1a0
  [921588.558925] ? _cond_resched+0x19/0x30
  [921588.558928] ? kmem_cache_alloc+0x38e/0x440
  [921588.558932] ? ext4_init_io_end+0x1f/0x50
  [921588.558936] ext4_io_submit+0x4d/0x60
  [921588.558940] ext4_writepages+0x2c6/0xcd0
  [921588.558944] do_writepages+0x43/0xd0
  [921588.558948] ? do_writepages+0x43/0xd0
  [921588.558951] ? fault_dirty_shared_page+0xa5/0x110
  [921588.558955] __filemap_fdatawrite_range+0xcc/0x110
  [921588.558960] file_write_and_wait_range+0x74/0xc0
  [921588.558962] ext4_sync_file+0xf5/0x350
  [921588.558967] vfs_fsync_range+0x49/0x80
  [921588.558970] do_fsync+0x3d/0x70
  [921588.558973] __x64_sys_fsync+0x14/0x20
  [921588.558976] do_syscall_64+0x38/0x90
  [921588.558980] entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [921588.558984] RIP: 0033:0x7f4c97ee832b
  [921588.558987] RSP: 002b:7ffdceb29e50 EFLAGS: 0293 ORIG_RAX: 004a
  [921588.558991] RAX: ffda RBX: 55ced34b0fa0 RCX: 7f4c97ee832b
  [921588.558993] RDX: 7f4c97fc8000 RSI: 55ced3487b70 RDI: 0021
  [921588.558995] RBP: 0001 R08: R09: 7ffdceb29fa8
  [921588.558996] R10: 7f4c97d2c848 R11: 0293 R12: 7ffdceb29fa8
  [921588.558998] R13: 7ffdceb29fa0 R14: 55ced34b0fa0 R15: 55ced34bcf90
  [921588.559014] INFO: task mysqld:1505 blocked for more than 120 seconds.
  [921588.559018] Not tainted 5.11.0-27-generic #29~20.04.1-Ubuntu
  [921588.559022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [921588.559025] task:mysqld state:D stack:0 pid: 1505 ppid: 1 flags:0x
  [921588.559030] Call Trace:
  [921588.559032] __schedule+0x44c/0x8a0
  [921588.559036] schedule+0x4f/0xc0
  [921588.559040] md_write_start+0x150/0x240
  [921588.559044] ? wait_woken+0x80/0x80
  [921588.559047] raid5_make_request+0x88/0x890 [raid456]
  [921588.559056] ? wait_woken+0x80/0x80
  [921588.559059] ? mempool_kmalloc+0x17/0x20
  [921588.559062] md_handle_request+0x12d/0x1a0
  [921588.559065] ? __part_start_io_acct+0x51/0xf0
  [921588.559068] md_submit_bio+0xca/0x100
  [921588.559071]
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
Hi. I have the same problem on two RAID5 arrays assembled into a stripe. The total volume of the array is 97 TB, of which only 16 GB is free. Do other users who hit this bug also have little free space? I couldn't reproduce this condition on a test bench under a synthetic load.

Ubuntu 22.04.3 LTS, kernel 5.15.0-72-generic
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
Hello, I got the same issue with Ubuntu 22.04.3 LTS (GNU/Linux 5.15.0-91-generic x86_64). I also tried the 5.19 kernel and got the same problem.

Jan 7 02:28:26 cache4 systemd[1]: Starting MD array scrubbing...
Jan 7 02:28:26 cache4 root: mdcheck start checking /dev/md0
Jan 7 08:28:44 cache4 kernel: [2914434.326024] md: md0: data-check interrupted.
Jan 7 08:32:08 cache4 kernel: [2914638.397357] INFO: task jbd2/md0-8:1337 blocked for more than 120 seconds.
Jan 7 08:32:08 cache4 kernel: [2914638.397420] Not tainted 5.15.0-91-generic #99-Ubuntu
Jan 7 08:32:08 cache4 kernel: [2914638.397457] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 7 08:32:08 cache4 kernel: [2914638.397505] task:jbd2/md0-8 state:D stack:0 pid: 1337 ppid: 2 flags:0x4000
Jan 7 08:32:08 cache4 kernel: [2914638.397512] Call Trace:
Jan 7 08:32:08 cache4 kernel: [2914638.397515]
Jan 7 08:32:08 cache4 kernel: [2914638.397520] __schedule+0x24e/0x590
Jan 7 08:32:08 cache4 kernel: [2914638.397530] schedule+0x69/0x110
Jan 7 08:32:08 cache4 kernel: [2914638.397535] md_write_start.part.0+0x174/0x220
Jan 7 08:32:08 cache4 kernel: [2914638.397540] ? wait_woken+0x70/0x70
Jan 7 08:32:08 cache4 kernel: [2914638.397547] md_write_start+0x14/0x30
Jan 7 08:32:08 cache4 kernel: [2914638.397553] raid5_make_request+0x77/0x540 [raid456]
Jan 7 08:32:08 cache4 kernel: [2914638.397566] ? jbd2_transaction_committed+0x1b/0x60
Jan 7 08:32:08 cache4 kernel: [2914638.397573] ? ext4_set_iomap+0x5a/0x1d0
Jan 7 08:32:08 cache4 kernel: [2914638.397579] ? wait_woken+0x70/0x70
Jan 7 08:32:08 cache4 kernel: [2914638.397584] md_handle_request+0x12d/0x1b0
Jan 7 08:32:08 cache4 kernel: [2914638.397589] ? submit_bio_checks+0x1a5/0x560
Jan 7 08:32:08 cache4 kernel: [2914638.397595] md_submit_bio+0x76/0xc0
Jan 7 08:32:08 cache4 kernel: [2914638.397600] __submit_bio+0x1a5/0x220
Jan 7 08:32:08 cache4 kernel: [2914638.397603] ? mempool_alloc_slab+0x17/0x20
Jan 7 08:32:08 cache4 kernel: [2914638.397611] __submit_bio_noacct+0x85/0x200
Jan 7 08:32:08 cache4 kernel: [2914638.397614] ? kmem_cache_alloc+0x1ab/0x2f0
Jan 7 08:32:08 cache4 kernel: [2914638.397619] submit_bio_noacct+0x4e/0x120
Jan 7 08:32:08 cache4 kernel: [2914638.397623] submit_bio+0x4a/0x130
Jan 7 08:32:08 cache4 kernel: [2914638.397627] submit_bh_wbc+0x18d/0x1c0
Jan 7 08:32:08 cache4 kernel: [2914638.397632] submit_bh+0x13/0x20
Jan 7 08:32:08 cache4 kernel: [2914638.397635] jbd2_journal_commit_transaction+0x861/0x17a0
Jan 7 08:32:08 cache4 kernel: [2914638.397640] ? __update_idle_core+0x93/0x120
Jan 7 08:32:08 cache4 kernel: [2914638.397649] kjournald2+0xa9/0x280
Jan 7 08:32:08 cache4 kernel: [2914638.397653] ? wait_woken+0x70/0x70
Jan 7 08:32:08 cache4 kernel: [2914638.397657] ? load_superblock.part.0+0xc0/0xc0
Jan 7 08:32:08 cache4 kernel: [2914638.397662] kthread+0x12a/0x150
Jan 7 08:32:08 cache4 kernel: [2914638.397667] ? set_kthread_struct+0x50/0x50
Jan 7 08:32:08 cache4 kernel: [2914638.397672] ret_from_fork+0x22/0x30
Jan 7 08:32:08 cache4 kernel: [2914638.397680]

# cat /sys/block/md0/md/array_state
write-pending

This is happening on all our servers with NVMe devices.
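The hung-task reports quoted in this thread all start with an "INFO: task <name>:<pid> blocked for more than N seconds." line. As a rough aid for anyone comparing their own logs, here is a minimal Python sketch that pulls those lines out of saved dmesg or syslog text. The function name find_hung_tasks and the regular expression are illustrative assumptions, not part of any kernel or Ubuntu tooling, and the sketch expects one log entry per line.

```python
import re

# Matches the hung-task detector line, for example:
#   INFO: task jbd2/md0-8:1337 blocked for more than 120 seconds.
# The task name may itself contain ':' (e.g. kworker/u64:1), so the
# name group is greedy and the pid is the last ':<digits>' before " blocked".
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (task_name, pid, seconds) tuples found in log_text.

    Hypothetical helper: assumes one log entry per line.
    """
    hits = []
    for line in log_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            hits.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return hits
```

Feeding it the output of dmesg (or the relevant part of /var/log/syslog) gives a quick list of which tasks were blocked. Checking whether their call traces pass through md_write_start, as in the reports here, still has to be done by reading the traces.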
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
Linux version 6.4.12-200.fc38.x86_64 (mockbuild@30894952d3244f1ab967aeda9ed417f6) (gcc (GCC) 13.2.1 20230728 (Red Hat 13.2.1-1), GNU ld version 2.39-9.fc38) #1 SMP PREEMPT_DYNAMIC Wed Aug 23 17:46:49 UTC 2023

    230 ?        I<     0:00  \_ [md]
   1377 ?        S      6:15  \_ [md0_raid1]
   1565 ?        D   4955:37  \_ [md4_raid6]
2772538 ?        DN   111:11  \_ [md4_resync]

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sde2[0]
      13671170048 blocks super 1.2 [1/1] [U]
      bitmap: 1/102 pages [4KB], 65536KB chunk

md4 : active raid6 sdd1[2] sdb1[5] sde1[4] sdc1[1]
      11720779776 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] []
      [=>...] check = 89.0% (5217054876/5860389888) finish=852549.6min speed=12K/sec
      bitmap: 9/44 pages [36KB], 65536KB chunk

unused devices:

# cat /proc/2772538/stack   # md4_resync
[<0>] raid5_get_active_stripe+0x271/0x540 [raid456]
[<0>] raid5_sync_request+0x3ad/0x3d0 [raid456]
[<0>] md_do_sync+0x7be/0x11c0
[<0>] md_thread+0xae/0x190
[<0>] kthread+0xe8/0x120
[<0>] ret_from_fork+0x2c/0x50

# cat /proc/1565/stack   # md4_raid6
[<0>] raid5d+0x524/0x750 [raid456]
[<0>] md_thread+0xae/0x190
[<0>] kthread+0xe8/0x120
[<0>] ret_from_fork+0x2c/0x50

Worked around by:

# cat /sys/block/md4/md/array_state
write-pending
# echo active > /sys/block/md4/md/array_state
# cat /sys/block/md4/md/array_state
active
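The workaround reported in the message above (reading /sys/block/md4/md/array_state, seeing write-pending, and writing active back) can be sketched in Python as follows. This is only an illustration of that one user's steps, not a confirmed fix. The function names, the sysfs_root parameter, and the dry_run default are assumptions made for the example. Note that write-pending can also be a short-lived normal state, so the sketch only writes when explicitly asked to, and writing to sysfs needs root.

```python
from pathlib import Path


def read_array_state(md_name, sysfs_root="/sys/block"):
    """Return the stripped contents of <sysfs_root>/<md_name>/md/array_state."""
    return (Path(sysfs_root) / md_name / "md" / "array_state").read_text().strip()


def nudge_if_write_pending(md_name, sysfs_root="/sys/block", dry_run=True):
    """Write 'active' to array_state if the array reports 'write-pending'.

    Hypothetical helper mirroring the manual workaround from the thread.
    Returns the state that was seen before any write. With dry_run=True
    (the default) nothing is written.
    """
    state_file = Path(sysfs_root) / md_name / "md" / "array_state"
    state = state_file.read_text().strip()
    if state == "write-pending" and not dry_run:
        state_file.write_text("active\n")
    return state
```

For example, nudge_if_write_pending("md4") only reports the current state, while nudge_if_write_pending("md4", dry_run=False) performs the write. The sysfs_root parameter exists so the logic can be exercised against a scratch directory instead of a live array.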
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
I noticed this issue on Ubuntu 20.04 with an md RAID device. The system exhibited the same behavior that other users have noted: high CPU usage and the terminal locking up until the system is rebooted.

[14715252.569157] INFO: task md1_raid4:1763945 blocked for more than 120 seconds.
[14715252.570228] Not tainted 5.4.0-146-generic #163-Ubuntu
[14715252.571277] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[14715252.572347] md1_raid4       D    0 1763945      2 0x80004000
[14715252.572357] Call Trace:
[14715252.572360] __schedule+0x2e3/0x740
[14715252.572363] schedule+0x42/0xb0
[14715252.572369] raid5d+0x3e6/0x5f0 [raid456]
[14715252.572376] ? schedule_timeout+0x10e/0x160
[14715252.572381] ? __wake_up_pollfree+0x40/0x40
[14715252.572384] md_thread+0x97/0x160
[14715252.572392] ? __wake_up_pollfree+0x40/0x40
[14715252.572394] kthread+0x104/0x140
[14715252.572399] ? md_start_sync+0x60/0x60
[14715252.572403] ? kthread_park+0x90/0x90
[14715252.572405] ret_from_fork+0x35/0x40
[14715252.572430] INFO: task kworker/u64:1:3189415 blocked for more than 120 seconds.
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
On Ubuntu 22.04, 5.15.0-83-generic #92-Ubuntu - our storage system ran into this bug. mdcheck ran for the scheduled 1st day of the month and then hung 6 hours later.

Oct 1 06:52:13 server1 systemd[1]: Starting MD array scrubbing...
Oct 1 06:52:13 server1 root: mdcheck start checking /dev/md0
Oct 1 06:52:13 server1 kernel: [2129098.393495] md: data-check of RAID array md0
Oct 1 12:57:49 server1 kernel: [2151034.623372] INFO: task dmcrypt_write/2:1783 blocked for more than 241 seconds.
Oct 1 12:57:49 server1 kernel: [2151034.623446] Tainted: G S 5.15.0-83-generic #92-Ubuntu
Oct 1 12:57:49 server1 kernel: [2151034.623498] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 1 12:57:49 server1 kernel: [2151034.623559] task:dmcrypt_write/2 state:D stack:0 pid: 1783 ppid: 2 flags:0x4000
Oct 1 12:57:49 server1 kernel: [2151034.623566] Call Trace:
Oct 1 12:57:49 server1 kernel: [2151034.623570]
Oct 1 12:57:49 server1 kernel: [2151034.623574] __schedule+0x24e/0x590
Oct 1 12:57:49 server1 kernel: [2151034.623585] ? __schedule+0x256/0x590
Oct 1 12:57:49 server1 kernel: [2151034.623590] schedule+0x69/0x110
Oct 1 12:57:49 server1 kernel: [2151034.623596] md_write_start.part.0+0x174/0x220
Oct 1 12:57:49 server1 kernel: [2151034.623601] ? wait_woken+0x70/0x70
Oct 1 12:57:49 server1 kernel: [2151034.623610] md_write_start+0x14/0x30
Oct 1 12:57:49 server1 kernel: [2151034.623615] raid5_make_request+0x77/0x540 [raid456]
Oct 1 12:57:49 server1 kernel: [2151034.623633] ? cgroup_rstat_updated+0x11c/0x1e0
Oct 1 12:57:49 server1 kernel: [2151034.623642] ? wait_woken+0x70/0x70
Oct 1 12:57:49 server1 kernel: [2151034.623648] md_handle_request+0x12d/0x1b0
Oct 1 12:57:49 server1 kernel: [2151034.623657] ? submit_bio_checks+0x1a5/0x560
Oct 1 12:57:49 server1 kernel: [2151034.623664] md_submit_bio+0x76/0xc0
Oct 1 12:57:49 server1 kernel: [2151034.623670] __submit_bio+0x1a5/0x220
Oct 1 12:57:49 server1 kernel: [2151034.623675] ? psi_task_switch+0xc6/0x220
Oct 1 12:57:49 server1 kernel: [2151034.623682] __submit_bio_noacct+0x85/0x200
Oct 1 12:57:49 server1 kernel: [2151034.623687] submit_bio_noacct+0x4e/0x120
Oct 1 12:57:49 server1 kernel: [2151034.623691] ? schedule+0x69/0x110
Oct 1 12:57:49 server1 kernel: [2151034.623698] dmcrypt_write+0x104/0x130 [dm_crypt]
Oct 1 12:57:49 server1 kernel: [2151034.623708] ? crypt_ctr+0x600/0x600 [dm_crypt]
Oct 1 12:57:49 server1 kernel: [2151034.623715] kthread+0x12a/0x150
Oct 1 12:57:49 server1 kernel: [2151034.623723] ? set_kthread_struct+0x50/0x50
Oct 1 12:57:49 server1 kernel: [2151034.623730] ret_from_fork+0x22/0x30
Oct 1 12:57:49 server1 kernel: [2151034.623739]
Oct 1 12:57:49 server1 kernel: [2151034.623778] INFO: task mdcheck:2323903 blocked for more than 241 seconds.
Oct 1 12:57:49 server1 kernel: [2151034.623833] Tainted: G S 5.15.0-83-generic #92-Ubuntu
Oct 1 12:57:49 server1 kernel: [2151034.625482] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 1 12:57:49 server1 kernel: [2151034.627098] task:mdcheck state:D stack:0 pid:2323903 ppid: 1 flags:0x0002
Oct 1 12:57:49 server1 kernel: [2151034.627104] Call Trace:
Oct 1 12:57:49 server1 kernel: [2151034.627106]
Oct 1 12:57:49 server1 kernel: [2151034.627109] __schedule+0x24e/0x590
Oct 1 12:57:49 server1 kernel: [2151034.627114] ? select_idle_sibling+0x2b/0xa60
Oct 1 12:57:49 server1 kernel: [2151034.627124] schedule+0x69/0x110
Oct 1 12:57:49 server1 kernel: [2151034.627129] schedule_timeout+0x103/0x140
Oct 1 12:57:49 server1 kernel: [2151034.627135] ? ttwu_queue_wakelist+0x131/0x1c0
Oct 1 12:57:49 server1 kernel: [2151034.627142] __wait_for_common+0xae/0x150
Oct 1 12:57:49 server1 kernel: [2151034.627148] ? usleep_range_state+0x90/0x90
Oct 1 12:57:49 server1 kernel: [2151034.627155] wait_for_completion+0x24/0x30
Oct 1 12:57:49 server1 kernel: [2151034.627160] kthread_stop+0x6d/0x170
Oct 1 12:57:49 server1 kernel: [2151034.627168] md_unregister_thread+0x44/0x90
Oct 1 12:57:49 server1 kernel: [2151034.627172] md_reap_sync_thread+0x24/0x230
Oct 1 12:57:49 server1 kernel: [2151034.627177] action_store+0x16f/0x300
Oct 1 12:57:49 server1 kernel: [2151034.627182] md_attr_store+0x95/0xf0
Oct 1 12:57:49 server1 kernel: [2151034.627187] sysfs_kf_write+0x3e/0x50
Oct 1 12:57:49 server1 kernel: [2151034.627194] kernfs_fop_write_iter+0x13b/0x1c0
Oct 1 12:57:49 server1 kernel: [2151034.627199] new_sync_write+0x114/0x1a0
Oct 1 12:57:49 server1 kernel: [2151034.627207] vfs_write+0x1d5/0x270
Oct 1 12:57:49 server1 kernel: [2151034.627212] ksys_write+0x67/0xf0
Oct 1 12:57:49 server1 kernel: [2151034.627219] __x64_sys_write+0x19/0x20
Oct 1 12:57:49 server1 kernel: [2151034.627225] do_syscall_64+0x5c/0xc0
Oct 1 12:57:49 server1 kernel: [2151034.627232] ? do_syscall_64+0x69/0xc0
Oct 1 12:57:49 server1 kernel:
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
I can't add much in terms of data, but this is a +1. My symptoms were virtually identical to Chad Wagner's. This happened on July 1st, 2023 on my machine, the day the check started. It got stuck at 98%. The only cure I could find was to reboot the system.
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
** Also affects: linux-signed-hwe-5.15 (Ubuntu)
   Importance: Undecided
       Status: New
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
Seeing this bug on Ubuntu 20.04 and Ubuntu 22.04 as well, with both normal and HWE kernels.

To add some more information, this bug also seems to appear at random during the initial RAID 6 creation process, where the array is mounted but completely empty and not accessed, so it is likely to originate within the mdadm resync process itself, unrelated to other system I/O operations. Around 1 in 15 arrays will "freeze" during the initial resync in my experience, so it is unfortunately not that uncommon.

The symptoms are always the same: at some point during resync, the speeds drop radically to single MB/s levels and continue degrading over time until "echo active > /sys/block/mdX/md/array_state" is issued. A minute or two after running that, the speeds ramp back up to normal.

This bug seems unrelated to hardware configuration, as I've seen it happen across multiple systems with different CPU vendors, HBA models, and HDD sizes and vendors. Systems which were previously stable under Ubuntu 18.04 started exhibiting freezes after upgrading to 20.04 as well.

It would also seem that disabling mdcheck_start and mdcheck_continue is not necessarily the magic bullet for fixing this; it certainly doesn't seem to help with freezing during the initial resync. I have also seen instances of mdadm scheduled resyncs freezing when triggered using the old cronjob method, with both systemd services and timers masked off.
Dmesg from a freshly installed system where the initial resync "froze" approximately 15 hours after the array was created:

mdadm --create /dev/md1 --level=6 --raid-devices=6 /dev/sd[cdefgh]
mkfs.ext4 /dev/md1
mount -o errors=remount-ro /dev/md1 /srv

[Jul 4 00:18] md/raid:md1: not clean -- starting background reconstruction
[ +0.62] md/raid:md1: device sdh operational as raid disk 5
[ +0.02] md/raid:md1: device sdg operational as raid disk 4
[ +0.01] md/raid:md1: device sdf operational as raid disk 3
[ +0.01] md/raid:md1: device sde operational as raid disk 2
[ +0.01] md/raid:md1: device sdd operational as raid disk 1
[ +0.01] md/raid:md1: device sdc operational as raid disk 0
[ +0.002104] md/raid:md1: raid level 6 active with 6 out of 6 devices, algorithm 2
[ +0.048241] md1: detected capacity change from 0 to 72000290684928
[ +0.66] md: resync of RAID array md1
[Jul 4 00:32] EXT4-fs (md1): mounted filesystem with ordered data mode. Opts: errors=remount-ro
[Jul 4 02:39] perf: interrupt took too long (2522 > 2500), lowering kernel.perf_event_max_sample_rate to 79250
[Jul 4 04:36] perf: interrupt took too long (3155 > 3152), lowering kernel.perf_event_max_sample_rate to 63250
[Jul 4 15:22] INFO: task md1_raid6:5688 blocked for more than 120 seconds.
[ +0.59] Not tainted 5.4.0-153-generic #170-Ubuntu
[ +0.32] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0.45] md1_raid6 D0 5688 2 0x80004000
[ +0.04] Call Trace:
[ +0.11] __schedule+0x2e3/0x740
[ +0.05] schedule+0x42/0xb0
[ +0.11] raid5d+0x3e6/0x5f0 [raid456]
[ +0.05] ? try_to_del_timer_sync+0x54/0x80
[ +0.05] ? schedule_timeout+0x92/0x160
[ +0.04] ? __wake_up_pollfree+0x40/0x40
[ +0.04] md_thread+0x97/0x160
[ +0.03] ? __wake_up_pollfree+0x40/0x40
[ +0.04] kthread+0x104/0x140
[ +0.03] ? md_start_sync+0x60/0x60
[ +0.03] ? kthread_park+0x90/0x90
[ +0.02] ret_from_fork+0x1f/0x40
[ +0.05] INFO: task md1_resync:5724 blocked for more than 120 seconds.
[ +0.39] Not tainted 5.4.0-153-generic #170-Ubuntu
[ +0.31] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0.44] md1_resync D0 5724 2 0x80004000
[ +0.02] Call Trace:
[ +0.05] __schedule+0x2e3/0x740
[ +0.04] schedule+0x42/0xb0
[ +0.07] raid5_get_active_stripe+0x459/0x610 [raid456]
[ +0.03] ? __wake_up_pollfree+0x40/0x40
[ +0.07] raid5_sync_request+0x38b/0x3b0 [raid456]
[ +0.04] ? cpumask_next+0x1b/0x20
[ +0.03] ? is_mddev_idle+0xc1/0x11e
[ +0.04] md_do_sync.cold+0x3ef/0x992
[ +0.05] ? sched_clock+0x9/0x10
[ +0.03] ? __wake_up_pollfree+0x40/0x40
[ +0.04] md_thread+0x97/0x160
[ +0.04] kthread+0x104/0x140
[ +0.02] ? md_start_sync+0x60/0x60
[ +0.03] ? kthread_park+0x90/0x90
[ +0.03] ret_from_fork+0x1f/0x40
[ +0.03] INFO: task jbd2/md1-8:6099 blocked for more than 120 seconds.
[ +0.38] Not tainted 5.4.0-153-generic #170-Ubuntu
[ +0.31] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0.43] jbd2/md1-8 D0 6099 2 0x80004000
[ +0.01] Call Trace:
[ +0.04] __schedule+0x2e3/0x740
[ +0.03] ? __wake_up_common_lock+0x8a/0xc0
[ +0.04] schedule+0x42/0xb0
[ +0.05] jbd2_journal_commit_transaction+0x24e/0x18b0
[ +0.04] ? dequeue_entity+0x118/0x460
[ +0.02] ?
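To see what state an array is in when a resync slows down like this, something along the following lines might help. It is only a sketch: apart from the "echo active" write quoted in the comment above, nothing here comes from the thread. It assumes the standard md sysfs attributes (array_state, sync_action, sync_speed, sync_completed) under /sys/block/mdX/md, and the function names md_sync_status and md_nudge_active are made up for illustration.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not from the bug report.

# Print the md sysfs attributes most relevant to a stalled resync.
# $1 is the array's md sysfs directory, e.g. /sys/block/md1/md
md_sync_status() {
    md_dir=$1
    for attr in array_state sync_action sync_speed sync_completed; do
        if [ -r "$md_dir/$attr" ]; then
            printf '%s: %s\n' "$attr" "$(cat "$md_dir/$attr")"
        else
            printf '%s: (unavailable)\n' "$attr"
        fi
    done
}

# The nudge described in the comment: write "active" to array_state.
# Needs root on a real array; it is a workaround, not a fix.
md_nudge_active() {
    echo active > "$1/array_state"
}
```

After sourcing the file, "md_sync_status /sys/block/md1/md" prints the four attributes, and "md_nudge_active /sys/block/md1/md" performs the same write as the quoted echo command. Capturing the status before and after the write may make a report easier to compare with others.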
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
I hit this bug as well on Ubuntu 22.04 with kernel 5.15.0-67-generic. We have a single RAID 5 on 3 drives for 28T. I'm switching to the workaround from comment #5.

Jun 4 12:48:11 server1 kernel: [1622699.548591] INFO: task md0_raid5:406 blocked for more than 120 seconds.
Jun 4 12:48:11 server1 kernel: [1622699.556202] Tainted: G OE 5.15.0-67-generic #74-Ubuntu
Jun 4 12:48:11 server1 kernel: [1622699.564101] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 4 12:48:11 server1 kernel: [1622699.573063] task:md0_raid5 state:D stack:0 pid: 406 ppid: 2 flags:0x4000
Jun 4 12:48:11 server1 kernel: [1622699.573077] Call Trace:
Jun 4 12:48:11 server1 kernel: [1622699.573081]
Jun 4 12:48:11 server1 kernel: [1622699.573087] __schedule+0x24e/0x590
Jun 4 12:48:11 server1 kernel: [1622699.573103] schedule+0x69/0x110
Jun 4 12:48:11 server1 kernel: [1622699.573115] raid5d+0x3d9/0x5f0 [raid456]
Jun 4 12:48:11 server1 kernel: [1622699.573140] ? wait_woken+0x70/0x70
Jun 4 12:48:11 server1 kernel: [1622699.573151] md_thread+0xad/0x170
Jun 4 12:48:11 server1 kernel: [1622699.573162] ? wait_woken+0x70/0x70
Jun 4 12:48:11 server1 kernel: [1622699.573169] ? md_write_inc+0x60/0x60
Jun 4 12:48:11 server1 kernel: [1622699.573176] kthread+0x12a/0x150
Jun 4 12:48:11 server1 kernel: [1622699.573187] ? set_kthread_struct+0x50/0x50
Jun 4 12:48:11 server1 kernel: [1622699.573197] ret_from_fork+0x22/0x30
Jun 4 12:48:11 server1 kernel: [1622699.573212]
Jun 4 12:48:11 server1 kernel: [1622699.573231] INFO: task jbd2/dm-0-8:1375 blocked for more than 120 seconds.
Jun 4 12:48:11 server1 kernel: [1622699.581119] Tainted: G OE 5.15.0-67-generic #74-Ubuntu
Jun 4 12:48:11 server1 kernel: [1622699.589004] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 4 12:48:11 server1 kernel: [1622699.597959] task:jbd2/dm-0-8 state:D stack:0 pid: 1375 ppid: 2 flags:0x4000
Jun 4 12:48:11 server1 kernel: [1622699.597968] Call Trace:
Jun 4 12:48:11 server1 kernel: [1622699.597970]
Jun 4 12:48:11 server1 kernel: [1622699.597973] __schedule+0x24e/0x590
Jun 4 12:48:11 server1 kernel: [1622699.597984] schedule+0x69/0x110
Jun 4 12:48:11 server1 kernel: [1622699.597992] md_write_start.part.0+0x174/0x220
Jun 4 12:48:11 server1 kernel: [1622699.598002] ? wait_woken+0x70/0x70
Jun 4 12:48:11 server1 kernel: [1622699.598024] md_write_start+0x14/0x30
Jun 4 12:48:11 server1 kernel: [1622699.598032] raid5_make_request+0x77/0x540 [raid456]
Jun 4 12:48:11 server1 kernel: [1622699.598051] ? wait_woken+0x70/0x70
Jun 4 12:48:11 server1 kernel: [1622699.598058] md_handle_request+0x12d/0x1b0
Jun 4 12:48:11 server1 kernel: [1622699.598065] ? __blk_queue_split+0xfe/0x200
Jun 4 12:48:11 server1 kernel: [1622699.598075] md_submit_bio+0x71/0xc0
Jun 4 12:48:11 server1 kernel: [1622699.598082] __submit_bio+0x1a5/0x220
Jun 4 12:48:11 server1 kernel: [1622699.598091] ? mempool_alloc_slab+0x17/0x20
Jun 4 12:48:11 server1 kernel: [1622699.598102] __submit_bio_noacct+0x85/0x200
Jun 4 12:48:11 server1 kernel: [1622699.598110] ? kmem_cache_alloc+0x1ab/0x2f0
Jun 4 12:48:11 server1 kernel: [1622699.598122] submit_bio_noacct+0x4e/0x120
Jun 4 12:48:11 server1 kernel: [1622699.598131] submit_bio+0x4a/0x130
Jun 4 12:48:11 server1 kernel: [1622699.598139] submit_bh_wbc+0x18d/0x1c0
Jun 4 12:48:11 server1 kernel: [1622699.598151] submit_bh+0x13/0x20
Jun 4 12:48:11 server1 kernel: [1622699.598160] jbd2_journal_commit_transaction+0x861/0x17a0
Jun 4 12:48:11 server1 kernel: [1622699.598170] ? __update_idle_core+0x93/0x120
Jun 4 12:48:11 server1 kernel: [1622699.598184] kjournald2+0xa9/0x280
Jun 4 12:48:11 server1 kernel: [1622699.598190] ? wait_woken+0x70/0x70
Jun 4 12:48:11 server1 kernel: [1622699.598197] ? load_superblock.part.0+0xc0/0xc0
Jun 4 12:48:11 server1 kernel: [1622699.598202] kthread+0x12a/0x150
Jun 4 12:48:11 server1 kernel: [1622699.598210] ? set_kthread_struct+0x50/0x50
Jun 4 12:48:11 server1 kernel: [1622699.598218] ret_from_fork+0x22/0x30
Jun 4 12:48:11 server1 kernel: [1622699.598229]
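Logs like the one above get long quickly, so a short filter that lists which tasks the hung-task detector flagged can make reports easier to compare. The following is a rough sketch, not part of the original report: the function name hung_tasks is hypothetical, it assumes a grep that supports -o (GNU and BSD grep do), and it assumes task names without spaces.

```shell
#!/bin/sh
# Sketch only: hung_tasks is a hypothetical helper name.

# Read kernel log text on stdin and print one "task-name pid" line for each
# distinct task reported as "blocked for more than N seconds".
# Works on flattened logs too, because grep -o emits each match separately.
hung_tasks() {
    grep -o 'INFO: task [^ ]* blocked for more than [0-9]* seconds' |
        sed 's/^INFO: task \(.*\):\([0-9]*\) blocked.*$/\1 \2/' |
        sort -u
}
```

After sourcing the file, typical use would be "dmesg | hung_tasks" or "hung_tasks < /var/log/kern.log". For the log in this comment it should list md0_raid5 and jbd2/dm-0-8.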
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: linux-signed-hwe-5.4 (Ubuntu)
       Status: New => Confirmed
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
** Also affects: linux-signed-hwe-5.4 (Ubuntu)
   Importance: Undecided
       Status: New
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
I hit this bug with Ubuntu 22.04 (Jammy) on kernel 5.15.0-56.

For testing, I set up a RAID1 and a RAID5 in a VM, with the disks for each RAID on a separate controller. Using the 'mdhang' script (see comment #11), I was able to reproduce the error easily. I ran two 'mdhang' scripts at the same time, one for the RAID1 and one for the RAID5. The RAID5 blocked after a short time. The RAID1 continued to run without problems. So it probably only affects RAID5.

On my production system I have now activated the workaround (see comment #5).
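Since the RAID1 array kept running while the RAID5 array blocked, and the traces in this thread all pass through the raid456 module, it may be useful to list which arrays on a machine use a raid4/5/6 level. This is a sketch under assumptions: it relies on the md sysfs attribute "level" under /sys/block/mdX/md, the function name list_raid456 is invented for illustration, and being listed only means the array uses that module, not that it will hang.

```shell
#!/bin/sh
# Sketch only: list_raid456 is a hypothetical helper name.

# Print "name level" for every md array whose level is handled by the
# raid456 module. $1 optionally overrides the sysfs block directory.
list_raid456() {
    sys_block=${1:-/sys/block}
    for level_file in "$sys_block"/md*/md/level; do
        # Skip the unexpanded glob when no md arrays exist.
        [ -r "$level_file" ] || continue
        level=$(cat "$level_file")
        case $level in
            raid4|raid5|raid6)
                name=${level_file#"$sys_block"/}   # e.g. md1/md/level
                name=${name%%/*}                   # e.g. md1
                printf '%s %s\n' "$name" "$level"
                ;;
        esac
    done
}
```

After sourcing the file, running "list_raid456" with no argument scans /sys/block. On a setup like the test VM described above it would be expected to print only the RAID5 array.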
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
Comment #5 (https://bugs.launchpad.net/ubuntu/+source/linux-signed-hwe-5.11/+bug/1942935/comments/5) has been a stable workaround for me (basically, revert to a continuous resync as in 18.04). My newer machines are using ZFS with raidz2 pools.
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
Turns out my issue was a faulty drive, and the system would lock up when mdadm hit the bad sectors on resync. The issue seemed like it was lower in the blockdev code causing a deadlock. I replaced the drive and the problem went away.
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
The second of the patches mentioned in #27 (with git SHA 1e2677...) has, I believe, been backported to Ubuntu kernels 5.15.0-48 and 5.4.0-126. We've still hit this with Ubuntu Jammy on 5.15.0-53, so I guess the first commit needs to be backported as well.
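To check where a given machine stands relative to the kernel versions named in this comment, a version comparison along these lines may help. This is only a sketch: `kernel_at_least` is a hypothetical helper name, it relies on GNU `sort -V`, and reaching the threshold only means the kernel is at or past the version that is believed here to carry the second patch, not that the hang is fixed.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 is at or above version $2,
# using the version ordering of GNU sort -V.
kernel_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Strip the flavour suffix (for example "-generic") from the running kernel.
running=$(uname -r | sed 's/-[a-z].*$//')

# 5.15.0-48 is the Jammy version named in the comment; the comparison is only
# meaningful on a 5.15-series kernel.
if kernel_at_least "$running" "5.15.0-48"; then
    echo "kernel $running is at or above 5.15.0-48"
else
    echo "kernel $running is below 5.15.0-48"
fi
```

On a 5.4-series machine the threshold to compare against would be 5.4.0-126 instead.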
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
It won’t let me change the state back to active. Every time I try, nothing happens and array_status is always idle.
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
Yeah, that's the same issue as this one. The raid is doing a consistency check (mdcheck), is transitioned to an "idle" state, and hits a deadlock that causes all I/O through the md device to block. The workaround is to change the array state back to active. I made the changes in #5 almost a year ago and have had no problems since. Before that, ever since upgrading to 20.04, it hung almost every single month when the scheduled consistency check was triggered.
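When comparing a hang against this report, it can help to list which tasks the hung-task detector flagged. The sketch below filters kernel log text for the "blocked for more than" lines shown in the bug description; `hung_tasks` is a hypothetical helper name, and the sample input is copied from the report rather than taken from a live system.

```shell
#!/bin/sh
# Hypothetical helper: print "name:pid" for every task the hung-task detector
# reported in the kernel log text read from stdin, without duplicates.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\) blocked for more than [0-9]* seconds.*/\1/p' | sort -u
}

# Sample lines copied from the bug description.
sample='[921588.558834] INFO: task systemd-journal:376 blocked for more than 120 seconds.
[921588.558859] Call Trace:
[921588.559014] INFO: task mysqld:1505 blocked for more than 120 seconds.'

printf '%s\n' "$sample" | hung_tasks
```

On an affected machine the same filter could be fed from `dmesg` (which may need root), for example `dmesg | hung_tasks`.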
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
Digging further, I think I might be running into this bug: https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc...@molgen.mpg.de/T/
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
I'm running checkarray manually; I took off all the start and stop stuff like you did. echo active > /sys/block/md0/md/array_state doesn't fix it. I must not have gotten all of the trace last time. I've attached it here.

** Attachment added: "kernel log snippet" https://bugs.launchpad.net/ubuntu/+source/linux-signed-hwe-5.11/+bug/1942935/+attachment/5602089/+files/newkern.log
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
I believe that to resolve the deadlock you want to run:

    echo active > /sys/block/md1/md/array_state

Not "idle". You should see a hung task for mdcheck in there somewhere as well, and the hang only occurs while the RAID is resyncing (md_resync should be running). At least for me, I applied the workaround in comment 5:

https://bugs.launchpad.net/ubuntu/+source/linux-signed-hwe-5.11/+bug/1942935/comments/5

I haven't had a problem since, and I have upgraded to 22.04 with 5.15 in the meantime. I am pretty sure the problem is still there; it surfaced in 20.04 when the RAID consistency check was changed. The new check pauses itself after 8 hours, and that pause triggers the deadlock. I just let it run to completion, as it did in 18.04.

I have not checked the patches, but it's possible you have a different problem, because I don't see any calls to md_ code in either of the two hung-process traces.

Status in linux package in Ubuntu: Confirmed
Status in linux-signed-hwe-5.11 package in Ubuntu: Confirmed
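For anyone who wants to try the array_state suggestion above, here is a minimal sketch of it wrapped in a small helper. The function name md_unstick and the SYSFS_ROOT variable are made up for illustration (SYSFS_ROOT only exists so the helper can be pointed at a scratch directory); the array_state and sync_action files are the standard md sysfs attributes. Writing to the real /sys needs root, and whether this actually clears the hang on a given kernel is not something this sketch can promise.

```shell
#!/bin/sh
# Sketch only: md_unstick and SYSFS_ROOT are hypothetical names, not part of mdadm.
SYSFS_ROOT="${SYSFS_ROOT:-/sys/block}"

md_unstick() {
    # $1 = array name, e.g. md1
    md_dir="$SYSFS_ROOT/$1/md"
    if [ ! -r "$md_dir/array_state" ]; then
        echo "no md array found at $md_dir" >&2
        return 1
    fi
    # Show the state before touching anything, so it ends up in the terminal log.
    echo "$1: array_state=$(cat "$md_dir/array_state") sync_action=$(cat "$md_dir/sync_action" 2>/dev/null)"
    # The suggestion from the comment above: write "active", not "idle".
    echo active > "$md_dir/array_state"
}
```

Usage on an affected machine would be `md_unstick md1` as root, with md1 replaced by whichever array shows the stuck data-check.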
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
Looks like two patches are landing in linux-next to resolve this:

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20220527&id=8b48ec23cc51a4e7c8dbaef5f34ebe67e1a80934
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20220527&id=1e267742283a4b5a8ca65755c44166be27e9aa0f

Status in linux package in Ubuntu: Confirmed
Status in linux-signed-hwe-5.11 package in Ubuntu: Confirmed
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
** Tags added: patch

Status in linux package in Ubuntu: Confirmed
Status in linux-signed-hwe-5.11 package in Ubuntu: Confirmed
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
Same issue on impish 5.13.13 kernel, running in VBox.

Status in linux package in Ubuntu: Confirmed
Status in linux-signed-hwe-5.11 package in Ubuntu: Confirmed
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
** Patch added: "md-reap-sync-thread.patch"
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1942935/+attachment/5526028/+files/md-reap-sync-thread.patch

** Tags added: apport-collected impish

** Description changed:

It seems to always occur during an mdcheck/resync, if I am logged in via SSH it is still somewhat responsive and basic utilities like dmesg will work. But it appears any write I/O will hang the terminal and nothing is written to syslog (presumably because it is blocked). Below is output of dmesg and cat /proc/mdstat, it appears the data check was interrupted and /proc/mdstat still shows progress, and a whole slew of hung tasks including md1_resync itself.

[756484.534293] md: data-check of RAID array md0
[756484.628039] md: delaying data-check of md1 until md0 has finished (they share one or more physical units)
[756493.808773] md: md0: data-check done.
[756493.829760] md: data-check of RAID array md1
[778112.446410] md: md1: data-check interrupted.
[810654.608102] md: data-check of RAID array md1
[832291.201064] md: md1: data-check interrupted.
[899745.389485] md: data-check of RAID array md1
[921395.835305] md: md1: data-check interrupted.
[921588.558834] INFO: task systemd-journal:376 blocked for more than 120 seconds.
[921588.558846] Not tainted 5.11.0-27-generic #29~20.04.1-Ubuntu
[921588.558850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[921588.558854] task:systemd-journal state:D stack:0 pid: 376 ppid: 1 flags:0x0220
[921588.558859] Call Trace:
[921588.558864] __schedule+0x44c/0x8a0
[921588.558872] schedule+0x4f/0xc0
[921588.558876] md_write_start+0x150/0x240
[921588.558880] ? wait_woken+0x80/0x80
[921588.558886] raid5_make_request+0x88/0x890 [raid456]
[921588.558898] ? wait_woken+0x80/0x80
[921588.558901] ? mempool_kmalloc+0x17/0x20
[921588.558904] md_handle_request+0x12d/0x1a0
[921588.558907] ? __part_start_io_acct+0x51/0xf0
[921588.558912] md_submit_bio+0xca/0x100
[921588.558915] submit_bio_noacct+0x112/0x4f0
[921588.558918] ? ext4_fc_reserve_space+0x110/0x230
[921588.558922] submit_bio+0x51/0x1a0
[921588.558925] ? _cond_resched+0x19/0x30
[921588.558928] ? kmem_cache_alloc+0x38e/0x440
[921588.558932] ? ext4_init_io_end+0x1f/0x50
[921588.558936] ext4_io_submit+0x4d/0x60
[921588.558940] ext4_writepages+0x2c6/0xcd0
[921588.558944] do_writepages+0x43/0xd0
[921588.558948] ? do_writepages+0x43/0xd0
[921588.558951] ? fault_dirty_shared_page+0xa5/0x110
[921588.558955] __filemap_fdatawrite_range+0xcc/0x110
[921588.558960] file_write_and_wait_range+0x74/0xc0
[921588.558962] ext4_sync_file+0xf5/0x350
[921588.558967] vfs_fsync_range+0x49/0x80
[921588.558970] do_fsync+0x3d/0x70
[921588.558973] __x64_sys_fsync+0x14/0x20
[921588.558976] do_syscall_64+0x38/0x90
[921588.558980] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[921588.558984] RIP: 0033:0x7f4c97ee832b
[921588.558987] RSP: 002b:7ffdceb29e50 EFLAGS: 0293 ORIG_RAX: 004a
[921588.558991] RAX: ffda RBX: 55ced34b0fa0 RCX: 7f4c97ee832b
[921588.558993] RDX: 7f4c97fc8000 RSI: 55ced3487b70 RDI: 0021
[921588.558995] RBP: 0001 R08: R09: 7ffdceb29fa8
[921588.558996] R10: 7f4c97d2c848 R11: 0293 R12: 7ffdceb29fa8
[921588.558998] R13: 7ffdceb29fa0 R14: 55ced34b0fa0 R15: 55ced34bcf90
[921588.559014] INFO: task mysqld:1505 blocked for more than 120 seconds.
[921588.559018] Not tainted 5.11.0-27-generic #29~20.04.1-Ubuntu
[921588.559022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[921588.559025] task:mysqld state:D stack:0 pid: 1505 ppid: 1 flags:0x
[921588.559030] Call Trace:
[921588.559032] __schedule+0x44c/0x8a0
[921588.559036] schedule+0x4f/0xc0
[921588.559040] md_write_start+0x150/0x240
[921588.559044] ? wait_woken+0x80/0x80
[921588.559047] raid5_make_request+0x88/0x890 [raid456]
[921588.559056] ? wait_woken+0x80/0x80
[921588.559059] ? mempool_kmalloc+0x17/0x20
[921588.559062] md_handle_request+0x12d/0x1a0
[921588.559065] ? __part_start_io_acct+0x51/0xf0
[921588.559068] md_submit_bio+0xca/0x100
[921588.559071] submit_bio_noacct+0x112/0x4f0
[921588.559075] submit_bio+0x51/0x1a0
[921588.559077] ? _cond_resched+0x19/0x30
[921588.559081] ? kmem_cache_alloc+0x38e/0x440
[921588.559084] ? ext4_init_io_end+0x1f/0x50
[921588.559088] ext4_io_submit+0x4d/0x60
[921588.559091] ext4_writepages+0x2c6/0xcd0
[921588.559094] ? __schedule+0x454/0x8a0
[921588.559097] ? hrtimer_start_range_ns+0x1aa/0x2f0
[921588.559100] ? timerqueue_del+0x24/0x50
[921588.559105] ? futex_wait+0x1ed/0x270
[921588.559109] do_writepages+0x43/0xd0
[921588.559112] ? do_writepages+0x43/0xd0
[921588.559115] ?
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
** Changed in: linux (Ubuntu)
Status: Incomplete => Confirmed

Status in linux package in Ubuntu: Confirmed
Status in linux-signed-hwe-5.11 package in Ubuntu: Confirmed
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
** Tags removed: hirsute

Status in linux package in Ubuntu: Incomplete
Status in linux-signed-hwe-5.11 package in Ubuntu: Confirmed
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
** Also affects: linux (Ubuntu)
Importance: Undecided
Status: New

Status in linux package in Ubuntu: Incomplete
Status in linux-signed-hwe-5.11 package in Ubuntu: Confirmed
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
Here is Donald Buczek's reproducer script. I set up an Ubuntu 20.04 VM with the latest linux-image-generic and was able to reproduce the hang within 10 to 15 minutes. Exactly the same issue.

Filesystem layout built as follows:

# assemble raid devices
mdadm --create /dev/md0 --level=1 --raid-devices=2 --spare-devices=1 /dev/sda2 /dev/sdb2 /dev/sdc2
mdadm --create /dev/md1 --level=5 --raid-devices=5 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3

# create PVs, VGs, LVs
pvcreate /dev/md1
vgcreate vg1 /dev/md1
lvcreate --name root --extents 100%FREE vg1

# create filesystems
mkfs.ext4 /dev/md0
mkfs.ext4 /dev/vg1/root

** Attachment added: "mdhang.sh"
   https://bugs.launchpad.net/ubuntu/+source/linux-signed-hwe-5.11/+bug/1942935/+attachment/5525050/+files/mdhang.sh

Status in linux-signed-hwe-5.11 package in Ubuntu: Confirmed
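While waiting for the hang to reproduce, it can help to know which arrays are currently running a data-check. The following is a minimal sketch, assuming the usual /proc/mdstat layout in which the progress line contains "check =". The function name arrays_under_check is made up for illustration and is not part of mdadm or of the attached mdhang.sh.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm or mdhang.sh).
# Reads /proc/mdstat-style text on stdin and prints the name of every
# array whose progress line shows a data-check in flight.
arrays_under_check() {
    awk '
        # Remember the array name from lines such as "md1 : active raid5 ..."
        /^md[0-9]+ *:/ { name = $1 }
        # A progress line such as "[==>....]  check = 12.6% (...)"
        /check *=/     { print name }
    '
}

# Usage on a live system:
#   arrays_under_check < /proc/mdstat
```

An empty result means no array reports a check in progress according to the text it was given.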
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
The patch hasn't made it into mainline from what I have seen; it looks like it died back in March waiting for feedback from additional kernel developers.

From what I have gathered, this is a deadlock directly caused by pausing the resync while the system is under heavy write activity. Donald Buczek provided a reproducer, a shell script that generates a lot of write activity and pauses/resumes the RAID scrubbing. He also provided a workaround to get the stuck system running again without a reboot:

echo active > /sys/block/md1/md/array_state

I haven't tried the patch or the workaround. I eliminated the trigger, which is mdcheck_start and mdcheck_continue, and went back to the 18.04 LTS way of scrubbing arrays (which is basically: don't pause or interrupt the check once it starts). I ran checkarray through to 100% yesterday with no problems. Meanwhile, since upgrading to 20.04 LTS, it has hung almost every single time on the 5.4, 5.8, and 5.11 kernels.

Status in linux-signed-hwe-5.11 package in Ubuntu: Confirmed
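The pause/resume trigger and the workaround described above can be sketched roughly as follows. This is not the attached mdhang.sh, whose contents are not reproduced in this thread. It only illustrates the idea using the md sysfs attributes sync_action and array_state. The variable MD_SYSFS and all function names are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the attached mdhang.sh.
# MD_SYSFS is an assumed variable pointing at the array's md sysfs directory.
MD_SYSFS="${MD_SYSFS:-/sys/block/md1/md}"

# Start (or restart) a data-check on the array.
scrub_start() { echo check > "$MD_SYSFS/sync_action"; }

# Interrupt a running data-check.
scrub_pause() { echo idle > "$MD_SYSFS/sync_action"; }

# Workaround quoted in this thread for a stuck array.
unstick_array() { echo active > "$MD_SYSFS/array_state"; }

# Alternate start and pause COUNT times, sleeping DELAY seconds in between.
# Heavy write load has to be generated separately while this runs.
scrub_toggle_loop() {
    count="$1"
    delay="$2"
    i=0
    while [ "$i" -lt "$count" ]; do
        scrub_start
        sleep "$delay"
        scrub_pause
        sleep "$delay"
        i=$((i + 1))
    done
}
```

Something like `scrub_toggle_loop 100 10`, run as root with a write-heavy job in parallel, would approximate the scenario described. Whether it triggers the hang as reliably as the real reproducer is untested.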
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
One of the systems was using package linux-generic and in practice

Linux pedabackup 5.4.0-80-generic #90-Ubuntu SMP Fri Jul 9 22:49:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

$ cat /proc/version_signature
Ubuntu 5.4.0-80.90-generic 5.4.124

Status in linux-signed-hwe-5.11 package in Ubuntu: Confirmed
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: linux-signed-hwe-5.11 (Ubuntu)
       Status: New => Confirmed
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
I think I've seen this issue once on each of two different systems, so I think this is a software issue. Does anybody know if the patch in comment #6 is going to be included in Ubuntu 20.04 LTS?

Status in linux-signed-hwe-5.11 package in Ubuntu: Confirmed
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
Here is the proposed patch. It doesn't appear to have been applied. The last report was with 5.11rc5.

https://lore.kernel.org/linux-raid/1613177399-22024-1-git-send-email-guoqing.ji...@cloud.ionos.com/

Status in linux-signed-hwe-5.11 package in Ubuntu: New
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
Similar report here on 5.10.0-rc4:
https://www.spinics.net/lists/raid/msg66654.html

I ended up masking the services introduced with 20.04 LTS and switched back to the crontab entry:

systemctl mask mdcheck_continue.service mdcheck_continue.timer mdcheck_start.service mdcheck_start.timer

cat > /etc/cron.d/mdadm << 'EOF'
#
# cron.d/mdadm -- schedules periodic redundancy checks of MD devices
#
# Copyright © martin f. krafft
# distributed under the terms of the Artistic Licence 2.0
#
# By default, run at 00:57 on every Sunday, but do nothing unless the day of
# the month is less than or equal to 7. Thus, only run on the first Sunday of
# each month. crontab(5) sucks, unfortunately, in this regard; therefore this
# hack (see #380425).
57 0 * * 0 root if [ -x /usr/share/mdadm/checkarray ] && [ $(date +\%d) -le 7 ]; then /usr/share/mdadm/checkarray --cron --all --idle --quiet; fi
EOF

The pausing and resuming of the integrity check was an annoyance for me anyway.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-signed-hwe-5.11 in Ubuntu.
https://bugs.launchpad.net/bugs/1942935

Title:
  kernel io hangs during mdcheck/resync

Status in linux-signed-hwe-5.11 package in Ubuntu:
  New
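For anyone adapting the cron entry above: the "first Sunday of the month" guard can be tried in isolation before installing it. The following is only a sketch. The function name is_first_sunday is hypothetical (it is not part of mdadm or checkarray), and it assumes GNU date, which accepts -d DATE.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm): succeeds only when the given
# date is a Sunday that falls within the first seven days of its month.
# Assumes GNU date, which accepts -d DATE.
is_first_sunday() {
    dow=$(date -d "$1" +%w)   # day of week, 0 = Sunday (cron's fifth field)
    dom=$(date -d "$1" +%d)   # day of month, zero-padded (01..31)
    [ "$dow" -eq 0 ] && [ "$dom" -le 7 ]
}

# Try a few dates: a first Sunday, a later Sunday, and a weekday.
for d in 2021-09-05 2021-09-12 2021-09-08; do
    if is_first_sunday "$d"; then
        echo "$d: check would run"
    else
        echo "$d: skipped"
    fi
done
```

In the cron file itself the percent sign has to be written as \% because cron treats an unescaped % in the command field as a newline. In a plain shell script, as in this sketch, no escaping is needed.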
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
Hi Kleber,

I installed it later yesterday, but I won't know whether it helps until the next resync. This has been a problem since at least the Linux 5.4 kernel that shipped with Ubuntu 20.04. I don't think I had these problems on Ubuntu 18.04 LTS on the same hardware, running the linux-image-generic kernel of that time.
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
Hello Chad Wagner,

Thank you for reporting this issue.

Could you please try installing the latest 20.04 HWE kernel and check whether the problem persists? The version currently in focal-updates is 5.11.0-34.36~20.04.1.
[Kernel-packages] [Bug 1942935] Re: kernel io hangs during mdcheck/resync
** Attachment added: "screenlog.txt"
   https://bugs.launchpad.net/ubuntu/+source/linux-signed-hwe-5.11/+bug/1942935/+attachment/5523575/+files/screenlog.txt