Bug#913138: linux: I/O on md RAID 6 hangs completely
I have the same problem here with one of my boxes. It happens with disk I/O going via RAID1 to one of my SATA SSDs (sda & sdb). Disabling blk_mq solves this issue for me.

I am using 4.19.0-1-amd64 #1 SMP Debian 4.19.12-1, but I had the same problems with 4.18.

My RAID setup:

$ pvdisplay
  --- Physical volume ---
  PV Name               /dev/sdb
  VG Name               flo_data
  PV Size               465.76 GiB / not usable 4.02 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              119234
  Free PE               9153
  Allocated PE          110081
  PV UUID               7vbn06-H2vv-6O08-ndZa-Z1LF-xKN1-QcTBGh

  --- Physical volume ---
  PV Name               /dev/sda
  VG Name               flo_data
  PV Size               465.76 GiB / not usable 4.02 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              119234
  Free PE               9153
  Allocated PE          110081
  PV UUID               xYIXB0-7SF6-KexY-Aw4S-31uL-7mAF-7sHTqj

$ lvdisplay
  --- Logical volume ---
  LV Path                /dev/flo_data/home
  LV Name                home
  VG Name                flo_data
  LV UUID                Mf5mBg-Gj1c-Gclf-e3WG-mK2g-icRN-NO0SZg
  LV Write Access        read/write
  LV Creation host, time uter, 2018-05-24 19:57:17 +0200
  LV Status              available
  # open                 2
  LV Size                430.00 GiB
  Current LE             110080
  Mirrored volumes       2
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:4

I get "blocked for more than 120 seconds" kernel messages, and all stacks seem to have this in common:

...
proc a:
[ 5076.429209] schedule+0x28/0x80
[ 5076.429218] md_super_wait+0x6e/0xa0 [md_mod]
[ 5076.429220] ? finish_wait+0x80/0x80
[ 5076.429229] md_bitmap_wait_writes+0x93/0xa0 [md_mod]
[ 5076.429233] ? __wake_up_common_lock+0x89/0xc0
[ 5076.429242] md_bitmap_unplug+0xc7/0x110 [md_mod]
[ 5076.429246] flush_bio_list+0x1c/0xd0 [raid1]
[ 5076.429249] raid1_unplug+0xb9/0xd0 [raid1]
[ 5076.429254] blk_flush_plug_list+0xcf/0x240
[ 5076.429257] blk_finish_plug+0x21/0x2e
[ 5076.429286] ext4_writepages+0x68f/0xf00 [ext4]
...
proc b:
[ 5076.429429] schedule+0x28/0x80
[ 5076.429438] md_super_wait+0x6e/0xa0 [md_mod]
[ 5076.429440] ? finish_wait+0x80/0x80
[ 5076.429449] md_bitmap_wait_writes+0x93/0xa0 [md_mod]
[ 5076.429452] ? __wake_up_common_lock+0x89/0xc0
[ 5076.429461] md_bitmap_unplug+0xc7/0x110 [md_mod]
[ 5076.429465] flush_bio_list+0x1c/0xd0 [raid1]
[ 5076.429469] raid1_unplug+0xb9/0xd0 [raid1]
[ 5076.429472] blk_flush_plug_list+0xcf/0x240
[ 5076.429475] blk_finish_plug+0x21/0x2e
[ 5076.429497] ext4_writepages+0x68f/0xf00 [ext4]
...
proc c:
[ 5076.429633] schedule+0x28/0x80
[ 5076.429641] md_super_wait+0x6e/0xa0 [md_mod]
[ 5076.429644] ? finish_wait+0x80/0x80
[ 5076.429652] md_bitmap_wait_writes+0x93/0xa0 [md_mod]
[ 5076.429656] ? __wake_up_common_lock+0x89/0xc0
[ 5076.429665] md_bitmap_unplug+0xc7/0x110 [md_mod]
[ 5076.429669] flush_bio_list+0x1c/0xd0 [raid1]
[ 5076.429672] raid1_unplug+0xb9/0xd0 [raid1]
[ 5076.429675] blk_flush_plug_list+0xcf/0x240
[ 5076.429678] blk_finish_plug+0x21/0x2e
[ 5076.429701] ext4_writepages+0x68f/0xf00 [ext4]

My journalctl -a output:

Jan 09 19:27:48 uter kernel: Linux version 4.19.0-1-amd64 (debian-ker...@lists.debian.org) (gcc version 8.2.0 (Debian 8.2.0-13)) #1 SMP Debian 4.19.12-1 (2018-12-22)
Jan 09 19:27:48 uter kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-4.19.0-1-amd64 root=/dev/sdc1 ro quiet
Jan 09 19:27:48 uter kernel: x86/fpu: x87 FPU will use FXSAVE
Jan 09 19:27:48 uter kernel: BIOS-provided physical RAM map:
Jan 09 19:27:48 uter kernel: BIOS-e820: [mem 0x-0x0009e7ff] usable
Jan 09 19:27:48 uter kernel: BIOS-e820: [mem 0x0009f800-0x0009] reserved
Jan 09 19:27:48 uter kernel: BIOS-e820: [mem 0x000f-0x000f] reserved
Jan 09 19:27:48 uter kernel: BIOS-e820: [mem 0x0010-0xcfed] usable
Jan 09 19:27:48 uter kernel: BIOS-e820: [mem 0xcfee-0xcfee2fff] ACPI NVS
Jan 09 19:27:48 uter kernel: BIOS-e820: [mem 0xcfee3000-0xcfee] ACPI data
Jan 09 19:27:48 uter kernel: BIOS-e820: [mem 0xcfef-0xcfef] reserved
Jan 09 19:27:48 uter kernel: BIOS-e820: [mem 0xf000-0xf3ff] reserved
Jan 09 19:27:48 uter kernel: BIOS-e820: [mem 0xfec0-0x] reserved
Jan 09 19:27:48 uter kernel: BIOS-e820: [mem 0x0001-0x00022fff] usable
Jan 09 19:27:48 uter kernel: NX (Execute Disable) protection: active
Jan 09 19:27:48 uter kernel: SMBIOS 2.4 present.
Jan 09 19:27:48 uter kernel: DMI: Gigabyte Technology Co., Ltd. X38-DS4/X38-DS4, BIOS F1 11/23/2007
Jan 09 19:27:48 uter kernel: tsc: Fast TSC calibration using PIT
Jan 09 19:27:48 uter
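
For anyone wanting to check for the same symptom, the following is a minimal sketch that counts hung-task warnings in kernel log text and lists processes currently in uninterruptible sleep (D state). The function names `count_hung_tasks` and `list_d_state` are made up for illustration, and the sketch assumes a procps-style `ps`.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of any package.

# Count "blocked for more than N seconds" warnings in kernel log text
# read from stdin. grep -c exits non-zero when the count is 0, so the
# exit status is forced to success.
count_hung_tasks() {
    grep -c 'blocked for more than [0-9]* seconds' || true
}

# Print PID, state, wait channel and command name of every process
# whose state column starts with D (uninterruptible sleep).
# Assumes procps ps; the header line is skipped.
list_d_state() {
    ps -eo pid,stat,wchan:32,comm | awk 'NR > 1 && $2 ~ /^D/'
}

# Example usage (reading the kernel log may require root):
#   dmesg | count_hung_tasks
#   list_d_state
```

A non-zero count together with several writers stuck in D state would match the pattern described above, although it does not by itself prove the same root cause.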
Bug#913138: linux: I/O on md RAID 6 hangs completely
On Thu, 08 Nov 2018 23:28:16 +0100 Stanisław wrote:
> I suffer the same problem while running RAID1 with kernel 4.18.10-2.

Me too. For me this has been happening since the switch from 4.16 to 4.17.x, on two different PCs, both with LVM-based RAID1.

I had already opened bug #913119 when I found this bug report, and the reply from Stanisław was really helpful for me.

To me, this bug, mine, and the already closed bug #904822 have the same root cause: the stack traces reported by dmesg are very similar, and the common denominators are some sort of LVM RAID and the range of kernels used.

> "...Someone else suggested this might be related to using "blk-mq", so
> could you try with these parameter:
> dm_mod.use_blk_mq=0 scsi_mod.use_blk_mq=0

This seems to have solved the problem for me. I've tested these boot parameters on one of the affected PCs and it has now been running for more than three days. Before, with kernels from 4.17.x to the current Debian 4.18.10-2+b1, the system showed an oops within half a day to one day.

Disabling these parameters is plausible, since Debian's kernel enabled SCSI_MQ_DEFAULT and DM_MQ_DEFAULT with 4.17~rc7-1~exp1.

> Also, do you have laptop-mode-tools installed?

No, not installed here.

I've checked two other distributions I have here to see what they have done with the SCSI_MQ_DEFAULT and DM_MQ_DEFAULT parameters:

- Arch Linux (kernel 4.18.16-arch1-1-ARCH): both disabled.
- Arch Linux (kernel 4.19.2-arch1-1-ARCH): both enabled.
- Fedora server 29 (kernel 4.18.17-300.fc29.x86_64): both disabled.
- Fedora server 29 (kernel 4.19.2-301.fc29.x86_64): both disabled.

But I was unable to find out whether upstream is aware of this problem and whether it is already resolved in 4.19.

Cesare.
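
The per-distribution comparison above can be repeated on any installed kernel. The following is a minimal sketch, assuming the kernel config is available as a plain-text file (Debian ships one as /boot/config-<version>) and that the two module parameters are exposed under /sys/module on the running kernel. The function names `mq_defaults` and `mq_runtime` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of any package.

# Report whether the two blk-mq default options are enabled in the
# kernel config file given as the first argument.
mq_defaults() {
    config_file=$1
    for opt in CONFIG_SCSI_MQ_DEFAULT CONFIG_DM_MQ_DEFAULT; do
        if grep -q "^${opt}=y" "$config_file"; then
            echo "${opt}: enabled"
        else
            echo "${opt}: disabled"
        fi
    done
}

# Show the values used by the running kernel, for each parameter file
# that exists and is readable (assumption: exposed in sysfs).
mq_runtime() {
    for f in /sys/module/scsi_mod/parameters/use_blk_mq \
             /sys/module/dm_mod/parameters/use_blk_mq; do
        if [ -r "$f" ]; then
            echo "$f: $(cat "$f")"
        fi
    done
}

# Example usage:
#   mq_defaults "/boot/config-$(uname -r)"
#   mq_runtime
```

The config file only shows the compiled-in default; the sysfs values reflect any override given on the kernel command line.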
Bug#913138: linux: I/O on md RAID 6 hangs completely
I suffer the same problem while running RAID1 with kernel 4.18.10-2.

I have found a hint on the Debian kernel mailing list; however, I haven't tested it yet:

> ...Someone else suggested this might be related to using blk-mq, so
> could you try with these parameter:
> dm_mod.use_blk_mq=0 scsi_mod.use_blk_mq=0
> Also, do you have laptop-mode-tools installed?
> Ben...

Please, read: lists.debian.org
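
One way to try the suggested parameters persistently on a GRUB-based Debian system is to add them to the default kernel command line. The following is a minimal sketch, assuming GNU sed and a GRUB_CMDLINE_LINUX_DEFAULT line written with double quotes; the function name `add_mq_params` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of any package.

# Append the two blk-mq parameters to the GRUB_CMDLINE_LINUX_DEFAULT
# line of the file given as the first argument, unless a
# scsi_mod.use_blk_mq setting is already present.
add_mq_params() {
    grub_file=$1
    params='dm_mod.use_blk_mq=0 scsi_mod.use_blk_mq=0'
    if grep -q 'scsi_mod\.use_blk_mq=' "$grub_file"; then
        return 0   # already configured, leave the file alone
    fi
    # Insert the parameters just before the closing double quote.
    # Assumes GNU sed (in-place editing with -i).
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 ${params}\"/" "$grub_file"
}

# Example usage (as root, after backing up the file):
#   add_mq_params /etc/default/grub && update-grub
# After a reboot, the active command line can be checked with:
#   cat /proc/cmdline
```

For a one-off test, the same two parameters can instead be typed onto the kernel line in the GRUB boot menu editor, which leaves the configuration file untouched.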
Bug#913138: linux: I/O on md RAID 6 hangs completely
On Wed, 7 Nov 2018, Thorsten Glaser wrote:

> Normally, if I leave the system alone for a while (half an hour or
> so), it resolves itself, but that’s unacceptable for a work system,

The system hasn’t recovered yet today. There’s nothing new in dmesg.

bye,
//mirabilos
-- 
tarent solutions GmbH
Rochusstraße 2-4, D-53123 Bonn • http://www.tarent.de/
Tel: +49 228 54881-393 • Fax: +49 228 54881-235
HRB 5168 (AG Bonn) • USt-ID (VAT): DE122264941
Geschäftsführer: Dr. Stefan Barth, Kai Ebenrett, Boris Esser, Alexander Steeg
Bug#913138: linux: I/O on md RAID 6 hangs completely
Package: linux-image-4.18.0-2-amd64
Version: 4.18.10-2
Severity: normal

Occasionally, my system begins freezing (processes doing a lot of I/O enter D state). It is still somewhat usable for already cached stuff (starting a new shell tab in GNU screen works, lynx does, … but e.g. the debsums verify of reportbug freezes it, a new reportbug with --no-verify again works though).

Normally, if I leave the system alone for a while (half an hour or so), it resolves itself, but that’s unacceptable for a work system, especially if I can’t engage the screen lock (most of the time, I can, though).

This has happened a few times over the last weeks, sometimes twice in a single day, oftentimes not at all, and with multiple recent kernel images. Today’s occurrence is from

Linux tglase.lan.tarent.de 4.18.0-2-amd64 #1 SMP Debian 4.18.10-2 (2018-10-07) x86_64 GNU/Linux

dmesg:

[0.00] microcode: microcode updated early to revision 0x1d, date = 2018-05-11
[0.00] Linux version 4.18.0-2-amd64 (debian-ker...@lists.debian.org) (gcc version 7.3.0 (Debian 7.3.0-29)) #1 SMP Debian 4.18.10-2 (2018-10-07)
[0.00] Command line: BOOT_IMAGE=/vmlinuz-4.18.0-2-amd64 root=/dev/mapper/vg--tglase-lv--tglase ro rootdelay=5 net.ifnames=0 syscall.x32=y vsyscall=emulate kaslr
[0.00] x86/fpu: x87 FPU will use FXSAVE
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009dbff] usable
[0.00] BIOS-e820: [mem 0x0009f800-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xcfec] usable
[0.00] BIOS-e820: [mem 0xcfed-0xcfed0fff] ACPI NVS
[0.00] BIOS-e820: [mem 0xcfed1000-0xcfed] ACPI data
[0.00] BIOS-e820: [mem 0xcfee-0xcfef] reserved
[0.00] BIOS-e820: [mem 0xf400-0xf7ff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00062fff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.4 present.
[0.00] DMI: Gigabyte Technology Co., Ltd. X58-USB3/X58-USB3, BIOS F5 09/07/2011
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] last_pfn = 0x63 max_arch_pfn = 0x4
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges enabled:
[0.00] 0-9 write-back
[0.00] A-B uncachable
[0.00] C-CDFFF write-protect
[0.00] CE000-E uncachable
[0.00] F-F write-through
[0.00] MTRR variable ranges enabled:
[0.00] 0 base 0 mask F write-back
[0.00] 1 base 0E000 mask FE000 uncachable
[0.00] 2 base 0D000 mask FF000 uncachable
[0.00] 3 base 1 mask F write-back
[0.00] 4 base 2 mask E write-back
[0.00] 5 base 4 mask C write-back
[0.00] 6 base 5 mask F write-back
[0.00] 7 base 6 mask FC000 write-back
[0.00] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
[0.00] e820: update [mem 0xd000-0x] usable ==> reserved
[0.00] last_pfn = 0xcfed0 max_arch_pfn = 0x4
[0.00] found SMP MP-table at [mem 0x000f5a60-0x000f5a6f] mapped at [(ptrval)]
[0.00] Base memory trampoline at [(ptrval)] 97000 size 24576
[0.00] BRK [0x2ba84b000, 0x2ba84bfff] PGTABLE
[0.00] BRK [0x2ba84c000, 0x2ba84cfff] PGTABLE
[0.00] BRK [0x2ba84d000, 0x2ba84dfff] PGTABLE
[0.00] BRK [0x2ba84e000, 0x2ba84efff] PGTABLE
[0.00] BRK [0x2ba84f000, 0x2ba84] PGTABLE
[0.00] BRK [0x2ba85, 0x2ba850fff] PGTABLE
[0.00] BRK [0x2ba851000, 0x2ba851fff] PGTABLE
[0.00] BRK [0x2ba852000, 0x2ba852fff] PGTABLE
[0.00] BRK [0x2ba853000, 0x2ba853fff] PGTABLE
[0.00] RAMDISK: [mem 0x34928000-0x3648bfff]
[0.00] ACPI: Early table checksum verification disabled
[0.00] ACPI: RSDP 0x000F7200 14 (v00 GBT )
[0.00] ACPI: RSDT 0xCFED1040 48 (v01 GBTGBTUACPI 42302E31 GBTU 01010101)
[0.00] ACPI: FACP 0xCFED1100 74 (v01 GBTGBTUACPI 42302E31 GBTU 01010101)
[0.00] ACPI: DSDT 0xCFED11C0 00391C (v01 GBTGBTUACPI 1000 MSFT 010C)
[0.00] ACPI: FACS 0xCFED 40
[0.00] ACPI: MSDM 0xCFED4CC0 55 (v03 GBTGBTUACPI 42302E31 GBTU 01010101)
[0.00] ACPI: HPET 0xCFED4D80 38 (v01 GBTGBTUACPI 42302E31 GBTU 0098)
[0.00] ACPI: MCFG