Hi! We are running some large (100+) kubernetes clusters on bare metal machines also running a LSI MegaRAID controller and we are experiencing these exact problems since about a week (cluster being in production since early june).
We run with this controller: # lspci | grep -i mega 59:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID Tri-Mode SAS3516 (rev 01) After being in operation and taking production traffic (we are very IO intense) for less month both clusters started behaving weirdly with timeouts between processes on different nodes. After much debugging we found that basically IO performance had ground to a halt. Here is a quick summary of what we found together with our external partner (developer of our software defined storage solution): We analyzed a small subset of nodes encountering the issue. Based on our initial diagnosis, here are the findings: - The CPU and IO utilization were low but all systems reported high load averages. - All disks looked fine - root and data disks. The IO load was very small with no visible spikes in latencies. None of the devices were unresponsive/stuck. - All user space processes looked fine, none were blocked. - The systems were slow in general. i.e. Installing packages and writing even to the root disk was very slow. - A kernel thread dump showed a lot of them in D state, all stuck on writing to the page cache. - We used dd to write a small 1M file on the host (NOT on px device), requiring no disk io and it showed the same symptoms. - System dirty threshold settings do not show anything out of the ordinary. - All the affected nodes another common symptom is that the page cache drain appeared to be slow/stuck. Amount of pages in the writeback list is high but no writeback appeared to be happening. These clusters all run a 18.04 LTS but have different kernels depending on their patch level: Cluster A: 228 4.15.0-140-generic 74 4.15.0-142-generic 7 4.15.0-143-generic Cluster B: 418 4.15.0-136-generic 22 4.15.0-144-generic 12 4.15.0-143-generic 9 4.15.0-142-generic Cluster A is the one with the worst performance. Cluster B have had similar problems but since the majority of the nodes are on 136 it has not been so badly hit. Our only remedy for now have been to downgrade to 136 and/or rebooting the machines. Rebooting the machines but staying on the problematic kernel works for now, but most likely we will see the same behavior in a week or so of taking production traffic. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1928744 Title: Disk IO very slow on kernel 4.15.0-142-generic Status in linux package in Ubuntu: Confirmed Status in linux source package in Bionic: Confirmed Bug description: After upgrading Ubuntu 18.04 kernel from 4.15.0-136-generic to kernel 4.15.0-142-generic disk IOs (writes) are suffering from terrible delay. Time required to write 1GB of data to the dist is ~12 minutes. 1GB in 12 minutes: strace -ttvvff -o mel.strace dd bs=1024k count=1024 if=/dev/zero of=/data/disk_test/dd_test Before the upgrade, it took seconds to complete. The only change which was done on the system was security upgrade done with unattended-upgrade utility and main repositories for bionic and bionic-security. This is physical server LENOVO System x3650 M5, with Symbios Logic MegaRAID SAS-3 3108 disk controller. The only difference which I see is that there are more modules loaded by the latest kernel. I would expect that the performance would be the same. There were no changes to the physical server and the disks are reported as online and healthy. No only dd is affected but also all disk-based operation by barman (we are using postgres steaming backup along with rsync/ssh archiving). I think this could be a kernel bug. I would be happy to provide other information which would help in fixing this problem. ===================================================== Details: lsb_release -rd Description: Ubuntu 18.04.1 LTS Release: 18.04 lsmod Module Size Used by ip6_tables 28672 0 iptable_filter 16384 0 ip_tables 28672 1 iptable_filter x_tables 40960 3 iptable_filter,ip6_tables,ip_tables tcp_diag 16384 0 udp_diag 16384 0 raw_diag 16384 0 inet_diag 24576 3 tcp_diag,raw_diag,udp_diag unix_diag 16384 0 binfmt_misc 20480 1 ipmi_ssif 32768 0 kvm_intel 217088 0 kvm 614400 1 kvm_intel irqbypass 16384 1 kvm crct10dif_pclmul 16384 0 crc32_pclmul 16384 0 ghash_clmulni_intel 16384 0 pcbc 16384 0 aesni_intel 188416 0 aes_x86_64 20480 1 aesni_intel crypto_simd 16384 1 aesni_intel glue_helper 16384 1 aesni_intel cryptd 24576 3 crypto_simd,ghash_clmulni_intel,aesni_intel ipmi_si 57344 0 ipmi_devintf 20480 0 ipmi_msghandler 53248 3 ipmi_devintf,ipmi_si,ipmi_ssif acpi_pad 180224 0 bonding 163840 0 lp 20480 0 parport 49152 1 lp autofs4 40960 2 mgag200 45056 1 i2c_algo_bit 16384 1 mgag200 ttm 106496 1 mgag200 drm_kms_helper 172032 1 mgag200 syscopyarea 16384 1 drm_kms_helper tg3 167936 0 sysfillrect 16384 1 drm_kms_helper sysimgblt 16384 1 drm_kms_helper fb_sys_fops 16384 1 drm_kms_helper ptp 20480 1 tg3 drm 401408 4 drm_kms_helper,mgag200,ttm pps_core 20480 1 ptp megaraid_sas 143360 12 lspci -vnn -s 0b:00.0 0b:00.0 RAID bus controller [0104]: LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader] [1000:005d] (rev 02) Subsystem: IBM MegaRAID SAS-3 3108 [Invader] [1014:0454] Physical Slot: 9 Flags: bus master, fast devsel, latency 0, IRQ 31, NUMA node 0 I/O ports at 2e00 [size=256] Memory at 903f0000 (64-bit, non-prefetchable) [size=64K] Memory at 90400000 (64-bit, non-prefetchable) [size=1M] Expansion ROM at <ignored> [disabled] Capabilities: [50] Power Management version 3 Capabilities: [68] Express Endpoint, MSI 00 Capabilities: [d0] Vital Product Data Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+ Capabilities: [c0] MSI-X: Enable+ Count=97 Masked- Capabilities: [100] Advanced Error Reporting Capabilities: [1e0] #19 Capabilities: [1c0] Power Budgeting <?> Capabilities: [148] Alternative Routing-ID Interpretation (ARI) Kernel driver in use: megaraid_sas Kernel modules: megaraid_sas Strace example for 10MB: strace -ttvvff -o mel.strace dd bs=1024k count=10 if=/dev/zero of=/data/disk_test/dd_test 10+0 records in 10+0 records out 10485760 bytes (10 MB, 10 MiB) copied, 6.73972 s, 1.6 MB/s 14:59:19.145037 execve("/bin/dd", ["dd", "bs=1024k", "count=10", "if=/dev/zero", "of=/data/disk_test/dd_test"], ["LS_COLORS=rs=0:di=01;34:ln=01;36"..., "LESSCLOSE=/usr/bin/lesspipe %s %"..., "LANG=en_AU.UTF-8", "SUDO_GID=4118", "USERNAME=root", "SUDO_COMMAND=/bin/bash", "S_COLORS=auto", "USER=root", "PWD=/data/disk_test", "HOME=/root", "SUDO_USER=mariusz.boruc", "SUDO_UID=4118", "MAIL=/var/mail/root", "SHELL=/bin/bash", "TERM=xterm-256color", "SHLVL=1", "LANGUAGE=en_AU:", "LOGNAME=root", "PATH=/usr/local/sbin:/usr/local/"..., "LESSOPEN=| /usr/bin/lesspipe %s", "_=/usr/bin/strace", "OLDPWD=/data"]) = 0 14:59:19.145685 brk(NULL) = 0x563734ec4000 14:59:19.145766 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) 14:59:19.145846 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) 14:59:19.145909 openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 14:59:19.145970 fstat(3, {st_dev=makedev(8, 3), st_ino=845, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=64, st_size=28885, st_atime=1621301366 /* 2021-05-18T10:59:26.840044503+0930 */, st_atime_nsec=840044503, st_mtime=1620610160 /* 2021-05-10T10:59:20.912915629+0930 */, st_mtime_nsec=912915629, st_ctime=1620610160 /* 2021-05-10T10:59:20.912915629+0930 */, st_ctime_nsec=912915629}) = 0 14:59:19.146055 mmap(NULL, 28885, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f49d79db000 14:59:19.146108 close(3) = 0 14:59:19.146159 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) 14:59:19.146215 openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 14:59:19.146270 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\34\2\0\0\0\0\0"..., 832) = 832 14:59:19.146327 fstat(3, {st_dev=makedev(8, 3), st_ino=1413, st_mode=S_IFREG|0755, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=3968, st_size=2030544, st_atime=1621312135 /* 2021-05-18T13:58:55.539717705+0930 */, st_atime_nsec=539717705, st_mtime=1591291526 /* 2020-06-05T02:55:26+0930 */, st_mtime_nsec=0, st_ctime=1604896088 /* 2020-11-09T14:58:08.844042263+1030 */, st_ctime_nsec=844042263}) = 0 14:59:19.146403 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f49d79d9000 14:59:19.146460 mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f49d73cb000 14:59:19.146510 mprotect(0x7f49d75b2000, 2097152, PROT_NONE) = 0 14:59:19.146567 mmap(0x7f49d77b2000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f49d77b2000 14:59:19.146629 mmap(0x7f49d77b8000, 15072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f49d77b8000 14:59:19.146689 close(3) = 0 14:59:19.146758 arch_prctl(ARCH_SET_FS, 0x7f49d79da540) = 0 14:59:19.146895 mprotect(0x7f49d77b2000, 16384, PROT_READ) = 0 14:59:19.146990 mprotect(0x563733ac0000, 4096, PROT_READ) = 0 14:59:19.147044 mprotect(0x7f49d79e3000, 4096, PROT_READ) = 0 14:59:19.147093 munmap(0x7f49d79db000, 28885) = 0 14:59:19.147190 rt_sigaction(SIGINT, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0 14:59:19.147248 rt_sigaction(SIGUSR1, {sa_handler=0x5637338b30e0, sa_mask=[INT USR1], sa_flags=SA_RESTORER, sa_restorer=0x7f49d7409fd0}, NULL, 8) = 0 14:59:19.147301 rt_sigaction(SIGINT, {sa_handler=0x5637338b30d0, sa_mask=[INT USR1], sa_flags=SA_RESTORER|SA_NODEFER|SA_RESETHAND, sa_restorer=0x7f49d7409fd0}, NULL, 8) = 0 14:59:19.147430 brk(NULL) = 0x563734ec4000 14:59:19.147475 brk(0x563734ee5000) = 0x563734ee5000 14:59:19.147535 openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3 14:59:19.147602 fstat(3, {st_dev=makedev(8, 7), st_ino=48819, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=3168, st_size=1687536, st_atime=1621312728 /* 2021-05-18T14:08:48.519742590+0930 */, st_atime_nsec=519742590, st_mtime=1604896695 /* 2020-11-09T15:08:15.489844866+1030 */, st_mtime_nsec=489844866, st_ctime=1604896695 /* 2020-11-09T15:08:15.489844866+1030 */, st_ctime_nsec=489844866}) = 0 14:59:19.147687 mmap(NULL, 1687536, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f49d783d000 14:59:19.147746 close(3) = 0 14:59:19.147853 openat(AT_FDCWD, "/dev/zero", O_RDONLY) = 3 14:59:19.147912 dup2(3, 0) = 0 14:59:19.147960 close(3) = 0 14:59:19.148005 lseek(0, 0, SEEK_CUR) = 0 14:59:19.148051 openat(AT_FDCWD, "/data/disk_test/dd_test", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3 14:59:19.293893 dup2(3, 1) = 1 14:59:19.293961 close(3) = 0 14:59:19.294012 mmap(NULL, 1060864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f49d72c8000 14:59:19.294069 read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576 14:59:19.294495 write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576 14:59:26.025616 read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576 14:59:26.025759 write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576 14:59:26.026304 read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576 14:59:26.026416 write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576 14:59:26.026966 read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576 14:59:26.027067 write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576 14:59:26.027605 read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576 14:59:26.027714 write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576 14:59:26.028242 read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576 14:59:26.028339 write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576 14:59:26.028841 read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576 14:59:26.028927 write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576 14:59:26.029459 read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576 14:59:26.029558 write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576 14:59:26.030123 read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576 14:59:26.030235 write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576 14:59:26.030790 read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576 14:59:26.030902 write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576 14:59:26.031457 close(0) = 0 14:59:26.031503 close(1) = 0 14:59:26.032941 openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 0 14:59:26.033098 fstat(0, {st_dev=makedev(8, 3), st_ino=992, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2995, st_atime=1621312819 /* 2021-05-18T14:10:19.531746410+0930 */, st_atime_nsec=531746410, st_mtime=1591291526 /* 2020-06-05T02:55:26+0930 */, st_mtime_nsec=0, st_ctime=1604896693 /* 2020-11-09T15:08:13.373844883+1030 */, st_ctime_nsec=373844883}) = 0 14:59:26.033168 read(0, "# Locale name alias data base.\n#"..., 4096) = 2995 14:59:26.033323 read(0, "", 4096) = 0 14:59:26.033358 close(0) = 0 14:59:26.033400 openat(AT_FDCWD, "/usr/share/locale/en_AU/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory) 14:59:26.033447 openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory) 14:59:26.033487 openat(AT_FDCWD, "/usr/share/locale-langpack/en_AU/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory) 14:59:26.033616 openat(AT_FDCWD, "/usr/share/locale-langpack/en/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory) 14:59:26.033678 write(2, "10+0 records in\n10+0 records out"..., 33) = 33 14:59:26.033747 write(2, "10485760 bytes (10 MB, 10 MiB) c"..., 58) = 58 14:59:26.033785 write(2, "\n", 1) = 1 14:59:26.033824 close(2) = 0 14:59:26.033863 exit_group(0) = ? 14:59:26.034041 +++ exited with 0 +++ To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1928744/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp