On 15/02/2019 09:31, Christine Caulfield wrote: > On 14/02/2019 17:33, Edwin Török wrote: >> Hello, >> >> We were testing corosync 2.4.3/libqb 1.0.1-6/sbd 1.3.1/gfs2 on 4.19 and >> noticed a fundamental problem with realtime priorities: >> - corosync runs on CPU3, and interrupts for the NIC used by corosync are >> also routed to CPU3 >> - corosync runs with SCHED_RR, ksoftirqd does not (should it?), but >> without it packets sent/received from that interface would not get processed >> - corosync is in a busy loop using 100% CPU, never giving a chance for >> softirqs to be processed (including TIMER and SCHED) >> > > > Can you tell me what distribution this is please? This is a not-yet-released development version of XenServer based on CentOS 7.5/7.6. The kernel is 4.19.19 + patches to make it work well with Xen (previously we were using a 4.4.52 + Xen patches and backports kernel)
The versions of packages are: rpm -q libqb corosync dlm sbd kernel libqb-1.0.1-6.el7.x86_64 corosync-2.4.3-13.xs+2.0.0.x86_64 dlm-4.0.7-1.el7.x86_64 sbd-1.3.1-7.xs+2.0.0.x86_64 kernel-4.19.19-5.0.0.x86_64 Package versions with +xs in version have xenserver specific patches applied, libqb is coming straight from upstream CentOS here: https://git.centos.org/tree/rpms!libqb.git/fe522aa5e0af26c0cff1170b6d766b5f248778d2 > There are patches to > libqb that should be applied to fix a similar problem in 1.0.1-6 - but > that's a RHEL version and kernel 4.19 is not a RHEL 7 kernel, so I just > need to be sure that those fixes are in your libqb before going any further. We have libqb 1.0.1-6 from CentOS, it looks like there is 1.0.1-7 which includes an SHM crash fix, is this the one you were refering to, or is there an additional patch elsewhere? https://git.centos.org/commit/rpms!libqb.git/b5ede72cb0faf5b70ddd504822552fe97bfbbb5e > > Without doubt this is a bug, in normal operation corosync is quite light > on CPU. Thanks for the help in advance, --Edwin > > Chrissie > >> This appears to be a priority inversion problem, if corosync runs as >> realtime then everything it needs (timers...) should be realtime as >> well, otherwise running as realtime guarantees we'll miss the watchdog >> deadline, instead of guaranteeing that we process the data before the >> deadline. >> >> Do you have some advice on what the expected realtime priorities would >> be for: >> - corosync >> - sbd >> - hard irqs >> - soft irqs >> >> Also would it be possible for corosync to avoid hogging the CPU in libqb? >> (Our hypothesis is that if softirqs are not processed then timers >> wouldn't work for processes on that CPU either) >> >> Some more background and analysis: >> >> We noticed that cluster membership changes were very unstable >> (especially as you approach 16 physical host clusters) causing the whole >> cluster to fence when adding a single node. This was previously working >> fairly reliably with a 4.4 based kernel. >> >> I've increased SBD timeout to 3600 to be able to investigate the problem >> and noticed that corosync was using 100% of CPU3 [1] and I immediately >> lost SSH access on eth0 (corosync was using eth1), where eth0's >> interrupts were also processed on CPU3 (and irqbalance didn't move it). >> >> IIUC SCHED_RR tasks should not be able to take up 100% of CPU, according >> to [3] it shouldn't be allowed to use more than 95% of CPU. >> >> Softirqs were not processed at all on CPU3 (see [2], the numbers in the >> CPU3 column did not change, the numbers in the other columns did). >> Tried decreasing priority of corosync using chrt to 1, which didn't >> help. I then increased the priority of ksoftirqd to 50 using chrt, which >> immediately solved the CPU usage problem on corosync. >> >> I tried a simple infinite loop program with realtime priority, but it >> didn't reproduce the problems with interrupts getting stuck. >> >> >> Three problems here: >> * all softirqs were stuck (not being processed) on CPU3, which included >> TIMER and SCHED. corosync relies quite heavily on timers, would lack of >> processing them cause the 100% CPU usage? >> * is there a kernel bug introduced between 4.4 - 4.19 that causes >> realtime tasks to not respect the 95% limit anymore? This would leave 5% >> time for IRQs, including NIC IRQs >> * if corosync runs at higher priority than the kernel softirq thread >> processing NIC IRQ how is corosync expecting incoming packets to be >> processed, if it is hogging the CPU by receiving, polling and sending >> packets? >> >> On another host which exhibited the same problem I've run strace (which >> also had the side-effect of getting corosync unstuck from 100% CPU use >> after strace finished): >> 1 bind >> 5 close >> 688 epoll_wait >> 8 futex >> 1 getsockname >> 3 ioctl >> 1 open >> 3 recvfrom >> 190 recvmsg >> 87 sendmsg >> 9 sendto >> 4 socket >> 6 write >> >> On yet another host I've run gdb while corosync was stuck: >> Thread 2 (Thread 0x7f6fd0c9b700 (LWP 16245)): >> #0 0x00007f6fd34a0afb in do_futex_wait.constprop.1 () >> from /lib64/libpthread.so.0 >> #1 0x00007f6fd34a0b8f in __new_sem_wait_slow.constprop.0 () >> from /lib64/libpthread.so.0 >> #2 0x00007f6fd34a0c2b in sem_wait@@GLIBC_2.2.5 () from >> /lib64/libpthread.so.0 >> #3 0x00007f6fd3b38991 in qb_logt_worker_thread () from /lib64/libqb.so.0 >> #4 0x00007f6fd349ae25 in start_thread () from /lib64/libpthread.so.0 >> #5 0x00007f6fd31c4bad in clone () from /lib64/libc.so.6 >> >> Thread 1 (Thread 0x7f6fd43c7b80 (LWP 16242)): >> #0 0x00007f6fd31c5183 in epoll_wait () from /lib64/libc.so.6 >> #1 0x00007f6fd3b3dea8 in poll_and_add_to_jobs () from /lib64/libqb.so.0 >> #2 0x00007f6fd3b2ed93 in qb_loop_run () from /lib64/libqb.so.0 >> #3 0x000055592d62ff78 in main () >> >> >> [1] >> top - 15:51:38 up 47 min, 2 users, load average: 3.81, 1.70, 0.70 >> Tasks: 208 total, 4 running, 130 sleeping, 0 stopped, 0 zombie >> %Cpu0 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, >> 0.0 st >> %Cpu1 : 53.8 us, 46.2 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, >> 0.0 st >> %Cpu2 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, >> 0.0 st >> %Cpu3 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, >> 0.0 st >> %Cpu4 : 3.8 us, 0.0 sy, 0.0 ni, 96.2 id, 0.0 wa, 0.0 hi, 0.0 si, >> 0.0 st >> %Cpu5 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, >> 0.0 st >> %Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, >> 0.0 st >> %Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, >> 0.0 st >> KiB Mem : 4210184 total, 3374752 free, 387380 used, 448052 buff/cache >> KiB Swap: 1048572 total, 1048572 free, 0 used. 3660276 avail Mem >> >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >> 16217 root rt 0 194832 92532 66128 R 104.0 2.2 2:54.00 corosync >> 24004 root 20 0 161984 4432 3696 R 4.0 0.1 0:00.02 top >> >> cat /proc/interrupts |grep eth >> 88: 2 0 0 0 0 0 >> 0 0 xen-pirq -msi-x eth0 >> 89: 2805 0 0 419621 0 0 >> 0 0 xen-pirq -msi-x eth0-TxRx-0 >> 90: 2 0 0 0 0 0 >> 1 0 xen-pirq -msi-x eth1 >> 91: 435 171086 0 0 0 0 >> 0 0 xen-pirq -msi-x eth1-TxRx-0 >> >> [2] >> watch cat /proc/softirqs >> CPU0 CPU1 CPU2 CPU3 CPU4 >> CPU5 CPU6 CPU7 >> HI: 1 0 0 0 0 >> 0 0 0 >> TIMER: 355339 786951 367148 266583 389591 >> 357184 369577 374880 >> NET_TX: 8 1081 1 743 1 >> 1 223 5 >> NET_RX: 5807 230818 2709 529755 3323 >> 2598 2550 1878 >> BLOCK: 8166 53 42 13 10 >> 6563 21 6894 >> IRQ_POLL: 0 0 0 0 0 >> 0 0 0 >> TASKLET: 103 6 9 85 0 >> 0 0 0 >> SCHED: 240428 100842 83166 51213 77824 >> 77270 72074 72261 >> HRTIMER: 0 0 0 0 0 >> 0 0 0 >> RCU: 135491 245345 139513 130591 140280 >> 135094 141007 136063 >> >> >> [3] >> /proc/sys/kernel/sched_rt_runtime_us >> 950000 >> /proc/sys/kernel/sched_rt_period_us >> 1000000 >> >> Best regards, >> --Edwin >> _______________________________________________ >> Users mailing list: Users@clusterlabs.org >> https://lists.clusterlabs.org/mailman/listinfo/users >> >> Project Home: http://www.clusterlabs.org >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >> Bugs: http://bugs.clusterlabs.org >> > > _______________________________________________ > Users mailing list: Users@clusterlabs.org > https://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > _______________________________________________ Users mailing list: Users@clusterlabs.org https://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org