[Bug 1915828] Re: pacemaker fails to release clustered filesystem dlm locks on failover
This bug was fixed in the package pacemaker - 2.0.3-3ubuntu4.2

---
pacemaker (2.0.3-3ubuntu4.2) focal; urgency=medium

  * d/rules: Rebuild with QB_KILL_ATTRIBUTE_SECTION to overcome a problem
    in libqb (LP: #1915828)

 -- Dariusz Gadomski  Thu, 04 Mar 2021 10:29:48 +0100

** Changed in: pacemaker (Ubuntu Focal)
       Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1915828

Title:
  pacemaker fails to release clustered filesystem dlm locks on failover

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1915828/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1915828] Re: pacemaker fails to release clustered filesystem dlm locks on failover
This bug was fixed in the package pacemaker - 2.0.4-2ubuntu3.2

---
pacemaker (2.0.4-2ubuntu3.2) groovy; urgency=medium

  * d/rules: Rebuild with QB_KILL_ATTRIBUTE_SECTION to overcome a problem
    in libqb (LP: #1915828)

 -- Dariusz Gadomski  Thu, 04 Mar 2021 10:52:40 +0100

** Changed in: pacemaker (Ubuntu Groovy)
       Status: Fix Committed => Fix Released
[Bug 1915828] Re: pacemaker fails to release clustered filesystem dlm locks on failover
# verification groovy

$ apt-cache policy libcrmcommon34 | grep Installed
  Installed: 2.0.4-2ubuntu3.1
# dlm_stonith -t 5 -n 1089
dlm_stonith: utils.c:48: common: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed.
Aborted (core dumped)

$ apt-cache policy libcrmcommon34 | grep Installed
  Installed: 2.0.4-2ubuntu3.2
# dlm_stonith -t 5 -n 1089
kick_helper error -107 nodeid 1089

** Tags removed: verification-needed-groovy
** Tags added: verification-done-groovy

** Tags removed: verification-needed
** Tags added: verification-done
[Bug 1915828] Re: pacemaker fails to release clustered filesystem dlm locks on failover
# verification focal

$ apt-cache policy libcrmcommon34 | grep Installed
  Installed: 2.0.3-3ubuntu4.1
# dlm_stonith -t 5 -n 1
dlm_stonith: utils.c:57: common: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed.
Aborted (core dumped)

$ apt-cache policy libcrmcommon34 | grep Installed
  Installed: 2.0.3-3ubuntu4.2
# dlm_stonith -t 5 -n 1
kick_helper error -79 nodeid 1

** Tags removed: verification-needed-focal
** Tags added: verification-done-focal
[Bug 1915828] Re: pacemaker fails to release clustered filesystem dlm locks on failover
** Description changed:

  [impact]

  programs using libqb logging exit due to failed assertion on qb log init

  [test case]

  test program:

  #include <qb/qblog.h>

  QB_LOG_INIT_DATA(test);

  int main(int argc, char* argv[])
  {
      return 0;
  }

  compile and run:

  $ gcc -flto -D_GNU_SOURCE -o test test.c -lqb -ldl
  /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T?
  $ ./test
  test: test.c:4: test: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed.
  Aborted (core dumped)

  Note the error is slightly different when compiling without lto:

  $ gcc -D_GNU_SOURCE -o test test.c -lqb -ldl
  /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T?
  $ ./test
  test: test.c:4: test: Assertion `"implicit callsite section is populated, otherwise target's build is at fault, preventing reliable logging" && QB_ATTR_SECTION_START != QB_ATTR_SECTION_STOP' failed.
  Aborted (core dumped)

  [regression potential]

  any regression would likely involve problems during logging using the libqb logging functions, which could include failure to log or even program exit and/or crash.
+
+ additionally, altering of build flags (namely
+ -DQB_KILL_ATTRIBUTE_SECTION) removes some symbols from pacemaker
+ libraries (please see the debdiffs for the full list of them). Those
+ seem to be previously defined by macros (resolved in the end to
+ QB_LOG_INIT_DATA) and used internally by libqb for logging purposes. If
+ there was anything using those symbols build time or runtime missing
+ symbols may be reported.

  [scope]

  this appears to be needed only for focal; the issue seems to be an interaction between the focal version of binutils and some linker "magic" that libqb used in the focal version.

  The upstream libqb removed/replaced that linker "magic" after the version in focal, so this should not affect groovy or later. However, the fix changes the ABI and thus isn't appropriate for SRUing.
  https://github.com/ClusterLabs/libqb/pull/322

  The libqb code in bionic does not include the linker "magic" and so does not have this problem.

  [other info]

  related debian binutils bug report:
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=923246

  related gcc bug report:
  https://sourceware.org/bugzilla/show_bug.cgi?id=24276

  however, those appear to only have changed binutils to ignore the issue to allow the build to stop failing.

  The libqb docs do contain two suggestions to possibly work around this bug, specifically using either -l:libqb.so.0 or -DQB_KILL_ATTRIBUTE_SECTION, or both. Either or both approaches do help with the simple test case, but more testing is needed that actually exercises the log functionality to make sure nothing else breaks.

  $ gcc -flto -D_GNU_SOURCE -o test test.c -lqb -ldl
  /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T?
  $ ./test
  test: test.c:4: test: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed.
  Aborted (core dumped)

  $ gcc -flto -D_GNU_SOURCE -o test test.c -l:libqb.so.0 -ldl
  $ ./test

  $ gcc -flto -DQB_KILL_ATTRIBUTE_SECTION -D_GNU_SOURCE -o test test.c -lqb -ldl
  /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T?
  $ ./test

  [original description]

  When a clustered node is detected as failed the remaining node tries to fence the resources. When using pacemaker with gfs2 on an lvm2 logical volume dlm_controld calls out to dlm_stonith to release any locks held.

  Due to a build issue with the version of libqb that pacemaker is compiled against, the call to QB_LOG_INIT_DATA which is #defined to CRM_TRACE_INIT_DATA, fails with an assertion.

  This prevents the lock manager from releasing any held locks on the failed node. At this point the gfs2 filesystem cannot be accessed and after any resource timeouts are met, the resource is marked as failed.

  Calling dlm_stonith by hand with the data that is passed to it by dlm_controld shows the assertion.

  root@u2004-1:~# /usr/sbin/dlm_stonith -n 2 -t 1612361398
  dlm_stonith: utils.c:57: common: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed.

  It would appear that the code in libqb is over aggressive on the sanity checking, or assumes that QB_LOG_INIT_DATA will only be called by the
[Bug 1915828] Re: pacemaker fails to release clustered filesystem dlm locks on failover
Hello Jim, or anyone else affected,

Accepted pacemaker into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/pacemaker/2.0.3-3ubuntu4.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

** Changed in: pacemaker (Ubuntu Focal)
       Status: In Progress => Fix Committed

** Tags added: verification-needed-focal
[Bug 1915828] Re: pacemaker fails to release clustered filesystem dlm locks on failover
Hello Jim, or anyone else affected,

Accepted pacemaker into groovy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/pacemaker/2.0.4-2ubuntu3.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-groovy to verification-done-groovy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-groovy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

** Changed in: pacemaker (Ubuntu Groovy)
       Status: New => Fix Committed

** Tags added: verification-needed verification-needed-groovy
[Bug 1915828] Re: pacemaker fails to release clustered filesystem dlm locks on failover
SRU proposal for groovy ** Patch added: "groovy.debdiff" https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1915828/+attachment/5474408/+files/groovy.debdiff
[Bug 1915828] Re: pacemaker fails to release clustered filesystem dlm locks on failover
SRU proposal for focal ** Patch removed: "focal.debdiff" https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1915828/+attachment/5473371/+files/focal.debdiff ** Patch added: "focal.debdiff" https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1915828/+attachment/5474407/+files/focal.debdiff
[Bug 1915828] Re: pacemaker fails to release clustered filesystem dlm locks on failover
** Changed in: pacemaker (Ubuntu Groovy) Assignee: (unassigned) => Dariusz Gadomski (dgadomski)
[Bug 1915828] Re: pacemaker fails to release clustered filesystem dlm locks on failover
** Patch added: "focal.debdiff" https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1915828/+attachment/5473371/+files/focal.debdiff
[Bug 1915828] Re: pacemaker fails to release clustered filesystem dlm locks on failover
Initial Focal SRU proposal.
[Bug 1915828] Re: pacemaker fails to release clustered filesystem dlm locks on failover
I'm pasting an email answer here so we keep track of all that has been discussed for this issue:

> Me and Dan have been working recently on a customer case reported as LP: #1915828. Turns out that some linker "magic" used inside libqb broke pacemaker (and potentially any package using QB_LOG_INIT_DATA from the library or even every package linked against libqb).

I have checked libqb commits and: https://paste.ubuntu.com/p/vTdYJdC4Hc/ . I see that upstream added some options in configure.ac related to the linking issue, even quoting a debian/ubuntu option to override the libtool variable by force. There is also an existing test for the __attribute__((section("__verbose"))) breakage detection now (the gcc_has_attribute_section_visible variable in configure.ac).

> Dan found an ABI-compatible way of mitigating the problem by rebuilding pacemaker with the QB_KILL_ATTRIBUTE_SECTION defined. This strips the magic and makes pacemaker useful again.

I see the mitigation (QB_KILL_ATTRIBUTE_SECTION) comes from:

--
32555d8 tests: add a script to generate callsite-heavy logging client…

...so as to evaluate use of resources. In particular, the intention here is to uncover the observable differences between the same logging code built with callsite section (default when available) and purposefully (overriding that default by force) without it.
...
--

And they even added "tests/functional/log_callsite_bench_gen.py" to measure the impact of this mitigation. I'm particularly worried about:

--
Based on the above, we can conclude that leveraging the callsite section for logging as facilitated by the toolchain intrinsics is beneficial, especially for performance-critical applications (corosync being the showcase here). Therefore it's desired to struggle for retaining this nifty trick despite some troubles emerged with recent binutils releases (starting with 2.29) and the changed behaviour we relied on so far in respective ld.bfd linkers (as mentioned in preceding commits).

That motive is immediately followed -- well, judging the impact fairly, actually outclassed -- with the intention to preserve binary compatibility (incl. continuous library support for callsite section offloading spread in the existing client space widely for quite some years already) to the utmost extent possible.
--

I believe this will be accepted by the SRU team, but it has to be mentioned in the public bug. I would add to [regression potential] the fact that the logging mechanism would stir the heap more often (the commit log even has an execution-time delta).

Shouldn't this bug also affect all the libqb0 rdepends? I can see the pacemaker, sbd, corosync and usbguard source packages.

> The problem is (more details in comments #3-#5) some symbols disappear from the package. Those symbols don't seem to be used anywhere explicitly, but we were wondering if it's ok to just drop those symbols or maybe to implement a change in libqb to create dummy constructors (e.g. https://pastebin.canonical.com/p/Y4fk747YfK/) to ensure the symbols are available just in case.

For the pacemaker fix I'll let the SRU team discuss whether or not they would like to keep the symbols in the new binary (after this fix). I don't think those symbols are used elsewhere (among the rdepends of libcrm* they would only be used by either pacemaker or sbd).

I think the next step here is to offer a patch and ask for the SRU team's input/review.
[Bug 1915828] Re: pacemaker fails to release clustered filesystem dlm locks on failover
The symbols defined with CRM_TRACE_INIT_DATA don't seem to be used anywhere inside pacemaker, and it's unlikely that they are used anywhere outside of it. The definitions seem to be strictly logging related, without any other functionality declared.
[Bug 1915828] Re: pacemaker fails to release clustered filesystem dlm locks on failover
The list of missing symbols seems to be consistent with the ones defined with the CRM_TRACE_INIT_DATA macro:

lib/lrmd/lrmd_client.c
46:CRM_TRACE_INIT_DATA(lrmd);

lib/pacemaker/pcmk_trans_unpack.c
20:CRM_TRACE_INIT_DATA(transitioner);

lib/fencing/st_client.c
37:CRM_TRACE_INIT_DATA(stonith);

lib/pacemaker/pcmk_sched_allocate.c
24:CRM_TRACE_INIT_DATA(pe_allocate);

lib/cluster/cluster.c
25:CRM_TRACE_INIT_DATA(cluster);

lib/common/utils.c
57:CRM_TRACE_INIT_DATA(common);

lib/pengine/unpack.c
28:CRM_TRACE_INIT_DATA(pe_status);

lib/pengine/rules.c
25:CRM_TRACE_INIT_DATA(pe_rules);

The macro itself is defined in the following way:

include/crm/common/logging.h
112: #define CRM_TRACE_INIT_DATA(name) QB_LOG_INIT_DATA(name)

QB_LOG_INIT_DATA, in turn, is defined in libqb as follows:

#if defined(QB_KILL_ATTRIBUTE_SECTION) || defined(S_SPLINT_S)
#undef QB_HAVE_ATTRIBUTE_SECTION
#endif /* defined(QB_KILL_ATTRIBUTE_SECTION) || defined(S_SPLINT_S) */

#ifdef QB_HAVE_ATTRIBUTE_SECTION
// ...
#else
#define QB_LOG_INIT_DATA(name)
#endif /* QB_HAVE_ATTRIBUTE_SECTION */

So with QB_KILL_ATTRIBUTE_SECTION defined, QB_HAVE_ATTRIBUTE_SECTION is undefined and the macro QB_LOG_INIT_DATA is left empty. Hence the missing symbols.
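To make the effect of the define concrete, here is a minimal sketch of the same preprocessor pattern. Every DEMO_* and demo_* name is a hypothetical stand-in rather than a libqb or pacemaker identifier, and the non-empty branch is a deliberate simplification: the real QB_LOG_INIT_DATA body is elided in the excerpt above and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for QB_HAVE_ATTRIBUTE_SECTION (assumption). */
#define DEMO_HAVE_ATTRIBUTE_SECTION 1

/* Same shape as the guard quoted above: the "kill" define wins. */
#if defined(DEMO_KILL_ATTRIBUTE_SECTION)
#undef DEMO_HAVE_ATTRIBUTE_SECTION
#endif

#ifdef DEMO_HAVE_ATTRIBUTE_SECTION
/* Simplified body (assumption): emits one global symbol per use. */
#define DEMO_LOG_INIT_DATA(name) const char demo_init_data_##name[] = #name
#define DEMO_SYMBOL_EMITTED 1
#else
/* Expands to nothing, so no symbol reaches the object file. */
#define DEMO_LOG_INIT_DATA(name)
#define DEMO_SYMBOL_EMITTED 0
#endif

/* Mirrors how a library source file invokes the macro at file scope. */
DEMO_LOG_INIT_DATA(common);

/* Returns the emitted data, or NULL when the macro expanded to nothing. */
const char *demo_init_data_or_null(void)
{
#if DEMO_SYMBOL_EMITTED
    return demo_init_data_common;
#else
    return NULL;
#endif
}

/* Checks that the result matches whichever branch was compiled in. */
void demo_self_check(void)
{
    const char *data = demo_init_data_or_null();
#if DEMO_SYMBOL_EMITTED
    assert(data != NULL && strcmp(data, "common") == 0);
#else
    assert(data == NULL);
#endif
}
```

Compiling this with gcc -c and inspecting the object with nm should list demo_init_data_common; adding -DDEMO_KILL_ATTRIBUTE_SECTION should make that symbol disappear, which is the same kind of difference the symbol lists for the rebuilt pacemaker libraries show.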
[Bug 1915828] Re: pacemaker fails to release clustered filesystem dlm locks on failover
Adding -DQB_KILL_ATTRIBUTE_SECTION to CFLAGS seems to result in some symbols disappearing during the build: https://paste.ubuntu.com/p/hmBpMXGjqy/
[Bug 1915828] Re: pacemaker fails to release clustered filesystem dlm locks on failover
I have checked it again on Groovy, and it looks like the change from https://github.com/ClusterLabs/libqb/pull/322 did not make it into the Groovy version of libqb. In a test the behavior was also identical to Focal, so I have targeted the bug to that series as well.
[Bug 1915828] Re: pacemaker fails to release clustered filesystem dlm locks on failover
** Also affects: pacemaker (Ubuntu Groovy) Importance: Undecided Status: New
[Bug 1915828] Re: pacemaker fails to release clustered filesystem dlm locks on failover
It appears the linker is eliding the section that libqb's "linker magic" expects when the linker detects that the compiled program doesn't actually use that section, which seems completely reasonable for the linker to do. This seems to be a case of libqb trying to be "clever" by relying on an implementation detail of the linker, which is fragile.

A slightly adjusted test program that actually uses the libqb log functions does not reproduce the problem, e.g.:

#include <qb/qblog.h>

QB_LOG_INIT_DATA(test);

int main(int argc, char* argv[])
{
    qb_log_init("test", LOG_USER, LOG_INFO);
    qb_log(LOG_ERR, "test\n");
    qb_log_fini();
    return 0;
}

Compiling that with or without -DQB_KILL_ATTRIBUTE_SECTION results in the message "test" being logged to syslog when run, so it appears safe to compile pacemaker (and any other programs using libqb that show this problem) with that define to work around this issue.
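For readers unfamiliar with this kind of "linker magic", the following is a minimal sketch of the general technique: records are placed in a named ELF section and located at run time through start/stop symbols that the linker synthesizes. It assumes GCC and GNU ld on Linux; the section name demo_callsites and every demo_* identifier are hypothetical, and libqb's real section, macros and checks differ.

```c
#include <assert.h>
#include <stddef.h>

/* One record per log callsite (illustrative layout only). */
struct demo_callsite {
    const char *message;
};

/* GNU ld provides these for a section whose name is a valid C identifier. */
extern struct demo_callsite __start_demo_callsites[];
extern struct demo_callsite __stop_demo_callsites[];

/*
 * Place a record in the custom section. "used" keeps the compiler from
 * discarding the otherwise unreferenced variable; the explicit alignment
 * keeps the records densely packed so they can be walked like an array.
 */
#define DEMO_CALLSITE(var, msg) \
    static struct demo_callsite var \
        __attribute__((section("demo_callsites"), used, \
                       aligned(__alignof__(struct demo_callsite)))) = { msg }

DEMO_CALLSITE(demo_site_a, "first message");
DEMO_CALLSITE(demo_site_b, "second message");

/* Number of records the linker gathered between the two boundary symbols. */
size_t demo_callsite_count(void)
{
    return (size_t)(__stop_demo_callsites - __start_demo_callsites);
}

/* Loosely analogous to the "section is populated" check quoted in this bug. */
void demo_check_section_populated(void)
{
    assert(__start_demo_callsites != __stop_demo_callsites);
}
```

The scheme only works if the section and its boundary symbols survive into the final link. If the toolchain drops or hides the section, the boundaries coincide or cannot be resolved, which is roughly the condition the libqb assertions in this bug guard against.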
[Bug 1915828] Re: pacemaker fails to release clustered filesystem dlm locks on failover
** Description changed:

  [impact]

  programs using libqb logging exit due to failed assertion on qb log init

  [test case]

  test program:
-
  #include <qb/qblog.h>

  QB_LOG_INIT_DATA(test);

  int main(int argc, char* argv[])
  {
- return 0;
+ return 0;
  }
-
  compile and run:

  $ gcc -flto -D_GNU_SOURCE -o test test.c -lqb -ldl
  /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T?
- $ ./test
+ $ ./test
  test: test.c:4: test: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed.
  Aborted (core dumped)
-
  Note the error is slightly different when compiling without lto:

  $ gcc -D_GNU_SOURCE -o test test.c -lqb -ldl
  /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T?
- $ ./test
+ $ ./test
  test: test.c:4: test: Assertion `"implicit callsite section is populated, otherwise target's build is at fault, preventing reliable logging" && QB_ATTR_SECTION_START != QB_ATTR_SECTION_STOP' failed.
  Aborted (core dumped)
-
  [regression potential]

  any regression would likely involve problems during logging using the libqb logging functions, which could include failure to log or even program exit and/or crash.

  [scope]

  this appears to be needed only for focal; the issue seems to be an interaction between the focal version of binutils and some linker "magic" that libqb used in the focal version.

  The upstream libqb removed/replaced that linker "magic" after the version in focal, so this should not affect groovy or later. However, the fix changes the ABI and thus isn't appropriate for SRUing.
  https://github.com/ClusterLabs/libqb/pull/322

- The binutils in bionic and earlier does not appear to cause the
- problematic behavior with the libqb linker "magic", so no change is
- needed there.
+ The libqb code in bionic does not include the linker "magic" and so does
+ not have this problem.

  [other info]

  related debian binutils bug report:
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=923246

  related gcc bug report:
  https://sourceware.org/bugzilla/show_bug.cgi?id=24276

  however, those appear to only have changed binutils to ignore the issue to allow the build to stop failing.

  The libqb docs do contain two suggestions to possibly work around this bug, specifically using either -l:libqb.so.0 or -DQB_KILL_ATTRIBUTE_SECTION, or both. Either or both approaches do help with the simple test case, but more testing is needed that actually exercises the log functionality to make sure nothing else breaks.

  $ gcc -flto -D_GNU_SOURCE -o test test.c -lqb -ldl
  /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T?
- $ ./test
+ $ ./test
  test: test.c:4: test: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed.
  Aborted (core dumped)

  $ gcc -flto -D_GNU_SOURCE -o test test.c -l:libqb.so.0 -ldl
- $ ./test
+ $ ./test

  $ gcc -flto -DQB_KILL_ATTRIBUTE_SECTION -D_GNU_SOURCE -o test test.c -lqb -ldl
  /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T?
- $ ./test
-
+ $ ./test

  [original description]
-
- When a clustered node is detected as failed the remaining node tries to fence the resources. When using pacemaker with gfs2 on an lvm2 logical volume dlm_controld calls out to dlm_stonith to release any locks held.
+ When a clustered node is detected as failed the remaining node tries to
+ fence the resources. When using pacemaker with gfs2 on an lvm2 logical
+ volume dlm_controld calls out to dlm_stonith to release any locks held.

  Due to a build issue with the version of libqb that pacemaker is compiled against, the call to QB_LOG_INIT_DATA which is #defined to CRM_TRACE_INIT_DATA, fails with an assertion.

  This prevents the lock manager from releasing any held locks on the failed node. At this point the gfs2 filesystem cannot be accessed and after any resource timeouts are met, the resource is marked as failed.

  Calling dlm_stonith by hand with the data that is passed to it by dlm_controld shows the assertion.

  root@u2004-1:~# /usr/sbin/dlm_stonith -n 2 -t 1612361398
  dlm_stonith: utils.c:57: common: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed.

  It would appear that the code in libqb is over aggressive on the sanity checking, or assumes that
[Bug 1915828] Re: pacemaker fails to release clustered filesystem dlm locks on failover
** Also affects: pacemaker (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Changed in: pacemaker (Ubuntu)
       Status: New => Fix Released

** Changed in: pacemaker (Ubuntu Focal)
     Assignee: (unassigned) => Dariusz Gadomski (dgadomski)

** Changed in: pacemaker (Ubuntu Focal)
   Importance: Undecided => Medium

** Changed in: pacemaker (Ubuntu Focal)
       Status: New => In Progress

** Changed in: pacemaker (Ubuntu)
     Assignee: Dariusz Gadomski (dgadomski) => (unassigned)

** Changed in: pacemaker (Ubuntu)
   Importance: Medium => Undecided
[Bug 1915828] Re: pacemaker fails to release clustered filesystem dlm locks on failover
** Description changed:

- When a clustered node is detected as failed the remaining node tries to
- fence the resources. When using pacemaker with gfs2 on an lvm2 logical
- volume dlm_controld calls out to dlm_stonith to release any locks held.
+ [impact]
+
+ programs using libqb logging exit due to failed assertion on qb log init
+
+ [test case]
+
+ test program:
+
+
+ #include <qb/qblog.h>
+
+ QB_LOG_INIT_DATA(test);
+
+ int main(int argc, char* argv[])
+ {
+     return 0;
+ }
+
+
+ compile and run:
+
+ $ gcc -flto -D_GNU_SOURCE -o test test.c -lqb -ldl
+ /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T?
+
+ $ ./test
+ test: test.c:4: test: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed.
+ Aborted (core dumped)
+
+
+ Note the error is slightly different when compiling without lto:
+
+ $ gcc -D_GNU_SOURCE -o test test.c -lqb -ldl
+ /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T?
+
+ $ ./test
+ test: test.c:4: test: Assertion `"implicit callsite section is populated, otherwise target's build is at fault, preventing reliable logging" && QB_ATTR_SECTION_START != QB_ATTR_SECTION_STOP' failed.
+ Aborted (core dumped)
+
+
+ [regression potential]
+
+ any regression would likely involve problems during logging using the
+ libqb logging functions, which could include failure to log or even
+ program exit and/or crash.
+
+ [scope]
+
+ this appears to be needed only for focal; the issue seems to be an
+ interaction between the focal version of binutils and some linker
+ "magic" that libqb used in the focal version.
+
+ The upstream libqb removed/replaced that linker "magic" after the version in focal, so this should not affect groovy or later. However, the fix changes the ABI and thus isn't appropriate for SRUing.
+ https://github.com/ClusterLabs/libqb/pull/322
+
+ The binutils in bionic and earlier does not appear to cause the
+ problematic behavior with the libqb linker "magic", so no change is
+ needed there.
+
+ [other info]
+
+ related debian binutils bug report:
+ https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=923246
+
+ related gcc bug report:
+ https://sourceware.org/bugzilla/show_bug.cgi?id=24276
+
+ however, those appear to only have changed binutils to ignore the issue
+ to allow the build to stop failing.
+
+ The libqb docs do contain two suggestions to possibly work around this
+ bug, specifically using either -l:libqb.so.0 or
+ -DQB_KILL_ATTRIBUTE_SECTION, or both. Either or both approaches do help
+ with the simple test case, but more testing is needed that actually
+ exercises the log functionality to make sure nothing else breaks.
+
+ $ gcc -flto -D_GNU_SOURCE -o test test.c -lqb -ldl
+ /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T?
+ $ ./test
+ test: test.c:4: test: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed.
+ Aborted (core dumped)
+
+ $ gcc -flto -D_GNU_SOURCE -o test test.c -l:libqb.so.0 -ldl
+ $ ./test
+
+ $ gcc -flto -DQB_KILL_ATTRIBUTE_SECTION -D_GNU_SOURCE -o test test.c -lqb -ldl
+ /usr/bin/ld: warning: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libqb.so contains output sections; did you forget -T?
+ $ ./test
+
+
+ [original description]
+
+
+ When a clustered node is detected as failed the remaining node tries to fence the resources. When using pacemaker with gfs2 on an lvm2 logical volume dlm_controld calls out to dlm_stonith to release any locks held.

  Due to a build issue with the version of libqb that pacemaker is
  compiled against, the call to QB_LOG_INIT_DATA which is #defined to
  CRM_TRACE_INIT_DATA, fails with an assertion. This prevents the lock
  manager from releasing any held locks on the failed node. At this point
  the gfs2 filesystem cannot be accessed and after any resource timeouts
  are met, the resource is marked as failed.

  Calling dlm_stonith by hand with the data that is passed to it by
  dlm_controld shows the assertion.

  root@u2004-1:~# /usr/sbin/dlm_stonith -n 2 -t 1612361398
  dlm_stonith: utils.c:57: common: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed.

  It would appear that the code in libqb is over aggressive on the sanity
  checking, or assumes that QB_LOG_INIT_DATA will only be called by the
  library. External programs such as pacemaker that end up calling
  CRM_TRACE_INIT_DATA will suffer the same assertion. This
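The assertion text refers to a "callsite section" bounded by QB_ATTR_SECTION_START and QB_ATTR_SECTION_STOP, so the linker "magic" presumably resembles the common ELF idiom sketched below. Records are placed in a named section, and the linker-provided __start_<name> and __stop_<name> symbols mark its boundaries. This is a minimal illustration, not libqb's implementation. It assumes GCC or Clang with a GNU-compatible linker on an ELF target, and the section name demo_sites and every demo_* identifier are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record; libqb's real callsite structure carries more fields. */
struct demo_callsite {
    const char *function;
    int lineno;
};

/*
 * Place a record into the ELF section "demo_sites". "used" keeps the
 * compiler from discarding it; the explicit alignment keeps records
 * packed back to back so the section can be walked like an array.
 */
#define DEMO_SITE_ATTR \
    __attribute__((section("demo_sites"), used, \
                   aligned(__alignof__(struct demo_callsite))))

static struct demo_callsite site_a DEMO_SITE_ATTR = { "main", 10 };
static struct demo_callsite site_b DEMO_SITE_ATTR = { "helper", 20 };

/* Synthesized by the linker for sections whose name is a valid C identifier. */
extern struct demo_callsite __start_demo_sites[];
extern struct demo_callsite __stop_demo_sites[];

/* Number of records the linker collected into the section. */
size_t demo_site_count(void)
{
    return (size_t)(__stop_demo_sites - __start_demo_sites);
}

/* Walk the section and return the function name recorded for lineno. */
const char *demo_function_at(int lineno)
{
    struct demo_callsite *p;

    for (p = __start_demo_sites; p < __stop_demo_sites; p++) {
        if (p->lineno == lineno)
            return p->function;
    }
    return NULL;
}

/* Sanity check in the spirit of the "section is populated" assertion. */
void demo_assert_populated(void)
{
    assert("demo section is populated" && demo_site_count() != 0);
}
```

If the toolchain does not lay out the section or its boundary symbols as the code expects, a start-up check like demo_assert_populated() is the kind of test that would fail, which matches the shape of the non-LTO failure quoted in the test case.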
[Bug 1915828] Re: pacemaker fails to release clustered filesystem dlm locks on failover
** Changed in: pacemaker (Ubuntu)
     Assignee: (unassigned) => Dariusz Gadomski (dgadomski)

** Changed in: pacemaker (Ubuntu)
   Importance: Undecided => Medium
[Bug 1915828] Re: pacemaker fails to release clustered filesystem dlm locks on failover
** Tags added: server-next
[Bug 1915828] Re: pacemaker fails to release clustered filesystem dlm locks on failover
** Tags added: sts