Re: [ClusterLabs] lrmd segfault
Yes, it was running on bare metal. After upgrading to SL7.3 and pacemaker 1.1.15 the problem is gone. I hope it is gone for good.

--
Alexey Kurnosov

> ... and it is running on bare-metal?
> Just to be sure it is not due to some code-patching done by a hypervisor ...

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] lrmd segfault
On 01/31/2017 03:12 PM, ale...@kurnosov.spb.ru wrote:
> As I said, we used the RPM from the standard repo, so it is unlikely to have
> been compiled incorrectly. And according to the spec, the L5630 (the node's
> CPU) supports SSE4.2. And if the CPU lacked it, the result should be an
> illegal instruction exception, not a segfault.

... and it is running on bare-metal?
Just to be sure it is not due to some code-patching done by a hypervisor ...

>
> --
> Alexey Kurnosov
>
> On Tue, Jan 31, 2017 at 07:34:18AM +0100, Kristoffer Grönlund wrote:
>> ale...@kurnosov.spb.ru writes:
>>
>>> [ Unknown signature status ]
>>>
>>> Hi All.
>>>
>>> We have a heterogeneous corosync/pacemaker cluster of 5 nodes: 3 SL7
>>> (Scientific Linux) and 2 SL6.
>>> On SL7, pacemaker is installed from the standard repo (corosync 2.3.4,
>>> pacemaker 1.1.13-10); on SL6 it is built from source (same version).
>>> The cluster is not uniform: some nodes have RAs that others do not.
>>> crmsh is used for management.
>>> The SL6 nodes run surprisingly smoothly, but the SL7 nodes segfault
>>> steadily in exactly the same place.
>>> Here is an example:
>>>
>> Just from looking at the core dump, it looks like your processor doesn't
>> support the SSE extensions used by the newer version of the code. You'll
>> need to recompile and disable use of those extensions.
>>
>> It looks like the code is using SSE 4.2, which is relatively new:
>>
>> https://en.wikipedia.org/wiki/SSE4#SSE4.2
>>
>> Cheers,
>> Kristoffer
>>
>>> Core was generated by `/usr/libexec/pacemaker/lrmd'.
>>> Program terminated with signal 11, Segmentation fault.
>>> #0  __strcasecmp_l_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
>>> 164     movdqu (%rdi), %xmm1
>>> (gdb) bt
>>> #0  __strcasecmp_l_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
>>> #1  0x7fed076136dc in crm_str_eq (a=<optimized out>, b=b@entry=0xed7070 "DRBD_D16", use_case=use_case@entry=0) at utils.c:1416
>>> #2  0x7fed073eaafa in is_op_blocked (rsc=0xed7070 "DRBD_D16") at services.c:644
>>> #3  0x7fed073eac1d in services_action_async (op=0xed58e0, action_callback=<optimized out>) at services.c:625
>>> #4  0x00404e4a in lrmd_rsc_execute_service_lib (cmd=0xed9e10, rsc=0xed4500) at lrmd.c:1242
>>> #5  lrmd_rsc_execute (rsc=0xed4500) at lrmd.c:1308
>>> #6  lrmd_rsc_dispatch (user_data=0xed4500, user_data@entry=<error reading variable: value has been optimized out>) at lrmd.c:1317
>>> #7  0x7fed07634c73 in crm_trigger_dispatch (source=0xed54c0, callback=<optimized out>, userdata=<optimized out>) at mainloop.c:107
>>> #8  0x7fed055cb7aa in g_main_dispatch (context=0xeb4d40) at gmain.c:3109
>>> #9  g_main_context_dispatch (context=context@entry=0xeb4d40) at gmain.c:3708
>>> #10 0x7fed055cbaf8 in g_main_context_iterate (context=0xeb4d40, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3779
>>> #11 0x7fed055cbdca in g_main_loop_run (loop=0xe96510) at gmain.c:3973
>>> #12 0x004028ce in main (argc=<optimized out>, argv=0x7ffe9b3b0fd8) at main.c:476
>>>
>>> Any help would be appreciated.
>>>
>>> --
>>> Alexey Kurnosov
>>
>> --
>> // Kristoffer Grönlund
>> // kgronl...@suse.com
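Both questions raised in this message (does the CPU advertise SSE4.2, and is the node a guest under a hypervisor) can be checked from the flags line of /proc/cpuinfo on an x86 Linux node. Below is a minimal Python sketch under that assumption. The helper names cpu_flags and has_flag and the SAMPLE text are made up for illustration; sse4_2 and hypervisor are the flag names the Linux kernel reports on x86. A missing hypervisor flag is only a hint, since a hypervisor can choose to hide it.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_flag(cpuinfo_text, flag):
    """True if the given flag name appears in the parsed flags set."""
    return flag in cpu_flags(cpuinfo_text)


# Hypothetical sample text, not taken from the node discussed in this thread.
SAMPLE = "processor\t: 0\nflags\t\t: fpu sse sse2 ssse3 sse4_1 sse4_2 popcnt\n"

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        # Fall back to the sample when /proc/cpuinfo is not available.
        text = SAMPLE
    print("sse4_2:", has_flag(text, "sse4_2"))
    print("hypervisor:", has_flag(text, "hypervisor"))
```

Running it on each SL6 and SL7 node would show whether the nodes differ in either flag.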
Re: [ClusterLabs] lrmd segfault
As I said, we used the RPM from the standard repo, so it is unlikely to have been compiled incorrectly. And according to the spec, the L5630 (the node's CPU) supports SSE4.2. And if the CPU lacked it, the result should be an illegal instruction exception, not a segfault.

--
Alexey Kurnosov

On Tue, Jan 31, 2017 at 07:34:18AM +0100, Kristoffer Grönlund wrote:
> ale...@kurnosov.spb.ru writes:
>
> > [ Unknown signature status ]
> >
> > Hi All.
> >
> > We have a heterogeneous corosync/pacemaker cluster of 5 nodes: 3 SL7
> > (Scientific Linux) and 2 SL6.
> > On SL7, pacemaker is installed from the standard repo (corosync 2.3.4,
> > pacemaker 1.1.13-10); on SL6 it is built from source (same version).
> > The cluster is not uniform: some nodes have RAs that others do not.
> > crmsh is used for management.
> > The SL6 nodes run surprisingly smoothly, but the SL7 nodes segfault
> > steadily in exactly the same place.
> > Here is an example:
> >
>
> Just from looking at the core dump, it looks like your processor doesn't
> support the SSE extensions used by the newer version of the code. You'll
> need to recompile and disable use of those extensions.
>
> It looks like the code is using SSE 4.2, which is relatively new:
>
> https://en.wikipedia.org/wiki/SSE4#SSE4.2
>
> Cheers,
> Kristoffer
>
> > Core was generated by `/usr/libexec/pacemaker/lrmd'.
> > Program terminated with signal 11, Segmentation fault.
> > #0  __strcasecmp_l_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
> > 164     movdqu (%rdi), %xmm1
> > (gdb) bt
> > #0  __strcasecmp_l_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
> > #1  0x7fed076136dc in crm_str_eq (a=<optimized out>, b=b@entry=0xed7070 "DRBD_D16", use_case=use_case@entry=0) at utils.c:1416
> > #2  0x7fed073eaafa in is_op_blocked (rsc=0xed7070 "DRBD_D16") at services.c:644
> > #3  0x7fed073eac1d in services_action_async (op=0xed58e0, action_callback=<optimized out>) at services.c:625
> > #4  0x00404e4a in lrmd_rsc_execute_service_lib (cmd=0xed9e10, rsc=0xed4500) at lrmd.c:1242
> > #5  lrmd_rsc_execute (rsc=0xed4500) at lrmd.c:1308
> > #6  lrmd_rsc_dispatch (user_data=0xed4500, user_data@entry=<error reading variable: value has been optimized out>) at lrmd.c:1317
> > #7  0x7fed07634c73 in crm_trigger_dispatch (source=0xed54c0, callback=<optimized out>, userdata=<optimized out>) at mainloop.c:107
> > #8  0x7fed055cb7aa in g_main_dispatch (context=0xeb4d40) at gmain.c:3109
> > #9  g_main_context_dispatch (context=context@entry=0xeb4d40) at gmain.c:3708
> > #10 0x7fed055cbaf8 in g_main_context_iterate (context=0xeb4d40, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3779
> > #11 0x7fed055cbdca in g_main_loop_run (loop=0xe96510) at gmain.c:3973
> > #12 0x004028ce in main (argc=<optimized out>, argv=0x7ffe9b3b0fd8) at main.c:476
> >
> > Any help would be appreciated.
> >
> > --
> > Alexey Kurnosov
>
> --
> // Kristoffer Grönlund
> // kgronl...@suse.com
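The illegal-instruction-versus-segfault argument comes down to which signal killed the process: an instruction the CPU does not support normally raises SIGILL, while the core dump here reports signal 11. As a rough illustration, the Python sketch below maps the number in the gdb line to its name using the standard signal module. The helper names terminating_signal and signal_name are invented for this example, and the numbering assumes a Linux host.

```python
import re
import signal


def terminating_signal(gdb_line):
    """Extract the signal number from a gdb 'Program terminated with signal N' line."""
    match = re.search(r"terminated with signal (\d+)", gdb_line)
    return int(match.group(1)) if match else None


def signal_name(signum):
    """Map a signal number to its symbolic name on the current platform."""
    return signal.Signals(signum).name


if __name__ == "__main__":
    line = "Program terminated with signal 11, Segmentation fault."
    num = terminating_signal(line)
    # Name of the signal that actually killed lrmd.
    print(num, signal_name(num))
    # For comparison: the signal an unsupported instruction would raise.
    print(int(signal.SIGILL), signal_name(signal.SIGILL))
```

The faulting line in frame #0, movdqu (%rdi), %xmm1, is a load from memory through %rdi, which fits a bad address better than a missing instruction set, though the sketch itself only decodes the signal.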