Re: [ClusterLabs] lrmd segfault

2017-02-01 Thread alexey

Yes, it was running on bare metal.

After upgrading to SL7.3 and pacemaker 1.1.15, the problem is gone.
Hopefully for good.

--
Alexey Kurnosov

> ... and it is running on bare-metal?
> Just to be sure it is not due to some code-patching done by a hypervisor ...
> 


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] lrmd segfault

2017-01-31 Thread Klaus Wenninger
On 01/31/2017 03:12 PM, ale...@kurnosov.spb.ru wrote:
> As I said, we used the RPM from the standard repo, so it is unlikely to
> have been compiled incorrectly. According to the spec, the L5630 (the
> node's CPU) supports SSE4.2. Besides, an unsupported instruction should
> raise an illegal instruction exception, not a segfault.

... and it is running on bare-metal?
Just to be sure it is not due to some code-patching done by a hypervisor ...
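
Both questions in this exchange, whether the CPU advertises SSE4.2 and whether the node is a hypervisor guest, can be checked from /proc/cpuinfo on x86 Linux. The following is a minimal sketch, not something posted in the thread: the helper name cpu_flags is hypothetical, and it assumes the kernel's usual "flags" line, where SSE4.2 appears as sse4_2 and a guest normally shows a hypervisor flag.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text.

    Assumes the x86 layout ("flags : fpu vme ..."); other architectures
    use different keys, in which case an empty set is returned.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        flags = cpu_flags(path.read_text())
        print("sse4_2 advertised:", "sse4_2" in flags)
        print("hypervisor flag set:", "hypervisor" in flags)
```

A missing hypervisor flag is only a hint of bare metal, since a hypervisor can choose to hide it.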






Re: [ClusterLabs] lrmd segfault

2017-01-31 Thread alexey

As I said, we used the RPM from the standard repo, so it is unlikely to
have been compiled incorrectly. According to the spec, the L5630 (the
node's CPU) supports SSE4.2. Besides, an unsupported instruction should
raise an illegal instruction exception, not a segfault.
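
To make the distinction concrete: on Linux an instruction the CPU cannot execute is normally reported as SIGILL, whereas the core dump in this thread reports signal 11. Below is a small sketch that maps signal numbers to their names with Python's standard signal module; the helper name signal_name is hypothetical and not part of pacemaker or gdb.

```python
import signal


def signal_name(number):
    """Return the symbolic name of a signal number on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # Not a signal number known to this platform.
        return "unknown signal {}".format(number)


if __name__ == "__main__":
    # Compare the signal an unsupported instruction would raise (SIGILL)
    # with the one reported in the core dump (SIGSEGV).
    for sig in (signal.SIGILL, signal.SIGSEGV):
        print(int(sig), signal_name(int(sig)))
```

On x86-64 Linux this lists SIGILL as 4 and SIGSEGV as 11, so a crash with signal 11 is more consistent with an invalid memory access than with an unsupported instruction.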

--
Alexey Kurnosov

On Tue, Jan 31, 2017 at 07:34:18AM +0100, Kristoffer Grönlund wrote:
> ale...@kurnosov.spb.ru writes:
> 
> > [ Unknown signature status ]
> >
> > Hi All.
> >
> > We have a heterogeneous corosync/pacemaker cluster of 5 nodes: 3 SL7
> > (Scientific Linux) and 2 SL6.
> > On SL7, pacemaker is installed from the standard repo (corosync 2.3.4,
> > pacemaker 1.1.13-10); on SL6 it is built from source (same version).
> > The cluster is not uniform: some nodes have RAs (resource agents) that
> > others do not. crmsh is used for management.
> > The SL6 nodes run surprisingly smoothly, but the SL7 nodes keep
> > segfaulting in exactly the same place.
> > Here is an example:
> >
> 
> Just from looking at the core dump, it looks like your processor doesn't
> support the SSE extensions used by the newer version of the code. You'll
> need to recompile and disable use of those extensions.
> 
> It looks like the code is using SSE 4.2, which is relatively new:
> 
> https://en.wikipedia.org/wiki/SSE4#SSE4.2
> 
> Cheers,
> Kristoffer
> 
> > Core was generated by `/usr/libexec/pacemaker/lrmd'.
> > Program terminated with signal 11, Segmentation fault.
> > #0  __strcasecmp_l_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
> > 164 movdqu  (%rdi), %xmm1
> > (gdb) bt
> > #0  __strcasecmp_l_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
> > #1  0x7fed076136dc in crm_str_eq (a=<optimized out>, b=b@entry=0xed7070 "DRBD_D16", use_case=use_case@entry=0) at utils.c:1416
> > #2  0x7fed073eaafa in is_op_blocked (rsc=0xed7070 "DRBD_D16") at services.c:644
> > #3  0x7fed073eac1d in services_action_async (op=0xed58e0, action_callback=<optimized out>) at services.c:625
> > #4  0x00404e4a in lrmd_rsc_execute_service_lib (cmd=0xed9e10, rsc=0xed4500) at lrmd.c:1242
> > #5  lrmd_rsc_execute (rsc=0xed4500) at lrmd.c:1308
> > #6  lrmd_rsc_dispatch (user_data=0xed4500, user_data@entry=<error reading variable: value has been optimized out>) at lrmd.c:1317
> > #7  0x7fed07634c73 in crm_trigger_dispatch (source=0xed54c0, callback=<optimized out>, userdata=<optimized out>) at mainloop.c:107
> > #8  0x7fed055cb7aa in g_main_dispatch (context=0xeb4d40) at gmain.c:3109
> > #9  g_main_context_dispatch (context=context@entry=0xeb4d40) at gmain.c:3708
> > #10 0x7fed055cbaf8 in g_main_context_iterate (context=0xeb4d40, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3779
> > #11 0x7fed055cbdca in g_main_loop_run (loop=0xe96510) at gmain.c:3973
> > #12 0x004028ce in main (argc=<optimized out>, argv=0x7ffe9b3b0fd8) at main.c:476
> >
> > Any help would be appreciated.
> >
> > --
> > Alexey Kurnosov
> 
> -- 
> // Kristoffer Grönlund
> // kgronl...@suse.com

