Re: [ClusterLabs] lrmd segfault
ale...@kurnosov.spb.ru writes:

> [ Unknown signature status ]
>
> Hi All.
>
> We have the heterogeneous corosync/pacemaker cluster of 5 nodes: 3
> SL7(Scientific linux) and 2 SL6.
> SL7 pacemaker installed from a standard repo (corosync - 2.3.4, pacemaker -
> 1.1.13-10), SL6 build from sources (same version).
> The cluster not unified, some nodes have RA which other do not have. crmsh
> used for management.
> SL6 nodes runs surprisingly smoothly, but SL7 steady segfaulting in the
> exactly same place.
> Here is an example:
>

Just from looking at the core dump, it looks like your processor doesn't
support the SSE extensions used by the newer version of the code. You'll
need to recompile and disable use of those extensions.

It looks like the code is using SSE 4.2, which is relatively new:

https://en.wikipedia.org/wiki/SSE4#SSE4.2

Cheers,
Kristoffer

> Core was generated by `/usr/libexec/pacemaker/lrmd'.
> Program terminated with signal 11, Segmentation fault.
> #0 __strcasecmp_l_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
> 164 movdqu (%rdi), %xmm1
> (gdb) bt
> #0 __strcasecmp_l_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
> #1 0x7fed076136dc in crm_str_eq (a=, b=b@entry=0xed7070 "DRBD_D16", use_case=use_case@entry=0) at utils.c:1416
> #2 0x7fed073eaafa in is_op_blocked (rsc=0xed7070 "DRBD_D16") at services.c:644
> #3 0x7fed073eac1d in services_action_async (op=0xed58e0, action_callback=) at services.c:625
> #4 0x00404e4a in lrmd_rsc_execute_service_lib (cmd=0xed9e10, rsc=0xed4500) at lrmd.c:1242
> #5 lrmd_rsc_execute (rsc=0xed4500) at lrmd.c:1308
> #6 lrmd_rsc_dispatch (user_data=0xed4500, user_data@entry=<error reading variable: value has been optimized out>) at lrmd.c:1317
> #7 0x7fed07634c73 in crm_trigger_dispatch (source=0xed54c0, callback=, userdata=) at mainloop.c:107
> #8 0x7fed055cb7aa in g_main_dispatch (context=0xeb4d40) at gmain.c:3109
> #9 g_main_context_dispatch (context=context@entry=0xeb4d40) at gmain.c:3708
> #10 0x7fed055cbaf8 in g_main_context_iterate (context=0xeb4d40, block=block@entry=1, dispatch=dispatch@entry=1, self=) at gmain.c:3779
> #11 0x7fed055cbdca in g_main_loop_run (loop=0xe96510) at gmain.c:3973
> #12 0x004028ce in main (argc=, argv=0x7ffe9b3b0fd8) at main.c:476
>
> Any help would be appreciated.
>
> --
> Alexey Kurnosov

--
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
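
One way to check the SSE 4.2 hypothesis above on an affected node is to look at the feature flags the kernel reports for the CPU. The following is only a minimal sketch, assuming a Linux x86 host where /proc/cpuinfo lists SSE 4.2 as `sse4_2`; the function names are made up for this illustration and are not part of pacemaker or glibc.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags found in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Each logical CPU has one "flags : ..." line; merge them all.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_sse42(cpuinfo_text):
    """True if the text advertises SSE 4.2 (spelled 'sse4_2' on Linux/x86)."""
    return "sse4_2" in cpu_flags(cpuinfo_text)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the kernel's CPU description (assumption: Linux only)."""
    with open(path) as handle:
        return handle.read()
```

On a node, `has_sse42(read_cpuinfo())` returning False would be consistent with the explanation above; True would suggest looking elsewhere, for example at the pointer arguments shown in the backtrace.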
Re: [ClusterLabs] Corosync maximum nodes
Thank you for the reply and the useful link.
Have a nice day!

2017-01-27 17:43 GMT+08:00 Гюльнара Невежина :
> Hello!
> I'm very sorry to disturb you with such question but I can't find
> information if there is maximum nodes' limit in corosync? I've found a bug
> report https://bugzilla.redhat.com/show_bug.cgi?id=905296#c5 with
> "Corosync has hardcoded maximum number of nodes to 64" but it was posted 4
> years ago..
> If anybody knows how many nodes I can add to future HA cluster?
>
> Any tips or links would be much appreciated.
> Thank you!
Re: [ClusterLabs] Pacemaker kill does not cause node fault ???
On 10/01/17 05:24 AM, Stefan Schloesser wrote:
> Hi,
>
> I am currently testing a 2 node cluster under Ubuntu 16.04. The setup seems
> to be working ok including the STONITH.
> For test purposes I issued a "pkill -f pace" killing all pacemaker processes
> on one node.
>
> Result:
> The node is marked as "pending", all resources stay on it. If I manually kill
> a resource it is not noticed. On the other node a drbd "promote" command
> fails (drbd is still running as master on the first node).
>
> Killing the corosync process works as expected -> STONITH.
>
> Could someone shed some light on this behavior?
>
> Thanks,
>
> Stefan

A good way to test fencing is to run 'echo c > /proc/sysrq-trigger', which
crashes the kernel immediately. The only recovery is a reboot, so it's
excellent for simulating a hung node.

Make sure, too, that you've hooked DRBD's fencing into pacemaker with
'fencing resource-and-stonith;' and by using the crm-{un,}fence-peer.sh
scripts as the {un,}fence handlers.

If these are bare-iron nodes, also test by pulling the power out of the
node entirely while it is running. If you can pass both of these tests, you
will have simulated almost all possible node failure modes (I say 'almost'
because it is impossible to think of everything :) ).

--
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
Re: [ClusterLabs] Can't create a resource for ocf:heartbeat:oracle and oraclsnr.
You can start by reading the meta-data section of the resource agent:

https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/oracle

2017-01-31 0:21 GMT+01:00 Jihed M'selmi :
> I wish I could :-/
>
> All I am asking about the requirement to use the Resource Agent
> OCF:heartbeat:oracle and ocf:heartbeat:orclsnr.
>
> Thanks
>
> Jihed M’SELMI
> Mobile: +21658433664
> http://about.me/jihed.mselmi
>
> On Tue, Jan 31, 2017 at 12:16 AM, emmanuel segura wrote:
>>
>> please, if you need help, the first thing is show, your cluster
>> configuration.
>>
>> 2017-01-30 23:15 GMT+01:00 Jihed M'selmi :
>> > I tried to install two resources: a resource for oracle database and
>> > oracle listener: but the pcmk can't install the resource (red hat 7.3)
>> > using the ocf:heartbeat:oracle and oraclsnr
>> >
>> > On the log, it shows that the sqlplus was not installed.
>> >
>> > I installed it, but, I keep getting the same message and the resources
>> > were not installed.
>> >
>> > Is there any requirement to use OCF:HEARTBEAT:ORACLE and ORACLSNR ?
>> >
>> > In my resource group, I have One rsc for IP, Three rsc for filesystems
>> > where I have the oracle binary, db and backup and I should have Two more
>> > rsc for database and listener.
>> >
>> > Could anyone share how to configure a pacemaker and corosync to host an
>> > Oracle database on two nodes ? (or more).
>> >
>> > Thanks in advance.
>> > Cheers,
>> > JM
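
As a rough illustration of what reading the meta-data buys you: an OCF agent describes its parameters in an XML document (printed when the agent script is invoked with the `meta-data` action), and the parameters marked `required="1"` are the ones a resource definition must supply. The sketch below parses such a document with Python's standard library. The sample XML and its parameter names are hypothetical placeholders, not the real output of the oracle agent.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata in the usual OCF layout; NOT the real oracle agent output.
SAMPLE_METADATA = """
<resource-agent name="example-agent">
  <parameters>
    <parameter name="sid" required="1" unique="1">
      <shortdesc lang="en">instance identifier</shortdesc>
      <content type="string"/>
    </parameter>
    <parameter name="home" required="0">
      <shortdesc lang="en">installation directory</shortdesc>
      <content type="string"/>
    </parameter>
  </parameters>
</resource-agent>
"""


def required_parameters(metadata_xml):
    """Return the names of parameters flagged required="1", in document order."""
    root = ET.fromstring(metadata_xml)
    return [
        param.get("name")
        for param in root.iter("parameter")
        if param.get("required") == "1"
    ]


def all_parameters(metadata_xml):
    """Return every parameter name declared in the metadata."""
    root = ET.fromstring(metadata_xml)
    return [param.get("name") for param in root.iter("parameter")]
```

Feeding the agent's actual meta-data output to `required_parameters` gives a checklist to compare against the resource definition that fails to start.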
Re: [ClusterLabs] Pacemaker kill does not cause node fault ???
On 01/10/2017 04:24 AM, Stefan Schloesser wrote:
> Hi,
>
> I am currently testing a 2 node cluster under Ubuntu 16.04. The setup seems
> to be working ok including the STONITH.
> For test purposes I issued a "pkill -f pace" killing all pacemaker processes
> on one node.
>
> Result:
> The node is marked as "pending", all resources stay on it. If I manually kill
> a resource it is not noticed. On the other node a drbd "promote" command
> fails (drbd is still running as master on the first node).
>
> Killing the corosync process works as expected -> STONITH.
>
> Could someone shed some light on this behavior?
>
> Thanks,
>
> Stefan

I suspect that, when you kill pacemakerd, systemd respawns it quickly
enough that fencing is unnecessary. Try "pkill -f pace; systemctl stop
pacemaker".

Did you schedule monitor operations on your resources? If not, pacemaker
will not know if they go down.
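
To make the monitor question concrete: in the CIB, each resource is a `primitive` element whose `operations` child lists its `op` entries, and a recurring monitor is an `op` with `name="monitor"` and a non-zero `interval`. The sketch below lists primitives that have no such operation. It is an illustration under assumptions: the sample CIB is a hypothetical fragment, and the interval check only treats "0" and "0s" as non-recurring, so other spellings of zero would need extra handling.

```python
import xml.etree.ElementTree as ET

# Hypothetical CIB fragment for illustration; resource ids are made up.
SAMPLE_CIB = """
<cib>
  <configuration>
    <resources>
      <primitive id="ip" class="ocf" provider="heartbeat" type="IPaddr2">
        <operations>
          <op id="ip-monitor" name="monitor" interval="30s"/>
        </operations>
      </primitive>
      <primitive id="fs" class="ocf" provider="heartbeat" type="Filesystem">
        <operations>
          <op id="fs-start" name="start" interval="0"/>
        </operations>
      </primitive>
      <primitive id="db" class="ocf" provider="heartbeat" type="oracle"/>
    </resources>
  </configuration>
</cib>
"""


def primitives_without_monitor(cib_xml):
    """Return ids of primitives lacking a recurring monitor operation."""
    root = ET.fromstring(cib_xml)
    missing = []
    for primitive in root.iter("primitive"):
        has_recurring_monitor = any(
            op.get("name") == "monitor"
            and op.get("interval", "0") not in ("0", "0s")  # simplistic zero check
            for op in primitive.iter("op")
        )
        if not has_recurring_monitor:
            missing.append(primitive.get("id"))
    return missing
```

The live configuration can be dumped as XML with `cibadmin --query` and passed to the function; any id it returns is a resource whose failure pacemaker would not notice on its own.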
Re: [ClusterLabs] Can't create a resource for ocf:heartbeat:oracle and oraclsnr.
I wish I could :-/

All I am asking about is the requirements for using the resource agents
OCF:heartbeat:oracle and ocf:heartbeat:orclsnr.

Thanks

Jihed M’SELMI
Mobile: +21658433664
http://about.me/jihed.mselmi

On Tue, Jan 31, 2017 at 12:16 AM, emmanuel segura wrote:
> please, if you need help, the first thing is show, your cluster
> configuration.
>
> 2017-01-30 23:15 GMT+01:00 Jihed M'selmi :
> > I tried to install two resources: a resource for oracle database and
> > oracle listener: but the pcmk can't install the resource (red hat 7.3)
> > using the ocf:heartbeat:oracle and oraclsnr
> >
> > On the log, it shows that the sqlplus was not installed.
> >
> > I installed it, but, I keep getting the same message and the resources
> > were not installed.
> >
> > Is there any requirement to use OCF:HEARTBEAT:ORACLE and ORACLSNR ?
> >
> > In my resource group, I have One rsc for IP, Three rsc for filesystems
> > where I have the oracle binary, db and backup and I should have Two more
> > rsc for database and listener.
> >
> > Could anyone share how to configure a pacemaker and corosync to host an
> > Oracle database on two nodes ? (or more).
> >
> > Thanks in advance.
> > Cheers,
> > JM
Re: [ClusterLabs] Can't create a resource for ocf:heartbeat:oracle and oraclsnr.
Please, if you need help, the first thing to do is show your cluster
configuration.

2017-01-30 23:15 GMT+01:00 Jihed M'selmi :
> I tried to install two resources: a resource for oracle database and oracle
> listener: but the pcmk can't install the resource (red hat 7.3) using the
> ocf:heartbeat:oracle and oraclsnr
>
> On the log, it shows that the sqlplus was not installed.
>
> I installed it, but, I keep getting the same message and the resources were
> not installed.
>
> Is there any requirement to use OCF:HEARTBEAT:ORACLE and ORACLSNR ?
>
> In my resource group, I have One rsc for IP, Three rsc for filesystems
> where I have the oracle binary, db and backup and I should have Two more rsc
> for database and listener.
>
> Could anyone share how to configure a pacemaker and corosync to host an
> Oracle database on two nodes ? (or more).
>
> Thanks in advance.
> Cheers,
> JM
[ClusterLabs] Can't create a resource for ocf:heartbeat:oracle and oraclsnr.
I tried to install two resources, one for the Oracle database and one for
the Oracle listener, but pcmk can't install the resources (Red Hat 7.3)
using ocf:heartbeat:oracle and oraclsnr.

The log shows that sqlplus was not installed.

I installed it, but I keep getting the same message and the resources
were not installed.

Are there any requirements for using OCF:HEARTBEAT:ORACLE and ORACLSNR?

In my resource group, I have one rsc for the IP, three rsc for the
filesystems holding the Oracle binaries, db and backup, and I should have
two more rsc for the database and the listener.

Could anyone share how to configure pacemaker and corosync to host an
Oracle database on two (or more) nodes?

Thanks in advance.
Cheers,
JM
Re: [ClusterLabs] HA/Clusterlabs Summit 2017 Proposal
On 30/01/17 09:23 AM, Kristoffer Grönlund wrote:
> Hi everyone!
>
> The last time we had an HA summit was in 2015, and the intention then
> was to have SUSE arrange the next meetup in the following year. We did
> try to find a date that would be suitable for everyone, but for various
> reasons there was never a conclusion and 2016 came and went.
>
> Well, I'd like to give it another try this year! This time, I've already
> got a proposal for a place and date: September 7-8 in Nuremberg, Germany
> (SUSE main office). I've got the new event area in the SUSE office
> already reserved for these dates.
>
> My suggestion is to do a two day event similar to the one in Brno, but I
> am open to any suggestions as to format and content. The main reason for
> having the event would be for everyone to have a chance to meet and get
> to know each other, but it's also an opportunity to discuss the future
> of Clusterlabs and the direction going forward.
>
> Any thoughts or feedback are more than welcome! Let me know if you are
> interested in coming or unable to make it.
>
> Cheers,
> Kristoffer

Thank you for starting this back up. I was just thinking about this a few
days ago.

I could make it, and I would be happy to help organize it however I might
be able to help.

--
Digimer
[ClusterLabs] lrmd segfault
Hi All.

We have a heterogeneous corosync/pacemaker cluster of 5 nodes: 3 SL7
(Scientific Linux) and 2 SL6.
On SL7, pacemaker is installed from a standard repo (corosync - 2.3.4,
pacemaker - 1.1.13-10); on SL6 it is built from source (same version).
The cluster is not uniform: some nodes have RAs that others do not have.
crmsh is used for management.
The SL6 nodes run surprisingly smoothly, but the SL7 nodes steadily
segfault in exactly the same place.
Here is an example:

Core was generated by `/usr/libexec/pacemaker/lrmd'.
Program terminated with signal 11, Segmentation fault.
#0 __strcasecmp_l_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
164 movdqu (%rdi), %xmm1
(gdb) bt
#0 __strcasecmp_l_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
#1 0x7fed076136dc in crm_str_eq (a=, b=b@entry=0xed7070 "DRBD_D16", use_case=use_case@entry=0) at utils.c:1416
#2 0x7fed073eaafa in is_op_blocked (rsc=0xed7070 "DRBD_D16") at services.c:644
#3 0x7fed073eac1d in services_action_async (op=0xed58e0, action_callback=) at services.c:625
#4 0x00404e4a in lrmd_rsc_execute_service_lib (cmd=0xed9e10, rsc=0xed4500) at lrmd.c:1242
#5 lrmd_rsc_execute (rsc=0xed4500) at lrmd.c:1308
#6 lrmd_rsc_dispatch (user_data=0xed4500, user_data@entry=) at lrmd.c:1317
#7 0x7fed07634c73 in crm_trigger_dispatch (source=0xed54c0, callback=, userdata=) at mainloop.c:107
#8 0x7fed055cb7aa in g_main_dispatch (context=0xeb4d40) at gmain.c:3109
#9 g_main_context_dispatch (context=context@entry=0xeb4d40) at gmain.c:3708
#10 0x7fed055cbaf8 in g_main_context_iterate (context=0xeb4d40, block=block@entry=1, dispatch=dispatch@entry=1, self=) at gmain.c:3779
#11 0x7fed055cbdca in g_main_loop_run (loop=0xe96510) at gmain.c:3973
#12 0x004028ce in main (argc=, argv=0x7ffe9b3b0fd8) at main.c:476

Any help would be appreciated.

--
Alexey Kurnosov
[ClusterLabs] HA/Clusterlabs Summit 2017 Proposal
Hi everyone!

The last time we had an HA summit was in 2015, and the intention then
was to have SUSE arrange the next meetup in the following year. We did
try to find a date that would be suitable for everyone, but for various
reasons there was never a conclusion and 2016 came and went.

Well, I'd like to give it another try this year! This time, I've already
got a proposal for a place and date: September 7-8 in Nuremberg, Germany
(SUSE main office). I've got the new event area in the SUSE office
already reserved for these dates.

My suggestion is to do a two day event similar to the one in Brno, but I
am open to any suggestions as to format and content. The main reason for
having the event would be for everyone to have a chance to meet and get
to know each other, but it's also an opportunity to discuss the future
of Clusterlabs and the direction going forward.

Any thoughts or feedback are more than welcome! Let me know if you are
interested in coming or unable to make it.

Cheers,
Kristoffer

--
// Kristoffer Grönlund
// kgronl...@suse.com
Re: [ClusterLabs] Corosync maximum nodes
> Hello!
> I'm very sorry to disturb you with such question but I can't find
> information if there is maximum nodes' limit in corosync? I've found a bug
> report https://bugzilla.redhat.com/show_bug.cgi?id=905296#c5 with
> "Corosync has hardcoded maximum number of nodes to 64" but it was posted 4
> years ago..

This limit is gone in 2.x, but that doesn't mean corosync is able to
handle many more nodes without fine-tuning.

> If anybody knows how many nodes I can add to future HA cluster?

http://lists.clusterlabs.org/pipermail/users/2017-January/004764.html

But honestly, as Chrissie said, pacemaker-remote is usually a better way to
go (as long as you are not planning to use dlm on all nodes).

Honza

> Any tips or links would be much appreciated.
> Thank you!
Re: [ClusterLabs] Corosync maximum nodes
On 27/01/17 09:43, Гюльнара Невежина wrote:
> Hello!
> I'm very sorry to disturb you with such question but I can't find
> information if there is maximum nodes' limit in corosync? I've found a
> bug report https://bugzilla.redhat.com/show_bug.cgi?id=905296#c5 with
> "Corosync has hardcoded maximum number of nodes to 64" but it was posted
> 4 years ago..
> If anybody knows how many nodes I can add to future HA cluster?
>

Even at 64 nodes, corosync needs some tuning to make it reliable. If you
want to go above around 32 nodes then pacemaker-remote is probably the
least stressful (and recommended) way of doing it.

http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Remote/

Chrissie