Re: [Pacemaker] Two node cluster and no hardware device for stonith.

2015-01-27 Thread Andrea
emmanuel segura emi2fast@... writes:

 
 sorry, but i forgot to tell you, you need to know the fence_scsi
 doesn't reboot the evicted node, so you can combine fence_vmware with
 fence_scsi as the second option.
 
 for this, i'm trying to use a watchdog script 
https://access.redhat.com/solutions/65187

But when I start the watchdog daemon, all nodes reboot.
I'll continue testing...
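For context, the general idea of such a watchdog test is a small check that
fails once this node's key has disappeared from the device, so that watchdog
reboots the fenced node. A rough sketch only, reusing the sg_persist call from
this thread (the device path and key are examples, not the script from the
Red Hat article):

#!/bin/sh
# hypothetical watchdog test: exit non-zero once our key is gone
DEV=/dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4   # device from this thread
KEY=4d5a0001                                                 # this node's key
sg_persist -n --read-keys --device=$DEV | grep -qi "0x$KEY"

If the key is not registered at all (as turned out to be the case later in
this thread, because of a wrong device ID), such a check fails immediately,
which could explain every node rebooting as soon as watchdog starts.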



 2015-01-27 11:44 GMT+01:00 emmanuel segura emi2fast at gmail.com:
  In normal situation every node can in your file system, fence_scsi is
  used when your cluster is in split-braint, when your a node doesn't
  comunicate with the other node, i don't is good idea.
 

So, will I see key registration only when the nodes lose communication?





 
  2015-01-27 11:35 GMT+01:00 Andrea a.bacchi at codices.com:
  Andrea a.bacchi at ... writes:






___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Two node cluster and no hardware device for stonith.

2015-01-27 Thread emmanuel segura
please show your configuration and your logs.

2015-01-27 14:24 GMT+01:00 Andrea a.bac...@codices.com:
 emmanuel segura emi2fast@... writes:



 if you are using cman+pacemaker you need to enabled the stonith and
 configuring that in you crm config


 2015-01-27 14:05 GMT+01:00 Vinod Prabhu
 pvinod@gmail.com:
 is stonith enabled in crm conf?


 yes, stonith is enabled

 [ONE]pcs property
 Cluster Properties:
  cluster-infrastructure: cman
  dc-version: 1.1.11-97629de
  last-lrm-refresh: 1422285715
  no-quorum-policy: ignore
  stonith-enabled: true


 If I disable it, stonith device don't start



 On Tue, Jan 27, 2015 at 6:21 PM, emmanuel segura
 emi2f...@gmail.com wrote:When a node is dead
 the registration key is removed.

 So I must see 2 key registered when I add fence_scsi device?
 But I don't see 2 key registered...



 Andrea






 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org



-- 
esta es mi vida e me la vivo hasta que dios quiera

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Segfault on monitor resource

2015-01-27 Thread emmanuel segura
Maybe you can use sar to check whether your server was running short of resources?

Jan 25 04:10:30 lb02 lrmd: [9972]: info: RA output:
(Nginx-rsc:monitor:stderr) Killed
/usr/lib/ocf/resource.d//heartbeat/nginx: 910:
/usr/lib/ocf/resource.d//heartbeat/nginx: Cannot fork
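A rough sketch of what to look at, assuming sysstat keeps daily data files
(the directory is /var/log/sysstat on Debian, /var/log/sa on RHEL-like
systems; sa25 corresponds to Jan 25 in the log above):

sar -r -f /var/log/sysstat/sa25 -s 04:00:00 -e 04:20:00   # memory usage
sar -q -f /var/log/sysstat/sa25 -s 04:00:00 -e 04:20:00   # run queue / load average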


2015-01-26 18:22 GMT+01:00 Oscar Salvador osalvador.vilard...@gmail.com:
 Oh, I forgot some important details:

 root# (S) crm status
 
 Last updated: Mon Jan 26 18:21:35 2015
 Last change: Sun Jan 25 05:19:13 2015 via crm_resource on lb01
 Stack: Heartbeat
 Current DC: lb01 (43b2c5a1-9552-4438-962b-6e98a2dd67c7) - partition with
 quorum
 Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
 2 Nodes configured, unknown expected votes
 8 Resources configured.
 

 Online: [ lb01 lb02 ]

  IP-rsc_mysql (ocf::heartbeat:IPaddr2): Started lb02
  IP-rsc_nginx (ocf::heartbeat:IPaddr2): Started lb02
  IP-rsc_nginx6 (ocf::heartbeat:IPv6addr): Started lb02
  IP-rsc_mysql6 (ocf::heartbeat:IPv6addr): Started lb02
  IP-rsc_elasticsearch6 (ocf::heartbeat:IPv6addr): Started lb02
  IP-rsc_elasticsearch (ocf::heartbeat:IPaddr2): Started lb02
  Ldirector-rsc (ocf::heartbeat:ldirectord): Started lb02
  Nginx-rsc (ocf::heartbeat:nginx): Started lb02


 This is running on:

 Debian7.8
 pacemaker  1.1.7-1

 2015-01-26 18:20 GMT+01:00 Oscar Salvador osalvador.vilard...@gmail.com:

 Hi!

 I'm writing here because two days ago I experienced a strange problem in
 my Pacemaker Cluster.
 Everything was working fine, till suddenly a Segfault in Nginx monitor
 resource happened:

 Jan 25 03:55:24 lb02 crmd: [9975]: notice: run_graph:  Transition 7551
 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
 Source=/var/lib/pengine/pe-input-90.bz2): Complete
 Jan 25 03:55:24 lb02 crmd: [9975]: notice: do_state_transition: State
 transition S_TRANSITION_ENGINE - S_IDLE [ input=I_TE_SUCCESS
 cause=C_FSA_INTERNAL origin=notify_crmd ]
 Jan 25 04:00:08 lb02 cib: [9971]: info: cib_stats: Processed 1 operations
 (0.00us average, 0% utilization) in the last 10min
 Jan 25 04:10:24 lb02 crmd: [9975]: info: crm_timer_popped: PEngine Recheck
 Timer (I_PE_CALC) just popped (90ms)
 Jan 25 04:10:24 lb02 crmd: [9975]: notice: do_state_transition: State
 transition S_IDLE - S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED
 origin=crm_timer_popped ]
 Jan 25 04:10:24 lb02 crmd: [9975]: info: do_state_transition: Progressed
 to state S_POLICY_ENGINE after C_TIMER_POPPED
 Jan 25 04:10:24 lb02 pengine: [10028]: WARN: unpack_rsc_op: Processing
 failed op Ldirector-rsc_last_failure_0 on lb02: not running (7)
 Jan 25 04:10:24 lb02 pengine: [10028]: notice: common_apply_stickiness:
 Ldirector-rsc can fail 97 more times on lb02 before being forced off
 Jan 25 04:10:24 lb02 crmd: [9975]: notice: do_state_transition: State
 transition S_POLICY_ENGINE - S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
 cause=C_IPC_MESSAGE origin=handle_response ]
 Jan 25 04:10:24 lb02 pengine: [10028]: notice: process_pe_message:
 Transition 7552: PEngine Input stored in: /var/lib/pengine/pe-input-90.bz2
 Jan 25 04:10:24 lb02 crmd: [9975]: info: do_te_invoke: Processing graph
 7552 (ref=pe_calc-dc-1422155424-7644) derived from
 /var/lib/pengine/pe-input-90.bz2
 Jan 25 04:10:24 lb02 crmd: [9975]: notice: run_graph:  Transition 7552
 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
 Source=/var/lib/pengine/pe-input-90.bz2): Complete
 Jan 25 04:10:24 lb02 crmd: [9975]: notice: do_state_transition: State
 transition S_TRANSITION_ENGINE - S_IDLE [ input=I_TE_SUCCESS
 cause=C_FSA_INTERNAL origin=notify_crmd ]


 Jan 25 04:10:30 lb02 lrmd: [9972]: info: RA output:
 (Nginx-rsc:monitor:stderr) Segmentation fault   *** here it starts

 As you can see, the last line.
 And then:

 Jan 25 04:10:30 lb02 lrmd: [9972]: info: RA output:
 (Nginx-rsc:monitor:stderr) Killed
 /usr/lib/ocf/resource.d//heartbeat/nginx: 910:
 /usr/lib/ocf/resource.d//heartbeat/nginx: Cannot fork

 I guess here Nginx was killed.

 And then I have some others errors till Pacemaker decide to move the
 resources to the node:

 Jan 25 04:10:30 lb02 crmd: [9975]: info: process_lrm_event: LRM operation
 Nginx-rsc_monitor_1 (call=52, rc=2, cib-update=7633, confirmed=false)
 invalid parameter
 Jan 25 04:10:30 lb02 crmd: [9975]: info: process_graph_event: Detected
 action Nginx-rsc_monitor_1 from a different transition: 5739 vs. 7552
 Jan 25 04:10:30 lb02 crmd: [9975]: info: abort_transition_graph:
 process_graph_event:476 - Triggered transition abort (complete=1,
 tag=lrm_rsc_op, id=Nginx-rsc_last_failure_0,
 magic=0:2;4:5739:0:42d1ed53-9686-4174-84e7-d2c230ed8832, cib=
 3.14.40) : Old event
 Jan 25 04:10:30 lb02 crmd: [9975]: WARN: update_failcount: Updating
 failcount for Nginx-rsc on lb02 after failed monitor: rc=2 (update=value++,
 time=1422155430)
 Jan 25 04:10:30 lb02 crmd: [9975]: notice: do_state_transition: State
 transition S_IDLE - 

Re: [Pacemaker] Segfault on monitor resource

2015-01-27 Thread Dejan Muhamedagic
Hi,

On Mon, Jan 26, 2015 at 06:20:35PM +0100, Oscar Salvador wrote:
 Hi!
 
 I'm writing here because two days ago I experienced a strange problem in my
 Pacemaker Cluster.
 Everything was working fine, till suddenly a Segfault in Nginx monitor
 resource happened:
 
 Jan 25 03:55:24 lb02 crmd: [9975]: notice: run_graph:  Transition 7551
 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
 Source=/var/lib/pengine/pe-input-90.bz2): Complete
 Jan 25 03:55:24 lb02 crmd: [9975]: notice: do_state_transition: State
 transition S_TRANSITION_ENGINE - S_IDLE [ input=I_TE_SUCCESS
 cause=C_FSA_INTERNAL origin=notify_crmd ]
 Jan 25 04:00:08 lb02 cib: [9971]: info: cib_stats: Processed 1 operations
 (0.00us average, 0% utilization) in the last 10min
 Jan 25 04:10:24 lb02 crmd: [9975]: info: crm_timer_popped: PEngine Recheck
 Timer (I_PE_CALC) just popped (90ms)
 Jan 25 04:10:24 lb02 crmd: [9975]: notice: do_state_transition: State
 transition S_IDLE - S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED
 origin=crm_timer_popped ]
 Jan 25 04:10:24 lb02 crmd: [9975]: info: do_state_transition: Progressed to
 state S_POLICY_ENGINE after C_TIMER_POPPED
 Jan 25 04:10:24 lb02 pengine: [10028]: WARN: unpack_rsc_op: Processing
 failed op Ldirector-rsc_last_failure_0 on lb02: not running (7)
 Jan 25 04:10:24 lb02 pengine: [10028]: notice: common_apply_stickiness:
 Ldirector-rsc can fail 97 more times on lb02 before being forced off
 Jan 25 04:10:24 lb02 crmd: [9975]: notice: do_state_transition: State
 transition S_POLICY_ENGINE - S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
 cause=C_IPC_MESSAGE origin=handle_response ]
 Jan 25 04:10:24 lb02 pengine: [10028]: notice: process_pe_message:
 Transition 7552: PEngine Input stored in: /var/lib/pengine/pe-input-90.bz2
 Jan 25 04:10:24 lb02 crmd: [9975]: info: do_te_invoke: Processing graph
 7552 (ref=pe_calc-dc-1422155424-7644) derived from
 /var/lib/pengine/pe-input-90.bz2
 Jan 25 04:10:24 lb02 crmd: [9975]: notice: run_graph:  Transition 7552
 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
 Source=/var/lib/pengine/pe-input-90.bz2): Complete
 Jan 25 04:10:24 lb02 crmd: [9975]: notice: do_state_transition: State
 transition S_TRANSITION_ENGINE - S_IDLE [ input=I_TE_SUCCESS
 cause=C_FSA_INTERNAL origin=notify_crmd ]
 
 
 Jan 25 04:10:30 lb02 lrmd: [9972]: info: RA output:
 (Nginx-rsc:monitor:stderr) Segmentation fault   *** here it starts

What exactly did segfault? Do you have a core dump to examine?

 As you can see, the last line.
 And then:
 
 Jan 25 04:10:30 lb02 lrmd: [9972]: info: RA output:
 (Nginx-rsc:monitor:stderr) Killed
 /usr/lib/ocf/resource.d//heartbeat/nginx: 910:
 /usr/lib/ocf/resource.d//heartbeat/nginx: Cannot fork

This could be related to the segfault, or due to other serious
system error.

 I guess here Nginx was killed.
 
 And then I have some others errors till Pacemaker decide to move the
 resources to the node:
 
 Jan 25 04:10:30 lb02 crmd: [9975]: info: process_lrm_event: LRM operation
 Nginx-rsc_monitor_1 (call=52, rc=2, cib-update=7633, confirmed=false)
 invalid parameter
 Jan 25 04:10:30 lb02 crmd: [9975]: info: process_graph_event: Detected
 action Nginx-rsc_monitor_1 from a different transition: 5739 vs. 7552
 Jan 25 04:10:30 lb02 crmd: [9975]: info: abort_transition_graph:
 process_graph_event:476 - Triggered transition abort (complete=1,
 tag=lrm_rsc_op, id=Nginx-rsc_last_failure_0,
 magic=0:2;4:5739:0:42d1ed53-9686-4174-84e7-d2c230ed8832, cib=
 3.14.40) : Old event
 Jan 25 04:10:30 lb02 crmd: [9975]: WARN: update_failcount: Updating
 failcount for Nginx-rsc on lb02 after failed monitor: rc=2 (update=value++,
 time=1422155430)
 Jan 25 04:10:30 lb02 crmd: [9975]: notice: do_state_transition: State
 transition S_IDLE - S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
 origin=abort_transition_graph ]
 Jan 25 04:10:30 lb02 attrd: [9974]: info: log-rotate detected on logfile
 /var/log/ha-log
 Jan 25 04:10:30 lb02 attrd: [9974]: notice: attrd_trigger_update: Sending
 flush op to all hosts for: fail-count-Nginx-rsc (1)
 Jan 25 04:10:30 lb02 pengine: [10028]: ERROR: unpack_rsc_op: Preventing
 Nginx-rsc from re-starting on lb02: operation monitor failed 'invalid
 parameter' (rc=2)
 Jan 25 04:10:30 lb02 pengine: [10028]: WARN: unpack_rsc_op: Processing
 failed op Nginx-rsc_last_failure_0 on lb02: invalid parameter (2)
 Jan 25 04:10:30 lb02 pengine: [10028]: WARN: unpack_rsc_op: Processing
 failed op Ldirector-rsc_last_failure_0 on lb02: not running (7)
 Jan 25 04:10:30 lb02 pengine: [10028]: notice: common_apply_stickiness:
 Ldirector-rsc can fail 97 more times on lb02 before being forced off
 Jan 25 04:10:30 lb02 pengine: [10028]: notice: LogActions: Stop
  IP-rsc_mysql (lb02)
 Jan 25 04:10:30 lb02 pengine: [10028]: notice: LogActions: Stop
  IP-rsc_nginx (lb02)
 Jan 25 04:10:30 lb02 pengine: [10028]: notice: LogActions: Stop
  IP-rsc_nginx6(lb02)
 Jan 25 04:10:30 lb02 pengine: [10028]: notice: 

Re: [Pacemaker] pacemaker-remote not listening

2015-01-27 Thread David Vossel


- Original Message -
 Hi,
 my os is debian-wheezy
 i compiled and installed pacemaker-remote.
 Startup log:
 Jan 27 16:04:30 [2859] vm1 pacemaker_remoted: info: crm_log_init: Changed
 active directory to /var/lib/heartbeat/cores/root
 Jan 27 16:04:30 [2859] vm1 pacemaker_remoted: info: qb_ipcs_us_publish:
 server name: lrmd
 Jan 27 16:04:30 [2859] vm1 pacemaker_remoted: info: main: Starting
 My problem is, that pacemaker remote is not listening on port 3121

By default pacemaker_remote should listen on 3121. This is odd.

One thing I can think of. Take a look at /etc/sysconfig/pacemaker on the
node running pacemaker_remote. Make sure there isn't a custom port set
using the PCMK_remote_port variable.

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Remote/index.html#_pacemaker_and_pacemaker_remote_options
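A couple of quick checks on the pacemaker_remote node (the sysconfig path is
the usual one on RHEL-like systems; on Debian the equivalent file may live
under /etc/default instead):

grep -E 'PCMK_remote_port|PCMK_remote_address' /etc/sysconfig/pacemaker
netstat -tlpen | grep pacemaker_remo    # should show 0.0.0.0:3121 when listening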

-- Vossel

 netstat -tulpen | grep 3121
 netstat -alpen
 Proto RefCnt Flags Type State I-Node PID/Program name Path
 unix 2 [ ACC ] STREAM LISTENING 6635 2859/pacemaker_remo @lrmd
 unix 2 [ ] DGRAM 6634 2859/pacemaker_remo
 ...
 ...
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Segfault on monitor resource

2015-01-27 Thread Oscar Salvador
 Hi,

I've checked the resource graphs I have, and the resources were fine, so I
don't think it's a problem caused by high memory use or something like that.
Unfortunately I don't have a core dump to analyze (I'll enable it for a
future case), so the only thing I have are the logs.

For the line below, I thought it was the process in charge of monitoring
nginx that was killed due to a segfault:

RA output: (Nginx-rsc:monitor:stderr) Segmentation fault


I've checked the Nginx logs and there is nothing of note there; actually
there is no activity, so I think it has to be something internal that
caused the failure.
I'll enable core dumps; it's the only thing I can do for now.
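A minimal sketch of enabling core dumps (for long-running daemons the limit
usually also has to be raised in the init script or /etc/security/limits.conf,
not just in an interactive shell):

ulimit -c unlimited                           # for the current shell / test runs
sysctl -w kernel.core_pattern='core.%e.%p'    # dumps land in the process's working directory,
                                              # which for heartbeat daemons is under /var/lib/heartbeat/cores/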

Thank you very much

Oscar

2015-01-27 10:39 GMT+01:00 Dejan Muhamedagic deja...@fastmail.fm:

 Hi,

 On Mon, Jan 26, 2015 at 06:20:35PM +0100, Oscar Salvador wrote:
  Hi!
 
  I'm writing here because two days ago I experienced a strange problem in
 my
  Pacemaker Cluster.
  Everything was working fine, till suddenly a Segfault in Nginx monitor
  resource happened:
 
  Jan 25 03:55:24 lb02 crmd: [9975]: notice: run_graph:  Transition
 7551
  (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
  Source=/var/lib/pengine/pe-input-90.bz2): Complete
  Jan 25 03:55:24 lb02 crmd: [9975]: notice: do_state_transition: State
  transition S_TRANSITION_ENGINE - S_IDLE [ input=I_TE_SUCCESS
  cause=C_FSA_INTERNAL origin=notify_crmd ]
  Jan 25 04:00:08 lb02 cib: [9971]: info: cib_stats: Processed 1 operations
  (0.00us average, 0% utilization) in the last 10min
  Jan 25 04:10:24 lb02 crmd: [9975]: info: crm_timer_popped: PEngine
 Recheck
  Timer (I_PE_CALC) just popped (90ms)
  Jan 25 04:10:24 lb02 crmd: [9975]: notice: do_state_transition: State
  transition S_IDLE - S_POLICY_ENGINE [ input=I_PE_CALC
 cause=C_TIMER_POPPED
  origin=crm_timer_popped ]
  Jan 25 04:10:24 lb02 crmd: [9975]: info: do_state_transition: Progressed
 to
  state S_POLICY_ENGINE after C_TIMER_POPPED
  Jan 25 04:10:24 lb02 pengine: [10028]: WARN: unpack_rsc_op: Processing
  failed op Ldirector-rsc_last_failure_0 on lb02: not running (7)
  Jan 25 04:10:24 lb02 pengine: [10028]: notice: common_apply_stickiness:
  Ldirector-rsc can fail 97 more times on lb02 before being forced off
  Jan 25 04:10:24 lb02 crmd: [9975]: notice: do_state_transition: State
  transition S_POLICY_ENGINE - S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
  cause=C_IPC_MESSAGE origin=handle_response ]
  Jan 25 04:10:24 lb02 pengine: [10028]: notice: process_pe_message:
  Transition 7552: PEngine Input stored in:
 /var/lib/pengine/pe-input-90.bz2
  Jan 25 04:10:24 lb02 crmd: [9975]: info: do_te_invoke: Processing graph
  7552 (ref=pe_calc-dc-1422155424-7644) derived from
  /var/lib/pengine/pe-input-90.bz2
  Jan 25 04:10:24 lb02 crmd: [9975]: notice: run_graph:  Transition
 7552
  (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
  Source=/var/lib/pengine/pe-input-90.bz2): Complete
  Jan 25 04:10:24 lb02 crmd: [9975]: notice: do_state_transition: State
  transition S_TRANSITION_ENGINE - S_IDLE [ input=I_TE_SUCCESS
  cause=C_FSA_INTERNAL origin=notify_crmd ]
 
 
  Jan 25 04:10:30 lb02 lrmd: [9972]: info: RA output:
  (Nginx-rsc:monitor:stderr) Segmentation fault   *** here it starts

 What exactly did segfault? Do you have a core dump to examine?

  As you can see, the last line.
  And then:
 
  Jan 25 04:10:30 lb02 lrmd: [9972]: info: RA output:
  (Nginx-rsc:monitor:stderr) Killed
  /usr/lib/ocf/resource.d//heartbeat/nginx: 910:
  /usr/lib/ocf/resource.d//heartbeat/nginx: Cannot fork

 This could be related to the segfault, or due to other serious
 system error.

  I guess here Nginx was killed.
 
  And then I have some others errors till Pacemaker decide to move the
  resources to the node:
 
  Jan 25 04:10:30 lb02 crmd: [9975]: info: process_lrm_event: LRM operation
  Nginx-rsc_monitor_1 (call=52, rc=2, cib-update=7633, confirmed=false)
  invalid parameter
  Jan 25 04:10:30 lb02 crmd: [9975]: info: process_graph_event: Detected
  action Nginx-rsc_monitor_1 from a different transition: 5739 vs. 7552
  Jan 25 04:10:30 lb02 crmd: [9975]: info: abort_transition_graph:
  process_graph_event:476 - Triggered transition abort (complete=1,
  tag=lrm_rsc_op, id=Nginx-rsc_last_failure_0,
  magic=0:2;4:5739:0:42d1ed53-9686-4174-84e7-d2c230ed8832, cib=
  3.14.40) : Old event
  Jan 25 04:10:30 lb02 crmd: [9975]: WARN: update_failcount: Updating
  failcount for Nginx-rsc on lb02 after failed monitor: rc=2
 (update=value++,
  time=1422155430)
  Jan 25 04:10:30 lb02 crmd: [9975]: notice: do_state_transition: State
  transition S_IDLE - S_POLICY_ENGINE [ input=I_PE_CALC
 cause=C_FSA_INTERNAL
  origin=abort_transition_graph ]
  Jan 25 04:10:30 lb02 attrd: [9974]: info: log-rotate detected on logfile
  /var/log/ha-log
  Jan 25 04:10:30 lb02 attrd: [9974]: notice: attrd_trigger_update: Sending
  flush op to all hosts for: fail-count-Nginx-rsc (1)
  Jan 25 04:10:30 lb02 pengine: [10028]: ERROR: 

Re: [Pacemaker] Two node cluster and no hardware device for stonith.

2015-01-27 Thread Andrea
Andrea a.bacchi@... writes:

 
 Michael Schwartzkopff ms at ... writes:
 
  
  Am Donnerstag, 22. Januar 2015, 10:03:38 schrieb E. Kuemmerle:
   On 21.01.2015 11:18 Digimer wrote:
On 21/01/15 08:13 AM, Andrea wrote:
 Hi All,
 
 I have a question about stonith
 In my scenarion , I have to create 2 node cluster, but I don't 
  
  Are you sure that you do not have fencing hardware? Perhaps you just did
nit 
  configure it? Please read the manual of you BIOS and check your system
 board if 
  you have a IPMI interface.
  
 
 In my test, when I simulate network failure, split brain occurs, and
 when
 network come back, One node kill the other node
 -log on node 1:
 Jan 21 11:45:28 corosync [CMAN  ] memb: Sending KILL to node 2
 
 -log on node 2:
 Jan 21 11:45:28 corosync [CMAN  ] memb: got KILL for node 2
  
  That is how fencing works.
  
  Mit freundlichen Grüßen,
  
  Michael Schwartzkopff
  
 
 Hi All
 
 many thanks for your replies.
 I will update my scenario to ask about adding some devices for stonith
 - Option 1
 I will ask for having 2 vmware virtual machine, so i can try fance_vmware
 -Option 2
 In the project, maybe will need a shared storage. In this case, the shared
 storage will be a NAS that a can add to my nodes via iscsi. In this case I
 can try fence_scsi
 
 I will write here about news
 
 Many thanks  to all for support
 Andrea
 
 


some news

- Option 2
In the customer environment I configured an iSCSI target that our project
will use as a cluster filesystem:

[ONE]pvcreate /dev/sdb
[ONE]vgcreate -Ay -cy cluster_vg /dev/sdb
[ONE]lvcreate -L*G -n cluster_lv cluster_vg
[ONE]mkfs.gfs2 -j2 -p lock_dlm -t ProjectHA:ArchiveFS /dev/cluster_vg/cluster_lv

Now I can add a Filesystem resource:

[ONE]pcs resource create clusterfs Filesystem
device=/dev/cluster_vg/cluster_lv directory=/var/mountpoint
fstype=gfs2 options=noatime op monitor interval=10s clone interleave=true

and I can read and write from both nodes.


Now I'd like to use this device with fence_scsi.
Is that OK? I ask because I see this in the man page:
"The fence_scsi agent works by having each node in the cluster register a
unique key with the SCSI device(s). Once registered, a single node will
become the reservation holder by creating a write exclusive, registrants
only reservation on the device(s). The result is that only registered
nodes may write to the device(s)."
That's no good for me: I need both nodes to be able to write to the device.
So do I need another device to use with fence_scsi? In that case I will try
to create two partitions, sdb1 and sdb2, on this device and use sdb1 for
clusterfs and sdb2 for fencing.


If I test this manually, before any operation I get:
[ONE]sg_persist -n --read-keys
--device=/dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4
  PR generation=0x27, 1 registered reservation key follows:
0x98343e580002734d


Then I try to set the serverHA1 key:
[serverHA1]fence_scsi -d
/dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4 -f /tmp/miolog.txt -n
serverHA1 -o on

But nothing has changed
[ONE]sg_persist -n --read-keys
--device=/dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4
  PR generation=0x27, 1 registered reservation key follows:
0x98343e580002734d


and in the log:
gen 26 17:53:27 fence_scsi: [debug] main::do_register_ignore
(node_key=4d5a0001, dev=/dev/sde)
gen 26 17:53:27 fence_scsi: [debug] main::do_reset (dev=/dev/sde, status=6)
gen 26 17:53:27 fence_scsi: [debug] main::do_register_ignore (err=0)

The same happens when I try on serverHA2.
Is that normal?
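For comparison, the registration that fence_scsi's "on" action is supposed to
perform can be reproduced by hand; a sketch only, using the key format from
the debug log above (0x4d5a0001 for node 1):

DEV=/dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4
sg_persist -n --out --register-ignore --param-sark=4d5a0001 --device=$DEV
sg_persist -n --read-keys --device=$DEV     # the key should now be listed

If the manual registration does not show up either, that points at the device
(or the device ID) rather than at the cluster configuration.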


In any case, I try to create a stonith device:
[ONE]pcs stonith create iscsi-stonith-device fence_scsi
pcmk_host_list=serverHA1 serverHA2
devices=/dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4 meta
provides=unfencing
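The archive has wrapped that command; on one shell line it would look roughly
like the following (quoting the space-separated host list is an assumption
here, since the original quoting is not visible in the archive):

pcs stonith create iscsi-stonith-device fence_scsi \
    pcmk_host_list="serverHA1 serverHA2" \
    devices=/dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4 \
    meta provides=unfencing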

and the cluster status is ok
[ONE] pcs status
Cluster name: MyCluHA
Last updated: Tue Jan 27 11:21:48 2015
Last change: Tue Jan 27 10:46:57 2015
Stack: cman
Current DC: serverHA1 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
5 Resources configured


Online: [ serverHA1 serverHA2 ]

Full list of resources:

 Clone Set: ping-clone [ping]
     Started: [ serverHA1 serverHA2 ]
 Clone Set: clusterfs-clone [clusterfs]
     Started: [ serverHA1 serverHA2 ]
 iscsi-stonith-device   (stonith:fence_scsi):   Started serverHA1



How can I test this from a remote connection?
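One way to exercise the agent from a surviving node is to fence the peer by
name with the standard tools (keeping in mind, as noted earlier in the
thread, that fence_scsi only revokes the node's key and does not power the
node off):

stonith_admin --reboot serverHA2
# or, via pcs:
pcs stonith fence serverHA2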


Andrea

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Two node cluster and no hardware device for stonith.

2015-01-27 Thread emmanuel segura
Sorry, I forgot to tell you: be aware that fence_scsi doesn't reboot the
evicted node, so you can combine fence_vmware with fence_scsi as a second
option.

2015-01-27 11:44 GMT+01:00 emmanuel segura emi2f...@gmail.com:
 In normal situation every node can in your file system, fence_scsi is
 used when your cluster is in split-braint, when your a node doesn't
 comunicate with the other node, i don't is good idea.


 2015-01-27 11:35 GMT+01:00 Andrea a.bac...@codices.com:
 Andrea a.bacchi@... writes:


 Michael Schwartzkopff ms at ... writes:

 
  Am Donnerstag, 22. Januar 2015, 10:03:38 schrieb E. Kuemmerle:
   On 21.01.2015 11:18 Digimer wrote:
On 21/01/15 08:13 AM, Andrea wrote:
 Hi All,

 I have a question about stonith
 In my scenarion , I have to create 2 node cluster, but I don't
 
  Are you sure that you do not have fencing hardware? Perhaps you just did
 nit
  configure it? Please read the manual of you BIOS and check your system
 board if
  you have a IPMI interface.
 

 In my test, when I simulate network failure, split brain occurs, 
 and
 when
 network come back, One node kill the other node
 -log on node 1:
 Jan 21 11:45:28 corosync [CMAN  ] memb: Sending KILL to node 2

 -log on node 2:
 Jan 21 11:45:28 corosync [CMAN  ] memb: got KILL for node 2
 
  That is how fencing works.
 
  Mit freundlichen Grüßen,
 
  Michael Schwartzkopff
 

 Hi All

 many thanks for your replies.
 I will update my scenario to ask about adding some devices for stonith
 - Option 1
 I will ask for having 2 vmware virtual machine, so i can try fance_vmware
 -Option 2
 In the project, maybe will need a shared storage. In this case, the shared
 storage will be a NAS that a can add to my nodes via iscsi. In this case I
 can try fence_scsi

 I will write here about news

 Many thanks  to all for support
 Andrea




 some news

 - Option 2
 In the customer environment I configured a iscsi target that our project
 will use as cluster filesystem

 [ONE]pvcreate /dev/sdb
 [ONE]vgcreate -Ay -cy cluster_vg /dev/sdb
 [ONE]lvcreate -L*G -n cluster_lv cluster_vg
 [ONE]mkfs.gfs2 -j2 -p lock_dlm -t ProjectHA:ArchiveFS 
 /dev/cluster_vg/cluster_lv

 now I can add a Filesystem resource

 [ONE]pcs resource create clusterfs Filesystem
 device=/dev/cluster_vg/cluster_lv directory=/var/mountpoint
 fstype=gfs2 options=noatime op monitor interval=10s clone interleave=true

 and I can read and write from both node.


 Now I'd like to use this device with fence_scsi.
 It is ok? because I see in the man page this:
 The fence_scsi agent works by having each node in the cluster register a
 unique key with the SCSI devive(s). Once registered, a single node will
 become the reservation holder by creating  a  write  exclu-sive,
 registrants only reservation on the device(s). The result is that only
 registered nodes may write to the device(s)
 It's no good for me, I need both node can write on the device.
 So, I need another device to use with fence_scsi? In this case I will try to
 create two partition, sdb1 and sdb2, on this device and use sdb1 as
 clusterfs and sdb2 for fencing.


 If i try to manually test this, I obtain before any operation
 [ONE]sg_persist -n --read-keys
 --device=/dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4
   PR generation=0x27, 1 registered reservation key follows:
 0x98343e580002734d


 Then, I try to set serverHA1 key
 [serverHA1]fence_scsi -d
 /dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4 -f /tmp/miolog.txt -n
 serverHA1 -o on

 But nothing has changed
 [ONE]sg_persist -n --read-keys
 --device=/dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4
   PR generation=0x27, 1 registered reservation key follows:
 0x98343e580002734d


 and in the log:
 gen 26 17:53:27 fence_scsi: [debug] main::do_register_ignore
 (node_key=4d5a0001, dev=/dev/sde)
 gen 26 17:53:27 fence_scsi: [debug] main::do_reset (dev=/dev/sde, status=6)
 gen 26 17:53:27 fence_scsi: [debug] main::do_register_ignore (err=0)

 The same when i try on serverHA2
 It is normal?


 In any case, i try to create a stonith device
 [ONE]pcs stonith create iscsi-stonith-device fence_scsi
 pcmk_host_list=serverHA1 serverHA2
 devices=/dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4 meta
 provides=unfencing

 and the cluster status is ok
 [ONE] pcs status
 Cluster name: MyCluHA
 Last updated: Tue Jan 27 11:21:48 2015
 Last change: Tue Jan 27 10:46:57 2015
 Stack: cman
 Current DC: serverHA1 - partition with quorum
 Version: 1.1.11-97629de
 2 Nodes configured
 5 Resources configured


 Online: [ serverHA1 serverHA2 ]

 Full list of resources:

  Clone Set: ping-clone [ping]
  Started: [ serverHA1 serverHA2 ]
  Clone Set: clusterfs-clone [clusterfs]
  Started: [ serverHA1 serverHA2 ]
  iscsi-stonith-device   (stonith:fence_scsi):   Started serverHA1



 How I can try this from remote connection?


 Andrea

 ___
 Pacemaker 

[Pacemaker] rrp_mode in corosync.conf

2015-01-27 Thread Kostiantyn Ponomarenko
Hi all,

I've been looking for a good answer to my question, but all information I
found is ambiguous.
I hope to get a good answer here =)

The only description about active and passive modes I found is:
Active: both rings will be active, in use
Passive: only one of the 2 rings is in use; the second one will be used
only if the first one fails

There is no description of how it works or what the impact is.
So, my general question is: how are the rings used in active and
passive modes?
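For reference, the mode is selected in the totem section; a minimal
udpu-style sketch with placeholder addresses:

totem {
    version: 2
    transport: udpu
    rrp_mode: passive      # or: active
}

nodelist {
    node {
        nodeid: 1
        ring0_addr: 10.0.0.1   # first ring
        ring1_addr: 10.0.1.1   # second ring
    }
}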


Thank you,
Kostya
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Two node cluster and no hardware device for stonith.

2015-01-27 Thread emmanuel segura
In a normal situation every node can write to your file system; fence_scsi
is used when your cluster is in split-brain, when a node doesn't
communicate with the other node. I don't think that is a good idea.


2015-01-27 11:35 GMT+01:00 Andrea a.bac...@codices.com:
 Andrea a.bacchi@... writes:


 Michael Schwartzkopff ms at ... writes:

 
  Am Donnerstag, 22. Januar 2015, 10:03:38 schrieb E. Kuemmerle:
   On 21.01.2015 11:18 Digimer wrote:
On 21/01/15 08:13 AM, Andrea wrote:
 Hi All,

 I have a question about stonith
 In my scenarion , I have to create 2 node cluster, but I don't
 
  Are you sure that you do not have fencing hardware? Perhaps you just did
 nit
  configure it? Please read the manual of you BIOS and check your system
 board if
  you have a IPMI interface.
 

 In my test, when I simulate network failure, split brain occurs, and
 when
 network come back, One node kill the other node
 -log on node 1:
 Jan 21 11:45:28 corosync [CMAN  ] memb: Sending KILL to node 2

 -log on node 2:
 Jan 21 11:45:28 corosync [CMAN  ] memb: got KILL for node 2
 
  That is how fencing works.
 
  Mit freundlichen Grüßen,
 
  Michael Schwartzkopff
 

 Hi All

 many thanks for your replies.
 I will update my scenario to ask about adding some devices for stonith
 - Option 1
 I will ask for having 2 vmware virtual machine, so i can try fance_vmware
 -Option 2
 In the project, maybe will need a shared storage. In this case, the shared
 storage will be a NAS that a can add to my nodes via iscsi. In this case I
 can try fence_scsi

 I will write here about news

 Many thanks  to all for support
 Andrea




 some news

 - Option 2
 In the customer environment I configured a iscsi target that our project
 will use as cluster filesystem

 [ONE]pvcreate /dev/sdb
 [ONE]vgcreate -Ay -cy cluster_vg /dev/sdb
 [ONE]lvcreate -L*G -n cluster_lv cluster_vg
 [ONE]mkfs.gfs2 -j2 -p lock_dlm -t ProjectHA:ArchiveFS 
 /dev/cluster_vg/cluster_lv

 now I can add a Filesystem resource

 [ONE]pcs resource create clusterfs Filesystem
 device=/dev/cluster_vg/cluster_lv directory=/var/mountpoint
 fstype=gfs2 options=noatime op monitor interval=10s clone interleave=true

 and I can read and write from both node.


 Now I'd like to use this device with fence_scsi.
 It is ok? because I see in the man page this:
 The fence_scsi agent works by having each node in the cluster register a
 unique key with the SCSI devive(s). Once registered, a single node will
 become the reservation holder by creating  a  write  exclu-sive,
 registrants only reservation on the device(s). The result is that only
 registered nodes may write to the device(s)
 It's no good for me, I need both node can write on the device.
 So, I need another device to use with fence_scsi? In this case I will try to
 create two partition, sdb1 and sdb2, on this device and use sdb1 as
 clusterfs and sdb2 for fencing.


 If i try to manually test this, I obtain before any operation
 [ONE]sg_persist -n --read-keys
 --device=/dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4
   PR generation=0x27, 1 registered reservation key follows:
 0x98343e580002734d


 Then, I try to set serverHA1 key
 [serverHA1]fence_scsi -d
 /dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4 -f /tmp/miolog.txt -n
 serverHA1 -o on

 But nothing has changed
 [ONE]sg_persist -n --read-keys
 --device=/dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4
   PR generation=0x27, 1 registered reservation key follows:
 0x98343e580002734d


 and in the log:
 gen 26 17:53:27 fence_scsi: [debug] main::do_register_ignore
 (node_key=4d5a0001, dev=/dev/sde)
 gen 26 17:53:27 fence_scsi: [debug] main::do_reset (dev=/dev/sde, status=6)
 gen 26 17:53:27 fence_scsi: [debug] main::do_register_ignore (err=0)

 The same when i try on serverHA2
 It is normal?


 In any case, i try to create a stonith device
 [ONE]pcs stonith create iscsi-stonith-device fence_scsi
 pcmk_host_list=serverHA1 serverHA2
 devices=/dev/disk/by-id/scsi-36e843b608e55bb8d6d72d43bfdbc47d4 meta
 provides=unfencing

 and the cluster status is ok
 [ONE] pcs status
 Cluster name: MyCluHA
 Last updated: Tue Jan 27 11:21:48 2015
 Last change: Tue Jan 27 10:46:57 2015
 Stack: cman
 Current DC: serverHA1 - partition with quorum
 Version: 1.1.11-97629de
 2 Nodes configured
 5 Resources configured


 Online: [ serverHA1 serverHA2 ]

 Full list of resources:

  Clone Set: ping-clone [ping]
  Started: [ serverHA1 serverHA2 ]
  Clone Set: clusterfs-clone [clusterfs]
  Started: [ serverHA1 serverHA2 ]
  iscsi-stonith-device   (stonith:fence_scsi):   Started serverHA1



 How I can try this from remote connection?


 Andrea

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 

[Pacemaker] Pre/post-action notifications for master-slave and clone resources

2015-01-27 Thread Vladislav Bogdanov

Hi,

Playing with a two-week-old git master on a two-node cluster, I discovered 
that only a limited set of notify operations is performed for clone and 
master-slave instances when all of them are being started/stopped.


Clones (anonymous):
* post-start
* pre-stop

M/S:
* post-start
* post-promote
* pre-demote
* pre-stop

According to Pacemaker Explained there should be more:

* pre-start
* pre-promote
* post-demote
* post-stop

Some notifications (pre-stop for my clone and pre-demote for the ms resource) 
are repeated twice (due to transition aborts, or the fact that multiple 
instances are stopping/demoting?), but that has only a minor impact for me.


I tested that by setting stop-all-resources property to 'true' and 'false'.
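For reference, the property can be toggled with either tool, depending on
whether crmsh or pcs is in use; a sketch:

crm configure property stop-all-resources=true
# or:
pcs property set stop-all-resources=true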

On the other hand, if I put one node with running instances into standby 
and then into online states, I see all missing notifications.


Is it intended that the actions above are not performed when all instances 
are handled simultaneously?


One more question about 'post' notifications: are they sent to the RA 
right after the corresponding main action finishes, or do they wait in the 
transition queue? In other words, is it possible to get a post-stop 
notification that a foreign instance has stopped while the stop action on 
the local instance is still running?


Best,
Vladislav

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Two node cluster and no hardware device for stonith.

2015-01-27 Thread Andrea
emmanuel segura emi2fast@... writes:

 
 please show your configuration and your logs.
 
 2015-01-27 14:24 GMT+01:00 Andrea a.bacchi@...:
  emmanuel segura emi2fast at ... writes:
 
 
 
  if you are using cman+pacemaker you need to enabled the stonith and
  configuring that in you crm config
 
 
  2015-01-27 14:05 GMT+01:00 Vinod Prabhu
  pvinod.mit@...:
  is stonith enabled in crm conf?
 
 
  yes, stonith is enabled
 
  [ONE]pcs property
  Cluster Properties:
   cluster-infrastructure: cman
   dc-version: 1.1.11-97629de
   last-lrm-refresh: 1422285715
   no-quorum-policy: ignore
   stonith-enabled: true
 
 
  If I disable it, stonith device don't start
 
 
 
  On Tue, Jan 27, 2015 at 6:21 PM, emmanuel segura
  emi2fast@... wrote:When a node is dead
  the registration key is removed.
 
  So I must see 2 key registered when I add fence_scsi device?
  But I don't see 2 key registered...
 
 

Sorry, I used the wrong device ID.
Now, with the correct device ID, I see 2 keys registered:


[ONE] sg_persist -n --read-keys
--device=/dev/disk/by-id/scsi-36e843b60f3d0cc6d1a11d4ff0da95cd8
  PR generation=0x4, 2 registered reservation keys follow:
0x4d5a0001
0x4d5a0002


Tomorrow I will do some fencing tests...

thanks
Andrea





___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Segfault on monitor resource

2015-01-27 Thread Dejan Muhamedagic
On Tue, Jan 27, 2015 at 03:18:13PM +0100, Oscar Salvador wrote:
  Hi,
 
 I've checked the resource graphs I have, and the resources were fine, so I
 think it's not a problem due to a high use of memory or something like that.
 And unfortunately I don't have a core dump to analize(I'll enable it for a
 future case) so the only thing I have are the logs.
 
 For the line below, I though that was the process in charge to monitore
 nginx what was killed due to a segfault:
 
 RA output: (Nginx-rsc:monitor:stderr) Segmentation fault

This is just output captured during the execution of the RA monitor
action. It could have been anything within the RA (which is just a shell
script) that segfaulted.

Thanks,

Dejan

 I've checked the Nginx logs, and there is nothing worth there, actually
 there is no activity, so I think it has to be something internal what
 caused the failure.
 I'll enable coredumps, it's the only thing I can do for now.
 
 Thank you very much
 
 Oscar
 
 2015-01-27 10:39 GMT+01:00 Dejan Muhamedagic deja...@fastmail.fm:
 
  Hi,
 
  On Mon, Jan 26, 2015 at 06:20:35PM +0100, Oscar Salvador wrote:
   Hi!
  
   I'm writing here because two days ago I experienced a strange problem in
  my
   Pacemaker Cluster.
   Everything was working fine, till suddenly a Segfault in Nginx monitor
   resource happened:
  
   Jan 25 03:55:24 lb02 crmd: [9975]: notice: run_graph:  Transition
  7551
   (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
   Source=/var/lib/pengine/pe-input-90.bz2): Complete
   Jan 25 03:55:24 lb02 crmd: [9975]: notice: do_state_transition: State
   transition S_TRANSITION_ENGINE - S_IDLE [ input=I_TE_SUCCESS
   cause=C_FSA_INTERNAL origin=notify_crmd ]
   Jan 25 04:00:08 lb02 cib: [9971]: info: cib_stats: Processed 1 operations
   (0.00us average, 0% utilization) in the last 10min
   Jan 25 04:10:24 lb02 crmd: [9975]: info: crm_timer_popped: PEngine
  Recheck
   Timer (I_PE_CALC) just popped (90ms)
   Jan 25 04:10:24 lb02 crmd: [9975]: notice: do_state_transition: State
   transition S_IDLE - S_POLICY_ENGINE [ input=I_PE_CALC
  cause=C_TIMER_POPPED
   origin=crm_timer_popped ]
   Jan 25 04:10:24 lb02 crmd: [9975]: info: do_state_transition: Progressed
  to
   state S_POLICY_ENGINE after C_TIMER_POPPED
   Jan 25 04:10:24 lb02 pengine: [10028]: WARN: unpack_rsc_op: Processing
   failed op Ldirector-rsc_last_failure_0 on lb02: not running (7)
   Jan 25 04:10:24 lb02 pengine: [10028]: notice: common_apply_stickiness:
   Ldirector-rsc can fail 97 more times on lb02 before being forced off
   Jan 25 04:10:24 lb02 crmd: [9975]: notice: do_state_transition: State
   transition S_POLICY_ENGINE - S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
   cause=C_IPC_MESSAGE origin=handle_response ]
   Jan 25 04:10:24 lb02 pengine: [10028]: notice: process_pe_message:
   Transition 7552: PEngine Input stored in:
  /var/lib/pengine/pe-input-90.bz2
   Jan 25 04:10:24 lb02 crmd: [9975]: info: do_te_invoke: Processing graph
   7552 (ref=pe_calc-dc-1422155424-7644) derived from
   /var/lib/pengine/pe-input-90.bz2
   Jan 25 04:10:24 lb02 crmd: [9975]: notice: run_graph:  Transition
  7552
   (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
   Source=/var/lib/pengine/pe-input-90.bz2): Complete
   Jan 25 04:10:24 lb02 crmd: [9975]: notice: do_state_transition: State
   transition S_TRANSITION_ENGINE - S_IDLE [ input=I_TE_SUCCESS
   cause=C_FSA_INTERNAL origin=notify_crmd ]
  
  
   Jan 25 04:10:30 lb02 lrmd: [9972]: info: RA output:
   (Nginx-rsc:monitor:stderr) Segmentation fault   *** here it starts
 
  What exactly did segfault? Do you have a core dump to examine?
 
   As you can see, the last line.
   And then:
  
   Jan 25 04:10:30 lb02 lrmd: [9972]: info: RA output:
   (Nginx-rsc:monitor:stderr) Killed
   /usr/lib/ocf/resource.d//heartbeat/nginx: 910:
   /usr/lib/ocf/resource.d//heartbeat/nginx: Cannot fork
 
  This could be related to the segfault, or due to other serious
  system error.
 
   I guess here Nginx was killed.
  
   And then I have some others errors till Pacemaker decide to move the
   resources to the node:
  
   Jan 25 04:10:30 lb02 crmd: [9975]: info: process_lrm_event: LRM operation
   Nginx-rsc_monitor_1 (call=52, rc=2, cib-update=7633, confirmed=false)
   invalid parameter
   Jan 25 04:10:30 lb02 crmd: [9975]: info: process_graph_event: Detected
   action Nginx-rsc_monitor_1 from a different transition: 5739 vs. 7552
   Jan 25 04:10:30 lb02 crmd: [9975]: info: abort_transition_graph:
   process_graph_event:476 - Triggered transition abort (complete=1,
   tag=lrm_rsc_op, id=Nginx-rsc_last_failure_0,
   magic=0:2;4:5739:0:42d1ed53-9686-4174-84e7-d2c230ed8832, cib=
   3.14.40) : Old event
   Jan 25 04:10:30 lb02 crmd: [9975]: WARN: update_failcount: Updating
   failcount for Nginx-rsc on lb02 after failed monitor: rc=2
  (update=value++,
   time=1422155430)
   Jan 25 04:10:30 lb02 crmd: [9975]: notice: do_state_transition: State
   transition S_IDLE - 

Re: [Pacemaker] new release date for resource-agents release 3.9.6

2015-01-27 Thread Dejan Muhamedagic
Hi Vladislav,

On Mon, Jan 26, 2015 at 06:52:21PM +0300, Vladislav Bogdanov wrote:
 Hi Dejan,
 
 if it is not too late, would it be possible to add output of
 environment into resource trace file when tracing is enabled?

Applied.
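For anyone wanting to exercise this: tracing is normally switched on per
resource; a sketch assuming crmsh is installed (the resource name is a
placeholder):

crm resource trace <resource-id> monitor
# trace files (now including the sorted environment captured by the patch
# below) typically land under /var/lib/heartbeat/trace_ra/<agent>/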

Thanks,

Dejan

 --- ocf-shellfuncs.orig 2015-01-26 15:50:34.435001364 +
 +++ ocf-shellfuncs  2015-01-26 15:49:19.707001542 +
 @@ -822,6 +822,7 @@
 fi
 PS4='+ `date +%T`: ${FUNCNAME[0]:+${FUNCNAME[0]}:}${LINENO}: '
 set -x
 +   env=$( echo; printenv | sort )
  }
  ocf_stop_trace() {
 set +x
 
 Best,
 Vladislav
 
 23.01.2015 18:45, Dejan Muhamedagic wrote:
 Hello everybody,
 
 Someone warned us that three days is too short a period to test a
 release, so let's postpone the final release of resource-agents
 v3.9.6 to:
 
  Tuesday, Jan 27
 
 Please do more testing in the meantime. The v3.9.6-rc1 packages
 are available for most popular platforms:
 
 http://download.opensuse.org/repositories/home:/dmuhamedagic:/branches:/network:/ha-clustering:/Stable
 
 RHEL-7 and Fedora 21 are unfortunately missing, due to some
 strange unresolvable dependencies issue.
 
 Debian/Ubuntu people can use alien.
 
 Many thanks!
 
 The resource-agents crowd
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org
 
 
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] authentication in the cluster

2015-01-27 Thread Kostiantyn Ponomarenko
Hi all,

Here is a situation - there are two two-node clusters.
They have totally identical configuration.
Nodes in the clusters are connected directly, without any switches.

Here is a part of corosync.comf file:

totem {
version: 2

cluster_name: mycluster
transport: udpu

crypto_hash: sha256
crypto_cipher: none
rrp_mode: passive
}

nodelist {
node {
name: node-a
nodeid: 1
ring0_addr: 169.254.0.2
ring1_addr: 169.254.1.2
}

node {
name: node-b
nodeid: 2
ring0_addr: 169.254.0.3
ring1_addr: 169.254.1.3
}
}

The only difference between the two clusters is the authentication key
(/etc/corosync/authkey) - each cluster has its own key.
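For reference, a per-cluster key is generated once and copied to that
cluster's other node; a sketch, assuming the stock corosync tooling:

corosync-keygen                              # writes /etc/corosync/authkey
scp /etc/corosync/authkey node-b:/etc/corosync/
chmod 400 /etc/corosync/authkey              # on both nodes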

QUESTION:
--
What will the behavior be if the following cabling mix-up occurs:
ring1_addr of node-a (cluster-A) is connected to ring1_addr of node-b
(cluster-B)
ring1_addr of node-a (cluster-B) is connected to ring1_addr of node-b
(cluster-A)

I attached a picture which shows the connections.

My actual goal is to not let the clusters work in such a case.
To achieve it, I decided to use the authentication key mechanism,
but I don't know what happens in the situation I described.

Thank you,
Kostya
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Two node cluster and no hardware device for stonith.

2015-01-27 Thread emmanuel segura
When a node is dead the registration key is removed.

2015-01-27 13:29 GMT+01:00 Andrea a.bac...@codices.com:
 emmanuel segura emi2fast@... writes:


 sorry, but i forgot to tell you, you need to know the fence_scsi
 doesn't reboot the evicted node, so you can combine fence_vmware with
 fence_scsi as the second option.

  for this, i'm trying to use a watchdog script
 https://access.redhat.com/solutions/65187

 But when I start wachdog daemon, all node reboot.
 I continue testing...



 2015-01-27 11:44 GMT+01:00 emmanuel segura emi2fast at gmail.com:
  In normal situation every node can in your file system, fence_scsi is
  used when your cluster is in split-braint, when your a node doesn't
  comunicate with the other node, i don't is good idea.
 

 So, i will see key registration only when nodes loose comunication?





 
  2015-01-27 11:35 GMT+01:00 Andrea a.bacchi at codices.com:
  Andrea a.bacchi at ... writes:






 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org



-- 
esta es mi vida e me la vivo hasta que dios quiera

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Two node cluster and no hardware device for stonith.

2015-01-27 Thread Vinod Prabhu
is stonith enabled in crm conf?

On Tue, Jan 27, 2015 at 6:21 PM, emmanuel segura emi2f...@gmail.com wrote:

 When a node is dead the registration key is removed.

 2015-01-27 13:29 GMT+01:00 Andrea a.bac...@codices.com:
  emmanuel segura emi2fast@... writes:
 
 
  sorry, but i forgot to tell you, you need to know the fence_scsi
  doesn't reboot the evicted node, so you can combine fence_vmware with
  fence_scsi as the second option.
 
   for this, i'm trying to use a watchdog script
  https://access.redhat.com/solutions/65187
 
  But when I start wachdog daemon, all node reboot.
  I continue testing...
 
 
 
  2015-01-27 11:44 GMT+01:00 emmanuel segura emi2fast at gmail.com:
   In normal situation every node can in your file system, fence_scsi is
   used when your cluster is in split-braint, when your a node doesn't
   comunicate with the other node, i don't is good idea.
  
 
  So, i will see key registration only when nodes loose comunication?
 
 
 
 
 
  
   2015-01-27 11:35 GMT+01:00 Andrea a.bacchi at codices.com:
   Andrea a.bacchi at ... writes:
 
 
 
 
 
 
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
  Project Home: http://www.clusterlabs.org
  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: http://bugs.clusterlabs.org



 --
 esta es mi vida e me la vivo hasta que dios quiera

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org




-- 
OSS BSS Developer
Hand Phone: 9860788344
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Two node cluster and no hardware device for stonith.

2015-01-27 Thread emmanuel segura
If you are using cman+pacemaker you need to enable stonith and configure
it in your crm config.

2015-01-27 14:05 GMT+01:00 Vinod Prabhu pvinod@gmail.com:

 is stonith enabled in crm conf?

 On Tue, Jan 27, 2015 at 6:21 PM, emmanuel segura emi2f...@gmail.com
 wrote:

 When a node is dead the registration key is removed.

 2015-01-27 13:29 GMT+01:00 Andrea a.bac...@codices.com:
  emmanuel segura emi2fast@... writes:
 
 
  sorry, but i forgot to tell you, you need to know the fence_scsi
  doesn't reboot the evicted node, so you can combine fence_vmware with
  fence_scsi as the second option.
 
   for this, i'm trying to use a watchdog script
  https://access.redhat.com/solutions/65187
 
  But when I start wachdog daemon, all node reboot.
  I continue testing...
 
 
 
  2015-01-27 11:44 GMT+01:00 emmanuel segura emi2fast at gmail.com:
   In normal situation every node can in your file system, fence_scsi is
   used when your cluster is in split-braint, when your a node doesn't
   comunicate with the other node, i don't is good idea.
  
 
  So, i will see key registration only when nodes loose comunication?
 
 
 
 
 
  
   2015-01-27 11:35 GMT+01:00 Andrea a.bacchi at codices.com:
   Andrea a.bacchi at ... writes:
 
 
 
 
 
 
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
  Project Home: http://www.clusterlabs.org
  Getting started:
 http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: http://bugs.clusterlabs.org



 --
 esta es mi vida e me la vivo hasta que dios quiera

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org




 --
 OSS BSS Developer
 Hand Phone: 9860788344

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org




-- 
esta es mi vida e me la vivo hasta que dios quiera
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Two node cluster and no hardware device for stonith.

2015-01-27 Thread Andrea
emmanuel segura emi2fast@... writes:

 
 
 if you are using cman+pacemaker you need to enabled the stonith and
configuring that in you crm config
 
 
 2015-01-27 14:05 GMT+01:00 Vinod Prabhu
pvinod@gmail.com:
 is stonith enabled in crm conf?
 

yes, stonith is enabled

[ONE]pcs property
Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.11-97629de
 last-lrm-refresh: 1422285715
 no-quorum-policy: ignore
 stonith-enabled: true


If I disable it, the stonith device doesn't start.
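For reference, a sketch of toggling the property and checking the effect
with standard pcs commands:

pcs property set stonith-enabled=true
pcs status          # the iscsi-stonith-device resource should show as Started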


 
 On Tue, Jan 27, 2015 at 6:21 PM, emmanuel segura
 emi2f...@gmail.com wrote:
 When a node is dead the registration key is removed.

So I should see 2 keys registered when I add the fence_scsi device?
But I don't see 2 keys registered...



Andrea






___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] pacemaker-remote not listening

2015-01-27 Thread Thomas Manninger
Hi,



My OS is Debian wheezy.
I compiled and installed pacemaker-remote.

Startup log:

Jan 27 16:04:30 [2859] vm1 pacemaker_remoted: info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/root
Jan 27 16:04:30 [2859] vm1 pacemaker_remoted: info: qb_ipcs_us_publish: server name: lrmd
Jan 27 16:04:30 [2859] vm1 pacemaker_remoted: info: main: Starting

My problem is that pacemaker_remote is not listening on port 3121:

netstat -tulpen | grep 3121

netstat -alpen
Proto RefCnt Flags Type State I-Node PID/Program name Path
unix 2 [ ACC ] STREAM LISTENING 6635 2859/pacemaker_remo @lrmd
unix 2 [ ] DGRAM 6634 2859/pacemaker_remo
...
...

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Segfault on monitor resource

2015-01-27 Thread Oscar Salvador
2015-01-27 17:58 GMT+01:00 Dejan Muhamedagic deja...@fastmail.fm:

 On Tue, Jan 27, 2015 at 03:18:13PM +0100, Oscar Salvador wrote:
   Hi,
 
  I've checked the resource graphs I have, and the resources were fine, so
 I
  think it's not a problem due to a high use of memory or something like
 that.
  And unfortunately I don't have a core dump to analize(I'll enable it for
 a
  future case) so the only thing I have are the logs.
 
  For the line below, I though that was the process in charge to monitore
  nginx what was killed due to a segfault:
 
  RA output: (Nginx-rsc:monitor:stderr) Segmentation fault

 This is just output captured during the execution of the RA
 monitor action. It could've been anything within the RA (which is
 just a shell script) to segfault.


Hi,

Yes, I see.
I've enabled core dumps on the system, so the next time I'll be able to
check what is causing this.

Thank you very much
Oscar Salvador



 Thanks,

 Dejan

  I've checked the Nginx logs, and there is nothing worth there, actually
  there is no activity, so I think it has to be something internal what
  caused the failure.
  I'll enable coredumps, it's the only thing I can do for now.
 
  Thank you very much
 
  Oscar
 
  2015-01-27 10:39 GMT+01:00 Dejan Muhamedagic deja...@fastmail.fm:
 
   Hi,
  
   On Mon, Jan 26, 2015 at 06:20:35PM +0100, Oscar Salvador wrote:
Hi!
   
I'm writing here because two days ago I experienced a strange problem in my
Pacemaker cluster.
Everything was working fine, till suddenly a segfault in the Nginx monitor
resource happened:
   
Jan 25 03:55:24 lb02 crmd: [9975]: notice: run_graph:  Transition
   7551
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pengine/pe-input-90.bz2): Complete
Jan 25 03:55:24 lb02 crmd: [9975]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE - S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan 25 04:00:08 lb02 cib: [9971]: info: cib_stats: Processed 1
 operations
(0.00us average, 0% utilization) in the last 10min
Jan 25 04:10:24 lb02 crmd: [9975]: info: crm_timer_popped: PEngine
   Recheck
Timer (I_PE_CALC) just popped (90ms)
Jan 25 04:10:24 lb02 crmd: [9975]: notice: do_state_transition: State
transition S_IDLE - S_POLICY_ENGINE [ input=I_PE_CALC
   cause=C_TIMER_POPPED
origin=crm_timer_popped ]
Jan 25 04:10:24 lb02 crmd: [9975]: info: do_state_transition:
 Progressed
   to
state S_POLICY_ENGINE after C_TIMER_POPPED
Jan 25 04:10:24 lb02 pengine: [10028]: WARN: unpack_rsc_op:
 Processing
failed op Ldirector-rsc_last_failure_0 on lb02: not running (7)
Jan 25 04:10:24 lb02 pengine: [10028]: notice:
 common_apply_stickiness:
Ldirector-rsc can fail 97 more times on lb02 before being forced
 off
Jan 25 04:10:24 lb02 crmd: [9975]: notice: do_state_transition: State
transition S_POLICY_ENGINE - S_TRANSITION_ENGINE [
 input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Jan 25 04:10:24 lb02 pengine: [10028]: notice: process_pe_message:
Transition 7552: PEngine Input stored in:
   /var/lib/pengine/pe-input-90.bz2
Jan 25 04:10:24 lb02 crmd: [9975]: info: do_te_invoke: Processing
 graph
7552 (ref=pe_calc-dc-1422155424-7644) derived from
/var/lib/pengine/pe-input-90.bz2
Jan 25 04:10:24 lb02 crmd: [9975]: notice: run_graph:  Transition
   7552
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pengine/pe-input-90.bz2): Complete
Jan 25 04:10:24 lb02 crmd: [9975]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE - S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
   
   
Jan 25 04:10:30 lb02 lrmd: [9972]: info: RA output:
(Nginx-rsc:monitor:stderr) Segmentation fault   *** here it
 starts
  
   What exactly did segfault? Do you have a core dump to examine?
  
As you can see, the last line.
And then:
   
Jan 25 04:10:30 lb02 lrmd: [9972]: info: RA output:
(Nginx-rsc:monitor:stderr) Killed
/usr/lib/ocf/resource.d//heartbeat/nginx: 910:
/usr/lib/ocf/resource.d//heartbeat/nginx: Cannot fork
  
   This could be related to the segfault, or due to some other serious
   system error.
  
I guess here Nginx was killed.
   
And then I have some others errors till Pacemaker decide to move the
resources to the node:
   
Jan 25 04:10:30 lb02 crmd: [9975]: info: process_lrm_event: LRM
 operation
Nginx-rsc_monitor_1 (call=52, rc=2, cib-update=7633,
 confirmed=false)
invalid parameter
Jan 25 04:10:30 lb02 crmd: [9975]: info: process_graph_event:
 Detected
action Nginx-rsc_monitor_1 from a different transition: 5739 vs.
 7552
Jan 25 04:10:30 lb02 crmd: [9975]: info: abort_transition_graph:
process_graph_event:476 - Triggered transition abort (complete=1,
tag=lrm_rsc_op, id=Nginx-rsc_last_failure_0,

Re: [Pacemaker] HA Summit Key-signing Party (was: Organizing HA Summit 2015)

2015-01-27 Thread Jan Pokorný
 What's needed?
 Once you have a key pair (and provided that you are using GnuPG), please
 run the following sequence:
 
 # figure out the key ID for the identity to be verified;
 # IDENTITY is either your associated email address/your name
 # if only single key ID matches, specific key otherwise
 # (you can use gpg -K to select a desired ID at the sec line)
 KEY=$(gpg --with-colons 'IDENTITY' | grep '^pub' | cut -d: -f5)

Oops, sorry, somehow '-k' got lost above ^.  Correct version:

 KEY=$(gpg -k --with-colons 'IDENTITY' | grep '^pub' | cut -d: -f5)

 # export the public key to a file that is suitable for exchange
 gpg --export -a -- $KEY > $KEY
 
 # verify that you have an expected data to share
 gpg --with-fingerprint -- $KEY

-- 
Jan


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] authentication in the cluster

2015-01-27 Thread Kostiantyn Ponomarenko
Hi Chrissie,

I know that this setup is a crazy thing =)
First of all I need to say: think of each two-node cluster as one box
with two nodes.

 You can't connect clusters together like that.
I know that.

All nodes in the cluster have just 1 authkey file.
That is true. But in this example there are two clusters, each of which has
its own auth key.

What you have there is not a ring, it's err, a linked-cross?!
Yep, I showed the wrong way of connecting two clusters.

 Why do you need to connect the two clusters together - is it for failover?
No, it is not. I really don't (and won't) connect them in that way. It is
wrong.
But, in real life those two clusters will be standing (physically, in the
same room, in the same rack) pretty close to each other.
And my concern is: what if someone makes that connection by mistake? What will
happen in that situation?
What I would like to have in that situation is something which prevents two
nodes in one cluster from working simultaneously - because that would cause
data corruption.

The situation is pretty simple when there is only one ring_addr defined
per node.
In this case, when someone cross-links two separate clusters, it will
lead to 4 clusters, each of which is missing one node - because the two
connected nodes have different auth keys, and that is why they will not see
each other even when there is a connection.
STONITH always works within the same cluster.
So STONITH will reboot the other node in the cluster.
That will prevent simultaneous access to the data.
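
For what it is worth, the keys are kept distinct roughly like this (a sketch;
corosync-keygen blocks until it has enough entropy from /dev/random, and the
copy step is done independently per cluster):

# on one node of cluster-A
corosync-keygen                                  # writes /etc/corosync/authkey
scp -p /etc/corosync/authkey node-b:/etc/corosync/
# repeat on cluster-B with its own freshly generated key, so the two clusters
# never share an authkey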

I tried to do my best in describing the situation, the problem and the
question.
Looking forward to hearing any suggestions =)


Thank you,
Kostya
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] authentication in the cluster

2015-01-27 Thread Christine Caulfield
On 27/01/15 15:56, Kostiantyn Ponomarenko wrote:
 Hi all,
 
 Here is a situation - there are two two-node clusters.
 They have totally identical configuration.
 Nodes in the clusters are connected directly, without any switches.
 

You can't connect clusters together like that. All nodes in the cluster
have just 1 authkey file. Also, corosync clusters are a ring, even if
you have two nodes. What you have there is not a ring, it's err, a
linked-cross?!

Why do you need to connect the two clusters together - is it for
failover? There must be a better way of achieving what you need; have a
look at 'stretch clusters' (not my speciality, TBH) if they are at
separate sites. If you just want to run resources outside of the cluster
then pacemaker_remote might be more useful.
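
(A rough sketch of that route, with placeholder names and address: run
pacemaker_remoted on the extra host, give it the same /etc/pacemaker/authkey
as the cluster nodes, then add it as a remote resource.)

pcs resource create remote-host1 ocf:pacemaker:remote server=192.168.122.10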

If it's just for isolation of resources then pacemaker can do that
anyway so you don't need to partition the cluster like that.

If you can explain just why you think you need this system we might be
able to come up with something that will work :)

Chrissie

 totem {
 version: 2
 
 cluster_name: mycluster
 transport: udpu
 
 crypto_hash: sha256
 crypto_cipher: none
 rrp_mode: passive
 }
 
 nodelist {
 node {
 name: node-a
 nodeid: 1
 ring0_addr: 169.254.0.2
 ring1_addr: 169.254.1.2
 }
 
 node {
 name: node-b
 nodeid: 2
 ring0_addr: 169.254.0.3
 ring1_addr: 169.254.1.3
 }
 }
 
 The only difference between those two clusters is the authentication key
 (/etc/corosync/authkey) - it is different for both clusters.
 
 QUESTION:
 --
 What will be the behavior if the following connection mix-up occurs:
 ring1_addr of node-a (cluster-A) is connected to ring1_addr of node-b
 (cluster-B)
 ring1_addr of node-a (cluster-B) is connected to ring1_addr of node-b
 (cluster-A)
 
 I attached a pic which shows the connections.
 
 My actual goal is to not let the clusters work in such a case.
 To achieve it, I decided to use the authentication key mechanism.
 But I don't know the result in the situation which I described ...
 
 Thank you,
 Kostya
 
 
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org
 


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org