Re: [ClusterLabs] Pacemaker newbie

2024-09-16 Thread Ken Gaillot
On Fri, 2024-09-13 at 17:32 +0200, Antony Stone wrote:
> On Friday 13 September 2024 at 17:23:59, Taylor, Marc D wrote:
> 
> > We bought a storage system from Dell and they recommended to us
> > that we
> > should use a two-node cluster
> 
> I do hope you realise that a literal two-node cluster is not a good
> idea?
> 
> If the two nodes lose contact with each other, you get a situation
> called 
> "split brain" where neither node can know what state the other node
> is in, and 
> neither node can safely take over resources.

As long as fencing is configured (and tested!), a two-node cluster is
not a problem. If the nodes lose communication, one side will fence the
other and take over all resources. (Various fencing options are
available to avoid a "death match" where both nodes fence each other.)
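
For example (a minimal sketch, with illustrative names and addresses),
corosync's two_node mode plus a random fencing delay covers the common
two-node pitfalls:

  # corosync.conf: special two-node quorum mode
  quorum {
      provider: corosync_votequorum
      two_node: 1
  }

  # hypothetical IPMI fence device; pcmk_delay_max staggers fencing so
  # the two nodes don't shoot each other at the same instant
  pcs stonith create fence-node1 fence_ipmilan ip=10.0.0.1 \
      username=admin password=secret pcmk_host_list=node1 \
      pcmk_delay_max=10s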

> 
> You should always have an odd number of nodes in a cluster, and
> provided more 
> than 50% of the nodes can see each other, they will run resources;
> any node 
> which cannot see enough other nodes to be in a group of more than 50%
> will 
> stop running resources.
> 
> > to share the storage out as either NFS or SMB.
> 
> Do they explicitly say you can do both?
> 
> It might be possible to share a single storage resource using both
> NFS and 
> SMB, but it must have some interesting file-locking capabilities.
> 
> 
> Antony
> 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker newbie

2024-09-16 Thread Ken Gaillot
On Fri, 2024-09-13 at 15:08 +, Taylor, Marc D wrote:
> Hello,
>  
> I just found this list and have a few questions.
>  
> My understanding is that you can’t run a cluster that is both
> active/active and active/passive on the same cluster nodes.  Is this
> correct?

Nope, Pacemaker has no concept of active or passive nodes, that's just
an easy way for people to think about it. In Pacemaker, any node can
run any resource in any mode unless told otherwise.

> We need to run a cluster to share out storage.  On one LUN we need to
> share out NFS and on another LUN we need to share out Samba.  We
> already shared out the NFS LUN in an active/passive configuration. 
> It looks like Samba should be shared out as Active/Active though we
> did find procedures for Active/Passive.  What is the best wisdom
> here?

Both resources would be clones. I'm not familiar with those particular
resource agents, but assuming they support promote/demote actions, you
would just configure both clones with promotable="true", and set
promoted-max="2" on the samba clone.

https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/html/collective.html#clone-options
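
As an illustrative sketch only (resource names are hypothetical, and pcs
syntax varies a little between versions):

  # active/passive NFS: promotable clone, one promoted instance (default)
  pcs resource promotable nfs-server

  # active/active samba: promotable clone, promoted on both nodes
  pcs resource promotable samba-server promoted-max=2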


>  
> Thanks in advance,
>  
> Marc Taylor
> 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker newbie

2024-09-13 Thread Antony Stone
On Friday 13 September 2024 at 17:23:59, Taylor, Marc D wrote:

> We bought a storage system from Dell and they recommended to us that we
> should use a two-node cluster

I do hope you realise that a literal two-node cluster is not a good idea?

If the two nodes lose contact with each other, you get a situation called 
"split brain" where neither node can know what state the other node is in, and 
neither node can safely take over resources.

You should always have an odd number of nodes in a cluster, and provided more 
than 50% of the nodes can see each other, they will run resources; any node 
which cannot see enough other nodes to be in a group of more than 50% will 
stop running resources.

> to share the storage out as either NFS or SMB.

Do they explicitly say you can do both?

It might be possible to share a single storage resource using both NFS and 
SMB, but it must have some interesting file-locking capabilities.


Antony

-- 
There's a good theatrical performance about puns on in the West End.  It's a 
play on words.

   Please reply to the list;
 please *don't* CC me.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker newbie

2024-09-13 Thread Taylor, Marc D
Antony,

Let me back the tape up a little bit.

We bought a storage system from Dell and they recommended to us that we should 
use a two-node cluster to share the storage out as either NFS or SMB.

So let's start with the idea we need to use a cluster for this.

Regards,

Marc

-Original Message-
From: Users  On Behalf Of Antony Stone
Sent: Friday, September 13, 2024 10:13
To: Cluster Labs - All topics related to open-source clustering welcomed 

Subject: Re: [ClusterLabs] Pacemaker newbie

On Friday 13 September 2024 at 17:08:09, Taylor, Marc D wrote:

> We need to run a cluster to share out storage.  On one LUN we need to 
> share out NFS and on another LUN we need to share out Samba.  We 
> already shared out the NFS LUN in an active/passive configuration.  It 
> looks like Samba should be shared out as Active/Active though we did 
> find procedures for Active/Passive.  What is the best wisdom here?

If you want Samba to be running on all nodes at all times, why does it need to 
be managed with pacemaker at all?

Antony.

--
My life is going completely according to plan.

I do sometimes wish it had been *my* plan, though.

   Please reply to the list;
 please *don't* CC me.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker newbie

2024-09-13 Thread Antony Stone
On Friday 13 September 2024 at 17:08:09, Taylor, Marc D wrote:

> We need to run a cluster to share out storage.  On one LUN we need to share
> out NFS and on another LUN we need to share out Samba.  We already shared
> out the NFS LUN in an active/passive configuration.  It looks like Samba
> should be shared out as Active/Active though we did find procedures for
> Active/Passive.  What is the best wisdom here?

If you want Samba to be running on all nodes at all times, why does it need to 
be managed with pacemaker at all?

Antony.

-- 
My life is going completely according to plan.

I do sometimes wish it had been *my* plan, though.

   Please reply to the list;
 please *don't* CC me.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker resource configure issue

2024-02-08 Thread Ken Gaillot
On Thu, 2024-02-08 at 10:12 +0800, hywang via Users wrote:
> hello, everyone,
>  I want to have a node fenced or the cluster stopped after a
> resource start has failed 3 times. How should I configure the resource
> to achieve that?
> Thanks!
> 

The current design doesn't allow it. You can set start-failure-is-fatal 
to false to let the cluster reattempt the start and migration-threshold 
to 3 to have it try to start on a different node after three failures,
or you can set on-fail to fence to have it fence the node if the
(first) start fails, but you can't combine those approaches.
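
For reference, the two (mutually exclusive) approaches look roughly like
this with pcs, using a hypothetical resource named my-rsc:

  # approach 1: retry the start, move elsewhere after 3 failures
  pcs property set start-failure-is-fatal=false
  pcs resource meta my-rsc migration-threshold=3

  # approach 2: fence the node if the (first) start fails
  pcs resource update my-rsc op start on-fail=fence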

It's a longstanding goal to allow more flexibility in failure handling,
but there hasn't been time to deal with it.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker 2.1.7-rc2 now available

2023-11-27 Thread Jan Friesse

On 24/11/2023 09:18, Klaus Wenninger wrote:
> Hi all,
>
> Source code for the 2nd release candidate for Pacemaker version 2.1.7
> is available at:
>
> https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.7-rc2
>
> This is primarily a bug fix release. See the ChangeLog or the link
> above for details.
>
> Everyone is encouraged to download, build, and test the new release. We

I would like to ask whether the fix for
https://bugs.clusterlabs.org/show_bug.cgi?id=5529 got in?

Without the fix, booth is not working, so trying pcmk 2.1.7 is really not
recommended for anybody who is using booth until bug 5529 gets fixed.

Regards
  Honza

> do many regression tests and simulations, but we can't cover all
> possible use cases, so your feedback is important and appreciated.
>
> Many thanks to all contributors of source code to this release,
> including Chris Lumens,
> Gao Yan, Grace Chin, Hideo Yamauchi, Jan Pokorný, Ken Gaillot,
> liupei, Oyvind Albrigtsen, Reid Wahl, xin liang, xuezhixin.
>
> Klaus


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker-remote

2023-09-18 Thread Ken Gaillot
On Thu, 2023-09-14 at 18:28 +0800, Mr.R via Users wrote:
> Hi all,
>
> In Pacemaker-Remote 2.1.6, the pacemaker package is required
> for guest nodes and not for remote nodes. Why is that? What does 
> pacemaker do?
> After adding a guest node, the pacemaker package does not seem to be
> needed. Can I not install it here?

I'm not sure what's requiring it in your environment. There's no
dependency in the upstream RPM at least.

The pacemaker package does have the crm_master script needed by some
resource agents, so you will need it if you use any of those. (That
script should have been moved to the pacemaker-cli package in 2.1.3,
oops ...)

> After testing, remote nodes can go offline, but guest nodes cannot.
> Is there any way to take them offline? Are there relevant
> failure test cases?
> 
> thanks,

To make a guest node offline, stop the resource that creates it.
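
For example, if you manage the cluster with pcs and the guest node is
created by a hypothetical VirtualDomain resource named vm-guest1:

  # stopping the resource takes the guest node offline
  pcs resource disable vm-guest1
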
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker fatal shutdown

2023-07-25 Thread Ken Gaillot
On Thu, 2023-07-20 at 12:43 +0530, Priyanka Balotra wrote:
> What I mainly want to understand is:
> - why the "fatal failure" is occurring 

The logs so far don't show that. The earliest sign is:

Jul 17 14:18:20.085 FILE-6 pacemaker-fenced[19411] (remote_op_done)
notice: Operation 'reboot' targeting FILE-2 by FILE-4 for
pacemaker-controld.19415@FILE-6: OK | id=4e523b34

You'd want to figure out which node was the Designated Controller (DC)
at that time, and look at its logs before this time. The DC will have
"Calculated transition" log messages.

You want to find such messages just before the timestamp above. If you
look above the "Calculated transition" message, it will show what
actions the cluster wants to take, including fencing. The logs around
there should say why the fencing was needed.
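
Something along these lines will locate them (the log path may differ on
your distribution):

  # find scheduler decisions leading up to the fencing
  grep "Calculated transition" /var/log/pacemaker/pacemaker.log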

> - why pacemaker does not start on the node after the node boots
> following a "pacemaker fatal failure".

A fatal failure is one where Pacemaker should stay down, so that's what
it does. In this case, fencing completed against the node, but the node
was still alive, so it shuts down and waits for manual intervention to
figure out what happened.

> - How can this be handled?

In a situation like this, figure out (1) why fencing was needed and (2)
why successful fencing did not kill the node (if you're using fabric
fencing such as SCSI fencing, that could be a reason, otherwise it
might be a misconfiguration).

Once you know that, it should be fairly obvious what to do about it,
and once it's taken care of, you can manually start Pacemaker on the
node again.
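
Once the root cause is addressed, starting the stack back up on just that
node looks like, for example:

  # start cluster services on this node (corosync may need to be
  # started first, depending on how the services are packaged)
  systemctl start pacemaker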

> 
> Thanks
> Priyanka
> 
> On Thu, Jul 20, 2023 at 12:41 PM Priyanka Balotra <
> priyanka.14balo...@gmail.com> wrote:
> > Hi, 
> > 
> > Here are FILE-6 logs: 
> > 
> > 65710:Jul 17 14:16:51.517 FILE-6 pacemaker-controld  [19415]
> > (throttle_mode)debug: Current load is 0.76 across 10
> > core(s)
> > 65711:Jul 17 14:16:55.085 FILE-6 pacemaker-controld  [19415]
> > (throttle_update)  debug: Node FILE-2 has negligible load and
> > supports at most 20 jobs; new job limit 20
> > 65712:Jul 17 14:16:55.085 FILE-6 pacemaker-controld  [19415]
> > (handle_request)   debug: The throttle changed. Trigger a graph.
> > 65713:Jul 17 14:16:55.085 FILE-6 pacemaker-controld  [19415]
> > (pcmk__set_flags_as)   debug: FSA action flags 0x0002
> > (new_actions) for controller set by s_crmd_fsa:198
> > 65714:Jul 17 14:16:55.085 FILE-6 pacemaker-controld  [19415]
> > (s_crmd_fsa)   debug: Processing I_JOIN_REQUEST: [
> > state=S_INTEGRATION cause=C_HA_MESSAGE origin=route_message ]
> > 65715:Jul 17 14:16:55.085 FILE-6 pacemaker-controld  [19415]
> > (pcmk__clear_flags_as) debug: FSA action flags 0x0002
> > (an_action) for controller cleared by do_fsa_action:108
> > 65716:Jul 17 14:16:55.085 FILE-6 pacemaker-controld  [19415]
> > (do_dc_join_filter_offer)  debug: Accepting join-1 request from
> > FILE-2 | ref=join_request-crmd-1689603392-8
> > 65717:Jul 17 14:16:55.085 FILE-6 pacemaker-controld  [19415]
> > (pcmk__update_peer_expected)   info: do_dc_join_filter_offer:
> > Node FILE-2[2] - expected state is now member (was (null))
> > 65718:Jul 17 14:16:55.085 FILE-6 pacemaker-controld  [19415]
> > (do_dc_join_filter_offer)  debug: 2 nodes currently integrated in
> > join-1
> > 65719:Jul 17 14:16:55.085 FILE-6 pacemaker-controld  [19415]
> > (check_join_state) debug: join-1: Integration of 2 peers
> > complete | state=S_INTEGRATION for=do_dc_join_filter_offer
> > 65720:Jul 17 14:16:55.085 FILE-6 pacemaker-controld  [19415]
> > (pcmk__set_flags_as)   debug: FSA action flags 0x0004
> > (new_actions) for controller set by s_crmd_fsa:198
> > 65721:Jul 17 14:16:55.085 FILE-6 pacemaker-controld  [19415]
> > (s_crmd_fsa)   debug: Processing I_INTEGRATED: [
> > state=S_INTEGRATION cause=C_FSA_INTERNAL origin=check_join_state ]
> > 65722:Jul 17 14:16:55.085 FILE-6 pacemaker-controld  [19415]
> > (do_state_transition)  info: State transition S_INTEGRATION ->
> > S_FINALIZE_JOIN | input=I_INTEGRATED cause=C_FSA_INTERNAL
> > origin=check_join_state
> > 65723:Jul 17 14:16:55.085 FILE-6 pacemaker-controld  [19415]
> > (pcmk__set_flags_as)   debug: FSA action flags 0x0020
> > (A_INTEGRATE_TIMER_STOP) for controller set by
> > do_state_transition:559
> > 65724:Jul 17 14:16:55.085 FILE-6 pacemaker-controld  [19415]
> > (pcmk__set_flags_as)   debug: FSA action flags 0x0040
> > (A_FINALIZE_TIMER_START) for controller set by
> > do_state_transition:563
> > 65725:Jul 17 14:16:55.085 FILE-6 pacemaker-controld  [19415]
> > (pcmk__set_flags_as)   debug: FSA action flags 0x0200
> > (A_DC_TIMER_STOP) for controller set by do_state_transition:569
> > 65726:Jul 17 14:16:55.085 FILE-6 pacemaker-controld  [19415]
> > (do_state_transition)  debug: All cluster nodes (2) responded
> > to join offer
> > 65727:Jul 17 14:16:55.085 FILE-6 pacemaker-controld  [19415]
> > (pcmk__clear_flags_as) debug: FSA action flags 0x0200
>

Re: [ClusterLabs] Pacemaker fatal shutdown

2023-07-19 Thread Reid Wahl
On Wed, Jul 19, 2023 at 8:33 PM Priyanka Balotra
 wrote:
>
> Sure,
> Here are the logs:
>
>
> 63138:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] 
> (post_cache_update)debug: Updated cache after membership event 44.
> 63139:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] 
> (pcmk__set_flags_as)   debug: FSA action flags 0x2 
> (A_ELECTION_CHECK) for controller set by post_cache_update:81
> 63140:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] 
> (pcmk__clear_flags_as) debug: FSA action flags 0x0002 (an_action) for 
> controller cleared by do_fsa_action:108
> 63141:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] (do_started) 
>   info: Delaying start, Config not read (0040)
> 63142:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] 
> (register_fsa_input_adv)   debug: Stalling the FSA pending further input: 
> source=do_started cause=C_FSA_INTERNAL data=(nil) queue=0
> 63143:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] 
> (pcmk__set_flags_as)   debug: FSA action flags 0x0002 (with_actions) 
> for controller set by register_fsa_input_adv:88
> 63144:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] (s_crmd_fsa) 
>   debug: Exiting the FSA: queue=0, fsa_actions=0x20002, stalled=true
> 63145:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] 
> (config_query_callback)debug: Call 3 : Parsing CIB options
> 63146:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] 
> (config_query_callback)debug: Shutdown escalation occurs if DC has not 
> responded to request in 120ms
> 63147:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] 
> (config_query_callback)debug: Re-run scheduler after 90ms of 
> inactivity
> 63148:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] 
> (pe_unpack_alerts) debug: Alert pf-ha-alert: 
> path=/usr/lib/ocf/resource.d/pacemaker/pf_ha_alert.sh timeout=3ms 
> tstamp-format='%H:%M:%S.%06N' 0 vars
> 63149:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] 
> (pcmk__clear_flags_as) debug: FSA action flags 0x0002 (an_action) for 
> controller cleared by do_fsa_action:108
> 63150:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] (do_started) 
>   debug: Init server comms
> 63151:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] 
> (qb_ipcs_us_publish)   info: server name: crmd
> 63152:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] (do_started) 
>   notice: Pacemaker controller successfully started and accepting connections
> 63153:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] 
> (pcmk__clear_flags_as) debug: FSA action flags 0x2 (an_action) 
> for controller cleared by do_fsa_action:108
> 63154:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] 
> (do_election_check)debug: Ignoring election check because we are not 
> in an election
> 63155:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] 
> (pcmk__set_flags_as)   debug: FSA action flags 0x10100100 
> (new_actions) for controller set by s_crmd_fsa:198
> 63156:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] (s_crmd_fsa) 
>   debug: Processing I_PENDING: [ state=S_STARTING cause=C_FSA_INTERNAL 
> origin=do_started ]
> 63157:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] 
> (pcmk__clear_flags_as) debug: FSA action flags 0x1000 
> (an_action) for controller cleared by do_fsa_action:108
> 63158:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] (do_log)   info: 
> Input I_PENDING received in state S_STARTING from do_started
> 63159:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] 
> (do_state_transition)  notice: State transition S_STARTING -> S_PENDING | 
> input=I_PENDING cause=C_FSA_INTERNAL origin=do_started
> 63160:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] 
> (pcmk__set_flags_as)   debug: FSA action flags 0x0020 
> (A_INTEGRATE_TIMER_STOP) for controller set by do_state_transition:559
> 63161:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] 
> (pcmk__set_flags_as)   debug: FSA action flags 0x0080 
> (A_FINALIZE_TIMER_STOP) for controller set by do_state_transition:565
> 63162:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] 
> (pcmk__clear_flags_as) debug: FSA action flags 0x0020 (an_action) for 
> controller cleared by do_fsa_action:108
> 63163:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] 
> (pcmk__clear_flags_as) debug: FSA action flags 0x0080 (an_action) for 
> controller cleared by do_fsa_action:108
> 63164:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] 
> (pcmk__clear_flags_as) debug: FSA action flags 0x0010 (an_action) for 
> controller cleared by do_fsa_action:108
> 63165:Jul 17 14:16:26.132 FILE-2 pacemaker-controld  [15962] 
> (do_cl_join_query) debug: Querying for a DC
> 63166:Jul 17 14:16:26.132 FILE-2 pacemaker-controld  [15962] 
> (pcmk__clear_flags_as)  

Re: [ClusterLabs] Pacemaker fatal shutdown

2023-07-19 Thread Priyanka Balotra
Sure,
Here are the logs:


63138:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962]
(post_cache_update)debug: Updated cache after membership event 44.
63139:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962]
(pcmk__set_flags_as)   debug: FSA action flags 0x2
(A_ELECTION_CHECK) for controller set by post_cache_update:81
63140:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962]
(pcmk__clear_flags_as) debug: FSA action flags 0x0002 (an_action)
for controller cleared by do_fsa_action:108
63141:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] (do_started)
info: Delaying start, Config not read (0040)
63142:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962]
(register_fsa_input_adv)   debug: Stalling the FSA pending further input:
source=do_started cause=C_FSA_INTERNAL data=(nil) queue=0
63143:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962]
(pcmk__set_flags_as)   debug: FSA action flags 0x0002
(with_actions) for controller set by register_fsa_input_adv:88
63144:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] (s_crmd_fsa)
debug: Exiting the FSA: queue=0, fsa_actions=0x20002, stalled=true
63145:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962]
(config_query_callback)debug: Call 3 : Parsing CIB options
63146:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962]
(config_query_callback)debug: Shutdown escalation occurs if DC has not
responded to request in 120ms
63147:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962]
(config_query_callback)debug: Re-run scheduler after 90ms of
inactivity
63148:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962]
(pe_unpack_alerts) debug: Alert pf-ha-alert:
path=/usr/lib/ocf/resource.d/pacemaker/pf_ha_alert.sh timeout=3ms
tstamp-format='%H:%M:%S.%06N' 0 vars
63149:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962]
(pcmk__clear_flags_as) debug: FSA action flags 0x0002 (an_action)
for controller cleared by do_fsa_action:108
63150:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] (do_started)
debug: Init server comms
63151:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962]
(qb_ipcs_us_publish)   info: server name: crmd
63152:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] (do_started)
notice: Pacemaker controller successfully started and accepting
connections
63153:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962]
(pcmk__clear_flags_as) debug: FSA action flags 0x2 (an_action)
for controller cleared by do_fsa_action:108
63154:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962]
(do_election_check)debug: Ignoring election check because we are
not in an election
63155:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962]
(pcmk__set_flags_as)   debug: FSA action flags 0x10100100
(new_actions) for controller set by s_crmd_fsa:198
63156:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] (s_crmd_fsa)
debug: Processing I_PENDING: [ state=S_STARTING cause=C_FSA_INTERNAL
origin=do_started ]
63157:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962]
(pcmk__clear_flags_as) debug: FSA action flags 0x1000
(an_action) for controller cleared by do_fsa_action:108
63158:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962] (do_log)
info: Input I_PENDING received in state S_STARTING from do_started
63159:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962]
(do_state_transition)  notice: State transition S_STARTING -> S_PENDING
| input=I_PENDING cause=C_FSA_INTERNAL origin=do_started
63160:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962]
(pcmk__set_flags_as)   debug: FSA action flags 0x0020
(A_INTEGRATE_TIMER_STOP) for controller set by do_state_transition:559
63161:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962]
(pcmk__set_flags_as)   debug: FSA action flags 0x0080
(A_FINALIZE_TIMER_STOP) for controller set by do_state_transition:565
63162:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962]
(pcmk__clear_flags_as) debug: FSA action flags 0x0020 (an_action)
for controller cleared by do_fsa_action:108
63163:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962]
(pcmk__clear_flags_as) debug: FSA action flags 0x0080 (an_action)
for controller cleared by do_fsa_action:108
63164:Jul 17 14:16:25.132 FILE-2 pacemaker-controld  [15962]
(pcmk__clear_flags_as) debug: FSA action flags 0x0010 (an_action)
for controller cleared by do_fsa_action:108
63165:Jul 17 14:16:26.132 FILE-2 pacemaker-controld  [15962]
(do_cl_join_query) debug: Querying for a DC
63166:Jul 17 14:16:26.132 FILE-2 pacemaker-controld  [15962]
(pcmk__clear_flags_as) debug: FSA action flags 0x0100 (an_action)
for controller cleared by do_fsa_action:108
63167:Jul 17 14:16:26.132 FILE-2 pacemaker-controld  [15962]
(controld_start_timer) debug: Started Election Trigger (inject
I_DC_TIMEOUT if pops after 2ms, source=18)
63168:Jul 17 

Re: [ClusterLabs] Pacemaker fatal shutdown

2023-07-19 Thread Ken Gaillot
On Wed, 2023-07-19 at 23:49 +0530, Priyanka Balotra wrote:
> Hi All, 
> I am using SLES 15 SP4. One of the nodes of the cluster was brought
> down and booted up after some time. The Pacemaker service came up first
> but later it faced a fatal shutdown. Due to that, the crm service is down. 
> 
> The logs from /var/log/pacemaker.pacemaker.log are as follows:
> 
> Jul 17 14:18:20.093 FILE-2 pacemakerd  [15956]
> (pcmk_child_exit)warning: Shutting cluster down because
> pacemaker-controld[15962] had fatal failure

The interesting messages will be before this. The ones with "pacemaker-
controld" will be the most relevant, at least initially.

> Jul 17 14:18:20.093 FILE-2 pacemakerd  [15956]
> (pcmk_shutdown_worker)   notice: Shutting down Pacemaker
> Jul 17 14:18:20.093 FILE-2 pacemakerd  [15956]
> (pcmk_shutdown_worker)   debug: pacemaker-controld confirmed stopped
> Jul 17 14:18:20.093 FILE-2 pacemakerd  [15956] (stop_child)  
>   notice: Stopping pacemaker-schedulerd | sent signal 15 to process
> 15961
> Jul 17 14:18:20.093 FILE-2 pacemaker-schedulerd[15961]
> (crm_signal_dispatch)notice: Caught 'Terminated' signal | 15
> (invoking handler)
> Jul 17 14:18:20.093 FILE-2 pacemaker-schedulerd[15961]
> (qb_ipcs_us_withdraw)info: withdrawing server sockets
> Jul 17 14:18:20.093 FILE-2 pacemaker-schedulerd[15961]
> (qb_ipcs_unref)  debug: qb_ipcs_unref() - destroying
> Jul 17 14:18:20.093 FILE-2 pacemaker-schedulerd[15961]
> (crm_xml_cleanup)info: Cleaning up memory from libxml2
> Jul 17 14:18:20.093 FILE-2 pacemaker-schedulerd[15961] (crm_exit)
>   info: Exiting pacemaker-schedulerd | with status 0
> Jul 17 14:18:20.093 FILE-2 pacemaker-based [15957]
> (qb_ipcs_event_sendv)debug: new_event_notification (/dev/shm/qb-
> 15957-15962-12-RDPw6O/qb): Broken pipe (32)
> Jul 17 14:18:20.093 FILE-2 pacemaker-based [15957]
> (cib_notify_send_one)warning: Could not notify client crmd:
> Broken pipe | id=e29d175e-7e91-4b6a-bffb-fabfdd7a33bf
> Jul 17 14:18:20.093 FILE-2 pacemaker-based [15957]
> (cib_process_request)info: Completed cib_delete operation for
> section //node_state[@uname='FILE-2']/*: OK (rc=0, origin=FILE-
> 6/crmd/74, version=0.24.75)
> Jul 17 14:18:20.093 FILE-2 pacemaker-fenced[15958]
> (xml_patch_version_check)debug: Can apply patch 0.24.75 to
> 0.24.74
> Jul 17 14:18:20.093 FILE-2 pacemakerd  [15956]
> (pcmk_child_exit)info: pacemaker-schedulerd[15961] exited
> with status 0 (OK)
> Jul 17 14:18:20.093 FILE-2 pacemaker-based [15957]
> (cib_process_request)info: Completed cib_modify operation for
> section status: OK (rc=0, origin=FILE-6/crmd/75, version=0.24.75)
> Jul 17 14:18:20.093 FILE-2 pacemakerd  [15956]
> (pcmk_shutdown_worker)   debug: pacemaker-schedulerd confirmed
> stopped
> Jul 17 14:18:20.093 FILE-2 pacemakerd  [15956] (stop_child)  
>   notice: Stopping pacemaker-attrd | sent signal 15 to process 15960
> Jul 17 14:18:20.093 FILE-2 pacemaker-attrd [15960]
> (crm_signal_dispatch)notice: Caught 'Terminated' signal | 15
> (invoking handler)
> 
> Could you please help me understand the issue here.
> 
> Regards
> Priyanka
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker logs written on message which is not expected as per configuration

2023-06-28 Thread S Sathish S via Users
Thanks Klaus and Ken for your quick support.

Regards,
S Sathish S
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker logs written on message which is not expected as per configuration

2023-06-26 Thread Ken Gaillot
On Mon, 2023-06-26 at 08:46 +0200, Klaus Wenninger wrote:
> 
> 
> On Fri, Jun 23, 2023 at 3:57 PM S Sathish S via Users <
> users@clusterlabs.org> wrote:
> > Hi Team,
> >  
> > The pacemaker logs are written to both '/var/log/messages' and
> > '/var/log/pacemaker/pacemaker.log'.
> > Could you please help us stop pacemaker processes from writing to
> > /var/log/messages? Even in the corosync configuration we have set
> > to_syslog: no.
> > Attached the corosync.conf file.
> >  
> > Pacemaker 2.1.6
> >  
> > [root@node1 username]# tail -f /var/log/messages
> > Jun 23 13:45:38 node1 ESAFMA_RA(ESAFMA_node1)[3593054]: INFO:
> >  component is running with 10502  number
> > Jun 23 13:45:38 node1
> > HealthMonitor_RA(HEALTHMONITOR_node1)[3593055]: INFO: Health
> > Monitor component is running with 3046  number
> > Jun 23 13:45:38 node1 ESAPMA_RA(ESAPMA_OCC)[3593056]: INFO: 
> > component is running with 10902  number
> > Jun 23 13:45:38 node1 HP_AMSD_RA(HP_AMSD_node1)[3593057]: INFO:
> >  component is running with 2540  number
> > Jun 23 13:45:38 node1 HP_SMAD_RA(HP_SMAD_node1)[3593050]: INFO:
> >  component is running with 2536  number
> > Jun 23 13:45:38 node1 SSMAGENT_RA(SSMAGENT_node1)[3593068]: INFO:
> >  component is running with 2771  number
> > Jun 23 13:45:38 node1 HazelCast_RA(HAZELCAST_node1)[3593059]: INFO:
> >  component is running with 13355 number
> > Jun 23 13:45:38 node1 HP_SMADREV_RA(HP_SMADREV_node1)[3593062]:
> > INFO:  component is running with 2735  number
> > Jun 23 13:45:38 node1 ESAMA_RA(ESAMA_node1)[3593065]: INFO: 
> > component is running with 9572  number
> > Jun 23 13:45:38 node1 MANAGER_RA(MANAGER_OCC)[3593071]: INFO:
> >  component is running with 10069 number
> > 
> 
> What did you configure in /etc/sysconfig/pacemaker?
>   PCMK_logfacility=none
> should disable all syslogging. 

It's worth mentioning that the syslog gets only the most serious
messages. By default these are critical, error, warning, and notice
level, but you can change that by setting PCMK_logpriority. For example
with PCMK_logpriority=error you will get only critical and error
messages in syslog.
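
In /etc/sysconfig/pacemaker that would look like, for example:

  # disable syslog output entirely:
  PCMK_logfacility=none
  # ...or keep syslog but limit it to error and critical messages:
  PCMK_logpriority=error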

> 
> Klaus
> >  
> >  
> > cat /etc/corosync/corosync.conf
> > totem {
> > version: 2
> > cluster_name: OCC
> > transport: knet
> > crypto_cipher: aes256
> > crypto_hash: sha256
> > cluster_uuid: 20572748740a4ac2a7bcc3a3bb6889e9
> > }
> >  
> > nodelist {
> > node {
> > ring0_addr: node1
> > name: node1
> > nodeid: 1
> > }
> > }
> >  
> > quorum {
> > provider: corosync_votequorum
> > }
> >  
> > logging {
> > to_logfile: yes
> > logfile: /var/log/cluster/corosync.log
> > to_syslog: no
> > timestamp: on
> > }
> >  
> > Thanks and Regards,
> > S Sathish S
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker logs written on message which is not expected as per configuration

2023-06-25 Thread Klaus Wenninger
On Fri, Jun 23, 2023 at 3:57 PM S Sathish S via Users 
wrote:

> Hi Team,
>
>
>
> The pacemaker logs are written to both '/var/log/messages' and
> '/var/log/pacemaker/pacemaker.log'.
>
> Could you please help us stop pacemaker processes from writing to
> /var/log/messages? Even in the corosync configuration we have set
> to_syslog: no.
>
> Attached the corosync.conf file.
>
>
>
> Pacemaker 2.1.6
>
>
>
> [root@node1 username]# tail -f /var/log/messages
>
> Jun 23 13:45:38 node1 ESAFMA_RA(ESAFMA_node1)[3593054]: INFO: 
> component is running with 10502  number
>
> Jun 23 13:45:38 node1 HealthMonitor_RA(HEALTHMONITOR_node1)[3593055]:
> INFO: Health Monitor component is running with 3046  number
>
> Jun 23 13:45:38 node1 ESAPMA_RA(ESAPMA_OCC)[3593056]: INFO: 
> component is running with 10902  number
>
> Jun 23 13:45:38 node1 HP_AMSD_RA(HP_AMSD_node1)[3593057]: INFO: 
> component is running with 2540  number
>
> Jun 23 13:45:38 node1 HP_SMAD_RA(HP_SMAD_node1)[3593050]: INFO: 
> component is running with 2536  number
>
> Jun 23 13:45:38 node1 SSMAGENT_RA(SSMAGENT_node1)[3593068]: INFO: 
> component is running with 2771  number
>
> Jun 23 13:45:38 node1 HazelCast_RA(HAZELCAST_node1)[3593059]: INFO:
>  component is running with 13355 number
>
> Jun 23 13:45:38 node1 HP_SMADREV_RA(HP_SMADREV_node1)[3593062]: INFO:
>  component is running with 2735  number
>
> Jun 23 13:45:38 node1 ESAMA_RA(ESAMA_node1)[3593065]: INFO: 
> component is running with 9572  number
>
> Jun 23 13:45:38 node1 MANAGER_RA(MANAGER_OCC)[3593071]: INFO: 
> component is running with 10069 number
>

What did you configure in /etc/sysconfig/pacemaker?
  PCMK_logfacility=none
should disable all syslogging.

Klaus

>
>
>
>
> cat /etc/corosync/corosync.conf
>
> totem {
>
> version: 2
>
> cluster_name: OCC
>
> transport: knet
>
> crypto_cipher: aes256
>
> crypto_hash: sha256
>
> cluster_uuid: 20572748740a4ac2a7bcc3a3bb6889e9
>
> }
>
>
>
> nodelist {
>
> node {
>
> ring0_addr: node1
>
> name: node1
>
> nodeid: 1
>
> }
>
> }
>
>
>
> quorum {
>
> provider: corosync_votequorum
>
> }
>
>
>
> logging {
>
> to_logfile: yes
>
> logfile: /var/log/cluster/corosync.log
>
> to_syslog: no
>
> timestamp: on
>
> }
>
>
>
> Thanks and Regards,
> S Sathish S
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker-fenced /dev/shm errors

2023-03-28 Thread Ken Gaillot
On Tue, 2023-03-28 at 13:11 +0800, d tbsky wrote:
> Ken Gaillot 
> > I'm glad it's resolved, but for future reference, that does
> > indicate a
> > serious problem. It means the fencer is not accepting any requests,
> > so
> > any fencing attempts or even attempts to monitor a fencing device
> > from
> > that node will fail.
> > 
> 
>That sounds like pacemaker-fenced became some kind of zombie.
> For testing, I blocked the connection between the node and the IPMI
> fencing device. The fencing resource stopped and reported an error
> like below:
> 
> Failed Resource Actions:
>   * fence_ipmi start on c1.example.tw could not be executed (Timed
> Out) because 'Fence agent did not complete in time' at Tue Mar 28
> 12:49:58 2023 after 20.004s
> 
> and it recovered when the connection recovered.
> Does it mean fencing is still working?
> I want to make sure: if I see a message like "pacemaker-fenced[2405]
> is unresponsive to ipc after 1 tries", does that mean a permanent
> failure, or did the second try succeed so it no longer complains?
> 

If successful client connections are shown later in the log, it's
recovered and should not be a problem. Of course if fencing failed or
timed out, the cluster will want to keep trying before recovering
resources.
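
If in doubt, the fencer can be exercised manually, e.g. (node name
hypothetical; the reboot test really reboots its target, so use a
disposable node):

  # confirm the fencer answers requests and devices are registered
  stonith_admin --list-registered
  # optional live test
  stonith_admin --reboot c2.example.tw
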
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker-fenced /dev/shm errors

2023-03-27 Thread d tbsky
Ken Gaillot 
> I'm glad it's resolved, but for future reference, that does indicate a
> serious problem. It means the fencer is not accepting any requests, so
> any fencing attempts or even attempts to monitor a fencing device from
> that node will fail.
>

   That sounds like pacemaker-fenced became some kind of zombie.
For testing, I blocked the connection between the node and the IPMI
fencing device. The fencing resource stopped and reported an error
like below:

Failed Resource Actions:
  * fence_ipmi start on c1.example.tw could not be executed (Timed
Out) because 'Fence agent did not complete in time' at Tue Mar 28
12:49:58 2023 after 20.004s

and it recovered when the connection recovered.
Does it mean fencing is still working?
I want to make sure: if I see a message like "pacemaker-fenced[2405]
is unresponsive to ipc after 1 tries", does that mean a permanent
failure, or did the second try succeed so it no longer complains?
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker-fenced /dev/shm errors

2023-03-27 Thread Ken Gaillot
On Mon, 2023-03-27 at 14:48 +0800, d tbsky wrote:
> Hi:
>the cluster is running under RHEL 9.0 elements. Today I saw the log
> report strange errors like below:
> 
> Mar 27 13:07:06.287 example.com pacemaker-fenced[2405]
> (qb_sys_mmap_file_open) error: couldn't allocate file
> /dev/shm/qb-2405-2403-12-A9UUaJ/qb-request-stonith-ng-data:
> Interrupted system call (4)
> Mar 27 13:07:06.288 example.com pacemaker-fenced[2405]
> (qb_rb_open_2)  error: couldn't create file for mmap
> Mar 27 13:07:06.288 example.com pacemaker-fenced[2405]
> (qb_ipcs_shm_rb_open)   error:
> qb_rb_open:/dev/shm/qb-2405-2403-12-A9UUaJ/qb-request-stonith-ng:
> Interrupted system call (4)
> Mar 27 13:07:06.288 example.com pacemaker-fenced[2405]
> (qb_ipcs_shm_connect)   error: shm connection FAILED: Interrupted
> system call (4)
> Mar 27 13:07:06.288 example.com pacemaker-fenced[2405]
> (handle_new_connection) error: Error in connection setup
> (/dev/shm/qb-2405-2403-12-A9UUaJ/qb): Interrupted system call (4)
> Mar 27 13:07:06.288 example.com pacemakerd  [2403]
> (pcmk__ipc_is_authentic_process_active) info: Could not connect
> to
> stonith-ng IPC: Interrupted system call
> Mar 27 13:07:06.288 example.com pacemakerd  [2403]
> (check_active_before_startup_processes) notice:
> pacemaker-fenced[2405] is unresponsive to ipc after 1 tries
> 
> there are no more "pacemaker-fenced" keywords in the log. the cluster
> seems fine and the process id "2405" of pacemaker-fenced is still
> running. may I assume the cluster is ok and I don't need to do
> anything since pacemaker didn't complain further?

I'm glad it's resolved, but for future reference, that does indicate a
serious problem. It means the fencer is not accepting any requests, so
any fencing attempts or even attempts to monitor a fencing device from
that node will fail.

If sbd is in use, it will kick in and reboot the node. However without
sbd, there is no automated mechanism to deal with the issue.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker-fenced /dev/shm errors

2023-03-27 Thread d tbsky
Christine caulfield 
> It sounds like you're running an old version of libqb, upgrading to
> libqb 2.0.6 (in RHEL 9.1) should fix those messages

Thanks a lot for the quick response! I will arrange the upgrade.

Regards,
tbskyd
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker-fenced /dev/shm errors

2023-03-27 Thread Christine caulfield

On 27/03/2023 07:48, d tbsky wrote:

Hi:
the cluster is running under RHEL 9.0 elements. Today I saw the log
report strange errors like below:

Mar 27 13:07:06.287 example.com pacemaker-fenced[2405]
(qb_sys_mmap_file_open) error: couldn't allocate file
/dev/shm/qb-2405-2403-12-A9UUaJ/qb-request-stonith-ng-data:
Interrupted system call (4)
Mar 27 13:07:06.288 example.com pacemaker-fenced[2405]
(qb_rb_open_2)  error: couldn't create file for mmap
Mar 27 13:07:06.288 example.com pacemaker-fenced[2405]
(qb_ipcs_shm_rb_open)   error:
qb_rb_open:/dev/shm/qb-2405-2403-12-A9UUaJ/qb-request-stonith-ng:
Interrupted system call (4)
Mar 27 13:07:06.288 example.com pacemaker-fenced[2405]
(qb_ipcs_shm_connect)   error: shm connection FAILED: Interrupted
system call (4)
Mar 27 13:07:06.288 example.com pacemaker-fenced[2405]
(handle_new_connection) error: Error in connection setup
(/dev/shm/qb-2405-2403-12-A9UUaJ/qb): Interrupted system call (4)
Mar 27 13:07:06.288 example.com pacemakerd  [2403]
(pcmk__ipc_is_authentic_process_active) info: Could not connect to
stonith-ng IPC: Interrupted system call
Mar 27 13:07:06.288 example.com pacemakerd  [2403]
(check_active_before_startup_processes) notice:
pacemaker-fenced[2405] is unresponsive to ipc after 1 tries

there are no more "pacemaker-fenced" keywords in the log. the cluster
seems fine and the process id "2405" of pacemaker-fenced is still
running. may I assume the cluster is ok and I don't need to do
anything since pacemaker didn't complain further?



It sounds like you're running an old version of libqb, upgrading to 
libqb 2.0.6 (in RHEL 9.1) should fix those messages


Chrissie

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] [PaceMaker] Help troubleshooting frequent disjoin issue

2023-03-21 Thread Jehan-Guillaume de Rorthais via Users
On Tue, 21 Mar 2023 11:47:23 +0100
Jérôme BECOT  wrote:

> On 21/03/2023 at 11:00, Jehan-Guillaume de Rorthais wrote:
> > Hi,
> >
> > On Tue, 21 Mar 2023 09:33:04 +0100
> > Jérôme BECOT  wrote:
> >  
> >> We have several clusters running for different zabbix components. Some
> >> of these clusters consist of 2 zabbix proxies,where nodes run Mysql,
> >> Zabbix-proxy server and a VIP, and a corosync-qdevice.  
> > I'm not sure I understand your topology. The corosync-qdevice is not
> > supposed to be on a cluster node. It is supposed to be on a remote node and
> > provide some quorum features to one or more clusters without setting up the
> > whole pacemaker/corosync stack.  
> I was not clear, the qdevice is deployed on a remote node, as intended.

ok

> >> The MySQL servers are always up to replicate, and are configured in
> >> Master/Master (they both replicate from the other but only one is supposed
> >> to be updated by the proxy running on the master node).  
> > Why do you bother with Master/Master when a simple (I suppose, I'm not a
> > MySQL cluster guy) Primary-Secondary topology or even a shared storage
> > would be enough and would keep your logic (writes on one node only) safe
> > from incidents, failures, errors, etc?
> >
> > HA must be as simple as possible. Remove useless parts when you can.  
> A shared storage moves the complexity somewhere else.

Yes, on storage/SAN side.

> A classic Primary/Secondary setup can be an option if Pacemaker manages to
> start the client on the secondary node,

I suppose this can be done using a location constraint.
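
A sketch with pcs, assuming a resource named zabbix-proxy and your node
names:

  # prefer node 1; the resource moves to node 2 only if node 1 fails
  pcs constraint location zabbix-proxy prefers zabbix-proxy-01=100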

> but it would become Master/Master during the split brain.

No, and if you do have real split brain, then you might have something wrong in
your setup. See below.


> >> One cluster is prompt to frequent sync errors, with duplicate entries
> >> errors in SQL. When I look at the logs, I can see "Mar 21 09:11:41
> >> zabbix-proxy-01 pacemaker-controld  [948] (pcmk_cpg_membership)
> >> info: Group crmd event 89: zabbix-proxy-02 (node 2 pid 967) left via
> >> cluster exit", and within the next second, a rejoin. The same messages
> >> are in the other node logs, suggesting a split brain, which should not
> >> happen, because there is a quorum device.  
> > Would it be possible your SQL sync errors and the left/join issues are
> > correlated and are both symptoms of another failure? Look at your log for
> > some explanation about why the node decided to leave the cluster.  
> 
> My guess is that high network latency causes the disjoin, and starting
> Zabbix-proxy on both nodes then causes the replication error. It is
> configured to use the VIP, which is up locally because there is a
> split brain.

If you have a split brain, that means your quorum setup is failing. 

No node could start/promote a resource without having the quorum. If a node is
isolated from the cluster and quorum-device, it should stop its resources, not
recover/promote them.

If both nodes lose connection with each other but are still connected to the
quorum device, the latter should be able to grant the quorum to one side only.
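
For two-node clusters, the ffsplit algorithm of corosync-qdevice does
exactly this tie-breaking. A minimal sketch of the corosync.conf quorum
section (qnetd host name hypothetical):

  quorum {
      provider: corosync_votequorum
      device {
          model: net
          votes: 1
          net {
              host: qnetd.example.com
              algorithm: ffsplit
          }
      }
  }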

Lastly, quorum is a split brain protection when "things are going fine".
Fencing is a split brain protection for all other situations. Fencing is hard
and painful, but it saves you from many split brain situations.

> This is why I'm requesting guidance to check/monitor these nodes to find 
> out if it is temporary network latency that is causing the disjoin.

A cluster is always very sensitive to network latency/failures. You need to
build on stronger foundations.

Regards,
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] [PaceMaker] Help troubleshooting frequent disjoin issue

2023-03-21 Thread Jehan-Guillaume de Rorthais via Users
Hi,

On Tue, 21 Mar 2023 09:33:04 +0100
Jérôme BECOT  wrote:

> We have several clusters running for different zabbix components. Some 
> of these clusters consist of 2 zabbix proxies,where nodes run Mysql, 
> Zabbix-proxy server and a VIP, and a corosync-qdevice. 

I'm not sure I understand your topology. The corosync-qdevice is not supposed
to be on a cluster node. It is supposed to be on a remote node and provide some
quorum features to one or more clusters without setting up the whole
pacemaker/corosync stack.

> The MySQL servers are always up to replicate, and are configured in
> Master/Master (they both replicate from the other but only one is supposed to
> be updated by the proxy running on the master node).

Why do you bother with Master/Master when a simple (I suppose, I'm not a MySQL
cluster guy) Primary-Secondary topology or even a shared storage would be
enough and would keep your logic (writes on one node only) safe from incidents,
failures, errors, etc?

HA must be as simple as possible. Remove useless parts when you can.

> One cluster is prompt to frequent sync errors, with duplicate entries 
> errors in SQL. When I look at the logs, I can see "Mar 21 09:11:41 
> zabbix-proxy-01 pacemaker-controld  [948] (pcmk_cpg_membership)     
> info: Group crmd event 89: zabbix-proxy-02 (node 2 pid 967) left via 
> cluster exit", and within the next second, a rejoin. The same messages 
> are in the other node logs, suggesting a split brain, which should not 
> happen, because there is a quorum device.

Would it be possible your SQL sync errors and the left/join issues are
correlated and are both symptoms of another failure? Look at your log for some
explanation about why the node decided to leave the cluster.

> Can you help me to troubleshoot this ? I can provide any 
> log/configuration required in the process, so let me know.
> 
> I'd also like to ask if there is a bit of configuration that can be done 
> to postpone service start on the other node for two or three seconds as 
> a quick workaround ?

How would it be a workaround?

Regards,
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker-remoted /dev/shm errors

2023-03-06 Thread Ken Gaillot
On Mon, 2023-03-06 at 16:03 +0300, Alexander Epaneshnikov via Users
wrote:
> Hello. we are using pacemaker 2.1.4-5.el8  and seeing strange errors
> in the
> logs when a request is made to the cluster.
> 
> Feb 17 08:18:15 gm-srv-oshv-001.int.cld pacemaker-remoted   [2984]
> (handle_new_connection)  error: Error in connection setup
> (/dev/shm/qb-2984-1077673-18-7xR8Y0/qb): Remote I/O error (121)
> Feb 17 08:19:15 gm-srv-oshv-001.int.cld pacemaker-remoted   [2984]
> (handle_new_connection)  error: Error in connection setup
> (/dev/shm/qb-2984-1077927-18-dX5NSt/qb): Remote I/O error (121)
> Feb 17 08:20:16 gm-srv-oshv-001.int.cld pacemaker-remoted   [2984]
> (handle_new_connection)  error: Error in connection setup
> (/dev/shm/qb-2984-1078160-18-RjzD4K/qb): Remote I/O error (121)
> Feb 17 08:21:16 gm-srv-oshv-001.int.cld pacemaker-remoted   [2984]
> (handle_new_connection)  error: Error in connection setup
> (/dev/shm/qb-2984-1078400-18-YyJmJJ/qb): Remote I/O error (121)

The error code is likely coming from one of pacemaker-remoted's
qb_ipcs_service_handlers.

This is the correct behavior when a local client (typically a Pacemaker
command-line tool) attempts to contact the cluster before the cluster
has established a connection to the remote node.

I've also seen it very rarely in lab testing just before a new IPC
client is successfully accepted by pacemaker-remoted, and it doesn't
seem to have any ill effect, but I'm not sure why it shows up then.

I also occasionally see "Error in connection setup" on full cluster
nodes with "Operation not permitted" instead of "Remote I/O error". In
that case it's generally the correct behavior when a local client
attempts to connect while the cluster is shutting down on the node.

Pacemaker generally logs info- or warning-level messages for these, so
I'd rather the libqb message be at debug level, but I'm not sure
whether that would be a good idea for all possible errors.

> 
> other than that pacemaker/corosync works fine.
> 
> any suggestions on the cause of the error, or at least where to start
> debugging, are welcome.
> 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker-remoted /dev/shm errors

2023-03-06 Thread Klaus Wenninger
On Mon, Mar 6, 2023 at 3:32 PM Christine caulfield 
wrote:

> Hi,
>
> The error is coming from libqb - which is what manages the local IPC
> connections between local clients and the server.
>
> I'm the libqb maintainer but I've never seen that error before! Is there
> anything unusual about the setup on this node? Like filesystems on NFS
> or some other networked filesystem?
>
> Other basic things to check are that /dev/shm is not full. Yes, normally
> you'd get ENOSPC in that case but it's always worth checking because odd
> things can happen when filesystems get full.
>
> It might be helpful to strace the client and server processes when the
> error occurs (if that's possible). I'm not 100% sure which operation is
> failing with EREMOTEIO - though I can't find many useful references to
> that error in the kernel which is also slightly weird.
>

EREMOTEIO is being used for the obvious purpose in pacemaker.

Klaus


>
> Chrissie
>
> On 06/03/2023 13:03, Alexander Epaneshnikov via Users wrote:
> > Hello. we are using pacemaker 2.1.4-5.el8  and seeing strange errors in
> the
> > logs when a request is made to the cluster.
> >
> > Feb 17 08:18:15 gm-srv-oshv-001.int.cld pacemaker-remoted   [2984]
> (handle_new_connection)  error: Error in connection setup
> (/dev/shm/qb-2984-1077673-18-7xR8Y0/qb): Remote I/O error (121)
> > Feb 17 08:19:15 gm-srv-oshv-001.int.cld pacemaker-remoted   [2984]
> (handle_new_connection)  error: Error in connection setup
> (/dev/shm/qb-2984-1077927-18-dX5NSt/qb): Remote I/O error (121)
> > Feb 17 08:20:16 gm-srv-oshv-001.int.cld pacemaker-remoted   [2984]
> (handle_new_connection)  error: Error in connection setup
> (/dev/shm/qb-2984-1078160-18-RjzD4K/qb): Remote I/O error (121)
> > Feb 17 08:21:16 gm-srv-oshv-001.int.cld pacemaker-remoted   [2984]
> (handle_new_connection)  error: Error in connection setup
> (/dev/shm/qb-2984-1078400-18-YyJmJJ/qb): Remote I/O error (121)
> >
> > other than that pacemaker/corosync works fine.
> >
> > any suggestions on the cause of the error, or at least where to start
> debugging, are welcome.
> >
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker-remoted /dev/shm errors

2023-03-06 Thread Christine caulfield

Hi,

The error is coming from libqb - which is what manages the local IPC 
connections between local clients and the server.


I'm the libqb maintainer but I've never seen that error before! Is there 
anything unusual about the setup on this node? Like filesystems on NFS 
or some other networked filesystem?


Other basic things to check are that /dev/shm is not full. Yes, normally 
you'd get ENOSPC in that case but it's always worth checking because odd 
things can happen when filesystems get full.
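
For example:

  # check space and inode usage on the shared-memory filesystem
  df -h /dev/shm
  df -i /dev/shm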


It might be helpful to strace the client and server processes when the 
error occurs (if that's possible). I'm not 100% sure which operation is 
failing with EREMOTEIO - though I can't find many useful references to 
that error in the kernel which is also slightly weird.


Chrissie

On 06/03/2023 13:03, Alexander Epaneshnikov via Users wrote:

Hello. we are using pacemaker 2.1.4-5.el8  and seeing strange errors in the
logs when a request is made to the cluster.

Feb 17 08:18:15 gm-srv-oshv-001.int.cld pacemaker-remoted   [2984] 
(handle_new_connection)  error: Error in connection setup 
(/dev/shm/qb-2984-1077673-18-7xR8Y0/qb): Remote I/O error (121)
Feb 17 08:19:15 gm-srv-oshv-001.int.cld pacemaker-remoted   [2984] 
(handle_new_connection)  error: Error in connection setup 
(/dev/shm/qb-2984-1077927-18-dX5NSt/qb): Remote I/O error (121)
Feb 17 08:20:16 gm-srv-oshv-001.int.cld pacemaker-remoted   [2984] 
(handle_new_connection)  error: Error in connection setup 
(/dev/shm/qb-2984-1078160-18-RjzD4K/qb): Remote I/O error (121)
Feb 17 08:21:16 gm-srv-oshv-001.int.cld pacemaker-remoted   [2984] 
(handle_new_connection)  error: Error in connection setup 
(/dev/shm/qb-2984-1078400-18-YyJmJJ/qb): Remote I/O error (121)

other than that pacemaker/corosync works fine.

any suggestions on the cause of the error, or at least where to start 
debugging, are welcome.



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker user question

2023-01-11 Thread Ken Gaillot
Pacemaker only uses the user and group names directly. It will get the
IDs from the local system, so you can use any IDs you want.

The names themselves are configurable at compile-time. If you're using
pre-built packages, you're stuck with whatever the packager chose (most
likely the defaults, hacluster for the user and haclient for the
group). If you're building your own, you can specify them when running
./configure using the --with-daemon-user and --with-daemon-group
options.
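
A sketch of such a from-source build (names illustrative):

  ./configure --with-daemon-user=hacluster --with-daemon-group=haclient
  make && sudo make install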

On Tue, 2023-01-10 at 13:51 +, Jelen, Piotr wrote:
> Hi,
> 
> I would like to ask you if the hacluster user and haclient group are
> hardcoded into pacemaker, or whether we can use a uid/gid other than
> the standard 189/189?
> 
> I was able to create a pacemaker cluster with a different uid/gid in
> my home cluster lab by running
> 
> esg4bel37# groupadd -r haclient -g 2655
> esg4bel37# useradd -r -g haclient -u 2655 -s /sbin/nologin -c "cluster
> user" hacluster
> esg4bel39# groupadd -r haclient -g 2655
> esg4bel39# useradd -r -g haclient -u 2655 -s /sbin/nologin -c "cluster
> user" hacluster
> before installing the pacemaker cluster.
> 
> Can you please tell me if this type of installation might cause any
> issue?
> 
>  
> Regards
> Piotr Jelen
> Senior Systems Platform Engineer
>  
> Mastercard
> Mountain View, Central Park  | Leopard
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker question

2022-10-05 Thread Tomas Jelinek

Hi,

If you are using pcs to set up your cluster, then the answer is no. I'm 
not sure about crm shell / hawk. Once you have a cluster, you can use 
users other than hacluster, as Ken pointed out.


Regards,
Tomas


On 04. 10. 22 at 16:06, Ken Gaillot wrote:

Yes, see ACLs:

https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/singlehtml/index.html#document-acls

On Mon, 2022-10-03 at 15:51 +, Jelen, Piotr wrote:

Dear Clusterlabs team ,

I would like to ask you if there is some possibility to use a
different user (e.g. cephhauser) to authenticate/set up the cluster,
or if there is another method to authenticate/set up the cluster that
does not use the password of a dedicated pacemaker user such as
hacluster?


Best Regards
Piotr Jelen
Senior Systems Platform Engineer
  
Mastercard

Mountain View, Central Park  | Leopard
  

  
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker question

2022-10-04 Thread Ken Gaillot
Yes, see ACLs:

https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/singlehtml/index.html#document-acls
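
As a rough sketch (the role and user names here are made up, and the user
still needs to be in the haclient group):

  # pcs acl role create read-only description="Read access" read xpath /cib
  # pcs acl user create cephhauser read-only
  # pcs acl enable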

On Mon, 2022-10-03 at 15:51 +, Jelen, Piotr wrote:
> Dear ClusterLabs team,
> 
> I would like to ask you if there is some possibility to use a
> different user (e.g. cephhauser) to authenticate/set up the cluster, or
> is there another method to authenticate/set up the cluster without using
> the password of a dedicated pacemaker user such as hacluster?
> 
> 
> Best Regards
> Piotr Jelen
> Senior Systems Platform Engineer
>  
> Mastercard
> Mountain View, Central Park  | Leopard
>  
> 
>  
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker-fenced[11637]: warning: Can't create a sane reply

2022-06-22 Thread Ken Gaillot
On Wed, 2022-06-22 at 17:16 +0530, Priyanka Balotra wrote:
> Hi All, 
> 
> We are seeing an issue where we performed cluster shutdown followed
> by cluster boot operation. All the nodes joined the cluster except one
> (the first node). Here are some pacemaker logs around that
> timestamp: 
> 
> 2022-06-19T07:02:08.690213+00:00 FILE-1 pacemaker-fenced[11637]: 
> notice: Operation 'off' targeting FILE-1 on FILE-2 for 
> pacemaker-controld.11523@FILE-2.0b09e949: OK
> 2022-06-19T07:02:08.690604+00:00 FILE-1 pacemaker-fenced[11637]: 
> error: stonith_construct_reply: Triggered assert at
> fenced_commands.c:2363 : request != NULL
> 2022-06-19T07:02:08.690781+00:00 FILE-1 pacemaker-fenced[11637]: 
> warning: Can't create a sane reply

This assertion and message were dropped in the 2.0.5 release because
they do not really indicate an error. The new message in this situation
is:

  Missing request information for client notifications for operation
with result '...' (initiated before we came up?)

The most likely sequence in this particular case is that FILE-2 came up
and formed a cluster first, then fenced FILE-1 for being unseen. FILE-1 then 
finished coming up, joined the cluster, and got the notification
of its own fencing (which by the way means that either fencing is
misconfigured or perhaps you are using fabric fencing).

> 2022-06-19T07:02:08.691872+00:00 FILE-1 pacemaker-controld[11643]: 
> crit: We were allegedly just fenced by FILE-2 for FILE-2!
> 2022-06-19T07:02:08.693994+00:00 FILE-1 pacemakerd[11622]:  warning:
> Shutting cluster down because pacemaker-controld[11643] had fatal
> failure
> 2022-06-19T07:02:08.694209+00:00 FILE-1 pacemakerd[11622]:  notice:
> Shutting down Pacemaker
> 2022-06-19T07:02:08.694381+00:00 FILE-1 pacemakerd[11622]:  notice:
> Stopping pacemaker-schedulerd
> 
> 
> Let us know if you need any more logs to find an RCA for this. 
> 
> Thanks
> Priyanka
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker-fenced[11637]: warning: Can't create a sane reply

2022-06-22 Thread Priyanka Balotra
Hi Klaus,
The config is as follows:
There are 2 nodes in the setup and some resources configured (stonith, IP,
systemd service related).
Sorry, I can share only high-level details for this.

- pacemaker version

# rpm -qa pacemaker
pacemaker-2.0.3+20200511.2b248d828-1.10.x86_64

# rpm -qa corosync
corosync-2.4.5-10.14.6.1.x86_64

# rpm -qa crmsh
crmsh-4.2.0+git.1585096577.f3257c89-3.4.noarch


On Wed, Jun 22, 2022 at 5:45 PM Klaus Wenninger  wrote:

> On Wed, Jun 22, 2022 at 1:46 PM Priyanka Balotra
>  wrote:
> >
> > Hi All,
> >
> > We are seeing an issue where we performed cluster shutdown followed by
> cluster boot operation. All the nodes joined the cluster except one (the
> first node). Here are some pacemaker logs around that timestamp:
> >
> > 2022-06-19T07:02:08.690213+00:00 FILE-1 pacemaker-fenced[11637]:
> notice: Operation 'off' targeting FILE-1 on FILE-2 for
> pacemaker-controld.11523@FILE-2.0b09e949: OK
> >
> > 2022-06-19T07:02:08.690604+00:00 FILE-1 pacemaker-fenced[11637]:  error:
> stonith_construct_reply: Triggered assert at fenced_commands.c:2363 :
> request != NULL
> >
> > 2022-06-19T07:02:08.690781+00:00 FILE-1 pacemaker-fenced[11637]:
> warning: Can't create a sane reply
> >
> > 2022-06-19T07:02:08.691872+00:00 FILE-1 pacemaker-controld[11643]:
> crit: We were allegedly just fenced by FILE-2 for FILE-2!
> >
> > 2022-06-19T07:02:08.693994+00:00 FILE-1 pacemakerd[11622]:  warning:
> Shutting cluster down because pacemaker-controld[11643] had fatal failure
> >
> > 2022-06-19T07:02:08.694209+00:00 FILE-1 pacemakerd[11622]:  notice:
> Shutting down Pacemaker
> >
> > 2022-06-19T07:02:08.694381+00:00 FILE-1 pacemakerd[11622]:  notice:
> Stopping pacemaker-schedulerd
> >
> >
> >
> > Let us know if you need any more logs to find an RCA for this.
>
> A little bit more info about your configuration and the pacemaker-version
> (cib?)
> used would definitely be helpful.
>
> Klaus
> >
> > Thanks
> > Priyanka
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker-fenced[11637]: warning: Can't create a sane reply

2022-06-22 Thread Klaus Wenninger
On Wed, Jun 22, 2022 at 1:46 PM Priyanka Balotra
 wrote:
>
> Hi All,
>
> We are seeing an issue where we performed cluster shutdown followed by 
> cluster boot operation. All the nodes joined the cluster except one (the first 
> node). Here are some pacemaker logs around that timestamp:
>
> 2022-06-19T07:02:08.690213+00:00 FILE-1 pacemaker-fenced[11637]:  notice: 
> Operation 'off' targeting FILE-1 on FILE-2 for 
> pacemaker-controld.11523@FILE-2.0b09e949: OK
>
> 2022-06-19T07:02:08.690604+00:00 FILE-1 pacemaker-fenced[11637]:  error: 
> stonith_construct_reply: Triggered assert at fenced_commands.c:2363 : request 
> != NULL
>
> 2022-06-19T07:02:08.690781+00:00 FILE-1 pacemaker-fenced[11637]:  warning: 
> Can't create a sane reply
>
> 2022-06-19T07:02:08.691872+00:00 FILE-1 pacemaker-controld[11643]:  crit: We 
> were allegedly just fenced by FILE-2 for FILE-2!
>
> 2022-06-19T07:02:08.693994+00:00 FILE-1 pacemakerd[11622]:  warning: Shutting 
> cluster down because pacemaker-controld[11643] had fatal failure
>
> 2022-06-19T07:02:08.694209+00:00 FILE-1 pacemakerd[11622]:  notice: Shutting 
> down Pacemaker
>
> 2022-06-19T07:02:08.694381+00:00 FILE-1 pacemakerd[11622]:  notice: Stopping 
> pacemaker-schedulerd
>
>
>
> Let us know if you need any more logs to find an RCA for this.

A little bit more info about your configuration and the pacemaker-version (cib?)
used would definitely be helpful.

Klaus
>
> Thanks
> Priyanka
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker / ubuntu doesn't see my sbd device: what am I missing?

2022-04-07 Thread Ken Gaillot
With watchdog-only SBD you don't need a fence agent; it's built-in to
Pacemaker when you set the stonith-watchdog-timeout cluster property.

However watchdog-only SBD isn't sufficient for a 2-node cluster,
because each node will assume the other self-fences but neither will.
You need either a shared disk or true quorum (via a third node or
corosync-qdevice).
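
With pcs the watchdog part is just a cluster property, e.g. (a sketch; the
value is arbitrary, but it should comfortably exceed the watchdog timeout):

  # pcs property set stonith-watchdog-timeout=10s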

On Wed, 2022-04-06 at 19:34 +, Tavanyar, Simon wrote:
> This is my first time using Pacemaker, and I wanted to try watchdog-
> only fencing with SBD.
> I’m running on Ubuntu 21.10 and Pacemaker v2.0.5
> My cluster is up just fine with Dummy services on two nodes.
> Systemd says my sbd device is active and running.
> But the ‘stonith’ command that Pacemaker uses won’t find it, so the
> resource fails to start in the cluster.
>  
> Help much appreciated!
> Thanks
> Simon
>  
>  
>  
> $ sudo stonith -t external/sbd -E -S
> external/sbd[361914]: ERROR: No sbd device(s) found in the
> configuration.
> WARN: external_status: 'sbd status' failed with rc 1
> ERROR: external/sbd device not accessible.
>  
>  
> $ systemctl status sbd
> ● sbd.service - Shared-storage based fencing daemon
> Loaded: loaded (/lib/systemd/system/sbd.service; enabled; vendor
> preset: enabled)
> Active: active (running) since Fri 2022-04-01 15:18:04 EDT; 4 days
> ago
> Docs: man:sbd(8)
> Process: 2474278 ExecStart=/usr/sbin/sbd $SBD_OPTS -p
> /var/run/sbd.pid watch (code=exited, status=0/SUCCESS)
> Main PID: 2474279 (sbd)
> Tasks: 3 (limit: 38258)
> Memory: 11.2M
> CPU: 4min 7.329s
> CGroup: /system.slice/sbd.service
> ├─2474279 sbd: inquisitor
> ├─2474280 sbd: watcher: Pacemaker
> └─2474281 sbd: watcher: Cluster
>  
>  
> $ sudo pcs status
> Cluster name: Axx
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: node0 (version 2.0.5-ba59be7122) - partition with
> quorum
>   * Last updated: Wed Apr  6 14:38:44 2022
>   * Last change:  Wed Apr  6 14:38:35 2022 by root via cibadmin on
> node0
>   * 2 nodes configured
>   * 6 resource instances configured
>  
> Node List:
>   * Online: [ node0 node1 ]
>  
> Full List of Resources:
>   * Resource Group: AxxDummy:
> * p_Dummy_1 (ocf::heartbeat:Dummy):  Started node0
> * p_Dummy_2 (ocf::heartbeat:Dummy):  Started node0
> * p_Dummy_3 (ocf::heartbeat:Dummy):  Started node0
> * ClusterIP (ocf::heartbeat:IPaddr2):Started node0
>   * p_Dummy_4   (ocf::heartbeat:Dummy):  Started node0
>   * fence-sbd   (stonith:external/sbd):  Stopped
>  
> Failed Resource Actions:
>   * fence-sbd_start_0 on node0 'error' (1): call=51,
> status='complete', exitreason='', last-rc-change='2022-04-06 14:38:13
> -04:00', queued=0ms, exec=3102ms
>   * fence-sbd_start_0 on node1 'error' (1): call=41,
> status='complete', exitreason='', last-rc-change='2022-04-06 14:38:09
> -04:00', queued=0ms, exec=3094ms
>  
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>   sbd: active/enabled
>  
>  
> This is from /var/log/syslog
> Apr 6 14:40:43 ubuntuserver pacemaker-controld[349716]: notice:
> Requesting local execution of start operation for fence-sbd on node0
> Apr 6 14:40:43 ubuntuserver external/sbd[349924]: [349930]: ERROR: No
> sbd device(s) found in the configuration.
> Apr 6 14:40:46 ubuntuserver pacemaker-fenced[349712]: notice:
> Operation 'monitor' [349931] for device 'fence-sbd' returned: -61 (No
> data available)
> Apr 6 14:40:46 ubuntuserver pacemaker-fenced[349712]: warning: fence-
> sbd:349931 [ Performing: stonith -t external/sbd -E -S ]
> Apr 6 14:40:46 ubuntuserver pacemaker-fenced[349712]: warning: fence-
> sbd:349931 [ failed: 1 ]
>  
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker API (REST, SOAP, Java library)?

2022-03-08 Thread Ken Gaillot
On Tue, 2022-03-08 at 10:24 +0200, Viet Nguyen wrote:
> Hi,
> 
> May I know is there any API like REST or SOAP or some kind of Java
> libraries to interact with Pacemaker?
> 
> I have a Java application and I would like to use that to monitor
> Pacemaker. Like viewing the resource statuses; and managing resources
> and nodes like move resources or stop a node from that Java app, if
> possible.
> 
> Thank you so much for your help!
> Regards,
> Viet

Hi,

There is no REST/SOAP API, but there is a high-level C API being
developed that will make it much easier to create such things. The C
API will eventually have C function equivalents of all command-line
tool options, producing XML output that is the same as if the command
were run with --output-as=xml.

The equivalents of a few commands are already available, including
crm_simulate and crmadmin, and crm_mon will be added with the next
release.

In the meantime, most people just execute the command-line tools
directly from their code.
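
For instance, a monitoring script might do something like (a sketch):

  $ crm_mon --one-shot --output-as=xml

and parse the resulting XML.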
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker managing Keycloak

2022-01-31 Thread damiano giuliani
Ehy Philip,

sorry for being late, today was a bad day!
To make keycloak reconnect to the postgres db when it fails, you have to
edit your current configuration file (it could be for
example /opt/keycloak/standalone/configuration/standalone-ha.xml or
standalone.xml; double-check it).

Replace the datasources block with:




<datasources>
    <datasource jndi-name="java:jboss/datasources/KeycloakDS" pool-name="KeycloakDS" enabled="true" use-java-context="true">
        <connection-url>jdbc:postgresql://ltaoperdbs01/keycloakdb</connection-url>
        <driver>postgresql</driver>
        <pool>
            <max-pool-size>100</max-pool-size>
        </pool>
        <security>
            <user-name>keycloak</user-name>
            <password>yourpassword</password>
        </security>
        <validation>
            <check-valid-connection-sql>select 1</check-valid-connection-sql>
            <background-validation>true</background-validation>
            <background-validation-millis>15000</background-validation-millis>
            <use-fast-fail>false</use-fast-fail>
        </validation>
    </datasource>
    <drivers>
        <driver name="postgresql" module="org.postgresql">
            <xa-datasource-class>org.postgresql.xa.PGXADataSource</xa-datasource-class>
        </driver>
    </drivers>
</datasources>

replacing the connection URL, user and password with your postgres
database information.

Don't forget to do this on all your keycloak cluster nodes.

You will probably have to restart the services.

Have a look here if you need more info:

https://access.redhat.com/documentation/en-us/red_hat_jboss_enterprise_application_platform/6.4/html/administration_and_configuration_guide/sect-example_datasources

Let us know how things are going after this.

BR

Damiano

Il giorno ven 28 gen 2022 alle ore 23:12 Philip Alesio <
philip.ale...@gmail.com> ha scritto:

> That would be great!
>
> On Fri, Jan 28, 2022 at 2:50 PM damiano giuliani <
> damianogiulian...@gmail.com> wrote:
>
>> Ehy, I solved the issue you're talking about a few months ago. You have to
>> modify the .xml configuration on the keycloak side; if you're not in a hurry,
>> Monday I'll send you how I fixed it.
>>
>> Damiano
>>
>> On Fri, 28 Jan 2022, 20:25 Ken Gaillot,  wrote:
>>
>>> On Fri, 2022-01-28 at 12:15 -0500, Philip Alesio wrote:
>>> > Hi Everyone,
>>> >
>>> > I'm attempting to create a failover cluster that uses Postgresql and
>>> > Keycloak and am having difficulty getting Keycloak running.  Keycloak
>>> > is using a Postgresql database.  In one case I'm using DRBD to
>>> > replicate the data and in another case I'm using Postgresql.  The
>>> > failure, in both cases, is that Keycloak fails to connect to the
>>> > database.  In both cases Pacemaker is running with the Postgresql
>>> > resource when I add the Keycloak resource. If I "docker run"
>>> > Keyclock, not adding it as a Pacemaker resource, Keycloak starts and
>>> > connects to the database.
>>> >
>>> > Below adds Keycloak as a Pacemaker resource:
>>> >
>>> > pcs cluster cib  cluster1.xml
>>> > pcs -
>>> > f cluster1.xml resource create p_keycloak ocf:heartbeat:docker image=
>>> > jboss/keycloak name=keycloak run_opts="-d -e KEYCLOAK_USER=admin -
>>> > e KEYCLOAK_PASSWORD=admin -e DB_ADDR=postgres -e DB_VENDOR=postgres -
>>> > e DB_USER=postgres -e DB_PASSWORD=postgres -
>>> > e DB_DATABASE=keycloak_db -e JDBC_PARAMS=useSSL=false -p 8080:8080 -
>>> > e DB_ADDR=postgres -
>>> > e DB_PORT='5432' –network=cluster1dkrnet" op monitor interval=60s
>>> > pcs -f
>>> > cluster1.xml resource group add g_receiver p_keycloak
>>> > pcs cluster cib-push  cluster1.xml --config
>>> >
>>> > Below creates a Keycloak container that is not managed by Pacemaker:
>>> > > docker run --name keycloak -e KEYCLOAK_USER=admin -
>>> > > e KEYCLOAK_PASSWORD=admin -e DB_ADDR=postgres -
>>> > > e DB_VENDOR=postgres -e DB_USER=postgres -e DB_PASSWORD=postgres -
>>> > > e DB_DATABASE=keycloak_db -e JDBC_PARAMS=useSSL=false -
>>> > > p 8080:8080 -e DB_ADDR=postgres -e DB_PORT='5432'
>>> > > --network=cluster1dkrnet jboss/keycloak
>>> >
>>> >  Does anyone have experience with Pacemaker with Keyclock and/or if
>>> > there are any thoughts about why Keycloak is not connecting to the
>>> > Postgresql database?
>>> >
>>> > Thanks in advance.
>>>
>>> I'd check for SELinux denials first. A command executed from the
>>> command line is unconstrained, while being executed by a daemon is
>>> subject to SELinux policies.
>>>
>>> Other than that, maybe turn on any debugging options and check the
>>> keycloak logs from the container (e.g. using network logging or an
>>> exported host disk).
>>> --
>>> Ken Gaillot 
>>>
>>> ___
>>> Manage your subscription:
>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> ClusterLabs home: https://www.clusterlabs.org/
>>>
>> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

Re: [ClusterLabs] Pacemaker managing Keycloak

2022-01-28 Thread Philip Alesio
That would be great!

On Fri, Jan 28, 2022 at 2:50 PM damiano giuliani <
damianogiulian...@gmail.com> wrote:

> Ehy, I solved the issue you're talking about a few months ago. You have to
> modify the .xml configuration on the keycloak side; if you're not in a hurry,
> Monday I'll send you how I fixed it.
>
> Damiano
>
> On Fri, 28 Jan 2022, 20:25 Ken Gaillot,  wrote:
>
>> On Fri, 2022-01-28 at 12:15 -0500, Philip Alesio wrote:
>> > Hi Everyone,
>> >
>> > I'm attempting to create a failover cluster that uses Postgresql and
>> > Keycloak and am having difficulty getting Keycloak running.  Keycloak
>> > is using a Postgresql database.  In one case I'm using DRBD to
>> > replicate the data and in another case I'm using Postgresql.  The
>> > failure, in both cases, is that Keycloak fails to connect to the
>> > database.  In both cases Pacemaker is running with the Postgresql
>> > resource when I add the Keycloak resource. If I "docker run"
>> > Keyclock, not adding it as a Pacemaker resource, Keycloak starts and
>> > connects to the database.
>> >
>> > Below adds Keycloak as a Pacemaker resource:
>> >
>> > pcs cluster cib  cluster1.xml
>> > pcs -
>> > f cluster1.xml resource create p_keycloak ocf:heartbeat:docker image=
>> > jboss/keycloak name=keycloak run_opts="-d -e KEYCLOAK_USER=admin -
>> > e KEYCLOAK_PASSWORD=admin -e DB_ADDR=postgres -e DB_VENDOR=postgres -
>> > e DB_USER=postgres -e DB_PASSWORD=postgres -
>> > e DB_DATABASE=keycloak_db -e JDBC_PARAMS=useSSL=false -p 8080:8080 -
>> > e DB_ADDR=postgres -
>> > e DB_PORT='5432' –network=cluster1dkrnet" op monitor interval=60s
>> > pcs -f
>> > cluster1.xml resource group add g_receiver p_keycloak
>> > pcs cluster cib-push  cluster1.xml --config
>> >
>> > Below creates a Keycloak container that is not managed by Pacemaker:
>> > > docker run --name keycloak -e KEYCLOAK_USER=admin -
>> > > e KEYCLOAK_PASSWORD=admin -e DB_ADDR=postgres -
>> > > e DB_VENDOR=postgres -e DB_USER=postgres -e DB_PASSWORD=postgres -
>> > > e DB_DATABASE=keycloak_db -e JDBC_PARAMS=useSSL=false -
>> > > p 8080:8080 -e DB_ADDR=postgres -e DB_PORT='5432'
>> > > --network=cluster1dkrnet jboss/keycloak
>> >
>> >  Does anyone have experience with Pacemaker with Keyclock and/or if
>> > there are any thoughts about why Keycloak is not connecting to the
>> > Postgresql database?
>> >
>> > Thanks in advance.
>>
>> I'd check for SELinux denials first. A command executed from the
>> command line is unconstrained, while being executed by a daemon is
>> subject to SELinux policies.
>>
>> Other than that, maybe turn on any debugging options and check the
>> keycloak logs from the container (e.g. using network logging or an
>> exported host disk).
>> --
>> Ken Gaillot 
>>
>> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker managing Keycloak

2022-01-28 Thread damiano giuliani
Ehy, I solved the issue you're talking about a few months ago. You have to
modify the .xml configuration on the keycloak side; if you're not in a hurry,
Monday I'll send you how I fixed it.

Damiano

On Fri, 28 Jan 2022, 20:25 Ken Gaillot,  wrote:

> On Fri, 2022-01-28 at 12:15 -0500, Philip Alesio wrote:
> > Hi Everyone,
> >
> > I'm attempting to create a failover cluster that uses Postgresql and
> > Keycloak and am having difficulty getting Keycloak running.  Keycloak
> > is using a Postgresql database.  In one case I'm using DRBD to
> > replicate the data and in another case I'm using Postgresql.  The
> > failure, in both cases, is that Keycloak fails to connect to the
> > database.  In both cases Pacemaker is running with the Postgresql
> > resource when I add the Keycloak resource. If I "docker run"
> > Keyclock, not adding it as a Pacemaker resource, Keycloak starts and
> > connects to the database.
> >
> > Below adds Keycloak as a Pacemaker resource:
> >
> > pcs cluster cib  cluster1.xml
> > pcs -
> > f cluster1.xml resource create p_keycloak ocf:heartbeat:docker image=
> > jboss/keycloak name=keycloak run_opts="-d -e KEYCLOAK_USER=admin -
> > e KEYCLOAK_PASSWORD=admin -e DB_ADDR=postgres -e DB_VENDOR=postgres -
> > e DB_USER=postgres -e DB_PASSWORD=postgres -
> > e DB_DATABASE=keycloak_db -e JDBC_PARAMS=useSSL=false -p 8080:8080 -
> > e DB_ADDR=postgres -
> > e DB_PORT='5432' –network=cluster1dkrnet" op monitor interval=60s
> > pcs -f
> > cluster1.xml resource group add g_receiver p_keycloak
> > pcs cluster cib-push  cluster1.xml --config
> >
> > Below creates a Keycloak container that is not managed by Pacemaker:
> > > docker run --name keycloak -e KEYCLOAK_USER=admin -
> > > e KEYCLOAK_PASSWORD=admin -e DB_ADDR=postgres -
> > > e DB_VENDOR=postgres -e DB_USER=postgres -e DB_PASSWORD=postgres -
> > > e DB_DATABASE=keycloak_db -e JDBC_PARAMS=useSSL=false -
> > > p 8080:8080 -e DB_ADDR=postgres -e DB_PORT='5432'
> > > --network=cluster1dkrnet jboss/keycloak
> >
> >  Does anyone have experience with Pacemaker with Keyclock and/or if
> > there are any thoughts about why Keycloak is not connecting to the
> > Postgresql database?
> >
> > Thanks in advance.
>
> I'd check for SELinux denials first. A command executed from the
> command line is unconstrained, while being executed by a daemon is
> subject to SELinux policies.
>
> Other than that, maybe turn on any debugging options and check the
> keycloak logs from the container (e.g. using network logging or an
> exported host disk).
> --
> Ken Gaillot 
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker managing Keycloak

2022-01-28 Thread Ken Gaillot
On Fri, 2022-01-28 at 12:15 -0500, Philip Alesio wrote:
> Hi Everyone,
> 
> I'm attempting to create a failover cluster that uses Postgresql and
> Keycloak and am having difficulty getting Keycloak running.  Keycloak
> is using a Postgresql database.  In one case I'm using DRBD to
> replicate the data and in another case I'm using Postgresql.  The
> failure, in both cases, is that Keycloak fails to connect to the
> database.  In both cases Pacemaker is running with the Postgresql
> resource when I add the Keycloak resource. If I "docker run"
> Keycloak, not adding it as a Pacemaker resource, Keycloak starts and
> connects to the database. 
> 
> Below adds Keycloak as a Pacemaker resource:
> 
> pcs cluster cib  cluster1.xml
> pcs -
> f cluster1.xml resource create p_keycloak ocf:heartbeat:docker image=
> jboss/keycloak name=keycloak run_opts="-d -e KEYCLOAK_USER=admin -
> e KEYCLOAK_PASSWORD=admin -e DB_ADDR=postgres -e DB_VENDOR=postgres -
> e DB_USER=postgres -e DB_PASSWORD=postgres -
> e DB_DATABASE=keycloak_db -e JDBC_PARAMS=useSSL=false -p 8080:8080 -
> e DB_ADDR=postgres -
> e DB_PORT='5432' –network=cluster1dkrnet" op monitor interval=60s
> pcs -f
> cluster1.xml resource group add g_receiver p_keycloak
> pcs cluster cib-push  cluster1.xml --config
>  
> Below creates a Keycloak container that is not managed by Pacemaker: 
> > docker run --name keycloak -e KEYCLOAK_USER=admin -
> > e KEYCLOAK_PASSWORD=admin -e DB_ADDR=postgres -
> > e DB_VENDOR=postgres -e DB_USER=postgres -e DB_PASSWORD=postgres -
> > e DB_DATABASE=keycloak_db -e JDBC_PARAMS=useSSL=false -
> > p 8080:8080 -e DB_ADDR=postgres -e DB_PORT='5432' 
> > --network=cluster1dkrnet jboss/keycloak
> 
>  Does anyone have experience with Pacemaker with Keycloak and/or if
> there are any thoughts about why Keycloak is not connecting to the
> Postgresql database?
> 
> Thanks in advance.

I'd check for SELinux denials first. A command executed from the
command line is unconstrained, while being executed by a daemon is
subject to SELinux policies.

Other than that, maybe turn on any debugging options and check the
keycloak logs from the container (e.g. using network logging or an
exported host disk).
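
For example (assuming auditd is running on the host):

  # ausearch -m avc -ts recent

and, to quickly test whether SELinux is the cause, temporarily:

  # setenforce 0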
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker 2.1.1 final release now available

2021-09-10 Thread Digimer

  
  
On 2021-09-09 7:12 p.m., Ken Gaillot wrote:


  Hi all,

Pacemaker 2.1.1 has officially been released, with source code
available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.1

Highlights include a number of regression fixes and other bug fixes.
For more details, see the ChangeLog in the source repository.

Many thanks to all contributors of source code to this release,
including Chris Lumens, Christine Caulfield, Emil Penchev, Gao,Yan,
Grace Chin, Hideo Yamauchi, José Guilherme Vanz, Ken Gaillot, Klaus
Wenninger, and Oyvind Albrigtsen.


Congrats to all!!

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
  

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker multi-state resource stop not running although "pcs status" indicates "Stopped"

2021-08-14 Thread Andrei Borzenkov
On 13.08.2021 22:46, ChittaNagaraj, Raghav wrote:
> Hello Team,
> 
> Hope you are doing well.
> 
> Running into an issue with multi-state resources not running the stop
> function on a node but failing over to start the resource on another node
> in the cluster when the corosync process is killed.
> 
> Note, in the below, actual resource names/hostnames have been changed from 
> the original.
> 
> Snippet of pcs status before corosync is killed:
> 
>  $ hostname
> pace_node_a
> 
> snippet of "pcs status"
> colocated-resource (ocf::xxx:colocated-resource):  Started pace_node_a
> Master/Slave Set: main-multi-state-resource [main-multi]
>  Masters: [ pace_node_a ]
>  Stopped: [ pace_node_b ]
> 
> Now executed action to kill corosync process using kill -9 on "pace_node_a"
> 
> Resulting snippet of "pcs status"
> 
> colocated-resource (ocf::xxx:colocated-resource):  Started pace_node_b
> Master/Slave Set: main-multi-state-resource [main-multi]
>  Stopped: [ pace_node_a ]
>  Masters: [ pace_node_b ]
> 
> As you can see, pcs status indicates that "main-multi-state-resource" stopped 
> where corosync was killed on "pace_node_a" and started on "pace_node_b". 
> Although this indication is right, the underlying resource managed by 
> "main-multi-state-resource" never stopped on "pace_node_a".

When you kill corosync, this node is isolated and should have been
fenced. At that point it does not matter whether resources had been
stopped or not. Besides, when you kill corosync, pacemaker processes on
this node are also terminated, so nothing can initiate a stop of any resource.

> Also, there were no logs from crmd and other components stating it even 
> attempted to stop on "pace_node_a". Interestingly, crmd logs indicated that 
> the colocated resource - "colocated-resource" was being stopped and there is 
> evidence that the resource managed by "colocated-resource" actually stopped.

Well, you did not show any evidence so it is hard to make any comment.

> 
> Is this a known issue?
>

There is no issue. Most likely stonith is not enabled and pacemaker
assumes the node is offline when communication is lost. When a node is
offline, nothing can be active on it by definition.
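
You can verify this with something like:

  # pcs property show stonith-enabled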

> Please let us know if any additional information is needed.
> 

There is no information so far. You need to show actual configuration of
your cluster and those resources as well as logs from DC starting with
killing corosync until resources were migrated.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker/corosync behavior in case of partial split brain

2021-08-06 Thread Andrei Borzenkov
On Thu, Aug 5, 2021 at 9:25 PM Andrei Borzenkov  wrote:
>
> Three nodes A, B, C. Communication between A and B is blocked
> (completely - no packet can come in both direction). A and B can
> communicate with C.
>
> I expected that result will be two partitions - (A, C) and (B, C). To my
> surprise, A went offline leaving (B, C) running. It was always the same
> node (with node id 1 if it matters, out of 1, 2, 3).
>
> How surviving partition is determined in this case?
>

For the sake of archives - this is how Totem protocol works. Which
node will be isolated is non-deterministic and depends on whether C
receives a message from A or B first. A will mark B as unreachable
(failed) and send a message to C; once C gets this message it marks B
as failed and ignores further messages from it (actually this will
cause B to mark C as failed in return). So the cluster will be split
in two partitions - (A, C) and B. B sends exactly the same message
that marks A as failed. Both messages are sent after consensus timeout
so at approximately the same moment.

> Can I be sure the same will also work in case of multiple nodes? I.e. if
> I have two sites with equal number of nodes and the third site as
> witness and connectivity between multi-node sites is lost but each site
> can communicate with witness. Will one site go offline? Which one?

This should work exactly the same and the isolated site is just as
non-deterministic. Moreover, it will also be non-deterministic if the
number of nodes on sites without connectivity is different (at least I
do not see anything in Totem that would depend on the number of nodes
unless Corosync adds some external knobs here). So in case of site A
and B with 3 nodes each and site C with 1 node and site A losing
connectivity to C we may equally end up with 6+1 split as well as 3+4
split.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker/corosync behavior in case of partial split brain

2021-08-05 Thread Digimer

  
  
On 2021-08-05 2:25 p.m., Andrei Borzenkov wrote:


  Three nodes A, B, C. Communication between A and B is blocked
(completely - no packet can come in both direction). A and B can
communicate with C.

I expected that result will be two partitions - (A, C) and (B, C). To my
surprise, A went offline leaving (B, C) running. It was always the same
node (with node id 1 if it matters, out of 1, 2, 3).

How surviving partition is determined in this case?

Can I be sure the same will also work in case of multiple nodes? I.e. if
I have two sites with equal number of nodes and the third site as
witness and connectivity between multi-node sites is lost but each site
can communicate with witness. Will one site go offline? Which one?



In your case, your nodes were otherwise healthy so quorum worked.
To properly avoid a split brain (when a node is not behaving
properly, i.e. lockups, bad RAM/CPU, etc.) you really need actual
fencing. In such a case, whichever nodes maintain quorum will
fence the lost node (be it because it became inquorate or stopped
behaving properly).

As for the mechanics of how quorum is determined in your case
above, I'll let one of the corosync people decide.

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
  

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker problems with pingd

2021-08-05 Thread Klaus Wenninger
On Wed, Aug 4, 2021 at 5:30 PM Janusz Jaskiewicz <
janusz.jaskiew...@gmail.com> wrote:

> Hello.
>
> Please forgive the length of this email but I wanted to provide as much
> detail as possible.
>
> I'm trying to set up a cluster of two nodes for my service.
> I have a problem with a scenario where the network between two nodes gets
> broken and they can no longer see each other.
> This causes split-brain.
> I know that proper way of implementing this would be to employ STONITH,
> but it is not feasible for me now (I don't have necessary hardware support
> and I don't want to introduce another point of failure by introducing
> shared storage based STONITH).
>
> In order to work-around the split-brain scenario I introduced pingd to my
> cluster, which in theory should do what I expect.
> pingd pings a network device, so when the NIC is broken on one of my
> nodes, this node should not run the resources because pingd would fail for
> it.
>
As we've discussed on this list in multiple previous threads already there
are lots of failure scenarios
where cluster-nodes don't see each other but both can ping something else
on the network.
Important cases where your approach wouldn't work include those where
nodes are just partially alive - this leads to corosync membership being
lost and the node no longer being able to stop resources properly.
Thus it is highly recommended to have all these setups that rely on some
kind of self-fencing or
bringing down of resources within some timeout being guarded by a
(hardware)-watchdog.
Previously you probably were referring to SBD, which implements such a
watchdog-guarded approach. As you've probably figured out, you can't
directly use SBD in a 2-node setup without a shared disk. Pure
watchdog-fencing needs a quorum decision made by at least 3 instances.
If you don't want a full-blown 3rd node you can consider qdevice - it
can be used by multiple 2-node clusters for quorum evaluation. Otherwise
you can use SBD with a shared disk.
You are right that both a shared disk and any kind of 3rd node are an
additional point of failure. What is important is that in both cases we
are talking about a point of failure but not a single point of failure -
meaning that its failure would not necessarily force services to be shut
down.
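
A qdevice setup could look roughly like this (a sketch; the qnetd hostname
is made up, and corosync-qnetd must be running on that third host):

  # pcs quorum device add model net host=qnetd-host algorithm=ffsplit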

Klaus

>
> pingd resource is configured to update the value of variable 'pingd'
> (interval: 5s, dampen: 3s, multiplier:1000).
> Based on the value of pingd I have a location constraint which sets score
> to -INFINITY for resource DimProdClusterIP when 'pingd' is not 1000.
> All other resources are colocated with DimProdClusterIP, and
> DimProdClusterIP should start before all other resources.
>
> Based on that setup I would expect that when the resources run on
> dimprod01 and I disconnect dimprod02 from the network, the resources will
> not start on dimprod02.
> Unfortunately I see that after a token interval + consensus interval my
> resources are brought up for a moment and then go down again.
> This is undesirable, as it causes DRBD split-brain inconsistency and
> cluster IP may also be taken over by the node which is down.
>
> I tried to debug it, but I can't figure out why it doesn't work.
> I would appreciate any help/pointers.
>
>
> Following are some details of my setup and snippet of pacemaker logs with
> comments:
>
> Setup details:
>
> pcs status:
> Cluster name: dimprodcluster
> Cluster Summary:
>   * Stack: corosync
>   * Current DC: dimprod02 (version 2.0.5-9.el8_4.1-ba59be7122) - partition
> with quorum
>   * Last updated: Tue Aug  3 08:20:32 2021
>   * Last change:  Mon Aug  2 18:24:39 2021 by root via cibadmin on
> dimprod01
>   * 2 nodes configured
>   * 8 resource instances configured
>
> Node List:
>   * Online: [ dimprod01 dimprod02 ]
>
> Full List of Resources:
>   * DimProdClusterIP (ocf::heartbeat:IPaddr2): Started dimprod01
>   * WyrDimProdServer (systemd:wyr-dim): Started dimprod01
>   * Clone Set: WyrDimProdServerData-clone [WyrDimProdServerData]
> (promotable):
> * Masters: [ dimprod01 ]
> * Slaves: [ dimprod02 ]
>   * WyrDimProdFS (ocf::heartbeat:Filesystem): Started dimprod01
>   * DimTestClusterIP (ocf::heartbeat:IPaddr2): Started dimprod01
>   * Clone Set: ping-clone [ping]:
> * Started: [ dimprod01 dimprod02 ]
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
>
> pcs constraint
> Location Constraints:
>   Resource: DimProdClusterIP
> Constraint: location-DimProdClusterIP
>   Rule: score=-INFINITY
> Expression: pingd ne 1000
> Ordering Constraints:
>   start DimProdClusterIP then promote WyrDimProdServerData-clone
> (kind:Mandatory)
>   promote WyrDimProdServerData-clone then start WyrDimProdFS
> (kind:Mandatory)
>   start WyrDimProdFS then start WyrDimProdServer (kind:Mandatory)
>   start WyrDimProdServer then start DimTestClusterIP (kind:Mandatory)
> Colocation Constraints:
>   WyrDimProdServer with DimProdClusterIP (score:INFINITY)
>   DimTestClusterIP with DimProdClu

Re: [ClusterLabs] Pacemaker 2.1.0 final release now available

2021-06-10 Thread Andrei Borzenkov
On Wed, Jun 9, 2021 at 8:58 PM  wrote:
>
> I had generated the docs from a host with older versions of some of the
> doc tools. I regenerated them from a newer host. Some tables still have
> issues, but long lines are now wrapped.
>

Yes, that is fixed, thank you.

> > >
> > > There are problems with document structure for all documents. All
> > > documents contain a single chapter with the name TABLE OF CONTENTS
> > > which then contains actual chapters as subsections.
> > >

And that issue still remains. Chapter numbering in HTML does not match
PDF. HTML has chapters 1, 2, 3 ... while PDF has single chapter 2
"TABLE OF CONTENTS" and  further sections 2.1, 2.2, ... for actual
chapters.

HTML:

2. Installing Cluster Software

PDF:

2.2 Installing Cluster Software
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker 2.1.0 final release now available

2021-06-09 Thread kgaillot
I had generated the docs from a host with older versions of some of the
doc tools. I regenerated them from a newer host. Some tables still have
issues, but long lines are now wrapped.

On Wed, 2021-06-09 at 12:40 -0500, kgail...@redhat.com wrote:
> On Wed, 2021-06-09 at 12:29 +0300, Andrei Borzenkov wrote:
> > On Wed, Jun 9, 2021 at 12:24 AM  wrote:
> > > 
> > > Hi all,
> > > 
> > > Pacemaker 2.1.0 has officially been released, with source code
> > > available at:
> > > 
> > >  
> > > https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.0
> > > 
> > > Highlights include OCF Resource Agent API 1.1 compatibility,
> > > noncritical resources, and new build-time options. The Pacemaker
> > > documentation is now built using Sphinx instead of Publican,
> > > giving
> > > a
> > > fresher look:
> > > 
> > >  https://clusterlabs.org/pacemaker/doc/
> > > 
> > 
> > About PDF versions
> > 
> > Documents still have the problem with too long lines in code
> > samples
> > that are not folded and simply truncated.
> 
> At some point I had fixed this so it wrapped with a line-wrap
> indicator. I don't know what made that stop working. :-(
> 
> > 
> > Some tables have rather poor column width so that even a single
> > word
> > does not fit into columns; this makes reading rather hard. See as
> > example Table 2.4: Upgrade Methods in pacemaker administration.
> > 
> > There are problems with document structure for all documents. All
> > documents contain a single chapter with the name TABLE OF CONTENTS
> > which then contains actual chapters as subsections.
> > 
> > Pacemaker administration does not have index (and corresponding PDF
> > bookmarks)
> > 
> > > For more details, see the ChangeLog in the source repository, and
> > > the
> > > following wiki page, which distribution packagers and users who
> > > build
> > > Pacemaker from source or use Pacemaker command-line tools in
> > > scripts
> > > are encouraged to go over carefully:
> > > 
> > >  https://wiki.clusterlabs.org/wiki/Pacemaker_2.1_Changes
> > > 
> > > --
> > > Ken Gaillot 
> > > 
> > > ___
> > > Manage your subscription:
> > > https://lists.clusterlabs.org/mailman/listinfo/users
> > > 
> > > ClusterLabs home: https://www.clusterlabs.org/
> > 
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> > 
> > ClusterLabs home: https://www.clusterlabs.org/
> > 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker 2.1.0 final release now available

2021-06-09 Thread kgaillot
On Wed, 2021-06-09 at 12:29 +0300, Andrei Borzenkov wrote:
> On Wed, Jun 9, 2021 at 12:24 AM  wrote:
> > 
> > Hi all,
> > 
> > Pacemaker 2.1.0 has officially been released, with source code
> > available at:
> > 
> >  
> > https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.0
> > 
> > Highlights include OCF Resource Agent API 1.1 compatibility,
> > noncritical resources, and new build-time options. The Pacemaker
> > documentation is now built using Sphinx instead of Publican, giving
> > a
> > fresher look:
> > 
> >  https://clusterlabs.org/pacemaker/doc/
> > 
> 
> About PDF versions
> 
> Documents still have the problem with too long lines in code samples
> that are not folded and simply truncated.

At some point I had fixed this so it wrapped with a line-wrap
indicator. I don't know what made that stop working. :-(

> 
> Some tables have rather poor column width so that even a single word
> does not fit into columns; this makes reading rather hard. See as
> example Table 2.4: Upgrade Methods in pacemaker administration.
> 
> There are problems with document structure for all documents. All
> documents contain a single chapter with the name TABLE OF CONTENTS
> which then contains actual chapters as subsections.
> 
> Pacemaker administration does not have index (and corresponding PDF
> bookmarks)
> 
> > For more details, see the ChangeLog in the source repository, and
> > the
> > following wiki page, which distribution packagers and users who
> > build
> > Pacemaker from source or use Pacemaker command-line tools in
> > scripts
> > are encouraged to go over carefully:
> > 
> >  https://wiki.clusterlabs.org/wiki/Pacemaker_2.1_Changes
> > 
> > --
> > Ken Gaillot 
> > 
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> > 
> > ClusterLabs home: https://www.clusterlabs.org/
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
> 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker 2.1.0 final release now available

2021-06-09 Thread Andrei Borzenkov
On Wed, Jun 9, 2021 at 12:24 AM  wrote:
>
> Hi all,
>
> Pacemaker 2.1.0 has officially been released, with source code
> available at:
>
>  https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.0
>
> Highlights include OCF Resource Agent API 1.1 compatibility,
> noncritical resources, and new build-time options. The Pacemaker
> documentation is now built using Sphinx instead of Publican, giving a
> fresher look:
>
>  https://clusterlabs.org/pacemaker/doc/
>

About PDF versions

Documents still have the problem with too long lines in code samples
that are not folded and simply truncated.

Some tables have rather poor column width so that even a single word
does not fit into columns; this makes reading rather hard. See as
example Table 2.4: Upgrade Methods in pacemaker administration.

There are problems with document structure for all documents. All
documents contain a single chapter with the name TABLE OF CONTENTS
which then contains actual chapters as subsections.

Pacemaker administration does not have index (and corresponding PDF bookmarks)

> For more details, see the ChangeLog in the source repository, and the
> following wiki page, which distribution packagers and users who build
> Pacemaker from source or use Pacemaker command-line tools in scripts
> are encouraged to go over carefully:
>
>  https://wiki.clusterlabs.org/wiki/Pacemaker_2.1_Changes
>
> --
> Ken Gaillot 
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker 2.1.0 final release now available

2021-06-08 Thread Digimer
On 2021-06-08 5:24 p.m., kgail...@redhat.com wrote:
> Hi all,
> 
> Pacemaker 2.1.0 has officially been released, with source code
> available at:
> 
>  https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.0
> 
> Highlights include OCF Resource Agent API 1.1 compatibility,
> noncritical resources, and new build-time options. The Pacemaker
> documentation is now built using Sphinx instead of Publican, giving a
> fresher look:
> 
>  https://clusterlabs.org/pacemaker/doc/
> 
> For more details, see the ChangeLog in the source repository, and the
> following wiki page, which distribution packagers and users who build
> Pacemaker from source or use Pacemaker command-line tools in scripts
> are encouraged to go over carefully:
> 
>  https://wiki.clusterlabs.org/wiki/Pacemaker_2.1_Changes

Huge congrats!!

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker Cluster help

2021-06-01 Thread Andrei Borzenkov
On 01.06.2021 18:20, kgail...@redhat.com wrote:
> On Thu, 2021-05-27 at 20:46 +0300, Andrei Borzenkov wrote:
>> On 27.05.2021 15:36, Nathan Mazarelo wrote:
>>> Is there a way to have pacemaker resource groups failover if all
>>> floating IP resources are unavailable?
>>> 
>>> I want to have multiple floating IPs in a resource group that will
>>> only failover if all IPs cannot work. Each floating IP is on a
>>> different subnet and can be used by the application I have. If a
>>> floating IP is unavailable it will use the next available floating
>>> IP.
>>> Resource Group: floating_IP
>>>
>>> floating-IP
>>>
>>> floating-IP2
>>>
>>> floating-IP3  
>>> For example, right now if a floating-IP resource fails the whole
>>> resource group will failover to a different node. What I want is to
>>> have pacemaker failover the resource group only if all three
>>> resources are unavailable. Is this possible?
>>>
>>
>> Yes. See "Moving Resources Due to Connectivity Changes" in pacemaker
>> explained.
> 
> I don't think that will work when the IP resources themselves are
> what's desired to be affected.
> 

I guess this needs a more precise explanation from the OP of what "floating
IP is unavailable" means. Personally I do not see any point in having a
local IP without connectivity. If the question is really just "fail only if
all resources failed" then the obvious answer is a resource set with
require-all=false, and it does not matter what type the resources are.
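
Roughly (an untested sketch; "app-rsc" stands for whatever depends on the IPs):

  # pcs constraint order set floating-IP floating-IP2 floating-IP3 \
      sequential=false require-all=false set app-rsc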

> My first thought is that a resource group is probably not the right
> model, since there is not likely to be an ordering relationship among
> the IPs, just colocation. I'd use separate colocations for IP2 and IP3
> with IP1 instead. However, that is not completely symmetrical -- if IP1
> *can't* be assigned to a node for any reason (e.g. meeting its failure
> threshold on all nodes), then the other IPs can't either.
> 
> To keep the IPs failing over as soon as one of them fails, the closest
> approach I can think of is the new critical resource feature, which is
> just coming out in the 2.1.0 release and so probably not an option
> here. Marking IP2 and IP3 as noncritical would allow those to stop on
> failure, and only if IP1 also failed would they be started elsewhere.
> However again it's not completely symmetric, all IPs would fail over if
> IP1 fails.
> 
> Basically, there's no way to treat a set of resources exactly equally.
> Pacemaker has to assign one of them to a node first, then assign the
> others relative to it.
> 
> There are some feature requests that are related, but no one's
> volunteered to do them yet:
> 
>  https://bugs.clusterlabs.org/show_bug.cgi?id=5052
>  https://bugs.clusterlabs.org/show_bug.cgi?id=5320
> 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker Cluster help

2021-06-01 Thread kgaillot
On Thu, 2021-05-27 at 20:46 +0300, Andrei Borzenkov wrote:
> On 27.05.2021 15:36, Nathan Mazarelo wrote:
> > Is there a way to have pacemaker resource groups failover if all
> > floating IP resources are unavailable?
> > 
> > I want to have multiple floating IPs in a resource group that will
> > only failover if all IPs cannot work. Each floating IP is on a
> > different subnet and can be used by the application I have. If a
> > floating IP is unavailable it will use the next available floating
> > IP.
> > Resource Group: floating_IP
> > 
> > floating-IP
> > 
> > floating-IP2
> > 
> > floating-IP3  
> > For example, right now if a floating-IP resource fails the whole
> > resource group will failover to a different node. What I want is to
> > have pacemaker failover the resource group only if all three
> > resources are unavailable. Is this possible?
> > 
> 
> Yes. See "Moving Resources Due to Connectivity Changes" in pacemaker
> explained.

I don't think that will work when the IP resources themselves are
what's desired to be affected.

My first thought is that a resource group is probably not the right
model, since there is not likely to be an ordering relationship among
the IPs, just colocation. I'd use separate colocations for IP2 and IP3
with IP1 instead. However, that is not completely symmetrical -- if IP1
*can't* be assigned to a node for any reason (e.g. meeting its failure
threshold on all nodes), then the other IPs can't either.

To keep the IPs failing over as soon as one of them fails, the closest
approach I can think of is the new critical resource feature, which is
just coming out in the 2.1.0 release and so probably not an option
here. Marking IP2 and IP3 as noncritical would allow those to stop on
failure, and only if IP1 also failed would they be started elsewhere.
However again it's not completely symmetric, all IPs would fail over if
IP1 fails.
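
With 2.1.0 that would be something like (a sketch):

  # pcs resource meta floating-IP2 critical=false
  # pcs resource meta floating-IP3 critical=false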

Basically, there's no way to treat a set of resources exactly equally.
Pacemaker has to assign one of them to a node first, then assign the
others relative to it.

There are some feature requests that are related, but no one's
volunteered to do them yet:

 https://bugs.clusterlabs.org/show_bug.cgi?id=5052
 https://bugs.clusterlabs.org/show_bug.cgi?id=5320

-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker not issuing start command intermittently

2021-05-30 Thread Klaus Wenninger

On 5/29/21 12:05 AM, Strahil Nikolov wrote:

Most RA scripts are written in bash.
Usually you can change the shebang to '#!/usr/bin/bash -x' or you can 
set trace_ra=1 via 'pcs resource update RESOURCE trace_ra=1 
trace_file=/somepath'.


If you don't define trace_file, it should create them in 
/var/lib/heartbeat/trace_ra (based on memory -> so use find/locate).


Best Regards,
Strahil Nikolov

On Fri, May 28, 2021 at 22:10, Abithan Kumarasamy
 wrote:
Hello Team,
We have been recently running some tests on our Pacemaker clusters
that involve two Pacemaker resources on two nodes respectively.
The test case in which we are experiencing intermittent problems
is one in which we bring down the Pacemaker resources on both
nodes simultaneously. Now our expected behaviour is that our
monitor function in our resource agent script detects the
downtime, and then should issue a start command. This happens on
most successful iterations of our test case. However, on some
iterations (approximately 1 out of 30 simulations) we notice that
Pacemaker is issuing the start command on only one of the hosts.
On the troubled host the monitor function is logging that the
resource is down as expected and is exiting with OCF_ERR_GENERIC
return code (1) . According to the documentation, this should
perform a soft disaster recovery, but when scanning the Pacemaker
logs, there is no indication of the start command being issued or
invoked. However, it works as expected on the other host.
To summarize the issue:

 1. The resource’s monitor is running and returning OCF_ERR_GENERIC
 2. The constraints we have for the resources are satisfied.
 3. There are no visible differences in the Pacemaker logs between
the test iteration that failed, and the multiple successful
iterations, other than the fact that Pacemaker does not start
the resource after the monitor returns OCF_ERR_GENERIC


In general pacemaker won't start a resource after receiving
OCF_ERR_GENERIC from the monitor. As you already mentioned,
it will try to recover the resource to a known state by first
trying to stop it, and the state has to be reported as stopped
after that. Only then will it try to restart it, if rules say so.
Which resource agent are you using? If you brought down
the resource manually, it shouldn't report OCF_ERR_GENERIC
but stopped (OCF_NOT_RUNNING).
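
For illustration, a monitor roughly has to distinguish these cases
(a sketch; the two helper checks are placeholders):

monitor() {
    if resource_is_running; then
        return $OCF_SUCCESS      # 0: running cleanly
    elif resource_is_cleanly_down; then
        return $OCF_NOT_RUNNING  # 7: stopped; no recovery is triggered
    else
        return $OCF_ERR_GENERIC  # 1: failed; triggers stop-then-start recovery
    fi
}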

Regards,
Klaus



Could you provide some more insight into why this may be happening
and how we can further debug this issue? We are currently relying
on Pacemaker logs, but are there additional diagnostics to further
debug?
Thanks,
Abithan

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker not issuing start command intermittently

2021-05-28 Thread Strahil Nikolov
Most RA scripts are written in bash. Usually you can change the shebang to 
'#!/usr/bin/bash -x' or you can set trace_ra=1 via 'pcs resource update 
RESOURCE trace_ra=1 trace_file=/somepath'.
If you don't define trace_file, it should create the traces in 
/var/lib/heartbeat/trace_ra (from memory -> so use find/locate).
Best Regards,
Strahil Nikolov
 
  On Fri, May 28, 2021 at 22:10, Abithan Kumarasamy wrote:
Hello Team, We have been recently running some tests on our Pacemaker 
clusters that involve two Pacemaker resources on two nodes respectively. The 
test case in which we are experiencing intermittent problems is one in which we 
bring down the Pacemaker resources on both nodes simultaneously. Now our 
expected behaviour is that our monitor function in our resource agent script 
detects the downtime, and then should issue a start command. This happens on 
most successful iterations of our test case. However, on some iterations 
(approximately 1 out of 30 simulations) we notice that Pacemaker is issuing the 
start command on only one of the hosts. On the troubled host the monitor 
function is logging that the resource is down as expected and is exiting with 
OCF_ERR_GENERIC return code (1). According to the documentation, this should 
perform a soft disaster recovery, but when scanning the Pacemaker logs, there 
is no indication of the start command being issued or invoked. However, it 
works as expected on the other host.  To summarize the issue:   
   - The resource’s monitor is running and returning OCF_ERR_GENERIC
   - The constraints we have for the resources are satisfied.
   - There are no visible differences in the Pacemaker logs between the test 
iteration that failed, and the multiple successful iterations, other than the 
fact that Pacemaker does not start the resource after the monitor returns 
OCF_ERR_GENERIC   
  
Could you provide some more insight into why this may be happening and how we 
can further debug this issue? We are currently relying on Pacemaker logs, but 
are there additional diagnostics to further debug?
Thanks,
Abithan
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker Cluster help

2021-05-27 Thread Andrei Borzenkov
On 27.05.2021 15:36, Nathan Mazarelo wrote:
> Is there a way to have pacemaker resource groups failover if all floating IP 
> resources are unavailable?
> 
> I want to have multiple floating IPs in a resource group that will only 
> failover if all IPs cannot work. Each floating IP is on a different subnet 
> and can be used by the application I have. If a floating IP is unavailable it 
> will use the next available floating IP.
> Resource Group: floating_IP
> 
> floating-IP
> 
> floating-IP2
> 
> floating-IP3  
> For example, right now if a floating-IP resource fails the whole resource 
> group will failover to a different node. What I want is to have pacemaker 
> failover the resource group only if all three resources are unavailable. Is 
> this possible?
> 

Yes. See "Moving Resources Due to Connectivity Changes" in pacemaker
explained.
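
A sketch of that approach with pcs (the gateway address and resource
names are placeholders, and pcs syntax differs slightly between the 0.9
and 0.10 series):

  pcs resource create ping ocf:pacemaker:ping host_list="192.168.1.1" \
      op monitor interval=30s clone
  pcs constraint location floating_IP rule score=-INFINITY \
      pingd lt 1 or not_defined pingd

With this, the group moves only when the node loses connectivity, not
when an individual IP resource fails.
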
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker-2.0.x version support on “RHEL 7” OS

2021-03-12 Thread Tomas Jelinek

Hi Sathish,

Sorry, I don't know how to get ruby 2.2.0 for RHEL 7.

The types of support have been explained by Ken. It has already been 
said that pcs-0.10 is not supported on RHEL 7.


I would recommend either downgrading to pacemaker 1.x and pcs-0.9, as 
suggested by Ken, or upgrading your nodes to RHEL 8. Either of the options 
would get you to a supported configuration.



Regards,
Tomas


On 11. 03. 21 at 15:18, S Sathish S wrote:

Hi Tomas,

Thanks for your response.

A Python 3.6+ package is available in the RHEL 7 stream, but no ruby 2.2.0+ 
package is available there. How can we overcome this problem? Can you 
provide a way forward?


The documentation also lists them as runtime dependencies of pcs and pcsd.

Thanks and Regards,

S Sathish S



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker-2.0.x version support on “RHEL 7” OS

2021-03-11 Thread S Sathish S
Hi Tomas,

Thanks for your response.

A Python 3.6+ package is available in the RHEL 7 stream, but no ruby 2.2.0+ 
package is available there. How can we overcome this problem? Can you provide 
a way forward?

The documentation also lists them as runtime dependencies of pcs and pcsd.

Thanks and Regards,
S Sathish S
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker-2.0.x version support on “RHEL 7” OS

2021-03-10 Thread Ken Gaillot
On Wed, 2021-03-10 at 07:15 +, S Sathish S wrote:
> Hi Ken/Team,
>  
> We are using pacemaker software from ClusterLabs upstream with version
> pacemaker 2.0.2 on our RHEL 7 system. To fix CVE-2020-25654 we don't
> want to downgrade to the older pacemaker 1.x series, so we are trying
> to build the latest pcs-0.10 from upstream source. It has a runtime
> dependency on ruby 2.2.0+, which is not available in the RHEL 7.x
> stream, and we are getting a compilation error. Please check and
> advise us whether pcs-0.10 is supported on RHEL 7.
>  
> We also need to understand the ClusterLabs support terms for pacemaker
> 1.x and pacemaker 2.x: will new features, security fixes, and bug
> fixes be handled for both channels?
>  
> Thanks and Regards,
> S Sathish S

There's a distinction between commercial support (Red
Hat/SUSE/Ubuntu/etc.) and ClusterLabs support. Commercial support is
much more extensive; when ClusterLabs says an OS is supported, it's
basically just that the software can build and run regression tests
successfully when all dependencies are present.

Commercial support will generally handle building, dependencies,
security fixes, etc., for you, but limit you to a particular version
they choose. With ClusterLabs support, you can run whatever version you
want, but you're on your own as far as getting all dependencies working
in your environment and so forth.

ClusterLabs supports only the most current release of each active
Pacemaker series. Rolling upgrades from certain older releases are also
supported.

ClusterLabs support for the Pacemaker 1 series ended with the 1.1.24
release. All Pacemaker 1 releases supported rolling upgrades from 1.0.0
or later.

There is no set time frame for how long the Pacemaker 2 series will be
supported. All Pacemaker 2 releases will support rolling upgrades from
1.1.11 or later and 2.0.0 or later.
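
As a sketch, a rolling upgrade of one node might look like this (pcs
subcommand names vary between the 0.9 and 0.10 series):

  pcs node standby node1       # pcs-0.9: pcs cluster standby node1
  pcs cluster stop node1
  # upgrade the pacemaker/corosync packages on node1 here
  pcs cluster start node1
  pcs node unstandby node1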

There is no formal plan for Pacemaker 3 yet, but it will likely support
rolling upgrades from 2.0.0 or later, with the exception of dropping
support for Upstart. There may or may not be some time when both
Pacemaker 2 and 3 releases are being made and supported.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker-2.0.x version support on “RHEL 7” OS

2021-03-10 Thread Tomas Jelinek

Hi,

The same principles apply to both pcs and pacemaker (and most probably 
the whole cluster stack).


Red Hat only supports the packages it provides, which is the pcs-0.9 series 
in RHEL 7.


Even if you manage to install ruby 2.2.0+ and python 3.6+ on RHEL 7 
hosts and build and run pcs-0.10 on top of that, Red Hat won't support it.



Regards,
Tomas


On 10. 03. 21 at 8:15, S Sathish S wrote:

Hi Ken/Team,

We are using pacemaker software from ClusterLabs upstream with version 
pacemaker 2.0.2 on our RHEL 7 system. To fix CVE-2020-25654 we don't want 
to downgrade to the older pacemaker 1.x series, so we are trying to build 
the latest pcs-0.10 from upstream source. It has a runtime dependency on 
ruby 2.2.0+, which is not available in the RHEL 7.x stream, and we are 
getting a compilation error. Please check and advise us whether pcs-0.10 
is supported on RHEL 7.


We also need to understand the ClusterLabs support terms for pacemaker 1.x 
and pacemaker 2.x: will new features, security fixes, and bug fixes be 
handled for both channels?


Thanks and Regards,

S Sathish S


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker-2.0.x version support on “RHEL 7” OS

2021-03-10 Thread S Sathish S
Hi Ken/Team,

We are using pacemaker software from ClusterLabs upstream with version 
pacemaker 2.0.2 on our RHEL 7 system. To fix CVE-2020-25654 we don't want to 
downgrade to the older pacemaker 1.x series, so we are trying to build the 
latest pcs-0.10 from upstream source. It has a runtime dependency on ruby 
2.2.0+, which is not available in the RHEL 7.x stream, and we are getting a 
compilation error. Please check and advise us whether pcs-0.10 is supported 
on RHEL 7.

We also need to understand the ClusterLabs support terms for pacemaker 1.x 
and pacemaker 2.x: will new features, security fixes, and bug fixes be 
handled for both channels?

Thanks and Regards,
S Sathish S

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker-2.0.x version support on “RHEL 7” OS

2021-03-09 Thread Ken Gaillot
Hi Sathish,

You don't need to go through all that trouble. Red Hat backported the
fix for that vulnerability to RHEL 7:

https://access.redhat.com/errata/RHSA-2020:5453

The pacemaker-1.1.23-1.el7_9.1 packages available in RHEL 7.9 do not
have the vulnerability.
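
For example, checking and updating a RHEL 7.9 node might look like this
(assuming the standard Red Hat repositories):

  rpm -q pacemaker      # expect pacemaker-1.1.23-1.el7_9.1 or later
  yum update pacemaker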

On Tue, 2021-03-09 at 05:09 +, S Sathish S wrote:
> Hi Ken/Team,
>  
> Thanks for the prompt response.
>  
> We build and run pacemaker 2.0.2 from upstream source on RHEL 7 with
> corosync-2.4.4 & pcs-0.9.169.
>  
> The high-severity vulnerability CVE-2020-25654 is open on both the
> pacemaker 1.x and 2.x streams, and the fix is available in pacemaker
> 2.0.5. We then built pacemaker 2.0.5 from upstream source and ran it
> on RHEL 7 with corosync-2.4.4 & pcs-0.9.169, found the "pcs status
> resources" command not working, and raised a support ticket with
> ClusterLabs; the response is below.
>  
> ClusterLabs response: pcs-0.9 does not support pacemaker >= 2.0.0.
> You can go with pcs-0.9 + corosync < 3 + pacemaker 1.x OR pcs-0.10 +
> corosync 3.x + pacemaker 2.x. Combination of corosync 2 + pacemaker 2
> is not supported in any pcs version, even though it may work to some
> degree.
>  
> Ticket reference : 
> https://www.mail-archive.com/users@clusterlabs.org/msg11091.html
>  
> Now we are trying to build the latest pcs-0.10 from upstream source.
> It has a runtime dependency on ruby 2.2.0+, which is not available in
> the RHEL 7.x stream, and we are getting a compilation error. Please
> check and advise us whether pcs-0.10 is supported on RHEL 7.
>  
> 
>  
> Thanks and Regards,
> S Sathish S
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker-2.0.x version support on “RHEL 7” OS

2021-03-09 Thread S Sathish S
Hi Ken/Team,

Thanks for the prompt response.

We build and run pacemaker 2.0.2 from upstream source on RHEL 7 with 
corosync-2.4.4 & pcs-0.9.169.

The high-severity vulnerability CVE-2020-25654 is open on both the pacemaker 
1.x and 2.x streams, and the fix is available in pacemaker 2.0.5. We then 
built pacemaker 2.0.5 from upstream source and ran it on RHEL 7 with 
corosync-2.4.4 & pcs-0.9.169, found the "pcs status resources" command not 
working, and raised a support ticket with ClusterLabs; the response is below.

ClusterLabs response: pcs-0.9 does not support pacemaker >= 2.0.0. You can go 
with pcs-0.9 + corosync < 3 + pacemaker 1.x OR pcs-0.10 + corosync 3.x + 
pacemaker 2.x. Combination of corosync 2 + pacemaker 2 is not supported in any 
pcs version, even though it may work to some degree.

Ticket reference : 
https://www.mail-archive.com/users@clusterlabs.org/msg11091.html

Now we are trying to build the latest pcs-0.10 from upstream source. It has a 
runtime dependency on ruby 2.2.0+, which is not available in the RHEL 7.x 
stream, and we are getting a compilation error. Please check and advise us 
whether pcs-0.10 is supported on RHEL 7.


Thanks and Regards,
S Sathish S
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker-2.0.x version support on “RHEL 7” OS

2021-03-08 Thread Ken Gaillot
On Fri, 2021-03-05 at 13:39 +, S Sathish S wrote:
> Hi Team,
>  
> pacemaker-2.0.x version support on “RHEL 7” OS.
>  
> Thanks and Regards,
> S Sathish S

Hi,

Red Hat only supports the packages it provides, which is the Pacemaker
1.1 series in RHEL 7.

From an upstream perspective, you can certainly build and run Pacemaker
2.0 from source on RHEL 7, but Red Hat won't support it. Also, the
version of pcs supplied with RHEL 7 is only compatible with Pacemaker
1.1, so you'd need to build pcs from source as well.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker Fail Master-Master State

2021-02-22 Thread Ken Gaillot
On Sun, 2021-02-21 at 12:56 +0300, İsmet BALAT wrote:
> And this state can happen: the master machine can go down and another
> machine becomes the new master. Then the first machine can come back
> online as a master, like in the video. This state I can't fix, because
> there is no internet. These machines will be used in an offline
> project. All other states have worked successfully.

Fencing is necessary for the case where both nodes are up, but unable
to communicate with each other (network issue, CPU load, etc.). It is
also necessary for the case where one node is not working properly and
is unable to stop an active resource. Without fencing, both nodes could
promote to the master role, causing data inconsistencies or even loss.

If you can't get power fencing, there are alternatives.

If your nodes are physical machines with hardware watchdogs, you may be
able to use sbd, if you have either shared storage or a lightweight
third node that can run corosync-qdevice to provide true quorum. Or, if
you have shared SCSI storage, you may be able to use fence_scsi to
fence by cutting off disk access. Or, if you have an intelligent
network switch (that is, one with SNMP-based administration), you may
be able to use fence_snmp to fence by cutting off network access. Or,
if your nodes are virtual machines, and you have access to the host,
you may be able to use fence_virt or fence_xvm.
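
As a sketch, a fence_xvm device for a VM-based node might be created
like this (VM and node names are hypothetical):

  pcs stonith create fence_node1 fence_xvm port="node1-vm" \
      pcmk_host_list="node1"

sbd and fence_scsi setups look quite different; the common point is that
some tested fencing mechanism should exist before relying on failover.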



> 
> On 21 Feb 2021 Sun at 12:45 İsmet BALAT  wrote:
> > I am testing all scenarios because I will use real machines with
> > pacemaker. Scenarios:
> > 
> > 1- 
> > node1  master
> > node2 slave 
> > Shutting down node1, then node2 becomes master
> > Successful
> > 
> > 2-
> > node1  slave
> > node2 master 
> > Shutting down node2, then node1 becomes master
> > Successful
> > 
> > 3-
> > node1  slave
> > node2 slave 
> > One node becomes master after 60s
> > Successful
> > 
> > 4-
> > node1  master
> > node2 master 
> > First machine fails, and does not recover unless I send the cleanup
> > command
> > Fail
> > 
> > I haven't got a physical fencing device. But all machines must be
> > online for redundancy, so I guess we can't use fencing. The servers
> > have no connection for remote help and no internet; they must fix
> > themselves :)
> > 
> > 
> > 
> > On 21 Feb 2021 Sun at 12:14 damiano giuliani <
> > damianogiulian...@gmail.com> wrote:
> > > My question is:
> > > Why are you pausing one VM? Is there any specific purpose in
> > > that? You should never have 2 master resources; pausing one VM
> > > could cause unexpected behaviour.
> > > If you are testing failovers or simulated faults you must
> > > configure a fencing mechanism.
> > > Don't expect your cluster to work properly without it.
> > > 
> > > On Sun, 21 Feb 2021, 07:29 İsmet BALAT, 
> > > wrote:
> > > > Sorry, I am in UTC+3 and was sleeping. I will first try to fix
> > > > the node, then start the cluster. Thank you 
> > > > 
> > > > On 21 Feb 2021 Sun at 00:00 damiano giuliani <
> > > > damianogiulian...@gmail.com> wrote:
> > > > > resources configured in a master/slave mode
> > > > > If you got 2 masters, something is not working right. You
> > > > > should never have 2 nodes in master.
> > > > > Disable the pacemaker and corosync services from autostarting
> > > > > on both nodes:
> > > > > systemctl disable corosync
> > > > > systemctl disable pacemaker
> > > > > 
> > > > > You can start the faulty node using pcs cli:
> > > > > pcs cluster start
> > > > > 
> > > > > You can start the whole cluster using
> > > > > pcs cluster start --all
> > > > > 
> > > > > First of all configure a fencing mechanism to make the
> > > > > cluster consistent. Its mandatory.
> > > > > 
> > > > > 
> > > > > 
> > > > > On Sat, 20 Feb 2021, 21:47 İsmet BALAT,  > > > > > wrote:
> > > > > > I am not using fencing. If I disable pacemaker, how does the
> > > > > > node join the cluster (as in the first example in the video -
> > > > > > master/slave changing)? So I need a check script for fault
> > > > > > states :( 
> > > > > > 
> > > > > > And thank you for reply 
> > > > > > 
> > > > > > On 20 Feb 2021 Sat at 23:40 damiano giuliani <
> > > > > > damianogiulian...@gmail.com> wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > Have you correctly configured a working fencing
> > > > > > > mechanism? Without it you can't rely on a safe and
> > > > > > > consistent environment.
> > > > > > > My suggestion is to disable the autostart services (and
> > > > > > > so the autojoin into the cluster) on both nodes.
> > > > > > > If there is a fault you have to investigate before you
> > > > > > > rejoin the old faulty master node.
> > > > > > > Pacemaker (and PAF if you are using it), as far as I know,
> > > > > > > doesn't support autoheal of the old master, so you
> > > > > > > should resync or pg_rewind every time there is a fault.
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > On Sat, 20 Feb 2021, 19:03 İsmet BALAT, <
> > > > > > > bcalbat...@gmail.com> wrote:
> > > > > > > > I am using Pacemaker with CentOS 8 and PostgreSQL 12.
> > > > > > > > Failover master/slave states successfully run. But if
> > > > > > > > all nodes are masters, 

Re: [ClusterLabs] Pacemaker Fail Master-Master State

2021-02-21 Thread İsmet BALAT
And this state can happen: the master machine can go down and another machine
can become the new master. Then the first machine can come back online as a
master, like in the video. This state I can't fix, because there is no
internet. These machines will be used in an offline project. All other states
have worked successfully.


On 21 Feb 2021 Sun at 12:45 İsmet BALAT  wrote:

> I am testing all scenarios because I will use real machines with
> pacemaker. Scenarios:
>
> 1-
> node1  master
> node2 slave
> Shutting down node1, then node2 becomes master
> Successful
>
> 2-
> node1  slave
> node2 master
> Shutting down node2, then node1 becomes master
> Successful
>
> 3-
> node1  slave
> node2 slave
> One node becomes master after 60s
> Successful
>
> 4-
> node1  master
> node2 master
> First machine fails, and does not recover unless I send the cleanup command
> Fail
>
> I haven't got a physical fencing device. But all machines must be online for
> redundancy, so I guess we can't use fencing. The servers have no connection
> for remote help and no internet; they must fix themselves :)
>
>
>
> On 21 Feb 2021 Sun at 12:14 damiano giuliani 
> wrote:
>
>> My question is:
>> Why are you pausing one VM? Is there any specific purpose in that? You
>> should never have 2 master resources; pausing one VM could cause
>> unexpected behaviour.
>> If you are testing failovers or simulated faults you must configure a
>> fencing mechanism.
>> Don't expect your cluster to work properly without it.
>>
>> On Sun, 21 Feb 2021, 07:29 İsmet BALAT,  wrote:
>>
>>> Sorry, I am in UTC+3 and was sleeping. I will first try to fix the node,
>>> then start the cluster. Thank you
>>>
>>> On 21 Feb 2021 Sun at 00:00 damiano giuliani <
>>> damianogiulian...@gmail.com> wrote:
>>>
 resources configured in a master/slave mode
 If you got 2 masters, something is not working right. You should never
 have 2 nodes in master.
 Disable the pacemaker and corosync services from autostarting on both nodes:
 systemctl disable corosync
 systemctl disable pacemaker

 You can start the faulty node using pcs cli:
 pcs cluster start

 You can start the whole cluster using
 pcs cluster start --all

 First of all configure a fencing mechanism to make the cluster
 consistent. It's mandatory.



 On Sat, 20 Feb 2021, 21:47 İsmet BALAT,  wrote:

> I am not using fencing. If I disable pacemaker, how does the node join the
> cluster (as in the first example in the video - master/slave changing)? So
> I need a check script for fault states :(
>
> And thank you for reply
>
> On 20 Feb 2021 Sat at 23:40 damiano giuliani <
> damianogiulian...@gmail.com> wrote:
>
>> Hi,
>>
>> Have you correctly configured a working fencing mechanism? Without it
>> you can't rely on a safe and consistent environment.
>> My suggestion is to disable the autostart services (and so the
>> autojoin into the cluster) on both nodes.
>> If there is a fault you have to investigate before you rejoin the
>> old faulty master node.
>> Pacemaker (and PAF if you are using it), as far as I know, doesn't
>> support autoheal of the old master, so you should resync or pg_rewind
>> every time there is a fault.
>>
>>
>>
>> On Sat, 20 Feb 2021, 19:03 İsmet BALAT,  wrote:
>>
>>> I am using Pacemaker with CentOS 8 and PostgreSQL 12. Failover
>>> master/slave states successfully run. But if all nodes are masters,
>>> pacemaker can't repair it unless I send the command 'pcs resource
>>> cleanup'. Whereas I set 60s in the resource config. How can I fix it?
>>>
>>> StackOverFlow link:
>>> https://stackoverflow.com/questions/66292304/pacemaker-postgresql-master-master-state
>>>
>>> Thanks
>>>
>>> İsmet BALAT
>>>
>> ___
>>> Manage your subscription:
>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> ClusterLabs home: https://www.clusterlabs.org/
>>>
>> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
 ___
 Manage your subscription:
 https://lists.clusterlabs.org/mailman/listinfo/users

 ClusterLabs home: https://www.clusterlabs.org/

>>> ___
>>> Manage your subscription:
>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> ClusterLabs home: https://www.clusterlabs.org/
>>>
>> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>
>

Re: [ClusterLabs] Pacemaker Fail Master-Master State

2021-02-21 Thread İsmet BALAT
I am testing all scenarios because I will use real machines with pacemaker.
Scenarios:

1-
node1  master
node2 slave
Shutting down node1, then node2 becomes master
Successful

2-
node1  slave
node2 master
Shutting down node2, then node1 becomes master
Successful

3-
node1  slave
node2 slave
One node becomes master after 60s
Successful

4-
node1  master
node2 master
First machine fails, and does not recover unless I send the cleanup command
Fail

I haven't got a physical fencing device. But all machines must be online for
redundancy, so I guess we can't use fencing. The servers have no connection
for remote help and no internet; they must fix themselves :)



On 21 Feb 2021 Sun at 12:14 damiano giuliani 
wrote:

> My question is:
> Why are you pausing one VM? Is there any specific purpose in that? You
> should never have 2 master resources; pausing one VM could cause
> unexpected behaviour.
> If you are testing failovers or simulated faults you must configure a
> fencing mechanism.
> Don't expect your cluster to work properly without it.
>
> On Sun, 21 Feb 2021, 07:29 İsmet BALAT,  wrote:
>
>> Sorry, I am in UTC+3 and was sleeping. I will first try to fix the node,
>> then start the cluster. Thank you
>>
>> On 21 Feb 2021 Sun at 00:00 damiano giuliani 
>> wrote:
>>
>>> resources configured in a master/slave mode
>>> If you got 2 masters, something is not working right. You should never
>>> have 2 nodes in master.
>>> Disable the pacemaker and corosync services from autostarting on both nodes:
>>> systemctl disable corosync
>>> systemctl disable pacemaker
>>>
>>> You can start the faulty node using pcs cli:
>>> pcs cluster start
>>>
>>> You can start the whole cluster using
>>> pcs cluster start --all
>>>
>>> First of all configure a fencing mechanism to make the cluster
>>> consistent. It's mandatory.
>>>
>>>
>>>
>>> On Sat, 20 Feb 2021, 21:47 İsmet BALAT,  wrote:
>>>
 I am not using fencing. If I disable pacemaker, how does the node join the
 cluster (as in the first example in the video - master/slave changing)? So
 I need a check script for fault states :(

 And thank you for reply

 On 20 Feb 2021 Sat at 23:40 damiano giuliani <
 damianogiulian...@gmail.com> wrote:

> Hi,
>
> Have you correctly configured a working fencing mechanism? Without it you
> can't rely on a safe and consistent environment.
> My suggestion is to disable the autostart services (and so the
> autojoin into the cluster) on both nodes.
> If there is a fault you have to investigate before you rejoin the old
> faulty master node.
> Pacemaker (and PAF if you are using it), as far as I know, doesn't support
> autoheal of the old master, so you should resync or pg_rewind every time
> there is a fault.
>
>
>
> On Sat, 20 Feb 2021, 19:03 İsmet BALAT,  wrote:
>
>> I am using Pacemaker with CentOS 8 and PostgreSQL 12. Failover
>> master/slave states successfully run. But if all nodes are masters,
>> pacemaker can't repair it unless I send the command 'pcs resource cleanup'.
>> Whereas I set 60s in the resource config. How can I fix it?
>>
>> StackOverFlow link:
>> https://stackoverflow.com/questions/66292304/pacemaker-postgresql-master-master-state
>>
>> Thanks
>>
>> İsmet BALAT
>>
> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
 ___
 Manage your subscription:
 https://lists.clusterlabs.org/mailman/listinfo/users

 ClusterLabs home: https://www.clusterlabs.org/

>>> ___
>>> Manage your subscription:
>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> ClusterLabs home: https://www.clusterlabs.org/
>>>
>> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker Fail Master-Master State

2021-02-21 Thread damiano giuliani
My question is:
Why are you pausing one VM? Is there any specific purpose in that? You should
never have 2 master resources; pausing one VM could cause unexpected
behaviour.
If you are testing failovers or simulated faults you must configure a
fencing mechanism.
Don't expect your cluster to work properly without it.

On Sun, 21 Feb 2021, 07:29 İsmet BALAT,  wrote:

> Sorry, I am in UTC+3 and was sleeping. I will first try to fix the node,
> then start the cluster. Thank you
>
> On 21 Feb 2021 Sun at 00:00 damiano giuliani 
> wrote:
>
>> resources configured in a master/slave mode
>> If you got 2 masters, something is not working right. You should never
>> have 2 nodes in master.
>> Disable the pacemaker and corosync services from autostarting on both nodes:
>> systemctl disable corosync
>> systemctl disable pacemaker
>>
>> You can start the faulty node using pcs cli:
>> pcs cluster start
>>
>> You can start the whole cluster using
>> pcs cluster start --all
>>
>> First of all configure a fencing mechanism to make the cluster
>> consistent. It's mandatory.
>>
>>
>>
>> On Sat, 20 Feb 2021, 21:47 İsmet BALAT,  wrote:
>>
>>> I am not using fencing. If I disable pacemaker, how does the node join the
>>> cluster (as in the first example in the video - master/slave changing)? So
>>> I need a check script for fault states :(
>>>
>>> And thank you for reply
>>>
>>> On 20 Feb 2021 Sat at 23:40 damiano giuliani <
>>> damianogiulian...@gmail.com> wrote:
>>>
 Hi,

 Have you correctly configured a working fencing mechanism? Without it you
 can't rely on a safe and consistent environment.
 My suggestion is to disable the autostart services (and so the autojoin
 into the cluster) on both nodes.
 If there is a fault you have to investigate before you rejoin the old
 faulty master node.
 Pacemaker (and PAF if you are using it), as far as I know, doesn't support
 autoheal of the old master, so you should resync or pg_rewind every time
 there is a fault.



 On Sat, 20 Feb 2021, 19:03 İsmet BALAT,  wrote:

> I am using Pacemaker with CentOS 8 and PostgreSQL 12. Failover
> master/slave states successfully run. But if all nodes are masters,
> pacemaker can't repair it unless I send the command 'pcs resource cleanup'.
> Whereas I set 60s in the resource config. How can I fix it?
>
> StackOverFlow link:
> https://stackoverflow.com/questions/66292304/pacemaker-postgresql-master-master-state
>
> Thanks
>
> İsmet BALAT
>
 ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
 ___
 Manage your subscription:
 https://lists.clusterlabs.org/mailman/listinfo/users

 ClusterLabs home: https://www.clusterlabs.org/

>>> ___
>>> Manage your subscription:
>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> ClusterLabs home: https://www.clusterlabs.org/
>>>
>> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker Fail Master-Master State

2021-02-20 Thread İsmet BALAT
Sorry, I am in UTC+3 and was sleeping. I will first try to fix the node,
then start the cluster. Thank you

On 21 Feb 2021 Sun at 00:00 damiano giuliani 
wrote:

> resources configured in a master/slave mode
> If you got 2 masters, something is not working right. You should never have
> 2 nodes in master.
> Disable the pacemaker and corosync services from autostarting on both nodes:
> systemctl disable corosync
> systemctl disable pacemaker
>
> You can start the faulty node using pcs cli:
> pcs cluster start
>
> You can start the whole cluster using
> pcs cluster start --all
>
> First of all configure a fencing mechanism to make the cluster consistent.
> It's mandatory.
>
>
>
> On Sat, 20 Feb 2021, 21:47 İsmet BALAT,  wrote:
>
>> I am not using fencing. If I disable pacemaker, how does the node join the
>> cluster (as in the first example in the video - master/slave changing)? So
>> I need a check script for fault states :(
>>
>> And thank you for reply
>>
>> On 20 Feb 2021 Sat at 23:40 damiano giuliani 
>> wrote:
>>
>>> Hi,
>>>
>>> Have you correctly configured a working fencing mechanism? Without it you
>>> can't rely on a safe and consistent environment.
>>> My suggestion is to disable the autostart services (and so the autojoin
>>> into the cluster) on both nodes.
>>> If there is a fault you have to investigate before you rejoin the old
>>> faulty master node.
>>> Pacemaker (and PAF if you are using it), as far as I know, doesn't support
>>> autoheal of the old master, so you should resync or pg_rewind every time
>>> there is a fault.
>>>
>>>
>>>
>>> On Sat, 20 Feb 2021, 19:03 İsmet BALAT,  wrote:
>>>
 I am using Pacemaker with CentOS 8 and PostgreSQL 12. Failover
 master/slave states successfully run. But if all nodes are masters,
 pacemaker can't repair it unless I send the command 'pcs resource cleanup'.
 Whereas I set 60s in the resource config. How can I fix it?

 StackOverFlow link:
 https://stackoverflow.com/questions/66292304/pacemaker-postgresql-master-master-state

 Thanks

 İsmet BALAT

>>> ___
 Manage your subscription:
 https://lists.clusterlabs.org/mailman/listinfo/users

 ClusterLabs home: https://www.clusterlabs.org/

>>> ___
>>> Manage your subscription:
>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> ClusterLabs home: https://www.clusterlabs.org/
>>>
>> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker Fail Master-Master State

2021-02-20 Thread damiano giuliani
resources configured in a master/slave mode
If you got 2 masters, something is not working right. You should never have
2 nodes in master.
Disable the pacemaker and corosync services from autostarting on both nodes:
systemctl disable corosync
systemctl disable pacemaker

You can start the faulty node using pcs cli:
pcs cluster start

You can start the whole cluster using
pcs cluster start --all

First of all configure a fencing mechanism to make the cluster consistent.
It's mandatory.



On Sat, 20 Feb 2021, 21:47 İsmet BALAT,  wrote:

> I am not using fencing. If I disable pacemaker, how does the node join the
> cluster (as in the first example in the video - master/slave changing)? So
> I need a check script for fault states :(
>
> And thank you for reply
>
> On 20 Feb 2021 Sat at 23:40 damiano giuliani 
> wrote:
>
>> Hi,
>>
>> Have you correctly configured a working fencing mechanism? Without it you
>> can't rely on a safe and consistent environment.
>> My suggestion is to disable the autostart services (and so the autojoin
>> into the cluster) on both nodes.
>> If there is a fault you have to investigate before you rejoin the old
>> faulty master node.
>> Pacemaker (and PAF if you are using it), as far as I know, doesn't support
>> autoheal of the old master, so you should resync or pg_rewind every time
>> there is a fault.
>>
>>
>>
>> On Sat, 20 Feb 2021, 19:03 İsmet BALAT,  wrote:
>>
>>> I am using Pacemaker with CentOS 8 and PostgreSQL 12. Failover
>>> master/slave states successfully run. But if all nodes are masters,
>>> pacemaker can't repair it unless I send the command 'pcs resource cleanup'.
>>> Whereas I set 60s in the resource config. How can I fix it?
>>>
>>> StackOverFlow link:
>>> https://stackoverflow.com/questions/66292304/pacemaker-postgresql-master-master-state
>>>
>>> Thanks
>>>
>>> İsmet BALAT
>>>
>> ___
>>> Manage your subscription:
>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> ClusterLabs home: https://www.clusterlabs.org/
>>>
>> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker Fail Master-Master State

2021-02-20 Thread İsmet BALAT
I am not using fencing. If I disable pacemaker, how does the node join the
cluster (as in the first example in the video - master/slave changing)? So
I need a check script for fault states :(

And thank you for reply

On 20 Feb 2021 Sat at 23:40 damiano giuliani 
wrote:

> Hi,
>
> Have you correctly configured a working fencing mechanism? Without it you
> can't rely on a safe and consistent environment.
> My suggestion is to disable the autostart services (and so the autojoin
> into the cluster) on both nodes.
> If there is a fault you have to investigate before you rejoin the old
> faulty master node.
> Pacemaker (and PAF if you are using it), as far as I know, doesn't support
> autoheal of the old master, so you should resync or pg_rewind every time
> there is a fault.
>
>
>
> On Sat, 20 Feb 2021, 19:03 İsmet BALAT,  wrote:
>
>> I am using Pacemaker with CentOS 8 and PostgreSQL 12. Failover
>> master/slave states successfully run. But if all nodes are masters,
>> pacemaker can't repair it unless I send the command 'pcs resource cleanup'.
>> Whereas I set 60s in the resource config. How can I fix it?
>>
>> StackOverFlow link:
>> https://stackoverflow.com/questions/66292304/pacemaker-postgresql-master-master-state
>>
>> Thanks
>>
>> İsmet BALAT
>>
> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker Fail Master-Master State

2021-02-20 Thread damiano giuliani
Hi,

Have you correctly configured a working fencing mechanism? Without it you
can't rely on a safe and consistent environment.
My suggestion is to disable the autostart services (and so the autojoin
into the cluster) on both nodes.
If there is a fault you have to investigate before you rejoin the old
faulty master node.
Pacemaker (and PAF if you are using it), as far as I know, doesn't support
autoheal of the old master, so you should resync or pg_rewind every time
there is a fault.



On Sat, 20 Feb 2021, 19:03 İsmet BALAT,  wrote:

> I am using Pacemaker with CentOS 8 and PostgreSQL 12. Failover
> master/slave states successfully run. But if all nodes are masters,
> pacemaker can't repair it unless I send the command 'pcs resource cleanup'.
> Whereas I set 60s in the resource config. How can I fix it?
>
> StackOverFlow link:
> https://stackoverflow.com/questions/66292304/pacemaker-postgresql-master-master-state
>
> Thanks
>
> İsmet BALAT
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker 1.1 support period

2021-02-08 Thread Andrei Zheregelia
Hello Ken,

Thank you for reply.

Best Regards,
Andrei

Andrei Zheregelia | Senior Software Engineer | DSR Corporation
E-mail: andrei.zherege...@dsr-corporation.com

On 2/4/21 5:19 PM, Ken Gaillot wrote:
> It ended with 1.1.24 in December.
>
> We will still accept pull requests that backport compatible fixes to
> the 1.1 branch, in case someone wants to make work easily available,
> but there will be no more releases, and we no longer test it.
>
> On Wed, 2021-02-03 at 18:47 +, Andrei Zheregelia wrote:
>> Hello,
>>
>> I would like to clarify support period for Pacemaker 1.1.
>> Until which year it is planned to backport bugfixes into 1.1 branch
>> and
>> create releases?
>>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker 1.1 support period

2021-02-04 Thread Ken Gaillot
It ended with 1.1.24 in December.

We will still accept pull requests that backport compatible fixes to
the 1.1 branch, in case someone wants to make work easily available,
but there will be no more releases, and we no longer test it.

On Wed, 2021-02-03 at 18:47 +, Andrei Zheregelia wrote:
> Hello,
> 
> I would like to clarify support period for Pacemaker 1.1.
> Until which year it is planned to backport bugfixes into 1.1 branch
> and
> create releases?
> 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker 2.0.5 version pcs status resources command is not working

2021-02-04 Thread Tomas Jelinek

Hi,

pcs-0.9 does not support pacemaker >= 2.0.0. You can go with pcs-0.9 + 
corosync < 3 + pacemaker 1.x OR pcs-0.10 + corosync 3.x + pacemaker 2.x. 
Combination of corosync 2 + pacemaker 2 is not supported in any pcs 
version, even though it may work to some degree.
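
A quick way to check which combination a node is actually running:

  pcs --version
  corosync -v
  pacemakerd --version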


Regards,
Tomas


On 03. 02. 21 at 22:28, Reid Wahl wrote:
With that in mind, I'd suggest upgrading to a newer pcs version if 
possible. If not, then you may have to do something more hack-y, like 
`pcs status | grep '(.*:.*):'`.


On Wed, Feb 3, 2021 at 1:26 PM Reid Wahl wrote:

Looks like pcs-0.9 isn't fully compatible with pacemaker >= 2.0.3.
   -
https://github.com/ClusterLabs/pcs/commit/0cf06b79f6dcabb780ee1fa7fee0565d73789329

The resource_status() function in older pcs versions doesn't match
the lines in the crm_mon output of newer pacemaker versions.

On Wed, Feb 3, 2021 at 9:10 AM S Sathish S <s.s.sath...@ericsson.com> wrote:

Hi Team,

In the latest pacemaker version 2.0.5 we are not getting "pcs status
resource" command output, but in the older version we used to get the
output.

Kindly let us know of an existing command to get the full pcs resource
list.

*Latest Pacemaker version* :

pacemaker-2.0.5 -->
https://github.com/ClusterLabs/pacemaker/tree/Pacemaker-2.0.5

corosync-2.4.4 -->
https://github.com/corosync/corosync/tree/v2.4.4

pcs-0.9.169

[root@node2 ~]# pcs status resources

[root@node2 ~]#

*Older Pacemaker version* :

pacemaker-2.0.2 -->
https://github.com/ClusterLabs/pacemaker/tree/Pacemaker-2.0.2

corosync-2.4.4 -->
https://github.com/corosync/corosync/tree/v2.4.4

pcs-0.9.169

[root@node1 ~]# pcs status resources

TOMCAT_node1 (ocf::provider:TOMCAT_RA):  Started node1

HEALTHMONITOR_node1  (ocf::provider:HealthMonitor_RA):   Started node1

SNMP_node1   (ocf::pacemaker:ClusterMon):    Started node1

[root@node1 ~]#

Thanks and Regards,

S Sathish S

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users


ClusterLabs home: https://www.clusterlabs.org/




-- 
Regards,


Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA



--
Regards,

Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker 2.0.5 version pcs status resources command is not working

2021-02-03 Thread Reid Wahl
With that in mind, I'd suggest upgrading to a newer pcs version if
possible. If not, then you may have to do something more hack-y, like `pcs
status | grep '(.*:.*):'`.
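
Alternatively, pacemaker's own tools can list the configured resources
without going through pcs, for example:

  crm_resource --list
  crm_mon -1

(output formatting differs between pacemaker versions).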

On Wed, Feb 3, 2021 at 1:26 PM Reid Wahl  wrote:

> Looks like pcs-0.9 isn't fully compatible with pacemaker >= 2.0.3.
>   -
> https://github.com/ClusterLabs/pcs/commit/0cf06b79f6dcabb780ee1fa7fee0565d73789329
>
> The resource_status() function in older pcs versions doesn't match the
> lines in the crm_mon output of newer pacemaker versions.
>
> On Wed, Feb 3, 2021 at 9:10 AM S Sathish S 
> wrote:
>
>> Hi Team,
>>
>>
>>
>> In latest pacemaker version 2.0.5 we are not getting "pcs status
>> resource" command output but in older version we used to get the output.
>>
>>
>>
>> Kindly let us know of an existing command to get the full pcs resource list.
>>
>>
>>
>> *Latest Pacemaker version* :
>>
>> pacemaker-2.0.5 -->
>> https://github.com/ClusterLabs/pacemaker/tree/Pacemaker-2.0.5
>>
>> corosync-2.4.4 -->  https://github.com/corosync/corosync/tree/v2.4.4
>>
>> pcs-0.9.169
>>
>>
>>
>> [root@node2 ~]# pcs status resources
>>
>> [root@node2 ~]#
>>
>>
>>
>> *Older Pacemaker version* :
>>
>>
>>
>> pacemaker-2.0.2 -->
>> https://github.com/ClusterLabs/pacemaker/tree/Pacemaker-2.0.2
>>
>> corosync-2.4.4 -->  https://github.com/corosync/corosync/tree/v2.4.4
>>
>> pcs-0.9.169
>>
>>
>>
>> [root@node1 ~]# pcs status resources
>>
>> TOMCAT_node1 (ocf::provider:TOMCAT_RA):  Started node1
>>
>> HEALTHMONITOR_node1  (ocf::provider:HealthMonitor_RA):   Started node1
>>
>> SNMP_node1   (ocf::pacemaker:ClusterMon):Started node1
>>
>> [root@node1 ~]#
>>
>>
>>
>> Thanks and Regards,
>>
>> S Sathish S
>> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>
>
>
> --
> Regards,
>
> Reid Wahl, RHCA
> Senior Software Maintenance Engineer, Red Hat
> CEE - Platform Support Delivery - ClusterHA
>


-- 
Regards,

Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker 2.0.5 version pcs status resources command is not working

2021-02-03 Thread Reid Wahl
Looks like pcs-0.9 isn't fully compatible with pacemaker >= 2.0.3.
  -
https://github.com/ClusterLabs/pcs/commit/0cf06b79f6dcabb780ee1fa7fee0565d73789329

The resource_status() function in older pcs versions doesn't match the
lines in the crm_mon output of newer pacemaker versions.

On Wed, Feb 3, 2021 at 9:10 AM S Sathish S  wrote:

> Hi Team,
>
>
>
> In latest pacemaker version 2.0.5 we are not getting "pcs status resource"
> command output but in older version we used to get the output.
>
>
>
> Kindly let us know of an existing command to get the full pcs resource list.
>
>
>
> *Latest Pacemaker version* :
>
> pacemaker-2.0.5 -->
> https://github.com/ClusterLabs/pacemaker/tree/Pacemaker-2.0.5
>
> corosync-2.4.4 -->  https://github.com/corosync/corosync/tree/v2.4.4
>
> pcs-0.9.169
>
>
>
> [root@node2 ~]# pcs status resources
>
> [root@node2 ~]#
>
>
>
> *Older Pacemaker version* :
>
>
>
> pacemaker-2.0.2 -->
> https://github.com/ClusterLabs/pacemaker/tree/Pacemaker-2.0.2
>
> corosync-2.4.4 -->  https://github.com/corosync/corosync/tree/v2.4.4
>
> pcs-0.9.169
>
>
>
> [root@node1 ~]# pcs status resources
>
> TOMCAT_node1 (ocf::provider:TOMCAT_RA):  Started node1
>
> HEALTHMONITOR_node1  (ocf::provider:HealthMonitor_RA):   Started node1
>
> SNMP_node1   (ocf::pacemaker:ClusterMon):Started node1
>
> [root@node1 ~]#
>
>
>
> Thanks and Regards,
>
> S Sathish S
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>


-- 
Regards,

Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker-2.0.5 build/install error

2021-01-29 Thread Ken Gaillot
Hi Sathish,

It looks like it's trying to build pacemaker-remoted even though you
don't have the necessary prerequisites. I'll have to look into that for
the next release.

You could try installing the GnuTLS library first, or try commenting
out sbin_PROGRAMS in daemons/execd/Makefile.am.
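
For example, on a RHEL/CentOS 7 build host the GnuTLS headers come from
the gnutls-devel package (package name assumed), after which the build
can be retried:

  yum install gnutls-devel
  ./configure --prefix /root/pacemaker/ && make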

On Fri, 2021-01-29 at 10:36 +, S Sathish S wrote:
> Hi Team,
>  
> we are trying to build the latest 2.0.5 and are getting the error below;
> can we know the reason for the same?
>  
> https://github.com/ClusterLabs/pacemaker/tree/Pacemaker-2.0.5
>  
> $ ./autogen.sh
> $ ./configure --prefix /root/pacemaker/
>  
> The above two commands succeeded without error.
>  
> $ make --> While simple install getting below error.
>  
> Making all in controld
> make[3]: Entering directory `/root/sathish/pacemaker-Pacemaker-
> 2.0.5/daemons/controld'
> make[3]: Nothing to be done for `all'.
> make[3]: Leaving directory `/root/sathish/pacemaker-Pacemaker-
> 2.0.5/daemons/controld'
> Making all in execd
> make[3]: Entering directory `/root/sathish/pacemaker-Pacemaker-
> 2.0.5/daemons/execd'
> gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I../../include  -DSUPPORT_REMOTE
> -I../../include -I../../include -I../../libltdl -I../../libltdl
> -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include  
> -I/usr/include/libxml2   -I/root/pacemaker1/include/heartbeat -fPIE
> -g -O2  -ggdb  -fgnu89-inline -Wall -Waggregate-return -Wbad-
> function-cast -Wcast-align -Wdeclaration-after-statement -Wendif-
> labels -Wfloat-equal -Wformat-security -Wmissing-prototypes
> -Wmissing-declarations -Wnested-externs -Wno-long-long -Wno-strict-
> aliasing -Wpointer-arith -Wwrite-strings -Wunused-but-set-variable
> -Wformat=2 -Wformat-nonliteral -fstack-protector-strong -Werror -MT
> pacemaker_remoted-execd_commands.o -MD -MP -MF
> .deps/pacemaker_remoted-execd_commands.Tpo -c -o pacemaker_remoted-
> execd_commands.o `test -f 'execd_commands.c' || echo
> './'`execd_commands.c
> execd_commands.c: In function ‘process_lrmd_signon’:
> execd_commands.c:1530:55: error: ‘struct pcmk__remote_s’ has no
> member named ‘tls_handshake_complete’
>  if ((client->remote != NULL) && client->remote-
> >tls_handshake_complete) {
>^
> make[3]: *** [pacemaker_remoted-execd_commands.o] Error 1
> make[3]: Leaving directory `/root/sathish/pacemaker-Pacemaker-
> 2.0.5/daemons/execd'
> make[2]: *** [all-recursive] Error 1
> make[2]: Leaving directory `/root/sathish/pacemaker-Pacemaker-
> 2.0.5/daemons'
> make[1]: *** [core] Error 1
> make[1]: Leaving directory `/root/sathish/pacemaker-Pacemaker-2.0.5'
> make: *** [build] Error 2
>  
> Thanks and Regards,
> S Sathish S
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker alerts node_selector

2020-11-25 Thread Reid Wahl
I created https://github.com/ClusterLabs/pacemaker/pull/2241 to
correct the schema mistake.

On Wed, Nov 25, 2020 at 10:51 PM  wrote:
>
> Hi, thank you for your reply.
>
> I tried it this way:
>
> <alerts>
>   <alert id="test_alert_1" path="/usr/share/pacemaker/alerts/test_alert.sh">
>     <select>
>       <select_nodes>hana_node_1</select_nodes>
>     </select>
>     <instance_attributes>
>       <nvpair ... id="test_alert_1-instance_attributes-HANASID"/>
>       <nvpair ... id="test_alert_1-instance_attributes-AVZone"/>
>     </instance_attributes>
>     <recipient ... value="/usr/share/pacemaker/alerts/test_alert.sh"/>
>   </alert>
> </alerts>
>
>
> During the save, the <select> section is reset to empty elements.

The schema shows that <select_nodes> has to be empty.

[schema excerpt from xml/alerts-2.10.rng elided by the list archive]

> Do I need to specify, in addition to select_nodes, the section <select_attributes>?

The <select_attributes> element configures the agent to receive alerts
when a node attribute changes.

For a bit more detail on how these <select> values work, see:
  - 
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html#_alert_filters

So it doesn't seem like this would be the way to configure alerts for
a particular node, which is what you've said you want to do.

I'm not very familiar with alerts off the top of my head, so I would
have to research this further unless someone else jumps in to answer
first. However, based on a cursory reading of the doc, it looks like
the  attributes do not provide a way to filter by a
particular node. The  element does allow you to
filter by node **attribute**. But the  element simply
filters "node events" in general, rather than filtering by node.
(Anyone correct me if I'm wrong.)
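
One workaround is to filter inside the alert agent itself, using the
environment variables pacemaker passes to alert scripts. A sketch, with
a hypothetical node name:

  #!/bin/sh
  # only react to node events concerning one particular node
  if [ "$CRM_alert_kind" = "node" ] &&
     [ "$CRM_alert_node" != "hana_node_1" ]; then
      exit 0
  fi
  # ... actual alert handling goes here ...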

>
> Thank you, Alfred
>
>
> -Ursprüngliche Nachricht-----
> Von: Users  Im Auftrag von Reid Wahl
> Gesendet: Donnerstag, 26. November 2020 05:30
> An: Cluster Labs - All topics related to open-source clustering welcomed 
> 
> Betreff: Re: [ClusterLabs] pacemaker alerts node_selector
>
> What version of Pacemaker are you using, and how does it behave?
>
> Depending on the error/misbehavior you're experiencing, this might have been 
> me. Looks like in commit bd451763[1], I copied the alerts-2.9.rng[2] schema 
> instead of the alerts-2.10.rng[3] schema.
>
> [1] https://github.com/ClusterLabs/pacemaker/commit/bd451763
> [2] https://github.com/ClusterLabs/pacemaker/blob/master/xml/alerts-2.9.rng
> [3] https://github.com/ClusterLabs/pacemaker/blob/master/xml/alerts-2.10.rng
>
> On Wed, Nov 25, 2020 at 9:31 AM  wrote:
> >
> > Hi, I would like to trigger an external script, if something happens on a 
> > specific node.
> >
> >
> >
> In the documentation of alerts, I can see <select_nodes>, but whatever I 
> put into the XML, it's not working…..
> >
> >
> >
> <configuration>
>   <alerts>
>     <alert id="..." path="...">
>       <select>
>         <select_nodes/>
>       </select>
>       <recipient id="..." value="someu...@example.com"/>
>     </alert>
>   </alerts>
> </configuration>
> >
> Can anybody send me an example of the right syntax?
> >
> >
> >
> > Thank you very much……
> >
> >
> >
> > Best regards, Alfred
> >
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
>
>
>
> --
> Regards,
>
> Reid Wahl, RHCA
> Senior Software Maintenance Engineer, Red Hat CEE - Platform Support Delivery 
> - ClusterHA
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/



-- 
Regards,

Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker alerts node_selector

2020-11-25 Thread vockinger
Hi, thank you for your reply.

I tried it this way:

<alerts>
  <alert id="test_alert_1" path="/usr/share/pacemaker/alerts/test_alert.sh">
    <select>
      <select_nodes>hana_node_1</select_nodes>
    </select>
    <instance_attributes>
      <nvpair ... id="test_alert_1-instance_attributes-HANASID"/>
      <nvpair ... id="test_alert_1-instance_attributes-AVZone"/>
    </instance_attributes>
    <recipient ... value="/usr/share/pacemaker/alerts/test_alert.sh"/>
  </alert>
</alerts>


During the save, the <select> section is reset to empty elements.


Do I need to specify, in addition to select_nodes, the section <select_attributes>?

Thank you, Alfred


-Ursprüngliche Nachricht-
Von: Users  Im Auftrag von Reid Wahl
Gesendet: Donnerstag, 26. November 2020 05:30
An: Cluster Labs - All topics related to open-source clustering welcomed 

Betreff: Re: [ClusterLabs] pacemaker alerts node_selector

What version of Pacemaker are you using, and how does it behave?

Depending on the error/misbehavior you're experiencing, this might have been 
me. Looks like in commit bd451763[1], I copied the alerts-2.9.rng[2] schema 
instead of the alerts-2.10.rng[3] schema.

[1] https://github.com/ClusterLabs/pacemaker/commit/bd451763
[2] https://github.com/ClusterLabs/pacemaker/blob/master/xml/alerts-2.9.rng
[3] https://github.com/ClusterLabs/pacemaker/blob/master/xml/alerts-2.10.rng

On Wed, Nov 25, 2020 at 9:31 AM  wrote:
>
> Hi, I would like to trigger an external script, if something happens on a 
> specific node.
>
>
>
> In the documentation of alerts, I can see <select_nodes> but whatever I put
> into the XML, it's not working…
>
>
>
> <configuration>
>   <alerts>
>     <alert id="..." path="...">
>       <select>
>         <select_nodes/>
>       </select>
>       <recipient id="..." value="someu...@example.com"/>
>     </alert>
>   </alerts>
> </configuration>
>
> Can anybody send me an example about the right syntax ?
>
>
>
> Thank you very much……
>
>
>
> Best regards, Alfred
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/



--
Regards,

Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat CEE - Platform Support Delivery - 
ClusterHA

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker alerts node_selector

2020-11-25 Thread Reid Wahl
What version of Pacemaker are you using, and how does it behave?

Depending on the error/misbehavior you're experiencing, this might
have been me. Looks like in commit bd451763[1], I copied the
alerts-2.9.rng[2] schema instead of the alerts-2.10.rng[3] schema.

[1] https://github.com/ClusterLabs/pacemaker/commit/bd451763
[2] https://github.com/ClusterLabs/pacemaker/blob/master/xml/alerts-2.9.rng
[3] https://github.com/ClusterLabs/pacemaker/blob/master/xml/alerts-2.10.rng

On Wed, Nov 25, 2020 at 9:31 AM  wrote:
>
> Hi, I would like to trigger an external script, if something happens on a 
> specific node.
>
>
>
> In the documentation of alerts, I can see <select_nodes> but whatever I put
> into the XML, it's not working…
>
>
>
> <configuration>
>   <alerts>
>     <alert id="..." path="...">
>       <select>
>         <select_nodes/>
>       </select>
>       <recipient id="..." value="someu...@example.com"/>
>     </alert>
>   </alerts>
> </configuration>
>
> Can anybody send me an example about the right syntax ?
>
>
>
> Thank you very much……
>
>
>
> Best regards, Alfred
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/



-- 
Regards,

Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker and cluster hostname reconfiguration

2020-10-05 Thread Igor Tverdovskiy
Riccardo,

> Only working way was to kill/reload corosync/pacemaker couple (I havent
tried with reloading only pacemaker) on both nodes.

I did a migration a long time ago, but haven't experienced any issues,
probably because it was part of a regular server update (which includes a
reboot).
Anyway, even if I had to stop all nodes and tolerate system downtime, I would
do this; the benefit is huge. (BTW, you may try maintenance-mode, which
allows you to stop corosync/pacemaker without downtime.)
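
For reference, a minimal sketch of that maintenance-mode approach with crmsh
(the same property appears in the procedure quoted below):

crm configure property maintenance-mode=true
# restart corosync/pacemaker on each node; resources keep running unmanaged
crm configure property maintenance-mode=false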

On Mon, Oct 5, 2020 at 9:49 AM Riccardo Manfrin <
riccardo.manf...@athonet.com> wrote:

> Thanks Igor,
> I'm afraid I tried this path too, although I did not mention it in the issue.
>
> The procedure I followed was to add the "name" attribute to the node
> list in corosync.conf with the current hostname, then reboot both nodes
> to start clean, then do the previously described procedure AND sed s///g
> corosync.conf to update the name attribute value to the
> new hostname.
>
> When I rebooted one of the two nodes, the stubborn pacemaker brought back
> the old hostnames no matter what. The only working way was
> to kill/reload the corosync/pacemaker pair (I haven't tried reloading
> only pacemaker) on both nodes.
>
> Thanks for the suggestion though
> R
>
> On 02/10/20 20:45, Igor Tverdovskiy wrote:
> > Hi Riccardo,
> >
> > As I see you have already handled the issue, but I would recommend using
> > static node names
> > in the corosync.conf instead of reference to hostname. I did so years
> > ago and now I have no issues
> > with hostname changes.
> >
> > e.g.:
> > node {
> >  ring0_addr: 1.1.1.1
> >  name: my.node
> >  nodeid: 123456
> >  }
> >
> > On Thu, Oct 1, 2020 at 10:10 PM Riccardo Manfrin
> > mailto:riccardo.manf...@athonet.com>>
> wrote:
> >
> > Thank you for your suggestion Ken; I'm indeed on Centos7, but using
> >
> >  hostnamectl set-hostname newHostname
> >
> > in place of
> >
> >  hostname -f /etc/hostname
> >
> > didn't have any beneficial effect. As soon as I powered off one of
> the
> > two nodes, the other one took the old hostnames back and drifted out
> of
> > sync.
> >
> > The only way of doing this in the end was
> >
> > 1. rebooting the machine (close in time so that the first new
> corosync
> > instance coming up never ever sees the old instance from the other
> node,
> > or it gets the old hostnames again)
> >
> > 2. killing pacemakerd and corosync (and letting systemd bring them
> on up
> > again).
> >
> > This second method appears to be the cleanest and more robust, and
> has
> > the advantage that while primitives/services are unsupervised, they
> are
> > not reloaded.
> >
> > I hope this can be of help to someone although I tend to think that
> my
> > case was really a rare beast not to be seen around.
> >
> > R
> >
> > On 01/10/20 16:41, Ken Gaillot wrote:
> >  > Does "uname -n" also revert?
> >  >
> >  > It looks like you're using RHEL 7 or a derivative -- if so, use
> >  > hostnamectl to change the host name. That will make sure it's
> updated
> >  > in the right places.
> > 
> >
> > Riccardo Manfrin
> > R&D DEPARTMENT
> > Web |
> > LinkedIn t +39 (0)444
> > 750045
> > e riccardo.manf...@athonet.com
> >  riccardo.manf...@athonet.com
> > >
> > [https://www.athonet.com/signature/logo_athonet.png]<
> https://www.athonet.com/>
> > ATHONET | Via Cà del Luogo, 6/8 - 36050 Bolzano Vicentino (VI) Italy
> > This email and any attachments are confidential and intended solely
> > for the use of the intended recipient. If you are not the named
> > addressee, please be aware that you shall not distribute, copy, use
> > or disclose this email. If you have received this email by error,
> > please notify us immediately and delete this email from your system.
> > Email transmission cannot be guaranteed to be secured or error-free
> > or not to contain viruses. Athonet S.r.l. processes any personal
> > data exchanged in email correspondence in accordance with EU Reg.
> > 679/2016 (GDPR) - you may find here the privacy policy with
> > information on such processing and your rights. Any views or
> > opinions presented in this email are solely those of the sender and
> > do not necessarily represent those of Athonet S.r.l.
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
> >
> >
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] pacemaker and cluster hostname reconfiguration

2020-10-04 Thread Riccardo Manfrin

Thanks Igor,
I'm afraid I tried this path too, although I did not mention it in the issue.

The procedure I followed was to add the "name" attribute to the node
list in corosync.conf with the current hostname, then reboot both nodes
to start clean, then do the previously described procedure AND sed s///g
corosync.conf to update the name attribute value to the
new hostname.

When I rebooted one of the two nodes, the stubborn pacemaker brought back
the old hostnames no matter what. The only working way was
to kill/reload the corosync/pacemaker pair (I haven't tried reloading
only pacemaker) on both nodes.

Thanks for the suggestion though
R

On 02/10/20 20:45, Igor Tverdovskiy wrote:

Hi Riccardo,

As I see you have already handled the issue, but I would recommend using
static node names
in the corosync.conf instead of reference to hostname. I did so years
ago and now I have no issues
with hostname changes.

e.g.:
node {
 ring0_addr: 1.1.1.1
 name: my.node
 nodeid: 123456
 }

On Thu, Oct 1, 2020 at 10:10 PM Riccardo Manfrin
mailto:riccardo.manf...@athonet.com>> wrote:

Thank you for your suggestion Ken; I'm indeed on Centos7, but using

 hostnamectl set-hostname newHostname

in place of

 hostname -F /etc/hostname

didn't have any beneficial effect. As soon as I powered off one of the
two nodes, the other one took the old hostnames back and drifted out of
sync.

The only way of doing this in the end was

1. rebooting both machines (close together in time, so that the first new
corosync instance coming up never sees the old instance from the other node;
otherwise it gets the old hostnames again)

2. killing pacemakerd and corosync (and letting systemd bring them up
again).

This second method appears to be the cleanest and more robust, and has
the advantage that while primitives/services are unsupervised, they are
not reloaded.

I hope this can be of help to someone although I tend to think that my
case was really a rare beast not to be seen around.

R

On 01/10/20 16:41, Ken Gaillot wrote:
 > Does "uname -n" also revert?
 >
 > It looks like you're using RHEL 7 or a derivative -- if so, use
 > hostnamectl to change the host name. That will make sure it's updated
 > in the right places.


Riccardo Manfrin
R&D DEPARTMENT
Web |
LinkedIn t +39 (0)444
750045
e riccardo.manf...@athonet.com
>

[https://www.athonet.com/signature/logo_athonet.png]
ATHONET | Via Cà del Luogo, 6/8 - 36050 Bolzano Vicentino (VI) Italy
This email and any attachments are confidential and intended solely
for the use of the intended recipient. If you are not the named
addressee, please be aware that you shall not distribute, copy, use
or disclose this email. If you have received this email by error,
please notify us immediately and delete this email from your system.
Email transmission cannot be guaranteed to be secured or error-free
or not to contain viruses. Athonet S.r.l. processes any personal
data exchanged in email correspondence in accordance with EU Reg.
679/2016 (GDPR) - you may find here the privacy policy with
information on such processing and your rights. Any views or
opinions presented in this email are solely those of the sender and
do not necessarily represent those of Athonet S.r.l.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/




Riccardo Manfrin
R&D DEPARTMENT
Web | 
LinkedIn t +39 (0)444 750045
e riccardo.manf...@athonet.com
[https://www.athonet.com/signature/logo_athonet.png]
ATHONET | Via Cà del Luogo, 6/8 - 36050 Bolzano Vicentino (VI) Italy
This email and any attachments are confidential and intended solely for the use 
of the intended recipient. If you are not the named addressee, please be aware 
that you shall not distribute, copy, use or disclose this email. If you have 
received this email by error, please notify us immediately and delete this 
email from your system. Email transmission cannot be guaranteed to be secured 
or error-free or not to contain viruses. Athonet S.r.l. processes any personal 
data exchanged in email correspondence in accordance with EU Reg. 679/2016
(GDPR) - you may find here the privacy policy with information on such
processing and your rights. Any views or opinions presented in this email are
solely those of the sender and do not necessarily represent those of Athonet
S.r.l.

Re: [ClusterLabs] pacemaker and cluster hostname reconfiguration

2020-10-02 Thread Igor Tverdovskiy
Hi Riccardo,

As I see you have already handled the issue, but I would recommend using
static node names
in the corosync.conf instead of reference to hostname. I did so years ago
and now I have no issues
with hostname changes.

e.g.:
node {
ring0_addr: 1.1.1.1
name: my.node
nodeid: 123456
}
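
As an aside, once the nodelist carries explicit name entries, a later edit of
corosync.conf can usually be propagated without a full restart. A hedged
sketch, assuming corosync 2.4 or newer:

corosync-cfgtool -R   # ask corosync to reload corosync.conf on all nodes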

On Thu, Oct 1, 2020 at 10:10 PM Riccardo Manfrin <
riccardo.manf...@athonet.com> wrote:

> Thank you for your suggestion Ken; I'm indeed on Centos7, but using
>
> hostnamectl set-hostname newHostname
>
> in place of
>
> hostname -f /etc/hostname
>
> didn't have any beneficial effect. As soon as I powered off one of the
> two nodes, the other one took the old hostnames back and drifted out of
> sync.
>
> The only way of doing this in the end was
>
> 1. rebooting the machine (close in time so that the first new corosync
> instance coming up never ever sees the old instance from the other node,
> or it gets the old hostnames again)
>
> 2. killing pacemakerd and corosync (and letting systemd bring them on up
> again).
>
> This second method appears to be the cleanest and more robust, and has
> the advantage that while primitives/services are unsupervised, they are
> not reloaded.
>
> I hope this can be of help to someone although I tend to think that my
> case was really a rare beast not to be seen around.
>
> R
>
> On 01/10/20 16:41, Ken Gaillot wrote:
> > Does "uname -n" also revert?
> >
> > It looks like you're using RHEL 7 or a derivative -- if so, use
> > hostnamectl to change the host name. That will make sure it's updated
> > in the right places.
> 
>
> Riccardo Manfrin
> R&D DEPARTMENT
> Web | LinkedIn<
> https://www.linkedin.com/company/athonet/> t +39 (0)444 750045
> e riccardo.manf...@athonet.com
> [https://www.athonet.com/signature/logo_athonet.png]<
> https://www.athonet.com/>
> ATHONET | Via Cà del Luogo, 6/8 - 36050 Bolzano Vicentino (VI) Italy
> This email and any attachments are confidential and intended solely for
> the use of the intended recipient. If you are not the named addressee,
> please be aware that you shall not distribute, copy, use or disclose this
> email. If you have received this email by error, please notify us
> immediately and delete this email from your system. Email transmission
> cannot be guaranteed to be secured or error-free or not to contain viruses.
> Athonet S.r.l. processes any personal data exchanged in email
> correspondence in accordance with EU Reg. 679/2016 (GDPR) - you may find
> here the privacy policy with information on such processing and your
> rights. Any views or opinions presented in this email are solely those of
> the sender and do not necessarily represent those of Athonet S.r.l.
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker and cluster hostname reconfiguration

2020-10-01 Thread Riccardo Manfrin

Thank you for your suggestion Ken; I'm indeed on Centos7, but using

   hostnamectl set-hostname newHostname

in place of

   hostname -F /etc/hostname

didn't have any beneficial effect. As soon as I powered off one of the
two nodes, the other one took the old hostnames back and drifted out of
sync.

The only way of doing this in the end was

1. rebooting both machines (close together in time, so that the first new
corosync instance coming up never sees the old instance from the other node;
otherwise it gets the old hostnames again)

2. killing pacemakerd and corosync (and letting systemd bring them up
again).

This second method appears to be the cleanest and more robust, and has
the advantage that while primitives/services are unsupervised, they are
not reloaded.

I hope this can be of help to someone although I tend to think that my
case was really a rare beast not to be seen around.

R

On 01/10/20 16:41, Ken Gaillot wrote:

Does "uname -n" also revert?

It looks like you're using RHEL 7 or a derivative -- if so, use
hostnamectl to change the host name. That will make sure it's updated
in the right places.



Riccardo Manfrin
R&D DEPARTMENT
Web | 
LinkedIn t +39 (0)444 750045
e riccardo.manf...@athonet.com
[https://www.athonet.com/signature/logo_athonet.png]
ATHONET | Via Cà del Luogo, 6/8 - 36050 Bolzano Vicentino (VI) Italy
This email and any attachments are confidential and intended solely for the use 
of the intended recipient. If you are not the named addressee, please be aware 
that you shall not distribute, copy, use or disclose this email. If you have 
received this email by error, please notify us immediately and delete this 
email from your system. Email transmission cannot be guaranteed to be secured 
or error-free or not to contain viruses. Athonet S.r.l. processes any personal 
data exchanged in email correspondence in accordance with EU Reg. 679/2016 
(GDPR) - you may find here the privacy policy with information on such 
processing and your rights. Any views or opinions presented in this email are 
solely those of the sender and do not necessarily represent those of Athonet 
S.r.l.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker and cluster hostname reconfiguration

2020-10-01 Thread Ken Gaillot
On Thu, 2020-10-01 at 10:40 +0200, Riccardo Manfrin wrote:
> Ciao,
> 
> I'm among the people that have to deal with the infamous two-node
> problem (http://www.beekhof.net/blog/2018/two-node-problems).
> I am not sure whether to open a bug for this, so I'm first reporting
> on the list in the hope of getting fast feedback.
> Problem statement
> 
> I have a cluster made by two nodes with a DRBD shared partition which
> some resources (systemd services) have to stick to.
> Software versions
> corosync -v
> Corosync Cluster Engine, version '2.4.5'
> Copyright (c) 2006-2009 Red Hat, Inc.
> pacemakerd --version
> Pacemaker 1.1.21-4.el7
> drbdadm --version
> DRBDADM_BUILDTAG=GIT-hash:\ fb98589a8e76783d2c56155c645dbaf02ac7ece7\
> build\ by\ mockbuild@\,\ 2020-04-05\ 03:21:05
> DRBDADM_API_VERSION=2
> DRBD_KERNEL_VERSION_CODE=0x090010
> DRBD_KERNEL_VERSION=9.0.16
> DRBDADM_VERSION_CODE=0x090c02
> DRBDADM_VERSION=9.12.2
> corosync.conf nodes:
> nodelist {
> node {
> ring0_addr: 10.1.3.1
> nodeid: 1
> }
> node {
> ring0_addr: 10.1.3.2
> nodeid: 2
> }
> }
> quorum {
> provider: corosync_votequorum
> two_node: 1
> }
> drbd nodes config:
> resource myresource {
> 
>   volume 0 {
> device/dev/drbd0;
> disk  /dev/mapper/vg0-res--etc;
> meta-disk internal;
>   }
> 
>   on 123z555666y0 {
> node-id 0;
> address 10.1.3.1:7789;
>   }
> 
>   on 123z555666y1 {
> node-id 1;
> address 10.1.3.2:7789;
>   }
> 
>   connection {
> host 123z555666y0;
> host 123z555666y1;
>   }
> 
>   handlers {
> before-resync-target "/usr/lib/drbd/snapshot-resync-target-
> lvm.sh";
> after-resync-target "/usr/lib/drbd/unsnapshot-resync-target-
> lvm.sh";
>   }
> 
> }
> I need to reconfigure the hostname of both the nodes of the cluster.
> I've gathered some literature around
> https://pacemaker.oss.clusterlabs.narkive.com/csHZkR5R/change-hostname
> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-node-name.html
> https://www.suse.com/support/kb/doc/?id=18878 <- DIDN'T  WORK
> https://bugs.clusterlabs.org/show_bug.cgi?id=5265 <- DIDN'T  WORK
> but have not yet found a way to address this (unless with
> simultaneous reboot of both nodes).
> The procedure:
> Update the hostname on both Master and Slave nodes
> update /etc/hostname
> update /etc/hosts
> update system with hostname -F /etc/hostname
> Reconfigure drbd on Master and Slave nodes
> modify drbd.01.conf (attached) to reflect new hostname
> invoke drbdadm adjust all
> Update pacemaker config on Master node only
> crm configure property maintenance-mode=true
> crm configure delete --force 1
> crm configure delete --force 2
> crm configure xml '
> <node id="1" uname="hostname20">
>   ...
> </node>
>   '
> crm configure xml '
> <node id="2" uname="hostname21">
>   ...
> </node>
>   '
> crm resource reprobe
> crm configure refresh
> crm configure property maintenance-mode=false
> Let's say for example that I migrate the hostnames like this
> hostname10 -> hostname20
> hostname11 -> hostname21
> After the above procedure is concluded the cluster is correctly
> reconfigured and when I check with crm_mon or crm status or crm
> configure show xml or even by inspecting the cib.xml I find the
> proper new hostnames fetched by pacemaker/corosync (hostname20 and
> hostname21).
> The documentation reports that pacemaker node name is taken from
> corosync.conf nodelist->ring0_addr if not an ip address: NOT MY CASE
> => skip
> corosync.conf nodelist->name if available: NOT MY CASE => skip
> uname -n [SHOULD BE IN HERE]
> Apparently case number 3 does not apply:
> [root@hostname20 ~]# crm_node -n
> hostname10
> [root@hostname20 ~]# uname -n
> hostname20
> This becomes evident as soon as I reboot/poweroff one of  the two
> nodes: crm_mon which after the reconfiguration was correctly showing
> Online: [ hostname21 hostname20 ]
> "rolls back" the configuration without any notice and starts showing
> the old one
> Online: [ hostname10 ]
> OFFLINE: [ hostname11 ]
> Do you have any idea of where on earth pacemaker is recovering the
> old hostnames from?

Does "uname -n" also revert?

It looks like you're using RHEL 7 or a derivative -- if so, use
hostnamectl to change the host name. That will make sure it's updated
in the right places.

> 
> I've even checked  the code and see that there are cmaps involved so
> I suspect there's some caching issues involved in this.
> It looks like it is retaining the old hostnames in memory and when
> something .. "fails" it restores them.
> Besides don't blame me for this use case (reconfigure hostnames in a
> two-nodes cluster), as I didn't make it up. I just carry the pain.
> R
> 
> 
> 
>  Riccardo Manfrin
> R&D DEPARTMENT
> Web | LinkedInt +39 (0)444 750045
> e riccardo.manf...@athonet.com
> 
> ATHONET | Via Cà del Luogo, 6/8 - 36050 Bolzano Vicentino (VI) Italy 
> This email and any attachments are confidential and intended solely
> for the use of 

Re: [ClusterLabs] pacemaker and cluster hostname reconfiguration

2020-10-01 Thread Riccardo Manfrin

Ciao,

[TXT version]

I'm among the people that have to deal with the infamous two-node
problem (http://www.beekhof.net/blog/2018/two-node-problems).

I am not sure whether to open a bug for this, so I'm first reporting on
the list in the hope of getting fast feedback.

Problem statement

I have a cluster made of two nodes, with a DRBD shared partition to which
some resources (systemd services) have to stick.

Software versions

corosync -v
Corosync Cluster Engine, version '2.4.5'
Copyright (c) 2006-2009 Red Hat, Inc.

pacemakerd --version
Pacemaker 1.1.21-4.el7

drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\
fb98589a8e76783d2c56155c645dbaf02ac7ece7\ build\ by\ mockbuild@\,\
2020-04-05\ 03:21:05
DRBDADM_API_VERSION=2
DRBD_KERNEL_VERSION_CODE=0x090010
DRBD_KERNEL_VERSION=9.0.16
DRBDADM_VERSION_CODE=0x090c02
DRBDADM_VERSION=9.12.2

corosync.conf nodes:

nodelist {
node {
ring0_addr: 10.1.3.1
nodeid: 1
}
node {
ring0_addr: 10.1.3.2
nodeid: 2
}
}
quorum {
provider: corosync_votequorum
two_node: 1
}

drbd nodes config:

resource myresource {

  volume 0 {
device/dev/drbd0;
disk  /dev/mapper/vg0-res--etc;
meta-disk internal;
  }

  on 123z555666y0 {
node-id 0;
address 10.1.3.1:7789;
  }

  on 123z555666y1 {
node-id 1;
address 10.1.3.2:7789;
  }

  connection {
host 123z555666y0;
host 123z555666y1;
  }

  handlers {
before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh";
after-resync-target "/usr/lib/drbd/unsnapshot-resync-target-lvm.sh";
  }

}

I need to reconfigure the hostnames of both nodes of the cluster.
I've gathered some literature:

https://pacemaker.oss.clusterlabs.narkive.com/csHZkR5R/change-hostname

https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-node-name.html
https://www.suse.com/support/kb/doc/?id=18878 <- DIDN'T  WORK
https://bugs.clusterlabs.org/show_bug.cgi?id=5265 <- DIDN'T  WORK

but have not yet found a way to address this (other than a simultaneous
reboot of both nodes).

The procedure:

Update the hostname on both Master and Slave nodes
update /etc/hostname
update /etc/hosts
update system with hostname -F /etc/hostname
Reconfigure drbd on Master and Slave nodes
modify drbd.01.conf (attached) to reflect new hostname
invoke drbdadm adjust all
Update pacemaker config on Master node only
crm configure property maintenance-mode=true
crm configure delete --force 1
crm configure delete --force 2
crm configure xml '
<node id="1" uname="hostname20">
  ...
</node>
  '
crm configure xml '
<node id="2" uname="hostname21">
  ...
</node>
  '
crm resource reprobe
crm configure refresh
crm configure property maintenance-mode=false

Let's say for example that I migrate the hostnames like this

hostname10 -> hostname20
hostname11 -> hostname21

After the above procedure is concluded, the cluster is correctly
reconfigured, and when I check with crm_mon, crm status, or crm
configure show xml, or even by inspecting the cib.xml, I find the proper
new hostnames fetched by pacemaker/corosync (hostname20 and hostname21).

The documentation reports that pacemaker node name is taken from

corosync.conf nodelist->ring0_addr if not an ip address: NOT MY
CASE => skip
corosync.conf nodelist->name if available: NOT MY CASE => skip
uname -n [SHOULD BE IN HERE]

Apparently case number 3 does not apply:

[root@hostname20 ~]# crm_node -n
hostname10
[root@hostname20 ~]# uname -n
hostname20

This becomes evident as soon as I reboot/poweroff one of  the two nodes:
crm_mon which after the reconfiguration was correctly showing

Online: [ hostname21 hostname20 ]

"rolls back" the configuration without any notice and starts showing the
old one

Online: [ hostname10 ]
OFFLINE: [ hostname11 ]

Do you have any idea of where on earth pacemaker is recovering the old
hostnames from?

I've even checked the code and seen that there are cmaps involved, so I
suspect there are some caching issues involved in this.

It looks like it is retaining the old hostnames in memory, and when
something "fails" it restores them.

Besides, don't blame me for this use case (reconfiguring hostnames in a
two-node cluster), as I didn't make it up; I just carry the pain.

R


Riccardo Manfrin
R&D DEPARTMENT
Web | 
LinkedIn t +39 (0)444 750045
e riccardo.manf...@athonet.com
[https://www.athonet.com/signature/logo_athonet.png]
ATHONET | Via Cà del Luogo, 6/8 - 36050 Bolzano Vicentino (VI) Italy
This email and any attachments are confidential and intended solely for the use 
of the intended recipient.

Re: [ClusterLabs] Pacemaker not starting

2020-09-29 Thread Ambadas Kawle
Hello

Please find below attached ss for cluster configuration file.

On Fri, 25 Sep 2020, 8:46 pm Klaus Wenninger,  wrote:

> On 9/24/20 2:53 PM, Ambadas Kawle wrote:
>
> Hello Team
>
>
> Please help me to solve this problem
>
> You have to provide us with some information about your
> cluster setup so that we can help.
> That is why we had asked you for the content
> of /etc/cluster/cluster.conf and the output
> of 'pcs config'.
>
> Klaus
>
>
> Thanks
>
> On Wed, 23 Sep 2020, 11:41 am Ambadas Kawle,  wrote:
>
>> Hello All
>>
>> We have 2 node with Mysql cluster and we are not able to start pacemaker on 
>> one of the node (slave node)
>> We are getting error "waiting for quorum... timed-out waiting for cluster"
>>
>> Following are package detail
>> pacemaker pacemaker-1.1.15-5.el6.x86_64
>> pacemaker-libs-1.1.15-5.el6.x86_64
>> pacemaker-cluster-libs-1.1.15-5.el6.x86_64
>> pacemaker-cli-1.1.15-5.el6.x86_64
>>
>> Corosync corosync-1.4.7-6.el6.x86_64
>> corosynclib-1.4.7-6.el6.x86_64
>>
>> Mysql mysql-5.1.73-7.el6.x86_64
>> "mysql-connector-python-2.0.4-1.el6.noarch
>>
>> Your help is appreciated
>>
>>  Thanks
>>
>> Ambadas kawle
>>
>>
> ___
> Manage your subscription:https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker not starting

2020-09-29 Thread Klaus Wenninger
On 9/29/20 10:04 AM, Ambadas Kawle wrote:
> Hello 
>
> Please find attached below a screenshot of the cluster configuration file.
As Reid already pointed out, your issue is probably a
"pre pacemaker start" thing.
Regarding pacemaker clusters running with cman, there
are probably others who can give you better advice off
the top of their heads. Honestly, I don't remember
ever having set up a 2-node cluster on that config, but
your cman section is empty instead of showing the
'two_node="1" expected_votes="1"' Reid had mentioned.
Could you please in the future provide text files
instead of graphical screenshots.
Your test of crm_mon shows - somewhat expectedly - that
pacemaker is really not running. That is why it is
not able to connect.
As with most of the tools, you can make crm_mon read
the cib from a file instead of connecting to pacemaker
to get it via IPC.
Not that I would expect too much insight from it, but ...:

  CIB_file=/var/lib/pacemaker/cib/cib.xml crm_mon

Klaus
>
> On Fri, 25 Sep 2020, 8:46 pm Klaus Wenninger,  > wrote:
>
> On 9/24/20 2:53 PM, Ambadas Kawle wrote:
>> Hello Team 
>>
>>
>> Please help me to solve this problem
> You have to provide us with some information about your
> cluster setup so that we can help.
> That is why we had asked you for the content
> of /etc/cluster/cluster.conf and the output
> of 'pcs config'.
>
> Klaus
>>
>> Thanks 
>>
>> On Wed, 23 Sep 2020, 11:41 am Ambadas Kawle, > > wrote:
>>
>> Hello All
>>
>> We have 2 node with Mysql cluster and we are not able to start 
>> pacemaker on one of the node (slave node)
>> We are getting error "waiting for quorum... timed-out waiting for 
>> cluster"
>>
>> Following are package detail
>> pacemaker pacemaker-1.1.15-5.el6.x86_64
>> pacemaker-libs-1.1.15-5.el6.x86_64
>> pacemaker-cluster-libs-1.1.15-5.el6.x86_64
>> pacemaker-cli-1.1.15-5.el6.x86_64
>> 
>> Corosync corosync-1.4.7-6.el6.x86_64
>> corosynclib-1.4.7-6.el6.x86_64
>> 
>> Mysql mysql-5.1.73-7.el6.x86_64
>> "mysql-connector-python-2.0.4-1.el6.noarch
>>
>> Your help is appreciated
>>
>> Thanks 
>>
>> Ambadas kawle
>>
>>
>> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker not starting

2020-09-25 Thread Klaus Wenninger
On 9/24/20 2:53 PM, Ambadas Kawle wrote:
> Hello Team 
>
>
> Please help me to solve this problem
You have to provide us with some information about your
cluster setup so that we can help.
That is why we had asked you for the content
of /etc/cluster/cluster.conf and the output
of 'pcs config'.

Klaus
>
> Thanks 
>
> On Wed, 23 Sep 2020, 11:41 am Ambadas Kawle,  > wrote:
>
> Hello All
>
> We have 2 node with Mysql cluster and we are not able to start pacemaker 
> on one of the node (slave node)
> We are getting error "waiting for quorum... timed-out waiting for cluster"
>
> Following are package detail
> pacemaker pacemaker-1.1.15-5.el6.x86_64
> pacemaker-libs-1.1.15-5.el6.x86_64
> pacemaker-cluster-libs-1.1.15-5.el6.x86_64
> pacemaker-cli-1.1.15-5.el6.x86_64
> 
> Corosync corosync-1.4.7-6.el6.x86_64
> corosynclib-1.4.7-6.el6.x86_64
> 
> Mysql mysql-5.1.73-7.el6.x86_64
> "mysql-connector-python-2.0.4-1.el6.noarch
>
> Your help is appreciated
>
> Thanks 
>
> Ambadas kawle
>
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker not starting

2020-09-25 Thread Ambadas Kawle
Hello Team


Please help me to solve this problem

Thanks

On Wed, 23 Sep 2020, 11:41 am Ambadas Kawle,  wrote:

> Hello All
>
> We have 2 node with Mysql cluster and we are not able to start pacemaker on 
> one of the node (slave node)
> We are getting error "waiting for quorum... timed-out waiting for cluster"
>
> Following are package detail
> pacemaker pacemaker-1.1.15-5.el6.x86_64
> pacemaker-libs-1.1.15-5.el6.x86_64
> pacemaker-cluster-libs-1.1.15-5.el6.x86_64
> pacemaker-cli-1.1.15-5.el6.x86_64
>
> Corosync corosync-1.4.7-6.el6.x86_64
> corosynclib-1.4.7-6.el6.x86_64
>
> Mysql mysql-5.1.73-7.el6.x86_64
> "mysql-connector-python-2.0.4-1.el6.noarch
>
> Your help is appreciated
>
>
> Thanks
>
> Ambadas kawle
>
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker not starting

2020-09-23 Thread Strahil Nikolov
What is the output of 'corosync-quorumtool -s' on both nodes?
What is your cluster's configuration:

'crm configure show' or 'pcs config'


Best Regards,
Strahil Nikolov






On Wednesday, 23 September 2020 at 16:07:16 GMT+3, Ambadas Kawle wrote:





Hello All

We have a 2-node MySQL cluster and we are not able to start pacemaker on one
of the nodes (the slave node).
We are getting the error "waiting for quorum... timed-out waiting for cluster"

Following are package detail
pacemaker pacemaker-1.1.15-5.el6.x86_64
pacemaker-libs-1.1.15-5.el6.x86_64
pacemaker-cluster-libs-1.1.15-5.el6.x86_64
pacemaker-cli-1.1.15-5.el6.x86_64

Corosync corosync-1.4.7-6.el6.x86_64
corosynclib-1.4.7-6.el6.x86_64

Mysql mysql-5.1.73-7.el6.x86_64
"mysql-connector-python-2.0.4-1.el6.noarch

Your help is appreciated. Thanks, Ambadas Kawle
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker not starting

2020-09-23 Thread Reid Wahl
Please also share /etc/cluster/cluster.conf. Do you have `two_node="1"
expected_votes="1"` in the <cman> element of cluster.conf?

This is technically a cman startup issue. Pacemaker is waiting for
cman to start and form quorum through corosync first.
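
For a two-node cman cluster, the relevant cluster.conf fragment looks roughly
like this (cluster and node names are placeholders, and fencing is omitted
for brevity):

<cluster config_version="1" name="mycluster">
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1"/>
    <clusternode name="node2" nodeid="2"/>
  </clusternodes>
</cluster>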

On Wed, Sep 23, 2020 at 9:55 AM Strahil Nikolov  wrote:
>
> What is the output of 'corosync-quorumtool -s' on both nodes ?
> What is your cluster's configuration :
>
> 'crm configure show' or 'pcs config'
>
>
> Best Regards,
> Strahil Nikolov
>
>
>
>
>
>
> On Wednesday, 23 September 2020 at 16:07:16 GMT+3, Ambadas Kawle wrote:
>
>
>
>
>
> Hello All
>
> We have 2 node with Mysql cluster and we are not able to start pacemaker on 
> one of the node (slave node)
> We are getting error "waiting for quorum... timed-out waiting for cluster"
>
> Following are package detail
> pacemaker pacemaker-1.1.15-5.el6.x86_64
> pacemaker-libs-1.1.15-5.el6.x86_64
> pacemaker-cluster-libs-1.1.15-5.el6.x86_64
> pacemaker-cli-1.1.15-5.el6.x86_64
>
> Corosync corosync-1.4.7-6.el6.x86_64
> corosynclib-1.4.7-6.el6.x86_64
>
> Mysql mysql-5.1.73-7.el6.x86_64
> "mysql-connector-python-2.0.4-1.el6.noarch
>
> Your help is appreciated. Thanks, Ambadas Kawle
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/



-- 
Regards,

Reid Wahl, RHCA
Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker/corosync with PostgreSQL 12

2020-09-04 Thread Jehan-Guillaume de Rorthais
On Fri, 4 Sep 2020 10:55:31 +0200
Oyvind Albrigtsen  wrote:

> Add the "recovery.conf" parameters to postgresql.conf (except the
> standby one) and touch standby.signal (which does the same thing).

+1

> After you've verified that it's working and stop PostgreSQL you simply
> rm standby.signal and the "recovery.conf" specific parameters,

Why remove standby.signal and the recovery parameters?
The RA should deal with the first, and the second are ignored on a primary
instance.

> and the resource agent will properly add/remove them when appropriate.

It depends on the agent the OP picked. I suppose this is true with the one
provided by the resource-agents project.

If you are using the PAF resource agent, it only deals with standby.signal.
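
For example, a PAF resource is typically created along these lines (a rough
sketch after the quick start linked below; bindir, pgdata and the timeouts
are illustrative and must match the actual installation):

pcs resource create pgsqld ocf:heartbeat:pgsqlms \
    bindir=/usr/bin pgdata=/var/lib/pgsql/data \
    op start timeout=60s op stop timeout=60s \
    op promote timeout=30s op demote timeout=120s \
    op monitor interval=15s timeout=10s role="Master" \
    op monitor interval=16s timeout=10s role="Slave" \
    op notify timeout=60s \
    promotable notify=true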

> On 04/09/20 08:47 +, Ларионов Андрей Валентинович wrote:
> [...] or give link to existing documentation

This one is available, adapt to your environment:
https://clusterlabs.github.io/PAF/Quick_Start-CentOS-8.html

Another one should appear soon using Debian 10/PgSQL 12.

Regards,
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker/corosync with PostgreSQL 12

2020-09-04 Thread Oyvind Albrigtsen

Add the "recovery.conf" parameters to postgresql.conf (except the
standby one) and touch standby.signal (which does the same thing).

After you've verified that it's working, stop PostgreSQL and simply
rm standby.signal and the "recovery.conf"-specific parameters, and the
resource agent will properly add/remove them when appropriate.
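
A minimal sketch of that sequence for a streaming standby (the PGDATA path
and the conninfo are placeholders):

PGDATA=/var/lib/pgsql/12/data

# former recovery.conf settings now live in postgresql.conf (PostgreSQL 12+)
cat >> "$PGDATA/postgresql.conf" <<'EOF'
primary_conninfo = 'host=primary-node user=replicator'
EOF

# an empty standby.signal file replaces the old "standby_mode = on"
touch "$PGDATA/standby.signal"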

On 04/09/20 08:47 +, Ларионов Андрей Валентинович wrote:

Hello,

Please, can you provide an example or explanation, or give a link to existing
documentation, on how to use
pacemaker/corosync for an HA cluster for PostgreSQL 12.x?
The problem is that in PostgreSQL 12 the file "recovery.conf" is deprecated
and no longer used.

--
WBR,
Andrey Larionov




___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker-remote: Connection to cluster failed: Transport endpoint is not connected

2020-08-13 Thread luckydog xf
Probably I didn't configure the pacemaker resource properly; now it's OK.
---
crm configure show remote-db8-ca-3a-69-60-f4
node remote-db8-ca-3a-69-60-f4:remote \
attributes OpenStack-role=compute standby=off
primitive remote-db8-ca-3a-69-60-f4 ocf:pacemaker:remote \
params reconnect_interval=60s server=db8-ca-3a-69-60-f4.ipa.pthl.hk \
op monitor interval=30s \
op start interval=0 timeout=60s \
op stop interval=0 timeout=60s \
meta target-role=Started
---
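
As a side note for anyone hitting similar connection errors: a common culprit
when a fresh pacemaker_remote node refuses to talk to the cluster is a
missing or mismatched shared key. A sketch, assuming the default key
location:

# the key must be identical on all cluster nodes and on the remote node
test -f /etc/pacemaker/authkey || \
    dd if=/dev/urandom of=/etc/pacemaker/authkey bs=4096 count=1
scp -p /etc/pacemaker/authkey db8-ca-3a-69-60-f4:/etc/pacemaker/authkey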

On Thu, Aug 13, 2020 at 4:33 PM luckydog xf  wrote:

> Hi, guys,
>
>   I'm running SLES12 SP3 and pacemaker-remote-1.1.16-4.8.x86_64; a few
> months ago compute nodes for OpenStack were set up and have been running well. But today when I
> set up a new compute node, it said:
> -
> Aug 13 16:31:04 [43122] db8-ca-3a-69-60-f4 pacemaker_remoted:   notice:
> lrmd_init_remote_tls_server:Starting a tls listener on port 3121.
> Aug 13 16:31:05 [43122] db8-ca-3a-69-60-f4 pacemaker_remoted:   notice:
> bind_and_listen:Listening on address ::
> Aug 13 16:31:05 [43122] db8-ca-3a-69-60-f4 pacemaker_remoted: info:
> qb_ipcs_us_publish: server name: cib_ro
> Aug 13 16:31:05 [43122] db8-ca-3a-69-60-f4 pacemaker_remoted: info:
> qb_ipcs_us_publish: server name: cib_rw
> Aug 13 16:31:05 [43122] db8-ca-3a-69-60-f4 pacemaker_remoted: info:
> qb_ipcs_us_publish: server name: cib_shm
> Aug 13 16:31:05 [43122] db8-ca-3a-69-60-f4 pacemaker_remoted: info:
> qb_ipcs_us_publish: server name: attrd
> Aug 13 16:31:05 [43122] db8-ca-3a-69-60-f4 pacemaker_remoted: info:
> qb_ipcs_us_publish: server name: stonith-ng
> Aug 13 16:31:05 [43122] db8-ca-3a-69-60-f4 pacemaker_remoted: info:
> qb_ipcs_us_publish: server name: crmd
> Aug 13 16:31:05 [43122] db8-ca-3a-69-60-f4 pacemaker_remoted: info:
> main:   Starting
> ---
> and after I run `crm_mon -1r`, two lines are appended of pacemaker.log
>
> Aug 13 16:31:38 [43122] db8-ca-3a-69-60-f4 pacemaker_remoted:error:
> ipc_proxy_accept:   No ipc providers available for uid 0 gid 0
> Aug 13 16:31:38 [43122] db8-ca-3a-69-60-f4 pacemaker_remoted:error:
> handle_new_connection:  Error in connection setup (43122-43151-15): Remote
> I/O error (121)
>
> And the output of `crm_mon -1r` is
>
> Connection to cluster failed: Transport endpoint is not connected
>
> My environment is almost the same as on the other servers. So what's up?
> Thanks,
>
>
>
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker crashed and produce a coredump file

2020-07-30 Thread Strahil Nikolov
Early systemd bugs caused dbus issues and session files not being cleaned
up properly. At least EL 7.4 or older were affected.

What is your OS and version?

P.S.: I know your pain. I am still fighting to explain that without planned 
downtime, the end users will definitely get unplanned downtime.

Best Regards,
Strahil Nikolov

On 29 July 2020 at 12:46:16 GMT+03:00, lkxjtu wrote:
>Hi Reid Wahl,
>
>
>There is more log information below. The reason seems to be that
>communication with DBus timed out. Any suggestions?
>
>
>1672712 Jul 24 21:20:17 [3945305] B0610011   lrmd: info:
>pcmk_dbus_timeout_dispatch:Timeout 0x147bbd0 expired
>1672713 Jul 24 21:20:17 [3945305] B0610011   lrmd: info:
>pcmk_dbus_find_error:  LoadUnit error
>'org.freedesktop.DBus.Error.NoReply': Did not receive a reply.
>Possible causes include: the remote application did not send a reply,
>the message bus security policy blocked the reply, the reply timeout
>expired, or the network connection was broken.
>1672714 Jul 24 21:20:17 [3945305] B0610011   lrmd:error:
>systemd_loadunit_result:   Unexepcted DBus type, expected o in 's'
>instead of s
>1672715 Jul 24 21:20:17 [3945305] B0610011   lrmd:error:
>crm_abort: systemd_unit_exec_with_unit: Triggered fatal assert at
>systemd.c:514 : unit
>1672716 2020-07-24T21:20:17.701484+08:00 B0610011 lrmd[3945305]:   
>error: systemd_loadunit_result: Unexepcted DBus type, expected o in 's'
>instead of s
>1672717 2020-07-24T21:20:17.701517+08:00 B0610011 lrmd[3945305]:   
>error: crm_abort: systemd_unit_exec_with_unit: Triggered fatal assert
>at systemd.c:514 : unit
>1672718 Jul 24 21:20:17 [3945306] B0610011   crmd:error:
>crm_ipc_read:  Connection to lrmd failed
>
>
>
>> Hi,
>>
>> It looks like this is a bug that was fixed in later releases. The
>`path`
>> variable was a null pointer when it was passed to
>> `systemd_unit_exec_with_unit` as the `unit` argument. Commit 62a0d26a
>>
>
>> adds a null check to the `path` variable before using it to call
>`systemd_unit_exec_with_unit`.
>>
>> I believe pacemaker-1.1.15-11.el7 is the first RHEL pacemaker release
>that
>> contains the fix. Can you upgrade and see if the issue is resolved?
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker crashed and produce a coredump file

2020-07-30 Thread Klaus Wenninger
On 7/29/20 10:39 AM, Reid Wahl wrote:
> Hi,
>
> It looks like this is a bug that was fixed in later releases. The
> `path` variable was a null pointer when it was passed to
> `systemd_unit_exec_with_unit` as the `unit` argument. Commit 62a0d26a
> 
> adds a null check to the `path` variable before using it to call
> `systemd_unit_exec_with_unit`.
>
Hmm ... the call tree shows the dbus API being used inside
dbus_connection_dispatch, which IIRC
isn't allowed.
Could be related to https://github.com/ClusterLabs/pacemaker/pull/1201.

Klaus
> I believe pacemaker-1.1.15-11.el7 is the first RHEL pacemaker release
> that contains the fix. Can you upgrade and see if the issue is resolved?
>
> On Tue, Jul 28, 2020 at 11:49 PM lkxjtu  > wrote:
>
> RPM Version Information:
> corosync-2.3.4-7.el7_2.1.x86_64
> pacemaker-1.1.12-22.el7.x86_64
>
>
> Coredump file backtrace:
>
> ```
> warning: .dynamic section for "/lib64/libk5crypto.so.3" is not at
> the expected address (wrong library or version mismatch?)
> Missing separate debuginfo for
> Try: yum --enablerepo='*debug*' install
> /usr/lib/debug/.build-id/91/375124d864f2692ced1c4a5f090826b7074dc0
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `/usr/libexec/pacemaker/lrmd'.
> Program terminated with signal 6, Aborted.
> #0  0x7f6ee9ed85f7 in raise () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install
> audit-libs-2.4.1-5.el7.x86_64 bzip2-libs-1.0.6-13.el7.x86_64
> corosynclib-2.3.4-7.el7_2.1.x86_64 dbus-libs-1.6.12-13.el7.x86_64
> glib2-2.42.2-5.el7.x86_64 glibc-2.17-106.el7_2.4.x86_64
> gmp-6.0.0-12.el7_1.x86_64 gnutls-3.3.8-14.el7_2.x86_64
> keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64
> libcom_err-1.42.9-7.el7.x86_64 libffi-3.0.13-16.el7.x86_64
> libqb-0.17.1-2.el7.1.x86_64 libselinux-2.5-12.el7.x86_64
> libtasn1-3.8-2.el7.x86_64 libtool-ltdl-2.4.2-21.el7_2.x86_64
> libuuid-2.23.2-26.el7_2.2.x86_64 libxml2-2.9.1-6.el7_2.2.x86_64
> libxslt-1.1.28-5.el7.x86_64 nettle-2.7.1-4.el7.x86_64
> openssl-libs-1.0.1e-51.el7_2.4.x86_64 p11-kit-0.20.7-3.el7.x86_64
> pam-1.1.8-12.el7_1.1.x86_64 pcre-8.32-15.el7.x86_64
> trousers-0.3.13-1.el7.x86_64 xz-libs-5.1.2-12alpha.el7.x86_64
> zlib-1.2.7-15.el7.x86_64
> (gdb) bt
> #0  0x7f6ee9ed85f7 in raise () from /lib64/libc.so.6
> #1  0x7f6ee9ed9ce8 in abort () from /lib64/libc.so.6
> #2  0x7f6eeb7e1f67 in crm_abort
> (file=file@entry=0x7f6eeb5c863e "systemd.c",
> function=function@entry=0x7f6eeb5c8ec0 <__FUNCTION__.33949>
> "systemd_unit_exec_with_unit", line=line@entry=514,
>     assert_condition=assert_condition@entry=0x7f6eeb5c8713 "unit",
> do_core=do_core@entry=1, do_fork=, do_fork@entry=0)
> at utils.c:1197
> #3  0x7f6eeb5c5cef in systemd_unit_exec_with_unit
> (op=op@entry=0x13adce0, unit=0x0) at systemd.c:514
> #4  0x7f6eeb5c5e81 in systemd_loadunit_result
> (reply=reply@entry=0x139f2a0, op=op@entry=0x13adce0) at systemd.c:175
> #5  0x7f6eeb5c6181 in systemd_loadunit_cb (pending=0x13aa380,
> user_data=0x13adce0) at systemd.c:197
> #6  0x7f6eeb16f862 in complete_pending_call_and_unlock () from
> /lib64/libdbus-1.so.3
> #7  0x7f6eeb172b51 in dbus_connection_dispatch () from
> /lib64/libdbus-1.so.3
> #8  0x7f6eeb5c1e40 in pcmk_dbus_connection_dispatch
> (connection=0x13a4cb0, new_status=DBUS_DISPATCH_DATA_REMAINS,
> data=0x0) at dbus.c:388
> #9  0x7f6eeb171260 in
> _dbus_connection_update_dispatch_status_and_unlock () from
> /lib64/libdbus-1.so.3
> #10 0x7f6eeb172a93 in reply_handler_timeout () from
> /lib64/libdbus-1.so.3
> #11 0x7f6eeb5c1daf in pcmk_dbus_timeout_dispatch
> (data=0x13aa660) at dbus.c:491
> #12 0x7f6ee97a21c3 in g_timeout_dispatch () from
> /lib64/libglib-2.0.so.0
> #13 0x7f6ee97a17aa in g_main_context_dispatch () from
> /lib64/libglib-2.0.so.0
> #14 0x7f6ee97a1af8 in g_main_context_iterate.isra.24 () from
> /lib64/libglib-2.0.so.0
> #15 0x7f6ee97a1dca in g_main_loop_run () from
> /lib64/libglib-2.0.so.0
> #16 0x00402824 in main (argc=,
> argv=0x7ffce752b258) at main.c:344
> (gdb) up
> #1  0x7f6ee9ed9ce8 in abort () from /lib64/libc.so.6
> (gdb) up
> #2  0x7f6eeb7e1f67 in crm_abort
> (file=file@entry=0x7f6eeb5c863e "systemd.c",
> function=function@entry=0x7f6eeb5c8ec0 <__FUNCTION__.33949>
> "systemd_unit_exec_with_unit", line=line@entry=514,
>     assert_condition=assert_condition@entry=0x7f6eeb5c8713 "unit",
> do_core=do_core@entry=1, do_fork=, do_fork@

Re: [ClusterLabs] Pacemaker crashed and produce a coredump file

2020-07-29 Thread Reid Wahl
Hi,

It looks like this is a bug that was fixed in later releases. The `path`
variable was a null pointer when it was passed to
`systemd_unit_exec_with_unit` as the `unit` argument. Commit 62a0d26a
adds a null check to the `path` variable before using it to call
`systemd_unit_exec_with_unit`.
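
For reference, a minimal sketch of the shape of that fix (based on the
description above, not the verbatim upstream diff):

/* Sketch only: guard the dispatch so a NULL object path returned by
 * LoadUnit no longer reaches CRM_ASSERT(unit) in
 * systemd_unit_exec_with_unit().
 */
if (path != NULL) {
    systemd_unit_exec_with_unit(op, path);
} else {
    /* illustrative: the actual commit also fails the operation cleanly
     * rather than silently dropping it */
}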

I believe pacemaker-1.1.15-11.el7 is the first RHEL pacemaker release that
contains the fix. Can you upgrade and see if the issue is resolved?

On Tue, Jul 28, 2020 at 11:49 PM lkxjtu  wrote:

> RPM Version Information:
> corosync-2.3.4-7.el7_2.1.x86_64
> pacemaker-1.1.12-22.el7.x86_64
>
>
> Coredump file backtrace:
>
> ```
> warning: .dynamic section for "/lib64/libk5crypto.so.3" is not at the
> expected address (wrong library or version mismatch?)
> Missing separate debuginfo for
> Try: yum --enablerepo='*debug*' install
> /usr/lib/debug/.build-id/91/375124d864f2692ced1c4a5f090826b7074dc0
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `/usr/libexec/pacemaker/lrmd'.
> Program terminated with signal 6, Aborted.
> #0  0x7f6ee9ed85f7 in raise () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install
> audit-libs-2.4.1-5.el7.x86_64 bzip2-libs-1.0.6-13.el7.x86_64
> corosynclib-2.3.4-7.el7_2.1.x86_64 dbus-libs-1.6.12-13.el7.x86_64
> glib2-2.42.2-5.el7.x86_64 glibc-2.17-106.el7_2.4.x86_64
> gmp-6.0.0-12.el7_1.x86_64 gnutls-3.3.8-14.el7_2.x86_64
> keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64
> libcom_err-1.42.9-7.el7.x86_64 libffi-3.0.13-16.el7.x86_64
> libqb-0.17.1-2.el7.1.x86_64 libselinux-2.5-12.el7.x86_64
> libtasn1-3.8-2.el7.x86_64 libtool-ltdl-2.4.2-21.el7_2.x86_64
> libuuid-2.23.2-26.el7_2.2.x86_64 libxml2-2.9.1-6.el7_2.2.x86_64
> libxslt-1.1.28-5.el7.x86_64 nettle-2.7.1-4.el7.x86_64
> openssl-libs-1.0.1e-51.el7_2.4.x86_64 p11-kit-0.20.7-3.el7.x86_64
> pam-1.1.8-12.el7_1.1.x86_64 pcre-8.32-15.el7.x86_64
> trousers-0.3.13-1.el7.x86_64 xz-libs-5.1.2-12alpha.el7.x86_64
> zlib-1.2.7-15.el7.x86_64
> (gdb) bt
> #0  0x7f6ee9ed85f7 in raise () from /lib64/libc.so.6
> #1  0x7f6ee9ed9ce8 in abort () from /lib64/libc.so.6
> #2  0x7f6eeb7e1f67 in crm_abort (file=file@entry=0x7f6eeb5c863e
> "systemd.c", function=function@entry=0x7f6eeb5c8ec0 <__FUNCTION__.33949>
> "systemd_unit_exec_with_unit", line=line@entry=514,
> assert_condition=assert_condition@entry=0x7f6eeb5c8713 "unit",
> do_core=do_core@entry=1, do_fork=, do_fork@entry=0) at
> utils.c:1197
> #3  0x7f6eeb5c5cef in systemd_unit_exec_with_unit (op=op@entry=0x13adce0,
> unit=0x0) at systemd.c:514
> #4  0x7f6eeb5c5e81 in systemd_loadunit_result 
> (reply=reply@entry=0x139f2a0,
> op=op@entry=0x13adce0) at systemd.c:175
> #5  0x7f6eeb5c6181 in systemd_loadunit_cb (pending=0x13aa380,
> user_data=0x13adce0) at systemd.c:197
> #6  0x7f6eeb16f862 in complete_pending_call_and_unlock () from
> /lib64/libdbus-1.so.3
> #7  0x7f6eeb172b51 in dbus_connection_dispatch () from
> /lib64/libdbus-1.so.3
> #8  0x7f6eeb5c1e40 in pcmk_dbus_connection_dispatch
> (connection=0x13a4cb0, new_status=DBUS_DISPATCH_DATA_REMAINS, data=0x0) at
> dbus.c:388
> #9  0x7f6eeb171260 in
> _dbus_connection_update_dispatch_status_and_unlock () from
> /lib64/libdbus-1.so.3
> #10 0x7f6eeb172a93 in reply_handler_timeout () from
> /lib64/libdbus-1.so.3
> #11 0x7f6eeb5c1daf in pcmk_dbus_timeout_dispatch (data=0x13aa660) at
> dbus.c:491
> #12 0x7f6ee97a21c3 in g_timeout_dispatch () from
> /lib64/libglib-2.0.so.0
> #13 0x7f6ee97a17aa in g_main_context_dispatch () from
> /lib64/libglib-2.0.so.0
> #14 0x7f6ee97a1af8 in g_main_context_iterate.isra.24 () from
> /lib64/libglib-2.0.so.0
> #15 0x7f6ee97a1dca in g_main_loop_run () from /lib64/libglib-2.0.so.0
> #16 0x00402824 in main (argc=, argv=0x7ffce752b258)
> at main.c:344
> (gdb) up
> #1  0x7f6ee9ed9ce8 in abort () from /lib64/libc.so.6
> (gdb) up
> #2  0x7f6eeb7e1f67 in crm_abort (file=file@entry=0x7f6eeb5c863e
> "systemd.c", function=function@entry=0x7f6eeb5c8ec0 <__FUNCTION__.33949>
> "systemd_unit_exec_with_unit", line=line@entry=514,
> assert_condition=assert_condition@entry=0x7f6eeb5c8713 "unit",
> do_core=do_core@entry=1, do_fork=, do_fork@entry=0) at
> utils.c:1197
> 1197abort();
> (gdb) up
> #3  0x7f6eeb5c5cef in systemd_unit_exec_with_unit (op=op@entry=0x13adce0,
> unit=0x0) at systemd.c:514
> 514 CRM_ASSERT(unit);
> (gdb) up
> #4  0x7f6eeb5c5e81 in systemd_loadunit_result 
> (reply=reply@entry=0x139f2a0,
> op=op@entry=0x13adce0) at systemd.c:175
> 175 systemd_unit_exec_with_unit(op, path);
> (gdb) up
> #5  0x7f6eeb5c6181 in systemd_loadunit_cb (pending=0x13aa380,
> user_data=0x13adce0) at systemd.c:197
> 197 systemd_loadunit_result(reply, user_data);
> (gdb) up
> 

Re: [ClusterLabs] pacemaker startup problem

2020-07-27 Thread Gabriele Bulfon
Solved this: actually I don't need the heartbeat component and service running.
I just use corosync and pacemaker, and this seems to work.
Now going on with the crm configuration.
 
Thanks!
Gabriele
 
 
Sonicle S.r.l.: http://www.sonicle.com
Music: http://www.gabrielebulfon.com
Quantum Mechanics: http://www.cdbaby.com/cd/gabrielebulfon
From: Reid Wahl
To: Cluster Labs - All topics related to open-source clustering welcomed
Date: 26 July 2020, 12:25:20 CEST
Subject: Re: [ClusterLabs] pacemaker startup problem
Hmm. If it's reading PCMK_ipc_type and matching the server type to 
QB_IPC_SOCKET, then the only other place I see it could be coming from is 
qb_ipc_auth_creds.
 
qb_ipcs_run -> qb_ipcs_us_publish -> qb_ipcs_us_connection_acceptor
-> qb_ipcs_uc_recv_and_auth -> process_auth -> qb_ipc_auth_creds ->
 
static int32_t
qb_ipc_auth_creds(struct ipc_auth_data *data)
{
...
#ifdef HAVE_GETPEERUCRED
        /*
         * Solaris and some BSD systems
...
#elif defined(HAVE_GETPEEREID)
        /*
        * Usually MacOSX systems
...
#elif defined(SO_PASSCRED)
        /*
        * Usually Linux systems
...
#else /* no credentials */
        data->ugp.pid = 0;
        data->ugp.uid = 0;
        data->ugp.gid = 0;
        res = -ENOTSUP;
#endif /* no credentials */
        return res;
 
I'll leave it to Ken to say whether that's likely and what it implies if so.
On Sun, Jul 26, 2020 at 2:53 AM Gabriele Bulfon
gbul...@sonicle.com
wrote:
Sorry, actually the problem is not gone yet.
Now corosync and pacemaker are running happily, but those IPC errors are coming 
out of heartbeat and crmd as soon as I start it.
The pacemakerd process has PCMK_ipc_type=socket, what's wrong with heartbeat or 
crmd?
 
Here's the env of the process:
 
sonicle@xstorage1:/sonicle/etc/cluster/ha.d# penv 4222
4222: /usr/sbin/pacemakerd
envp[0]: PCMK_respawned=true
envp[1]: PCMK_watchdog=false
envp[2]: HA_LOGFACILITY=none
envp[3]: HA_logfacility=none
envp[4]: PCMK_logfacility=none
envp[5]: HA_logfile=/sonicle/var/log/cluster/corosync.log
envp[6]: PCMK_logfile=/sonicle/var/log/cluster/corosync.log
envp[7]: HA_debug=0
envp[8]: PCMK_debug=0
envp[9]: HA_quorum_type=corosync
envp[10]: PCMK_quorum_type=corosync
envp[11]: HA_cluster_type=corosync
envp[12]: PCMK_cluster_type=corosync
envp[13]: HA_use_logd=off
envp[14]: PCMK_use_logd=off
envp[15]: HA_mcp=true
envp[16]: PCMK_mcp=true
envp[17]: HA_LOGD=no
envp[18]: LC_ALL=C
envp[19]: PCMK_service=pacemakerd
envp[20]: PCMK_ipc_type=socket
envp[21]: SMF_ZONENAME=global
envp[22]: PWD=/
envp[23]: SMF_FMRI=svc:/sonicle/xstream/cluster/pacemaker:default
envp[24]: _=/usr/sbin/pacemakerd
envp[25]: TZ=Europe/Rome
envp[26]: LANG=en_US.UTF-8
envp[27]: SMF_METHOD=start
envp[28]: SHLVL=2
envp[29]: PATH=/usr/sbin:/usr/bin
envp[30]: SMF_RESTARTER=svc:/system/svc/restarter:default
envp[31]: A__z="*SHLVL
 
 
Here are crmd complaints:
 
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.notice] notice: Node 
xstorage1 state is now member
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Could not 
start crmd IPC server: Operation not supported (-48)
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Failed to 
create IPC server: shutting down and inhibiting respawn
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.notice] notice: The 
local CRM is operational
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Input 
I_ERROR received in state S_STARTING from do_started
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.notice] notice: State 
transition S_STARTING -> S_RECOVERY
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.warning] warning: 
Fast-tracking shutdown in response to errors
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.warning] warning: Input 
I_PENDING received in state S_RECOVERY from do_started
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Input 
I_TERMINATE received in state S_RECOVERY from do_recover
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.notice] notice: 
Disconnected from the LRM
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Child 
process pengine exited (pid=4316, rc=100)
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Could not 
recover from internal error
Jul 26 11:39:07 xstorage1 heartbeat: [ID 996084 daemon.warning] [4275]: WARN: 
Managed /usr/libexec/pacemaker/crmd process 4315 exited with return code 201.
 
 
Sonicle S.r.l. 
: 
http://www.sonicle.com
Music: 
http://www.gabrielebulfon.com
Quantum Mechanics : 
http://www.cdbaby.com/cd/gabrielebulfon
--
From: Ken Gaillot kgail...@redhat.com
To: Cluster Labs - All topics related to open-source clustering welcomed users@clusterlabs.org
Date: 25 July 2020, 00:46:52 CEST
Subject: Re: [ClusterLabs] pacemaker startup problem
On Fri, 2020-07-24 at 18:34 +0200, Gabriele Bulfon wrote:
Hello

Re: [ClusterLabs] pacemaker startup problem

2020-07-26 Thread Reid Wahl
Illumos might have getpeerucred, which can also set errno to ENOTSUP.

On Sun, Jul 26, 2020 at 3:25 AM Reid Wahl  wrote:

> [quoted messages elided; both appear in full later in this thread]

Re: [ClusterLabs] pacemaker startup problem

2020-07-26 Thread Reid Wahl
Hmm. If it's reading PCMK_ipc_type and matching the server type to
QB_IPC_SOCKET, then the only other place I see it could be coming from is
qb_ipc_auth_creds.

qb_ipcs_run -> qb_ipcs_us_publish -> qb_ipcs_us_connection_acceptor ->
qb_ipcs_uc_recv_and_auth -> process_auth -> qb_ipc_auth_creds ->

static int32_t
qb_ipc_auth_creds(struct ipc_auth_data *data)
{
...
#ifdef HAVE_GETPEERUCRED
/*
 * Solaris and some BSD systems
...
#elif defined(HAVE_GETPEEREID)
/*
* Usually MacOSX systems
...
#elif defined(SO_PASSCRED)
/*
* Usually Linux systems
...
#else /* no credentials */
data->ugp.pid = 0;
data->ugp.uid = 0;
data->ugp.gid = 0;
res = -ENOTSUP;
#endif /* no credentials */

return res;

I'll leave it to Ken to say whether that's likely and what it implies if so.
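
For what it's worth, illumos does ship getpeerucred(3C), so a standalone probe can show whether the ENOTSUP comes from the kernel or from a libqb built without HAVE_GETPEERUCRED. A minimal sketch, assuming an illumos system with <ucred.h>; it performs the same credential lookup qb_ipc_auth_creds() would make on a connected AF_UNIX socket:

#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <ucred.h>

int
main(void)
{
    int fds[2];
    ucred_t *uc = NULL;

    /* A connected AF_UNIX pair stands in for the libqb IPC socket. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
        perror("socketpair");
        return 1;
    }
    /* The same lookup the HAVE_GETPEERUCRED branch above performs. */
    if (getpeerucred(fds[0], &uc) != 0) {
        perror("getpeerucred");  /* ENOTSUP here would match the -48 */
        return 1;
    }
    printf("peer pid=%ld uid=%ld gid=%ld\n",
           (long)ucred_getpid(uc),
           (long)ucred_geteuid(uc),
           (long)ucred_getegid(uc));
    ucred_free(uc);
    return 0;
}

If the probe prints credentials, the syscall path is fine and the suspect becomes the libqb build configuration.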

On Sun, Jul 26, 2020 at 2:53 AM Gabriele Bulfon  wrote:

> [quoted message elided; it appears in full later in this thread]

Re: [ClusterLabs] pacemaker startup problem

2020-07-26 Thread Gabriele Bulfon
Sorry, actually the problem is not gone yet.
Now corosync and pacemaker are running happily, but those IPC errors are coming 
out of heartbeat and crmd as soon as I start it.
The pacemakerd process has PCMK_ipc_type=socket; what's wrong with heartbeat or 
crmd?
 
Here's the env of the process:
 
sonicle@xstorage1:/sonicle/etc/cluster/ha.d# penv 4222
4222: /usr/sbin/pacemakerd
envp[0]: PCMK_respawned=true
envp[1]: PCMK_watchdog=false
envp[2]: HA_LOGFACILITY=none
envp[3]: HA_logfacility=none
envp[4]: PCMK_logfacility=none
envp[5]: HA_logfile=/sonicle/var/log/cluster/corosync.log
envp[6]: PCMK_logfile=/sonicle/var/log/cluster/corosync.log
envp[7]: HA_debug=0
envp[8]: PCMK_debug=0
envp[9]: HA_quorum_type=corosync
envp[10]: PCMK_quorum_type=corosync
envp[11]: HA_cluster_type=corosync
envp[12]: PCMK_cluster_type=corosync
envp[13]: HA_use_logd=off
envp[14]: PCMK_use_logd=off
envp[15]: HA_mcp=true
envp[16]: PCMK_mcp=true
envp[17]: HA_LOGD=no
envp[18]: LC_ALL=C
envp[19]: PCMK_service=pacemakerd
envp[20]: PCMK_ipc_type=socket
envp[21]: SMF_ZONENAME=global
envp[22]: PWD=/
envp[23]: SMF_FMRI=svc:/sonicle/xstream/cluster/pacemaker:default
envp[24]: _=/usr/sbin/pacemakerd
envp[25]: TZ=Europe/Rome
envp[26]: LANG=en_US.UTF-8
envp[27]: SMF_METHOD=start
envp[28]: SHLVL=2
envp[29]: PATH=/usr/sbin:/usr/bin
envp[30]: SMF_RESTARTER=svc:/system/svc/restarter:default
envp[31]: A__z="*SHLVL
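
As an aside, the stock proc tool pargs(1) on illumos prints the same envp[] listing, so this check does not depend on the custom penv command:

pargs -e 4222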
 
 
Here are crmd complaints:
 
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.notice] notice: Node 
xstorage1 state is now member
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Could not 
start crmd IPC server: Operation not supported (-48)
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Failed to 
create IPC server: shutting down and inhibiting respawn
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.notice] notice: The 
local CRM is operational
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Input 
I_ERROR received in state S_STARTING from do_started
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.notice] notice: State 
transition S_STARTING -> S_RECOVERY
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.warning] warning: 
Fast-tracking shutdown in response to errors
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.warning] warning: Input 
I_PENDING received in state S_RECOVERY from do_started
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Input 
I_TERMINATE received in state S_RECOVERY from do_recover
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.notice] notice: 
Disconnected from the LRM
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Child 
process pengine exited (pid=4316, rc=100)
Jul 26 11:39:07 xstorage1 crmd[4315]: [ID 702911 daemon.error] error: Could not 
recover from internal error
Jul 26 11:39:07 xstorage1 heartbeat: [ID 996084 daemon.warning] [4275]: WARN: 
Managed /usr/libexec/pacemaker/crmd process 4315 exited with return code 201.
 
 
Sonicle S.r.l.: http://www.sonicle.com
Music: http://www.gabrielebulfon.com
Quantum Mechanics: http://www.cdbaby.com/cd/gabrielebulfon
--
From: Ken Gaillot
To: Cluster Labs - All topics related to open-source clustering welcomed
Date: 25 July 2020 0:46:52 CEST
Subject: Re: [ClusterLabs] pacemaker startup problem
On Fri, 2020-07-24 at 18:34 +0200, Gabriele Bulfon wrote:
Hello,
after a long time I'm back to run heartbeat/pacemaker/corosync on our
XStreamOS/illumos distro.
I rebuilt the original components I did in 2016 on our latest release
(probably a bit outdated, but I want to start from where I left).
Looks like pacemaker is having trouble starting up, showing these logs:
Set r/w permissions for uid=401, gid=401 on /var/log/pacemaker.log
Set r/w permissions for uid=401, gid=401 on /var/log/pacemaker.log
Jul 24 18:21:32 [971] crmd: info: crm_log_init: Changed active
directory to /sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [971] crmd: info: main: CRM Git Version: 1.1.15
(e174ec8)
Jul 24 18:21:32 [971] crmd: info: do_log: Input I_STARTUP received in
state S_STARTING from crmd_init
Jul 24 18:21:32 [969] lrmd: info: crm_log_init: Changed active
directory to /sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [968] stonith-ng: info: crm_log_init: Changed active
directory to /sonicle/var/cluster/lib/pacemaker/cores
Jul 24 18:21:32 [968] stonith-ng: info: get_cluster_type: Verifying
cluster type: 'heartbeat'
Jul 24 18:21:32 [968] stonith-ng: info: get_cluster_type: Assuming an
active 'heartbeat' cluster
Jul 24 18:21:32 [968] stonith-ng: notice: crm_cluster_connect:
Connecting to cluster infrastructure: heartbeat
Jul 24 18:21:32 [969] lrmd: error: mainloop_add_ipc_server: Could not
start lrmd IPC server: Operation not supported (-48)
This is repeated for all the subdaemons ... the error is coming from
qb_ipcs

Re: [ClusterLabs] pacemaker startup problem

2020-07-26 Thread Gabriele Bulfon
Sorry, I was using the wrong hostnames for those networks; using the debug log I 
found it was not finding "this node" in the conf file.
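
That is, corosync could not match any nodelist entry to an address on this host, which is why votequorum bailed out with the expected_votes error. A minimal sketch of the fix, assuming the ring runs on the 10.100.100.0 network from the conf below (10.100.100.1 appears in the corosync log; .2 for the peer is an assumption): put addresses, or hostnames that resolve to them, in ring0_addr:

nodelist {
    node {
        # must resolve to a local address on node 1 (10.100.100.1 per the log)
        ring0_addr: 10.100.100.1
        nodeid: 1
    }
    node {
        # peer address; assumed to be .2 on the same ring network
        ring0_addr: 10.100.100.2
        nodeid: 2
    }
}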
 
Gabriele
 
 
Sonicle S.r.l.: http://www.sonicle.com
Music: http://www.gabrielebulfon.com
Quantum Mechanics: http://www.cdbaby.com/cd/gabrielebulfon
From: Gabriele Bulfon
To: Cluster Labs - All topics related to open-source clustering welcomed
Date: 26 July 2020 11:23:53 CEST
Subject: Re: [ClusterLabs] pacemaker startup problem
 
Thanks — I had run it manually, which is why I got those errors; when started 
from the service script, PCMK_ipc_type is correctly set to socket.
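
A minimal sketch of what the service script does that a manual run does not, assuming a shell start method wrapped by the SMF manifest (the path and surrounding script are hypothetical; the envp[] listing later in this thread shows the variable is indeed set this way):

#!/sbin/sh
# Socket-based IPC is required on this platform; without
# PCMK_ipc_type=socket the subdaemons fail with
# "Operation not supported (-48)" as seen above.
PCMK_ipc_type=socket
export PCMK_ipc_type

exec /usr/sbin/pacemakerd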
 
But now I see these:
Jul 26 11:08:16 [4039] pacemakerd: info: crm_log_init: Changed active directory 
to /sonicle/var/cluster/lib/pacemaker/cores
Jul 26 11:08:16 [4039] pacemakerd: info: mcp_read_config: cmap connection setup 
failed: CS_ERR_LIBRARY. Retrying in 1s
Jul 26 11:08:17 [4039] pacemakerd: info: mcp_read_config: cmap connection setup 
failed: CS_ERR_LIBRARY. Retrying in 2s
Jul 26 11:08:19 [4039] pacemakerd: info: mcp_read_config: cmap connection setup 
failed: CS_ERR_LIBRARY. Retrying in 3s
Jul 26 11:08:22 [4039] pacemakerd: info: mcp_read_config: cmap connection setup 
failed: CS_ERR_LIBRARY. Retrying in 4s
Jul 26 11:08:26 [4039] pacemakerd: info: mcp_read_config: cmap connection setup 
failed: CS_ERR_LIBRARY. Retrying in 5s
Jul 26 11:08:31 [4039] pacemakerd: warning: mcp_read_config: Could not connect 
to Cluster Configuration Database API, error 2
Jul 26 11:08:31 [4039] pacemakerd: notice: main: Could not obtain corosync 
config data, exiting
Jul 26 11:08:31 [4039] pacemakerd: info: crm_xml_cleanup: Cleaning up memory 
from libxml2
 
So I think I need to start corosync first (right?) but it dies with this:
 
Jul 26 11:07:06 [4027] xstorage1 corosync notice [MAIN ] Corosync Cluster 
Engine ('2.4.1'): started and ready to provide service.
Jul 26 11:07:06 [4027] xstorage1 corosync info [MAIN ] Corosync built-in 
features: bindnow
Jul 26 11:07:06 [4027] xstorage1 corosync notice [TOTEM ] Initializing 
transport (UDP/IP Multicast).
Jul 26 11:07:06 [4027] xstorage1 corosync notice [TOTEM ] Initializing 
transmit/receive security (NSS) crypto: none hash: none
Jul 26 11:07:06 [4027] xstorage1 corosync notice [TOTEM ] The network interface 
[10.100.100.1] is now up.
Jul 26 11:07:06 [4027] xstorage1 corosync notice [SERV ] Service engine loaded: 
corosync configuration map access [0]
Jul 26 11:07:06 [4027] xstorage1 corosync notice [YKD ] Service engine loaded: 
corosync configuration service [1]
Jul 26 11:07:06 [4027] xstorage1 corosync notice [YKD ] Service engine loaded: 
corosync cluster closed process group service v1.01 [2]
Jul 26 11:07:06 [4027] xstorage1 corosync notice [YKD ] Service engine loaded: 
corosync profile loading service [4]
Jul 26 11:07:06 [4027] xstorage1 corosync notice [QUORUM] Using quorum provider 
corosync_votequorum
Jul 26 11:07:06 [4027] xstorage1 corosync crit [QUORUM] Quorum provider: 
corosync_votequorum failed to initialize.
Jul 26 11:07:06 [4027] xstorage1 corosync error [SERV ] Service engine 
'corosync_quorum' failed to load for reason 'configuration error: nodelist or 
quorum.expected_votes must be configured!'
Jul 26 11:07:06 [4027] xstorage1 corosync error [MAIN ] Corosync Cluster Engine 
exiting with status 20 at 
/data/sources/sonicle/xstream-storage-gate/components/cluster/corosync/corosync-2.4.1/exec/service.c:356.
My corosync conf has nodelist configured! Here it is:
 
service {
    ver: 1
    name: pacemaker
    use_mgmtd: no
    use_logd: no
}
totem {
    version: 2
    crypto_cipher: none
    crypto_hash: none
    interface {
        ringnumber: 0
        bindnetaddr: 10.100.100.0
        mcastaddr: 239.255.1.1
        mcastport: 5405
        ttl: 1
    }
}
nodelist {
    node {
        ring0_addr: xstorage1
        nodeid: 1
    }
    node {
        ring0_addr: xstorage2
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}
logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    logfile: /sonicle/var/log/cluster/corosync.log
    to_syslog: no
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
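
On the ordering question above: yes, corosync must be running before pacemakerd can connect to cmap. Under SMF the clean way to express that is a dependency in the pacemaker manifest rather than manual start ordering; a sketch, with the corosync FMRI assumed by analogy with the pacemaker FMRI shown earlier (the actual name may differ):

<!-- in the pacemaker SMF manifest: wait for corosync before starting -->
<dependency name="corosync"
            grouping="require_all"
            restart_on="restart"
            type="service">
    <service_fmri value="svc:/sonicle/xstream/cluster/corosync:default"/>
</dependency>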
 
 
 
 
Sonicle S.r.l.: http://www.sonicle.com
Music: http://www.gabrielebulfon.com
Quantum Mechanics: http://www.cdbaby.com/cd/gabrielebulfon
--
From: Ken Gaillot
To: Cluster Labs - All topics related to open-source clustering welcomed
Date: 25 July 2020 0:46:52 CEST
Subject: Re: [ClusterLabs] pacemaker startup problem
On Fri, 2020-07-24 at 18:34 +0200, Gabriele Bulfon wrote:
Hello,
after a long time I'm back to run heartbeat/pacemaker/corosync on our
XStreamOS/illumos distro.
I rebuilt the original components I did in 2016 on our latest release
(probably a bit ou
