Re: [ClusterLabs] Restoring network connection breaks cluster services

2019-08-13 Thread Jan Friesse

Momcilo

On Wed, Aug 7, 2019 at 1:00 PM Klaus Wenninger  wrote:


On 8/7/19 12:26 PM, Momcilo Medic wrote:

We have a three-node cluster that is set up to stop resources on lost quorum.
Failure handling (network going down) is done properly, but recovery
doesn't seem to work.

What do you mean by 'network going down'?
Loss of link? Does the IP persist on the interface
in that case?



Yes, we simulate faulty cable by turning switch ports down and up.
In such a case, the IP does not persist on the interface.


What corosync version do you have? Corosync was really bad at handling 
ifdown (removal of the IP) properly until version 3 with knet, which solved 
the problem completely, and 2.4.5, where it is so-so for udpu (udp is still 
affected).


The solution is either to upgrade corosync or to configure the system to keep
the IP intact.
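
For reference, the first option would look roughly like this in corosync.conf
(a sketch for corosync 3.x; the cluster name and address are illustrative, not
taken from this setup):

```
totem {
    version: 2
    # cluster name is illustrative; knet requires corosync 3.x
    cluster_name: example
    transport: knet
}

nodelist {
    node {
        # example address; use each node's real ring address
        ring0_addr: 192.0.2.11
        nodeid: 1
    }
}
```

With knet, a link going down and coming back is handled inside the transport,
so the totem membership code never sees the address disappear.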


Honza




That there is an issue reconnecting the CPG API
sounds strange to me. Already the fact that
something has to be reconnected. I understood
that your nodes were persistently up during the
network disconnection. Although I would have
expected fencing to kick in at least on those
which are part of the non-quorate cluster partition.
Maybe a few more words on your scenario
(fencing setup, e.g.) would help to understand what
is going on.



We don't use any fencing mechanisms; we rely on quorum to run the services.
In more detail, we run a three-node Linbit LINSTOR storage setup that is
hyperconverged.
Meaning, we run the clustered storage on the virtualization hypervisors.

We use pcs in order to have the linstor-controller service in high-availability mode.
The policy for no quorum is to stop the resources.
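
Incidentally, the quorum arithmetic behind that policy is plain majority
voting; a minimal sketch:

```shell
# Majority quorum (sketch): a partition is quorate iff it holds
# strictly more than half of all configured votes.
total_votes=3
for present in 1 2 3; do
  if [ "$present" -gt $((total_votes / 2)) ]; then
    echo "$present/$total_votes: quorate"
  else
    echo "$present/$total_votes: inquorate"
  fi
done
```

So with three nodes, any two-node partition keeps quorum, and a single
isolated node stops its resources.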

In such a hyperconverged setup, we can't fence a node without impact.
It may happen that network instability causes the primary node to no longer be
primary.
In that case, we don't want running VMs to go down with the ship, as there
was no impact on them.

However, we would like to have high availability of that service upon
network restoration, without manual actions.




Klaus


What happens is, services crash when we re-enable network connection.

 From journal:

```
...
Jul 12 00:27:32 itaftestkvmls02.dc.itaf.eu corosync[9069]: corosync:
totemsrp.c:1328: memb_consensus_agreed: Assertion `token_memb_entries >= 1'
failed.
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu attrd[9104]:error:
Connection to the CPG API failed: Library error (2)
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu stonith-ng[9100]:error:
Connection to the CPG API failed: Library error (2)
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: corosync.service:
Main process exited, code=dumped, status=6/ABRT
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu cib[9098]:error:
Connection to the CPG API failed: Library error (2)
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: corosync.service:
Failed with result 'core-dump'.
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu pacemakerd[9087]:error:
Connection to the CPG API failed: Library error (2)
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: pacemaker.service:
Main process exited, code=exited, status=107/n/a
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: pacemaker.service:
Failed with result 'exit-code'.
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: Stopped Pacemaker
High Availability Cluster Manager.
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu lrmd[9102]:  warning:
new_event_notification (9102-9107-7): Bad file descriptor (9)
...
```
Pacemaker's log shows no relevant info.

This is from corosync's log:

```
Jul 12 00:27:33 [9107] itaftestkvmls02.dc.itaf.eu   crmd: info:
qb_ipcs_us_withdraw:withdrawing server sockets
Jul 12 00:27:33 [9104] itaftestkvmls02.dc.itaf.eu  attrd:error:
pcmk_cpg_dispatch:  Connection to the CPG API failed: Library error (2)
Jul 12 00:27:33 [9100] itaftestkvmls02.dc.itaf.eu stonith-ng:error:
pcmk_cpg_dispatch:  Connection to the CPG API failed: Library error (2)
Jul 12 00:27:33 [9098] itaftestkvmls02.dc.itaf.eu    cib:error:
pcmk_cpg_dispatch:  Connection to the CPG API failed: Library error (2)
Jul 12 00:27:33 [9087] itaftestkvmls02.dc.itaf.eu pacemakerd:error:
pcmk_cpg_dispatch:  Connection to the CPG API failed: Library error (2)
Jul 12 00:27:33 [9104] itaftestkvmls02.dc.itaf.eu  attrd: info:
qb_ipcs_us_withdraw:withdrawing server sockets
Jul 12 00:27:33 [9087] itaftestkvmls02.dc.itaf.eu pacemakerd: info:
crm_xml_cleanup:Cleaning up memory from libxml2
Jul 12 00:27:33 [9107] itaftestkvmls02.dc.itaf.eu   crmd: info:
crm_xml_cleanup:Cleaning up memory from libxml2
Jul 12 00:27:33 [9100] itaftestkvmls02.dc.itaf.eu stonith-ng: info:
qb_ipcs_us_withdraw:withdrawing server sockets
Jul 12 00:27:33 [9104] itaftestkvmls02.dc.itaf.eu  attrd: info:
crm_xml_cleanup:Cleaning up memory from libxml2
Jul 12 00:27:33 [9098] itaftestkvmls02.dc.itaf.eu    cib: info:
qb_ipcs_us_withdraw:withdrawing server sockets
Jul 12 00:27:33 [9100] 

Re: [ClusterLabs] Restoring network connection breaks cluster services

2019-08-12 Thread Jan Pokorný
On 07/08/19 16:06 +0200, Momcilo Medic wrote:
> On Wed, Aug 7, 2019 at 1:00 PM Klaus Wenninger  wrote:
> 
>> On 8/7/19 12:26 PM, Momcilo Medic wrote:
>> 
>>> We have a three-node cluster that is set up to stop resources on lost
>>> quorum.  Failure handling (network going down) is done properly,
>>> but recovery doesn't seem to work.
>> 
>> What do you mean by 'network going down'?
>> Loss of link? Does the IP persist on the interface
>> in that case?
> 
> Yes, we simulate faulty cable by turning switch ports down and up.
> In such a case, the IP does not persist on the interface.
> 
>> That there is an issue reconnecting the CPG API sounds strange to me.
>> Already the fact that something has to be reconnected. I understood
>> that your nodes were persistently up during the
>> network-disconnection. Although I would have expected fencing to
>> kick in at least on those which are part of the non-quorate
>> cluster-partition.  Maybe a few more words on your scenario
>> (fencing setup, e.g.) would help to understand what is going on.
> 
> We don't use any fencing mechanisms; we rely on quorum to run the
> services.  In more detail, we run a three-node Linbit LINSTOR storage
> setup that is hyperconverged.  Meaning, we run the clustered storage
> on the virtualization hypervisors.
> 
> We use pcs in order to have the linstor-controller service in high
> availability mode.  The policy for no quorum is to stop the resources.
> 
> In such a hyperconverged setup, we can't fence a node without impact.
> It may happen that network instability causes the primary node to no
> longer be primary.  In that case, we don't want running VMs to go
> down with the ship, as there was no impact on them.
> 
> However, we would like to have high availability of that service
> upon network restoration, without manual actions.

This spurred a train of thought that is admittedly not immediately
helpful in this case:

* * *

1. the word "converged" is a fitting word for how we'd like
   the cluster stack to appear (from the outside), but what we have
   is that some circumstances are not clearly articulated across the
   components, meaning that there's no way for users to express their
   preferences in simple terms and in a non-conflicting and
   unambiguous way when the realms of 2+ components combine
   -- high-level tools like pcs may attempt to rectify that to some
   extent, but they fall short when there are no surfaces to glue (at
   least unambiguously; see also the parallel thread about shutting the
   cluster down in the presence of sbd)

   it seems to me that the very circumstance that was hit here is
   exactly where the corosync authors decided that it's rare and
   obnoxious enough to indicate up the chain for detached-destiny
   reasoning (which pacemaker normally performs) that they rather
   stop right there (and in a well-behaved cluster configuration
   hence ask to be fenced)

   all is actually sound, until one starts to make compromises like
   was done here, with ditching of the fencing (think: sanity
   assurance) layer, relying fully on no-quorum-policy=stop, naively
   thinking that one is 100% covered; but with a purely pacemaker hat
   on, we -- the pacemaker devs -- can't really give you such
   a guarantee, because we have no visibility into said "bail out"
   shortcuts that corosync makes for such rare circumstances -- you
   shall refer to the corosync documentation, but it's not covered
   there (man pages) AFAIK (if it was _all_ indicated to pacemaker,
   just the standard response on quorum loss could be carried out,
   without resorting to anything more drastic like here)


2. based on said missing explicit and clear inter-component signalling
   (1.) and the logs provided, it's fair to bring an argument that
   pacemaker had an opportunity to see, barring said explicit API
   signalling, that corosync died, but then, the major assumed cases are:

   - corosync crashed or was explicitly killed (perhaps to test the
 claimed HA resiliency towards the outer world)

   - broken pacemaker-corosync communication consistency
 (did some messages fall through the cracks?)

   i.e., cluster-endangering scenarios, not something to keep alive
   at all costs; better to try to stabilize the environment first,
   not to speak of the chances with a "miracles awaiting" strategy


3. despite 2., there was a decision with systemd-enabled systems to
   actually pursue said "at all costs" (although implicitly
   mitigated when the restart cycles would be happening at a rapid
   pace)

   - it's all then in the hands of slightly non-deterministic timing
 (token loss timeout window hit/miss, although perhaps not in
 this very case if the state within the protocol would be a clear
 indicator for other corosync peers)

   - I'd actually assume that pacemaker would be restarted in said
 scenario (unless one fiddled with the pacemaker service file,
 that is), and just prior to that, corosync would be forcibly
 started anew as well

   - is the 

Re: [ClusterLabs] Restoring network connection breaks cluster services

2019-08-07 Thread Momcilo Medic
On Wed, Aug 7, 2019 at 1:00 PM Klaus Wenninger  wrote:

> On 8/7/19 12:26 PM, Momcilo Medic wrote:
>
> We have a three-node cluster that is set up to stop resources on lost quorum.
> Failure handling (network going down) is done properly, but recovery
> doesn't seem to work.
>
> What do you mean by 'network going down'?
> Loss of link? Does the IP persist on the interface
> in that case?
>

Yes, we simulate faulty cable by turning switch ports down and up.
In such a case, the IP does not persist on the interface.


> That there is an issue reconnecting the CPG API
> sounds strange to me. Already the fact that
> something has to be reconnected. I understood
> that your nodes were persistently up during the
> network-disconnection. Although I would have
> expected fencing to kick in at least on those
> which are part of the non-quorate cluster-partition.
> Maybe a few more words on your scenario
> (fencing setup, e.g.) would help to understand what
> is going on.
>

We don't use any fencing mechanisms; we rely on quorum to run the services.
In more detail, we run a three-node Linbit LINSTOR storage setup that is
hyperconverged.
Meaning, we run the clustered storage on the virtualization hypervisors.

We use pcs in order to have the linstor-controller service in high-availability mode.
The policy for no quorum is to stop the resources.
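
For reference, a setup along those lines might be expressed with pcs roughly
as follows (a sketch; the resource name and the systemd unit name
`linstor-controller` are assumptions based on the description, not a verified
configuration):

```
# Stop all resources when the partition loses quorum.
pcs property set no-quorum-policy=stop

# Manage the LINSTOR controller as a cluster resource
# (systemd unit name assumed here).
pcs resource create linstor-controller systemd:linstor-controller \
    op monitor interval=30s
```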

In such a hyperconverged setup, we can't fence a node without impact.
It may happen that network instability causes the primary node to no longer be
primary.
In that case, we don't want running VMs to go down with the ship, as there
was no impact on them.

However, we would like to have high availability of that service upon
network restoration, without manual actions.


>
> Klaus
>
>
> What happens is, services crash when we re-enable network connection.
>
> From journal:
>
> ```
> ...
> Jul 12 00:27:32 itaftestkvmls02.dc.itaf.eu corosync[9069]: corosync:
> totemsrp.c:1328: memb_consensus_agreed: Assertion `token_memb_entries >= 1'
> failed.
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu attrd[9104]:error:
> Connection to the CPG API failed: Library error (2)
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu stonith-ng[9100]:error:
> Connection to the CPG API failed: Library error (2)
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: corosync.service:
> Main process exited, code=dumped, status=6/ABRT
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu cib[9098]:error:
> Connection to the CPG API failed: Library error (2)
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: corosync.service:
> Failed with result 'core-dump'.
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu pacemakerd[9087]:error:
> Connection to the CPG API failed: Library error (2)
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: pacemaker.service:
> Main process exited, code=exited, status=107/n/a
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: pacemaker.service:
> Failed with result 'exit-code'.
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: Stopped Pacemaker
> High Availability Cluster Manager.
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu lrmd[9102]:  warning:
> new_event_notification (9102-9107-7): Bad file descriptor (9)
> ...
> ```
> Pacemaker's log shows no relevant info.
>
> This is from corosync's log:
>
> ```
> Jul 12 00:27:33 [9107] itaftestkvmls02.dc.itaf.eu   crmd: info:
> qb_ipcs_us_withdraw:withdrawing server sockets
> Jul 12 00:27:33 [9104] itaftestkvmls02.dc.itaf.eu  attrd:error:
> pcmk_cpg_dispatch:  Connection to the CPG API failed: Library error (2)
> Jul 12 00:27:33 [9100] itaftestkvmls02.dc.itaf.eu stonith-ng:error:
> pcmk_cpg_dispatch:  Connection to the CPG API failed: Library error (2)
> Jul 12 00:27:33 [9098] itaftestkvmls02.dc.itaf.eu    cib:error:
> pcmk_cpg_dispatch:  Connection to the CPG API failed: Library error (2)
> Jul 12 00:27:33 [9087] itaftestkvmls02.dc.itaf.eu pacemakerd:error:
> pcmk_cpg_dispatch:  Connection to the CPG API failed: Library error (2)
> Jul 12 00:27:33 [9104] itaftestkvmls02.dc.itaf.eu  attrd: info:
> qb_ipcs_us_withdraw:withdrawing server sockets
> Jul 12 00:27:33 [9087] itaftestkvmls02.dc.itaf.eu pacemakerd: info:
> crm_xml_cleanup:Cleaning up memory from libxml2
> Jul 12 00:27:33 [9107] itaftestkvmls02.dc.itaf.eu   crmd: info:
> crm_xml_cleanup:Cleaning up memory from libxml2
> Jul 12 00:27:33 [9100] itaftestkvmls02.dc.itaf.eu stonith-ng: info:
> qb_ipcs_us_withdraw:withdrawing server sockets
> Jul 12 00:27:33 [9104] itaftestkvmls02.dc.itaf.eu  attrd: info:
> crm_xml_cleanup:Cleaning up memory from libxml2
> Jul 12 00:27:33 [9098] itaftestkvmls02.dc.itaf.eu    cib: info:
> qb_ipcs_us_withdraw:withdrawing server sockets
> Jul 12 00:27:33 [9100] itaftestkvmls02.dc.itaf.eu stonith-ng: info:
> crm_xml_cleanup:Cleaning up memory from libxml2
> Jul 12 00:27:33 [9098] itaftestkvmls02.dc.itaf.eu    cib: info:
> 

Re: [ClusterLabs] Restoring network connection breaks cluster services

2019-08-07 Thread Klaus Wenninger
On 8/7/19 12:26 PM, Momcilo Medic wrote:
> We have a three-node cluster that is set up to stop resources on lost quorum.
> Failure handling (network going down) is done properly, but recovery
> doesn't seem to work.
What do you mean by 'network going down'?
Loss of link? Does the IP persist on the interface
in that case?
That there is an issue reconnecting the CPG API
sounds strange to me. Already the fact that
something has to be reconnected. I understood
that your nodes were persistently up during the
network-disconnection. Although I would have
expected fencing to kick in at least on those
which are part of the non-quorate cluster-partition.
Maybe a few more words on your scenario
(fencing setup, e.g.) would help to understand what
is going on.

Klaus
>
> What happens is, services crash when we re-enable network connection.
>
> From journal:
>
> ```
> ...
> Jul 12 00:27:32 itaftestkvmls02.dc.itaf.eu corosync[9069]: corosync:
> totemsrp.c:1328: memb_consensus_agreed: Assertion `token_memb_entries
> >= 1' failed.
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu attrd[9104]:    error:
> Connection to the CPG API failed: Library error (2)
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu stonith-ng[9100]:    error:
> Connection to the CPG API failed: Library error (2)
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: corosync.service:
> Main process exited, code=dumped, status=6/ABRT
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu cib[9098]:    error:
> Connection to the CPG API failed: Library error (2)
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: corosync.service:
> Failed with result 'core-dump'.
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu pacemakerd[9087]:    error:
> Connection to the CPG API failed: Library error (2)
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: pacemaker.service:
> Main process exited, code=exited, status=107/n/a
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: pacemaker.service:
> Failed with result 'exit-code'.
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: Stopped Pacemaker
> High Availability Cluster Manager.
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu lrmd[9102]:  warning:
> new_event_notification (9102-9107-7): Bad file descriptor (9)
> ...
> ```
> Pacemaker's log shows no relevant info.
>
> This is from corosync's log:
>
> ```
> Jul 12 00:27:33 [9107] itaftestkvmls02.dc.itaf.eu       crmd:     info:
> qb_ipcs_us_withdraw:    withdrawing server sockets
> Jul 12 00:27:33 [9104] itaftestkvmls02.dc.itaf.eu      attrd:    error:
> pcmk_cpg_dispatch:      Connection to the CPG API failed: Library error (2)
> Jul 12 00:27:33 [9100] itaftestkvmls02.dc.itaf.eu stonith-ng:    error:
> pcmk_cpg_dispatch:      Connection to the CPG API failed: Library error (2)
> Jul 12 00:27:33 [9098] itaftestkvmls02.dc.itaf.eu        cib:    error:
> pcmk_cpg_dispatch:      Connection to the CPG API failed: Library error (2)
> Jul 12 00:27:33 [9087] itaftestkvmls02.dc.itaf.eu pacemakerd:    error:
> pcmk_cpg_dispatch:      Connection to the CPG API failed: Library error (2)
> Jul 12 00:27:33 [9104] itaftestkvmls02.dc.itaf.eu      attrd:     info:
> qb_ipcs_us_withdraw:    withdrawing server sockets
> Jul 12 00:27:33 [9087] itaftestkvmls02.dc.itaf.eu pacemakerd:     info:
> crm_xml_cleanup:        Cleaning up memory from libxml2
> Jul 12 00:27:33 [9107] itaftestkvmls02.dc.itaf.eu       crmd:     info:
> crm_xml_cleanup:        Cleaning up memory from libxml2
> Jul 12 00:27:33 [9100] itaftestkvmls02.dc.itaf.eu stonith-ng:     info:
> qb_ipcs_us_withdraw:    withdrawing server sockets
> Jul 12 00:27:33 [9104] itaftestkvmls02.dc.itaf.eu      attrd:     info:
> crm_xml_cleanup:        Cleaning up memory from libxml2
> Jul 12 00:27:33 [9098] itaftestkvmls02.dc.itaf.eu        cib:     info:
> qb_ipcs_us_withdraw:    withdrawing server sockets
> Jul 12 00:27:33 [9100] itaftestkvmls02.dc.itaf.eu stonith-ng:     info:
> crm_xml_cleanup:        Cleaning up memory from libxml2
> Jul 12 00:27:33 [9098] itaftestkvmls02.dc.itaf.eu        cib:     info:
> qb_ipcs_us_withdraw:    

[ClusterLabs] Restoring network connection breaks cluster services

2019-08-07 Thread Momcilo Medic
We have a three-node cluster that is set up to stop resources on lost quorum.
Failure handling (network going down) is done properly, but recovery
doesn't seem to work.

What happens is, services crash when we re-enable network connection.

From journal:

```
...
Jul 12 00:27:32 itaftestkvmls02.dc.itaf.eu corosync[9069]: corosync:
totemsrp.c:1328: memb_consensus_agreed: Assertion `token_memb_entries >= 1'
failed.
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu attrd[9104]:error:
Connection to the CPG API failed: Library error (2)
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu stonith-ng[9100]:error:
Connection to the CPG API failed: Library error (2)
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: corosync.service:
Main process exited, code=dumped, status=6/ABRT
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu cib[9098]:error: Connection
to the CPG API failed: Library error (2)
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: corosync.service:
Failed with result 'core-dump'.
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu pacemakerd[9087]:error:
Connection to the CPG API failed: Library error (2)
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: pacemaker.service:
Main process exited, code=exited, status=107/n/a
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: pacemaker.service:
Failed with result 'exit-code'.
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: Stopped Pacemaker
High Availability Cluster Manager.
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu lrmd[9102]:  warning:
new_event_notification (9102-9107-7): Bad file descriptor (9)
...
```
Pacemaker's log shows no relevant info.

This is from corosync's log:

```
Jul 12 00:27:33 [9107] itaftestkvmls02.dc.itaf.eu   crmd: info:
qb_ipcs_us_withdraw:withdrawing server sockets
Jul 12 00:27:33 [9104] itaftestkvmls02.dc.itaf.eu  attrd:error:
pcmk_cpg_dispatch:  Connection to the CPG API failed: Library error (2)
Jul 12 00:27:33 [9100] itaftestkvmls02.dc.itaf.eu stonith-ng:error:
pcmk_cpg_dispatch:  Connection to the CPG API failed: Library error (2)
Jul 12 00:27:33 [9098] itaftestkvmls02.dc.itaf.eu    cib:error:
pcmk_cpg_dispatch:  Connection to the CPG API failed: Library error (2)
Jul 12 00:27:33 [9087] itaftestkvmls02.dc.itaf.eu pacemakerd:error:
pcmk_cpg_dispatch:  Connection to the CPG API failed: Library error (2)
Jul 12 00:27:33 [9104] itaftestkvmls02.dc.itaf.eu  attrd: info:
qb_ipcs_us_withdraw:withdrawing server sockets
Jul 12 00:27:33 [9087] itaftestkvmls02.dc.itaf.eu pacemakerd: info:
crm_xml_cleanup:Cleaning up memory from libxml2
Jul 12 00:27:33 [9107] itaftestkvmls02.dc.itaf.eu   crmd: info:
crm_xml_cleanup:Cleaning up memory from libxml2
Jul 12 00:27:33 [9100] itaftestkvmls02.dc.itaf.eu stonith-ng: info:
qb_ipcs_us_withdraw:withdrawing server sockets
Jul 12 00:27:33 [9104] itaftestkvmls02.dc.itaf.eu  attrd: info:
crm_xml_cleanup:Cleaning up memory from libxml2
Jul 12 00:27:33 [9098] itaftestkvmls02.dc.itaf.eu    cib: info:
qb_ipcs_us_withdraw:withdrawing server sockets
Jul 12 00:27:33 [9100] itaftestkvmls02.dc.itaf.eu stonith-ng: info:
crm_xml_cleanup:Cleaning up memory from libxml2
Jul 12 00:27:33 [9098] itaftestkvmls02.dc.itaf.eu    cib: info:
qb_ipcs_us_withdraw:withdrawing server sockets
Jul 12 00:27:33 [9098] itaftestkvmls02.dc.itaf.eu    cib: info:
qb_ipcs_us_withdraw:withdrawing server sockets
Jul 12 00:27:33 [9098] itaftestkvmls02.dc.itaf.eu    cib: info:
crm_xml_cleanup:Cleaning up memory from libxml2
Jul 12 00:27:33 [9102] itaftestkvmls02.dc.itaf.eu   lrmd:  warning:
qb_ipcs_event_sendv:new_event_notification (9102-9107-7): Bad file
descriptor (9)
```

Please let me know if you need any further info; I'll be more than happy to
provide it.

This is always reproducible in our environment:
Ubuntu 18.04.2
corosync 2.4.3-0ubuntu1.1
pcs 0.9.164-1
pacemaker 1.1.18-0ubuntu1.1

Kind regards,
Momo.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/