Re: [ClusterLabs] Antw: Re: Antw: [EXT] The 2 servers of the cluster randomly reboot almost together

2022-02-22 Thread Strahil Nikolov via Users
Strangely I can't see any timeouts set in   at the example in 
https://pve.proxmox.com/wiki/Fencing ?

Best Regards,
Strahil Nikolov

On Tue, Feb 22, 2022 at 18:54, Sebastien BASTARD wrote:

Hello Strahil,
I don't have pcs software (corosync is embedded in proxmox), but I have "pvecm 
status" :
Cluster information
---
Name:             cluster
Config Version:   24
Transport:        knet
Secure auth:      on

Quorum information
--
Date:             Tue Feb 22 17:52:06 2022
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x0003
Ring ID:          1.5130
Quorate:          Yes

Votequorum information
--
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2  
Flags:            Quorate Qdevice 

Membership information
--
    Nodeid      Votes    Qdevice Name
0x0001          1    A,V,NMW serverA
0x0003          1    A,V,NMW serverB (local)
0x          1            Qdevice

I hope this helps you identify the kind of fencing.
Best regards.

On Tue, Feb 22, 2022 at 17:40, Strahil Nikolov wrote:

Fencing is the reboot mechanism.
pcs status

Best Regards,
Strahil Nikolov

On Tue, Feb 22, 2022 at 16:44, Sebastien BASTARD wrote:

Hello Strahil,
As I don't know the kind of fencing, here is the current configuration of
corosync:

logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: serverA
    nodeid: 1
    quorum_votes: 1
    ring0_addr: xx.xx.xx.xx
  }
  node {
    name: serverB
    nodeid: 3
    quorum_votes: 1
    ring0_addr: xx.xx.xx.xx
  }
}

quorum {
  device {
    model: net
    net {
      algorithm: ffsplit
      host: xx.xx.xx.xx
      tls: on
    }
    votes: 1
  }
  provider: corosync_votequorum
}

totem {
  cluster_name: cluster
  config_version: 24
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
  token_retransmits_before_loss_const: 40
  token: 3
}

Best regards.
On Tue, Feb 22, 2022 at 14:29, Strahil Nikolov wrote:

What kind of fencing are you using?

Best Regards,
Strahil Nikolov

On Tue, Feb 22, 2022 at 15:24, Sebastien BASTARD wrote:

Hello Strahil Nikolov,

The Qdevice is not a VM. It is a physical server running Debian Linux.

Best regards.

On Tue, Feb 22, 2022 at 14:20, Strahil Nikolov wrote:

Is the qdevice on a VM?

Best Regards,
Strahil Nikolov

On Tue, Feb 22, 2022 at 15:03, Sebastien BASTARD wrote:
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

  



-- 
Sébastien BASTARD
Ingénieur R | Domalys • Créateurs d’autonomie

| phone : +33 5 49 83 00 08
| site : www.domalys.com
| email : sebast...@domalys.com
| address : 58 Rue du Vercors 86240 Fontaine-Le-Comte





Re: [ClusterLabs] Antw: Re: Antw: [EXT] The 2 servers of the cluster randomly reboot almost together

2022-02-22 Thread Sebastien BASTARD
Hello Strahil,

I don't have pcs software (corosync is embedded in proxmox), but I have
"pvecm status" :

Cluster information
---
Name:             cluster
Config Version:   24
Transport:        knet
Secure auth:      on

Quorum information
--
Date:             Tue Feb 22 17:52:06 2022
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x0003
Ring ID:          1.5130
Quorate:          Yes

Votequorum information
--
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice

Membership information
--
    Nodeid      Votes    Qdevice Name
0x0001          1    A,V,NMW serverA
0x0003          1    A,V,NMW serverB (local)
0x          1            Qdevice
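
The vote figures in this output can be checked with a little arithmetic. The sketch below is only an illustration: it assumes the usual votequorum majority rule (quorum = expected votes / 2 + 1, using integer division), and the variable names are made up for the example. It covers vote counting only and says nothing about whether a node is actually fenced, which depends on the HA and watchdog setup that is not shown in this thread.

```shell
#!/bin/sh
# Vote arithmetic for the "pvecm status" output above (illustrative only).
expected_votes=3                            # "Expected votes: 3"
quorum=$(( expected_votes / 2 + 1 ))        # majority rule, integer division
echo "quorum: $quorum"

# Hypothetical case: the Qdevice vote is missing, both nodes still see each other.
votes_without_qdevice=$(( 1 + 1 ))          # serverA + serverB, 1 vote each
if [ "$votes_without_qdevice" -ge "$quorum" ]; then
    quorate_without_qdevice=yes
else
    quorate_without_qdevice=no
fi
echo "quorate without Qdevice: $quorate_without_qdevice"

# Hypothetical case: one node is cut off from both its peer and the Qdevice.
votes_isolated=1
if [ "$votes_isolated" -ge "$quorum" ]; then
    quorate_isolated=yes
else
    quorate_isolated=no
fi
echo "quorate when isolated: $quorate_isolated"
```

The computed quorum agrees with the "Quorum: 2" line reported by pvecm.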

I hope this helps you identify the kind of fencing.

Best regards.

On Tue, Feb 22, 2022 at 17:40, Strahil Nikolov wrote:

> fencing is the reboot mechanism
>
> pcs status
>
> Best Regards,
> Strahil Nikolov
>
> On Tue, Feb 22, 2022 at 16:44, Sebastien BASTARD
>  wrote:
> Hello Strahil,
>
> As I don't know the kind of fencing, here is the current configuration of
> corosync :
>
> logging {
>   debug: off
>   to_syslog: yes
> }
>
> nodelist {
>   node {
>     name: serverA
>     nodeid: 1
>     quorum_votes: 1
>     ring0_addr: xx.xx.xx.xx
>   }
>   node {
>     name: serverB
>     nodeid: 3
>     quorum_votes: 1
>     ring0_addr: xx.xx.xx.xx
>   }
> }
>
> quorum {
>   device {
>     model: net
>     net {
>       algorithm: ffsplit
>       host: xx.xx.xx.xx
>       tls: on
>     }
>     votes: 1
>   }
>   provider: corosync_votequorum
> }
>
> totem {
>   cluster_name: cluster
>   config_version: 24
>   interface {
>     linknumber: 0
>   }
>   ip_version: ipv4-6
>   link_mode: passive
>   secauth: on
>   version: 2
>   *token_retransmits_before_loss_const: 40*
>   *token: 3*
>
> }
>
> Best regards.
>
> On Tue, Feb 22, 2022 at 14:29, Strahil Nikolov wrote:
>
> What kind of fencing are you using ?
>
> Best Regards,
> Strahil Nikolov
>
> On Tue, Feb 22, 2022 at 15:24, Sebastien BASTARD
>  wrote:
> Hello Strahil Nikolov,
>
> The Qdevice is not a VM. It is a physical server running Debian Linux.
>
> Best regards.
>
> On Tue, Feb 22, 2022 at 14:20, Strahil Nikolov wrote:
>
> Is the qdevice on a VM ?
>
> Best Regards,
> Strahil Nikolov
>
> On Tue, Feb 22, 2022 at 15:03, Sebastien BASTARD
>  wrote:
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>
>


Re: [ClusterLabs] Antw: Re: Antw: [EXT] The 2 servers of the cluster randomly reboot almost together

2022-02-22 Thread Strahil Nikolov via Users
Fencing is the reboot mechanism.
pcs status

Best Regards,
Strahil Nikolov

On Tue, Feb 22, 2022 at 16:44, Sebastien BASTARD wrote:

Hello Strahil,
As I don't know the kind of fencing, here is the current configuration of
corosync:

logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: serverA
    nodeid: 1
    quorum_votes: 1
    ring0_addr: xx.xx.xx.xx
  }
  node {
    name: serverB
    nodeid: 3
    quorum_votes: 1
    ring0_addr: xx.xx.xx.xx
  }
}

quorum {
  device {
    model: net
    net {
      algorithm: ffsplit
      host: xx.xx.xx.xx
      tls: on
    }
    votes: 1
  }
  provider: corosync_votequorum
}

totem {
  cluster_name: cluster
  config_version: 24
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
  token_retransmits_before_loss_const: 40
  token: 3
}

Best regards.
On Tue, Feb 22, 2022 at 14:29, Strahil Nikolov wrote:

What kind of fencing are you using?

Best Regards,
Strahil Nikolov

On Tue, Feb 22, 2022 at 15:24, Sebastien BASTARD wrote:

Hello Strahil Nikolov,

The Qdevice is not a VM. It is a physical server running Debian Linux.

Best regards.

On Tue, Feb 22, 2022 at 14:20, Strahil Nikolov wrote:

Is the qdevice on a VM?

Best Regards,
Strahil Nikolov

On Tue, Feb 22, 2022 at 15:03, Sebastien BASTARD wrote:
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

  





Re: [ClusterLabs] Antw: Re: Antw: [EXT] The 2 servers of the cluster randomly reboot almost together

2022-02-22 Thread Sebastien BASTARD
Hello Strahil,

As I don't know the kind of fencing, here is the current configuration of
corosync :

logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: serverA
    nodeid: 1
    quorum_votes: 1
    ring0_addr: xx.xx.xx.xx
  }
  node {
    name: serverB
    nodeid: 3
    quorum_votes: 1
    ring0_addr: xx.xx.xx.xx
  }
}

quorum {
  device {
    model: net
    net {
      algorithm: ffsplit
      host: xx.xx.xx.xx
      tls: on
    }
    votes: 1
  }
  provider: corosync_votequorum
}

totem {
  cluster_name: cluster
  config_version: 24
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
  *token_retransmits_before_loss_const: 40*
  *token: 3*

}

Best regards.

On Tue, Feb 22, 2022 at 14:29, Strahil Nikolov wrote:

> What kind of fencing are you using ?
>
> Best Regards,
> Strahil Nikolov
>
> On Tue, Feb 22, 2022 at 15:24, Sebastien BASTARD
>  wrote:
> Hello Strahil Nikolov,
>
> The Qdevice is not a VM. It is a physical server running Debian Linux.
>
> Best regards.
>
> On Tue, Feb 22, 2022 at 14:20, Strahil Nikolov wrote:
>
> Is the qdevice on a VM ?
>
> Best Regards,
> Strahil Nikolov
>
> On Tue, Feb 22, 2022 at 15:03, Sebastien BASTARD
>  wrote:
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>



Re: [ClusterLabs] Antw: Re: Antw: [EXT] The 2 servers of the cluster randomly reboot almost together

2022-02-22 Thread Sebastien BASTARD
Hello Strahil,

I didn't know that there were different kinds of fencing. Where can I find
this information?

Best regards.

On Tue, Feb 22, 2022 at 14:29, Strahil Nikolov wrote:

> What kind of fencing are you using ?
>
> Best Regards,
> Strahil Nikolov
>
> On Tue, Feb 22, 2022 at 15:24, Sebastien BASTARD
>  wrote:
> Hello Strahil Nikolov,
>
> The Qdevice is not a VM. It is a physical server running Debian Linux.
>
> Best regards.
>
> On Tue, Feb 22, 2022 at 14:20, Strahil Nikolov wrote:
>
> Is the qdevice on a VM ?
>
> Best Regards,
> Strahil Nikolov
>
> On Tue, Feb 22, 2022 at 15:03, Sebastien BASTARD
>  wrote:
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>



Re: [ClusterLabs] Antw: Re: Antw: [EXT] The 2 servers of the cluster randomly reboot almost together

2022-02-22 Thread Strahil Nikolov via Users
What kind of fencing are you using?

Best Regards,
Strahil Nikolov

On Tue, Feb 22, 2022 at 15:24, Sebastien BASTARD wrote:

Hello Strahil Nikolov,

The Qdevice is not a VM. It is a physical server running Debian Linux.

Best regards.

On Tue, Feb 22, 2022 at 14:20, Strahil Nikolov wrote:

Is the qdevice on a VM?

Best Regards,
Strahil Nikolov

On Tue, Feb 22, 2022 at 15:03, Sebastien BASTARD wrote:
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

  


Re: [ClusterLabs] Antw: Re: Antw: [EXT] The 2 servers of the cluster randomly reboot almost together

2022-02-22 Thread Sebastien BASTARD
Hello Strahil Nikolov,

The Qdevice is not a VM. It is a physical server running Debian Linux.

Best regards.

On Tue, Feb 22, 2022 at 14:20, Strahil Nikolov wrote:

> Is the qdevice on a VM ?
>
> Best Regards,
> Strahil Nikolov
>
> On Tue, Feb 22, 2022 at 15:03, Sebastien BASTARD
>  wrote:
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>


Re: [ClusterLabs] Antw: Re: Antw: [EXT] The 2 servers of the cluster randomly reboot almost together

2022-02-22 Thread Strahil Nikolov via Users
Is the qdevice on a VM?

Best Regards,
Strahil Nikolov

On Tue, Feb 22, 2022 at 15:03, Sebastien BASTARD wrote:
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
  


Re: [ClusterLabs] Antw: Re: Antw: [EXT] The 2 servers of the cluster randomly reboot almost together

2022-02-22 Thread Sebastien BASTARD
Hello Ulrich, Hello team,

Last night the servers of the cluster restarted together twice (08h17m07 and
08h50m04 on 22/02/2022 for server A, 08h16m32 and 08h49m43 on 22/02/2022 for
server B).

Here are the results of the up/down test:

*ServerA :*

*Log of Qdevice from ServerA :*


   - None

*Log of ServerB from ServerA :*


   - 21/02/2022 18:48:45 Down between 0 and 4 seconds
   - 21/02/2022 18:58:33 Down between 0 and 4 seconds
   - 21/02/2022 19:19:43 Down between 0 and 3 seconds
   - *No trace of lost communication with server B at 08h17 and 08h50,
   because the up/down test scripts were not restarted after the reboot.*

*ServerB :*

*Log of Qdevice from ServerB :*


   - 21/02/2022 08:30:26 Down between 0 and 3 seconds
   - 21/02/2022 23:02:14 Down between 0 and 3 seconds

*Log of ServerA from ServerB :  *


   - 21/02/2022 18:47:38 Down between 0 and 4 seconds
   - 21/02/2022 19:25:06 Down between 0 and 4 seconds
   - 21/02/2022 19:42:39 Down between 0 and 4 seconds
   - *No trace of lost communication with server A at 08h16 and 08h49,
   because the up/down test scripts were not restarted after the reboot.*

*QDevice :*

*Log of ServerA from Qdevice :*


   - 22/02/2022 07:15:57 Down between 83 and 86 seconds => (this matches the
   restart of the server if we add 1 hour to the time)
   - 22/02/2022 07:48:52 Down between 82 and 85 seconds => (this matches the
   restart of the server if we add 1 hour to the time)

*Log of ServerB from Qdevice :*


   - 21/02/2022 23:02:22 Down between 0 and 4 seconds
   - 22/02/2022 07:15:46 Down between 55 and 58 seconds => (this matches the
   restart of the server if we add 1 hour to the time)
   - 22/02/2022 07:48:58 Down between 56 and 59 seconds => (this matches the
   restart of the server if we add 1 hour to the time)


Strangely, the clocks of the 3 computers show the same time, but the
timestamps logged on the Qdevice are each time 1 hour behind those of
ServerA and ServerB.
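
One possible explanation, offered only as an assumption: the echo statement that Ulrich suggested in this thread formats timestamps with "date -u", that is in UTC, while the restart times may have been read in local time (CET, which is UTC+1 in February). The sketch below converts one epoch value taken from the logs in this thread both ways. It needs GNU date, and the "TZ=CET-1" setting is just a self-contained way to say UTC+1 for the example.

```shell
#!/bin/sh
# Epoch value copied from a log line in this thread (requires GNU date).
t=1645432226

# UTC rendering, as produced by "date -u" in the logging script.
utc=$(date -u -d "@$t" +%F_%T)

# Same instant at UTC+1; the POSIX string CET-1 stands in for French winter time.
local_time=$(TZ=CET-1 date -d "@$t" +%F_%T)

echo "UTC:   $utc"
echo "UTC+1: $local_time"
```

If the two lines differ by exactly one hour on the machines involved, the clocks themselves agree and only the display time zone differs.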

I don't understand why I have no trace of lost connection between servers
before they restarted.

If ServerA and ServerB lose the connection to the qDevice, can someone
confirm whether they restart (fencing) or not?

Thanks for your help.

On Mon, Feb 21, 2022 at 10:08, Sebastien BASTARD wrote:

> Hello Ulrich,
>
> I modified your script to add the ability to test TCP connectivity.
> Between servers A and B and the QDevice there is a firewall that does not
> answer ping requests, so I tested port 5403 instead.
>
> Here are the results from the weekend:
>
> Logs of Server A :
>
> ==> log_up_down_ServerB_from_ServerA.txt <==
> ---START 1645111039 (2022-02-17_15:17:19)
> 0 (11) -> 1 1645111050 (2022-02-17_15:17:30)
> ---EXIT 1645177062 (2022-02-18_09:37:42)
> ---START 1645199714 (2022-02-18_15:55:14)
> 0 (4) -> 1 1645199718 (2022-02-18_15:55:18)
>
>
> ==> log_up_down_qdevice_from_ServerA.txt <==
> ---START 1645117334 (2022-02-17_17:02:14)
> 0 (10) -> 1 1645117344 (2022-02-17_17:02:24)
> *1 (27820) -> 0 1645145164 (2022-02-18_00:46:04)*
> 0 (10) -> 1 1645145174 (2022-02-18_00:46:14)
> ---EXIT 1645177062 (2022-02-18_09:37:42)
> ---START 1645199684 (2022-02-18_15:54:44)
> 0 (3) -> 1 1645199687 (2022-02-18_15:54:47)
> *1 (19519) -> 0 1645219206 (2022-02-18_21:20:06)*
> 0 (3) -> 1 1645219209 (2022-02-18_21:20:09)
>
> The scripts on Server A stopped because I forgot to launch them in the
> background. But we can see that server A lost the connection to the
> Qdevice twice.
>
> Logs of Server B :
>
> ==> log_up_down_ServerA_from_ServerB.txt <==
> ---START 1645110964 (2022-02-17_15:16:04)
> 0 (11) -> 1 1645110975 (2022-02-17_15:16:15)
> ---EXIT 1645199533 (2022-02-18_15:52:13)
> ---START 1645199576 (2022-02-18_15:52:56)
> 0 (4) -> 1 1645199580 (2022-02-18_15:53:00)
>
>
> ==> log_up_down_qdevice_from_ServerB.txt <==
> ---START 1645117428 (2022-02-17_17:03:48)
> 0 (10) -> 1 1645117438 (2022-02-17_17:03:58)
> ---EXIT 1645199529 (2022-02-18_15:52:09)
> ---START 1645199546 (2022-02-18_15:52:26)
> 0 (3) -> 1 1645199549 (2022-02-18_15:52:29)
> *1 (232677) -> 0 1645432226 (2022-02-21_08:30:26)*
> 0 (3) -> 1 1645432229 (2022-02-21_08:30:29)
>
>
> The scripts on Server B stopped because I forgot to launch them in the
> background. But we can see that server B lost the connection to the
> Qdevice once.
>
> Logs of qDevice :
>
> ==> log_up_down_ServerA_from_qdevice.txt <==
> ---START 1645363302 (2022-02-20_13:21:42)
> 0 (4) -> 1 1645363306 (2022-02-20_13:21:46)
>
>
> ==> log_up_down_ServerB_from_qdevice.txt <==
> ---START 1645363310 (2022-02-20_13:21:50)
> 0 (4) -> 1 1645363314 (2022-02-20_13:21:54)
>
>
> The scripts on the qDevice stopped because the input was linked to the
> script and, after some minutes, the OS killed it. We can see that the
> Qdevice never lost the connection to the 2 servers.
>
> I will keep monitoring the output of the scripts to see when the servers
> lose the connection and when they are fenced.
>
> Best regards.
>
>
> Le ven. 18 févr. 2022 à 08:07, Ulrich Windl <
> 

Re: [ClusterLabs] Antw: Re: Antw: [EXT] The 2 servers of the cluster randomly reboot almost together

2022-02-21 Thread Sebastien BASTARD
Hello Ulrich,

I modified your script to add the ability to test TCP connectivity.
Between servers A and B and the QDevice there is a firewall that does not
answer ping requests, so I tested port 5403 instead.
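
The modified script is not shown in this thread. As a rough sketch of what such a TCP probe could look like, the function below tries to open a connection using bash's /dev/tcp redirection and the coreutils "timeout" command. The function name check_tcp, the 3-second default timeout and the host name in the usage comment are all made up for the example.

```shell
#!/bin/sh
# check_tcp HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection can be opened, non-zero otherwise.
# Relies on bash's /dev/tcp redirection and on coreutils "timeout".
check_tcp() {
    timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage (qdevice.example is a placeholder host name):
#   if check_tcp qdevice.example 5403; then state=1; else state=0; fi
```

A probe like this only shows that the port accepts connections; it does not prove that the service behind it is healthy.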

Here are the results from the weekend:

Logs of Server A :

==> log_up_down_ServerB_from_ServerA.txt <==
---START 1645111039 (2022-02-17_15:17:19)
0 (11) -> 1 1645111050 (2022-02-17_15:17:30)
---EXIT 1645177062 (2022-02-18_09:37:42)
---START 1645199714 (2022-02-18_15:55:14)
0 (4) -> 1 1645199718 (2022-02-18_15:55:18)


==> log_up_down_qdevice_from_ServerA.txt <==
---START 1645117334 (2022-02-17_17:02:14)
0 (10) -> 1 1645117344 (2022-02-17_17:02:24)
*1 (27820) -> 0 1645145164 (2022-02-18_00:46:04)*
0 (10) -> 1 1645145174 (2022-02-18_00:46:14)
---EXIT 1645177062 (2022-02-18_09:37:42)
---START 1645199684 (2022-02-18_15:54:44)
0 (3) -> 1 1645199687 (2022-02-18_15:54:47)
*1 (19519) -> 0 1645219206 (2022-02-18_21:20:06)*
0 (3) -> 1 1645219209 (2022-02-18_21:20:09)

The scripts on Server A stopped because I forgot to launch them in the
background. But we can see that server A lost the connection to the
Qdevice twice.
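
The log lines above appear to follow the pattern "old state (seconds spent in that state) -> new state, epoch, (time)", with 1 meaning reachable and 0 meaning unreachable. That reading is an assumption, since the script itself is not shown in this thread. Under that assumption, a short awk filter can list every up-to-down transition:

```shell
#!/bin/sh
# Sample lines copied from log_up_down_qdevice_from_ServerA.txt in this thread.
log=$(mktemp)
cat > "$log" <<'EOF'
---START 1645117334 (2022-02-17_17:02:14)
0 (10) -> 1 1645117344 (2022-02-17_17:02:24)
1 (27820) -> 0 1645145164 (2022-02-18_00:46:04)
0 (10) -> 1 1645145174 (2022-02-18_00:46:14)
---EXIT 1645177062 (2022-02-18_09:37:42)
---START 1645199684 (2022-02-18_15:54:44)
0 (3) -> 1 1645199687 (2022-02-18_15:54:47)
1 (19519) -> 0 1645219206 (2022-02-18_21:20:06)
0 (3) -> 1 1645219209 (2022-02-18_21:20:09)
EOF

# Keep only the transitions from state 1 (up) to state 0 (down),
# and print their epoch and formatted time.
losses=$(awk '$1 == "1" && $3 == "->" && $4 == "0" { print $5, $6 }' "$log")
printf '%s\n' "$losses"

# Count the transitions found.
loss_count=$(printf '%s\n' "$losses" | grep -c .)
echo "connection losses: $loss_count"
rm -f "$log"
```

The "0 -> 1" line right after each START marker is skipped on purpose, because it only records the first successful probe and not a recovery from an outage.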

Logs of Server B :

==> log_up_down_ServerA_from_ServerB.txt <==
---START 1645110964 (2022-02-17_15:16:04)
0 (11) -> 1 1645110975 (2022-02-17_15:16:15)
---EXIT 1645199533 (2022-02-18_15:52:13)
---START 1645199576 (2022-02-18_15:52:56)
0 (4) -> 1 1645199580 (2022-02-18_15:53:00)


==> log_up_down_qdevice_from_ServerB.txt <==
---START 1645117428 (2022-02-17_17:03:48)
0 (10) -> 1 1645117438 (2022-02-17_17:03:58)
---EXIT 1645199529 (2022-02-18_15:52:09)
---START 1645199546 (2022-02-18_15:52:26)
0 (3) -> 1 1645199549 (2022-02-18_15:52:29)
*1 (232677) -> 0 1645432226 (2022-02-21_08:30:26)*
0 (3) -> 1 1645432229 (2022-02-21_08:30:29)


The scripts on Server B stopped because I forgot to launch them in the
background. But we can see that server B lost the connection to the
Qdevice once.

Logs of qDevice :

==> log_up_down_ServerA_from_qdevice.txt <==
---START 1645363302 (2022-02-20_13:21:42)
0 (4) -> 1 1645363306 (2022-02-20_13:21:46)


==> log_up_down_ServerB_from_qdevice.txt <==
---START 1645363310 (2022-02-20_13:21:50)
0 (4) -> 1 1645363314 (2022-02-20_13:21:54)


The scripts on the qDevice stopped because the input was linked to the
script and, after some minutes, the OS killed it. We can see that the
Qdevice never lost the connection to the 2 servers.

I will keep monitoring the output of the scripts to see when the servers
lose the connection and when they are fenced.

Best regards.


On Fri, Feb 18, 2022 at 08:07, Ulrich Windl <
ulrich.wi...@rz.uni-regensburg.de> wrote:

> >>> Sebastien BASTARD wrote on 17.02.2022 at 16:28 in message:
> > Thank you Ulrich for your script !
> >
> > I launched it, with 10 seconds delay :
> >
> >- on Server A, to ping Server B
> >- on Server B, to ping server A
> >- on QDevice, to ping server A and Server B
> >
> > I currently can't ping Qdevice from server A and B, because it is behind a
> > firewall which only authorizes port 5403.
> >
> > Tomorrow, I will see the results.
>
> Maybe another remark: The script was not designed for a cluster, so it was
> good enough to redirect the output of the script to a file.
> However, bash may buffer some lines before they are written. If the script
> is killed, that's not a problem, but if the node is fenced, you might lose
> the last line(s).
> So maybe you want to change the echo statement in log_time() to:
> echo "$@ $t ($(date -d@"$t" -u +%F_%T))" >> your_log_file
>
> Maybe you want to use a variable or parameter for that.
>
> Regards,
> Ulrich
>
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>




[ClusterLabs] Antw: Re: Antw: [EXT] The 2 servers of the cluster randomly reboot almost together

2022-02-17 Thread Ulrich Windl
>>> Sebastien BASTARD wrote on 17.02.2022 at 16:28 in message:
> Thank you Ulrich for your script !
> 
> I launched it, with 10 seconds delay :
> 
>- on Server A, to ping Server B
>- on Server B, to ping server A
>- on QDevice, to ping server A and Server B
> 
> I currently can't ping Qdevice from server A and B, because it is behind a
> firewall which only authorizes port 5403.
> 
> Tomorrow, I will see the results.

Maybe another remark: The script was not designed for a cluster, so it was good
enough to redirect the output of the script to a file.
However, bash may buffer some lines before they are written. If the script is
killed, that's not a problem, but if the node is fenced, you might lose the
last line(s).
So maybe you want to change the echo statement in log_time() to:
echo "$@ $t ($(date -d@"$t" -u +%F_%T))" >> your_log_file

Maybe you want to use a variable or parameter for that.
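
As a rough illustration of that idea, log_time() could take the file name from a variable. The full original function is not shown in this thread, so the sketch below assumes that it reads the current epoch time itself; the LOG_FILE variable and its default path are made up for the example.

```shell
#!/bin/sh
# LOG_FILE is a hypothetical variable name; the default path is only an example.
LOG_FILE=${LOG_FILE:-./up_down.log}

# Append one line per call, so nothing waits in a redirected stdout stream.
# Assumption: the original log_time() takes the current epoch time itself.
log_time() {
    t=$(date +%s)
    echo "$* $t ($(date -d "@$t" -u +%F_%T))" >> "$LOG_FILE"
}

# Usage: log_time "0 (3) -> 1"
```

Because the file is opened and closed on every call, each line is handed to the kernel immediately. A hard power-off could still lose data that has not reached the disk yet; calling "sync" after the echo is one way to reduce that window, at some cost in I/O.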

Regards,
Ulrich

