Re: [ClusterLabs] DRBD split-brain investigations, automatic fixes and manual intervention...

2021-10-20 Thread Andrei Borzenkov
On 20.10.2021 17:54, Ian Diddams wrote:
>  
> 
> On Wednesday, 20 October 2021, 11:15:48 BST, Andrei Borzenkov 
>  wrote:  
>  
>  
>> You cannot resolve split brain without fencing. This is as simple as
>> that. Your pacemaker configuration (from another mail) shows
> 
>> pcs -f clust_cfg property set stonith-enabled=false
>> pcs -f clust_cfg property set no-quorum-policy=ignore
> 
>> This is a recipe for data corruption.
> Thanks Andrei - I appreciate your feedback.  So the obvious question from
> this bloke that's trying to get up to speed etc etc is
> 
> ...  how do I set up fencing to
> 

It depends on what hardware you have. For physical systems, IPMI or managed
power outlets (PDUs) may be available; both allow cutting power to another
node over the LAN.

For virtual machines you may use a fencing agent that contacts the hypervisor
or a management instance (like vCenter).

There is also SBD; it requires shared storage. A third node with an iSCSI
target may provide it.

A qdevice with a watchdog may be an option.

Personally I prefer SBD, which is the most hardware-agnostic solution.
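
As a rough sketch only (the node names are from this thread; the IPMI
addresses, credentials and the shared LUN path below are placeholders you
would need to replace), IPMI fencing or SBD could be wired up with pcs along
these lines:

# IPMI fencing: one stonith resource per node
pcs stonith create fence_estrela fence_ipmilan pcmk_host_list=estrela \
    ipaddr=192.0.2.11 login=admin passwd=secret lanplus=1
pcs stonith create fence_rafeiro fence_ipmilan pcmk_host_list=rafeiro \
    ipaddr=192.0.2.12 login=admin passwd=secret lanplus=1

# ... or SBD on a small shared disk (e.g. an iSCSI LUN exported by a third node)
sbd -d /dev/disk/by-id/SHARED-LUN create        # initialise the device once
# then set SBD_DEVICE in /etc/sysconfig/sbd and enable sbd.service on both nodes

# in either case, turn fencing back on in pacemaker
pcs property set stonith-enabled=true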

> * avoid data corruption
> * enable automatic resolution of split-brains?
> 
> cheers
> ian
> 
>   
> 



Re: [ClusterLabs] DRBD split-brain investigations, automatic fixes and manual intervention...

2021-10-20 Thread Ian Diddams via Users
 

On Wednesday, 20 October 2021, 11:15:48 BST, Andrei Borzenkov 
 wrote:  
 
 
>You cannot resolve split brain without fencing. This is as simple as
>that. Your pacemaker configuration (from another mail) shows

> pcs -f clust_cfg property set stonith-enabled=false
> pcs -f clust_cfg property set no-quorum-policy=ignore

>This is a recipe for data corruption.
Thanks Andrei - I appreciate your feedback.  So the obvious question from this
bloke that's trying to get up to speed etc etc is

...  how do I set up fencing to

* avoid data corruption
* enable automatic resolution of split-brains?

cheers
ian



Re: [ClusterLabs] Antw: [EXT] DRBD split‑brain investigations, automatic fixes and manual intervention...

2021-10-20 Thread Klaus Wenninger
On Wed, Oct 20, 2021 at 12:06 PM Ian Diddams via Users <
users@clusterlabs.org> wrote:

> FWIW here is the basis for my implementation, being the "best" and most easily
> followed drbd/clustering guide/explanation I could find when I searched
>
> Lisenet.com :: Linux | Security | Networking | Admin Blog
> 
>
>
After all it is still a pacemaker cluster, so you can get the basics from
https://clusterlabs.org/pacemaker/doc/.
Pick "Clusters from Scratch" (the version matching your pacemaker version)
for an intuitive walkthrough.
It even has a section going through a DRBD setup.

Klaus

> cheers
>
> ian


Re: [ClusterLabs] Antw: [EXT] Coming in Pacemaker 2.1.2: better display of internal failures

2021-10-20 Thread Ken Gaillot
On Wed, 2021-10-20 at 09:35 +0200, Ulrich Windl wrote:
> > > > Ken Gaillot  wrote on 19.10.2021 at
> > > > 19:16 in
> message
> :
> > Hi all,
> > 
> > I hope to get the first release candidate for Pacemaker 2.1.2 out
> > in a
> > couple of weeks.
> > 
> > One improvement will be in status displays (crm_mon, and the
> > crm_resource --force-* options) for failed actions.
> > 
> > OCF resource agents already have the ability to output an "exit
> > reason"
> > for failures. These are displayed in the status, to give more
> > detailed
> > information than just "error".
> > 
> > Now, Pacemaker will set exit reasons for internal failures as well.
> > This includes problems such as an agent or systemd unit not being
> > installed, timeouts in Pacemaker communication as opposed to the
> > agent
> > itself, an agent process being killed by a signal, etc.
> > 
> > As an example, sending a kill -9 to a running agent monitor would
> > previously result in status with no explanation, requiring some log
> > diving to figure it out:
> > 
> >  * rsc1_monitor_6 on node1 'error' (1): call=188, status='Error',
> > exitreason='', last-rc-change='Fri Sep 24 14:45:02 2021', queued=0ms,
> > exec=0ms
> > 
> > Now, the exit reason will plainly say what happened:
> > 
> >  * rsc1_monitor_6 on node1 'error' (1): call=188, status='Error',
> > exitreason='Process interrupted by signal', last-rc-change='Fri Sep 24
> > 14:45:02 2021', queued=0ms, exec=0ms
> 
> Oops: When you detected that a process was terminated by a signal you
> would
> also know _which_ signal; why not log it then?
> And: Do you also detect and log when a core-dump was created?
> 
> That would just sound logical to me.
> 
> Regards,
> Ulrich

Yes, the log messages do have more detail -- the crm_mon display has to
be more concise, but it should at least give a strong pointer to what
to look for in the logs or elsewhere.
-- 
Ken Gaillot 



Re: [ClusterLabs] DRBD split-brain investigations, automatic fixes and manual intervention...

2021-10-20 Thread Andrei Borzenkov
On Wed, Oct 20, 2021 at 11:54 AM Ian Diddams via Users
 wrote:
>
> So - system logs recently show this
>
> ESTRELA
> Oct 18th
> Oct 18 04:04:28 wp-vldyn-estrela kernel: [584651.491139] drbd mysql01/0 
> drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from 
> peer node
> Oct 18 04:04:28 wp-vldyn-estrela kernel: [584651.491139] drbd mysql01/0 
> drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from 
> peer node
>
> Oct 19th
> Oct 19 03:45:43 wp-vldyn-estrela kernel: [47892.092191] drbd mysql01/0 drbd0: 
> Split-Brain detected but unresolved, dropping connection!
> Oct 19 03:45:43 wp-vldyn-estrela kernel: [47892.092191] drbd mysql01/0 drbd0: 
> Split-Brain detected but unresolved, dropping connection!
>
>
> RAFEIRO
> Oct 18
> Oct 18 04:04:28 wp-vldyn-rafeiro kernel: [584652.907126] drbd mysql01/0 
> drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from 
> this node
> Oct 18 04:04:28 wp-vldyn-rafeiro kernel: [584652.907126] drbd mysql01/0 
> drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from 
> this node
>
> Oct 19
> Oct 19 03:45:43 wp-vldyn-rafeiro kernel: [47864.401284] drbd mysql01/0 drbd0: 
> Split-Brain detected but unresolved, dropping connection!
> Oct 19 03:45:43 wp-vldyn-rafeiro kernel: [47864.401284] drbd mysql01/0 drbd0: 
> Split-Brain detected but unresolved, dropping connection!
>
>
>
> So on the 18th the split-brain issues was detected but (automatically?) fixed.
> But on the 19th it wasnt...
>

The 18th is probably the initial synchronization. In that case DRBD knows who
is who and which side is primary.

You cannot resolve split brain without fencing. This is as simple as
that. Your pacemaker configuration (from another mail) shows

> pcs -f clust_cfg property set stonith-enabled=false
> pcs -f clust_cfg property set no-quorum-policy=ignore

This is a recipe for data corruption.
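
(For completeness, a sketch of the usual manual recovery once you have decided
which node's data to throw away - the resource name mysql01 is taken from the
config quoted elsewhere in this thread, and the exact procedure is worth
checking against the DRBD documentation for your version:)

# on the node whose changes will be discarded (the split-brain "victim")
drbdadm disconnect mysql01
drbdadm secondary mysql01
drbdadm connect --discard-my-data mysql01

# on the surviving node, if it has also dropped to StandAlone
drbdadm connect mysql01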


Re: [ClusterLabs] Antw: [EXT] DRBD split‑brain investigations, automatic fixes and manual intervention...

2021-10-20 Thread Ian Diddams via Users
FWIW here is the basis for my implementation, being the "best" and most easily
followed drbd/clustering guide/explanation I could find when I searched
Lisenet.com :: Linux | Security | Networking | Admin Blog

cheers
ian


Re: [ClusterLabs] Antw: [EXT] DRBD split‑brain investigations, automatic fixes and manual intervention...

2021-10-20 Thread Ian Diddams via Users
 

  
>>So you drive without safety-belt and airbag (read: fencing)?

possibly?  probably?

As I said, I'm flying blind with all this - I was asked to try and implement it,
I've tried the best I can to implement it, but for all I know the how-tos and
advice I found have missed what may be needed.

I've looked for online tutorials but have failed to come up with anything much
aside from "do these commands and there you have it", which may not include
more belts and braces.

>> You said you have a SAN and you are using DRBD? Why?
you're asking the wrong person.  I'm given two systems, whose disks are on a
SAN.  Mine is not to reason why etc.

>> I wondered where the cluster is in those logs.
sorry - I've not understood the question here.


Happy to provide extracts from logs etc.  Below I've appended the "set up"
commands/steps used to implement drbd+pcs+corosync on the systems, if that helps
outline any more.

I'm just the guy that fires the bullets in effect, trying to aim as best he 
can...

ian

# prep
umount /var/lib/mysql
  - and remove /var/lib/mysql from /etc/fstab
yum remove mysql-community-server
cd /var/lib/mysql; rm -rf *
mkdir /var/lib/mysql
chown mysql:mysql /var/lib/mysql
chmod 755 /var/lib/mysql
reboot

yum makecache fast
yum -y install wget mlocate telnet lsof
updatedb


# Install Pacemaker and Corosync
yum install -y pcs
yum install -y policycoreutils-python
echo "passwd" | passwd hacluster --stdin
systemctl start pcsd.service
systemctl enable pcsd.service

#Configure Corosync
#[estrela]
pcs cluster auth estrela rafeiro -u hacluster -p passwd
pcs cluster setup --name mysql_cluster estrela rafeiro
pcs cluster start --all

## Install DRBD
## BOTH

yum install -y kmod-drbd90
yum install -y drbd90-utils
systemctl enable corosync
systemctl enable pacemaker
reboot

# when back up
modprobe drbd
systemctl status pcsd
systemctl status corosync
systemctl status pacemaker

cat << EOL >/etc/drbd.d/mysql01.res
resource mysql01 {
 protocol C;
 meta-disk internal;
 device /dev/drbd0;
 disk   /dev/vg_mysql/lv_mysql;
 handlers {
  split-brain "/usr/lib/drbd/notify-split-brain.sh root";
 }
 net {
  allow-two-primaries no;
  after-sb-0pri discard-zero-changes;
  after-sb-1pri discard-secondary;
  after-sb-2pri disconnect;
  rr-conflict disconnect;
 }
 disk {
  on-io-error detach;
 }
 syncer {
  verify-alg sha1;
 }
 on estrela {
  address  10.108.248.165:7789;
 }
 on rafeiro {
  address  10.108.248.166:7789;
 }
}
EOL
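# Note on the after-sb-* policies in the net{} section above (summarised from
# the DRBD documentation - double-check against the drbd.conf man page for your
# version):
#   after-sb-0pri discard-zero-changes : neither node was primary; auto-resolve only
#                                        when one side has no new data, otherwise disconnect
#   after-sb-1pri discard-secondary    : one node was primary; discard the secondary's changes
#   after-sb-2pri disconnect           : both nodes were primary; no automatic resolution,
#                                        drop the connection and wait for manual recovery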

# clear previously created filesystem
dd if=/dev/zero of=/dev/vg_mysql/lv_mysql bs=1M count=128


drbdadm create-md mysql01
systemctl start drbd
systemctl enable drbd
systemctl status drbd

#[estrela]
drbdadm primary --force mysql01
estrela: cat /sys/kernel/debug/drbd/resources/mysql01/connections/rafeiro/0/proc_drbd
rafeiro: cat /sys/kernel/debug/drbd/resources/mysql01/connections/estrela/0/proc_drbd
drbdadm status

# WAIT UNTIL DRBD IS SYNCED

#[estrela]
mkfs.xfs -f  -L drbd /dev/drbd0
mount /dev/drbd0 /mnt


## INSTALL MYSQL on all
## BOTH
yum install mysql-server -y

# [estrela]
mysql_install_db --datadir=/mnt --user=mysql
systemctl stop mysqld
umount /mnt

#BOTH  
cp -p /etc/my.cnf /etc/my.cnf.ORIG
# set up my.cnf as needed (migrated from existing mysql server)

#BOTH
mv /var/lib/mysql /var/lib/mysql.orig
mkdir /var/lib/mysql
chown mysql:mysql /var/lib/mysql
chmod 751 /var/lib/mysql
mkdir  /var/lib/mysql/innodb
chown mysql:mysql /var/lib/mysql/innodb
chmod 755 /var/lib/mysql/innodb


# estrela
mount /dev/drbd0 /var/lib/mysql
systemctl start mysqld

# set up mysql
grep 'temporary password' /var/log/mysqld.log
mysql_secure_installation
rm /root/.mysql_secret


# set up grants

flush privileges;

# test grants
[estrela]# mysql -uroot --skip-column-names -A -e"SELECT CONCAT('SHOW GRANTS FOR ''',user,'''@''',host,''';') FROM mysql.user WHERE user<>''" | mysql -uroot --skip-column-names -A | sed 's/$/;/g'
[rafeiro]# mysql -h estrela -uroot --skip-column-names -A -e"SELECT CONCAT('SHOW GRANTS FOR ''',user,'''@''',host,''';') FROM mysql.user WHERE user<>''" | mysql -hestrela -uroot --skip-column-names -A | sed 's/$/;/g'
mysql -h mysqldbdynabookHA -uroot --skip-column-names -A -e"SELECT CONCAT('SHOW GRANTS FOR ''',user,'''@''',host,''';') FROM mysql.user WHERE user<>''" | mysql -hestrela -uroot --skip-column-names -A | sed 's/$/;/g'
# stop test_200

# [estrela]
systemctl stop mysqld
umount /var/lib/mysql

# snapshot servers - pre-clustered

# Configure Pacemaker Cluster
# [estrela]
pcs cluster cib clust_cfg


pcs -f clust_cfg property set stonith-enabled=false
pcs -f clust_cfg property set no-quorum-policy=ignore
pcs -f clust_cfg resource defaults resource-stickiness=200
pcs -f clust_cfg resource create mysql_data01 ocf:linbit:drbd \
    drbd_resource=mysql01 op monitor interval=30s
pcs -f clust_cfg resource master MySQLClone01 mysql_data01 master-max=1 \
    master-node-max=1 clone-max=2 clone-node-max=1 notify=true
#pcs -f clust_cfg resource create mysql_fs01 Filesystem device="/dev/drbd0" \
#    directory="/var/lib/mysql" fstype="ext4"

[ClusterLabs] Antw: [EXT] DRBD split‑brain investigations, automatic fixes and manual intervention...

2021-10-20 Thread Ulrich Windl
>>> Ian Diddams via Users  wrote on 20.10.2021 at
>>> 10:54 in
message <527856924.6994967.1634720079...@mail.yahoo.com>:
> I've been testing an implementation of a HA mysql cluster for a few months
> now. I came to this project with no prior knowledge of what was
> concerned/needed and have learned organically via various online how-tos
> and web sites which in many cases were slightly out-of-date or missing large
> chunks of pertinent information.  That's not a criticism at all of those still
> helpful aids, but more an indication of how there are huge holes in my
> knowledge..
> 
> So with that background ...
> 
> The cluster consists of 2 centos7 servers (estrela and rafeiro) running
> DRBD 9.0
> corosync 2.4.5
> pacemaker 0.9.169
> On the whole it's all running fine with some squeaks that we are hoping are
> down to underlying SAN issues.
> 
>  However...
> earlier this week we had some split-brain issues - some of which seem to
> have fixed themselves, others not.  What we did notice was that whilst the
> split-brain was being reported the overall cluster remained up (of course?)

So you drive without safety-belt and airbag (read: fencing)?

> in that the VIP remained up, and the mysql instance remained available via
> the VIP on port 3306. The underlying concern being of course that had a
> "flip" occurred from the previous master to the previous slave, the new master's
> drbd device (mounted on /var/lib/mysql) may well be out of sync and thus
> contain "old" data.
> 
> So - system logs recently show this
> 
> ESTRELA
> Oct 18th
> Oct 18 04:04:28 wp-vldyn-estrela kernel: [584651.491139] drbd mysql01/0 
> drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from 
> peer node
> Oct 18 04:04:28 wp-vldyn-estrela kernel: [584651.491139] drbd mysql01/0 
> drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from 
> peer node

You said you have a SAN and you are using DRBD? Why?

> 
> Oct 19th
> Oct 19 03:45:43 wp-vldyn-estrela kernel: [47892.092191] drbd mysql01/0 
> drbd0: Split-Brain detected but unresolved, dropping connection!
> Oct 19 03:45:43 wp-vldyn-estrela kernel: [47892.092191] drbd mysql01/0 
> drbd0: Split-Brain detected but unresolved, dropping connection!
> 
> 
> RAFEIRO
> Oct 18
> Oct 18 04:04:28 wp-vldyn-rafeiro kernel: [584652.907126] drbd mysql01/0 
> drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from 
> this node
> Oct 18 04:04:28 wp-vldyn-rafeiro kernel: [584652.907126] drbd mysql01/0 
> drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from 
> this node
> 
> Oct 19
> Oct 19 03:45:43 wp-vldyn-rafeiro kernel: [47864.401284] drbd mysql01/0 
> drbd0: Split-Brain detected but unresolved, dropping connection!
> Oct 19 03:45:43 wp-vldyn-rafeiro kernel: [47864.401284] drbd mysql01/0 
> drbd0: Split-Brain detected but unresolved, dropping connection!
> 
> 
> 
> So on the 18th the split-brain issue was detected but (automatically?)
> fixed.
> But on the 19th it wasn't...
> 
> Any ideas how to investigate why it worked on the 18th and not the 19th?  I
> am presuming the drbd config is set up to automatically fix stuff but maybe
> we just got lucky and it isn't?  (I've googled automatic fixes but I am afraid
> I can't follow what I'm being told/reading :-(  )

I wondered where the cluster is in those logs.

> 
> drbd config below
> ta
> ian
> 
> ==
> ESTRELA
> resource mysql01 {
>  protocol C;
>  meta-disk internal;
>  device /dev/drbd0;
>  disk   /dev/vg_mysql/lv_mysql;
>  handlers {
>   split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>  }
>  net {
>   allow-two-primaries no;
>   after-sb-0pri discard-zero-changes;
>   after-sb-1pri discard-secondary;
>   after-sb-2pri disconnect;
>   rr-conflict disconnect;
>  }
>  disk {
>   on-io-error detach;
>  }
>  syncer {
>   verify-alg sha1;
>  }
>  on estrela {
>   address  10.108.248.165:7789;
>  }
>  on rafeiro {
>   address  10.108.248.166:7789;
>  }
> }
> 
> 
> 
> RAFEIRO
> resource mysql01 {
>  protocol C;
>  meta-disk internal;
>  device /dev/drbd0;
>  disk   /dev/vg_mysql/lv_mysql;
>  handlers {
>   split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>  }
>  net {
>   allow-two-primaries no;
>   after-sb-0pri discard-zero-changes;
>   after-sb-1pri discard-secondary;
>   after-sb-2pri disconnect;
>   rr-conflict disconnect;
>  }
>  disk {
>   on-io-error detach;
>  }
>  syncer {
>   verify-alg sha1;
>  }
>  on estrela {
>   address  10.108.248.165:7789;
>  }
>  on rafeiro {
>   address  10.108.248.166:7789;
>  }
> }





[ClusterLabs] DRBD split-brain investigations, automatic fixes and manual intervention...

2021-10-20 Thread Ian Diddams via Users
I've been testing an implementation of a HA mysql cluster for a few months now.
I came to this project with no prior knowledge of what was concerned/needed
and have learned organically via various online how-tos and web sites which in
many cases were slightly out-of-date or missing large chunks of pertinent
information.  That's not a criticism at all of those still helpful aids, but
more an indication of how there are huge holes in my knowledge..

So with that background ...

The cluster consists of 2 centos7 servers (estrela and rafeiro) running
DRBD 9.0
corosync 2.4.5
pacemaker 0.9.169
On the whole it's all running fine with some squeaks that we are hoping are down
to underlying SAN issues.

However...
earlier this week we had some split-brain issues - some of which seem to have
fixed themselves, others not.  What we did notice was that whilst the split-brain
was being reported the overall cluster remained up (of course?) in that the VIP
remained up, and the mysql instance remained available via the VIP on port
3306. The underlying concern being of course that had a "flip" occurred from
the previous master to the previous slave, the new master's drbd device (mounted
on /var/lib/mysql) may well be out of sync and thus contain "old" data.

So - system logs recently show this

ESTRELA
Oct 18th
Oct 18 04:04:28 wp-vldyn-estrela kernel: [584651.491139] drbd mysql01/0 drbd0: 
Split-Brain detected, 1 primaries, automatically solved. Sync from peer node
Oct 18 04:04:28 wp-vldyn-estrela kernel: [584651.491139] drbd mysql01/0 drbd0: 
Split-Brain detected, 1 primaries, automatically solved. Sync from peer node

Oct 19th
Oct 19 03:45:43 wp-vldyn-estrela kernel: [47892.092191] drbd mysql01/0 drbd0: 
Split-Brain detected but unresolved, dropping connection!
Oct 19 03:45:43 wp-vldyn-estrela kernel: [47892.092191] drbd mysql01/0 drbd0: 
Split-Brain detected but unresolved, dropping connection!


RAFEIRO
Oct 18
Oct 18 04:04:28 wp-vldyn-rafeiro kernel: [584652.907126] drbd mysql01/0 drbd0: 
Split-Brain detected, 1 primaries, automatically solved. Sync from this node
Oct 18 04:04:28 wp-vldyn-rafeiro kernel: [584652.907126] drbd mysql01/0 drbd0: 
Split-Brain detected, 1 primaries, automatically solved. Sync from this node

Oct 19
Oct 19 03:45:43 wp-vldyn-rafeiro kernel: [47864.401284] drbd mysql01/0 drbd0: 
Split-Brain detected but unresolved, dropping connection!
Oct 19 03:45:43 wp-vldyn-rafeiro kernel: [47864.401284] drbd mysql01/0 drbd0: 
Split-Brain detected but unresolved, dropping connection!



So on the 18th the split-brain issue was detected but (automatically?) fixed.
But on the 19th it wasn't...

Any ideas how to investigate why it worked on the 18th and not the 19th?  I am
presuming the drbd config is set up to automatically fix stuff but maybe we
just got lucky and it isn't?  (I've googled automatic fixes but I am afraid I
can't follow what I'm being told/reading :-(  )

drbd config below
ta
ian

==
ESTRELA
resource mysql01 {
 protocol C;
 meta-disk internal;
 device /dev/drbd0;
 disk   /dev/vg_mysql/lv_mysql;
 handlers {
  split-brain "/usr/lib/drbd/notify-split-brain.sh root";
 }
 net {
  allow-two-primaries no;
  after-sb-0pri discard-zero-changes;
  after-sb-1pri discard-secondary;
  after-sb-2pri disconnect;
  rr-conflict disconnect;
 }
 disk {
  on-io-error detach;
 }
 syncer {
  verify-alg sha1;
 }
 on estrela {
  address  10.108.248.165:7789;
 }
 on rafeiro {
  address  10.108.248.166:7789;
 }
}



RAFEIRO
resource mysql01 {
 protocol C;
 meta-disk internal;
 device /dev/drbd0;
 disk   /dev/vg_mysql/lv_mysql;
 handlers {
  split-brain "/usr/lib/drbd/notify-split-brain.sh root";
 }
 net {
  allow-two-primaries no;
  after-sb-0pri discard-zero-changes;
  after-sb-1pri discard-secondary;
  after-sb-2pri disconnect;
  rr-conflict disconnect;
 }
 disk {
  on-io-error detach;
 }
 syncer {
  verify-alg sha1;
 }
 on estrela {
  address  10.108.248.165:7789;
 }
 on rafeiro {
  address  10.108.248.166:7789;
 }
}








[ClusterLabs] Antw: [EXT] Re: Coming in Pacemaker 2.1.2: better display of internal failures

2021-10-20 Thread Ulrich Windl
>>> "Walker, Chris"  schrieb am 19.10.2021 um 20:11
in
Nachricht


> That looks great … is that a string that an RA can set on failure?  I'd love
> to be able to communicate RA-specific failure reasons back to crm_mon
> consumers…

What about:
ocf_exit_reason "$0 $1 not implemented"
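
(A minimal illustrative sketch of how that looks inside a shell RA - the agent
logic, binary name and daemon here are made up; only ocf_exit_reason and the
OCF return codes come from ocf-shellfuncs:)

: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs

mydaemon_monitor() {
    # give crm_mon a specific reason instead of a bare return code
    if [ ! -x /usr/sbin/mydaemon ]; then
        ocf_exit_reason "mydaemon binary not installed"
        return $OCF_ERR_INSTALLED
    fi
    if ! pgrep -x mydaemon >/dev/null 2>&1; then
        ocf_exit_reason "mydaemon is not running"
        return $OCF_NOT_RUNNING
    fi
    return $OCF_SUCCESS
}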

Regards,
Ulrich

> Thanks!
> Chris
> 
> From: Users 
> Date: Tuesday, October 19, 2021 at 1:17 PM
> To: users@clusterlabs.org 
> Subject: [ClusterLabs] Coming in Pacemaker 2.1.2: better display of internal

> failures
> Hi all,
> 
> I hope to get the first release candidate for Pacemaker 2.1.2 out in a
> couple of weeks.
> 
> One improvement will be in status displays (crm_mon, and the
> crm_resource --force-* options) for failed actions.
> 
> OCF resource agents already have the ability to output an "exit reason"
> for failures. These are displayed in the status, to give more detailed
> information than just "error".
> 
> Now, Pacemaker will set exit reasons for internal failures as well.
> This includes problems such as an agent or systemd unit not being
> installed, timeouts in Pacemaker communication as opposed to the agent
> itself, an agent process being killed by a signal, etc.
> 
> As an example, sending a kill -9 to a running agent monitor would
> previously result in status with no explanation, requiring some log
> diving to figure it out:
> 
>  * rsc1_monitor_6 on node1 'error' (1): call=188, status='Error',
> exitreason='', last-rc-change='Fri Sep 24 14:45:02 2021', queued=0ms,
> exec=0ms
> 
> Now, the exit reason will plainly say what happened:
> 
>  * rsc1_monitor_6 on node1 'error' (1): call=188, status='Error',
> exitreason='Process interrupted by signal', last-rc-change='Fri Sep 24
> 14:45:02 2021', queued=0ms, exec=0ms
> 
> --
> Ken Gaillot 
> 





[ClusterLabs] Antw: [EXT] Coming in Pacemaker 2.1.2: better display of internal failures

2021-10-20 Thread Ulrich Windl
>>> Ken Gaillot  wrote on 19.10.2021 at 19:16 in
message
:
> Hi all,
> 
> I hope to get the first release candidate for Pacemaker 2.1.2 out in a
> couple of weeks.
> 
> One improvement will be in status displays (crm_mon, and the
> crm_resource --force-* options) for failed actions.
> 
> OCF resource agents already have the ability to output an "exit reason"
> for failures. These are displayed in the status, to give more detailed
> information than just "error".
> 
> Now, Pacemaker will set exit reasons for internal failures as well.
> This includes problems such as an agent or systemd unit not being
> installed, timeouts in Pacemaker communication as opposed to the agent
> itself, an agent process being killed by a signal, etc.
> 
> As an example, sending a kill -9 to a running agent monitor would
> previously result in status with no explanation, requiring some log
> diving to figure it out:
> 
>  * rsc1_monitor_6 on node1 'error' (1): call=188, status='Error',
> exitreason='', last-rc-change='Fri Sep 24 14:45:02 2021', queued=0ms,
> exec=0ms
> 
> Now, the exit reason will plainly say what happened:
> 
>  * rsc1_monitor_6 on node1 'error' (1): call=188, status='Error',
> exitreason='Process interrupted by signal', last-rc-change='Fri Sep 24
> 14:45:02 2021', queued=0ms, exec=0ms

Oops: When you detected that a process was terminated by a signal you would
also know _which_ signal; why not log it then?
And: Do you also detect and log when a core-dump was created?

That would just sound logical to me.

Regards,
Ulrich




Re: [ClusterLabs] Is there a DRBD forum?

2021-10-20 Thread Digimer

  
  
On 2021-10-19 04:27, Ian Diddams via Users wrote:

> Rather than clog up what I perceive as a pacemaker/corosync forum, is there
> a DRBD forum I could ask a query to?
>
> (FWIW I'm trying to find a way to specifically log drbd to a separate log
> other than the system log via its kernel logging?)

DRBD, along with any and all open source projects related to high
availability, is welcome and on-topic. So no worries there.

Separately, there is also a dedicated DRBD users' list at
drbd-u...@lists.linbit.com, and they maintain a Slack on "linbit-community".

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
  

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/