[ClusterLabs] Help with tweaking an active/passive NFS cluster

Ronny Adsetts Thu, 30 Mar 2023 14:42:09 -0700

Hi,

I wonder if someone more familiar with the workings of pacemaker/corosync would 
be able to assist in solving an issue.


I have a 3-node NFS cluster which exports several iSCSI LUNs. The LUNs are 
presented to the nodes via multipathd.

This all works fine, except that I can't stop just one export. Sometimes I need
to take a single filesystem offline for maintenance, and sometimes a filesystem
fails and can't come back.

There's a trimmed-down config below. Essentially, I want all the NFS exports
on one node, but I don't want any one export to block the others: it should be
OK to stop (or fail) a single export.

My config has a group for each export and its filesystem, and another group for
the NFS server and VIP. I then colocate all the groups.

Cut-down config to limit the number of exports:

node 1: nfs-01
node 2: nfs-02
node 3: nfs-03
primitive NFSExportAdminHomes exportfs \
        params clientspec="172.16.40.0/24" options="rw,async,no_root_squash" \
        directory="/srv/adminhomes" fsid=dcfd1bbb-c026-4d6d-8541-7fc29d6fef1a \
        op monitor timeout=20 interval=10 \
        op_params interval=10
primitive NFSExportArchive exportfs \
        params clientspec="172.16.40.0/24" options="rw,async,no_root_squash" \
        directory="/srv/archive" fsid=3abb6e34-bff2-4896-b8ff-fc1123517359 \
        op monitor timeout=20 interval=10 \
        op_params interval=10 \
        meta target-role=Started
primitive NFSExportDBBackups exportfs \
        params clientspec="172.16.40.0/24" options="rw,async,no_root_squash" \
        directory="/srv/dbbackups" fsid=df58b9c0-593b-45c0-9923-155b3d7d9483 \
        op monitor timeout=20 interval=10 \
        op_params interval=10
primitive NFSFSAdminHomes Filesystem \
        params device="/dev/mapper/adminhomes-part1" \
        directory="/srv/adminhomes" fstype=xfs \
        op start interval=0 timeout=120 \
        op monitor interval=60 timeout=60 \
        op_params OCF_CHECK_LEVEL=20 \
        op stop interval=0 timeout=240
primitive NFSFSArchive Filesystem \
        params device="/dev/mapper/archive-part1" directory="/srv/archive" \
        fstype=xfs \
        op start interval=0 timeout=120 \
        op monitor interval=60 timeout=60 \
        op_params OCF_CHECK_LEVEL=20 \
        op stop interval=0 timeout=240 \
        meta target-role=Started
primitive NFSFSDBBackups Filesystem \
        params device="/dev/mapper/dbbackups-part1" directory="/srv/dbbackups" \
        fstype=xfs \
        op start timeout=60 interval=0 \
        op monitor interval=20 timeout=40 \
        op stop timeout=60 interval=0 \
        op_params OCF_CHECK_LEVEL=20
primitive NFSIP-01 IPaddr2 \
        params ip=172.16.40.17 cidr_netmask=24 nic=ens14 \
        op monitor interval=30s
group AdminHomes NFSFSAdminHomes NFSExportAdminHomes \
        meta target-role=Started
group Archive NFSFSArchive NFSExportArchive \
        meta target-role=Started
group DBBackups NFSFSDBBackups NFSExportDBBackups \
        meta target-role=Started
group NFSServerIP NFSIP-01 NFSServer \
        meta target-role=Started
colocation NFSMaster inf: NFSServerIP AdminHomes Archive DBBackups
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=2.0.1-9e909a5bdd \
        cluster-infrastructure=corosync \
        cluster-name=nfs-cluster \
        stonith-enabled=false \
        last-lrm-refresh=1675344768
rsc_defaults rsc-options: \
        resource-stickiness=200


The problem is that if one export fails, none of the following exports will be 
attempted. Reading the docs, that's to be expected as each item in the 
colocation needs the preceding item to succeed.

I tried changing the colocation line like so to remove the dependency:

colocation NFSMaster inf: NFSServerIP ( AdminHomes Archive DBBackups )

but this gave me two problems:

1. Issuing a "resource stop DBBackups" took everything offline briefly

2. Issuing a "resource start DBBackups" brought it back on a different node to 
NFSServerIP 
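
For comparison, here is a sketch of one alternative that may be worth testing:
a separate two-resource colocation per export group instead of a single set.
In crmsh's two-resource form the first resource is placed relative to the
second, so each export group would follow NFSServerIP without depending on the
other groups. The constraint IDs are made-up names, and this has not been
verified on this cluster or on Pacemaker 2.0.1.

```
# Untested sketch; the constraint IDs are hypothetical.
# Each export group is colocated with NFSServerIP individually,
# so the export groups do not depend on one another.
colocation AdminHomesWithNFS inf: AdminHomes NFSServerIP
colocation ArchiveWithNFS inf: Archive NFSServerIP
colocation DBBackupsWithNFS inf: DBBackups NFSServerIP
```

Because the scores are inf, stopping NFSServerIP would still be expected to
stop every export group, while stopping a single export group should leave the
others and NFSServerIP running.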

I'm very obviously missing something here.

Could someone kindly point me in the right direction?

TIA.

Ronny

-- 
Ronny Adsetts
Technical Director
Amazing Internet Ltd, London
t: +44 20 8977 8943
w: www.amazinginternet.com

Registered office: 85 Waldegrave Park, Twickenham, TW1 4TJ
Registered in England. Company No. 4042957

