Hello!
> I just wonder: Does "UNCLEAN (online)" mean you have no fencing configured? I have setup ssh fencing just for testing purposes. Kr, Gregory -----Original Message----- From: Users <users-boun...@clusterlabs.org> On Behalf Of Ulrich Windl Sent: 15 July 2020 09:26 To: users@clusterlabs.org Subject: [ClusterLabs] Antw: [EXT] Automatic restart of Pacemaker after reboot and filesystem unmount problem Hi! I just wonder: Does "UNCLEAN (online)" mean you have no fencing configured? >>> Grégory Sacré <gregory.sa...@s-clinica.com> schrieb am 14.07.2020 um >>> 13:56 in Nachricht <fe83686a028367468065016ac856b02107b7d...@ias-ex10-01.s-clinica.int>: > Dear all, > > > I'm pretty new to Pacemaker so I must be missing something but I > cannot find > it in the documentation. > > I'm setting up a SAMBA File Server cluster with DRBD and Pacemaker. > Here are > the relevant pcs commands related to the mount part: > > user $ sudo pcs cluster cib fs_cfg > user $ sudo pcs ‑f fs_cfg resource create VPSFSMount Filesystem > device="/dev/drbd1" directory="/srv/vps‑fs" fstype="gfs2" > "options=acl,noatime" > Assumed agent name 'ocf:heartbeat:Filesystem' (deduced from > 'Filesystem') > > It all works fine, here is an extract of the pcs status command: > > user $ sudo pcs status > Cluster name: vps‑fs > Stack: corosync > Current DC: vps‑fs‑04 (version 1.1.18‑2b07d5c5a9) ‑ partition with > quorum Last updated: Tue Jul 14 11:13:55 2020 Last change: Tue Jul 14 > 10:31:36 2020 by root via cibadmin on vps‑fs‑04 > > 2 nodes configured > 7 resources configured > > Online: [ vps‑fs‑03 vps‑fs‑04 ] > > Full list of resources: > > stonith_vps‑fs (stonith:external/ssh): Started vps‑fs‑04 Clone Set: > dlm‑clone [dlm] > Started: [ vps‑fs‑03 vps‑fs‑04 ] > Master/Slave Set: VPSFSClone [VPSFS] > Masters: [ vps‑fs‑03 vps‑fs‑04 ] > Clone Set: VPSFSMount‑clone [VPSFSMount] > Started: [ vps‑fs‑03 vps‑fs‑04 ] > > Daemon Status: > corosync: active/enabled > pacemaker: active/enabled > pcsd: active/enabled > > I can start CTDB (SAMBA cluster manager manually) and it's fine. > However, CTDB shares a lock file between both nodes which is located > on the shared mount point. > > The problem comes from the moment I reboot one of the servers > (vps‑fs‑04) and > Pacemaker (and Corosync) are started automatically upon boot (I'm > talking about unexpected reboot, not maintenance reboot which I didn't try > yet). 
> After reboot, the server (vps-fs-04) comes back online and rejoins the
> cluster, but the one that wasn't rebooted has an issue with the mount
> resource:
>
> user $ sudo pcs status
> Cluster name: vps-fs
> Stack: corosync
> Current DC: vps-fs-03 (version 1.1.18-2b07d5c5a9) - partition with quorum
> Last updated: Tue Jul 14 11:33:44 2020
> Last change: Tue Jul 14 10:31:36 2020 by root via cibadmin on vps-fs-04
>
> 2 nodes configured
> 7 resources configured
>
> Node vps-fs-03: UNCLEAN (online)
> Online: [ vps-fs-04 ]
>
> Full list of resources:
>
>  stonith_vps-fs        (stonith:external/ssh): Started vps-fs-03
>  Clone Set: dlm-clone [dlm]
>      Started: [ vps-fs-03 vps-fs-04 ]
>  Master/Slave Set: VPSFSClone [VPSFS]
>      Masters: [ vps-fs-03 ]
>      Slaves: [ vps-fs-04 ]
>  Clone Set: VPSFSMount-clone [VPSFSMount]
>      VPSFSMount (ocf::heartbeat:Filesystem): FAILED vps-fs-03
>      Stopped: [ vps-fs-04 ]
>
> Failed Actions:
> * VPSFSMount_stop_0 on vps-fs-03 'unknown error' (1): call=65,
>   status=Timed Out, exitreason='Couldn't unmount /srv/vps-fs; trying
>   cleanup with KILL', last-rc-change='Tue Jul 14 11:23:46 2020',
>   queued=0ms, exec=60011ms
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
> The problem seems to come from the fact that the mount point (/srv/vps-fs)
> is busy (probably the CTDB lock file), but what I don't understand is why
> the server that wasn't rebooted (vps-fs-03) needs to remount an already
> mounted file system when the other node comes back online.
>
> I've checked the 'ocf:heartbeat:Filesystem' documentation, but nothing
> seemed to help. The only thing I changed was the following:
>
> user $ sudo pcs resource update VPSFSMount fast_stop="no" \
>     op monitor timeout="60"
>
> However, this didn't help. Google doesn't give me much either (but maybe
> I'm not searching for the right thing).
>
> Thank you in advance for any pointers!
>
> Kr,
>
> Gregory
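For context on the fencing question: external/ssh can only fence a node whose
OS is still healthy enough to accept an SSH login, so it is unsuitable outside
of testing; a failed or hung fence attempt is one way a peer ends up marked
UNCLEAN. A minimal sketch of a production fence device, assuming
IPMI-capable hosts; the device name, address, and credentials below are
placeholders rather than values from this thread, and fence_ipmilan parameter
names vary between fence-agents versions:

user $ sudo pcs stonith create fence_vps-fs-03 fence_ipmilan \
      pcmk_host_list="vps-fs-03" ipaddr="192.0.2.3" \
      login="admin" passwd="secret" \
      op monitor interval=60s
user $ sudo pcs constraint location fence_vps-fs-03 avoids vps-fs-03

A two-node cluster needs one such device per node (or a shared power/fabric
fence) so that either node can be powered off even when it is unresponsive.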
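The status output above does not show the ordering and colocation constraints
that a DRBD + DLM + GFS2 stack normally carries, so the constraints actually
configured in this cluster are unknown. For completeness, a minimal sketch of
the usual shape, reusing the resource names from this thread:

user $ sudo pcs constraint order start dlm-clone then VPSFSMount-clone
user $ sudo pcs constraint order promote VPSFSClone then start VPSFSMount-clone
user $ sudo pcs constraint colocation add VPSFSMount-clone with master VPSFSClone

These make sure the GFS2 mount only runs where DLM is up and DRBD is promoted,
and that it is stopped before either of those is taken down.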
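As for the stop timeout itself: because CTDB is started by hand, Pacemaker
knows nothing about it, and its lock file keeps /srv/vps-fs busy whenever the
cluster decides to stop the Filesystem resource, which is why the stop times
out and escalates. One way out is to let Pacemaker manage CTDB as well,
ordered after the mount, so it is stopped (and the lock file released) before
the unmount is attempted. A minimal sketch, assuming the ocf:heartbeat:CTDB
agent from the resource-agents package; the recovery-lock path is a
placeholder:

user $ sudo pcs resource create ctdb ocf:heartbeat:CTDB \
      ctdb_recovery_lock="/srv/vps-fs/ctdb/.ctdb.lock" \
      op monitor interval=10s --clone
user $ sudo pcs constraint order start VPSFSMount-clone then ctdb-clone
user $ sudo pcs constraint colocation add ctdb-clone with VPSFSMount-clone

With the order constraint in place, stopping VPSFSMount-clone on a node first
stops the ctdb clone there, so nothing should be holding the mount busy.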