Hi! I just wonder: Does "UNCLEAN (online)" mean you have no fencing configured?
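
For a GFS2/DLM setup, working fencing is essential: DLM blocks lock traffic whenever a node
needs to be fenced, and a failed stop (as in the status below) also leaves the node UNCLEAN
until fencing succeeds. Note that stonith:external/ssh is intended for testing only and cannot
fence a node that is hung or unreachable. As a quick check (just a sketch, assuming the pcs 0.9
command set that matches Pacemaker 1.1.18):

user $ sudo pcs property show stonith-enabled
user $ sudo pcs stonith show --full
user $ sudo pcs stonith fence vps-fs-04   # actually reboots the node, so test with care
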
>>> Grégory Sacré <gregory.sa...@s-clinica.com> wrote on 14.07.2020 at 13:56 in message
<fe83686a028367468065016ac856b02107b7d...@ias-ex10-01.s-clinica.int>:
> Dear all,
>
>
> I'm pretty new to Pacemaker, so I must be missing something, but I cannot
> find it in the documentation.
>
> I'm setting up a SAMBA file server cluster with DRBD and Pacemaker. Here are
> the relevant pcs commands related to the mount part:
>
> user $ sudo pcs cluster cib fs_cfg
> user $ sudo pcs -f fs_cfg resource create VPSFSMount Filesystem \
>     device="/dev/drbd1" directory="/srv/vps-fs" fstype="gfs2" \
>     "options=acl,noatime"
> Assumed agent name 'ocf:heartbeat:Filesystem' (deduced from 'Filesystem')
>
> It all works fine; here is an extract of the pcs status output:
>
> user $ sudo pcs status
> Cluster name: vps-fs
> Stack: corosync
> Current DC: vps-fs-04 (version 1.1.18-2b07d5c5a9) - partition with quorum
> Last updated: Tue Jul 14 11:13:55 2020
> Last change: Tue Jul 14 10:31:36 2020 by root via cibadmin on vps-fs-04
>
> 2 nodes configured
> 7 resources configured
>
> Online: [ vps-fs-03 vps-fs-04 ]
>
> Full list of resources:
>
>  stonith_vps-fs       (stonith:external/ssh): Started vps-fs-04
>  Clone Set: dlm-clone [dlm]
>      Started: [ vps-fs-03 vps-fs-04 ]
>  Master/Slave Set: VPSFSClone [VPSFS]
>      Masters: [ vps-fs-03 vps-fs-04 ]
>  Clone Set: VPSFSMount-clone [VPSFSMount]
>      Started: [ vps-fs-03 vps-fs-04 ]
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
> I can start CTDB (the SAMBA cluster manager) manually and it works fine.
> However, CTDB shares a lock file between both nodes, and that file lives on
> the shared mount point.
>
> The problem starts the moment I reboot one of the servers (vps-fs-04) and
> Pacemaker (and Corosync) are started automatically on boot (I'm talking about
> an unexpected reboot, not a maintenance reboot, which I haven't tried yet).
> After the reboot, the server (vps-fs-04) comes back online and rejoins the
> cluster, but the node that was not rebooted now has an issue with the mount
> resource:
>
> user $ sudo pcs status
> Cluster name: vps-fs
> Stack: corosync
> Current DC: vps-fs-03 (version 1.1.18-2b07d5c5a9) - partition with quorum
> Last updated: Tue Jul 14 11:33:44 2020
> Last change: Tue Jul 14 10:31:36 2020 by root via cibadmin on vps-fs-04
>
> 2 nodes configured
> 7 resources configured
>
> Node vps-fs-03: UNCLEAN (online)
> Online: [ vps-fs-04 ]
>
> Full list of resources:
>
>  stonith_vps-fs       (stonith:external/ssh): Started vps-fs-03
>  Clone Set: dlm-clone [dlm]
>      Started: [ vps-fs-03 vps-fs-04 ]
>  Master/Slave Set: VPSFSClone [VPSFS]
>      Masters: [ vps-fs-03 ]
>      Slaves: [ vps-fs-04 ]
>  Clone Set: VPSFSMount-clone [VPSFSMount]
>      VPSFSMount (ocf::heartbeat:Filesystem): FAILED vps-fs-03
>      Stopped: [ vps-fs-04 ]
>
> Failed Actions:
> * VPSFSMount_stop_0 on vps-fs-03 'unknown error' (1): call=65,
>   status=Timed Out, exitreason='Couldn't unmount /srv/vps-fs; trying cleanup
>   with KILL', last-rc-change='Tue Jul 14 11:23:46 2020', queued=0ms,
>   exec=60011ms
>
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
> The problem seems to come from the fact that the mount point (/srv/vps-fs) is
> busy (probably the CTDB lock file), but what I don't understand is why the
> node that was not rebooted (vps-fs-03) needs to remount an already mounted
> file system at all when the other node comes back online.
>
> I've checked the 'ocf:heartbeat:Filesystem' documentation, but nothing there
> seemed to help.
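
The snippet above only shows the shadow-CIB commands for the mount itself, so the
ordering/colocation with DLM and the DRBD master, and the push of the shadow CIB, are
presumably configured elsewhere. For reference, a minimal sketch of that part (resource names
taken from the status output above, everything else assumed):

user $ sudo pcs -f fs_cfg constraint order start dlm-clone then VPSFSMount-clone
user $ sudo pcs -f fs_cfg constraint order promote VPSFSClone then start VPSFSMount-clone
user $ sudo pcs -f fs_cfg constraint colocation add VPSFSMount-clone with master VPSFSClone INFINITY
user $ sudo pcs cluster cib-push fs_cfg

One thing worth checking there: if the clones in such ordering constraints are not interleaved
(meta interleave=true), a restart of dlm/DRBD on the rebooted node also forces the dependent
Filesystem clone on the surviving node to restart, which could explain the unmount attempt on
vps-fs-03.
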
> The only thing I did was to change the following:
>
> user $ sudo pcs resource update VPSFSMount fast_stop="no" op monitor timeout="60"
>
> However, this didn't help. Google doesn't give me much help either (but maybe
> I'm not searching for the right thing).
>
> Thank you in advance for any pointer!
>
>
> Kr,
>
> Gregory

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
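
P.S. Regarding the fast_stop change quoted above: the failed action shows the unmount hitting a
60-second stop timeout, so besides fixing fencing it may be worth raising the stop timeout and
telling the Filesystem agent how to handle processes that keep the mount busy (for example the
CTDB lock file). Just a sketch, the values are only examples:

user $ sudo pcs resource update VPSFSMount force_unmount=safe op stop timeout=120s

This won't help while DLM is blocked waiting for a fence, but it gives the stop a chance to
clean up the CTDB processes before the operation is declared failed.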