On 10/11/19 10:32 PM, Lentes, Bernd wrote:
> Hi,
>
> occasionally the stop of a Filesystem resource for an OCFS2 Partition fails
> to stop.

A stop failure is very bad, because a reliable stop is crucial for an HA system.
You can try the o2locktop CLI to find the inode that may be to blame [1].
`o2locktop --help` gives you more usage details.

[1] o2locktop package
https://software.opensuse.org/package/o2locktop?search_term=o2locktop

> I'm currently tracing this RA hoping to find the culprit.
> I'm putting one of both nodes into standby, hoping the error appears.
> Afterwards setting it online again and doing the same procedure with the
> other node.
> Of course now the error does not appear :-))
> But i don't find any files under /var/lib/heartbeat/trace_ra/Filesystem for a
> stop operation.
> Resource is part of a group which is cloned.
>
> I configured the tracing with "crm resource trace fs_ocfs2 stop".
>
> Result:
> primitive fs_ocfs2 Filesystem \
>         params device="/dev/vg_san/lv_ocfs2" directory="/mnt/ocfs2"
> fstype=ocfs2 \
>         params fast_stop=no force_unmount=true \
>         op monitor interval=30 timeout=20 \
>         op start timeout=60 interval=0 \
>         op stop timeout=60 interval=0 \
>         op_params trace_ra=1 \
>         meta is-managed=true target-role=Started
>
> I expect log files for the stop operation in
> /var/lib/heartbeat/trace_ra/Filesystem.
> But i don't get any.
>

It might be that umount hung and there was no time to flush the log to disk.

Cheers
Roger
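
P.S. To check quickly whether any stop traces were written at all, something like the following could help. It is only a minimal sketch: the helper name `find_trace_files` is made up for illustration, and it assumes the trace files are named `<resource instance>.<action>.<timestamp>`, with an optional `:N` clone suffix on the instance name. Please verify the naming against the files that actually show up for start or monitor on your nodes.

```python
import re
from pathlib import Path


def find_trace_files(trace_dir, resource, action):
    """Return trace files for one resource and action, oldest first.

    Assumption: files are named <instance>.<action>.<timestamp>, where
    <instance> is the resource id, optionally followed by a ":N" clone suffix.
    """
    base = Path(trace_dir)
    if not base.is_dir():
        # Directory missing: tracing never wrote anything for this RA type.
        return []
    pattern = re.compile(
        r"^" + re.escape(resource) + r"(:\d+)?\." + re.escape(action) + r"\."
    )
    matches = [p for p in base.iterdir() if p.is_file() and pattern.match(p.name)]
    # Sort by modification time, using the name as a tie-breaker.
    return sorted(matches, key=lambda p: (p.stat().st_mtime, p.name))


if __name__ == "__main__":
    # Paths and names taken from the configuration quoted above.
    for path in find_trace_files(
        "/var/lib/heartbeat/trace_ra/Filesystem", "fs_ocfs2", "stop"
    ):
        print(path)
```

If this prints nothing for "stop" but does list files for "start" or "monitor", tracing itself works and the stop trace is what goes missing.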
