On 10/11/19 10:32 PM, Lentes, Bernd wrote:
> Hi,
> 
> occasionally the stop of a Filesystem resource for an OCFS2 partition 
> fails.

A stop failure is serious, because a reliable stop is crucial for an HA system.

You can try the o2locktop CLI to find the inode that is likely to blame [1].

`o2locktop --help` gives you more usage details.

[1] o2locktop package
https://software.opensuse.org/package/o2locktop?search_term=o2locktop


> I'm currently tracing this RA hoping to find the culprit.
> I'm putting one of the two nodes into standby, hoping the error appears.
> Afterwards I set it online again and repeat the procedure with the 
> other node.
> Of course the error does not appear now :-))
> But I don't find any files under /var/lib/heartbeat/trace_ra/Filesystem for a 
> stop operation.
> Resource is part of a group which is cloned.
> 
> I configured the tracing with "crm resource trace fs_ocfs2 stop".
> 
> Result:
> primitive fs_ocfs2 Filesystem \
>          params device="/dev/vg_san/lv_ocfs2" directory="/mnt/ocfs2" fstype=ocfs2 \
>          params fast_stop=no force_unmount=true \
>          op monitor interval=30 timeout=20 \
>          op start timeout=60 interval=0 \
>          op stop timeout=60 interval=0 \
>          op_params trace_ra=1 \
>          meta is-managed=true target-role=Started
> 
> I expect log files for the stop operation in 
> /var/lib/heartbeat/trace_ra/Filesystem.
> But I don't get any.
> 

Perhaps umount hung and the trace log was never flushed to disk.
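
To check whether any stop trace was written at all, a small helper along these lines might be useful. It is only a sketch: the function name list_traces is made up for illustration, and it assumes the trace files are named <resource>.<action>.<timestamp> (possibly with a ":N" clone suffix after the resource name), which may differ on your version of resource-agents.

```shell
#!/bin/sh
# Hypothetical helper: list trace files for one resource and one action,
# newest first.
# Assumption: files are named <resource>.<action>.<timestamp>; clone
# instances may carry a ":N" suffix after the resource name.
list_traces() {
    trace_dir=$1
    resource=$2
    action=$3
    # Stay quiet if nothing matches, so the caller can test for empty output.
    ls -1t "$trace_dir"/"$resource"*."$action".* 2>/dev/null
}

# Directory and resource name are taken from the original mail.
list_traces /var/lib/heartbeat/trace_ra/Filesystem fs_ocfs2 stop \
    || echo "no stop traces found"
```

If this prints traces for start or monitor (change the last argument) but none for stop, then tracing itself works and only the stop run leaves nothing behind.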

Cheers
Roger
