[Cluster-devel] [GFS2 PATCH 0/4] gfs2: misc withdraw patch fixes

2020-04-24 Thread Bob Peterson
Hi,

Further recovery testing revealed some problems with the withdraw code, especially related to single-node (lock_nolock) withdraws. This patch set fixes some of the recent issues.

Bob Peterson (4):
  gfs2: fix withdraw sequence deadlock
  gfs2: Fix error exit in do_xmote
  gfs2: Fix BUG during

[Cluster-devel] [GFS2 PATCH 4/4] gfs2: Fix use-after-free in gfs2_logd after withdraw

2020-04-24 Thread Bob Peterson
When the gfs2_logd daemon withdrew, the withdraw sequence called into make_fs_ro() to make the file system read-only. That caused the journal descriptors to be freed. However, those journal descriptors were used by gfs2_logd's call to gfs2_ail_flush_reqd(). This caused a use-after-free and NULL poi

[Cluster-devel] [GFS2 PATCH 3/4] gfs2: Fix BUG during unmount after file system withdraw

2020-04-24 Thread Bob Peterson
Before this patch, when the logd daemon was forced to withdraw, it would try to request its journal be recovered by another cluster node. However, in single-user cases with lock_nolock, there are no other nodes to recover the journal. Function signal_our_withdraw() was recognizing the lock_nolock s

[Cluster-devel] [GFS2 PATCH 2/4] gfs2: Fix error exit in do_xmote

2020-04-24 Thread Bob Peterson
Before this patch, if an error was detected from glock function go_sync by function do_xmote, it would return. But the function had temporarily unlocked the gl_lockref spin_lock, and it never re-locked it. When the caller of do_xmote tried to unlock it again, it was already unlocked, which resulte
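
The lock imbalance described in this preview can be illustrated with a small userspace sketch. This is not the gfs2 code and not the actual patch: toy_lock, toy_sync, xmote_buggy, xmote_fixed and run_caller are hypothetical names, and the toy lock merely counts unlock calls made while it is not held, where a real spin_lock would misbehave. The "fixed" variant shows one plausible shape of a repair (re-taking the lock on every exit path), assumed here for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy lock: records misuse instead of crashing. */
struct toy_lock {
    bool held;
    int bad_unlocks;    /* unlock calls made while the lock was not held */
};

static void toy_lock_acquire(struct toy_lock *l)
{
    l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
    if (!l->held)
        l->bad_unlocks++;
    l->held = false;
}

/* Stand-in for a sync step that may fail; any nonzero value means error. */
static int toy_sync(bool fail)
{
    return fail ? -1 : 0;
}

/* Contract for both variants: called with the lock held, must return with it held. */
static int xmote_buggy(struct toy_lock *l, bool fail)
{
    int ret;

    assert(l->held);
    toy_lock_release(l);        /* temporarily drop the lock */
    ret = toy_sync(fail);
    if (ret)
        return ret;             /* error exit: lock is NOT re-taken */
    toy_lock_acquire(l);
    return 0;
}

static int xmote_fixed(struct toy_lock *l, bool fail)
{
    int ret;

    assert(l->held);
    toy_lock_release(l);        /* temporarily drop the lock */
    ret = toy_sync(fail);
    toy_lock_acquire(l);        /* re-take the lock on every exit path */
    return ret;
}

/* Caller pattern: lock, call, unlock. Returns the number of bad unlocks seen. */
static int run_caller(int (*xmote)(struct toy_lock *, bool), bool fail)
{
    struct toy_lock l = { false, 0 };

    toy_lock_acquire(&l);
    xmote(&l, fail);
    toy_lock_release(&l);       /* second unlock if xmote left the lock dropped */
    return l.bad_unlocks;
}
```

With the buggy variant, run_caller(xmote_buggy, true) reports one unlock of an already-unlocked lock, while the success path and both paths of the fixed variant report none.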

[Cluster-devel] [GFS2 PATCH 1/4] gfs2: fix withdraw sequence deadlock

2020-04-24 Thread Bob Peterson
After a gfs2 file system withdraw, any attempt to read metadata is automatically rejected by function gfs2_meta_read() except for reads of the journal inode. This turns out to be a problem because function signal_our_withdraw() repeatedly calls check_journal_clean() which reads the metadata (both i

Re: [Cluster-devel] BUG during umount() after withdrawal

2020-04-24 Thread Bob Peterson
- Original Message - > Hi, > > I'm doing some testing on 5.7-rc2 which includes Bob's recovery patches. > I used a new xfstest (see the end of this mail) which injects some > IO errors to force the filesystem to be withdrawn and then checks > that it can be remounted successfully. > > How

[Cluster-devel] BUG during umount() after withdrawal

2020-04-24 Thread Ross Lagerwall
Hi, I'm doing some testing on 5.7-rc2 which includes Bob's recovery patches. I used a new xfstest (see the end of this mail) which injects some IO errors to force the filesystem to be withdrawn and then checks that it can be remounted successfully. However, it hits a BUG() during umount() after i