Re: [Cluster-devel] [GFS2 PATCH 1/2] GFS2: Make gfs2_clear_inode() queue the final put

2015-12-08 Thread Steven Whitehouse

Hi,

On 08/12/15 07:57, Dave Chinner wrote:

On Wed, Dec 02, 2015 at 11:42:13AM -0500, Bob Peterson wrote:

- Original Message -
(snip)

Please take a look at this
again and figure out what the problematic cycle of events is, and then
work out how to avoid that happening in the first place. There is no
point in replacing one problem with another one, particularly one which
would likely be very tricky to debug,

Steve.

The problematic cycle of events is well known:
gfs2_clear_inode calls gfs2_glock_put() for the inode's glock,
but if it's the very last put, it calls into dlm, which can block,
and that's where we get into trouble.

The livelock goes like this:

1. A fence operation needs memory, so it blocks on memory allocation.
2. Memory allocation blocks on slab shrinker.
3. Slab shrinker calls into vfs inode shrinker to free inodes from memory.



7. dlm blocks on a pending fence operation. Goto 1.

Therefore, the fence operation should be doing GFP_NOFS allocations
to prevent re-entry into the DLM via the filesystem via the shrinker.

Cheers,

Dave.


Which would be ideal, but how do you do that from user space?

Steve.
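
The cycle described above, and the way a no-filesystem allocation would break it, can be sketched as a small userspace model. This is only an illustration under stated assumptions: every `sim_` name is hypothetical, `SIM_GFP_KERNEL` and `SIM_GFP_NOFS` merely stand in for the kernel's `GFP_KERNEL` and `GFP_NOFS`, and the model assumes that reclaim can satisfy a NOFS allocation without touching the filesystem. It does not answer the question of how a user-space fence agent would request such allocations.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GFP_KERNEL / GFP_NOFS; not the real kernel flags. */
enum sim_gfp { SIM_GFP_KERNEL, SIM_GFP_NOFS };

struct sim_cluster {
	bool fence_pending;  /* a fence operation has not completed yet */
	int inodes_evicted;  /* inodes the shrinker managed to free */
};

/* Models the final glock put reaching the DLM: the DLM cannot make
 * progress while a fence is pending. Returns false if it would block. */
static bool sim_dlm_unlock(const struct sim_cluster *c)
{
	return !c->fence_pending;
}

/* Models the inode shrinker calling into the filesystem's evict path. */
static bool sim_shrink_inodes(struct sim_cluster *c)
{
	if (!sim_dlm_unlock(c))
		return false;  /* would block on the pending fence */
	c->inodes_evicted++;
	return true;
}

/* Models an allocation under memory pressure. Returns true if it can
 * complete without waiting on the filesystem. */
bool sim_alloc_under_pressure(struct sim_cluster *c, enum sim_gfp flags)
{
	if (flags == SIM_GFP_NOFS)
		return true;  /* assumption: reclaim skips filesystem shrinkers */
	return sim_shrink_inodes(c);
}

/* Models the fence operation: it must allocate before it can finish.
 * Returns true if the fence completed, false if it is stuck. */
bool sim_run_fence(struct sim_cluster *c, enum sim_gfp flags)
{
	assert(c->fence_pending);
	if (!sim_alloc_under_pressure(c, flags))
		return false;  /* allocation waits on DLM, DLM waits on fence */
	c->fence_pending = false;
	return true;
}
```

In this model, calling `sim_run_fence()` with `SIM_GFP_KERNEL` while a fence is pending never completes, because the allocation re-enters the filesystem and ends up waiting on the fence itself. With `SIM_GFP_NOFS` the fence finishes, and only then can the shrinker evict inodes again.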



[Cluster-devel] [PATCH] gfs2: clear journal live bit in gfs2_log_flush

2015-12-08 Thread Benjamin Marzinski
When gfs2 was unmounting filesystems or changing them to read-only it
was clearing the SDF_JOURNAL_LIVE bit before the final log flush.  This
caused a race.  If an inode glock got demoted in the gap between
clearing the bit and the shutdown flush, it would be unable to reserve
log space to clear out the active items list in inode_go_sync, causing an
error in inode_go_inval because the glock was still dirty.

To solve this, the SDF_JOURNAL_LIVE bit is now cleared inside the
shutdown log flush.  This means that, because of the locking on the log
blocks, either inode_go_sync will be able to reserve space to clean the
glock before the shutdown flush, or the shutdown flush will clean the
glock itself, before inode_go_sync fails to reserve the space. Either
way, the glock will be clean before inode_go_inval.

Signed-off-by: Benjamin Marzinski 
---
 fs/gfs2/log.c   | 3 +++
 fs/gfs2/super.c | 4 ----
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 536e7a6..0ff028c 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -716,6 +716,9 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl,
 	}
 	trace_gfs2_log_flush(sdp, 1);
 
+	if (type == SHUTDOWN_FLUSH)
+		clear_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags);
+
 	sdp->sd_log_flush_head = sdp->sd_log_head;
 	sdp->sd_log_flush_wrapped = 0;
 	tr = sdp->sd_log_tr;
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 894fb01..e55c9b6 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -842,10 +842,6 @@ static int gfs2_make_fs_ro(struct gfs2_sbd *sdp)
 	gfs2_quota_sync(sdp->sd_vfs, 0);
 	gfs2_statfs_sync(sdp->sd_vfs, 0);
 
-	down_write(&sdp->sd_log_flush_lock);
-	clear_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags);
-	up_write(&sdp->sd_log_flush_lock);
-
 	gfs2_log_flush(sdp, NULL, SHUTDOWN_FLUSH);
 	wait_event(sdp->sd_reserving_log_wait, atomic_read(&sdp->sd_reserving_log) == 0);
 	gfs2_assert_warn(sdp, atomic_read(&sdp->sd_log_blks_free) == sdp->sd_jdesc->jd_blocks);
-- 
1.8.3.1
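
The ordering argument in the commit message can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the gfs2 code: all `sim_` names are hypothetical, the real locking on the log is represented only by the fact that each function here runs to completion before the next starts, and "clean" is reduced to a single dirty flag per glock.

```c
#include <assert.h>
#include <stdbool.h>

struct sim_sbd { bool journal_live; };  /* models SDF_JOURNAL_LIVE */
struct sim_glock { bool dirty; };       /* models a non-empty active items list */

/* Log space can only be reserved while the journal is live. */
static bool sim_reserve(const struct sim_sbd *s)
{
	return s->journal_live;
}

/* Models a demote: the sync step cleans the glock if it can reserve
 * log space, then the inval step requires the glock to be clean.
 * Returns false where the inval step would hit the error. */
bool sim_demote(const struct sim_sbd *s, struct sim_glock *gl)
{
	if (gl->dirty && sim_reserve(s))
		gl->dirty = false;
	return !gl->dirty;
}

/* Models the shutdown flush, which cleans the glock itself. With
 * clear_live_inside set, it also clears the live bit as part of the
 * same step, as the patch does. */
void sim_shutdown_flush(struct sim_sbd *s, struct sim_glock *gl,
			bool clear_live_inside)
{
	if (clear_live_inside)
		s->journal_live = false;
	gl->dirty = false;
}
```

With the old ordering, the bit is cleared first and a demote that lands before the flush fails, because it can no longer reserve space. With the bit cleared inside the flush, a demote either runs before the flush and reserves space, or runs after it and finds the glock already clean.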