Re: [Cluster-devel] [GFS2 PATCH 3/9] gfs2: Empty the ail for the glock when rgrps are invalidated

2019-02-15 Thread Steven Whitehouse

Hi,

On 13/02/2019 15:21, Bob Peterson wrote:

Before this patch, function rgrp_go_inval would not invalidate the
ail list, which meant that there might still be buffers outstanding
on the ail that had revokes still pending. If the revokes had still
not been written when the glock was given to another node, and that
node (with outstanding revokes) died for some reason, the resulting
journal replay would replay the un-revoked rgrps, thus wiping out
changes made by the node who rightfully received the rgrp in EX.
This caused metadata corruption.

Signed-off-by: Bob Peterson 


rgrp_go_sync() has a call to gfs2_ail_empty_gl() so there should be no 
revokes to worry about when rgrp_go_inval is called, because everything 
should have already been written to the log & flushed. Has something 
gone wrong in the logic which somehow allows that step to be skipped in 
some cases?


Steve.


---
  fs/gfs2/glops.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 9c86c8004ba7..4b0e52bf5825 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -166,6 +166,8 @@ static void rgrp_go_sync(struct gfs2_glock *gl)
error = filemap_fdatawait_range(mapping, gl->gl_vm.start, 
gl->gl_vm.end);
mapping_set_error(mapping, error);
gfs2_ail_empty_gl(gl);
+   gfs2_assert_withdraw(gl->gl_name.ln_sbd,
+gl->gl_name.ln_sbd->sd_log_error == 0);
  
  	spin_lock(>gl_lockref.lock);

rgd = gl->gl_object;
@@ -196,6 +198,7 @@ static void rgrp_go_inval(struct gfs2_glock *gl, int flags)
WARN_ON_ONCE(!(flags & DIO_METADATA));
gfs2_assert_withdraw(sdp, !atomic_read(>gl_ail_count));
truncate_inode_pages_range(mapping, gl->gl_vm.start, gl->gl_vm.end);
+   gfs2_ail_empty_gl(gl);
  
  	if (rgd)

rgd->rd_flags &= ~GFS2_RDF_UPTODATE;




[Cluster-devel] [GFS2 PATCH 3/9] gfs2: Empty the ail for the glock when rgrps are invalidated

2019-02-13 Thread Bob Peterson
Before this patch, function rgrp_go_inval would not invalidate the
ail list, which meant that there might still be buffers outstanding
on the ail that had revokes still pending. If the revokes had still
not been written when the glock was given to another node, and that
node (with outstanding revokes) died for some reason, the resulting
journal replay would replay the un-revoked rgrps, thus wiping out
changes made by the node who rightfully received the rgrp in EX.
This caused metadata corruption.

Signed-off-by: Bob Peterson 
---
 fs/gfs2/glops.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 9c86c8004ba7..4b0e52bf5825 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -166,6 +166,8 @@ static void rgrp_go_sync(struct gfs2_glock *gl)
error = filemap_fdatawait_range(mapping, gl->gl_vm.start, 
gl->gl_vm.end);
mapping_set_error(mapping, error);
gfs2_ail_empty_gl(gl);
+   gfs2_assert_withdraw(gl->gl_name.ln_sbd,
+gl->gl_name.ln_sbd->sd_log_error == 0);
 
spin_lock(>gl_lockref.lock);
rgd = gl->gl_object;
@@ -196,6 +198,7 @@ static void rgrp_go_inval(struct gfs2_glock *gl, int flags)
WARN_ON_ONCE(!(flags & DIO_METADATA));
gfs2_assert_withdraw(sdp, !atomic_read(>gl_ail_count));
truncate_inode_pages_range(mapping, gl->gl_vm.start, gl->gl_vm.end);
+   gfs2_ail_empty_gl(gl);
 
if (rgd)
rgd->rd_flags &= ~GFS2_RDF_UPTODATE;
-- 
2.20.1