[Cluster-devel] [GFS2 PATCH v3] gfs2: clean_journal improperly set sd_log_flush_head

2019-03-28 Thread Bob Peterson
Hi,

Andreas found some problems with the previous version. Here is version 3.

Ross: Can you please test this one with your scenario? Thanks.

Bob Peterson
---

This patch fixes regressions in 588bff95c94efc05f9e1a0b19015c9408ed7c0ef.
Due to that patch, function clean_journal was setting the value of
sd_log_flush_head, but that's only valid if it is replaying the node's
own journal. If it's replaying another node's journal, that's completely
wrong and will lead to multiple problems. This patch tries to clean up
the mess by passing the value of the logical journal block number into
gfs2_write_log_header so the function can treat non-owned journals
generically. For the local journal, the journal extent map is used for
best performance. For other nodes from other journals, gfs2_extent_map
is called to figure it out.

This patch also tries to establish more consistency when passing journal
block parameters by changing several unsigned int types to a consistent
u32.

Fixes: 588bff95c94e ("GFS2: Reduce code redundancy writing log headers")

Signed-off-by: Bob Peterson 
---
 fs/gfs2/incore.h   |  2 +-
 fs/gfs2/log.c  | 26 +++---
 fs/gfs2/log.h  |  3 ++-
 fs/gfs2/lops.c |  6 +++---
 fs/gfs2/lops.h |  2 +-
 fs/gfs2/recovery.c |  8 
 fs/gfs2/recovery.h |  2 +-
 7 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index cdf07b408f54..86840a70ee1a 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -535,7 +535,7 @@ struct gfs2_jdesc {
unsigned long jd_flags;
 #define JDF_RECOVERY 1
unsigned int jd_jid;
-   unsigned int jd_blocks;
+   u32 jd_blocks;
int jd_recover_error;
/* Replay stuff */
 
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index b8830fda51e8..8a5a19a26582 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -672,13 +672,15 @@ void gfs2_write_revokes(struct gfs2_sbd *sdp)
  * @seq: sequence number
  * @tail: tail of the log
  * @flags: log header flags GFS2_LOG_HEAD_*
+ * @lblock: value for lh_blkno (block number relative to start of journal)
  * @op_flags: flags to pass to the bio
  *
  * Returns: the initialized log buffer descriptor
  */
 
 void gfs2_write_log_header(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd,
-  u64 seq, u32 tail, u32 flags, int op_flags)
+  u64 seq, u32 tail, u32 flags, u32 lblock,
+  int op_flags)
 {
struct gfs2_log_header *lh;
u32 hash, crc;
@@ -686,7 +688,7 @@ void gfs2_write_log_header(struct gfs2_sbd *sdp, struct 
gfs2_jdesc *jd,
struct gfs2_statfs_change_host *l_sc = >sd_statfs_local;
struct timespec64 tv;
struct super_block *sb = sdp->sd_vfs;
-   u64 addr;
+   u64 dblock;
 
lh = page_address(page);
clear_page(lh);
@@ -699,15 +701,25 @@ void gfs2_write_log_header(struct gfs2_sbd *sdp, struct 
gfs2_jdesc *jd,
lh->lh_sequence = cpu_to_be64(seq);
lh->lh_flags = cpu_to_be32(flags);
lh->lh_tail = cpu_to_be32(tail);
-   lh->lh_blkno = cpu_to_be32(sdp->sd_log_flush_head);
+   lh->lh_blkno = cpu_to_be32(lblock);
hash = ~crc32(~0, lh, LH_V1_SIZE);
lh->lh_hash = cpu_to_be32(hash);
 
ktime_get_coarse_real_ts64();
lh->lh_nsec = cpu_to_be32(tv.tv_nsec);
lh->lh_sec = cpu_to_be64(tv.tv_sec);
-   addr = gfs2_log_bmap(sdp);
-   lh->lh_addr = cpu_to_be64(addr);
+   if (jd->jd_jid == sdp->sd_lockstruct.ls_jid)
+   dblock = gfs2_log_bmap(sdp);
+   else {
+   u32 extlen;
+   int new = 0, error;
+
+   error = gfs2_extent_map(jd->jd_inode, lblock, , ,
+   );
+   if (gfs2_assert_withdraw(sdp, error == 0))
+   return;
+   }
+   lh->lh_addr = cpu_to_be64(dblock);
lh->lh_jinode = cpu_to_be64(GFS2_I(jd->jd_inode)->i_no_addr);
 
/* We may only write local statfs, quota, etc., when writing to our
@@ -732,7 +744,7 @@ void gfs2_write_log_header(struct gfs2_sbd *sdp, struct 
gfs2_jdesc *jd,
 sb->s_blocksize - LH_V1_SIZE - 4);
lh->lh_crc = cpu_to_be32(crc);
 
-   gfs2_log_write(sdp, page, sb->s_blocksize, 0, addr);
+   gfs2_log_write(sdp, page, sb->s_blocksize, 0, dblock);
gfs2_log_submit_bio(>sd_log_bio, REQ_OP_WRITE, op_flags);
log_flush_wait(sdp);
 }
@@ -761,7 +773,7 @@ static void log_write_header(struct gfs2_sbd *sdp, u32 
flags)
}
sdp->sd_log_idle = (tail == sdp->sd_log_flush_head);
gfs2_write_log_header(sdp, sdp->sd_jdesc, sdp->sd_log_sequence++, tail,
- flags, op_flags);
+ flags, sdp->sd_log_flush_head, op_flags);
 
if (sdp->sd_log_tail != tail)
log_pull_tail(sdp, tail);
diff --git a/fs/gfs2/log.h b/fs/gfs2/log.h
index 1bc9bd444b28..1e061657d263 100644
--- 

Re: [Cluster-devel] gfs2 iomap dealock, IOMAP_F_UNBALANCED

2019-03-28 Thread Christoph Hellwig
On Thu, Mar 21, 2019 at 02:13:04PM +0100, Andreas Gruenbacher wrote:
> Hi Christoph,
> 
> we need your help fixing a gfs2 deadlock involving iomap.  What's going
> on is the following:
> 
> * During iomap_file_buffered_write, gfs2_iomap_begin grabs the log flush
>   lock and keeps it until gfs2_iomap_end.  It currently always does that
>   even though there is no point other than for journaled data writes.
> 
> * iomap_file_buffered_write then calls balance_dirty_pages_ratelimited.
>   If that ends up calling gfs2_write_inode, gfs2 will try to grab the
>   log flush lock again and deadlock.

What is the exactly call chain?  balance_dirty_pages_ratelimited these
days doesn't start I/O, but just wakes up the flusher threads.  Or
do we have a issue where it is blocking on those threads?

Also why do you need to flush the log for background writeback in
->write_inode?

balance_dirty_pages_ratelimited is per definition not a data integrity
writeback, so there shouldn't be a good reason to flush the log
(which I assume the log flush log is for).  If we look gfs2_write_inode,
this seems to be the code:

bool flush_all = (wbc->sync_mode == WB_SYNC_ALL || gfs2_is_jdata(ip));

if (flush_all)
gfs2_log_flush(GFS2_SB(inode), ip->i_gl,
   GFS2_LOG_HEAD_FLUSH_NORMAL |
   GFS2_LFC_WRITE_INODE);

But what is the requirement to do this in writeback context?  Can't
we move it out into another context instead?



Re: [Cluster-devel] [GFS2 PATCH V2] gfs2: clean_journal was setting sd_log_flush_head replaying other journals

2019-03-28 Thread Ross Lagerwall

On 3/27/19 5:35 PM, Bob Peterson wrote:

Hi,

Yesterday I posted a patch for this problem, but it was grossly inadequate.
This patch, version 2, is another attempt to fix it. Many thanks to Ross
Lagerwall for helping us identify, fix, and test the problem.

Regards,

Bob Peterson
---
gfs2: clean_journal was setting sd_log_flush_head replaying other journals

Function clean_journal was setting the value of sd_log_flush_head,
but that's only a valid thing to do if it is replaying its own journal.
If it's replaying another node's journal, that's completely wrong and
will lead to multiple problems. This patch tries to clean up the
mess by passing the value of head to gfs2_write_log_header so it can
treat non-owned journals fairly.

Signed-off-by: Bob Peterson 

This does seem to fix the issue. Thanks!

Tested-by: Ross Lagerwall