Hi,

This is a revision to the patch set I sent on 13 February 2019. These
won't make this merge window, obviously, because that's almost upon us.

This version fixes some glaring mistakes and problems in the first set.
As before, this may not be the final version, but I wanted to put it out
for review anyway.

Among changes from the original are:

1. I fixed some really stupid mistakes of the original patch set.
2. I found and fixed several additional problems not covered by the first
   patch set.
3. I broke up the patch "Force withdraw to replay journals and wait for
   it to finish" into more reasonably sized pieces. It's still complex,
   but not nearly as bad as the original.
4. I included some of the instrumentation I've used to detect file system
   corruption. I think it makes sense to include it in mainline.
5. I still need to figure out what to do about Dave Teigland's observation
   regarding the patch "dlm: recover slot regardless of whether we still
   have a connection". The patch is omitted in this set until I figure out
   a reasonable course of action.

This version is much more stable. I've still been able to break it, given
enough pressure, but I believe that's a separate bug. I'll continue to
chase it, and will post further patches if necessary.

These patches address a bunch of problems related to journal replay
overwriting valid gfs2 metadata due to I/O errors, withdraws and such.
They seem to fix several metadata corruption problems I've been able
to reliably recreate lately with multi-node, multi-file system recovery
tests.

Bob Peterson (15):
  gfs2: log error reform
  gfs2: Introduce concept of a pending withdraw
  gfs2: Ignore recovery attempts if gfs2 has io error or is withdrawn
  gfs2: move check_journal_clean to util.c for future use
  gfs2: Allow some glocks to be used during withdraw
  gfs2: Make secondary withdrawers wait for first withdrawer
  gfs2: Don't write log headers after file system withdraw
  gfs2: Force withdraw to replay journals and wait for it to finish
  gfs2: Add verbose option to check_journal_clean
  gfs2: Check for log write errors before telling dlm  to unlock
  gfs2: Do log_flush in gfs2_ail_empty_gl even if ail list is empty
  gfs2: If the journal isn't live ignore log flushes
  gfs2: Issue revokes more intelligently
  gfs2: Warn when a journal replay overwrites a rgrp with buffers
  gfs2: log which portion of the journal is replayed

 fs/gfs2/aops.c       |   4 +-
 fs/gfs2/file.c       |   2 +-
 fs/gfs2/glock.c      |  39 ++++++++--
 fs/gfs2/glock.h      |   1 +
 fs/gfs2/glops.c      |  82 ++++++++++++++++++++-
 fs/gfs2/incore.h     |  14 +++-
 fs/gfs2/lock_dlm.c   |  68 +++++++++++++++++
 fs/gfs2/log.c        |  94 +++++++++++-------------
 fs/gfs2/log.h        |   1 +
 fs/gfs2/lops.c       |  29 +++++++-
 fs/gfs2/meta_io.c    |   6 +-
 fs/gfs2/ops_fstype.c |  52 ++-----------
 fs/gfs2/quota.c      |   8 +-
 fs/gfs2/recovery.c   |   3 +-
 fs/gfs2/super.c      |  30 ++++----
 fs/gfs2/super.h      |   1 +
 fs/gfs2/sys.c        |   2 +-
 fs/gfs2/util.c       | 171 ++++++++++++++++++++++++++++++++++++++++++-
 fs/gfs2/util.h       |  11 +++
 19 files changed, 477 insertions(+), 141 deletions(-)

-- 
2.20.1
