Re: [Cluster-devel] gfs2: dlm based recovery coordination

2012-01-10 Thread Steven Whitehouse
Hi,

On Mon, 2012-01-09 at 17:18 -0500, David Teigland wrote:
> Steve, attached is the latest version of this patch, which includes
> changes for all the suggestions I've seen.  I've sent a pull request
> for all the dlm patches preceding this.  Would you like to take this into
> your tree once the dlm patches are pulled?  As I've mentioned, I think
> the current merge cycle would be good, but you can send it off whenever
> you think is right.
> 
> Dave
> 

Yes - I'll keep a look out for the dlm patches going in, and I can stick
this in -nmw as soon as that's happened. I know that's kind of bending the
rules for linux-next, but I think it's justifiable since this has already
been in linux-next and we may still be able to push in this merge
window.

The window has been open 5 days now, so I'd expect it to remain open for
another 5 days (minimum). I know Linus doesn't like people trying to
second guess when the window will close, but I think we still have time
to try and get this in, and I hope that we can manage it. Let's see how
it goes...

Steve.

[Cluster-devel] gfs2: dlm based recovery coordination

2012-01-09 Thread David Teigland
Steve, attached is the latest version of this patch, which includes
changes for all the suggestions I've seen.  I've sent a pull request
for all the dlm patches preceding this.  Would you like to take this into
your tree once the dlm patches are pulled?  As I've mentioned, I think
the current merge cycle would be good, but you can send it off whenever
you think is right.

Dave

From 0fb2d7726b570c6a5eb289bac237fb384b9c6f0b Mon Sep 17 00:00:00 2001
From: David Teigland 
Date: Tue, 20 Dec 2011 17:03:04 -0600
Subject: [PATCH] gfs2: dlm based recovery coordination

This new method of managing recovery is an alternative to
the previous approach of using the userland gfs_controld.

- use dlm slot numbers to assign journal id's
- use dlm recovery callbacks to initiate journal recovery
- use a dlm lock to determine the first node to mount fs
- use a dlm lock to track journals that need recovery

Signed-off-by: David Teigland 
---
 fs/gfs2/glock.c             |    2 +-
 fs/gfs2/glock.h             |    7 +-
 fs/gfs2/incore.h            |   58 +++-
 fs/gfs2/lock_dlm.c          |  993 ++-
 fs/gfs2/main.c              |   10 +
 fs/gfs2/ops_fstype.c        |   29 +-
 fs/gfs2/recovery.c          |    4 +
 fs/gfs2/sys.c               |   33 +-
 fs/gfs2/sys.h               |    2 +
 include/linux/gfs2_ondisk.h |    2 +
 10 files changed, 1098 insertions(+), 42 deletions(-)
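
A note for readers of the flag documentation in the incore.h hunk of this
patch: the transitions it describes can be modelled in a few lines of
userspace C. This is only a rough sketch, not code from the patch. The bit
numbers are copied from the enum the patch adds; everything else (the
flag_* helpers and the sketch_* functions) is hypothetical, the helpers are
not atomic, and recover_spin plus the recover_block/recover_start counters
are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit numbers copied from the enum this patch adds to fs/gfs2/incore.h. */
enum {
	DFL_BLOCK_LOCKS		= 0,
	DFL_NO_DLM_OPS		= 1,
	DFL_FIRST_MOUNT		= 2,
	DFL_FIRST_MOUNT_DONE	= 3,
	DFL_MOUNT_DONE		= 4,
	DFL_UNMOUNT		= 5,
	DFL_DLM_RECOVERY	= 6,
};

/*
 * Hypothetical, non-atomic stand-ins for the kernel's test_bit(),
 * set_bit() and clear_bit(); they exist only for this sketch.
 */
static inline bool flag_test(unsigned long flags, int nr)
{
	return (flags >> nr) & 1UL;
}

static inline void flag_set(unsigned long *flags, int nr)
{
	*flags |= 1UL << nr;
}

static inline void flag_clear(unsigned long *flags, int nr)
{
	*flags &= ~(1UL << nr);
}

/* Assumed model of recover_prep: dlm recovery begins, locks are blocked. */
static inline void sketch_recover_prep(unsigned long *flags)
{
	flag_set(flags, DFL_DLM_RECOVERY);
	flag_set(flags, DFL_BLOCK_LOCKS);
}

/* Assumed model of recover_done: dlm recovery has ended. */
static inline void sketch_recover_done(unsigned long *flags)
{
	flag_clear(flags, DFL_DLM_RECOVERY);
}

/* Assumed model of the control thread finishing journal recovery. */
static inline void sketch_journal_recovery_complete(unsigned long *flags)
{
	flag_clear(flags, DFL_BLOCK_LOCKS);
}

/* FIRST_MOUNT is cleared at the point FIRST_MOUNT_DONE is set. */
static inline void sketch_first_done(unsigned long *flags)
{
	assert(flag_test(*flags, DFL_FIRST_MOUNT));
	flag_set(flags, DFL_FIRST_MOUNT_DONE);
	flag_clear(flags, DFL_FIRST_MOUNT);
}

/* Normal (non-recovery) lock use is allowed once BLOCK_LOCKS is clear. */
static inline bool sketch_normal_locking_allowed(unsigned long flags)
{
	return !flag_test(flags, DFL_BLOCK_LOCKS);
}
```

The point the sketch tries to make is that DFL_DLM_RECOVERY and
DFL_BLOCK_LOCKS are set together but cleared at different times: the end of
dlm recovery does not by itself unblock normal locking, which waits until
journal recovery is complete.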

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 88e8a23..376816f 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1353,7 +1353,7 @@ void gfs2_glock_complete(struct gfs2_glock *gl, int ret)
 	spin_lock(&gl->gl_spin);
 	gl->gl_reply = ret;
 
-	if (unlikely(test_bit(DFL_BLOCK_LOCKS, &ls->ls_flags))) {
+	if (unlikely(test_bit(DFL_BLOCK_LOCKS, &ls->ls_recover_flags))) {
 		if (gfs2_should_freeze(gl)) {
 			set_bit(GLF_FROZEN, &gl->gl_flags);
 			spin_unlock(&gl->gl_spin);
diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
index 6670711..5b548b07 100644
--- a/fs/gfs2/glock.h
+++ b/fs/gfs2/glock.h
@@ -121,8 +121,11 @@ enum {
 
 struct lm_lockops {
 	const char *lm_proto_name;
-	int (*lm_mount) (struct gfs2_sbd *sdp, const char *fsname);
-	void (*lm_unmount) (struct gfs2_sbd *sdp);
+	int (*lm_mount) (struct gfs2_sbd *sdp, const char *table);
+	void (*lm_first_done) (struct gfs2_sbd *sdp);
+	void (*lm_recovery_result) (struct gfs2_sbd *sdp, unsigned int jid,
+				    unsigned int result);
+	void (*lm_unmount) (struct gfs2_sbd *sdp);
 	void (*lm_withdraw) (struct gfs2_sbd *sdp);
 	void (*lm_put_lock) (struct gfs2_glock *gl);
 	int (*lm_lock) (struct gfs2_glock *gl, unsigned int req_state,
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 892ac37..9182a87 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -139,8 +139,45 @@ struct gfs2_bufdata {
 #define GDLM_STRNAME_BYTES 25
 #define GDLM_LVB_SIZE  32
 
+/*
+ * ls_recover_flags:
+ *
+ * DFL_BLOCK_LOCKS: dlm is in recovery and will grant locks that had been
+ * held by failed nodes whose journals need recovery.  Those locks should
+ * only be used for journal recovery until the journal recovery is done.
+ * This is set by the dlm recover_prep callback and cleared by the
+ * gfs2_control thread when journal recovery is complete.  To avoid
+ * races between recover_prep setting and gfs2_control clearing, recover_spin
+ * is held while changing this bit and reading/writing recover_block
+ * and recover_start.
+ *
+ * DFL_NO_DLM_OPS: dlm lockspace ops/callbacks are not being used.
+ *
+ * DFL_FIRST_MOUNT: this node is the first to mount this fs and is doing
+ * recovery of all journals before allowing other nodes to mount the fs.
+ * This is cleared when FIRST_MOUNT_DONE is set.
+ *
+ * DFL_FIRST_MOUNT_DONE: this node was the first mounter, and has finished
+ * recovery of all journals, and now allows other nodes to mount the fs.
+ *
+ * DFL_MOUNT_DONE: gdlm_mount has completed successfully and cleared
+ * BLOCK_LOCKS for the first time.  The gfs2_control thread should now
+ * control clearing BLOCK_LOCKS for further recoveries.
+ *
+ * DFL_UNMOUNT: gdlm_unmount sets to keep sdp off gfs2_control_wq.
+ *
+ * DFL_DLM_RECOVERY: set while dlm is in recovery, between recover_prep()
+ * and recover_done(), i.e. set while recover_block == recover_start.
+ */
+
 enum {
 	DFL_BLOCK_LOCKS		= 0,
+	DFL_NO_DLM_OPS		= 1,
+	DFL_FIRST_MOUNT		= 2,
+	DFL_FIRST_MOUNT_DONE	= 3,
+	DFL_MOUNT_DONE		= 4,
+	DFL_UNMOUNT		= 5,
+	DFL_DLM_RECOVERY	= 6,
 };
 
 struct lm_lockname {
@@ -504,14 +541,26 @@ struct gfs2_sb_host {
 struct lm_lockstruct {
 	int ls_jid;
 	unsigned int ls_first;
-	unsigned int ls_first_done;
 	unsigned int ls_nodir;
 	const struct lm_lockops *ls_ops;
-   unsigned l