[SCM] Samba Shared Repository - branch master updated

Amitay Isaacs Tue, 23 Feb 2016 01:33:53 -0800

The branch, master has been updated
       via  46edef2 ctdb-recovery: Limit scope of reclock latency statistics
       via  188019b ctdb-recovery: Negate the status when checking the recovery 
lock
       via  fad3f36 ctdb-recovery: Clean up status handling from recmode child
       via  b6c3918 ctdb-recovery: Don't bother ensuring file descriptor is -1
       via  531e672 ctdb-recovery: Don't store recmode in recovery mode state
       via  6695fa5 ctdb: Use ctdb_wait_for_process_to_exit()
       via  907a5a6 ctdb-common: New function ctdb_wait_for_process_to_exit()
       via  4d6ec81 ctdb-recovery: Drop redundant status send when setting 
recovery mode
       via  3e2f216 ctdb-recovery: Include lib/util/time.h instead of 
samba_util.h
      from  fb0d624 torture:smb2: improve torture_comments in connect test


https://git.samba.org/?p=samba.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit 46edef25df5457e49ac5698838a5d76ef733e60e
Author: Martin Schwenke <[email protected]>
Date:   Mon Feb 1 11:46:05 2016 +1100

    ctdb-recovery: Limit scope of reclock latency statistics
    
    It does not make sense to update this statistic for the timeout case,
    since this could skew the statistic.  To keep it simple, just update
    it for the usual case where there is lock contention, since this is
    the usual case.  So the daemon statistic measures time to test the
    lock and the corresponding recovery daemon statistic measures time to
    take the lock.
    
    Additionally, the recovery daemon will eventually use this code to
    take the lock, and the method of updating the latency statistic will
    need to be pushed further out to a configurable handler that depends
    on the calling context.
    
    Signed-off-by: Martin Schwenke <[email protected]>
    Reviewed-by: Amitay Isaacs <[email protected]>
    
    Autobuild-User(master): Amitay Isaacs <[email protected]>
    Autobuild-Date(master): Tue Feb 23 10:32:06 CET 2016 on sn-devel-144

commit 188019b877fb797e7da460028c5ec6f98f691877
Author: Martin Schwenke <[email protected]>
Date:   Thu Jan 28 15:07:30 2016 +1100

    ctdb-recovery: Negate the status when checking the recovery lock
    
    Have 0 indicate that the lock was taken.  This allows non-zero values
    to be used to indicate why the lock could not be taken.  EACCES means
    lock contention.
    
    For now use just EACCES to cover all failures, since
    ctdb_recovery_lock() returns a bool and details of other errors will
    be lost.  ctdb_recovery_lock() will undergo some big changes, so don't
    try to fix this now.
    
    Signed-off-by: Martin Schwenke <[email protected]>
    Reviewed-by: Amitay Isaacs <[email protected]>

commit fad3f367b77b120e2d0fc31cc2d87f46614dd266
Author: Martin Schwenke <[email protected]>
Date:   Thu Jan 28 14:59:18 2016 +1100

    ctdb-recovery: Clean up status handling from recmode child
    
    This currently returns an incorrect error when the expected number of
    bytes are not read.  Separate out the different cases to clarify the
    logic and avoid reporting the wrong error.
    
    Signed-off-by: Martin Schwenke <[email protected]>
    Reviewed-by: Amitay Isaacs <[email protected]>

commit b6c39184579d6e0ab24006f36c46e689039c8ace
Author: Martin Schwenke <[email protected]>
Date:   Mon Jan 11 14:50:14 2016 +1100

    ctdb-recovery: Don't bother ensuring file descriptor is -1
    
    This is already done before the destructor is assigned.
    
    Signed-off-by: Martin Schwenke <[email protected]>
    Reviewed-by: Amitay Isaacs <[email protected]>

commit 531e6724ba9d0a4fab2dca40344e4c7be3e966c1
Author: Martin Schwenke <[email protected]>
Date:   Mon Jan 11 13:58:54 2016 +1100

    ctdb-recovery: Don't store recmode in recovery mode state
    
    The callbacks that use this value are only ever called if recovery
    mode is being set to NORMAL.  So do not check if recmode is NORMAL
    either.
    
    Signed-off-by: Martin Schwenke <[email protected]>
    Reviewed-by: Amitay Isaacs <[email protected]>

commit 6695fa50aed43ad2b2998197ae79fa768f5a89d7
Author: Martin Schwenke <[email protected]>
Date:   Tue Dec 8 14:20:59 2015 +1100

    ctdb: Use ctdb_wait_for_process_to_exit()
    
    Signed-off-by: Martin Schwenke <[email protected]>
    Reviewed-by: Amitay Isaacs <[email protected]>

commit 907a5a6f1bb7e1d7717e7563c49d948eccef9120
Author: Martin Schwenke <[email protected]>
Date:   Tue Dec 8 14:12:46 2015 +1100

    ctdb-common: New function ctdb_wait_for_process_to_exit()
    
    This pattern is used quite a few times in the CTDB code.  Many
    instances use ctdb_kill() but for signal 0 this just calls kill(2)
    anyway.
    
    Signed-off-by: Martin Schwenke <[email protected]>
    Reviewed-by: Amitay Isaacs <[email protected]>

commit 4d6ec81299b70cf67eb716b93daed39d89b54ffb
Author: Martin Schwenke <[email protected]>
Date:   Tue Dec 8 14:21:33 2015 +1100

    ctdb-recovery: Drop redundant status send when setting recovery mode
    
    The child process writes the status into the pipe before looping to
    wait.
    
    Signed-off-by: Martin Schwenke <[email protected]>
    Reviewed-by: Amitay Isaacs <[email protected]>

commit 3e2f2169a41207a674a05106180a25d574e90224
Author: Martin Schwenke <[email protected]>
Date:   Tue Feb 16 13:41:21 2016 +1100

    ctdb-recovery: Include lib/util/time.h instead of samba_util.h
    
    Less is more...
    
    Signed-off-by: Martin Schwenke <[email protected]>
    Reviewed-by: Amitay Isaacs <[email protected]>

-----------------------------------------------------------------------

Summary of changes:
 ctdb/common/system.h             |  2 +
 ctdb/common/system_util.c        |  7 ++++
 ctdb/server/ctdb_call.c          |  5 +--
 ctdb/server/ctdb_lock_helper.c   |  4 +-
 ctdb/server/ctdb_recover.c       | 80 ++++++++++++++++++++--------------------
 ctdb/server/ctdb_takeover.c      |  5 +--
 ctdb/server/ctdb_traverse.c      |  4 +-
 ctdb/server/ctdb_update_record.c |  5 +--
 8 files changed, 53 insertions(+), 59 deletions(-)


Changeset truncated at 500 lines:

diff --git a/ctdb/common/system.h b/ctdb/common/system.h
index ba11d20..1229a7e 100644
--- a/ctdb/common/system.h
+++ b/ctdb/common/system.h
@@ -62,4 +62,6 @@ void mkdir_p_or_die(const char *dir, int mode);
 ssize_t sys_read(int fd, void *buf, size_t count);
 ssize_t sys_write(int fd, const void *buf, size_t count);
 
+void ctdb_wait_for_process_to_exit(pid_t pid);
+
 #endif /* __CTDB_SYSTEM_H__ */
diff --git a/ctdb/common/system_util.c b/ctdb/common/system_util.c
index e6c4f17..f47d586 100644
--- a/ctdb/common/system_util.c
+++ b/ctdb/common/system_util.c
@@ -420,3 +420,10 @@ ssize_t sys_write(int fd, const void *buf, size_t count)
 #endif
         return ret;
 }
+
+void ctdb_wait_for_process_to_exit(pid_t pid)
+{
+       while (kill(pid, 0) == 0 || errno != ESRCH) {
+               sleep(5);
+       }
+}
diff --git a/ctdb/server/ctdb_call.c b/ctdb/server/ctdb_call.c
index 510be7d..3478419 100644
--- a/ctdb/server/ctdb_call.c
+++ b/ctdb/server/ctdb_call.c
@@ -1865,10 +1865,7 @@ int ctdb_start_revoke_ro_record(struct ctdb_context 
*ctdb, struct ctdb_db_contex
 
 child_finished:
                sys_write(rc->fd[1], &c, 1);
-               /* make sure we die when our parent dies */
-               while (ctdb_kill(ctdb, parent, 0) == 0 || errno != ESRCH) {
-                       sleep(5);
-               }
+               ctdb_wait_for_process_to_exit(parent);
                _exit(0);
        }
 
diff --git a/ctdb/server/ctdb_lock_helper.c b/ctdb/server/ctdb_lock_helper.c
index 543c5d0..4f1fe2d 100644
--- a/ctdb/server/ctdb_lock_helper.c
+++ b/ctdb/server/ctdb_lock_helper.c
@@ -182,8 +182,6 @@ int main(int argc, char *argv[])
 
        send_result(write_fd, result);
 
-       while (kill(ppid, 0) == 0 || errno != ESRCH) {
-               sleep(5);
-       }
+       ctdb_wait_for_process_to_exit(ppid);
        return 0;
 }
diff --git a/ctdb/server/ctdb_recover.c b/ctdb/server/ctdb_recover.c
index 127ff6b..79b5b2b 100644
--- a/ctdb/server/ctdb_recover.c
+++ b/ctdb/server/ctdb_recover.c
@@ -30,7 +30,7 @@
 #include "lib/tdb_wrap/tdb_wrap.h"
 #include "lib/util/dlinklist.h"
 #include "lib/util/debug.h"
-#include "lib/util/samba_util.h"
+#include "lib/util/time.h"
 #include "lib/util/util_process.h"
 
 #include "ctdb_private.h"
@@ -410,7 +410,6 @@ failed:
 struct ctdb_set_recmode_state {
        struct ctdb_context *ctdb;
        struct ctdb_req_control_old *c;
-       uint32_t recmode;
        int fd[2];
        struct tevent_timer *te;
        struct tevent_fd *fde;
@@ -435,7 +434,7 @@ static void ctdb_set_recmode_timeout(struct tevent_context 
*ev,
           arbitrate locks immediately after a node failure.       
         */
        DEBUG(DEBUG_ERR,(__location__ " set_recmode child process hung/timedout 
CFS slow to grant locks? (allowing recmode set anyway)\n"));
-       state->ctdb->recovery_mode = state->recmode;
+       state->ctdb->recovery_mode = CTDB_RECOVERY_NORMAL;
        ctdb_request_control_reply(state->ctdb, state->c, NULL, 0, NULL);
        talloc_free(state);
 }
@@ -445,16 +444,9 @@ static void ctdb_set_recmode_timeout(struct tevent_context 
*ev,
 */
 static int set_recmode_destructor(struct ctdb_set_recmode_state *state)
 {
-       double l = timeval_elapsed(&state->start_time);
-
-       CTDB_UPDATE_RECLOCK_LATENCY(state->ctdb, "daemon reclock", 
reclock.ctdbd, l);
-
        if (state->fd[0] != -1) {
                state->fd[0] = -1;
        }
-       if (state->fd[1] != -1) {
-               state->fd[1] = -1;
-       }
        ctdb_kill(state->ctdb, state->child, SIGKILL);
        return 0;
 }
@@ -470,6 +462,8 @@ static void set_recmode_handler(struct tevent_context *ev,
                                             struct ctdb_set_recmode_state);
        char c = 0;
        int ret;
+       int status = 0;
+       const char *err = NULL;
 
        /* we got a response from our child process so we can abort the
           timeout.
@@ -477,32 +471,38 @@ static void set_recmode_handler(struct tevent_context *ev,
        talloc_free(state->te);
        state->te = NULL;
 
-
-       /* If, as expected, the child was unable to take the recovery
-        * lock then it will have written 0 into the pipe, so
-        * continue.  However, any other value (e.g. 1) indicates that
-        * it was able to take the recovery lock when it should have
-        * been held by the recovery daemon on the recovery master.
-       */
        ret = sys_read(state->fd[0], &c, 1);
-       if (ret != 1 || c != 0) {
-               ctdb_request_control_reply(
-                       state->ctdb, state->c, NULL, -1,
-                       "Took recovery lock from daemon during recovery - 
probably a cluster filesystem lock coherence problem");
-               talloc_free(state);
-               return;
-       }
-
-       state->ctdb->recovery_mode = state->recmode;
-
-       /* release any deferred attach calls from clients */
-       if (state->recmode == CTDB_RECOVERY_NORMAL) {
-               ctdb_process_deferred_attach(state->ctdb);
+       if (ret == 1) {
+               /* Child wrote status. EACCES indicates that it was unable
+                * to take the lock, which is the expected outcome.
+                * 0 indicates that it was able to take the
+                * lock, which is an error because the recovery daemon
+                * should be holding the lock. */
+               double l = timeval_elapsed(&state->start_time);
+
+               if (c == EACCES) {
+                       status = 0;
+                       err = NULL;
+
+                       state->ctdb->recovery_mode = CTDB_RECOVERY_NORMAL;
+
+                       /* release any deferred attach calls from clients */
+                       ctdb_process_deferred_attach(state->ctdb);
+
+                       CTDB_UPDATE_RECLOCK_LATENCY(state->ctdb, "daemon 
reclock", reclock.ctdbd, l);
+               } else {
+                       status = -1;
+                       err = "Took recovery lock from daemon during recovery - 
probably a cluster filesystem lock coherence problem";
+               }
+       } else {
+               /* Child did not write status.  Unexpected error.
+                * Child may have received a signal.  */
+               status = -1;
+               err = "Unexpected error when testing recovery lock";
        }
 
-       ctdb_request_control_reply(state->ctdb, state->c, NULL, 0, NULL);
+       ctdb_request_control_reply(state->ctdb, state->c, NULL, status, err);
        talloc_free(state);
-       return;
 }
 
 static void
@@ -573,7 +573,10 @@ int32_t ctdb_control_set_recmode(struct ctdb_context *ctdb,
                return 0;
        }
 
-       /* some special handling when ending recovery mode */
+       /* From this point: recmode == CTDB_RECOVERY_NORMAL
+        *
+        * Therefore, what follows is special handling when setting
+        * recovery mode back to normal */
 
        for (ctdb_db = ctdb->db_list; ctdb_db != NULL; ctdb_db = ctdb_db->next) 
{
                if (ctdb_db->generation != ctdb->vnn_map->generation) {
@@ -631,7 +634,7 @@ int32_t ctdb_control_set_recmode(struct ctdb_context *ctdb,
        }
 
        if (state->child == 0) {
-               char cc = 0;
+               char cc = EACCES;
                close(state->fd[0]);
 
                prctl_set_comment("ctdb_recmode");
@@ -643,15 +646,11 @@ int32_t ctdb_control_set_recmode(struct ctdb_context 
*ctdb,
                              ("ERROR: Daemon able to take recovery lock on 
\"%s\" during recovery\n",
                               ctdb->recovery_lock_file));
                        ctdb_recovery_unlock(ctdb);
-                       cc = 1;
+                       cc = 0;
                }
 
                sys_write(state->fd[1], &cc, 1);
-               /* make sure we die when our parent dies */
-               while (ctdb_kill(ctdb, parent, 0) == 0 || errno != ESRCH) {
-                       sleep(5);
-                       sys_write(state->fd[1], &cc, 1);
-               }
+               ctdb_wait_for_process_to_exit(parent);
                _exit(0);
        }
        close(state->fd[1]);
@@ -676,7 +675,6 @@ int32_t ctdb_control_set_recmode(struct ctdb_context *ctdb,
        tevent_fd_set_auto_close(state->fde);
 
        state->ctdb    = ctdb;
-       state->recmode = recmode;
        state->c       = talloc_steal(state, c);
 
        *async_reply = true;
diff --git a/ctdb/server/ctdb_takeover.c b/ctdb/server/ctdb_takeover.c
index 62af1e6..13276af 100644
--- a/ctdb/server/ctdb_takeover.c
+++ b/ctdb/server/ctdb_takeover.c
@@ -3824,10 +3824,7 @@ int32_t ctdb_control_reload_public_ips(struct 
ctdb_context *ctdb, struct ctdb_re
                }
 
                sys_write(h->fd[1], &res, 1);
-               /* make sure we die when our parent dies */
-               while (ctdb_kill(ctdb, parent, 0) == 0 || errno != ESRCH) {
-                       sleep(5);
-               }
+               ctdb_wait_for_process_to_exit(parent);
                _exit(0);
        }
 
diff --git a/ctdb/server/ctdb_traverse.c b/ctdb/server/ctdb_traverse.c
index 73c3a06..11a5385 100644
--- a/ctdb/server/ctdb_traverse.c
+++ b/ctdb/server/ctdb_traverse.c
@@ -260,9 +260,7 @@ static struct ctdb_traverse_local_handle 
*ctdb_traverse_local(struct ctdb_db_con
 
                sys_write(h->fd[1], &res, sizeof(res));
 
-               while (ctdb_kill(ctdb, parent, 0) == 0 || errno != ESRCH) {
-                       sleep(5);
-               }
+               ctdb_wait_for_process_to_exit(parent);
                _exit(0);
        }
 
diff --git a/ctdb/server/ctdb_update_record.c b/ctdb/server/ctdb_update_record.c
index bc9c6fe..56f3b83 100644
--- a/ctdb/server/ctdb_update_record.c
+++ b/ctdb/server/ctdb_update_record.c
@@ -276,10 +276,7 @@ static struct childwrite_handle *ctdb_childwrite(
 
                sys_write(result->fd[1], &c, 1);
 
-               /* make sure we die when our parent dies */
-               while (ctdb_kill(ctdb_db->ctdb, parent, 0) == 0 || errno != 
ESRCH) {
-                       sleep(5);
-               }
+               ctdb_wait_for_process_to_exit(parent);
                _exit(0);
        }
 


-- 
Samba Shared Repository

[SCM] Samba Shared Repository - branch master updated

Reply via email to