The branch, 2.5 has been updated
       via  ccddcec6258f511511abe6861fe8347360d8a066 (commit)
       via  034cecad7d96d521134c7d795d033d1dc418e351 (commit)
       via  cd801651028f495eae6ac9a3135180dcfdd0ad88 (commit)
       via  24065effa286c2a92fbf03cea8d03c505cc0ca6f (commit)
       via  8a431c0513c5e6c22d76a859dc69ffb7be52aee0 (commit)
       via  cf36ae2cf51152c75ac191fb0597827219d08133 (commit)
       via  d31b80e2dd2c9c0ad319f4c3bdba25a66777b287 (commit)
       via  96271120c02131f97ce9c3033606b8e45962b1c1 (commit)
       via  f97221d8d3c2c3e18a84494f8b7b132e1af3507c (commit)
       via  4581d2d7fa37669db3e3ce72ae1152db8ee33264 (commit)
       via  9924218dc6a7f1cb184db9953d839ab5d9124ca2 (commit)
       via  e4a4f2656b86ba271451375d229eeaac6a598f93 (commit)
       via  22459b3c9d4d56b5a43a9bb1af724b4c52b7487e (commit)
       via  8f81f4d1b6ac2e7fa3e47937fe7af24cab29bd09 (commit)
       via  6b28c24d30b85dcb860f7d101249020ae4994419 (commit)
       via  9f98a42f01cdaa262e186eac093f7aa06766c88c (commit)
       via  f87d186a40d50dd9fe97aadf81414c9327ad8cf4 (commit)
      from  2c5e86fab6b1d178a717a5f023e3ba346342f94e (commit)

https://git.samba.org/?p=ctdb.git;a=shortlog;h=2.5


- Log -----------------------------------------------------------------
commit ccddcec6258f511511abe6861fe8347360d8a066
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Mon Feb 23 12:38:11 2015 +1100

    io: Do not use sys_write to write to client sockets
    
    When sending messages to clients, ctdb checks for EAGAIN error code and
    schedules next write in the subsequent event loop.  Using sys_write in
    these places causes ctdb to loop hard till a client is able to read from
    the socket.  With real time scheduling, ctdb daemon spins consuming 100%
    of CPU trying to write to the client sockets.  This can be quite harmful
    when running under VMs or machines with single CPU.
    
    This regression was introduced when all read/write calls were replaced to
    use sys_read/sys_write wrappers (c1558adeaa980fb4bd6177d36250ec8262e9b9fe).
    
    The existing code backs off in case of EAGAIN failures and waits for an
    event loop to process the write again.  This should give ctdb clients
    a chance to get scheduled and to process the ctdb socket.
    
    Signed-off-by: Amitay Isaacs <ami...@gmail.com>
    Reviewed-by: Martin Schwenke <mar...@meltin.net>
    
    Autobuild-User(master): Martin Schwenke <mart...@samba.org>
    Autobuild-Date(master): Tue Feb 24 12:29:30 CET 2015 on sn-devel-104
    
    (Imported from commit 04a061e4d19d5bdbd8179fb0fab8b0875eec243e)

commit 034cecad7d96d521134c7d795d033d1dc418e351
Author: Martin Schwenke <mar...@meltin.net>
Date:   Sat Feb 14 12:53:08 2015 +1100

    scripts: Improve messages about invalid tunables during "setup"
    
    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>
    
    Autobuild-User(master): Amitay Isaacs <ami...@samba.org>
    Autobuild-Date(master): Wed Feb 18 08:03:33 CET 2015 on sn-devel-104
    
    (Imported from commit dc32f11b871a7d4e8ea6fd1d01491d89103decf7)

commit cd801651028f495eae6ac9a3135180dcfdd0ad88
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Feb 9 10:33:35 2015 +1100

    tool: Print a warning when setting an obsolete tunable variable
    
    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>
    
    (Imported from commit c3706e7fb07bcb35f7d894c4e8e0c12b4a62d0db)

commit 24065effa286c2a92fbf03cea8d03c505cc0ca6f
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Feb 9 10:32:47 2015 +1100

    client: Return a value of 1 when setting obsolete tunable variable
    
    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>
    
    (Imported from commit 54f0c39e5a33871847aa9fe2c070c7f638f54cc4)

commit 8a431c0513c5e6c22d76a859dc69ffb7be52aee0
Author: Martin Schwenke <mar...@meltin.net>
Date:   Sun Feb 15 14:39:51 2015 +1100

    tests: New tests for 00.ctdb "setup" event - set tunables from config
    
    Unit test infrastructure tweaks to support.
    
    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>
    
    (Imported from commit 2c7c35377e5452e37925b970253b70875a8d7470)

commit cf36ae2cf51152c75ac191fb0597827219d08133
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Feb 16 14:04:09 2015 +1100

    scripts: Fix tunable setup code by making it shell-agnostic
    
    All tunables set in configuration are currently set to 0 on system
    where /bin/sh is dash (and perhaps other non-bash shells).  dash puts
    single quotes around all values in the output of the "set" builtin
    command, whereas bash only puts them around values when something
    needs to be quoted.  Tunables always have a simple integer value so
    dash will quote them and bash won't.  The setup code currently passes
    the raw value, including any quotes to "ctdb setvar ...".  This
    command does no error checking on the input, so "'1'" is converted to
    0.
    
    Change the code so that the value is determined from the shell
    variable and is independent of the "set" output.
    
    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>
    
    (Imported from commit 39686f45056d942de5ebe3263a533a99ca17c79e)

commit d31b80e2dd2c9c0ad319f4c3bdba25a66777b287
Author: Martin Schwenke <mar...@meltin.net>
Date:   Tue Jan 27 12:55:42 2015 +1100

    recoverd: Abort when daemon can take recovery lock during recovery
    
    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>
    
    Autobuild-User(master): Amitay Isaacs <ami...@samba.org>
    Autobuild-Date(master): Fri Feb 13 09:48:15 CET 2015 on sn-devel-104
    
    (Imported from commit 39d2fd330a60ea590d76213f8cb406a42fa8d680)

commit 96271120c02131f97ce9c3033606b8e45962b1c1
Author: Martin Schwenke <mar...@meltin.net>
Date:   Wed Dec 17 20:33:19 2014 +1100

    recoverd: Improve error messages on recovery lock coherence fail
    
    When the daemon is able to take the recovery lock during recovery we
    might as well guess that the cluster filesystem has a lock coherence
    problem and print a more useful message.  This will be more helpful to
    those trying out cluster filesystems that don't have lock coherence or
    that are difficult to setup.
    
    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>
    
    (Imported from commit 432d6774891eba30a959cd2d8ee8469d189c7872)

commit f97221d8d3c2c3e18a84494f8b7b132e1af3507c
Author: Martin Schwenke <mar...@meltin.net>
Date:   Tue Dec 9 13:51:27 2014 +1100

    recoverd: Don't release and re-take the recovery lock
    
    Just continue to hold it, otherwise a broken node might win an
    election and grab the lock.
    
    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>
    
    (Imported from commit 48c91407abd5e34463d3a10cb6fce47ec4a0d5f6)

commit 4581d2d7fa37669db3e3ce72ae1152db8ee33264
Author: Martin Schwenke <mar...@meltin.net>
Date:   Tue Dec 9 14:50:38 2014 +1100

    recoverd: Simplify ctdb_recovery_lock()
    
    Have it just silently take or fail to take the lock, except on an
    unexpected failure (where it should log an error).
    
    This means that when it is called we need to keep the old behaviour
    and explicitly release the lock.  In do_recovery() the lock is
    released and a message is printed before attempting to take the lock.
    In the daemon sanity check the lock must be released in the error path
    if it is actually taken.
    
    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>
    
    (Imported from commit 1d6ed91f5518d462ba368bca03be923428710157)

commit 9924218dc6a7f1cb184db9953d839ab5d9124ca2
Author: Martin Schwenke <mar...@meltin.net>
Date:   Tue Dec 9 14:45:08 2014 +1100

    recoverd: Remove check_recovery_lock()
    
    This has not done anything useful since commit
    b9d8bb23af8abefb2d967e9b4e9d6e60c4a3b520.  Instead, just check that
    the lock is held.
    
    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>
    
    (Imported from commit be19a17faf6da97365c425c5b423e9b74f9c9e0c)

commit e4a4f2656b86ba271451375d229eeaac6a598f93
Author: Martin Schwenke <mar...@meltin.net>
Date:   Tue Dec 9 14:09:40 2014 +1100

    recoverd: Improve logging when recovery lock file is changed
    
    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>
    
    (Imported from commit 668ed5366237b61f0ff618f32555ce29cca5e6f3)

commit 22459b3c9d4d56b5a43a9bb1af724b4c52b7487e
Author: Martin Schwenke <mar...@meltin.net>
Date:   Tue Dec 9 14:07:20 2014 +1100

    recoverd: New function ctdb_recovery_unlock()
    
    Unlock the recovery lock file.  This way knowledge of the file
    descriptor isn't sprinkled throughout the code.
    
    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>
    
    (Imported from commit db32a2bce54b9618fe247b33d6de81bd5f7a3b62)

commit 8f81f4d1b6ac2e7fa3e47937fe7af24cab29bd09
Author: Martin Schwenke <mar...@meltin.net>
Date:   Tue Dec 9 13:50:22 2014 +1100

    recoverd: New function ctdb_recovery_have_lock()
    
    True if this recovery daemon holds the lock.
    
    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>
    
    (Imported from commit 72701be663ddb265320a022a22130a3437bbf6bc)

commit 6b28c24d30b85dcb860f7d101249020ae4994419
Author: Martin Schwenke <mar...@meltin.net>
Date:   Tue Dec 9 13:49:06 2014 +1100

    daemon: Log a warning when setting obsolete tunables
    
    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>
    
    (Imported from commit 4d3b52f1cec46f66f8d0827bc8f458cd8c86b5a5)

commit 9f98a42f01cdaa262e186eac093f7aa06766c88c
Author: Martin Schwenke <mar...@meltin.net>
Date:   Tue Dec 9 13:47:42 2014 +1100

    daemon: Mark tunable VerifyRecoveryLock as obsolete
    
    It is pointless having a recovery lock but not sanity checking that it
    is working.  Also, the logic that uses this tunable is confusing.  In
    some places the recovery lock is released unnecessarily because the
    tunable isn't set.
    
    Simplify the logic by assuming that if a recovery lock is specified
    then it should be verified.
    
    Update documentation that references this tunable.
    
    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>
    
    (Imported from commit d110fe231849d76ecb83378c934627dc64b74c72)

commit f87d186a40d50dd9fe97aadf81414c9327ad8cf4
Author: Martin Schwenke <mar...@meltin.net>
Date:   Tue Feb 3 14:27:11 2015 +1100

    doc: Improve documentation of the recovery lock
    
    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Reviewed-by: Amitay Isaacs <ami...@gmail.com>
    
    (Imported from commit a01744c08ff5b8aca4af99842acfc78a87af9297)

-----------------------------------------------------------------------

Summary of changes:
 client/ctdb_client.c                               |   4 +-
 common/ctdb_io.c                                   |   6 +-
 config/events.d/00.ctdb                            |  20 +-
 doc/ctdb-tunables.7.xml                            |   9 -
 doc/ctdb.1.xml                                     |  34 ++-
 doc/ctdb.7.xml                                     |  54 ++++
 doc/ctdbd.1.xml                                    |  16 +-
 doc/ctdbd.conf.5.xml                               |   6 +
 include/ctdb_private.h                             |   4 +-
 server/ctdb_recover.c                              |  90 +++----
 server/ctdb_recoverd.c                             | 272 ++++-----------------
 server/ctdb_server.c                               |   1 -
 server/ctdb_tunables.c                             |  20 +-
 ...terface.startup.002.sh => 00.ctdb.setup.001.sh} |   2 +-
 tests/eventscripts/00.ctdb.setup.002.sh            |  19 ++
 tests/eventscripts/00.ctdb.setup.003.sh            |  21 ++
 tests/eventscripts/00.ctdb.setup.004.sh            |  20 ++
 tests/eventscripts/etc/sysconfig/ctdb              |   4 +
 tests/eventscripts/scripts/local.sh                |  11 +
 tests/eventscripts/stubs/ctdb                      |  26 ++
 tools/ctdb.c                                       |   5 +
 21 files changed, 327 insertions(+), 317 deletions(-)
 copy tests/eventscripts/{10.interface.startup.002.sh => 00.ctdb.setup.001.sh} 
(64%)
 create mode 100755 tests/eventscripts/00.ctdb.setup.002.sh
 create mode 100755 tests/eventscripts/00.ctdb.setup.003.sh
 create mode 100755 tests/eventscripts/00.ctdb.setup.004.sh


Changeset truncated at 500 lines:

diff --git a/client/ctdb_client.c b/client/ctdb_client.c
index 4553113..cf6835a 100644
--- a/client/ctdb_client.c
+++ b/client/ctdb_client.c
@@ -2735,12 +2735,12 @@ int ctdb_ctrl_set_tunable(struct ctdb_context *ctdb,
        ret = ctdb_control(ctdb, destnode, 0, CTDB_CONTROL_SET_TUNABLE, 0, 
data, NULL,
                           NULL, &res, &timeout, NULL);
        talloc_free(data.dptr);
-       if (ret != 0 || res != 0) {
+       if ((ret != 0) || (res == -1)) {
                DEBUG(DEBUG_ERR,(__location__ " ctdb_control for set_tunable 
failed\n"));
                return -1;
        }
 
-       return 0;
+       return res;
 }
 
 /*
diff --git a/common/ctdb_io.c b/common/ctdb_io.c
index 467ec9a..53486f4 100644
--- a/common/ctdb_io.c
+++ b/common/ctdb_io.c
@@ -232,9 +232,9 @@ static void queue_io_write(struct ctdb_queue *queue)
                struct ctdb_queue_pkt *pkt = queue->out_queue;
                ssize_t n;
                if (queue->ctdb->flags & CTDB_FLAG_TORTURE) {
-                       n = sys_write(queue->fd, pkt->data, 1);
+                       n = write(queue->fd, pkt->data, 1);
                } else {
-                       n = sys_write(queue->fd, pkt->data, pkt->length);
+                       n = write(queue->fd, pkt->data, pkt->length);
                }
 
                if (n == -1 && errno != EAGAIN && errno != EWOULDBLOCK) {
@@ -310,7 +310,7 @@ int ctdb_queue_send(struct ctdb_queue *queue, uint8_t 
*data, uint32_t length)
           queue overhead. This relies on non-blocking sockets */
        if (queue->out_queue == NULL && queue->fd != -1 &&
            !(queue->ctdb->flags & CTDB_FLAG_TORTURE)) {
-               ssize_t n = sys_write(queue->fd, data, length2);
+               ssize_t n = write(queue->fd, data, length2);
                if (n == -1 && errno != EAGAIN && errno != EWOULDBLOCK) {
                        talloc_free(queue->fde);
                        queue->fde = NULL;
diff --git a/config/events.d/00.ctdb b/config/events.d/00.ctdb
index a0f4102..ec18175 100755
--- a/config/events.d/00.ctdb
+++ b/config/events.d/00.ctdb
@@ -119,14 +119,19 @@ update_config_from_tdb() {
     fi
 }
 
-set_ctdb_variables () {
+set_ctdb_variables ()
+{
     # set any tunables from the config file
-    set | grep ^CTDB_SET_ | cut -d_ -f3- | 
+    set | sed -n '/^CTDB_SET_/s/=.*//p' |
     while read v; do
-       varname=`echo $v | cut -d= -f1`
-       value=`echo $v | cut -d= -f2`
-       ctdb setvar $varname $value || return 1
-       echo "Set $varname to $value"
+       varname="${v#CTDB_SET_}"
+       value=$(eval echo "\$$v")
+       if ctdb setvar $varname $value ; then
+           echo "Set $varname to $value"
+       else
+           echo "Invalid configuration: CTDB_SET_${varname}=${value}"
+           return 1
+       fi
     done
 }
 
@@ -198,7 +203,8 @@ case "$1" in
 
     setup)
        # Set any tunables from the config file
-       set_ctdb_variables || die "Failed to set CTDB tunables"
+       set_ctdb_variables || \
+           die "Aborting setup due to invalid configuration - fix typos, 
remove unknown tunables"
        ;;
 
     startup)
diff --git a/doc/ctdb-tunables.7.xml b/doc/ctdb-tunables.7.xml
index 456e856..b029fdb 100644
--- a/doc/ctdb-tunables.7.xml
+++ b/doc/ctdb-tunables.7.xml
@@ -448,15 +448,6 @@
     </refsect2>
 
     <refsect2>
-      <title>VerifyRecoveryLock</title>
-      <para>Default: 1</para>
-      <para>
-       Should we take a fcntl() lock on the reclock file to verify that we are 
the
-       sole recovery master node on the cluster or not.
-      </para>
-    </refsect2>
-
-    <refsect2>
       <title>VacuumInterval</title>
       <para>Default: 10</para>
       <para>
diff --git a/doc/ctdb.1.xml b/doc/ctdb.1.xml
index f680563..ccd46d6 100644
--- a/doc/ctdb.1.xml
+++ b/doc/ctdb.1.xml
@@ -682,7 +682,6 @@ RecdFailCount           = 10
 LogLatencyMs            = 0
 RecLockLatencyMs        = 1000
 RecoveryDropAllIPs      = 120
-VerifyRecoveryLock      = 1
 VacuumInterval          = 10
 VacuumMaxRunTime        = 30
 RepackLimit             = 10000
@@ -883,30 +882,51 @@ DB Statistics: locking.tdb
     <refsect2>
       <title>getreclock</title>
       <para>
-       This command is used to show the filename of the reclock file that is 
used.
+       Show the name of the recovery lock file, if any.
       </para>
 
       <para>
        Example output:
       </para>
       <screen>
-       Reclock file:/gpfs/.ctdb/shared
+       Reclock file:/clusterfs/.ctdb/recovery.lock
       </screen>
 
     </refsect2>
 
     <refsect2>
-      <title>setreclock [filename]</title>
+      <title>
+       setreclock <optional><parameter>FILE</parameter></optional>
+      </title>
+
       <para>
-       This command is used to modify, or clear, the file that is used as the 
reclock file at runtime. When this command is used, the reclock file checks are 
disabled. To re-enable the checks the administrator needs to activate the 
"VerifyRecoveryLock" tunable using "ctdb setvar".
+       FILE specifies the name of the recovery lock file.  If the
+       recovery lock file is changed at run-time then this will cause
+       a recovery, which in turn causes the recovery lock to be
+       retaken.
       </para>
 
       <para>
-       If run with no parameter this will remove the reclock file completely. 
If run with a parameter the parameter specifies the new filename to use for the 
recovery lock.
+       If no FILE is specified then a recovery lock file will no
+       longer be used.
       </para>
 
       <para>
-       This command only affects the runtime settings of a ctdb node and will 
be lost when ctdb is restarted. For persistent changes to the reclock file 
setting you must edit /etc/sysconfig/ctdb.
+       This command only affects the run-time setting of a single
+       CTDB node.  This setting <emphasis>must</emphasis> be changed
+       on all nodes simultaneously by specifying <option>-n
+       all</option> (or similar).  For information about configuring
+       the recovery lock file please see the
+       <citetitle>CTDB_RECOVERY_LOCK</citetitle> entry in
+       <citerefentry><refentrytitle>ctdbd.conf</refentrytitle>
+       <manvolnum>5</manvolnum></citerefentry> and the
+       <citetitle>--reclock</citetitle> entry in
+       <citerefentry><refentrytitle>ctdbd</refentrytitle>
+       <manvolnum>1</manvolnum></citerefentry>.  For information
+       about the recovery lock please see the <citetitle>RECOVERY
+       LOCK</citetitle> section in
+       <citerefentry><refentrytitle>ctdb</refentrytitle>
+       <manvolnum>7</manvolnum></citerefentry>.
       </para>
     </refsect2>
 
diff --git a/doc/ctdb.7.xml b/doc/ctdb.7.xml
index b54fa42..ad17df7 100644
--- a/doc/ctdb.7.xml
+++ b/doc/ctdb.7.xml
@@ -76,6 +76,60 @@
 </refsect1>
 
   <refsect1>
+    <title>Recovery Lock</title>
+
+    <para>
+      CTDB uses a <emphasis>recovery lock</emphasis> to avoid a
+      <emphasis>split brain</emphasis>, where a cluster becomes
+      partitioned and each partition attempts to operate
+      independently.  Issues that can result from a split brain
+      include file data corruption, because file locking metadata may
+      not be tracked correctly.
+    </para>
+
+    <para>
+      CTDB uses a <emphasis>cluster leader and follower</emphasis>
+      model of cluster management.  All nodes in a cluster elect one
+      node to be the leader.  The leader node coordinates privileged
+      operations such as database recovery and IP address failover.
+      CTDB refers to the leader node as the <emphasis>recovery
+      master</emphasis>.  This node takes and holds the recovery lock
+      to assert its privileged role in the cluster.
+    </para>
+
+    <para>
+      The recovery lock is implemented using a file residing in shared
+      storage (usually) on a cluster filesystem.  To support a
+      recovery lock the cluster filesystem must support lock
+      coherence.  See
+      <citerefentry><refentrytitle>ping_pong</refentrytitle>
+      <manvolnum>1</manvolnum></citerefentry> for more details.
+    </para>
+
+    <para>
+      If a cluster becomes partitioned (for example, due to a
+      communication failure) and a different recovery master is
+      elected by the nodes in each partition, then only one of these
+      recovery masters will be able to take the recovery lock.  The
+      recovery master in the "losing" partition will not be able to
+      take the recovery lock and will be excluded from the cluster.
+      The nodes in the "losing" partition will elect each node in turn
+      as their recovery master so eventually all the nodes in that
+      partition will be excluded.
+    </para>
+
+    <para>
+      CTDB does sanity checks to ensure that the recovery lock is held
+      as expected.
+    </para>
+
+    <para>
+      CTDB can run without a recovery lock but this is not recommended
+      as there will be no protection from split brains.
+    </para>
+  </refsect1>
+
+  <refsect1>
     <title>Private vs Public addresses</title>
 
     <para>
diff --git a/doc/ctdbd.1.xml b/doc/ctdbd.1.xml
index ab222bc..080f506 100644
--- a/doc/ctdbd.1.xml
+++ b/doc/ctdbd.1.xml
@@ -306,18 +306,18 @@
       </varlistentry>
 
       <varlistentry>
-       <term>--reclock=<parameter>FILENAME</parameter></term>
+       <term>--reclock=<parameter>FILE</parameter></term>
        <listitem>
          <para>
-           FILENAME is the name of the recovery lock file stored in
-           <emphasis>shared storage</emphasis> that ctdbd uses to
-           prevent split brains from occuring.
+           FILE is the name of the recovery lock file, stored in
+           <emphasis>shared storage</emphasis>, that CTDB uses to
+           prevent split brains.
          </para>
          <para>
-           It is possible to run CTDB without a recovery lock file, but
-           then there will be no protection against split brain if the
-           cluster/network becomes partitioned. Using CTDB without a
-           reclock file is strongly discouraged.
+           For information about the recovery lock please see the
+           <citetitle>RECOVERY LOCK</citetitle> section in
+           <citerefentry><refentrytitle>ctdb</refentrytitle>
+           <manvolnum>7</manvolnum></citerefentry>.
          </para>
        </listitem>
       </varlistentry>
diff --git a/doc/ctdbd.conf.5.xml b/doc/ctdbd.conf.5.xml
index 803c232..a99ae9b 100644
--- a/doc/ctdbd.conf.5.xml
+++ b/doc/ctdbd.conf.5.xml
@@ -312,6 +312,12 @@
            should be change to a useful value.  Corresponds to
            <option>--reclock</option>.
          </para>
+         <para>
+           For information about the recovery lock please see the
+           <citetitle>RECOVERY LOCK</citetitle> section in
+           <citerefentry><refentrytitle>ctdb</refentrytitle>
+           <manvolnum>7</manvolnum></citerefentry>.
+         </para>
        </listitem>
       </varlistentry>
 
diff --git a/include/ctdb_private.h b/include/ctdb_private.h
index fe9250b..341bad6 100644
--- a/include/ctdb_private.h
+++ b/include/ctdb_private.h
@@ -1260,7 +1260,9 @@ void ctdb_release_all_ips(struct ctdb_context *ctdb);
 void set_nonblocking(int fd);
 void set_close_on_exec(int fd);
 
-bool ctdb_recovery_lock(struct ctdb_context *ctdb, bool keep);
+bool ctdb_recovery_have_lock(struct ctdb_context *ctdb);
+bool ctdb_recovery_lock(struct ctdb_context *ctdb);
+void ctdb_recovery_unlock(struct ctdb_context *ctdb);
 
 int ctdb_set_recovery_lock_file(struct ctdb_context *ctdb, const char *file);
 
diff --git a/server/ctdb_recover.c b/server/ctdb_recover.c
index 393b72f..50bcf25 100644
--- a/server/ctdb_recover.c
+++ b/server/ctdb_recover.c
@@ -527,18 +527,21 @@ static void set_recmode_handler(struct event_context *ev, 
struct fd_event *fde,
        state->te = NULL;
 
 
-       /* read the childs status when trying to lock the reclock file.
-          child wrote 0 if everything is fine and 1 if it did manage
-          to lock the file, which would be a problem since that means
-          we got a request to exit from recovery but we could still lock
-          the file   which at this time SHOULD be locked by the recovery
-          daemon on the recmaster
-       */              
+       /* If, as expected, the child was unable to take the recovery
+        * lock then it will have written 0 into the pipe, so
+        * continue.  However, any other value (e.g. 1) indicates that
+        * it was able to take the recovery lock when it should have
+        * been held by the recovery daemon on the recovery master.
+       */
        ret = sys_read(state->fd[0], &c, 1);
        if (ret != 1 || c != 0) {
-               ctdb_request_control_reply(state->ctdb, state->c, NULL, -1, 
"managed to lock reclock file from inside daemon");
+               const char *msg = \
+                       "Took recovery lock from daemon - probably a cluster 
filesystem lock coherence problem";
+               ctdb_request_control_reply(
+                       state->ctdb, state->c, NULL, -1,
+                       msg);
                talloc_free(state);
-               return;
+               ctdb_die(state->ctdb, msg);
        }
 
        state->ctdb->recovery_mode = state->recmode;
@@ -639,8 +642,8 @@ int32_t ctdb_control_set_recmode(struct ctdb_context *ctdb,
                ctdb_process_deferred_attach(ctdb);
        }
 
-       if (ctdb->tunable.verify_recovery_lock == 0) {
-               /* dont need to verify the reclock file */
+       if (ctdb->recovery_lock_file == NULL) {
+               /* Not using recovery lock file */
                ctdb->recovery_mode = recmode;
                return 0;
        }
@@ -671,11 +674,13 @@ int32_t ctdb_control_set_recmode(struct ctdb_context 
*ctdb,
 
                ctdb_set_process_name("ctdb_recmode");
                debug_extra = talloc_asprintf(NULL, "set_recmode:");
-               /* we should not be able to get the lock on the reclock file, 
-                 as it should  be held by the recovery master 
-               */
-               if (ctdb_recovery_lock(ctdb, false)) {
-                       DEBUG(DEBUG_CRIT,("ERROR: recovery lock file %s not 
locked when recovering!\n", ctdb->recovery_lock_file));
+               /* Daemon should not be able to get the recover lock,
+                * as it should be held by the recovery master */
+               if (ctdb_recovery_lock(ctdb)) {
+                       DEBUG(DEBUG_ERR,
+                             ("ERROR: Daemon able to take recovery lock on 
\"%s\" during recovery\n",
+                              ctdb->recovery_lock_file));
+                       ctdb_recovery_unlock(ctdb);
                        cc = 1;
                }
 
@@ -720,26 +725,25 @@ int32_t ctdb_control_set_recmode(struct ctdb_context 
*ctdb,
 }
 
 
+bool ctdb_recovery_have_lock(struct ctdb_context *ctdb)
+{
+       return ctdb->recovery_lock_fd != -1;
+}
+
 /*
   try and get the recovery lock in shared storage - should only work
   on the recovery master recovery daemon. Anywhere else is a bug
  */
-bool ctdb_recovery_lock(struct ctdb_context *ctdb, bool keep)
+bool ctdb_recovery_lock(struct ctdb_context *ctdb)
 {
        struct flock lock;
 
-       if (keep) {
-               DEBUG(DEBUG_ERR, ("Take the recovery lock\n"));
-       }
-       if (ctdb->recovery_lock_fd != -1) {
-               close(ctdb->recovery_lock_fd);
-               ctdb->recovery_lock_fd = -1;
-       }
-
-       ctdb->recovery_lock_fd = open(ctdb->recovery_lock_file, O_RDWR|O_CREAT, 
0600);
+       ctdb->recovery_lock_fd = open(ctdb->recovery_lock_file,
+                                     O_RDWR|O_CREAT, 0600);
        if (ctdb->recovery_lock_fd == -1) {
-               DEBUG(DEBUG_ERR,("ctdb_recovery_lock: Unable to open %s - 
(%s)\n", 
-                        ctdb->recovery_lock_file, strerror(errno)));
+               DEBUG(DEBUG_ERR,
+                     ("ctdb_recovery_lock: Unable to open %s - (%s)\n",
+                      ctdb->recovery_lock_file, strerror(errno)));
                return false;
        }
 
@@ -755,27 +759,29 @@ bool ctdb_recovery_lock(struct ctdb_context *ctdb, bool 
keep)
                int saved_errno = errno;
                close(ctdb->recovery_lock_fd);
                ctdb->recovery_lock_fd = -1;
-               if (keep) {
-                       DEBUG(DEBUG_CRIT,("ctdb_recovery_lock: Failed to get "
-                                         "recovery lock on '%s' - (%s)\n",
-                                         ctdb->recovery_lock_file,
-                                         strerror(saved_errno)));
+               /* Fail silently on these errors, since they indicate
+                * lock contention, but log an error for any other
+                * failure. */
+               if (saved_errno != EACCES &&
+                   saved_errno != EAGAIN) {
+                       DEBUG(DEBUG_ERR,("ctdb_recovery_lock: Failed to get "
+                                        "recovery lock on '%s' - (%s)\n",
+                                        ctdb->recovery_lock_file,
+                                        strerror(saved_errno)));
                }
                return false;
        }
 
-       if (!keep) {
+       return true;
+}
+
+void ctdb_recovery_unlock(struct ctdb_context *ctdb)
+{
+       if (ctdb->recovery_lock_fd != -1) {
+               DEBUG(DEBUG_NOTICE, ("Releasing recovery lock\n"));
                close(ctdb->recovery_lock_fd);
                ctdb->recovery_lock_fd = -1;
        }
-
-       if (keep) {
-               DEBUG(DEBUG_NOTICE, ("Recovery lock taken successfully\n"));
-       }
-
-       DEBUG(DEBUG_NOTICE,("ctdb_recovery_lock: Got recovery lock on '%s'\n", 
ctdb->recovery_lock_file));
-
-       return true;
 }
 
 /*
diff --git a/server/ctdb_recoverd.c b/server/ctdb_recoverd.c
index 39e833c..a4daf09 100644
--- a/server/ctdb_recoverd.c
+++ b/server/ctdb_recoverd.c
@@ -1808,28 +1808,36 @@ static int do_recovery(struct ctdb_recoverd *rec,
                return -1;
        }
 
-        if (ctdb->tunable.verify_recovery_lock != 0) {
-               DEBUG(DEBUG_ERR,("Taking out recovery lock from recovery 
daemon\n"));
-               start_time = timeval_current();
-               if (!ctdb_recovery_lock(ctdb, true)) {
-                       if (ctdb->runstate == CTDB_RUNSTATE_FIRST_RECOVERY) {
-                               /* If ctdb is trying first recovery, it's
-                                * possible that current node does not know yet
-                                * who the recmaster is.
-                                */
-                               DEBUG(DEBUG_ERR, ("Unable to get recovery lock"
-                                               " - retrying recovery\n"));
+        if (ctdb->recovery_lock_file != NULL) {
+               if (ctdb_recovery_have_lock(ctdb)) {
+                       DEBUG(DEBUG_NOTICE, ("Already holding recovery 
lock\n"));
+               } else {
+                       start_time = timeval_current();
+                       DEBUG(DEBUG_NOTICE, ("Attempting to take recovery lock 
(%s)\n",
+                                            ctdb->recovery_lock_file));
+                       if (!ctdb_recovery_lock(ctdb)) {
+                               if (ctdb->runstate == 
CTDB_RUNSTATE_FIRST_RECOVERY) {
+                                       /* If ctdb is trying first recovery, 
it's
+                                        * possible that current node does not 
know
+                                        * yet who the recmaster is.
+                                        */
+                                       DEBUG(DEBUG_ERR, ("Unable to get 
recovery lock"
+                                                         " - retrying 
recovery\n"));
+                                       return -1;
+                               }
+
+                               DEBUG(DEBUG_ERR,("Unable to get recovery lock - 
aborting recovery "
+                                                "and ban ourself for %u 
seconds\n",
+                                                
ctdb->tunable.recovery_ban_period));
+                               ctdb_ban_node(rec, pnn, 
ctdb->tunable.recovery_ban_period);
                                return -1;
                        }
-
-                       DEBUG(DEBUG_ERR,("Unable to get recovery lock - 
aborting recovery "
-                                        "and ban ourself for %u seconds\n",
-                                        ctdb->tunable.recovery_ban_period));
-                       ctdb_ban_node(rec, pnn, 
ctdb->tunable.recovery_ban_period);
-                       return -1;
+                       ctdb_ctrl_report_recd_lock_latency(ctdb,
+                                                          CONTROL_TIMEOUT(),


-- 
CTDB repository

Reply via email to