The branch, master has been updated
       via  b849fb4923d6a34141fe19006a974de81508ceda (commit)
       via  c75b5e5b4d000f5c7dab403df8238ceed390c1c0 (commit)
       via  2629de72e1f37b5e46772c2ef8d8d0012fc4ed37 (commit)
       via  2bbee8ac23ad5b7adf7122d8c91d5f0d54582507 (commit)
       via  496387a585b2c5778c808cf02b8e1435abde4c3e (commit)
       via  3221fce9ee2f6fdd3bb17a5e1629ad52a32f90d6 (commit)
       via  776590bf84d221092298346a28d7fc0552a67c9d (commit)
      from  5067392d2e06795559f25828b65c129608b65c0b (commit)

http://gitweb.samba.org/?p=ctdb.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit b849fb4923d6a34141fe19006a974de81508ceda
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Mon Jan 7 12:00:34 2013 +1100

    tests/complex: Add NFS test when CTDB is killed on one of the nodes

    Signed-off-by: Amitay Isaacs <ami...@gmail.com>

commit c75b5e5b4d000f5c7dab403df8238ceed390c1c0
Author: Martin Schwenke <mar...@meltin.net>
Date:   Tue Dec 4 15:00:44 2012 +1100

    Eventscripts: Change the default reconfigure action to do nothing

    A default action of restarting the service doesn't obey the
    principle of least surprise. It cause the NFS service to be
    implicitly reintroduced.

    This allows no-op functions to be removed from some eventscripts
    and service restart functions to be added to others.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

commit 2629de72e1f37b5e46772c2ef8d8d0012fc4ed37
Author: Martin Schwenke <mar...@meltin.net>
Date:   Tue Dec 4 14:52:25 2012 +1100

    Eventscripts: Do not restart NFS on reconfigure

    It looks like this restart was accidentally reintroduced in commit
    fc0678d351187cfa4c71123f97c0f493aacd5d16 when $service_reconfigure
    became unset so the default action of restarting the service would
    occur. From there cleanups have explicitly reintroduced it and
    carried it through the code.

    Also update the unit tests affected by this change.

    The restart was originally removed in commit
    bc481c3f1a44c50648488c4f8a7f15ec395d446f.

    The default reconfigure action of restarting a service is clearly
    suboptimal and will be addressed in a separate patch.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

commit 2bbee8ac23ad5b7adf7122d8c91d5f0d54582507
Author: Martin Schwenke <mar...@meltin.net>
Date:   Tue Dec 4 14:28:06 2012 +1100

    ctdbd: Initialise the node flags in just one place

    Currently flags are initialised in 2 places. One of them is in
    ctdb_tcp_listen_automatic(), which just seems wrong.

    This makes the code easier to follow by just doing it in
    ctdb_start_daemon().

    This means that the flags are now initialised later than
    previously. However, it is still done before the transport is
    started and before clients can connect.

    In future it might make sense to do a similar thing with setting
    the PNN. However, the current optimisation is reasonably obvious...

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Pair-programmed-with: Amitay Isaacs <ami...@gmail.com>

commit 496387a585b2c5778c808cf02b8e1435abde4c3e
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Dec 3 15:44:12 2012 +1100

    ctdbd: Remove debug option --node-ip, use --listen instead

    This effectively reverts d96cb02c2c24f9eabbc53d3d38e90dea49cff3e0

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Pair-programmed-with: Amitay Isaacs <ami...@gmail.com>

commit 3221fce9ee2f6fdd3bb17a5e1629ad52a32f90d6
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Dec 3 15:32:49 2012 +1100

    tests: Local daemons should use --listen instead of --node-ip

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Pair-programmed-with: Amitay Isaacs <ami...@gmail.com>

commit 776590bf84d221092298346a28d7fc0552a67c9d
Author: Martin Schwenke <mar...@meltin.net>
Date:   Fri Nov 30 12:59:35 2012 +1100

    Initscript: when checking status, print output of "ctdb ping" if it fails

    At the moment the caller has no idea why it thinks CTDB isn't
    running and we can't debug failures...

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

-----------------------------------------------------------------------

Summary of changes:
 config/ctdb.init                                   |    5 +-
 config/events.d/40.vsftpd                          |    5 ++
 config/events.d/41.httpd                           |    5 ++
 config/events.d/49.winbind                         |    6 --
 config/events.d/50.samba                           |    6 --
 config/events.d/60.nfs                             |    2 -
 config/functions                                   |    4 +-
 include/ctdb_private.h                             |    1 -
 server/ctdb_daemon.c                               |   23 +++++++
 server/ctdb_server.c                               |   12 ----
 server/ctdbd.c                                     |   26 +++-----
 tcp/tcp_connect.c                                  |   69 ++++++--------------
 ...ilover_nfs_basic.sh => 45_failover_nfs_kill.sh} |   10 ++-
 tests/eventscripts/60.nfs.multi.001.sh             |    2 -
 tests/eventscripts/60.nfs.multi.002.sh             |    2 -
 tests/eventscripts/60.nfs.multi.003.sh             |    2 -
 tests/eventscripts/60.nfs.multi.004.sh             |    2 -
 tests/eventscripts/60.nfs.multi.005.sh             |    2 -
 tests/eventscripts/60.nfs.multi.006.sh             |    2 -
 tests/scripts/integration.bash                     |    2 +-
 20 files changed, 77 insertions(+), 111 deletions(-)
 copy tests/complex/{43_failover_nfs_basic.sh => 45_failover_nfs_kill.sh} (89%)


Changeset truncated at 500 lines:

diff --git a/config/ctdb.init b/config/ctdb.init
index b8ff733..85c1728 100755
--- a/config/ctdb.init
+++ b/config/ctdb.init
@@ -353,7 +353,7 @@ restart() {
 status() {
 	echo -n $"Checking for ctdbd service: "
-	ctdb ping >/dev/null 2>&1 || {
+	_out=$(ctdb ping 2>&1) || {
 		RETVAL=$?
 		echo -n " ctdbd not running. "
 		case $init_style in
@@ -371,6 +371,9 @@ status() {
 			fi
 			;;
 		esac
+		echo 'Output from "ctdb ping":'
+		echo "$_out"
+
 		return $RETVAL
 	}
 	echo ""
diff --git a/config/events.d/40.vsftpd b/config/events.d/40.vsftpd
index e6d58c8..ab23a80 100755
--- a/config/events.d/40.vsftpd
+++ b/config/events.d/40.vsftpd
@@ -15,6 +15,11 @@ service_stop ()
 	service $service_name stop
 }
+service_reconfigure ()
+{
+	service $service_name restart
+}
+
 service_fail_limit=2
 service_tcp_ports=21
diff --git a/config/events.d/41.httpd b/config/events.d/41.httpd
index 6ae5d61..4fb6aa0 100755
--- a/config/events.d/41.httpd
+++ b/config/events.d/41.httpd
@@ -38,6 +38,11 @@ service_stop ()
 	killall -q -9 $service_name || true
 }
+service_reconfigure ()
+{
+	service $service_name restart
+}
+
 loadconfig
 ctdb_start_stop_service
diff --git a/config/events.d/49.winbind b/config/events.d/49.winbind
index b67951b..cd360a9 100755
--- a/config/events.d/49.winbind
+++ b/config/events.d/49.winbind
@@ -31,12 +31,6 @@ service_stop ()
 	service "$CTDB_SERVICE_WINBIND" stop
 }
-service_reconfigure ()
-{
-	# winbind automatically reloads config - no restart needed.
-	:
-}
-
 ###########################
 ctdb_start_stop_service
diff --git a/config/events.d/50.samba b/config/events.d/50.samba
index a4d50fe..3a43bbe 100755
--- a/config/events.d/50.samba
+++ b/config/events.d/50.samba
@@ -65,12 +65,6 @@ service_stop ()
 	fi
 }
-service_reconfigure ()
-{
-	# Samba automatically reloads config - no restart needed.
-	:
-}
-
 # set default samba cleanup period - in minutes
 [ -z "$SAMBA_CLEANUP_PERIOD" ] && {
 	SAMBA_CLEANUP_PERIOD=10
diff --git a/config/events.d/60.nfs b/config/events.d/60.nfs
index ef2c1f7..69a99dc 100755
--- a/config/events.d/60.nfs
+++ b/config/events.d/60.nfs
@@ -16,8 +16,6 @@ service_stop ()
 }
 service_reconfigure ()
 {
-	startstop_nfs restart
-
 	# if the ips have been reallocated, we must restart the lockmanager
 	# across all nodes and ping all statd listeners
 	[ -x $CTDB_BASE/statd-callout ] && {
diff --git a/config/functions b/config/functions
index 078b50b..330d057 100755
--- a/config/functions
+++ b/config/functions
@@ -1125,10 +1125,10 @@ ctdb_service_reconfigure ()
 	ctdb_counter_init "$@"
 }
-# Default service_reconfigure() function.
+# Default service_reconfigure() function does nothing.
 service_reconfigure ()
 {
-	service "${1:-$service_name}" restart
+	:
 }
 ctdb_reconfigure_try_lock ()
diff --git a/include/ctdb_private.h b/include/ctdb_private.h
index 582a767..152af64 100644
--- a/include/ctdb_private.h
+++ b/include/ctdb_private.h
@@ -501,7 +501,6 @@ struct ctdb_context {
 	pid_t recoverd_pid;
 	pid_t syslogd_pid;
 	bool done_startup;
-	const char *node_ip;
 	struct ctdb_monitor_state *monitor;
 	struct ctdb_log_state *log;
 	int start_as_disabled;
diff --git a/server/ctdb_daemon.c b/server/ctdb_daemon.c
index 13ea319..623e623 100644
--- a/server/ctdb_daemon.c
+++ b/server/ctdb_daemon.c
@@ -1029,6 +1029,26 @@ failed:
 	return -1;
 }
+static void initialise_node_flags (struct ctdb_context *ctdb)
+{
+	if (ctdb->pnn == -1) {
+		ctdb_fatal(ctdb, "PNN is set to -1 (unknown value)");
+	}
+
+	ctdb->nodes[ctdb->pnn]->flags &= ~NODE_FLAGS_DISCONNECTED;
+
+	/* do we start out in DISABLED mode? */
+	if (ctdb->start_as_disabled != 0) {
+		DEBUG(DEBUG_INFO, ("This node is configured to start in DISABLED state\n"));
+		ctdb->nodes[ctdb->pnn]->flags |= NODE_FLAGS_DISABLED;
+	}
+	/* do we start out in STOPPED mode? */
+	if (ctdb->start_as_stopped != 0) {
+		DEBUG(DEBUG_INFO, ("This node is configured to start in STOPPED state\n"));
+		ctdb->nodes[ctdb->pnn]->flags |= NODE_FLAGS_STOPPED;
+	}
+}
+
 static void ctdb_setup_event_callback(struct ctdb_context *ctdb, int status,
 				      void *private_data)
 {
@@ -1189,6 +1209,9 @@ int ctdb_start_daemon(struct ctdb_context *ctdb, bool do_fork, bool use_syslog,
 	if (ctdb->methods->initialise(ctdb) != 0) {
 		ctdb_fatal(ctdb, "transport failed to initialise");
 	}
+
+	initialise_node_flags(ctdb);
+
 	if (public_address_list) {
 		ctdb->public_addresses_file = public_address_list;
 		ret = ctdb_set_public_addresses(ctdb, true);
diff --git a/server/ctdb_server.c b/server/ctdb_server.c
index ec608cf..de3c690 100644
--- a/server/ctdb_server.c
+++ b/server/ctdb_server.c
@@ -146,18 +146,6 @@ static int ctdb_add_node(struct ctdb_context *ctdb, char *nstr)
 	    ctdb_same_address(&ctdb->address, &node->address)) {
 		/* for automatic binding to interfaces, see tcp_connect.c */
 		ctdb->pnn = node->pnn;
-		node->flags &= ~NODE_FLAGS_DISCONNECTED;
-
-		/* do we start out in DISABLED mode? */
-		if (ctdb->start_as_disabled != 0) {
-			DEBUG(DEBUG_INFO, ("This node is configured to start in DISABLED state\n"));
-			node->flags |= NODE_FLAGS_DISABLED;
-		}
-		/* do we start out in STOPPED mode? */
-		if (ctdb->start_as_stopped != 0) {
-			DEBUG(DEBUG_INFO, ("This node is configured to start in STOPPED state\n"));
-			node->flags |= NODE_FLAGS_STOPPED;
-		}
 	}
 	ctdb->num_nodes++;
diff --git a/server/ctdbd.c b/server/ctdbd.c
index ddda8ff..491b4a2 100644
--- a/server/ctdbd.c
+++ b/server/ctdbd.c
@@ -41,7 +41,6 @@ static struct {
 	const char *db_dir_state;
 	const char *public_interface;
 	const char *single_public_ip;
-	const char *node_ip;
 	int valgrinding;
 	int nosetsched;
 	int use_syslog;
@@ -126,7 +125,6 @@ int main(int argc, const char *argv[])
 		{ "event-script-dir", 0, POPT_ARG_STRING, &options.event_script_dir, 0, "event script directory", "dirname" },
 		{ "logfile", 0, POPT_ARG_STRING, &options.logfile, 0, "log file location", "filename" },
 		{ "nlist", 0, POPT_ARG_STRING, &options.nlist, 0, "node list file", "filename" },
-		{ "node-ip", 0, POPT_ARG_STRING, &options.node_ip, 0, "node ip", "ip-address"},
 		{ "notification-script", 0, POPT_ARG_STRING, &options.notification_script, 0, "notification script", "filename" },
 		{ "debug-hung-script", 0, POPT_ARG_STRING, &options.debug_hung_script, 0, "debug script for hung eventscripts", "filename" },
 		{ "listen", 0, POPT_ARG_STRING, &options.myaddress, 0, "address to listen on", "address" },
@@ -244,23 +242,19 @@ int main(int argc, const char *argv[])
 		ctdb->capabilities |= CTDB_CAP_LVS;
 	}
+	/* Initialise this node's PNN to the unknown value. This will
+	 * be set to the correct value by either ctdb_add_node() as
+	 * part of loading the nodes file or by
+	 * ctdb_tcp_listen_automatic() when the transport is
+	 * initialised. At some point we should de-optimise this and
+	 * pull it out into ctdb_start_daemon() so it is done clearly
+	 * and only in one place.
+	 */
+	ctdb->pnn = -1;
+
 	/* tell ctdb what nodes are available */
 	ctdb_load_nodes_file(ctdb);
-	/* if a node-ip was specified, verify that it exists in the
-	   nodes file
-	*/
-	if (options.node_ip != NULL) {
-		DEBUG(DEBUG_NOTICE,("IP for this node is %s\n", options.node_ip));
-		ret = ctdb_ip_to_nodeid(ctdb, options.node_ip);
-		if (ret == -1) {
-			DEBUG(DEBUG_ALERT,("The specified node-ip:%s is not a valid node address. Exiting.\n", options.node_ip));
-			exit(1);
-		}
-		ctdb->node_ip = options.node_ip;
-		DEBUG(DEBUG_NOTICE,("This is node %d\n", ret));
-	}
-
 	if (options.db_dir) {
 		ret = ctdb_set_tdb_dir(ctdb, options.db_dir);
 		if (ret == -1) {
diff --git a/tcp/tcp_connect.c b/tcp/tcp_connect.c
index c94b88f..93111f3 100644
--- a/tcp/tcp_connect.c
+++ b/tcp/tcp_connect.c
@@ -290,45 +290,32 @@ static int ctdb_tcp_listen_automatic(struct ctdb_context *ctdb)
 		return -1;
 	}
-	/* We only need to serialize this if we dont yet know the node ip */
-	if (!ctdb->node_ip) {
-		/* in order to ensure that we don't get two nodes with the
-		   same adddress, we must make the bind() and listen() calls
-		   atomic. The SO_REUSEADDR setsockopt only prevents double
-		   binds if the first socket is in LISTEN state */
-		lock_fd = open(lock_path, O_RDWR|O_CREAT, 0666);
-		if (lock_fd == -1) {
-			DEBUG(DEBUG_CRIT,("Unable to open %s\n", lock_path));
-			return -1;
-		}
+	/* in order to ensure that we don't get two nodes with the
+	   same adddress, we must make the bind() and listen() calls
+	   atomic. The SO_REUSEADDR setsockopt only prevents double
+	   binds if the first socket is in LISTEN state */
+	lock_fd = open(lock_path, O_RDWR|O_CREAT, 0666);
+	if (lock_fd == -1) {
+		DEBUG(DEBUG_CRIT,("Unable to open %s\n", lock_path));
+		return -1;
+	}
-		lock.l_type = F_WRLCK;
-		lock.l_whence = SEEK_SET;
-		lock.l_start = 0;
-		lock.l_len = 1;
-		lock.l_pid = 0;
+	lock.l_type = F_WRLCK;
+	lock.l_whence = SEEK_SET;
+	lock.l_start = 0;
+	lock.l_len = 1;
+	lock.l_pid = 0;
-		if (fcntl(lock_fd, F_SETLKW, &lock) != 0) {
-			DEBUG(DEBUG_CRIT,("Unable to lock %s\n", lock_path));
-			close(lock_fd);
-			return -1;
-		}
+	if (fcntl(lock_fd, F_SETLKW, &lock) != 0) {
+		DEBUG(DEBUG_CRIT,("Unable to lock %s\n", lock_path));
+		close(lock_fd);
+		return -1;
 	}
 	for (i=0; i < ctdb->num_nodes; i++) {
 		if (ctdb->nodes[i]->flags & NODE_FLAGS_DELETED) {
 			continue;
 		}
-
-		/* if node_ip is specified we will only try to bind to that
-		   ip.
-		*/
-		if (ctdb->node_ip != NULL) {
-			if (strcmp(ctdb->node_ip, ctdb->nodes[i]->address.address)) {
-				continue;
-			}
-		}
-
 		ZERO_STRUCT(sock);
 		if (ctdb_tcp_get_address(ctdb,
 				ctdb->nodes[i]->address.address,
@@ -387,21 +374,10 @@ static int ctdb_tcp_listen_automatic(struct ctdb_context *ctdb)
 			ctdb->address.address,
 			ctdb->address.port);
 	ctdb->pnn = ctdb->nodes[i]->pnn;
-	ctdb->nodes[i]->flags &= ~NODE_FLAGS_DISCONNECTED;
 	DEBUG(DEBUG_INFO,("ctdb chose network address %s:%u pnn %u\n",
 		 ctdb->address.address,
 		 ctdb->address.port,
 		 ctdb->pnn));
-	/* do we start out in DISABLED mode? */
-	if (ctdb->start_as_disabled != 0) {
-		DEBUG(DEBUG_INFO, ("This node is configured to start in DISABLED state\n"));
-		ctdb->nodes[i]->flags |= NODE_FLAGS_DISABLED;
-	}
-	/* do we start out in STOPPED mode? */
-	if (ctdb->start_as_stopped != 0) {
-		DEBUG(DEBUG_INFO, ("This node is configured to start in STOPPED state\n"));
-		ctdb->nodes[i]->flags |= NODE_FLAGS_STOPPED;
-	}
 	if (listen(ctcp->listen_fd, 10) == -1) {
 		goto failed;
@@ -411,15 +387,12 @@ static int ctdb_tcp_listen_automatic(struct ctdb_context *ctdb)
 			   ctdb_listen_event, ctdb);
 	tevent_fd_set_auto_close(fde);
-	if (!ctdb->node_ip) {
-		close(lock_fd);
-	}
+	close(lock_fd);
+
 	return 0;
 failed:
-	if (!ctdb->node_ip) {
-		close(lock_fd);
-	}
+	close(lock_fd);
 	close(ctcp->listen_fd);
 	ctcp->listen_fd = -1;
 	return -1;
diff --git a/tests/complex/43_failover_nfs_basic.sh b/tests/complex/45_failover_nfs_kill.sh
similarity index 89%
copy from tests/complex/43_failover_nfs_basic.sh
copy to tests/complex/45_failover_nfs_kill.sh
index 71a8229..f551036 100755
--- a/tests/complex/43_failover_nfs_basic.sh
+++ b/tests/complex/45_failover_nfs_kill.sh
@@ -24,7 +24,7 @@ Steps:
 3. Select the 1st NFS share exported on the node.
 4. Mount the selected NFS share.
 5. Create a file in the NFS mount and calculate its checksum.
-6. Disable the selected node.
+6. Kill CTDB on the selected node.
 7. Read the file and calculate its checksum.
 8. Compare the checksums.
@@ -69,9 +69,11 @@ original_sum=$(sum $test_file)
 gratarp_sniff_start
-echo "Disabling node $test_node"
-try_command_on_node 0 $CTDB disable -n $test_node
-wait_until_node_has_status $test_node disabled
+echo "Killing node $test_node"
+try_command_on_node $test_node $CTDB getpid
+pid=${out#*:}
+try_command_on_node $test_node kill -9 $pid
+wait_until_node_has_status $test_node disconnected
 gratarp_sniff_wait_show
diff --git a/tests/eventscripts/60.nfs.multi.001.sh b/tests/eventscripts/60.nfs.multi.001.sh
index e578c56..f983df7 100755
--- a/tests/eventscripts/60.nfs.multi.001.sh
+++ b/tests/eventscripts/60.nfs.multi.001.sh
@@ -14,8 +14,6 @@ simple_test_event "takeip" $public_address
 ok <<EOF
 Reconfiguring service "nfs"...
-Starting nfslock: OK
-Starting nfs: OK
 EOF
 simple_test_event "ipreallocated"
diff --git a/tests/eventscripts/60.nfs.multi.002.sh b/tests/eventscripts/60.nfs.multi.002.sh
index 0c203cd..350c1bc 100755
--- a/tests/eventscripts/60.nfs.multi.002.sh
+++ b/tests/eventscripts/60.nfs.multi.002.sh
@@ -17,8 +17,6 @@ simple_test_event "takeip" $public_address
 # to split this into 2 tests.
 ok <<EOF
 Reconfiguring service "nfs"...
-Starting nfslock: OK
-Starting nfs: OK
 Replaying previous status for this script due to reconfigure...
 EOF
diff --git a/tests/eventscripts/60.nfs.multi.003.sh b/tests/eventscripts/60.nfs.multi.003.sh
index 31867b2..68f45ab 100755
--- a/tests/eventscripts/60.nfs.multi.003.sh
+++ b/tests/eventscripts/60.nfs.multi.003.sh
@@ -18,8 +18,6 @@ ctdb_fake_scriptstatus 1 "ERROR" "$err"
 required_result 1 <<EOF
 Reconfiguring service "nfs"...
-Starting nfslock: OK
-Starting nfs: OK
 Replaying previous status for this script due to reconfigure...
 $err
 EOF
diff --git a/tests/eventscripts/60.nfs.multi.004.sh b/tests/eventscripts/60.nfs.multi.004.sh
index 6220ad3..b071ec8 100755
--- a/tests/eventscripts/60.nfs.multi.004.sh
+++ b/tests/eventscripts/60.nfs.multi.004.sh
@@ -18,8 +18,6 @@ ctdb_fake_scriptstatus -62 "TIMEDOUT" "$err"
 required_result 1 <<EOF
 Reconfiguring service "nfs"...
-Starting nfslock: OK
-Starting nfs: OK
 Replaying previous status for this script due to reconfigure...
 [Replay of TIMEDOUT scriptstatus - note incorrect return code.] $err
 EOF
diff --git a/tests/eventscripts/60.nfs.multi.005.sh b/tests/eventscripts/60.nfs.multi.005.sh
index 1a8200c..82802aa 100755
--- a/tests/eventscripts/60.nfs.multi.005.sh
+++ b/tests/eventscripts/60.nfs.multi.005.sh
@@ -18,8 +18,6 @@ ctdb_fake_scriptstatus -8 "DISABLED" "$err"
 ok <<EOF
 Reconfiguring service "nfs"...
-Starting nfslock: OK
-Starting nfs: OK
 Replaying previous status for this script due to reconfigure...
 [Replay of DISABLED scriptstatus - note incorrect return code.] $err
 EOF
diff --git a/tests/eventscripts/60.nfs.multi.006.sh b/tests/eventscripts/60.nfs.multi.006.sh
index 21beaad..84bb9ef 100755
--- a/tests/eventscripts/60.nfs.multi.006.sh
+++ b/tests/eventscripts/60.nfs.multi.006.sh
@@ -13,8 +13,6 @@ err=""
 ok <<EOF
 Reconfiguring service "nfs"...
-Starting nfslock: OK
-Starting nfs: OK
 EOF
 simple_test_event "reconfigure"
diff --git a/tests/scripts/integration.bash b/tests/scripts/integration.bash
index 38420ba..8813499 100644
--- a/tests/scripts/integration.bash
+++ b/tests/scripts/integration.bash
@@ -575,7 +575,7 @@ daemons_start_1 ()
     fi
     local node_ip=$(sed -n -e "$(($pnn + 1))p" "$CTDB_NODES")
-    local ctdb_options="--reclock=${TEST_VAR_DIR}/rec.lock --nlist $CTDB_NODES --nopublicipcheck --node-ip=${node_ip} --event-script-dir=${TEST_VAR_DIR}/events.d --logfile=${TEST_VAR_DIR}/daemon.${pnn}.log -d 3 --log-ringbuf-size=10000 --dbdir=${TEST_VAR_DIR}/test.db --dbdir-persistent=${TEST_VAR_DIR}/test.db/persistent --dbdir-state=${TEST_VAR_DIR}/test.db/state"
+    local ctdb_options="--reclock=${TEST_VAR_DIR}/rec.lock --nlist $CTDB_NODES --nopublicipcheck --listen=${node_ip} --event-script-dir=${TEST_VAR_DIR}/events.d --logfile=${TEST_VAR_DIR}/daemon.${pnn}.log -d 3 --log-ringbuf-size=10000 --dbdir=${TEST_VAR_DIR}/test.db --dbdir-persistent=${TEST_VAR_DIR}/test.db/persistent --dbdir-state=${TEST_VAR_DIR}/test.db/state"
     if [ -n "$TEST_LOCAL_DAEMONS" ] ; then
         ctdb_options="$ctdb_options --public-interface=lo"


-- 
CTDB repository
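
Illustration (not part of the changeset): the log for c75b5e5 describes the default reconfigure action becoming a no-op, with eventscripts such as 40.vsftpd and 41.httpd defining their own service_reconfigure() when they still want a restart. A minimal sketch of that override pattern follows, assuming a plain POSIX shell. The variable names default_output and override_output are hypothetical, and echo stands in for the real "service $service_name restart" call so that the sketch is safe to run anywhere.

```shell
#!/bin/sh
# Sketch only: mirrors the shape of config/functions and an eventscript,
# but is not the CTDB code itself.

# Case 1: an eventscript that does not define service_reconfigure()
# falls back to the default, which now does nothing.
default_output=$(
	service_reconfigure ()
	{
		:
	}
	service_reconfigure
)

# Case 2: an eventscript that still wants a restart redefines the
# function after the defaults have been loaded. The echo is a stand-in
# (assumption) for "service $service_name restart".
override_output=$(
	service_reconfigure ()
	{
		:
	}
	service_name="vsftpd"
	service_reconfigure ()
	{
		echo "would run: service $service_name restart"
	}
	service_reconfigure
)

echo "default:  [$default_output]"
echo "override: [$override_output]"
```

In a shell the most recent definition of a function wins, which is why removing the explicit no-op functions from 49.winbind and 50.samba leaves their behaviour unchanged once the default itself is a no-op.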