The branch, master has been updated
       via  57aa2dffea60abd73a95233f8b761cc676adebb6 (commit)
       via  37ccc7c6cc43a80aaa92291aea7a438f4225488a (commit)
       via  782814288bb560099ee44b607bf35f3eddf37f82 (commit)
       via  a20d94717d2e4ab866d8a002cdf39c0669b74c6a (commit)
       via  af5aa369c266430fe912df0c26116b68bac3572e (commit)
       via  a69e03a5e4671e998d45b4fef8611a421bbdb3e1 (commit)
       via  bf4a7c1ad87e0e848296d15d63eb8cd901ca5335 (commit)
       via  1b016b2dfc5d7d3f2a42ce4dfe569608e90eb714 (commit)
       via  e0f3fa1020e13b84bdd672538168d148f1847d57 (commit)
       via  29e98017221326bdc9b1c4f7c05b3b495c1de29b (commit)
       via  9d6e1c147bd036d832b98c155f405ee2a5d6f57f (commit)
       via  ae3c03d80264e997b7da9f3279d7810e18b8a1df (commit)
       via  90d792cf28d6a823141e4c417b6978f02a9cf596 (commit)
       via  3dd5b925dcf0e9a5b877638e471c5ecf36b46c58 (commit)
       via  53e4eca74429f76adc81d98e3d11d1bd61194d71 (commit)
       via  501f19b16fd6d67fbb754248868c38ee5bcf79ef (commit)
       via  c6ab0f9405d5fa5b0b1693bc92e59da0d555a9d7 (commit)
       via  57ef5d3827ea3417a32703e259a53ce6fd10ac45 (commit)
      from  5740155cc5de1a223412e8529aa1a383a5412514 (commit)

http://gitweb.samba.org/?p=ctdb.git;a=shortlog;h=master

- Log -----------------------------------------------------------------
commit 57aa2dffea60abd73a95233f8b761cc676adebb6
Author: Martin Schwenke <mar...@meltin.net>
Date:   Fri Jul 26 15:09:24 2013 +1000

    doc: Update XML files to use standard DocBook DTD

    This simplifies building since we don't use any of the Samba
    extensions.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

commit 37ccc7c6cc43a80aaa92291aea7a438f4225488a
Author: Martin Schwenke <mar...@meltin.net>
Date:   Fri Jul 26 11:20:47 2013 +1000

    initscript: The wrapper script should export CTDB_SOCKET

    This ensures that any invocation of the ctdb tool (within the
    wrapper) gets the desired value. This at least ensures that ctdbd
    will be started. If a non-standard value is set for CTDB_SOCKET
    then command-line users will still need the variable in their
    environment.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Pair-programmed-with: Amitay Isaacs <ami...@gmail.com>

commit 782814288bb560099ee44b607bf35f3eddf37f82
Author: Martin Schwenke <mar...@meltin.net>
Date:   Thu Jul 25 16:17:07 2013 +1000

    ctdbd: Kill client process without checking for tracked child

    Commit f73a4b1495830bcdd094a93732a89dd53b3c2f78 added a safety
    check to ensure that CTDB never kills unrelated processes.
    However, client processes are unrelated.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

commit a20d94717d2e4ab866d8a002cdf39c0669b74c6a
Author: Martin Schwenke <mar...@meltin.net>
Date:   Thu Jul 25 13:40:43 2013 +1000

    eventscripts: kill_tcp_connections() should send connections to stdin

    This avoids issuing multiple "ctdb killtcp" commands to terminate
    tcp connections, one per connection. This will considerably reduce
    the time when there is a large number of tcp connections.

    This also makes it possible to avoid calling "ctdb killtcp" when
    there are no connections.

    Add a couple of unit tests for killtcp and update eventscript unit
    test infrastructure to support.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Pair-programmed-with: Amitay Isaacs <ami...@gmail.com>

commit af5aa369c266430fe912df0c26116b68bac3572e
Author: Martin Schwenke <mar...@meltin.net>
Date:   Thu Jul 25 13:28:26 2013 +1000

    tools/ctdb: Allow killtcp to read connections from standard input

    This allows eventscripts to send information about multiple tcp
    connections to a single "ctdb killtcp" command, saving the
    overhead of setting up a client connection per tcp connection.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Pair-programmed-with: Amitay Isaacs <ami...@gmail.com>

commit a69e03a5e4671e998d45b4fef8611a421bbdb3e1
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Jul 22 20:11:58 2013 +1000

    tests: Always tally the number of passed/failed tests

    Regardless of whether a summary is being printed!

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

commit bf4a7c1ad87e0e848296d15d63eb8cd901ca5335
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Jul 22 16:39:46 2013 +1000

    recoverd: Call takeover fail callback only once per node

    Currently the fail callback is called once per (takeip/releaseip)
    control failure. This is overkill and can get a node banned much
    too quickly.

    Instead, keep track of control failures per node and only call
    fail callback once per failed node.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Pair-programmed-with: Amitay Isaacs <ami...@gmail.com>

commit 1b016b2dfc5d7d3f2a42ce4dfe569608e90eb714
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Jul 22 15:08:32 2013 +1000

    scripts: Run scriptstatus for hung event

    The timeout information printed by ctdbd is less than useful
    because it refers to the cumulative time taken by the eventscripts
    run so far. Adding scriptstatus output indicates where time was
    actually spent.

    Since there is now quite a bit of output, serialise the calls to
    this script using flock.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Pair-programmed-with: Amitay Isaacs <ami...@gmail.com>

commit e0f3fa1020e13b84bdd672538168d148f1847d57
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Jul 22 15:06:52 2013 +1000

    ctdbd: Pass event name to hung script debugger

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Pair-programmed-with: Amitay Isaacs <ami...@gmail.com>

commit 29e98017221326bdc9b1c4f7c05b3b495c1de29b
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Jul 22 14:32:13 2013 +1000

    tests/complex: Fix NFS tests to work with root_squash

    Refactor the NFS test setup/cleanup code into new common
    functions.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Pair-programmed-with: Amitay Isaacs <ami...@gmail.com>

commit 9d6e1c147bd036d832b98c155f405ee2a5d6f57f
Author: Martin Schwenke <mar...@meltin.net>
Date:   Fri Jul 19 19:59:43 2013 +1000

    tests: Fix exit status of run_tests when a single test is run with -H

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

commit ae3c03d80264e997b7da9f3279d7810e18b8a1df
Author: Martin Schwenke <mar...@meltin.net>
Date:   Fri Jul 19 15:33:38 2013 +1000

    tests/simple: Add -p in onnode test to help show groups of connections

    Change the command from "true" to "hostname" since the former
    won't produce any output when used in combination with
    "onnode -p". This could just be changed to "echo" but the
    hostname might actually be useful.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

commit 90d792cf28d6a823141e4c417b6978f02a9cf596
Author: Martin Schwenke <mar...@meltin.net>
Date:   Wed Jul 17 11:14:37 2013 +1000

    ctdbd: Sleep at exit to allow time for log messages to flush

    Register print_exit_message() earlier so that it covers most of
    the early exits.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>
    Pair-programmed-with: Amitay Isaacs <ami...@gmail.com>

commit 3dd5b925dcf0e9a5b877638e471c5ecf36b46c58
Author: Martin Schwenke <mar...@meltin.net>
Date:   Fri Jul 19 15:36:29 2013 +1000

    ctdbd: Exit if something is already listening on CTDB socket

    Don't blindly remove the socket.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

commit 53e4eca74429f76adc81d98e3d11d1bd61194d71
Author: Martin Schwenke <mar...@meltin.net>
Date:   Tue Jul 16 19:57:18 2013 +1000

    tests/eventscripts: Add tests for monitoring of missing interfaces

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

commit 501f19b16fd6d67fbb754248868c38ee5bcf79ef
Author: Martin Schwenke <mar...@meltin.net>
Date:   Fri Jul 12 12:48:34 2013 +1000

    eventscripts: A missing interface should cause monitoring to fail

    A missing interface is at least as bad as an interface with a link
    that is down so should have a similar effect.

    This couldn't be done previously because orphaned interfaces used
    to be listed for monitoring. This was worked around in
    10.interface in commit 49b2d1bd9554461ed8edbfc21e777c0eca9e1443
    and fixed in ctdbd in commit
    cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a.

    If $CTDB_PARTIALLY_ONLINE_INTERFACES="yes" then monitoring won't
    actually fail but the interface is still marked as down.

    While we're touching this code, use "ip link" instead of
    "ip addr". It is marginally cheaper but not enough for a separate
    patch. ;-)

    This effectively reverts d67955b42f7627be9dae995230c8fcbb8a948ec2.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

commit c6ab0f9405d5fa5b0b1693bc92e59da0d555a9d7
Author: Martin Schwenke <mar...@meltin.net>
Date:   Fri Jul 12 12:33:36 2013 +1000

    eventscripts: Get list of configured interfaces using "ctdb ifaces"

    This was previously changed because ctdbd didn't garbage collect
    orphaned interfaces. This was fixed in commit
    cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

commit 57ef5d3827ea3417a32703e259a53ce6fd10ac45
Author: Martin Schwenke <mar...@meltin.net>
Date:   Mon Jun 24 15:49:48 2013 +1000

    ctdbd: Allow extra recovery to repair persistent DBs during first recovery

    Commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28 introduced a
    potential regression because a node may not have completed the
    "recovered" event (so might still be in
    CTDB_RUNSTATE_FIRST_RECOVERY) when another node becomes healthy.

    Signed-off-by: Martin Schwenke <mar...@meltin.net>

-----------------------------------------------------------------------

Summary of changes:
 config/ctdbd_wrapper                             |    2 +
 config/debug-hung-script.sh                      |   19 +++-
 config/events.d/10.interface                     |   11 +-
 config/functions                                 |   25 +++--
 doc/ctdb.1.xml                                   |   14 ++-
 doc/ctdbd.1.xml                                  |    4 +-
 doc/ltdbtool.1.xml                               |    4 +-
 doc/onnode.1.xml                                 |    4 +-
 doc/ping_pong.1.xml                              |    4 +-
 server/ctdb_daemon.c                             |   40 +++++---
 server/ctdb_monitor.c                            |    2 +-
 server/ctdb_takeover.c                           |   51 ++++++++-
 server/eventscript.c                             |    5 +-
 tests/complex/43_failover_nfs_basic.sh           |   19 +---
 tests/complex/44_failover_nfs_oneway.sh          |   25 +----
 tests/complex/45_failover_nfs_kill.sh            |   19 +---
 tests/eventscripts/10.interface.monitor.015.sh   |   16 +++
 tests/eventscripts/10.interface.monitor.016.sh   |   18 +++
 tests/eventscripts/10.interface.releaseip.010.sh |   32 ++++++
 tests/eventscripts/10.interface.releaseip.011.sh |   38 +++++++
 tests/eventscripts/scripts/local.sh              |    5 +
 tests/eventscripts/stubs/ctdb                    |    8 ++
 tests/eventscripts/stubs/ip                      |  126 ++++++++++++---------
 tests/eventscripts/stubs/netstat                 |    3 +
 tests/scripts/integration.bash                   |   36 ++++++
 tests/scripts/run_tests                          |   19 ++--
 tests/simple/00_ctdb_onnode.sh                   |    2 +-
 tools/ctdb.c                                     |  112 +++++++++++++++++++-
 28 files changed, 504 insertions(+), 159 deletions(-)
 mode change 100644 => 100755 config/debug-hung-script.sh
 create mode 100755 tests/eventscripts/10.interface.monitor.015.sh
 create mode 100755 tests/eventscripts/10.interface.monitor.016.sh
 create mode 100755 tests/eventscripts/10.interface.releaseip.010.sh
 create mode 100755 tests/eventscripts/10.interface.releaseip.011.sh


Changeset truncated at 500 lines:

diff --git a/config/ctdbd_wrapper b/config/ctdbd_wrapper
index 33bef06..fbc76cf 100755
--- a/config/ctdbd_wrapper
+++ b/config/ctdbd_wrapper
@@ -20,6 +20,8 @@ action="$2"
 . "${CTDB_BASE}/functions"
 loadconfig "ctdb"

+export CTDB_SOCKET
+
 ctdbd="${CTDBD:-/usr/sbin/ctdbd}"

 ############################################################
diff --git a/config/debug-hung-script.sh b/config/debug-hung-script.sh
old mode 100644
new mode 100755
index dcf68ba..32dbd5f
--- a/config/debug-hung-script.sh
+++ b/config/debug-hung-script.sh
@@ -1,4 +1,19 @@
 #!/bin/sh

-echo "Pstree output for the hung script:"
-pstree -p -a $1
+(
+    flock --wait 2 9 || exit 1
+
+    echo "===== Start of hung script debug for PID=\"$1\", event\"$2\" ====="
+
+    echo "pstree -p -a ${1}:"
+    pstree -p -a $1
+
+    echo "ctdb scriptstatus ${2}:"
+    # No use running several of these in parallel if, say, "releaseip"
+    # event hangs for multiple IPs. In that case the output would be
+    # interleaved in the log and would just be confusing.
+    ctdb scriptstatus "$2"
+
+    echo "===== End of hung script debug for PID=\"$1\", event\"$2\" ====="
+
+) 9>"${CTDB_VARDIR}/debug-hung-script.lock"
diff --git a/config/events.d/10.interface b/config/events.d/10.interface
index caff9fa..5379ea8 100755
--- a/config/events.d/10.interface
+++ b/config/events.d/10.interface
@@ -44,9 +44,9 @@ get_all_interfaces ()
     [ "$CTDB_PUBLIC_INTERFACE" ] && all_interfaces="$CTDB_PUBLIC_INTERFACE $all_interfaces"
     [ "$CTDB_NATGW_PUBLIC_IFACE" ] && all_interfaces="$CTDB_NATGW_PUBLIC_IFACE $all_interfaces"

-    # Get the configured interfaces for each IP. That is, for all but
-    # the 1st line, get the last field with commas changed to spaces.
-    ctdb_ifaces=$(ctdb -Y ip -v | sed -e '1d' -e 's/:$//' -e 's/^.*://' -e 's/,/ /g')
+    # Get the interfaces for which CTDB has public IPs configured.
+    # That is, for all but the 1st line, get the 1st field.
+    ctdb_ifaces=$(ctdb -Y ifaces | sed -e '1d' -e 's@^:@@' -e 's@:.*@@')

     # Add $ctdb_interfaces and uniquify
     all_interfaces=$(echo $all_interfaces $ctdb_ifaces | tr ' ' '\n' | sort -u)
@@ -65,8 +65,9 @@ monitor_interfaces()
     # problem with an interface then set fail=true and continue.
     for iface in $all_interfaces ; do

-        ip addr show $iface 2>/dev/null >/dev/null || {
-            echo "WARNING: Interface $iface does not exist but it is used by public addresses."
+        ip link show $iface 2>/dev/null >/dev/null || {
+            echo "ERROR: Interface $iface does not exist but it is used by public addresses."
+            mark_down $iface
             continue
         }

diff --git a/config/functions b/config/functions
index 0679938..eabc940 100755
--- a/config/functions
+++ b/config/functions
@@ -648,32 +648,35 @@ kill_tcp_connections ()

     get_tcp_connections_for_ip "$_ip" | {
         _killcount=0
-        _failed=false
-
-        while read dest src; do
-            echo "Killing TCP connection $src $dest"
-            ctdb killtcp $src $dest >/dev/null 2>&1 || _failed=true
-            _destport="${dest##*:}"
+        _connections=""
+        _nl="
+"
+        while read _dst _src; do
+            _destport="${_dst##*:}"
             __oneway=$_oneway
             case $_destport in
                 # we only do one-way killtcp for CIFS
                 139|445) __oneway=true ;;
             esac
+
+            echo "Killing TCP connection $_src $_dst"
+            _connections="${_connections}${_nl}${_src} ${_dst}"
             if ! $__oneway ; then
-                ctdb killtcp $dest $src >/dev/null 2>&1 || _failed=true
+                _connections="${_connections}${_nl}${_dst} ${_src}"
             fi

             _killcount=$(($_killcount + 1))
         done

-        if $_failed ; then
-            echo "Failed to send killtcp control"
-            return
-        fi
         if [ $_killcount -eq 0 ] ; then
             return
         fi

+        echo "$_connections" | ctdb killtcp || {
+            echo "Failed to send killtcp control"
+            return
+        }
+
         _count=0
         while : ; do
             if [ -z "$(get_tcp_connections_for_ip $_ip)" ] ; then
diff --git a/doc/ctdb.1.xml b/doc/ctdb.1.xml
index c854619..ebb9c8e 100644
--- a/doc/ctdb.1.xml
+++ b/doc/ctdb.1.xml
@@ -1,5 +1,7 @@
 <?xml version="1.0" encoding="iso-8859-1"?>
-<!DOCTYPE refentry PUBLIC "-//Samba-Team//DTD DocBook V4.2-Based Variant V1.0//EN" "http://www.samba.org/samba/DTD/samba-doc">
+<!DOCTYPE refentry
+	PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+	"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
 <refentry id="ctdb.1">

 <refmeta>
@@ -1014,11 +1016,13 @@ Reclock file:/gpfs/.ctdb/shared
       </para>
     </refsect2>

-    <refsect2><title>killtcp <srcip:port> <dstip:port></title>
+    <refsect2><title>killtcp [<srcip:port> <dstip:port>]</title>
       <para>
-        This command will kill the specified TCP connection by issuing a
-        TCP RST to the srcip:port endpoint. This is a command used by the
-        ctdb eventscripts.
+        This command will kill the specified TCP connections by
+        issuing a TCP RST to the srcip:port endpoint. A single
+        connection can be specified on the command-line, otherwise
+        connections are read one-per-line from standard input. This
+        is a command used by the ctdb eventscripts.
       </para>
     </refsect2>

diff --git a/doc/ctdbd.1.xml b/doc/ctdbd.1.xml
index 1053d9b..111a8f4 100644
--- a/doc/ctdbd.1.xml
+++ b/doc/ctdbd.1.xml
@@ -1,5 +1,7 @@
 <?xml version="1.0" encoding="iso-8859-1"?>
-<!DOCTYPE refentry PUBLIC "-//Samba-Team//DTD DocBook V4.2-Based Variant V1.0//EN" "http://www.samba.org/samba/DTD/samba-doc">
+<!DOCTYPE refentry
+	PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+	"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
 <refentry id="ctdbd.1">

 <refmeta>
diff --git a/doc/ltdbtool.1.xml b/doc/ltdbtool.1.xml
index a0379a6..fe9e3e8 100644
--- a/doc/ltdbtool.1.xml
+++ b/doc/ltdbtool.1.xml
@@ -1,5 +1,7 @@
 <?xml version="1.0" encoding="iso-8859-1"?>
-<!DOCTYPE refentry PUBLIC "-//Samba-Team//DTD DocBook V4.2-Based Variant V1.0//EN" "http://www.samba.org/samba/DTD/samba-doc">
+<!DOCTYPE refentry
+	PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+	"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
 <refentry id="ltdbtool.1">

 <refmeta>
diff --git a/doc/onnode.1.xml b/doc/onnode.1.xml
index 1b97c2f..65b1792 100644
--- a/doc/onnode.1.xml
+++ b/doc/onnode.1.xml
@@ -1,5 +1,7 @@
 <?xml version="1.0" encoding="iso-8859-1"?>
-<!DOCTYPE refentry PUBLIC "-//Samba-Team//DTD DocBook V4.2-Based Variant V1.0//EN" "http://www.samba.org/samba/DTD/samba-doc">
+<!DOCTYPE refentry
+	PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+	"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
 <refentry id="onnode.1">

 <refmeta>
diff --git a/doc/ping_pong.1.xml b/doc/ping_pong.1.xml
index f4148ae..2e4b016 100644
--- a/doc/ping_pong.1.xml
+++ b/doc/ping_pong.1.xml
@@ -1,5 +1,7 @@
 <?xml version="1.0" encoding="iso-8859-1"?>
-<!DOCTYPE refentry PUBLIC "-//Samba-Team//DTD DocBook V4.2-Based Variant V1.0//EN" "http://www.samba.org/samba/DTD/samba-doc">
+<!DOCTYPE refentry
+	PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+	"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
 <refentry id="ping_pong.1">

 <refmeta>
diff --git a/server/ctdb_daemon.c b/server/ctdb_daemon.c
index 0932157..644b5ed 100644
--- a/server/ctdb_daemon.c
+++ b/server/ctdb_daemon.c
@@ -47,6 +47,9 @@ static void print_exit_message(void)
 		DEBUG(DEBUG_NOTICE,("CTDB %s shutting down\n", debug_extra));
 	} else {
 		DEBUG(DEBUG_NOTICE,("CTDB daemon shutting down\n"));
+
+		/* Wait a second to allow pending log messages to be flushed */
+		sleep(1);
 	}
 }

@@ -976,23 +979,35 @@ static int ux_socket_bind(struct ctdb_context *ctdb)
 		return -1;
 	}

-	set_close_on_exec(ctdb->daemon.sd);
-	set_nonblocking(ctdb->daemon.sd);
-
 	memset(&addr, 0, sizeof(addr));
 	addr.sun_family = AF_UNIX;
 	strncpy(addr.sun_path, ctdb->daemon.name, sizeof(addr.sun_path));

+	/* First check if an old ctdbd might be running */
+	if (connect(ctdb->daemon.sd,
+		    (struct sockaddr *)&addr, sizeof(addr)) == 0) {
+		DEBUG(DEBUG_CRIT,
+		      ("Something is already listening on ctdb socket '%s'\n",
+		       ctdb->daemon.name));
+		goto failed;
+	}
+
+	/* Remove any old socket */
+	unlink(ctdb->daemon.name);
+
+	set_close_on_exec(ctdb->daemon.sd);
+	set_nonblocking(ctdb->daemon.sd);
+
 	if (bind(ctdb->daemon.sd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
 		DEBUG(DEBUG_CRIT,("Unable to bind on ctdb socket '%s'\n", ctdb->daemon.name));
 		goto failed;
-	}
+	}

 	if (chown(ctdb->daemon.name, geteuid(), getegid()) != 0 ||
 	    chmod(ctdb->daemon.name, 0700) != 0) {
 		DEBUG(DEBUG_CRIT,("Unable to secure ctdb socket '%s', ctdb->daemon.name\n", ctdb->daemon.name));
 		goto failed;
-	}
+	}


 	if (listen(ctdb->daemon.sd, 100) != 0) {
@@ -1139,13 +1154,10 @@ int ctdb_start_daemon(struct ctdb_context *ctdb, bool do_fork, bool use_syslog,
 	struct fd_event *fde;
 	const char *domain_socket_name;

-	/* get rid of any old sockets */
-	unlink(ctdb->daemon.name);
-
 	/* create a unix domain stream socket to listen to */
 	res = ux_socket_bind(ctdb);
 	if (res!=0) {
-		DEBUG(DEBUG_ALERT,(__location__ " Failed to open CTDB unix domain socket\n"));
+		DEBUG(DEBUG_ALERT,("Cannot continue. Exiting!\n"));
 		exit(10);
 	}

@@ -1171,6 +1183,12 @@ int ctdb_start_daemon(struct ctdb_context *ctdb, bool do_fork, bool use_syslog,
 		 CTDB_VERSION_STRING, ctdbd_pid));
 	ctdb_create_pidfile(ctdb->ctdbd_pid);

+	/* Make sure we log something when the daemon terminates.
+	 * This must be the first exit handler to run (so the last to
+	 * be registered.
+	 */
+	atexit(print_exit_message);
+
 	if (ctdb->do_setsched) {
 		/* try to set us up as realtime */
 		ctdb_set_scheduler(ctdb);
@@ -1283,10 +1301,6 @@ int ctdb_start_daemon(struct ctdb_context *ctdb, bool do_fork, bool use_syslog,
 		ctdb_release_all_ips(ctdb);
 	}

-
-	/* Make sure we log something when the daemon terminates */
-	atexit(print_exit_message);
-
 	/* Start the transport */
 	if (ctdb->methods->start(ctdb) != 0) {
 		DEBUG(DEBUG_ALERT,("transport failed to start!\n"));
diff --git a/server/ctdb_monitor.c b/server/ctdb_monitor.c
index 63eb9df..c23477d 100644
--- a/server/ctdb_monitor.c
+++ b/server/ctdb_monitor.c
@@ -480,7 +480,7 @@ int32_t ctdb_control_modflags(struct ctdb_context *ctdb, TDB_DATA indata)

 	DEBUG(DEBUG_INFO, ("Control modflags on node %u - flags now 0x%x\n", c->pnn, node->flags));

-	if (node->flags == 0 && ctdb->runstate == CTDB_RUNSTATE_STARTUP) {
+	if (node->flags == 0 && ctdb->runstate <= CTDB_RUNSTATE_STARTUP) {
 		DEBUG(DEBUG_ERR, (__location__ " Node %u became healthy - force recovery for startup\n",
 				  c->pnn));
 		ctdb->recovery_mode = CTDB_RECOVERY_ACTIVE;
diff --git a/server/ctdb_takeover.c b/server/ctdb_takeover.c
index be49b3f..82fecfc 100644
--- a/server/ctdb_takeover.c
+++ b/server/ctdb_takeover.c
@@ -861,7 +861,7 @@ static void release_kill_clients(struct ctdb_context *ctdb, ctdb_sock_addr *addr
 				(unsigned)client->pid,
 				ctdb_addr_to_str(addr),
 				ip->client_id));
-			ctdb_kill(ctdb, client->pid, SIGKILL);
+			kill(client->pid, SIGKILL);
 		}
 	}
 }
@@ -2622,6 +2622,40 @@ static void iprealloc_fail_callback(struct ctdb_context *ctdb, uint32_t pnn,
 	}
 }

+struct takeover_callback_data {
+	bool *node_failed;
+	client_async_callback fail_callback;
+	void *fail_callback_data;
+	struct ctdb_node_map *nodemap;
+};
+
+static void takeover_run_fail_callback(struct ctdb_context *ctdb,
+				       uint32_t node_pnn, int32_t res,
+				       TDB_DATA outdata, void *callback_data)
+{
+	struct takeover_callback_data *cd =
+		talloc_get_type_abort(callback_data,
+				      struct takeover_callback_data);
+	int i;
+
+	for (i = 0; i < cd->nodemap->num; i++) {
+		if (node_pnn == cd->nodemap->nodes[i].pnn) {
+			break;
+		}
+	}
+
+	if (i == cd->nodemap->num) {
+		DEBUG(DEBUG_ERR, (__location__ " invalid PNN %u\n", node_pnn));
+		return;
+	}
+
+	if (!cd->node_failed[i]) {
+		cd->node_failed[i] = true;
+		cd->fail_callback(ctdb, node_pnn, res, outdata,
+				  cd->fail_callback_data);
+	}
+}
+
 /*
   make any IP alias changes for public addresses that are necessary
  */
@@ -2640,6 +2674,7 @@ int ctdb_takeover_run(struct ctdb_context *ctdb, struct ctdb_node_map *nodemap,
 	TALLOC_CTX *tmp_ctx = talloc_new(ctdb);
 	uint32_t disable_timeout;
 	struct ctdb_ipflags *ipflags;
+	struct takeover_callback_data *takeover_data;
 	struct iprealloc_callback_data iprealloc_data;
 	bool *retry_data;

@@ -2683,11 +2718,21 @@ int ctdb_takeover_run(struct ctdb_context *ctdb, struct ctdb_node_map *nodemap,

 	/* now tell all nodes to delete any alias that they should not have.
 	   This will be a NOOP on nodes that don't currently hold the given alias */
+	takeover_data = talloc_zero(tmp_ctx, struct takeover_callback_data);
+	CTDB_NO_MEMORY_FATAL(ctdb, takeover_data);
+
+	takeover_data->node_failed = talloc_zero_array(tmp_ctx,
+						       bool, nodemap->num);
+	CTDB_NO_MEMORY_FATAL(ctdb, takeover_data->node_failed);
+	takeover_data->fail_callback = fail_callback;
+	takeover_data->fail_callback_data = callback_data;
+	takeover_data->nodemap = nodemap;
+
 	async_data = talloc_zero(tmp_ctx, struct client_async_data);
 	CTDB_NO_MEMORY_FATAL(ctdb, async_data);

-	async_data->fail_callback = fail_callback;
-	async_data->callback_data = callback_data;
+	async_data->fail_callback = takeover_run_fail_callback;
+	async_data->callback_data = takeover_data;

 	for (i=0;i<nodemap->num;i++) {
 		/* don't talk to unconnected nodes, but do talk to banned nodes */
diff --git a/server/eventscript.c b/server/eventscript.c
index 10d426f..c255e17 100644
--- a/server/eventscript.c
+++ b/server/eventscript.c
@@ -548,8 +548,9 @@ static void ctdb_run_debug_hung_script(struct ctdb_context *ctdb, struct ctdb_ev
 		debug_hung_script = getenv("CTDB_DEBUG_HUNG_SCRIPT");
 	}

-	buf = talloc_asprintf(NULL, "%s %d",
-			      debug_hung_script, state->child);
+	buf = talloc_asprintf(NULL, "%s %d %s",
+			      debug_hung_script, state->child,
+			      ctdb_eventscript_call_names[state->call]);
 	system(buf);
 	talloc_free(buf);

diff --git a/tests/complex/43_failover_nfs_basic.sh b/tests/complex/43_failover_nfs_basic.sh
index 71a8229..a68f7db 100755
--- a/tests/complex/43_failover_nfs_basic.sh
+++ b/tests/complex/43_failover_nfs_basic.sh
@@ -49,22 +49,11 @@ cluster_is_healthy
 # Reset configuration
 ctdb_restart_when_done

-select_test_node_and_ips
-
-first_export=$(showmount -e $test_ip | sed -n -e '2s/ .*//p')
-mnt_d=$(mktemp -d)
-test_file="${mnt_d}/$RANDOM"
-
-ctdb_test_exit_hook_add rm -f "$test_file"
-ctdb_test_exit_hook_add umount -f "$mnt_d"
-ctdb_test_exit_hook_add rmdir "$mnt_d"
-
-echo "Mounting ${test_ip}:${first_export} on ${mnt_d} ..."
-mount -o timeo=1,hard,intr,vers=3 ${test_ip}:${first_export} ${mnt_d}
+nfs_test_setup

 echo "Create file containing random data..."
-dd if=/dev/urandom of=$test_file bs=1k count=1
-original_sum=$(sum $test_file)
+dd if=/dev/urandom of=$nfs_local_file bs=1k count=1
+original_sum=$(sum $nfs_local_file)
 [ $? -eq 0 ]

 gratarp_sniff_start
@@ -75,7 +64,7 @@ wait_until_node_has_status $test_node disabled

 gratarp_sniff_wait_show

-new_sum=$(sum $test_file)
+new_sum=$(sum $nfs_local_file)
 [ $? -eq 0 ]

 if [ "$original_md5" = "$new_md5" ] ; then
diff --git a/tests/complex/44_failover_nfs_oneway.sh b/tests/complex/44_failover_nfs_oneway.sh
index 7da8d01..aaec2ed 100755
--- a/tests/complex/44_failover_nfs_oneway.sh
+++ b/tests/complex/44_failover_nfs_oneway.sh
@@ -51,31 +51,18 @@ cluster_is_healthy
 # Reset configuration
 ctdb_restart_when_done

-select_test_node_and_ips
+nfs_test_setup

-first_export=$(showmount -e $test_ip | sed -n -e '2s/ .*//p')
+echo "Create file containing random data..."
 local_f=$(mktemp)
-mnt_d=$(mktemp -d)
-nfs_f="${mnt_d}/$RANDOM"
-remote_f="${test_ip}:${first_export}/$(basename $nfs_f)"
-
 ctdb_test_exit_hook_add rm -f "$local_f"
-ctdb_test_exit_hook_add rm -f "$nfs_f"
-ctdb_test_exit_hook_add umount -f "$mnt_d"
-ctdb_test_exit_hook_add rmdir "$mnt_d"
-
-echo "Create file containing random data..."
 dd if=/dev/urandom of=$local_f bs=1k count=1
-chmod 644 "$local_f" # needed for *_squash?
 local_sum=$(sum $local_f)
-[ $? -eq 0 ]
-
-scp -p "$local_f" "$remote_f"

-echo "Mounting ${test_ip}:${first_export} on ${mnt_d} ..."
-mount -o timeo=1,hard,intr,vers=3 ${test_ip}:${first_export} ${mnt_d}
+scp -p "$local_f" "${test_ip}:${nfs_remote_file}"
+try_command_on_node $test_node "chmod 644 $nfs_remote_file"

-nfs_sum=$(sum $nfs_f)
+nfs_sum=$(sum $nfs_local_file)

 if [ "$local_sum" = "$nfs_sum" ] ; then
     echo "GOOD: file contents read correctly via NFS"
@@ -94,7 +81,7 @@ wait_until_node_has_status $test_node disabled

 gratarp_sniff_wait_show

-new_sum=$(sum $nfs_f)
+new_sum=$(sum $nfs_local_file)
 [ $? -eq 0 ]


-- 
CTDB repository
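
Illustrative note on the kill_tcp_connections() change: the log above describes collecting all connections first and handing them to a single "ctdb killtcp" on standard input, one per line, instead of running one command per connection. A minimal sketch of that batching pattern follows. It is not the code from config/functions: killtcp_stub is a hypothetical stand-in for "ctdb killtcp" so the example runs without a cluster, build_connection_list is a made-up helper name, and the addresses are invented.

```shell
#!/bin/sh

# Hypothetical stand-in for "ctdb killtcp": it only counts the
# "src dst" lines it receives on standard input.
killtcp_stub ()
{
    _n=0
    while read _src _dst ; do
        [ -n "$_src" ] || continue    # ignore blank lines
        _n=$(($_n + 1))
    done
    echo "killtcp_stub: received $_n connection(s)"
}

# Hypothetical helper: read "dst src" pairs on stdin and print one
# "src dst" line per connection, plus the reverse direction unless
# $1 is "true" (one-way).
build_connection_list ()
{
    _oneway="$1"
    while read _dst _src ; do
        echo "$_src $_dst"
        if ! $_oneway ; then
            echo "$_dst $_src"
        fi
    done
}

# Two invented connections, killed in both directions.
batch=$(printf '%s\n' \
    "192.0.2.1:445 192.0.2.9:50000" \
    "192.0.2.1:2049 192.0.2.9:50001" | build_connection_list false)

# Only invoke the command once, and not at all if there is nothing to do.
if [ -n "$batch" ] ; then
    echo "$batch" | killtcp_stub
fi
```

The point of the pattern is that the cost of setting up a client connection is paid once per batch rather than once per TCP connection, and an empty batch costs nothing.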