The branch, master has been updated
via f9d4cb4 ctdb-recoverd: Unify takeover run triggering code in main loop
via e3e4f37 ctdb-recoverd: Add early return in srvid_requests_reply()
via ebbeab7 ctdb-recoverd: Drop an unnecessary log message
via 2a93b84 ctdb-recoverd: Move takeover run checks after recover checks
via 662f06d ctdb-recoverd: Drop explicit check to flag takeover run needed
via 4331306 ctdb-takeover: Do not set node unhealthy when "takeip" fails
via 9dc3b11 ctdb-takeover: Recovery daemon no longer passes fail callback
via 1e9f650 ctdb-takeover: Only apply banning credits to the worst offender
via 1c60694 ctdb-takeover: Count takeover run failures
via 0053b85 ctdb-takeover: Send banning credit messages from fail callback
via db9ec11 ctdb-takeover: Have the takeover fail callback log a message
via 1f0263c ctdb-takeover: Use the takeover_run_fail_callback() in more cases
via 06ad171 ctdb-takeover: New function takeover_callback_data_init()
via a44c099 ctdb-takeover: Takeover callback data doesn't need a node map
via d61a75f ctdb-takeover: PNN can be used to index into node map
via 9056b43 ctdb-takeover: Drop ipreallocated fallback code
from d36e693 travis: run the samba-o3 target
https://git.samba.org/?p=samba.git;a=shortlog;h=master
- Log -----------------------------------------------------------------
commit f9d4cb4c291ea4364cd789ec88cae7cc55e95313
Author: Martin Schwenke <[email protected]>
Date: Tue May 3 16:07:34 2016 +1000
ctdb-recoverd: Unify takeover run triggering code in main loop
Signed-off-by: Martin Schwenke <[email protected]>
Reviewed-by: Amitay Isaacs <[email protected]>
Autobuild-User(master): Amitay Isaacs <[email protected]>
Autobuild-Date(master): Fri May 13 17:15:57 CEST 2016 on sn-devel-144
commit e3e4f37c4179ded8750646c61bb61856c534aa0a
Author: Martin Schwenke <[email protected]>
Date: Tue May 3 15:56:09 2016 +1000
ctdb-recoverd: Add early return in srvid_requests_reply()
Signed-off-by: Martin Schwenke <[email protected]>
Reviewed-by: Amitay Isaacs <[email protected]>
commit ebbeab74ed3ddcd9edf7e9fb70999df89c566538
Author: Martin Schwenke <[email protected]>
Date: Tue May 3 16:02:38 2016 +1000
ctdb-recoverd: Drop an unnecessary log message
do_takeover_run() will log something at NOTICE level anyway.
Signed-off-by: Martin Schwenke <[email protected]>
Reviewed-by: Amitay Isaacs <[email protected]>
commit 2a93b8423be9d4a204a09ed4125977208b3e4dfe
Author: Martin Schwenke <[email protected]>
Date: Tue May 3 16:00:02 2016 +1000
ctdb-recoverd: Move takeover run checks after recover checks
If a recovery is going to be done then this will be followed by a
takeover run anyway. So, there's no use doing the takeover run
checks, potentially doing a takeover run and then doing a recovery.
Signed-off-by: Martin Schwenke <[email protected]>
Reviewed-by: Amitay Isaacs <[email protected]>
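The ordering described in commit 2a93b84 can be illustrated with a minimal sketch. It is not the recovery daemon's code: struct loop_state, enum loop_action and next_action() are hypothetical names that reduce the main loop to the decision it makes, assuming a recovery is always followed by its own takeover run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the state the main loop consults. */
struct loop_state {
	bool recovery_disabled;  /* recoveries are currently disabled */
	bool recovery_needed;    /* nodemap/flags checks want a recovery */
	bool takeover_disabled;  /* takeover runs are currently disabled */
	bool takeover_pending;   /* requests queued or previous run failed */
};

enum loop_action { ACTION_NONE, ACTION_RECOVERY, ACTION_TAKEOVER_RUN };

/* Decide what one pass of the loop should do. */
enum loop_action next_action(const struct loop_state *s)
{
	assert(s != NULL);

	/* Recovery checks come first: a recovery ends with a takeover
	 * run anyway, so doing a separate one beforehand is wasted. */
	if (!s->recovery_disabled && s->recovery_needed) {
		return ACTION_RECOVERY;
	}

	/* Only reached when no recovery is going to happen. */
	if (!s->takeover_disabled && s->takeover_pending) {
		return ACTION_TAKEOVER_RUN;
	}

	return ACTION_NONE;
}
```

When recoveries are disabled the sketch still falls through to the takeover check, which mirrors the "goto takeover_run_checks" change in the diff below.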
commit 662f06de9fdce7b1bc1772a4fbe43de271564917
Author: Martin Schwenke <[email protected]>
Date: Tue May 3 11:52:48 2016 +1000
ctdb-recoverd: Drop explicit check to flag takeover run needed
The recovery daemon should be less involved in the service monitoring
logic.
The cases handled here are already handled elsewhere:
* When a node becomes unhealthy/healthy the monitoring code will
trigger a takeover run
* When a node is disabled/enabled the ctdb CLI tool will trigger a
takeover run
Signed-off-by: Martin Schwenke <[email protected]>
Reviewed-by: Amitay Isaacs <[email protected]>
commit 4331306fceb343894eb24385922dda8b3606f71c
Author: Martin Schwenke <[email protected]>
Date: Tue May 3 14:18:02 2016 +1000
ctdb-takeover: Do not set node unhealthy when "takeip" fails
It will just become healthy again in the next monitor cycle.
Instead, let the recovery master ban it if the problem persists.
Signed-off-by: Martin Schwenke <[email protected]>
Reviewed-by: Amitay Isaacs <[email protected]>
commit 9dc3b117e2cb6849f5ea8f2944e39e79ca1eb0d3
Author: Martin Schwenke <[email protected]>
Date: Tue May 3 15:35:08 2016 +1000
ctdb-takeover: Recovery daemon no longer passes fail callback
Banning is now handled by the takeover code sending banning credit
messages.
This commit makes a change in behaviour quite obvious. Takeover runs
were initiated from several locations in the code but banning was only
done from one of these locations. Now banning can be done from any
failed takeover run.
Signed-off-by: Martin Schwenke <[email protected]>
Reviewed-by: Amitay Isaacs <[email protected]>
commit 1e9f650382c25ba2c9b9d41c90fcce7372f100e3
Author: Martin Schwenke <[email protected]>
Date: Thu May 5 15:53:48 2016 +1000
ctdb-takeover: Only apply banning credits to the worst offender
Post-process failures and only send banning credits to the node with
the most failures.
If there is a widespread problem or a problem on the recovery master
node then this should help avoid banning all the nodes.
Signed-off-by: Martin Schwenke <[email protected]>
Reviewed-by: Amitay Isaacs <[email protected]>
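The selection step described in commit 1e9f650 can be sketched in isolation. This is a standalone illustration modelled on the loop in takeover_run_process_failures() shown in the diff below; worst_offender() and NO_OFFENDER are hypothetical names, and the message-sending part is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "no node failed". */
#define NO_OFFENDER UINT32_MAX

/* Return the PNN with the highest failure count, or NO_OFFENDER if
 * every count is zero. The highest count is stored in *max_fails_out
 * when that pointer is not NULL. */
uint32_t worst_offender(const unsigned int *fail_count, uint32_t num_nodes,
			unsigned int *max_fails_out)
{
	unsigned int max_fails = 0;
	uint32_t max_pnn = NO_OFFENDER;
	uint32_t i;

	assert(fail_count != NULL || num_nodes == 0);

	for (i = 0; i < num_nodes; i++) {
		/* Strict comparison: on a tie the lowest PNN wins. */
		if (fail_count[i] > max_fails) {
			max_pnn = i;
			max_fails = fail_count[i];
		}
	}

	if (max_fails_out != NULL) {
		*max_fails_out = max_fails;
	}
	return max_pnn;
}
```

Because only one PNN is returned, a failure that affects every node results in at most one node receiving banning credits per takeover run.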
commit 1c60694e53e8c904c3f5ab0e215579b8d339f910
Author: Martin Schwenke <[email protected]>
Date: Thu May 5 15:36:12 2016 +1000
ctdb-takeover: Count takeover run failures
This will allow banning credit assignments to be limited according to
some criteria.
Note that this only matters when multiple controls are sent to each
node: RELEASE_IP and TAKEOVER_IP. This doesn't change the behaviour
for IPREALLOCATED.
Signed-off-by: Martin Schwenke <[email protected]>
Reviewed-by: Amitay Isaacs <[email protected]>
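The per-node counting described in commit 1c60694 can be sketched as follows. This is an illustration only: struct fail_counter and its helper functions are hypothetical, and plain calloc() stands in for the talloc allocation used by the real code in the diff below.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the takeover callback data. */
struct fail_counter {
	uint32_t num_nodes;
	unsigned int *fail_count;  /* indexed by PNN */
};

/* Allocate zeroed counters; returns 0 on success, -1 on failure. */
int fail_counter_init(struct fail_counter *fc, uint32_t num_nodes)
{
	assert(fc != NULL);

	fc->fail_count = calloc(num_nodes, sizeof(unsigned int));
	if (fc->fail_count == NULL && num_nodes > 0) {
		return -1;
	}
	fc->num_nodes = num_nodes;
	return 0;
}

/* Record one failed control for a node. Returns -1 for an invalid
 * PNN, in which case nothing is counted. */
int fail_counter_record(struct fail_counter *fc, uint32_t pnn)
{
	assert(fc != NULL);

	if (pnn >= fc->num_nodes) {
		return -1;
	}
	fc->fail_count[pnn]++;
	return 0;
}

void fail_counter_free(struct fail_counter *fc)
{
	assert(fc != NULL);

	free(fc->fail_count);
	fc->fail_count = NULL;
	fc->num_nodes = 0;
}
```

A node that fails both a RELEASE_IP and a TAKEOVER_IP control in the same run would end up with a count of 2 here, which is the case the commit message says the counting matters for.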
commit 0053b85fc63dbe1386d1053cfde0301135eba0a3
Author: Martin Schwenke <[email protected]>
Date: Tue May 3 15:09:25 2016 +1000
ctdb-takeover: Send banning credit messages from fail callback
Banning credits are now assigned by takeover runs called from all
locations in the recovery daemon. Previously this only happened from
one of the callers. When separating out the takeover run code the
behaviour should be consistent.
The callback (and corresponding data) passed to ctdb_takeover_run() is
now ignored. Dropping this will allow the interface between the
recovery daemon and IP takeover to be simplified.
Signed-off-by: Martin Schwenke <[email protected]>
Reviewed-by: Amitay Isaacs <[email protected]>
commit db9ec11b1abfa9e1690b4cfcfa2ad12da6ce9091
Author: Martin Schwenke <[email protected]>
Date: Tue May 3 12:54:10 2016 +1000
ctdb-takeover: Have the takeover fail callback log a message
Signed-off-by: Martin Schwenke <[email protected]>
Reviewed-by: Amitay Isaacs <[email protected]>
commit 1f0263c6d432039f7ac3b01029dcd91350829b93
Author: Martin Schwenke <[email protected]>
Date: Tue May 3 12:52:36 2016 +1000
ctdb-takeover: Use the takeover_run_fail_callback() in more cases
Probably due to oversight, this is currently only used for the
"takeip" step.
This does consistent error handling and provides a layer of
indirection to the passed callback, so use it for "releaseip" and
"ipreallocated" steps too.
The callback data now needs to be initialised before the first
possible jump to "ipreallocated".
Signed-off-by: Martin Schwenke <[email protected]>
Reviewed-by: Amitay Isaacs <[email protected]>
commit 06ad1711cfb8945a579eae8f042d75f2876034e3
Author: Martin Schwenke <[email protected]>
Date: Sat May 7 16:10:48 2016 +1000
ctdb-takeover: New function takeover_callback_data_init()
Abstract out the initialisation of the callback data. Later, we'll
need to do it multiple times or move it.
Signed-off-by: Martin Schwenke <[email protected]>
Reviewed-by: Amitay Isaacs <[email protected]>
commit a44c099e421d8c2e6289ec39c6398d8b807c6ca3
Author: Martin Schwenke <[email protected]>
Date: Tue May 3 15:29:18 2016 +1000
ctdb-takeover: Takeover callback data doesn't need a node map
It just needs to know the number of nodes.
Signed-off-by: Martin Schwenke <[email protected]>
Reviewed-by: Amitay Isaacs <[email protected]>
commit d61a75fd677e75586c67754887c3c43ba9eafce3
Author: Martin Schwenke <[email protected]>
Date: Tue May 3 12:46:16 2016 +1000
ctdb-takeover: PNN can be used to index into node map
Signed-off-by: Martin Schwenke <[email protected]>
Reviewed-by: Amitay Isaacs <[email protected]>
commit 9056b43b96bc77c42b82fcebb22c6e0ede57ec72
Author: Martin Schwenke <[email protected]>
Date: Tue May 3 14:14:53 2016 +1000
ctdb-takeover: Drop ipreallocated fallback code
The ipreallocated control has been in CTDB for a long time.
Signed-off-by: Martin Schwenke <[email protected]>
Reviewed-by: Amitay Isaacs <[email protected]>
-----------------------------------------------------------------------
Summary of changes:
ctdb/include/ctdb_private.h | 3 +-
ctdb/server/ctdb_recoverd.c | 80 +++------------
ctdb/server/ctdb_takeover.c | 233 ++++++++++++++++++--------------------------
3 files changed, 110 insertions(+), 206 deletions(-)
Changeset truncated at 500 lines:
diff --git a/ctdb/include/ctdb_private.h b/ctdb/include/ctdb_private.h
index f8889e0..a2f6dfc 100644
--- a/ctdb/include/ctdb_private.h
+++ b/ctdb/include/ctdb_private.h
@@ -975,8 +975,7 @@ int32_t ctdb_control_ipreallocated(struct ctdb_context
*ctdb,
int ctdb_set_public_addresses(struct ctdb_context *ctdb, bool check_addresses);
int ctdb_takeover_run(struct ctdb_context *ctdb, struct ctdb_node_map_old
*nodemap,
- uint32_t *force_rebalance_nodes,
- client_async_callback fail_callback, void *callback_data);
+ uint32_t *force_rebalance_nodes);
int32_t ctdb_control_tcp_client(struct ctdb_context *ctdb, uint32_t client_id,
TDB_DATA indata);
diff --git a/ctdb/server/ctdb_recoverd.c b/ctdb/server/ctdb_recoverd.c
index f3fea02..09940dc 100644
--- a/ctdb/server/ctdb_recoverd.c
+++ b/ctdb/server/ctdb_recoverd.c
@@ -84,6 +84,10 @@ static void srvid_requests_reply(struct ctdb_context *ctdb,
{
struct srvid_list *r;
+ if (*requests == NULL) {
+ return;
+ }
+
for (r = (*requests)->requests; r != NULL; r = r->next) {
srvid_request_reply(ctdb, r->request, result);
}
@@ -1644,25 +1648,6 @@ static int sync_recovery_lock_file_across_cluster(struct
ctdb_recoverd *rec)
return 0;
}
-
-/*
- * this callback is called for every node that failed to execute
ctdb_takeover_run()
- * and set flag to re-run takeover run.
- */
-static void takeover_fail_callback(struct ctdb_context *ctdb, uint32_t
node_pnn, int32_t res, TDB_DATA outdata, void *callback_data)
-{
- DEBUG(DEBUG_ERR, ("Node %u failed the takeover run\n", node_pnn));
-
- if (callback_data != NULL) {
- struct ctdb_recoverd *rec = talloc_get_type(callback_data,
struct ctdb_recoverd);
-
- DEBUG(DEBUG_ERR, ("Setting node %u as recovery fail culprit\n",
node_pnn));
-
- ctdb_set_culprit(rec, node_pnn);
- }
-}
-
-
static void ban_misbehaving_nodes(struct ctdb_recoverd *rec, bool *self_ban)
{
struct ctdb_context *ctdb = rec->ctdb;
@@ -1693,8 +1678,7 @@ static void ban_misbehaving_nodes(struct ctdb_recoverd
*rec, bool *self_ban)
}
static bool do_takeover_run(struct ctdb_recoverd *rec,
- struct ctdb_node_map_old *nodemap,
- bool banning_credits_on_fail)
+ struct ctdb_node_map_old *nodemap)
{
uint32_t *nodes = NULL;
struct ctdb_disable_message dtr;
@@ -1747,9 +1731,7 @@ static bool do_takeover_run(struct ctdb_recoverd *rec,
}
ret = ctdb_takeover_run(rec->ctdb, nodemap,
- rec->force_rebalance_nodes,
- takeover_fail_callback,
- banning_credits_on_fail ? rec : NULL);
+ rec->force_rebalance_nodes);
/* Reenable takeover runs and IP checks on other nodes */
dtr.timeout = 0;
@@ -2226,7 +2208,7 @@ static int do_recovery(struct ctdb_recoverd *rec,
goto fail;
}
- do_takeover_run(rec, nodemap, false);
+ do_takeover_run(rec, nodemap);
/* execute the "recovered" event script on all nodes */
ret = run_recovered_eventscript(rec, nodemap, "do_recovery");
@@ -2676,8 +2658,6 @@ static void process_ipreallocate_requests(struct
ctdb_context *ctdb,
int32_t ret;
struct srvid_requests *current;
- DEBUG(DEBUG_INFO, ("recovery master forced ip reallocation\n"));
-
/* Only process requests that are currently pending. More
* might come in while the takeover run is in progress and
* they will need to be processed later since they might
@@ -2686,7 +2666,7 @@ static void process_ipreallocate_requests(struct
ctdb_context *ctdb,
current = rec->reallocate_requests;
rec->reallocate_requests = NULL;
- if (do_takeover_run(rec, rec->nodemap, false)) {
+ if (do_takeover_run(rec, rec->nodemap)) {
ret = ctdb_get_pnn(ctdb);
} else {
ret = -1;
@@ -2836,7 +2816,6 @@ static void monitor_handler(uint64_t srvid, TDB_DATA
data, void *private_data)
struct ctdb_node_map_old *nodemap=NULL;
TALLOC_CTX *tmp_ctx;
int i;
- int disabled_flag_changed;
if (data.dsize != sizeof(*c)) {
DEBUG(DEBUG_ERR,(__location__ "Invalid data in
ctdb_node_flag_change\n"));
@@ -2868,28 +2847,8 @@ static void monitor_handler(uint64_t srvid, TDB_DATA
data, void *private_data)
DEBUG(DEBUG_NOTICE,("Node %u has changed flags - now 0x%x was
0x%x\n", c->pnn, c->new_flags, c->old_flags));
}
- disabled_flag_changed = (nodemap->nodes[i].flags ^ c->new_flags) &
NODE_FLAGS_DISABLED;
-
nodemap->nodes[i].flags = c->new_flags;
- ret = ctdb_ctrl_getrecmode(ctdb, tmp_ctx, CONTROL_TIMEOUT(),
- CTDB_CURRENT_NODE, &ctdb->recovery_mode);
-
- if (ret == 0 &&
- rec->recmaster == ctdb->pnn &&
- ctdb->recovery_mode == CTDB_RECOVERY_NORMAL) {
- /* Only do the takeover run if the perm disabled or unhealthy
- flags changed since these will cause an ip failover but not
- a recovery.
- If the node became disconnected or banned this will also
- lead to an ip address failover but that is handled
- during recovery
- */
- if (disabled_flag_changed) {
- rec->need_takeover_run = true;
- }
- }
-
talloc_free(tmp_ctx);
}
@@ -3696,18 +3655,12 @@ static void main_loop(struct ctdb_context *ctdb, struct
ctdb_recoverd *rec,
}
- /* if there are takeovers requested, perform it and notify the waiters
*/
- if (!ctdb_op_is_disabled(rec->takeover_run) &&
- rec->reallocate_requests) {
- process_ipreallocate_requests(ctdb, rec);
- }
-
/* If recoveries are disabled then there is no use doing any
* nodemap or flags checks. Recoveries might be disabled due
* to "reloadnodes", so doing these checks might cause an
* unnecessary recovery. */
if (ctdb_op_is_disabled(rec->recovery)) {
- return;
+ goto takeover_run_checks;
}
/* get the nodemap for all active remote nodes
@@ -3916,14 +3869,13 @@ static void main_loop(struct ctdb_context *ctdb, struct
ctdb_recoverd *rec,
}
}
- /* we might need to change who has what IP assigned */
- if (rec->need_takeover_run) {
- /* If takeover run fails, then the offending nodes are
- * assigned ban culprit counts. And we re-try takeover.
- * If takeover run fails repeatedly, the node would get
- * banned.
- */
- do_takeover_run(rec, nodemap, true);
+takeover_run_checks:
+
+ /* If there are IP takeover runs requested or the previous one
+ * failed then perform one and notify the waiters */
+ if (!ctdb_op_is_disabled(rec->takeover_run) &&
+ (rec->reallocate_requests || rec->need_takeover_run)) {
+ process_ipreallocate_requests(ctdb, rec);
}
}
diff --git a/ctdb/server/ctdb_takeover.c b/ctdb/server/ctdb_takeover.c
index 5001489..cb431b3 100644
--- a/ctdb/server/ctdb_takeover.c
+++ b/ctdb/server/ctdb_takeover.c
@@ -423,8 +423,6 @@ static void ctdb_do_takeip_callback(struct ctdb_context
*ctdb, int status,
TDB_DATA data;
if (status != 0) {
- struct ctdb_node *node = ctdb->nodes[ctdb->pnn];
-
if (status == -ETIME) {
ctdb_ban_self(ctdb);
}
@@ -433,7 +431,6 @@ static void ctdb_do_takeip_callback(struct ctdb_context
*ctdb, int status,
ctdb_vnn_iface_string(state->vnn)));
ctdb_request_control_reply(ctdb, state->c, NULL, status, NULL);
- node->flags |= NODE_FLAGS_UNHEALTHY;
talloc_free(state);
return;
}
@@ -1552,74 +1549,35 @@ fail:
return NULL;
}
-struct iprealloc_callback_data {
- bool *retry_nodes;
- int retry_count;
- client_async_callback fail_callback;
- void *fail_callback_data;
- struct ctdb_node_map_old *nodemap;
+struct takeover_callback_data {
+ uint32_t num_nodes;
+ unsigned int *fail_count;
};
-static void iprealloc_fail_callback(struct ctdb_context *ctdb, uint32_t pnn,
- int32_t res, TDB_DATA outdata,
- void *callback)
+static struct takeover_callback_data *
+takeover_callback_data_init(TALLOC_CTX *mem_ctx,
+ uint32_t num_nodes)
{
- int numnodes;
- struct iprealloc_callback_data *cd =
- (struct iprealloc_callback_data *)callback;
+ static struct takeover_callback_data *takeover_data;
- numnodes = talloc_array_length(cd->retry_nodes);
- if (pnn > numnodes) {
- DEBUG(DEBUG_ERR,
- ("ipreallocated failure from node %d, "
- "but only %d nodes in nodemap\n",
- pnn, numnodes));
- return;
+ takeover_data = talloc_zero(mem_ctx, struct takeover_callback_data);
+ if (takeover_data == NULL) {
+ DEBUG(DEBUG_ERR, (__location__ " out of memory\n"));
+ return NULL;
}
- /* Can't run the "ipreallocated" event on a INACTIVE node */
- if (cd->nodemap->nodes[pnn].flags & NODE_FLAGS_INACTIVE) {
- DEBUG(DEBUG_WARNING,
- ("ipreallocated failed on inactive node %d, ignoring\n",
- pnn));
- return;
+ takeover_data->fail_count = talloc_zero_array(takeover_data,
+ unsigned int, num_nodes);
+ if (takeover_data->fail_count == NULL) {
+ DEBUG(DEBUG_ERR, (__location__ " out of memory\n"));
+ talloc_free(takeover_data);
+ return NULL;
}
- switch (res) {
- case -ETIME:
- /* If the control timed out then that's a real error,
- * so call the real fail callback
- */
- if (cd->fail_callback) {
- cd->fail_callback(ctdb, pnn, res, outdata,
- cd->fail_callback_data);
- } else {
- DEBUG(DEBUG_WARNING,
- ("iprealloc timed out but no callback
registered\n"));
- }
- break;
- default:
- /* If not a timeout then either the ipreallocated
- * eventscript (or some setup) failed. This might
- * have failed because the IPREALLOCATED control isn't
- * implemented - right now there is no way of knowing
- * because the error codes are all folded down to -1.
- * Consider retrying using EVENTSCRIPT control...
- */
- DEBUG(DEBUG_WARNING,
- ("ipreallocated failure from node %d, flagging retry\n",
- pnn));
- cd->retry_nodes[pnn] = true;
- cd->retry_count++;
- }
-}
+ takeover_data->num_nodes = num_nodes;
-struct takeover_callback_data {
- bool *node_failed;
- client_async_callback fail_callback;
- void *fail_callback_data;
- struct ctdb_node_map_old *nodemap;
-};
+ return takeover_data;
+}
static void takeover_run_fail_callback(struct ctdb_context *ctdb,
uint32_t node_pnn, int32_t res,
@@ -1628,23 +1586,53 @@ static void takeover_run_fail_callback(struct
ctdb_context *ctdb,
struct takeover_callback_data *cd =
talloc_get_type_abort(callback_data,
struct takeover_callback_data);
- int i;
- for (i = 0; i < cd->nodemap->num; i++) {
- if (node_pnn == cd->nodemap->nodes[i].pnn) {
- break;
- }
- }
-
- if (i == cd->nodemap->num) {
+ if (node_pnn >= cd->num_nodes) {
DEBUG(DEBUG_ERR, (__location__ " invalid PNN %u\n", node_pnn));
return;
}
- if (!cd->node_failed[i]) {
- cd->node_failed[i] = true;
- cd->fail_callback(ctdb, node_pnn, res, outdata,
- cd->fail_callback_data);
+ if (cd->fail_count[node_pnn] == 0) {
+ DEBUG(DEBUG_ERR,
+ ("Node %u failed the takeover run\n", node_pnn));
+ }
+
+ cd->fail_count[node_pnn]++;
+}
+
+static void takeover_run_process_failures(struct ctdb_context *ctdb,
+ struct takeover_callback_data *tcd)
+{
+ unsigned int max_fails = 0;
+ uint32_t max_pnn = -1;
+ uint32_t i;
+
+ for (i = 0; i < tcd->num_nodes; i++) {
+ if (tcd->fail_count[i] > max_fails) {
+ max_pnn = i;
+ max_fails = tcd->fail_count[i];
+ }
+ }
+
+ if (max_fails > 0) {
+ int ret;
+ TDB_DATA data;
+
+ DEBUG(DEBUG_ERR,
+ ("Sending banning credits to %u with fail count %u\n",
+ max_pnn, max_fails));
+
+ data.dptr = (uint8_t *)&max_pnn;
+ data.dsize = sizeof(uint32_t);
+ ret = ctdb_client_send_message(ctdb,
+ CTDB_BROADCAST_CONNECTED,
+ CTDB_SRVID_BANNING,
+ data);
+ if (ret != 0) {
+ DEBUG(DEBUG_ERR,
+ ("Failed to set banning credits for node %u\n",
+ max_pnn));
+ }
}
}
@@ -1677,10 +1665,9 @@ static void takeover_run_fail_callback(struct
ctdb_context *ctdb,
* - Send IPREALLOCATED to all nodes (with backward compatibility hack)
*/
int ctdb_takeover_run(struct ctdb_context *ctdb, struct ctdb_node_map_old
*nodemap,
- uint32_t *force_rebalance_nodes,
- client_async_callback fail_callback, void *callback_data)
+ uint32_t *force_rebalance_nodes)
{
- int i, j, ret;
+ int i, ret;
struct ctdb_public_ip ip;
uint32_t *nodes;
struct public_ip_list *all_ips, *tmp_ip;
@@ -1691,10 +1678,19 @@ int ctdb_takeover_run(struct ctdb_context *ctdb, struct
ctdb_node_map_old *nodem
TALLOC_CTX *tmp_ctx = talloc_new(ctdb);
struct ipalloc_state *ipalloc_state;
struct takeover_callback_data *takeover_data;
- struct iprealloc_callback_data iprealloc_data;
- bool *retry_data;
bool can_host_ips;
+ /* Initialise fail callback data to be used with
+ * takeover_run_fail_callback(). A failure in any of the
+ * following steps will cause an early return, so this can be
+ * reused for each of those steps without re-initialising. */
+ takeover_data = takeover_callback_data_init(tmp_ctx,
+ nodemap->num);
+ if (takeover_data == NULL) {
+ talloc_free(tmp_ctx);
+ return -1;
+ }
+
/*
* ip failover is completely disabled, just send out the
* ipreallocated event.
@@ -1754,16 +1750,6 @@ int ctdb_takeover_run(struct ctdb_context *ctdb, struct
ctdb_node_map_old *nodem
* host. This will be a NOOP on nodes that don't currently
* hold the given IP.
*/
- takeover_data = talloc_zero(tmp_ctx, struct takeover_callback_data);
- CTDB_NO_MEMORY_FATAL(ctdb, takeover_data);
-
- takeover_data->node_failed = talloc_zero_array(tmp_ctx,
- bool, nodemap->num);
- CTDB_NO_MEMORY_FATAL(ctdb, takeover_data->node_failed);
- takeover_data->fail_callback = fail_callback;
- takeover_data->fail_callback_data = callback_data;
- takeover_data->nodemap = nodemap;
-
async_data = talloc_zero(tmp_ctx, struct client_async_data);
CTDB_NO_MEMORY_FATAL(ctdb, async_data);
@@ -1811,9 +1797,9 @@ int ctdb_takeover_run(struct ctdb_context *ctdb, struct
ctdb_node_map_old *nodem
}
}
if (ctdb_client_async_wait(ctdb, async_data) != 0) {
- DEBUG(DEBUG_ERR,(__location__ " Async control
CTDB_CONTROL_RELEASE_IP failed\n"));
- talloc_free(tmp_ctx);
- return -1;
+ DEBUG(DEBUG_ERR,
+ ("Async control CTDB_CONTROL_RELEASE_IP failed\n"));
+ goto fail;
}
talloc_free(async_data);
@@ -1825,8 +1811,8 @@ int ctdb_takeover_run(struct ctdb_context *ctdb, struct
ctdb_node_map_old *nodem
async_data = talloc_zero(tmp_ctx, struct client_async_data);
CTDB_NO_MEMORY_FATAL(ctdb, async_data);
- async_data->fail_callback = fail_callback;
- async_data->callback_data = callback_data;
+ async_data->fail_callback = takeover_run_fail_callback;
+ async_data->callback_data = takeover_data;
for (tmp_ip=all_ips;tmp_ip;tmp_ip=tmp_ip->next) {
if (tmp_ip->pnn == -1) {
@@ -1852,9 +1838,9 @@ int ctdb_takeover_run(struct ctdb_context *ctdb, struct
ctdb_node_map_old *nodem
ctdb_client_async_add(async_data, state);
}
if (ctdb_client_async_wait(ctdb, async_data) != 0) {
- DEBUG(DEBUG_ERR,(__location__ " Async control
CTDB_CONTROL_TAKEOVER_IP failed\n"));
- talloc_free(tmp_ctx);
- return -1;
+ DEBUG(DEBUG_ERR,
+ ("Async control CTDB_CONTROL_TAKEOVER_IP failed\n"));
+ goto fail;
}
ipreallocated:
@@ -1865,58 +1851,25 @@ ipreallocated:
* IPs have moved. Once upon a time this event only used to
* update natgw.
*/
- retry_data = talloc_zero_array(tmp_ctx, bool, nodemap->num);
- CTDB_NO_MEMORY_FATAL(ctdb, retry_data);
- iprealloc_data.retry_nodes = retry_data;
- iprealloc_data.retry_count = 0;
- iprealloc_data.fail_callback = fail_callback;
- iprealloc_data.fail_callback_data = callback_data;
- iprealloc_data.nodemap = nodemap;
-
nodes = list_of_connected_nodes(ctdb, nodemap, tmp_ctx, true);
ret = ctdb_client_async_control(ctdb, CTDB_CONTROL_IPREALLOCATED,
nodes, 0, TAKEOVER_TIMEOUT(),
false, tdb_null,
- NULL, iprealloc_fail_callback,
- &iprealloc_data);
+ NULL, takeover_run_fail_callback,
+ takeover_data);
if (ret != 0) {
- /* If the control failed then we should retry to any
- * nodes flagged by iprealloc_fail_callback using the
- * EVENTSCRIPT control. This is a best-effort at
- * backward compatiblity when running a mixed cluster
- * where some nodes have not yet been upgraded to
- * support the IPREALLOCATED control.
- */
- DEBUG(DEBUG_WARNING,
- ("Retry ipreallocated to some nodes using eventscript
control\n"));
-
- nodes = talloc_array(tmp_ctx, uint32_t,
- iprealloc_data.retry_count);
- CTDB_NO_MEMORY_FATAL(ctdb, nodes);
-
- j = 0;
- for (i=0; i<nodemap->num; i++) {
- if (iprealloc_data.retry_nodes[i]) {
- nodes[j] = i;
- j++;
- }
- }
-
- data.dptr = discard_const("ipreallocated");
- data.dsize = strlen((char *)data.dptr) + 1;
- ret = ctdb_client_async_control(ctdb,
- CTDB_CONTROL_RUN_EVENTSCRIPTS,
--
Samba Shared Repository