The branch, master has been updated
       via  ed7d999214ee009e480c26410a04fa105028cb8e (commit)
       via  af4b6b8b3222d2a3c425fcc6833db579d0cd7ffa (commit)
       via  929045335212e825deb645cc6c7f97b8a40fdbb3 (commit)
       via  14bfd22fad1a5fd27eede1be7fccbaed9466e13e (commit)
       via  961dd5d0acbb971756944ea9f69992020ea7d9fc (commit)
       via  41bdbcfd72092cdd25da87e60689c087bca97933 (commit)
       via  4e0f1971792c9431d8d51dc57d54ecc9e4576dd5 (commit)
       via  40589ae5259880431f358250c1f0d07bcaa21d1f (commit)
       via  55f91ea4373c54ddb5faad87fa2826d86a4b6172 (commit)
       via  22a253b7ccf1ff854cddf0b67969dc84d7d6a654 (commit)
       via  7d176352986317e63696d74252ff5d8eccb2fee5 (commit)
       via  3c892ea1b5aa42686adb82ce29b9fcfdf9d204a1 (commit)
       via  2ce3a48cc969d563c26dd295723416c0d7b077a2 (commit)
      from  6182bd0c19f215a997efe5272e633b1b1bd0c882 (commit)

http://gitweb.samba.org/?p=ctdb.git;a=shortlog;h=master

- Log -----------------------------------------------------------------
commit ed7d999214ee009e480c26410a04fa105028cb8e
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Tue Oct 1 11:54:35 2013 +1000

    tests: If transaction_start fails, try again

    Signed-off-by: Amitay Isaacs <ami...@gmail.com>

commit af4b6b8b3222d2a3c425fcc6833db579d0cd7ffa
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Tue Oct 1 11:53:57 2013 +1000

    tests: Make sure test exits with zero status on successful completion

    Signed-off-by: Amitay Isaacs <ami...@gmail.com>

commit 929045335212e825deb645cc6c7f97b8a40fdbb3
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Fri Sep 27 11:26:27 2013 +1000

    tests: Re-enable transaction test code

    Signed-off-by: Amitay Isaacs <ami...@gmail.com>

commit 14bfd22fad1a5fd27eede1be7fccbaed9466e13e
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Tue Sep 24 13:10:31 2013 +1000

    tools/ctdb: Remove setdbseqnum command

    This command was added to test persistent database recovery with
    sequence numbers.  With the new persistent transaction code, sequence
    numbers get updated automatically, so there is no need for this
    command.

    Signed-off-by: Amitay Isaacs <ami...@gmail.com>

commit 961dd5d0acbb971756944ea9f69992020ea7d9fc
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Tue Sep 24 13:08:48 2013 +1000

    tests: No need to set sequence number when modifying persistent database

    With the new persistent transaction code, sequence numbers will be
    automatically updated whenever a record is updated.

    Signed-off-by: Amitay Isaacs <ami...@gmail.com>

commit 41bdbcfd72092cdd25da87e60689c087bca97933
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Wed Sep 25 19:16:53 2013 +1000

    client: Remove old persistent transaction code

    Signed-off-by: Amitay Isaacs <ami...@gmail.com>

commit 4e0f1971792c9431d8d51dc57d54ecc9e4576dd5
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Mon Sep 23 18:30:04 2013 +1000

    client: Reimplement persistent transaction code using TRANS3_COMMIT

    Implementing persistent transaction code from Samba.

    Persistent transaction code was reimplemented in Samba using
    g_lock.tdb to hold transaction locks and using TRANS3_COMMIT control.

    Implementation details:

    1. When starting a transaction, create a record with
       "transaction-<dbid>" as key and store current server_id in the
       structure.

    2. If a record already exists, some other client has already started
       a transaction.  Verify that the process corresponding to server_id
       stored in the record really exists or it's a stale record and
       overwrite it.

    3. All modifications to the actual persistent database are stored in
       a marshal buffer.

    4. When transaction is committed, read the sequence number of the
       persistent database and increment it.  Sequence number record is
       also stored in the marshal buffer.

    5. Send the changed records (marshal buffer) in TRANS3_COMMIT control
       to all the active nodes.

    6. If all controls succeed, verify that the sequence number has been
       incremented.  Commit is successful.  If any of the controls fail,
       abort the transaction.

    7. In case sequence number has not yet been incremented, then
       database recovery has been triggered.  So repeat from step 5.

    Signed-off-by: Amitay Isaacs <ami...@gmail.com>

commit 40589ae5259880431f358250c1f0d07bcaa21d1f
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Fri Oct 4 15:38:04 2013 +1000

    client: Add functions to parse g_lock.tdb records

    Signed-off-by: Amitay Isaacs <ami...@gmail.com>

commit 55f91ea4373c54ddb5faad87fa2826d86a4b6172
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Fri Oct 4 15:37:24 2013 +1000

    client: Add functions to handle server_id structure

    server_id records are stored in g_lock.tdb for persistent
    transactions.

    Signed-off-by: Amitay Isaacs <ami...@gmail.com>

commit 22a253b7ccf1ff854cddf0b67969dc84d7d6a654
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Thu Sep 12 16:43:43 2013 +1000

    ctdbd: Remove transaction code related to TRANS2 commits

    This removes data types and structure elements related to TRANS2
    persistent transaction code.

    Signed-off-by: Amitay Isaacs <ami...@gmail.com>

commit 7d176352986317e63696d74252ff5d8eccb2fee5
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Thu Sep 12 16:27:39 2013 +1000

    ctdbd: Deprecate TRANS2 commit controls

    Signed-off-by: Amitay Isaacs <ami...@gmail.com>

commit 3c892ea1b5aa42686adb82ce29b9fcfdf9d204a1
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Thu Sep 12 16:36:09 2013 +1000

    ctdbd: Create a utility function to log error for "not implemented" controls

    Signed-off-by: Amitay Isaacs <ami...@gmail.com>

commit 2ce3a48cc969d563c26dd295723416c0d7b077a2
Author: Amitay Isaacs <ami...@gmail.com>
Date:   Thu Sep 12 16:35:17 2013 +1000

    include: Remove unused set_dmaster structure

    Signed-off-by: Amitay Isaacs <ami...@gmail.com>

-----------------------------------------------------------------------

Summary of changes:
 client/ctdb_client.c                 |  673 ++++++++++++++++++++--------------
 include/ctdb_client.h                |    5 -
 include/ctdb_private.h               |    7 -
 include/ctdb_protocol.h              |   20 +-
 server/ctdb_call.c                   |    2 +-
 server/ctdb_control.c                |   30 +-
 server/ctdb_daemon.c                 |    3 -
 server/ctdb_persistent.c             |  313 +---------------
 tests/simple/76_ctdb_pdb_recovery.sh |   12 +-
 tests/src/ctdb_transaction.c         |   12 +-
 tools/ctdb.c                         |   86 -----
 11 files changed, 429 insertions(+), 734 deletions(-)

Changeset truncated at 500 lines:

diff --git a/client/ctdb_client.c b/client/ctdb_client.c
index 997a234..9c1c27d 100644
--- a/client/ctdb_client.c
+++ b/client/ctdb_client.c
@@ -3730,189 +3730,353 @@ int ctdb_ctrl_getcapabilities(struct ctdb_context *ctdb, struct timeval timeout,
 	return ret;
 }

-/**
- * check whether a transaction is active on a given db on a given node
- */
-int32_t ctdb_ctrl_transaction_active(struct ctdb_context *ctdb,
-				     uint32_t destnode,
-				     uint32_t db_id)
+struct server_id {
+	uint64_t pid;
+	uint32_t task_id;
+	uint32_t vnn;
+	uint64_t unique_id;
+};
+
+static struct server_id server_id_get(struct ctdb_context *ctdb, uint32_t reqid)
 {
-	int32_t status;
-	int ret;
-	TDB_DATA indata;
+	struct server_id id;

-	indata.dptr = (uint8_t *)&db_id;
-	indata.dsize = sizeof(db_id);
+	id.pid = getpid();
+	id.task_id = reqid;
+	id.vnn = ctdb_get_pnn(ctdb);
+	id.unique_id = id.vnn;
+	id.unique_id = (id.unique_id << 32) | reqid;

-	ret = ctdb_control(ctdb, destnode, 0,
-			   CTDB_CONTROL_TRANS2_ACTIVE,
-			   0, indata, NULL, NULL, &status,
-			   NULL, NULL);
+	return id;
+}
+
+static bool server_id_equal(struct server_id *id1, struct server_id *id2)
+{
+	if (id1->pid != id2->pid) {
+		return false;
+	}
+
+	if (id1->task_id != id2->task_id) {
+		return false;
+	}
+
+	if (id1->vnn != id2->vnn) {
+		return false;
+	}
+	if (id1->unique_id != id2->unique_id) {
+		return false;
+	}
+
+	return true;
+}
+
+static bool server_id_exists(struct ctdb_context *ctdb, struct server_id *id)
+{
+	struct ctdb_server_id sid;
+	int ret;
+	uint32_t result;
+
+	sid.type = SERVER_TYPE_SAMBA;
+	sid.pnn = id->vnn;
+	sid.server_id = id->pid;
+
+	ret = ctdb_ctrl_check_server_id(ctdb, timeval_current_ofs(3,0),
+					id->vnn, &sid, &result);
 	if (ret != 0) {
-		DEBUG(DEBUG_ERR, (__location__ " ctdb control for transaction_active failed\n"));
-		return -1;
+		/* If control times out, assume server_id exists. */
+		return true;
 	}

-	return status;
+	if (result) {
+		return true;
+	}
+
+	return false;
 }

-struct ctdb_transaction_handle {
-	struct ctdb_db_context *ctdb_db;
-	bool in_replay;
-	/*
-	 * we store the reads and writes done under a transaction:
-	 * - one list stores both reads and writes (m_all),
-	 * - the other just writes (m_write)
-	 */
-	struct ctdb_marshall_buffer *m_all;
-	struct ctdb_marshall_buffer *m_write;
+enum g_lock_type {
+	G_LOCK_READ = 0,
+	G_LOCK_WRITE = 1,
 };

-/* start a transaction on a database */
-static int ctdb_transaction_destructor(struct ctdb_transaction_handle *h)
+struct g_lock_rec {
+	enum g_lock_type type;
+	struct server_id id;
+};
+
+struct g_lock_recs {
+	unsigned int num;
+	struct g_lock_rec *lock;
+};
+
+static bool g_lock_parse(TALLOC_CTX *mem_ctx, TDB_DATA data,
+			 struct g_lock_recs **locks)
 {
-	tdb_transaction_cancel(h->ctdb_db->ltdb->tdb);
-	return 0;
+	struct g_lock_recs *recs;
+
+	recs = talloc_zero(mem_ctx, struct g_lock_recs);
+	if (recs == NULL) {
+		return false;
+	}
+
+	if (data.dsize == 0) {
+		goto done;
+	}
+
+	if (data.dsize % sizeof(struct g_lock_rec) != 0) {
+		DEBUG(DEBUG_ERR, (__location__ "invalid data size %lu in g_lock record\n",
+				  data.dsize));
+		talloc_free(recs);
+		return false;
+	}
+
+	recs->num = data.dsize / sizeof(struct g_lock_rec);
+	recs->lock = talloc_memdup(mem_ctx, data.dptr, data.dsize);
+	if (recs->lock == NULL) {
+		talloc_free(recs);
+		return false;
+	}
+
+done:
+	if (locks != NULL) {
+		*locks = recs;
+	}
+
+	return true;
 }

-/* start a transaction on a database */
-static int ctdb_transaction_fetch_start(struct ctdb_transaction_handle *h)
+
+static bool g_lock_lock(TALLOC_CTX *mem_ctx,
+			struct ctdb_db_context *ctdb_db,
+			const char *keyname, uint32_t reqid)
 {
-	struct ctdb_record_handle *rh;
-	TDB_DATA key;
-	TDB_DATA data;
-	struct ctdb_ltdb_header header;
-	TALLOC_CTX *tmp_ctx;
-	const char *keyname = CTDB_TRANSACTION_LOCK_KEY;
-	int ret;
-	struct ctdb_db_context *ctdb_db = h->ctdb_db;
-	pid_t pid;
-	int32_t status;
+	TDB_DATA key, data;
+	struct ctdb_record_handle *h;
+	struct g_lock_recs *locks;
+	struct server_id id;
+	int i;

-	key.dptr = discard_const(keyname);
-	key.dsize = strlen(keyname);
+	key.dptr = (uint8_t *)discard_const(keyname);
+	key.dsize = strlen(keyname) + 1;
+	h = ctdb_fetch_lock(ctdb_db, mem_ctx, key, &data);
+	if (h == NULL) {
+		return false;
+	}

-	if (!ctdb_db->persistent) {
-		DEBUG(DEBUG_ERR,(__location__ " Attempted transaction on non-persistent database\n"));
-		return -1;
+	if (!g_lock_parse(h, data, &locks)) {
+		DEBUG(DEBUG_ERR, ("g_lock: error parsing locks\n"));
+		talloc_free(data.dptr);
+		talloc_free(h);
+		return false;
 	}

-again:
-	tmp_ctx = talloc_new(h);
+	talloc_free(data.dptr);

-	rh = ctdb_fetch_lock(ctdb_db, tmp_ctx, key, NULL);
-	if (rh == NULL) {
-		DEBUG(DEBUG_ERR,(__location__ " Failed to fetch_lock database\n"));
-		talloc_free(tmp_ctx);
-		return -1;
+	id = server_id_get(ctdb_db->ctdb, reqid);
+
+	i = 0;
+	while (i < locks->num) {
+		if (server_id_equal(&locks->lock[i].id, &id)) {
+			/* Internal error */
+			talloc_free(h);
+			return false;
+		}
+
+		if (!server_id_exists(ctdb_db->ctdb, &locks->lock[i].id)) {
+			if (i < locks->num-1) {
+				locks->lock[i] = locks->lock[locks->num-1];
+			}
+			locks->num--;
+			continue;
+		}
+
+		/* This entry is locked. */
+		DEBUG(DEBUG_INFO, ("g_lock: lock already granted for "
+				   "pid=0x%llx taskid=%x vnn=%d id=0x%llx\n",
+				   (unsigned long long)id.pid,
+				   id.task_id, id.vnn,
+				   (unsigned long long)id.unique_id));
+		talloc_free(h);
+		return false;
 	}

-	status = ctdb_ctrl_transaction_active(ctdb_db->ctdb,
-					      CTDB_CURRENT_NODE,
-					      ctdb_db->db_id);
-	if (status == 1) {
-		unsigned long int usec = (1000 + random()) % 100000;
-		DEBUG(DEBUG_DEBUG, (__location__ " transaction is active "
-				    "on db_id[0x%08x]. waiting for %lu "
-				    "microseconds\n",
-				    ctdb_db->db_id, usec));
-		talloc_free(tmp_ctx);
-		usleep(usec);
-		goto again;
+	locks->lock = talloc_realloc(locks, locks->lock, struct g_lock_rec,
+				     locks->num+1);
+	if (locks->lock == NULL) {
+		talloc_free(h);
+		return false;
 	}

-	/*
-	 * store the pid in the database:
-	 * it is not enough that the node is dmaster...
-	 */
-	pid = getpid();
-	data.dptr = (unsigned char *)&pid;
-	data.dsize = sizeof(pid_t);
-	rh->header.rsn++;
-	rh->header.dmaster = ctdb_db->ctdb->pnn;
-	ret = ctdb_ltdb_store(ctdb_db, key, &(rh->header), data);
-	if (ret != 0) {
-		DEBUG(DEBUG_ERR, (__location__ " Failed to store pid in "
-				  "transaction record\n"));
-		talloc_free(tmp_ctx);
-		return -1;
+	locks->lock[locks->num].type = G_LOCK_WRITE;
+	locks->lock[locks->num].id = id;
+	locks->num++;
+
+	data.dptr = (uint8_t *)locks->lock;
+	data.dsize = locks->num * sizeof(struct g_lock_rec);
+
+	if (ctdb_record_store(h, data) != 0) {
+		DEBUG(DEBUG_ERR, ("g_lock: failed to write transaction lock for "
+				  "pid=0x%llx taskid=%x vnn=%d id=0x%llx\n",
+				  (unsigned long long)id.pid,
+				  id.task_id, id.vnn,
+				  (unsigned long long)id.unique_id));
+		talloc_free(h);
+		return false;
 	}

-	talloc_free(rh);
+	DEBUG(DEBUG_INFO, ("g_lock: lock granted for "
+			   "pid=0x%llx taskid=%x vnn=%d id=0x%llx\n",
+			   (unsigned long long)id.pid,
+			   id.task_id, id.vnn,
+			   (unsigned long long)id.unique_id));

-	ret = tdb_transaction_start(ctdb_db->ltdb->tdb);
-	if (ret != 0) {
-		DEBUG(DEBUG_ERR,(__location__ " Failed to start tdb transaction\n"));
-		talloc_free(tmp_ctx);
-		return -1;
+	talloc_free(h);
+	return true;
+}
+
+static bool g_lock_unlock(TALLOC_CTX *mem_ctx,
+			  struct ctdb_db_context *ctdb_db,
+			  const char *keyname, uint32_t reqid)
+{
+	TDB_DATA key, data;
+	struct ctdb_record_handle *h;
+	struct g_lock_recs *locks;
+	struct server_id id;
+	int i;
+	bool found = false;
+
+	key.dptr = (uint8_t *)discard_const(keyname);
+	key.dsize = strlen(keyname) + 1;
+	h = ctdb_fetch_lock(ctdb_db, mem_ctx, key, &data);
+	if (h == NULL) {
+		return false;
 	}

-	ret = ctdb_ltdb_fetch(ctdb_db, key, &header, tmp_ctx, &data);
-	if (ret != 0) {
-		DEBUG(DEBUG_ERR,(__location__ " Failed to re-fetch transaction "
-				 "lock record inside transaction\n"));
-		tdb_transaction_cancel(ctdb_db->ltdb->tdb);
-		talloc_free(tmp_ctx);
-		goto again;
+	if (!g_lock_parse(h, data, &locks)) {
+		DEBUG(DEBUG_ERR, ("g_lock: error parsing locks\n"));
+		talloc_free(data.dptr);
+		talloc_free(h);
+		return false;
 	}

-	if (header.dmaster != ctdb_db->ctdb->pnn) {
-		DEBUG(DEBUG_DEBUG,(__location__ " not dmaster any more on "
-				   "transaction lock record\n"));
-		tdb_transaction_cancel(ctdb_db->ltdb->tdb);
-		talloc_free(tmp_ctx);
-		goto again;
+	talloc_free(data.dptr);
+
+	id = server_id_get(ctdb_db->ctdb, reqid);
+
+	for (i=0; i<locks->num; i++) {
+		if (server_id_equal(&locks->lock[i].id, &id)) {
+			if (i < locks->num-1) {
+				locks->lock[i] = locks->lock[locks->num-1];
+			}
+			locks->num--;
+			found = true;
+			break;
+		}
 	}

-	if ((data.dsize != sizeof(pid_t)) || (*(pid_t *)(data.dptr) != pid)) {
-		DEBUG(DEBUG_DEBUG, (__location__ " my pid is not stored in "
-				    "the transaction lock record\n"));
-		tdb_transaction_cancel(ctdb_db->ltdb->tdb);
-		talloc_free(tmp_ctx);
-		goto again;
+	if (!found) {
+		DEBUG(DEBUG_ERR, ("g_lock: lock not found\n"));
+		talloc_free(h);
+		return false;
 	}

-	talloc_free(tmp_ctx);
+	data.dptr = (uint8_t *)locks->lock;
+	data.dsize = locks->num * sizeof(struct g_lock_rec);
+
+	if (ctdb_record_store(h, data) != 0) {
+		talloc_free(h);
+		return false;
+	}
+
+	talloc_free(h);
+	return true;
+}
+
+struct ctdb_transaction_handle {
+	struct ctdb_db_context *ctdb_db;
+	struct ctdb_db_context *g_lock_db;
+	char *lock_name;
+	uint32_t reqid;
+	/*
+	 * we store reads and writes done under a transaction:
+	 * - one list stores both reads and writes (m_all)
+	 * - the other just writes (m_write)
+	 */
+	struct ctdb_marshall_buffer *m_all;
+	struct ctdb_marshall_buffer *m_write;
+};
+
+static int ctdb_transaction_destructor(struct ctdb_transaction_handle *h)
+{
+	g_lock_unlock(h, h->g_lock_db, h->lock_name, h->reqid);
+	ctdb_reqid_remove(h->ctdb_db->ctdb, h->reqid);
 	return 0;
 }

-/* start a transaction on a database */
+/**
+ * start a transaction on a database
+ */
 struct ctdb_transaction_handle *ctdb_transaction_start(struct ctdb_db_context *ctdb_db,
 						       TALLOC_CTX *mem_ctx)
 {
 	struct ctdb_transaction_handle *h;
-	int ret;
+	struct ctdb_server_id id;

 	h = talloc_zero(mem_ctx, struct ctdb_transaction_handle);
 	if (h == NULL) {
-		DEBUG(DEBUG_ERR,(__location__ " oom for transaction handle\n"));
+		DEBUG(DEBUG_ERR, (__location__ " memory allocation error\n"));
 		return NULL;
 	}

 	h->ctdb_db = ctdb_db;
+	h->lock_name = talloc_asprintf(h, "transaction_db_0x%08x",
+				       (unsigned int)ctdb_db->db_id);
+	if (h->lock_name == NULL) {
+		DEBUG(DEBUG_ERR, (__location__ " talloc asprintf failed\n"));
+		talloc_free(h);
+		return NULL;
+	}

-	ret = ctdb_transaction_fetch_start(h);
-	if (ret != 0) {
+	h->g_lock_db = ctdb_attach(h->ctdb_db->ctdb, timeval_current_ofs(3,0),
+				   "g_lock.tdb", false, 0);
+	if (!h->g_lock_db) {
+		DEBUG(DEBUG_ERR, (__location__ " unable to attach to g_lock.tdb\n"));
 		talloc_free(h);
 		return NULL;
 	}

-	talloc_set_destructor(h, ctdb_transaction_destructor);
+	id.type = SERVER_TYPE_SAMBA;
+	id.pnn = ctdb_get_pnn(ctdb_db->ctdb);
+	id.server_id = getpid();

-	return h;
-}
+	if (ctdb_ctrl_register_server_id(ctdb_db->ctdb, timeval_current_ofs(3,0),
+					 &id) != 0) {
+		DEBUG(DEBUG_ERR, (__location__ " unable to register server id\n"));
+		talloc_free(h);
+		return NULL;
+	}

+	h->reqid = ctdb_reqid_new(h->ctdb_db->ctdb, h);

+	if (!g_lock_lock(h, h->g_lock_db, h->lock_name, h->reqid)) {
+		DEBUG(DEBUG_ERR, (__location__ " Error locking g_lock.tdb\n"));
+		talloc_free(h);
+		return NULL;
+	}

-/*
-  fetch a record inside a transaction
+	talloc_set_destructor(h, ctdb_transaction_destructor);
+	return h;
+}
+
+/**
+ * fetch a record inside a transaction
  */
-int ctdb_transaction_fetch(struct ctdb_transaction_handle *h,
-			   TALLOC_CTX *mem_ctx,
+int ctdb_transaction_fetch(struct ctdb_transaction_handle *h,
+			   TALLOC_CTX *mem_ctx,
 			   TDB_DATA key, TDB_DATA *data)
 {
 	struct ctdb_ltdb_header header;
@@ -3926,26 +4090,24 @@ int ctdb_transaction_fetch(struct ctdb_transaction_handle *h,
 		*data = tdb_null;
 		ret = 0;
 	}
-	
+
 	if (ret != 0) {
 		return ret;
 	}

-	if (!h->in_replay) {
-		h->m_all = ctdb_marshall_add(h, h->m_all, h->ctdb_db->db_id, 1, key, NULL, *data);
-		if (h->m_all == NULL) {
-			DEBUG(DEBUG_ERR,(__location__ " Failed to add to marshalling record\n"));
-			return -1;
-		}
+	h->m_all = ctdb_marshall_add(h, h->m_all, h->ctdb_db->db_id, 1, key, NULL, *data);
+	if (h->m_all == NULL) {
+		DEBUG(DEBUG_ERR,(__location__ " Failed to add to marshalling record\n"));
+		return -1;
 	}


--
CTDB repository
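
Editorial illustration (not part of the changeset): the commit sequence described in steps 4 to 7 of the log message for 4e0f1971 can be sketched roughly as follows. This is a standalone simulation under stated assumptions, not the code from client/ctdb_client.c. The names sim_cluster, sim_send_commit, sim_transaction_commit and the max_attempts bound are all hypothetical; the log message does not say whether the real retry loop is bounded, and the real code sends a TRANS3_COMMIT control rather than calling a local function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the cluster state; NOT a CTDB structure. */
struct sim_cluster {
	uint64_t seqnum;         /* sequence number of the persistent database */
	int recoveries_pending;  /* commits that a simulated recovery discards */
	bool fail_control;       /* simulate a failing commit control */
	int controls_sent;       /* number of times the buffer was sent */
};

enum commit_result { COMMIT_OK, COMMIT_ABORTED, COMMIT_GAVE_UP };

/*
 * Step 5 (simulated): send the marshal buffer, which carries the new
 * sequence number, to all active nodes. Returns 0 if every control
 * succeeded, -1 otherwise.
 */
static int sim_send_commit(struct sim_cluster *c, uint64_t new_seqnum)
{
	assert(c != NULL);
	c->controls_sent++;

	if (c->fail_control) {
		return -1;
	}
	if (c->recoveries_pending > 0) {
		/* Controls succeeded, but a recovery discarded the write. */
		c->recoveries_pending--;
		return 0;
	}
	c->seqnum = new_seqnum;
	return 0;
}

/* Steps 4 to 7; max_attempts is an assumption added for this sketch. */
static enum commit_result sim_transaction_commit(struct sim_cluster *c,
						 int max_attempts)
{
	/* Step 4: read the current sequence number and increment it. */
	uint64_t new_seqnum = c->seqnum + 1;
	int attempt;

	for (attempt = 0; attempt < max_attempts; attempt++) {
		/* Step 5: send the changed records to all active nodes. */
		if (sim_send_commit(c, new_seqnum) != 0) {
			/* Step 6, failure branch: abort the transaction. */
			return COMMIT_ABORTED;
		}
		/* Step 6: verify the sequence number was incremented. */
		if (c->seqnum == new_seqnum) {
			return COMMIT_OK;
		}
		/* Step 7: a recovery intervened, repeat from step 5. */
	}
	return COMMIT_GAVE_UP;
}
```

With recoveries_pending set to 2, the sketch sends the buffer three times before the sequence number check in step 6 passes. With fail_control set, it aborts after the first send and leaves the sequence number unchanged.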