Test discussion @ https://mail.openvswitch.org/pipermail/ovs-discuss/2018-March/046440.html
Tested-by: aginwala <[email protected]> On Sat, Feb 24, 2018 at 3:17 PM, Justin Pettit <[email protected]> wrote: > > > > On Dec 31, 2017, at 9:16 PM, Ben Pfaff <[email protected]> wrote: > > > > diff --git a/Documentation/ref/ovsdb.5.rst b/Documentation/ref/ovsdb.5. > rst > > index f3e50976b5c7..0ab888996eac 100644 > > --- a/Documentation/ref/ovsdb.5.rst > > +++ b/Documentation/ref/ovsdb.5.rst > > > > +When the server successfully joins the cluster, the database file is > replaced > > +by one in the general format described earlier. > > Just to be crystal clear, I think it be good to reference that it's the > one described in the "Clustered Format" section. > > > diff --git a/ovsdb/raft-private.c b/ovsdb/raft-private.c > > new file mode 100644 > > index 000000000000..457d1292a949 > > --- /dev/null > > +++ b/ovsdb/raft-private.c > > > > +struct ovsdb_error * OVS_WARN_UNUSED_RESULT > > +raft_address_validate(const char *address) > > +{ > > ... > > + return ovsdb_error(NULL, "%s: expected \"tcp\" or \"ssl\" > address", > > + address); > > I assume you're downplaying "unix:", since it only makes sense for testing > purposes? > > > +struct ovsdb_error * OVS_WARN_UNUSED_RESULT > > +raft_address_validate_json(const struct json *address) > > This function is only used within this file, so it could be declared > static if you don't think it's generally useful. > > > +char * > > +raft_address_to_nickname(const char *address, const struct uuid *sid) > > +{ > > I think it may be worth explaining what a nickname is and how one gets one. > > > +static struct ovsdb_error * OVS_WARN_UNUSED_RESULT > > +raft_servers_from_json__(const struct json *json, struct hmap *servers) > > +{ > > ... > > + if (!uuid_from_string(&sid, node->name)) { > > + return ovsdb_syntax_error(json, NULL, "%s is a not a UUID", > > I think you can drop the first "a" in that error message. > > > +void > > +raft_servers_format(const struct hmap *servers, struct ds *ds) > > +{ > > + int i = 0; > > + const struct raft_server *s; > > + HMAP_FOR_EACH (s, hmap_node, servers) { > > + if (i++) { > > + ds_put_cstr(ds, ", "); > > + } > > + ds_put_format(ds, SID_FMT"(%s)", SID_ARGS(&s->sid), s->address); > > Do you think it's worth putting a space between the SID and parenthetical > address? > > > +static void > > +raft_header_from_json__(struct raft_header *h, struct ovsdb_parser *p) > > +{ > > ... > > + /* Parse "remotes", if present. > > Should that be "remote_addresses"? > > > +static void > > +raft_record_from_json__(struct raft_record *r, struct ovsdb_parser *p) > > +{ > > > > + /* All remaining types of log records include "term", plus at most > one of: > > + * > > + * - "index" plus zero or more of "data" and "servers". If > "data" is > > + * present then "eid" may also be present. > > ... > > + r->entry.data = json_nullable_clone( > > + ovsdb_parser_member(p, "data", > > + OP_OBJECT | OP_ARRAY | OP_OPTIONAL)); > > + r->entry.eid = (r->entry.data > > + ? raft_parse_required_uuid(p, "eid") > > + : UUID_ZERO); > > The description of "index" makes it sound like "eid" is optional if "data" > is present, but this code seems to expect it to be set. > > > diff --git a/ovsdb/raft-private.h b/ovsdb/raft-private.h > > new file mode 100644 > > index 000000000000..6e147fadb0ac > > --- /dev/null > > +++ b/ovsdb/raft-private.h > > ... > > + char *nickname; /* 1ab3(s3) */ > > It might be nice to put the example in quotes. > > > +const char *raft_servers_get_nickname__(const struct hmap *servers, > > + const struct uuid *sid); > > ... > > +bool raft_parse_uuid__(struct ovsdb_parser *, const char *name, bool > optional, > > + struct uuid *); > > The coding style states that functions with a "__" suffix are used to > indicate "internal use only". I suppose this entire file is supposed to be > internal, but it still strikes me as funny to see these function in header > files. What do you think? > > > diff --git a/ovsdb/raft-rpc.c b/ovsdb/raft-rpc.c > > new file mode 100644 > > index 000000000000..617d4aa4eaad > > --- /dev/null > > +++ b/ovsdb/raft-rpc.c > > > > ... > > +static void > > +raft_append_request_from_jsonrpc(struct ovsdb_parser *p, > > + struct raft_append_request *rq) > > +{ > > ... > > + > > + const struct json *log = ovsdb_parser_member(p, "log", OP_ARRAY); > > + if (!log) { > > + return; > > + } > > Should this parser argument include OP_OPTIONAL? > > > +static void > > +raft_append_reply_to_jsonrpc(const struct raft_append_reply *rpy, > > + struct json *args) > > +{ > > ... > > + json_object_put_string(args, "result", > > + raft_append_result_to_string(rpy->result)); > > I believe raft_append_result_to_string() can return NULL, which could > cause a problem here. > > > +static void > > +raft_format_append_reply(const struct raft_append_reply *rpy, struct > ds *s) > > +{ > > ... > > + ds_put_format(s, " result=\"%s\"", > > + raft_append_result_to_string(rpy->result)); > > I don't think this will crash, but it will print strange if 'rpy->result' > is not valid. > > Is it intentional that "n_entries" and "prev_log_*" aren't printed? > > > +static void > > +raft_format_vote_request(const struct raft_vote_request *rq, struct ds > *s) > > +{ > > + ds_put_format(s, " term=%"PRIu64, rq->term); > > + ds_put_format(s, " last_log_index=%"PRIu64, rq->last_log_index); > > + ds_put_format(s, " last_log_term=%"PRIu64, rq->last_log_term); > > +} > > Should "leadership_transfer" be printed? > > > +static void > > +raft_install_snapshot_request_to_jsonrpc( > > + const struct raft_install_snapshot_request *rq, struct json *args) > > +{ > > ... > > + json_object_put(args, "last_servers", json_clone(rq->last_servers)); > > ... > > + > > + json_object_put(args, "data", json_clone(rq->data)); > > I believe "rq->last_servers" and 'rq->data' may be null from > raft_install_snapshot_request_clone(). > > > +static void > > +raft_format_install_snapshot_reply( > > + const struct raft_install_snapshot_reply *rpy, struct ds *s) > > +{ > > + ds_put_format(s, " term=%"PRIu64, rpy->term); > > +} > > Was it intentional to leave off "last_index" and "last_term"? > > > +static void > > +raft_execute_command_request_to_jsonrpc( > > + const struct raft_execute_command_request *rq, struct json *args) > > +{ > > + json_object_put(args, "data", json_clone(rq->data)); > > Can 'rq->data' be null from raft_execute_command_request_clone()? > > > +static void > > +raft_format_execute_command_request( > > + const struct raft_execute_command_request *rq, struct ds *s) > > +{ > > ... > > + json_to_ds(rq->data, JSSF_SORT, s); > > Same question about 'rq->data' being null. > > > +static void > > +raft_execute_command_reply_to_jsonrpc( > > + const struct raft_execute_command_reply *rpy, struct json *args) > > +{ > > + json_object_put_format(args, "result", UUID_FMT, > UUID_ARGS(&rpy->result)); > > + json_object_put_string(args, "status", > > + raft_command_status_to_string(rpy->status)); > > I believe raft_command_status_to_string() can return null. > > > +static void > > +raft_format_execute_command_reply( > > + const struct raft_execute_command_reply *rpy, struct ds *s) > > +{ > > + ds_put_format(s, " result="UUID_FMT, UUID_ARGS(&rpy->result)); > > + ds_put_format(s, " status=\"%s\"", > > + raft_command_status_to_string(rpy->status)); > > Same concern, but this wouldn't cause a crash. > > > +struct ovsdb_error * OVS_WARN_UNUSED_RESULT > > +raft_rpc_from_jsonrpc(struct uuid *cidp, > > + const struct uuid *sid, > > + const struct jsonrpc_msg *msg, > > + union raft_rpc *rpc) > > +{ > > I think it would be worth adding a comment to this function; behavior such > as how 'cidp' is used is not obvious unless you read the code. > > > +void > > +raft_rpc_format(const union raft_rpc *rpc, struct ds *s) > > +{ > > + ds_put_format(s, "%s", raft_rpc_type_to_string(rpc->type)); > > + if (rpc->common.comment) { > > + ds_put_format(s, " \"%s\"", rpc->common.comment); > > + } > > + ds_put_char(s, ':'); > > Is printing the SID not important? > > > diff --git a/lib/jsonrpc.h b/lib/jsonrpc.h > > index 969a6ed38cd6..a44114e8dcd9 100644 > > --- a/lib/jsonrpc.h > > +++ b/lib/jsonrpc.h > > > > struct raft_rpc_common { > > enum raft_rpc_type type; > > struct uuid sid; /* SID of peer server. */ > > I found it surprising in the code that 'sid' could be the source or > destination depending on whether we were converting to or from jsonrpc. I > think it could be made clearer here. > > > diff --git a/ovsdb/raft.c b/ovsdb/raft.c > > new file mode 100644 > > index 000000000000..9c358bf35687 > > --- /dev/null > > +++ b/ovsdb/raft.c > > ... > > +/* An action deferred until a log write commits to disk. */ > > +struct raft_waiter { > > + struct ovs_list list_node; > > + uint64_t commit_ticket; > > + > > + enum raft_waiter_type type; > > + union { > > + /* RAFT_W_ENTRY. > > + * > > + * Waits for a RAFT_REC_ENTRY write to our local log to > commit. Upon > > + * completion, updates log_synced to indicate that the new log > entry or > > + * entries are committed and, if we are leader, also updates > our local > > + * match_index. */ > > + struct { > > + uint64_t index; > > + } entry; > > + > > + /* RAFT_W_TERM. > > + * > > + * Waits for a RAFT_REC_TERM or RAFT_REC_VOTE record write to > commit. > > + * Upon completion, updates synced_term and synced_vote, which > triggers > > + * sending RPCs deferred by the uncommitted term and vote. */ > > + struct { > > + uint64_t term; > > + struct uuid vote; > > + } term; > > + > > + /* RAFT_W_RPC. > > + * > > + * Sometimes, sending an RPC to a peer must be delayed until an > entry, > > + * a term, or a vote mentioned in the RPC is synced to disk. > This > > + * waiter keeps a copy of such an RPC until the previous > waiters have > > + * committed. */ > > + union raft_rpc *rpc; > > + }; > > +}; > > Minor, but I think it's a little easier to read when single quotes are > used around references to struct members. > > > +/* The Raft state machine. */ > > +struct raft { > > ... > > +/* Persistent derived state. > > + * > > + * This must be updated on stable storage before responding to RPCs. > It can be > > + * derived from the header, snapshot, and log in 'log'. */ > > + > > + struct uuid cid; /* Cluster ID (immutable for the > cluster). */ > > + struct uuid sid; /* Server ID (immutable for the > server). */ > > + char *local_address; /* Local address (immutable for the > server). */ > > + char *local_nickname; /* Used for local server in log > messages. */ > > + char *name; /* Cluster name (immutable for the > cluster). */ > > I think 'name' is the schema name based on other references, including > raft_get_name(), but this makes it sound different to me. > > > +/* Persistent state on all servers. > > + * > > + * Must be updated on stable storage before responding to RPCs. */ > > + > > + /* Current term and vote, which might be on the way to disk now. */ > > + uint64_t term; /* Initialized to 0 and only increases. > */ > > + struct uuid vote; /* In 'term', or all-zeros if none. */ > > I found the 'vote' description confusing. > > > +/* Creates an on-disk file that represents a new Raft cluster and > initializes > > + * it to consist of a single server, the one on which this function is > called. > > + * > > + * Creates the local copy of the cluster's log in 'file_name', which > must not > > + * already exist. Gives it the name 'name', which should be the > database > > + * schema name and which is used only to match up this database with > server > > + * added to the cluster later if the cluster ID is unavailable. > > "with *the* server" > > > + * The new server is located at 'local_address', which must take one of > the > > + * forms "tcp:IP[:PORT]" or "ssl:IP[:PORT]", where IP is an IPv4 > address or a > > + * square bracket enclosed IPv6 address. PORT, if present, is a port > number > > + * that defaults to RAFT_PORT. > > I think the port must be explicitly set. I don't think there is a > "RAFT_PORT" default defined. > > > +/* Creates a database file that represents a new server in an existing > Raft > > + * cluster. > > + * > > + * Creates the local copy of the cluster's log in 'file_name', which > must not > > + * already exist. Gives it the name 'name', which must be the same name > > + * passed in to raft_create_cluster() earlier. > > Is raft_create_cluster() supposed to be called before this? I think > raft_create_cluster() would have already created the log file, which the > previous sentence said must not already exist. > > > + * 'cid' is optional. If specified, the new server will join only the > cluster > > + * with the given cluster ID. > > + * > > + * The new server is located at 'local_address', which must take one of > the > > + * forms "tcp:IP[:PORT]" or "ssl:IP[:PORT]", where IP is an IPv4 > address or a > > + * square bracket enclosed IPv6 address. PORT, if present, is a port > number > > + * that defaults to RAFT_PORT. > > Same comment about explicitly setting the port and "RAFT_PORT" not being > defined. > > > + * > > + * Joining the cluster requiring contacting it. Thus, > 'remote_addresses' > > s/requiring/requires/ > > > + * This only creates the on-disk file and does no network access. Use > > + * raft_open() to start operating the new server. (Until this happens, > the > > + * new server has not joined the cluster.) > > The name for this function (and the corresponding ovsdb-tool command) > seems a bit confusing in this way. Not a big deal, but it doesn't seem > obvious. > > > + * Returns null if successful, otherwise an ovsdb_error describing the > > + * problem. */ > > +struct ovsdb_error * OVS_WARN_UNUSED_RESULT > > +raft_join_cluster(const char *file_name, > > + const char *name, const char *local_address, > > + const struct sset *remote_addresses, > > + const struct uuid *cid) > > +{ > > ... > > + /* Write log file. */ > > + struct raft_header h = { > > + .sid = uuid_random(), > > + .cid = cid ? *cid : UUID_ZERO, > > + .name = xstrdup(name), > > + .local_address = xstrdup(local_address), > > + .joining = true, > > + }; > > + sset_clone(&h.remote_addresses, remote_addresses); > > + error = ovsdb_log_write_and_free(log, raft_header_to_json(&h)); > > + raft_header_uninit(&h); > > I don't think anything initializes 'h->snap', which seems like it could be > a problem for raft_header_to_json() and raft_header_uninit(). > > > +/* Reads the initial header record from 'log', which must be a Raft > clustered > > + * database log, and populates '*md' with the information read from > it. The > > + * caller must eventually destroy 'md'. > > You might mention that they do this by calling raft_metadata_destroy(). > > > + * > > + * On success, returns NULL. On failure, returns a error that the > caller must > > + * eventually destroy and zeros '*md'. */ > > s/a error/an error/ > > > +static struct json * > > +raft_servers_for_index(const struct raft *raft, uint64_t index) > > I think it would be worth mentioning that the caller must free the return > value. > > > +static struct ovsdb_error * OVS_WARN_UNUSED_RESULT > > +raft_read_header(struct raft *raft) > > +{ > > + /* Read header record. */ > > + struct json *json; > > + struct ovsdb_error *error = ovsdb_log_read(raft->log, &json); > > I don't think anything free 'json'. > > > +static void > > +raft_add_conn(struct raft *raft, struct jsonrpc_session *js, > > + const struct uuid *sid, bool incoming) > > +{ > > ... > > + if (sid) { > > + conn->sid = *sid; > > + } > > + conn->nickname = raft_address_to_nickname( > jsonrpc_session_get_name(js), > > + &conn->sid); > > The 'sid' argument can be null, which seems like it could be a problem > with raft_address_to_nickname(). > > > +/* Attempts to start 'raft' leaving its cluster. The caller can check > progress > > + * using raft_is_leaving() and raft_left(). */ > > +void > > +raft_leave(struct raft *raft) > > +{ > > ... > > + raft->leaving = true; > > + raft_transfer_leadership(raft, "this server is leaving the > cluster"); > > + raft_become_follower(raft); > > + raft_send_remove_server_requests(raft); > > + raft->leave_timeout = time_msec() + ELECTION_BASE_MSEC; > > + raft->leaving = true; > > This sets 'raft->leaving' twice. Was that intentional? > > > +/* Returns true if 'raft' is successfully left its cluster. */ > > +bool > > +raft_left(const struct raft *raft) > > I think you can drop "is" in the comment describing the function. > > > +static void > > +raft_close__(struct raft *raft) > > +{ > > I think it would be worth explaining why this would be called separately > from raft_close(). > > > + struct raft_server *rs = raft->remove_server; > > + if (rs) { > > + raft_send_remove_server_reply__(raft, &rs->sid, > &rs->requester_sid, > > + rs->requester_conn, false, > > + RAFT_SERVER_SHUTDOWN); > > + raft_server_destroy(raft->remove_server); > > + } > > I think it might be safer to set 'raft->remove_server' to null in case > this gets called, and then raft_close() is called. > > > void > > raft_close(struct raft *raft) > > { > > Should something free the 'raft->waiters' entries? > > > +static bool > > +raft_conn_receive(struct raft *raft, struct raft_conn *conn, > > + union raft_rpc *rpc) > > +{ > > ... > > + } else if (!uuid_equals(&conn->sid, &rpc->common.sid)) { > > + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); > > + VLOG_WARN_RL(&rl, > > + "%s: remote server ID changed from "SID_FMT" to > "SID_FMT, > > + jsonrpc_session_get_name(conn->js), > > + SID_ARGS(&conn->sid), SID_ARGS(&rpc->common.sid)); > > If the SID changed, shouldn't we update "conn->sid"? > > > +/* Returns true if 'conn' should stay open, 'conn' if it should be > closed. */ > > +static bool > > +raft_conn_should_stay_open(struct raft *raft, struct raft_conn *conn) > > I assume "false" if it should be closed. :-) > > > +/* Allows 'raft' to maintain the distributed log. Call this function > as part > > + * of the process's main loop. */ > > +void > > +raft_run(struct raft *raft) > > +{ > > ... > > + /* Close unneeded sessions. */ > > + struct raft_conn *next; > > + LIST_FOR_EACH_SAFE (conn, next, list_node, &raft->conns) { > > + if (!raft_conn_should_stay_open(raft, conn)) { > > + jsonrpc_session_close(conn->js); > > + ovs_list_remove(&conn->list_node); > > + free(conn); > > I think 'conn->nickname' needs to be freed, too. Maybe a helper function, > since there are a couple of places that free 'conn' > > > + if (raft->joining && time_msec() >= raft->join_timeout) { > > + raft->join_timeout = time_msec() + 1000; > > + struct raft_conn *conn; > > This 'conn' shadows a previous declaration. I think you can just drop > declaring this one. > > > + if (time_msec() >= raft->ping_timeout) { > > + if (raft->role == RAFT_LEADER) { > > + raft_send_heartbeats(raft); > > + } else { > > + long long int now = time_msec(); > > + struct raft_command *cmd, *next; > > This 'next' shadows a 'next' of a different type. > > > +/* Frees 'cmd'. */ > > +void > > +raft_command_unref(struct raft_command *cmd) > > +{ > > It seems like a more accurate description would be along the lines of > "Decrements the ref count for 'cmd' and frees it if zero." > > > +/* 'sid' is the server being added or removed. */ > > +static void > > +raft_send_add_server_reply__(struct raft *raft, const struct uuid *sid, > > + const char *address, > > + bool success, const char *comment) > > The description makes it sound like a server could be removed using this > function, but I don't think that's the case. > > > +static void > > +raft_server_init_leader(struct raft *raft, struct raft_server *s) > > +{ > > + s->next_index = raft->log_end; > > + s->match_index = 0; > > + s->phase = RAFT_PHASE_STABLE; > > +} > > The name of this function is a little surprising to me. It seems like > it's initializing a server, not necessarily a leader. > > > +/* Processes term 'term' received as part of RPC 'common'. Returns > true if the > > + * caller should continue processing the RPC, false if the caller > should reject > > + * it due to a stale term. */ > > +static bool > > +raft_receive_term__(struct raft *raft, const struct raft_rpc_common > *common, > > + uint64_t term) > > +{ > > + /* Section 3.3 says: > > + * > > + * Current terms are exchanged whenever servers communicate; if > one > > + * server’s current term is smaller than the other’s, then it > updates > > + * its current term to the larger value. If a candidate or > leader > > + * discovers that its term is out of date, it immediately > reverts to > > + * follower state. If a server receives a request with a stale > term > > + * number, it rejects the request. > > + */ > > + if (term > raft->term) { > > + if (!raft_set_term(raft, term, NULL)) { > > + return false; > > I don't understand how the result of raft_set_term() plays into this > decision to return false. > > > +static const struct json * > > +raft_peek_next_entry(struct raft *raft, struct uuid *eid) > > +{ > > + /* Invariant: log_start - 2 <= last_applied <= commit_index < > log_end. */ > > + ovs_assert(raft->log_start <= raft->last_applied + 2); > > + ovs_assert(raft->last_applied <= raft->commit_index); > > + ovs_assert(raft->commit_index < raft->log_end); > > + > > + if (raft->joining || raft->failed) { /* XXX needed? */ > > + return NULL; > > + } > > Did you want to check this "XXX"? > > > +/* If 'prev_log_index' exists in 'raft''s log, term 'prev_log_term', > returns > > + * NULL. Otherwise, returns an explanation for the mismatch. */ > > +static const char * > > +match_index_and_term(const struct raft *raft, > > + uint64_t prev_log_index, uint64_t prev_log_term) > > The description of 'prev_log_term' was confusing to me. I think it may be > missing some text about it matching the term pointed to by 'prev_log_index'. > > > +/* Returns NULL on success, RAFT_IN_PROGRESS for an operation in > progress, > > + * otherwise a brief comment explaining failure. */ > > +static void > > +raft_handle_append_entries(struct raft *raft, > > + const struct raft_append_request *rq, > > + uint64_t prev_log_index, uint64_t > prev_log_term, > > + const struct raft_entry *entries, > > + unsigned int n_entries) > > This description doesn't seem to match the function. > > > + if (raft->entries[log_index - raft->log_start].term > > + != entries[i].term) { > > + if (raft_truncate(raft, log_index)) { > > + servers_changed = true; > > Does this always mean the servers changed? > > > +static bool > > +raft_update_leader(struct raft *raft, const struct uuid *sid) > > +{ > > + if (raft->role == RAFT_LEADER && !uuid_equals(sid, &raft->sid)) { > > + char buf[SID_LEN + 1]; > > + VLOG_ERR("this server is leader but server %s claims to be", > > + raft_get_nickname(raft, sid, buf, sizeof buf)); > > + return false; > > If 'sid' and 'raft->sid' were equal, wouldn't that mean that this other > system is using the same sid? That also seems like it would be a problem. > > > +static void > > +raft_handle_append_request__(struct raft *raft, > > + const struct raft_append_request *rq) > > +{ > > ... > > + /* > > + * We now know that the data in rq->entries[] overlaps the data in > > + * raft->entries[], as shown below, with some positive 'ofs': > > + * > > + * rq->prev_log_index > > + * | first_entry_index > > + * | | nth_entry_index > > + * | | | > > + * v v v > > + * +---+---+---+---+---+ > > + * T | T | T | T | T | T | > > + * +---+-------+---+---+ > > + * +---+---+---+---+ > > + * T | T | T | T | T | > > + * +---+---+---+---+ > > + * ^ ^ > > + * | | > > + * log_start log_end > > + * > > + * |<-- ofs -->| > > I think 'ofs' points to the middle of the 'log_start' record. I think it > should maybe be moved a couple of spaces to make it clearer. > > > +/* Returns the next log entry or snapshot from 'raft', or NULL if there > are > > + * none left to read.. Stores the entry ID of the log entry in > '*eid'. Stores > > + * true in '*is_snapshot' if the returned data is a snapshot, false if > it is a > > + * log entry.*/ > > +const struct json * > > +raft_next_entry(struct raft *raft, struct uuid *eid, bool *is_snapshot) > > +{ > > There's an extra period after "read". > > > static void > > raft_handle_append_request(struct raft *raft, > > const struct raft_append_request *rq) > > { > > raft_handle_append_request__(raft, rq); > > } > > Is there a reason not to fold in raft_handle_append_request__()? There > don't seem to be any other callers to it. > > > +static void > > +raft_log_reconfiguration(struct raft *raft) > > +{ > > + /* Add the reconfiguration to the log. > > + * > > + * We ignore any */ > > Is this comment not finished? > > > +static void > > +raft_run_reconfigure(struct raft *raft) > > +{ > > ... > > + /* Remove a server, if one is scheduled for removal. */ > > + HMAP_FOR_EACH (s, hmap_node, &raft->servers) { > > + if (s->phase == RAFT_PHASE_REMOVE) { > > + hmap_remove(&raft->servers, &s->hmap_node); > > + raft->remove_server = s; > > Are we sure that 'raft->remove_server' will only be assigned once? I > think it might be worth asserting that it's null before assigning here. > > > +static void > > +raft_handle_remove_server_reply(struct raft *raft, > > + const struct raft_remove_server_reply > *rpc) > > +{ > > + if (rpc->success) { > > + VLOG_INFO("%04x: finished leaving cluster %04x", > > + uuid_prefix(&raft->sid, 4), uuid_prefix(&raft->cid, > 4)); > > Do you want to use SID_FMT and CID_FMT here? > > > +/* Returns true if 'raft' has grown enough that reducing the log to a > snapshot > > + * would be valuable, false otherwise. When this function returns > true, the > > + * client should consider using raft_store_snapshot() to reduce the log > storage > > + * requirements. */ > > +bool > > +raft_grew_lots(const struct raft *raft) > > +{ > > + return (!raft->joining > > + && !raft->leaving > > + && !raft->left > > + && !raft->failed > > + && raft->last_applied - raft->log_start >= 100 > > + && ovsdb_log_grew_lots(raft->log)); > > ovsdb_log_grew_lots() is based on the filesize. Raft entries are bigger. > It's probably not significant, but should this difference be taken that > into account? > > > +/* Replaces the log for 'raft', up to the last log entry read, by > > + * 'new_snapshot_data'. Returns NULL if successful, otherwise an error > that > > + * the caller must eventually free. */ > > +struct ovsdb_error * OVS_WARN_UNUSED_RESULT > > +raft_store_snapshot(struct raft *raft, const struct json > *new_snapshot_data) > > I think it would be worth mentioning that this function doesn't take > ownership of 'new_snapshot_data'. > > > + const struct raft_entry new_snapshot = { > > + .term = raft_get_term(raft, new_log_start - 1), > > + .data = CONST_CAST(struct json *, new_snapshot_data), > > + .eid = *raft_get_eid(raft, new_log_start - 1), > > + .servers = raft_servers_for_index(raft, new_log_start - 1), > > + }; > > + struct ovsdb_error *error = raft_save_snapshot(raft, new_log_start, > > + &new_snapshot); > > + if (error) { > > + return error; > > + } > > I think this may leak 'new_snapshot.servers'. If it doesn't under the > normal circumstance, I think it can under an error. > > > +static void > > +raft_unixctl_status(struct unixctl_conn *conn, > > + int argc OVS_UNUSED, const char *argv[], > > + void *aux OVS_UNUSED) > > +{ > > ... > > + ds_put_format(&s, "Cluster ID: "); > > + if (!uuid_is_zero(&raft->cid)) { > > + ds_put_format(&s, UUID_FMT"\n", UUID_ARGS(&raft->cid)); > > + } else { > > + ds_put_format(&s, "not yet known\n"); > > + } > > + ds_put_format(&s, "Server ID: "SID_FMT" ("UUID_FMT")\n", > > + SID_ARGS(&raft->sid), UUID_ARGS(&raft->sid)); > > Did you intentionally not use CID_FMT like you did for SID? > > > + ds_put_cstr(&s, "Servers:\n"); > > + struct raft_server *server; > > + HMAP_FOR_EACH (server, hmap_node, &raft->servers) { > > + ds_put_format(&s, " %s ("SID_FMT" at %s)", > > + server->nickname, > > + SID_ARGS(&server->sid), server->address); > > + if (uuid_equals(&server->sid, &raft->sid)) { > > + ds_put_cstr(&s, " (me)"); > > raft_put_sid() seems to use "self" to describe its own SID. Do you think > it's worth using the same terminology? > > > +static void > > +raft_unixctl_leave__(struct unixctl_conn *conn, struct raft *raft) > > +{ > > + if (raft_left(raft)) { > > + unixctl_command_reply(conn, NULL); > > ... > > + } else { > > + raft_leave(raft); > > + unixctl_command_reply(conn, NULL); > > + } > > +} > > It doesn't seem like this makes a distinction between starting to leave > and already having left. It seems like it would be helpful to provide a > reply that indicates the difference. > > > +static void > > +raft_unixctl_leave(struct unixctl_conn *conn, int argc, const char > *argv[], > > + void *aux OVS_UNUSED) > > +{ > > + bool force = argc > 2 && !strcmp(argv[1], "--force"); > > It doesn't appear that "--force" actually does anything. > > Some final random things: > > - It might be nice to add some tests that use "tcp" instead of > "unix" sockets. I thought I noticed some instances in the code that could > be problematic. > > - I think corner cases, such as 'raft->failed' getting set to true > aren't handled in the client code. > > - It's very difficult to determine whether an argument is consumed > or not, or whether the return value should be freed. Better documentation > (particularly for public functions) would help. > > - Should the copyright dates just be 2017 and 2018 for new files? > A lot of them reference older years. > > Thanks for implementing this important feature! > > Acked-by: Justin Pettit <[email protected]> > > --Justin > > > _______________________________________________ > dev mailing list > [email protected] > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
