On Wed, Nov 13, 2019 at 9:18 AM Dumitru Ceara <[email protected]> wrote:
>
> On Fri, Nov 8, 2019 at 8:27 PM Han Zhou <[email protected]> wrote:
> >
> > On Fri, Nov 8, 2019 at 11:22 AM Han Zhou <[email protected]> wrote:
> > >
> > > 1. storage data and the void *arg of init() break the engine node
> > >    data encapsulation.
> > > 2. engine_node_valid(&en_flow_output, engine_run_id) is not needed?
> > >    Should use storage to access instead?
> > > 3. the refactor of the engine is good, but it would be better as a
> > >    separate commit.
> > > 4. we can have a new interface, engine_get_data(), which returns data
> > >    if it is valid. We should never expose the data directly. We should
> > >    init the engine node with a dynamically allocated engine data
> > >    structure (and remember to free it during destroy).
> >
> > Oops! Please ignore the above part; it was a draft and I forgot to
> > delete it after editing the formal response. It is mostly redundant :-)
> > The real response starts here =>
> >
> > > Hi Dumitru,
> > >
> > > Sorry for the late response.
> > > On Mon, Nov 4, 2019 at 4:54 AM Dumitru Ceara <[email protected]> wrote:
> > > >
> > > > The incremental processing engine might stop a run before the
> > > > en_runtime_data node is processed. In such cases the ed_runtime_data
> > > > fields might contain pointers to already deleted SB records. For
> > > > example, if a port binding corresponding to a patch port is removed
> > > > from the SB database and the incremental processing engine aborts
> > > > before the en_runtime_data node is processed, then the corresponding
> > > > local_datapath hashtable entry in ed_runtime_data is stale and will
> > > > store a pointer to the already freed sbrec_port_binding record.
> > > >
> > > > This will cause invalid memory accesses in various places (e.g.,
> > > > pinctrl_run() -> prepare_ipv6_ras()).
> > > >
> > > > To fix the issue we need a way to track how each node was processed
> > > > during an engine run.
> > > > This commit transforms the 'changed' field in struct engine_node
> > > > into a 'state' field. Possible node states are:
> > > > - "New": the node is not yet initialized.
> > > > - "Stale": data in the node is not up to date with the DB.
> > > > - "Updated": data in the node is valid but was updated during
> > > >   the last run of the engine.
> > > > - "Valid": data in the node is valid and didn't change during
> > > >   the last run of the engine.
> > > > - "Aborted": during the last run, processing was aborted for
> > > >   this node.
> > > > - "Destroyed": the node was already cleaned up.
> > > >
> > > > We also add a separation between engine node data that can be
> > > > accessed at any time (regardless of whether the last engine run was
> > > > successful or not) and data that may be accessed only if the nodes
> > > > are up to date. This helps avoid custom "engine_node_valid" handlers
> > > > for different nodes.
> > > >
> > > > The commit also simplifies the logic of calling engine_run and
> > > > engine_need_run in order to reduce the number of external variables
> > > > required to track the result of the last engine execution.
> > > >
> > > > Functions that need to be called from the main loop and depend on
> > > > various data contents of the engine's nodes are now called only if
> > > > the data is up to date.
> > > >
> > > > CC: Han Zhou <[email protected]>
> > > > Fixes: ca278d98a4f5 ("ovn-controller: Initial use of incremental
> > > > engine - quiet mode.")
> > > > Signed-off-by: Dumitru Ceara <[email protected]>
> > > >
> > > > ---
> > > > v2: Address Han's comments:
> > > > - call engine_node_valid() in all the places where node local data
> > > >   is used.
> > > > - move "global" data outside the engine nodes. Make a clear
> > > >   separation between data that can be safely used at any time and
> > > >   data that can be used only when the engine run was successful.
> > >
> > > I am concerned with this kind of separation of *global* data, which
> > > breaks the data encapsulation of engine nodes and can easily result
> > > in hidden dependencies. As you know, the main purpose of the I-P
> > > engine is to force explicit dependencies to be exposed between
> > > different engine nodes and thus ensure the correctness (or at least
> > > help ensure it) of incremental processing.
> > >
> > > Here is my proposal to address the problem with better encapsulation.
> > >
> > > Firstly, let's avoid direct engine data access outside of the engine
> > > module. At engine node construction, instead of using a reference to
> > > a stack variable (such as struct ed_type_runtime_data
> > > ed_runtime_data), we can allocate the memory in the engine node's
> > > init() interface and free it in the cleanup() interface. This way,
> > > there will be no way to directly access engine data like
> > > &ed_runtime_data.local_datapaths.
> > >
> > > Secondly, let's add an engine module interface engine_get_data() to
> > > retrieve *and validate* data for an engine node:
> > >
> > > void *
> > > engine_get_data(struct engine_node *node, uint64_t run_id)
> > > {
> > >     if (engine_node_valid(node, run_id)) {
> > >         return node->data;
> > >     }
> > >     return NULL;
> > > }
> > >
> > > This should be used whenever an engine node's data needs to be
> > > accessed. (It may be even better to use the node name as input
> > > instead of the node structure, but it seems OK to me either way.)
> > >
> > > Now let's address the always-valid node problem. I was proposing an
> > > is_valid() interface for engine nodes.
> > > It can be NULL by default, but if a node xxx's data is always valid,
> > > it can be implemented like:
> > >
> > > static bool
> > > en_xxx_is_valid(struct engine_node *node)
> > > {
> > >     /* This node's data will always be valid. */
> > >     return true;
> > > }
> > >
> > > For the engine_node_valid() function, it can be changed slightly:
> > >
> > > bool
> > > engine_node_valid(struct engine_node *node, uint64_t run_id)
> > > {
> > >     if (node->is_valid) {
> > >         return node->is_valid(node);
> > >     }
> > >     return node->run_id == run_id &&
> > >            (node->state == EN_UPDATED || node->state == EN_VALID);
> > > }
> > >
> > > So if is_valid is not implemented, the default check is used;
> > > otherwise, whatever logic is_valid() implements is followed. This is
> > > flexible and may fit new cases: if a node's data validity depends on
> > > some other factors, it can always be customized in that node's
> > > is_valid() interface.
> > >
> > > This may result in a little more code, since we will have to use
> > > engine_get_data() to retrieve data from a node (and convert it to the
> > > node's data type) before accessing it, but it would look cleaner and
> > > safer, while keeping the dependencies well maintained. Does this
> > > sound plausible?
> > >
> > > Thanks for the other updates and refactors. Please see some minor
> > > comments inlined below.
> > >
> > > > -            if (en_runtime_data.changed) {
> > > > +            /* We need to make sure the en_flow_output node was
> > > > +             * properly computed before trying to install OF
> > > > +             * flows.
> > > > +             */
> > > > +            if (engine_node_valid(&en_flow_output,
> > > > +                                  engine_run_id)) {
> > >
> > > I guess your intention of using storage.flow_table here was to avoid
> > > the engine_node_valid check, so that this "if" can be avoided, right?
> > > (However, I suggest not using storage at all, as discussed above.)
> > >
> > > > +                ofctrl_put(&storage.flow_table,
> > > > +                           &storage.pending_ct_zones,
> > > > +                           sbrec_meter_table_get(ovnsb_idl_loop.idl),
> > > > +                           get_nb_cfg(sbrec_sb_global_table_get(
> > > > +                                          ovnsb_idl_loop.idl)),
> > > > +                           engine_node_changed(&en_flow_output,
> > > > +                                               engine_run_id));
> > > > +            }
> > > >
> > > >      if (engine_force_recompute) {
> > > > -        need_recompute = true;
> > > > -    } else {
> > > > -        for (size_t i = 0; i < node->n_inputs; i++) {
> > > > -            if (node->inputs[i].node->changed) {
> > > > -                need_compute = true;
> > > > -                if (!node->inputs[i].change_handler) {
> > > > -                    need_recompute = true;
> > > > -                    break;
> > > > -                }
> > > > +        engine_recompute(node, true, !engine_abort_recompute);
> > > > +        return;
> > > > +    }
> > > > +
> > > > +    /* If one of the inputs updated data then we need to recompute
> > > > +     * the current node too.
> > > > +     */
> > >
> > > The comment could be more accurate as: if any of the inputs updated
> > > data but there is no change_handler, then ...
> > >
> > > > +    for (size_t i = 0; i < node->n_inputs; i++) {
> > > > +        if (node->inputs[i].node->state == EN_UPDATED) {
> > > > +            need_compute = true;
> > > > +
> > > > +            /* Trigger a recompute if we don't have a change
> > > > +             * handler. */
> > > > +            if (!node->inputs[i].change_handler) {
> > > > +                engine_recompute(node, false,
> > > > +                                 !engine_abort_recompute);
> > > > +                return;
> > > >             }
> > > >         }
> > > >     }
> > >
> > > Thanks,
> > > Han
> >
> Hi Han,
>
> Thanks for your comments. Sorry I didn't reply earlier.
> I'll send a new version soon, improving the data encapsulation and
> the other issues you pointed out.
>
> Regards,
> Dumitru
Hi Han,

I just sent v3 (turned into a series to make it easier to follow the
changes) where I think I addressed everything we discussed until now:

https://patchwork.ozlabs.org/project/openvswitch/list/?series=142890

I decided to implement data access a bit differently than you suggested
(through another pointer field in the engine node), but that's mainly
just because it makes the code that accesses the data a bit easier to
write. The internal implementation is actually the same (through an
is_valid() callback).

Thanks,
Dumitru
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
