On Fri, Nov 8, 2019 at 8:27 PM Han Zhou <[email protected]> wrote:
>
> On Fri, Nov 8, 2019 at 11:22 AM Han Zhou <[email protected]> wrote:
> >
> > 1. Storage data and the void *arg of init() break the engine node
> >    data encapsulation.
> > 2. Is engine_node_valid(&en_flow_output, engine_run_id) needed?
> >    Should storage be used for access instead?
> > 3. The refactor of the engine is good, but it would be better as a
> >    separate commit.
> > 4. We can have a new interface, engine_get_data(), which returns the
> >    data if it is valid. We should never expose the data directly. We
> >    should init the engine node with a dynamically allocated engine
> >    data structure (and remember to free it during destroy).
>
> Oops! Please ignore the above part; it was a draft and I forgot to
> delete it after editing the formal response. It is mostly redundant :-)
> Real response starts here =>
>
> > Hi Dumitru,
> >
> > Sorry for the late response.
> >
> > On Mon, Nov 4, 2019 at 4:54 AM Dumitru Ceara <[email protected]> wrote:
> > >
> > > The incremental processing engine might stop a run before the
> > > en_runtime_data node is processed. In such cases the ed_runtime_data
> > > fields might contain pointers to already deleted SB records. For
> > > example, if a port binding corresponding to a patch port is removed
> > > from the SB database and the incremental processing engine aborts
> > > before the en_runtime_data node is processed, then the corresponding
> > > local_datapath hashtable entry in ed_runtime_data is stale and will
> > > store a pointer to the already freed sbrec_port_binding record.
> > >
> > > This will cause invalid memory accesses in various places (e.g.,
> > > pinctrl_run() -> prepare_ipv6_ras()).
> > >
> > > To fix the issue we need a way to track how each node was processed
> > > during an engine run. This commit transforms the 'changed' field in
> > > struct engine_node into a 'state' field. Possible node states are:
> > > - "New": the node is not yet initialized.
> > > - "Stale": data in the node is not up to date with the DB.
> > > - "Updated": data in the node is valid but was updated during
> > >   the last run of the engine.
> > > - "Valid": data in the node is valid and didn't change during
> > >   the last run of the engine.
> > > - "Aborted": during the last run, processing was aborted for
> > >   this node.
> > > - "Destroyed": the node was already cleaned up.
> > >
> > > We also add a separation between engine node data that can be
> > > accessed at any time (regardless of whether the last engine run was
> > > successful or not) and data that may be accessed only if the nodes
> > > are up to date. This helps avoid custom "engine_node_valid" handlers
> > > for different nodes.
> > >
> > > The commit also simplifies the logic of calling engine_run and
> > > engine_need_run in order to reduce the number of external variables
> > > required to track the result of the last engine execution.
> > >
> > > Functions that need to be called from the main loop and depend on
> > > various data contents of the engine's nodes are now called only if
> > > the data is up to date.
> > >
> > > CC: Han Zhou <[email protected]>
> > > Fixes: ca278d98a4f5 ("ovn-controller: Initial use of incremental engine - quiet mode.")
> > > Signed-off-by: Dumitru Ceara <[email protected]>
> > >
> > > ---
> > > v2: Address Han's comments:
> > > - call engine_node_valid() in all the places where node local data
> > >   is used.
> > > - move "global" data outside the engine nodes. Make a clear
> > >   separation between data that can be safely used at any time and
> > >   data that can be used only when the engine run was successful.
> >
> > I am concerned about this kind of separation of *global* data, which
> > breaks the data encapsulation of the engine nodes and can easily
> > result in hidden dependencies.
> > As you know, the main purpose of the I-P engine is to force explicit
> > dependencies between different engine nodes, which helps ensure the
> > correctness of incremental processing.
> >
> > Here is my proposal to address the problem with better encapsulation.
> >
> > Firstly, let's avoid direct engine data access outside of the engine
> > module. At engine node construction, instead of taking a reference to
> > a stack variable (such as struct ed_type_runtime_data
> > ed_runtime_data), we can allocate the memory in the engine node's
> > init() interface and free it in the cleanup() interface. This way,
> > there will be no way to directly access engine data like
> > &ed_runtime_data.local_datapaths.
> >
> > Secondly, let's add an engine module interface, engine_get_data(), to
> > retrieve *and validate* data for an engine node:
> >
> > void *
> > engine_get_data(struct engine_node *node, uint64_t run_id)
> > {
> >     if (engine_node_valid(node, run_id)) {
> >         return node->data;
> >     }
> >     return NULL;
> > }
> >
> > This should be used whenever an engine node's data needs to be
> > accessed. (It may be even better to use the node name as input
> > instead of the node structure, but either way seems OK to me.)
> >
> > Now let's address the always-valid node problem. I was proposing an
> > is_valid() interface for the engine node.
> > It can be NULL by default, but if a node xxx's data is always valid,
> > it can be implemented like:
> >
> > static bool
> > en_xxx_is_valid(struct engine_node *node)
> > {
> >     /* This node's data will always be valid. */
> >     return true;
> > }
> >
> > The engine_node_valid() function can then be changed slightly:
> >
> > bool
> > engine_node_valid(struct engine_node *node, uint64_t run_id)
> > {
> >     if (node->is_valid) {
> >         return node->is_valid(node);
> >     }
> >     return node->run_id == run_id &&
> >            (node->state == EN_UPDATED || node->state == EN_VALID);
> > }
> >
> > So if is_valid is not implemented, the default check applies;
> > otherwise, whatever logic is_valid() implements is followed. This is
> > flexible and may fit new cases: if a node's data validity depends on
> > some other factors, it can always be customized in that node's
> > is_valid() interface.
> >
> > This may result in a little more code, since we will have to use
> > engine_get_data() to retrieve data from a node (and cast it to that
> > node's data type) before accessing it, but it would look cleaner and
> > safer while keeping the dependencies well maintained. Does this
> > sound plausible?
> >
> > Thanks for the other updates and refactors. Please see some minor
> > comments inline below.
> >
> > > -            if (en_runtime_data.changed) {
> > > +            /* We need to make sure the en_flow_output node was
> > > +             * properly computed before trying to install OF
> > > +             * flows.
> > > +             */
> > > +            if (engine_node_valid(&en_flow_output,
> > > +                                  engine_run_id)) {
> >
> > I guess your intention of using storage.flow_table here was to avoid
> > the engine_node_valid check, so that this "if" can be avoided, right?
> > (However, I suggest not using storage at all, as discussed above.)
> >
> > > +                ofctrl_put(&storage.flow_table,
> > > +                           &storage.pending_ct_zones,
> > > +                           sbrec_meter_table_get(ovnsb_idl_loop.idl),
> > > +                           get_nb_cfg(sbrec_sb_global_table_get(
> > > +                               ovnsb_idl_loop.idl)),
> > > +                           engine_node_changed(&en_flow_output,
> > > +                                               engine_run_id));
> > > +            }
> > >
> > >     if (engine_force_recompute) {
> > > -        need_recompute = true;
> > > -    } else {
> > > -        for (size_t i = 0; i < node->n_inputs; i++) {
> > > -            if (node->inputs[i].node->changed) {
> > > -                need_compute = true;
> > > -                if (!node->inputs[i].change_handler) {
> > > -                    need_recompute = true;
> > > -                    break;
> > > -                }
> > > +        engine_recompute(node, true, !engine_abort_recompute);
> > > +        return;
> > > +    }
> > > +
> > > +    /* If one of the inputs updated data then we need to recompute
> > > +     * the current node too.
> > > +     */
> >
> > The comment could be more accurate as: "If any of the inputs updated
> > data but there is no change_handler, then ..."
> >
> > > +    for (size_t i = 0; i < node->n_inputs; i++) {
> > > +        if (node->inputs[i].node->state == EN_UPDATED) {
> > > +            need_compute = true;
> > > +
> > > +            /* Trigger a recompute if we don't have a change
> > > +             * handler. */
> > > +            if (!node->inputs[i].change_handler) {
> > > +                engine_recompute(node, false, !engine_abort_recompute);
> > > +                return;
> > >             }
> > >         }
> > >     }
> >
> > Thanks,
> > Han
Hi Han,

Thanks for your comments. Sorry I didn't reply earlier. I'll send a new
version soon improving the data encapsulation and the other issues you
pointed out.

Regards,
Dumitru

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
