The incremental processing engine might stop a run before the en_runtime_data node is processed. In such cases the ed_runtime_data fields might contain pointers to already deleted SB records. For example, if a port binding corresponding to a patch port is removed from the SB database and the incremental processing engine aborts before the en_runtime_data node is processed then the corresponding local_datapath hashtable entry in ed_runtime_data is stale and will store a pointer to the already freed sbrec_port_binding record.
This will cause invalid memory accesses in various places (e.g., pinctrl_run() -> prepare_ipv6_ras()). This series fixes the issue (patch3) but to make the fix generic and easier to debug it first refactors the incremental processing engine in the following way: - patch1: remove recursion from the I-P engine code. Introduce node states to track validity of node data. - patch2: move ct-zones to its own engine node in order to remove dependencies on other runtime data. CC: Han Zhou <hz...@ovn.org> Fixes: ca278d98a4f5 ("ovn-controller: Initial use of incremental engine - quiet mode.") Signed-off-by: Dumitru Ceara <dce...@redhat.com> Dumitru Ceara (3): ovn-controller: Add per node states to I-P engine. ovn-controller: Add separate I-P engine node for processing ct-zones. ovn-controller: Fix use of dangling pointers in I-P runtime_data. controller/ovn-controller.c | 410 ++++++++++++++++++++++++------------------- lib/inc-proc-eng.c | 267 +++++++++++++++++++++------- lib/inc-proc-eng.h | 102 ++++++++--- 3 files changed, 501 insertions(+), 278 deletions(-) --- v7: - Remove first patch from series as it was already applied. - Address Han's comments: - Reintroduce "engine_run_done" to prevent unnecessary engine runs when ofctrl_can_put() is false. However, to make it easier to follow the code in ovn-controller.c, the variable is moved in inc-proc-eng.c and renamed to engine_run_aborted. This also simplifies the engine_aborted() function. - Change engine_run's argument from 'abort_on_recompute' to 'recompute_allowed' to avoid negated logic in the code. - Stop an engine_run() immediately when a node aborts. This also has implications on engine_has_run() because now the ABORTED state doesn't get propagated so in order to determine if a run was completed we need to check that all nodes of the engine are in state EN_STALE. Also, engine_need_run() should be called only if engine_has_run() returns false because engine_need_run() calls the 'run' method on the leaf nodes, potentially changing their states. - Simplify return from function engine_compute() when a change_handler fails to incrementally update a node. - Remove check of nodes input states. It's guaranteed by the topological sort and the fact that we stop processing nodes when one aborts. v6: - Address Han's comments: - Call engine_recompute only once for a node if at least one of its input nodes' change handler returns false. - Simplify the incremental engine API and internally store the topologically sorted engine nodes. - Change 'engine_abort_recompute' from global variable to argument to be passed to engine_run(). It's only relevant in one run context anyway as we used to reset it before every call to engine_run(). - engine_init_run() and engine_has_run() now check all the nodes in the engine instead of a single one. - Change 'engine_node_valid()' to call the node's 'is_valid' method only if the node is not in state EN_UPDATED or EN_VALID. v5: - Rebase. v4: - Address Numan's comments: - Fix engine_need_run(). v3: - split the change in series. - Address Han's comments: - fix the data encapsulation issue. - add is_valid method to nodes. - add internal_data/data fields to nodes as it makes it easier to write the code instead of adding an "engine_get_data()" API. v2: Address Han's comments: - call engine_node_valid() in all the places where node local data is used. - move out "global" data outside the engine nodes. Make a clear separation between data that can be safely used at any time and data that can be used only when the engine run was successful. - add a debug log for iterations when the engine didn't run. - refactor a bit more the incremental engine code. _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev