The incremental processing engine might stop a run before the
en_runtime_data node is processed. In such cases the ed_runtime_data
fields might contain pointers to already deleted SB records. For
example, if a port binding corresponding to a patch port is removed from
the SB database and the incremental processing engine aborts before the
en_runtime_data node is processed then the corresponding local_datapath
hashtable entry in ed_runtime_data is stale and will store a pointer to
the already freed sbrec_port_binding record.

This will cause invalid memory accesses in various places (e.g.,
pinctrl_run() -> prepare_ipv6_ras()).

This series fixes the issue (patch3) but to make the fix generic and
easier to debug it first refactors the incremental processing engine in
the following way:
- patch1: remove recursion from the I-P engine code. Introduce node states
  to track validity of node data.
- patch2: move ct-zones to its own engine node in order to remove dependencies
  on other runtime data.

CC: Han Zhou <hz...@ovn.org>
Fixes: ca278d98a4f5 ("ovn-controller: Initial use of incremental engine - quiet 
mode.")
Signed-off-by: Dumitru Ceara <dce...@redhat.com>

Dumitru Ceara (3):
      ovn-controller: Add per node states to I-P engine.
      ovn-controller: Add separate I-P engine node for processing ct-zones.
      ovn-controller: Fix use of dangling pointers in I-P runtime_data.


 controller/ovn-controller.c |  410 ++++++++++++++++++++++++-------------------
 lib/inc-proc-eng.c          |  267 +++++++++++++++++++++-------
 lib/inc-proc-eng.h          |  102 ++++++++---
 3 files changed, 501 insertions(+), 278 deletions(-)


---
v7:
- Remove first patch from series as it was already applied.
- Address Han's comments:
    - Reintroduce "engine_run_done" to prevent unnecessary engine runs when
      ofctrl_can_put() is false. However, to make it easier to follow the
      code in ovn-controller.c, the variable is moved in inc-proc-eng.c and
      renamed to engine_run_aborted. This also simplifies the
      engine_aborted() function.
    - Change engine_run's argument from 'abort_on_recompute' to
      'recompute_allowed' to avoid negated logic in the code.
    - Stop an engine_run() immediately when a node aborts. This also
      has implications on engine_has_run() because now the ABORTED
      state doesn't get propagated so in order to determine if a
      run was completed we need to check that all nodes of the engine
      are in state EN_STALE. Also, engine_need_run() should be
      called only if engine_has_run() returns false because engine_need_run()
      calls the 'run' method on the leaf nodes, potentially changing their
      states.
    - Simplify return from function engine_compute() when a change_handler
      fails to incrementally update a node.
    - Remove check of nodes input states. It's guaranteed by the topological
      sort and the fact that we stop processing nodes when one aborts.
v6:
- Address Han's comments:
    - Call engine_recompute only once for a node if at least one of its input
      nodes' change handler returns false.
    - Simplify the incremental engine API and internally store the
      topologically sorted engine nodes.
    - Change 'engine_abort_recompute' from global variable to argument to be
      passed to engine_run(). It's only relevant in one run context anyway
      as we used to reset it before every call to engine_run().
    - engine_init_run() and engine_has_run() now check all the nodes in the
      engine instead of a single one.
    - Change 'engine_node_valid()' to call the node's 'is_valid' method only
      if the node is not in state EN_UPDATED or EN_VALID.
v5:
- Rebase.
v4:
- Address Numan's comments:
    - Fix engine_need_run().
v3:
- split the change in series.
- Address Han's comments:
    - fix the data encapsulation issue.
    - add is_valid method to nodes.
    - add internal_data/data fields to nodes as it makes it easier to write
      the code instead of adding an "engine_get_data()" API.
v2: Address Han's comments:
- call engine_node_valid() in all the places where node local data is
  used.
- move out "global" data outside the engine nodes. Make a clear
  separation between data that can be safely used at any time and data
  that can be used only when the engine run was successful.
- add a debug log for iterations when the engine didn't run.
- refactor a bit more the incremental engine code.

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to