On Thu, May 23, 2024 at 6:47 AM Roi Dayan via dev
<[email protected]> wrote:
>
> It is observed in some environments that there are much more ukeys than
> actual DP flows. For example:
>
> $ ovs-appctl upcall/show
> system@ovs-system:
> flows : (current 7) (avg 6) (max 117) (limit 2125)
> offloaded flows : 525
> dump duration : 1063ms
> ufid enabled : true
>
> 23: (keys 3612)
> 24: (keys 3625)
> 25: (keys 3485)
>
> The revalidator threads are busy revalidating the stale ukeys leading to
> high CPU and long dump duration.
>
> There are some possible situations that may result in stale ukeys that
> have no corresponding DP flows.
>
> In revalidator, push_dp_ops() doesn't check error if the op type is not
> DEL. It is possible that a PUT(MODIFY) fails, especially for tc offload
> case, where the old flow is deleted first and then the new one is
> created. If the creation fails, the ukey will be stale (no corresponding
> DP flow). This patch adds a warning in such case.
>
> Another possible scenario is in handle_upcalls() if a PUT operation did
> not succeed and op->error attribute was not set correctly it can lead to
> stale ukey in operational state.
>
> This patch adds checks in the sweep phase for such ukeys and move them
> to DELETE so that they can be cleared eventually.
>
> Co-authored-by: Han Zhou <[email protected]>
> Signed-off-by: Han Zhou <[email protected]>
> Signed-off-by: Roi Dayan <[email protected]>
> ---
> ofproto/ofproto-dpif-upcall.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index 83609ec62b63..e9520ebdf910 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -57,6 +57,7 @@ COVERAGE_DEFINE(dumped_inconsistent_flow);
> COVERAGE_DEFINE(dumped_new_flow);
> COVERAGE_DEFINE(handler_duplicate_upcall);
> COVERAGE_DEFINE(revalidate_missed_dp_flow);
> +COVERAGE_DEFINE(revalidate_missed_dp_flow_del);
> COVERAGE_DEFINE(ukey_dp_change);
> COVERAGE_DEFINE(ukey_invalid_stat_reset);
> COVERAGE_DEFINE(ukey_replace_contention);
> @@ -278,6 +279,7 @@ enum flow_del_reason {
> FDR_BAD_ODP_FIT, /* Bad ODP flow fit. */
> FDR_FLOW_IDLE, /* Flow idle timeout. */
> FDR_FLOW_LIMIT, /* Kill all flows condition reached. */
> + FDR_FLOW_STALE, /* Flow stale detected. */
> FDR_FLOW_WILDCARDED, /* Flow needs a narrower wildcard mask. */
> FDR_NO_OFPROTO, /* Bridge not found. */
> FDR_PURGE, /* User requested flow deletion. */
> @@ -2557,6 +2559,10 @@ push_dp_ops(struct udpif *udpif, struct ukey_op *ops,
> size_t n_ops)
>
> if (op->dop.type != DPIF_OP_FLOW_DEL) {
> /* Only deleted flows need their stats pushed. */
> + if (op->dop.error) {
> + VLOG_WARN_RL(&rl, "push_dp_ops: error %d in op type %d, ukey
> "
> + "%p", op->dop.error, op->dop.type, op->ukey);
> + }
> continue;
> }
>
> @@ -3027,6 +3033,15 @@ revalidator_sweep__(struct revalidator *revalidator,
> bool purge)
> del_reason = purge ? FDR_PURGE : FDR_UPDATE_FAIL;
> } else if (!seq_mismatch) {
> result = UKEY_KEEP;
> + } else if (!ukey->stats.used &&
Would it be possible for stats.used to be set but the dp flow to be
deleted? For example, if a flow is offloaded to TC, but something
external to OVS clears it?
Thanks,
Mike
> + udpif_flow_time_delta(udpif, ukey) * 1000 >
> + ofproto_max_idle) {
> + COVERAGE_INC(revalidate_missed_dp_flow_del);
> + VLOG_WARN_RL(&rl, "revalidator_sweep__: Remove stale
> ukey "
> + "%p delta %llus", ukey,
> + udpif_flow_time_delta(udpif, ukey));
> + result = UKEY_DELETE;
> + del_reason = FDR_FLOW_STALE;
> } else {
> struct dpif_flow_stats stats;
> COVERAGE_INC(revalidate_missed_dp_flow);
> --
> 2.21.0
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev