On 10/8/24 06:30, Ales Musil wrote:
On Fri, Sep 27, 2024 at 7:15 PM Mark Michelson <[email protected]> wrote:
When ACL tiers were introduced, the code kept track of the highest ACL
tier so that when iterating through ACL tiers, we would not attempt to
advance the current tier past the highest configured tier.

Unfortunately, keeping track of a single max ACL tier doesn't work when
ACLs are evaluated in three separate places. ACLs can be evaluated on
ingress before load balancing, on ingress after load balancing, and on
egress. Keeping track of only a single max ACL tier means that we could
perform superfluous checks if one stage of ACLs has a higher max tier
than the others. As an example, if ingress pre-load balancing ACLs have
a maximum tier of 1, and egress ACLs have a maximum tier of 2, then for
all stages of ACLs we will evaluate tiers 0, 1, and 2, even though only
one stage of ACLs uses tier 2.

From a pure functionality standpoint, this doesn't cause any issues.
Even if we advance the tier past the highest configured value, it
results in a no-op and the same net result happens.

However, the addition of sampling to ACLs has caused an unwanted side
effect. In the example scenario above, let's say the tier 1 ACL in the
ingress pre-load balancing stage evaluates to "pass". After the
evaluation, we send a sample for the "pass" result. We then advance
the tier to 2, then move back to ACL evaluation. There are no tier 2
ACLs, so we move on to the sampling stage again. We then send a second
sample for the previous "pass" result from tier 1. The result is
confusing since we've sent two samples for the same ACL evaluation.

To remedy this, we now track the max ACL tier in each of the stages
where ACLs are evaluated. Now there are no superfluous ACL evaluations
and no superfluous samples sent either.
Reported-at: https://issues.redhat.com/browse/FDP-760
Signed-off-by: Mark Michelson <[email protected]>
---
Hi Mark,

thank you for the patch. I'm not sure if this fixes the originally
reported issue. For that issue we should generate generic sample flows
that also match on tier. That should result in samples being generated
for the appropriate tier.

Either way, we should have a test case to mimic the behavior described
in FDP-760. I have a few additional comments down below.

I think your suggestion would also fix the issue, but my method does as
well. If you check the ovn-trace output on the issue, you can see that
the only reason that ACLs get re-evaluated for tier 2 is that the
ACL_ACTION stage sends control back to the ACL_EVAL stage since it
mistakenly thinks there's another tier to evaluate. By not sending
control back to ACL_EVAL after tier 1, there is no chance of sending a
bogus sample since we won't reach the ACL_SAMPLE stage again. That being
said, I can incorporate your suggestion into v2 of the patch since there
may be other weird corner cases that could trigger phantom samples
besides a bad max ACL tier count.

You're correct about the test case. I had taken the shortcut of making
sure that evaluation wouldn't reach a phantom tier just by confirming
logical flows. But having a system test to ensure the proper behavior is
a good call.
 northd/en-ls-stateful.c | 48 ++++++++++++++++++++++++++++++++++-------
 northd/en-ls-stateful.h |  8 ++++++-
 northd/northd.c         | 47 ++++++++++++++++++++++++++++++----------
 tests/ovn-northd.at     | 34 +++++++++++++++++++++++++++++
 4 files changed, 117 insertions(+), 20 deletions(-)
diff --git a/northd/en-ls-stateful.c b/northd/en-ls-stateful.c
index 44f74ea08..eb751d8dd 100644
--- a/northd/en-ls-stateful.c
+++ b/northd/en-ls-stateful.c
@@ -204,16 +204,18 @@ ls_stateful_port_group_handler(struct engine_node *node, void *data_)
ovn_datapaths_find_by_index(input_data.ls_datapaths,
ls_stateful_rec->ls_index);
bool had_stateful_acl = ls_stateful_rec->has_stateful_acl;
- uint64_t max_acl_tier = ls_stateful_rec->max_acl_tier;
+ struct acl_tier old_max = ls_stateful_rec->max_acl_tier;
bool had_acls = ls_stateful_rec->has_acls;
bool modified = false;
ls_stateful_record_reinit(ls_stateful_rec, od, ls_pg,
input_data.ls_port_groups);
+ struct acl_tier new_max = ls_stateful_rec->max_acl_tier;
+
if ((had_stateful_acl != ls_stateful_rec->has_stateful_acl)
|| (had_acls != ls_stateful_rec->has_acls)
- || max_acl_tier != ls_stateful_rec->max_acl_tier) {
+ || memcmp(&old_max, &new_max, sizeof(old_max))) {

We have to be very careful about padding if we want to use memcmp.
Currently the struct is fine, but maybe we should mention it in the
comment?

Sure, I can add a comment.
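
Something along these lines, perhaps (just a sketch of the wording):

/* Note: instances of this struct are compared with memcmp() in
 * ls_stateful_port_group_handler(), so it must stay free of padding.
 * All members are currently uint64_t, which keeps it padding-free. */
struct acl_tier {
    uint64_t ingress_pre_lb;
    uint64_t ingress_post_lb;
    uint64_t egress;
};
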
modified = true;
}
@@ -365,7 +367,8 @@ ls_stateful_record_set_acl_flags(struct ls_stateful_record *ls_stateful_rec,
                                 const struct ls_port_group_table *ls_pgs)
{
ls_stateful_rec->has_stateful_acl = false;
- ls_stateful_rec->max_acl_tier = 0;
+ memset(&ls_stateful_rec->max_acl_tier, 0,
+ sizeof ls_stateful_rec->max_acl_tier);
ls_stateful_rec->has_acls = false;
if (ls_stateful_record_set_acl_flags_(ls_stateful_rec,
od->nbs->acls,
@@ -391,6 +394,36 @@ ls_stateful_record_set_acl_flags(struct ls_stateful_record *ls_stateful_rec,
}
}
+static void
+update_ls_max_acl_tier(struct ls_stateful_record *ls_stateful_rec,
+ const struct nbrec_acl *acl)
+{
+ if (!acl->tier) {
+ return;
+ }
+
+ if (!strcmp(acl->direction, "from-lport")) {
+ if (smap_get_bool(&acl->options, "apply-after-lb", false)) {
+ if (acl->tier > ls_stateful_rec->max_acl_tier.ingress_post_lb) {
+ ls_stateful_rec->max_acl_tier.ingress_post_lb = acl->tier;
+ }

We could use MAX(acl->tier, ls_stateful_rec->max_acl_tier.ingress_post_lb)
instead of the if statement; the same applies down below.

Thanks for the idea, I'm not sure why I didn't think of it :)
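
For v2 that would turn each branch into a one-liner, something like
this (a sketch, using the MAX() macro from OVS's lib/util.h):

static void
update_ls_max_acl_tier(struct ls_stateful_record *ls_stateful_rec,
                       const struct nbrec_acl *acl)
{
    struct acl_tier *max = &ls_stateful_rec->max_acl_tier;

    if (!acl->tier) {
        return;
    }

    if (!strcmp(acl->direction, "from-lport")) {
        if (smap_get_bool(&acl->options, "apply-after-lb", false)) {
            max->ingress_post_lb = MAX(max->ingress_post_lb, acl->tier);
        } else {
            max->ingress_pre_lb = MAX(max->ingress_pre_lb, acl->tier);
        }
    } else {
        max->egress = MAX(max->egress, acl->tier);
    }
}
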
+ } else if (acl->tier > ls_stateful_rec->max_acl_tier.ingress_pre_lb) {
+ ls_stateful_rec->max_acl_tier.ingress_pre_lb = acl->tier;
+ }
+ } else if (acl->tier > ls_stateful_rec->max_acl_tier.egress) {
+ ls_stateful_rec->max_acl_tier.egress = acl->tier;
+ }
+}
+
+static bool
+ls_acl_tiers_are_maxed_out(struct acl_tier *acl_tier,
+ uint64_t max_allowed_acl_tier)
+{
+ return acl_tier->ingress_post_lb == max_allowed_acl_tier &&
+ acl_tier->ingress_pre_lb == max_allowed_acl_tier &&
+ acl_tier->egress == max_allowed_acl_tier;
+}
+
static bool
ls_stateful_record_set_acl_flags_(struct ls_stateful_record *ls_stateful_rec,
                                  struct nbrec_acl **acls,
@@ -408,16 +441,15 @@ ls_stateful_record_set_acl_flags_(struct ls_stateful_record *ls_stateful_rec,
ls_stateful_rec->has_acls = true;
for (size_t i = 0; i < n_acls; i++) {
const struct nbrec_acl *acl = acls[i];
- if (acl->tier > ls_stateful_rec->max_acl_tier) {
- ls_stateful_rec->max_acl_tier = acl->tier;
- }
+ update_ls_max_acl_tier(ls_stateful_rec, acl);
if (!ls_stateful_rec->has_stateful_acl
&& !strcmp(acl->action, "allow-related")) {
ls_stateful_rec->has_stateful_acl = true;
}
if (ls_stateful_rec->has_stateful_acl &&
- ls_stateful_rec->max_acl_tier ==
- nbrec_acl_col_tier.type.value.integer.max) {
+ ls_acl_tiers_are_maxed_out(
+ &ls_stateful_rec->max_acl_tier,
+ nbrec_acl_col_tier.type.value.integer.max)) {
return true;
}
}
diff --git a/northd/en-ls-stateful.h b/northd/en-ls-stateful.h
index eae4b08e2..18a7398a6 100644
--- a/northd/en-ls-stateful.h
+++ b/northd/en-ls-stateful.h
@@ -33,6 +33,12 @@
struct lflow_ref;
+struct acl_tier {
+ uint64_t ingress_pre_lb;
+ uint64_t ingress_post_lb;
+ uint64_t egress;
+};
+
struct ls_stateful_record {
struct hmap_node key_node;
@@ -46,7 +52,7 @@ struct ls_stateful_record {
bool has_stateful_acl;
bool has_lb_vip;
bool has_acls;
- uint64_t max_acl_tier;
+ struct acl_tier max_acl_tier;
/* 'lflow_ref' is used to reference logical flows generated for
* this ls_stateful record.
diff --git a/northd/northd.c b/northd/northd.c
index a267cd5f8..9d3edee32 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -7321,28 +7321,34 @@ build_acl_action_lflows(const struct ls_stateful_record *ls_stateful_rec,
S_SWITCH_OUT_ACL_EVAL,
};
+ uint64_t max_acl_tiers[] = {
+ ls_stateful_rec->max_acl_tier.ingress_pre_lb,
+ ls_stateful_rec->max_acl_tier.ingress_post_lb,
+ ls_stateful_rec->max_acl_tier.egress,
+ };
+
ds_clear(actions);
ds_put_cstr(actions, REGBIT_ACL_VERDICT_ALLOW " = 0; "
REGBIT_ACL_VERDICT_DROP " = 0; "
REGBIT_ACL_VERDICT_REJECT " = 0; ");
- if (ls_stateful_rec->max_acl_tier) {
- ds_put_cstr(actions, REG_ACL_TIER " = 0; ");
- }
size_t verdict_len = actions->length;
-
for (size_t i = 0; i < ARRAY_SIZE(stages); i++) {
enum ovn_stage stage = stages[i];
+ if (max_acl_tiers[i]) {
+ ds_put_cstr(actions, REG_ACL_TIER " = 0; ");
+ }
+ size_t verdict_tier_len = actions->length;
if (!ls_stateful_rec->has_acls) {
ovn_lflow_add(lflows, od, stage, 0, "1", "next;",
lflow_ref);
continue;
}
- ds_truncate(actions, verdict_len);
+ ds_truncate(actions, verdict_tier_len);
ds_put_cstr(actions, "next;");
ovn_lflow_add(lflows, od, stage, 1000,
REGBIT_ACL_VERDICT_ALLOW " == 1",
ds_cstr(actions),
lflow_ref);
- ds_truncate(actions, verdict_len);
+ ds_truncate(actions, verdict_tier_len);
ds_put_cstr(actions, debug_implicit_drop_action());
ovn_lflow_add(lflows, od, stage, 1000,
REGBIT_ACL_VERDICT_DROP " == 1",
@@ -7350,7 +7356,7 @@ build_acl_action_lflows(const struct ls_stateful_record *ls_stateful_rec,
lflow_ref);
bool ingress = ovn_stage_get_pipeline(stage) == P_IN;
- ds_truncate(actions, verdict_len);
+ ds_truncate(actions, verdict_tier_len);
build_acl_reject_action(actions, ingress);
ovn_lflow_metered(lflows, od, stage, 1000,
@@ -7358,12 +7364,12 @@ build_acl_action_lflows(const struct ls_stateful_record *ls_stateful_rec,
copp_meter_get(COPP_REJECT, od->nbs->copp,
meter_groups), lflow_ref);
- ds_truncate(actions, verdict_len);
+ ds_truncate(actions, verdict_tier_len);
ds_put_cstr(actions, default_acl_action);
ovn_lflow_add(lflows, od, stage, 0, "1", ds_cstr(actions),
lflow_ref);
struct ds tier_actions = DS_EMPTY_INITIALIZER;
- for (size_t j = 0; j < ls_stateful_rec->max_acl_tier; j++) {
+ for (size_t j = 0; j < max_acl_tiers[i]; j++) {
ds_clear(match);
ds_put_format(match, REG_ACL_TIER " == %"PRIuSIZE, j);
ds_clear(&tier_actions);
@@ -7375,6 +7381,7 @@ build_acl_action_lflows(const struct ls_stateful_record *ls_stateful_rec,
ds_cstr(&tier_actions), lflow_ref);
}
ds_destroy(&tier_actions);
+ ds_truncate(actions, verdict_len);
}
}
@@ -7443,6 +7450,21 @@ build_acl_log_related_flows(const struct ovn_datapath *od,
&acl->header_, lflow_ref);
}
+static uint64_t
+choose_max_acl_tier(const struct ls_stateful_record *ls_stateful_rec,
+ const struct nbrec_acl *acl)
+{
+ if (!strcmp(acl->direction, "from-lport")) {
+ if (smap_get_bool(&acl->options, "apply-after-lb", false)) {
+ return ls_stateful_rec->max_acl_tier.ingress_post_lb;
+ } else {
+ return ls_stateful_rec->max_acl_tier.ingress_pre_lb;
+ }
+ } else {
+ return ls_stateful_rec->max_acl_tier.egress;
+ }
+}

It would probably fit a separate patch, but most of the NAT-related
functions are parsing the type/stage over and over again. It would help
to have an enum (maybe repurposing enum acl_observation_stage) and do
it only once in the loop. This could also lead to replacing struct
acl_tier with an array. WDYT?

I had an idea when writing this patch that at the beginning of
build_acls(), we could loop through all ACLs and sort them into ingress,
ingress after lb, and egress. This isn't the exact structure I had in
mind, but it would be something like this:
struct acl_stage {
enum ovn_stage eval_stage;
enum ovn_stage sample_stage;
enum ovn_stage action_stage;
enum ovn_stage reject_stage;
uint32_t num_tiers;
struct nbrec_acl **acls;
};
Then we would declare three structs and fill them in with the
appropriate details:
struct acl_stage ingress_pre_lb;
struct acl_stage ingress_post_lb;
struct acl_stage egress;
This way, we are not constantly re-evaluating ingress vs egress, pre-lb
vs post-lb, etc. We do it once, sort the ACLs into their acl_stage
structs, and pass them appropriately to generate the logical flows. And
if (god forbid) we add a new place where ACLs are evaluated in a logical
switch, it would just be a matter of creating a new acl_stage struct and
filling its details in to get the logical flows written. It would make
consider_acl() much more streamlined since it would reference fields on
struct acl_stage instead of using a bunch of if statements to determine
the logic.
I ended up not doing this because this felt more like a refactor than a
bug fix. However, I agree that the current processing is wasteful and
should be addressed.
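
Very roughly, the sorting pass at the top of build_acls() might look
like this (purely illustrative; sort_acls_by_stage() and the n_acls
member are hypothetical additions, and the caller would pre-size each
stage's acls array):

static void
sort_acls_by_stage(struct nbrec_acl **acls, size_t n_acls,
                   struct acl_stage *ingress_pre_lb,
                   struct acl_stage *ingress_post_lb,
                   struct acl_stage *egress)
{
    for (size_t i = 0; i < n_acls; i++) {
        struct nbrec_acl *acl = acls[i];
        struct acl_stage *stage;

        /* Classify each ACL exactly once. */
        if (strcmp(acl->direction, "from-lport")) {
            stage = egress;
        } else if (smap_get_bool(&acl->options, "apply-after-lb", false)) {
            stage = ingress_post_lb;
        } else {
            stage = ingress_pre_lb;
        }

        stage->acls[stage->n_acls++] = acl;
        stage->num_tiers = MAX(stage->num_tiers, acl->tier);
    }
}
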
+
static void
build_acls(const struct ls_stateful_record *ls_stateful_rec,
const struct ovn_datapath *od,
@@ -7639,8 +7661,9 @@ build_acls(const struct ls_stateful_record *ls_stateful_rec,
build_acl_log_related_flows(od, lflows, acl, has_stateful,
meter_groups, &match, &actions,
lflow_ref);
+ uint64_t max_acl_tier = choose_max_acl_tier(ls_stateful_rec, acl);
consider_acl(lflows, od, acl, has_stateful,
- meter_groups, ls_stateful_rec->max_acl_tier,
+ meter_groups, max_acl_tier,
&match, &actions, lflow_ref);
build_acl_sample_flows(ls_stateful_rec, od, lflows, acl,
&match, &actions, sampling_apps,
@@ -7658,8 +7681,10 @@ build_acls(const struct ls_stateful_record *ls_stateful_rec,
build_acl_log_related_flows(od, lflows, acl, has_stateful,
meter_groups, &match, &actions,
lflow_ref);
+ uint64_t max_acl_tier = choose_max_acl_tier(ls_stateful_rec,
+ acl);
consider_acl(lflows, od, acl, has_stateful,
- meter_groups, ls_stateful_rec->max_acl_tier,
+ meter_groups, max_acl_tier,
&match, &actions, lflow_ref);
build_acl_sample_flows(ls_stateful_rec, od, lflows, acl,
&match, &actions, sampling_apps,
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index dcc3dbbc3..dd2ad2330 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -13864,3 +13864,37 @@ check_no_redirect
AT_CLEANUP
])
+
+OVN_FOR_EACH_NORTHD_NO_HV([
+AT_SETUP([ACL mismatched tiers])
+ovn_start
+
+check ovn-nbctl ls-add S1
+
+# Ingress pre-lb ACLs have only a tier 1 ACL configured.
+# Ingress post-lb ACLs have tiers up to 3 configured.
+# Egress ACLs have up to tier 2 configured.
+check ovn-nbctl --tier=1 acl-add S1 from-lport 1001 "tcp" allow
+check ovn-nbctl --tier=3 --apply-after-lb acl-add S1 from-lport 1001 "tcp" allow
+check ovn-nbctl --tier=2 acl-add S1 to-lport 1001 "tcp" allow
+
+# Ingress pre-lb ACLs should only ever increment the tier to 1.
+AT_CHECK([ovn-sbctl lflow-list S1 | grep ls_in_acl_action | grep priority=500 | ovn_strip_lflows], [0], [dnl
+  table=??(ls_in_acl_action ), priority=500 , match=(reg8[[30..31]] == 0), action=(reg8[[30..31]] = 1; next(pipeline=ingress,table=??);)
+])
+
+# Ingress post-lb ACLs should increment the tier to 3.
+AT_CHECK([ovn-sbctl lflow-list S1 | grep ls_in_acl_after_lb_action | grep priority=500 | ovn_strip_lflows], [0], [dnl
+  table=??(ls_in_acl_after_lb_action), priority=500 , match=(reg8[[30..31]] == 0), action=(reg8[[30..31]] = 1; next(pipeline=ingress,table=??);)
+  table=??(ls_in_acl_after_lb_action), priority=500 , match=(reg8[[30..31]] == 1), action=(reg8[[30..31]] = 2; next(pipeline=ingress,table=??);)
+  table=??(ls_in_acl_after_lb_action), priority=500 , match=(reg8[[30..31]] == 2), action=(reg8[[30..31]] = 3; next(pipeline=ingress,table=??);)
+])
+
+# Egress ACLs should increment the tier to 2.
+AT_CHECK([ovn-sbctl lflow-list S1 | grep ls_out_acl_action | grep priority=500 | ovn_strip_lflows], [0], [dnl
+  table=??(ls_out_acl_action ), priority=500 , match=(reg8[[30..31]] == 0), action=(reg8[[30..31]] = 1; next(pipeline=egress,table=??);)
+  table=??(ls_out_acl_action ), priority=500 , match=(reg8[[30..31]] == 1), action=(reg8[[30..31]] = 2; next(pipeline=egress,table=??);)
+])
+
+AT_CLEANUP
+])
--
2.45.2
Thanks,
Ales
--
Ales Musil
Senior Software Engineer - OVN Core
Red Hat EMEA
[email protected]
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev