On 10/8/24 06:30, Ales Musil wrote:


On Fri, Sep 27, 2024 at 7:15 PM Mark Michelson <[email protected] <mailto:[email protected]>> wrote:

    When ACL tiers were introduced, the code kept track of the highest ACL
    tier so that when iterating through ACL tiers, we would not attempt to
    advance the current tier past the highest configured tier.

    Unfortunately, keeping track of a single max ACL tier doesn't work when
    ACLs are evaluated in three separate places. ACLs can be evaluated on
    ingress before load balancing, on ingress after load balancing, and on
    egress. By only keeping track of a single max ACL tier, it means that we
    could perform superfluous checks if one stage of ACLs has a higher max
    tier than other stages. As an example, if ingress pre-load balancing
    ACLs have a maximum tier of 1, and egress ACLs have a maximum tier of 2,
    then it means that for all stages of ACLs, we will evaluate tiers 0, 1,
    and 2 of ACLs, even though only one stage of ACLs uses tier 2.

     From a pure functionality standpoint, this doesn't cause any issues.
    Even if we advance the tier past the highest configured value, it
    results in a no-op and the same net result happens.

    However, the addition of sampling into ACLs has caused an unwanted side
    effect. In the example scenario above, let's say the tier 1 ACL in the
    ingress pre-load balancing stage evaluates to "pass". After the
    evaluation, we send a sample for the "pass" result. We then advance
    the tier to 2, then move back to ACL evaluation. There are no tier 2
    ACLs, so we move on to the sampling stage again. We then send a second
    sample for the previous "pass" result from tier 1. The result is
    confusing since we've sent two samples for the same ACL evaluation.

    To remedy this, we now track the max ACL tier in each of the stages
    where ACLs are evaluated. Now there are no superfluous ACL evaluations
    and no superfluous samples sent either.

    Reported-at: https://issues.redhat.com/browse/FDP-760
    <https://issues.redhat.com/browse/FDP-760>
    Signed-off-by: Mark Michelson <[email protected]
    <mailto:[email protected]>>
    ---



Hi Mark,

thank you for the patch. I'm not sure if this fixes the originally reported
issue. For that issue we should generate generic sample flows that also match on tier. That should result in samples being generated for the appropriate tier.
Either way we should have a test case to mimic the behavior described by
the FDP-760. I have a few additional comments down below.

I think your suggestion would also fix the issue, but my method does as well. If you check the ovn-trace output on the issue, you can see that the only reason that ACLs get re-evaluated for tier 2 is that the ACL_ACTION stage sends control back to the ACL_EVAL stage since it mistakenly thinks there's another tier to evaluate. By not sending control back to ACL_EVAL after tier 1, there is no chance of sending a bogus sample since we won't reach the ACL_SAMPLE stage again. That being said, I can incorporate your suggestion into v2 of the patch since there may be other weird corner cases that could trigger phantom samples besides a bad max ACL tier count.

You're correct about the test case. I had taken the shortcut of making sure that evaluation wouldn't reach a phantom tier just by confirming logical flows. But having a system test to ensure the proper behavior is a good call.



      northd/en-ls-stateful.c | 48 ++++++++++++++++++++++++++++++++++-------
      northd/en-ls-stateful.h |  8 ++++++-
      northd/northd.c         | 47 ++++++++++++++++++++++++++++++----------
      tests/ovn-northd.at <http://ovn-northd.at>     | 34
    +++++++++++++++++++++++++++++
      4 files changed, 117 insertions(+), 20 deletions(-)

    diff --git a/northd/en-ls-stateful.c b/northd/en-ls-stateful.c
    index 44f74ea08..eb751d8dd 100644
    --- a/northd/en-ls-stateful.c
    +++ b/northd/en-ls-stateful.c
    @@ -204,16 +204,18 @@ ls_stateful_port_group_handler(struct
    engine_node *node, void *data_)
                  ovn_datapaths_find_by_index(input_data.ls_datapaths,
                                              ls_stateful_rec->ls_index);
              bool had_stateful_acl = ls_stateful_rec->has_stateful_acl;
    -        uint64_t max_acl_tier = ls_stateful_rec->max_acl_tier;
    +        struct acl_tier old_max = ls_stateful_rec->max_acl_tier;
              bool had_acls = ls_stateful_rec->has_acls;
              bool modified = false;

              ls_stateful_record_reinit(ls_stateful_rec, od, ls_pg,
                                        input_data.ls_port_groups);

    +        struct acl_tier new_max = ls_stateful_rec->max_acl_tier;
    +
              if ((had_stateful_acl != ls_stateful_rec->has_stateful_acl)
                  || (had_acls != ls_stateful_rec->has_acls)
    -            || max_acl_tier != ls_stateful_rec->max_acl_tier) {
    +            || memcmp(&old_max, &new_max, sizeof(old_max))) {


We have to be very careful about padding if we want to use
memcmp. Currently the struct is fine, but maybe we should
mention it in the comment?

Sure I can add a comment.


                  modified = true;
              }

    @@ -365,7 +367,8 @@ ls_stateful_record_set_acl_flags(struct
    ls_stateful_record *ls_stateful_rec,
                                       const struct ls_port_group_table
    *ls_pgs)
      {
          ls_stateful_rec->has_stateful_acl = false;
    -    ls_stateful_rec->max_acl_tier = 0;
    +    memset(&ls_stateful_rec->max_acl_tier, 0,
    +           sizeof ls_stateful_rec->max_acl_tier);
          ls_stateful_rec->has_acls = false;

          if (ls_stateful_record_set_acl_flags_(ls_stateful_rec,
    od->nbs->acls,
    @@ -391,6 +394,36 @@ ls_stateful_record_set_acl_flags(struct
    ls_stateful_record *ls_stateful_rec,
          }
      }

    +static void
    +update_ls_max_acl_tier(struct ls_stateful_record *ls_stateful_rec,
    +                       const struct nbrec_acl *acl)
    +{
    +    if (!acl->tier) {
    +        return;
    +    }
    +
    +    if (!strcmp(acl->direction, "from-lport")) {
    +        if (smap_get_bool(&acl->options, "apply-after-lb", false)) {
    +            if (acl->tier >
    ls_stateful_rec->max_acl_tier.ingress_post_lb) {
    +                ls_stateful_rec->max_acl_tier.ingress_post_lb =
    acl->tier;
    +            }


We could use MAX(acl->tier, ls_stateful_rec->max_acl_tier.ingress_post_lb)
instead of the if statement, the same applies down below.

Thanks for the idea, I'm not sure why I didn't think of it :)


    +        } else if (acl->tier >
    ls_stateful_rec->max_acl_tier.ingress_pre_lb) {
    +            ls_stateful_rec->max_acl_tier.ingress_pre_lb = acl->tier;
    +        }
    +    } else if (acl->tier > ls_stateful_rec->max_acl_tier.egress) {
    +        ls_stateful_rec->max_acl_tier.egress = acl->tier;
    +    }
    +}
    +
    +static bool
    +ls_acl_tiers_are_maxed_out(struct acl_tier *acl_tier,
    +                           uint64_t max_allowed_acl_tier)
    +{
    +    return acl_tier->ingress_post_lb == max_allowed_acl_tier &&
    +           acl_tier->ingress_pre_lb == max_allowed_acl_tier &&
    +           acl_tier->egress == max_allowed_acl_tier;
    +}
    +
      static bool
      ls_stateful_record_set_acl_flags_(struct ls_stateful_record
    *ls_stateful_rec,
                                        struct nbrec_acl **acls,
    @@ -408,16 +441,15 @@ ls_stateful_record_set_acl_flags_(struct
    ls_stateful_record *ls_stateful_rec,
          ls_stateful_rec->has_acls = true;
          for (size_t i = 0; i < n_acls; i++) {
              const struct nbrec_acl *acl = acls[i];
    -        if (acl->tier > ls_stateful_rec->max_acl_tier) {
    -            ls_stateful_rec->max_acl_tier = acl->tier;
    -        }
    +        update_ls_max_acl_tier(ls_stateful_rec, acl);
              if (!ls_stateful_rec->has_stateful_acl
                      && !strcmp(acl->action, "allow-related")) {
                  ls_stateful_rec->has_stateful_acl = true;
              }
              if (ls_stateful_rec->has_stateful_acl &&
    -            ls_stateful_rec->max_acl_tier ==
    -                nbrec_acl_col_tier.type.value.integer.max) {
    +            ls_acl_tiers_are_maxed_out(
    +                &ls_stateful_rec->max_acl_tier,
    +                nbrec_acl_col_tier.type.value.integer.max)) {
                  return true;
              }
          }
    diff --git a/northd/en-ls-stateful.h b/northd/en-ls-stateful.h
    index eae4b08e2..18a7398a6 100644
    --- a/northd/en-ls-stateful.h
    +++ b/northd/en-ls-stateful.h
    @@ -33,6 +33,12 @@

      struct lflow_ref;

    +struct acl_tier {
    +    uint64_t ingress_pre_lb;
    +    uint64_t ingress_post_lb;
    +    uint64_t egress;
    +};
    +
      struct ls_stateful_record {
          struct hmap_node key_node;

    @@ -46,7 +52,7 @@ struct ls_stateful_record {
          bool has_stateful_acl;
          bool has_lb_vip;
          bool has_acls;
    -    uint64_t max_acl_tier;
    +    struct acl_tier max_acl_tier;

          /* 'lflow_ref' is used to reference logical flows generated for
           * this ls_stateful record.
    diff --git a/northd/northd.c b/northd/northd.c
    index a267cd5f8..9d3edee32 100644
    --- a/northd/northd.c
    +++ b/northd/northd.c
    @@ -7321,28 +7321,34 @@ build_acl_action_lflows(const struct
    ls_stateful_record *ls_stateful_rec,
              S_SWITCH_OUT_ACL_EVAL,
          };

    +    uint64_t max_acl_tiers[] = {
    +        ls_stateful_rec->max_acl_tier.ingress_pre_lb,
    +        ls_stateful_rec->max_acl_tier.ingress_post_lb,
    +        ls_stateful_rec->max_acl_tier.egress,
    +    };
    +
          ds_clear(actions);
          ds_put_cstr(actions, REGBIT_ACL_VERDICT_ALLOW " = 0; "
                              REGBIT_ACL_VERDICT_DROP " = 0; "
                              REGBIT_ACL_VERDICT_REJECT " = 0; ");
    -    if (ls_stateful_rec->max_acl_tier) {
    -        ds_put_cstr(actions, REG_ACL_TIER " = 0; ");
    -    }

          size_t verdict_len = actions->length;
    -
          for (size_t i = 0; i < ARRAY_SIZE(stages); i++) {
              enum ovn_stage stage = stages[i];
    +        if (max_acl_tiers[i]) {
    +            ds_put_cstr(actions, REG_ACL_TIER " = 0; ");
    +        }
    +        size_t verdict_tier_len = actions->length;
              if (!ls_stateful_rec->has_acls) {
                  ovn_lflow_add(lflows, od, stage, 0, "1", "next;",
    lflow_ref);
                  continue;
              }
    -        ds_truncate(actions, verdict_len);
    +        ds_truncate(actions, verdict_tier_len);
              ds_put_cstr(actions, "next;");
              ovn_lflow_add(lflows, od, stage, 1000,
                            REGBIT_ACL_VERDICT_ALLOW " == 1",
    ds_cstr(actions),
                            lflow_ref);
    -        ds_truncate(actions, verdict_len);
    +        ds_truncate(actions, verdict_tier_len);
              ds_put_cstr(actions, debug_implicit_drop_action());
              ovn_lflow_add(lflows, od, stage, 1000,
                            REGBIT_ACL_VERDICT_DROP " == 1",
    @@ -7350,7 +7356,7 @@ build_acl_action_lflows(const struct
    ls_stateful_record *ls_stateful_rec,
                            lflow_ref);
              bool ingress = ovn_stage_get_pipeline(stage) == P_IN;

    -        ds_truncate(actions, verdict_len);
    +        ds_truncate(actions, verdict_tier_len);
              build_acl_reject_action(actions, ingress);

              ovn_lflow_metered(lflows, od, stage, 1000,
    @@ -7358,12 +7364,12 @@ build_acl_action_lflows(const struct
    ls_stateful_record *ls_stateful_rec,
                                copp_meter_get(COPP_REJECT, od->nbs->copp,
                                meter_groups), lflow_ref);

    -        ds_truncate(actions, verdict_len);
    +        ds_truncate(actions, verdict_tier_len);
              ds_put_cstr(actions, default_acl_action);
              ovn_lflow_add(lflows, od, stage, 0, "1", ds_cstr(actions),
    lflow_ref);

              struct ds tier_actions = DS_EMPTY_INITIALIZER;
    -        for (size_t j = 0; j < ls_stateful_rec->max_acl_tier; j++) {
    +        for (size_t j = 0; j < max_acl_tiers[i]; j++) {
                  ds_clear(match);
                  ds_put_format(match, REG_ACL_TIER " == %"PRIuSIZE, j);
                  ds_clear(&tier_actions);
    @@ -7375,6 +7381,7 @@ build_acl_action_lflows(const struct
    ls_stateful_record *ls_stateful_rec,
                               ds_cstr(&tier_actions), lflow_ref);
              }
              ds_destroy(&tier_actions);
    +        ds_truncate(actions, verdict_len);
          }
      }

    @@ -7443,6 +7450,21 @@ build_acl_log_related_flows(const struct
    ovn_datapath *od,
                                  &acl->header_, lflow_ref);
      }

    +static uint64_t
    +choose_max_acl_tier(const struct ls_stateful_record *ls_stateful_rec,
    +                    const struct nbrec_acl *acl)
    +{
    +    if (!strcmp(acl->direction, "from-lport")) {
    +        if (smap_get_bool(&acl->options, "apply-after-lb", false)) {
    +            return ls_stateful_rec->max_acl_tier.ingress_post_lb;
    +        } else {
    +            return ls_stateful_rec->max_acl_tier.ingress_pre_lb;
    +        }
    +    } else {
    +        return ls_stateful_rec->max_acl_tier.egress;
    +    }
    +}

It would probably fit a separate patch, most the NAT related functions are
parsing the type/stage over and over again. It would help to have an enum
(maybe to repurpose enum acl_observation_stage) and do it only once in the
loop. This could also lead to replacement of struct acl_tier with array.
WDYT?

I had an idea when writing this patch that at the beginning of build_acls(), we could loop through all ACLs and sort them into ingress, ingress after lb, and egress. This isn't the exact structure I was thinking, but it would be something like this:

    struct acl_stage {
        enum ovn_stage eval_stage;
        enum ovn_stage sample_stage;
        enum ovn_stage action_stage;
        enum ovn_stage reject_stage;
        uint32_t num_tiers;
        struct nbrec_acl **acls;
    };

Then we would declare three structs and fill them in with the appropriate details:

    struct acl_stage ingress_pre_lb;
    struct acl_stage ingress_post_lb;
    struct acl_stage egress;

This way, we are not constantly re-evaluating ingress vs egress, pre-lb vs post-lb, etc. We do it once, sort the ACLs into their acl_stage structs, and pass them appropriately to generate the logical flows. And if (god forbid) we add a new place where ACLs are evaluated in a logical switch, it would just be a matter of creating a new acl_stage struct and filling its details in to get the logical flows written. It would make consider_acl() much more streamlined since it would reference fields on struct acl_stage instead of using a bunch of if statements to determine the logic.

I ended up not doing this because this felt more like a refactor than a bug fix. However, I agree that the current processing is wasteful and should be addressed.


    +
      static void
      build_acls(const struct ls_stateful_record *ls_stateful_rec,
                 const struct ovn_datapath *od,
    @@ -7639,8 +7661,9 @@ build_acls(const struct ls_stateful_record
    *ls_stateful_rec,
              build_acl_log_related_flows(od, lflows, acl, has_stateful,
                                          meter_groups, &match, &actions,
                                          lflow_ref);
    +        uint64_t max_acl_tier =
    choose_max_acl_tier(ls_stateful_rec, acl);
              consider_acl(lflows, od, acl, has_stateful,
    -                     meter_groups, ls_stateful_rec->max_acl_tier,
    +                     meter_groups, max_acl_tier,
                           &match, &actions, lflow_ref);
              build_acl_sample_flows(ls_stateful_rec, od, lflows, acl,
                                     &match, &actions, sampling_apps,
    @@ -7658,8 +7681,10 @@ build_acls(const struct ls_stateful_record
    *ls_stateful_rec,
                      build_acl_log_related_flows(od, lflows, acl,
    has_stateful,
                                                  meter_groups, &match,
    &actions,
                                                  lflow_ref);
    +                uint64_t max_acl_tier =
    choose_max_acl_tier(ls_stateful_rec,
    +                                                            acl);
                      consider_acl(lflows, od, acl, has_stateful,
    -                             meter_groups,
    ls_stateful_rec->max_acl_tier,
    +                             meter_groups, max_acl_tier,
                                   &match, &actions, lflow_ref);
                      build_acl_sample_flows(ls_stateful_rec, od,
    lflows, acl,
                                             &match, &actions,
    sampling_apps,
    diff --git a/tests/ovn-northd.at <http://ovn-northd.at>
    b/tests/ovn-northd.at <http://ovn-northd.at>
    index dcc3dbbc3..dd2ad2330 100644
    --- a/tests/ovn-northd.at <http://ovn-northd.at>
    +++ b/tests/ovn-northd.at <http://ovn-northd.at>
    @@ -13864,3 +13864,37 @@ check_no_redirect

      AT_CLEANUP
      ])
    +
    +OVN_FOR_EACH_NORTHD_NO_HV([
    +AT_SETUP([ACL mismatched tiers])
    +ovn_start
    +
    +check ovn-nbctl ls-add S1
    +
    +# Ingress pre-lb ACLs have only a tier 1 ACL configured.
    +# Ingress post-lb ACLs have tier up to 3 configured.
    +# Egress ACLs have up to tier 2 configured.
    +check ovn-nbctl --tier=1 acl-add S1 from-lport 1001 "tcp" allow
    +check ovn-nbctl --tier=3 --apply-after-lb acl-add  S1 from-lport
    1001 "tcp" allow
    +check ovn-nbctl --tier=2 acl-add S1 to-lport 1001 "tcp" allow
    +
    +# Ingress pre-lb ACLs should only ever increment the tier to 1.
    +AT_CHECK([ovn-sbctl lflow-list S1 | grep ls_in_acl_action | grep
    priority=500 | ovn_strip_lflows], [0], [dnl
    +  table=??(ls_in_acl_action   ), priority=500  ,
    match=(reg8[[30..31]] == 0), action=(reg8[[30..31]] = 1;
    next(pipeline=ingress,table=??);)
    +])
    +
    +# Ingress post-lb ACLs should increment the tier to 3.
    +AT_CHECK([ovn-sbctl lflow-list S1 | grep ls_in_acl_after_lb_action
    | grep priority=500 | ovn_strip_lflows], [0], [dnl
    +  table=??(ls_in_acl_after_lb_action), priority=500  ,
    match=(reg8[[30..31]] == 0), action=(reg8[[30..31]] = 1;
    next(pipeline=ingress,table=??);)
    +  table=??(ls_in_acl_after_lb_action), priority=500  ,
    match=(reg8[[30..31]] == 1), action=(reg8[[30..31]] = 2;
    next(pipeline=ingress,table=??);)
    +  table=??(ls_in_acl_after_lb_action), priority=500  ,
    match=(reg8[[30..31]] == 2), action=(reg8[[30..31]] = 3;
    next(pipeline=ingress,table=??);)
    +])
    +
    +# Egress ACLs should increment the tier to 2.
    +AT_CHECK([ovn-sbctl lflow-list S1 | grep ls_out_acl_action | grep
    priority=500 | ovn_strip_lflows], [0], [dnl
    +  table=??(ls_out_acl_action  ), priority=500  ,
    match=(reg8[[30..31]] == 0), action=(reg8[[30..31]] = 1;
    next(pipeline=egress,table=??);)
    +  table=??(ls_out_acl_action  ), priority=500  ,
    match=(reg8[[30..31]] == 1), action=(reg8[[30..31]] = 2;
    next(pipeline=egress,table=??);)
    +])
    +
    +AT_CLEANUP
    +])
-- 2.45.2

    _______________________________________________
    dev mailing list
    [email protected] <mailto:[email protected]>
    https://mail.openvswitch.org/mailman/listinfo/ovs-dev
    <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>


Thanks,
Ales
--

Ales Musil

Senior Software Engineer - OVN Core

Red Hat EMEA <https://www.redhat.com>

[email protected] <mailto:[email protected]>

<https://red.ht/sig>


_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to