On 10/8/24 06:30, Ales Musil wrote:
On Fri, Sep 27, 2024 at 7:15 PM Mark Michelson <[email protected]> wrote:
When ACL tiers were introduced, the code kept track of the highest ACL
tier so that when iterating through ACL tiers, we would not attempt to
advance the current tier past the highest configured tier.

Unfortunately, keeping track of a single max ACL tier doesn't work when
ACLs are evaluated in three separate places. ACLs can be evaluated on
ingress before load balancing, on ingress after load balancing, and on
egress. Keeping track of only a single max ACL tier means that we could
perform superfluous checks if one stage of ACLs has a higher max tier
than the others. As an example, if ingress pre-load balancing ACLs have
a maximum tier of 1, and egress ACLs have a maximum tier of 2, then for
all stages of ACLs we will evaluate tiers 0, 1, and 2, even though only
one stage of ACLs uses tier 2.

From a pure functionality standpoint, this doesn't cause any issues.
Even if we advance the tier past the highest configured value, it
results in a no-op and the same net result happens.

However, the addition of sampling to ACLs has caused an unwanted side
effect. In the example scenario above, let's say the tier 1 ACL in the
ingress pre-load balancing stage evaluates to "pass". After the
evaluation, we send a sample for the "pass" result. We then advance
the tier to 2, then move back to ACL evaluation. There are no tier 2
ACLs, so we move on to the sampling stage again. We then send a second
sample for the previous "pass" result from tier 1. The result is
confusing since we've sent two samples for the same ACL evaluation.

To remedy this, we now track the max ACL tier in each of the stages
where ACLs are evaluated. Now there are no superfluous ACL evaluations
and no superfluous samples sent either.
Reported-at: https://issues.redhat.com/browse/FDP-760
Signed-off-by: Mark Michelson <[email protected]>
---
Hi Mark,

thank you for the patch. I'm not sure if this fixes the originally
reported issue. For that issue we should generate generic sample flows
that also match on tier. That should result in samples being generated
for the appropriate tier.

Either way, we should have a test case to mimic the behavior described
in FDP-760. I have a few additional comments down below.

I think your suggestion would also fix the issue, but my method does as
well. If you check the ovn-trace output on the issue, you can see that
the only reason that ACLs get re-evaluated for tier 2 is that the
ACL_ACTION stage sends control back to the ACL_EVAL stage since it
mistakenly thinks there's another tier to evaluate. By not sending
control back to ACL_EVAL after tier 1, there is no chance of sending a
bogus sample since we won't reach the ACL_SAMPLE stage again. That being
said, I can incorporate your suggestion into v2 of the patch since there
may be other weird corner cases that could trigger phantom samples
besides a bad max ACL tier count.

You're correct about the test case. I had taken the shortcut of making
sure that evaluation wouldn't reach a phantom tier just by confirming
logical flows. But having a system test to ensure the proper behavior is
a good call.
 northd/en-ls-stateful.c | 48 ++++++++++++++++++++++++++++++++++-------
 northd/en-ls-stateful.h |  8 ++++++-
 northd/northd.c         | 47 ++++++++++++++++++++++++++++++----------
 tests/ovn-northd.at     | 34 +++++++++++++++++++++++++++++
 4 files changed, 117 insertions(+), 20 deletions(-)
diff --git a/northd/en-ls-stateful.c b/northd/en-ls-stateful.c
index 44f74ea08..eb751d8dd 100644
--- a/northd/en-ls-stateful.c
+++ b/northd/en-ls-stateful.c
@@ -204,16 +204,18 @@ ls_stateful_port_group_handler(struct engine_node *node, void *data_)
ovn_datapaths_find_by_index(input_data.ls_datapaths,
ls_stateful_rec->ls_index);
bool had_stateful_acl = ls_stateful_rec->has_stateful_acl;
- uint64_t max_acl_tier = ls_stateful_rec->max_acl_tier;
+ struct acl_tier old_max = ls_stateful_rec->max_acl_tier;
bool had_acls = ls_stateful_rec->has_acls;
bool modified = false;
ls_stateful_record_reinit(ls_stateful_rec, od, ls_pg,
input_data.ls_port_groups);
+ struct acl_tier new_max = ls_stateful_rec->max_acl_tier;
+
if ((had_stateful_acl != ls_stateful_rec->has_stateful_acl)
|| (had_acls != ls_stateful_rec->has_acls)
- || max_acl_tier != ls_stateful_rec->max_acl_tier) {
+ || memcmp(&old_max, &new_max, sizeof(old_max))) {

We have to be very careful about padding if we want to use memcmp.
Currently the struct is fine, but maybe we should mention it in the
comment?

Sure, I can add a comment.
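
Something along these lines, perhaps (just a sketch of the wording):

/* Note: instances of this struct are compared with memcmp() in
 * ls_stateful_port_group_handler(), so it must stay free of padding.
 * All members are currently uint64_t, which keeps it padding-free. */
struct acl_tier {
    uint64_t ingress_pre_lb;
    uint64_t ingress_post_lb;
    uint64_t egress;
};
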
modified = true;
}
@@ -365,7 +367,8 @@ ls_stateful_record_set_acl_flags(struct ls_stateful_record *ls_stateful_rec,
                                 const struct ls_port_group_table *ls_pgs)
{
ls_stateful_rec->has_stateful_acl = false;
- ls_stateful_rec->max_acl_tier = 0;
+ memset(&ls_stateful_rec->max_acl_tier, 0,
+ sizeof ls_stateful_rec->max_acl_tier);
ls_stateful_rec->has_acls = false;
if (ls_stateful_record_set_acl_flags_(ls_stateful_rec,
od->nbs->acls,
@@ -391,6 +394,36 @@ ls_stateful_record_set_acl_flags(struct ls_stateful_record *ls_stateful_rec,
}
}
+static void
+update_ls_max_acl_tier(struct ls_stateful_record *ls_stateful_rec,
+ const struct nbrec_acl *acl)
+{
+ if (!acl->tier) {
+ return;
+ }
+
+ if (!strcmp(acl->direction, "from-lport")) {
+ if (smap_get_bool(&acl->options, "apply-after-lb", false)) {
+ if (acl->tier > ls_stateful_rec->max_acl_tier.ingress_post_lb) {
+ ls_stateful_rec->max_acl_tier.ingress_post_lb = acl->tier;
+ }

We could use MAX(acl->tier, ls_stateful_rec->max_acl_tier.ingress_post_lb)
instead of the if statement; the same applies down below.

Thanks for the idea, I'm not sure why I didn't think of it :)
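
For v2 that would turn each branch into a one-liner, something like
this (a sketch, using the MAX() macro from OVS's lib/util.h):

static void
update_ls_max_acl_tier(struct ls_stateful_record *ls_stateful_rec,
                       const struct nbrec_acl *acl)
{
    struct acl_tier *max = &ls_stateful_rec->max_acl_tier;

    if (!acl->tier) {
        return;
    }

    if (!strcmp(acl->direction, "from-lport")) {
        if (smap_get_bool(&acl->options, "apply-after-lb", false)) {
            max->ingress_post_lb = MAX(max->ingress_post_lb, acl->tier);
        } else {
            max->ingress_pre_lb = MAX(max->ingress_pre_lb, acl->tier);
        }
    } else {
        max->egress = MAX(max->egress, acl->tier);
    }
}
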
+ } else if (acl->tier > ls_stateful_rec->max_acl_tier.ingress_pre_lb) {
+ ls_stateful_rec->max_acl_tier.ingress_pre_lb = acl->tier;
+ }
+ } else if (acl->tier > ls_stateful_rec->max_acl_tier.egress) {
+ ls_stateful_rec->max_acl_tier.egress = acl->tier;
+ }
+}
+
+static bool
+ls_acl_tiers_are_maxed_out(struct acl_tier *acl_tier,
+ uint64_t max_allowed_acl_tier)
+{
+ return acl_tier->ingress_post_lb == max_allowed_acl_tier &&
+ acl_tier->ingress_pre_lb == max_allowed_acl_tier &&
+ acl_tier->egress == max_allowed_acl_tier;
+}
+
static bool
ls_stateful_record_set_acl_flags_(struct ls_stateful_record *ls_stateful_rec,
                                  struct nbrec_acl **acls,
@@ -408,16 +441,15 @@ ls_stateful_record_set_acl_flags_(struct ls_stateful_record *ls_stateful_rec,
ls_stateful_rec->has_acls = true;
for (size_t i = 0; i < n_acls; i++) {
const struct nbrec_acl *acl = acls[i];
- if (acl->tier > ls_stateful_rec->max_acl_tier) {
- ls_stateful_rec->max_acl_tier = acl->tier;
- }
+ update_ls_max_acl_tier(ls_stateful_rec, acl);
if (!ls_stateful_rec->has_stateful_acl
&& !strcmp(acl->action, "allow-related")) {
ls_stateful_rec->has_stateful_acl = true;
}
if (ls_stateful_rec->has_stateful_acl &&
- ls_stateful_rec->max_acl_tier ==
- nbrec_acl_col_tier.type.value.integer.max) {
+ ls_acl_tiers_are_maxed_out(
+ &ls_stateful_rec->max_acl_tier,
+ nbrec_acl_col_tier.type.value.integer.max)) {
return true;
}
}
diff --git a/northd/en-ls-stateful.h b/northd/en-ls-stateful.h
index eae4b08e2..18a7398a6 100644
--- a/northd/en-ls-stateful.h
+++ b/northd/en-ls-stateful.h
@@ -33,6 +33,12 @@
struct lflow_ref;
+struct acl_tier {
+ uint64_t ingress_pre_lb;
+ uint64_t ingress_post_lb;
+ uint64_t egress;
+};
+
struct ls_stateful_record {
struct hmap_node key_node;
@@ -46,7 +52,7 @@ struct ls_stateful_record {
bool has_stateful_acl;
bool has_lb_vip;
bool has_acls;
- uint64_t max_acl_tier;
+ struct acl_tier max_acl_tier;
/* 'lflow_ref' is used to reference logical flows generated for
* this ls_stateful record.
diff --git a/northd/northd.c b/northd/northd.c
index a267cd5f8..9d3edee32 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -7321,28 +7321,34 @@ build_acl_action_lflows(const struct ls_stateful_record *ls_stateful_rec,
S_SWITCH_OUT_ACL_EVAL,
};
+ uint64_t max_acl_tiers[] = {
+ ls_stateful_rec->max_acl_tier.ingress_pre_lb,
+ ls_stateful_rec->max_acl_tier.ingress_post_lb,
+ ls_stateful_rec->max_acl_tier.egress,
+ };
+
ds_clear(actions);
ds_put_cstr(actions, REGBIT_ACL_VERDICT_ALLOW " = 0; "
REGBIT_ACL_VERDICT_DROP " = 0; "
REGBIT_ACL_VERDICT_REJECT " = 0; ");
- if (ls_stateful_rec->max_acl_tier) {
- ds_put_cstr(actions, REG_ACL_TIER " = 0; ");
- }
size_t verdict_len = actions->length;
-
for (size_t i = 0; i < ARRAY_SIZE(stages); i++) {
enum ovn_stage stage = stages[i];
+ if (max_acl_tiers[i]) {
+ ds_put_cstr(actions, REG_ACL_TIER " = 0; ");
+ }
+ size_t verdict_tier_len = actions->length;
if (!ls_stateful_rec->has_acls) {
ovn_lflow_add(lflows, od, stage, 0, "1", "next;",
lflow_ref);
continue;
}
- ds_truncate(actions, verdict_len);
+ ds_truncate(actions, verdict_tier_len);
ds_put_cstr(actions, "next;");
ovn_lflow_add(lflows, od, stage, 1000,
REGBIT_ACL_VERDICT_ALLOW " == 1",
ds_cstr(actions),
lflow_ref);
- ds_truncate(actions, verdict_len);
+ ds_truncate(actions, verdict_tier_len);
ds_put_cstr(actions, debug_implicit_drop_action());
ovn_lflow_add(lflows, od, stage, 1000,
REGBIT_ACL_VERDICT_DROP " == 1",
@@ -7350,7 +7356,7 @@ build_acl_action_lflows(const struct ls_stateful_record *ls_stateful_rec,
lflow_ref);
bool ingress = ovn_stage_get_pipeline(stage) == P_IN;
- ds_truncate(actions, verdict_len);
+ ds_truncate(actions, verdict_tier_len);
build_acl_reject_action(actions, ingress);
ovn_lflow_metered(lflows, od, stage, 1000,
@@ -7358,12 +7364,12 @@ build_acl_action_lflows(const struct ls_stateful_record *ls_stateful_rec,
copp_meter_get(COPP_REJECT, od->nbs->copp,
meter_groups), lflow_ref);
- ds_truncate(actions, verdict_len);
+ ds_truncate(actions, verdict_tier_len);
ds_put_cstr(actions, default_acl_action);
ovn_lflow_add(lflows, od, stage, 0, "1", ds_cstr(actions),
lflow_ref);
struct ds tier_actions = DS_EMPTY_INITIALIZER;
- for (size_t j = 0; j < ls_stateful_rec->max_acl_tier; j++) {
+ for (size_t j = 0; j < max_acl_tiers[i]; j++) {
ds_clear(match);
ds_put_format(match, REG_ACL_TIER " == %"PRIuSIZE, j);
ds_clear(&tier_actions);
@@ -7375,6 +7381,7 @@ build_acl_action_lflows(const struct ls_stateful_record *ls_stateful_rec,
ds_cstr(&tier_actions), lflow_ref);
}
ds_destroy(&tier_actions);
+ ds_truncate(actions, verdict_len);
}
}
@@ -7443,6 +7450,21 @@ build_acl_log_related_flows(const struct ovn_datapath *od,
&acl->header_, lflow_ref);
}
+static uint64_t
+choose_max_acl_tier(const struct ls_stateful_record *ls_stateful_rec,
+ const struct nbrec_acl *acl)
+{
+ if (!strcmp(acl->direction, "from-lport")) {
+ if (smap_get_bool(&acl->options, "apply-after-lb", false)) {
+ return ls_stateful_rec->max_acl_tier.ingress_post_lb;
+ } else {
+ return ls_stateful_rec->max_acl_tier.ingress_pre_lb;
+ }
+ } else {
+ return ls_stateful_rec->max_acl_tier.egress;
+ }
+}

It would probably fit a separate patch, but most of the NAT-related
functions are parsing the type/stage over and over again. It would help
to have an enum (maybe repurposing enum acl_observation_stage) and do
it only once in the loop. This could also lead to replacing struct
acl_tier with an array. WDYT?

I had an idea when writing this patch that at the beginning of
build_acls(), we could loop through all ACLs and sort them into ingress,
ingress after lb, and egress. This isn't the exact structure I had in
mind, but it would be something like this:
struct acl_stage {
enum ovn_stage eval_stage;
enum ovn_stage sample_stage;
enum ovn_stage action_stage;
enum ovn_stage reject_stage;
uint32_t num_tiers;
struct nbrec_acl **acls;
};
Then we would declare three structs and fill them in with the
appropriate details:
struct acl_stage ingress_pre_lb;
struct acl_stage ingress_post_lb;
struct acl_stage egress;
This way, we are not constantly re-evaluating ingress vs egress, pre-lb
vs post-lb, etc. We do it once, sort the ACLs into their acl_stage
structs, and pass them appropriately to generate the logical flows. And
if (god forbid) we add a new place where ACLs are evaluated in a logical
switch, it would just be a matter of creating a new acl_stage struct and
filling its details in to get the logical flows written. It would make
consider_acl() much more streamlined since it would reference fields on
struct acl_stage instead of using a bunch of if statements to determine
the logic.
I ended up not doing this because this felt more like a refactor than a
bug fix. However, I agree that the current processing is wasteful and
should be addressed.
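
Very roughly, the sorting pass at the top of build_acls() might look
like this (purely illustrative; sort_acls_by_stage() and the n_acls
member are hypothetical additions, and the caller would pre-size each
stage's acls array):

static void
sort_acls_by_stage(struct nbrec_acl **acls, size_t n_acls,
                   struct acl_stage *ingress_pre_lb,
                   struct acl_stage *ingress_post_lb,
                   struct acl_stage *egress)
{
    for (size_t i = 0; i < n_acls; i++) {
        struct nbrec_acl *acl = acls[i];
        struct acl_stage *stage;

        /* Classify each ACL exactly once. */
        if (strcmp(acl->direction, "from-lport")) {
            stage = egress;
        } else if (smap_get_bool(&acl->options, "apply-after-lb", false)) {
            stage = ingress_post_lb;
        } else {
            stage = ingress_pre_lb;
        }

        stage->acls[stage->n_acls++] = acl;
        stage->num_tiers = MAX(stage->num_tiers, acl->tier);
    }
}
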
+
static void
build_acls(const struct ls_stateful_record *ls_stateful_rec,
const struct ovn_datapath *od,
@@ -7639,8 +7661,9 @@ build_acls(const struct ls_stateful_record *ls_stateful_rec,
build_acl_log_related_flows(od, lflows, acl, has_stateful,
meter_groups, &match, &actions,
lflow_ref);
+ uint64_t max_acl_tier = choose_max_acl_tier(ls_stateful_rec, acl);
consider_acl(lflows, od, acl, has_stateful,
- meter_groups, ls_stateful_rec->max_acl_tier,
+ meter_groups, max_acl_tier,
&match, &actions, lflow_ref);
build_acl_sample_flows(ls_stateful_rec, od, lflows, acl,
&match, &actions, sampling_apps,
@@ -7658,8 +7681,10 @@ build_acls(const struct ls_stateful_record *ls_stateful_rec,
build_acl_log_related_flows(od, lflows, acl, has_stateful,
meter_groups, &match, &actions,
lflow_ref);
+ uint64_t max_acl_tier = choose_max_acl_tier(ls_stateful_rec,
+ acl);
consider_acl(lflows, od, acl, has_stateful,
- meter_groups, ls_stateful_rec->max_acl_tier,
+ meter_groups, max_acl_tier,
&match, &actions, lflow_ref);
build_acl_sample_flows(ls_stateful_rec, od, lflows, acl,
&match, &actions, sampling_apps,
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index dcc3dbbc3..dd2ad2330 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -13864,3 +13864,37 @@ check_no_redirect
AT_CLEANUP
])
+
+OVN_FOR_EACH_NORTHD_NO_HV([
+AT_SETUP([ACL mismatched tiers])
+ovn_start
+
+check ovn-nbctl ls-add S1
+
+# Ingress pre-lb ACLs have only a tier 1 ACL configured.
+# Ingress post-lb ACLs have tiers up to 3 configured.
+# Egress ACLs have up to tier 2 configured.
+check ovn-nbctl --tier=1 acl-add S1 from-lport 1001 "tcp" allow
+check ovn-nbctl --tier=3 --apply-after-lb acl-add S1 from-lport 1001 "tcp" allow
+check ovn-nbctl --tier=2 acl-add S1 to-lport 1001 "tcp" allow
+
+# Ingress pre-lb ACLs should only ever increment the tier to 1.
+AT_CHECK([ovn-sbctl lflow-list S1 | grep ls_in_acl_action | grep priority=500 | ovn_strip_lflows], [0], [dnl
+  table=??(ls_in_acl_action ), priority=500 , match=(reg8[[30..31]] == 0), action=(reg8[[30..31]] = 1; next(pipeline=ingress,table=??);)
+])
+
+# Ingress post-lb ACLs should increment the tier to 3.
+AT_CHECK([ovn-sbctl lflow-list S1 | grep ls_in_acl_after_lb_action | grep priority=500 | ovn_strip_lflows], [0], [dnl
+  table=??(ls_in_acl_after_lb_action), priority=500 , match=(reg8[[30..31]] == 0), action=(reg8[[30..31]] = 1; next(pipeline=ingress,table=??);)
+  table=??(ls_in_acl_after_lb_action), priority=500 , match=(reg8[[30..31]] == 1), action=(reg8[[30..31]] = 2; next(pipeline=ingress,table=??);)
+  table=??(ls_in_acl_after_lb_action), priority=500 , match=(reg8[[30..31]] == 2), action=(reg8[[30..31]] = 3; next(pipeline=ingress,table=??);)
+])
+
+# Egress ACLs should increment the tier to 2.
+AT_CHECK([ovn-sbctl lflow-list S1 | grep ls_out_acl_action | grep priority=500 | ovn_strip_lflows], [0], [dnl
+  table=??(ls_out_acl_action ), priority=500 , match=(reg8[[30..31]] == 0), action=(reg8[[30..31]] = 1; next(pipeline=egress,table=??);)
+  table=??(ls_out_acl_action ), priority=500 , match=(reg8[[30..31]] == 1), action=(reg8[[30..31]] = 2; next(pipeline=egress,table=??);)
+])
+
+AT_CLEANUP
+])
--
2.45.2
Thanks,
Ales
--
Ales Musil
Senior Software Engineer - OVN Core
Red Hat EMEA
[email protected]
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev