Re-adding the list and people from CC. On 6/27/21 8:16 AM, 贺鹏 wrote: > Hi, Ilya, > > sorry for replying so late, being really busy this month.
Busy time for everyone. BTW, I'm taking some time off next week. Maybe someone else will be able to take a look at the changes while I'm not here. My replies inline. > I've read through your comments. I personally prefer to keep the > number of megaflows as few as possible, as megaflow is becoming > expensive as the size of flow struct is large. > below are my experiments. > > Ilya Maximets <[email protected]> 于2021年5月22日周六 上午3:50写道: >> >> On 5/21/21 5:00 AM, 贺鹏 wrote: >>> Hi, Ilya >>> >>> >>> >>> Ilya Maximets <[email protected]> 于2021年5月19日周三 下午8:50写道: >>>> >>>> On 2/27/21 10:34 AM, Peng He wrote: >>>>> CT zone could be set from a field that is not included in frozen >>>>> metadata. Consider the example rules which are typically seen in >>>>> OpenStack security group rules: >>>>> >>>>> priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0) >>>>> priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2 >>>>> >>>>> The zone is set from the first rule's ct action. These two rules will >>>>> generate two megaflows: the first one uses zone=5 to query the CT module, >>>>> the second one sets the zone-id from the first megaflow and commit to CT. >>>>> >>>>> The current implementation will generate a megaflow that does not use >>>>> ct_zone=5 as a match, but directly commit into the ct using zone=5, as >>>>> zone is >>>>> set by an Imm not a field. >>>>> >>>>> Consider a situation that one changes the zone id (for example to 15) >>>>> in the first rule, however, still keep the second rule unchanged. During >>>>> this change, there is traffic hitting the two generated megaflows, the >>>>> revaldiator would revalidate all megaflows, however, the revalidator will >>>>> not change the second megaflow, because zone=5 is recorded in the >>>>> megaflow, so the xlate will still translate the commit action into zone=5, >>>>> and the new traffic will still commit to CT as zone=5, not zone=15, >>>>> resulting in taffic drops and other issues. >>>>> >>>>> Just like OVS set-field convention, if a field X is set by Y >>>>> (Y is a variable not an Imm), we should also mask Y as a match >>>>> in the generated megaflow. An exception is that if the zone-id is >>>>> set by the field that is included in the frozen state (i.e. regs) and this >>>>> upcall is a resume of a thawed xlate, the un-wildcarding can be skipped, >>>>> as the recirc_id is a hash of the values in these fields, and it will >>>>> change >>>>> following the changes of these fields. When the recirc_id changes, >>>>> all megaflows with the old recirc id will be invalid later. >>>>> >>>>> Fixes: 07659514c3 ("Add support for connection tracking.") >>>>> Reported-by: Sai Su <[email protected]> >>>>> Signed-off-by: Peng He <[email protected]> >>>>> --- >>>>> include/openvswitch/meta-flow.h | 1 + >>>>> lib/meta-flow.c | 13 ++++++++++ >>>>> ofproto/ofproto-dpif-xlate.c | 12 +++++++++ >>>>> tests/system-traffic.at | 45 +++++++++++++++++++++++++++++++++ >>>>> 4 files changed, 71 insertions(+) >>>>> >>>>> diff --git a/include/openvswitch/meta-flow.h >>>>> b/include/openvswitch/meta-flow.h >>>>> index 95e52e358..045dce8f5 100644 >>>>> --- a/include/openvswitch/meta-flow.h >>>>> +++ b/include/openvswitch/meta-flow.h >>>>> @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field >>>>> *, >>>>> const union mf_value *mask, >>>>> struct flow *); >>>>> bool mf_is_tun_metadata(const struct mf_field *); >>>>> +bool mf_is_frozen_metadata(const struct mf_field *); >>>>> bool mf_is_pipeline_field(const struct mf_field *); >>>>> bool mf_is_set(const struct mf_field *, const struct flow *); >>>>> void mf_mask_field(const struct mf_field *, struct flow_wildcards *); >>>>> diff --git a/lib/meta-flow.c b/lib/meta-flow.c >>>>> index c808d205d..e03cd8d0c 100644 >>>>> --- a/lib/meta-flow.c >>>>> +++ b/lib/meta-flow.c >>>>> @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf) >>>>> mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS; >>>>> } >>>>> >>>>> +bool >>>>> +mf_is_frozen_metadata(const struct mf_field *mf) >>>>> +{ >>>>> + if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) { >>>>> + return true; >>>>> + } >>>>> + >>>>> + if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) { >>>>> + return true; >>>>> + } >>>>> + return false; >>>>> +} >>>>> + >>>>> bool >>>>> mf_is_pipeline_field(const struct mf_field *mf) >>>>> { >>>>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c >>>>> index 7108c8a30..14d00db1e 100644 >>>>> --- a/ofproto/ofproto-dpif-xlate.c >>>>> +++ b/ofproto/ofproto-dpif-xlate.c >>>>> @@ -6195,6 +6195,18 @@ compose_conntrack_action(struct xlate_ctx *ctx, >>>>> struct ofpact_conntrack *ofc, >>>>> >>>>> if (ofc->zone_src.field) { >>>>> zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow); >>>>> + if (ctx->xin->frozen_state) { >>>>> + /* If the upcall is a resume of a recirculation, we only >>>>> need to >>>>> + * unwildcard the fields that are not in the >>>>> frozen_metadata, as >>>>> + * when the rules update, OVS will generate a new recirc_id, >>>>> + * which will invalidate the megaflow with old the recirc_id. >>>>> + */ >>>>> + if (!mf_is_frozen_metadata(ofc->zone_src.field)) { >>>>> + mf_mask_field(ofc->zone_src.field, ctx->wc); >>>>> + } >>>>> + } else { >>>>> + mf_mask_field(ofc->zone_src.field, ctx->wc); >>>>> + } >>>> >>>> IIUC, in current code above block is equal to just a single line: >>>> >>>> mf_mask_field(ofc->zone_src.field, ctx->wc); >>>> >>>> That is because zone is not part of a frozen metadata, right? >>> >>> Yes, basically this is the reason. >>> I think the original code only considers the case that the zone id is >>> set by IMM, not >>> by a reg. >>> >>>> >>>> Can we just remove all this extra logic and unwildcard unconditionally? >>>> I understand that this code might save us one match in case someday >>>> zone will be part of a frozen metadata, but is that really important? >>>> Is there any harm in unwildcarding this field unconditionally? >>> >>> I think the code can reduce the number of megaflows in case that the rule >>> sets the zone-id using IMM. But I think unconditionally un-wildcarding is >>> also >>> OK. >> >> From what I managed to test, if value is taken from any register >> populated by Imm, there are no extra matches in datapath flows. >> Maybe I'm missing some cases, though. >> >> I tried to experiment a bit with this code and the test case. >> First thing that I found is that we, probably, need to use >> mf_write_subfield_flow() instead of mf_mask_field(), because >> we should not unwildcard the whole field if only part of it >> used. >> >> Second thing is that I tried to write following set of flows >> that might not make much sense, but should probably work fine: >> >> priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0,commit,exec(load:0xffff0005->NXM_NX_CT_LABEL[[0..31]])) >> priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_LABEL[[0..15]]),2 >> >> And what I see is that in resulted datapath flows there is no >> match on ct_label. This happens because masks of ct_label and >> ct_mark gets saved at the beginning of the function and restored >> later, i.e. our un-wildcarding is lost. > > Yes, good catch! > > Yes, I am using the belowing code: > > if (ofc->zone_src.field) { > zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow); > + union mf_subvalue value; > + memset(&value, 0xff, sizeof(value)); > if (ctx->xin->frozen_state) { > /* If the upcall is a resume of a recirculation, we only need to > * unwildcard the fields that are not in the frozen_metadata, as > @@ -6202,15 +6189,30 @@ compose_conntrack_action(struct xlate_ctx > *ctx, struct ofpact_conntrack *ofc, > * which will invalidate the megaflow with old the recirc_id. > */ > if (!mf_is_frozen_metadata(ofc->zone_src.field)) { > - mf_mask_field(ofc->zone_src.field, ctx->wc); > + mf_write_subfield_flow(&ofc->zone_src, &value, > &ctx->wc->masks); > } > } else { > - mf_mask_field(ofc->zone_src.field, ctx->wc); > + mf_write_subfield_flow(&ofc->zone_src, &value, &ctx->wc->masks); > } > } else { > zone = ofc->zone_imm; > } > > Not sure if it fits the idiom usage of mf APIs. I'm not an expert in this part of the code, but this seems fine to me. I would re-order the definition part though, oterwise it's hard to read. Something like: if (ofc->zone_src.field) { union mf_subvalue value; memset(&value, 0xff, sizeof(value)); zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow); if (ctx->xin->frozen_state) { ... > >> >> The third thing is that if I'm un-wildcarding before saving or >> after restoring, I see a bit weird datapath flow after modification >> of the first flow to: >> >> priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=15,table=0,commit,exec(load:0xffff000f->NXM_NX_CT_LABEL[[0..31]])) >> >> There is a new datapath flow with a correct ct_label match, but >> the old flow gets updated to have ct_label match on 0. This might >> make some sense for this set of flows, but I'm not sure. >> >> Before flow-mod: >> >> recirc_id(0),in_port(ovs-p0),ct_state(-trk),eth(),eth_type(0x0800),ipv4(proto=6,frag=no), >> actions:ct(commit,zone=5,label=0xffff0005/0xffffffff),recirc(0x1) >> >> recirc_id(0x1),in_port(ovs-p0),ct_state(+trk),ct_label(0x5/0xffff),eth(),eth_type(0x0800),ipv4(proto=6,frag=no), >> actions:ct(commit,zone=5,label=0x5/0xffff),ovs-p1 >> >> After: >> >> recirc_id(0),in_port(ovs-p0),ct_state(-trk),eth(),eth_type(0x0800),ipv4(proto=6,frag=no), >> actions:ct(commit,zone=15,label=0xffff000f/0xffffffff),recirc(0x1) >> >> recirc_id(0x1),in_port(ovs-p0),ct_state(+trk),ct_label(0/0xffff),eth(),eth_type(0x0800),ipv4(proto=6,frag=no), >> actions:ct(commit,label=0/0xffff),ovs-p1 >> >> recirc_id(0x1),in_port(ovs-p0),ct_state(+trk),ct_label(0xf/0xffff),eth(),eth_type(0x0800),ipv4(proto=6,frag=no), >> actions:ct(commit,zone=15,label=0xf/0xffff),ovs-p1 >> > > I do not encounter such an issue, my changes are here: Hmm. Maybe I did something wrong. Could you prepare v2, so we can continue discussion from there? Best regards, Ilya Maximets. > > compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc, > bool is_last_action) > { > - ovs_u128 old_ct_label_mask = ctx->wc->masks.ct_label; > - uint32_t old_ct_mark_mask = ctx->wc->masks.ct_mark; > - size_t ct_offset; > uint16_t zone; > - > uint16_t zone; > - > - /* Ensure that any prior actions are applied before composing the new > - * conntrack action. */ > - xlate_commit_actions(ctx); > - > - /* Process nested actions first, to populate the key. */ > - ctx->ct_nat_action = NULL; > - ctx->wc->masks.ct_mark = 0; > - ctx->wc->masks.ct_label = OVS_U128_ZERO; > - do_xlate_actions(ofc->actions, ofpact_ct_get_action_len(ofc), ctx, > - is_last_action, false); > - > if (ofc->zone_src.field) { > zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow); > + union mf_subvalue value; > + memset(&value, 0xff, sizeof(value)); > if (ctx->xin->frozen_state) { > /* If the upcall is a resume of a recirculation, we only need to > * unwildcard the fields that are not in the frozen_metadata, as > @@ -6202,15 +6189,30 @@ compose_conntrack_action(struct xlate_ctx > *ctx, struct ofpact_conntrack *ofc, > * which will invalidate the megaflow with old the recirc_id. > */ > if (!mf_is_frozen_metadata(ofc->zone_src.field)) { > - mf_mask_field(ofc->zone_src.field, ctx->wc); > + mf_write_subfield_flow(&ofc->zone_src, &value, > &ctx->wc->masks); > } > } else { > - mf_mask_field(ofc->zone_src.field, ctx->wc); > + mf_write_subfield_flow(&ofc->zone_src, &value, &ctx->wc->masks); > } > } else { > zone = ofc->zone_imm; > } > > + size_t ct_offset; > + ovs_u128 old_ct_label_mask = ctx->wc->masks.ct_label; > + uint32_t old_ct_mark_mask = ctx->wc->masks.ct_mark; > + /* Ensure that any prior actions are applied before composing the new > + * conntrack action. */ > + xlate_commit_actions(ctx); > + > + /* Process nested actions first, to populate the key. */ > + ctx->ct_nat_action = NULL; > + ctx->wc->masks.ct_mark = 0; > + ctx->wc->masks.ct_label = OVS_U128_ZERO; > + do_xlate_actions(ofc->actions, ofpact_ct_get_action_len(ofc), ctx, > + is_last_action, false); > + > + > > and my test is OK. > root@debian:~/ovs# ovs-appctl dpctl/dump-flows | grep label > ct_state(+trk),ct_label(0xffff0005/0xffff),recirc_id(0x2),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no), > packets:5, bytes:465, used:1.323s, flags:FP., > actions:ct(commit,zone=5),3 > ct_state(-trk),recirc_id(0),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no), > packets:11, bytes:1004, used:1.293s, flags:SFP., > actions:ct(commit,zone=15,label=0xffff000f/0xffffffff),recirc(0x2) > ct_state(+trk),ct_label(0xffff000f/0xffff),recirc_id(0x2),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no), > packets:5, bytes:465, used:1.293s, flags:FP., > actions:ct(commit,zone=15),3 > > > >> Best regards, Ilya Maximets. > > > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
