On Tue, Dec 13, 2022 at 5:03 AM Dumitru Ceara <[email protected]> wrote:
>
> On 12/13/22 07:17, Han Zhou wrote:
> > On Mon, Dec 12, 2022 at 8:52 AM Numan Siddique <[email protected]> wrote:
> >>
> >> On Thu, Dec 8, 2022 at 12:15 PM Lorenzo Bianconi
> >> <[email protected]> wrote:
> >>>
> >>> In the current codebase ct_commit {} action clears ct_state metadata
of
> >>> the incoming packet. This behaviour introduces an issue if we need to
> >>> check the connection tracking state in the subsequent pipeline stages,
> >>> e.g. for hairpin traffic:
> >>>
> >>> table=14(ls_in_pre_hairpin  ), priority=100  , match=(ip && ct.trk),
> > action=(reg0[6] = chk_lb_hairpin(); reg0[12] = chk_lb_hairpin_reply();
> > next;)
> >>>
> >>> Fix the issue introducing ct_commit_continue action used to allow the
ct
> >>> packet to proceed in the pipeline instead of the original one.
> >>>
> >>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2103086
> >>> Signed-off-by: Lorenzo Bianconi <[email protected]>
> >>
> >> I've a few comments.
> >>
> >> Since the OVS action ct() supports specifying the recirculation table
> >> number,  I think we can do something similar in OVN too
> >> instead of just restricting to the next table.
> >>
> >> I'd suggest something similar on the v1 of this patch, but instead of
> >> the next flag,  ovn-northd can specify the table number.
> >>
> >> Something like   - ct_commit {..., table=x}  where 'x' is a valid
> >> table number in the stage where this action is used i.e restrict the
> >> jump to the table within the
> >> ingress or egress pipeline but not from ingress to egress or from
> >> egress to ingress.
> >>
> >> ovn-northd can use the same feature flag approach used in this patch
> >> to include this 'table' option.
> >>
> >> Also I'd suggest making use of this new version of ct_commit in all
> >> the places and not just in the specific ones.
> >>
> >> Later if there is a requirement to skip a few stages after committing
> >> the packet to conntrack we can do so without the need to add a new
> >> action or modify ct_commit.
> >>
> >> @Dumitru Ceara @Mark Michelson  @Han Zhou  Do you have any thoughts ?
> >> or objections ?
>
> Hi all,
>
> >
> > Hi Numan, Lorenzo, I agree with Numan that adding the table=x in
ct_commit
> > action is more flexible, but I have a different concern. Either the
> > ct_commit_continue implemented by this patch or the ct_commit
{...,table=x}
> > proposed by Numan would introduce an extra recirculation, which has a
> > significant cost for data-plane. Since apply-after-lb is now frequently
> > used for ACLs that implement egress network policies, it is better to
avoid
> > this penalty if possible. Another option to solve the problem is to move
> > the ls_in_stateful stage after the hairpin stages, and keep the current
> > ct_commit. Would that work?
> >
>
> I think you're right, Han.  We seem to add quite a few extra
recirculations.
> For example, the datapath flows for a SYN packet hitting a load balancer
> (dnats from 66.66.66.66:666 to 42.42.42.2:4242):
>
>
recirc_id(0),in_port(5),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(dst=66.66.66.66,proto=6,frag=no),tcp(dst=666),
packets:1, bytes:66, used:1.180s, flags:.,
actions:ct(zone=6,nat),recirc(0x50)
>
recirc_id(0x50),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(dst=66.66.66.66,proto=6,frag=no),tcp(dst=666),
packets:0, bytes:0, used:never, actions:hash(l4(0)),recirc(0x54)
>
recirc_id(0x54),dp_hash(0x7/0xf),in_port(5),eth(),eth_type(0x0800),ipv4(frag=no),
packets:0, bytes:0, used:never,
actions:ct(commit,zone=6,mark=0x2/0x2,nat(dst=42.42.42.2:4242)),recirc(0x55)
>
recirc_id(0x55),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=00:00:00:00:00:02,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(src=
42.42.42.2/255.255.255.254,dst=42.42.42.2,proto=6,ttl=64,frag=no),
packets:0, bytes:0, used:never,
actions:ct(commit,zone=6,mark=0/0x1,nat(src)),set(eth(src=00:00:00:00:01:00,dst=00:00:00:00:00:01)),set(ipv4(ttl=63)),ct(zone=5,nat),recirc(0x51)
>
recirc_id(0x51),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=00:00:00:00:01:00,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no),
packets:0, bytes:0, used:never,
actions:ct(commit,zone=5,mark=0/0x1,nat(src)),4
>
> Now, with Lorenzo's patch:
>
>
recirc_id(0),in_port(5),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(dst=66.66.66.66,proto=6,frag=no),tcp(dst=666),
packets:1, bytes:66, used:1.893s, flags:.,
actions:ct(zone=6,nat),recirc(0x47)
>
recirc_id(0x47),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(dst=66.66.66.66,proto=6,frag=no),tcp(dst=666),
packets:0, bytes:0, used:never, actions:hash(l4(0)),recirc(0x48)
>
recirc_id(0x48),dp_hash(0x1/0xf),in_port(5),eth(),eth_type(0x0800),ipv4(frag=no),
packets:0, bytes:0, used:never,
actions:ct(commit,zone=6,mark=0x2/0x2,nat(dst=42.42.42.2:4242)),recirc(0x49)
> recirc_id(0x49),in_port(5),eth(),eth_type(0x0800),ipv4(frag=no),
packets:0, bytes:0, used:never,
actions:ct(commit,zone=6,mark=0/0x1,nat(src)),recirc(0x4a)
>
recirc_id(0x4a),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0x2/0x3),eth(src=00:00:00:00:00:02,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(src=42.42.42.3,proto=6,frag=no),
packets:0, bytes:0, used:never,
actions:ct(commit,zone=6,mark=0/0x1,nat(src)),recirc(0x4b)
>
recirc_id(0x4b),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=00:00:00:00:00:02,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(src=
42.42.42.2/255.255.255.254,dst=42.42.42.2,proto=6,ttl=64,frag=no),
packets:0, bytes:0, used:never,
actions:ct_clear,set(eth(src=00:00:00:00:01:00,dst=00:00:00:00:00:01)),set(ipv4(ttl=63)),ct(zone=5,nat),recirc(0x4c)
>
recirc_id(0x4c),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=00:00:00:00:01:00),eth_type(0x0800),ipv4(frag=no),
packets:0, bytes:0, used:never,
actions:ct(commit,zone=5,mark=0/0x1,nat(src)),recirc(0x4d)
>
recirc_id(0x4d),in_port(5),eth(dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no),
packets:0, bytes:0, used:never, actions:4
>
> I also think you're right about swapping the stages.  But because hairpin
> is part of load balancing shouldn't the stage order actually be as below?

Yes, we can move both ls_in_acl_after_lb and ls_in_stateful, to keep LB
related stages together.

Thanks,
Han

>
>     /* All LB related stuff grouped together. */
>     PIPELINE_STAGE(SWITCH, IN,  LB_AFF_CHECK,  11, "ls_in_lb_aff_check")
 \
>     PIPELINE_STAGE(SWITCH, IN,  LB,            12, "ls_in_lb")
 \
>     PIPELINE_STAGE(SWITCH, IN,  LB_AFF_LEARN,  13, "ls_in_lb_aff_learn")
 \
>     PIPELINE_STAGE(SWITCH, IN,  PRE_HAIRPIN,   14, "ls_in_pre_hairpin")
\
>     PIPELINE_STAGE(SWITCH, IN,  NAT_HAIRPIN,   15, "ls_in_nat_hairpin")
\
>     PIPELINE_STAGE(SWITCH, IN,  HAIRPIN,       16, "ls_in_hairpin")
\
>     /* All ACL related stuff grouped together. */
>     PIPELINE_STAGE(SWITCH, IN,  ACL_AFTER_LB,  17, "ls_in_acl_after_lb")
 \
>     PIPELINE_STAGE(SWITCH, IN,  STATEFUL,      18, "ls_in_stateful")
 \
>
> This has to be properly tested though.
>
> Regards,
> Dumitru
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to