> On Tue, Dec 13, 2022 at 5:03 AM Dumitru Ceara <[email protected]> wrote:
> >
> > On 12/13/22 07:17, Han Zhou wrote:
> > > On Mon, Dec 12, 2022 at 8:52 AM Numan Siddique <[email protected]> wrote:
> > >>
> > >> On Thu, Dec 8, 2022 at 12:15 PM Lorenzo Bianconi
> > >> <[email protected]> wrote:
> > >>>
> > >>> In the current codebase ct_commit {} action clears ct_state metadata
> of
> > >>> the incoming packet. This behaviour introduces an issue if we need to
> > >>> check the connection tracking state in the subsequent pipeline stages,
> > >>> e.g. for hairpin traffic:
> > >>>
> > >>> table=14(ls_in_pre_hairpin  ), priority=100  , match=(ip && ct.trk),
> > > action=(reg0[6] = chk_lb_hairpin(); reg0[12] = chk_lb_hairpin_reply();
> > > next;)
> > >>>
> > >>> Fix the issue introducing ct_commit_continue action used to allow the
> ct
> > >>> packet to proceed in the pipeline instead of the original one.
> > >>>
> > >>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2103086
> > >>> Signed-off-by: Lorenzo Bianconi <[email protected]>
> > >>
> > >> I've a few comments.
> > >>
> > >> Since the OVS action ct() supports specifying the recirculation table
> > >> number,  I think we can do something similar in OVN too
> > >> instead of just restricting to the next table.
> > >>
> > >> I'd suggest something similar on the v1 of this patch, but instead of
> > >> the next flag,  ovn-northd can specify the table number.
> > >>
> > >> Something like   - ct_commit {..., table=x}  where 'x' is a valid
> > >> table number in the stage where this action is used i.e restrict the
> > >> jump to the table within the
> > >> ingress or egress pipeline but not from ingress to egress or from
> > >> egress to ingress.
> > >>
> > >> ovn-northd can use the same feature flag approach used in this patch
> > >> to include this 'table' option.
> > >>
> > >> Also I'd suggest making use of this new version of ct_commit in all
> > >> the places and not just in the specific ones.
> > >>
> > >> Later if there is a requirement to skip a few stages after committing
> > >> the packet to conntrack we can do so without the need to add a new
> > >> action or modify ct_commit.
> > >>
> > >> @Dumitru Ceara @Mark Michelson  @Han Zhou  Do you have any thoughts ?
> > >> or objections ?
> >
> > Hi all,
> >
> > >
> > > Hi Numan, Lorenzo, I agree with Numan that adding the table=x in
> ct_commit
> > > action is more flexible, but I have a different concern. Either the
> > > ct_commit_continue implemented by this patch or the ct_commit
> {...,table=x}
> > > proposed by Numan would introduce an extra recirculation, which has a
> > > significant cost for data-plane. Since apply-after-lb is now frequently
> > > used for ACLs that implement egress network policies, it is better to
> avoid
> > > this penalty if possible. Another option to solve the problem is to move
> > > the ls_in_stateful stage after the hairpin stages, and keep the current
> > > ct_commit. Would that work?
> > >
> >
> > I think you're right, Han.  We seem to add quite a few extra
> recirculations.
> > For example, the datapath flows for a SYN packet hitting a load balancer
> > (dnats from 66.66.66.66:666 to 42.42.42.2:4242):
> >
> >
> recirc_id(0),in_port(5),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(dst=66.66.66.66,proto=6,frag=no),tcp(dst=666),
> packets:1, bytes:66, used:1.180s, flags:.,
> actions:ct(zone=6,nat),recirc(0x50)
> >
> recirc_id(0x50),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(dst=66.66.66.66,proto=6,frag=no),tcp(dst=666),
> packets:0, bytes:0, used:never, actions:hash(l4(0)),recirc(0x54)
> >
> recirc_id(0x54),dp_hash(0x7/0xf),in_port(5),eth(),eth_type(0x0800),ipv4(frag=no),
> packets:0, bytes:0, used:never,
> actions:ct(commit,zone=6,mark=0x2/0x2,nat(dst=42.42.42.2:4242)),recirc(0x55)
> >
> recirc_id(0x55),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=00:00:00:00:00:02,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(src=
> 42.42.42.2/255.255.255.254,dst=42.42.42.2,proto=6,ttl=64,frag=no),
> packets:0, bytes:0, used:never,
> actions:ct(commit,zone=6,mark=0/0x1,nat(src)),set(eth(src=00:00:00:00:01:00,dst=00:00:00:00:00:01)),set(ipv4(ttl=63)),ct(zone=5,nat),recirc(0x51)
> >
> recirc_id(0x51),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=00:00:00:00:01:00,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no),
> packets:0, bytes:0, used:never,
> actions:ct(commit,zone=5,mark=0/0x1,nat(src)),4
> >
> > Now, with Lorenzo's patch:
> >
> >
> recirc_id(0),in_port(5),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(dst=66.66.66.66,proto=6,frag=no),tcp(dst=666),
> packets:1, bytes:66, used:1.893s, flags:.,
> actions:ct(zone=6,nat),recirc(0x47)
> >
> recirc_id(0x47),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(dst=66.66.66.66,proto=6,frag=no),tcp(dst=666),
> packets:0, bytes:0, used:never, actions:hash(l4(0)),recirc(0x48)
> >
> recirc_id(0x48),dp_hash(0x1/0xf),in_port(5),eth(),eth_type(0x0800),ipv4(frag=no),
> packets:0, bytes:0, used:never,
> actions:ct(commit,zone=6,mark=0x2/0x2,nat(dst=42.42.42.2:4242)),recirc(0x49)
> > recirc_id(0x49),in_port(5),eth(),eth_type(0x0800),ipv4(frag=no),
> packets:0, bytes:0, used:never,
> actions:ct(commit,zone=6,mark=0/0x1,nat(src)),recirc(0x4a)
> >
> recirc_id(0x4a),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0x2/0x3),eth(src=00:00:00:00:00:02,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(src=42.42.42.3,proto=6,frag=no),
> packets:0, bytes:0, used:never,
> actions:ct(commit,zone=6,mark=0/0x1,nat(src)),recirc(0x4b)
> >
> recirc_id(0x4b),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=00:00:00:00:00:02,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(src=
> 42.42.42.2/255.255.255.254,dst=42.42.42.2,proto=6,ttl=64,frag=no),
> packets:0, bytes:0, used:never,
> actions:ct_clear,set(eth(src=00:00:00:00:01:00,dst=00:00:00:00:00:01)),set(ipv4(ttl=63)),ct(zone=5,nat),recirc(0x4c)
> >
> recirc_id(0x4c),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=00:00:00:00:01:00),eth_type(0x0800),ipv4(frag=no),
> packets:0, bytes:0, used:never,
> actions:ct(commit,zone=5,mark=0/0x1,nat(src)),recirc(0x4d)
> >
> recirc_id(0x4d),in_port(5),eth(dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no),
> packets:0, bytes:0, used:never, actions:4
> >
> > I also think you're right about swapping the stages.  But because hairpin
> > is part of load balancing shouldn't the stage order actually be as below?
> 
> Yes, we can move both ls_in_acl_after_lb and ls_in_stateful, to keep LB
> related stages together.

ack, I will look into it.

Regards,
Lorenzo

> 
> Thanks,
> Han
> 
> >
> >     /* All LB related stuff grouped together. */
> >     PIPELINE_STAGE(SWITCH, IN,  LB_AFF_CHECK,  11, "ls_in_lb_aff_check")
>  \
> >     PIPELINE_STAGE(SWITCH, IN,  LB,            12, "ls_in_lb")
>  \
> >     PIPELINE_STAGE(SWITCH, IN,  LB_AFF_LEARN,  13, "ls_in_lb_aff_learn")
>  \
> >     PIPELINE_STAGE(SWITCH, IN,  PRE_HAIRPIN,   14, "ls_in_pre_hairpin")
> \
> >     PIPELINE_STAGE(SWITCH, IN,  NAT_HAIRPIN,   15, "ls_in_nat_hairpin")
> \
> >     PIPELINE_STAGE(SWITCH, IN,  HAIRPIN,       16, "ls_in_hairpin")
> \
> >     /* All ACL related stuff grouped together. */
> >     PIPELINE_STAGE(SWITCH, IN,  ACL_AFTER_LB,  17, "ls_in_acl_after_lb")
>  \
> >     PIPELINE_STAGE(SWITCH, IN,  STATEFUL,      18, "ls_in_stateful")
>  \
> >
> > This has to be properly tested though.
> >
> > Regards,
> > Dumitru
> >
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to