On Tue, Dec 13, 2022 at 1:20 PM Lorenzo Bianconi
<[email protected]> wrote:
>
> > On Tue, Dec 13, 2022 at 5:03 AM Dumitru Ceara <[email protected]> wrote:
> > >
> > > On 12/13/22 07:17, Han Zhou wrote:
> > > > On Mon, Dec 12, 2022 at 8:52 AM Numan Siddique <[email protected]> wrote:
> > > >>
> > > >> On Thu, Dec 8, 2022 at 12:15 PM Lorenzo Bianconi
> > > >> <[email protected]> wrote:
> > > >>>
> > > >>> In the current codebase ct_commit {} action clears ct_state metadata
> > of
> > > >>> the incoming packet. This behaviour introduces an issue if we need to
> > > >>> check the connection tracking state in the subsequent pipeline stages,
> > > >>> e.g. for hairpin traffic:
> > > >>>
> > > >>> table=14(ls_in_pre_hairpin  ), priority=100  , match=(ip && ct.trk),
> > > > action=(reg0[6] = chk_lb_hairpin(); reg0[12] = chk_lb_hairpin_reply();
> > > > next;)
> > > >>>
> > > >>> Fix the issue introducing ct_commit_continue action used to allow the
> > ct
> > > >>> packet to proceed in the pipeline instead of the original one.
> > > >>>
> > > >>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2103086
> > > >>> Signed-off-by: Lorenzo Bianconi <[email protected]>
> > > >>
> > > >> I've a few comments.
> > > >>
> > > >> Since the OVS action ct() supports specifying the recirculation table
> > > >> number,  I think we can do something similar in OVN too
> > > >> instead of just restricting to the next table.
> > > >>
> > > >> I'd suggest something similar on the v1 of this patch, but instead of
> > > >> the next flag,  ovn-northd can specify the table number.
> > > >>
> > > >> Something like   - ct_commit {..., table=x}  where 'x' is a valid
> > > >> table number in the stage where this action is used i.e restrict the
> > > >> jump to the table within the
> > > >> ingress or egress pipeline but not from ingress to egress or from
> > > >> egress to ingress.
> > > >>
> > > >> ovn-northd can use the same feature flag approach used in this patch
> > > >> to include this 'table' option.
> > > >>
> > > >> Also I'd suggest making use of this new version of ct_commit in all
> > > >> the places and not just in the specific ones.
> > > >>
> > > >> Later if there is a requirement to skip a few stages after committing
> > > >> the packet to conntrack we can do so without the need to add a new
> > > >> action or modify ct_commit.
> > > >>
> > > >> @Dumitru Ceara @Mark Michelson  @Han Zhou  Do you have any thoughts ?
> > > >> or objections ?
> > >
> > > Hi all,
> > >
> > > >
> > > > Hi Numan, Lorenzo, I agree with Numan that adding the table=x in
> > ct_commit
> > > > action is more flexible, but I have a different concern. Either the
> > > > ct_commit_continue implemented by this patch or the ct_commit
> > {...,table=x}
> > > > proposed by Numan would introduce an extra recirculation, which has a
> > > > significant cost for data-plane. Since apply-after-lb is now frequently
> > > > used for ACLs that implement egress network policies, it is better to
> > avoid
> > > > this penalty if possible. Another option to solve the problem is to move
> > > > the ls_in_stateful stage after the hairpin stages, and keep the current
> > > > ct_commit. Would that work?
> > > >

Thanks for bringing this very important point.  I was under the
(wrong) impression that
the there will be a  recirculation after ct_commit regardless of OVN setting the
next table id or not.

Thanks
Numan

> > >
> > > I think you're right, Han.  We seem to add quite a few extra
> > recirculations.
> > > For example, the datapath flows for a SYN packet hitting a load balancer
> > > (dnats from 66.66.66.66:666 to 42.42.42.2:4242):
> > >
> > >
> > recirc_id(0),in_port(5),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(dst=66.66.66.66,proto=6,frag=no),tcp(dst=666),
> > packets:1, bytes:66, used:1.180s, flags:.,
> > actions:ct(zone=6,nat),recirc(0x50)
> > >
> > recirc_id(0x50),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(dst=66.66.66.66,proto=6,frag=no),tcp(dst=666),
> > packets:0, bytes:0, used:never, actions:hash(l4(0)),recirc(0x54)
> > >
> > recirc_id(0x54),dp_hash(0x7/0xf),in_port(5),eth(),eth_type(0x0800),ipv4(frag=no),
> > packets:0, bytes:0, used:never,
> > actions:ct(commit,zone=6,mark=0x2/0x2,nat(dst=42.42.42.2:4242)),recirc(0x55)
> > >
> > recirc_id(0x55),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=00:00:00:00:00:02,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(src=
> > 42.42.42.2/255.255.255.254,dst=42.42.42.2,proto=6,ttl=64,frag=no),
> > packets:0, bytes:0, used:never,
> > actions:ct(commit,zone=6,mark=0/0x1,nat(src)),set(eth(src=00:00:00:00:01:00,dst=00:00:00:00:00:01)),set(ipv4(ttl=63)),ct(zone=5,nat),recirc(0x51)
> > >
> > recirc_id(0x51),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=00:00:00:00:01:00,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no),
> > packets:0, bytes:0, used:never,
> > actions:ct(commit,zone=5,mark=0/0x1,nat(src)),4
> > >
> > > Now, with Lorenzo's patch:
> > >
> > >
> > recirc_id(0),in_port(5),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(dst=66.66.66.66,proto=6,frag=no),tcp(dst=666),
> > packets:1, bytes:66, used:1.893s, flags:.,
> > actions:ct(zone=6,nat),recirc(0x47)
> > >
> > recirc_id(0x47),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(dst=66.66.66.66,proto=6,frag=no),tcp(dst=666),
> > packets:0, bytes:0, used:never, actions:hash(l4(0)),recirc(0x48)
> > >
> > recirc_id(0x48),dp_hash(0x1/0xf),in_port(5),eth(),eth_type(0x0800),ipv4(frag=no),
> > packets:0, bytes:0, used:never,
> > actions:ct(commit,zone=6,mark=0x2/0x2,nat(dst=42.42.42.2:4242)),recirc(0x49)
> > > recirc_id(0x49),in_port(5),eth(),eth_type(0x0800),ipv4(frag=no),
> > packets:0, bytes:0, used:never,
> > actions:ct(commit,zone=6,mark=0/0x1,nat(src)),recirc(0x4a)
> > >
> > recirc_id(0x4a),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0x2/0x3),eth(src=00:00:00:00:00:02,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(src=42.42.42.3,proto=6,frag=no),
> > packets:0, bytes:0, used:never,
> > actions:ct(commit,zone=6,mark=0/0x1,nat(src)),recirc(0x4b)
> > >
> > recirc_id(0x4b),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=00:00:00:00:00:02,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(src=
> > 42.42.42.2/255.255.255.254,dst=42.42.42.2,proto=6,ttl=64,frag=no),
> > packets:0, bytes:0, used:never,
> > actions:ct_clear,set(eth(src=00:00:00:00:01:00,dst=00:00:00:00:00:01)),set(ipv4(ttl=63)),ct(zone=5,nat),recirc(0x4c)
> > >
> > recirc_id(0x4c),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=00:00:00:00:01:00),eth_type(0x0800),ipv4(frag=no),
> > packets:0, bytes:0, used:never,
> > actions:ct(commit,zone=5,mark=0/0x1,nat(src)),recirc(0x4d)
> > >
> > recirc_id(0x4d),in_port(5),eth(dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no),
> > packets:0, bytes:0, used:never, actions:4
> > >
> > > I also think you're right about swapping the stages.  But because hairpin
> > > is part of load balancing shouldn't the stage order actually be as below?
> >
> > Yes, we can move both ls_in_acl_after_lb and ls_in_stateful, to keep LB
> > related stages together.
>
> ack, I will look into it.
>
> Regards,
> Lorenzo
>
> >
> > Thanks,
> > Han
> >
> > >
> > >     /* All LB related stuff grouped together. */
> > >     PIPELINE_STAGE(SWITCH, IN,  LB_AFF_CHECK,  11, "ls_in_lb_aff_check")
> >  \
> > >     PIPELINE_STAGE(SWITCH, IN,  LB,            12, "ls_in_lb")
> >  \
> > >     PIPELINE_STAGE(SWITCH, IN,  LB_AFF_LEARN,  13, "ls_in_lb_aff_learn")
> >  \
> > >     PIPELINE_STAGE(SWITCH, IN,  PRE_HAIRPIN,   14, "ls_in_pre_hairpin")
> > \
> > >     PIPELINE_STAGE(SWITCH, IN,  NAT_HAIRPIN,   15, "ls_in_nat_hairpin")
> > \
> > >     PIPELINE_STAGE(SWITCH, IN,  HAIRPIN,       16, "ls_in_hairpin")
> > \
> > >     /* All ACL related stuff grouped together. */
> > >     PIPELINE_STAGE(SWITCH, IN,  ACL_AFTER_LB,  17, "ls_in_acl_after_lb")
> >  \
> > >     PIPELINE_STAGE(SWITCH, IN,  STATEFUL,      18, "ls_in_stateful")
> >  \
> > >
> > > This has to be properly tested though.
> > >
> > > Regards,
> > > Dumitru
> > >
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to