On Tue, Dec 13, 2022 at 1:20 PM Lorenzo Bianconi <[email protected]> wrote: > > > On Tue, Dec 13, 2022 at 5:03 AM Dumitru Ceara <[email protected]> wrote: > > > > > > On 12/13/22 07:17, Han Zhou wrote: > > > > On Mon, Dec 12, 2022 at 8:52 AM Numan Siddique <[email protected]> wrote: > > > >> > > > >> On Thu, Dec 8, 2022 at 12:15 PM Lorenzo Bianconi > > > >> <[email protected]> wrote: > > > >>> > > > >>> In the current codebase ct_commit {} action clears ct_state metadata > > of > > > >>> the incoming packet. This behaviour introduces an issue if we need to > > > >>> check the connection tracking state in the subsequent pipeline stages, > > > >>> e.g. for hairpin traffic: > > > >>> > > > >>> table=14(ls_in_pre_hairpin ), priority=100 , match=(ip && ct.trk), > > > > action=(reg0[6] = chk_lb_hairpin(); reg0[12] = chk_lb_hairpin_reply(); > > > > next;) > > > >>> > > > >>> Fix the issue introducing ct_commit_continue action used to allow the > > ct > > > >>> packet to proceed in the pipeline instead of the original one. > > > >>> > > > >>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2103086 > > > >>> Signed-off-by: Lorenzo Bianconi <[email protected]> > > > >> > > > >> I've a few comments. > > > >> > > > >> Since the OVS action ct() supports specifying the recirculation table > > > >> number, I think we can do something similar in OVN too > > > >> instead of just restricting to the next table. > > > >> > > > >> I'd suggest something similar on the v1 of this patch, but instead of > > > >> the next flag, ovn-northd can specify the table number. > > > >> > > > >> Something like - ct_commit {..., table=x} where 'x' is a valid > > > >> table number in the stage where this action is used i.e restrict the > > > >> jump to the table within the > > > >> ingress or egress pipeline but not from ingress to egress or from > > > >> egress to ingress. > > > >> > > > >> ovn-northd can use the same feature flag approach used in this patch > > > >> to include this 'table' option. > > > >> > > > >> Also I'd suggest making use of this new version of ct_commit in all > > > >> the places and not just in the specific ones. > > > >> > > > >> Later if there is a requirement to skip a few stages after committing > > > >> the packet to conntrack we can do so without the need to add a new > > > >> action or modify ct_commit. > > > >> > > > >> @Dumitru Ceara @Mark Michelson @Han Zhou Do you have any thoughts ? > > > >> or objections ? > > > > > > Hi all, > > > > > > > > > > > Hi Numan, Lorenzo, I agree with Numan that adding the table=x in > > ct_commit > > > > action is more flexible, but I have a different concern. Either the > > > > ct_commit_continue implemented by this patch or the ct_commit > > {...,table=x} > > > > proposed by Numan would introduce an extra recirculation, which has a > > > > significant cost for data-plane. Since apply-after-lb is now frequently > > > > used for ACLs that implement egress network policies, it is better to > > avoid > > > > this penalty if possible. Another option to solve the problem is to move > > > > the ls_in_stateful stage after the hairpin stages, and keep the current > > > > ct_commit. Would that work? > > > >
Thanks for bringing this very important point. I was under the (wrong) impression that the there will be a recirculation after ct_commit regardless of OVN setting the next table id or not. Thanks Numan > > > > > > I think you're right, Han. We seem to add quite a few extra > > recirculations. > > > For example, the datapath flows for a SYN packet hitting a load balancer > > > (dnats from 66.66.66.66:666 to 42.42.42.2:4242): > > > > > > > > recirc_id(0),in_port(5),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(dst=66.66.66.66,proto=6,frag=no),tcp(dst=666), > > packets:1, bytes:66, used:1.180s, flags:., > > actions:ct(zone=6,nat),recirc(0x50) > > > > > recirc_id(0x50),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(dst=66.66.66.66,proto=6,frag=no),tcp(dst=666), > > packets:0, bytes:0, used:never, actions:hash(l4(0)),recirc(0x54) > > > > > recirc_id(0x54),dp_hash(0x7/0xf),in_port(5),eth(),eth_type(0x0800),ipv4(frag=no), > > packets:0, bytes:0, used:never, > > actions:ct(commit,zone=6,mark=0x2/0x2,nat(dst=42.42.42.2:4242)),recirc(0x55) > > > > > recirc_id(0x55),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=00:00:00:00:00:02,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(src= > > 42.42.42.2/255.255.255.254,dst=42.42.42.2,proto=6,ttl=64,frag=no), > > packets:0, bytes:0, used:never, > > actions:ct(commit,zone=6,mark=0/0x1,nat(src)),set(eth(src=00:00:00:00:01:00,dst=00:00:00:00:00:01)),set(ipv4(ttl=63)),ct(zone=5,nat),recirc(0x51) > > > > > recirc_id(0x51),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=00:00:00:00:01:00,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no), > > packets:0, bytes:0, used:never, > > actions:ct(commit,zone=5,mark=0/0x1,nat(src)),4 > > > > > > Now, with Lorenzo's patch: > > > > > > > > recirc_id(0),in_port(5),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(dst=66.66.66.66,proto=6,frag=no),tcp(dst=666), > > packets:1, bytes:66, used:1.893s, flags:., > > actions:ct(zone=6,nat),recirc(0x47) > > > > > recirc_id(0x47),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(dst=66.66.66.66,proto=6,frag=no),tcp(dst=666), > > packets:0, bytes:0, used:never, actions:hash(l4(0)),recirc(0x48) > > > > > recirc_id(0x48),dp_hash(0x1/0xf),in_port(5),eth(),eth_type(0x0800),ipv4(frag=no), > > packets:0, bytes:0, used:never, > > actions:ct(commit,zone=6,mark=0x2/0x2,nat(dst=42.42.42.2:4242)),recirc(0x49) > > > recirc_id(0x49),in_port(5),eth(),eth_type(0x0800),ipv4(frag=no), > > packets:0, bytes:0, used:never, > > actions:ct(commit,zone=6,mark=0/0x1,nat(src)),recirc(0x4a) > > > > > recirc_id(0x4a),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0x2/0x3),eth(src=00:00:00:00:00:02,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(src=42.42.42.3,proto=6,frag=no), > > packets:0, bytes:0, used:never, > > actions:ct(commit,zone=6,mark=0/0x1,nat(src)),recirc(0x4b) > > > > > recirc_id(0x4b),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=00:00:00:00:00:02,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(src= > > 42.42.42.2/255.255.255.254,dst=42.42.42.2,proto=6,ttl=64,frag=no), > > packets:0, bytes:0, used:never, > > actions:ct_clear,set(eth(src=00:00:00:00:01:00,dst=00:00:00:00:00:01)),set(ipv4(ttl=63)),ct(zone=5,nat),recirc(0x4c) > > > > > recirc_id(0x4c),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=00:00:00:00:01:00),eth_type(0x0800),ipv4(frag=no), > > packets:0, bytes:0, used:never, > > actions:ct(commit,zone=5,mark=0/0x1,nat(src)),recirc(0x4d) > > > > > recirc_id(0x4d),in_port(5),eth(dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no), > > packets:0, bytes:0, used:never, actions:4 > > > > > > I also think you're right about swapping the stages. But because hairpin > > > is part of load balancing shouldn't the stage order actually be as below? > > > > Yes, we can move both ls_in_acl_after_lb and ls_in_stateful, to keep LB > > related stages together. > > ack, I will look into it. > > Regards, > Lorenzo > > > > > Thanks, > > Han > > > > > > > > /* All LB related stuff grouped together. */ > > > PIPELINE_STAGE(SWITCH, IN, LB_AFF_CHECK, 11, "ls_in_lb_aff_check") > > \ > > > PIPELINE_STAGE(SWITCH, IN, LB, 12, "ls_in_lb") > > \ > > > PIPELINE_STAGE(SWITCH, IN, LB_AFF_LEARN, 13, "ls_in_lb_aff_learn") > > \ > > > PIPELINE_STAGE(SWITCH, IN, PRE_HAIRPIN, 14, "ls_in_pre_hairpin") > > \ > > > PIPELINE_STAGE(SWITCH, IN, NAT_HAIRPIN, 15, "ls_in_nat_hairpin") > > \ > > > PIPELINE_STAGE(SWITCH, IN, HAIRPIN, 16, "ls_in_hairpin") > > \ > > > /* All ACL related stuff grouped together. */ > > > PIPELINE_STAGE(SWITCH, IN, ACL_AFTER_LB, 17, "ls_in_acl_after_lb") > > \ > > > PIPELINE_STAGE(SWITCH, IN, STATEFUL, 18, "ls_in_stateful") > > \ > > > > > > This has to be properly tested though. > > > > > > Regards, > > > Dumitru > > > > _______________________________________________ > dev mailing list > [email protected] > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
