On Tue, Dec 13, 2022 at 5:03 AM Dumitru Ceara <[email protected]> wrote: > > On 12/13/22 07:17, Han Zhou wrote: > > On Mon, Dec 12, 2022 at 8:52 AM Numan Siddique <[email protected]> wrote: > >> > >> On Thu, Dec 8, 2022 at 12:15 PM Lorenzo Bianconi > >> <[email protected]> wrote: > >>> > >>> In the current codebase ct_commit {} action clears ct_state metadata of > >>> the incoming packet. This behaviour introduces an issue if we need to > >>> check the connection tracking state in the subsequent pipeline stages, > >>> e.g. for hairpin traffic: > >>> > >>> table=14(ls_in_pre_hairpin ), priority=100 , match=(ip && ct.trk), > > action=(reg0[6] = chk_lb_hairpin(); reg0[12] = chk_lb_hairpin_reply(); > > next;) > >>> > >>> Fix the issue introducing ct_commit_continue action used to allow the ct > >>> packet to proceed in the pipeline instead of the original one. > >>> > >>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2103086 > >>> Signed-off-by: Lorenzo Bianconi <[email protected]> > >> > >> I've a few comments. > >> > >> Since the OVS action ct() supports specifying the recirculation table > >> number, I think we can do something similar in OVN too > >> instead of just restricting to the next table. > >> > >> I'd suggest something similar on the v1 of this patch, but instead of > >> the next flag, ovn-northd can specify the table number. > >> > >> Something like - ct_commit {..., table=x} where 'x' is a valid > >> table number in the stage where this action is used i.e restrict the > >> jump to the table within the > >> ingress or egress pipeline but not from ingress to egress or from > >> egress to ingress. > >> > >> ovn-northd can use the same feature flag approach used in this patch > >> to include this 'table' option. > >> > >> Also I'd suggest making use of this new version of ct_commit in all > >> the places and not just in the specific ones. > >> > >> Later if there is a requirement to skip a few stages after committing > >> the packet to conntrack we can do so without the need to add a new > >> action or modify ct_commit. > >> > >> @Dumitru Ceara @Mark Michelson @Han Zhou Do you have any thoughts ? > >> or objections ? > > Hi all, > > > > > Hi Numan, Lorenzo, I agree with Numan that adding the table=x in ct_commit > > action is more flexible, but I have a different concern. Either the > > ct_commit_continue implemented by this patch or the ct_commit {...,table=x} > > proposed by Numan would introduce an extra recirculation, which has a > > significant cost for data-plane. Since apply-after-lb is now frequently > > used for ACLs that implement egress network policies, it is better to avoid > > this penalty if possible. Another option to solve the problem is to move > > the ls_in_stateful stage after the hairpin stages, and keep the current > > ct_commit. Would that work? > > > > I think you're right, Han. We seem to add quite a few extra recirculations. > For example, the datapath flows for a SYN packet hitting a load balancer > (dnats from 66.66.66.66:666 to 42.42.42.2:4242): > > recirc_id(0),in_port(5),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(dst=66.66.66.66,proto=6,frag=no),tcp(dst=666), packets:1, bytes:66, used:1.180s, flags:., actions:ct(zone=6,nat),recirc(0x50) > recirc_id(0x50),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(dst=66.66.66.66,proto=6,frag=no),tcp(dst=666), packets:0, bytes:0, used:never, actions:hash(l4(0)),recirc(0x54) > recirc_id(0x54),dp_hash(0x7/0xf),in_port(5),eth(),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:ct(commit,zone=6,mark=0x2/0x2,nat(dst=42.42.42.2:4242)),recirc(0x55) > recirc_id(0x55),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=00:00:00:00:00:02,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(src= 42.42.42.2/255.255.255.254,dst=42.42.42.2,proto=6,ttl=64,frag=no), packets:0, bytes:0, used:never, actions:ct(commit,zone=6,mark=0/0x1,nat(src)),set(eth(src=00:00:00:00:01:00,dst=00:00:00:00:00:01)),set(ipv4(ttl=63)),ct(zone=5,nat),recirc(0x51) > recirc_id(0x51),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=00:00:00:00:01:00,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:ct(commit,zone=5,mark=0/0x1,nat(src)),4 > > Now, with Lorenzo's patch: > > recirc_id(0),in_port(5),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(dst=66.66.66.66,proto=6,frag=no),tcp(dst=666), packets:1, bytes:66, used:1.893s, flags:., actions:ct(zone=6,nat),recirc(0x47) > recirc_id(0x47),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(dst=66.66.66.66,proto=6,frag=no),tcp(dst=666), packets:0, bytes:0, used:never, actions:hash(l4(0)),recirc(0x48) > recirc_id(0x48),dp_hash(0x1/0xf),in_port(5),eth(),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:ct(commit,zone=6,mark=0x2/0x2,nat(dst=42.42.42.2:4242)),recirc(0x49) > recirc_id(0x49),in_port(5),eth(),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:ct(commit,zone=6,mark=0/0x1,nat(src)),recirc(0x4a) > recirc_id(0x4a),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0x2/0x3),eth(src=00:00:00:00:00:02,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(src=42.42.42.3,proto=6,frag=no), packets:0, bytes:0, used:never, actions:ct(commit,zone=6,mark=0/0x1,nat(src)),recirc(0x4b) > recirc_id(0x4b),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=00:00:00:00:00:02,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(src= 42.42.42.2/255.255.255.254,dst=42.42.42.2,proto=6,ttl=64,frag=no), packets:0, bytes:0, used:never, actions:ct_clear,set(eth(src=00:00:00:00:01:00,dst=00:00:00:00:00:01)),set(ipv4(ttl=63)),ct(zone=5,nat),recirc(0x4c) > recirc_id(0x4c),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=00:00:00:00:01:00),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:ct(commit,zone=5,mark=0/0x1,nat(src)),recirc(0x4d) > recirc_id(0x4d),in_port(5),eth(dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:4 > > I also think you're right about swapping the stages. But because hairpin > is part of load balancing shouldn't the stage order actually be as below?
Yes, we can move both ls_in_acl_after_lb and ls_in_stateful, to keep LB related stages together. Thanks, Han > > /* All LB related stuff grouped together. */ > PIPELINE_STAGE(SWITCH, IN, LB_AFF_CHECK, 11, "ls_in_lb_aff_check") \ > PIPELINE_STAGE(SWITCH, IN, LB, 12, "ls_in_lb") \ > PIPELINE_STAGE(SWITCH, IN, LB_AFF_LEARN, 13, "ls_in_lb_aff_learn") \ > PIPELINE_STAGE(SWITCH, IN, PRE_HAIRPIN, 14, "ls_in_pre_hairpin") \ > PIPELINE_STAGE(SWITCH, IN, NAT_HAIRPIN, 15, "ls_in_nat_hairpin") \ > PIPELINE_STAGE(SWITCH, IN, HAIRPIN, 16, "ls_in_hairpin") \ > /* All ACL related stuff grouped together. */ > PIPELINE_STAGE(SWITCH, IN, ACL_AFTER_LB, 17, "ls_in_acl_after_lb") \ > PIPELINE_STAGE(SWITCH, IN, STATEFUL, 18, "ls_in_stateful") \ > > This has to be properly tested though. > > Regards, > Dumitru > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
