On 12/13/22 07:17, Han Zhou wrote:
> On Mon, Dec 12, 2022 at 8:52 AM Numan Siddique <[email protected]> wrote:
>>
>> On Thu, Dec 8, 2022 at 12:15 PM Lorenzo Bianconi
>> <[email protected]> wrote:
>>>
>>> In the current codebase ct_commit {} action clears ct_state metadata of
>>> the incoming packet. This behaviour introduces an issue if we need to
>>> check the connection tracking state in the subsequent pipeline stages,
>>> e.g. for hairpin traffic:
>>>
>>> table=14(ls_in_pre_hairpin ), priority=100 , match=(ip && ct.trk),
> action=(reg0[6] = chk_lb_hairpin(); reg0[12] = chk_lb_hairpin_reply();
> next;)
>>>
>>> Fix the issue introducing ct_commit_continue action used to allow the ct
>>> packet to proceed in the pipeline instead of the original one.
>>>
>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2103086
>>> Signed-off-by: Lorenzo Bianconi <[email protected]>
>>
>> I've a few comments.
>>
>> Since the OVS action ct() supports specifying the recirculation table
>> number, I think we can do something similar in OVN too
>> instead of just restricting to the next table.
>>
>> I'd suggest something similar on the v1 of this patch, but instead of
>> the next flag, ovn-northd can specify the table number.
>>
>> Something like - ct_commit {..., table=x} where 'x' is a valid
>> table number in the stage where this action is used i.e restrict the
>> jump to the table within the
>> ingress or egress pipeline but not from ingress to egress or from
>> egress to ingress.
>>
>> ovn-northd can use the same feature flag approach used in this patch
>> to include this 'table' option.
>>
>> Also I'd suggest making use of this new version of ct_commit in all
>> the places and not just in the specific ones.
>>
>> Later if there is a requirement to skip a few stages after committing
>> the packet to conntrack we can do so without the need to add a new
>> action or modify ct_commit.
>>
>> @Dumitru Ceara @Mark Michelson @Han Zhou Do you have any thoughts ?
>> or objections ?
Hi all,
>
> Hi Numan, Lorenzo, I agree with Numan that adding the table=x in ct_commit
> action is more flexible, but I have a different concern. Either the
> ct_commit_continue implemented by this patch or the ct_commit {...,table=x}
> proposed by Numan would introduce an extra recirculation, which has a
> significant cost for data-plane. Since apply-after-lb is now frequently
> used for ACLs that implement egress network policies, it is better to avoid
> this penalty if possible. Another option to solve the problem is to move
> the ls_in_stateful stage after the hairpin stages, and keep the current
> ct_commit. Would that work?
>
I think you're right, Han. We seem to add quite a few extra recirculations.
For example, the datapath flows for a SYN packet hitting a load balancer
(dnats from 66.66.66.66:666 to 42.42.42.2:4242):
recirc_id(0),in_port(5),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(dst=66.66.66.66,proto=6,frag=no),tcp(dst=666),
packets:1, bytes:66, used:1.180s, flags:., actions:ct(zone=6,nat),recirc(0x50)
recirc_id(0x50),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(dst=66.66.66.66,proto=6,frag=no),tcp(dst=666),
packets:0, bytes:0, used:never, actions:hash(l4(0)),recirc(0x54)
recirc_id(0x54),dp_hash(0x7/0xf),in_port(5),eth(),eth_type(0x0800),ipv4(frag=no),
packets:0, bytes:0, used:never,
actions:ct(commit,zone=6,mark=0x2/0x2,nat(dst=42.42.42.2:4242)),recirc(0x55)
recirc_id(0x55),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=00:00:00:00:00:02,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(src=42.42.42.2/255.255.255.254,dst=42.42.42.2,proto=6,ttl=64,frag=no),
packets:0, bytes:0, used:never,
actions:ct(commit,zone=6,mark=0/0x1,nat(src)),set(eth(src=00:00:00:00:01:00,dst=00:00:00:00:00:01)),set(ipv4(ttl=63)),ct(zone=5,nat),recirc(0x51)
recirc_id(0x51),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=00:00:00:00:01:00,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no),
packets:0, bytes:0, used:never, actions:ct(commit,zone=5,mark=0/0x1,nat(src)),4
Now, with Lorenzo's patch:
recirc_id(0),in_port(5),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(dst=66.66.66.66,proto=6,frag=no),tcp(dst=666),
packets:1, bytes:66, used:1.893s, flags:., actions:ct(zone=6,nat),recirc(0x47)
recirc_id(0x47),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(dst=66.66.66.66,proto=6,frag=no),tcp(dst=666),
packets:0, bytes:0, used:never, actions:hash(l4(0)),recirc(0x48)
recirc_id(0x48),dp_hash(0x1/0xf),in_port(5),eth(),eth_type(0x0800),ipv4(frag=no),
packets:0, bytes:0, used:never,
actions:ct(commit,zone=6,mark=0x2/0x2,nat(dst=42.42.42.2:4242)),recirc(0x49)
recirc_id(0x49),in_port(5),eth(),eth_type(0x0800),ipv4(frag=no), packets:0,
bytes:0, used:never, actions:ct(commit,zone=6,mark=0/0x1,nat(src)),recirc(0x4a)
recirc_id(0x4a),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0x2/0x3),eth(src=00:00:00:00:00:02,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(src=42.42.42.3,proto=6,frag=no),
packets:0, bytes:0, used:never,
actions:ct(commit,zone=6,mark=0/0x1,nat(src)),recirc(0x4b)
recirc_id(0x4b),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=00:00:00:00:00:02,dst=00:00:00:00:01:00),eth_type(0x0800),ipv4(src=42.42.42.2/255.255.255.254,dst=42.42.42.2,proto=6,ttl=64,frag=no),
packets:0, bytes:0, used:never,
actions:ct_clear,set(eth(src=00:00:00:00:01:00,dst=00:00:00:00:00:01)),set(ipv4(ttl=63)),ct(zone=5,nat),recirc(0x4c)
recirc_id(0x4c),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=00:00:00:00:01:00),eth_type(0x0800),ipv4(frag=no),
packets:0, bytes:0, used:never,
actions:ct(commit,zone=5,mark=0/0x1,nat(src)),recirc(0x4d)
recirc_id(0x4d),in_port(5),eth(dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no),
packets:0, bytes:0, used:never, actions:4
I also think you're right about swapping the stages. But because hairpin
is part of load balancing shouldn't the stage order actually be as below?
/* All LB related stuff grouped together. */
PIPELINE_STAGE(SWITCH, IN, LB_AFF_CHECK, 11, "ls_in_lb_aff_check") \
PIPELINE_STAGE(SWITCH, IN, LB, 12, "ls_in_lb") \
PIPELINE_STAGE(SWITCH, IN, LB_AFF_LEARN, 13, "ls_in_lb_aff_learn") \
PIPELINE_STAGE(SWITCH, IN, PRE_HAIRPIN, 14, "ls_in_pre_hairpin") \
PIPELINE_STAGE(SWITCH, IN, NAT_HAIRPIN, 15, "ls_in_nat_hairpin") \
PIPELINE_STAGE(SWITCH, IN, HAIRPIN, 16, "ls_in_hairpin") \
/* All ACL related stuff grouped together. */
PIPELINE_STAGE(SWITCH, IN, ACL_AFTER_LB, 17, "ls_in_acl_after_lb") \
PIPELINE_STAGE(SWITCH, IN, STATEFUL, 18, "ls_in_stateful") \
This has to be properly tested though.
Regards,
Dumitru
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev