On 5/19/25 12:20 PM, Q Kay via discuss wrote:
> Attached topology
> 
> On Mon, May 19, 2025 at 17:19 Q Kay <tqkhang...@gmail.com> wrote:
> 
>> Dear OVN Team,
>>

Hi Ice Bear,

>> I would like to report an issue observed with OVN networking related to
>> asymmetric routing. The problem occurs when using instances to transit
>> traffic between two routed logical switches, and appears to be caused by
>> OVN connection tracking, which I would like to bypass for stateless
>> forwarding.
>>
>> Environment Information
>>
>>    - OVN Version: 24.03.2 (same issue observed on 24.09).
>>    - Port security disabled.
>>
>> Issue Description
>>
>> I have two instances, each with a loopback IP configured (5.5.5.5 on
>> Instance A and 6.6.6.6 on Instance B), deployed on different compute nodes
>> (Compute 1 and Compute 2 respectively). The instances are connected to two
>> different networks (10.10.10.0/24 and 10.10.20.0/24).
>> I have configured static routes on both instances as follows:
>>
>>    - Instance A: Route 6.6.6.6/32 via 10.10.10.218
>>    - Instance B: Route 5.5.5.5/32 via 10.10.20.41
>>
>>
>> The topology is in the attached file below.
>> Expected Behavior
>> I should be able to communicate over ICMP between the two endpoint IPs
>> (5.5.5.5 and 6.6.6.6) with the routing path as configured above.
>> ICMP:
>>
>>    - On Instance A: ping 6.6.6.6 -I 5.5.5.5 (using 5.5.5.5 as source IP)
>>    => should succeed
>>    - On Instance B: ping 5.5.5.5 -I 6.6.6.6 (using 6.6.6.6 as source IP)
>>    => should succeed
>>
>>
>> Actual Behavior
>> When attempting to ping between these loopback IPs, I observe that traffic
>> only works in one direction:
>>
>>    - On Instance A: ping 6.6.6.6 -I 5.5.5.5 (using 5.5.5.5 as source IP)
>>    => fails
>>    - On Instance B: ping 5.5.5.5 -I 6.6.6.6 (using 6.6.6.6 as source IP)
>>    => succeeds
>>
>>
>> Despite disabling port security and ensuring the necessary routes are
>> configured, the asymmetric routing scenario still fails in one direction
>> with ICMP, and in both directions with TCP. I have verified that packet
>> instance level is working correctly (confirmed with tcpdump at the tap
>> port).
>> I've tried moving both instances to a single compute node, but the same
>> issue still occurs.
>> Troubleshooting Steps
>> 1. Reversed routing direction:
>>
>>    - On Instance A: route 6.6.6.6/32 via 10.10.10.78
>>    - On Instance B: route 5.5.5.5/32 via 10.10.20.102
>>
>>    => Result: Ping from A to B succeeds, from B to A fails (opposite of
>>    the initial results)
>>
>> 2. Using OVN trace:
>> ovn-trace --no-leader-only 70974da0-2e9d-469a-9782-455a0380ab95 'inport ==
>> "319cd637-10fb-4b45-9708-d02beefd698a" && eth.src==fa:16:3e:ea:67:18 &&
>> eth.dst==fa:16:3e:04:28:c7 && ip4.src==6.6.6.6 && ip4.dst==5.5.5.5 &&
>> ip.proto==1 && ip.ttl==64'
>>
>> *Output*:
>> ingress(dp="A", inport="319cd6")
>>  0. ls_in_check_port_sec: priority 50
>>     reg0[15] = check_in_port_sec();
>>     next;
>>  2. ls_in_lookup_fdb: inport == "319cd6", priority 100
>>     reg0[11] = lookup_fdb(inport, eth.src);
>>     next;
>> 27. ls_in_l2_lkup: eth.dst == fa:16:3e:04:28:c7, priority 50
>>     outport = "869b33";
>>     output;
>>
>> egress(dp="A", inport="319cd6", outport="869b33")
>>  9. ls_out_check_port_sec: priority 0
>>     reg0[15] = check_out_port_sec();
>>     next;
>> 10. ls_out_apply_port_sec: priority 0
>>     output;
>>     /* output to "869b33" */
>>
>> 3. Examining recirculation to identify where my flow is being dropped
>> *For successful ping flow: 5.5.5.5 -> 6.6.6.6*
>> *- On Compute 1 (containing source instance): *
>>
>> 'recirc_id(0x3d71),in_port(28),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=fa:16:3e:81:ed:92,dst=fa:16:3e:72:fd:e5),eth_type(0x0800),ipv4(src=
>> 4.0.0.0/252.0.0.0,dst=0.0.0.0/248.0.0.0,proto=1,tos=0/0x3,frag=no),
>> packets:55, bytes:5390, used:0.205s,
>> actions:ct(commit,zone=87,mark=0/0x1,nat(src)),set(tunnel(tun_id=0x6,dst=10.10.10.85,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x50006}),flags(df|csum|key))),9'
>>
>> 'recirc_id(0),in_port(28),eth(src=fa:16:3e:81:ed:92,dst=fa:16:3e:72:fd:e5),eth_type(0x0800),ipv4(proto=1,frag=no),
>> packets:55, bytes:5390, used:0.205s, actions:ct(zone=87),recirc(0x3d71)'
>>
>> 'recirc_id(0),tunnel(tun_id=0x2,src=10.10.10.85,dst=10.10.10.84,geneve({class=0x102,type=0x80,len=4,0xb000a/0x7fffffff}),flags(-df+csum+key)),in_port(9),eth(src=fa:16:3e:ea:67:18,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(proto=1,frag=no),icmp(type=0/0xfe),
>> packets:55, bytes:5390, used:0.204s, actions:29'
>>
>> *- On Compute 2: *
>> 'recirc_id(0),tunnel(tun_id=0x6,src=10.10.10.84,dst=10.10.10.85,geneve({class=0x102,type=0x80,len=4,0x50006/0x7fffffff}),flags(-df+csum+key)),in_port(10),eth(src=fa:16:3e:81:ed:92,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(proto=1,frag=no),icmp(type=8/0xf8),
>> packets:193, bytes:18914, used:0.009s, actions:ct(zone=53),recirc(0x1791e)'
>>
>> 'recirc_id(0x1791e),tunnel(tun_id=0x6,src=10.10.10.84,dst=10.10.10.85,geneve({}{}),flags(-df+csum+key)),in_port(10),ct_state(+new-est-rel-rpl-inv+trk),ct_mark(0/0x1),eth(src=fa:16:3e:81:ed:92,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no),
>> packets:193, bytes:18914, used:0.009s,
>> actions:ct(commit,zone=53,mark=0/0x1,nat(src)),23'
>>
>> 'recirc_id(0),in_port(21),eth(src=fa:16:3e:ea:67:18,dst=fa:16:3e:04:28:c7),eth_type(0x0800),ipv4(src=6.6.6.6,dst=5.5.5.5,proto=1,tos=0/0x3,frag=no),
>> packets:193, bytes:18914, used:0.008s,
>> actions:set(tunnel(tun_id=0x2,dst=10.10.10.84,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0xb000a}),flags(df|csum|key))),10'
>>
>>
>> *For failed ping flow: 6.6.6.6 -> 5.5.5.5*
>> *- On Compute 2 (containing source instance): *
>> 'recirc_id(0),in_port(21),eth(src=fa:16:3e:ea:67:18,dst=fa:16:3e:04:28:c7),eth_type(0x0800),ipv4(src=6.6.6.6,dst=5.5.5.5,proto=1,tos=0/0x3,frag=no),
>> packets:5, bytes:490, used:0.728s,
>> actions:set(tunnel(tun_id=0x2,dst=10.10.10.84,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0xb000a}),flags(df|csum|key))),10'
>>
>> *- On Compute 1: *
>> 'recirc_id(0),tunnel(tun_id=0x2,src=10.10.10.85,dst=10.10.10.84,geneve({class=0x102,type=0x80,len=4,0xb000a/0x7fffffff}),flags(-df+csum+key)),in_port(9),eth(src=fa:16:3e:ea:67:18,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(proto=1,frag=no),icmp(type=8/0xf8),
>> packets:48, bytes:4704, used:0.940s, actions:29'
>>
>> 'recirc_id(0),in_port(28),eth(src=fa:16:3e:81:ed:92,dst=fa:16:3e:72:fd:e5),eth_type(0x0800),ipv4(proto=1,frag=no),
>> packets:48, bytes:4704, used:0.940s, actions:ct(zone=87),recirc(0x3d77)'
>>
>> 'recirc_id(0x3d77),in_port(28),ct_state(-new-est-rel-rpl+inv+trk),ct_mark(0/0x1),eth(),eth_type(0x0800),ipv4(frag=no),
>> packets:48, bytes:4704, used:0.940s, actions:drop'
>>

Thanks for all the details!

>> Observations
>> I've noticed that packet handling at the compute nodes is not consistent.

Actually, I'd argue that it is.

>> My hypothesis is that the handling of ct_state flags is causing the return
>> traffic to be dropped. This may be because the outgoing and return
>> directions of the connection do not share the same logical_switch datapath.

If original and reply paths of a connection are not processed by the
same logical switch then that's exactly the problem, you're right.

>> The critical evidence is in the failed flow, where we see:
>> 'recirc_id(0x3d77),in_port(28),ct_state(-new-est-rel-rpl+inv+trk),ct_mark(0/0x1),eth(),eth_type(0x0800),ipv4(frag=no),
>> packets:48, bytes:4704, used:0.940s, actions:drop'
>> The packet is being marked as invalid (+inv) and subsequently dropped.

It's a bit weird though that this isn't +rpl traffic.  Is this flow hit
by the ICMP echo or by the ICMP echo-reply packet?

>> Impact
>> This unexplained packet drop significantly impacts my service when I use
>> instances for transit purposes in an OVN environment. Although I have
>> disabled port security to get stateless behavior, the result is not as
>> expected.
>> Request for Clarification
>> Based on the situation described above, I have the following questions:
>>
>>    1. Is the packet drop behavior described above consistent with OVN's
>>    design?

If original and reply directions of a session (in conntrack terms) are
processed on different logical switches, then yes.

>>    2. If this is the expected behavior of OVN, please explain why packets
>>    are being dropped.

OVN switches drop all traffic marked as ct_state=+trk+inv by default.
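
If you want to confirm this on your deployment, you can look for the
corresponding drop flows in the southbound logical flow table. A quick
sketch (the datapath UUID below is the one from your ovn-trace invocation,
reused here for illustration; exact stage names can vary between OVN
versions):

```shell
# List the logical flows for the switch datapath and look for the
# flows that match on the conntrack "invalid" state (ct.inv) and drop.
# Requires access to the southbound DB (e.g., on a central node).
ovn-sbctl lflow-list 70974da0-2e9d-469a-9782-455a0380ab95 | grep 'ct\.inv'
```

The matching flows are the ones hit by your `+trk+inv` datapath entry
ending in `actions:drop`.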

>>    3. If this is not the expected behavior, could you confirm whether
>>    this is a bug that will be fixed in the future?
>>

I'd say it's not a bug.  However, if you want to change the default
behavior you can use the NB_Global.options:use_ct_inv_match=false knob to
stop matching on (and dropping) +inv packets in the logical switch
pipeline.
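
For example, to stop matching on (and dropping) +inv traffic (note this
is a cluster-wide setting, so test it first):

```shell
# Disable the ct.inv match in logical switch pipelines, so +trk+inv
# packets are no longer dropped by default.
ovn-nbctl set NB_Global . options:use_ct_inv_match=false

# Verify the setting took effect.
ovn-nbctl get NB_Global . options:use_ct_inv_match
```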

There's one caveat though: if you're using hardware offload, some NICs
(e.g., NVIDIA ConnectX-5/6) will not be able to offload the traffic if it
is forwarded based on a ct_state=+trk+inv match.

>>
>> I can provide additional information as needed. Please let me know if you
>> require any further details.
>> Thank you very much for your time and support. I greatly appreciate your
>> guidance in better understanding OVN's design here.
>>
>>
>> Best regards,
>> Ice Bear
>>

Regards,
Dumitru

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
