On 5/26/25 2:18 PM, Q Kay wrote:
> Hi Dumitru,
>

Hi Ice Bear,
> I think you got something wrong about the logical_switch_port id.
>> The a2b9537d-d8a1-4cb9-9582-f41e49ed22a3 logical switch port is part of
>> the following port group.
> This port does not belong to my two instances. It's just a port from
> another instance.
>
> As I mentioned, my topology below does not contain this id:
> a2b9537d-d8a1-4cb9-9582-f41e49ed22a3.
>
> Logical switch 1 id: 70974da0-2e9d-469a-9782-455a0380ab95
> Logical switch 2 id: ec22da44-9964-49ff-9c29-770a26794ba4
>
> Instance A:
> port 1 (connected to ls1): 61a871bc-7709-4072-9991-8e3a1096b02a
> port 2 (connected to ls2): 63d76c2b-2960-4a89-97ac-9f7a7d4bb718
>
> Instance B:
> port 1: 46848e3c-7a73-46ce-8b3a-b6331e14fc74
> port 2: 7d39750a-29d6-40df-b42b-54a17efcc423
>
> You can check in the DB that all 4 ports above do not belong to any
> port group.
>

I think there's some confusion. Let me try to clarify.

Say a logical switch has some ports, e.g., LS1 = [LSP1, LSP2, LSP3]. If a
port group is defined that includes ports from LS1, e.g.,
PG1 = [LSP2, LSP42, LSP84, ...], and an ACL is applied to PG1, then that is
_equivalent_ to configuring the ACL on LS1 (for all its ports).

From our man page, in the port group section:

    <column name="acls">
      Access control rules that apply to the port group. Applying an ACL
      to a port group has the same effect as applying the ACL to all
      logical lswitches that the ports of the port group belong to.
    </column>

In your specific case, logical switch 2 (ec22da44-9964-49ff-9c29-770a26794ba4):

> ovn-nbctl --columns _uuid,name,ports list logical_switch ec22da44-9964-49ff-9c29-770a26794ba4
_uuid               : ec22da44-9964-49ff-9c29-770a26794ba4
name                : neutron-6aba7876-b3bc-4d71-99bc-7b2644f326e9
ports               : [0b9f5414-43bd-4499-9da5-a071ff6063fc, 12869fa4-2f1f-4c2f-bf65-60ce796a1d51, 63d76c2b-2960-4a89-97ac-9f7a7d4bb718, 7d39750a-29d6-40df-b42b-54a17efcc423, ebe1f8ac-2e13-4c90-b7aa-a8a6d352606b]

has 5 switch ports. Out of these, 63d76c2b is the one connected to
"instance A".
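In case you want to cross-check which of these ports are referenced by any
port group, something along these lines should do it (just a sketch, not
tested against your DB; it uses ovn-nbctl's "find" command with the
set-inclusion operator):

  for lsp in $(ovn-nbctl --bare --columns ports list logical_switch ec22da44-9964-49ff-9c29-770a26794ba4); do
      echo "=== $lsp ==="
      # Print the name of every port group whose "ports" set contains this LSP.
      ovn-nbctl --bare --columns name find port_group "ports{>=}$lsp"
  done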
I know that there are no port groups that include 63d76c2b, but there is a
port group that includes one of the other ports of LS2, that is port
12869fa4:

> ovn-nbctl list logical_switch_port 12869fa4
_uuid               : 12869fa4-2f1f-4c2f-bf65-60ce796a1d51
addresses           : ["fa:16:3e:9e:4d:93 10.10.20.137"]
dhcpv4_options      : 159d49d0-964f-4ba6-aa58-dfbb8bfeb463
dhcpv6_options      : []
dynamic_addresses   : []
enabled             : true
external_ids        : {"neutron:cidrs"="10.10.20.137/24", "neutron:device_id"="1cda8c1a-b594-4942-8273-557c1e88c666", "neutron:device_owner"="compute:nova", "neutron:host_id"=khangtt-osp-compute-01-84, "neutron:mtu"="", "neutron:network_name"=neutron-6aba7876-b3bc-4d71-99bc-7b2644f326e9, "neutron:port_capabilities"="", "neutron:port_name"="", "neutron:project_id"="7f19299bb3bd43d4978fff45783e4346", "neutron:revision_number"="4", "neutron:security_group_ids"="940e2484-bb38-463b-a15f-d05b9dc9f5f0", "neutron:subnet_pool_addr_scope4"="", "neutron:subnet_pool_addr_scope6"="", "neutron:vnic_type"=normal}
ha_chassis_group    : []
mirror_rules        : []
name                : "a2b9537d-d8a1-4cb9-9582-f41e49ed22a3"
options             : {requested-chassis=khangtt-osp-compute-01-84}
parent_name         : []
peer                : []
port_security       : ["fa:16:3e:9e:4d:93 10.10.20.137"]
tag                 : []
tag_request         : []
type                : ""
up                  : false

This port is included in port group pg_940e2484_bb38_463b_a15f_d05b9dc9f5f0:

> ovn-nbctl list port_group pg_940e2484_bb38_463b_a15f_d05b9dc9f5f0
_uuid               : 6d232961-a51c-48cb-aa4f-84eb3108c71f
acls                : [d7e20fdb-f613-4147-b605-64b8ffbe9742, dcae0790-6c86-4e4d-8f01-d9be12d26c48]
external_ids        : {"neutron:security_group_id"="940e2484-bb38-463b-a15f-d05b9dc9f5f0"}
name                : pg_940e2484_bb38_463b_a15f_d05b9dc9f5f0
ports               : [12869fa4-2f1f-4c2f-bf65-60ce796a1d51,   <<<<< HERE is the LS2 port
                       1972206b-327a-496b-88fc-d17625d013e1,
                       2fb22d1a-bbfc-4173-b6fc-1ae3adc5ddcd,
                       3947661b-4deb-4aed-bd15-65839933fea3,
                       caf0fe63-61be-4b1a-b306-ff00fa578982,
                       fbfaeb2b-6e42-458a-a65f-8d2ef29b8b69,
                       fd662347-4013-4306-b222-e29545f866ec]

And this port group has the following ACLs configured:

> ovn-nbctl acl-list pg_940e2484_bb38_463b_a15f_d05b9dc9f5f0
from-lport  1002 (inport == @pg_940e2484_bb38_463b_a15f_d05b9dc9f5f0 && ip4) allow-related
 to-lport  1002 (outport == @pg_940e2484_bb38_463b_a15f_d05b9dc9f5f0 && ip4 && ip4.src == 0.0.0.0/0) allow-related

As mentioned above, applying an ACL to a port group is equivalent to
applying the ACL to all logical switches that have ports in the port group.
So the two allow-related ACLs are _implicitly_ applied to LS2 _too_.

Now, because the ACLs have action allow-related, _all_ traffic processed by
LS2 _must_ go through conntrack (regardless of logical port). That's the
only way we can ensure the semantics of allow-related (allow all packets on
a session that has been matched by an allow-related ACL) are respected. It
also means all "allow" ACLs on that switch act as "allow-related" too.
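If you want to see this at the logical flow level, something along these
lines should show the pre-ACL/pre-stateful flows that send LS2's IP traffic
to conntrack (a sketch; the exact stage names may differ depending on the
OVN version you run):

  ovn-sbctl lflow-list neutron-6aba7876-b3bc-4d71-99bc-7b2644f326e9 | grep -E 'pre_acl|pre_stateful'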
> I hope you can check this out.
>

I understand this behavior might create confusion; however, it is
documented and it is the way OVN works when stateful (allow-related) ACLs
are configured.
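As mentioned earlier in the thread, if the asymmetric conntrack behavior is
acceptable for your use case, the workaround is the use_ct_inv_match=false
option. Just as a pointer (assuming you have direct ovn-nbctl access to the
NB DB; with OpenStack you may prefer to drive this through your deployment
tooling), it can be set with something like:

  ovn-nbctl set NB_Global . options:use_ct_inv_match=false

Keep in mind the caveat mentioned before: the datapath may then forward
ct_state=+trk+inv traffic and HW offload might not work.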
Regards,
Dumitru

>
> Best regards,
> Ice Bear
>
> On Mon, May 26, 2025 at 17:55, Dumitru Ceara <dce...@redhat.com> wrote:
>
>> On 5/26/25 12:31 PM, Q Kay wrote:
>>> Hi Dumitru,
>>>
>>
>> Hi Ice Bear,
>>
>>> I think this is the file you want.
>>
>>
>> Yes, that's it, thanks!
>>
>>> Thanks for guiding me.
>>
>> No problem.
>>
>> So, after looking at the DB contents I see that logical switch 1
>> (70974da0-2e9d-469a-9782-455a0380ab95) has no ACLs applied (directly or
>> indirectly through port groups).
>>
>> On the other hand, for logical switch 2:
>>
>>> ovn-nbctl show neutron-6aba7876-b3bc-4d71-99bc-7b2644f326e9
>> switch ec22da44-9964-49ff-9c29-770a26794ba4 (neutron-6aba7876-b3bc-4d71-99bc-7b2644f326e9) (aka Logical_switch_2)
>>     port b8f1e947-7d06-4899-8c1c-206e81e70e74
>>         type: localport
>>         addresses: ["fa:16:3e:55:88:90 10.10.20.2"]
>>     port a2b9537d-d8a1-4cb9-9582-f41e49ed22a3
>>         addresses: ["fa:16:3e:9e:4d:93 10.10.20.137"]
>>     port 97f2c854-44e9-4558-a0ef-81e42a08f414
>>         addresses: ["fa:16:3e:81:ed:92 10.10.20.102", "unknown"]
>>     port 4b7aa4f3-d126-41b6-9f0e-591c6921698b
>>         addresses: ["fa:16:3e:72:fd:e5 10.10.20.41", "unknown"]
>>     port 43888846-637f-46e6-ad5d-0acd5e6d6064
>>         addresses: ["unknown"]
>>
>> The a2b9537d-d8a1-4cb9-9582-f41e49ed22a3 logical switch port is part of
>> the following port group:
>>
>>> ovn-nbctl list logical_switch_port 12869fa4-2f1f-4c2f-bf65-60ce796a1d51
>> _uuid               : 12869fa4-2f1f-4c2f-bf65-60ce796a1d51   <<<<<< UUID
>> addresses           : ["fa:16:3e:9e:4d:93 10.10.20.137"]
>> dhcpv4_options      : 159d49d0-964f-4ba6-aa58-dfbb8bfeb463
>> dhcpv6_options      : []
>> dynamic_addresses   : []
>> enabled             : true
>> external_ids        : {"neutron:cidrs"="10.10.20.137/24", "neutron:device_id"="1cda8c1a-b594-4942-8273-557c1e88c666", "neutron:device_owner"="compute:nova", "neutron:host_id"=khangtt-osp-compute-01-84, "neutron:mtu"="", "neutron:network_name"=neutron-6aba7876-b3bc-4d71-99bc-7b2644f326e9, "neutron:port_capabilities"="", "neutron:port_name"="", "neutron:project_id"="7f19299bb3bd43d4978fff45783e4346", "neutron:revision_number"="4", "neutron:security_group_ids"="940e2484-bb38-463b-a15f-d05b9dc9f5f0", "neutron:subnet_pool_addr_scope4"="", "neutron:subnet_pool_addr_scope6"="", "neutron:vnic_type"=normal}
>> ha_chassis_group    : []
>> mirror_rules        : []
>> name                : "a2b9537d-d8a1-4cb9-9582-f41e49ed22a3"
>> options             : {requested-chassis=khangtt-osp-compute-01-84}
>> parent_name         : []
>> peer                : []
>> port_security       : ["fa:16:3e:9e:4d:93 10.10.20.137"]
>> tag                 : []
>> tag_request         : []
>> type                : ""
>> up                  : false
>>
>>> ovn-nbctl list port_group pg_940e2484_bb38_463b_a15f_d05b9dc9f5f0
>> _uuid               : 6d232961-a51c-48cb-aa4f-84eb3108c71f
>> acls                : [d7e20fdb-f613-4147-b605-64b8ffbe9742, dcae0790-6c86-4e4d-8f01-d9be12d26c48]
>> external_ids        : {"neutron:security_group_id"="940e2484-bb38-463b-a15f-d05b9dc9f5f0"}
>> name                : pg_940e2484_bb38_463b_a15f_d05b9dc9f5f0
>> ports               : [12869fa4-2f1f-4c2f-bf65-60ce796a1d51, 1972206b-327a-496b-88fc-d17625d013e1, 2fb22d1a-bbfc-4173-b6fc-1ae3adc5ddcd, 3947661b-4deb-4aed-bd15-65839933fea3, caf0fe63-61be-4b1a-b306-ff00fa578982, fbfaeb2b-6e42-458a-a65f-8d2ef29b8b69, fd662347-4013-4306-b222-e29545f866ec]
>>
>> And this port group does have allow-related (stateful) ACLs that require
>> conntrack:
>>
>>> ovn-nbctl acl-list pg_940e2484_bb38_463b_a15f_d05b9dc9f5f0
>> from-lport  1002 (inport == @pg_940e2484_bb38_463b_a15f_d05b9dc9f5f0 && ip4) allow-related
>>  to-lport  1002 (outport == @pg_940e2484_bb38_463b_a15f_d05b9dc9f5f0 && ip4 && ip4.src == 0.0.0.0/0) allow-related
>>
>> So, as suspected before, this explains why traffic works in one direction
>> and doesn't work in the other direction. Only one logical switch has
>> stateful ACLs and needs conntrack.
>>
>> This is an unsupported configuration (so not a bug). The only way to make
>> it work is to set the use_ct_inv_match=false option in the NB.
>>
>> Just mentioning it again here to make sure it's not lost in the thread:
>> "asymmetric conntrack" and use_ct_inv_match=false means the datapath might
>> forward traffic with ct_state=+trk+inv and might cause HW offload to not
>> work.
>>
>> If that's OK for the use case then it's fine to set the option in the NB
>> database.
>>
>> Best regards,
>> Dumitru
>>
>>>
>>> Best regards,
>>> Ice Bear
>>>
>>> On Mon, May 26, 2025 at 17:05, Dumitru Ceara <dce...@redhat.com> wrote:
>>>
>>>> On 5/26/25 11:38 AM, Q Kay wrote:
>>>>> Hi Dumitru,
>>>>>
>>>>
>>>> Hi Ice Bear,
>>>>
>>>>> Here is the NB DB in JSON format (attachment).
>>>>>
>>>>
>>>> Sorry, I think my request might have been confusing.
>>>>
>>>> I didn't mean running something like:
>>>> ovsdb-client -f json dump <path-to-database-socket>
>>>>
>>>> Instead I meant just attaching the actual database file. That's a file
>>>> (in json format) usually stored in /etc/ovn/ovnnb_db.db. For OpenStack
>>>> that might be /var/lib/openvswitch/ovn/ovnnb_db.db on controller nodes.
>>>>
>>>> Hope that helps.
>>>>
>>>> Regards,
>>>> Dumitru
>>>>
>>>>> Best regards,
>>>>> Ice Bear
>>>>>
>>>>> On Mon, May 26, 2025 at 16:10, Dumitru Ceara <dce...@redhat.com> wrote:
>>>>>
>>>>>> On 5/22/25 9:05 AM, Q Kay wrote:
>>>>>>> Hi Dumitru,
>>>>>>>
>>>>>>
>>>>>> Hi Ice Bear,
>>>>>>
>>>>>> Please keep the ovs-discuss mailing list in CC.
>>>>>>
>>>>>>> I am very willing to provide the NB DB file for you (attached).
>>>>>>> I will provide more information about the ports for you to check.
>>>>>>>
>>>>>>> Logical switch 1 id: 70974da0-2e9d-469a-9782-455a0380ab95
>>>>>>> Logical switch 2 id: ec22da44-9964-49ff-9c29-770a26794ba4
>>>>>>>
>>>>>>> Instance A:
>>>>>>> port 1 (connected to ls1): 61a871bc-7709-4072-9991-8e3a1096b02a
>>>>>>> port 2 (connected to ls2): 63d76c2b-2960-4a89-97ac-9f7a7d4bb718
>>>>>>>
>>>>>>> Instance B:
>>>>>>> port 1: 46848e3c-7a73-46ce-8b3a-b6331e14fc74
>>>>>>> port 2: 7d39750a-29d6-40df-b42b-54a17efcc423
>>>>>>>
>>>>>>
>>>>>> Thanks for the info. However, it's easier to investigate if you just
>>>>>> share the actual NB DB (json) file instead of the ovsdb-client dump.
>>>>>> It's probably located in a path similar to /etc/ovn/ovnnb_db.db.
>>>>>>
>>>>>> Like that I could just load it in a sandbox and run ovn-nbctl commands
>>>>>> against it directly.
>>>>>>
>>>>>> Regards,
>>>>>> Dumitru
>>>>>>
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Ice Bear
>>>>>>>
>>>>>>> On Wed, May 21, 2025 at 16:19, Dumitru Ceara <dce...@redhat.com> wrote:
>>>>>>>
>>>>>>>> On 5/21/25 5:16 AM, Q Kay wrote:
>>>>>>>>> Hi Dumitru,
>>>>>>>>
>>>>>>>> Hi Ice Bear,
>>>>>>>>
>>>>>>>> CC: ovs-discuss@openvswitch.org
>>>>>>>>
>>>>>>>>> Thanks for your answer. First, I will address some of your questions.
>>>>>>>>>
>>>>>>>>>>> The critical evidence is in the failed flow, where we see:
>>>>>>>>>>> 'recirc_id(0x3d77),in_port(28),ct_state(-new-est-rel-rpl+inv+trk),ct_mark(0/0x1),eth(),eth_type(0x0800),ipv4(frag=no), packets:48, bytes:4704, used:0.940s, actions:drop'
>>>>>>>>>>> The packet is being marked as invalid (+inv) and subsequently dropped.
>>>>>>>>>>> It's a bit weird though that this isn't +rpl traffic. Is this hit by
>>>>>>>>>>> the ICMP echo or by the ICMP echo-reply packet?
>>>>>>>>>
>>>>>>>>> This recirc is hit by the ICMP echo reply packet.
>>>>>>>>>
>>>>>>>>
>>>>>>>> OK, that's good.
>>>>>>>>
>>>>>>>>> I understand what you mean. The outgoing and return traffic from
>>>>>>>>> different logical switches will be flagged as inv. If that's the case,
>>>>>>>>> it will work correctly with TCP (both are dropped). But for ICMP, I
>>>>>>>>> notice something a bit strange.
>>>>>>>>>
>>>>>>>>>>> My hypothesis is that the handling of ct_state flags is causing the
>>>>>>>>>>> return traffic to be dropped. This may be because the outgoing and
>>>>>>>>>>> return connections do not share the same logical_switch datapath.
>>>>>>>>>
>>>>>>>>> According to your reasoning, ICMP reply packets from a different logical
>>>>>>>>> switch than the request packets will be dropped. However, in practice,
>>>>>>>>> when I initiate an ICMP request from 6.6.6.6 to 5.5.5.5, the result I
>>>>>>>>> get is success (note that echo request and reply come from different
>>>>>>>>> logical switches regardless of whether they are initiated by 5.5.5.5 or
>>>>>>>>> 6.6.6.6). You can compare the two recirculation flows to see this
>>>>>>>>> oddity. You can take a look at the attached image for better
>>>>>>>>> visualization.
>>>>>>>>>
>>>>>>>>
>>>>>>>> OK. From the ovn-trace command you shared:
>>>>>>>>
>>>>>>>>> 2. Using OVN trace:
>>>>>>>>> ovn-trace --no-leader-only 70974da0-2e9d-469a-9782-455a0380ab95 'inport ==
>>>>>>>>> "319cd637-10fb-4b45-9708-d02beefd698a" && eth.src==fa:16:3e:ea:67:18 &&
>>>>>>>>> eth.dst==fa:16:3e:04:28:c7 && ip4.src==6.6.6.6 && ip4.dst==5.5.5.5 &&
>>>>>>>>> ip.proto==1 && ip.ttl==64'
>>>>>>>>
>>>>>>>> I'm guessing the fa:16:3e:ea:67:18 MAC is the one owned by 6.6.6.6.
>>>>>>>>
>>>>>>>> Now, after filtering only the ICMP ECHO reply flows in your initial
>>>>>>>> datapath flow dump:
>>>>>>>>
>>>>>>>>> *For successful ping flow: 5.5.5.5 -> 6.6.6.6*
>>>>>>>>
>>>>>>>> Note: the ICMP reply comes from 6.6.6.6 to 5.5.5.5 (B -> A).
>>>>>>>>
>>>>>>>>> *- On Compute 1 (containing source instance): *
>>>>>>>>> 'recirc_id(0),tunnel(tun_id=0x2,src=10.10.10.85,dst=10.10.10.84,geneve({class=0x102,type=0x80,len=4,0xb000a/0x7fffffff}),flags(-df+csum+key)),in_port(9),eth(src=fa:16:3e:ea:67:18,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(proto=1,frag=no),icmp(type=0/0xfe), packets:55, bytes:5390, used:0.204s, actions:29'
>>>>>>>>
>>>>>>>> We see no conntrack fields in the match. So, based on the diagram you
>>>>>>>> shared, I'm guessing there's no allow-related ACL or load balancer on
>>>>>>>> logical switch 2.
>>>>>>>>
>>>>>>>> But then for the failed ping flow:
>>>>>>>>
>>>>>>>>> *For failed ping flow: 6.6.6.6 -> 5.5.5.5*
>>>>>>>>
>>>>>>>> Note: the ICMP reply comes from 5.5.5.5 to 6.6.6.6 (A -> B).
>>>>>>>>
>>>>>>>>> *- On Compute 1: *
>>>>>>>>
>>>>>>>> [...]
>>>>>>>>
>>>>>>>>> 'recirc_id(0),in_port(28),eth(src=fa:16:3e:81:ed:92,dst=fa:16:3e:72:fd:e5),eth_type(0x0800),ipv4(proto=1,frag=no), packets:48, bytes:4704, used:0.940s, actions:ct(zone=87),recirc(0x3d77)'
>>>>>>>>>
>>>>>>>>> 'recirc_id(0x3d77),in_port(28),ct_state(-new-est-rel-rpl+inv+trk),ct_mark(0/0x1),eth(),eth_type(0x0800),ipv4(frag=no), packets:48, bytes:4704, used:0.940s, actions:drop'
>>>>>>>>
>>>>>>>> In this case we _do_ have conntrack fields in the match/actions.
>>>>>>>>
>>>>>>>> Is it possible that logical switch 1 has allow-related ACLs or LBs?
>>>>>>>>
>>>>>>>> On the TCP side of things: it's kind of hard to tell what's going on
>>>>>>>> without having the complete configuration of your OVN deployment.
>>>>>>>>
>>>>>>>> NOTE: if an ACL is applied to a port group, that is equivalent to
>>>>>>>> applying the ACL to all logical switches that have ports in that port
>>>>>>>> group.
>>>>>>>>
>>>>>>>>>>> I'd say it's not a bug. However, if you want to change the default
>>>>>>>>>>> behavior you can use the NB_Global.options:use_ct_inv_match=true knob
>>>>>>>>>>> to allow +inv packets in the logical switch pipeline.
>>>>>>>>>
>>>>>>>>> I tried setting the option use_ct_inv_match=. The result is just as you
>>>>>>>>> said, everything works successfully with both ICMP and TCP.
>>>>>>>>> Based on this experiment, I suspect there might be a small bug when OVN
>>>>>>>>> handles ICMP packets. Could you please let me know if my experiment and
>>>>>>>>> reasoning are correct?
>>>>>>>>>
>>>>>>>>
>>>>>>>> As said above, it really depends on the full configuration. Maybe we can
>>>>>>>> tell more if you can share the NB database? Or at least if you share the
>>>>>>>> ACLs applied on the two logical switches (or port groups).
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks for your support.
>>>>>>>>>
>>>>>>>>
>>>>>>>> No problem.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Ice Bear
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Dumitru

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss