The flipping could explain the metadata connectivity issues. AFAIR the HA chassis group membership is managed by Neutron itself, so if you see the group's `ha_chassis` list flipping, the problem is probably on the Neutron side. Anything interesting in the Neutron logs?
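A quick way to catch the flips and tie them to timestamps (a rough sketch, assuming ovn-nbctl is on the PATH of a node with northbound DB access; substitute your own group UUID):

```shell
#!/bin/sh
# Poll the HA_Chassis_Group row once a second and log every membership change,
# so the flip timestamps can be correlated with neutron-server log entries.
GROUP=411ce494-b5aa-4d74-a544-f7dfb9c048cc
prev=""
while true; do
    cur=$(ovn-nbctl --columns=ha_chassis list ha_chassis_group "$GROUP")
    if [ "$cur" != "$prev" ]; then
        echo "$(date -Is) ha_chassis changed: $cur"
        prev="$cur"
    fi
    sleep 1
done
```

Grepping the neutron-server log for the group name (neutron-extport-...) around the logged timestamps should then show whether Neutron is the one adding and removing members.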
You haven't confirmed that you have other compute nodes available to host the external port. What is the a667f120-013c-4715-b049-0f53903ad9d9 chassis in relation to the SR-IOV port: is it the same chassis or a different one? Note that the external port chassis has to be a chassis that is NOT the one hosting the SR-IOV port. I see in the original post that you set the cms options on the chassis hosting the SR-IOV port. That is wrong, unless you have more compute nodes in the cluster with the same setting.

Ihar

On Sun, Jun 1, 2025 at 2:48 PM engineer2024 <engineerlinux2...@gmail.com> wrote:

> Also, the ovn-nb commands show fluctuating output each time I run them.
> These commands were run within a gap of 1 second:
>
> --------
> # ovn-nbctl list ha_chassis_group | grep -A 4 411ce494-b5aa-4d74-a544-f7dfb9c048cc
> _uuid               : 411ce494-b5aa-4d74-a544-f7dfb9c048cc
> external_ids        : {"neutron:availability_zone_hints"=""}
> ha_chassis          : []
> name                : neutron-extport-c27d408a-a926-4509-b707-39bc43732c05
>
> # ovn-nbctl list ha_chassis_group | grep -A 4 411ce494-b5aa-4d74-a544-f7dfb9c048cc
> _uuid               : 411ce494-b5aa-4d74-a544-f7dfb9c048cc
> external_ids        : {"neutron:availability_zone_hints"=""}
> ha_chassis          : [a667f120-013c-4715-b049-0f53903ad9d9]
> name                : neutron-extport-c27d408a-a926-4509-b707-39bc43732c05
> ---------------
>
> On Mon, Jun 2, 2025 at 12:04 AM engineer2024 <engineerlinux2...@gmail.com> wrote:
>
>> Thanks for the reply.
>>
>> I have tried it, and it worked for the DHCP IP lease for the SR-IOV
>> external OVN port.
>>
>> But the metadata requests from the VM are not getting any replies. I have
>> pasted the br-int flows of the extport chassis.
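On the cms-options point above: if only the SR-IOV chassis has the setting, the usual fix is to mark one or more *other* compute nodes as external-port hosts and then check where the port actually lands. A sketch (run the first command on a compute node that does NOT host the SR-IOV VM; the second from any node with southbound DB access):

```shell
# On another compute node (not the one with the SR-IOV VF), advertise it
# as a candidate host for external ports:
ovs-vsctl set Open_vSwitch . \
    external-ids:ovn-cms-options="enable-chassis-as-extport-host"

# Then verify which chassis each external port is bound to:
ovn-sbctl --columns=logical_port,chassis find Port_Binding type=external
```

If the chassis reported by the second command is still the one hosting the SR-IOV VF, the scheduler had no other candidate to pick, which would also explain the empty/flipping ha_chassis list quoted above.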
>>
>> ------
>> grep 'fa:16:3e:d2:18:19' flows
>>
>> cookie=0xb4c75793, duration=0.526s, table=30, n_packets=0, n_bytes=0, priority=100,conj_id=1857670037,udp,reg14=0x1,metadata=0x1,dl_src=fa:16:3e:d2:18:19,tp_src=68,tp_dst=67 actions=controller(userdata=00.00.00.02.00.00.00.00.00.01.de.10.00.00.00.63.0a.25.34.a5.79.13.20.a9.fe.a9.fe.0a.25.34.0b.00.0a.25.34.01.00.0a.25.34.01.06.08.0a.25.e7.eb.0a.e3.64.11.33.04.00.00.a8.c0.1a.02.05.dc.01.04.ff.ff.fc.00.03.04.0a.25.34.01.36.04.0a.25.34.01,pause),resubmit(,31)
>> cookie=0x0, duration=0.526s, table=30, n_packets=0, n_bytes=0, priority=100,udp,reg14=0x1,metadata=0x1,dl_src=fa:16:3e:d2:18:19,nw_dst=10.37.52.1,tp_src=68,tp_dst=67 actions=conjunction(1857670037,1/2)
>> cookie=0x0, duration=0.526s, table=30, n_packets=0, n_bytes=0, priority=100,udp,reg14=0x1,metadata=0x1,dl_src=fa:16:3e:d2:18:19,nw_dst=255.255.255.255,tp_src=68,tp_dst=67 actions=conjunction(1857670037,1/2)
>> cookie=0x0, duration=0.526s, table=30, n_packets=0, n_bytes=0, priority=100,udp,reg14=0x1,metadata=0x1,dl_src=fa:16:3e:d2:18:19,nw_src=0.0.0.0,tp_src=68,tp_dst=67 actions=conjunction(1857670037,2/2)
>> cookie=0x0, duration=0.526s, table=30, n_packets=0, n_bytes=0, priority=100,udp,reg14=0x1,metadata=0x1,dl_src=fa:16:3e:d2:18:19,nw_src=10.37.52.165,tp_src=68,tp_dst=67 actions=conjunction(1857670037,2/2)
>> cookie=0xd7d9689d, duration=0.526s, table=31, n_packets=0, n_bytes=0, priority=100,udp,reg0=0x8/0x8,reg14=0x1,metadata=0x1,dl_src=fa:16:3e:d2:18:19,tp_src=68,tp_dst=67 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:d7:50:0e->eth_src,set_field:10.37.52.1->ip_src,set_field:67->udp_src,set_field:68->udp_dst,move:NXM_NX_REG14[]->NXM_NX_REG15[],load:0x1->NXM_NX_REG10[0],resubmit(,37)
>> cookie=0x3ca10142, duration=0.526s, table=41, n_packets=0, n_bytes=0, priority=170,reg10=0x400/0x400,reg15=0x1,metadata=0x1,dl_dst=fa:16:3e:d2:18:19 actions=set_field:0->reg0,set_field:0->reg1,set_field:0->reg2,set_field:0->reg3,set_field:0->reg4,set_field:0->reg5,set_field:0->reg6,set_field:0->reg7,set_field:0->reg8,set_field:0->reg9,resubmit(,42)
>> ---------------
>>
>> ----------
>> grep 10.72.46.3 flows
>>
>> cookie=0x0, duration=0.526s, table=30, n_packets=0, n_bytes=0, priority=100,udp,reg14=0x1,metadata=0x1,dl_src=fa:16:3e:d3:67:02,nw_src=10.72.46.3,tp_src=68,tp_dst=67 actions=conjunction(1857670037,2/2)
>> cookie=0x0, duration=2330.818s, table=46, n_packets=0, n_bytes=0, priority=2002,ip,reg0=0x80/0x80,metadata=0x1,nw_src=10.72.46.3 actions=conjunction(2270989993,1/2)
>> cookie=0x0, duration=2330.817s, table=46, n_packets=0, n_bytes=0, priority=2002,ip,reg0=0x100/0x100,metadata=0x1,nw_src=10.72.46.3 actions=conjunction(2898587360,1/2)
>> -----------
>>
>> The SR-IOV VM's IP is 10.72.46.3 and its MAC address is 'fa:16:3e:d2:18:19'.
>>
>> Also, when pinging the metadata IP 169.254.169.254 from the VM, only a
>> single reply comes back out of 30 or so requests, as shown below:
>>
>> ------
>> # ip netns exec ovnmeta-8f126b23-b062-4021-9245-41d91bdf97d9 tcpdump -l -i tapsd99jef-88 icmp
>> tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
>> listening on tapsd99jef-88, link-type EN10MB (Ethernet), snapshot length 262144 bytes
>> 18:13:39.038132 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 1, length 64
>> 18:13:40.061961 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 2, length 64
>> 18:13:41.085949 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 3, length 64
>> 18:13:42.109964 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 4, length 64
>> 18:13:43.133974 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 5, length 64
>> 18:13:44.157950 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 6, length 64
>> 18:13:44.286197 IP 169.254.169.254 > 10.72.46.3: ICMP echo reply, id 15, seq 6, length 64
>> 18:13:45.182021 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 7, length 64
>> 18:13:46.205956 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 8, length 64
>> 18:13:47.229978 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 9, length 64
>> 18:13:48.253948 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 10, length 64
>> 18:13:49.277955 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 11, length 64
>> 18:13:50.301977 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 12, length 64
>> 18:13:51.325966 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 13, length 64
>> 18:13:52.349957 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 14, length 64
>> 18:13:53.373976 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 15, length 64
>> 18:13:54.397962 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 16, length 64
>> 18:13:55.421947 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 17, length 64
>> 18:13:56.445936 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 18, length 64
>> ----------
>>
>> Can you point out why this metadata is failing?
>>
>> Appreciate your time.
>>
>> Have a good day!!!
>>
>> On Thu, May 29, 2025 at 10:02 PM Ihar Hrachyshka <ihrac...@redhat.com> wrote:
>>
>>> On Thu, May 29, 2025 at 11:53 AM engineer2024 via discuss <ovs-discuss@openvswitch.org> wrote:
>>>
>>>> Thanks for the response.
>>>>
>>>> This is not exactly what I asked. This scenario is specifically about SR-IOV ports. In that case, how does the packet from the physical NIC get back to the same node? Two questions:
>>>>
>>>> 1. For SR-IOV VM ports, where do the DHCP responses come from? Where is this maintained in OVN? I know that for non-SR-IOV ports (non-direct vNIC types), the ovn-controller on the compute node intercepts the request and responds, so it never leaves the compute node.
>>>>
>>> Responses, if there are any, come from the fabric, probably served from *another* chassis that is in the HA group list for the external port (do you have other computes with the same cms-options setting?). The OpenStack group scheduler for external ports has an explicit check that avoids landing the external port for an SR-IOV port on the same chassis as the SR-IOV port itself. You can check the sync_ha_chassis_group_network function in neutron/common/ovn/utils.py to confirm it.
>>>
>>>> 2. How do you provide the metadata service for SR-IOV ports? For non-SR-IOV ports the OVN metadata namespace does this.
>>>>
>>> Same as with non-SR-IOV ports: metadata is served by the ovn-metadata-agent, but from the host that owns the external port (through the localport), which is, by design, a different host from the one that hosts the SR-IOV port.
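For completeness, a quick way to check on which chassis the metadata should actually be served, sketched under the assumption that the network UUID from the tcpdump above is the right one and that the deployment uses the usual ovn-metadata-agent layout (a haproxy instance inside the ovnmeta namespace):

```shell
# From a node with southbound DB access: find the chassis currently
# bound to the external port (this is where metadata is proxied from).
ovn-sbctl --columns=logical_port,chassis find Port_Binding type=external

# On that chassis: confirm the metadata namespace exists for the network
# (UUID taken from the tcpdump output above) and that its proxy is running.
ip netns | grep ovnmeta-8f126b23-b062-4021-9245-41d91bdf97d9
ps aux | grep -F 8f126b23 | grep haproxy
```

If the namespace or proxy lives on a different chassis than the one where the DHCP/metadata flows were dumped, that mismatch (or the group flipping between chassis) would line up with the mostly-lost ICMP replies.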
>>>
>>>> On Thu, 29 May 2025, 21:09 Daniel Alvarez Sanchez, <dalva...@redhat.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On Thu, May 29, 2025 at 4:50 PM engineer2024 via discuss <ovs-discuss@openvswitch.org> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have an OpenStack OVN setup. For SR-IOV Neutron ports, when the cms option is set on the compute node hosting the SR-IOV NIC, as shown below,
>>>>>>
>>>>>> ovs-vsctl set Open_vSwitch . external-ids:ovn-cms-options="enable-chassis-as-extport-host"
>>>>>>
>>>>>> the port is getting its DHCP IP. Now, my question is: from where is OVN responding to this external port's DHCP requests? I know that for a normal tap port the request goes through br-int and ovn-controller gives the response, but an SR-IOV port bypasses OVS and the host's kernel network stack entirely. Where does the request go after it exits the physical VF interface, and how does OVN answer it? Where is OVN's built-in DHCP service maintained?
>>>>>>
>>>>> DHCP will be answered wherever your external port is scheduled. I recommend reading this blog post I wrote some time back: https://dani.foroselectronica.es/ovn-external-ports-604/
>>>>>
>>>>> If you're seeing this behavior and you are 100% sure that the same compute node that has the SR-IOV port is serving the DHCP requests to that instance, then it means that the broadcast request is coming out from the SR-IOV port and back in from the same switch, presumably reaching the compute node through a different NIC and from there going br-ex (or similar?) -> br-int -> external port. I'm not entirely sure about the return path, though, but you can possibly check with tcpdump :)
>>>>>
>>>>>> Next, for SR-IOV ports, the Nova metadata service is also unreachable, as it bypasses the ovn-meta namespace on the compute host connected to br-int via veth cables. So injecting user data like SSH keys is not possible and fails...
>>>>>
>>>>> Same... you should have the ovn-metadata-agent running where your external port is, and it will proxy the metadata request to Nova and serve it back to your SR-IOV instance wherever it is.
>>>>>
>>>>>> Thanks
>>>>>> elinux
>>>>>> _______________________________________________
>>>>>> discuss mailing list
>>>>>> disc...@openvswitch.org
>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss