[Yahoo-eng-team] [Bug 1926531] [NEW] SNAT namespace prematurely created then deleted on hosts, resulting in removal of RFP/FPR link to FIP namespace

2021-04-28 Thread Arjun Baindur
Public bug reported: Seems like collateral from https://bugs.launchpad.net/neutron/+bug/1850779. I think this fix causes problems. We have multiple nodes that are in DVR_SNAT mode. The SNAT namespace is scheduled to 1 of them. When l3-agent is restarted on the other nodes, now, initialize() is invoked

[Yahoo-eng-team] [Bug 1884708] [NEW] explicity_egress_direction prevents learning of local MACs and causes flooding of ingress packets

2020-06-22 Thread Arjun Baindur
Public bug reported: We took this bug fix: https://bugs.launchpad.net/neutron/+bug/1732067 and then also backported https://bugs.launchpad.net/neutron/+bug/1866445 ourselves. The latter is for the iptables-based firewall. We have VLAN-based networks, and are seeing ingress packets destined to local MACs

[Yahoo-eng-team] [Bug 1878719] [NEW] DHCP Agent's iptables CHECKSUM rule causes skb_warn_bad_offload kernel

2020-05-14 Thread Arjun Baindur
Public bug reported: We are hitting this kernel issue due to a DHCP agent CHECKSUM rule that is probably obsolete/not needed: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840619. Upgrading the kernel is one workaround, but more disruptive, especially since we are still using CentOS 7, and kernel

[Yahoo-eng-team] [Bug 1866139] [NEW] GARP not sent on provider network after live migration

2020-03-04 Thread Arjun Baindur
Public bug reported: Using Rocky, with OVS. Live migrated a VM on regular VLAN based provider network. Network connectivity was stopped, no GARP packets observed on tcpdump. Things started working after VM initiated traffic, causing MAC to be relearned. Looking at the code, send_ip_addr_adv_notif

[Yahoo-eng-team] [Bug 1864711] [NEW] DHCP port rescheduling causes ports to grow, internal DNS to be broken

2020-02-25 Thread Arjun Baindur
Public bug reported: Suppose we have 2 DHCP servers per network, and the number of DHCP agents is greater than 2. During a time of network instability, RabbitMQ issues, or even a DHCP host temporarily going down, the DHCP port will get rescheduled. Except it looks like it's not so much as getting rescheduled,

[Yahoo-eng-team] [Bug 1862851] [NEW] update_floatingip_statuses: StaleDataError: UPDATE statement on table 'standardattributes' expected to update 1 row(s); 0 were matched.

2020-02-11 Thread Arjun Baindur
Public bug reported: Running Rocky, in a DVR environment. We are seeing repeated errors like this in neutron-server logs. Not sure at this point what the effect is, or whether it is harmless. This is in a build test environment where a lot of automated VMs get spawned and deleted then associated/disassociated w

[Yahoo-eng-team] [Bug 1853071] [NEW] AMQP disconnects, q-reports-plugin queue grows, leading to DBDeadlocks while trying to update agent heartbeats

2019-11-18 Thread Arjun Baindur
Public bug reported: Since upgrading to Rocky, we have seen this issue pop up in several environments, small and large. First we see various AMQP/Rabbit related errors - missed heartbeats from neutron-server to rabbitmq, then repeated errors such as Socket Closed, Broken Pipe, etc... This continu

[Yahoo-eng-team] [Bug 1852504] [NEW] DHCP reserved ports that were unscheduled are advertised as DNS servers

2019-11-13 Thread Arjun Baindur
Public bug reported: We have 2 DHCP servers per network. After network outages, and when hosts come back online, the number of ACTIVE DHCP servers grows. This happened again after more outages, with some networks having up to 9-10+ DHCP ports, many in ACTIVE state, despite neutron-server's neutron.

[Yahoo-eng-team] [Bug 1838699] [NEW] Removing a subnet from DVR router also removes DVR MAC flows for other router on network

2019-08-01 Thread Arjun Baindur
Public bug reported: This bug builds on the issue seen in https://bugs.launchpad.net/neutron/+bug/1838697. In that issue, if you create a tenant network, some VMs, and attach it to 2 DVR routers, only the DVR MAC rules exist for the first router. With this issue, simply removing the subnet or deletin

[Yahoo-eng-team] [Bug 1838697] [NEW] DVR Mac conversion rules are only added for the first router a network is attached to

2019-08-01 Thread Arjun Baindur
Public bug reported: This is seen on stable/pike, have not checked latest or stein. 1. Create a basic tenant network and create a DVR router, attach them. Spin up some VMs: [r...@pf9-kvm-neutron.platform9.net arjun(admin)]# openstack port list --network 8cd0e19e-9041-4a62-9cc9-6bfb5b10f955 --lo

[Yahoo-eng-team] [Bug 1823798] [NEW] rfp and fpr interfaces not updated after changing network MTU

2019-04-08 Thread Arjun Baindur
Public bug reported: I have a tenant network attached to a router and external network, and some VMs with Floating IPs deployed. I updated the network MTU from 1500 to 1300 (just as a test, but try any other MTU), and restarted nova compute, ovs agent, l3-agent. All interfaces (qvo/qvb/qbr, qr, d

[Yahoo-eng-team] [Bug 1806770] [NEW] DHCP Agent should not release DHCP lease when client ID is not set on port

2018-12-04 Thread Arjun Baindur
Client ID should only be enforced, and leases released, if it's actually set on the port. In that case it means someone knows what they are doing, and we want to check for a mismatch. If it's None, I suspect in 99.% of cases the operator does not know or care about client ID field.
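
The rule proposed in this report can be sketched as a small predicate. This is an illustrative sketch only, not the neutron DHCP agent code; the function name `should_release_lease` and its parameters are hypothetical and exist only to show the decision being described.

```python
from typing import Optional


def should_release_lease(port_client_id: Optional[str],
                         lease_client_id: Optional[str]) -> bool:
    """Hypothetical helper: release only when the port sets a client ID
    and that client ID differs from the one recorded in the lease."""
    # No client ID on the port: the operator expressed no preference,
    # so the existing lease is left alone.
    if port_client_id is None:
        return False
    # Client ID explicitly set on the port: enforce it and release on mismatch.
    return port_client_id != lease_client_id
```

With this shape, a port whose client ID is None never triggers a release, while a port with an explicit client ID is still checked for a mismatch.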

[Yahoo-eng-team] [Bug 1802006] [NEW] Floating IP attach/detach fails for non-admin user and unbound port with router in different tenant

2018-11-06 Thread Arjun Baindur
Public bug reported: Seeing this on pike, but the code looks the same in master, so the issue likely still exists. We have a shared external network connected to a router in TenantA. Now create a network, either shared in tenantA or owned by tenantB, and attach to tenantA's router (an admin user will have to do

[Yahoo-eng-team] [Bug 1783908] [NEW] dnsmasq does not remove leases for deleted VMs - leases and host files point to different MACS

2018-07-26 Thread Arjun Baindur
Public bug reported: We see this sporadically, sometimes it works, sometimes it doesn't. We delete VMs, then create new ones. The new fixed IP can't get an IP via DHCP, because the leases file still points to MAC of some old, deleted VMs. The host file is correctly updated. For example we see in

[Yahoo-eng-team] [Bug 1783654] [NEW] DVR process flow not installed on physical bridge for shared tenant network

2018-07-25 Thread Arjun Baindur
Public bug reported: Seems like collateral from https://bugs.launchpad.net/neutron/+bug/1751396. In DVR, the distributed gateway port's IP and MAC are shared in the qrouter across all hosts. The dvr_process_flow on the physical bridge (which replaces the shared router_distributed MAC address with

[Yahoo-eng-team] [Bug 1776778] [NEW] Floating IPs broken after upgrade to Centos 7.5 - DNAT not working

2018-06-13 Thread Arjun Baindur
Public bug reported: Since upgrading to CentOS 7.5, floating IP functionality has been completely broken. Packets arrive inbound to qrouter from fip namespace via RFP, but are not DNAT'd or routed, as we see nothing going out the qr- interface. For outbound packets leaving the VM, they are fine, but t

[Yahoo-eng-team] [Bug 1681979] [NEW] L2pop flows are lost after OVS agent restart

2017-04-11 Thread Arjun Baindur
Public bug reported: In OVS agent, there is a race condition between l2pop's add_fdb_entries notification and provision_local_vlan when we create a vlanmanager mapping. This results in either unicast, flooding, or both entries not being populated on the host. Without the flooding entries, connecti