Public bug reported:

Pike, DVR + L3 HA, l2population enabled
Some of our L3 HA routers are not working correctly: they are not reachable from instances. After deep investigation, I found that the "HA port tenant <tenant_id>" ports are in state DOWN. They are DOWN because they have no binding information, and they have no binding information because the "HA network tenant <tenant_id>" network is corrupted: its provider:network_type and provider:segmentation_id attributes are not set. The strange thing is that this network used to work, but at some point in time it was corrupted. I have no logs from that period.

For comparison, a working HA tenant network:

+---------------------------+----------------------------------------------------+
| Field                     | Value                                              |
+---------------------------+----------------------------------------------------+
| admin_state_up            | True                                               |
| availability_zone_hints   |                                                    |
| availability_zones        | nova                                               |
| created_at                | 2018-02-16T16:52:31Z                               |
| description               |                                                    |
| id                        | fa2fea5c-ccaa-4116-bb0c-ff59bbd8229a               |
| ipv4_address_scope        |                                                    |
| ipv6_address_scope        |                                                    |
| mtu                       | 9000                                               |
| name                      | HA network tenant afeeb372d7934795b63868330eca0dfe |
| port_security_enabled     | True                                               |
| project_id                |                                                    |
| provider:network_type     | vxlan                                              |
| provider:physical_network |                                                    |
| provider:segmentation_id  | 35                                                 |
| revision_number           | 3                                                  |
| router:external           | False                                              |
| shared                    | False                                              |
| status                    | ACTIVE                                             |
| subnets                   | 5cbc612d-13cf-4889-88fb-02d1debe5f8d               |
| tags                      |                                                    |
| tenant_id                 |                                                    |
| updated_at                | 2018-02-16T16:52:31Z                               |
+---------------------------+----------------------------------------------------+

and a non-working HA tenant network:

+---------------------------+----------------------------------------------------+
| Field                     | Value                                              |
+---------------------------+----------------------------------------------------+
| admin_state_up            | True                                               |
| availability_zone_hints   |                                                    |
| availability_zones        |                                                    |
| created_at                | 2018-01-26T12:24:15Z                               |
| description               |                                                    |
| id                        | 6390c381-871e-4945-bfa0-00828bb519bc               |
| ipv4_address_scope        |                                                    |
| ipv6_address_scope        |                                                    |
| mtu                       | 9000                                               |
| name                      | HA network tenant 3e88cffb9dbb4e1fba96ee72a02e012e |
| port_security_enabled     | True                                               |
| project_id                |                                                    |
| provider:network_type     |                                                    |
| provider:physical_network |                                                    |
| provider:segmentation_id  |                                                    |
| revision_number           | 5                                                  |
| router:external           | False                                              |
| shared                    | False                                              |
| status                    | ACTIVE                                             |
| subnets                   | 4d579b00-c780-45ed-9bd8-4d3256fa8a42               |
| tags                      |                                                    |
| tenant_id                 |                                                    |
| updated_at                | 2018-01-29T14:08:11Z                               |
+---------------------------+----------------------------------------------------+

I found that all working networks have revision_number = 3 and all non-working networks have revision_number = 5. When the "HA network tenant" network is corrupted, ALL L3 HA routers in that tenant stop working. Is there any way to fix this without removing all existing L3 HA routers in the tenant?

Unfortunately, I can't find any code responsible for updating or modifying the "HA network tenant" network, so I have hit a wall in my debugging. The network may have been corrupted during automatic provisioning of network resources via a Heat stack, but I can't reproduce this.

** Affects: neutron
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1757188

Title:
  some L3 HA routers do not work

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1757188/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
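For anyone triaging the same symptom, the check described in the report (HA tenant networks whose provider:network_type / provider:segmentation_id are unset) can be sketched as a small filter over the JSON emitted by `openstack network list --long -f json`. This is an illustrative helper, not part of Neutron; the field names follow the tables quoted above.

```python
def find_corrupted_ha_networks(networks):
    """Return IDs of HA tenant networks whose provider attributes are unset.

    `networks` is a list of dicts shaped like `openstack network show
    -f json` output (see the tables in the bug report).
    """
    corrupted = []
    for net in networks:
        # Neutron names its auto-created L3 HA networks this way.
        if not net.get("name", "").startswith("HA network tenant "):
            continue
        # A healthy HA network has e.g. provider:network_type=vxlan and a
        # numeric provider:segmentation_id; the corrupted ones have both
        # unset (None or empty string).
        if not net.get("provider:network_type") or \
           not net.get("provider:segmentation_id"):
            corrupted.append(net["id"])
    return corrupted

# Example using the two networks from the report:
nets = [
    {"id": "fa2fea5c-ccaa-4116-bb0c-ff59bbd8229a",
     "name": "HA network tenant afeeb372d7934795b63868330eca0dfe",
     "provider:network_type": "vxlan",
     "provider:segmentation_id": 35,
     "revision_number": 3},
    {"id": "6390c381-871e-4945-bfa0-00828bb519bc",
     "name": "HA network tenant 3e88cffb9dbb4e1fba96ee72a02e012e",
     "provider:network_type": None,
     "provider:segmentation_id": None,
     "revision_number": 5},
]
print(find_corrupted_ha_networks(nets))
# → ['6390c381-871e-4945-bfa0-00828bb519bc']
```

Running this against the affected deployment's network list would flag every tenant whose L3 HA routers are expected to be broken, without touching the routers themselves.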