Public bug reported:
Pike
DVR + L3_HA
L2population enabled
Some of our L3 HA routers are not working correctly: they are not reachable
from instances.
After deep investigation, I've found that the "HA port tenant <tenant id>" ports
are in state DOWN. They are DOWN because they have no binding information,
and they have no binding information because the 'HA network tenant <tenant_id>'
network is corrupted: its provider:network_type and
provider:segmentation_id attributes are not set.
The weird thing is that this network used to be fine and working, but at some
point in time it became corrupted. I don't have any logs from that point in
time.
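Since every broken network shows the same symptom, it may help to scan all "HA
network tenant" networks at once. A minimal triage sketch (my own code, not part
of Neutron), assuming dicts shaped like `openstack network show -f json` output:

```python
# Flag "HA network tenant" networks whose provider attributes are unset.
# Input: a list of dicts shaped like `openstack network show -f json` output.
def corrupted_ha_networks(networks):
    suspect = []
    for net in networks:
        if not net.get("name", "").startswith("HA network tenant"):
            continue
        # A healthy HA network has both provider attributes populated.
        if not net.get("provider:network_type") or \
           net.get("provider:segmentation_id") in (None, ""):
            suspect.append(net["id"])
    return suspect

# Sample data mirroring the two networks from this report:
nets = [
    {"id": "fa2fea5c-ccaa-4116-bb0c-ff59bbd8229a",
     "name": "HA network tenant afeeb372d7934795b63868330eca0dfe",
     "provider:network_type": "vxlan", "provider:segmentation_id": 35},
    {"id": "6390c381-871e-4945-bfa0-00828bb519bc",
     "name": "HA network tenant 3e88cffb9dbb4e1fba96ee72a02e012e",
     "provider:network_type": None, "provider:segmentation_id": None},
]
print(corrupted_ha_networks(nets))  # → ['6390c381-871e-4945-bfa0-00828bb519bc']
```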
For comparison, a working HA tenant network:
+---------------------------+----------------------------------------------------+
| Field                     | Value                                              |
+---------------------------+----------------------------------------------------+
| admin_state_up            | True                                               |
| availability_zone_hints   |                                                    |
| availability_zones        | nova                                               |
| created_at                | 2018-02-16T16:52:31Z                               |
| description               |                                                    |
| id                        | fa2fea5c-ccaa-4116-bb0c-ff59bbd8229a               |
| ipv4_address_scope        |                                                    |
| ipv6_address_scope        |                                                    |
| mtu                       | 9000                                               |
| name                      | HA network tenant afeeb372d7934795b63868330eca0dfe |
| port_security_enabled     | True                                               |
| project_id                |                                                    |
| provider:network_type     | vxlan                                              |
| provider:physical_network |                                                    |
| provider:segmentation_id  | 35                                                 |
| revision_number           | 3                                                  |
| router:external           | False                                              |
| shared                    | False                                              |
| status                    | ACTIVE                                             |
| subnets                   | 5cbc612d-13cf-4889-88fb-02d1debe5f8d               |
| tags                      |                                                    |
| tenant_id                 |                                                    |
| updated_at                | 2018-02-16T16:52:31Z                               |
+---------------------------+----------------------------------------------------+
and a non-working HA tenant network:
+---------------------------+----------------------------------------------------+
| Field                     | Value                                              |
+---------------------------+----------------------------------------------------+
| admin_state_up            | True                                               |
| availability_zone_hints   |                                                    |
| availability_zones        |                                                    |
| created_at                | 2018-01-26T12:24:15Z                               |
| description               |                                                    |
| id                        | 6390c381-871e-4945-bfa0-00828bb519bc               |
| ipv4_address_scope        |                                                    |
| ipv6_address_scope        |                                                    |
| mtu                       | 9000                                               |
| name                      | HA network tenant 3e88cffb9dbb4e1fba96ee72a02e012e |
| port_security_enabled     | True                                               |
| project_id                |                                                    |
| provider:network_type     |                                                    |
| provider:physical_network |                                                    |
| provider:segmentation_id  |                                                    |
| revision_number           | 5                                                  |
| router:external           | False                                              |
| shared                    | False                                              |
| status                    | ACTIVE                                             |
| subnets                   | 4d579b00-c780-45ed-9bd8-4d3256fa8a42               |
| tags                      |                                                    |
| tenant_id                 |                                                    |
| updated_at                | 2018-01-29T14:08:11Z                               |
+---------------------------+----------------------------------------------------+
I've found that all working networks have revision_number = 3, while all broken
networks have revision_number = 5. When an 'HA network tenant' network is
corrupted, ALL L3-HA routers in that tenant stop working.
Is there any way to fix this without removing all existing L3-HA routers in
the tenant?
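One repair avenue (an assumption on my part, not a verified procedure): since
only provider:network_type and provider:segmentation_id are missing, restoring
the network's segment record with a free VXLAN VNI might bring port binding
back without recreating the routers. Picking a VNI that no other network
already uses could be sketched as:

```python
# Hypothetical helper: choose the lowest VXLAN VNI not already allocated.
# The vni_range default here is arbitrary; a real deployment would take it
# from the vni_ranges option in ml2_conf.ini.
def pick_free_vni(used_vnis, vni_range=(1, 1000)):
    used = set(used_vnis)
    for vni in range(vni_range[0], vni_range[1] + 1):
        if vni not in used:
            return vni
    raise RuntimeError("no free VNI in the configured range")

# The working HA network above holds VNI 35; suppose 1-3 are also taken:
print(pick_free_vni([35, 1, 2, 3]))  # → 4
```

The exact table and columns to restore would still have to be verified against
the Neutron Pike schema before attempting anything like this.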
Unfortunately, I can't find any code responsible for updating or modifying the
"HA network tenant" network, so I've hit a wall in my debugging.
It is possible that the network was corrupted during automatic provisioning of
network resources by a Heat stack, but I can't reproduce this.
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1757188
Title:
some L3 HA routers do not work
Status in neutron:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1757188/+subscriptions
--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp