[Yahoo-eng-team] [Bug 1633306] Re: Partial HA network causing HA router creation failed (race conditon)

2016-10-18 Thread LIU Yulong
Temporarily closing this. I will test it on master and reopen it if I
encounter the problem again.

** Changed in: neutron
   Status: Incomplete => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1633306

Title:
  Partial HA network causing HA router creation failed (race conditon)

Status in neutron:
  Invalid

Bug description:
  ENV: stable/mitaka,VXLAN
  Neutron API: two neutron-servers behind a HA proxy VIP.

  Log:
  [1] http://paste.openstack.org/show/585670/

  Log [1] shows that the subnet of the HA network is deleted concurrently
  while a new HA router create API request arrives. It seems the race
  condition described in this bug still exists:
  https://bugs.launchpad.net/neutron/+bug/1533440, whose description
  says:

  """
  Some known exceptions:
  ...
  2. IpAddressGenerationFailure: (HA port created failed due to the
     concurrently HA subnet deletion)
  ...
  """

  Strangely, those 3 API calls all have the same request-id
  [req-780b1f6e-2b3c-4303-a1de-a5fb4c7ea31e].

  Test scenario:
  Just create one HA router for a tenant, and then quickly delete it.

  For now, our Mitaka environment uses VXLAN as the tenant network type, so
  there is a very large range of VNIs and no need to conserve them. As a
  local, temporary solution, we added a new config option that decides
  whether to delete the HA network every time.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1633306/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1633306] Re: Partial HA network causing HA router creation failed (race conditon)

2016-10-17 Thread LIU Yulong
Thanks John. This bug now covers only the one creation-failure log, which
I described in comment #2.

** Description changed:

  ENV: stable/mitaka,VXLAN
  Neutron API: two neutron-servers behind a HA proxy VIP.
  
- Exception log:
- [1] http://paste.openstack.org/show/585669/
- [2] http://paste.openstack.org/show/585670/
+ Log:
+ [1] http://paste.openstack.org/show/585670/
  
  Log [1] shows that the subnet of HA network is concurrently deleted
  while a new HA router create API comes. Seems the race conditon
  described in this bug is till exists :
  https://bugs.launchpad.net/neutron/+bug/1533440, where has description
  said:
  
  """
  Some known exceptions:
  ...
  2. IpAddressGenerationFailure: (HA port created failed due to the
     concurrently HA subnet deletion)
  ...
  """
  
- Log [2] has a very strange behavior that those 3 APIs have a same
- request-id [req-780b1f6e-2b3c-4303-a1de-a5fb4c7ea31e].
+ It has a very strange behavior that those 3 APIs have a same request-id
+ [req-780b1f6e-2b3c-4303-a1de-a5fb4c7ea31e].
  
  Test scenario:
  Just create one HA router for a tenant, and then quickly delete it.
  
  For now, our mitaka ENV use VxLAN as tenant network type. So there is a very 
large range of VNI.
  So don't save that, and locally a temporary solution, we add a new config to 
decide whether delete the HA network every time.

** Changed in: neutron
   Status: Invalid => New



[Yahoo-eng-team] [Bug 1633306] Re: Partial HA network causing HA router creation failed (race conditon)

2016-10-14 Thread John Schwarz
Looking at the log involving the server ([1] - the same one you provided
in the first comment and in comment #3), and specifically lines 19 and
21, it's clear that sync_routers() is triggering
auto_schedule_routers(). Before [2] removed it, the call from
sync_routers() to auto_schedule_routers() was done in line 96 of
neutron/api/rpc/handlers/l3_rpc.py, as can be observed from the log:

2016-10-09 17:03:52.366 144166 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/neutron/api/rpc/handlers/l3_rpc.py", line 96, in sync_routers
2016-10-09 17:03:52.366 144166 ERROR oslo_messaging.rpc.dispatcher     self.l3plugin.auto_schedule_routers(context, host, router_ids)

In [2], it's evident that the line 96 itself is removed. Thus, this
can't be reproduced in master or in stable/mitaka and there is no
(upstream) bug to fix.

[1]: http://paste.openstack.org/show/585669/
[2]: https://github.com/openstack/neutron/commit/33650bf1d1994a96eff993af0bfdaa62588f08a4

** Changed in: neutron
   Status: New => Invalid



[Yahoo-eng-team] [Bug 1633306] Re: Partial HA network causing HA router creation failed (race conditon)

2016-10-14 Thread LIU Yulong
** Changed in: neutron
   Status: Invalid => New



[Yahoo-eng-team] [Bug 1633306] Re: Partial HA network causing HA router creation failed (race conditon)

2016-10-14 Thread John Schwarz
Adding a new configuration option is almost never temporary as deleting
config options is rarely backward-compatible.

The race condition, as I understand it, is as follows:

1. Create HA router, have worker1 send 'router_updated' to agent1.
2. Delete HA router (done by worker2). worker2 will now detect that there
   are no more HA routers and will delete the HA network for the tenant.
3. agent1 issues a 'sync_routers', which triggers auto_schedule_routers.
   create_ha_port_and_bind will try to create the HA port but there are no
   more IP addresses available, causing add_ha_port to fail as specified in
   the first paste.
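
The three steps above can be illustrated with a small, self-contained toy
model. This is only a sketch under stated assumptions: ToyServer and its
methods are hypothetical names made up for illustration, the state is a
plain in-memory object, and none of it is Neutron code. The exception class
is a local stand-in that merely borrows the name from the bug description.

```python
class IpAddressGenerationFailure(Exception):
    """Local stand-in for the exception named in the bug description."""


class ToyServer:
    """Hypothetical in-memory model of the state shared by the workers."""

    def __init__(self):
        self.ha_routers = set()
        self.ha_subnet_exists = False
        self.pending_syncs = []

    def create_ha_router(self, router_id):
        # Step 1: worker1 creates the router (and the HA network) and
        # queues a 'router_updated' notification for the agent.
        self.ha_subnet_exists = True
        self.ha_routers.add(router_id)
        self.pending_syncs.append(router_id)

    def delete_ha_router(self, router_id):
        # Step 2: worker2 deletes the router; once no HA routers remain,
        # the tenant's HA network (and its subnet) is deleted as well.
        self.ha_routers.discard(router_id)
        if not self.ha_routers:
            self.ha_subnet_exists = False

    def sync_routers(self):
        # Step 3: the agent's sync arrives and an HA port is requested for
        # each queued router. Without a subnet no IP can be allocated.
        created_ports = []
        for router_id in self.pending_syncs:
            if not self.ha_subnet_exists:
                raise IpAddressGenerationFailure(router_id)
            created_ports.append("ha-port-for-" + router_id)
        self.pending_syncs = []
        return created_ports


def run_racy_order():
    """Create, delete, then sync: the ordering described in the list."""
    server = ToyServer()
    server.create_ha_router("r1")
    server.delete_ha_router("r1")
    try:
        server.sync_routers()
    except IpAddressGenerationFailure:
        return "failed"
    return "ok"


def run_safe_order():
    """Create, sync, then delete: the sync sees the subnet and succeeds."""
    server = ToyServer()
    server.create_ha_router("r1")
    ports = server.sync_routers()
    server.delete_ha_router("r1")
    return ports
```

In this model the failure depends only on whether the delete is processed
before the queued sync, which matches the "create, then quickly delete"
test scenario from the bug description.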

Point #3 is a bit weird to me, as it looks like IPAM is detecting a
"network deleted during function run" as "no more IP addresses". In
addition, this should be caught by [2], forcing a silent retrigger of
this issue.
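
To make the "caught and retried" expectation concrete, here is a generic
retry helper. It is a minimal sketch only: call_with_retries and its
parameters are hypothetical names, and this is not the code that [2]
points to. It just shows the general idea of catching a known transient
exception and silently re-running the operation a bounded number of times.

```python
def call_with_retries(func, retriable, max_attempts=3):
    """Call func(), retrying when it raises one of the retriable exceptions.

    func:         zero-argument callable to run (hypothetical operation).
    retriable:    exception class or tuple of classes treated as transient.
    max_attempts: total number of attempts before the error is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retriable:
            # Swallow the error silently unless this was the last attempt.
            if attempt == max_attempts:
                raise
```

Exceptions that are not listed in retriable propagate immediately, so only
the failures known to be caused by a concurrent change get retried.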

Aside from the issue that isn't clear to me, I'd like to point out that
the latest stable/mitaka [1] doesn't even trigger auto_schedule_routers
on sync_routers (not since [3] - perhaps you're missing this backport?) -
hence the trace received in the first paste can't be reproduced. For
this reason, I'm closing this as Invalid. Liu, feel free to reopen if
you disagree with my assessment :)

[1]: https://github.com/openstack/neutron/blob/5860fb21e966ab8f1e011654dd477d7af35f7a27/neutron/api/rpc/handlers/l3_rpc.py#L79
[2]: https://github.com/openstack/neutron/blob/5860fb21e966ab8f1e011654dd477d7af35f7a27/neutron/common/utils.py#L726
[3]: https://github.com/openstack/neutron/commit/33650bf1d1994a96eff993af0bfdaa62588f08a4

(5860fb21e966ab8f1e011654dd477d7af35f7a27 is the latest stable/mitaka
hash that github.com provided.)

** Changed in: neutron
   Importance: High => Undecided

** Changed in: neutron
   Status: Confirmed => Invalid

** Changed in: neutron
   Milestone: ocata-1 => None
