[Yahoo-eng-team] [Bug 1982373] Re: nova/neutron ignore and overwrite custom device owner fields

2022-07-21 Thread Oleg Bondarev
Since it's nova's logic that updates the port, I guess the bug should be filed against the nova project.
@akkaris what do you think?

Also, regarding the last statement "the port is actually bound now to the
instance" - I can't see this from the "openstack server list" output; am I
missing something?

** Changed in: neutron
   Status: New => Opinion

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1982373

Title:
  nova/neutron ignore and overwrite custom device owner fields

Status in neutron:
  Opinion

Bug description:
  nova/neutron ignore custom device owner fields when the device_id matches a nova server. The fact that the device_owner field is set to something other than nova is completely ignored.
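  One way to frame the reporter's point: before treating a port found by device_id as its own, nova's update path could check whether device_owner already marks the port as externally owned. A hedged sketch (the `compute:` prefix convention comes from how nova-bound ports are labeled in neutron; the helper itself is hypothetical, not nova code):

  ```python
  def is_nova_owned(device_owner):
      """Nova-bound ports carry a 'compute:<az>' device_owner; an empty
      owner is up for grabs, but anything else (e.g. 'TestOwner') was
      set by someone else and should arguably be left alone."""
      return device_owner == "" or device_owner.startswith("compute:")


  # The reporter's custom owner should not be treated as nova's:
  print(is_nova_owned("TestOwner"))      # False
  print(is_nova_owned("compute:nova"))   # True
  ```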

  Sequence of command line actions:
  ~~~
  [stack@standalone ~]$ openstack server list
  +--------------------------------------+-----------------------------+--------+-------------------------------------+-------+-----------+
  | ID                                   | Name                        | Status | Networks                            | Image | Flavor    |
  +--------------------------------------+-----------------------------+--------+-------------------------------------+-------+-----------+
  | 382c107f-a082-4e9b-8adb-2ba45323c479 | ostest-lq27s-worker-0-cz6gw | ACTIVE | ostest-lq27s-openshift=10.196.2.215 | rhcos | m1.large  |
  | 985a609a-1fdd-4f48-b996-9311883c33a2 | ostest-lq27s-worker-0-5vcxf | ACTIVE | ostest-lq27s-openshift=10.196.2.151 | rhcos | m1.large  |
  ~~~

  ~~~
  # openstack port create --network ed889e25-f8fa-4684-a9c4-54fff8de37b8 --device 382c107f-a082-4e9b-8adb-2ba45323c479 --device-owner TestOwner --fixed-ip subnet=ba4e5cdb-a0e3-47f2-9233-47d5a12c,ip-address=10.196.100.200 TestPort
  (...)
  | id  | 697f4773-7fe7-4d1b-9804-8fbb003b1194
  (...)
  # openstack port create --network ed889e25-f8fa-4684-a9c4-54fff8de37b8 --device 382c107f-a082-4e9b-8adb-2ba45323c479 --device-owner TestOwner --fixed-ip subnet=ba4e5cdb-a0e3-47f2-9233-47d5a12c,ip-address=10.196.100.201 TestPort2
  (...)
  | id  | bc22dfa9-90fa-4d70-84a8-ec3a41ea2305
  (...)
  ~~~

  Now, run this in a terminal:
  ~~~
  while true ; do sleep 10 ; date ; openstack port show 697f4773-7fe7-4d1b-9804-8fbb003b1194 | grep device_owner; done
  Wed Jul 20 14:21:26 UTC 2022
  | device_owner | TestOwner |
  Wed Jul 20 14:21:38 UTC 2022
  | device_owner | TestOwner |
  Wed Jul 20 14:21:51 UTC 2022
  | device_owner | TestOwner |
  Wed Jul 20 14:22:03 UTC 2022
  | device_owner | TestOwner |
  Wed Jul 20 14:22:15 UTC 2022
  | device_owner | TestOwner |
  Wed Jul 20 14:22:28 UTC 2022
  | device_owner | TestOwner |
  Wed Jul 20 14:22:40 UTC 2022
  | device_owner | TestOwner |
  (...)
  ~~~

  In another terminal, delete and recreate the second port:
  ~~~
  [stack@standalone ~]$ openstack port delete bc22dfa9-90fa-4d70-84a8-ec3a41ea2305
  [stack@standalone ~]$ openstack port create --network ed889e25-f8fa-4684-a9c4-54fff8de37b8 --device 382c107f-a082-4e9b-8adb-2ba45323c479 --device-owner TestOwner --fixed-ip subnet=ba4e5cdb-a0e3-47f2-9233-47d5a12c,ip-address=10.196.100.201 TestPort2
  (...)
  | id  | bc22dfa9-90fa-4d70-84a8-ec3a41ea2305
  (...)
  ~~~

  Check in the terminal that's running the while loop:
  ~~~
  Wed Jul 20 14:22:53 UTC 2022
  | device_owner | TestOwner |
[Yahoo-eng-team] [Bug 1969615] Re: OVS: flow loop is created with openvswitch version 2.16

2022-04-21 Thread Oleg Bondarev
So the flows look the same for both 2.15 and 2.16 (no surprise here); it's
just that in the 2.16 case this weird ofport 7 appears out of nowhere
according to the vswitchd log, and in fact there's no such ofport on the bridge.

Also, flow counters are zero in the 2.16 case:

cookie=0xb722108b439955c3, duration=81.938s, table=0, n_packets=0, n_bytes=0, idle_age=81, priority=0 actions=resubmit(,60)

for 2.15 we see packets:

cookie=0xb722108b439955c3, duration=631.481s, table=0, n_packets=35, n_bytes=2870, idle_age=20, priority=0 actions=resubmit(,60)
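A quick way to compare two such dumps mechanically is to pull the n_packets counters out of the `ovs-ofctl dump-flows` text; a small hedged sketch (the parsing helper is mine, not a neutron or OVS utility):

```python
import re


def packet_counts(dump_flows_output):
    """Map each flow's cookie to its n_packets counter, as printed
    by `ovs-ofctl dump-flows`; a zero count flags a dead flow."""
    counts = {}
    for line in dump_flows_output.splitlines():
        m = re.search(r"cookie=(0x[0-9a-f]+).*?n_packets=(\d+)", line)
        if m:
            counts[m.group(1)] = int(m.group(2))
    return counts


dump = ("cookie=0xb722108b439955c3, duration=81.938s, table=0, "
        "n_packets=0, n_bytes=0, idle_age=81, priority=0 actions=resubmit(,60)")
print(packet_counts(dump))  # {'0xb722108b439955c3': 0} -> a dead flow
```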

Not sure it's a neutron issue; the openvswitch folks could probably point
out some ways to debug.

** Changed in: neutron
   Status: Confirmed => Opinion

-- 
https://bugs.launchpad.net/bugs/1969615

Title:
  OVS: flow loop is created with openvswitch version 2.16

Status in neutron:
  Opinion

Bug description:
  * Summary
  neutron-openvswitch-agent is causing a flow loop when using Openvswitch version 2.16.

  * High level description
  Running Neutron in the Xena release using the openvswitch plugin is causing a flow loop when using openvswitch version 2.16. This does not occur when deploying openvswitch version 2.15.

  * Pre-conditions
  Ansible-Kolla based deployment using "source: ubuntu" in stable/xena release. neutron_plugin_agent: "openvswitch". Deploying a 3 node cluster with basic Openstack services.

  * Version:
   ** OpenStack version: Xena
   ** Linux distro: kolla-ansible stable/xena, Ubuntu 20.04.4 LTS

  * Step-by-step
  1. Deploy Openstack using kolla-ansible from stable/xena branch
  2. Create a project network/subnet for Octavia
  3. Create Octavia health-manager ports in Neutron for the 3 control nodes
  4. Create the ports on each control node as ovs bridge ports
  5. Assign IP addresses to the o-hm0 interfaces on all 3 nodes
  6. try to ping one node from another node

  ubuntu@ctl1:~$ openstack network show lb-mgmt
  +---------------------------+--------------------------------------+
  | Field                     | Value                                |
  +---------------------------+--------------------------------------+
  | admin_state_up            | UP                                   |
  | availability_zone_hints   |                                      |
  | availability_zones        | nova                                 |
  | created_at                | 2022-04-20T10:36:26Z                 |
  | description               |                                      |
  | dns_domain                | None                                 |
  | id                        | c0c1b3ec-a6c3-4145-b94a-6c7fa4d7a740 |
  | ipv4_address_scope        | None                                 |
  | ipv6_address_scope        | None                                 |
  | is_default                | None                                 |
  | is_vlan_transparent       | None                                 |
  | mtu                       | 1450                                 |
  | name                      | lb-mgmt                              |
  | port_security_enabled     | True                                 |
  | project_id                | 6cbb86e577a042499529110f6a1e8603     |
  | provider:network_type     | vxlan                                |
  | provider:physical_network | None                                 |
  | provider:segmentation_id  | 577                                  |
  | qos_policy_id             | None                                 |
  | revision_number           | 2                                    |
  | router:external           | Internal                             |
  | segments                  | None                                 |
  | shared                    | False                                |
  | status                    | ACTIVE                               |
  | subnets                   | bf004f5a-4cae-4277-a3f4-a4cf787033cb |
  | tags                      |                                      |
  | updated_at                | 2022-04-20T10:36:28Z                 |
  +---------------------------+--------------------------------------+

  ubuntu@ctl1:~$ openstack subnet show lb-mgmt
  +----------------------+---------------------------+
  | Field                | Value                     |
  +----------------------+---------------------------+
  | allocation_pools     | 172.16.1.1-172.16.255.254 |
  | cidr                 | 172.16.0.0/16             |
  | created_at           | 2022-04-20T10:36:28Z      |
  | description          |                           |
  | dns_nameservers      |                           |
  | dns_publish_fixed_ip | None                      |
  | enable_dhcp          | True                      |
  | gateway_ip           | 172.16.0.1                |

[Yahoo-eng-team] [Bug 1961173] [NEW] [fullstack] test_vm_is_accessible_by_local_ip fails sometimes

2022-02-17 Thread Oleg Bondarev
Public bug reported:

Happens for (with_conntrack_rules) scenario. 
Examples:

- https://b4c71a9e78e49e1ca534-33cd363c3f72485dda255154bdda0fc8.ssl.cf1.rackcdn.com/829247/2/check/neutron-fullstack-with-uwsgi/cdc875c/testr_results.html
- https://1c11d883c451b6b39e08-76fe6537709af1be557ea31f3d630d58.ssl.cf5.rackcdn.com/829022/3/check/neutron-fullstack-with-uwsgi/0243e12/testr_results.html


Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", line 183, in func
    return f(self, *args, **kwargs)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/fullstack/test_local_ip.py", line 111, in test_vm_is_accessible_by_local_ip
    vms.ping_all()
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/fullstack/resources/machine.py", line 46, in ping_all
    vm_1.block_until_ping(vm_2.ip)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/common/machine_fixtures.py", line 67, in block_until_ping
    utils.wait_until_true(
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/common/utils.py", line 722, in wait_until_true
    raise exception
neutron.tests.common.machine_fixtures.FakeMachineException: No ICMP reply obtained from IP address 10.0.0.38

The test fails even before the Local IP is created, on the initial VM
connectivity check
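For reference, the `wait_until_true` helper from neutron/common/utils.py that raises here is essentially a poll-until-predicate loop. A simplified, hedged sketch of the pattern the test relies on (the real helper uses eventlet timeouts rather than wall-clock polling):

```python
import time


class TimeoutException(Exception):
    pass


def wait_until_true(predicate, timeout=60, sleep=1, exception=None):
    """Poll predicate until it returns True or the timeout expires,
    then raise the given exception (or a default one)."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise exception or TimeoutException("timed out waiting")
        time.sleep(sleep)


# block_until_ping() wraps this around a single-ping predicate:
wait_until_true(lambda: True)  # predicate already true, returns at once
```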

** Affects: neutron
 Importance: High
 Assignee: Oleg Bondarev (obondarev)
 Status: Confirmed


** Tags: gate-failure


-- 
https://bugs.launchpad.net/bugs/1961173

Title:
  [fullstack] test_vm_is_accessible_by_local_ip fails sometimes

Status in neutron:
  Confirmed

Bug description:
  Happens for (with_conntrack_rules) scenario.
  Examples:

  - https://b4c71a9e78e49e1ca534-33cd363c3f72485dda255154bdda0fc8.ssl.cf1.rackcdn.com/829247/2/check/neutron-fullstack-with-uwsgi/cdc875c/testr_results.html
  - https://1c11d883c451b6b39e08-76fe6537709af1be557ea31f3d630d58.ssl.cf5.rackcdn.com/829022/3/check/neutron-fullstack-with-uwsgi/0243e12/testr_results.html

  Traceback (most recent call last):
    File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", line 183, in func
      return f(self, *args, **kwargs)
    File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/fullstack/test_local_ip.py", line 111, in test_vm_is_accessible_by_local_ip
      vms.p

[Yahoo-eng-team] [Bug 1958627] Re: Incomplete ARP entries on L3 gw namespace

2022-01-25 Thread Oleg Bondarev
Seems more related to the neutron-dynamic-routing project.

** Tags added: l3-bgp

** Changed in: neutron
   Importance: Undecided => Medium

** Changed in: neutron
   Status: New => Opinion

-- 
https://bugs.launchpad.net/bugs/1958627

Title:
  Incomplete ARP entries on L3 gw namespace

Status in neutron:
  Opinion

Bug description:
  Setup information:

  Legacy l3 router + bgp + Public Address

  
  I see a lot of unnecessary ARP request traffic to all instances in the same network.

  
  Check the BGP speaker's advertised routes (fake addresses):

  openstack bgp speaker list advertised routes 3e533042-729a-4782-8b8d-x
  +---------------+------------+
  | Destination   | Nexthop    |
  +---------------+------------+
  | 99.99.99.0/24 | 99.99.99.1 |
  +---------------+------------+

  Check the L3 gw namespace ARP table (large count of incomplete ARP
  entries):

  ip netns exec qrouter-b524c5fc-dc91-41cb--ceded936xxx arp -ne
  Address        HWtype  HWaddress           Flags Mask  Iface
  99.99.99.92    ether   (incomplete)        C           qg-6a574a15-db
  99.99.99.96    ether   fa:16:3e:c6:85:28   C           qr-7a4cfad1-f7
  99.99.99.97    ether   fa:16:3e:b4:3d:28   C           qr-7a4cfad1-f7
  99.99.99.90    ether   fa:16:3e:83:1f:4b   C           qr-7a4cfad1-f7
  99.99.99.91            (incomplete)                    qr-7a4cfad1-f7
  99.99.99.98    ether   (incomplete)        C           qr-7a4cfad1-f7
  99.99.99.99    ether   fa:16:3e:c6:e6:fd   C           qr-7a4cfad1-f7
  99.99.99.94    ether   fa:16:3e:dc:34:74   C           qr-7a4cfad1-f7
  99.99.99.95            (incomplete)                    qr-7a4cfad1-f7
  99.99.99.92    ether   fa:16:3e:51:af:ef   C           qr-7a4cfad1-f7
  99.99.99.93            (incomplete)                    qr-7a4cfad1-f7
  99.99.99.98            (incomplete)                    qr-7a4cfad1-f7
  99.99.99.96            (incomplete)                    qr-7a4cfad1-f7
  ...

  
  Neutron adds all subnet IPs and tries ARPing them all the time.
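  The scale of the problem is easy to quantify from the namespace's ARP table; a hedged sketch (my own helper, not a neutron tool) that counts incomplete entries per interface in `arp -ne` output:

  ```python
  from collections import Counter


  def incomplete_per_iface(arp_output):
      """Count '(incomplete)' ARP entries per interface in `arp -ne`
      output; the interface name is the last whitespace field."""
      counts = Counter()
      for line in arp_output.splitlines():
          fields = line.split()
          if "(incomplete)" in fields:
              counts[fields[-1]] += 1
      return counts


  sample = """99.99.99.92    ether   (incomplete)        C   qg-6a574a15-db
  99.99.99.96    ether   fa:16:3e:c6:85:28   C   qr-7a4cfad1-f7
  99.99.99.91            (incomplete)            qr-7a4cfad1-f7"""
  print(incomplete_per_iface(sample))  # one incomplete entry on each interface
  ```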

  
  Wallaby release

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1958627/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1958128] Re: Neutron l3 agent keeps restarting (Ubuntu)

2022-01-24 Thread Oleg Bondarev
Marking "Invalid" for neutron based on Brian's last comment

** Changed in: neutron
   Status: New => Invalid

-- 
https://bugs.launchpad.net/bugs/1958128

Title:
  Neutron l3 agent keeps restarting (Ubuntu)

Status in neutron:
  Invalid

Bug description:
  After following neutron install guide, when trying to create a
  floating IP, the request succeeded, but the floating IP never became
  reachable.

  Looking at neutron-l3-agent status, I could see that it was restarting
  every 2 seconds, failing with an exception of 'file not found
  /etc/neutron/fwaas_driver.ini'.

  As a temporary fix, I touched the file to create an empty one, and the
  service started without any errors and the floating IP started
  working.

  My configuration is exactly the one provided in the install guide, I
  didn't change anything.

  Maybe the documentation should contain a step to avoid this issue?


  - [ ] This doc is inaccurate in this way: __
  - [x] This is a doc addition request.
  - [ ] I have a fix to the document that I can paste below including example: 
input and output. 

  If you have a troubleshooting or support issue, use the following
  resources:

   - The mailing list: https://lists.openstack.org
   - IRC: 'openstack' channel on OFTC

  ---
  Release: 19.1.1.dev10 on 2019-08-21 16:09:09
  SHA: d202e323d7f03edc56add8e83aeb9cddbbbce895
  Source: https://opendev.org/openstack/neutron/src/doc/source/install/controller-install-ubuntu.rst
  URL: https://docs.openstack.org/neutron/xena/install/controller-install-ubuntu.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1958128/+subscriptions




[Yahoo-eng-team] [Bug 1948676] Re: rpc response timeout for agent report_state is not possible

2021-10-25 Thread Oleg Bondarev
Did you investigate "This has the side effect that if a rabbitmq or
neutron-server is restarted all agents that is currently reporting there
will hang for a long time until report_state times out"? Is it expected
behavior from messaging side?

** Changed in: neutron
   Status: In Progress => Opinion

-- 
https://bugs.launchpad.net/bugs/1948676

Title:
  rpc response timeout for agent report_state is not possible

Status in neutron:
  Opinion

Bug description:
  When hosting a large number of routers and/or networks, the RPC calls
  from the agents can take a long time, which requires us to increase
  rpc_response_timeout from the default of 60 seconds to a higher value
  so the agents don't time out.

  This has the side effect that if rabbitmq or neutron-server is
  restarted, all agents that are currently reporting will hang for a
  long time until report_state times out; during this time neutron-
  server has not received any reports, causing it to mark the agents as down.

  When it times out and tries again, the reporting will succeed, but a
  full sync will be triggered for all agents that were previously dead.
  This in itself can cause very high load on the control plane.

  Consider the case where a configuration change is deployed using
  tooling to all neutron-server nodes, which are then restarted: all
  agents will die, and when they either 1) come back after
  rpc_response_timeout is reached and try again or 2) are restarted
  manually, all of them will do a full sync.

  We should have a configuration option that applies only to the RPC
  timeout for the report_state call from agents, because that could be
  lowered to stay within the bounds of the agent not being seen as
  down.

  The old behavior can be kept by simply falling back to
  rpc_response_timeout by default, instead of introducing a new default
  in this override.
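  A minimal sketch of the proposed fallback semantics (the option and helper names are hypothetical, not actual neutron configuration):

  ```python
  def effective_report_state_timeout(report_state_timeout, rpc_response_timeout):
      """Return the timeout to use for the report_state RPC call.

      A dedicated override wins when set; otherwise the old behavior is
      preserved by falling back to the general rpc_response_timeout.
      """
      if report_state_timeout is not None:
          return report_state_timeout
      return rpc_response_timeout


  # With no override, behavior is unchanged:
  print(effective_report_state_timeout(None, 600))  # -> 600
  # With an override, report_state can time out well before the
  # agent would be marked down:
  print(effective_report_state_timeout(30, 600))    # -> 30
  ```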

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1948676/+subscriptions




[Yahoo-eng-team] [Bug 1939125] Re: Incorrect Auto schedule new network segments notification listener

2021-08-06 Thread Oleg Bondarev
** Also affects: neutron/stein
   Importance: Undecided
   Status: New

** Also affects: neutron/queens
   Importance: Undecided
   Status: New

** Also affects: neutron/rocky
   Importance: Undecided
   Status: New

** Changed in: neutron/queens
   Status: New => Triaged

** Changed in: neutron/rocky
   Status: New => Triaged

** Changed in: neutron/stein
   Status: New => Triaged

** Changed in: neutron/queens
   Importance: Undecided => Medium

** Changed in: neutron/stein
   Importance: Undecided => Medium

** Changed in: neutron/rocky
   Importance: Undecided => Medium

** Changed in: neutron
   Importance: Undecided => Medium

-- 
https://bugs.launchpad.net/bugs/1939125

Title:
  Incorrect Auto schedule new network segments notification listener

Status in neutron:
  New
Status in neutron queens series:
  Triaged
Status in neutron rocky series:
  Triaged
Status in neutron stein series:
  Triaged

Bug description:
  auto_schedule_new_network_segments(), added in
  Ic9e64aa4ecdc3d56f00c26204ad931b810db7599, uses the new payload-style
  notification listener in old stable branches of Neutron that still use
  the old notify syntax.

  The following branches are affected: stable/stein, stable/rocky,
  stable/queens

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1939125/+subscriptions




[Yahoo-eng-team] [Bug 1938788] Re: Validate if fixed_ip given for port isn't the same as subnet's gateway_ip

2021-08-05 Thread Oleg Bondarev
** Changed in: neutron
   Status: In Progress => Opinion

-- 
https://bugs.launchpad.net/bugs/1938788

Title:
  Validate if fixed_ip given for port isn't the same as subnet's
  gateway_ip

Status in neutron:
  Opinion

Bug description:
  Currently, when a new port is created with a fixed_ip given, neutron
  does not validate that the fixed_ip address isn't the same as the
  subnet's gateway IP. That may cause problems, e.g.:

  $ openstack subnet show 
  | allocation_pools  | 10.0.0.2-10.0.0.254
  | cidr  | 10.0.0.0/24   
  | enable_dhcp   | True  
  ...
  | gateway_ip| 10.0.0.1  

  
  $ nova boot --flavor test --image test --nic net-id=,v4-fixed-ip=10.0.0.1 test-vm1

  The instance will be created successfully, but after that, network
  communication issues can occur due to the gateway IP conflict.

  So Neutron should forbid creating a port with the gateway's IP
  address if it is not a router port (device_owner isn't set to one of
  the router device owners).
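  The proposed check can be sketched roughly as follows (a hedged illustration only: the helper is hypothetical, and neutron keeps the real router owner list in neutron-lib constants rather than this inline set):

  ```python
  import ipaddress

  # Assumed subset of router device owners; illustrative, not exhaustive.
  ROUTER_DEVICE_OWNERS = {
      "network:router_interface",
      "network:router_gateway",
  }


  def validate_fixed_ip(fixed_ip, gateway_ip, device_owner=""):
      """Reject a fixed IP that collides with the subnet's gateway,
      unless the port is a router port that legitimately owns it."""
      if gateway_ip is None or device_owner in ROUTER_DEVICE_OWNERS:
          return
      if ipaddress.ip_address(fixed_ip) == ipaddress.ip_address(gateway_ip):
          raise ValueError(
              f"fixed IP {fixed_ip} is the subnet gateway and "
              f"device_owner {device_owner!r} is not a router owner"
          )


  validate_fixed_ip("10.0.0.5", "10.0.0.1")              # OK: not the gateway
  validate_fixed_ip("10.0.0.1", "10.0.0.1",
                    device_owner="network:router_interface")  # OK: router port
  try:
      validate_fixed_ip("10.0.0.1", "10.0.0.1")          # rejected
  except ValueError as exc:
      print(exc)
  ```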

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1938788/+subscriptions




[Yahoo-eng-team] [Bug 1938913] Re: Install and configure compute node in Neutron

2021-08-04 Thread Oleg Bondarev
From the log it's absolutely impossible to figure out what's wrong.
Anyway, it's definitely not a Neutron issue.

** Changed in: neutron
   Status: New => Invalid

-- 
https://bugs.launchpad.net/bugs/1938913

Title:
  Install and configure compute node in Neutron

Status in neutron:
  Invalid

Bug description:
  After following the steps to install and configure neutron in the
  compute node, the nova service is not starting:

  sudo service nova-compute status
  ● nova-compute.service - OpenStack Compute
       Loaded: loaded (/lib/systemd/system/nova-compute.service; enabled; vendor preset: enabled)
       Active: failed (Result: exit-code) since Wed 2021-08-04 12:04:06 -03; 32s ago
      Process: 71167 ExecStart=/etc/init.d/nova-compute systemd-start (code=exited, status=1/FAILURE)
     Main PID: 71167 (code=exited, status=1/FAILURE)

  ago 04 12:04:06 openstack-compute systemd[1]: nova-compute.service: Scheduled restart job, restart coun>
  ago 04 12:04:06 openstack-compute systemd[1]: Stopped OpenStack Compute.
  ago 04 12:04:06 openstack-compute systemd[1]: nova-compute.service: Start request repeated too quickly.
  ago 04 12:04:06 openstack-compute systemd[1]: nova-compute.service: Failed with result 'exit-code'.
  ago 04 12:04:06 openstack-compute systemd[1]: Failed to start OpenStack Compute.

  Here is the log:
  2021-08-04 12:04:05.528 71167 INFO os_vif [-] Loaded VIF plugins: linux_bridge, noop, ovs
  2021-08-04 12:04:05.573 71167 CRITICAL nova [req-558c7b67-0fd7-4430-882b-e1a398d4ec4c - - - - -] Unhandled error: TypeError: argument of type 'NoneType' is not iterable
  2021-08-04 12:04:05.573 71167 ERROR nova Traceback (most recent call last):
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/bin/nova-compute", line 10, in <module>
  2021-08-04 12:04:05.573 71167 ERROR nova     sys.exit(main())
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/nova/cmd/compute.py", line 58, in main
  2021-08-04 12:04:05.573 71167 ERROR nova     server = service.Service.create(binary='nova-compute',
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/nova/service.py", line 252, in create
  2021-08-04 12:04:05.573 71167 ERROR nova     service_obj = cls(host, binary, topic, manager,
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/nova/service.py", line 115, in __init__
  2021-08-04 12:04:05.573 71167 ERROR nova     conductor_api.wait_until_ready(context.get_admin_context())
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/nova/conductor/api.py", line 67, in wait_until_ready
  2021-08-04 12:04:05.573 71167 ERROR nova     self.base_rpcapi.ping(context, '1.21 GigaWatts',
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/nova/baserpc.py", line 58, in ping
  2021-08-04 12:04:05.573 71167 ERROR nova     return cctxt.call(context, 'ping', arg=arg_p)
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 175, in call
  2021-08-04 12:04:05.573 71167 ERROR nova     self.transport._send(self.target, msg_ctxt, msg,
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 123, in _send
  2021-08-04 12:04:05.573 71167 ERROR nova     return self._driver.send(target, ctxt, message,
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 680, in send
  2021-08-04 12:04:05.573 71167 ERROR nova     return self._send(target, ctxt, message, wait_for_reply, timeout,
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 626, in _send
  2021-08-04 12:04:05.573 71167 ERROR nova     msg.update({'_reply_q': self._get_reply_q()})
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 607, in _get_reply_q
  2021-08-04 12:04:05.573 71167 ERROR nova     conn = self._get_connection(rpc_common.PURPOSE_LISTEN)
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 597, in _get_connection
  2021-08-04 12:04:05.573 71167 ERROR nova     return rpc_common.ConnectionContext(self._connection_pool,
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/common.py", line 425, in __init__
  2021-08-04 12:04:05.573 71167 ERROR nova     self.connection = connection_pool.create(purpose)
  2021-08-04 12:04:05.573 71167 ERROR nova   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/pool.py", line 146, in create
  2021-08-04 12:04:05.573 71167 ERROR nova     return 
[Yahoo-eng-team] [Bug 1938826] Re: Install and configure controller node in Neutron

2021-08-04 Thread Oleg Bondarev
Looks like your config file is missing required config values.
This is an issue of the installer.

Please file a bug to the installer project.

** Changed in: neutron
   Status: New => Invalid

-- 
https://bugs.launchpad.net/bugs/1938826

Title:
  Install and configure controller node in Neutron

Status in neutron:
  Invalid

Bug description:
  When restarting the neutron-linuxbridge-agent service, I am facing a problem:
  sudo service neutron-linuxbridge-agent status
  ● neutron-linuxbridge-agent.service - Openstack Neutron Linux Bridge Agent
       Loaded: loaded (/lib/systemd/system/neutron-linuxbridge-agent.service; enabled; vendor preset: enabled)
       Active: inactive (dead) since Tue 2021-08-03 16:04:01 -03; 79ms ago
      Process: 377517 ExecStartPre=/bin/mkdir -p /var/lock/neutron /var/log/neutron /var/lib/neutron (code=exited, status=0/SUCCESS)
      Process: 377518 ExecStartPre=/bin/chown neutron:neutron /var/lock/neutron /var/log/neutron /var/lib/neutron (code=exited, status=0/SUCCESS)
      Process: 377519 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
      Process: 377520 ExecStart=/etc/init.d/neutron-linuxbridge-agent systemd-start (code=exited, status=0/SUCCESS)
     Main PID: 377520 (code=exited, status=0/SUCCESS)

  ago 03 16:03:59 openstack-controller systemd[1]: Starting Openstack Neutron Linux Bridge Agent...
  ago 03 16:03:59 openstack-controller systemd[1]: Started Openstack Neutron Linux Bridge Agent.
  ago 03 16:04:00 openstack-controller sudo[377533]:  neutron : TTY=unknown ; PWD=/var/lib/neutron ; USER=root ; COMMAND=/usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf privsep-helper --config-file />
  ago 03 16:04:00 openstack-controller sudo[377533]: pam_unix(sudo:session): session opened for user root by (uid=0)
  ago 03 16:04:00 openstack-controller sudo[377533]: pam_unix(sudo:session): session closed for user root
  ago 03 16:04:01 openstack-controller systemd[1]: neutron-linuxbridge-agent.service: Succeeded.

  Same problem for the neutron-dhcp-agent service:
  sudo service neutron-dhcp-agent status
  ● neutron-dhcp-agent.service - OpenStack Neutron DHCP agent
       Loaded: loaded (/lib/systemd/system/neutron-dhcp-agent.service; enabled; vendor preset: enabled)
       Active: inactive (dead) since Tue 2021-08-03 16:22:18 -03; 6s ago
         Docs: man:neutron-dhcp-agent(1)
      Process: 384411 ExecStart=/etc/init.d/neutron-dhcp-agent systemd-start (code=exited, status=0/SUCCESS)
     Main PID: 384411 (code=exited, status=0/SUCCESS)

  ago 03 16:22:17 openstack-controller systemd[1]: Started OpenStack Neutron DHCP agent.
  ago 03 16:22:18 openstack-controller systemd[1]: neutron-dhcp-agent.service: Succeeded.

  
  Below is the log (/var/log/neutron/neutron-linuxbridge-agent.log):

  2021-08-02 15:27:07.843 56737 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Tunneling cannot be enabled without the local_ip bound to an interface on the host. Please>
  2021-08-02 15:27:09.601 56805 INFO neutron.common.config [-] Logging enabled!
  2021-08-02 15:27:09.601 56805 INFO neutron.common.config [-] /usr/bin/neutron-linuxbridge-agent version 18.0.0
  2021-08-02 15:27:09.601 56805 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Interface mappings: {}
  2021-08-02 15:27:09.601 56805 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Bridge mappings: {}
  2021-08-02 15:27:09.602 56805 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Tunneling cannot be enabled without the local_ip bound to an interface on the host. Please>
  2021-08-02 15:27:11.321 56832 INFO neutron.common.config [-] Logging enabled!
  2021-08-02 15:27:11.322 56832 INFO neutron.common.config [-] /usr/bin/neutron-linuxbridge-agent version 18.0.0
  2021-08-02 15:27:11.322 56832 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Interface mappings: {}
  2021-08-02 15:27:11.322 56832 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Bridge mappings: {}
  2021-08-02 15:27:11.322 56832 ERROR neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Tunneling cannot be enabled without the local_ip bound to an interface on the host. Please>
  2021-08-03 15:48:49.813 372237 INFO neutron.common.config [-] Logging enabled!
  2021-08-03 15:48:49.814 372237 INFO neutron.common.config [-] /usr/bin/neutron-linuxbridge-agent version 18.0.0
  2021-08-03 15:48:49.814 372237 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Interface mappings: {'provider': 'enp0s31f6'}
  2021-08-03 15:48:49.814 372237 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Bridge mappings: {}

[Yahoo-eng-team] [Bug 1938685] [NEW] ofctl timeouts lead to dvr-ha-multinode-full failures

2021-08-02 Thread Oleg Bondarev
Public bug reported:

Recently the neutron-ovs-tempest-dvr-ha-multinode-full (non-voting) job
started failing often. A typical test failure is:

"Details: (ServersTestJSON:setUpClass) Server 74743462-a419-4f89-a92c-0e99bc185581 failed to reach ACTIVE status and task state "None" within the required time (196 s). Current status: BUILD. Current task state: spawning."

Looking at the logs I see that the reason is an ofctl timeout (300 sec)
that causes the OVS agent to not process the new port(s) in time:

Jul 30 17:33:42.946480 ubuntu-focal-inap-mtl01-0025709340 
neutron-openvswitch-agent[82746]: DEBUG 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None 
req-7ef93d36-7664-4072-b3d1-677a772a0fc1 None None] fdb_add received 
{{(pid=82746) fdb_add 
/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:841}}
Jul 30 17:37:46.516378 ubuntu-focal-inap-mtl01-0025709340 
neutron-openvswitch-agent[82746]: ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch [None 
req-9d8a3325-2d80-41a4-9f3d-184b365b7dfc None None] ofctl request 
version=0x4,msg_type=0xe,msg_len=None,xid=0xdfcb3e13,OFPFlowMod(buffer_id=4294967295,command=0,cookie=7439791576028281136,cookie_mask=0,flags=0,hard_timeout=0,idle_timeout=0,instructions=[OFPInstructionActions(actions=[OFPActionPopVlan(len=8,type=18),
 
OFPActionOutput(len=16,max_len=0,port=-1,type=0)],type=4)],match=OFPMatch(oxm_fields={'eth_dst':
 'fa:16:3e:0f:58:bc', 'vlan_vid': 
4113}),out_group=0,out_port=0,priority=20,table_id=60) timed out: 
eventlet.timeout.Timeout: 300 seconds
Jul 30 17:37:46.530852 ubuntu-focal-inap-mtl01-0025709340 
neutron-openvswitch-agent[82746]: ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None 
req-9d8a3325-2d80-41a4-9f3d-184b365b7dfc None None] Error while processing VIF 
ports: RuntimeError: ofctl request 
version=0x4,msg_type=0xe,msg_len=None,xid=0xdfcb3e13,OFPFlowMod(buffer_id=4294967295,command=0,cookie=7439791576028281136,cookie_mask=0,flags=0,hard_timeout=0,idle_timeout=0,instructions=[OFPInstructionActions(actions=[OFPActionPopVlan(len=8,type=18),
 
OFPActionOutput(len=16,max_len=0,port=-1,type=0)],type=4)],match=OFPMatch(oxm_fields={'eth_dst':
 'fa:16:3e:0f:58:bc', 'vlan_vid': 
4113}),out_group=0,out_port=0,priority=20,table_id=60) timed out
Jul 30 17:37:46.530852 ubuntu-focal-inap-mtl01-0025709340 
neutron-openvswitch-agent[82746]: ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most 
recent call last):
Jul 30 17:37:46.530852 ubuntu-focal-inap-mtl01-0025709340 
neutron-openvswitch-agent[82746]: ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.py",
 line 91, in _send_msg
...

Not sure why ofctl times out, but in any case the default of 300 seconds seems too long.
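
The log above shows the agent's OpenFlow request guarded by a timeout
(eventlet.timeout.Timeout: 300 seconds). As a rough, stdlib-only sketch of
that pattern — the real agent uses eventlet, and the function and parameter
names here are assumptions, not neutron's API:

```python
import concurrent.futures

def send_msg_with_timeout(send, msg, timeout=10):
    # Run the (potentially blocking) OpenFlow request in a worker thread
    # and convert a hang into a RuntimeError once `timeout` seconds pass,
    # roughly what the agent's ofctl path does with eventlet.Timeout.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as ex:
        fut = ex.submit(send, msg)
        try:
            return fut.result(timeout=timeout)
        except concurrent.futures.TimeoutError:
            raise RuntimeError("ofctl request %r timed out after %s seconds"
                               % (msg, timeout))
```

A much smaller default than 300 seconds would surface a stuck request long
before tempest gives up waiting for the server to become ACTIVE.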

** Affects: neutron
 Importance: High
 Status: Triaged


** Tags: gate-failure


[Yahoo-eng-team] [Bug 1933234] [NEW] [Fullstack] TestLegacyL3Agent.test_mtu_update fails sometimes

2021-06-22 Thread Oleg Bondarev
Public bug reported:

Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", 
line 183, in func
return f(self, *args, **kwargs)
  File 
"/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/fullstack/test_l3_agent.py",
 line 322, in test_mtu_update
common_utils.wait_until_true(lambda: ri_dev.link.mtu == mtu)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/common/utils.py", 
line 707, in wait_until_true
raise WaitTimeout(_("Timed out after %d seconds") % timeout)
neutron.common.utils.WaitTimeout: Timed out after 60 seconds

example:
https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c93/674044/13/check
/neutron-fullstack-with-uwsgi/c9334b7/testr_results.html

So router interface device MTU is not updated after network MTU update.
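
For reference, the helper that times out in the traceback above polls a
predicate until it becomes true. A minimal stdlib sketch of the same idea
(simplified; the original in neutron.common.utils is eventlet-friendly):

```python
import time

class WaitTimeout(Exception):
    """Raised when the condition does not become true in time."""

def wait_until_true(predicate, timeout=60, sleep=1):
    # Poll `predicate` every `sleep` seconds and give up after `timeout`
    # seconds, mirroring the wait_until_true used by the fullstack test.
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() > deadline:
            raise WaitTimeout("Timed out after %d seconds" % timeout)
        time.sleep(sleep)
```

The test fails exactly when `ri_dev.link.mtu == mtu` never becomes true
within the 60-second window.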

** Affects: neutron
 Importance: Undecided
 Status: New


** Tags: gate-failure

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1933234

Title:
  [Fullstack] TestLegacyL3Agent.test_mtu_update fails sometimes

Status in neutron:
  New

Bug description:
  Traceback (most recent call last):
File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", 
line 183, in func
  return f(self, *args, **kwargs)
File 
"/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/fullstack/test_l3_agent.py",
 line 322, in test_mtu_update
  common_utils.wait_until_true(lambda: ri_dev.link.mtu == mtu)
File 
"/home/zuul/src/opendev.org/openstack/neutron/neutron/common/utils.py", line 
707, in wait_until_true
  raise WaitTimeout(_("Timed out after %d seconds") % timeout)
  neutron.common.utils.WaitTimeout: Timed out after 60 seconds

  example:
  
https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c93/674044/13/check
  /neutron-fullstack-with-uwsgi/c9334b7/testr_results.html

  So router interface device MTU is not updated after network MTU
  update.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1933234/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1930401] Re: Fullstack l3 agent tests failing due to timeout waiting until port is active

2021-06-02 Thread Oleg Bondarev
** Also affects: oslo.privsep
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1930401

Title:
  Fullstack l3 agent tests failing due to timeout waiting until port is
  active

Status in neutron:
  Confirmed
Status in oslo.privsep:
  New

Bug description:
  Many fullstack L3 agent related tests have been failing recently, and the
  common thing for many of them is that they fail while waiting for the
  port status to become ACTIVE, e.g.:

  
https://9cec50bd524f94a2df4c-c6273b9a7cf594e42eb2c4e7f818.ssl.cf5.rackcdn.com/791365/6/check/neutron-fullstack-with-uwsgi/6fc0704/testr_results.html
  
https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_73b/793141/2/check/neutron-fullstack-with-uwsgi/73b08ae/testr_results.html
  
https://b87ba208d44b7f1356ad-f27c11edabee52a7804784593cf2712d.ssl.cf5.rackcdn.com/791365/5/check/neutron-fullstack-with-uwsgi/634ccb1/testr_results.html
  
https://dd43e0f9601da5e2e650-51b18fcc89837fbadd0245724df9c686.ssl.cf1.rackcdn.com/791365/6/check/neutron-fullstack-with-uwsgi/5413cd9/testr_results.html
  
https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_8d0/791365/5/check/neutron-fullstack-with-uwsgi/8d024fb/testr_results.html
  
https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_188/791365/5/check/neutron-fullstack-with-uwsgi/188aa48/testr_results.html
  
https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_9a3/792998/2/check/neutron-fullstack-with-uwsgi/9a3b5a2/testr_results.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1930401/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1928299] Re: centos7 train vm live migration stops network on vm for some minutes

2021-05-13 Thread Oleg Bondarev
** Also affects: neutron/train
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1928299

Title:
  centos7 train vm live migration stops network on vm for some minutes

Status in neutron:
  New
Status in neutron train series:
  New

Bug description:
  Hello, I have upgraded my CentOS 7 OpenStack installation from Stein to Train.
  On Train I am facing an issue with live migration:
  when a VM is migrated from one KVM node to another, it stops responding to
ping requests for some minutes.
  I had the same issue on Stein and I resolved it with a workaround suggested
by Sean Mooney where legacy port binding was used.

  On Train it seems there are no backported patches to solve the issue.

  I enabled the debug option on neutron and here is the dhcp-agent.log from
the exact time when the live migration started:
  http://paste.openstack.org/show/805325/

  Here is the openvswitch-agent log from the source KVM node:

  http://paste.openstack.org/show/805327/

  Here is the openvswitch-agent log from the destination KVM node:

  http://paste.openstack.org/show/805329/


  I am using the openvswitch mechanism driver and the iptables_hybrid firewall driver.

  Any help will be appreciated.
  Ignazio

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1928299/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1923470] [NEW] test_security_group_recreated_on_port_update fails in CI

2021-04-12 Thread Oleg Bondarev
Public bug reported:

The neutron-tempest-plugin-api job started failing on
test_security_group_recreated_on_port_update:

Traceback (most recent call last):
  File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/api/admin/test_security_groups.py",
 line 43, in test_security_group_recreated_on_port_update
self.assertIn('default', names)
  File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/testtools/testcase.py",
 line 421, in assertIn
self.assertThat(haystack, Contains(needle), message)
  File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/testtools/testcase.py",
 line 502, in assertThat
raise mismatch_error
testtools.matchers._impl.MismatchError: 'default' not in []

Seems the culprit is patch
https://review.opendev.org/c/openstack/neutron/+/777605.

** Affects: neutron
 Importance: Critical
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: gate-failure

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1923470

Title:
  test_security_group_recreated_on_port_update fails in CI

Status in neutron:
  New

Bug description:
  The neutron-tempest-plugin-api job started failing on
  test_security_group_recreated_on_port_update:

  Traceback (most recent call last):
File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/api/admin/test_security_groups.py",
 line 43, in test_security_group_recreated_on_port_update
  self.assertIn('default', names)
File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/testtools/testcase.py",
 line 421, in assertIn
  self.assertThat(haystack, Contains(needle), message)
File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/testtools/testcase.py",
 line 502, in assertThat
  raise mismatch_error
  testtools.matchers._impl.MismatchError: 'default' not in []

  Seems the culprit is patch
  https://review.opendev.org/c/openstack/neutron/+/777605.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1923470/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1923161] [NEW] DHCP notification could be optimized

2021-04-09 Thread Oleg Bondarev
Public bug reported:

DHCP notification is done after each create/update/delete for
network, subnet and port [1].

This notification currently has to retrieve the network from the DB each
time, which is quite a heavy DB request and hence affects the performance
of port and subnet CRUD [2].

Two proposals:
- do not fetch the network when it's not needed
- pass the network dict from the plugin

[1]
https://github.com/openstack/neutron/blob/bdd661d21898d573ef39448316860aa4c692b834/neutron/api/rpc/agentnotifiers/dhcp_rpc_agent_api.py#L111-L120

[2]
https://github.com/openstack/neutron/blob/bdd661d21898d573ef39448316860aa4c692b834/neutron/api/rpc/agentnotifiers/dhcp_rpc_agent_api.py#L200
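
A sketch of what the second proposal could look like: the notifier accepts a
pre-fetched network dict and falls back to the DB only when the caller did
not supply one. Class and method names below are illustrative, not neutron's
actual signatures:

```python
class DhcpAgentNotifyAPI:
    # Minimal illustration; `plugin` stands in for the core plugin.
    def __init__(self, plugin):
        self.plugin = plugin

    def _send_dhcp_notification(self, context, resource, network_id,
                                network=None):
        # Only hit the DB when the caller did not pass the network dict;
        # plugins that already have it loaded can hand it over for free.
        if network is None:
            network = self.plugin.get_network(context, network_id)
        return {'resource': resource, 'network': network}
```

With this shape, port/subnet CRUD paths that already hold the network avoid
one heavy query per operation.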

** Affects: neutron
 Importance: Wishlist
 Assignee: Oleg Bondarev (obondarev)
 Status: In Progress


** Tags: loadimpact

** Changed in: neutron
   Status: New => In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1923161

Title:
  DHCP notification could be optimized

Status in neutron:
  In Progress

Bug description:
  DHCP notification is done after each create/update/delete for
  network, subnet and port [1].

  This notification currently has to retrieve the network from the DB each
  time, which is quite a heavy DB request and hence affects the performance
  of port and subnet CRUD [2].

  Two proposals:
  - do not fetch the network when it's not needed
  - pass the network dict from the plugin

  [1]
  
https://github.com/openstack/neutron/blob/bdd661d21898d573ef39448316860aa4c692b834/neutron/api/rpc/agentnotifiers/dhcp_rpc_agent_api.py#L111-L120

  [2]
  
https://github.com/openstack/neutron/blob/bdd661d21898d573ef39448316860aa4c692b834/neutron/api/rpc/agentnotifiers/dhcp_rpc_agent_api.py#L200

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1923161/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1917866] [NEW] No need to fetch whole network object on port create

2021-03-05 Thread Oleg Bondarev
Public bug reported:

DB plugin checks for network existence during port create:
https://github.com/openstack/neutron/blob/cb64e3a19fdddb3eac593114a482c9dd69be68d5/neutron/db/db_base_plugin_v2.py#L1422

There is no need to fetch the whole net object (which leads to several
heavy DB requests according to OSProfiler stats) when we only need to
check that the net exists.
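
The shape of the cheaper check is an existence query that never materializes
the full row. sqlite3 is used below only to keep the illustration
self-contained; the real plugin would issue the equivalent SQLAlchemy
exists() query against the network model:

```python
import sqlite3

def network_exists(conn, network_id):
    # SELECT 1 ... LIMIT 1 returns as soon as one matching row is found,
    # without loading the row's columns or, in the real plugin's case,
    # any of the relationships a full get_network() would pull in.
    cur = conn.execute(
        "SELECT 1 FROM networks WHERE id = ? LIMIT 1", (network_id,))
    return cur.fetchone() is not None

# Toy schema standing in for the networks table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE networks (id TEXT PRIMARY KEY, payload TEXT)")
conn.execute("INSERT INTO networks VALUES ('net-1', 'lots of data')")
```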

** Affects: neutron
 Importance: Wishlist
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: db loadimpact

** Changed in: neutron
 Assignee: (unassigned) => Oleg Bondarev (obondarev)

** Changed in: neutron
   Importance: Undecided => Wishlist

** Tags added: loadimpact

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1917866

Title:
  No need to fetch whole network object on port create

Status in neutron:
  New

Bug description:
  DB plugin checks for network existence during port create:
  
https://github.com/openstack/neutron/blob/cb64e3a19fdddb3eac593114a482c9dd69be68d5/neutron/db/db_base_plugin_v2.py#L1422

  There is no need to fetch the whole net object (which leads to several
  heavy DB requests according to OSProfiler stats) when we only need to
  check that the net exists.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1917866/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1916618] Re: Neutron unit test for QoS driver fails in master branch

2021-02-23 Thread Oleg Bondarev
Please make sure you have the latest neutron-lib version (2.9.0)
installed in your env; this should fix the test.

** Changed in: neutron
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1916618

Title:
  Neutron unit test for QoS driver fails in master branch

Status in neutron:
  Invalid

Bug description:
  When running the unit tests on my own machine, one test related to QoS
  fails. However, this does not seem to happen on the Zuul gates. It also
  happens on other people's workstations.

  Step-by-step reproduction: 
  - Clone neutron repo
  - Run Python 3.8 unit tests with Tox. You can use:
  $ tox -e py38 
neutron.tests.unit.services.qos.drivers.test_manager.TestQoSDriversRulesValidations.test_validate_rule_for_network

  OUTPUT:

  
neutron.tests.unit.services.qos.drivers.test_manager.TestQoSDriversRulesValidations.test_validate_rule_for_network
  
--

  Captured traceback:
  ~~~
  Traceback (most recent call last):

File "/home/elvira/neutron/neutron/tests/base.py", line 182, in func
  return f(self, *args, **kwargs)

File 
"/home/elvira/neutron/neutron/tests/unit/services/qos/drivers/test_manager.py", 
line 141, in test_validate_rule_for_network
  self.assertTrue(driver_manager.validate_rule_for_network(

File "/home/elvira/neutron/neutron/services/qos/drivers/manager.py", 
line 160, in validate_rule_for_network
  driver.validate_rule_for_network(context, rule,

  AttributeError: 'QoSDriver' object has no attribute
  'validate_rule_for_network'

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1916618/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1916428] Re: dibbler tool for dhcpv6 is concluded

2021-02-21 Thread Oleg Bondarev
It's not an actual bug in Neutron, but the topic is worth a discussion.

** Changed in: neutron
   Status: New => Opinion

** Tags added: ipv6

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1916428

Title:
  dibbler tool for dhcpv6 is concluded

Status in neutron:
  Opinion

Bug description:
  Hi team,
  according to the latest announcement at
  https://github.com/tomaszmrugalski/dibbler, it seems the project has
  been concluded due to a lack of maintainers. I also found that this
  tool is used as the default DHCPv6 implementation.

  The author suggests https://gitlab.isc.org/isc-projects/kea. Is there
  any plan for this in the Neutron team?

  Thanks very much

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1916428/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1905726] Re: Qos plugin performs too many queries

2021-02-16 Thread Oleg Bondarev
** Changed in: neutron
   Status: In Progress => Fix Released

** Changed in: neutron
Milestone: None => wallaby-3

** Changed in: neutron
   Status: Fix Released => Fix Committed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1905726

Title:
  Qos plugin performs too many queries

Status in neutron:
  Fix Committed

Bug description:
  Whenever retrieving the port list while having the QoS plugin enabled,
  Neutron performs about 10 DB queries per port, most of them being QoS
  related: http://paste.openstack.org/raw/800461/

  For 1000 ports, we end up with 10 000 sequential DB queries. A simple
  "neutron port-list" or "nova list" command will exceed 1 minute, which
  is likely to hit timeouts.

  This seems to be the problem:
  
https://github.com/openstack/neutron/blob/17.0.0/neutron/db/db_base_plugin_v2.py#L1566-L1570

  For each of the retrieved ports, the plugins are then supposed to
  provide additional details, so for each port we get a certain number
  of extra queries.

  One idea would be to add a flag such as 'detailed' or
  'include_extensions' to 'get_ports' and then propagate it to
  '_make_port_dict' through the 'process_extensions' parameter. Another
  idea would be to let the plugins extend the query but that might be
  less feasible.

  Worth mentioning that there were a couple of commits meant to reduce the 
number of queries but it's still excessive:
  https://review.opendev.org/c/openstack/neutron/+/667998
  https://review.opendev.org/c/openstack/neutron/+/667981/
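
A sketch of the first idea from the report — a flag that lets callers skip
per-port extension processing. The names (`include_extensions`, the hook
list) are hypothetical, not the actual plugin API:

```python
def get_ports(db_ports, extension_hooks, include_extensions=True):
    # Hypothetical 'include_extensions' flag: when False, return the core
    # port dicts as-is and skip the per-port hooks, each of which issues
    # its own DB queries (QoS policy lookups etc.) in the real plugin.
    if not include_extensions:
        return list(db_ports)
    ports = []
    for port in db_ports:
        for hook in extension_hooks:
            port = hook(port)
        ports.append(port)
    return ports
```

Callers like a bare "neutron port-list" could then avoid the ~10 extra
queries per port entirely.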

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1905726/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1915271] [NEW] test_create_router_set_gateway_with_fixed_ip fails in dvr-ha job

2021-02-10 Thread Oleg Bondarev
/opt/stack/tempest/tempest/lib/services/network/networks_client.py", 
line 52, in delete_network
return self.delete_resource(uri)
  File "/opt/stack/tempest/tempest/lib/services/network/base.py", line 41, in 
delete_resource
resp, body = self.delete(req_uri)
  File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 331, in 
delete
return self.request('DELETE', url, extra_headers, headers, body)
  File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 704, in 
request
self._error_checker(resp, resp_body)
  File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 825, in 
_error_checker
raise exceptions.Conflict(resp_body, resp=resp)
tempest.lib.exceptions.Conflict: Conflict with state of target resource
Details: {'type': 'NetworkInUse', 'message': 'Unable to complete operation on 
network c06258a3-d817-4ffd-b9c6-1c20eaedd688. There are one or more ports still 
in use on the network.', 'detail': ''}
}}}

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 109, in 
wrapper
return func(*func_args, **func_kwargs)
  File "/opt/stack/tempest/tempest/api/network/admin/test_routers.py", line 
254, in test_create_router_set_gateway_with_fixed_ip
self.admin_routers_client.delete_router(router['id'])
  File "/opt/stack/tempest/tempest/lib/services/network/routers_client.py", 
line 52, in delete_router
return self.delete_resource(uri)
  File "/opt/stack/tempest/tempest/lib/services/network/base.py", line 41, in 
delete_resource
resp, body = self.delete(req_uri)
  File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 331, in 
delete
return self.request('DELETE', url, extra_headers, headers, body)
  File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 704, in 
request
self._error_checker(resp, resp_body)
  File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 880, in 
_error_checker
raise exceptions.ServerFault(resp_body, resp=resp,
tempest.lib.exceptions.ServerFault: Got server fault
Details: Request Failed: internal server error while processing your request.

** Affects: neutron
 Importance: High
 Assignee: Oleg Bondarev (obondarev)
 Status: Confirmed


** Tags: gate-failure l3-dvr-backlog

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1915271

Title:
  test_create_router_set_gateway_with_fixed_ip fails in dvr-ha job

Status in neutron:
  Confirmed

Bug description:
  test_create_router_set_gateway_with_fixed_ip periodically fails in 
neutron-tempest-dvr-ha-multinode-full.
  Failures started to happen after the new engine facade switch in the L3 code.
  The failure is due to the router failing to be deleted (neutron returns 500);
see the first line in the traceback below.

  Traceback:

  2021-01-28 13:36:26,139 81948 INFO [tempest.lib.common.rest_client] 
Request (RoutersAdminTest:test_create_router_set_gateway_with_fixed_ip): 500 
DELETE 
https://10.209.160.170:9696/v2.0/routers/3f337e0e-3bed-44f7-a8f9-e43c01787445 
10.779s
  2021-01-28 13:36:26,139 81948 DEBUG[tempest.lib.common.rest_client] 
Request - Headers: {'Content-Type': 'application/json', 'Accept': 
'application/json', 'X-Auth-Token': ''}
  Body: None
  Response - Headers: {'date': 'Thu, 28 Jan 2021 13:36:26 GMT', 'server': 
'Apache/2.4.41 (Ubuntu)', 'content-type': 'application/json', 'content-length': 
'150', 'x-openstack-request-id': 'req-2392442f-a3b0-4c79-a79c-c765c7cab834', 
'connection': 'close', 'status': '500', 'content-location': 
'https://10.209.160.170:9696/v2.0/routers/3f337e0e-3bed-44f7-a8f9-e43c01787445'}
  Body: b'{"NeutronError": {"type": "HTTPInternalServerError", 
"message": "Request Failed: internal server error while processing your 
request.", "detail": ""}}'
  2021-01-28 13:36:26,305 81948 INFO [tempest.lib.common.rest_client] 
Request (RoutersAdminTest:_run_cleanups): 409 DELETE 
https://10.209.160.170:9696/v2.0/subnets/812a9855-15a2-4a8e-b246-c6ea68cdadcd 
0.165s
  2021-01-28 13:36:26,306 81948 DEBUG[tempest.lib.common.rest_client] 
Request - Headers: {'Content-Type': 'application/json', 'Accept': 
'application/json', 'X-Auth-Token': ''}
  Body: None
  Response - Headers: {'date': 'Thu, 28 Jan 2021 13:36:26 GMT', 'server': 
'Apache/2.4.41 (Ubuntu)', 'content-type': 'application/json', 'content-length': 
'204', 'x-openstack-request-id': 'req-9eb0af8b-9154-489b-94b8-93d3da458c05', 
'connection': 'close', 'status': '409', 'content-location': 
'https://10.209.160.170:9696/v2.0/subnets/812a9855-15a2-4a8e-b246-c6ea68cdadcd'}
  Body: b'{"NeutronError": {"type": "SubnetInUse", "message": "Unable 
to

[Yahoo-eng-team] [Bug 1905552] Re: neutron-fwaas netlink conntrack driver would catch error while conntrack rules protocol is 'unknown'

2020-11-25 Thread Oleg Bondarev
Neutron-fwaas development is stopped:
https://review.opendev.org/c/openstack/governance/+/735828/

** Changed in: neutron
   Status: New => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1905552

Title:
  neutron-fwaas netlink conntrack driver would catch error while
  conntrack rules protocol is 'unknown'

Status in neutron:
  Won't Fix

Bug description:
  2020-11-25 11:07:32.606 127 DEBUG oslo_concurrency.lockutils 
[req-ab14782d-80b1-43f6-8d1b-2874531aca5e - 9d40b483f885496896d81c487f420438 - 
- -] Releasing semaphore 
"iptables-qrouter-9e18395d-961d-46b3-a0e9-4c6a94c32baf" lock 
/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:228
  2020-11-25 11:07:32.609 127 ERROR 
neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2
 [req-ab14782d-80b1-43f6-8d1b-2874531aca5e - 9d40b483f885496896d81c487f420438 - 
- -] Failed to update firewall: daedc38a-04ee-4818-b7a6-3d8311d7fc30: KeyError: 
'unknown'
  2020-11-25 11:07:32.609 127 ERROR 
neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2
 Traceback (most recent call last):
  2020-11-25 11:07:32.609 127 ERROR 
neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2
   File 
"/var/lib/kolla/venv/lib/python2.7/site-packages/neutron_fwaas/services/firewall/service_drivers/agents/drivers/linux/iptables_fwaas_v2.py",
 line 144, in update_firewall_group
  2020-11-25 11:07:32.609 127 ERROR 
neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2
 apply_list, self.pre_firewall, firewall)
  2020-11-25 11:07:32.609 127 ERROR 
neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2
   File 
"/var/lib/kolla/venv/lib/python2.7/site-packages/neutron_fwaas/services/firewall/service_drivers/agents/drivers/linux/iptables_fwaas_v2.py",
 line 327, in _remove_conntrack_updated_firewall
  2020-11-25 11:07:32.609 127 ERROR 
neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2
 ipt_mgr.namespace)
  2020-11-25 11:07:32.609 127 ERROR 
neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2
   File 
"/var/lib/kolla/venv/lib/python2.7/site-packages/neutron_fwaas/services/firewall/service_drivers/agents/drivers/linux/netlink_conntrack.py",
 line 41, in delete_entries
  2020-11-25 11:07:32.609 127 ERROR 
neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2
 entries = nl_lib.list_entries(namespace)
  2020-11-25 11:07:32.609 127 ERROR 
neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2
   File 
"/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_privsep/priv_context.py", 
line 207, in _wrap
  2020-11-25 11:07:32.609 127 ERROR 
neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2
 return self.channel.remote_call(name, args, kwargs)
  2020-11-25 11:07:32.609 127 ERROR 
neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2
   File 
"/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_privsep/daemon.py", line 
202, in remote_call
  2020-11-25 11:07:32.609 127 ERROR 
neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2
 raise exc_type(*result[2])
  2020-11-25 11:07:32.609 127 ERROR 
neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2
 KeyError: 'unknown'
  2020-11-25 11:07:32.609 127 ERROR 
neutron_fwaas.services.firewall.service_drivers.agents.drivers.linux.iptables_fwaas_v2

  This error appears when neutron-fwaas v2 is configured with the
netlink_conntrack driver in fwaas_driver.ini:
  vim /etc/kolla/neutron-l3-agent/fwaas_driver.ini 
 [fwaas]
 enabled = True
 agent_version = v2
 driver = iptables_v2
 conntrack_driver = netlink_conntrack

  And the conntrack list has 'unknown' rules, example below:
  unknown  2 597 src=169.254.192.2 dst=224.0.0.22 [UNREPLIED] src=224.0.0.22 
dst=169.254.192.2 mark=0 use=1
  unknown  112 598 src=169.254.192.2 dst=224.0.0.18 [UNREPLIED] src=224.0.0.18 
dst=169.254.192.2 mark=0 use=1

  This may interrupt the conntrack refresh when firewall rules are updated.
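  The KeyError above suggests the driver resolves conntrack protocol names by
indexing a protocol map directly. A minimal sketch (the map and entry layout
below are hypothetical, not the actual neutron-fwaas structures) of a tolerant
parse that simply skips unrecognized protocols such as 'unknown':

```python
# Defensive parsing sketch: skip conntrack entries whose protocol name is not
# in the known-protocol map instead of letting a direct dict lookup raise
# KeyError: 'unknown'.
IP_PROTOCOL_MAP = {'tcp': 6, 'udp': 17, 'icmp': 1}  # illustrative subset

def list_entries(raw_entries):
    entries = []
    for entry in raw_entries:
        proto_num = IP_PROTOCOL_MAP.get(entry['proto'])
        if proto_num is None:
            # e.g. the 'unknown' multicast rows (dst=224.0.0.22) shown above
            continue
        entries.append((proto_num, entry['src'], entry['dst']))
    return entries

raw = [
    {'proto': 'tcp', 'src': '10.0.0.5', 'dst': '10.0.0.6'},
    {'proto': 'unknown', 'src': '169.254.192.2', 'dst': '224.0.0.22'},
]
print(list_entries(raw))  # [(6, '10.0.0.5', '10.0.0.6')]
```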

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1905552/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1905538] [NEW] Some OVS bridges may lack OpenFlow10 protocol

2020-11-25 Thread Oleg Bondarev
Public bug reported:

After commit https://review.opendev.org/c/openstack/neutron/+/371455
OVSAgentBridge.setup_controllers() no longer sets the OpenFlow10 protocol for
the bridge; instead this was moved to ovs_lib.OVSBridge.create().
However, some (custom) OVS bridges may be created by nova/os-vif when plugging
a VM interface.
For such bridges neutron calls only setup_controllers(), not create() - as a
result they support only OpenFlow13 and the ovs-ofctl command fails:

2020-11-24T20:18:38Z|1|vconn|WARN|unix:/var/run/openvswitch/br01711489f-fe.24081.mgmt:
 version negotiation failed (we support version 0x01, peer supports version 
0x04)
ovs-ofctl: br01711489f-fe: failed to connect to socket (Broken pipe)

Fix: move the setting of OpenFlow10 (along with OpenFlow13) back to
setup_controllers(). It doesn't hurt even if the bridge already has
OpenFlow10 among its supported protocols.
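The safety claim can be illustrated with a small sketch (this mirrors only the
idea, not the actual ovs_lib API): treating the bridge's protocol list as a
set makes re-adding OpenFlow10 idempotent:

```python
# Why re-adding OpenFlow10 in setup_controllers() is harmless: the resulting
# protocol set is the same whether or not the bridge already listed it.
REQUIRED = {'OpenFlow10', 'OpenFlow13'}

def ensure_protocols(current):
    return sorted(set(current) | REQUIRED)

# Bridge created via neutron's create(): already has both, unchanged.
print(ensure_protocols(['OpenFlow10', 'OpenFlow13']))
# Bridge created by nova/os-vif: only OpenFlow13, so OpenFlow10 is added.
print(ensure_protocols(['OpenFlow13']))
```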

** Affects: neutron
 Importance: Low
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: ovs ovs-lib

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1905538

Title:
  Some OVS bridges may lack OpenFlow10 protocol

Status in neutron:
  New

Bug description:
  After commit https://review.opendev.org/c/openstack/neutron/+/371455
OVSAgentBridge.setup_controllers() no longer sets the OpenFlow10 protocol for
the bridge; instead this was moved to ovs_lib.OVSBridge.create().
  However, some (custom) OVS bridges may be created by nova/os-vif when
plugging a VM interface.
  For such bridges neutron calls only setup_controllers(), not create() - as a
result they support only OpenFlow13 and the ovs-ofctl command fails:

  
2020-11-24T20:18:38Z|1|vconn|WARN|unix:/var/run/openvswitch/br01711489f-fe.24081.mgmt:
 version negotiation failed (we support version 0x01, peer supports version 
0x04)
  ovs-ofctl: br01711489f-fe: failed to connect to socket (Broken pipe)

  Fix: move the setting of OpenFlow10 (along with OpenFlow13) back to
  setup_controllers(). It doesn't hurt even if the bridge already has
  OpenFlow10 among its supported protocols.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1905538/+subscriptions



[Yahoo-eng-team] [Bug 1905392] Re: xanax online cod overnight

2020-11-24 Thread Oleg Bondarev
** Changed in: neutron
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1905392

Title:
  xanax online cod overnight

Status in neutron:
  Invalid

Bug description:
  xanax online cod overnight

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1905392/+subscriptions



[Yahoo-eng-team] [Bug 1905268] [NEW] port list performance for trunks can be optimized

2020-11-23 Thread Oleg Bondarev
Public bug reported:

Use case: many trunk ports, each with many subports.
Problem: listing ports takes a long time.
Reason: for each port, the trunk extension adds a DB call to retrieve the
subports' MAC addresses.
Solution: retrieve the subport info once, when the full list of trunk ports
(and hence the full list of subport IDs) is available.
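The proposed change can be sketched as replacing an N+1 query pattern with a
single batched one (the data-access helpers and in-memory "DB" below are
hypothetical, for illustration only):

```python
# Contrast the current per-port lookup (one DB round trip per trunk) with the
# proposed single batched query over the full list of trunk IDs.
SUBPORT_MACS = {
    'trunk-1': {'sub-1': 'fa:16:3e:00:00:01', 'sub-2': 'fa:16:3e:00:00:02'},
    'trunk-2': {'sub-3': 'fa:16:3e:00:00:03'},
}
db_calls = 0

def get_subport_macs(trunk_id):          # slow path: called once per port
    global db_calls
    db_calls += 1
    return SUBPORT_MACS[trunk_id]

def get_subport_macs_bulk(trunk_ids):    # proposed path: one call in total
    global db_calls
    db_calls += 1
    return {t: SUBPORT_MACS[t] for t in trunk_ids}

for trunk in SUBPORT_MACS:               # N trunks -> N DB calls
    get_subport_macs(trunk)
print(db_calls)  # 2

db_calls = 0
get_subport_macs_bulk(list(SUBPORT_MACS))  # N trunks -> 1 DB call
print(db_calls)  # 1
```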

** Affects: neutron
 Importance: Wishlist
 Assignee: Oleg Bondarev (obondarev)
 Status: Confirmed


** Tags: loadimpact

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1905268

Title:
  port list performance for trunks can be optimized

Status in neutron:
  Confirmed

Bug description:
  Use case: many trunk ports, each with many subports.
  Problem: listing ports takes a long time.
  Reason: for each port, the trunk extension adds a DB call to retrieve the
subports' MAC addresses.
  Solution: retrieve the subport info once, when the full list of trunk ports
(and hence the full list of subport IDs) is available.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1905268/+subscriptions



[Yahoo-eng-team] [Bug 1902998] [NEW] tempest test_create_router_set_gateway_with_fixed_ip often fails with DVR

2020-11-04 Thread Oleg Bondarev
Public bug reported:

test_create_router_set_gateway_with_fixed_ip often fails in neutron-
tempest-dvr-ha-multinode-full:

traceback-1: {{{
Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/lib/common/utils/test_utils.py", line 87, in 
call_and_ignore_notfound_exc
return func(*args, **kwargs)
  File "/opt/stack/tempest/tempest/lib/services/network/networks_client.py", 
line 52, in delete_network
return self.delete_resource(uri)
  File "/opt/stack/tempest/tempest/lib/services/network/base.py", line 41, in 
delete_resource
resp, body = self.delete(req_uri)
  File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 329, in 
delete
return self.request('DELETE', url, extra_headers, headers, body)
  File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 702, in 
request
self._error_checker(resp, resp_body)
  File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 823, in 
_error_checker
raise exceptions.Conflict(resp_body, resp=resp)
tempest.lib.exceptions.Conflict: Conflict with state of target resource
Details: {'type': 'NetworkInUse', 'message': 'Unable to complete operation on 
network 40a63562-61e7-41b0-82c8-e076b8463584. There are one or more ports still 
in use on the network.', 'detail': ''}
}}}

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/lib/common/utils/test_utils.py", line 87, in 
call_and_ignore_notfound_exc
return func(*args, **kwargs)
  File "/opt/stack/tempest/tempest/lib/services/network/subnets_client.py", 
line 52, in delete_subnet
return self.delete_resource(uri)
  File "/opt/stack/tempest/tempest/lib/services/network/base.py", line 41, in 
delete_resource
resp, body = self.delete(req_uri)
  File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 329, in 
delete
return self.request('DELETE', url, extra_headers, headers, body)
  File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 702, in 
request
self._error_checker(resp, resp_body)
  File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 823, in 
_error_checker
raise exceptions.Conflict(resp_body, resp=resp)
tempest.lib.exceptions.Conflict: Conflict with state of target resource
Details: {'type': 'SubnetInUse', 'message': 'Unable to complete operation on 
subnet a1110e0b-d7c8-4830-b1df-e526b632aab9: One or more ports have an IP 
allocation from this subnet.', 'detail': ''}

** Affects: neutron
 Importance: Medium
 Assignee: Oleg Bondarev (obondarev)
 Status: In Progress


** Tags: l3-dvr-backlog

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1902998

Title:
  tempest test_create_router_set_gateway_with_fixed_ip often fails with
  DVR

Status in neutron:
  In Progress

Bug description:
  test_create_router_set_gateway_with_fixed_ip often fails in neutron-
  tempest-dvr-ha-multinode-full:

  traceback-1: {{{
  Traceback (most recent call last):
File "/opt/stack/tempest/tempest/lib/common/utils/test_utils.py", line 87, 
in call_and_ignore_notfound_exc
  return func(*args, **kwargs)
File "/opt/stack/tempest/tempest/lib/services/network/networks_client.py", 
line 52, in delete_network
  return self.delete_resource(uri)
File "/opt/stack/tempest/tempest/lib/services/network/base.py", line 41, in 
delete_resource
  resp, body = self.delete(req_uri)
File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 329, in 
delete
  return self.request('DELETE', url, extra_headers, headers, body)
File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 702, in 
request
  self._error_checker(resp, resp_body)
File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 823, in 
_error_checker
  raise exceptions.Conflict(resp_body, resp=resp)
  tempest.lib.exceptions.Conflict: Conflict with state of target resource
  Details: {'type': 'NetworkInUse', 'message': 'Unable to complete operation on 
network 40a63562-61e7-41b0-82c8-e076b8463584. There are one or more ports still 
in use on the network.', 'detail': ''}
  }}}

  Traceback (most recent call last):
File "/opt/stack/tempest/tempest/lib/common/utils/test_utils.py", line 87, 
in call_and_ignore_notfound_exc
  return func(*args, **kwargs)
File "/opt/stack/tempest/tempest/lib/services/network/subnets_client.py", 
line 52, in delete_subnet
  return self.delete_resource(uri)
File "/opt/stack/tempest/tempest/lib/services/network/base.py", line 41, in 
delete_resource
  resp, body = self.delete(req_uri)
File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 329, in 
delete
  return self.request('D

[Yahoo-eng-team] [Bug 1862315] [NEW] Sometimes VMs can't get IP when spawned concurrently

2020-02-07 Thread Oleg Bondarev
Public bug reported:

Version: Stein
Scenario description:
Rally creates 60 VMs with 6 threads. Each thread:
 - creates a VM
 - pings it
 - if ping succeeds, tries to reach the VM via ssh and execute a command,
retrying for 2 minutes.
 - if ssh succeeds, deletes the VM

For some VMs ping fails. Console log shows that VM failed to get IP from
DHCP.

tcpdump on corresponding DHCP port shows VM's DHCP requests, but dnsmasq does 
not reply.
From the dnsmasq logs:

Feb  6 00:15:43 dnsmasq[4175]: read 
/var/lib/neutron/dhcp/da73026e-09b9-4f8d-bbdd-84d89c2487b2/addn_hosts - 28 
addresses
Feb  6 00:15:43 dnsmasq[4175]: duplicate dhcp-host IP address 10.2.0.194 at 
line 28 of /var/lib/neutron/dhcp/da73026e-09b9-4f8d-bbdd-84d89c2487b2/host

So something must be wrong with the neutron-dhcp-agent network cache.

From the neutron-dhcp-agent log:

2020-02-06 00:15:20.282 40 DEBUG neutron.agent.dhcp.agent 
[req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] Resync event has been 
scheduled _periodic_resync_helper 
/var/lib/openstack/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py:276
2020-02-06 00:15:20.282 40 DEBUG neutron.common.utils 
[req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] Calling throttled function 
clear wrapper 
/var/lib/openstack/lib/python3.6/site-packages/neutron/common/utils.py:102
2020-02-06 00:15:20.283 40 DEBUG neutron.agent.dhcp.agent 
[req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] resync 
(da73026e-09b9-4f8d-bbdd-84d89c2487b2): ['Duplicate IP addresses found, DHCP 
cache is out of sync'] _periodic_resync_helper 
/var/lib/openstack/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py:293

so the agent is aware that the cache for this network is invalid, but for
some unknown reason the actual network resync happens only 8 minutes later:

2020-02-06 00:23:55.297 40 INFO neutron.agent.dhcp.agent [req-f5107bdd-
d53a-4171-a283-de3d7cf7c708 - - - - -] Synchronizing state

** Affects: neutron
 Importance: High
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: l3-ipam-dhcp

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1862315

Title:
  Sometimes VMs can't get IP when spawned concurrently

Status in neutron:
  New

Bug description:
  Version: Stein
  Scenario description:
  Rally creates 60 VMs with 6 threads. Each thread:
   - creates a VM
   - pings it
   - if ping succeeds, tries to reach the VM via ssh and execute a command,
retrying for 2 minutes.
   - if ssh succeeds, deletes the VM

  For some VMs ping fails. Console log shows that VM failed to get IP
  from DHCP.

  tcpdump on corresponding DHCP port shows VM's DHCP requests, but dnsmasq does 
not reply.
  From dnsmasq logs:

  Feb  6 00:15:43 dnsmasq[4175]: read 
/var/lib/neutron/dhcp/da73026e-09b9-4f8d-bbdd-84d89c2487b2/addn_hosts - 28 
addresses
  Feb  6 00:15:43 dnsmasq[4175]: duplicate dhcp-host IP address 10.2.0.194 at 
line 28 of /var/lib/neutron/dhcp/da73026e-09b9-4f8d-bbdd-84d89c2487b2/host

  So something must be wrong with the neutron-dhcp-agent network cache.

  From neutron-dhcp-agent log:

  2020-02-06 00:15:20.282 40 DEBUG neutron.agent.dhcp.agent 
[req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] Resync event has been 
scheduled _periodic_resync_helper 
/var/lib/openstack/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py:276
  2020-02-06 00:15:20.282 40 DEBUG neutron.common.utils 
[req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] Calling throttled function 
clear wrapper 
/var/lib/openstack/lib/python3.6/site-packages/neutron/common/utils.py:102
  2020-02-06 00:15:20.283 40 DEBUG neutron.agent.dhcp.agent 
[req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] resync 
(da73026e-09b9-4f8d-bbdd-84d89c2487b2): ['Duplicate IP addresses found, DHCP 
cache is out of sync'] _periodic_resync_helper 
/var/lib/openstack/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py:293

  so the agent is aware that the cache for this network is invalid, but for
  some unknown reason the actual network resync happens only 8 minutes later:

  2020-02-06 00:23:55.297 40 INFO neutron.agent.dhcp.agent [req-
  f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] Synchronizing state

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1862315/+subscriptions



[Yahoo-eng-team] [Bug 1860521] [NEW] L2 pop notifications are not reliable

2020-01-22 Thread Oleg Bondarev
Public bug reported:

Problem: lack of connectivity (e.g. vxlan tunnels, OVS flows) between
nodes/VMs in L2 segment due to partial RabbitMQ unavailability, RPC
message loss or agent failure on applying fdb entry updates.

Why: currently FDB entries are sent by the neutron server to L2 agents
one-way (no feedback), so an agent has no way to detect whether all
required tunnels/flows are built. The server, in turn, has no way to
detect whether all sent FDB entries were delivered and the required flows
applied. If some messages are lost, only an agent restart fixes the
resulting issues.

Way to address: a new synchronization mechanism on the L2 agent side that
periodically requests the network topology from the server, matches it
against the actual config applied on the node, and applies the missing parts.

Option 2: move from RPC fanouts and casts to RPC calls which guarantee
message delivery. Concerns: scalability, increased load on neutron
server.
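The agent-side synchronization described above can be sketched as a simple
diff-and-apply step (FDB state is reduced to a flat dict here; the real
topology is richer, so names and shapes are illustrative only):

```python
# Periodic reconcile: compare the server's view of FDB entries with what is
# applied locally and install only the missing pieces.
def reconcile(server_fdb, applied_fdb):
    missing = {port: info for port, info in server_fdb.items()
               if applied_fdb.get(port) != info}
    applied_fdb.update(missing)          # "apply" the missing flows/tunnels
    return missing

server = {'port-a': 'vxlan->host1', 'port-b': 'vxlan->host2'}
applied = {'port-a': 'vxlan->host1'}     # the port-b update was lost in transit
print(reconcile(server, applied))        # {'port-b': 'vxlan->host2'}
print(applied == server)                 # True
```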

** Affects: neutron
 Importance: Undecided
 Status: New


** Tags: rfe

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1860521

Title:
  L2 pop notifications are not reliable

Status in neutron:
  New

Bug description:
  Problem: lack of connectivity (e.g. vxlan tunnels, OVS flows) between
  nodes/VMs in L2 segment due to partial RabbitMQ unavailability, RPC
  message loss or agent failure on applying fdb entry updates.

  Why: currently FDB entries are sent by the neutron server to L2 agents
  one-way (no feedback), so an agent has no way to detect whether all
  required tunnels/flows are built. The server, in turn, has no way to
  detect whether all sent FDB entries were delivered and the required
  flows applied. If some messages are lost, only an agent restart fixes
  the resulting issues.

  Way to address: a new synchronization mechanism on the L2 agent side
  that periodically requests the network topology from the server,
  matches it against the actual config applied on the node, and applies
  the missing parts.

  Option 2: move from RPC fanouts and casts to RPC calls which guarantee
  message delivery. Concerns: scalability, increased load on neutron
  server.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1860521/+subscriptions



[Yahoo-eng-team] [Bug 1850639] Re: FloatingIP list bad performance

2019-11-12 Thread Oleg Bondarev
Ok, so it's not related to sqlalchemy; as I expected, it's an issue with the
neutron DB object, fixed in Rocky:
https://review.opendev.org/#/c/565358/

** Changed in: neutron
   Status: In Progress => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1850639

Title:
  FloatingIP list bad performance

Status in neutron:
  Invalid

Bug description:
  Faced on stable/queens but applicable to master too.
  On a quite heavily loaded environment it was noticed that a simple
floatingip list command takes significant time (~1200 FIPs), while for
example a port list is always faster (>7000 ports).
  With sqlalchemy debug logs enabled, lots of entries like the following can
be seen:

  2019-10-22 21:02:44,977.977 23957 DEBUG sqlalchemy.orm.path_registry 
[req-3db31d53-f6b9-408e-b8c7-bf037ef10a1b 1df8a7d5eb
  b5414b9e29cf581098681c 10479799101a4fe4ada17daa105707c5 - default default] 
set 'memoized_setups' on path 'EntityRegistry(
  (,))' to '{}' set 
/usr/lib/python2.7/dist-packages/sqlalchemy/orm/path_registry.
  py:63

  - which basically eats all the time of a request.

  As a test I commented out the 'dns' field in the FloatingIP DB object
definition and the response time dropped from 14 seconds to 1. The DNS
extension is not configured on the environment and no external DNS is used.
  Also, I don't see this field used anywhere in neutron.

  Interestingly, the Port DB object has a 'dns' field as well (with a
  corresponding portdnses table in the DB, all the same as done for
  floatingips); however, the DB object is not used when listing ports.

  The proposal would be to remove the 'dns' field from the FloatingIP OVO
  as unused, until we find the performance bottleneck.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1850639/+subscriptions



[Yahoo-eng-team] [Bug 1851609] [NEW] Add an option for graceful l3 agent shutdown

2019-11-07 Thread Oleg Bondarev
Public bug reported:

If KillMode in systemd config of a neutron l3 agent service is set to
'process' - it will not kill child processes on main service stop - this
is useful when we don't want data-plane downtime on agent stop/restart
due to keepalived exit.

However, in some cases graceful cleanup on l3 agent shutdown is needed -
for example with a containerised control plane: when kubernetes kills the
l3-agent pod, it automatically kills its children (keepalived processes)
in a non-graceful way, so keepalived does not clear its VIPs. This leads
to a situation where the same VIP is present on different nodes, and
hence to long downtime.

The proposal is to add a new l3 agent config so that it handles stop
(SIGTERM) by deleting all routers. For HA routers it results in graceful
keepalived shutdown.
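A minimal sketch of the proposed behaviour, assuming a hypothetical agent
class and config flag (the names here are illustrative, not neutron's actual
ones):

```python
import signal

# On SIGTERM, delete all routers so that keepalived shuts down gracefully
# and releases its VIPs before the process dies.
class L3Agent:
    def __init__(self, routers, graceful_shutdown=True):
        self.routers = list(routers)
        self.graceful_shutdown = graceful_shutdown

    def handle_sigterm(self, signum, frame):
        if not self.graceful_shutdown:
            return                          # keep today's behaviour
        for router_id in list(self.routers):
            self.delete_router(router_id)   # stops keepalived per HA router

    def delete_router(self, router_id):
        self.routers.remove(router_id)

agent = L3Agent(routers=['router-1', 'router-2'])
signal.signal(signal.SIGTERM, agent.handle_sigterm)
# Simulate the SIGTERM kubernetes sends when killing the pod:
agent.handle_sigterm(signal.SIGTERM, None)
print(agent.routers)  # []
```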

** Affects: neutron
 Importance: Medium
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: l3-ha l3-ipam-dhcp

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1851609

Title:
  Add an option for graceful l3 agent shutdown

Status in neutron:
  New

Bug description:
  If KillMode in systemd config of a neutron l3 agent service is set to
  'process' - it will not kill child processes on main service stop -
  this is useful when we don't want data-plane downtime on agent
  stop/restart due to keepalived exit.

  However, in some cases graceful cleanup on l3 agent shutdown is needed
  - for example with a containerised control plane: when kubernetes kills
  the l3-agent pod, it automatically kills its children (keepalived
  processes) in a non-graceful way, so keepalived does not clear its
  VIPs. This leads to a situation where the same VIP is present on
  different nodes, and hence to long downtime.

  The proposal is to add a new l3 agent config so that it handles stop
  (SIGTERM) by deleting all routers. For HA routers it results in
  graceful keepalived shutdown.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1851609/+subscriptions



[Yahoo-eng-team] [Bug 1850639] [NEW] FloatingIP list bad performance

2019-10-30 Thread Oleg Bondarev
Public bug reported:

Faced on stable/queens but applicable to master too.
On a quite heavily loaded environment it was noticed that a simple floatingip
list command takes significant time (~1200 FIPs), while for example a port
list is always faster (>7000 ports).
With sqlalchemy debug logs enabled, lots of entries like the following can be
seen:

2019-10-22 21:02:44,977.977 23957 DEBUG sqlalchemy.orm.path_registry 
[req-3db31d53-f6b9-408e-b8c7-bf037ef10a1b 1df8a7d5eb
b5414b9e29cf581098681c 10479799101a4fe4ada17daa105707c5 - default default] set 
'memoized_setups' on path 'EntityRegistry(
(,))' to '{}' set 
/usr/lib/python2.7/dist-packages/sqlalchemy/orm/path_registry.
py:63

- which basically eats all the time of a request.

As a test I commented out the 'dns' field in the FloatingIP DB object
definition and the response time dropped from 14 seconds to 1. The DNS
extension is not configured on the environment and no external DNS is used.
Also, I don't see this field used anywhere in neutron.

Interestingly, the Port DB object has a 'dns' field as well (with a
corresponding portdnses table in the DB, all the same as done for
floatingips); however, the DB object is not used when listing ports.

The proposal would be to remove the 'dns' field from the FloatingIP OVO as
unused, until we find the performance bottleneck.

** Affects: neutron
 Importance: High
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1850639

Title:
  FloatingIP list bad performance

Status in neutron:
  New

Bug description:
  Faced on stable/queens but applicable to master too.
  On a quite heavily loaded environment it was noticed that a simple
floatingip list command takes significant time (~1200 FIPs), while for
example a port list is always faster (>7000 ports).
  With sqlalchemy debug logs enabled, lots of entries like the following can
be seen:

  2019-10-22 21:02:44,977.977 23957 DEBUG sqlalchemy.orm.path_registry 
[req-3db31d53-f6b9-408e-b8c7-bf037ef10a1b 1df8a7d5eb
  b5414b9e29cf581098681c 10479799101a4fe4ada17daa105707c5 - default default] 
set 'memoized_setups' on path 'EntityRegistry(
  (,))' to '{}' set 
/usr/lib/python2.7/dist-packages/sqlalchemy/orm/path_registry.
  py:63

  - which basically eats all the time of a request.

  As a test I commented out the 'dns' field in the FloatingIP DB object
definition and the response time dropped from 14 seconds to 1. The DNS
extension is not configured on the environment and no external DNS is used.
  Also, I don't see this field used anywhere in neutron.

  Interestingly, the Port DB object has a 'dns' field as well (with a
  corresponding portdnses table in the DB, all the same as done for
  floatingips); however, the DB object is not used when listing ports.

  The proposal would be to remove the 'dns' field from the FloatingIP OVO
  as unused, until we find the performance bottleneck.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1850639/+subscriptions



[Yahoo-eng-team] [Bug 1849098] [NEW] ovs agent is stuck with OVSFWTagNotFound when dealing with unbound port

2019-10-21 Thread Oleg Bondarev
thon3.6/site-packages/neutron/agent/securitygroups_rpc.py",
 line 125, in decorated_function
2019-10-17 11:32:21.906 135 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent *args, 
**kwargs)
2019-10-17 11:32:21.906 135 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/var/lib/openstack/lib/python3.6/site-packages/neutron/agent/securitygroups_rpc.py",
 line 133, in prepare_devices_filter
2019-10-17 11:32:21.906 135 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 
self._apply_port_filter(device_ids)
2019-10-17 11:32:21.906 135 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/var/lib/openstack/lib/python3.6/site-packages/neutron/agent/securitygroups_rpc.py",
 line 164, in _apply_port_filter
2019-10-17 11:32:21.906 135 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 
self.firewall.prepare_port_filter(device)
2019-10-17 11:32:21.906 135 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/var/lib/openstack/lib/python3.6/site-packages/neutron/agent/linux/openvswitch_firewall/firewall.py",
 line 555, in prepare_port_filter
2019-10-17 11:32:21.906 135 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent of_port = 
self.get_or_create_ofport(port)
2019-10-17 11:32:21.906 135 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/var/lib/openstack/lib/python3.6/site-packages/neutron/agent/linux/openvswitch_firewall/firewall.py",
 line 532, in get_or_create_ofport
2019-10-17 11:32:21.906 135 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 
port_vlan_id = self._get_port_vlan_tag(ovs_port.port_name)
2019-10-17 11:32:21.906 135 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/var/lib/openstack/lib/python3.6/site-packages/neutron/agent/linux/openvswitch_firewall/firewall.py",
 line 516, in _get_port_vlan_tag
2019-10-17 11:32:21.906 135 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent return 
get_tag_from_other_config(self.int_br.br, port_name)
2019-10-17 11:32:21.906 135 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/var/lib/openstack/lib/python3.6/site-packages/neutron/agent/linux/openvswitch_firewall/firewall.py",
 line 84, in get_tag_from_other_config
2019-10-17 11:32:21.906 135 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 
port_name=port_name, other_config=other_config)
2019-10-17 11:32:21.906 135 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 
neutron.agent.linux.openvswitch_firewall.exceptions.OVSFWTagNotFound: Cannot 
get tag for port o-hm0 from its other_config: {}
2019-10-17 11:32:21.909 135 INFO 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 
[req-aae68b42-a99f-4bb3-bcf6-a6d3c4ca9e31 - - - - -] Agent out of sync with 
plugin!

This happens in each agent cycle, so the agent can't do anything.

OVSFWTagNotFound needs to be handled in prepare_port_filter(), as was done
for update_port_filter in https://review.opendev.org/#/c/630910/
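A sketch of that handling (stubbed types; the real fix would live in
neutron's OVS firewall driver): catching OVSFWTagNotFound per port lets the
rest of the batch proceed instead of aborting every agent cycle:

```python
# Tolerate ports whose VLAN tag is not yet in other_config (e.g. unbound
# ports like o-hm0 above) instead of letting the exception escape the loop.
class OVSFWTagNotFound(Exception):
    pass

def prepare_port_filter(ports, get_tag, install_rules, log):
    for port in ports:
        try:
            tag = get_tag(port)
        except OVSFWTagNotFound:
            log('port %s has no VLAN tag yet, skipping' % port)
            continue   # one unbound port no longer wedges the agent
        install_rules(port, tag)

installed, logs = [], []
tags = {'good-port': 42}

def get_tag(port):
    if port not in tags:
        raise OVSFWTagNotFound(port)
    return tags[port]

prepare_port_filter(['good-port', 'o-hm0'], get_tag,
                    lambda p, t: installed.append((p, t)), logs.append)
print(installed)  # [('good-port', 42)]
print(len(logs))  # 1
```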

** Affects: neutron
 Importance: High
 Assignee: Oleg Bondarev (obondarev)
 Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1849098

Title:
  ovs agent is stuck with OVSFWTagNotFound when dealing with unbound
  port

Status in neutron:
  In Progress

Bug description:
  neutron-openvswitch-agent meets unbound port:

  2019-10-17 11:32:21.868 135 WARNING
  neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-
  aae68b42-a99f-4bb3-bcf6-a6d3c4ca9e31 - - - - -] Device
  ef34215f-e099-4fd0-935f-c9a42951d166 not defined on plugin or binding
  failed

  Later when applying firewall rules:

  2019-10-17 11:32:21.901 135 INFO neutron.agent.securitygroups_rpc 
[req-aae68b42-a99f-4bb3-bcf6-a6d3c4ca9e31 - - - - -] Preparing filters for 
devices {'ef34215f-e099-4fd0-935f-c9a42951d166', 
'e9c97cf0-1a5e-4d77-b57b-0ba474d12e29', 'fff1bb24-6423-4486-87c4-1fe17c552cca', 
'2e20f9ee-bcb5-445c-b31f-d70d276d45c9', '03a60047-cb07-42a4-8b49-619d5982a9bd', 
'a452cea2-deaf-4411-bbae-ce83870cbad4', '79b03e5c-9be0-4808-9784-cb4878c3dbd5', 
'9b971e75-3c1b-463d-88cf-3f298105fa6e'}
  2019-10-17 11:32:21.906 135 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 
[req-aae68b42-a99f-4bb3-bcf6-a6d3c4ca9e31 - - - - -] Error while processing VIF 
ports: neutron.agent.linux.openvswitch_firewall.exceptions.OVSFWTagNotFound: 
Cannot get tag for port o-hm0 from its other_config: {}
  2019-10-17 11:32:21.906 135 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most 
recent call last):
  2019-10-17 11:32:21.906 135 ERROR 
neutron.plugins.ml2.drivers.openvswitch.ag

[Yahoo-eng-team] [Bug 1839252] [NEW] Connectivity issues due to skb marks on the encapsulating packet

2019-08-07 Thread Oleg Bondarev
Public bug reported:

It looks like OVS tunnels inherit skb marks from tunneled packets by default.
As a result, Neutron iptables marks set in the qrouter namespace are inherited
by the VXLAN encapsulating packets.
These marks may conflict with marks used by the underlying networking (like
Calico) and lead to the VXLAN-tunneled packets being dropped.

The proposal is to set 'egress_pkt_mark = 0' explicitly for tunnel
ports. The option was added in OVS 2.8.0
(https://www.openvswitch.org/releases/NEWS-2.8.0.txt)
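A sketch of the resulting tunnel-port attributes (the shape below follows the
OVSDB Interface options column, not ovs_lib's exact call signature, so the
helper and field names are illustrative):

```python
# Pin the encapsulating packet's skb mark to 0 so qrouter iptables marks are
# not inherited by the outer VXLAN packet.
def tunnel_port_options(remote_ip, local_ip):
    return {
        'type': 'vxlan',
        'options': {
            'remote_ip': remote_ip,
            'local_ip': local_ip,
            'egress_pkt_mark': '0',   # proposed fix; requires OVS >= 2.8.0
        },
    }

attrs = tunnel_port_options('192.0.2.10', '192.0.2.1')
print(attrs['options']['egress_pkt_mark'])  # 0
```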

** Affects: neutron
 Importance: Undecided
 Assignee: Oleg Bondarev (obondarev)
 Status: In Progress


** Tags: ovs ovs-lib

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1839252

Title:
  Connectivity issues due to skb marks on the encapsulating packet

Status in neutron:
  In Progress

Bug description:
  It looks like OVS tunnels inherit skb marks from tunneled packets by
default.
  As a result, Neutron iptables marks set in the qrouter namespace are
inherited by the VXLAN encapsulating packets.
  These marks may conflict with marks used by the underlying networking (like
Calico) and lead to the VXLAN-tunneled packets being dropped.

  The proposal is to set 'egress_pkt_mark = 0' explicitly for tunnel
  ports. The option was added in OVS 2.8.0
  (https://www.openvswitch.org/releases/NEWS-2.8.0.txt)

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1839252/+subscriptions



[Yahoo-eng-team] [Bug 1836023] [NEW] OVS agent "hangs" while processing trusted ports

2019-07-10 Thread Oleg Bondarev
Public bug reported:

Queens, ovsdb native interface.

On a loaded gateway node hosting > 1000 ports, when neutron-
openvswitch-agent is restarted, at some moment the agent stops sending
state reports and stops logging for a significant time, depending on the
number of ports. In our case the gateway node hosts > 1400 ports and the
agent hangs for ~100 seconds. Thus, if the configured agent_down_time is
less than 100 seconds, the neutron server sees the agent as down and
starts rescheduling resources. Once the agent stops hanging it sees
itself as "revived" and starts a new full sync. This loop is almost
endless.

Debug showed the culprit is process_trusted_ports:
https://github.com/openstack/neutron/blob/13.0.4/neutron/agent/linux/openvswitch_firewall/firewall.py#L655
- this function does not yield control to other greenthreads and blocks
until all trusted ports are processed. Since on gateway nodes almost all
ports are "trusted" (router and dhcp ports), process_trusted_ports may
take significant time.

The proposal would be to add greenlet.sleep(0) inside the loop in
process_trusted_ports - this fixed the issue in our environment.
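The effect of the missing yield point can be sketched with plain generators
standing in for greenthreads (in the real agent this is eventlet/greenlet; all
names below are illustrative, not Neutron's actual code):

```python
def process_trusted_ports(ports, log):
    """Process trusted ports, yielding control after each one.

    The `yield` plays the role of greenlet.sleep(0) in the agent: it hands
    control back to the scheduler so other tasks (e.g. state reports) run
    between ports instead of waiting for the whole loop to finish.
    """
    for port in ports:
        log.append("port:%s" % port)  # stand-in for installing firewall flows
        yield

def state_report(log, count):
    """Stand-in for the agent's periodic state report task."""
    for _ in range(count):
        log.append("report")
        yield

def run_round_robin(tasks):
    """A minimal cooperative scheduler: run each task until it yields."""
    while tasks:
        for task in list(tasks):
            try:
                next(task)
            except StopIteration:
                tasks.remove(task)

log = []
run_round_robin([process_trusted_ports(range(3), log), state_report(log, 3)])
# log is now: ['port:0', 'report', 'port:1', 'report', 'port:2', 'report']
```

With the yield in place, the state reports interleave with port processing;
without it, all three ports would be logged before the first report.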

** Affects: neutron
 Importance: High
     Assignee: Oleg Bondarev (obondarev)
 Status: In Progress


** Tags: ovs-fw

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1836023

Title:
  OVS agent "hangs" while processing trusted ports

Status in neutron:
  In Progress

Bug description:
  Queens, ovsdb native interface.

  On a loaded gateway node hosting > 1000 ports, when neutron-
  openvswitch-agent is restarted, at some moment the agent stops sending
  state reports and stops logging for a significant time, depending on
  the number of ports. In our case the gateway node hosts > 1400 ports
  and the agent hangs for ~100 seconds. Thus, if the configured
  agent_down_time is less than 100 seconds, the neutron server sees the
  agent as down and starts rescheduling resources. Once the agent stops
  hanging it sees itself as "revived" and starts a new full sync. This
  loop is almost endless.

  Debug showed the culprit is process_trusted_ports:
  
https://github.com/openstack/neutron/blob/13.0.4/neutron/agent/linux/openvswitch_firewall/firewall.py#L655
  - this function does not yield control to other greenthreads and
  blocks until all trusted ports are processed. Since on gateway nodes
  almost all ports are "trusted" (router and dhcp ports),
  process_trusted_ports may take significant time.

  The proposal would be to add greenlet.sleep(0) inside the loop in
  process_trusted_ports - this fixed the issue in our environment.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1836023/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1835731] [NEW] Neutron server error: failed to update port DOWN

2019-07-08 Thread Oleg Bondarev
Public bug reported:

Before adding extra logging:

2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc [req-
2b7602fb-e990-45ee-974f-ef3b55b41bed - - - - -] Failed to update device
d75fca78-2f64-4c5a-9a94-6684c753bf3d down

After adding logging:

2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc [req-2b7602fb-e990-45ee-974f-ef3b55b41bed - - - - -] Failed to update device d75fca78-2f64-4c5a-9a94-6684c753bf3d down: 'NoneType' object has no attribute 'started_at': AttributeError: 'NoneType' object has no attribute 'started_at'
2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc Traceback (most recent call last):
2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/rpc.py", line 367, in update_device_list
2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc     **kwargs)
2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/rpc.py", line 233, in update_device_down
2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc     n_const.PORT_STATUS_DOWN, host)
2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/rpc.py", line 319, in notify_l2pop_port_wiring
2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc     agent_restarted = l2pop_driver.obj.agent_restarted(port_context)
2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 253, in agent_restarted
2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc     if l2pop_db.get_agent_uptime(agent) < cfg.CONF.l2pop.agent_boot_time:
2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/db.py", line 51, in get_agent_uptime
2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc     return timeutils.delta_seconds(agent.started_at,
2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc AttributeError: 'NoneType' object has no attribute 'started_at'
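One hedged way a fix could look (this is a sketch, not Neutron's actual patch:
the stand-in functions mirror l2pop_db.get_agent_uptime and
mech_driver.agent_restarted from the trace above, and the None-handling policy
is an assumption):

```python
from datetime import datetime, timezone

def get_agent_uptime(agent, now=None):
    """Seconds since the agent started; tolerate a missing agent row.

    `agent` can be None when the DB lookup races with agent removal,
    which is exactly the AttributeError in the traceback above. This
    sketch returns None in that case and lets the caller decide.
    """
    if agent is None or agent.started_at is None:
        return None
    now = now or datetime.now(timezone.utc)
    return (now - agent.started_at).total_seconds()

def agent_restarted(agent, boot_time):
    """True if the agent restarted within `boot_time` seconds."""
    uptime = get_agent_uptime(agent)
    if uptime is None:
        return False  # no agent record: do not treat as freshly restarted
    return uptime < boot_time
```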

** Affects: neutron
     Importance: Undecided
 Assignee: Oleg Bondarev (obondarev)
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1835731

Title:
  Neutron server error: failed to update port DOWN

Status in neutron:
  New

Bug description:
  Before adding extra logging:

  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc [req-
  2b7602fb-e990-45ee-974f-ef3b55b41bed - - - - -] Failed to update
  device d75fca78-2f64-4c5a-9a94-6684c753bf3d down

  After adding logging:

  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc [req-2b7602fb-e990-45ee-974f-ef3b55b41bed - - - - -] Failed to update device d75fca78-2f64-4c5a-9a94-6684c753bf3d down: 'NoneType' object has no attribute 'started_at': AttributeError: 'NoneType' object has no attribute 'started_at'
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc Traceback (most recent call last):
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/rpc.py", line 367, in update_device_list
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc     **kwargs)
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/rpc.py", line 233, in update_device_down
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc     n_const.PORT_STATUS_DOWN, host)
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/rpc.py", line 319, in notify_l2pop_port_wiring
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc     agent_restarted = l2pop_driver.obj.agent_restarted(port_context)
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 253, in agent_restarted
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc     if l2pop_db.get_agent_uptime(agent) < cfg.CONF.l2pop.agent_boot_time:
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/db.py", line 51, in get_agent_uptime
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc     return timeutils.delta_seconds(agent.started_at,
  2019-07-03 13:35:54,701.701 17220 ERROR neutron.plugins.ml2.rpc AttributeError: 'NoneType' object has no attribute 'started_at'

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1835731/+subscriptions

[Yahoo-eng-team] [Bug 1831622] [NEW] SRIOV: agent may not register VFs

2019-06-04 Thread Oleg Bondarev
Public bug reported:

When a VM is instantiated with a PF-PT (direct-physical) port, the Neutron
SR-IOV agent removes the respective embedded switch device instance from
the switch manager. After the VM releases the PF, the associated device
(sys/class/net/) appears immediately, but the initialization of
its VFs and the creation of the appropriate sysfs entries
(/sys/class/net//device/virtfn<#vf>) may take even more than a
second, depending on the platform and the NIC's kernel driver
capabilities. The Neutron SR-IOV agent eagerly tries to discover and
register NIC devices that are not blacklisted and not yet known, by
creating the respective embedded switch instances and enumerating the
available VFs underneath them. However, when this is done in an early
phase, where the sysfs entries for the VFs are not yet present because
the PF has just been released, a port-less embedded switch will be
created to represent that device. As a consequence, port updates that
target VFs which are supposed to belong to an incorrectly registered
embedded device won't be treated properly by the agent, causing a VM
instantiation timeout.
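A hedged sketch of waiting for the VF sysfs entries to (re)appear before
registering the embedded switch (the function name, timeout values, and
`sysfs_root` parameter are illustrative, not the agent's actual code):

```python
import glob
import os
import time

def wait_for_vfs(pf_name, expected_vfs=None, timeout=5.0, interval=0.2,
                 sysfs_root="/sys/class/net"):
    """Poll until the PF's virtfn* sysfs links appear, or time out.

    After a PF is released by a VM, the virtfn<N> links under
    <sysfs_root>/<pf>/device/ may take a while to reappear; registering
    the embedded switch before they do yields a port-less switch.
    Returns the (possibly empty) sorted list of virtfn paths found.
    """
    pattern = os.path.join(sysfs_root, pf_name, "device", "virtfn*")
    deadline = time.monotonic() + timeout
    while True:
        vfs = glob.glob(pattern)
        if vfs and (expected_vfs is None or len(vfs) >= expected_vfs):
            return sorted(vfs)
        if time.monotonic() >= deadline:
            return sorted(vfs)  # may be empty: caller should skip or retry
        time.sleep(interval)
```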

** Affects: neutron
 Importance: Undecided
 Assignee: Oleg Bondarev (obondarev)
 Status: In Progress


** Tags: sriov-pci-pt

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1831622

Title:
  SRIOV: agent may not register VFs

Status in neutron:
  In Progress

Bug description:
  When a VM is instantiated with a PF-PT (direct-physical) port, the
  Neutron SR-IOV agent removes the respective embedded switch device
  instance from the switch manager. After the VM releases the PF, the
  associated device (sys/class/net/) appears immediately, but
  the initialization of its VFs and the creation of the appropriate
  sysfs entries (/sys/class/net//device/virtfn<#vf>) may take
  even more than a second, depending on the platform and the NIC's
  kernel driver capabilities. The Neutron SR-IOV agent eagerly tries to
  discover and register NIC devices that are not blacklisted and not
  yet known, by creating the respective embedded switch instances and
  enumerating the available VFs underneath them. However, when this is
  done in an early phase, where the sysfs entries for the VFs are not
  yet present because the PF has just been released, a port-less
  embedded switch will be created to represent that device. As a
  consequence, port updates that target VFs which are supposed to
  belong to an incorrectly registered embedded device won't be treated
  properly by the agent, causing a VM instantiation timeout.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1831622/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1830383] [NEW] SRIOV: MAC address in use error

2019-05-24 Thread Oleg Bondarev
Public bug reported:

When using a direct-physical port, the port inherits the physical device's MAC 
address on binding.
When the VM is later deleted, the MAC address stays.
If we then try to spawn a VM with another direct-physical port, we get "Neutron 
error: MAC address 0c:c4:7a:de:ae:19 is already in use on network None.: 
MacAddressInUseClient: Unable to complete operation for network 
42915db3-4e46-4150-af9d-86d0c59d765f. The mac address 0c:c4:7a:de:ae:19 is in 
use."

The proposal is to reset port's MAC address when unbinding.
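A minimal sketch of the proposal (the helper names and the choice to regenerate
a random MAC under the fa:16:3e OUI are illustrative assumptions, not Neutron's
actual unbind code):

```python
import random

def random_mac(base=(0xfa, 0x16, 0x3e)):
    """Generate a MAC under the given OUI.

    fa:16:3e is the OUI OpenStack uses by default for generated MACs;
    both the base and this helper are illustrative.
    """
    octets = base + tuple(random.randint(0, 255) for _ in range(3))
    return ":".join("%02x" % b for b in octets)

def unbind_port(port):
    """On unbind, drop the inherited PF MAC so it can be reused.

    Resetting mac_address here avoids the MacAddressInUseClient error
    when the next VM binds a direct-physical port to the same PF.
    """
    port["binding:host_id"] = None
    port["mac_address"] = random_mac()  # stop holding the PF's real MAC
    return port
```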

** Affects: neutron
 Importance: Undecided
 Assignee: Oleg Bondarev (obondarev)
 Status: In Progress


** Tags: sriov-pci-pt

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1830383

Title:
  SRIOV: MAC address in use error

Status in neutron:
  In Progress

Bug description:
  When using a direct-physical port, the port inherits the physical device's MAC 
address on binding.
  When the VM is later deleted, the MAC address stays.
  If we then try to spawn a VM with another direct-physical port, we get "Neutron 
error: MAC address 0c:c4:7a:de:ae:19 is already in use on network None.: 
MacAddressInUseClient: Unable to complete operation for network 
42915db3-4e46-4150-af9d-86d0c59d765f. The mac address 0c:c4:7a:de:ae:19 is in 
use."

  The proposal is to reset port's MAC address when unbinding.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1830383/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1825521] [NEW] Bulk IPv6 subnet create: update port called within a transaction

2019-04-19 Thread Oleg Bondarev
m': item})
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in 
__exit__
self.force_reraise()
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in 
force_reraise
six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py", line 
706, in _create_bulk_ml2
result, mech_context = obj_creator(context, item)
  File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py", line 
1048, in _create_subnet_db
self._create_subnet_postcommit(context, result, net_db, ipam_sub)
  File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 163, in 
wrapped
return method(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/neutron/db/db_base_plugin_v2.py", line 
716, in _create_subnet_postcommit
self.update_port(context, port_id, port_info)
  File "/usr/lib/python2.7/dist-packages/neutron/common/utils.py", line 673, in 
inner
"transaction.") % f)

RuntimeError: Method  cannot be
called within a transaction.
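The guard that raises this RuntimeError can be sketched as a decorator (this
mirrors the idea behind neutron.common.utils.transaction_guard from the trace;
the fake session/context types are stand-ins for SQLAlchemy objects and are
illustrative):

```python
import functools

class FakeSession:
    """Minimal stand-in for a DB session: only tracks transaction state."""
    def __init__(self, is_active=False):
        self.is_active = is_active

class FakeContext:
    """Minimal stand-in for a request context carrying a session."""
    def __init__(self, session):
        self.session = session

def transaction_guard(f):
    """Refuse to run `f` when the context is already inside a transaction.

    Calling update_port inside an open transaction (as the bulk IPv6
    subnet create path does above) would mix notification side effects
    into an uncommitted transaction, so the guard raises instead.
    """
    @functools.wraps(f)
    def inner(self, context, *args, **kwargs):
        if context.session.is_active:
            raise RuntimeError("Method %s cannot be called within a "
                               "transaction." % f.__name__)
        return f(self, context, *args, **kwargs)
    return inner
```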

** Affects: neutron
 Importance: High
 Assignee: Oleg Bondarev (obondarev)
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1825521

Title:
  Bulk IPv6 subnet create: update port called within a transaction

Status in neutron:
  New

Bug description:
  When bulk creating auto address IPv6 subnets, port update happens
  within a transaction:

  2019-03-28 15:48:50.894 2377 ERROR
  neutron.pecan_wsgi.hooks.translation [req-e84aba73-3fc5-4b3f-
  bf41-a7e762af4bdf 166b7ed45cd6404e884ba63f89e88bf9
  2ce6b2792eee4dc88639a3575f1ac7f0 - default default] POST failed.:
  RuntimeError: Method  cannot
  be called within a transaction.

  Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/pecan/core.py", line 678, in __call__
  self.invoke_controller(controller, args, kwargs, state)
File "/usr/lib/python2.7/dist-packages/pecan/core.py", line 569, in 
invoke_controller
  result = controller(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 93, in 
wrapped
  setattr(e, '_RETRY_EXCEEDED', True)
File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, 
in __exit__
  self.force_reraise()
File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, 
in force_reraise
  six.reraise(self.type_, self.value, self.tb)
File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 89, in 
wrapped
  return f(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/oslo_db/api.py", line 150, in wrapper
  ectxt.value = e.inner_exc
File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, 
in __exit__
  self.force_reraise()
File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, 
in force_reraise
  six.reraise(self.type_, self.value, self.tb)
File "/usr/lib/python2.7/dist-packages/oslo_db/api.py", line 138, in wrapper
  return f(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 128, in 
wrapped
  LOG.debug("Retry wrapper got retriable exception: %s", e)
File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, 
in __exit__
  self.force_reraise()
File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, 
in force_reraise
  six.reraise(self.type_, self.value, self.tb)
File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 124, in 
wrapped
  return f(*dup_args, **dup_kwargs)
File 
"/usr/lib/python2.7/dist-packages/neutron/pecan_wsgi/controllers/utils.py", 
line 76, in wrapped
  return f(*args, **kwargs)
File 
"/usr/lib/python2.7/dist-packages/neutron/pecan_wsgi/controllers/resource.py", 
line 159, in post
  return self.create(resources)
File 
"/usr/lib/python2.7/dist-packages/neutron/pecan_wsgi/controllers/resource.py", 
line 177, in create
  return {key: creator(*creator_args, **creator_kwargs)}
File "/usr/lib/python2.7/dist-packages/neutron/common/utils.py", line 674, 
in inner
  return f(self, context, *args, **kwargs)
File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 163, in 
wrapped
  return method(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 93, in 
wrapped
  setattr(e, '_RETRY_EXCEEDED', True)
File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, 
in __exit__
  self.force_reraise()
File "/usr/lib/python2.7/dist-packages/oslo_utils/

[Yahoo-eng-team] [Bug 1824299] [NEW] Race condition during init may lead to neutron server malfunction

2019-04-11 Thread Oleg Bondarev
Public bug reported:

release: Queens

quite a lot of advanced services enabled:
"service_plugins = 
neutron.services.l3_router.l3_router_plugin.L3RouterPlugin,metering,lbaasv2,neutron.services.qos.qos_plugin.QoSPlugin,trunk,networking_l2gw.services.l2gateway.plugin.L2GatewayPlugin,bgpvpn"

Neutron server fails to start with repeating sqlalchemy errors "Class
'networking_bgpvpn.neutron.db.bgpvpn_db.BGPVPNPortAssociationRoute' is
not mapped" or "Class 'networking_bgpvpn.neutron.db.bgpvpn_db.BGPVPN' is
not mapped".

The errors happen on handling state reports from agents. So if all
neutron agents are stopped, the server is started and allowed to finish
initialization, and only then the agents are started - everything is OK.

Also, it appears that if 'bgpvpn' is placed closer to the beginning of the 
"service_plugins" config:
"service_plugins = 
neutron.services.l3_router.l3_router_plugin.L3RouterPlugin,bgpvpn,metering,lbaasv2,neutron.services.qos.qos_plugin.QoSPlugin,trunk,networking_l2gw.services.l2gateway.plugin.L2GatewayPlugin"
 - then no errors happen, even if neutron agents are not stopped during the server restart.

Full log near error trace:

2019-04-09 13:42:27,194.194 15604 INFO sqlalchemy.orm.mapper.Mapper 
[req-0392737c-e6c8-472f-8685-91190f882862 - - - - -] 
(BGPVPNPortAssociation|bgpvpn_port_associations) initialize prop routes
2019-04-09 13:42:27,197.197 15604 INFO sqlalchemy.orm.mapper.Mapper 
[req-917b59e9-2b38-4214-a808-6bf2872d708f - - - - -] 
(BGPVPNPortAssociationRoute|bgpvpn_port_association_routes) 
_configure_property(port_association, RelationshipProperty)
2019-04-09 13:42:27,197.197 15604 INFO sqlalchemy.orm.mapper.Mapper 
[req-917b59e9-2b38-4214-a808-6bf2872d708f - - - - -] 
(BGPVPNPortAssociationRoute|bgpvpn_port_association_routes) 
_configure_property(bgpvpn, RelationshipProperty)
2019-04-09 13:42:27,198.198 15604 INFO sqlalchemy.orm.mapper.Mapper 
[req-917b59e9-2b38-4214-a808-6bf2872d708f - - - - -] 
(BGPVPNPortAssociationRoute|bgpvpn_port_association_routes) 
_configure_property(id, Column)
2019-04-09 13:42:27,199.199 15604 INFO sqlalchemy.orm.mapper.Mapper 
[req-917b59e9-2b38-4214-a808-6bf2872d708f - - - - -] 
(BGPVPNPortAssociationRoute|bgpvpn_port_association_routes) 
_configure_property(port_association_id, Column)
2019-04-09 13:42:27,200.200 15604 INFO sqlalchemy.orm.mapper.Mapper 
[req-917b59e9-2b38-4214-a808-6bf2872d708f - - - - -] 
(BGPVPNPortAssociationRoute|bgpvpn_port_association_routes) 
_configure_property(type, Column)
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server 
[req-0392737c-e6c8-472f-8685-91190f882862 - - - - -] Exception during message 
handling: UnmappedClassError: Class 
'networking_bgpvpn.neutron.db.bgpvpn_db.BGPVPNPortAssociationRoute' is not 
mapped
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server Traceback 
(most recent call last):
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 160, in 
_process_incoming
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server res = 
self.dispatcher.dispatch(message)
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 213, 
in dispatch
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server return 
self._do_dispatch(endpoint, method, ctxt, args)
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 183, 
in _do_dispatch
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server result = 
func(ctxt, **new_args)
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 161, in wrapped
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server return 
method(*args, **kwargs)
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 91, in wrapped
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server 
setattr(e, '_RETRY_EXCEEDED', True)
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server 
self.force_reraise()
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in 
force_reraise
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server 
six.reraise(self.type_, self.value, self.tb)
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server   File 
"/usr/lib/python2.7/dist-packages/neutron/db/api.py", line 87, in wrapped
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server return 
f(*args, **kwargs)
2019-04-09 13:42:27,201.201 15604 ERROR oslo_messaging.rpc.server  

[Yahoo-eng-team] [Bug 1821753] [NEW] openvswitch agent ofctl request errors: 'timed out' and 'Datapath Invalid'

2019-03-26 Thread Oleg Bondarev
Public bug reported:

Release: Queens, ovsdb_interface=native, of_request_timeout = 30

With number of OVS ports growing on the node following errors start to
occur (starting at ~1200 ports):

ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch 
[req-db47426c-1719-43dd-8ecf-4fb4bdcbc316 - - - - -] ofctl request 
version=None,msg_type=None,msg_len=None,xid=None,OFPFlowMod(buffer_id=4294967295,command=0,cookie=5881109557449606263L,cookie_mask=0,flags=0,hard_timeout=0,idle_timeout=0,instructions=[OFPInstructionActions(actions=[OFPActionPopVlan(len=8,type=18),
 OFPActionSetField(tunnel_id=725), 
OFPActionOutput(len=16,max_len=0,port=1793,type=0), 
OFPActionOutput(len=16,max_len=0,port=2,type=0)],type=4)],match=OFPMatch(oxm_fields={'vlan_vid':
 4175}),out_group=0,out_port=0,priority=1,table_id=22) error Datapath Invalid 
64183592930369: InvalidDatapath: Datapath Invalid
 or 
ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch 
[req-632b8ede-1234-4682-afe0-3aefb615b121 - - - - -] ofctl request 
version=0x4,msg_type=0xe,msg_len=0x78,xid=0x73c67c07,OFPFlow
Mod(buffer_id=4294967295,command=0,cookie=5881109557449606263L,cookie_mask=0,flags=0,hard_timeout=0,idle_timeout=0,instructions=[OFPInstructionActions(actions=[OFPActionPopVlan(len=8,type=18),
 OFPActionSetField(tunnel_id=666), OFPActionOu
tput(len=16,max_len=0,port=2,type=0)],len=48,type=4)],match=OFPMatch(oxm_fields={'eth_dst':
 'fa:16:3e:4a:79:ce', 'vlan_vid': 
6107}),out_group=0,out_port=0,priority=2,table_id=20) timed out: Timeout: 30 
seconds

with corresponding errors in ovs-vswitchd logs:

|rconn|ERR|br-tun<->tcp:127.0.0.1:6633: no response to inactivity probe after 5 
seconds, disconnecting
|rconn|ERR|br-floating<->tcp:127.0.0.1:6633: no response to inactivity probe 
after 5 seconds, disconnecting
|rconn|ERR|br-int<->tcp:127.0.0.1:6633: no response to inactivity probe after 5 
seconds, disconnecting


Setting inactivity_probe to a greater value helps:

#ovs-vsctl set controller br-int inactivity_probe=3
#ovs-vsctl set controller br-tun inactivity_probe=3
#ovs-vsctl set controller br-floating inactivity_probe=3

Should neutron allow setting inactivity_probe for controllers?
Should it correspond to of_request_timeout value?
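The workaround commands above appear truncated; a sketch with an assumed value
of 30000 ms (30 s, matching of_request_timeout; the value and bridge list are
assumptions):

```shell
# Raise the OpenFlow controller inactivity probe (in milliseconds) on
# each bridge so ovs-vswitchd does not disconnect while the agent is
# busy handling large flow batches. 30000 ms is an assumed value.
for br in br-int br-tun br-floating; do
    ovs-vsctl set controller "$br" inactivity_probe=30000
done
```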

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1821753

Title:
  openvswitch agent ofctl request errors: 'timed out' and 'Datapath
  Invalid'

Status in neutron:
  New

Bug description:
  Release: Queens, ovsdb_interface=native, of_request_timeout = 30

  With number of OVS ports growing on the node following errors start to
  occur (starting at ~1200 ports):

  ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch 
[req-db47426c-1719-43dd-8ecf-4fb4bdcbc316 - - - - -] ofctl request 
version=None,msg_type=None,msg_len=None,xid=None,OFPFlowMod(buffer_id=4294967295,command=0,cookie=5881109557449606263L,cookie_mask=0,flags=0,hard_timeout=0,idle_timeout=0,instructions=[OFPInstructionActions(actions=[OFPActionPopVlan(len=8,type=18),
 OFPActionSetField(tunnel_id=725), 
OFPActionOutput(len=16,max_len=0,port=1793,type=0), 
OFPActionOutput(len=16,max_len=0,port=2,type=0)],type=4)],match=OFPMatch(oxm_fields={'vlan_vid':
 4175}),out_group=0,out_port=0,priority=1,table_id=22) error Datapath Invalid 
64183592930369: InvalidDatapath: Datapath Invalid
   or 
  ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch 
[req-632b8ede-1234-4682-afe0-3aefb615b121 - - - - -] ofctl request 
version=0x4,msg_type=0xe,msg_len=0x78,xid=0x73c67c07,OFPFlow
  
Mod(buffer_id=4294967295,command=0,cookie=5881109557449606263L,cookie_mask=0,flags=0,hard_timeout=0,idle_timeout=0,instructions=[OFPInstructionActions(actions=[OFPActionPopVlan(len=8,type=18),
 OFPActionSetField(tunnel_id=666), OFPActionOu
  
tput(len=16,max_len=0,port=2,type=0)],len=48,type=4)],match=OFPMatch(oxm_fields={'eth_dst':
 'fa:16:3e:4a:79:ce', 'vlan_vid': 
6107}),out_group=0,out_port=0,priority=2,table_id=20) timed out: Timeout: 30 
seconds

  with corresponding errors in ovs-vswitchd logs:

  |rconn|ERR|br-tun<->tcp:127.0.0.1:6633: no response to inactivity probe after 
5 seconds, disconnecting
  |rconn|ERR|br-floating<->tcp:127.0.0.1:6633: no response to inactivity probe 
after 5 seconds, disconnecting
  |rconn|ERR|br-int<->tcp:127.0.0.1:6633: no response to inactivity probe after 
5 seconds, disconnecting

  
  Setting inactivity_probe to a greater value helps:

  #ovs-vsctl set controller br-int inactivity_probe=3
  #ovs-vsctl set controller br-tun inactivity_probe=3
  #ovs-vsctl set controller br-floating inactivity_probe=3

  Should neutron allow setting inactivity_probe for controllers?
  Should it correspond to of_request_timeout value?

To manage notifications about this bug go to:

[Yahoo-eng-team] [Bug 1817306] [NEW] Failed gARP for floating IP in l3 agent logs

2019-02-22 Thread Oleg Bondarev
Public bug reported:

release: Pike.

When a DVR router with centralized floating IPs is rescheduled from a down l3 
agent to another,
the following traces are seen on the destination agent:

2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib [-] Failed 
sending gratuitous ARP to 10.13.250.14 on qg-af0de258-a8 in namespace 
snat-afe70a67-a007-4bcf-93ac-099aad63411c: Exit code: 2; Stdin: ; Stdout: ; 
Stderr: bind: Cannot assign requested address
: ProcessExecutionError: Exit code: 2; Stdin: ; Stdout: ; Stderr: bind: Cannot 
assign requested address
2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib Traceback 
(most recent call last):
2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_lib.py", line 1097, in 
_arping
2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib 
ip_wrapper.netns.execute(arping_cmd, extra_ok_codes=[1])
2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_lib.py", line 916, in 
execute
2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib 
log_fail_as_error=log_fail_as_error, **kwargs)
2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 151, in 
execute
2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib raise 
ProcessExecutionError(msg, returncode=returncode)
2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib 
ProcessExecutionError: Exit code: 2; Stdin: ; Stdout: ; Stderr: bind: Cannot 
assign requested address

Earlier in the logs the following is seen:

2019-02-22 07:52:14,894.894 9528 DEBUG neutron.agent.linux.utils [-] Running 
command (rootwrap daemon): ['ip', 'netns', 'exec', 
'snat-afe70a67-a007-4bcf-93ac-099aad63411c', 'ip', '-4', 'addr', 'del', 
'10.13.250.14/32', 'dev', 'qg-af0de258-a8'] execute_rootwrap_daemon 
/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:108
2019-02-22 07:52:14,922.922 9528 DEBUG neutron.agent.linux.utils [-] Running 
command (rootwrap daemon): ['ip', 'netns', 'exec', 
'snat-afe70a67-a007-4bcf-93ac-099aad63411c', 'conntrack', '-D', '-d', 
'10.13.250.14'] execute_rootwrap_daemon 
/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:108

So the centralized floating IP is deleted from the gateway device for some reason.
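arping binds to the source address, so once the floating IP has been removed
from the device (the earlier 'addr del'), it fails with "bind: Cannot assign
requested address". A hedged sketch of skipping the gARP in that case; the
injected `run` callable (a stand-in for a rootwrap executor returning stdout)
and all names are illustrative:

```python
def send_garp(run, namespace, dev, ip_address, count=3):
    """Send gratuitous ARP from a namespace, skipping if the IP is gone.

    `run` executes a command list and returns its stdout as a string.
    Checking the address first avoids the noisy ProcessExecutionError
    trace when the floating IP was concurrently removed.
    """
    show_out = run(["ip", "netns", "exec", namespace,
                    "ip", "-4", "addr", "show", "dev", dev])
    if ip_address not in show_out:
        return None  # address already removed: nothing to announce
    return run(["ip", "netns", "exec", namespace, "arping", "-A",
                "-I", dev, "-c", str(count), ip_address])
```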

** Affects: neutron
 Importance: Undecided
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: l3-dvr-backlog

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1817306

Title:
  Failed gARP for floating IP in l3 agent logs

Status in neutron:
  New

Bug description:
  release: Pike.

  When a DVR router with centralized floating IPs is rescheduled from a down l3 
agent to another,
  the following traces are seen on the destination agent:

  2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib [-] Failed 
sending gratuitous ARP to 10.13.250.14 on qg-af0de258-a8 in namespace 
snat-afe70a67-a007-4bcf-93ac-099aad63411c: Exit code: 2; Stdin: ; Stdout: ; 
Stderr: bind: Cannot assign requested address
  : ProcessExecutionError: Exit code: 2; Stdin: ; Stdout: ; Stderr: bind: 
Cannot assign requested address
  2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib Traceback 
(most recent call last):
  2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_lib.py", line 1097, in 
_arping
  2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib 
ip_wrapper.netns.execute(arping_cmd, extra_ok_codes=[1])
  2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_lib.py", line 916, in 
execute
  2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib 
log_fail_as_error=log_fail_as_error, **kwargs)
  2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 151, in 
execute
  2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib raise 
ProcessExecutionError(msg, returncode=returncode)
  2019-02-21 14:55:59,150.150 24730 ERROR neutron.agent.linux.ip_lib 
ProcessExecutionError: Exit code: 2; Stdin: ; Stdout: ; Stderr: bind: Cannot 
assign requested address

  Earlier in logs following is seen:

  2019-02-22 07:52:14,894.894 9528 DEBUG neutron.agent.linux.utils [-] Running 
command (rootwrap daemon): ['ip', 'netns', 'exec', 
'snat-afe70a67-a007-4bcf-93ac-099aad63411c', 'ip', '-4', 'addr', 'del', 
'10.13.250.14/32', 'dev', 'qg-af0de258-a8'] execute_rootwrap_daemon 
/usr/lib/python2.7/dist-packages/neutron/agent

[Yahoo-eng-team] [Bug 1808136] [NEW] l2 pop doesn't always provide the whole list of fdb entries on OVS restart

2018-12-12 Thread Oleg Bondarev
Public bug reported:

Bug https://bugs.launchpad.net/neutron/+bug/1804842 was fixed, but there
is still a race condition that leads to the same issue:

success scenario:
 - OVS is restarted
 - agent start_flag set to True
 - report state done
 - ports updated, server sends fdb entries as it sees agent as restarted

fail scenario:
 - OVS is restarted
 - agent start_flag set to True
 - ports updated, server doesn't send fdb entries as report state with start 
flag has not yet come
 - report state done
 - no fdb entries on agent (since no update port messages to server within 
config.agent_boot_time sec)

The proposal is to force state report right after setting start_flag.
No side effects expected.
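The proposed ordering can be sketched with a minimal, hypothetical stand-in for the agent's state-report machinery (the real agent reports over RPC; `FakeAgent` and its method names are illustrative only):

```python
# Hypothetical, simplified sketch of the proposed fix: after an OVS
# restart the agent sets start_flag and immediately pushes a state
# report instead of waiting for the next periodic report interval.
class FakeAgent:
    def __init__(self):
        self.agent_state = {}
        self.reports = []  # stands in for RPC report_state calls

    def _report_state(self):
        # record what would be sent to the server
        self.reports.append(dict(self.agent_state))
        # start_flag must only be sent once per restart
        self.agent_state.pop('start_flag', None)

    def handle_ovs_restart(self):
        self.agent_state['start_flag'] = True
        # the fix: report immediately, so the server marks the agent as
        # restarted *before* any port updates are processed
        self._report_state()

agent = FakeAgent()
agent.handle_ovs_restart()
print(agent.reports[0].get('start_flag'))  # True: server learns of restart first
```

With the report pushed synchronously in the restart handler, the "ports updated before report state" window in the fail scenario above closes.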

** Affects: neutron
 Importance: High
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: l2-pop

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1808136

Title:
  l2 pop doesn't always provide the whole list of fdb entries on OVS
  restart

Status in neutron:
  New

Bug description:
  Bug https://bugs.launchpad.net/neutron/+bug/1804842 was fixed, but
  there is still a race condition that leads to the same issue:

  success scenario:
   - OVS is restarted
   - agent start_flag set to True
   - report state done
   - ports updated, server sends fdb entries as it sees agent as restarted

  fail scenario:
   - OVS is restarted
   - agent start_flag set to True
   - ports updated, server doesn't send fdb entries as report state with start 
flag has not yet come
   - report state done
   - no fdb entries on agent (since no update port messages to server within 
config.agent_boot_time sec)

  The proposal is to force state report right after setting start_flag.
  No side effects expected.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1808136/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1800157] [NEW] privsep: lack of capabilities on kernel 4.15

2018-10-26 Thread Oleg Bondarev
Public bug reported:

l3 and dhcp agents are not functioning on kernel 4.15 due to privsep
errors:

2018-10-25 09:10:38,747.747 24060 INFO oslo.privsep.daemon [-] Running privsep 
helper: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 
'privsep-helper', '--config-file', '/etc/neutron/l3_agent.ini', 
'--config-file', '/etc/neutron/fwaas_driver.ini', '--config-file', 
'/etc/neutron/neutron.conf', '--privsep_context', 'neutron.privileged.default', 
'--privsep_sock_path', '/tmp/tmpS5k5y2/privsep.sock']
2018-10-25 09:10:39,361.361 24060 WARNING oslo.privsep.daemon [-] privsep log: 
Error in sys.excepthook:
2018-10-25 09:10:39,363.363 24060 WARNING oslo.privsep.daemon [-] privsep log: 
Traceback (most recent call last):
2018-10-25 09:10:39,363.363 24060 WARNING oslo.privsep.daemon [-] privsep log:  
 File "/usr/lib/python2.7/dist-packages/oslo_log/log.py", line 193, in 
logging_excepthook
2018-10-25 09:10:39,364.364 24060 WARNING oslo.privsep.daemon [-] privsep log:  
   getLogger(product_name).critical('Unhandled error', **extra)
2018-10-25 09:10:39,365.365 24060 WARNING oslo.privsep.daemon [-] privsep log:  
 File "/usr/lib/python2.7/logging/__init__.py", line 1481, in critical
2018-10-25 09:10:39,365.365 24060 WARNING oslo.privsep.daemon [-] privsep log:  
   self.logger.critical(msg, *args, **kwargs)
2018-10-25 09:10:39,366.366 24060 WARNING oslo.privsep.daemon [-] privsep log:  
 File "/usr/lib/python2.7/logging/__init__.py", line 1212, in critical
2018-10-25 09:10:39,366.366 24060 WARNING oslo.privsep.daemon [-] privsep log:  
   self._log(CRITICAL, msg, args, **kwargs)
2018-10-25 09:10:39,367.367 24060 WARNING oslo.privsep.daemon [-] privsep log:  
 File "/usr/lib/python2.7/logging/__init__.py", line 1286, in _log
2018-10-25 09:10:39,367.367 24060 WARNING oslo.privsep.daemon [-] privsep log:  
   self.handle(record)
2018-10-25 09:10:39,368.368 24060 WARNING oslo.privsep.daemon [-] privsep log:  
 File "/usr/lib/python2.7/logging/__init__.py", line 1296, in handle
2018-10-25 09:10:39,368.368 24060 WARNING oslo.privsep.daemon [-] privsep log:  
   self.callHandlers(record)
2018-10-25 09:10:39,369.369 24060 WARNING oslo.privsep.daemon [-] privsep log:  
 File "/usr/lib/python2.7/logging/__init__.py", line 1336, in callHandlers
2018-10-25 09:10:39,370.370 24060 WARNING oslo.privsep.daemon [-] privsep log:  
   hdlr.handle(record)
2018-10-25 09:10:39,370.370 24060 WARNING oslo.privsep.daemon [-] privsep log:  
 File "/usr/lib/python2.7/logging/__init__.py", line 759, in handle
2018-10-25 09:10:39,371.371 24060 WARNING oslo.privsep.daemon [-] privsep log:  
   self.emit(record)
2018-10-25 09:10:39,371.371 24060 WARNING oslo.privsep.daemon [-] privsep log:  
 File "/usr/lib/python2.7/logging/handlers.py", line 414, in emit
2018-10-25 09:10:39,372.372 24060 WARNING oslo.privsep.daemon [-] privsep log:  
   sres = os.stat(self.baseFilename)
2018-10-25 09:10:39,372.372 24060 WARNING oslo.privsep.daemon [-] privsep log: 
OSError: [Errno 13] Permission denied: '/var/log/neutron/neutron.log'
...
24060 ERROR neutron.agent.l3.agent FailedToDropPrivileges: Privsep daemon 
failed to start

** Affects: neutron
 Importance: High
 Assignee: Oleg Bondarev (obondarev)
 Status: In Progress


** Tags: l3-dvr-backlog l3-ipam-dhcp

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1800157

Title:
  privsep: lack of capabilities on kernel 4.15

Status in neutron:
  In Progress

Bug description:
  l3 and dhcp agents are not functioning on kernel 4.15 due to privsep
  errors:

  2018-10-25 09:10:38,747.747 24060 INFO oslo.privsep.daemon [-] Running 
privsep helper: ['sudo', '/usr/bin/neutron-rootwrap', 
'/etc/neutron/rootwrap.conf', 'privsep-helper', '--config-file', 
'/etc/neutron/l3_agent.ini', '--config-file', '/etc/neutron/fwaas_driver.ini', 
'--config-file', '/etc/neutron/neutron.conf', '--privsep_context', 
'neutron.privileged.default', '--privsep_sock_path', 
'/tmp/tmpS5k5y2/privsep.sock']
  2018-10-25 09:10:39,361.361 24060 WARNING oslo.privsep.daemon [-] privsep 
log: Error in sys.excepthook:
  2018-10-25 09:10:39,363.363 24060 WARNING oslo.privsep.daemon [-] privsep 
log: Traceback (most recent call last):
  2018-10-25 09:10:39,363.363 24060 WARNING oslo.privsep.daemon [-] privsep 
log:   File "/usr/lib/python2.7/dist-packages/oslo_log/log.py", line 193, in 
logging_excepthook
  2018-10-25 09:10:39,364.364 24060 WARNING oslo.privsep.daemon [-] privsep 
log: getLogger(product_name).critical('Unhandled error', **extra)
  2018-10-25 09:10:39,365.365 24060 WARNING oslo.privsep.daemon [-] privsep 
log:   File "/usr/lib/python2.7/logging/__init__.py", line 1481, in critical
  2018-10-25 09:10:39,365.365 24060 WARNING oslo.privsep.daemon [-] privsep 
log: self.logger.critical(ms

[Yahoo-eng-team] [Bug 1799178] [NEW] l2 pop doesn't always provide the whole list of fdb entries on agent restart

2018-10-22 Thread Oleg Bondarev
Public bug reported:

The whole list of fdb entries is provided to the agent when a port from a new 
network appears, or when the agent is restarted.
Currently agent restart is detected by the agent_boot_time option, 180 sec by 
default. 
In fact boot time depends on port count and may easily exceed 180 secs on 
gateway nodes of loaded clusters. Raising the boot time in config works, but 
this is not an ideal solution. 
There should be a smarter way to detect agent restart (such as the agent itself 
sending a flag in its state report).
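The difference between the two detection approaches can be sketched as follows (function names are hypothetical; only the 180-second agent_boot_time default comes from the report):

```python
import time

AGENT_BOOT_TIME = 180  # config default mentioned above

def restarted_by_boot_time(started_at, now=None):
    """Old heuristic: 'recently started' is treated as 'restarted'."""
    now = time.time() if now is None else now
    return (now - started_at) < AGENT_BOOT_TIME

def restarted_by_start_flag(agent_state):
    """Proposed: the agent says so explicitly in its state report."""
    return bool(agent_state.get('start_flag'))

# A gateway node that took 200s to finish syncing is wrongly treated
# as "not restarted" by the time heuristic; the flag is unambiguous.
started = time.time() - 200
print(restarted_by_boot_time(started))                # False
print(restarted_by_start_flag({'start_flag': True}))  # True
```

The flag-based check needs no tuning per cluster, which is the point of the proposal.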

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1799178

Title:
  l2 pop doesn't always provide the whole list of fdb entries on agent
  restart

Status in neutron:
  New

Bug description:
  The whole list of fdb entries is provided to the agent when a port from a new 
network appears, or when the agent is restarted.
  Currently agent restart is detected by the agent_boot_time option, 180 sec by 
default. 
  In fact boot time depends on port count and may easily exceed 180 secs on 
gateway nodes of loaded clusters. Raising the boot time in config works, but 
this is not an ideal solution. 
  There should be a smarter way to detect agent restart (such as the agent 
itself sending a flag in its state report).

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1799178/+subscriptions



[Yahoo-eng-team] [Bug 1789846] [NEW] l2_pop flows missing when spawning VMs at a high rate

2018-08-30 Thread Oleg Bondarev
Public bug reported:


version: Pike, DVR enabled, 28 compute nodes
scenario: spawn 140 VMs concurrently, with pre-created neutron ports
issue: on some compute nodes VMs cannot get an IP address; no reply to dhcp 
broadcasts. All new VMs spawned on the same compute node in the same network 
have this problem.

it appears that the flood table on the affected compute nodes is missing
outputs to the dhcp nodes:

 cookie=0x679aebcfbb8dc9a2, duration=296.991s, table=22, n_packets=2,
n_bytes=220, priority=1,dl_vlan=4
actions=strip_vlan,load:0x14->NXM_NX_TUN_ID[],output:"vxlan-
ac1ef47a",output:"vxlan-ac1ef480",output:"vxlan-ac1ef46d",output:"vxlan-
ac1ef477",...

** Affects: neutron
 Importance: High
 Assignee: Oleg Bondarev (obondarev)
 Status: Confirmed


** Tags: l2-pop l3-dvr-backlog

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1789846

Title:
  l2_pop flows missing when spawning VMs at a high rate

Status in neutron:
  Confirmed

Bug description:
  
  version: Pike, DVR enabled, 28 compute nodes
  scenario: spawn 140 VMs concurrently, with pre-created neutron ports
  issue: on some compute nodes VMs cannot get an IP address; no reply to dhcp 
broadcasts. All new VMs spawned on the same compute node in the same network 
have this problem.

  it appears that the flood table on the affected compute nodes is missing
  outputs to the dhcp nodes:

   cookie=0x679aebcfbb8dc9a2, duration=296.991s, table=22, n_packets=2,
  n_bytes=220, priority=1,dl_vlan=4
  actions=strip_vlan,load:0x14->NXM_NX_TUN_ID[],output:"vxlan-
  ac1ef47a",output:"vxlan-ac1ef480",output:"vxlan-ac1ef46d",output
  :"vxlan-ac1ef477",...

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1789846/+subscriptions



[Yahoo-eng-team] [Bug 1783728] [NEW] DVR: router interface creation failure (load testing)

2018-07-26 Thread Oleg Bondarev
Public bug reported:

Release: Pike.

Under load (10 parallel threads) several requests for router interface creation 
failed.
No error logs seen, just several DB retries for StaleDataError in 
standardattributes table.
Server response:

INFO neutron.api.v2.resource [req-...] add_router_interface failed (client 
error): The server could not comply with the request since it is either 
malformed or otherwise incorrect.
INFO neutron.wsgi [req-...] 125.22.218.43,172.30.121.16 "PUT 
/v2.0/routers/d63811e1-9e54-467b-a4d3-8c407e38ccf2/add_router_interface 
HTTP/1.1" status: 400  len: 365 time: 4.040

Analysis to follow.

** Affects: neutron
 Importance: High
 Assignee: Oleg Bondarev (obondarev)
 Status: Confirmed


** Tags: l3-dvr-backlog

** Description changed:

  Under load (10 parallel threads) several requests for router interface 
creation failed.
  No error logs seen, just several DB retries for StaleDataError in 
standardattributes table.
+ Server response: 
+ 
+ INFO neutron.api.v2.resource [req-...] add_router_interface failed (client 
error): The server could not comply with the request since it is either 
malformed or otherwise incorrect.
+ INFO neutron.wsgi [req-...] 125.22.218.43,172.30.121.16 "PUT 
/v2.0/routers/d63811e1-9e54-467b-a4d3-8c407e38ccf2/add_router_interface 
HTTP/1.1" status: 400  len: 365 time: 4.040
  
  Analysis to follow.

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1783728

Title:
  DVR: router interface creation failure (load testing)

Status in neutron:
  Confirmed

Bug description:
  Release: Pike.

  Under load (10 parallel threads) several requests for router interface 
creation failed.
  No error logs seen, just several DB retries for StaleDataError in 
standardattributes table.
  Server response:

  INFO neutron.api.v2.resource [req-...] add_router_interface failed (client 
error): The server could not comply with the request since it is either 
malformed or otherwise incorrect.
  INFO neutron.wsgi [req-...] 125.22.218.43,172.30.121.16 "PUT 
/v2.0/routers/d63811e1-9e54-467b-a4d3-8c407e38ccf2/add_router_interface 
HTTP/1.1" status: 400  len: 365 time: 4.040

  Analysis to follow.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1783728/+subscriptions



[Yahoo-eng-team] [Bug 1447227] Re: Connecting two or more distributed routers to a subnet doesn't work properly

2017-07-19 Thread Oleg Bondarev
Works for me on Mitaka and on master, following the steps from John's comment
#6. Just added a host-route on a subnet connected to 2 DVR routers instead
of manually adding a static route on the VM. Marking as invalid.

** Changed in: neutron
   Status: Incomplete => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1447227

Title:
  Connecting two or more distributed routers to a subnet doesn't work
  properly

Status in neutron:
  Invalid

Bug description:
  DVR code currently assumes that only one router may be attached to a
  subnet but this is not the case. OVS flows for example will not work
  correctly for E/W traffic as incoming traffic is always assumed to be
  coming from one of the two routers.

  The simple solution is to block the attachment of a distributed router
  to a subnet already attached to another distributed router.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1447227/+subscriptions



[Yahoo-eng-team] [Bug 1447227] Re: Connecting two or more distributed routers to a subnet doesn't work properly

2017-07-14 Thread Oleg Bondarev
Shouldn't this case be handled by specifying the proper host routes for
such a subnet (connected to several routers)?

** Changed in: neutron
   Status: Confirmed => Opinion

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1447227

Title:
  Connecting two or more distributed routers to a subnet doesn't work
  properly

Status in neutron:
  Opinion

Bug description:
  DVR code currently assumes that only one router may be attached to a
  subnet but this is not the case. OVS flows for example will not work
  correctly for E/W traffic as incoming traffic is always assumed to be
  coming from one of the two routers.

  The simple solution is to block the attachment of a distributed router
  to a subnet already attached to another distributed router.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1447227/+subscriptions



[Yahoo-eng-team] [Bug 1702635] [NEW] SR-IOV: sometimes a port may hang in BUILD state

2017-07-06 Thread Oleg Bondarev
Public bug reported:

Scenario:

1) vfio-pci driver is used for VFs
2) 2 ports are created in neutron with binding type 'direct'
3) VMs are spawned and deleted on 2 compute nodes using pre-created ports
4) one neutron port may be bound to different compute nodes at different
   moments
5) for some reason (probably a bug, but current bug is not about it)
   vfio-pci is not properly handling VF reset after VM deletion and for
   sriov agent it looks like some port's MAC is still mapped to some PCI
   slot though the port is not bound to the node
6) sriov agent requests port info from server with
   get_devices_details_list() but doesn't specify 'host' in parameters
7) in this case neutron server sets this port to BUILD, though it may be
   bound to another host:

    def _get_new_status(self, host, port_context):
        port = port_context.current
        if not host or host == port_context.host:
            new_status = (n_const.PORT_STATUS_BUILD if port['admin_state_up']
                          else n_const.PORT_STATUS_DOWN)
            if port['status'] != new_status:
                return new_status

8) after processing, the agent notifies server with update_device_list() and 
this time specifies 'host' parameter
9) server detects port's and agent's host mismatch and doesn't update status of 
the port
10) port stays in BUILD state

A simple fix would be to specify host at step 6 - in this case neutron
server won't set port's status to BUILD because of host mismatch.
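The effect of passing the host can be illustrated with a simplified, self-contained stand-in for the `_get_new_status` logic quoted above (the string constants, the `port_host` parameter, and the host names are hypothetical simplifications, not neutron code):

```python
# Simplified, self-contained stand-in for the server-side check quoted
# above; constants and the port_host parameter are hypothetical.
PORT_STATUS_BUILD = 'BUILD'
PORT_STATUS_DOWN = 'DOWN'

def get_new_status(host, port_host, port):
    # Mirrors _get_new_status: only change status when no host was
    # given in the request, or the request's host matches the binding.
    if not host or host == port_host:
        new_status = (PORT_STATUS_BUILD if port['admin_state_up']
                      else PORT_STATUS_DOWN)
        if port['status'] != new_status:
            return new_status
    return None  # no status change

port = {'admin_state_up': True, 'status': 'ACTIVE'}

# Step 6 today: no host in the request, so the port wrongly goes to BUILD
print(get_new_status(None, 'compute-2', port))         # 'BUILD'
# Proposed fix: the agent passes its own host; mismatch means no change
print(get_new_status('compute-1', 'compute-2', port))  # None
```

Once the agent identifies itself, the host-mismatch branch that already protects update_device_list at step 9 also protects the initial details request.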

** Affects: neutron
 Importance: Medium
 Assignee: Oleg Bondarev (obondarev)
 Status: Confirmed


** Tags: sriov-pci-pt

** Description changed:

  Scenario:
  
  1) vfio-pci driver is used for VFs
  2) 2 ports are created in neutron with binding type 'direct'
  3) VMs are spawned and deleted on 2 compute nodes using pre-created ports
- 4) one neutron port may be bound to different compute nodes at different 
moments
- 5) for some reason (probably a bug, but current bug is not about it) vfio-pci 
is not properly 
-handling VF reset after VM deletion and for sriov agent it looks like some 
port's MAC is 
-still mapped to some PCI slot though the port is not bound to the node
- 6) sriov agent requests port info from server with get_devices_details_list() 
but doesn't specify 'host' in parameters
- 7) in this case neutron server sets this port to BUILD, though it may be 
bound to another host:
+ 4) one neutron port may be bound to different compute nodes at different 
+moments
+ 5) for some reason (probably a bug, but current bug is not about it) 
+vfio-pci is not properly handling VF reset after VM deletion and for 
+sriov agent it looks like some port's MAC is still mapped to some PCI 
+slot though the port is not bound to the node
+ 6) sriov agent requests port info from server with 
+get_devices_details_list() but doesn't specify 'host' in parameters
+ 7) in this case neutron server sets this port to BUILD, though it may be 
+bound to another host:
  
- def _get_new_status(self, host, port_context):
- port = port_context.current
+ def _get_new_status(self, host, port_context):
+ port = port_context.current
  >>  if not host or host == port_context.host:
- new_status = (n_const.PORT_STATUS_BUILD if port['admin_state_up']
-   else n_const.PORT_STATUS_DOWN)
- if port['status'] != new_status:
- return new_status
+ new_status = (n_const.PORT_STATUS_BUILD if port['admin_state_up']
+   else n_const.PORT_STATUS_DOWN)
+ if port['status'] != new_status:
+ return new_status
  
  8) after processing, the agent notifies server with update_device_list() and 
this time specifies 'host' parameter
  9) server detects port's and agent's host mismatch and doesn't update status 
of the port
  10) port stays in BUILD state
  
  A simple fix would be to specify host at step 6 - in this case neutron
  server won't set port's status to BUILD because of host mismatch.

** Description changed:

  Scenario:
  
  1) vfio-pci driver is used for VFs
  2) 2 ports are created in neutron with binding type 'direct'
  3) VMs are spawned and deleted on 2 compute nodes using pre-created ports
- 4) one neutron port may be bound to different compute nodes at different 
-moments
- 5) for some reason (probably a bug, but current bug is not about it) 
-vfio-pci is not properly handling VF reset after VM deletion and for 
-sriov agent it looks like some port's MAC is still mapped to some PCI 
-slot though the port is not bound to the node
- 6) sriov agent requests port info from server with 
-get_devices_details_list() but doesn't specify 'host' in parameters
- 7) in this case neutron server sets this port to BUILD, though it may be 
-bound to another host:
+ 4) one neutron port may be bound to different compute nodes at different
+    momen

[Yahoo-eng-team] [Bug 1678104] [NEW] DHCP may not work on a dualstack network

2017-03-31 Thread Oleg Bondarev
Public bug reported:

There might be a race between ipv6 auto-address subnet creation and network
dhcp port creation.

Neutron server adds an ipv6 address to a dhcp port in two cases:

 1) network already has ipv6 subnet by the time dhcp agent requests dhcp port 
creation - in this case agent includes both subnets into requested IPs of the 
port and both get allocated;
 2) ipv6 subnet is created after the network already has dhcp port existing - 
ipv6 IP then gets allocated on the dhcp port as part of subnet creation on the 
server side;

The bug is with the third case:
 3) ipv6 subnet and dhcp port are created at the same time: no ipv6 IP is 
requested for the dhcp port by the dhcp agent, and no ipv6 address is added to 
the dhcp port as part of subnet creation;

In this case dhcp agent tries to reprocess network after subnet/port
creation and updates IPs on the dhcp port:

 2017-03-30 05:12:38.990 29848 DEBUG neutron.api.rpc.handlers.dhcp_rpc
[req-bcd62396-0e9a-4f39-8bf7-e56f0588805c - - - - -] Update dhcp port
{u'port': {u'network_id': u'29d6752b-027a-4eb9-aa73-711eff1b58ca',
'binding:host_id': u'node-2.test.domain.local', u'fixed_ips':
[{u'subnet_id': u'3f81f975-5718-4bdc-878c-614f22b1b783', u'ip_address':
u'192.168.100.2'}, {u'subnet_id': u'8363ac60-c30d-43dc-
a1d1-3d39820602fd'}]}, 'id': u'2e3a5343-a995-498a-85d2-db686d119fab'}

Server ignores ipv6 auto-address subnets in this request. So agent says:

 2017-03-24 01:51:41.661 28865 DEBUG neutron.agent.linux.dhcp 
[req-16627de0-3069-4aa8-bfe1-3db559811c53 - - - - -] Requested DHCP port with 
IPs on subnets set([u'f22021f4-c876-4d94-be93-d24ccb6b0e31', 
u'869f4abf-c440-4b65-a9be-074776fadaf1']) but only got IPs on subnets 
set([u'869f4abf-c440-4b65-a9be-074776fadaf1'])
...
 2017-03-24 01:51:41.775 28865 DEBUG neutron.agent.dhcp.agent 
[req-16627de0-3069-4aa8-bfe1-3db559811c53 - - - - -] Error configuring DHCP 
port, scheduling resync: Subnet on port 798cefef-c4c1-482e-bbc0-acea52e6490d 
does not match the requested subnet f22021f4-c876-4d94-be93-d24ccb6b0e31. 
call_driver /usr/lib/python2.7/dist-packages/neutron/agent/dhcp/agent.py:124

DHCP agent keeps trying, dhcp doesn't work.
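The mismatch can be sketched with a rough, hypothetical stand-in for the server-side filtering described above (`filter_fixed_ips`, the subnet table, and the subnet names are illustrative, not neutron code):

```python
# Hypothetical sketch of the behaviour described above: fixed_ips
# entries that name an ipv6 auto-address subnet without an explicit
# address are silently dropped by the server, so the agent's requested
# subnet set never matches what it gets back.
AUTO_ADDRESS_MODES = ('slaac', 'dhcpv6-stateless')

subnets = {
    'v4-subnet': {'ipv6_address_mode': None},
    'v6-subnet': {'ipv6_address_mode': 'slaac'},
}

def filter_fixed_ips(requested):
    kept = []
    for fip in requested:
        mode = subnets[fip['subnet_id']]['ipv6_address_mode']
        if 'ip_address' not in fip and mode in AUTO_ADDRESS_MODES:
            continue  # server ignores auto-address subnets here
        kept.append(fip)
    return kept

req = [{'subnet_id': 'v4-subnet', 'ip_address': '192.168.100.2'},
       {'subnet_id': 'v6-subnet'}]
got = filter_fixed_ips(req)
missing = {f['subnet_id'] for f in req} - {f['subnet_id'] for f in got}
print(missing)  # {'v6-subnet'}: the mismatch that triggers endless resyncs
```

Every resync reproduces the same dropped subnet, which is why the agent loops instead of converging.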

** Affects: neutron
 Importance: High
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: l3-ipam-dhcp

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1678104

Title:
  DHCP may not work on a dualstack network

Status in neutron:
  New

Bug description:
  There might be a race between ipv6 auto-address subnet creation and
  network dhcp port creation.

  Neutron server adds an ipv6 address to a dhcp port in two cases:

   1) network already has ipv6 subnet by the time dhcp agent requests dhcp port 
creation - in this case agent includes both subnets into requested IPs of the 
port and both get allocated;
   2) ipv6 subnet is created after the network already has dhcp port existing - 
ipv6 IP then gets allocated on the dhcp port as part of subnet creation on the 
server side;

  The bug is with the third case:
   3) ipv6 subnet and dhcp port are created at the same time: no ipv6 IP is 
requested for the dhcp port by the dhcp agent, and no ipv6 address is added to 
the dhcp port as part of subnet creation;

  In this case dhcp agent tries to reprocess network after subnet/port
  creation and updates IPs on the dhcp port:

   2017-03-30 05:12:38.990 29848 DEBUG neutron.api.rpc.handlers.dhcp_rpc
  [req-bcd62396-0e9a-4f39-8bf7-e56f0588805c - - - - -] Update dhcp port
  {u'port': {u'network_id': u'29d6752b-027a-4eb9-aa73-711eff1b58ca',
  'binding:host_id': u'node-2.test.domain.local', u'fixed_ips':
  [{u'subnet_id': u'3f81f975-5718-4bdc-878c-614f22b1b783',
  u'ip_address': u'192.168.100.2'}, {u'subnet_id': u'8363ac60-c30d-43dc-
  a1d1-3d39820602fd'}]}, 'id': u'2e3a5343-a995-498a-85d2-db686d119fab'}

  Server ignores ipv6 auto-address subnets in this request. So agent
  says:

   2017-03-24 01:51:41.661 28865 DEBUG neutron.agent.linux.dhcp 
[req-16627de0-3069-4aa8-bfe1-3db559811c53 - - - - -] Requested DHCP port with 
IPs on subnets set([u'f22021f4-c876-4d94-be93-d24ccb6b0e31', 
u'869f4abf-c440-4b65-a9be-074776fadaf1']) but only got IPs on subnets 
set([u'869f4abf-c440-4b65-a9be-074776fadaf1'])
  ...
   2017-03-24 01:51:41.775 28865 DEBUG neutron.agent.dhcp.agent 
[req-16627de0-3069-4aa8-bfe1-3db559811c53 - - - - -] Error configuring DHCP 
port, scheduling resync: Subnet on port 798cefef-c4c1-482e-bbc0-acea52e6490d 
does not match the requested subnet f22021f4-c876-4d94-be93-d24ccb6b0e31. 
call_driver /usr/lib/python2.7/dist-packages/neutron/agent/dhcp/agent.py:124

  DHCP agent keeps trying, dhcp doesn't work.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1678104/+subscriptions


[Yahoo-eng-team] [Bug 1666549] Re: Infinite router update in neutron L3 agent (HA)

2017-02-21 Thread Oleg Bondarev
** Project changed: neutron => mos

** Tags added: area-neutron

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1666549

Title:
  Infinite router update in neutron L3 agent (HA)

Status in Mirantis OpenStack:
  New

Bug description:
  After fresh deployment of the environment and running ostf tests (or rally), 
the neutron l3 agent logs on the nodes fill up (a new entry every .003 
seconds) with traces like:
  http://paste.openstack.org/show/599851/
  which will bring the cluster down once the log partition fills up.

  
  Environment: Fuel 9.0 upgraded to 9.2, fresh install
  3 controllers/kafka + 3 computes + 4 storage ceph-osd + 1 LMA nodes

  neutron agents 8.3.0:
  neutron-dhcp-agent   2:8.3.0-1~u14.04+mos30   
  all  OpenStack virtual network service - DHCP agent
  neutron-l3-agent 2:8.3.0-1~u14.04+mos30   
  all  OpenStack virtual network service - l3 agent
  neutron-lbaasv2-agent2:8.3.0-2~u14.04+mos1
  all  Neutron is a virtual network service for Openstack - LBaaSv2 
agent
  neutron-metadata-agent   2:8.3.0-1~u14.04+mos30   
  all  OpenStack virtual network service - metadata agent
  neutron-openvswitch-agent2:8.3.0-1~u14.04+mos30   
  all  OpenStack virtual network service - Open vSwitch agent

  Steps to reproduce:
  1. Deploy openstack with Fuel 9.2
  2. Create a rally venv and run the scenario
  NeutronNetworks.create_and_delete_routers (concurrency 100 and times 100, 
or more)
  3. /var/log/neutron/l3-agent.log is full of these traces.
  3. /var/log/neutron/l3-agent.log full of these traces.

To manage notifications about this bug go to:
https://bugs.launchpad.net/mos/+bug/1666549/+subscriptions



[Yahoo-eng-team] [Bug 1660305] [NEW] DVR multinode job fails over 20 tests

2017-01-30 Thread Oleg Bondarev
Public bug reported:

Example: http://logs.openstack.org/39/426339/2/check/gate-tempest-dsvm-
neutron-dvr-multinode-full-ubuntu-xenial-
nv/b31bdd2/logs/testr_results.html.gz

Mostly  connectivity failures: cannot ssh to instance, floating IP not
ACTIVE, etc.

** Affects: neutron
 Importance: High
 Assignee: Oleg Bondarev (obondarev)
 Status: Confirmed


** Tags: gate-failure l3-dvr-backlog

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1660305

Title:
  DVR multinode job fails over 20 tests

Status in neutron:
  Confirmed

Bug description:
  Example: http://logs.openstack.org/39/426339/2/check/gate-tempest-
  dsvm-neutron-dvr-multinode-full-ubuntu-xenial-
  nv/b31bdd2/logs/testr_results.html.gz

  Mostly  connectivity failures: cannot ssh to instance, floating IP not
  ACTIVE, etc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1660305/+subscriptions



[Yahoo-eng-team] [Bug 1657476] [NEW] Metadata agent fails to serve requests in python 3

2017-01-18 Thread Oleg Bondarev
Public bug reported:

from http://logs.openstack.org/09/421209/7/experimental/gate-tempest-
dsvm-nova-py35-ubuntu-xenial/2dda79b/logs/screen-q-meta.txt.gz:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/eventlet/greenpool.py", line 82, 
in _spawn_n_impl
func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/eventlet/wsgi.py", line 719, in 
process_request
proto.__init__(sock, address, self)
  File "/opt/stack/new/neutron/neutron/agent/linux/utils.py", line 409, in 
__init__
server)
  File "/usr/lib/python3.5/socketserver.py", line 681, in __init__
self.handle()
  File "/usr/lib/python3.5/http/server.py", line 422, in handle
self.handle_one_request()
  File "/usr/local/lib/python3.5/dist-packages/eventlet/wsgi.py", line 379, in 
handle_one_request
self.environ = self.get_environ()
  File "/usr/local/lib/python3.5/dist-packages/eventlet/wsgi.py", line 593, in 
get_environ
env['REMOTE_ADDR'] = self.client_address[0]
IndexError: index out of range

** Affects: neutron
 Importance: High
 Assignee: Oleg Bondarev (obondarev)
 Status: Confirmed


** Tags: py34

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1657476

Title:
  Metadata agent fails to serve requests in python 3

Status in neutron:
  Confirmed

Bug description:
  from http://logs.openstack.org/09/421209/7/experimental/gate-tempest-dsvm-nova-py35-ubuntu-xenial/2dda79b/logs/screen-q-meta.txt.gz:

  Traceback (most recent call last):
    File "/usr/local/lib/python3.5/dist-packages/eventlet/greenpool.py", line 82, in _spawn_n_impl
      func(*args, **kwargs)
    File "/usr/local/lib/python3.5/dist-packages/eventlet/wsgi.py", line 719, in process_request
      proto.__init__(sock, address, self)
    File "/opt/stack/new/neutron/neutron/agent/linux/utils.py", line 409, in __init__
      server)
    File "/usr/lib/python3.5/socketserver.py", line 681, in __init__
      self.handle()
    File "/usr/lib/python3.5/http/server.py", line 422, in handle
      self.handle_one_request()
    File "/usr/local/lib/python3.5/dist-packages/eventlet/wsgi.py", line 379, in handle_one_request
      self.environ = self.get_environ()
    File "/usr/local/lib/python3.5/dist-packages/eventlet/wsgi.py", line 593, in get_environ
      env['REMOTE_ADDR'] = self.client_address[0]
  IndexError: index out of range

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1657476/+subscriptions



[Yahoo-eng-team] [Bug 1629816] [NEW] Misleading "DVR: Duplicate DVR router interface detected for subnet"

2016-10-03 Thread Oleg Bondarev
Public bug reported:

The error message is seen on each OVS agent resync on a compute node where
there are DVR-serviced ports. A resync can be triggered by any error -
this is unrelated to this bug.

The error message appears when processing a distributed router port for a
subnet which is already in the local_dvr_map of the agent, see
_bind_distributed_router_interface_port in ovs_dvr_neutron_agent.py:

    if subnet_uuid in self.local_dvr_map:
        ldm = self.local_dvr_map[subnet_uuid]
        csnat_ofport = ldm.get_csnat_ofport()
        if csnat_ofport == constants.OFPORT_INVALID:
            LOG.error(_LE("DVR: Duplicate DVR router interface detected "
                          "for subnet %s"), subnet_uuid)
            return

where csnat_ofport is OFPORT_INVALID by default and can only change when
the agent processes the csnat port of the router - this will never happen on
a compute node, so we'll keep seeing the misleading log.

The proposal would be to delete the condition and the log as they're
useless.
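The state machine described above can be sketched in a few lines (the class and constant names below are simplified stand-ins for the agent's internals, not the actual neutron code): on a compute node the csnat ofport never leaves its sentinel value, so the "duplicate interface" branch fires on every resync.

```python
# Sketch of why the log always fires on compute nodes: the csnat ofport only
# changes when a csnat port is processed, which never happens there.
OFPORT_INVALID = -1  # stand-in for constants.OFPORT_INVALID

class LocalDVRMapping:
    def __init__(self):
        # Default value; only updated when the agent binds a csnat port.
        self.csnat_ofport = OFPORT_INVALID

    def get_csnat_ofport(self):
        return self.csnat_ofport

ldm = LocalDVRMapping()
# On a compute node no csnat port is ever bound, so the sentinel persists
# and the "Duplicate DVR router interface" branch is taken on every resync.
assert ldm.get_csnat_ofport() == OFPORT_INVALID
```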

** Affects: neutron
 Importance: Medium
 Assignee: Oleg Bondarev (obondarev)
 Status: Confirmed


** Tags: l3-dvr-backlog

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1629816

Title:
  Misleading "DVR: Duplicate DVR router interface detected for subnet"

Status in neutron:
  Confirmed

Bug description:
  The error message is seen on each OVS agent resync on a compute node
  where there are DVR-serviced ports. A resync can be triggered by any
  error - this is unrelated to this bug.

  The error message appears when processing a distributed router port for a
  subnet which is already in the local_dvr_map of the agent, see
  _bind_distributed_router_interface_port in ovs_dvr_neutron_agent.py:

    if subnet_uuid in self.local_dvr_map:
        ldm = self.local_dvr_map[subnet_uuid]
        csnat_ofport = ldm.get_csnat_ofport()
        if csnat_ofport == constants.OFPORT_INVALID:
            LOG.error(_LE("DVR: Duplicate DVR router interface detected "
                          "for subnet %s"), subnet_uuid)
            return

  where csnat_ofport is OFPORT_INVALID by default and can only change
  when the agent processes the csnat port of the router - this will never
  happen on a compute node, so we'll keep seeing the misleading log.

  The proposal would be to delete the condition and the log as they're
  useless.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1629816/+subscriptions



[Yahoo-eng-team] [Bug 1628017] Re: unable to access vm by floating ip from vm without floating

2016-09-28 Thread Oleg Bondarev
Ok, so "Connection refused" was a result of a stale IP address on the rfp
device that was not deleted after the l3 agent restart with new code. If the
instances/floating IP are recreated from scratch, everything works fine. I'm
going to backport the fix to stable/mitaka. Marking this as Invalid since
the problem should be fixed on master.

** Changed in: neutron
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1628017

Title:
  unable to access vm by floating ip from vm without floating

Status in neutron:
  Invalid

Bug description:
  Steps to reproduce:

  1. create 2 machines in one internal network. Make sure that the VMs are created on one compute node
  2. assign a floating ip to one vm
  3. try to connect from the second vm (without a floating ip) to the vm with the floating ip,
  like nc -v floating_ip 22

  Expected result
  ===
  nc -v 192.168.120.116 22
  Connection to 192.168.120.116 22 port [tcp/ssh] succeeded!
  SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7

  Actual result
  =
  nc -v 192.168.120.116 22
  nc: connect to 192.168.120.116 port 22 (tcp) failed: Connection timed out


  BTW, if I ping floating ip - I get internal icmp response
  ping 192.168.120.116
  PING 192.168.120.116 (192.168.120.116) 56(84) bytes of data.
  64 bytes from 192.168.111.20: icmp_seq=1 ttl=64 time=0.513 ms
  64 bytes from 192.168.111.20: icmp_seq=2 ttl=64 time=0.538 ms

  
  This bug can be reproduced only if the VMs are created on the same compute node; if I migrate one VM to another node, I am able to access the floating ip

  Environment
  ===
  1. Mirantis Openstack 9.0.1 with DVR enabled

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1628017/+subscriptions



[Yahoo-eng-team] [Bug 1625557] [NEW] Concurrent security groups creation fails with DBDuplicateEntry

2016-09-20 Thread Oleg Bondarev
Public bug reported:

 - create_security_group() is wrapped with a db retry decorator
 - it calls _ensure_default_security_group() to create a default security group for a tenant if one does not exist
 - _ensure_default_security_group() in turn calls back to create_security_group() to create the default security group
 - due to concurrency the creation of the default security group may fail with DBDuplicateEntry
 - this is retried for the maximum number of attempts and the request eventually fails

Traceback: http://paste.openstack.org/show/581903/
Example of failed job in rally: 
http://logs.openstack.org/04/371604/1/check/gate-rally-dsvm-neutron-rally/b1c384d/

** Affects: neutron
 Importance: High
 Assignee: Oleg Bondarev (obondarev)
 Status: In Progress


** Tags: sg-fw

** Changed in: neutron
   Status: New => In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1625557

Title:
  Concurrent security groups creation fails with DBDuplicateEntry

Status in neutron:
  In Progress

Bug description:
   - create_security_group() is wrapped with a db retry decorator
   - it calls _ensure_default_security_group() to create a default security group for a tenant if one does not exist
   - _ensure_default_security_group() in turn calls back to create_security_group() to create the default security group
   - due to concurrency the creation of the default security group may fail with DBDuplicateEntry
   - this is retried for the maximum number of attempts and the request eventually fails

  Traceback: http://paste.openstack.org/show/581903/
  Example of failed job in rally: 
http://logs.openstack.org/04/371604/1/check/gate-rally-dsvm-neutron-rally/b1c384d/

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1625557/+subscriptions



[Yahoo-eng-team] [Bug 1625209] Re: ipv6 options are being checked for ipv4 subnet

2016-09-20 Thread Oleg Bondarev
Right, the neutron DB layer has these fields set for both IPv4 and IPv6
subnets, and it also adds them when making the subnet dict. So other places
in the code expect these fields, and the reported case might not be the only
one. I'd suggest fixing this on the Calico plugin side.
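A hedged sketch of the kind of guard being suggested (the function name and the slaac handling are illustrative assumptions, not the actual neutron or Calico code): consult the IPv6-only attributes only when the subnet really is IPv6.

```python
# Illustrative guard: ipv6_address_mode / ipv6_ra_mode exist only on IPv6
# subnets, so check the ip_version before touching them.
def needs_dhcp_fixed_ip(subnet):
    if subnet["ip_version"] == 6:
        # Assumption for the sketch: pure-SLAAC subnets need no DHCP address.
        return subnet.get("ipv6_address_mode") not in (None, "slaac")
    # IPv4 subnets have no ipv6_* attributes to inspect at all.
    return True

assert needs_dhcp_fixed_ip({"ip_version": 4})
assert not needs_dhcp_fixed_ip({"ip_version": 6, "ipv6_address_mode": "slaac"})
assert needs_dhcp_fixed_ip({"ip_version": 6, "ipv6_address_mode": "dhcpv6-stateful"})
```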

** Also affects: networking-calico
   Importance: Undecided
   Status: New

** Changed in: neutron
   Status: Confirmed => Invalid

** Changed in: neutron
 Assignee: Oleg Bondarev (obondarev) => (unassigned)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1625209

Title:
  ipv6 options are being checked for ipv4 subnet

Status in networking-calico:
  New
Status in neutron:
  Invalid

Bug description:
  When the DHCP agent tries to set the fixed_ips parameter for a DHCP port (see
https://bugs.launchpad.net/networking-calico/+bug/1541490), Neutron checks the
ipv6_address_mode and ipv6_ra_mode options of the subnet that corresponds to the
given fixed IP, even for an IPv4 subnet. This fails as an IPv4 subnet does not
have such options (see traceback http://paste.openstack.org/show/580996/). And,
of course, you cannot set such flags for an IPv4 subnet.
  I'd expect such a check to be performed for IPv6 subnets only.

  Probably, this situation is possible not only while using non-native
  DHCP agent.

  Neutron version: Newton (7f6b5b5d8953159740f74b0a4a5280527f6baa69).
  Environment: Calico (https://github.com/openstack/networking-calico) over 
Neutron.
  Point of failure: 
https://github.com/openstack/neutron/blob/7f6b5b5d8953159740f74b0a4a5280527f6baa69/neutron/agent/linux/dhcp.py#L1342
  Traceback: http://paste.openstack.org/show/580996/
  Failure rate: always.

To manage notifications about this bug go to:
https://bugs.launchpad.net/networking-calico/+bug/1625209/+subscriptions



[Yahoo-eng-team] [Bug 1614452] [NEW] Port create time grows at scale due to dvr arp update

2016-08-18 Thread Oleg Bondarev
Public bug reported:

Scale tests show that sometimes VMs are not able to spawn because of timeouts 
on port creation.
Neutron server logs show that port creation time grows due to dvr arp table 
updates being sent to each l3 dvr agent hosting the router one by one - this 
takes > 90% of time: http://paste.openstack.org/show/560761/
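A toy Python sketch of why the serial per-agent updates dominate port-create time (the helper names and delays are illustrative, not the actual neutron RPC code): one round-trip per hosting l3 agent scales linearly with the agent count, while a single fanout cast does not.

```python
# Compare a per-agent serial notification loop with a single fanout cast.
import time

def notify_serial(agents, delay=0.001):
    for _ in agents:            # one RPC round-trip per hosting agent
        time.sleep(delay)

def notify_fanout(agents, delay=0.001):
    time.sleep(delay)           # one fanout cast, regardless of agent count

agents = range(100)

t0 = time.perf_counter(); notify_serial(agents); serial = time.perf_counter() - t0
t0 = time.perf_counter(); notify_fanout(agents); fanout = time.perf_counter() - t0

# Cost no longer grows with the number of hosting agents.
assert fanout < serial
```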

** Affects: neutron
 Importance: High
 Assignee: Oleg Bondarev (obondarev)
 Status: Confirmed


** Tags: l3-dvr-backlog loadimpact

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1614452

Title:
  Port create time grows at scale due to dvr arp update

Status in neutron:
  Confirmed

Bug description:
  Scale tests show that sometimes VMs are not able to spawn because of timeouts 
on port creation.
  Neutron server logs show that port creation time grows due to dvr arp table 
updates being sent to each l3 dvr agent hosting the router one by one - this 
takes > 90% of time: http://paste.openstack.org/show/560761/

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1614452/+subscriptions



[Yahoo-eng-team] [Bug 1610303] [NEW] l2pop mech fails to update_port_postcommit on a loaded cluster

2016-08-05 Thread Oleg Bondarev
Public bug reported:

On a cluster where VM boots and deletes happen pretty intensively, the
following traces can pop up in the neutron server log:

2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers [req-1b5e9a29-7f7e-48f8-84ee-19ce217cb556 - - - - -] Mechanism driver 'l2population' failed in update_port_postcommit
2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers Traceback (most recent call last):
2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/managers.py", line 401, in _call_on_drivers
2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers     getattr(driver.obj, method_name)(context)
2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 120, in update_port_postcommit
2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers     self._update_port_up(context)
2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 227, in _update_port_up
2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers     network_id)
2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 176, in _create_agent_fdb
2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers     fdbs.extend(self._get_port_fdb_entries(binding.port))
2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 45, in _get_port_fdb_entries
2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers     for ip in port['fixed_ips']]
2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers TypeError: 'NoneType' object has no attribute '__getitem__'
2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers
2016-08-05 14:08:29.578 9560 ERROR neutron.plugins.ml2.rpc [req-1b5e9a29-7f7e-48f8-84ee-19ce217cb556 - - - - -] Failed to update device 4c499a14-7211-4714-afa2-95b280d595a2 up

This leads to the device not being set to Active state and hence Nova
times out waiting for the interface to be ready.
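A hedged sketch of the race behind the TypeError (function and field names mirror the traceback, but the code is a simplified model, not the actual l2pop driver): a port bound to an agent is deleted while the driver is building FDB entries, so the binding's port resolves to None and indexing it raises. A None-guard skips the stale binding.

```python
# Model of _get_port_fdb_entries with a guard for concurrently deleted ports.
def get_port_fdb_entries(port):
    if port is None:
        # Port was deleted while we were building the agent FDB: nothing to add.
        return []
    return [(port["mac_address"], ip["ip_address"])
            for ip in port["fixed_ips"]]

# The stale binding no longer raises TypeError:
assert get_port_fdb_entries(None) == []

entries = get_port_fdb_entries(
    {"mac_address": "fa:16:3e:00:00:01",
     "fixed_ips": [{"ip_address": "10.0.0.5"}]})
assert entries == [("fa:16:3e:00:00:01", "10.0.0.5")]
```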

** Affects: neutron
 Importance: Undecided
 Status: New

** Description changed:

  On a cluster where VMs boots and deletes happen pretty intensively
  following traces can pop up in neutron server log:
  
  2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers 
[req-1b5e9a29-7f7e-48f8-84ee-19ce217cb556 - - - - -] Mechanism driver 
'l2population' failed in update_port_postcommit
  2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers Traceback 
(most recent call last):
  2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers   File 
"/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/managers.py", line 401, 
in _call_on_drivers
  2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers 
getattr(driver.obj, method_name)(context)
  2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers   File 
"/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py",
 line 120, in update_port_postcommit
  2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers 
self._update_port_up(context)
  2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers   File 
"/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py",
 line 227, in _update_port_up
  2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers 
network_id)
  2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers   File 
"/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py",
 line 176, in _create_agent_fdb
  2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers 
fdbs.extend(self._get_port_fdb_entries(binding.port))
  2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers   File 
"/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py",
 line 45, in _get_port_fdb_entries
  2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers for ip in 
port['fixed_ips']]
  2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers TypeError: 
'NoneType' object has no attribute '__getitem__'
  2016-08-05 14:08:29.575 9560 ERROR neutron.plugins.ml2.managers
  2016-08-05 14:08:29.578 9560 ERROR neutron.plugins.ml2.rpc 
[req-1b5e9a29-7f7e-48f8-84ee-19ce217cb556 - - - - -] Failed to update device 
4c499a14-7211-4714-afa2-95b280d595a2 up
+ 
+ This leads to the device not being set to Active state and hence Nova
+ times out waiting for the interface to be ready.

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1610303

Title:
  l2pop mech fails to update_port_postcommit on a loaded cluster

[Yahoo-eng-team] [Bug 1610153] [NEW] nova list can sometimes return 404

2016-08-05 Thread Oleg Bondarev
Public bug reported:

With a large number of instances 'nova list' may return 404; probably this
is because some instances are deleted during command execution. Trace:

2016-08-05 09:30:52.666 878 ERROR nova.api.openstack 
[req-707a0e40-67cf-43a9-865d-c44a678b2986 2e2a43e956f344d184e40771d59c991d 
13f508a4dd0e4b538561be2afcf5d699 - - -] Caught error: Instance 
28c33ed4-c1a4-432c-96de-059b94a3dd91 could not be found.
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack Traceback (most recent 
call last):
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack   File 
"/usr/lib/python2.7/dist-packages/nova/api/openstack/__init__.py", line 139, in 
__call__
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack return 
req.get_response(self.application)
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack   File 
"/usr/lib/python2.7/dist-packages/webob/request.py", line 1320, in send
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack application, 
catch_exc_info=False)
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack   File 
"/usr/lib/python2.7/dist-packages/webob/request.py", line 1284, in 
call_application
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack app_iter = 
application(self.environ, start_response)
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack   File 
"/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack return resp(environ, 
start_response)
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack   File 
"/usr/lib/python2.7/dist-packages/webob/dec.py", line 130, in __call__
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack resp = 
self.call_func(req, *args, **self.kwargs)
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack   File 
"/usr/lib/python2.7/dist-packages/webob/dec.py", line 195, in call_func
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack return self.func(req, 
*args, **kwargs)
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack   File 
"/usr/lib/python2.7/dist-packages/keystonemiddleware/auth_token/__init__.py", 
line 467, in __call__
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack response = 
req.get_response(self._app)
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack   File 
"/usr/lib/python2.7/dist-packages/webob/request.py", line 1320, in send
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack application, 
catch_exc_info=False)
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack   File 
"/usr/lib/python2.7/dist-packages/webob/request.py", line 1284, in 
call_application
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack app_iter = 
application(self.environ, start_response)
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack   File 
"/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack return resp(environ, 
start_response)
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack   File 
"/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack return resp(environ, 
start_response)
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack   File 
"/usr/lib/python2.7/dist-packages/routes/middleware.py", line 136, in __call__
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack response = 
self.app(environ, start_response)
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack   File 
"/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack return resp(environ, 
start_response)
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack   File 
"/usr/lib/python2.7/dist-packages/webob/dec.py", line 130, in __call__
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack resp = 
self.call_func(req, *args, **self.kwargs)
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack   File 
"/usr/lib/python2.7/dist-packages/webob/dec.py", line 195, in call_func
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack return self.func(req, 
*args, **kwargs)
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack   File 
"/usr/lib/python2.7/dist-packages/nova/api/openstack/wsgi.py", line 672, in 
__call__
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack content_type, body, 
accept)
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack   File 
"/usr/lib/python2.7/dist-packages/nova/api/openstack/wsgi.py", line 756, in 
_process_stack
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack request, action_args)
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack   File 
"/usr/lib/python2.7/dist-packages/nova/api/openstack/wsgi.py", line 619, in 
post_process_extensions
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack **action_args)
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack   File 
"/usr/lib/python2.7/dist-packages/nova/api/openstack/compute/extended_server_attributes.py",
 line 97, in detail
2016-08-05 09:30:52.666 878 ERROR nova.api.openstack  
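The race described in this report can be sketched as follows (the dict-backed "DB" and helper names are illustrative assumptions, not the actual nova code): the list extension post-processes a snapshot of instance IDs, and if an instance is deleted in between, the per-instance lookup fails. Skipping instances that vanished keeps the listing robust instead of failing the whole request.

```python
# Model: instance deleted between the initial list query and the extension's
# per-instance lookup; tolerate the gap rather than returning 404.
class InstanceNotFound(Exception):
    pass

db = {"a": {"id": "a"}, "b": {"id": "b"}}

def get_instance(uuid):
    try:
        return db[uuid]
    except KeyError:
        raise InstanceNotFound(uuid)

snapshot = list(db)     # ids gathered by the initial list query
del db["b"]             # concurrent delete while extensions run

detailed = []
for uuid in snapshot:
    try:
        detailed.append(get_instance(uuid))
    except InstanceNotFound:
        continue        # instance deleted mid-request; omit instead of 404
assert [i["id"] for i in detailed] == ["a"]
```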

[Yahoo-eng-team] [Bug 1606844] [NEW] Neutron constantly resyncing deleted router

2016-07-27 Thread Oleg Bondarev
Public bug reported:

No need to constantly resync a router which was deleted and for which
there is no namespace.

Observed: l3 agent log full of

2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent [-] Error while 
deleting router 81ef46de-f7f9-4c5e-b787-c935e0af253a
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent Traceback (most 
recent call last):
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 359, in 
_safe_router_removed
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent 
self._router_removed(router_id)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 377, in 
_router_removed
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent ri.delete(self)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 347, 
in delete
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent 
self.process_delete(agent)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/common/utils.py", line 385, in call
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent self.logger(e)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent 
self.force_reraise()
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in 
force_reraise
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent 
six.reraise(self.type_, self.value, self.tb)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/common/utils.py", line 382, in call
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent return 
func(*args, **kwargs)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 947, 
in process_delete
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent 
self._process_internal_ports(agent.pd)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 530, 
in _process_internal_ports
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent existing_devices 
= self._get_existing_devices()
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 413, 
in _get_existing_devices
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent ip_devs = 
ip_wrapper.get_devices(exclude_loopback=True)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_lib.py", line 130, in 
get_devices
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent 
log_fail_as_error=self.log_fail_as_error
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 140, in 
execute
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent raise 
RuntimeError(msg)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent RuntimeError: Exit 
code: 1; Stdin: ; Stdout: ; Stderr: Cannot open network namespace 
"qrouter-81ef46de-f7f9-4c5e-b787-c935e0af253a": No such file or directory
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent
2016-07-26 14:00:45.236 13360 ERROR neutron.agent.linux.utils [-] Exit code: 1; 
Stdin: ; Stdout: ; Stderr: Cannot open network namespace 
"qrouter-81ef46de-f7f9-4c5e-b787-c935e0af253a": No such file or directory

This consumes memory, CPU and disk.
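A minimal sketch of why the agent loops and of the obvious fix direction (the class and function names are simplified stand-ins, not the actual l3 agent code): the cleanup raises when the namespace is missing, the error schedules a resync, and the resync hits the same error forever. Treating "namespace not found" as already-deleted breaks the cycle.

```python
# Model the resync loop: deleting a router whose namespace is gone raises,
# unless "already gone" is treated as success.
class NamespaceNotFound(RuntimeError):
    pass

def delete_router(namespaces, router_id):
    ns = "qrouter-" + router_id
    if ns not in namespaces:
        raise NamespaceNotFound(ns)  # what `ip netns` failure maps to
    namespaces.remove(ns)

def safe_router_removed(namespaces, router_id):
    try:
        delete_router(namespaces, router_id)
    except NamespaceNotFound:
        return True   # proposed behaviour: already gone, do not resync
    return True

# Namespace missing: no exception escapes, so no endless resync is scheduled.
assert safe_router_removed(set(), "81ef46de")
```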

** Affects: neutron
 Importance: Undecided
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: l3-ipam-dhcp

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1606844

Title:
  Neutron constantly resyncing deleted router

Status in neutron:
  New

Bug description:
  No need to constantly resync a router which was deleted and for which
  there is no namespace.

  Observed: l3 agent log full of

  2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent [-] Error while 
deleting router 81ef46de-f7f9-4c5e-b787-c935e0af253a
  2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent Traceback (most 
recent call last):
  2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/di

[Yahoo-eng-team] [Bug 1606845] [NEW] L3 agent constantly resyncing deleted router

2016-07-27 Thread Oleg Bondarev
Public bug reported:

No need to constantly resync a router which was deleted and for which
there is no namespace.

Observed: l3 agent log full of

2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent [-] Error while 
deleting router 81ef46de-f7f9-4c5e-b787-c935e0af253a
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent Traceback (most 
recent call last):
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 359, in 
_safe_router_removed
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent 
self._router_removed(router_id)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 377, in 
_router_removed
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent ri.delete(self)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 347, 
in delete
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent 
self.process_delete(agent)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/common/utils.py", line 385, in call
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent self.logger(e)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent 
self.force_reraise()
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in 
force_reraise
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent 
six.reraise(self.type_, self.value, self.tb)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/common/utils.py", line 382, in call
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent return 
func(*args, **kwargs)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 947, 
in process_delete
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent 
self._process_internal_ports(agent.pd)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 530, 
in _process_internal_ports
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent existing_devices 
= self._get_existing_devices()
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 413, 
in _get_existing_devices
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent ip_devs = 
ip_wrapper.get_devices(exclude_loopback=True)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_lib.py", line 130, in 
get_devices
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent 
log_fail_as_error=self.log_fail_as_error
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 140, in 
execute
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent raise 
RuntimeError(msg)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent RuntimeError: Exit 
code: 1; Stdin: ; Stdout: ; Stderr: Cannot open network namespace 
"qrouter-81ef46de-f7f9-4c5e-b787-c935e0af253a": No such file or directory
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent
2016-07-26 14:00:45.236 13360 ERROR neutron.agent.linux.utils [-] Exit code: 1; 
Stdin: ; Stdout: ; Stderr: Cannot open network namespace 
"qrouter-81ef46de-f7f9-4c5e-b787-c935e0af253a": No such file or directory

This consumes memory, CPU and disk.

** Affects: neutron
 Importance: Undecided
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: l3-ipam-dhcp

** Summary changed:

- Neutron constantly resyncing deleted router
+ L3 agent constantly resyncing deleted router

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1606845

Title:
  L3 agent constantly resyncing deleted router

Status in neutron:
  New

Bug description:
  No need to constantly resync a router which was deleted and for which
  there is no namespace.

  Observed: l3 agent log full of

  2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent [-] Error while 
deleting router 81ef46de-f7f9-4c5e-b787-c935e0af253a
  2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent Traceback (most

[Yahoo-eng-team] [Bug 1599089] [NEW] DVR: floating ip stops working after reassignment

2016-07-05 Thread Oleg Bondarev
Public bug reported:

When reassigning a floating IP from one VM to another on the same host, it
stops responding. This happens because the l3 agent only checks that the IP
address is configured on the interface and does not update ip rules to
reflect the new fixed IP.

Reassignment works if the floating IP is disassociated first and then
associated with the other fixed IP. However, the API allows reassignment
without disassociation, so that should work as well.
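The missing logic can be sketched in a few lines of Python. This is a
hedged illustration with hypothetical names (process_floating_ip and the
state mapping are not neutron's actual API): reassignment must be detected
as a fixed-IP change, not merely as "the floating IP is already configured
on the interface".

```python
# Hypothetical sketch: 'state' maps floating IP -> fixed IP as currently
# programmed in ip rules. Seeing the FIP on the interface is not enough;
# the fixed IP behind it may have changed.

def process_floating_ip(state, floating_ip, fixed_ip):
    """Return the action the agent should take for this FIP mapping."""
    current = state.get(floating_ip)
    if current is None:
        state[floating_ip] = fixed_ip
        return 'add'            # configure the address and the ip rule
    if current != fixed_ip:
        state[floating_ip] = fixed_ip
        return 'update_rule'    # FIP kept, but the ip rule must be rewritten
    return 'noop'               # nothing changed
```

With this shape, the reassignment case reported above maps to
'update_rule' instead of being silently skipped.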

** Affects: neutron
 Importance: Undecided
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: l3-dvr-backlog liberty-backport-potential mitaka-backport-potential

** Tags added: mitaka-backport-potential

** Tags added: liberty-backport-potential

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1599089

Title:
  DVR: floating ip stops working after reassignment

Status in neutron:
  New

Bug description:
  When reassigning a floating IP from one VM to another on the same host,
  it stops responding. This happens because the l3 agent only checks that
  the IP address is configured on the interface and does not update ip
  rules to reflect the new fixed IP.

  Reassignment works if the floating IP is disassociated first and then
  associated with the other fixed IP. However, the API allows reassignment
  without disassociation, so that should work as well.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1599089/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1595878] [NEW] Memory leak in unit tests

2016-06-24 Thread Oleg Bondarev
Public bug reported:

tests.unit.agent.ovsdb.native.test_connection.TestOVSNativeConnection
calls Connection.start(), which starts a daemon with a while True loop
full of mocks. The mock._CallList of those mocks starts to grow very
quickly and finally eats all available memory.
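The growth can be reproduced with plain unittest.mock, independently of
neutron: every call on a Mock object is appended to its call history, so
a mock driven by an unbounded loop accumulates entries forever unless the
history is cleared or the loop is stopped.

```python
from unittest import mock

# Each call on a Mock is recorded in mock_calls (a mock._CallList),
# so a mock used inside a long-running daemon loop grows without bound.
m = mock.Mock()
for _ in range(1000):
    m.get_nowait()

assert len(m.mock_calls) == 1000  # the history keeps every call

m.reset_mock()                    # drops the accumulated history
assert len(m.mock_calls) == 0
```

In the test above the loop runs forever, so the history is never dropped.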

mem_top output during unit tests run:

refs:
18118[call(1),
 call().get_nowait(),
 call().get_nowait().do_commit(),
 call().get_nowait().results.put(),
 call().task_do
18117[call.get_nowait(),
 call.get_nowait().do_commit(),
 call.get_nowait().results.put(),
 call.task_done(),
 call.get_no
17990[call(1),
 call().get_nowait(),
 call().get_nowait().do_commit(),
 call().get_nowait().results.put(),
 call().task_do
17989[call.get_nowait(),
 call.get_nowait().do_commit(),
 call.get_nowait().results.put(),
 call.task_done(),
 call.get_no
13592[call(),
 call().fd_wait(, 1),
 call().timer_wait(),
 call().block(),
 call().fd_wait( [call.fd_wait(, 1),
 call.timer_wait(),
 call.block(),
 call.fd_wait( [call.fd_wait(, 1),
 call.timer_wait(),
 call.block(),
 call.fd_wait( [call(),
 call().do_commit(),
 call().results.put(),
 call(),
 call().do_commit(),
 call().results.put( [call(),
 call().fd_wait(, 1),
 call().timer_wait(),
 call().block(),
 call().fd_wait( [call.fd_wait(, 1),
 call.timer_wait(),
 call.block(),
 call.fd_wait( [call.fd_wait(, 1),
 call.timer_wait(),
 call.block(),
 call.fd_wait( [call(),
 call().do_commit(),
 call().results.put(),
 call(),
 call().do_commit(),
 call().results.put( {'keystoneclient.service_catalog': ,
 'oslo_messaging.r
9061 [call(, ),
 call().wait(),
 call().run(),
 call().wait( [call.wait(),
 call.run(),
 call.wait(),
 call.run(),
 call.wait( [call.wait(),
 call.run(),
 call.wait(),
 call.run(),
 call.wait( [call.do_commit(),
 call.results.put(),
 call.do_commit(),
 call.results.put( [call.do_commit(),
 call.results.put(),
 call.do_commit(),
 call.results.put( [call.get_nowait(),
 call.task_done(),
 call.get_nowait(),
 call.task_done(),
 call.get_nowait(),
 call.task_done(),
 call.get_nowait(),
 call.task_done(),
 call.get_nowait(),
 call.task_done(),
 call
8997 [call(, ),
 call().wait(),
 call().run(),
 call().wait(
79091
47269
45542
30758
14696
8601 
6579 
5639 
4940 
3858 
3291 
3275 
3267 
2439 
2304 
 
2219 
1869 
1424 

** Affects: neutron
 Importance: High
 Assignee: Oleg Bondarev (obondarev)
 Status: In Progress


** Tags: unittest

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1595878

Title:
  Memory leak in unit tests

Status in neutron:
  In Progress

Bug description:
  tests.unit.agent.ovsdb.native.test_connection.TestOVSNativeConnection
  calls Connection.start(), which starts a daemon with a while True loop
  full of mocks. The mock._CallList of those mocks starts to grow very
  quickly and finally eats all available memory.

  mem_top output during unit tests run:

  refs:
  18118  [call(1),
   call().get_nowait(),
   call().get_nowait().do_commit(),
   call().get_nowait().results.put(),
   call().task_do
  18117  [call.get_nowait(),
   call.get_nowait().do_commit(),
   call.get_nowait().results.put(),
   call.task_done(),
   call.get_no
  17990  [call(1),
   call().get_nowait(),
   call().get_nowait().do_commit(),
   call().get_nowait().results.put(),
   call().task_do
  17989  [call.get_nowait(),
   call.get_nowait().do_commit(),
   call.get_nowait().results.put(),
   call.task_done(),
   call.get_no
  13592  [call(),
   call().fd_wait(, 1),
   call().timer_wait(),
   call().block(),
   call().fd_wait( [call.fd_wait(, 1),
   call.timer_wait(),
   call.block(),
   call.fd_wait( [call.fd_wait(, 1),
   call.timer_wait(),
   call.block(),
   call.fd_wait( [call(),
   call().do_commit(),
   call().results.put(),
   call(),
   call().do_commit(),
   call().results.put( [call(),
   call().fd_wait(, 1),
   call().timer_wait(),
   call().block(),
   call().fd_wait( [call.fd_wait(, 1),
   call.timer_wait(),
   call.block(),
   call.fd_wait( [call.fd_wait(, 1),
   call.timer_wait(),
   call.block(),
   call.fd_wait( [call(),
   call().do_commit(),
   call().results.put(),
   call(),
   call().do_commit(),
   call().results.put( {'keystoneclient.service_catalog': ,
 'oslo_messaging.r
  9061   [call(, ),
   call().wait(),
   call().run(),
   call().wait( [call.wait(),
   call.run(),
   call.wait(),
   call.run(),
   call.wait( [call.wait(),
   call.run(),
   call.wait(),
   call.run(),
   call.wait( [call.do_commit(),
   call.results.put(),
   call.do_commit(),
   call.results.put( [call.do_commit(),
   call.results.put(),
   call.do_commit(),
   call.results.put( [call.get_nowait(),
   call.task_done(),
   call.get_nowait(),
   call.task_done(),
   call.get_nowait(),
   call.task_done(),
   call.get_nowait(),
   call.task_done(),
   call.get_nowait(),
   call.task_done(),
   call
  8997

[Yahoo-eng-team] [Bug 1593653] [NEW] DVR: cannot manually remove router from l3 agent

2016-06-17 Thread Oleg Bondarev
Public bug reported:

This is a regression from commit c198710dc551bc0f79851a7801038b033088a8c2:
if there are DVR-serviceable ports on the node with the agent, the server
will now notify the agent with router_updated rather than router_removed.
However, when updating the router, the agent will request router_info, and
that is where the server will schedule the router back to this l3 agent,
because autoscheduling is enabled.

** Affects: neutron
 Importance: High
 Status: New


** Tags: l3-dvr-backlog

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1593653

Title:
  DVR: cannot manually remove router from l3 agent

Status in neutron:
  New

Bug description:
  This is a regression from commit c198710dc551bc0f79851a7801038b033088a8c2:
  if there are DVR-serviceable ports on the node with the agent, the server
  will now notify the agent with router_updated rather than router_removed.
  However, when updating the router, the agent will request router_info,
  and that is where the server will schedule the router back to this l3
  agent, because autoscheduling is enabled.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1593653/+subscriptions



[Yahoo-eng-team] [Bug 1590041] [NEW] DVR: regression with router rescheduling

2016-06-07 Thread Oleg Bondarev
Public bug reported:

The L3 agent may not fully process a DVR router being rescheduled to it,
which leads to loss of external connectivity.
The reason is that since commit 9dc70ed77e055677a4bd3257a0e9e24239ed4cce
the DVR edge router creates the snat_namespace object in its constructor,
while some logic in the module still checks for the existence of this
object: for example, external_gateway_updated() will not fully process the
router if the snat_namespace object exists.

The proposal is to revert commit 9dc70ed77e055677a4bd3257a0e9e24239ed4cce
and then make another attempt to fix bug 1557909.
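The conflict between the constructor and the existence checks can be
sketched in miniature. This is a hypothetical reduction (class and return
values are invented for illustration, not neutron's real code):

```python
# Hedged sketch: creating the namespace object unconditionally in the
# constructor defeats "is it set yet?" guards elsewhere in the module.
class DvrEdgeRouter:
    def __init__(self):
        self.snat_namespace = object()   # always set after the commit

    def external_gateway_updated(self):
        if self.snat_namespace:          # guard that assumed lazy creation
            return 'partial'             # skips the full processing path
        return 'full'

assert DvrEdgeRouter().external_gateway_updated() == 'partial'
```

Before the commit, snat_namespace started as None, so the guard took the
'full' branch on first gateway update.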

** Affects: neutron
 Importance: High
 Assignee: Oleg Bondarev (obondarev)
 Status: In Progress


** Tags: l3-dvr-backlog

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1590041

Title:
  DVR: regression with router rescheduling

Status in neutron:
  In Progress

Bug description:
  The L3 agent may not fully process a DVR router being rescheduled to it,
  which leads to loss of external connectivity.
  The reason is that since commit 9dc70ed77e055677a4bd3257a0e9e24239ed4cce
  the DVR edge router creates the snat_namespace object in its constructor,
  while some logic in the module still checks for the existence of this
  object: for example, external_gateway_updated() will not fully process
  the router if the snat_namespace object exists.

  The proposal is to revert commit 9dc70ed77e055677a4bd3257a0e9e24239ed4cce
  and then make another attempt to fix bug 1557909.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1590041/+subscriptions



[Yahoo-eng-team] [Bug 1585623] [NEW] A vm's port is in down state after compute node reboot

2016-05-25 Thread Oleg Bondarev
Public bug reported:

After compute node reboot some ports may end up in DOWN state and
corresponding VMs lose net access.

** Affects: neutron
 Importance: High
 Assignee: Oleg Bondarev (obondarev)
 Status: Confirmed


** Tags: mitaka-backport-potential ovs

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1585623

Title:
  A vm's port is in down state after compute node reboot

Status in neutron:
  Confirmed

Bug description:
  After compute node reboot some ports may end up in DOWN state and
  corresponding VMs lose net access.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1585623/+subscriptions



[Yahoo-eng-team] [Bug 1585149] [NEW] Do not inherit test case classes from regular Neutron classes

2016-05-24 Thread Oleg Bondarev
Public bug reported:

It's a bad practice in itself, and it may lead to errors during test
initialization.

Test case classes are instantiated during the test loading stage by the
testing framework. Some neutron classes may not be ready to be created at
this stage, for example those requiring the rpc messaging system to be
initialized first. I hit this bug after adding an rpc notifier to
AgentDBMixin: unit tests started failing with:

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
  File 
"/home/obondarev/Neutron/neutron/.tox/py27/lib/python2.7/site-packages/subunit/run.py",
 line 149, in 
main()
  File 
"/home/obondarev/Neutron/neutron/.tox/py27/lib/python2.7/site-packages/subunit/run.py",
 line 145, in main
stdout=stdout, exit=False)
  File 
"/home/obondarev/Neutron/neutron/.tox/py27/local/lib/python2.7/site-packages/testtools/run.py",
 line 171, in __init__
self.parseArgs(argv)
  File 
"/home/obondarev/Neutron/neutron/.tox/py27/local/lib/python2.7/site-packages/unittest2/main.py",
 line 113, in parseArgs
self._do_discovery(argv[2:])
  File 
"/home/obondarev/Neutron/neutron/.tox/py27/local/lib/python2.7/site-packages/testtools/run.py",
 line 211, in _do_discovery
super(TestProgram, self)._do_discovery(argv, Loader=Loader)
  File 
"/home/obondarev/Neutron/neutron/.tox/py27/local/lib/python2.7/site-packages/unittest2/main.py",
 line 223, in _do_discovery
self.test = loader.discover(self.start, self.pattern, self.top)
  File 
"/home/obondarev/Neutron/neutron/.tox/py27/local/lib/python2.7/site-packages/unittest2/loader.py",
 line 374, in discover
tests = list(self._find_tests(start_dir, pattern))
  File 
"/home/obondarev/Neutron/neutron/.tox/py27/local/lib/python2.7/site-packages/unittest2/loader.py",
 line 440, in _find_tests
for test in path_tests:
  File 
"/home/obondarev/Neutron/neutron/.tox/py27/local/lib/python2.7/site-packages/unittest2/loader.py",
 line 440, in _find_tests
for test in path_tests:
  File 
"/home/obondarev/Neutron/neutron/.tox/py27/local/lib/python2.7/site-packages/unittest2/loader.py",
 line 431, in _find_tests
full_path, pattern, namespace)
  File 
"/home/obondarev/Neutron/neutron/.tox/py27/local/lib/python2.7/site-packages/unittest2/loader.py",
 line 487, in _find_test_path
return self.loadTestsFromModule(module, pattern=pattern), False
  File 
"/home/obondarev/Neutron/neutron/.tox/py27/local/lib/python2.7/site-packages/unittest2/loader.py",
 line 148, in loadTestsFromModule
tests.append(self.loadTestsFromTestCase(obj))
  File 
"/home/obondarev/Neutron/neutron/.tox/py27/local/lib/python2.7/site-packages/unittest2/loader.py",
 line 112, in loadTestsFromTestCase
loaded_suite = self.suiteClass(map(testCaseClass, testCaseNames))
  File "neutron/db/agents_db.py", line 190, in __init__
resources_rpc.ResourcesPushToServersRpcApi())
  File "neutron/api/rpc/handlers/resources_rpc.py", line 135, in __init__
self.client = n_rpc.get_client(target)
  File "neutron/common/rpc.py", line 174, in get_client
assert TRANSPORT is not None
AssertionError
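The failure mode reduces to a few lines. This is a hedged sketch with
hypothetical names (AgentPlugin, BadTestCase, and the module-level
TRANSPORT stand in for the real neutron classes and
neutron.common.rpc.TRANSPORT): test discovery instantiates test case
classes, so any side effect inherited from a production class's __init__
runs before global fixtures such as the RPC transport exist.

```python
TRANSPORT = None  # stands in for neutron.common.rpc.TRANSPORT


class AgentPlugin:
    """Stand-in for a production class whose __init__ needs messaging."""
    def __init__(self):
        assert TRANSPORT is not None   # like n_rpc.get_client(target)


class BadTestCase(AgentPlugin):
    """Inherits the production __init__, so it fails at load time."""


failed = False
try:
    BadTestCase()   # what loader.loadTestsFromTestCase effectively does
except AssertionError:
    failed = True

assert failed       # discovery blows up before any test even runs
```

Composing (holding a factory or creating the plugin in setUp) instead of
inheriting defers the side effect until the fixtures are in place.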

** Affects: neutron
 Importance: Low
 Assignee: Oleg Bondarev (obondarev)
 Status: Confirmed


** Tags: unittest

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1585149

Title:
  Do not inherit test case classes from regular Neutron classes

Status in neutron:
  Confirmed

Bug description:
  It's a bad practice in itself, and it may lead to errors during test
  initialization.

  Test case classes are instantiated during the test loading stage by the
  testing framework. Some neutron classes may not be ready to be created
  at this stage, for example those requiring the rpc messaging system to
  be initialized first. I hit this bug after adding an rpc notifier to
  AgentDBMixin: unit tests started failing with:

  Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
  "__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
  exec code in run_globals
File 
"/home/obondarev/Neutron/neutron/.tox/py27/lib/python2.7/site-packages/subunit/run.py",
 line 149, in 
  main()
File 
"/home/obondarev/Neutron/neutron/.tox/py27/lib/python2.7/site-packages/subunit/run.py",
 line 145, in main
  stdout=stdout, exit=False)
File 
"/home/obondarev/Neutron/neutron/.tox/py27/local/lib/python2.7/site-packages/testtools/run.py",
 line 171, in __init__
 

[Yahoo-eng-team] [Bug 1576757] [NEW] SRIOV: ESwitchManager should handle multiple NICs per physical net

2016-04-29 Thread Oleg Bondarev
Public bug reported:

Commit 46ddaf4288a1cac44d8afc0525b4ecb3ae2186a3 made it possible to
specify multiple NICs per network.
However, ESwitchManager still stores only one EmbSwitch per physical net
(the last one registered).
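The overwrite is a plain dict-keying bug, sketched below with hypothetical
data (the device names come from the related bug 1558626; the real
ESwitchManager holds EmbSwitch objects, not strings). A list-valued
mapping is one way to keep every NIC per physnet.

```python
from collections import defaultdict

# A plain dict keyed by physical network keeps only the last device:
emb_switches = {}
for physnet, dev in [('physnet2', 'enp1s0f0'), ('physnet2', 'enp1s0f1')]:
    emb_switches[physnet] = dev        # overwrites: only the last survives
assert emb_switches['physnet2'] == 'enp1s0f1'

# A list per physnet preserves all registered NICs:
emb_switches_multi = defaultdict(list)
for physnet, dev in [('physnet2', 'enp1s0f0'), ('physnet2', 'enp1s0f1')]:
    emb_switches_multi[physnet].append(dev)
assert emb_switches_multi['physnet2'] == ['enp1s0f0', 'enp1s0f1']
```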

** Affects: neutron
 Importance: High
 Assignee: Vladimir Eremin (yottatsa)
 Status: Confirmed


** Tags: mitaka-backport-potential sriov-pci-pt

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1576757

Title:
  SRIOV: ESwitchManager should handle multiple NICs per physical net

Status in neutron:
  Confirmed

Bug description:
  Commit 46ddaf4288a1cac44d8afc0525b4ecb3ae2186a3 made it possible to
  specify multiple NICs per network.
  However, ESwitchManager still stores only one EmbSwitch per physical net
  (the last one registered).

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1576757/+subscriptions



[Yahoo-eng-team] [Bug 1558626] Re: [sriov] physical_device_mappings allows only one physnet for per nic

2016-04-29 Thread Oleg Bondarev
A new bug was filed for handling multiple NICs per physical net:
https://bugs.launchpad.net/neutron/+bug/1576757


** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1558626

Title:
  [sriov] physical_device_mappings allows only one physnet for per nic

Status in neutron:
  Fix Released

Bug description:
  Mitaka, ML2, ml2_sriov.agent_required=True

  sriov_nic.physical_device_mappings only allows specifying one NIC per
  physnet. If I try to specify two NICs like the following

  [sriov_nic]
  physical_device_mappings=physnet2:enp1s0f0,physnet2:enp1s0f1

  I get the following error on start:

  2016-03-17 15:26:48.818 6832 INFO neutron.common.config [-] Logging enabled!
  2016-03-17 15:26:48.819 6832 INFO neutron.common.config [-] 
/usr/bin/neutron-sriov-nic-agent version 8.0.0.0b3
  2016-03-17 15:26:48.819 6832 DEBUG neutron.common.config [-] command line: 
/usr/bin/neutron-sriov-nic-agent 
--config-file=/etc/neutron/plugins/ml2/sriov_agent.ini 
--log-file=/var/log/neutron/neutron-sriov-agent.log 
--config-file=/etc/neutron/neutron.conf setup_logging 
/usr/lib/python2.7/dist-packages/neutron/common/config.py:266
  2016-03-17 15:26:48.819 6832 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [-] Failed on 
Agent configuration parse. Agent terminated!
  2016-03-17 15:26:48.819 6832 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent Traceback (most 
recent call last):
  2016-03-17 15:26:48.819 6832 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File 
"/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py",
 line 436, in main
  2016-03-17 15:26:48.819 6832 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent 
config_parser.parse()
  2016-03-17 15:26:48.819 6832 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File 
"/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py",
 line 411, in parse
  2016-03-17 15:26:48.819 6832 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent 
cfg.CONF.SRIOV_NIC.physical_device_mappings)
  2016-03-17 15:26:48.819 6832 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File 
"/usr/lib/python2.7/dist-packages/neutron/common/utils.py", line 240, in 
parse_mappings
  2016-03-17 15:26:48.819 6832 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent "unique") % 
{'key': key, 'mapping': mapping})
  2016-03-17 15:26:48.819 6832 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent ValueError: Key 
physnet2 in mapping: 'physnet2:enp1s0f1' not unique
  2016-03-17 15:26:48.819 6832 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent
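The ValueError in the traceback comes from the duplicate-key check in the
mapping parser. Below is a hedged, simplified sketch of that behavior
(this is not neutron's exact parse_mappings implementation; the
unique_keys flag is shown as the kind of change that would permit
multiple NICs per physnet):

```python
def parse_mappings(mapping_list, unique_keys=True):
    """Parse 'key:value' entries; by default each key may appear once."""
    mappings = {}
    for entry in mapping_list:
        key, _, value = entry.partition(':')
        if unique_keys:
            if key in mappings:
                raise ValueError(
                    "Key %s in mapping: '%s' not unique" % (key, entry))
            mappings[key] = value
        else:
            # Collect every value, allowing several NICs per physnet.
            mappings.setdefault(key, []).append(value)
    return mappings

# The configuration from the bug report fails with the default check,
# but parses cleanly when duplicate keys are collected into lists:
assert parse_mappings(
    ['physnet2:enp1s0f0', 'physnet2:enp1s0f1'],
    unique_keys=False) == {'physnet2': ['enp1s0f0', 'enp1s0f1']}
```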

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1558626/+subscriptions



[Yahoo-eng-team] [Bug 1573843] [NEW] Minimize agent state reports handling on server side

2016-04-22 Thread Oleg Bondarev
Public bug reported:

Agent state reports are mostly needed so that the neutron server can
properly (re)schedule resources among agents.
New features may require more precise scheduling, which in turn requires
agents to report more and servers to handle more data.

However, adding new logic to agent state report handling has a negative
effect on scalability and overall neutron server performance. Here is one
example: https://bugs.launchpad.net/neutron/+bug/1567497, with more cases
possibly coming in the future, such as
https://review.openstack.org/#/c/285548, which adds a new db update
request for each state report.

One thing that could be done is to not include (or simply ignore on the
server side) data which cannot change at runtime (like config parameters)
in each state report. Such data should only be processed on agent
(re)start/revival.
So mainly it's about separating static and dynamic data in state report
handling to reduce the amount of db updates.
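The static/dynamic split can be sketched as follows. This is a hedged
illustration with hypothetical field names (fields_to_persist and the
STATIC_FIELDS set are invented here, not neutron's API): static,
config-derived fields are only written on (re)start or revival, while
dynamic fields are persisted on every report.

```python
# Fields that cannot change while the agent is running (assumption for
# the sketch; the real split would be decided per agent type).
STATIC_FIELDS = {'configurations', 'agent_type', 'binary'}

def fields_to_persist(report, agent_restarted):
    """Pick which report fields actually need a DB write."""
    if agent_restarted:
        return dict(report)             # full update on (re)start/revival
    return {k: v for k, v in report.items() if k not in STATIC_FIELDS}

report = {'heartbeat_timestamp': 't1', 'configurations': {'x': 1},
          'agent_type': 'L3 agent', 'binary': 'neutron-l3-agent'}

# Steady-state report: only the dynamic heartbeat field is persisted.
assert fields_to_persist(report, agent_restarted=False) == {
    'heartbeat_timestamp': 't1'}
# After a restart, everything is persisted once.
assert fields_to_persist(report, agent_restarted=True) == report
```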

** Affects: neutron
 Importance: Undecided
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: loadimpact

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1573843

Title:
  Minimize agent state reports handling on server side

Status in neutron:
  New

Bug description:
  Agent state reports are mostly needed so that the neutron server can
  properly (re)schedule resources among agents.
  New features may require more precise scheduling, which in turn requires
  agents to report more and servers to handle more data.

  However, adding new logic to agent state report handling has a negative
  effect on scalability and overall neutron server performance. Here is
  one example: https://bugs.launchpad.net/neutron/+bug/1567497, with more
  cases possibly coming in the future, such as
  https://review.openstack.org/#/c/285548, which adds a new db update
  request for each state report.

  One thing that could be done is to not include (or simply ignore on the
  server side) data which cannot change at runtime (like config
  parameters) in each state report. Such data should only be processed on
  agent (re)start/revival.
  So mainly it's about separating static and dynamic data in state report
  handling to reduce the amount of db updates.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1573843/+subscriptions



[Yahoo-eng-team] [Bug 1567497] [NEW] resource_versions in agents state reports led to performance degradation

2016-04-07 Thread Oleg Bondarev
Public bug reported:

resource_versions were recently included in agent state reports to support
rolling upgrades (commit 97a272a892fcf488949eeec4959156618caccae8).
The downside is that this brought additional processing when handling
state reports on the server side: an update of the local resource versions
cache and, more seriously, rpc casts to all other servers to do the same.

All this led to visible performance degradation at scale, with hundreds of
agents constantly sending reports. Under load (rally test) agents may
start "blinking", which makes the cluster very unstable.

We need to optimize agent notifications about resource_versions.
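One plausible optimization, sketched below with hypothetical names
(handle_state_report, version_cache, and the casts list are invented for
illustration): only fan out resource_versions to the other servers when
they actually differ from the cached value for that agent, since
steady-state reports repeat the same versions over and over.

```python
version_cache = {}   # agent_id -> last seen resource_versions

def handle_state_report(agent_id, resource_versions, casts):
    """Record the report; fan out only when the versions changed."""
    if version_cache.get(agent_id) != resource_versions:
        version_cache[agent_id] = resource_versions
        # Expensive part: rpc cast to every other server.
        casts.append((agent_id, resource_versions))

casts = []
handle_state_report('a1', {'Port': '1.0'}, casts)
handle_state_report('a1', {'Port': '1.0'}, casts)   # unchanged: no cast
handle_state_report('a1', {'Port': '1.1'}, casts)   # changed: one cast
assert len(casts) == 2
```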

** Affects: neutron
 Importance: High
 Assignee: Oleg Bondarev (obondarev)
 Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1567497

Title:
  resource_versions in agents state reports led to performance
  degradation

Status in neutron:
  In Progress

Bug description:
  resource_versions were recently included in agent state reports to
  support rolling upgrades (commit 97a272a892fcf488949eeec4959156618caccae8).
  The downside is that this brought additional processing when handling
  state reports on the server side: an update of the local resource
  versions cache and, more seriously, rpc casts to all other servers to do
  the same.

  All this led to visible performance degradation at scale, with hundreds
  of agents constantly sending reports. Under load (rally test) agents may
  start "blinking", which makes the cluster very unstable.

  We need to optimize agent notifications about resource_versions.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1567497/+subscriptions



[Yahoo-eng-team] [Bug 1566291] [NEW] L3 agent: at some point an agent becomes unable to handle new routers

2016-04-05 Thread Oleg Bondarev
Public bug reported:

Following seen in l3 agent logs:

2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent [-] Failed to 
process compatible router 'e341e0e2-5089-46e9-91f9-2099a156b27f'
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent Traceback (most 
recent call last):
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 497, in 
_process_router_update
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent 
self._process_router_if_compatible(router)
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 434, in 
_process_router_if_compatible
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent 
self._process_added_router(router)
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 439, in 
_process_added_router
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent 
self._router_added(router['id'], router)
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 340, in 
_router_added
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent ri = 
self._create_router(router_id, router)
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 337, in 
_create_router
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent return 
legacy_router.LegacyRouter(*args, **kwargs)
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 61, in 
__init__
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent 
DEFAULT_ADDRESS_SCOPE: ADDRESS_SCOPE_MARK_IDS.pop()}
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent KeyError: 'pop from 
an empty set'
2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent
2016-04-05 09:30:09.034 24216 DEBUG neutron.agent.l3.agent [-] Starting router 
update for e341e0e2-5089-46e9-91f9-2099a156b27f, action None, priority 1 
_process_router_update 
/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py:463
2016-04-05 09:30:09.035 24216 DEBUG oslo_messaging._drivers.amqpdriver [-] CALL 
msg_id: 6295fbe9cf2040d79c68f5c5f8b1e963 exchange 'neutron' topic 'q-l3-plugin' 
_send /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:454
2016-04-05 09:30:09.417 24216 DEBUG oslo_messaging._drivers.amqpdriver [-] 
received reply msg_id: 6295fbe9cf2040d79c68f5c5f8b1e963 __call__ 
/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:302
2016-04-05 09:30:09.418 24216 ERROR neutron.agent.l3.agent [-] Failed to 
process compatible router 'e341e0e2-5089-46e9-91f9-2099a156b27f'

So the agent is constantly resyncing (causing load on the neutron server)
and is unable to handle new routers.

I believe the set "ADDRESS_SCOPE_MARK_IDS = set(range(1024, 2048))" from
router_info.py should not be agent-global but per-router. Or at least
values should be returned to the set when a router is deleted.
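The leak and the second proposed fix (returning ids to the pool) can be
demonstrated in isolation. A hedged sketch with a deliberately tiny pool
(router_added/router_deleted are hypothetical stand-ins for the agent's
real router lifecycle hooks):

```python
# A shared id pool that is only ever popped from will run dry; returning
# the id when the router is deleted keeps the pool stable.
ADDRESS_SCOPE_MARK_IDS = set(range(1024, 1027))  # tiny pool for the demo
router_marks = {}

def router_added(router_id):
    router_marks[router_id] = ADDRESS_SCOPE_MARK_IDS.pop()

def router_deleted(router_id):
    # The fix: give the mark id back to the pool. Without this add(),
    # enough add/delete cycles raise KeyError: 'pop from an empty set'.
    ADDRESS_SCOPE_MARK_IDS.add(router_marks.pop(router_id))

for i in range(10):            # many more cycles than the pool size
    router_added(i)
    router_deleted(i)

assert len(ADDRESS_SCOPE_MARK_IDS) == 3   # pool fully recovered
```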

** Affects: neutron
 Importance: Undecided
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: l3-ipam-dhcp

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1566291

Title:
  L3 agent: at some point an agent becomes unable to handle new routers

Status in neutron:
  New

Bug description:
  Following seen in l3 agent logs:

  2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent [-] Failed to 
process compatible router 'e341e0e2-5089-46e9-91f9-2099a156b27f'
  2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent Traceback (most 
recent call last):
  2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 497, in 
_process_router_update
  2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent 
self._process_router_if_compatible(router)
  2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 434, in 
_process_router_if_compatible
  2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent 
self._process_added_router(router)
  2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 439, in 
_process_added_router
  2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent 
self._router_added(router['id'], router)
  2016-04-05 09:30:09.033 24216 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/dist-packages/neutron/age

[Yahoo-eng-team] [Bug 1522436] Re: No need to autoreschedule routers if l3 agent is back online

2016-04-01 Thread Oleg Bondarev
It appeared the fix was not complete. I'm reopening the bug and will
upload a fix shortly.

** Changed in: neutron
   Status: Fix Released => Triaged

** Tags removed: in-stable-liberty

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1522436

Title:
  No need to autoreschedule routers if l3 agent is back online

Status in neutron:
  Triaged

Bug description:
   - in case the l3 agent goes offline, the auto-rescheduling task is
triggered and starts to reschedule each router from the dead agent one by
one
   - if there are a lot of routers scheduled to the agent, rescheduling
all of them might take some time
   - during that time the agent might come back online
   - currently, auto-rescheduling continues until all routers are
rescheduled from the (already alive!) agent

  The proposal is to skip rescheduling if the agent is back online.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1522436/+subscriptions



[Yahoo-eng-team] [Bug 1546110] [NEW] DB error causes router rescheduling loop to fail

2016-02-16 Thread Oleg Bondarev
y", line 713, in _checkout
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall fairy = 
_ConnectionRecord.checkout(pool)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File 
"/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 485, in checkout
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall rec.checkin()
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File 
"/usr/lib/python2.7/dist-packages/sqlalchemy/util/langhelpers.py", line 60, in 
__exit__
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall 
compat.reraise(exc_type, exc_value, exc_tb)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File 
"/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 482, in checkout
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall dbapi_connection = 
rec.get_connection()
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File 
"/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 594, in 
get_connection
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall self.connection = 
self.__connect()
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File 
"/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 607, in __connect
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall connection = 
self.__pool._invoke_creator(self)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File 
"/usr/lib/python2.7/dist-packages/sqlalchemy/engine/strategies.py", line 97, in 
connect
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall return 
dialect.connect(*cargs, **cparams)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File 
"/usr/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 385, in 
connect
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall return 
self.dbapi.connect(*cargs, **cparams)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File 
"/usr/lib/python2.7/dist-packages/MySQLdb/__init__.py", line 81, in Connect
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall return 
Connection(*args, **kwargs)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File 
"/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 206, in __init__
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall super(Connection, 
self).__init__(*args, **kwargs2)
2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall DBConnectionError: 
(_mysql_exceptions.OperationalError) (2013, "Lost connection to MySQL server at 
'reading initial communication packet', system error: 0")

** Affects: neutron
 Importance: Medium
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: l3-ipam-dhcp liberty-backport-potential

** Tags added: liberty-backport-potential

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1546110

Title:
  DB error causes router rescheduling loop to fail

Status in neutron:
  New

Bug description:
  In the router rescheduling looping task, the db call that fetches down
  bindings is made outside of the try/except block, which may cause the task
  to fail (see traceback below). The db operation needs to be moved inside
  the try/except.

  2016-02-15T10:44:44.259995+00:00 err: 2016-02-15 10:44:44.250 15419 ERROR 
oslo.service.loopingcall [req-79bce4c3-2e81-446c-8b37-6d30e3a964e2 - - - - -] 
Fixed interval looping call 
'neutron.services.l3_router.l3_router_plugin.L3RouterPlugin.reschedule_routers_from_down_agents'
 failed
  2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall Traceback (most 
recent call last):
  2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File 
"/usr/lib/python2.7/dist-packages/oslo_service/loopingcall.py", line 113, in 
_run_loop
  2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall result = 
func(*self.args, **self.kw)
  2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File 
"/usr/lib/python2.7/dist-packages/neutron/db/l3_agentschedulers_db.py", line 
101, in reschedule_routers_from_down_agents
  2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall down_bindings = 
self._get_down_bindings(context, cutoff)
  2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File 
"/usr/lib/python2.7/dist-packages/neutron/db/l3_dvrscheduler_db.py", line 460, 
in _get_down_bindings
  2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall context, cutoff)
  2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File 
"/usr/lib/python2.7/dist-packages/neutron/db/l3_agentschedulers_db.py", line 
149, in _get_down_bindings
  2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall return 
query.all()
  2016-02-15 10:44:44.250 15419 ERROR oslo.service.loopingcall File 
"/usr/lib/python2.7/dist-packages/

[Yahoo-eng-team] [Bug 1545695] [NEW] L3 agent: traceback is suppressed on floating ip setup failure

2016-02-15 Thread Oleg Bondarev
Public bug reported:

The following traceback says nothing about the actual exception and makes it
hard to debug issues:

2016-02-10 05:26:54.025 682 ERROR neutron.agent.l3.router_info [-] L3 agent 
failure to setup floating IPs
2016-02-10 05:26:54.025 682 TRACE neutron.agent.l3.router_info Traceback (most 
recent call last):
2016-02-10 05:26:54.025 682 TRACE neutron.agent.l3.router_info File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 604, 
in process_external
2016-02-10 05:26:54.025 682 TRACE neutron.agent.l3.router_info fip_statuses = 
self.configure_fip_addresses(interface_name)
2016-02-10 05:26:54.025 682 TRACE neutron.agent.l3.router_info File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 268, 
in configure_fip_addresses
2016-02-10 05:26:54.025 682 TRACE neutron.agent.l3.router_info raise 
n_exc.FloatingIpSetupException('L3 agent failure to setup '
2016-02-10 05:26:54.025 682 TRACE neutron.agent.l3.router_info 
FloatingIpSetupException: L3 agent failure to setup floating IPs

We need to log the actual exception with its traceback before reraising.
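A minimal sketch of the fix, assuming the usual oslo-style `LOG.exception` pattern (the exception class and helper below are stand-ins, not neutron's actual code):

```python
import logging

LOG = logging.getLogger(__name__)

class FloatingIpSetupException(Exception):
    """Stand-in for neutron's exception class."""

def configure_fip_addresses(setup_fips):
    try:
        return setup_fips()
    except Exception:
        # Log the original traceback before re-raising the opaque wrapper,
        # so the root cause ends up in the agent log.
        LOG.exception("L3 agent failure to setup floating IPs")
        raise FloatingIpSetupException("L3 agent failure to setup floating IPs")
```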

** Affects: neutron
 Importance: Low
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: l3-ipam-dhcp

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1545695

Title:
  L3 agent: traceback is suppressed on floating ip setup failure

Status in neutron:
  New

Bug description:
  The following traceback says nothing about the actual exception and makes
it hard to debug issues:

  2016-02-10 05:26:54.025 682 ERROR neutron.agent.l3.router_info [-] L3 agent 
failure to setup floating IPs
  2016-02-10 05:26:54.025 682 TRACE neutron.agent.l3.router_info Traceback 
(most recent call last):
  2016-02-10 05:26:54.025 682 TRACE neutron.agent.l3.router_info File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 604, 
in process_external
  2016-02-10 05:26:54.025 682 TRACE neutron.agent.l3.router_info fip_statuses = 
self.configure_fip_addresses(interface_name)
  2016-02-10 05:26:54.025 682 TRACE neutron.agent.l3.router_info File 
"/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 268, 
in configure_fip_addresses
  2016-02-10 05:26:54.025 682 TRACE neutron.agent.l3.router_info raise 
n_exc.FloatingIpSetupException('L3 agent failure to setup '
  2016-02-10 05:26:54.025 682 TRACE neutron.agent.l3.router_info 
FloatingIpSetupException: L3 agent failure to setup floating IPs

  We need to log the actual exception with its traceback before reraising.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1545695/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1543513] [NEW] Bring back dvr routers autoscheduling

2016-02-09 Thread Oleg Bondarev
Public bug reported:

Commit 1105d732b2cb6ec66d042c85968d47fe6d733f5f disabled auto scheduling for 
dvr routers because of the complexity of DVR scheduling itself, which led to a 
number of logical and DB issues. Now that blueprint 
improve-dvr-l3-agent-binding is merged, DVR scheduling is almost no different 
from legacy scheduling (no extra DVR logic is required for auto scheduling), 
so we can bring auto scheduling for DVR routers back.
This is better for consistency and improves UX.

** Affects: neutron
 Importance: Wishlist
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: l3-dvr-backlog

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1543513

Title:
  Bring back dvr routers autoscheduling

Status in neutron:
  New

Bug description:
  Commit 1105d732b2cb6ec66d042c85968d47fe6d733f5f disabled auto scheduling 
for dvr routers because of the complexity of DVR scheduling itself, which led 
to a number of logical and DB issues. Now that blueprint 
improve-dvr-l3-agent-binding is merged, DVR scheduling is almost no different 
from legacy scheduling (no extra DVR logic is required for auto scheduling), 
so we can bring auto scheduling for DVR routers back.
  This is better for consistency and improves UX.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1543513/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1541348] [NEW] Regression in routers auto scheduling logic

2016-02-03 Thread Oleg Bondarev
Public bug reported:

Routers auto scheduling works when an l3 agent starts and performs a full 
sync with the neutron server. The neutron server looks for all unscheduled 
routers (non-dvr routers only) and schedules them to that agent if applicable.
This was broken by commit 0e97feb0f30bc0ef6f4fe041cb41b7aa81042263, which 
changed the full sync logic a bit: now the l3 agent first requests the ids of 
all routers scheduled to it. get_router_ids() didn't trigger routers auto 
scheduling, which caused the regression.
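The shape of the regression and its fix can be sketched as follows (names are illustrative; the real change was in neutron's RPC handler):

```python
def get_router_ids(host, scheduled, unscheduled, schedule):
    """Full-sync entry point. The regression: this path skipped auto
    scheduling, so unscheduled routers never reached a restarted agent."""
    # The fix: auto-schedule any unscheduled (non-dvr) routers first,
    # then report everything bound to the requesting agent.
    for router_id in list(unscheduled):
        schedule(host, router_id)
    return scheduled.get(host, [])
```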

** Affects: neutron
 Importance: High
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: l3-ipam-dhcp liberty-backport-potential

** Summary changed:

- regression in routers auto scheduling logic
+ Regression in routers auto scheduling logic

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1541348

Title:
  Regression in routers auto scheduling logic

Status in neutron:
  New

Bug description:
  Routers auto scheduling works when an l3 agent starts and performs a full 
sync with the neutron server. The neutron server looks for all unscheduled 
routers (non-dvr routers only) and schedules them to that agent if applicable.
  This was broken by commit 0e97feb0f30bc0ef6f4fe041cb41b7aa81042263, which 
changed the full sync logic a bit: now the l3 agent first requests the ids of 
all routers scheduled to it. get_router_ids() didn't trigger routers auto 
scheduling, which caused the regression.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1541348/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1538163] [NEW] DVR: race in dvr serviceable port deletion

2016-01-26 Thread Oleg Bondarev
Public bug reported:

In the ml2 plugin, when a dvr serviceable port is deleted, we check if any dvr 
routers should be deleted from the port's host. 
This is done prior to the actual port deletion from the db, by checking if 
there are any more dvr serviceable ports on this host.
This is prone to races: if the two last compute ports on the host are deleted 
concurrently, the check might not return any routers, as in both cases it 
will see yet another dvr serviceable port on the host:

- p1 and p2 are last compute ports on compute host 'host1'
- p1 and p2 are on the same subnet connected to a dvr router 'r1'
- p1 and p2 are deleted concurrently
- on p1 deletion plugin checks if there are any more dvr serviceable ports on 
host1 - sees p2 -> no dvr routers should be deleted
- same on p2 deletion plugin checks if there are any more dvr serviceable ports 
on host1 - sees p1 -> no dvr routers should be deleted
- p1 is deleted from DB
- p2 is deleted from DB
- r1 is not deleted from host1 though there are no more ports on it
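The interleaving above can be replayed deterministically in a toy model (port and router names are illustrative, not neutron code):

```python
# Both "any other dvr serviceable ports on host1?" checks run before
# either DB delete, so each deletion still sees the other port.
ports = ["p1", "p2"]          # last two compute ports on host1

def other_ports_on_host(port):
    return [p for p in ports if p != port]

saw_during_p1_delete = other_ports_on_host("p1")  # sees p2 -> keep router
saw_during_p2_delete = other_ports_on_host("p2")  # sees p1 -> keep router
ports.remove("p1")
ports.remove("p2")
# host1 is now empty, yet neither deletion decided to remove r1
```

One way to close the window is to run the check only after the port row is actually gone, or to do check and delete under the same transaction; the report itself stops at describing the race.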

** Affects: neutron
 Importance: Undecided
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: l3-dvr-backlog

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1538163

Title:
  DVR: race in dvr serviceable port deletion

Status in neutron:
  New

Bug description:
  In the ml2 plugin, when a dvr serviceable port is deleted, we check if any 
dvr routers should be deleted from the port's host. 
  This is done prior to the actual port deletion from the db, by checking if 
there are any more dvr serviceable ports on this host.
  This is prone to races: if the two last compute ports on the host are 
deleted concurrently, the check might not return any routers, as in both 
cases it will see yet another dvr serviceable port on the host:

  - p1 and p2 are last compute ports on compute host 'host1'
  - p1 and p2 are on the same subnet connected to a dvr router 'r1'
  - p1 and p2 are deleted concurrently
  - on p1 deletion plugin checks if there are any more dvr serviceable ports on 
host1 - sees p2 -> no dvr routers should be deleted
  - same on p2 deletion plugin checks if there are any more dvr serviceable 
ports on host1 - sees p1 -> no dvr routers should be deleted
  - p1 is deleted from DB
  - p2 is deleted from DB
  - r1 is not deleted from host1 though there are no more ports on it

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1538163/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1536110] [NEW] OVS agent should fail if can't get DVR mac address

2016-01-20 Thread Oleg Bondarev
Public bug reported:

If the ovs agent is configured to run in dvr mode, it has to get its unique
mac address from the server on startup. In case it cannot get it after
several attempts (commit 51303b5fe4785d0cda76f095c95eb4d746d7d783) due
to some error, it falls back to non-dvr mode.

The question is: what is the purpose of the ovs agent running in non-dvr
mode when it was configured for dvr? The server code does not handle the
ovs agent's 'in_distributed_mode' flag in any way and will continue
scheduling dvr routers to such nodes. This may lead to connectivity
issues which are hard to debug.

Example:

2016-01-12 11:29:15.186 16238 WARNING
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_dvr_neutron_agent
[req-e3b3643d-6976-4656-b247-ab291e6a4b27 - - - - -] L2 agent could not
get DVR MAC address at startup due to RPC error.  It happens when the
server does not support this RPC API.  Detailed message: Remote error:
DBConnectionError (_mysql_exceptions.OperationalError) (2013, "Lost
connection to MySQL server at 'reading initial communication packet',
system error: 0")

There were some issues with mysql on startup which led to half of the ovs
agents silently running in non-dvr mode.

The proposal is to fail in case the agent cannot operate in the mode it was
configured for.
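A sketch of the proposed startup behavior, with illustrative names (the real agent retries over RPC; only the fail-instead-of-fallback decision is modeled here):

```python
def get_dvr_mac_or_die(fetch_mac, retries=3):
    """Retry fetching the DVR MAC; if all attempts fail, refuse to start
    rather than silently falling back to non-dvr mode."""
    last_error = None
    for _ in range(retries):
        try:
            return fetch_mac()
        except Exception as exc:
            last_error = exc
    raise SystemExit(
        "cannot get DVR MAC address, refusing to run in non-dvr mode: %s"
        % last_error)
```

Failing fast keeps the scheduler's view consistent: the server never sees an "alive" agent that cannot actually host dvr routers.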

** Affects: neutron
 Importance: Undecided
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: l3-dvr-backlog

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1536110

Title:
  OVS agent should fail if can't get DVR mac address

Status in neutron:
  New

Bug description:
  If the ovs agent is configured to run in dvr mode, it has to get its
  unique mac address from the server on startup. In case it cannot get it
  after several attempts (commit
  51303b5fe4785d0cda76f095c95eb4d746d7d783) due to some error, it falls
  back to non-dvr mode.

  The question is: what is the purpose of the ovs agent running in
  non-dvr mode when it was configured for dvr? The server code does not
  handle the ovs agent's 'in_distributed_mode' flag in any way and will
  continue scheduling dvr routers to such nodes. This may lead to
  connectivity issues which are hard to debug.

  Example:

  2016-01-12 11:29:15.186 16238 WARNING
  neutron.plugins.ml2.drivers.openvswitch.agent.ovs_dvr_neutron_agent
  [req-e3b3643d-6976-4656-b247-ab291e6a4b27 - - - - -] L2 agent could
  not get DVR MAC address at startup due to RPC error.  It happens when
  the server does not support this RPC API.  Detailed message: Remote
  error: DBConnectionError (_mysql_exceptions.OperationalError) (2013,
  "Lost connection to MySQL server at 'reading initial communication
  packet', system error: 0")

  There were some issues with mysql on startup which led to half of the ovs
  agents silently running in non-dvr mode.

  The proposal is to fail in case the agent cannot operate in the mode it
  was configured for.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1536110/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1530179] [NEW] get_subnet_for_dvr() returns wrong gateway mac

2015-12-30 Thread Oleg Bondarev
Public bug reported:

get_subnet_for_dvr should return the proper gateway mac address in order for 
the ovs agent to add proper flows for the dvr interface on br-int.
Commit e82b0e108332964c90e9d2cfaf3d334a92127155 added a 'fixed_ips' parameter 
to the handler to filter the gateway port of the subnet. However, the actual 
filtering was applied improperly, which leads to the wrong gateway mac being 
returned:

        if fixed_ips:
            filter = fixed_ips[0]
        else:
            filter = {'fixed_ips': {'subnet_id': [subnet],
                                    'ip_address':
                                        [subnet_info['gateway_ip']]}}

        internal_gateway_ports = self.plugin.get_ports(
            context, filters=filter)

        internal_port = internal_gateway_ports[0]
        subnet_info['gateway_mac'] = internal_port['mac_address']

get_ports() here actually returns _all_ ports, so the mac address of a random
port is returned as 'gateway_mac'. In most cases this doesn't lead to any
noticeable side effects, but in some cases it may cause very weird behavior.
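A toy model of dict-shaped port filtering (illustrative only; not neutron's actual get_ports implementation) shows why the bare fixed-ip dict matches every port while a properly shaped filters dict picks out only the gateway port:

```python
def get_ports(ports, filters):
    """Only filter keys that are real port attributes constrain the
    result; unknown top-level keys are silently ignored."""
    result = []
    for port in ports:
        ok = True
        for key, wanted in filters.items():
            if key == 'fixed_ips':
                for fkey, values in wanted.items():
                    if not any(ip.get(fkey) in values
                               for ip in port['fixed_ips']):
                        ok = False
            elif key not in port:
                continue  # e.g. 'subnet_id' is not a port attribute
            elif port[key] not in wanted:
                ok = False
        if ok:
            result.append(port)
    return result

ports = [
    {'id': 'gw', 'mac_address': 'fa:16:3f:aa:aa:aa',
     'fixed_ips': [{'subnet_id': 's1', 'ip_address': '10.0.0.1'}]},
    {'id': 'vm', 'mac_address': 'fa:16:3f:bb:bb:bb',
     'fixed_ips': [{'subnet_id': 's1', 'ip_address': '10.0.0.5'}]},
]

# Buggy shape: fixed_ips[0] passed directly -- no key matches a port
# attribute, so every port comes back and ports[0] is arbitrary.
buggy = get_ports(ports, {'subnet_id': 's1', 'ip_address': '10.0.0.1'})

# Proper shape: nested under 'fixed_ips', selecting only the gateway port.
fixed = get_ports(ports, {'fixed_ips': {'subnet_id': ['s1'],
                                        'ip_address': ['10.0.0.1']}})
```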

The case that we faced was:
 root@node-9:~# ovs-ofctl dump-flows br-int
 ...
 cookie=0x971c69a135b8ce1f, duration=23023.412s, table=2, n_packets=1339, 
n_bytes=131234, idle_age=19050, 
priority=4,dl_vlan=3556,dl_dst=fa:16:3e:da:53:f1 
actions=strip_vlan,mod_dl_src:fa:16:3e:2c:24:86,output:6
 cookie=0x971c69a135b8ce1f, duration=31946.414s, table=2, n_packets=25320, 
n_bytes=2481408, idle_age=1, priority=4,dl_vlan=3556,dl_dst=fa:16:3e:2c:24:86 
actions=strip_vlan,mod_dl_src:fa:16:3e:2c:24:86,output:5
 ...

fa:16:3e:2c:24:86 is the mac address of a vm port and it was returned as the
gateway mac due to the bug. This vm was unreachable from other subnets
connected to the same dvr router, while another vm on the same host and the
same subnet was fine. It took a while to find out what was wrong :)

** Affects: neutron
 Importance: Medium
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: l3-dvr-backlog

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1530179

Title:
  get_subnet_for_dvr() returns wrong gateway mac

Status in neutron:
  New

Bug description:
  get_subnet_for_dvr should return the proper gateway mac address in order 
for the ovs agent to add proper flows for the dvr interface on br-int.
  Commit e82b0e108332964c90e9d2cfaf3d334a92127155 added a 'fixed_ips' 
parameter to the handler to filter the gateway port of the subnet. However, 
the actual filtering was applied improperly, which leads to the wrong gateway 
mac being returned:

          if fixed_ips:
              filter = fixed_ips[0]
          else:
              filter = {'fixed_ips': {'subnet_id': [subnet],
                                      'ip_address':
                                          [subnet_info['gateway_ip']]}}

          internal_gateway_ports = self.plugin.get_ports(
              context, filters=filter)

          internal_port = internal_gateway_ports[0]
          subnet_info['gateway_mac'] = internal_port['mac_address']

  get_ports() here actually returns _all_ ports, so the mac address of a
  random port is returned as 'gateway_mac'. In most cases this doesn't
  lead to any noticeable side effects, but in some cases it may cause
  very weird behavior.

  The case that we faced was:
   root@node-9:~# ovs-ofctl dump-flows br-int
   ...
   cookie=0x971c69a135b8ce1f, duration=23023.412s, table=2, n_packets=1339, 
n_bytes=131234, idle_age=19050, 
priority=4,dl_vlan=3556,dl_dst=fa:16:3e:da:53:f1 
actions=strip_vlan,mod_dl_src:fa:16:3e:2c:24:86,output:6
   cookie=0x971c69a135b8ce1f, duration=31946.414s, table=2, n_packets=25320, 
n_bytes=2481408, idle_age=1, priority=4,dl_vlan=3556,dl_dst=fa:16:3e:2c:24:86 
actions=strip_vlan,mod_dl_src:fa:16:3e:2c:24:86,output:5
   ...

  fa:16:3e:2c:24:86 is the mac address of a vm port and it was returned as
  the gateway mac due to the bug. This vm was unreachable from other subnets
  connected to the same dvr router, while another vm on the same host and
  the same subnet was fine. It took a while to find out what was wrong :)

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1530179/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1424096] Re: DVR routers attached to shared networks aren't being unscheduled from a compute node after deleting the VMs using the shared net

2015-12-15 Thread Oleg Bondarev
I faced the bug while reworking unit tests into functional tests: when
performing the steps described in the description I get:
 2015-12-15 17:41:23,484 ERROR [neutron.callbacks.manager] Error during 
notification for neutron.db.l3_dvrscheduler_db._notify_port_delete port, 
after_delete
Traceback (most recent call last):
  File "neutron/callbacks/manager.py", line 141, in _notify_loop
callback(resource, event, trigger, **kwargs)
  File "neutron/db/l3_dvrscheduler_db.py", line 485, in _notify_port_delete
context, router['agent_id'], router['router_id'])
  File "neutron/db/l3_dvrscheduler_db.py", line 439, in 
remove_router_from_l3_agent
router = self.get_router(context, router_id)
  File "neutron/db/l3_db.py", line 451, in get_router
router = self._get_router(context, id)
  File "neutron/db/l3_db.py", line 137, in _get_router
raise l3.RouterNotFound(router_id=router_id)
RouterNotFound: Router 7d52836b-8fe5-4417-842f-3cbe0920c89c could not be 
found

and the router is not removed from the host, which has no more dvr
serviceable ports.

Looks like we also need an admin context in order to remove the admin router
from a host when a non-admin tenant removes the last dvr serviceable port on
a shared network.

** Changed in: neutron
   Status: Fix Released => Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1424096

Title:
  DVR routers attached to shared networks aren't being unscheduled from
  a compute node after deleting the VMs using the shared net

Status in neutron:
  Confirmed
Status in neutron juno series:
  Fix Released
Status in neutron kilo series:
  New

Bug description:
  As the administrator, a DVR router is created and attached to a shared
  network. The administrator also created the shared network.

  As a non-admin tenant, a VM is created with the port using the shared
  network.  The only VM using the shared network is scheduled to a
  compute node.  When the VM is deleted, it is expected the qrouter
  namespace of the DVR router is removed.  But it is not.  This doesn't
  happen with routers attached to networks that are not shared.

  The environment consists of 1 controller node and 1 compute node.

  Routers having the problem are created by the administrator attached
  to shared networks that are also owned by the admin:

  As the administrator, do the following commands on a setup having 1
  compute node and 1 controller node:

  1. neutron net-create shared-net -- --shared True
 Shared net's uuid is f9ccf1f9-aea9-4f72-accc-8a03170fa242.

  2. neutron subnet-create --name shared-subnet shared-net 10.0.0.0/16

  3. neutron router-create shared-router
  Router's UUID is ab78428a-9653-4a7b-98ec-22e1f956f44f.

  4. neutron router-interface-add shared-router shared-subnet
  5. neutron router-gateway-set  shared-router public

  
  As a non-admin tenant (tenant-id: 95cd5d9c61cf45c7bdd4e9ee52659d13), boot a 
VM using the shared-net network:

  1. neutron net-show shared-net
  +-----------------+--------------------------------------+
  | Field           | Value                                |
  +-----------------+--------------------------------------+
  | admin_state_up  | True                                 |
  | id              | f9ccf1f9-aea9-4f72-accc-8a03170fa242 |
  | name            | shared-net                           |
  | router:external | False                                |
  | shared          | True                                 |
  | status          | ACTIVE                               |
  | subnets         | c4fd4279-81a7-40d6-a80b-01e8238c1c2d |
  | tenant_id       | 2a54d6758fab47f4a2508b06284b5104     |
  +-----------------+--------------------------------------+

  At this point, there are no VMs using the shared-net network running
  in the environment.

  2. Boot a VM that uses the shared-net network: nova boot ... --nic 
net-id=f9ccf1f9-aea9-4f72-accc-8a03170fa242 ... vm_sharednet
  3. Assign a floating IP to the VM "vm_sharednet"
  4. Delete "vm_sharednet". On the compute node, the qrouter namespace of the 
shared router (qrouter-ab78428a-9653-4a7b-98ec-22e1f956f44f) is left behind

  stack@DVR-CN2:~/DEVSTACK/manage$ ip netns
  qrouter-ab78428a-9653-4a7b-98ec-22e1f956f44f
   ...

  
  This is consistent with the output of "neutron l3-agent-list-hosting-router" 
command.  It shows the router is still being hosted on the compute node.

  
  $ neutron l3-agent-list-hosting-router ab78428a-9653-4a7b-98ec-22e1f956f44f
  
  +--------------------------------------+----------------+----------------+-------+
  | id                                   | host           | admin_state_up | alive |
  +--------------------------------------+----------------+----------------+-------+
  | 42f12eb0-51bc-4861-928a-48de51ba7ae1 | DVR-Controller | True           | :-)   |
  | 

[Yahoo-eng-team] [Bug 1524908] [NEW] Router may be removed from dvr_snat agent by accident

2015-12-10 Thread Oleg Bondarev
Public bug reported:

This popped up during https://review.openstack.org/#/c/238478
 - when a dvr serviceable port is deleted/migrated, the dvr callback checks 
if there are any more dvr serviceable ports on the host and, if there are 
none, removes the router from the agent on that host
 - in case a dhcp port is deleted/migrated, this may lead to the router being 
deleted from a dvr_snat agent, which includes snat namespace deletion

Need to check the agent mode and only remove the router from dvr agents
running on compute nodes in this case.
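The proposed guard can be sketched as follows (the agent mode strings follow neutron's l3 agent modes, but the helper itself is illustrative):

```python
def maybe_remove_router_from_host(agent_mode, has_other_dvr_ports,
                                  remove_router):
    """Only 'dvr' (compute node) agents should lose the router when the
    last dvr serviceable port on the host goes away."""
    if has_other_dvr_ports:
        return False
    if agent_mode != 'dvr':
        # never tear down a dvr_snat agent's router (and its snat
        # namespace) as a side effect of a dhcp port removal
        return False
    remove_router()
    return True
```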

** Affects: neutron
 Importance: Undecided
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: l3-dvr-backlog

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1524908

Title:
  Router may be removed from dvr_snat agent by accident

Status in neutron:
  New

Bug description:
  This popped up during https://review.openstack.org/#/c/238478
   - when a dvr serviceable port is deleted/migrated, the dvr callback checks 
if there are any more dvr serviceable ports on the host and, if there are 
none, removes the router from the agent on that host
   - in case a dhcp port is deleted/migrated, this may lead to the router 
being deleted from a dvr_snat agent, which includes snat namespace deletion

  Need to check the agent mode and only remove the router from dvr agents
  running on compute nodes in this case.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1524908/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1522824] Re: DVR multinode job: test_shelve_instance failure due to SSHTimeout

2015-12-04 Thread Oleg Bondarev
On second thought, it might not be fair to require nova to wait for
some events from neutron on cleanup. Also, in the case of live migration,
vifs on the source node are deleted after the vm has already been migrated
and the ports are active on the destination node, so neutron will not send
any network-vif-unplugged events in this case. Shelve/unshelve seems to be
a corner case and I'd like to avoid hacks in the vm cleanup logic.

The other idea for the fix (on the neutron side now) would be to change the
port status to something like PENDING_BUILD right after the db update. Nova
will count such ports as non-ACTIVE and will wait for network-vif-plugged
events for them. When the agent requests info for the port, the neutron
server will update the status to BUILD. Later, when the agent reports the
device up, the port will be put back into the ACTIVE state and a
network-vif-plugged event will be sent to nova.
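The proposed status flow can be modeled as a tiny state machine (PENDING_BUILD is the hypothetical new status suggested above, not an existing neutron port status; event names are illustrative):

```python
TRANSITIONS = {
    ('ACTIVE', 'db_update'): 'PENDING_BUILD',
    ('PENDING_BUILD', 'agent_requests_port_info'): 'BUILD',
    ('BUILD', 'agent_reports_device_up'): 'ACTIVE',
}

def next_status(status, event):
    # unknown (status, event) pairs leave the port status unchanged
    return TRANSITIONS.get((status, event), status)
```

Nova would treat anything other than ACTIVE as not-yet-plugged and keep waiting, with the network-vif-plugged event emitted on the BUILD to ACTIVE transition.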

Changing project back to neutron.

** Project changed: nova => neutron

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1522824

Title:
  DVR multinode job: test_shelve_instance failure due to SSHTimeout

Status in neutron:
  New

Bug description:
  gate-tempest-dsvm-neutron-multinode-full fails from time to time due
  to
  tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
  failure:

  Captured traceback:
  2015-12-04 01:17:12.569 | ~~~
  2015-12-04 01:17:12.569 | Traceback (most recent call last):
  2015-12-04 01:17:12.570 |   File "tempest/test.py", line 127, in wrapper
  2015-12-04 01:17:12.570 | return f(self, *func_args, **func_kwargs)
  2015-12-04 01:17:12.570 |   File 
"tempest/scenario/test_shelve_instance.py", line 101, in test_shelve_instance
  2015-12-04 01:17:12.570 | 
self._create_server_then_shelve_and_unshelve()
  2015-12-04 01:17:12.570 |   File 
"tempest/scenario/test_shelve_instance.py", line 93, in 
_create_server_then_shelve_and_unshelve
  2015-12-04 01:17:12.570 | private_key=keypair['private_key'])
  2015-12-04 01:17:12.570 |   File "tempest/scenario/manager.py", line 645, 
in get_timestamp
  2015-12-04 01:17:12.571 | private_key=private_key)
  2015-12-04 01:17:12.571 |   File "tempest/scenario/manager.py", line 383, 
in get_remote_client
  2015-12-04 01:17:12.571 | linux_client.validate_authentication()
  2015-12-04 01:17:12.571 |   File 
"tempest/common/utils/linux/remote_client.py", line 63, in 
validate_authentication
  2015-12-04 01:17:12.571 | self.ssh_client.test_connection_auth()
  2015-12-04 01:17:12.571 |   File 
"/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/tempest_lib/common/ssh.py",
 line 167, in test_connection_auth
  2015-12-04 01:17:12.571 | connection = self._get_ssh_connection()
  2015-12-04 01:17:12.572 |   File 
"/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/tempest_lib/common/ssh.py",
 line 87, in _get_ssh_connection
  2015-12-04 01:17:12.572 | password=self.password)
  2015-12-04 01:17:12.572 | tempest_lib.exceptions.SSHTimeout: Connection 
to the 172.24.5.209 via SSH timed out.
  2015-12-04 01:17:12.572 | User: cirros, Password: None

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1522824/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1522824] [NEW] DVR multinode job: test_shelve_instance failure due to SSHTimeout

2015-12-04 Thread Oleg Bondarev
Public bug reported:

gate-tempest-dsvm-neutron-multinode-full fails from time to time due to
tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
failure:

Captured traceback:
2015-12-04 01:17:12.569 | ~~~
2015-12-04 01:17:12.569 | Traceback (most recent call last):
2015-12-04 01:17:12.570 |   File "tempest/test.py", line 127, in wrapper
2015-12-04 01:17:12.570 | return f(self, *func_args, **func_kwargs)
2015-12-04 01:17:12.570 |   File 
"tempest/scenario/test_shelve_instance.py", line 101, in test_shelve_instance
2015-12-04 01:17:12.570 | self._create_server_then_shelve_and_unshelve()
2015-12-04 01:17:12.570 |   File 
"tempest/scenario/test_shelve_instance.py", line 93, in 
_create_server_then_shelve_and_unshelve
2015-12-04 01:17:12.570 | private_key=keypair['private_key'])
2015-12-04 01:17:12.570 |   File "tempest/scenario/manager.py", line 645, 
in get_timestamp
2015-12-04 01:17:12.571 | private_key=private_key)
2015-12-04 01:17:12.571 |   File "tempest/scenario/manager.py", line 383, 
in get_remote_client
2015-12-04 01:17:12.571 | linux_client.validate_authentication()
2015-12-04 01:17:12.571 |   File 
"tempest/common/utils/linux/remote_client.py", line 63, in 
validate_authentication
2015-12-04 01:17:12.571 | self.ssh_client.test_connection_auth()
2015-12-04 01:17:12.571 |   File 
"/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/tempest_lib/common/ssh.py",
 line 167, in test_connection_auth
2015-12-04 01:17:12.571 | connection = self._get_ssh_connection()
2015-12-04 01:17:12.572 |   File 
"/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/tempest_lib/common/ssh.py",
 line 87, in _get_ssh_connection
2015-12-04 01:17:12.572 | password=self.password)
2015-12-04 01:17:12.572 | tempest_lib.exceptions.SSHTimeout: Connection to 
the 172.24.5.209 via SSH timed out.
2015-12-04 01:17:12.572 | User: cirros, Password: None

** Affects: neutron
 Importance: High
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: l3-dvr-backlog

** Description changed:

  gate-tempest-dsvm-neutron-multinode-full fails from time to time due to
- tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance:
+ tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
+ failure:
  
  Captured traceback:
  2015-12-04 01:17:12.569 | ~~~
  2015-12-04 01:17:12.569 | Traceback (most recent call last):
  2015-12-04 01:17:12.570 |   File "tempest/test.py", line 127, in wrapper
  2015-12-04 01:17:12.570 | return f(self, *func_args, **func_kwargs)
  2015-12-04 01:17:12.570 |   File 
"tempest/scenario/test_shelve_instance.py", line 101, in test_shelve_instance
  2015-12-04 01:17:12.570 | 
self._create_server_then_shelve_and_unshelve()
  2015-12-04 01:17:12.570 |   File 
"tempest/scenario/test_shelve_instance.py", line 93, in 
_create_server_then_shelve_and_unshelve
  2015-12-04 01:17:12.570 | private_key=keypair['private_key'])
  2015-12-04 01:17:12.570 |   File "tempest/scenario/manager.py", line 645, 
in get_timestamp
  2015-12-04 01:17:12.571 | private_key=private_key)
  2015-12-04 01:17:12.571 |   File "tempest/scenario/manager.py", line 383, 
in get_remote_client
  2015-12-04 01:17:12.571 | linux_client.validate_authentication()
  2015-12-04 01:17:12.571 |   File 
"tempest/common/utils/linux/remote_client.py", line 63, in 
validate_authentication
  2015-12-04 01:17:12.571 | self.ssh_client.test_connection_auth()
  2015-12-04 01:17:12.571 |   File 
"/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/tempest_lib/common/ssh.py",
 line 167, in test_connection_auth
  2015-12-04 01:17:12.571 | connection = self._get_ssh_connection()
  2015-12-04 01:17:12.572 |   File 
"/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/tempest_lib/common/ssh.py",
 line 87, in _get_ssh_connection
  2015-12-04 01:17:12.572 | password=self.password)
  2015-12-04 01:17:12.572 | tempest_lib.exceptions.SSHTimeout: Connection 
to the 172.24.5.209 via SSH timed out.
  2015-12-04 01:17:12.572 | User: cirros, Password: None

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1522824

Title:
  DVR multinode job: test_shelve_instance failure due to SSHTimeout

Status in neutron:
  New

Bug description:
  gate-tempest-dsvm-neutron-multinode-full fails from time to time due
  to
  tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
  failure:

  Captured traceback:
  2015-12-04 01:17:12

[Yahoo-eng-team] [Bug 1522824] Re: DVR multinode job: test_shelve_instance failure due to SSHTimeout

2015-12-04 Thread Oleg Bondarev
Changing project to nova due to reasons described in comment #3

** Project changed: neutron => nova

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1522824

Title:
  DVR multinode job: test_shelve_instance failure due to SSHTimeout

Status in OpenStack Compute (nova):
  New

Bug description:
  gate-tempest-dsvm-neutron-multinode-full fails from time to time due
  to
  tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
  failure:

  Captured traceback:
  2015-12-04 01:17:12.569 | ~~~
  2015-12-04 01:17:12.569 | Traceback (most recent call last):
  2015-12-04 01:17:12.570 |   File "tempest/test.py", line 127, in wrapper
  2015-12-04 01:17:12.570 | return f(self, *func_args, **func_kwargs)
  2015-12-04 01:17:12.570 |   File 
"tempest/scenario/test_shelve_instance.py", line 101, in test_shelve_instance
  2015-12-04 01:17:12.570 | 
self._create_server_then_shelve_and_unshelve()
  2015-12-04 01:17:12.570 |   File 
"tempest/scenario/test_shelve_instance.py", line 93, in 
_create_server_then_shelve_and_unshelve
  2015-12-04 01:17:12.570 | private_key=keypair['private_key'])
  2015-12-04 01:17:12.570 |   File "tempest/scenario/manager.py", line 645, 
in get_timestamp
  2015-12-04 01:17:12.571 | private_key=private_key)
  2015-12-04 01:17:12.571 |   File "tempest/scenario/manager.py", line 383, 
in get_remote_client
  2015-12-04 01:17:12.571 | linux_client.validate_authentication()
  2015-12-04 01:17:12.571 |   File 
"tempest/common/utils/linux/remote_client.py", line 63, in 
validate_authentication
  2015-12-04 01:17:12.571 | self.ssh_client.test_connection_auth()
  2015-12-04 01:17:12.571 |   File 
"/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/tempest_lib/common/ssh.py",
 line 167, in test_connection_auth
  2015-12-04 01:17:12.571 | connection = self._get_ssh_connection()
  2015-12-04 01:17:12.572 |   File 
"/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/tempest_lib/common/ssh.py",
 line 87, in _get_ssh_connection
  2015-12-04 01:17:12.572 | password=self.password)
  2015-12-04 01:17:12.572 | tempest_lib.exceptions.SSHTimeout: Connection 
to the 172.24.5.209 via SSH timed out.
  2015-12-04 01:17:12.572 | User: cirros, Password: None

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1522824/+subscriptions



[Yahoo-eng-team] [Bug 1522436] [NEW] No need to autoreschedule routers if l3 agent is back online

2015-12-03 Thread Oleg Bondarev
Public bug reported:

 - in case the l3 agent goes offline, the auto-rescheduling task is triggered 
and starts to reschedule each router from the dead agent one by one
 - if there are a lot of routers scheduled to the agent, rescheduling all of 
them might take some time
 - during that time the agent might get back online
 - currently auto-rescheduling will continue until all routers are 
rescheduled from the (already alive!) agent

The proposal is to skip rescheduling if the agent is back online.
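The proposed check can be sketched roughly as follows (a minimal illustration with hypothetical names, not the actual neutron scheduler API):

```python
# Illustrative sketch: re-check the agent's liveness before each router
# is moved, and stop as soon as the agent reports alive again.

class FakeAgent:
    """Stand-in for an l3 agent record with a liveness flag."""
    def __init__(self, alive=False):
        self.alive = alive

def reschedule_routers_from_down_agent(agent, router_ids, reschedule_one):
    """Reschedule routers one by one; abort early if the agent revives."""
    moved = []
    for router_id in router_ids:
        if agent.alive:
            # Agent came back online mid-loop: leave the rest in place.
            break
        reschedule_one(router_id)
        moved.append(router_id)
    return moved
```

In the real agent the liveness re-check would hit the agent heartbeat table, which is cheap compared to a full router reschedule.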

** Affects: neutron
 Importance: Undecided
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: l3-ipam-dhcp

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1522436

Title:
  No need to autoreschedule routers if l3 agent is back online

Status in neutron:
  New

Bug description:
   - in case the l3 agent goes offline, the auto-rescheduling task is triggered 
and starts to reschedule each router from the dead agent one by one
   - if there are a lot of routers scheduled to the agent, rescheduling all of 
them might take some time
   - during that time the agent might get back online
   - currently auto-rescheduling will continue until all routers are 
rescheduled from the (already alive!) agent

  The proposal is to skip rescheduling if the agent is back online.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1522436/+subscriptions



[Yahoo-eng-team] [Bug 1521524] [NEW] With DVR enabled instances sometimes fail to get metadata

2015-12-01 Thread Oleg Bondarev
Public bug reported:

Rally scenario which creates VMs with floating IPs at a high rate
sometimes fails with SSHTimeout when trying to connect to the VM by
floating IP. At the same time pings to the VM are fine.

It appeared that VMs may sometimes fail to get the public key from metadata.
That happens because the metadata proxy process was started after the VM booted.

Further analysis showed that the l3 agent on the compute node was not notified
about the new VM port at the time this port was created.

** Affects: neutron
 Importance: High
 Assignee: Oleg Bondarev (obondarev)
 Status: In Progress


** Tags: l3-dvr-backlog liberty-backport-potential

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1521524

Title:
  With DVR enabled instances sometimes fail to get metadata

Status in neutron:
  In Progress

Bug description:
  Rally scenario which creates VMs with floating IPs at a high rate
  sometimes fails with SSHTimeout when trying to connect to the VM by
  floating IP. At the same time pings to the VM are fine.

  It appeared that VMs may sometimes fail to get the public key from
  metadata. That happens because the metadata proxy process was started
  after the VM booted.

  Further analysis showed that the l3 agent on the compute node was not
  notified about the new VM port at the time this port was created.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1521524/+subscriptions



[Yahoo-eng-team] [Bug 1414559] Re: OVS drops RARP packets by QEMU upon live-migration - VM temporarily disconnected

2015-11-19 Thread Oleg Bondarev
Nova patch: https://review.openstack.org/246910/

** Also affects: nova
   Importance: Undecided
   Status: New

** Changed in: nova
 Assignee: (unassigned) => Oleg Bondarev (obondarev)

** Changed in: nova
   Status: New => In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1414559

Title:
  OVS drops RARP packets by QEMU upon live-migration - VM temporarily
  disconnected

Status in neutron:
  In Progress
Status in OpenStack Compute (nova):
  In Progress

Bug description:
  When live-migrating a VM, QEMU sends 5 RARP packets in order to allow 
re-learning of the new location of the VM's MAC address.
  However the VIF creation scheme between nova-compute and neutron-ovs-agent 
drops these RARPs:
  1. nova creates a port on OVS but without the internal tagging. 
  2. At this stage all the packets that come out from the VM, or QEMU process 
it runs in, will be dropped.
  3. The QEMU sends five RARP packets in order to allow MAC learning. These 
packets are dropped as described in #2.
  4. Meanwhile, neutron-ovs-agent loops every POLLING_INTERVAL and scans 
for new ports. Once it detects that a new port was added, it will read the 
properties of the new port and assign the correct internal tag, which will 
allow connection of the VM.

  The flow above suggests that:
  1. RARP packets are dropped, so MAC learning takes much longer and depends on 
internal traffic and advertising by the VM.
  2. VM is disconnected from the network for a mean period of POLLING_INTERVAL/2

  Seems like this could be solved by direct messages between nova vif
  driver and neutron-ovs-agent

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1414559/+subscriptions



[Yahoo-eng-team] [Bug 1509295] [NEW] L3: agent may do double work upon start/resync

2015-10-23 Thread Oleg Bondarev
Public bug reported:

The issue was noticed during scale testing of DVR.
When the l3 agent starts up it initiates a full sync with the neutron server: it 
requests full info about all the routers scheduled to it. At the same time the 
agent may receive various notifications (router_added/updated/deleted) which 
were sent while the agent was offline or starting up. For each such 
notification the agent will request router info again, so the server has to 
process it twice (the first time for the resync request). 

The following optimization makes sense: when the agent is about to fullsync
we can skip all router notifications, since the fullsync should bring the
agent up to date anyway.
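The gist of the optimization can be sketched like this (a toy illustration with invented names, not actual l3-agent code):

```python
# Illustrative sketch: while a fullsync is pending, per-router
# notifications are dropped, since the fullsync fetches the complete
# router state from the server anyway.

class NotificationFilter:
    def __init__(self):
        self.fullsync = True           # set at startup and on resync
        self.pending_router_ids = set()

    def router_notification(self, router_id):
        """Return True if the notification needs individual processing."""
        if self.fullsync:
            # Fullsync will bring this router up to date; skip the extra
            # per-router round-trip to the server.
            return False
        self.pending_router_ids.add(router_id)
        return True
```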

** Affects: neutron
 Importance: Undecided
 Assignee: Oleg Bondarev (obondarev)
 Status: In Progress


** Tags: l3-ipam-dhcp loadimpact

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1509295

Title:
  L3: agent may do double work upon start/resync

Status in neutron:
  In Progress

Bug description:
  The issue was noticed during scale testing of DVR.
  When the l3 agent starts up it initiates a full sync with the neutron server: 
it requests full info about all the routers scheduled to it. At the same time 
the agent may receive various notifications (router_added/updated/deleted) 
which were sent while the agent was offline or starting up. For each such 
notification the agent will request router info again, so the server has to 
process it twice (the first time for the resync request). 

  The following optimization makes sense: when the agent is about to
  fullsync we can skip all router notifications, since the fullsync should
  bring the agent up to date anyway.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1509295/+subscriptions



[Yahoo-eng-team] [Bug 1508869] [NEW] DVR: handle dvr serviceable port's host change

2015-10-22 Thread Oleg Bondarev
Public bug reported:

When a VM port's host is changed we need to check whether the router should be 
unscheduled from the old host and send the corresponding notifications.
Commit d5a8074ec3c67ed68e64a96827da990f1c34e10f added such a check for when a 
port is unbound. We need to add a similar check for the host-change case.
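A rough sketch of the missing check (helper names here are illustrative, not the real plugin API): treat a change of binding:host_id like an unbind on the old host.

```python
# Illustrative sketch: on a port update, run the same dvr unschedule
# check that the unbind path runs, but against the port's previous host.

def check_unschedule_on_port_update(original_port, updated_port,
                                    unschedule_from_host):
    """Run the dvr unschedule check against the port's previous host."""
    old_host = original_port.get("binding:host_id")
    new_host = updated_port.get("binding:host_id")
    if old_host and old_host != new_host:
        # Same handling as for an unbound port, applied to the host the
        # port just left (hypothetical callback stands in for the real
        # scheduler call).
        unschedule_from_host(old_host)
        return True
    return False
```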

** Affects: neutron
 Importance: Undecided
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: l3-dvr-backlog

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1508869

Title:
  DVR: handle dvr serviceable port's host change

Status in neutron:
  New

Bug description:
  When a VM port's host is changed we need to check if router should be 
unscheduled from old host and send corresponding notifications.
  commit d5a8074ec3c67ed68e64a96827da990f1c34e10f added such a check when port 
is unbound. Need to add similar check in case of host change.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1508869/+subscriptions



[Yahoo-eng-team] [Bug 1505557] [NEW] L3 agent not always properly update floatingip status on server

2015-10-13 Thread Oleg Bondarev
Public bug reported:

commit c44506bfd60b2dd6036e113464f1ea682cfaeb6c introduced an
optimization to not send a floating ip status update when the status didn't
change: so if the server returned the floating ip as ACTIVE we don't need to
update its status after successful processing.

This might be wrong in the DVR case: when a floatingip's associated fixed port 
is moved from one host to another, the notification is sent to both l3 agents 
on the compute nodes (old and new). Here is what happens next:
 - old agent receives notification and requests router info from server
 - same for new agent
 - server returns router info without floating ip to old agent
 - server returns router info with floating ip to new agent. The status of 
floating ip is ACTIVE.
 - old agent removes floating ip and sends status update so server puts 
floatingip to DOWN state
 - new agent adds the floatingip and doesn't send a status update since it 
didn't change from the agent's point of view
 - the floating ip stays in the DOWN state though it's actually active

The fix would be to always update the status of a floating ip if the agent
actually applies it.
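The fix amounts to something like the following (a minimal sketch with invented names, not the real agent code):

```python
# Illustrative sketch: report the status of every floating IP the agent
# actually configured, instead of skipping IPs whose server-side status
# already reads ACTIVE (which loses the old-agent/new-agent race).

def report_applied_fip_statuses(applied_fip_ids, server_statuses,
                                send_status_update):
    """Push a status update for each floating IP the agent applied."""
    for fip_id in applied_fip_ids:
        # server_statuses is what the old optimization consulted; the
        # fix deliberately ignores it and always reports ACTIVE for
        # successfully applied floating IPs.
        send_status_update(fip_id, "ACTIVE")
```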

** Affects: neutron
 Importance: Undecided
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: l3-dvr-backlog

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1505557

Title:
  L3 agent not always properly update floatingip status on server

Status in neutron:
  New

Bug description:
  commit c44506bfd60b2dd6036e113464f1ea682cfaeb6c introduced an
  optimization to not send a floating ip status update when the status didn't
  change: so if the server returned the floating ip as ACTIVE we don't need to
  update its status after successful processing.

  This might be wrong in the DVR case: when a floatingip's associated fixed 
port is moved from one host to another, the notification is sent to both l3 
agents on the compute nodes (old and new). Here is what happens next:
   - old agent receives notification and requests router info from server
   - same for new agent
   - server returns router info without floating ip to old agent
   - server returns router info with floating ip to new agent. The status of 
floating ip is ACTIVE.
   - old agent removes floating ip and sends status update so server puts 
floatingip to DOWN state
   - new agent adds the floatingip and doesn't send a status update since it 
didn't change from the agent's point of view
   - the floating ip stays in the DOWN state though it's actually active

  The fix would be to always update the status of a floating ip if the agent
  actually applies it.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1505557/+subscriptions



[Yahoo-eng-team] [Bug 1505661] [NEW] RetryRequest failure on create_security_group_bulk

2015-10-13 Thread Oleg Bondarev
Public bug reported:

<163>Oct  5 09:10:29 node-203 neutron-server 2015-10-05 09:10:29.831 34082 
ERROR neutron.api.v2.resource [req-ea0e5480-e8ec-4014-9015-2199424f54bc ] 
create failed
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource Traceback (most 
recent call last):
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource   File 
"/usr/lib/python2.7/dist-packages/neutron/api/v2/resource.py", line 83, in 
resource
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource result = 
method(request=request, **args)
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource   File 
"/usr/lib/python2.7/dist-packages/oslo_db/api.py", line 131, in wrapper
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource return f(*args, 
**kwargs)
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource   File 
"/usr/lib/python2.7/dist-packages/neutron/api/v2/base.py", line 448, in create
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource objs = 
obj_creator(request.context, body, **kwargs)
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource   File 
"/usr/lib/python2.7/dist-packages/neutron/db/securitygroups_db.py", line 123, 
in create_security_group_bulk
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource 
security_group_rule)
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource   File 
"/usr/lib/python2.7/dist-packages/neutron/db/db_base_plugin_v2.py", line 954, 
in _create_bulk
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource {'resource': 
resource, 'item': item})
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource   File 
"/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 85, in __exit__
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource 
six.reraise(self.type_, self.value, self.tb)
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource   File 
"/usr/lib/python2.7/dist-packages/neutron/db/db_base_plugin_v2.py", line 947, 
in _create_bulk
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource 
objects.append(obj_creator(context, item))
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource   File 
"/usr/lib/python2.7/dist-packages/neutron/db/securitygroups_db.py", line 150, 
in create_security_group
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource 
self._ensure_default_security_group(context, tenant_id)
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource   File 
"/usr/lib/python2.7/dist-packages/neutron/db/securitygroups_db.py", line 663, 
in _ensure_default_security_group
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource raise 
db_exc.RetryRequest(ex)
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource RetryRequest
2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource

<167>Oct  5 09:10:29 node-203 neutron-server 2015-10-05 09:10:29.820 34082 
DEBUG neutron.db.securitygroups_db [req-ea0e5480-e8ec-4014-9015-2199424f54bc ] 
Duplicate default security group 9839de92fb8049598f1c3ea8f32b9cf9 was not 
created _
ensure_default_security_group 
/usr/lib/python2.7/dist-packages/neutron/db/securitygroups_db.py:679
<163>Oct  5 09:10:29 node-203 neutron-server 2015-10-05 09:10:29.831 34082 
ERROR neutron.db.db_base_plugin_v2 [req-ea0e5480-e8ec-4014-9015-2199424f54bc ] 
An exception occurred while creating the security_group:{u'security_group': 
{'tenan
t_id': u'9839de92fb8049598f1c3ea8f32b9cf9', u'name': 
u'rally_neutronsecgrp_F44SF1uvTciIQJlu', u'description': u'Rally SG'}}

** Affects: neutron
 Importance: Undecided
 Assignee: Oleg Bondarev (obondarev)
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1505661

Title:
  RetryRequest failure on create_security_group_bulk

Status in neutron:
  New

Bug description:
  <163>Oct  5 09:10:29 node-203 neutron-server 2015-10-05 09:10:29.831 34082 
ERROR neutron.api.v2.resource [req-ea0e5480-e8ec-4014-9015-2199424f54bc ] 
create failed
  2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource Traceback (most 
recent call last):
  2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource   File 
"/usr/lib/python2.7/dist-packages/neutron/api/v2/resource.py", line 83, in 
resource
  2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource result = 
method(request=request, **args)
  2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource   File 
"/usr/lib/python2.7/dist-packages/oslo_db/api.py", line 131, in wrapper
  2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource return 
f(*args, **kwargs)
  2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.resource   File 
"/usr/lib/python2.7/dist-packages/neutron/api/v2/base.py", line 448, in create
  2015-10-05 09:10:29.831 34082 TRACE neutron.api.v2.r

[Yahoo-eng-team] [Bug 1505282] [NEW] L3 agent: explicit call to resync on init may lead to double syncing

2015-10-12 Thread Oleg Bondarev
Public bug reported:

Currently L3 agent has an explicit call to self.periodic_sync_routers_task() 
after initialization. 
Given that periodic job spacing is set to 1 second, this may lead to double 
syncing with server on initialization (especially if there are a lot of routers 
scheduled to the agent):
 - agent starts, fullsync flag is True
 - periodic_sync_routers_task is called from after_start(), agent requests 
router info from server, fullsync flag is True
 - periodic_sync_routers_task is called by periodic task framework, fullsync 
flag is still True, agent requests router info from server once again.
So it's double work on both server and agent sides which might be quite 
expensive at scale.

The proposal is to just use the run_immediately parameter.
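Schematically the change looks like this (the real decorator lives in oslo_service.periodic_task and accepts spacing and run_immediately; the toy version below reimplements just enough of it to show the idea):

```python
# Toy stand-in for the oslo.service periodic_task decorator: it only
# records the scheduling metadata on the function, which is what the
# periodic task framework inspects.

def periodic_task(spacing=1, run_immediately=False):
    def wrap(fn):
        fn._periodic_spacing = spacing
        fn._periodic_run_immediately = run_immediately
        return fn
    return wrap

class L3AgentSketch:
    def __init__(self):
        self.sync_calls = 0

    @periodic_task(spacing=1, run_immediately=True)
    def periodic_sync_routers_task(self):
        # With run_immediately=True the framework runs the task on its
        # first tick, so no extra explicit call from after_start() is
        # needed and the sync runs exactly once at startup.
        self.sync_calls += 1
```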

** Affects: neutron
 Importance: Undecided
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: l3-ipam-dhcp

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1505282

Title:
  L3 agent: explicit call to resync on init may lead to double syncing

Status in neutron:
  New

Bug description:
  Currently L3 agent has an explicit call to self.periodic_sync_routers_task() 
after initialization. 
  Given that periodic job spacing is set to 1 second, this may lead to double 
syncing with server on initialization (especially if there are a lot of routers 
scheduled to the agent):
   - agent starts, fullsync flag is True
   - periodic_sync_routers_task is called from after_start(), agent requests 
router info from server, fullsync flag is True
   - periodic_sync_routers_task is called by periodic task framework, fullsync 
flag is still True, agent requests router info from server once again.
  So it's double work on both server and agent sides which might be quite 
expensive at scale.

  The proposal is to just use the run_immediately parameter.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1505282/+subscriptions



[Yahoo-eng-team] [Bug 1494157] [NEW] Regression: ObjectDeletedError on network delete

2015-09-10 Thread Oleg Bondarev
251 10128 TRACE neutron.api.v2.resource File 
"/usr/lib/python2.7/dist-packages/sqlalchemy/orm/loading.py", line 614, in 
load_scalar_attributes
2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource raise 
orm_exc.ObjectDeletedError(state)
2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource ObjectDeletedError: 
Instance '' has been deleted, or its row is otherwise 
not present.
2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource

** Affects: neutron
 Importance: Undecided
 Assignee: Oleg Bondarev (obondarev)
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1494157

Title:
  Regression: ObjectDeletedError on network delete

Status in neutron:
  New

Bug description:
  Exception is raised when deleting network ports:

  2015-09-09T01:24:36.253938+00:00 err: 2015-09-09 01:24:36.251 10128 ERROR 
neutron.api.v2.resource [req-81135bfb-f40b-41ee-b6ce-279eafba97dd ] delete 
failed
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource Traceback (most 
recent call last):
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource File 
"/usr/lib/python2.7/dist-packages/neutron/api/v2/resource.py", line 83, in 
resource
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource result = 
method(request=request, **args)
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource File 
"/usr/lib/python2.7/dist-packages/oslo_db/api.py", line 131, in wrapper
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource return f(*args, 
**kwargs)
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource File 
"/usr/lib/python2.7/dist-packages/neutron/api/v2/base.py", line 495, in delete
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource 
obj_deleter(request.context, id, **kwargs)
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource File 
"/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py", line 780, in 
delete_network
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource 
self._delete_ports(context, port_ids)
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource File 
"/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py", line 693, in 
_delete_ports
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource port_id)
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource File 
"/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 85, in __exit__
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource     six.reraise(self.type_, self.value, self.tb)
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py", line 685, in _delete_ports
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource     self.delete_port(context, port_id)
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py", line 1292, in delete_port
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource     super(Ml2Plugin, self).delete_port(context, id)
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron/db/db_base_plugin_v2.py", line 1915, in delete_port
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource     self._delete_port(context, id)
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron/db/db_base_plugin_v2.py", line 1938, in _delete_port
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource     query.delete()
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2670, in delete
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource     delete_op.exec_()
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 896, in exec_
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource     self._do_pre_synchronize()
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 958, in _do_pre_synchronize
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource     eval_condition(obj)]
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/evaluator.py", line 115, in evaluate
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource     left_val = eval_left(obj)
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/evaluator.py", line 72, in <lambda>
  2015-09-09 01:24:36.251 10128 TRACE neutron.api.v2.resource     return lamb

[Yahoo-eng-team] [Bug 1491922] [NEW] ovs agent doesn't configure new ovs-port for an instance

2015-09-03 Thread Oleg Bondarev
Public bug reported:

In case of massive resource deletion (networks, ports) it may take the agent quite a long time to process.
Port deletion processing happens during the ovs agent's periodic task. It takes the agent ~0.25s to process one port deletion.
From the attached log we can see that on a certain iteration the agent had to process the deletion of 1625 ports:
 1625 * 0.25 = 406 seconds.
Indeed:

 2015-08-29 09:13:46.004 21292 DEBUG
neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-
55e0a577-e03b-4476-9bdd-f5480cfef966 ] Agent rpc_loop - iteration:25863
- starting polling. Elapsed:0.047 rpc_loop /usr/lib/python2.7/dist-
packages/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py:1733

 ... (ports deletion handling)

 2015-08-29 09:20:28.569 21292 DEBUG 
neutron.plugins.openvswitch.agent.ovs_neutron_agent 
[req-55e0a577-e03b-4476-9bdd-f5480cfef966 ] Agent rpc_loop - iteration:25863 - 
port information retrieved. Elapsed:402.612 rpc_loop 
/usr/lib/python2.7/dist-packages/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py:1748
 ... (from here agent starts processing new ports)

402 seconds is not acceptable: Nova waits for 300 seconds by default and then fails with a vif plugging timeout.
From the log we can also see that a new ovs port appeared while the agent was busy handling port deletions:

2015-08-29 09:13:52.432 21292 DEBUG neutron.agent.linux.ovsdb_monitor [-] 
Output received from ovsdb monitor: 
{"data":[["8fd481a4-1267-445b-bedc-f1f6b3a47898","old",null,["set",[]]],["","new","qvoced59c11-1b",76]],"headings":["row","action","name","ofport"]}
 _read_stdout 
/usr/lib/python2.7/dist-packages/neutron/agent/linux/ovsdb_monitor.py:44

Port deletion handling needs to be optimised on the agent side.
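To illustrate the optimisation being proposed, here is a minimal sketch (not the actual agent code; function names are made up): instead of invoking the expensive per-port handler once per deleted port (~0.25 s apiece in the reported log, so ~406 s for 1625 ports), the agent could collect the whole batch and perform a single bulk cleanup per rpc_loop iteration.

```python
def process_deleted_ports_naive(deleted_ports, handle_one):
    # one expensive call per port: 1625 * 0.25 s ~= 406 s per iteration
    for port in deleted_ports:
        handle_one(port)


def process_deleted_ports_batched(deleted_ports, handle_batch):
    # one bulk operation per iteration (e.g. a single ovs-vsctl run with
    # many --del-port arguments) amortises the per-call overhead, keeping
    # the loop time roughly constant regardless of batch size
    if deleted_ports:
        handle_batch(list(deleted_ports))
```

With batching, new port plugging events in the same iteration are no longer starved behind hundreds of serialized deletions.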

** Affects: neutron
 Importance: Undecided
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: ovs

** Attachment added: "ovs-agent.log.gz"
   
https://bugs.launchpad.net/bugs/1491922/+attachment/4456894/+files/ovs-agent.log.gz

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1491922

Title:
  ovs agent doesn't configure new ovs-port for an instance

Status in neutron:
  New

Bug description:
  In case of massive resource deletion (networks, ports) it may take the agent quite a long time to process.
  Port deletion processing happens during the ovs agent's periodic task. It takes the agent ~0.25s to process one port deletion.
  From the attached log we can see that on a certain iteration the agent had to process the deletion of 1625 ports:
   1625 * 0.25 = 406 seconds.
  Indeed:

   2015-08-29 09:13:46.004 21292 DEBUG
  neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-
  55e0a577-e03b-4476-9bdd-f5480cfef966 ] Agent rpc_loop -
  iteration:25863 - starting polling. Elapsed:0.047 rpc_loop
  /usr/lib/python2.7/dist-
  packages/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py:1733

   ... (ports deletion handling)

   2015-08-29 09:20:28.569 21292 DEBUG 
neutron.plugins.openvswitch.agent.ovs_neutron_agent 
[req-55e0a577-e03b-4476-9bdd-f5480cfef966 ] Agent rpc_loop - iteration:25863 - 
port information retrieved. Elapsed:402.612 rpc_loop 
/usr/lib/python2.7/dist-packages/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py:1748
   ... (from here agent starts processing new ports)

  402 seconds is not acceptable: Nova waits for 300 seconds by default and then fails with a vif plugging timeout.
  From the log we can also see that a new ovs port appeared while the agent was busy handling port deletions:

  2015-08-29 09:13:52.432 21292 DEBUG neutron.agent.linux.ovsdb_monitor [-] 
Output received from ovsdb monitor: 
{"data":[["8fd481a4-1267-445b-bedc-f1f6b3a47898","old",null,["set",[]]],["","new","qvoced59c11-1b",76]],"headings":["row","action","name","ofport"]}
   _read_stdout 
/usr/lib/python2.7/dist-packages/neutron/agent/linux/ovsdb_monitor.py:44

  Port deletion handling needs to be optimised on the agent side.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1491922/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1428713] Re: migrate non-dvr to dvr case, snat netns not created

2015-08-18 Thread Oleg Bondarev
I think we need to add explicit validation that the router has been set to admin state down prior to the migration.
This should eliminate the confusion.
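A minimal sketch of the validation proposed above (hypothetical helper, not actual neutron code): reject a legacy-to-DVR migration unless the router was first set admin-down.

```python
def validate_dvr_migration(router):
    # router is assumed to be a dict-like view of the router's fields
    if router.get("distributed"):
        return  # already distributed, nothing to migrate
    if router.get("admin_state_up", True):
        raise ValueError(
            "router must be set to admin_state_up=False before "
            "migrating to distributed")
```

Failing fast here would replace the silent missing-snat-namespace behaviour with a clear error the operator can act on.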

** Changed in: neutron
   Status: Invalid => Triaged

** Changed in: neutron
 Assignee: ZongKai LI (lzklibj) => Oleg Bondarev (obondarev)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1428713

Title:
  migrate non-dvr to dvr case, snat netns not created

Status in neutron:
  In Progress

Bug description:
  On a 1+2 env, the router has an external network attached.
  Use the following steps to migrate from non-dvr to dvr:
  1) modify related config files.
  2) restart related services.
  3) run command neutron router-update --distributed=True ROUTER.

  Now there is no snat-* netns created on the controller node.
  As a workaround, restarting neutron-l3-agent on the controller node works.

  And in l3-agent.log, we can find:
  2015-02-28 01:26:21.377 5283 ERROR neutron.agent.l3.agent [-] 'LegacyRouter' object has no attribute 'dist_fip_count'
  2015-02-28 01:26:21.377 5283 TRACE neutron.agent.l3.agent Traceback (most recent call last):
  2015-02-28 01:26:21.377 5283 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 342, in call
  2015-02-28 01:26:21.377 5283 TRACE neutron.agent.l3.agent     return func(*args, **kwargs)
  2015-02-28 01:26:21.377 5283 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 592, in process_router
  2015-02-28 01:26:21.377 5283 TRACE neutron.agent.l3.agent     self.scan_fip_ports(ri)
  2015-02-28 01:26:21.377 5283 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/dvr.py", line 128, in scan_fip_ports
  2015-02-28 01:26:21.377 5283 TRACE neutron.agent.l3.agent     if not ri.router.get('distributed') or ri.dist_fip_count is not None:
  2015-02-28 01:26:21.377 5283 TRACE neutron.agent.l3.agent AttributeError: 'LegacyRouter' object has no attribute 'dist_fip_count'

  It seems current code is not ready to migrate LegacyRouter to
  DvrRouter.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1428713/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1447397] Re: Removing one interface from a Router, deletes the qrouter namespace

2015-08-12 Thread Oleg Bondarev
*** This bug is a duplicate of bug 1443524 ***
https://bugs.launchpad.net/bugs/1443524

** This bug is no longer a duplicate of bug 1443596
   Removing an interface from a DVR router removes all SNAT ports of all 
connected subnets
** This bug has been marked a duplicate of bug 1443524
   Removing an interface by port from a DVR router deletes all SNAT ports

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1447397

Title:
  Removing one interface from a Router, deletes the qrouter namespace

Status in neutron:
  Confirmed

Bug description:
  In the DVR mode, when an interface is removed from the router, it
  deletes the qrouter namespace itself.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1447397/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1484135] [NEW] DetachedInstanceError on network delete

2015-08-12 Thread Oleg Bondarev
Public bug reported:

DetachedInstanceError occurs when logging that a dhcp port was deleted
concurrently: the db object is accessed after it was already expunged
from the session. Code in question:

    def _delete_ports(self, context, ports):
        for port in ports:
            try:
                self.delete_port(context, port.id)
            except (exc.PortNotFound, sa_exc.ObjectDeletedError):
                context.session.expunge(port)
                # concurrent port deletion can be performed by
                # release_dhcp_port caused by concurrent subnet_delete
                LOG.info(_LI("Port %s was deleted concurrently"), port.id)

Traceback:

2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource Traceback (most recent call last):
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron/api/v2/resource.py", line 83, in resource
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource     result = method(request=request, **args)
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron/api/v2/base.py", line 490, in delete
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource     obj_deleter(request.context, id, **kwargs)
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py", line 775, in delete_network
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource     self._delete_ports(context, ports)
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py", line 686, in _delete_ports
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource     LOG.info(_LI("Port %s was deleted concurrently"), port.id)
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/attributes.py", line 239, in __get__
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource     return self.impl.get(instance_state(instance), dict_)
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/attributes.py", line 589, in get
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource     value = callable_(state, passive)
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/state.py", line 424, in __call__
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource     self.manager.deferred_scalar_loader(self, toload)
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/loading.py", line 563, in load_scalar_attributes
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource     (state_str(state)))
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource DetachedInstanceError: Instance <Port at 0x7f8f7d544dd0> is not bound to a Session; attribute refresh operation cannot proceed
2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource
2015-08-12T09:26:42.990805+00:00 info:  2015-08-12 09:26:42.987 4250 INFO neutron.wsgi [req-2bbc2b06-40f1-41e7-a230-3026ea94414d ] 10.109.2.3 - - [12/Aug/2015 09:26:42] "DELETE /v2.0/networks/a3322fce-2fc9-4be3-88d7-ba1d4f4294df.json HTTP/1.1" 500 378 0.938119

** Affects: neutron
 Importance: High
 Assignee: Oleg Bondarev (obondarev)
 Status: Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1484135

Title:
  DetachedInstanceError on network delete

Status in neutron:
  Confirmed

Bug description:
  DetachedInstanceError occurs when logging that a dhcp port was deleted
  concurrently: the db object is accessed after it was already expunged
  from the session. Code in question:

      def _delete_ports(self, context, ports):
          for port in ports:
              try:
                  self.delete_port(context, port.id)
              except (exc.PortNotFound, sa_exc.ObjectDeletedError):
                  context.session.expunge(port)
                  # concurrent port deletion can be performed by
                  # release_dhcp_port caused by concurrent subnet_delete
                  LOG.info(_LI("Port %s was deleted concurrently"), port.id)

  Traceback:

  2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource Traceback (most recent call last):
  2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron/api/v2/resource.py", line 83, in resource
  2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource     result = method(request=request, **args)
  2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/dist-packages/neutron/api/v2/base.py", line 490, in delete
  2015-08-12 09:26:42.976 4250 TRACE neutron.api.v2.resource     obj_deleter(request.context, id, **kwargs)
  2015-08-12 09

[Yahoo-eng-team] [Bug 1482630] [NEW] Router resources lost after rescheduling

2015-08-07 Thread Oleg Bondarev
Public bug reported:

Currently router_added_to_agent (and other) notifications are sent to
agents with an RPC cast() method, which does not ensure that the message
is actually delivered to the recipient. If the message is lost (for
example due to instability of the messaging system during failover
scenarios), neither server nor agent will be aware of it, and the router
namespace will not be created by the hosting agent until the next resync.
A resync will only happen in case of errors on the agent side or on
restart, which might take quite a long time.

The proposal is to use an RPC call() to notify agents about added
routers, thus ensuring no routers are lost by agents.
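The difference can be shown with a toy sketch (the class and method names are illustrative, not the real oslo.messaging API surface): cast() is fire-and-forget, so a dropped router_added_to_agent message vanishes silently, while call() waits for a reply, so the server observes the failure and can retry or reschedule.

```python
class Agent:
    def __init__(self):
        self.routers = set()

    def router_added_to_agent(self, router_id):
        self.routers.add(router_id)
        return "ack"


class LossyTransport:
    """Simulates a transport that may drop messages during failover."""

    def __init__(self, agent, drop):
        self.agent = agent
        self.drop = drop

    def cast(self, method, **kwargs):
        if not self.drop:  # a lost cast disappears without a trace
            getattr(self.agent, method)(**kwargs)

    def call(self, method, **kwargs):
        if self.drop:      # loss surfaces to the caller as a timeout
            raise TimeoutError("no reply from agent")
        return getattr(self.agent, method)(**kwargs)
```

With cast() the server believes the router was scheduled while the agent never heard about it; with call() the TimeoutError gives the server a hook to resend the notification.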

** Affects: neutron
 Importance: Undecided
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: l3-ipam-dhcp

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1482630

Title:
  Router resources lost after rescheduling

Status in neutron:
  New

Bug description:
  Currently router_added_to_agent (and other) notifications are sent to
  agents with an RPC cast() method, which does not ensure that the
  message is actually delivered to the recipient. If the message is lost
  (for example due to instability of the messaging system during
  failover scenarios), neither server nor agent will be aware of it, and
  the router namespace will not be created by the hosting agent until
  the next resync. A resync will only happen in case of errors on the
  agent side or on restart, which might take quite a long time.

  The proposal is to use an RPC call() to notify agents about added
  routers, thus ensuring no routers are lost by agents.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1482630/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp

