[Yahoo-eng-team] [Bug 1787908] Re: ARP_spoofing on linuxbridge ml2

2018-10-22 Thread Launchpad Bug Tracker
[Expired for neutron because there has been no activity for 60 days.]

** Changed in: neutron
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1787908

Title:
  ARP_spoofing on linuxbridge ml2

Status in neutron:
  Expired

Bug description:
  Release:  Pike
  Environment:  Ubuntu 16.04
  Neutron ML2 Plugin:  Linux Bridge Vlan
  Problem:  cannot turn off arp_spoofing on linuxbridge ml2

  Background:
  According to the Pike release notes, the linuxbridge agent parameter
  “prevent_arp_spoofing” has been deprecated. Instead, in order to disable
  ARP spoofing protection (ebtables) on a port, the port’s attribute
  “port_security_enabled” should be set to false and no security group may be
  on that port. Further, the Pike documentation says that the network (and
  port) “port_security_enabled” is True by default and that the port’s value,
  if not explicitly set, will default to the network’s value.

  Problem:
  Given the linuxbridge_agent config below, with the network and port
  attribute “port_security_enabled” set to False and no security group on the
  port, ARP spoofing protection is still being established on the port.
  Further, we are finding that the default “port_security_enabled” value for
  networks and ports is actually “false”, contrary to the documentation.

  Diagnostics:
  We’ve been able to trace the port’s “port_security_enabled” value through
  the plugin and we’ve found that at some point, which we haven’t identified
  yet, the value is being changed from false to true. In the module
  arp_protect.py::setup_arp_spoofing_protection() we’ve printed out the port’s
  value just prior to the if statement and found that it has already been set
  to True, which at that point drops through the if statement and enables the
  ebtables rules on the port. We’ve tried to trace the value back up the
  stack but have not found where it is being reset. Any thoughts? Is this a
  bug or are we missing a configuration somewhere? As a work-around we will
  set the value to false in our environment.
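
  For reference, a minimal sketch of the gate being discussed (paraphrased
  from the linuxbridge agent's arp_protect module; the helper below is
  simplified and only illustrates the check, it is not the exact Pike source):

  def setup_arp_spoofing_protection(vif, port_details):
      """Install ARP protection unless port security is disabled (sketch)."""
      if not port_details.get('port_security_enabled', True):
          # Port security disabled and no security group: skip the rules.
          print("skipping ARP protection for %s" % vif)
          return False
      # The print stands in for installing the ebtables ARP protection rules.
      print("installing ARP protection for %s" % vif)
      return True

  # The reporter's observation: by the time this runs, the flag has already
  # been flipped back to True even though the port was created with
  # port_security_enabled=False, so the rules get installed anyway.
  setup_arp_spoofing_protection('tap1234', {'port_security_enabled': True})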


  ++ linuxbridge_agent.ini
  [linux_bridge]
  physical_interface_mappings=physnet1:eth2,physnet2:eth1,physnet3:eth3

  [vxlan]
  enable_vxlan=false

  [agent]
  prevent_arp_spoofing=false

  [ml2_type_vlan]
  network_vlan_ranges=physnet1:55:55,physnet2:50:50,physnet3:210:215

  [ml2]
  type_drivers=vlan, local
  mechanism_drivers=linuxbridge

  [securitygroup]
  enable_security_group=false
  firewall_driver = neutron.agent.firewall.NoopFirewallDriver

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1787908/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1783654] Re: DVR process flow not installed on physical bridge for shared tenant network

2018-10-22 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/609440
Committed: 
https://git.openstack.org/cgit/openstack/neutron/commit/?id=020d745f5b859f93f0c550be221c350bc14e8d23
Submitter: Zuul
Branch: stable/pike

commit 020d745f5b859f93f0c550be221c350bc14e8d23
Author: Swaminathan Vasudevan 
Date:   Thu Aug 23 05:54:17 2018 +

Revert "DVR: Inter Tenant Traffic between networks not possible with shared 
net"

This reverts commit d019790fe436b72cb05b8d0ff1f3a62ebd9e9bee.

Closes-Bug: #1783654
Change-Id: I4fd2610e185fb60cae62693cd4032ab700209b5f
(cherry picked from commit fd72643a61f726145288b2a468b044e84d02c88e)
(cherry picked from commit b70afb50138f9588a5165e1ca986f83856d5399d)


** Changed in: cloud-archive/pike
   Status: Invalid => Fix Committed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1783654

Title:
  DVR process flow not installed on physical bridge for shared tenant
  network

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive pike series:
  Fix Committed
Status in Ubuntu Cloud Archive queens series:
  Fix Committed
Status in Ubuntu Cloud Archive rocky series:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Bionic:
  Fix Committed
Status in neutron source package in Cosmic:
  Fix Released

Bug description:
  Seems like collateral from
  https://bugs.launchpad.net/neutron/+bug/1751396

  In DVR, the distributed gateway port's IP and MAC are shared in the
  qrouter across all hosts.

  The dvr_process flow on the physical bridge (which replaces the shared
  distributed router MAC address with the unique per-host MAC when it is
  the source) is missing, and so is the drop rule which instructs the
  bridge to drop all traffic destined for the shared distributed MAC.

  Because of this, we are seeing the router MAC on the network
  infrastructure, causing it to flap on br-int on every compute host:

  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
     11     4  fa:16:3e:42:a2:ec    1
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
     11     4  fa:16:3e:42:a2:ec    2
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
      1     4  fa:16:3e:42:a2:ec    0
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
     11     4  fa:16:3e:42:a2:ec    0
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
     11     4  fa:16:3e:42:a2:ec    0
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
      1     4  fa:16:3e:42:a2:ec    0
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
      1     4  fa:16:3e:42:a2:ec    0
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
      1     4  fa:16:3e:42:a2:ec    0
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
      1     4  fa:16:3e:42:a2:ec    1
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
     11     4  fa:16:3e:42:a2:ec    0
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
     11     4  fa:16:3e:42:a2:ec    0
  root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
     11     4  fa:16:3e:42:a2:ec    0

  Where port 1 is phy-br-vlan, connecting to the physical bridge, and
  port 11 is the correct local qr-interface. Because these DVR flows are
  missing on br-vlan, packets with the router MAC as source ingress into
  the host and br-int learns the MAC from the upstream port.

  The symptom is that when pinging a VM's floating IP, we see occasional
  packet loss (10-30%), and sometimes the responses are sent upstream by
  br-int instead of the qrouter, so the ICMP replies come with the fixed
  IP of the replier, since no NAT'ing took place, and on the tenant
  network rather than the external network.

  When I force net_shared_only to False here, the problem goes away:
  
https://github.com/openstack/neutron/blob/stable/pike/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L436

  It should be noted we *ONLY* need to do this on our dvr_snat host. The
  dvr_process flows are missing on every compute host. But if we shut
  down the qrouter on the snat host, FIP functionality works and the DVR
  MAC stops flapping on the others. Or if we apply the fix only to the
  snat host, it works. Perhaps there is something unique about the SNAT
  node.


  Ubuntu SRU details:
  ---
  [Impact]
  See above

  [Test Case]
  Deploy OpenStack with dvr enabled and then follow the steps above.

  [Regression Potential]
  The patches that are backported have already landed upstream in the 
corresponding stable branches, helping to minimize any regression potential.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1783654/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team

[Yahoo-eng-team] [Bug 1799340] [NEW] The default_ephemeral_device in instances table is NULL when deploying a VM

2018-10-22 Thread Sun Mengyun
Public bug reported:

Description
===
I deploy a vm with flavor:
+----------+-----------+------+-----------+------+-------+-------------+-----------+-----+
| Name     | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |     |
+----------+-----------+------+-----------+------+-------+-------------+-----------+-----+
| smy_test | 512       | 10   | 5         | 1    | 2     | 1.0         | True      | -   |
+----------+-----------+------+-----------+------+-------+-------------+-----------+-----+

When the VM is deployed successfully and running normally, the
default_ephemeral_device is not updated in the instances table in the
database, whereas default_swap_device and root_device_name are updated.
+------------------+--------------------------+---------------------+
| root_device_name | default_ephemeral_device | default_swap_device |
+------------------+--------------------------+---------------------+
| /dev/vda         | NULL                     | /dev/vdc            |
+------------------+--------------------------+---------------------+

Steps to reproduce
==
1. create a flavor using ephemeral
2. deploy a VM with this flavor

Expected result
===
Check the instances table: default_ephemeral_device has a value, like the
other device columns;

Actual result
=
default_ephemeral_device is NULL;

Environment
===
[root@nail1 ~]# rpm -qa | grep nova
openstack-nova-api-18.0.2-1.el7.noarch
openstack-nova-common-18.0.2-1.el7.noarch
python2-novaclient-11.0.0-1.el7.noarch
openstack-nova-placement-api-18.0.2-1.el7.noarch
openstack-nova-scheduler-18.0.2-1.el7.noarch
openstack-nova-conductor-18.0.2-1.el7.noarch
openstack-nova-novncproxy-18.0.2-1.el7.noarch
python-nova-18.0.2-1.el7.noarch
openstack-nova-compute-18.0.2-1.el7.noarch
openstack-nova-console-18.0.2-1.el7.noarch

hypervisor:
Libvirt + KVM

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1799340

Title:
  The default_ephemeral_device in instances table is NULL when
  deploying a VM

Status in OpenStack Compute (nova):
  New

Bug description:
  Description
  ===
  I deploy a vm with flavor:
  
  +----------+-----------+------+-----------+------+-------+-------------+-----------+-----+
  | Name     | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |     |
  +----------+-----------+------+-----------+------+-------+-------------+-----------+-----+
  | smy_test | 512       | 10   | 5         | 1    | 2     | 1.0         | True      | -   |
  +----------+-----------+------+-----------+------+-------+-------------+-----------+-----+

  When the VM is deployed successfully and running normally, the
  default_ephemeral_device is not updated in the instances table in the
  database, whereas default_swap_device and root_device_name are updated.
  +------------------+--------------------------+---------------------+
  | root_device_name | default_ephemeral_device | default_swap_device |
  +------------------+--------------------------+---------------------+
  | /dev/vda         | NULL                     | /dev/vdc            |
  +------------------+--------------------------+---------------------+

  Steps to reproduce
  ==
  1. create a flavor using ephemeral
  2. deploy a VM with this flavor

  Expected result
  ===
  Check the instances table: default_ephemeral_device has a value, like the
  other device columns;

  Actual result
  =
  default_ephemeral_device is NULL;

  Environment
  ===
  [root@nail1 ~]# rpm -qa | grep nova
  openstack-nova-api-18.0.2-1.el7.noarch
  openstack-nova-common-18.0.2-1.el7.noarch
  python2-novaclient-11.0.0-1.el7.noarch
  openstack-nova-placement-api-18.0.2-1.el7.noarch
  openstack-nova-scheduler-18.0.2-1.el7.noarch
  openstack-nova-conductor-18.0.2-1.el7.noarch
  openstack-nova-novncproxy-18.0.2-1.el7.noarch
  python-nova-18.0.2-1.el7.noarch
  openstack-nova-compute-18.0.2-1.el7.noarch
  openstack-nova-console-18.0.2-1.el7.noarch

  hypervisor:
  Libvirt + KVM

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1799340/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1784353] Re: Rescheduled boot from volume instances fail due to the premature removal of their attachments

2018-10-22 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/587071
Committed: 
https://git.openstack.org/cgit/openstack/nova/commit/?id=41452a5c6adb8cae54eef24803f4adc468131b34
Submitter: Zuul
Branch: master

commit 41452a5c6adb8cae54eef24803f4adc468131b34
Author: Lee Yarwood 
Date:   Mon Jul 30 13:41:35 2018 +0100

conductor: Recreate volume attachments during a reschedule

When an instance with attached volumes fails to spawn, cleanup code
within the compute manager (_shutdown_instance called from
_build_resources) will delete the volume attachments referenced by
the bdms in Cinder. As a result we should check and if necessary
recreate these volume attachments when rescheduling an instance.

Note that there are a few different ways to fix this bug by
making changes to the compute manager code, either by not deleting
the volume attachment on failure before rescheduling [1] or by
performing the get/create check during each build after the
reschedule [2].

The problem with *not* cleaning up the attachments is if we don't
reschedule, then we've left orphaned "reserved" volumes in Cinder
(or we have to add special logic to tell compute when to cleanup
attachments).

The problem with checking the existence of the attachment on every
new host we build on is that we'd be needlessly checking that for
initial creates even if we don't ever need to reschedule, unless
again we have special logic against that (like checking to see if
we've rescheduled at all).

Also, in either case that involves changes to the compute means that
older computes might not have the fix.

So ultimately it seems that the best way to handle this is:

1. Only deal with this on reschedules.
2. Let the cell conductor orchestrate it since it's already dealing
   with the reschedule. Then the compute logic doesn't need to change.

[1] https://review.openstack.org/#/c/587071/3/nova/compute/manager.py@1631
[2] https://review.openstack.org/#/c/587071/4/nova/compute/manager.py@1667

Change-Id: I739c06bd02336bf720cddacb21f48e7857378487
Closes-bug: #1784353
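
For context, a rough sketch of the conductor-side check-and-recreate step
described above (the function and attribute names are assumed for
illustration and are not the exact nova patch):

    def ensure_volume_attachments(context, instance, bdms, volume_api):
        """Recreate Cinder attachments deleted by _shutdown_instance."""
        for bdm in bdms:
            if bdm.attachment_id is None:
                continue  # old-style attachment flow, nothing to recreate
            try:
                volume_api.attachment_get(context, bdm.attachment_id)
            except Exception:
                # The attachment was deleted during cleanup on the failed
                # host, so create a new one before rescheduling.
                attachment = volume_api.attachment_create(
                    context, bdm.volume_id, instance.uuid)
                bdm.attachment_id = attachment['id']
                bdm.save()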


** Changed in: nova
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1784353

Title:
  Rescheduled boot from volume instances fail due to the premature
  removal of their attachments

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  In Progress
Status in OpenStack Compute (nova) rocky series:
  In Progress

Bug description:
  Description
  ===
  This is caused by the cleanup code within the compute layer 
(_shutdown_instance) removing all volume attachments associated with an 
instance with no attempt being made to recreate these ahead of the instance 
being rescheduled.

  Steps to reproduce
  ==
  - Attempt to boot an instance with volumes attached.
  - Ensure spawn() fails, for example by stopping the l2 network agent services 
on the compute host.

  Expected result
  ===
  The instance is reschedule to another compute host and boots correctly.

  Actual result
  =
  The instance fails to boot on all hosts that it is rescheduled to, due to a
  missing volume attachment.

  Environment
  ===
  1. Exact version of OpenStack you are running. See the following
list for all releases: http://docs.openstack.org/releases/

 bf497cc47497d3a5603bf60de652054ac5ae1993

  2. Which hypervisor did you use?
 (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
 What's the version of that?

 Libvirt + KVM, however this shouldn't matter.

  3. Which storage type did you use?
 (For example: Ceph, LVM, GPFS, ...)
 What's the version of that?

 N/A

  4. Which networking type did you use?
 (For example: nova-network, Neutron with OpenVSwitch, ...)

 N/A

  Logs & Configs
  ==

  2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: 
d48c9894-2ba2-4752-bae5-36c437933ff1] Traceback (most recent call last):  
  2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: 
d48c9894-2ba2-4752-bae5-36c437933ff1]   File 
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1579, in 
_prep_block_device
  2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: 
d48c9894-2ba2-4752-bae5-36c437933ff1] 
wait_func=self._await_block_device_map_created)
  2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: 
d48c9894-2ba2-4752-bae5-36c437933ff1]   File 
"/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 837, in 
attach_block_devices
  2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: 
d48c9894-2ba2-4752-bae5-36c437933ff1] _log_and_attach(device)
  

[Yahoo-eng-team] [Bug 1799338] [NEW] cloud-init won't reformat NTFS ephemeral drive on SLES 15

2018-10-22 Thread Jason Zions
Public bug reported:

Commit aa4eeb808 (Paul Meyer 2018-05-23 15:45:39 -0400  710) detects that the 
platform doesn't support NTFS volumes by looking for the appropriate error 
message in the exception it catches. The exact syntax of that error message 
differs between RHEL (the target distro for Paul's merge) and SUSE:
  RHEL mount: unknown filesystem type 'ntfs'
  SUSE mount: /dev/sdb1: unknown filesystem type 'ntfs'

As a result, cloud-init on SUSE VMs in Azure doesn't properly detect
that the distro doesn't support NTFS and thus will not reformat the
ephemeral volume on Azure.
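
A minimal illustration of the distinction (this is an assumed sketch of the
kind of substring check that would cover both message formats, not the actual
cloud-init patch):

    def ntfs_unsupported(mount_error_output):
        # RHEL:  mount: unknown filesystem type 'ntfs'
        # SUSE:  mount: /dev/sdb1: unknown filesystem type 'ntfs'
        return "unknown filesystem type 'ntfs'" in mount_error_output

    assert ntfs_unsupported("mount: unknown filesystem type 'ntfs'")
    assert ntfs_unsupported("mount: /dev/sdb1: unknown filesystem type 'ntfs'")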

** Affects: cloud-init
 Importance: Undecided
 Status: New

** Merge proposal linked:
   https://code.launchpad.net/~jasonzio/cloud-init/+git/cloud-init/+merge/357669

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1799338

Title:
  cloud-init won't reformat NTFS ephemeral drive on SLES 15

Status in cloud-init:
  New

Bug description:
  Commit aa4eeb808 (Paul Meyer 2018-05-23 15:45:39 -0400  710) detects that the 
platform doesn't support NTFS volumes by looking for the appropriate error 
message in the exception it catches. The exact syntax of that error message 
differs between RHEL (the target distro for Paul's merge) and SUSE:
RHEL mount: unknown filesystem type 'ntfs'
SUSE mount: /dev/sdb1: unknown filesystem type 'ntfs'

  As a result, cloud-init on SUSE VMs in Azure doesn't properly detect
  that the distro doesn't support NTFS and thus will not reformat the
  ephemeral volume on Azure.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1799338/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1799337] [NEW] Building cloud-init on SLES 15 complains about missing cloud-id binary

2018-10-22 Thread Jason Zions
Public bug reported:

Change 6ee8a2c55 (Chad Smith  2018-10-09 22:19:20 + 286)
added 'cloud-id' to the list of console scripts in setup.py. The
packaging spec for SUSE wasn't changed to package that script.

** Affects: cloud-init
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1799337

Title:
  Building cloud-init on SLES 15 complains about missing cloud-id binary

Status in cloud-init:
  New

Bug description:
  Change 6ee8a2c55 (Chad Smith  2018-10-09 22:19:20 + 286)
  added 'cloud-id' to the list of console scripts in setup.py. The
packaging spec for SUSE wasn't changed to package that script.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1799337/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1799332] [NEW] Apache WSGI config shipping with Keystone is incompatible with Horizon

2018-10-22 Thread Mike Joseph
Public bug reported:

In keystone/httpd/wsgi-keystone.conf, the following configuration is
present:

Alias /identity /usr/local/bin/keystone-wsgi-public
<Location /identity>
    SetHandler wsgi-script
    Options +ExecCGI

    WSGIProcessGroup keystone-public
    WSGIApplicationGroup %{GLOBAL}
    WSGIPassAuthorization On
</Location>

However, it is both harmful and unnecessary.  The operative WSGI
configuration for Keystone comes from the <VirtualHost> ... </VirtualHost>
section.  In fact, the commit which added the /identity endpoint described
it as a documentation example:

"Apache Httpd can be configured to accept keystone requests on all
sorts of interfaces. The sample config file is updated to show
how to configure Apache Httpd to also send requests on /identity
and /identity_admin to keystone."

Leaving it in place, however, causes conflicts when Horizon is
concurrently installed:

AH01630: client denied by server configuration: /usr/bin/keystone-wsgi-
public

...in responses to Horizon URLs referencing '/identity'.  Therefore, I
believe keeping this configuration snippet in the shipped WSGI
configuration (as opposed to actual documentation) is a defect.

** Affects: keystone
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1799332

Title:
  Apache WSGI config shipping with Keystone is incompatible with Horizon

Status in OpenStack Identity (keystone):
  New

Bug description:
  In keystone/httpd/wsgi-keystone.conf, the following configuration is
  present:

  Alias /identity /usr/local/bin/keystone-wsgi-public
  <Location /identity>
      SetHandler wsgi-script
      Options +ExecCGI

      WSGIProcessGroup keystone-public
      WSGIApplicationGroup %{GLOBAL}
      WSGIPassAuthorization On
  </Location>

  However, it is both harmful and unnecessary.  The operative WSGI
  configuration for Keystone comes from the <VirtualHost> ... </VirtualHost>
  section.  In fact, the commit which added the /identity endpoint described
  it as a documentation example:

  "Apache Httpd can be configured to accept keystone requests on all
  sorts of interfaces. The sample config file is updated to show
  how to configure Apache Httpd to also send requests on /identity
  and /identity_admin to keystone."

  Leaving it in place, however, causes conflicts when Horizon is
  concurrently installed:

  AH01630: client denied by server configuration: /usr/bin/keystone-
  wsgi-public

  ...in responses to Horizon URLs referencing '/identity'.  Therefore,
  I believe keeping this configuration snippet in the shipped WSGI
  configuration (as opposed to actual documentation) is a defect.

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1799332/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1799153] Re: Inappropriate behaviour of limits when passing --region None in create and list.

2018-10-22 Thread wangxiyuan
https://review.openstack.org/#/c/612283

** Project changed: keystone => python-openstackclient

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1799153

Title:
  Inappropriate behaviour of limits when passing --region None in create
  and list.

Status in python-openstackclient:
  New

Bug description:
  When creating a registered limit by passing --region None to the registered
  limit create CLI, it gives the error message "More than one resource
  exist for region", which is definitely the wrong message, as regions with
  the same name can neither be created nor exist.

  The correct behaviour should be:
  1. In the case of --region None it should create the registered limit.
  2. "No region exist with name xyz" if an invalid 'xyz' region is passed
  while creating.

  The same applies to registered limit list:
  1. In the case of --region None it should list all limits, ignoring None.
  2. "No region exist with name xyz" if an invalid 'xyz' region is passed
  while listing.

  Sane behaviours are expected for both limit create and list.

To manage notifications about this bug go to:
https://bugs.launchpad.net/python-openstackclient/+bug/1799153/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1794809] Re: Gateway ports are down after reboot of control plane nodes

2018-10-22 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/606085
Committed: 
https://git.openstack.org/cgit/openstack/neutron/commit/?id=f787f12aa3441ecffef55f261c4d87dbb12ca6cf
Submitter: Zuul
Branch:master

commit f787f12aa3441ecffef55f261c4d87dbb12ca6cf
Author: Slawek Kaplonski 
Date:   Fri Sep 28 13:07:28 2018 +0200

Make port binding attempt after agent is revived

In some cases it may happen that port is "binding_failed"
because L2 agent running on destination host was down but
this is "temporary" issue.
It is like that for example in case when using L3 HA and when
master and backup network nodes were e.g. rebooted.
L3 agent might start running before L2 agent on host in such case
and if it's new master node, router ports will have "binding_failed"
state.

When agent sends heartbeat and is getting back to live,
ML2 plugin will try to bind all ports with "binding_failed"
from this host.

Change-Id: I3bedb7c22312884cc28aa78aa0f8fbe418f97090
Closes-Bug: #1794809


** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1794809

Title:
  Gateway ports are down after reboot of control plane nodes

Status in neutron:
  Fix Released

Bug description:
  Sometimes, when control plane nodes go down and then come back up, a
  failover of the active L3 HA router may happen, and in such a case, if the
  L3 agent starts running before the openvswitch agent on the host, the
  gateway port may be left in "binding failed" state on the new MASTER agent.
  That causes loss of connectivity to floating IPs on this router.

  I tested this on Queens, but it seems that there haven't been any changes
  in this area since Queens.

  One possible solution might be to trigger another binding attempt for all
  ports which are in binding_failed state on a host when the L2 agent from
  that host is revived. I will investigate whether that would work.
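
  A rough sketch of that idea (the plugin and context helpers named here are
  illustrative placeholders, not the exact ML2 code the merged change touches):

      def retry_failed_bindings_on_revive(plugin, context, agent):
          """Retry port binding for a host whose L2 agent just came back."""
          if not agent['alive']:
              return
          host = agent['host']
          failed_ports = plugin.get_ports(
              context,
              filters={'binding:host_id': [host],
                       'binding:vif_type': ['binding_failed']})
          for port in failed_ports:
              # Re-trigger binding now that the agent on this host is alive.
              plugin.update_port(context, port['id'],
                                 {'port': {'binding:host_id': host}})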

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1794809/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1799328] [NEW] Should not store segmenthostmapping table when segment service plugin disabled

2018-10-22 Thread zhaobo
Public bug reported:

Version
=
Openstack neutron Ocata

Issue Description
=
Currently, Neutron's default behavior is to store segment-to-host mappings at
the compute node level, so the port binding process knows exactly which
network planes it can reach on specific compute nodes. But some SDN
controllers, which are integrated as mechanism drivers of the ML2 core
plugin, don't use the segmenthostmapping info because they manage their own
mapping; keeping it can also raise other issues, such as performance problems
in a large-scale deployment.

Proposal
=
If the environment doesn't enable the segments service plugin, the neutron
server should not insert any records into the segmenthostmapping table. Only
when the segments service plugin is enabled, and routed networks are wanted,
should the necessary info be stored in the DB table.
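
A minimal sketch of the proposed guard (directory.get_plugin() is the
neutron-lib plugin registry lookup; the 'segments' alias and the helper
function here are illustrative, not an actual patch):

    from neutron_lib.plugins import directory

    def should_store_segment_host_mapping():
        # Only persist segmenthostmapping rows when the segments service
        # plugin (routed networks) is actually enabled.
        return directory.get_plugin('segments') is not None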

** Affects: neutron
 Importance: Undecided
 Status: New


** Tags: api ocata-backport-potential

** Tags added: api ocata-backport-potential

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1799328

Title:
  Should not store segmenthostmapping table when segment service plugin
  disabled

Status in neutron:
  New

Bug description:
  Version
  =
  Openstack neutron Ocata

  Issue Description
  =
  Currently, Neutron's default behavior is to store segment-to-host mappings
  at the compute node level, so the port binding process knows exactly which
  network planes it can reach on specific compute nodes. But some SDN
  controllers, which are integrated as mechanism drivers of the ML2 core
  plugin, don't use the segmenthostmapping info because they manage their own
  mapping; keeping it can also raise other issues, such as performance
  problems in a large-scale deployment.

  Proposal
  =
  If the environment doesn't enable the segments service plugin, the neutron
  server should not insert any records into the segmenthostmapping table.
  Only when the segments service plugin is enabled, and routed networks are
  wanted, should the necessary info be stored in the DB table.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1799328/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1796887] Re: Validation of tokens degraded after upgrade to Rocky

2018-10-22 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/608963
Committed: 
https://git.openstack.org/cgit/openstack/keystone/commit/?id=d465a58f02f134086d6322c5b858c056a3aea025
Submitter: Zuul
Branch: master

commit d465a58f02f134086d6322c5b858c056a3aea025
Author: Jose Castro Leon 
Date:   Tue Oct 9 15:11:48 2018 +0200

Add caching on trust role validation to improve performance

In the token model, the trust roles are not cached. This behavior
impacts services that are using trusts heavily like heat or magnum.
It introduces new cache data to improve the performance on token
validation requests on trusts.

Change-Id: I974907b427c34fd5db3228b6139d93bbcdc38df5
Closes-Bug: #1796887


** Changed in: keystone
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1796887

Title:
  Validation of tokens degraded after upgrade to Rocky

Status in OpenStack Identity (keystone):
  Fix Released

Bug description:
  Recently we upgraded Keystone to the Rocky release and we saw a quite
  noticeable increase in the response time of validation of certain
  types of tokens, specifically tokens that are created from trusts.

  In the new token model (keystone/models/token_model.py), which is
  evaluated several times during token validation, the call to retrieve
  the roles from the trust fetches the information directly from
  the DB with no caching whatsoever. For other operations of the
  token model, this information is only requested once and then cached
  for subsequent operations.

  Since we are using heat and magnum, which use trusts heavily, we
  were impacted by this change in validation response time.
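
  As an illustration of the shape of the fix (keystone's actual change uses
  its oslo.cache-based memoization decorators; the standard-library cache and
  the fake lookup below are only a stand-in):

      import functools

      _FAKE_TRUST_DB = {'trust-123': ('member', 'heat_stack_owner')}

      @functools.lru_cache(maxsize=1024)
      def get_trust_roles(trust_id):
          # Stand-in for the DB query; with the cache wrapper it runs once
          # per trust_id instead of on every token-model evaluation.
          print('DB hit for %s' % trust_id)
          return _FAKE_TRUST_DB.get(trust_id, ())

      get_trust_roles('trust-123')  # hits the "DB"
      get_trust_roles('trust-123')  # served from the cache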

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1796887/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1799113] Re: queens/pike compute rpcapi version mismatch

2018-10-22 Thread melanie witt
** Changed in: nova
   Importance: Undecided => High

** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Changed in: nova/queens
   Importance: Undecided => High

** Changed in: nova/rocky
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1799113

Title:
  queens/pike compute rpcapi version mismatch

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) queens series:
  New
Status in OpenStack Compute (nova) rocky series:
  New

Bug description:
  While doing a live upgrade from Pike to Queens, we noticed that resizes
  weren't working.

  In the Queens source it says the Pike version of the compute rpcapi is 4.18:

  
https://github.com/openstack/nova/blob/eae37a27caa5ca8b0ca50187928bde81f28a24e1/nova/compute/rpcapi.py#L361

  Looking at the latest stable/pike, the latest version there is 4.17:

  
https://github.com/openstack/nova/blob/6ef30d5078595108c1c0f2b5c258ae6ef2db1eeb/nova/compute/rpcapi.py#L330
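
  To spell out the mismatch (the figures are the ones quoted above; the names
  below are only illustrative, not nova's actual structures):

      PIKE_COMPUTE_RPC_MAX = '4.17'   # newest version defined in stable/pike
      QUEENS_PIKE_ALIAS = '4.18'      # what the queens rpcapi maps 'pike' to

      # A Queens node pinned to the 'pike' alias therefore requests an RPC
      # version that Pike computes never implemented, so calls such as
      # resize fail during the live upgrade.
      assert QUEENS_PIKE_ALIAS != PIKE_COMPUTE_RPC_MAX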

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1799113/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1796976] Re: neutron.conf needs lock_path set for router to operate

2018-10-22 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/612196
Committed: 
https://git.openstack.org/cgit/openstack/neutron/commit/?id=f4d438019e3bd2f9b6c64badb9533168e583d8af
Submitter: Zuul
Branch: master

commit f4d438019e3bd2f9b6c64badb9533168e583d8af
Author: SapanaJadhav 
Date:   Sun Oct 21 21:46:32 2018 +0530

neutron.conf needs lock_path set for router to operate
This change is adding required configuration in neutron.conf
to set the lock_path parameter, which was missing in
compute-install-ubuntu.rst

Change-Id: If090bdf060dfe21d11b1a5dfd010dc8167d9e45e
Closes-Bug: #1796976


** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1796976

Title:
  neutron.conf needs lock_path set for router to operate

Status in neutron:
  Fix Released

Bug description:

  This bug tracker is for errors with the documentation, use the
  following as a template and remove or add fields as you see fit.
  Convert [ ] into [x] to check boxes:

  - [X] This doc is inaccurate in this way: Using self-service network. Router 
fails to operate if lock_path is not set.
  - [ ] This is a doc addition request.
  - [ ] I have a fix to the document that I can paste below including example: 
input and output. 

  Detail:
  - Rocky clean install following the self-service network model
  - While creating the sample networks and router, the following issues arise
  when running through the verifications:
  1) ip netns <-- shows qdhcp namespaces but no qrouter
  2) openstack port list --router router <-- shows the interfaces in down
  states
  After researching the log files I see in
  /var/log/neutron/neutron-l3-agent.log that the required parameter lock_path
  is missing.

  I've edited /etc/neutron/neutron.conf and in the [oslo_concurrency] section
  have set
  lock_path = /var/lib/neutron/tmp

  After reinit:
  - /var/log/neutron/neutron-l3-agent.log is clean, 
  - ip netns shows 2 qdhcp and one qrouter as expected
  - openstack port list --router router <- Ports are shown as active

  Therefore my guess is that the neutron install guide should be updated to
  reflect this needed parameter.

  Thanks a lot in advance.

  ---
  Release:  on 2018-10-08 10:46
  SHA: 6084f10333d7662a4f98994db49fd52bf9bf68f2
  Source: 
https://git.openstack.org/cgit/openstack/openstack-manuals/tree/doc/install-guide/source/launch-instance-networks-selfservice.rst
  URL: 
https://docs.openstack.org/install-guide/launch-instance-networks-selfservice.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1796976/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1796854] Re: Neutron doesn't respect advsvc role while creating port

2018-10-22 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/609633
Committed: 
https://git.openstack.org/cgit/openstack/neutron-lib/commit/?id=00147a7d700e6d0142161152137bbab0c39ce4c0
Submitter: Zuul
Branch: master

commit 00147a7d700e6d0142161152137bbab0c39ce4c0
Author: Maciej Józefczyk 
Date:   Thu Oct 11 08:57:29 2018 +

Allow advsvc role to create port in foreign tenant

Change [1] introduced support for advsvc role. This added
possibility for user with role advsvc to make CRUD operations
on ports, subnets and networks in foreign tenants.
Due the check in _validate_privileges() it was not working.
This patch fixes that.

Closes-Bug: #1796854

[1] https://review.openstack.org/#/c/101281

Change-Id: I6a3f91337bf8dd32012a75916e3409e30f46b50d


** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1796854

Title:
  Neutron doesn't respect advsvc role while creating port

Status in neutron:
  Fix Released

Bug description:
  Neutron doesn't allow a user with the 'advsvc' role to add a port in another
  user's tenant network.
  The introduced change:
  https://review.openstack.org/#/c/101281/10
  should allow that, but in fact in neutron-lib there is no validation for the
  advsvc role:
  
https://github.com/openstack/neutron-lib/blob/master/neutron_lib/api/attributes.py#L28

  Error:
  Specifying 'project_id' or 'tenant_id' other than the authenticated project 
in request requires admin privileges
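
  For reference, a hedged sketch of the kind of privilege check involved (the
  function and attribute names are simplified from neutron-lib's
  _validate_privileges and are illustrative only):

      def validate_privileges(context, res_dict):
          requested = res_dict.get('project_id') or res_dict.get('tenant_id')
          if (requested and requested != context.project_id
                  and not (context.is_admin or context.is_advsvc)):
              # Without the is_advsvc check, advsvc users hit this error.
              raise ValueError(
                  "Specifying 'project_id' or 'tenant_id' other than the "
                  "authenticated project in request requires admin "
                  "privileges")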


  
  Version
  
  Devstack master.

  
  
  How to reproduce
  

  1. Set up devstack master, add a new project and a user in this project with
  the advsvc role
  source devstack/openrc admin demo

  openstack project create advsvc-project
  openstack user create --project advsvc-project --password test 
advsvc-project-user
  openstack role create advsvc
  openstack role add --user advsvc-project-user --project advsvc-project advsvc
  openstack role add --user advsvc-project-user --project advsvc-project member

  
  2. Create network in other project.
  openstack project create test-project
  openstack user create --project test-project --password test test-project-user
  openstack role add --user test-project-user --project test-project member

  neutron net-create private-net-test-user --provider:network_type=vxlan
  --provider:segmentation_id=1234 --project-id [[ test-project-id ]]

  neutron subnet-create private-net-test-user --name private-subnet-
  test-user --allocation-pool start=10.13.12.100,end=10.13.12.130
  10.13.12.0/24 --dns-nameserver 8.8.8.8 --project-id [[ test-project-id
  ]]

  3. Create a port in the test-project tenant as the user with the advsvc role:

  stack@mjozefcz-devstack:~$ neutron port-create --tenant-id 
865073224f7b4e9d9fdd4a446e3a4af4 private-net-test-user
  neutron CLI is deprecated and will be removed in the future. Use openstack 
CLI instead.
  Specifying 'project_id' or 'tenant_id' other than the authenticated project 
in request requires admin privileges
  Neutron server returns request_ids: 
['req-e841edb1-2cf2-47b6-a493-11a56114a323']

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1796854/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1799309] [NEW] Migration/Resize fails with Unexpected exception in API method: Circular reference detected

2018-10-22 Thread Eric Miller
Public bug reported:

Description
===

Cold migration of a VM failed, so I tried resizing, and the same error
occurred.

Previous to this, I had disabled the compute service on one node, to see
if cold migrating VMs would avoid scheduling the VMs on the disabled
node.  This succeeded a few times, but then this error occurred and now
it happens on the two test VMs that I have running in this environment
(there are only two VMs total).


Steps to reproduce
==

Run as "admin" on a test server created by a domain user in its respective 
domain project:
openstack server migrate 

It continues to be 100% repeatable.  VMs are still operational and can
be shut down and powered on without issue.


Expected result
===

As with previous migrations, a graceful shut down, cold migrate, and
power on of the VM.


Actual result
=

This is returned (note that this is from a different migration attempt
than the logs included below):

Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and 
attach the Nova API log if possible.
 (HTTP 500) (Request-ID: 
req-46ce0fd3-5579-48be-9486-499fdea085a1)


Environment
===

stable/rocky deployed with Kolla-Ansible 7.0.0.0rc3devXX (as of October
15, 2018) with respective Kolla images

Hypervisor: Libvirt + KVM
Storage: iSCSI attached (Blockbridge)
Networking: DVR


Logs & Configs
==

This is a filtered list of nova files from all nova containers running
on all controllers, concatenated and sorted, filtered by the request ID.

/var/lib/docker/volumes/kolla_logs/_data/nova/nova-api.log:2018-10-22
16:05:07.105 26 DEBUG nova.api.openstack.wsgi [req-8a41af8d-23ea-491e-
be0a-b0fbe98dd6a7 ebdf7b1025a6464ba150b8ea63bfacb5
a87099d25afd4c599d34b2fae7689dec - default default] Action: 'action',
calling method: >, body: {"migrate": null} _process_stack
/var/lib/kolla/venv/lib/python2.7/site-
packages/nova/api/openstack/wsgi.py:615

/var/lib/docker/volumes/kolla_logs/_data/nova/nova-api.log:2018-10-22
16:05:07.106 26 DEBUG nova.compute.api [req-8a41af8d-23ea-491e-be0a-
b0fbe98dd6a7 ebdf7b1025a6464ba150b8ea63bfacb5
a87099d25afd4c599d34b2fae7689dec - default default] [instance: 2fd8ff29
-f64a-4e5b-bfcd-97c52cf6d66d] Fetching instance by UUID get
/var/lib/kolla/venv/lib/python2.7/site-packages/nova/compute/api.py:2402

/var/lib/docker/volumes/kolla_logs/_data/nova/nova-api.log:2018-10-22
16:05:07.117 26 DEBUG oslo_concurrency.lockutils [req-8a41af8d-23ea-
491e-be0a-b0fbe98dd6a7 ebdf7b1025a6464ba150b8ea63bfacb5
a87099d25afd4c599d34b2fae7689dec - default default] Lock
"43894fde-4653-4499-9c83-0e963c974fae" acquired by
"nova.context.get_or_set_cached_cell_and_set_connections" :: waited
0.000s inner /var/lib/kolla/venv/lib/python2.7/site-
packages/oslo_concurrency/lockutils.py:273

/var/lib/docker/volumes/kolla_logs/_data/nova/nova-api.log:2018-10-22
16:05:07.118 26 DEBUG oslo_concurrency.lockutils [req-8a41af8d-23ea-
491e-be0a-b0fbe98dd6a7 ebdf7b1025a6464ba150b8ea63bfacb5
a87099d25afd4c599d34b2fae7689dec - default default] Lock
"43894fde-4653-4499-9c83-0e963c974fae" released by
"nova.context.get_or_set_cached_cell_and_set_connections" :: held 0.000s
inner /var/lib/kolla/venv/lib/python2.7/site-
packages/oslo_concurrency/lockutils.py:285

/var/lib/docker/volumes/kolla_logs/_data/nova/nova-api.log:2018-10-22
16:05:07.174 26 DEBUG nova.objects.instance [req-8a41af8d-23ea-491e-
be0a-b0fbe98dd6a7 ebdf7b1025a6464ba150b8ea63bfacb5
a87099d25afd4c599d34b2fae7689dec - default default] Lazy-loading
'flavor' on Instance uuid 2fd8ff29-f64a-4e5b-bfcd-97c52cf6d66d
obj_load_attr /var/lib/kolla/venv/lib/python2.7/site-
packages/nova/objects/instance.py:1109

/var/lib/docker/volumes/kolla_logs/_data/nova/nova-api.log:2018-10-22
16:05:07.212 26 DEBUG nova.compute.api [req-8a41af8d-23ea-491e-be0a-
b0fbe98dd6a7 ebdf7b1025a6464ba150b8ea63bfacb5
a87099d25afd4c599d34b2fae7689dec - default default] [instance: 2fd8ff29
-f64a-4e5b-bfcd-97c52cf6d66d] flavor_id is None. Assuming migration.
resize /var/lib/kolla/venv/lib/python2.7/site-
packages/nova/compute/api.py:3448

/var/lib/docker/volumes/kolla_logs/_data/nova/nova-api.log:2018-10-22
16:05:07.213 26 DEBUG nova.compute.api [req-8a41af8d-23ea-491e-be0a-
b0fbe98dd6a7 ebdf7b1025a6464ba150b8ea63bfacb5
a87099d25afd4c599d34b2fae7689dec - default default] [instance: 2fd8ff29
-f64a-4e5b-bfcd-97c52cf6d66d] Old instance type c5.4xlarge, new instance
type c5.4xlarge resize /var/lib/kolla/venv/lib/python2.7/site-
packages/nova/compute/api.py:3469

/var/lib/docker/volumes/kolla_logs/_data/nova/nova-api.log:2018-10-22
16:05:09.999 26 ERROR nova.api.openstack.wsgi [req-8a41af8d-23ea-491e-
be0a-b0fbe98dd6a7 ebdf7b1025a6464ba150b8ea63bfacb5
a87099d25afd4c599d34b2fae7689dec - default default] Unexpected exception
in API method: ValueError: Circular reference detected

/var/lib/docker/volumes/kolla_logs/_data/nova/nova-api.log:2018-10-22
16:05:10.005 26 INFO nova.api.openstack.wsgi [req-8a41af8d-23ea-491e-

[Yahoo-eng-team] [Bug 1799298] Re: Metadata API cross joining instance_metadata and instance_system_metadata

2018-10-22 Thread Matt Riedemann
** Tags added: api db metadata performance

** Changed in: nova
   Status: New => Triaged

** Changed in: nova
   Importance: Undecided => Medium

** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/ocata
   Importance: Undecided
   Status: New

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Also affects: nova/pike
   Importance: Undecided
   Status: New

** Changed in: nova/pike
   Importance: Undecided => Medium

** Changed in: nova/queens
   Importance: Undecided => Medium

** Changed in: nova/rocky
   Importance: Undecided => Medium

** Changed in: nova/pike
   Status: New => Triaged

** Changed in: nova/queens
   Status: New => Triaged

** Changed in: nova/rocky
   Status: New => Triaged

** Changed in: nova/ocata
   Status: New => Triaged

** Changed in: nova/ocata
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1799298

Title:
  Metadata API cross joining instance_metadata and
  instance_system_metadata

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) ocata series:
  Triaged
Status in OpenStack Compute (nova) pike series:
  Triaged
Status in OpenStack Compute (nova) queens series:
  Triaged
Status in OpenStack Compute (nova) rocky series:
  Triaged

Bug description:
  
  Description
  ===

  While troubleshooting a production issue we identified that the Nova
  metadata API is fetching a lot more raw data from the database than
  seems necessary. The problem appears to be caused by the SQL query
  used to fetch instance data, which joins the "instance" table with,
  among others, two metadata tables: "instance_metadata" and
  "instance_system_metadata". Below is a simplified version of this
  query which was captured by adding extra logging (the full query is
  listed at the end of this bug report):

  SELECT ...
FROM (SELECT ...
FROM `instances`
   WHERE `instances` . `deleted` = ?
 AND `instances` . `uuid` = ?
   LIMIT ?) AS `anon_1`
LEFT OUTER JOIN `instance_system_metadata` AS `instance_system_metadata_1`
  ON `anon_1` . `instances_uuid` = `instance_system_metadata_1` . 
`instance_uuid`
LEFT OUTER JOIN (`security_group_instance_association` AS 
`security_group_instance_association_1`
 INNER JOIN `security_groups` AS `security_groups_1`
 ON `security_groups_1` . `id` = 
`security_group_instance_association_1` . `security_group_id`
 AND `security_group_instance_association_1` . `deleted` = ?
 AND `security_groups_1` . `deleted` = ? )
  ON `security_group_instance_association_1` . `instance_uuid` = `anon_1` . 
`instances_uuid`
 AND `anon_1` . `instances_deleted` = ?
LEFT OUTER JOIN `security_group_rules` AS `security_group_rules_1`
  ON `security_group_rules_1` . `parent_group_id` = `security_groups_1` . 
`id`
 AND `security_group_rules_1` . `deleted` = ?
LEFT OUTER JOIN `instance_info_caches` AS `instance_info_caches_1`
  ON `instance_info_caches_1` . `instance_uuid` = `anon_1` . 
`instances_uuid`
LEFT OUTER JOIN `instance_extra` AS `instance_extra_1`
  ON `instance_extra_1` . `instance_uuid` = `anon_1` . `instances_uuid`
LEFT OUTER JOIN `instance_metadata` AS `instance_metadata_1`
  ON `instance_metadata_1` . `instance_uuid` = `anon_1` . `instances_uuid`
 AND `instance_metadata_1` . `deleted` = ?

  The instance table has a 1-to-many relationship to both
  "instance_metadata" and "instance_system_metadata" tables, so the
  query is effectively producing a cross join of both metadata tables.
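
  To make the blow-up concrete, a back-of-the-envelope sketch (the counts
  below are hypothetical, not taken from this report):

      user_metadata_rows = 2      # instance_metadata entries
      system_metadata_rows = 15   # instance_system_metadata entries
      security_group_rules = 10   # rules pulled in via the security groups

      # Each LEFT OUTER JOIN of an independent 1-to-many collection multiplies
      # the result set, so roughly this many raw rows are transferred to
      # rebuild a single instance object:
      print(user_metadata_rows * system_metadata_rows * security_group_rules)
      # -> 300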

  
  Steps to reproduce
  ==

  To illustrate the impact of this query, add 2 properties to a running
  instance and verify that it has 2 records in "instance_metadata", as
  well as other records in "instance_system_metadata" such as base image
  properties:

  > select instance_uuid,`key`,value from instance_metadata
    where instance_uuid = 'a6cf4a6a-effe-4438-9b7f-d61b23117b9b';
  +--------------------------------------+-----------+--------+
  | instance_uuid                        | key       | value  |
  +--------------------------------------+-----------+--------+
  | a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property1 | value1 |
  | a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property2 | value  |
  +--------------------------------------+-----------+--------+
  2 rows in set (0.61 sec)

  > select instance_uuid,`key`,value from instance_system_metadata
    where instance_uuid = 'a6cf4a6a-effe-4438-9b7f-d61b23117b9b';
  +----------------------------+------------------------+
  | key                        | value                  |
  +----------------------------+------------------------+
  | 

[Yahoo-eng-team] [Bug 1799301] [NEW] SUSE sysconfig renderer enablement incomplete

2018-10-22 Thread Robert Schweikert
Public bug reported:

With db50bc0d9 the sysconfig renderer was enabled for openSUSE and SUSE
Linux Enterprise. This implementation is incomplete and network
rendering for openSUSE and SLES is now completely broken.

Message in cloud-init.log:

stages.py[ERROR]: Unable to render networking. Network config is likely
broken: No available network renderers found. Searched through list:
['eni', 'sysconfig', 'netplan']

The issue is that the available() method in sysconfig.py looks for

expected_paths = [
'etc/sysconfig/network-scripts/network-functions',
'etc/sysconfig/network-scripts/ifdown-eth']

in addition to ifup and ifdown. While ifup and ifdown are found, the
above scripts do not exist on openSUSE and SLES.

The equivalent to 'etc/sysconfig/network-scripts/network-functions' would be
'etc/sysconfig/network/functions.netconfig'. There is no default ifdown-eth;
any ifdown scripts would live in 'etc/sysconfig/network/if-down.d', but this
directory is empty by default.

One option is of course to not look for such specific locations and to
"trust" that the necessary scripts for the given distro are installed. We
would only check for the ifup and ifdown commands, as those are necessary.
The underlying distro implementation for script handling may not be as
important here.
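
A minimal sketch of that relaxed check (assuming only the ifup/ifdown
executables are required; the helper below is illustrative and not the actual
cloud-init patch):

    import os

    def sysconfig_available(search_dirs=("/sbin", "/usr/sbin", "/usr/bin")):
        # Require only the tools the renderer actually drives; don't insist
        # on RHEL-specific network-scripts helper files.
        def found(program):
            return any(os.access(os.path.join(d, program), os.X_OK)
                       for d in search_dirs)
        return found("ifup") and found("ifdown")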

** Affects: cloud-init
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1799301

Title:
  SUSE sysconfig renderer enablement incomplete

Status in cloud-init:
  New

Bug description:
  With db50bc0d9 the sysconfig renderer was enabled for openSUSE and
  SUSE Linux Enterprise. This implementation is incomplete and network
  rendering for openSUSE and SLES is now completely broken.

  Message in cloud-init.log:

  stages.py[ERROR]: Unable to render networking. Network config is
  likely broken: No available network renderers found. Searched through
  list: ['eni', 'sysconfig', 'netplan']

  The issue is that the available() method in sysconfig.py looks for

  expected_paths = [
  'etc/sysconfig/network-scripts/network-functions',
  'etc/sysconfig/network-scripts/ifdown-eth']

  in addition to ifup and ifdown. While ifup and ifdown are found, the
  above scripts do not exist on openSUSE and SLES.

  The equivalent to 'etc/sysconfig/network-scripts/network-functions' would be
  'etc/sysconfig/network/functions.netconfig'. There is no default ifdown-eth;
  any ifdown scripts would live in 'etc/sysconfig/network/if-down.d', but this
  directory is empty by default.

  One option is of course to not look for such specific locations and to
  "trust" that the necessary scripts for the given distro are installed.
  We would only check for the ifup and ifdown commands, as those are
  necessary. The underlying distro implementation for script handling may
  not be as important here.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1799301/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1687027] Re: test_walk_versions tests fail with "IndexError: tuple index out of range" after timeout

2018-10-22 Thread Slawek Kaplonski
It looks like in some cases such tests can take longer than 300 seconds,
and there are still failures there. See:
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22in%20test_walk_versions%5C%22%20AND%20filename%3A%5C
%22job-output.txt%5C%22

** Changed in: neutron
   Status: Fix Released => Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1687027

Title:
  test_walk_versions tests fail with "IndexError: tuple index out of
  range" after timeout

Status in neutron:
  In Progress

Bug description:
  http://logs.openstack.org/99/460399/1/check/gate-neutron-dsvm-
  functional-ubuntu-xenial/25de43d/testr_results.html.gz

  Traceback (most recent call last):
File "neutron/tests/base.py", line 115, in func
  return f(self, *args, **kwargs)
File "neutron/tests/base.py", line 115, in func
  return f(self, *args, **kwargs)
File "neutron/tests/functional/db/test_migrations.py", line 551, in 
test_walk_versions
  self._migrate_up(config, engine, dest, curr, with_data=True)
File "neutron/tests/functional/db/test_migrations.py", line 537, in 
_migrate_up
  migration.do_alembic_command(config, 'upgrade', dest)
File "neutron/db/migration/cli.py", line 109, in do_alembic_command
  getattr(alembic_command, cmd)(config, *args, **kwargs)
File 
"/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/alembic/command.py",
 line 254, in upgrade
  script.run_env()
File 
"/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/alembic/script/base.py",
 line 416, in run_env
  util.load_python_file(self.dir, 'env.py')
File 
"/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/alembic/util/pyfiles.py",
 line 93, in load_python_file
  module = load_module_py(module_id, path)
File 
"/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/alembic/util/compat.py",
 line 75, in load_module_py
  mod = imp.load_source(module_id, path, fp)
File "neutron/db/migration/alembic_migrations/env.py", line 120, in 
  run_migrations_online()
File "neutron/db/migration/alembic_migrations/env.py", line 114, in 
run_migrations_online
  context.run_migrations()
File "", line 8, in run_migrations
File 
"/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/alembic/runtime/environment.py",
 line 817, in run_migrations
  self.get_context().run_migrations(**kw)
File 
"/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/alembic/runtime/migration.py",
 line 323, in run_migrations
  step.migration_fn(**kw)
File 
"/opt/stack/new/neutron/neutron/db/migration/alembic_migrations/versions/mitaka/expand/3894bccad37f_add_timestamp_to_base_resources.py",
 line 36, in upgrade
  sa.Column(column_name, sa.DateTime(), nullable=True)
File "", line 8, in add_column
File "", line 3, in add_column
File 
"/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/alembic/operations/ops.py",
 line 1551, in add_column
  return operations.invoke(op)
File 
"/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/alembic/operations/base.py",
 line 318, in invoke
  return fn(self, operation)
File 
"/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/alembic/operations/toimpl.py",
 line 123, in add_column
  schema=schema
File 
"/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/alembic/ddl/impl.py",
 line 172, in add_column
  self._exec(base.AddColumn(table_name, column, schema=schema))
File 
"/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/alembic/ddl/impl.py",
 line 118, in _exec
  return conn.execute(construct, *multiparams, **params)
File 
"/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py",
 line 945, in execute
  return meth(self, multiparams, params)
File 
"/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/sqlalchemy/sql/ddl.py",
 line 68, in _execute_on_connection
  return connection._execute_ddl(self, multiparams, params)
File 
"/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py",
 line 1002, in _execute_ddl
  compiled
File 
"/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py",
 line 1189, in _execute_context
  context)
File 
"/opt/stack/new/neutron/.tox/dsvm-functional/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py",
 line 1398, in _handle_dbapi_exception
  util.raise_from_cause(newraise, exc_info)
File 

[Yahoo-eng-team] [Bug 1799298] [NEW] Metadata API cross joining instance_metadata and instance_system_metadata

2018-10-22 Thread Sergio de Carvalho
Public bug reported:


Description
===

While troubleshooting a production issue we identified that the Nova
metadata API is fetching a lot more raw data from the database than
seems necessary. The problem appears to be caused by the SQL query used
to fetch instance data, which joins the "instance" table with, among
others, two metadata tables: "instance_metadata" and
"instance_system_metadata". Below is a simplified version of this query
which was captured by adding extra logging (the full query is listed at
the end of this bug report):

SELECT ...
  FROM (SELECT ...
  FROM `instances`
 WHERE `instances` . `deleted` = ?
   AND `instances` . `uuid` = ?
 LIMIT ?) AS `anon_1`
  LEFT OUTER JOIN `instance_system_metadata` AS `instance_system_metadata_1`
ON `anon_1` . `instances_uuid` = `instance_system_metadata_1` . 
`instance_uuid`
  LEFT OUTER JOIN (`security_group_instance_association` AS 
`security_group_instance_association_1`
   INNER JOIN `security_groups` AS `security_groups_1`
   ON `security_groups_1` . `id` = 
`security_group_instance_association_1` . `security_group_id`
   AND `security_group_instance_association_1` . `deleted` = ?
   AND `security_groups_1` . `deleted` = ? )
ON `security_group_instance_association_1` . `instance_uuid` = `anon_1` . 
`instances_uuid`
   AND `anon_1` . `instances_deleted` = ?
  LEFT OUTER JOIN `security_group_rules` AS `security_group_rules_1`
ON `security_group_rules_1` . `parent_group_id` = `security_groups_1` . `id`
   AND `security_group_rules_1` . `deleted` = ?
  LEFT OUTER JOIN `instance_info_caches` AS `instance_info_caches_1`
ON `instance_info_caches_1` . `instance_uuid` = `anon_1` . `instances_uuid`
  LEFT OUTER JOIN `instance_extra` AS `instance_extra_1`
ON `instance_extra_1` . `instance_uuid` = `anon_1` . `instances_uuid`
  LEFT OUTER JOIN `instance_metadata` AS `instance_metadata_1`
ON `instance_metadata_1` . `instance_uuid` = `anon_1` . `instances_uuid`
   AND `instance_metadata_1` . `deleted` = ?

The instance table has a 1-to-many relationship to both
"instance_metadata" and "instance_system_metadata" tables, so the query
is effectively producing a cross join of both metadata tables.
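
For illustration, the row multiplication is easy to reproduce outside of Nova. The following is a standalone sketch using an in-memory SQLite database with simplified tables (not the real Nova schema or the real query), showing that 2 metadata rows joined against 5 system_metadata rows come back as 10 rows:

import sqlite3

# Standalone illustration of the 1-to-many x 1-to-many join blow-up described
# above; simplified tables, not the real Nova schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instance_metadata (instance_uuid TEXT, key TEXT, value TEXT);
    CREATE TABLE instance_system_metadata (instance_uuid TEXT, key TEXT, value TEXT);
""")
uuid = "a6cf4a6a-effe-4438-9b7f-d61b23117b9b"
conn.executemany("INSERT INTO instance_metadata VALUES (?, ?, ?)",
                 [(uuid, "property1", "value1"), (uuid, "property2", "value")])
conn.executemany("INSERT INTO instance_system_metadata VALUES (?, ?, ?)",
                 [(uuid, "image_disk_format", "qcow2"),
                  (uuid, "image_min_ram", "0"),
                  (uuid, "image_min_disk", "20"),
                  (uuid, "image_base_image_ref", "39cd564f-6a29-43e2-815b-62097968486a"),
                  (uuid, "image_container_format", "bare")])
rows = conn.execute("""
    SELECT m.key, s.key
      FROM instance_metadata m
      LEFT OUTER JOIN instance_system_metadata s
        ON s.instance_uuid = m.instance_uuid
""").fetchall()
print(len(rows))  # 2 metadata rows x 5 system_metadata rows = 10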


Steps to reproduce
==

To illustrate the impact of this query, add 2 properties to a running
instance and verify that it has 2 records in "instance_metadata", as
well as other records in "instance_system_metadata" such as base image
properties:

> select instance_uuid,`key`,value from instance_metadata where instance_uuid = 
> 'a6cf4a6a-effe-4438-9b7f-d61b23117b9b';
+--------------------------------------+-----------+--------+
| instance_uuid                        | key       | value  |
+--------------------------------------+-----------+--------+
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property1 | value1 |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property2 | value  |
+--------------------------------------+-----------+--------+
2 rows in set (0.61 sec)

> select `key`,value from instance_system_metadata where instance_uuid = 
> 'a6cf4a6a-effe-4438-9b7f-d61b23117b9b';
+------------------------+--------------------------------------+
| key                    | value                                |
+------------------------+--------------------------------------+
| image_disk_format      | qcow2                                |
| image_min_ram          | 0                                    |
| image_min_disk         | 20                                   |
| image_base_image_ref   | 39cd564f-6a29-43e2-815b-62097968486a |
| image_container_format | bare                                 |
+------------------------+--------------------------------------+
5 rows in set (0.00 sec)

For this particular instance, the generated query used by the metadata
API will fetch 10 records from the database:

+--+-+---++--+
| anon_1_instances_uuid| instance_metadata_1_key | 
instance_metadata_1_value | instance_system_metadata_1_key | 
instance_system_metadata_1_value |
+--+-+---++--+
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property1   | value1   
 | image_disk_format  | qcow2   
 |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property2   | value
 | image_disk_format  | qcow2   
 |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | property1   | value1   
 | image_min_ram  | 0   
 |
| a6cf4a6a-effe-4438-9b7f-d61b23117b9b | 

[Yahoo-eng-team] [Bug 1778206] Re: Compute leaks volume attachments if we fail in driver.pre_live_migration

2018-10-22 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/587439
Committed: 
https://git.openstack.org/cgit/openstack/nova/commit/?id=1a29248d5e688ba1d4f806895dccd45fcb34b833
Submitter: Zuul
Branch:master

commit 1a29248d5e688ba1d4f806895dccd45fcb34b833
Author: Matthew Booth 
Date:   Tue Jun 26 14:42:47 2018 +0100

Ensure attachment cleanup on failure in driver.pre_live_migration

Previously, if the call to driver.pre_live_migration failed (which in
libvirt can happen with a DestinationDiskExists exception), the
compute manager wouldn't rollback/cleanup volume attachments, leading
to corrupt volume attachment information, and, depending on the
backend, the instance being unable to access its volume. This patch
moves the driver.pre_live_migration call inside the existing
try/except, allowing the compute manager to properly rollback/cleanup
volume attachments.

The compute manager has its own _rollback_live_migration() cleanup in
case the pre_live_migration() RPC call to the destination fails. There
should be no conflicts between the cleanup in that and the new volume
cleanup in the except block. The remove_volume_connection() ->
driver_detach() -> detach_volume() call catches the InstanceNotFound
exception and warns about the instance disappearing (it was never
really on the destination in the first place). The attachment_delete()
in _rollback_live_migration() is contingent on there being an
old_vol_attachment in migrate_data, which there isn't because
pre_live_migration() raised instead of returning.

Change-Id: I67f66e95d69ae6df22e539550a3eac697ea8f5d8
Closes-bug: 1778206


** Changed in: nova
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1778206

Title:
  Compute leaks volume attachments if we fail in
  driver.pre_live_migration

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  ComputeManager.pre_live_migration fails to clean up volume attachments
  if the call to driver.pre_live_migration() fails. There's a try block
  in there to clean up attachments, but its scope isn't large enough.
  The result is a volume in a perpetual attaching state.
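
As an aside, the shape of the fix is easy to demonstrate outside of Nova. The following standalone sketch (fake objects, not the actual ComputeManager or Cinder code) shows why widening the try block matters: once the failing driver call sits inside it, the except clause can delete the attachments instead of leaking them:

# Standalone sketch of the cleanup pattern described above; FakeVolumeAPI and
# the function below are illustrations, not the real nova/cinder code.

class FakeVolumeAPI(object):
    def __init__(self):
        self.attachments = set()

    def attachment_create(self, volume_id):
        self.attachments.add(volume_id)
        return volume_id

    def attachment_delete(self, attachment_id):
        self.attachments.discard(attachment_id)


def pre_live_migration(volume_api, volume_ids, driver_pre_live_migration):
    attachments = [volume_api.attachment_create(v) for v in volume_ids]
    try:
        # the driver call now sits inside the try block, so a failure here
        # (e.g. DestinationDiskExists) also triggers the rollback below
        driver_pre_live_migration()
    except Exception:
        for attachment_id in attachments:
            volume_api.attachment_delete(attachment_id)
        raise
    return attachments


if __name__ == "__main__":
    api = FakeVolumeAPI()

    def failing_driver_call():
        raise RuntimeError("DestinationDiskExists")

    try:
        pre_live_migration(api, ["vol-1", "vol-2"], failing_driver_call)
    except RuntimeError:
        pass
    print(api.attachments)  # set() -- nothing leaked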

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1778206/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1795046] Re: Rocky Openstack CentOS documentation not matching

2018-10-22 Thread Colleen Murphy
As Adam said, you need to set OS_IDENTITY_API_VERSION=3 for the
openstack client to recognize that it needs to handle this v3-specific
subcommand. Marking this as invalid.

** Changed in: keystone
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1795046

Title:
  Rocky Openstack CentOS documentation not matching

Status in OpenStack Identity (keystone):
  Invalid

Bug description:
  Installation Documentation on site:
  https://docs.openstack.org/keystone/rocky/install/keystone-users-
  rdo.html is written to run command below but it is not a valid
  command.

  [cmock@controller ~]$ sudo openstack domain create --description "An Example 
Domain" example
  openstack: 'domain create --description An Example Domain example' is not an 
openstack command. See 'openstack --help'.
  Did you mean one of these?
command list
container create
container delete
container list
container save
container set
container show
container unset
  [cmock@controller ~]$

  Suggest updating documentation

  
  ---
  Release:  on 2018-09-10 22:19
  SHA: c5930abc5aa06881f28baa697d8d43a1f25157b8
  Source: 
https://git.openstack.org/cgit/openstack/keystone/tree/doc/source/install/keystone-users-rdo.rst
  URL: https://docs.openstack.org/keystone/rocky/install/keystone-users-rdo.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1795046/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1783010] Re: Configure the Apache HTTP server (incorrect edit file)

2018-10-22 Thread Colleen Murphy
The instructions are correct as-is, /etc/apache2/apache2.conf is a valid
place to set the ServerName.

** Changed in: keystone
   Status: In Progress => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1783010

Title:
  Configure the Apache HTTP server (incorrect edit file)

Status in OpenStack Identity (keystone):
  Invalid

Bug description:
  This bug tracker is for errors with the documentation, use the
  following as a template and remove or add fields as you see fit.
  Convert [ ] into [x] to check boxes:

  - [ ] This doc is inaccurate in this way: __
  - [ ] This is a doc addition request.
  - [x] I have a fix to the document that I can paste below including example: 
input and output.

  Configure the Apache HTTP server

  1. Edit the /etc/apache2/sites-enabled/keystone.conf file and add the
  ServerName option to reference the controller node:

    
  ServerName controller
  ...
    

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1783010/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1784353] Re: Rescheduled boot from volume instances fail due to the premature removal of their attachments

2018-10-22 Thread Matt Riedemann
** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Changed in: nova/queens
   Status: New => Triaged

** Changed in: nova/rocky
   Status: New => Triaged

** Changed in: nova
 Assignee: Stephen Finucane (stephenfinucane) => Lee Yarwood (lyarwood)

** Changed in: nova/queens
   Importance: Undecided => Medium

** Changed in: nova/rocky
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1784353

Title:
  Rescheduled boot from volume instances fail due to the premature
  removal of their attachments

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) queens series:
  Triaged
Status in OpenStack Compute (nova) rocky series:
  Triaged

Bug description:
  Description
  ===
  This is caused by the cleanup code within the compute layer 
(_shutdown_instance) removing all volume attachments associated with an 
instance with no attempt being made to recreate these ahead of the instance 
being rescheduled.
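
To make the described gap concrete, here is a standalone sketch (fake objects and simplified bdm dicts, not the actual compute-manager code) of the behaviour the report is asking for: after the _shutdown_instance-style cleanup deletes the attachments, fresh ones have to be created and stored back on the block device mappings before the reschedule:

# Standalone sketch, not nova code: recreate volume attachments that the
# cleanup path deleted, so the next host the instance is rescheduled to can
# still find a usable attachment on each block device mapping.

import itertools


class FakeVolumeAPI(object):
    def __init__(self):
        self._ids = itertools.count(1)
        self.attachments = {}

    def attachment_create(self, volume_id, instance_uuid):
        attachment_id = "att-%d" % next(self._ids)
        self.attachments[attachment_id] = (volume_id, instance_uuid)
        return attachment_id

    def attachment_delete(self, attachment_id):
        self.attachments.pop(attachment_id, None)


def cleanup_then_prepare_reschedule(volume_api, bdms, instance_uuid):
    # _shutdown_instance-style cleanup removes every attachment ...
    for bdm in bdms:
        volume_api.attachment_delete(bdm["attachment_id"])
        bdm["attachment_id"] = None
    # ... so new attachments must be created before rescheduling, otherwise
    # the next host fails with a missing volume attachment.
    for bdm in bdms:
        bdm["attachment_id"] = volume_api.attachment_create(
            bdm["volume_id"], instance_uuid)


if __name__ == "__main__":
    api = FakeVolumeAPI()
    bdms = [{"volume_id": "vol-1",
             "attachment_id": api.attachment_create("vol-1", "uuid-1")}]
    cleanup_then_prepare_reschedule(api, bdms, "uuid-1")
    print(bdms[0]["attachment_id"])  # att-2 -- a usable attachment again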

  Steps to reproduce
  ==
  - Attempt to boot an instance with volumes attached.
  - Ensure spawn() fails, for example by stopping the l2 network agent services 
on the compute host.

  Expected result
  ===
  The instance is reschedule to another compute host and boots correctly.

  Actual result
  =
  The instance fails to boot on all hosts that is rescheduled to due to a 
missing volume attachment.

  Environment
  ===
  1. Exact version of OpenStack you are running. See the following
list for all releases: http://docs.openstack.org/releases/

 bf497cc47497d3a5603bf60de652054ac5ae1993

  2. Which hypervisor did you use?
 (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
 What's the version of that?

 Libvirt + KVM, however this shouldn't matter.

  3. Which storage type did you use?
 (For example: Ceph, LVM, GPFS, ...)
 What's the version of that?

 N/A

  4. Which networking type did you use?
 (For example: nova-network, Neutron with OpenVSwitch, ...)

 N/A

  Logs & Configs
  ==

  2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: 
d48c9894-2ba2-4752-bae5-36c437933ff1] Traceback (most recent call last):  
  2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: 
d48c9894-2ba2-4752-bae5-36c437933ff1]   File 
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1579, in 
_prep_block_device
  2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: 
d48c9894-2ba2-4752-bae5-36c437933ff1] 
wait_func=self._await_block_device_map_created)
  2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: 
d48c9894-2ba2-4752-bae5-36c437933ff1]   File 
"/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 837, in 
attach_block_devices
  2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: 
d48c9894-2ba2-4752-bae5-36c437933ff1] _log_and_attach(device)
  2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: 
d48c9894-2ba2-4752-bae5-36c437933ff1]   File 
"/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 834, in 
_log_and_attach
  2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: 
d48c9894-2ba2-4752-bae5-36c437933ff1] bdm.attach(*attach_args, 
**attach_kwargs)
  2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: 
d48c9894-2ba2-4752-bae5-36c437933ff1]   File 
"/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 46, in 
wrapped
  2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: 
d48c9894-2ba2-4752-bae5-36c437933ff1] ret_val = method(obj, context, *args, 
**kwargs)
  2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: 
d48c9894-2ba2-4752-bae5-36c437933ff1]   File 
"/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 617, in 
attach
  2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: 
d48c9894-2ba2-4752-bae5-36c437933ff1] virt_driver, do_driver_attach)
  2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: 
d48c9894-2ba2-4752-bae5-36c437933ff1]   File 
"/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in 
inner
  2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: 
d48c9894-2ba2-4752-bae5-36c437933ff1] return f(*args, **kwargs)
  2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: 
d48c9894-2ba2-4752-bae5-36c437933ff1]   File 
"/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 614, in 
_do_locked_attach
  2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: 
d48c9894-2ba2-4752-bae5-36c437933ff1] self._do_attach(*args, **_kwargs)
  2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: 

[Yahoo-eng-team] [Bug 1799186] Re: Queens compute node is not compatible with Pike Controller node

2018-10-22 Thread Matt Riedemann
This isn't really supported. You should be configuring
[upgrade_levels]/compute to pike:

https://docs.openstack.org/nova/latest/configuration/config.html#upgrade_levels.compute

Until you get everything upgraded to Queens at which point you can
remove the RPC version pin.
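
For reference, a minimal sketch of what that pin might look like in nova.conf on the services that still have to talk to Pike computes (the option itself is documented at the link above); it is removed again once the whole deployment runs Queens:

[upgrade_levels]
# pin compute RPC to the Pike level while mixed-version nodes exist
compute = pike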

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1799186

Title:
  Queens compute  node is not compatible with Pike Controller node

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Description
  ===

  We Have OpenStack Pike running on Ubuntu 16.04.
  As per OpenStack documentation, compute node should support N+1 version of 
controller.
  But when we upgrade the controller node, we get below error on compute side 
for all actions performed.

  ERROR oslo_messaging.rpc.server UnsupportedVersion: Endpoint does not
  support RPC version 5.0. Attempted method: build_and_run_instance

  The RPC version of Pike compute is not supported with RPC version of
  Queens controller.

  Thus we are unable to upgrade this setup.

  Steps to reproduce
  ==

  1. Setup openstack Pike on Ubuntu 16.04
  2. Upgrade controller node to Queens by adding new keys
  3. Synch db, restart (standard upgrade process)
  4. After successful upgrade of controller node, check functions of nova 
(create instance, start/stop instance) Here controller is on Queens and Compute 
is on Pike
  5. You should get error that RPC versions are not supported

  Expected result
  ===

  Compute should be compatible with the N+1 version of OpenStack.
  In this case, this scenario must be supported and compute functions must work.

  Actual result
  =

  All compute-related functions fail. Start/stop/reboot of an instance fails.

  Logs
  

  ERROR oslo_messaging.rpc.server UnsupportedVersion: Endpoint does not
  support RPC version 5.0. Attempted method: build_and_run_instance

  
  Environment
  ===

  Ubuntu 16.04 controller and compute with Pike installation.

  
  Do let me know if you need more details.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1799186/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1799246] [NEW] module level init of db transaction contexts causes failure under mod_wsgi on module reload

2018-10-22 Thread sean mooney
Public bug reported:

Description
===
This is related to a downstream bug first reported here:
https://bugzilla.redhat.com/show_bug.cgi?id=1630069, which stems from how
TripleO currently deploys the placement API under mod_wsgi.

When deployed under mod_wsgi, if the wsgi application exits with an error it is
reloaded back into the same Python interpreter instance. As a result of this
behavior, module-level variables have a longer lifetime than the lifetime of
the application. When run under uwsgi, a reloaded application is loaded into a
new Python interpreter, so the lifetime of the module-level variables is scoped
to the lifetime of the application.

As a result of the lifetime semantics of mod_wsgi, the placement API and
nova-api must assume that the wsgi application's init can be invoked multiple
times on failure. The current use of the SQLAlchemy enginefacade
transaction_context is not reentrant, resulting in a TypeError being raised on
subsequent calls to configure() when the wsgi application is reloaded.
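
A standalone sketch of the failure mode and one possible guard follows (illustrative names, not the actual oslo.db or placement code); the point is that whatever owns the module-level factory has to make the configure step idempotent when init_application() runs twice in the same interpreter:

# Standalone illustration, not the real oslo.db/placement code: a module-level
# factory that refuses to be configured twice, and an init path guarded so a
# mod_wsgi re-initialisation in the same interpreter no longer raises.

class TransactionFactory(object):
    def __init__(self):
        self._started = False

    def configure(self, **kw):
        if self._started:
            raise TypeError("this TransactionFactory is already started")
        self._started = True


_FACTORY = TransactionFactory()  # module level: outlives a wsgi app reload
_CONFIGURED = False


def init_application(conf):
    global _CONFIGURED
    if not _CONFIGURED:  # guard makes re-init safe under mod_wsgi
        _FACTORY.configure(connection=conf["connection"])
        _CONFIGURED = True


if __name__ == "__main__":
    conf = {"connection": "sqlite://"}
    init_application(conf)
    init_application(conf)  # second init no longer raises TypeError
    print("re-init ok")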


Expected result
===
it should be possible to reload the nova and placement api wsgi application
under mod_wsgi on failure 

Actual result
=

46087 [Wed Oct 10 15:10:49.433284 2018] [:error] [pid 14] [remote 
172.25.0.10:208] mod_wsgi (pid=14): Target WSGI script 
'/var/www/cgi-bin/nova/nova-placement-api' cannot be loaded as Python module.
46088 [Wed Oct 10 15:10:49.433305 2018] [:error] [pid 14] [remote 
172.25.0.10:208] mod_wsgi (pid=14): Exception occurred processing WSGI script 
'/var/www/cgi-bin/nova/nova-placement-api'.
46089 [Wed Oct 10 15:10:49.433320 2018] [:error] [pid 14] [remote 
172.25.0.10:208] Traceback (most recent call last):
46090 [Wed Oct 10 15:10:49.43 2018] [:error] [pid 14] [remote 
172.25.0.10:208]   File "/var/www/cgi-bin/nova/nova-placement-api", line 54, in 

46091 [Wed Oct 10 15:10:49.433354 2018] [:error] [pid 14] [remote 
172.25.0.10:208] application = init_application()
46092 [Wed Oct 10 15:10:49.433361 2018] [:error] [pid 14] [remote 
172.25.0.10:208]   File 
"/usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi.py", line 
108, in init_application
46093 [Wed Oct 10 15:10:49.433386 2018] [:error] [pid 14] [remote 
172.25.0.10:208] db_api.configure(conf.CONF)
46094 [Wed Oct 10 15:10:49.433392 2018] [:error] [pid 14] [remote 
172.25.0.10:208]   File 
"/usr/lib/python2.7/site-packages/nova/api/openstack/placement/db_api.py", line 
35, in configure
46095 [Wed Oct 10 15:10:49.433403 2018] [:error] [pid 14] [remote 
172.25.0.10:208] **_get_db_conf(conf.placement_database))
46096 [Wed Oct 10 15:10:49.433408 2018] [:error] [pid 14] [remote 
172.25.0.10:208]   File 
"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 
788, in configure
46097 [Wed Oct 10 15:10:49.433420 2018] [:error] [pid 14] [remote 
172.25.0.10:208] self._factory.configure(**kw)
46098 [Wed Oct 10 15:10:49.433425 2018] [:error] [pid 14] [remote 
172.25.0.10:208]   File 
"/usr/lib/python2.7/site-packages/debtcollector/renames.py", line 43, in 
decorator
46099 [Wed Oct 10 15:10:49.433435 2018] [:error] [pid 14] [remote 
172.25.0.10:208] return wrapped(*args, **kwargs)
46100 [Wed Oct 10 15:10:49.433440 2018] [:error] [pid 14] [remote 
172.25.0.10:208]   File 
"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 
312, in configure
46101 [Wed Oct 10 15:10:49.433449 2018] [:error] [pid 14] [remote 
172.25.0.10:208] self._configure(False, kw)
46102 [Wed Oct 10 15:10:49.433453 2018] [:error] [pid 14] [remote 
172.25.0.10:208]   File 
"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 
317, in _configure
46103 [Wed Oct 10 15:10:49.433462 2018] [:error] [pid 14] [remote 
172.25.0.10:208] raise TypeError("this TransactionFactory is already 
started")
46104 [Wed Oct 10 15:10:49.433473 2018] [:error] [pid 14] [remote 
172.25.0.10:208] TypeError: this TransactionFactory is already started

** Affects: nova
 Importance: Medium
 Assignee: sean mooney (sean-k-mooney)
 Status: In Progress


** Tags: api placement

** Changed in: nova
 Assignee: (unassigned) => sean mooney (sean-k-mooney)

** Changed in: nova
   Status: New => In Progress

** Changed in: nova
   Importance: Undecided => Medium

** Tags added: api placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1799246

Title:
  module level init of db transaction contexts causes failure under
  mod_wsgi on module reload

Status in OpenStack Compute (nova):
  In Progress

Bug description:
  Description
  ===
  This is related to a downstream bug first reported here:
  https://bugzilla.redhat.com/show_bug.cgi?id=1630069, which stems from how
  TripleO currently deploys the placement API under mod_wsgi.

  when 

[Yahoo-eng-team] [Bug 1799186] [NEW] Queens compute node is not compatible with Pike Controller node

2018-10-22 Thread omkar_telee
Public bug reported:

Description
===

We Have OpenStack Pike running on Ubuntu 16.04.
As per OpenStack documentation, compute node should support N+1 version of 
controller.
But when we upgrade the controller node, we get below error on compute side for 
all actions performed.

ERROR oslo_messaging.rpc.server UnsupportedVersion: Endpoint does not
support RPC version 5.0. Attempted method: build_and_run_instance

The RPC version of Pike compute is not supported with RPC version of
Queens controller.

Thus we are unable to upgrade this setup.

Steps to reproduce
==

1. Setup openstack Pike on Ubuntu 16.04
2. Upgrade controller node to Queens by adding new keys
3. Synch db, restart (standard upgrade process)
4. After successful upgrade of controller node, check functions of nova (create 
instance, start/stop instance) Here controller is on Queens and Compute is on 
Pike
5. You should get error that RPC versions are not supported

Expected result
===

Compute should be compatible with the N+1 version of OpenStack.
In this case, this scenario must be supported and compute functions must work.

Actual result
=

All compute-related functions fail. Start/stop/reboot of an instance fails.

Logs


ERROR oslo_messaging.rpc.server UnsupportedVersion: Endpoint does not
support RPC version 5.0. Attempted method: build_and_run_instance


Environment
===

Ubuntu 16.04 controller and compute with Pike installation.


Do let me know if you need more details.

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1799186

Title:
  Queens compute  node is not compatible with Pike Controller node

Status in OpenStack Compute (nova):
  New

Bug description:
  Description
  ===

  We Have OpenStack Pike running on Ubuntu 16.04.
  As per OpenStack documentation, compute node should support N+1 version of 
controller.
  But when we upgrade the controller node, we get below error on compute side 
for all actions performed.

  ERROR oslo_messaging.rpc.server UnsupportedVersion: Endpoint does not
  support RPC version 5.0. Attempted method: build_and_run_instance

  The RPC version of Pike compute is not supported with RPC version of
  Queens controller.

  Thus we are unable to upgrade this setup.

  Steps to reproduce
  ==

  1. Setup openstack Pike on Ubuntu 16.04
  2. Upgrade controller node to Queens by adding new keys
  3. Synch db, restart (standard upgrade process)
  4. After successful upgrade of controller node, check functions of nova 
(create instance, start/stop instance) Here controller is on Queens and Compute 
is on Pike
  5. You should get error that RPC versions are not supported

  Expected result
  ===

  Compute should be compatible with the N+1 version of OpenStack.
  In this case, this scenario must be supported and compute functions must work.

  Actual result
  =

  All compute-related functions fail. Start/stop/reboot of an instance fails.

  Logs
  

  ERROR oslo_messaging.rpc.server UnsupportedVersion: Endpoint does not
  support RPC version 5.0. Attempted method: build_and_run_instance

  
  Environment
  ===

  Ubuntu 16.04 controller and compute with Pike installation.

  
  Do let me know if you need more details.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1799186/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1799178] [NEW] l2 pop doesn't always provide the whole list of fdb entries on agent restart

2018-10-22 Thread Oleg Bondarev
Public bug reported:

The whole list of fdb entries is provided to the agent when a port from a new
network appears, or when the agent is restarted.
Currently an agent restart is detected via the agent_boot_time option, 180 sec
by default.
In fact the boot time differs depending on port count, and on some loaded
clusters it can easily exceed 180 secs on gateway nodes. Changing the boot time
in the config works, but honestly this is not an ideal solution.
There should be a smarter way to detect an agent restart (such as the agent
itself sending a flag in its state report).
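
For illustration, a minimal standalone sketch of that alternative (simple dicts and functions, not the actual neutron agent or server code):

# Standalone sketch, not neutron code: the agent marks its first state report
# after a (re)start, and the server uses that flag -- rather than a fixed
# agent_boot_time window -- to decide when to send the full fdb table.

import time


def build_agent_state(first_report_since_start):
    return {
        "agent_type": "Linux bridge agent",
        "start_flag": first_report_since_start,  # True only right after start
        "timestamp": time.time(),
    }


def needs_full_fdb_sync(agent_state):
    # server side: resync on an explicit restart signal, however long the
    # agent actually took to boot
    return bool(agent_state.get("start_flag"))


if __name__ == "__main__":
    print(needs_full_fdb_sync(build_agent_state(True)))   # True
    print(needs_full_fdb_sync(build_agent_state(False)))  # False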

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1799178

Title:
  l2 pop doesn't always provide the whole list of fdb entries on agent
  restart

Status in neutron:
  New

Bug description:
  The whole list of fdb entries is provided to the agent when a port from a new
  network appears, or when the agent is restarted.
  Currently an agent restart is detected via the agent_boot_time option, 180 sec
  by default.
  In fact the boot time differs depending on port count, and on some loaded
  clusters it can easily exceed 180 secs on gateway nodes. Changing the boot
  time in the config works, but honestly this is not an ideal solution.
  There should be a smarter way to detect an agent restart (such as the agent
  itself sending a flag in its state report).

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1799178/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1797309] Re: Every item in navigation bar of workflow form should be hidden if the parameter ready is false

2018-10-22 Thread Ivan Kolodyazhny
** Also affects: horizon
   Importance: Undecided
   Status: New

** Changed in: horizon
   Status: New => Confirmed

** Changed in: horizon
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1797309

Title:
  Every item in navigation bar of workflow form should be hidden if the
  parameter ready is false

Status in OpenStack Dashboard (Horizon):
  Confirmed
Status in horizon package in Ubuntu:
  New

Bug description:
  In the workflow wizard, every navigation item uses the parameter
  'ng-show="viewModel.ready"' to determine whether it should be displayed. I
  think it should use each item's own 'ready' parameter, like this:
  'ng-show="step.ready"'. I think that makes sense.

To manage notifications about this bug go to:
https://bugs.launchpad.net/horizon/+bug/1797309/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1798806] Re: Race condition between RT and scheduler

2018-10-22 Thread Radoslav Gerganov
*** This bug is a duplicate of bug 1729621 ***
https://bugs.launchpad.net/bugs/1729621

I just found that this problem is fixed in the master branch as part of
bug #1729621.  However, it is not backported to stable releases.

** This bug has been marked a duplicate of bug 1729621
   Inconsistent value for vcpu_used

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1798806

Title:
  Race condition between RT and scheduler

Status in OpenStack Compute (nova):
  In Progress

Bug description:
  The HostState object which is used by the scheduler is using the
  'stats' property of the compute node to derive its own values, e.g. :

  self.stats = compute.stats or {}
  self.num_instances = int(self.stats.get('num_instances', 0))
  self.num_io_ops = int(self.stats.get('io_workload', 0))
  self.failed_builds = int(self.stats.get('failed_builds', 0))

  These values are used for both filtering and weighing compute hosts.
  However, the 'stats' property of the compute node is cleared during
  the periodic update_available_resources() and populated again. The
  clearing occurs in RT._copy_resources() and it preserves only the old
  value of 'failed_builds'. This creates a race condition between RT and
  scheduler which may result into populating wrong values for
  'num_io_ops' and 'num_instances' into the HostState object and thus
  leading to incorrect scheduling decisions.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1798806/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1799153] [NEW] Inappropriate behaviour of limits when passing --region None in create and list.

2018-10-22 Thread Vishakha Agarwal
Public bug reported:

When creating a registered limit by passing --region None to the registered
limit create CLI, it gives the error message "More than one resource
exist for region", which is definitely a wrong message, as regions with
the same name can neither be created nor already exist.

The correct behaviour should be -
1. In the case of --region None it should create a registered limit.
2. "No region exist with name xyz" if an invalid 'xyz' region is passed while
creating.

The same applies to registered limit list -
1. In the case of --region None it should list all limits, ignoring None.
2. "No region exist with name xyz" if an invalid 'xyz' region is passed while
listing.

Sane behaviors for limit create and list

** Affects: keystone
 Importance: Undecided
 Assignee: Vishakha Agarwal (vishakha.agarwal)
 Status: New

** Description changed:

  When creating registered limit by passing --region None in registered
  limit create cli, it is giving error message "More than one resource
  exist for region" which is definitely a wrong message as regions with
  name name cannot be created neither same exist.
  
- The correct behaviour should be - 
+ The correct behaviour should be -
  1. In the case if --region None it should create a registered limit.
  2. "No region exist with name xyz" if passed a invalid 'xyz' region while 
creating.
  
  Same in case of registerd limit list
  1. In the case if --region None it should list all limits ignoring None.
  2. "No region exist with name xyz" if passed a invalid 'xyz' region while 
listing.
+ 
+ Sane behaviors for limit create and list

** Changed in: keystone
 Assignee: (unassigned) => Vishakha Agarwal (vishakha.agarwal)

** Description changed:

  When creating registered limit by passing --region None in registered
  limit create cli, it is giving error message "More than one resource
  exist for region" which is definitely a wrong message as regions with
- name name cannot be created neither same exist.
+ same name cannot be created neither same exist.
  
  The correct behaviour should be -
  1. In the case if --region None it should create a registered limit.
  2. "No region exist with name xyz" if passed a invalid 'xyz' region while 
creating.
  
  Same in case of registerd limit list
  1. In the case if --region None it should list all limits ignoring None.
  2. "No region exist with name xyz" if passed a invalid 'xyz' region while 
listing.
  
  Sane behaviors for limit create and list

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1799153

Title:
  Inappropriate behaviour of limits when passing --region None in create
  and list.

Status in OpenStack Identity (keystone):
  New

Bug description:
  When creating a registered limit by passing --region None to the registered
  limit create CLI, it gives the error message "More than one resource
  exist for region", which is definitely a wrong message, as regions with
  the same name can neither be created nor already exist.

  The correct behaviour should be -
  1. In the case of --region None it should create a registered limit.
  2. "No region exist with name xyz" if an invalid 'xyz' region is passed while
  creating.

  The same applies to registered limit list -
  1. In the case of --region None it should list all limits, ignoring None.
  2. "No region exist with name xyz" if an invalid 'xyz' region is passed while
  listing.

  Sane behaviors for limit create and list

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1799153/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1799155] [NEW] [l3][port_forwarding] two different protocols can not have the same internal/external port number at the same time

2018-10-22 Thread LIU Yulong
Public bug reported:

ENV: devstack master


Floating IP port_forwardings with different protocols cannot have the same
internal or external port number towards the same vm_port. But different
application servers, for instance a TCP server and a UDP server, can listen on
the same port at the same time.

For instance, if you create a port_forwarding to a floating IP with the 
following input:
{"port_forwarding": 
{
"internal_port_id": "3145b56c-949d-45d4-9e35-614117b5f69c", 
"internal_port": 22, 
"protocol": "tcp", 
"external_port": 22, 
"internal_ip_address": "192.168.188.3"
}
}

And then add another port_forwarding with protocol udp and internal port
number 22 again:
{"port_forwarding": 
{
"internal_port_id": "3145b56c-949d-45d4-9e35-614117b5f69c", 
"internal_port": 22, 
"protocol": "udp", 
"external_port": , 
"internal_ip_address": "192.168.188.3"
}
}

Neutron will return a 40x error.

This is the key point: these unique constraints do not consider the protocol:
https://github.com/openstack/neutron/blob/master/neutron/db/migration/alembic_migrations/versions/rocky/expand/867d39095bf4_port_forwarding.py#L53-L58
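
To illustrate the constraint change this implies, here is a standalone sketch with an in-memory SQLite table (column names approximated from the migration linked above, not the actual neutron schema): once protocol is part of the unique keys, a TCP and a UDP forwarding can share the same port numbers:

import sqlite3

# Standalone sketch, not the real neutron migration: unique constraints that
# include the protocol column, so tcp and udp forwardings may reuse a port.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE portforwardings (
        floatingip_id TEXT, external_port INTEGER,
        internal_neutron_port_id TEXT, protocol TEXT, socket TEXT,
        UNIQUE (floatingip_id, protocol, external_port),
        UNIQUE (internal_neutron_port_id, protocol, socket)
    )
""")
conn.execute("INSERT INTO portforwardings VALUES (?, ?, ?, ?, ?)",
             ("fip-1", 22, "port-1", "tcp", "192.168.188.3:22"))
# same external/internal ports, different protocol: accepted once protocol is
# part of the unique keys
conn.execute("INSERT INTO portforwardings VALUES (?, ?, ?, ?, ?)",
             ("fip-1", 22, "port-1", "udp", "192.168.188.3:22"))
print(conn.execute("SELECT COUNT(*) FROM portforwardings").fetchone()[0])  # 2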

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1799155

Title:
  [l3][port_forwarding] two different protocols can not have the same
  internal/external port number at the same time

Status in neutron:
  New

Bug description:
  ENV: devstack master

  
  Floating IP port_forwardings with different protocols cannot have the same
  internal or external port number towards the same vm_port. But different
  application servers, for instance a TCP server and a UDP server, can listen
  on the same port at the same time.

  For instance, if you create a port_forwarding to a floating IP with the 
following input:
  {"port_forwarding": 
  {
  "internal_port_id": "3145b56c-949d-45d4-9e35-614117b5f69c", 
  "internal_port": 22, 
  "protocol": "tcp", 
  "external_port": 22, 
  "internal_ip_address": "192.168.188.3"
  }
  }

  And then add another port_forwarding with protocol to udp and internal port 
number 22 again:
  {"port_forwarding": 
  {
  "internal_port_id": "3145b56c-949d-45d4-9e35-614117b5f69c", 
  "internal_port": 22, 
  "protocol": "udp", 
  "external_port": , 
  "internal_ip_address": "192.168.188.3"
  }
  }

  Neutron will return a 40x error.

  This is the key point, these unique constraints do not consider the protocol:
  
https://github.com/openstack/neutron/blob/master/neutron/db/migration/alembic_migrations/versions/rocky/expand/867d39095bf4_port_forwarding.py#L53-L58

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1799155/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1799151] [NEW] table: the checkbox shows or hides unpredictably when updating a single row

2018-10-22 Thread Wangliangyu
Public bug reported:

If there is more than one process running the same HTTP service, then when
updating a single row's data by ajax, the checkbox of the table will be
displayed or hidden unpredictably.

** Affects: horizon
 Importance: Undecided
 Status: New

** Description changed:

  If there are more than one processes running one http service, and when
  updating single row data by ajax, the checkbox of the table will be
- display or hidden uncertainly.
+ displayed or hidden uncertainty.

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1799151

Title:
  table: the checkbox shows or hides unpredictably when updating a single row

Status in OpenStack Dashboard (Horizon):
  New

Bug description:
  If there is more than one process running the same HTTP service, then
  when updating a single row's data by ajax, the checkbox of the table will
  be displayed or hidden unpredictably.

To manage notifications about this bug go to:
https://bugs.launchpad.net/horizon/+bug/1799151/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1799152] [NEW] Retry after hitting libvirt error code VIR_ERR_OPERATION_INVALID in live migration.

2018-10-22 Thread Fan Zhang
Public bug reported:

Description
===
When migration of a persistent guest completes, the guest merely shuts off,
but libvirt unhelpfully raises a VIR_ERR_OPERATION_INVALID error code; in the
nova code we pretend this case means success. But if we are in the middle of a
live migration and the qemu-kvm process is killed accidentally, for example by
host OOM (which happens rarely in our environment, but it does happen a few
times), the domain state is SHUTOFF and we then get VIR_ERR_OPERATION_INVALID
while trying to call `self._domain.jobStats()`. Under that circumstance the
migration should be considered failed, otherwise the post_live_migration()
function starts to clean up instance files and we lose customers' data forever.
IMHO, we may need to `pretend` the migration job is still running after
hitting VIR_ERR_OPERATION_INVALID and retry getting the job stats a few times,
with the number of retries configurable. If the migration eventually succeeds
we won't keep getting VIR_ERR_OPERATION_INVALID after some retries, but the
error code persists if the qemu-kvm process was killed accidentally.
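
A standalone sketch of that retry idea follows (a faked error class and a stand-in error-code value, not the real libvirt bindings or nova code):

# Standalone sketch, not nova/libvirt code: keep treating the error as "the
# job may still be running" for a bounded number of polls before giving up,
# instead of immediately declaring the migration complete.

class FakeLibvirtError(Exception):
    def __init__(self, code):
        super(FakeLibvirtError, self).__init__(code)
        self.code = code

    def get_error_code(self):
        return self.code


VIR_ERR_OPERATION_INVALID = 55  # stand-in value for this sketch


def poll_job_stats(get_job_stats, max_invalid_polls=3):
    invalid_polls = 0
    while True:
        try:
            return get_job_stats()
        except FakeLibvirtError as ex:
            if ex.get_error_code() != VIR_ERR_OPERATION_INVALID:
                raise
            invalid_polls += 1
            if invalid_polls > max_invalid_polls:
                # the domain stayed "not running" across several polls; let
                # the caller treat the migration as failed rather than done
                raise
            # otherwise pretend the job is still running and poll again


if __name__ == "__main__":
    polls = {"n": 0}

    def flaky_job_stats():
        polls["n"] += 1
        if polls["n"] < 3:
            raise FakeLibvirtError(VIR_ERR_OPERATION_INVALID)
        return {"type": "completed"}

    print(poll_job_stats(flaky_job_stats))  # succeeds after two retries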

Steps to reproduce
==
* Do nova live-migration  on controller node.
* Once live migration monitor on source compute node starts to get JobInfo, 
kill the qemu-kvm process on source host.
* Check if post_live_migration on source host starts to execute.
* Check if post_live_migration on destination host starts to execute.
* Check image files on both source host and destination host.

Expected result
===

Migration should be consider failed.

Actual result
=

Post live migration on source host starts to execute and clean instance
files. Instance disappears on both source and destination host.

Environment
===
1. My environment is packstack, and openstack nova release is Queens.

2. Libvirt + KVM

Logs & Configs
==

Some logs after qemu-kvm process is killed.
```
...
2018-09-21 14:08:34.180 11099 DEBUG nova.virt.libvirt.migration 
[req-d8e0cfab-ea85-4716-a2fe-1307a7004f12 bf015418722f437e9f031efabc7a98e6 
ca68d7d736374dbfb38d4ef2f80b2a5c - default default] [instance: 
ba8feaea-eedc-4b7c-8ffa-01152fc9bde8] Downtime does not need to change 
update_downtime 
/usr/lib/python2.7/site-packages/nova/virt/libvirt/migration.py:410
2018-09-21 14:08:34.305 11099 DEBUG nova.virt.libvirt.driver 
[req-d8e0cfab-ea85-4716-a2fe-1307a7004f12 bf015418722f437e9f031efabc7a98e6 
ca68d7d736374dbfb38d4ef2f80b2a5c - default default] [instance: 
ba8feaea-eedc-4b7c-8ffa-01152fc9bde8] Migration running for 10 secs, memory 
100% remaining; (bytes processed=0, remaining=0, total=0) 
_live_migration_monitor 
/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:7394
2018-09-21 14:08:34.886 11099 DEBUG nova.virt.libvirt.guest 
[req-d8e0cfab-ea85-4716-a2fe-1307a7004f12 bf015418722f437e9f031efabc7a98e6 
ca68d7d736374dbfb38d4ef2f80b2a5c - default default] Domain has shutdown/gone 
away: Requested operation is not valid: domain is not running get_job_info 
/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py:720
2018-09-21 14:08:34.887 11099 INFO nova.virt.libvirt.driver 
[req-d8e0cfab-ea85-4716-a2fe-1307a7004f12 bf015418722f437e9f031efabc7a98e6 
ca68d7d736374dbfb38d4ef2f80b2a5c - default default] [instance: 
ba8feaea-eedc-4b7c-8ffa-01152fc9bde8] Migration operation has completed
2018-09-21 14:08:34.887 11099 INFO nova.compute.manager 
[req-d8e0cfab-ea85-4716-a2fe-1307a7004f12 bf015418722f437e9f031efabc7a98e6 
ca68d7d736374dbfb38d4ef2f80b2a5c - default default] [instance: 
ba8feaea-eedc-4b7c-8ffa-01152fc9bde8] _post_live_migration() is started..
...
```

** Affects: nova
 Importance: Undecided
 Assignee: Fan Zhang (fanzhang)
 Status: New

** Changed in: nova
 Assignee: (unassigned) => Fan Zhang (fanzhang)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1799152

Title:
  Retry after hitting libvirt error code VIR_ERR_OPERATION_INVALID in
  live migration.

Status in OpenStack Compute (nova):
  New

Bug description:
  Description
  ===
  When migration of a persistent guest completes, the guest merely shuts off,
  but libvirt unhelpfully raises a VIR_ERR_OPERATION_INVALID error code; in the
  nova code we pretend this case means success. But if we are in the middle of
  a live migration and the qemu-kvm process is killed accidentally, for example
  by host OOM (which happens rarely in our environment, but it does happen a
  few times), the domain state is SHUTOFF and we then get
  VIR_ERR_OPERATION_INVALID while trying to call `self._domain.jobStats()`.
  Under that circumstance the migration should be considered failed, otherwise
  the post_live_migration() function starts to clean up instance files and we
  lose customers' data forever.
  IMHO, we may need to `pretend` the migration job is still running after
  hitting 

[Yahoo-eng-team] [Bug 1799150] [NEW] [l3][port_forwarding] internal/external port should not allow 0

2018-10-22 Thread LIU Yulong
Public bug reported:

ENV: devstack master

A floating IP port forwarding's internal or external port number should not
allow 0; otherwise you will get a ValueError exception in the neutron
server.
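
A minimal sketch of the missing validation (an illustrative helper, not the actual neutron API definition):

# Standalone sketch, not neutron code: reject 0 (and anything else outside
# 1..65535) at the API layer instead of letting a ValueError escape later.

def validate_forwarding_port(port):
    if not isinstance(port, int) or isinstance(port, bool) \
            or not 1 <= port <= 65535:
        raise ValueError(
            "invalid port number %r: must be an integer in 1..65535" % (port,))
    return port


if __name__ == "__main__":
    validate_forwarding_port(22)     # fine
    try:
        validate_forwarding_port(0)  # the case reported here
    except ValueError as exc:
        print(exc)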

Step to reproduce:
1. create router with connected private subnet and public gateway.
2. create VM on the private subnet
3. create floating IP
4. create port forwarding with internal or external port number 0

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1799150

Title:
  [l3][port_forwarding] internal/external port should not allow 0

Status in neutron:
  New

Bug description:
  ENV: devstack master

  A floating IP port forwarding's internal or external port number should
  not allow 0; otherwise you will get a ValueError exception in the neutron
  server.

  Step to reproduce:
  1. create router with connected private subnet and public gateway.
  2. create VM on the private subnet
  3. create floating IP
  4. create port forwarding with internal or external port number 0

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1799150/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1799140] [NEW] [l3][port_forwarding] subnet can not be removed from router if it has port_forwarding

2018-10-22 Thread LIU Yulong
Public bug reported:

ENV: devstack master

step to reproduce:
1. create router
2. add router public gateway
3. add router interface to subnet1, subnet2, subnet3
4. create a vm to subnet1
5. create floating IP with port forwarding to the vm port from subnet1

Then, you will not be able to remove the router interface from subnet2 and
subnet3. The neutron server will raise a netaddr-related error.

** Affects: neutron
 Importance: Undecided
 Status: New

** Description changed:

  ENV: devstack master
- 
  
  step to reproduce:
  1. create router
  2. add router public gateway
  3. add router interface to subnet1, subnet2, subnet3
  4. create a vm to subnet1
- 4. create floating IP with port forwarding to the vm port from subnet1
+ 5. create floating IP with port forwarding to the vm port from subnet1
  
  Then, you will not be able to remove router interface from subnet2 and 
subnet3.
  Neutron server will raise some netaddr related error.

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1799140

Title:
  [l3][port_forwarding] subnet can not be removed from router if it has
  port_forwarding

Status in neutron:
  New

Bug description:
  ENV: devstack master

  step to reproduce:
  1. create router
  2. add router public gateway
  3. add router interface to subnet1, subnet2, subnet3
  4. create a vm to subnet1
  5. create floating IP with port forwarding to the vm port from subnet1

  Then, you will not be able to remove the router interface from subnet2 and
  subnet3. The neutron server will raise a netaddr-related error.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1799140/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1799137] [NEW] [l3][port_forwarding] should not allow creating port_forwarding to a port which already has a binding floating IP

2018-10-22 Thread LIU Yulong
Public bug reported:

Creating a port_forwarding to a port which already has a floating IP bound to
it should not be allowed for dvr routers.

ENV: devstack master

step to reproduce:
1. create dvr router with connected private subnet and public gateway.
2. create VM on the private subnet
3. bind floating IP A to the VM port
4. create floating IP B with port forwarding to the VM port

Then floating IP B with port forwarding will not work. This should be
restricted by neutron.
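
A minimal sketch of the restriction being asked for (an illustrative lookup structure, not the actual neutron code):

# Standalone sketch, not neutron code: refuse a port forwarding towards an
# internal port that already has a floating IP bound to it.

def check_can_create_port_forwarding(internal_port_id, fip_bindings):
    # fip_bindings: internal port id -> floating IP id, standing in for what a
    # DB lookup would return
    if internal_port_id in fip_bindings:
        raise ValueError(
            "port %s already has floating IP %s bound; a port forwarding to "
            "it would not work" % (internal_port_id,
                                   fip_bindings[internal_port_id]))


if __name__ == "__main__":
    bindings = {"3145b56c-949d-45d4-9e35-614117b5f69c": "fip-A"}
    try:
        check_can_create_port_forwarding(
            "3145b56c-949d-45d4-9e35-614117b5f69c", bindings)
    except ValueError as exc:
        print(exc)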

** Affects: neutron
 Importance: Undecided
 Status: New

** Description changed:

  Should not allow creating port_forwarding to a port which already has a
  binding floating IP for dvr routers.
  
  ENV: devstack master
  
- 
  step to reproduce:
  1. create dvr router with connected privated subnet and public gateway.
  2. create VM to the private subnet
- 3. create floating IP B with port forwarding to the VM port
- 4. binding floating IP B to VM port
+ 3. binding floating IP B to VM port
+ 4. create floating IP B with port forwarding to the VM port
  
  Then floating IP B with port forwarding will not work. This should be
  restricted by neutron.

** Description changed:

  Should not allow creating port_forwarding to a port which already has a
  binding floating IP for dvr routers.
  
  ENV: devstack master
  
  step to reproduce:
  1. create dvr router with connected privated subnet and public gateway.
  2. create VM to the private subnet
- 3. binding floating IP B to VM port
+ 3. binding floating IP A to VM port
  4. create floating IP B with port forwarding to the VM port
  
  Then floating IP B with port forwarding will not work. This should be
  restricted by neutron.

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1799137

Title:
  [l3][port_forwarding] should not allow creating port_forwarding to a
  port which already has a binding floating IP

Status in neutron:
  New

Bug description:
  Creating a port_forwarding to a port which already has a floating IP bound
  to it should not be allowed for dvr routers.

  ENV: devstack master

  step to reproduce:
  1. create dvr router with connected private subnet and public gateway.
  2. create VM on the private subnet
  3. bind floating IP A to the VM port
  4. create floating IP B with port forwarding to the VM port

  Then floating IP B with port forwarding will not work. This should be
  restricted by neutron.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1799137/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1799138] [NEW] [l3][port_forwarding] a port can have port_forwarding and then bind floating IP again

2018-10-22 Thread LIU Yulong
Public bug reported:

ENV: devstack master

step to reproduce:
1. create dvr router with connected private subnet and public gateway.
2. create VM on the private subnet
3. create floating IP A with port forwarding to the VM port
4. bind floating IP B to the VM port

Then floating IP A with port forwarding will not work. This should be 
restricted by neutron.
Something really similar to bug:
https://bugs.launchpad.net/neutron/+bug/1799137

** Affects: neutron
 Importance: Undecided
 Status: New

** Summary changed:

- [l3][port_forwarding] a port can have port_forwarding and then binding 
floating IP again
+ [l3][port_forwarding] a port can have port_forwarding and then bind floating 
IP again

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1799138

Title:
  [l3][port_forwarding] a port can have port_forwarding and then bind
  floating IP again

Status in neutron:
  New

Bug description:
  ENV: devstack master

  step to reproduce:
  1. create dvr router with connected private subnet and public gateway.
  2. create VM on the private subnet
  3. create floating IP A with port forwarding to the VM port
  4. bind floating IP B to the VM port

  Then floating IP A with port forwarding will not work. This should be 
restricted by neutron.
  Something really similar to bug:
  https://bugs.launchpad.net/neutron/+bug/1799137

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1799138/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1799135] [NEW] [l3][port_forwarding] update floating IP (has binding port_forwarding) with empty {} input will lose router_id

2018-10-22 Thread LIU Yulong
Public bug reported:

ENV: devstack master

Step to reproduce:
1. create floating IP
2. create port forwarding for that floating IP
3. update floating IP with empty dict:
curl -g -i -X PUT 
http://controller:9696/v2.0/floatingips/2bb4cc5d-7fae-4c1b-9482-ead60d67abea \
-H "User-Agent: python-neutronclient" -H "Accept: application/json" \
-H "X-Auth-Token: " \
-d '{"floatingip": {}}'

Then this floating IP ends up in a bad state and can no longer be managed.
Every action on this floating IP will produce a neutron-server ERROR log.

Furthermore, updating only the floating IP qos_policy_id can also result in
such behavior.
curl -g -i -X PUT 
http://controller:9696/v2.0/floatingips/2bb4cc5d-7fae-4c1b-9482-ead60d67abea \
-H "User-Agent: python-neutronclient" -H "Accept: application/json" \
-H "X-Auth-Token: " \
-d '{"floatingip": {"qos_policy_id": "d9d3639e-b616-4007-a8fe-52d6154f1eec"}}'

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1799135

Title:
  [l3][port_forwarding] update floating IP (has binding port_forwarding)
  with empty {} input will lose router_id

Status in neutron:
  New

Bug description:
  ENV: devstack master

  Step to reproduce:
  1. create floating IP
  2. create port forwarding for that floating IP
  3. update floating IP with empty dict:
  curl -g -i -X PUT 
http://controller:9696/v2.0/floatingips/2bb4cc5d-7fae-4c1b-9482-ead60d67abea \
  -H "User-Agent: python-neutronclient" -H "Accept: application/json" \
  -H "X-Auth-Token: " \
  -d '{"floatingip": {}}'

  Then this floating IP ends up in a bad state and can no longer be managed.
  Every action on this floating IP will produce a neutron-server ERROR log.

  Furthermore, updating only the floating IP qos_policy_id can also result in
  such behavior.
  curl -g -i -X PUT 
http://controller:9696/v2.0/floatingips/2bb4cc5d-7fae-4c1b-9482-ead60d67abea \
  -H "User-Agent: python-neutronclient" -H "Accept: application/json" \
  -H "X-Auth-Token: " \
  -d '{"floatingip": {"qos_policy_id": "d9d3639e-b616-4007-a8fe-52d6154f1eec"}}'

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1799135/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp