[Yahoo-eng-team] [Bug 1452641] Re: Static Ceph mon IP addresses in connection_info can prevent VM startup

2023-07-10 Thread Billy Olsen
This is not a charm bug; it's a limitation/bug in the way that nova
handles the BDM devices.

** Changed in: nova
   Status: In Progress => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1452641

Title:
  Static Ceph mon IP addresses in connection_info can prevent VM startup

Status in OpenStack Compute (nova):
  Invalid
Status in nova package in Ubuntu:
  Triaged

Bug description:
  The Cinder rbd driver extracts the IP addresses of the Ceph mon servers from
  the Ceph mon map when the instance/volume connection is established. This info
  is then stored in nova's block-device-mapping table and is never re-validated
  down the line.
  Changing the Ceph mon servers' IP addresses will then prevent the instance from
  booting, as the stale connection info ends up in the instance's XML. One idea to
  fix this would be to use the information from ceph.conf directly, which should
  point at an alias or a load balancer.
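  As an illustration of that idea (this snippet is not from the original
  report, and the hostname is hypothetical), a minimal ceph.conf sketch where
  the mon address is a DNS alias or load balancer VIP rather than individual
  monitor IPs:

    [global]
    # clients resolve the mons through a stable name instead of fixed IPs
    mon_host = ceph-mons.example.internal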

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1452641/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2026284] Re: virtio-net-tx-queue-size reflects in nova conf but not for the vm even after a hard reboot

2023-07-07 Thread Billy Olsen
This does not appear to be a charm issue; rather, it appears to
potentially be a nova issue. I can confirm that setting rx_queue_size
and tx_queue_size results in the nova.conf file being updated by the
charm, but the hard-rebooted guest only gets the rx_queue_size, not the
tx_queue_size.


** Also affects: nova
   Importance: Undecided
   Status: New

** Changed in: nova
   Status: New => Incomplete

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2026284

Title:
  virtio-net-tx-queue-size reflects in nova conf but not for the vm even
  after a hard reboot

Status in OpenStack Nova Compute Charm:
  New
Status in OpenStack Compute (nova):
  Incomplete

Bug description:
  After modifying the nova compute config options,
  - virtio-net-rx-queue-size=512
  - virtio-net-tx-queue-size=512

  I hard-rebooted my vm and spawned a new vm, and what I see (on both of
  them) is:
  - virsh xml
  ```
  # virsh dumpxml 2 | grep -i queue
    
  ```

  - nova.conf
  ```
  # grep -i queue /etc/nova/nova.conf
  tx_queue_size = 512
  rx_queue_size = 512
  ```

  - inside the vm
  ```
  root@jammy-135110:~# ethtool -g ens2
  Ring parameters for ens2:
  Pre-set maximums:
  RX: 512
  RX Mini:n/a
  RX Jumbo:   n/a
  TX: 256
  Current hardware settings:
  RX: 512
  RX Mini:n/a
  RX Jumbo:   n/a
  TX: 256
  ```

  The RX config gets propagated, but the TX config does not (see the
  illustrative sketch below).
  Please let me know if any more information is needed.
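  For reference, a hedged sketch (not output from the affected system) of what
  the guest XML would be expected to contain once both options are applied;
  libvirt expresses these settings as attributes on the interface's driver
  element:
  ```
  # virsh dumpxml <domain> | grep -i queue
    <driver name='vhost' rx_queue_size='512' tx_queue_size='512'/>
  ```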

  --

  env:
  - focal ussuri
  - nova-compute:
  charm: nova-compute
  channel: ussuri/stable
  revision: 669
  - this is a freshly deployed openstack on vms (not on baremetal)
  - libvirt: 6.0.0-0ubuntu8.16
  - nova-compute-libvirt 21.2.4-0ubuntu2.5
  - qemu 4.2-3ubuntu6.27

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-nova-compute/+bug/2026284/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1993628] Re: Designate synchronisation inconsistencies with Neutron-API

2023-05-16 Thread Billy Olsen
Agree that this likely isn’t a charm issue. I’ll mark invalid for now,
but feel free to reopen if evidence suggests otherwise.

** Changed in: charm-designate
   Status: New => Invalid

** Changed in: charm-designate
   Status: Invalid => Incomplete

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1993628

Title:
  Designate synchronisation inconsistencies with Neutron-API

Status in OpenStack Designate Charm:
  Incomplete
Status in neutron:
  New

Bug description:
  When setting a network to automatically use a dns-domain, some
  inconsistencies were observed when deleting and recreating instances
  that share the same names and are associated with the same floating
  IPs as before.

  This has been reproduced on :
  * Focal Ussuri (Neutron-api and Designate charms with Ussuri/edge branch)
  * Focal Yoga  (Neutron-api and Designate charms with Yoga/stable branch)

  
  Reproducible steps :
  * create a domain zone with "openstack zone create"
  * configure an existing self-service network with the newly created domain:
    "openstack network set --dns-domain ..."
  * create a router on the self-service network with an external gateway on 
provider network
  * create an instance on self-service network
  * create a floating ip address on provider network
  * associate floating ip to instance
  --> the DNS entry gets created

  * delete the instance *WITH* the floating ip still attached
  --> the DNS entry is deleted

  * recreate a new instance with exactly the *same* name and re-use the *same* 
floating ip
  --> the DNS entry doesn't get created
  --> it doesn't seem to be related to TTL, since this makes the issue 
permanent even after a day of testing when TTL is set by default to 1 hour

  Worse inconsistencies can be seen when, instead of deleting an instance, the
  floating ip is moved directly to another instance:
  * have 2 instances vm-1 and vm-2
  * attach floating ip to vm-1: "openstack server add floating ip XXX vm-1"
  --> the DNS entry is created
  * attach the same floating ip to vm-2: "openstack server add floating ip XXX
    vm-2" (this is permitted by the CLI and simply moves the FIP to vm-2)
  --> the DNS entry still uses vm-1; the vm-2 record doesn't get created

  When you combine these 2 issues, you can be left with either false
  records being kept or automatic records failing silently to be
  created.

  
  Workaround (see the sketch below):
  * either always remove the floating ip *before* deleting an instance,
  or
  * remove the floating ip from the instance
  * then re-add the floating ip to the instance
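  As a hedged illustration of the second workaround (the server name and
  address below are hypothetical):

    # detach and re-attach the floating ip so the DNS record is recreated
    openstack server remove floating ip vm-2 192.0.2.10
    openstack server add floating ip vm-2 192.0.2.10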

  
  Eventually, when deleting the floating ip to reassign it, we are greeted
  with this error on the neutron-api unit (on Ussuri, but the error is similar
  on Yoga):

  2022-10-19 02:24:12.497 67548 ERROR neutron.db.dns_db 
[req-e6d270d2-fbde-42d7-a75b-2c8a67c42fcb 2dc4151f6dba4c3e8ba8537c9c354c13 
f548268d5255424591baa8783f1cf277 - 6a71047e7d7f4e01945ec58df06ae63f 
6a71047e7d7f4e01945ec58df06ae63f] Error deleting Floating IP data from external 
DNS service. Name: 'vm-2'. Domain: 'compute.stack.vpn.'. IP addresses 
'192.168.21.217'. DNS service driver message 'Name vm-2.compute.stack.vpn. is 
duplicated in the external DNS service': 
neutron_lib.exceptions.dns.DuplicateRecordSet: Name vm-2.compute.stack.vpn. is 
duplicated in the external DNS service
  2022-10-19 02:24:12.497 67548 ERROR neutron.db.dns_db Traceback (most recent 
call last):
  2022-10-19 02:24:12.497 67548 ERROR neutron.db.dns_db   File 
"/usr/lib/python3/dist-packages/neutron/db/dns_db.py", line 214, in 
_delete_floatingip_from_external_dns_service
  2022-10-19 02:24:12.497 67548 ERROR neutron.db.dns_db 
self.dns_driver.delete_record_set(context, dns_domain, dns_name,
  2022-10-19 02:24:12.497 67548 ERROR neutron.db.dns_db   File 
"/usr/lib/python3/dist-packages/neutron/services/externaldns/drivers/designate/driver.py",
 line 172, in delete_record_set
  2022-10-19 02:24:12.497 67548 ERROR neutron.db.dns_db ids_to_delete = 
self._get_ids_ips_to_delete(
  2022-10-19 02:24:12.497 67548 ERROR neutron.db.dns_db   File 
"/usr/lib/python3/dist-packages/neutron/services/externaldns/drivers/designate/driver.py",
 line 200, in _get_ids_ips_to_delete
  2022-10-19 02:24:12.497 67548 ERROR neutron.db.dns_db raise 
dns_exc.DuplicateRecordSet(dns_name=name)
  2022-10-19 02:24:12.497 67548 ERROR neutron.db.dns_db 
neutron_lib.exceptions.dns.DuplicateRecordSet: Name vm-2.compute.stack.vpn. is 
duplicated in the external DNS service
  2022-10-19 02:24:12.497 67548 ERROR neutron.db.dns_db

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-designate/+bug/1993628/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1880828] Re: New instance is always in "spawning" status

2022-05-13 Thread Billy Olsen
Marking charm tasks as invalid on this particular bug as these aren't
related to the charms and were chased down to other components.

** Changed in: charm-nova-compute
   Status: New => Invalid

** Changed in: openstack-bundles
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1880828

Title:
  New instance is always in "spawning" status

Status in OpenStack Nova Compute Charm:
  Invalid
Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Bundles:
  Invalid

Bug description:
  bundle: openstack-base-bionic-train 
https://github.com/openstack-charmers/openstack-bundles/blob/master/development/openstack-base-bionic-train/bundle.yaml
  hardware: 2 d05 and 2 d06 (the log of the compute node is from one of the 
d06. Please note they are arm64 arch.)

  When trying to create new instances on the deployed openstack, the
  instance is always in the status of "spawning"

  [Steps to Reproduce]
  1. Deploy with the above bundle and hardware by following the instructions at
  https://jaas.ai/openstack-base/bundle/67
  2. Wait about 1.5 hours until the deployment is ready. Ready means every unit
  shows its message as "ready", e.g. https://paste.ubuntu.com/p/k48YVnPyVZ/
  3. Follow the instructions at https://jaas.ai/openstack-base/bundle/67 up to
  the "openstack server create" step to create a new instance. This step is also
  summarized in detail in this gist code snippet:
  https://gist.github.com/tai271828/b0c00a611e703046dd52da12a66226b0#file-02-basic-test-just-deployed-sh

  [Expected Behavior]
  An instance is created a few seconds later

  [Actual Behavior]
  The status of the instance is always (> 20 minutes) "spawning"

  [Additional Information]

  1. [workaround] Use `ps aux | grep qemu-img` to check whether a qemu-img
  image-converting process exists. The process should complete within
  ~20 sec. If the process has existed for more than one minute, use
  `pkill -f qemu-img` to terminate it and re-create the instance (see
  the consolidated sketch at the end of this section).

  The image converting process looks like this one:

  ```
  qemu-img convert -t none -O raw -f qcow2 /var/lib/nova/instance 
s/_base/9b8156fbecaa194804a637226c8ffded93a57489.part 
/var/lib/nova/instances/_base/9b8156fbecaa194804a637226c8ffded93a57489.converted
  ```

  2. Investigating in more detail, this issue is a combination of two
  problems: 1) nova should time out the stuck image-conversion process
  (comment #21), and 2) qemu does not terminate the image-conversion
  process successfully (comment #20).
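  A hedged, consolidated sketch of the workaround from item 1 above (the
  one-minute threshold is the reporter's rule of thumb):

  ```
  # check for a long-running qemu-img conversion process
  ps aux | grep [q]emu-img
  # if the convert process has been running much longer than ~1 minute,
  # terminate it and re-create the instance
  pkill -f qemu-img
  ```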

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-nova-compute/+bug/1880828/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1892361] Re: SRIOV instance gets type-PF interface, libvirt kvm fails

2021-07-17 Thread Billy Olsen
Queens and Rocky are both in extended maintenance and have had the
proposed patches merged. Updating the tasks to mark them as Fix Released.

** Changed in: nova/rocky
   Status: New => Fix Released

** Changed in: nova/queens
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1892361

Title:
  SRIOV instance gets type-PF interface, libvirt kvm fails

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive queens series:
  Fix Released
Status in Ubuntu Cloud Archive rocky series:
  Fix Released
Status in Ubuntu Cloud Archive stein series:
  Fix Released
Status in Ubuntu Cloud Archive train series:
  Fix Released
Status in Ubuntu Cloud Archive ussuri series:
  Fix Released
Status in Ubuntu Cloud Archive victoria series:
  Fix Released
Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  Fix Released
Status in OpenStack Compute (nova) rocky series:
  Fix Released
Status in OpenStack Compute (nova) stein series:
  Fix Committed
Status in OpenStack Compute (nova) train series:
  Fix Released
Status in OpenStack Compute (nova) ussuri series:
  Fix Released
Status in OpenStack Compute (nova) victoria series:
  Fix Released
Status in nova package in Ubuntu:
  Fix Released
Status in nova source package in Bionic:
  Fix Released
Status in nova source package in Focal:
  Fix Released
Status in nova source package in Groovy:
  Fix Released
Status in nova source package in Hirsute:
  Fix Released

Bug description:
  When spawning an SR-IOV enabled instance on a newly deployed host,
  nova attempts to spawn it with a type-PF PCI device. This fails with
  the stack trace below.

  After restarting the neutron-sriov-agent and nova-compute services on
  the compute node and spawning an SR-IOV instance again, a type-VF PCI
  device is selected and instance spawning succeeds.

  Stack trace:
  2020-08-20 08:29:09.558 7624 DEBUG oslo_messaging._drivers.amqpdriver [-] 
received reply msg_id: 6db8011e6ecd4fd0aaa53c8f89f08b1b __call__ 
/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:400
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager 
[req-e3e49d07-24c6-4c62-916e-f830f70983a2 ddcfb3640535428798aa3c8545362bd4 
dd99e7950a5b46b5b924ccd1720b6257 - 015e4fd7db304665ab5378caa691bb8b 
015e4fd7db304665ab5378caa691bb8b] [insta
  nce: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Instance failed to spawn: 
libvirtError: unsupported configuration: Interface type hostdev is currently 
supported on SR-IOV Virtual Functions only
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 
9498ea75-fe88-4020-9a9e-f4c437c6de11] Traceback (most recent call last):
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 
9498ea75-fe88-4020-9a9e-f4c437c6de11]   File 
"/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2274, in 
_build_resources
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 
9498ea75-fe88-4020-9a9e-f4c437c6de11] yield resources
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 
9498ea75-fe88-4020-9a9e-f4c437c6de11]   File 
"/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2054, in 
_build_and_run_instance
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 
9498ea75-fe88-4020-9a9e-f4c437c6de11] block_device_info=block_device_info)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 
9498ea75-fe88-4020-9a9e-f4c437c6de11]   File 
"/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3147, in 
spawn
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 
9498ea75-fe88-4020-9a9e-f4c437c6de11] destroy_disks_on_failure=True)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 
9498ea75-fe88-4020-9a9e-f4c437c6de11]   File 
"/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5651, in 
_create_domain_and_network
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 
9498ea75-fe88-4020-9a9e-f4c437c6de11] destroy_disks_on_failure)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 
9498ea75-fe88-4020-9a9e-f4c437c6de11]   File 
"/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 
9498ea75-fe88-4020-9a9e-f4c437c6de11] self.force_reraise()
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 
9498ea75-fe88-4020-9a9e-f4c437c6de11]   File 
"/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in 
force_reraise
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 
9498ea75-fe88-4020-9a9e-f4c437c6de11] six.reraise(self.type_, self.value, 
self.tb)
  2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 

[Yahoo-eng-team] [Bug 1888395] Re: live migration of a vm using the single port binding work flow is broken in train as a result of the introduction of sriov live migration

2021-03-10 Thread Billy Olsen
** Description changed:

- it was working in queens but fails in train. nova compute at the target
- aborts with the exception:
+ [Impact]
+ 
+ Live migration of instances in an environment that uses neutron backends
+ that do not support multiple port bindings will fail with error
+ 'NotImplemented', effectively rendering live-migration inoperable in
+ these environments.
+ 
+ This is fixed by first checking to ensure the backend supports the
+ multiple port bindings before providing the port bindings.
+ 
+ [Test Plan]
+ 
+ 1. deploy a Train/Ussuri OpenStack cloud w/ at least 2 compute nodes
+ using an SDN that does not support multiple port bindings (e.g.
+ opencontrail).
+ 
+ 2. Attempt to perform a live migration of an instance.
+ 
+ 3. Observe that the live migration will fail without this fix due to the
+ trace below (NotImplementedError: Cannot load 'vif_type' in the base
+ class), and should succeed with this fix.
+ 
+ 
+ [Where problems could occur]
+ 
+ This affects the live migration code, so likely problems would arise in
+ this area. Specifically, the check introduced is guarding information
+ provided for instances using SR-IOV indirect migration.
+ 
+ Regressions would likely occur in the form of live migration errors
+ around features that rely on the multiple port bindings (e.g. the SR-
+ IOV) and not the more generic/common use case. Errors may be seen in
+ standard network providers that are included with distro packaging, but
+ may also be seen in scenarios where proprietary SDNs are used.
+ 
+ 
+ [Original Description]
+ it was working in queens but fails in train. nova compute at the target 
aborts with the exception:
  
  Traceback (most recent call last):
-   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 
165, in _process_incoming
- res = self.dispatcher.dispatch(message)
-   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", 
line 274, in dispatch
- return self._do_dispatch(endpoint, method, ctxt, args)
-   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", 
line 194, in _do_dispatch
- result = func(ctxt, **new_args)
-   File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 79, 
in wrapped
- function_name, call_dict, binary, tb)
-   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, 
in __exit__
- self.force_reraise()
-   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, 
in force_reraise
- six.reraise(self.type_, self.value, self.tb)
-   File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 69, 
in wrapped
- return f(self, context, *args, **kw)
-   File "/usr/lib/python2.7/site-packages/nova/compute/utils.py", line 1372, 
in decorated_function
- return function(self, context, *args, **kwargs)
-   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 219, 
in decorated_function
- kwargs['instance'], e, sys.exc_info())
-   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, 
in __exit__self.force_reraise()
-   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, 
in force_reraise
- six.reraise(self.type_, self.value, self.tb)  File 
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 207, in 
decorated_function
- return function(self, context, *args, **kwargs)
-   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7007, 
in pre_live_migration
- bdm.save()
-   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, 
in __exit__
- self.force_reraise()
-   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, 
in force_reraise
- six.reraise(self.type_, self.value, self.tb)
-   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6972, 
in pre_live_migration
- migrate_data)
-   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 
9190, in pre_live_migration
- instance, network_info, migrate_data)
-   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 
9071, in _pre_live_migration_plug_vifs
- vif_plug_nw_info.append(migrate_vif.get_dest_vif())
-   File "/usr/lib/python2.7/site-packages/nova/objects/migrate_data.py", line 
90, in get_dest_vif
- vif['type'] = self.vif_type
-   File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 
67, in getter
- self.obj_load_attr(name)
-   File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 
603, in obj_load_attr
- _("Cannot load '%s' in the base class") % attrname)
+   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 
165, in _process_incoming
+ res = self.dispatcher.dispatch(message)
+   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", 
line 274, in dispatch
+ return self._do_dispatch(endpoint, method, ctxt, args)
+   

[Yahoo-eng-team] [Bug 1915318] Re: User list cannot be retrieved when pointing user_tree_dn at top level of the root domain

2021-02-12 Thread Billy Olsen
Further discussion with Jeff indicated that replacing the { and } with (
and ) resolved the issue.
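For reference, a hedged sketch of what the corrected user_filter would look
like (LDAP search filters use parentheses, per RFC 4515; the DNs are the ones
from the description below):

  user_filter: '(|(memberOf=CN=OpenStackAdmins,OU=OpenStack,OU=Groups,DC=example,DC=org)(memberOf=CN=OpenStackUsers,OU=OpenStack,OU=Groups,DC=example,DC=org))'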

** Changed in: keystone
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1915318

Title:
  User list cannot be retrieved when pointing user_tree_dn at top level
  of the root domain

Status in OpenStack Identity (keystone):
  Invalid

Bug description:
  Windows AD, functional level Windows Server 2012 R2

  Focal + Ussuri

  keystone-ldap-31

  Using ldap-config-flags of:

  ```
  ldap-config-flags: "{
user_tree_dn: 'DC=example,DC=org',
query_scope: sub,
user_objectclass: person,
user_id_attribute: cn,
user_filter: 
'{|(memberOf=CN=OpenStackAdmins,OU=OpenStack,OU=Groups,DC=example,DC=org)(memberOf=CN=OpenStackUsers,OU=OpenStack,OU=Groups,DC=example,DC=org)}',
user_name_attribute: sAMAccountName,
user_mail_attribute: mail,
user_pass_attribute: '',
user_description_attribute: displayName,
user_enabled_attribute: userAccountControl,
user_enabled_mask: 2,
user_enabled_invert: false,
user_enabled_default: 512,
group_tree_dn: 'OU=OpenStack,OU=Groups,DC=example,DC=org',
group_objectclass: group,
group_id_attribute: cn,
group_name_attribute: sAMAccountName,
group_member_attribute: member,
}"
  ```

  The user list cannot be retrieved, but the group list can.  Horizon
  shows an error of "Unable to retrieve user list"

  Running `openstack user list --domain example.org` shows "Internal
  Server Error (HTTP 500)"

  In this scenario, there are 2 sets of users that the customer wants to
  have access to this openstack environment.

  There are no logs in /var/log/keystone/keystone.log when this error
  occurs.

  The DNs for those 2 different user trees are:

  OU=AdminUsers,DC=example,DC=com   and   OU=Users,DC=example,DC=com

  As can be seen, both OUs are off of the root domain and don't share a
  common tree other than the root.

  When user_tree_dn is changed to `OU=AdminUsers,DC=example,DC=com`,
  users in that tree can log in and show up in the user list, but the
  users from OU=Users,DC=example,DC=com do not, and vice versa.

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1915318/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1912513] Re: Port creation fails with error IP already allocated but the IP is available

2021-01-21 Thread Billy Olsen
Since this doesn't appear to be an issue with the charms, I'm going to
remove the project from being affected by this bug and the field
critical designation. However, feel free to re-add it should evidence
present itself otherwise.

** No longer affects: charm-neutron-api

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1912513

Title:
  Port creation fails with error IP already allocated but the IP is
  available

Status in neutron:
  Incomplete

Bug description:
  Description:
  =
  When trying to create a new port using an available IP in the allocation
  pool of a VLAN neutron network, creation fails with the error:
  IP address 10.41.8.3 already allocated in subnet
  afb678c6-a152-4f1d-8d77-03b9167520cc
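  A hedged sketch of the failing operation (the subnet ID is the one from the
  error above, the network ID is the one shown below, and the port name is
  hypothetical):

    openstack port create --network e30b938b-210d-45c2-894c-95c0c5d08f79 \
        --fixed-ip subnet=afb678c6-a152-4f1d-8d77-03b9167520cc,ip-address=10.41.8.3 \
        test-port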

  
  Precondition:
  =
  A port using the same IP was previously created and then deleted.

  How to reproduce:
  =
  I have the following network:

  $ openstack network show e30b938b-210d-45c2-894c-95c0c5d08f79
  +----------------------------+------------------------------------------+
  | Field                      | Value                                    |
  +----------------------------+------------------------------------------+
  | admin_state_up             | UP                                       |
  | availability_zone_hints    |                                          |
  | availability_zones         |                                          |
  | created_at                 | 2020-11-25T10:55:32Z                     |
  | description                |                                          |
  | dns_domain                 |                                          |
  | id                         | e30b938b-210d-45c2-894c-95c0c5d08f79     |
  | ipv4_address_scope         | None                                     |
  | ipv6_address_scope         | None                                     |
  | is_default                 | False                                    |
  | is_vlan_transparent        | None                                     |
  | location                   | cloud='', project.domain_id=, project.domain_name=, project.id='606e529ab1bc4b18a6d5dbf8735b9815', project.name=, region_name='us-test', zone= |
  | mtu                        | 1500                                     |
  | name                       | test                                     |
  | port_security_enabled      | True                                     |
  | project_id                 | 606e529ab1bc4b18a6d5dbf8735b9815         |
  | provider:network_type      | vlan                                     |
  | provider:physical_network  | physnet1                                 |
  | provider:segmentation_id   | 2220                                     |

[Yahoo-eng-team] [Bug 1820612] Re: Logging is hard to read if there is a problem with resources during live migration

2019-03-18 Thread Billy Olsen
** Also affects: nova
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1820612

Title:
  Logging is hard to read if there is a problem with resources during
  live migration

Status in OpenStack nova-cloud-controller charm:
  Incomplete
Status in OpenStack Compute (nova):
  New

Bug description:
  Issuing the command to migrate an instance from openstack-6 to another
  host (openstack-17), which does have enough resources:


  # logging in nova-compute.log
  2019-03-18 12:56:49.111 301805 DEBUG nova.scheduler.client.report 
[req-786024d5-2ba8-450c-9809-bbafaf7c15bd 7a5e20f2d1fc4af18f959a4666c2265c 
b07f32d8f1f84ba7bbe821ee7fa4f09a - f750199c451f432f9d615a147744f4f5 
f750199c451f432f9d615a147744f4f5] Doubling-up allocation request for move 
operation. _move_operation_alloc_request 
/usr/lib/python2.7/dist-packages/nova/scheduler/client/report.py:162
  2019-03-18 12:56:49.112 301805 DEBUG nova.scheduler.client.report 
[req-786024d5-2ba8-450c-9809-bbafaf7c15bd 7a5e20f2d1fc4af18f959a4666c2265c 
b07f32d8f1f84ba7bbe821ee7fa4f09a - f750199c451f432f9d615a147744f4f5 
f750199c451f432f9d615a147744f4f5] New allocation request containing both source 
and destination hosts in move operation: {'allocations': [{'resource_provider': 
{'uuid': u'4ce95dcf-4c42-47cf-bd1e-48a0f4a5ecec'}, 'resources': {u'VCPU': 4, 
u'MEMORY_MB': 2048, u'DISK_GB': 20}}, {'resource_provider': {'uuid': 
u'57990d7c-7b10-40ee-916f-324bf7784eed'}, 'resources': {u'VCPU': 4, 
u'MEMORY_MB': 2048, u'DISK_GB': 20}}]} _move_operation_alloc_request 
/usr/lib/python2.7/dist-packages/nova/scheduler/client/report.py:202
  2019-03-18 12:56:49.146 301805 WARNING nova.scheduler.client.report
  [req-786024d5-2ba8-450c-9809-bbafaf7c15bd 7a5e20f2d1fc4af18f959a4666c2265c
  b07f32d8f1f84ba7bbe821ee7fa4f09a - f750199c451f432f9d615a147744f4f5
  f750199c451f432f9d615a147744f4f5] Unable to submit allocation for instance
  7e00-7913-4de9-8f45-ce13fcb8a104 (409 Conflict:
  There was a conflict when trying to complete your request.
  Unable to allocate inventory: Unable to create allocation for 'MEMORY_MB' on
  resource provider '4ce95dcf-4c42-47cf-bd1e-48a0f4a5ecec'. The requested amount
  would exceed the capacity.)
  2019-03-18 12:56:49.147 301805 WARNING nova.scheduler.utils 
[req-786024d5-2ba8-450c-9809-bbafaf7c15bd 7a5e20f2d1fc4af18f959a4666c2265c 
b07f32d8f1f84ba7bbe821ee7fa4f09a - f750199c451f432f9d615a147744f4f5 
f750199c451f432f9d615a147744f4f5] Failed to compute_task_migrate_server: No 
valid host was found. Unable to move instance 
7e00-7913-4de9-8f45-ce13fcb8a104 to host openstack-17. There is not enough 
capacity on the host for the instance.: NoValidHost: No valid host was found. 
Unable to move instance 7e00-7913-4de9-8f45-ce13fcb8a104 to host 
openstack-17. There is not enough capacity on the host for the instance.
  2019-03-18 12:56:49.148 301805 WARNING nova.scheduler.utils 
[req-786024d5-2ba8-450c-9809-bbafaf7c15bd 7a5e20f2d1fc4af18f959a4666c2265c 
b07f32d8f1f84ba7bbe821ee7fa4f09a - f750199c451f432f9d615a147744f4f5 
f750199c451f432f9d615a147744f4f5] [instance: 
7e00-7913-4de9-8f45-ce13fcb8a104] Setting instance to ACTIVE state.: 
NoValidHost: No valid host was found. Unable to move instance 
7e00-7913-4de9-8f45-ce13fcb8a104 to host openstack-17. There is not enough 
capacity on the host for the instance.

  
  To find out which host resource provider '4ce95dcf-4c42-47cf-bd1e-48a0f4a5ecec'
  is, the nova_api database was used:


  select * from resource_providers where uuid='4ce95dcf-4c42-47cf-bd1e-48a0f4a5ecec';
  +---------------------+---------------------+----+--------------------------------------+------------------+------------+----------+
  | created_at          | updated_at          | id | uuid                                 | name             | generation | can_host |
  +---------------------+---------------------+----+--------------------------------------+------------------+------------+----------+
  | 2018-05-09 11:00:01 | 2019-03-14 10:47:55 | 39 | 4ce95dcf-4c42-47cf-bd1e-48a0f4a5ecec | openstack-6.maas |        171 | NULL     |
  +---------------------+---------------------+----+--------------------------------------+------------------+------------+----------+
  1 row in set (0.00 sec)


  
  So that is openstack-6, not openstack-17 as mentioned in the logging above.
  From the logging provided this is not clear; there also does not seem to be
  a command to retrieve the resource provider based on the uuid, and the uuid
  is the only thing logged.

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-nova-cloud-controller/+bug/1820612/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : 

[Yahoo-eng-team] [Bug 1713499] Re: Cannot delete a neutron network, if the currently configured MTU is lower than the network's MTU

2018-09-24 Thread Billy Olsen
** Also affects: neutron (Ubuntu)
   Importance: Undecided
   Status: New

** Also affects: cloud-archive
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1713499

Title:
  Cannot delete a neutron network, if the currently configured MTU is
  lower than the network's MTU

Status in Ubuntu Cloud Archive:
  New
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  New

Bug description:
  Currently, the neutron API returns an error [1] when trying to delete
  a neutron network which has a higher MTU than the configured
  MTU[2][3].

  This issue has been noticed in Pike.

  [1] Error: http://paste.openstack.org/show/619627/
  [2] neutron.conf: http://paste.openstack.org/show/619629/
  [3] ml2_conf.ini: http://paste.openstack.org/show/619630/
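
  A hedged sketch of the scenario (the network name is hypothetical; the
  configured MTU values are the ones referenced in [2] and [3]):

    # network created while the configured MTU still allowed a large value
    openstack network create big-mtu-net
    # after lowering the configured MTU in neutron.conf / ml2_conf.ini and
    # restarting neutron-server, deleting the network fails as in [1]
    openstack network delete big-mtu-net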

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1713499/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1582585] Re: the speed of query user from ldap server is very slow

2018-01-09 Thread Billy Olsen
** Also affects: keystone (Ubuntu)
   Importance: Undecided
   Status: New

** Also affects: cloud-archive
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1582585

Title:
  the speed of query user from ldap server is very slow

Status in Ubuntu Cloud Archive:
  New
Status in OpenStack Identity (keystone):
  Fix Released
Status in keystone package in Ubuntu:
  New

Bug description:
  In our project, querying users from the LDAP server is very slow. We have
  12,000 LDAP users, and the query takes almost 45 seconds.

  The reason is that keystone generates a uuid for the LDAP users one by one
  and inserts them into the DB. On subsequent queries it also goes to the DB
  instead of using a cache.
  So a cache should be added to improve the query speed.

  After adding @MEMOIZE to the following function
  https://github.com/openstack/keystone/blob/master/keystone/identity/core.py#L580,
  the first query still takes almost 50 seconds, but subsequent queries only
  take about 7 seconds.

  So it is very necessary to improve this.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1582585/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1613900] Re: Unable to use 'Any' availability zone when spawning instance

2018-01-03 Thread Billy Olsen
** Also affects: horizon (Ubuntu)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1613900

Title:
  Unable to use 'Any' availability zone when spawning instance

Status in Ubuntu Cloud Archive:
  In Progress
Status in OpenStack Dashboard (Horizon):
  Fix Released
Status in horizon package in Ubuntu:
  In Progress

Bug description:
  While using Mitaka, we found that by default, with the JS backend, it is
  not possible to choose the 'any' availability zone. The issue is not fixed
  in the master branch.

  For the Python implementation the logic is:
  https://github.com/openstack/horizon/blob/master/openstack_dashboard/dashboards/project/instances/workflows/create_instance.py#L390

  The JS implementation misses this logic if the number of AZs is > 1:
  https://github.com/openstack/horizon/blob/master/openstack_dashboard/dashboards/project/static/dashboard/project/workflow/launch-instance/launch-instance-model.service.js#L321

  Also, the JS implementation looks ugly if you have a lot of subnets per
  network...

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1613900/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1613900] Re: Unable to use 'Any' availability zone when spawning instance

2018-01-03 Thread Billy Olsen
** Also affects: cloud-archive
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1613900

Title:
  Unable to use 'Any' availability zone when spawning instance

Status in Ubuntu Cloud Archive:
  In Progress
Status in OpenStack Dashboard (Horizon):
  Fix Released
Status in horizon package in Ubuntu:
  In Progress

Bug description:
  While using Mitaka, we found that by default, with the JS backend, it is
  not possible to choose the 'any' availability zone. The issue is not fixed
  in the master branch.

  For the Python implementation the logic is:
  https://github.com/openstack/horizon/blob/master/openstack_dashboard/dashboards/project/instances/workflows/create_instance.py#L390

  The JS implementation misses this logic if the number of AZs is > 1:
  https://github.com/openstack/horizon/blob/master/openstack_dashboard/dashboards/project/static/dashboard/project/workflow/launch-instance/launch-instance-model.service.js#L321

  Also, the JS implementation looks ugly if you have a lot of subnets per
  network...

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1613900/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1718287] [NEW] systemd mount targets fail due to device busy or already mounted

2017-09-19 Thread Billy Olsen
Public bug reported:

[Issue]

After rebooting a 16.04 AWS instance (ami-1d4e7a66) with several
external disks attached, formatted, and added to /etc/fstab - systemd
mount targets fail to mount with:

● media-v.mount - /media/v
   Loaded: loaded (/etc/fstab; bad; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2017-09-19 20:12:18 UTC; 1min 
54s ago
Where: /media/v
 What: /dev/xvdv
 Docs: man:fstab(5)
   man:systemd-fstab-generator(8)
  Process: 1196 ExecMount=/bin/mount /dev/xvdv /media/v -t ext4 -o defaults 
(code=exited, status=32)

Sep 19 20:12:17 ip-172-31-7-167 systemd[1]: Mounting /media/v...
Sep 19 20:12:17 ip-172-31-7-167 mount[1196]: mount: /dev/xvdv is already 
mounted or /media/v busy
Sep 19 20:12:18 ip-172-31-7-167 systemd[1]: media-v.mount: Mount process 
exited, code=exited status=32
Sep 19 20:12:18 ip-172-31-7-167 systemd[1]: Failed to mount /media/v.
Sep 19 20:12:18 ip-172-31-7-167 systemd[1]: media-v.mount: Unit entered failed 
state.


From the cloud-init logs, it appears that the OVF datasource is mounting
the device to find data:

2017-09-19 20:12:17,502 - util.py[DEBUG]: Peeking at /dev/xvdv (max_bytes=512)
2017-09-19 20:12:17,502 - util.py[DEBUG]: Reading from /proc/mounts 
(quiet=False)
2017-09-19 20:12:17,502 - util.py[DEBUG]: Read 2570 bytes from /proc/mounts
...
2017-09-19 20:12:17,506 - util.py[DEBUG]: Running command ['mount', '-o', 
'ro,sync', '-t', 'iso9660', '/dev/xvdv', '/tmp/tmpw2tyqqid'] with allowed 
return codes [0] (shell=False, capture=True)
2017-09-19 20:12:17,545 - util.py[DEBUG]: Failed mount of '/dev/xvdv' as 
'iso9660': Unexpected error while running command.
Command: ['mount', '-o', 'ro,sync', '-t', 'iso9660', '/dev/xvdv', 
'/tmp/tmpw2tyqqid']
Exit code: 32
Reason: -
Stdout: -
Stderr: mount: wrong fs type, bad option, bad superblock on /dev/xvdv,
   missing codepage or helper program, or other error

   In some cases useful info is found in syslog - try
   dmesg | tail or so.
2017-09-19 20:12:17,545 - util.py[DEBUG]: Recursively deleting /tmp/tmpw2tyqqid
2017-09-19 20:12:17,545 - DataSourceOVF.py[DEBUG]: /dev/xvdv not mountable as 
iso9660


[Vitals]

Version: 0.7.9-153-g16a7302f-0ubuntu1~16.04.2
OS: Ubuntu 16.04
Provider: AWS - ami-1d4e7a66

[Recreate]

To recreate this

1. Launch an AWS instance using AMI ami-1d4e7a66 and attach several
disks (I used 25 additional disks)

2. Format and mount all 25:
   mkdir /media/{b..z}
   for i in {b..z}; do
       mkfs -t ext4 /dev/xvd$i
       mount /dev/xvd$i /media/$i
       echo "/dev/xvd$i /media/$i ext4 defaults,nofail 0 2" >> /etc/fstab
   done

3. reboot instance

Since this is a race, multiple attempts may be necessary. A reproducer
script is attached.
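
Not part of the original report, but as a hedged illustration of one possible
mitigation (assuming the OVF datasource probing shown above is what races with
the systemd mount units), the datasource list can be pinned so cloud-init only
considers EC2 on these instances:

   # /etc/cloud/cloud.cfg.d/99-datasource.cfg (file name is arbitrary)
   datasource_list: [ Ec2, None ]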

** Affects: cloud-init
 Importance: Undecided
 Status: New


** Tags: sts

** Attachment added: "cloud-init.tar"
   
https://bugs.launchpad.net/bugs/1718287/+attachment/4953081/+files/cloud-init.tar

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1718287

Title:
  systemd mount targets fail due to device busy or already mounted

Status in cloud-init:
  New

Bug description:
  [Issue]

  After rebooting a 16.04 AWS instance (ami-1d4e7a66) with several
  external disks attached, formatted, and added to /etc/fstab - systemd
  mount targets fail to mount with:

  ● media-v.mount - /media/v
 Loaded: loaded (/etc/fstab; bad; vendor preset: enabled)
 Active: failed (Result: exit-code) since Tue 2017-09-19 20:12:18 UTC; 1min 
54s ago
  Where: /media/v
   What: /dev/xvdv
   Docs: man:fstab(5)
 man:systemd-fstab-generator(8)
Process: 1196 ExecMount=/bin/mount /dev/xvdv /media/v -t ext4 -o defaults 
(code=exited, status=32)

  Sep 19 20:12:17 ip-172-31-7-167 systemd[1]: Mounting /media/v...
  Sep 19 20:12:17 ip-172-31-7-167 mount[1196]: mount: /dev/xvdv is already 
mounted or /media/v busy
  Sep 19 20:12:18 ip-172-31-7-167 systemd[1]: media-v.mount: Mount process 
exited, code=exited status=32
  Sep 19 20:12:18 ip-172-31-7-167 systemd[1]: Failed to mount /media/v.
  Sep 19 20:12:18 ip-172-31-7-167 systemd[1]: media-v.mount: Unit entered 
failed state.

  
  From the cloud-init logs, it appears that the OVF datasource is mounting
  the device to find data:

  2017-09-19 20:12:17,502 - util.py[DEBUG]: Peeking at /dev/xvdv (max_bytes=512)
  2017-09-19 20:12:17,502 - util.py[DEBUG]: Reading from /proc/mounts 
(quiet=False)
  2017-09-19 20:12:17,502 - util.py[DEBUG]: Read 2570 bytes from /proc/mounts
  ...
  2017-09-19 20:12:17,506 - util.py[DEBUG]: Running command ['mount', '-o', 
'ro,sync', '-t', 'iso9660', '/dev/xvdv', '/tmp/tmpw2tyqqid'] with allowed 
return codes [0] (shell=False, capture=True)
  2017-09-19 20:12:17,545 - util.py[DEBUG]: Failed mount of '/dev/xvdv' as 
'iso9660': Unexpected error while running command.
  Command: 

[Yahoo-eng-team] [Bug 1668410] Re: [SRU] Infinite loop trying to delete deleted HA router

2017-08-30 Thread Billy Olsen
** Also affects: cloud-archive
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1668410

Title:
  [SRU] Infinite loop trying to delete deleted HA router

Status in Ubuntu Cloud Archive:
  New
Status in neutron:
  In Progress
Status in OpenStack Security Advisory:
  Won't Fix
Status in neutron package in Ubuntu:
  Triaged

Bug description:
  [Impact]

  When deleting a router the logfile is filled up. See full log -
  http://paste.ubuntu.com/25429257/

  I can see the error 'Error while deleting router
  c0dab368-5ac8-4996-88c9-f5d345a774a6' occurred 3343386 times from
  _safe_router_removed() [1]:

  $ grep -r 'Error while deleting router c0dab368-5ac8-4996-88c9-f5d345a774a6' 
|wc -l
  3343386

  This _safe_router_removed() is invoked at L488 [2]; if
  _safe_router_removed() goes wrong it returns False, and then
  self._resync_router(update) [3] causes _safe_router_removed() to be
  run again and again. That is why we see so many 'Error while deleting
  router X' errors.

  [1] 
https://github.com/openstack/neutron/blob/mitaka-eol/neutron/agent/l3/agent.py#L361
  [2] 
https://github.com/openstack/neutron/blob/mitaka-eol/neutron/agent/l3/agent.py#L488
  [3] 
https://github.com/openstack/neutron/blob/mitaka-eol/neutron/agent/l3/agent.py#L457

  [Test Case]

  This is caused by a race condition between the neutron server and the
  L3 agent: after the neutron server deletes the HA interfaces, the L3
  agent may sync an HA router without HA interface info (we just need to
  trigger L708 [1] after deleting the HA interfaces and before deleting
  the HA router). If we delete the HA router at this point, the problem
  happens. So the test case we designed is as below:

  1, First update the fixed package, and restart neutron-server with
  'sudo service neutron-server restart'

  2, Create ha_router

  neutron router-create harouter --ha=True

  3, Delete ports associated with ha_router before deleting ha_router

  neutron router-port-list harouter |grep 'HA port' |awk '{print $2}' |xargs -l 
neutron port-delete
  neutron router-port-list harouter

  4, Update ha_router to trigger l3-agent to update ha_router info
  without ha_port into self.router_info

  neutron router-update harouter --description=test

  5, Delete ha_router this time

  neutron router-delete harouter

  [1] https://github.com/openstack/neutron/blob/mitaka-
  eol/neutron/db/l3_hamode_db.py#L708

  [Regression Potential]

  With the fix [1], neutron-server will no longer return an ha_router
  that is missing ha_ports, so L488 will no longer have a chance to call
  _safe_router_removed() for such a router; the problem is fundamentally
  fixed by this patch and there is no regression potential.

  Besides, this fix is already in the mitaka-eol branch, and the mitaka
  neutron-server package is based on neutron 8.4.0, so we need to
  backport it to xenial and mitaka.

  $ git tag --contains 8c77ee6b20dd38cc0246e854711cb91cffe3a069
  mitaka-eol

  [1] https://review.openstack.org/#/c/440799/2/neutron/db/l3_hamode_db.py
  [2] 
https://github.com/openstack/neutron/blob/mitaka-eol/neutron/agent/l3/agent.py#L488

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1668410/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1685881] Re: l3-agent-router-add doesn't error/warn about router already existing on agent

2017-04-24 Thread Billy Olsen
Adding neutron as it doesn't appear that this is charm related. The
command that should error/warn is the neutron CLI itself.

** Also affects: neutron
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1685881

Title:
  l3-agent-router-add doesn't error/warn about router already existing
  on agent

Status in OpenStack neutron-api charm:
  New
Status in neutron:
  New

Bug description:
  We had an incident on a network that ended up with random packet
  dropping between nodes within the cloud, and outside of the cloud when
  crossing l3-routers.

  Steps to reproduce:
  juju set neutron-api min-agents-per-router=2
  juju set neutron-api max-agents-per-router=2
  juju set neutron-api l2-population=false
  juju set neutron-api enable-l3ha=true
  for i in $(neutron router-list -f value -c id); do
  neutron router-update $i --admin-state-up=false
  neutron router-update $i --ha=true
  neutron router-update $i --admin-state-up=true
  done
  juju set neutron-api max-agents-per-router=3
  neutron
  for i in $(neutron router-list -f value -c id); do
neutron l3-agent-list-hosting-router $i
for j in $(neutron agent-list -f value -c id); do
  neutron l3-agent-router-add $j $i
done
  done
  sleep 120 #for settle
  for i in $(neutron router-list -f value -c id); do
neutron l3-agent-list-hosting-router $i
  done

  Potentially you may see two active l3-agents for a given router.  (We
  saw this corresponded to rabbitmq messaging failures concurrent with
  this activity).  Our environment had 9 active routers.

  You'll notice that there's no error that comes out of adding a router
  to an agent it's already running on.
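
  For illustration (the IDs are placeholders), the second call below succeeds
  silently today, where a warning or error would be expected:

    neutron l3-agent-router-add <l3-agent-id> <router-id>   # schedules the router
    neutron l3-agent-router-add <l3-agent-id> <router-id>   # already hosted: no error or warning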

  After making these updates, we found that ssh and RDP sessions to the
  floating IPs associated with VMs across several different
  networks/routers were exhibiting random session drops as if the route
  were hosted in multiple locations and we were getting an asymmetric
  route issue.

  We had to revert to --ha=false and enable-l3ha=false before we could
  gather deeper info/SOS reports.  May be able to reproduce in lab at
  some point in the future.

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-neutron-api/+bug/1685881/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1629097] Re: neutron-rootwrap processes not getting cleaned up

2017-03-10 Thread Billy Olsen
I saw this last night, and can indeed confirm it's related to DNS issues.
It was suggested by a colleague that it may be related to the sudo call
returning an error indicating that the hostname could not be found,
though I spent no time exploring this option today. Restarting the
openvswitch-switch service closes all the existing processes, but it
seems more likely that the service can't stop cleanly. Also of note is
that the service stop took a very long time, whereas with working DNS it
took a few seconds at worst.

** Changed in: neutron
   Status: Expired => Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1629097

Title:
  neutron-rootwrap processes not getting cleaned up

Status in neutron:
  Confirmed

Bug description:
  neutron-rootwrap processes aren't getting cleaned up on Newton.  I'm
  testing with Newton rc3.

  I was noticing memory exhaustion on my neutron gateway units, which turned 
out to be due to compounding neutron-rootwrap processes:
  sudo /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf ovsdb-client 
monitor Interface name,ofport,external_ids --format=json

  $ top -n1 -b -o VIRT
  http://paste.ubuntu.com/23252407/

  $ ps aux|grep ovsdb-client
  http://paste.ubuntu.com/23252658/

  Restarting openvswitch cleans up the processes but they just start
  piling up again soon after:
  sudo systemctl restart openvswitch-switch

  At first I thought this was an openvswitch issue, however I reverted
  the code in get_root_helper_child_pid() and neutron-rootwrap processes
  started getting cleaned up. See corresponding commit for code that
  possibly introduced this at [1].

  This can be recreated with the openstack charms using xenial-newton-
  staging.  On newton deploys, neutron-gateway and nova-compute units
  will exhaust memory due to compounding ovsdb-client processes.

  [1]
  commit fd93e19f2a415b3803700fc491749daba01a4390
  Author: Assaf Muller 
  Date:   Fri Mar 18 16:29:26 2016 -0400

  Change get_root_helper_child_pid to stop when it finds cmd

  get_root_helper_child_pid recursively finds the child of pid,
  until it can no longer find a child. However, the intention is
  not to find the deepest child, but to strip away root helpers.
  For example 'sudo neutron-rootwrap x' is supposed to find the
  pid of x. However, in cases 'x' spawned quick lived children of
  its own (For example: ip / brctl / ovs invocations),
  get_root_helper_child_pid returned those pids if called in
  the wrong time.

  Change-Id: I582aa5c931c8bfe57f49df6899445698270bb33e
  Closes-Bug: #1558819

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1629097/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1623327] Re: openstack orchestration service list fails to return endpoint

2016-12-12 Thread Billy Olsen
Based on Brad's comment in #9, there were actions missing for the
openstack orchestration service. I believe this is no longer a valid
bug, therefore I'm marking the remaining tasks as invalid.

** Changed in: python-openstackclient
   Status: New => Invalid

** Changed in: keystone
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1623327

Title:
  openstack orchestration service list fails to return endpoint

Status in OpenStack Identity (keystone):
  Invalid
Status in python-heatclient:
  Invalid
Status in python-openstackclient:
  Invalid

Bug description:
  OpenStack service endpoints are created for the heat service, but the
  openstack client cannot find the endpoints to issue the query against.
  I suspect this is because the domain auth token included in the
  initial authentication doesn't include any endpoints containing
  $(tenant_id)s.

  I'm not sure whether this should be a bug against the openstack client
  or against keystone. I believe it's intentional to exclude the
  endpoints with a tenant_id substitution in the endpoint, but it
  doesn't make sense to me, as the openstack catalog list command seems
  to use this catalog query to list endpoints and services, and it only
  gets the service but not the endpoints.

  Here's some output collected:

  > openstack catalog list
  +----------+----------------+-----------------------------------------+
  | Name     | Type           | Endpoints                               |
  +----------+----------------+-----------------------------------------+
  | heat     | orchestration  |                                         |
  | heat-cfn | cloudformation | RegionOne                               |
  |          |                |   public: http://10.5.20.176:8000/v1   |
  |          |                | RegionOne                               |
  |          |                |   admin: http://10.5.20.176:8000/v1    |
  |          |                | RegionOne                               |
  |          |                |   internal: http://10.5.20.176:8000/v1 |
  |          |                |                                         |

  ...

  > openstack endpoint list | grep heat
  | 85ee6b6e8f814856a3a547982f6b2835 | RegionOne | heat | orchestration | True | internal | http://10.5.20.176:8004/v1/$(tenant_id)s |
  | 895cb2e4e5d1492e9e40c205f6b0c508 | RegionOne | heat | orchestration | True | public   | http://10.5.20.176:8004/v1/$(tenant_id)s |
  | ad63a139c90749ff9d98a704200d2e49 | RegionOne | heat | orchestration | True | admin    | http://10.5.20.176:8004/v1/$(tenant_id)s |


  > openstack orchestration service list
  public endpoint for orchestration service not found

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1623327/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1623327] [NEW] openstack orchestration service list fails to return endpoint

2016-09-13 Thread Billy Olsen
Public bug reported:

OpenStack service endpoints are created for the heat service, but the
openstack client cannot find the endpoints to issue the query against. I
suspect this is because the domain auth token included in the initial
authentication doesn't include any endpoints containing $(tenant_id)s.

I'm not sure whether this should be a bug against the openstack client
or against keystone. I believe it's intentional to exclude the endpoints
with a tenant_id substitution in the endpoint, but it doesn't make sense
to me, as the openstack catalog list command seems to use this catalog
query to list endpoints and services, and it only gets the service but
not the endpoints.

Here's some output collected:

> openstack catalog list
+----------+----------------+-----------------------------------------+
| Name     | Type           | Endpoints                               |
+----------+----------------+-----------------------------------------+
| heat     | orchestration  |                                         |
| heat-cfn | cloudformation | RegionOne                               |
|          |                |   public: http://10.5.20.176:8000/v1   |
|          |                | RegionOne                               |
|          |                |   admin: http://10.5.20.176:8000/v1    |
|          |                | RegionOne                               |
|          |                |   internal: http://10.5.20.176:8000/v1 |
|          |                |                                         |

...

> openstack endpoint list | grep heat
| 85ee6b6e8f814856a3a547982f6b2835 | RegionOne | heat | orchestration | True | internal | http://10.5.20.176:8004/v1/$(tenant_id)s |
| 895cb2e4e5d1492e9e40c205f6b0c508 | RegionOne | heat | orchestration | True | public   | http://10.5.20.176:8004/v1/$(tenant_id)s |
| ad63a139c90749ff9d98a704200d2e49 | RegionOne | heat | orchestration | True | admin    | http://10.5.20.176:8004/v1/$(tenant_id)s |


> openstack orchestration service list
public endpoint for orchestration service not found

** Affects: keystone
 Importance: Undecided
 Status: New

** Affects: python-openstackclient
 Importance: Undecided
 Status: New


** Tags: canonical-bootstack

** Also affects: python-openstackclient
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1623327

Title:
  openstack orchestration service list fails to return endpoint

Status in OpenStack Identity (keystone):
  New
Status in python-openstackclient:
  New

Bug description:
  OpenStack service endpoints are created for the heat service, but the
  openstack client cannot find the endpoints to issue the query against.
  I suspect this is because the domain auth tokens used for the initial
  authentication don't include any endpoints with $(tenant_id)s in their
  URLs.

  I'm not sure whether this should be a bug against the openstack client
  or against keystone. I believe it's intentional to exclude endpoints
  that contain a tenant_id substitution in their URL, but it doesn't
  make sense to me, since the openstack catalog list command appears to
  use this same catalog query to list endpoints and services, and so it
  only gets the service but not its endpoints.

  Here's some output collected:

  > openstack catalog list
  +--+-++
  | Name | Type| Endpoints  |
  +--+-++
  | heat | orchestration   ||
  | heat-cfn | cloudformation  | RegionOne  |
  |  | |   public: http://10.5.20.176:8000/v1   |
  |  | | RegionOne  |
  |  | |   admin: http://10.5.20.176:8000/v1|
  |  | | RegionOne  |
  |  | |   internal: http://10.5.20.176:8000/v1 |
  |  | ||

  ...

  > openstack endpoint list | grep heat
  | 85ee6b6e8f814856a3a547982f6b2835 | RegionOne | heat | orchestration | True | internal | http://10.5.20.176:8004/v1/$(tenant_id)s |
  | 895cb2e4e5d1492e9e40c205f6b0c508 | RegionOne | heat | orchestration | True | public   | http://10.5.20.176:8004/v1/$(tenant_id)s |
  | ad63a139c90749ff9d98a704200d2e49 | RegionOne | heat | orchestration | True | admin 

[Yahoo-eng-team] [Bug 1453264] Re: iptables_manager can run very slowly when a large number of security group rules are present

2016-08-29 Thread Billy Olsen
Uploading debdiff based on what is currently available in trusty-
proposed since that has been verified and pending release.

** Description changed:

+ [Impact]
+ 
  We have customers that typically add a few hundred security group rules
  or more.  We also typically run 30+ VMs per compute node.  When about
  10+ VMs with a large SG set all get scheduled to the same node, the L2
  agent (OVS) can spend many minutes in the iptables_manager.apply() code,
  so much so that by the time all the rules are updated, the VM has
  already tried DHCP and failed, leaving it in an unusable state.
  
  While there have been some patches that tried to address this in Juno
  and Kilo, they've either not helped as much as necessary, or broken SGs
  completely due to re-ordering of the iptables rules.
  
  I've been able to show some pretty bad scaling with just a handful of
  VMs running in devstack based on today's code (May 8th, 2015) from
  upstream Openstack.
+ 
+ 
+ [Test Case]
  
  Here's what I tested:
  
  1. I created a security group with 1000 TCP port rules (you could
  alternately have a smaller number of rules and more VMs, but it's
  quicker this way)
  
  2. I booted VMs, specifying both the default and "large" SGs, and timed
  from the second it took Neutron to "learn" about the port until it
  completed its work
  
  3. I got a :( pretty quickly
  
  And here's some data:
  
  1-3 VM - didn't time, less than 20 seconds
  4th VM - 0:36
  5th VM - 0:53
  6th VM - 1:11
  7th VM - 1:25
  8th VM - 1:48
  9th VM - 2:14
  
  While it's busy adding the rules, the OVS agent is consuming pretty
  close to 100% of a CPU for most of this time (from top):
  
-   PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND   
  
+   PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND
  25767 stack 20   0  157936  76572   4416 R  89.2  0.5  50:14.28 python
  
  And this is with only ~10K rules at this point!  When we start crossing
  the 20K point VM boot failures start to happen.
  
  I'm filing this bug since we need to take a closer look at this in
  Liberty and fix it; it's been this way since Havana and needs some TLC.
  
  I've attached a simple script I've used to recreate this, and will start
  taking a look at options here.
+ 
+ 
+ [Regression Potential]
+ 
+ Minimal since this has been running in upstream stable for several
+ releases now (Kilo, Liberty, Mitaka).
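
The script mentioned above is not reproduced here, but as a rough modern
equivalent of step 1 of the test case, the following openstacksdk sketch
creates a security group with 1000 single-port TCP rules (the cloud name,
group name and port range are illustrative only):

  import openstack

  conn = openstack.connect(cloud='devstack')   # assumes a clouds.yaml entry
  sg = conn.network.create_security_group(name='large-sg')

  # One ingress rule per TCP port, 1000 rules in total.
  for port in range(1000, 2000):
      conn.network.create_security_group_rule(
          security_group_id=sg.id,
          direction='ingress',
          ethertype='IPv4',
          protocol='tcp',
          port_range_min=port,
          port_range_max=port)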

** Also affects: neutron (Ubuntu)
   Importance: Undecided
   Status: New

** Patch added: "trusty patch based on -proposed"
   
https://bugs.launchpad.net/ubuntu/+source/neutron/+bug/1453264/+attachment/4730270/+files/lp1453264.debdiff

** Also affects: cloud-archive
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1453264

Title:
  iptables_manager can run very slowly when a large number of security
  group rules are present

Status in Ubuntu Cloud Archive:
  New
Status in neutron:
  Fix Released
Status in neutron kilo series:
  Fix Released
Status in neutron package in Ubuntu:
  New

Bug description:
  [Impact]

  We have customers that typically add a few hundred security group
  rules or more.  We also typically run 30+ VMs per compute node.  When
  about 10+ VMs with a large SG set all get scheduled to the same node,
  the L2 agent (OVS) can spend many minutes in the
  iptables_manager.apply() code, so much so that by the time all the
  rules are updated, the VM has already tried DHCP and failed, leaving
  it in an unusable state.

  While there have been some patches that tried to address this in Juno
  and Kilo, they've either not helped as much as necessary, or broken
  SGs completely due to re-ordering of the iptables rules.

  I've been able to show some pretty bad scaling with just a handful of
  VMs running in devstack based on today's code (May 8th, 2015) from
  upstream Openstack.

  
  [Test Case]

  Here's what I tested:

  1. I created a security group with 1000 TCP port rules (you could
  alternately have a smaller number of rules and more VMs, but it's
  quicker this way)

  2. I booted VMs, specifying both the default and "large" SGs, and
  timed from the second it took Neutron to "learn" about the port until
  it completed its work

  3. I got a :( pretty quickly

  And here's some data:

  1-3 VM - didn't time, less than 20 seconds
  4th VM - 0:36
  5th VM - 0:53
  6th VM - 1:11
  7th VM - 1:25
  8th VM - 1:48
  9th VM - 2:14

  While it's busy adding the rules, the OVS agent is consuming pretty
  close to 100% of a CPU for most of this time (from top):

    PID USER  PR  NI  VIRT  RES  SHR S  %CPU %MEM  TIME+ COMMAND
  25767 stack 20   0  157936  76572   4416 R  89.2  0.5  50:14.28 python

  And this is with only ~10K rules at this point!  When we start
  crossing the 20K point VM boot failures start to 

[Yahoo-eng-team] [Bug 1607039] [NEW] KVS _update_user_token_list can be more efficient

2016-07-27 Thread Billy Olsen
Public bug reported:

Maintaining the user token list and the revocation list in the memcached
persistence backend (kvs) is inefficient for larger numbers of tokens
due to the use of a linear algorithm for token list maintenance.

Since the list is unordered, each token within the list must be checked
first to see whether it has expired, and second to determine whether it
has been revoked. By changing to an ordered list and using a binary
search, expired tokens can be found with less computational overhead.

The current algorithm means that the insertion of a new token into the
list is O(n) since token expiration validity is done when the list is
updated. By using an ordered list, the insertion and validation of the
expiration can be reduced to O(log n).
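
For illustration only (this is not the keystone KVS code itself), a
sketch of the proposed approach using an expiry-ordered list and the
bisect module:

  import bisect
  import time

  def add_token(token_list, expires_at, token_id):
      """token_list is kept sorted as (expires_at, token_id) tuples."""
      now = time.time()
      # Expired entries all sit at the front of the sorted list; a single
      # binary search finds the cut-off and one slice drops them.
      cut = bisect.bisect_right(token_list, (now, ''))
      del token_list[:cut]
      # O(log n) comparisons to find the insertion point for the new token.
      bisect.insort(token_list, (expires_at, token_id))
      return token_list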

** Affects: keystone
 Importance: Undecided
 Status: New


** Tags: sts

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1607039

Title:
  KVS _update_user_token_list can be more efficient

Status in OpenStack Identity (keystone):
  New

Bug description:
  Maintaining the user token list and the revocation list in the
  memcached persistence backend (kvs) is inefficient for larger numbers
  of tokens due to the use of a linear algorithm for token list
  maintenance.

  Since the list is unordered, each token within the list must be
  checked first to see whether it has expired, and second to determine
  whether it has been revoked. By changing to an ordered list and using
  a binary search, expired tokens can be found with less computational
  overhead.

  The current algorithm means that the insertion of a new token into the
  list is O(n) since token expiration validity is done when the list is
  updated. By using an ordered list, the insertion and validation of the
  expiration can be reduced to O(log n).

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1607039/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1414218] Re: Remove extraneous trace in linux/dhcp.py

2016-07-05 Thread Billy Olsen
** Also affects: cloud-archive
   Importance: Undecided
   Status: New

** Changed in: cloud-archive
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1414218

Title:
  Remove extraneous trace in linux/dhcp.py

Status in Ubuntu Cloud Archive:
  Confirmed
Status in neutron:
  Fix Released
Status in neutron juno series:
  Fix Released
Status in neutron package in Ubuntu:
  New

Bug description:
  [Impact]

  The debug tracepoint in Dnsmasq._output_hosts_file is extraneous and
  causes unnecessary performance overhead when creating many (> 1000)
  ports at one time.

  The trace point is unnecessary since the data is being written to disk
  and the file can be examined in a worst case scenario. The added
  performance overhead is an order of magnitude in difference (~.5
  seconds versus ~.05 seconds at 1500 ports).

  [Test Case]

  1. Deploy OpenStack using neutron for networking
  2. Create 1500 ports
  3. Observe the performance degradation for each port creation (see the
  sketch below).
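
  A rough driver for step 2, assuming openstacksdk, a clouds.yaml entry
  named 'devstack' and a placeholder network UUID; the degradation itself
  shows up in how long the neutron DHCP agent takes to process each new
  port:

  import openstack

  NET_ID = 'REPLACE-WITH-NETWORK-UUID'   # placeholder
  conn = openstack.connect(cloud='devstack')

  # Create the ports one at a time and watch the neutron-dhcp-agent log
  # to see the per-port processing time grow.
  for i in range(1500):
      conn.network.create_port(network_id=NET_ID, name='perf-port-%d' % i)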

  [Regression Potential]

  Minimal. This code has been running in stable/juno, stable/kilo, and
  above for awhile.

  [Other Questions]

  This is likely to occur in OpenStack deployments which have large
  networks deployed. The degradation is gradual, but the performance
  becomes unacceptable with large enough networks.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1414218/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1554227] Re: DHCP unicast requests are not responded to

2016-03-08 Thread Billy Olsen
** Also affects: nova (Ubuntu)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1554227

Title:
  DHCP unicast requests are not responded to

Status in OpenStack Compute (nova):
  New
Status in nova package in Ubuntu:
  New

Bug description:
  Issue:
  We run nova-network in VLAN+multi_host mode on Kilo and notice that only one 
dnsmasq process (either the oldest or newest) on the hypervisor responds to 
unicast BOOTPREQUESTS. dhclient on VMs will retry until it eventually gives up 
and broadcasts the request, which is then responded to. Depending on the timing 
of the DHCP broadcast request, VMs can briefly lose connectivity as they 
attempt rebinding.

  According to
  
http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commitdiff;h=9380ba70d67db6b69f817d8e318de5ba1e990b12,
  it seems that passing the "--interface" argument, in addition to
  "--bind-interfaces", is necessary for dnsmasq to work correctly in VLAN
  mode.

  
  Reproduce steps:
  1. Create two tenants
  2. Create a VM under each tenant, forcing the VMs to run on a single 
hypervisor. I tested with a vanilla Ubuntu cloud image, but any other image 
that uses dhclient should also work.
  3. On the hypervisor, run dhcpdump -i  for each tenant's 
bridge interface. On at least one of them, you should see unicast BOOTPREQUEST 
with no corresponding BOOTPREPLY. dnsmasq will reply when the request 
eventually hits 255.255.255.255.

  
  Nova/Openstack/dnsmasq versions:
  ii  nova-api 1:2015.1.2-0ubuntu2~cloud0   
 all  OpenStack Compute - API frontend
  ii  nova-common  1:2015.1.2-0ubuntu2~cloud0   
 all  OpenStack Compute - common files
  ii  nova-compute 1:2015.1.2-0ubuntu2~cloud0   
 all  OpenStack Compute - compute node base
  ii  nova-compute-libvirt 1:2015.1.2-0ubuntu2~cloud0   
 all  OpenStack Compute - compute node libvirt support
  ii  nova-compute-qemu1:2015.1.2-0ubuntu2~cloud0   
 all  OpenStack Compute - compute node (QEmu)
  ii  nova-network 1:2015.1.2-0ubuntu2~cloud0   
 all  OpenStack Compute - Network manager
  ii  nova-novncproxy  1:2015.1.2-0ubuntu2~cloud0   
 all  OpenStack Compute - NoVNC proxy
  ii  python-nova  1:2015.1.2-0ubuntu2~cloud0   
 all  OpenStack Compute Python libraries
  ii  python-nova-adminclient  0.1.8-0ubuntu2   
 amd64client for administering Openstack Nova
  ii  python-novaclient1:2.22.0-0ubuntu2~cloud0 
 all  client library for OpenStack Compute API
  ii  dnsmasq-base 2.68-1ubuntu0.1  
 amd64Small caching DNS proxy and DHCP/TFTP server
  ii  dnsmasq-utils2.68-1ubuntu0.1  
 amd64Utilities for manipulating DHCP leases

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1554227/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1374999] Re: iSCSI volume detach does not correctly remove the multipath device descriptors

2016-01-22 Thread Billy Olsen
Marking this as confirmed against the Ubuntu Cloud Archive for Kilo,
Juno, and Trusty, which are still supported from the Ubuntu perspective
and are known not to include the os-brick library dependencies. The
change to os-brick still needs to be tested to verify that the problem
is fixed there, so leaving that as incomplete.

** Changed in: cloud-archive/icehouse
   Status: Invalid => Confirmed

** Changed in: cloud-archive/juno
   Status: Invalid => Confirmed

** Changed in: cloud-archive/kilo
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1374999

Title:
  iSCSI volume detach does not correctly remove the multipath device
  descriptors

Status in Ubuntu Cloud Archive:
  Confirmed
Status in Ubuntu Cloud Archive icehouse series:
  Confirmed
Status in Ubuntu Cloud Archive juno series:
  Confirmed
Status in Ubuntu Cloud Archive kilo series:
  Confirmed
Status in OpenStack Compute (nova):
  Incomplete
Status in nova package in Ubuntu:
  Triaged
Status in nova source package in Trusty:
  Triaged

Bug description:
  iSCSI volume detach does not correctly remove the multipath device
  descriptors

  tested environment:
  nova-compute on Ubuntu 14.04.1, iscsi_use_multipath=True and iSCSI volume 
backend is EMC VNX 5300.

   I created 3 cinder volumes and attached them to a nova instance. Then I 
detached them one by one. The first 2 volumes detached successfully. The 3rd 
volume also detached successfully but ended up with failed multipaths. 
  Here is the terminal log for the last volume detach.

  openstack@W1DEV103:~/devstack$ cinder list
  
  +--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
  | ID                                   | Status | Name | Size | Volume Type | Bootable | Attached to                          |
  +--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
  | 56a63288-5cc0-4f5c-9197-cde731172dd8 | in-use | None | 1    | None        | false    | 5bd68785-4acf-43ab-ae13-11b1edc3a62e |
  +--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
  openstack@W1CN103:/etc/iscsi$ date;sudo multipath -l
  Fri Sep 19 21:38:13 JST 2014
  360060160cf0036002d1475f6e73fe411 dm-2 DGC,VRAID
  size=1.0G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
  |-+- policy='round-robin 0' prio=-1 status=active
  | |- 4:0:0:42 sdb 8:16 active undef running
  | |- 5:0:0:42 sdd 8:48 active undef running
  | |- 6:0:0:42 sdf 8:80 active undef running
  | `- 7:0:0:42 sdh 8:112 active undef running
  `-+- policy='round-robin 0' prio=-1 status=enabled
  |- 11:0:0:42 sdp 8:240 active undef running
  |- 8:0:0:42 sdj 8:144 active undef running
  |- 9:0:0:42 sdl 8:176 active undef running
  `- 10:0:0:42 sdn 8:208 active undef running
  openstack@W1CN103:/etc/iscsi$ date;sudo iscsiadm -m session
  Fri Sep 19 21:38:19 JST 2014
  tcp: [10] 172.23.58.228:3260,4 iqn.1992-04.com.emc:cx.fcn00133400150.a7
  tcp: [3] 172.23.58.238:3260,8 iqn.1992-04.com.emc:cx.fcn00133400150.b7
  tcp: [4] 172.23.58.235:3260,20 iqn.1992-04.com.emc:cx.fcn00133400150.b4
  tcp: [5] 172.23.58.236:3260,6 iqn.1992-04.com.emc:cx.fcn00133400150.b5
  tcp: [6] 172.23.58.237:3260,19 iqn.1992-04.com.emc:cx.fcn00133400150.b6
  tcp: [7] 172.23.58.225:3260,16 iqn.1992-04.com.emc:cx.fcn00133400150.a4
  tcp: [8] 172.23.58.226:3260,2 iqn.1992-04.com.emc:cx.fcn00133400150.a5
  tcp: [9] 172.23.58.227:3260,17 iqn.1992-04.com.emc:cx.fcn00133400150.a6

  openstack@W1DEV103:~/devstack$ nova volume-detach 
5bd68785-4acf-43ab-ae13-11b1edc3a62e
  56a63288-5cc0-4f5c-9197-cde731172dd8
  openstack@W1DEV103:~/devstack$
  openstack@W1DEV103:~/devstack$ cinder list
  
  +--------------------------------------+-----------+------+------+-------------+----------+--------------------------------------+
  | ID                                   | Status    | Name | Size | Volume Type | Bootable | Attached to                          |
  +--------------------------------------+-----------+------+------+-------------+----------+--------------------------------------+
  | 56a63288-5cc0-4f5c-9197-cde731172dd8 | detaching | None | 1    | None        | false    | 5bd68785-4acf-43ab-ae13-11b1edc3a62e |
  +--------------------------------------+-----------+------+------+-------------+----------+--------------------------------------+
  openstack@W1DEV103:~/devstack$
  openstack@W1DEV103:~/devstack$ cinder list
  
  +--------------------------------------+-----------+------+------+-------------+----------+-------------+
  | ID                                   | Status    | Name | Size | Volume Type | Bootable | Attached to |
  +--------------------------------------+-----------+------+------+-------------+----------+-------------+
  | 56a63288-5cc0-4f5c-9197-cde731172dd8 | available | None | 1    | None        | false    |
  

[Yahoo-eng-team] [Bug 1374999] Re: iSCSI volume detach does not correctly remove the multipath device descriptors

2016-01-11 Thread Billy Olsen
** Also affects: cloud-archive
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1374999

Title:
  iSCSI volume detach does not correctly remove the multipath device
  descriptors

Status in Ubuntu Cloud Archive:
  New
Status in OpenStack Compute (nova):
  In Progress
Status in nova package in Ubuntu:
  Triaged
Status in nova source package in Trusty:
  Triaged

Bug description:
  iSCSI volume detach does not correctly remove the multipath device
  descriptors

  tested environment:
  nova-compute on Ubuntu 14.04.1, iscsi_use_multipath=True and iSCSI volume 
backend is EMC VNX 5300.

   I created 3 cinder volumes and attached them to a nova instance. Then I 
detached them one by one. The first 2 volumes detached successfully. The 3rd 
volume also detached successfully but ended up with failed multipaths. 
  Here is the terminal log for the last volume detach.

  openstack@W1DEV103:~/devstack$ cinder list
  
  +--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
  | ID                                   | Status | Name | Size | Volume Type | Bootable | Attached to                          |
  +--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
  | 56a63288-5cc0-4f5c-9197-cde731172dd8 | in-use | None | 1    | None        | false    | 5bd68785-4acf-43ab-ae13-11b1edc3a62e |
  +--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
  openstack@W1CN103:/etc/iscsi$ date;sudo multipath -l
  Fri Sep 19 21:38:13 JST 2014
  360060160cf0036002d1475f6e73fe411 dm-2 DGC,VRAID
  size=1.0G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
  |-+- policy='round-robin 0' prio=-1 status=active
  | |- 4:0:0:42 sdb 8:16 active undef running
  | |- 5:0:0:42 sdd 8:48 active undef running
  | |- 6:0:0:42 sdf 8:80 active undef running
  | `- 7:0:0:42 sdh 8:112 active undef running
  `-+- policy='round-robin 0' prio=-1 status=enabled
  |- 11:0:0:42 sdp 8:240 active undef running
  |- 8:0:0:42 sdj 8:144 active undef running
  |- 9:0:0:42 sdl 8:176 active undef running
  `- 10:0:0:42 sdn 8:208 active undef running
  openstack@W1CN103:/etc/iscsi$ date;sudo iscsiadm -m session
  Fri Sep 19 21:38:19 JST 2014
  tcp: [10] 172.23.58.228:3260,4 iqn.1992-04.com.emc:cx.fcn00133400150.a7
  tcp: [3] 172.23.58.238:3260,8 iqn.1992-04.com.emc:cx.fcn00133400150.b7
  tcp: [4] 172.23.58.235:3260,20 iqn.1992-04.com.emc:cx.fcn00133400150.b4
  tcp: [5] 172.23.58.236:3260,6 iqn.1992-04.com.emc:cx.fcn00133400150.b5
  tcp: [6] 172.23.58.237:3260,19 iqn.1992-04.com.emc:cx.fcn00133400150.b6
  tcp: [7] 172.23.58.225:3260,16 iqn.1992-04.com.emc:cx.fcn00133400150.a4
  tcp: [8] 172.23.58.226:3260,2 iqn.1992-04.com.emc:cx.fcn00133400150.a5
  tcp: [9] 172.23.58.227:3260,17 iqn.1992-04.com.emc:cx.fcn00133400150.a6

  openstack@W1DEV103:~/devstack$ nova volume-detach 
5bd68785-4acf-43ab-ae13-11b1edc3a62e
  56a63288-5cc0-4f5c-9197-cde731172dd8
  openstack@W1DEV103:~/devstack$
  openstack@W1DEV103:~/devstack$ cinder list
  
  +--------------------------------------+-----------+------+------+-------------+----------+--------------------------------------+
  | ID                                   | Status    | Name | Size | Volume Type | Bootable | Attached to                          |
  +--------------------------------------+-----------+------+------+-------------+----------+--------------------------------------+
  | 56a63288-5cc0-4f5c-9197-cde731172dd8 | detaching | None | 1    | None        | false    | 5bd68785-4acf-43ab-ae13-11b1edc3a62e |
  +--------------------------------------+-----------+------+------+-------------+----------+--------------------------------------+
  openstack@W1DEV103:~/devstack$
  openstack@W1DEV103:~/devstack$ cinder list
  
  +--------------------------------------+-----------+------+------+-------------+----------+-------------+
  | ID                                   | Status    | Name | Size | Volume Type | Bootable | Attached to |
  +--------------------------------------+-----------+------+------+-------------+----------+-------------+
  | 56a63288-5cc0-4f5c-9197-cde731172dd8 | available | None | 1    | None        | false    |             |
  +--------------------------------------+-----------+------+------+-------------+----------+-------------+
  openstack@W1CN103:/etc/iscsi$ date;sudo multipath -l
  Fri Sep 19 21:39:23 JST 2014
  360060160cf0036002d1475f6e73fe411 dm-2 ,
  size=1.0G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
  |-+- policy='round-robin 0' prio=-1 status=active
  | |- #:#:#:# - #:# active undef running
  | |- #:#:#:# - #:# active undef running
  | |- #:#:#:# - #:# active undef running
  | `- #:#:#:# - #:# active undef running
  `-+- policy='round-robin 0' prio=-1 status=enabled
  |- #:#:#:# - #:# active undef running
  |- #:#:#:# - #:# active undef running
  |- #:#:#:# - #:# 

[Yahoo-eng-team] [Bug 1353939] Re: Rescue fails with 'Failed to terminate process: Device or resource busy' in the n-cpu log

2015-12-15 Thread Billy Olsen
This fix was made available in 1:2014.2.4-0ubuntu1~cloud4 of nova in the
Ubuntu Cloud Archive for Juno.

** Changed in: cloud-archive/juno
   Status: In Progress => Fix Committed

** Changed in: cloud-archive/juno
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1353939

Title:
  Rescue fails with 'Failed to terminate process: Device or resource
  busy' in the n-cpu log

Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive juno series:
  Fix Released
Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) juno series:
  New
Status in OpenStack Compute (nova) kilo series:
  Fix Released
Status in nova package in Ubuntu:
  Invalid

Bug description:
  [Impact]

   * Users may sometimes fail to shut down an instance if the associated qemu
 process is in uninterruptible sleep (typically IO).

  [Test Case]

   * 1. create some IO load in a VM
 2. look at the associated qemu, make sure it has STAT D in ps output
 3. shut down the instance
 4. with the patch in place, nova will retry calling libvirt to shut down
the instance 3 times to wait for the signal to be delivered to the 
qemu process (a rough sketch of the retry pattern follows below).
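
  A rough sketch of the retry pattern, not the nova patch itself,
  assuming the python libvirt bindings:

  import time
  import libvirt

  def destroy_with_retries(domain, retries=3, delay=3):
      for attempt in range(1, retries + 1):
          try:
              domain.destroy()   # forcefully terminates the qemu process
              return
          except libvirt.libvirtError:
              if attempt == retries:
                  raise
              # Give the stuck (state D) process a chance to take the signal.
              time.sleep(delay)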

  [Regression Potential]

   * None


  message: "Failed to terminate process" AND
  message:'InstanceNotRescuable' AND message: 'Exception during message
  handling' AND tags:"screen-n-cpu.txt"

  The above log stash-query reports back only the failed jobs, the 'Failed to 
terminate process' close other failed rescue tests,
  but tempest does not always reports them as an error at the end.

  message: "Failed to terminate process" AND tags:"screen-n-cpu.txt"

  Usual console log:
  Details: (ServerRescueTestJSON:test_rescue_unrescue_instance) Server 
0573094d-53da-40a5-948a-747d181462f5 failed to reach RESCUE status and task 
state "None" within the required time (196 s). Current status: SHUTOFF. Current 
task state: None.

  http://logs.openstack.org/82/107982/2/gate/gate-tempest-dsvm-postgres-
  full/90726cb/console.html#_2014-08-07_03_50_26_520

  Usual n-cpu exception:
  
http://logs.openstack.org/82/107982/2/gate/gate-tempest-dsvm-postgres-full/90726cb/logs/screen-n-cpu.txt.gz#_2014-08-07_03_32_02_855

  2014-08-07 03:32:02.855 ERROR oslo.messaging.rpc.dispatcher 
[req-39ce7a3d-5ceb-41f5-8f9f-face7e608bd1 ServerRescueTestJSON-2035684545 
ServerRescueTestJSON-1017508309] Exception during message handling: Instance 
0573094d-53da-40a5-948a-747d181462f5 cannot be rescued: Driver Error: Failed to 
terminate process 26425 with SIGKILL: Device or resource busy
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher Traceback 
(most recent call last):
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File 
"/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 
134, in _dispatch_and_reply
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher 
incoming.message))
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File 
"/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 
177, in _dispatch
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher return 
self._do_dispatch(endpoint, method, ctxt, args)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File 
"/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 
123, in _do_dispatch
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher result 
= getattr(endpoint, method)(ctxt, **new_args)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File 
"/opt/stack/new/nova/nova/compute/manager.py", line 408, in decorated_function
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher return 
function(self, context, *args, **kwargs)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File 
"/opt/stack/new/nova/nova/exception.py", line 88, in wrapped
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher payload)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File 
"/opt/stack/new/nova/nova/openstack/common/excutils.py", line 82, in __exit__
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher 
six.reraise(self.type_, self.value, self.tb)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File 
"/opt/stack/new/nova/nova/exception.py", line 71, in wrapped
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher return 
f(self, context, *args, **kw)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File 
"/opt/stack/new/nova/nova/compute/manager.py", line 292, in decorated_function
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher pass
  2014-08-07 03:32:02.855 

[Yahoo-eng-team] [Bug 1414218] [NEW] Remove extraneous trace in linux/dhcp.py

2015-01-23 Thread Billy Olsen
Public bug reported:

The debug tracepoint in Dnsmasq._output_hosts_file is extraneous and
causes unnecessary performance overhead due to string formatting when
creating many (> 1000) ports at one time.

The trace point is unnecessary since the data is being written to disk
and the file can be examined in a worst case scenario. The added
performance overhead is an order of magnitude in difference (~.5 seconds
versus ~.05 seconds at 1500 ports).
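
For illustration only (this is not the neutron code), a sketch of the
shape of the overhead: the hosts file is rebuilt on every port add, and
logging the whole, ever-growing buffer on each rebuild adds formatting
and logging work on top of the write that already lands on disk:

  import logging

  logging.basicConfig(level=logging.DEBUG,
                      filename='/tmp/dhcp-trace-demo.log')
  LOG = logging.getLogger(__name__)

  def output_hosts_file(entries, hosts_path='/tmp/dhcp-demo-hosts',
                        trace=True):
      buf = ''.join(entries)
      if trace:
          LOG.debug('hosts file contents: %s', buf)  # the extraneous trace
      with open(hosts_path, 'w') as f:
          f.write(buf)

  entries = ['fa:16:3e:00:%02x:%02x,host-%d,10.0.%d.%d\n'
             % (i // 256, i % 256, i, i // 250, i % 250 + 1)
             for i in range(1500)]

  # One rebuild per added port, as the agent does when ports are created.
  for n in range(1, len(entries) + 1):
      output_hosts_file(entries[:n])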

** Affects: neutron
 Importance: Undecided
 Assignee: Billy Olsen (billy-olsen)
 Status: In Progress

** Changed in: neutron
 Assignee: (unassigned) = Billy Olsen (billy-olsen)

** Changed in: neutron
   Status: New = In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1414218

Title:
  Remove extraneous trace in linux/dhcp.py

Status in OpenStack Neutron (virtual network service):
  In Progress

Bug description:
  The debug tracepoint in Dnsmasq._output_hosts_file is extraneous and
  causes unnecessary performance overhead due to string formatting when
  creating many (> 1000) ports at one time.

  The trace point is unnecessary since the data is being written to disk
  and the file can be examined in a worst case scenario. The added
  performance overhead is an order of magnitude in difference (~.5
  seconds versus ~.05 seconds at 1500 ports).

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1414218/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp