[Yahoo-eng-team] [Bug 1511311] [NEW] L3 agent failed to respawn keepalived process
Public bug reported:

I enabled L3 HA in the Neutron configuration, and I regularly see the following log entries in l3_agent.log:

2015-10-14 22:30:16.397 21460 ERROR neutron.agent.linux.external_process [-] default-service for router with uuid 59de181e-8f02-470d-80f6-cb9f0d46f78b not found. The process should not have died
2015-10-14 22:30:16.397 21460 ERROR neutron.agent.linux.external_process [-] respawning keepalived for uuid 59de181e-8f02-470d-80f6-cb9f0d46f78b
2015-10-14 22:30:16.397 21460 DEBUG neutron.agent.linux.utils [-] Unable to access /var/lib/neutron/ha_confs/59de181e-8f02-470d-80f6-cb9f0d46f78b.pid get_value_from_file /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:222
2015-10-14 22:30:16.398 21460 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-59de181e-8f02-470d-80f6-cb9f0d46f78b', 'keepalived', '-P', '-f', '/var/lib/neutron/ha_confs/59de181e-8f02-470d-80f6-cb9f0d46f78b/keepalived.conf', '-p', '/var/lib/neutron/ha_confs/59de181e-8f02-470d-80f6-cb9f0d46f78b.pid', '-r', '/var/lib/neutron/ha_confs/59de181e-8f02-470d-80f6-cb9f0d46f78b.pid-vrrp'] create_process /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:84

I also noticed that there were usually more VRRP pid files than plain pid files:

root@neutron2:~# ls /var/lib/neutron/ha_confs/ | grep pid | grep -v vrrp | wc -l
664
root@neutron2:~# ls /var/lib/neutron/ha_confs/ | grep vrrp | wc -l
677

And it seems that when a stale "pid-vrrp" file exists, we can't successfully respawn the keepalived process with this kind of command:

keepalived -P -f /var/lib/neutron/ha_confs/cb01b1de-fa6c-461e-ba39-4d506dfdfccb/keepalived.conf -p /var/lib/neutron/ha_confs/cb01b1de-fa6c-461e-ba39-4d506dfdfccb.pid -r /var/lib/neutron/ha_confs/cb01b1de-fa6c-461e-ba39-4d506dfdfccb.pid-vrrp

So my suggestion: in Neutron, after we have checked that the pid is not active, could we check for the existence of the "pid" file and the "vrrp pid" file and remove them before respawning the keepalived process, to make sure the process can be started successfully? See the sketch below.

https://github.com/openstack/neutron/blob/master/neutron/agent/linux/external_process.py#L91-L92

** Affects: neutron
     Importance: Undecided
         Status: New

Bug link: https://bugs.launchpad.net/bugs/1511311
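A minimal sketch of the proposed cleanup, under the assumption that it would hang off ProcessManager in external_process.py (the helper name and the call-site comment are mine, not Neutron's actual fix):

    import os

    def remove_stale_pid_files(pid_file):
        """Delete leftover keepalived pid files before a respawn.

        keepalived writes two pid files: <uuid>.pid for the main process
        and <uuid>.pid-vrrp for the VRRP child; a stale copy of either
        can keep a respawned keepalived from starting.
        """
        for path in (pid_file, pid_file + '-vrrp'):
            try:
                os.unlink(path)
            except OSError:
                pass  # already gone, nothing to clean up

    # Hypothetical call site, right after active() has returned False:
    # remove_stale_pid_files(self.get_pid_file_name())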
[Yahoo-eng-team] [Bug 1510796] [NEW] Function sync_routers always calls "get_dvr_sync_data" in an HA but non-DVR scenario
Public bug reported:

The configuration:

neutron.conf:
[DEFAULT]
l3_ha = True
service_plugins = router

l3_agent.ini:
[DEFAULT]
...
agent_mode = legacy
...

The current code decides whether to call "get_dvr_sync_data" based only on whether the plugin supports DVR; it would be better to judge by the agent mode here:

https://github.com/openstack/neutron/blob/master/neutron/db/l3_hamode_db.py#L535-L536

Calling the "get_sync_data" method here instead would save time. A sketch of the idea follows.

** Affects: neutron
     Importance: Undecided
       Assignee: ZongKai LI (lzklibj)
         Status: New

Bug link: https://bugs.launchpad.net/bugs/1510796
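A minimal sketch of the proposed dispatch, assuming the agent's mode is queryable where sync_routers runs (the mode-lookup helper is illustrative; the constants follow neutron.common.constants):

    def sync_routers(self, context, host, router_ids=None):
        agent = self._get_agent_by_type_and_host(
            context, constants.AGENT_TYPE_L3, host)
        if self._get_agent_mode(agent) == constants.L3_AGENT_MODE_LEGACY:
            # A legacy-mode agent never consumes DVR-specific fields, so
            # the cheaper query is enough even if the plugin supports DVR.
            return self.get_sync_data(context, router_ids)
        return self.get_dvr_sync_data(context, host, agent, router_ids)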
[Yahoo-eng-team] [Bug 1510399] [NEW] Error when metadata proxy is disabled for the HA L3 agent
Public bug reported:

The configuration:

/etc/neutron/neutron.conf:
[DEFAULT]
l3_ha = True

/etc/neutron/l3_agent.ini:
[DEFAULT]
enable_metadata_proxy = False

With this configuration there is an error in the L3 agent's log:

2015-10-27 01:18:46.833 5494 TRACE neutron.agent.l3.agent     self.metadata_driver.destroy_monitored_metadata_proxy(
2015-10-27 01:18:46.833 5494 TRACE neutron.agent.l3.agent AttributeError: 'L3NATAgentWithStateReport' object has no attribute 'metadata_driver'

I think we need a check before this method (see the sketch below):

https://github.com/openstack/neutron/blob/master/neutron/agent/l3/ha.py#L126

** Affects: neutron
     Importance: High
       Assignee: Hong Hui Xiao (xiaohhui)
         Status: Confirmed

** Tags: l3-ha liberty-backport-potential

Bug link: https://bugs.launchpad.net/bugs/1510399
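A minimal sketch of the guard (the surrounding method and argument list are illustrative; the enable_metadata_proxy check is the point):

    def _destroy_metadata_proxy(self, router_info):
        # When enable_metadata_proxy is False the agent never creates
        # self.metadata_driver, so calling into it raises the
        # AttributeError shown above; bail out early instead.
        if not self.conf.enable_metadata_proxy:
            return
        self.metadata_driver.destroy_monitored_metadata_proxy(
            self.process_monitor, router_info.router_id, self.conf)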
[Yahoo-eng-team] [Bug 1510415] [NEW] Linuxbridge agent failed to create some bridges after the OS was rebooted
Public bug reported:

When I rebooted the operating system that hosted the Linuxbridge agent, the Linuxbridge agent tried to recreate all the bridges according to the updated tap devices. But I found that some bridges were not created, along with these puzzling log messages:

In l3-agent.log, the tap device appears to have been created at 2015-10-27 00:53:15:

2015-10-27 00:53:15.002 5135 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'link', 'add', 'tap5f17438b-6e', 'type', 'veth', 'peer', 'name', 'qr-5f17438b-6e', 'netns', 'qrouter-f3f17112-aadc-4649-9662-6bf25cad569d'] create_process /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:84

But in linuxbridge-agent.log, the Linuxbridge agent thinks the tap device has not been created yet, and gives up creating the related bridge at 2015-10-27 00:53:27:

2015-10-27 00:53:27.774 5088 DEBUG neutron.plugins.linuxbridge.agent.linuxbridge_neutron_agent [req-c156ba1e-aafb-4e4e-ae15-ba9384bb1673 ] Port tap5f17438b-6e added treat_devices_added_updated /usr/lib/python2.7/dist-packages/neutron/plugins/linuxbridge/agent/linuxbridge_neutron_agent.py:865
2015-10-27 00:53:27.774 5088 INFO neutron.plugins.linuxbridge.agent.linuxbridge_neutron_agent [req-c156ba1e-aafb-4e4e-ae15-ba9384bb1673] Device tap5f17438b-6e not defined on plugin

** Affects: neutron
     Importance: Undecided
         Status: New

Bug link: https://bugs.launchpad.net/bugs/1510415
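One possible direction, sketched under the assumption that this is a race between the L3 agent creating the veth and the Linuxbridge agent asking the server about it: instead of dropping a device the plugin does not know yet, re-queue it for the next sync loop (the devices_to_retry set is hypothetical, not existing agent state):

    def treat_devices_added_updated(self, devices):
        devices_details_list = self.plugin_rpc.get_devices_details_list(
            self.context, devices, self.agent_id)
        for device_details in devices_details_list:
            device = device_details['device']
            if 'port_id' not in device_details:
                LOG.info("Device %s not defined on plugin, will retry "
                         "on the next loop", device)
                # Hypothetical: keep the device around so a tap that
                # shows up late still gets wired to its bridge.
                self.devices_to_retry.add(device)
                continue
            # ... existing handling of known ports ...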
[Yahoo-eng-team] [Bug 1508789] [NEW] Linux bridge agent "get_bridge_for_tap_device" can be optimized
Public bug reported:

Currently, when the lb agent needs to get the bridge for a tap device, it iterates over all the bridges and all the tap devices on each bridge to check which bridge the tap device is bound to. This takes too much time.

Code: https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py#L210-L217

A better way would be to check from the tap device itself. If a tap device is bound to a bridge, there is a symlink named "master" in the tap device's folder "/sys/class/net/tapxxx" that points at the bridge's folder. So we can check BRIDGE_FS to get the related bridge.

One prototype of this change:

    def get_bridge_for_tap_device(self, tap_device_name):
        tap_master = BRIDGE_FS + tap_device_name + '/master'
        if os.path.islink(tap_master):
            return os.readlink(tap_master).split('/')[1]

** Affects: neutron
     Importance: Undecided
         Status: New

Bug link: https://bugs.launchpad.net/bugs/1508789
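For reference, a self-contained version of the same sysfs lookup that can be tried outside the agent (assumes a kernel new enough to expose the "master" link; returns None when the device is not enslaved):

    import os

    SYS_NET = '/sys/class/net/'  # the tree BRIDGE_FS points at

    def bridge_of(device_name):
        """Return the bridge `device_name` is attached to, or None."""
        master = os.path.join(SYS_NET, device_name, 'master')
        if os.path.islink(master):
            # The link target ends with the bridge's own device name,
            # e.g. '../brq12345678-ab' -> 'brq12345678-ab'.
            return os.path.basename(os.readlink(master))
        return None

    # e.g. bridge_of('tap5f17438b-6e') -> 'brq...' on a compute node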
[Yahoo-eng-team] [Bug 1508270] [NEW] One log level for the Linux bridge agent was not right
Public bug reported:

This log message:

https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py#L975-L977

would be better at "WARNING" or "ERROR" level; currently it is only logged at "DEBUG".

** Affects: neutron
     Importance: Undecided
         Status: New

Bug link: https://bugs.launchpad.net/bugs/1508270
[Yahoo-eng-team] [Bug 1499177] [NEW] Performance: L2 agent takes too much time to refresh sg rules
Public bug reported:

This issue is a performance problem in the L2 agent (both the LinuxBridge and OVS agents) on a compute node that hosts lots of networks and instances (e.g. 500 instances).

The performance problem shows in two aspects:

1. When the LinuxBridge agent service starts up (this seems to happen only for the LinuxBridge agent, not for the OVS agent), I found two methods that take too much time:

   1.1 get_interface_by_ip(): we need to find the interface that is assigned the "local ip" defined in the configuration file, and check whether that interface supports "vxlan". This method iterates over every interface on the compute node and executes "ip link show [interface] to [local ip]" for each one to judge the result. I think there should be a faster way (see the sketch below).

   1.2 prepare_port_filter(): in this method we make sure the ipsets are created correctly, but it executes too many "ipset" commands and takes too much time.

2. When devices' sg rules are changed, the L2 agent has to refresh the firewalls.

   2.1 refresh_firewall(): this method calls "modify_rules" to make the rules predictable, but it also takes too much time.

It would be a big benefit at large network scale if this performance problem could be fixed or optimized.

** Affects: neutron
     Importance: Undecided
         Status: New

Bug link: https://bugs.launchpad.net/bugs/1499177
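A sketch of one possible speedup for 1.1, assuming a single iproute2 query is acceptable: "ip -o addr show to <ip>" only prints interfaces that actually carry the address, so the per-interface loop disappears (illustrative, not the actual Neutron change):

    from neutron.agent.linux import utils

    def get_interface_by_ip(local_ip):
        """Find the device carrying local_ip with one `ip` invocation."""
        # Prints one line per matching address, e.g.
        # "2: eth1    inet 192.168.1.10/24 ... scope global eth1"
        output = utils.execute(['ip', '-o', 'addr', 'show', 'to', local_ip])
        for line in output.splitlines():
            return line.split()[1]  # second column is the device name
        return None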
[Yahoo-eng-team] [Bug 1447884] [NEW] Boot from volume: block device allocation timeout causes a VM error, but the volume becomes available later
Public bug reported:

When we try to boot multiple instances from volume (with a large image source) at the same time, we usually get a block device allocation error, as in these nova-compute.log entries:

2015-03-30 23:22:46.920 6445 WARNING nova.compute.manager [-] Volume id: 551ea616-e1c4-4ef2-9bf3-b0ca6d4474dc finished being created but was not set as 'available'
2015-03-30 23:22:47.131 6445 ERROR nova.compute.manager [-] [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] Instance failed block device setup
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] Traceback (most recent call last):
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]   File /usr/lib/python2.6/site-packages/nova/compute/manager.py, line 1829, in _prep_block_device
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]     do_check_attach=do_check_attach)
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]   File /usr/lib/python2.6/site-packages/nova/virt/block_device.py, line 407, in attach_block_devices
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]     map(_log_and_attach, block_device_mapping)
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]   File /usr/lib/python2.6/site-packages/nova/virt/block_device.py, line 405, in _log_and_attach
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]     bdm.attach(*attach_args, **attach_kwargs)
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]   File /usr/lib/python2.6/site-packages/nova/virt/block_device.py, line 339, in attach
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]     do_check_attach=do_check_attach)
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]   File /usr/lib/python2.6/site-packages/nova/virt/block_device.py, line 46, in wrapped
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]     ret_val = method(obj, context, *args, **kwargs)
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]   File /usr/lib/python2.6/site-packages/nova/virt/block_device.py, line 229, in attach
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]     volume_api.check_attach(context, volume, instance=instance)
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]   File /usr/lib/python2.6/site-packages/nova/volume/cinder.py, line 305, in check_attach
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]     raise exception.InvalidVolume(reason=msg)
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] InvalidVolume: Invalid volume: status must be 'available'
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]

This error leaves the VM in error status:

+--------------------------------------+--------+--------+----------------------+-------------+----------+
| ID                                   | Name   | Status | Task State           | Power State | Networks |
+--------------------------------------+--------+--------+----------------------+-------------+----------+
| 1fa2d7aa-8bd9-4a22-8538-0a07d9dae8aa | inst02 | ERROR  | block_device_mapping | NOSTATE     |          |
+--------------------------------------+--------+--------+----------------------+-------------+----------+
But the volume was in available status:

+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| ID                                   | Status    | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| a9ab2dc2-b117-44ef-8678-f71067a9e770 | available | None | 2    | None        | true     |             |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+

And when we terminate this VM, the volume will still exist, since no volume attachment info was stored for this VM.

This can be easily reproduced:

1. add the following options to nova.conf in compute node ( make sure the error
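If the underlying issue is only that volume creation outpaces nova's polling window, one mitigation is to widen that window in nova.conf on the compute node; the option names below are nova's block-device retry settings (please verify them against your release):

    [DEFAULT]
    # Number of times to poll cinder for the volume to become 'available'
    # before giving up and failing the build.
    block_device_allocate_retries = 120
    # Seconds to wait between two consecutive polls.
    block_device_allocate_retries_interval = 3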
[Yahoo-eng-team] [Bug 1437993] [NEW] VMware: attaching an iSCSI volume doesn't consider the authentication case
Public bug reported:

When we use lioadm as the cinder iscsi_helper, it enables authentication for each volume at this line:

https://github.com/openstack/cinder/blob/master/cinder/cmd/rtstool.py#L56

But currently the VMware driver uses Dynamic Discovery to discover the iSCSI target, which does not add the CHAP info for the target. Then, when the host tries to connect to the target, an "Unable to connect to the ISCSI target" error occurs.

I think we shouldn't use Dynamic Discovery in this case; we should dynamically create a static iSCSI target with the CHAP info (see the sketch below).

** Affects: nova
     Importance: Undecided
         Status: New

Bug link: https://bugs.launchpad.net/bugs/1437993
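For reference, the CHAP credentials the driver would need are already carried in the connection_info cinder returns; a minimal sketch of extracting them (the field names are cinder's standard iSCSI ones; how they would then be attached to a statically-discovered target on the ESX host is left out, since that is exactly the driver change this report asks for):

    def get_chap_credentials(connection_info):
        """Pull CHAP settings out of cinder's iSCSI connection_info."""
        data = connection_info['data']
        if data.get('auth_method') == 'CHAP':
            return data['auth_username'], data['auth_password']
        return None  # target has no authentication configured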
[Yahoo-eng-team] [Bug 1431201] [NEW] Kilo controller can't conduct Juno compute nodes
Public bug reported:

When I tried to use a Kilo controller to conduct Juno compute nodes, the Juno nova-compute service started with the following two errors:

1.

2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup     return self._update_available_resource(context, resources)
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup   File /usr/lib/python2.6/site-packages/nova/openstack/common/lockutils.py, line 272, in inner
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup     return f(*args, **kwargs)
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup   File /usr/lib64/python2.6/contextlib.py, line 34, in __exit__
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup     self.gen.throw(type, value, traceback)
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup   File /usr/lib/python2.6/site-packages/nova/openstack/common/lockutils.py, line 236, in lock
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup     yield int_lock
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup   File /usr/lib/python2.6/site-packages/nova/openstack/common/lockutils.py, line 272, in inner
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup     return f(*args, **kwargs)
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup   File /usr/lib/python2.6/site-packages/nova/compute/resource_tracker.py, line 377, in _update_available_resource
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup     self._sync_compute_node(context, resources)
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup   File /usr/lib/python2.6/site-packages/nova/compute/resource_tracker.py, line 388, in _sync_compute_node
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup     compute_node_refs = service['compute_node']
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup KeyError: 'compute_node'

We can revert this commit to fix this error:
https://github.com/openstack/nova/commit/83b64ceb871b1553b1bb1e0bb9270816db892552

2.

2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver   File /usr/lib/python2.6/site-packages/nova/rpc.py, line 111, in deserialize_entity
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver     return self._base.deserialize_entity(context, entity)
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver   File /usr/lib/python2.6/site-packages/nova/objects/base.py, line 649, in deserialize_entity
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver     entity = self._process_object(context, entity)
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver   File /usr/lib/python2.6/site-packages/nova/objects/base.py, line 615, in _process_object
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver     e.kwargs['supported'])
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver   File /usr/lib/python2.6/site-packages/nova/conductor/api.py, line 217, in object_backport
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver     return self._manager.object_backport(context, objinst, target_version)
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver   File /usr/lib/python2.6/site-packages/nova/conductor/rpcapi.py, line 358, in object_backport
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver     target_version=target_version)
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver   File /usr/lib/python2.6/site-packages/oslo/messaging/rpc/client.py, line 152, in call

We can revert this commit to fix this error:
https://github.com/openstack/nova/commit/f287b75138129542436b2085d52d6fe201ca7e14

Does anybody know whether there is something like a gatekeeper to make sure a Kilo controller can keep conducting the Juno compute nodes? Thanks!

** Affects: nova
     Importance: Undecided
         Status: New

Bug link: https://bugs.launchpad.net/bugs/1431201
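On the final question: nova's mechanism for mixed-version deployments is RPC version pinning; assuming Kilo's configuration syntax, the controller's nova.conf would carry something like the following (shown to illustrate the mechanism, not as a verified fix for the two tracebacks above):

    [upgrade_levels]
    # Make Kilo services emit RPC messages (and object versions) that
    # Juno nova-compute can still deserialize.
    compute = juno
    conductor = juno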
[Yahoo-eng-team] [Bug 1405044] [NEW] [GPFS] nova volume-attach of a GPFS volume produces an error log in nova-compute
Public bug reported:

When I attached a GPFS volume to an instance, the volume was attached successfully, but there were some error logs in the nova-compute log file as below:

2014-12-22 21:52:10.863 13396 ERROR nova.openstack.common.threadgroup [-] Unexpected error while running command.
Command: sudo nova-rootwrap /etc/nova/rootwrap.conf blockdev --getsize64 /gpfs/volume-98520c4e-935d-43d8-9c8d-00fcb54bb335
Exit code: 1
Stdout: u''
Stderr: u'BLKGETSIZE64: Inappropriate ioctl for device\n'
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup Traceback (most recent call last):
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File /usr/lib/python2.6/site-packages/nova/openstack/common/threadgroup.py, line 125, in wait
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup     x.wait()
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File /usr/lib/python2.6/site-packages/nova/openstack/common/threadgroup.py, line 47, in wait
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup     return self.thread.wait()
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File /usr/lib/python2.6/site-packages/eventlet/greenthread.py, line 173, in wait
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup     return self._exit_event.wait()
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File /usr/lib/python2.6/site-packages/eventlet/event.py, line 121, in wait
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup     return hubs.get_hub().switch()
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File /usr/lib/python2.6/site-packages/eventlet/hubs/hub.py, line 293, in switch
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup     return self.greenlet.switch()
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File /usr/lib/python2.6/site-packages/eventlet/greenthread.py, line 212, in main
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup     result = function(*args, **kwargs)
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File /usr/lib/python2.6/site-packages/nova/openstack/common/service.py, line 490, in run_service
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup     service.start()
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File /usr/lib/python2.6/site-packages/nova/service.py, line 181, in start
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup     self.manager.pre_start_hook()
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File /usr/lib/python2.6/site-packages/nova/compute/manager.py, line 1159, in pre_start_hook
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup     self.update_available_resource(nova.context.get_admin_context())
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File /usr/lib/python2.6/site-packages/nova/compute/manager.py, line 6037, in update_available_resource
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup     nodenames = set(self.driver.get_available_nodes())
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File /usr/lib/python2.6/site-packages/nova/virt/driver.py, line 1237, in get_available_nodes
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup     stats = self.get_host_stats(refresh=refresh)
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File /usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py, line 5794, in get_host_stats
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup     return self.host_state.get_host_stats(refresh=refresh)
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File /usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py, line 473, in host_state
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup     self._host_state = HostState(self)
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File /usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py, line 6360, in __init__
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup     self.update_status()
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File /usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py, line 6411, in update_status
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup     data['disk_available_least'] = _get_disk_available_least()
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File /usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py, line 6384, in _get_disk_available_least
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup
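The "Inappropriate ioctl for device" happens because blockdev --getsize64 is being run against a GPFS-backed volume that is a regular file, not a block device; a minimal sketch of a size probe that handles both cases (illustrative, not the actual nova fix):

    import os
    import stat
    import subprocess

    def volume_size_bytes(path):
        """Size of a volume that may be a block device or a plain file."""
        mode = os.stat(path).st_mode
        if stat.S_ISBLK(mode):
            # Real block device: BLKGETSIZE64 via blockdev works here
            # (nova would route this through rootwrap, as in the log).
            out = subprocess.check_output(['blockdev', '--getsize64', path])
            return int(out.strip())
        # File-backed volume (the GPFS case): blockdev fails with
        # "Inappropriate ioctl for device"; st_size is the answer.
        return os.stat(path).st_size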
[Yahoo-eng-team] [Bug 1383100] [NEW] VMware: attaching an iSCSI volume to an instance failed
Public bug reported:

When I try to attach an iSCSI volume, created by the cinder LVM iSCSI driver, to an instance, I discover the following two problems:

1. In the current code base, when attaching an iSCSI volume, the adapter type is chosen the same way as for the attachment of a VMDK volume:

    def _attach_volume_iscsi(self, connection_info, instance, mountpoint):
        ..
        (vmdk_file_path, adapter_type,
         disk_type) = vm_util.get_vmdk_path_and_adapter_type(hardware_devices)
        self.attach_disk_to_vm(vm_ref, instance, adapter_type, 'rdmp',
                               device_name=device_name)

Indeed, the adapter type should always be lsiLogicsas. As it stands, it is easy to end up in the odd scenario where an iSCSI volume is attached to an IDE adapter.

2. The current code always rescans the first host's HBA in a cluster. E.g. you have two hosts, host01 and host02, in a vCenter cluster, and you want to attach an iSCSI volume to an instance spawned on host02. The attach code should rescan host02's HBA and discover the target, but in fact the code always rescans host01's HBA:

    def _iscsi_rescan_hba(self, target_portal):
        """Rescan the iSCSI HBA to discover iSCSI targets."""
        host_mor = vm_util.get_host_ref(self._session, self._cluster)

The host_mor always represents the first host.

The following error may be produced:

2014-10-20 10:50:07.917 21540 ERROR oslo.messaging.rpc.dispatcher [req-bdf00be9-194f-474d-a61b-5c998c36bdea ] Exception during message handling: The virtual disk is either corrupted or not a supported format.
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher Traceback (most recent call last):
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher   File /usr/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py, line 134, in _dispatch_and_reply
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher     incoming.message))
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher   File /usr/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py, line 177, in _dispatch
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher     return self._do_dispatch(endpoint, method, ctxt, args)
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher   File /usr/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py, line 123, in _do_dispatch
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher     result = getattr(endpoint, method)(ctxt, **new_args)
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher   File /usr/lib/python2.6/site-packages/nova/compute/manager.py, line
10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher   File /usr/lib/python2.6/site-packages/nova/virt/vmwareapi/driver.py, line 676, in _wait_for_task
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher     return self.wait_for_task(task_ref)
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher   File /usr/lib/python2.6/site-packages/oslo/vmware/api.py, line 382, in wait_for_task
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher     return evt.wait()
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher   File /usr/lib/python2.6/site-packages/eventlet/event.py, line 121, in wait
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher     return hubs.get_hub().switch()
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher   File /usr/lib/python2.6/site-packages/eventlet/hubs/hub.py, line 293, in switch
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher     return self.greenlet.switch()
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher   File /usr/lib/python2.6/site-packages/oslo/vmware/common/loopingcall.py, line 76, in _inner
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher     self.f(*self.args, **self.kw)
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher   File /usr/lib/python2.6/site-packages/oslo/vmware/api.py, line 423, in _poll_task
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher     raise task_ex
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher VMwareDriverException: The virtual disk is either corrupted or not a supported format.
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher

** Affects: nova
     Importance: Undecided
       Assignee: Lan Qi song (lqslan)
         Status: New

** Tags: nova vmware

** Changed in: nova
     Assignee: (unassigned) => Lan Qi song (lqslan)

Bug link: https://bugs.launchpad.net/bugs/1383100
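A sketch showing just the two lines that would change for the problems above (context elided; get_host_ref_for_vm is a hypothetical helper standing in for "resolve the host that actually runs this instance", and the literal adapter string follows this report's suggestion):

    # Problem 1, in _attach_volume_iscsi: do not reuse the VMDK's adapter
    # type; iSCSI RDM disks should always land on a SAS adapter:
    adapter_type = 'lsiLogicsas'
    self.attach_disk_to_vm(vm_ref, instance, adapter_type, 'rdmp',
                           device_name=device_name)

    # Problem 2, in _iscsi_rescan_hba: rescan the HBA of the host that
    # actually runs the instance, instead of resolving self._cluster to
    # its first host:
    host_mor = vm_util.get_host_ref_for_vm(self._session, instance)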