[Yahoo-eng-team] [Bug 1511311] [NEW] L3 agent failed to respawn keepalived process

2015-10-29 Thread Lan Qi song
Public bug reported:

I enabled L3 HA in the Neutron configuration, and I often see the following
log entries in l3_agent.log:

2015-10-14 22:30:16.397 21460 ERROR neutron.agent.linux.external_process [-] 
default-service for router with uuid 59de181e-8f02-470d-80f6-cb9f0d46f78b not 
found. The process should not have died
2015-10-14 22:30:16.397 21460 ERROR neutron.agent.linux.external_process [-] 
respawning keepalived for uuid 59de181e-8f02-470d-80f6-cb9f0d46f78b
2015-10-14 22:30:16.397 21460 DEBUG neutron.agent.linux.utils [-] Unable to 
access /var/lib/neutron/ha_confs/59de181e-8f02-470d-80f6-cb9f0d46f78b.pid 
get_value_from_file 
/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:222
2015-10-14 22:30:16.398 21460 DEBUG neutron.agent.linux.utils [-] Running 
command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 
'ip', 'netns', 'exec', 'qrouter-59de181e-8f02-470d-80f6-cb9f0d46f78b', 
'keepalived', '-P', '-f', 
'/var/lib/neutron/ha_confs/59de181e-8f02-470d-80f6-cb9f0d46f78b/keepalived.conf',
 '-p', '/var/lib/neutron/ha_confs/59de181e-8f02-470d-80f6-cb9f0d46f78b.pid',  
'-r', 
'/var/lib/neutron/ha_confs/59de181e-8f02-470d-80f6-cb9f0d46f78b.pid-vrrp'] 
create_process /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:84 
 

I also noticed that the count of VRRP pid files was usually larger than
the count of pid files:

root@neutron2:~# ls /var/lib/neutron/ha_confs/ | grep pid | grep -v vrrp | wc -l
664
root@neutron2:~# ls /var/lib/neutron/ha_confs/ | grep vrrp | wc -l
677

It also seems that if the "pid-vrrp" file still exists, we can't successfully
respawn the keepalived process with a command like this:
keepalived -P -f 
/var/lib/neutron/ha_confs/cb01b1de-fa6c-461e-ba39-4d506dfdfccb/keepalived.conf 
-p /var/lib/neutron/ha_confs/cb01b1de-fa6c-461e-ba39-4d506dfdfccb.pid -r 
/var/lib/neutron/ha_confs/cb01b1de-fa6c-461e-ba39-4d506dfdfccb.pid-vrrp

So in Neutron, after we have checked that the pid is not active, could we
check for the existence of the "pid" file and the "vrrp pid" file and remove
them before respawning the keepalived process, to make sure the process can
be started successfully?

https://github.com/openstack/neutron/blob/master/neutron/agent/linux/external_process.py#L91-L92
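
A minimal sketch of the proposed cleanup (the helper below is hypothetical
and not part of external_process.py; it only assumes the pid-file layout
shown in the logs above):

import os

def remove_stale_keepalived_pid_files(pid_file):
    """Remove leftover pid files so that a keepalived respawn can succeed."""
    # keepalived writes both <pid_file> and <pid_file>-vrrp; per this report,
    # a stale -vrrp file left behind can keep the daemon from restarting.
    for path in (pid_file, pid_file + '-vrrp'):
        if os.path.isfile(path):
            os.remove(path)

# e.g.:
# remove_stale_keepalived_pid_files(
#     '/var/lib/neutron/ha_confs/59de181e-8f02-470d-80f6-cb9f0d46f78b.pid')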

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1511311

Title:
  L3 agent failed to respawn keepalived process

Status in neutron:
  New

Bug description:
  I enabled L3 HA in the Neutron configuration, and I often see the
  following log entries in l3_agent.log:

  2015-10-14 22:30:16.397 21460 ERROR neutron.agent.linux.external_process [-] 
default-service for router with uuid 59de181e-8f02-470d-80f6-cb9f0d46f78b not 
found. The process should not have died
  2015-10-14 22:30:16.397 21460 ERROR neutron.agent.linux.external_process [-] 
respawning keepalived for uuid 59de181e-8f02-470d-80f6-cb9f0d46f78b
  2015-10-14 22:30:16.397 21460 DEBUG neutron.agent.linux.utils [-] Unable to 
access /var/lib/neutron/ha_confs/59de181e-8f02-470d-80f6-cb9f0d46f78b.pid 
get_value_from_file 
/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:222
  2015-10-14 22:30:16.398 21460 DEBUG neutron.agent.linux.utils [-] Running 
command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 
'ip', 'netns', 'exec', 'qrouter-59de181e-8f02-470d-80f6-cb9f0d46f78b', 
'keepalived', '-P', '-f', 
'/var/lib/neutron/ha_confs/59de181e-8f02-470d-80f6-cb9f0d46f78b/keepalived.conf',
 '-p', '/var/lib/neutron/ha_confs/59de181e-8f02-470d-80f6-cb9f0d46f78b.pid',  
'-r', 
'/var/lib/neutron/ha_confs/59de181e-8f02-470d-80f6-cb9f0d46f78b.pid-vrrp'] 
create_process /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:84 
 

  I also noticed that the count of VRRP pid files was usually larger
  than the count of pid files:

  root@neutron2:~# ls /var/lib/neutron/ha_confs/ | grep pid | grep -v vrrp | wc 
-l
  664
  root@neutron2:~# ls /var/lib/neutron/ha_confs/ | grep vrrp | wc -l
  677

  It also seems that if the "pid-vrrp" file still exists, we can't
  successfully respawn the keepalived process with a command like this:
  keepalived -P -f 
/var/lib/neutron/ha_confs/cb01b1de-fa6c-461e-ba39-4d506dfdfccb/keepalived.conf 
-p /var/lib/neutron/ha_confs/cb01b1de-fa6c-461e-ba39-4d506dfdfccb.pid -r 
/var/lib/neutron/ha_confs/cb01b1de-fa6c-461e-ba39-4d506dfdfccb.pid-vrrp

  So in Neutron, after we have checked that the pid is not active, could
  we check for the existence of the "pid" file and the "vrrp pid" file and
  remove them before respawning the keepalived process, to make sure the
  process can be started successfully?

  
https://github.com/openstack/neutron/blob/master/neutron/agent/linux/external_process.py#L91-L92

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1511311/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp

[Yahoo-eng-team] [Bug 1510796] [NEW] Function sync_routers always calls "get_dvr_sync_data" in HA but not DVR scenarios

2015-10-28 Thread Lan Qi song
Public bug reported:

The configuration 
neutron.conf:
[DEFAULT]

l3_ha = True
service_plugins = router


l3_agent.ini:

[DEFAULT]
...
agent_mode = legacy
...

The current code decides whether to call "get_dvr_sync_data" based on whether
the plugin supports DVR; it would be better to check the "agent mode" here:
https://github.com/openstack/neutron/blob/master/neutron/db/l3_hamode_db.py#L535-L536

Calling the "get_sync_data" method here instead would save time.
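
A rough sketch of the idea (the helper and the argument lists below are
placeholders, not the actual l3_hamode_db signatures):

# Sketch only: pick the sync method based on the requesting agent's mode
# instead of on whether the plugin supports DVR.
def _choose_sync_data(self, context, agent, router_ids):
    if is_dvr_agent(agent):  # hypothetical helper that checks agent_mode
        return self.get_dvr_sync_data(context, agent.host, agent,
                                      router_ids=router_ids)
    # legacy / HA-only agents do not need the extra DVR processing
    return self.get_sync_data(context, router_ids=router_ids)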

** Affects: neutron
 Importance: Undecided
 Assignee: ZongKai LI (lzklibj)
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1510796

Title:
  Function sync_routers always calls "get_dvr_sync_data" in HA but not
  DVR scenarios

Status in neutron:
  New

Bug description:
  The configuration 
  neutron.conf:
  [DEFAULT]
  
  l3_ha = True
  service_plugins = router
  

  l3_agent.ini:

  [DEFAULT]
  ...
  agent_mode = legacy
  ...

  The current code decides whether to call "get_dvr_sync_data" based on
  whether the plugin supports DVR; it would be better to check the "agent
  mode" here:
  
https://github.com/openstack/neutron/blob/master/neutron/db/l3_hamode_db.py#L535-L536

  Calling the "get_sync_data" method here instead would save time.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1510796/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1510399] [NEW] Error when metadata proxy is disabled for ha l3 agent

2015-10-27 Thread Lan Qi song
Public bug reported:

The configuration:

/etc/neutron/neutron.conf
[DEFAULT]

l3_ha = True


/etc/neutron/l3_agent.ini
[DEFAULT]

enable_metadata_proxy = False


There was an error in l3_agent's log:

2015-10-27 01:18:46.833 5494 TRACE neutron.agent.l3.agent 
self.metadata_driver.destroy_monitored_metadata_proxy(
2015-10-27 01:18:46.833 5494 TRACE neutron.agent.l3.agent AttributeError: 
'L3NATAgentWithStateReport' object has no attribute 'metadata_driver'

I think maybe we need a check before this call:

https://github.com/openstack/neutron/blob/master/neutron/agent/l3/ha.py#L126
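
As a standalone illustration of the kind of guard meant here (the attribute
names and the destroy_monitored_metadata_proxy argument list are assumptions
taken loosely from the traceback and the linked ha.py, not verified against
the current tree):

def maybe_destroy_metadata_proxy(agent, router_id):
    """Tear down the metadata proxy only if it could have been started."""
    if not agent.conf.enable_metadata_proxy:
        # the proxy was never spawned, so there is nothing to destroy and
        # 'metadata_driver' may not even exist on the agent object
        return
    agent.metadata_driver.destroy_monitored_metadata_proxy(
        agent.process_monitor, router_id, agent.conf)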

** Affects: neutron
 Importance: High
 Assignee: Hong Hui Xiao (xiaohhui)
 Status: Confirmed


** Tags: l3-ha liberty-backport-potential

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1510399

Title:
  Error when metadata proxy is disabled for ha l3 agent

Status in neutron:
  Confirmed

Bug description:
  The configuration:

  /etc/neutron/neutron.conf
  [DEFAULT]
  
  l3_ha = True
  

  /etc/neutron/l3_agent.ini
  [DEFAULT]
  
  enable_metadata_proxy = False
  

  There was an error in l3_agent's log:

  2015-10-27 01:18:46.833 5494 TRACE neutron.agent.l3.agent 
self.metadata_driver.destroy_monitored_metadata_proxy(
  2015-10-27 01:18:46.833 5494 TRACE neutron.agent.l3.agent AttributeError: 
'L3NATAgentWithStateReport' object has no attribute 'metadata_driver'

  I think maybe we need a check before this call:

  https://github.com/openstack/neutron/blob/master/neutron/agent/l3/ha.py#L126

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1510399/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1510415] [NEW] Linuxbridge agent failed to create some bridges after the OS was rebooted

2015-10-27 Thread Lan Qi song
Public bug reported:

When I rebooted the operating system hosting the Linuxbridge agent, the
Linuxbridge agent tried to recreate all the bridges according to the updated
tap devices.

But I found that some bridges were not created, and I saw these odd log
messages:

In l3-agent.log: the tap device appears to have been created at 2015-10-27
00:53:15

2015-10-27 00:53:15.002 5135 DEBUG neutron.agent.linux.utils [-] Running
command: ['sudo', '/usr/bin/neutron-rootwrap',
'/etc/neutron/rootwrap.conf', 'ip', 'link', 'add', 'tap5f17438b-6e',
'type', 'veth', 'peer', 'name', 'qr-5f17438b-6e', 'netns', 'qrouter-
f3f17112-aadc-4649-9662-6bf25cad569d'] create_process /usr/lib/python2.7
/dist-packages/neutron/agent/linux/utils.py:84

But in linuxbridge-agent.log: the Linuxbridge agent thinks the tap device
has not been created yet and gives up creating the related bridge at
2015-10-27 00:53:27

2015-10-27 00:53:27.774 5088 DEBUG 
neutron.plugins.linuxbridge.agent.linuxbridge_neutron_agent 
[req-c156ba1e-aafb-4e4e-ae15-ba9384bb1673 ] Port tap5f17438b-6e added 
treat_devices_added_updated 
/usr/lib/python2.7/dist-packages/neutron/plugins/linuxbridge/agent/linuxbridge_neutron_agent.py:865
2015-10-27 00:53:27.774 5088 INFO 
neutron.plugins.linuxbridge.agent.linuxbridge_neutron_agent 
[req-c156ba1e-aafb-4e4e-ae15-ba9384bb1673] Device tap5f17438b-6e not defined on 
plugin

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1510415

Title:
  Linuxbridge agent failed to create some bridges after the OS was
  rebooted

Status in neutron:
  New

Bug description:
  When I rebooted the operating system hosting the Linuxbridge agent,
  the Linuxbridge agent tried to recreate all the bridges according to
  the updated tap devices.

  But I found that some bridges were not created, and I saw these odd
  log messages:

  In l3-agent.log: the tap device appears to have been created at
  2015-10-27 00:53:15

  2015-10-27 00:53:15.002 5135 DEBUG neutron.agent.linux.utils [-]
  Running command: ['sudo', '/usr/bin/neutron-rootwrap',
  '/etc/neutron/rootwrap.conf', 'ip', 'link', 'add', 'tap5f17438b-6e',
  'type', 'veth', 'peer', 'name', 'qr-5f17438b-6e', 'netns', 'qrouter-
  f3f17112-aadc-4649-9662-6bf25cad569d'] create_process
  /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:84

  But in linuxbridge-agent.log: the Linuxbridge agent thinks the tap
  device has not been created yet and gives up creating the related bridge
  at 2015-10-27 00:53:27

  2015-10-27 00:53:27.774 5088 DEBUG 
neutron.plugins.linuxbridge.agent.linuxbridge_neutron_agent 
[req-c156ba1e-aafb-4e4e-ae15-ba9384bb1673 ] Port tap5f17438b-6e added 
treat_devices_added_updated 
/usr/lib/python2.7/dist-packages/neutron/plugins/linuxbridge/agent/linuxbridge_neutron_agent.py:865
  2015-10-27 00:53:27.774 5088 INFO 
neutron.plugins.linuxbridge.agent.linuxbridge_neutron_agent 
[req-c156ba1e-aafb-4e4e-ae15-ba9384bb1673] Device tap5f17438b-6e not defined on 
plugin

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1510415/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1508789] [NEW] Linux bridge agent "get_bridge_for_tap_device" can be optimized

2015-10-21 Thread Lan Qi song
Public bug reported:

Currently, when the Linuxbridge agent needs to find the bridge for a tap
device, it iterates over all the bridges and all the tap devices on each
bridge to check which bridge the tap device is bound to. That takes too much
time. Code:

https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py#L210-L217

A better way is to check from the tap device itself. If a tap device is
bound to a bridge, there should be a symlink named "master" under the tap
device's folder "/sys/class/net/tapxxx" that points to the bridge. So we can
check BRIDGE_FS to get the related bridge.

One prototype of this change:

def get_bridge_for_tap_device(self, tap_device_name):
    # assumes BRIDGE_FS = "/sys/class/net/" and that 'os' is imported
    tap_master = BRIDGE_FS + tap_device_name + '/master'
    if os.path.islink(tap_master):
        # the symlink target contains the bridge device name
        return os.readlink(tap_master).split('/')[1]
    return None

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1508789

Title:
  Linux bridge agent "get_bridge_for_tap_device" can be optimized

Status in neutron:
  New

Bug description:
  Currently, when the Linuxbridge agent needs to find the bridge for a
  tap device, it iterates over all the bridges and all the tap devices on
  each bridge to check which bridge the tap device is bound to. That takes
  too much time. Code:

  
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py#L210-L217

  A better way is to check from the tap device itself. If a tap device
  is bound to a bridge, there should be a symlink named "master" under the
  tap device's folder "/sys/class/net/tapxxx" that points to the bridge. So
  we can check BRIDGE_FS to get the related bridge.

  One prototype of this change:

  def get_bridge_for_tap_device(self, tap_device_name):
      # assumes BRIDGE_FS = "/sys/class/net/" and that 'os' is imported
      tap_master = BRIDGE_FS + tap_device_name + '/master'
      if os.path.islink(tap_master):
          # the symlink target contains the bridge device name
          return os.readlink(tap_master).split('/')[1]
      return None

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1508789/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1508270] [NEW] One log level for the Linux bridge agent is not right

2015-10-20 Thread Lan Qi song
Public bug reported:

This log message: 
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py#L975-L977

would be better at the "WARNING" or "ERROR" level; currently it is just
"DEBUG".

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1508270

Title:
  One log level for the Linux bridge agent is not right

Status in neutron:
  New

Bug description:
  This log message: 
  
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py#L975-L977

  would be better at the "WARNING" or "ERROR" level; currently it is
  just "DEBUG".

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1508270/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1499177] [NEW] Performance: L2 agent takes too much time to refresh sg rules

2015-09-24 Thread Lan Qi song
Public bug reported:

This issue describes a performance problem in the L2 agents (both
LinuxBridge and OVS) on a compute node when that node hosts many networks
and instances (e.g. 500 instances).

The performance problem shows up in two ways:

1. When the LinuxBridge agent service starts up (this seems to happen only
for the LinuxBridge agent, not the OVS agent), I found two methods that take
too much time:

   1.1 get_interface_by_ip(): we need to find the interface that is assigned
the "local ip" defined in the configuration file, and check whether that
interface supports "vxlan". This method iterates over every interface on the
compute node and runs "ip link show [interface] to [local ip]" to judge the
result. I think there should be a faster way (see the sketch after this
list).

   1.2 prepare_port_filter(): in this method we have to make sure the ipsets
are created correctly, but it executes too many "ipset" commands and takes
too much time.

2. When devices' security group (sg) rules change, the L2 agent has to
refresh the firewalls.

2.1 refresh_firewall(): this method calls "modify_rules" to make the rules
predictable, but it also takes too much time.

It would be very beneficial for large-scale deployments if this performance
problem could be fixed or optimized.
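
As a rough illustration of the faster lookup hinted at in 1.1 (a sketch only,
not the agent's implementation; it assumes iproute2 accepts an address filter
via "ip -o addr show to <ip>"):

import subprocess

def get_interface_by_ip(local_ip):
    """Ask the kernel directly which device carries local_ip."""
    # one "ip" invocation instead of one "ip link show ... to ..." call per
    # device; -o prints one line per match, e.g. "2: eth1  inet 10.1.1.5/24"
    output = subprocess.check_output(
        ['ip', '-o', 'addr', 'show', 'to', local_ip])
    lines = output.decode().splitlines()
    if lines:
        return lines[0].split()[1]
    return None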

** Affects: neutron
 Importance: Undecided
 Status: New

** Description changed:

  This issue is introducing a performance problem for the L2 agent
  including LinuxBridge and OVS agent in Compute node when there are lots
  of networks and instances in this Compute node (eg. 500 instances)
  
  The performance problem reflect in two aspects:
  
  1. When LinuxBridge agent service starts up(this seems only happened in
  LinuxBridge agent not for the OVS agent), I found there were two methods
  take too much time:
  
-1.1 get_interface_by_ip(),  we should find the interface which was
+    1.1 get_interface_by_ip(),  we should find the interface which was
  assigned with the "local ip" defined in configuration file, and to check
  whether this interface support "vxlan" or not.  This method will iterate
  all the interface in this compute node and execute "ip link show
  [interface] to [local  ip]" to judge the result.  I think that should be
  a faster way.
  
-1.2 prepare_port_filter() ,  in this method ,  we should make sure
+    1.2 prepare_port_filter() ,  in this method ,  we should make sure
  the ipset are create correctly. But this method will execute too much
  "ipset" commands and take too much time.
  
+ 2. When devices' sg rules are changed,  L2 agent should refresh the
+ firewalls.
  
- 2. When devices' sg rules are changed,  L2 agent should refresh the firewalls.
- 
- 2.1 refresh_firewall() this method will call "modify_rules" to make
+ 2.1 refresh_firewall() this method will call "modify_rules" to make
  the rules predicable, but this method also takes too much time.
  
  It will be very benefit for the large scales of networks if this
  performance problem can be fix or optimize.
- 
- 
- If this kind of performance problem

** Description changed:

  This issue is introducing a performance problem for the L2 agent
  including LinuxBridge and OVS agent in Compute node when there are lots
  of networks and instances in this Compute node (eg. 500 instances)
  
  The performance problem reflect in two aspects:
  
  1. When LinuxBridge agent service starts up(this seems only happened in
  LinuxBridge agent not for the OVS agent), I found there were two methods
  take too much time:
  
     1.1 get_interface_by_ip(),  we should find the interface which was
  assigned with the "local ip" defined in configuration file, and to check
  whether this interface support "vxlan" or not.  This method will iterate
  all the interface in this compute node and execute "ip link show
- [interface] to [local  ip]" to judge the result.  I think that should be
- a faster way.
+ [interface] to [local  ip]" to judge the result.  I think there should
+ be a faster way.
  
     1.2 prepare_port_filter() ,  in this method ,  we should make sure
  the ipset are create correctly. But this method will execute too much
  "ipset" commands and take too much time.
  
  2. When devices' sg rules are changed,  L2 agent should refresh the
  firewalls.
  
  2.1 refresh_firewall() this method will call "modify_rules" to make
  the rules predicable, but this method also takes too much time.
  
  It will be very benefit for the large scales of networks if this
  performance problem can be fix or optimize.

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1499177

Title:
  Performance: L2 agent takes too much time to refresh sg rules

Status in neutron:
  New

Bug description:
  This issue is introducing a performance problem for the L2 agent
  including LinuxBridge and OVS agent in Compute node when 

[Yahoo-eng-team] [Bug 1447884] [NEW] Boot from volume, block device allocation timeout causes VM error, but volume becomes available later

2015-04-23 Thread Lan Qi song
Public bug reported:

When we try to boot multiple instances from volume (with a large image
source) at the same time, we usually get a block device allocation error,
as shown in nova-compute.log:

2015-03-30 23:22:46.920 6445 WARNING nova.compute.manager [-] Volume id: 
551ea616-e1c4-4ef2-9bf3-b0ca6d4474dc finished being created but was not set as 
'available'
2015-03-30 23:22:47.131 6445 ERROR nova.compute.manager [-] [instance: 
483472b2-61b3-4574-95e2-8cd0304f90f6] Instance failed block device setup
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 
483472b2-61b3-4574-95e2-8cd0304f90f6] Traceback (most recent call last):
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 
483472b2-61b3-4574-95e2-8cd0304f90f6]   File 
/usr/lib/python2.6/site-packages/nova/compute/manager.py, line 1829, in 
_prep_block_device
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 
483472b2-61b3-4574-95e2-8cd0304f90f6] do_check_attach=do_check_attach) +
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 
483472b2-61b3-4574-95e2-8cd0304f90f6]   File 
/usr/lib/python2.6/site-packages/nova/virt/block_device.py, line 407, in 
attach_block_devices
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 
483472b2-61b3-4574-95e2-8cd0304f90f6] map(_log_and_attach, 
block_device_mapping)
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 
483472b2-61b3-4574-95e2-8cd0304f90f6]   File 
/usr/lib/python2.6/site-packages/nova/virt/block_device.py, line 405, in 
_log_and_attach
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 
483472b2-61b3-4574-95e2-8cd0304f90f6] bdm.attach(*attach_args, 
**attach_kwargs)
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 
483472b2-61b3-4574-95e2-8cd0304f90f6]   File 
/usr/lib/python2.6/site-packages/nova/virt/block_device.py, line 339, in 
attach
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 
483472b2-61b3-4574-95e2-8cd0304f90f6] do_check_attach=do_check_attach)
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 
483472b2-61b3-4574-95e2-8cd0304f90f6]   File 
/usr/lib/python2.6/site-packages/nova/virt/block_device.py, line 46, in 
wrapped
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 
483472b2-61b3-4574-95e2-8cd0304f90f6] ret_val = method(obj, context, *args, 
**kwargs)
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 
483472b2-61b3-4574-95e2-8cd0304f90f6]   File 
/usr/lib/python2.6/site-packages/nova/virt/block_device.py, line 229, in 
attach
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 
483472b2-61b3-4574-95e2-8cd0304f90f6] volume_api.check_attach(context, 
volume, instance=instance)
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 
483472b2-61b3-4574-95e2-8cd0304f90f6]   File 
/usr/lib/python2.6/site-packages/nova/volume/cinder.py, line 305, in 
check_attach
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 
483472b2-61b3-4574-95e2-8cd0304f90f6] raise 
exception.InvalidVolume(reason=msg)
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 
483472b2-61b3-4574-95e2-8cd0304f90f6] InvalidVolume: Invalid volume: status 
must be 'available'
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 
483472b2-61b3-4574-95e2-8cd0304f90f6]

This error leaves the VM in error status:
+--------------------------------------+--------+--------+----------------------+-------------+----------+
| ID                                   | Name   | Status | Task State           | Power State | Networks |
+--------------------------------------+--------+--------+----------------------+-------------+----------+
| 1fa2d7aa-8bd9-4a22-8538-0a07d9dae8aa | inst02 | ERROR  | block_device_mapping | NOSTATE     |          |
+--------------------------------------+--------+--------+----------------------+-------------+----------+
But the volume was in available status:
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| ID                                   | Status    | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| a9ab2dc2-b117-44ef-8678-f71067a9e770 | available | None | 2    | None        | true     |             |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+


And when we terminate this VM, the volume will still exist, since there is
no volume attachment info stored for this VM.
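
A minimal sketch of the kind of bounded wait that would avoid this race
(standalone illustration, not the Nova code; it assumes a cinder API wrapper
whose get(context, volume_id) returns a dict with a 'status' key, similar in
spirit to nova.volume.cinder.API):

import time

def wait_for_volume_available(volume_api, context, volume_id,
                              max_retries=60, interval=3):
    """Poll until the volume is 'available' instead of failing right away."""
    for _ in range(max_retries):
        volume = volume_api.get(context, volume_id)
        if volume['status'] == 'available':
            return volume
        if volume['status'] == 'error':
            raise RuntimeError('volume %s entered error state' % volume_id)
        time.sleep(interval)
    raise RuntimeError('timed out waiting for volume %s' % volume_id)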

This can be easily reproduced:
1. add the following options  to nova.conf  in compute node ( make sure the 
error 

[Yahoo-eng-team] [Bug 1437993] [NEW] VMware: attaching an iSCSI volume doesn't consider the authentication condition

2015-03-29 Thread Lan Qi song
Public bug reported:

When we use lioadm as the cinder iscsi_helper, it enables authentication
for each volume at this line:

https://github.com/openstack/cinder/blob/master/cinder/cmd/rtstool.py#L56

But currently the VMware driver uses Dynamic Discovery to discover the
iSCSI target, which won't add the CHAP info to the target. Then, when the
host tries to connect to the target, an "Unable to connect to the iSCSI
target" error occurs.

I think we shouldn't use Dynamic Discovery in this case; we should
dynamically create an iSCSI target with the CHAP info.

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1437993

Title:
  VMware: attaching an iSCSI volume doesn't consider the authentication
  condition

Status in OpenStack Compute (Nova):
  New

Bug description:
  When we use lioadm as the cinder iscsi_helper, it enables
  authentication for each volume at this line:

  https://github.com/openstack/cinder/blob/master/cinder/cmd/rtstool.py#L56

  But currently the VMware driver uses Dynamic Discovery to discover the
  iSCSI target, which won't add the CHAP info to the target. Then, when
  the host tries to connect to the target, an "Unable to connect to the
  iSCSI target" error occurs.

  I think we shouldn't use Dynamic Discovery in this case; we should
  dynamically create an iSCSI target with the CHAP info.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1437993/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1431201] [NEW] Kilo controller can't manage Juno compute nodes

2015-03-12 Thread Lan Qi song
Public bug reported:

When I tried to use a Kilo controller to manage Juno compute nodes, the
Juno nova-compute service started with the following two errors:

1. 2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup 
return self._update_available_resource(context, resources)
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup   File 
/usr/lib/python2.6/site-packages/nova/openstack/common/lockutils.py, line 
272, in inner
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup 
return f(*args, **kwargs)
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup   File 
/usr/lib64/python2.6/contextlib.py, line 34, in __exit__
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup 
self.gen.throw(type, value, traceback)
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup   File 
/usr/lib/python2.6/site-packages/nova/openstack/common/lockutils.py, line 
236, in lock
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup yield 
int_lock
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup   File 
/usr/lib/python2.6/site-packages/nova/openstack/common/lockutils.py, line 
272, in inner
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup 
return f(*args, **kwargs)
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup   File 
/usr/lib/python2.6/site-packages/nova/compute/resource_tracker.py, line 377, 
in _update_available_resource
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup 
self._sync_compute_node(context, resources)
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup   File 
/usr/lib/python2.6/site-packages/nova/compute/resource_tracker.py, line 388, 
in _sync_compute_node
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup 
compute_node_refs = service['compute_node']
2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup KeyError: 
'compute_node'


We can revert this commit to fix this error: 
https://github.com/openstack/nova/commit/83b64ceb871b1553b1bb1e0bb9270816db892552

2.  2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver   File 
/usr/lib/python2.6/site-packages/nova/rpc.py, line 111, in deserialize_entity
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver return 
self._base.deserialize_entity(context, entity)
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver   File 
/usr/lib/python2.6/site-packages/nova/objects/base.py, line 649, in 
deserialize_entity
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver entity = 
self._process_object(context, entity)
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver   File 
/usr/lib/python2.6/site-packages/nova/objects/base.py, line 615, in 
_process_object
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver 
e.kwargs['supported'])
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver   File 
/usr/lib/python2.6/site-packages/nova/conductor/api.py, line 217, in 
object_backport
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver return 
self._manager.object_backport(context, objinst, target_version)
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver   File 
/usr/lib/python2.6/site-packages/nova/conductor/rpcapi.py, line 358, in 
object_backport
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver 
target_version=target_version)
2015-03-10 06:41:29.388 19336 TRACE nova.virt.libvirt.driver   File 
/usr/lib/python2.6/site-packages/oslo/messaging/rpc/client.py, line 152, in 
call

We can revert this commit to fix this error:
https://github.com/openstack/nova/commit/f287b75138129542436b2085d52d6fe201ca7e14


Does anybody know whether there is something like a gatekeeper to make sure
a Kilo controller can keep managing Juno compute nodes? Thanks!

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1431201

Title:
  Kilo controller can't manage Juno compute nodes

Status in OpenStack Compute (Nova):
  New

Bug description:
  When I tried to use a Kilo controller to manage Juno compute nodes,
  the Juno nova-compute service started with the following two errors:

  1. 2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup 
return self._update_available_resource(context, resources)
  2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup   File 
/usr/lib/python2.6/site-packages/nova/openstack/common/lockutils.py, line 
272, in inner
  2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup 
return f(*args, **kwargs)
  2015-03-10 06:37:18.525 18900 TRACE nova.openstack.common.threadgroup   File 
/usr/lib64/python2.6/contextlib.py, line 34, in __exit__
  2015-03-10 06:37:18.525 18900 

[Yahoo-eng-team] [Bug 1405044] [NEW] [GPFS] nova volume-attach of a GPFS volume logs an error in nova-compute

2014-12-22 Thread Lan Qi song
Public bug reported:

When I attached a GPFS volume to an instance, the volume was attached
successfully, but there were some error logs in the nova-compute log file,
as below:

2014-12-22 21:52:10.863 13396 ERROR nova.openstack.common.threadgroup [-] 
Unexpected error while running command.
Command: sudo nova-rootwrap /etc/nova/rootwrap.conf blockdev --getsize64 
/gpfs/volume-98520c4e-935d-43d8-9c8d-00fcb54bb335
Exit code: 1
Stdout: u''
Stderr: u'BLKGETSIZE64: Inappropriate ioctl for device\n'
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup Traceback 
(most recent call last):
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File 
/usr/lib/python2.6/site-packages/nova/openstack/common/threadgroup.py, line 
125, in wait
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup 
x.wait()
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File 
/usr/lib/python2.6/site-packages/nova/openstack/common/threadgroup.py, line 
47, in wait
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup 
return self.thread.wait()
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File 
/usr/lib/python2.6/site-packages/eventlet/greenthread.py, line 173, in wait
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup 
return self._exit_event.wait()
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File 
/usr/lib/python2.6/site-packages/eventlet/event.py, line 121, in wait
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup 
return hubs.get_hub().switch()
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File 
/usr/lib/python2.6/site-packages/eventlet/hubs/hub.py, line 293, in switch
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup 
return self.greenlet.switch()
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File 
/usr/lib/python2.6/site-packages/eventlet/greenthread.py, line 212, in main
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup 
result = function(*args, **kwargs)
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File 
/usr/lib/python2.6/site-packages/nova/openstack/common/service.py, line 490, 
in run_service
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup 
service.start()
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File 
/usr/lib/python2.6/site-packages/nova/service.py, line 181, in start
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup 
self.manager.pre_start_hook()
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File 
/usr/lib/python2.6/site-packages/nova/compute/manager.py, line 1159, in 
pre_start_hook
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup 
self.update_available_resource(nova.context.get_admin_context())
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File 
/usr/lib/python2.6/site-packages/nova/compute/manager.py, line 6037, in 
update_available_resource
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup 
nodenames = set(self.driver.get_available_nodes())
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File 
/usr/lib/python2.6/site-packages/nova/virt/driver.py, line 1237, in 
get_available_nodes
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup stats 
= self.get_host_stats(refresh=refresh)
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File 
/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py, line 5794, in 
get_host_stats
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup 
return self.host_state.get_host_stats(refresh=refresh)
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File 
/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py, line 473, in 
host_state
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup 
self._host_state = HostState(self)
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File 
/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py, line 6360, in 
__init__
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup 
self.update_status()
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File 
/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py, line 6411, in 
update_status
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup 
data['disk_available_least'] = _get_disk_available_least()
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup   File 
/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py, line 6384, in 
_get_disk_available_least
2014-12-22 21:52:10.863 13396 TRACE nova.openstack.common.threadgroup 

[Yahoo-eng-team] [Bug 1383100] [NEW] VMware: attaching an iSCSI volume to an instance failed

2014-10-20 Thread Lan Qi song
 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher   File 
/usr/lib/python2.6/site-packages/nova/virt/vmwareapi/driver.py, line 676, in 
_wait_for_task
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher return 
self.wait_for_task(task_ref)
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher   File 
/usr/lib/python2.6/site-packages/oslo/vmware/api.py, line 382, in 
wait_for_task
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher return 
evt.wait()
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher   File 
/usr/lib/python2.6/site-packages/eventlet/event.py, line 121, in wait
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher return 
hubs.get_hub().switch()
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher   File 
/usr/lib/python2.6/site-packages/eventlet/hubs/hub.py, line 293, in switch
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher return 
self.greenlet.switch()
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher   File 
/usr/lib/python2.6/site-packages/oslo/vmware/common/loopingcall.py, line 76, 
in _inner
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher 
self.f(*self.args, **self.kw)
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher   File 
/usr/lib/python2.6/site-packages/oslo/vmware/api.py, line 423, in _poll_task
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher raise 
task_ex
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher 
VMwareDriverException: The virtual disk is either corrupted or not a supported 
format.
2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher

** Affects: nova
 Importance: Undecided
 Assignee: Lan Qi song (lqslan)
 Status: New


** Tags: nova vmware

** Changed in: nova
 Assignee: (unassigned) = Lan Qi song (lqslan)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1383100

Title:
  VMware: attaching an iSCSI volume to an instance failed

Status in OpenStack Compute (Nova):
  New

Bug description:
  When I tried to attach an iSCSI volume created by the cinder LVM iSCSI
  driver to an instance, I discovered the following two problems:

  1. In the current code base, when attaching an iSCSI volume, it chooses
the adapter type the same way as for the attachment of a VMDK volume:
 def _attach_volume_iscsi(self, connection_info, instance, mountpoint):
  ..
  (vmdk_file_path, adapter_type,
   disk_type) = vm_util.get_vmdk_path_and_adapter_type(hardware_devices)

  self.attach_disk_to_vm(vm_ref, instance,
 adapter_type, 'rdmp',
 device_name=device_name)

  Indeed, the adapter type should always be lsiLogicsas. Otherwise it is
  easy to end up in an odd scenario where an iSCSI volume is attached to
  an IDE adapter.

  
  2. The current code always chooses to rescan the HBA of the first host
in a cluster.

  E.g., you have two hosts in a vCenter cluster: host01 and host02. If
  you want to attach an iSCSI volume to an instance spawned on host02, the
  attach code should rescan host02's HBA and discover the target. But in
  fact the code always rescans host01's HBA:

  def _iscsi_rescan_hba(self, target_portal):
      """Rescan the iSCSI HBA to discover iSCSI targets."""
      host_mor = vm_util.get_host_ref(self._session, self._cluster)

  The host_mor always represents the first host. The following error may
  be produced:

  2014-10-20 10:50:07.917 21540 ERROR oslo.messaging.rpc.dispatcher 
[req-bdf00be9-194f-474d-a61b-5c998c36bdea ] Exception during message handling: 
The virtual disk is either corrupted or not a supported format.
  2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher Traceback 
(most recent call last):
  2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher   File 
/usr/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py, line 134, 
in _dispatch_and_reply
  2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher 
incoming.message))
  2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher   File 
/usr/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py, line 177, 
in _dispatch
  2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher return 
self._do_dispatch(endpoint, method, ctxt, args)
  2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher   File 
/usr/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py, line 123, 
in _do_dispatch
  2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher result 
= getattr(endpoint, method)(ctxt, **new_args)
  2014-10-20 10:50:07.917 21540 TRACE oslo.messaging.rpc.dispatcher   File 
/usr/lib/python2.6/site-packages/nova/compute/manager.py, line