Hi,

I'm running Havana, and I just tried a testcase involving doing six 
simultaneous live-migrations.

It appears that the migrations succeeded, but two of the instances got stuck 
with a status of "MIGRATING" because of RPC timeouts:

2014-09-16 20:35:07.376 12493 INFO nova.notifier [-] processing ERROR 
_post_live_migration
2014-09-16 20:35:07.390 12493 INFO nova.openstack.common.rpc.common [-] 
Connected to AMQP server on 192.168.204.2:5672
2014-09-16 20:35:07.396 12493 ERROR nova.openstack.common.loopingcall [-] in 
fixed duration looping call
2014-09-16 20:35:07.396 12493 TRACE nova.openstack.common.loopingcall Traceback 
(most recent call last):
2014-09-16 20:35:07.396 12493 TRACE nova.openstack.common.loopingcall   File 
"./usr/lib64/python2.7/site-packages/nova/openstack/common/loopingcall.py", 
line 78, in _inner
2014-09-16 20:35:07.396 12493 TRACE nova.openstack.common.loopingcall   File 
"./usr/lib64/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4874, 
in wait_for_live_migration
2014-09-16 20:35:07.396 12493 TRACE nova.openstack.common.loopingcall   File 
"./usr/lib64/python2.7/site-packages/nova/exception.py", line 90, in wrapped
2014-09-16 20:35:07.396 12493 TRACE nova.openstack.common.loopingcall   File 
"./usr/lib64/python2.7/site-packages/nova/exception.py", line 73, in wrapped
2014-09-16 20:35:07.396 12493 TRACE nova.openstack.common.loopingcall   File 
"./usr/lib64/python2.7/site-packages/nova/compute/manager.py", line 4558, in 
_post_live_migration
2014-09-16 20:35:07.396 12493 TRACE nova.openstack.common.loopingcall   File 
"./usr/lib64/python2.7/site-packages/nova/compute/rpcapi.py", line 517, in 
post_live_migration_at_destination
2014-09-16 20:35:07.396 12493 TRACE nova.openstack.common.loopingcall   File 
"./usr/lib64/python2.7/site-packages/nova/rpcclient.py", line 85, in call
2014-09-16 20:35:07.396 12493 TRACE nova.openstack.common.loopingcall   File 
"./usr/lib64/python2.7/site-packages/nova/rpcclient.py", line 63, in _invoke
2014-09-16 20:35:07.396 12493 TRACE nova.openstack.common.loopingcall   File 
"./usr/lib64/python2.7/site-packages/nova/openstack/common/rpc/proxy.py", line 
131, in call
2014-09-16 20:35:07.396 12493 TRACE nova.openstack.common.loopingcall Timeout: 
Timeout while waiting on RPC response - topic: "compute.compute-0", RPC method: 
"post_live_migration_at_destination" info: "<unknown>"


Looking at the migration destination compute node, I see it in turn getting 
stuck on an RPC timeout:


2014-09-16 20:35:32.216 12510 INFO nova.notifier 
[req-a8389c8d-7e5b-4f08-8669-ce14594e3863 24a43342a5ae4e31bd431aef2d395ebc 
02bf771ab40f4ecea6fe42135c5a09bc] processing ERROR 
post_live_migration_at_destination
2014-09-16 20:35:32.247 12510 INFO nova.openstack.common.rpc.common 
[req-a8389c8d-7e5b-4f08-8669-ce14594e3863 24a43342a5ae4e31bd431aef2d395ebc 
02bf771ab40f4ecea6fe42135c5a09bc] Connected to AMQP server on 192.168.204.2:5672
2014-09-16 20:35:32.248 12510 INFO nova.openstack.common.rpc.common 
[req-2f7bd481-9aed-4d66-97dd-3075407282dd 24a43342a5ae4e31bd431aef2d395ebc 
02bf771ab40f4ecea6fe42135c5a09bc] Connected to AMQP server on 192.168.204.2:5672
2014-09-16 20:35:32.270 12510 ERROR nova.openstack.common.rpc.amqp 
[req-a8389c8d-7e5b-4f08-8669-ce14594e3863 24a43342a5ae4e31bd431aef2d395ebc 
02bf771ab40f4ecea6fe42135c5a09bc] Exception during message handling
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp Traceback 
(most recent call last):
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File 
"./usr/lib64/python2.7/site-packages/nova/openstack/common/rpc/amqp.py", line 
466, in _process_data
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File 
"./usr/lib64/python2.7/site-packages/nova/openstack/common/rpc/dispatcher.py", 
line 172, in dispatch
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File 
"./usr/lib64/python2.7/site-packages/nova/exception.py", line 90, in wrapped
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File 
"./usr/lib64/python2.7/site-packages/nova/exception.py", line 73, in wrapped
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File 
"./usr/lib64/python2.7/site-packages/nova/compute/manager.py", line 4647, in 
post_live_migration_at_destination
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File 
"./usr/lib64/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5065, 
in post_live_migration_at_destination
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File 
"./usr/lib64/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3709, 
in to_xml
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File 
"./usr/lib64/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3465, 
in get_guest_config
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File 
"./usr/lib64/python2.7/site-packages/nova/compute/manager.py", line 383, in 
instance_update
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File 
"./usr/lib64/python2.7/site-packages/nova/compute/manager.py", line 498, in 
_instance_update
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File 
"./usr/lib64/python2.7/site-packages/nova/openstack/common/lockutils.py", line 
246, in inner
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File 
"./usr/lib64/python2.7/site-packages/nova/compute/resource_tracker.py", line 
601, in update_usage
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File 
"./usr/lib64/python2.7/site-packages/nova/compute/resource_tracker.py", line 
827, in _update
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File 
"./usr/lib64/python2.7/site-packages/nova/conductor/api.py", line 263, in 
compute_node_update
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File 
"./usr/lib64/python2.7/site-packages/nova/conductor/rpcapi.py", line 386, in 
compute_node_update
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File 
"./usr/lib64/python2.7/site-packages/nova/rpcclient.py", line 85, in call
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File 
"./usr/lib64/python2.7/site-packages/nova/rpcclient.py", line 63, in _invoke
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File 
"./usr/lib64/python2.7/site-packages/nova/openstack/common/rpc/proxy.py", line 
131, in call
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp Timeout: 
Timeout while waiting on RPC response - topic: "conductor", RPC method: 
"compute_node_update" info: "<unknown>"


Looking at nova-conductor, there's nothing interesting in the logs around this 
time.


Looking at the rabbitmq logs at /var/log/rabbitmq/[email protected], there's 
nothing particularly interesting.
It didn't do a switchover or restart or anything like that, I just see messages 
like this:

=INFO REPORT==== 16-Sep-2014::20:34:37 ===
closing AMQP connection <0.19196.1> (192.168.204.208:59984 -> 
192.168.204.2:5672)

=INFO REPORT==== 16-Sep-2014::20:34:38 ===
accepting AMQP connection <0.10878.2> (192.168.204.208:33566 -> 
192.168.204.2:5672)

=INFO REPORT==== 16-Sep-2014::20:34:38 ===
accepting AMQP connection <0.10888.2> (192.168.204.208:33567 -> 
192.168.204.2:5672)

=INFO REPORT==== 16-Sep-2014::20:35:07 ===
closing AMQP connection <0.10798.2> (192.168.204.136:49498 -> 
192.168.204.2:5672)

=INFO REPORT==== 16-Sep-2014::20:35:07 ===
closing AMQP connection <0.10788.2> (192.168.204.136:49497 -> 
192.168.204.2:5672)

=INFO REPORT==== 16-Sep-2014::20:35:07 ===
accepting AMQP connection <0.11364.2> (192.168.204.136:49538 -> 
192.168.204.2:5672)


So...how do I go about figuring out what's causing these RPC timeouts?


Thanks,
Chris

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Reply via email to