Public bug reported: It looks like during live migration the port generated for the incoming migration is not checked for already being in use, and/or there is no attempt to pick a new port if the chosen one has been taken by someone else.
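The behaviour asked for above (verify that the port is free and pick a new one if it is not) can be illustrated with a minimal sketch. This is not nova's or libvirt's actual code: the helper names, the retry policy and the 49152-49215 range (chosen because 49152 is the port seen in the log below) are assumptions for illustration only.

    import errno
    import socket

    MIGRATION_PORT_RANGE = (49152, 49215)   # assumed range; 49152 is the port from the log

    def find_free_migration_port(start=MIGRATION_PORT_RANGE[0],
                                 end=MIGRATION_PORT_RANGE[1]):
        """Probe for a TCP port in [start, end] that is currently unbound.

        The probe is inherently racy: another process can still grab the
        port between this check and the moment qemu binds it for
        '-incoming', so the caller has to retry on a bind failure.
        """
        for port in range(start, end + 1):
            sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
            try:
                sock.bind(('::', port))
                return port
            except socket.error as err:
                if err.errno != errno.EADDRINUSE:
                    raise
            finally:
                sock.close()
        raise RuntimeError('no free migration port in %d-%d' % (start, end))

    def launch_with_retry(launch_incoming, attempts=3):
        """Call the hypothetical launch_incoming(port) with a freshly
        probed port, picking a new one whenever the bind race is lost.

        launch_incoming is expected to return True on success and False
        when the destination reports 'Address already in use'.
        """
        for _ in range(attempts):
            port = find_free_migration_port()
            if launch_incoming(port):
                return port
        raise RuntimeError('migration port kept colliding after %d attempts'
                           % attempts)

The probe alone is not enough because of the race window between the check and the actual bind, which is why a retry around the qemu launch itself would still be needed; that is the "re-get a new port" behaviour requested above.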
Here is an example of this behavior in the nova-compute log on the source compute node:

2015-09-20T06:25:21.701157+00:00 info: 2015-09-20 06:25:21.700 17037 INFO nova.virt.libvirt.driver [-] [instance: 60493be7-4b00-4f4f-a785-9d4aa6e74f58] Instance spawned successfully.
2015-09-20T06:25:21.828941+00:00 info: 2015-09-20 06:25:21.828 17037 INFO nova.compute.manager [req-8fdf447a-48c4-4b41-8276-9459ae9e5a65 - - - - -] [instance: 60493be7-4b00-4f4f-a785-9d4aa6e74f58] VM Resumed (Lifecycle Event)
2015-09-20T06:25:37.349069+00:00 err: 2015-09-20 06:25:37.348 17037 ERROR nova.virt.libvirt.driver [req-8150b87f-f87b-4bec-8bab-561dd37605d5 820904596e1d422e9460f472b7b9672f 04ce0fe8f21a4a6b8535c5cefd9f8594 - - -] [instance: 60493be7-4b00-4f4f-a785-9d4aa6e74f58] Live Migration failure: internal error: early end of file from monitor: possible problem: 2015-09-20T06:25:37.116947Z qemu-system-x86_64: -incoming tcp:[::]:49152: Failed to bind socket: Address already in use
2015-09-20T06:25:37.354837+00:00 info: 2015-09-20 06:25:37.354 17037 INFO nova.virt.libvirt.driver [req-8150b87f-f87b-4bec-8bab-561dd37605d5 820904596e1d422e9460f472b7b9672f 04ce0fe8f21a4a6b8535c5cefd9f8594 - - -] [instance: 60493be7-4b00-4f4f-a785-9d4aa6e74f58] Migration running for 0 secs, memory 0% remaining; (bytes processed=0, remaining=0, total=0)
2015-09-20T06:25:37.856147+00:00 err: 2015-09-20 06:25:37.855 17037 ERROR nova.virt.libvirt.driver [req-8150b87f-f87b-4bec-8bab-561dd37605d5 820904596e1d422e9460f472b7b9672f 04ce0fe8f21a4a6b8535c5cefd9f8594 - - -] [instance: 60493be7-4b00-4f4f-a785-9d4aa6e74f58] Migration operation has aborted

Some environment description:

root@node-169:~# nova-compute --version
2015.1.1
root@node-169:~# dpkg -l | grep 'nova-compute ' | awk '{print $3}'
1:2015.1.1-1~u14.04+mos19662

Steps to reproduce:

This happens during Rally testing of a fairly large environment (~200 nodes), roughly once per 200 iterations, so at scale the chances of hitting it are significant. It should be easy to reproduce under the following circumstances:

1. A very high rate of live migrations.
2. Many running VMs and other services consuming a large number of TCP ports.

Both conditions raise the chances of a collision in the qemu migration port allocation procedure (a small helper that forces such a collision on the destination node is sketched at the end of this report).

** Affects: mos
   Importance: Undecided
   Status: New

** Affects: nova
   Importance: Undecided
   Status: New

** Tags: scale

** Project changed: nova => mos

** Also affects: nova
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1498196

Title:
  Live migration's assigned ports conflicts

Status in Mirantis OpenStack:
  New
Status in OpenStack Compute (nova):
  New
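As a reproduction aid for the steps above, one can deliberately occupy the assumed incoming-migration port range on the destination compute node and then trigger a live migration to that node; the migration should then fail with the same "Failed to bind socket: Address already in use" error. The 49152-49215 range and the script itself are assumptions for illustration, not part of nova.

    import socket
    import time

    # Hold every port in the (assumed) qemu incoming-migration range on the
    # destination compute node, then trigger a live migration to this node.
    PORT_RANGE = range(49152, 49216)     # 49152 is the port seen in the log above

    held = []
    for port in PORT_RANGE:
        sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
        try:
            sock.bind(('::', port))
            sock.listen(1)
            held.append(sock)
        except socket.error:
            sock.close()                 # already taken by someone else

    print('holding %d ports; start a live migration to this host now' % len(held))
    time.sleep(600)                      # keep the sockets open while testing

With the whole range held, every migration to this host should hit the reported failure immediately, instead of roughly once per 200 Rally iterations.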
To manage notifications about this bug go to:
https://bugs.launchpad.net/mos/+bug/1498196/+subscriptions