Re: [libvirt-users] concurrent migration of several domains rarely fails

2018-12-07 Thread Jim Fehlig

On 12/6/18 10:12 AM, Lentes, Bernd wrote:



Hi,

I have a two-node cluster with several domains as resources. During testing I
tried several times to migrate some domains concurrently.
Usually it succeeded, but occasionally it failed. I found one clue in the log:

Dec 03 16:03:02 ha-idg-1 libvirtd[3252]: 2018-12-03 15:03:02.758+0000: 3252:
error : virKeepAliveTimerInternal:143 : internal error: connection closed due
to keepalive timeout

The domains are configured similarly:
primitive vm_geneious VirtualDomain \
params config="/mnt/san/share/config.xml" \
params hypervisor="qemu:///system" \
params migration_transport=ssh \
op start interval=0 timeout=120 trace_ra=1 \
op stop interval=0 timeout=130 trace_ra=1 \
op monitor interval=30 timeout=25 trace_ra=1 \
op migrate_from interval=0 timeout=300 trace_ra=1 \
op migrate_to interval=0 timeout=300 trace_ra=1 \
meta allow-migrate=true target-role=Started is-managed=true \
utilization cpu=2 hv_memory=8000
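
If I understand the VirtualDomain resource agent correctly, with
migration_transport=ssh it ends up calling virsh roughly like this (a sketch,
not the exact command line the agent builds):

# control connection tunnelled over ssh; the guest RAM itself is streamed
# over a separate TCP port that qemu opens on the target host
virsh migrate --live vm_geneious qemu+ssh://ha-idg-2/system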

What is the algorithm for choosing the port used for live migration?
I have the impression that "params migration_transport=ssh" is worthless, since
port 22 isn't involved in the live migration.
My experience is that TCP ports > 49151 are used for the migration, but the
exact procedure isn't clear to me.
Does live migration use TCP port 49152 first, and one port higher for each
further domain?
E.g. 49152, 49153 and 49154 for the concurrent live migration of three domains.

Why does live migration of three domains usually succeed, although only 49152
and 49153 are open on both hosts?
Is the migration not really concurrent, but sometimes sequential?
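
As far as I can tell, the port range qemu uses for the migration stream is
configurable on the receiving host; a sketch, assuming the stock qemu.conf keys
(the values below should be the defaults):

# /etc/libvirt/qemu.conf on the destination host
migration_port_min = 49152
migration_port_max = 49215

# while a migration is running, check which port(s) qemu listens on
ss -tlnp | grep qemu

If that is correct, each concurrent incoming migration takes the next free port
from that range, so three parallel migrations would need 49152-49154 open.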

Bernd


Hi,

I tried to narrow down the problem.
My first assumption was that something was wrong with the network between the
hosts.
I opened ports 49152 - 49172 in the firewall - the problem persisted.
So I deactivated the firewall on both nodes - the problem persisted.

Then I wanted to rule out the HA cluster software (pacemaker).
I unmanaged the VirtualDomains in pacemaker and migrated them with virsh - the
problem persisted.

I wrote a script that migrates three domains sequentially from host A to host B
and vice versa via virsh.
I raised the log level of libvirtd and found something in the log which may be
the culprit:
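
(For reference, the migration loop is essentially the following; just a sketch
with the domain names filled in, not the actual script:)

#!/bin/bash
# migrate three domains one after the other to the peer node
target=ha-idg-2
for dom in sim geneious mausdb; do
    date
    echo "migrate $dom"
    virsh migrate --live --verbose "$dom" qemu+ssh://${target}/system
done
date
echo "Guests on ha-idg-1: \n"
virsh list --all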

This is the output of my script:

Thu Dec  6 17:02:53 CET 2018
migrate sim
Migration: [100 %]
Thu Dec  6 17:03:07 CET 2018
migrate geneious
Migration: [100 %]
Thu Dec  6 17:03:16 CET 2018
migrate mausdb
Migration: [ 99 %]error: operation failed: migration job: unexpectedly failed
<= error !

Thu Dec  6 17:05:32 CET 2018  < time of error
Guests on ha-idg-1: \n
  Id    Name       State
 ---------------------------
  1     sim        running
  2     geneious   running
  -     mausdb     shut off

migrate to ha-idg-2\n
Thu Dec  6 17:05:32 CET 2018

This is what journalctl told:

Dec 06 17:05:32 ha-idg-1 libvirtd[12553]: 2018-12-06 16:05:32.481+0000: 12553: 
info : virKeepAliveTimerInternal:136 : RPC_KEEPALIVE_TIMEOUT: ka=0x55b2bb937740 
client=0x55b2bb930d50 countToDeath=0 idle=30
Dec 06 17:05:32 ha-idg-1 libvirtd[12553]: 2018-12-06 16:05:32.481+0000: 12553: 
error : virKeepAliveTimerInternal:143 : internal error: connection closed due 
to keepalive timeout
Dec 06 17:05:32 ha-idg-1 libvirtd[12553]: 2018-12-06 16:05:32.481+0000: 12553: 
info : virObjectUnref:259 : OBJECT_UNREF: obj=0x55b2bb937740

Dec 06 17:05:27 ha-idg-1 libvirtd[12553]: 2018-12-06 16:05:27.476+0000: 12553: 
info : virKeepAliveTimerInternal:136 : RPC_KEEPALIVE_TIMEOUT: ka=0x55b2bb937740 
client=0x55b2bb930d50 countToDeath=1 idle=25
Dec 06 17:05:27 ha-idg-1 libvirtd[12553]: 2018-12-06 16:05:27.476+0000: 12553: 
info : virKeepAliveMessage:107 : RPC_KEEPALIVE_SEND: ka=0x55b2bb937740 
client=0x55b2bb930d50 prog=1801807216 vers=1 proc=1

Dec 06 17:05:22 ha-idg-1 libvirtd[12553]: 2018-12-06 16:05:22.471+0000: 12553: 
info : virKeepAliveTimerInternal:136 : RPC_KEEPALIVE_TIMEOUT: ka=0x55b2bb937740 
client=0x55b2bb930d50 countToDeath=2 idle=20
Dec 06 17:05:22 ha-idg-1 libvirtd[12553]: 2018-12-06 16:05:22.471+0000: 12553: 
info : virKeepAliveMessage:107 : RPC_KEEPALIVE_SEND: ka=0x55b2bb937740 
client=0x55b2bb930d50 prog=1801807216 vers=1 proc=1

Dec 06 17:05:17 ha-idg-1 libvirtd[12553]: 2018-12-06 16:05:17.466+0000: 12553: 
info : virKeepAliveTimerInternal:136 : RPC_KEEPALIVE_TIMEOUT: ka=0x55b2bb937740 
client=0x55b2bb930d50 countToDeath=3 idle=15
Dec 06 17:05:17 ha-idg-1 libvirtd[12553]: 2018-12-06 16:05:17.466+0000: 12553: 
info : virKeepAliveMessage:107 : RPC_KEEPALIVE_SEND: ka=0x55b2bb937740 
client=0x55b2bb930d50 prog=1801807216 vers=1 proc=1

Dec 06 17:05:12 ha-idg-1 libvirtd[12553]: 2018-12-06 16:05:12.460+0000: 12553: 
info : virKeepAliveTimerInternal:136 : RPC_KEEPALIVE_TIMEOUT: ka=0x55b2bb937740 
client=0x55b2bb930d50 countToDeath=4 idle=10
Dec 06 
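
If I read the countToDeath/idle values correctly, this is the normal libvirtd
keepalive logic: the server pings the idle client every 5 seconds and closes
the connection after about 30 seconds without a response. A possible
workaround, I assume, is to relax those limits; a sketch, values only for
illustration:

# /etc/libvirt/libvirtd.conf on both hosts (restart libvirtd afterwards)
# keepalive_interval = -1 should disable the server-side keepalive entirely
keepalive_interval = 10
keepalive_count = 10

virsh itself also has the global options -k/--keepalive-interval and
-K/--keepalive-count for its own side of the connection.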

[libvirt-users] Usable and non-usable CPU models in nested virtualization

2018-12-07 Thread Milan Zamazal
Hi, some custom CPU models are reported by
virConnectGetDomainCapabilities as usable='yes' on a physical machine
but as usable='no' inside a VM running on the same machine.  That's
not completely surprising.

What does surprise me is that those models are still reported by
virConnectCompareCPU as supported (VIR_CPU_COMPARE_SUPERSET) in the
nested environment, and VMs can be started happily with them.

For instance, virConnectGetDomainCapabilities reports

  <model usable='no'>Skylake-Client</model>

but when I try to use that model anyway, the VM starts fine with it:

  <cpu ...>
    <model>Skylake-Client</model>
    ...
  </cpu>
That's actually good news, but unexpected.  Am I missing something?
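
In case it helps to reproduce, this is roughly how I check it inside the L1
guest (a sketch; the minimal cpu.xml below is mine and may need more detail,
e.g. an <arch> element):

# what the domain capabilities report about the model
virsh domcapabilities | grep Skylake-Client

# cpu.xml containing just (my assumption of a minimal definition):
#   <cpu mode='custom' match='exact'>
#     <model>Skylake-Client</model>
#   </cpu>
virsh cpu-compare cpu.xml

virsh domcapabilities uses virConnectGetDomainCapabilities and virsh
cpu-compare maps to virConnectCompareCPU, so the two commands should show the
same discrepancy.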

Thanks,
Milan
