Re: Disable virtlockd

2020-06-09 Thread Felix Queißner
> Yes, via the 'lock_manager' option in /etc/libvirt/qemu.conf
Okay, I'll read the docs on this!
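Just to make sure I read the docs right -- presumably something along
these lines in /etc/libvirt/qemu.conf (untested on my side; the 'nop'
driver name is my assumption for "no libvirt-side locking at all"):

    # default is the virtlockd-based driver:
    #lock_manager = "lockd"
    # built-in no-op driver, i.e. no libvirt-side locking:
    lock_manager = "nop"

    # then restart the daemon so it picks up the change:
    # systemctl restart libvirtd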

> 
>> or VM file locking? I
> 
> If you mean the image locking provided by qemu, then no, libvirt doesn't
> have provisions to disable it.
> 
>> start qemu with a -snapshot option which prevents any changes to the
>> disk image anyways.
> 
> Note that starting from libvirt-5.10 and qemu-4.2 any VM hacking in
> -snapshot via qemu:arg will break as '-snapshot' is not supported
> together with -blockdev which is used to configure disks.
> 
> This is because libvirt never really supported -snapshot.
> 
> That said, I still plan to add support for <transient/> disks, but I'm
> currently busy with other features.
Okay, thanks for the information!


> You didn't really describe the problem though. Are you trying to share
> the disk images between multiple VMs?
> 
> Either way, another option is to add qcow2 overlay images which will
> capture the writes and discard them after the VM is turned off.
Yeah, the problem domain is this:
I have ~20 computers sharing the same properties (hardware down to device
revisions) and setup (as in hypervisor OS and user interaction).

They all need to run the exact same OS AND need to be immutable
(so rebooting resets them to the same state as before).

My current approach is a disk image on NFS which I start with
libvirt+qemu and -snapshot. It does exactly what I need, and if this is
not supported anymore I need a way to replace the behaviour (which is
"read from the disk file, and cache+discard writes to the original image").

Using a distinct qcow2 image in /tmp, backed by the original image,
should work as well, if I understand correctly?
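Concretely, I imagine roughly this (paths and the raw backing format
are just placeholders for my setup):

    # throwaway overlay in /tmp on top of the master image on NFS
    qemu-img create -f qcow2 \
        -b /mnt/nfs/images/master.img -F raw \
        /tmp/master-overlay.qcow2

    # point the domain's <disk> source at /tmp/master-overlay.qcow2
    # and delete the overlay again on shutdown/reboot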

Regards
Felix



Re: No outbound connectivity from guest VM(fedora 32)

2020-06-09 Thread Laine Stump

On 6/8/20 8:55 AM, Justin Stephenson wrote:

On Mon, Jun 8, 2020 at 5:09 AM Daniel P. Berrangé  wrote:


On Fri, Jun 05, 2020 at 01:27:08PM -0400, Justin Stephenson wrote:

Hi,

I recently did a fresh install of Fedora 32 and I am having
trouble with my virtual machine networking. I can ssh and connect into
my guest VMs from my host, but the guest VMs cannot ping out to the
internet.

I am using the "default" NAT virtual network. The interesting thing is
that I have made no configuration changes on my host or in the guest VMs;
I simply created and installed two VMs (Fedora and RHEL 8), and both VMs
are having the same issue.

I am happy to provide any logs or command output if that would help.


Do you have "podman" installed on your host ? As there is an issue
with podman loading "br_netfilter" which is harming libvirt default
network traffic..


Hi, yes I am using podman for some development tasks. However, I don't
see the br_netfilter module loaded:

  # lsmod | grep br_netfilter
  # grep 'netfilter' /proc/modules

I'm not sure if it matters but my host laptop is also connected wirelessly.
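(For reference, the bridge sysctls only exist while br_netfilter is
loaded, so that's another quick way to check:)

    # fails with "No such file or directory" unless br_netfilter is loaded
    sysctl net.bridge.bridge-nf-call-iptables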


Since it's not the "problem du jour" with F32, here are a few other things 
you can try:


1) Try "systemctl restart libvirtd.service" (which reloads libvirt's 
iptables rules), and then start the VM again to see if the problem is 
solved. (If this fixes it, then something that is starting after 
libvirtd.service is adding a firewall rule that blocks the outbound 
guest traffic)
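For example (the LIBVIRT_* chain names assume libvirt is using its
iptables firewall backend):

    systemctl restart libvirtd.service
    # libvirt's forwarding chains should be repopulated:
    iptables -L LIBVIRT_FWO -n -v
    # and nothing earlier in FORWARD should reject traffic from virbr0:
    iptables -L FORWARD -n -v --line-numbers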


2) You say this was a fresh install of F32. Have you run "dnf update" to 
make sure you have all post-release updates to the libvirt and firewalld 
packages? If not, try that first.


(BTW, can you ssh from guest to host?)

3) See if you can ping from the guest to the outside network. If you can 
ping but can't ssh, then again there is a firewall problem. Make sure 
the libvirt zone exists in the firewalld config, and that virbr0 is a part 
of that zone. (Aside from allowing inbound DNS, DHCP and ssh from guests 
to the host, the libvirt zone has a default "ACCEPT" policy, which will 
allow packets to be forwarded from the guest through the host. If virbr0 
is in a different zone, then the default policy won't be ACCEPT, and 
forwarded traffic will be rejected. All libvirt networks are put into 
firewalld's "libvirt" zone by default, so this should always be the case.)


Beyond those suggestions, I'm not sure what else to recommend, other 
than that you might get a quicker response on troubleshooting like this 
by logging into irc.oftc.net and joining the #virt channel :-)




Re: Permission to disk set wrong when restoring from memory snapshot?

2020-06-09 Thread Liran Rotenberg
On Tue, Jun 9, 2020 at 4:14 PM Peter Krempa  wrote:

> On Tue, Jun 09, 2020 at 16:00:02 +0300, Liran Rotenberg wrote:
> > On Tue, Jun 9, 2020 at 3:46 PM Peter Krempa  wrote:
> >
> > > On Tue, Jun 09, 2020 at 15:38:53 +0300, Liran Rotenberg wrote:
> > > > Hi all,
> > > > Passing on Bug 1840609
> > > > <https://bugzilla.redhat.com/show_bug.cgi?id=1840609>
> > > > - Wake up from hibernation failed:internal error: unable to execute QEMU
> > > > command 'cont': Failed to get "write" lock.
> > > >
> > > > In Ovirt/RHV there is a specific flow that prevents the VM from starting on
> > > > the first host.
> > > > The result is:
> > > > 2020-06-09T12:12:58.111610Z qemu-kvm: Failed to get "write" lock
> > > > Is another process using the image
> > > >
> > >
> > > > [/rhev/data-center/3b67fb92-906b-11ea-bb36-482ae35a5f83/4fd23357-6047-46c9-aa81-ba6a12a9e8bd/images/0191384a-3e0a-472f-a889-d95622cb6916/7f553f44-db08-480e-8c86-cbdeccedfafe]?
> > > > 2020-06-09T12:12:58.668140Z qemu-kvm: terminating on signal 15 from pid
> > > > 177876 ()
> > >
> > > This error comes from qemu's internal file locking. It usually means
> > > that there is another qemu or qemu-img which has the given image open.
> > >
> > > Is there anything which would access the image at that specific time or
> > > slightly around?
> > >
> > I don't think so. The volumes are created and added to the volume chain on
> > the VM metadata domxml (two snapshots created). Then the user restores the
> > latest snapshot and deletes them (while the VM is down) - they are removed.
> > The VM is set with the volume and going up, restoring the memory. The
> > mounting place (in /rhev/data-center) points to the same disk and volume.
> > On the first run I see the new place /rhev/data-center/... that I
> > can't tell why or where it comes from. It is set with 'rw', while the
> > normal destination to the shared NFS is only with 'r'.
>
> That's definitely something not related to the locking itself.
>
> Please attach the XML document used to start the VM in both places
> (working and non-working). I can't tell if there's a difference without
> seeing those.
>
> The difference might very well be in the XMLs.
>
I added a snippet from the engine log. The first two domxmls are from the
first attempt on 'host_mixed_1', resulting in:
"unable to execute QEMU command 'cont': Failed to get "write" lock."
From this point on, it's the second attempt on 'host_mixed_2' (2020-06-09
15:13:00,308+03 in log time).

>
> > > It might be a race condition from something trying to modify the image
> > > perhaps combined with propagation of locks via NFS.
> > >
> > I can't see any race from the RHV point of view. The strangest thing is:
> > why does it work on the second host?
> > In RHV after the VM is up we remove the memory disk, but that doesn't
> > happen in this case, since the VM wasn't up.
>
> I'd speculate that it's timing. There's more time for the locks to
> propagate or an offending process to terminate. The error message
> definitely does not look like a permission issue, and
> qemuSetupImagePathCgroup:75 is in the end a no-op on non-device
> paths (makes sense only for LVM logical volumes, partitions etc).
>
>
2020-06-09 15:12:43,964+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (EE-ManagedThreadFactory-engine-Thread-2511) [f8dad9f1-080f-45a1-b781-c44029bdca93] START, CreateBrokerVDSCommand(HostName = host_mixed_1, CreateVDSCommandParameters:{hostId='c861e3c5-70c9-4ce7-abd9-989c7b8b6f6d', vmId='bebec40d-03f9-40d3-acb9-37ddadc67849', vm='VM [snapshot_test]'}), log id: 236571bd
2020-06-09 15:12:43,985+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (EE-ManagedThreadFactory-engine-Thread-2511) [f8dad9f1-080f-45a1-b781-c44029bdca93] VM domain XML for 'snapshot_test' (uuid bebec40d-03f9-40d3-acb9-37ddadc67849, CPU model Skylake-Client, SMBIOS manufacturer oVirt)
[full domain XML omitted: the markup was stripped by the list archive]

Re: Permission to disk set wrong when restoring from memory snapshot?

2020-06-09 Thread Peter Krempa
On Tue, Jun 09, 2020 at 16:00:02 +0300, Liran Rotenberg wrote:
> On Tue, Jun 9, 2020 at 3:46 PM Peter Krempa  wrote:
> 
> > On Tue, Jun 09, 2020 at 15:38:53 +0300, Liran Rotenberg wrote:
> > > Hi all,
> > > Passing on Bug 1840609
> > > <https://bugzilla.redhat.com/show_bug.cgi?id=1840609>
> > > - Wake up from hibernation failed:internal error: unable to execute QEMU
> > > command 'cont': Failed to get "write" lock.
> > >
> > > In Ovirt/RHV there is a specific flow that prevents the VM from starting on
> > > the first host.
> > > The result is:
> > > 2020-06-09T12:12:58.111610Z qemu-kvm: Failed to get "write" lock
> > > Is another process using the image
> > >
> > > [/rhev/data-center/3b67fb92-906b-11ea-bb36-482ae35a5f83/4fd23357-6047-46c9-aa81-ba6a12a9e8bd/images/0191384a-3e0a-472f-a889-d95622cb6916/7f553f44-db08-480e-8c86-cbdeccedfafe]?
> > > 2020-06-09T12:12:58.668140Z qemu-kvm: terminating on signal 15 from pid
> > > 177876 ()
> >
> > This error comes from qemu's internal file locking. It usually means
> > that there is another qemu or qemu-img which has the given image open.
> >
> > Is there anything which would access the image at that specific time or
> > slightly around?
> >
> I don't think so. The volumes are created and added to the volume chain on
> the VM metadata domxml (two snapshots created). Then the user restores the
> latest snapshot and deletes them (while the VM is down) - they are removed.
> The VM is set with the volume and going up, restoring the memory. The
> mounting place (in /rhev/data-center) points to the same disk and volume.
> On the first run I see the new place /rhev/data-center/... that I
> can't tell why or where it comes from. It is set with 'rw', while the
> normal destination to the shared NFS is only with 'r'.

That's definitely something not related to the locking itself.

Please attach the XML document used to start the VM in both places
(working and non-working). I can't tell if there's a difference without
seeing those.

The difference might very well be in the XMLs.
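For example, something like this on each host while the VM exists
there (the domain name is taken from your log; adjust as needed):

    virsh -r dumpxml snapshot_test > /tmp/snapshot_test-$(hostname).xml

and then diff the two files.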

> > It might be a race condition from something trying to modify the image
> > perhaps combined with propagation of locks via NFS.
> >
> I can't see any race from the RHV point of view. The strangest thing is:
> why does it work on the second host?
> In RHV after the VM is up we remove the memory disk, but that doesn't
> happen in this case, since the VM wasn't up.

I'd speculate that it's timing. There's more time for the locks to
propagate or an offending process to terminate. The error message
definitely does not look like a permission issue, and
qemuSetupImagePathCgroup:75 is in the end a no-op on non-device
paths (makes sense only for LVM logical volumes, partitions etc).



Re: Permission to disk set wrong when restoring from memory snapshot?

2020-06-09 Thread Liran Rotenberg
On Tue, Jun 9, 2020 at 3:46 PM Peter Krempa  wrote:

> On Tue, Jun 09, 2020 at 15:38:53 +0300, Liran Rotenberg wrote:
> > Hi all,
> > Passing on Bug 1840609
> > <https://bugzilla.redhat.com/show_bug.cgi?id=1840609>
> > - Wake up from hibernation failed:internal error: unable to execute QEMU
> > command 'cont': Failed to get "write" lock.
> >
> > In Ovirt/RHV there is a specific flow that prevents the VM from starting on
> > the first host.
> > The result is:
> > 2020-06-09T12:12:58.111610Z qemu-kvm: Failed to get "write" lock
> > Is another process using the image
> >
> > [/rhev/data-center/3b67fb92-906b-11ea-bb36-482ae35a5f83/4fd23357-6047-46c9-aa81-ba6a12a9e8bd/images/0191384a-3e0a-472f-a889-d95622cb6916/7f553f44-db08-480e-8c86-cbdeccedfafe]?
> > 2020-06-09T12:12:58.668140Z qemu-kvm: terminating on signal 15 from pid
> > 177876 ()
>
> This error comes from qemu's internal file locking. It usually means
> that there is another qemu or qemu-img which has the given image open.
>
> Is there anything which would access the image at that specific time or
> slightly around?
>
I don't think so. The volumes are created and added to the volume chain on
the VM metadata domxml (two snapshots created). Then the user restores the
latest snapshot and deletes them (while the VM is down) - they are removed.
The VM is set with the volume and going up, restoring the memory. The
mounting place (in /rhev/data-center) points to the same disk and volume.
On the first run I see the new place /rhev/data-center/... that I
can't tell why or where it comes from. It is set with 'rw', while the
normal destination to the shared NFS is only with 'r'.

>
> It might be a race condition from something trying to modify the image
> perhaps combined with propagation of locks via NFS.
>
I can't see any race from the RHV point of view. The strangest thing is:
why does it work on the second host?
In RHV after the VM is up we remove the memory disk, but that doesn't
happen in this case, since the VM wasn't up.


Re: Permission to disk set wrong when restoring from memory snapshot?

2020-06-09 Thread Peter Krempa
On Tue, Jun 09, 2020 at 15:38:53 +0300, Liran Rotenberg wrote:
> Hi all,
> Passing on Bug 1840609 <https://bugzilla.redhat.com/show_bug.cgi?id=1840609>
> - Wake up from hibernation failed:internal error: unable to execute QEMU
> command 'cont': Failed to get "write" lock.
> 
> In Ovirt/RHV there is a specific flow that prevents the VM from starting on
> the first host.
> The result is:
> 2020-06-09T12:12:58.111610Z qemu-kvm: Failed to get "write" lock
> Is another process using the image
> [/rhev/data-center/3b67fb92-906b-11ea-bb36-482ae35a5f83/4fd23357-6047-46c9-aa81-ba6a12a9e8bd/images/0191384a-3e0a-472f-a889-d95622cb6916/7f553f44-db08-480e-8c86-cbdeccedfafe]?
> 2020-06-09T12:12:58.668140Z qemu-kvm: terminating on signal 15 from pid
> 177876 ()

This error comes from qemu's internal file locking. It usually means
that there is another qemu or qemu-img which has the given image open.

Is there anything which would access the image at that specific time or
slightly around?

It might be a race condition from something trying to modify the image
perhaps combined with propagation of locks via NFS.
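One way to check at that moment, for example (path shortened here, use
the full one from the error message):

    IMG=/rhev/data-center/.../7f553f44-db08-480e-8c86-cbdeccedfafe
    lsof "$IMG"
    fuser -v "$IMG"
    # note: both only see processes on the local host; with NFS the
    # lock holder may well be on another machine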



Re: Disable virtlockd

2020-06-09 Thread Peter Krempa
On Mon, Jun 08, 2020 at 16:26:44 +0200, Felix Queißner wrote:
> Hello!

Hi,

> 
> Is it possible to disable the virtlockd daemon

Yes, via the 'lock_manager' option in /etc/libvirt/qemu.conf

> or VM file locking? I

If you mean the image locking provided by qemu, then no, libvirt doesn't
have provisions to disable it.

> start qemu with a -snapshot option which prevents any changes to the
> disk image anyways.

Note that starting from libvirt-5.10 and qemu-4.2 any VM hacking in
-snapshot via qemu:arg will break as '-snapshot' is not supported
together with -blockdev which is used to configure disks.

This is because libvirt never really supported -snapshot.

That said, I still plan to add support for <transient/> disks, but I'm
currently busy with other features.

> Using  is not supported for IDE disks.
> 
> Another option would be to not require locking on the NFS share, but I
> have no idea how.

You didn't really describe the problem though. Are you trying to share
the disk images between multiple VMs?

Either way, another option is to add qcow2 overlay images which will
capture the writes and discard them after the VM is turned off.



Permission to disk set wrong when restoring from memory snapshot?

2020-06-09 Thread Liran Rotenberg
Hi all,
Passing on Bug 1840609 <https://bugzilla.redhat.com/show_bug.cgi?id=1840609>
- Wake up from hibernation failed:internal error: unable to execute QEMU
command 'cont': Failed to get "write" lock.

In Ovirt/RHV there is a specific flow that prevents the VM from starting on
the first host.
The result is:
2020-06-09T12:12:58.111610Z qemu-kvm: Failed to get "write" lock
Is another process using the image
[/rhev/data-center/3b67fb92-906b-11ea-bb36-482ae35a5f83/4fd23357-6047-46c9-aa81-ba6a12a9e8bd/images/0191384a-3e0a-472f-a889-d95622cb6916/7f553f44-db08-480e-8c86-cbdeccedfafe]?
2020-06-09T12:12:58.668140Z qemu-kvm: terminating on signal 15 from pid
177876 ()
We have a rerun mechanism, so the VM is started again on another host if
one is available. On the second try the VM does run successfully.

After looking in the debug logs (attached in the bug), I have seen this on
the first host:
2020-06-09 12:12:46.288+: 177879: debug :
qemuSetupImageCgroupInternal:139 : Not updating cgroups for disk path
'', type: file
2020-06-09 12:12:46.288+: 177879: debug : qemuSetupImagePathCgroup:75 :
Allow path
/rhev/data-center/3b67fb92-906b-11ea-bb36-482ae35a5f83/4fd23357-6047-46c9-aa81-ba6a12a9e8bd/images/0191
384a-3e0a-472f-a889-d95622cb6916/7f553f44-db08-480e-8c86-cbdeccedfafe,
perms: rw

*2020-06-09 12:12:46.288+: 177879: debug : qemuSetupImagePathCgroup:75
: Allow path
/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_GE_compute-ge-4_nfs__0/4fd23357-6047-46c9-aa81-ba6a12a9e8bd/images/0191384a-3e0a-472f-a889-d95622cb6916/7f553f44-db08-480e-8c86-cbdeccedfafe,
perms: r*

And immediately after retry on the second host:
2020-06-09 12:13:01.839+: 15781: debug : qemuSetupImagePathCgroup:75 :
Allow path 
/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_GE_compute-ge-4_nfs__0/4fd23357-60
47-46c9-aa81-ba6a12a9e8bd/images/0191384a-3e0a-472f-a889-d95622cb6916/7f553f44-db08-480e-8c86-cbdeccedfafe,
perms: rw

Is this intended, and might it be causing the QEMU error?

Regards,
Liran.


Disable virtlockd

2020-06-09 Thread Felix Queißner
Hello!

Is it possible to disable the virtlockd daemon or VM file locking? I
start qemu with a -snapshot option which prevents any changes to the
disk image anyways.
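(Passed in via the qemu XML namespace, roughly like this -- a trimmed
sketch, not my full domain XML:)

    <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
      ...
      <qemu:commandline>
        <qemu:arg value='-snapshot'/>
      </qemu:commandline>
    </domain>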

Using  is not supported for IDE disks.

Another option would be to not require locking on the NFS share, but I
have no idea how.

Can someone help me with that?

Regards
Felix Queißner