Re: How to configure the default connection for virsh

2024-04-18 Thread Nir Soffer
On Thu, Apr 18, 2024 at 5:20 PM Daniel P. Berrangé 
wrote:

> On Thu, Apr 18, 2024 at 05:07:24PM +0300, Nir Soffer wrote:
> > We are using minikube vms, which require adding the user to the libvirt
> > group[1].
> > We use `virsh -c qemu:///system` for debugging these vms or related
> libvirt
> > networks.
> >
> > Using virsh without specifying the '-c' argument is a common mistake that
> > leads to
> > trouble, for example creating the libvirt default network incorrectly.
> >
> > I wonder if it is possible to configure virsh so it uses -c
> qemu:///system
> > by default?
>
> The $HOME/.config/libvirt/libvirt.conf client config file can contain
>
>    uri_default = "qemu:///system"
>
> Or you can set per shell with
>
>    export LIBVIRT_DEFAULT_URI=qemu:///system
>
> Or you can set it just for virsh with
>
>    export VIRSH_DEFAULT_CONNECT_URI=qemu:///system
>
> See:
>
>   https://libvirt.org/uri.html#default-uri-choice
>   https://libvirt.org/manpages/virsh.html#environment
>
>
Awesome, thanks!
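
For reference, a minimal way to make this persistent for the current user,
using the client config file mentioned above:

$ mkdir -p ~/.config/libvirt
$ echo 'uri_default = "qemu:///system"' >> ~/.config/libvirt/libvirt.conf
$ virsh uri
qemu:///system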


How to configure the default connection for virsh

2024-04-18 Thread Nir Soffer
We are using minikube vms, which require adding the user to the libvirt
group[1].
We use `virsh -c qemu:///system` for debugging these vms or related libvirt
networks.

Using virsh without specifying the '-c' argument is a common mistake that
leads to
trouble, for example creating the libvirt default network incorrectly.

I wonder if it is possible to configure virsh so it uses -c qemu:///system
by default?

We know that this is not great, but we don't know a better way to use minikube
with the kvm2 driver. The minikube vms are started when creating a development
environment and during CI, and they cannot require a password or any
complicated setup.

[1] https://github.com/kubernetes/minikube/issues/3467

Nir


[ovirt-users] Re: virt-v2v paused by system after one hour or a bit more

2024-03-21 Thread Nir Soffer
On Thu, Mar 21, 2024 at 7:03 PM Cyril VINH-TUNG  wrote:

> Hello
>
> Here's the technique we use :
> - create manually the vm on ovirt with same disks (same size that original
> but you can choose target type, thin provision or preallocated)
> - on any node, force activating the disks to make them writable at the os
> level (lvm, vgchange...)
> - if the disk type is the same on target and destination, you can use dd
> over netcat to copy the disks
> - if the type is not the same, you might use qemu-img convert over netcat
>

This is very fragile and dangerous; you must really know what you are doing
:-)

If you already imported the disks from the other system, uploading them to any
storage domain, in any supported image format and allocation policy, is much
easier and faster using ovirt-img.

See https://www.ovirt.org/media/ovirt-img-v8.pdf
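
For example, uploading an imported image looks roughly like this (the config
section name and flags here are from memory and may not be exact; the PDF
above documents the actual options and the ~/.config/ovirt-img.conf setup):

$ ovirt-img upload-disk -c myengine --storage-domain my-domain disk.qcow2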


>
> If you have physical access to the node, you might use a flat backup
>
> Another workaround is to backup/restore the vm with a backup tool that
> works both with vmware and ovirt... I would say vprotect or vinchin
>

This should also work, but when importing vms from vmware, it is not enough
to copy the disk data as is; you want to modify it to remove the vmware bits
and add the ovirt bits - virt-v2v does all this for you.

Nir


[ovirt-users] Re: virt-v2v paused by system after one hour or a bit more

2024-03-21 Thread Nir Soffer
On Thu, Mar 21, 2024 at 12:44 PM Claus Serbe via Users 
wrote:

> Hi,
>
> I am migrating some vmware VM's from an NFS Storage via rhv-upload in
> virt-v2v, what is working good.
>
> But now I try to move some bigger VM's with several disks and sadly after
> a while (I would guess around an hour) the Ovirt-engine shows me "Paused by
> system" instead of transfering, so when the next disk should be imported,
> it will fail
>
> In the ovirt-engine.log I see the following lines for the remaining 4
> disks.
>
> 2024-03-21 06:14:06,815-04 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-35)
> [f61b3906-804d-470f-8524-6507081fbdec] EVENT_ID:
> UPLOAD_IMAGE_PAUSED_BY_SYSTEM_TIMEOUT(1,071), Upload was paused by system.
> Reason: timeout due to transfer inactivity.
> 2024-03-21 06:14:17,915-04 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-14)
> [aef8e312-d811-4a39-b5fb-342157209bce] EVENT_ID:
> UPLOAD_IMAGE_PAUSED_BY_SYSTEM_TIMEOUT(1,071), Upload was paused by system.
> Reason: timeout due to transfer inactivity.
> 2024-03-21 06:14:24,959-04 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-85)
> [860b012d-78a4-49f8-a875-52f4299c8298] EVENT_ID:
> UPLOAD_IMAGE_PAUSED_BY_SYSTEM_TIMEOUT(1,071), Upload was paused by system.
> Reason: timeout due to transfer inactivity.
> 2024-03-21 06:14:46,099-04 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-65)
> [f93869ee-2ecb-4f54-b3e9-b12259637b0b] EVENT_ID:
> UPLOAD_IMAGE_PAUSED_BY_SYSTEM_TIMEOUT(1,071), Upload was paused by system.
> Reason: timeout due to transfer inactivity.
>
>
> There are 2 strange things.
>
> 1. When I start virt-v2v it will create all 6 disks and set them to
> transferring, but virt-v2v will import one after the other, what leads to
> kind of unused/timing out transferring tickets.
>

This is an incorrect usage of the ovirt-imageio API in virt-v2v, please
report it here:
https://github.com/libguestfs/virt-v2v/issues

The right way to use the API is:
1. Start transfer
2. Upload the data
3. End the transfer

It does not matter if you create all the disks at the start of the operation,
but starting a transfer must be done right before you upload the data.
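
For illustration, the lifecycle via the REST API looks roughly like this (the
engine URL, IDs, and payloads are made-up placeholders, not exact syntax; the
ovirt-engine-sdk upload examples show the complete flow):

# 1. Start the transfer for one disk
$ curl -k -u admin@internal:PASSWORD -X POST -H 'Content-Type: application/xml' \
    -d '<image_transfer><disk id="DISK-UUID"/><direction>upload</direction></image_transfer>' \
    https://ENGINE/ovirt-engine/api/imagetransfers
# 2. Upload the data to the transfer_url returned in the response
$ curl -k -X PUT --upload-file disk.qcow2 'TRANSFER-URL'
# 3. End the transfer
$ curl -k -u admin@internal:PASSWORD -X POST -H 'Content-Type: application/xml' \
    -d '<action/>' https://ENGINE/ovirt-engine/api/imagetransfers/TRANSFER-UUID/finalize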


> 2. When I copy the disk images to a local disk before, it works. Maybe
> just because of faster transfer speeds.
>
> Is there a possibility to transfer parallel or maybe extend the timeout?
>

Sure, you can upload in parallel, but I'm not sure about virt-v2v; it will
probably have issues importing in parallel from other systems (e.g. vmware
would not allow this).

We tested uploading 10 100g images in parallel using the ovirt-img tool,
each upload using 4 connections
(total of 40 upload connections).

You may be able to extend the timeout, but this is not recommended since
your system will not clean up quickly
after a bad client disconnects uncleanly without ending the transfer.
Unfortunately I don't remember which
timeout should be modified on the engine side, maybe Arik or Albert can
help with this.

Nir


[ovirt-users] Re: Create Vm without Storage Domain

2024-03-20 Thread Nir Soffer
On Wed, Mar 20, 2024 at 12:06 PM Shafi Mohammed 
wrote:

> Hi Guys ,
>
> Thanks for the Info .
>
> I am trying to migrate my data from local to Storage domain . But it
> requires twice the effort in terms of space and time to copy the data to
> the local file system and then upload the disk to it .
>
> I created a storage domain and tried to expose it to the NBD server to
> write the data from the source Vm disk . But I'm facing an nbd
> permission issue  . Even a copy or write is restricted .
>
> Actual Command
> qemu-nbd -f qcow2 /rhev/data-center/mnt/192.168.108.27:
> _storage/d62b04f8-973f-4168-9e69-1f334a4968b6/images/cd6b9fc4-ef8a-4f40-ac8d-f18d355223d0/d2a57e6f-029e-46cc-85f8-ce151d027dcb
>
>
>
> *qemu-nbd: Failed to blk_new_open
> '/rhev/data-center/mnt/192.168.108.27:_storage/d62b04f8-973f-4168-9e69-1f334a4968b6/images/cd6b9fc4-ef8a-4f40-ac8d-f18d355223d0/d2a57e6f-029e-46cc-85f8-ce151d027dcb':
> Could not open
> '/rhev/data-center/mnt/192.168.108.27:_storage/d62b04f8-973f-4168-9e69-1f334a4968b6/images/cd6b9fc4-ef8a-4f40-ac8d-f18d355223d0/d2a57e6f-029e-46cc-85f8-ce151d027dcb':
> Permission denied*
> Please suggest me if Ovirt has any API  to open a disk from a
> storage domain and write data to it
>

You are trying to reinvent ovirt-img. Try it:
https://www.ovirt.org/media/ovirt-img-v8.pdf


Re: Export checkpoint/bitmap from image on qcow2

2024-01-02 Thread Nir Soffer
On Fri, Dec 22, 2023 at 3:35 PM João Jandre Paraquetti
 wrote:
>
> Hello Nir,
>
> Thank you for your detailed response, but this process does seem a bit
> too complex.
>
> I've been thinking, wouldn't it be easier to use the process below?
>   - Create a paused transient dummy VM;
>   - Create the necessary checkpoints on the VM using dumps from the
> original VM;

Not sure what you mean by dumps from the original vm...

>   - Use Libvirt's virDomainBackupBegin to create the incremental backup;
>   - Destroy the dummy VM.
>
> What do you think?

It should work and the backup code will be simpler, since you use the
libvirt APIs for both
live and "cold" backup.

But creating the vm in paused mode and integrating this new mode into
your system can
be more complicated than the cold backup code, depending on your system.

Some issues you will have to think about:
- when a vm is running in the backup mode, does it consume resources
(e.g. memory) that it does not need?
- how do you switch to normal mode if a user wants to start a vm in the
middle of a backup? (full backup of a large vm can take hours)
- can you migrate a vm during backup?

In oVirt we went the other way - we never use live backup - instead, we do
this for every backup:

1. create a snapshot
2. backup the vm using the snapshot (instead of the active image)
3. delete the snapshot

With this, both live and cold backup are the same, and we avoid the issue
where a long backup (e.g. full backup, or incremental backup with a lot of
data) blocks usage of the vm during the backup. You can start/stop/migrate a
vm while it is being backed up, and the backup can run on another host in the
cluster (if using shared storage).

Nir

> On 12/18/23 18:44, Nir Soffer wrote:
> > On Thu, Nov 30, 2023 at 4:14 PM João Jandre Paraquetti
> >  wrote:
> >> Hi, all
> >>
> >> I recently started looking into how to create incremental backups using
> >> Libvirt+Qemu. I have already found that, since Libvirt 7.6.0, we can use
> >> the virDomainBackupBegin API to create incremental backups of live VMs
> >> through Libvirt. Using this API we can back up the VM's disks and create
> >> a checkpoint at the same time. After the first full backup, we can
> >> create incremental backups referencing the checkpoint that was generated
> >> on the previous backup process.
> >>
> >> As far as I understood Libvirt's documentation, when using this API to
> >> create a checkpoint, what happens is that Libvirt will create a new
> >> bitmap on the VM's disk to track the changes that happened from that
> >> checkpoint on, then, they'll copy what changed between the last bitmap
> >> and the new one. By doing this, Libvirt is able to create incremental
> >> backups of disks that are used by running VMs.
> >>
> >> My problem is the following though: after I stop the VM, I'm no longer
> >> able to use Libvirts API as it requires the VM (domain) to be running.
> >> However, there might still be changes on the disk since the last backup
> >> I performed. Therefore, to create a backup of the changes made since the
> >> last backup until the VM was stopped, I have to work directly on the VM
> >> image using Qemu. I have checked, and the QCOW2 file has the checkpoint
> >> created with the Libvirt backup process (I checked via the "qemu-img
> >> info"). Therefore, we have the marker of the last checkpoint generated
> >> by the last backup process. However, if the VM is stopped, there is no
> >> clear way to export this last differencial copy of the volume.
> >>
> >> I have searched through the documentation about this; however, I haven't
> >> found any example or explanation on how to manually export a
> >> differencial copy of the QCOW2 file using the last checkpoint (bitmap)
> >> created.
> >>
> >> I would much appreaciate, if people could point me to the direction on
> >> how to export differencial copies of a QCOW2 file, using the checkpoints
> >> (bitmaps) that it has.
> > Backup is complicated and a lot of hard work. Before you reinvent the
> > wheel, check
> > if this project works for you:
> > https://github.com/abbbi/virtnbdbackup
> >
> > If you need to create your own solution or want to understand how backup
> > works
> > with bitmaps, here is how it works.
> >
> > The basic idea is - you start an nbd server exporting the dirty
> > bitmap, so the client can
> > get the dirty extents using NBD_BLOCK_STATUS and copy the dirty clusters.
> >
> > To start an nbd server you have 2 options:
> >

Re: Export checkpoint/bitmap from image on qcow2

2023-12-18 Thread Nir Soffer
On Thu, Nov 30, 2023 at 4:14 PM João Jandre Paraquetti
 wrote:
>
> Hi, all
>
> I recently started looking into how to create incremental backups using
> Libvirt+Qemu. I have already found that, since Libvirt 7.6.0, we can use
> the virDomainBackupBegin API to create incremental backups of live VMs
> through Libvirt. Using this API we can back up the VM's disks and create
> a checkpoint at the same time. After the first full backup, we can
> create incremental backups referencing the checkpoint that was generated
> on the previous backup process.
>
> As far as I understood Libvirt's documentation, when using this API to
> create a checkpoint, what happens is that Libvirt will create a new
> bitmap on the VM's disk to track the changes that happened from that
> checkpoint on, then, they'll copy what changed between the last bitmap
> and the new one. By doing this, Libvirt is able to create incremental
> backups of disks that are used by running VMs.
>
> My problem is the following though: after I stop the VM, I'm no longer
> able to use Libvirts API as it requires the VM (domain) to be running.
> However, there might still be changes on the disk since the last backup
> I performed. Therefore, to create a backup of the changes made since the
> last backup until the VM was stopped, I have to work directly on the VM
> image using Qemu. I have checked, and the QCOW2 file has the checkpoint
> created with the Libvirt backup process (I checked via the "qemu-img
> info"). Therefore, we have the marker of the last checkpoint generated
> by the last backup process. However, if the VM is stopped, there is no
> clear way to export this last differencial copy of the volume.
>
> I have searched through the documentation about this; however, I haven't
> found any example or explanation on how to manually export a
> differencial copy of the QCOW2 file using the last checkpoint (bitmap)
> created.
>
> I would much appreaciate, if people could point me to the direction on
> how to export differencial copies of a QCOW2 file, using the checkpoints
> (bitmaps) that it has.

Backup is complicated and a lot of hard work. Before you reinvent the
wheel, check
if this project works for you:
https://github.com/abbbi/virtnbdbackup

If you need to create your own solution or want to understand how backup works
with bitmaps, here is how it works.

The basic idea is - you start an nbd server exporting the dirty
bitmap, so the client can
get the dirty extents using NBD_BLOCK_STATUS and copy the dirty clusters.

To start an nbd server you have 2 options:

- qemu-nbd using the --bitmap option. vdsm code is a good place to
learn how to do this
  (more about this below). This was the best option a few years ago, but
now you may want
  the second option.

- qemu-storage-daemon - you configure and start the nbd server in the same
  way libvirt does it - using the qmp commands. libvirt code is
probably the best place
  to learn how to do this. This is probably the best option at this
time, unless you are
  using an old qemu version that does not have working qemu-storage-daemon.

If you have a single qcow2 image with the bitmap, you are ready to
export the image.
But you may have a more interesting image when the same bitmap exists
in multiple
snapshots. You can see all the bitmaps and snapshots using:

qemu-img info --backing-chain --output json filename
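
For example, a qcow2 image with one bitmap shows something like this in the
JSON output (abbreviated, illustrative names and values):

    "format-specific": {
        "type": "qcow2",
        "data": {
            "bitmaps": [
                {
                    "name": "checkpoint-1",
                    "granularity": 65536,
                    "flags": ["auto"]
                }
            ],
            ...
        }
    }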

If the bitmap exists on multiple images in the chain, you need to
validate that the bitmap
is valid - it must exist in all images in the chain since the bitmap
was created and must have
the right flags (see vdsm code below for the details).

After validating the bitmap, you need to create a new bitmap merging all the
bits from the same bitmap in all the images in the chain. You have 2 ways to
do this:

- with qemu-nbd: create an overlay on top of the image, create a
bitmap in the overlay, and merge
  the bitmaps from the image chain into this bitmap. Then start
qemu-nbd with the overlay. You can
  create and merge bitmaps using `qemu-img bitmap ...` (see vdsm code
below for the details)

- with qemu-storage-daemon: you can do what libvirt does - create a
new temporary non-persistent bitmap
  (in qemu-storage-daemon memory), and merge the other bitmaps in the
chain using qmp commands.
  (see libvirt source or debug logs for the details).

When the backup bitmap is ready, you can start the nbd server and
configure it to export this bitmap.

When backup is done delete the overlay or the temporary backup bitmap.
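
To make the qemu-nbd flow concrete, a minimal sketch (the file, bitmap, and
checkpoint names are examples; the vdsm code below has the real logic,
including merging from every image in the chain that has the bitmap):

# Temporary overlay on top of the image chain
$ qemu-img create -f qcow2 -b disk.qcow2 -F qcow2 overlay.qcow2

# Create the backup bitmap in the overlay and merge the checkpoint
# bitmap from the chain into it
$ qemu-img bitmap --add overlay.qcow2 backup-bitmap
$ qemu-img bitmap --merge checkpoint-1 -b disk.qcow2 -F qcow2 \
    overlay.qcow2 backup-bitmap

# Export the overlay; the bitmap is exposed to clients as the
# "qemu:dirty-bitmap:backup-bitmap" metadata context
$ qemu-nbd --read-only --persistent --socket /tmp/backup.sock \
    --format qcow2 --bitmap backup-bitmap overlay.qcow2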

You can check how vdsm does this using qemu-nbd here:
https://github.com/oVirt/vdsm/blob/0fc22ff0b81d605f10a2bc67309e119b7462b506/lib/vdsm/storage/nbd.py#L97

The steps:

1. Finding the images with the bitmap and validating the bitmap
   
https://github.com/oVirt/vdsm/blob/0fc22ff0b81d605f10a2bc67309e119b7462b506/lib/vdsm/storage/nbd.py#L217

2. Creating overlay with a backup bitmap
   

[ovirt-devel] Re: Yuriy about NVMe over fabrics for oVirt

2023-11-23 Thread Nir Soffer
On Thu, Nov 2, 2023 at 4:40 PM Yuriy  wrote:

> Hi Nir!
>
> This is Yuriy.
> We agreed to continue the subject via email.
>

So the options are:

1. Using Managed Block Storage (cinderlib) with a driver that supports
NVMe/TCP.

Latest oVirt has the needed changes to configure this. Benny and I tested
with the Lightbits[1] driver in a virtualized environment. This is basically a
POC that may work for you or not, or may require more work that you will have
to do yourself, since not much development is happening now in oVirt.

2. Using the devices via multipath

Legacy storage domains are based on multipath. It may be possible to use
multipath on top of
NVMe devices, and in this case they look like a normal LUN so you can
create a storage domain
from such devices.

oVirt will not handle connections for you, and all the devices must be
connected to all nodes at the
same time, just like FC/iSCSI LUNs. You will likely not get the performance
benefit of NVMe/TCP.

3. Using host devices

If what you need is using some devices (which happen to be connected via
NVMe/TCP), maybe you
can attach them to a VM directly (using host devices). This gives the best
possible performance but no
features (snapshots, backup, live migration, live storage migration, etc.)

[1] https://www.lightbitslabs.com/

Nir


Re: Is it normal to get bigger qcow2 image after blockcopy?

2023-11-06 Thread Nir Soffer
On Fri, Nov 3, 2023 at 3:25 AM Fangge Jin  wrote:

>
>
> On Thu, Nov 2, 2023 at 5:13 PM Fangge Jin  wrote:
>
>> Recently, I found that the disk size of qcow2 image get bigger (from 6.16G
>> to 8G in my test) after blockcopy.
>>
> Sorry, it should be "from 6.16G to 6.64G in my test" here
>
>> I'm not sure whether this is normal or not. Please help to check. Thanks.
>>
>>
>> Before blockcopy, check source image:
>
>   # qemu-img info -U
>> /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2
>
>
Was this a compressed qcow2 image when you started? Maybe you started with an
appliance image? These are typically compressed. When data is modified, new
clusters are stored uncompressed, but data that was never modified on the
original disk remains compressed.


>
>> After blockcopy, check target image:
>>   # qemu-img info -U /var/lib/avocado/data/avocado-vt/images/copy.qcow2
>>
>>
>
This image is not compressed, based on the QMP commands from the libvirt log
quoted below.
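
One way to confirm this (illustrative) is to recompress the copy and compare
the allocated size reported by qemu-img info:

$ qemu-img convert -O qcow2 -c copy.qcow2 copy-compressed.qcow2
$ qemu-img info copy.qcow2 | grep 'disk size'
$ qemu-img info copy-compressed.qcow2 | grep 'disk size'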


> Qemu command line:
>>   -blockdev
>> '{"driver":"file","filename":"/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}'
>> \
>>   -blockdev
>> '{"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null}'
>> \
>>   -device
>> '{"driver":"virtio-blk-pci","bus":"pci.4","addr":"0x0","drive":"libvirt-1-format","id":"virtio-disk0","bootindex":1}'
>> \
>>
>> Qemu monitor command grepped from libvirt log:
>>
>> {"execute":"blockdev-add","arguments":{"driver":"file","filename":"/var/lib/avocado/data/avocado-vt/images/copy.qcow2","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"},"id":"libvirt-429"}
>>
>>
>> {"execute":"blockdev-create","arguments":{"job-id":"create-libvirt-2-format","options":{"driver":"qcow2","file":"libvirt-2-storage","size":10737418240,"cluster-size":65536}},"id":"libvirt-430"}
>>
>>
>> {"execute":"job-dismiss","arguments":{"id":"create-libvirt-2-format"},"id":"libvirt-432"}
>>
>>
>> {"execute":"blockdev-add","arguments":{"node-name":"libvirt-2-format","read-only":false,"driver":"qcow2","file":"libvirt-2-storage","backing":null},"id":"libvirt-433"}
>>
>>
>> {"execute":"blockdev-mirror","arguments":{"job-id":"copy-vda-libvirt-1-format","device":"libvirt-1-format","target":"libvirt-2-format","sync":"full","auto-finalize":true,"auto-dismiss":false},"id":"libvirt-434"}
>>
>>
>> {"execute":"transaction","arguments":{"actions":[{"type":"block-dirty-bitmap-add","data":{"node":"libvirt-2-format","name":"libvirt-tmp-activewrite","persistent":false,"disabled":false}}]},"id":"libvirt-443"}
>>
>>
>> {"execute":"job-complete","arguments":{"id":"copy-vda-libvirt-1-format"},"id":"libvirt-444"}
>>
>>
>
 Nir


Re: [PATCH 1/1] block: improve alignment detection and fix 271 test

2023-10-14 Thread Nir Soffer
On Fri, Sep 8, 2023 at 12:54 AM Denis V. Lunev  wrote:

> Unfortunately 271 IO test is broken if started in non-cached mode.
>

Is this a real world issue? For example in oVirt you cannot create a disk
with
size < 4k so there is no way that 4k is not a good alignment.

Should we fix the test to reflect real world usage?

_reset_img 2083k

I guess it works with:

_reset_img 2084k

Commits
> commit a6b257a08e3d72219f03e461a52152672fec0612
>     Author: Nir Soffer 
> Date:   Tue Aug 13 21:21:03 2019 +0300
> file-posix: Handle undetectable alignment
> and
> commit 9c60a5d1978e6dcf85c0e01b50e6f7f54ca09104
> Author: Kevin Wolf 
> Date:   Thu Jul 16 16:26:00 2020 +0200
> block: Require aligned image size to avoid assertion failure
> have interesting side effect if used togather.
>
> If the image size is not multiple of 4k and that image falls under
> original constraints of Nil's patch, the image can not be opened
> due to the check in the bdrv_check_perm().
>
> The patch tries to satisfy the requirements of bdrv_check_perm()
> inside raw_probe_alignment(). This is at my opinion better that just
> disallowing to run that test in non-cached mode. The operation is legal
> by itself.
>
> Signed-off-by: Denis V. Lunev 
> CC: Nir Soffer 
> CC: Kevin Wolf 
> CC: Hanna Reitz 
> CC: Alberto Garcia 
> ---
>  block/file-posix.c | 17 +++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index b16e9c21a1..988cfdc76c 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -447,8 +447,21 @@ static void raw_probe_alignment(BlockDriverState *bs,
> int fd, Error **errp)
>  for (i = 0; i < ARRAY_SIZE(alignments); i++) {
>  align = alignments[i];
>  if (raw_is_io_aligned(fd, buf, align)) {
> -/* Fallback to safe value. */
> -bs->bl.request_alignment = (align != 1) ? align :
> max_align;
> +if (align != 1) {
> +bs->bl.request_alignment = align;
> +break;
> +}
> +/*
> + * Fallback to safe value. max_align is perfect, but the
> size of the device must be multiple of
> + * the virtual length of the device. In the other case we
> will get a error in
> + * bdrv_node_refresh_perm().
> + */
> +for (align = max_align; align > 1; align /= 2) {
> +if ((bs->total_sectors * BDRV_SECTOR_SIZE) % align ==
> 0) {
>

Moving image size calculation out of the loop would make the intent of the
code
more clear:

if (image_size % align == 0) {

Since qemu does not enforce image size alignment, I can see how you create
a 512 bytes
aligned image and in the case when qemu cannot detect the alignment, we end
with
align = 4k. In this case this loop would select align = 512, but with the
image aligned to
some strange value, this loop may select align = 2 or some other value that
does not
make sense.

So I can see using 4k or 512 bytes as a good fallback value, but anything
else should not
be possible, so maybe we should fix this in bdrv_check_perm()?

Nir


[ovirt-users] Re: How to obtain vm snapshots status

2023-10-03 Thread Nir Soffer
On Tue, Sep 26, 2023 at 9:07 PM anton.alymov--- via Users 
wrote:

> Hi! I use ovirt rest api to start vm, backup vm and then remove vm.
> I start vm, wait for vmstatus up, then  start backup, wait for starting,
> finalize, wait for succeeded, wait for disk unlock. Looks like backup is
> finished here from my side, because ovirt reports succeeded status and unlocks
> disk. But if i try shutdown and remove vm ovirt will throw error  Cannot
> remove VM. The VM is performing an operation on a Snapshot. Please wait for
> the operation to finish, and try again.
> Ok, ovirt is right here, I see from web interface that operation hasn't
> finished yet. How can I obtain correct status where vm can be removed? I
> also tried to get info about vm snapshots but all of them had Status: ok
>

This sounds similar to ovirt stress delete-snapshot and backup tests.

Please check here how to use the ovirt python sdk to create/delete/backup
and wait for events:
https://github.com/ovirt/ovirt-stress

Nir


Re: [Libguestfs] [libnbd PATCH] golang: Bump minimum Go version to 1.17

2023-08-17 Thread Nir Soffer
On Tue, Aug 15, 2023 at 9:53 PM Eric Blake  wrote:

> On Mon, Aug 14, 2023 at 01:43:37PM -0500, Eric Blake wrote:
> > > > +++ b/golang/configure/test.go
> > > > @@ -25,8 +25,19 @@
> > > >  import (
> > > > "fmt"
> > > > "runtime"
> > > > +   "unsafe"
> > > >  )
> > > >
> > > > +func check_slice(arr *uint32, cnt int) []uint32 {
> > > > +   /* We require unsafe.Slice(), introduced in 1.17 */
> > > > +   ret := make([]uint32, cnt)
> > > > +   s := unsafe.Slice(arr, cnt)
> > > > +   for i, item := range s {
> > > > +   ret[i] = uint32(item)
> > > > +   }
> > > > +   return ret
> > > > +}
> > > >
> > >
> > > I'm not sure what is the purpose of this test - requiring the Go
> version is
> > > good
> > > enough since the code will not compile with an older version. Even if
> it
> > > would,
> > > it will not compile without unsafe.Slice so no special check is needed.
>
> Turns out it does matter.  On our CI system, Ubuntu 20.04 has Go
> 1.13.8 installed, and without this feature test, it compiled just fine
> (it wasn't until later versions of Go that go.mod's version request
> causes a compile failure if not satisfied).
>

How does it compile when unsafe.Slice is undefined?

Quick test with unrelated test app:

$ go build; echo $?
# cobra-test
./main.go:10:6: undefined: cmd.NoSuchMethod
1

Or do you mean the compile test for configure works and we want to make
the configure test fail to compile?

https://gitlab.com/nbdkit/libnbd/-/jobs/4870816575
>
> But while investigating that, I also noticed that libvirt-ci just
> recently dropped all Debian 10 support in favor of Debian 12, so I'm
> working on updating ci/manifest.yml to match.
>

Sounds like a good idea


Re: [Libguestfs] [libnbd PATCH] golang: Bump minimum Go version to 1.17

2023-08-13 Thread Nir Soffer
On Sat, Aug 12, 2023 at 12:18 AM Eric Blake  wrote:

> Go 1.17 or newer is required to use unsafe.Slice(), which in turn
> allows us to write a simpler conversion from a C array to a Go object
> during callbacks.
>
> To check if this makes sense, look at
> https://repology.org/project/go/versions compared to our list in
> ci/manifest.yml, at the time I made this commit:
>
> Alpine 3.15: 1.17.10
> AlmaLinux 8: 1.19.10
> CentOS Stream 8: 1.20.4
> Debian 10: 1.11.6
> Debian 11: 1.15.15 (mainline), 1.19.8 (backports)
> Debian 12: 1.19.8
> Fedora 36: 1.19.8
> FreeBSD Ports: 1.20.7
> OpenSUSE Leap 15.3: 1.16.3
> OpenSUSE Leap 15.4: 1.18.1
> Ubuntu 18.04: 1.18.1
>
> We previously required a minimum of 1.13 for module support, which
> means Debian 10 was already not supporting Go bindings.  OpenSUSE Leap
> 15.3 loses support, but is relatively old these days.  All other
> systems appear unaffected by this bump in requirements, at least if
> they can be configured to use developer backports.
>
> Suggested-by: Nir Soffer 
> Signed-off-by: Eric Blake 
> ---
>
> This replaces
> https://listman.redhat.com/archives/libguestfs/2023-August/032227.html
>
>  generator/GoLang.ml  |  8 
>  README.md|  2 +-
>  golang/configure/go.mod  |  4 ++--
>  golang/configure/test.go | 11 +++
>  golang/go.mod|  4 ++--
>  5 files changed, 20 insertions(+), 9 deletions(-)
>
> diff --git a/generator/GoLang.ml b/generator/GoLang.ml
> index 73df5254..55ff1b8a 100644
> --- a/generator/GoLang.ml
> +++ b/generator/GoLang.ml
> @@ -517,10 +517,10 @@ let
>
>  func copy_uint32_array(entries *C.uint32_t, count C.size_t) []uint32 {
>  ret := make([]uint32, int(count))
> -// See
> https://github.com/golang/go/wiki/cgo#turning-c-arrays-into-go-slices
> -// TODO: Use unsafe.Slice() when we require Go 1.17.
> -s := (*[1 << 30]uint32)(unsafe.Pointer(entries))[:count:count]
>

We have another instance of this pattern in AioBuffer.Slice


> -copy(ret, s)
> +s := unsafe.Slice(entries, count)
> +for i, item := range s {
> +ret[i] = uint32(item)
> +}
>  return ret
>  }
>  ";
> diff --git a/README.md b/README.md
> index c7166613..8524038e 100644
> --- a/README.md
> +++ b/README.md
> @@ -105,7 +105,7 @@ ## Building from source
>  * Python >= 3.3 to build the Python 3 bindings and NBD shell (nbdsh).
>  * FUSE 3 to build the nbdfuse program.
>  * Linux >= 6.0 and ublksrv library to build nbdublk program.
> -* go and cgo, for compiling the golang bindings and tests.
> +* go and cgo >= 1.17, for compiling the golang bindings and tests.
>  * bash-completion >= 1.99 for tab completion.
>
>  Optional, only needed to run the test suite:
> diff --git a/golang/configure/go.mod b/golang/configure/go.mod
> index ce3e4f39..fcdb28db 100644
> --- a/golang/configure/go.mod
> +++ b/golang/configure/go.mod
> @@ -1,4 +1,4 @@
>  module libguestfs.org/configure
>
> -// First version of golang with working module support.
> -go 1.13
> +// First version of golang with working module support and unsafe.Slice.
>

"First version of golang with working module support" is not relevant for
1.17,
maybe "For unsafe.Slice"?


> +go 1.17
> diff --git a/golang/configure/test.go b/golang/configure/test.go
> index fe742f2b..a15c9ea3 100644
> --- a/golang/configure/test.go
> +++ b/golang/configure/test.go
> @@ -25,8 +25,19 @@
>  import (
> "fmt"
> "runtime"
> +   "unsafe"
>  )
>
> +func check_slice(arr *uint32, cnt int) []uint32 {
> +   /* We require unsafe.Slice(), introduced in 1.17 */
> +   ret := make([]uint32, cnt)
> +   s := unsafe.Slice(arr, cnt)
> +   for i, item := range s {
> +   ret[i] = uint32(item)
> +   }
> +   return ret
> +}
>

I'm not sure what is the purpose of this test - requiring the Go version is
good
enough since the code will not compile with an older version. Even if it
would,
it will not compile without unsafe.Slice so no special check is needed.


> +
>  func main() {
> fmt.Println(runtime.Version())
>
> diff --git a/golang/go.mod b/golang/go.mod
> index fc772840..1b72e77d 100644
> --- a/golang/go.mod
> +++ b/golang/go.mod
> @@ -1,4 +1,4 @@
>  module libguestfs.org/libnbd
>
> -// First version of golang with working module support.
> -go 1.13
> +// First version of golang with working module support and unsafe.Slice.
> +go 1.17
> --
> 2.41.0
>
>


Re: [Libguestfs] [libnbd PATCH v4 05/25] golang: Change logic of copy_uint32_array

2023-08-08 Thread Nir Soffer
On Thu, Aug 3, 2023 at 4:57 AM Eric Blake  wrote:
>
> Commit 6725fa0e12 changed copy_uint32_array() to utilize a Go hack for
> accessing a C array as a Go slice in order to potentially benefit from
> any optimizations in Go's copy() for bulk transfer of memory over
> naive one-at-a-time iteration.  But that commit also acknowledged that
> no benchmark timings were performed, which would have been useful to
> demonstrate an actual benefit for using the hack in the first place.  And

Why do you call this a hack? This is the documented way to create a Go
slice from memory.

> since we are copying data anyways (rather than using the slice to
> avoid a copy), and network transmission costs have a higher impact to
> performance than in-memory copying speed, it's hard to justify keeping
> the hack without hard data.

Since this is not a hack we don't need to justify it :-)

> What's more, while using Go's copy() on an array of C uint32_t makes
> sense for 32-bit extents, our corresponding 64-bit code uses a struct
> which does not map as nicely to Go's copy().

If we return a slice of the C extent type, copy() can work, but it is probably
not what we want to return.

> Using a common style
> between both list copying helpers is beneficial to maintenance.
>
> Additionally, at face value, converting C.size_t to int may truncate;
> we could avoid that risk if we were to uniformly use uint64 instead of
> int.  But we can equally just panic if the count is oversized: our
> state machine guarantees that the server's response fits within 64M
> bytes (count will be smaller than that, since it is multiple bytes per
> extent entry).

Good to check this, but not related to changing the way we copy the array.

> Suggested-by: Laszlo Ersek 
> CC: Nir Soffer 
> Signed-off-by: Eric Blake 
> ---
>
> v4: new patch to the series, but previously posted as part of the
> golang cleanups.  Since then: rework the commit message as it is no
> longer a true revert, and add a panic() if count exceeds expected
> bounds.
> ---
>  generator/GoLang.ml | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/generator/GoLang.ml b/generator/GoLang.ml
> index 73df5254..cc7d78b6 100644
> --- a/generator/GoLang.ml
> +++ b/generator/GoLang.ml
> @@ -516,11 +516,16 @@ let
>  /* Closures. */
>
>  func copy_uint32_array(entries *C.uint32_t, count C.size_t) []uint32 {
> +if (uint64(count) > 64*1024*1024) {
> +panic(\"violation of state machine guarantee\")

This is unwanted in a library, it means the entire application will crash
because of a bug in the library. Can we convert this to an error in the caller?

> +}
>  ret := make([]uint32, int(count))
> -// See 
> https://github.com/golang/go/wiki/cgo#turning-c-arrays-into-go-slices
> -// TODO: Use unsafe.Slice() when we require Go 1.17.
> -s := (*[1 << 30]uint32)(unsafe.Pointer(entries))[:count:count]

Can we require Go 1.17? (current version is 1.20)

In Go >= 1.17, we can use something like:

s := unsafe.Slice(entries, count)

> -copy(ret, s)
> +addr := uintptr(unsafe.Pointer(entries))
> +for i := 0; i < int(count); i++ {
> +ptr := (*C.uint32_t)(unsafe.Pointer(addr))
> +ret[i] = uint32(*ptr)
> +addr += unsafe.Sizeof(*ptr)
> +}

This loop is worse than the ugly line creating a slice.
With a slice we can do:

for i, item := range s {
ret[i] = uint32(item)
}

(I did not try to compile this)



[ovirt-users] Re: python sdk4 ovirt 4.5.5.0 master

2023-07-19 Thread Nir Soffer
On Mon, Jul 17, 2023 at 6:29 PM Jorge Visentini
 wrote:
>
> Hi.
>
> I am testing oVirt 4.5.5-0.master.20230712143502.git07e865d650.el8.
>
> I missed the python scripts to download and upload discs and images... Will 
> it still be possible to use them or should I consider using Ansible?

See 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/EUSHCZXXZD4HHBQ64OEVXN5FE7SJT7R6/


Re: [Libguestfs] Libnbd asynchronous API with epoll

2023-07-09 Thread Nir Soffer
On Fri, Jul 7, 2023 at 11:59 AM Tage Johansson 
wrote:

> On 7/6/2023 7:06 PM, Nir Soffer wrote:
>
> - After calling for example aio_notify_read(3), can I know that the next
> reading from the file descriptor would block?
>
> No, you have to call again aio_get_direction() and poll again until the
> event happens.
>
> Well, what I mean is:
>
> After calling aio_notify_read, if aio_get_direction returns
> AIO_DIRECTION_READ or AIO_DIRECTION_BOTH, can I know that the reading on
> the file descriptor actually blocked?
>

Yes - it never blocks.


> Or might there be cases when aio_notify_read returns and the next
> direction includes a read and there is still more data to read on the file
> descriptor?
>

Sure it is expected that the socket is readable but more data will be
available
later...


> I guess this is the case, but I must know or the client may hang
> unexpectedly.
>

Libnbd uses non-blocking socket so it will never hang.


Re: [Libguestfs] Libnbd asynchronous API with epoll

2023-07-06 Thread Nir Soffer
On Wed, Jul 5, 2023 at 3:38 PM Tage Johansson 
wrote:

> As part of the Rust bindings for Libnbd, I try to integrate the
> asynchronous (aio_*) functions with Tokio
> , the most used asynchronous runtime
> in Rust. However, in its eventloop, Tokio uses epoll(7) instead of poll(2)
> (which is used internally in Libnbd). The difference is that poll(2) uses
> level-triggered notifications as aposed to epoll(7) which uses
> edge-triggered notifications.
>

According to epoll(7), the section "Level-triggered and edge-triggered" says:

   By contrast, when used as a level-triggered interface (the default,
   when EPOLLET is not specified), epoll is simply a faster poll(2), and
   can be used wherever the latter is used since it shares the same
   semantics.

So you should not have any issue using epoll instead of poll.

> - After calling aio_get_direction(3), can I know that reading/writing
> would actually block?
>
No, this is just the events that libnbd wants to get.

- After calling for example aio_notify_read(3), can I know that the next
> reading from the file descriptor would block?
>
No, you have to call again aio_get_direction() and poll again until the
event happens.

Nir


[PATCH] libvhost-user: Fix update of signalled_used

2023-05-09 Thread Nir Soffer
When we check if a driver needs a signal, we compare:

- used_event: written by the driver each time it consumes an item
- new: current idx written to the used ring, updated by us
- old: last idx we signaled about

We call vring_need_event() which does:

return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);

Previously we updated signalled_used on every check, so old was always
new - 1. Because used_event cannot be bigger than new_idx, this check
becomes (ignoring wrapping):

return new_idx == event_idx + 1;

Since the driver consumes items at the same time the device produces
items, it is very likely (and seen in logs) that the driver used_event
is too far behind new_idx and we don't signal the driver.

With libblkio virtio-blk-vhost-user driver, if the driver does not get a
signal, the libblkio client can hang polling the completion fd. This
is very easy to reproduce on some machines and impossible to reproduce
on others.

Fixed by updating signalled_used only when we signal the driver.
Tested using blkio-bench and libblkio client application that used to
hang randomly without this change.

Buglink: https://gitlab.com/libblkio/libblkio/-/issues/68
Signed-off-by: Nir Soffer 
---
 subprojects/libvhost-user/libvhost-user.c | 23 +--
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/subprojects/libvhost-user/libvhost-user.c 
b/subprojects/libvhost-user/libvhost-user.c
index 8fb61e2df2..5f26d2d378 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -2382,12 +2382,11 @@ vu_queue_empty(VuDev *dev, VuVirtq *vq)
 }
 
 static bool
 vring_notify(VuDev *dev, VuVirtq *vq)
 {
-uint16_t old, new;
-bool v;
+uint16_t old, new, used;
 
 /* We need to expose used array entries before checking used event. */
 smp_mb();
 
 /* Always notify when queue is empty (when feature acknowledge) */
@@ -2398,15 +2397,27 @@ vring_notify(VuDev *dev, VuVirtq *vq)
 
 if (!vu_has_feature(dev, VIRTIO_RING_F_EVENT_IDX)) {
 return !(vring_avail_flags(vq) & VRING_AVAIL_F_NO_INTERRUPT);
 }
 
-v = vq->signalled_used_valid;
-vq->signalled_used_valid = true;
+if (!vq->signalled_used_valid) {
+vq->signalled_used_valid = true;
+vq->signalled_used = vq->used_idx;
+return true;
+}
+
+used = vring_get_used_event(vq);
+new = vq->used_idx;
 old = vq->signalled_used;
-new = vq->signalled_used = vq->used_idx;
-return !v || vring_need_event(vring_get_used_event(vq), new, old);
+
+if (vring_need_event(used, new, old)) {
+vq->signalled_used_valid = true;
+vq->signalled_used = vq->used_idx;
+return true;
+}
+
+return false;
 }
 
 static void _vu_queue_notify(VuDev *dev, VuVirtq *vq, bool sync)
 {
 if (unlikely(dev->broken) ||
-- 
2.40.1




[Libguestfs] [PATCH libnbd v2] README: Document additional packages

2023-04-17 Thread Nir Soffer
When building from git we need autoconf, automake and libtool.

Signed-off-by: Nir Soffer 
---

Changes since v1:
- Remove `,` between package namses (Laszlo)

 README.md | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/README.md b/README.md
index c7166613..7eed0e31 100644
--- a/README.md
+++ b/README.md
@@ -32,10 +32,17 @@ ## License
 very liberal license.
 
 
 ## Building from source
 
+Building from source requires additional packages. On rpm based system
+use:
+
+```
+dnf install autoconf automake libtool
+```
+
 To build from git:
 
 ```
 autoreconf -i
 ./configure
-- 
2.39.2




Re: [Libguestfs] [PATCH libnbd] README: Document additional packages

2023-04-17 Thread Nir Soffer
On Mon, Apr 17, 2023 at 7:38 PM Laszlo Ersek  wrote:
>
> On 4/17/23 18:36, Nir Soffer wrote:
> > When building from git we need autoconf, automake and libtool.
> >
> > Signed-off-by: Nir Soffer 
> > ---
> >  README.md | 7 +++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/README.md b/README.md
> > index c7166613..42a187c0 100644
> > --- a/README.md
> > +++ b/README.md
> > @@ -32,10 +32,17 @@ ## License
> >  very liberal license.
> >
> >
> >  ## Building from source
> >
> > +Building from source requires additional packages. On rpm based system
> > +use:
> > +
> > +```
> > +dnf install autoconf, automake, libtool
> > +```
>
> Are the comma characters intentional?

Copied from the spec, fixing.



[Libguestfs] [PATCH libnbd] README: Document additional packages

2023-04-17 Thread Nir Soffer
When building from git we need autoconf, automake and libtool.

Signed-off-by: Nir Soffer 
---
 README.md | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/README.md b/README.md
index c7166613..42a187c0 100644
--- a/README.md
+++ b/README.md
@@ -32,10 +32,17 @@ ## License
 very liberal license.
 
 
 ## Building from source
 
+Building from source requires additional packages. On rpm based system
+use:
+
+```
+dnf install autoconf, automake, libtool
+```
+
 To build from git:
 
 ```
 autoreconf -i
 ./configure
-- 
2.39.2




Re: [Libguestfs] [PATCH v2 1/6] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length

2023-03-06 Thread Nir Soffer
On Sun, Mar 5, 2023 at 10:42 AM Wouter Verhelst  wrote:
>
> On Fri, Mar 03, 2023 at 04:17:40PM -0600, Eric Blake wrote:
> > On Fri, Dec 16, 2022 at 10:32:01PM +0300, Vladimir Sementsov-Ogievskiy 
> > wrote:
> > > s-o-b line missed.
> >
> > I'm not sure if the NBD project has a strict policy on including one,
> > but I don't mind adding it.
>
> I've never required it, mostly because it's something that I myself
> always forget, too, so, *shrug*.
>
> (if there were a way in git to make it add that automatically, that
> would help; I've looked but haven't found it)

What I'm using in all projects that require signed-off-by is:

$ cat .git/hooks/commit-msg
#!/bin/sh

# Add Signed-off-by trailer.
sob=$(git var GIT_AUTHOR_IDENT | sed -n 's/^\(.*>\).*$/Signed-off-by: \1/p')
git interpret-trailers --in-place --trailer "$sob" "$1"
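
The hook file must be executable for git to run it:

$ chmod +x .git/hooks/commit-msg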

You can also use a pre-commit hook but the commit-msg hook is more
convenient.

And in github you can add the DCO application to the project:
https://github.com/apps/dco

Once installed it will check that all commits are signed off, and
provide helpful error
messages to contributors.

Nir




Re: [Libguestfs] [PATCH] docs: Prefer 'cookie' over 'handle'

2023-03-04 Thread Nir Soffer
On Sat, Mar 4, 2023 at 12:15 AM Eric Blake  wrote:
>
> In libnbd, we quickly learned that distinguishing between 'handle'
> (verb for acting on an object) and 'handle' (noun describing which
> object to act on) could get confusing; we solved it by renaming the
> latter to 'cookie'.  Copy that approach into the NBD spec, and make it
> obvious that a cookie is opaque data from the point of view of the
> server.

Good change, will make it easier to search code.

But the actual text does not make it clear that a cookie is opaque data from
the point of view of the client. Maybe make this more clear?

> Makes no difference to implementations (other than older code
> still using 'handle' may be slightly harder to tie back to the spec).

To avoid confusion with older code that carefully used "handle" to match
the spec, maybe add a note that "cookie" was named "handle" before?

Nir




Re: [Libguestfs] Checksums and other verification

2023-03-02 Thread Nir Soffer
On Thu, Mar 2, 2023 at 10:46 AM Richard W.M. Jones  wrote:
>
> On Mon, Feb 27, 2023 at 07:09:33PM +0200, Nir Soffer wrote:
> > On Mon, Feb 27, 2023 at 6:41 PM Richard W.M. Jones  
> > wrote:
> > > I think it would be more useful if (or in addition) it could compute
> > > the checksum of a stream which is being converted with 'qemu-img
> > > convert'.  Extra points if it can compute the checksum over either the
> > > input or output stream.
> >
> > I thought about this, it could be a filter that you add in the graph
> > that gives you checksum as a side effect of copying. But this requires
> > disabling unordered writes, which is pretty bad for performance.
> >
> > But even if you compute the checksum during a transfer, you want to
> > verify it by reading the transferred data from storage. Once you computed
> > the checksum you can keep it for verifying the same image in the future.
>
> The use-case I have in mind is being able to verify a download when
> you already know the checksum and are copying / converting the image
> in flight.
>
> eg: You are asked to download https://example.com/distro-cloud.qcow2
> with some published checksum and you will on the fly download and
> convert this to raw, but want to verify the checksum (of the qcow2)
> during the conversion step.  (Or at some point, but during the convert
> avoids having to spool the image locally.)

I'm thinking about the same flow. I think the best way to verify is:

1. The remote server publishes a block-checksum of the image
2. The system gets the block-checksum from the server (from http header?)
3. The system pulls data from the server, pushes to the target disk in
the wanted format
4. The system computes a checksum of the target disk

This way you verify the entire pipeline including the storage. If we compute
a checksum during the conversion, we verify only that we got the correct data
from the server.
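
A minimal sketch of steps 3 and 4 plus the comparison, assuming the qcow2
was already pulled to a local file, and that the blksum tool from the
blkhash project is installed and prints sha256sum-like output (file names
and the published digest below are made up):

    import subprocess

    # Published block-checksum of the guest visible content (made up).
    published = "0123456789abcdef..."

    # Step 3: push the pulled qcow2 data to the target disk as raw.
    subprocess.run(
        ["qemu-img", "convert", "-f", "qcow2", "-O", "raw",
         "distro-cloud.qcow2", "distro-cloud.raw"],
        check=True)

    # Step 4: compute the checksum of the target disk and compare.
    out = subprocess.run(
        ["blksum", "distro-cloud.raw"],
        check=True, capture_output=True, text=True).stdout
    if out.split()[0] != published:
        raise RuntimeError("checksum mismatch, transfer or storage is bad")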

If we care only about verifying the transfer from the server, we can compute
the checksum during the download, which is likely to be sequential (so it is
easy to integrate with blkhash).

If we want to validate nbdcopy, it will be much harder to compute a checksum
inside nbdcopy because it does not stream the data in order.

Nir

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs


Re: [Libguestfs] Checksums and other verification

2023-02-28 Thread Nir Soffer
On Tue, Feb 28, 2023 at 4:13 PM Laszlo Ersek  wrote:
>
> On 2/28/23 12:39, Richard W.M. Jones wrote:
> > On Tue, Feb 28, 2023 at 12:24:04PM +0100, Laszlo Ersek wrote:
> >> On 2/27/23 17:44, Richard W.M. Jones wrote:
> >>> On Mon, Feb 27, 2023 at 08:42:23AM -0600, Eric Blake wrote:
>  Or intentionally choose a hash that can be computed out-of-order,
>  such as a Merkle Tree.  But we'd need a standard setup for all
>  parties to agree on how the hash is to be computed and checked, if
>  it is going to be anything more than just a linear hash of the
>  entire guest-visible contents.
> >>>
> >>> Unfortunately I suspect that by far the easiest way for people who
> >>> host images to compute checksums is to run 'shaXXXsum' on them or
> >>> sign them with a GPG signature, rather than engaging in a novel hash
> >>> function.  Indeed that's what is happening now:
> >>>
> >>> https://alt.fedoraproject.org/en/verify.html
> >>
> >> If the output is produced with unordered writes, but the complete
> >> output needs to be verified with a hash *chain*, that still allows
> >> for some level of asynchrony. The start of the hashing need not be
> >> delayed until after the end of output, only after the start of
> >> output.
> >>
> >> For example, nbdcopy could maintain the highest offset up to which
> >> the output is contiguous, and on a separate thread, it could be
> >> hashing the output up to that offset.
> >>
> >> Considering a gigantic output, as yet unassembled blocks could likely
> >> not be buffered in memory (that's why the writes are unordered in the
> >> first place!), so the hashing thread would have to re-read the output
> >> via NBD. Whether that would cause performance to improve or to
> >> deteriorate is undecided IMO. If the far end of the output network
> >> block device can accommodate a reader that is independent of the
> >> writers, then this level of overlap is beneficial. Otherwise, this
> >> extra reader thread would just add more thrashing, and we'd be better
> >> off with a separate read-through once writing is complete.
> >
> > In my mind I'm wondering if there's any mathematical result that lets
> > you combine each hash(block_i) into the final hash(block[1..N])
> > without needing to compute the hash of each block in order.
>
> I've now checked:
>
> https://en.wikipedia.org/wiki/SHA-2
> https://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_construction
> https://en.wikipedia.org/wiki/One-way_compression_function#Construction_from_block_ciphers
> https://en.wikipedia.org/wiki/One-way_compression_function#Davies%E2%80%93Meyer
>
> Consider the following order of steps:
>
> - precompute hash(block[n]), with some initial IV
> - throw away block[n]
> - wait until block[n-1] is processed, providing the actual IV for
>   hashing block[n]
> - mix the new IV into hash(block[n]) without having access to block[n]
>
> If such a method existed, it would break the security (i.e., the
> original design) of the hash, IMO, as it would separate the IV from
> block[n]. In a way, it would make the "mix" and "concat" operators (of
> the underlying block cipher's chaining method) distributive. I believe
> then you could generate a bunch of *valid* hash(block[n]) values as a
> mere function of the IV, without having access to block[n]. You could
> perhaps use that for probing against other hash(block[m]) values, and
> maybe determine repeating patterns in the plaintext. I'm not a
> cryptographer so I can't exactly show what security property is broken
> by separating the IV from block[n].
>
> > (This is what blkhash solves, but unfortunately the output isn't
> > compatible with standard hashes.)
>
> Assuming blkhash is a Merkle Tree implementation, blkhash solves a
> different problem IMO.

blkhash uses a flat Merkle tree, described here:
https://www.researchgate.net/publication/323243320_Foundations_of_Applied_Cryptography_and_Cybersecurity
section 3.9.4, "2lMT: the Flat Merkle Tree construction".

To support parallel hashing it uses a more complex construction, but this
can be simplified to a single flat Merkle tree.
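
As a rough illustration of the flat construction (not the blkhash algorithm
or its API, just the idea: hash fixed-size blocks, then hash the block
digests):

    import hashlib

    BLOCK_SIZE = 64 * 1024  # arbitrary block size for this sketch

    def flat_merkle_sha256(path):
        root = hashlib.sha256()
        with open(path, "rb") as f:
            while True:
                block = f.read(BLOCK_SIZE)
                if not block:
                    break
                # Leaf digest of one block, fed into the single root digest.
                root.update(hashlib.sha256(block).digest())
        return root.hexdigest()

blkhash adds zero detection and multiple threads on top of this, which is
also why the result is not comparable with a plain sha256sum of the file.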

Nir

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs


Re: [Libguestfs] Checksums and other verification

2023-02-27 Thread Nir Soffer
On Mon, Feb 27, 2023 at 6:41 PM Richard W.M. Jones  wrote:
>
> On Mon, Feb 27, 2023 at 04:24:33PM +0200, Nir Soffer wrote:
> > On Mon, Feb 27, 2023 at 3:56 PM Richard W.M. Jones  
> > wrote:
> > >
> > >
> > > https://github.com/kubevirt/containerized-data-importer/issues/1520
> > >
> > > Hi Eric,
> > >
> > > We had a question from the Kubevirt team related to the above issue.
> > > The question is roughly if it's possible to calculate the checksum of
> > > an image as an nbdkit filter and/or in the qemu block layer.
> > >
> > > Supplemental #1: could qemu-img convert calculate a checksum as it goes
> > > along?
> > >
> > > Supplemental #2: could we detect various sorts of common errors, such
> > > a webserver that is incorrectly configured and serves up an error page
> > > containing ""; or something which is supposed to be a disk image
> > > but does not "look like" (in some ill-defined sense) a disk image,
> > > eg. it has no partition table.
> > >
> > > I'm not sure if qemu has any existing features covering the above (and
> > > I know for sure that nbdkit doesn't).
> > >
> > > One issue is that calculating a checksum involves a linear scan of the
> > > image, although we can at least skip holes.
> >
> > Kubvirt can use blksum
> > https://fosdem.org/2023/schedule/event/vai_blkhash_fast_disk/
> >
> > But we need to package it for Fedora/CentOS Stream.
> >
> > I also work on "qemu-img checksum", getting more reviews on this can help:
> > Lastest version:
> > https://lists.nongnu.org/archive/html/qemu-block/2022-11/msg00971.html
> > Last reveiw are here:
> > https://lists.nongnu.org/archive/html/qemu-block/2022-12/
> >
> > More work is needed on the testing framework changes.
>
> I think it would be more useful if (or in addition) it could compute
> the checksum of a stream which is being converted with 'qemu-img
> convert'.  Extra points if it can compute the checksum over either the
> input or output stream.

I thought about this, it could be a filter that you add in the graph
that gives you a checksum as a side effect of copying. But this requires
disabling unordered writes, which is pretty bad for performance.

But even if you compute the checksum during a transfer, you want to
verify it by reading the transferred data from storage. Once you have
computed the checksum you can keep it for verifying the same image in
the future.

Nir

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs


Re: [Libguestfs] Checksums and other verification

2023-02-27 Thread Nir Soffer
On Mon, Feb 27, 2023 at 3:56 PM Richard W.M. Jones  wrote:
>
>
> https://github.com/kubevirt/containerized-data-importer/issues/1520
>
> Hi Eric,
>
> We had a question from the Kubevirt team related to the above issue.
> The question is roughly if it's possible to calculate the checksum of
> an image as an nbdkit filter and/or in the qemu block layer.
>
> Supplemental #1: could qemu-img convert calculate a checksum as it goes
> along?
>
> Supplemental #2: could we detect various sorts of common errors, such
> a webserver that is incorrectly configured and serves up an error page
> containing ""; or something which is supposed to be a disk image
> but does not "look like" (in some ill-defined sense) a disk image,
> eg. it has no partition table.
>
> I'm not sure if qemu has any existing features covering the above (and
> I know for sure that nbdkit doesn't).
>
> One issue is that calculating a checksum involves a linear scan of the
> image, although we can at least skip holes.

Kubevirt can use blksum
https://fosdem.org/2023/schedule/event/vai_blkhash_fast_disk/

But we need to package it for Fedora/CentOS Stream.

I'm also working on "qemu-img checksum"; getting more reviews on this can help:
Latest version:
https://lists.nongnu.org/archive/html/qemu-block/2022-11/msg00971.html
The last reviews are here:
https://lists.nongnu.org/archive/html/qemu-block/2022-12/

More work is needed on the testing framework changes.

Nir

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs


Re: [Libguestfs] [PATCH 1/2] python: Avoid crash if callback parameters cannot be built

2023-02-20 Thread Nir Soffer
On Mon, Feb 20, 2023 at 10:45 AM Laszlo Ersek  wrote:
>
> On 2/17/23 17:52, Eric Blake wrote:
> > On Thu, Feb 16, 2023 at 03:09:02PM +0100, Laszlo Ersek wrote:
>
> >> - Py_BuildValue with the "O" format specifier transfers the new list's
> >> *sole* reference (= ownership) to the just-built higher-level object "args"
> >
> > Reference transfer is done with "N", not "O".  That would be an
> > alternative to decreasing the refcount of py_array on success, but not
> > eliminate the need to decrease the refcount on Py_BuildValue failure.
> >
> >>
> >> - when "args" is killed (decref'd), it takes care of "py_array".
> >>
> >> Consequently, if Py_BuildValue fails, "py_array" continues owning the
> >> new list -- and I believe that, if we take the new error branch, we leak
> >> the object pointed-to by "py_array". Is that the case?
> >
> > Not quite.  "O" is different than "N".
>
> I agree with you *now*, looking up the "O" specification at
> .
>
> However, when I was writing my email, I looked up Py_BuildValue at that
> time as well, just elsewhere. I don't know where. Really. And then that
> documentation said that the reference count would *not* be increased. I
> distinctly remember that, because it surprised me -- I actually recalled
> an *even earlier* experience reading the documentation, which had again
> stated that "O" would increase the reference count.

Maybe here:
https://docs.python.org/2/c-api/arg.html#building-values

Looks like another incompatibility between Python 2 and 3.

Nir

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



[ovirt-users] Re: ImageIO Performance

2023-02-11 Thread Nir Soffer
On Thu, Feb 9, 2023 at 7:03 PM Nir Soffer  wrote:
>
> On Mon, Feb 6, 2023 at 10:00 AM Jean-Louis Dupond via Users
>  wrote:
> >
> > Hi All,
> >
> > We backup our VM's with a custom script based on the
> > https://github.com/oVirt/python-ovirt-engine-sdk4/blob/main/examples/backup_vm.py
> > example.
> > This works fine, but we start to see scaling issues.
> >
> > On VM's where there are a lot of dirty blocks,
>
> We need to see the list of extents returned by the server.
>
> The easiest way would be to enable debug logs - it will be even slower,
> but we will see these logs showing all extents:
>
> log.debug("Copying %s", ext)
> log.debug("Zeroing %s", ext)
> log.debug("Skipping %s", ext)
>
> It will also show other info that can help to understand why it is slow.
>
> > the transfer goes really
> > slow (sometimes only 20MiB/sec).
>
> Seems much slower than expected
>
> > At the same time we see that ovirt-imageio process sometimes uses 100%
> > CPU
>
> This is possible, it shows that you do a lot of requests.
>
> > (its single threaded?).
>
> It uses thread per connection model. When used with backup_vm.py or other
> examples using the ovirt_imageio.client it usually use 4 connections
> per transfer
> so there will be 4 threads on the server size serving the data.
>
> Please share debug log of a slow backup, and info about the backup image 
> storage
> for example, is this local file system or NFS?

I opened https://github.com/oVirt/ovirt-imageio/issues/175 to make debugging
such issues easier.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/HP2OZ3E3HAZIE2OCUKXSZIVEDYKQO2DY/


[ovirt-users] Re: ImageIO Performance

2023-02-09 Thread Nir Soffer
On Thu, Feb 9, 2023 at 7:03 PM Nir Soffer  wrote:
>
> On Mon, Feb 6, 2023 at 10:00 AM Jean-Louis Dupond via Users
>  wrote:
> The easiest way would be to enable debug logs - it will be even slower,
> but we will see these logs showing all extents:

Using the --debug option

Run backup_vm.py with --help to see all options.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/HPJVPFQD32Q55ZALOLK6AQ6PFXNGLGMX/


[ovirt-users] Re: ImageIO Performance

2023-02-09 Thread Nir Soffer
On Mon, Feb 6, 2023 at 10:00 AM Jean-Louis Dupond via Users
 wrote:
>
> Hi All,
>
> We backup our VM's with a custom script based on the
> https://github.com/oVirt/python-ovirt-engine-sdk4/blob/main/examples/backup_vm.py
> example.
> This works fine, but we start to see scaling issues.
>
> On VM's where there are a lot of dirty blocks,

We need to see the list of extents returned by the server.

The easiest way would be to enable debug logs - it will be even slower,
but we will see these logs showing all extents:

log.debug("Copying %s", ext)
log.debug("Zeroing %s", ext)
log.debug("Skipping %s", ext)

It will also show other info that can help to understand why it is slow.

> the transfer goes really
> slow (sometimes only 20MiB/sec).

Seems much slower than expected

> At the same time we see that ovirt-imageio process sometimes uses 100%
> CPU

This is possible, it shows that you do a lot of requests.

> (its single threaded?).

It uses a thread-per-connection model. When used with backup_vm.py or other
examples using the ovirt_imageio.client, it usually uses 4 connections per
transfer, so there will be 4 threads on the server side serving the data.

Please share a debug log of a slow backup, and info about the backup image
storage, for example: is this a local file system or NFS?

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/HE6ZKEPEHOYDCIHLUFAW4MDQVXYO2TA6/


Re: [Libguestfs] [libnbd PATCH v2 3/3] nbdsh: Improve --help and initial banner contents.

2023-01-31 Thread Nir Soffer
On Tue, Jan 31, 2023 at 12:34 AM Richard W.M. Jones  wrote:
>
> On Fri, Nov 04, 2022 at 04:18:31PM -0500, Eric Blake wrote:
> > Document all options in --help output.  If -n is not in use, then
> > enhance the banner to print the current state of h, and further tailor
> > the advice given on useful next steps to take to mention opt_go when
> > using --opt-mode.
>
> I had to partially revert this patch (reverting most of it) because it
> unfortunately breaks the implicit handle creation :-(
>
> https://gitlab.com/nbdkit/libnbd/-/commit/5a02c7d2cc6a201f9e5531c0c20c2f3c22b805a2
>
> I'm not actually sure how to do this correctly in Python.  I made
> several attempts, but I don't think Python is very good about having a
> variable which is only defined on some paths -- maybe it's not
> possible at all.

Can you share the error when it breaks?

I'm not sure what the issue is, but usually if you have a global variable
created only in some flows, adding:

thing = None

at the start of the module makes sure that the name exists later, regardless
of the flow taken. Code can take the right action based on:

if thing is None:
...
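
For instance, a minimal sketch of the pattern (the names below are made up,
this is not the actual nbdsh code):

    # Module-level default so the name exists in every flow.
    h = None

    def create_handle():
        """Stand-in for the flow that actually creates the handle."""
        global h
        h = object()

    def banner():
        # Safe in all flows because the name always exists, even if None.
        if h is None:
            print("no handle created yet")
        else:
            print("handle is ready:", h)

    banner()          # works before any handle exists
    create_handle()
    banner()          # and after one was created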

Nir

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



Re: [Libguestfs] [PATCH v2v v2] -o rhv-upload: Improve error message for invalid or missing -os parameter

2023-01-28 Thread Nir Soffer
On Fri, Jan 27, 2023 at 2:26 PM Richard W.M. Jones  wrote:
>
> For -o rhv-upload, the -os parameter specifies the storage domain.
> Because the RHV API allows globs when searching for a domain, if you
> used a parameter like -os 'data*' then this would confuse the Python
> code, since it can glob to the name of a storage domain, but then
> later fail when we try to exact match the storage domain we found.
> The result of this was a confusing error in the precheck script:
>
>   IndexError: list index out of range
>
> This fix validates the output storage parameter before trying to use
> it.  Since valid storage domain names cannot contain glob characters
> or spaces, it avoids the problems above and improves the error message
> that the user sees:
>
>   $ virt-v2v [...] -o rhv-upload -os ''
>   ...
>   RuntimeError: The storage domain (-os) parameter ‘’ is not valid
>   virt-v2v: error: failed server prechecks, see earlier errors
>
>   $ virt-v2v [...] -o rhv-upload -os 'data*'
>   ...
>   RuntimeError: The storage domain (-os) parameter ‘data*’ is not valid
>   virt-v2v: error: failed server prechecks, see earlier errors
>

Makes sense, the new errors are very helpful.

> Although the IndexError should no longer happen, I also added a
> try...catch around that code to improve the error in case it still
> happens.

Theoretically it can happen if the admin changes the storage domain
name or detaches the domain from the data center in the window
after the precheck completes and before the transfer starts.

>
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1986386
> Reported-by: Junqin Zhou
> Thanks: Nir Soffer
> ---
>  output/rhv-upload-precheck.py | 27 +--
>  1 file changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/output/rhv-upload-precheck.py b/output/rhv-upload-precheck.py
> index 1dc1b8498a..ba125611ba 100644
> --- a/output/rhv-upload-precheck.py
> +++ b/output/rhv-upload-precheck.py
> @@ -18,6 +18,7 @@
>
>  import json
>  import logging
> +import re
>  import sys
>
>  from urllib.parse import urlparse
> @@ -46,6 +47,15 @@ output_password = output_password.rstrip()
>  parsed = urlparse(params['output_conn'])
>  username = parsed.username or "admin@internal"
>
> +# Check the storage domain name is valid
> +# (https://bugzilla.redhat.com/show_bug.cgi?id=1986386#c1)
> +# Also this means it cannot contain spaces or glob symbols, so
> +# the search below is valid.
> +output_storage = params['output_storage']
> +if not re.match('^[-a-zA-Z0-9_]+$', output_storage):

The comment in the bug does not point to the docs or code enforcing
the domain name restrictions, but I validated with ovirt 4.5. Trying to
create a domain name with a space or using Hebrew characters is blocked
in the UI, displaying an error. See attached screenshots.

I think it is highly unlikely that this limit will change in the future since
nobody is working on oVirt now, but if it does change, this may prevent
uploading to an existing storage domain.

> +raise RuntimeError("The storage domain (-os) parameter ‘%s’ is not 
> valid" %
> +   output_storage)
> +
>  # Connect to the server.
>  connection = sdk.Connection(
>  url=params['output_conn'],
> @@ -60,28 +70,33 @@ system_service = connection.system_service()
>
>  # Check whether there is a datacenter for the specified storage.
>  data_centers = system_service.data_centers_service().list(
> -search='storage.name=%s' % params['output_storage'],
> +search='storage.name=%s' % output_storage,
>  case_sensitive=True,
>  )
>  if len(data_centers) == 0:
>  storage_domains = system_service.storage_domains_service().list(
> -search='name=%s' % params['output_storage'],
> +search='name=%s' % output_storage,
>  case_sensitive=True,
>  )
>  if len(storage_domains) == 0:
>  # The storage domain does not even exist.
>  raise RuntimeError("The storage domain ‘%s’ does not exist" %
> -   (params['output_storage']))
> +   output_storage)
>
>  # The storage domain is not attached to a datacenter
>  # (shouldn't happen, would fail on disk creation).
>  raise RuntimeError("The storage domain ‘%s’ is not attached to a DC" %
> -   (params['output_storage']))
> +   output_storage)
>  datacenter = data_centers[0]
>
>  # Get the storage domain.
>  storage_domains = connection.follow_link(datacenter.storage_domains)
> -storage_domain = [sd for sd in storage_domains if sd.name == 
> params['output_storage']][0]
> +try:
> +storage_domain = [sd for sd in s

Re: [Libguestfs] [PATCH v2v] -o rhv-upload: Give a nicer error if the storage domain does not exist

2023-01-27 Thread Nir Soffer
On Fri, Jan 27, 2023 at 1:18 PM Nir Soffer  wrote:
>
> On Thu, Jan 26, 2023 at 2:31 PM Richard W.M. Jones  wrote:
> >
> > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1986386
> > Reported-by: Junqin Zhou
> > ---
> >  output/rhv-upload-precheck.py | 7 ++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/output/rhv-upload-precheck.py b/output/rhv-upload-precheck.py
> > index 1dc1b8498a..35ea021032 100644
> > --- a/output/rhv-upload-precheck.py
> > +++ b/output/rhv-upload-precheck.py
> > @@ -81,7 +81,12 @@ datacenter = data_centers[0]
> >
> >  # Get the storage domain.
> >  storage_domains = connection.follow_link(datacenter.storage_domains)
> > -storage_domain = [sd for sd in storage_domains if sd.name == 
> > params['output_storage']][0]
> > +try:
> > +storage_domain = [sd for sd in storage_domains \
> > +  if sd.name == params['output_storage']][0]
>
> Using `\` may work but it is needed. You can do this:
>
> storage_domain = [sd for sd in storage_domains
>if sd.name == params['output_storage']][0]
>
> This is also the common way to indent list comprehension that
> makes the expression more clear.
>
> > +except IndexError:
> > +raise RuntimeError("The storage domain ‘%s’ does not exist" %
> > +   params['output_storage'])
>
> The fix is safe and makes sense.
>
> Not sure why we list all storage domains when we already know the name,
> maybe Albert would like to clean up this mess later.

Like this:
https://github.com/oVirt/python-ovirt-engine-sdk4/blob/2aa50266056b7ee0b72597f346cbf0f006041566/examples/list_storage_domains.py#L93

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs


Re: [Libguestfs] [PATCH v2v] -o rhv-upload: Give a nicer error if the storage domain does not exist

2023-01-27 Thread Nir Soffer
On Thu, Jan 26, 2023 at 2:31 PM Richard W.M. Jones  wrote:
>
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1986386
> Reported-by: Junqin Zhou
> ---
>  output/rhv-upload-precheck.py | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/output/rhv-upload-precheck.py b/output/rhv-upload-precheck.py
> index 1dc1b8498a..35ea021032 100644
> --- a/output/rhv-upload-precheck.py
> +++ b/output/rhv-upload-precheck.py
> @@ -81,7 +81,12 @@ datacenter = data_centers[0]
>
>  # Get the storage domain.
>  storage_domains = connection.follow_link(datacenter.storage_domains)
> -storage_domain = [sd for sd in storage_domains if sd.name == 
> params['output_storage']][0]
> +try:
> +storage_domain = [sd for sd in storage_domains \
> +  if sd.name == params['output_storage']][0]

Using `\` may work but it is not needed. You can do this:

storage_domain = [sd for sd in storage_domains
   if sd.name == params['output_storage']][0]

This is also the common way to indent list comprehension that
makes the expression more clear.

> +except IndexError:
> +raise RuntimeError("The storage domain ‘%s’ does not exist" %
> +   params['output_storage'])

The fix is safe and makes sense.

Not sure why we list all storage domains when we already know the name,
maybe Albert would like to clean up this mess later.

Nir

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs


Re: [Libguestfs] [PATCH v2v] -o rhv-upload: Give a nicer error if the storage domain

2023-01-27 Thread Nir Soffer
On Thu, Jan 26, 2023 at 2:31 PM Richard W.M. Jones  wrote:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1986386
>
> My RHV instance is dead at the moment so I didn't do much more than
> check this compiles and passes the one test we have.  Also I want to
> spend as little time as possible on RHV outputs for virt-v2v since the
> RHV product will be discontinued soon.
>
> I did want to point out some things:
>
>  - The preceeding code is probably wrong.
>
> https://github.com/libguestfs/virt-v2v/blob/master/output/rhv-upload-transfer.py#L70
>
>It attempts to search for the output storage using:
>
> storage_domains = system_service.storage_domains_service().list(
> search='name=%s' % params['output_storage'],
> case_sensitive=True,
> )

I think the search is correct. This is explained in
https://bugzilla.redhat.com/1986386#c1

>I couldn't find any documentation about what can go into that
>search string, but it's clearly a lot more complicated than just
>pasting in the literal name after "name=".  At the very least,
>spaces are not permitted, see:
>
> https://github.com/libguestfs/virt-v2v/blob/master/output/rhv-upload-transfer.py#L70

True, search can be an expression.

>  - The bug reporter used "data*" as the name and I suspect that is
>parsed in some way (wildcard? regexp? I've no idea).

It is treated as a glob pattern, also explained in comment 1.

>  - Probably for the same reason, the preceeding code ought to fail
>with an error if the output storage domain doesn't exist.  The fact
>we reach the code patched here at all also indicates some bug,
>maybe in the search string.
>
> As I say above, I don't especially care about any of this.

I have not been working on RHV since August 2022. Adding Albert, who is the
current RHV storage maintainer.

Nir

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



Re: [PATCH v2 2/5] Support format or cache specific out file

2022-12-13 Thread Nir Soffer
On Tue, Dec 13, 2022 at 8:09 PM Hanna Reitz  wrote:
>
> On 13.12.22 16:56, Nir Soffer wrote:
> > On Mon, Dec 12, 2022 at 12:38 PM Hanna Reitz  wrote:
> >> On 28.11.22 15:15, Nir Soffer wrote:
> >>> Extend the test finder to find tests with format (*.out.qcow2) or cache
> >>> specific (*.out.nocache) out file. This worked before only for the
> >>> numbered tests.
> >>> ---
> >>>tests/qemu-iotests/findtests.py | 10 --
> >>>1 file changed, 8 insertions(+), 2 deletions(-)
> >> This patch lacks an S-o-b, too.
> >>
> >>> diff --git a/tests/qemu-iotests/findtests.py 
> >>> b/tests/qemu-iotests/findtests.py
> >>> index dd77b453b8..f4344ce78c 100644
> >>> --- a/tests/qemu-iotests/findtests.py
> >>> +++ b/tests/qemu-iotests/findtests.py
> >>> @@ -38,31 +38,37 @@ def chdir(path: Optional[str] = None) -> 
> >>> Iterator[None]:
> >>>os.chdir(saved_dir)
> >>>
> >>>
> >>>class TestFinder:
> >>>def __init__(self, test_dir: Optional[str] = None) -> None:
> >>>self.groups = defaultdict(set)
> >>>
> >>>with chdir(test_dir):
> >>>self.all_tests = glob.glob('[0-9][0-9][0-9]')
> >>>self.all_tests += [f for f in glob.iglob('tests/*')
> >>> -   if not f.endswith('.out') and
> >>> -   os.path.isfile(f + '.out')]
> >>> +   if self.is_test(f)]
> >> So previously a file was only considered a test file if there was a
> >> corresponding reference output file (`f + '.out'`), so files without
> >> such a reference output aren’t considered test files...
> >>
> >>>for t in self.all_tests:
> >>>with open(t, encoding="utf-8") as f:
> >>>for line in f:
> >>>if line.startswith('# group: '):
> >>>for g in line.split()[2:]:
> >>>self.groups[g].add(t)
> >>>break
> >>>
> >>> +def is_test(self, fname: str) -> bool:
> >>> +"""
> >>> +The tests directory contains tests (no extension) and out files
> >>> +(*.out, *.out.{format}, *.out.{option}).
> >>> +"""
> >>> +return re.search(r'.+\.out(\.\w+)?$', fname) is None
> >> ...but this new function doesn’t check that.  I think we should check it
> >> (just whether there’s any variant of `/{fname}\.out(\.\w+)?/` to go with
> >> `fname`) so that behavior isn’t changed.
> > This means that you cannot add a test without a *.out* file, which may
> >   be useful when you don't use the out file for validation, but we can
> > add this later if needed.
>
> I don’t think tests work without a reference output, do they?  At least
> a couple of years ago, the ./check script would refuse to run tests
> without a corresponding .out file.

This may be true, but most tests do not really need an out file and are better
verified by asserting. There are some Python tests that have a pointless out
file with the output of Python unittest:

$ cat tests/qemu-iotests/tests/nbd-multiconn.out
...
--
Ran 3 tests

OK

This is not only unhelpful (you must update the output when adding a 4th test)
but fragile: if unittest changes the output, maybe adding info about skipped
tests, or changing "---" to "", the test will break.
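
A rough sketch of what an assertion-based test looks like (a made-up
example, not a test proposed for the tree); with asserts the out file
carries nothing beyond the unittest summary:

    import iotests

    class TestExample(iotests.QMPTestCase):
        def test_math(self):
            # The check lives in the assertion, not in a reference output.
            self.assertEqual(2 + 2, 4)

    if __name__ == '__main__':
        iotests.main(supported_fmts=['qcow2'])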

But for now I agree the test framework should keep the current behavior.

Nir




Re: [PATCH v2 2/5] Support format or cache specific out file

2022-12-13 Thread Nir Soffer
On Mon, Dec 12, 2022 at 12:38 PM Hanna Reitz  wrote:
>
> On 28.11.22 15:15, Nir Soffer wrote:
> > Extend the test finder to find tests with format (*.out.qcow2) or cache
> > specific (*.out.nocache) out file. This worked before only for the
> > numbered tests.
> > ---
> >   tests/qemu-iotests/findtests.py | 10 --
> >   1 file changed, 8 insertions(+), 2 deletions(-)
>
> This patch lacks an S-o-b, too.
>
> > diff --git a/tests/qemu-iotests/findtests.py 
> > b/tests/qemu-iotests/findtests.py
> > index dd77b453b8..f4344ce78c 100644
> > --- a/tests/qemu-iotests/findtests.py
> > +++ b/tests/qemu-iotests/findtests.py
> > @@ -38,31 +38,37 @@ def chdir(path: Optional[str] = None) -> Iterator[None]:
> >   os.chdir(saved_dir)
> >
> >
> >   class TestFinder:
> >   def __init__(self, test_dir: Optional[str] = None) -> None:
> >   self.groups = defaultdict(set)
> >
> >   with chdir(test_dir):
> >   self.all_tests = glob.glob('[0-9][0-9][0-9]')
> >   self.all_tests += [f for f in glob.iglob('tests/*')
> > -   if not f.endswith('.out') and
> > -   os.path.isfile(f + '.out')]
> > +   if self.is_test(f)]
>
> So previously a file was only considered a test file if there was a
> corresponding reference output file (`f + '.out'`), so files without
> such a reference output aren’t considered test files...
>
> >   for t in self.all_tests:
> >   with open(t, encoding="utf-8") as f:
> >   for line in f:
> >   if line.startswith('# group: '):
> >   for g in line.split()[2:]:
> >   self.groups[g].add(t)
> >   break
> >
> > +def is_test(self, fname: str) -> bool:
> > +"""
> > +The tests directory contains tests (no extension) and out files
> > +(*.out, *.out.{format}, *.out.{option}).
> > +"""
> > +return re.search(r'.+\.out(\.\w+)?$', fname) is None
>
> ...but this new function doesn’t check that.  I think we should check it
> (just whether there’s any variant of `/{fname}\.out(\.\w+)?/` to go with
> `fname`) so that behavior isn’t changed.

This means that you cannot add a test without a *.out* file, which may
be useful when you don't use the out file for validation, but we can
add this later if needed.

I'll change the code to check both conditions.




[ovirt-users] Re: How does ovirt handle disks across multiple iscsi LUNs

2022-12-04 Thread Nir Soffer
On Sun, Nov 27, 2022 at 9:11 PM  wrote:

> A possibly obvious question I can't find the answer to anywhere—how does
> ovirt allocate VM disk images when a storage domain has multiple LUNs? Are
> these allocated one per LUN, so if e.g. a LUN runs out of space the disks
> on that LUN (only) will be unable to write? Or are these distributed across
> LUNs, so if a LUN fails due to storage failure etc the entire storage
> domain can be affected?
>

A storage domain is exactly one LVM Volume Group (VG). Disks are created
from volumes, which are LVM Logical Volumes (LVs). Each time you create a
snapshot oVirt creates a new volume, so a disk may have one or more LVs in
the VG.

The volumes may be extended as more space is needed. Up to 4.5, oVirt
extended the volumes in chunks of 1g. Since 4.5, it uses chunks of 2.5g.

So every disk may contain multiple chunks of different sizes, and these may
be allocated anywhere in the VG logical space, so they may be on any PV.

To understand how the chunks are allocated, you can inspect each LV like
this:

# lvdisplay -m --devicesfile= bafd0f16-9aba-4f9f-ba90-46d3b8a29157/51de2d8b-b67e-4a91-bc68-a2c922bc7398
  --- Logical volume ---
  LV Path
 /dev/bafd0f16-9aba-4f9f-ba90-46d3b8a29157/51de2d8b-b67e-4a91-bc68-a2c922bc7398
  LV Name51de2d8b-b67e-4a91-bc68-a2c922bc7398
  VG Namebafd0f16-9aba-4f9f-ba90-46d3b8a29157
  LV UUIDW0FAvX-EUDc-v7QR-4A2A-3aSX-yGC5-2jREeV
  LV Write Accessread/write
  LV Creation host, time host4, 2022-12-04 01:43:17 +0200
  LV Status  NOT available
  LV Size200.00 GiB
  Current LE 1600
  Segments   3
  Allocation inherit
  Read ahead sectors auto

  --- Segments ---
  Logical extents 0 to 796:
Type linear
Physical volume /dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-02
Physical extents 0 to 796

  Logical extents 797 to 1593:
Type linear
Physical volume /dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-03
Physical extents 0 to 796

  Logical extents 1594 to 1599:
Type linear
Physical volume /dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-01
Physical extents 49 to 54

Note that oVirt uses lvm devices files to prevent unwanted access of volumes
by lvm commands. To disable the devices file temporarily you can use
--devicesfile= with an empty value, as in the commands above.

After extending this disk by 10g:

# lvdisplay -m --devicesfile= bafd0f16-9aba-4f9f-ba90-46d3b8a29157/51de2d8b-b67e-4a91-bc68-a2c922bc7398
  --- Logical volume ---
  LV Path
 /dev/bafd0f16-9aba-4f9f-ba90-46d3b8a29157/51de2d8b-b67e-4a91-bc68-a2c922bc7398
  LV Name51de2d8b-b67e-4a91-bc68-a2c922bc7398
  VG Namebafd0f16-9aba-4f9f-ba90-46d3b8a29157
  LV UUIDW0FAvX-EUDc-v7QR-4A2A-3aSX-yGC5-2jREeV
  LV Write Accessread/write
  LV Creation host, time host4, 2022-12-04 01:43:17 +0200
  LV Status  NOT available
  LV Size210.00 GiB
  Current LE 1680
  Segments   7
  Allocation inherit
  Read ahead sectors auto

  --- Segments ---
  Logical extents 0 to 796:
Type linear
Physical volume /dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-02
Physical extents 0 to 796

  Logical extents 797 to 1593:
Type linear
Physical volume /dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-03
Physical extents 0 to 796

  Logical extents 1594 to 1613:
Type linear
Physical volume /dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-01
Physical extents 49 to 68

  Logical extents 1614 to 1616:
Type linear
Physical volume /dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-01
Physical extents 244 to 246

  Logical extents 1617 to 1619:
Type linear
Physical volume /dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-01
Physical extents 154 to 156

  Logical extents 1620 to 1648:
Type linear
Physical volume /dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-01
Physical extents 177 to 205

  Logical extents 1649 to 1679:
Type linear
Physical volume /dev/mapper/0QEMU_QEMU_HARDDISK_data-fc-01
Physical extents 531 to 561

I hope it helps.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/V5QX3OJ6KYMXGBKIUEPM7ES4L4HG5UJ6/


[PATCH v2 5/5] qemu-img: Speed up checksum

2022-11-28 Thread Nir Soffer
Add a coroutine-based loop inspired by the `qemu-img convert` design.

Changes compared to `qemu-img convert`:

- State for the entire image is kept in ImgChecksumState

- State for a single worker coroutine is kept in ImgChecksumWorker.

- "Writes" are always in-order, ensured using a queue.

- Calling block status once per image extent, when the current extent is
  consumed by the workers.

- Using 1m buffer size - testing shows that this gives the best read
  performance both with buffered and direct I/O.

- Number of coroutines is not configurable. Testing does not show
  improvement when using more than 8 coroutines.

- Progress includes the entire image, not only the allocated state.

Comparing to the simple read loop shows that this version is up to 4.67
times faster when computing a checksum for an image full of zeroes. For
real images it is 1.59 times faster with direct I/O, and with buffered
I/O there is no difference.

Test results on Dell PowerEdge R640 in a CentOS Stream 9 container:

| image| size | i/o   | before | after  | change |
|--|--|---||||
| zero [1] |   6g | buffered  | 1.600s ±0.014s | 0.342s ±0.016s |  x4.67 |
| zero |   6g | direct| 4.684s ±0.093s | 2.211s ±0.009s |  x2.12 |
| real [2] |   6g | buffered  | 1.841s ±0.075s | 1.806s ±0.036s |  x1.02 |
| real |   6g | direct| 3.094s ±0.079s | 1.947s ±0.017s |  x1.59 |
| nbd  [3] |   6g | buffered  | 2.455s ±0.183s | 1.808s ±0.016s |  x1.36 |
| nbd  |   6g | direct| 3.540s ±0.020s | 1.749s ±0.018s |  x2.02 |

[1] raw image full of zeroes
[2] raw fedora 35 image with additional random data, 50% full
[3] image [2] exported by qemu-nbd via unix socket

Signed-off-by: Nir Soffer 
---
 qemu-img.c | 350 ++---
 1 file changed, 277 insertions(+), 73 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 4b4ca7add3..5f63a769a9 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1618,50 +1618,296 @@ out:
 qemu_vfree(buf2);
 blk_unref(blk2);
 out2:
 blk_unref(blk1);
 out3:
 qemu_progress_end();
 return ret;
 }
 
 #ifdef CONFIG_BLKHASH
+
+#define CHECKSUM_COROUTINES 8
+#define CHECKSUM_BUF_SIZE (1 * MiB)
+#define CHECKSUM_ZERO_SIZE MIN(16 * GiB, SIZE_MAX)
+
+typedef struct ImgChecksumState ImgChecksumState;
+
+typedef struct ImgChecksumWorker {
+QTAILQ_ENTRY(ImgChecksumWorker) entry;
+ImgChecksumState *state;
+Coroutine *co;
+uint8_t *buf;
+
+/* The current chunk. */
+int64_t offset;
+int64_t length;
+bool zero;
+
+/*
+ * Always true for zero extent, false for data extent. Set to true
+ * when reading the chunk completes.
+ */
+bool ready;
+} ImgChecksumWorker;
+
+struct ImgChecksumState {
+const char *filename;
+BlockBackend *blk;
+BlockDriverState *bs;
+int64_t total_size;
+
+/* Current extent, modified in checksum_co_next. */
+int64_t offset;
+int64_t length;
+bool zero;
+
+int running_coroutines;
+CoMutex lock;
+ImgChecksumWorker workers[CHECKSUM_COROUTINES];
+
+/*
+ * Ensure in-order updates. Update are scheduled at the tail of the
+ * queue and processed from the head of the queue when a worker is
+ * ready.
+ */
+QTAILQ_HEAD(, ImgChecksumWorker) update_queue;
+
+struct blkhash *hash;
+int ret;
+};
+
+static int checksum_block_status(ImgChecksumState *s)
+{
+int64_t length;
+int status;
+
+/* Must be called when current extent is consumed. */
+assert(s->length == 0);
+
+status = bdrv_block_status_above(s->bs, NULL, s->offset,
 s->total_size - s->offset, &length, NULL,
+ NULL);
+if (status < 0) {
+error_report("Error checking status at offset %" PRId64 " for %s",
+ s->offset, s->filename);
+s->ret = status;
+return -1;
+}
+
+assert(length > 0);
+
+s->length = length;
+s->zero = !!(status & BDRV_BLOCK_ZERO);
+
+return 0;
+}
+
+/**
+ * Grab the next chunk from the current extent, getting the next extent if
+ * needed, and schedule the next update at the end of the update queue.
+ *
+ * Return true if the worker has work to do, false if the worker has
+ * finished or there was an error getting the next extent.
+ */
+static coroutine_fn bool checksum_co_next(ImgChecksumWorker *w)
+{
+ImgChecksumState *s = w->state;
+
+qemu_co_mutex_lock(&s->lock);
+
+if (s->offset == s->total_size || s->ret != -EINPROGRESS) {
+qemu_co_mutex_unlock(&s->lock);
+return false;
+}
+
+if (s->length == 0 && checksum_block_status(s) < 0) {
+qemu_co_mutex_unlock(&s->lock);
+return false;
+}
+
+/* Grab one chunk from current extent. */
+w->offset = s->offset;
+w->lengt

[PATCH v2 1/5] qemu-img.c: Move IO_BUF_SIZE to the top of the file

2022-11-28 Thread Nir Soffer
This macro is used by various commands (compare, convert, rebase) but it
is defined somewhere in the middle of the file. I'm going to use it in
the new checksum command so let's clean up a bit before that.
---
 qemu-img.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index a9b3a8103c..c03d6b4b31 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -49,20 +49,21 @@
 #include "block/block_int.h"
 #include "block/blockjob.h"
 #include "block/qapi.h"
 #include "crypto/init.h"
 #include "trace/control.h"
 #include "qemu/throttle.h"
 #include "block/throttle-groups.h"
 
 #define QEMU_IMG_VERSION "qemu-img version " QEMU_FULL_VERSION \
   "\n" QEMU_COPYRIGHT "\n"
+#define IO_BUF_SIZE (2 * MiB)
 
 typedef struct img_cmd_t {
 const char *name;
 int (*handler)(int argc, char **argv);
 } img_cmd_t;
 
 enum {
 OPTION_OUTPUT = 256,
 OPTION_BACKING_CHAIN = 257,
 OPTION_OBJECT = 258,
@@ -1281,22 +1282,20 @@ static int compare_buffers(const uint8_t *buf1, const 
uint8_t *buf2,
 if (!!memcmp(buf1 + i, buf2 + i, len) != res) {
 break;
 }
 i += len;
 }
 
 *pnum = i;
 return res;
 }
 
-#define IO_BUF_SIZE (2 * MiB)
-
 /*
  * Check if passed sectors are empty (not allocated or contain only 0 bytes)
  *
  * Intended for use by 'qemu-img compare': Returns 0 in case sectors are
  * filled with 0, 1 if sectors contain non-zero data (this is a comparison
  * failure), and 4 on error (the exit status for read errors), after emitting
  * an error message.
  *
  * @param blk:  BlockBackend for the image
  * @param offset: Starting offset to check
-- 
2.38.1




[PATCH v2 4/5] iotests: Test qemu-img checksum

2022-11-28 Thread Nir Soffer
Add simple tests computing a checksum for an image with all kinds of
extents in raw and qcow2 formats.

The test can be extended later for other formats, format options (e.g.
compressed qcow2), protocols (e.g. nbd), and images with a backing chain,
but I'm not sure this is really needed.

To help debugging in case of failures, the output includes a JSON map of
the test image.

Signed-off-by: Nir Soffer 
---
 tests/qemu-iotests/tests/qemu-img-checksum| 63 +++
 .../tests/qemu-img-checksum.out.qcow2 | 11 
 .../tests/qemu-img-checksum.out.raw   | 10 +++
 3 files changed, 84 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
 create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out.qcow2
 create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out.raw

diff --git a/tests/qemu-iotests/tests/qemu-img-checksum 
b/tests/qemu-iotests/tests/qemu-img-checksum
new file mode 100755
index 00..3577a0bc41
--- /dev/null
+++ b/tests/qemu-iotests/tests/qemu-img-checksum
@@ -0,0 +1,63 @@
+#!/usr/bin/env python3
+# group: rw auto quick
+#
+# Test cases for qemu-img checksum.
+#
+# Copyright (C) 2022 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+import re
+
+import iotests
+
+from iotests import (
+filter_testfiles,
+qemu_img,
+qemu_img_log,
+qemu_io,
+)
+
+
+def checksum_available():
+out = qemu_img("--help").stdout
+return re.search(r"\bchecksum .+ filename\b", out) is not None
+
+
+if not checksum_available():
+iotests.notrun("checksum command not available")
+
+iotests.script_initialize(
+supported_fmts=["raw", "qcow2"],
+supported_cache_modes=["none", "writeback"],
+supported_protocols=["file"],
+)
+
+print("=== Create test image ===\n")
+
+disk = iotests.file_path('disk')
+qemu_img("create", "-f", iotests.imgfmt, disk, "10m")
+qemu_io("-f", iotests.imgfmt,
+"-c", "write -P 0x1 0 2m",  # data
+"-c", "write -P 0x0 2m 2m", # data with zeroes
+"-c", "write -z 4m 2m", # zero allocated
+"-c", "write -z -u 6m 2m",  # zero hole
+# unallocated
+disk)
+print(filter_testfiles(disk))
+qemu_img_log("map", "--output", "json", disk)
+
+print("=== Compute checksum ===\n")
+
+qemu_img_log("checksum", "-T", iotests.cachemode, disk)
diff --git a/tests/qemu-iotests/tests/qemu-img-checksum.out.qcow2 
b/tests/qemu-iotests/tests/qemu-img-checksum.out.qcow2
new file mode 100644
index 00..02b9616e5b
--- /dev/null
+++ b/tests/qemu-iotests/tests/qemu-img-checksum.out.qcow2
@@ -0,0 +1,11 @@
+=== Create test image ===
+
+TEST_DIR/PID-disk
+[{ "start": 0, "length": 4194304, "depth": 0, "present": true, "zero": false, 
"data": true, "offset": 327680},
+{ "start": 4194304, "length": 4194304, "depth": 0, "present": true, "zero": 
true, "data": false},
+{ "start": 8388608, "length": 2097152, "depth": 0, "present": false, "zero": 
true, "data": false}]
+
+=== Compute checksum ===
+
+57cd8ef0cfad106d737f8fb0de3a0306a8a1a41db7bf7c0c36e2dfe75ee9bd26  
TEST_DIR/PID-disk
+
diff --git a/tests/qemu-iotests/tests/qemu-img-checksum.out.raw 
b/tests/qemu-iotests/tests/qemu-img-checksum.out.raw
new file mode 100644
index 00..6294e4dace
--- /dev/null
+++ b/tests/qemu-iotests/tests/qemu-img-checksum.out.raw
@@ -0,0 +1,10 @@
+=== Create test image ===
+
+TEST_DIR/PID-disk
+[{ "start": 0, "length": 4194304, "depth": 0, "present": true, "zero": false, 
"data": true, "offset": 0},
+{ "start": 4194304, "length": 6291456, "depth": 0, "present": true, "zero": 
true, "data": false, "offset": 4194304}]
+
+=== Compute checksum ===
+
+57cd8ef0cfad106d737f8fb0de3a0306a8a1a41db7bf7c0c36e2dfe75ee9bd26  
TEST_DIR/PID-disk
+
-- 
2.38.1




[PATCH v2 3/5] qemu-img: Add checksum command

2022-11-28 Thread Nir Soffer
The checksum command computes a checksum for disk image content using the
blkhash library[1]. The blkhash library is not packaged yet, but it is
available via copr[2].

Example run:

$ ./qemu-img checksum -p fedora-35.qcow2
6e5c00c995056319d52395f8d91c7f84725ae3da69ffcba4de4c7d22cff713a5  
fedora-35.qcow2

The block checksum is constructed by splitting the image into fixed-size
blocks and computing a digest of every block. The image checksum is the
digest of all the block digests.

The checksum internally uses the "sha256" algorithm, but it cannot be
compared with checksums created by other tools such as `sha256sum`.

The blkhash library supports sparse images, zero detection, and
optimizes zero block hashing (they are practically free). The library
uses multiple threads to speed up the computation.

Compared to `sha256sum`, `qemu-img checksum` is 3.5-4800[3] times
faster, depending on the amount of data in the image:

$ ./qemu-img info /scratch/50p.raw
file format: raw
virtual size: 6 GiB (6442450944 bytes)
disk size: 2.91 GiB

$ hyperfine -w2 -r5 -p "sleep 1" "./qemu-img checksum /scratch/50p.raw" \
 "sha256sum /scratch/50p.raw"
Benchmark 1: ./qemu-img checksum /scratch/50p.raw
  Time (mean ± σ):  1.849 s ±  0.037 s[User: 7.764 s, System: 0.962 
s]
  Range (min … max):1.813 s …  1.908 s5 runs

Benchmark 2: sha256sum /scratch/50p.raw
  Time (mean ± σ): 14.585 s ±  0.072 s[User: 13.537 s, System: 
1.003 s]
  Range (min … max):   14.501 s … 14.697 s5 runs

Summary
  './qemu-img checksum /scratch/50p.raw' ran
7.89 ± 0.16 times faster than 'sha256sum /scratch/50p.raw'

The new command is available only when `blkhash` is available during
build. To test the new command please install the `blkhash-devel`
package:

$ dnf copr enable nsoffer/blkhash
$ sudo dnf install blkhash-devel

[1] https://gitlab.com/nirs/blkhash
[2] https://copr.fedorainfracloud.org/coprs/nsoffer/blkhash/
[3] Computing checksum for 8T empty image: qemu-img checksum: 3.7s,
sha256sum (estimate): 17,749s

Signed-off-by: Nir Soffer 
---
 docs/tools/qemu-img.rst |  24 ++
 meson.build |  10 ++-
 meson_options.txt   |   2 +
 qemu-img-cmds.hx|   8 ++
 qemu-img.c  | 183 
 5 files changed, 226 insertions(+), 1 deletion(-)

diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index 15aeddc6d8..d856785ecc 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -347,20 +347,44 @@ Command description:
 Check completed, image is corrupted
   3
 Check completed, image has leaked clusters, but is not corrupted
   63
 Checks are not supported by the image format
 
   If ``-r`` is specified, exit codes representing the image state refer to the
   state after (the attempt at) repairing it. That is, a successful ``-r all``
   will yield the exit code 0, independently of the image state before.
 
+.. option:: checksum [--object OBJECTDEF] [--image-opts] [-f FMT] [-T 
SRC_CACHE] [-p] FILENAME
+
+  Print a checksum for image *FILENAME* guest visible content. Images with
+  different format or settings will have the same checksum.
+
+  The format is probed unless you specify it by ``-f``.
+
+  The checksum is computed for guest visible content. Allocated areas full of
+  zeroes, zero clusters, and unallocated areas are read as zeros so they will
+  have the same checksum. Images with single or multiple files or backing files
+  will have the same checksums if the guest will see the same content when
+  reading the image.
+
+  Image metadata that is not visible to the guest such as dirty bitmaps does
+  not affect the checksum.
+
+  Computing a checksum requires a read-only image. You cannot compute a
+  checksum of an active image used by a guest, but you can compute a checksum
+  of a guest during pull mode incremental backup using NBD URL.
+
+  The checksum is not compatible with other tools such as *sha256sum* for
+  optimization purposes; using multithreading and optimized handling of zero
+  areas. For more info please see https://gitlab.com/nirs/blkhash.
+
 .. option:: commit [--object OBJECTDEF] [--image-opts] [-q] [-f FMT] [-t 
CACHE] [-b BASE] [-r RATE_LIMIT] [-d] [-p] FILENAME
 
   Commit the changes recorded in *FILENAME* in its base image or backing file.
   If the backing file is smaller than the snapshot, then the backing file will 
be
   resized to be the same size as the snapshot.  If the snapshot is smaller than
   the backing file, the backing file will not be truncated.  If you want the
   backing file to match the size of the smaller snapshot, you can safely 
truncate
   it yourself once the commit operation successfully completes.
 
   The image *FILENAME* is emptied after the operation has succeeded. If you do
diff --git a/meson.build b/meson.b

[PATCH v2 2/5] Support format or cache specific out file

2022-11-28 Thread Nir Soffer
Extend the test finder to find tests with a format (*.out.qcow2) or cache
specific (*.out.nocache) out file. Previously this worked only for the
numbered tests.
---
 tests/qemu-iotests/findtests.py | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/findtests.py b/tests/qemu-iotests/findtests.py
index dd77b453b8..f4344ce78c 100644
--- a/tests/qemu-iotests/findtests.py
+++ b/tests/qemu-iotests/findtests.py
@@ -38,31 +38,37 @@ def chdir(path: Optional[str] = None) -> Iterator[None]:
 os.chdir(saved_dir)
 
 
 class TestFinder:
 def __init__(self, test_dir: Optional[str] = None) -> None:
 self.groups = defaultdict(set)
 
 with chdir(test_dir):
 self.all_tests = glob.glob('[0-9][0-9][0-9]')
 self.all_tests += [f for f in glob.iglob('tests/*')
-   if not f.endswith('.out') and
-   os.path.isfile(f + '.out')]
+   if self.is_test(f)]
 
 for t in self.all_tests:
 with open(t, encoding="utf-8") as f:
 for line in f:
 if line.startswith('# group: '):
 for g in line.split()[2:]:
 self.groups[g].add(t)
 break
 
+def is_test(self, fname: str) -> bool:
+"""
+The tests directory contains tests (no extension) and out files
+(*.out, *.out.{format}, *.out.{option}).
+"""
+return re.search(r'.+\.out(\.\w+)?$', fname) is None
+
 def add_group_file(self, fname: str) -> None:
 with open(fname, encoding="utf-8") as f:
 for line in f:
 line = line.strip()
 
 if (not line) or line[0] == '#':
 continue
 
 words = line.split()
 test_file = self.parse_test_name(words[0])
-- 
2.38.1




[PATCH v2 0/5] Add qemu-img checksum command using blkhash

2022-11-28 Thread Nir Soffer
Since blkhash is available only via copr now, the new command is added as an
optional feature, built only if the blkhash-devel package is installed.

Changes since v1 (Hanna):
- Move IO_BUF_SIZE to top of the file
- Extend TestFinder to support format or cache specific out files
- Improve online help (note about optimization and link to blkhash project)
- Guard blkhash.h include with CONFIG_BLKHASH
- Using user_creatable_process_cmdline() instead of 
user_creatable_add_from_str()
- Rename ret to exit_code
- Add static assert to ensure that read buffer is aligned to block size
- Drop unneeded pnum variable
- Change test to work like other tests; use iotest.imgfmt and iotest.cachemode
- Simplify test to test only raw and qcow2 format using file protocol
- Fix code style issues (multi-line comments, missing braces)
- Make error checking more clear (checksum_block_status(s) < 0)

v1:
https://lists.nongnu.org/archive/html/qemu-block/2022-09/msg00021.html

v1 discussion:
- https://lists.nongnu.org/archive/html/qemu-block/2022-10/msg00602.html
- https://lists.nongnu.org/archive/html/qemu-block/2022-10/msg00603.html
- https://lists.nongnu.org/archive/html/qemu-block/2022-10/msg00604.html
- https://lists.nongnu.org/archive/html/qemu-block/2022-11/msg00171.html
- https://lists.nongnu.org/archive/html/qemu-block/2022-11/msg00173.html

Nir Soffer (5):
  qemu-img.c: Move IO_BUF_SIZE to the top of the file
  Support format or cache specific out file
  qemu-img: Add checksum command
  iotests: Test qemu-img checksum
  qemu-img: Speed up checksum

 docs/tools/qemu-img.rst   |  24 ++
 meson.build   |  10 +-
 meson_options.txt |   2 +
 qemu-img-cmds.hx  |   8 +
 qemu-img.c| 390 +-
 tests/qemu-iotests/findtests.py   |  10 +-
 tests/qemu-iotests/tests/qemu-img-checksum|  63 +++
 .../tests/qemu-img-checksum.out.qcow2 |  11 +
 .../tests/qemu-img-checksum.out.raw   |  10 +
 9 files changed, 523 insertions(+), 5 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
 create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out.qcow2
 create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out.raw

-- 
2.38.1




[PATCH v2 5/5] qemu-img: Speed up checksum

2022-11-28 Thread Nir Soffer
Add a coroutine-based loop inspired by the `qemu-img convert` design.

Changes compared to `qemu-img convert`:

- State for the entire image is kept in ImgChecksumState

- State for a single worker coroutine is kept in ImgChecksumWorker.

- "Writes" are always in-order, ensured using a queue.

- Calling block status once per image extent, when the current extent is
  consumed by the workers.

- Using a 1m buffer size - testing shows that this gives the best read
  performance both with buffered and direct I/O.

- Number of coroutines is not configurable. Testing does not show
  improvement when using more than 8 coroutines.

- Progress includes the entire image, not only the allocated areas.

Compared to the simple read loop, this version is up to 4.67
times faster when computing a checksum for an image full of zeroes. For
real images it is 1.59 times faster with direct I/O, and with buffered
I/O there is no difference.

Test results on Dell PowerEdge R640 in a CentOS Stream 9 container:

| image| size | i/o   | before | after  | change |
|--|--|---||||
| zero [1] |   6g | buffered  | 1.600s ±0.014s | 0.342s ±0.016s |  x4.67 |
| zero |   6g | direct| 4.684s ±0.093s | 2.211s ±0.009s |  x2.12 |
| real [2] |   6g | buffered  | 1.841s ±0.075s | 1.806s ±0.036s |  x1.02 |
| real |   6g | direct| 3.094s ±0.079s | 1.947s ±0.017s |  x1.59 |
| nbd  [3] |   6g | buffered  | 2.455s ±0.183s | 1.808s ±0.016s |  x1.36 |
| nbd  |   6g | direct| 3.540s ±0.020s | 1.749s ±0.018s |  x2.02 |

[1] raw image full of zeroes
[2] raw fedora 35 image with additional random data, 50% full
[3] image [2] exported by qemu-nbd via unix socket

Signed-off-by: Nir Soffer 
---
 qemu-img.c | 350 ++---
 1 file changed, 277 insertions(+), 73 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 4b4ca7add3..5f63a769a9 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1618,50 +1618,296 @@ out:
 qemu_vfree(buf2);
 blk_unref(blk2);
 out2:
 blk_unref(blk1);
 out3:
 qemu_progress_end();
 return ret;
 }
 
 #ifdef CONFIG_BLKHASH
+
+#define CHECKSUM_COROUTINES 8
+#define CHECKSUM_BUF_SIZE (1 * MiB)
+#define CHECKSUM_ZERO_SIZE MIN(16 * GiB, SIZE_MAX)
+
+typedef struct ImgChecksumState ImgChecksumState;
+
+typedef struct ImgChecksumWorker {
+QTAILQ_ENTRY(ImgChecksumWorker) entry;
+ImgChecksumState *state;
+Coroutine *co;
+uint8_t *buf;
+
+/* The current chunk. */
+int64_t offset;
+int64_t length;
+bool zero;
+
+/*
+ * Always true for zero extent, false for data extent. Set to true
+ * when reading the chunk completes.
+ */
+bool ready;
+} ImgChecksumWorker;
+
+struct ImgChecksumState {
+const char *filename;
+BlockBackend *blk;
+BlockDriverState *bs;
+int64_t total_size;
+
+/* Current extent, modified in checksum_co_next. */
+int64_t offset;
+int64_t length;
+bool zero;
+
+int running_coroutines;
+CoMutex lock;
+ImgChecksumWorker workers[CHECKSUM_COROUTINES];
+
+/*
+ * Ensure in-order updates. Updates are scheduled at the tail of the
+ * queue and processed from the head of the queue when a worker is
+ * ready.
+ */
+QTAILQ_HEAD(, ImgChecksumWorker) update_queue;
+
+struct blkhash *hash;
+int ret;
+};
+
+static int checksum_block_status(ImgChecksumState *s)
+{
+int64_t length;
+int status;
+
+/* Must be called when current extent is consumed. */
+assert(s->length == 0);
+
+status = bdrv_block_status_above(s->bs, NULL, s->offset,
+ s->total_size - s->offset, &length, NULL,
+ NULL);
+if (status < 0) {
+error_report("Error checking status at offset %" PRId64 " for %s",
+ s->offset, s->filename);
+s->ret = status;
+return -1;
+}
+
+assert(length > 0);
+
+s->length = length;
+s->zero = !!(status & BDRV_BLOCK_ZERO);
+
+return 0;
+}
+
+/**
+ * Grab the next chunk from the current extent, getting the next extent if
+ * needed, and schecule the next update at the end fo the update queue.
+ *
+ * Retrun true if the worker has work to do, false if the worker has
+ * finished or there was an error getting the next extent.
+ */
+static coroutine_fn bool checksum_co_next(ImgChecksumWorker *w)
+{
+ImgChecksumState *s = w->state;
+
+qemu_co_mutex_lock(&s->lock);
+
+if (s->offset == s->total_size || s->ret != -EINPROGRESS) {
+qemu_co_mutex_unlock(&s->lock);
+return false;
+}
+
+if (s->length == 0 && checksum_block_status(s) < 0) {
+qemu_co_mutex_unlock(&s->lock);
+return false;
+}
+
+/* Grab one chunk from current extent. */
+w->offset = s->offset;
+w->lengt

[PATCH v2 4/5] iotests: Test qemu-img checksum

2022-11-28 Thread Nir Soffer
Add simple tests computing a checksum for an image with all kinds of
extents in raw and qcow2 formats.

The test can be extended later for other formats, format options (e.g.
compressed qcow2), protocols (e.g. nbd), and images with a backing chain,
but I'm not sure this is really needed.

To help debugging in case of failures, the output includes a json map of
the test image.

Signed-off-by: Nir Soffer 
---
 tests/qemu-iotests/tests/qemu-img-checksum| 63 +++
 .../tests/qemu-img-checksum.out.qcow2 | 11 
 .../tests/qemu-img-checksum.out.raw   | 10 +++
 3 files changed, 84 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
 create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out.qcow2
 create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out.raw

diff --git a/tests/qemu-iotests/tests/qemu-img-checksum 
b/tests/qemu-iotests/tests/qemu-img-checksum
new file mode 100755
index 00..3577a0bc41
--- /dev/null
+++ b/tests/qemu-iotests/tests/qemu-img-checksum
@@ -0,0 +1,63 @@
+#!/usr/bin/env python3
+# group: rw auto quick
+#
+# Test cases for qemu-img checksum.
+#
+# Copyright (C) 2022 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+import re
+
+import iotests
+
+from iotests import (
+filter_testfiles,
+qemu_img,
+qemu_img_log,
+qemu_io,
+)
+
+
+def checksum_available():
+out = qemu_img("--help").stdout
+return re.search(r"\bchecksum .+ filename\b", out) is not None
+
+
+if not checksum_available():
+iotests.notrun("checksum command not available")
+
+iotests.script_initialize(
+supported_fmts=["raw", "qcow2"],
+supported_cache_modes=["none", "writeback"],
+supported_protocols=["file"],
+)
+
+print("=== Create test image ===\n")
+
+disk = iotests.file_path('disk')
+qemu_img("create", "-f", iotests.imgfmt, disk, "10m")
+qemu_io("-f", iotests.imgfmt,
+"-c", "write -P 0x1 0 2m",  # data
+"-c", "write -P 0x0 2m 2m", # data with zeroes
+"-c", "write -z 4m 2m", # zero allocated
+"-c", "write -z -u 6m 2m",  # zero hole
+# unallocated
+disk)
+print(filter_testfiles(disk))
+qemu_img_log("map", "--output", "json", disk)
+
+print("=== Compute checksum ===\n")
+
+qemu_img_log("checksum", "-T", iotests.cachemode, disk)
diff --git a/tests/qemu-iotests/tests/qemu-img-checksum.out.qcow2 
b/tests/qemu-iotests/tests/qemu-img-checksum.out.qcow2
new file mode 100644
index 00..02b9616e5b
--- /dev/null
+++ b/tests/qemu-iotests/tests/qemu-img-checksum.out.qcow2
@@ -0,0 +1,11 @@
+=== Create test image ===
+
+TEST_DIR/PID-disk
+[{ "start": 0, "length": 4194304, "depth": 0, "present": true, "zero": false, 
"data": true, "offset": 327680},
+{ "start": 4194304, "length": 4194304, "depth": 0, "present": true, "zero": 
true, "data": false},
+{ "start": 8388608, "length": 2097152, "depth": 0, "present": false, "zero": 
true, "data": false}]
+
+=== Compute checksum ===
+
+57cd8ef0cfad106d737f8fb0de3a0306a8a1a41db7bf7c0c36e2dfe75ee9bd26  
TEST_DIR/PID-disk
+
diff --git a/tests/qemu-iotests/tests/qemu-img-checksum.out.raw 
b/tests/qemu-iotests/tests/qemu-img-checksum.out.raw
new file mode 100644
index 00..6294e4dace
--- /dev/null
+++ b/tests/qemu-iotests/tests/qemu-img-checksum.out.raw
@@ -0,0 +1,10 @@
+=== Create test image ===
+
+TEST_DIR/PID-disk
+[{ "start": 0, "length": 4194304, "depth": 0, "present": true, "zero": false, 
"data": true, "offset": 0},
+{ "start": 4194304, "length": 6291456, "depth": 0, "present": true, "zero": 
true, "data": false, "offset": 4194304}]
+
+=== Compute checksum ===
+
+57cd8ef0cfad106d737f8fb0de3a0306a8a1a41db7bf7c0c36e2dfe75ee9bd26  
TEST_DIR/PID-disk
+
-- 
2.38.1




[PATCH v2 1/5] qemu-img.c: Move IO_BUF_SIZE to the top of the file

2022-11-28 Thread Nir Soffer
This macro is used by various commands (compare, convert, rebase) but it
is defined somewhere in the middle of the file. I'm going to use it in
the new checksum command, so let's clean up a bit before that.
---
 qemu-img.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index a9b3a8103c..c03d6b4b31 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -49,20 +49,21 @@
 #include "block/block_int.h"
 #include "block/blockjob.h"
 #include "block/qapi.h"
 #include "crypto/init.h"
 #include "trace/control.h"
 #include "qemu/throttle.h"
 #include "block/throttle-groups.h"
 
 #define QEMU_IMG_VERSION "qemu-img version " QEMU_FULL_VERSION \
   "\n" QEMU_COPYRIGHT "\n"
+#define IO_BUF_SIZE (2 * MiB)
 
 typedef struct img_cmd_t {
 const char *name;
 int (*handler)(int argc, char **argv);
 } img_cmd_t;
 
 enum {
 OPTION_OUTPUT = 256,
 OPTION_BACKING_CHAIN = 257,
 OPTION_OBJECT = 258,
@@ -1281,22 +1282,20 @@ static int compare_buffers(const uint8_t *buf1, const 
uint8_t *buf2,
 if (!!memcmp(buf1 + i, buf2 + i, len) != res) {
 break;
 }
 i += len;
 }
 
 *pnum = i;
 return res;
 }
 
-#define IO_BUF_SIZE (2 * MiB)
-
 /*
  * Check if passed sectors are empty (not allocated or contain only 0 bytes)
  *
  * Intended for use by 'qemu-img compare': Returns 0 in case sectors are
  * filled with 0, 1 if sectors contain non-zero data (this is a comparison
  * failure), and 4 on error (the exit status for read errors), after emitting
  * an error message.
  *
  * @param blk:  BlockBackend for the image
  * @param offset: Starting offset to check
-- 
2.38.1




Re: [PATCH 2/3] iotests: Test qemu-img checksum

2022-11-28 Thread Nir Soffer
On Mon, Nov 7, 2022 at 1:41 PM Hanna Reitz  wrote:

> On 30.10.22 18:38, Nir Soffer wrote:
> > On Wed, Oct 26, 2022 at 4:31 PM Hanna Reitz  wrote:
> >
> >     On 01.09.22 16:32, Nir Soffer wrote:
> > > Add simple tests creating an image with all kinds of extents,
> > different
> > > formats, different backing chain, different protocol, and different
> > > image options. Since all images have the same guest visible
> > content they
> > > must have the same checksum.
> > >
> > > To help debugging in case of failures, the output includes a
> > json map of
> > > every test image.
> > >
> > > Signed-off-by: Nir Soffer 
> > > ---
> > >   tests/qemu-iotests/tests/qemu-img-checksum| 149
> > ++
> > >   .../qemu-iotests/tests/qemu-img-checksum.out  |  74 +
> > >   2 files changed, 223 insertions(+)
> > >   create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
> > >   create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out
> > >
> > > diff --git a/tests/qemu-iotests/tests/qemu-img-checksum
> > b/tests/qemu-iotests/tests/qemu-img-checksum
> > > new file mode 100755
> > > index 00..3a85ba33f2
> > > --- /dev/null
> > > +++ b/tests/qemu-iotests/tests/qemu-img-checksum
> > > @@ -0,0 +1,149 @@
> > > +#!/usr/bin/env python3
> > > +# group: rw auto quick
> > > +#
> > > +# Test cases for qemu-img checksum.
> > > +#
> > > +# Copyright (C) 2022 Red Hat, Inc.
> > > +#
> > > +# This program is free software; you can redistribute it and/or
> > modify
> > > +# it under the terms of the GNU General Public License as
> > published by
> > > +# the Free Software Foundation; either version 2 of the License,
> or
> > > +# (at your option) any later version.
> > > +#
> > > +# This program is distributed in the hope that it will be useful,
> > > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > +# GNU General Public License for more details.
> > > +#
> > > +# You should have received a copy of the GNU General Public
> License
> > > +# along with this program.  If not, see
> > <http://www.gnu.org/licenses/>.
> > > +
> > > +import re
> > > +
> > > +import iotests
> > > +
> > > +from iotests import (
> > > +filter_testfiles,
> > > +qemu_img,
> > > +qemu_img_log,
> > > +qemu_io,
> > > +qemu_nbd_popen,
> > > +)
> > > +
> > > +
> > > +def checksum_available():
> > > +out = qemu_img("--help").stdout
> > > +return re.search(r"\bchecksum .+ filename\b", out) is not None
> > > +
> > > +
> > > +if not checksum_available():
> > > +iotests.notrun("checksum command not available")
> > > +
> > > +iotests.script_initialize(
> > > +supported_fmts=["raw", "qcow2"],
> > > +supported_cache_modes=["none", "writeback"],
> >
> > It doesn’t work with writeback, though, because it uses -T none
> below.
> >
> >
> > Good point
> >
> >
> > Which by the way is a heavy cost, because I usually run tests in
> > tmpfs,
> > where this won’t work.  Is there any way of not doing the -T none
> > below?
> >
> >
> > Testing using tempfs is problematic since you cannot test -T none.In
> > oVirt
> > we alway use /var/tmp which usually uses something that supports
> > direct I/O.
> >
> > Do we have a way to specify cache mode in the tests, so we can use -T
> none
> > only when the option is set?
>
> `./check` has a `-c` option (e.g. `./check -c none`), which lands in
> `iotests.cachemode`.  That isn’t automatically passed to qemu-img calls,
> but you can do it manually (i.e. `qemu_img_log("checksum", "-T",
> iotests.cachemode, disk_top)` instead of `"-T", "none"`).
>

Ok, I will change to use the current cache setting.


> >
> > > +supported_p

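A minimal sketch of the change discussed above (assuming the qemu-iotests
harness and its iotests module; this is not the final v2 test, see
tests/qemu-iotests/tests/qemu-img-checksum in this series):

    # Use the cache mode selected with `./check -c ...` instead of
    # hardcoding "-T none".
    import iotests
    from iotests import qemu_img, qemu_img_log

    disk = iotests.file_path('disk')
    qemu_img("create", "-f", iotests.imgfmt, disk, "10m")

    # iotests.cachemode holds the cache mode passed to `./check -c`.
    qemu_img_log("checksum", "-T", iotests.cachemode, disk)
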
Re: [PATCH 1/3] qemu-img: Add checksum command

2022-11-28 Thread Nir Soffer
On Mon, Nov 7, 2022 at 12:20 PM Hanna Reitz  wrote:

> On 30.10.22 18:37, Nir Soffer wrote:
> > On Wed, Oct 26, 2022 at 4:00 PM Hanna Reitz  wrote:
> >
> >     On 01.09.22 16:32, Nir Soffer wrote:
> [...]
> > > ---
> > >   docs/tools/qemu-img.rst |  22 +
> > >   meson.build |  10 ++-
> > >   meson_options.txt   |   2 +
> > >   qemu-img-cmds.hx|   8 ++
> > >   qemu-img.c  | 191
> > 
> > >   5 files changed, 232 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
> > > index 85a6e05b35..8be9c45cbf 100644
> > > --- a/docs/tools/qemu-img.rst
> > > +++ b/docs/tools/qemu-img.rst
> > > @@ -347,20 +347,42 @@ Command description:
> > >   Check completed, image is corrupted
> > > 3
> > >   Check completed, image has leaked clusters, but is not
> > corrupted
> > > 63
> > >   Checks are not supported by the image format
> > >
> > > If ``-r`` is specified, exit codes representing the image
> > state refer to the
> > > state after (the attempt at) repairing it. That is, a
> > successful ``-r all``
> > > will yield the exit code 0, independently of the image state
> > before.
> > >
> > > +.. option:: checksum [--object OBJECTDEF] [--image-opts] [-f
> > FMT] [-T SRC_CACHE] [-p] FILENAME
> > > +
> > > +  Print a checksum for image *FILENAME* guest visible content.
> >
> > Why not say which kind of checksum it is?
> >
> >
> > Do you mean the algorithm used? This may be confusing, for example we
> > write
> >
> >Print a sha256 checksum ...
> >
> > User will expect to get the same result from "sha256sum disk.img". How
> > about
> >
> >Print a blkhash checksum ...
> >
> > And add a link to the blkhash project?
>
> I did mean sha256, but if it isn’t pure sha256, then a link to any
> description how it is computed would be good, I think.
>

Ok, will link to https://gitlab.com/nirs/blkhash

[...]

>
> > > +  The checksum is not compatible with other tools such as
> > *sha256sum*.
> >
> > Why not?  I can see it differs even for raw images, but why?  I would
> > have very much assumed that this gives me exactly what sha256sum
> > in the
> > guest on the guest device would yield.
> >
> >
> > The blkhash is a construction based on other cryptographic hash
> > functions (e.g. sha256).
> > The way the hash is constructed is explained here:
> > https://gitlab.com/nirs/blkhash/-/blob/master/blkhash.py#L52
> >
> > We can provide a very slow version using a single thread and no zero
> > optimization
> > that will create the same hash as sha256sum for raw image.
>
> Ah, right.  Yes, especially zero optimization is likely to make a huge
> difference.  Thanks for the explanation!
>
> Maybe that could be mentioned here as a side note, though?  E.g. “The
> checksum is not compatible with other tools such as *sha256sum* for
> optimization purposes (to allow multithreading and optimized handling of
> zero areas).”?
>

Ok, I will improve the text in the next version.

[...]

> > In blksum I do not allow changing the block size.
> >
> > I'll add an assert in the next version to keeps this default optimal.
>
> Thanks!  (Static assert should work, right?)
>

I think it should

Nir

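As a rough illustration of the construction linked above, a minimal
single-threaded sketch might look like this (the block size and other
details are illustrative assumptions, not the real blkhash parameters;
the actual library also detects zero blocks and uses multiple threads):

    # Sketch only: hash every fixed-size block, then hash the block digests.
    import hashlib

    BLOCK_SIZE = 64 * 1024  # illustrative, not the real blkhash block size

    def block_checksum(path, block_size=BLOCK_SIZE, algorithm="sha256"):
        outer = hashlib.new(algorithm)
        with open(path, "rb") as f:
            while True:
                block = f.read(block_size)
                if not block:
                    break
                # In this sketch a short last block is hashed as-is.
                outer.update(hashlib.new(algorithm, block).digest())
        return outer.hexdigest()

    print(block_checksum("disk.img"))

This also shows why the result cannot be compared with plain sha256sum
output for the same raw image.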

Re: [Libguestfs] [libnbd PATCH v2 3/3] nbdsh: Improve --help and initial banner contents.

2022-11-04 Thread Nir Soffer
On Fri, Nov 4, 2022 at 11:18 PM Eric Blake  wrote:
[...]

> @@ -127,7 +129,10 @@ def __call__(self, parser, namespace, values,
> option_string=None):
>  os.environ["LIBNBD_DEBUG"] = "1"
>
>  # Create the handle.
> -if not args.n:
> +if args.n:
> +pass
> +else:
>

Why add a useless branch?


> +global h
>  h = nbd.NBD()
>  h.set_handle_name("nbdsh")
>
> @@ -165,21 +170,35 @@ def line(x): lines.append(x)
>  def blank(): line("")
>  def example(ex, desc): line("%-34s # %s" % (ex, desc))
>
> +connect_hint = False
> +go_hint = False
>  blank()
>  line("Welcome to nbdsh, the shell for interacting with")
>  line("Network Block Device (NBD) servers.")
>  blank()
> -if not args.n:
> -line("The ‘nbd’ module has already been imported and there")
> -line("is an open NBD handle called ‘h’.")
> -blank()
> -else:
> +if args.n:
>  line("The ‘nbd’ module has already been imported.")
>  blank()
>  example("h = nbd.NBD()", "Create a new handle.")
> -if False:  # args.uri is None:
>

Good that this was removed, but it would be better to remove it in the previous
patch.

Nir
___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs


Re: [Libguestfs] [libnbd PATCH v2 2/3] nbdsh: Allow -u interleaved with -c

2022-11-04 Thread Nir Soffer
On Fri, Nov 4, 2022 at 11:18 PM Eric Blake  wrote
[...]

Sorry, I did not read the whole patch, but I noticed this:


> @@ -165,7 +177,7 @@ def example(ex, desc): line("%-34s # %s" % (ex, desc))
>  line("The ‘nbd’ module has already been imported.")
>  blank()
>  example("h = nbd.NBD()", "Create a new handle.")
> -if args.uri is None:
> +if False:  # args.uri is None:
>

`if False` will never run, so why not remove the entire branch?

Is this leftover from debugging?

Nir
___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs


Re: [PATCH 3/3] qemu-img: Speed up checksum

2022-10-30 Thread Nir Soffer
On Sun, Oct 30, 2022 at 7:38 PM Nir Soffer  wrote:

> On Wed, Oct 26, 2022 at 4:54 PM Hanna Reitz  wrote:
>
>> On 01.09.22 16:32, Nir Soffer wrote:
>>
> [...]

> > +/* The current chunk. */
>> > +int64_t offset;
>> > +int64_t length;
>> > +bool zero;
>> > +
>> > +/* Always true for zero extent, false for data extent. Set to true
>> > + * when reading the chunk completes. */
>>
>> Qemu codestyle requires /* and */ to be on separate lines for multi-line
>> comments (see checkpatch.pl).
>>
>
> I'll change that. Do we have a good way to run checkpatch.pl when using
> git-publish?
>
> Maybe a way to run checkpatch.pl on all patches generated by git publish
> automatically?
>

I found
https://blog.vmsplice.net/2011/03/how-to-automatically-run-checkpatchpl.html
and it seems to work well.



Re: [PATCH 1/3] qemu-img: Add checksum command

2022-10-30 Thread Nir Soffer
On Wed, Oct 26, 2022 at 4:00 PM Hanna Reitz  wrote:

> On 01.09.22 16:32, Nir Soffer wrote:
> > The checksum command compute a checksum for disk image content using the
> > blkhash library[1]. The blkhash library is not packaged yet, but it is
> > available via copr[2].
> >
> > Example run:
> >
> >  $ ./qemu-img checksum -p fedora-35.qcow2
> >  6e5c00c995056319d52395f8d91c7f84725ae3da69ffcba4de4c7d22cff713a5
> fedora-35.qcow2
> >
> > The block checksum is constructed by splitting the image to fixed sized
> > blocks and computing a digest of every block. The image checksum is the
> > digest of the all block digests.
> >
> > The checksum uses internally the "sha256" algorithm but it cannot be
> > compared with checksums created by other tools such as `sha256sum`.
> >
> > The blkhash library supports sparse images, zero detection, and
> > optimizes zero block hashing (they are practically free). The library
> > uses multiple threads to speed up the computation.
> >
> > Comparing to `sha256sum`, `qemu-img checksum` is 3.5-4800[3] times
> > faster, depending on the amount of data in the image:
> >
> >  $ ./qemu-img info /scratch/50p.raw
> >  file format: raw
> >  virtual size: 6 GiB (6442450944 bytes)
> >  disk size: 2.91 GiB
> >
> >  $ hyperfine -w2 -r5 -p "sleep 1" "./qemu-img checksum
> /scratch/50p.raw" \
> >   "sha256sum /scratch/50p.raw"
> >  Benchmark 1: ./qemu-img checksum /scratch/50p.raw
> >Time (mean ± σ):  1.849 s ±  0.037 s[User: 7.764 s,
> System: 0.962 s]
> >Range (min … max):1.813 s …  1.908 s5 runs
> >
> >  Benchmark 2: sha256sum /scratch/50p.raw
> >Time (mean ± σ): 14.585 s ±  0.072 s[User: 13.537 s,
> System: 1.003 s]
> >Range (min … max):   14.501 s … 14.697 s5 runs
> >
> >  Summary
> >'./qemu-img checksum /scratch/50p.raw' ran
> >  7.89 ± 0.16 times faster than 'sha256sum /scratch/50p.raw'
> >
> > The new command is available only when `blkhash` is available during
> > build. To test the new command please install the `blkhash-devel`
> > package:
> >
> >  $ dnf copr enable nsoffer/blkhash
> >  $ sudo dnf install blkhash-devel
> >
> > [1] https://gitlab.com/nirs/blkhash
> > [2] https://copr.fedorainfracloud.org/coprs/nsoffer/blkhash/
> > [3] Computing checksum for 8T empty image: qemu-img checksum: 3.7s,
> >  sha256sum (estimate): 17,749s
> >
> > Signed-off-by: Nir Soffer 
> > ---
> >   docs/tools/qemu-img.rst |  22 +
> >   meson.build |  10 ++-
> >   meson_options.txt   |   2 +
> >   qemu-img-cmds.hx|   8 ++
> >   qemu-img.c  | 191 
> >   5 files changed, 232 insertions(+), 1 deletion(-)
> >
> > diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
> > index 85a6e05b35..8be9c45cbf 100644
> > --- a/docs/tools/qemu-img.rst
> > +++ b/docs/tools/qemu-img.rst
> > @@ -347,20 +347,42 @@ Command description:
> >   Check completed, image is corrupted
> > 3
> >   Check completed, image has leaked clusters, but is not corrupted
> > 63
> >   Checks are not supported by the image format
> >
> > If ``-r`` is specified, exit codes representing the image state
> refer to the
> > state after (the attempt at) repairing it. That is, a successful
> ``-r all``
> > will yield the exit code 0, independently of the image state before.
> >
> > +.. option:: checksum [--object OBJECTDEF] [--image-opts] [-f FMT] [-T
> SRC_CACHE] [-p] FILENAME
> > +
> > +  Print a checksum for image *FILENAME* guest visible content.
>
> Why not say which kind of checksum it is?
>

Do you mean the algorithm used? This may be confusing, for example we write

   Print a sha256 checksum ...

Users will expect to get the same result from "sha256sum disk.img". How about

   Print a blkhash checksum ...

And add a link to the blkhash project?


>
> >  Images
> with
> > +  different format or settings wil have the same checksum.
>
> s/wil/will/
>

Fixing


>
> > +
> > +  The format is probed unless you specify it by ``-f``.
> > +
> > +  The checksum is computed for guest visible content. Allocated areas
> full of
> > +  zeroes, zero clusters, and unallocated areas are read as z

Re: [PATCH 3/3] qemu-img: Speed up checksum

2022-10-30 Thread Nir Soffer
On Wed, Oct 26, 2022 at 4:54 PM Hanna Reitz  wrote:

> On 01.09.22 16:32, Nir Soffer wrote:
> > Add coroutine based loop inspired by `qemu-img convert` design.
> >
> > Changes compared to `qemu-img convert`:
> >
> > - State for the entire image is kept in ImgChecksumState
> >
> > - State for single worker coroutine is kept in ImgChecksumworker.
> >
> > - "Writes" are always in-order, ensured using a queue.
> >
> > - Calling block status once per image extent, when the current extent is
> >consumed by the workers.
> >
> > - Using 1m buffer size - testings shows that this gives best read
> >performance both with buffered and direct I/O.
>
> Why does patch 1 then choose to use 2 MB?
>

The first patch uses sync I/O, and in this case 2 MB is a little faster.


> > - Number of coroutines is not configurable. Testing does not show
> >improvement when using more than 8 coroutines.
> >
> > - Progress include entire image, not only the allocated state.
> >
> > Comparing to the simple read loop shows that this version is up to 4.67
> > times faster when computing a checksum for an image full of zeroes. For
> > real images it is 1.59 times faster with direct I/O, and with buffered
> > I/O there is no difference.
> >
> > Test results on Dell PowerEdge R640 in a CentOS Stream 9 container:
> >
> > | image| size | i/o   | before | after  | change
> |
> >
> |--|--|---||||
> > | zero [1] |   6g | buffered  | 1.600s ±0.014s | 0.342s ±0.016s |  x4.67
> |
> > | zero |   6g | direct| 4.684s ±0.093s | 2.211s ±0.009s |  x2.12
> |
> > | real [2] |   6g | buffered  | 1.841s ±0.075s | 1.806s ±0.036s |  x1.02
> |
> > | real |   6g | direct| 3.094s ±0.079s | 1.947s ±0.017s |  x1.59
> |
> > | nbd  [3] |   6g | buffered  | 2.455s ±0.183s | 1.808s ±0.016s |  x1.36
> |
> > | nbd      |   6g | direct| 3.540s ±0.020s | 1.749s ±0.018s |  x2.02
> |
> >
> > [1] raw image full of zeroes
> > [2] raw fedora 35 image with additional random data, 50% full
> > [3] image [2] exported by qemu-nbd via unix socket
> >
> > Signed-off-by: Nir Soffer 
> > ---
> >   qemu-img.c | 343 +
> >   1 file changed, 270 insertions(+), 73 deletions(-)
>
> Looks good!
>
> Just a couple of style comments below.
>
> > diff --git a/qemu-img.c b/qemu-img.c
> > index 7edcfe4bc8..bfa8e2862f 100644
> > --- a/qemu-img.c
> > +++ b/qemu-img.c
> > @@ -1613,48 +1613,288 @@ out:
> >   qemu_vfree(buf2);
> >   blk_unref(blk2);
> >   out2:
> >   blk_unref(blk1);
> >   out3:
> >   qemu_progress_end();
> >   return ret;
> >   }
> >
> >   #ifdef CONFIG_BLKHASH
> > +
> > +#define CHECKSUM_COROUTINES 8
> > +#define CHECKSUM_BUF_SIZE (1 * MiB)
> > +#define CHECKSUM_ZERO_SIZE MIN(16 * GiB, SIZE_MAX)
> > +
> > +typedef struct ImgChecksumState ImgChecksumState;
> > +
> > +typedef struct ImgChecksumWorker {
> > +QTAILQ_ENTRY(ImgChecksumWorker) entry;
> > +ImgChecksumState *state;
> > +Coroutine *co;
> > +uint8_t *buf;
> > +
> > +/* The current chunk. */
> > +int64_t offset;
> > +int64_t length;
> > +bool zero;
> > +
> > +/* Always true for zero extent, false for data extent. Set to true
> > + * when reading the chunk completes. */
>
> Qemu codestyle requires /* and */ to be on separate lines for multi-line
> comments (see checkpatch.pl).
>

I'll change that. Do we have a good way to run checkpatch.pl when using
git-publish?

Maybe a way to run checkpatch.pl on all patches generated by git publish
automatically?


> > +bool ready;
> > +} ImgChecksumWorker;
> > +
> > +struct ImgChecksumState {
> > +const char *filename;
> > +BlockBackend *blk;
> > +BlockDriverState *bs;
> > +int64_t total_size;
> > +
> > +/* Current extent, modified in checksum_co_next. */
> > +int64_t offset;
> > +int64_t length;
> > +bool zero;
> > +
> > +int running_coroutines;
> > +CoMutex lock;
> > +ImgChecksumWorker workers[CHECKSUM_COROUTINES];
> > +
> > +/* Ensure in-order updates. Update are scheduled at the tail of the
> > + * queue and processed from the head of the queue when a worker is
> > + * ready. */
>
> Qemu codestyle requires /* and */ to be on separate li


Re: [PATCH 1/3] qemu-img: Add checksum command

2022-10-30 Thread Nir Soffer
On Wed, Oct 26, 2022 at 4:00 PM Hanna Reitz  wrote:

> On 01.09.22 16:32, Nir Soffer wrote:
> > The checksum command compute a checksum for disk image content using the
> > blkhash library[1]. The blkhash library is not packaged yet, but it is
> > available via copr[2].
> >
> > Example run:
> >
> >  $ ./qemu-img checksum -p fedora-35.qcow2
> >  6e5c00c995056319d52395f8d91c7f84725ae3da69ffcba4de4c7d22cff713a5
> fedora-35.qcow2
> >
> > The block checksum is constructed by splitting the image to fixed sized
> > blocks and computing a digest of every block. The image checksum is the
> > digest of the all block digests.
> >
> > The checksum uses internally the "sha256" algorithm but it cannot be
> > compared with checksums created by other tools such as `sha256sum`.
> >
> > The blkhash library supports sparse images, zero detection, and
> > optimizes zero block hashing (they are practically free). The library
> > uses multiple threads to speed up the computation.
> >
> > Comparing to `sha256sum`, `qemu-img checksum` is 3.5-4800[3] times
> > faster, depending on the amount of data in the image:
> >
> >  $ ./qemu-img info /scratch/50p.raw
> >  file format: raw
> >  virtual size: 6 GiB (6442450944 bytes)
> >  disk size: 2.91 GiB
> >
> >  $ hyperfine -w2 -r5 -p "sleep 1" "./qemu-img checksum
> /scratch/50p.raw" \
> >   "sha256sum /scratch/50p.raw"
> >  Benchmark 1: ./qemu-img checksum /scratch/50p.raw
> >Time (mean ± σ):  1.849 s ±  0.037 s[User: 7.764 s,
> System: 0.962 s]
> >Range (min … max):1.813 s …  1.908 s5 runs
> >
> >  Benchmark 2: sha256sum /scratch/50p.raw
> >Time (mean ± σ): 14.585 s ±  0.072 s[User: 13.537 s,
> System: 1.003 s]
> >Range (min … max):   14.501 s … 14.697 s5 runs
> >
> >  Summary
> >'./qemu-img checksum /scratch/50p.raw' ran
> >  7.89 ± 0.16 times faster than 'sha256sum /scratch/50p.raw'
> >
> > The new command is available only when `blkhash` is available during
> > build. To test the new command please install the `blkhash-devel`
> > package:
> >
> >  $ dnf copr enable nsoffer/blkhash
> >  $ sudo dnf install blkhash-devel
> >
> > [1] https://gitlab.com/nirs/blkhash
> > [2] https://copr.fedorainfracloud.org/coprs/nsoffer/blkhash/
> > [3] Computing checksum for 8T empty image: qemu-img checksum: 3.7s,
> >  sha256sum (estimate): 17,749s
> >
> > Signed-off-by: Nir Soffer 
> > ---
> >   docs/tools/qemu-img.rst |  22 +
> >   meson.build |  10 ++-
> >   meson_options.txt   |   2 +
> >   qemu-img-cmds.hx|   8 ++
> >   qemu-img.c  | 191 
> >   5 files changed, 232 insertions(+), 1 deletion(-)
> >
> > diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
> > index 85a6e05b35..8be9c45cbf 100644
> > --- a/docs/tools/qemu-img.rst
> > +++ b/docs/tools/qemu-img.rst
> > @@ -347,20 +347,42 @@ Command description:
> >   Check completed, image is corrupted
> > 3
> >   Check completed, image has leaked clusters, but is not corrupted
> > 63
> >   Checks are not supported by the image format
> >
> > If ``-r`` is specified, exit codes representing the image state
> refer to the
> > state after (the attempt at) repairing it. That is, a successful
> ``-r all``
> > will yield the exit code 0, independently of the image state before.
> >
> > +.. option:: checksum [--object OBJECTDEF] [--image-opts] [-f FMT] [-T
> SRC_CACHE] [-p] FILENAME
> > +
> > +  Print a checksum for image *FILENAME* guest visible content.
>
> Why not say which kind of checksum it is?
>

Do you mean the algorithm used? This may be confusing, for example we write

   Print a sha256 checksum ...

User will expect to get the same result from "sha256sum disk.img". How about

   Print a blkhash checksum ...

And add a link to the blkhash project?


>
> >  Images
> with
> > +  different format or settings wil have the same checksum.
>
> s/wil/will/
>

Fixing


>
> > +
> > +  The format is probed unless you specify it by ``-f``.
> > +
> > +  The checksum is computed for guest visible content. Allocated areas
> full of
> > +  zeroes, zero clusters, and unallocated areas are read as z

[ovirt-devel] Enable DCO app on github for enforcing signed-off-by

2022-10-26 Thread Nir Soffer
I'm not sure we enforce signed-off-by in all projects, but we do in the most
important projects. There is a nice little app adding a CI check with zero
effort. Here is a bad commit:

https://github.com/nirs/userstorage/pull/29

When the check fails the app provides useful help for users:


https://github.com/nirs/userstorage/pull/29/checks?check_run_id=9125092601

Looks like very little effort to improve all projects!

If you wondered what signed-off-by means, this is explained here:

https://wiki.linuxfoundation.org/dco
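
For contributors the mechanics are trivial - the sign-off is just a trailer
that git adds for you. A minimal sketch, assuming git already knows your name
and email:

    # Sign off a new commit
    git commit -s -m "some change"

    # Add a missing sign-off to the last commit
    git commit --amend -s --no-edit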

Nir
___
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/devel@ovirt.org/message/WM6H2MXLZ27FGDRRX7VZEINVJXNMYPXA/


Re: [PATCH 0/3] Add qemu-img checksum command using blkhash

2022-10-18 Thread Nir Soffer
On Sun, Sep 18, 2022 at 12:35 PM Nir Soffer  wrote:

> ping
>
> Kevin, Hanna, I hope you have time to take a look.
>
> https://lists.nongnu.org/archive/html/qemu-block/2022-09/msg00021.html


Ping again, hopefully someone has time to look at this :-)


>
>
>
> On Thu, Sep 1, 2022 at 5:32 PM Nir Soffer  wrote:
> >
> > Since blkhash is available only via copr now, the new command is added as
> > optional feature, built only if blkhash-devel package is installed.
> >
> > Nir Soffer (3):
> >   qemu-img: Add checksum command
> >   iotests: Test qemu-img checksum
> >   qemu-img: Speed up checksum
> >
> >  docs/tools/qemu-img.rst   |  22 +
> >  meson.build   |  10 +-
> >  meson_options.txt |   2 +
> >  qemu-img-cmds.hx  |   8 +
> >  qemu-img.c| 388 ++
> >  tests/qemu-iotests/tests/qemu-img-checksum| 149 +++
> >  .../qemu-iotests/tests/qemu-img-checksum.out  |  74 
> >  7 files changed, 652 insertions(+), 1 deletion(-)
> >  create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
> >  create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out
> >
> > --
> > 2.37.2
> >
>


[ovirt-users] Re: Issue with disk import and disk upload - ovirt 4.5.2

2022-10-04 Thread Nir Soffer
On Mon, Oct 3, 2022 at 4:11 PM Proniakin Marcin 
wrote:

>
> Hello,
>
>
> After upgrading ovirt to version 4.5.2, I've experienced issue with using
> import function in import disk window in storage domain after attaching it
> to data center. Logs in the attachment (#1).
>
>
> Second issue is with uploading disks from storage domain window.
>
> Using example from attachment (#2):
>
> When choosing to upload disk to domain portal-1 from upload function in
> the storage domain disk window, ovirt chooses wrong data-center dev-1
> (dev-1 datacenter has domain dev-1, portal-1 datacenter has domain
> portal-1) and wrong host to upload. Accepting this upload always fails.
>
>
> When choosing to upload disk from storage -> disks window works fine.
>
>
> Issues confirmed on two independent ovirt servers (both with version
> 4.5.2).
>

Please report an issue in
https://github.com/oVirt/ovirt-engine/issues

ovirt 4.5.2 introduced the ovirt-img tool - you can use it to upload images
from the command line instead of the UI. This is much faster, more reliable,
and supports many features like on-the-fly format conversion.

Example usage:

ovirt-img upload-disk --engine-url https://example.com \
--username username \
--password-file /path/to/password-file \
--cafile ca.pem \
--storage-domain mydomain \
/path/to/iso

See ovirt-img upload-disk --help for more options.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/JKM5JGCNGZYRK2WT34KDZZGRHCLSHEED/


Re: [PATCH 0/3] Add qemu-img checksum command using blkhash

2022-09-18 Thread Nir Soffer
ping

Kevin, Hanna, I hope you have time to take a look.

https://lists.nongnu.org/archive/html/qemu-block/2022-09/msg00021.html


On Thu, Sep 1, 2022 at 5:32 PM Nir Soffer  wrote:
>
> Since blkhash is available only via copr now, the new command is added as
> optional feature, built only if blkhash-devel package is installed.
>
> Nir Soffer (3):
>   qemu-img: Add checksum command
>   iotests: Test qemu-img checksum
>   qemu-img: Speed up checksum
>
>  docs/tools/qemu-img.rst   |  22 +
>  meson.build   |  10 +-
>  meson_options.txt |   2 +
>  qemu-img-cmds.hx  |   8 +
>  qemu-img.c| 388 ++
>  tests/qemu-iotests/tests/qemu-img-checksum| 149 +++
>  .../qemu-iotests/tests/qemu-img-checksum.out  |  74 
>  7 files changed, 652 insertions(+), 1 deletion(-)
>  create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
>  create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out
>
> --
> 2.37.2
>




[PATCH 3/3] qemu-img: Speed up checksum

2022-09-01 Thread Nir Soffer
Add coroutine based loop inspired by `qemu-img convert` design.

Changes compared to `qemu-img convert`:

- State for the entire image is kept in ImgChecksumState

- State for a single worker coroutine is kept in ImgChecksumWorker.

- "Writes" are always in-order, ensured using a queue.

- Calling block status once per image extent, when the current extent is
  consumed by the workers.

- Using 1m buffer size - testing shows that this gives the best read
  performance both with buffered and direct I/O.

- Number of coroutines is not configurable. Testing does not show
  improvement when using more than 8 coroutines.

- Progress includes the entire image, not only the allocated state.

Comparing to the simple read loop shows that this version is up to 4.67
times faster when computing a checksum for an image full of zeroes. For
real images it is 1.59 times faster with direct I/O, and with buffered
I/O there is no difference.

Test results on Dell PowerEdge R640 in a CentOS Stream 9 container:

| image| size | i/o   | before | after  | change |
|--|--|---||||
| zero [1] |   6g | buffered  | 1.600s ±0.014s | 0.342s ±0.016s |  x4.67 |
| zero |   6g | direct| 4.684s ±0.093s | 2.211s ±0.009s |  x2.12 |
| real [2] |   6g | buffered  | 1.841s ±0.075s | 1.806s ±0.036s |  x1.02 |
| real |   6g | direct| 3.094s ±0.079s | 1.947s ±0.017s |  x1.59 |
| nbd  [3] |   6g | buffered  | 2.455s ±0.183s | 1.808s ±0.016s |  x1.36 |
| nbd  |   6g | direct| 3.540s ±0.020s | 1.749s ±0.018s |  x2.02 |

[1] raw image full of zeroes
[2] raw fedora 35 image with additional random data, 50% full
[3] image [2] exported by qemu-nbd via unix socket

Signed-off-by: Nir Soffer 
---
 qemu-img.c | 343 +
 1 file changed, 270 insertions(+), 73 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 7edcfe4bc8..bfa8e2862f 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1613,48 +1613,288 @@ out:
 qemu_vfree(buf2);
 blk_unref(blk2);
 out2:
 blk_unref(blk1);
 out3:
 qemu_progress_end();
 return ret;
 }
 
 #ifdef CONFIG_BLKHASH
+
+#define CHECKSUM_COROUTINES 8
+#define CHECKSUM_BUF_SIZE (1 * MiB)
+#define CHECKSUM_ZERO_SIZE MIN(16 * GiB, SIZE_MAX)
+
+typedef struct ImgChecksumState ImgChecksumState;
+
+typedef struct ImgChecksumWorker {
+QTAILQ_ENTRY(ImgChecksumWorker) entry;
+ImgChecksumState *state;
+Coroutine *co;
+uint8_t *buf;
+
+/* The current chunk. */
+int64_t offset;
+int64_t length;
+bool zero;
+
+/* Always true for zero extent, false for data extent. Set to true
+ * when reading the chunk completes. */
+bool ready;
+} ImgChecksumWorker;
+
+struct ImgChecksumState {
+const char *filename;
+BlockBackend *blk;
+BlockDriverState *bs;
+int64_t total_size;
+
+/* Current extent, modified in checksum_co_next. */
+int64_t offset;
+int64_t length;
+bool zero;
+
+int running_coroutines;
+CoMutex lock;
+ImgChecksumWorker workers[CHECKSUM_COROUTINES];
+
+/* Ensure in-order updates. Updates are scheduled at the tail of the
+ * queue and processed from the head of the queue when a worker is
+ * ready. */
+QTAILQ_HEAD(, ImgChecksumWorker) update_queue;
+
+struct blkhash *hash;
+int ret;
+};
+
+static int checksum_block_status(ImgChecksumState *s)
+{
+int64_t length;
+int status;
+
+/* Must be called when current extent is consumed. */
+assert(s->length == 0);
+
+status = bdrv_block_status_above(s->bs, NULL, s->offset,
+ s->total_size - s->offset, &length, NULL,
+ NULL);
+if (status < 0) {
+error_report("Error checking status at offset %" PRId64 " for %s",
+ s->offset, s->filename);
+s->ret = status;
+return -1;
+}
+
+assert(length > 0);
+
+s->length = length;
+s->zero = !!(status & BDRV_BLOCK_ZERO);
+
+return 0;
+}
+
+/**
+ * Grab the next chunk from the current extent, getting the next extent if
+ * needed, and schedule the next update at the end of the update queue.
+ *
+ * Return true if the worker has work to do, false if the worker has
+ * finished or there was an error getting the next extent.
+ */
+static coroutine_fn bool checksum_co_next(ImgChecksumWorker *w)
+{
+ImgChecksumState *s = w->state;
+
+qemu_co_mutex_lock(&s->lock);
+
+if (s->offset == s->total_size || s->ret != -EINPROGRESS) {
+qemu_co_mutex_unlock(&s->lock);
+return false;
+}
+
+if (s->length == 0 && checksum_block_status(s)) {
+qemu_co_mutex_unlock(&s->lock);
+return false;
+}
+
+/* Grab one chunk from current extent. */
+w->offset = s->offset;
+w->length = MIN(s->length,
+ 

[PATCH 1/3] qemu-img: Add checksum command

2022-09-01 Thread Nir Soffer
The checksum command compute a checksum for disk image content using the
blkhash library[1]. The blkhash library is not packaged yet, but it is
available via copr[2].

Example run:

$ ./qemu-img checksum -p fedora-35.qcow2
6e5c00c995056319d52395f8d91c7f84725ae3da69ffcba4de4c7d22cff713a5  
fedora-35.qcow2

The block checksum is constructed by splitting the image to fixed sized
blocks and computing a digest of every block. The image checksum is the
digest of the all block digests.

The checksum uses internally the "sha256" algorithm but it cannot be
compared with checksums created by other tools such as `sha256sum`.

The blkhash library supports sparse images, zero detection, and
optimizes zero block hashing (they are practically free). The library
uses multiple threads to speed up the computation.

Comparing to `sha256sum`, `qemu-img checksum` is 3.5-4800[3] times
faster, depending on the amount of data in the image:

$ ./qemu-img info /scratch/50p.raw
file format: raw
virtual size: 6 GiB (6442450944 bytes)
disk size: 2.91 GiB

$ hyperfine -w2 -r5 -p "sleep 1" "./qemu-img checksum /scratch/50p.raw" \
 "sha256sum /scratch/50p.raw"
Benchmark 1: ./qemu-img checksum /scratch/50p.raw
  Time (mean ± σ):  1.849 s ±  0.037 s[User: 7.764 s, System: 0.962 
s]
  Range (min … max):1.813 s …  1.908 s5 runs

Benchmark 2: sha256sum /scratch/50p.raw
  Time (mean ± σ): 14.585 s ±  0.072 s[User: 13.537 s, System: 
1.003 s]
  Range (min … max):   14.501 s … 14.697 s5 runs

Summary
  './qemu-img checksum /scratch/50p.raw' ran
7.89 ± 0.16 times faster than 'sha256sum /scratch/50p.raw'

The new command is available only when `blkhash` is available during
build. To test the new command please install the `blkhash-devel`
package:

$ dnf copr enable nsoffer/blkhash
$ sudo dnf install blkhash-devel

[1] https://gitlab.com/nirs/blkhash
[2] https://copr.fedorainfracloud.org/coprs/nsoffer/blkhash/
[3] Computing checksum for 8T empty image: qemu-img checksum: 3.7s,
sha256sum (estimate): 17,749s

Signed-off-by: Nir Soffer 
---
 docs/tools/qemu-img.rst |  22 +
 meson.build |  10 ++-
 meson_options.txt   |   2 +
 qemu-img-cmds.hx|   8 ++
 qemu-img.c  | 191 
 5 files changed, 232 insertions(+), 1 deletion(-)

diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index 85a6e05b35..8be9c45cbf 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -347,20 +347,42 @@ Command description:
 Check completed, image is corrupted
   3
 Check completed, image has leaked clusters, but is not corrupted
   63
 Checks are not supported by the image format
 
   If ``-r`` is specified, exit codes representing the image state refer to the
   state after (the attempt at) repairing it. That is, a successful ``-r all``
   will yield the exit code 0, independently of the image state before.
 
+.. option:: checksum [--object OBJECTDEF] [--image-opts] [-f FMT] [-T 
SRC_CACHE] [-p] FILENAME
+
+  Print a checksum for image *FILENAME* guest visible content. Images with
+  different format or settings wil have the same checksum.
+
+  The format is probed unless you specify it by ``-f``.
+
+  The checksum is computed for guest visible content. Allocated areas full of
+  zeroes, zero clusters, and unallocated areas are read as zeros so they will
+  have the same checksum. Images with single or multiple files or backing files
+  will have the same checksums if the guest will see the same content when
+  reading the image.
+
+  Image metadata that is not visible to the guest such as dirty bitmaps does
+  not affect the checksum.
+
+  Computing a checksum requires a read-only image. You cannot compute a
+  checksum of an active image used by a guest, but you can compute a checksum
+  of a guest during pull mode incremental backup using NBD URL.
+
+  The checksum is not compatible with other tools such as *sha256sum*.
+
 .. option:: commit [--object OBJECTDEF] [--image-opts] [-q] [-f FMT] [-t 
CACHE] [-b BASE] [-r RATE_LIMIT] [-d] [-p] FILENAME
 
   Commit the changes recorded in *FILENAME* in its base image or backing file.
   If the backing file is smaller than the snapshot, then the backing file will 
be
   resized to be the same size as the snapshot.  If the snapshot is smaller than
   the backing file, the backing file will not be truncated.  If you want the
   backing file to match the size of the smaller snapshot, you can safely 
truncate
   it yourself once the commit operation successfully completes.
 
   The image *FILENAME* is emptied after the operation has succeeded. If you do
diff --git a/meson.build b/meson.build
index 20fddbd707..56b648d8a7 100644
--- a/meson.build
+++ b/meson.build
@@ -727,20 +727,24 @@ if not get_option('curl').auto() or have_block
  

[PATCH 0/3] Add qemu-img checksum command using blkhash

2022-09-01 Thread Nir Soffer
Since blkhash is available only via copr now, the new command is added as
optional feature, built only if blkhash-devel package is installed.

Nir Soffer (3):
  qemu-img: Add checksum command
  iotests: Test qemu-img checksum
  qemu-img: Speed up checksum

 docs/tools/qemu-img.rst   |  22 +
 meson.build   |  10 +-
 meson_options.txt |   2 +
 qemu-img-cmds.hx  |   8 +
 qemu-img.c| 388 ++
 tests/qemu-iotests/tests/qemu-img-checksum| 149 +++
 .../qemu-iotests/tests/qemu-img-checksum.out  |  74 
 7 files changed, 652 insertions(+), 1 deletion(-)
 create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
 create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out

-- 
2.37.2




[PATCH 2/3] iotests: Test qemu-img checksum

2022-09-01 Thread Nir Soffer
Add simple tests creating an image with all kinds of extents, different
formats, different backing chain, different protocol, and different
image options. Since all images have the same guest visible content they
must have the same checksum.

To help debugging in case of failures, the output includes a json map of
every test image.

Signed-off-by: Nir Soffer 
---
 tests/qemu-iotests/tests/qemu-img-checksum| 149 ++
 .../qemu-iotests/tests/qemu-img-checksum.out  |  74 +
 2 files changed, 223 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
 create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out

diff --git a/tests/qemu-iotests/tests/qemu-img-checksum 
b/tests/qemu-iotests/tests/qemu-img-checksum
new file mode 100755
index 00..3a85ba33f2
--- /dev/null
+++ b/tests/qemu-iotests/tests/qemu-img-checksum
@@ -0,0 +1,149 @@
+#!/usr/bin/env python3
+# group: rw auto quick
+#
+# Test cases for qemu-img checksum.
+#
+# Copyright (C) 2022 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+import re
+
+import iotests
+
+from iotests import (
+filter_testfiles,
+qemu_img,
+qemu_img_log,
+qemu_io,
+qemu_nbd_popen,
+)
+
+
+def checksum_available():
+out = qemu_img("--help").stdout
+return re.search(r"\bchecksum .+ filename\b", out) is not None
+
+
+if not checksum_available():
+iotests.notrun("checksum command not available")
+
+iotests.script_initialize(
+supported_fmts=["raw", "qcow2"],
+supported_cache_modes=["none", "writeback"],
+supported_protocols=["file", "nbd"],
+required_fmts=["raw", "qcow2"],
+)
+
+print()
+print("=== Test images ===")
+print()
+
+disk_raw = iotests.file_path('raw')
+qemu_img("create", "-f", "raw", disk_raw, "10m")
+qemu_io("-f", "raw",
+"-c", "write -P 0x1 0 2m",  # data
+"-c", "write -P 0x0 2m 2m", # data with zeroes
+"-c", "write -z 4m 2m", # zero allocated
+"-c", "write -z -u 6m 2m",  # zero hole
+# unallocated
+disk_raw)
+print(filter_testfiles(disk_raw))
+qemu_img_log("map", "--output", "json", disk_raw)
+
+disk_qcow2 = iotests.file_path('qcow2')
+qemu_img("create", "-f", "qcow2", disk_qcow2, "10m")
+qemu_io("-f", "qcow2",
+"-c", "write -P 0x1 0 2m",  # data
+"-c", "write -P 0x0 2m 2m", # data with zeroes
+"-c", "write -z 4m 2m", # zero allocated
+"-c", "write -z -u 6m 2m",  # zero hole
+# unallocated
+disk_qcow2)
+print(filter_testfiles(disk_qcow2))
+qemu_img_log("map", "--output", "json", disk_qcow2)
+
+disk_compressed = iotests.file_path('compressed')
+qemu_img("convert", "-f", "qcow2", "-O", "qcow2", "-c",
+ disk_qcow2, disk_compressed)
+print(filter_testfiles(disk_compressed))
+qemu_img_log("map", "--output", "json", disk_compressed)
+
+disk_base = iotests.file_path('base')
+qemu_img("create", "-f", "raw", disk_base, "10m")
+qemu_io("-f", "raw",
+"-c", "write -P 0x1 0 2m",
+"-c", "write -P 0x0 2m 2m",
+disk_base)
+print(filter_testfiles(disk_base))
+qemu_img_log("map", "--output", "json", disk_base)
+
+disk_top = iotests.file_path('top')
+qemu_img("create", "-f", "qcow2", "-b", disk_base, "-F", "raw",
+ disk_top)
+qemu_io("-f", "qcow2",
+"-c", "write -z 4m 2m",
+"-c", "write -z -u 6m 2m",
+disk_top)
+print(filter_testfiles(disk_top))
+qemu_img_log("map", "--output", "json"


[ovirt-users] Re: Veeam Backup for RHV (oVirt)

2022-07-27 Thread Nir Soffer
On Wed, Jul 27, 2022 at 2:56 PM  wrote:
>
> Hi!
> Not really sure if this is right place to ask, but..
>
> I am trying to use Veeam Backup for Red Hat Virtualization on oVirt 4.5.1.
> I have been using it on version 4.4.10.7 and it works ok there.
>
> On Veeam Release page it says that supported OS is RHV 4.4SP1 (oVirt 4.5).
>
> When i try to do backup, this is what i get from Veeam backup.
> No errors in vdsm.log and engine.log..
>
> 2022-07-27 08:08:44.153 00039 [19545] INFO | [LoggingEventsManager_39]: 
> Add to storage LoggingEvent [id: 34248685-a193-4df5-8ff2-838f738e211c, Type: 
> BackupStartedNew]
> 2022-07-27 08:08:44.168 00039 [19545] INFO | [TaskManager_39]: Create and 
> run new async task. [call method:'RunBackupChain']
> 2022-07-27 08:08:44.168 00039 [19545] INFO | [AsyncTask_39]: New AsynTask 
> created [id:'aafa22ac-ff2e-4647-becb-dca88e3eb67f', description:'', 
> type:'BACKUP_VM_POLICY']
> 2022-07-27 08:08:44.168 00039 [19545] INFO | [AsyncTask_39]: Prepare 
> AsynTask to run [id:'aafa22ac-ff2e-4647-becb-dca88e3eb67f']
> 2022-07-27 08:08:44.176 00031 [19545] INFO | [BackupPolicy_31]: Refresh 
> VMs for policy '6e090c98-d44b-4785-acb4-82a627da5d9b'
> 2022-07-27 08:08:44.176 00031 [19545] INFO | [BackupPolicy_31]: Begin 
> updating list of active VMs for policy '6e090c98-d44b-4785-acb4-82a627da5d9b' 
> [forceRefresh = True]
> 2022-07-27 08:08:44.176 00031 [19545] INFO | [RhevCluster_31]: Test 
> connection to cluster [IP: engine.example.org, Port: 443, User: 
> admin@ovirt@internalsso]
> 2022-07-27 08:08:44.189 00039 [19545] INFO | [TaskManager_39]: AsyncTask 
> registered. [id:'aafa22ac-ff2e-4647-becb-dca88e3eb67f']
> 2022-07-27 08:08:44.371 00031 [19545] INFO | [RhevCluster_31]: Test 
> connection to cluster success. Status: Success. Message:
> 2022-07-27 08:08:44.556 00031 [19545] INFO | [BackupPolicyManager_31]: 
> Refreshing the policies data...
> 2022-07-27 08:08:44.556 00031 [19545] INFO | [BackupPolicy_31]: Begin 
> updating list of active VMs for policy '6e090c98-d44b-4785-acb4-82a627da5d9b' 
> [forceRefresh = False]
> 2022-07-27 08:08:44.556 00031 [19545] INFO | [BackupPolicy_31]: List of 
> active VMs updated for policy '6e090c98-d44b-4785-acb4-82a627da5d9b' 
> [forceRefresh = False]. Number of active VMs '1'
> 2022-07-27 08:08:44.556 00031 [19545] INFO | [BackupPolicyManager_31]: 
> Policies data has been refreshed.
> 2022-07-27 08:08:44.556 00031 [19545] INFO | [BackupPolicy_31]: List of 
> active VMs updated for policy '6e090c98-d44b-4785-acb4-82a627da5d9b' 
> [forceRefresh = True]. Number of active VMs '1'
> 2022-07-27 08:08:44.556 00031 [19545] INFO | [BackupPolicy_31]: Found the 
> '1' VMs to backup in policy '6e090c98-d44b-4785-acb4-82a627da5d9b'
> 2022-07-27 08:08:44.564 00031 [19545] INFO | [BackupPolicy_31]: * 
> Parallel policy runner has started * for policy [Name:'test5', ID: 
> '6e090c98-d44b-4785-acb4-82a627da5d9b'
> 2022-07-27 08:08:44.564 00031 [19545] INFO | [VeeamBackupServer_31]: Test 
> connection to backup server [IP: 'veeambr.example.org', Port: '10006', User: 
> 'rhvproxy']
> 2022-07-27 08:08:44.931 00031 [19545] INFO | [VeeamBackupServer_31]: Test 
> connection to backup server [IP: 'veeambr.example.org', Port: '10006']. 
> Connection status: ConnectionSuccess. Version: 11.0.1.1261
> 2022-07-27 08:08:45.423 00031 [19545] INFO | [BackupPolicy_31]: 
> Successfully called CreateVeeamPolicySession for job [UID: 
> '6e090c98-d44b-4785-acb4-82a627da5d9b'], session [UID: 
> 'aafa22ac-ff2e-4647-becb-dca88e3eb67f']
> 2022-07-27 08:08:45.820 00031 [19545] INFO | [BackupPolicy_31]: 
> Successfully called RetainPolicyVms for job [UID: 
> '6e090c98-d44b-4785-acb4-82a627da5d9b'] with VMs: 
> 50513a65-6ccc-479b-9b61-032e0961b016
> 2022-07-27 08:08:45.820 00031 [19545] INFO | [BackupPolicy_31]: Start 
> calculating maxPointsCount
> 2022-07-27 08:08:45.820 00031 [19545] INFO | [BackupPolicy_31]: End 
> calculating maxPointsCount. Result = 7
> 2022-07-27 08:08:45.820 00031 [19545] INFO | [BackupPolicy_31]: Starting 
> validate repository schedule. Repository [UID: 
> '237e41d6-7c67-4a1f-80bf-d7c73c481209', MaxPointsCount: '7', 
> IsPeriodicFullRequired: 'False']
> 2022-07-27 08:08:46.595 00031 [19545] INFO | [BackupPolicy_31]: End 
> validate repository schedule. Result: [IsScheduleValid: 'True', ErrorMessage: 
> '']
> 2022-07-27 08:08:46.597 00031 [19545] INFO | [SessionManager_31]: Start 
> registering a new session[Id: 'b6f3f0e1-7aab-41cb-b0e7-10f5b2ed6708']
> 2022-07-27 08:08:46.639 00031 [19545] INFO | [SessionManager_31]: Session 
> registered. [Id:'b6f3f0e1-7aab-41cb-b0e7-10f5b2ed6708']
> 2022-07-27 08:08:46.639 00031 [19545] INFO | [BackupPolicy_31]: Backup VM 
> [id:'50513a65-6ccc-479b-9b61-032e0961b016'] starting...
> 2022-07-27 08:08:46.639 00031 [19545] INFO | [BackupPolicy_31]: 
> RetentionMergeDisabled: false
> 2022-07-27 08:08:46.640 00031 [19545] 

[ovirt-devel] Re: github testing: merge with branch, or use PR HEAD?

2022-07-07 Thread Nir Soffer
On Wed, Jun 15, 2022 at 12:26 PM Yedidyah Bar David  wrote:
>
> Hi all,
>
> I was annoyed for some time now by the fact that when I used some
> github-CI-generated RPMs, with a git hash in their names, I could
> never find this git hash anywhere - not in my local git repo, nor in
> github. Why is it so? Because, if I got it right, the default for
> 'actions/checkout@v2' is to merge the PR HEAD with the branch HEAD.
> See e.g. [1]:
>
> HEAD is now at 7bbb40c9a Merge
> 026bb9c672bf46786dd6d16f4cbe0ecfa84c531d into
> 35e217936b5571e9657946b47333a563373047bb
>
> Meaning: my patch was 026bb9c, master was 35e2179, and the generated
> RPMs will have 7bbb40c9a, not to be found anywhere else. If you check
> the main PR page [3], you can find there '026bb9c', but not
> '7bbb40c9a'.
>
> (Even 026bb9c might require some effort, e.g. "didib force-pushed the
> add-hook-log-console branch 2 times, most recently from c90e658 to
> 66ebc88 yesterday". I guess this is the result of github discouraging
> force-pushes, in direct opposite of gerrit, which had a notion of
> different patchsets for a single change. I already ranted about this
> in the past, but that's not the subject of the current message).
>
> This is not just an annoyance, it's a real difference in semantics. In
> gerrit/jenkins days, IIRC most/all projects I worked on, ran CI
> testing/building on the pushed HEAD, and didn't touch it. Rebase, if
> at all, happened either explicitly, or at merge time.

I don't think that the action *rebases* the pr; it uses a merge commit,
but this adds newer commits from master on top of the pr, which may
conflict with or change the semantics of the pr.
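
A quick way to see the effect locally is to check whether each hash from the
quoted example resolves to an object in the clone (a sketch; the short hash is
the CI-generated merge commit):

    # The pr head - exists in the clone
    git cat-file -e 026bb9c672bf46786dd6d16f4cbe0ecfa84c531d && echo found

    # The CI merge commit - unknown to the clone
    git cat-file -e 7bbb40c9a || echo missing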

> actions/checkout's default, to auto-merge, is probably meant to be
> more "careful" - to test what would happen if the code is merged. I
> agree this makes sense. But I personally think it's almost always ok
> to test on the pushed HEAD and not rebase/merge _implicitely_.
>
> What do you think?

I agree, this is unexpected and unwanted behavior in particular for
projects that disable merge commits (e.g. vdsm).

> It should be easy to change, using [2]:
>
> - uses: actions/checkout@v2
>   with:
>     ref: ${{ github.event.pull_request.head.sha }}

+1

Nir
___
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/devel@ovirt.org/message/WZ3W6BII34CTQXXLBYJB6W6ECCWEGM4J/


[ovirt-users] Re: Cannot Enable incremental backup

2022-07-06 Thread Nir Soffer
On Thu, Jul 7, 2022 at 12:40 AM Jonas  wrote:

> Hello all
>
> I'm trying to create incremental backups for my VMs on a testing cluster
> and am using the functions from
> https://gitlab.com/nirs/ovirt-stress/-/blob/master/backup/backup.py.
>
Note that the VM configuration is not backed up, so restoring
requires creating a new VM with the restored disks.

> So far it works well, but on some disks it is not possible to enable
> incremental backups even when the VM is powered off (see screenshot below).
> Does anyone know why this might be the case and how to activate it? I think
> I already checked the docs and didn't find anything but feel free to nudge
> me in the right direction.
>
It looks like you are trying to enable incremental backup *after* the disk was
created, which is not possible if the disk is using raw format. I guess the
disk is on file based storage (NFS, Gluster), and a thin volume on file based
storage is using raw format by default.

The way to fix this is to convert the disk to qcow2 format - this feature
is available since ovirt 4.5.0, but it works only via the API/SDK.

Here is example code for converting the disk format:
https://github.com/oVirt/python-ovirt-engine-sdk4/blob/7c17fe326fc1b67ba581d3c244ec019508f3ac25/examples/convert_disk.py

The example is not finished and is still in review, but it should show
how to use the API.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/W5YKJUXOHDMWUDZ4EHOPIRCHZS6MIB5G/


[ovirt-users] Re: Import KVM VMs on individual iSCSI luns

2022-07-02 Thread Nir Soffer
On Sat, Jul 2, 2022 at 9:40 AM  wrote:
>
> Greetings,
>
> Is it possible with oVirt to  import existing VMs where the underlying 
> storage is on raw iSCSI luns and to keep them on those luns?
>
> The historical scenario is that we have Virtual farms in multiple sites 
> managed by an ancient Orchestration tool that does not support modern OS's as 
> the hypervisor.
> - In each site, there are clusters of hypervisors/Hosts  that have visibility 
>  to the same iSCSI luns.
> - Each VM has it's own set of iscsi luns that are totally dedicated to that VM
> - Each VM is using LVM to manage the disk
> - Each Host has LVM filtering configured to NOT manage the VM's iscsi luns
> - The VMs can be live migrated from any Hypervisor within the cluster to any 
> other  Hypervisor in that same cluster
>
> We are attempting to bring this existing environment into oVirt without  
> replacing the storage model.
> Is there any documentation that will serve as a guide for this scenario?
>
> In a lab environment, we have successfully
> - Added 2 hypervisors (hosts) and oVirt can see their  VMs as 
> external-ovtest1 and external-ovtest2
> - Removed the LVM filtering on the hosts

This should not be needed. The lvm filter ensures that the host can manage
only the disks used by the host (for example the boot disk). Other disks
(e.g. your LUNs) are not managed by the host, but they are managed by oVirt.

> - Created a storage domain that is able to see the iscsi luns, but we have 
> not yet  done the 'add' of each lun

Don't create a storage domain, since you want to use the LUNs directly.
Adding the LUNs to the storage domain can destroy the data on the LUN.

> Is it possible to import these luns as raw block devices without LVM being 
> layered on top of them?

Yes, this is called Direct LUN in oVirt.

> Is it required  to actually import the luns into a storage domain, or can the 
> VM's still be imported if all luns are visible on all hosts in the cluster?

There is no way to import the VM as is, but you can recreate the VM
with the same LUNs.

> In the grand scheme of things, are we trying to do something that is not 
> possible with  oVirt?
> If it is possible, we would greatly appreciate tips, pointers, links to docs 
> etc that will help us migrate this environment to oVirt.

You can do this:

1. Connect to the storage server with the relevant LUNs. The LUNs used by
   the VM should be visible in engine UI (New storage domain dialog).
2. Create a new VM using the same configuration you had in the original VM
3. Attach the right LUNs to the VM (using Direct LUN)
4. In the VM, make sure you use the right disks - the best way is to use:

/dev/disk/by-id/{virtio}-{serial}

Where {serial} is the disk UUID seen in engine UI for the direct LUN.
{virtio} is correct if you connect the disks using virtio; if you use
virtio-scsi the string will be different.
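
As an illustration, inside the guest you can confirm the mapping with the
stable links and serials (the serial should match the disk UUID from the
engine UI; virtio is assumed here):

    # Stable device links for virtio disks
    ls -l /dev/disk/by-id/virtio-*

    # Serial per block device
    lsblk -o NAME,SERIAL,SIZE,TYPE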

You may also need to install extra components on the boot disk, like
qemu-guest-agent or virtio drivers.

Note that oVirt does not use the LUNs directly, but the multipath device on
top of the SCSI device. This should be transparent, but be prepared to see
/dev/mapper/{wwid} instead of /dev/sdXXX.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/NW5HEA7YWS5OPPYOZLUQHQQEB4MKZC7U/


[ovirt-users] Re: Failed to Start services ovirt-imageio

2022-06-26 Thread Nir Soffer
On Fri, Jun 24, 2022 at 1:33 AM АБИОЛА А. Э  wrote:
>
> Hello Nir,

Hi АБИОЛА, you forgot to reply to users@ovirt.org. Please always reply
to the mailing list so the discussion is public.

>   I am so grateful for your reply and helpful information. I have 
> successfully deployed the Ovirt engine and it is up and running fine, but now 
> I am having another issue uploading .ISO files to disk. I tried to upload 
> .iso files but the status was "Paused by the system" and was waiting 15 hours 
> for the status to change but still no changes, i tried to delete the .ISO 
> file but i can't remove at all, its status was "Finalizing cleaning up" i 
> waited 20 hours for the status to change but with no success. Kindly guard me 
> through the process to fix this errors, so I can upload .ISO files to the 
> Disk to launch the VM successfully. Please see below picture for the error.

When the system pauses an upload, you can cancel the upload from the
upload menu:

Storage > Disks > Upload > Cancel

This will delete the new disk and end the image transfer.

To understand why the system paused the transfer, please share more info:

- Which oVirt version are you running?
- Did you add the engine CA certificate to the browser?

You can check if the browser is configured correctly by opening an
upload dialog:

Storage > Disks > Upload > Start

and clicking the "Test Connection" button. If this fails, you need to
add the engine
CA to the browser. When "Test Connection" is successful, try to upload again.
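If you still need the engine CA certificate, it can usually be downloaded
from the engine itself and then imported into the browser (replace
ENGINE_FQDN with your engine address; -k is needed only because the CA is
not trusted yet):

   curl -k -o ca.pem 'https://ENGINE_FQDN/ovirt-engine/services/pki-resource?resource=ca-certificate&format=X509-PEM-CA'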

If the upload fails again, please share logs showing the timeframe of
this error:
- engine log from /var/log/ovirt-engine/engine.log
- vdsm log from the host performing the upload (see the Events tab in the
engine UI) from /var/log/vdsm/vdsm.log
- ovirt-imageio log from the host performing the upload from
/var/log/ovirt-imageio/daemon.log
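
If the logs are large, sharing just the time window around the failed
transfer is enough; for example, something like this (the date pattern is
only an illustration, adjust it to your attempt):

   grep '2022-06-24 01:' /var/log/ovirt-engine/engine.log > engine-upload.log
   tail -n 2000 /var/log/vdsm/vdsm.log > vdsm-upload.log
   tail -n 2000 /var/log/ovirt-imageio/daemon.log > imageio-upload.log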

Nir

>
> I will be glad to read from you soon.
>
> Appreciated
>
>
> On Tue, Jun 21, 2022 at 3:19 PM Nir Soffer  wrote:
>>
>> On Tue, Jun 21, 2022 at 8:18 AM АБИОЛА А. Э  wrote:
>> >
>> > Hello Sir,
>> > I am new to Ovirt and I tried to deploy it 3weeks into my oracle linux 
>> > with no success.
>> > I got the following error messages
>> > Please how can i fix this error to successfully deploy it.
>> > I will be glad to read from you soon.
>> > Appreciated
>> > AAE.
>>
>> Which oVirt version is this?
>>
>> You can try to check:
>>
>> systemctl status ovirt-imageio
>>
>> It usually show the latest logs which may help to understand why the
>> service could not start.
>>
>> What do you have in /var/log/ovirt-imageio/daemon.log?
>>
>> Nir
>>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/YPGFFMB7H4UONQMLU3QIB4CR2XPVG7YS/


[ovirt-users] Re: Failed to Start services ovirt-imageio

2022-06-21 Thread Nir Soffer
On Tue, Jun 21, 2022 at 8:18 AM АБИОЛА А. Э  wrote:
>
> Hello Sir,
> I am new to Ovirt and I tried to deploy it 3weeks into my oracle linux with 
> no success.
> I got the following error messages
> Please how can i fix this error to successfully deploy it.
> I will be glad to read from you soon.
> Appreciated
> AAE.

Which oVirt version is this?

You can try to check:

systemctl status ovirt-imageio

It usually shows the latest log lines, which may help to understand why the
service could not start.

What do you have in /var/log/ovirt-imageio/daemon.log?
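
If that file is empty or missing, the systemd journal may still show the
startup error, for example:

   journalctl -u ovirt-imageio --no-pager -n 100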

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/IAVOIVROVX45Z27JA66RG25MFPVNHI6H/


[ovirt-devel] Re: Projects missing safe dir config in COPR

2022-06-13 Thread Nir Soffer
On Mon, Jun 13, 2022 at 1:39 PM Michal Skrivanek  wrote:
>
> Hi,
> I scanned the current projects and AFAICT these are the active projects that 
> don't have builds configured properly. Please add "git config --global --add 
> safe.directory ..." to the COPR makefile
> Otherwise COPR build may not work at all or (worse) they build something 
> wrong.

For an example of how to fix this issue, see:
https://github.com/oVirt/vdsm/commit/64873df82480006fce743baa89b83b94c41e53e7
https://github.com/oVirt/ovirt-imageio/commit/a72c680782584db17597dd3116e0a1dca7290249
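
The actual fix is a one-liner that has to run before any git command in the
COPR build; a sketch of the line to add to the srpm recipe in the project's
.copr/Makefile (see the commits above for how each project did it):

   # the checkout may be owned by a different user than the one running git,
   # so git refuses to operate on it unless it is marked as safe
   git config --global --add safe.directory "$PWD"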

>
> Thanks,
> michal
>
>
> imgbased
> ioprocess

Fixed in
https://github.com/oVirt/ioprocess/pull/2

> mom
> ovirt-ansible-collection
> ovirt-cockpit-sso
> ovirt-engine-api-metamodel
> ovirt-engine-api-model
> ovirt-engine-wildfly
> ovirt-hosted-engine-ha
> ovirt-hosted-engine-setup
> ovirt-lldp-labeler
> ovirt-log-collector
> ovirt-node-ng
> ovirt-openvswitch
> ovirt-setup-lib
> ovirt-vmconsole
> python-ovirt-engine-sdk4

Nir
___
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/devel@ovirt.org/message/BMUD4NBMGB4JDVAJKV5BC7TLWJ2IDAE6/


Re: [Libguestfs] libnbd golang failure on RISC-V

2022-06-09 Thread Nir Soffer
On Thu, Jun 9, 2022 at 6:24 PM Richard W.M. Jones  wrote:
>
> On Thu, Jun 09, 2022 at 03:20:02PM +0100, Daniel P. Berrangé wrote:
> > > + go test -count=1 -v
> > > === RUN   Test010Load
> > > --- PASS: Test010Load (0.00s)
> > > === RUN   TestAioBuffer
> > > --- PASS: TestAioBuffer (0.00s)
> > > === RUN   TestAioBufferFree
> > > --- PASS: TestAioBufferFree (0.00s)
> > > === RUN   TestAioBufferBytesAfterFree
> > > SIGABRT: abort
> > > PC=0x3fdf6f9bac m=0 sigcode=18446744073709551610
> >
> > So suggesting TestAioBufferBytesAfterFree is as fault, but quite
> > odd as that test case is trivial and whle it allocates some
> > native memory it doesn't seem to write to it. Unless the problem
> > happened in an earlier test case and we have delayed detection ?
> >
> > I guess I'd try throwing darts at the wall by chopping out bits
> > of test code to see what makes it disappear.
> >
> > Perhaps also try swapping MakeAioBuffer with MakeAioBufferZero
> > in case pre-existing data into the C.malloc()d block is confusing
> > Go ?
>
> Interestingly if I remove libnbd_020_aio_buffer_test.go completely,
> and disable GODEBUG, then the tests pass.  (Reproducer commands at end
> of email).  So I guess at least one of the problems is confined to
> this test and/or functions it calls in the main library.
> Unfortunately this test is huge.
>
> At your suggestion, replacing every MakeAioBuffer with
> MakeAioBufferZero in that test, but it didn't help.  Also tried
> replacing malloc -> calloc in the aio_buffer.go implementation which
> didn't help.
>
> I'll try some more random things ...
>
> Rich.
>
>
> $ emacs -nw run.in# comment out GODEBUG line
> $ emacs -nw golang/Makefile.am   # remove libnbd_020_aio_buffer_test.go line
> $ mv golang/libnbd_020_aio_buffer_test.go 
> golang/libnbd_020_aio_buffer_test.nogo
> $ make run
> $ make
> $ make -C golang check
> ...
> PASS: run-tests.sh

So when skipping libnbd_020_aio_buffer_test.go we don't get the warning about
Go pointer in C memory?

If true, can you find the test triggering this issue?

You can run only some tests using -run={regex}, for example:

$ go test -v -run=TestAioBuffer.+AfterFree
=== RUN   TestAioBufferBytesAfterFree
--- PASS: TestAioBufferBytesAfterFree (0.00s)
=== RUN   TestAioBufferSliceAfterFree
--- PASS: TestAioBufferSliceAfterFree (0.00s)
=== RUN   TestAioBufferGetAfterFree
--- PASS: TestAioBufferGetAfterFree (0.00s)
PASS

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs


[Libguestfs] [PATCH libnbd] golang: aio_buffer.go: Explicit panic() on invalid usage

2022-06-09 Thread Nir Soffer
Previously we depended on the behavior on common platforms to panic when
trying to use a nil pointer, but Richard reported that it segfaults on
RISC-V. Avoid the undocumented assumptions and panic explicitly with a
useful panic message.

Signed-off-by: Nir Soffer 
---
 golang/aio_buffer.go | 9 +
 1 file changed, 9 insertions(+)

diff --git a/golang/aio_buffer.go b/golang/aio_buffer.go
index 52ea54de..325dbc98 100644
--- a/golang/aio_buffer.go
+++ b/golang/aio_buffer.go
@@ -65,28 +65,37 @@ func (b *AioBuffer) Free() {
if b.P != nil {
C.free(b.P)
b.P = nil
}
 }
 
 // Bytes copies the underlying C array to Go allocated memory and return a
 // slice. Modifying the returned slice does not modify the underlying buffer
 // backing array.
 func (b *AioBuffer) Bytes() []byte {
+   if b.P == nil {
+   panic("Using AioBuffer after Free()")
+   }
return C.GoBytes(b.P, C.int(b.Size))
 }
 
 // Slice creates a slice backed by the underlying C array. The slice can be
 // used to access or modify the contents of the underlying array. The slice
 // must not be used after caling Free().
 func (b *AioBuffer) Slice() []byte {
+   if b.P == nil {
+   panic("Using AioBuffer after Free()")
+   }
// See 
https://github.com/golang/go/wiki/cgo#turning-c-arrays-into-go-slices
// TODO: Use unsafe.Slice() when we require Go 1.17.
return (*[1<<30]byte)(b.P)[:b.Size:b.Size]
 }
 
 // Get returns a pointer to a byte in the underlying C array. The pointer can
 // be used to modify the underlying array. The pointer must not be used after
 // calling Free().
 func (b *AioBuffer) Get(i uint) *byte {
+   if b.P == nil {
+   panic("Using AioBuffer after Free()")
+   }
return (*byte)(unsafe.Pointer(uintptr(b.P) + uintptr(i)))
 }
-- 
2.36.1

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



Re: [Libguestfs] libnbd golang failure on RISC-V

2022-06-09 Thread Nir Soffer
On Thu, Jun 9, 2022 at 6:48 PM Richard W.M. Jones  wrote:
>
> On Thu, Jun 09, 2022 at 04:24:12PM +0100, Richard W.M. Jones wrote:
> > On Thu, Jun 09, 2022 at 03:20:02PM +0100, Daniel P. Berrangé wrote:
> > > > + go test -count=1 -v
> > > > === RUN   Test010Load
> > > > --- PASS: Test010Load (0.00s)
> > > > === RUN   TestAioBuffer
> > > > --- PASS: TestAioBuffer (0.00s)
> > > > === RUN   TestAioBufferFree
> > > > --- PASS: TestAioBufferFree (0.00s)
> > > > === RUN   TestAioBufferBytesAfterFree
> > > > SIGABRT: abort
> > > > PC=0x3fdf6f9bac m=0 sigcode=18446744073709551610
> > >
> > > So suggesting TestAioBufferBytesAfterFree is as fault, but quite
> > > odd as that test case is trivial and whle it allocates some
> > > native memory it doesn't seem to write to it. Unless the problem
> > > happened in an earlier test case and we have delayed detection ?
> > >
> > > I guess I'd try throwing darts at the wall by chopping out bits
> > > of test code to see what makes it disappear.
> > >
> > > Perhaps also try swapping MakeAioBuffer with MakeAioBufferZero
> > > in case pre-existing data into the C.malloc()d block is confusing
> > > Go ?
> >
> > Interestingly if I remove libnbd_020_aio_buffer_test.go completely,
> > and disable GODEBUG, then the tests pass.  (Reproducer commands at end
> > of email).  So I guess at least one of the problems is confined to
> > this test and/or functions it calls in the main library.
> > Unfortunately this test is huge.
> >
> > At your suggestion, replacing every MakeAioBuffer with
> > MakeAioBufferZero in that test, but it didn't help.  Also tried
> > replacing malloc -> calloc in the aio_buffer.go implementation which
> > didn't help.
> >
> > I'll try some more random things ...
>
> Adding a few printf's shows something interesting:
>
> === RUN   TestAioBufferBytesAfterFree
> calling Free on 0x3fbc1882b0
> calling C.GoBytes on 0x3fbc1882b0
> SIGABRT: abort
> PC=0x3fe6aaebac m=0 sigcode=18446744073709551610
>
> goroutine 21 [running]:
> gsignal
> :0
> abort
> :0
> runtime.throwException
> ../../../libgo/runtime/go-unwind.c:128
> runtime.unwindStack
> ../../../libgo/go/runtime/panic.go:535
> panic
> ../../../libgo/go/runtime/panic.go:750
> runtime.panicmem
> ../../../libgo/go/runtime/panic.go:210
> runtime.sigpanic
> ../../../libgo/go/runtime/signal_unix.go:634
> _wordcopy_fwd_aligned
> :0
> __GI_memmove
> :0
> runtime.stringtoslicebyte
> ../../../libgo/go/runtime/string.go:155
> __go_string_to_byte_array
> ../../../libgo/go/runtime/string.go:509
> _cgo_23192bdcbd72_Cfunc_GoBytes
> ./cgo-c-prolog-gccgo:46
>
> This is a simple use after free because the Free function in
> aio_buffer.go frees the array and then the Bytes function attempts to
> copy b.Size bytes from the NULL pointer.
>
> I didn't write this test so I'm not quite sure what it's trying to
> achieve.

The test verifies that using the buffer in the wrong way fails in a clean
way (panic) and not with a silent double free as it was before
https://gitlab.com/nbdkit/libnbd/-/commit/3394f47556cac009fa7d39c9e2f7e5f2468bd65d

> It seems to be deliberately trying to cause a panic, but
> causes a segfault instead?  (And why only on RISC-V?)
>
>   func TestAioBufferBytesAfterFree(t *testing.T) {
> buf := MakeAioBuffer(uint(32))
> buf.Free()
>
> defer func() {
> if r := recover(); r == nil {
> t.Fatal("Did not recover from panic calling Bytes() 
> after Free()")
> }
> }()
>
> buf.Bytes()
>   }
>
> Since this only happens on RISC-V I guess it might be something to do
> with the golang implementation on this architecture being unable to
> turn segfaults into panics.
>
> Removing all three *AfterFree tests fixes the tests.

But this hides the real issue - if users use Bytes() in the wrong way, we want
the panic, not the segfault - the tests are good!

> It seems a bit of an odd function however.  Wouldn't it be better to
> changes the Bytes function so that it tests if the pointer is NULL and
> panics?

I cannot find any docs for GoBytes now; maybe I tested that it panics in
this case, but this does not work on this arch (bug?). A panic with a good
error message about the wrong usage will be much better.

>
> NB: this _does not_ address the other problem where GODEBUG=cgocheck=2
> complains about "fatal error: Go pointer stored into non-Go memory".

Do we keep Go pointers in buffers allocated in C?

Nir

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs


[ovirt-users] Re: HA VM Lease failure with full data storage domain

2022-06-02 Thread Nir Soffer
On Thu, Jun 2, 2022 at 10:33 PM Patrick Hibbs  wrote:
>
> Here's the ausearch results from that host. Looks like more than one
> issue. (openvswitch is also in there.)

I did not see anything related to the issues you reported, and selinux
is likely not related. However, there are unexpected denials that
may be harmless but should not appear in the report.

I think filing a separate bug for the 2 kinds of denials there makes
sense; someone should check and fix either the selinux policy or
the program trying to do stuff it should not.

I think this should be reported for qemu-kvm in Bugzilla:

time->Thu Jun  2 10:33:38 2022
type=PROCTITLE msg=audit(1654180418.940:5119):
proctitle=2F7573722F6C6962657865632F71656D752D6B766D002D6E616D650067756573743D57656253657276696365735F486F6E6F6B612C64656275672D746872656164733D6F6E002D53002D6F626A656374007B22716F6D2D74797065223A22736563726574222C226964223A226D61737465724B657930222C22666F726D617422
type=SYSCALL msg=audit(1654180418.940:5119): arch=c03e syscall=257
success=no exit=-13 a0=ff9c a1=5647b7ffd910 a2=0 a3=0 items=0
ppid=1 pid=3639 auid=4294967295 uid=107 gid=107 euid=107 suid=107
fsuid=107 egid=107 sgid=107 fsgid=107 tty=(none) ses=4294967295
comm="qemu-kvm" exe="/usr/libexec/qemu-kvm"
subj=system_u:system_r:svirt_t:s0:c9,c704 key=(null)
type=AVC msg=audit(1654180418.940:5119): avc:  denied  { search } for
pid=3639 comm="qemu-kvm" name="1055" dev="proc" ino=28142
scontext=system_u:system_r:svirt_t:s0:c9,c704
tcontext=system_u:system_r:sanlock_t:s0-s0:c0.c1023 tclass=dir
permissive=0

I'm not sure where this should be reported, maybe kernel?

type=SYSCALL msg=audit(1651812155.891:50): arch=c03e syscall=175
success=yes exit=0 a0=55bcab394ed0 a1=51494 a2=55bca960b8b6
a3=55bcaab64010 items=0 ppid=1274 pid=1282 auid=4294967295 uid=0 gid=0
euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295
comm="modprobe" exe="/usr/bin/kmod"
subj=system_u:system_r:openvswitch_load_module_t:s0 key=(null)
type=AVC msg=audit(1651812155.891:50): avc:  denied  { search } for
pid=1282 comm="modprobe" name="events" dev="tracefs" ino=2060
scontext=system_u:system_r:openvswitch_load_module_t:s0
tcontext=system_u:object_r:tracefs_t:s0 tclass=dir permissive=0
type=AVC msg=audit(1651812155.891:50): avc:  denied  { search } for
pid=1282 comm="modprobe" name="events" dev="tracefs" ino=2060
scontext=system_u:system_r:openvswitch_load_module_t:s0
tcontext=system_u:object_r:tracefs_t:s0 tclass=dir permissive=0

> I'll see about opening the bug. Should I file it on oVirt's github or
> the RedHat bugzilla?

Bugzilla is still the preferred place, but you can use GitHub if you like;
we will look at it in both places.

Nir

> -Patrick Hibbs
>
> On Thu, 2022-06-02 at 22:08 +0300, Nir Soffer wrote:
> > On Thu, Jun 2, 2022 at 9:52 PM Patrick Hibbs 
> > wrote:
> > >
> > > The attached logs are from the cluster hosts that were running the
> > > HA
> > > VMs during the failures.
> > >
> > > I've finally got all of my HA VMs up again. The last one didn't
> > > start
> > > again until after I freed up more space in the storage domain than
> > > what
> > > was originally available when the VM was running previously. (It
> > > now
> > > has over 150GB of free space. Which should be more than enough, but
> > > it
> > > didn't boot with 140GB avaiable)
> > >
> > > SideNote:
> > > I just found this in the logs on the original host that the HA VMs
> > > were
> > > running on:
> > >
> > > ---snip---
> > > Jun 02 10:33:29 ryuki.codenet sanlock[1054]: 2022-06-02 10:33:29
> > > 674607
> > > [1054]: s1 check_our_lease warning 71 last_success 674536
> > >   # semanage
> > > fcontext -a -t virt_image_t '1055'
> > >   *  Plugin
> > > catchall (2.13 confidence) suggests   **
> > >   Then you
> > > should
> > > report this as a bug.
> > >   You can
> > > generate
> > > a local policy module to allow this access.
> > >   Do
> >
> > Not clear what is the selinux issue. If you run:
> >
> > ausearch -m avc
> >
> > It should be more clear.
> >
> > > Jun 02 10:33:45 ryuki.codenet sanlock[1054]: 2022-06-02 10:33:45
> > > 674623
> > > [1054]: s1 kill 3441 sig 15 count 8
> >

[ovirt-users] Re: HA VM Lease failure with full data storage domain

2022-06-02 Thread Nir Soffer
On Thu, Jun 2, 2022 at 9:52 PM Patrick Hibbs  wrote:
>
> The attached logs are from the cluster hosts that were running the HA
> VMs during the failures.
>
> I've finally got all of my HA VMs up again. The last one didn't start
> again until after I freed up more space in the storage domain than what
> was originally available when the VM was running previously. (It now
> has over 150GB of free space. Which should be more than enough, but it
> didn't boot with 140GB avaiable)
>
> SideNote:
> I just found this in the logs on the original host that the HA VMs were
> running on:
>
> ---snip---
> Jun 02 10:33:29 ryuki.codenet sanlock[1054]: 2022-06-02 10:33:29 674607
> [1054]: s1 check_our_lease warning 71 last_success 674536
>   # semanage
> fcontext -a -t virt_image_t '1055'
>   *  Plugin
> catchall (2.13 confidence) suggests   **
>   Then you should
> report this as a bug.
>   You can generate
> a local policy module to allow this access.
>   Do

It's not clear what the selinux issue is. If you run:

ausearch -m avc

it should be clearer.
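
For example, to limit the search to recent events, and to get a human
readable explanation of each denial:

   ausearch -m avc -ts recent
   ausearch -m avc -ts recent | audit2allow -w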

> Jun 02 10:33:45 ryuki.codenet sanlock[1054]: 2022-06-02 10:33:45 674623
> [1054]: s1 kill 3441 sig 15 count 8
> Jun 02 10:33:45 ryuki.codenet sanlock[1054]: 2022-06-02 10:33:45 674623
> [1054]: s1 kill 4337 sig 15 count 8
> Jun 02 10:33:46 ryuki.codenet sanlock[1054]: 2022-06-02 10:33:46 674624
> [1054]: s1 kill 3206 sig 15 count 9

This means that the host could not access the storage for 80 seconds, and the
leases expired. When leases expire, sanlock must kill the processes holding the
leases. Here we see that sanlock sent SIGTERM to 3 processes.

If these are VMs, they will pause and libvirt will release the lease.

I can check the log deeper next week.

Nir

> Jun 02 10:33:47 ryuki.codenet kernel: ovirtmgmt: port 4(vnet2) entered
> disabled state
> ---snip---
>
> That looks like some SELinux failure.
>
> -Patrick Hibbs
>
> On Thu, 2022-06-02 at 19:44 +0300, Nir Soffer wrote:
> > On Thu, Jun 2, 2022 at 7:14 PM Patrick Hibbs 
> > wrote:
> > >
> > > OK, so the data storage domain on a cluster filled up to the point
> > > that
> > > the OS refused to allocate any more space.
> > >
> > > This happened because I tried to create a new prealloc'd disk from
> > > the
> > > Admin WebUI. The disk creation claims to be completed successfully,
> > > I've not tried to use that disk yet, but due to a timeout with the
> > > storage domain in question the engine began trying to fence all of
> > > the
> > > HA VMs.
> > > The fencing failed for all of the HA VMs leaving them in a powered
> > > off
> > > state. Despite all of the HA VMs being up at the time, so no
> > > reallocation of the leases should have been necessary.
> >
> > Leases are not reallocated during fencing, not sure why you expect
> > this to happen.
> >
> > > Attempting to
> > > restart them manually from the Admin WebUI failed. With the
> > > original
> > > host they were running on complaining about "no space left on
> > > device",
> > > and the other hosts claiming that the original host still held the
> > > VM
> > > lease.
> >
> > No space left on device may be an unfortunate error from sanlock,
> > meaning that there is no locksapce. This means the host has trouble
> > adding the lockspace, or it did not complete yet.
> >
> > > After cleaning up some old snapshots, the HA VMs would still not
> > > boot.
> > > Toggling the High Availability setting for each one and allowing
> > > the
> > > lease to be removed from the storage domain was required to get the
> > > VMs
> > > to start again.
> >
> > If  you know that the VM is not running, disabling the lease
> > temporarily is
> > a good way to workaround the issue.
> >
> > > Re-enabling the High Availability setting there after
> > > fixed the lease issue. But now some, not all, of the HA VMs are
> > > still
> > > throwing "no space left on device" errors when attempting to start
> > > them. The others are working just fine even with their HA lease
> > > enabled.
> >
> > All erros come from same host(s) or some vms cannot start while
> > others can on the same host?
> >
> > > My qu

Re: [linux-lvm] lvm commands hanging when run from inside a kubernetes pod

2022-06-02 Thread Nir Soffer
On Thu, Jun 2, 2022 at 9:41 AM Abhishek Agarwal
 wrote:
>
> These are not different LVM processes. The container process is using the LVM 
> binary that the node itself has. We have achieved this by using scripts that 
> point to the same lvm binary that is used by the node.
>
> Configmap(~shell script) used for the same has the following contents where 
> `/host` refers to the root directory of the node:
>
> get_bin_path: |
>   #!/bin/sh
>   bin_name=$1
>   if [ -x /host/bin/which ]; then
> echo $(chroot /host /bin/which $bin_name | cut -d ' ' -f 1)
>   elif [ -x /host/usr/bin/which ]; then
> echo $(chroot /host /usr/bin/which $bin_name | cut -d ' ' -f 1)
>   else
> $(chroot /host which $bin_name | cut -d ' ' -f 1)
>   fi
>
> lvcreate: |
>   #!/bin/sh
>   path=$(/sbin/lvm-eg/get_bin_path "lvcreate")
>   chroot /host $path "$@"
>
> Also, the above logs in the pastebin link have errors because the vg lock has 
> not been acquired and hence creation commands will fail. Once the lock is 
> acquired, the `strace -f` command gives the following output being stuck. 
> Check out this link for full details -> https://pastebin.com/raw/DwQfdmr8
>
> P.S: We at OpenEBS are trying to provide lvm storage to cloud native 
> workloads with the help of kubernetes CSI drivers and since all these drivers 
> run as pods and help dynamic provisioning of kubernetes volumes(storage) for 
> the application, the lvm commands needs to be run from inside the pod. 
> Reference -> https://github.com/openebs/lvm-localpv

For using LVM in Kubernetes, why not topolvm?
https://github.com/topolvm/topolvm

The design looks right, running lvm commands on the host:
https://github.com/topolvm/topolvm/blob/main/docs/design.md

Nir

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



[ovirt-users] Re: HA VM Lease failure with full data storage domain

2022-06-02 Thread Nir Soffer
On Thu, Jun 2, 2022 at 7:44 PM Nir Soffer  wrote:
>
> On Thu, Jun 2, 2022 at 7:14 PM Patrick Hibbs  wrote:
> >
> > OK, so the data storage domain on a cluster filled up to the point that
> > the OS refused to allocate any more space.
> >
> > This happened because I tried to create a new prealloc'd disk from the
> > Admin WebUI. The disk creation claims to be completed successfully,
> > I've not tried to use that disk yet, but due to a timeout with the
> > storage domain in question the engine began trying to fence all of the
> > HA VMs.
> > The fencing failed for all of the HA VMs leaving them in a powered off
> > state. Despite all of the HA VMs being up at the time, so no
> > reallocation of the leases should have been necessary.
>
> Leases are not reallocated during fencing, not sure why you expect
> this to happen.
>
> > Attempting to
> > restart them manually from the Admin WebUI failed. With the original
> > host they were running on complaining about "no space left on device",
> > and the other hosts claiming that the original host still held the VM
> > lease.
>
> No space left on device may be an unfortunate error from sanlock,
> meaning that there is no locksapce. This means the host has trouble
> adding the lockspace, or it did not complete yet.
>
> > After cleaning up some old snapshots, the HA VMs would still not boot.
> > Toggling the High Availability setting for each one and allowing the
> > lease to be removed from the storage domain was required to get the VMs
> > to start again.
>
> If  you know that the VM is not running, disabling the lease temporarily is
> a good way to workaround the issue.
>
> > Re-enabling the High Availability setting there after
> > fixed the lease issue. But now some, not all, of the HA VMs are still
> > throwing "no space left on device" errors when attempting to start
> > them. The others are working just fine even with their HA lease
> > enabled.
>
> All erros come from same host(s) or some vms cannot start while
> others can on the same host?
>
> > My questions are:
> >
> > 1. Why does oVirt claim to have a constantly allocated HA VM lease on
> > the storage domain when it's clearly only done while the VM is running?
>
> Leases are allocated when a VM is created. This allocated a the lease space
> (1MiB) in the external leases special volume, and bind it to the VM ID.
>
> When VM starts, it acquires the lease for its VM ID. If sanlock is not 
> connected
> to the lockspace on this host, this may fail with the confusing
> "No space left on device" error.
>
> > 2. Why does oVirt deallocate the HA VM lease when performing a fencing
> > operation?
>
> It does not. oVirt does not actually "fence" the VM. If the host running the 
> VM
> cannot access storage and update the lease, the host lose all leases on that
> storage. The result is pausing all the VM holding a lease on that storage.
>
> oVirt will try to start the VM on another host, which will try to
> acquire the lease
> again on the new host. If enough time passed since the original host lost
> access to storage, the lease can be acquired on the new host. If not, this
> will happen in the next retrie(s).
>
> If the original host did not lose access to storage, and it is still
> updating the
> lease you cannot acquire the lease from another host. This protect the VM
> from split-brain that will corrupt the vm disk.
>
> > 3. Why can't oVirt clear the old HA VM lease when the VM is down and
> > the storage pool has space available? (How much space is even needed?
> > The leases section of the storage domain in the Admin WebUI doesn't
> > contain any useful info beyond the fact that a lease should exist for a
> > VM even when it's off.)
>
> Acquiring the lease is possible only if the lease is not held on another host.
>
> oVirt does not support acquiring a held lease by killing the process holding
> the lease on another host, but sanlock provides such capability.
>
> > 4. Is there a better way to force start a HA VM when the lease is old
> > and the VM is powered off?
>
> If the original VM is powered off for enough time (2-3 minutes), the lease
> expires and starting the VM on another host should succeed.
>
> > 5. Should I file a bug on the whole HA VM failing to reacquire a lease
> > on a full storage pool?
>
> The external lease volume is not fully allocated. If you use thin provisioned
> storage, and the there is really no storage space, it is possible that 
> creating
> a new lease will fail, but starting and stopping VM that have leases should 
&

[ovirt-users] Re: HA VM Lease failure with full data storage domain

2022-06-02 Thread Nir Soffer
On Thu, Jun 2, 2022 at 7:14 PM Patrick Hibbs  wrote:
>
> OK, so the data storage domain on a cluster filled up to the point that
> the OS refused to allocate any more space.
>
> This happened because I tried to create a new prealloc'd disk from the
> Admin WebUI. The disk creation claims to be completed successfully,
> I've not tried to use that disk yet, but due to a timeout with the
> storage domain in question the engine began trying to fence all of the
> HA VMs.
> The fencing failed for all of the HA VMs leaving them in a powered off
> state. Despite all of the HA VMs being up at the time, so no
> reallocation of the leases should have been necessary.

Leases are not reallocated during fencing, not sure why you expect
this to happen.

> Attempting to
> restart them manually from the Admin WebUI failed. With the original
> host they were running on complaining about "no space left on device",
> and the other hosts claiming that the original host still held the VM
> lease.

No space left on device may be an unfortunate error from sanlock,
meaning that there is no lockspace. This means the host has trouble
adding the lockspace, or adding it did not complete yet.

> After cleaning up some old snapshots, the HA VMs would still not boot.
> Toggling the High Availability setting for each one and allowing the
> lease to be removed from the storage domain was required to get the VMs
> to start again.

If you know that the VM is not running, disabling the lease temporarily is
a good way to work around the issue.

> Re-enabling the High Availability setting there after
> fixed the lease issue. But now some, not all, of the HA VMs are still
> throwing "no space left on device" errors when attempting to start
> them. The others are working just fine even with their HA lease
> enabled.

Do all errors come from the same host(s), or do some VMs fail to start while
others can start on the same host?

> My questions are:
>
> 1. Why does oVirt claim to have a constantly allocated HA VM lease on
> the storage domain when it's clearly only done while the VM is running?

Leases are allocated when a VM is created. This allocates the lease space
(1 MiB) in the external leases special volume, and binds it to the VM ID.

When the VM starts, it acquires the lease for its VM ID. If sanlock is not
connected to the lockspace on this host, this may fail with the confusing
"No space left on device" error.

> 2. Why does oVirt deallocate the HA VM lease when performing a fencing
> operation?

It does not. oVirt does not actually "fence" the VM. If the host running the VM
cannot access storage and update the lease, the host loses all leases on that
storage. The result is that all the VMs holding a lease on that storage are
paused.

oVirt will try to start the VM on another host, which will try to acquire the
lease again on the new host. If enough time passed since the original host lost
access to storage, the lease can be acquired on the new host. If not, this
will happen in one of the next retries.

If the original host did not lose access to storage, and it is still updating
the lease, you cannot acquire the lease from another host. This protects the
VM from a split-brain that would corrupt the VM disk.

> 3. Why can't oVirt clear the old HA VM lease when the VM is down and
> the storage pool has space available? (How much space is even needed?
> The leases section of the storage domain in the Admin WebUI doesn't
> contain any useful info beyond the fact that a lease should exist for a
> VM even when it's off.)

Acquiring the lease is possible only if the lease is not held on another host.

oVirt does not support acquiring a held lease by killing the process holding
the lease on another host, but sanlock provides such a capability.

> 4. Is there a better way to force start a HA VM when the lease is old
> and the VM is powered off?

If the original VM is powered off for enough time (2-3 minutes), the lease
expires and starting the VM on another host should succeed.

> 5. Should I file a bug on the whole HA VM failing to reacquire a lease
> on a full storage pool?

The external lease volume is not fully allocated. If you use thin provisioned
storage, and there is really no storage space, it is possible that creating
a new lease will fail, but starting and stopping VMs that have leases should not
be affected. But if you reach the point where you don't have enough storage
space, you have much bigger trouble, and you should fix it urgently.

Do you really have an issue with available space? What does the engine report
about the storage domain? What does the underlying storage report?

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/OYA5KFPJIZXUGDK6CZFO6BWHY7ZDT7OR/

