Re: How to stop QEMU punching holes in backing store

2024-08-08 Thread Nir Soffer
On Sat, Jun 29, 2024 at 12:19 AM Nir Soffer  wrote:
>
> On Wed, Jun 19, 2024 at 2:18 PM Nir Soffer  wrote:
> > On 19 Jun 2024, at 8:54, Justin  wrote:
> >
> >I've run strace and I see calls to fallocate with these flags:
> >FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE
> >
> >I've tried passing these options: discard=off,detect-zeroes=off but
> >this does not help. This is the full set of relevant options I'm
> >using:
> >
> >-drive file=/vms/vm0/drive,format=raw,if=virtio,discard=off,detect-zeroes=off
> >
> > You don't need to disable detect-zeroes - in my tests it makes dd 
> > if=/dev/zero
> > 5 times faster (770 MiB/s -> 3 GiB/s) since zero writes are converted to
> > fallocate(FALLOC_FL_KEEP_SIZE|FALLOC_FL_ZERO_RANGE).
> >
> > The issue seems to be ignoring the discard option when opening the image,
> > and is fixed by this:
> > https://lists.nongnu.org/archive/html/qemu-block/2024-06/msg00198.html
> >
> > Thanks. When might this patch (or something similar) be merged?
> >
> > The patch needs more work; when the work is done and the qemu maintainers
> > are happy is a good estimate :-)
>
> Justin, v3 should be ready now, do you want to help by testing it?
>
> v3:
> https://lists.nongnu.org/archive/html/qemu-block/2024-06/msg00644.html
>
> You can also pull it from:
> https://gitlab.com/nirs/qemu/-/tree/consider-discard-option

The fix is included in QEMU 9.1.0-rc0
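
To check whether an installed build already includes it, the reported version
should be 9.1.0 or later:

$ qemu-system-x86_64 --version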




Re: How to stop QEMU punching holes in backing store

2024-06-28 Thread Nir Soffer
On Wed, Jun 19, 2024 at 2:18 PM Nir Soffer  wrote:
> On 19 Jun 2024, at 8:54, Justin  wrote:
>
>I've run strace and I see calls to fallocate with these flags:
>FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE
>
>I've tried passing these options: discard=off,detect-zeroes=off but
>this does not help. This is the full set of relevant options I'm
>using:
>
>-drive file=/vms/vm0/drive,format=raw,if=virtio,discard=off,detect-zeroes=off
>
> You don't need to disable detect-zeroes - in my tests it makes dd if=/dev/zero
> 5 times faster (770 MiB/s -> 3 GiB/s) since zero writes are converted to
> fallocate(FALLOC_FL_KEEP_SIZE|FALLOC_FL_ZERO_RANGE).
>
> The issue seems to be ignoring the discard option when opening the image,
> and is fixed by this:
> https://lists.nongnu.org/archive/html/qemu-block/2024-06/msg00198.html
>
> Thanks. When might this patch (or something similar) be merged?
>
> The patch needs more work; when the work is done and the qemu maintainers
> are happy is a good estimate :-)

Justin, v3 should be ready now, do you want to help by testing it?

v3:
https://lists.nongnu.org/archive/html/qemu-block/2024-06/msg00644.html

You can also pull it from:
https://gitlab.com/nirs/qemu/-/tree/consider-discard-option

Nir




Re: qemu-img convert: Compression can not be disabled when converting from .qcow2 to .raw

2024-06-21 Thread Nir Soffer
On Fri, Jun 21, 2024 at 5:48 PM Sven Ott  wrote:

> Hi, I want to mount a VM image to a loop device and give it some excess
> space.
>
> To do so, I download a .qcow2 file, add some 0 bytes with truncate, and
> then convert the image from QCOW2 to RAW format with qemu-img convert,
> like so:
>
> ```
>
> GUEST_IMG=focal-server-cloudimg-amd64
>
> wget https://cloud-images.ubuntu.com/focal/current/$GUEST_IMG
>
> truncate -s 5G $GUEST_IMG.img
>

This is not needed, and ineffective...


>
> qemu-img convert -f qcow2 -O raw $GUEST_IMG.img $GUEST_IMG.raw
>

Since the first thing done in this command is truncating the target image
to 0 bytes.

You can use -n to avoid creating the target image and use your own image,
but this is also not needed.

You can convert the image:

qemu-img convert -f qcow2 -O raw src.qcow2 dst.raw

and then resize the raw image:

qemu-img resize dst.raw newsize

You can also resize before converting; it does not matter whether you resize
before or after.
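
For example, a minimal end-to-end sketch (assuming the downloaded file is
named focal-server-cloudimg-amd64.img):

qemu-img convert -f qcow2 -O raw focal-server-cloudimg-amd64.img focal-server-cloudimg-amd64.raw

qemu-img resize -f raw focal-server-cloudimg-amd64.raw 5G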

Note that you will have to grow the pv/lv/filesystem inside the guest to
use the additional space.

Nir


Re: How to stop QEMU punching holes in backing store

2024-06-19 Thread Nir Soffer


> On 19 Jun 2024, at 8:54, Justin  wrote:
> 
>>I've run strace and I see calls to fallocate with these flags:
>>FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE
>> 
>>I've tried passing these options: discard=off,detect-zeroes=off but
>>this does not help. This is the full set of relevant options I'm
>>using:
>> 
>>-drive file=/vms/vm0/drive,format=raw,if=virtio,discard=off,detect-zeroes=off
>> 
>> You don't need to disable detect-zeroes - in my tests it makes dd if=/dev/zero
>> 5 times faster (770 MiB/s -> 3 GiB/s) since zero writes are converted to
>> fallocate(FALLOC_FL_KEEP_SIZE|FALLOC_FL_ZERO_RANGE).
>> 
>> The issue seems to be ignoring the discard option when opening the image,
>> and is fixed by this:
>> https://lists.nongnu.org/archive/html/qemu-block/2024-06/msg00198.html
> 
> Thanks. When might this patch (or something similar) be merged?

The patch needs more work; when the work is done and the qemu maintainers are
happy is a good estimate :-)

> 
>> I think the change needs more work to keep the default behavior
>> since most users want sparse images, but it seems to do what you
>> want - keeping images thick.
> 
> It seems that this patch is making the code align more closely with
> the documentation? To me, it appeared fairly clear that discard=unmap
> would punch holes, and thus the inverse setting would stop hole
> punching.

Punching holes is used both for discard (e.g. fstrim in the guest) and for 
writing zeros.

I think that discard should work only when you set discard=unmap or 
discard=on, but writing zeros should always punch holes unless you set 
discard=off. I don't think this behavior is documented now but it should be, at 
least the intent to keep images sparse when possible.

Nir

Re: How to stop QEMU punching holes in backing store

2024-06-18 Thread Nir Soffer
On Wed, Jun 5, 2024 at 5:27 PM Justin  wrote:

> Hi. I'm using QEMU emulator version 5.2.0 on Linux. I am using
> thick-provisioned RAW files as the VM backing store. I've found that
> QEMU is punching holes in my RAW files (it's replacing some zero
> blocks with holes), which means that the number of blocks allocated to
> the VM volumes decreases. It keeps doing this; I've manually used
> fallocate(1) to reallocate the full number of blocks to the VM backing
> store files, and sometime later QEMU punches some more holes.
>
> How do I completely disable all hole punching?
>
> The problem with this behaviour is that this confuses capacity
> management software into thinking that there is enough free space to
> create more VMs. The file-system for the VM backing stores becomes
> over-committed. Later, when a VM starts writing non-zero data to the
> holes, the VM hangs because QEMU cannot write to the backing store
> because there are no free blocks available. There is no recovery other
> than deleting files, so it basically means one or more VMs have to be
> sacrificed for the greater good.
>

On the other hand, using a thin disk means that storage operations like
copying a disk, backup or writing zeros are much more efficient.

I would check if it is possible to fix the capacity management system to
consider the disk's virtual size instead of the available space.
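
For example, to see the difference between the virtual size and the actual
allocation of a sparse raw file (using the path from your command line; -U
since the image may be in use by the running vm):

$ qemu-img info -U /vms/vm0/drive   # reports both "virtual size" and "disk size"
$ ls -lh /vms/vm0/drive             # apparent (virtual) size
$ du -h /vms/vm0/drive              # blocks actually allocated on the filesystem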


> I've run strace and I see calls to fallocate with these flags:
> FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE
>
> I've tried passing these options: discard=off,detect-zeroes=off but
> this does not help. This is the full set of relevant options I'm
> using:
>
> -drive
> file=/vms/vm0/drive,format=raw,if=virtio,discard=off,detect-zeroes=off
>

You don't need to disable detect-zeroes - in my tests it makes dd
if=/dev/zero
5 times faster (770 MiB/s -> 3 GiB/s) since zero writes are converted to
fallocate(FALLOC_FL_KEEP_SIZE|FALLOC_FL_ZERO_RANGE).
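
If you want to confirm which fallocate mode qemu issues on your setup, you can
attach strace to the running process, for example (assuming a single qemu
process; adjust the pgrep pattern as needed):

$ strace -f -e trace=fallocate -p $(pgrep -f qemu-system | head -n 1)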

The issue seems to be ignoring the discard option when opening the image,
and is fixed by this:
https://lists.nongnu.org/archive/html/qemu-block/2024-06/msg00198.html

I think the change needs more work to keep the default behavior since most
users want sparse images, but it seems to do what you want - keeping images
thick.

Nir


Re: Export checkpoint/bitmap from image on qcow2

2024-01-02 Thread Nir Soffer
On Fri, Dec 22, 2023 at 3:35 PM João Jandre Paraquetti
 wrote:
>
> Hello Nir,
>
> Thank you for your detailed response, but this process does seem a bit
> too complex.
>
> I've been thinking, wouldn't it be easier to use the process below?
>   - Create a paused transient dummy VM;
>   - Create the necessary checkpoints on the VM using dumps from the
> original VM;

Not sure what you mean by dumps from the original vm...

>   - Use Libvirt's virDomainBackupBegin to create the incremental backup;
>   - Destroy the dummy VM.
>
> What do you think?

It should work and the backup code will be simpler, since you use the
libvirt APIs for both live and "cold" backup.

But creating the vm in paused mode and integrating this new mode into
your system can be more complicated than the cold backup code, depending
on your system.

Some issues you will have to think about:
- when a vm is running in the backup mode, does it consume resources
(e.g. memory) that it does not need?
- how do you switch to normal mode if a user wants to start a vm in the
middle of a backup? (a full backup of a large vm can take hours)
- can you migrate a vm during backup?

In oVirt we went the other way - we never use live backup - instead,
we do this for every backup:

1. create a snapshot
2. backup the vm using the snapshot (instead of the active image)
3. delete the snapshot

With this, both live and cold backup are the same, and we avoid the issue
of a long backup (e.g. a full backup, or an incremental backup with a lot
of data) blocking usage of the vm during the backup. You can start/stop/migrate
a vm while it is being backed up, and the backup can run on another host in
the cluster (if using shared storage).
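
A rough sketch of this flow with virsh (hypothetical domain, disk and paths;
an external disk-only snapshot, a copy of the now read-only backing file, and
then merging the overlay back):

$ virsh snapshot-create-as vm1 backup-snap --disk-only --atomic --no-metadata
$ cp /var/lib/libvirt/images/vm1.qcow2 /backup/vm1-full.qcow2
$ virsh blockcommit vm1 vda --active --pivot

The leftover overlay file created by the snapshot can be removed after the pivot.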

Nir

> On 12/18/23 18:44, Nir Soffer wrote:
> > On Thu, Nov 30, 2023 at 4:14 PM João Jandre Paraquetti
> >  wrote:
> >> Hi, all
> >>
> >> I recently started looking into how to create incremental backups using
> >> Libvirt+Qemu. I have already found that, since Libvirt 7.6.0, we can use
> >> the virDomainBackupBegin API to create incremental backups of live VMs
> >> through Libvirt. Using this API we can back up the VM's disks and create
> >> a checkpoint at the same time. After the first full backup, we can
> >> create incremental backups referencing the checkpoint that was generated
> >> on the previous backup process.
> >>
> >> As far as I understood Libvirt's documentation, when using this API to
> >> create a checkpoint, what happens is that Libvirt will create a new
> >> bitmap on the VM's disk to track the changes that happened from that
> >> checkpoint on, then, they'll copy what changed between the last bitmap
> >> and the new one. By doing this, Libvirt is able to create incremental
> >> backups of disks that are used by running VMs.
> >>
> >> My problem is the following though: after I stop the VM, I'm no longer
> >> able to use Libvirts API as it requires the VM (domain) to be running.
> >> However, there might still be changes on the disk since the last backup
> >> I performed. Therefore, to create a backup of the changes made since the
> >> last backup until the VM was stopped, I have to work directly on the VM
> >> image using Qemu. I have checked, and the QCOW2 file has the checkpoint
> >> created with the Libvirt backup process (I checked via the "qemu-img
> >> info"). Therefore, we have the marker of the last checkpoint generated
> >> by the last backup process. However, if the VM is stopped, there is no
> >> clear way to export this last differential copy of the volume.
> >>
> >> I have searched through the documentation about this; however, I haven't
> >> found any example or explanation on how to manually export a
> >> differential copy of the QCOW2 file using the last checkpoint (bitmap)
> >> created.
> >>
> >> I would much appreciate it if people could point me in the direction of
> >> how to export differential copies of a QCOW2 file, using the checkpoints
> >> (bitmaps) that it has.
> > Backup is complicated and a lot of hard work. Before you reinvent the
> > wheel, check if this project works for you:
> > https://github.com/abbbi/virtnbdbackup
> >
> > If you need to create your own solution or want to understand how backup
> > works with bitmaps, here is how it works.
> >
> > The basic idea is - you start an nbd server exporting the dirty
> > bitmap, so the client can
> > get the dirty extents using NBD_BLOCK_STATUS and copy the dirty clusters.
> >
> > To start an n

Re: Export checkpoint/bitmap from image on qcow2

2023-12-18 Thread Nir Soffer
On Thu, Nov 30, 2023 at 4:14 PM João Jandre Paraquetti
 wrote:
>
> Hi, all
>
> I recently started looking into how to create incremental backups using
> Libvirt+Qemu. I have already found that, since Libvirt 7.6.0, we can use
> the virDomainBackupBegin API to create incremental backups of live VMs
> through Libvirt. Using this API we can back up the VM's disks and create
> a checkpoint at the same time. After the first full backup, we can
> create incremental backups referencing the checkpoint that was generated
> on the previous backup process.
>
> As far as I understood Libvirt's documentation, when using this API to
> create a checkpoint, what happens is that Libvirt will create a new
> bitmap on the VM's disk to track the changes that happened from that
> checkpoint on, then, they'll copy what changed between the last bitmap
> and the new one. By doing this, Libvirt is able to create incremental
> backups of disks that are used by running VMs.
>
> My problem is the following though: after I stop the VM, I'm no longer
> able to use Libvirts API as it requires the VM (domain) to be running.
> However, there might still be changes on the disk since the last backup
> I performed. Therefore, to create a backup of the changes made since the
> last backup until the VM was stopped, I have to work directly on the VM
> image using Qemu. I have checked, and the QCOW2 file has the checkpoint
> created with the Libvirt backup process (I checked via the "qemu-img
> info"). Therefore, we have the marker of the last checkpoint generated
> by the last backup process. However, if the VM is stopped, there is no
> clear way to export this last differential copy of the volume.
>
> I have searched through the documentation about this; however, I haven't
> found any example or explanation on how to manually export a
> differential copy of the QCOW2 file using the last checkpoint (bitmap)
> created.
>
> I would much appreciate it if people could point me in the direction of
> how to export differential copies of a QCOW2 file, using the checkpoints
> (bitmaps) that it has.

Backup is complicated and a lot of hard work. Before you reinvent the
wheel, check if this project works for you:
https://github.com/abbbi/virtnbdbackup

If you need to create your own solution or want to understand how backup works
with bitmaps, here is how it works.

The basic idea is - you start an nbd server exporting the dirty bitmap,
so the client can get the dirty extents using NBD_BLOCK_STATUS and copy
the dirty clusters.

To start an nbd server you have 2 options:

- qemu-nbd using the --bitmap option. vdsm code is a good place to learn
  how to do this (more about this below). This was the best option a few
  years ago, but now you may want the second option.

- qemu-storage-daemon - you configure and start the nbd server in the same
  way libvirt does it - using the qmp commands. libvirt code is probably
  the best place to learn how to do this. This is probably the best option
  at this time, unless you are using an old qemu version that does not have
  a working qemu-storage-daemon.

If you have a single qcow2 image with the bitmap, you are ready to export
the image. But you may have a more interesting image where the same bitmap
exists in multiple snapshots. You can see all the bitmaps and snapshots using:

qemu-img info --backing-chain --output json filename

If the bitmap exists on multiple images in the chain, you need to validate
that the bitmap is valid - it must exist in all images in the chain since
the bitmap was created and must have the right flags (see vdsm code below
for the details).

After validating the bitmap, you need to create a new bitmap containing all
the bits of that bitmap from all the images in the chain. You have 2 ways to
do this:

- with qemu-nbd: create an overlay on top of the image, create a bitmap in
  the overlay, and merge the bitmaps from the image chain into this bitmap.
  Then start qemu-nbd with the overlay. You can create and merge bitmaps
  using `qemu-img bitmap ...` (see vdsm code below for the details)

- with qemu-storage-daemon: you can do what libvirt does - create a new
  temporary non-persistent bitmap (in qemu-storage-daemon memory), and merge
  the other bitmaps in the chain using qmp commands (see libvirt source or
  debug logs for the details).

When the backup bitmap is ready, you can start the nbd server and
configure it to export this bitmap.

When the backup is done, delete the overlay or the temporary backup bitmap.
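
For the qemu-nbd option, a minimal sketch with hypothetical file and bitmap
names (a single-image chain with a bitmap "bitmap0" from a previous checkpoint):

$ qemu-img create -f qcow2 -b disk.qcow2 -F qcow2 overlay.qcow2
$ qemu-img bitmap --add overlay.qcow2 backup-bitmap
$ qemu-img bitmap --merge bitmap0 -b disk.qcow2 -F qcow2 overlay.qcow2 backup-bitmap
$ qemu-nbd --read-only --persistent --socket=/tmp/backup.sock \
      --format=qcow2 --bitmap=backup-bitmap overlay.qcow2

The client can then read the dirty extents via the "qemu:dirty-bitmap:backup-bitmap"
meta context (for example with libnbd's nbdinfo --map=qemu:dirty-bitmap:backup-bitmap
'nbd+unix:///?socket=/tmp/backup.sock') and copy only the dirty clusters from the
export.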

You can check how vdsm does this using qemu-nbd here:
https://github.com/oVirt/vdsm/blob/0fc22ff0b81d605f10a2bc67309e119b7462b506/lib/vdsm/storage/nbd.py#L97

The steps:

1. Finding the images with the bitmap and validating the bitmap
   
https://github.com/oVirt/vdsm/blob/0fc22ff0b81d605f10a2bc67309e119b7462b506/lib/vdsm/storage/nbd.py#L217

2. Creating overlay with a backup bitmap
   
https://github.com/oVirt/vdsm/blob/0fc22ff0b81d605f10a2bc67309e119b7462b506/lib/vdsm/storage/nbd.p

Re: Is it normal to get bigger qcow2 image after blockcopy?

2023-11-06 Thread Nir Soffer
On Fri, Nov 3, 2023 at 3:25 AM Fangge Jin  wrote:

>
>
> On Thu, Nov 2, 2023 at 5:13 PM Fangge Jin  wrote:
>
>> Recently, I found that the disk size of the qcow2 image gets bigger (from 6.16G
>> to 8G in my test) after blockcopy.
>>
> Sorry, it should be "from 6.16G to 6.64G in my test" here
>
>> I'm not sure whether this is normal or not. Please help to check. Thanks.
>>
>>
>> Before blockcopy, check source image:
>>
>>   # qemu-img info -U /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2
>>
>
Was this a compressed qcow2 image when you started? Maybe you started with
an appliance image? These are typically compressed. When data is modified,
new clusters are stored uncompressed, but data that was never modified on
the original disk remains compressed.
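
One quick way to check this is `qemu-img check`, whose summary line also reports
the percentage of compressed clusters:

$ qemu-img check -U /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2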


>
>> After blockcopy, check target image:
>>   # qemu-img info -U /var/lib/avocado/data/avocado-vt/images/copy.qcow2
>>
>
This image is not compressed, based on the following qmp commands from the
libvirt log.


> Qemu command line:
>>   -blockdev
>> '{"driver":"file","filename":"/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}'
>> \
>>   -blockdev
>> '{"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null}'
>> \
>>   -device
>> '{"driver":"virtio-blk-pci","bus":"pci.4","addr":"0x0","drive":"libvirt-1-format","id":"virtio-disk0","bootindex":1}'
>> \
>>
>> Qemu monitor command grepped from libvirt log:
>>
>> {"execute":"blockdev-add","arguments":{"driver":"file","filename":"/var/lib/avocado/data/avocado-vt/images/copy.qcow2","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"},"id":"libvirt-429"}
>>
>> {"execute":"blockdev-create","arguments":{"job-id":"create-libvirt-2-format","options":{"driver":"qcow2","file":"libvirt-2-storage","size":10737418240,"cluster-size":65536}},"id":"libvirt-430"}
>>
>> {"execute":"job-dismiss","arguments":{"id":"create-libvirt-2-format"},"id":"libvirt-432"}
>>
>> {"execute":"blockdev-add","arguments":{"node-name":"libvirt-2-format","read-only":false,"driver":"qcow2","file":"libvirt-2-storage","backing":null},"id":"libvirt-433"}
>>
>> {"execute":"blockdev-mirror","arguments":{"job-id":"copy-vda-libvirt-1-format","device":"libvirt-1-format","target":"libvirt-2-format","sync":"full","auto-finalize":true,"auto-dismiss":false},"id":"libvirt-434"}
>>
>> {"execute":"transaction","arguments":{"actions":[{"type":"block-dirty-bitmap-add","data":{"node":"libvirt-2-format","name":"libvirt-tmp-activewrite","persistent":false,"disabled":false}}]},"id":"libvirt-443"}
>>
>> {"execute":"job-complete","arguments":{"id":"copy-vda-libvirt-1-format"},"id":"libvirt-444"}
>>
>
 Nir


Re: how to improve qcow performance?

2021-07-21 Thread Nir Soffer
On Wed, Jul 21, 2021 at 3:20 PM Geraldo Netto  wrote:
>
> Dear Nir/Friends,
>
> On Tue, 20 Jul 2021 at 11:34, Nir Soffer  wrote:
> >
> > On Thu, Jul 15, 2021 at 2:33 PM Geraldo Netto  
> > wrote:
> > >
> > > Dear Friends,
> > >
> > > I beg your pardon for such a newbie question
> > > But I would like to better understand how to improve the qcow performance
> >
> > I guess you mean how to improve "qcow2" performance. If you use "qcow"
> > format the best way is to switch to "qcow2".
>
> I read here [1] there was a qcow3, but it seems that page is
> deprecated (last update on sept. 2016)

QCOW3 is qcow2 v3. There is a version field in the format, but for
some reason it is not exposed in qemu-img info.

You can inspect the headers with this minimal qcow2 parser:
https://github.com/nirs/qcow2-parser

$ python3 qcow2.py /var/tmp/fedora-32.qcow2
{
"backing_file_offset": 0,
"backing_file_size": 0,
"cluster_bits": 16,
"compatible_features": 0,
"crypt_method": 0,
"header_length": 0,
"incompatible_features": 0,
"l1_size": 12,
"l1_table_offset": 196608,
"magic": 1363560955,
"nb_snapshots": 0,
"refcount_order": 0,
"refcount_table_clusters": 1,
"refcount_table_offset": 65536,
"size": 6442450944,
"snapshots_offset": 0,
"version": 3
}

$ qemu-img info /var/tmp/fedora-32.qcow2
image: /var/tmp/fedora-32.qcow2
file format: qcow2
virtual size: 6 GiB (6442450944 bytes)
disk size: 1.55 GiB
cluster_size: 65536
Format specific information:
compat: 1.1
compression type: zlib
lazy refcounts: false
refcount bits: 16
corrupt: false
extended l2: false

You can use "compat: 1.1" to identify qcow2 v3. qcow2 v2 has "compat: 0.10".

I think there is a complete parser elsewhere that gives more info.

> > > I was checking the qemu-img and it seems that the following parameters
> > > are the most relevant to optimise the performance, no?
> > >
> > >   'cache' is the cache mode used to write the output disk image, the valid
> > > options are: 'none', 'writeback' (default, except for convert),
> > > 'writethrough',
> > > 'directsync' and 'unsafe' (default for convert)
> > >
> > > Should I infer that directsync means bypass all the stack and write
> > > directly to the disk?
> >
> > 'directsync' is using direct I/O, but calls fsync() for every write. This is
> > the slowest way and does not make sense for converting images.
> >
> > 'none' uses direct I/O (O_DIRECT). This enables native async I/O (libaio)
> > which can give better performance in some cases.
> >
> > 'writeback' uses the page cache, considering the write complete when the
> > data is in the page cache, and reading data from the page cache. This is
> > likely to give the best performance, but is also likely to give inconsistent
> > performance and cause trouble for other applications.
> >
> > The kernel will write a huge amount of data to the page cache, and from time
> > to time try to flush a huge amount of data, which can cause long delays in
> > other processes accessing the same storage. It also pollutes the page cache
> > with data that may not be needed after the image is converted, for example
> > when you convert an image on one host, writing to shared storage, and the
> > image is used later on another host.
> >
> > 'writethrough' seems to use the pagecache, but it reports writes only after
> > data is flushed so it will be as slow as 'directsync' for writing, and
> > can cause the
> > same issues with the page cache as 'writeback'.
> >
> > 'unsafe' (default for convert) means writes are never flushed to disk, 
> > which is
> > unsafe when using in vm's -drive option, but completely safe when using in
> > qemu-img convert, since qemu-img completes the operation with fsync().
> >
> > The most important option for performance is -W (unordered writes).
> > For writing to block devices, it is up to 6 times faster. But it can cause
> > fragmentation so you may get faster copies but accessing the image
> > later may be slower.
>
> I see! Now I get it
>
> > Check this for example of -W usage:
> > https://bugzilla.redhat.com/1511891#c57
> >
> > Finally there is the -m option - the default value (8) gives good 

Re: how to improve qcow performance?

2021-07-20 Thread Nir Soffer
On Thu, Jul 15, 2021 at 2:33 PM Geraldo Netto  wrote:
>
> Dear Friends,
>
> I beg your pardon for such a newbie question
> But I would like to better understand how to improve the qcow performance

I guess you mean how to improve "qcow2" performance. If you use "qcow"
format the best way is to switch to "qcow2".

> I was checking the qemu-img and it seems that the following parameters
> are the most relevant to optimise the performance, no?
>
>   'cache' is the cache mode used to write the output disk image, the valid
> options are: 'none', 'writeback' (default, except for convert),
> 'writethrough',
> 'directsync' and 'unsafe' (default for convert)
>
> Should I infer that directsync means bypass all the stack and write
> directly to the disk?

'directsync' uses direct I/O, but calls fsync() for every write. This is
the slowest way and does not make sense for converting images.

'none' uses direct I/O (O_DIRECT). This enables native async I/O (libaio)
which can give better performance in some cases.

'writeback' uses the page cache, considering the write complete when the
data is in the page cache, and reading data from the page cache. This is
likely to give the best performance, but is also likely to give inconsistent
performance and cause trouble for other applications.

The kernel will write a huge amount of data to the page cache, and from time
to time try to flush a huge amount of data, which can cause long delays in
other processes accessing the same storage. It also pollutes the page cache
with data that may not be needed after the image is converted, for example
when you convert an image on one host, writing to shared storage, and the
image is used later on another host.

'writethrough' seems to use the page cache, but it reports writes only after
data is flushed, so it will be as slow as 'directsync' for writing, and can
cause the same issues with the page cache as 'writeback'.

'unsafe' (default for convert) means writes are never flushed to disk, which is
unsafe when used in a vm's -drive option, but completely safe when used in
qemu-img convert, since qemu-img completes the operation with fsync().

The most important option for performance is -W (unordered writes).
For writing to block devices, it is up to 6 times faster. But it can cause
fragmentation so you may get faster copies but accessing the image
later may be slower.

Check this for example of -W usage:
https://bugzilla.redhat.com/1511891#c57

Finally there is the -m option - the default value (8) gives good performance,
but using -m 16 can be a little faster.
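
Putting these together, a typical invocation could look like this (a sketch,
not a tuned recipe):

$ qemu-img convert -f qcow2 -O qcow2 -t none -T none -W -m 16 src.qcow2 dst.qcow2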

>   'src_cache' is the cache mode used to read input disk images, the valid
> options are the same as for the 'cache' option
>
> I didn`t follow where should I look to check the 'cache' options :`(

   -t CACHE
          Specifies the cache mode that should be used with the
          (destination) file. See the documentation of the emulator's
          -drive cache=... option for allowed values.

"See the documentation of the emulator's -drive cache=..." means see qemu(1).

> I guess that using smaller files is more performant due to the
> reduced amount of metadata to handle?

What do you mean by smaller files?

> In any case, I saw the qemu-io command and I plan to stress test it

The best test is to measure the actual operation with qemu-img convert
with different options and the relevant storage.

Nir




Re: how to change sector size using qemu-img

2021-06-25 Thread Nir Soffer
On Fri, Jun 25, 2021 at 7:40 PM Jiatong Shen  wrote:
>
> Hello community,
>
> I have a disk with both logical and physical sector size being 4096. I have a 
> qcow2 image which is built from a virtual machine has legacy 512 bytes sector 
> size.
>
> when I do something like
>
> qemu-img convert -f qcow2 -O host_device disk device

This cannot change the sector size used in the file system inside
this image.

> turns out it will not work. mounting a filesystem reports an error.
>
> Any advice on how to possibly work around this? thank you.

You need to use a physical block size of 4k when running the guest:

-device virtio-scsi-pci,id=scsi1,bus=pci.0 \
-drive 
file=/home/scsi/disk2.qcow2,if=none,id=drive-virtio-disk1,format=qcow2,cache=none,aio=native,media=disk,werror=stop,rerror=stop
\
-device 
scsi-hd,bus=scsi1.0,drive=drive-virtio-disk1,id=virtio-scsi-pci1,physical_block_size=4096,logical_block_size=512
\

See https://bugzilla.redhat.com/1448021

If you use libvirt, it can be done via:



See https://libvirt.org/formatdomain.html#elementsDisks

Nir




Re: resizing disk image in a raw file online possible ?

2021-04-16 Thread Nir Soffer
On Thu, Apr 15, 2021 at 7:13 PM Lentes, Bernd
 wrote:
>
> Hi,
>
> we have several qemu guests running on SLES 12 SP5 and pacemaker and a 
> two-node HA cluster.
> The raw files for the disks for the guests reside on a OCFS2 Volume on a SAN.
> We need to give more storage to a guest (several 100GB).
> Is that online possible ?

Yes, we do this in oVirt for many years.

You need to resize the raw file on storage - this can be done with
truncate:

truncate -s 500g /path

This does not affect the guest yet, since qemu does not know about the
additional size added.

And then you need to tell qemu about the new size. If you use the libvirt
python binding this can be done using:

flags = libvirt.VIR_DOMAIN_BLOCK_RESIZE_BYTES
dom.blockResize("sda", 500 * 1024**3, flags=flags)

This should also be possible using virsh.
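
For example (with a hypothetical domain name):

virsh blockresize vm1 sda 500G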

Nir




Re: qemu-img measure

2020-07-23 Thread Nir Soffer
On Thu, Jul 23, 2020 at 7:55 PM Arik Hadas  wrote:
>
>
>
> On Thu, Jul 23, 2020 at 7:31 PM Nir Soffer  wrote:
>>
>> On Thu, Jul 23, 2020 at 6:12 PM Arik Hadas  wrote:
>>
>> The best place for this question is qemu-discuss, and CC Kevin and Stefan
>> (author of qemu-img measure).
>>
>> > @Nir Soffer does the following make any sense to you:
>> >
>> > [root@lion01 8c98c94d-bc14-4e24-b89c-4d96c820056d]# qemu-img info 
>> > 73dde1fc-71c1-431a-8762-c2e71ec4cb93
>> > image: 73dde1fc-71c1-431a-8762-c2e71ec4cb93
>> > file format: raw
>> > virtual size: 15 GiB (16106127360 bytes)
>> > disk size: 8.65 GiB
>> >
>> > [root@lion01 8c98c94d-bc14-4e24-b89c-4d96c820056d]# qemu-img measure -O 
>> > qcow2 73dde1fc-71c1-431a-8762-c2e71ec4cb93
>> > required size: 16108814336
>> > fully allocated size: 16108814336
>>
>> This means the file system does not report sparseness info, and without
>> that information qemu-img cannot give a safe estimate.
>>
>> I can reproduce this on NFS 3:
>>
>> $ mount | grep export/2
>> nfs1:/export/2 on /rhev/data-center/mnt/nfs1:_export_2 type nfs
>> (rw,relatime,vers=3,rsize=262144,wsize=262144,namlen=255,soft,nolock,nosharecache,proto=tcp,timeo=100,retrans=3,sec=sys,mountaddr=192.168.122.30,mountvers=3,mountport=20048,mountproto=udp,local_lock=all,addr=192.168.122.30)
>>
>> $ cd /rhev/data-center/mnt/nfs1:_export_2
>>
>> $ truncate -s 1g empty.img
>>
>> $ qemu-img measure -O qcow2 empty.img
>> required size: 1074135040
>> fully allocated size: 1074135040
>>
>> $ qemu-img map --output json empty.img
>> [{ "start": 0, "length": 1073741824, "depth": 0, "zero": false,
>> "data": true, "offset": 0}]
>>
>> If we run qemu-img measure with strace, we can see:
>>
>> $ strace qemu-img measure -O qcow2 empty.img 2>&1 | grep SEEK_HOLE
>> lseek(9, 0, SEEK_HOLE)  = 1073741824
>>
>> This means the byte range from 0 to 1073741824 is data.
>>
>> If we do the same on NFS 4.2:
>>
>> $ mount | grep export/1
>> nfs1:/export/1 on /rhev/data-center/mnt/nfs1:_export_1 type nfs4
>> (rw,relatime,vers=4.2,rsize=262144,wsize=262144,namlen=255,soft,nosharecache,proto=tcp,timeo=100,retrans=3,sec=sys,clientaddr=192.168.122.23,local_lock=none,addr=192.168.122.30)
>>
>> $ cd /rhev/data-center/mnt/nfs1\:_export_1
>> $ qemu-img measure -O qcow2 empty.img
>> required size: 393216
>> fully allocated size: 1074135040
>>
>> Unfortunately oVirt default is not NFS 4.2 yet, and we even warn about
>> changing the stupid default.
>>
>> > qemu-img convert -f raw -O qcow2 73dde1fc-71c1-431a-8762-c2e71ec4cb93 
>> > /tmp/arik.qcow2
>>
>> qemu-img convert detects zeros in the input file, so it can cope with
>> no sparseness info. This is not free of course; copying this image is
>> much slower when we have to read the entire image.
>
>
> It would have been great if 'measure' could also have such an ability to take 
> zeros into account as the 'convert',
> even if it means longer execution time - otherwise when we export VMs to OVAs 
> on such file systems, we may end up allocating the virtual size within the 
> OVA (at least when base volume is a raw volume).

You can file RFE for qemu-img.

>> > [root@lion01 8c98c94d-bc14-4e24-b89c-4d96c820056d]# qemu-img measure -O 
>> > qcow2 /tmp/arik.qcow2
>> > required size: 9359720448
>> > fully allocated size: 16108814336
>>
>> Now we have qcow2 image, so we don't depend on the file system capabilities.
>> This is the advantage of using advanced file format.
>>
>> > shouldn't the 'measure' command be a bit smarter than that? :)
>>
>> I think it cannot be smarter, but maybe qemu folks have a better answer.
>>
>> To measure, qemu-img needs to know how the data is laid out on disk, to 
>> compute
>> the number of clusters in the qcow2 image. Without help from the
>> filesystem the only
>> way to do this is to read the entire image.
>>
>> The solution in oVirt is to allocate the required size (possibly
>> overallocating) and after
>> conversion was finished, reduce the volume to the required size using:
>> http://ovirt.github.io/ovirt-engine-sdk/4.4/services.m.html#ovirtsdk4.services.StorageDomainDiskService.reduce
>>
>> This is much faster than reading the entire image twice.
>
>
> That's sort of what we've started

Re: qemu-img measure

2020-07-23 Thread Nir Soffer
On Thu, Jul 23, 2020 at 6:12 PM Arik Hadas  wrote:

The best place for this question is qemu-discuss, and CC Kevin and Stefan
(author of qemu-img measure).

> @Nir Soffer does the following make any sense to you:
>
> [root@lion01 8c98c94d-bc14-4e24-b89c-4d96c820056d]# qemu-img info 
> 73dde1fc-71c1-431a-8762-c2e71ec4cb93
> image: 73dde1fc-71c1-431a-8762-c2e71ec4cb93
> file format: raw
> virtual size: 15 GiB (16106127360 bytes)
> disk size: 8.65 GiB
>
> [root@lion01 8c98c94d-bc14-4e24-b89c-4d96c820056d]# qemu-img measure -O qcow2 
> 73dde1fc-71c1-431a-8762-c2e71ec4cb93
> required size: 16108814336
> fully allocated size: 16108814336

This means the file system does not report sparseness info, and without
that information qemu-img cannot give a safe estimate.

I can reproduce this on NFS 3:

$ mount | grep export/2
nfs1:/export/2 on /rhev/data-center/mnt/nfs1:_export_2 type nfs
(rw,relatime,vers=3,rsize=262144,wsize=262144,namlen=255,soft,nolock,nosharecache,proto=tcp,timeo=100,retrans=3,sec=sys,mountaddr=192.168.122.30,mountvers=3,mountport=20048,mountproto=udp,local_lock=all,addr=192.168.122.30)

$ cd /rhev/data-center/mnt/nfs1:_export_2

$ truncate -s 1g empty.img

$ qemu-img measure -O qcow2 empty.img
required size: 1074135040
fully allocated size: 1074135040

$ qemu-img map --output json empty.img
[{ "start": 0, "length": 1073741824, "depth": 0, "zero": false,
"data": true, "offset": 0}]

If we run qemu-img measure with strace, we can see:

$ strace qemu-img measure -O qcow2 empty.img 2>&1 | grep SEEK_HOLE
lseek(9, 0, SEEK_HOLE)  = 1073741824

This means the byte range from 0 to 1073741824 is data.

If we do the same on NFS 4.2:

$ mount | grep export/1
nfs1:/export/1 on /rhev/data-center/mnt/nfs1:_export_1 type nfs4
(rw,relatime,vers=4.2,rsize=262144,wsize=262144,namlen=255,soft,nosharecache,proto=tcp,timeo=100,retrans=3,sec=sys,clientaddr=192.168.122.23,local_lock=none,addr=192.168.122.30)

$ cd /rhev/data-center/mnt/nfs1\:_export_1
$ qemu-img measure -O qcow2 empty.img
required size: 393216
fully allocated size: 1074135040

Unfortunately the oVirt default is not NFS 4.2 yet, and we even warn about
changing the stupid default.

> qemu-img convert -f raw -O qcow2 73dde1fc-71c1-431a-8762-c2e71ec4cb93 
> /tmp/arik.qcow2

qemu-img convert detects zeros in the input file, so it can cope with no
sparseness info. This is not free of course; copying this image is much
slower when we have to read the entire image.

> [root@lion01 8c98c94d-bc14-4e24-b89c-4d96c820056d]# qemu-img measure -O qcow2 
> /tmp/arik.qcow2
> required size: 9359720448
> fully allocated size: 16108814336

Now we have a qcow2 image, so we don't depend on the file system capabilities.
This is the advantage of using an advanced file format.

> shouldn't the 'measure' command be a bit smarter than that? :)

I think it cannot be smarter, but maybe qemu folks have a better answer.

To measure, qemu-img needs to know how the data is laid out on disk, to compute
the number of clusters in the qcow2 image. Without help from the filesystem the
only way to do this is to read the entire image.

The solution in oVirt is to allocate the required size (possibly overallocating)
and, after conversion is finished, reduce the volume to the required size using:
http://ovirt.github.io/ovirt-engine-sdk/4.4/services.m.html#ovirtsdk4.services.StorageDomainDiskService.reduce

This is much faster than reading the entire image twice.

Nir




Re: [ovirt-devel] [ARM64] Possiblity to support oVirt on ARM64

2020-07-19 Thread Nir Soffer
On Sun, Jul 19, 2020 at 5:04 PM Zhenyu Zheng  wrote:
>
> Hi oVirt,
>
> We are currently trying to make oVirt work on ARM64 platform, since I'm quite 
> new to oVirt community, I'm wondering what is the current status about ARM64 
> support in the oVirt upstream, as I saw the oVirt Wikipedia page mentioned 
> there is an ongoing efforts to support ARM platform. We have a small team 
> here and we are willing to also help to make this work.

Hi Zhenyu,

I think this is a great idea, both supporting more hardware, and
enlarging the oVirt
community.

Regarding hardware support we depend mostly on libvirt and qemu, and I
don't know what the status is. Adding relevant lists and people.

I don't know about any effort on the oVirt side, but last week I added
arm builds for ovirt-imageio and it works:
https://copr.fedorainfracloud.org/coprs/nsoffer/ovirt-imageio-preview/build/1555705/

We have many dependencies, but oVirt itself is mostly python and java, with
tiny bits in C or using ctypes, so it should not be too hard.

I think the first thing is getting some hardware for testing. Do you
have such hardware, or have some contacts that can help to get a hardware
contribution for this?

Nir




Re: [ovirt-users] [OT] Major and minor numbers assigned to /dev/vdx virtio devices

2020-07-13 Thread Nir Soffer
On Wed, Jul 1, 2020 at 5:55 PM Gianluca Cecchi
 wrote:
>
> Hello,
> isn't there an official major/minor numbering scheme for virtio disks?
> Sometimes I see 251 major or 252 or so... what is the udev assignment logic?
> Reading here:
> https://www.kernel.org/doc/Documentation/admin-guide/devices.txt
>
>  240-254 block LOCAL/EXPERIMENTAL USE
> Allocated for local/experimental use.  For devices not
> assigned official numbers, these ranges should be
> used in order to avoid conflicting with future assignments.
>
> it seems they are in the range of experimental ones, while for example Xen 
> /dev/xvdx devices have their own static assignment (202 major)

This question belongs to qemu-discuss.

Also added some people that may help.

Nir




Re: Exporting qcow2 images as raw data from ova file with qemu-nbd

2020-06-29 Thread Nir Soffer
On Mon, Jun 29, 2020 at 3:06 PM Kevin Wolf  wrote:
>
> On 26.06.2020 at 21:42, Nir Soffer wrote:
> > On Tue, Jun 23, 2020 at 1:21 AM Nir Soffer  wrote:
> > >
> > > I'm trying to export qcow2 images from ova format using qemu-nbd.
> > >
> > > I create 2 compressed qcow2 images, with different data:
> > >
> > > $ qemu-img info disk1.qcow2
> > > image: disk1.qcow2
> > > file format: qcow2
> > > virtual size: 200 MiB (209715200 bytes)
> > > disk size: 384 KiB
> > > ...
> > >
> > > $ qemu-img info disk2.qcow2
> > > image: disk2.qcow2
> > > file format: qcow2
> > > virtual size: 200 MiB (209715200 bytes)
> > > disk size: 384 KiB
> > > ...
> > >
> > > And packed them in a tar file. This is not a valid ova but good enough
> > > for this test:
> > >
> > > $ tar tvf vm.ova
> > > -rw-r--r-- nsoffer/nsoffer 454144 2020-06-22 21:34 disk1.qcow2
> > > -rw-r--r-- nsoffer/nsoffer 454144 2020-06-22 21:34 disk2.qcow2
> > >
> > > To get info about the disks in ova file, we can use:
> > >
> > > $ python -c 'import tarfile; print(list({"name": m.name, "offset":
> > > m.offset_data, "size": m.size} for m in tarfile.open("vm.ova")))'
> > > [{'name': 'disk1.qcow2', 'offset': 512, 'size': 454144}, {'name':
> > > 'disk2.qcow2', 'offset': 455168, 'size': 454144}]
> > >
> > > First I tried the obvious:
> > >
> > > $ qemu-nbd --persistent --socket=/tmp/nbd.sock --read-only --offset=512 
> > > vm.ova
> > >
> > >And it works, but it exposes the qcow2 data. I want the raw data so I
> > >can upload the guest
> > >data to oVirt, where it may be converted to qcow2 format.
> > >
> > > $ qemu-img info --output json "nbd+unix://?socket=/tmp/nbd.sock"
> > > {
> > > "virtual-size": 209715200,
> > > "filename": "nbd+unix://?socket=/tmp/nbd.sock",
> > > "format": "qcow2",
> > >  ...
> > > }
> > >
> > > Looking in qemu manual and qapi/block-core.json, I could construct this 
> > > command:
> > >
> > > $ qemu-nbd --persistent --socket=/tmp/nbd.sock --read-only
> > > 'json:{"driver": "qcow2", "file": {"driver": "raw", "offset": 512,
> > > "size": 454144, "file": {"driver": "file", "filename": "vm.ova"}}}'
> > >
> > > And it works:
> > >
> > > $ qemu-img info --output json "nbd+unix://?socket=/tmp/nbd.sock"
> > > {
> > > "virtual-size": 209715200,
> > > "filename": "nbd+unix://?socket=/tmp/nbd.sock",
> > > "format": "raw"
> > > }
> > >
> > > $ qemu-img map --output json "nbd+unix://?socket=/tmp/nbd.sock"
> > > [{ "start": 0, "length": 104857600, "depth": 0, "zero": false, "data":
> > > true, "offset": 0},
> > > { "start": 104857600, "length": 104857600, "depth": 0, "zero": true,
> > > "data": false, "offset": 104857600}]
> > >
> > > $ qemu-img map --output json disk1.qcow2
> > > [{ "start": 0, "length": 104857600, "depth": 0, "zero": false, "data": 
> > > true},
> > > { "start": 104857600, "length": 104857600, "depth": 0, "zero": true,
> > > "data": false}]
> > >
> > > $ qemu-img convert -f raw -O raw nbd+unix://?socket=/tmp/nbd.sock 
> > > disk1.raw
> > >
> > > $ qemu-img info disk1.raw
> > > image: disk1.raw
> > > file format: raw
> > > virtual size: 200 MiB (209715200 bytes)
> > > disk size: 100 MiB
> > >
> > > $ qemu-img compare disk1.raw disk1.qcow2
> > > Images are identical.
> > >
> > > I wonder if this is the best way to stack a qcow2 driver on top of a
> > > raw driver exposing a range from a tar file.
>
> Yes, if you want to specify an offset and a size to access only part of
> a file as the disk image, sticking a raw driver in the middle is the way
> to go.
>
>

Re: Exporting qcow2 images as raw data from ova file with qemu-nbd

2020-06-26 Thread Nir Soffer
On Fri, Jun 26, 2020 at 11:34 PM Richard W.M. Jones  wrote:
>
> On Fri, Jun 26, 2020 at 10:42:02PM +0300, Nir Soffer wrote:
> > Can we have better support in qemu-img/qemu-nbd for accessing images
> > in a tar file?
> >
> > Maybe something like:
> >
> > qemu-img info tar://vm.ova?member=fedora-32.qcow2
>
> Isn't this exactly a case where nbdkit-tar-plugin would work despite
> the performance problems with it being written in Python?  Something like:
>
> $ tar tvf disk.ova
> -rw-r--r-- rjones/rjones 2031616 2020-06-26 21:32 disk.qcow2
>
> $ nbdkit -U - tar tar=disk.ova file=disk.qcow2 --run 'qemu-img info 
> --output=json $nbd'
> {
> "virtual-size": 105923072,
> "filename": "nbd+unix://?socket=/tmp/nbdkitTjkeRd/socket",
> "cluster-size": 65536,
> "format": "qcow2",
> "format-specific": {
> "type": "qcow2",
> "data": {
> "compat": "1.1",
> "lazy-refcounts": false,
> "refcount-bits": 16,
> "corrupt": false
> }
> },
> "dirty-flag": false
> }
>
> qemu-img measure will work the same way.

Looks like format probing just works this way, I'll try this, thanks!

Nir




Re: Exporting qcow2 images as raw data from ova file with qemu-nbd

2020-06-26 Thread Nir Soffer
On Tue, Jun 23, 2020 at 1:21 AM Nir Soffer  wrote:
>
> I'm trying to export qcow2 images from ova format using qemu-nbd.
>
> I create 2 compressed qcow2 images, with different data:
>
> $ qemu-img info disk1.qcow2
> image: disk1.qcow2
> file format: qcow2
> virtual size: 200 MiB (209715200 bytes)
> disk size: 384 KiB
> ...
>
> $ qemu-img info disk2.qcow2
> image: disk2.qcow2
> file format: qcow2
> virtual size: 200 MiB (209715200 bytes)
> disk size: 384 KiB
> ...
>
> And packed them in a tar file. This is not a valid ova but good enough
> for this test:
>
> $ tar tvf vm.ova
> -rw-r--r-- nsoffer/nsoffer 454144 2020-06-22 21:34 disk1.qcow2
> -rw-r--r-- nsoffer/nsoffer 454144 2020-06-22 21:34 disk2.qcow2
>
> To get info about the disks in ova file, we can use:
>
> $ python -c 'import tarfile; print(list({"name": m.name, "offset":
> m.offset_data, "size": m.size} for m in tarfile.open("vm.ova")))'
> [{'name': 'disk1.qcow2', 'offset': 512, 'size': 454144}, {'name':
> 'disk2.qcow2', 'offset': 455168, 'size': 454144}]
>
> First I tried the obvious:
>
> $ qemu-nbd --persistent --socket=/tmp/nbd.sock --read-only --offset=512 vm.ova
>
> And it works, but it exposes the qcow2 data. I want the raw data so I
> can upload the guest
> data to oVirt, where it may be converted to qcow2 format.
>
> $ qemu-img info --output json "nbd+unix://?socket=/tmp/nbd.sock"
> {
> "virtual-size": 209715200,
> "filename": "nbd+unix://?socket=/tmp/nbd.sock",
> "format": "qcow2",
>  ...
> }
>
> Looking in qemu manual and qapi/block-core.json, I could construct this 
> command:
>
> $ qemu-nbd --persistent --socket=/tmp/nbd.sock --read-only
> 'json:{"driver": "qcow2", "file": {"driver": "raw", "offset": 512,
> "size": 454144, "file": {"driver": "file", "filename": "vm.ova"}}}'
>
> And it works:
>
> $ qemu-img info --output json "nbd+unix://?socket=/tmp/nbd.sock"
> {
> "virtual-size": 209715200,
> "filename": "nbd+unix://?socket=/tmp/nbd.sock",
> "format": "raw"
> }
>
> $ qemu-img map --output json "nbd+unix://?socket=/tmp/nbd.sock"
> [{ "start": 0, "length": 104857600, "depth": 0, "zero": false, "data":
> true, "offset": 0},
> { "start": 104857600, "length": 104857600, "depth": 0, "zero": true,
> "data": false, "offset": 104857600}]
>
> $ qemu-img map --output json disk1.qcow2
> [{ "start": 0, "length": 104857600, "depth": 0, "zero": false, "data": true},
> { "start": 104857600, "length": 104857600, "depth": 0, "zero": true,
> "data": false}]
>
> $ qemu-img convert -f raw -O raw nbd+unix://?socket=/tmp/nbd.sock disk1.raw
>
> $ qemu-img info disk1.raw
> image: disk1.raw
> file format: raw
> virtual size: 200 MiB (209715200 bytes)
> disk size: 100 MiB
>
> $ qemu-img compare disk1.raw disk1.qcow2
> Images are identical.
>
> I wonder if this is the best way to stack a qcow2 driver on top of a
> raw driver exposing
> a range from a tar file.
>
> I found similar example for gluster in:
> docs/system/device-url-syntax.rst.inc

Other related challenges with this are:

1. probing image format

With standalone images, we probe image format using:

qemu-img info image

I know probing is considered dangerous, but I think this is ok when the user
runs this code on his machine, on an image they want to upload to oVirt. On a
hypervisor we use prlimit to limit the resources used by qemu-img, so we can
use the same solution also when run by a user if needed.

However, not being able to probe the image format is a usability issue. It
does not make sense that qemu-img cannot probe the image format safely, at
least for the qcow2 format.

I can get image info using:

$ qemu-img info 'json:{"driver": "qcow2", "file": {"driver": "raw",
"offset": 1536, "file": {"driver": "file", "filename":
"fedora-32.ova"}}}'
image: json:{"driver": "qcow2", "file": {"offset": 1536, "driver":
"raw", "file": {"driver": "file", "filename": "fedora-32.ova"}}}
file format

Re: Exporting qcow2 images as raw data from ova file with qemu-nbd

2020-06-26 Thread Nir Soffer
On Tue, Jun 23, 2020 at 1:47 PM Richard W.M. Jones  wrote:
>
> On Tue, Jun 23, 2020 at 01:14:43PM +0300, Nir Soffer wrote:
> > On Tue, Jun 23, 2020 at 12:47 PM Richard W.M. Jones  
> > wrote:
> > > Here you go:
> > >
> > > https://github.com/libguestfs/nbdkit/commit/2d15e79f65764d9b0c68bea28ed6afbcbcc63467
> >
> > Nice!
> >
> > But using qemu-nbd directly is much simpler and will perform better.
>
> Not sure about simpler,

These are the patches (in review) implementing this in imageio client:

- https://gerrit.ovirt.org/c/109847
- https://gerrit.ovirt.org/c/109848

And this is example script for the engine SDK:
https://gerrit.ovirt.org/c/109873/

Work is not finished yet; we need to handle getting the image virtual size
and measuring the required size to support upload to sparse disks on block
storage:
https://bugzilla.redhat.com/show_bug.cgi?id=1849981#c3

Since we already run qemu-nbd to access images, this is just a modification
of the 'json:{...}' filename, so we can also consume images inside a tar file.

And in the client, allow specifying image format and tar member name
(like the tar plugin).

Using the tar plugin requires:
- Adding support for NBD url - this is planned but at least the same amount
  of work as supporting json: with a qcow2 drive over raw with a range.
- Integrating with nbdkit and qemu-nbd (since we must have qcow2 support)
- packaging
- handling missing dependencies

What's missing in the current solution is supporting a compressed raw disk.
I don't think oVirt creates such ova files, so this should not be an issue.
However, if the nbdkit tar plugin can support this (the current implementation
does not), this will be a good reason to integrate with it.

In this case we will have:

[ovf | gzipped raw image | ...]  -> nbdkit exposing uncompressed
raw image -> imageio nbd client

I wonder if it is possible to add a gzip driver to the qemu block layer?

> and you might want to verify the "perform better" claim

I did not test yet on a real server under harder load, but from initial tests
I get the same performance from uploading a compressed qcow2 image and from
uploading a compressed qcow2 image inside an ova file.

> (it is likely to be true because writing a plugin in
> Python causes requests to be serialized, but it may not matter if
> you're reading linearly from a file).

It matters, because having serial reads also implies serial writes to the server
in imageio. We could change this, but in the current design we have multiple threads
reading from the nbd server, and sending data or zero requests to the imageio server.
This doubles the performance we had in oVirt 4.3.

You can see here results compared to qemu-img convert from local image
to block device:
https://github.com/oVirt/ovirt-imageio/commit/97f2e277458db579023ba54a4a4bd122b36f543e

The unexpected slow results with qemu-img are caused by the pre-zero
pessimisation
which should be removed soon:
https://lists.nongnu.org/archive/html/qemu-block/2020-06/msg01094.html

With the slow pre-zeroing removed, qemu-img convert gives similar performance.

> The tar plugin could be rewritten in C if performance was really a problem.

This may make the tar plugin more attractive, especially if we can support
multiple readers. Maybe we can separate the tar parsing into a helper process,
and use the results (a list of (member, offset, size)) in a file-like plugin in C?


> > Regardless, the nbdkit tar plugin is awesome. Is it possible to expose
> > all the disks from a tar file so they are accessible using the
> > export name?
>
> In theory yes, but it would require exposing the export name
> (nbdkit_export_name() -> nbdkit.export_name()) to Python plugins,
> which we don't do at the moment.  See plugins/python/python.c:
> NbdkitMethods[].  That would also mean the plugin would require the
> latest nbdkit so you'd have to wait for patches to get backported to
> RHEL 8.
>
> You would also have to be cautious with security because the export
> name is supplied by the untrusted client.
>
> > For example:
> >
> > $ nbdkit tar file=vm.ova
> >
> > $ qemu-nbd --list
> > exports available: 2
> >  export: 'disk1.qcow2'
> >   size:  910848
> >   flags: 0x48f ( readonly flush fua df cache )
> >   min block: 512
> >   opt block: 4096
> >   max block: 33554432
> >   available meta contexts: 1
> >base:allocation
> >  export: 'disk2.qcow2'
> >   size:  910848
> >   flags: 0x48f ( readonly flush fua df cache )
> >   min block: 512
> >   opt block: 4096
> >   max block: 33554432
> >   available meta contexts: 1
> >base:allocation
> >
> > $  qemu-img convert -f qcow2 -O raw nbd://localhost/disk1.qcow2 disk1.raw
> >
> > $  qemu-img convert -f qcow2 -O

Re: Exporting qcow2 images as raw data from ova file with qemu-nbd

2020-06-26 Thread Nir Soffer
On Tue, Jun 23, 2020 at 5:08 PM Richard W.M. Jones  wrote:
>
> On Tue, Jun 23, 2020 at 08:47:52AM -0500, Eric Blake wrote:
> > On 6/22/20 5:21 PM, Nir Soffer wrote:
> > >And it works, but it exposes the qcow2 data. I want the raw data so I
> > >can upload the guest
> > >data to oVirt, where it may be converted to qcow2 format.
>
> Nir, can you use qemu-img convert and get a free conversion to your
> choice of format?  This works fine over NBD as long as you don't try
> and write which I guess you don't want to do here.

No, this code can run on a remote host that does not have access to storage,
or on a hypervisor as any user that cannot access storage.

> > >Richard suggested to try nbdkit tar plugin, but the plugin is not
> > >available on RHEL,
> > >and this adds additional dependency, when we already use qemu-nbd.
> >
> > Rich just rewrote the tar plugin to use python instead of perl,
> > which means it is that much easier for a future RHEL to pull it in.
> > We still ought to consider having a tar filter, either in place of
> > or in addition to, the tar plugin (similar to how we recently
> > converted nbdkit's ext4 support from a plugin to a filter) - having
> > a tar filter would allow you to read a compressed ova file (by
> > combining the xz and tar filters to decompress then extract a file).
> > But right now, nbdkit doesn't support non-C filters (and given that
> > our tar plugin was written first in perl and now in python, that
> > still means translation to yet another language if the filter
> > requires it to be in C).
>
> The reason it was in Perl and is now in Python (and not C), and also
> the reason it still a plugin, is that parsing tar files is very
> complex because of historical compatibility.  If we accept that we
> cannot write a from-scratch tar file parser in C then we have to use
> an existing tool or library (‘tar’ itself was used by Perl, now we're
> using ‘tarfile.py’ from Python stdlib).  Those tools require access to
> an actual local file.  So I'm afraid this rewrite is hard work :-)
>
> Unless we accept that we only parse files created by a narrow range of
> tools, but the problem is that OVA files can be generated by a wide
> variety of tools.
>
> If you can supply the offset by some other means then of course using
> nbdkit-offset-filter or qemu's offset block layer is the solution.
>
> Rich.
>
> --
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> virt-p2v converts physical machines to virtual machines.  Boot with a
> live CD or over the network (PXE) and turn machines into KVM guests.
> http://libguestfs.org/virt-v2v
>




Re: Exporting qcow2 images as raw data from ova file with qemu-nbd

2020-06-23 Thread Nir Soffer
On Tue, Jun 23, 2020 at 12:47 PM Richard W.M. Jones  wrote:
>
>
> Here you go:
>
> https://github.com/libguestfs/nbdkit/commit/2d15e79f65764d9b0c68bea28ed6afbcbcc63467

Nice!

But using qemu-nbd directly is much simpler and will perform better.

Regardless, the nbdkit tar plugin is awesome. Is it possible to expose all
the disks from a tar file so they are accessible using the export name?

For example:

$ nbdkit tar file=vm.ova

$ qemu-nbd --list
exports available: 2
 export: 'disk1.qcow2'
  size:  910848
  flags: 0x48f ( readonly flush fua df cache )
  min block: 512
  opt block: 4096
  max block: 33554432
  available meta contexts: 1
   base:allocation
 export: 'disk2.qcow2'
  size:  910848
  flags: 0x48f ( readonly flush fua df cache )
  min block: 512
  opt block: 4096
  max block: 33554432
  available meta contexts: 1
   base:allocation

$  qemu-img convert -f qcow2 -O raw nbd://localhost/disk1.qcow2 disk1.raw

$  qemu-img convert -f qcow2 -O raw nbd://localhost/disk2.qcow2 disk2.raw
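
Until something like this exists, the closest I can get with qemu-nbd alone
seems to be one process per disk, exposing only the raw byte range of each
tar member (offsets and sizes taken from the tarfile listing in my first
mail; the socket paths are just examples):

$ qemu-nbd --persistent --socket=/tmp/disk1.sock --read-only \
    'json:{"driver": "raw", "offset": 512, "size": 454144, "file": {"driver": "file", "filename": "vm.ova"}}'

$ qemu-nbd --persistent --socket=/tmp/disk2.sock --read-only \
    'json:{"driver": "raw", "offset": 455168, "size": 454144, "file": {"driver": "file", "filename": "vm.ova"}}'

$ qemu-img convert -f qcow2 -O raw 'nbd+unix://?socket=/tmp/disk1.sock' disk1.raw
$ qemu-img convert -f qcow2 -O raw 'nbd+unix://?socket=/tmp/disk2.sock' disk2.raw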

> Rich.
>
> --
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> virt-top is 'top' for virtual machines.  Tiny program with many
> powerful monitoring features, net stats, disk stats, logging, etc.
> http://people.redhat.com/~rjones/virt-top
>




Re: Exporting qcow2 images as raw data from ova file with qemu-nbd

2020-06-22 Thread Nir Soffer
On Tue, Jun 23, 2020, 03:37 Jakob Bohm  wrote:

> Why not use qemu-img convert directly, it doesn't expose the disk
> content to any
> interface except the disk image file(s) created.
>

The context is uploading disks to a remote oVirt setup. You don't have
access to the target image.


> On 2020-06-23 00:21, Nir Soffer wrote:
> > I'm trying to export qcow2 images from ova format using qemu-nbd.
> >
> > I create 2 compressed qcow2 images, with different data:
> >
> > $ qemu-img info disk1.qcow2
> > image: disk1.qcow2
> > file format: qcow2
> > virtual size: 200 MiB (209715200 bytes)
> > disk size: 384 KiB
> > ...
> >
> > $ qemu-img info disk2.qcow2
> > image: disk2.qcow2
> > file format: qcow2
> > virtual size: 200 MiB (209715200 bytes)
> > disk size: 384 KiB
> > ...
> >
> > And packed them in a tar file. This is not a valid ova but good enough
> > for this test:
> >
> > $ tar tvf vm.ova
> > -rw-r--r-- nsoffer/nsoffer 454144 2020-06-22 21:34 disk1.qcow2
> > -rw-r--r-- nsoffer/nsoffer 454144 2020-06-22 21:34 disk2.qcow2
> >
> > To get info about the disks in ova file, we can use:
> >
> > $ python -c 'import tarfile; print(list({"name": m.name, "offset":
> > m.offset_data, "size": m.size} for m in tarfile.open("vm.ova")))'
> > [{'name': 'disk1.qcow2', 'offset': 512, 'size': 454144}, {'name':
> > 'disk2.qcow2', 'offset': 455168, 'size': 454144}]
> >
> > First I tried the obvious:
> >
> > $ qemu-nbd --persistent --socket=/tmp/nbd.sock --read-only --offset=512
> vm.ova
> >
> > And it works, but it exposes the qcow2 data. I want the raw data so I
> > can upload the guest
> > data to oVirt, where it may be converted to qcow2 format.
> >
> > $ qemu-img info --output json "nbd+unix://?socket=/tmp/nbd.sock"
> > {
> >  "virtual-size": 209715200,
> >  "filename": "nbd+unix://?socket=/tmp/nbd.sock",
> >  "format": "qcow2",
> >   ...
> > }
> >
> > Looking in qemu manual and qapi/block-core.json, I could construct this
> command:
> >
> > $ qemu-nbd --persistent --socket=/tmp/nbd.sock --read-only
> > 'json:{"driver": "qcow2", "file": {"driver": "raw", "offset": 512,
> > "size": 454144, "file": {"driver": "file", "filename": "vm.ova"}}}'
> >
> > And it works:
> >
> > $ qemu-img info --output json "nbd+unix://?socket=/tmp/nbd.sock"
> > {
> >  "virtual-size": 209715200,
> >  "filename": "nbd+unix://?socket=/tmp/nbd.sock",
> >  "format": "raw"
> > }
> >
> > $ qemu-img map --output json "nbd+unix://?socket=/tmp/nbd.sock"
> > [{ "start": 0, "length": 104857600, "depth": 0, "zero": false, "data":
> > true, "offset": 0},
> > { "start": 104857600, "length": 104857600, "depth": 0, "zero": true,
> > "data": false, "offset": 104857600}]
> >
> > $ qemu-img map --output json disk1.qcow2
> > [{ "start": 0, "length": 104857600, "depth": 0, "zero": false, "data":
> true},
> > { "start": 104857600, "length": 104857600, "depth": 0, "zero": true,
> > "data": false}]
> >
> > $ qemu-img convert -f raw -O raw nbd+unix://?socket=/tmp/nbd.sock
> disk1.raw
> >
> > $ qemu-img info disk1.raw
> > image: disk1.raw
> > file format: raw
> > virtual size: 200 MiB (209715200 bytes)
> > disk size: 100 MiB
> >
> > $ qemu-img compare disk1.raw disk1.qcow2
> > Images are identical.
> >
> > I wonder if this is the best way to stack a qcow2 driver on top of a
> > raw driver exposing
> > a range from a tar file.
> >
> > I found similar example for gluster in:
> > docs/system/device-url-syntax.rst.inc
> >
> > Richard suggested trying the nbdkit tar plugin, but the plugin is not
> > available on RHEL,
> > and this adds an additional dependency when we already use qemu-nbd.
> >
> > Nir
> >
> >
>
>
> Enjoy
>
> Jakob
> --
> Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
> Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
> This public discussion message is non-binding and may contain errors.
> WiseMo - Remote Service Management for PCs, Phones and Embedded
>
>
>


Exporting qcow2 images as raw data from ova file with qemu-nbd

2020-06-22 Thread Nir Soffer
I'm trying to export qcow2 images from ova format using qemu-nbd.

I create 2 compressed qcow2 images, with different data:

$ qemu-img info disk1.qcow2
image: disk1.qcow2
file format: qcow2
virtual size: 200 MiB (209715200 bytes)
disk size: 384 KiB
...

$ qemu-img info disk2.qcow2
image: disk2.qcow2
file format: qcow2
virtual size: 200 MiB (209715200 bytes)
disk size: 384 KiB
...

And packed them in a tar file. This is not a valid ova but good enough
for this test:

$ tar tvf vm.ova
-rw-r--r-- nsoffer/nsoffer 454144 2020-06-22 21:34 disk1.qcow2
-rw-r--r-- nsoffer/nsoffer 454144 2020-06-22 21:34 disk2.qcow2

To get info about the disks in ova file, we can use:

$ python -c 'import tarfile; print(list({"name": m.name, "offset":
m.offset_data, "size": m.size} for m in tarfile.open("vm.ova")))'
[{'name': 'disk1.qcow2', 'offset': 512, 'size': 454144}, {'name':
'disk2.qcow2', 'offset': 455168, 'size': 454144}]

First I tried the obvious:

$ qemu-nbd --persistent --socket=/tmp/nbd.sock --read-only --offset=512 vm.ova

And it works, but it exposes the qcow2 data. I want the raw data so I
can upload the guest
data to oVirt, where it may be converted to qcow2 format.

$ qemu-img info --output json "nbd+unix://?socket=/tmp/nbd.sock"
{
"virtual-size": 209715200,
"filename": "nbd+unix://?socket=/tmp/nbd.sock",
"format": "qcow2",
 ...
}

Looking in qemu manual and qapi/block-core.json, I could construct this command:

$ qemu-nbd --persistent --socket=/tmp/nbd.sock --read-only
'json:{"driver": "qcow2", "file": {"driver": "raw", "offset": 512,
"size": 454144, "file": {"driver": "file", "filename": "vm.ova"}}}'

And it works:

$ qemu-img info --output json "nbd+unix://?socket=/tmp/nbd.sock"
{
"virtual-size": 209715200,
"filename": "nbd+unix://?socket=/tmp/nbd.sock",
"format": "raw"
}

$ qemu-img map --output json "nbd+unix://?socket=/tmp/nbd.sock"
[{ "start": 0, "length": 104857600, "depth": 0, "zero": false, "data":
true, "offset": 0},
{ "start": 104857600, "length": 104857600, "depth": 0, "zero": true,
"data": false, "offset": 104857600}]

$ qemu-img map --output json disk1.qcow2
[{ "start": 0, "length": 104857600, "depth": 0, "zero": false, "data": true},
{ "start": 104857600, "length": 104857600, "depth": 0, "zero": true,
"data": false}]

$ qemu-img convert -f raw -O raw nbd+unix://?socket=/tmp/nbd.sock disk1.raw

$ qemu-img info disk1.raw
image: disk1.raw
file format: raw
virtual size: 200 MiB (209715200 bytes)
disk size: 100 MiB

$ qemu-img compare disk1.raw disk1.qcow2
Images are identical.
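
The same stacking should work for the second disk, using the offset and size
reported by tarfile above (a different socket path, e.g. /tmp/nbd2.sock, is
just an example to keep both exports running at the same time):

$ qemu-nbd --persistent --socket=/tmp/nbd2.sock --read-only
'json:{"driver": "qcow2", "file": {"driver": "raw", "offset": 455168,
"size": 454144, "file": {"driver": "file", "filename": "vm.ova"}}}'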

I wonder if this is the best way to stack a qcow2 driver on top of a
raw driver exposing
a range from a tar file.

I found a similar example for gluster in:
docs/system/device-url-syntax.rst.inc

Richard suggested trying the nbdkit tar plugin, but the plugin is not
available on RHEL,
and this adds an additional dependency when we already use qemu-nbd.

Nir




Re: QEMU-img convert question

2019-12-04 Thread Nir Soffer
On Tue, Dec 3, 2019 at 7:27 PM Reuven Alvesson  wrote:
>
> Hi.
>
> I cannot create a VM in Proxmox... I get this message every time when
> trying to create a hard drive:
>
>
> lvcreate 'pve/vm-100-disk-0' error: Aborting. Could not deactivate thin pool 
> pve/data. at /usr/share/perl5/PVE/API2/Qemu.pm line 1314. (500)

This is part of proxmox, not qemu.

>
> Any idea of what this could be ?

You should ask here:
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user

>
> On Mon, Dec 2, 2019 at 8:08 PM Carson, Jay B  wrote:
>>
>> Does running the following command transfer any data to any cloud or 
>> external service provider:
>>
>> $ qemu-img convert -f raw -O qcow2 image.img image.qcow2
>>
>>
>>
>> Does this application work as a completely standalone application?
>>
>>
>>
>> Thank you,
>>
>> Jay
>>
>>




Re: [ovirt-users] Possible sources of cpu steal and countermeasures

2019-12-04 Thread Nir Soffer
On Wed, Dec 4, 2019 at 6:15 PM  wrote:
>
> Hi,
>
> I'm having performance issues with an oVirt installation. It is showing
> high steal (5-10%) for a CPU-intensive VM. The hypervisor however has
> more than 65% of its resources idle while the steal is seen inside
> the VM.
>
> Even when placing only a single VM on a hypervisor it still receives
> steal (0-2%), even though the hypervisor is not overcommitted.
>
>
> Hypervisor:
>
> 2 Socket system in total 2*28(56HT) cores
>
>
> VM:
>
> 30vCPUs (ovirt seems to think its a good idea to make that 15 sockets *
> 2 cores)

I think you can control this in oVirt.

> My questions are:
>
> a) Could it be that the hypervisor is trying to schedule all 30 cores on
> a single numa node, ie using the HT cores instead of "real" ones and
> this shows up as steal?
>
> b) Do I need to make VMs this big numa-aware and spread the vm over both
> numa nodes?
>
> c) Would using the High Performance VM type help in this kind of situation?
>
> d) General advise: how do I reduce steal in an environment where the
> hypervisor has idle resources
>
>
> Any advise would be appreciated.

These questions are mainly about qemu, so adding qemu-discuss.

I think it will help if you share your vm qemu command line, found in:
/var/log/libvirt/qemu/vm-name.log

Nir




Re: qemu-system-x86_64: -accel kvm:tcg: Don't use ':' with -accel, use -M accel=... for now instead

2019-11-25 Thread Nir Soffer
On Mon, Nov 25, 2019 at 10:29 PM Paolo Bonzini  wrote:
>
> Yes, it's a good change. The reason for the deprecation is that -accel with 
> colons worked only because of an implementation detail, and it conflicts with 
> the plans we have to improve -accel. In particular, in the next version of 
> QEMU the -accel option will grow more suboptions specific to KVM or TCG, and 
> that doesn't make sense if more than one accelerator is specified.
>
> Therefore in the next version of QEMU you will be able to say "-accel kvm 
> -accel tcg", each with its own set of suboptions, but that isn't implemented 
> yet. Anyway there's no plan to drop "-M accel=kvm:tcg", it will continue to 
> work and will be transparently converted to "-accel kvm -accel tcg".

Makes sense, thanks!

>
> Thanks,
>
> Paolo
>
> On Mon, Nov 25, 2019 at 9:20 PM Nir Soffer  wrote:
>>
>> On Mon, Nov 25, 2019 at 9:54 PM Nir Soffer  wrote:
>> >
>> > We have automated tests running qemu like this:
>> >
>> > 19:22:55 DEBUG   (MainThread) [qemu] Starting qemu ['qemu-kvm',
>> > '-accel', 'kvm:tcg', '-drive',
>> > 'file=/var/tmp/ovirt-imageio-common/test_full_backup_raw_tcp_0/disk.raw,format=raw',
>> > '-nographic', '-qmp',
>> > 'unix:/var/tmp/ovirt-imageio-common/test_full_backup_raw_tcp_0/qmp.sock,server,nowait',
>> > '-audiodev', 'none,id=1', '-S']
>> >
>> > These tests worked fine for a couple of months and started to fail today
>> > with this error:
>> > qemu-system-x86_64: -accel kvm:tcg: Don't use ':' with -accel, use -M
>> > accel=... for now instead
>> >
>> > Since we use virt-preview repo, we consume now this version:
>> > qemu-4.2.0-0.2.rc2.fc30.src.rpm
>> >
>> > These tests run fine with:
>> > qemu-kvm-4.1.0-5.fc30.x86_64
>> >
>> > Is this a known issue?
>>
>> This patch fixes the issue:
>> https://gerrit.ovirt.org/c/105030/1/common/test/qemu.py
>>
>> Would be nice to get your blessing on this change :-)
>>
>> Nir
>>




Re: qemu-system-x86_64: -accel kvm:tcg: Don't use ':' with -accel, use -M accel=... for now instead

2019-11-25 Thread Nir Soffer
On Mon, Nov 25, 2019 at 9:54 PM Nir Soffer  wrote:
>
> We have automated tests running qemu like this:
>
> 19:22:55 DEBUG   (MainThread) [qemu] Starting qemu ['qemu-kvm',
> '-accel', 'kvm:tcg', '-drive',
> 'file=/var/tmp/ovirt-imageio-common/test_full_backup_raw_tcp_0/disk.raw,format=raw',
> '-nographic', '-qmp',
> 'unix:/var/tmp/ovirt-imageio-common/test_full_backup_raw_tcp_0/qmp.sock,server,nowait',
> '-audiodev', 'none,id=1', '-S']
>
> These tests worked fine for a couple of months and started to fail today
> with this error:
> qemu-system-x86_64: -accel kvm:tcg: Don't use ':' with -accel, use -M
> accel=... for now instead
>
> Since we use virt-preview repo, we consume now this version:
> qemu-4.2.0-0.2.rc2.fc30.src.rpm
>
> These tests run fine with:
> qemu-kvm-4.1.0-5.fc30.x86_64
>
> Is this a known issue?

This patch fixes the issue:
https://gerrit.ovirt.org/c/105030/1/common/test/qemu.py
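
The change is essentially what the error message suggests - replacing
'-accel kvm:tcg' with the -M/-machine form, so the tests now start qemu
with something like (paths shortened here):

qemu-kvm -M accel=kvm:tcg \
    -drive file=disk.raw,format=raw \
    -nographic \
    -qmp unix:/tmp/qmp.sock,server,nowait \
    -audiodev none,id=1 \
    -S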

Would be nice to get your blessing on this change :-)

Nir




qemu-system-x86_64: -accel kvm:tcg: Don't use ':' with -accel, use -M accel=... for now instead

2019-11-25 Thread Nir Soffer
We have automated tests running qemu like this:

19:22:55 DEBUG   (MainThread) [qemu] Starting qemu ['qemu-kvm',
'-accel', 'kvm:tcg', '-drive',
'file=/var/tmp/ovirt-imageio-common/test_full_backup_raw_tcp_0/disk.raw,format=raw',
'-nographic', '-qmp',
'unix:/var/tmp/ovirt-imageio-common/test_full_backup_raw_tcp_0/qmp.sock,server,nowait',
'-audiodev', 'none,id=1', '-S']

These tests worked fine for a couple of months and started to fail today
with this error:
qemu-system-x86_64: -accel kvm:tcg: Don't use ':' with -accel, use -M
accel=... for now instead

Since we use virt-preview repo, we consume now this version:
qemu-4.2.0-0.2.rc2.fc30.src.rpm

These tests run fine with:
qemu-kvm-4.1.0-5.fc30.x86_64

Is this a known issue?

Nir




Re: [Qemu-discuss] [ovirt-users] VM snapshot - some info needed

2019-07-07 Thread Nir Soffer
On Wed, Jul 3, 2019 at 8:11 PM Strahil Nikolov 
wrote:

> I have noticed that if I want a VM snapshot with memory - I get a warning "The
> VM will be paused while saving the memory" .
> Is there a way to make snapshot without pausing the VM for the whole
> duration ?
>

No, this is expected.

Denis was working on background snapshot which will remove this limitation:
https://www.youtube.com/watch?v=fKj4j8lw8pU

Nir


> I have noticed that it doesn't matter if VM has qemu-guest-agent or
> ovirt-guest-agent.
>
> Thanks in advance for sharing your thoughts.
>
> Best Regards,
> Strahil  Nikolov
> ___
> Users mailing list -- us...@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/us...@ovirt.org/message/7VFMZRGF57WXTFTZ44W55MDX6BJES5TX/
>


Re: [Qemu-discuss] [ovirt-devel] ovirt and AMD virtualization bug

2019-03-08 Thread Nir Soffer
On Sat, Mar 9, 2019 at 1:24 AM Hetz Ben Hamo  wrote:

> Hi,
>
> I've done some research on a bug which I found on the AMD Zen/Zen+ based
> CPU's. Apparently, when using nested virtualization, the CPU doesn't expose
> the "monitor" CPU flag.
>
> While other virtualization solutions (Xen, ESXi, Hyper-V) don't care about
> this issue and let you run with nested without any problem, oVirt doesn't
> let you launch or create any VM and it complains that this "monitor" CPU
> flag is missing.
>
> Since QEMU/KVM cannot do anything about it, I was wondering if the ovirt
> dev team could ignore this flag please.
>

Adding some people and relevant mailing lists.

Nir


Re: [Qemu-discuss] [ovirt-devel] Query regarding copying and executing files inside guest OS

2019-02-21 Thread Nir Soffer
On Thu, Feb 21, 2019 at 2:42 PM Pravin Amin  wrote:

> Hi,
>
>
>
> We are looking for a way to copy files and execute scripts inside Virtual
> Machines hosted on oVirt manager (Any supported guest OS) either through
> REST API or using Guest Tools. Please let me know how we can achieve it.
>

I'm not an expert in this area, but I think you have 2 options:
- ssh - we have integration in engine, not sure how it can be used
- qemu-guest-agent - I think it supports running commands inside the guest,
but I don't know
  any details. We integrate with it, but I don't know if we support such
options.

I hope that Ryan can help with this.

For qemu-guest-agent, the best place to ask is probably qemu-discuss, added.
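
For example, with the guest agent running in the guest, I believe something
like this works through libvirt ("vm-name" is the libvirt domain name, and
the pid passed to guest-exec-status comes from the output of the first
command):

$ virsh qemu-agent-command vm-name \
    '{"execute": "guest-exec", "arguments": {"path": "/bin/ls", "arg": ["-l", "/"], "capture-output": true}}'

$ virsh qemu-agent-command vm-name \
    '{"execute": "guest-exec-status", "arguments": {"pid": 1234}}'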

Nir



>
>
> -Thanks,
>
> *Pravin Amin*
>
> Principal SQA Engineer
>
> Netbackup
>
> Veritas Technologies LLC
>
>
>
> Office: (91) 20 - 66157288 /  Mobile: (91) 9881139050
>
> pravin.a...@veritas.com
>
>
>
>
>
>
> This message (including any attachments) is intended only for the use of
> the individual or entity to which it is addressed and may contain
> information that is non-public, proprietary, privileged, confidential, and
> exempt from disclosure under applicable law or may constitute as attorney
> work product. If you are not the intended recipient, you are hereby
> notified that any use, dissemination, distribution, or copying of this
> communication is strictly prohibited. If you have received this
> communication in error, notify us immediately by telephone and (i) destroy
> this message if a facsimile or (ii) delete this message immediately if this
> is an electronic communication.
> ___
> Devel mailing list -- de...@ovirt.org
> To unsubscribe send an email to devel-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/de...@ovirt.org/message/PE7G5WXBGIN5I2O2JHLZZ2A2VTWFWQDN/
>


Re: [Qemu-discuss] [ovirt-users] VirtIO in new upgraded 4.2.5 for FreeBSD is very poor (could not use)

2018-08-07 Thread Nir Soffer
On Sun, Aug 5, 2018 at 1:40 PM Paul.LKW  wrote:

> Dear All:
> I just upgraded my oVirt 4.2.4 to 4.2.5 and the nightmare began: FreeBSD VMs
> hang once the VM's VirtIO hard disk is under some load (e.g. extracting a
> big tar ball, portsnap extract, etc.). I tried to create a VM using
> IDE instead and it does not have this problem, but I got slow IO performance;
> with VirtIO it will hang whenever it likes.
> I also tried to create another VM with VirtIO-SCSI and the hang
> problem does not seem to occur, but the oVirt host load goes over 40 when the
> VM's hard disk has even a little work to do, so the conclusion is: if you will
> be using FreeBSD as a guest, please do not upgrade at the moment.
> In fact I feel that every new version always has problems and
> causes users nightmares.
>

I don't think oVirt is the root cause; it may be some lower level component
that was
upgraded at the same time, or something in the guest.

Does this happen only with oVirt 4.2.5 node and engine, or is upgrading the
engine enough?
Are you running same version of libvirt, qemu, and kernel as in 4.2.4?
Are you running same version of FreeBSD?
Can you reproduce the same issue only with qemu? virsh? virt-manager?

Nir


Re: [Qemu-discuss] [kubevirt-dev] Re: Converting qcow2 image on the fly to raw format

2018-07-19 Thread Nir Soffer
On Thu, Jul 19, 2018 at 10:27 PM Richard W.M. Jones 
wrote:

>
> This is the code in libguestfs if you prefer something in C:
>
>
> https://github.com/libguestfs/libguestfs/blob/3a87c4bb441430c9cef9223e67d10bf51a4e865c/lib/info.c#L150-L160


I see that libguestfs is using a 10 second limit:
https://github.com/libguestfs/libguestfs/blob/3a87c4bb441430c9cef9223e67d10bf51a4e865c/lib/info.c#L218


Re: [Qemu-discuss] [kubevirt-dev] Re: Converting qcow2 image on the fly to raw format

2018-07-19 Thread Nir Soffer
On Thu, Jul 19, 2018 at 10:24 PM Richard W.M. Jones 
wrote:

> On Thu, Jul 19, 2018 at 09:50:00PM +0300, Nir Soffer wrote:
> > On Mon, Jul 16, 2018 at 11:56 AM Daniel P. Berrangé  >
> > wrote:
> > ...
> >
> > > Recommendation is to run 'qemu-img info' to extract the metadata and
> sanity
> > > check results eg no backing file list, not unreasonable size, etc. When
> > > running 'qemu-img info' apply process limits of 30 secs CPU time, and
> 1 GB
> > > address space.
> > >
> >
> > Can you explain the values of CPU seconds and address space?
> >
> > Also, I tried to use prlimit --rss=N, and the value seems to be ignored.
> Is
> > this
> > a bug in prlimit?
>
> This is the code in OpenStack:
>
>
> https://github.com/openstack/nova/blob/50f40854b04351fb622fd8b68b374a8fe8ca2070/nova/virt/images.py#L74
>
>
> https://github.com/openstack/oslo.concurrency/blob/a9d728b71e47540fd248a6bc2d301fdfa9a988ce/oslo_concurrency/processutils.py#L194
> (see prlimit)
>

Thanks for the links.

The 30 second cpu_time limit confuses me; it was added in:
https://github.com/openstack/nova/commit/011ae614d5c5fb35b2e9c22a9c4c99158f6aee20

The patch references this bug:
https://bugs.launchpad.net/nova/+bug/1705340

The bug shows that qemu-img info took more than 8 seconds of real time:

/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:355
2017-07-19 19:47:42.849 7 DEBUG oslo_concurrency.processutils
[req-7ed3314d-1c11-4dd8-b612-f8d9c022417f
ff236d57a57dd42cb5811c998e30fca1a76233873b9f08330f725fb639c8b025
9776d48734a24c23a4aef51cb78cc269 - - -] CMD "/usr/bin/python2 -m
oslo_concurrency.prlimit --as=1073741824 --cpu=16 -- env LC_ALL=C
LANG=C qemu-img info
/var/lib/nova/instances/_base/41ebff725eab55d368f97bc79ddeea47df894145.part"
returned: 0 in 8.639s execute

How can the size of the snapshot slow down reading the qcow2 header?
Is this relevant for qemu-img-rhev 2.10?

Nir


Re: [Qemu-discuss] [kubevirt-dev] Re: Converting qcow2 image on the fly to raw format

2018-07-19 Thread Nir Soffer
On Mon, Jul 16, 2018 at 11:56 AM Daniel P. Berrangé 
wrote:
...

> Recommendation is to run 'qemu-img info' to extract the metadata and sanity
> check results eg no backing file list, not unreasonable size, etc. When
> running 'qemu-img info' apply process limits of 30 secs CPU time, and 1 GB
> address space.
>

Can you explain the values of CPU seconds and address space?

Also, I tried to use prlimit --rss=N, and the value seems to be ignored. Is
this a bug in prlimit?
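
For reference, the recommended limits look like this with util-linux prlimit
(image.qcow2 is just a placeholder):

$ prlimit --cpu=30 --as=1073741824 qemu-img info --output json image.qcow2

If I read setrlimit(2) correctly, RLIMIT_RSS is not enforced by modern Linux
kernels, which would explain why --rss=N seems to have no effect, so limiting
the address space (--as) is probably the limit that actually matters.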

Nir


Re: [Qemu-discuss] [kubevirt-dev] Re: Converting qcow2 image on the fly to raw format

2018-07-19 Thread Nir Soffer
On Mon, Jul 16, 2018 at 11:56 AM Daniel P. Berrangé 
wrote:

> On Wed, Jul 11, 2018 at 02:17:18PM +0300, Adam Litke wrote:
> > Adding some kubevirt developers to the thread.  Thanks guys for the
> > information!  I think this could work perfectly for on the fly conversion
> > of qcow2 images to raw format on our PVCs.
>
> FYI if you are intending to accept qcow2 images from untrustworthy sources
> you must take special care to validate the image in a confined environment.
> It is possible to construct malicious images that inflict a denial of
> service attack on CPU or memory or both, even when merely opening the image
> to query its metadata. This has been reported as a CVE against OpenStack
> in the past:
>
>   https://bugs.launchpad.net/ossa/+bug/1449062
>
> Recommendation is to run 'qemu-img info' to extract the metadata and sanity
> check results eg no backing file list, not unreasonable size, etc. When
> running 'qemu-img info' apply process limits of 30 secs CPU time, and 1 GB
> address space.
>

Thanks for the suggestion.

We currently do not limit the qemu-img process in any way, but it sounds
like
a good idea.

We also don't verify the size of the image; this should be fixed.

What we do currently is:
1. Mark image as illegal in oVirt metadata - prevents using the image by
oVirt.
2. Expose the image via http
3. Wait until the user completes the upload
4. Unexpose the image, so no more data can be written.
5. Run qemu-img info /path/to/image (running as vdsm, but without any limit)
6. Verify the format with oVirt metadata - it must be the same as specified in oVirt
7. Verify the backing file with oVirt metadata - it must be the same as specified
in oVirt
(no backing file or volume UUID)
8. Verify that qcow2 compat is compatible with the storage domain
9. If all checks are ok, mark the image as legal.

The image is deleted on verification failure.

This is the code if someone would like to check:
-
https://github.com/oVirt/vdsm/blob/857df825f8c1e9030f7c1e46e6c59cb63546d7c9/lib/vdsm/storage/hsm.py#L1535
-
https://github.com/oVirt/vdsm/blob/857df825f8c1e9030f7c1e46e6c59cb63546d7c9/tests/storage/hsm_test.py#L49

Nir


Re: [Qemu-discuss] Converting qcow2 image on the fly to raw format

2018-07-09 Thread Nir Soffer
On Mon, Jul 9, 2018 at 8:06 PM Richard W.M. Jones  wrote:

> On Mon, Jul 09, 2018 at 07:02:50PM +0200, Kevin Wolf wrote:
> > On 09.07.2018 at 18:52, Richard W.M. Jones wrote:
> > > On Mon, Jul 09, 2018 at 07:38:05PM +0300, Nir Soffer wrote:
> > > > We are discussing importing VM images to KubeVirt. The goal is to be
> > > > able to import an existing qcow2 disk, probably some appliance stored
> > > > on an http server, and convert it to raw format for writing to
> storage.
> > > >
> > > > This can also be useful for oVirt for importing OVA, since we
> like to
> > > > pack
> > > > disks in qcow2 format inside OVA, but the user may like to use raw
> disks, or
> > > > for uploading existing disks.
> > > >
> > > > Of course converting the image using qemu-img is easy, but requires
> > > > downloading the image to temporary disk. We would like to avoid
> temporary
> > > > disks, or telling users to convert the image.
> > > >
> > > > Based on the discussion we had here:
> > > >
> https://lists.ovirt.org/archives/list/us...@ovirt.org/thread/GNAVJ253FP65QUSOONES5XZGRIDX5ABC/#YMLSEGU7PN3MX5MUORGEGGAQLLSL4KKJ
> > > >
> > > > I think this is impossible since qcow2 is not built for streaming.
> But both
> > > > Richard and Eric suggested some solutions.
> > > >
> > > > The flow is:
> > > >
> > > > qcow2 image -- http --> importer -> raw file
> > > >
> > > > Is it possible to implement the importer using qemu-img and qemu-nbd,
> > > > or maybe nbdkit?
> > >
> > > Strictly speaking streaming qcow2 to raw is not possible.  However
> > > placing an overlay on top of the original remote image will allow
> > > streaming to raw with only a modest amount of local storage consumed.
> > >
> > > You can demonstrate this fairly easily:
> > >
> > > $ qemu-img create -f qcow2 -b 'json: { "file.driver": "https",
> "file.url": "
> https://uk-mirrors.evowise.com/fedora/releases/28/Cloud/x86_64/images/Fedora-Cloud-Base-28-1.1.x86_64.qcow2";,
> "file.timeout": 1 }' /var/tmp/overlay.qcow2
> > > $ qemu-img convert -p -f qcow2 -O raw overlay.qcow2 fedora.img
> >
> > This overlay stays empty, so it's pretty pointless and you could just
> > directly point 'qemu-img convert' to https and the real image.
>
> Right, indeed.  I was copying what virt-v2v does without thinking
> about it enough.  virt-v2v needs the overlay because it actually wants
> to write into it, and it does copy-on-read for the first phase (not
> the final ‘qemu-img convert’).
>

Thanks, I just tested the simple:

qemu-img convert -p -f qcow2 -O raw http://localhost/orig.qcow2
converted.raw

And it just works :-)

I got timeouts trying to download from
https://download.fedoraproject.org/pub/alt/atomic/stable/Fedora-Atomic-28-20180625.1/AtomicHost/x86_64/images/Fedora-AtomicHost-28-20180625.1.x86_64.qcow2

I guess we need to use
'json: { "file.driver": "http", "file.url": "url...", "file.timeout": 1
}'
to change the timeout? Where are these and other options documented?
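
For example, something like this seems to be the equivalent for convert
(the timeout value is just an example; I think it is in seconds):

$ qemu-img convert -p -f qcow2 -O raw \
    'json: {"file.driver": "http", "file.url": "http://local.server/Fedora-AtomicHost-28-20180625.1.x86_64.qcow2", "file.timeout": 60}' \
    converted.raw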

I did also some timings, using sever on local network with 1g nic.

$ time wget http://local.server/Fedora-AtomicHost-28-20180625.1.x86_64.qcow2

...
Length: 638043136 (608M) [application/octet-stream]
Saving to: ‘Fedora-AtomicHost-28-20180625.1.x86_64.qcow2’

Fedora-AtomicHost-28-20180625.1.x86_64.qcow2
100%[=>]
608.49M   107MB/s    in 5.6s

2018-07-09 21:38:39 (108 MB/s) -
‘Fedora-AtomicHost-28-20180625.1.x86_64.qcow2’ saved [638043136/638043136]


real 0m5.941s
user 0m0.183s
sys 0m1.185s

$ time qemu-img convert -p -f qcow2 -O raw
http://local.server/Fedora-AtomicHost-28-20180625.1.x86_64.qcow2
converted.raw
(100.00/100%)

real 0m14.217s
user 0m5.235s
sys 0m2.343s

$ time qemu-img convert -p -f qcow2 -O raw
Fedora-AtomicHost-28-20180625.1.x86_64.qcow2 converted.raw
(100.00/100%)

real 0m11.909s
user 0m4.728s
sys 0m1.595s

So converting on the fly is even a little faster than downloading to a
temporary file and converting.

Nir


[Qemu-discuss] Converting qcow2 image on the fly to raw format

2018-07-09 Thread Nir Soffer
We are discussing importing VM images to KubeVirt. The goal is to be
able to import an existing qcow2 disk, probably some appliance stored
on an http server, and convert it to raw format for writing to storage.

This can also be useful for oVirt for importing OVA, since we like to
pack
disks in qcow2 format inside OVA, but the user may like to use raw disks, or
for uploading existing disks.

Of course converting the image using qemu-img is easy, but requires
downloading the image to temporary disk. We would like to avoid temporary
disks, or telling users to convert the image.

Based on the discussion we had here:
https://lists.ovirt.org/archives/list/us...@ovirt.org/thread/GNAVJ253FP65QUSOONES5XZGRIDX5ABC/#YMLSEGU7PN3MX5MUORGEGGAQLLSL4KKJ

I think this is impossible since qcow2 is not built for streaming. But both
Richard and Eric suggested some solutions.

The flow is:

qcow2 image -- http --> importer -> raw file

Is it possible to implement the importer using qemu-img and qemu-nbd,
or maybe nbdkit?

Nir


Re: [Qemu-discuss] qemu-img convert stuck

2018-04-08 Thread Nir Soffer
On Sun, Apr 8, 2018 at 9:27 PM Benny Zlotnik  wrote:

> Hi,
>
> A copy operation initiated by RHEV got stuck for more than a day
> and consumes plenty of CPU
> vdsm 13024  3117 99 Apr07 ?1-06:58:43 /usr/bin/qemu-img convert
> -p -t none -T none -f qcow2
>
> /rhev/data-center/bb422fac-81c5-4fea-8782-3498bb5c8a59/26989331-2c39-4b34-a7ed-d7dd7703646c/images/597e12b6-19f5-45bd-868f-767600c7115e/62a5492e-e120-4c25-898e-9f5f5629853e
> -O raw /rhev/data-center/mnt/mantis-nfs-lif1.lab.eng.tlv2.redhat.com:
>
> _vol__service/26989331-2c39-4b34-a7ed-d7dd7703646c/images/9ece9408-9ca6-48cd-992a-6f590c710672/06d6d3c0-beb8-4b6b-ab00-56523df185da
>
> The target image appears to have no data yet:
> qemu-img info 06d6d3c0-beb8-4b6b-ab00-56523df185da"
> image: 06d6d3c0-beb8-4b6b-ab00-56523df185da
> file format: raw
> virtual size: 120G (128849018880 bytes)
> disk size: 0
>
> strace -p 13024 -tt -T -f shows only:
> ...
> 21:13:01.309382 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0},
> NULL, 8) = 0 (Timeout) <0.10>
> 21:13:01.309411 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0},
> NULL, 8) = 0 (Timeout) <0.09>
> 21:13:01.309440 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0},
> NULL, 8) = 0 (Timeout) <0.09>
> 21:13:01.309468 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0},
> NULL, 8) = 0 (Timeout) <0.10>
>
> version: qemu-img-rhev-2.9.0-16.el7_4.13.x86_64
>
> What could cause this? I'll provide any additional information needed
>

A backtrace may help, try:

gdb -p 13024 -batch -ex "thread apply all bt"

Also adding Kevin and qemu-block.

Nir


Re: [Qemu-discuss] [Qemu-devel] Supporting unsafe create when backing file is not accessible

2017-07-14 Thread Nir Soffer
On Wed, Jul 12, 2017 at 5:40 PM Eric Blake  wrote:

> On 07/12/2017 03:56 AM, Ala Hino wrote:
> > Hi,
> >
> > We encountered a performance issue when creating a volume for a running
> VM
> > and we'd like to share the info with you. The root cause of the issue is
> in
> > our code but we found a workaround that relies on qemu-img create
> > undocumented behavior.
> >
> > During our tests, we found that in order to create a volume with a
> backing
> > file, the backing file has to be valid and accessible.
>
> In general, that's a good thing. But you're also right that it is nice
> to have a mode where the backing file is not probed.
>
> > This requires us to
> > activate the entire chain before creating the volume, and deactivate the
> > chain after the operation completes. Activating and deactivating the
> chain
> > are expensive operations that we prefer to avoid when possible. Below is
> > the command we use to create volumes:
> >
> > qemu-img create -f qcow2 -o compat=1.1 -b
> > 8a28cda2-548d-47da-bbba-faa81284f6ba -F raw
> >
> /rhev/data-center/e6b272af-84cb-43fc-ae5b-7fe18bc469a2/f47c980a-fd76-44a9-8b78-00d3ab2ffd2f/images/2ff0a3c0-f145-4f83-b668-fc0c39ba191f/d3b69657-892f-499c-9ac3-9c443ead7d9b
> > 1073741824
> >
> > We also found that when providing the size and the backing format for
> qemu,
> > qemu doesn't open the backing chain, and in this case we don't have to
> > activate/deactivate the entire chain - exactly the behavior we wish to
> have.
>
> Yes, that is currently the case. You are correct that patches have been
> proposed to tighten things so that we would probe the existence of the
> backing file in more cases (even when the size is provided); and that if
> we do so, we'd also have to provide a backdoor for creating an image
> without probing the backing file.  But it is also the case that you can
> create an image with NO backing file, and then use 'qemu-img rebase -u'
> to add the backing file after the fact, without waiting for any proposed
> patches to land.
>
> > We'd like to get your confirmation of the above behavior as it isn't
> > documented, and whether it can be documented.
>
> The fact that you are asking does mean that we should revive John's
> proposed patches, in some form or another.
>

Eric, we are more concerned about using the current qemu version.

We can use the fact that when both size and backing format are provided,
qemu does not open the backing file, but this is not documented, and
we don't want to base oVirt code on undocumented behavior.

What we would like to have is:
- qemu's blessing for using this undocumented behavior
- and documenting this behavior in qemu-img(1)

With this we can fix https://bugzilla.redhat.com/1395941

now, instead of waiting for qemu 2.10.

For future versions, having an explicit way to allow unsafe create
is of course better.
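
For completeness, the workaround Eric suggests above would look something
like this for the command we use today (paths shortened to the volume names):

$ qemu-img create -f qcow2 -o compat=1.1 d3b69657-892f-499c-9ac3-9c443ead7d9b 1073741824
$ qemu-img rebase -u -f qcow2 -F raw -b 8a28cda2-548d-47da-bbba-faa81284f6ba \
    d3b69657-892f-499c-9ac3-9c443ead7d9b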

Nir

>
> > In addition, we are aware of https://bugzilla.redhat.com/1213786, where
> a
> > -u (unsafe) option is planned to be added (see comment #4 in the BZ). Can
> > you please confirm that once that support is released, it won't break
> > existing, i.e. code that provides size and backing format assuming that
> > "unsafe" create is supported?
>
> If we tighten things to require the backing file to exist unless -u is
> provided, then providing the size alone will no longer be sufficient to
> prevent the probe - you'd have to use -u to prevent the probe, or change
> your workflow to create the image without a backing file then add in the
> backing information via 'qemu-img rebase -u'.
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266 <(919)%20301-3266>
> Virtualization:  qemu.org | libvirt.org
>
>


Re: [Qemu-discuss] [Qemu-devel] Estimation of qcow2 image size converted from raw image

2017-02-15 Thread Nir Soffer
On Wed, Feb 15, 2017 at 5:20 PM, Daniel P. Berrange  wrote:
> On Wed, Feb 15, 2017 at 03:14:19PM +, Stefan Hajnoczi wrote:
>> On Mon, Feb 13, 2017 at 05:46:19PM +0200, Maor Lipchuk wrote:
>> > I was wondering if that is possible to provide a new API that
>> > estimates the size of
>> > qcow2 image converted from a raw image. We could use this new API to
>> > allocate the
>> > size more precisely before the convert operation.
>> >
>> [...]
>> > We think that the best way to solve this issue is to return this info
>> > from qemu-img, maybe as a flag to qemu-img convert that will
>> > calculate the size of the converted image without doing any writes.
>>
>> Sounds reasonable.  qcow2 actually already does some of this calculation
>> internally for image preallocation in qcow2_create2().
>>
>> Let's try this syntax:
>>
>>   $ qemu-img query-max-size -f raw -O qcow2 input.raw
>>   1234678000
>>
>> As John explained, it is only an estimate.  But it will be a
>> conservative maximum.
>
> This forces you to have an input file. It would be nice to be able to
> get the same information by merely giving the desired capacity e.g
>
>   $ qemu-img query-max-size -O qcow2 20G

Without a file, this will have to assume that all clusters will be allocated.

Do you have a use case for not using existing file?

For oVirt we need this when converting a file from one storage to another;
the capabilities of the storage matter in both cases.
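
That said, the fully allocated worst case is easy to bound by hand, assuming
the qcow2 defaults (64 KiB clusters, 8-byte L2 entries, 16-bit refcount
entries); for a 20G image:

$ echo $(( 20 * 1024 * 1024 * 1024 / 65536 * 8 ))    # L2 tables, bytes
2621440
$ echo $(( 20 * 1024 * 1024 * 1024 / 65536 * 2 ))    # refcount blocks, bytes
655360

so roughly virtual size plus a few MiB of metadata (ignoring the header, L1
and refcount table, which are tiny). The harder and more interesting estimate
is the one based on the actual allocation in the source file.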

(Adding all)

Nir



Re: [Qemu-discuss] [Qemu-devel] Estimation of qcow2 image size converted from raw image

2017-02-15 Thread Nir Soffer
On Wed, Feb 15, 2017 at 5:14 PM, Stefan Hajnoczi  wrote:
> On Mon, Feb 13, 2017 at 05:46:19PM +0200, Maor Lipchuk wrote:
>> I was wondering if that is possible to provide a new API that
>> estimates the size of
>> qcow2 image converted from a raw image. We could use this new API to
>> allocate the
>> size more precisely before the convert operation.
>>
> [...]
>> We think that the best way to solve this issue is to return this info
>> from qemu-img, maybe as a flag to qemu-img convert that will
>> calculate the size of the converted image without doing any writes.
>
> Sounds reasonable.  qcow2 actually already does some of this calculation
> internally for image preallocation in qcow2_create2().
>
> Let's try this syntax:
>
>   $ qemu-img query-max-size -f raw -O qcow2 input.raw
>   1234678000

This is a little bit verbose compared to other commands
(e.g. info, check, convert)

Since this is needed only during convert, maybe this can be
a convert flag?

qemu-img convert -f xxx -O yyy src dst --estimate-size --output json
{
"estimated size": 1234678000
}

> As John explained, it is only an estimate.  But it will be a
> conservative maximum.
>
> Internally BlockDriver needs a new interface:
>
> struct BlockDriver {
> /*
>  * Return a conservative estimate of the maximum host file size
>  * required by a new image given an existing BlockDriverState (not
>  * necessarily opened with this BlockDriver).
>  */
> uint64_t (*bdrv_query_max_size)(BlockDriverState *other_bs,
> Error **errp);
> };
>
> This interface allows individual block drivers to probe other_bs in
> whatever way necessary (e.g. querying block allocation status).
>
> Since this is a conservative max estimate there's no need to read all
> data to check for zero regions.  We should give the best estimate that
> can be generated quickly.

I think we need to check allocation (e.g. with SEEK_DATA); I hope this
is what you mean by not reading all the data.
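
For a raw source this should be cheap - as far as I understand, qemu-img map
already uses SEEK_DATA/SEEK_HOLE internally for regular files, so something
like this reports the allocated ranges without reading any data (input.raw is
a placeholder):

$ qemu-img map --output json input.raw

and the estimate only needs to sum the "length" of the entries reported with
"data": true, plus the qcow2 metadata overhead.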

Nir



Re: [Qemu-discuss] Converting qcow2 image to raw thin lv

2017-02-11 Thread Nir Soffer
On Sat, Feb 11, 2017 at 12:23 AM, Nir Soffer  wrote:
> Hi all,
>
> I'm trying to convert images (mostly qcow2) to raw format on thin lv,
> hoping to write only the allocated blocks on the thin lv, but
> it seems that qemu-img cannot write a sparse image to a block
> device.
>
> Here is an example:
>
> Create a new thin lv:
>
> # lvcreate --name raw-test --virtualsize 20g --thinpool pool0 ovirt-local
>   Using default stripesize 64.00 KiB.
>   Logical volume "raw-test" created.
>
> [root@voodoo6 ~]# lvs ovirt-local
>   LV   VG  Attr   LSize
> Pool  Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   029060ab-41ef-4dfd-9a3e-4c716c01db06 ovirt-local Vwi-a-tz-- 20.00g
> pool06.74
>   4f207ee8-bb47-465a-9b68-cb778e070861 ovirt-local Vwi-a-tz-- 20.00g
> pool00.00
>   7aed605e-c74c-40d8-b449-8a1bf7228b8b ovirt-local Vwi-a-tz-- 20.00g
> pool06.98
>   ce6d08d3-350f-4afa-a0e7-7b492a1a7744 ovirt-local Vwi-a-tz-- 20.00g
> pool06.87
>   pool0ovirt-local twi-aotz-- 40.00g
>10.30  5.49
>   raw-test ovirt-local Vwi-a-tz-- 20.00g
> pool00.00
>
> I want to convert this image (fresh fedora 25 installation):
>
> # qemu-img info fedora.qcow2
> image: fedora.qcow2
> file format: qcow2
> virtual size: 20G (21474836480 bytes)
> disk size: 1.3G
> cluster_size: 65536
> Format specific information:
> compat: 1.1
> lazy refcounts: false
> refcount bits: 16
> corrupt: false
>
> Convert the image to raw, into the new thin lv:
>
> # qemu-img convert -p -f qcow2 -O raw -t none -T none fedora.qcow2
> /dev/ovirt-local/raw-test
> (100.00/100%)
>
> The image size was 1.3G, but now the thin lv is fully allocated:
>
> # lvs ovirt-local
>   LV   VG  Attr   LSize
> Pool  Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   029060ab-41ef-4dfd-9a3e-4c716c01db06 ovirt-local Vwi-a-tz-- 20.00g
> pool06.74
>   4f207ee8-bb47-465a-9b68-cb778e070861 ovirt-local Vwi-a-tz-- 20.00g
> pool00.00
>   7aed605e-c74c-40d8-b449-8a1bf7228b8b ovirt-local Vwi-a-tz-- 20.00g
> pool06.98
>   ce6d08d3-350f-4afa-a0e7-7b492a1a7744 ovirt-local Vwi-a-tz-- 20.00g
> pool06.87
>   pool0ovirt-local twi-aotz-- 40.00g
>60.30  29.72
>   raw-test ovirt-local Vwi-a-tz-- 20.00g
> pool0100.00
>
> Recreate the lv:
>
> # lvremove -f ovirt-local/raw-test
>   Logical volume "raw-test" successfully removed
>
> # lvcreate --name raw-test --virtualsize 20g --thinpool pool0 ovirt-local
>   Using default stripesize 64.00 KiB.
>   Logical volume "raw-test" created.
>
> Convert the qcow image to a raw sparse file:
>
> # qemu-img convert -p -f qcow2 -O raw -t none -T none fedora.qcow2 fedora.raw
> (100.00/100%)
>
> # qemu-img info fedora.raw
> image: fedora.raw
> file format: raw
> virtual size: 20G (21474836480 bytes)
> disk size: 1.3G
>
> Write the sparse file to the thin lv:
>
> # dd if=fedora.raw of=/dev/ovirt-local/raw-test bs=8M conv=sparse
> 2560+0 records in
> 2560+0 records out
> 21474836480 bytes (21 GB) copied, 39.0065 s, 551 MB/s
>
> Now we are using only 7.19% of the lv:
>
> # lvs ovirt-local
>   LV   VG  Attr   LSize
> Pool  Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   029060ab-41ef-4dfd-9a3e-4c716c01db06 ovirt-local Vwi-a-tz-- 20.00g
> pool06.74
>   4f207ee8-bb47-465a-9b68-cb778e070861 ovirt-local Vwi-a-tz-- 20.00g
> pool00.00
>   7aed605e-c74c-40d8-b449-8a1bf7228b8b ovirt-local Vwi-a-tz-- 20.00g
> pool06.98
>   ce6d08d3-350f-4afa-a0e7-7b492a1a7744 ovirt-local Vwi-a-tz-- 20.00g
> pool06.87
>   pool0ovirt-local twi-aotz-- 40.00g
>13.89  7.17
>   raw-test ovirt-local Vwi-a-tz-- 20.00g
> pool07.19
>
> This works, but it would be nicer to have a way to convert
> to sparse raw on a block device in one pass.

So it seems that qemu-img is trying to write a sparse image.

I tested again with empty file:

truncate -s 20m empty

Using strace, qemu-img checks the device discard_zeroes_data:

ioctl(11, BLKDISCARDZEROES, 0)  = 0

Then it finds that the source is empty:

lseek(10, 0, SEEK_DATA) = -1 ENXIO (No such device
or address)

Then it issues one call:

[pid 10041] ioctl(11, BLKZEROOUT, 0x7f6049c82ba0) = 0

Then it fsyncs and closes the destination.

# grep -s "" /sys/block/dm-57/queue/disca

[Qemu-discuss] Converting qcow2 image to raw thin lv

2017-02-10 Thread Nir Soffer
Hi all,

I'm trying to convert images (mostly qcow2) to raw format on thin lv,
hoping to write only the allocated blocks on the thin lv, but
it seems that qemu-img cannot write a sparse image to a block
device.

Here is an example:

Create a new thin lv:

# lvcreate --name raw-test --virtualsize 20g --thinpool pool0 ovirt-local
  Using default stripesize 64.00 KiB.
  Logical volume "raw-test" created.

[root@voodoo6 ~]# lvs ovirt-local
  LV   VG  Attr   LSize
Pool  Origin Data%  Meta%  Move Log Cpy%Sync Convert
  029060ab-41ef-4dfd-9a3e-4c716c01db06 ovirt-local Vwi-a-tz-- 20.00g
pool06.74
  4f207ee8-bb47-465a-9b68-cb778e070861 ovirt-local Vwi-a-tz-- 20.00g
pool00.00
  7aed605e-c74c-40d8-b449-8a1bf7228b8b ovirt-local Vwi-a-tz-- 20.00g
pool06.98
  ce6d08d3-350f-4afa-a0e7-7b492a1a7744 ovirt-local Vwi-a-tz-- 20.00g
pool06.87
  pool0ovirt-local twi-aotz-- 40.00g
   10.30  5.49
  raw-test ovirt-local Vwi-a-tz-- 20.00g
pool00.00

I want to convert this image (fresh fedora 25 installation):

# qemu-img info fedora.qcow2
image: fedora.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 1.3G
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false

Convert the image to raw, into the new thin lv:

# qemu-img convert -p -f qcow2 -O raw -t none -T none fedora.qcow2
/dev/ovirt-local/raw-test
(100.00/100%)

The image size was 1.3G, but now the thin lv is fully allocated:

# lvs ovirt-local
  LV   VG  Attr   LSize
Pool  Origin Data%  Meta%  Move Log Cpy%Sync Convert
  029060ab-41ef-4dfd-9a3e-4c716c01db06 ovirt-local Vwi-a-tz-- 20.00g
pool06.74
  4f207ee8-bb47-465a-9b68-cb778e070861 ovirt-local Vwi-a-tz-- 20.00g
pool00.00
  7aed605e-c74c-40d8-b449-8a1bf7228b8b ovirt-local Vwi-a-tz-- 20.00g
pool06.98
  ce6d08d3-350f-4afa-a0e7-7b492a1a7744 ovirt-local Vwi-a-tz-- 20.00g
pool06.87
  pool0ovirt-local twi-aotz-- 40.00g
   60.30  29.72
  raw-test ovirt-local Vwi-a-tz-- 20.00g
pool0100.00

Recreate the lv:

# lvremove -f ovirt-local/raw-test
  Logical volume "raw-test" successfully removed

# lvcreate --name raw-test --virtualsize 20g --thinpool pool0 ovirt-local
  Using default stripesize 64.00 KiB.
  Logical volume "raw-test" created.

Convert the qcow image to a raw sparse file:

# qemu-img convert -p -f qcow2 -O raw -t none -T none fedora.qcow2 fedora.raw
(100.00/100%)

# qemu-img info fedora.raw
image: fedora.raw
file format: raw
virtual size: 20G (21474836480 bytes)
disk size: 1.3G

Write the sparse file to the thin lv:

# dd if=fedora.raw of=/dev/ovirt-local/raw-test bs=8M conv=sparse
2560+0 records in
2560+0 records out
21474836480 bytes (21 GB) copied, 39.0065 s, 551 MB/s

Now we are using only 7.19% of the lv:

# lvs ovirt-local
  LV   VG  Attr   LSize
Pool  Origin Data%  Meta%  Move Log Cpy%Sync Convert
  029060ab-41ef-4dfd-9a3e-4c716c01db06 ovirt-local Vwi-a-tz-- 20.00g
pool06.74
  4f207ee8-bb47-465a-9b68-cb778e070861 ovirt-local Vwi-a-tz-- 20.00g
pool00.00
  7aed605e-c74c-40d8-b449-8a1bf7228b8b ovirt-local Vwi-a-tz-- 20.00g
pool06.98
  ce6d08d3-350f-4afa-a0e7-7b492a1a7744 ovirt-local Vwi-a-tz-- 20.00g
pool06.87
  pool0ovirt-local twi-aotz-- 40.00g
   13.89  7.17
  raw-test ovirt-local Vwi-a-tz-- 20.00g
pool07.19

This works, but it would be nicer to have a way to convert
to sparse raw on a block device in one pass.

Nir