Re: [Qemu-block] qemu-img info ran at 100% CPU for 45 minutes without writing a byte (stopped it)

2019-01-20 Thread Max Reitz
On 18.01.19 16:25, Alberto Garcia wrote:
> On Wed 16 Jan 2019 02:15:24 AM CET, james harvey wrote:
> 
>> I ran:
>>
>> # qemu-img convert /var/lib/libvirt/images/win7.qcow2 -O raw /mnt/tmpqcow/win7.raw
>>
>> 45 minutes later, qemu-img had been running with 100% CPU every time I
>> checked, and it had allocated the raw file, but still hadn't actually
>> written a single byte: (note the dd in the VM completed the 90GB in
>> about 8 minutes)
> 
> [...]
> 
>> After running this long, I ran strace for 15 seconds, here:
>> https://termbin.com/gg9k -- It's repeatedly running lseek with
>> SEEK_DATA and SEEK_HOLE.  SEEK_HOLE always returns 96251936768,
>> while SEEK_DATA returns different results.
> 
> It seems like the problem addressed by this patch:
> 
>    https://lists.gnu.org/archive/html/qemu-block/2019-01/msg00246.html

But that patch is rather controversial.

I don't think we've found a definitive solution for this issue yet
(other than the fact that tmpfs is basically just buggy, which I think is
what we've claimed; and I think this is not the first time it was reported
for btrfs either).  I had some patches specifically for qemu-img
convert, too, but the issue there was that conversion of preallocated
qcow2 images on non-buggy filesystems got slower.  So we somehow have to
resolve that tradeoff...

Max





Re: [Qemu-block] qemu-img info ran at 100% CPU for 45 minutes without writing a byte (stopped it)

2019-01-18 Thread Alberto Garcia
On Wed 16 Jan 2019 02:15:24 AM CET, james harvey wrote:

> I ran:
>
> # qemu-img convert /var/lib/libvirt/images/win7.qcow2 -O raw /mnt/tmpqcow/win7.raw
>
> 45 minutes later, qemu-img had been running with 100% CPU every time I
> checked, and it had allocated the raw file, but still hadn't actually
> written a single byte: (note the dd in the VM completed the 90GB in
> about 8 minutes)

[...]

> After running this long, I ran strace for 15 seconds, here:
> https://termbin.com/gg9k -- It's repeatedly running lseek with
> SEEK_DATA and SEEK_HOLE.  SEEK_HOLE always returns 96251936768,
> while SEEK_DATA returns different results.

It seems like the problem addressed by this patch:

   https://lists.gnu.org/archive/html/qemu-block/2019-01/msg00246.html

Berto



[Qemu-block] qemu-img info ran at 100% CPU for 45 minutes without writing a byte (stopped it)

2019-01-15 Thread james harvey
In IRC, jsnow said this was a known problem, "that lseek is not
reliably fast", but requested I send this to the block list.  I used a
workaround of giving the qcow to a VM not otherwise using it, and ran
dd within the VM.  So, I don't personally need a fix, but I'm happy to
mail the list to give info to help track down the bug.

I ran:

# qemu-img convert /var/lib/libvirt/images/win7.qcow2 -O raw /mnt/tmpqcow/win7.raw

45 minutes later, qemu-img had been running with 100% CPU every time I
checked, and it had allocated the raw file, but still hadn't actually
written a single byte: (note the dd in the VM completed the 90GB in
about 8 minutes)

# ls -la /mnt/tmpqcow
-rw-r--r-- 1 root root 96636764160 Jan 15 18:50 win7.raw
# du /mnt/tmpqcow/win7.raw
0   /mnt/tmpqcow/win7.raw
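The `ls`/`du` mismatch above is what a file looks like when it has been sized but never written: `ls -l` shows the apparent size (`st_size`), while `du` counts allocated blocks (`st_blocks`, which Linux reports in 512-byte units). A minimal Python sketch, assuming Linux; `apparent_and_allocated` is a hypothetical helper and the 16 MiB file is only a stand-in for the 90G raw target. On filesystems that support sparse files the allocated figure is typically 0, matching the `du` output.

```python
import os
import tempfile
from typing import Tuple


def apparent_and_allocated(path: str) -> Tuple[int, int]:
    """Return (apparent size, allocated bytes) for `path`.

    Apparent size is what `ls -l` prints; allocated bytes is roughly what
    plain `du` measures. Linux reports st_blocks in 512-byte units.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    # Hypothetical stand-in: size a file to 16 MiB without writing any data.
    with tempfile.NamedTemporaryFile() as f:
        os.truncate(f.name, 16 * 1024 * 1024)
        print(apparent_and_allocated(f.name))
```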

Both /var/lib/libvirt/images and /mnt/tmpqcow are on the same Samsung
960 EVO NVMe (rated 1900 MB/s write, 3200 MB/s read).

$ ls -la /var/lib/libvirt/images/win7.qcow2
-rw--- 1 root root 96251936768 Jan 15 18:45 /var/lib/libvirt/images/win7.qcow2

# qemu-img info /var/lib/libvirt/images/win7.qcow2
image: /var/lib/libvirt/images/win7.qcow2
file format: qcow2
virtual size: 90G (96636764160 bytes)
disk size: 90G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: true
    refcount bits: 16
    corrupt: false
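The sizes in the `qemu-img info` output and the `ls -la` listing can be cross-checked with a little arithmetic. The Python sketch below (used purely as a calculator; the constant names are illustrative) shows that the virtual size is exactly 90 GiB, that it spans 1,474,560 clusters of 64 KiB, and that the host file size of 96251936768 bytes, the same number the strace reports for SEEK_HOLE, is itself a whole number of clusters, 367 MiB short of the virtual size.

```python
# Values copied from the qemu-img info and ls -la output in this message.
VIRTUAL_SIZE = 96636764160    # "virtual size: 90G (96636764160 bytes)"
CLUSTER_SIZE = 65536          # "cluster_size: 65536"
HOST_FILE_SIZE = 96251936768  # ls -la size of win7.qcow2

GIB = 1024 ** 3
MIB = 1024 ** 2

virtual_clusters = VIRTUAL_SIZE // CLUSTER_SIZE
host_clusters, host_remainder = divmod(HOST_FILE_SIZE, CLUSTER_SIZE)

print(VIRTUAL_SIZE / GIB)                    # 90.0
print(virtual_clusters)                      # 1474560
print(host_clusters, host_remainder)         # 1468688 0
print((VIRTUAL_SIZE - HOST_FILE_SIZE) / MIB) # 367.0
```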

After running this long, I ran strace for 15 seconds, here:
https://termbin.com/gg9k -- It's repeatedly running lseek with
SEEK_DATA and SEEK_HOLE.  SEEK_HOLE always returns 96251936768,
while SEEK_DATA returns different results.
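Alternating `lseek(..., SEEK_DATA)` and `lseek(..., SEEK_HOLE)` is the usual Linux way to ask a filesystem which byte ranges of a file hold data. Per lseek(2), every file has an implicit hole at end-of-file, so a SEEK_HOLE result of 96251936768, exactly the size `ls` reports for win7.qcow2, means no hole was found between the queried offset and the end of the image. The Python sketch below illustrates the system-call pattern only; it is not qemu's block-status code, it assumes Linux (`os.SEEK_DATA`/`os.SEEK_HOLE` exist only where the OS supports them), and `data_extents` is a hypothetical helper.

```python
import errno
import os
import tempfile
from typing import List, Tuple


def data_extents(fd: int) -> List[Tuple[int, int]]:
    """Return the (start, end) byte ranges the filesystem reports as data.

    Alternates lseek(SEEK_DATA) and lseek(SEEK_HOLE), the same pair of calls
    seen in the strace. Hypothetical helper for illustration, Linux-only.
    """
    size = os.fstat(fd).st_size
    extents = []
    offset = 0
    while offset < size:
        try:
            # First offset >= `offset` that contains data.
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:  # no data between offset and EOF
                break
            raise
        # First hole at or after `start`; because of the implicit hole at
        # EOF, this is at most `size` for a file that is not growing.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        extents.append((start, end))
        offset = end
    return extents


if __name__ == "__main__":
    # Small sparse file: 4 KiB of data at 0 and at 1 MiB, 4 MiB long overall.
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"x" * 4096)
        f.seek(1 << 20)
        f.write(b"y" * 4096)
        f.truncate(4 << 20)
        f.flush()
        os.fsync(f.fileno())
        print(data_extents(f.fileno()))
```

Each iteration costs two system calls, so a walk like this is only as fast as the filesystem's SEEK_DATA/SEEK_HOLE implementation. If those calls are slow on a particular file (jsnow's "lseek is not reliably fast"), a caller that repeats such queries many times can spend a long time in them before copying anything, which would be consistent with what the strace shows.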

Starting over to get the beginning of an strace, here: https://termbin.com/misf

Running up-to-date Arch Linux.  Kernel 4.20.1.  qemu 3.1.0.

/var/lib/libvirt/images and /mnt/tmpqcow are on separate BTRFS
volumes, on top of LVM thin volumes.  I'll admit I'm only now seeing
that BTRFS copy-on-write is being used on "/var/lib/libvirt/".  I
thought it was turned off.  This is the only VM I'm running as a qcow
or through libvirt; the rest use virtio directly through qemu.

Hitting CTRL+C didn't stop qemu-img convert; I had to kill it.