Re: [Qemu-block] qemu-img info ran at 100% CPU for 45 minutes without writing a byte (stopped it)
On 18.01.19 16:25, Alberto Garcia wrote:
> On Wed 16 Jan 2019 02:15:24 AM CET, james harvey wrote:
>
>> I ran:
>>
>> # qemu-img convert /var/lib/libvirt/images/win7.qcow2 -O raw
>> /mnt/tmpqcow/win7.raw
>>
>> 45 minutes later, qemu-img had been running with 100% CPU every time I
>> checked, and it had allocated the raw file, but still hadn't actually
>> written a single byte: (note the dd in the VM completed the 90GB in
>> about 8 minutes)
>
> [...]
>
>> After running this long, I ran strace for 15 seconds, here:
>> https://termbin.com/gg9k -- It's repeatedly running lseek with
>> SEEK_DATA and SEEK_HOLE. The SEEK_HOLE always results in 96251936768,
>> and SEEK_DATA is different results.
>
> It seems like the problem addressed by this patch:
>
> https://lists.gnu.org/archive/html/qemu-block/2019-01/msg00246.html

But that patch is rather controversial. I don't think we've found a
definitive solution for this issue yet, other than saying that tmpfs is
basically just buggy (which I think is what we've claimed). I also think
this is not the first time it was reported for btrfs.

I had some patches specifically for qemu-img convert, too, but the issue
there was that conversion of preallocated qcow2 images on non-buggy
filesystems got slower. So we somehow have to resolve that tradeoff...

Max
Re: [Qemu-block] qemu-img info ran at 100% CPU for 45 minutes without writing a byte (stopped it)
On Wed 16 Jan 2019 02:15:24 AM CET, james harvey wrote:

> I ran:
>
> # qemu-img convert /var/lib/libvirt/images/win7.qcow2 -O raw
> /mnt/tmpqcow/win7.raw
>
> 45 minutes later, qemu-img had been running with 100% CPU every time I
> checked, and it had allocated the raw file, but still hadn't actually
> written a single byte: (note the dd in the VM completed the 90GB in
> about 8 minutes)

[...]

> After running this long, I ran strace for 15 seconds, here:
> https://termbin.com/gg9k -- It's repeatedly running lseek with
> SEEK_DATA and SEEK_HOLE. The SEEK_HOLE always results in 96251936768,
> and SEEK_DATA is different results.

It seems like the problem addressed by this patch:

https://lists.gnu.org/archive/html/qemu-block/2019-01/msg00246.html

Berto
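For readers unfamiliar with the lseek pattern seen in the strace, here is a minimal sketch of how a program can walk the data extents of a file with SEEK_DATA and SEEK_HOLE. This is not qemu's code; the names iter_data_extents and demo are made up for illustration, and it assumes Linux with Python 3.3 or later. On a file with many small extents, or on a filesystem where each lseek call is slow, such a loop can take a long time. Note that the SEEK_HOLE result reported above, 96251936768, equals the size of the qcow2 file shown by ls.

```python
import errno
import os
import tempfile


def iter_data_extents(path):
    """Yield (start, end) byte ranges that the filesystem reports as data."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                # Next offset >= `offset` that contains data.
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:
                    break  # only a hole remains up to end of file
                raise
            # Next hole after that data; EOF counts as an implicit hole.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            if end <= start:
                break  # defensive: never loop without making progress
            yield start, end
            offset = end
    finally:
        os.close(fd)


def demo():
    """Create a 4 MiB sparse file with 4 KiB of data at 1 MiB and map it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse.img")
        with open(path, "wb") as f:
            f.truncate(4 * 1024 * 1024)
            f.seek(1024 * 1024)
            f.write(b"x" * 4096)
        return list(iter_data_extents(path))


if __name__ == "__main__":
    for start, end in demo():
        print(f"data: {start}..{end}")
```

The exact extents printed depend on the filesystem. One without hole support may report the whole file as a single data extent.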
[Qemu-block] qemu-img info ran at 100% CPU for 45 minutes without writing a byte (stopped it)
In IRC, jsnow said this was a known problem, "that lseek is not reliably fast", but requested I send this to the block list.

I used a workaround of giving the qcow to a VM not otherwise using it, and ran dd within the VM. So, I don't personally need a fix, but I'm happy to mail the list to give info to help track down the bug.

I ran:

# qemu-img convert /var/lib/libvirt/images/win7.qcow2 -O raw /mnt/tmpqcow/win7.raw

45 minutes later, qemu-img had been running with 100% CPU every time I checked, and it had allocated the raw file, but still hadn't actually written a single byte: (note the dd in the VM completed the 90GB in about 8 minutes)

# ls -la /mnt/tmpqcow
-rw-r--r-- 1 root root 96636764160 Jan 15 18:50 win7.raw

# du /mnt/tmpqcow/win7.raw
0 /mnt/tmpqcow/win7.raw

Both /var/lib/libvirt/images and /mnt/tmpqcow are on the same Samsung 960 EVO NVMe (spec 1900MB/s write, 3200MB/s read.)

$ ls -la /var/lib/libvirt/images/win7.qcow2
-rw--- 1 root root 96251936768 Jan 15 18:45 /var/lib/libvirt/images/win7.qcow2

# qemu-img info /var/lib/libvirt/images/win7.qcow2
image: /var/lib/libvirt/images/win7.qcow2
file format: qcow2
virtual size: 90G (96636764160 bytes)
disk size: 90G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: true
    refcount bits: 16
    corrupt: false

After running this long, I ran strace for 15 seconds, here: https://termbin.com/gg9k -- It's repeatedly running lseek with SEEK_DATA and SEEK_HOLE. The SEEK_HOLE always results in 96251936768, and SEEK_DATA is different results.

Starting over to get the beginning of an strace, here: https://termbin.com/misf

Running up to date Arch Linux. Kernel 4.20.1. qemu 3.1.0.

/var/lib/libvirt/images and /mnt/tmpqcow are on separate BTRFS volumes, on top of LVM thin volumes. I'll admit I'm only now seeing that BTRFS' copy on write is being used on "/var/lib/libvirt/". I thought it was turned off. This is the only VM I'm running as a qcow or through libvirt; the rest use virtio directly through qemu.

Hitting CTRL+C didn't stop qemu-img convert; I had to kill it.
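
The ls and du output above shows a file with a 90G apparent size and zero allocated blocks. The same check can be sketched in Python by comparing st_size with st_blocks. The helper names size_report and demo_sparse are hypothetical, and the sketch assumes Linux, where st_blocks is counted in 512-byte units.

```python
import os
import tempfile


def size_report(path):
    """Return (apparent_bytes, allocated_bytes) for a file."""
    st = os.stat(path)
    # st_size is what `ls -l` shows; st_blocks (512-byte units on Linux)
    # reflects what `du` reports as actually allocated.
    return st.st_size, st.st_blocks * 512


def demo_sparse(size=4 * 1024 * 1024):
    """Create a file of `size` bytes without writing data, then report."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "empty.raw")
        with open(path, "wb") as f:
            f.truncate(size)  # sets the length; writes no data
        return size_report(path)


if __name__ == "__main__":
    apparent, allocated = demo_sparse()
    print(f"apparent={apparent} allocated={allocated}")
```

A file that has been sized but never written to typically reports an allocated size far below its apparent size, which is the pattern seen for win7.raw.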