[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2020-08-12 Thread Laszlo Ersek (Red Hat)
Commit 5e9785505210 was released in v4.2.0; closing this ticket. ** Changed in: qemu Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1846427 Title:

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-12-07 Thread Michael Weiser
FWIW, my VMs run with SATA and Virtio SCSI with discard=unmap and detect-zeroes=unmap (among the plethora of options from libvirtd) for maximum space savings. No problems since the fix patches went in, and these options had no bearing on the bug's occurrence before that. /usr/bin/qemu-system-x86_64 -name

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-12-05 Thread Matti Hameister
The qemu 4.1.0 upgrade killed pretty much all my VMs. I had data corruption (i.e. tar was unable to extract some larger data archives for testing purposes) in all my Linux VMs and other strange errors. The Windows VM was killed after I ran "qemu-img check -r all" on the image. Afterwards Windows

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-12-04 Thread Kevin Wolf
I don't see anything suspicious in that command line. My only idea for a different configuration to test would be discard=off, which would remove a few code paths that could contain a bug. Anyway, I think it's pretty clear now that you're hitting a different bug than Michael. Maybe it would be

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-12-04 Thread Matti Hameister
I was unable to compile the qemu-git package and I currently have no time to investigate that. But I updated to 4.1.1. I just started my Windows 10 VM with that and after a short time of use the image was corrupted again. Here is my full start parameter set. Maybe there is something wrong or I

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-12-02 Thread Michael Weiser
My images are still fine after some heavy use with qemu-4.1.1 and no additional patches. I consider this bug fixed for good. Thanks for all your support on this!

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-11-28 Thread Michael Weiser
All my images are still fine after some heavy use with qemu-4.1.0 and fix patches applied. Just upgraded to 4.1.1 and will report back. But it certainly looks like this bug is fixed for good.

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-11-18 Thread Michael Weiser
I've done some security updates on my Debian, Windows 7 64 and 32 Bit VMs and quite intensively used a Windows 1903 VM today without any corruption.

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-11-17 Thread Matti Hameister
The image was fine before upgrading qemu. I rechecked the image after the first use and it was fine. But after the larger Windows 1903 -> 1909 upgrade done in qemu 4.1.0 the image was damaged. I will try the git master version of qemu in the coming days and report back.

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-11-15 Thread Michael Weiser
I have been dragging my feet exposing my production VMs to a patched 4.1.0 TBH. I have now taken the opportunity to upgrade from 4.0.0 to a 4.1.0 with the fix patches applied. As expected, I cannot produce any image corruption with the reproducer I've been using all along. I will now use it in

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-11-15 Thread Kevin Wolf
Is this a fresh image or is it possible that it already had some latent corruption from a previous run with an unfixed version? If it wasn't fresh, did you run qemu-img check after upgrading QEMU and it still was clean, so we know the corruption was introduced by the new version? Is the problem

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-11-14 Thread Matti Hameister
I tried the ArchLinux package that includes three patches applied to qemu 4.1 (see https://git.archlinux.org/svntogit/packages.git/commit/trunk/PKGBUILD?h=packages/qemu&id=e9707066408de26aa04f8d0ddebe5556aa87e662). My Windows 10 qcow2 image got corrupted again after a short time of use. Host

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-11-04 Thread Laszlo Ersek (Red Hat)
My understanding is that Kevin has fixed this bug in (as yet unreleased) commit 5e9785505210 ("qcow2: Fix corruption bug in qcow2_detect_metadata_preallocation()", 2019-10-25). The patch had been posted as a part of the following sets: [PATCH 0/3] qcow2: Fix image corruption bug in 4.1

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-10-23 Thread Michael Weiser
For completeness's sake: All the changes you proposed (replacing call to qcow2_detect_metadata_preallocation() with ret = true and ret = false, moving acquiring s->lock before the call and replacing the call with a sleep) prevent corruption on my system. The latter would suggest that it's not so

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-10-23 Thread Michael Weiser
> I think I may have had the wrong image size before because both tmpfs and > my spare LVM volume are rather limited in size. I also had a hard time getting my image to corrupt on tmpfs because it could not grow to its final size, it seems. Sometimes qemu ran into actual ENOSPC but most of the

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-10-23 Thread Kevin Wolf
I finally got an image with which I can reproduce the problem. I think I may have had the wrong image size before because both tmpfs and my spare LVM volume are rather limited in size. Anyway, so far locking around qcow2_get_refcount() seems to do the trick. I'll try to investigate the details a

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-10-23 Thread Kevin Wolf
> So it's much more likely that is_zero_cow() has a side-effect that somehow > causes corruption later on even without handle_alloc_space() ever calling > bdrv_co_pwrite_zeroes(). Yes, looks like it. I think we have ruled out that a changing return value is the cause of the problems because the

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-10-22 Thread Laszlo Ersek (Red Hat)
In reply to : > Is it possible that we're talking about some kind of miscompilation > here, maybe because gcc-9.2.0 is just that tiny bit too spanking > current? I'm riding the trailing edge here (gcc-4.8 in RHEL7) :) [...]

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-10-22 Thread Michael Weiser
Please ignore the stuff about (!is_zero_cow(bs, m) || true) being optimized out. Of course it isn't. And corruption still occurs with that way of calling only is_zero_cow(). Dunno what I did there. It seems to be even later than I thought. The rest of my testing holds true though.

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-10-22 Thread Michael Weiser
> I tried to reproduce the problem locally, on the same commit, with the > steps you described, but I wasn't lucky. I tried keeping the image on my > home directory (XFS), on tmpfs, and finally on a newly created ext4 > filesystem on a spare LVM volume, but the image just wouldn't break even >

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-10-22 Thread Kevin Wolf
> To avoid any suspicion as to what that may have brought with it in > breakage I just created a fresh image using this command: [...] I tried to reproduce the problem locally, on the same commit, with the steps you described, but I wasn't lucky. I tried keeping the image on my home directory

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-10-22 Thread Kevin Wolf
> But isn't that "if" at the core of this problem? What happens if the > detection misfires? The information that a block driver must give is just whether the given block is allocated by the image or whether it is taken from the backing file. Almost everything else is just a hint that can be

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-10-21 Thread Michael Weiser
> After reading some related code, I have more questions than before, but > let's see... As more qcow2 code was merged since, I would suggest that > we debug the problem on commit 69f4750 (the bisection result) rather > than on anything newer. Okay, for all of the following I did a fresh compile

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-10-21 Thread Laszlo Ersek (Red Hat)
In reply to Kevin's comment#13: > I find Laszlo's case with a preallocated image particularly surprising > because the behaviour isn't supposed to have changed at all for > preallocated images, at least if the heuristics still detects them as > such. But isn't that "if" at the core of this

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-10-21 Thread Kevin Wolf
After reading some related code, I have more questions than before, but let's see... As more qcow2 code was merged since, I would suggest that we debug the problem on commit 69f4750 (the bisection result) rather than on anything newer. First of all: Michael, you didn't specify explicitly how

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-10-20 Thread Simon John
Can't seem to reproduce if I convert the qcow2 image to raw+sparse.

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-10-20 Thread Simon John
Not sure if i have exactly the same problem, as my qcow2 corruption seems to be limited to windows10 guests - win2019 and debian10 guests with the same virtio-scsi setup are fine (as are various virtio-blk or ide/sata images from linux/solaris/macos guests). I find that i randomly have disk image

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-10-18 Thread Michael Weiser
My qcow2 images also reside on an ext4 with features "has_journal ext_attr dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file dir_nlink extra_isize metadata_csum" on a luks- encrypt(ed|ing) device mapper device backed by a partition on an NVMe SSD. The setup is rock

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-10-17 Thread Laszlo Ersek (Red Hat)
(See also / possible duplicate: .)

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-10-17 Thread Laszlo Ersek (Red Hat)
After reading the message of commit 69f47505ee66 ("block: avoid recursive block_status call if possible", 2019-06-04), I'm none the wiser. But, I can at least confirm that all my qcow2 images are pre-allocated, as a norm. I create them with the following command: qemu-img create \ -f qcow2 \

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-10-16 Thread psyhomb
I can confirm exactly the same issue on Arch Linux running qemu-4.1.0. After downgrading from 4.1.0 => 4.0.0 everything is running normally again, no corruption is detected, and all qcow2 images stay healthy.

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-10-16 Thread Laszlo Ersek (Red Hat)
I haven't done any sort of "narrowing down", but recent QEMUs (built from the master branch, post-v4.1) have corrupted at least two VM disk images (qcow2) for me as well. I had to reinstall both VMs. I didn't make any noise because I was sure that, if I wasn't seeing ghosts, then others must have

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-10-16 Thread Michael Weiser
I just quickly retested with today's master (commit 69b81893bc28feb678188fbcdce52eff1609bdad) and the automated reproducer. With the attached revert patch applied the loadvm/sleep 10/savevm/quit loop ran 50 times without problem. As soon as I removed the patch, recompiled and replaced the qemu

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-10-16 Thread Michael Weiser
Yes. As said: > qemu compiled from the commit before does not exhibit the issue, from that > commit on it does and reverting the commit off of current master makes it > disappear. In my tests the problem only occurs with that commit in the code. I used git bisect to narrow it down to that

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-10-16 Thread Dr. David Alan Gilbert
Hi Michael, How sure are you that it's that commit - have you checked the commit before it?

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-10-16 Thread Michael Weiser
> I'm seeing massive corruption of qcow2 images with qemu 4.1.0 and git master > as of 7f21573c822805a8e6be379d9bcf3ad9effef3dc after a few > savevm/quit/loadvm cycles. [...] > bisected the introduction of the problem to commit > 69f47505ee66afaa513305de0c1895a224e52c45 > (block: avoid recursive

[Bug 1846427] Re: 4.1.0: qcow2 corruption on savevm/quit/loadvm cycle

2019-10-03 Thread Dr. David Alan Gilbert
cc'd in kwolf since he signed off on that change.