Re: CRC mismatch
On 18/10/2018 08.02, Anton Shepelev wrote:
> I wrote:
>> What may be the reason for a CRC mismatch on a BTRFS file in a virtual machine:
>>
>> csum failed ino 175524 off 1876295680 csum 451760558 expected csum 1446289185
>>
>> Shall I seek the culprit in the host machine or in the guest one? Supposing the host machine is healthy, what operations on the guest might have caused a CRC mismatch?
>
> Thank you, Austin and Chris, for your replies.
>
> While describing the problem for the client, I tried again to copy the corrupt file, and this time it was copied without error, which is of course scary because errors that miraculously disappear may suddenly reappear in the same manner.

If the filesystem was running some profile that supports repairs (pretty much anything except the single or raid0 profiles), then BTRFS will have fixed that particular block for you automatically. Of course, the other possibility is that it was a transient error in the block layer that caused it to return bogus data when the data that was on-disk was in fact correct.
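A simplified model of why the error can vanish on a second copy attempt, assuming a redundant data profile such as raid1 or dup: each stored copy is checked against the recorded checksum, the first matching copy is returned, and the bad copy is rewritten from it. This is a toy sketch, not btrfs code; `read_with_repair` and the in-memory "mirrors" are hypothetical, and `zlib.crc32` stands in for the CRC-32C that btrfs actually uses.

```python
import zlib


def read_with_repair(mirrors: list, expected_csum: int) -> bytes:
    """Return the first copy whose checksum matches, then rewrite bad copies.

    Toy model: each element of `mirrors` is a bytearray holding one copy of
    the same block. zlib.crc32 is a stand-in for btrfs's crc32c.
    """
    good = None
    for copy in mirrors:
        if zlib.crc32(copy) == expected_csum:
            good = bytes(copy)
            break
    if good is None:
        # Only bad copies (e.g. single or raid0 profile): detect, cannot repair.
        raise OSError("csum failed and no good copy to repair from")
    for copy in mirrors:
        if zlib.crc32(copy) != expected_csum:
            copy[:] = good  # overwrite the damaged copy with the good data
    return good


block = b"x" * 4096
csum = zlib.crc32(block)
mirror_a = bytearray(block)
mirror_a[100] ^= 0xFF  # simulate at-rest corruption in one copy
mirror_b = bytearray(block)

data = read_with_repair([mirror_a, mirror_b], csum)
assert data == block
assert mirror_a == block  # the damaged copy has been repaired
```

With a single copy the same check still detects the mismatch, but the read can only fail. In the transient-error case the stored data was never bad, so a later read simply returns a matching copy.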
Re: CRC mismatch
I wrote:
> What may be the reason for a CRC mismatch on a BTRFS file in
> a virtual machine:
>
> csum failed ino 175524 off 1876295680 csum 451760558
> expected csum 1446289185
>
> Shall I seek the culprit in the host machine or in the
> guest one? Supposing the host machine is healthy, what
> operations on the guest might have caused a CRC mismatch?

Thank you, Austin and Chris, for your replies.

While describing the problem for the client, I tried again to copy the corrupt file, and this time it was copied without error, which is of course scary because errors that miraculously disappear may suddenly reappear in the same manner.

-- 
()  ascii ribbon campaign - against html e-mail
/\  http://preview.tinyurl.com/qcy6mjc
Re: CRC mismatch
On 2018-10-16 16:27, Chris Murphy wrote:
> On Tue, Oct 16, 2018 at 9:42 AM, Austin S. Hemmelgarn wrote:
>> On 2018-10-16 11:30, Anton Shepelev wrote:
>>> Hello, all
>>>
>>> What may be the reason for a CRC mismatch on a BTRFS file in
>>> a virtual machine:
>>>
>>> csum failed ino 175524 off 1876295680 csum 451760558
>>> expected csum 1446289185
>>>
>>> Shall I seek the culprit in the host machine or in the guest
>>> one? Supposing the host machine is healthy, what operations on
>>> the guest might have caused a CRC mismatch?
>>
>> Possible causes include:
>>
>> * On the guest side:
>>   - Unclean shutdown of the guest system (not likely even if this did
>>     happen).
>>   - A kernel bug in the guest.
>>   - Something directly modifying the block device (also not very likely).
>>
>> * On the host side:
>>   - Unclean shutdown of the host system without properly flushing data
>>     from the guest. Not likely unless you're using an actively unsafe
>>     caching mode for the guest's storage back-end.
>>   - At-rest data corruption in the storage back-end.
>>   - A bug in the host-side storage stack.
>>   - A transient error in the host-side storage stack.
>>   - A bug in the hypervisor.
>>   - Something directly modifying the back-end storage.
>>
>> Of these, the statistically most likely location for the issue is
>> probably the storage stack on the host.
>
> Is there still that O_DIRECT related "bug" (or more of a limitation) if
> the guest is using cache=none on the block device?

I had actually forgotten about this, and I'm not quite sure if it's fixed or not.

> Anton, what virtual machine tech are you using? qemu/kvm managed with
> virt-manager? The configuration affects host behavior, but the negative
> effect manifests inside the guest as corruption. If I remember correctly.
Re: CRC mismatch
[I accidentally replied to Chris instead of the mailing list]

Chris Murphy:
> Is there still that O_DIRECT related "bug" (or more of a
> limitation) if the guest is using cache=none on the block
> device?

I know nothing about it.

> Anton, what virtual machine tech are you using? qemu/kvm
> managed with virt-manager? The configuration affects host
> behavior, but the negative effect manifests inside the
> guest as corruption. If I remember correctly.

This is a commercial system run inside VMware.
Re: CRC mismatch
On Tue, Oct 16, 2018 at 9:42 AM, Austin S. Hemmelgarn wrote:
> On 2018-10-16 11:30, Anton Shepelev wrote:
>>
>> Hello, all
>>
>> What may be the reason for a CRC mismatch on a BTRFS file in
>> a virtual machine:
>>
>> csum failed ino 175524 off 1876295680 csum 451760558
>> expected csum 1446289185
>>
>> Shall I seek the culprit in the host machine or in the guest
>> one? Supposing the host machine is healthy, what operations on
>> the guest might have caused a CRC mismatch?
>>
> Possible causes include:
>
> * On the guest side:
>   - Unclean shutdown of the guest system (not likely even if this did
>     happen).
>   - A kernel bug in the guest.
>   - Something directly modifying the block device (also not very likely).
>
> * On the host side:
>   - Unclean shutdown of the host system without properly flushing data from
>     the guest. Not likely unless you're using an actively unsafe caching mode
>     for the guest's storage back-end.
>   - At-rest data corruption in the storage back-end.
>   - A bug in the host-side storage stack.
>   - A transient error in the host-side storage stack.
>   - A bug in the hypervisor.
>   - Something directly modifying the back-end storage.
>
> Of these, the statistically most likely location for the issue is probably
> the storage stack on the host.

Is there still that O_DIRECT related "bug" (or more of a limitation) if the guest is using cache=none on the block device?

Anton, what virtual machine tech are you using? qemu/kvm managed with virt-manager? The configuration affects host behavior, but the negative effect manifests inside the guest as corruption. If I remember correctly.

-- 
Chris Murphy
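The limitation Chris half-remembers is usually described as an ordering problem with direct I/O on a checksumming filesystem. The checksum is computed over the caller's buffer, and if the caller changes that buffer before the device has finished transferring it, the stored data no longer matches the stored checksum. The sketch below is a single-threaded toy simulation of that ordering only. `write_block`, `read_block` and the `mutate_in_flight` callback are hypothetical, `zlib.crc32` stands in for crc32c, and it models neither qemu, VMware, nor btrfs internals.

```python
import zlib


def write_block(buf: bytearray, disk: dict, lba: int, mutate_in_flight=None) -> None:
    """Toy direct-I/O write: checksum first, then 'transfer' the buffer.

    There is no private copy of `buf`, so any change made between the
    checksum step and the transfer is what ends up on the toy disk.
    """
    csum = zlib.crc32(buf)          # checksum over the buffer as it is now
    if mutate_in_flight is not None:
        mutate_in_flight(buf)       # writer touches the buffer mid-I/O
    disk[lba] = (bytes(buf), csum)  # device stores the *current* contents


def read_block(disk: dict, lba: int) -> bytes:
    """Toy read: verify the stored data against the stored checksum."""
    data, csum = disk[lba]
    actual = zlib.crc32(data)
    if actual != csum:
        raise OSError(f"checksum mismatch at block {lba}: got {actual}, expected {csum}")
    return data


disk = {}
write_block(bytearray(b"a" * 4096), disk, 0)
assert read_block(disk, 0) == b"a" * 4096  # untouched buffer: reads back fine


def scribble(buf: bytearray) -> None:
    buf[0] = ord("b")  # modify the buffer after it was checksummed


write_block(bytearray(b"a" * 4096), disk, 1, scribble)
try:
    read_block(disk, 1)
except OSError as err:
    print(err)
```

Note that nothing on the "disk" is damaged in the second case. The mismatch comes entirely from the checksum and the data being captured at different moments.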
Re: CRC mismatch
On 2018-10-16 11:30, Anton Shepelev wrote:
> Hello, all
>
> What may be the reason for a CRC mismatch on a BTRFS file in a virtual machine:
>
> csum failed ino 175524 off 1876295680 csum 451760558 expected csum 1446289185
>
> Shall I seek the culprit in the host machine or in the guest one? Supposing the host machine is healthy, what operations on the guest might have caused a CRC mismatch?

Possible causes include:

* On the guest side:
  - Unclean shutdown of the guest system (not likely even if this did happen).
  - A kernel bug in the guest.
  - Something directly modifying the block device (also not very likely).

* On the host side:
  - Unclean shutdown of the host system without properly flushing data from the guest. Not likely unless you're using an actively unsafe caching mode for the guest's storage back-end.
  - At-rest data corruption in the storage back-end.
  - A bug in the host-side storage stack.
  - A transient error in the host-side storage stack.
  - A bug in the hypervisor.
  - Something directly modifying the back-end storage.

Of these, the statistically most likely location for the issue is probably the storage stack on the host.
CRC mismatch
Hello, all

What may be the reason for a CRC mismatch on a BTRFS file in a virtual machine:

csum failed ino 175524 off 1876295680 csum 451760558 expected csum 1446289185

Shall I seek the culprit in the host machine or in the guest one? Supposing the host machine is healthy, what operations on the guest might have caused a CRC mismatch?
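The message prints the checksum computed at read time ("csum") and the one stored in the checksum tree ("expected csum") as decimal numbers. Btrfs data checksums default to CRC-32C (Castagnoli), recorded per data sector (typically 4 KiB). The sketch below is a plain bitwise reference CRC-32C, not the kernel's implementation. It also converts the reported values to hex and checks 4 KiB alignment of `off`. That check assumes `off` is a byte offset within the file, which the format suggests but does not state.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli): reflected polynomial 0x82F63B78,
    initial value and final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; fold in the polynomial when the low bit was set.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


# Standard CRC-32C check value for the ASCII string "123456789".
assert crc32c(b"123456789") == 0xE3069283

# Values copied from the kernel message above.
off, csum, expected = 1876295680, 451760558, 1446289185
print(hex(csum), hex(expected))      # 0x1aed51ae 0x56349f21
print(off % 4096 == 0, off // 4096)  # True 458080
```

An aligned offset is consistent with a single 4 KiB sector failing verification. The numbers alone say nothing about whether the bad data came from the guest or the host.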