Re: CRC mismatch

2018-10-18 Thread Austin S. Hemmelgarn

On 18/10/2018 08.02, Anton Shepelev wrote:
> I wrote:
>
>> What may be the reason for a CRC mismatch on a BTRFS file in
>> a virtual machine:
>>
>> csum failed ino 175524 off 1876295680 csum 451760558
>> expected csum 1446289185
>>
>> Shall I seek the culprit in the host machine or in the
>> guest one?  Supposing the host machine healthy, what
>> operations on the guest might have caused a CRC mismatch?
>
> Thank you, Austin and Chris, for your replies.  While
> describing the problem for the client, I tried again to copy
> the corrupt file and this time it was copied without error,
> which is of course scary because errors that miraculously
> disappear may suddenly reappear in the same manner.

If the filesystem was running some profile that supports repairs (pretty
much anything except the single or raid0 profiles), then BTRFS will have
fixed that particular block for you automatically.


Of course, the other possibility is that it was a transient error in the
block layer that caused it to return bogus data when the data that was
on-disk was in fact correct.
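
One way to follow up on an error like this is to note which block the
kernel complained about and try reading it again. What follows is only a
minimal Python sketch, not a BTRFS tool. The regular expression matches just
the message format quoted in this thread, `off` is assumed to be a byte
offset within the file, the 4096-byte block size is an assumption, and
`parse_csum_failure` and `try_read` are hypothetical helper names.

```python
import re

# Matches the message format quoted in this thread; other kernel
# versions may word the line differently (assumption).
CSUM_RE = re.compile(
    r"csum failed ino (?P<ino>\d+) off (?P<off>\d+) "
    r"csum (?P<found>\d+)\s+expected csum (?P<expected>\d+)"
)


def parse_csum_failure(text):
    """Return the numeric fields of a 'csum failed' message, or None."""
    m = CSUM_RE.search(text)
    if m is None:
        return None
    return {name: int(value) for name, value in m.groupdict().items()}


def try_read(path, offset, length=4096):
    """Re-read one region of a file; return (ok, data_or_error).

    A read that still fails checksum verification is expected to
    surface as an OSError (an I/O error) rather than as bad data.
    """
    try:
        with open(path, "rb") as f:
            f.seek(offset)
            return True, f.read(length)
    except OSError as err:
        return False, err
```

For the message above, the parsed offset 1876295680 equals 4096 * 458080, so
it falls exactly on a 4096-byte boundary. A successful re-read does not by
itself tell a repaired block from a transient error; it only shows that the
failure no longer reproduces.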


Re: CRC mismatch

2018-10-18 Thread Anton Shepelev
I wrote:

>What may be the reason for a CRC mismatch on a BTRFS file in
>a virtual machine:
>
>csum failed ino 175524 off 1876295680 csum 451760558
>expected csum 1446289185
>
>Shall I seek the culprit in the host machine or in the
>guest one?  Supposing the host machine healthy, what
>operations on the guest might have caused a CRC mismatch?

Thank you, Austin and Chris, for your replies.  While
describing the problem for the client, I tried again to copy
the corrupt file and this time it was copied without error,
which is of course scary because errors that miraculously
disappear may suddenly reappear in the same manner.

-- 
()  ascii ribbon campaign - against html e-mail
/\  http://preview.tinyurl.com/qcy6mjc [archived]


Re: CRC mismatch

2018-10-17 Thread Austin S. Hemmelgarn

On 2018-10-16 16:27, Chris Murphy wrote:
> On Tue, Oct 16, 2018 at 9:42 AM, Austin S. Hemmelgarn wrote:
>> On 2018-10-16 11:30, Anton Shepelev wrote:
>>>
>>> Hello, all
>>>
>>> What may be the reason for a CRC mismatch on a BTRFS file in
>>> a virtual machine:
>>>
>>> csum failed ino 175524 off 1876295680 csum 451760558
>>> expected csum 1446289185
>>>
>>> Shall I seek the culprit in the host machine or in the guest
>>> one?  Supposing the host machine healthy, what operations on
>>> the guest might have caused a CRC mismatch?
>>>
>> Possible causes include:
>>
>> * On the guest side:
>>   - Unclean shutdown of the guest system (not likely even if this did
>>     happen).
>>   - A kernel bug in the guest.
>>   - Something directly modifying the block device (also not very likely).
>>
>> * On the host side:
>>   - Unclean shutdown of the host system without properly flushing data from
>>     the guest.  Not likely unless you're using an actively unsafe caching
>>     mode for the guest's storage back-end.
>>   - At-rest data corruption in the storage back-end.
>>   - A bug in the host-side storage stack.
>>   - A transient error in the host-side storage stack.
>>   - A bug in the hypervisor.
>>   - Something directly modifying the back-end storage.
>>
>> Of these, the statistically most likely location for the issue is probably
>> the storage stack on the host.
>
> Is there still that O_DIRECT related "bug" (or more of a limitation)
> if the guest is using cache=none on the block device?

I had actually forgotten about this, and I'm not quite sure if it's
fixed or not.

> Anton what virtual machine tech are you using? qemu/kvm managed with
> virt-manager? The configuration affects host behavior; but the
> negative effect manifests inside the guest as corruption. If I
> remember correctly.





Re: CRC mismatch

2018-10-17 Thread Anton Shepelev
[I accidentally replied to Chris instead of the mailing list]
Chris Murphy:

>Is there still that O_DIRECT related "bug" (or more of a
>limitation) if the guest is using cache=none on the block
>device?

I know nothing about it.

>Anton what virtual machine tech are you using?  qemu/kvm
>managed with virt-manager?  The configuration affects host
>behavior; but the negative effect manifests inside the
>guest as corruption.  If I remember correctly.

This is a commercial system run inside VMware.



Re: CRC mismatch

2018-10-16 Thread Chris Murphy
On Tue, Oct 16, 2018 at 9:42 AM, Austin S. Hemmelgarn wrote:
> On 2018-10-16 11:30, Anton Shepelev wrote:
>>
>> Hello, all
>>
>> What may be the reason for a CRC mismatch on a BTRFS file in
>> a virtual machine:
>>
>> csum failed ino 175524 off 1876295680 csum 451760558
>> expected csum 1446289185
>>
>> Shall I seek the culprit in the host machine or in the guest
>> one?  Supposing the host machine healthy, what operations on
>> the guest might have caused a CRC mismatch?
>>
> Possible causes include:
>
> * On the guest side:
>   - Unclean shutdown of the guest system (not likely even if this did
> happen).
>   - A kernel bug in the guest.
>   - Something directly modifying the block device (also not very likely).
>
> * On the host side:
>   - Unclean shutdown of the host system without properly flushing data from
> the guest.  Not likely unless you're using an actively unsafe caching mode
> for the guest's storage back-end.
>   - At-rest data corruption in the storage back-end.
>   - A bug in the host-side storage stack.
>   - A transient error in the host-side storage stack.
>   - A bug in the hypervisor.
>   - Something directly modifying the back-end storage.
>
> Of these, the statistically most likely location for the issue is probably
> the storage stack on the host.

Is there still that O_DIRECT related "bug" (or more of a limitation)
if the guest is using cache=none on the block device?

Anton what virtual machine tech are you using? qemu/kvm managed with
virt-manager? The configuration affects host behavior; but the
negative effect manifests inside the guest as corruption. If I
remember correctly.

-- 
Chris Murphy


Re: CRC mismatch

2018-10-16 Thread Austin S. Hemmelgarn

On 2018-10-16 11:30, Anton Shepelev wrote:
> Hello, all
>
> What may be the reason for a CRC mismatch on a BTRFS file in
> a virtual machine:
>
> csum failed ino 175524 off 1876295680 csum 451760558
> expected csum 1446289185
>
> Shall I seek the culprit in the host machine or in the guest
> one?  Supposing the host machine healthy, what operations on
> the guest might have caused a CRC mismatch?


Possible causes include:

* On the guest side:
  - Unclean shutdown of the guest system (not likely even if this did
    happen).
  - A kernel bug in the guest.
  - Something directly modifying the block device (also not very likely).

* On the host side:
  - Unclean shutdown of the host system without properly flushing data
    from the guest.  Not likely unless you're using an actively unsafe
    caching mode for the guest's storage back-end.
  - At-rest data corruption in the storage back-end.
  - A bug in the host-side storage stack.
  - A transient error in the host-side storage stack.
  - A bug in the hypervisor.
  - Something directly modifying the back-end storage.

Of these, the statistically most likely location for the issue is 
probably the storage stack on the host.
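
Whichever layer is at fault, the mechanism behind the message is the same:
the checksum stored when the block was written no longer matches the
checksum of the bytes that came back on read. The following is a minimal
Python sketch of that check using CRC-32C (Castagnoli), the default BTRFS
data checksum. The 4096-byte block of zeros and the flipped bit are made up
for illustration, and the sketch is not claimed to reproduce the exact
values the kernel printed.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; fold in the polynomial if a 1 fell out.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


# Standard check value for CRC-32C.
assert crc32c(b"123456789") == 0xE3069283

# Illustrative data only: one block as written, then as read back
# with a single bit flipped somewhere along the storage path.
block = bytes(4096)
stored = crc32c(block)            # checksum recorded at write time

damaged = bytearray(block)
damaged[100] ^= 0x01              # a single flipped bit

# A CRC detects every single-bit error, so the values differ.
mismatch = crc32c(bytes(damaged)) != stored
```

The check only says that the data changed between write and read. It cannot
say which of the layers listed above changed it.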


CRC mismatch

2018-10-16 Thread Anton Shepelev
Hello, all

What may be the reason for a CRC mismatch on a BTRFS file in
a virtual machine:

   csum failed ino 175524 off 1876295680 csum 451760558
   expected csum 1446289185

Shall I seek the culprit in the host machine or in the guest
one?  Supposing the host machine healthy, what operations on
the guest might have caused a CRC mismatch?
