[BUG] btrfs csum failed on v3.9.0-rc7

2013-04-24 Thread Tom Gundersen
I'm having lots of problems with wrong checksums on the most recent
kernels. Note that this is not a regression as far as I know, just
more pronounced now than before (the increase in severity might be due
to changes in my setup).

I see that this was discussed on the ML a few months back, but it was
not clear to me if the problem is still open or if a solution should
have landed upstream.



This is what I'm seeing:

Pretty much on every reboot some (but not all) of the files written to
or created before the reboot are broken. If the offending files are
deleted / overwritten the problem goes away (at least until next
reboot when other files are affected). A random selection of dmesg |
grep btrfs is attached below.

As I can easily reproduce, please let me know how I can help debugging
further. For instance, how can I tell btrfs to ignore the checksum
error and give me the file it has anyway (to see if the file is
garbled, or just the checksum is wrong)?
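
One way to start answering the "garbled file or wrong checksum" question is to recompute a checksum by hand. The sketch below is a plain Python CRC-32C, assuming that btrfs data checksums of this era are CRC-32C over 4 KiB blocks and that the "csum" field in the log is the value computed from the data that was read back (the second point is an assumption, not something stated in this thread). It compares the CRC-32C of a zero-filled block against the value 2566472073 that recurs in the errors below; a match would suggest that zeroed blocks were read back rather than random garbage.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; fold in the polynomial if a 1 fell out.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


# Value that recurs in the "csum" field of the errors in this report.
LOGGED_CSUM = 2566472073

# Assumption: checksums cover 4096-byte blocks.
zero_block_csum = crc32c(bytes(4096))

print(f"crc32c of a zero-filled 4 KiB block: {zero_block_csum} "
      f"(0x{zero_block_csum:08x})")
print("matches the recurring logged csum:", zero_block_csum == LOGGED_CSUM)
```

The bitwise loop is slow but dependency-free; it is only meant for checking a handful of blocks by hand.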

My btrfs volume is made up of two partitions, and is split into three
subvolumes. When mounting the rootfs I see this in dmesg:

Apr 24 01:31:47 toms-air kernel: device fsid 0d7a2474-3523-413e-8611-1f489b8a9891 devid 1 transid 141284 /dev/sda4
Apr 24 01:31:47 toms-air kernel: device fsid 0d7a2474-3523-413e-8611-1f489b8a9891 devid 2 transid 141284 /dev/sda2
Apr 24 01:31:47 toms-air kernel: device fsid 0d7a2474-3523-413e-8611-1f489b8a9891 devid 2 transid 141284 /dev/sda2
Apr 24 01:31:47 toms-air kernel: device fsid 0d7a2474-3523-413e-8611-1f489b8a9891 devid 1 transid 141284 /dev/sda4
Apr 24 01:31:47 toms-air kernel: device fsid 0d7a2474-3523-413e-8611-1f489b8a9891 devid 2 transid 141284 /dev/sda2
Apr 24 01:31:47 toms-air kernel: btrfs: use ssd allocation scheme
Apr 24 01:31:47 toms-air kernel: btrfs: use lzo compression
Apr 24 01:31:47 toms-air kernel: btrfs: disk space caching is enabled
Apr 24 01:31:47 toms-air kernel: btrfs: bdev /dev/sda2 errs: wr 0, rd 0, flush 0, corrupt 2056270, gen 6
Apr 24 01:31:47 toms-air kernel: btrfs: bdev /dev/sda4 errs: wr 0, rd 0, flush 0, corrupt 2049061, gen 6
Apr 24 01:31:47 toms-air kernel: device fsid 0d7a2474-3523-413e-8611-1f489b8a9891 devid 2 transid 141284 /dev/sda2

And the output of findmnt is:

TARGET SOURCE                                    FSTYPE OPTIONS
/home  UUID=0d7a2474-3523-413e-8611-1f489b8a9891 btrfs  subvol=home,ssd,compress=lzo,x-systemd.automount,nofail
/var   UUID=0d7a2474-3523-413e-8611-1f489b8a9891 btrfs  subvol=var,ssd,compress=lzo
/usr   UUID=0d7a2474-3523-413e-8611-1f489b8a9891 btrfs  subvol=usr,ssd,compress=lzo



Errors reported in dmesg:

[10520.530437] btrfs csum failed ino 1988603 off 1277952 csum 2566472073 private 2887162790
[10520.535299] btrfs csum failed ino 1988542 off 172032 csum 1032373158 private 2555710917
[10520.535489] btrfs csum failed ino 1988542 off 172032 csum 1032373158 private 2555710917
[10520.536448] btrfs csum failed ino 1988542 off 307200 csum 2566472073 private 4189934277
[10521.404738] btrfs csum failed ino 1988603 off 1277952 csum 2566472073 private 2887162790
[10521.406514] btrfs csum failed ino 1988542 off 192512 csum 2359321615 private 259683409
[10521.407797] btrfs csum failed ino 1988542 off 372736 csum 2566472073 private 1399566794
[10521.620012] btrfs csum failed ino 1988603 off 1277952 csum 2566472073 private 2887162790
[10521.621371] btrfs csum failed ino 1988542 off 192512 csum 2359321615 private 259683409
[10521.622048] btrfs csum failed ino 1988542 off 372736 csum 2566472073 private 1399566794
[10546.115794] btrfs_readpage_end_io_hook: 26 callbacks suppressed
[10546.115806] btrfs csum failed ino 1988548 off 28672 csum 2066685480 private 49363816
[10546.116811] btrfs csum failed ino 1988548 off 28672 csum 2066685480 private 49363816
[10546.117847] btrfs csum failed ino 1988548 off 28672 csum 2066685480 private 49363816
[10546.118527] btrfs csum failed ino 1988548 off 28672 csum 2066685480 private 49363816
[10546.118910] btrfs csum failed ino 1988548 off 28672 csum 2066685480 private 49363816
[10546.119436] btrfs csum failed ino 1988548 off 28672 csum 2066685480 private 49363816
[10546.119856] btrfs csum failed ino 1988548 off 28672 csum 2066685480 private 49363816
[10546.120292] btrfs csum failed ino 1988548 off 28672 csum 2066685480 private 49363816
[10546.120683] btrfs csum failed ino 1988548 off 28672 csum 2066685480 private 49363816
[10546.121086] btrfs csum failed ino 1988548 off 28672 csum 2066685480 private 49363816
[10553.246253] btrfs_readpage_end_io_hook: 2 callbacks suppressed
[10553.246269] btrfs csum failed ino 114348 off 45056 csum 1787155441 private 2298707641
[10553.246541] btrfs csum failed ino 114348 off 45056 csum 1787155441 private 2298707641
[10554.761105] btrfs csum failed ino 1988542 off 372736 csum 2566472073 private 1399566794
[10554.762052] btrfs csum failed ino 1988603 off 1204224 csum 4217002373 private 516821494
[10605.966575] btrfs csum failed ino 1988548 off 28672 csum 1496083883 private 49363816
[10681.761222] btrfs csum
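
The error list above repeats the same few blocks many times, so grouping by inode and offset shows how many distinct blocks are affected. A minimal sketch in Python, assuming the old message format seen in this thread ("btrfs csum failed ino N off N csum N private N"); the function names are made up for illustration, and wrapped or truncated log lines are tolerated by collapsing whitespace before matching.

```python
import re
from collections import Counter

# Old-style message format as it appears in this thread.
CSUM_RE = re.compile(
    r"btrfs csum failed ino (\d+) off (\d+) csum (\d+) private (\d+)"
)


def parse_csum_failures(log_text):
    """Yield (ino, off, csum, private) tuples from dmesg-style text."""
    # Mail clients wrap long log lines, so join everything first.
    flat = " ".join(log_text.split())
    for match in CSUM_RE.finditer(flat):
        yield tuple(int(group) for group in match.groups())


def summarize(log_text):
    """Count how often each (inode, offset) block was reported."""
    return Counter((ino, off) for ino, off, _, _ in parse_csum_failures(log_text))


sample = """\
[10520.530437] btrfs csum failed ino 1988603 off 1277952 csum
2566472073 private 2887162790
[10520.535299] btrfs csum failed ino 1988542 off 172032 csum
1032373158 private 2555710917
[10520.535489] btrfs csum failed ino 1988542 off 172032 csum
1032373158 private 2555710917
[10681.761222] btrfs csum
"""

for (ino, off), count in sorted(summarize(sample).items()):
    print(f"ino {ino} off {off}: reported {count} time(s)")
```

The inode numbers can then be mapped back to paths with find -inum on the matching subvolume.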

Re: [BUG] btrfs csum failed on v3.9.0-rc7

2013-04-24 Thread Harald Glatt
On Wed, Apr 24, 2013 at 1:24 PM, Tom Gundersen t...@jklm.no wrote:
 I'm having lots of problems with wrong checksums on the most recent
 kernels. Note that this is not a regression as far as I know, just
 more pronounced now than before (the increase in severity might be due
 to changes in my setup).

 I see that this was discussed on the ML a few months back, but it was
 not clear to me if the problem is still open or if a solution should
 have landed upstream.



 This is what I'm seeing:

 Pretty much on every reboot some (but not all) of the files written to
 or created before the reboot are broken. If the offending files are
 deleted / overwritten the problem goes away (at least until next
 reboot when other files are affected). A random selection of dmesg |
 grep btrfs is attached below.

 As I can easily reproduce, please let me know how I can help debugging
 further. For instance, how can I tell btrfs to ignore the checksum
 error and give me the file it has anyway (to see if the file is
 garbled, or just the checksum is wrong)?

 My btrfs volume is made up of two partitions, and is split into three
 subvolumes. When mounting the rootfs I see this in dmesg:

 Apr 24 01:31:47 toms-air kernel: device fsid
 0d7a2474-3523-413e-8611-1f489b8a9891 devid 1 transid 141284 /dev/sda4
 Apr 24 01:31:47 toms-air kernel: device fsid
 0d7a2474-3523-413e-8611-1f489b8a9891 devid 2 transid 141284 /dev/sda2
 Apr 24 01:31:47 toms-air kernel: device fsid
 0d7a2474-3523-413e-8611-1f489b8a9891 devid 2 transid 141284 /dev/sda2
 Apr 24 01:31:47 toms-air kernel: device fsid
 0d7a2474-3523-413e-8611-1f489b8a9891 devid 1 transid 141284 /dev/sda4
 Apr 24 01:31:47 toms-air kernel: device fsid
 0d7a2474-3523-413e-8611-1f489b8a9891 devid 2 transid 141284 /dev/sda2
 Apr 24 01:31:47 toms-air kernel: btrfs: use ssd allocation scheme
 Apr 24 01:31:47 toms-air kernel: btrfs: use lzo compression
 Apr 24 01:31:47 toms-air kernel: btrfs: disk space caching is enabled
 Apr 24 01:31:47 toms-air kernel: btrfs: bdev /dev/sda2 errs: wr 0, rd
 0, flush 0, corrupt 2056270, gen 6
 Apr 24 01:31:47 toms-air kernel: btrfs: bdev /dev/sda4 errs: wr 0, rd
 0, flush 0, corrupt 2049061, gen 6
 Apr 24 01:31:47 toms-air kernel: device fsid
 0d7a2474-3523-413e-8611-1f489b8a9891 devid 2 transid 141284 /dev/sda2

 And the output of findmnt is:

 TARGET SOURCEFSTYPE OPTIONS
 /home  UUID=0d7a2474-3523-413e-8611-1f489b8a9891 btrfs
 subvol=home,ssd,compress=lzo,x-systemd.automount,nofail
 /var   UUID=0d7a2474-3523-413e-8611-1f489b8a9891 btrfs
 subvol=var,ssd,compress=lzo
 /usr   UUID=0d7a2474-3523-413e-8611-1f489b8a9891 btrfs
 subvol=usr,ssd,compress=lzo



 Errors reported in dmesg:

 [10520.530437] btrfs csum failed ino 1988603 off 1277952 csum
 2566472073 private 2887162790
 [10520.535299] btrfs csum failed ino 1988542 off 172032 csum
 1032373158 private 2555710917
 [10520.535489] btrfs csum failed ino 1988542 off 172032 csum
 1032373158 private 2555710917
 [10520.536448] btrfs csum failed ino 1988542 off 307200 csum
 2566472073 private 4189934277
 [10521.404738] btrfs csum failed ino 1988603 off 1277952 csum
 2566472073 private 2887162790
 [10521.406514] btrfs csum failed ino 1988542 off 192512 csum
 2359321615 private 259683409
 [10521.407797] btrfs csum failed ino 1988542 off 372736 csum
 2566472073 private 1399566794
 [10521.620012] btrfs csum failed ino 1988603 off 1277952 csum
 2566472073 private 2887162790
 [10521.621371] btrfs csum failed ino 1988542 off 192512 csum
 2359321615 private 259683409
 [10521.622048] btrfs csum failed ino 1988542 off 372736 csum
 2566472073 private 1399566794
 [10546.115794] btrfs_readpage_end_io_hook: 26 callbacks suppressed
 [10546.115806] btrfs csum failed ino 1988548 off 28672 csum 2066685480
 private 49363816
 [10546.116811] btrfs csum failed ino 1988548 off 28672 csum 2066685480
 private 49363816
 [10546.117847] btrfs csum failed ino 1988548 off 28672 csum 2066685480
 private 49363816
 [10546.118527] btrfs csum failed ino 1988548 off 28672 csum 2066685480
 private 49363816
 [10546.118910] btrfs csum failed ino 1988548 off 28672 csum 2066685480
 private 49363816
 [10546.119436] btrfs csum failed ino 1988548 off 28672 csum 2066685480
 private 49363816
 [10546.119856] btrfs csum failed ino 1988548 off 28672 csum 2066685480
 private 49363816
 [10546.120292] btrfs csum failed ino 1988548 off 28672 csum 2066685480
 private 49363816
 [10546.120683] btrfs csum failed ino 1988548 off 28672 csum 2066685480
 private 49363816
 [10546.121086] btrfs csum failed ino 1988548 off 28672 csum 2066685480
 private 49363816
 [10553.246253] btrfs_readpage_end_io_hook: 2 callbacks suppressed
 [10553.246269] btrfs csum failed ino 114348 off 45056 csum 1787155441
 private 2298707641
 [10553.246541] btrfs csum failed ino 114348 off 45056 csum 1787155441
 private 2298707641
 [10554.761105] btrfs csum failed ino 1988542 off 372736 csum
 2566472073 private 1399566794
 [10554.762052] btrfs csum failed ino 1988603

btrfs csum failed, scrub ok

2012-03-27 Thread Christoph Groth
I have a freshly installed system with btrfs as the root file system.
The machine is running linux 3.2.  The raid1 btrfs file system lives on
two new hard drives.

About one day after installation the following message appeared in
kern.log.  There were no other errors.

root@mim:/var/log# grep 'btrfs.*fail' kern.log
Mar 27 01:07:46 mim kernel: [ 6480.233861] btrfs csum failed ino 453509 off 1495040 csum 3301532933 private 4156998194
Mar 27 01:07:46 mim kernel: [ 6480.234470] btrfs csum failed ino 453509 off 1499136 csum 1873118812 private 3512102188
Mar 27 01:07:46 mim kernel: [ 6480.234572] btrfs csum failed ino 453509 off 1503232 csum 1034640717 private 2041007647
Mar 27 01:07:46 mim kernel: [ 6480.234670] btrfs csum failed ino 453509 off 1507328 csum 889729013 private 2342095239
Mar 27 01:07:46 mim kernel: [ 6480.237977] btrfs csum failed ino 453509 off 1503232 csum 1518679450 private 2041007647
Mar 27 01:07:46 mim kernel: [ 6480.238149] btrfs csum failed ino 453509 off 1507328 csum 889729013 private 2342095239
Mar 27 01:07:46 mim kernel: [ 6480.238330] btrfs csum failed ino 453509 off 1495040 csum 3234580989 private 4156998194
Mar 27 01:07:46 mim kernel: [ 6480.238447] btrfs csum failed ino 453509 off 1499136 csum 1873118812 private 3512102188
Mar 27 01:07:46 mim kernel: [ 6480.243873] btrfs csum failed ino 453509 off 1503232 csum 2184012753 private 2041007647
Mar 27 01:07:46 mim kernel: [ 6480.243962] btrfs csum failed ino 453509 off 1507328 csum 240604621 private 2342095239

inode 453509 belongs to a file installed by dpkg

root@mim:/# find / -inum 453509 -ls
453509 1976 -rw-r--r--   1 root root  2020832 Mar  7 21:11 /usr/lib/libreoffice/basis3.4/program/libsblx.so

That file seems to be ok, there are no errors when re-reading it.

A scrub done the morning after the incident also didn't find any
problems:

root@mim:/home/cwg# btrfs scrub status /
scrub status for 2da00153-f9ea-4d6c-a6cc-10c913d22686
        scrub started at Tue Mar 27 10:37:49 2012 and finished after 3921 seconds
        total bytes scrubbed: 550.20GB with 0 errors

Also inspecting the SMART status of the hard drives does not reveal any
problems.

Is this a bug in btrfs, or am I supposed to be afraid that the new hard
drives are not working reliably?  Or could this be the effect of some
cosmic ray hitting my machine?  (It doesn't have ECC.)  Or is it normal
that hard drives sometimes make errors?  (In that case the additional
layer of btrfs checksumming seems to be a very good thing.)

Christoph
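
The two manual steps in this message (mapping the inode from the log to a path, then re-reading the file) can be sketched in Python as below. The helper names are hypothetical. Two caveats apply: inode numbers are only unique per filesystem (on btrfs, per subvolume), and a successful re-read may be served from the page cache rather than from disk, so it does not prove the on-disk copy is good.

```python
import hashlib
import os
import tempfile


def find_by_inode(root, inode):
    """Return paths under root whose inode number equals `inode`.

    Roughly what `find root -inum N` does, without any -xdev handling.
    """
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_ino == inode:
                    matches.append(path)
            except OSError:
                continue  # entry vanished or is not accessible
    return matches


def reread_sha256(path, chunk_size=1 << 20):
    """Read the whole file and hash it; an unreadable block raises OSError."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Small self-contained demo on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "a.bin")
    with open(demo_path, "wb") as handle:
        handle.write(b"hello")
    found = find_by_inode(tmp, os.lstat(demo_path).st_ino)
    digest = reread_sha256(demo_path)
    print(found, digest)
```

Comparing the digest against a known-good value (for a packaged file, the package manager's recorded checksum) says more than a read that merely completes without error.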

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs csum failed, scrub ok

2012-03-27 Thread Roman Mamedov
On Tue, 27 Mar 2012 12:57:31 +0200
Christoph Groth c...@falma.de wrote:

 root@mim:/# find / -inum 453509 -ls
 453509 1976 -rw-r--r--   1 root root  2020832 Mar  7 21:11 
 /usr/lib/libreoffice/basis3.4/program/libsblx.so
 
 That file seems to be ok, there are no errors when re-reading it.

How about

$ sudo apt-get install debsums
$ debsums libreoffice-core | grep libsblx.so

-- 
With respect,
Roman

~~~
Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free.




Re: btrfs csum failed, scrub ok

2012-03-27 Thread Christoph Groth
Roman Mamedov r...@romanrm.ru writes:

 On Tue, 27 Mar 2012 12:57:31 +0200
 Christoph Groth c...@falma.de wrote:

 root@mim:/# find / -inum 453509 -ls
 453509 1976 -rw-r--r--   1 root root  2020832 Mar  7 21:11 
 /usr/lib/libreoffice/basis3.4/program/libsblx.so
 
 That file seems to be ok, there are no errors when re-reading it.

 How about

 $ sudo apt-get install debsums
 $ debsums libreoffice-core | grep libsblx.so

Good idea!

$ debsums libreoffice-core | grep libsblx.so
/usr/lib/libreoffice/basis3.4/program/libsblx.so  OK

I'm still puzzled by this incident.

Christoph



Re: btrfs csum failed, scrub ok

2012-03-27 Thread cwillu
On Tue, Mar 27, 2012 at 4:57 AM, Christoph Groth c...@falma.de wrote:
 I have a freshly installed system with btrfs as the root file system.
 The machine is running linux 3.2.  The raid1 btrfs file system lives on
 two new hard drives.

 About one day after installation the following message appeared in
 kern.log.  There were no other errors.

 root@mim:/var/log# grep 'btrfs.*fail' kern.log
 Mar 27 01:07:46 mim kernel: [ 6480.233861] btrfs csum failed ino 453509 off 
 1495040 csum 3301532933 private 4156998194
 Mar 27 01:07:46 mim kernel: [ 6480.234470] btrfs csum failed ino 453509 off 
 1499136 csum 1873118812 private 3512102188
 Mar 27 01:07:46 mim kernel: [ 6480.234572] btrfs csum failed ino 453509 off 
 1503232 csum 1034640717 private 2041007647
 Mar 27 01:07:46 mim kernel: [ 6480.234670] btrfs csum failed ino 453509 off 
 1507328 csum 889729013 private 2342095239
 Mar 27 01:07:46 mim kernel: [ 6480.237977] btrfs csum failed ino 453509 off 
 1503232 csum 1518679450 private 2041007647
 Mar 27 01:07:46 mim kernel: [ 6480.238149] btrfs csum failed ino 453509 off 
 1507328 csum 889729013 private 2342095239
 Mar 27 01:07:46 mim kernel: [ 6480.238330] btrfs csum failed ino 453509 off 
 1495040 csum 3234580989 private 4156998194
 Mar 27 01:07:46 mim kernel: [ 6480.238447] btrfs csum failed ino 453509 off 
 1499136 csum 1873118812 private 3512102188
 Mar 27 01:07:46 mim kernel: [ 6480.243873] btrfs csum failed ino 453509 off 
 1503232 csum 2184012753 private 2041007647
 Mar 27 01:07:46 mim kernel: [ 6480.243962] btrfs csum failed ino 453509 off 
 1507328 csum 240604621 private 2342095239

 inode 453509 belongs to a file installed by dpkg

 root@mim:/# find / -inum 453509 -ls
 453509 1976 -rw-r--r--   1 root     root      2020832 Mar  7 21:11 
 /usr/lib/libreoffice/basis3.4/program/libsblx.so

 That file seems to be ok, there are no errors when re-reading it.

 A scrub done the morning after the incident also didn't find any
 problems:

 root@mim:/home/cwg# btrfs scrub status /
 scrub status for 2da00153-f9ea-4d6c-a6cc-10c913d22686
        scrub started at Tue Mar 27 10:37:49 2012 and finished after 3921 
 seconds
        total bytes scrubbed: 550.20GB with 0 errors

If btrfs is able to find a good copy, it will fix the bad copy automatically.


Re: btrfs csum failed, scrub ok

2012-03-27 Thread Jan Schmidt
On 27.03.2012 18:24, cwillu wrote:
 On Tue, Mar 27, 2012 at 4:57 AM, Christoph Groth c...@falma.de wrote:
 A scrub done the morning after the incident also didn't find any
 problems:

 root@mim:/home/cwg# btrfs scrub status /
 scrub status for 2da00153-f9ea-4d6c-a6cc-10c913d22686
scrub started at Tue Mar 27 10:37:49 2012 and finished after 3921 
 seconds
total bytes scrubbed: 550.20GB with 0 errors
 
 If btrfs is able to find a good copy, it will fix the bad copy automatically.

It does mention this in your logs, though. Grep for "repair"; if it
doesn't occur, btrfs didn't repair any failures.

Scrub would normally find and count checksum errors, though.

-Jan


Re: btrfs csum failed, scrub ok

2012-03-27 Thread Christoph Groth
Jan Schmidt list.bt...@jan-o-sch.net writes:
 On 27.03.2012 18:24, cwillu wrote:
 On Tue, Mar 27, 2012 at 4:57 AM, Christoph Groth c...@falma.de wrote:
 A scrub done the morning after the incident also didn't find any
 problems:

 root@mim:/home/cwg# btrfs scrub status /
 scrub status for 2da00153-f9ea-4d6c-a6cc-10c913d22686
scrub started at Tue Mar 27 10:37:49 2012 and finished after 3921 
 seconds
total bytes scrubbed: 550.20GB with 0 errors
 
 If btrfs is able to find a good copy, it will fix the bad copy automatically.

 It does mention this in your logs, though. Grep for repair, if it
 doesn't occur, btrfs didn't repair any failures.

"repair" doesn't occur in the logs.  Actually, there are no other
entries from btrfs.

So why didn't btrfs try to repair a block it believed to be bad?



Re: btrfs csum failed

2011-05-04 Thread Martin Schitter

On 2011-05-04 04:18, Fajar A. Nugraha wrote:

  could you give me some advice how to debug/report this specific
  problem more precise?

 If it's not reproducible then I'd suspect it'd be hard to do.

the last working snapshot is from 2011-05-02-17:13. i can reproduce this
file system corruption on one specific file in any later hourly snapshot.

whenever i do a simple:

  cat snapshot-2011-05-02-18:13/sata-images/image_xy.raw > /dev/null

i get an Input/output error and the quoted debug messages in dmesg and
the kernel log.

could this be a useful starting point for further investigation?

 Usually checksum errors is early sign of hardware failure (most
 common are disk or power supply).

that looks very implausible to me. there is a RAID1 layer beneath btrfs
in our setup and i don't see any errors there.

and the 'nodatasum' option should also ignore csum issues, shouldn't it?

martin


Re: btrfs csum failed

2011-05-04 Thread Hugo Mills
On Wed, May 04, 2011 at 01:39:46PM +0200, Martin Schitter wrote:
 and the 'nodatasum' option should also ignore csum issues.-- isn't it?

   No, nodatasum will prevent newly-written data from being
checksummed.  However, if a checksum already exists (because the data
was written to a filesystem mounted without the nodatasum option),
btrfs will still verify the checksum, regardless of the current
setting of nodatasum.

   There is currently no way of preventing btrfs from verifying
checksums if they exist; I don't believe that there's any way of
removing an existing checksum, either.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Charting the inexorable advance of Western syphilisation... ---   




Re: btrfs csum failed

2011-05-04 Thread cwillu
On Wed, May 4, 2011 at 5:39 AM, Martin Schitter m...@mur.at wrote:
 On 2011-05-04 04:18, Fajar A. Nugraha wrote:

 could you give me some advice how to debug/report this specific
 problem more

 precise?

 If it's not reproducible then I'd suspect it'd be hard to do.

 the last working snapshot is from 2011-05-02-17:13. i can reproduce this
 file system corruption on one specific file in any hourly snapshot later.

That's not surprising, any later snapshots will be sharing the same
corrupted block.

 that looks very unplausible to me. there is an RAID1 layer beneath btrfs in
 our setup and i don't see any errors there.

That doesn't rule out the possibility of corruption when it was
written in the first place, or some similar problem that the raid1
faithfully reproduced on both mirrors.  That's not to say that it's
impossible that the problem is in btrfs, just that it's not the only
plausible possibility.

 and the 'nodatasum' option should also ignore csum issues.-- isn't it?

No, it only affects writing new checksums; any existing checksums are
still checked.
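
The remark above that later snapshots share the same corrupted block follows from copy-on-write: a snapshot records references to existing extents rather than copying data. The sketch below is a toy Python model of that idea only; the class and its fields are invented for illustration and do not reflect btrfs's actual on-disk structures.

```python
class ToyVolume:
    """Toy copy-on-write volume: the live tree maps block index -> extent id."""

    def __init__(self):
        self.extents = {}    # extent id -> data
        self.live = {}       # block index -> extent id
        self.snapshots = {}  # snapshot name -> frozen copy of the live mapping
        self._next_id = 0

    def write(self, block, data):
        # Every write allocates a new extent; old extents stay referenced
        # by whichever snapshots already point at them.
        self.extents[self._next_id] = data
        self.live[block] = self._next_id
        self._next_id += 1

    def snapshot(self, name):
        # A snapshot copies references, not data.
        self.snapshots[name] = dict(self.live)

    def snapshots_using(self, extent_id):
        return sorted(
            name for name, mapping in self.snapshots.items()
            if extent_id in mapping.values()
        )


vol = ToyVolume()
vol.write(0, b"good data")      # extent 0
vol.snapshot("17:13")
vol.write(0, b"bad data")       # extent 1, imagine this one fails its checksum
vol.snapshot("18:13")
vol.snapshot("19:13")

print("snapshots sharing extent 1:", vol.snapshots_using(1))
print("snapshots sharing extent 0:", vol.snapshots_using(0))
```

In this model every snapshot taken after the bad write references the same extent until that block is rewritten, which is consistent with the 17:13 snapshot being the last good one.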


Re: btrfs csum failed

2011-05-04 Thread Martin Schitter

On 2011-05-04 13:51, cwillu wrote:

  that looks very unplausible to me. there is an RAID1 layer beneath btrfs in
  our setup and i don't see any errors there.

 That doesn't rule out the possibility of corruption when it was
 written in the first place, or some similar problem that the raid1
 faithfully reproduced on both mirrors.  That's not to say that it's
 impossible that the problem is in btrfs, just that it's not the only
 plausible possibility.

well -- i am doing a backup of all images every night. this process
should work like a simple scrub because all data (and its checksums)
will be read. that's how i stumbled over this problem!

  and the 'nodatasum' option should also ignore csum issues.-- isn't it?

 No, it only affects writing new checksums; any existing checksums are
 still checked.

would it make sense to remount the volume with checksumming enabled
and run additional tests to find similar suspect blocks, to prevent this
kind of suddenly broken file?


martin
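
The nightly "read everything" check described in this message can be sketched as a small Python walker that reads every regular file and records where a read fails (for example with EIO on a checksum mismatch). The function name is hypothetical. This only exercises whichever copy the kernel happens to return, so it is weaker than btrfs scrub, which reads all copies.

```python
import errno
import os
import tempfile


def read_everything(root, chunk_size=1 << 20):
    """Read every regular file under root.

    Returns (bytes_read, failures), where failures is a list of
    (path, offset_reached, errno_name) for files whose read raised OSError.
    """
    total = 0
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, device nodes
            offset = 0
            try:
                with open(path, "rb") as handle:
                    while True:
                        chunk = handle.read(chunk_size)
                        if not chunk:
                            break
                        offset += len(chunk)
            except OSError as exc:
                name_of_errno = errno.errorcode.get(exc.errno, str(exc.errno))
                failures.append((path, offset, name_of_errno))
            total += offset
    return total, failures


# Demo on a throwaway directory with two small files.
with tempfile.TemporaryDirectory() as tmp:
    for fname, payload in (("a", b"12345"), ("b", b"1234567")):
        with open(os.path.join(tmp, fname), "wb") as handle:
            handle.write(payload)
    total_bytes, failed = read_everything(tmp)
    print(total_bytes, failed)
```

The recorded offset is only as precise as the chunk size; a smaller chunk_size narrows down the failing region at the cost of speed.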


Re: btrfs csum failed

2011-05-04 Thread Chris Mason
Excerpts from Martin Schitter's message of 2011-05-03 17:56:32 -0400:
 since my last debian kernel-update to 2.6.38-2-amd64 i got troubles with 
 csum failures. it's a volume full of huge kvm-images on md-RAID1 and 
 LVM, so i used the mount options: 'noatime,nodatasum' to maximize the 
 performance.
 
 it happened two weeks ago for the fist time. and now again a kvm-image 
 isn't readable again. i have to use an older snapshot to substitute the 
 virtual machine.
 
 this are the entries in dmesg/kernel-log on any access:
 ...
   [2412668.409442] btrfs csum failed ino 258 off 2331529216 csum 
 3632892464 private 2115348581
 ...
 
 it's a production machine, so i can not make to much experiments on it.
 do you see an obvious way to solve this problem?

What OS is inside these virtual machines?  The btrfs unstable tree has
some fixes for windows based OSes.

Is your kvm config using O_DIRECT?

I've also got patches here that force us to honor nodatasum even when
the file has csums, that can help if the contents of the file are
actually good.

-chris


Re: btrfs csum failed

2011-05-04 Thread Kaspar Schleiser

Hey Martin,

On 05/04/11 13:39, Martin Schitter wrote:

  Usually checksum errors is early sign of hardware failure (most
  common are disk or power supply).

 that looks very unplausible to me. there is an RAID1 layer beneath btrfs
 in our setup and i don't see any errors there.

Is the btrfs RAID1 itself inside a virtual machine? I've had data
corruption with virtio block devices > 1TB on early squeeze kernels.


Kaspar


Re: btrfs csum failed

2011-05-04 Thread Martin Schitter

On 2011-05-04 14:31, Kaspar Schleiser wrote:

 Is the btrfs RAID1 itself inside a virtual machine? I've had data
 corruption with virtio block devices > 1TB on early squeeze kernels.

no -- it's on the (native) host side. and we use a very recent kernel
from debian 'testing' (2.6.38-2).


martin


Re: btrfs csum failed

2011-05-04 Thread Martin Schitter

On 2011-05-04 14:39, Chris Mason wrote:

 What OS is inside these virtual machines?  The btrfs unstable tree has
 some fixes for windows based OSes.

we have only linux guests of different flavors, no windows guests.

both corruptions during the last weeks affected different virtual
block device images of the same guest instance.

 Is your kvm config using O_DIRECT?

yes -- the kvm/qemu option cache=none implies O_DIRECT.

 I've also got patches here that force us to honor nodatasum even when
 the file has csums, that can help if the contents of the file are
 actually good.

that sounds interesting! in our case it may be easier to use some
recent backup data, but it could be very helpful in similar situations.

i would really like to help isolate the cause of this failure and
find a practical strategy to prevent additional breakdowns.


thanks
martin


btrfs csum failed

2011-05-03 Thread Martin Schitter
since my last debian kernel update to 2.6.38-2-amd64 i have had trouble with
csum failures. it's a volume full of huge kvm images on md-RAID1 and
LVM, so i used the mount options 'noatime,nodatasum' to maximize
performance.

it happened two weeks ago for the first time, and now a kvm image
is unreadable again. i have to use an older snapshot to replace the
virtual machine.

these are the entries in dmesg/kernel-log on any access:
...
 [2412668.409442] btrfs csum failed ino 258 off 2331529216 csum 3632892464 private 2115348581
...

it's a production machine, so i cannot do too many experiments on it.
do you see an obvious way to solve this problem?

thanks!
martin


Re: btrfs csum failed

2011-05-03 Thread Martin Schitter

On 2011-05-04 02:28, Josef Bacik wrote:

 Wait why are you running with btrfs in production?

do you know a better alternative for continuous snapshots? :)

it has worked surprisingly well for more than a year.
well, the performance could be better for vm-image-hosting, but it works.

we used cache='writeback' for a long time but now all virtual instances
have cache='none' set.

 What OS is in this vm image?

2.6.30-bpo.1-amd64 with virtio-driver

could you give me some advice on how to debug/report this specific problem
more precisely?


thanks
martin


Re: btrfs csum failed

2011-05-03 Thread Fajar A. Nugraha
On Wed, May 4, 2011 at 7:44 AM, Martin Schitter m...@mur.at wrote:
 On 2011-05-04 02:28, Josef Bacik wrote:

 Wait why are you running with btrfs in production?

 do you know a better alternative for continuous snapshots? :)

zfs :D


 it works surprisingly well since more than a year.
 well the performance could be better for vm-image-hosting but it works.

 we used cache='writeback' for a long time but now all virtual instances have
 set cache='none'

 What OS is in this vm image?

 2.6.30-bpo.1-amd64 with virtio-driver

 could you give me some advice how to debug/report this specific problem more
 precise?

If it's not reproducible then I'd suspect it'd be hard to do.

Usually checksum errors are an early sign of hardware failure (most
commonly the disk or the power supply).

-- 
Fajar


recurring btrfs csum failed

2011-03-24 Thread Tomasz Chmielewski
I had a system freeze for some reason with 2.6.38.

After a hard reboot, I discovered that some of the files on a btrfs RAID-1
filesystem (KVM images that were in use when the crash happened) are corrupted:

btrfs csum failed ino 257 off 120180736 csum 4246715593 private 48329
btrfs csum failed ino 257 off 120180736 csum 4246715593 private 48329
btrfs csum failed ino 257 off 120180736 csum 4246715593 private 48329


Not being in the mood to find out whether btrfs would try the other device
in the mirror, I decided to remove the corrupted file and copy over a
previous version stored on an ext3 filesystem.

The file copied fine, but to my surprise, the new file is still corrupted:

# md5sum vm-113-disk-1.raw 
md5sum: vm-113-disk-1.raw: Input/output error


Errors reported by btrfs are slightly different now:

btrfs csum failed ino 260 extent 21968855040 csum 582168802 wanted 1727644489 mirror 1
btrfs csum failed ino 260 extent 21948932096 csum 582168802 wanted 1727644489 mirror 2
btrfs csum failed ino 260 extent 21968855040 csum 582168802 wanted 1727644489 mirror 1
btrfs csum failed ino 260 extent 21968855040 csum 582168802 wanted 1727644489 mirror 1
btrfs csum failed ino 260 extent 21948932096 csum 582168802 wanted 1727644489 mirror 2
btrfs csum failed ino 260 extent 21968855040 csum 582168802 wanted 1727644489 mirror 1
btrfs csum failed ino 260 extent 21948932096 csum 582168802 wanted 1727644489 mirror 2
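
These newer-style messages name the mirror, so they can be grouped to see whether every copy fails and whether the copies produce the same computed checksum. A minimal Python sketch, assuming exactly the message format quoted above; the function name is made up for illustration.

```python
import re
from collections import defaultdict

# Message format as quoted in this report.
MIRROR_RE = re.compile(
    r"btrfs csum failed ino (\d+) extent (\d+) "
    r"csum (\d+) wanted (\d+) mirror (\d+)"
)


def failures_by_mirror(log_text):
    """Map (ino, wanted) -> set of (mirror, extent, computed_csum)."""
    flat = " ".join(log_text.split())  # undo mail line wrapping
    seen = defaultdict(set)
    for match in MIRROR_RE.finditer(flat):
        ino, extent, csum, wanted, mirror = map(int, match.groups())
        seen[(ino, wanted)].add((mirror, extent, csum))
    return seen


sample = """\
btrfs csum failed ino 260 extent 21968855040 csum 582168802 wanted 1727644489
mirror 1
btrfs csum failed ino 260 extent 21948932096 csum 582168802 wanted 1727644489
mirror 2
btrfs csum failed ino 260 extent 21968855040 csum 582168802 wanted 1727644489
mirror 1
"""

result = failures_by_mirror(sample)
for (ino, wanted), entries in result.items():
    mirrors = sorted(mirror for mirror, _, _ in entries)
    computed = {csum for _, _, csum in entries}
    print(f"ino {ino}: wanted {wanted}, failing mirrors {mirrors}, "
          f"distinct computed csums {len(computed)}")
```

One possible reading, offered with caution: when both mirrors fail with the same computed checksum, the two copies presumably hold identical bytes, which points away from a single disk silently corrupting its own copy.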



btrfs is mounted with these flags:

/dev/sdc on /mnt/btrfs type btrfs (rw,noatime,compress-force=lzo,device=/dev/sdc,device=/dev/sdd)


I don't need to recover the file, just trying to signal something doesn't work 
well here!

-- 
Tomasz Chmielewski
http://wpkg.org


Re: btrfs csum failed on git .pack file

2009-09-17 Thread Markus Trippelsdorf
On Thu, Sep 17, 2009 at 08:44:56AM +0200, Jens Axboe wrote:
 On Thu, Sep 17 2009, Markus Trippelsdorf wrote:
  On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe wrote:
   On Mon, Sep 07 2009, Markus Trippelsdorf wrote:
Just got this error today in my dmesg:
btrfs csum failed ino 1483065 off 158482432 csum 4283543305 private 
43905798

linux % find . -inum 1483065
./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack

It's the main pack file from my git linux kernel tree:

   
   Hmm, I ran into something very similar. Care to check what the corrupted
   block of data looks like (and how big it is)?
  
  I've hit the same problem again today:
  
  btrfs csum failed ino 1826333 off 150208512 csum 4148434891 private 
  1660028275
  
  The file in question is:
  ./.git/objects/pack/pack-a2330b703d5a7fd62626b39a5fdfb6eecf739d0d.pack
  
  I can't read the file directly, because of the csum mismatch:
 
 Chris, is there a way to force reading the file? Seems like that would
 be a very handy feature.
 
 Markus, not sure if that works, but you could always try and remount
 with data checksumming disabled.
 
 mount /dev/fooX -o remount,rw,nodatasum
 
 should do the trick.

That doesn't work unfortunately, btrfs still calculates and compares the
checksums (it won't write new ones I guess).

-- 
Markus


Re: btrfs csum failed on git .pack file

2009-09-17 Thread Jens Axboe
On Thu, Sep 17 2009, Markus Trippelsdorf wrote:
 On Thu, Sep 17, 2009 at 08:44:56AM +0200, Jens Axboe wrote:
  On Thu, Sep 17 2009, Markus Trippelsdorf wrote:
   On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe wrote:
On Mon, Sep 07 2009, Markus Trippelsdorf wrote:
 Just got this error today in my dmesg:
 btrfs csum failed ino 1483065 off 158482432 csum 4283543305 private 
 43905798
 
 linux % find . -inum 1483065
 ./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
 
 It's the main pack file from my git linux kernel tree:
 

Hmm, I ran into something very similar. Care to check what the corrupted
block of data looks like (and how big it is)?
   
   I've hit the same problem again today:
   
   btrfs csum failed ino 1826333 off 150208512 csum 4148434891 private 
   1660028275
   
   The file in question is:
   ./.git/objects/pack/pack-a2330b703d5a7fd62626b39a5fdfb6eecf739d0d.pack
   
   I can't read the file directly, because of the csum mismatch:
  
  Chris, is there a way to force reading the file? Seems like that would
  be a very handy feature.
  
  Markus, not sure if that works, but you could always try and remount
  with data checksumming disabled.
  
  mount /dev/fooX -o remount,rw,nodatasum
  
  should do the trick.
 
 That doesn't work unfortunately, btrfs still calculates and compares the
 checksums (it won't write new ones I guess).

Ah ok, as mentioned I wasn't sure whether that would work or not. I'll
defer to Chris :-)

-- 
Jens Axboe



Re: btrfs csum failed on git .pack file

2009-09-17 Thread Markus Trippelsdorf
On Thu, Sep 17, 2009 at 11:05:49AM +0200, Jens Axboe wrote:
 On Thu, Sep 17 2009, Markus Trippelsdorf wrote:
  On Thu, Sep 17, 2009 at 08:44:56AM +0200, Jens Axboe wrote:
   On Thu, Sep 17 2009, Markus Trippelsdorf wrote:
On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe wrote:
 On Mon, Sep 07 2009, Markus Trippelsdorf wrote:
  Just got this error today in my dmesg:
  btrfs csum failed ino 1483065 off 158482432 csum 4283543305 private 
  43905798
  
  linux % find . -inum 1483065
  ./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
  
  It's the main pack file from my git linux kernel tree:
  
 
 Hmm, I ran into something very similar. Care to check what the 
 corrupted
 block of data looks like (and how big it is)?

I've hit the same problem again today:

btrfs csum failed ino 1826333 off 150208512 csum 4148434891 private 
1660028275

The file in question is:
./.git/objects/pack/pack-a2330b703d5a7fd62626b39a5fdfb6eecf739d0d.pack

I can't read the file directly, because of the csum mismatch:
   
   Chris, is there a way to force reading the file? Seems like that would
   be a very handy feature.
   
   Markus, not sure if that works, but you could always try and remount
   with data checksumming disabled.
   
   mount /dev/fooX -o remount,rw,nodatasum
   
   should do the trick.
  
  That doesn't work unfortunately, btrfs still calculates and compares the
  checksums (it won't write new ones I guess).
 
 Ah ok, as mentioned I wasn't sure whether that would work or not. I'll
 defer to Chris :-)

Understood.

I did some further investigations and was able to reconstruct exactly
the same pack file in question by starting from an older backup copy of
my git repo and then running the same git commands as before. 
Then I did a binary comparison between this reconstructed file and a
corrupted backup copy from the time before the csum errors occurred (I
automatically back up every 4h).

This is the result (first line good pack file, second line corrupted
file):

vbindiff 
debug/.git/objects/pack/pack-a2330b703d5a7fd62626b39a5fdfb6eecf739d0d.pack 
debug2/.git/objects/pack/pack-a2330b703d5a7fd62626b39a5fdfb6eecf739d0d.pack

0130 9FA0: E2 3B 43 AA 63 BF 28 B3  87 B7 FD AB DA 74 2D 1C
0130 9FA0: E2 3B 43 AA 63 BF 28 B3  87 33 FD AB DA 74 2D 1C

06CD DF90: B0 22 6B 46 9F ED 6E 47  73 5E 7E EB DA 5F D6 11
06CD DF90: B0 22 6B 46 9F ED 6E 47  73 1E 7E EB DA 5F D6 11

06CD DFC0: 0D 86 2B B2 57 A4 5A CD  78 4B 08 94 C0 65 17 3A
06CD DFC0: 0D 86 2B B2 57 A4 5A CD  78 0B 08 94 C0 65 17 3A

0802 C3C0: 5C A5 E1 4A 1C BC 14 04  16 4A 29 D3 CC EF A6 80
0802 C3C0: 5C 25 E1 4A 1C BC 14 04  16 48 29 D3 CC EF A6 80

081A B3C0: 7D 7A 2C CD 20 89 E5 F2  A8 D3 32 38 04 BA 8A B5
081A B3C0: 7D 3A 2C CD 20 89 E5 F2  A8 D3 32 38 04 BA 8A B5

098E C430: FE 24 4A 19 09 F4 D5 1F  22 E8 36 FA F8 55 B2 6E
098E C430: FE 24 4A 19 09 F4 D5 1F  22 E0 36 FA F8 55 B2 6E

098E C440: 1B 3F C1 B4 BB 80 F8 5A  FB EE 0D A3 3F C5 A4 DB
098E C440: 1B 3D C1 B4 BB 80 F8 5A  FB EE 0D A3 3F C5 A4 DB

098E C4D0: F8 6C E2 65 18 7A 5D 33  2E 35 77 64 B2 81 BE DF
098E C4D0: F8 6C E2 65 18 7A 5D 33  2E 25 77 64 B2 81 BE DF

098E C4E0: 05 18 DE E3 00 78 D2 2C  4F 91 8F AF 0B F6 0C 31
098E C4E0: 05 1C DE E3 00 78 D2 2C  4F 91 8F AF 0B F6 0C 31

098E C500: 0A 12 D3 E7 FA B8 40 DE  0D 71 94 88 5D 4C 97 21
098E C500: 0A 12 D3 E7 FA B8 40 DE  0D 51 94 88 5D 4C 97 21

098E C540: 93 F2 58 C7 49 9A AA EB  30 3D 28 AA E3 09 4B 7B
098E C540: 93 F2 58 C7 49 9A AA EB  30 3C 28 AA E3 09 4B 7B

0FDE C420: F3 6A C2 38 76 43 9E 86  0D 9C 89 86 F1 E6 B0 F2
0FDE C420: F3 6A C2 38 76 43 9E 86  0D DC 89 86 F1 E6 B0 F2

0FDE C430: 38 E4 69 2E 22 1D E4 FF  90 A7 C6 E8 9F 08 4C 98
0FDE C430: 38 E4 69 2E 22 1D E4 FF  90 A5 C6 E8 9F 08 4C 98

1214 A4C0: 24 D6 56 AC 8B D8 D0 9B  D2 62 7B 83 C7 0B 3D BE
1214 A4C0: 24 D4 56 AC 8B D8 D0 9B  D2 62 7B 83 C7 0B 3D BE

1214 A500: EC 51 D3 FF C5 7D 30 DD  6D 45 50 FE E9 64 A4 FC
1214 A500: EC 11 D3 FF C5 7D 30 DD  6D 45 50 FE E9 64 A4 FC

1214 A520: D9 4D 63 EB 77 4D F0 BE  5E B3 6B DE E6 D2 28 67
1214 A520: D9 4D 63 EB 77 4D F0 BE  5E 33 6B DE E6 D2 28 67
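
A comparison like the one above can be reproduced without vbindiff. This is a minimal Python sketch under stated assumptions: `diff_bytes` is a hypothetical helper name, both files are read in fixed-size chunks, and only the common prefix is compared if the lengths differ.

```python
def diff_bytes(path_a, path_b, chunk_size=1 << 20):
    """Yield (offset, byte_a, byte_b) for every position where two files differ.

    Only the overlapping part of the files is compared; a length difference
    is not reported by this sketch.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            if a != b:
                # zip() stops at the shorter chunk, so trailing bytes are skipped.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        yield offset + i, x, y
            offset += max(len(a), len(b))
```

Printing each result as `f"{off:08X}: {x:02X} -> {y:02X}"` gives one line per differing byte, which makes a pattern in the offsets or in the changed bits easier to spot.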

-- 
Markus


Re: btrfs csum failed on git .pack file

2009-09-17 Thread Markus Trippelsdorf
On Thu, Sep 17, 2009 at 02:15:01PM +0200, Markus Trippelsdorf wrote:
 On Thu, Sep 17, 2009 at 11:05:49AM +0200, Jens Axboe wrote:
  On Thu, Sep 17 2009, Markus Trippelsdorf wrote:
   On Thu, Sep 17, 2009 at 08:44:56AM +0200, Jens Axboe wrote:
On Thu, Sep 17 2009, Markus Trippelsdorf wrote:
 On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe wrote:
  On Mon, Sep 07 2009, Markus Trippelsdorf wrote:
   Just got this error today in my dmesg:
   btrfs csum failed ino 1483065 off 158482432 csum 4283543305 
   private 43905798
   
   linux % find . -inum 1483065
   ./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
   
   It's the main pack file from my git linux kernel tree:
   
  
  Hmm, I ran into something very similar. Care to check what the 
  corrupted
  block of data looks like (and how big it is)?
 
 I've hit the same problem again today:
 
 btrfs csum failed ino 1826333 off 150208512 csum 4148434891 private 
 1660028275
 
 The file in question is:
 ./.git/objects/pack/pack-a2330b703d5a7fd62626b39a5fdfb6eecf739d0d.pack
 
 I can't read the file directly, because of the csum mismatch:

Chris, is there a way to force reading the file? Seems like that would
be a very handy feature.

Markus, not sure if that works, but you could always try and remount
with data checksumming disabled.

mount /dev/fooX -o remount,rw,nodatasum

should do the trick.
   
   That doesn't work unfortunately, btrfs still calculates and compares the
   checksums (it won't write new ones I guess).
  
  Ah ok, as mentioned I wasn't sure whether that would work or not. I'll
  defer to Chris :-)
 
 Understood.
 
 I did some further investigations and was able to reconstruct exactly
 the same pack file in question by starting from an older backup copy of
 my git repo and then running the same git commands as before. 
 Then I did a binary comparison between this reconstructed file and a
 corrupted backup copy from the time before the csum errors occurred (I
 automatically back up every 4h).
 
Thanks to Chris' patch (from IRC) I was able to compare the file with
the csum error to the reconstructed one. You'll find the results as
attachments.

-- 
Markus
08F403A0   5D 8E B3 32  7D 8F 5D E7  54 B6 9D 1E  E6 0C 9B 0D  BE 1D 9D 0C  34 BA 7F FE  7F D4 E5 1A  0A 16 29 96
105AC3A0   76 80 1E 0A  3F 8A 7E FC  B3 2E 2B 9E  9E 53 82 10  C3 F6 4B C1  C0 12 FC 61  A5 0E 63 70  B0 A4 7B 27
105AC3C0   DC AE 26 CE  48 5D CA 07  B7 26 B6 3C  BC 91 AD 00  55 97 BF E4  8C D7 EF AA  28 B7 54 65  30 DB 78 A6
105AC3E0   26 90 18 88  8F F4 25 91  48 5F 9C F6  4F 0D 46 72  A2 04 77 1A  AF FB 88 23  93 AF FB AA  B9 82 BC CC

08F403A0   5D 8E B3 32  7D 8F 5D E7  54 B4 9D 1E  E6 0C 9B 0D  BE 1D 9D 0C  34 BA 7F FE  7F D4 E5 1A  0A 16 29 96
105AC3A0   76 80 1E 0A  3F 8A 7E FC  B3 2E 2B 9E  9E 53 82 10  C3 F7 4B C1  C0 12 FC 61  A5 0E 63 70  B0 A4 7B 27
105AC3C0   DC AE 26 CE  48 5D CA 07  B7 77 B6 3C  BC 91 AD 00  55 97 BF E4  8C D7 EF AA  28 A7 54 65  30 DB 78 A6
105AC3E0   26 90 18 88  8F F4 25 91  48 5F 9C F6  4F 0D 46 72  A2 04 77 1A  AF FB 88 23  93 AF FB AA  B9 82 BC CC


Re: btrfs csum failed on git .pack file

2009-09-17 Thread Zach Brown

 0130 9FA0: E2 3B 43 AA 63 BF 28 B3  87 B7 FD AB DA 74 2D 1C
 0130 9FA0: E2 3B 43 AA 63 BF 28 B3  87 33 FD AB DA 74 2D 1C

B7 = 10110111
33 = 00110011

 06CD DF90: B0 22 6B 46 9F ED 6E 47  73 5E 7E EB DA 5F D6 11
 06CD DF90: B0 22 6B 46 9F ED 6E 47  73 1E 7E EB DA 5F D6 11

5E = 01011110
1E = 00011110

 06CD DFC0: 0D 86 2B B2 57 A4 5A CD  78 4B 08 94 C0 65 17 3A
 06CD DFC0: 0D 86 2B B2 57 A4 5A CD  78 0B 08 94 C0 65 17 3A

4B = 01001011
0B = 00001011

And so on.

It looks like a few bits are getting flipped at the same byte offset.
One can imagine software bugs that would do this, certainly, but upset
hardware seems awfully likely too.
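
The same comparison can be done mechanically by XOR-ing each good byte with its corrupted counterpart. A minimal Python sketch follows; `flipped_bits` is a hypothetical helper name, and the three byte pairs are the ones from the hex dump quoted above.

```python
def flipped_bits(good, bad):
    """Return the bit positions (0 = least significant) that differ between two bytes."""
    diff = good ^ bad
    return [bit for bit in range(8) if diff & (1 << bit)]


# (good byte, corrupted byte) pairs taken from the first three diff blocks.
pairs = [(0xB7, 0x33), (0x5E, 0x1E), (0x4B, 0x0B)]
for good, bad in pairs:
    print(f"{good:02X} = {good:08b}  {bad:02X} = {bad:08b}  "
          f"flipped bits: {flipped_bits(good, bad)}")
```

For these three pairs the differing bits are 2 and 7, then 6, then 6: a small number of bits per byte, always cleared rather than set.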

- z


Re: btrfs csum failed on git .pack file

2009-09-17 Thread Markus Trippelsdorf
On Thu, Sep 17, 2009 at 10:00:28AM -0700, Zach Brown wrote:
 
  0130 9FA0: E2 3B 43 AA 63 BF 28 B3  87 B7 FD AB DA 74 2D 1C
  0130 9FA0: E2 3B 43 AA 63 BF 28 B3  87 33 FD AB DA 74 2D 1C
 
 B7 = 10110111
 33 = 00110011
 
  06CD DF90: B0 22 6B 46 9F ED 6E 47  73 5E 7E EB DA 5F D6 11
  06CD DF90: B0 22 6B 46 9F ED 6E 47  73 1E 7E EB DA 5F D6 11
 
 5E = 01011110
 1E = 00011110
 
  06CD DFC0: 0D 86 2B B2 57 A4 5A CD  78 4B 08 94 C0 65 17 3A
  06CD DFC0: 0D 86 2B B2 57 A4 5A CD  78 0B 08 94 C0 65 17 3A
 
 4B = 01001011
 0B = 00001011
 
 And so on.
 
 It looks like a few bits are getting flipped at the same byte offset.
 One can imagine software bugs that would do this, certainly, but upset
 hardware seems awfully likely too.

I'm afraid you're right. I did some further tests and now I'm pretty
sure that a bad RAM module was the root cause of it all...
Oh well.

-- 
Markus


Re: btrfs csum failed on git .pack file

2009-09-17 Thread Tomasz Torcz
On Thu, Sep 17, 2009 at 07:10:06PM +0200, Markus Trippelsdorf wrote:
   06CD DFC0: 0D 86 2B B2 57 A4 5A CD  78 4B 08 94 C0 65 17 3A
   06CD DFC0: 0D 86 2B B2 57 A4 5A CD  78 0B 08 94 C0 65 17 3A
  
  4B = 01001011
  0B = 00001011
  
  And so on.
  
  It looks like a few bits are getting flipped at the same byte offset.
  One can imagine software bugs that would do this, certainly, but upset
  hardware seems awfully likely too.
 
 I'm afraid you're right. I did some further tests and now I'm pretty
 sure that a bad RAM module was the root cause of it all...
 Oh well.

  On the other hand, that's what's so great about checksumming filesystems.
You found the bad module thanks to btrfs; otherwise you wouldn't have suspected
anything was wrong. If you had had raid-1 for data, this corruption would
have been fixed by btrfs.

-- 
Tomasz Torcz   72-|   80-|
xmpp: zdzich...@chrome.pl  72-|   80-|



Re: btrfs csum failed on git .pack file

2009-09-10 Thread Bryan Østergaard
On Wed, Sep 9, 2009 at 11:01 PM, Oliver Mattos
<oliver.matto...@imperial.ac.uk> wrote:

 What a strange coincidence that it affected git pack files in both cases.
 It's almost too improbable...

I had similar problems with a broken git repository about two weeks
ago. This was on a regular laptop harddrive that's never reported any
errors.

Unfortunately I rm'ed the repository and cloned it again so I can't
check exactly what caused the corruption. Interestingly I've just
discovered a broken tar.bz2 file that shows similar symptoms as what's
been described here earlier.

The first (and by far largest) chunk of the file consists entirely of
0x01 bytes followed by a smaller chunk that appears to be a PNG file
and then arch/sparc/include/asm/fhc.h from the linux kernel. After
this I have a small chunk of 0x00 bytes followed by
arch/sparc/include/asm/floppy.h.

This pattern is repeated several times with different include files
from the kernel sources and the file ends with a small chunk of 0x01
bytes again.
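
Long runs of a single byte value such as the ones described above can be located with a simple scan. This is a minimal Python sketch under stated assumptions: `constant_runs` and the 4096-byte threshold are illustrative choices, the whole file is assumed to fit in memory, and the pure-Python loop is slow on large files.

```python
def constant_runs(data, min_len=4096):
    """Return (start, length, value) for each run of one repeated byte value.

    Only runs of at least min_len bytes are reported.
    """
    runs = []
    start = 0
    n = len(data)
    while start < n:
        value = data[start]
        end = start + 1
        # Extend the run while the next byte has the same value.
        while end < n and data[end] == value:
            end += 1
        if end - start >= min_len:
            runs.append((start, end - start, value))
        start = end
    return runs
```

Calling it on the bytes of a suspect file, for example `constant_runs(open(path, "rb").read())`, lists where each constant region starts, how long it is, and which value fills it (0x00, 0x01 or 0xff in the cases reported in this thread).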

The harddisk in question is:
=== START OF INFORMATION SECTION ===
Model Family:     Fujitsu MHV series
Device Model:     FUJITSU MHV2080BH
Serial Number:    NW05T6425FRY
Firmware Version: 00840028
User Capacity:    80,025,280,000 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 4a
Local Time is:    Thu Sep 10 12:40:10 2009 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

As already mentioned it's never reported any errors and I also haven't
seen any problems like this before when using ext3 or ext4. The broken
file is available at http://omploader.org/vMmJtbg if that's any help.

Regards,
Bryan Østergaard


Re: btrfs csum failed on git .pack file

2009-09-09 Thread Markus Trippelsdorf
On Tue, Sep 08, 2009 at 10:32:14PM +0200, Jens Axboe wrote:
 On Tue, Sep 08 2009, Markus Trippelsdorf wrote:
  On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe wrote:
   On Mon, Sep 07 2009, Markus Trippelsdorf wrote:
Just got this error today in my dmesg:
btrfs csum failed ino 1483065 off 158482432 csum 4283543305 private 
43905798

linux % find . -inum 1483065
./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack

It's the main pack file from my git linux kernel tree:

   
   Hmm, I ran into something very similar. Care to check what the corrupted
   block of data looks like (and how big it is)?
  
  I've already deleted the file in question unfortunately.
  On IRC Chris decided that either bad RAM or a harddrive error was the
  most likely reason for this checksum mismatch.
 
 Darn, that's too bad. The corruption issue I had was also in a git pack
 file. It was fine one day, bad the next. Turned out to be 16kb of 0xff
 in the file, and I blamed it on the (cheap) SSD drive that hosted the
 local git repo. It's still the most likely explanation given the nature
 of the problem, however it would have been really interesting to see
 what corruption you had.

If by cheap SSD drive you mean an Indilinx Barefoot based one, we might
be using the same hardware (30GB Vertex in my case). 
What a strange coincidence that it affected git pack files in both cases.
It's almost too improbable...

-- 
Markus


Re: btrfs csum failed on git .pack file

2009-09-09 Thread Markus Trippelsdorf
On Wed, Sep 09, 2009 at 09:01:41AM +0200, Jens Axboe wrote:
 On Wed, Sep 09 2009, Markus Trippelsdorf wrote:
  On Tue, Sep 08, 2009 at 10:32:14PM +0200, Jens Axboe wrote:
   On Tue, Sep 08 2009, Markus Trippelsdorf wrote:
On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe wrote:
 On Mon, Sep 07 2009, Markus Trippelsdorf wrote:
  Just got this error today in my dmesg:
  btrfs csum failed ino 1483065 off 158482432 csum 4283543305 private 
  43905798
  
  linux % find . -inum 1483065
  ./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
  
  It's the main pack file from my git linux kernel tree:
  
 
 Hmm, I ran into something very similar. Care to check what the 
 corrupted
 block of data looks like (and how big it is)?

I've already deleted the file in question unfortunately.
On IRC Chris decided that either bad RAM or a harddrive error was the
most likely reason for this checksum mismatch.
   
   Darn, that's too bad. The corruption issue I had was also in a git pack
   file. It was fine one day, bad the next. Turned out to be 16kb of 0xff
   in the file, and I blamed it on the (cheap) SSD drive that hosted the
   local git repo. It's still the most likely explanation given the nature
   of the problem, however it would have been really interesting to see
   what corruption you had.
  
  If by cheap SSD drive you mean an Indilinx Barefoot based one, we might
  be using the same hardware (30GB Vertex in my case). 
 
 Spooky, yes indeed that's the very same drive I'm using. Also see my
 postings on this very issue here, top two entries:
 
 http://axboe.livejournal.com/
 
 So that pretty much looks like it reaffirms some of my suspicions. Is
 the drive in a laptop that you suspend and resume?

No. I use it in my workstation, that I never switch off normally.

  What a strange coincidence that it affected git pack files in both cases.
  It's almost too improbable...
 
 Probably more than a coincidence I think, the question is what though...

If it really was an SSD error, then it should happen randomly, messing up
random files. But (contrary to your experience) I never had any issues with 
this SSD until this single failed checksum.

-- 
Markus


Re: btrfs csum failed on git .pack file

2009-09-09 Thread Jens Axboe
On Wed, Sep 09 2009, Markus Trippelsdorf wrote:
 On Wed, Sep 09, 2009 at 09:01:41AM +0200, Jens Axboe wrote:
  On Wed, Sep 09 2009, Markus Trippelsdorf wrote:
   On Tue, Sep 08, 2009 at 10:32:14PM +0200, Jens Axboe wrote:
On Tue, Sep 08 2009, Markus Trippelsdorf wrote:
 On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe wrote:
  On Mon, Sep 07 2009, Markus Trippelsdorf wrote:
   Just got this error today in my dmesg:
   btrfs csum failed ino 1483065 off 158482432 csum 4283543305 
   private 43905798
   
   linux % find . -inum 1483065
   ./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
   
   It's the main pack file from my git linux kernel tree:
   
  
  Hmm, I ran into something very similar. Care to check what the 
  corrupted
  block of data looks like (and how big it is)?
 
 I've already deleted the file in question unfortunately.
 On IRC Chris decided that either bad RAM or a harddrive error was the
 most likely reason for this checksum mismatch.

Darn, that's too bad. The corruption issue I had was also in a git pack
file. It was fine one day, bad the next. Turned out to be 16kb of 0xff
in the file, and I blamed it on the (cheap) SSD drive that hosted the
local git repo. It's still the most likely explanation given the nature
of the problem, however it would have been really interesting to see
what corruption you had.
   
   If by cheap SSD drive you mean an Indilinx Barefoot based one, we might
   be using the same hardware (30GB Vertex in my case). 
  
  Spooky, yes indeed that's the very same drive I'm using. Also see my
  postings on this very issue here, top two entries:
  
  http://axboe.livejournal.com/
  
  So that pretty much looks like it reaffirms some of my suspicions. Is
  the drive in a laptop that you suspend and resume?
 
 No. I use it in my workstation, that I never switch off normally.

OK, so we can rule out any interactions between suspending and resuming
the drive. That's at least something.

   What a strange coincidence that it affected git pack files in both cases.
   It's almost too improbable...
  
  Probably more than a coincidence I think, the question is what though...
 
 If it really was an SSD error, then it should happen randomly, messing up
 random files. But (contrary to your experience) I never had any issues with 
 this SSD until this single failed checksum.

Not necessarily, there may be some pattern to how the pack files are
accessed (that propagates through to the drive). The fact is, 0xff is an
extremely weird piece of corruption that just reeks of bad flash blocks.
It's almost impossible that it is a software error. If it was all
zeroes, or a bit flip, the likely causes would be very different.

-- 
Jens Axboe



Re: btrfs csum failed on git .pack file

2009-09-09 Thread Daniel J Blueman
On Wed, Sep 9, 2009 at 8:01 AM, Jens Axboe <jens.ax...@oracle.com> wrote:
 On Wed, Sep 09 2009, Markus Trippelsdorf wrote:
 On Tue, Sep 08, 2009 at 10:32:14PM +0200, Jens Axboe wrote:
  On Tue, Sep 08 2009, Markus Trippelsdorf wrote:
   On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe wrote:
On Mon, Sep 07 2009, Markus Trippelsdorf wrote:
 Just got this error today in my dmesg:
 btrfs csum failed ino 1483065 off 158482432 csum 4283543305 private 
 43905798

 linux % find . -inum 1483065
 ./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack

 It's the main pack file from my git linux kernel tree:

   
Hmm, I ran into something very similar. Care to check what the 
corrupted
block of data looks like (and how big it is)?
  
   I've already deleted the file in question unfortunately.
   On IRC Chris decided that either bad RAM or a harddrive error was the
   most likely reason for this checksum mismatch.
 
  Darn, that's too bad. The corruption issue I had was also in a git pack
  file. It was fine one day, bad the next. Turned out to be 16kb of 0xff
  in the file, and I blamed it on the (cheap) SSD drive that hosted the
  local git repo. It's still the most likely explanation given the nature
  of the problem, however it would have been really interesting to see
  what corruption you had.

 If by cheap SSD drive you mean an Indilinx Barefoot based one, we might
 be using the same hardware (30GB Vertex in my case).

 Spooky, yes indeed that's the very same drive I'm using. Also see my
 postings on this very issue here, top two entries:

 http://axboe.livejournal.com/

 So that pretty much looks like it reaffirms some of my suspicions. Is
 the drive in a laptop that you suspend and resume?

If you're on firmware < 1.30, the changelog includes some fixes which
may be relevant, e.g. if block 0 is relative, or you're
suspending/resuming:

- Race condition occurred during soft reset handler
- If read fail occurs during reading stamp information, firmware
corrupted block 0.
- Power off recovery had bug in certain circumstances

http://www.ocztechnologyforum.com/forum/showthread.php?t=57516
-- 
Daniel J Blueman


Re: btrfs csum failed on git .pack file

2009-09-09 Thread Jens Axboe
On Wed, Sep 09 2009, Daniel J Blueman wrote:
 On Wed, Sep 9, 2009 at 8:01 AM, Jens Axboe <jens.ax...@oracle.com> wrote:
  On Wed, Sep 09 2009, Markus Trippelsdorf wrote:
  On Tue, Sep 08, 2009 at 10:32:14PM +0200, Jens Axboe wrote:
   On Tue, Sep 08 2009, Markus Trippelsdorf wrote:
On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe wrote:
 On Mon, Sep 07 2009, Markus Trippelsdorf wrote:
  Just got this error today in my dmesg:
  btrfs csum failed ino 1483065 off 158482432 csum 4283543305 
  private 43905798
 
  linux % find . -inum 1483065
  ./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
 
  It's the main pack file from my git linux kernel tree:
 

 Hmm, I ran into something very similar. Care to check what the 
 corrupted
 block of data looks like (and how big it is)?
   
I've already deleted the file in question unfortunately.
On IRC Chris decided that either bad RAM or a harddrive error was the
most likely reason for this checksum mismatch.
  
   Darn, that's too bad. The corruption issue I had was also in a git pack
   file. It was fine one day, bad the next. Turned out to be 16kb of 0xff
   in the file, and I blamed it on the (cheap) SSD drive that hosted the
   local git repo. It's still the most likely explanation given the nature
   of the problem, however it would have been really interesting to see
   what corruption you had.
 
  If by cheap SSD drive you mean an Indilinx Barefoot based one, we might
  be using the same hardware (30GB Vertex in my case).
 
  Spooky, yes indeed that's the very same drive I'm using. Also see my
  postings on this very issue here, top two entries:
 
  http://axboe.livejournal.com/
 
  So that pretty much looks like it reaffirms some of my suspicions. Is
  the drive in a laptop that you suspend and resume?
 
 If you're on firmware < 1.30, the changelog includes some fixes which
 may be relevant, e.g. if block 0 is relative, or you're
 suspending/resuming:
 
 - Race condition occurred during soft reset handler
 - If read fail occurs during reading stamp information, firmware
 corrupted block 0.
 - Power off recovery had bug in certain circumstances
 
 http://www.ocztechnologyforum.com/forum/showthread.php?t=57516

The issue is pretty much moot at this point, since OCZ support were not
really interested in providing any sort of real technical support to
find out what really caused this issue. My main worry was reliability of
these cheaper SSD drives, and that worry is still not resolved. If you
read the blog entries, I do comment on the apparently scary basic bugs
that are still being fixed on the Indilinx controllers. I do expect some
basic level of data integrity from a consumer product and at least some
interest in resolving weird corruption issues if things go wrong. Since
OCZ cannot provide anything like that, I have a hard time recommending
these drives for anything but very casual use. Fast, cheap, reliable.
Pick any two.

My drive was running 1.10 at the time of the problem.

-- 
Jens Axboe



Re: btrfs csum failed on git .pack file

2009-09-09 Thread Daniel J Blueman
On Wed, Sep 9, 2009 at 9:26 AM, Jens Axboe <jens.ax...@oracle.com> wrote:
 On Wed, Sep 09 2009, Daniel J Blueman wrote:
 On Wed, Sep 9, 2009 at 8:01 AM, Jens Axboe <jens.ax...@oracle.com> wrote:
  On Wed, Sep 09 2009, Markus Trippelsdorf wrote:
  On Tue, Sep 08, 2009 at 10:32:14PM +0200, Jens Axboe wrote:
   On Tue, Sep 08 2009, Markus Trippelsdorf wrote:
On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe wrote:
 On Mon, Sep 07 2009, Markus Trippelsdorf wrote:
  Just got this error today in my dmesg:
  btrfs csum failed ino 1483065 off 158482432 csum 4283543305 
  private 43905798
 
  linux % find . -inum 1483065
  ./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
 
  It's the main pack file from my git linux kernel tree:
 

 Hmm, I ran into something very similar. Care to check what the 
 corrupted
 block of data looks like (and how big it is)?
   
I've already deleted the file in question unfortunately.
On IRC Chris decided that either bad RAM or a harddrive error was the
most likely reason for this checksum mismatch.
  
   Darn, that's too bad. The corruption issue I had was also in a git pack
   file. It was fine one day, bad the next. Turned out to be 16kb of 0xff
   in the file, and I blamed it on the (cheap) SSD drive that hosted the
   local git repo. It's still the most likely explanation given the nature
   of the problem, however it would have been really interesting to see
   what corruption you had.
 
  If by cheap SSD drive you mean an Indilinx Barefoot based one, we might
  be using the same hardware (30GB Vertex in my case).
 
  Spooky, yes indeed that's the very same drive I'm using. Also see my
  postings on this very issue here, top two entries:
 
  http://axboe.livejournal.com/
 
  So that pretty much looks like it reaffirms some of my suspicions. Is
  the drive in a laptop that you suspend and resume?

 If you're on firmware < 1.30, the changelog includes some fixes which
 may be relevant, e.g. if block 0 is relative, or you're
 suspending/resuming:

 - Race condition occurred during soft reset handler
 - If read fail occurs during reading stamp information, firmware
 corrupted block 0.
 - Power off recovery had bug in certain circumstances

 http://www.ocztechnologyforum.com/forum/showthread.php?t=57516

 The issue is pretty much moot at this point, since OCZ support were not
 really interested in providing any sort of real technical support to
 find out what really caused this issue. My main worry was reliability of
 these cheaper SSD drives, and that worry is still not resolved. If you
 read the blog entries, I do comment on the apparently scary basic bugs
 that are still being fixed on the Indilinx controllers. I do expect some
 basic level of data integrity from a consumer product and at least some
 interest in resolving weird corruption issues if things go wrong. Since
 OCZ cannot provide anything like that, I have a hard time recommending
 these drives for anything but very casual use. Fast, cheap, reliable.
 Pick any two.

 My drive was running 1.10 at the time of the problem.

It looks like we need a small tool which performs patterned block I/O
to the device, updating a checksum as it goes, and performing
integrity sweeps at intervals, lower level than fsx. It must be
trusted or not.
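
A minimal sketch of such a tool, in Python, might look like the
following. It is illustrative only: the 4096-byte block size, the CRC32
checksum, and the names make_block, write_pattern and sweep are all
assumptions, not anything specified in this thread. It writes a
deterministic pattern per block, records a checksum for each block as it
goes, and later re-reads every block to report mismatches.

```python
import os
import zlib

BLOCK_SIZE = 4096  # assumed block size; the thread does not specify one


def make_block(index, seed, size=BLOCK_SIZE):
    """Return a deterministic pattern for block number `index`."""
    header = index.to_bytes(8, "little") + seed.to_bytes(8, "little")
    reps = size // len(header) + 1
    return (header * reps)[:size]


def write_pattern(path, blocks, seed):
    """Write `blocks` patterned blocks to `path`; return their CRC32 values."""
    sums = []
    with open(path, "wb") as f:
        for i in range(blocks):
            data = make_block(i, seed)
            f.write(data)
            sums.append(zlib.crc32(data))
        f.flush()
        os.fsync(f.fileno())  # push the data towards the device
    return sums


def sweep(path, sums):
    """Re-read every block; return indices whose CRC32 no longer matches."""
    bad = []
    with open(path, "rb") as f:
        for i, expected in enumerate(sums):
            data = f.read(BLOCK_SIZE)
            if zlib.crc32(data) != expected:
                bad.append(i)
    return bad
```

Note that this sketch goes through the page cache. To exercise the
device itself, a real tool would probably need O_DIRECT or a cache drop
between the write and each sweep.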

I had a problem like this with nVidia CK804/MCP55 chipsets corrupting
data under a triple-edge case workload.
-- 
Daniel J Blueman
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs csum failed on git .pack file

2009-09-09 Thread Chris Mason
On Wed, Sep 09, 2009 at 09:37:42AM +0100, Daniel J Blueman wrote:
 
  http://www.ocztechnologyforum.com/forum/showthread.php?t=57516
 
  The issue is pretty much moot at this point, since OCZ support were not
  really interested in providing any sort of real technical support to
  find out what really caused this issue. My main worry was reliability of
  these cheaper SSD drives, and that worry is still not resolved. If you
  read the blog entries, I do comment on the apparently scary basic bugs
  that are still being fixed on the Indilinx controllers. I do expect some
  basic level of data integrity from a consumer product and at least some
  interest in resolving weird corruption issues if things go wrong. Since
  OCZ cannot provide anything like that, I have a hard time recommending
  these drives for anything but very casual use. Fast, cheap, reliable.
  Pick any two.
 
  My drive was running 1.10 at the time of the problem.
 
 It looks like we need a small tool which performs patterned block I/O
 to the device, updating a checksum as it goes, and performing
 integrity sweeps at intervals, lower level than fsx. It must be
 trusted or not.
 
 I had a problem like this with nVidia CK804/MCP55 chipsets corrupting
 data under a triple-edge case workload.

Well, just use git ;)  Apply a bunch of patches (say the mm tree) with
guilt and repack in a loop.
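
As a rough sketch of the "repack in a loop" half of that suggestion (the
guilt patch-application step is left out), something like the following
Python could drive it. The function name repack_and_check and the
injectable run parameter are made up for illustration; the git commands
are the stock "git repack -a -d -f" and "git fsck --full".

```python
import subprocess


def repack_and_check(repo, rounds, run=subprocess.run):
    """Repack `repo` up to `rounds` times, checking objects after each pass.

    Returns the number of the first round whose `git fsck` fails,
    or None if every round passes.
    """
    for n in range(1, rounds + 1):
        # -a -d -f: rewrite everything into one fresh pack, recomputing deltas
        run(["git", "-C", repo, "repack", "-a", "-d", "-f", "-q"], check=True)
        result = run(["git", "-C", repo, "fsck", "--full"], check=False)
        if result.returncode != 0:
            return n
    return None
```

If the repack itself fails (for example because a read returns an I/O
error), subprocess.run raises CalledProcessError, which is also a useful
signal here.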

-chris



Re: btrfs csum failed on git .pack file

2009-09-09 Thread Oliver Mattos



What a strange coincidence that it affected git pack files in both cases.
It's almost too improbable...



Probably more than a coincidence I think, the question is what though...


Some SSD drives (or rather the cheap wear levelling controllers in things
like USB sticks) have firmware which tries to recognise certain data
structures of common filesystems (like FAT and NTFS), and uses information
in those data structures to optimise the allocation and erasure of blocks
(for example the free space linked list in FAT).  If the data you were
saving to the disk was similar to one of those data structures, you might've
triggered one of those algorithms, which would cause data corruption.  This
is common in high performance usb sticks because they want to pre-erase
blocks on file deletion for operating systems not supporting SCSI TRIM - I
imagine the same technology might carry across to cheap SSDs.

Not much BTRFS can do about it though.  If the piece of data that triggers
the bug could be identified, workarounds could possibly be introduced for
the particular buggy controllers.

Oliver Mattos

(resent as I emailed the wrong recipients before)




Re: btrfs csum failed on git .pack file

2009-09-08 Thread Jens Axboe
On Tue, Sep 08 2009, Markus Trippelsdorf wrote:
 On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe wrote:
  On Mon, Sep 07 2009, Markus Trippelsdorf wrote:
   Just got this error today in my dmesg:
   btrfs csum failed ino 1483065 off 158482432 csum 4283543305 private 43905798
   
   linux % find . -inum 1483065
   ./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
   
   It's the main pack file from my git linux kernel tree:
   
   linux % ls -l ./.git/objects/pack/
   total 562848
   -rw-r--r-- 1 markus markus   1891324 2008-11-29 19:49 pack-011b43fa6956667db5e67fba859e40cb4b154226.idx
   -rw-r--r-- 1 markus markus  44002938 2008-11-29 19:54 pack-011b43fa6956667db5e67fba859e40cb4b154226.pack.temp
   -rw-r--r-- 1 markus markus    730332 2008-11-29 19:49 pack-67be92b3fab3dab175683582dab0b719517e55a5.idx
   -r--r--r-- 1 markus markus  36061684 2009-09-06 21:48 pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.idx
   -r--r--r-- 1 markus markus 335202742 2009-09-06 21:48 pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
   -rw--- 1 markus markus 158457856 2009-09-07 22:15 tmp_pack_OUdxER
   
   I'm running the latest git kernel and I've been using btrfs as my root
   fs for the last few weeks without problems so far.
  
  Hmm, I ran into something very similar. Care to check what the corrupted
  block of data looks like (and how big it is)?
 
 I've already deleted the file in question unfortunately.
 On IRC Chris decided that either bad RAM or a hard drive error was the
 most likely reason for this checksum mismatch.

Darn, that's too bad. The corruption issue I had was also in a git pack
file. It was fine one day, bad the next. Turned out to be 16kb of 0xff
in the file, and I blamed it on the (cheap) SSD drive that hosted the
local git repo. It's still the most likely explanation given the nature
of the problem, however it would have been really interesting to see
what corruption you had.
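
For anyone wanting to look for this kind of damage in a copy of a
suspect file, a small Python sketch that reports long runs of a single
byte value could look like this. The name find_byte_runs and the default
thresholds are assumptions chosen for illustration. On btrfs a read of a
block with a failed csum would normally return an I/O error instead of
data, so this is mainly useful on a copy recovered some other way or on
a filesystem without data checksums.

```python
def find_byte_runs(path, value=0xFF, min_len=4096, chunk_size=1 << 20):
    """Return (offset, length) for each run of `value` of at least `min_len` bytes."""
    needle = bytes([value])
    runs = []
    run_start = None  # file offset where the current run began, if any
    offset = 0        # file offset of the start of the current chunk
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            pos = 0
            while pos < len(chunk):
                if run_start is None:
                    pos = chunk.find(needle, pos)
                    if pos < 0:
                        break  # no further candidate bytes in this chunk
                    run_start = offset + pos
                rest = chunk[pos:]
                pos += len(rest) - len(rest.lstrip(needle))
                if pos < len(chunk):
                    # The run ended inside this chunk.
                    length = offset + pos - run_start
                    if length >= min_len:
                        runs.append((run_start, length))
                    run_start = None
            offset += len(chunk)
    # A run may extend to the very end of the file.
    if run_start is not None and offset - run_start >= min_len:
        runs.append((run_start, offset - run_start))
    return runs
```

A 16 KiB run of 0xff would show up as a single (offset, 16384) entry.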

-- 
Jens Axboe



Re: btrfs csum failed on git .pack file

2009-09-08 Thread Jens Axboe
On Mon, Sep 07 2009, Markus Trippelsdorf wrote:
 Just got this error today in my dmesg:
 btrfs csum failed ino 1483065 off 158482432 csum 4283543305 private 43905798
 
 linux % find . -inum 1483065
 ./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
 
 It's the main pack file from my git linux kernel tree:
 
 linux % ls -l ./.git/objects/pack/
 total 562848
 -rw-r--r-- 1 markus markus   1891324 2008-11-29 19:49 pack-011b43fa6956667db5e67fba859e40cb4b154226.idx
 -rw-r--r-- 1 markus markus  44002938 2008-11-29 19:54 pack-011b43fa6956667db5e67fba859e40cb4b154226.pack.temp
 -rw-r--r-- 1 markus markus    730332 2008-11-29 19:49 pack-67be92b3fab3dab175683582dab0b719517e55a5.idx
 -r--r--r-- 1 markus markus  36061684 2009-09-06 21:48 pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.idx
 -r--r--r-- 1 markus markus 335202742 2009-09-06 21:48 pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
 -rw--- 1 markus markus 158457856 2009-09-07 22:15 tmp_pack_OUdxER
 
 I'm running the latest git kernel and I've been using btrfs as my root
 fs for the last few weeks without problems so far.

Hmm, I ran into something very similar. Care to check what the corrupted
block of data looks like (and how big it is)?

-- 
Jens Axboe



Re: btrfs csum failed on git .pack file

2009-09-08 Thread Markus Trippelsdorf
On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe wrote:
 On Mon, Sep 07 2009, Markus Trippelsdorf wrote:
  Just got this error today in my dmesg:
  btrfs csum failed ino 1483065 off 158482432 csum 4283543305 private 43905798
  
  linux % find . -inum 1483065
  ./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
  
  It's the main pack file from my git linux kernel tree:
  
  linux % ls -l ./.git/objects/pack/
  total 562848
  -rw-r--r-- 1 markus markus   1891324 2008-11-29 19:49 pack-011b43fa6956667db5e67fba859e40cb4b154226.idx
  -rw-r--r-- 1 markus markus  44002938 2008-11-29 19:54 pack-011b43fa6956667db5e67fba859e40cb4b154226.pack.temp
  -rw-r--r-- 1 markus markus    730332 2008-11-29 19:49 pack-67be92b3fab3dab175683582dab0b719517e55a5.idx
  -r--r--r-- 1 markus markus  36061684 2009-09-06 21:48 pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.idx
  -r--r--r-- 1 markus markus 335202742 2009-09-06 21:48 pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
  -rw--- 1 markus markus 158457856 2009-09-07 22:15 tmp_pack_OUdxER
  
  I'm running the latest git kernel and I've been using btrfs as my root
  fs for the last few weeks without problems so far.
 
 Hmm, I ran into something very similar. Care to check what the corrupted
 block of data looks like (and how big it is)?

I've already deleted the file in question unfortunately.
On IRC Chris decided that either bad RAM or a hard drive error was the
most likely reason for this checksum mismatch.

-- 
Markus


btrfs csum failed on git .pack file

2009-09-07 Thread Markus Trippelsdorf
Just got this error today in my dmesg:
btrfs csum failed ino 1483065 off 158482432 csum 4283543305 private 43905798

linux % find . -inum 1483065
./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
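
The same lookup can be scripted. The following Python is a minimal
sketch, assuming the dmesg line has exactly the format shown above; the
names CSUM_RE, parse_csum_error and find_by_inode are made up for
illustration. It pulls the inode and offset out of the message and then
walks a directory tree for matching inode numbers, roughly what
"find . -inum" does (the starting directory itself is not checked).

```python
import os
import re

# Pattern for the message format quoted above (an assumption about its stability).
CSUM_RE = re.compile(
    r"btrfs csum failed ino (?P<ino>\d+) off (?P<off>\d+) "
    r"csum (?P<csum>\d+) private (?P<private>\d+)"
)


def parse_csum_error(line):
    """Return the numeric fields of a csum-failed message, or None."""
    m = CSUM_RE.search(line)
    if m is None:
        return None
    return {key: int(val) for key, val in m.groupdict().items()}


def find_by_inode(root, ino):
    """Return all paths below `root` whose inode number equals `ino`."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_ino == ino:
                    matches.append(path)
            except OSError:
                continue  # entry vanished or is unreadable; skip it
    return matches
```

For the line above, parse_csum_error gives ino 1483065 and off
158482432, and find_by_inode(".", 1483065) would then be run from the
top of the kernel tree.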

It's the main pack file from my git linux kernel tree:

linux % ls -l ./.git/objects/pack/
total 562848
-rw-r--r-- 1 markus markus   1891324 2008-11-29 19:49 pack-011b43fa6956667db5e67fba859e40cb4b154226.idx
-rw-r--r-- 1 markus markus  44002938 2008-11-29 19:54 pack-011b43fa6956667db5e67fba859e40cb4b154226.pack.temp
-rw-r--r-- 1 markus markus    730332 2008-11-29 19:49 pack-67be92b3fab3dab175683582dab0b719517e55a5.idx
-r--r--r-- 1 markus markus  36061684 2009-09-06 21:48 pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.idx
-r--r--r-- 1 markus markus 335202742 2009-09-06 21:48 pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
-rw--- 1 markus markus 158457856 2009-09-07 22:15 tmp_pack_OUdxER

I'm running the latest git kernel and I've been using btrfs as my root
fs for the last few weeks without problems so far.

-- 
Markus