Re: btrfs csum failed on git .pack file
On Thu, Sep 17, 2009 at 08:44:56AM +0200, Jens Axboe wrote:
> On Thu, Sep 17 2009, Markus Trippelsdorf wrote:
> > On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe wrote:
> > > On Mon, Sep 07 2009, Markus Trippelsdorf wrote:
> > > > Just got this error today in my dmesg:
> > > > btrfs csum failed ino 1483065 off 158482432 csum 4283543305 private 43905798
> > > >
> > > > linux % find . -inum 1483065
> > > > ./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
> > > >
> > > > It's the main pack file from my git linux kernel tree:
> > >
> > > Hmm, I ran into something very similar. Care to check what the corrupted block of data looks like (and how big it is)?
> >
> > I've hit the same problem again today:
> > btrfs csum failed ino 1826333 off 150208512 csum 4148434891 private 1660028275
> >
> > The file in question is:
> > ./.git/objects/pack/pack-a2330b703d5a7fd62626b39a5fdfb6eecf739d0d.pack
> >
> > I can't read the file directly, because of the csum mismatch:
>
> Chris, is there a way to force reading the file? Seems like that would be a very handy feature.
>
> Markus, not sure if that works, but you could always try and remount with data checksumming disabled.
>
> mount /dev/fooX -o remount,rw,nodatasum
>
> should do the trick.

That doesn't work, unfortunately; btrfs still calculates and compares the checksums (it won't write new ones, I guess).

-- 
Markus
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
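For context on the `csum failed` lines above: btrfs stores a checksum for every data block and recomputes it on read, and `off` is the file offset of the block that failed. The Python sketch below illustrates the idea under stated assumptions: CRC-32C (the only data checksum btrfs offered at the time) over 4 KiB blocks. The helper names `crc32c`, `block_checksums` and `block_index` are hypothetical, and reading the dmesg fields as `csum` = recomputed value and `private` = stored value is also an assumption.

```python
# Sketch only: per-block CRC-32C as a checksumming filesystem would compute it.
# Assumptions: 4 KiB blocks and the standard CRC-32C (Castagnoli) parameters.

BLOCK_SIZE = 4096
_POLY = 0x82F63B78  # CRC-32C polynomial, bit-reflected form


def _make_table():
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ _POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """CRC-32C with initial value 0xFFFFFFFF and final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ _TABLE[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE):
    """Yield (file_offset, crc32c) for each block; the last one may be short."""
    for off in range(0, len(data), block_size):
        yield off, crc32c(data[off:off + block_size])


def block_index(offset: int, block_size: int = BLOCK_SIZE) -> int:
    """Index of the block containing a file offset such as dmesg's `off`."""
    return offset // block_size
```

Both offsets reported in this thread, 158482432 and 150208512, are exact multiples of 4096 (blocks 38692 and 36672), which is consistent with the checksum being verified per block rather than per file.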
Re: btrfs csum failed on git .pack file
On Thu, Sep 17 2009, Markus Trippelsdorf wrote:
> On Thu, Sep 17, 2009 at 08:44:56AM +0200, Jens Axboe wrote:
> > Markus, not sure if that works, but you could always try and remount with data checksumming disabled.
> >
> > mount /dev/fooX -o remount,rw,nodatasum
> >
> > should do the trick.
>
> That doesn't work, unfortunately; btrfs still calculates and compares the checksums (it won't write new ones, I guess).

Ah, OK. As mentioned, I wasn't sure whether that would work or not. I'll defer to Chris :-)

-- 
Jens Axboe
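Forcing a read past the checksum error was left to Chris, who later supplied a patch on IRC. A purely user-space alternative, assuming btrfs reports the mismatch to read() as an I/O error (EIO), is to copy the file block by block and zero-fill whatever cannot be read, much like `dd conv=noerror,sync`. This does not recover the bad block's contents; it only salvages the rest and records where the holes are. `salvage_read` is a hypothetical helper, not part of any btrfs tooling.

```python
import os

BLOCK_SIZE = 4096  # assumed checksum granularity


def salvage_read(path: str, out_path: str, block_size: int = BLOCK_SIZE) -> list:
    """Copy `path` to `out_path` block by block.

    Blocks whose read fails with OSError (e.g. EIO from a checksum
    mismatch) are replaced by zeros; their offsets are returned.
    """
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        with open(out_path, "wb") as out:
            for off in range(0, size, block_size):
                want = min(block_size, size - off)
                try:
                    chunk = os.pread(fd, want, off)
                except OSError:
                    bad.append(off)
                    chunk = b""
                # Pad failed or short reads so later blocks keep their offsets.
                out.write(chunk.ljust(want, b"\0"))
    finally:
        os.close(fd)
    return bad
```

Calling `salvage_read(pack_path, "salvaged.pack")` would return the list of zero-filled offsets, which could then be compared against the `off` values in dmesg.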
Re: btrfs csum failed on git .pack file
On Thu, Sep 17, 2009 at 11:05:49AM +0200, Jens Axboe wrote:
> Ah, OK. As mentioned, I wasn't sure whether that would work or not. I'll defer to Chris :-)

Understood. I did some further investigation and was able to reconstruct exactly the same pack file by starting from an older backup copy of my git repo and then running the same git commands as before. Then I did a binary comparison between this reconstructed file and a corrupted backup copy from the time before the csum errors occurred (I automatically back up every 4h).

This is the result (first line: good pack file, second line: corrupted file):

vbindiff debug/.git/objects/pack/pack-a2330b703d5a7fd62626b39a5fdfb6eecf739d0d.pack debug2/.git/objects/pack/pack-a2330b703d5a7fd62626b39a5fdfb6eecf739d0d.pack

0130 9FA0: E2 3B 43 AA 63 BF 28 B3 87 B7 FD AB DA 74 2D 1C
0130 9FA0: E2 3B 43 AA 63 BF 28 B3 87 33 FD AB DA 74 2D 1C

06CD DF90: B0 22 6B 46 9F ED 6E 47 73 5E 7E EB DA 5F D6 11
06CD DF90: B0 22 6B 46 9F ED 6E 47 73 1E 7E EB DA 5F D6 11

06CD DFC0: 0D 86 2B B2 57 A4 5A CD 78 4B 08 94 C0 65 17 3A
06CD DFC0: 0D 86 2B B2 57 A4 5A CD 78 0B 08 94 C0 65 17 3A

0802 C3C0: 5C A5 E1 4A 1C BC 14 04 16 4A 29 D3 CC EF A6 80
0802 C3C0: 5C 25 E1 4A 1C BC 14 04 16 48 29 D3 CC EF A6 80

081A B3C0: 7D 7A 2C CD 20 89 E5 F2 A8 D3 32 38 04 BA 8A B5
081A B3C0: 7D 3A 2C CD 20 89 E5 F2 A8 D3 32 38 04 BA 8A B5

098E C430: FE 24 4A 19 09 F4 D5 1F 22 E8 36 FA F8 55 B2 6E
098E C430: FE 24 4A 19 09 F4 D5 1F 22 E0 36 FA F8 55 B2 6E

098E C440: 1B 3F C1 B4 BB 80 F8 5A FB EE 0D A3 3F C5 A4 DB
098E C440: 1B 3D C1 B4 BB 80 F8 5A FB EE 0D A3 3F C5 A4 DB

098E C4D0: F8 6C E2 65 18 7A 5D 33 2E 35 77 64 B2 81 BE DF
098E C4D0: F8 6C E2 65 18 7A 5D 33 2E 25 77 64 B2 81 BE DF

098E C4E0: 05 18 DE E3 00 78 D2 2C 4F 91 8F AF 0B F6 0C 31
098E C4E0: 05 1C DE E3 00 78 D2 2C 4F 91 8F AF 0B F6 0C 31

098E C500: 0A 12 D3 E7 FA B8 40 DE 0D 71 94 88 5D 4C 97 21
098E C500: 0A 12 D3 E7 FA B8 40 DE 0D 51 94 88 5D 4C 97 21

098E C540: 93 F2 58 C7 49 9A AA EB 30 3D 28 AA E3 09 4B 7B
098E C540: 93 F2 58 C7 49 9A AA EB 30 3C 28 AA E3 09 4B 7B

0FDE C420: F3 6A C2 38 76 43 9E 86 0D 9C 89 86 F1 E6 B0 F2
0FDE C420: F3 6A C2 38 76 43 9E 86 0D DC 89 86 F1 E6 B0 F2

0FDE C430: 38 E4 69 2E 22 1D E4 FF 90 A7 C6 E8 9F 08 4C 98
0FDE C430: 38 E4 69 2E 22 1D E4 FF 90 A5 C6 E8 9F 08 4C 98

1214 A4C0: 24 D6 56 AC 8B D8 D0 9B D2 62 7B 83 C7 0B 3D BE
1214 A4C0: 24 D4 56 AC 8B D8 D0 9B D2 62 7B 83 C7 0B 3D BE

1214 A500: EC 51 D3 FF C5 7D 30 DD 6D 45 50 FE E9 64 A4 FC
1214 A500: EC 11 D3 FF C5 7D 30 DD 6D 45 50 FE E9 64 A4 FC

1214 A520: D9 4D 63 EB 77 4D F0 BE 5E B3 6B DE E6 D2 28 67
1214 A520: D9 4D 63 EB 77 4D F0 BE 5E 33 6B DE E6 D2 28 67

-- 
Markus
Re: btrfs csum failed on git .pack file
On Thu, Sep 17, 2009 at 02:15:01PM +0200, Markus Trippelsdorf wrote:
> Then I did a binary comparison between this reconstructed file and a corrupted backup copy from the time before the csum errors occurred (I automatically back up every 4h).

Thanks to Chris' patch (from IRC) I was able to compare the file with the csum error to the reconstructed one. You'll find the results as attachments.

-- 
Markus

08F403A0 5D 8E B3 32 7D 8F 5D E7 54 B6 9D 1E E6 0C 9B 0D BE 1D 9D 0C 34 BA 7F FE 7F D4 E5 1A 0A 16 29 96
105AC3A0 76 80 1E 0A 3F 8A 7E FC B3 2E 2B 9E 9E 53 82 10 C3 F6 4B C1 C0 12 FC 61 A5 0E 63 70 B0 A4 7B 27
105AC3C0 DC AE 26 CE 48 5D CA 07 B7 26 B6 3C BC 91 AD 00 55 97 BF E4 8C D7 EF AA 28 B7 54 65 30 DB 78 A6
105AC3E0 26 90 18 88 8F F4 25 91 48 5F 9C F6 4F 0D 46 72 A2 04 77 1A AF FB 88 23 93 AF FB AA B9 82 BC CC

08F403A0 5D 8E B3 32 7D 8F 5D E7 54 B4 9D 1E E6 0C 9B 0D BE 1D 9D 0C 34 BA 7F FE 7F D4 E5 1A 0A 16 29 96
105AC3A0 76 80 1E 0A 3F 8A 7E FC B3 2E 2B 9E 9E 53 82 10 C3 F7 4B C1 C0 12 FC 61 A5 0E 63 70 B0 A4 7B 27
105AC3C0 DC AE 26 CE 48 5D CA 07 B7 77 B6 3C BC 91 AD 00 55 97 BF E4 8C D7 EF AA 28 A7 54 65 30 DB 78 A6
105AC3E0 26 90 18 88 8F F4 25 91 48 5F 9C F6 4F 0D 46 72 A2 04 77 1A AF FB 88 23 93 AF FB AA B9 82 BC CC
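A quick cross-check, assuming btrfs checksums data in 4 KiB blocks: the first differing row in these dumps, 0x08F403A0, lies in the block starting at 0x08F40000 = 150208512, which is exactly the `off` value of the second `csum failed` line earlier in the thread. The other two differing rows (0x105AC3A0 and 0x105AC3C0) both fall in the block starting at 274382848; no error for that block is quoted in the thread. `containing_block` is a hypothetical helper.

```python
BLOCK_SIZE = 4096  # assumed checksum granularity


def containing_block(offset: int, block_size: int = BLOCK_SIZE) -> int:
    """Return the start offset of the block that contains `offset`."""
    return offset - offset % block_size


# Rows that differ between the two hex dumps, and the offset from dmesg.
differing_rows = [0x08F403A0, 0x105AC3A0, 0x105AC3C0]
reported_off = 150208512

for row in differing_rows:
    start = containing_block(row)
    note = "matches dmesg" if start == reported_off else ""
    print(f"row {row:#010x} -> block {start} ({start:#010x}) {note}")
```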
Re: btrfs csum failed on git .pack file
> 0130 9FA0: E2 3B 43 AA 63 BF 28 B3 87 B7 FD AB DA 74 2D 1C
> 0130 9FA0: E2 3B 43 AA 63 BF 28 B3 87 33 FD AB DA 74 2D 1C

B7 = 10110111
33 = 00110011

> 06CD DF90: B0 22 6B 46 9F ED 6E 47 73 5E 7E EB DA 5F D6 11
> 06CD DF90: B0 22 6B 46 9F ED 6E 47 73 1E 7E EB DA 5F D6 11

5E = 01011110
1E = 00011110

> 06CD DFC0: 0D 86 2B B2 57 A4 5A CD 78 4B 08 94 C0 65 17 3A
> 06CD DFC0: 0D 86 2B B2 57 A4 5A CD 78 0B 08 94 C0 65 17 3A

4B = 01001011
0B = 00001011

And so on. It looks like a few bits are getting flipped at the same byte offset. One can imagine software bugs that would do this, certainly, but upset hardware seems awfully likely too.

- z
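Zach's bit-by-bit reading can be mechanised by XOR-ing each good byte with its corrupted counterpart: B7 XOR 33 = 84 (bits 7 and 2), 5E XOR 1E = 40 (bit 6), 4B XOR 0B = 40 (bit 6). Applied to the rest of Markus's table, every differing byte except B7/33 differs in exactly one bit. A minimal Python sketch; the function names are illustrative and not from any tool mentioned in the thread.

```python
def flipped_bits(good: int, bad: int) -> list:
    """Bit positions (0 = least significant) that differ between two bytes."""
    diff = good ^ bad
    return [bit for bit in range(8) if (diff >> bit) & 1]


def diff_bytes(good: bytes, bad: bytes):
    """Yield (offset, good_byte, bad_byte, flipped_bits) for each mismatch."""
    for off, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            yield off, g, b, flipped_bits(g, b)


# The three byte pairs Zach quotes (good value first).
for g, b in [(0xB7, 0x33), (0x5E, 0x1E), (0x4B, 0x0B)]:
    print(f"{g:02X} {g:08b} -> {b:02X} {b:08b}  xor {g ^ b:08b}  bits {flipped_bits(g, b)}")
```

To compare two whole files, read both into memory and iterate over `diff_bytes(good_data, bad_data)`; a histogram of the returned bit positions would show whether the flips cluster, as Zach suspects.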
Re: btrfs csum failed on git .pack file
On Thu, Sep 17, 2009 at 10:00:28AM -0700, Zach Brown wrote:
> And so on. It looks like a few bits are getting flipped at the same byte offset. One can imagine software bugs that would do this, certainly, but upset hardware seems awfully likely too.

I'm afraid you're right. I did some further tests and now I'm pretty sure that a bad RAM module was the root cause of it all... Oh well.

-- 
Markus
Re: btrfs csum failed on git .pack file
On Thu, Sep 17, 2009 at 07:10:06PM +0200, Markus Trippelsdorf wrote:
> I'm afraid you're right. I did some further tests and now I'm pretty sure that a bad RAM module was the root cause of it all... Oh well.

On the other hand, that's what's so great about checksumming filesystems. You found the bad module thanks to btrfs; otherwise you wouldn't have suspected anything was wrong. If you had had RAID-1 for data, this corruption would have been fixed by btrfs.

-- 
Tomasz Torcz        72-|   80-|
xmpp: zdzich...@chrome.pl   72-|   80-|
Re: btrfs csum failed on git .pack file
On Wed, Sep 9, 2009 at 11:01 PM, Oliver Mattos <oliver.matto...@imperial.ac.uk> wrote:
> What a strange coincidence that it affected git pack files in both cases. It's almost too improbable...

I had similar problems with a broken git repository about two weeks ago. This was on a regular laptop hard drive that has never reported any errors. Unfortunately I rm'ed the repository and cloned it again, so I can't check exactly what caused the corruption.

Interestingly, I've just discovered a broken tar.bz2 file that shows symptoms similar to those described here earlier. The first (and by far the largest) chunk of the file consists entirely of 0x01 bytes, followed by a smaller chunk that appears to be a PNG file and then arch/sparc/include/asm/fhc.h from the Linux kernel. After this there is a small chunk of 0x00 bytes followed by arch/sparc/include/asm/floppy.h. This pattern is repeated several times with different include files from the kernel sources, and the file ends with a small chunk of 0x01 bytes again.

The hard disk in question is:

=== START OF INFORMATION SECTION ===
Model Family:     Fujitsu MHV series
Device Model:     FUJITSU MHV2080BH
Serial Number:    NW05T6425FRY
Firmware Version: 00840028
User Capacity:    80,025,280,000 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 4a
Local Time is:    Thu Sep 10 12:40:10 2009 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

As already mentioned, it has never reported any errors, and I haven't seen any problems like this before when using ext3 or ext4.

The broken file is available at http://omploader.org/vMmJtbg if that's any help.

Regards,
Bryan Østergaard
Re: btrfs csum failed on git .pack file
On Tue, Sep 08, 2009 at 10:32:14PM +0200, Jens Axboe wrote:
> Darn, that's too bad. The corruption issue I had was also in a git pack file. It was fine one day, bad the next. Turned out to be 16KB of 0xff in the file, and I blamed it on the (cheap) SSD drive that hosted the local git repo. It's still the most likely explanation given the nature of the problem; however, it would have been really interesting to see what corruption you had.

If by "cheap SSD drive" you mean an Indilinx Barefoot based one, we might be using the same hardware (30GB Vertex in my case).

What a strange coincidence that it affected git pack files in both cases. It's almost too improbable...

-- 
Markus
Re: btrfs csum failed on git .pack file
On Wed, Sep 09, 2009 at 09:01:41AM +0200, Jens Axboe wrote:
> On Wed, Sep 09 2009, Markus Trippelsdorf wrote:
> > If by "cheap SSD drive" you mean an Indilinx Barefoot based one, we might be using the same hardware (30GB Vertex in my case).
>
> Spooky, yes indeed that's the very same drive I'm using. Also see my postings on this very issue here, top two entries:
> http://axboe.livejournal.com/
>
> So that pretty much looks like it reaffirms some of my suspicions. Is the drive in a laptop that you suspend and resume?

No. I use it in my workstation, which I normally never switch off.

> > What a strange coincidence that it affected git pack files in both cases. It's almost too improbable...
>
> Probably more than a coincidence, I think; the question is what, though...

If it really was an SSD error, then it should happen randomly, messing up random files. But (contrary to your experience) I never had any issues with this SSD until this single failed checksum.

-- 
Markus
Re: btrfs csum failed on git .pack file
On Wed, Sep 09 2009, Markus Trippelsdorf wrote:
> On Wed, Sep 09, 2009 at 09:01:41AM +0200, Jens Axboe wrote:
> > Is the drive in a laptop that you suspend and resume?
>
> No. I use it in my workstation, which I normally never switch off.

OK, so we can rule out any interactions between suspending and resuming the drive. That's at least something.

> > Probably more than a coincidence, I think; the question is what, though...
>
> If it really was an SSD error, then it should happen randomly, messing up random files. But (contrary to your experience) I never had any issues with this SSD until this single failed checksum.

Not necessarily; there may be some pattern to how the pack files are accessed (that propagates through to the drive). The fact is, 0xff is an extremely weird piece of corruption that just reeks of bad flash blocks. It's almost impossible that it is a software error. If it was all zeroes, or a bit flip, the likely causes would be very different.

-- 
Jens Axboe
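Jens's argument rests on what the damaged bytes look like: a long run of 0xff suggests erased flash, while zeroes or isolated bit flips point elsewhere. To check a suspect file for such runs (16KB of 0xff in Jens's case, or the 0x00/0x01 chunks in the tar.bz2 Bryan describes elsewhere in this thread), a scan like the following could be used. The 4096-byte threshold, the default byte values and the function names are arbitrary assumptions; the pure-Python loop is slow on a 300 MB pack file but keeps the logic obvious.

```python
def constant_runs(data: bytes, min_len: int = 4096, values=(0x00, 0xFF)) -> list:
    """Return (offset, length, byte_value) for every run of one repeated
    byte that is at least min_len long and whose value is in `values`."""
    runs = []
    i, n = 0, len(data)
    while i < n:
        value = data[i]
        j = i + 1
        # Extend the run while the byte stays the same.
        while j < n and data[j] == value:
            j += 1
        if value in values and j - i >= min_len:
            runs.append((i, j - i, value))
        i = j
    return runs


def scan_file(path: str, **kwargs) -> list:
    """Read a whole file and report its long constant-byte runs."""
    with open(path, "rb") as f:
        return constant_runs(f.read(), **kwargs)
```

For Bryan's archive, `scan_file(path, values=(0x00, 0x01))` would be the relevant call; a run that starts and ends on 4 KiB boundaries would further hint at block-level rather than byte-level damage.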
Re: btrfs csum failed on git .pack file
On Wed, Sep 9, 2009 at 8:01 AM, Jens Axboe <jens.ax...@oracle.com> wrote:
> Spooky, yes indeed that's the very same drive I'm using. Also see my postings on this very issue here, top two entries:
> http://axboe.livejournal.com/
>
> So that pretty much looks like it reaffirms some of my suspicions. Is the drive in a laptop that you suspend and resume?

If you're on firmware 1.30, the changelog includes some fixes which may be relevant, e.g. if block 0 is affected, or if you're suspending/resuming:
- Race condition occurred during soft reset handler
- If read fail occurs during reading stamp information, firmware corrupted block 0.
- Power off recovery had bug in certain circumstances

http://www.ocztechnologyforum.com/forum/showthread.php?t=57516

-- 
Daniel J Blueman
Re: btrfs csum failed on git .pack file
On Wed, Sep 09 2009, Daniel J Blueman wrote:
> If you're on firmware 1.30, the changelog includes some fixes which may be relevant:
> - Race condition occurred during soft reset handler
> - If read fail occurs during reading stamp information, firmware corrupted block 0.
> - Power off recovery had bug in certain circumstances
>
> http://www.ocztechnologyforum.com/forum/showthread.php?t=57516

The issue is pretty much moot at this point, since OCZ support were not really interested in providing any sort of real technical support to find out what really caused this issue. My main worry was the reliability of these cheaper SSD drives, and that worry is still not resolved. If you read the blog entries, I do comment on the apparently scary basic bugs that are still being fixed on the Indilinx controllers. I do expect some basic level of data integrity from a consumer product, and at least some interest in resolving weird corruption issues if things go wrong. Since OCZ cannot provide anything like that, I have a hard time recommending these drives for anything but very casual use. Fast, cheap, reliable. Pick any two.

My drive was running 1.10 at the time of the problem.

-- 
Jens Axboe
Re: btrfs csum failed on git .pack file
On Wed, Sep 9, 2009 at 9:26 AM, Jens Axboe <jens.ax...@oracle.com> wrote:
> Since OCZ cannot provide anything like that, I have a hard time recommending these drives for anything but very casual use. Fast, cheap, reliable. Pick any two.
>
> My drive was running 1.10 at the time of the problem.

It looks like we need a small tool which performs patterned block I/O to the device, updating a checksum as it goes and performing integrity sweeps at intervals, at a lower level than fsx. It must be trusted or not.

I had a problem like this with nVidia CK804/MCP55 chipsets corrupting data under a triple-edge-case workload.

-- 
Daniel J Blueman
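The tool Daniel proposes could look roughly like the sketch below: write a deterministic, block-specific pattern, remember a digest of every block written, and periodically re-read and compare. It targets a scratch file rather than a raw device (pointing it at a device would destroy its contents), and SHA-256 per 4 KiB block is an arbitrary choice; `pattern` and `PatternTester` are hypothetical names. One important caveat: a sweep issued right after writing may be served from the page cache rather than the drive, so a real tool would need O_DIRECT or cache dropping between passes.

```python
import hashlib
import os
import random

BLOCK_SIZE = 4096


def pattern(block_no: int, generation: int, size: int = BLOCK_SIZE) -> bytes:
    """Deterministic pattern unique to (block, generation): a SHA-256 digest
    repeated to fill the block."""
    seed = hashlib.sha256(f"{block_no}:{generation}".encode()).digest()
    return (seed * (size // len(seed) + 1))[:size]


class PatternTester:
    def __init__(self, path: str, nblocks: int, block_size: int = BLOCK_SIZE):
        self.path = path
        self.nblocks = nblocks
        self.block_size = block_size
        self.expected = {}  # block number -> digest of the data last written

    def write_pass(self, generation: int, rng: random.Random) -> None:
        """Rewrite every block in random order, then fsync."""
        order = list(range(self.nblocks))
        rng.shuffle(order)
        mode = "r+b" if os.path.exists(self.path) else "w+b"
        with open(self.path, mode) as f:
            for blk in order:
                data = pattern(blk, generation, self.block_size)
                f.seek(blk * self.block_size)
                f.write(data)
                self.expected[blk] = hashlib.sha256(data).digest()
            f.flush()
            os.fsync(f.fileno())

    def sweep(self) -> list:
        """Re-read all blocks; return the numbers whose contents changed."""
        bad = []
        with open(self.path, "rb") as f:
            for blk in sorted(self.expected):
                f.seek(blk * self.block_size)
                data = f.read(self.block_size)
                if hashlib.sha256(data).digest() != self.expected[blk]:
                    bad.append(blk)
        return bad
```

A driver loop would alternate `write_pass(generation, rng)` with `sweep()` at whatever interval suits, logging any block numbers returned; random write order is there so the device sees non-sequential I/O.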
Re: btrfs csum failed on git .pack file
On Wed, Sep 09, 2009 at 09:37:42AM +0100, Daniel J Blueman wrote:
> It looks like we need a small tool which performs patterned block I/O to the device, updating a checksum as it goes and performing integrity sweeps at intervals, at a lower level than fsx. It must be trusted or not.
>
> I had a problem like this with nVidia CK804/MCP55 chipsets corrupting data under a triple-edge-case workload.

Well, just use git ;) Apply a bunch of patches (say the mm tree) with guilt and repack in a loop.

-chris
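Chris's suggestion, scripted: one iteration applies the whole patch series with guilt, repacks, and unapplies the series again. The `git fsck --full` step is an addition not in Chris's message, so that corruption is noticed inside the loop rather than only when btrfs happens to log a checksum failure. This sketch assumes a repository already set up with a guilt patch series (for example one imported from the mm tree); `stress_commands` and `run_stress` are hypothetical names.

```python
import subprocess


def stress_commands(repack_args=("-a", "-d")) -> list:
    """Commands for one iteration of the stress loop."""
    return [
        ["guilt", "push", "-a"],          # apply the whole patch series
        ["git", "repack", *repack_args],  # rewrite the pack file(s)
        ["git", "fsck", "--full"],        # re-read and verify every object
        ["guilt", "pop", "-a"],           # unapply, ready for the next round
    ]


def run_stress(repo: str, iterations: int = 100, dry_run: bool = False) -> None:
    for i in range(iterations):
        for cmd in stress_commands():
            if dry_run:
                print(i, " ".join(cmd))
                continue
            # check=True stops at the first failing command, e.g. fsck
            # reporting a corrupt object or a read failing with EIO.
            subprocess.run(cmd, cwd=repo, check=True)
```

Running with `dry_run=True` first shows the exact command sequence without touching the repository.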
Re: btrfs csum failed on git .pack file
What a strange coincidence that it affected git pack files in both cases. It's almost too improbable... Probably more than a coincidence I think, the question is what though... Some SSD drives (or rather the cheap wear levelling controllers in things like USB sticks) have firmware which tries to recognise certain data structures of common filesystems (like FAT and NTFS), and uses information in those data structures to optimise the allocation and erasure of blocks (for example the free space linked list in FAT). If the data you were saving to the disk was similar to one of those data structures, you might've triggered one of those algorithms, which would cause data corruption. This is common in high performance USB sticks because they want to pre-erase blocks on file deletion for operating systems not supporting SCSI TRIM; I imagine the same technology might carry across to cheap SSDs. Not much BTRFS can do about it though. If the piece of data that triggers the bug could be identified, workarounds could possibly be introduced for the particular buggy controllers. Oliver Mattos (resent as I emailed the wrong recipients before)
Re: btrfs csum failed on git .pack file
On Tue, Sep 08 2009, Markus Trippelsdorf wrote: On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe wrote: On Mon, Sep 07 2009, Markus Trippelsdorf wrote: Just got this error today in my dmesg:
btrfs csum failed ino 1483065 off 158482432 csum 4283543305 private 43905798
linux % find . -inum 1483065
./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
It's the main pack file from my git linux kernel tree:
linux % ls -l ./.git/objects/pack/
total 562848
-rw-r--r-- 1 markus markus   1891324 2008-11-29 19:49 pack-011b43fa6956667db5e67fba859e40cb4b154226.idx
-rw-r--r-- 1 markus markus  44002938 2008-11-29 19:54 pack-011b43fa6956667db5e67fba859e40cb4b154226.pack.temp
-rw-r--r-- 1 markus markus    730332 2008-11-29 19:49 pack-67be92b3fab3dab175683582dab0b719517e55a5.idx
-r--r--r-- 1 markus markus  36061684 2009-09-06 21:48 pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.idx
-r--r--r-- 1 markus markus 335202742 2009-09-06 21:48 pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
-rw------- 1 markus markus 158457856 2009-09-07 22:15 tmp_pack_OUdxER
I'm running the latest git kernel and I've been using btrfs as my root fs for the last few weeks without problems so far. Hmm, I ran into something very similar. Care to check what the corrupted block of data looks like (and how big it is)? I've already deleted the file in question unfortunately. On IRC Chris decided that either bad RAM or a hard drive error was the most likely reason for this checksum mismatch. Darn, that's too bad. The corruption issue I had was also in a git pack file. It was fine one day, bad the next. Turned out to be 16KB of 0xff in the file, and I blamed it on the (cheap) SSD drive that hosted the local git repo. It's still the most likely explanation given the nature of the problem; however, it would have been really interesting to see what corruption you had.
-- Jens Axboe
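For the "what does the corrupted block look like" question, a small Python sketch that scans a file for block-aligned runs of a single fill byte (0xff by default) and reports their offsets and lengths. It assumes the data can actually be read, e.g. from a copy taken before the failure or from a filesystem without data checksums, since btrfs refuses reads of the failing range. Compressed git pack data is unlikely to contain whole 4 KiB blocks of one byte value, so any hit is worth a closer look. The 4 KiB granularity and the function name are assumptions.

```python
from __future__ import annotations


def find_fill_runs(path: str, fill: int = 0xFF, block_size: int = 4096,
                   min_blocks: int = 1) -> list[tuple[int, int]]:
    """Return (offset, length) for each run of whole blocks consisting only of `fill`."""
    target = bytes([fill]) * block_size
    runs: list[tuple[int, int]] = []
    start = None  # offset where the current run began, or None if not in a run
    offset = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            if block == target:
                if start is None:
                    start = offset
            elif start is not None:
                # A non-fill (or short trailing) block ends the current run.
                runs.append((start, offset - start))
                start = None
            offset += len(block)
    if start is not None:
        runs.append((start, offset - start))
    # Drop runs shorter than the requested minimum number of blocks.
    return [(o, n) for o, n in runs if n >= min_blocks * block_size]
```

On a file with the corruption Jens describes, a 16KB run of 0xff would show up as a single `(offset, 16384)` entry.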
Re: btrfs csum failed on git .pack file
On Mon, Sep 07 2009, Markus Trippelsdorf wrote: Just got this error today in my dmesg:
btrfs csum failed ino 1483065 off 158482432 csum 4283543305 private 43905798
linux % find . -inum 1483065
./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
It's the main pack file from my git linux kernel tree:
linux % ls -l ./.git/objects/pack/
total 562848
-rw-r--r-- 1 markus markus   1891324 2008-11-29 19:49 pack-011b43fa6956667db5e67fba859e40cb4b154226.idx
-rw-r--r-- 1 markus markus  44002938 2008-11-29 19:54 pack-011b43fa6956667db5e67fba859e40cb4b154226.pack.temp
-rw-r--r-- 1 markus markus    730332 2008-11-29 19:49 pack-67be92b3fab3dab175683582dab0b719517e55a5.idx
-r--r--r-- 1 markus markus  36061684 2009-09-06 21:48 pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.idx
-r--r--r-- 1 markus markus 335202742 2009-09-06 21:48 pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
-rw------- 1 markus markus 158457856 2009-09-07 22:15 tmp_pack_OUdxER
I'm running the latest git kernel and I've been using btrfs as my root fs for the last few weeks without problems so far. Hmm, I ran into something very similar. Care to check what the corrupted block of data looks like (and how big it is)? -- Jens Axboe
Re: btrfs csum failed on git .pack file
On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe wrote: On Mon, Sep 07 2009, Markus Trippelsdorf wrote: Just got this error today in my dmesg:
btrfs csum failed ino 1483065 off 158482432 csum 4283543305 private 43905798
linux % find . -inum 1483065
./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
It's the main pack file from my git linux kernel tree:
linux % ls -l ./.git/objects/pack/
total 562848
-rw-r--r-- 1 markus markus   1891324 2008-11-29 19:49 pack-011b43fa6956667db5e67fba859e40cb4b154226.idx
-rw-r--r-- 1 markus markus  44002938 2008-11-29 19:54 pack-011b43fa6956667db5e67fba859e40cb4b154226.pack.temp
-rw-r--r-- 1 markus markus    730332 2008-11-29 19:49 pack-67be92b3fab3dab175683582dab0b719517e55a5.idx
-r--r--r-- 1 markus markus  36061684 2009-09-06 21:48 pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.idx
-r--r--r-- 1 markus markus 335202742 2009-09-06 21:48 pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
-rw------- 1 markus markus 158457856 2009-09-07 22:15 tmp_pack_OUdxER
I'm running the latest git kernel and I've been using btrfs as my root fs for the last few weeks without problems so far. Hmm, I ran into something very similar. Care to check what the corrupted block of data looks like (and how big it is)? I've already deleted the file in question unfortunately. On IRC Chris decided that either bad RAM or a hard drive error was the most likely reason for this checksum mismatch. -- Markus
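A small Python helper for turning the dmesg line quoted in this thread into numbers: the inode goes to `find . -inum`, and the offset can be converted to a block index within the file. The regular expression matches only the message format shown here; later kernels word checksum errors differently. Reading `csum` as the checksum computed from the data and `private` as the one btrfs had stored is my understanding of the 2009 code and should be treated as an assumption.

```python
from __future__ import annotations

import re
from typing import NamedTuple, Optional

# Matches the message format quoted in this thread; newer kernels word it differently.
CSUM_RE = re.compile(
    r"btrfs csum failed ino (?P<ino>\d+) off (?P<off>\d+) "
    r"csum (?P<csum>\d+) private (?P<private>\d+)"
)


class CsumFailure(NamedTuple):
    ino: int      # inode number, usable with `find . -inum`
    off: int      # byte offset of the failing range within the file
    csum: int     # assumption: checksum computed from the data just read
    private: int  # assumption: checksum stored by btrfs for that range

    def block_index(self, block_size: int = 4096) -> int:
        """Index of the (assumed 4 KiB) block containing `off`."""
        return self.off // block_size


def parse_csum_failure(line: str) -> Optional[CsumFailure]:
    """Return the parsed fields, or None if the line is not a csum failure."""
    m = CSUM_RE.search(line)
    if m is None:
        return None
    return CsumFailure(*(int(m.group(k)) for k in ("ino", "off", "csum", "private")))
```

For the first report above, the offset 158482432 is exactly 4 KiB aligned and `block_index()` gives 38692.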