Re: volume broken? btrfsck fails

2010-08-14 Thread Thomas Kuther
On Wed, 04.08.10 21:30 Chris Mason chris.ma...@oracle.com wrote:

 On Wed, Aug 04, 2010 at 08:48:40PM +0200, Thomas Kuther wrote:
  On Tue, 06.07.10 20:16 Chris Mason chris.ma...@oracle.com wrote:
  
   On Sat, Jun 26, 2010 at 03:15:04PM -0700, Yee-Ting Li wrote:
    Hi,
    
    i think my btrfs volume is hosed. it mounts okay, but iostat
    shows /dev/sdg on 100% load. dmesg shows lots of 'parent transid
    verify failed on x wanted y found z'. then after a while i can't
    read from it (access to the filesystem freezes).
    
    the machine had crashed (prob from some other process), and upon
    reboot i've been experiencing this problem since.
    
    can anyone provide any guidance in how to proceed?
   
   These are definitely corruptions, and they probably came from the
   crash. Can you tell me more about the crash? (Power failure, what
   is the storage underneath etc, what are the write cache
   settings).  We don't expect these kinds of corruptions to happen.
   
   Yan Zheng is making a lot of progress on btrfsck, but I don't
   think you'll want to be one of the first testers there.  I can
   definitely help copy things off if you're having trouble
   accessing the FS.
   
   -chris
  
  Hello Chris,
  
  sorry if I'm hijacking this thread. I got a similar problem,
  probably caused by a system crash due to faulty/badly timed memory
  dimms. The system suddenly hardlocked during write activity.
  
  - kernel is 2.6.35
  - btrfs on top of a md raid5, which looks healthy. Desktop SATA
  disks.
  
  # cat /proc/mdstat|grep -A1 md0
  md0 : active raid5 sdb1[0] sdd1[1] sdc1[2]
    2930271872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
  
  # btrfsck
  usage: btrfsck dev
  Btrfs v0.19-16-g075587c-dirty
  
  # btrfsck /dev/md0
  parent transid verify failed on 2419218964480 wanted 127839 found 127260
  parent transid verify failed on 2419218964480 wanted 127839 found 127260
  parent transid verify failed on 2419218915328 wanted 127839 found 127260
  parent transid verify failed on 2419218915328 wanted 127839 found 127260
  parent transid verify failed on 2419214266368 wanted 127839 found 127837
  parent transid verify failed on 2419214266368 wanted 127839 found 127837
  parent transid verify failed on 2419214266368 wanted 127839 found 127837
  Segmentation fault
  
  Mount endlessly loops, like explained in this thread.
  
  If there is a way, I would really like some aid copying the data
  off. The backup is quite out of date, shame on me.
 
 No problem, I'll get a test patch out in the morning.
 
 -chris
 

Hi Chris,

did you find the time to get that patch done meanwhile?
I'm willing to test.

Seems more people get this error after power outages, suspending or
similar.

Thanks in advance.

~Thomas




Re: volume broken? btrfsck fails

2010-08-04 Thread Thomas Kuther
On Tue, 06.07.10 20:16 Chris Mason chris.ma...@oracle.com wrote:

 On Sat, Jun 26, 2010 at 03:15:04PM -0700, Yee-Ting Li wrote:
  Hi,
  
  i think my btrfs volume is hosed. it mounts okay, but iostat
  shows /dev/sdg on 100% load. dmesg shows lots of 'parent transid
  verify failed on x wanted y found z'. then after a while i can't
  read from it (access to the filesystem freezes).
  
  the machine had crashed (prob from some other process), and upon
  reboot i've been experiencing this problem since.
  
  can anyone provide any guidance in how to proceed?
 
 These are definitely corruptions, and they probably came from the
 crash. Can you tell me more about the crash? (Power failure, what is
 the storage underneath etc, what are the write cache settings).  We
 don't expect these kinds of corruptions to happen.
 
 Yan Zheng is making a lot of progress on btrfsck, but I don't think
 you'll want to be one of the first testers there.  I can definitely
 help copy things off if you're having trouble accessing the FS.
 
 -chris

Hello Chris,

sorry if I'm hijacking this thread. I got a similar problem, probably
caused by a system crash due to faulty/badly timed memory dimms. The
system suddenly hardlocked during write activity.

- kernel is 2.6.35
- btrfs on top of a md raid5, which looks healthy. Desktop SATA disks.

# cat /proc/mdstat|grep -A1 md0
md0 : active raid5 sdb1[0] sdd1[1] sdc1[2]
  2930271872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

# btrfsck
usage: btrfsck dev
Btrfs v0.19-16-g075587c-dirty

# btrfsck /dev/md0
parent transid verify failed on 2419218964480 wanted 127839 found 127260
parent transid verify failed on 2419218964480 wanted 127839 found 127260
parent transid verify failed on 2419218915328 wanted 127839 found 127260
parent transid verify failed on 2419218915328 wanted 127839 found 127260
parent transid verify failed on 2419214266368 wanted 127839 found 127837
parent transid verify failed on 2419214266368 wanted 127839 found 127837
parent transid verify failed on 2419214266368 wanted 127839 found 127837
Segmentation fault

Mount endlessly loops, like explained in this thread.

If there is a way, I would really like some aid copying the data off.
The backup is quite out of date, shame on me.

Best regards,
Thomas
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: volume broken? btrfsck fails

2010-08-04 Thread Chris Mason
On Wed, Aug 04, 2010 at 08:48:40PM +0200, Thomas Kuther wrote:
 On Tue, 06.07.10 20:16 Chris Mason chris.ma...@oracle.com wrote:
 
  On Sat, Jun 26, 2010 at 03:15:04PM -0700, Yee-Ting Li wrote:
   Hi,
   
   i think my btrfs volume is hosed. it mounts okay, but iostat
   shows /dev/sdg on 100% load. dmesg shows lots of 'parent transid
   verify failed on x wanted y found z'. then after a while i can't
   read from it (access to the filesystem freezes).
   
   the machine had crashed (prob from some other process), and upon
   reboot i've been experiencing this problem since.
   
   can anyone provide any guidance in how to proceed?
  
  These are definitely corruptions, and they probably came from the
  crash. Can you tell me more about the crash? (Power failure, what is
  the storage underneath etc, what are the write cache settings).  We
  don't expect these kinds of corruptions to happen.
  
  Yan Zheng is making a lot of progress on btrfsck, but I don't think
  you'll want to be one of the first testers there.  I can definitely
  help copy things off if you're having trouble accessing the FS.
  
  -chris
 
 Hello Chris,
 
 sorry if I'm hijacking this thread. I got a similar problem, probably
 caused by a system crash due to faulty/badly timed memory dimms. The
 system suddenly hardlocked during write activity.
 
 - kernel is 2.6.35
 - btrfs on top of a md raid5, which looks healthy. Desktop SATA disks.
 
 # cat /proc/mdstat|grep -A1 md0
 md0 : active raid5 sdb1[0] sdd1[1] sdc1[2]
   2930271872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
 
 # btrfsck
 usage: btrfsck dev
 Btrfs v0.19-16-g075587c-dirty
 
 # btrfsck /dev/md0
 parent transid verify failed on 2419218964480 wanted 127839 found 127260
 parent transid verify failed on 2419218964480 wanted 127839 found 127260
 parent transid verify failed on 2419218915328 wanted 127839 found 127260
 parent transid verify failed on 2419218915328 wanted 127839 found 127260
 parent transid verify failed on 2419214266368 wanted 127839 found 127837
 parent transid verify failed on 2419214266368 wanted 127839 found 127837
 parent transid verify failed on 2419214266368 wanted 127839 found 127837
 Segmentation fault
 
 Mount endlessly loops, like explained in this thread.
 
 If there is a way, I would really like some aid copying the data off.
 The backup is quite out of date, shame on me.

No problem, I'll get a test patch out in the morning.

-chris



Re: volume broken? btrfsck fails

2010-07-11 Thread Yee-Ting Li
so after leaving the array for a while, with the disk churning away for a few 
days, it stopped. i copied some files off the disk (everything seems okay) and 
decided to unmount and run btrfsck again - this time i get a different error:

$ sudo /usr/local/bin/btrfsck /dev/sdf
failed to read /dev/sr0
parent transid verify failed on 2703919247360 wanted 9066 found 7543
parent transid verify failed on 2703914500096 wanted 9066 found 7543
parent transid verify failed on 2703873781760 wanted 9074 found 9022
parent transid verify failed on 2703877693440 wanted 9070 found 9062
parent transid verify failed on 2703921868800 wanted 9066 found 7543
parent transid verify failed on 2703922647040 wanted 9066 found 7543
parent transid verify failed on 2703919247360 wanted 9066 found 7543
parent transid verify failed on 270391922 wanted 9066 found 7543
parent transid verify failed on 2703917125632 wanted 9066 found 7543
parent transid verify failed on 2703879294976 wanted 9075 found 9055
parent transid verify failed on 2703883194368 wanted 9075 found 9057
parent transid verify failed on 2703922688000 wanted 9066 found 7543
parent transid verify failed on 2703873781760 wanted 9074 found 9022
parent transid verify failed on 2703877693440 wanted 9070 found 9062
parent transid verify failed on 2703921868800 wanted 9066 found 7543
parent transid verify failed on 2703922647040 wanted 9066 found 7543
parent transid verify failed on 2703919247360 wanted 9066 found 7543
parent transid verify failed on 270391922 wanted 9066 found 7543
bad block 2703873781760
Extent back ref already exists for 365342720 parent 0 root 2 
Extent back ref already exists for 2221870616576 parent 0 root 2 
Extent back ref already exists for 383959040 parent 0 root 2 
Extent back ref already exists for 367714304 parent 0 root 2 
Extent back ref already exists for 706744320 parent 0 root 2 
Extent back ref already exists for 368672768 parent 0 root 2 
Extent back ref already exists for 315338752 parent 0 root 2 
Extent back ref already exists for 377356288 parent 0 root 2 
Extent back ref already exists for 368914432 parent 0 root 2 
Extent back ref already exists for 369807360 parent 0 root 2 
Extent back ref already exists for 2221957713920 parent 0 root 2 
Extent back ref already exists for 370139136 parent 0 root 2 
Extent back ref already exists for 369811456 parent 0 root 2 
Extent back ref already exists for 370122752 parent 0 root 2 
Extent back ref already exists for 365936640 parent 0 root 2 
Extent back ref already exists for 2221948424192 parent 0 root 2 
Extent back ref already exists for 3624002596864 parent 0 root 2 
Extent back ref already exists for 706789376 parent 0 root 2 
Extent back ref already exists for 2703778734080 parent 0 root 2 
Extent back ref already exists for 372252672 parent 0 root 2 
Extent back ref already exists for 372109312 parent 0 root 2 
Extent back ref already exists for 372989952 parent 0 root 2 
Extent back ref already exists for 373657600 parent 0 root 2 
Extent back ref already exists for 374521856 parent 0 root 2 
Extent back ref already exists for 374628352 parent 0 root 2 
Extent back ref already exists for 374976512 parent 0 root 2 
Extent back ref already exists for 2221948403712 parent 0 root 2 
Extent back ref already exists for 375586816 parent 0 root 2 
Extent back ref already exists for 375906304 parent 0 root 2 
Extent back ref already exists for 376639488 parent 0 root 2 
Extent back ref already exists for 706818048 parent 0 root 2 
Extent back ref already exists for 383778816 parent 0 root 2 
Extent back ref already exists for 377626624 parent 0 root 2 
leaf parent key incorrect 2703874203648
bad block 2703874203648
leaf 080487424 items 37 free space 1183 generation 10279 owner 2
fs uuid ea7ea0b3-bc42-4b0c-9173-346df61d4454
chunk uuid 886b0dfb-fa34-49c7-9ab0-2589603f8ae4
item 0 key (364388352 EXTENT_ITEM 4096) itemoff 3944 itemsize 51
extent refs 1 gen 1061 flags 2
tree block key (18446744073709551606 80 200172044288) level 0
tree block backref root 7
item 1 key (364392448 EXTENT_ITEM 4096) itemoff 3893 itemsize 51
extent refs 1 gen 1061 flags 2
tree block key (18446744073709551606 80 200220258304) level 0
tree block backref root 7
item 2 key (364396544 EXTENT_ITEM 4096) itemoff 3842 itemsize 51
extent refs 1 gen 1061 flags 2
tree block key (18446744073709551606 80 200179384320) level 0
tree block backref root 7
item 3 key (364400640 EXTENT_ITEM 4096) itemoff 3791 itemsize 51
extent refs 1 gen 1061 flags 2
tree block key (18446744073709551606 80 200220258304) level 0
tree block backref root 7
item 4 key (364404736 EXTENT_ITEM 4096) itemoff 3740 itemsize 51
extent refs 1 gen 1061 flags 2
tree block key (18446744073709551606 80 

Re: volume broken? btrfsck fails

2010-07-11 Thread Chris Mason
On Wed, Jul 07, 2010 at 10:39:48PM -0400, Daniel Kozlowski wrote:
  Looks like we're looping on a single block.  What happens when you
  dmesg -n1 to cut down on the console traffic?
 
  Nothing changes I still have endless repeats of
 
  parent transid verify failed on 1682586464256 wanted 285114 found 11257
 
  If that doesn't help we can change it to spit a stack trace to figure
  out where the looping is happening.  We should be erroring out instead
  of hitting it over and over again.
 
  In my kernel noviceness i tried attaching gdb to the btrfs-endio-met,
  however apparently you can't attach gdb to a kernel thread like that
  If you could assist me in obtaining a call trace I will gladly attempt
  to resolve the matter.
 
 Ok I had some free time and decided to exercise my google-fu and came
 up with this trace
 
 parent transid verify failed on 3241193205760 wanted 285287 found 281382
 Pid: 2163, comm: mount Not tainted 2.6.35-0.23.rc3.git6.fc14.x86_64 #1
 Call Trace:
  [a047c376] verify_parent_transid+0xb7/0xfe [btrfs]
  [a047c4f2] btrfs_buffer_uptodate+0x49/0x59 [btrfs]
  [a04686a2] read_block_for_search+0x8f/0x289 [btrfs]
  [a046d554] btrfs_search_slot+0x3ae/0x513 [btrfs]
  [a0470ece] btrfs_read_block_groups+0x73/0x526 [btrfs]
  [8149b0a3] ? _raw_spin_unlock+0x2b/0x2f
  [a0469f56] ? btrfs_root_node+0x2a/0x32 [btrfs]
  [a047d287] ? find_and_setup_root+0xab/0xbc [btrfs]
  [a04800eb] open_ctree+0xf19/0x143a [btrfs]
  [a0467960] btrfs_get_sb+0x1ce/0x40b [btrfs]
  [810e9cfd] ? free_pages+0x49/0x4e
  [8112c9f9] vfs_kern_mount+0xbd/0x19b
  [8112cb3f] do_kern_mount+0x4d/0xed
  [81143742] do_mount+0x776/0x7ed
  [81143841] sys_mount+0x88/0xc2
  [81009c32] system_call_fastpath+0x16/0x1b


Ok, so we're never getting out of mount.  A recent change to
read_block_for_search is causing this problem.  We're looping over and
over again because it is returning -EAGAIN instead of -EIO.
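
To illustrate the looping described above, here is a toy model in Python. It is not the kernel code: `read_block` and `search_slot` are hypothetical stand-ins, and `max_attempts` exists only so the sketch terminates. The point is that a caller which retries on EAGAIN never makes progress when the block is permanently bad, whereas EIO lets it fail right away.

```python
import errno


def read_block(block_is_stale, error_on_stale):
    """Pretend to read a tree block; return 0 or a negative errno.

    Hypothetical stand-in for illustration only, not the kernel function.
    """
    if block_is_stale:
        return -error_on_stale
    return 0


def search_slot(block_is_stale, error_on_stale, max_attempts=1000):
    """Retry the read while it reports EAGAIN; stop on anything else.

    max_attempts is an artificial cap so this sketch always terminates.
    Returns (result, attempts); result is 0, a negative errno, or None
    if the cap was reached while still retrying.
    """
    for attempt in range(1, max_attempts + 1):
        ret = read_block(block_is_stale, error_on_stale)
        if ret == -errno.EAGAIN:
            # "Try again": a permanently bad block never gets past here.
            continue
        return ret, attempt
    return None, max_attempts


if __name__ == "__main__":
    # Stale block reported as EAGAIN: retried until the artificial cap.
    print(search_slot(True, errno.EAGAIN))
    # Same block reported as EIO: the caller gives up on the first attempt.
    print(search_slot(True, errno.EIO))
```

Without the cap, the first call would spin forever, which matches the endless repeats of the same message during mount.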

Thanks for nailing this trace down, I'll get a fix in for the looping.
I'm afraid it won't bring back the filesystem though, you'll end up
failing in mount.  Would you like some help copying the data off?

-chris



Re: volume broken? btrfsck fails

2010-07-11 Thread Yee-Ting Li

On 11 Jul 2010, at 17:43, Chris Mason wrote:
 Was this after a fresh mkfs?  Clearly things are very corrupt on this
 original drive.  It would be a good test case for Yan Zheng's new fsck
 code, but first I'd like to figure out if you're still seeing the old
 corruption or if you've started over.

nope, same disk as before when the btrfsck exited with:

btrfsck: disk-io.c:410: find_and_setup_root: Assertion `!(ret)' failed.

the strange thing was that i'm pretty sure that btrfs crashed the system a 
couple of times (hung). after reboot the mounted drive would basically churn 
away for hours and spit out lots of the parent transid messages. but after a 
while it stops and everything seems fine again.

i don't mind losing files on the disk array, but it would be nice if it could 
tell me the actual filenames which are corrupt.

Yee.


Re: volume broken? btrfsck fails

2010-07-08 Thread Daniel J Blueman
On 8 July 2010 01:21, Daniel Kozlowski dan.kozlow...@gmail.com wrote:
 On Tue, Jul 6, 2010 at 8:19 PM, Chris Mason chris.ma...@oracle.com wrote:
 I am also having the same problem with a slightly different setup. In my
 case I cannot mount the filesystem.

 What is your hardware setup here?  Including write cache settings.  Did
 you have crashes with 2.6.35-rc1 or rc2?

 My setup is

 Eight hard drives
 four 1TB Drives
 four 500GB Drives
 All drives are connected through a 3ware Inc 9550SX SATA-II RAID PCI-X card
 The card is configured to export all drives essentially acting as a
 SATA port multiplier. (drives show up sdb - sdi)
 Drives are configured in btrfs raid0
 Filesystem is mounted using:
 mount -t btrfs /dev/sdb /opt

 I have been able to lock up the system on
 2.6.33.5-124.fc13.x86_64
 2.6.35-0.13.rc3.git2.fc14.x86_64
 2.6.35-0.23.rc3.git6.fc14.x86_64
 and
 2.6.35-0.23.rc3.git6.fc14.x86_64 with a DKMS build of the btrfs module
 (Btrfs v0.19-16-g075587c-dirty)

 If you would like me to pull out another version of the kernel or roll
 back specific commits from the kernel module I can

 I have been able to get different responses from different versions
 2.6.33.* - This will mount the volume but will hang shortly after
 mounting when reading data from the filesystem (ls /opt): it writes a
 bunch of transid verify failed messages, then hangs on ls
 2.6.34.* - Will not mount at all, still gives the transid verify failed
  messages, hangs on mount


 Looks like we're looping on a single block.  What happens when you
 dmesg -n1 to cut down on the console traffic?

 Nothing changes I still have endless repeats of

 parent transid verify failed on 1682586464256 wanted 285114 found 11257

 If that doesn't help we can change it to spit a stack trace to figure
 out where the looping is happening.  We should be erroring out instead
 of hitting it over and over again.

 In my kernel noviceness i tried attaching gdb to the btrfs-endio-met,
 however apparently you can't attach gdb to a kernel thread like that
 If you could assist me in obtaining a call trace I will gladly attempt
 to resolve the matter.

For grabbing kernel backtraces:

$ sudo -s
# dmesg -c >/dev/null
# echo t >/proc/sysrq-trigger
# dmesg >backtraces.txt
(there are other ways with

The problem is that you'll be taking instantaneous snapshots, which
may or may not be representative of the main looping, but over a few
shots should be.
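
Since a single snapshot may not be representative, one way to compare a few of them is to count in how many snapshots each function shows up. A rough Python sketch, assuming the dumps contain call-trace lines of the form `[address] function+0xoff/0xlen [module]`; `count_trace_functions` is a made-up helper name, and frames marked `?` (unreliable) are skipped by default:

```python
import re
import sys
from collections import Counter

# Matches e.g. " [a047c376] verify_parent_transid+0xb7/0xfe [btrfs]"
# and the unreliable form " [8149b0a3] ? _raw_spin_unlock+0x2b/0x2f".
FRAME_RE = re.compile(
    r"^\s*\[<?[0-9a-f]+>?\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+",
    re.MULTILINE,
)


def count_trace_functions(snapshots, include_unreliable=False):
    """Count in how many snapshots (strings) each function appears."""
    counts = Counter()
    for text in snapshots:
        seen = set()
        for match in FRAME_RE.finditer(text):
            unreliable, func = match.group(1), match.group(2)
            if unreliable and not include_unreliable:
                continue
            seen.add(func)
        # A function counts once per snapshot, however often it repeats.
        counts.update(seen)
    return counts


if __name__ == "__main__":
    # Usage: python count_frames.py backtraces1.txt backtraces2.txt ...
    texts = []
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            texts.append(handle.read())
    for func, n in count_trace_functions(texts).most_common(20):
        print(f"{n:4d}  {func}")
```

Functions that appear in every snapshot are the likeliest candidates for the loop.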

Thanks,
  Daniel
-- 
Daniel J Blueman


Re: volume broken? btrfsck fails

2010-07-07 Thread Daniel Kozlowski
 Looks like we're looping on a single block.  What happens when you
 dmesg -n1 to cut down on the console traffic?

 Nothing changes I still have endless repeats of

 parent transid verify failed on 1682586464256 wanted 285114 found 11257

 If that doesn't help we can change it to spit a stack trace to figure
 out where the looping is happening.  We should be erroring out instead
 of hitting it over and over again.

 In my kernel noviceness i tried attaching gdb to the btrfs-endio-met,
 however apparently you can't attach gdb to a kernel thread like that
 If you could assist me in obtaining a call trace I will gladly attempt
 to resolve the matter.

Ok I had some free time and decided to exercise my google-fu and came
up with this trace

parent transid verify failed on 3241193205760 wanted 285287 found 281382
Pid: 2163, comm: mount Not tainted 2.6.35-0.23.rc3.git6.fc14.x86_64 #1
Call Trace:
 [a047c376] verify_parent_transid+0xb7/0xfe [btrfs]
 [a047c4f2] btrfs_buffer_uptodate+0x49/0x59 [btrfs]
 [a04686a2] read_block_for_search+0x8f/0x289 [btrfs]
 [a046d554] btrfs_search_slot+0x3ae/0x513 [btrfs]
 [a0470ece] btrfs_read_block_groups+0x73/0x526 [btrfs]
 [8149b0a3] ? _raw_spin_unlock+0x2b/0x2f
 [a0469f56] ? btrfs_root_node+0x2a/0x32 [btrfs]
 [a047d287] ? find_and_setup_root+0xab/0xbc [btrfs]
 [a04800eb] open_ctree+0xf19/0x143a [btrfs]
 [a0467960] btrfs_get_sb+0x1ce/0x40b [btrfs]
 [810e9cfd] ? free_pages+0x49/0x4e
 [8112c9f9] vfs_kern_mount+0xbd/0x19b
 [8112cb3f] do_kern_mount+0x4d/0xed
 [81143742] do_mount+0x776/0x7ed
 [81143841] sys_mount+0x88/0xc2
 [81009c32] system_call_fastpath+0x16/0x1b


 Dan Kozlowski

 --
 S.D.G.




-- 
S.D.G.


Re: volume broken? btrfsck fails

2010-07-07 Thread Daniel Kozlowski
On Tue, Jul 6, 2010 at 8:19 PM, Chris Mason chris.ma...@oracle.com wrote:
 I am also having the same problem with a slightly different setup. In my
 case I cannot mount the filesystem.

 What is your hardware setup here?  Including write cache settings.  Did
 you have crashes with 2.6.35-rc1 or rc2?

My setup is

Eight hard drives
four 1TB Drives
four 500GB Drives
All drives are connected through a 3ware Inc 9550SX SATA-II RAID PCI-X card
The card is configured to export all drives essentially acting as a
SATA port multiplier. (drives show up sdb - sdi)
Drives are configured in btrfs raid0
Filesystem is mounted using:
mount -t btrfs /dev/sdb /opt

I have been able to lock up the system on
2.6.33.5-124.fc13.x86_64
2.6.35-0.13.rc3.git2.fc14.x86_64
2.6.35-0.23.rc3.git6.fc14.x86_64
and
2.6.35-0.23.rc3.git6.fc14.x86_64 with a DKMS build of the btrfs module
(Btrfs v0.19-16-g075587c-dirty)

If you would like me to pull out another version of the kernel or roll
back specific commits from the kernel module I can

I have been able to get different responses from different versions
2.6.33.* - This will mount the volume but will hang shortly after
mounting when reading data from the filesystem (ls /opt): it writes a
bunch of transid verify failed messages, then hangs on ls
2.6.34.* - Will not mount at all, still gives the transid verify failed
 messages, hangs on mount


 Looks like we're looping on a single block.  What happens when you
 dmesg -n1 to cut down on the console traffic?

Nothing changes I still have endless repeats of

parent transid verify failed on 1682586464256 wanted 285114 found 11257

 If that doesn't help we can change it to spit a stack trace to figure
 out where the looping is happening.  We should be erroring out instead
 of hitting it over and over again.

In my kernel noviceness i tried attaching gdb to the btrfs-endio-met,
however apparently you can't attach gdb to a kernel thread like that
If you could assist me in obtaining a call trace I will gladly attempt
to resolve the matter.

Dan Kozlowski

-- 
S.D.G.


Re: volume broken? btrfsck fails

2010-07-06 Thread Chris Mason
On Sat, Jun 26, 2010 at 03:15:04PM -0700, Yee-Ting Li wrote:
 Hi,
 
 i think my btrfs volume is hosed. it mounts okay, but iostat shows
 /dev/sdg on 100% load. dmesg shows lots of 'parent transid verify failed on x
 wanted y found z'. then after a while i can't read from it (access to the
 filesystem freezes).
 
 the machine had crashed (prob from some other process), and upon reboot i've
 been experiencing this problem since.
 
 can anyone provide any guidance in how to proceed?

These are definitely corruptions, and they probably came from the crash.
Can you tell me more about the crash? (Power failure, what is the
storage underneath etc, what are the write cache settings).  We don't
expect these kinds of corruptions to happen.

Yan Zheng is making a lot of progress on btrfsck, but I don't think
you'll want to be one of the first testers there.  I can definitely help
copy things off if you're having trouble accessing the FS.

-chris


Re: volume broken? btrfsck fails

2010-07-06 Thread Yee-Ting Li

On 6 Jul 2010, at 17:16, Chris Mason wrote:
 These are definitely corruptions, and they probably came from the crash.
 Can you tell me more about the crash? (Power failure, what is the
 storage underneath etc, what are the write cache settings).  We don't
 expect these kinds of corruptions to happen.

i think what happened was that the power got pulled accidentally. at the time i 
had a drive (sde) on an external usb controller. the other two drives are 
internal on a nForce 730i chipset. they are all 2TB WD drives (combination of 
EADS and EARS drives). according to hdparm all the drives have write-caching on.

 Yan Zheng is making a lot of progress on btrfsck, but I don't think
 you'll want to be one of the first testers there.  I can definitely help
 copy things off if you're having trouble accessing the FS.

i'm performing rsyncs at the moment to get some of the data off. i can read the 
drive fine, but after a while (i guess when something tries to access the 
corrupt file) i get the dmesgs again, and high cpu on the two btrfs-transacti 
and btrfs-endio-met threads.

is there a way i can determine the actual filenames that may be corrupt?

also, as i'm not using the /dev/sde drive (btrfs-show gives used 0.00TB) as i 
didn't do a balance after i installed it - is there a way i can degrade the 
array to recover that disk and keep the array with just two disks? then i will 
have enough storage to copy the 'good' files off :)

once i have a replica, then i can test whatever code you'd like to throw at me 
:)

cheers,

Yee.


Re: volume broken? btrfsck fails

2010-07-04 Thread Yee-Ting Li

On 1 Jul 2010, at 05:51, Daniel Kozlowski wrote:
 I am also having the same problem with a slightly different setup. In my case
 I cannot mount the filesystem. mount, btrfs-endio-met and kblockd/0 will all
 continually run until the system freezes up and requires a power cycle.

have you tried mounting with '-o degraded'?

having monitored the system for a while, i also think that in fact it's btrfs 
that's killing my system. i'm on ubuntu 10.04 with:

$ uname -a
Linux htpc 2.6.32-22-server #36-Ubuntu SMP Thu Jun 3 20:38:33 UTC 2010 x86_64 GNU/Linux

using the default kernel module, but git'd out the tools.

following the other thread 'Is there a more aggressive fixer than btrfsck?' i 
suspect that we'll just have to wait until some actual fsck operations are 
available for btrfs :(

on my system, it's btrfs-endio-met (only 1 out of 4) and btrfs-transacti (1 out 
of 2) that is taking up all the cpu/io wait cycles.

i wonder if it's only certain files on the array that are hosed; if that's the 
case is there a way i can map the kernel messages to a real filename? i don't 
mind losing the odd file on this array, but i don't fancy copying it all over 
to somewhere else (yeah-yeah, up to date backups blah blah!) - i figured given 
the momentum btrfs was gaining it would be much more stable than this :(

Yee.


Re: volume broken? btrfsck fails

2010-07-01 Thread Daniel Kozlowski
Yee-Ting Li yee379 at gmail.com writes:

 
 Hi,
 
 i think my btrfs volume is hosed. it mounts okay, but iostat shows /dev/sdg
 on 100% load. dmesg shows lots of 'parent transid verify failed on x wanted
 y found z'. then after a while i can't read from it (access to the
 filesystem freezes).
 
 the machine had crashed (prob from some other process), and upon reboot i've
 been experiencing this problem since.
 
 can anyone provide any guidance in how to proceed?
 
 cheers,
 
 Yee.

I am also having the same problem with a slightly different setup. In my case I 
cannot mount the filesystem. mount, btrfs-endio-met and kblockd/0 will all 
continually run until the system freezes up and requires a power cycle. I have 
both the kernel module and the tools checked out from git so if you have any 
ideas on fixes I can build them and test them out. 

here is some information about my setup 

[r...@solution ~]# uname -a
Linux solution.bcig 2.6.35-0.13.rc3.git2.fc14.x86_64 #1 SMP Mon Jun 28 19:27:35 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
[r...@solution ~]# 

[r...@solution ~]# btrfs-show 
Label: store  uuid: 4ba1cc6b-e12a-454a-a064-f4019312c063
Total devices 7 FS bytes used 1.15TB
devid    1 size 931.51GB used 415.55GB path /dev/sdb
devid    2 size 931.51GB used 518.50GB path /dev/sdc
devid    3 size 931.51GB used 342.04GB path /dev/sdd
devid    4 size 931.51GB used 523.54GB path /dev/sde
devid    5 size 465.76GB used 402.54GB path /dev/sdf
devid    6 size 465.76GB used 382.54GB path /dev/sdg
devid    7 size 465.76GB used 367.54GB path /dev/sdh

Btrfs v0.19-16-g075587c-dirty
[r...@solution ~]# 

[r...@solution ~]# tail  -n 12 /var/log/messages
Jul  1 04:47:03 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
Jul  1 04:47:08 solution kernel: verify_parent_transid: 9244 callbacks suppressed
Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
[r...@solution ~]# 
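
A quick way to see whether the log keeps hitting the same block is to tally the messages per block number. The following is only a sketch in Python; `summarize_transid_failures` is a hypothetical helper, and it assumes the message text is exactly the 'parent transid verify failed on X wanted Y found Z' form shown above:

```python
import re
import sys
from collections import Counter

# Whitespace between words may be a newline, so messages that were
# wrapped across two lines are still recognised.
TRANSID_RE = re.compile(
    r"parent\s+transid\s+verify\s+failed\s+on\s+(\d+)"
    r"\s+wanted\s+(\d+)\s+found\s+(\d+)"
)


def summarize_transid_failures(log_text):
    """Return {block: {...}} ordered by how often each block failed.

    Each value holds the message count, the last seen wanted/found
    generations, and how far the found generation is behind.
    """
    counts = Counter()
    latest = {}
    for block, wanted, found in TRANSID_RE.findall(log_text):
        counts[int(block)] += 1
        latest[int(block)] = (int(wanted), int(found))
    return {
        block: {
            "count": n,
            "wanted": latest[block][0],
            "found": latest[block][1],
            "behind": latest[block][0] - latest[block][1],
        }
        for block, n in counts.most_common()
    }


if __name__ == "__main__":
    # Usage: dmesg | python transid_summary.py
    for block, info in summarize_transid_failures(sys.stdin.read()).items():
        print(block, info)
```

The counts are a lower bound, because the kernel rate-limits these messages (see the 'callbacks suppressed' line in the log).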





volume broken? btrfsck fails

2010-06-26 Thread Yee-Ting Li
Hi,

i think my btrfs volume is hosed. it mounts okay, but iostat shows /dev/sdg 
on 100% load. dmesg shows lots of 'parent transid verify failed on x wanted y 
found z'. then after a while i can't read from it (access to the filesystem 
freezes).

the machine had crashed (prob from some other process), and upon reboot i've 
been experiencing this problem since.

can anyone provide any guidance in how to proceed?

cheers,

Yee.

$ sudo /usr/local/bin/btrfs-show 
failed to read /dev/sr0

Label: none  uuid: ea7ea0b3-bc42-4b0c-9173-346df61d4454
Total devices 3 FS bytes used 3.56TB
devid    3 size 1.82TB used 0.00 path /dev/sde
devid    1 size 1.82TB used 1.82TB path /dev/sdf
devid    2 size 1.82TB used 1.82TB path /dev/sdg

Btrfs v0.19-16-g075587c


$ sudo /usr/local/bin/btrfsck /dev/sdf 
failed to read /dev/sr0
parent transid verify failed on 2703873638400 wanted 9074 found 9016
parent transid verify failed on 2703884750848 wanted 9074 found 9055
parent transid verify failed on 2703884763136 wanted 9074 found 9060
parent transid verify failed on 2703883599872 wanted 9074 found 9034
parent transid verify failed on 2703920717824 wanted 9066 found 7543
parent transid verify failed on 2703912325120 wanted 9066 found 7543
parent transid verify failed on 2703912034304 wanted 9066 found 7543
parent transid verify failed on 2703881900032 wanted 9071 found 9060
parent transid verify failed on 2703881793536 wanted 9069 found 9057
bad block 2703860367360
Extent back ref already exists for 2703873536000 parent 0 root 2 
bad block 2703860621312
bad block 2703861547008
Extent back ref already exists for 2703876689920 parent 0 root 2 
Extent back ref already exists for 2703881900032 parent 0 root 2 
Extent back ref already exists for 2703879290880 parent 0 root 2 
Extent back ref already exists for 2703873753088 parent 0 root 2 
parent transid verify failed on 2703921885184 wanted 9066 found 7543
parent transid verify failed on 2703921889280 wanted 9066 found 7543
parent transid verify failed on 2703879036928 wanted 9069 found 9061
parent transid verify failed on 2703881867264 wanted 9075 found 9065
parent transid verify failed on 2703873536000 wanted 9074 found 9062
parent transid verify failed on 2703883190272 wanted 9075 found 9061
parent transid verify failed on 2703869997056 wanted 9073 found 9060
parent transid verify failed on 2703922012160 wanted 9066 found 7543
parent transid verify failed on 2703921975296 wanted 9066 found 7543
parent transid verify failed on 2703867707392 wanted 9071 found 9060
parent transid verify failed on 2703922679808 wanted 9066 found 7543
parent transid verify failed on 2703922032640 wanted 9066 found 7543
parent transid verify failed on 2703881891840 wanted 9075 found 9057
parent transid verify failed on 2703882297344 wanted 9075 found 9061
parent transid verify failed on 2703884488704 wanted 9074 found 9057
parent transid verify failed on 2703884353536 wanted 9074 found 9057
parent transid verify failed on 2703884365824 wanted 9074 found 9055
parent transid verify failed on 2703921500160 wanted 9066 found 7543
parent transid verify failed on 2703883177984 wanted 9075 found 9061
parent transid verify failed on 2703921487872 wanted 9066 found 7543
parent transid verify failed on 2703922683904 wanted 9066 found 7543
parent transid verify failed on 2703873753088 wanted 9074 found 9062
parent transid verify failed on 2703874314240 wanted 9074 found 9056
Extent back ref already exists for 2703865823232 parent 0 root 2 
Extent back ref already exists for 2703866810368 parent 0 root 2 
Extent back ref already exists for 2703866986496 parent 0 root 2 
Extent back ref already exists for 2703867031552 parent 0 root 2 
Extent back ref already exists for 2703867625472 parent 0 root 2 
Extent back ref already exists for 2703867609088 parent 0 root 2 
Extent back ref already exists for 2703868829696 parent 0 root 2 
Extent back ref already exists for 2703869734912 parent 0 root 2 
Extent back ref already exists for 2703870255104 parent 0 root 2 
Extent back ref already exists for 2703870562304 parent 0 root 2 
Extent back ref already exists for 2703871201280 parent 0 root 2 
Extent back ref already exists for 2703871168512 parent 0 root 2 
Extent back ref already exists for 2703873040384 parent 0 root 2 
Extent back ref already exists for 2703872610304 parent 0 root 2 
Extent back ref already exists for 2703874686976 parent 0 root 2 
Extent back ref already exists for 2703873318912 parent 0 root 2 
Extent back ref already exists for 2703873740800 parent 0 root 2 
Extent back ref already exists for 2703874465792 parent 0 root 2 
Extent back ref already exists for 2703876370432 parent 0 root 2 
Extent back ref already exists for 2703877046272 parent 0 root 2 
Extent back ref already exists for 2703877050368 parent 0 root 2 
Extent back ref already exists for 2703878647808 parent 0 root 2 
Extent back ref already exists for 2703876407296 parent 0 root 2 
Extent back ref