Thanks Sunil. I treat the ocfs2/LVM volumes as static partitions, so that shouldn't cause problems unless I'm attempting to resize them or something like that, right?
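(As a sanity check before doing anything else, I compared the logical volume's size against the filesystem's own idea of its size, to confirm nothing had been resized underneath it. A rough check with standard tools, using my device path:

  # size of the logical volume in bytes
  blockdev --getsize64 /dev/vg.chronovore/lv.medea.share._multimedia_store

  # compare against "number of blocks" x "bytes per block" from the
  # fsck.ocfs2 superblock summary quoted further down:
  # 65536000 * 4096 = 268435456000 bytes, i.e. 250 GiB

If the two disagree, a resize happened somewhere along the way.)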
---recovery---

In the past I tried to recover using "debugfs.ocfs2 rdump", but it would always fail with this error:

  debugfs.ocfs2: extend_file.c:211: ocfs2_new_path: Assertion `root_el->l_tree_depth < 5' failed.

After your suggestion I'm trying it again, breaking the rdump operation into parts so that it never needs to traverse more than 5 subdirectories deep, and it seems to be working very nicely. I've recovered about 60% so far and it looks like smooth sailing from here. I was exporting this volume over iSCSI; could the filesystem problem have come about as a result of fencing resets?

About 90% of the way through the recovery, it turned out that each of the more volume-wide rdump attempts was choking at the same specific point: a symlink pointing to a directory on a different filesystem that then descends back into the ocfs2 volume. It looks like this:

debugfs: stat lossless.from_folks
        Inode: 258749   Mode: 0777   Generation: 667446219 (0x27c86bcb)
        FS Generation: 781535612 (0x2e95497c)
        Type: Symbolic Link   Attr: 0x0   Flags: Valid
        User: 1000 (khaije)   Group: 1000 (khaije)   Size: 68
        Links: 1   Clusters: 0
        ctime: 0x49f5b25d -- Mon Apr 27 09:25:49 2009
        atime: 0x49f5b25d -- Mon Apr 27 09:25:49 2009
        mtime: 0x4994683e -- Thu Feb 12 13:19:42 2009
        dtime: 0x0 -- Wed Dec 31 19:00:00 1969
        ctime_nsec: 0x0a0f5a36 -- 168778294
        atime_nsec: 0x00000000 -- 0
        mtime_nsec: 0x00000000 -- 0
        Last Extblk: 0
        Sub Alloc Slot: 0   Sub Alloc Bit: 699
        Fast Symlink Destination: /home/khaije/documents/shared_multimedia/audio/music/lossy.from_folks

I'm guessing this either added enough directories to the path to exceed rdump's depth threshold, or that rdump had problems handling the combination of local and non-local filesystems. I would simply delete the symlink, but debugfs.ocfs2 doesn't seem to allow that. (I'm not sure why I didn't just use a relative-path symlink, since the target is in the same directory.)

Anyway, by avoiding that symlink I was able to make a full recovery. A sketch of the chunked rdump commands is in the P.S. below.

Cheers and thanks again,
Nick

On Fri, May 29, 2009 at 2:18 PM, Sunil Mushran <sunil.mush...@oracle.com> wrote:

> You are using ocfs2 atop lvm - a non-cluster-aware volume manager.
> A lot of things can go wrong in this combination. Quite a few have
> been reported on this forum.
>
> debugfs.ocfs2 has commands dump and rdump that allow users to
> read the files directly off the disk. Use them to recover your data.
>
> khaije rock wrote:
>
>> I can simplify this question:
>>
>> What can I do to try to recover data from a problematic ocfs2 filesystem?
>>
>> For example, would I get any traction if I build the tools from upstream
>> sources?
>>
>> Thanks all!
>>
>> ---------- Forwarded message ----------
>> From: *khaije rock* <khai...@gmail.com <mailto:khai...@gmail.com>>
>> Date: Mon, May 25, 2009 at 8:06 AM
>> Subject: fsck fails & volume mount fails, is my data lost?
>> To: ocfs2-users@oss.oracle.com <mailto:ocfs2-users@oss.oracle.com>
>>
>> Hi,
>>
>> I hope it's appropriate for me to post my issue to this list. Thanks in
>> advance for any help!
>>
>> I don't know exactly what the underlying cause is, but here is what it
>> looks like:
>> - mount the filesystem
>> - cd into the directory with no errors, however
>> - the shell seizes when I attempt to 'ls' or interact with any data in
>>   any way.
>>
>> I've found that when running fsck.ocfs2 against the block device (it's a
>> logical volume using LVM) it completes successfully and reports the
>> following:
>>
>> kha...@chronovore:~$ sudo fsck /dev/vg.chronovore/lv.medea.share._multimedia_store
>> fsck 1.41.3 (12-Oct-2008)
>> Checking OCFS2 filesystem in /dev/vg.chronovore/lv.medea.share._multimedia_store:
>>   label:              lv.medea.share._multimedia_store
>>   uuid:               28 f3 65 1c 1d 04 4e 28 af f0 37 7f 30 13 fc 38
>>   number of blocks:   65536000
>>   bytes per block:    4096
>>   number of clusters: 65536000
>>   bytes per cluster:  4096
>>   max slots:          4
>>
>> o2fsck_should_replay_journals:564 | slot 0 JOURNAL_DIRTY_FL: 1
>> o2fsck_should_replay_journals:564 | slot 1 JOURNAL_DIRTY_FL: 0
>> o2fsck_should_replay_journals:564 | slot 2 JOURNAL_DIRTY_FL: 0
>> o2fsck_should_replay_journals:564 | slot 3 JOURNAL_DIRTY_FL: 0
>>
>> /dev/vg.chronovore/lv.medea.share._multimedia_store is clean. It will be
>> checked after 20 additional mounts.
>>
>> The command produces this output and returns control to the shell. As you
>> can see, it reports the JOURNAL_DIRTY_FL flag set for slot 0 (the first
>> slot, which is the host machine). Notice that immediately after stating
>> that the journal is dirty, it says the filesystem is clean.
>>
>> In order to try to make the filesystem usable, I ran fsck.ocfs2 with the
>> -fvv flags. This process never fully completes: after several minutes of
>> happily chugging along, it seizes. One of the last blocks of output has
>> this to say:
>>
>> o2fsck_verify_inode_fields:435 | checking inode 14119181's fields
>> check_el:249 | depth 0 count 243 next_free 1
>> check_er:164 | cpos 0 clusters 1 blkno 14677109
>> verify_block:705 | adding dir block 14677109
>> update_inode_alloc:157 | updated inode 14119181 alloc to 1 from 1 in slot 0
>> o2fsck_verify_inode_fields:435 | checking inode 14119182's fields
>> check_el:249 | depth 0 count 243 next_free 1
>> check_er:164 | cpos 0 clusters 1 blkno 14677110
>> o2fsck_mark_cluster_allocated: Internal logic faliure !! duplicate cluster 14677110
>> verify_block:705 | adding dir block 14677110
>>
>> This 'Internal logic failure' seems significant, so I googled and found
>> the following passage
>> (http://oss.oracle.com/osswiki/OCFS2/DesignDocs/RemoveSlotsTunefs), which
>> seems to have some bearing on my case:
>>
>> -=-=-=-=-=-
>> Duplicate groups or missing groups
>>
>> When we relink the groups in extent_alloc and inode_alloc, it involves
>> two steps: deleting from the old inode and relinking to the new inode.
>> The question is which should be carried out first, since we may panic
>> between the two steps.
>>
>> Deleting from the old inode first: if deletion is carried out first and
>> tunefs panics, fsck.ocfs2 doesn't know the inodes and extent blocks are
>> allocated (it decides that by reading inode_alloc and extent_alloc), so
>> all the space will be freed. This is too bad.
>>
>> Relinking to the new inode first: if the relink is carried out first and
>> tunefs panics, the two alloc inodes now contain some duplicated chains,
>> so the "GROUP_PARENT" error is prompted every time, along with many
>> internal errors of the form "o2fsck_mark_cluster_allocated: Internal
>> logic failure !! duplicate cluster". Although this is also boring, we at
>> least have the chain information in hand, so I'd like to revise
>> fsck.ocfs2 to fit this scenario.
>> There is also one thing that has to be mentioned: fsck.ocfs2 will loop
>> forever in o2fsck_add_dir_block, since it doesn't handle the condition
>> dbe->e_blkno == tmp_dbe->e_blkno, so we have to handle this as well.
>> =-=-=-=-=-
>>
>> Later on this page the author suggests that fsck.ocfs2 would need to be
>> modified to handle this case (which I gather hasn't happened yet).
>> However, there must be some other way to remedy this situation and
>> recover the nearly 250 gigs of data I have on this share?
>>
>> Can anyone help?
>>
>> I've tried copying to a new partition by using debugfs.ocfs2, but I'm
>> not sure if I'm doing it right or if there is a more sensible approach
>> to try.
>>
>> Thanks all,
>> Nick
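P.S. For the archives, in case anyone else hits the l_tree_depth assertion: the chunked recovery amounted to something like the commands below. The directory names are placeholders for my actual top-level folders, and /mnt/recovery is wherever the recovered files should land:

  # list the top-level directories so the dump can be split into chunks
  debugfs.ocfs2 -R 'ls /' /dev/vg.chronovore/lv.medea.share._multimedia_store

  # rdump each top-level directory separately, so no single traversal
  # descends deep enough to trip the assertion
  debugfs.ocfs2 -R 'rdump /audio /mnt/recovery' /dev/vg.chronovore/lv.medea.share._multimedia_store
  debugfs.ocfs2 -R 'rdump /video /mnt/recovery' /dev/vg.chronovore/lv.medea.share._multimedia_store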
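P.P.S. The problem symlink was easy to confirm once I knew to look for it: 'stat' inside debugfs.ocfs2 prints the inode type and, for fast symlinks, the destination, so a suspect entry can be checked (and then simply skipped) before rdump chokes on it. Roughly, with a placeholder path:

  # inspect a suspect directory entry without mounting the volume
  debugfs.ocfs2 -R 'stat /audio/music/lossless.from_folks' /dev/vg.chronovore/lv.medea.share._multimedia_store

If it reports "Type: Symbolic Link" with a destination that leaves (or loops back into) the volume, rdump its sibling entries individually and leave the link behind.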