Re: DragonFly 2.6.2, 2.7.2 tags pushed - fixes for serious HAMMER issue
> The corruption can only occur if your HAMMER filesystem became full
> or nearly full sometime in the last 45 days or so with a kernel built
> sometime in the last 45 days. To check for the corruption you need
> an unmounted or completely idle filesystem and then run (using the
> latest hammer utility):
> ..
> hammer -f device show | egrep '^B' | egrep -v '^BM'

I was hit, but think my FS has been under 90% full at all times.
(checked in single user, w/ r/o mount)

Any way to find out which files (history) are affected?

> If using mirror-read to copyoff remember it must be run on every PFS
> individually, and bulk mode (-B) is recommended, and make sure any
> backups are viable before smashing the original filesystem.

Why is -B recommended?
In hammer.8 -B is 'not recommended'; should this just be removed?

Any way to restore root PFS (#0) fully?
Root PFS can not be downgraded to slave, for mirror-write,
so I see no way to get history restored.

Beware of cpdup'ing root PFS; symlinks for already restored PFSs
will be overwritten.

Also remember to copy PFS config (if you use non default).
(I had to restore PFSs twice, as I did 'hammer cleanup' too early)

 -thomas
Re: DragonFly 2.6.2, 2.7.2 tags pushed - fixes for serious HAMMER issue
:
:The corruption can only occur if your HAMMER filesystem became full
:or nearly full sometime in the last 45 days or so with a kernel built
:sometime in the last 45 days. To check for the corruption you need
:an unmounted or completely idle filesystem and then run (using the
:latest hammer utility):
:..
:hammer -f device show | egrep '^B' | egrep -v '^BM'
:I was hit, but think my FS has been under 90% full at all times.
:(checked in single user, w/ r/o mount)
:
:Any way to find out which files (history) are affected?

If the filesystem is not idle you can try sync'ing a few times and
running the hammer -f device show | egrep... stuff several times to
see if the output changes.

Only manually, by locating the errors in the show output and
backtracking the object id (inode number) to the directory entry.
In that case you would have to dump the entire show output to a file,
which could end up being gigabytes depending on the size of the
filesystem.

    hammer -f device show > somefile
    less somefile
    /^B

(but ^BM has to be ignored since those represent mirror_tid errors,
which are probably all over the place prior to the fix which went
into 2.6).

:If using mirror-read to copyoff remember it must be run on every PFS
:individually, and bulk mode (-B) is recommended, and make sure any
:backups are viable before smashing the original filesystem.
:
:Why is -B recommended?
:In hammer.8 -B is 'not recommended'; should this just be removed?

-B works around a bug in the incremental mirroring transaction ids
stored in the B-Tree which was fixed for the 2.6 release but existed
prior to that. The bug is self-correcting in that modifications made
after the bug was fixed will properly deal with the mirror_tid in the
B-Tree.

:Any way to restore root PFS (#0) fully?
:Root PFS can not be downgraded to slave, for mirror-write,
:so I see no way to get history restored.

No. What we really need to do here is get rid of the notion of a root
PFS entirely and just make all the PFSs operate the same way.

Someone was talking about making it possible to mount the root HAMMER
filesystem with a PFS # other than 0, as well. Also very easy to do
I think, it could be a small mini-project for someone.

In any case, ultimately for people who hit this corruption problem the
best solution, unfortunately, may be to copy off the data and newfs
the thing from scratch.

:Beware of cpdup'ing root PFS; symlinks for already restored PFSs
:will be overwritten.
:
:Also remember to copy PFS config (if you use non default).
:(I had to restore PFSs twice, as I did 'hammer cleanup' too early)
:
: -thomas

-Matt
Matthew Dillon
dil...@backplane.com
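
The manual search described in the message above can be scripted. What follows is only a minimal sketch, assuming the show output has already been saved to a file as the message suggests. The function name list_error_records is made up for illustration, and grep -E stands in for egrep. It prints the line number of every record that starts with B but not BM, so each hit can be looked up in the dump.

```shell
#!/bin/sh
# Sketch only: list_error_records is a hypothetical helper, not part of
# the hammer utility.

# Print "lineno:record" for every line of a saved 'hammer show' dump that
# starts with 'B', skipping 'BM' records (the mirror_tid errors that the
# message says to ignore).
list_error_records() {
    dumpfile="$1"
    # grep -n prefixes each match with "N:", so the second grep has to
    # look for BM after that prefix rather than at the start of the line.
    grep -n '^B' "$dumpfile" | grep -Ev '^[0-9]+:BM'
}

# Example use (the dump can be gigabytes, as noted in the message):
#   hammer -f device show > somefile
#   list_error_records somefile
```

In less, typing a line number followed by g jumps to that line of the dump, which is the starting point for backtracking the object id to a directory entry.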
Re: DragonFly 2.6.2, 2.7.2 tags pushed - fixes for serious HAMMER issue
Hi,

Yah, indeed it does :'(

 sudo hammer -f /dev/serno/QM2.s1a checkmap
Volume header records=0 next_tid=00010841bec0
bufoffset=4404
Collecting allocation info from B-Tree: done
BM block=20001000 calc 114688 free, got 1163264

Now what? Is really the mirror-copy the only solution? I basically
have no means to do that ...

Cheers,
Antonio Huete

2010/4/20 Matthew Dillon dil...@apollo.backplane.com:
> :Does this show all errors for sure? I had corruptions on one
> :filesystem but no such errors were printed.
>
> It should find any corruption. It won't find partial recoveries
> (basically directory entries with no corresponding inode). Those
> can be rm'd.
>
> -Matt
> Matthew Dillon
> dil...@backplane.com

--
- Antonio Huete
Re: DragonFly 2.6.2, 2.7.2 tags pushed - fixes for serious HAMMER issue
On Mon, 19 Apr 2010 21:43:49 -0700 (PDT)
Matthew Dillon dil...@apollo.backplane.com wrote:

> :Does this show all errors for sure? I had corruptions on one
> :filesystem but no such errors were printed.
>
> It should find any corruption. It won't find partial recoveries
> (basically directory entries with no corresponding inode). Those
> can be rm'd.
>
> -Matt
> Matthew Dillon
> dil...@backplane.com

Never mind, this was a PEBKAC on my side. Sorry for the noise.
Re: DragonFly 2.6.2, 2.7.2 tags pushed - fixes for serious HAMMER issue
:Hi,
:
:Yah, indeed it does :'(
:
: sudo hammer -f /dev/serno/QM2.s1a checkmap
:Volume header records=0 next_tid=00010841bec0
:bufoffset=4404
:Collecting allocation info from B-Tree: done
:BM block=20001000 calc 114688 free, got 1163264
:
:Now what? Is really the mirror-copy the only solution? I basically
:have no means to do that ...
:
:Cheers,
:Antonio Huete

Pretty much, short of modifying the hammer utility to correct the
allocation info.

If it's a qemu image then just create a second qemu disk so both are
available inside the VM. Then you can mirror-copy the stuff over.

-Matt
Re: DragonFly 2.6.2, 2.7.2 tags pushed - fixes for serious HAMMER issue
On Tue, Apr 20, 2010 at 11:03:55AM -0700, Matthew Dillon wrote:
> :Yah, indeed it does :'(
> :
> : sudo hammer -f /dev/serno/QM2.s1a checkmap
> :Volume header records=0 next_tid=00010841bec0
> :bufoffset=4404
> :Collecting allocation info from B-Tree: done
> :BM block=20001000 calc 114688 free, got 1163264
> :
> :Now what? Is really the mirror-copy the only solution? I basically
> :have no means to do that ...
>
> Pretty much, short of modifying the hammer utility to correct the
> allocation info.

From the point of view of a guy having the means to do the copy stuff
(two harddrives), it is still a bit of a PITA. 300+ GB take hours to
copy :-(

Having a fsck.hammer would be a big plus IMHO.

--
Francois Tigeot
DragonFly 2.6.2, 2.7.2 tags pushed - fixes for serious HAMMER issue
A serious HAMMER corruption issue came up soon after the release.
This issue can occur when a HAMMER filesystem becomes full or nearly
full and reblocking occurs while the filesystem is also loaded down
with other write activity. The reblocking activity itself can cause
an almost-full filesystem to temporarily become full.

People running 2.6 or HEAD should update to the latest on the branch
ASAP. ISOs and IMGs will be available on the site later this evening.

--

The corruption can only occur if your HAMMER filesystem became full
or nearly full sometime in the last 45 days or so with a kernel built
sometime in the last 45 days. To check for the corruption you need
an unmounted or completely idle filesystem and then run (using the
latest hammer utility):

    hammer -f device checkmap

and

    hammer -f device show | egrep '^B' | egrep -v '^BM'

checkmap runs in a fairly short period of time. Show reads basically
every block on the filesystem and verifies all the CRCs and can take
quite a while to run.

Any records output, other than the volume summary checkmap always
prints, is an indication of a problem. In this case there isn't much
that can be done except to ensure the system is updated and
copyoff/reformat/copyback the filesystem.

If using mirror-read to copyoff remember it must be run on every PFS
individually, and bulk mode (-B) is recommended, and make sure any
backups are viable before smashing the original filesystem.

--

If you do NOT have any corruption then simply updating your kernel is
sufficient.

-Matt
Matthew Dillon
dil...@backplane.com
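
The two checks from the announcement can be wrapped in a small script. This is a sketch under stated assumptions, not an official tool: the function names filter_show_errors and check_hammer_device are hypothetical, grep -E stands in for egrep, and the hammer invocations are exactly the two quoted in the announcement. The filesystem still has to be unmounted or completely idle.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical helpers.

# Keep records that start with 'B' and drop those that start with 'BM',
# the same filter as the egrep pipeline in the announcement.
filter_show_errors() {
    grep -E '^B' | grep -Ev '^BM'
}

# Run both checks against a HAMMER device given as the first argument.
check_hammer_device() {
    device="$1"
    echo "== checkmap (anything beyond the volume summary is a problem) =="
    hammer -f "$device" checkmap
    echo "== show (reads every block, can take quite a while) =="
    # The pipeline's exit status is that of the last grep: 0 if at least
    # one error record was printed, non-zero if none were.
    if hammer -f "$device" show | filter_show_errors; then
        echo "show: error records found (listed above)"
        return 1
    fi
    echo "show: no error records found"
}

# Example use:
#   check_hammer_device device
```

The checkmap output is printed as-is rather than parsed, since the only rule the announcement gives is that anything beyond the usual volume summary indicates a problem.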
Re: DragonFly 2.6.2, 2.7.2 tags pushed - fixes for serious HAMMER issue
On Mon, 19 Apr 2010 08:55:00 -0700 (PDT)
Matthew Dillon dil...@apollo.backplane.com wrote:

SNIP
> The corruption can only occur if your HAMMER filesystem became full
> or nearly full sometime in the last 45 days or so with a kernel built
> sometime in the last 45 days. To check for the corruption you need
> an unmounted or completely idle filesystem and then run (using the
> latest hammer utility):
>
> hammer -f device checkmap
>
> and
>
> hammer -f device show | egrep '^B' | egrep -v '^BM'
SNIP

Does this show all errors for sure? I had corruptions on one
filesystem but no such errors were printed.
Re: DragonFly 2.6.2, 2.7.2 tags pushed - fixes for serious HAMMER issue
:Does this show all errors for sure? I had corruptions on one
:filesystem but no such errors were printed.

It should find any corruption. It won't find partial recoveries
(basically directory entries with no corresponding inode). Those
can be rm'd.

-Matt
Matthew Dillon
dil...@backplane.com