Re: DragonFly 2.6.2, 2.7.2 tags pushed - fixes for serious HAMMER issue

2010-05-17 Thread Thomas Nikolajsen
The corruption can only occur if your HAMMER filesystem became full
or nearly full sometime in the last 45 days or so with a kernel built
sometime in the last 45 days.  To check for the corruption you need
an unmounted or completely idle filesystem and then run (using the
latest hammer utility):
..
hammer -f device show | egrep '^B' | egrep -v '^BM'

I was hit, but think my FS has been under 90% full at all times.
(checked in single user, w/ r/o mount)

Any way to find out which files (history) are affected?

If using mirror-read to copyoff remember it must be run on every PFS
individually, and bulk mode (-B) is recommended, and make sure any
backups are viable before smashing the original filesystem.

Why is -B recommended?
In hammer.8 -B is 'not recommended'; should this just be removed?

Any way to restore root PFS (#0) fully?
Root PFS can not be downgraded to slave, for mirror-write,
so I see no way to get history restored.

Beware of cpdup'ing root PFS; symlinks for already restored PFSs
will be overwritten.

Also remember to copy PFS config (if you use non default).
(I had to restore PFSs twice, as I did 'hammer cleanup' too early)

 -thomas



Re: DragonFly 2.6.2, 2.7.2 tags pushed - fixes for serious HAMMER issue

2010-05-17 Thread Matthew Dillon

:
:The corruption can only occur if your HAMMER filesystem became full
:or nearly full sometime in the last 45 days or so with a kernel built
:sometime in the last 45 days.  To check for the corruption you need
:an unmounted or completely idle filesystem and then run (using the
:latest hammer utility):
:..
:hammer -f device show | egrep '^B' | egrep -v '^BM'

:I was hit, but think my FS has been under 90% full at all times.
:(checked in single user, w/ r/o mount)
:
:Any way to find out which files (history) are affected?

If the filesystem is not idle you can try sync'ing a few times
and running the hammer -f device show | egrep... stuff several
times to see if the output changes.
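
A minimal sketch of that re-run check in POSIX shell. The function name output_is_stable is made up for illustration and is not part of the hammer utility; it runs whatever command it is given twice, with a sync in between, and succeeds only if both runs print the same text.

```shell
# Hypothetical helper (not part of hammer): run a command twice with a
# sync in between and report whether its output stayed the same.
output_is_stable() {
    first=$("$@")      # first run of the caller-supplied command
    sync               # flush pending writes before looking again
    second=$("$@")     # second run
    [ "$first" = "$second" ]
}
```

It could be used as: output_is_stable sh -c "hammer -f device show | egrep '^B' | egrep -v '^BM'" && echo "output did not change" (with "device" replaced by the real device, as in the commands above).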

Only manually by locating the errors in the show output and backtracking
the object id (inode number) to the directory entry.  In that case
you would have to dump the entire show output to a file, which could
end up being gigabytes depending on the size of the filesystem.

hammer -f device show > somefile
less somefile
/^B

(but ^BM has to be ignored since those represent mirror_tid errors
which are probably all over the place prior to the fix which went
into 2.6).
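
As a rough alternative to paging through the dump by hand, the same search can be sketched as a small shell function. The name list_show_errors is hypothetical, and the sketch assumes only what is stated above: problem lines start with B, and lines starting with BM are to be ignored.

```shell
# Hypothetical helper: given a file holding saved `hammer -f device show`
# output, print "line-number:text" for every line that starts with B
# but not with BM.
list_show_errors() {
    grep -n '^B' "$1" | grep -v '^[0-9]*:BM'
}
```

Running list_show_errors somefile gives the line numbers to jump to in less; backtracking from there to the object id and the directory entry still has to be done manually.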

:If using mirror-read to copyoff remember it must be run on every PFS
:individually, and bulk mode (-B) is recommended, and make sure any
:backups are viable before smashing the original filesystem.
:
:Why is -B recommended?
:In hammer.8 -B is 'not recommended'; should this just be removed?

-B works around a bug in the incremental mirroring transaction ids
stored in the B-Tree which was fixed for the 2.6 release but existed
prior to that.  The bug is self-correcting in that modifications made
after the bug was fixed will properly deal with the mirror_tid in
the B-Tree.

:Any way to restore root PFS (#0) fully?
:Root PFS can not be downgraded to slave, for mirror-write,
:so I see no way to get history restored.

No.  What we really need to do here is get rid of the notion of
a root PFS entirely and just make all the PFSs operate the same
way.

Someone was also talking about making it possible to mount the root
HAMMER filesystem with a PFS # other than 0.  I think that would be
very easy to do as well; it could be a mini-project for someone.

In any case, ultimately for people who hit this corruption problem
the best solution, unfortunately, may be to copy off the data and
newfs the thing from scratch.

:Beware of cpdup'ing root PFS; symlinks for already restored PFSs
:will be overwritten.
:
:Also remember to copy PFS config (if you use non default).
:(I had to restore PFSs twice, as I did 'hammer cleanup' too early)
:
: -thomas

-Matt
Matthew Dillon 
dil...@backplane.com


Re: DragonFly 2.6.2, 2.7.2 tags pushed - fixes for serious HAMMER issue

2010-04-20 Thread Antonio Huete Jimenez
Hi,

Yah, indeed it does :'(

 sudo hammer -f /dev/serno/QM2.s1a checkmap
Volume header   records=0 next_tid=00010841bec0
bufoffset=4404
Collecting allocation info from B-Tree: done
BM  block=20001000 calc 114688 free, got 1163264

Now what? Is mirror-copy really the only solution? I basically
have no means to do that ...

Cheers,
Antonio Huete

2010/4/20 Matthew Dillon dil...@apollo.backplane.com:

 :Does this show all errors for sure? I had corruptions on one
 :filesystem but no such errors were printed.

    It should find any corruption.  It won't find partial recoveries
    (basically directory entries with no corresponding inode).  Those
    can be rm'd.

                                        -Matt
                                        Matthew Dillon
                                        dil...@backplane.com




-- 
- Antonio Huete



Re: DragonFly 2.6.2, 2.7.2 tags pushed - fixes for serious HAMMER issue

2010-04-20 Thread Gergo Szakal
On Mon, 19 Apr 2010 21:43:49 -0700 (PDT)
Matthew Dillon dil...@apollo.backplane.com wrote:

 
 :Does this show all errors for sure? I had corruptions on one
 :filesystem but no such errors were printed.
 
 It should find any corruption.  It won't find partial recoveries
 (basically directory entries with no corresponding inode).  Those
 can be rm'd.
 
   -Matt
   Matthew Dillon 
   dil...@backplane.com


Never mind, this was a PEBKAC on my side. Sorry for the noise.


Re: DragonFly 2.6.2, 2.7.2 tags pushed - fixes for serious HAMMER issue

2010-04-20 Thread Matthew Dillon

:Hi,
:
:Yah, indeed it does :'(
:
: sudo hammer -f /dev/serno/QM2.s1a checkmap
:Volume header   records=0 next_tid=00010841bec0
:bufoffset=4404
:Collecting allocation info from B-Tree: done
:BM  block=20001000 calc 114688 free, got 1163264
:
:Now what? Is mirror-copy really the only solution? I basically
:have no means to do that ...
:
:Cheers,
:Antonio Huete

Pretty much, short of modifying the hammer utility to correct
the allocation info.

If it's a qemu image then just create a second qemu disk so both
are available inside the VM.  Then you can mirror-copy the stuff
over.

-Matt



Re: DragonFly 2.6.2, 2.7.2 tags pushed - fixes for serious HAMMER issue

2010-04-20 Thread Francois Tigeot
On Tue, Apr 20, 2010 at 11:03:55AM -0700, Matthew Dillon wrote:
 
 :Yah, indeed it does :'(
 :
 : sudo hammer -f /dev/serno/QM2.s1a checkmap
 :Volume header   records=0 next_tid=00010841bec0
 :bufoffset=4404
 :Collecting allocation info from B-Tree: done
 :BM  block=20001000 calc 114688 free, got 1163264
 :
 :Now what? Is mirror-copy really the only solution? I basically
 :have no means to do that ...
 
 Pretty much, short of modifying the hammer utility to correct
 the allocation info.

From the point of view of a guy having the means to do the copy stuff
(two harddrives), it is still a bit of a PITA.

300+ GB take hours to copy :-(

Having a fsck.hammer would be a big plus IMHO.

-- 
Francois Tigeot


DragonFly 2.6.2, 2.7.2 tags pushed - fixes for serious HAMMER issue

2010-04-19 Thread Matthew Dillon
A serious HAMMER corruption issue came up soon after the release.
This issue can occur when a HAMMER filesystem becomes full or nearly
full and reblocking occurs while the filesystem is also loaded down
with other write activity.  The reblocking activity itself can cause
an almost-full filesystem to temporarily become full.

People running 2.6 or HEAD should update to the latest on the branch
ASAP.

ISOs and IMGs will be available on the site later this evening.

--

The corruption can only occur if your HAMMER filesystem became full
or nearly full sometime in the last 45 days or so with a kernel built
sometime in the last 45 days.  To check for the corruption you need
an unmounted or completely idle filesystem and then run (using the
latest hammer utility):

hammer -f device checkmap

and

hammer -f device show | egrep '^B' | egrep -v '^BM'

checkmap runs in a fairly short period of time.  Show reads basically
every block on the filesystem and verifies all the CRCs and can take
quite a while to run.

Any records output, other than the volume summary checkmap always
prints, is an indication of a problem.  In this case there isn't much
that can be done except to ensure the system is updated and
copyoff/reformat/copyback the filesystem.
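
For scripting the show check, a minimal sketch follows. The function name show_has_errors is hypothetical; it applies the same two filters as the egrep pipeline above (plain grep is enough for these patterns) and turns "any line printed" into an exit status. It is meant for show output only, not for checkmap output.

```shell
# Hypothetical helper: read `hammer -f device show` output on stdin and
# exit 0 if any line starts with B other than BM, non-zero otherwise.
show_has_errors() {
    grep '^B' | grep -v '^BM' > /dev/null
}
```

Usage would look like: hammer -f device show | show_has_errors && echo "problem lines found" (again with "device" standing in for the real device).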

If using mirror-read to copyoff remember it must be run on every PFS
individually, and bulk mode (-B) is recommended, and make sure any
backups are viable before smashing the original filesystem.

--

If you do NOT have any corruption then simply updating your kernel
is sufficient.

-Matt
Matthew Dillon 
dil...@backplane.com


Re: DragonFly 2.6.2, 2.7.2 tags pushed - fixes for serious HAMMER issue

2010-04-19 Thread Gergo Szakal
On Mon, 19 Apr 2010 08:55:00 -0700 (PDT)
Matthew Dillon dil...@apollo.backplane.com wrote:

SNIP
 
 The corruption can only occur if your HAMMER filesystem became full
 or nearly full sometime in the last 45 days or so with a kernel built
 sometime in the last 45 days.  To check for the corruption you need
 an unmounted or completely idle filesystem and then run (using the
 latest hammer utility):
 
 hammer -f device checkmap
 
 and
 
 hammer -f device show | egrep '^B' | egrep -v '^BM'
 
SNIP

Does this show all errors for sure? I had corruptions on one
filesystem but no such errors were printed.


Re: DragonFly 2.6.2, 2.7.2 tags pushed - fixes for serious HAMMER issue

2010-04-19 Thread Matthew Dillon

:Does this show all errors for sure? I had corruptions on one
:filesystem but no such errors were printed.

It should find any corruption.  It won't find partial recoveries
(basically directory entries with no corresponding inode).  Those
can be rm'd.

-Matt
Matthew Dillon 
dil...@backplane.com