Re: dump/restore corrupted filesystems

2007-04-18 Thread CyberLeo Kitsana
Roland Smith wrote:
 Sorry if I wasn't clear. Most all of the data is readable and complete
 if I mount the filesystem read-only. It just panics the box when mounted
 read/write, and fsck can't fix the damage.
 
 That might be worth filing a PR for, especially the panics. 
 
 Exactly what is damaged?  Garbage in files? Wrong inode counts? I've had
 unclean filesystems because of panics, but nothing fsck_ffs couldn't
 fix.
 
 You might want to check the hardware too. Use smartmontools in case of
 (S)ATA drives.

SMART says that the drives are fine, as do the manufacturer's disk
fitness tools. All the files that are readable contain correct data, but
the files that are corrupt are totally not readable, and cannot even be
removed manually:

--8--
rsync: readlink
/raid/Backup/Pizzabox/2007-02-23/cyberleo/secondlife/linux/SecondLife_i686_1_13_2_15/skins/xui/es
failed: Bad file descriptor (9)
rsync: readlink
/raid/Backup/Pizzabox/2007-02-23/cyberleo/secondlife/linux/SecondLife_i686_1_13_2_15/skins/xui/fr
failed: Bad file descriptor (9)
--8--

fsck_ufs dies after about 30 minutes of grinding with the following:

--8--
** Phase 2 - Check Pathnames
DIRECTORY CORRUPTED  I=93409222  OWNER=1002 MODE=40755
SIZE=512 MTIME=Feb 10 00:49 2007
DIR=?

UNEXPECTED SOFT UPDATE INCONSISTENCY

SALVAGE? no

MISSING '.'  I=93409222  OWNER=1002 MODE=40755
SIZE=512 MTIME=Feb 10 00:49 2007
DIR=?

UNEXPECTED SOFT UPDATE INCONSISTENCY
CANNOT FIX, FIRST ENTRY IN DIRECTORY CONTAINS

UNEXPECTED SOFT UPDATE INCONSISTENCY
fsck_ufs: inoinfo: inumber -1170056596 out of range
--8--

(full output is at
http://home.cyberleo.net/cyberleo/workspace/Zip/Bugs/fbsd-20070320-corr/saba-fsck-raid.txt
)

It's possible this might be a result of the odd interaction between
geom_raid5 and UFS, as discovered in January (
http://www.nabble.com/geom_raid5-livelock--p8304142.html ), but I can't
be sure.


I've already chalked this up to just an unfortunate occurrence, as the
circumstances that caused the corruption in the first place are likely
either long gone or so obscure as to be nearly impossible for me to root
out.

 Looking at /usr/src/sbin/dump/traverse.c, dump traverses the used
 inodes list and all directories. So if any of these is corrupt, your
 dump will be too. And if the contents of the inodes is corrupted, so
 will the dump.

Thanks for this insight. I'll avoid dump/restore and just use manual
copying for now.

--
Fuzzy love,
-CyberLeo
Technical Administrator
CyberLeo.Net Webhosting
http://www.CyberLeo.Net
[EMAIL PROTECTED]

Furry Peace! - http://www.fur.com/peace/
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: dump/restore corrupted filesystems

2007-04-18 Thread Jerry McAllister
On Wed, Apr 18, 2007 at 04:09:22PM -0500, CyberLeo Kitsana wrote:

 Roland Smith wrote:
  Sorry if I wasn't clear. Most all of the data is readable and complete
  if I mount the filesystem read-only. It just panics the box when mounted
  read/write, and fsck can't fix the damage.
  
  That might be worth filing a PR for, especially the panics. 
  
  Exactly what is damaged?  Garbage in files? Wrong inode counts? I've had
  unclean filesystems because of panics, but nothing fsck_ffs couldn't
  fix.
  
  You might want to check the hardware too. Use smartmontools in case of
  (S)ATA drives.
 
 Smart says that the drives are fine, as does the manufacturer's disk
 fitness tools. All the files that are readable contain correct data, but
 the files that are corrupt are totally not readable, and cannot even be
 removed manually:

Given that, I would try to make a dump(8) of it.   If dump dies on
a particular file, try to exclude that file from the dump either by
rm-ing it or setting a nodump flag and try again.   You may not 
actually be able to do the rm or nodump flag though if you cannot
mount it with write permission.   You might be able to force-mount it
in single-user mode without running fsck first.
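
A minimal sketch of the nodump route described above, assuming the
filesystem can be mounted read/write long enough to set the flag. The
paths are hypothetical placeholders, not taken from this thread. dump(8)
honors the nodump flag only at or above the level given with -h (default
1), so a level-0 dump needs -h 0. The function only prints the commands
for review; it runs nothing.

```shell
#!/bin/sh
# Print (do not run) the commands for a level-0 dump that skips a flagged file.
# All arguments are hypothetical placeholders.
nodump_plan() {
    bad_path=$1     # file that dump chokes on
    src_fs=$2       # damaged filesystem (mount point or device)
    new_mount=$3    # freshly newfs-ed filesystem, already mounted

    # chflags(1): set the user "nodump" flag on the offending file.
    echo "chflags nodump $bad_path"
    # dump(8): -h 0 makes a level-0 dump honor nodump; -a skips tape sizing.
    echo "dump -0 -a -h 0 -f - $src_fs | (cd $new_mount && restore -rf -)"
}

nodump_plan /raid/some/bad/file /raid /mnt/new
```

Printing the plan first seems prudent here, since any write to the
damaged filesystem may trigger the panic reported earlier in the thread.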

Note that tar allows you to specify exclusions.   I usually don't
suggest using tar for mass moves because it has weaknesses with
hard links and might also not transfer flags and permissions
correctly.  But, if tar is what it takes, then use it.
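
A minimal sketch of the tar route with one exclusion, assuming a tar that
supports --exclude (both bsdtar and GNU tar do). The function name and
paths are hypothetical. As noted above, file flags and some hard-link
cases may not survive a tar copy.

```shell
#!/bin/sh
# Copy a tree through a tar pipe, leaving out one unreadable path.
# Member names start with "./" because the archive is created from "."
# inside the source directory, so the exclude pattern gets the same prefix.
copy_tree_excluding() {
    src=$1; dst=$2; skip=$3    # skip is relative to src, e.g. some/dir/bad
    mkdir -p "$dst" &&
    tar -cf - -C "$src" --exclude "./$skip" . | tar -xpf - -C "$dst"
}

# Usage (hypothetical paths):
#   copy_tree_excluding /raid/Backup /mnt/new/Backup some/dir/bad
```

The exclude argument is treated as a pattern, so a path containing
characters such as * or ? would match more than the literal name.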

Good luck,

jerry

 


Re: dump/restore corrupted filesystems

2007-04-18 Thread Roland Smith
On Wed, Apr 18, 2007 at 04:09:22PM -0500, CyberLeo Kitsana wrote:
 Roland Smith wrote:
  Sorry if I wasn't clear. Most all of the data is readable and complete
  if I mount the filesystem read-only. It just panics the box when mounted
  read/write, and fsck can't fix the damage.
  
  That might be worth filing a PR for, especially the panics. 
  
  Exactly what is damaged?  Garbage in files? Wrong inode counts? I've had
  unclean filesystems because of panics, but nothing fsck_ffs couldn't
  fix.
  
  You might want to check the hardware too. Use smartmontools in case of
  (S)ATA drives.
 
 Smart says that the drives are fine, as does the manufacturer's disk
 fitness tools.

That's at least some good news.

 --8--
 rsync: readlink
 /raid/Backup/Pizzabox/2007-02-23/cyberleo/secondlife/linux/SecondLife_i686_1_13_2_15/skins/xui/es
 failed: Bad file descriptor (9)
 rsync: readlink
 /raid/Backup/Pizzabox/2007-02-23/cyberleo/secondlife/linux/SecondLife_i686_1_13_2_15/skins/xui/fr
 failed: Bad file descriptor (9)
 --8--

At least these files should be easy to replace, if necessary.

 fsck_ufs dies after about 30 minutes of grinding with the following:
 
 --8--
 ** Phase 2 - Check Pathnames
 DIRECTORY CORRUPTED  I=93409222  OWNER=1002 MODE=40755
 SIZE=512 MTIME=Feb 10 00:49 2007
 DIR=?
 
 UNEXPECTED SOFT UPDATE INCONSISTENCY

Did these problems start after a crash? 

 SALVAGE? no

What happens if you tell it to try and salvage?
 
Roland
-- 
R.F.Smith   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)




Re: dump/restore corrupted filesystems

2007-04-18 Thread CyberLeo Kitsana
Jerry McAllister wrote:
 Smart says that the drives are fine, as does the manufacturer's disk
 fitness tools. All the files that are readable contain correct data, but
 the files that are corrupt are totally not readable, and cannot even be
 removed manually:
 
 Given that, I would try to make a dump(8) of it.   If dump dies on
 a particular file, try to exclude that file from the dump either by
 rm-ing it or setting a nodump flag and try again.   You may not 
 actually be able to do the rm or nodump flag though if you cannot
 mount it with write permission.   You might be able to force it 
 mounted without doing the fsck in single user.
 
 Note that tar allows you to specify exclusions.   I usually don't
 suggest using tar for mass moves because it has weaknesses with
 hard links and might also not transfer flags and permissions
 correctly.  But, if tar is what it takes, then use it.

Force-mounting the filesystem works just fine. It's when I try to modify
any munged file that it panics the box, with ufs_dirbad or somesuch.

I have been using rsync to recover the readable data, since it handles
hard links, permissions, sparse files, et cetera. I figure it's
best, as that's what is used to drop the differential backups onto the
box in the first place.
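
A minimal sketch of such an rsync salvage run, assuming rsync is
installed on the box. The function name and paths are hypothetical; the
flags are standard rsync options.

```shell
#!/bin/sh
# Salvage readable data with rsync, skipping one path that cannot be read.
# -a: archive mode (recursion, permissions, times, owners, symlinks)
# -H: preserve hard links
# -S: handle sparse files efficiently
salvage_rsync() {
    src=$1; dst=$2; skip=$3    # hypothetical; skip is relative to src
    # The leading "/" anchors the exclude at the root of the transfer.
    rsync -aHS --exclude "/$skip" "$src"/ "$dst"/
}

# Usage (hypothetical paths):
#   salvage_rsync /raid/Backup /mnt/new/Backup some/dir/bad
```

When some files cannot be read, rsync reports them and finishes with a
non-zero exit status, so the error output doubles as a list of what still
needs to be replaced.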

--
Fuzzy love,
-CyberLeo
Technical Administrator
CyberLeo.Net Webhosting
http://www.CyberLeo.Net
[EMAIL PROTECTED]

Furry Peace! - http://www.fur.com/peace/


Re: dump/restore corrupted filesystems

2007-04-18 Thread CyberLeo Kitsana
Roland Smith wrote:
 --8--
 ** Phase 2 - Check Pathnames
 DIRECTORY CORRUPTED  I=93409222  OWNER=1002 MODE=40755
 SIZE=512 MTIME=Feb 10 00:49 2007
 DIR=?

 UNEXPECTED SOFT UPDATE INCONSISTENCY
 
 Did these problems start after a crash? 

It's possible, but I cannot be absolutely certain. The machine is
supposed to start itself up and shut itself down every day, running a
total of about 4 hours a day, during the span when all other machines
dump their backups. The only reason I noticed this failure was because
it didn't power down one day. Investigation revealed that fsck had
failed and dropped to single user, with errors seen in the log.

 
 SALVAGE? no
 
 What happens if you tell it to try and salvage?

This was a dry-run to get the error log. When I actually tried to repair
the filesystem, fsck aborted shortly after, complaining that it could not
fix the filesystem and could not continue. Hence the current path of
removing everything and re-newfs'ing.

--
Fuzzy love,
-CyberLeo
Technical Administrator
CyberLeo.Net Webhosting
http://www.CyberLeo.Net
[EMAIL PROTECTED]

Furry Peace! - http://www.fur.com/peace/


dump/restore corrupted filesystems

2007-04-16 Thread CyberLeo Kitsana
Hi!

I have a 1.2TB UFS2 filesystem with irrecoverable corruption. As such, I
must move all 500GB or so of data off of it and re-newfs it.

Does anybody know whether dump/restore can gracefully handle filesystem
corruption, or will it happily back up and restore said damage to the
pristine filesystem?

Thanks!

--
Fuzzy love,
-CyberLeo
Technical Administrator
CyberLeo.Net Webhosting
http://www.CyberLeo.Net
[EMAIL PROTECTED]

Furry Peace! - http://www.fur.com/peace/


Re: dump/restore corrupted filesystems

2007-04-16 Thread Roland Smith
On Mon, Apr 16, 2007 at 09:11:48AM -0500, CyberLeo Kitsana wrote:
 I have a 1.2TB UFS2 filesystem with irrecoverable corruption. As such, I
 must move all 500GB or so of data off of it and re-newfs it.

If the corruption is due to hardware failure, your data is probably lost.

Ditto if the corruption is so bad that fsck_ffs can't handle it. You can
e.g. tell fsck_ffs(8) to use a backup superblock, with the -b option.
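
A minimal sketch of the -b invocation, assuming a UFS2 filesystem. The
device name is a hypothetical placeholder. fsck_ffs(8) cites block 160 as
the usual alternate superblock for UFS2 (32 for UFS1); running newfs -N
with the same parameters as the original newfs lists the backup locations
without writing anything. The function only prints the command.

```shell
#!/bin/sh
# Print (do not run) an fsck_ffs invocation that uses an alternate superblock.
backup_sb_plan() {
    dev=$1            # hypothetical device node
    block=${2:-160}   # default: the usual UFS2 alternate per fsck_ffs(8)
    echo "fsck_ffs -b $block $dev"
}

backup_sb_plan /dev/raid5/data
```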

 Does anybody know whether dump/restore can gracefully handle filesystem
 corruption, or will it happily back up and restore said damage to the
 pristine filesystem?

Dump examines the filesystem to see which files need to be backed up. So
dumping a corrupted FS will probably not produce the desired results. If
it did, we wouldn't need backups.

What you could do is use dd(1) with nc(1) to send a copy of the raw
device data to another machine, and see whether you can pry your data
out of that copy.
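
A minimal sketch of the dd step, assuming the goal is an image that
survives read errors. The function name, host, port and device are
hypothetical. The function writes to a local file so it can be tried
harmlessly on an ordinary file first; the network variant is shown only
as comments.

```shell
#!/bin/sh
# Image a device (or a plain file, for a trial run) while continuing past
# read errors. conv=noerror keeps going after an error; conv=sync pads
# short reads so offsets in the copy stay aligned with the source.
image_copy() {
    dd if="$1" of="$2" bs=64k conv=noerror,sync 2>/dev/null
}

# Over the network the same stream can go through nc(1) instead of a file
# (BSD nc syntax; host, port and device are hypothetical):
#   receiver:  nc -l 9999 > raid.img
#   sender:    dd if=/dev/raid5/data bs=64k conv=noerror,sync | nc receiver.example 9999
```

Because of conv=sync the image can end up padded with zeros to a multiple
of the block size, which is harmless for later inspection of the copy.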

Roland
-- 
R.F.Smith   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)




Re: dump/restore corrupted filesystems

2007-04-16 Thread CyberLeo Kitsana
Roland Smith wrote:
 On Mon, Apr 16, 2007 at 09:11:48AM -0500, CyberLeo Kitsana wrote:
 I have a 1.2TB UFS2 filesystem with irrecoverable corruption. As such, I
 must move all 500GB or so of data off of it and re-newfs it.
 
 If the corruption is due to hardware failure, your data is probably lost.

Sorry if I wasn't clear. Most all of the data is readable and complete
if I mount the filesystem read-only. It just panics the box when mounted
read/write, and fsck can't fix the damage.

My question was more along the lines of whether or not dump/restore
would see that those corrupted directory and file inodes were indeed
corrupt and not bother attempting to back them up, or if it would
happily back them up and restore them in their corrupted state to a new
filesystem, thus trashing it.

If it does, I can always use rsync.

 Dump examines the filesystem to see which files need to be backed up.
 So dumping a corrupted FS will probably not produce the desired
 results. If it did, we wouldn't need backups.

Ironically, this is the machine that holds the backups.

--
Fuzzy love,
-CyberLeo
Technical Administrator
CyberLeo.Net Webhosting
http://www.CyberLeo.Net
[EMAIL PROTECTED]

Furry Peace! - http://www.fur.com/peace/


Re: dump/restore corrupted filesystems

2007-04-16 Thread Roland Smith
On Mon, Apr 16, 2007 at 11:14:35PM -0500, CyberLeo Kitsana wrote:
 Roland Smith wrote:
  On Mon, Apr 16, 2007 at 09:11:48AM -0500, CyberLeo Kitsana wrote:
  I have a 1.2TB UFS2 filesystem with irrecoverable corruption. As such, I
  must move all 500GB or so of data off of it and re-newfs it.
  
  If the corruption is due to hardware failure, your data is probably lost.
 
 Sorry if I wasn't clear. Most all of the data is readable and complete
 if I mount the filesystem read-only. It just panics the box when mounted
 read/write, and fsck can't fix the damage.

That might be worth filing a PR for, especially the panics. 

Exactly what is damaged?  Garbage in files? Wrong inode counts? I've had
unclean filesystems because of panics, but nothing fsck_ffs couldn't
fix.

You might want to check the hardware too. Use smartmontools in case of
(S)ATA drives.

 My question was more along the lines of whether or not dump/restore
 would see that those corrupted directory and file inodes were indeed
 corrupt and not bother attempting to back them up, or if it would
 happily back them up and restore them in their corrupted state to a new
 filesystem, thus trashing it.

Looking at /usr/src/sbin/dump/traverse.c, dump traverses the used inodes
list and all directories. So if any of these is corrupt, your dump will
be too. And if the contents of the inodes is corrupted, so will the dump. 

 Ironically, this is the machine that holds the backups.

Oops.

Roland
-- 
R.F.Smith   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)




Re: dump/restore corrupted filesystems

2007-04-16 Thread Jerry McAllister
On Mon, Apr 16, 2007 at 11:14:35PM -0500, CyberLeo Kitsana wrote:

 Roland Smith wrote:
  On Mon, Apr 16, 2007 at 09:11:48AM -0500, CyberLeo Kitsana wrote:
  I have a 1.2TB UFS2 filesystem with irrecoverable corruption. As such, I
  must move all 500GB or so of data off of it and re-newfs it.
  
  If the corruption is due to hardware failure, your data is probably lost.
 
 Sorry if I wasn't clear. Most all of the data is readable and complete
 if I mount the filesystem read-only. It just panics the box when mounted
 read/write, and fsck can't fix the damage.
 
 My question was more along the lines of whether or not dump/restore
 would see that those corrupted directory and file inodes were indeed
 corrupt and not bother attempting to back them up, or if it would
 happily back them up and restore them in their corrupted state to a new
 filesystem, thus trashing it.

It depends on how they are corrupted.  Really there are three situations.

In the first, something happened to cause a problem with the filesystem
structure - the blocks and their pointer chains/links.   That would make
fsck see errors and possibly refuse to complete.  If that also affects
the ability to read some actual file then neither dump/restore nor any
other copy method will fix the situation.  dump and other utilities will
fail when reading the files and abort.

You might be able to tinker around a little, figure out which actual files 
are affected and delete them or set dump not to read them and then copy 
all the rest.   But, if you are unable to mount the filesystem as write, 
this might not work.   If you are able to copy most, then those files 
would be uncorrupted in the new location.   You would just have to 
figure out what to do about the files you could not read.
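
A minimal sketch of one way to figure out which files are affected,
assuming the filesystem is mounted read-only. The function name and path
are hypothetical. It simply tries to read every regular file end to end
and prints the ones that fail.

```shell
#!/bin/sh
# List regular files under a directory that cannot be read end to end.
list_unreadable() {
    find "$1" -type f -exec sh -c '
        for f in "$@"; do
            cat "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
        done' sh {} +
}

# Usage (hypothetical path):
#   list_unreadable /raid/Backup > /tmp/unreadable.txt
```

Directories that find itself cannot descend into are reported on stderr
rather than in the list, so both outputs are worth keeping.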

Second would be a similar corruption to the filesystem structure 
blocks and links, but it happens to luckily not be in a place
currently being used by any actual files.   In this case, fsck
would fail, but you could still read the files enough to copy
them to some other space.   In this case, the copy process, whether
dump/restore or some other - dump/restore is probably best - would
fix the problem nicely.   The copy would be uncorrupted.

The third situation would be where the data itself was miswritten - 
maybe by a routine that cobbled some computation or database utility 
or whatever.  In this case, fsck would not see any problem with the 
filesystem.  It would see that all the blocks and links were nicely 
accounted for.  But the data would be bad and no amount of copying
would fix it.  In fact, dump or any other copy utility would read
the files without errors just fine and dandy, because it would not
know of the corruptions - so they would just follow it to the new copy.

dump/restore won't make any difference to/fix any fsck type errors.   
It works above that level - on the files' data itself.   fsck works
below the file level, on blocks and file chain links, etc.  If fsck
finds an unfixable error, dump or any other utility will fail too
if the error is in the area it is trying to read.

Once you have dump-ed, you then need to restore into a cleanly created
new filesystem.  Remember that newfs creates a fresh filesystem on a partition,
so the copy should not be corrupted from an fsck point of view.  This is 
not because of anything that dump/restore would do, but because the newfs 
made a clean new system that fsck would be happy with.

Now, if the data itself is corrupt - but readable, then dump will
happily read the corrupt data and restore will happily write out
what dump created.   The data would be just as incorrect.   But,
again, that is not at the fsck level.   It is at the file and
directory level.   fsck works on blocks and links and doesn't care
anything about the actual data written in the blocks.   It can
find errors in blocks and links that are either part of a real file chain
or not currently part of any real file.   Generally fsck can fix those, 
but there are some things that it cannot make a reasonable guess on.

I hope this adds to the understanding rather than just confusing
you more.   Basically I am pointing out that there can be different
types or places for corruption.   No copying of files will fix a
problem if the errors are within the structure or data of the file
itself.   But, since fsck doesn't look at the actual data, but 
rather at structural integrity in the filesystem - the entity within
which the files reside, it is possible that it can find errors in
places that are not part of an actual current file.   If the latter
is the case, then copying the files out of the corrupt filesystem
in to a nice new one, freshly newfs-ed using dump/restore or some 
other method, can fix the problem.

But, if there are errors in the data, then no method of copying the
files will fix them.   And, if the filesystem corruption makes it
impossible to read some of the files, then no copying scheme will
fix them.   You might be able to tinker