Re: dump/restore corrupted filesystems
Roland Smith wrote:
>> Sorry if I wasn't clear. Most all of the data is readable and complete if I mount the filesystem read-only. It just panics the box when mounted read/write, and fsck can't fix the damage.
>
> That might be worth filing a PR for, especially the panics. Exactly what is damaged? Garbage in files? Wrong inode counts? I've had unclean filesystems because of panics, but nothing fsck_ffs couldn't fix. You might want to check the hardware too. Use smartmontools in case of (S)ATA drives.

Smart says that the drives are fine, as does the manufacturer's disk fitness tools. All the files that are readable contain correct data, but the files that are corrupt are totally not readable, and cannot even be removed manually:

--8--
rsync: readlink /raid/Backup/Pizzabox/2007-02-23/cyberleo/secondlife/linux/SecondLife_i686_1_13_2_15/skins/xui/es failed: Bad file descriptor (9)
rsync: readlink /raid/Backup/Pizzabox/2007-02-23/cyberleo/secondlife/linux/SecondLife_i686_1_13_2_15/skins/xui/fr failed: Bad file descriptor (9)
--8--

fsck_ufs dies after about 30 minutes of grinding with the following:

--8--
** Phase 2 - Check Pathnames
DIRECTORY CORRUPTED I=93409222 OWNER=1002 MODE=40755 SIZE=512 MTIME=Feb 10 00:49 2007 DIR=?
UNEXPECTED SOFT UPDATE INCONSISTENCY
SALVAGE? no
MISSING '.' I=93409222 OWNER=1002 MODE=40755 SIZE=512 MTIME=Feb 10 00:49 2007 DIR=?
UNEXPECTED SOFT UPDATE INCONSISTENCY
CANNOT FIX, FIRST ENTRY IN DIRECTORY CONTAINS
UNEXPECTED SOFT UPDATE INCONSISTENCY
fsck_ufs: inoinfo: inumber -1170056596 out of range
--8--

(full output is at http://home.cyberleo.net/cyberleo/workspace/Zip/Bugs/fbsd-20070320-corr/saba-fsck-raid.txt )

It's possible this might be a result of the odd interaction between geom_raid5 and UFS, as discovered in January ( http://www.nabble.com/geom_raid5-livelock--p8304142.html ), but I can't be sure. I've already chalked this up to just an unfortunate occurrence, as the circumstances that caused the corruption in the first place are likely either long gone or so obscure as to be nearly impossible for me to root out.

> Looking at /usr/src/sbin/dump/traverse.c, dump traverses the used inodes list and all directories. So if any of these is corrupt, your dump will be too. And if the contents of the inodes is corrupted, so will the dump.

Thanks for this insight. I'll avoid dump/restore and just use manual copying for now.

--
Fuzzy love,
-CyberLeo
Technical Administrator
CyberLeo.Net Webhosting
http://www.CyberLeo.Net
[EMAIL PROTECTED]

Furry Peace! - http://www.fur.com/peace/
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: dump/restore corrupted filesystems
On Wed, Apr 18, 2007 at 04:09:22PM -0500, CyberLeo Kitsana wrote:
> Smart says that the drives are fine, as does the manufacturer's disk fitness tools. All the files that are readable contain correct data, but the files that are corrupt are totally not readable, and cannot even be removed manually:

Given that, I would try to make a dump(8) of it. If dump dies on a particular file, try to exclude that file from the dump either by rm-ing it or setting a nodump flag and try again. You may not actually be able to do the rm or nodump flag though if you cannot mount it with write permission. You might be able to force it mounted without doing the fsck in single user.

Note that tar allows you to specify exclusions. I usually don't suggest using tar for mass moves because it has weaknesses with hard links and might also not transfer flags and permissions correctly. But, if tar is what it takes, then use it.

Good luck,

jerry
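Jerry's point about archiving with exclusions can be illustrated outside tar(1) itself. The following is only a minimal sketch in Python, using the standard `tarfile` module rather than the system tar; the function name `archive_with_exclusions`, the directory names, and the demo tree are all hypothetical, and it assumes the damaged paths are already known (for example from the rsync errors).

```python
import os
import tarfile
import tempfile


def archive_with_exclusions(src_dir, archive_path, excluded):
    """Archive src_dir, leaving out the paths in `excluded` (relative to src_dir).

    Excluding a directory also skips everything beneath it, because
    tarfile does not descend into an entry whose filter result is None.
    """
    left_out = {os.path.normpath(p) for p in excluded}

    def skip_excluded(tarinfo):
        # Member names are relative to the archive root because arcname="." below.
        return None if os.path.normpath(tarinfo.name) in left_out else tarinfo

    with tarfile.open(archive_path, "w") as tar:
        tar.add(src_dir, arcname=".", filter=skip_excluded)


# Demo on a throwaway tree; "xui/es" stands in for a directory known to be damaged.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "raid")
    os.makedirs(os.path.join(src, "xui", "es"))
    with open(os.path.join(src, "xui", "good.txt"), "w") as fh:
        fh.write("fine\n")
    with open(os.path.join(src, "xui", "es", "broken.txt"), "w") as fh:
        fh.write("pretend this cannot be read\n")
    archive = os.path.join(tmp, "rescue.tar")
    archive_with_exclusions(src, archive, ["xui/es"])
    with tarfile.open(archive) as tar:
        names = sorted(os.path.normpath(n) for n in tar.getnames())

print(names)
```

Like tar, this approach does not carry over BSD file flags, so the caveats Jerry mentions still apply.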
Re: dump/restore corrupted filesystems
On Wed, Apr 18, 2007 at 04:09:22PM -0500, CyberLeo Kitsana wrote:
> Smart says that the drives are fine, as does the manufacturer's disk fitness tools.

That's at least some good news.

> --8--
> rsync: readlink /raid/Backup/Pizzabox/2007-02-23/cyberleo/secondlife/linux/SecondLife_i686_1_13_2_15/skins/xui/es failed: Bad file descriptor (9)
> rsync: readlink /raid/Backup/Pizzabox/2007-02-23/cyberleo/secondlife/linux/SecondLife_i686_1_13_2_15/skins/xui/fr failed: Bad file descriptor (9)
> --8--

At least these files should be easy to replace, if necessary.

> fsck_ufs dies after about 30 minutes of grinding with the following:
> --8--
> ** Phase 2 - Check Pathnames
> DIRECTORY CORRUPTED I=93409222 OWNER=1002 MODE=40755 SIZE=512 MTIME=Feb 10 00:49 2007 DIR=?
> UNEXPECTED SOFT UPDATE INCONSISTENCY

Did these problems start after a crash?

> SALVAGE? no

What happens if you tell it to try and salvage?

Roland
--
R.F.Smith http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914 B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)
Re: dump/restore corrupted filesystems
Jerry McAllister wrote:
> Given that, I would try to make a dump(8) of it. If dump dies on a particular file, try to exclude that file from the dump either by rm-ing it or setting a nodump flag and try again. You may not actually be able to do the rm or nodump flag though if you cannot mount it with write permission. You might be able to force it mounted without doing the fsck in single user.
>
> Note that tar allows you to specify exclusions. I usually don't suggest using tar for mass moves because it has weaknesses with hard links and might also not transfer flags and permissions correctly. But, if tar is what it takes, then use it.

Force-mounting the filesystem works just fine. It's when I try to modify any munged file that it panics the box, with ufs_dirbad or somesuch.

I have been using rsync to recover readable data, which handles hard links, permissions, sparse files, et cetera. I figure it's best, as that's what is used to drop the differential backups onto the box in the first place.

--
Fuzzy love,
-CyberLeo
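The recovery approach CyberLeo describes (copy whatever is readable, note whatever is not) can be sketched as follows. This is an illustration only, not what rsync does internally: the function name `salvage_tree` and the demo paths are hypothetical, it only reads from the source so the source can stay mounted read-only, and unlike rsync it does not preserve hard links or sparse files. It assumes the destination starts out empty.

```python
import os
import shutil
import tempfile


def salvage_tree(src, dst):
    """Copy everything readable under src into dst; return the paths that failed."""
    failed = []

    def note_unlistable(err):
        # os.walk calls this when a directory cannot be listed.
        failed.append(err.filename)

    for dirpath, dirnames, filenames in os.walk(src, onerror=note_unlistable):
        rel = os.path.relpath(dirpath, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        # Symlinks to directories show up in dirnames; os.walk does not follow them.
        links = [n for n in dirnames if os.path.islink(os.path.join(dirpath, n))]
        for name in links + filenames:
            source = os.path.join(dirpath, name)
            target = os.path.join(target_dir, name)
            try:
                if os.path.islink(source):
                    os.symlink(os.readlink(source), target)
                else:
                    shutil.copy2(source, target)  # data plus mode and timestamps
            except OSError:
                # Record the failure and keep going instead of aborting the run.
                failed.append(source)
    return failed


# Demo on a throwaway tree standing in for the read-only mount.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "raid")
    dst = os.path.join(tmp, "rescue")
    os.makedirs(os.path.join(src, "Backup"))
    with open(os.path.join(src, "Backup", "notes.txt"), "w") as fh:
        fh.write("readable data\n")
    failed = salvage_tree(src, dst)
    with open(os.path.join(dst, "Backup", "notes.txt")) as fh:
        copied = fh.read()

print(failed, repr(copied))
```

The returned list serves the same purpose as rsync's error lines: it tells you which entries still have to be replaced from another source.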
Re: dump/restore corrupted filesystems
Roland Smith wrote:
>> --8--
>> ** Phase 2 - Check Pathnames
>> DIRECTORY CORRUPTED I=93409222 OWNER=1002 MODE=40755 SIZE=512 MTIME=Feb 10 00:49 2007 DIR=?
>> UNEXPECTED SOFT UPDATE INCONSISTENCY
>
> Did these problems start after a crash?

It's possible, but I cannot be absolutely certain. The machine is supposed to start itself up and shut itself down every day, running for a total of about 4 hours a day, during the span when all other machines dump their backups. The only reason I noticed this failure was that it didn't power down one day. Investigation revealed that fsck had failed and dropped to single-user mode, with errors seen in the log.

>> SALVAGE? no
>
> What happens if you tell it to try and salvage?

This was a dry run to get the error log. When I actually tried to repair the filesystem, fsck aborted shortly afterwards, complaining that it could not fix the filesystem and could not continue. Hence the current path of removing everything and re-newfs'ing.

--
Fuzzy love,
-CyberLeo
dump/restore corrupted filesystems
Hi!

I have a 1.2TB UFS2 filesystem with irrecoverable corruption. As such, I must move all 500GB or so of data off of it and re-newfs it.

Does anybody know whether dump/restore can gracefully handle filesystem corruption, or will it happily back up and restore said damage to the pristine filesystem?

Thanks!

--
Fuzzy love,
-CyberLeo
Re: dump/restore corrupted filesystems
On Mon, Apr 16, 2007 at 09:11:48AM -0500, CyberLeo Kitsana wrote:
> I have a 1.2TB UFS2 filesystem with irrecoverable corruption. As such, I must move all 500GB or so of data off of it and re-newfs it.

If the corruption is due to hardware failure, your data is probably lost. Ditto if the corruption is so bad that fsck_ffs can't handle it. You can, for example, tell fsck_ffs(8) to use a backup superblock with the -b option.

> Does anybody know whether dump/restore can gracefully handle filesystem corruption, or will it happily back up and restore said damage to the pristine filesystem?

Dump examines the filesystem to see which files need to be backed up. So dumping a corrupted FS will probably not produce the desired results. If it did, we wouldn't need backups.

What you could do is use dd(1) with nc(1) to send a copy of the raw device data to another machine, and see whether you can pry your data from that.

Roland
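To make the raw-copy idea concrete, here is a rough Python sketch of what such an imaging pass does: read the device in fixed-size blocks and write them to some sink, filling unreadable blocks with zeros so later offsets stay aligned (comparable in spirit to running dd with `conv=noerror,sync`, which Roland did not mention and is my assumption here). The function name `image_device` and the demo file are hypothetical, and the sketch assumes that seeking to the end of the source reports its size, which holds for regular files and image files but should be checked for raw devices on your platform.

```python
import io
import os
import tempfile


def image_device(src_path, out, block_size=64 * 1024):
    """Stream src_path to the binary writer `out`; return offsets of unreadable blocks."""
    bad_offsets = []
    fd = os.open(src_path, os.O_RDONLY)
    try:
        total = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < total:
            want = min(block_size, total - offset)
            try:
                chunk = os.pread(fd, want, offset)
            except OSError:
                chunk = b""
                bad_offsets.append(offset)
            # Pad failed or short reads with zeros so the image keeps its layout.
            out.write(chunk.ljust(want, b"\0"))
            offset += want
    finally:
        os.close(fd)
    return bad_offsets


# Demo: an ordinary file stands in for the raw device, a BytesIO for the network sink.
with tempfile.TemporaryDirectory() as tmp:
    device = os.path.join(tmp, "fake-device.img")
    payload = os.urandom(150_000)
    with open(device, "wb") as fh:
        fh.write(payload)
    sink = io.BytesIO()
    bad = image_device(device, sink)
    image = sink.getvalue()

print(len(image), bad)
```

To send the image to another machine instead, `out` could be the object returned by `socket.create_connection((host, port)).makefile("wb")`, with a listener such as nc writing to a file on the receiving side; the host and port are placeholders.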
Re: dump/restore corrupted filesystems
Roland Smith wrote:
> On Mon, Apr 16, 2007 at 09:11:48AM -0500, CyberLeo Kitsana wrote:
>> I have a 1.2TB UFS2 filesystem with irrecoverable corruption. As such, I must move all 500GB or so of data off of it and re-newfs it.
>
> If the corruption is due to hardware failure, your data is probably lost.

Sorry if I wasn't clear. Most all of the data is readable and complete if I mount the filesystem read-only. It just panics the box when mounted read/write, and fsck can't fix the damage.

My question was more along the lines of whether or not dump/restore would see that those corrupted directory and file inodes were indeed corrupt and not bother attempting to back them up, or if it would happily back them up and restore them in their corrupted state to a new filesystem, thus trashing it. If it does, I can always use rsync.

> Dump examines the filesystem to see which files need to be backed up. So dumping a corrupted FS will probably not produce the desired results. If it did, we wouldn't need backups.

Ironically, this is the machine that holds the backups.

--
Fuzzy love,
-CyberLeo
Re: dump/restore corrupted filesystems
On Mon, Apr 16, 2007 at 11:14:35PM -0500, CyberLeo Kitsana wrote:
> Sorry if I wasn't clear. Most all of the data is readable and complete if I mount the filesystem read-only. It just panics the box when mounted read/write, and fsck can't fix the damage.

That might be worth filing a PR for, especially the panics. Exactly what is damaged? Garbage in files? Wrong inode counts? I've had unclean filesystems because of panics, but nothing fsck_ffs couldn't fix.

You might want to check the hardware too. Use smartmontools in case of (S)ATA drives.

> My question was more along the lines of whether or not dump/restore would see that those corrupted directory and file inodes were indeed corrupt and not bother attempting to back them up, or if it would happily back them up and restore them in their corrupted state to a new filesystem, thus trashing it.

Looking at /usr/src/sbin/dump/traverse.c, dump traverses the used inodes list and all directories. So if any of these is corrupt, your dump will be too. And if the contents of the inodes is corrupted, so will the dump.

> Ironically, this is the machine that holds the backups.

Oops.

Roland
Re: dump/restore corrupted filesystems
On Mon, Apr 16, 2007 at 11:14:35PM -0500, CyberLeo Kitsana wrote:
> My question was more along the lines of whether or not dump/restore would see that those corrupted directory and file inodes were indeed corrupt and not bother attempting to back them up, or if it would happily back them up and restore them in their corrupted state to a new filesystem, thus trashing it.

It depends on how they are corrupted. Really there are three situations.

In the first, something happened to cause a problem with the filesystem structure - the blocks and their pointer chains/links. That would make fsck see errors and possibly refuse to complete. If that also affects the ability to read some actual file, then neither dump/restore nor any other copy method will fix the situation. dump and other utilities will fail when reading the files and abort. You might be able to tinker around a little, figure out which actual files are affected, and delete them or set dump not to read them, and then copy all the rest. But if you are unable to mount the filesystem for writing, this might not work. If you are able to copy most, then those files would be uncorrupted in the new location. You would just have to figure out what to do about the files you could not read.

Second would be a similar corruption to the filesystem structure blocks and links, but it luckily happens not to be in a place currently being used by any actual files. In this case, fsck would fail, but you could still read the files well enough to copy them to some other space. In this case, the copy process, whether dump/restore or some other - dump/restore is probably best - would fix the problem nicely. The copy would be uncorrupted.

The third situation would be where the data itself was miswritten - maybe by a routine that cobbled some computation or database utility or whatever. In this case, fsck would not see any problem with the filesystem. It would see that all the blocks and links were nicely accounted for. But the data would be bad and no amount of copying would fix it. In fact, dump or any other copy utility would read the files without errors just fine and dandy, because it would not know of the corruptions - so they would just follow the data to the new copy.

dump/restore won't make any difference to, or fix, any fsck-type errors. It works above that level - on the files' data itself. fsck works below the file level, on blocks and file chain links, etc. If fsck finds an unfixable error, dump or any other utility will fail too if the error is in the area it is trying to read.

Once you have dump-ed, you then need to restore into a cleanly created new filesystem. Remember that newfs creates a filesystem on a partition. Then the copy should not be corrupted from an fsck point of view. This is not because of anything that dump/restore would do, but because the newfs made a clean new filesystem that fsck would be happy with.

Now, if the data itself is corrupt but readable, then dump will happily read the corrupt data and restore will happily write out what dump created. The data would be just as incorrect. But, again, that is not at the fsck level. It is at the file and directory level. fsck works on blocks and links and doesn't care anything about the actual data written in the blocks. It can find errors in blocks and links whether they are in a real file chain or not currently part of any real file. Generally fsck can fix those, but there are some things that it cannot make a reasonable guess on.

I hope this adds to the understanding rather than just confusing you more. Basically I am pointing out that there can be different types or places for corruption. No copying of files will fix a problem if the errors are within the structure or data of the file itself. But, since fsck doesn't look at the actual data, but rather at structural integrity in the filesystem - the entity within which the files reside - it is possible that it can find errors in places that are not part of an actual current file. If the latter is the case, then copying the files out of the corrupt filesystem into a nice new one, freshly newfs-ed, using dump/restore or some other method, can fix the problem. But, if there are errors in the data, then no method of copying the files will fix them. And, if the filesystem corruption makes it impossible to read some of the files, then no copying scheme will fix them. You might be able to tinker