Re: fsck setting d_ino == 0 (was Re: filesystem errors)

2001-08-24 Thread Kirk McKusick

To: Kirk McKusick <[EMAIL PROTECTED]>
cc: [EMAIL PROTECTED], [EMAIL PROTECTED],
   Ollivier Robert <[EMAIL PROTECTED]>,
   Mikhail Teterin <[EMAIL PROTECTED]>
Subject: fsck setting d_ino == 0 (was Re: filesystem errors) 
In-Reply-To: Your message of "Sat, 28 Jul 2001 12:48:54 PDT."
 <[EMAIL PROTECTED]> 
Date: Wed, 22 Aug 2001 01:21:03 +0100
From: Ian Dowse <[EMAIL PROTECTED]>

In message <[EMAIL PROTECTED]>,
Kirk McKusick writes:
>FFS will never set a directory ino == 0 at a location other
>than the first entry in a directory, but fsck will do so to
>get rid of an unwanted entry. The readdir routines know to
>skip over an ino == 0 entry no matter where in the directory
>it is found, so applications will never see such entries.
>It would be a fair amount of work to change fsck to `do the
>right thing', as the checking code is given only the current
>entry with which to work. I am of the opinion that you
>should simply accept that mid-directory block ino == 0 is
>acceptable rather than trying to `fix' the problem.

Bleh, well I guess not too surprisingly, there is a case in
ufs_direnter() (ufs_lookup.c) where the kernel does the wrong thing
when a mid-block entry has d_ino == 0. The result can be serious
directory corruption, and the bug has been there since the Lite/2
merge:

# fetch http://www.maths.tcd.ie/~iedowse/FreeBSD/dirbug_img.gz
Receiving dirbug_img.gz (6745 bytes): 100%
6745 bytes transferred in 0.0 seconds (4.69 MBps)
# gunzip dirbug_img.gz
# mdconfig -a -t vnode -f dirbug_img
md0
# fsck_ffs /dev/md0
** /dev/md0
** Last Mounted on /mnt
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
20 files, 1 used, 2638 free (14 frags, 328 blocks, 0.5% fragmentation)
# mount /dev/md0 /mnt
# touch /mnt/ff12
# umount /mnt
# fsck_ffs /dev/md0
** /dev/md0
** Last Mounted on /mnt
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
DIRECTORY CORRUPTED  I=2  OWNER=root MODE=40755
SIZE=512 MTIME=Aug 21 22:28 2001 
DIR=/

SALVAGE? [yn]

The bug is that when compressing directory blocks, the code trusts
the DIRSIZ() macro to calculate the amount of data to be bcopy'd
when moving a directory entry. If d_ino is zero, DIRSIZ() cannot
be trusted, so random bytes in unused portions of the directory
determine how much gets copied. I think it is very unlikely in
practice for the value returned by DIRSIZ() to be harmful, but fsck
certainly doesn't check it so this bug can be triggered after other
types of corruption have been repaired by fsck.

I just found this while looking for a dirhash bug - the dirhash
code didn't check for d_ino == 0 when compressing directories,
so it would freak when it couldn't find the entry to move. The
patch below should fix both these issues, and it makes it clearer
that DIRSIZ() is not used when d_ino == 0.

Any comments welcome. The patch is a bit larger than it needs to
be, but that directory compression code is so hard to understand
that I think it is worth clarifying it slightly :-)

Ian

The compaction code started out deeply nested and highly
conditional. I was very happy to get it down to one for loop
with single nested conditionals. That being said, it is still
pretty hard to follow. Anyway, I agree with your change. It is
amazing to me that that bug has been present since the day the
code was written (1983) and has not been noticed until now.

Kirk


Index: ufs_lookup.c
===
RCS file: /FreeBSD/FreeBSD-CVS/src/sys/ufs/ufs/ufs_lookup.c,v
retrieving revision 1.52
diff -u -r1.52 ufs_lookup.c
--- ufs_lookup.c2001/08/18 03:08:48 1.52
+++ ufs_lookup.c2001/08/21 23:59:09
@@ -869,26 +869,38 @@
 * dp->i_offset + dp->i_count would yield the space.
 */
ep = (struct direct *)dirbuf;
-   dsize = DIRSIZ(OFSFMT(dvp), ep);
+   dsize = ep->d_ino ? DIRSIZ(OFSFMT(dvp), ep) : 0;
spacefree = ep->d_reclen - dsize;
for (loc = ep-

fsck setting d_ino == 0 (was Re: filesystem errors)

2001-08-21 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Kirk McKusick writ
es:
>FFS will never set a directory ino == 0 at a location other
>than the first entry in a directory, but fsck will do so to
>get rid of an unwanted entry. The readdir routines know to
>skip over an ino == 0 entry no matter where in the directory
>it is found, so applications will never see such entries.
>It would be a fair amount of work to change fsck to `do the
>right thing', as the checking code is given only the current
>entry with which to work. I am of the opinion that you
>should simply accept that mid-directory block ino == 0 is
>acceptable rather than trying to `fix' the problem.

Bleh, well I guess not too surprisingly, there is a case in
ufs_direnter() (ufs_lookup.c) where the kernel does the wrong thing
when a mid-block entry has d_ino == 0. The result can be serious
directory corruption, and the bug has been there since the Lite/2
merge:

# fetch http://www.maths.tcd.ie/~iedowse/FreeBSD/dirbug_img.gz
Receiving dirbug_img.gz (6745 bytes): 100%
6745 bytes transferred in 0.0 seconds (4.69 MBps)
# gunzip dirbug_img.gz
# mdconfig -a -t vnode -f dirbug_img
md0
# fsck_ffs /dev/md0
** /dev/md0
** Last Mounted on /mnt
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
20 files, 1 used, 2638 free (14 frags, 328 blocks, 0.5% fragmentation)
# mount /dev/md0 /mnt
# touch /mnt/ff12
# umount /mnt
# fsck_ffs /dev/md0
** /dev/md0
** Last Mounted on /mnt
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
DIRECTORY CORRUPTED  I=2  OWNER=root MODE=40755
SIZE=512 MTIME=Aug 21 22:28 2001 
DIR=/

SALVAGE? [yn]

The bug is that when compressing directory blocks, the code trusts
the DIRSIZ() macro to calculate the amount of data to be bcopy'd
when moving a directory entry. If d_ino is zero, DIRSIZ() cannot
be trusted, so random bytes in unused portions of the directory
determine how much gets copied. I think it is very unlikely in
practice for the value returned by DIRSIZ() to be harmful, but fsck
certainly doesn't check it so this bug can be triggered after other
types of corruption have been repaired by fsck.

I just found this while looking for a dirhash bug - the dirhash
code didn't check for d_ino == 0 when compressing directories,
so it would freak when it couldn't find the entry to move. The
patch below should fix both these issues, and it makes it clearer
that DIRSIZ() is not used when d_ino == 0.

Any comments welcome. The patch is a bit larger than it needs to
be, but that directory compression code is so hard to understand
that I think it is worth clarifying it slightly :-)

Ian


Index: ufs_lookup.c
===
RCS file: /FreeBSD/FreeBSD-CVS/src/sys/ufs/ufs/ufs_lookup.c,v
retrieving revision 1.52
diff -u -r1.52 ufs_lookup.c
--- ufs_lookup.c2001/08/18 03:08:48 1.52
+++ ufs_lookup.c2001/08/21 23:59:09
@@ -869,26 +869,38 @@
 * dp->i_offset + dp->i_count would yield the space.
 */
ep = (struct direct *)dirbuf;
-   dsize = DIRSIZ(OFSFMT(dvp), ep);
+   dsize = ep->d_ino ? DIRSIZ(OFSFMT(dvp), ep) : 0;
spacefree = ep->d_reclen - dsize;
for (loc = ep->d_reclen; loc < dp->i_count; ) {
nep = (struct direct *)(dirbuf + loc);
-   if (ep->d_ino) {
-   /* trim the existing slot */
-   ep->d_reclen = dsize;
-   ep = (struct direct *)((char *)ep + dsize);
-   } else {
-   /* overwrite; nothing there; header is ours */
-   spacefree += dsize;
+
+   /* Trim the existing slot (NB: dsize may be zero). */
+   ep->d_reclen = dsize;
+   ep = (struct direct *)((char *)ep + dsize);
+
+   loc += nep->d_reclen;
+   if (nep->d_ino == 0) {
+   /*
+* A mid-block unused entry. Such entries are
+* never created by the kernel, but fsck_ffs
+* can create them (and it doesn't fix them).
+*
+* Add up the free space, and initialise the
+* relocated entry since we don't bcopy it.
+*/
+   spacefree += nep->d_reclen;
+   ep->d_ino = 0;
+   dsize = 0;
+   continue;
}
dsize = DIRSIZ(OFSFMT(dvp), nep);
spacefree += nep->d_reclen - dsize;
 #ifdef UFS_DIRHASH
if (dp->i_dirhash != NULL)
-   uf

Re: filesystem errors

2001-07-28 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Michael Harnois writes:
>
>I don't have sufficient technical knowledge to know which of you is
>right; I would just ask that filesystem corruption caused by
>restarting from a hung system not cause a panic .

I removed the extra sanity check yesterday, so if you have revision
1.3 of ufs_dirhash.c you won't see that panic again. I didn't
realise that fsck actually causes these directory entries, but just
the fact that it leaves them intact meant that the sanity check
was bad.

Ian

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: filesystem errors

2001-07-28 Thread Michael Harnois

On Sat, 28 Jul 2001 12:48:54 -0700, Kirk McKusick <[EMAIL PROTECTED]> said:

> FFS will never set a directory ino == 0 at a location other than
> the first entry in a directory, but fsck will do so to get rid
> of an unwanted entry. The readdir routines know to skip over an
> ino == 0 entry no matter where in the directory it is found, so
> applications will never see such entries. It would be a fair
> amount of work to change fsck to `do the right thing', as the
> checking code is given only the current entry with which to
> work. I am of the opinion that you should simply accept that
> mid-directory block ino == 0 is acceptable rather than trying to
> `fix' the problem.

I don't have sufficient technical knowledge to know which of you is
right; I would just ask that filesystem corruption caused by
restarting from a hung system not cause a panic .

-- 
Michael D. Harnois   bilocational bivocational
Washburn, Iowa  Minneapolis, Minnesota
 Hanlon's Razor: Never attribute to malice 
 that which is adequately explained by stupidity.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: filesystem errors

2001-07-28 Thread Kirk McKusick

To: Michael Harnois <[EMAIL PROTECTED]>
cc: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: filesystem errors 
In-Reply-To: Your message of "Wed, 25 Jul 2001 23:14:16 CDT."
 <[EMAIL PROTECTED]> 
Date: Thu, 26 Jul 2001 15:14:09 +0100
From: Ian Dowse <[EMAIL PROTECTED]>

In message <[EMAIL PROTECTED]>,
Michael Harnois writes:

>I'm tearing my hair out trying to find a filesystem error that's
>causing me a panic: ufsdirhash_checkblock: bad dir inode.
>
>When I run fsck from a single user boot, it finds no errors.
>
>When I run it on the same filesystem mounted, it finds errors: but, of
>course, it then can't correct them

[Kirk, I'm cc'ing you because here the dirhash code sanity checks
found a directory entry with d_ino == 0 that was not at the start
of a DIRBLKSIZ block. This doesn't happen normally, but it seems
from this report that fsck does not correct this. Is it a basic
filesystem assumption that d_ino == 0 can only happen at the start
of a directory block, or is it something the code should tolerate?]

FFS will never set a directory ino == 0 at a location other
than the first entry in a directory, but fsck will do so to
get rid of an unwanted entry. The readdir routines know to
skip over an ino == 0 entry no matter where in the directory
it is found, so applications will never see such entries.
It would be a fair amount of work to change fsck to `do the
right thing', as the checking code is given only the current
entry with which to work. I am of the opinion that you
should simply accept that mid-directory block ino == 0 is
acceptable rather than trying to `fix' the problem.

Interesting - this is an error reported by the UFS_DIRHASH code
that you enabled in your kernel config. A sanity check that the
dirhash code is performing is failing. These checks are designed
to catch bugs in the dirhash code, but in this case I think it may
be a bug that fsck is not finding this problem, or else my sanity
tests are too strict.

A workaround is to turn off the sanity checks with:

sysctl vfs.ufs.dirhash_docheck=0

or to remove UFS_DIRHASH from your kernel config. You could also
try to find the directory that is causing the problems. Copy the
following script to a file called dircheck.pl, and try running:

chmod 755 dircheck.pl
find / -fstype ufs -type d -print0 | xargs ./dircheck.pl

That should show up any directories that would fail that dirhash
sanity check - there will probably just be one or two that resulted
from some old filesystem corruption.

Ian


#!/usr/local/bin/perl

while (defined($dir = shift)) {
unless (open(DIR, "$dir")) {
print STDERR "$dir: $!\n";
next;
}

$b = 0;
my(%dir) = ();

while (sysread(DIR, $dat, 512) == 512) {
$off = 0;
while (length($dat) > 0) {
($dir{'d_ino'}, $dir{'d_reclen'}, $dir{'d_type'},
$dir{'d_namlen'}) = unpack("LSCC", $dat);
$dir{'d_name'} = substr($dat, 8, $dir{'d_namlen'});
$minreclen = (8 + $dir{'d_namlen'} + 1 + 3) & (~3);
$gapinfo = ($dir{'d_reclen'} == $minreclen) ? "" :
sprintf("[%d]", $dir{'d_reclen'} - $minreclen);

if ($dir{'d_ino'} == 0 && $off != 0) {
printf("%s off %d ino %d reclen 0x%x type 0%o"
. " namelen %d name '%s' %s\n",
$dir, $off, $dir{'d_ino'},
$dir{'d_reclen'}, $dir{'d_type'},
$dir{'d_namlen'}, $dir{'d_name'},
$gapinfo);
}
if ($dir{'d_reclen'} > length($dat)) {
die "reclen too long!\n";
}
$dat = substr($dat, $dir{'d_reclen'});
$off += $dir{'d_reclen'};
}
$b++;
}
}

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: filesystem errors

2001-07-26 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Michael Harnois writes:
>
>The only result it generated was
>
>/usr/home/mdharnois off 120 ino 0 reclen 0x188 type 010 namelen 14 name '.fetc
>hmail.pid' [368]
>
>and that file is destroted and recreated every couple of minutes.

It's the directory (/usr/home/mdharnois), not the file that is the
problem. If you recreate the directory:

cd /usr/home
mv mdharnois mdharnois.old
mkdir mdharnois
chown mdharnois:mdharnois mdharnois # (or whatever)
mv mdharnois.old/* mdharnois/
mv mdharnois.old/.[a-zA-Z0-9]* mdharnois/
rmdir mdharnois.old

this problem should go away permanently. Even just creating loads of
files in the existing directory might be enough to reuse the bit of
the directory that has d_ino == 0. Running

./dircheck.pl /usr/home/mdharnois

will check if there is still a problem.

However, I'd like to know if this is something that fsck should
detect and correct automatically. It is an odd case, because the
ffs filesystem code never creates directory entries like this, but
I think it will not object to them if it finds them. This kind of
ambiguity is probably a bad thing.

Ian

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: filesystem errors

2001-07-26 Thread Michael Harnois

On Thu, 26 Jul 2001 15:14:09 +0100, Ian Dowse <[EMAIL PROTECTED]> said:

> That should show up any directories that would fail that dirhash
> sanity check - there will probably just be one or two that
> resulted from some old filesystem corruption.

The only result it generated was

/usr/home/mdharnois off 120 ino 0 reclen 0x188 type 010 namelen 14 name 
'.fetchmail.pid' [368]

and that file is destroted and recreated every couple of minutes.

-- 
Michael D. Harnois   bilocational bivocational
Washburn, Iowa  Minneapolis, Minnesota
 Sed quis custodiet ipsos custodes? -- Juvenal

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: filesystem errors

2001-07-26 Thread Ian Dowse

In message <[EMAIL PROTECTED]>, Michael Harnois writes:
>I'm tearing my hair out trying to find a filesystem error that's
>causing me a panic: ufsdirhash_checkblock: bad dir inode.
>
>When I run fsck from a single user boot, it finds no errors.
>
>When I run it on the same filesystem mounted, it finds errors: but, of
>course, it then can't correct them

[Kirk, I'm cc'ing you because here the dirhash code sanity checks
found a directory entry with d_ino == 0 that was not at the start
of a DIRBLKSIZ block. This doesn't happen normally, but it seems
from this report that fsck does not correct this. Is it a basic
filesystem assumption that d_ino == 0 can only happen at the start
of a directory block, or is it something the code should tolerate?]

Interesting - this is an error reported by the UFS_DIRHASH code
that you enabled in your kernel config. A sanity check that the
dirhash code is performing is failing. These checks are designed
to catch bugs in the dirhash code, but in this case I think it may
be a bug that fsck is not finding this problem, or else my sanity
tests are too strict.

A workaround is to turn off the sanity checks with:

sysctl vfs.ufs.dirhash_docheck=0

or to remove UFS_DIRHASH from your kernel config. You could also
try to find the directory that is causing the problems. Copy the
following script to a file called dircheck.pl, and try running:

chmod 755 dircheck.pl
find / -fstype ufs -type d -print0 | xargs ./dircheck.pl

That should show up any directories that would fail that dirhash
sanity check - there will probably just be one or two that resulted
from some old filesystem corruption.

Ian


#!/usr/local/bin/perl

while (defined($dir = shift)) {
unless (open(DIR, "$dir")) {
print STDERR "$dir: $!\n";
next;
}

$b = 0;
my(%dir) = ();

while (sysread(DIR, $dat, 512) == 512) {
$off = 0;
while (length($dat) > 0) {
($dir{'d_ino'}, $dir{'d_reclen'}, $dir{'d_type'},
$dir{'d_namlen'}) = unpack("LSCC", $dat);
$dir{'d_name'} = substr($dat, 8, $dir{'d_namlen'});
$minreclen = (8 + $dir{'d_namlen'} + 1 + 3) & (~3);
$gapinfo = ($dir{'d_reclen'} == $minreclen) ? "" :
sprintf("[%d]", $dir{'d_reclen'} - $minreclen);

if ($dir{'d_ino'} == 0 && $off != 0) {
printf("%s off %d ino %d reclen 0x%x type 0%o"
. " namelen %d name '%s' %s\n",
$dir, $off, $dir{'d_ino'},
$dir{'d_reclen'}, $dir{'d_type'},
$dir{'d_namlen'}, $dir{'d_name'},
$gapinfo);
}
if ($dir{'d_reclen'} > length($dat)) {
die "reclen too long!\n";
}
$dat = substr($dat, $dir{'d_reclen'});
$off += $dir{'d_reclen'};
}
$b++;
}
}

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message