Re: panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50

2003-09-17 Thread Poul-Henning Kamp
In message [EMAIL PROTECTED], Bruce Evans writes:

> This is either disk corruption or an ffs bug.  ffs passes the garbage
> block number 0xe5441ae9720 to bread.  GEOM then handles this austerely
> by panicking.  Garbage block numbers, including negative ones, can possibly
> be created by applications seeking to preposterous offsets, so they should
> not be handled with panics.

They most certainly should!  If the range checking in any filesystem
is not able to catch these cases I insist that GEOM do so with a panic.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50

2003-09-17 Thread Bernd Walter
On Wed, Sep 17, 2003 at 09:07:24AM +0200, Poul-Henning Kamp wrote:
> In message [EMAIL PROTECTED], Bruce Evans writes:
>
> > This is either disk corruption or an ffs bug.  ffs passes the garbage
> > block number 0xe5441ae9720 to bread.  GEOM then handles this austerely
> > by panicking.  Garbage block numbers, including negative ones, can possibly
> > be created by applications seeking to preposterous offsets, so they should
> > not be handled with panics.
>
> They most certainly should!  If the range checking in any filesystem
> is not able to catch these cases I insist that GEOM do so with a panic.

What is wrong with returning an IO error?

I always hated panics because of filesystem corruptions.
An alternative would be to just bring that filesystem down.
It's easy to panic a whole system with a bogus filesystem on removable
media.

-- 
B.Walter   BWCThttp://www.bwct.de
[EMAIL PROTECTED]  [EMAIL PROTECTED]



Re: panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50

2003-09-17 Thread Poul-Henning Kamp
In message [EMAIL PROTECTED], Bernd Walter writes:
> On Wed, Sep 17, 2003 at 09:07:24AM +0200, Poul-Henning Kamp wrote:
> > In message [EMAIL PROTECTED], Bruce Evans writes:
> >
> > > This is either disk corruption or an ffs bug.  ffs passes the garbage
> > > block number 0xe5441ae9720 to bread.  GEOM then handles this austerely
> > > by panicking.  Garbage block numbers, including negative ones, can possibly
> > > be created by applications seeking to preposterous offsets, so they should
> > > not be handled with panics.
> >
> > They most certainly should!  If the range checking in any filesystem
> > is not able to catch these cases I insist that GEOM do so with a panic.
>
> What is wrong with returning an IO error?
>
> I always hated panics because of filesystem corruptions.
> An alternative would be to just bring that filesystem down.
> It's easy to panic a whole system with a bogus filesystem on removable
> media.

I hate panics too, but this would be an indication of a serious
filesystem error, so a panic is in order.  Otherwise we would be
unlikely to ever receive a report which would allow us to fix
the problem.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50

2003-09-17 Thread Bernd Walter
On Wed, Sep 17, 2003 at 10:30:15AM +0200, Poul-Henning Kamp wrote:
> In message [EMAIL PROTECTED], Bernd Walter writes:
> > On Wed, Sep 17, 2003 at 09:07:24AM +0200, Poul-Henning Kamp wrote:
> > > In message [EMAIL PROTECTED], Bruce Evans writes:
> > >
> > > > This is either disk corruption or an ffs bug.  ffs passes the garbage
> > > > block number 0xe5441ae9720 to bread.  GEOM then handles this austerely
> > > > by panicking.  Garbage block numbers, including negative ones, can possibly
> > > > be created by applications seeking to preposterous offsets, so they should
> > > > not be handled with panics.
> > >
> > > They most certainly should!  If the range checking in any filesystem
> > > is not able to catch these cases I insist that GEOM do so with a panic.
> >
> > What is wrong with returning an IO error?
> >
> > I always hated panics because of filesystem corruptions.
> > An alternative would be to just bring that filesystem down.
> > It's easy to panic a whole system with a bogus filesystem on removable
> > media.
>
> I hate panics too, but this would be an indication of a serious
> filesystem error, so a panic is in order.  Otherwise we would be
> unlikely to ever receive a report which would allow us to fix
> the problem.

Don't you think that people will report them if the filesystem is
automatically unmounted?
Accepted that's not an option at the GEOM level and that panicking
here can be good to fix range checking in the filesystem.

-- 
B.Walter   BWCThttp://www.bwct.de
[EMAIL PROTECTED]  [EMAIL PROTECTED]



Re: panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50

2003-09-17 Thread Poul-Henning Kamp
In message [EMAIL PROTECTED], Bernd Walter writes:

> Don't you think that people will report them if the filesystem is
> automatically unmounted?

We can't sensibly do that.

> Accepted that's not an option at the GEOM level and that panicking
> here can be good to fix range checking in the filesystem.

That's the point:  Our filesystems should be robust.  If they're
not they should be fixed.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50

2003-09-17 Thread John-Mark Gurney
Bernd Walter wrote this message on Wed, Sep 17, 2003 at 10:27 +0200:
> On Wed, Sep 17, 2003 at 09:07:24AM +0200, Poul-Henning Kamp wrote:
> > In message [EMAIL PROTECTED], Bruce Evans writes:
> >
> > > This is either disk corruption or an ffs bug.  ffs passes the garbage
> > > block number 0xe5441ae9720 to bread.  GEOM then handles this austerely
> > > by panicking.  Garbage block numbers, including negative ones, can possibly
> > > be created by applications seeking to preposterous offsets, so they should
> > > not be handled with panics.
> >
> > They most certainly should!  If the range checking in any filesystem
> > is not able to catch these cases I insist that GEOM do so with a panic.
>
> What is wrong with returning an IO error?
>
> I always hated panics because of filesystem corruptions.
> An alternative would be to just bring that filesystem down.
> It's easy to panic a whole system with a bogus filesystem on removable
> media.

If your file system is so hosed that it does this, then panicking
is the only safe thing to do.  You don't know what continued operation
will do to the filesystem, and you might end up losing more data.

It is not unreasonable to put parameter restrictions on function calls.
It is not much different from enforcing that a pointer is not NULL when
being passed as an argument.
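
The two positions argued in this thread can be put side by side in a small Python sketch. This is not kernel code: `check_request`, `strict_request`, and the 1 GiB device size are hypothetical names and values chosen for illustration. The first function returns an I/O error for an out-of-range request, the second treats the same condition as a caller bug and aborts, loosely analogous to a panic.

```python
import errno


def check_request(offset: int, length: int, media_size: int) -> int:
    """Return 0 if [offset, offset + length) lies on the device, else EIO."""
    if offset < 0 or length <= 0 or offset + length > media_size:
        return errno.EIO
    return 0


def strict_request(offset: int, length: int, media_size: int) -> None:
    """Panic-style variant: a bad request is treated as a bug in the caller."""
    if check_request(offset, length, media_size) != 0:
        raise RuntimeError(f"Negative or out-of-range offset ({offset})")


# The offset from the panic message, against a hypothetical 1 GiB device.
print(check_request(-15050100712783872, 16384, 1 << 30) == errno.EIO)  # True
```

The error-returning form keeps the system up but relies on every caller checking the result; the strict form makes the failure impossible to ignore, which is the trade-off being debated here.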

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 All that I will do, has been done, All that I have, has not.


Re: panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50

2003-09-17 Thread Bernd Walter
On Wed, Sep 17, 2003 at 12:52:03PM -0700, John-Mark Gurney wrote:
> Bernd Walter wrote this message on Wed, Sep 17, 2003 at 10:27 +0200:
> > On Wed, Sep 17, 2003 at 09:07:24AM +0200, Poul-Henning Kamp wrote:
> > > In message [EMAIL PROTECTED], Bruce Evans writes:
> > >
> > > > This is either disk corruption or an ffs bug.  ffs passes the garbage
> > > > block number 0xe5441ae9720 to bread.  GEOM then handles this austerely
> > > > by panicking.  Garbage block numbers, including negative ones, can possibly
> > > > be created by applications seeking to preposterous offsets, so they should
> > > > not be handled with panics.
> > >
> > > They most certainly should!  If the range checking in any filesystem
> > > is not able to catch these cases I insist that GEOM do so with a panic.
> >
> > What is wrong with returning an IO error?
> >
> > I always hated panics because of filesystem corruptions.
> > An alternative would be to just bring that filesystem down.
> > It's easy to panic a whole system with a bogus filesystem on removable
> > media.
>
> If your file system is so hosed that it does this, then panicking
> is the only safe thing to do.  You don't know what continued operation
> will do to the filesystem, and you might end up losing more data.

You don't do anything to a filesystem if you force umount it on
detected inconsistencies, but your system is still up.
In which way could the filesystem be further harmed?
I have a bunch of MO media and also get media which were written by
others - currently the only way to be safe is to fsck every medium before
mounting it, so as not to panic the system by just reading removable media.
I have no clue how hard it is to implement, but I can't see
anything wrong with the idea itself.

As I already wrote in another mail - panicking inside GEOM sounds OK,
because the FS shouldn't try to access unavailable blocks.

> It is not unreasonable to put parameter restrictions on function calls.
> It is not much different from enforcing that a pointer is not NULL when
> being passed as an argument.

It is different - if a pointer is NULL then we have a software problem.
If the filesystem is broken then the software might be OK and the cause
could even be outside your own system.

-- 
B.Walter   BWCThttp://www.bwct.de
[EMAIL PROTECTED]  [EMAIL PROTECTED]



Re: panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50

2003-09-17 Thread John-Mark Gurney
Bernd Walter wrote this message on Wed, Sep 17, 2003 at 22:27 +0200:
> > If your file system is so hosed that it does this, then panicking
> > is the only safe thing to do.  You don't know what continued operation
> > will do to the filesystem, and you might end up losing more data.
>
> You don't do anything to a filesystem if you force umount it on
> detected inconsistencies, but your system is still up.
> In which way could the filesystem be further harmed?
> I have a bunch of MO media and also get media which were written by
> others - currently the only way to be safe is to fsck every medium before
> mounting it, so as not to panic the system by just reading removable media.
> I have no clue how hard it is to implement, but I can't see
> anything wrong with the idea itself.

There is nothing wrong with the idea, but implementation is difficult.
As far as GEOM is concerned, it just gets data read/write requests from
various backing objects, but has no idea which fs, or even whether it is an
fs, is trying to access the block.  It could be broken swap code,
or some person's custom kernel web server, etc.  GEOM just can't know
how to behave in these cases.

> As I already wrote in another mail - panicking inside GEOM sounds OK,
> because the FS shouldn't try to access unavailable blocks.

Exactly.

> > It is not unreasonable to put parameter restrictions on function calls.
> > It is not much different from enforcing that a pointer is not NULL when
> > being passed as an argument.
>
> It is different - if a pointer is NULL then we have a software problem.
> If the filesystem is broken then the software might be OK and the cause
> could even be outside your own system.

If the filesystem is broken, then we still have a software bug for not
asserting that the properties of the fs are maintained.  If/when we ever
support user mounting of fs's, we need to make sure that the fs doesn't do
wacky things and provide a way to escalate permissions or crash the box.

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 All that I will do, has been done, All that I have, has not.


panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50

2003-09-15 Thread Kris Kennaway
bad block 8239054478774324592, ino 3229486
bad block 7021770428354685254, ino 3229486
panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50
Debugger(panic)
Stopped at  Debugger+0x54:  xchgl   %ebx,in_Debugger.0
db> trace
Debugger(c043aa25,c04ac1c0,c0435bc2,cd1d1980,100) at Debugger+0x54
panic(c0435bc2,5d2e4000,ffca8803,c7725d50,c0440989) at panic+0xd5
g_dev_strategy(c7725d50,0,c29b42e4,1,c0439d9c) at g_dev_strategy+0xa7
spec_xstrategy(c29c9920,c7725d50,0,c29b42e4,0) at spec_xstrategy+0x3ef
spec_specstrategy(cd1d1a60,cd1d1a84,c02b2982,cd1d1a60,0) at spec_specstrategy+0x72
spec_vnoperate(cd1d1a60,0,e544,4000,0) at spec_vnoperate+0x18
breadn(c29c9920,1ae9720,e544,4000,0) at breadn+0x122
bread(c29c9920,1ae9720,e544,4000,0) at bread+0x4c
ffs_blkfree(c29d4800,c29c9920,6c6c69,6f747561,4000) at ffs_blkfree+0x286
indir_trunc(c4abe200,313a0e0,0,0,c) at indir_trunc+0x334
handle_workitem_freeblocks(c4abe200,0,2,c04af748,c26acc00) at handle_workitem_freeblocks+0x21e
process_worklist_item(0,0,3f65f661,0,c32961e0) at process_worklist_item+0x1fd
softdep_process_worklist(0,0,c044228c,6f4,0) at softdep_process_worklist+0xe0
sched_sync(0,cd1d1d48,c04383a9,312,20) at sched_sync+0x304
fork_exit(c02c4e30,0,cd1d1d48) at fork_exit+0xcf
fork_trampoline() at fork_trampoline+0x1a
--- trap 0x1, eip = 0, esp = 0xcd1d1d7c, ebp = 0 ---
db>

Is this disk corruption, or a bug?

Kris
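
The negative offset in the panic message can be recovered from the trace itself. On this i386 machine DDB prints each argument as a 32-bit word, so a 64-bit value appears as two adjacent words. The Python sketch below assumes the second and third words of the `panic(...)` line are the low and high halves of the offset; `join_words` is a made-up helper name, not anything from the kernel.

```python
def join_words(low: int, high: int) -> int:
    """Combine two 32-bit stack words into one signed 64-bit value."""
    value = (high << 32) | low
    if value >= 1 << 63:
        value -= 1 << 64  # reinterpret as two's-complement signed
    return value


# Words taken from: panic(c0435bc2,5d2e4000,ffca8803,c7725d50,c0440989)
offset = join_words(0x5D2E4000, 0xFFCA8803)
print(offset)  # -15050100712783872
```

The result matches the value printed in the panic string, which supports reading the two words as a single 64-bit offset.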




Re: panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50

2003-09-15 Thread Poul-Henning Kamp

8239054478774324592 = 0x72570065646F4D70 = rW\0edoMp

7021770428354685254 = 0x617257006C6C6946 = arW\0lliF

That looks suspicious to me...
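
The conversion above can be reproduced with a short Python sketch. It assumes the block numbers were stored as 64-bit little-endian integers, as on the i386 machine in the trace; `block_number_as_text` is a name invented for this illustration. Reading the bytes in memory order rather than most-significant-first makes the text fragments easier to spot.

```python
def block_number_as_text(value: int) -> str:
    """Show the 8 bytes of a 64-bit block number in little-endian memory order."""
    raw = value.to_bytes(8, "little")
    return "".join(
        chr(b) if 32 <= b < 127 else ("\\0" if b == 0 else "?")
        for b in raw
    )


for blkno in (8239054478774324592, 7021770428354685254):
    print(f"{blkno:#018x} -> {block_number_as_text(blkno)}")
# 0x72570065646f4d70 -> pMode\0Wr
# 0x617257006c6c6946 -> Fill\0Wra
```

Both "block numbers" decode to NUL-separated printable ASCII, which is what you would expect if text data had ended up where an indirect block should be.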

In message [EMAIL PROTECTED], Kris Kennaway writes:

> bad block 8239054478774324592, ino 3229486
> bad block 7021770428354685254, ino 3229486
> panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50
> Debugger(panic)
> Stopped at  Debugger+0x54:  xchgl   %ebx,in_Debugger.0
> db> trace
> Debugger(c043aa25,c04ac1c0,c0435bc2,cd1d1980,100) at Debugger+0x54
> panic(c0435bc2,5d2e4000,ffca8803,c7725d50,c0440989) at panic+0xd5
> g_dev_strategy(c7725d50,0,c29b42e4,1,c0439d9c) at g_dev_strategy+0xa7
> spec_xstrategy(c29c9920,c7725d50,0,c29b42e4,0) at spec_xstrategy+0x3ef
> spec_specstrategy(cd1d1a60,cd1d1a84,c02b2982,cd1d1a60,0) at spec_specstrategy+0x72
> spec_vnoperate(cd1d1a60,0,e544,4000,0) at spec_vnoperate+0x18
> breadn(c29c9920,1ae9720,e544,4000,0) at breadn+0x122
> bread(c29c9920,1ae9720,e544,4000,0) at bread+0x4c
> ffs_blkfree(c29d4800,c29c9920,6c6c69,6f747561,4000) at ffs_blkfree+0x286
> indir_trunc(c4abe200,313a0e0,0,0,c) at indir_trunc+0x334
> handle_workitem_freeblocks(c4abe200,0,2,c04af748,c26acc00) at handle_workitem_freeblocks+0x21e
> process_worklist_item(0,0,3f65f661,0,c32961e0) at process_worklist_item+0x1fd
> softdep_process_worklist(0,0,c044228c,6f4,0) at softdep_process_worklist+0xe0
> sched_sync(0,cd1d1d48,c04383a9,312,20) at sched_sync+0x304
> fork_exit(c02c4e30,0,cd1d1d48) at fork_exit+0xcf
> fork_trampoline() at fork_trampoline+0x1a
> --- trap 0x1, eip = 0, esp = 0xcd1d1d7c, ebp = 0 ---
> db>
>
> Is this disk corruption, or a bug?
>
> Kris

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50

2003-09-15 Thread Kris Kennaway
On Mon, Sep 15, 2003 at 08:24:24PM +0200, Poul-Henning Kamp wrote:
 
> 8239054478774324592 = 0x72570065646F4D70 = rW\0edoMp
>
> 7021770428354685254 = 0x617257006C6C6946 = arW\0lliF
>
> That looks suspicious to me...

Suspicious as indicating a kernel bug, or suspicious as in this panic
is spurious and likely due to hardware failure?

Kris




Re: panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50

2003-09-15 Thread Bruce Evans
On Mon, 15 Sep 2003, Kris Kennaway wrote:

> bad block 8239054478774324592, ino 3229486
> bad block 7021770428354685254, ino 3229486
> panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50
> Debugger(panic)
> Stopped at  Debugger+0x54:  xchgl   %ebx,in_Debugger.0
> db> trace
> Debugger(c043aa25,c04ac1c0,c0435bc2,cd1d1980,100) at Debugger+0x54
> panic(c0435bc2,5d2e4000,ffca8803,c7725d50,c0440989) at panic+0xd5
> g_dev_strategy(c7725d50,0,c29b42e4,1,c0439d9c) at g_dev_strategy+0xa7
> spec_xstrategy(c29c9920,c7725d50,0,c29b42e4,0) at spec_xstrategy+0x3ef
> spec_specstrategy(cd1d1a60,cd1d1a84,c02b2982,cd1d1a60,0) at spec_specstrategy+0x72
> spec_vnoperate(cd1d1a60,0,e544,4000,0) at spec_vnoperate+0x18
> breadn(c29c9920,1ae9720,e544,4000,0) at breadn+0x122
> bread(c29c9920,1ae9720,e544,4000,0) at bread+0x4c
>
> ffs_blkfree(c29d4800,c29c9920,6c6c69,6f747561,4000) at ffs_blkfree+0x286
> indir_trunc(c4abe200,313a0e0,0,0,c) at indir_trunc+0x334
> handle_workitem_freeblocks(c4abe200,0,2,c04af748,c26acc00) at handle_workitem_freeblocks+0x21e
> process_worklist_item(0,0,3f65f661,0,c32961e0) at process_worklist_item+0x1fd
> softdep_process_worklist(0,0,c044228c,6f4,0) at softdep_process_worklist+0xe0
> sched_sync(0,cd1d1d48,c04383a9,312,20) at sched_sync+0x304
> fork_exit(c02c4e30,0,cd1d1d48) at fork_exit+0xcf
> fork_trampoline() at fork_trampoline+0x1a
> --- trap 0x1, eip = 0, esp = 0xcd1d1d7c, ebp = 0 ---
> db>
>
> Is this disk corruption, or a bug?

This is either disk corruption or an ffs bug.  ffs passes the garbage
block number 0xe5441ae9720 to bread.  GEOM then handles this austerely
by panicking.  Garbage block numbers, including negative ones, can possibly
be created by applications seeking to preposterous offsets, so they should
not be handled with panics.

The following script (with edits to turn off avoiding the bugs) demonstrated
related bugs the last time I tried it (about 6 months ago).

%%%
#!/bin/sh

SOMEFILE=/c/tmp/zz

# Bugs:
# (1) md silently truncates sizes (in DEV_BSIZE'ed units) mod 2^32.
# (2) at least pre-GEOM versions of md get confused by this and cause a
# null pointer panic in devstat.
#
# Use the maximum size that works (2^32 - 1).  Unfortunately, this prevents
# testing of file systems with size 2 TB or larger.
dd if=/dev/zero of=$SOMEFILE oseek=0xFFFE count=1

mdconfig -a -t vnode -f ${SOMEFILE} -u 0

# The large values here are more to make newfs not take too long than to
# get a large maxfilesize.
newfs -O 1 -b 65536 -f 65536 -i 6533600 /dev/md0

# Note that this reports a very large maxfilesize (2282 TB).  This is the
# size associated with the triple indirect block limit, not the correct
# one.  I think the actual limit should be 64 TB (less epsilon).
dumpfs /dev/md0 | grep maxfilesize

mount /dev/md0 /mnt

# Bugs:
# (1) fsbtodb(nb) overflows when nb has type ufs1_daddr_t and the result
# should be larger than (2^31 - 1).
# (2) dscheck() used to detect garbage block numbers caused by (1) (if the
# garbage happened to be negative or too large).  Then it reported the error
# and invalidated the buffer.  GEOM doesn't detect any error.  It apparently
# passes on the garbage, so the error is eventually detected by ffs (since
# md0 is on an ffs vnode) (if the garbage is preposterous).  ffs_balloc_ufs1()
# eventually sees the error as an EFBIG returned by bwrite() and gives up.
# But the buffer stays in the buffer cache to cause problems later.
# (3) The result of bwrite() is sometimes ignored.
#
# Chop a couple of F's off the seek so that we don't get an EFBIG error.
# Unfortunately, this breaks testing for files of size near 2282 TB.
dd if=/dev/zero of=/mnt/zz oseek=0xFE count=1

ls -l /mnt/zz

# Bugs:
# (1) umount(2) returns the undocumented errno EFBIG for the unwriteable
# buffer.
# (2) umount -f and unmount at reboot fail too (the latter leaving all file
# systems dirty).
#
# Removing the file clears the problem.
rm /mnt/zz
umount /mnt

# Since we couldn't demonstrate files larger than 2 TB on md0, demonstrate
# one near ${SOMEFILE}.
dumpfs /c | egrep '(^bsize|^fsize|maxfilesize)'
dd if=/dev/zero of=$SOMEFILE-bigger oseek=0x3FFFE count=1
ls -l $SOMEFILE-bigger
rm $SOMEFILE-bigger

mdconfig -d -u 0
rm $SOMEFILE
%%%
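
The `fsbtodb()` overflow described in the script comments can be illustrated with a small Python sketch that emulates 32-bit signed storage. This only models the arithmetic, not the kernel macro: `as_int32` and `fsbtodb_32` are hypothetical helpers, and the shift of 7 is an assumption derived from the 65536-byte fragments passed to newfs above divided by the 512-byte DEV_BSIZE. Signed overflow is formally undefined in C; the wraparound shown is what typically happens on two's-complement hardware.

```python
def as_int32(value: int) -> int:
    """Wrap an integer the way a signed 32-bit C type would typically store it."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def fsbtodb_32(nb: int, shift: int) -> int:
    """Filesystem block number -> device block number, computed in 32 bits."""
    return as_int32(nb << shift)


SHIFT = 7  # assumed: 65536-byte fragments / 512-byte DEV_BSIZE = 2**7

print(fsbtodb_32(1 << 23, SHIFT))  # 1073741824, still fits
print(fsbtodb_32(1 << 24, SHIFT))  # -2147483648, wrapped to a negative block
```

Once the true result exceeds 2^31 - 1, the device block number becomes garbage, and whether anything downstream notices depends on the garbage happening to be negative or out of range.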

Bruce


Re: panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50

2003-09-15 Thread Kris Kennaway
On Tue, Sep 16, 2003 at 10:42:38AM +1000, Bruce Evans wrote:

> This is either disk corruption or an ffs bug.

Thanks.  The disk this occurred on is a flaky IBM drive which
periodically experiences other kinds of FS corruption, so I'm inclined
to blame it.

Kris
