Re: panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50
In message [EMAIL PROTECTED], Bruce Evans writes:

> This is either disk corruption or an ffs bug.  ffs passes the garbage
> block number 0xe5441ae9720 to bread.  GEOM then handles this austerely
> by panicking.  Garbage block numbers, including negative ones, can
> possibly be created by applications seeking to preposterous offsets,
> so they should not be handled with panics.

They most certainly should!

If the range checking in any filesystem is not able to catch these
cases, I insist that GEOM do so with a panic.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED]       | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50
On Wed, Sep 17, 2003 at 09:07:24AM +0200, Poul-Henning Kamp wrote:
> In message [EMAIL PROTECTED], Bruce Evans writes:
> > This is either disk corruption or an ffs bug.  ffs passes the garbage
> > block number 0xe5441ae9720 to bread.  GEOM then handles this austerely
> > by panicking.  Garbage block numbers, including negative ones, can
> > possibly be created by applications seeking to preposterous offsets,
> > so they should not be handled with panics.
>
> They most certainly should!
>
> If the range checking in any filesystem is not able to catch these
> cases, I insist that GEOM do so with a panic.

What is wrong with returning an I/O error?
I have always hated panics caused by filesystem corruption.
An alternative would be to just bring that filesystem down.
It's easy to panic a whole system with a bogus filesystem on removable
media.

--
B.Walter                   BWCT                http://www.bwct.de
[EMAIL PROTECTED]                  [EMAIL PROTECTED]
Re: panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50
In message [EMAIL PROTECTED], Bernd Walter writes:

> On Wed, Sep 17, 2003 at 09:07:24AM +0200, Poul-Henning Kamp wrote:
> > In message [EMAIL PROTECTED], Bruce Evans writes:
> > > This is either disk corruption or an ffs bug.  ffs passes the
> > > garbage block number 0xe5441ae9720 to bread.  GEOM then handles
> > > this austerely by panicking.  Garbage block numbers, including
> > > negative ones, can possibly be created by applications seeking to
> > > preposterous offsets, so they should not be handled with panics.
> >
> > They most certainly should!
> >
> > If the range checking in any filesystem is not able to catch these
> > cases, I insist that GEOM do so with a panic.
>
> What is wrong with returning an I/O error?
> I have always hated panics caused by filesystem corruption.
> An alternative would be to just bring that filesystem down.
> It's easy to panic a whole system with a bogus filesystem on removable
> media.

I hate panics too, but this would be an indication of a serious
filesystem error, so a panic is in order.  Otherwise we would be
unlikely to ever receive a report which would allow us to fix the
problem.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED]       | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Re: panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50
On Wed, Sep 17, 2003 at 10:30:15AM +0200, Poul-Henning Kamp wrote:
> In message [EMAIL PROTECTED], Bernd Walter writes:
> > On Wed, Sep 17, 2003 at 09:07:24AM +0200, Poul-Henning Kamp wrote:
> > > In message [EMAIL PROTECTED], Bruce Evans writes:
> > > > This is either disk corruption or an ffs bug.  ffs passes the
> > > > garbage block number 0xe5441ae9720 to bread.  GEOM then handles
> > > > this austerely by panicking.  Garbage block numbers, including
> > > > negative ones, can possibly be created by applications seeking
> > > > to preposterous offsets, so they should not be handled with
> > > > panics.
> > >
> > > They most certainly should!
> > >
> > > If the range checking in any filesystem is not able to catch
> > > these cases, I insist that GEOM do so with a panic.
> >
> > What is wrong with returning an I/O error?
> > I have always hated panics caused by filesystem corruption.
> > An alternative would be to just bring that filesystem down.
> > It's easy to panic a whole system with a bogus filesystem on
> > removable media.
>
> I hate panics too, but this would be an indication of a serious
> filesystem error, so a panic is in order.  Otherwise we would be
> unlikely to ever receive a report which would allow us to fix the
> problem.

Don't you think that people will report them if the filesystem is
automatically unmounted?
Accepted, that's not an option at the GEOM level, and panicking here
can be good for getting range checking in the filesystem fixed.

--
B.Walter                   BWCT                http://www.bwct.de
[EMAIL PROTECTED]                  [EMAIL PROTECTED]
Re: panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50
In message [EMAIL PROTECTED], Bernd Walter writes:

> Don't you think that people will report them if the filesystem is
> automatically unmounted?

We can't sensibly do that.

> Accepted, that's not an option at the GEOM level, and panicking here
> can be good for getting range checking in the filesystem fixed.

That's the point: our filesystems should be robust.  If they're not,
they should be fixed.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED]       | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Re: panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50
Bernd Walter wrote this message on Wed, Sep 17, 2003 at 10:27 +0200:
> On Wed, Sep 17, 2003 at 09:07:24AM +0200, Poul-Henning Kamp wrote:
> > In message [EMAIL PROTECTED], Bruce Evans writes:
> > > This is either disk corruption or an ffs bug.  ffs passes the
> > > garbage block number 0xe5441ae9720 to bread.  GEOM then handles
> > > this austerely by panicking.  Garbage block numbers, including
> > > negative ones, can possibly be created by applications seeking to
> > > preposterous offsets, so they should not be handled with panics.
> >
> > They most certainly should!
> >
> > If the range checking in any filesystem is not able to catch these
> > cases, I insist that GEOM do so with a panic.
>
> What is wrong with returning an I/O error?
> I have always hated panics caused by filesystem corruption.
> An alternative would be to just bring that filesystem down.
> It's easy to panic a whole system with a bogus filesystem on removable
> media.

If your file system is so hosed that it does this, then panicking is
the only safe thing to do.  You don't know what continued operation
will do to the filesystem, and you might end up losing more data.

It is not unreasonable to put parameter restrictions on function calls.
It is not much different from enforcing that a pointer is not NULL when
it is passed as an argument.

--
  John-Mark Gurney                              Voice: +1 415 225 5579

  All that I will do, has been done, All that I have, has not.
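
The two policies argued over in this thread can be put side by side in a small sketch. This is not GEOM's code: `check_offset`, `Panic`, `media_size` and `panic_on_error` are hypothetical names made up for illustration, and a Python exception merely stands in for a kernel panic. It only shows that the same range check can either refuse to continue or hand an error back to the caller.

```python
import errno


class Panic(Exception):
    """Stands in for a kernel panic in this sketch (hypothetical)."""


def check_offset(offset, length, media_size, panic_on_error=True):
    """Validate a request range.

    Returns 0 when the range lies inside the media.  Otherwise either
    raises Panic (the "parameter restriction" view) or returns EIO
    (the "just report an I/O error" view).
    """
    in_range = offset >= 0 and length >= 0 and offset + length <= media_size
    if in_range:
        return 0
    if panic_on_error:
        raise Panic(f"bad offset {offset} for media of size {media_size}")
    return errno.EIO


# The offset from the subject line, checked against a made-up 1 GB device.
print(check_offset(-15050100712783872, 16384, 1 << 30, panic_on_error=False))
```

Which branch is right is exactly what the thread disputes; the sketch takes no side.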
Re: panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50
On Wed, Sep 17, 2003 at 12:52:03PM -0700, John-Mark Gurney wrote:
> Bernd Walter wrote this message on Wed, Sep 17, 2003 at 10:27 +0200:
> > On Wed, Sep 17, 2003 at 09:07:24AM +0200, Poul-Henning Kamp wrote:
> > > In message [EMAIL PROTECTED], Bruce Evans writes:
> > > > This is either disk corruption or an ffs bug.  ffs passes the
> > > > garbage block number 0xe5441ae9720 to bread.  GEOM then handles
> > > > this austerely by panicking.  Garbage block numbers, including
> > > > negative ones, can possibly be created by applications seeking
> > > > to preposterous offsets, so they should not be handled with
> > > > panics.
> > >
> > > They most certainly should!
> > >
> > > If the range checking in any filesystem is not able to catch
> > > these cases, I insist that GEOM do so with a panic.
> >
> > What is wrong with returning an I/O error?
> > I have always hated panics caused by filesystem corruption.
> > An alternative would be to just bring that filesystem down.
> > It's easy to panic a whole system with a bogus filesystem on
> > removable media.
>
> If your file system is so hosed that it does this, then panicking is
> the only safe thing to do.  You don't know what continued operation
> will do to the filesystem, and you might end up losing more data.

You don't do anything to a filesystem if you force-unmount it on
detected inconsistencies, but your system is still up.
In what way could the filesystem be further harmed?
I have a bunch of MO media and also get media which were written by
others - currently the only way to be safe is to fsck every medium
before mounting it, so that merely reading removable media does not
panic the system.
I have no idea how hard it is to implement, but I can't see anything
wrong with the idea itself.
As I already wrote in another mail - panicking inside GEOM sounds OK,
because the FS shouldn't try to access unavailable blocks.

> It is not unreasonable to put parameter restrictions on function
> calls.  It is not much different from enforcing that a pointer is not
> NULL when it is passed as an argument.

It is different - if a pointer is NULL then we have a software problem.
If the filesystem is broken then the software might be OK and the cause
could even be outside your own system.

--
B.Walter                   BWCT                http://www.bwct.de
[EMAIL PROTECTED]                  [EMAIL PROTECTED]
Re: panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50
Bernd Walter wrote this message on Wed, Sep 17, 2003 at 22:27 +0200:
> > If your file system is so hosed that it does this, then panicking is
> > the only safe thing to do.  You don't know what continued operation
> > will do to the filesystem, and you might end up losing more data.
>
> You don't do anything to a filesystem if you force-unmount it on
> detected inconsistencies, but your system is still up.
> In what way could the filesystem be further harmed?
> I have a bunch of MO media and also get media which were written by
> others - currently the only way to be safe is to fsck every medium
> before mounting it, so that merely reading removable media does not
> panic the system.
> I have no idea how hard it is to implement, but I can't see anything
> wrong with the idea itself.

There is nothing wrong with the idea, but implementation is difficult.
As far as GEOM is concerned, it just gets data read/write requests from
various backing objects, but has no idea which fs, or even whether it
is an fs, is trying to access the block.  It could be broken swap code,
or some person's custom kernel web server, etc.  GEOM just can't know
how to behave in these cases.

> As I already wrote in another mail - panicking inside GEOM sounds OK,
> because the FS shouldn't try to access unavailable blocks.

Exactly.

> > It is not unreasonable to put parameter restrictions on function
> > calls.  It is not much different from enforcing that a pointer is
> > not NULL when it is passed as an argument.
>
> It is different - if a pointer is NULL then we have a software problem.
> If the filesystem is broken then the software might be OK and the cause
> could even be outside your own system.

If the filesystem is broken, then we still have a software bug for not
asserting that the properties of the fs are maintained.  If/when we
ever support user-mounted file systems, we need to make sure that the
fs doesn't do wacky things and doesn't provide a way to escalate
permissions or crash the box.

--
  John-Mark Gurney                              Voice: +1 415 225 5579

  All that I will do, has been done, All that I have, has not.
panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50
bad block 8239054478774324592, ino 3229486
bad block 7021770428354685254, ino 3229486
panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50
Debugger("panic")
Stopped at      Debugger+0x54:  xchgl   %ebx,in_Debugger.0
db> trace
Debugger(c043aa25,c04ac1c0,c0435bc2,cd1d1980,100) at Debugger+0x54
panic(c0435bc2,5d2e4000,ffca8803,c7725d50,c0440989) at panic+0xd5
g_dev_strategy(c7725d50,0,c29b42e4,1,c0439d9c) at g_dev_strategy+0xa7
spec_xstrategy(c29c9920,c7725d50,0,c29b42e4,0) at spec_xstrategy+0x3ef
spec_specstrategy(cd1d1a60,cd1d1a84,c02b2982,cd1d1a60,0) at spec_specstrategy+0x72
spec_vnoperate(cd1d1a60,0,e544,4000,0) at spec_vnoperate+0x18
breadn(c29c9920,1ae9720,e544,4000,0) at breadn+0x122
bread(c29c9920,1ae9720,e544,4000,0) at bread+0x4c
ffs_blkfree(c29d4800,c29c9920,6c6c69,6f747561,4000) at ffs_blkfree+0x286
indir_trunc(c4abe200,313a0e0,0,0,c) at indir_trunc+0x334
handle_workitem_freeblocks(c4abe200,0,2,c04af748,c26acc00) at handle_workitem_freeblocks+0x21e
process_worklist_item(0,0,3f65f661,0,c32961e0) at process_worklist_item+0x1fd
softdep_process_worklist(0,0,c044228c,6f4,0) at softdep_process_worklist+0xe0
sched_sync(0,cd1d1d48,c04383a9,312,20) at sched_sync+0x304
fork_exit(c02c4e30,0,cd1d1d48) at fork_exit+0xcf
fork_trampoline() at fork_trampoline+0x1a
--- trap 0x1, eip = 0, esp = 0xcd1d1d7c, ebp = 0 ---
db>

Is this disk corruption, or a bug?

Kris
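
For anyone cross-checking the trace, here is a minimal sketch that reinterprets the negative offset from the panic string as a raw 64-bit pattern and splits it into two 32-bit words. It assumes that ddb on i386 shows a 64-bit argument as two consecutive 32-bit words, low word first; that is an assumption about the display, not something stated in the report. `split_words` is a made-up helper name.

```python
MASK64 = (1 << 64) - 1


def split_words(offset):
    """Return (low, high) 32-bit words of a signed 64-bit offset."""
    raw = offset & MASK64            # two's-complement bit pattern
    return raw & 0xFFFFFFFF, raw >> 32


low, high = split_words(-15050100712783872)
# Compare with the 2nd and 3rd arguments of the panic(...) frame above.
print(f"{low:x} {high:x}")
```

Under that assumption, the words line up with `5d2e4000,ffca8803` in the `panic(...)` frame, so the decimal value in the panic string and the raw arguments describe the same offset.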
Re: panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50
8239054478774324592 = 0x72570065646F4D70 = "rW\0edoMp"
7021770428354685254 = 0x617257006C6C6946 = "arW\0lliF"

That looks suspicious to me...

In message [EMAIL PROTECTED], Kris Kennaway writes:

> bad block 8239054478774324592, ino 3229486
> bad block 7021770428354685254, ino 3229486
> panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50
> Debugger("panic")
> Stopped at      Debugger+0x54:  xchgl   %ebx,in_Debugger.0
> db> trace
> Debugger(c043aa25,c04ac1c0,c0435bc2,cd1d1980,100) at Debugger+0x54
> panic(c0435bc2,5d2e4000,ffca8803,c7725d50,c0440989) at panic+0xd5
> g_dev_strategy(c7725d50,0,c29b42e4,1,c0439d9c) at g_dev_strategy+0xa7
> spec_xstrategy(c29c9920,c7725d50,0,c29b42e4,0) at spec_xstrategy+0x3ef
> spec_specstrategy(cd1d1a60,cd1d1a84,c02b2982,cd1d1a60,0) at spec_specstrategy+0x72
> spec_vnoperate(cd1d1a60,0,e544,4000,0) at spec_vnoperate+0x18
> breadn(c29c9920,1ae9720,e544,4000,0) at breadn+0x122
> bread(c29c9920,1ae9720,e544,4000,0) at bread+0x4c
> ffs_blkfree(c29d4800,c29c9920,6c6c69,6f747561,4000) at ffs_blkfree+0x286
> indir_trunc(c4abe200,313a0e0,0,0,c) at indir_trunc+0x334
> handle_workitem_freeblocks(c4abe200,0,2,c04af748,c26acc00) at handle_workitem_freeblocks+0x21e
> process_worklist_item(0,0,3f65f661,0,c32961e0) at process_worklist_item+0x1fd
> softdep_process_worklist(0,0,c044228c,6f4,0) at softdep_process_worklist+0xe0
> sched_sync(0,cd1d1d48,c04383a9,312,20) at sched_sync+0x304
> fork_exit(c02c4e30,0,cd1d1d48) at fork_exit+0xcf
> fork_trampoline() at fork_trampoline+0x1a
> --- trap 0x1, eip = 0, esp = 0xcd1d1d7c, ebp = 0 ---
> db>
>
> Is this disk corruption, or a bug?
>
> Kris

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED]       | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
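
The decoding above can be reproduced with a short sketch. It renders each "bad block" number as eight bytes, most significant first, which is the order used in the hex rendering. Reversing the bytes to get the order they would have in little-endian (i386) memory is an extra step not in the original message, and `block_bytes` is a made-up helper name.

```python
def block_bytes(blkno):
    """Eight bytes of a 64-bit block number, most significant byte first."""
    return blkno.to_bytes(8, "big")


for blkno in (8239054478774324592, 7021770428354685254):
    be = block_bytes(blkno)
    # be[::-1] is the byte order as stored on a little-endian machine.
    print(hex(blkno), be, be[::-1])
```

Read in memory order, the two values contain the fragments "pMode" and "Fill", which is why they look like ordinary file data rather than block pointers.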
Re: panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50
On Mon, Sep 15, 2003 at 08:24:24PM +0200, Poul-Henning Kamp wrote:
> 8239054478774324592 = 0x72570065646F4D70 = "rW\0edoMp"
> 7021770428354685254 = 0x617257006C6C6946 = "arW\0lliF"
>
> That looks suspicious to me...

Suspicious as in indicating a kernel bug, or suspicious as in this
panic is spurious and likely due to hardware failure?

Kris
Re: panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50
On Mon, 15 Sep 2003, Kris Kennaway wrote:

> bad block 8239054478774324592, ino 3229486
> bad block 7021770428354685254, ino 3229486
> panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50
> Debugger("panic")
> Stopped at      Debugger+0x54:  xchgl   %ebx,in_Debugger.0
> db> trace
> Debugger(c043aa25,c04ac1c0,c0435bc2,cd1d1980,100) at Debugger+0x54
> panic(c0435bc2,5d2e4000,ffca8803,c7725d50,c0440989) at panic+0xd5
> g_dev_strategy(c7725d50,0,c29b42e4,1,c0439d9c) at g_dev_strategy+0xa7
> spec_xstrategy(c29c9920,c7725d50,0,c29b42e4,0) at spec_xstrategy+0x3ef
> spec_specstrategy(cd1d1a60,cd1d1a84,c02b2982,cd1d1a60,0) at spec_specstrategy+0x72
> spec_vnoperate(cd1d1a60,0,e544,4000,0) at spec_vnoperate+0x18
> breadn(c29c9920,1ae9720,e544,4000,0) at breadn+0x122
> bread(c29c9920,1ae9720,e544,4000,0) at bread+0x4c
> ffs_blkfree(c29d4800,c29c9920,6c6c69,6f747561,4000) at ffs_blkfree+0x286
> indir_trunc(c4abe200,313a0e0,0,0,c) at indir_trunc+0x334
> handle_workitem_freeblocks(c4abe200,0,2,c04af748,c26acc00) at handle_workitem_freeblocks+0x21e
> process_worklist_item(0,0,3f65f661,0,c32961e0) at process_worklist_item+0x1fd
> softdep_process_worklist(0,0,c044228c,6f4,0) at softdep_process_worklist+0xe0
> sched_sync(0,cd1d1d48,c04383a9,312,20) at sched_sync+0x304
> fork_exit(c02c4e30,0,cd1d1d48) at fork_exit+0xcf
> fork_trampoline() at fork_trampoline+0x1a
> --- trap 0x1, eip = 0, esp = 0xcd1d1d7c, ebp = 0 ---
> db>
>
> Is this disk corruption, or a bug?

This is either disk corruption or an ffs bug.  ffs passes the garbage
block number 0xe5441ae9720 to bread.  GEOM then handles this austerely
by panicking.  Garbage block numbers, including negative ones, can
possibly be created by applications seeking to preposterous offsets, so
they should not be handled with panics.

The following script (with edits to turn off avoiding the bugs)
demonstrated related bugs the last time I tried it (about 6 months ago).

%%%
#!/bin/sh

SOMEFILE=/c/tmp/zz

# Bugs:
# (1) md silently truncates sizes (in DEV_BSIZE'ed units) mod 2^32.
# (2) at least pre-GEOM versions of md get confused by this and cause a
#     null pointer panic in devstat.
#
# Use the maximum size that works (2^32 - 1).  Unfortunately, this prevents
# testing of file systems with size 2 TB or larger.
dd if=/dev/zero of=$SOMEFILE oseek=0xFFFE count=1
mdconfig -a -t vnode -f ${SOMEFILE} -u 0

# The large values here are more to make newfs not take too long than to
# get a large maxfilesize.
newfs -O 1 -b 65536 -f 65536 -i 6533600 /dev/md0

# Note that this reports a very large maxfilesize (2282 TB).  This is the
# size associated with the triple indirect block limit, not the correct
# one.  I think the actual limit should be 64 TB (less epsilon).
dumpfs /dev/md0 | grep maxfilesize

mount /dev/md0 /mnt

# Bugs:
# (1) fsbtodb(nb) overflows when nb has type ufs1_daddr_t and the result
#     should be larger than (2^31 - 1).
# (2) dscheck() used to detect garbage block numbers caused by (1) (if the
#     garbage happened to be negative or too large).  Then it reported the
#     error and invalidated the buffer.  GEOM doesn't detect any error.  It
#     apparently passes on the garbage, so the error is eventually detected
#     by ffs (since md0 is on an ffs vnode) (if the garbage is preposterous).
#     ffs_balloc_ufs1() eventually sees the error as an EFBIG returned by
#     bwrite() and gives up.  But the buffer stays in the buffer cache to
#     cause problems later.
# (3) The result of bwrite() is sometimes ignored.
#
# Chop a couple of F's off the seek so that we don't get an EFBIG error.
# Unfortunately, this breaks testing for files of size near 2282 TB.
dd if=/dev/zero of=/mnt/zz oseek=0xFE count=1
ls -l /mnt/zz

# Bugs:
# (1) umount(2) returns the undocumented errno EFBIG for the unwritable
#     buffer.
# (2) umount -f and unmount at reboot fail too (the latter leaving all file
#     systems dirty).
#
# Removing the file clears the problem.
rm /mnt/zz
umount /mnt

# Since we couldn't demonstrate files larger than 2 TB on md0, demonstrate
# one near ${SOMEFILE}.
dumpfs /c | egrep '(^bsize|^fsize|maxfilesize)'
dd if=/dev/zero of=$SOMEFILE-bigger oseek=0x3FFFE count=1
ls -l $SOMEFILE-bigger
rm $SOMEFILE-bigger

mdconfig -d -u 0
rm $SOMEFILE
%%%

Bruce
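
The fsbtodb() overflow described in the script's comments can be illustrated with a small sketch. The assumptions are mine, not the message's: fsbtodb() is modelled as a left shift by log2(fragment size / DEV_BSIZE), DEV_BSIZE is taken to be 512, ufs1_daddr_t is taken to be a signed 32-bit type, and overflow is modelled as plain two's-complement wraparound. The fragment number is a made-up example.

```python
def to_int32(x):
    """Wrap an integer to signed 32 bits (two's-complement wraparound)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


FRAG_SIZE = 65536      # from "newfs ... -f 65536" in the script
DEV_BSIZE = 512        # assumed device block size
SHIFT = (FRAG_SIZE // DEV_BSIZE).bit_length() - 1   # log2(128) = 7


def fsbtodb32(nb):
    """Fragment number -> device block number, computed in 32 bits."""
    return to_int32(nb << SHIFT)


def fsbtodb64(nb):
    """The same conversion with enough bits to hold the result."""
    return nb << SHIFT


nb = 1 << 24           # hypothetical fragment 1 TB into the file system
print(fsbtodb64(nb), fsbtodb32(nb))
```

With these numbers the true device block number is 2^31, one past the largest positive 32-bit value, so the narrow computation comes out negative: the kind of garbage block number the script's comments describe.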
Re: panic: Negative bio_offset (-15050100712783872) on bio 0xc7725d50
On Tue, Sep 16, 2003 at 10:42:38AM +1000, Bruce Evans wrote:

> This is either disk corruption or an ffs bug.

Thanks.  The disk this occurred on is a flaky IBM drive which
periodically experiences other kinds of FS corruption, so I'm inclined
to blame it.

Kris