Re: worklist_remove panic

2001-06-13 Thread Kirk McKusick

I have checked in revision 1.99 of ffs_softdep.c, which
builds on the change in revision 1.98 by [EMAIL PROTECTED].
The first part of revision 1.98 avoids freeing a pagedep
dependency while a newdirblk dependency still references it.
That change is correct, and the warning message
``handle_written_filepage: active pagedep'' is no longer
printed in that case. The other part of revision 1.98 was to
panic with ``deallocate_dependencies: active pagedep'' when a
newdirblk dependency was encountered during a file truncation.
Revision 1.99 removes that panic and replaces it with code to
find and delete the newdirblk dependency so that the truncation
can succeed. This delta should clear up the recent problems
that folks have been having with soft updates.
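
The "find and delete" step described above can be pictured with a small
user-space model. This is only a sketch and not the code in revision 1.99:
the singly linked list, the db_next field, and the helper names
add_newdirblk() and remove_newdirblk_for() are assumptions made for
illustration. Only the names pagedep, newdirblk, db_pagedep, pd_state and
NEWBLOCK come from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for the soft updates structures, not the kernel's definitions. */
#define NEWBLOCK 0x1			/* a newdirblk still references this pagedep */

struct pagedep {
	int pd_state;
};

struct newdirblk {
	struct newdirblk *db_next;	/* hypothetical list linkage for this sketch */
	struct pagedep *db_pagedep;	/* pagedep this dependency refers to */
};

/* Attach a new newdirblk for pd at the front of the list. Returns 0, or -1 if out of memory. */
int
add_newdirblk(struct newdirblk **head, struct pagedep *pd)
{
	struct newdirblk *nb;

	assert(head != NULL && pd != NULL);
	nb = malloc(sizeof(*nb));
	if (nb == NULL)
		return (-1);
	nb->db_pagedep = pd;
	nb->db_next = *head;
	*head = nb;
	pd->pd_state |= NEWBLOCK;
	return (0);
}

/*
 * Truncation-time handling in the spirit of the description above:
 * instead of giving up when a pagedep is still marked NEWBLOCK, look for
 * the newdirblk that references it, unlink it, and free it.
 * Returns 1 if a dependency was removed, 0 if none referenced pd.
 */
int
remove_newdirblk_for(struct newdirblk **head, struct pagedep *pd)
{
	struct newdirblk **pp, *victim;

	for (pp = head; *pp != NULL; pp = &(*pp)->db_next) {
		if ((*pp)->db_pagedep == pd) {
			victim = *pp;
			*pp = victim->db_next;		/* unlink before freeing */
			pd->pd_state &= ~NEWBLOCK;	/* nothing references pd now */
			free(victim);
			return (1);
		}
	}
	return (0);
}
```

Unlinking before freeing matters here: the list never holds a pointer to
released memory, so a later walk cannot pick up a stale entry.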

Kirk McKusick

=-=-=-=-=

To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: worklist_remove panic
From: Dag-Erling Smorgrav [EMAIL PROTECTED]
Date: 26 May 2001 21:25:32 +0200

No dump (dumps seem to have been broken for about a month now), but a
stacktrace from DDB:

kernel: type 12 trap, code=0
Stopped at  worklist_remove+0x1c:   cmpw    $0,0xa(%ecx)
db> trace
worklist_remove(deadc0de) at worklist_remove+0x1c
free_diradd(deadc0de) at free_diradd+0x26
free_newdirblk(c2e45cd0) at free_newdirblk+0x32
handle_written_inodeblock(c287b200,c6323480) at handle_written_inodeblock+0x2b2
softdep_disk_write_complete(c6323480) at softdep_disk_write_complete+0x6a
bufdone(c6323480,cf2c7f54,c014de93,c6323480,c258b280) at bufdone+0x101
bufdonebio(c6323480) at bufdonebio+0xe
ad_interrupt(c2c5f940,c2564300,cf2c7f7c,c01ba6e4,c258b280) at ad_interrupt+0x3ef
ata_intr(c258b280) at ata_intr+0xae
ithread_loop(c258b200,cf2c7fa8) at ithread_loop+0x424
fork_exit(c01ba2c0,c258b200,cf2c7fa8) at fork_exit+0xf4
fork_trampoline() at fork_trampoline+0x8
db> panic
panic: from debugger
Debugger("panic")
Stopped at  worklist_remove+0x1c:   cmpw    $0,0xa(%ecx)
db> 
panic: from debugger
Uptime: 1d0h12m13s

dumping to dev ad0b, offset 131104
dump ata0: resetting devices .. panic: witness_restore: lock (sleep mutex) Giant not locked
Uptime: 1d0h12m13s
Dump already in progress, bailing...
Automatic reboot in 15 seconds - press a key on the console to abort


des@des ~% gdb -k
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as i386-unknown-freebsd.
(kgdb) exec-file /boot/kernel/kernel
(kgdb) symbol-file /sys/compile/DES/kernel.debug
Reading symbols from /sys/compile/DES/kernel.debug...done.
(kgdb) l *(worklist_remove+0x1c)
0xc0261750 is in worklist_remove (../../ufs/ffs/ffs_softdep.c:432).
427             struct worklist *item;
428     {
429
430             if (lk.lkt_held == -1)
431                     panic("worklist_remove: lock not held");
432             if ((item->wk_state & ONWORKLIST) == 0) {
433                     FREE_LOCK(&lk);
434                     panic("worklist_remove: not on list");
435             }
436             item->wk_state &= ~ONWORKLIST;
(kgdb) l *(free_diradd+0x26)
0xc02640fa is in free_diradd (../../ufs/ffs/ffs_softdep.c:2601).
2596    #ifdef DEBUG
2597            if (lk.lkt_held == -1)
2598                    panic("free_diradd: lock not held");
2599    #endif
2600            WORKLIST_REMOVE(&dap->da_list);
2601            LIST_REMOVE(dap, da_pdlist);
2602            if ((dap->da_state & DIRCHG) == 0) {
2603                    pagedep = dap->da_pagedep;
2604            } else {
2605                    dirrem = dap->da_previous;
(kgdb) l *(free_newdirblk+0x32)
0xc026345e is in free_newdirblk (../../ufs/ffs/ffs_softdep.c:2033).
2028             */
2029            pagedep = newdirblk->db_pagedep;
2030            pagedep->pd_state &= ~NEWBLOCK;
2031            if ((pagedep->pd_state & ONWORKLIST) == 0)
2032                    while ((dap = LIST_FIRST(&pagedep->pd_pendinghd)) != NULL)
2033                            free_diradd(dap);
2034            /*
2035             * If no dependencies remain, the pagedep will be freed.
2036             */
2037            for (i = 0; i < DAHASHSZ; i++)
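
The check at line 432 of the first listing can be modelled outside the
kernel as follows. This is a minimal sketch under stated assumptions:
the list layout, worklist_insert() and worklist_remove_checked() are
illustrative names that do not exist in ffs_softdep.c, and the sketch
returns an error code where the kernel code panics. Only wk_state and
ONWORKLIST are taken from the listing.

```c
#include <assert.h>
#include <stddef.h>

#define ONWORKLIST 0x1			/* item is currently linked on a list */

/* Simplified work item: a doubly linked entry plus a state word. */
struct worklist {
	struct worklist *wk_next;	/* next item, or NULL */
	struct worklist **wk_prevp;	/* address of the pointer that points at us */
	int wk_state;
};

/* Link item at the head of the list and mark it as being on a list. */
void
worklist_insert(struct worklist **head, struct worklist *item)
{
	assert((item->wk_state & ONWORKLIST) == 0);
	item->wk_next = *head;
	if (*head != NULL)
		(*head)->wk_prevp = &item->wk_next;
	*head = item;
	item->wk_prevp = head;
	item->wk_state |= ONWORKLIST;
}

/*
 * Unlink item, but only if its state says it is on a list.
 * Returns 0 on success and -1 if the item was not on a list.
 */
int
worklist_remove_checked(struct worklist *item)
{
	if ((item->wk_state & ONWORKLIST) == 0)
		return (-1);
	item->wk_state &= ~ONWORKLIST;
	if (item->wk_next != NULL)
		item->wk_next->wk_prevp = item->wk_prevp;
	*item->wk_prevp = item->wk_next;
	return (0);
}
```

A check like this catches a double removal, but it cannot help with the
trace above: the argument there is 0xdeadc0de, the pattern that FreeBSD
debugging kernels write over freed memory, so reading wk_state through
that pointer is what raises the trap, before the flag can be tested.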

After this panic, fsck complained of bad superblocks on all file
systems.

By the way, fsck is intolerably slow these days: more than twenty
minutes for 'fsck -y' of a 5.5 GB filesystem (roughly 380,000 files)
on a recent and far from sluggish IBM IDE drive.  Most (nearly all) of
that time is spent in phase 2.

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: worklist_remove panic

2001-05-27 Thread Joerg Wunsch

Peter Wemm [EMAIL PROTECTED] wrote:

> For some reason, sysinstall or the kernel decided to += 64k on the
> start address of the swap partition (to avoid swap clobbering the
> fdisk, bootblocks, etc at the start of the disk), but neglected to
> remove 64k from the size.

This could be undone.  Swapping has been fixed long ago to not clobber
disklabels (i.e. it doesn't start at the beginning of the swap
partition).

-- 
cheers, Jorg   .-.-.   --... ...--   -.. .  DL8DTL

http://www.sax.de/~joerg/                        NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)




Re: worklist_remove panic

2001-05-27 Thread David Malone

> Check your disk label.  I got burned a few months back on a fairly old
> install where I created swap first, then root.  This causes the swap
> partition to start at sector 0, with root straight after.  For some reason,
> sysinstall or the kernel decided to += 64k on the start address of the swap
> partition (to avoid swap clobbering the fdisk, bootblocks, etc at the start
> of the disk), but neglected to remove 64k from the size.

That seems to be it. They actually overlap by 60 sectors. Grrr...

David.




worklist_remove panic

2001-05-26 Thread Dag-Erling Smorgrav

No dump (dumps seem to have been broken for about a month now), but a
stacktrace from DDB:

kernel: type 12 trap, code=0
Stopped at  worklist_remove+0x1c:   cmpw    $0,0xa(%ecx)
db> trace
worklist_remove(deadc0de) at worklist_remove+0x1c
free_diradd(deadc0de) at free_diradd+0x26
free_newdirblk(c2e45cd0) at free_newdirblk+0x32
handle_written_inodeblock(c287b200,c6323480) at handle_written_inodeblock+0x2b2
softdep_disk_write_complete(c6323480) at softdep_disk_write_complete+0x6a
bufdone(c6323480,cf2c7f54,c014de93,c6323480,c258b280) at bufdone+0x101
bufdonebio(c6323480) at bufdonebio+0xe
ad_interrupt(c2c5f940,c2564300,cf2c7f7c,c01ba6e4,c258b280) at ad_interrupt+0x3ef
ata_intr(c258b280) at ata_intr+0xae
ithread_loop(c258b200,cf2c7fa8) at ithread_loop+0x424
fork_exit(c01ba2c0,c258b200,cf2c7fa8) at fork_exit+0xf4
fork_trampoline() at fork_trampoline+0x8
db> panic
panic: from debugger
Debugger("panic")
Stopped at  worklist_remove+0x1c:   cmpw    $0,0xa(%ecx)
db> 
panic: from debugger
Uptime: 1d0h12m13s

dumping to dev ad0b, offset 131104
dump ata0: resetting devices .. panic: witness_restore: lock (sleep mutex) Giant not locked
Uptime: 1d0h12m13s
Dump already in progress, bailing...
Automatic reboot in 15 seconds - press a key on the console to abort


des@des ~% gdb -k
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as i386-unknown-freebsd.
(kgdb) exec-file /boot/kernel/kernel
(kgdb) symbol-file /sys/compile/DES/kernel.debug
Reading symbols from /sys/compile/DES/kernel.debug...done.
(kgdb) l *(worklist_remove+0x1c)
0xc0261750 is in worklist_remove (../../ufs/ffs/ffs_softdep.c:432).
427             struct worklist *item;
428     {
429
430             if (lk.lkt_held == -1)
431                     panic("worklist_remove: lock not held");
432             if ((item->wk_state & ONWORKLIST) == 0) {
433                     FREE_LOCK(&lk);
434                     panic("worklist_remove: not on list");
435             }
436             item->wk_state &= ~ONWORKLIST;
(kgdb) l *(free_diradd+0x26)
0xc02640fa is in free_diradd (../../ufs/ffs/ffs_softdep.c:2601).
2596    #ifdef DEBUG
2597            if (lk.lkt_held == -1)
2598                    panic("free_diradd: lock not held");
2599    #endif
2600            WORKLIST_REMOVE(&dap->da_list);
2601            LIST_REMOVE(dap, da_pdlist);
2602            if ((dap->da_state & DIRCHG) == 0) {
2603                    pagedep = dap->da_pagedep;
2604            } else {
2605                    dirrem = dap->da_previous;
(kgdb) l *(free_newdirblk+0x32)
0xc026345e is in free_newdirblk (../../ufs/ffs/ffs_softdep.c:2033).
2028             */
2029            pagedep = newdirblk->db_pagedep;
2030            pagedep->pd_state &= ~NEWBLOCK;
2031            if ((pagedep->pd_state & ONWORKLIST) == 0)
2032                    while ((dap = LIST_FIRST(&pagedep->pd_pendinghd)) != NULL)
2033                            free_diradd(dap);
2034            /*
2035             * If no dependencies remain, the pagedep will be freed.
2036             */
2037            for (i = 0; i < DAHASHSZ; i++)

After this panic, fsck complained of bad superblocks on all file
systems.

By the way, fsck is intolerably slow these days: more than twenty
minutes for 'fsck -y' of a 5.5 GB filesystem (roughly 380,000 files)
on a recent and far from sluggish IBM IDE drive.  Most (nearly all) of
that time is spent in phase 2.

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]




Re: worklist_remove panic

2001-05-26 Thread Peter Wemm

David Malone wrote:
> On Sat, May 26, 2001 at 09:25:32PM +0200, Dag-Erling Smorgrav wrote:
> > No dump (dumps seem to have been broken for about a month now), but a
> > stacktrace from DDB:
> 
> Crashdumps have been working for me recently, (apart from the fact
> that they overrun the end of my swap partition by 64k and clobber
> the superblock of my /var partition).

Check your disk label.  I got burned a few months back on a fairly old
install where I created swap first, then root.  This causes the swap
partition to start at sector 0, with root straight after.  For some reason,
sysinstall or the kernel decided to += 64k on the start address of the swap
partition (to avoid swap clobbering the fdisk, bootblocks, etc at the start
of the disk), but neglected to remove 64k from the size.  So, when I
finally crashed the box with dumps enabled, it took out the first 64k of my
root filesystem.  I never found the code that made this change.  Either it
is well hidden, or got removed a while back.  This machine was installed
from a 4.0-SNAP from some time before March 1999.
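
The arithmetic behind that overlap can be sketched as below. The struct,
the function names, and the partition sizes used with it are made up for
illustration, and a 512-byte sector is assumed. The sketch only shows why
moving the start forward without shrinking the size pushes the end of the
partition into its neighbour.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u		/* assumed sector size in bytes */

/* A partition described the way a disklabel does: offset and size in sectors. */
struct part {
	uint64_t offset;
	uint64_t size;
};

/* Number of sectors shared by partitions a and b (0 if they do not overlap). */
uint64_t
overlap_sectors(struct part a, struct part b)
{
	uint64_t start = a.offset > b.offset ? a.offset : b.offset;
	uint64_t a_end = a.offset + a.size;
	uint64_t b_end = b.offset + b.size;
	uint64_t end = a_end < b_end ? a_end : b_end;

	return (end > start ? end - start : 0);
}

/*
 * Move the start of p forward by `bytes`. With shrink == 0 the size is
 * left alone, which is the mistake described above; with shrink != 0 the
 * size is reduced by the same amount so the end stays where it was.
 */
struct part
shift_start(struct part p, uint64_t bytes, int shrink)
{
	uint64_t sectors = bytes / SECTOR_SIZE;

	assert(sectors <= p.size);
	p.offset += sectors;
	if (shrink)
		p.size -= sectors;
	return (p);
}
```

With a hypothetical swap partition at offset 0 and root starting right
after it, shift_start(swap, 65536, 0) moves the end of swap 128 sectors
into root, while shift_start(swap, 65536, 1) leaves the two partitions
adjacent.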

Cheers,
-Peter
--
Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
All of this is for nothing if we don't go to the stars - JMS/B5

