Re: NULL td passed to propagate_priority() when using xmms...

2003-11-04 Thread Bruce Evans
On Mon, 3 Nov 2003, John Baldwin wrote:

 On 01-Nov-2003 Soren Schmidt wrote:
  It seems Sean Chittenden wrote:
  Howdy.  I'm not sure if this is a ULE bug or a KSE bug, or both, but,
  for those interested (this is using ule 1.67, rebuilding world now),
  here's my stack.  I couldn't figure out where td was being set to
  NULL.  :( Oh!  Where is TD_SET_LOCK defined?  egrep -r didn't turn up
  anything.  -sc
 
  Its not ULE, I'm running 4BSD and has gotten this on boot for over a
  week now, rendering -current totally useless...

 Having a kernel panic with INVARIANTS on would really help narrow down
 where the bug is.

I found something that causes this bug fairly reliably:
- configure ddb so that db_print_backtrace() is called on panics.
- break the fd driver so that the panic() in fdstrategy() is called on
  floppy accesses.
- attempt to access a floppy so that fdstrategy() is called.
- db_print_backtrace() then does bad things.  It never completes here,
  though it works in other contexts.  Usually it prints only the first
  line or two.  Then quite often ddb is called for a null pointer panic
  in propagate_priority().

More details about the null pointer panic:  This seems to have nothing
to do with scheduling.  propagate_priority() is not called with a null
td of course, but it sometimes follows a null m:

%%%
/*
 * Pick up the mutex that td is blocked on.
 */
m = td-td_blocked;
MPASS(m != NULL);

/*
 * Check if the thread needs to be moved up on
 * the blocked chain
 */
if (td == TAILQ_FIRST(m-mtx_blocked)) {
continue;
}
%%%

I don't have invariants enabled, so MPASS(m != NULL) doesn't do anything,
but m is null so attempting to load m-mtx_blocked causes a panic.

For the backtrace context, propagate_priority() gets called for attempting
to aquire a lock in softclock().  Tasks like the softclock task get
scheduled despite the system being in panic().  ps seemed to show that the
user process doing the floppy access no longer existed.  I don't know how
that could happen, since the panic() is done in the context of the that
process.

More details about bugs in db_print_backtrace(): Maybe the stack is
messed up.  Attempting to access invalid stack offsets can cause
problems.  My version of db_print_backtrace() has extra code to attempt
not to access invalid offsets, but there is normally no problem since
ddb's trap handler fixes up the problem.  But backtrace() bogusly calls
db_print_backtrace() in non-ddb context and then the longjmp in the trap
handler goes to hyperspace if anywhere.

Bugs tripped over while debugging this: Putting a breakpoint in fdopen()
didn't work, because fd.c:fdopen() conflicts with kern_descrip.c:fdopen().
This was broken in fd.c 1.259.  There are hundreds of similar conflicts
in GENERIC, some for obviously broken things like the same malloc type
being static in several files.

Bruce
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: NULL td passed to propagate_priority() when using xmms...

2003-11-03 Thread John Baldwin

On 01-Nov-2003 Soren Schmidt wrote:
 It seems Sean Chittenden wrote:
 Howdy.  I'm not sure if this is a ULE bug or a KSE bug, or both, but,
 for those interested (this is using ule 1.67, rebuilding world now),
 here's my stack.  I couldn't figure out where td was being set to
 NULL.  :( Oh!  Where is TD_SET_LOCK defined?  egrep -r didn't turn up
 anything.  -sc
 
 Its not ULE, I'm running 4BSD and has gotten this on boot for over a
 week now, rendering -current totally useless...

Having a kernel panic with INVARIANTS on would really help narrow down
where the bug is.

-- 

John Baldwin [EMAIL PROTECTED]http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: NULL td passed to propagate_priority() when using xmms...

2003-11-03 Thread Soren Schmidt
It seems John Baldwin wrote:
 
 On 01-Nov-2003 Soren Schmidt wrote:
  It seems Sean Chittenden wrote:
  Howdy.  I'm not sure if this is a ULE bug or a KSE bug, or both, but,
  for those interested (this is using ule 1.67, rebuilding world now),
  here's my stack.  I couldn't figure out where td was being set to
  NULL.  :( Oh!  Where is TD_SET_LOCK defined?  egrep -r didn't turn up
  anything.  -sc
  
  Its not ULE, I'm running 4BSD and has gotten this on boot for over a
  week now, rendering -current totally useless...
 
 Having a kernel panic with INVARIANTS on would really help narrow down
 where the bug is.

doesn't produce anything it seems, but I do have this traceback:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x24
fault code  = supervisor read, page not present
instruction pointer = 0x8:0xc04b8f3b
stack pointer   = 0x10:0xcd619bc4
frame pointer   = 0x10:0xcd619bd8
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 3 (g_up)
kernel: type 12 trap, code=0
Stopped at  propagate_priority+0x8b:cmpl0x24(%ebx),%ecx
db trace
propagate_priority(c0ebfe40,c25983c0,c2512400,cd619c14,c04cc9d1) at propagate_pr
iority+0x8b
_mtx_lock_sleep(c069c340,0,0,0,c0698700) at _mtx_lock_sleep+0x249
bufdonebio(c7675b68,cd619c44,c048d612,c084c540,c259f990) at bufdonebio+0x47
biodone(c7675b68,c06411ba,c259f990,c7675b68,0) at biodone+0xbc
g_dev_done(c259f990,c0ebfe40,1,0,4) at g_dev_done+0x8a
biodone(c259f990,0,24c,c0640b45,a) at biodone+0xbc
g_io_schedule_up(c0ebfe40,c0ebe1e4,cd619d34,c04acae1,0) at g_io_schedule_up+0xb8
g_up_procbody(0,cd619d48,0,0,0) at g_up_procbody+0x28
fork_exit(c048df60,0,cd619d48) at fork_exit+0xb1
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xcd619d7c, ebp = 0 ---
db 

-Søren
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: NULL td passed to propagate_priority() when using xmms...

2003-11-01 Thread Soren Schmidt
It seems Sean Chittenden wrote:
 Howdy.  I'm not sure if this is a ULE bug or a KSE bug, or both, but,
 for those interested (this is using ule 1.67, rebuilding world now),
 here's my stack.  I couldn't figure out where td was being set to
 NULL.  :( Oh!  Where is TD_SET_LOCK defined?  egrep -r didn't turn up
 anything.  -sc

Its not ULE, I'm running 4BSD and has gotten this on boot for over a
week now, rendering -current totally useless...

 
 (kgdb) bt
 #0  doadump () at /usr/src/sys/kern/kern_shutdown.c:240
 #1  0xc0530569 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:372
 #2  0xc0530948 in panic () at /usr/src/sys/kern/kern_shutdown.c:550
 #3  0xc06e6bc6 in trap_fatal (frame=0xd675dc2c, eva=0)
 at /usr/src/sys/i386/i386/trap.c:820
 #4  0xc06e6203 in trap (frame=
   {tf_fs = -881065960, tf_es = 16, tf_ds = -881065968, tf_edi = 0, tf_esi = 
 -871763664, tf_ebp = -696918912, tf_isp = -696918952, tf_ebx = 24, tf_edx = 24, 
 tf_ecx = -871763664, tf_eax = -879614080, tf_trapno = 12, tf_err = 0, tf_eip = 
 -1068208110, tf_cs = 8, tf_eflags = 66051, tf_esp = -874093984, tf_ss = 0})
 at /usr/src/sys/i386/i386/trap.c:252
 #5  0xc06d6a68 in calltrap () at {standard input}:102
 #6  0xc05257ac in propagate_priority (td=0x0)
 at /usr/src/sys/kern/kern_mutex.c:152
 #7  0xc0525bf9 in _mtx_lock_sleep (m=0xc0796b40, opts=0, file=0x0, line=0)
 at /usr/src/sys/kern/kern_mutex.c:635
 #8  0xc051a2c4 in ithread_loop (arg=0xcb7a6b00)
 at /usr/src/sys/kern/kern_intr.c:539
 #9  0xc0518f51 in fork_exit (callout=0xc051a100 ithread_loop, arg=0x0,
 frame=0x0) at /usr/src/sys/kern/kern_fork.c:796
 (kgdb) frame 9
 #9  0xc0518f51 in fork_exit (callout=0xc051a100 ithread_loop, arg=0x0,
 frame=0x0) at /usr/src/sys/kern/kern_fork.c:796
 796 callout(arg, frame);
 (kgdb) list
 791  * cpu_set_fork_handler intercepts this function call to
 792  * have this call a non-return function to stay in kernel mode.
 793  * initproc has its own fork handler, but it does return.
 794  */
 795 KASSERT(callout != NULL, (NULL callout in fork_exit));
 796 callout(arg, frame);
 797
 798 /*
 799  * Check if a kernel thread misbehaved and returned from its main
 800  * function.
 (kgdb) frame 8
 #8  0xc051a2c4 in ithread_loop (arg=0xcb7a6b00)
 at /usr/src/sys/kern/kern_intr.c:539
 539  mtx_lock(Giant);
 (kgdb) list
 534 wakeup(ih);
 535 mtx_unlock(ithd-it_lock);
 536 goto restart;
 537 }
 538 if ((ih-ih_flags  IH_MPSAFE) == 0)
 539 mtx_lock(Giant);
 540 ih-ih_handler(ih-ih_argument);
 541 if ((ih-ih_flags  IH_MPSAFE) == 0)
 542 mtx_unlock(Giant);
 543 }
 (kgdb) frame 7
 #7  0xc0525bf9 in _mtx_lock_sleep (m=0xc0796b40, opts=0, file=0x0, line=0)
 at /usr/src/sys/kern/kern_mutex.c:635
 635 propagate_priority(td);
 (kgdb) list
 630  * Save who we're blocked on.
 631  */
 632 td-td_blocked = m;
 633 td-td_lockname = m-mtx_object.lo_name;
 634 TD_SET_LOCK(td);
 635 propagate_priority(td);
 636
 637 if (LOCK_LOG_TEST(m-mtx_object, opts))
 638 CTR3(KTR_LOCK,
 639 _mtx_lock_sleep: p %p blocked on [%p] %s, td, 
 m,
 (kgdb) frame 6
 #6  0xc05257ac in propagate_priority (td=0x0)
 at /usr/src/sys/kern/kern_mutex.c:152
 152 sched_prio(td, pri);
 (kgdb) list
 147  * XXXKSE this gets a lot more complicated under threads
 148  * but try anyhow.
 149  */
 150 if (TD_ON_RUNQ(td)) {
 151 MPASS(td-td_blocked == NULL);
 152 sched_prio(td, pri);
 153 return;
 154 }
 155 /*
 156  * Adjust for any other cases.
 ---
 panic: page fault
 panic messages:
 ---
 Fatal trap 12: page fault while in kernel mode
 fault virtual address   = 0x38
 fault code  = supervisor read, page not present
 instruction pointer = 0x8:0xc0547012
 stack pointer   = 0x10:0xd6763c6c
 frame pointer   = 0x10:0xd6763c80
 code segment= base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, def32 1, gran 1
 processor eflags= interrupt enabled, resume, IOPL = 0
 current process = 

NULL td passed to propagate_priority() when using xmms...

2003-10-31 Thread Sean Chittenden
Howdy.  I'm not sure if this is a ULE bug or a KSE bug, or both, but,
for those interested (this is using ule 1.67, rebuilding world now),
here's my stack.  I couldn't figure out where td was being set to
NULL.  :( Oh!  Where is TD_SET_LOCK defined?  egrep -r didn't turn up
anything.  -sc

(kgdb) bt
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:240
#1  0xc0530569 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:372
#2  0xc0530948 in panic () at /usr/src/sys/kern/kern_shutdown.c:550
#3  0xc06e6bc6 in trap_fatal (frame=0xd675dc2c, eva=0)
at /usr/src/sys/i386/i386/trap.c:820
#4  0xc06e6203 in trap (frame=
  {tf_fs = -881065960, tf_es = 16, tf_ds = -881065968, tf_edi = 0, tf_esi = 
-871763664, tf_ebp = -696918912, tf_isp = -696918952, tf_ebx = 24, tf_edx = 24, tf_ecx 
= -871763664, tf_eax = -879614080, tf_trapno = 12, tf_err = 0, tf_eip = -1068208110, 
tf_cs = 8, tf_eflags = 66051, tf_esp = -874093984, tf_ss = 0})
at /usr/src/sys/i386/i386/trap.c:252
#5  0xc06d6a68 in calltrap () at {standard input}:102
#6  0xc05257ac in propagate_priority (td=0x0)
at /usr/src/sys/kern/kern_mutex.c:152
#7  0xc0525bf9 in _mtx_lock_sleep (m=0xc0796b40, opts=0, file=0x0, line=0)
at /usr/src/sys/kern/kern_mutex.c:635
#8  0xc051a2c4 in ithread_loop (arg=0xcb7a6b00)
at /usr/src/sys/kern/kern_intr.c:539
#9  0xc0518f51 in fork_exit (callout=0xc051a100 ithread_loop, arg=0x0,
frame=0x0) at /usr/src/sys/kern/kern_fork.c:796
(kgdb) frame 9
#9  0xc0518f51 in fork_exit (callout=0xc051a100 ithread_loop, arg=0x0,
frame=0x0) at /usr/src/sys/kern/kern_fork.c:796
796 callout(arg, frame);
(kgdb) list
791  * cpu_set_fork_handler intercepts this function call to
792  * have this call a non-return function to stay in kernel mode.
793  * initproc has its own fork handler, but it does return.
794  */
795 KASSERT(callout != NULL, (NULL callout in fork_exit));
796 callout(arg, frame);
797
798 /*
799  * Check if a kernel thread misbehaved and returned from its main
800  * function.
(kgdb) frame 8
#8  0xc051a2c4 in ithread_loop (arg=0xcb7a6b00)
at /usr/src/sys/kern/kern_intr.c:539
539  mtx_lock(Giant);
(kgdb) list
534 wakeup(ih);
535 mtx_unlock(ithd-it_lock);
536 goto restart;
537 }
538 if ((ih-ih_flags  IH_MPSAFE) == 0)
539 mtx_lock(Giant);
540 ih-ih_handler(ih-ih_argument);
541 if ((ih-ih_flags  IH_MPSAFE) == 0)
542 mtx_unlock(Giant);
543 }
(kgdb) frame 7
#7  0xc0525bf9 in _mtx_lock_sleep (m=0xc0796b40, opts=0, file=0x0, line=0)
at /usr/src/sys/kern/kern_mutex.c:635
635 propagate_priority(td);
(kgdb) list
630  * Save who we're blocked on.
631  */
632 td-td_blocked = m;
633 td-td_lockname = m-mtx_object.lo_name;
634 TD_SET_LOCK(td);
635 propagate_priority(td);
636
637 if (LOCK_LOG_TEST(m-mtx_object, opts))
638 CTR3(KTR_LOCK,
639 _mtx_lock_sleep: p %p blocked on [%p] %s, td, m,
(kgdb) frame 6
#6  0xc05257ac in propagate_priority (td=0x0)
at /usr/src/sys/kern/kern_mutex.c:152
152 sched_prio(td, pri);
(kgdb) list
147  * XXXKSE this gets a lot more complicated under threads
148  * but try anyhow.
149  */
150 if (TD_ON_RUNQ(td)) {
151 MPASS(td-td_blocked == NULL);
152 sched_prio(td, pri);
153 return;
154 }
155 /*
156  * Adjust for any other cases.
---
panic: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x38
fault code  = supervisor read, page not present
instruction pointer = 0x8:0xc0547012
stack pointer   = 0x10:0xd6763c6c
frame pointer   = 0x10:0xd6763c80
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 33 (irq12: psm0)
trap number = 12
panic: page fault

syncing disks, buffers remaining...

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x38
fault code  = supervisor read, page not present
instruction 

Re: NULL td passed to propagate_priority() when using xmms...

2003-10-31 Thread Sean Chittenden
 Howdy.  I'm not sure if this is a ULE bug or a KSE bug, or both, but,
 for those interested (this is using ule 1.67, rebuilding world now),
 here's my stack.  I couldn't figure out where td was being set to
 NULL.  :( Oh!  Where is TD_SET_LOCK defined?  egrep -r didn't turn up
 anything.  -sc

sched_ule.c v1.70 fixes this.  -sc

-- 
Sean Chittenden
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]