Re: panic: blockable sleep lock (sx) allproc @ /usr/local/src/sys/kern/kern_proc.c:212

2001-10-07 Thread Garrett Wollman

On Sun, 7 Oct 2001 14:49:16 +1000 (EST), Bruce Evans [EMAIL PROTECTED] said:

 Is using xconsole significantly better than tail -f /var/log/messages?

I don't know.  I think `xterm -C' is better than either one, if it can
be made to work properly.  (I have held off on updating to latest
-current in the hope that this might be resolved one way or the
other.)

-GAWollman


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: panic: blockable sleep lock (sx) allproc @ /usr/local/src/sys/kern/kern_proc.c:212

2001-09-30 Thread Bruce Evans

On Sun, 30 Sep 2001, Doug Barton wrote:

   With -current as of last night around 7pm PDT:


 IdlePTD 4390912
 initial pcb at 320920
 panicstr: bremfree: bp 0xc68bf184 not locked
 panic messages:
 ---
 panic: blockable sleep lock (sx) allproc @
 /usr/local/src/sys/kern/kern_proc.c:212

 syncing disks... panic: bremfree: bp 0xc68bf184 not locked
 Uptime: 9h46m14s

 ...
 #10 0xc01cf4d5 in panic (fmt=0xc02b0f20 blockable sleep lock (%s) %s @
 %s:%d)
 at /usr/local/src/sys/kern/kern_shutdown.c:628
 #11 0xc01e6054 in witness_lock (lock=0xc035a9a0, flags=0,
 file=0xc02ad8e0 /usr/local/src/sys/kern/kern_proc.c, line=212)
 at /usr/local/src/sys/kern/subr_witness.c:493
 #12 0xc01d3625 in _sx_slock (sx=0xc035a9a0,
 file=0xc02ad8e0 /usr/local/src/sys/kern/kern_proc.c, line=212)
 at /usr/local/src/sys/kern/kern_sx.c:115
 #13 0xc01ca5e4 in pfind (pid=353) at
 /usr/local/src/sys/kern/kern_proc.c:212
 #14 0xc01e98a2 in selwakeup (sip=0xc179b804) at
 /usr/local/src/sys/kern/sys_generic.c:1292
 #15 0xc01f33a3 in ptcwakeup (tp=0xc179b828, flag=1)
 at /usr/local/src/sys/kern/tty_pty.c:321
 #16 0xc01f337a in ptsstart (tp=0xc179b828) at
 /usr/local/src/sys/kern/tty_pty.c:310
 #17 0xc01f0958 in ttstart (tp=0xc179b828) at
 /usr/local/src/sys/kern/tty.c:1409
 #18 0xc01f1da9 in tputchar (c=99, tp=0xc179b828) at
 /usr/local/src/sys/kern/tty.c:2469
 #19 0xc01e291f in putchar (c=99, arg=0xcd9a0be8) at
 /usr/local/src/sys/kern/subr_prf.c:306
 #20 0xc01e2b9e in kvprintf (
 fmt=0xc02add01 alcru: negative time of %ld usec for pid %d (%s)\n,
 func=0xc01e28d0 putchar, arg=0xcd9a0be8, radix=10, ap=0xcd9a0c00
 Þ/Ìÿ\002\020)
 at /usr/local/src/sys/kern/subr_prf.c:489
 #21 0xc01e284c in printf (
 fmt=0xc02add00 calcru: negative time of %ld usec for pid %d (%s)\n)
 at /usr/local/src/sys/kern/subr_prf.c:262
 #22 0xc01cdeab in calcru (p=0xcd0e7300, up=0xc189ae00, sp=0xc189ae08,
 ip=0x0)
 at /usr/local/src/sys/kern/kern_resource.c:669
 #23 0xc01c041c in exit1 (td=0xcd0e7404, rv=256) at
 /usr/local/src/sys/kern/kern_exit.c:318
 #24 0xc01bfdde in sys_exit (td=0xcd0e7404, uap=0xcd9a0d20)
 at /usr/local/src/sys/kern/kern_exit.c:110
 #25 0xc027543f in syscall (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,
 tf_edi = 0,
   tf_esi = -1, tf_ebp = -1077937600, tf_isp = -845542028, tf_ebx =
 672297256,
   tf_edx = 512, tf_ecx = 672374048, tf_eax = 1, tf_trapno = 12, tf_err
 = 2,
   tf_eip = 671896240, tf_cs = 31, tf_eflags = 643, tf_esp =
 -1077937644, tf_ss = 47})
 at /usr/local/src/sys/i386/i386/trap.c:1122
 #26 0xc026953d in syscall_with_err_pushed ()

This is a well-know bug in printf(9).  The TIOCCONS ioctl always gave
non-deterministic crashes.  Now it gives determinstic panics when
pintf() is called while sched_lock is held.  Work-around: don't use
anything that uses TIOCCONS (xconsole?).

Bruce


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: panic: blockable sleep lock (sx) ...

2001-09-29 Thread Marcel Moolenaar

On Sat, Sep 29, 2001 at 12:52:48PM -0700, John Baldwin wrote:
 
 could figure out the faulting address that it trapp'd one that would help.  You
 could add KTR tracepoints or some such to store the log message to do that and
 then examin the KTR buffer to get the actual faulting address.

Ok. In the remote debugger I got a lot more data. In fact, it
brought me to linux_alarm() in /sys/compat/linux_misc.c

First some data about the trapping thread:

td = curthread = 0xcbb89304 looks ok
kg = td-ksegrp = 0xcbb89230looks ok

td-td_proc-p_comm = swi6: tty:sio clock

The LTP test that seems to be causing this is doing an alarm().

In linux_alarm() I see:

:
s = splsoftclock();
old_it = td-td_proc-p_realtimer;
:
td-td_proc-p_realtimer = it;
splx(s);
:

Is the splsoftclock() acceptable locking to access td-td_proc?

-- 
 Marcel Moolenaar USPA: A-39004  [EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: panic: blockable sleep lock (sx) ...

2001-09-29 Thread John Baldwin


On 30-Sep-01 Marcel Moolenaar wrote:
 On Sat, Sep 29, 2001 at 12:52:48PM -0700, John Baldwin wrote:
 
 could figure out the faulting address that it trapp'd one that would help. 
 You
 could add KTR tracepoints or some such to store the log message to do that
 and
 then examin the KTR buffer to get the actual faulting address.
 
 Ok. In the remote debugger I got a lot more data. In fact, it
 brought me to linux_alarm() in /sys/compat/linux_misc.c
 
 First some data about the trapping thread:
 
 td = curthread = 0xcbb89304   looks ok
 kg = td-ksegrp = 0xcbb89230  looks ok
 
 td-td_proc-p_comm = swi6: tty:sio clock
 
 The LTP test that seems to be causing this is doing an alarm().
 
 In linux_alarm() I see:
 
   :
   s = splsoftclock();
   old_it = td-td_proc-p_realtimer;
   :
   td-td_proc-p_realtimer = it;
   splx(s);
   :
 
 Is the splsoftclock() acceptable locking to access td-td_proc?

For now the code should be fine b/c it is under Giant.  (The spl's don't dp
anything anymore.)

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: panic: blockable sleep lock (sx) ...

2001-09-29 Thread Marcel Moolenaar

On Sat, Sep 29, 2001 at 12:52:48PM -0700, John Baldwin wrote:
 
 Can you do 'show locks' at the ddb prompt to get a list of what locks are held?

db show locks
exclusive (sleep mutex) Giant (0xc0343ae0) locked @ 
/nfs/5.x/src/sys/kern/kern_timeout.c:186
exclusive (spin mutex) sched lock (0xc0343940) locked @ 
/nfs/5.x/src/sys/kern/kern_mutex.c:340

 probably a NULL pointer dereference of some sort in _mtx_lock_sleep(). 

From trace:
:
--- trap 0xc, eip = 0xc01b67c6, esp = 0xcbf9ec74, ebp = 0xcbf9ec80 ---
_mtx_lock_sleep(cc4c310c,0,c029b360,27b) at _mtx_lock_sleep+0x14e
:

In gdb (now with debug information):
(kgdb) bt
:
#21 0xc01cf514 in printf (
fmt=0xc02b3480 kernel trap %d with interrupts disabled\n)
at /nfs/5.x/src/sys/kern/subr_prf.c:262
#22 0xc026bca9 in trap (frame={tf_fs = 24, tf_es = 16, tf_ds = 16, 
  tf_edi = -877096400, tf_esi = -867421940, tf_ebp = -872813440, 
  tf_isp = -872813472, tf_ebx = -877096188, tf_edx = -1049155008, 
  tf_ecx = 2, tf_eax = 0, tf_trapno = 12, tf_err = 0, 
  tf_eip = -1071945786, tf_cs = 8, tf_eflags = 65666, tf_esp = 635, 
  tf_ss = 0}) at /nfs/5.x/src/sys/i386/i386/trap.c:206
#23 0xc01b67c6 in _mtx_lock_sleep (m=0xcc4c310c, opts=0, 
file=0xc029b360 /nfs/5.x/src/sys/kern/kern_time.c, line=635)
at /nfs/5.x/src/sys/kern/kern_mutex.c:409
#24 0xc01b6421 in _mtx_lock_flags (m=0xcc4c310c, opts=0, 
file=0xc029b360 /nfs/5.x/src/sys/kern/kern_time.c, line=635)
at /nfs/5.x/src/sys/kern/kern_mutex.c:235
#25 0xc01c4b60 in realitexpire (arg=0xcc4c2f04)
at /nfs/5.x/src/sys/kern/kern_time.c:635
#26 0xc01c4fc6 in softclock (dummy=0x0)
at /nfs/5.x/src/sys/kern/kern_timeout.c:187
#27 0xc01b03ce in ithread_loop (arg=0xc0e45c80)
at /nfs/5.x/src/sys/kern/kern_intr.c:532
#28 0xc01af8ac in fork_exit (callout=0xc01b02a4 ithread_loop, 
arg=0xc0e45c80, frame=0xcbf9ed48) at /nfs/5.x/src/sys/kern/kern_fork.c:

(kgdb) up
:
#22 0xc026bca9 in trap (frame={tf_fs = 24, tf_es = 16, tf_ds = 16, 
  tf_edi = -877096400, tf_esi = -867421940, tf_ebp = -872813440, 
  tf_isp = -872813472, tf_ebx = -877096188, tf_edx = -1049155008, 
  tf_ecx = 2, tf_eax = 0, tf_trapno = 12, tf_err = 0, 
  tf_eip = -1071945786, tf_cs = 8, tf_eflags = 65666, tf_esp = 635, 
  tf_ss = 0}) at /nfs/5.x/src/sys/i386/i386/trap.c:206
206 printf(kernel trap %d with interrupts disabled\n,
(kgdb) 
#23 0xc01b67c6 in _mtx_lock_sleep (m=0xcc4c310c, opts=0, 
file=0xc029b360 /nfs/5.x/src/sys/kern/kern_time.c, line=635)
at /nfs/5.x/src/sys/kern/kern_mutex.c:409
409 if (td1-td_ksegrp-kg_pri.pri_level  
kg-kg_pri.pri_level)

(kgdb) p td1
$1 = (struct thread *) 0x0

The strange part is that line 409 is the body of a for-loop (TAILQ_FOREACH)
that breaks whrn td1 is NULL.

(kgdb) p m-mtx_blocked
$3 = {tqh_first = 0xc1772a40, tqh_last = 0xcc4c3100}

(kgdb) p $3.tqh_first
$6 = (struct thread *) 0xc1772a40

(kgdb) p $6-td_blkq
$7 = {tqe_next = 0x1, tqe_prev = 0xdeadc0de}

(kgdb) p *$6
$8 = {td_proc = 0x18729006, td_ksegrp = 0x0, td_last_kse = 0x0, td_kse = 0x0, 
  td_plist = {tqe_next = 0x86001, tqe_prev = 0x0}, td_kglist = {
tqe_next = 0x0, tqe_prev = 0x0}, td_slpq = {tqe_next = 0x0, 
tqe_prev = 0xcd20e000}, td_blkq = {tqe_next = 0x1, tqe_prev = 0xdeadc0de}, 
  td_runq = {tqe_next = 0xdeadc0de, tqe_prev = 0xdeadc0de}, 
  td_flags = -559038242, td_dupfd = -559038242, td_wchan = 0x0, 
  td_wmesg = 0xc16a0cac \200*wÁ, td_lastcpu = 0 '\000', td_locks = 0, 
  td_blocked = 0xcd207b48, td_ithd = 0x0, 
  td_mtxname = 0xcd207a0c \200*wÁ\220*wÁ\200z ÍW, td_contested = {
lh_first = 0xcd207a80}, td_sleeplocks = 0xcd207940, 
  td_intr_nesting_level = 1869349888, td_md = {No data fields}, td_retval = {
-559060125, -559038242}, td_pcb = 0xdeadc0de, td_slpcallout = {c_links = {
  sle = {sle_next = 0xdeadc0de}, tqe = {tqe_next = 0xdeadc0de, 
tqe_prev = 0xdeadc0de}}, c_time = -559038242, c_arg = 0xdeadc0de, 
c_func = 0, c_flags = -1049941016}, td_frame = 0xc1772b40, 
  td_kstack_obj = 0xc1772248, td_kstack = 0}

I don't know to what extend the structures have ben globbered by the
double panic, but this is what I see post mortem.

BTW: It seems easily reproducable so if you want some additional
info, let me know. The kernel is bleeding edge with a local fix for
the linprocfs breakage.

FYI,

-- 
 Marcel Moolenaar USPA: A-39004  [EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



RE: panic: blockable sleep lock (sx) ...

2001-09-29 Thread John Baldwin


On 29-Sep-01 Marcel Moolenaar wrote:
 Gang,
 
 I was running the Linux Test Project's (ltp) regression suite to see
 how our Linuxulator was doing. It looks like we can use that to test
 our kernel as well. :-)\

Can you do 'show locks' at the ddb prompt to get a list of what locks are held?
My guess is that sched_lock is held.  The real problem is in the trap.  It's
probably a NULL pointer dereference of some sort in _mtx_lock_sleep(). 
Ideally, printf wouldn't be trying to d a wakeup in this case. :-P  If you
could figure out the faulting address that it trapp'd one that would help.  You
could add KTR tracepoints or some such to store the log message to do that and
then examin the KTR buffer to get the actual faulting address.

 panicstr: bremfree: bp 0xc69541a8 not locked
 panic messages:
 ---
 panic: blockable sleep lock (sx) allproc @
 /nfs/5.x/src/sys/kern/kern_proc.c:212
 
 syncing disks... panic: bremfree: bp 0xc69541a8 not locked
 Uptime: 3h46m47s
 
 dumping to dev ad0s3b, offset 1572992
 [snip]
 
 (kgdb) bt
#0  0xc01bc316 in dumpsys ()
#1  0xc01bc103 in boot ()
#2  0xc01bc529 in panic ()
#3  0xc01e96c9 in bremfree ()
#4  0xc0194d49 in spec_fsync ()
#5  0xc0194889 in spec_vnoperate ()
#6  0xc02337fd in ffs_sync ()
#7  0xc01f5c01 in sync ()
#8  0xc01bbd4c in boot ()
#9  0xc01bc529 in panic ()
#10 0xc01d2582 in witness_lock ()
#11 0xc01c0555 in _sx_slock ()
#12 0xc01b76d0 in pfind ()
#13 0xc01d5cc9 in selwakeup ()
#14 0xc01df5a7 in ptcwakeup ()
#15 0xc01df582 in ptsstart ()
#16 0xc01dcc60 in ttstart ()
#17 0xc01de08d in tputchar ()
#18 0xc01cf037 in putchar ()
#19 0xc01cf29e in kvprintf ()
#20 0xc01cef64 in printf ()
#21 0xc026b095 in trap ()
#22 0xc01b62b7 in _mtx_lock_sleep ()
#23 0xc01b5f45 in _mtx_lock_flags ()
#24 0xc01c4594 in realitexpire ()
#25 0xc01c49fa in softclock ()
#26 0xc01aff5e in ithread_loop ()
#27 0xc01af43c in fork_exit ()
 
 -- 
  Marcel Moolenaar   USPA: A-39004  [EMAIL PROTECTED]
 
 To Unsubscribe: send mail to [EMAIL PROTECTED]
 with unsubscribe freebsd-current in the body of the message

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: panic: blockable sleep lock (sx) ...

2001-09-29 Thread Marcel Moolenaar

On Sat, Sep 29, 2001 at 12:52:48PM -0700, John Baldwin wrote:
 
 On 29-Sep-01 Marcel Moolenaar wrote:
  Gang,
  
  I was running the Linux Test Project's (ltp) regression suite to see
  how our Linuxulator was doing. It looks like we can use that to test
  our kernel as well. :-)\
 
 Can you do 'show locks' at the ddb prompt to get a list of what locks are held?

I'll get back to you. There are a couple of things going on at the
same time that I need to serialize...

 My guess is that sched_lock is held.  The real problem is in the trap.  It's
 probably a NULL pointer dereference of some sort in _mtx_lock_sleep(). 
 Ideally, printf wouldn't be trying to d a wakeup in this case. :-P  If you
 could figure out the faulting address that it trapp'd one that would help.  You
 could add KTR tracepoints or some such to store the log message to do that and
 then examin the KTR buffer to get the actual faulting address.

Will do. I also need to figure out if it was directly caused by a test
in the LTP suite and which one or that Murphy passed by my window...

-- 
 Marcel Moolenaar USPA: A-39004  [EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message