panic: __mp_lock_held(sched_lock) 2013-DEC-27 snapshot

2014-01-28 Thread patrick keshishian
Hi,

Not sure if anyone is interested in this panic as the
snapshot is a bit old.

Happened while gdb-ing a process.

(typing off images; expect typos)

login: panic: kernel diagnostic assertion __mp_lock_held(sched_lock) == 0 
failed: file ../../../../lock.c, line 126
Stopped at  Debugger+0x5:   leave
RUN AT LEAST ...
ddb{0} trace
Debugger() at Debugger+0x5
panic() at panic+0xee
__assert() at __assert+0x21
_kernel_lock_init() at _kernel_lock_init
issignal() at issignal+0x205
sleep_setup_signal() at sleep_setup_signal+0x94
tsleep() at tsleep+0x92
thrsleep() at thrsleep+0x1b5
sys___thrsleep() at sys___thrsleep+0x6f
syscall() at syscall+0x249
--- syscall (number 94) ---
end of kernel
end trace frame: 0x1d47e3209d80, count: -10
0x1d47e4eba95a:
ddb{0}

now, the ps has a lot of output. a lot of firefox and gimp, and...
I'm happy to email photos to anyone interested. But I think these
are the interesting parts:

ddb{0} ps
   PID   PPID   PGRP   UID  S      FLAGS  WAIT      COMMAND
  2944  15477   2944  1000  3       0x83  wait      gdb
 14290   2944  30442  1000  2  0x4100603            myapp
  5416   2944  30442  1000  2  0x4100603            myapp
  7295   2944  30442  1000  2  0x4100603            myapp
 17989   2944  30442  1000  2  0x4100603            myapp
 21860   2944  30442  1000  2  0x4100603            myapp
 25934   2944  30442  1000  2  0x4100603            myapp
 32131   2944  30442  1000  2  0x4100603            myapp
  4616   2944  30442  1000  2  0x4100603            myapp
 17063   2944  30442  1000  2  0xc100603            myapp
*26101   2944  30442  1000  7  0xc100683  thrsleep  myapp
 30442   2944  30442  1000  3  0x8000683  poll      myapp
  3142   9337   3142  1000  4  0x8000403            vi
 11005   8117  27631  1000  3  0x4100082  poll      firefox
... lots of firefox threads ...
then other stuff ...

After typing all that: *ga!* This shit is in the dmesg :P


myapp is what was being debugged.

The app dlopen()'s a .so object, looks for an object holding
pointers to functions, and executes those functions upon events.

I had set a break point in one of those functions, or rather
I thought I did ... then boom!

Interesting?


Side note 1:
Incidentally, I couldn't figure out how to correctly leave
a symbol unresolved while compiling myapp, until dlopen()
time. If anyone cares to give me a pointer (ha!) off-list,
I'd much appreciate it. I don't know if this is the cause
of the problem, but I'd have thunk an error like this would
just crash the app, not the system. The loaded functions
seem to execute just fine, though: I see log statements
from them.

Side note 2:
I see gdb crash quite often on a subsequent run of the app being
debugged. I've noticed this on a bunch of snapshots dating
some months back. Never reported it because I don't have a
simple test case to submit, and things have been moving
fast with what you guys are doing.

$ gdb someapp
(gdb) break paris
(gdb) r args
[then break point]
[examine ...]
(gdb) p inside_wine_glass
[... then let it run to end]
(gdb) c
[start again without quitting gdb!]
(gdb) r args
[then break point]
[examine ...]
(gdb) p wine_bottle
gdb coredumps!

Cheers,
--patrick



sd2: 15322MB, 512 bytes/sector, 31379456 sectors
sd2 detached
scsibus2 detached
umass1 detached
panic: kernel diagnostic assertion __mp_lock_held(sched_lock) == 0 failed: 
file ../../../../kern/kern_lock.c, line 126
Stopped at  Debugger+0x5:   leave   
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
IF RUNNING SMP, USE 'mach ddbcpu #' AND 'trace' ON OTHER PROCESSORS, TOO.
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb{0} Debugger() at Debugger+0x5
panic() at panic+0xee
__assert() at __assert+0x21
_kernel_lock_init() at _kernel_lock_init
issignal() at issignal+0x205
sleep_setup_signal() at sleep_setup_signal+0x94
tsleep() at tsleep+0x92
thrsleep() at thrsleep+0x1b5
sys___thrsleep() at sys___thrsleep+0x6f
syscall() at syscall+0x249
--- syscall (number 94) ---
end of kernel
end trace frame: 0x1d47e3209d80, count: -10
0x1d47e4eba95a:
ddb{0}
   PID   PPID   PGRP   UID  S      FLAGS  WAIT      COMMAND
  2944  15477   2944  1000  3       0x83  wait      gdb
 14290   2944  30442  1000  2  0x4100603            myapp
  5416   2944  30442  1000  2  0x4100603            myapp
  7295   2944  30442  1000  2  0x4100603            myapp
 17989   2944  30442  1000  2  0x4100603            myapp
 21860   2944  30442  1000  2  0x4100603            myapp
 25934   2944  30442  1000  2  0x4100603            myapp
 32131   2944

Re: panic: __mp_lock_held(sched_lock) 2013-DEC-27 snapshot

2014-01-28 Thread Philip Guenther
On Tue, Jan 28, 2014 at 4:55 PM, patrick keshishian sids...@boxsoft.com wrote:
 Not sure if anyone is interested in this panic as the
 snapshot is a bit old.

 Happened while gdb-ing a process.

 (typing off images; expect typos)

 login: panic: kernel diagnostic assertion __mp_lock_held(sched_lock) == 0 
 failed: file ../../../../lock.c, line 126

That's the bug I was working on at the start of the n2k14 hackathon.
I've managed to simplify some of the bits underneath it, so maybe I
can get the intermediate problem (recursive tsleep() usage)
resolved...


 Side note 1:
 Incidentally, I couldn't figure out how to correctly leave
 a symbol unresolved while compiling myapp, until dlopen()
 time.

If you can include the shared-object/library in the link command line,
then the symbol can be referenced at link time.  That lets you refer
to the symbols directly as if they were in the executable itself,
though you must still declare them for the compiler to know what type
they are, etc.

If you need to delay the loading of the shared-object until after
process startup via dlopen(), then the executable should use dlsym()
to get the symbol address, passing it the handle that dlopen()
returned.
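
A minimal sketch of that dlopen()/dlsym() sequence, for illustration only. The function name lookup_event_fn, the event_fn callback type, and the choice of RTLD_NOW are assumptions; the thread does not show myapp's real code or the plugin's actual signatures.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical callback type; the plugin's real signature is not given. */
typedef void (*event_fn)(void);

/*
 * Open the shared object at `path` (NULL means the running program and
 * the libraries already loaded with it) and look up `name` in it.
 * Returns NULL after reporting the error if either step fails.
 * On success the handle is deliberately left open: dlclose() could unmap
 * the code that the returned pointer refers to.
 */
static event_fn
lookup_event_fn(const char *path, const char *name)
{
	void *handle, *sym;

	handle = dlopen(path, RTLD_NOW);
	if (handle == NULL) {
		fprintf(stderr, "dlopen: %s\n", dlerror());
		return NULL;
	}

	sym = dlsym(handle, name);
	if (sym == NULL) {
		fprintf(stderr, "dlsym: %s\n", dlerror());
		dlclose(handle);
		return NULL;
	}

	/* POSIX expects this object-to-function pointer conversion to work. */
	return (event_fn)sym;
}
```

The executable never names the plugin's symbols directly, so nothing is left unresolved at link time; the only coupling is the symbol name passed as a string. On systems where the dl functions live in a separate library, the program may need to be linked with -ldl.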

cf. the dlsym(3) manpage for details and other possibilities.


 Side note 2:
 I see gdb crash quite often on subsequent run of app being
 debugged. I've noticed this on a bunch of snapshots dating
 some months back. Never reported it cause, I don't have a
 simple test-case to submit, and things have been moving
 fast with what you guys are doing.

gdb has had many bugs in the past.  Many are fixed in versions newer
than the version shipped with OpenBSD, but licensing keeps us
from including them in general.  :-/  Anyone care to hack on gdb (no
copying GPLv3 source!)?


Philip Guenther



Re: panic: __mp_lock_held(sched_lock) 2013-DEC-27 snapshot

2014-01-28 Thread patrick keshishian
On Tue, Jan 28, 2014 at 09:02:31PM -0800, Philip Guenther wrote:
 On Tue, Jan 28, 2014 at 4:55 PM, patrick keshishian sids...@boxsoft.com 
 wrote:
  Not sure if anyone is interested in this panic as the
  snapshot is a bit old.
 
  Happened while gdb-ing a process.
 
  (typing off images; expect typos)
 
  login: panic: kernel diagnostic assertion __mp_lock_held(sched_lock) == 
  0 failed: file ../../../../lock.c, line 126
 
 That's the bug I was working on at the start of the n2k14 hackathon.
 I've managed to simplify some of the bits underneath it, so maybe I
 can get the intermediate problem (recursive tsleep() usage)
 resolved...

Thanks.

  Side note 1:
  Incidentally, I couldn't figure out how to correctly leave
  a symbol unresolved while compiling myapp, until dlopen()
  time.
 
 If you can include the shared-object/library in the link command line,
 then the symbol can be referenced at link time.  That lets you refer
 to the symbols directly as if they were in the executable itself,
 though you must still declare them for the compiler to know what type
 they are, etc.

A bit of a chicken-and-egg situation for this case. Unless
I do some ugly hopping around in Makefiles, it would be
difficult to include the object on the link line for the
main program.

 If you need to delay the loading of the shared-object until after
 process startup via dlopen(), then the executable should use dlsym()
 to get the symbol address, passing it the handle that dlopen()
 returned.

Yes. I do that right now with a bit of dance. However,
reading a bit about __attribute__((weak)) on-line, I
thought, there may be an easier way to achieve this. But,
I quite possibly misunderstood the use of that attribute.

Thanks again,
--patrick





Re: panic: __mp_lock_held(sched_lock) 2013-DEC-27 snapshot

2014-01-28 Thread Philip Guenther
On Tue, Jan 28, 2014 at 10:31 PM, patrick keshishian
sids...@boxsoft.com wrote:
 On Tue, Jan 28, 2014 at 09:02:31PM -0800, Philip Guenther wrote:
..
 If you need to delay the loading of the shared-object until after
 process startup via dlopen(), then the executable should use dlsym()
 to get the symbol address, passing it the handle that dlopen()
 returned.

 Yes. I do that right now with a bit of dance. However,
 reading a bit about __attribute__((weak)) on-line, I
 thought, there may be an easier way to achieve this. But,
 I quite possibly misunderstood the use of that attribute.

Undefined weak symbols are useful for testing for the presence of
optional functionality loaded at the same time as the code with the
reference to the weak symbol.  If the code that may contain the symbol
will be loaded later, using a weak symbol reference is fragile, as it
is dependent on the order of operations and reputedly can conflict
with compiler optimizations on some platforms.  Even Sun, who were
quite in love with their dynamic linker, recommended against doing
that (cf. the Sun Linker and Libraries Guide).
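
A minimal sketch of the "test for optional functionality" use of an undefined weak symbol, assuming a GCC- or Clang-style compiler on an ELF system. The names optional_feature and call_optional_feature are hypothetical and do not come from myapp.

```c
#include <stdio.h>

/*
 * Weak declaration of a hypothetical optional function. Nothing in this
 * sketch defines it, so at load time its address resolves to NULL
 * instead of causing an unresolved-symbol error.
 */
extern void optional_feature(void) __attribute__((weak));

/* Returns 1 if the optional function was present and called, else 0. */
static int
call_optional_feature(void)
{
	if (optional_feature != NULL) {
		optional_feature();
		return 1;
	}
	return 0;
}
```

The address tested here is typically fixed when the referencing code itself is loaded, so a library brought in later with dlopen() would not normally change the result. That is the ordering dependency described above, and why dlsym() is the safer route for late-loaded code.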


Philip Guenther



Re: panic: __mp_lock_held(sched_lock) 2013-DEC-27 snapshot

2014-01-28 Thread Jérémie Courrèges-Anglas
patrick keshishian sids...@boxsoft.com writes:

 On Tue, Jan 28, 2014 at 09:02:31PM -0800, Philip Guenther wrote:
 On Tue, Jan 28, 2014 at 4:55 PM, patrick keshishian sids...@boxsoft.com 
 wrote:
  Not sure if anyone is interested in this panic as the
  snapshot is a bit old.
 
  Happened while gdb-ing a process.
 
  (typing off images; expect typos)
 
  login: panic: kernel diagnostic assertion __mp_lock_held(sched_lock) == 
  0 failed: file ../../../../lock.c, line 126
 
 That's the bug I was working on at the start of the n2k14 hackathon.
 I've managed to simplify some of the bits underneath it, so maybe I
 can get the intermediate problem (recursive tsleep() usage)
 resolved...

 Thanks.

  Side note 1:
  Incidentally, I couldn't figure out how to correctly leave
  a symbol unresolved while compiling myapp, until dlopen()
  time.
 
 If you can include the shared-object/library in the link command line,
 then the symbol can be referenced at link time.  That lets you refer
 to the symbols directly as if they were in the executable itself,
 though you must still declare them for the compiler to know what type
 they are, etc.

 A bit of a chicken-and-egg situation for this case. Unless
 I do some ugly hopping around in Makefiles, it would be
 difficult to include the object on the link line for the
 main program.

 If you need to delay the loading of the shared-object until after
 process startup via dlopen(), then the executable should use dlsym()
 to get the symbol address, passing it the handle that dlopen()
 returned.

 Yes. I do that right now with a bit of dance. However,
 reading a bit about __attribute__((weak)) on-line, I
 thought, there may be an easier way to achieve this. But,
 I quite possibly misunderstood the use of that attribute.

It sounds like you're not using dlsym as intended, then.  If you used:

  void (*my_function)(void);

  /* dlsym() takes the symbol name as a string */
  my_function = (void (*)(void))dlsym(handle, "target_function");
  if (my_function == NULL)
          oops();   /* assumed not to return */
  (*my_function)();

then your program wouldn't need to reference target_function at all.

[...]

-- 
jca | PGP: 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE
(previous: 0x06A11494 / 61DB D9A0 00A4 67CF 2A90  8961 6191 8FBF 06A1 1494)