panic: __mp_lock_held(sched_lock) 2013-DEC-27 snapshot
Hi,

Not sure if anyone is interested in this panic, as the snapshot is a bit old. Happened while gdb-ing a process. (typing off images; expect typos)

login: panic: kernel diagnostic assertion __mp_lock_held(sched_lock) == 0 failed: file ../../../../lock.c, line 126
Stopped at Debugger+0x5: leave
RUN AT LEAST ...
ddb{0} trace
Debugger() at Debugger+0x5
panic() at panic+0xee
__assert() at __assert+0x21
_kernel_lock_init() at _kernel_lock_init
issignal() at issignal+0x205
sleep_setup_signal() at sleep_setup_signal+0x94
tsleep() at tsleep+0x92
thrsleep() at thrsleep+0x1b5
sys___thrsleep() at sys___thrsleep+0x6f
syscall() at syscall+0x249
--- syscall (number 94) ---
end of kernel
end trace frame: 0x1d47e3209d80, count: -10
0x1d47e4eba95a:
ddb{0}

Now, the ps has a lot of output: a lot of firefox and gimp, and... I'm happy to email photos to anyone interested. But I think these are the interesting parts:

ddb{0} ps
   PID   PPID   PGRP    UID  S       FLAGS  WAIT      COMMAND
  2944  15477   2944   1000  3        0x83  wait      gdb
 14290   2944  30442   1000  2   0x4100603            myapp
  5416   2944  30442   1000  2   0x4100603            myapp
  7295   2944  30442   1000  2   0x4100603            myapp
 17989   2944  30442   1000  2   0x4100603            myapp
 21860   2944  30442   1000  2   0x4100603            myapp
 25934   2944  30442   1000  2   0x4100603            myapp
 32131   2944  30442   1000  2   0x4100603            myapp
  4616   2944  30442   1000  2   0x4100603            myapp
 17063   2944  30442   1000  2   0xc100603            myapp
*26101   2944  30442   1000  7   0xc100683  thrsleep  myapp
 30442   2944  30442   1000  3   0x8000683  poll      myapp
  3142   9337   3142   1000  4   0x8000403            vi
 11005   8117  27631   1000  3   0x4100082  poll      firefox
... lots of firefox threads ...
... then other stuff ...

After typing all that: *ga!* This shit is in the dmesg :P

myapp is what was being debugged. The app dlopen()'s a .so object, looks for an object holding pointers to functions, and executes those functions upon events. I had set a breakpoint in one of those functions, or rather I thought I did ... then boom!

Interesting?

Side note 1: Incidentally, I couldn't figure out how to correctly leave a symbol unresolved while compiling myapp, until dlopen() time. If anyone cares to give me a pointer (ha!) off-list, I'd much appreciate it. I don't know if this is the cause of the problem, but I'd have thunk an error like this would just crash the app, not the system. The loaded functions seem to execute just fine, though; I see log statements from them.

Side note 2: I see gdb crash quite often on a subsequent run of the app being debugged. I've noticed this on a bunch of snapshots dating some months back. Never reported it 'cause I don't have a simple test-case to submit, and things have been moving fast with what you guys are doing.

$ gdb someapp
(gdb) break paris
(gdb) r args
[then break point]
[examine ...]
(gdb) p inside_wine_glass
[... then let it run to end]
(gdb) c
[start again without quitting gdb!]
(gdb) r args
[then break point]
[examine ...]
(gdb) p wine_bottle
gdb coredumps!

Cheers,
--patrick

sd2: 15322MB, 512 bytes/sector, 31379456 sectors
sd2 detached
scsibus2 detached
umass1 detached
panic: kernel diagnostic assertion __mp_lock_held(sched_lock) == 0 failed: file ../../../../kern/kern_lock.c, line 126
Stopped at Debugger+0x5: leave
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
IF RUNNING SMP, USE 'mach ddbcpu #' AND 'trace' ON OTHER PROCESSORS, TOO.
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb{0} Debugger() at Debugger+0x5
panic() at panic+0xee
__assert() at __assert+0x21
_kernel_lock_init() at _kernel_lock_init
issignal() at issignal+0x205
sleep_setup_signal() at sleep_setup_signal+0x94
tsleep() at tsleep+0x92
thrsleep() at thrsleep+0x1b5
sys___thrsleep() at sys___thrsleep+0x6f
syscall() at syscall+0x249
--- syscall (number 94) ---
end of kernel
end trace frame: 0x1d47e3209d80, count: -10
0x1d47e4eba95a:
ddb{0}
   PID   PPID   PGRP    UID  S       FLAGS  WAIT      COMMAND
  2944  15477   2944   1000  3        0x83  wait      gdb
 14290   2944  30442   1000  2   0x4100603            myapp
  5416   2944  30442   1000  2   0x4100603            myapp
  7295   2944  30442   1000  2   0x4100603            myapp
 17989   2944  30442   1000  2   0x4100603            myapp
 21860   2944  30442   1000  2   0x4100603            myapp
 25934   2944  30442   1000  2   0x4100603            myapp
 32131   2944

Re: panic: __mp_lock_held(sched_lock) 2013-DEC-27 snapshot
On Tue, Jan 28, 2014 at 4:55 PM, patrick keshishian sids...@boxsoft.com wrote:
> Not sure if anyone is interested in this panic as the snapshot is a bit old. Happened while gdb-ing a process. (typing off images; expect typos)
>
> login: panic: kernel diagnostic assertion __mp_lock_held(sched_lock) == 0 failed: file ../../../../lock.c, line 126

That's the bug I was working on at the start of the n2k14 hackathon. I've managed to simplify some of the bits underneath it, so maybe I can get the intermediate problem (recursive tsleep() usage) resolved...

> Side note 1: Incidentally, I couldn't figure out how to correctly leave a symbol unresolved while compiling myapp, until dlopen() time.

If you can include the shared-object/library in the link command line, then the symbol can be referenced at link time. That lets you refer to the symbols directly as if they were in the executable itself, though you must still declare them for the compiler to know what type they are, etc.

If you need to delay the loading of the shared-object until after process startup via dlopen(), then the executable should use dlsym() to get the symbol address, passing it the handle that dlopen() returned. c.f. the dlsym(3) manpage for details and other possibilities.

> Side note 2: I see gdb crash quite often on subsequent run of app being debugged. I've noticed this on a bunch of snapshots dating some months back. Never reported it cause, I don't have a simple test-case to submit, and things have been moving fast with what you guys are doing.

gdb has had many bugs in the past. Many are fixed in versions newer than the version that is shipped with OpenBSD, but licensing keeps us from including them in general. :-/

Anyone care to hack on gdb (no copying GPLv3 source!)?

Philip Guenther

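The dlopen()/dlsym() route described above can be sketched roughly as follows. This is only an illustration, not code from myapp: call_by_name and length_fn are made-up names, and the example looks up libc's strlen through the program's own handle (dlopen(NULL, ...)) so that it needs no separate plugin. A real program would presumably pass the path of its shared object and the name of the symbol that object exports.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Type of the function we expect to find; here it matches strlen(3). */
typedef size_t (*length_fn)(const char *);

/*
 * Hypothetical helper: resolve `symbol` at run time and call it on `text`.
 * `library` is a path for dlopen(); NULL means the running program itself
 * and the libraries it was started with.
 * Returns the function's result, or -1 if the handle or symbol is missing.
 */
long call_by_name(const char *library, const char *symbol, const char *text)
{
    void *handle = dlopen(library, RTLD_NOW);
    if (handle == NULL)
        return -1;

    /* POSIX permits converting dlsym()'s void * to a function pointer. */
    length_fn fn = (length_fn)dlsym(handle, symbol);
    long result = (fn != NULL) ? (long)fn(text) : -1;

    dlclose(handle);
    return result;
}
```

Because the symbol name is only a string, the executable never references the plugin's symbol at link time, so nothing has to be left unresolved. On some systems the program may need to be linked with -ldl.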
Re: panic: __mp_lock_held(sched_lock) 2013-DEC-27 snapshot
On Tue, Jan 28, 2014 at 09:02:31PM -0800, Philip Guenther wrote:
> On Tue, Jan 28, 2014 at 4:55 PM, patrick keshishian sids...@boxsoft.com wrote:
> > Not sure if anyone is interested in this panic as the snapshot is a bit old. Happened while gdb-ing a process. (typing off images; expect typos)
> >
> > login: panic: kernel diagnostic assertion __mp_lock_held(sched_lock) == 0 failed: file ../../../../lock.c, line 126
>
> That's the bug I was working on at the start of the n2k14 hackathon. I've managed to simplify some of the bits underneath it, so maybe I can get the intermediate problem (recursive tsleep() usage) resolved...

Thanks.

> > Side note 1: Incidentally, I couldn't figure out how to correctly leave a symbol unresolved while compiling myapp, until dlopen() time.
>
> If you can include the shared-object/library in the link command line, then the symbol can be referenced at link time. That lets you refer to the symbols directly as if they were in the executable itself, though you must still declare them for the compiler to know what type they are, etc.

A bit of a chicken-and-egg situation for this case. Unless I do some ugly hopping around in Makefiles, it would be difficult to include the object on the link line for the main program.

> If you need to delay the loading of the shared-object until after process startup via dlopen(), then the executable should use dlsym() to get the symbol address, passing it the handle that dlopen() returned.

Yes. I do that right now with a bit of dance. However, reading a bit about __attribute__((weak)) on-line, I thought, there may be an easier way to achieve this. But, I quite possibly misunderstood the use of that attribute.

Thanks again,
--patrick

> c.f. the dlsym(3) manpage for details and other possibilities.
>
> > Side note 2: I see gdb crash quite often on subsequent run of app being debugged. I've noticed this on a bunch of snapshots dating some months back. Never reported it cause, I don't have a simple test-case to submit, and things have been moving fast with what you guys are doing.
>
> gdb has had many bugs in the past. Many are fixed in versions newer than the version that is shipped with OpenBSD, but licensing keeps us from including them in general. :-/
>
> Anyone care to hack on gdb (no copying GPLv3 source!)?
>
> Philip Guenther

Re: panic: __mp_lock_held(sched_lock) 2013-DEC-27 snapshot
On Tue, Jan 28, 2014 at 10:31 PM, patrick keshishian sids...@boxsoft.com wrote:
> On Tue, Jan 28, 2014 at 09:02:31PM -0800, Philip Guenther wrote:
..
> > If you need to delay the loading of the shared-object until after process startup via dlopen(), then the executable should use dlsym() to get the symbol address, passing it the handle that dlopen() returned.
>
> Yes. I do that right now with a bit of dance. However, reading a bit about __attribute__((weak)) on-line, I thought, there may be an easier way to achieve this. But, I quite possibly misunderstood the use of that attribute.

Undefined weak symbols are useful for testing for the presence of optional functionality loaded at the same time as the code with the reference to the weak symbol. If the code that may contain the symbol will be loaded later, using a weak symbol reference is fragile, as it is dependent on the order of operations and reputedly can conflict with compiler optimizations on some platforms. Even Sun, who were quite in love with their dynamic linker, recommended against doing that (c.f. the Sun Linker and Libraries Guide).

Philip Guenther

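As a rough illustration of the "optional functionality loaded at the same time" case, the sketch below declares an undefined weak function and tests its address before calling it. It assumes GCC or Clang on an ELF platform; optional_hook and run_optional_hook are hypothetical names, not anything from myapp.

```c
#include <stddef.h>

/*
 * Hypothetical optional hook. The weak attribute lets the program link and
 * start even when no object loaded with it defines the symbol; in that case
 * its address compares equal to NULL.
 */
extern void optional_hook(void) __attribute__((weak));

/* Call the hook if it was present at load time; report whether it ran. */
int run_optional_hook(void)
{
    if (optional_hook != NULL) {
        optional_hook();
        return 1;
    }
    return 0;
}
```

The check generally reflects what was resolved when the program itself was loaded, which is why this pattern is a poor fit for symbols that only arrive later through dlopen(); for that case dlsym() is the more dependable route.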
Re: panic: __mp_lock_held(sched_lock) 2013-DEC-27 snapshot
patrick keshishian sids...@boxsoft.com writes:
> On Tue, Jan 28, 2014 at 09:02:31PM -0800, Philip Guenther wrote:
> > On Tue, Jan 28, 2014 at 4:55 PM, patrick keshishian sids...@boxsoft.com wrote:
> > > Not sure if anyone is interested in this panic as the snapshot is a bit old. Happened while gdb-ing a process. (typing off images; expect typos)
> > >
> > > login: panic: kernel diagnostic assertion __mp_lock_held(sched_lock) == 0 failed: file ../../../../lock.c, line 126
> >
> > That's the bug I was working on at the start of the n2k14 hackathon. I've managed to simplify some of the bits underneath it, so maybe I can get the intermediate problem (recursive tsleep() usage) resolved...
>
> Thanks.
>
> > > Side note 1: Incidentally, I couldn't figure out how to correctly leave a symbol unresolved while compiling myapp, until dlopen() time.
> >
> > If you can include the shared-object/library in the link command line, then the symbol can be referenced at link time. That lets you refer to the symbols directly as if they were in the executable itself, though you must still declare them for the compiler to know what type they are, etc.
>
> A bit of a chicken-and-egg situation for this case. Unless I do some ugly hopping around in Makefiles, it would be difficult to include the object on the link line for the main program.
>
> > If you need to delay the loading of the shared-object until after process startup via dlopen(), then the executable should use dlsym() to get the symbol address, passing it the handle that dlopen() returned.
>
> Yes. I do that right now with a bit of dance. However, reading a bit about __attribute__((weak)) on-line, I thought, there may be an easier way to achieve this. But, I quite possibly misunderstood the use of that attribute.

It sounds like you're not using dlsym as intended, then. If you used:

    void (*my_function)(void);

    my_function = dlsym(handle, "target_function");
    if (my_function == NULL)
        oops();
    (*my_function)();

then your program wouldn't need to reference target_function at all, because dlsym() takes the symbol name as a string.

[...]

-- jca | PGP: 0x1524E7EE / 5135 92C1 AD36 5293 2BDF DDCC 0DFA 74AE 1524 E7EE (previous: 0x06A11494 / 61DB D9A0 00A4 67CF 2A90 8961 6191 8FBF 06A1 1494)