On Fri, May 22, 2009 at 9:29 PM, Mike Gerdts <mger...@gmail.com> wrote: > On Fri, May 22, 2009 at 6:52 PM, Jason King <ja...@ansipunx.net> wrote: >> On Fri, May 22, 2009 at 5:46 PM, Mike Gerdts <mger...@gmail.com> wrote: >>> I've encountered something that confuses me in a core file because dbx >>> and mdb seem to be telling me different things. According to dbx: >>> >>> $ dbx /.../httpd core >>> ... >>> t...@3 (l...@3) terminated by signal ILL (Illegal Instruction) >>> 0xff046df0: __lwp_park+0x0014: bcc,a,pt %icc,__lwp_park+0x24 ! 0xff046e00 >>> ... >>> (dbx) dis __lwp_park >>> 0xff046ddc: __lwp_park : mov %o1, %o2 >>> 0xff046de0: __lwp_park+0x0004: mov %o0, %o1 >>> 0xff046de4: __lwp_park+0x0008: clr %o0 >>> 0xff046de8: __lwp_park+0x000c: mov 77, %g1 >>> 0xff046dec: __lwp_park+0x0010: ta 8 >>> 0xff046df0: __lwp_park+0x0014: bcc,a,pt %icc,__lwp_park+0x24 ! 0xff046e00 >>> 0xff046df4: __lwp_park+0x0018: clr %o0 >>> 0xff046df8: __lwp_park+0x001c: cmp %o0, 91 >>> 0xff046dfc: __lwp_park+0x0020: move %icc,0x4, %o0 >>> 0xff046e00: __lwp_park+0x0024: retl >>> >>> Notice that there is no indication that __lwp_park_0x14 has anything >>> wrong with it. This appears to be hand-coded assembly that hasn't >>> changed for a long time. >>> >>> However, when I look at it with mdb, I see: >>> >>> # mdb core >>> Loading modules: [ libc.so.1 libuutil.so.1 ld.so.1 ] >>>> ::stack >>> libc.so.1`__lwp_park+0x14(0, 174058, 0, 0, 7d89c, 0) >>> libc.so.1`cond_wait_queue+0x4c(174098, 174058, 0, 0, 1c00, 0) >>> libc.so.1`cond_wait+0x10(174098, 174058, 0, 1c00, 0, 1741b8) >>> libc.so.1`pthread_cond_wait+8(174098, 174058, 0, 0, 174058, ff0402a0) >>> libapr-0.so.0.9.4`apr_thread_cond_wait+0x44(174090, 174050, 2aa688, >>> 240, 2aa780, 34e820) >>> ap_queue_pop+0x78(174038, fe2fbf1c, fe2fbf10, 0, 34e820, ff0c5480) >>> worker_thread+0x160(1742f8, 1a4858, 0, 0, feb40a00, 1) >>> libapr-0.so.0.9.4`dummy_worker+0x48(1742f8, fe2fc000, 0, 0, ff2d0a58, 1) >>> libc.so.1`_lwp_start(0, 0, 0, 0, 0, 0) >>>> ::status >>> debugging core file of httpd (32-bit) from XXX >>> file: /.../httpd >>> initial argv: /.../httpd ... >>> threading model: multi-threaded >>> status: process terminated by SIGILL (Illegal Instruction) >>>> libc.so.1`__lwp_park::dis >>> libc.so.1`__lwp_park: mov %o1, %o2 >>> libc.so.1`__lwp_park+4: mov %o0, %o1 >>> libc.so.1`__lwp_park+8: clr %o0 >>> libc.so.1`__lwp_park+0xc: mov 0x4d, %g1 >>> libc.so.1`__lwp_park+0x10: ta 0x8 >>> libc.so.1`__lwp_park+0x14: 0x3a480004 <<< Notice that this >>> instruction is not decoded! >>> libc.so.1`__lwp_park+0x18: clr %o0 >>> libc.so.1`__lwp_park+0x1c: cmp %o0, 0x5b >>> libc.so.1`__lwp_park+0x20: 0x91646004 >>> libc.so.1`__lwp_park+0x24: retl >>> libc.so.1`__lwp_park+0x28: nop >>> >>> According to mdb, it looks to me like someone clobbered __lwp+0x14. >>> However, pmap suggests this shouldn't be possible because the memory >>> segment is readable and executable but not writable. >>> >>> $ pmap core >>> .... >>> FEF80000 1152K r-x-- /lib/libc.so.1 >>> ... >>> >>> >From the dbx output in the beginning of this message, we could see >>> that __lwp_park starts at address 0xff046df0, which is between >>> FEF80000 and FF0A0000 (FF0A0000 = FEF80000 + 1152k). >>> >>> Why do mdb and dbx disagree on the contents of __lwp_park+0x14 (and >>> +0x20)? Is mdb reading the core and dbx reading the executable and >>> libs from the file system? Does modification to a read-only memory >>> segment suggest a hardware error? >> >> That is the behavior when an invalid instruction (for the given >> disassembly mode) is given (which should match the old disassembler >> behavior). 0x3a480004 is the value for 'bcc,a,pt +0x10' >> (__lwp_park+0x24), but that is a sparcv9 instruction, not a v8, and >> the disassembler is very strict in how it decodes instructions :) >> >> Try ::dismode v9 and doing it again. It should agree -- it tested >> using /usr/bin/dis on libc.so on a recent box (mdb use the same code >> to disassemble, however I do not know what dbx is using). >> > > That did the trick. > > However, I'm extremely confused as to why the process died with SIGILL > when trying to execute this instruction.
It does sound interesting -- I'm not super familiar with the chips, but the only way I could see it happening is if there was some strict v8 mode on the chip (or worse, a hw bug!). But AFAIK, all recent sparc chips (at least from Sun or Fujitsu) even 32bit mode supports v8+ (thus should be valid). Can you reproduce the crash consistently (not that I can help much, but I'm curious myself)? _______________________________________________ tools-discuss mailing list tools-discuss@opensolaris.org