On Wed, 26 Dec 2007, Vamsee Priya wrote:

> Hi
> Apart from the scenario I explained below, I also get a SIGABRT with the
> following stack trace

This is libumem catching a memory error (either a double free or a heap 
overrun). On this coredump, what's the output of "::umem_status" when you 
load it in mdb?
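
To illustrate the kind of check that ends in an abort like the one in your
stack trace, here is a rough sketch in C. It is not libumem's actual
implementation; the names (dbg_malloc, dbg_free, the DBG_* codes) and the
guard patterns are made up for illustration. The idea is that each block
carries a small header and a "redzone" of known bytes after the user data,
and both are verified at free time.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical guard values, chosen only for this sketch. */
#define REDZONE_BYTE 0xbb
#define REDZONE_LEN  8
#define STATE_LIVE   0xa110c8edu
#define STATE_FREED  0xdeadbeefu

/* Bookkeeping placed immediately before the pointer handed to the caller. */
typedef struct dbg_hdr {
    size_t   size;   /* bytes requested by the caller */
    unsigned state;  /* STATE_LIVE or STATE_FREED */
} dbg_hdr_t;

typedef enum { DBG_OK, DBG_DOUBLE_FREE, DBG_OVERRUN } dbg_result_t;

/* Allocate size bytes, followed by a redzone filled with REDZONE_BYTE. */
void *dbg_malloc(size_t size)
{
    dbg_hdr_t *h = malloc(sizeof (*h) + size + REDZONE_LEN);
    unsigned char *user;

    if (h == NULL)
        return NULL;
    h->size = size;
    h->state = STATE_LIVE;
    user = (unsigned char *)(h + 1);
    memset(user + size, REDZONE_BYTE, REDZONE_LEN);
    return user;
}

/*
 * Verify the block instead of trusting the caller. A debugging allocator
 * would typically report the error and abort at this point; the sketch
 * returns a code so the two failure modes can be told apart.
 */
dbg_result_t dbg_free(void *ptr)
{
    dbg_hdr_t *h;
    unsigned char *user = ptr;
    size_t i;

    assert(ptr != NULL);
    h = (dbg_hdr_t *)ptr - 1;

    if (h->state == STATE_FREED)
        return DBG_DOUBLE_FREE;          /* block was already freed */

    for (i = 0; i < REDZONE_LEN; i++) {
        if (user[h->size + i] != REDZONE_BYTE)
            return DBG_OVERRUN;          /* caller wrote past the end */
    }

    /*
     * Mark the block as freed but deliberately keep the memory, so that
     * a second dbg_free() can still read the header safely. This leaks
     * by design and is only acceptable in a sketch.
     */
    h->state = STATE_FREED;
    return DBG_OK;
}
```

With this sketch, writing 17 bytes into a 16-byte block from dbg_malloc()
and then calling dbg_free() returns DBG_OVERRUN, and calling dbg_free()
twice on the same pointer returns DBG_DOUBLE_FREE the second time. Either
kind of corruption is only noticed at free time, which is why the abort
shows up under free() rather than at the place where the damage was done.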

FrankH.

>
> libc.so.1`_lwp_kill+0x15(1, 6)
> libc.so.1`raise+0x1f(6)
> libumem.so.1`umem_do_abort+0x25(9, fefb5000, 804691c, fef98b1c,
> fefa3ae8, 80aa810)
> libumem.so.1`umem_err_recoverable+0x46(fefa3ae8)
> libumem.so.1`umem_error+0x453(1, 80aa810, 80c7c68)
> libumem.so.1`umem_free+0xf6(80c7c68, 50)
> libumem.so.1`process_free+0xfd(80c7c70, 1, 0, 80469a8, 805890b, 80c7c70)
> libumem.so.1`free+0x14(80c7c70, 80c1b48, 0, 80c1b60)
> meta_free+0xbf(80469f0, 80c1b88, 1, 80a0bd0, 0, 0)
> active_out+0x44e(6, 8047e1f, 0, 65)
> active+0xe0(2, 8047e1f, 0, 0, 8046b57)
> main+0xd59(6, 8047d0c, 8047d28)
> _start+0x80(6, 8047dd8, 8047df7, 8047dfa, 8047e0c, 8047e1f)
>
> Please suggest what can be done to overcome these issues.
>
> Thanks
> Priya
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Vamsee
> Priya
> Sent: Wednesday, December 26, 2007 2:48 PM
> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Cc: [email protected]
> Subject: Re: [osol-discuss] SIGSEGV in libc.so.1`_malloc_unlocked
> onSolarisx86 machine
>
> Hi All,
>
> Thanks a lot for the responses. I used libumem to find out where the
> error occurred. But after I set the variables LD_PRELOAD and UMEM_DEBUG,
> I found that sometimes the SIGSEGV was gone.
>
> But this process runs on two machines simultaneously and these two
> machines communicate about the progress of each process. When SIGSEGV is
> gone (on the machine where it occurs), I find that other machine gets a
> SIGABRT signal and the generated core dump shows the following info
> when I use mdb to see what's happening. (I have set the variables
> LD_PRELOAD and UMEM_DEBUG on the machine where I get the following
> core.)
>
> mdb core
> mdb: core file data for mapping at fedd0000 not saved: Interrupted
> system call
> mdb: core file data for mapping at fede0000 not saved: Interrupted
> system call
> mdb: core file data for mapping at fedf0000 not saved: Interrupted
> system call
> mdb: core file data for mapping at fee01000 not saved: Interrupted
> system call
> mdb: core file data for mapping at fee10000 not saved: Interrupted
> system call
> mdb: core file data for mapping at fee20000 not saved: Interrupted
> system call
> mdb: core file data for mapping at feea0000 not saved: Interrupted
> system call
> mdb: core file data for mapping at feea5000 not saved: Interrupted
> system call
> mdb: core file data for mapping at feeb0000 not saved: Interrupted
> system call
> mdb: core file data for mapping at fef76000 not saved: Interrupted
> system call
> mdb: core file data for mapping at fef7c000 not saved: Interrupted
> system call
> mdb: core file data for mapping at fef80000 not saved: Interrupted
> system call
> mdb: core file data for mapping at fef90000 not saved: Interrupted
> system call
> mdb: core file data for mapping at fefb5000 not saved: Interrupted
> system call
> mdb: core file data for mapping at fefba000 not saved: Interrupted
> system call
> mdb: core file data for mapping at fefd0000 not saved: Interrupted
> system call
> mdb: core file data for mapping at fefda000 not saved: Interrupted
> system call
> mdb: core file data for mapping at feffa000 not saved: Interrupted
> system call
> mdb: core file data for mapping at feffb000 not saved: Interrupted
> system call
> mdb: warning: librtld_db failed to initialize; shared library
> information will not be available
> Loading modules: [ ld.so.1 ]
>> ::umem_status
> mdb: invalid command '::umem_status': unknown dcmd name
>
> I am not sure what can be done further. If I do not set
> LD_PRELOAD and UMEM_DEBUG on the machine which has the above core, I
> find a SIGSEGV similar to the one I reported in my first mail (i.e. in
> the _malloc_unlocked() function).
>
> Could you please suggest how I can proceed further?
>
> Thanks
> Priya
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: Monday, December 24, 2007 6:21 PM
> To: [EMAIL PROTECTED]
> Cc: Vamsee Priya; [email protected]
> Subject: Re: [osol-discuss] SIGSEGV in libc.so.1`_malloc_unlocked on
> Solarisx86 machine
>
> On Mon, 24 Dec 2007, [EMAIL PROTECTED] wrote:
>
>>
>>> Hi
>>> I don't find a core dump generated when a SIGSEGV is received. I set the
>>> LD_PRELOAD variable to watchmalloc.so.1 but could not find the actual
>>> place of the seg. fault as the core dump file is not generated. (I got the
>>> stack trace I pasted when I attached mdb to the process.) I don't have a
>>> Sun Studio compiler to run dbx.
>>> Are there any more tools with which I can debug further?
>>
>> You can use "coreadm" to redirect the core someplace.
>>
>> Does your program call "chdir()"?  If so, the core dump will be elsewhere.
>>
>> Note that with watchmalloc.so.1 you will also need to set some other
>> variables.
>
> ... which are, like all good Solaris features, documented in the
> manpages,
> watchmalloc(3MALLOC) in that case :)
>
> watchmalloc and libumem are somewhat complementary; some problems are
> easier to track with one, some with the other.
>
> Merry Christmas,
> FrankH.
>
>
> _______________________________________________
> opensolaris-discuss mailing list
> [email protected]
>
>
>
