On Wed, 26 Dec 2007, Vamsee Priya wrote:
> Hi
> Apart from the scenario I explained below, I also get a SIGABRT with the
> following stack trace

This is libumem catching a memory error (either a double free or a heap
overrun).
On this coredump, what's the output of "::umem_status" when you load it in
mdb?

FrankH.

>
> libc.so.1`_lwp_kill+0x15(1, 6)
> libc.so.1`raise+0x1f(6)
> libumem.so.1`umem_do_abort+0x25(9, fefb5000, 804691c, fef98b1c, fefa3ae8, 80aa810)
> libumem.so.1`umem_err_recoverable+0x46(fefa3ae8)
> libumem.so.1`umem_error+0x453(1, 80aa810, 80c7c68)
> libumem.so.1`umem_free+0xf6(80c7c68, 50)
> libumem.so.1`process_free+0xfd(80c7c70, 1, 0, 80469a8, 805890b, 80c7c70)
> libumem.so.1`free+0x14(80c7c70, 80c1b48, 0, 80c1b60)
> meta_free+0xbf(80469f0, 80c1b88, 1, 80a0bd0, 0, 0)
> active_out+0x44e(6, 8047e1f, 0, 65)
> active+0xe0(2, 8047e1f, 0, 0, 8046b57)
> main+0xd59(6, 8047d0c, 8047d28)
> _start+0x80(6, 8047dd8, 8047df7, 8047dfa, 8047e0c, 8047e1f)
>
> Please suggest me as to what can be done to overcome these issues.
>
> Thanks
> Priya
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Vamsee Priya
> Sent: Wednesday, December 26, 2007 2:48 PM
> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Cc: [email protected]
> Subject: Re: [osol-discuss] SIGSEGV in libc.so.1`_malloc_unlocked on Solarisx86 machine
>
> Hi All,
>
> Thanks a lot for the responses. I used libumem to find out where the
> error occurred. But after I set the variables LD_PRELOAD and UMEM_DEBUG,
> I found that sometimes the SIGSEGV was gone!!!!....
>
> But this process runs on two machines simultaneously and these two
> machines communicate about the progress of each process.
> When SIGSEGV is gone (on the machine where it occurs), I find that other
> machine gets a SIGABRT signal and the generated core dump shows the
> following info when I use a mdb to see what's happening. (I have set the
> variables LD_PRELOAD and UMEM_DEBUG on this machine where I get the
> following core)
>
> mdb core
> mdb: core file data for mapping at fedd0000 not saved: Interrupted system call
> mdb: core file data for mapping at fede0000 not saved: Interrupted system call
> mdb: core file data for mapping at fedf0000 not saved: Interrupted system call
> mdb: core file data for mapping at fee01000 not saved: Interrupted system call
> mdb: core file data for mapping at fee10000 not saved: Interrupted system call
> mdb: core file data for mapping at fee20000 not saved: Interrupted system call
> mdb: core file data for mapping at feea0000 not saved: Interrupted system call
> mdb: core file data for mapping at feea5000 not saved: Interrupted system call
> mdb: core file data for mapping at feeb0000 not saved: Interrupted system call
> mdb: core file data for mapping at fef76000 not saved: Interrupted system call
> mdb: core file data for mapping at fef7c000 not saved: Interrupted system call
> mdb: core file data for mapping at fef80000 not saved: Interrupted system call
> mdb: core file data for mapping at fef90000 not saved: Interrupted system call
> mdb: core file data for mapping at fefb5000 not saved: Interrupted system call
> mdb: core file data for mapping at fefba000 not saved: Interrupted system call
> mdb: core file data for mapping at fefd0000 not saved: Interrupted system call
> mdb: core file data for mapping at fefda000 not saved: Interrupted system call
> mdb: core file data for mapping at feffa000 not saved: Interrupted system call
> mdb: core file data for mapping at feffb000 not saved: Interrupted system call
> mdb: warning: librtld_db failed to initialize; shared library information will not be available
> Loading modules: [ ld.so.1 ]
> > ::umem_status
> mdb: invalid command '::umem_status': unknown dcmd name
>
> I am not getting as to what can be done further. If I do not set the
> LD_PRELOAD and UMEM_DEBUG on the machine which has the above core, I
> find a SIGSEGV similar to the one I reported in my first mail (i.e. in
> the _malloc_unlocked() function).
>
> Please provide me with some inputs as to how can I proceed further?
>
> Thanks
> Priya
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: Monday, December 24, 2007 6:21 PM
> To: [EMAIL PROTECTED]
> Cc: Vamsee Priya; [email protected]
> Subject: Re: [osol-discuss] SIGSEGV in libc.so.1`_malloc_unlocked on Solarisx86 machine
>
> On Mon, 24 Dec 2007, [EMAIL PROTECTED] wrote:
>
>>
>>> Hi
>>> I don't find a core dump generated when a SIGSEGV is received. I set the
>>> LD_PRELOAD variable to watchmalloc.so.1 but could not find the actual
>>> place of seg. fault as the core dump file is not generated. (I got the
>>> stack trace I pasted when I attached mdb to the process) I don't have a
>>> Sun studio compiler to run dbx.
>>> Any more tools with which I can debug further?
>>
>> You can use "coreadm" to redirect the core someplace.
>>
>> Does your program call "chdir()"? If so, the core dump will be elsewhere.
>>
>> Note that with watchmalloc.so.1 you will also need to set some other
>> variables.
>
> ... which are, like all good Solaris features, documented in the manpages,
> watchmalloc(3MALLOC) in that case :)
>
> watchmalloc and libumem are somewhat complementary, some problems are
> easier to track with one some easier with the other.
>
> Merry christmas,
> FrankH.
>
>
> _______________________________________________
> opensolaris-discuss mailing list
> [email protected]
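To illustrate the two error classes named in the reply (a double free and a heap overrun), here is a minimal sketch in C of how a debugging allocator can notice them at free time. It is not libumem's implementation: the names dbg_alloc and dbg_free, the redzone length, and the marker values are all hypothetical, chosen only for illustration. The idea is to place a known byte pattern just past the user buffer and a state marker just before it, then check both when the buffer is released.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* All constants below are arbitrary illustrative values (assumptions). */
#define REDZONE_LEN  8
#define REDZONE_BYTE 0xbb
#define MAGIC_LIVE   0xfeedfaceu
#define MAGIC_FREED  0xdeadbeefu

enum dbg_result { DBG_OK = 0, DBG_OVERRUN = 1, DBG_DOUBLE_FREE = 2 };

/* Bookkeeping stored immediately before the user buffer. */
struct dbg_hdr {
    uint32_t magic;  /* MAGIC_LIVE while allocated, MAGIC_FREED afterwards */
    size_t   size;   /* size requested by the caller */
};

/* Allocate size bytes, followed by a redzone filled with REDZONE_BYTE. */
void *dbg_alloc(size_t size)
{
    struct dbg_hdr *h = malloc(sizeof(*h) + size + REDZONE_LEN);
    if (h == NULL)
        return NULL;
    h->magic = MAGIC_LIVE;
    h->size = size;
    unsigned char *user = (unsigned char *)(h + 1);
    memset(user + size, REDZONE_BYTE, REDZONE_LEN);
    return user;
}

/* Check the buffer on release and report what was found. */
enum dbg_result dbg_free(void *p)
{
    assert(p != NULL);
    struct dbg_hdr *h = (struct dbg_hdr *)p - 1;

    /* A header already marked as freed means this is a second free. */
    if (h->magic == MAGIC_FREED)
        return DBG_DOUBLE_FREE;

    /* Any changed redzone byte means the caller wrote past the end. */
    const unsigned char *rz = (const unsigned char *)p + h->size;
    enum dbg_result result = DBG_OK;
    for (size_t i = 0; i < REDZONE_LEN; i++) {
        if (rz[i] != REDZONE_BYTE) {
            result = DBG_OVERRUN;
            break;
        }
    }

    h->magic = MAGIC_FREED;
    /* The memory is deliberately not handed back to free(): keeping it
     * quarantined lets a later double free be detected without reading
     * memory that has been released. This sketch therefore leaks. */
    return result;
}
```

Calling dbg_free twice on the same pointer returns DBG_DOUBLE_FREE the second time, and writing even one byte past the requested size makes dbg_free return DBG_OVERRUN. A real debugging allocator reports the error and aborts instead of returning a code, which is consistent with the SIGABRT raised from umem_do_abort in the stack trace quoted above.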
