On Tue, Apr 27, 2010 at 12:40:41PM -0700, johan...@opensolaris.org wrote:
> On Tue, Apr 27, 2010 at 09:06:46PM +0200, Jens Elkner wrote:
> > t...@150 (l...@150) terminated by signal SEGV (no mapping at the fault address)
> > 0xff2570a8: t_splay+0x0010:     ld       [%o2 + 8], %o1
> > Current function is dec_argv
> >  1764       s = (char **)malloc((nelem + 1) * (sizeof *s));
> > current thread: t...@150
> >   [1] t_splay(0x85a04, 0x0, 0x1fffff, 0x85808, 0x0, 0xff337480), at 0xff2570a8
> >   [2] t_delete(0x85a04, 0x1fc, 0x1fffff, 0xff256f30, 0xff3303a8, 0x0), at 0xff256f30
> >   [3] realfree(0x85800, 0x1ff, 0xd98dc, 0x8b7a0, 0x0, 0x8a768), at 0xff256b44
> >   [4] cleanfree(0x0, 0xe, 0xd902c, 0x0, 0xff3303a8, 0xff3392a4), at 0xff2573cc
> >   [5] _malloc_unlocked(0x28, 0x0, 0x0, 0x0, 0xfffffffc, 0x0), at 0xff256524
> >   [6] malloc(0x24, 0x1, 0xd9fd8, 0x0, 0xff3303a8, 0xff33a518), at 0xff256414
> > ...
> > Is anybody able to spot, what's going wrong here?
>
> This looks like a classic case of heap corruption.  You've died in
> t_splay while trying to coalesce free blocks before performing an
> allocation.  Sometimes this happens when an object is double-free'd,
> when you free an object that wasn't allocated by the allocator, or other
> similar mistakes.

Direct hit! Found it in the milter code (i.e. not in libmilter, as I had
assumed): a struct was malloc'ed, but one char * member was overlooked when
initializing it (it was not set to NULL). So the chaos started in the
cleanup function of the milter:

    if (cf->helo)
        free(cf->helo);

> Can you run this application under libumem?  It has a bunch of debugging
> features that might help you out here.
>
> This is what I typically use when trying to debug heap corruption.  What
> follows is for a 32-bit application.  Omit the _32 if you're 64-bit.
>
> LD_PRELOAD_32=libumem.so
> UMEM_DEBUG='audit=50,guards,contents'
> UMEM_LOGGING='transaction,fail,contents'
>
> export LD_PRELOAD_32 UMEM_DEBUG UMEM_LOGGING
>
> HTH,

Yes - really, really cool stuff! dbx pointed immediately to the line shown
above. Everything else was a matter of 5 minutes. The fixed milter has now
been running on two machines for about 2.5 hours without any problem!

Thanks a lot (and thanks to everyone else for the hints, too)!

BTW: bcheck seems to work on that machine (I'm not sure what output one
should expect, but to me it looks OK), but AFAICS it didn't discover anything
suspicious:

Actual leaks report    (actual leaks:     0  total size:      0 bytes)

Possible leaks report  (possible leaks:   0  total size:      0 bytes)

Blocks in use report   (blocks in use:  127  total size:  14430 bytes)

   Total   % of Num of    Avg  Allocation call stack
    Size    All Blocks   Size
========== ==== ====== ====== =======================================
     10428  72%      1  10428 get_zone < getsystemTZ < _localtime_r < ctime < main
      3136  21%     56     56 optadd < optparse
       672   4%     53     12 _strdup < optadd
        81  <1%      5     16 _strdup < optaddarg
        54  <1%      7      7 _strdup < optadd
        31  <1%      4      7 _strdup < optadd
        28  <1%      1     28 _strdup < optaddarg < optparse < main

Regards,
jel.
--
Otto-von-Guericke University     http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany         Tel: +49 391 67 12768
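
P.S. For the archives, here is a minimal sketch of the bug pattern described
above and one way to avoid it. Only the helo member and the cf variable name
come from the real milter code. The struct name conn_ctx, the second member
from, and the helpers ctx_new() and ctx_free() are hypothetical and only
illustrate the idea: every pointer member is set to NULL right after the
malloc(), so the cleanup path never hands an indeterminate pointer to free().

```c
#include <stdlib.h>

/* Hypothetical per-connection context; only `helo` is from the real code. */
struct conn_ctx {
    char *helo;
    char *from;   /* hypothetical second member, for illustration only */
};

/*
 * Allocate a context with every pointer member explicitly set to NULL.
 * malloc() does not initialize the memory it returns, so a forgotten
 * member would hold an indeterminate value, and the cleanup code would
 * later pass that value to free() and corrupt the heap.
 */
struct conn_ctx *ctx_new(void)
{
    struct conn_ctx *cf = malloc(sizeof *cf);

    if (cf == NULL)
        return NULL;
    cf->helo = NULL;
    cf->from = NULL;
    return cf;
}

/*
 * Release a context. free(NULL) is defined to do nothing, so no
 * "if (cf->helo)" guard is needed once the members are initialized.
 */
void ctx_free(struct conn_ctx *cf)
{
    if (cf == NULL)
        return;
    free(cf->helo);
    free(cf->from);
    free(cf);
}
```

Note that the "if (cf->helo)" test in the original cleanup code only protects
against NULL. It cannot detect an uninitialized pointer, which is why the
initialization has to happen at allocation time.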