On Tue, Apr 27, 2010 at 12:40:41PM -0700, johan...@opensolaris.org wrote:
> On Tue, Apr 27, 2010 at 09:06:46PM +0200, Jens Elkner wrote:
> > t...@150 (l...@150) terminated by signal SEGV (no mapping at the fault address)
> > 0xff2570a8: t_splay+0x0010:     ld       [%o2 + 8], %o1
> > Current function is dec_argv
> >  1764           s = (char **)malloc((nelem + 1) * (sizeof *s));
> > current thread: t...@150
> >   [1] t_splay(0x85a04, 0x0, 0x1fffff, 0x85808, 0x0, 0xff337480), at 0xff2570a8
> >   [2] t_delete(0x85a04, 0x1fc, 0x1fffff, 0xff256f30, 0xff3303a8, 0x0), at 0xff256f30
> >   [3] realfree(0x85800, 0x1ff, 0xd98dc, 0x8b7a0, 0x0, 0x8a768), at 0xff256b44
> >   [4] cleanfree(0x0, 0xe, 0xd902c, 0x0, 0xff3303a8, 0xff3392a4), at 0xff2573cc
> >   [5] _malloc_unlocked(0x28, 0x0, 0x0, 0x0, 0xfffffffc, 0x0), at 0xff256524
> >   [6] malloc(0x24, 0x1, 0xd9fd8, 0x0, 0xff3303a8, 0xff33a518), at 0xff256414
...
> > Is anybody able to spot, what's going wrong here? 
> 
> This looks like a classic case of heap corruption.  You've died in
> t_splay while trying to coalesce free blocks before performing an
> allocation.  Sometimes this happens when an object is double-free'd,
> when you free an object that wasn't allocated by the allocator, or other
> similar mistakes.

Direct hit! Found it in the milter code (i.e. not in libmilter, as
assumed): a struct was malloc'ed, but one char* member was overlooked when
initializing it (never set to NULL). So the chaos started in the milter's
cleanup function, which handed that garbage pointer to free():
if (cf->helo) free(cf->helo);
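
For the archives, a minimal sketch of the kind of fix that avoids this class
of bug. It is not the actual milter code: the struct name, the `from` member
and the helper functions are made up for illustration, and only the `helo`
member appears in the real code. The idea is simply that every pointer member
is NULL from the moment the struct exists, so the cleanup path can never pass
an uninitialized pointer to free().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-connection state; only `helo` is taken from the thread. */
struct conn_conf {
    char *helo;   /* heap-allocated string, or NULL if never set */
    char *from;   /* hypothetical second member, same convention */
};

/*
 * Allocate a struct with every pointer member set to NULL.
 * calloc() zero-fills the memory; the explicit assignments do not rely on
 * all-zero bits being a null pointer and document the intent.
 */
struct conn_conf *conf_new(void)
{
    struct conn_conf *cf = calloc(1, sizeof *cf);

    if (cf == NULL)
        return NULL;
    cf->helo = NULL;
    cf->from = NULL;
    return cf;
}

/*
 * Store a private copy of s in *slot, releasing any previous value.
 * Returns 0 on success, -1 if the allocation fails (old value is kept).
 */
int conf_set(char **slot, const char *s)
{
    char *copy;

    assert(slot != NULL && s != NULL);
    copy = malloc(strlen(s) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, s);
    free(*slot);            /* free(NULL) is a no-op */
    *slot = copy;
    return 0;
}

/* Release the struct and everything it owns; safe on partially filled structs. */
void conf_free(struct conn_conf *cf)
{
    if (cf == NULL)
        return;
    free(cf->helo);
    free(cf->from);
    free(cf);
}
```

With this pattern a connection that never sent a HELO simply leaves
cf->helo at NULL, and conf_free(cf) stays harmless. The
"if (ptr) free(ptr)" test is not needed either, since free(NULL) is defined
to do nothing.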
 
> Can you run this application under libumem?  It has a bunch of debugging
> features that might help you out here.
> 
> This is what I typically use when trying to debug heap corruption.  What 
> follows is for a 32-bit application.  Omit the _32 if you're 64-bit.
> 
> LD_PRELOAD_32=libumem.so
> UMEM_DEBUG='audit=50,guards,contents'
> UMEM_LOGGING='transaction,fail,contents'
> 
> export LD_PRELOAD_32 UMEM_DEBUG UMEM_LOGGING
> 
> HTH,

Yes - really, really cool stuff! dbx pointed immediately to the line shown
above. Everything else was a matter of 5 minutes. I have had it running on
two machines for about 2.5 hours now without any problem!

Thanks a lot (including everyone else who offered hints)!


BTW: bcheck seems to work on that machine (not sure what output one
should expect, but to me it looks OK), but AFAICS it didn't discover
anything suspicious:

Actual leaks report    (actual leaks:            0  total size:          0 bytes)

Possible leaks report  (possible leaks:          0  total size:          0 bytes)

Blocks in use report   (blocks in use:         127  total size:      14430 bytes)

  Total    % of Num of  Avg    Allocation call stack
  Size      All Blocks  Size
========== ==== ====== ======  =======================================
     10428  72%      1  10428  get_zone < getsystemTZ < _localtime_r < ctime < main
      3136  21%     56     56  optadd < optparse 
       672   4%     53     12  _strdup < optadd 
        81  <1%      5     16  _strdup < optaddarg 
        54  <1%      7      7  _strdup < optadd 
        31  <1%      4      7  _strdup < optadd 
        28  <1%      1     28  _strdup < optaddarg < optparse < main 
 

Regards,
jel.
-- 
Otto-von-Guericke University     http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany         Tel: +49 391 67 12768
_______________________________________________
opensolaris-code mailing list
opensolaris-code@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/opensolaris-code
