Hey Krister,

Does this pertain to libumem as well? It seems that libumem similarly isn't
doing so well on this benchmark.

Adam

On Thu, May 01, 2008 at 02:48:06PM -0700, johansen at sun.com wrote:
> Yeah, I did some digging when I had a free moment.  The following is
> the most germane to your issue.
> 
>       5070823 poor malloc() performance for small byte sizes
> 
> -j
> 
> On Thu, May 01, 2008 at 05:36:26PM -0400, Matty wrote:
> > We are building our application as a 32-bit entity on both Linux and
> > Solaris, so our comparison should be apples to apples. Does anyone happen
> > to know what the bug id of the small malloc issue is? I searched the
> > opensolaris bug database, but wasn't able to dig this up.
> > 
> > Thanks,
> > - Ryan
> > 
> > 
> > 
> > On Thu, May 1, 2008 at 4:33 PM,  <johansen at sun.com> wrote:
> > > Part of the problem is that these allocations are very small:
> > >
> > >  # dtrace -n 'pid$target::malloc:entry { @a["allocsz"] = quantize(arg0); }' -c /tmp/xml
> > >
> > >   allocsz
> > >            value  ------------- Distribution ------------- count
> > >                1 |                                         0
> > >                2 |                                         300000
> > >                4 |@@@@@                                    4700005
> > >                8 |@@                                       1600006
> > >               16 |@@@@@                                    4300015
> > >               32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@              24000006
> > >               64 |                                         200001
> > >              128 |                                         400001
> > >              256 |                                         100000
> > >              512 |                                         0
> > >             1024 |                                         100000
> > >             2048 |                                         100000
> > >             4096 |                                         0
> > >             8192 |                                         100000
> > >            16384 |                                         0
> > >
> > >  After seeing this, I took a look at the exact breakdown of the
> > >  allocation sizes:
> > >
> > >  # dtrace -n 'pid$target::malloc:entry { @a[arg0] = count(); }' -c /tmp/xml
> > >
> > >                12                1
> > >                96                1
> > >               200                1
> > >                21           100000
> > >                43           100000
> > >                44           100000
> > >                51           100000
> > >                61           100000
> > >                75           100000
> > >                88           100000
> > >               128           100000
> > >               147           100000
> > >               181           100000
> > >               220           100000
> > >               440           100000
> > >              1024           100000
> > >              2048           100000
> > >              8194           100000
> > >                 8           100001
> > >                52           100001
> > >                 6           100002
> > >                36           100004
> > >                24           100005
> > >                33           200000
> > >                 4           200001
> > >                17           200001
> > >                 9           200003
> > >                 3           300000
> > >                10           300000
> > >                13           300000
> > >                14           300000
> > >                25           300000
> > >                28           400000
> > >                11           400001
> > >                20           700009
> > >                40           900000
> > >                 5           900001
> > >                16          2500000
> > >                 7          3500001
> > >                48          3800001
> > >                60         18500000
> > >
> > >  The most frequent malloc call is to allocate 60 bytes.  I believe that
> > >  we have a known issue with small mallocs on Solaris.  There's a bug open
> > >  for this somewhere; however, I can't find its number at the moment.
> > >
> > >  Another problem that you may have run into is the 32-bit versus 64-bit
> > >  compilation problem.  I was able to shave about 10 seconds off my
> > >  runtime by compiling your testcase as a 64-bit app instead of a 32-bit
> > >  one:
> > >
> > >
> > >  $ gcc -O3 -o xml `/usr/bin/xml2-config --libs --cflags` xml.c
> > >  $ file xml
> > >  xml:            ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, not stripped, no debugging information available
> > >  $ ./xml
> > >  100000 iter in 22.749836 sec
> > >
> > >  versus:
> > >
> > >  $ gcc -m64 -O3 -o xml `/usr/bin/xml2-config --libs --cflags` xml.c
> > >  $ file xml
> > >  xml:            ELF 64-bit LSB executable AMD64 Version 1, dynamically linked, not stripped, no debugging information available
> > >  $ ./xml
> > >  100000 iter in 13.785916 sec
> > >
> > >
> > >  -j
> > >
> > >
> > >
> > >  On Wed, Apr 30, 2008 at 06:44:31PM -0400, Matty wrote:
> > >
> > >
> > > > On Wed, Apr 30, 2008 at 6:26 PM, David Lutz <David.Lutz at sun.com> 
> > > > wrote:
> > >  > > If your application is single threaded, you could try using the
> > >  > >  bsdmalloc library.  This is a fast malloc, but it is not
> > >  > >  multi-thread safe and will also tend to use more memory than the
> > >  > >  default malloc.  For a comparison of different malloc libraries,
> > >  > >  look at the NOTES section at the end of umem_alloc(3MALLOC).
> > >  > >
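[Editor's note: as an aside not taken from the thread, the Solaris runtime linker can also interpose an alternate allocator at run time via LD_PRELOAD, which avoids relinking the binary. The library names below are assumptions; verify what is actually installed before relying on them.]

```shell
# Hypothetical usage: swap allocators at run time instead of relinking.
# Library names are assumed -- check with: ls /usr/lib/libumem* /usr/lib/libbsdmalloc*
LD_PRELOAD=libumem.so.1 ./xml
LD_PRELOAD=libbsdmalloc.so.1 ./xml
```

This is handy for A/B-ing allocators against the same binary, though the same thread-safety caveat for bsdmalloc applies.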
> > >  > >  I got the following result with your example code:
> > >  > >
> > >  > >
> > >  > >  $ gcc -O3 -o xml `/usr/bin/xml2-config --libs --cflags` xml.c
> > >  > >  $ ./xml
> > >  > >  100000 iter in 21.445672 sec
> > >  > >  $
> > >  > >  $ gcc -O3 -o xml `/usr/bin/xml2-config --libs --cflags` xml.c -lbsdmalloc
> > >  > >  $ ./xml
> > >  > >  100000 iter in 12.761969 sec
> > >  > >  $
> > >  > >
> > >  > >  I got similar results using Sun Studio 12.
> > >  > >
> > >  > >  Again, bsdmalloc is not multi-thread safe, so use it with caution.
> > >  >
> > >  > Thanks David. Does anyone happen to know why the memory allocation
> > >  > libraries in Solaris are so much slower than their Linux counterparts?
> > >  > If the various malloc implementations were a second or two slower, I
> > >  > could understand. But they appear to be 10 - 12 seconds slower in our
> > >  > specific test case, which seems kinda odd.
> > >  >
> > >  > Thanks,
> > >  > - Ryan
> > >
> > >
> > > > _______________________________________________
> > >  > perf-discuss mailing list
> > >  > perf-discuss at opensolaris.org
> > >
> _______________________________________________
> tools-compilers mailing list
> tools-compilers at opensolaris.org

-- 
Adam Leventhal, Fishworks                        http://blogs.sun.com/ahl
