Hey Krister,

Does this pertain to libumem as well? It seems that libumem similarly isn't
doing so well on this benchmark.

Adam

On Thu, May 01, 2008 at 02:48:06PM -0700, johansen at sun.com wrote:
> Yeah, I did some digging when I had a free moment. The following is
> the most germane to your issue.
>
> 5070823 poor malloc() performance for small byte sizes
>
> -j
>
> On Thu, May 01, 2008 at 05:36:26PM -0400, Matty wrote:
> > We are building our application as a 32-bit entity on both Linux and
> > Solaris, so our comparison should be apples to apples. Does anyone
> > happen to know what the bug id of the small malloc issue is? I searched
> > the opensolaris bug database, but wasn't able to dig this up.
> >
> > Thanks,
> > - Ryan
> >
> >
> >
> > On Thu, May 1, 2008 at 4:33 PM, <johansen at sun.com> wrote:
> > > Part of the problem is that these allocations are very small:
> > >
> > > > # dtrace -n 'pid$target::malloc:entry { @a["allocsz"] = quantize(arg0); }' -c /tmp/xml
> > >
> > > >   allocsz
> > > >            value  ------------- Distribution ------------- count
> > > >                1 |                                         0
> > > >                2 |                                         300000
> > > >                4 |@@@@@                                    4700005
> > > >                8 |@@                                       1600006
> > > >               16 |@@@@@                                    4300015
> > > >               32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@              24000006
> > > >               64 |                                         200001
> > > >              128 |                                         400001
> > > >              256 |                                         100000
> > > >              512 |                                         0
> > > >             1024 |                                         100000
> > > >             2048 |                                         100000
> > > >             4096 |                                         0
> > > >             8192 |                                         100000
> > > >            16384 |                                         0
> > >
> > > After seeing this, I took a look at the exact breakdown of the
> > > allocation sizes:
> > >
> > > > # dtrace -n 'pid$target::malloc:entry {...@a[arg0] = count();}' -c /tmp/xml
> > >
> > > 12 1
> > > 96 1
> > > 200 1
> > > 21 100000
> > > 43 100000
> > > 44 100000
> > > 51 100000
> > > 61 100000
> > > 75 100000
> > > 88 100000
> > > 128 100000
> > > 147 100000
> > > 181 100000
> > > 220 100000
> > > 440 100000
> > > 1024 100000
> > > 2048 100000
> > > 8194 100000
> > > 8 100001
> > > 52 100001
> > > 6 100002
> > > 36 100004
> > > 24 100005
> > > 33 200000
> > > 4 200001
> > > 17 200001
> > > 9 200003
> > > 3 300000
> > > 10 300000
> > > 13 300000
> > > 14 300000
> > > 25 300000
> > > 28 400000
> > > 11 400001
> > > 20 700009
> > > 40 900000
> > > 5 900001
> > > 16 2500000
> > > 7 3500001
> > > 48 3800001
> > > 60 18500000
> > >
> > > The most frequent malloc call is to allocate 60 bytes. I believe that
> > > we have a known issue with small mallocs on Solaris. There's a bug open
> > > > for this somewhere; however, I can't find its number at the moment.
> > >
> > > Another problem that you may have run into is the 32-bit versus 64-bit
> > > > compilation problem. I was able to shave about 9 seconds off my
> > > runtime by compiling your testcase as a 64-bit app instead of a 32-bit
> > > one:
> > >
> > >
> > > $ gcc -O3 -o xml `/usr/bin/xml2-config --libs --cflags` xml.c
> > > $ file xml
> > > xml: ELF 32-bit LSB executable 80386 Version 1 [FPU],
> > > dynamically linked, not stripped, no debugging information available
> > > $ ./xml
> > > 100000 iter in 22.749836 sec
> > >
> > > versus:
> > >
> > > $ gcc -m64 -O3 -o xml `/usr/bin/xml2-config --libs --cflags` xml.c
> > > $ file xml
> > > xml: ELF 64-bit LSB executable AMD64 Version 1, dynamically
> > > linked, not stripped, no debugging information available
> > > $ ./xml
> > > 100000 iter in 13.785916 sec
> > >
> > >
> > > -j
> > >
> > >
> > >
> > > On Wed, Apr 30, 2008 at 06:44:31PM -0400, Matty wrote:
> > >
> > >
> > > > > On Wed, Apr 30, 2008 at 6:26 PM, David Lutz <David.Lutz at sun.com> wrote:
> > > > > If your application is single threaded, you could try using the
> > > > > > bsdmalloc library. This is a fast malloc, but it is not multi-thread
> > > > > > safe and will also tend to use more memory than the default
> > > > > malloc. For a comparison of different malloc libraries, look
> > > > > at the NOTES section at the end of umem_alloc(3MALLOC).
> > > > >
> > > > > I got the following result with your example code:
> > > > >
> > > > >
> > > > > $ gcc -O3 -o xml `/usr/bin/xml2-config --libs --cflags` xml.c
> > > > > $ ./xml
> > > > > 100000 iter in 21.445672 sec
> > > > > $
> > > > > > $ gcc -O3 -o xml `/usr/bin/xml2-config --libs --cflags` xml.c -lbsdmalloc
> > > > > $ ./xml
> > > > > 100000 iter in 12.761969 sec
> > > > > $
> > > > >
> > > > > I got similar results using Sun Studio 12.
> > > > >
> > > > > Again, bsdmalloc is not multi-thread safe, so use it with caution.
> > > >
> > > > Thanks David. Does anyone happen to know why the memory allocation
> > > > > libraries in Solaris are so much slower than their Linux counterparts?
> > > > > If the various malloc implementations were a second or two slower, I
> > > > > could understand. But they appear to be 10 - 12 seconds slower in our
> > > > > specific test case, which seems kinda odd.
> > > >
> > > > Thanks,
> > > > - Ryan
> > >
> > >
> > > > _______________________________________________
> > > > perf-discuss mailing list
> > > > perf-discuss at opensolaris.org
> > >
> _______________________________________________
> tools-compilers mailing list
> tools-compilers at opensolaris.org
--
Adam Leventhal, Fishworks http://blogs.sun.com/ahl