Yeah, I did some digging when I had a free moment. The following is the most germane to your issue.
5070823 poor malloc() performance for small byte sizes

-j

On Thu, May 01, 2008 at 05:36:26PM -0400, Matty wrote:
> We are building our application as a 32-bit entity on both Linux and
> Solaris, so our comparison should be apples to apples. Does anyone happen
> to know what the bug id of the small malloc issue is? I searched the
> opensolaris bug database, but wasn't able to dig this up.
>
> Thanks,
> - Ryan
>
> On Thu, May 1, 2008 at 4:33 PM, <[EMAIL PROTECTED]> wrote:
> > Part of the problem is that these allocations are very small:
> >
> > # dtrace -n 'pid$target::malloc:entry { @a["allocsz"] = quantize(arg0); }' -c /tmp/xml
> >
> >   allocsz
> >        value  ------------- Distribution ------------- count
> >            1 |                                         0
> >            2 |                                         300000
> >            4 |@@@@@                                    4700005
> >            8 |@@                                       1600006
> >           16 |@@@@@                                    4300015
> >           32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@              24000006
> >           64 |                                         200001
> >          128 |                                         400001
> >          256 |                                         100000
> >          512 |                                         0
> >         1024 |                                         100000
> >         2048 |                                         100000
> >         4096 |                                         0
> >         8192 |                                         100000
> >        16384 |                                         0
> >
> > After seeing this, I took a look at the exact breakdown of the
> > allocation sizes:
> >
> > # dtrace -n 'pid$target::malloc:entry { @a[arg0] = count(); }' -c /tmp/xml
> >
> >       12         1
> >       96         1
> >      200         1
> >       21    100000
> >       43    100000
> >       44    100000
> >       51    100000
> >       61    100000
> >       75    100000
> >       88    100000
> >      128    100000
> >      147    100000
> >      181    100000
> >      220    100000
> >      440    100000
> >     1024    100000
> >     2048    100000
> >     8194    100000
> >        8    100001
> >       52    100001
> >        6    100002
> >       36    100004
> >       24    100005
> >       33    200000
> >        4    200001
> >       17    200001
> >        9    200003
> >        3    300000
> >       10    300000
> >       13    300000
> >       14    300000
> >       25    300000
> >       28    400000
> >       11    400001
> >       20    700009
> >       40    900000
> >        5    900001
> >       16   2500000
> >        7   3500001
> >       48   3800001
> >       60  18500000
> >
> > The most frequent malloc call is to allocate 60 bytes. I believe that
> > we have a known issue with small mallocs on Solaris. There's a bug open
> > for this somewhere; however, I can't find its number at the moment.
> >
> > Another problem that you may have run into is the 32-bit versus 64-bit
> > compilation problem. I was able to shave about 10 seconds off my runtime
> > by compiling your testcase as a 64-bit app instead of a 32-bit one:
> >
> > $ gcc -O3 -o xml `/usr/bin/xml2-config --libs --cflags` xml.c
> > $ file xml
> > xml: ELF 32-bit LSB executable 80386 Version 1 [FPU],
> >     dynamically linked, not stripped, no debugging information available
> > $ ./xml
> > 100000 iter in 22.749836 sec
> >
> > versus:
> >
> > $ gcc -m64 -O3 -o xml `/usr/bin/xml2-config --libs --cflags` xml.c
> > $ file xml
> > xml: ELF 64-bit LSB executable AMD64 Version 1,
> >     dynamically linked, not stripped, no debugging information available
> > $ ./xml
> > 100000 iter in 13.785916 sec
> >
> > -j
> >
> > On Wed, Apr 30, 2008 at 06:44:31PM -0400, Matty wrote:
> > > On Wed, Apr 30, 2008 at 6:26 PM, David Lutz <[EMAIL PROTECTED]> wrote:
> > > > If your application is single threaded, you could try using the
> > > > bsdmalloc library. This is a fast malloc, but it is not multi-thread
> > > > safe and will also tend to use more memory than the default malloc.
> > > > For a comparison of different malloc libraries, look at the NOTES
> > > > section at the end of umem_alloc(3MALLOC).
> > > >
> > > > I got the following result with your example code:
> > > >
> > > > $ gcc -O3 -o xml `/usr/bin/xml2-config --libs --cflags` xml.c
> > > > $ ./xml
> > > > 100000 iter in 21.445672 sec
> > > > $
> > > > $ gcc -O3 -o xml `/usr/bin/xml2-config --libs --cflags` xml.c -lbsdmalloc
> > > > $ ./xml
> > > > 100000 iter in 12.761969 sec
> > > > $
> > > >
> > > > I got similar results using Sun Studio 12.
> > > >
> > > > Again, bsdmalloc is not multi-thread safe, so use it with caution.
> > >
> > > Thanks David. Does anyone happen to know why the memory allocation
> > > libraries in Solaris are so much slower than their Linux counterparts?
> > > If the various malloc implementations were a second or two slower, I
> > > could understand. But they appear to be 10-12 seconds slower in our
> > > specific test case, which seems kinda odd.
> > >
> > > Thanks,
> > > - Ryan
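One more thing that might help: the xml.c testcase itself was never posted to the thread, but the dtrace breakdown above is enough to mimic its allocation pattern without dragging libxml2 along. The sketch below is my own stand-in, not their benchmark -- the 185 allocations per iteration is just 18.5M sixty-byte mallocs divided by the 100000 iterations they report, so treat the numbers as an approximation. Rebuild it a few ways (default libc malloc, -lbsdmalloc for single-threaded runs, -lumem, 32-bit vs. -m64) and see how much of the gap really comes down to the small-allocation path.

/*
 * smallmalloc.c -- a rough stand-in for the xml.c testcase from the
 * thread (xml.c itself wasn't posted).  It replays the dominant
 * allocation pattern from the dtrace breakdown: bursts of 60-byte
 * malloc()/free() pairs, timed over 100000 iterations like the
 * original output.
 *
 * Build and compare, for example:
 *   gcc -O3 -o smallmalloc smallmalloc.c
 *   gcc -O3 -o smallmalloc smallmalloc.c -lbsdmalloc   (single-threaded only)
 *   gcc -O3 -o smallmalloc smallmalloc.c -lumem
 *   gcc -m64 -O3 -o smallmalloc smallmalloc.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define ITERS   100000  /* outer iterations, matching the thread's output   */
#define ALLOCS  185     /* ~18.5M 60-byte allocs / 100000 iters -- a guess  */
#define SIZE    60      /* the most frequent request size in the breakdown  */

int
main(void)
{
	void *ptrs[ALLOCS];
	struct timeval start, end;
	int i, j;

	gettimeofday(&start, NULL);
	for (i = 0; i < ITERS; i++) {
		/* allocate a burst of small blocks, then release them all */
		for (j = 0; j < ALLOCS; j++)
			ptrs[j] = malloc(SIZE);
		for (j = 0; j < ALLOCS; j++)
			free(ptrs[j]);
	}
	gettimeofday(&end, NULL);

	printf("%d iter in %f sec\n", ITERS,
	    (double)(end.tv_sec - start.tv_sec) +
	    (double)(end.tv_usec - start.tv_usec) / 1000000.0);
	return (0);
}

If the -lbsdmalloc build of this sketch closes most of the gap the way it did for David's run of xml.c, that points at the default libc malloc's handling of tiny requests rather than at anything libxml2 is doing.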