Re: [HACKERS] tweaking MemSet() performance - 7.4.5

Marc Colosimo Fri, 01 Oct 2004 11:02:04 -0700

On Sep 29, 2004, at 7:37 AM, Bruce Momjian wrote:

Karel Zak wrote:
On Sat, 2004-09-25 at 23:23 +0200, Manfred Spraul wrote:
[EMAIL PROTECTED] wrote:
If the memset bypasses the cache then the following access will cause a cache line miss, which can be so slow that using the faster memset can result in a net performance loss.
Could you suggest some structs to test? If I get your meaning, I would make a loop that sets then reads from the structure.

Read the sources and the CPU specs. Benchmarking such problems is virtually impossible. I don't have OS X, so I checked the Linux kernel sources: it seems that the Power architecture doesn't have the same problem as x86. There is a special clear-cacheline instruction for large memsets, and the rest is done through carefully optimized store byte/halfword/word/doubleword sequences.

Thus I'd check what happens if you memset buffers that are not perfectly aligned. That's another point where over-optimized functions sometimes break down. If there is no slowdown, then I'd replace the postgres function with the OS-provided function.

All memory (via malloc and friends) will be aligned on OS X, unless you remove padding (which I don't think you do).

I'd add some __builtin_constant_p() optimizations, but I guess Tom won't like gcc hacks ;-)
I think it cannot be a problem if you write it in some .h file (in the port
directory?) as a macro with "#ifdef GCC". The other question is the real
advantage of hacks like this in practical PG usage :-)
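
For readers following the __builtin_constant_p() suggestion, here is a minimal sketch of what such a GCC-only macro might look like. It is not code from the thread or from PostgreSQL: SKETCH_ZERO, SKETCH_INLINE_LIMIT and zero_runtime are hypothetical names, the cutoff of 64 is arbitrary, and the "#ifdef GCC" above is assumed to mean GCC's predefined __GNUC__ macro.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff, chosen only for illustration. */
#define SKETCH_INLINE_LIMIT 64

#ifdef __GNUC__
/*
 * When the length is a small compile-time constant, the size test is
 * resolved at compile time and the compiler is free to expand the fill
 * inline. In every other case, fall through to an ordinary memset() call.
 */
#define SKETCH_ZERO(ptr, len) \
    do { \
        if (__builtin_constant_p(len) && (len) <= SKETCH_INLINE_LIMIT) \
            __builtin_memset((ptr), 0, (len)); \
        else \
            memset((ptr), 0, (len)); \
    } while (0)
#else
/* Other compilers: no hack, just the library function. */
#define SKETCH_ZERO(ptr, len) memset((ptr), 0, (len))
#endif

/* Call site whose length is only known at run time. */
void
zero_runtime(void *buf, size_t len)
{
    assert(buf != NULL);
    SKETCH_ZERO(buf, len);
}
```

Whether this gains anything in practice depends on the compiler and flags, which is exactly the open question raised above.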


The reason MemSet is a win is not that the C code is great but because
it eliminates a function call.
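
To make the "eliminates a function call" point concrete, here is a simplified sketch in the spirit of such a macro. It is an illustration under assumptions, not the PostgreSQL source: SKETCH_MEMSET, SKETCH_LOOP_LIMIT, SKETCH_LONG_MASK and zero_longs are hypothetical names, and the limit of 64 bytes is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff: larger fills are handed to the library memset(). */
#define SKETCH_LOOP_LIMIT 64
#define SKETCH_LONG_MASK  (sizeof(long) - 1)

/*
 * Zero a buffer with inline word-sized stores when it is long-aligned,
 * its length is a multiple of sizeof(long), the fill value is 0, and it
 * is small. Anything else goes through a normal memset() call.
 */
#define SKETCH_MEMSET(start, val, len) \
    do { \
        void   *vstart_ = (void *) (start); \
        int     val_ = (val); \
        size_t  len_ = (len); \
        if ((((uintptr_t) vstart_) & SKETCH_LONG_MASK) == 0 && \
            (len_ & SKETCH_LONG_MASK) == 0 && \
            val_ == 0 && \
            len_ <= SKETCH_LOOP_LIMIT) \
        { \
            long *p_ = (long *) vstart_; \
            long *stop_ = (long *) ((char *) p_ + len_); \
            while (p_ < stop_) \
                *p_++ = 0; \
        } \
        else \
            memset(vstart_, val_, len_); \
    } while (0)

/* Example call site: small arrays of longs are cleared without a call. */
void
zero_longs(long *arr, size_t count)
{
    assert(arr != NULL);
    SKETCH_MEMSET(arr, 0, count * sizeof(long));
}
```

The trade-off is code size at every call site versus the cost of one call, so the inline path only pays off for small fills.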

Using MemSet really did speed things up. I think the function overhead is okay. As for real-world usage, the function ExecMakeFunctionResult dropped from the top of the list when profiling (now < 1% vs. 16% before)! The workload was a big, nasty delete (with cascading) and insert in a cursor.

Here are results for a Mac G4 (single processor), OS X 10.3, compiled with -O2. This time the Mac memset wins all around. Someone had posted that this wasn't the case.

PG MemSet:
pgmemset_test 32
0.670u 0.020s 0:00.70 98.5%     0+0k 0+0io 0pf+0w
pgmemset_test 64
1.060u 0.000s 0:01.05 100.9%    0+0k 0+0io 0pf+0w
pgmemset_test 128
1.750u 0.010s 0:01.76 100.0%    0+0k 0+0io 0pf+0w
pgmemset_test 512
6.010u 0.030s 0:06.04 100.0%    0+0k 0+0io 0pf+0w

Mac memset:
memset_test 32
0.660u 0.020s 0:00.67 101.4%    0+0k 0+0io 0pf+0w
memset_test 64
0.720u 0.000s 0:00.72 100.0%    0+0k 0+0io 0pf+0w
memset_test 128
0.800u 0.010s 0:00.81 100.0%    0+0k 0+0io 0pf+0w
memset_test 512
1.470u 0.010s 0:01.48 100.0%    0+0k 0+0io 0pf+0w

I also checked setting a byte after the memset, and it does slow down a tiny bit. But the slowdown is the same for both MemSet and memset for sizes under 64.
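
The test programs themselves were not posted, so the following is only a sketch of how such a loop could be written, not the code behind the numbers above. The function name fill_loop and its parameters are hypothetical. It covers the three variations discussed in the thread: the fill size, a deliberately misaligned start, and touching a byte after each fill.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Zero `size` bytes `iterations` times, starting `offset` bytes into a
 * malloc'd block (offset 0 keeps malloc's alignment, 1..7 breaks it).
 * If `touch` is nonzero, write and read back one byte after each fill,
 * so a fill that bypassed the cache would pay for the miss here.
 * Returns a checksum, read through a volatile pointer, in an attempt to
 * keep the compiler from discarding the loop.
 */
unsigned long
fill_loop(size_t size, size_t offset, unsigned long iterations, int touch)
{
    unsigned long sum = 0;
    unsigned long i;
    char *base;
    volatile char *buf;

    assert(size > 0);
    base = malloc(size + offset);
    assert(base != NULL);
    buf = base + offset;

    for (i = 0; i < iterations; i++)
    {
        memset((void *) buf, 0, size);
        if (touch)
        {
            buf[size - 1] = (char) (i & 0x7f);
            sum += (unsigned long) buf[size - 1];
        }
        sum += (unsigned long) buf[0];
    }

    free(base);
    return sum;
}
```

Calling fill_loop from a small main with the size taken from the command line, and running it under time, would give output in the same format as the figures above. Swapping the memset call for the macro under test gives the other half of the comparison.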
