On Sep 29, 2004, at 7:37 AM, Bruce Momjian wrote:

Karel Zak wrote:
On Sat, 2004-09-25 at 23:23 +0200, Manfred Spraul wrote:
[EMAIL PROTECTED] wrote:

If the memset
bypasses the cache then the following access will cause a cache line
miss, which can be so slow that using the faster memset can result in a
net performance loss.

Could you suggest some structs to test? If I get your meaning, I would make a loop that sets then reads from the structure.
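
A set-then-read loop of the kind described above could be sketched roughly as follows. TestStruct, NFIELDS and both function names are made up for illustration and are not taken from the PostgreSQL sources. An optimizer may still simplify a loop like this, so the generated code is worth inspecting before trusting any timing.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

#define NFIELDS 16

/* Hypothetical test struct; the size is arbitrary. */
typedef struct TestStruct
{
    long        fields[NFIELDS];
} TestStruct;

/*
 * Dirty one field, clear the whole struct, then read the field back so the
 * cleared memory is touched right after the memset.  Returns the sum of the
 * values read back, which is 0 if every clear worked.
 */
static long
set_then_read(TestStruct *s, long iterations)
{
    volatile TestStruct *vs = s;    /* volatile read forces a real load */
    long        sum = 0;
    long        i;

    for (i = 0; i < iterations; i++)
    {
        s->fields[i % NFIELDS] = i + 1;
        memset(s, 0, sizeof(*s));
        sum += vs->fields[i % NFIELDS];
    }
    return sum;
}

/* CPU seconds spent in the loop, as measured by clock(). */
static double
time_set_then_read(long iterations)
{
    TestStruct  s;
    clock_t     start;
    clock_t     stop;

    start = clock();
    (void) set_then_read(&s, iterations);
    stop = clock();
    return (double) (stop - start) / CLOCKS_PER_SEC;
}
```

Swapping the memset call for the routine under comparison and running both variants with the same iteration count gives two numbers to compare.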


Read the sources and the CPU specs. Benchmarking such problems is
virtually impossible.
I don't have OS X, so I checked the Linux kernel sources: it seems
that the Power architecture doesn't have the same problem as x86.
There is a special clear cacheline instruction for large memsets and the
rest is done through carefully optimized store byte/halfword/word/double
word sequences.


Thus I'd check what happens if you memset buffers that are not perfectly aligned.
That's another point where over-optimized functions sometimes break
down. If there is no slowdown, then I'd replace the postgres function
with the OS provided function.
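
One way to set up the misaligned case is to clear a region that starts a few bytes into an aligned buffer and time it against offset 0. This is only a sketch: storage and time_memset_at_offset are hypothetical names, and the buffer size is arbitrary.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Backing store declared as long[] so that offset 0 is long-aligned. */
static long storage[256];

/*
 * Time `iterations` clears of `len` bytes starting `offset` bytes into the
 * buffer.  Returns CPU seconds, or -1.0 if the region does not fit.
 */
static double
time_memset_at_offset(size_t offset, size_t len, long iterations)
{
    unsigned char *buf = (unsigned char *) storage;
    volatile unsigned char sink = 0;
    clock_t     start;
    clock_t     stop;
    long        i;

    if (len == 0 || offset > sizeof(storage) || len > sizeof(storage) - offset)
        return -1.0;

    start = clock();
    for (i = 0; i < iterations; i++)
    {
        buf[offset] = 1;                /* dirty the region */
        memset(buf + offset, 0, len);   /* the call under test */
        sink = buf[offset];             /* touch the result */
    }
    stop = clock();
    (void) sink;
    return (double) (stop - start) / CLOCKS_PER_SEC;
}
```

Comparing time_memset_at_offset(0, n, iters) with offsets 1 through 7 for the same n should show whether the OS memset degrades on unaligned starts.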



All memory obtained via malloc and friends will be aligned on OS X, unless you remove padding (which I don't think you do).


I'd add some __builtin_constant_p() optimizations, but I guess Tom won't
like gcc hacks ;-)

I think it cannot be a problem if you put it in some .h file (in the port directory?) as a macro guarded by "#ifdef GCC". The other question is the real advantage of hacks like this in practical PG usage :-)
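
For illustration, a __builtin_constant_p() guard could look something like the sketch below. SKETCH_ZERO, sketch_zero_small and SKETCH_INLINE_LIMIT are invented names and the cutoff is arbitrary. It uses GCC's predefined __GNUC__ macro as the guard, and non-GCC compilers simply get plain memset.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff, in bytes, for the inline path. */
#define SKETCH_INLINE_LIMIT 64

#ifdef __GNUC__
/*
 * Byte-wise clear.  When len is a small compile-time constant, GCC is free
 * to unroll this at the call site; whether it does depends on the compiler
 * version and flags.
 */
static inline void *
sketch_zero_small(void *ptr, size_t len)
{
    unsigned char *p = (unsigned char *) ptr;
    size_t      i;

    for (i = 0; i < len; i++)
        p[i] = 0;
    return ptr;
}

/* Inline path only when the length is known at compile time and small. */
#define SKETCH_ZERO(ptr, len) \
    ((__builtin_constant_p(len) && (len) <= SKETCH_INLINE_LIMIT) \
        ? sketch_zero_small((ptr), (len)) \
        : memset((ptr), 0, (len)))
#else
#define SKETCH_ZERO(ptr, len) memset((ptr), 0, (len))
#endif
```

Both branches clear the same bytes, so the macro only changes how the work is done. Whether that pays off in practice is the open question raised above.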

The reason MemSet is a win is not that the C code is great but because it eliminates a function call.
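
To make the point concrete, here is a rough sketch of a macro that clears small, aligned regions with an inline loop and calls memset only for everything else. It is not the definition from the PostgreSQL headers; SKETCH_MEMSET and SKETCH_LOOP_LIMIT are hypothetical names, and the word-wise stores assume a build that tolerates type-punned writes (for example gcc with -fno-strict-aliasing).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff, in bytes, above which the sketch defers to memset(). */
#define SKETCH_LOOP_LIMIT 64

/*
 * Inline path: start is long-aligned, len is a multiple of sizeof(long),
 * val is 0 and len is small.  Anything else goes through memset().
 */
#define SKETCH_MEMSET(start, val, len) \
    do { \
        void   *_start = (void *) (start); \
        int     _val = (val); \
        size_t  _len = (len); \
        if (((uintptr_t) _start % sizeof(long)) == 0 && \
            (_len % sizeof(long)) == 0 && \
            _val == 0 && \
            _len <= SKETCH_LOOP_LIMIT) \
        { \
            long   *_p = (long *) _start; \
            long   *_stop = (long *) ((char *) _start + _len); \
            while (_p < _stop) \
                *_p++ = 0; \
        } \
        else \
            memset(_start, _val, _len); \
    } while (0)
```

In the inline case no call instruction is emitted at all, which is where the saving would come from. The trade-off is more code at every call site.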


Using MemSet really did speed things up. I think the function overhead is okay. As for real-world usage, the function ExecMakeFunctionResult dropped from the top of the list when profiling (now < 1% vs 16% before)! The workload was a big, nasty delete (with cascading) and an insert in a cursor.


Here are results for a single-processor Mac G4 running OS X 10.3, compiled with -O2. This time the Mac memset wins all around, although the 32 case is nearly a tie. Someone posted that this wasn't the case.

PG MemSet:
pgmemset_test 32
0.670u 0.020s 0:00.70 98.5%     0+0k 0+0io 0pf+0w
pgmemset_test 64
1.060u 0.000s 0:01.05 100.9%    0+0k 0+0io 0pf+0w
pgmemset_test 128
1.750u 0.010s 0:01.76 100.0%    0+0k 0+0io 0pf+0w
pgmemset_test 512
6.010u 0.030s 0:06.04 100.0%    0+0k 0+0io 0pf+0w

Mac memset:
memset_test 32
0.660u 0.020s 0:00.67 101.4%    0+0k 0+0io 0pf+0w
memset_test 64
0.720u 0.000s 0:00.72 100.0%    0+0k 0+0io 0pf+0w
memset_test 128
0.800u 0.010s 0:00.81 100.0%    0+0k 0+0io 0pf+0w
memset_test 512
1.470u 0.010s 0:01.48 100.0%    0+0k 0+0io 0pf+0w

I also checked setting a byte after the memset, and it does slow things down a tiny bit, but the slowdown is the same for both MemSet and memset for sizes under 64.


