Travis Vitek wrote:
Doh! I should know better. Here are the results from a 12d build on the
same hardware.
Does this mean that there is almost no difference between the
intrinsic functions and the out-of-line ones, or that the test
is too simple to demonstrate the difference?
I expect the greatest advantage of the intrinsics over ordinary
out-of-line functions to be that they (might) make it possible
for the optimizer to generate better code *in certain contexts*
depending on from where they are called. This is going to be
hard to demonstrate in a simple test case. I suspect we would
need a more realistic test with a number of different uses of
string (and the atomic functions) to get some idea of how much
they might help.
Martin
normal                    patched
------ 1 threads          ------ 1 threads
ms     934                ms     1015
ms/op  0.00005567         ms/op  0.00006050
------ 2 threads          ------ 2 threads
ms     6049               ms     6266
ms/op  0.00036055         ms/op  0.00037348
------ 4 threads          ------ 4 threads
ms     11948              ms     11813
ms/op  0.00071216         ms/op  0.00070411
------ 8 threads          ------ 8 threads
ms     23855              ms     24743
ms/op  0.00142187         ms/op  0.00147480
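
To put the "almost no difference" question in numbers, the two columns can be compared as a relative change. The helper below is only an illustrative sketch; the name percent_change is made up here and is not part of the test program.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not from the original test program).
// Returns the relative change of the patched timing versus the
// normal timing, in percent. A positive value means the patched
// build took longer.
double percent_change(double normal_ms, double patched_ms)
{
    assert(normal_ms > 0.0);  // a zero baseline would divide by zero
    return (patched_ms - normal_ms) / normal_ms * 100.0;
}
```

Applied to the ms rows above, this gives roughly +8.7% for 1 thread, +3.6% for 2 threads, -1.1% for 4 threads and +3.7% for 8 threads, so the sign of the difference is not even consistent across runs.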
Martin Sebor wrote:
8d is not thread-safe so the atomic function templates should
be implemented in terms of ordinary increments and decrements
(if they aren't, it's a bug). They should only expand to the
atomic assembly (or the Win32 Interlocked) functions in 12X
and 15X build types.
Martin
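
The build-type distinction described above can be sketched roughly as follows. This is not the library's actual code: the SKETCH_REENTRANT macro and the sketch_ names are invented for illustration, and std::atomic stands in for the atomic assembly or Win32 Interlocked functions.

```cpp
#include <atomic>

#ifdef SKETCH_REENTRANT   // hypothetical macro standing in for a thread-safe build type

// Thread-safe build: the counter is an atomic object and the
// increment is a single atomic read-modify-write operation.
template <class T>
using sketch_counter = std::atomic<T>;

template <class T>
T sketch_preincrement(std::atomic<T>& x)
{
    return ++x;   // atomic pre-increment, returns the new value
}

#else

// Non-thread-safe build: the counter is a plain object and the
// increment is an ordinary ++ with no synchronization.
template <class T>
using sketch_counter = T;

template <class T>
T sketch_preincrement(T& x)
{
    return ++x;   // ordinary pre-increment, returns the new value
}

#endif
```

With this arrangement a non-thread-safe build never pays for atomic operations, so a benchmark run on such a build only measures plain increments, whichever variant of the atomic functions was patched in.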