Travis Vitek wrote:
Doh! I should know better. Here are the results from a 12d build on the
same hardware.

Does this mean that there is almost no difference between the
intrinsic functions and the out-of-line ones, or that the test
is too simple to demonstrate it?

I expect the greatest advantage of the intrinsics over ordinary
out-of-line functions to be that they (might) make it possible
for the optimizer to generate better code *in certain contexts*,
depending on where they are called from. This is going to be
hard to demonstrate in a simple test case. I suspect we would
need a more realistic test with a number of different uses of
string (and the atomic functions) to get some idea of how much
they might help.

Martin
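One possible shape for such a test, as a minimal C++11 sketch: it assumes std::string and std::thread as stand-ins for the library's own string and threads, and the function names (mixed_string_ops, run, report), the workload mix and the iteration counts are all hypothetical. The point is only to exercise string in several contexts per iteration instead of in a single tight loop. Note that a non-reference-counted std::string (such as the current libstdc++ one) never touches atomics when copied, so the harness would have to be built against the library under test to say anything about the intrinsics.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <initializer_list>
#include <string>
#include <thread>
#include <vector>

// One iteration mixes several common string uses. In a reference-counted
// string, the copy and the assignment would go through the atomic
// increment/decrement functions, and the append would force a private
// copy of the shared buffer.
static long mixed_string_ops(const std::string& seed, long iterations) {
    long checksum = 0;
    for (long i = 0; i < iterations; ++i) {
        std::string copy(seed);                          // copy construction
        std::string other;
        other = copy;                                    // copy assignment
        other += 'x';                                    // modification after sharing
        if (other.compare(0, seed.size(), seed) == 0)    // comparison
            ++checksum;
        checksum += static_cast<long>(other.substr(1, 3).size());  // temporary string
    }
    return checksum;  // 4 per iteration when seed.size() >= 3
}

// Runs the workload on `threads` threads that all copy the same seed string
// and returns the elapsed wall-clock time in milliseconds. The sum of the
// per-thread checksums goes to *checksum so the work is not optimized away.
static double run(unsigned threads, long iterations_per_thread, long* checksum) {
    assert(threads > 0 && checksum != nullptr);
    const std::string seed = "a seed string shared by every thread";
    std::vector<long> results(threads, 0L);
    std::vector<std::thread> pool;
    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threads; ++t)
        pool.emplace_back([&, t] { results[t] = mixed_string_ops(seed, iterations_per_thread); });
    for (std::thread& th : pool)
        th.join();
    const auto stop = std::chrono::steady_clock::now();
    *checksum = 0;
    for (long r : results)
        *checksum += r;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Prints ms and ms/op per thread count, in the same shape as the timing
// table in this thread; here ms/op divides by the total number of
// iterations across all threads.
static void report(long iterations_per_thread) {
    for (unsigned threads : {1u, 2u, 4u, 8u}) {
        long checksum = 0;
        const double ms = run(threads, iterations_per_thread, &checksum);
        const double ops = static_cast<double>(iterations_per_thread) * threads;
        std::printf("------ %u threads\nms    %.0f\nms/op %.8f\n", threads, ms, ms / ops);
    }
}
```

Calling report(1L << 20) from main() prints one block per thread count. The iteration count is arbitrary, and the sketch makes no attempt to reproduce the numbers in the thread.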


  threads   normal ms   normal ms/op   patched ms   patched ms/op
  -------   ---------   ------------   ----------   -------------
        1         934     0.00005567         1015      0.00006050
        2        6049     0.00036055         6266      0.00037348
        4       11948     0.00071216        11813      0.00070411
        8       23855     0.00142187        24743      0.00147480
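To put a number on the "almost no difference" question, here is a small C++ sketch that computes the relative change between the normal and patched builds for each row of the timing table. The ms/op column appears to equal the total time divided by 2^24 = 16,777,216 operations at every thread count; that divisor is inferred from the numbers, not stated in the thread, so treat it as an assumption. The names Row, kAssumedOps, ms_per_op, matches_reported and percent_change are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// One row of the timing table: total milliseconds for the "normal" and
// "patched" builds at a given thread count.
struct Row { unsigned threads; double normal_ms; double patched_ms; };

static const Row rows[] = {
    {1,   934.0,  1015.0},
    {2,  6049.0,  6266.0},
    {4, 11948.0, 11813.0},
    {8, 23855.0, 24743.0},
};

// Assumption: ms/op = total ms / 2^24 operations (inferred, not stated).
static const double kAssumedOps = 16777216.0;  // 2^24

static double ms_per_op(double ms) { return ms / kAssumedOps; }

// True if the computed ms/op agrees with a reported value to within half a
// unit in its last printed digit (the table prints 8 decimal places).
static bool matches_reported(double ms, double reported_ms_per_op) {
    return std::fabs(ms_per_op(ms) - reported_ms_per_op) < 5e-9;
}

// Relative change of the patched build versus the normal build, in percent;
// positive means the patched build was slower.
static double percent_change(const Row& r) {
    assert(r.normal_ms > 0.0);
    return 100.0 * (r.patched_ms - r.normal_ms) / r.normal_ms;
}

static void print_summary() {
    for (const Row& r : rows)
        std::printf("%u threads: %+.1f%%\n", r.threads, percent_change(r));
}
```

By this arithmetic the patched build is about 8.7% slower with 1 thread, about 3.6% and 3.7% slower with 2 and 8 threads, and about 1.1% faster with 4 threads.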


Martin Sebor wrote:
The 8d build type is not thread-safe, so the atomic function
templates should be implemented in terms of ordinary increments
and decrements (if they aren't, it's a bug). They should expand
to the atomic assembly (or the Win32 Interlocked functions) only
in the 12X and 15X build types.

Martin
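A minimal C++11 sketch of that build-type split. EXAMPLE_THREAD_SAFE, refcount_t, atomic_preincrement, atomic_predecrement and share_and_release are hypothetical names, not the library's actual macros or functions, and std::atomic stands in for the hand-written assembly and Win32 Interlocked calls. The macro models a 12X/15X-style build; leaving it undefined models an 8d-style single-threaded build.

```cpp
#include <atomic>
#include <cassert>

#ifdef EXAMPLE_THREAD_SAFE

// Thread-safe build: the counter is a std::atomic, and each update is an
// atomic read-modify-write.
template <class T> using refcount_t = std::atomic<T>;

template <class T>
T atomic_preincrement(refcount_t<T>& x) { return x.fetch_add(1) + 1; }

template <class T>
T atomic_predecrement(refcount_t<T>& x) { return x.fetch_sub(1) - 1; }

#else

// Single-threaded build: no synchronization is needed, so the same
// interface expands to ordinary increments and decrements.
template <class T> using refcount_t = T;

template <class T>
T atomic_preincrement(refcount_t<T>& x) { return ++x; }

template <class T>
T atomic_predecrement(refcount_t<T>& x) { return --x; }

#endif

// Usage: a reference count starts at one owner, gains a second owner,
// then drops back to one. The call sites are identical in both builds.
inline long share_and_release() {
    refcount_t<long> refs(1);                       // one owner
    const long shared = atomic_preincrement(refs);  // a second owner appears
    assert(shared == 2);
    (void)shared;                                   // silence unused warning under NDEBUG
    return atomic_predecrement(refs);               // back to one owner
}
```

Compiling with -DEXAMPLE_THREAD_SAFE switches every call site to atomic operations without changing the calling code, which is the property the build types rely on.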

