On 24/01/2013 7:51 p.m., Alex Rousskov wrote:
On 01/23/2013 07:05 PM, Amos Jeffries wrote:
On 24/01/2013 7:20 a.m., Kinkie wrote:
the attached patch turns the unsigned int:1 flags in CachePeer to
bools.
Please retain the :1 bitmasking. My microbench is showing a consistent
~50ms speed gain on bitmasks over full bool, particularly when there are
multiple bools in the structure. We also get some useful object size gains.
Hello,
FYI: With g++ -O3, there is no measurable performance difference
between bool and bool:1 in my primitive tests (sources attached). I do
see that non-bool bit fields are consistently slower though ("foo:0"
below means type "foo" without bit fields; bool tests are repeated to
show result variance):
Excellent. Thanks for that. I did not go down to the ASM level for my
benchmarks. Just timing of a 100 million iteration loop, run a few
dozen times to get an idea of the variance.
The binary was not built with -O at all, so it used whatever the G++
default is (-O0, i.e. no optimization).
<snip>
To me, it looks like bit fields in general may hurt performance where
memory composition is not important (as expected, I guess), and that
some compilers remove any difference between full and bit boolean with
-O3 (that surprised me).
G++ assembly source comparison seems to confirm that -- boolean-based
full and bit assembly sources are virtually identical with -O3 and newer
g++ versions, while bit fields show a lot more assembly operations with
-O0 (both diffs attached). Assembly is well beyond my expertise though.
At -O3 G++ is optimizing for speed at the expense of code size.
-O2 is probably a better comparison level and AFAIK the preferred level
for builds that need both high performance and small code size.
Am I testing this wrong or is it a case of YMMV? If it is "YMMV", should
we err on the side of simplicity and use simple bool where memory
savings are not important or nonexistent?
I think YMMV with the run-time measurement. I had to run the tests many
times to get an average variance range on the speed even at 100M loops.
In some runs the speed was 100ms out in the other direction, but only in
some; most were 50ms in favour of bool:1. The results also differed with
flag position, and between a struct 1 byte long and a struct with enough
flags to need 2 bytes.
I did not have time to look at the ASM, thank you for the details there.
If -O2 shows the same level of cycle reduction I think I will change my
tack...
we should be letting the optimizer handle the bitfields. BUT, we should still
take care to arrange the flags and members such that -O has an easy job
reducing them.
Amos