On 08.04.2013 21:02, Justin Chudgar wrote:
On Thursday, April 04, 2013 04:08:43 PM Justin Chudgar wrote:
I had experimentally thrown an optimization into my module's only
significantly warm functions. Since I am a novice, this was a
just-for-kicks experiment, but I would like to know whether to optimize at
all beyond the general "-O2", and what platforms are critical to consider
since I only use pulse on systems that are sufficient to run at "-O0"
without noticeable problems beyond unnecessary power consumption.
From another thread:
I'm not sure what to think about the __attribute__((optimize(3))) usage.
Have you done some benchmarking that shows that the speedup is
significant compared to the normal -O2? If yes, I guess we can keep
them. <tanuk>
I don't know what to think of them either. I did a really simplistic benchmark
with the algorithm on my Core i3 laptop initially to determine if it was
useful to keep everything double or float. There was no benefit to reducing
precision on this one system, but that attribute was dramatic. Did not try
O2, though, just O3 and O0. I thought about messing with vectorization, but
I only have x86-64 PCs and that seems most valuable for embedded devices
which I cannot test at the moment.
11: Determine optimization strategy for filter code.
http://github.com/justinzane/pulseaudio/issues/issue/11
_______________________________________________
pulseaudio-discuss mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/pulseaudio-discuss
Just some very simplistic benchmark results of
"__attribute__((optimize(#))) function()"
in code similar to a biquad filter:
optimize(0), 1867570825, 27.828974
optimize(1), 1017762024, 15.165836
optimize(2), 951896198, 14.184359
optimize(3), 952574300, 14.194463
This is for "memchunk" analogs of single channel 2^16 doubles being filtered
and averaged over 2^10 runs with forced cpu affinity. The benchmark itself was
compiled with -O0.
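
For readers who have not seen the attribute in use, here is a minimal sketch of a per-function optimization level on a biquad-style loop. The struct and function names are hypothetical and are not taken from the module under discussion. The attribute is GCC-specific, and the GCC manual describes it as intended for debugging rather than production use, so treat this as an illustration only.

```c
#include <stddef.h>

/* Hypothetical coefficient/state struct; not PulseAudio's actual types. */
typedef struct {
    double b0, b1, b2, a1, a2; /* normalized coefficients (a0 == 1) */
    double z1, z2;             /* filter state between samples */
} biquad;

/* GCC-specific: request -O3 for this one function, whatever the
 * command-line optimization level is. Other compilers skip the attribute. */
#if defined(__GNUC__) && !defined(__clang__)
__attribute__((optimize(3)))
#endif
void biquad_process(biquad *f, const double *in, double *out, size_t n)
{
    /* Transposed direct form II, one output sample per input sample. */
    for (size_t i = 0; i < n; i++) {
        double x = in[i];
        double y = f->b0 * x + f->z1;
        f->z1 = f->b1 * x - f->a1 * y + f->z2;
        f->z2 = f->b2 * x - f->a2 * y;
        out[i] = y;
    }
}
```

The loop carries its state in z1 and z2 from one sample to the next, which limits how far a compiler can vectorize it.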
With the supporting code compiled -O2, the numbers are:
optimize(0), 1436955156, 21.412300
optimize(1), 1020384309, 15.204911
optimize(2), 952980992, 14.200523
optimize(3), 952473365, 14.192959
Not much difference there.
With the benchmark compiled -O3, there is a DRASTIC change:
optimize(0), 1442046736, 21.488171
optimize(1), 1017924249, 15.168253
optimize(2), 954029138, 14.216142
optimize(3), 374432, 0.005579
That was such a freakish improvement, that I ran it several times, but the
results are quite reliable on my dev system.
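
A quick sanity check on the figures: in every row the second column equals the first divided by 2^26, the number of samples processed (2^16 samples times 2^10 runs). The unit of the first column is not stated in this thread. The sketch below redoes that arithmetic; the function names are made up for illustration.

```c
#include <stdint.h>

/* Samples per measurement, taken from the description of the benchmark:
 * 2^16 samples per chunk, averaged over 2^10 runs. */
#define SAMPLES_PER_CHUNK (UINT64_C(1) << 16)
#define RUNS              (UINT64_C(1) << 10)

/* Convert a first-column total into a per-sample figure (second column). */
double per_sample(uint64_t total)
{
    return (double)total / (double)(SAMPLES_PER_CHUNK * RUNS);
}

/* Ratio of two totals, e.g. optimize(2) versus optimize(3). */
double speedup(uint64_t baseline_total, uint64_t total)
{
    return (double)baseline_total / (double)total;
}
```

With the -O3 numbers, speedup(954029138, 374432) is a little over 2500, and the per-sample figure for optimize(3) is under one hundredth of whatever unit the timer counts in.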
This seems wrong. Does the code still execute *correctly*? Does it even
run the benchmark at all at -O3? I suspect -O3 optimized large sections
of code away, which may (or may not) produce incorrect code, perhaps
because the benchmark code relies on undefined behavior or on a bug
in gcc.
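
One common way to rule out the "optimized away" case is to make every run feed an observable result, so the compiler cannot treat the filtering as dead code. Here is a minimal sketch of that idea. I have not seen the actual benchmark source, so the one-pole filter and all names here are hypothetical stand-ins.

```c
#include <stddef.h>

/* Volatile sink: every store to it must actually be emitted. */
volatile double bench_sink;

/* Hypothetical one-pole low-pass standing in for the real filter.
 * Returns the last output sample and keeps its state across calls. */
static double one_pole(double *state, const double *in, double *out, size_t n)
{
    double s = *state;
    for (size_t i = 0; i < n; i++) {
        s += 0.5 * (in[i] - s);
        out[i] = s;
    }
    *state = s;
    return s;
}

/* Run the filter `runs` times. Each run starts from the state left by the
 * previous one, and its result goes into a checksum that is both stored to
 * the volatile sink and returned, so no run can be dropped or merged. */
double run_benchmark(const double *in, double *out, size_t n, unsigned runs)
{
    double state = 0.0;
    double checksum = 0.0;
    for (unsigned r = 0; r < runs; r++) {
        checksum += one_pole(&state, in, out, n);
        bench_sink = checksum;
    }
    return checksum;
}
```

Printing the checksum at every optimization level and comparing the values would also answer the correctness question: if optimize(3) gives a different number, or the timing collapses only when the result is unused, the benchmark is the problem rather than the filter.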
Best regards.