Re: [pulseaudio-discuss] "Hot" function optimization recommendations

Thomas Martitz Tue, 09 Apr 2013 00:49:34 -0700

On 08.04.2013 21:02, Justin Chudgar wrote:

On Thursday, April 04, 2013 04:08:43 PM Justin Chudgar wrote:

I had experimentally thrown an optimization into my module's only
significantly warm functions. Since I am a novice, this was a
just-for-kicks experiment, but I would like to know whether to optimize at
all beyond the general "-O2", and what platforms are critical to consider
since I only use pulse on systems that are sufficient to run at "-O0"
without noticeable problems beyond unnecessary power consumption.


From another thread:

I'm not sure what to think about the __attribute__((optimize(3))) usage.
Have you done some benchmarking that shows that the speedup is
significant compared to the normal -O2? If yes, I guess we can keep
them. <tanuk>

I don't know what to think of them either. I did a really simplistic benchmark
with the algorithm on my Core i3 laptop initially to determine if it was
useful to keep everything double or float. There was no benefit to reducing
precision on this one system, but that attribute was dramatic. Did not try
O2, though, just O3 and O0. I thought about messing with vectorization, but
I only have x86-64 PCs and that seems most valuable for embedded devices
which I cannot test at the moment.

11: Determine optimization strategy for filter code.
http://github.com/justinzane/pulseaudio/issues/issue/11


_______________________________________________
pulseaudio-discuss mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/pulseaudio-discuss

Just some very simplistic benchmark results of
        "__attribute__((optimize(#))) function()"
in code similar to a biquad filter:
        optimize(0), 1867570825, 27.828974
        optimize(1), 1017762024, 15.165836
        optimize(2), 951896198, 14.184359
        optimize(3), 952574300, 14.194463
This is for "memchunk" analogs of single channel 2^16 doubles being filtered
and averaged over 2^10 runs with forced cpu affinity. The benchmark itself was
compiled with -O0.
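
For readers unfamiliar with the attribute, here is a minimal sketch of how a
per-function optimization level can be attached to a biquad-style loop with
GCC. The struct, the FILTER_OPT macro and biquad_run() are illustrative names,
not the code that was benchmarked; the macro expands to nothing on compilers
other than GCC, since the attribute is a GCC extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical filter state; names are illustrative, not PulseAudio API. */
struct biquad {
    double b0, b1, b2, a1, a2;   /* coefficients, a0 assumed normalised to 1 */
    double x1, x2, y1, y2;       /* previous two inputs and outputs */
};

/* Apply the per-function optimization level only where GCC supports it. */
#if defined(__GNUC__) && !defined(__clang__)
#define FILTER_OPT __attribute__((optimize(3)))
#else
#define FILTER_OPT
#endif

/* Direct form I biquad over n samples, compiled at -O3 under GCC
 * regardless of the level used for the rest of the translation unit. */
FILTER_OPT
void biquad_run(struct biquad *f, const double *in, double *out, size_t n)
{
    assert(f != NULL && in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        double x = in[i];
        double y = f->b0 * x + f->b1 * f->x1 + f->b2 * f->x2
                 - f->a1 * f->y1 - f->a2 * f->y2;
        f->x2 = f->x1;
        f->x1 = x;
        f->y2 = f->y1;
        f->y1 = y;
        out[i] = y;
    }
}
```

Changing the argument of optimize() and rebuilding is one way to produce the
four rows in each table without touching the flags of the surrounding code.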

With the supporting code compiled -O2, the numbers are:
        optimize(0), 1436955156, 21.412300
        optimize(1), 1020384309, 15.204911
        optimize(2), 952980992, 14.200523
        optimize(3), 952473365, 14.192959
Not much difference there.

With the benchmark compiled -O3, there is a DRASTIC change:
        optimize(0), 1442046736, 21.488171
        optimize(1), 1017924249, 15.168253
        optimize(2), 954029138, 14.216142
        optimize(3), 374432, 0.005579
That was such a freakish improvement, that I ran it several times, but the
results are quite reliable on my dev system.

This seems wrong. Does the code still execute *correctly*? Does it even run
the benchmark at all at -O3? I suspect -O3 optimized large sections of code
away, which may (or may not) produce incorrect code, perhaps because the
benchmark code relies on undefined behavior or a bug in gcc.
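
One way to check this is to make the benchmark consume the filter output, so
the compiler has a harder time discarding the work. The sketch below is only
an illustration under assumptions: smooth() is a stand-in one-pole filter, not
the benchmarked code, and bench_run() and bench_sink are hypothetical names.
The idea is to fold every output into a checksum and store it in a volatile
variable; if the checksum differs between optimization levels, the optimized
build is not doing the same work.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the filter under test (illustrative only): a one-pole
 * low-pass that also returns the sum of its outputs. */
static double smooth(const double *in, double *out, size_t n)
{
    double prev = 0.0, sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        prev = 0.5 * in[i] + 0.5 * prev;
        out[i] = prev;
        sum += prev;
    }
    return sum;
}

/* The store to a volatile object is an observable side effect, so the
 * value feeding it cannot simply be dropped as unused. */
volatile double bench_sink;

/* Run the filter `runs` times and return the accumulated checksum.
 * Timing code would wrap the loop; it is omitted here. */
double bench_run(const double *in, double *out, size_t n, unsigned runs)
{
    double checksum = 0.0;
    assert(in != NULL && out != NULL);
    for (unsigned r = 0; r < runs; r++)
        checksum += smooth(in, out, n);
    bench_sink = checksum;
    return checksum;
}
```

Printing the checksum next to each timing, and comparing it across the
optimize(0) to optimize(3) builds, would show whether the 0.005579 result
still corresponds to a full run of the filter.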


Best regards.