On Thursday, April 04, 2013 04:08:43 PM Justin Chudgar wrote:
> I had experimentally thrown an optimization into my module's only
> significantly warm functions. Since I am a novice, this was a
> just-for-kicks experiment, but I would like to know whether to optimize at
> all beyond the general "-O2", and what platforms are critical to consider
> since I only use pulse on systems that are sufficient to run at "-O0"
> without noticeable problems beyond unnecessary power consumption.
>
> From another thread:
> > I'm not sure what to think about the __attribute__((optimize(3))) usage.
> > Have you done some benchmarking that shows that the speedup is
> > significant compared to the normal -O2? If yes, I guess we can keep
> > them. <tanuk>
>
> I don't know what to think of them either. I did a really simplistic
> benchmark with the algorithm on my Core i3 laptop, initially to determine
> whether it was useful to keep everything double or float. There was no
> benefit to reducing precision on this one system, but that attribute was
> dramatic. I did not try -O2, though, just -O3 and -O0. I thought about
> messing with vectorization, but I only have x86-64 PCs, and that seems most
> valuable for embedded devices, which I cannot test at the moment.
>
> 11: Determine optimization strategy for filter code.
> http://github.com/justinzane/pulseaudio/issues/issue/11
>
Just some very simplistic benchmark results of
"__attribute__((optimize(#)))" applied to a function similar to a biquad
filter. The second column is the raw total, and the third is that total
divided by the 2^26 samples processed:
optimize(0), 1867570825, 27.828974
optimize(1), 1017762024, 15.165836
optimize(2), 951896198, 14.184359
optimize(3), 952574300, 14.194463
This is for "memchunk" analogs of 2^16 single-channel doubles being filtered,
averaged over 2^10 runs with forced CPU affinity. The benchmark itself was
compiled with -O0.
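For reference, the kind of function under test looks roughly like this (an
illustrative sketch, not the actual module code; the struct and the names are
made up):

/* Illustrative sketch: a transposed direct-form-II biquad over a block of
   doubles, with the per-function attribute being benchmarked. */
#include <stddef.h>

struct biquad_state {
    double b0, b1, b2, a1, a2; /* filter coefficients */
    double z1, z2;             /* delay-line state */
};

__attribute__((optimize(3)))
static void biquad_run(struct biquad_state *s, const double *in,
                       double *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        double x = in[i];
        double y = s->b0 * x + s->z1;
        s->z1 = s->b1 * x - s->a1 * y + s->z2;
        s->z2 = s->b2 * x - s->a2 * y;
        out[i] = y;
    }
}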
With the supporting code compiled with -O2, the numbers are:
optimize(0), 1436955156, 21.412300
optimize(1), 1020384309, 15.204911
optimize(2), 952980992, 14.200523
optimize(3), 952473365, 14.192959
Not much difference there.
With the benchmark compiled with -O3, there is a DRASTIC change:
optimize(0), 1442046736, 21.488171
optimize(1), 1017924249, 15.168253
optimize(2), 954029138, 14.216142
optimize(3), 374432, 0.005579
That was such a freakish improvement that I ran it several times, but the
results are quite consistent on my dev system.
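For context, the harness itself is just a pin-to-one-CPU, time-the-loop
affair, roughly like the following (again a simplified sketch rather than the
exact code; it assumes the biquad_run()/biquad_state sketch above is in the
same file):

/* Simplified harness sketch: force CPU affinity, run the filter 2^10 times
   over 2^16 doubles, and print the total and per-sample figures.
   Build with something like:
       gcc -D_GNU_SOURCE -O0 bench.c -o bench -lrt
   (-lrt is only needed for clock_gettime() on older glibc). */
#include <sched.h>
#include <stdio.h>
#include <time.h>

#define CHUNK (1 << 16) /* 2^16 doubles per "memchunk" analog */
#define RUNS  (1 << 10) /* 2^10 runs */

static double in[CHUNK], out[CHUNK]; /* input left at zero; timing only */

int main(void)
{
    cpu_set_t set;
    struct biquad_state s = { 0.2, 0.4, 0.2, -0.5, 0.1, 0.0, 0.0 };
    struct timespec t0, t1;
    long long total = 0;
    int r;

    /* pin to one CPU so migration does not pollute the timings */
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    sched_setaffinity(0, sizeof(set), &set);

    for (r = 0; r < RUNS; r++) {
        clock_gettime(CLOCK_MONOTONIC, &t0);
        biquad_run(&s, in, out, CHUNK);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        total += (t1.tv_sec - t0.tv_sec) * 1000000000LL
               + (t1.tv_nsec - t0.tv_nsec);
    }

    printf("%lld, %f\n", total, (double) total / ((double) CHUNK * RUNS));
    return 0;
}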
Replacing the optimize(#) with hot, and using -O3 for everything, gives:
hot, 310780, 0.004631
And removing the __attribute__ altogether, again using -O3 for everything,
gives:
<NONE>, 333013, 0.004962
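(For completeness, the hot variant is just the same sketch with the attribute
swapped and the whole file built with -O3:)

/* Same sketch as above, but marked hot instead of per-function optimize(#);
   the whole file is then compiled with -O3. */
__attribute__((hot))
static void biquad_run(struct biquad_state *s, const double *in,
                       double *out, size_t n)
{
    /* same body as in the optimize(3) sketch above */
}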
Being generally a novice using a VERY simplistic wrapper around a rather
simple function, I'm loath to draw too many conclusions. However, this
suggests that it might be worth using __attribute__((hot)) on any serious
number-crunching functions within pulse and adopting -O3 as the standard
compiler flag.
If I can figure out oprofile or something similar, I'll try to test. I'd also
like to hear general feedback about this since I'm just learning. Thanks, all.
Justin
_______________________________________________
pulseaudio-discuss mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/pulseaudio-discuss