On 17 December 2017 at 15:45, Lubomir I. Ivanov <[email protected]> wrote:
> On 17 December 2017 at 14:38, Berthold Stoeger
> <[email protected]> wrote:
>> Hi Robert,
>>
>> On Sunday, 17 December 2017 10:03:22 CET Robert Helling wrote:
>>> Hi,
>>>
>>> > On 16. Dec 2017, at 20:29, Lubomir I. Ivanov <[email protected]> wrote:
>>> >
>>> > so yes, "float" is a non-default-sized type, like "short": one's
>>> > variable might fit in a "short", and it's perfectly fine to use the
>>> > non-default type.
>>> > while nowadays we don't have much of a size constraint pushing us
>>> > toward the smaller type, using it can be a good indication that we
>>> > don't actually need the full 64-bit precision for this variable,
>>> > the same way one could use a "char" instead of an "int".
>>> >
>>> > on the other hand, if this causes more trouble than benefit, it's
>>> > probably best to just convert all the calculations to use "double",
>>> > have consistency, and silence the -Wall warnings.
>>>
>>> from my very limited understanding: RAM is indeed big enough, so there
>>> is no size constraint coming from there. But what is actually finite
>>> is CPU cache size. So when the doubling in memory footprint makes the
>>> difference between some calculation fitting in the cache and the CPU
>>> having to fetch the data from RAM several times, performance will
>>> differ hugely. Of course I have no idea at all whether this is of any
>>> concern to us (not having done any significant performance tuning
>>> ever). Plus there are the vector instructions, where the CPU performs
>>> a number of floating-point operations on different data in parallel:
>>> there, twice as many operations are done at once for float as for
>>> double (since twice as many floats fit in the registers). Again, no
>>> idea at all whether the compiler generates this parallelism for us at
>>> any critical point.
>>
>> This is of course all correct, but none of the proposed changes is
>> anywhere close to a tight loop.
>> In some cases it's only a local variable, which is not even spilled to
>> RAM.
>>
>> I tried to look at the assembler output, and the first thing I noticed
>> is that we don't compile with -O2. Is that on purpose?
>>
>> Anyway, with -O3, the first case in plannernotes.h produces better
>> code without the cast to float. You could of course get the same
>> result by replacing 1000.0 with 1000.0f, but that seems rather
>> obscure.
>>
>> I removed two cases concerning binary representation from the patch
>> and made a PR. Perhaps we can get Linus to have a look at it.
>>
>
> we don't optimize in the C code intentionally, so that the code stays
> more readable. there are opportunities for that in many places, to
> help the compiler out, but we just don't take them, for the sake of
> readability.
>
> at the same time, i don't have a good answer about the -Oxx flags,
> i.e. why we don't use them. i have a vague memory that this was
> discussed back in the GTK days.
> from my experience the -Oxx flags can definitely improve areas with
> heavy operations. the only problem i see is that one should probably
> enable the -Oxx flags for both release and debug builds, to catch
> possible compiler bugs too.
>
if we stick to GCC we can also use per-function optimizations:
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes

see:
- hot
- optimize

clang also partially supports the GCC attribute namespace.

lubomir

--
_______________________________________________
subsurface mailing list
[email protected]
http://lists.subsurface-divelog.org/cgi-bin/mailman/listinfo/subsurface
