https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79748

            Bug ID: 79748
           Summary: [Enhancement] no_callee_saved_registers function
                    attribute (on x86)
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: katsunori.kumatani at gmail dot com
  Target Milestone: ---

This is not a bug but a feature request/enhancement (hopefully simple?).

On x86(-64), upcoming GCC has added the 'no_caller_saved_registers' function
attribute, which makes a function preserve every register it uses (e.g. for
interrupt handlers). This request is for an attribute with the *opposite*
effect: to not save *any* register that is used in a function, for better code
generation.

Rationale: GCC since version 5 does nice interprocedural optimizations if it
can see the function's definition. It knows which registers a function
clobbers and which it does not, so it knows which registers it does not have to
spill before the call and reload afterwards. For example, 'rcx' is a register
that the x86-64 ABI does not preserve across function calls, yet GCC can see
when a function does not clobber it and then skip the spill/reload in the
*caller*. Unfortunately, GCC will *always* save the registers that are
callee-saved by the ABI/calling convention within the *callee*, even if the
function is really only called from a place that doesn't use those registers at
all.
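
To make the situation above concrete, here is a minimal sketch. The names mix
and sum_mixed are made up for illustration, and the register choices mentioned
in the comments are what GCC *may* do at -O2 (where -fipa-ra is enabled), not
something this snippet guarantees.

```c
#include <stdint.h>

/* Hypothetical helper, kept out of line so that a real call is emitted.
   It needs only a few scratch registers, and because its definition is
   visible, GCC can record which registers it actually clobbers. */
__attribute__((noinline))
static uint64_t mix(uint64_t a, uint64_t b, uint64_t c)
{
    return (a ^ (b << 7)) + c;
}

/* Caller in the same compilation unit. The values v, n, i and acc are live
   across the call to mix(). Under the plain ABI rules they would have to sit
   in callee-saved registers (or be spilled), and sum_mixed() would then have
   to save and restore those registers itself. Knowing what mix() clobbers,
   GCC may keep some of them in call-clobbered registers instead. */
uint64_t sum_mixed(const uint64_t *v, unsigned n)
{
    uint64_t acc = 0;
    for (unsigned i = 0; i < n; i++)
        acc = mix(acc, v[i], i);
    return acc;
}
```

Comparing the output of 'gcc -O2 -S' with and without -fno-ipa-ra on code like
this is one way to see which registers end up saved where.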

How would this attribute help: GCC will *not* save any registers at all in a
function marked with this attribute, and callers will assume that every
register is clobbered by the call. On its own that may look like worse code,
but not once the interprocedural optimization is enabled. This really only
works well within the same compilation unit or with LTO (which works great).
Then GCC will *only* save/spill the registers it actually uses in the caller
and sees clobbered in the callee (because it sees its definition), which
results in better code especially for leaf functions with a lot of parameters.
It allows GCC more freedom to generate optimal code given the actual
circumstances of those function usages.

Obviously this attribute should *not* be used on functions that are externally
visible (or callbacks), or otherwise interface with other libraries/programs.
It's for better code generation within local function calls to the project
only. It's the responsibility of the programmer, of course, to use it properly.
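
The intended usage described above might look roughly like the sketch below.
NO_CALLEE_SAVED, blend and checksum are hypothetical names made up for this
example. The macro expands to the requested attribute only when the compiler
reports support for it via __has_attribute, and to nothing otherwise, so the
file builds either way; the attribute's exact semantics are assumed to be the
ones proposed in this report.

```c
#include <stdint.h>

/* Hypothetical wrapper macro: use the requested attribute if the compiler
   knows it, otherwise fall back to an ordinary function. */
#if defined(__has_attribute)
#  if __has_attribute(no_callee_saved_registers)
#    define NO_CALLEE_SAVED __attribute__((no_callee_saved_registers))
#  endif
#endif
#ifndef NO_CALLEE_SAVED
#  define NO_CALLEE_SAVED
#endif

/* Internal helper with many parameters: static, never exported and never
   passed around as a callback, so every caller sees its definition. */
NO_CALLEE_SAVED __attribute__((noinline))
static uint64_t blend(uint64_t a, uint64_t b, uint64_t c,
                      uint64_t d, uint64_t e, uint64_t f)
{
    uint64_t t0 = a * b + c;
    uint64_t t1 = d * e + f;
    return t0 ^ t1;
}

/* Externally visible entry point: left unmarked, so it keeps following the
   normal ABI for callers outside this file. */
uint64_t checksum(const uint64_t *v, unsigned n)
{
    uint64_t acc = 0;
    for (unsigned i = 0; i < n; i++)
        acc = blend(acc, 3, v[i], i, 5, 7);
    return acc;
}
```

The split is the point of the sketch: only the internal helper gives up the
callee-saved convention, while the exported function still looks like any
other ABI-conforming function from the outside.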

Please keep in mind that the interprocedural optimizations seem to already be
in place (seeing which register gets used across functions) so I'm not asking
to implement any optimizations of any kind here. I'm simply asking if it's easy
and feasible to implement an attribute to indicate whether a function does
*not* save any registers at all, and let GCC's existing optimizations handle
the rest. Yes it deviates from the ABI but the ABI doesn't have to be followed
within an internal/local function call of a compilation unit, because they
don't interface with anything else except the code (which GCC sees with LTO!).
That's the whole point of interprocedural optimizations anyway.


Please understand I am not claiming this would result in extreme runtime
difference, just overall slightly better general code. I am *hoping* it would
be an easy addition because we already have the no_caller_saved_registers
attribute and this is practically the total opposite of that.

Plus you do *not* have to worry about x87/SSE/AVX registers at all because they
already don't get saved anyway. If that's not the case and it would actually be
difficult to add, I can understand any hesitation to implement it.

I would try to submit a patch myself and see whether it's easily doable, but
unfortunately I am not familiar with GCC's internals (to be honest the
internals documentation is pretty hard to follow, unlike Clang's; any extra
advice about this would be helpful too). LLVM, for instance, has a lot more
calling conventions than GCC, and it has something similar to this: the "cc 11"
HiPE calling convention (High Performance Erlang), which has no callee-saved
registers at all on x86 (it also passes more parameters in registers, among
other things, since it's a full-blown calling convention, but that's a
different subject of course).

Would it be simple to add a 'no_callee_saved_registers' attribute?
It would definitely have its uses if so.


(One last thing: I'm aware of the -fcall-used-reg command-line options, but
they apply to a whole file rather than to a single function. That makes them
hard to use in a source file that has an externally visible function, which
must follow the ABI, alongside "helper" functions that don't get inlined
(because they are used often and are not trivial)... so most people won't
bother at all, which is a shame. The options also apply to calls made from the
functions compiled with them, which basically means GCC will assume that any
function called clobbers those registers. That can hurt code generation if
you have to call a library function. That's why I'm requesting a function
attribute that applies ONLY to the function it is marked with, or at least a
way to restrict '-fcall-used-reg' to one function (and not to the functions it
calls). I hope that makes it clear!)
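
A small sketch of the problem with the command-line route. The file name
helpers.c, the choice of rbx and r12, and the function names are only examples;
the build line is shown in a comment and is not needed to compile the code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical build line for this file (say, helpers.c):
 *
 *     gcc -O2 -fcall-used-rbx -fcall-used-r12 -c helpers.c
 *
 * The options are per file: they affect every function defined below and
 * every call made from them, not just the helper they were meant for.
 */

/* The helper one would actually like to relieve of saving registers. */
__attribute__((noinline))
static size_t count_char(const char *s, size_t len, char c)
{
    size_t hits = 0;
    for (size_t i = 0; i < len; i++)
        if (s[i] == c)
            hits++;
    return hits;
}

/* Externally visible function. Callers in other files expect rbx and r12 to
   survive this call, but with the options above this function is no longer
   required to preserve them. In addition, GCC has to assume that strlen()
   clobbers both registers, although the library version preserves them. */
size_t count_spaces(const char *s)
{
    size_t len = strlen(s);
    return count_char(s, len, ' ');
}
```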
