This patch series reworks the liveness analysis and the register allocator in order to generate more optimized code, by avoiding a lot of move instructions. I have measured a 9% performance improvement in user mode and 4% in system mode.
The idea behind this patch series is to free registers as soon as the temps are not used anymore, instead of waiting for the end of a basic block or an op with side effects. In addition, temps are copied to memory as soon as they are not going to be written anymore. This way even globals can be marked as "dead", avoiding moves to a new register when inputs and outputs are aliased. Finally, qemu_ld/st operations do not save globals back to memory, but only copy them there. In case of an exception the globals have the correct values, and otherwise they do not have to be reloaded.

Overall this greatly reduces the number of moves emitted and spreads them all over the TBs, improving performance on in-order CPUs. This also reduces register spilling, especially on CPUs with few registers.

In practice it means the liveness analysis provides more information to the register allocator, in particular when to sync the memory version of a temp with the content of the associated register. This means that the two are now quite tightly linked, and that for some functions the code exists in two versions: one used when the liveness analysis is enabled, which only does some checks with assert(), and the other used when it is disabled. It might be possible to keep only one version, but that implies de-optimizing the case where the liveness analysis is disabled. In any case the checks with assert() should be kept, as they are quite useful to make sure nothing subtly breaks.
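To illustrate the idea, here is a toy sketch of a backward liveness pass over a single basic block. It is not the code from the patches: ToyOp, toy_liveness() and the bit layout of dead_args are made up for the example, and it assumes at most one output and two inputs per op. For each op it records which arguments die there (so their registers can be freed right away) and whether the output has to be copied to memory immediately because it is a global that is not written again.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model, not the TCG data structures. */
enum { TOY_NB_TEMPS = 4, TOY_NO_TEMP = -1 };

typedef struct ToyOp {
    int out;           /* temp written by the op, or TOY_NO_TEMP */
    int in[2];         /* temps read by the op, or TOY_NO_TEMP */
    uint8_t dead_args; /* bit 0: output dead; bit 1 + k: input k dead */
    uint8_t sync_out;  /* nonzero: copy the output to memory right away */
} ToyOp;

/* Walk the block backwards and fill dead_args/sync_out for each op. */
static void toy_liveness(ToyOp *ops, int nb_ops, const uint8_t *is_global)
{
    uint8_t dead[TOY_NB_TEMPS]; /* 1: value is not read later in the block */
    uint8_t mem[TOY_NB_TEMPS];  /* 1: memory copy must be valid later */
    int i, k;

    /* At the end of the block nothing is read anymore,
       but globals must have their value in memory. */
    memset(dead, 1, sizeof(dead));
    memcpy(mem, is_global, sizeof(mem));

    for (i = nb_ops - 1; i >= 0; i--) {
        ToyOp *op = &ops[i];

        op->dead_args = 0;
        op->sync_out = 0;

        if (op->out != TOY_NO_TEMP) {
            assert(op->out >= 0 && op->out < TOY_NB_TEMPS);
            if (dead[op->out]) {
                op->dead_args |= 1;
            }
            if (mem[op->out]) {
                op->sync_out = 1;
            }
            /* Before this op, the old value is dead and needs no sync. */
            dead[op->out] = 1;
            mem[op->out] = 0;
        }

        /* First record the inputs that die in this op... */
        for (k = 0; k < 2; k++) {
            int t = op->in[k];
            if (t != TOY_NO_TEMP) {
                assert(t >= 0 && t < TOY_NB_TEMPS);
                if (dead[t]) {
                    op->dead_args |= 1 << (1 + k);
                }
            }
        }
        /* ...then mark them live for the preceding ops. */
        for (k = 0; k < 2; k++) {
            if (op->in[k] != TOY_NO_TEMP) {
                dead[op->in[k]] = 0;
            }
        }
    }
}
```

With two globals g (temp 0) and h (temp 1) and a plain temp t (temp 2), the block "t = g + h; g = t + h" marks g as a dead input in the first op, since it is overwritten before being read again. Its register can then be reused for the output without a move. The second op has all its arguments dead and its output flagged for an immediate sync to memory.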
v1 -> v2
--------
- patch 3: changed TEMP_VAL_CONST to fallthrough into TEMP_VAL_REG
- patch 7: fixed ts->mem_coherent = 1 in const case
           factorized ots->fixed_reg & ts->val_type == TEMP_VAL_CONST case
             in the first if block
           changed comments for the first block
- patch 10: move wrong hunk to patch 7
- patch 12: new patch
- patch 14: should be considered as a new patch
- patches 15-26: new patches

Aurelien Jarno (26):
  tcg: add temp_dead()
  tcg: add tcg_reg_sync()
  tcg: add temp_sync()
  tcg: sync output arguments on liveness request
  tcg: rework liveness analysis
  tcg: improve tcg_reg_alloc_movi()
  tcg: rewrite tcg_reg_alloc_mov()
  tcg: always mark dead input arguments as dead
  tcg: start with local temps in TEMP_VAL_MEM state
  tcg: don't explicitly save globals and temps
  tcg: fix some op flags
  tcg: forbid ld/st function to modify globals
  tcg: synchronize globals for ops with side effects
  tcg: rework TCG helper flags
  target-alpha: rename helper flags
  target-arm: rename helper flags
  target-cris: rename helper flags
  target-i386: rename helper flags
  target-microblaze: rename helper flags
  target-mips: rename helper flags
  target-ppc: rename helper flags
  target-s390x: rename helper flags
  target-sh4: rename helper flags
  target-sparc: rename helper flags
  target-xtensa: rename helper flags
  tcg: remove compatiblity call flags

 target-alpha/helper.h      |  176 ++++++++---------
 target-arm/helper.h        |   18 +-
 target-cris/helper.h       |   18 +-
 target-i386/helper.h       |    4 +-
 target-microblaze/helper.h |    6 +-
 target-mips/helper.h       |  106 +++++------
 target-ppc/helper.h        |   38 ++--
 target-s390x/helper.h      |   76 ++++----
 target-sh4/helper.h        |    6 +-
 target-sparc/helper.h      |   50 ++---
 target-xtensa/helper.h     |   16 +-
 tcg/README                 |   22 ++-
 tcg/optimize.c             |    3 +-
 tcg/tcg-op.h               |   18 +-
 tcg/tcg-opc.h              |   29 ++-
 tcg/tcg.c                  |  453 +++++++++++++++++++++++++++-----------------
 tcg/tcg.h                  |   29 ++-
 17 files changed, 597 insertions(+), 471 deletions(-)

-- 
1.7.10.4