This patchset optimizes the "swap bytes within words" instructions on the arm, cris and mips targets. It started with Philippe Mathieu-Daudé's patchset optimizing TCG code by using the extract op. Looking at that patch, I found that the aarch64 rev16 function could be optimized further. Richard Henderson then suggested an even more optimized version.
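
For readers unfamiliar with the operation, here is a minimal plain C sketch of what a rev16-style byte swap computes, using a common mask-and-shift formulation. It is not taken from the patches and does not use the TCG API; the function names rev16_32 and rev16_64 are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Swap the two bytes inside each 16-bit halfword of a 32-bit value.
 * Illustrative only: not the code from this series. */
uint32_t rev16_32(uint32_t x)
{
    const uint32_t mask = 0x00ff00ffu;

    /* High byte of each halfword moves down, low byte moves up. */
    return ((x >> 8) & mask) | ((x & mask) << 8);
}

/* Same idea for a 64-bit value, covering four halfwords at once. */
uint64_t rev16_64(uint64_t x)
{
    const uint64_t mask = 0x00ff00ff00ff00ffull;

    return ((x >> 8) & mask) | ((x & mask) << 8);
}
```

With this formulation all halfwords are handled with two shifts, two masks and one OR, regardless of how many halfwords the register holds.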

Aurelien Jarno (4):
  target/arm: optimize aarch32 rev16
  target/arm: simplify and optimize aarch64 rev16
  target/cris: optimize swap
  target/mips: optimize WSBH, DSBH and DSHD

 target/arm/translate-a64.c | 24 ++++++------------------
 target/arm/translate.c     |  6 ++++--
 target/cris/translate.c    | 15 +++++++--------
 target/mips/translate.c    | 18 ++++++++++++------
 4 files changed, 29 insertions(+), 34 deletions(-)

-- 
2.11.0