Hi all,

This patch series attempts to improve code generation on arm and aarch64 for various bitwise operations that can be expressed with the rev16 instruction on those architectures. In particular, expressions of the form:
((x & 0x00ff00ff) << 8) | ((x & 0xff00ff00) >> 8)

This pattern appears in places such as the Linux kernel and can be mapped directly to a single rev16 instruction.
This series has 3 parts:

[1/3] Add a new field to the rtx costs tables to represent the latency of the rev* group of instructions, which will be used to model the cost of these operations accurately. Use it to properly cost the existing patterns that generate rev16 (for bswap operations).

[2/3] Add aarch64 combine patterns to recognise the above bitwise operations and map them to rev16. Model the cost appropriately and add helper functions that can be reused by the arm backend.

[3/3] Define similar combine patterns for arm and reuse the helper functions introduced in patch 2/3 to properly cost them.

I'm proposing these for the next stage 1, of course.

Thanks,
Kyrill
