Alexvod tried pr31849-patch from http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31849. It reduced the result size from 72 to 68 bytes for the test case below, but it still did not eliminate the loop counter.
The following code

// compilation options: -march=armv5te -mthumb -Os
struct tree_block
{
  unsigned handler_block_flag:2;
  unsigned block_num:30;
};

static int next_block_index = 2;

void number_blocks (int n_blocks, struct tree_block **block_vector)
{
  int i;
  for (i = 0; i < n_blocks; ++i)
    ((block_vector[i])->block_num) = next_block_index++;
}

is compiled by gcc 4.4.0 in a very inefficient way. gcc 4.2.1 compiles it to 48 bytes, and gcc 4.4.0 to 72 bytes (1.5 times bigger).

Analysis of the assembly files shows the following problems:

1) Operations on bitfields are done inefficiently. gcc-4.2.1 sets block_num by LSLing the value and ORRing it. gcc-4.4.0 loads an extra constant 0x3fffffff from memory and does an AND in addition to that LSL and ORR.

2) gcc-4.4.0 doesn't eliminate the loop counter i. It increments both block_vector and i. gcc-4.2.1 instead computes the end of the loop and increments only block_vector.

3) Register allocation performs badly: it spills some registers to the stack, which causes extra LDR and STR operations in the loop body.

The code was taken from the GCC SPEC benchmark.

--
           Summary: Code bloating on operations with bit fields
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: sliao at google dot com
 GCC build triplet: i686-linux
  GCC host triplet: i686-linux
GCC target triplet: arm-eabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42501
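
For reference, here is a hand-written C sketch of the shape that points 1) and 2) attribute to gcc-4.2.1. It is only an illustration, not compiler output. The names packed_block, set_block_num, get_block_num, next_index and number_blocks_by_hand are hypothetical, and the sketch assumes handler_block_flag sits in bits 0-1 of a 32-bit word and block_num in bits 2-31. It uses an explicit word instead of the real bitfield so that it does not depend on the target's bitfield layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct tree_block: one 32-bit word.
   Assumed layout: bits 0-1 = handler_block_flag, bits 2-31 = block_num. */
typedef struct { uint32_t word; } packed_block;

static uint32_t next_index = 2;

/* Point 1: store a 30-bit field with a shift and an OR.
   Shifting left by 2 already discards the top two bits of value,
   so no separate AND with 0x3fffffff is needed on the value. */
static void set_block_num(packed_block *b, uint32_t value)
{
    b->word = (b->word & 0x3u) | (value << 2);
}

static uint32_t get_block_num(const packed_block *b)
{
    return b->word >> 2;
}

/* Point 2: no separate counter i. The end pointer is computed once
   and only block_vector is incremented inside the loop. */
void number_blocks_by_hand(int n_blocks, packed_block **block_vector)
{
    assert(block_vector != NULL);
    if (n_blocks <= 0)
        return;

    packed_block **end = block_vector + n_blocks;
    for (; block_vector != end; ++block_vector)
        set_block_num(*block_vector, next_index++);
}
```

Whether a given GCC version produces this shape from the original source depends on its bitfield lowering and induction variable optimization, so the sketch should be read as a description of the expected result rather than as a workaround.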