I agree that optimization needs to be optional. Not only because it slows down compilation time, but also because of possible bugs (operators &&, || and ? generate jumps inside expressions and might rely on longer generated code variants)
There are numerous patterns that can be simplified 1. This is mentioned in my first post. In a sequences like 5+a+b (sum of two variables as a part of larger expression) we remark patterns like MOV ECX,[address] ADD EAX,ECX (other registers are also possible) it can be replaced with ADD EAX,[address] Similary, there are replaceable patterns MOV ECX,[address] AND EAX,ECX MOV ECX,[address] OR EAX,ECX MOV ECX,[address] XOR EAX,ECX MOV ECX,[address] SUB EAX,ECX 2) Comparison expressions a==800 is translated as MOV EAX,[address] CMP EAX,800 MOV EAX,0 SETE AL we can save 2 bytes by replacing last two instructions with SETE AL MOVZX EAX,AL if constant we compare with is larger than 255, we can save 2 more bytes by replacing first two instructions with CMP DWORD[address],800 3) Single instruction which occurs in almost every loop i++ generates MOV EAX,[address] MOV ECX,EAX INC EAX MOV [address],EAX and we can save 9 bytes by replacing it into INC DWORD [address] 4) Conditional expression: a==5?1:2 generates CMP EAX,5 MOV EAX,0 SETE AL CMP EAX,0 JE S1 JMP S2 S1: MOV EAX,2 JMP S3 S2: MOV EAX,1 S3: We can save 16 bytes for similar cases: CMP EAX,5 JE S2 S1: MOV EAX,2 JMP S3 S2: MOV EAX,1 S3: 5) Constant assignement a=10 generates MOV EAX,10 MOV [address],EAX One byte can be saved by replacing it with MOV DWORD [address],10 * but beware of conditional expressions 6) Function prologue generates PUSH EBP MOV EBP,ESP SUB ESP,8 NOP we can save 6 bytes with ENTER 0,8 (however code latency will increase from 3 to 12 cycles) 7) Constant and variable parameter passing to function p(2) parameter is passed using two instructions MOV EAX,2 PUSH EAX One byte saving with single instruction PUSH 2 8) Variable is already in register: Sometimes (for example in a=expression; p(a) we can see the code MOV [address],EAX MOV EAX,[address] Second instruction is redundant (6 bytes save) but ... * beware of goto to the second instruction On Sun, May 24, 2020 at 7:16 AM Christian Jullien <eli...@orange.fr> wrote: > Hi, there is certainly a tradeoff between compilation time and execution > time. > > A possible way to add this peephole optimization and possibly others we’ll > find later it to include this code only if –O2 is supplied. > > Now, removing 400 similar sequences like this is certainly something we > would like to have and on other backend if it applies. > > > > How long it makes tcc slower when compiling with and without this option? > > > > Wdyt? > > > > *From:* Tinycc-devel [mailto:tinycc-devel-bounces+eligis= > orange...@nongnu.org] *On Behalf Of *Samir Ribic > *Sent:* Sunday, May 24, 2020 01:09 > *To:* tinycc-devel@nongnu.org > *Subject:* [Tinycc-devel] Simple peephole optimization > > > > Tcc was never intended to be a real optimizing compiler, because it is > focused on compilation speed and memory savings. However, it seems that > some improvements on generated code can be done, without significant > compiler speed degradation.As tcc generates machine code, rather than > assembly, pattern searching for peephole optimization might be quite fast. > > > > I remarked that sequences in 386 generated code like this (Intel syntax) > > MOV ECX,[memory location] > > ADD EAX,ECX > > appear quite often, and they can be replaced with > > ADD EAX, [memory location] > > which saves two bytes. Together with variants using AND, OR, XOR, ADC, SUB > and SBB instructions, ]iIn a generated code from tcc.c there are about 400 > similar sequences. > > > > To do the replacement In a file i386-gen.c I have changed the function > > void gen_opi(int op) about 20 lines below the label gen_op8:. > > ..... > > } else { > gv2(RC_INT, RC_INT); > r = vtop[-1].r; > fr = vtop[0].r; > o((opc << 3) | 0x01); > o(0xc0 + r + fr * 8); > // Added code****** > unsigned char * peep; > peep=cur_text_section->data+ind; > > // op with global var > > if (peep[-8] == 0x8B && /* MOV reg,regm */ > ((peep[-7] & 0xC7) == 5) && /* modrm= [disp32] */ > ((peep[-7] & 0x38) == (peep[-1] & 0x38 )) /* dest reg of > first==srcreg of second */ > ) { > peep[-8]=(opc << 3) | 0x03; /* first instruction change > mov to op and swap registers */ > peep[-7] &= 0xC7; /* clear reg field of first > instruction */ > peep[-7] |= r*8; /* set reg field of first instruction > */ > ind -= 2; /* two bytes saved */ > } > // op with local var > else { > if (peep[-5] == 0x8B && /* MOV reg,regm */ > ((peep[-4] & 0xC7) == 0x45) && /* modrm= [EBP+disp8] */ > ((peep[-4] & 0x38) == (peep[-1] & 0x38 )) /* dest reg of > first==srcreg of second */ > ) { > peep[-5]=(opc << 3) | 0x03; /* first instruction change > mov to op and swap registers */ > peep[-4] &= 0xC7; /* clear reg field of first > instruction */ > peep[-4] |= r*8; /* set reg field of first instruction > */ > ind -= 2; /* two bytes saved */ > } > > } > //************************* > } > > > > > _______________________________________________ > Tinycc-devel mailing list > Tinycc-devel@nongnu.org > https://lists.nongnu.org/mailman/listinfo/tinycc-devel >
_______________________________________________ Tinycc-devel mailing list Tinycc-devel@nongnu.org https://lists.nongnu.org/mailman/listinfo/tinycc-devel