[fpc-devel] More peephole optimisation questions
Hi everyone,

So this is another question on peephole optimisation for x86_64. Occasionally you get situations where you write a load of constants to the stack. In this case it's part of an array parameter to a function call:

    movl $23199763,32(%rsp)
    movl $262149,36(%rsp)
    movl $33816983,40(%rsp)
    movl $36176315,44(%rsp)
    movl $50660102,48(%rsp)
    movl $65340390,52(%rsp)

x86_64 doesn't support writing a full 64-bit immediate directly to memory; you have to load it into a register first. With that in mind, is the following code faster?

    movq $1125921404878867,%rax
    movq %rax,32(%rsp)
    movq $155376089848611223,%rax
    movq %rax,40(%rsp)
    movq $280634838208545542,%rax
    movq %rax,48(%rsp)

I know there will be a pipeline stall between the first two instructions, but logic tells me that parallelisation, out-of-order execution and register renaming will ensure that loading %rax with the next immediate can happen at the same time as its previous value is being written to memory.

I know there are a lot of variables, like how smart the processor is and how many ALUs and AGUs are available, so that's why I'm after a second opinion before I start proposing an optimisation that's speculative at best.

If necessary, I could even do this (if the registers are available):

    movq $1125921404878867,%rax
    movq $155376089848611223,%rcx
    movq $280634838208545542,%rdx
    movq %rax,32(%rsp)
    movq %rcx,40(%rsp)
    movq %rdx,48(%rsp)

At the very least I'm pretty sure it's not worth it to concatenate a single pair of 32-bit immediates. For example, if it was just the first two:

    movl $23199763,32(%rsp)
    movl $262149,36(%rsp)

... it would not be worth it to transmute them into:

    movq $1125921404878867,%rax
    movq %rax,32(%rsp)

In the former case, the two stores can be executed in parallel and the only barrier is memory latency (almost all modern Intel CPUs have at least 2 AGUs), while the latter case introduces a dependency.

Gareth aka. Kit

P.S. In this case, the assembly language is generated by this parameter in aoptx86: "[A_CMP, A_TEST, A_BSR, A_BSF, A_COMISS, A_COMISD, A_UCOMISS, A_UCOMISD, A_VCOMISS, A_VCOMISD, A_VUCOMISS, A_VUCOMISD]"... this is part of the CMOV optimisations and is a load of instructions that are used for comparisons. If the opcode matches one of the above, the peephole optimizer will see if it's possible to position MOV instructions before the comparison instead of between the comparison and the conditional jump, as this works better for macro-fusion and makes it possible to turn "mov $0,%reg" into "xor %reg,%reg", which cannot be done if the FLAGS register is in use (XOR scrambles them). Moving the MOV before the comparison eliminates that problem.

___
fpc-devel maillist - fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
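The 64-bit immediates in the question above are the little-endian concatenation of each adjacent pair of 32-bit immediates (the store at the lower offset supplies the low half). The following is a minimal Python sketch of that arithmetic, not compiler code; the function names `combine_imm32_pair` and `merge_stores` are made up for illustration.

```python
# Sketch: merge adjacent 32-bit immediate stores into 64-bit ones.
# Assumes little-endian layout: the lower address holds the low dword.

def combine_imm32_pair(low: int, high: int) -> int:
    """Return the 64-bit value with `low` as low dword and `high` as high dword."""
    mask = 0xFFFFFFFF
    return ((high & mask) << 32) | (low & mask)

def merge_stores(stores):
    """Take (offset, imm32) stores in order; return (offset, size, value) tuples.

    Two consecutive stores are merged when the second offset is exactly
    4 bytes above the first. Anything left over stays a 4-byte store.
    """
    merged = []
    i = 0
    while i < len(stores):
        offset, imm = stores[i]
        if i + 1 < len(stores) and stores[i + 1][0] == offset + 4:
            merged.append((offset, 8, combine_imm32_pair(imm, stores[i + 1][1])))
            i += 2
        else:
            merged.append((offset, 4, imm))
            i += 1
    return merged

# (offset from %rsp, immediate) pairs copied from the movl sequence above.
stores = [(32, 23199763), (36, 262149), (40, 33816983),
          (44, 36176315), (48, 50660102), (52, 65340390)]

for offset, size, value in merge_stores(stores):
    print(offset, size, value)
```

Running it reproduces the three movq immediates from the message, which confirms the proposed sequence stores the same bytes. Whether it is faster is a separate question that this sketch does not address.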
Re: [fpc-devel] More peephole (Mantis)
On 22/01/2014 21:23, Florian Klämpfl wrote:
> Submit them to a bug report, I can look during the weekend into them.

Done: 0025584, 0025586, 0025587

http://bugs.freepascal.org/view.php?id=25584
http://bugs.freepascal.org/view.php?id=25586
http://bugs.freepascal.org/view.php?id=25587
Re: [fpc-devel] More peephole
On 23/01/2014 20:34, Florian Klaempfl wrote:
> Yes and no. It is extra code and extra code is always bad ;) and it requires a separate compiler run. I wouldn't waste effort in it.

Test cases are extra code too. ;) scnr

OK, I see what you mean. No problem. It was just an idea.
Re: [fpc-devel] More peephole
On 23.01.2014 21:15, Martin Frb wrote:
> On 23/01/2014 20:04, Florian Klämpfl wrote:
>> On 23.01.2014 20:52, Martin Frb wrote:
>>> On 23/01/2014 19:35, Florian Klämpfl wrote:
>>>> I think this is hard to achieve as well.
>>> Why?
>> I consider it as complicated and it covers only cases one can foresee. Some statistical analysis of benchmark timings and procedure sizes is imo much more general.
>
> OK, so we were talking about 2 different targets.
>
> You were talking (if I understand correctly) about a general test for any and all forms of regressions (with regard to speed or size, not with regard to function) in code generation.

Yes.

> This is indeed hard to test. Size may be doable by comparing to a known size that was once achieved. Size may increase or decrease, but then the tests need to be fixed (a decrease must be recorded, so that an increase from the new optimum will then be detected).
> Speed is indeed very hard, since even a benchmark may vary.

Yes, but having n=5 and doing daily benchmarking should enable one to identify trends.

> I was talking about checking for specific/known code snippets that are known to be inefficient (so anything that the peephole generator can/could detect). This is only a small subset of the possible speed/size regressions, but it is at least something.

Yes and no. It is extra code and extra code is always bad ;) and it requires a separate compiler run. I wouldn't waste effort in it.
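The idea of repeating a benchmark a few times per day and alerting on significant changes could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `is_regression`, the three-sigma rule and the 2% minimum relative change are all hypothetical choices, not anything FPC's test infrastructure is known to use.

```python
# Sketch: flag a benchmark (or procedure size) regression from repeated runs.
# Thresholds are illustrative assumptions, not values taken from the thread.
from statistics import mean, stdev

def is_regression(baseline, today, sigmas=3.0, min_rel=0.02):
    """Return True if today's mean is clearly worse than the baseline mean.

    `baseline` and `today` are lists of measurements (e.g. n=5 timings each).
    A regression is reported only if the increase exceeds both
    `sigmas` baseline standard deviations and `min_rel` of the baseline mean,
    so that ordinary run-to-run noise does not raise an alert.
    """
    base_mean = mean(baseline)
    base_sd = stdev(baseline)  # needs at least two baseline samples
    delta = mean(today) - base_mean
    return delta > sigmas * base_sd and delta > min_rel * base_mean

# Made-up example timings in seconds, five runs each.
baseline = [1.00, 1.01, 0.99, 1.00, 1.02]
noisy_but_fine = [1.00, 1.02, 1.01, 0.99, 1.01]
clearly_slower = [1.10, 1.11, 1.09, 1.10, 1.12]

print(is_regression(baseline, noisy_but_fine))
print(is_regression(baseline, clearly_slower))
```

The same comparison works for procedure sizes, where the measurements are deterministic and the standard deviation is zero, so any growth above the relative threshold is reported.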
Re: [fpc-devel] More peephole
On 23/01/2014 20:04, Florian Klämpfl wrote:
> On 23.01.2014 20:52, Martin Frb wrote:
>> On 23/01/2014 19:35, Florian Klämpfl wrote:
>>> I think this is hard to achieve as well.
>> Why?
> I consider it as complicated and it covers only cases one can foresee. Some statistical analysis of benchmark timings and procedure sizes is imo much more general.

OK, so we were talking about 2 different targets.

You were talking (if I understand correctly) about a general test for any and all forms of regressions (with regard to speed or size, not with regard to function) in code generation.

This is indeed hard to test. Size may be doable by comparing to a known size that was once achieved. Size may increase or decrease, but then the tests need to be fixed (a decrease must be recorded, so that an increase from the new optimum will then be detected).
Speed is indeed very hard, since even a benchmark may vary.

I was talking about checking for specific/known code snippets that are known to be inefficient (so anything that the peephole generator can/could detect). This is only a small subset of the possible speed/size regressions, but it is at least something.
Re: [fpc-devel] More peephole
On 23.01.2014 20:52, Martin Frb wrote:
> On 23/01/2014 19:35, Florian Klämpfl wrote:
>> On 22.01.2014 23:22, Martin Frb wrote:
>>> For one of the optimizations, you said it would be better to avoid the code being created in the first place. I agree.
>>> Only, even if that is achieved at some time, who guarantees that it will not be back (and unnoticed)?
>>>
>>> Are there tests that can detect this?
>> Not really, this is something I have been thinking about for years :(
>>
>>> Or the code could be added, and asserted not to be triggered. Of course adding actual test code to the compiler is not a good way either. Unless assert style / only compiled in if -Sa or -dXXX is given.
>>>
>>> Would anything in that direction make sense?
>> I think this is hard to achieve as well.
>>
>
> Why?

I consider it as complicated and it covers only cases one can foresee. Some statistical analysis of benchmark timings and procedure sizes is imo much more general.

> I see 2 scenarios. The peephole optimization is either
> - always present (even if never triggered)
> - contained in $IFDEF BUILD_WITH_TEST_ASSERT
>
> Then there is
>     {$IFDEF BUILD_WITH_TEST_ASSERT}
>     asml.insertbefore(tai_comment.Create(strpnew('BAD_TEST_TRIGGERED')), p);
>     {$ENDIF}
>
> For the test, the compiler needs to be built with the define.
> Then it runs with -al ([1] it can build all test cases, or even the compiler units, or RTL units, and dedicated units can be written to be compiled).
> The assembler is grepped for BAD_TEST_TRIGGERED; it MUST NOT be found.
>
> [1] The test is not complete, but it is impossible to feed every thinkable input, as that is an infinite amount. However, compiling a huge amount of units gives a good chance of catching violations.
>
> I might be overlooking something, but it should not be so hard.
> For readability those optimizations can either be in an include file, so there will be lines like
>     {$DEFINE TEST_INCLUDE_PART_FOO} {$I TEST_INCLUDE}
>
> Or it can be in a separate function, running an extra loop, at the end or beginning of the peephole pass.
Re: [fpc-devel] More peephole
On 23/01/2014 19:35, Florian Klämpfl wrote:
> On 22.01.2014 23:22, Martin Frb wrote:
>> For one of the optimizations, you said it would be better to avoid the code being created in the first place. I agree.
>> Only, even if that is achieved at some time, who guarantees that it will not be back (and unnoticed)?
>>
>> Are there tests that can detect this?
> Not really, this is something I have been thinking about for years :(
>
>> Or the code could be added, and asserted not to be triggered. Of course adding actual test code to the compiler is not a good way either. Unless assert style / only compiled in if -Sa or -dXXX is given.
>>
>> Would anything in that direction make sense?
> I think this is hard to achieve as well.

Why?

I see 2 scenarios. The peephole optimization is either
- always present (even if never triggered)
- contained in $IFDEF BUILD_WITH_TEST_ASSERT

Then there is

    {$IFDEF BUILD_WITH_TEST_ASSERT}
    asml.insertbefore(tai_comment.Create(strpnew('BAD_TEST_TRIGGERED')), p);
    {$ENDIF}

For the test, the compiler needs to be built with the define.
Then it runs with -al ([1] it can build all test cases, or even the compiler units, or RTL units, and dedicated units can be written to be compiled).
The assembler is grepped for BAD_TEST_TRIGGERED; it MUST NOT be found.

[1] The test is not complete, but it is impossible to feed every thinkable input, as that is an infinite amount. However, compiling a huge amount of units gives a good chance of catching violations.

I might be overlooking something, but it should not be so hard.
For readability those optimizations can either be in an include file, so there will be lines like

    {$DEFINE TEST_INCLUDE_PART_FOO} {$I TEST_INCLUDE}

Or it can be in a separate function, running an extra loop, at the end or beginning of the peephole pass.
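The "grep the assembler for BAD_TEST_TRIGGERED" step described above could be scripted along these lines. This is a hedged sketch, not part of the FPC test suite: the helper names `find_marker_hits` and `scan_directory` are invented, and the `*.s` file pattern is an assumption about where the kept assembler output ends up.

```python
# Sketch: fail a test run if the marker comment appears in any assembler file.
# Assumes the compiler was built with the (proposed) BUILD_WITH_TEST_ASSERT
# define and that assembler output was kept, e.g. as *.s files (assumption).
from pathlib import Path

MARKER = "BAD_TEST_TRIGGERED"

def find_marker_hits(asm_text, marker=MARKER):
    """Return (line_number, line) for every assembler line containing the marker."""
    return [(number, line)
            for number, line in enumerate(asm_text.splitlines(), start=1)
            if marker in line]

def scan_directory(root, pattern="*.s"):
    """Map each assembler file under `root` that contains the marker to its hits."""
    hits = {}
    for path in Path(root).rglob(pattern):
        found = find_marker_hits(path.read_text(errors="replace"))
        if found:
            hits[str(path)] = found
    return hits

if __name__ == "__main__":
    import sys
    problems = scan_directory(sys.argv[1] if len(sys.argv) > 1 else ".")
    for filename, found in problems.items():
        for number, line in found:
            print(f"{filename}:{number}: {line.strip()}")
    # Non-zero exit status signals that the pattern was generated somewhere.
    sys.exit(1 if problems else 0)
```

As the message notes, an empty result only shows that the compiled units did not trigger the pattern; it cannot prove the pattern is never generated.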
Re: [fpc-devel] More peephole
On 22.01.2014 23:22, Martin Frb wrote:
> On 22/01/2014 21:29, Florian Klämpfl wrote:
>> On 22.01.2014 04:06, Martin Frb wrote:
>>> On 21/01/2014 21:28, Florian Klämpfl wrote:
>>>> Can you post some example code? It might be worth thinking about improving this already at the node level.
>>> While getting examples, another issue:
>>>
>>> with -O2, -O3 or -O4
>>>
>>> Note the
>>>     movl %eax,%edx
>>>     movl %edx,%eax
>>>
>>> with -O1 it is
>>>     movl %eax,-4(%ebp)
>>>     movl -4(%ebp),%eax
>> Well, -O1 can be neglected ...
>>
>
> OK.
>
> There is another idea I had in that context, and would like to bring up. Though maybe this is already solved in a different way.
>
> For one of the optimizations, you said it would be better to avoid the code being created in the first place. I agree.
> Only, even if that is achieved at some time, who guarantees that it will not be back (and unnoticed)?
>
> Are there tests that can detect this?

Not really, this is something I have been thinking about for years :(

> Of course all generated assembler could be parsed back, and analysed. But that is rather a lot of work.

Yes.

> Or the code could be added, and asserted not to be triggered. Of course adding actual test code to the compiler is not a good way either. Unless assert style / only compiled in if -Sa or -dXXX is given.
>
> Would anything in that direction make sense?

I think this is hard to achieve as well.

> Then the code could work in real life, until no longer needed, at which time it would still be useful for assertion.

Probably the best way would be benchmarking regarding runtime and procedure size: significant changes in runtime or procedure size would trigger an alert.
Re: [fpc-devel] More peephole
On 22/01/2014 21:29, Florian Klämpfl wrote:
> On 22.01.2014 04:06, Martin Frb wrote:
>> On 21/01/2014 21:28, Florian Klämpfl wrote:
>>> Can you post some example code? It might be worth thinking about improving this already at the node level.
>> While getting examples, another issue:
>>
>> with -O2, -O3 or -O4
>>
>> Note the
>>     movl %eax,%edx
>>     movl %edx,%eax
>>
>> with -O1 it is
>>     movl %eax,-4(%ebp)
>>     movl -4(%ebp),%eax
> Well, -O1 can be neglected ...

OK.

There is another idea I had in that context, and would like to bring up. Though maybe this is already solved in a different way.

For one of the optimizations, you said it would be better to avoid the code being created in the first place. I agree.
Only, even if that is achieved at some time, who guarantees that it will not be back (and unnoticed)?

Are there tests that can detect this?

Of course all generated assembler could be parsed back, and analysed. But that is rather a lot of work.

Or the code could be added, and asserted not to be triggered. Of course adding actual test code to the compiler is not a good way either. Unless assert style / only compiled in if -Sa or -dXXX is given.

Would anything in that direction make sense?
Then the code could work in real life, until no longer needed, at which time it would still be useful for assertion.

Just an idea.
Re: [fpc-devel] More peephole
On 22.01.2014 04:06, Martin Frb wrote:
> On 21/01/2014 21:28, Florian Klämpfl wrote:
>> Can you post some example code? It might be worth to think about
>> improving this already in at the node level.
>>
>
> While getting examples, another issue:
>
> with -O2, -O3 or -O4
>
> Note the
>     movl %eax,%edx
>     movl %edx,%eax
>
> with -O1 it is
>     movl %eax,-4(%ebp)
>     movl -4(%ebp),%eax

Well, -O1 can be neglected ...
Re: [fpc-devel] More peephole
On 22.01.2014 00:27, Martin Frb wrote:
> On 21/01/2014 21:28, Florian Klämpfl wrote:
>> On 20.01.2014 01:18, Martin wrote:
>>
>>> It used
>>>     (taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.base) and
>>>     (taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.index) then
>>> but should only compare the supregister part
>>> I replaced that
>>>     not(RegInOp(getsupreg(taicpu(p).oper[1]^.reg),taicpu(hp1).oper[0]^)) then
>>>
>>> uncommented, and tested.
>>> It does catch a big lot of occurrences.
>> Can you post some example code? It might be worth thinking about improving this already at the node level.
>>
>
> I will try to find some. (I just enabled it, and put a writeln in there, to see if it was triggered. Then ran the tests and built Lazarus.)
>
> In the meantime, what about the other additions/changes?
>
> I already wrote code for them, and mailed it.
> So what I need to know: how best to go on to get them accepted and into the compiler?
>

Submit them to a bug report, I can look during the weekend into them.
Re: [fpc-devel] More peephole
On 21/01/2014 21:28, Florian Klämpfl wrote:
> Can you post some example code? It might be worth to think about improving this already in at the node level.

While getting examples, another issue:

with -O2, -O3 or -O4

Note the
    movl %eax,%edx
    movl %edx,%eax

with -O1 it is
    movl %eax,-4(%ebp)
    movl -4(%ebp),%eax

    .section .text.n_p$project1$_$tfoo_$__$$_init,"x"
        .balign 16,0x90
    .globl P$PROJECT1$_$TFOO_$__$$_INIT
    P$PROJECT1$_$TFOO_$__$$_INIT:
    # Temps allocated between esp+0 and esp+0
    # Var $self located in register edx
    # [19] begin
        movl %eax,%edx
    # Var $self located in register edx
    # [20] Init;
        movl %edx,%eax
        movl (%edx),%edx
        call *100(%edx)
    # [21] end;
        ret

    program project1;
    {$mode objfpc}
    type

      { TFoo }

      TFoo = class
        a,b : Integer;
        function Bar: Boolean;
        procedure Init; virtual;
      end;

    function TFoo.Bar: Boolean;
    begin
      Result := a <> b;
    end;

    procedure TFoo.Init;
    begin
      Init;
    end;

    { TFoo }

    begin
    end.
Re: [fpc-devel] More peephole
On 21/01/2014 23:27, Martin Frb wrote:
> On 21/01/2014 21:28, Florian Klämpfl wrote:
>> On 20.01.2014 01:18, Martin wrote:
>>> It used
>>>     (taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.base) and
>>>     (taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.index) then
>>> but should only compare the supregister part
>>> I replaced that
>>>     not(RegInOp(getsupreg(taicpu(p).oper[1]^.reg),taicpu(hp1).oper[0]^)) then
>>>
>>> uncommented, and tested.
>>> It does catch a big lot of occurrences.
>> Can you post some example code? It might be worth thinking about improving this already at the node level.
> I will try to find some. (I just enabled it, and put a writeln in there, to see if it was triggered. Then ran the tests and built Lazarus.)

There are hundreds or thousands of matches when compiling the IDE; I am not sure how many are triggered by the code below, or by which other condition.

2 examples below, but they only happen with -O- or -O1.

In TFoo.Bar, move the comment to the other statement for the 2nd example (if Init is called within Init, then different code is produced).

In both cases the 2nd register should not be needed at all. (but if present, is better loaded from

    program project1;
    {$mode objfpc}
    type
      TFoo = class
        a,b : Integer;
        function Bar: Boolean;
        procedure Init; virtual;
      end;

    function TFoo.Bar: Boolean;
    begin
      Result := a = b;
      // Init;
    end;

    procedure TFoo.Init;
    begin
    end;

    begin
    end.

Result := a = b;

    .section .text.n_p$project1$_$tfoo_$__$$_bar$$boolean,"x"
        .balign 16,0x90
    .globl P$PROJECT1$_$TFOO_$__$$_BAR$$BOOLEAN
    P$PROJECT1$_$TFOO_$__$$_BAR$$BOOLEAN:
    # Temps allocated between ebp-8 and ebp-8
    # [project1.lpr]
    # [14] begin
        pushl %ebp
        movl %esp,%ebp
        leal -8(%esp),%esp
    # Var $self located at ebp-4
    # Var $result located at ebp-8
        movl %eax,-4(%ebp)
    # [15] Init;
        movl -4(%ebp),%eax    // <
        movl -4(%ebp),%edx    // <
        movl (%edx),%edx
        call *100(%edx)
    # [16] end;
        leave
        ret

/ INIT

    .section .text.n_p$project1$_$tfoo_$__$$_bar$$boolean,"x"
        .balign 16,0x90
    .globl P$PROJECT1$_$TFOO_$__$$_BAR$$BOOLEAN
    P$PROJECT1$_$TFOO_$__$$_BAR$$BOOLEAN:
    # Temps allocated between ebp-8 and ebp-8
    # [project1.lpr]
    # [14] begin
        pushl %ebp
        movl %esp,%ebp
        leal -8(%esp),%esp
    # Var $self located at ebp-4
    # Var $result located at ebp-8
        movl %eax,-4(%ebp)
    # [15] Init;
        movl -4(%ebp),%eax
        movl -4(%ebp),%edx
        movl (%edx),%edx
        call *100(%edx)
    # [16] end;
        leave
        ret
Re: [fpc-devel] More peephole
On 21/01/2014 21:28, Florian Klämpfl wrote:
> On 20.01.2014 01:18, Martin wrote:
>> It used
>>     (taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.base) and
>>     (taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.index) then
>> but should only compare the supregister part
>> I replaced that
>>     not(RegInOp(getsupreg(taicpu(p).oper[1]^.reg),taicpu(hp1).oper[0]^)) then
>>
>> uncommented, and tested.
>> It does catch a big lot of occurrences.
> Can you post some example code? It might be worth thinking about improving this already at the node level.

I will try to find some. (I just enabled it, and put a writeln in there, to see if it was triggered. Then ran the tests and built Lazarus.)

In the meantime, what about the other additions/changes?

I already wrote code for them, and mailed it.
So what I need to know: how best to go on to get them accepted and into the compiler?

One problem is, if I create 3 or 4 patches and report them, only one will apply cleanly. The others may then conflict. Or I need to create them based on each other, so that they are only applicable in the right order.

I can do all in one (if that is preferred)...
Re: [fpc-devel] More peephole
On 20.01.2014 01:18, Martin wrote:
> Just been looking at the peephole opt (i386). Other than the 2 items already mailed, I found that:
>
> 1) Code as follows is sometimes generated (at various opt levels)
>
> .Ll2:
> # [36] i := 1;
>     movl $1,%eax
> .Ll3:
> # [38] i := i + 1;
>     movl $2,%eax
>
> I could not find any code dealing with it,

There is none I think.

> and added some. It does catch a noticeable amount of occurrences during build of fpc and lazarus.

Go ahead. In my experience, a good means (besides testing the compiler and Lazarus) of quickly catching errors in the peephole optimizer is to compare the assembler before and after.

>
> 2)
> Commented existing code (apparently since revision 1) for
> {movl [mem1],reg1     to     movl [mem1],reg1
>  movl [mem1],reg2            movl reg1,reg2 }

The code was commented out 15 years ago :) with the comment
* split the optimizer
by Jonas. So I think such statements were caught by the old assembler CSE optimizer, which is disabled now.

>
> It used
>     (taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.base) and
>     (taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.index) then
> but should only compare the supregister part
> I replaced that
>     not(RegInOp(getsupreg(taicpu(p).oper[1]^.reg),taicpu(hp1).oper[0]^)) then
>
> uncommented, and tested.
> It does catch a big lot of occurrences.

Can you post some example code? It might be worth thinking about improving this already at the node level.
[fpc-devel] More peephole
Just been looking at the peephole opt (i386). Other than the 2 items already mailed, I found that:

1) Code as follows is sometimes generated (at various opt levels)

    .Ll2:
    # [36] i := 1;
        movl $1,%eax
    .Ll3:
    # [38] i := i + 1;
        movl $2,%eax

I could not find any code dealing with it, and added some. It does catch a noticeable amount of occurrences during build of fpc and lazarus.

2)
Commented existing code (apparently since revision 1) for

    {movl [mem1],reg1     to     movl [mem1],reg1
     movl [mem1],reg2            movl reg1,reg2 }

It used

    (taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.base) and
    (taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.index) then

but should only compare the supregister part.
I replaced that with

    not(RegInOp(getsupreg(taicpu(p).oper[1]^.reg),taicpu(hp1).oper[0]^)) then

uncommented, and tested.
It does catch a big lot of occurrences.

Both pass all tests from the test dir (except those failing already; see my other mail / compared at slightly different settings).
Both build a working FPC and Lazarus.

Patch with all 3 changes at the end of the mail. I will separate them, if all else is OK, and if needed.

New code for removing a double move to register. The code is in a block where opsize and other constraints have already been confirmed.

    if (taicpu(p).oper[1]^.typ = top_reg) and
       (taicpu(hp1).oper[1]^.typ = top_reg) and
       (taicpu(p).oper[1]^.reg = taicpu(hp1).oper[1]^.reg) and
       not(RegInOp(getsupreg(taicpu(p).oper[1]^.reg),taicpu(hp1).oper[0]^)) then
      { We have
          mov xxx, reg1
          mov yyy, reg1 }
      begin
        asml.remove(p);
        p.free;
        p := hp1;
      end

Existing, now uncommented:

    else
      {movl [mem1],reg1     to     movl [mem1],reg1
       movl [mem1],reg2            movl reg1,reg2 }
      if (taicpu(p).oper[0]^.typ = top_ref) and
         (taicpu(p).oper[1]^.typ = top_reg) and
         (taicpu(hp1).oper[0]^.typ = top_ref) and
         (taicpu(hp1).oper[1]^.typ = top_reg) and
         RefsEqual(TReference(taicpu(p).oper[0]^.ref^),taicpu(hp1).oper[0]^.ref^) and
         not(RegInOp(getsupreg(taicpu(p).oper[1]^.reg),taicpu(hp1).oper[0]^)) then
         //(taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.base) and
         //(taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.index) then
        begin
          taicpu(hp1).loadReg(0,taicpu(p).oper[1]^.reg)
        end

Index: compiler/i386/popt386.pas
===
--- compiler/i386/popt386.pas (revision 26519)
+++ compiler/i386/popt386.pas (working copy)
@@ -1369,39 +1369,50 @@
 end
 end
 else
-(* {movl [mem1],reg1
-movl [mem1],reg2
-to:
- movl [mem1],reg1
- movl reg1,reg2 }
-if (taicpu(p).oper[0]^.typ = top_ref) and
- (taicpu(p).oper[1]^.typ = top_reg) and
- (taicpu(hp1).oper[0]^.typ = top_ref) and
- (taicpu(hp1).oper[1]^.typ = top_reg) and
- (taicpu(p).opsize = taicpu(hp1).opsize) and
- RefsEqual(TReference(taicpu(p).oper[0]^^),taicpu(hp1).oper[0]^^.ref^) and
- (taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.base) and
- (taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.index) then
- taicpu(hp1).loadReg(0,taicpu(p).oper[1]^.reg)
-else*)
-{ movl const1,[mem1]
-movl [mem1],reg1
-to:
-movl const1,reg1
-movl reg1,[mem1] }
- if (taicpu(p).oper[0]^.typ = top_const) and
- (taicpu(p).oper[1]^.typ = top_ref) and
+if (taicpu(p).oper[1]^.typ = top_reg) and
+ (taicpu(hp1).oper[1]^.typ = top_reg) and
+ (taicpu(p).oper[1]^.reg = taicpu(hp1).oper[1]^.reg) and
+ not(RegInOp(getsupreg(taicpu(p).oper[1]^.reg),taicpu(hp1).oper[0]^)) then
+{ We have mov xxx, reg1 mov yyy, reg1 }
+ begin
+asml.remove(p);
+p.free;
+p := hp1;
+ end
+