[fpc-devel] More peephole optimisation questions

2022-04-19 Thread J. Gareth Moreton via fpc-devel

Hi everyone,

So this is another question on peephole optimisation for x86_64. 
Occasionally you get situations where you write a load of constants to 
the stack - in this case it's part of an array parameter to a function call:


    movl    $23199763,32(%rsp)
    movl    $262149,36(%rsp)
    movl    $33816983,40(%rsp)
    movl    $36176315,44(%rsp)
    movl    $50660102,48(%rsp)
    movl    $65340390,52(%rsp)

x86_64 doesn't support writing an arbitrary 64-bit constant directly to 
memory (a movq to memory only accepts a sign-extended 32-bit immediate), 
so you have to write it to a register first. With that in mind, is the 
following code faster?


    movq    $1125921404878867,%rax
    movq    %rax,32(%rsp)
    movq    $155376089848611223,%rax
    movq    %rax,40(%rsp)
    movq    $280634838208545542,%rax
    movq    %rax,48(%rsp)

I know there will be a pipeline stall between the first two 
instructions, but logic tells me that parallelisation, out-of-order 
execution and register renaming will ensure that loading %rax with the 
next immediate can happen at the same time as its previous value is 
being written to memory.  I know there are a lot of variables, like how 
smart the processor is and how many ALUs and AGUs are available, so 
that's why I'm after a second opinion before I start proposing an 
optimisation that's speculative at best.  If necessary, I could even do 
this (if the registers are available):


    movq    $1125921404878867,%rax
    movq    $155376089848611223,%rcx
    movq    $280634838208545542,%rdx
    movq    %rax,32(%rsp)
    movq    %rcx,40(%rsp)
    movq    %rdx,48(%rsp)

At the very least I'm pretty sure it's not worth it to concatenate a 
single pair of 32-bit immediates.  For example, if it was just the first 
two:


    movl    $23199763,32(%rsp)
    movl    $262149,36(%rsp)

... it would not be worth it to transmute them into:

    movq    $1125921404878867,%rax
    movq    %rax,32(%rsp)

Since in the former case, the two can be executed in parallel and the 
only barrier is memory latency (almost all modern Intel CPUs have at 
least 2 AGUs), while the latter case introduces a dependency.
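
For reference, each 64-bit immediate above is simply a pair of adjacent 32-bit constants combined in little-endian order: the constant at the lower address supplies the low 32 bits. A minimal Python sketch of that arithmetic follows; the helper name pack_imm32_pairs is made up for illustration and is not part of the compiler.

```python
def pack_imm32_pairs(values):
    """Combine adjacent 32-bit immediates into 64-bit little-endian values."""
    if len(values) % 2 != 0:
        raise ValueError("need an even number of 32-bit values")
    packed = []
    for low, high in zip(values[0::2], values[1::2]):
        # On little-endian x86_64 the lower address holds the low 32 bits.
        packed.append(((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF))
    return packed


# The six constants written to 32(%rsp) .. 52(%rsp) in the first listing.
constants = [23199763, 262149, 33816983, 36176315, 50660102, 65340390]
qwords = pack_imm32_pairs(constants)
for offset, value in zip(range(32, 56, 8), qwords):
    print(offset, value)
```

The three results are the same 64-bit immediates that appear in the listings above.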


Gareth aka. Kit

P.S. In this case, the assembly language is generated by this parameter 
in aoptx86: "[A_CMP, A_TEST, A_BSR, A_BSF, A_COMISS, A_COMISD, 
A_UCOMISS, A_UCOMISD, A_VCOMISS, A_VCOMISD, A_VUCOMISS, A_VUCOMISD]". 
This is part of the CMOV optimisations and is a set of instructions 
that are used for comparisons. If the opcode matches one of the above, 
the peephole optimizer will see if it's possible to position MOV 
instructions before the comparison instead of between the comparison and 
the conditional jump. This works better for macro-fusion, and it also 
makes it possible to turn "mov $0,%reg" into "xor %reg,%reg", which 
cannot be done while the FLAGS register is in use (XOR overwrites the 
flags); moving the MOV before the comparison eliminates that problem.




___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] More peephole (Mantis)

2014-01-23 Thread Martin Frb

On 22/01/2014 21:23, Florian Klämpfl wrote:

Submit them to a bug report, I can look during the weekend into them.



Done: 0025584, 0025586, 0025587

http://bugs.freepascal.org/view.php?id=25584
http://bugs.freepascal.org/view.php?id=25586
http://bugs.freepascal.org/view.php?id=25587




Re: [fpc-devel] More peephole

2014-01-23 Thread Martin Frb

On 23/01/2014 20:34, Florian Klaempfl wrote:


Yes and no. It is extra code and extra code is always bad ;) and it 
requires a separate compiler run. I wouldn't waste effort in it.


Test cases are extra code too. ;) scnr

OK, I see what you mean. No problem. It was just an idea.





Re: [fpc-devel] More peephole

2014-01-23 Thread Florian Klaempfl

Am 23.01.2014 21:15, schrieb Martin Frb:

On 23/01/2014 20:04, Florian Klämpfl wrote:

Am 23.01.2014 20:52, schrieb Martin Frb:

On 23/01/2014 19:35, Florian Klämpfl wrote:


I think this is hard to achieve as well.


Why?

I consider it complicated, and it covers only cases one can foresee.
Some statistical analysis of benchmark timings and procedure sizes is
imo much more general.



Ok, so we were talking about 2 different targets.

You were talking (if I understand correctly) about a general test for any
and all forms of regressions (with regard to speed or size, not with
regard to function) in code generation.


Yes.


This is indeed hard to test.
Size may be doable by comparing against a known size that was once
achieved. The size may increase or decrease, but then the tests need to be
updated (a decrease must be recorded, so that an increase from the new
optimum will then be detected).
Speed is indeed very hard, since even a benchmark may vary.


Yes, but having n=5 and doing daily benchmarking should enable one to 
identify trends.





I was talking about checking for specific/known code snippets that are
known to be inefficient (so anything that the peephole generator
can/could detect). This is only a small subset of the possible
speed/size regressions, but it is at least something.


Yes and no. It is extra code and extra code is always bad ;) and it 
requires a separate compiler run. I wouldn't waste effort in it.



Re: [fpc-devel] More peephole

2014-01-23 Thread Martin Frb

On 23/01/2014 20:04, Florian Klämpfl wrote:

Am 23.01.2014 20:52, schrieb Martin Frb:

On 23/01/2014 19:35, Florian Klämpfl wrote:


I think this is hard to achieve as well.


Why?

I consider it complicated, and it covers only cases one can foresee.
Some statistical analysis of benchmark timings and procedure sizes is
imo much more general.



Ok, so we were talking about 2 different targets.

You were talking (if I understand correctly) about a general test for any 
and all forms of regressions (with regard to speed or size, not with 
regard to function) in code generation. This is indeed hard to test.
Size may be doable by comparing against a known size that was once 
achieved. The size may increase or decrease, but then the tests need to be 
updated (a decrease must be recorded, so that an increase from the new 
optimum will then be detected).

Speed is indeed very hard, since even a benchmark may vary.
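
The size check described above amounts to a ratchet: remember the smallest size seen for each procedure, flag any growth, and lower the stored value whenever the size shrinks. A minimal Python sketch, assuming the procedure sizes have already been extracted somehow (the function name, the dictionary layout and the result strings are hypothetical, not an existing FPC tool):

```python
def check_size(name, size, baseline):
    """Compare a procedure size with the best size recorded so far.

    baseline maps procedure names to the smallest size seen; it is
    updated in place when a new optimum is found.
    """
    known = baseline.get(name)
    if known is None or size < known:
        # First measurement or an improvement: record the new optimum so
        # that a later increase from it is detected.
        baseline[name] = size
        return "new-optimum"
    if size > known:
        return "regression"
    return "ok"


baseline = {"TFOO_INIT": 120}
results = [
    check_size("TFOO_INIT", 120, baseline),  # unchanged
    check_size("TFOO_INIT", 112, baseline),  # smaller: baseline is lowered
    check_size("TFOO_INIT", 120, baseline),  # larger than the new optimum
]
print(results, baseline)
```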


I was talking about checking for specific/known code snippets that are 
known to be inefficient (so anything that the peephole generator 
can/could detect). This is only a small subset of the possible 
speed/size regressions, but it is at least something.




Re: [fpc-devel] More peephole

2014-01-23 Thread Florian Klämpfl
Am 23.01.2014 20:52, schrieb Martin Frb:
> On 23/01/2014 19:35, Florian Klämpfl wrote:
>> Am 22.01.2014 23:22, schrieb Martin Frb:
>>> For one of the optimizations you said it would be better to avoid the
>>> code being created in the first place. I agree.
>>> Only, even if that is achieved at some time, who guarantees that it will
>>> not be back (and unnoticed)?
>>>
>>> Are there tests, that can detect this?
>> Not really, this is something I have been thinking about for years :(
>>
>>> Or the code could be added, and asserted not to be triggered. Of course
>>> adding actual test code to the compiler is not a good way either. Unless
>>> assert style / Only compiled in if -Sa or -dXXX is given.
>>>
>>> Would anything in that direction make sense?
>> I think this is hard to achieve as well.
>>
> 
> Why?

I consider it complicated, and it covers only cases one can foresee.
Some statistical analysis of benchmark timings and procedure sizes is
imo much more general.

> 
> I see 2 scenarios. The peephole optimization is either
> - always present (even if never triggered), or
> - contained in $IFDEF BUILD_WITH_TEST_ASSERT
> 
> Then there is
>   {$IFDEF BUILD_WITH_TEST_ASSERT}
> asml.insertbefore(tai_comment.Create(strpnew('BAD_TEST_TRIGGERED')), p);
> {$ENDIF}
> 
> For the test, the compiler needs to be built with the define.
> Then it runs with -al ([1] it can build all testcases, or even the
> compiler units, or rtl units, and dedicated units can be written
> to be compiled).
> The assembler is grepped for BAD_TEST_TRIGGERED; it MUST NOT be found.
> 
> [1] The test is not complete, but it is impossible to feed every
> thinkable input, as that is an infinite amount. However compiling a huge
> amount of units gives a good chance of catching violations.
> 
> I might be overlooking something, but it should not be so hard.
> For readability those optimizations can either be in an include file, so
> there will be lines like
> {$DEFINE TEST_INCLUDE_PART_FOO} {$I TEST_INCLUDE}
> 
> Or it can be in a separate function, running an extra loop, at the end
> or beginning of the peephole pass.



Re: [fpc-devel] More peephole

2014-01-23 Thread Martin Frb

On 23/01/2014 19:35, Florian Klämpfl wrote:

Am 22.01.2014 23:22, schrieb Martin Frb:

For one of the optimizations you said it would be better to avoid the
code being created in the first place. I agree.
Only, even if that is achieved at some time, who guarantees that it will
not be back (and unnoticed)?

Are there tests, that can detect this?

Not really, this is something I have been thinking about for years :(


Or the code could be added, and asserted not to be triggered. Of course
adding actual test code to the compiler is not a good way either. Unless
assert style / Only compiled in if -Sa or -dXXX is given.

Would anything in that direction make sense?

I think this is hard to achieve as well.



Why?

I see 2 scenarios. The peephole optimization is either
- always present (even if never triggered), or
- contained in $IFDEF BUILD_WITH_TEST_ASSERT

Then there is
  {$IFDEF BUILD_WITH_TEST_ASSERT}
asml.insertbefore(tai_comment.Create(strpnew('BAD_TEST_TRIGGERED')), p);
{$ENDIF}

For the test, the compiler needs to be built with the define.
Then it runs with -al ([1] it can build all testcases, or even the 
compiler units, or rtl units, and dedicated units can be written 
to be compiled).

The assembler is grepped for BAD_TEST_TRIGGERED; it MUST NOT be found.

[1] The test is not complete, but it is impossible to feed every 
thinkable input, as that is an infinite amount. However compiling a huge 
amount of units gives a good chance of catching violations.
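
The grep step could be as small as the following Python sketch. It assumes the assembler files written by -al end in .s and sit below one directory; the function name find_marker and the demo files are invented for illustration.

```python
import tempfile
from pathlib import Path

MARKER = "BAD_TEST_TRIGGERED"


def find_marker(asm_dir, marker=MARKER):
    """Return (file, line number) for every line that contains the marker."""
    hits = []
    for path in sorted(Path(asm_dir).rglob("*.s")):
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), 1):
            if marker in line:
                hits.append((str(path), lineno))
    return hits


# Demo with two made-up assembler files; a real run would point at the
# directory holding the compiler's output instead.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "clean.s").write_text("movl %eax,%edx\n")
    Path(tmp, "dirty.s").write_text("# BAD_TEST_TRIGGERED\nmovl %edx,%eax\n")
    hits = find_marker(tmp)

# The test passes only when no file contains the marker.
print("FAIL" if hits else "PASS", hits)
```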


I might be overlooking something, but it should not be so hard.
For readability those optimizations can either be in an include file, so 
there will be lines like

{$DEFINE TEST_INCLUDE_PART_FOO} {$I TEST_INCLUDE}

Or it can be in a separate function, running an extra loop, at the end 
or begin of the peephole pass.








Re: [fpc-devel] More peephole

2014-01-23 Thread Florian Klämpfl
Am 22.01.2014 23:22, schrieb Martin Frb:
> On 22/01/2014 21:29, Florian Klämpfl wrote:
>> Am 22.01.2014 04:06, schrieb Martin Frb:
>>> On 21/01/2014 21:28, Florian Klämpfl wrote:
>>>> Can you post some example code? It might be worth thinking about
>>>> improving this already at the node level.
>>>>
>>> While getting examples, another issue:
>>>
>>>
>>> with -O2 , -O3 or -O4
>>>
>>> Note the
>>>     movl    %eax,%edx
>>>     movl    %edx,%eax
>>>
>>> with -O1 it is
>>>     movl    %eax,-4(%ebp)
>>>     movl    -4(%ebp),%eax
>> Well, -O1 can be neglected ...
>>
> 
> Ok.
> 
> There is another idea I had in that context, and would like to bring up.
> Though maybe this is already solved in a different way.
> 
> For one of the optimizations you said it would be better to avoid the
> code being created in the first place. I agree.
> Only, even if that is achieved at some time, who guarantees that it will
> not be back (and unnoticed)?
> 
> Are there tests, that can detect this?

Not really, this is something I have been thinking about for years :(

> Of course all generated assembler could be parsed back, and analysed.
> But that is rather a lot of work.

Yes.

> 
> Or the code could be added, and asserted not to be triggered. Of course
> adding actual test code to the compiler is not a good way either. Unless
> assert style / Only compiled in if -Sa or -dXXX is given.
> 
> Would anything in that direction make sense?

I think this is hard to achieve as well.

> Then the code could work in real life, until no longer needed, at which
> time it would still be useful for assertion.

Probably the best way would be benchmarking regarding runtime and
procedure size: significant changes in runtime or procedure size would
trigger an alert.
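
As a rough illustration of such an alert, the Python sketch below compares the median of a handful of timings against the median of an earlier baseline and flags a slowdown beyond a tolerance. The 5% tolerance, the function name and the sample timings are assumptions made up for the example, not measurements.

```python
from statistics import median


def regressed(baseline_runs, new_runs, tolerance=0.05):
    """True if the new median is more than `tolerance` above the baseline."""
    base = median(baseline_runs)
    new = median(new_runs)
    return new > base * (1 + tolerance)


# Invented timings in seconds, five runs each (n=5).
baseline = [1.00, 1.02, 0.99, 1.01, 1.00]
noisy_but_fine = [1.01, 1.03, 1.00, 1.02, 1.01]
slower = [1.10, 1.12, 1.09, 1.11, 1.10]

print(regressed(baseline, noisy_but_fine))
print(regressed(baseline, slower))
```

Using the median rather than a single run is one simple way to damp the run-to-run variation; the same comparison could be applied to procedure sizes.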


Re: [fpc-devel] More peephole

2014-01-22 Thread Martin Frb

On 22/01/2014 21:29, Florian Klämpfl wrote:

Am 22.01.2014 04:06, schrieb Martin Frb:

On 21/01/2014 21:28, Florian Klämpfl wrote:

Can you post some example code? It might be worth to think about
improving this already in at the node level.


While getting examples, another issue:


with -O2 , -O3 or -O4

Note the
    movl    %eax,%edx
    movl    %edx,%eax

with -O1 it is
    movl    %eax,-4(%ebp)
    movl    -4(%ebp),%eax

Well, -O1 can be neglected ...



Ok.

There is another idea I had in that context, and would like to bring up. 
Though maybe this is already solved in a different way.


For one of the optimizations you said it would be better to avoid the 
code being created in the first place. I agree.
Only, even if that is achieved at some time, who guarantees that it will 
not be back (and unnoticed)?


Are there tests, that can detect this?
Of course all generated assembler could be parsed back, and analysed. 
But that is rather a lot of work.


Or the code could be added, and asserted not to be triggered. Of course 
adding actual test code to the compiler is not a good way either. Unless 
assert style / Only compiled in if -Sa or -dXXX is given.


Would anything in that direction make sense?
Then the code could work in real life, until no longer needed, at which 
time it would still be useful for assertion.


Just an idea.


Re: [fpc-devel] More peephole

2014-01-22 Thread Florian Klämpfl
Am 22.01.2014 04:06, schrieb Martin Frb:
> On 21/01/2014 21:28, Florian Klämpfl wrote:
>> Can you post some example code? It might be worth thinking about
>> improving this already at the node level.
>>
> 
> While getting examples, another issue:
> 
> 
> with -O2 , -O3 or -O4
> 
> Note the
>     movl    %eax,%edx
>     movl    %edx,%eax
> 
> with -O1 it is
>     movl    %eax,-4(%ebp)
>     movl    -4(%ebp),%eax

Well, -O1 can be neglected ...



Re: [fpc-devel] More peephole

2014-01-22 Thread Florian Klämpfl
Am 22.01.2014 00:27, schrieb Martin Frb:
> On 21/01/2014 21:28, Florian Klämpfl wrote:
>> Am 20.01.2014 01:18, schrieb Martin:
>>
>>> It used
>>> (taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.base) and
>>> (taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.index) then
>>> but should only compare the supregister part
>>> I replaced that
>>> not(RegInOp(getsupreg(taicpu(p).oper[1]^.reg),taicpu(hp1).oper[0]^))
>>> then
>>>
>>> uncommented, and tested.
>>> It does catch a big lot of occurrences.
>> Can you post some example code? It might be worth to think about
>> improving this already in at the node level.
>>
>>
> 
> I will try to find some. (I just enabled it, and put a writeln in there,
> to see, if it was triggered. Then run the tests and buli Lazarus.
> 
> In the meantime, what about the other additions/changes?
> 
> I already wrote code for them, and mailed it.
> So what I need to know: How to best go on to get them accepted and into
> the compiler?
> 

Submit them to a bug report, I can look during the weekend into them.



Re: [fpc-devel] More peephole

2014-01-21 Thread Martin Frb

On 21/01/2014 21:28, Florian Klämpfl wrote:

Can you post some example code? It might be worth to think about
improving this already in at the node level.



While getting examples, another issue:


with -O2 , -O3 or -O4

Note the
    movl    %eax,%edx
    movl    %edx,%eax

with -O1 it is
    movl    %eax,-4(%ebp)
    movl    -4(%ebp),%eax



.section .text.n_p$project1$_$tfoo_$__$$_init,"x"
    .balign 16,0x90
.globl  P$PROJECT1$_$TFOO_$__$$_INIT
P$PROJECT1$_$TFOO_$__$$_INIT:
# Temps allocated between esp+0 and esp+0
# Var $self located in register edx
# [19] begin
    movl    %eax,%edx
# Var $self located in register edx
# [20] Init;
    movl    %edx,%eax
    movl    (%edx),%edx
    call    *100(%edx)
# [21] end;
    ret

program project1;
{$mode objfpc}

type
  { TFoo }

  TFoo = class
    a,b : Integer;
    function Bar: Boolean;
    procedure Init; virtual;
  end;

function TFoo.Bar: Boolean;
begin
  Result := a <> b;
end;

procedure TFoo.Init;
begin
  Init;
end;

{ TFoo }


begin
end.



Re: [fpc-devel] More peephole

2014-01-21 Thread Martin Frb

On 21/01/2014 23:27, Martin Frb wrote:

On 21/01/2014 21:28, Florian Klämpfl wrote:

Am 20.01.2014 01:18, schrieb Martin:


It used
(taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.base) and
(taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.index) then
but should only compare the supregister part
I replaced that
not(RegInOp(getsupreg(taicpu(p).oper[1]^.reg),taicpu(hp1).oper[0]^)) 
then


uncommented, and tested.
It does catch a big lot of occurrences.

Can you post some example code? It might be worth to think about
improving this already in at the node level.




I will try to find some. (I just enabled it, and put a writeln in 
there, to see if it was triggered. Then I ran the tests and built Lazarus.)


There are hundreds or thousands of matches when compiling the IDE; I am not 
sure how many are triggered by the below, or by which other condition.



2 examples below, but they only happen with -O- or -O1.
In TFoo.Bar, move the comment to the other statement for the 2nd example
(if Init is called within Init, then different code is produced).

In both cases the 2nd register should not be needed at all. (but if 
present, is better loaded from


program project1; {$mode objfpc}
type  TFoo = class
    a,b : Integer;
    function Bar: Boolean;
    procedure Init; virtual;
  end;

function TFoo.Bar: Boolean;
begin
  Result := a = b;
//  Init;
end;
procedure TFoo.Init;  begin  end;
begin
end.

   Result := a = b;
.section .text.n_p$project1$_$tfoo_$__$$_bar$$boolean,"x"
    .balign 16,0x90
.globl  P$PROJECT1$_$TFOO_$__$$_BAR$$BOOLEAN
P$PROJECT1$_$TFOO_$__$$_BAR$$BOOLEAN:
# Temps allocated between ebp-8 and ebp-8
# [project1.lpr]
# [14] begin
    pushl   %ebp
    movl    %esp,%ebp
    leal    -8(%esp),%esp
# Var $self located at ebp-4
# Var $result located at ebp-8
    movl    %eax,-4(%ebp)
# [15] Init;
    movl    -4(%ebp),%eax  // <
    movl    -4(%ebp),%edx  // <
    movl    (%edx),%edx
    call    *100(%edx)
# [16] end;
    leave
    ret

/ INIT
.section .text.n_p$project1$_$tfoo_$__$$_bar$$boolean,"x"
    .balign 16,0x90
.globl  P$PROJECT1$_$TFOO_$__$$_BAR$$BOOLEAN
P$PROJECT1$_$TFOO_$__$$_BAR$$BOOLEAN:
# Temps allocated between ebp-8 and ebp-8
# [project1.lpr]
# [14] begin
    pushl   %ebp
    movl    %esp,%ebp
    leal    -8(%esp),%esp
# Var $self located at ebp-4
# Var $result located at ebp-8
    movl    %eax,-4(%ebp)
# [15] Init;
    movl    -4(%ebp),%eax
    movl    -4(%ebp),%edx
    movl    (%edx),%edx
    call    *100(%edx)
# [16] end;
    leave
    ret



Re: [fpc-devel] More peephole

2014-01-21 Thread Martin Frb

On 21/01/2014 21:28, Florian Klämpfl wrote:

Am 20.01.2014 01:18, schrieb Martin:


It used
(taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.base) and
(taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.index) then
but should only compare the supregister part
I replaced that
not(RegInOp(getsupreg(taicpu(p).oper[1]^.reg),taicpu(hp1).oper[0]^)) then

uncommented, and tested.
It does catch a big lot of occurrences.

Can you post some example code? It might be worth to think about
improving this already in at the node level.




I will try to find some. (I just enabled it, and put a writeln in there, 
to see if it was triggered. Then I ran the tests and built Lazarus.)


In the meantime, what about the other additions/changes?

I already wrote code for them, and mailed it.
So what I need to know: How to best go on to get them accepted and into 
the compiler?


One problem is, if I create 3  or 4  patches, and report them, only one 
will work. The others may then conflict. Or I need to create them, based 
on each other, and only applicable in the right order.


I can do all in one (if that is preferred)...


Re: [fpc-devel] More peephole

2014-01-21 Thread Florian Klämpfl
Am 20.01.2014 01:18, schrieb Martin:
> Just been looking at the peephole opt (i386). Other than the 2 items
> already mailed, I found that:
> 
> 1) Code as follows is sometimes generated (at various opt levels)
> 
> .Ll2:
> # [36] i := 1;
>     movl    $1,%eax
> .Ll3:
> # [38] i := i + 1;
>     movl    $2,%eax
> 
> I could not find any code dealing with it, 

There is none I think.

> and added some. It does catch
> a noticeable amount of occurrences during build of fpc and lazarus.

Go ahead. In my experience, a good way (besides testing the compiler
and Lazarus) to quickly catch errors in the peephole optimizer is to
compare the assembler output before and after.

> 
> 2)
> Existing code that is commented out (apparently since revision 1) for
>   {movl [mem1],reg1    to    movl [mem1],reg1
>    movl [mem1],reg2          movl reg1,reg2 }

The code was commented 15 years ago :) with the comment

* split the optimizer

by Jonas. So I think such statements were caught by the old assembler
cse optimizer which is disabled now.

> 
> It used
> (taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.base) and
> (taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.index) then
> but should only compare the supregister part
> I replaced that
> not(RegInOp(getsupreg(taicpu(p).oper[1]^.reg),taicpu(hp1).oper[0]^)) then
> 
> uncommented, and tested.
> It does catch a big lot of occurrences.

Can you post some example code? It might be worth thinking about
improving this already at the node level.



[fpc-devel] More peephole

2014-01-19 Thread Martin
Just been looking at the peephole opt (i386). Other than the 2 items 
already mailed, I found that:


1) Code as follows is sometimes generated (at various opt levels)

.Ll2:
# [36] i := 1;
    movl    $1,%eax
.Ll3:
# [38] i := i + 1;
    movl    $2,%eax

I could not find any code dealing with it, and added some. It does catch 
a noticeable amount of occurrences during build of fpc and lazarus.


2)
Existing code that is commented out (apparently since revision 1) for
  {movl [mem1],reg1    to    movl [mem1],reg1
   movl [mem1],reg2          movl reg1,reg2 }

It used
(taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.base) and
(taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.index) then
but should only compare the supregister part
I replaced that
not(RegInOp(getsupreg(taicpu(p).oper[1]^.reg),taicpu(hp1).oper[0]^)) then

uncommented, and tested.
It does catch a big lot of occurrences.
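
To illustrate why only the supregister should be compared: on i386, %al, %ax and %eax are different register names for the same underlying register, so a test on the full register can miss an overlap. A small Python sketch of the idea follows. The lookup table and function names are hypothetical and cover only a few registers; they do not mirror the compiler's own data structures.

```python
# Hypothetical subset of i386 registers: sub-register -> super-register.
SUPREG = {
    "al": "eax", "ah": "eax", "ax": "eax", "eax": "eax",
    "dl": "edx", "dh": "edx", "dx": "edx", "edx": "edx",
    "ebp": "ebp", "esp": "esp",
}


def independent_by_name(dest, base, index=None):
    """Old-style check: compare complete register names."""
    return dest != base and dest != index


def independent_by_supreg(dest, base, index=None):
    """Compare only the super-register, so ax and eax count as the same."""
    used = {SUPREG[r] for r in (base, index) if r is not None}
    return SUPREG[dest] not in used


# movw (%eax),%ax : the destination overlaps the base register.
print(independent_by_name("ax", "eax"))     # True  (overlap missed)
print(independent_by_supreg("ax", "eax"))   # False (overlap detected)
print(independent_by_supreg("edx", "ebp"))  # True  (really independent)
```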

Both pass all tests from the test dir (except those already failing; see 
my other mail / compared at slightly different settings).


Both build a working FPC and Lazarus.


Patch with all 3 changes at the end of the mail. I will separate them, 
if all else is ok, and needed.


New code
for removing a double move to a register. The code is in a block where 
opsize and other constraints have already been confirmed.


if (taicpu(p).oper[1]^.typ = top_reg) and
   (taicpu(hp1).oper[1]^.typ = top_reg) and
   (taicpu(p).oper[1]^.reg = taicpu(hp1).oper[1]^.reg) and
   not(RegInOp(getsupreg(taicpu(p).oper[1]^.reg),taicpu(hp1).oper[0]^)) then
  { We have mov xxx, reg1  mov yyy, reg1 }
  begin
    asml.remove(p);
    p.free;
    p := hp1;
  end


existing, now uncommented
else
  {movl [mem1],reg1    to    movl [mem1],reg1
   movl [mem1],reg2          movl reg1,reg2 }
  if (taicpu(p).oper[0]^.typ = top_ref) and
     (taicpu(p).oper[1]^.typ = top_reg) and
     (taicpu(hp1).oper[0]^.typ = top_ref) and
     (taicpu(hp1).oper[1]^.typ = top_reg) and
     RefsEqual(TReference(taicpu(p).oper[0]^.ref^),taicpu(hp1).oper[0]^.ref^) and
     not(RegInOp(getsupreg(taicpu(p).oper[1]^.reg),taicpu(hp1).oper[0]^)) then
     //(taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.base) and
     //(taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.index) then
    begin
      taicpu(hp1).loadReg(0,taicpu(p).oper[1]^.reg)
    end
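
Both rules can be modelled on a toy instruction list. The Python sketch below is only an illustration under simplifying assumptions: instructions are (opcode, source, destination) tuples of AT&T-style strings, only 32-bit register names occur (so a textual check stands in for the supregister test), and labels or jump targets between the two instructions are not modelled. None of the names correspond to compiler internals.

```python
def is_reg(operand):
    return operand.startswith("%")


def is_mem(operand):
    return "(" in operand


def uses_reg(operand, reg):
    # Textual test; adequate only because the toy uses 32-bit names alone.
    return reg in operand


def peephole(instrs):
    out = list(instrs)
    i = 0
    while i + 1 < len(out):
        (op1, src1, dst1), (op2, src2, dst2) = out[i], out[i + 1]
        if op1 == op2 == "movl" and is_reg(dst1) and is_reg(dst2):
            # Rule 1: mov xxx,reg1 ; mov yyy,reg1 -> the first mov is dead,
            # provided yyy does not read reg1.
            if dst1 == dst2 and not uses_reg(src2, dst1):
                del out[i]
                continue
            # Rule 2: mov mem,reg1 ; mov mem,reg2 -> mov mem,reg1 ;
            # mov reg1,reg2, provided reg1 is not part of the address.
            if is_mem(src1) and src1 == src2 and not uses_reg(src1, dst1):
                out[i + 1] = (op2, dst1, dst2)
        i += 1
    return out


double_store = [("movl", "$1", "%eax"), ("movl", "$2", "%eax")]
double_load = [("movl", "-4(%ebp)", "%eax"), ("movl", "-4(%ebp)", "%edx")]
self_ref = [("movl", "(%eax)", "%eax"), ("movl", "(%eax)", "%edx")]

print(peephole(double_store))
print(peephole(double_load))
print(peephole(self_ref))
```

The third case is left alone because the first load overwrites the register used in the address, which is the situation the RegInOp test guards against.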





Index: compiler/i386/popt386.pas
===================================================================
--- compiler/i386/popt386.pas (revision 26519)
+++ compiler/i386/popt386.pas (working copy)
@@ -1369,39 +1369,50 @@
 end
 end
   else
-(*  {movl [mem1],reg1
-movl [mem1],reg2
-to:
-  movl [mem1],reg1
-  movl reg1,reg2 }
-if (taicpu(p).oper[0]^.typ = top_ref) and
-  (taicpu(p).oper[1]^.typ = top_reg) and
-  (taicpu(hp1).oper[0]^.typ = top_ref) and
-  (taicpu(hp1).oper[1]^.typ = top_reg) and
-  (taicpu(p).opsize = taicpu(hp1).opsize) and
- RefsEqual(TReference(taicpu(p).oper[0]^^),taicpu(hp1).oper[0]^^.ref^) and
- (taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.base) and
- (taicpu(p).oper[1]^.reg<>taicpu(hp1).oper[0]^^.ref^.index) then
- taicpu(hp1).loadReg(0,taicpu(p).oper[1]^.reg)
-else*)
-{   movl const1,[mem1]
-movl [mem1],reg1
-to:
-movl const1,reg1
-movl reg1,[mem1] }
-  if (taicpu(p).oper[0]^.typ = top_const) and
- (taicpu(p).oper[1]^.typ = top_ref) and
+if (taicpu(p).oper[1]^.typ = top_reg) and
+   (taicpu(hp1).oper[1]^.typ = top_reg) and
+   (taicpu(p).oper[1]^.reg = taicpu(hp1).oper[1]^.reg) and
+ not(RegInOp(getsupreg(taicpu(p).oper[1]^.reg),taicpu(hp1).oper[0]^)) then
+{ We have mov xxx, reg1  mov yyy, reg1 }
+  begin
+asml.remove(p);
+p.free;
+p := hp1;
+  end
+