Re: [fpc-devel] Optimizer Overhaul Take 2... Jump Optimizations

2019-11-09 Thread J. Gareth Moreton
Is there anything I should do to aid testing the jump optimisations, or 
is there anything else that needs cleaning up?


Unfortunately there are "TODO" comments in some of the "condition_in" 
functions because I don't know enough about that particular platform to 
ensure correctness, but it is failsafe in that it will simply return 
False after trying a simple comparison (calls "conditions_equal") and 
checking against the trivial 'C_None'.
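
A minimal sketch of that failsafe behaviour, written in Python rather
than the compiler's Pascal. The enum members and the treatment of
C_None as "unknown" are assumptions made for illustration; only the
names "condition_in", "conditions_equal" and "C_None" come from the
message above.

```python
from enum import Enum


class AsmCond(Enum):
    # Hypothetical condition codes for a platform without subset rules yet.
    C_None = 0
    C_EQ = 1
    C_NE = 2
    C_LT = 3
    C_LE = 4


def conditions_equal(c1, c2):
    """Simple comparison: the two condition codes are identical."""
    return c1 is c2


def condition_in(subset, c):
    """Conservative fallback: answer True only when the relation is certain."""
    # Assumption: a comparison involving C_None is treated as unknown.
    if subset is AsmCond.C_None or c is AsmCond.C_None:
        return False
    if conditions_equal(subset, c):
        return True
    # TODO: platform-specific subset rules (e.g. LT within LE) would go here.
    return False
```

Returning False for an unknown pair only costs a missed optimisation;
it can never make the optimiser redirect a jump incorrectly.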


I do wonder if I'm allowed to override the "in" operator so the code 
that calls those functions is a bit cleaner.


Gareth aka. Kit

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] Optimizer Overhaul Take 2... Jump Optimizations

2019-11-08 Thread J. Gareth Moreton
Made some more updates to the jump optimisation patches, removing code 
duplication and also some untidy debugging code.


Gareth aka. Kit



Re: [fpc-devel] Optimizer Overhaul Take 2... Jump Optimizations

2019-11-06 Thread J. Gareth Moreton
Jump optimisation patches ready, and something resembling a design spec 
to explain things.


https://bugs.freepascal.org/view.php?id=36271

Gareth aka. Kit


--
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus



Re: [fpc-devel] Optimizer Overhaul Take 2... Jump Optimizations

2019-11-01 Thread J. Gareth Moreton

Turns out it wasn't a bug, but a very contrived set-up.

When the peephole optimiser is turned off completely, the following is 
revealed:


    je    .Lj371
    jmp    .Lj372
.Lj371:
    movl    $3,%r14d
    movl    $1,%r15d
    jmp    .Lj373
    .p2align 4,,10
    .p2align 3
.Lj372:
    jmp    .Lj333
.Lj373:
.Lj370:
.Lj367:
   ...

What happens is that the optimiser changes "je .Lj371; jmp .Lj372;
.Lj371:" into "jne .Lj372" (what I call a 'conditional jump inversion'),
and then the final destination is tracked: the compiler notices that
after "jne .Lj372", the program flow immediately hits an
unconditional jump ("jmp .Lj333"), so it changes the destination to
match, and it becomes "jne .Lj333".  After all of this, .Lj372 becomes
a dead label, and when the optimiser reaches "jmp .Lj373", it removes
all the dead code between it and the next live label, since it will
never be executed; this includes the original "jmp .Lj333" instruction,
because .Lj372 is no longer referenced by anything.  ("jmp .Lj373" is
then removed as well, because its destination label is right after it
once everything is stripped.)  As such, because .Lj372 is a dead label
and there are no other (live) labels in that cluster, the optimiser
correctly removes the alignment fields.  In other words, the remaining
labels were never actually aligned - it was just a coincidence that
they appeared aligned before.
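
The sequence can be reproduced with a toy model. This is only a sketch
in Python, not the compiler's code: the tuple encoding, the function
names and the rule that only labels referenced inside the snippet count
as live are all assumptions made for illustration.

```python
# Instruction tuples: ("jcc", cond, target), ("jmp", target),
# ("label", name), ("align", text), ("op", text)
INVERSE = {"e": "ne", "ne": "e", "b": "ae", "ae": "b"}

SNIPPET = [
    ("jcc", "e", ".Lj371"), ("jmp", ".Lj372"), ("label", ".Lj371"),
    ("op", "movl $3,%r14d"), ("op", "movl $1,%r15d"), ("jmp", ".Lj373"),
    ("align", ".p2align 4,,10"), ("align", ".p2align 3"),
    ("label", ".Lj372"), ("jmp", ".Lj333"),
    ("label", ".Lj373"), ("label", ".Lj370"), ("label", ".Lj367"),
]


def invert_jumps(code):
    """jcc L1; jmp L2; L1:  becomes  j(not cc) L2; L1:"""
    out, i = [], 0
    while i < len(code):
        a = code[i]
        b = code[i + 1] if i + 1 < len(code) else None
        c = code[i + 2] if i + 2 < len(code) else None
        if a[0] == "jcc" and b and b[0] == "jmp" and c == ("label", a[2]):
            out.append(("jcc", INVERSE[a[1]], b[1]))
            i += 2  # drop the jmp; the label is kept for now
        else:
            out.append(a)
            i += 1
    return out


def final_target(code, label):
    """Follow 'label: jmp other' chains to the ultimate destination."""
    seen = set()
    while label not in seen and ("label", label) in code:
        seen.add(label)
        nxt = code.index(("label", label)) + 1
        while nxt < len(code) and code[nxt][0] in ("label", "align"):
            nxt += 1
        if nxt < len(code) and code[nxt][0] == "jmp":
            label = code[nxt][1]
    return label


def retarget(code):
    """Point every jump at its final destination."""
    return [ins[:-1] + (final_target(code, ins[-1]),)
            if ins[0] in ("jcc", "jmp") else ins for ins in code]


def strip_dead_code(code):
    """After a jmp, delete everything up to the next live label. Alignment
    survives only if it leads the label cluster holding that live label."""
    live = {ins[-1] for ins in code if ins[0] in ("jcc", "jmp")}
    out, i = [], 0
    while i < len(code):
        out.append(code[i])
        i += 1
        if out[-1][0] != "jmp":
            continue
        k = i
        while k < len(code) and not (code[k][0] == "label" and code[k][1] in live):
            k += 1
        if k < len(code):
            s = k  # walk back to the start of the live label's cluster
            while s > i and code[s - 1][0] in ("label", "align"):
                s -= 1
            out.extend(x for x in code[s:k] if x[0] == "align")
        i = k
    return out


def drop_jumps_to_next(code):
    """Remove a jmp whose target sits in the label cluster right after it."""
    out = []
    for i, ins in enumerate(code):
        if ins[0] == "jmp":
            j = i + 1
            while j < len(code) and code[j][0] in ("label", "align"):
                j += 1
            if ("label", ins[1]) in code[i + 1:j]:
                continue
        out.append(ins)
    return out


def optimise(code):
    return drop_jumps_to_next(strip_dead_code(retarget(invert_jumps(code))))
```

Calling optimise(SNIPPET) leaves "jne .Lj333", the now unreferenced
.Lj371 label, the two movl instructions and the .Lj373/.Lj370/.Lj367
cluster. Both .p2align entries disappear together with .Lj372 and
"jmp .Lj333", because they belonged to the dead cluster.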


Gareth aka. Kit

On 01/11/2019 14:11, Sven Barth via fpc-devel wrote:
> J. Gareth Moreton wrote on Fri, 1 Nov 2019, 12:56:
>
> > So the tests all passed on i386-win32 and x86_64-win64, so that's a
> > good sign.  I can't submit the patches for evaluation yet because I
> > haven't finished the design spec yet, and also because of a minor bug
> > that deals with collapsing label clusters:
> >
> >  .p2align 4,,10
> >  .p2align 3
> > .Lj370:
> > .Lj367:
> > .Lj364:
> > .Lj361:
> > .Lj358:
> > .Lj355:
> > .Lj352:
> > .Lj349:
> > .Lj346:
> > .Lj343:
> > .Lj340:
> >
> > In this segment, everything is stripped except for the last label,
> > which is fine as all the references are changed too. Unfortunately, the
> > alignment fields are removed too, which shouldn't happen. It doesn't
> > produce incorrect code, but it may incur a performance penalty, so
> > they shouldn't be removed - I'm just trying to figure out why that's
> > happening!
>
> Considering that you said that this feature is essentially cross
> platform: on some other platform the alignments might be important
> besides performance (e.g. a branch inside a branch delay slot or
> something like that). So, yeah, that should be fixed...
>
> Regards,
> Sven


Re: [fpc-devel] Optimizer Overhaul Take 2... Jump Optimizations

2019-11-01 Thread Sven Barth via fpc-devel
J. Gareth Moreton wrote on Fri, 1 Nov 2019, 12:56:

> So the tests all passed on i386-win32 and x86_64-win64, so that's a
> good sign.  I can't submit the patches for evaluation yet because I
> haven't finished the design spec yet, and also because of a minor bug
> that deals with collapsing label clusters:
>
>  .p2align 4,,10
>  .p2align 3
> .Lj370:
> .Lj367:
> .Lj364:
> .Lj361:
> .Lj358:
> .Lj355:
> .Lj352:
> .Lj349:
> .Lj346:
> .Lj343:
> .Lj340:
>
> In this segment, everything is stripped except for the last label, which
> is fine as all the references are changed too. Unfortunately, the
> alignment fields are removed too, which shouldn't happen.  It doesn't
> produce incorrect code, but it may incur a performance penalty, so they
> shouldn't be removed - I'm just trying to figure out why that's happening!
>

Considering that you said that this feature is essentially cross platform:
on some other platform the alignments might be important besides performance
(e.g. a branch inside a branch delay slot or something like that). So,
yeah, that should be fixed...

Regards,
Sven



Re: [fpc-devel] Optimizer Overhaul Take 2... Jump Optimizations

2019-11-01 Thread J. Gareth Moreton
So the tests all passed on i386-win32 and x86_64-win64, so that's a
good sign.  I can't submit the patches for evaluation yet because I 
haven't finished the design spec yet, and also because of a minor bug 
that deals with collapsing label clusters:


    .p2align 4,,10
    .p2align 3
.Lj370:
.Lj367:
.Lj364:
.Lj361:
.Lj358:
.Lj355:
.Lj352:
.Lj349:
.Lj346:
.Lj343:
.Lj340:

In this segment, everything is stripped except for the last label, which 
is fine as all the references are changed too. Unfortunately, the 
alignment fields are removed too, which shouldn't happen.  It doesn't 
produce incorrect code, but it may incur a performance penalty, so they
shouldn't be removed - I'm just trying to figure out why that's happening!


Gareth aka. Kit



Re: [fpc-devel] Optimizer Overhaul Take 2... Jump Optimizations

2019-10-31 Thread J. Gareth Moreton
I forgot about "fullcycle", thanks.  The main reason I want to
cross-compile is the "condition_in" functions I've added to the other
platforms - I want to see if they at least compile!


Now the fun begins.

Gareth aka. Kit

On 31/10/2019 06:18, Sven Barth via fpc-devel wrote:
> On 31.10.2019 at 05:57, J. Gareth Moreton wrote:
> > One thing I do need to do is to cross-compile to all known
> > platforms... I can only test a handful of the i386 and x86_64
> > platforms, but I want to at least see if it compiles before I submit
> > the patch for more thorough testing.  Does anyone have any tips or a
> > sample script (preferably a .BAT file) that cross-compiles from
> > x86_64-win64 or i386-win32 onto a reasonable set of platforms that at
> > least gives me a good smoke test? (Obviously, every combination is a
> > little overkill, if not impractical).
>
> If you only want to test whether the compiler builds you can do a
> "fullcycle" inside the compiler directory.
>
> If you also want to test whether the RTL compiles you could use
> dummyas in compiler\utils and pass that as AS and compile the RTL for
> all platforms you want to test, though that will obviously hide
> assembly problems. Otherwise you'll need binutils for all platforms
> you want to test.
>
> I myself (as I'm using Windows 10) use the Windows Subsystem for Linux,
> where I have installed the binutils for various targets and can
> compile the whole of FPC from the same repository I use on Windows for
> various targets; using QEMU's user space emulation I can even run
> the testsuite.
>
> Regards,
> Sven


Re: [fpc-devel] Optimizer Overhaul Take 2... Jump Optimizations

2019-10-31 Thread Sven Barth via fpc-devel

On 31.10.2019 at 05:57, J. Gareth Moreton wrote:
> One thing I do need to do is to cross-compile to all known
> platforms... I can only test a handful of the i386 and x86_64
> platforms, but I want to at least see if it compiles before I submit
> the patch for more thorough testing.  Does anyone have any tips or a
> sample script (preferably a .BAT file) that cross-compiles from
> x86_64-win64 or i386-win32 onto a reasonable set of platforms that at
> least gives me a good smoke test? (Obviously, every combination is a
> little overkill, if not impractical).

If you only want to test whether the compiler builds you can do a
"fullcycle" inside the compiler directory.


If you also want to test whether the RTL compiles you could use dummyas
in compiler\utils and pass that as AS and compile the RTL for all
platforms you want to test, though that will obviously hide assembly
problems. Otherwise you'll need binutils for all platforms you want to test.


I myself (as I'm using Windows 10) use the Windows Subsystem for Linux,
where I have installed the binutils for various targets and can compile
the whole of FPC from the same repository I use on Windows for various
targets; using QEMU's user space emulation I can even run the testsuite.


Regards,
Sven


Re: [fpc-devel] Optimizer Overhaul Take 2... Jump Optimizations

2019-10-30 Thread J. Gareth Moreton
A partial example of where "condition_in" shows its power (compiled code 
from fpc\packages\fcl-base\src\advancedipc.pp)


BEFORE:

    ...
    jb    .Lj178
    subl    $17,%eax
    cmpl    $26,%eax
    jb    .Lj178
    cmpl    $30,%eax
    stc
    je    .Lj178
    subl    $32,%eax
    cmpl    $26,%eax
.Lj178:
    jc    .Lj180    (.Lj180 appears later in the code)
    ...

AFTER:

    ...
    jb    .Lj180    <-- Label changed
    subl    $17,%eax
    cmpl    $26,%eax
    jb    .Lj180    <-- Label changed
    cmpl    $30,%eax
    stc
    je    .Lj178
    subl    $32,%eax
    cmpl    $26,%eax
.Lj178:
    jc    .Lj180    (.Lj180 appears later in the code)
    ...

In this case, 'condition_in' detects that C_C is a subset of C_B 
(they're actually equivalent in this case... both branch if the Carry 
Flag is set).  This particular optimisation means that the code only has 
to branch once if the JB conditions are met, rather than twice.
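
One way to picture the subset test is to model each condition as a
predicate over the flags it reads and check that every flag state
accepted by the first condition is also accepted by the second. The
Python below is a sketch under that assumption, not the compiler's
implementation (which may well be a plain lookup); the "redirect"
helper and its parameters are hypothetical names. Only the carry and
zero flags are modelled.

```python
from itertools import product

# x86 conditions as predicates over (carry flag, zero flag).
CONDITIONS = {
    "C_B":  lambda cf, zf: cf,                 # jb: below (CF = 1)
    "C_C":  lambda cf, zf: cf,                 # jc: carry (CF = 1)
    "C_AE": lambda cf, zf: not cf,             # jae: above or equal (CF = 0)
    "C_E":  lambda cf, zf: zf,                 # je: equal (ZF = 1)
    "C_BE": lambda cf, zf: cf or zf,           # jbe: below or equal
    "C_A":  lambda cf, zf: not cf and not zf,  # ja: above
}


def condition_in(subset, c):
    """True if every flag state satisfying `subset` also satisfies `c`."""
    return all(CONDITIONS[c](cf, zf)
               for cf, zf in product((False, True), repeat=2)
               if CONDITIONS[subset](cf, zf))


def redirect(jump_cond, jump_target, cond_at_target, target_of_target):
    """If the jump lands on a conditional jump that is then certain to be
    taken, go straight to that second jump's destination instead."""
    if condition_in(jump_cond, cond_at_target):
        return target_of_target
    return jump_target
```

With these definitions, redirect("C_B", ".Lj178", "C_C", ".Lj180")
yields ".Lj180", matching the two changed labels above, whereas
redirect("C_E", ".Lj178", "C_C", ".Lj180") keeps ".Lj178": the flags
tested by "je" say nothing about the carry flag, so that jump is left
alone.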


Gareth aka. Kit

