Re: [fpc-devel] Optimizer Overhaul Take 2... Jump Optimizations

2019-10-30 Thread J. Gareth Moreton
A partial example of where "condition_in" shows its power (compiled code 
from fpc\packages\fcl-base\src\advancedipc.pp)


BEFORE:

    ...
    jb    .Lj178
    subl    $17,%eax
    cmpl    $26,%eax
    jb    .Lj178
    cmpl    $30,%eax
    stc
    je    .Lj178
    subl    $32,%eax
    cmpl    $26,%eax
.Lj178:
    jc    .Lj180    (Appears later in the code)
    ...

AFTER:

    ...
    jb    .Lj180    <-- Label changed
    subl    $17,%eax
    cmpl    $26,%eax
    jb    .Lj180    <-- Label changed
    cmpl    $30,%eax
    stc
    je    .Lj178
    subl    $32,%eax
    cmpl    $26,%eax
.Lj178:
    jc    .Lj180    (Appears later in the code)
    ...

In this case, 'condition_in' detects that C_C is a subset of C_B 
(they're actually equivalent in this case... both branch if the Carry 
Flag is set).  This particular optimisation means that the code only has 
to branch once if the JB conditions are met, rather than twice.
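
As a rough illustration of the label change above, here is a minimal Python sketch. It is a model only, not the FPC implementation. The SUBSET_OF table, final_destination and jump_at_label are hypothetical names invented for this example, and the table is deliberately partial. Only condition_in, the condition names and the labels come from the thread.

```python
# Hypothetical, partial subset table: SUBSET_OF[c] lists conditions that are
# true whenever c is true (an assumption for this sketch, not FPC's data).
SUBSET_OF = {
    "C_B": {"C_C", "C_BE"},
    "C_C": {"C_B", "C_BE"},
    "C_E": {"C_BE", "C_LE"},
}


def condition_in(c1, c2):
    """Return True if condition c1 is a subset of condition c2."""
    return c2 == "C_None" or c1 == c2 or c2 in SUBSET_OF.get(c1, ())


def final_destination(cond, label, jump_at_label):
    """Follow chained jumps for a jump with condition `cond` aimed at `label`.

    jump_at_label maps a label to the (condition, target) of the jump that
    sits directly at that label. The chain is followed only while the next
    jump is guaranteed to be taken, i.e. while cond is a subset of its
    condition. A set of visited labels guards against cycles.
    """
    seen = {label}
    while label in jump_at_label:
        next_cond, next_label = jump_at_label[label]
        if not condition_in(cond, next_cond) or next_label in seen:
            break
        label = next_label
        seen.add(label)
    return label


# The situation from the listing: ".Lj178" holds "jc .Lj180".
jump_at_label = {".Lj178": ("C_C", ".Lj180")}
jb_target = final_destination("C_B", ".Lj178", jump_at_label)
je_target = final_destination("C_E", ".Lj178", jump_at_label)
```

In this model the jb resolves to .Lj180, while the je stays at .Lj178 because C_E is not a subset of C_C. That is consistent with the je line being unchanged in the AFTER listing.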


Gareth aka. Kit



___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


[fpc-devel] Optimizer Overhaul Take 2... Jump Optimizations

2019-10-30 Thread J. Gareth Moreton

Hi everyone,

So after my x86_64 optimizer overhaul was rejected in its current form, 
I decided to set to work again in cleaner steps.  The first thing I've 
started to address are jump optimisations, since they gave notable 
improvements that were independent of things like the reduction in the 
number of passes.  There is also a bonus... a lot of it is 
cross-platform too!


One thing I do need to do is to cross-compile to all known platforms... 
I can only test a handful of the i386 and x86_64 platforms, but I want 
to at least see if it compiles before I submit the patch for more 
thorough testing.  Does anyone have any tips or a sample script 
(preferably a .BAT file) that cross-compiles from x86_64-win64 or 
i386-win32 onto a reasonable set of platforms that at least gives me a 
good smoke test? (Obviously, every combination is a little overkill, if 
not impractical).


I'm trying to keep things as simple as possible, but for the best 
optimisations, I had to introduce a new function for all platforms:


function condition_in(const c1, c2: TAsmCond): Boolean;

It is similar to "conditions_equal", but it returns true if c1 is a 
subset of c2 (e.g. "C_E" and "C_L" are both subsets of "C_LE", but 
"C_LE" is not a subset of "C_L").  It always returns True if c2 is 
"C_None" (i.e. an unconditional jump) or if the conditions are equal, 
but analysing the subsets is platform-specific.  I've done the best I 
can for the platform-specific assemblers that I don't know too well, but 
I'll need someone to check them over and maybe improve them, since I 
don't know what some of the conditions are.  The function is mostly 
failsafe, in that if it returns False, an optimisation won't be 
performed.  It's to help track final jump destinations more accurately, 
as well as detect what I call 'dominated jumps' - for example:


jbe @lbl1
je   @lbl2

The je instruction will never branch because the jbe instruction above 
it will branch if the equality condition is met, so it can be safely 
removed.
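
To make the subset idea and the dominated-jump check concrete, here is a minimal Python sketch. It is a model under stated assumptions, not the FPC implementation. Each x86 condition is treated as a predicate over the flags CF, ZF, SF and OF, and c1 counts as a subset of c2 if every flag state satisfying c1 also satisfies c2. The PREDICATES table and remove_dominated_jumps are hypothetical names for this example. The sketch also assumes the jumps are consecutive, with nothing in between that changes the flags.

```python
from itertools import product

# Each condition modelled as a predicate over (CF, ZF, SF, OF).
# This table is an assumption for the sketch, covering common x86 conditions.
PREDICATES = {
    "C_None": lambda cf, zf, sf, of: True,  # unconditional jump
    "C_E":    lambda cf, zf, sf, of: zf,
    "C_NE":   lambda cf, zf, sf, of: not zf,
    "C_B":    lambda cf, zf, sf, of: cf,
    "C_C":    lambda cf, zf, sf, of: cf,
    "C_AE":   lambda cf, zf, sf, of: not cf,
    "C_BE":   lambda cf, zf, sf, of: cf or zf,
    "C_A":    lambda cf, zf, sf, of: not cf and not zf,
    "C_L":    lambda cf, zf, sf, of: sf != of,
    "C_GE":   lambda cf, zf, sf, of: sf == of,
    "C_LE":   lambda cf, zf, sf, of: zf or sf != of,
    "C_G":    lambda cf, zf, sf, of: not zf and sf == of,
}

# For every condition, the set of flag states in which it branches.
STATES = {
    cond: frozenset(
        flags for flags in product((False, True), repeat=4) if pred(*flags)
    )
    for cond, pred in PREDICATES.items()
}


def condition_in(c1, c2):
    """Return True if c1 is a subset of c2 (c2 branches whenever c1 does)."""
    return STATES[c1] <= STATES[c2]


def remove_dominated_jumps(jumps):
    """Drop jumps that can never branch in a run of consecutive jumps.

    `jumps` is a list of (condition, label) pairs. A jump is dominated when
    its condition is a subset of the condition of an earlier kept jump,
    because that earlier jump would already have branched.
    """
    kept = []
    for cond, label in jumps:
        if any(condition_in(cond, earlier) for earlier, _ in kept):
            continue
        kept.append((cond, label))
    return kept


# The example from the text: "je @lbl2" is dominated by "jbe @lbl1".
result = remove_dominated_jumps([("C_BE", "@lbl1"), ("C_E", "@lbl2")])
```

Comparing over all flag states is deliberately conservative: when in doubt the check returns False, which matches the failsafe behaviour described above.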


I'll run the win32 and win64 tests over Thursday and will submit the 
patches once everything passes, as well as a PDF design and 
implementation spec to explain what everything is as well as the 
concepts of dominated jumps and the like.


Gareth aka. Kit



Re: [fpc-devel] Question on updating FPC packages

2019-10-30 Thread J. Gareth Moreton
Well, when it comes to the specific changes I made to uComplex... the 
compiler might be able to detect a kind of 'auto-const' system, but 
actually inserting 'const' into the formal parameters helps with 
generating more efficient code as well as with syntax checking, namely 
catching code that modifies the parameter when you're perhaps not 
supposed to.


For vectorcall, I don't think the compiler will correctly guess when and 
when not to use the calling convention, and there are times where you 
may not want to use vectorcall, usually when interfacing with 
third-party programs or libraries.  In this case, it's more likely that 
the programmer may stumble upon unintended behaviour if the compiler 
tries to enable vectorcall for something that is meant to use the 
default Microsoft ABI instead.


And I don't think using assembly language to directly call the uComplex 
routines is a realistic real-world example, considering that's a 
situation where you're more likely to be using the XMM registers 
directly to do such mathematics.  Besides, I think all bets are off when 
it comes to assembly language - in this instance I tried to make sure 
that Pascal code didn't have to change though (other than a 
recompilation maybe).


I could just say 'screw it' and write my own complex number library, but 
then that would just add to the growing collection of third-party 
libraries instead of a standard set of libraries that are antiquated and 
potentially sluggish on modern systems.


Gareth aka. Kit

On 30/10/2019 22:02, Florian Klämpfl wrote:
> On 29.10.19 at 14:06, Marco van de Voort wrote:
>> On 2019-10-27 at 10:46, Florian Klämpfl wrote:
>>> On 27.10.19 at 10:27, Michael Van Canneyt wrote:
>>>> If you genuinely believe that micro-optimization changes can make a
>>>> difference:
>>>>
>>>> Submit patches.
>>>
>>> As said: I am against applying them. Why? They clutter code and
>>> after all, they make assumptions about the current target which
>>> might not always be valid. And time testing them is much better
>>> spent in improving the compiler and then all code benefits. Another
>>> point: for example explicit inline normally increases code size
>>> (not always but often), so it is against the use of -Os. Applying
>>> inline manually on umpteen subroutines makes no sense. Better
>>> improve auto inlining.
>>
>> Auto inlining is also no panacea. It only works with heuristics,
>> and is thus only as good as the formula of the heuristic.
>
> Yes. And manually adding inline is only as good as the knowledge of
> the user doing so. If somebody implements it right (I did not; I used
> the easiest approach and used an existing function to estimate the
> complexity of a subroutine), the compiler can just count the number
> of generated instructions or even calculate the length of the
> procedure and then decide to keep the node tree for inlining.
>
>> Changing calling conventions, vectorizing, loops all complicate
>> that, and it will never be perfect, and a change here will lead to a
>> problem there etc.
>
> See above.
>
>> If you know a routine can evaluate to one instruction in most cases,
>> I don't see anything wrong with just marking it as such.
>
> The compiler knows this as well, as the compiler generated the code.
> Why should I guess if the compiler knows?



Re: [fpc-devel] Question on updating FPC packages

2019-10-30 Thread Florian Klämpfl

On 29.10.19 at 14:06, Marco van de Voort wrote:
> On 2019-10-27 at 10:46, Florian Klämpfl wrote:
>> On 27.10.19 at 10:27, Michael Van Canneyt wrote:
>>> If you genuinely believe that micro-optimization changes can make a
>>> difference:
>>>
>>> Submit patches.
>>
>> As said: I am against applying them. Why? They clutter code and after
>> all, they make assumptions about the current target which might not
>> always be valid. And time testing them is much better spent in
>> improving the compiler and then all code benefits. Another point: for
>> example explicit inline normally increases code size (not always but
>> often), so it is against the use of -Os. Applying inline manually on
>> umpteen subroutines makes no sense. Better improve auto inlining.
>
> Auto inlining is also no panacea. It only works with heuristics, and
> is thus only as good as the formula of the heuristic.

Yes. And manually adding inline is only as good as the knowledge of the 
user doing so. If somebody implements it right (I did not; I used the 
easiest approach and used an existing function to estimate the 
complexity of a subroutine), the compiler can just count the number of 
generated instructions or even calculate the length of the procedure 
and then decide to keep the node tree for inlining.

> Changing calling conventions, vectorizing, loops all complicate that,
> and it will never be perfect, and a change here will lead to a problem
> there etc.

See above.

> If you know a routine can evaluate to one instruction in most cases, I
> don't see anything wrong with just marking it as such.

The compiler knows this as well, as the compiler generated the code. Why 
should I guess if the compiler knows?



Re: [fpc-devel] Progress on reviewing x86_64 optimizer overhaul and node semantic pass

2019-10-30 Thread George Bakhtadze
> For example, if I have changes in separate files that I want to split up, how 
> might one go about it without manually modifying the patch files? (As an easy 
> example, I split up the uComplex patches into two... one with the alignment 
> and vectorcall changes, and the other with the "const" modifier in the 
> parameters).

As an option, you can apply patches and create new ones with e.g. VCS->Create 
patch in IDEA where you can select which files and which changes within each 
file should go to the patch.

---
Best regards, George