Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-27 Thread J. Gareth Moreton
 To add my two pence to this:

 Low-level optimisation is my personal speciality.  Currently a lot of
what I do is on small savings within the peephole optimizer on x86
systems.  Generally the result is that the compiler runs slightly more
slowly, but the final binary runs a bit more efficiently - for example,
replacing integer divisions by a constant with multiplications.
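
To illustrate the kind of rewrite mentioned above, here is a minimal Python sketch of replacing an unsigned 32-bit division by the constant 10 with a multiplication and a shift. It is not taken from FPC's peephole optimizer; the function name is made up for this example, and the constant 0xCCCCCCCD with a shift of 35 is the commonly used pair for dividing a 32-bit unsigned value by 10.

```python
MASK32 = 0xFFFFFFFF        # emulate a 32-bit unsigned register
MAGIC_DIV10 = 0xCCCCCCCD   # ceil(2**35 / 10)
SHIFT_DIV10 = 35


def div10_by_multiply(x):
    """Return x // 10 for 0 <= x < 2**32 using a multiply and a shift."""
    x &= MASK32
    # 32x32 -> 64-bit multiply, then keep only the top bits of the product.
    return (x * MAGIC_DIV10) >> SHIFT_DIV10


# Hypothetical usage: the result should match ordinary integer division.
print(div10_by_multiply(12345), 12345 // 10)
```

On real hardware the gain comes from a multiply and a shift usually being cheaper than a divide instruction; Python only shows that the arithmetic agrees.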

 It's not as important as improving node generation and the like, but every
little helps.  In the meantime, I'm seeing if it's possible to minimise
the number of passes that the optimiser makes.  Currently it's at least 4
per code block when -O3 is specified: pre-peephole, pass 1 (which is
repeated until there are no more critical changes), pass 2 and
post-peephole.  This may not amount to anything, but I tend to experiment
a lot.
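
The pass structure described above can be sketched as a small driver loop. This is a toy model in Python, not FPC's optimizer: the instruction tuples, the `drop_push_pop_pairs` rewrite and the `identity` placeholder passes are all assumptions made for illustration. It only shows why four invocations is the minimum and why a repeated pass 1 can push the count higher.

```python
def drop_push_pop_pairs(code):
    """One left-to-right sweep removing adjacent 'push r' / 'pop r' pairs."""
    out = []
    i = 0
    while i < len(code):
        if (i + 1 < len(code)
                and code[i][0] == "push"
                and code[i + 1] == ("pop", code[i][1])):
            i += 2              # skip both instructions of the pair
        else:
            out.append(code[i])
            i += 1
    return out


def identity(code):
    """Placeholder for the passes this sketch does not model."""
    return list(code)


def optimise_block(code, pre=identity, pass1=drop_push_pop_pairs,
                   pass2=identity, post=identity):
    """Return (optimised code, number of pass invocations)."""
    runs = 0
    code = pre(code)
    runs += 1
    while True:                 # pass 1 repeats until a sweep changes nothing
        new_code = pass1(code)
        runs += 1
        if new_code == code:
            break
        code = new_code
    code = pass2(code)
    runs += 1
    code = post(code)
    runs += 1
    return code, runs


# Hypothetical block: the inner pair must go before the outer pair is adjacent.
block = [("push", "eax"), ("push", "ebx"), ("pop", "ebx"), ("pop", "eax"), ("ret",)]
print(optimise_block(block))
```

A block with nothing to rewrite costs exactly four invocations in this model; the nested example needs extra sweeps of pass 1 because each sweep exposes a new opportunity.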

 Part of my incentive is that I like to design games and am also an amateur
mathematician, both fields that can benefit from speed gains, so if I can
make Free Pascal into something that is suitable for such tasks, then my
life is complete!

 Gareth aka. Kit
 ___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-27 Thread Ben Grasset
On Sat, Oct 27, 2018 at 8:22 PM Ozz Nixon wrote:

> * Sorry for off topic - just that grabbed my "What did he just say?"
> button..
>

Huh? I said "Also linked lists absolutely everywhere, that would perform
much better as array based lists."

Meaning, exactly the same thing you just implied. You got what I meant
completely backwards somehow.


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-27 Thread Ozz Nixon
SORRY - JUST RE-READ... that is what you are saying... it's late here ;-(

On Sat, Oct 27, 2018 at 8:22 PM Ozz Nixon wrote:

> * Not arguing, but... *
>
> Linked List faster than Array?
> Unless I missed what you are talking about... I always teach programmers:
>
> Array is the fastest collection to use, followed by Linked List, followed
> by bTree, etc.
>
> * Sorry for off topic - just that grabbed my "What did he just say?"
> button...
>
>


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-27 Thread Ozz Nixon
* Not arguing, but... *

Linked List faster than Array?
Unless I missed what you are talking about... I always teach programmers:

Array is the fastest collection to use, followed by Linked List, followed
by bTree, etc.

* Sorry for off topic - just that grabbed my "What did he just say?"
button...


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-27 Thread Ben Grasset
On Sat, Oct 27, 2018 at 6:46 PM Sven Barth via fpc-devel <
fpc-devel@lists.freepascal.org> wrote:

> Except of course for optimizations that can be done on the platform
> independent node tree.
>

That specifically is IMO the "key" to a higher compiler-wide level of
optimization capabilities, as shown by various more recent compilers for
other languages and also by LLVM. Target-CPU-level optimizations are
certainly still very necessary for some things, but if you pass the
assembly code generator better information to begin with they're not nearly
as relevant. I've been looking over the compiler codebase recently and
there are quite a few things that could obviously be done better IMO at the
top level before any platform-specific stuff comes into play.

There's also a number of things that would specifically help the build-time
performance of the compiler itself that I've noticed, such as there being
many, many, many, one-liner functions and procedures that should almost
certainly be marked as inline but currently are not. Also linked lists
absolutely everywhere, that would perform much better as array based lists.
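
As a rough illustration of the structural difference between the two kinds of list, here is a small Python sketch. The `Node` class and helper names are made up for this example and have nothing to do with the compiler's own list classes. Python will not reproduce the cache effects that matter in a native compiler; the sketch only shows that reaching element n in a linked list takes n pointer hops, while an array-based list indexes it directly.

```python
from array import array


class Node:
    """One separately allocated cell of a singly linked list."""
    __slots__ = ("value", "next")

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def linked_from(values):
    """Build a singly linked list holding `values` in order."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def linked_get(head, index):
    """Return (value, pointer hops) for the element at `index`."""
    node, hops = head, 0
    while hops < index:
        node = node.next
        hops += 1
    return node.value, hops


values = list(range(1000))
as_array = array("i", values)     # one contiguous block of C ints
as_linked = linked_from(values)   # one node object per element

print(as_array[750])              # direct index into the block
print(linked_get(as_linked, 750)) # walks the chain from the head
```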

If the core team is open to arbitrary/speculative patches I might try to
work out a few for what I think are the most important issues and submit
them for consideration sometime in the near future.


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-27 Thread Sven Barth via fpc-devel
Ben Grasset wrote on Sun, 28 Oct 2018, 00:29:

> On Sat, Oct 27, 2018 at 1:38 PM Florian Klämpfl wrote:
>
>> That it is useful to work on table based exception handling for all
>> targets
>>
>
> Not arguing with that at all. I was just trying to point out that I'm not
> a fan of the idea that FPC's code generators are "good enough" as is.
>

And no one said that they are. But things like table based exception handling
and section based threadvars can be achieved relatively easily and benefit
more targets, while work on the optimizer is usually done per platform.
Except of course for optimizations that can be done on the platform
independent node tree.

Regards,
Sven



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-27 Thread Ben Grasset
On Sat, Oct 27, 2018 at 1:38 PM Florian Klämpfl wrote:

> That it is useful to work on table based exception handling for all targets
>

Not arguing with that at all. I was just trying to point out that I'm not a
fan of the idea that FPC's code generators are "good enough" as is.


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-27 Thread Florian Klämpfl
On 27.10.2018 at 19:19, Michael Van Canneyt wrote:
> 
> 
> On Sat, 27 Oct 2018, Florian Klämpfl wrote:
> 
>> If you read the whole thread, LLVM needs rewritten exception handling as
>> well. Further, a quick test of table based exception handling on bansi1
>> (which is mainly a memory manager test) gives:
>>
>> standard exception handling:
>>
>> fpctrunk\tests\bench>pp11 bansi1 -O3
>>
>> fpctrunk\tests\bench>bansi1
>> Test 1: 100 done in 0.537 sec
>> Test 2: 100 done in 0.535 sec
>> Test 3: 100 done in 0.587 sec
>>
>> SEH based exception handling:
>>
>> fpctrunk\tests\bench>pp11 bansi1 -O3
>>
>> fpctrunk\tests\bench>bansi1
>> Test 1: 100 done in 0.456 sec
>> Test 2: 100 done in 0.457 sec
>> Test 3: 100 done in 0.446 sec
> 
> Florian, I am not sure what this is supposed to prove?

That it is useful to work on table based exception handling for all targets ...

> 
> It's 15% off the elapsed time (almost 1/6th), that seems worth spending some 
> time on...


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-27 Thread Florian Klämpfl
On 27.10.2018 at 18:21, Ben Grasset wrote:
> On Sat, Oct 27, 2018 at 8:47 AM Jonas Maebe wrote:
>
>> On 27/10/18 05:45, Ben Grasset wrote:
>>
>> You also need "opt" if you want to perform full optimizations (or just
>> use clang, which among other things combines the functionality of llc
>> and opt).
>>
>> There's one more problem I forgot to mention in my first post, and it is
>> probably a deal breaker for the original bounty: LLVM does not support
>> Borland's fastcall calling convention for i386. So you would need to add
>> support for Borland fastcall on i386 to LLVM if it has to support
>> existing i386 inline assembly routines written for FPC/Delphi.
>>
>> Finally, adding support for 32 bit targets in FPC's LLVM backend would
>> also require some work due to how FPC's code generator is structured,
>> and due to the fact that you need to have two code generators in a single
>> binary (the native one to support the generation of entry and exit code
>> for pure inline assembler routines, and the LLVM one for the rest).
>
> LLC (at least now) statically links the necessary parts of LLVM and works
> independently of Opt, with a simpler set of command line options (it just
> has overall O1, O2, and O3 flags.)
>
> As for the point about assembly on 32 bit: while it does seem like that
> would be a problem for the bounty requirements, would it really be the end
> of the world in a more general sense? I can't imagine people who are still
> using 32-bit hardware and writing 32-bit applications would complain if the
> LLVM backend was not available for 32-bit.
>
> Anyways though, I do think code gen improvements for FPC, LLVM or not, are
> likely going to be a lot more widely helpful than just rewriting exception
> handling

If you read the whole thread, LLVM needs rewritten exception handling as
well. Further, a quick test of table based exception handling on bansi1
(which is mainly a memory manager test) gives:

standard exception handling:

fpctrunk\tests\bench>pp11 bansi1 -O3

fpctrunk\tests\bench>bansi1
Test 1: 100 done in 0.537 sec
Test 2: 100 done in 0.535 sec
Test 3: 100 done in 0.587 sec

SEH based exception handling:

fpctrunk\tests\bench>pp11 bansi1 -O3

fpctrunk\tests\bench>bansi1
Test 1: 100 done in 0.456 sec
Test 2: 100 done in 0.457 sec
Test 3: 100 done in 0.446 sec
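
For readers who want the relative gain, the timings above can be turned into percentages with a few lines of Python. This is only arithmetic on the six numbers quoted in this message; the variable and function names are made up for the example, and no new measurements are involved.

```python
standard = [0.537, 0.535, 0.587]   # seconds, standard exception handling
seh = [0.456, 0.457, 0.446]        # seconds, SEH based exception handling


def reduction_percent(before, after):
    """Percentage of elapsed time saved going from `before` to `after`."""
    return 100.0 * (before - after) / before


per_test = [reduction_percent(b, a) for b, a in zip(standard, seh)]
overall = reduction_percent(sum(standard), sum(seh))

for number, saved in enumerate(per_test, start=1):
    print(f"Test {number}: {saved:.1f}% less time")
print(f"All three tests together: {overall:.1f}% less time")
```

That works out to roughly 15% for each of the first two tests, about 24% for the third, and about 18% over the three runs combined.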



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-27 Thread Ben Grasset
On Sat, Oct 27, 2018 at 8:47 AM Jonas Maebe wrote:

> On 27/10/18 05:45, Ben Grasset wrote:



> You also need "opt" if you want to perform full optimizations (or just
> use clang, which among other things combines the functionality of llc
> and opt).
>
> There's one more problem I forgot to mention in my first post, and it is
> probably a deal breaker for the original bounty: LLVM does not support
> Borland's fastcall calling convention for i386. So you would need to add
> support for Borland fastcall on i386 to LLVM if it has to support
> existing i386 inline assembly routines written for FPC/Delphi.
>
> Finally, adding support for 32 bit targets in FPC's LLVM backend would
> also require some work due to how FPC's code generator is structured,
> and due to the fact that you need to have two code generators in a single
> binary (the native one to support the generation of entry and exit code
> for pure inline assembler routines, and the LLVM one for the rest).
>

LLC (at least now) statically links the necessary parts of LLVM and works
independently of Opt, with a simpler set of command line options (it just
has overall O1, O2, and O3 flags.)

As for the point about assembly on 32 bit: while it does seem like that
would be a problem for the bounty requirements, would it really be the end
of the world in a more general sense? I can't imagine people who are still
using 32-bit hardware and writing 32-bit applications would complain if the
LLVM backend was not available for 32-bit.

Anyways though, I do think code gen improvements for FPC, LLVM or not, are
likely going to be a lot more widely helpful than just rewriting exception
handling (not that rewriting exception handling is a bad idea.) I think
there are a lot of people who would like FPC to generate faster code than it
currently does. Can you recommend any known areas in need of improvement in
the non-platform-specific parts of the code generators that might be a good
place to start for someone who's an experienced Pascal developer but hasn't
worked with the compiler codebase before?


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-27 Thread Jonas Maebe

On 27/10/18 05:45, Ben Grasset wrote:

> As far as dependencies, it would add none whatsoever other than a copy
> of the LLC or LLVM-AS binaries (as in, no more than any other target FPC
> supports. Just think of it as yet another assembler format.)


You also need "opt" if you want to perform full optimizations (or just
use clang, which among other things combines the functionality of llc
and opt).


There's one more problem I forgot to mention in my first post, and it is 
probably a deal breaker for the original bounty: LLVM does not support 
Borland's fastcall calling convention for i386. So you would need to add 
support for Borland fastcall on i386 to LLVM if it has to support 
existing i386 inline assembly routines written for FPC/Delphi.


Finally, adding support for 32 bit targets in FPC's LLVM backend would
also require some work due to how FPC's code generator is structured,
and due to the fact that you need to have two code generators in a single
binary (the native one to support the generation of entry and exit code
for pure inline assembler routines, and the LLVM one for the rest).



Jonas


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-27 Thread Martin Schreiber
On Saturday 27 October 2018 09:27:59 Sven Barth via fpc-devel wrote:
> >
> > Not really. The IR format has been pretty stable since version 3.9 or so
> > (LLVM is currently at version 8.) As far as dependencies, it would add none
> > whatsoever other than a copy of the LLC or LLVM-AS binaries (as in, no
> > more than any other target FPC supports. Just think of it as yet another
> > assembler format.)
>
> It's more than just an additional assembler format as the infrastructure
> inside the compiler shows. Also there are the problems that Jonas
> mentioned.
> In my opinion that time is better spent optimizing our own code generator.
>
MSElang takes the approach of writing LLVM bitcode directly, without a
temporary LLVM assembler text. Building the needed LLVM lists and tracking
the SSA values is not trivial. IMO the worst aspect of LLVM is its slowness,
but the resulting code is awesome.

Martin


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-27 Thread Sven Barth via fpc-devel
Ben Grasset wrote on Sat, 27 Oct 2018, 05:46:

> On Thu, Oct 25, 2018 at 3:06 AM Sven Barth via fpc-devel <
> fpc-devel@lists.freepascal.org> wrote:
>
>> Simon Kissel wrote on Thu, 25 Oct 2018, 08:54:
>>
>>> - Complete the LLVM branch of FPC. It looks like Jonas has stopped
>>>   working on it two years ago, which is a pity.
>>>
>>
>> I personally don't think that LLVM is the way to go. It's essentially a
>> moving target and adds an unnecessary dependency to the compiler.
>>
>
> Not really. The IR format has been pretty stable since version 3.9 or so
> (LLVM is currently at version 8.) As far as dependencies, it would add none
> whatsoever other than a copy of the LLC or LLVM-AS binaries (as in, no more
> than any other target FPC supports. Just think of it as yet another
> assembler format.)
>

It's more than just an additional assembler format as the infrastructure
inside the compiler shows. Also there are the problems that Jonas
mentioned.
In my opinion that time is better spent optimizing our own code generator.

Regards,
Sven
