Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-12-04 Thread Simon Kissel
Hi Florian,

> Do you compile with -Aas? The internal assemblers do not support TLS yet, 
> this is WIP.

Ah wow! -Aas does indeed help. Both the assembler errors and
the internal error are gone, both in Linux i386 and ARM.

And the created binaries even work. Nice! Thank you!

Cheers,

Simon

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-12-04 Thread Florian Klämpfl
On 04.12.2018 at 02:16, Simon Kissel wrote:
> Hi Florian,
>
>
> 
> we are currently trying to do some real-life benchmarks with our
> products; however, with rev. 40346 compilation fails with the following two
> showstoppers:

Do you compile with -Aas? The internal assemblers do not support TLS yet, this 
is WIP.


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-12-03 Thread Simon Kissel
Hi Florian,

we are currently trying to do some real-life benchmarks with our
products; however, with rev. 40346 compilation fails with the following two
showstoppers:

1.)

The assembler parser appears to be broken - the following very valid
opcodes get rejected:

SBMath.pas(1932,9) Error: Asm: [cmp imm32,imm8s] invalid combination of opcode and operands
SBMath.pas(1934,5) Error: Asm: [lea reg32,imm32] invalid combination of opcode and operands
SBMath.pas(1939,9) Error: Asm: [cmp imm32,imm8s] invalid combination of opcode and operands
SBMath.pas(1941,5) Error: Asm: [lea reg32,imm32] invalid combination of opcode and operands
SBMath.pas(1946,9) Error: Asm: [cmp imm32,imm8s] invalid combination of opcode and operands
SBMath.pas(1948,5) Error: Asm: [lea reg32,imm32] invalid combination of opcode and operands
SBMath.pas(1953,3) Error: Asm: [lea reg32,imm32] invalid combination of opcode and operands
SBMath.pas(1954,3) Error: Asm: [lea reg32,imm32] invalid combination of opcode and operands
SBMath.pas(1955,3) Error: Asm: [lea reg32,imm32] invalid combination of opcode and operands
SBMath.pas(1972,3) Error: Asm: [lea reg32,imm32] invalid combination of opcode and operands
SBMath.pas(1976,5) Error: Asm: [lea reg32,imm32] invalid combination of opcode and operands
SBMath.pas(1981,5) Error: Asm: [lea reg32,imm32] invalid combination of opcode and operands
SBMath.pas(1982,5) Error: Asm: [lea reg32,imm32] invalid combination of opcode and operands

(-Tlinux -XPi386-linux- -CpPENTIUMM -O2 -OoCSE -CfSSE2 -Ooorderfields)

2.)

On ARM, I get Internal error 200603253 at various places:

SBMath.pas(1989,1) Fatal: Internal error 200603253
(sadly the line numbers are completely off for unknown reasons, so I
cannot find the actual source line causing this)

It also happens at various other places. It is easiest to reproduce by
compiling PasZLib-SG (e.g. https://github.com/Soldat/PasZlib-SG).


Any clues?

BR,

Simon



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-27 Thread Simon Kissel
Hi guys,

that platform is not relevant for us, but to provide some motivational
boost:

CrossFPC 4.14 beta Win64:
C:\Users\BeRo\Documents\Projects\Tests\threadingtest0\aa>vipribenchmemcache_nodeps
VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0, 
NumberOfChannels=6, BufferPackets=5000, NumberOfSynchroThreads=4
...
Time: 5021ms = 9460267 pkts/s = 14363 MB/s

vs. Delphi 10.3 Win64:
C:\Users\BeRo\Documents\Projects\Tests\threadingtest0\aa>vipribenchmemcache_nodeps
VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0, 
NumberOfChannels=6, BufferPackets=5000, NumberOfSynchroThreads=4
..
Time: 5086ms = 4915454 pkts/s = 7462 MB/s

:)
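
For reference, the speed-up implied by the two summary lines above can be
computed directly from the harness output. The sketch below is illustrative
only: the helper name `parse_summary` and the regular expression are
assumptions based on the `Time: ...ms = ... pkts/s = ... MB/s` format shown
here, not part of the benchmark itself.

```python
import re

# Assumed format of the harness summary line, e.g.
#   "Time: 5021ms = 9460267 pkts/s = 14363 MB/s"
SUMMARY_RE = re.compile(
    r"Time:\s*(\d+)ms\s*=\s*(\d+)\s*pkts/s\s*=\s*(\d+)\s*MB/s"
)


def parse_summary(line):
    """Return (elapsed_ms, pkts_per_s, mb_per_s) from a summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("not a benchmark summary line: %r" % line)
    return tuple(int(group) for group in match.groups())


crossfpc = parse_summary("Time: 5021ms = 9460267 pkts/s = 14363 MB/s")
delphi = parse_summary("Time: 5086ms = 4915454 pkts/s = 7462 MB/s")

# Compare packet rates rather than elapsed times: every run lasts about 5 s.
speedup = crossfpc[1] / delphi[1]
print("CrossFPC / Delphi packet rate: %.2fx" % speedup)
```

On the figures quoted above this comes out at roughly 1.92x.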

Best regards,

Simon



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-25 Thread Jonas Maebe

On 27/10/18 18:21, Ben Grasset wrote:
> LLC (at least now) statically links the necessary parts of LLVM and
> works independently of Opt, with a simpler set of command line options
> (it just has overall O1, O2, and O3 flags.)


Are you certain llc now incorporates the functionality of opt? From what 
I can tell, llc still only performs codegen optimisations and no complex 
IR transformations. It has always had the -O1/-O2/-O3 flags, but those 
have only ever affected the codegen.


All information I can find via Google also suggests you need to use 
either clang or both opt and llc to get everything (e.g. 
https://lists.llvm.org/pipermail/llvm-dev/2018-January/120226.html ).
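
To make the distinction concrete, here is a minimal sketch of the two-stage
pipeline described above, with `opt` running the IR-level optimisations and
`llc` doing code generation. It only assembles the command lines; the file
names are placeholders, and the exact options accepted by `opt` and `llc`
depend on the LLVM version, so treat the flags as assumptions to check
against your installation.

```python
def llvm_pipeline(ir_file, obj_file, level=3):
    """Build the opt and llc command lines for one IR file.

    Returns a list of argv lists; nothing is executed here.
    """
    opt_level = "-O%d" % level
    optimised = ir_file + ".opt.bc"  # placeholder name for the intermediate bitcode
    return [
        # Stage 1: IR-to-IR transformations.
        ["opt", opt_level, ir_file, "-o", optimised],
        # Stage 2: code generation; the -O level here only tunes the backend.
        ["llc", opt_level, "-filetype=obj", optimised, "-o", obj_file],
    ]


for argv in llvm_pipeline("unit.ll", "unit.o"):
    print(" ".join(argv))
```

Each list could be handed to `subprocess.run(argv, check=True)`; passing the
IR file to clang instead would cover both stages in one step.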



Jonas


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-25 Thread Florian Klämpfl
On 23.11.2018 at 21:07, Simon Kissel wrote:
> problem is distributed all across the code. However, there
> is something sticking out, being at the very top of pretty
> much all multi-threaded code we compile:
> 
> fpc_pushexceptaddr & CRelocateThreadVar.
> 

The benchmark, however, does not reflect this.


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-24 Thread Tomas Hajny
On Sat, November 24, 2018 13:43, Adriaan van Os wrote:
> Simon Kissel wrote:


Hello all,

> In case you are just trolling, I recommend reading
> a book on programming, learning to write better code.

Could we stop this, please? This is neither on topic nor very polite,
especially after Simon explained that he has already spent effort on
improving his code, and also referenced a comparison with another
compiler / RTL doing a better job than FPC in that particular area.
Simon's original sentence about FPC's weakness might have been somewhat
sharper than necessary, but these follow-ups are useless. If you,
Adriaan, believe you have discovered an inefficiency in code posted by
Simon, feel free to point it out explicitly.

Thanks

Tomas
(one of FPC mailing list moderators)




Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-24 Thread Adriaan van Os

Simon Kissel wrote:

> Hi Adriaan,
>
> In case you aren't just trolling and the subject really is of
> interest to you, I would recommend reading the discussion
> thread in full. That works much better than treating this
> like a write-only system.


In case you are just trolling, I recommend reading a book on programming,
learning to write better code.


Adriaan van Os



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-23 Thread Florian Klämpfl
On 23.11.2018 at 21:07, Simon Kissel wrote:
> own code, it won't get much better than what it is today,
> and that Kylix producing faster code does not compensate for it

Well, to be fair, there is a lot of code out there where FPC is faster. 
Nevertheless, FPC's code can still be improved.


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-23 Thread Simon Kissel
Hi Adriaan,

In case you aren't just trolling and the subject really is of
interest to you, I would recommend reading the discussion
thread in full. That works much better than treating this
like a write-only system.

> You didn't answer any of my questions. The goal is to get the
> code faster, isn't it.

No, the goal is not to get any specific code faster. The goal
is to have the compiler and/or RTL improved so that all code
compiled benefits, and that execution speed in general gets on
par with the 15-year-old Kylix/Delphi 7 compilers.

And yes, of course we have been profiling our code for years, and we
know what we are doing and talking about. Our code sadly does
not have any bottlenecks in the sense of a small number of
functions eating most of the CPU, the load is pretty evenly
distributed across all of the functions. This means that the
problem is distributed all across the code. However, there
is something sticking out, being at the very top of pretty
much all multi-threaded code we compile:

fpc_pushexceptaddr & CRelocateThreadVar.

Besides this, not everything can be uncovered by profiling,
and that part is nothing that FPC can change: On one of
the ARM platforms we use every context switch results in a
CPU cache flush, so simply by having more threads *all* of
them will become slower.

The benchmark code, like our real-life code, is able to utilize
~99% of the CPU, so no, it's also not a matter of thread
synchronization (we aren't spinlocking).

The commercial reason behind putting out a 15k bounty is that
no matter how much more money I invest into optimizing my
own code, it won't get much better than what it is today,
and that Kylix producing faster code does not compensate for it
not supporting any of the nice-to-have language features that
FPC has today.

Simon






Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-23 Thread Simon Kissel
Hi Florian,

> Actually, most of the improvements so far are not related to
> threading. In particular r40339 helped a lot, it was a bug
> fix: the compiler assumed that a certain sub expression was written
> while it was not and this prevented CSE.

Even better, that means there is still gold to be uncovered :)

In our case the bottleneck very clearly appears to be that
every call to fpc_pushexceptaddr/fpc_popaddrstack causes a
call to CRelocateThreadVar, which causes a call to
pthread_getspecific.

We do create our ARM production builds with {$IMPLICITEXCEPTIONS OFF}
to get acceptable speed, else it would be completely unbearable.

BR,

Simon



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-23 Thread Florian Klämpfl
On 23.11.2018 at 14:36, Simon Kissel wrote:
> Hi Adriaan,
> 
>> I find the phrase. "FPC's terrible multi-threading performance"
>> unjust.
> 
> Well, see the complete thread to better understand what this
> is about, and what progress is being made. So far a 20%
> improvement has been made, which kinda is like a proof that
> there was something to improve ;)
> 
>>  When I do multi-threading
>> with FPC, I get a near N speed improvement (on i386 and x86_64) where N is 
>> the number of cores,
>> including hyper-threaded cores 
> 
> This isn't about FPC's code not scaling with N cores, it does.
> It is about it being slow as soon as threads are used *at all*,
> due to TLS stuff and exception handling. It's slow in a linear
> fashion, so to say...
> 

Actually, most of the improvements so far are not related to threading. In
particular r40339 helped a lot. It was a bug fix: the compiler assumed that a
certain subexpression was written while it was not, and this prevented CSE.


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-23 Thread Adriaan van Os

Simon Kissel wrote:


> This isn't about FPC's code not scaling with N cores, it does.
> It is about it being slow as soon as threads are used *at all*,


N cores being near N times faster than "not using threads at all".


> due to TLS stuff and exception handling. It's slow in a linear
> fashion, so to say...


You didn't answer any of my questions. The goal is to get the code faster,
isn't it? Or are you writing an academic thesis on compilers?


Regards,

Adriaan van Os


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-23 Thread Simon Kissel
Hi Adriaan,

> I find the phrase "FPC's terrible multi-threading performance"
> unjust.

Well, see the complete thread to better understand what this
is about, and what progress is being made. So far a 20%
improvement has been made, which kinda is like a proof that
there was something to improve ;)

> When I do multi-threading with FPC, I get a near N speed improvement
> (on i386 and x86_64) where N is the number of cores, including
> hyper-threaded cores

This isn't about FPC's code not scaling with N cores, it does.
It is about it being slow as soon as threads are used *at all*,
due to TLS stuff and exception handling. It's slow in a linear
fashion, so to say...
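
A toy model may help to show why both observations can hold at once. Assume
each packet costs a fixed amount of useful work plus a fixed per-call
overhead (TLS lookup, exception frame setup). The numbers below are invented
purely for illustration; they are not measurements.

```python
def packets_per_second(cores, work_ns, overhead_ns):
    """Throughput of `cores` independent workers, each spending
    work_ns + overhead_ns nanoseconds per packet."""
    return cores * 1e9 / (work_ns + overhead_ns)


WORK_NS = 100.0      # hypothetical useful work per packet
OVERHEAD_NS = 40.0   # hypothetical TLS / exception-frame cost per packet

for cores in (1, 2, 4):
    lean = packets_per_second(cores, WORK_NS, 0.0)
    heavy = packets_per_second(cores, WORK_NS, OVERHEAD_NS)
    # Both columns grow in proportion to the core count, while the
    # ratio between them stays the same at every core count.
    print(cores, round(lean), round(heavy), round(lean / heavy, 2))
```

In this model throughput still grows N-fold with N cores, yet every
configuration is 1.4 times slower than it could be, which is what "slow in
a linear fashion" is meant to describe.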

Best regards,

Simon



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-23 Thread Sven Barth via fpc-devel
On Fri, 23 Nov 2018, 12:15, Adriaan van Os wrote:

> Simon Kissel wrote:
>
> > We know about a couple of bottlenecks (fpc_pushexceptaddr /
> > RelocateThreadVar etc) which explain FPC's terrible multi-threading
> > performance, but in general, FPC's code generator really is quite
> > a mess, which we learned the hard way a couple of years ago when we
> > did optimization work on the ARM target.
>
> I find the phrase "FPC's terrible multi-threading performance" unjust.
> When I do multi-threading with FPC, I get a near N speed improvement
> (on i386 and x86_64) where N is the number of cores, including
> hyper-threaded cores.
>
> What about taking another way, having a precise look at the source code?
> Did you profile it? What sort of work does the code do? How are the
> threads synchronized? What data structures are used?
>
> I don't take "the compiler is so bad" without an answer to these questions.
>

Simon wrote that the same code performs better when compiled with Kylix, so
there definitely are things that FPC can do better, and Florian's work on
TLS variables showed that such changes indeed *do* make FPC perform better.
I expect a similar improvement from DWARF exceptions, as the setjmp/longjmp
based approach *is* more expensive when no exception occurs than marking
protected code in the metadata, as DWARF and SEH64 do.
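
The trade-off can be put into a rough formula. In the sketch below, a
setjmp-style scheme pays a setup cost on every entry to a protected block,
while a table-based scheme pays nothing until an exception is raised but has
a more expensive unwind. All cost figures are hypothetical placeholders
chosen only to show the shape of the trade-off, not measurements of FPC.

```python
def setjmp_cost(entries, raises, setup_ns, unwind_ns):
    """Total cost when every protected block registers a frame up front."""
    return entries * setup_ns + raises * unwind_ns


def table_cost(raises, unwind_ns):
    """Total cost when protected ranges live in metadata (DWARF/SEH style)."""
    return raises * unwind_ns


def break_even_raise_rate(setup_ns, sj_unwind_ns, table_unwind_ns):
    """Fraction of entries that must raise before the setjmp style wins."""
    if table_unwind_ns <= sj_unwind_ns:
        raise ValueError("table-based unwinding is never slower here")
    return setup_ns / (table_unwind_ns - sj_unwind_ns)


# Hypothetical costs in nanoseconds, for illustration only.
SETUP, SJ_UNWIND, TABLE_UNWIND = 50.0, 500.0, 5000.0

entries = 1000000
print(setjmp_cost(entries, 0, SETUP, SJ_UNWIND))  # no exception raised at all
print(table_cost(0, TABLE_UNWIND))
print(break_even_raise_rate(SETUP, SJ_UNWIND, TABLE_UNWIND))
```

With these placeholder numbers roughly one entry in ninety would have to
raise before the up-front registration pays off; code that rarely raises
only sees the per-entry cost.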

Regards,
Sven



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-23 Thread Adriaan van Os

Simon Kissel wrote:


> We know about a couple of bottlenecks (fpc_pushexceptaddr /
> RelocateThreadVar etc) which explain FPC's terrible multi-threading
> performance, but in general, FPC's code generator really is quite
> a mess, which we learned the hard way a couple of years ago when we
> did optimization work on the ARM target.


I find the phrase "FPC's terrible multi-threading performance" unjust. When I
do multi-threading with FPC, I get a near N speed improvement (on i386 and
x86_64) where N is the number of cores, including hyper-threaded cores.


What about taking another way, having a precise look at the source code? Did
you profile it? What sort of work does the code do? How are the threads
synchronized? What data structures are used?


I don't take "the compiler is so bad" without an answer to these questions.

Regards,

Adriaan van Os



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-20 Thread Simon Kissel
Hi Florian,

> The changes also help on arm, and arm can be built using the same
> command line, however, at least on a Raspi3B+ the
> improvement is less significant than on i386 (still the old cache
> flush (?) issue which is outside of the scope of FPC?).

Actually the changes are significant:

Before:

01-00512-00-00016:/opt/viprinet/bin # ./vipribenchmemcache_nodeps_crossfpc
VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0, 
NumberOfChannels=6, BufferPackets=5000, NumberOfSynchroThreads=4
...
Time: 5212ms = 287797 pkts/s = 430 MB/s

After:

01-00512-00-00016:/opt/viprinet/bin # ./vipribenchmemcache_nodeps_armv5te_fpc
VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0, 
NumberOfChannels=6, BufferPackets=5000, NumberOfSynchroThreads=4

Time: 5893ms = 339386 pkts/s = 507 MB/s

BR,

Simon

-- 
Nerdherrschaft GmbH
Mainzer Str. 40
55411 Bingen am Rhein
Germany

Phone:+49-6721-9492994
Fax:  +49-6721-9492996

simon.kis...@nerdherrschaft.com
http://www.nerdherrschaft.com

Registered office/Sitz der Gesellschaft: Bingen am Rhein, Germany
CEO/Geschäftsführer: Simon Kissel
Commercial register/Handelsregister: Amtsgericht Mainz HRB43337



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-18 Thread Simon Kissel
Hi Florian,

> Compile the benchmark with (where fpcnew is the newly built fpc):

Bero has confirmed, works for us as well. This rocks!

> The changes also help on arm, and arm can be built using the same
> command line, however, at least on a Raspi3B+ the
> improvement is less significant than on i386 (still the old cache
> flush (?) issue which is outside of the scope of FPC?).

We'll try that next. And yes, on the bloody Kirkwood CPU which we use
a context switch will result in a CPU cache flush.

Cheers,

Simon



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-18 Thread Florian Klämpfl
On 17.11.2018 at 22:28, Florian Klämpfl wrote:
> On 17.11.2018 at 22:10, Simon Kissel wrote:
>> Hi Florian,
>>
>>> With some compiler tuning and a few tricks (two changes to the code
>>> and hand-simulated peephole optimizations, but I
>>> think the compiler can also do these tricks):
>>
>> Nice - what changes did you do?
>>
>> Changing the code of course is cheating, but there might be something
>> to learn for us, here.
> 
> I prevented the compiler from putting certain variables in registers by
> taking their address :) But I did so only to test whether this helps, and
> for i386 it does, as the decision which variables go into registers is not
> that easy, but see below.
> 
>>
>> Would be great if whatever trick you did could be part of the
>> compiler.
> 
> Meanwhile the compiler can do it (not yet committed). Same VM as yesterday,
> all rates are a little bit lower, not sure why (probably too many VMs
> open :)), but this applies to all three executables.
> 
> florian@ubuntu32:~$ ./vipribenchmemcache_nodeps

With rev. 40346 I have committed my last changes. As the code is still
experimental, it needs to be activated on the command line when building FPC:

make clean all "OPT=-Aas -dtls_threadvars -O4 -dSPILLING_NEW"

(add -Cp... -Op... options if the target system is known)

Compile the benchmark with (where fpcnew is the newly built fpc):

fpcnew -O4 -Sd -FWvipri.wpo -OWDEVIRTCALLS,OPTVMTS vipribenchmemcache_nodeps.dpr
fpcnew -O4 -Sd -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS vipribenchmemcache_nodeps.dpr
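
The two invocations above form a two-pass whole-program optimisation build:
the first writes the feedback file, the second reads it. A small driver
could generate both command lines as sketched below. `fpcnew` and
`vipri.wpo` are the names used above, while the helper itself is a
hypothetical convenience and does not run the compiler.

```python
def wpo_build_commands(compiler, source, feedback="vipri.wpo",
                       opts=("DEVIRTCALLS", "OPTVMTS")):
    """Return the two fpc command lines for a WPO build as argv lists."""
    common = [compiler, "-O4", "-Sd"]
    wanted = ",".join(opts)
    return [
        # Pass 1: -FW stores the feedback file, -OW selects what to collect.
        common + ["-FW" + feedback, "-OW" + wanted, source],
        # Pass 2: -Fw loads the feedback file, -Ow applies the optimisations.
        common + ["-Fw" + feedback, "-Ow" + wanted, source],
    ]


for argv in wpo_build_commands("fpcnew", "vipribenchmemcache_nodeps.dpr"):
    print(" ".join(argv))
```

The order matters: the second pass can only use feedback that the first
pass has already written.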

The changes also help on arm, and arm can be built using the same command line.
However, at least on a Raspi3B+ the improvement is less significant than on i386
(still the old cache flush (?) issue which is outside of the scope of FPC?).


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-17 Thread Jonas Maebe

On 17/11/18 22:15, Simon Kissel wrote:


> How far off is that? Sadly we'll have to support some 32 bit
> platforms for a couple more years...


I really don't know. It's not something I have looked into, but I'm 
afraid it will be messy.



> And how far away is getting this to run on Linux?


Getting it to work on Linux/x86-64 should be fairly easy. Other 64 bit 
platforms (both architectures and OSes) should not be difficult either.



> And: Any language features or RTL stuff that does not yet work
> with FPC/LLVM?


Only the ones mentioned before:
* global variables are currently not treated as volatile by the LLVM 
code generator, so if you use them to share values between threads with 
explicit synchronisation, that will fail (as Sven explained)


* hardware exceptions (like segmentation faults, fpu exceptions and bus 
errors) because LLVM does not model them. I could try to work around 
this by making all accesses to all variables potentially referenced in 
try/except blocks "volatile" (both in the blocks and afterwards), but 
that would prevent many optimizations and it would not even guarantee to 
solve all potential problems (since the LLVM code generator would still 
assume that if those instructions trap, all behaviour afterwards is 
undefined and hence it can optimize as if those instructions will never 
trap; marking them as volatile won't change that: even if in many cases 
the end result may be the same, it's not guaranteed).


  The only work done on LLVM in this regard is some experimental 
support for FPU exceptions in recent versions 
(https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics), 
but I have not added support for that yet (nor do I know how well it 
works, or on which platforms it is supported).



> Bonus question: I don't know on which layer threads and exceptions are
> handled with LLVM - will you be able to make use of the improvements to
> TLS and Exception handling, in other words, can we combine the best
> of both worlds?


The only improvements to exception handling until now have been for the 
LLVM target. The code I already submitted is generic though, and can be 
used by non-LLVM targets as well.


TLS-based threadvar support needs to be implemented separately for LLVM, 
but that should be fairly easy (it's just another way of declaring the 
variable in the LLVM IR).



Jonas


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-17 Thread Florian Klämpfl
On 17.11.2018 at 22:10, Simon Kissel wrote:
> Hi Florian,
> 
>> With some compiler tuning and a few tricks (two changes to the code
>> and hand-simulated peephole optimizations, but I
>> think the compiler can also do these tricks):
> 
> Nice - what changes did you do?
> 
> Changing the code of course is cheating, but there might be something
> to learn for us, here.

I prevented the compiler from putting certain variables in registers by taking
their address :) But I did so only to test whether this helps, and for i386 it
does, as the decision which variables go into registers is not that easy, but
see below.

> 
> Would be great if whatever trick you did could be part of the
> compiler.

Meanwhile the compiler can do it (not yet committed). Same VM as yesterday, all
rates are a little bit lower, not sure why (probably too many VMs open :)), but
this applies to all three executables.

florian@ubuntu32:~$ ./vipribenchmemcache_nodeps
VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0, 
NumberOfChannels=6, BufferPackets=5000,
NumberOfSynchroThreads=4
...
Time: 5022ms = 8661888 pkts/s = 12952 MB/s
florian@ubuntu32:~$ ./vipribenchmemcache_nodeps_kylix
VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0, 
NumberOfChannels=6, BufferPackets=5000,
NumberOfSynchroThreads=4
..
Time: 5040ms = 8531746 pkts/s = 12758 MB/s
florian@ubuntu32:~$ ./vipribenchmemcache_nodeps_fpc
VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0, 
NumberOfChannels=6, BufferPackets=5000,
NumberOfSynchroThreads=4
.
Time: 5058ms = 6030051 pkts/s = 9017 MB/s
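
Normalising the three packet rates above to the Kylix build makes the
comparison easier to read. The interpretation of the names is an assumption
based on the executable names (the unsuffixed binary taken to be the build
from the modified compiler, `_fpc` the unmodified FPC build); the message
does not spell this out.

```python
# Packet rates (pkts/s) copied from the three runs above.
rates = {
    "vipribenchmemcache_nodeps": 8661888,
    "vipribenchmemcache_nodeps_kylix": 8531746,
    "vipribenchmemcache_nodeps_fpc": 6030051,
}


def relative_to(baseline, results):
    """Map each name to its rate as a percentage of the baseline's rate."""
    base = results[baseline]
    return {name: 100.0 * rate / base for name, rate in results.items()}


table = relative_to("vipribenchmemcache_nodeps_kylix", rates)
for name, percent in sorted(table.items(), key=lambda item: -item[1]):
    print("%-34s %6.1f%%" % (name, percent))
```

On these figures the first binary is about 1.5% ahead of Kylix, and the
`_fpc` binary reaches roughly 71% of the Kylix rate.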


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-17 Thread Simon Kissel
Hi Jonas,

Nice results!

> Since I only have a preliminary llvm version (with Dwarf EH) running on
> macOS, I can't provide a direct Kylix comparison. The versions below are
> both x86-64. As mentioned before, a 32 bit FPC/LLVM is still quite a way
> off.

How far off is that? Sadly we'll have to support some 32 bit
platforms for a couple more years...

And how far away is getting this to run on Linux?

And: Any language features or RTL stuff that does not yet work
with FPC/LLVM?

Bonus question: I don't know on which layer threads and exceptions are
handled with LLVM - will you be able to make use of the improvements to
TLS and Exception handling, in other words, can we combine the best
of both worlds?

BR,

Simon



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-17 Thread Simon Kissel
Hi Florian,

> With some compiler tuning and a few tricks (two changes to the code
> and hand-simulated peephole optimizations, but I
> think the compiler can also do these tricks):

Nice - what changes did you do?

Changing the code of course is cheating, but there might be something
to learn for us, here.

Would be great if whatever trick you did could be part of the
compiler.

Cheers,

Simon



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-17 Thread Jonas Maebe

On 16/11/18 23:41, Florian Klämpfl wrote:

> diff --git a/compiler/nmem.pas b/compiler/nmem.pas
> index d5c1d85e8f..52add1fd81 100644
> --- a/compiler/nmem.pas
> +++ b/compiler/nmem.pas
> @@ -1176,7 +1176,7 @@ implementation
> begin
>   include(flags,nf_write);
>   { see comment in tsubscriptnode.mark_write }
> -if not(is_implicit_pointer_object_type(left.resultdef)) then
> +if not(is_implicit_array_pointer(left.resultdef)) then
> left.mark_write;
> end;


The compiler crashes when I try to compile the program with that patch 
applied (I did not do a make cycle with that patch, just applied it, 
recompiled the compiler, and then tried to compile the test program with 
the new compiler).



Jonas


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-16 Thread Florian Klämpfl
On 16.11.2018 at 23:41, Florian Klämpfl wrote:
> On 16.11.2018 at 23:36, Jonas Maebe wrote:
>> On 16/11/18 22:44, Florian Klämpfl wrote:
>>> With some compiler tuning and a few tricks (two changes to the code and 
>>> hand-simulated peephole optimizations, but I
>>> think the compiler can also do these tricks):
>>
>> You can improve performance further by devirtualising all method calls using 
>> wpo. First compile it with -FWvipri.wpo
>> -OWDEVIRTCALLS,OPTVMTS and next with -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS (at 
>> least on my machine it gives a small boost,
>> and makes the results also more stable).
>>
>> Since I only have a preliminary llvm version (with Dwarf EH) running on 
>> macOS, I can't provide a direct Kylix
>> comparison. The versions below are both x86-64. As mentioned before, a 32 
>> bit FPC/LLVM is still quite a way off.
>>
>> * FPC 3.0.4 -MDelphi -O2 -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS:
>>
>> $ time ./vipribenchmemcache_nodeps
>> VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0, 
>> NumberOfChannels=6, BufferPackets=5000,
>> NumberOfSynchroThreads=4
>> .
>> Time: 5016ms = 9669059 pkts/s = 14680 MB/s
>>
>> real    0m5.137s
>> user    0m5.042s
>> sys    0m0.017s
>>
>> FPC 3.3.1 + llvm (clang from Xcode 10.1 with -O3 on FPC-generated llvm IR) 
>> and -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS (no
>> LLVM link-time optimization):
>>
>> $ time ./vipribenchmemcache_nodeps_llvm
>> VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0, 
>> NumberOfChannels=6, BufferPackets=5000,
>> NumberOfSynchroThreads=4
>> .
>> Time: 5018ms = 11259466 pkts/s = 17094 MB/s
>>
>> real    0m5.161s
>> user    0m5.060s
>> sys    0m0.017s
>>
> 
> Can you test with FPC 3.1.1 native, -O4 and the following patch:
> 
>  compiler/nmem.pas | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/compiler/nmem.pas b/compiler/nmem.pas
> index d5c1d85e8f..52add1fd81 100644
> --- a/compiler/nmem.pas
> +++ b/compiler/nmem.pas
> @@ -1176,7 +1176,7 @@ implementation
>begin
>  include(flags,nf_write);
>  { see comment in tsubscriptnode.mark_write }
> -if not(is_implicit_pointer_object_type(left.resultdef)) then
> +if not(is_implicit_array_pointer(left.resultdef)) then
>left.mark_write;
>end;
> 
> ?

Hmmm, it needs a few more of my changes to make it work, though it should work
if used only with the benchmark.



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-16 Thread Florian Klämpfl
On 16.11.2018 at 23:36, Jonas Maebe wrote:
> On 16/11/18 22:44, Florian Klämpfl wrote:
>> With some compiler tuning and a few tricks (two changes to the code and 
>> hand-simulated peephole optimizations, but I
>> think the compiler can also do these tricks):
> 
> You can improve performance further by devirtualising all method calls using 
> wpo. First compile it with -FWvipri.wpo
> -OWDEVIRTCALLS,OPTVMTS and next with -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS (at 
> least on my machine it gives a small boost,
> and makes the results also more stable).
> 
> Since I only have a preliminary llvm version (with Dwarf EH) running on 
> macOS, I can't provide a direct Kylix
> comparison. The versions below are both x86-64. As mentioned before, a 32 bit 
> FPC/LLVM is still quite a way off.
> 
> * FPC 3.0.4 -MDelphi -O2 -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS:
> 
> $ time ./vipribenchmemcache_nodeps
> VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0, 
> NumberOfChannels=6, BufferPackets=5000,
> NumberOfSynchroThreads=4
> .
> Time: 5016ms = 9669059 pkts/s = 14680 MB/s
> 
> real    0m5.137s
> user    0m5.042s
> sys    0m0.017s
> 
> FPC 3.3.1 + llvm (clang from Xcode 10.1 with -O3 on FPC-generated llvm IR) 
> and -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS (no
> LLVM link-time optimization):
> 
> $ time ./vipribenchmemcache_nodeps_llvm
> VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0, 
> NumberOfChannels=6, BufferPackets=5000,
> NumberOfSynchroThreads=4
> .
> Time: 5018ms = 11259466 pkts/s = 17094 MB/s
> 
> real    0m5.161s
> user    0m5.060s
> sys    0m0.017s
> 

Can you test with FPC 3.1.1 native, -O4 and the following patch:

 compiler/nmem.pas | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/compiler/nmem.pas b/compiler/nmem.pas
index d5c1d85e8f..52add1fd81 100644
--- a/compiler/nmem.pas
+++ b/compiler/nmem.pas
@@ -1176,7 +1176,7 @@ implementation
   begin
 include(flags,nf_write);
 { see comment in tsubscriptnode.mark_write }
-if not(is_implicit_pointer_object_type(left.resultdef)) then
+if not(is_implicit_array_pointer(left.resultdef)) then
   left.mark_write;
   end;

?


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-16 Thread Jonas Maebe

On 16/11/18 22:44, Florian Klämpfl wrote:

> With some compiler tuning and a few tricks (two changes to the code and 
> hand-simulated peephole optimizations, but I
> think the compiler can also do these tricks):


You can improve performance further by devirtualising all method calls 
using wpo. First compile it with -FWvipri.wpo -OWDEVIRTCALLS,OPTVMTS and 
next with -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS (at least on my machine it 
gives a small boost, and makes the results also more stable).


Since I only have a preliminary llvm version (with Dwarf EH) running on 
macOS, I can't provide a direct Kylix comparison. The versions below are 
both x86-64. As mentioned before, a 32 bit FPC/LLVM is still quite a way 
off.


* FPC 3.0.4 -MDelphi -O2 -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS:

$ time ./vipribenchmemcache_nodeps
VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0, 
NumberOfChannels=6, BufferPackets=5000, NumberOfSynchroThreads=4

.
Time: 5016ms = 9669059 pkts/s = 14680 MB/s

real    0m5.137s
user    0m5.042s
sys     0m0.017s

FPC 3.3.1 + llvm (clang from Xcode 10.1 with -O3 on FPC-generated llvm 
IR) and -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS (no LLVM link-time 
optimization):


$ time ./vipribenchmemcache_nodeps_llvm
VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0, 
NumberOfChannels=6, BufferPackets=5000, NumberOfSynchroThreads=4

.
Time: 5018ms = 11259466 pkts/s = 17094 MB/s

real    0m5.161s
user    0m5.060s
sys     0m0.017s


Jonas


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-16 Thread Florian Klämpfl
On 16.11.2018 at 20:22, Simon Kissel wrote:
> Hi guys,
> 
> turns out that in our real-life scenario there sadly aren't big
> improvements yet. Might be due to the exception handling, but
> we haven't profiled it yet. As said we have seen better improvements
> in simpler benchmark code - but this benchmark here is what
> really matters for us.
> 
> Please find the benchmark here - the ZIP includes a Kylix-built
> binary.
> 
> https://share.nerdherrschaft.net/f/2ac772f0327e4840a533/?dl=1
> 
> Here are some results from a dual-core i7 (2 cores, 4 hardware threads),
> 32 bit:
> 
> Kylix:
> Time: 5015ms = 9770688 pkts/s = 14610 MB/s
> ./vipribenchmemcache_nodeps_kylix  5.06s user 0.01s system 99% cpu 5.119 total
> 
> FPC 3.0.4:
> Time: 5052ms = 8016627 pkts/s = 11987 MB/s
> ./vipribenchmemcache  5.07s user 0.01s system 97% cpu 5.206 total
> 
> FPC 3.3.1 trunk (SVN Rev 40300):
> Time: 5040ms = 8035714 pkts/s = 12016 MB/s
> ./vipribenchmemcache_nodeps  5.07s user 0.02s system 97% cpu 5.207 total
> 
> Benchmark results for ARM will follow.

With some compiler tuning and a few tricks (two changes to the code and 
hand-simulated peephole optimizations, but I
think the compiler can also do these tricks):

florian@ubuntu32:~$ ./vipribench
VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0, 
NumberOfChannels=6, BufferPackets=5000,
NumberOfSynchroThreads=4
..
Time: 5005ms = 9390609 pkts/s = 14042 MB/s
florian@ubuntu32:~$ ./vipribenchmemcache_nodeps_kylix
VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0, 
NumberOfChannels=6, BufferPackets=5000,
NumberOfSynchroThreads=4
.
Time: 5018ms = 9266640 pkts/s = 13856 MB/s

;)



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-16 Thread Simon Kissel
Hi guys,

turns out that in our real-life scenario there sadly aren't big
improvements yet. Might be due to the exception handling, but
we haven't profiled it yet. As said we have seen better improvements
in simpler benchmark code - but this benchmark here is what
really matters for us.

Please find the benchmark here - the ZIP includes a Kylix-built
binary.

https://share.nerdherrschaft.net/f/2ac772f0327e4840a533/?dl=1

Here are some results from a dual-core i7 (2 cores, 4 hardware threads),
32 bit:

Kylix:
Time: 5015ms = 9770688 pkts/s = 14610 MB/s
./vipribenchmemcache_nodeps_kylix  5.06s user 0.01s system 99% cpu 5.119 total

FPC 3.0.4:
Time: 5052ms = 8016627 pkts/s = 11987 MB/s
./vipribenchmemcache  5.07s user 0.01s system 97% cpu 5.206 total

FPC 3.3.1 trunk (SVN Rev 40300):
Time: 5040ms = 8035714 pkts/s = 12016 MB/s
./vipribenchmemcache_nodeps  5.07s user 0.02s system 97% cpu 5.207 total

Benchmark results for ARM will follow.
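
To put the three figures above in relation to each other, here is a
small C sketch that computes how far a build falls short of a baseline.
The helper names are made up for this illustration; the packet rates
are the ones reported above.

```c
#include <stdio.h>

/* Hypothetical helper: by how many percent does `candidate` fall short of
   `baseline`? A negative result means the candidate is faster. */
double throughput_gap_percent(double baseline_pkts, double candidate_pkts)
{
    return (baseline_pkts - candidate_pkts) / baseline_pkts * 100.0;
}

/* Prints the gap of both FPC builds relative to the Kylix binary. */
void print_gaps(void)
{
    const double kylix = 9770688.0;     /* Kylix, pkts/s */
    const double fpc_304 = 8016627.0;   /* FPC 3.0.4, pkts/s */
    const double fpc_trunk = 8035714.0; /* FPC 3.3.1 trunk r40300, pkts/s */

    printf("FPC 3.0.4 vs Kylix: %.1f%% slower\n",
           throughput_gap_percent(kylix, fpc_304));
    printf("FPC trunk vs Kylix: %.1f%% slower\n",
           throughput_gap_percent(kylix, fpc_trunk));
}
```

Both FPC builds come out roughly 18% behind Kylix on this machine, and
the difference between 3.0.4 and trunk is well under one percentage
point.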

Cheers,

Simon


Thursday, November 15, 2018, 10:31:55 PM, you wrote:

> On 14.11.2018 at 14:46, Simon Kissel wrote:
>> 
>> We have not yet tested this on ARM (does it work on ARM?).
>>

> After r40321, arm-linux works as well.



Best regards,

Simon Kissel

-- 
Nerdherrschaft GmbH
Mainzer Str. 40
55411 Bingen am Rhein
Germany

Phone:+49-6721-9492994
Fax:  +49-6721-9492996

simon.kis...@nerdherrschaft.com
http://www.nerdherrschaft.com

Registered office/Sitz der Gesellschaft: Bingen am Rhein, Germany
CEO/Geschäftsführer: Simon Kissel
Commercial register/Handelsregister: Amtsgericht Mainz HRB43337



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-15 Thread Florian Klämpfl
On 14.11.2018 at 14:46, Simon Kissel wrote:
> 
> We have not yet tested this on ARM (does it work on ARM?).
>

After r40321, arm-linux works as well.


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-14 Thread Sven Barth via fpc-devel
On Wed, 14 Nov 2018, 14:46, Simon Kissel <
simon.kis...@nerdherrschaft.com> wrote:

> Hi Florian,
>
> you are a hero. In a very artificial benchmark which just consists
> of threads and exception handlers, a 32 bit Linux executable now
> is *twice as fast*!
>

Up to now only thread variables are improved, the exception handling not
yet.


> In a real-life scenario we are "only" seeing an improvement of about
> 10%. But really, this is huge progress. I think everyone will
> benefit from these improvements.
>
> We have not yet tested this on ARM (does it work on ARM?).
>

Currently it's i386-linux only.

Regards,
Sven


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-14 Thread Simon Kissel
Hi Florian,

you are a hero. In a very artificial benchmark which just consists
of threads and exception handlers, a 32 bit Linux executable now
is *twice as fast*!

In a real-life scenario we are "only" seeing an improvement of about
10%. But really, this is huge progress. I think everyone will
benefit from these improvements.

We have not yet tested this on ARM (does it work on ARM?).

Bero will do more testing in the next couple of days and report
back.

Cheers,

Simon



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-11 Thread Florian Klämpfl
On 07.11.2018 at 23:00, Florian Klämpfl wrote:
> - threadvars in FPC built libraries do not work yet

This is fixed with r40281. It requires though that all units being part of a 
library are compiled with -fPIC.
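
For readers less familiar with the term: a threadvar is a variable of
which every thread gets its own copy. A rough C analogue (not FPC code,
and not a description of how r40281 implements it) is the C11
`_Thread_local` storage class, which GCC and Clang on Linux map onto
the ELF TLS mechanism discussed in this thread. All names below are
invented for the illustration.

```c
#include <pthread.h>
#include <stddef.h>

/* Each thread gets its own instance of this counter. */
static _Thread_local int counter = 0;

/* Thread body: bump the thread-local counter n times and report it. */
static void *worker(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++)
        counter++;
    *(int *)arg = counter;  /* hand the per-thread result back */
    return NULL;
}

/* Runs two threads; returns 1 if neither saw the other's increments. */
int threads_have_private_counters(void)
{
    pthread_t t1, t2;
    int a = 1000, b = 2000;

    if (pthread_create(&t1, NULL, worker, &a) != 0)
        return 0;
    if (pthread_create(&t2, NULL, worker, &b) != 0) {
        pthread_join(t1, NULL);
        return 0;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    /* With a shared counter the reported totals could exceed n, and the
       main thread's copy would no longer be 0. */
    return a == 1000 && b == 2000 && counter == 0;
}
```

Build with `-pthread`. No locking is needed here because no two threads
ever touch the same copy of `counter`.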

Now waiting for Simon, if he reports any improvements ...


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-11-04 Thread Florian Klämpfl
On 25.10.2018 at 20:13, Florian Klämpfl wrote:

In case somebody wonders: as I started on TLS-based threadvars years ago, I 
decided to work on this one first and
try to bring this code into a committable state.


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-28 Thread Jonas Maebe

On 28/10/18 13:00, Simon Kissel wrote:

> Hi Jonas,

[exceptions for invalid memory accesses]

>> have been working a bit on it since then). This is not something that
>> can be changed/fixed in FPC, and is quite different from how FPC's
>> current code generator works (I don't know how Embarcadero deals with
>> it in their LLVM-based code generator).
>
> Someone could do some reverse engineering to learn more
> about how they have solved the problem (unlike actually copying
> code I don't see any legal or ethical problem in learning from
> reversing).


Well, maybe they didn't... Optimizations based on undefined behaviour 
are a major feature of LLVM.



Jonas


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-28 Thread Jonas Maebe

On 28/10/18 19:14, Jonas Maebe wrote:
> I've committed it in the dwarf_eh branch. Unfortunately, an x86-64 
> compiler compiled with optimizations enabled crashes while compiling 
> this code (probably due to https://bugs.freepascal.org/view.php?id=34385 
> :) )


Actually, it was due to a bug in my code! Fixed.


Jonas


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-28 Thread Jonas Maebe

On 25/10/18 21:33, Sven Barth via fpc-devel wrote:
> As you already started working on translating that part of libgcc, would 
> you please provide what you have so far? :)


I've committed it in the dwarf_eh branch. Unfortunately, an x86-64 
compiler compiled with optimizations enabled crashes while compiling 
this code (probably due to https://bugs.freepascal.org/view.php?id=34385 
:) )



Jonas


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-28 Thread Sven Barth via fpc-devel

On 28.10.2018 at 13:00, Simon Kissel wrote:

>> Additionally, in the current FPC code generator global variables behave
>> mostly as volatile variables. With LLVM, that won't be the case (unless
>> we mark all of their accesses as volatile, but that would obviously
>> inhibit LLVM optimizations). This may break some multithreaded code that
>> currently works, and would probably require the introduction of a
>> volatile() operator (similar to the unaligned() one). On the other
>> hand, I already added support for tracking the volatile state of
>> references in the past, so that should be easy to do.
>
> I have to admit my knowledge on this is very limited. We do
> use global variables unsynchronized in multi-threaded code,
> but only in single-writer, multiple-reader scenarios; in these
> cases we don't have any expectations for the new value to be
> available "immediately". Obviously the compiler cannot know at
> what point during runtime the thread gets scheduled, but what
> are the rules (if any) on "how long" it takes for a volatile
> variable's content to get "flushed"? Is there some scoping involved
> like "on return of current function/method"?

What volatile means in this context is that the compiler always fetches 
the global value anew when it is accessed instead of e.g. caching it in 
a register, which it could do if global variables were not 
treated as volatile.

> Unlike the crap that Embarcadero has been polluting the language
> with in recent years, I think that adding support for volatile()
> to the language would make a lot of sense - however potentially
> turning this around so that the unmodified default stays
> volatile, and implementing an, erm, "non-volatile" modifier
> instead, so as not to break existing code.

It seems that Delphi changed the default behavior in their NextGen 
compiler (probably due to the same reasons that Jonas stated for LLVM) 
as they introduced a "[volatile]" compiler attribute to decorate global 
variables and fields so that the compiler handles them as volatile...
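
To make the "fetch anew on every access" point concrete, here is a
minimal sketch in C rather than Pascal, since a volatile() operator for
FPC is only being proposed in this thread. All names are invented for
the illustration. With the `volatile` qualifier the compiler has to
emit a real load for every read of the flag in the loop; without it, an
optimizing compiler would be allowed to read the flag once and keep the
value in a register.

```c
#include <stddef.h>

/* Written by one thread, read by others (single writer, multiple readers). */
volatile int stop_requested = 0;

/* Polls the flag up to max_polls times and returns how many polls ran
   before a stop request was seen. Each loop iteration re-reads the flag
   from memory because it is declared volatile. */
size_t poll_until_stopped(size_t max_polls)
{
    size_t polls = 0;
    while (polls < max_polls && !stop_requested)
        polls++;   /* stand-in for real work */
    return polls;
}
```

Note that in C, `volatile` only forces the memory access; it gives no
atomicity and no ordering guarantees between threads, for which C11
atomics are the portable tool.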


Regards,
Sven


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-28 Thread Simon Kissel
Hi Florian,

[DWARF-EH]

> This is something I have wanted to work on for years already. So
> maybe it's now a good opportunity to start with it.

*hugs*

Simon



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-28 Thread Simon Kissel
Hi Sven,

> Borland's Fastcall is more famously known as the Register calling
> convention aka the default calling convention in Object Pascal. As
> you admitted in your mail further down you have quite some assembly
> code and as such you rely on the calling convention for parameter
> passing. Here register differs significantly from cdecl or stdcall.
> Thus not supporting the calling convention *will break* your code. 

I don't expect that no (low-level library) code will break.
There are much bigger IFDEF hells than adapting assembler code
boilerplate to handle other calling conventions.

Just throwing a compiler error if an assembler procedure is not
decorated with a calling convention supported by the LLVM branch
would be just fine to me.

Simon



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-28 Thread Simon Kissel
Hi Jonas,

>> - Complete the LLVM branch of FPC. It looks like Jonas has stopped
>>working on it two years ago, which is a pity.

> I didn't stop working on it, but I didn't make real progress anymore 
> either.

So, would you be interested in making progress again? :)

> a) exception handling in general: indeed needs DWARF-EH support in the
> RTL, and also support for the LLVM exception handling intrinsics in the
> code generator. I've worked on and off on this and have some local 
> patches, but it's not complete

So maybe someone else could work on DWARF exceptions, which then
would enable you to progress on LLVM?

> have been working a bit on it since then). This is not something that
> can be changed/fixed in FPC, and is quite different from how FPC's 
> current code generator works (I don't know how Embarcadero deals with
> it in their LLVM-based code generator).

Someone could do some reverse engineering to learn more
about how they have solved the problem (unlike actually copying
code I don't see any legal or ethical problem in learning from
reversing).

If the lone Embarcadero Russian Java-developer-turned-compiler
engineer can do it, you guys sure can, too ;)

> Additionally, in the current FPC code generator global variables behave
> mostly as volatile variables. With LLVM, that won't be the case (unless
> we mark all of their accesses as volatile, but that would obviously 
> inhibit LLVM optimizations). This may break some multithreaded code that
> currently works, and would probably require the introduction of a 
> volatile() operator (similar to the unaligned() one). On the other 
> hand, I already added support for tracking the volatile state of 
> references in the past, so that should be easy to do.

I have to admit my knowledge on this is very limited. We do
use global variables unsynchronized in multi-threaded code,
but only in single-writer, multiple-reader scenarios; in these
cases we don't have any expectations for the new value to be
available "immediately". Obviously the compiler cannot know at
what point during runtime the thread gets scheduled, but what
are the rules (if any) on "how long" it takes for a volatile
variable's content to get "flushed"? Is there some scoping involved
like "on return of current function/method"?

Unlike the crap that Embarcadero has been polluting the language
with in recent years, I think that adding support for volatile()
to the language would make a lot of sense - however potentially
turning this around so that the unmodified default stays
volatile, and implementing an, erm, "non-volatile" modifier
instead, so as not to break existing code.

Cheers,

Simon



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-28 Thread Sven Barth via fpc-devel
Simon Kissel wrote on Sun, 28 Oct 2018, 12:46:

> Hi Florian,
>
> > But there is another pretty simple optimization opportunity in this
> > area: make the FPC heap manager capable of using
> > os-based memory reallocation. Kernel-based memory reallocation of
> > large blocks has the big advantage that the OS can
> > move the memory contents only by re-mapping memory pages.
>
> I fully agree that the memory manager for obvious reasons is
> an important subject, especially for heavily multithreaded code,
> and even more for any string stuff in such code. I haven't
> informed myself enough to judge how well the FPC memory manager
> behaves in this regard, and if it might make sense to try
> to use an alternative memory manager with FPC for Linux.
>
> However, being aware of that, we are avoiding reallocations
> wherever we can and instantiate pretty much everything using
> our own memory caches.
>

I think Florian was talking about the memory management inside the compiler.

Regards,
Sven



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-28 Thread Sven Barth via fpc-devel
Simon Kissel wrote on Sun, 28 Oct 2018, 12:39:

> Hi Ben,
>
> >  There's one more problem I forgot to mention in my first post, and it is
> >  probably a deal breaker for the original bounty: LLVM does not support
> >  Borland's fastcall calling convention for i386. So you would need to add
> >  support for Borland fastcall on i386 to LLVM if it has to support
> >  existing i386 inline assembly routines written for FPC/Delphi.
>
> I don't see how not supporting fastcall would be a deal-breaker?
>

You mean Jonas here I take it, not Ben.

Borland's Fastcall is more famously known as the Register calling
convention aka the default calling convention in Object Pascal. As you
admitted in your mail further down you have quite some assembly code and as
such you rely on the calling convention for parameter passing. Here
register differs significantly from cdecl or stdcall. Thus not supporting
the calling convention *will break* your code.

Regards,
Sven


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-28 Thread Simon Kissel
Hi Sven,

> The thing is that we can't enable or disable a feature based on
> whether a program links third party libraries or a unit is included
> in a library or not, because we might need to work with precompiled
> units. So either you'll need to enable this feature for a locally
> built FPC and be aware that you can't really create libraries then,
> or the feature needs to be implemented completely. 

For us it would be just fine to have custom FPC builds, we
do that anyway - we use CrossFPC built by bero to be able
to target a whole lot of platforms concurrently, including
inside the Delphi and Lazarus IDEs.

But of course it means far less other users would benefit.

Simon




Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-28 Thread Simon Kissel
Hi Florian,

> But there is another pretty simple optimization opportunity in this
> area: make the FPC heap manager capable of using
> os-based memory reallocation. Kernel-based memory reallocation of
> large blocks has the big advantage that the OS can
> move the memory contents only by re-mapping memory pages.

I fully agree that the memory manager for obvious reasons is
an important subject, especially for heavily multithreaded code,
and even more for any string stuff in such code. I haven't
informed myself enough to judge how well the FPC memory manager
behaves in this regard, and if it might make sense to try
to use an alternative memory manager with FPC for Linux.

However, being aware of that, we are avoiding reallocations
wherever we can and instantiate pretty much everything using
our own memory caches.

Simon



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-28 Thread Sven Barth via fpc-devel
Simon Kissel wrote on Sun, 28 Oct 2018, 12:30:

> Hi Florian,
>
> > The %gs based approach works only for object files linked statically to
> > the executable. In general there are four TLS access models on linux and
> > at least three of them need to be supported, if one wants to support
> > dyn. libraries in a useful manner.
>
> Are you talking about being able to create dynlibs in FPC,
> that then are consumed by FPC, and need to be able to support
> exceptions?
>
> I know an approach is needed that FPC benefits from in a generic
> way, but for my case: We don't do that. As long as I am able
> to link against glibc-based stuff, I am fine.
>

The thing is that we can't enable or disable a feature based on whether a
program links third party libraries or a unit is included in a library or
not, because we might need to work with precompiled units. So either you'll
need to enable this feature for a locally built FPC and be aware that you
can't really create libraries then, or the feature needs to be implemented
completely.

Regards,
Sven



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-28 Thread Simon Kissel
Hi Sven,

> And no one said that they are. But things like table based exception
> handling and section based threadvars can be achieved relatively
> easily and benefit more targets, while work on the optimizer
> is usually per-platform work.

I agree that this will very likely give a big boost. From what
I recall, on the oldest ARM platform we have (Marvell Kirkwood),
every access to threadvars right now involves a full CPU cache
flush (but I forgot why exactly, it has been a long time).

Cheers,

Simon



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-28 Thread Simon Kissel
Hi Ben,

>  There's one more problem I forgot to mention in my first post, and it is
>  probably a deal breaker for the original bounty: LLVM does not support
>  Borland's fastcall calling convention for i386. So you would need to add
>  support for Borland fastcall on i386 to LLVM if it has to support 
>  existing i386 inline assembly routines written for FPC/Delphi.

I don't see how not supporting fastcall would be a deal-breaker?

> As far as the point about assembly on 32 bit, while it does seem
> like that would be a problem for the bounty requirements, would it
> really be the end of the world in a more general sense? I can't
> imagine people who are still using 32-bit-hardware and writing
> 32-bit applications would complain if the LLVM backend was not available for 
> 32-bit.

We have tons of hand-tuned assembler library code for stuff
like encryption, and other libraries we use have it too, even those
that are multi-platform - think mORMot, for example.

Most of our embedded platforms sadly aren't and won't be 64bit.

Cheers,

Simon



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-28 Thread Simon Kissel
Hi Florian,

> The %gs based approach works only for object files linked statically to
> the executable. In general there are four TLS access models on linux and
> at least three of them need to be supported, if one wants to support 
> dyn. libraries in a useful manner.

Are you talking about being able to create dynlibs in FPC,
that then are consumed by FPC, and need to be able to support
exceptions?

I know an approach is needed that FPC benefits from in a generic
way, but for my case: We don't do that. As long as I am able
to link against glibc-based stuff, I am fine.

Simon



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-28 Thread Florian Klämpfl
On 28.10.2018 at 02:11, Ben Grasset wrote:
> 
> There's also a number of things that would specifically help the build-time 
> performance of the compiler itself that I've
> noticed, such as there being many, many, many, one-liner functions and 
> procedures that should almost certainly be marked
> as inline but currently are not. 

... because FPC can auto-inline if needed. However, the current auto-inline 
heuristic, which is pretty conservative
(read: it inlines only very small subroutines), has exactly two effects: it makes 
the compiler executable bigger and
slower. A few bytes bigger would be ok, but slower is not acceptable, right? I 
can tell you also why it is slower: the
compiler is memory throughput limited, so everything which increases the memory 
footprint is bad. While (auto)inlining
helps very much for "normal" programs and benchmarks, for the compiler it is 
not a good solution.

The only thing I consider useful in this direction is to work on improving the 
auto inline heuristics by maybe adding
two methods: for pure size and for speed, if the program is not memory 
throughput limited.


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-28 Thread Florian Klämpfl
On 28.10.2018 at 02:24, Ben Grasset wrote:
> On Sat, Oct 27, 2018 at 8:22 PM Ozz Nixon wrote:
> 
> * Sorry for off topic - just that grabbed my "What did he just say?" 
> button..
> 
> 
> Huh? I said "Also linked lists absolutely everywhere, that would perform much 
> better as array based lists."

Only if it does not increase memory fragmentation which is even now already a 
problem.

But there is another pretty simple optimization opportunity in this area: make 
the FPC heap manager capable of using
os-based memory reallocation. Kernel-based memory reallocation of large blocks 
has the big advantage that the OS can
move the memory contents only by re-mapping memory pages.
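
On Linux, the kernel facility for this is mremap(2). The following C
sketch shows the idea only: it is not how FPC's heap manager works, the
helper names are invented for the illustration, and it assumes block
sizes that are multiples of the page size.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* glibc needs this for mremap() and MREMAP_MAYMOVE */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Allocate a large block directly from the kernel. Returns NULL on failure. */
void *big_alloc(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Resize a block obtained from big_alloc(). The kernel may move the block
   to a new address, but it does so by updating the page mappings rather
   than copying the contents. Returns NULL on failure, in which case the
   old block is still valid. */
void *big_realloc(void *p, size_t old_size, size_t new_size)
{
    void *q = mremap(p, old_size, new_size, MREMAP_MAYMOVE);
    return q == MAP_FAILED ? NULL : q;
}

/* Give the block back to the kernel. */
void big_free(void *p, size_t size)
{
    munmap(p, size);
}
```

A heap manager would typically reserve this path for large blocks and
keep serving small allocations from its own pools.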


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-27 Thread Ben Grasset
On Sat, Oct 27, 2018 at 8:22 PM Ozz Nixon wrote:

> * Sorry for off topic - just that grabbed my "What did he just say?"
> button..
>

Huh? I said "Also linked lists absolutely everywhere, that would perform
much better as array based lists."

Meaning, exactly the same thing you just implied. You got what I meant
completely backwards somehow.


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-27 Thread Ozz Nixon
SORRY - JUST RE-READ... that is what you are saying... it's late here ;-(

On Sat, Oct 27, 2018 at 8:22 PM Ozz Nixon wrote:

> * Not arguing, but... *
>
> Linked List faster than Array?
> Unless I missed what you are talking about... I always teach programmers:
>
> Array is the fastest collection to use, followed by Linked List, followed
> by bTree, etc.
>
> * Sorry for off topic - just that grabbed my "What did he just say?"
> button...
>
>


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-27 Thread Ozz Nixon
* Not arguing, but... *

Linked List faster than Array?
Unless I missed what you are talking about... I always teach programmers:

Array is the fastest collection to use, followed by Linked List, followed
by bTree, etc.

* Sorry for off topic - just that grabbed my "What did he just say?"
button...


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-27 Thread Ben Grasset
On Sat, Oct 27, 2018 at 6:46 PM Sven Barth via fpc-devel <
fpc-devel@lists.freepascal.org> wrote:

> Except of course for optimizations that can be done on the platform
> independent node tree.
>

That specifically is IMO the "key" to a higher compiler-wide level of
optimization capabilities, as shown by various more recent compilers for
other languages and also by LLVM. Target-CPU-level optimizations are
certainly still very necessary for some things, but if you pass the
assembly code generator better information to begin with they're not nearly
as relevant. I've been looking over the compiler codebase recently and
there's quite a few things that could obviously be done better IMO at the
top level before any platform specific-stuff comes into play.

There's also a number of things that would specifically help the build-time
performance of the compiler itself that I've noticed, such as there being
many, many, many, one-liner functions and procedures that should almost
certainly be marked as inline but currently are not. Also linked lists
absolutely everywhere, that would perform much better as array based lists.

If the core team is open to arbitrary/speculative patches I might try to
work out a few for what I think are the most important issues and submit
them for consideration sometime in the near future.


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-27 Thread Sven Barth via fpc-devel
Ben Grasset wrote on Sun, 28 Oct 2018, 00:29:

> On Sat, Oct 27, 2018 at 1:38 PM Florian Klämpfl 
> wrote:
>
>> That it is useful to work on table based exception handling for all
>> targets
>>
>
> Not arguing with that at all. I was just trying to point out that I'm not
> a fan of the idea that FPC's code generators are "good enough" as is.
>

And no one said that they are. But things like table-based exception handling
and section-based threadvars can be achieved relatively easily and benefit
more targets, while work on the optimizer is usually per-platform work.
Except, of course, for optimizations that can be done on the
platform-independent node tree.
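
As a rough illustration of why table-based exception handling costs
nothing on the path where no exception is raised, here is a toy Python
model. It is not FPC code and not how any real unwinder is laid out:
the class name, the method name and all addresses are hypothetical. The
idea it sketches is only that the ranges of protected code are recorded
once in a static, sorted table, and that this table is consulted when an
exception is actually raised, instead of registering a handler frame at
run time on every entry into a protected block.

```python
import bisect


class HandlerTable:
    """Static table mapping code address ranges to handler addresses.

    Conceptually built once at compile/link time and sorted by start address.
    """

    def __init__(self, entries):
        # entries: iterable of (start_pc, end_pc, handler_pc), non-overlapping,
        # with end_pc exclusive.
        self.entries = sorted(entries)
        self.starts = [start for start, _, _ in self.entries]

    def find_handler(self, pc):
        """Return the handler for the range containing pc, or None."""
        # Binary search for the last range that starts at or before pc.
        i = bisect.bisect_right(self.starts, pc) - 1
        if i >= 0:
            start, end, handler = self.entries[i]
            if start <= pc < end:
                return handler
        return None


# Hypothetical layout: two protected regions, each with its own handler.
table = HandlerTable([
    (0x1000, 0x1040, 0x2000),
    (0x1100, 0x1180, 0x2100),
])

print(table.find_handler(0x1010))  # address inside the first region
print(table.find_handler(0x10F0))  # address between the two regions
```

A real implementation walks the call stack frame by frame and does a
lookup of this kind per frame, so raising becomes more expensive while
entering and leaving a protected block becomes free.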

Regards,
Sven



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-27 Thread Ben Grasset
On Sat, Oct 27, 2018 at 1:38 PM Florian Klämpfl 
wrote:

> That it is useful to work on table based exception handling for all targets
>

Not arguing with that at all. I was just trying to point out that I'm not a
fan of the idea that FPC's code generators are "good enough" as is.


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-27 Thread Florian Klämpfl
On 27.10.2018 at 19:19, Michael Van Canneyt wrote:
> 
> 
> On Sat, 27 Oct 2018, Florian Klämpfl wrote:
> 
>> If you read the whole thread, LLVM needs a rewritten exception handling as 
>> well. Further, a quick test
>> of table based exception handling on bansi1 (which is mainly a memory 
>> manager test) gives:
>>
>> standard exception handling:
>>
>> fpctrunk\tests\bench>pp11 bansi1 -O3
>>
>> fpctrunk\tests\bench>bansi1
>> Test 1: 100 done in 0.537 sec
>> Test 2: 100 done in 0.535 sec
>> Test 3: 100 done in 0.587 sec
>>
>> SEH based exception handling:
>>
>> fpctrunk\tests\bench>pp11 bansi1 -O3
>>
>> fpctrunk\tests\bench>bansi1
>> Test 1: 100 done in 0.456 sec
>> Test 2: 100 done in 0.457 sec
>> Test 3: 100 done in 0.446 sec
> 
> Florian, I am not sure what this is supposed to prove ?

That it is useful to work on table based exception handling for all targets ...

> 
> It's 15% off the elapsed time (almost 1/6th), that seems worth spending some 
> time on...


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-27 Thread Florian Klämpfl
On 27.10.2018 at 18:21, Ben Grasset wrote:
> On Sat, Oct 27, 2018 at 8:47 AM Jonas Maebe wrote:
> 
> On 27/10/18 05:45, Ben Grasset wrote:
> 
>   
> 
> You also need "opt" if you want to perform full optimizations (or just
> use clang, which a.o. combines the functionality of llc and opt).
> 
> There's one more problem I forgot to mention in my first post, and it is
> probably a deal breaker for the original bounty: LLVM does not support
> Borland's fastcall calling convention for i386. So you would need to add
> support for Borland fastcall on i386 to LLVM if it has to support
> existing i386 inline assembly routines written for FPC/Delphi.
> 
> Finally, adding support for 32 bit targets in FPC's LLVM backend would
> also require some work due to how FPC's code generator is structured,
> and due to the fact that we need to have two code generators in a single
> binary (the native one to support the generation of entry and exit code
> for pure inline assembler routines, and the LLVM one for the rest).
> 
> 
> LLC (at least now) statically links the necessary parts of LLVM and works 
> independently of Opt, with a simpler set of
> command line options (it just has overall O1, O2, and O3 flags.)
> 
> As far as the point about assembly on 32 bit, while it does seem like that 
> would be a problem for the bounty
> requirements, would it really be the end of the world in a more general 
> sense? I can't imagine people who are still
> using 32-bit-hardware and writing 32-bit applications would complain if the 
> LLVM backend was not available for 32-bit.
> 
> Anyways though, I do think code gen improvements for FPC, LLVM or not, are 
> likely going to be a lot more widely helpful
> than just rewriting exception handling 

If you read the whole thread, LLVM needs a rewritten exception handling as 
well. Further, a quick test
of table based exception handling on bansi1 (which is mainly a memory manager 
test) gives:

standard exception handling:

fpctrunk\tests\bench>pp11 bansi1 -O3

fpctrunk\tests\bench>bansi1
Test 1: 100 done in 0.537 sec
Test 2: 100 done in 0.535 sec
Test 3: 100 done in 0.587 sec

SEH based exception handling:

fpctrunk\tests\bench>pp11 bansi1 -O3

fpctrunk\tests\bench>bansi1
Test 1: 100 done in 0.456 sec
Test 2: 100 done in 0.457 sec
Test 3: 100 done in 0.446 sec
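
To put the two runs side by side, the following Python sketch computes
the relative reduction in elapsed time for each test. Only the six
timings come from the output above; the variable and function names are
made up for illustration.

```python
# Timings in seconds, copied from the two bansi1 runs quoted above.
standard = [0.537, 0.535, 0.587]      # standard exception handling
table_based = [0.456, 0.457, 0.446]   # SEH-based exception handling


def reduction_percent(before, after):
    """Relative reduction in elapsed time, in percent."""
    return (before - after) / before * 100.0


per_test = [reduction_percent(b, a) for b, a in zip(standard, table_based)]
overall = reduction_percent(sum(standard), sum(table_based))

for i, pct in enumerate(per_test, start=1):
    print(f"Test {i}: {pct:.1f}% less elapsed time")
print(f"All tests: {overall:.1f}% less elapsed time")
```

On these numbers the reduction is roughly 15% for tests 1 and 2 and
about 24% for test 3. With a single run per configuration this is a
small sample, so the figures are indicative only.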



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-27 Thread Ben Grasset
On Sat, Oct 27, 2018 at 8:47 AM Jonas Maebe  wrote:

> On 27/10/18 05:45, Ben Grasset wrote:



> You also need "opt" if you want to perform full optimizations (or just
> use clang, which a.o. combines the functionality of llc and opt).
>
> There's one more problem I forgot to mention in my first post, and it is
> probably a deal breaker for the original bounty: LLVM does not support
> Borland's fastcall calling convention for i386. So you would need to add
> support for Borland fastcall on i386 to LLVM if it has to support
> existing i386 inline assembly routines written for FPC/Delphi.
>
> Finally, adding support for 32 bit targets in FPC's LLVM backend would
> also require some work due to how FPC's code generator is structured,
> and due to the fact that we need to have two code generators in a single
> binary (the native one to support the generation of entry and exit code
> for pure inline assembler routines, and the LLVM one for the rest).
>

LLC (at least now) statically links the necessary parts of LLVM and works
independently of Opt, with a simpler set of command line options (it just
has overall O1, O2, and O3 flags.)

As far as the point about assembly on 32 bit, while it does seem like that
would be a problem for the bounty requirements, would it really be the end
of the world in a more general sense? I can't imagine people who are still
using 32-bit-hardware and writing 32-bit applications would complain if the
LLVM backend was not available for 32-bit.

Anyways though, I do think code gen improvements for FPC, LLVM or not, are
likely going to be a lot more widely helpful than just rewriting exception
handling (not that rewriting exception handling is a bad idea.) I think
there's a lot of people who would like FPC to generate faster code than it
currently does. Can you recommend any known areas in need of improvement of
the non-platform-specific parts of the code generators that might be a good
place to start for someone who's an experienced Pascal developer but hasn't
worked with the compiler codebase before?


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-27 Thread Jonas Maebe

On 27/10/18 05:45, Ben Grasset wrote:

> As far as dependencies, it would add
> none whatsoever other than a copy of the LLC or LLVM-AS binaries (as in,
> no more than any other target FPC supports. Just think of it as yet
> another assembler format.)


You also need "opt" if you want to perform full optimizations (or just 
use clang, which a.o. combines the functionality of llc and opt).


There's one more problem I forgot to mention in my first post, and it is 
probably a deal breaker for the original bounty: LLVM does not support 
Borland's fastcall calling convention for i386. So you would need to add 
support for Borland fastcall on i386 to LLVM if it has to support 
existing i386 inline assembly routines written for FPC/Delphi.


Finally, adding support for 32 bit targets in FPC's LLVM backend would 
also require some work due to how FPC's code generator is structured, 
and due to the fact that we need to have two code generators in a single
binary (the native one to support the generation of entry and exit code 
for pure inline assembler routines, and the LLVM one for the rest).



Jonas


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-27 Thread Martin Schreiber
On Saturday 27 October 2018 09:27:59 Sven Barth via fpc-devel wrote:
> >
> > Not really. The IR format has been pretty stable since version 3.9 or so
> > (LLVM is current at version 8.) As far as dependencies, it would add none
> > whatsoever other than a copy of the LLC or LLVM-AS binaries (as in, no
> > more than any other target FPC supports. Just think of it as yet another
> > assembler format.)
>
> It's more than just an additional assembler format as the infrastructure
> inside the compiler shows. Also there are the problems that Jonas
> mentioned.
> In my opinion that time is better spent optimizing our own code generator.
>
MSElang takes the approach of writing LLVM bitcode directly, without an
intermediate LLVM assembler text. Building the needed LLVM lists and tracking
the SSA values is not trivial. IMO the worst aspect of LLVM is its slowness,
but the resulting code is awesome.

Martin


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-27 Thread Sven Barth via fpc-devel
Ben Grasset wrote on Sat, 27 Oct 2018, 05:46:

> On Thu, Oct 25, 2018 at 3:06 AM Sven Barth via fpc-devel <
> fpc-devel@lists.freepascal.org> wrote:
>
>> Simon Kissel wrote on Thu, 25 Oct 2018, 08:54:
>>
>>> - Complete the LLVM branch of FPC. It looks like Jonas has stopped
>>>   working on it two years ago, which is a pity.
>>>
>>
>> I personally don't think that LLVM is the way to go. It's essentially a
>> moving target and adds an unnecessary dependency to the compiler.
>>
>
> Not really. The IR format has been pretty stable since version 3.9 or so
> (LLVM is current at version 8.) As far as dependencies, it would add none
> whatsoever other than a copy of the LLC or LLVM-AS binaries (as in, no more
> than any other target FPC supports. Just think of it as yet another
> assembler format.)
>

It's more than just an additional assembler format as the infrastructure
inside the compiler shows. Also there are the problems that Jonas
mentioned.
In my opinion that time is better spent optimizing our own code generator.

Regards,
Sven



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-26 Thread Ben Grasset
On Thu, Oct 25, 2018 at 3:06 AM Sven Barth via fpc-devel <
fpc-devel@lists.freepascal.org> wrote:

> Simon Kissel wrote on Thu, 25 Oct 2018, 08:54:
>
>> - Complete the LLVM branch of FPC. It looks like Jonas has stopped
>>   working on it two years ago, which is a pity.
>>
>
> I personally don't think that LLVM is the way to go. It's essentially a
> moving target and adds an unnecessary dependency to the compiler.
>

Not really. The IR format has been pretty stable since version 3.9 or so
(LLVM is currently at version 8.) As far as dependencies, it would add none
whatsoever other than a copy of the LLC or LLVM-AS binaries (as in, no more
than any other target FPC supports. Just think of it as yet another
assembler format.)


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-25 Thread Sven Barth via fpc-devel

On 25.10.2018 at 20:34, Jonas Maebe wrote:
> On 25/10/18 20:13, Florian Klämpfl wrote:
>> On 25.10.2018 at 18:59, Jonas Maebe wrote:
>>> On 20/10/18 16:07, Simon Kissel wrote:
>>>> - Complete the LLVM branch of FPC. It looks like Jonas has stopped
>>>>   working on it two years ago, which is a pity.
>>>
>>> I didn't stop working on it, but I didn't make real progress anymore
>>> either. The current state of the LLVM code generator is that everything
>>> works on Darwin/x86-64, except for
>>> a) exception handling in general: indeed needs DWARF-EH support in
>>> the RTL,
>>
>> This is something I have wanted to work on for years. So maybe
>> it's now a good opportunity to start with it.
>>
>> I started a branch for it:
>> https://svn.freepascal.org/svn/fpc/branches/debug_eh
>>
>> As a first step, I'll depend on libgcc unwinding, let's see how far
>> we get.
>
> Using libgcc's foreign exception support works somewhat, but is not
> very usable in practice due to the limitation of having only one
> exception in flight. I simply started translating all of libgcc's
> exception support to Pascal, since it's also licensed under LGPL +
> linking exception (I took the one from gcc 4.2.1 for the people who
> don't like (L)GPL3).

As you already started working on translating that part of libgcc, would
you please provide what you have so far? :)


Regards,
Sven


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-25 Thread Jeppe Johansen

On 10/20/18 4:07 PM, Simon Kissel wrote:

> The requirements for my bounty would be:
>
> - Must bring executable speed for non-floating-point load
>   on both multithreaded and non-multithreaded workloads to
>   the speed of Kylix combined binaries
>
> - Improvements should also help on ARM targets
>
> - An LLVM-based solution must allow inline assembler for
>   all x86 and ARM
>
> - Must be completed by February 2019
>
> So, any suggestions on how to move forward on this?
>
> Cheers,
>
> Simon


Hi,

Can you create some benchmarks showing typical workloads that you 
experience a large performance difference on?


Best Regards,
Jeppe



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-25 Thread Jonas Maebe

On 25/10/18 20:13, Florian Klämpfl wrote:
> On 25.10.2018 at 18:59, Jonas Maebe wrote:
>> On 20/10/18 16:07, Simon Kissel wrote:
>>> - Complete the LLVM branch of FPC. It looks like Jonas has stopped
>>>   working on it two years ago, which is a pity.
>>
>> I didn't stop working on it, but I didn't make real progress anymore either.
>> The current state of the LLVM code generator is that everything works on
>> Darwin/x86-64, except for
>> a) exception handling in general: indeed needs DWARF-EH support in the RTL,
>
> This is something I have wanted to work on for years. So maybe it's now a
> good opportunity to start with it.
>
> I started a branch for it: https://svn.freepascal.org/svn/fpc/branches/debug_eh
>
> As a first step, I'll depend on libgcc unwinding, let's see how far we get.

Using libgcc's foreign exception support works somewhat, but is not very
usable in practice due to the limitation of having only one exception in
flight. I simply started translating all of libgcc's exception support
to Pascal, since it's also licensed under LGPL + linking exception (I
took the one from gcc 4.2.1 for the people who don't like (L)GPL3).



Jonas



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-25 Thread Karoly Balogh (Charlie/SGR)
Hi,

On Thu, 25 Oct 2018, Florian Klaempfl wrote:

> >> That is good news.  The contours of a TODO list are becoming visible :)
> >>
> >> But we may also need a solution for other platforms, which means the
> >> current system should remain in place for those platforms where such a
> >> system is not present ?
> >
> > FPC already has some code to support section threadvars via the GS segment
> > on i386 at least, but it doesn't seem to be enabled by default? (Couldn't
> > test it, but the tf_section_threadvars target flag, which enables this, is
> > actually behind a define in i_linux.pas, which I couldn't find enabled
> > anywhere?). Also tf_section_threadvars flag has some code to support it
> > all over the compiler, including the x86 cg. I have some really vague
> > memories I actually enabled it in some experimental local version I had,
> > and it worked on first sight at least, but I could be completely off here.
> >
> > I wonder why it was never enabled by default.
>
> The %gs based approach works only for object files linked statically to
> the executable. In general there are four TLS access models on Linux and
> at least three of them need to be supported if one wants to support
> dynamic libraries in a useful manner. Of course, this comes with the
> requirement to offer means to control the model used. The tls.pdf by U.
> Drepper describes it very well.

Ah, right. It's been a while. Ironically, it would have been enough for
the actual use case at hand, when I fiddled with it.

Charlie


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-25 Thread Jonas Maebe

On 20/10/18 16:07, Simon Kissel wrote:

> - Complete the LLVM branch of FPC. It looks like Jonas has stopped
>   working on it two years ago, which is a pity.

I didn't stop working on it, but I didn't make real progress anymore
either. The current state of the LLVM code generator is that everything
works on Darwin/x86-64, except for
a) exception handling in general: indeed needs DWARF-EH support in the
RTL, and also support for the LLVM exception handling intrinsics in the
code generator. I've worked on and off on this and have some local
patches, but it's not complete
b) hardware exceptions (null pointer, floating point): the LLVM versions
I worked with back then did not support any form of hardware
exceptions. If a memory access faults, the result is undefined behaviour
(even with full exception support in the LLVM IR). If a floating point
instruction throws an exception, the result is undefined (although they
have been working a bit on it since then). This is not something that
can be changed/fixed in FPC, and is quite different from how FPC's
current code generator works (I don't know how Embarcadero deals with
it in their LLVM-based code generator).

Additionally, in the current FPC code generator global variables behave
mostly as volatile variables. With LLVM, that won't be the case (unless
we mark all of their accesses as volatile, but that would obviously
inhibit LLVM optimizations). This may break some multithreaded code that
currently works, and would probably require the introduction of a
volatile() operator (similar to the unaligned() one). On the other
hand, I already added support for tracking the volatile state of
references in the past, so that should be easy to do.



Jonas


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-25 Thread Florian Klaempfl

On 25.10.2018 at 09:06, Sven Barth via fpc-devel wrote:
> Simon Kissel wrote on Thu, 25 Oct 2018, 08:54:
>
>> - Complete the LLVM branch of FPC. It looks like Jonas has stopped
>>   working on it two years ago, which is a pity.
>
> I personally don't think that LLVM is the way to go. It's essentially a
> moving target and adds an unnecessary dependency to the compiler.

Me neither :)

>> - Rewrite the code generator, for example in a SSA-IR way
>
> Didn't Florian work on that already? I wonder how far he is by now

Got distracted by other stuff but also because I do not believe that it
matters much for a lot of real-world programs (small benchmarks are another
story).



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-25 Thread Florian Klaempfl

On 25.10.2018 at 11:18, Sven Barth via fpc-devel wrote:
> Michael Van Canneyt wrote on Thu, 25 Oct 2018, 09:38:
>
>> On Sat, 20 Oct 2018, Simon Kissel wrote:
>>
>>> - Make Exception handling, TLS etc use the infrastructure that
>>>   libpthread is providing
>>
>> TLS is handled already by libpthread. I doubt you will gain much there.
>>
>> However, Exception handling is a problem. There are 2 possible ways
>> ahead:
>> - DWARF exception handling as mentioned by Sven.
>> - Port SEH to be cross platform, this is the approach as taken by Kylix.
>> Kylix has a small rtlunwind library that mimics the needed run-time
>> functionality offered by Windows.
>>
>> Conceivably, it can be duplicated. wine probably has such a library
>> which can be used as an inspiration.
>>
>> The needed compiler infrastructure for SEH already exists, so this
>> is most likely the fastest way to proceed.
>
> I'm against emulating SEH. Better implement DWARF exceptions.

Yes.



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-25 Thread Florian Klaempfl

On 25.10.2018 at 17:38, Karoly Balogh (Charlie/SGR) wrote:
> Hi,
>
> On Thu, 25 Oct 2018, Michael Van Canneyt wrote:
>
>>>>> - Make Exception handling, TLS etc use the infrastructure that
>>>>>   libpthread is providing
>>>>
>>>> TLS is handled already by libpthread. I doubt you will gain much there.
>>>
>>> GCC has (depending on the platform) a faster implementation for "__thread"
>>> variables. E.g. on x86 it uses the GS segment and the data is stored in ELF
>>> sections. There were experiments in the past to support this in FPC as
>>> well, so maybe we're on a good way there already.
>>
>> That is good news.  The contours of a TODO list are becoming visible :)
>>
>> But we may also need a solution for other platforms, which means the
>> current system should remain in place for those platforms where such a
>> system is not present ?
>
> FPC already has some code to support section threadvars via the GS segment
> on i386 at least, but it doesn't seem to be enabled by default? (Couldn't
> test it, but the tf_section_threadvars target flag, which enables this, is
> actually behind a define in i_linux.pas, which I couldn't find enabled
> anywhere?). Also tf_section_threadvars flag has some code to support it
> all over the compiler, including the x86 cg. I have some really vague
> memories I actually enabled it in some experimental local version I had,
> and it worked on first sight at least, but I could be completely off here.
>
> I wonder why it was never enabled by default.

The %gs based approach works only for object files linked statically to
the executable. In general there are four TLS access models on Linux and
at least three of them need to be supported if one wants to support
dynamic libraries in a useful manner. Of course, this comes with the
requirement to offer means to control the model used. The tls.pdf by U.
Drepper describes it very well.
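
For readers who do not have that document at hand: the four models it
describes are general dynamic, local dynamic, initial exec and local
exec. The toy Python model below contrasts only the two extremes and is
a sketch under simplifying assumptions, not a description of what FPC or
glibc actually does. All names in it are hypothetical, except that the
second function is loosely modelled on the run-time helper
__tls_get_addr. In the local-exec style the variable lives at a fixed
offset in the thread's static TLS block, which is what a single
%gs-relative access amounts to on i386; in the general-dynamic style the
block of a (possibly dynamically loaded) module has to be looked up, and
allocated on first use, per thread.

```python
class Thread:
    """Toy per-thread state: a static TLS block plus a dynamic thread vector."""

    def __init__(self, static_size):
        self.static_block = [0] * static_size  # TLS of the executable itself
        self.dtv = {}                          # module id -> TLS block, filled lazily


def local_exec_slot(thread, offset):
    """Local-exec style: the offset is a link-time constant, one indexed access."""
    return thread.static_block, offset


def general_dynamic_slot(thread, module_id, offset, module_sizes):
    """General-dynamic style: a run-time lookup per access."""
    block = thread.dtv.get(module_id)
    if block is None:
        # First access from this thread: allocate the module's TLS block.
        block = [0] * module_sizes[module_id]
        thread.dtv[module_id] = block
    return block, offset


module_sizes = {2: 4}  # hypothetical shared library with four TLS slots
t1, t2 = Thread(static_size=2), Thread(static_size=2)

block, off = local_exec_slot(t1, 1)
block[off] = 11
block, off = general_dynamic_slot(t1, 2, 3, module_sizes)
block[off] = 42

print(t1.static_block, t1.dtv)  # thread 1 sees its own writes
print(t2.static_block, t2.dtv)  # thread 2 is unaffected
```

The fixed-offset access is the cheap one, but it only works when the
offset is known when the executable is linked, which matches the
restriction to statically linked object files mentioned above.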




Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-25 Thread Karoly Balogh (Charlie/SGR)
Hi,

On Thu, 25 Oct 2018, Michael Van Canneyt wrote:

> >>> - Make Exception handling, TLS etc use the infrastructure that
> >>>  libpthread is providing
> >>
> >> TLS is handled already by libpthread. I doubt you will gain much there.
> >>
> >
> > GCC has (depending on the platform) a faster implementation for "__thread"
> > variables. E.g. on x86 it uses the GS segment and the data is stored in ELF
> > sections. There were experiments in the past to support this in FPC as
> > well, so maybe we're on a good way there already.
>
> That is good news.  The contours of a TODO list are becoming visible :)
>
> But we may also need a solution for other platforms, which means the
> current system should remain in place for those platforms where such a
> system is not present ?

FPC already has some code to support section threadvars via the GS segment
on i386 at least, but it doesn't seem to be enabled by default? (Couldn't
test it, but the tf_section_threadvars target flag, which enables this, is
actually behind a define in i_linux.pas, which I couldn't find enabled
anywhere?). Also tf_section_threadvars flag has some code to support it
all over the compiler, including the x86 cg. I have some really vague
memories I actually enabled it in some experimental local version I had,
and it worked on first sight at least, but I could be completely off here.

I wonder why it was never enabled by default. Maybe to keep compatibility
with some older Linux version, which didn't support this yet?

IOW, it might be a one-line change. Can I take some of the bounty now? :P

Charlie


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-25 Thread Michael Van Canneyt



On Thu, 25 Oct 2018, Sven Barth via fpc-devel wrote:

> Michael Van Canneyt wrote on Thu, 25 Oct 2018, 09:38:
>
>> On Sat, 20 Oct 2018, Simon Kissel wrote:
>>
>>> - Make Exception handling, TLS etc use the infrastructure that
>>>   libpthread is providing
>>
>> TLS is handled already by libpthread. I doubt you will gain much there.
>
> GCC has (depending on the platform) a faster implementation for "__thread"
> variables. E.g. on x86 it uses the GS segment and the data is stored in ELF
> sections. There were experiments in the past to support this in FPC as
> well, so maybe we're on a good way there already.

That is good news.  The contours of a TODO list are becoming visible :)

But we may also need a solution for other platforms, which means the
current system should remain in place for those platforms where such a
system is not present ?

Michael.


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-25 Thread Sven Barth via fpc-devel
Michael Van Canneyt wrote on Thu, 25 Oct 2018, 09:38:

>
>
> On Sat, 20 Oct 2018, Simon Kissel wrote:
>
> > - Make Exception handling, TLS etc use the infrastructure that
> >  libpthread is providing
>
> TLS is handled already by libpthread. I doubt you will gain much there.
>

GCC has (depending on the platform) a faster implementation for "__thread"
variables. E.g. on x86 it uses the GS segment and the data is stored in ELF
sections. There were experiments in the past to support this in FPC as
well, so maybe we're on a good way there already.

Regards,
Sven



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-25 Thread Sven Barth via fpc-devel
Michael Van Canneyt wrote on Thu, 25 Oct 2018, 14:55:

>
>
> On Thu, 25 Oct 2018, Sven Barth via fpc-devel wrote:
>
> >
> >> Personally I am also in favour of a more open technique instead of a
> >> technique which is proprietary to a platform, and in this sense I
> >> understand and endorse your point of view, but beggars can't be choosers.
> >>
> >> There is no problem to have both techniques available. As I wrote, the
> >> SEH is the fastest path.
> >>
> >
> > I have my doubts especially as the rtlunwind stuff of Kylix only works on
> > i386. The SEH mechanism between i386 and all other Windows platforms
> > differs significantly and I doubt that Simon only wants i386 to benefit.
>
> If 'SEH is the fastest path.' is not correct, then all the more reason to
> use DWARF...
>

A further obstacle for SEH on non-i386: GNU AS supports the pseudo
instructions needed for SEH only for PE/COFF, but not ELF. This would mean
that we'd need to add them manually to the assembly files, which would
definitely be more bothersome...

Regards,
Sven



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-25 Thread Michael Van Canneyt



On Thu, 25 Oct 2018, Sven Barth via fpc-devel wrote:

>> Personally I am also in favour of a more open technique instead of a
>> technique which is proprietary to a platform, and in this sense I
>> understand and endorse your point of view, but beggars can't be choosers.
>>
>> There is no problem to have both techniques available. As I wrote, the SEH
>> is the fastest path.
>
> I have my doubts especially as the rtlunwind stuff of Kylix only works on
> i386. The SEH mechanism between i386 and all other Windows platforms
> differs significantly and I doubt that Simon only wants i386 to benefit.

If 'SEH is the fastest path.' is not correct, then all the more reason to use
DWARF...

Michael.


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-25 Thread Joao Schuler
Hello Simon - do you have code examples that provoke the problems you
are experiencing? It will be easier to measure/test improvements with test
cases. Solutions might not come from a single person/team, so I am not
sure how to apply the bounty in the most effective/fair way.


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-25 Thread Sven Barth via fpc-devel
Michael Van Canneyt wrote on Thu, 25 Oct 2018, 11:51:

>
>
> On Thu, 25 Oct 2018, Sven Barth via fpc-devel wrote:
>
> > Michael Van Canneyt wrote on Thu, 25 Oct 2018, 09:38:
> >
> >>
> >>
> >> On Sat, 20 Oct 2018, Simon Kissel wrote:
> >>
> >>> - Make Exception handling, TLS etc use the infrastructure that
> >>>  libpthread is providing
> >>
> >> TLS is handled already by libpthread. I doubt you will gain much there.
> >>
> >> However, Exception handling is a problem. There are 2 possible ways ahead:
> >> - DWARF exception handling as mentioned by Sven.
> >> - Port SEH to be cross platform, this is the approach as taken by Kylix.
> >> Kylix has a small rtlunwind library that mimics the needed run-time
> >> functionality offered by Windows.
> >>
> >> Conceivably, it can be duplicated. wine probably has such a library which
> >> can be used as an inspiration.
> >>
> >> The needed compiler infrastructure for SEH already exists, so this is
> >> most likely the fastest way to proceed.
> >>
> >
> > I'm against emulating SEH. Better implement DWARF exceptions. The
> > infrastructure that was created for SEH inside the compiler should help
> > nevertheless.
>
> You can be against, and  you don't need to work on it,
> but if someone supplies a patch, I don't think we should refuse it.
>

I don't agree here.


> Personally I am also in favour of a more open technique instead of a
> technique which is proprietary to a platform, and in this sense I
> understand and endorse your point of view, but beggars can't be choosers.
>
> There is no problem to have both techniques available. As I wrote, the SEH
> is the fastest path.
>

I have my doubts especially as the rtlunwind stuff of Kylix only works on
i386. The SEH mechanism between i386 and all other Windows platforms
differs significantly and I doubt that Simon only wants i386 to benefit.

Regards,
Sven


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-25 Thread Michael Van Canneyt



On Thu, 25 Oct 2018, Martin Schreiber wrote:

> On Thursday 25 October 2018 11:18:58 Sven Barth via fpc-devel wrote:
>
> > I'm against emulating SEH. Better implement DWARF exceptions. The
> > infrastructure that was created for SEH inside the compiler should help
> > nevertheless.
>
> MSElang has some code for "Itanium ABI Zero-cost Exception Handling" supported
> by LLVM, for example the runtime part:
> https://gitlab.com/mseide-msegui/mselang/blob/master/mselang/compiler/__mla__personality.pas
> Works well so far.


Great, thank you for this info. The more choice, the better!

Michael.


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-25 Thread Michael Van Canneyt



On Thu, 25 Oct 2018, Sven Barth via fpc-devel wrote:

> Michael Van Canneyt wrote on Thu, 25 Oct 2018, 09:38:
>
> > On Sat, 20 Oct 2018, Simon Kissel wrote:
> >
> > > - Make Exception handling, TLS etc use the infrastructure that
> > >   libpthread is providing
> >
> > TLS is handled already by libpthread. I doubt you will gain much there.
> >
> > However, Exception handling is a problem. There are 2 possible ways ahead:
> > - DWARF exception handling as mentioned by Sven.
> > - Port SEH to be cross platform, this is the approach as taken by Kylix.
> >   Kylix has a small rtlunwind library that mimics the needed run-time
> >   functionality offered by Windows.
> >
> > Conceivably, it can be duplicated. Wine probably has such a library which
> > can be used as an inspiration.
> >
> > The needed compiler infrastructure for SEH already exists, so this is
> > most likely the fastest way to proceed.
>
> I'm against emulating SEH. Better implement DWARF exceptions. The
> infrastructure that was created for SEH inside the compiler should help
> nevertheless.

You can be against it, and you don't need to work on it,
but if someone supplies a patch, I don't think we should refuse it.


Personally I am also in favour of a more open technique instead of a
technique which is proprietary to a platform, and in this sense I understand
and endorse your point of view, but beggars can't be choosers.

There is no problem to have both techniques available. As I wrote, the SEH
is the fastest path.

So hopefully we will be able to compare and can still choose the better/faster 
one.

Michael.


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-25 Thread Martin Schreiber
On Thursday 25 October 2018 11:18:58 Sven Barth via fpc-devel wrote:
>
> I'm against emulating SEH. Better implement DWARF exceptions. The
> infrastructure that was created for SEH inside the compiler should help
> nevertheless.
>
MSElang has some code for "Itanium ABI Zero-cost Exception Handling" supported 
by LLVM, for example the runtime part:
https://gitlab.com/mseide-msegui/mselang/blob/master/mselang/compiler/__mla__personality.pas
Works well so far.

Martin


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-25 Thread Sven Barth via fpc-devel
Michael Van Canneyt wrote on Thu, 25 Oct 2018, 09:38:

>
>
> On Sat, 20 Oct 2018, Simon Kissel wrote:
>
> > - Make Exception handling, TLS etc use the infrastructure that
> >  libpthread is providing
>
> TLS is handled already by libpthread. I doubt you will gain much there.
>
> However, Exception handling is a problem. There are 2 possible ways ahead:
> - DWARF exception handling as mentioned by Sven.
> - Port SEH to be cross platform, this is the approach as taken by Kylix.
>   Kylix has a small rtlunwind library that mimics the needed run-time
>   functionality offered by Windows.
>
> Conceivably, it can be duplicated. Wine probably has such a library which
> can be used as an inspiration.
>
> The needed compiler infrastructure for SEH already exists, so this is
> most likely the fastest way to proceed.
>

I'm against emulating SEH. Better implement DWARF exceptions. The
infrastructure that was created for SEH inside the compiler should help
nevertheless.

Regards,
Sven



Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-25 Thread Michael Van Canneyt



On Sat, 20 Oct 2018, Simon Kissel wrote:

> - Make Exception handling, TLS etc use the infrastructure that
>   libpthread is providing

TLS is handled already by libpthread. I doubt you will gain much there.

However, Exception handling is a problem. There are 2 possible ways ahead:
- DWARF exception handling as mentioned by Sven.
- Port SEH to be cross platform, this is the approach as taken by Kylix.
  Kylix has a small rtlunwind library that mimics the needed run-time
  functionality offered by Windows.

Conceivably, it can be duplicated. Wine probably has such a library which
can be used as an inspiration.

The needed compiler infrastructure for SEH already exists, so this is most
likely the fastest way to proceed.

Michael.


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-25 Thread Sven Barth via fpc-devel
Simon Kissel wrote on Thu, 25 Oct 2018, 08:54:

> - Complete the LLVM branch of FPC. It looks like Jonas has stopped
>   working on it two years ago, which is a pity.
>

I personally don't think that LLVM is the way to go. It's essentially a
moving target and adds an unnecessary dependency to the compiler.

> - Rewrite the code generator, for example in an SSA-IR way
>

Didn't Florian work on that already? I wonder how far he is by now 🤔

> - Make Exception handling, TLS etc use the infrastructure that
>   libpthread is providing
>

I'm against having such a basic functionality depend on an external library
as I quite enjoy that FPC can be used without any dependencies on Linux.
However I am in favor of introducing DWARF exception handling that should
have similar benefits as SEH on Win64 if I remember correctly.
And for threadvars we could try to implement a different mechanism as well.
I think there was some experiment for that some time ago 🤔

A further problem is that not all of us have access to Kylix so that not
everyone can compare the performance.

Regards,
Sven