Re: [fpc-devel] inline... and philosophy

2019-12-16 Thread Sven Barth via fpc-devel

On 16.12.2019 at 23:08, Marco van de Voort wrote:
>
> On 2019-11-21 at 22:56, Sven Barth via fpc-devel wrote:
> > In the meantime I've managed to fix the dynamic package support that
> > had experienced a bit of bit rot in the last years. Though I've
> > currently only tested Win32 and Win64 (x86_64-linux as well as
> > *-darwin *should* work as well). And as before only compile time
> > packages are supported.
>
> (I noticed I hadn't replied to this msg).  Indeed quite sizable. I
> also don't like the subdivision in many packages.

It's to simplify our maintenance effort: if we can simply use the
packages as they are in our directory structure, we have less work to do.
And *that* trumps every other argument in my opinion. We won't ever be
Delphi compatible there anyway.

> Could it be that still simply too many symbols are exported ?

Maybe. Maybe the compiler could also try to optimize harder inside the
packages. Also, I compiled with no optimizations enabled, so that
might count as well...

Anyway: for the first iteration it's better if too many symbols are
exported than too few, because in the latter case it simply won't work...
Things can be further improved in the future.


Regards,
Sven
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] inline... and philosophy

2019-12-16 Thread Marco van de Voort


On 2019-11-21 at 22:56, Sven Barth via fpc-devel wrote:
> In the meantime I've managed to fix the dynamic package support that
> had experienced a bit of bit rot in the last years. Though I've
> currently only tested Win32 and Win64 (x86_64-linux as well as
> *-darwin *should* work as well). And as before only compile time
> packages are supported.

(I noticed I hadn't replied to this msg).  Indeed quite sizable. I also
don't like the subdivision in many packages.

Could it be that still simply too many symbols are exported ?

> For those that are interested, the sizes of the binaries for chmls are
> as follows:
>
> === output win32 begin ===
>
> 2633984 rtl.dll
> 414820 rtl.objpas.dll
> 247060 rtl.extra.dll
> 364625 rtl.generics.dll
> 389888 fcl.res.dll
> 788664 fcl.base.dll
> 962560 fcl.xml.dll
> 953676 chm.dll
> 68694 chmls.exe




Re: [fpc-devel] inline... and philosophy

2019-11-22 Thread Marģers . via fpc-devel
> Does that mean in some situations, if you have a small, tight loop, it
> might be better to optimise over speed in some very rare cases? For
> example, turning MOV EAX, $ into OR EAX, $FF to squeeze out a
> few extra bytes, even though the instruction introduces a false dependency.

A latency of 4 clock cycles is a lot. As long as the dependency can be
resolved in a shorter time, there will be some performance gain.
That performance penalty is not a fixed 20%. It depends on what code you have
before that. Long-latency instructions have time to catch up with the rest of
the code. It is possible to cancel the penalty out completely by placing the
call so that the ret falls into the next 64-byte line.
It's a place where tricky optimizations can be done.



Re: [fpc-devel] inline... and philosophy

2019-11-22 Thread J. Gareth Moreton

To optimise SIZE over speed, sorry.  Missed a word out.

On 22/11/2019 09:04, J. Gareth Moreton wrote:
Does that mean in some situations, if you have a small, tight loop, it 
might be better to optimise over speed in some very rare cases? For 
example, turning MOV EAX, $ into OR EAX, $FF to squeeze out a 
few extra bytes, even though the instruction introduces a false 
dependency.


Gareth aka. Kit

On 22/11/2019 08:29, Marģers . via fpc-devel wrote:

On 10/11/2019 at 11:17, Marģers . via fpc-devel wrote:
  Most processors have a fairly large uop cache (up to 2048 for the 
newest
generations iirc), so this would only be for the first iteration? 
Do you

have a reference (agner fog page or so) or more explanation for this
that describes this?)
I have to revoke my statement. Don't have evidence to back up. 
Code, that lead me to thous conclusions, has been discarded.
I have read most whats published in agner's fog page. There nothing 
to pinpoint as reference.

No prob. Was just interested, I had to do some sse/avx code the last
years, and hadn't heard of this.

I did some research

manual from Agner's Fog page
The microarchitecture of Intel, AMD and VIA CPUs

20.17 Cache and memory access
Level 1 code  64 kB, 4 way, 256 sets, 64 B line size, per core. 
Latency 4 clocks


As well i created some performance tests and found out that if loop 
crossed 64 B line it got 20% performance lose while measurement error 
was 2%.




Re: [fpc-devel] inline... and philosophy

2019-11-22 Thread J. Gareth Moreton
Does that mean in some situations, if you have a small, tight loop, it 
might be better to optimise over speed in some very rare cases? For 
example, turning MOV EAX, $ into OR EAX, $FF to squeeze out a 
few extra bytes, even though the instruction introduces a false dependency.


Gareth aka. Kit

On 22/11/2019 08:29, Marģers . via fpc-devel wrote:

On 10/11/2019 at 11:17, Marģers . via fpc-devel wrote:

  Most processors have a fairly large uop cache (up to 2048 for the newest

generations iirc), so this would only be for the first iteration? Do you
have a reference (agner fog page or so) or more explanation for this
that describes this?)

I have to revoke my statement. Don't have evidence to back up. Code, that lead 
me to thous conclusions, has been discarded.
I have read most whats published in agner's fog page. There nothing to pinpoint 
as reference.

No prob. Was just interested, I had to do some sse/avx code the last
years, and hadn't heard of this.

I did some research

manual from Agner's Fog page
The microarchitecture of Intel, AMD and VIA CPUs

20.17 Cache and memory access
Level 1 code  64 kB, 4 way, 256 sets, 64 B line size, per core. Latency 4 
clocks

As well i created some performance tests and found out that if loop crossed 64 
B line it got 20% performance lose while measurement error was 2%.



Re: [fpc-devel] inline... and philosophy

2019-11-22 Thread Marģers . via fpc-devel
> On 10/11/2019 at 11:17, Marģers . via fpc-devel wrote:
> > > Most processors have a fairly large uop cache (up to 2048 for the newest
> > > generations iirc), so this would only be for the first iteration? Do you
> > > have a reference (agner fog page or so) or more explanation for this
> > > that describes this?)
> > I have to revoke my statement. Don't have evidence to back up. Code, that
> > lead me to thous conclusions, has been discarded.
> > I have read most whats published in agner's fog page. There nothing to
> > pinpoint as reference.
> No prob. Was just interested, I had to do some sse/avx code the last
> years, and hadn't heard of this.

I did some research.

From the manual on Agner Fog's page,
"The microarchitecture of Intel, AMD and VIA CPUs":

20.17 Cache and memory access
Level 1 code  64 kB, 4 way, 256 sets, 64 B line size, per core. Latency 4
clocks

I also created some performance tests and found that if a loop crossed a
64 B line boundary, it lost 20% performance, while the measurement error
was 2%.
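
As a rough illustration of the boundary condition described above, the following sketch checks whether a block of code bytes straddles a 64-byte line. Only the 64 B line size comes from the manual excerpt; the helper name CrossesLine and the offsets and loop size are made up for illustration, and this is not the test code that produced the 20% figure.

```pascal
program LineCrossSketch;
{$mode objfpc}

const
  LineSize = 64; { L1 code cache line size quoted from the manual above }

{ Hypothetical helper: True if the Size bytes starting at Start
  occupy more than one LineSize-byte line. }
function CrossesLine(Start, Size: PtrUInt): Boolean;
begin
  if Size = 0 then
    Exit(False);
  Result := (Start div LineSize) <> ((Start + Size - 1) div LineSize);
end;

begin
  { A 20-byte loop at offset 40 ends at byte 59: it stays in one line. }
  WriteLn(CrossesLine(40, 20));
  { The same loop at offset 50 ends at byte 69: it spills into the next line. }
  WriteLn(CrossesLine(50, 20));
end.
```

The first call reports no crossing and the second reports a crossing, which is the situation the measurement above is about.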



Re: [fpc-devel] inline... and philosophy

2019-11-21 Thread Sven Barth via fpc-devel

On 10.11.2019 at 16:06, Michael Van Canneyt wrote:
> On Sun, 10 Nov 2019, J. Gareth Moreton wrote:
>
> > This message chain has proven to be a lot more educational and
> > insightful than I would have given it credit for.  Thanks everybody!
> >
> > I know a lot of the time, the size of binaries is just an illusion,
> > along with unfair comparisons with GCC (a behemoth with corporate
> > support) and Microsoft Visual C++ that often hides the size of
> > binaries behind a redistributable library.  I don't ever seek to make
> > binaries smaller at the expense of speed, but if I see a potential
> > saving that could be done automatically, I dive for it!
> >
> > On 10/11/2019 14:47, Marco van de Voort wrote:
> > > (and btw, if you are serious about these scenarios, drop all
> > > optimization work immediately, and start working on packages :-)
> >
> > I did try to start simple with the 'uComplex' unit, but concerns were
> > raised because I changed the formal parameters to 'const' and aligned
> > the complex type on x86-64 platforms so it can take advantage of XMM
> > registers better (which, given proper optimisation, would result in
> > both smaller code size and higher speed).  While I made sure that the
> > interfaces would not change for Pascal code, assembler code that
> > calls the functions (if it exists) might need to be changed slightly
> > (something Florian raised).  I'm not quite sure what the rules are
> > when it comes to updating packages, other than the obvious one of not
> > breaking old code.
>
> I think Marco referred to dynamically loadable packages (aka run-time
> packages)

In the meantime I've managed to fix the dynamic package support that had
experienced a bit of bit rot in the last years. Though I've currently
only tested Win32 and Win64 (x86_64-linux as well as *-darwin *should*
work as well). And as before only compile time packages are supported.


For those that are interested, the sizes of the binaries for chmls are 
as follows:


=== output win32 begin ===

2633984 rtl.dll
414820 rtl.objpas.dll
247060 rtl.extra.dll
364625 rtl.generics.dll
389888 fcl.res.dll
788664 fcl.base.dll
962560 fcl.xml.dll
953676 chm.dll
68694 chmls.exe

=== output win32 end ===

=== output win64 begin ===

3707538 rtl.dll
601446 rtl.objpas.dll
345340 rtl.extra.dll
459357 rtl.generics.dll
568559 fcl.res.dll
1187518 fcl.base.dll
1602915 fcl.xml.dll
1419896 chm.dll
85131 chmls.exe

=== output win64 end ===

For those who wonder why rtl.generics is so small: the big part is
contained in the metadata .pcp file:


- Win32: 38442358 rtl.generics.pcp
- Win64: 38607350 rtl.generics.pcp

Yes, it's massive, but only required on the development machine. :)

Regards,
Sven


Re: [fpc-devel] inline... and philosophy

2019-11-11 Thread Marco van de Voort


On 10/11/2019 at 16:02, J. Gareth Moreton wrote:
> This message chain has proven to be a lot more educational and
> insightful than I would have given it credit for.  Thanks everybody!
>
> I know a lot of the time, the size of binaries is just an illusion,
> along with unfair comparisons with GCC (a behemoth with corporate
> support) and Microsoft Visual C++ that often hides the size of
> binaries behind a redistributable library.  I don't ever seek to make
> binaries smaller at the expense of speed, but if I see a potential
> saving that could be done automatically, I dive for it!

Keep in mind that the size differences (if more than a few percent) are
usually not really related to compiler efficiency, but more due to other
reasons like framework architecture (RTTI, class registration),
redistributable libraries (MSVCRT, Qt) etc. Winapi binaries can be quite
tight on FPC too. LCL is simply a bit more high level. Not just higher
level than winapi but higher level than MFC too.

> > (and btw, if you are serious about these scenarios, drop all
> > optimization work immediately, and start working on packages :-)
>
> I did try to start simple with the 'uComplex' unit, but concerns were
> raised because I changed the formal parameters to 'const' and aligned
> the complex type on x86-64 platforms so it can take advantage of XMM
> registers better (which, given proper optimisation, would result in
> both smaller code size and higher speed).  While I made sure that the
> interfaces would not change for Pascal code, assembler code that calls
> the functions (if it exists) might need to be changed slightly
> (something Florian raised).  I'm not quite sure what the rules are
> when it comes to updating packages, other than the obvious one of not
> breaking old code.



I tested the ucomplex with my ffts testcase yesterday btw. I saw no
differences, but it turned out that:

- I work with a "single" based complex record -> the record is 8 bytes, so
it doesn't really benefit from vectorcall.

- the fft unit doesn't have procedures with value parameters that are
not inlined anyway.

- there is no vectorization whatsoever, so no adding of a complex in one
step. That probably needs either a vectorizer or intrinsics.
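
To make the first point concrete, here is a minimal sketch comparing the size of a single-based and a double-based complex record. Both record types are hypothetical stand-ins written for this illustration, not the declarations from uComplex or from the ffts testcase.

```pascal
program ComplexSizeSketch;
{$mode objfpc}

type
  { Hypothetical record types, not the declarations from uComplex. }
  TComplexSingle = record
    re, im: Single;
  end;
  TComplexDouble = record
    re, im: Double;
  end;

begin
  { Two 4-byte fields versus two 8-byte fields. }
  WriteLn('single-based: ', SizeOf(TComplexSingle), ' bytes');
  WriteLn('double-based: ', SizeOf(TComplexDouble), ' bytes');
end.
```

The single-based record fills only half of a 16-byte XMM register, which fits the observation that it gains little from a calling convention built around passing whole vectors.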


The only other thing I noticed is that the compiler seems to use only
XMM0, occasionally XMM1 and extremely rarely XMM2. It seems there are
no register variables for XMM floating point?

> I like working on optimisation because I have a morbid fascination
> with the lowest level of the CPU and I feel well-suited for it,
> although there are still some things I'm learning about it.

There is nothing wrong with that.  But it is wise not to lose track of
magnitudes.



Re: [fpc-devel] inline... and philosophy

2019-11-10 Thread Sven Barth via fpc-devel

On 10.11.2019 at 15:47, Marco van de Voort wrote:
> (and btw, if you are serious about these scenarios, drop all
> optimization work immediately, and start working on packages :-)

I don't know if that would help much, because especially on Windows every
application would probably provide its own set of binaries and the RTL
package alone has a size of ~3.7 MB on Win64.


Regards,
Sven


Re: [fpc-devel] inline... and philosophy

2019-11-10 Thread Florian Klämpfl

On 10.11.19 at 18:01, J. Gareth Moreton wrote:
> Fair enough - thank you.
>
> This is a bit of a micro-optimisation for the compiler in regards to
> what I've just done, but I've noticed that, a couple of times, commands
> to the effect of the following appear:
>
> tasmlabel(symbol).decrefs;
> if tasmlabel(symbol).getrefs = 0 then
> ...
>
> That is... dereference a label, and then do something if it turns into a
> dead label (usually remove it).  Would it be permissible to change the
> decrefs method so it returns a Boolean expression, namely True if the
> reference falls to zero and False otherwise? Given the function already
> checks to see if the reference count is less than zero (to raise an
> internal error), it should have negligible performance loss, and if
> anything, the new jump optimisations will help a lot.
>
> In the meantime, I've just taken a quick look at my code, and noticed
> something possibly a little risky (compiler/x86/aoptx86.pas, line 3540):
>
> { Remove label xxx (it will have a ref of zero due to the initial check }
> StripLabelFast(hp4);
>
> { Now we can safely decrement it }
> tasmlabel(symbol).decrefs;
>
> There's nothing actually buggy with the code because it's known that the
> label has a reference count of 1 at this point, but "StripLabelFast"
> removes the label while it's still live, something that I even said in
> the procedure's comments that you shouldn't do!  i.e. I broke my own
> rule!  To be safe, these two commands should probably be switched

Committed.


Re: [fpc-devel] inline... and philosophy

2019-11-10 Thread J. Gareth Moreton

Fair enough - thank you.

This is a bit of a micro-optimisation for the compiler in regards to 
what I've just done, but I've noticed that, a couple of times, commands 
to the effect of the following appear:


tasmlabel(symbol).decrefs;
if tasmlabel(symbol).getrefs = 0 then
...

That is... decrement a label's reference count, and then do something if
it turns into a dead label (usually remove it).  Would it be permissible
to change the decrefs method so it returns a Boolean, namely True if the
reference count falls to zero and False otherwise? Given that the method
already checks whether the reference count is less than zero (to raise an
internal error), the performance loss should be negligible, and if
anything, the new jump optimisations will help a lot.
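
The proposed change can be sketched roughly as follows. This is a standalone illustration, not the compiler's code: TSketchLabel is a hypothetical stand-in for the real label class, and only the method names decrefs and getrefs are taken from the snippet above. The error handling is simplified to a message and Halt.

```pascal
program DecRefsSketch;
{$mode objfpc}

type
  { Hypothetical stand-in for the compiler's label class. }
  TSketchLabel = class
  private
    FRefs: LongInt;
  public
    procedure IncRefs;
    function DecRefs: Boolean;
    function GetRefs: LongInt;
  end;

procedure TSketchLabel.IncRefs;
begin
  Inc(FRefs);
end;

{ Decrements the count and reports whether the label just became dead. }
function TSketchLabel.DecRefs: Boolean;
begin
  Dec(FRefs);
  if FRefs < 0 then
    begin
      { Simplified placeholder for the internal error mentioned above. }
      WriteLn('internal error: negative reference count');
      Halt(1);
    end;
  Result := FRefs = 0;
end;

function TSketchLabel.GetRefs: LongInt;
begin
  Result := FRefs;
end;

var
  L: TSketchLabel;
begin
  L := TSketchLabel.Create;
  try
    L.IncRefs;
    L.IncRefs;
    { One call replaces the decrefs / getrefs = 0 pair. }
    if not L.DecRefs then
      WriteLn('still referenced: ', L.GetRefs);
    if L.DecRefs then
      WriteLn('label is dead and can be removed');
  finally
    L.Free;
  end;
end.
```

Callers that do not care about the result can still call DecRefs as a plain statement, so existing call sites would not need to change in this sketch.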


In the meantime, I've just taken a quick look at my code, and noticed 
something possibly a little risky (compiler/x86/aoptx86.pas, line 3540):


{ Remove label xxx (it will have a ref of zero due to the initial check }
StripLabelFast(hp4);

{ Now we can safely decrement it }
tasmlabel(symbol).decrefs;

There's nothing actually buggy with the code because it's known that the 
label has a reference count of 1 at this point, but "StripLabelFast" 
removes the label while it's still live, something that I even said in 
the procedure's comments that you shouldn't do!  i.e. I broke my own 
rule!  To be safe, these two commands should probably be switched 
around, so the label is actually dead when StripLabelFast is called.  It 
won't affect the output in any way, but will hopefully reduce the risk 
of alarming another programmer who stumbles upon it.


Gareth aka. Kit


On 10/11/2019 16:45, Florian Klämpfl wrote:

On 10.11.19 at 17:42, J. Gareth Moreton wrote:


Some of the "condition_in" functions need expanding though, and I 
don't yet have an answer if it's okay to do operator overloading in 
the compiler source (so I can do things like "if (jmp1.cond in 
jmp2.cond) then", for example, instead of the more ambiguous "if 
condition_in(jmp1.cond, jmp2.cond) then".


I wouldn't do, for somebody without experience with the code, this is 
confusing. Operator overloading makes imo only sense if it effects a 
lot of code and makes it more readable because it replaces a lot of 
nested function calls.



Re: [fpc-devel] inline... and philosophy

2019-11-10 Thread Florian Klämpfl

On 10.11.19 at 17:42, J. Gareth Moreton wrote:
> Some of the "condition_in" functions need expanding though, and I don't
> yet have an answer if it's okay to do operator overloading in the
> compiler source (so I can do things like "if (jmp1.cond in jmp2.cond)
> then", for example, instead of the more ambiguous "if
> condition_in(jmp1.cond, jmp2.cond) then".

I wouldn't do it: for somebody without experience with the code, this is
confusing. Operator overloading imo only makes sense if it affects a lot
of code and makes it more readable because it replaces a lot of nested
function calls.



Re: [fpc-devel] inline... and philosophy

2019-11-10 Thread J. Gareth Moreton
That seems fair, yep.  Currently, vectorcall and the more intricate
parts of the System V ABI are only really beneficial when interfacing
with third-party libraries or when programming in assembly language.


Sorry if I've given you a headache with my stubbornness and passion.  
I'll try to behave myself.  Thanks for accepting the jump optimisations, 
by the way.  I hope they perform well.


Some of the "condition_in" functions need expanding though, and I don't
yet have an answer if it's okay to do operator overloading in the
compiler source (so I can do things like "if (jmp1.cond in jmp2.cond)
then", for example, instead of the more ambiguous "if
condition_in(jmp1.cond, jmp2.cond) then").
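
For readers unfamiliar with the feature, the two spellings could look roughly like this in Free Pascal. Everything here is a hypothetical sketch: TCondCode, TCond and the body of condition_in are invented for illustration and do not match the compiler's real condition type or logic. The condition is wrapped in a record here as an assumption, so that the overload is defined on a user type.

```pascal
program CondInSketch;
{$mode objfpc}

type
  { Hypothetical condition codes; the compiler's real type differs. }
  TCondCode = (ccEqual, ccLess, ccLessOrEqual);

  { Wrapped in a record so the overload applies to a user-defined type. }
  TCond = record
    Code: TCondCode;
  end;

{ Hypothetical subset test: True if every case that satisfies A
  also satisfies B. }
function condition_in(const A, B: TCond): Boolean;
begin
  Result := (A.Code = B.Code) or
            ((B.Code = ccLessOrEqual) and (A.Code in [ccEqual, ccLess]));
end;

{ The overloaded operator simply forwards to the named function. }
operator in (const A, B: TCond) res: Boolean;
begin
  res := condition_in(A, B);
end;

var
  c1, c2: TCond;
begin
  c1.Code := ccLess;
  c2.Code := ccLessOrEqual;
  { Both spellings perform the same test. }
  WriteLn(condition_in(c1, c2));
  WriteLn(c1 in c2);
end.
```

The operator form reads naturally, but a reader has to know that "in" has been overloaded for this type, whereas the function call states its origin explicitly.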


Gareth aka. Kit

On 10/11/2019 15:33, Florian Klämpfl wrote:

On 10.11.19 at 16:02, J. Gareth Moreton wrote:
This message chain has proven to be a lot more educational and 
insightful than I would have given it credit for.  Thanks everybody!


I know a lot of the time, the size of binaries is just an illusion, 
along with unfair comparisons with GCC (a behemoth with corporate 
support) and Microsoft Visual C++ that often hides the size of 
binaries behind a redistributable library.  I don't ever seek to make 
binaries smaller at the expense of speed, but if I see a potential 
saving that could be done automatically, I dive for it!


On 10/11/2019 14:47, Marco van de Voort wrote:
(and btw, if you are serious about these scenarios, drop all 
optimization work immediately, and start working on packages :-)


I did try to start simple with the 'uComplex' unit, but concerns were 
raised because I changed the formal parameters to 'const' and aligned 
the complex type on x86-64 platforms so it can take advantage of XMM 
registers better (which, given proper optimisation, would result in 
both smaller code size and higher speed).  While I made sure that the 
interfaces would not change for Pascal code, assembler code that 
calls the functions (if it exists) might need to be changed slightly 
(something Florian raised).  I'm not quite sure what the rules are 
when it comes fo updating packages, other than the obvious one of not 
breaking old code.


Currently, there is no real gain by changing the calling conventions. 
When we have a vectorizer, we can talk about it.




I like working on optimisation because I have a morbid fascination 
with the lowest level of the CPU and I feel well-suited for it, 
although there are still some things I'm learning about it.


Gareth aka. Kit




Re: [fpc-devel] inline... and philosophy

2019-11-10 Thread Florian Klämpfl

On 10.11.19 at 16:02, J. Gareth Moreton wrote:
> This message chain has proven to be a lot more educational and
> insightful than I would have given it credit for.  Thanks everybody!
>
> I know a lot of the time, the size of binaries is just an illusion,
> along with unfair comparisons with GCC (a behemoth with corporate
> support) and Microsoft Visual C++ that often hides the size of binaries
> behind a redistributable library.  I don't ever seek to make binaries
> smaller at the expense of speed, but if I see a potential saving that
> could be done automatically, I dive for it!
>
> On 10/11/2019 14:47, Marco van de Voort wrote:
> > (and btw, if you are serious about these scenarios, drop all
> > optimization work immediately, and start working on packages :-)
>
> I did try to start simple with the 'uComplex' unit, but concerns were
> raised because I changed the formal parameters to 'const' and aligned
> the complex type on x86-64 platforms so it can take advantage of XMM
> registers better (which, given proper optimisation, would result in both
> smaller code size and higher speed).  While I made sure that the
> interfaces would not change for Pascal code, assembler code that calls
> the functions (if it exists) might need to be changed slightly
> (something Florian raised).  I'm not quite sure what the rules are when
> it comes to updating packages, other than the obvious one of not
> breaking old code.

Currently, there is no real gain by changing the calling conventions.
When we have a vectorizer, we can talk about it.

> I like working on optimisation because I have a morbid fascination with
> the lowest level of the CPU and I feel well-suited for it, although
> there are still some things I'm learning about it.
>
> Gareth aka. Kit




Re: [fpc-devel] inline... and philosophy

2019-11-10 Thread Michael Van Canneyt



On Sun, 10 Nov 2019, J. Gareth Moreton wrote:

> This message chain has proven to be a lot more educational and
> insightful than I would have given it credit for.  Thanks everybody!
>
> I know a lot of the time, the size of binaries is just an illusion,
> along with unfair comparisons with GCC (a behemoth with corporate
> support) and Microsoft Visual C++ that often hides the size of binaries
> behind a redistributable library.  I don't ever seek to make binaries
> smaller at the expense of speed, but if I see a potential saving that
> could be done automatically, I dive for it!
>
> On 10/11/2019 14:47, Marco van de Voort wrote:
> > (and btw, if you are serious about these scenarios, drop all
> > optimization work immediately, and start working on packages :-)
>
> I did try to start simple with the 'uComplex' unit, but concerns were
> raised because I changed the formal parameters to 'const' and aligned
> the complex type on x86-64 platforms so it can take advantage of XMM
> registers better (which, given proper optimisation, would result in both
> smaller code size and higher speed).  While I made sure that the
> interfaces would not change for Pascal code, assembler code that calls
> the functions (if it exists) might need to be changed slightly
> (something Florian raised).  I'm not quite sure what the rules are when
> it comes to updating packages, other than the obvious one of not
> breaking old code.

I think Marco referred to dynamically loadable packages (aka run-time
packages)

Michael.


Re: [fpc-devel] inline... and philosophy

2019-11-10 Thread J. Gareth Moreton
This message chain has proven to be a lot more educational and 
insightful than I would have given it credit for.  Thanks everybody!


I know a lot of the time, the size of binaries is just an illusion, 
along with unfair comparisons with GCC (a behemoth with corporate 
support) and Microsoft Visual C++ that often hides the size of binaries 
behind a redistributable library.  I don't ever seek to make binaries 
smaller at the expense of speed, but if I see a potential saving that 
could be done automatically, I dive for it!


On 10/11/2019 14:47, Marco van de Voort wrote:
> (and btw, if you are serious about these scenarios, drop all
> optimization work immediately, and start working on packages :-)

I did try to start simple with the 'uComplex' unit, but concerns were
raised because I changed the formal parameters to 'const' and aligned
the complex type on x86-64 platforms so it can take advantage of XMM
registers better (which, given proper optimisation, would result in both
smaller code size and higher speed).  While I made sure that the
interfaces would not change for Pascal code, assembler code that calls
the functions (if it exists) might need to be changed slightly
(something Florian raised).  I'm not quite sure what the rules are when
it comes to updating packages, other than the obvious one of not
breaking old code.


I like working on optimisation because I have a morbid fascination with 
the lowest level of the CPU and I feel well-suited for it, although 
there are still some things I'm learning about it.


Gareth aka. Kit




Re: [fpc-devel] inline... and philosophy

2019-11-10 Thread Marco van de Voort


On 09/11/2019 at 15:51, J. Gareth Moreton wrote:
> Competitions aside, there are times where space is a premium, whether
> it be from distributing an application on a DVD, bandwidth or data
> limits (even some first world countries are still on dial-up in
> places, or are otherwise monopolised by a single, bad-quality
> provider), the smaller capacity of solid-state hard drives (especially
> on some laptops) and can otherwise be a money saver sometimes.

Maybe. But what the FAQ warns against is using these kinds of
scenarios to retroactively justify old DOS-era sentiments.  Even small
SSDs are huge compared to FPC binaries, and the possible gains are
really not that high. Constrained pipes usually already employ
compression, and a few percent really doesn't save that much anyway.

It is not a bad thing to dive into binary sizes, but keep it to the
point, and try to quantify savings in larger programs (e.g. the compiler
or lazarus). Get a feel for what changes are worth it and what not, but
be warned, there is much less to gain than people think.


(and btw, if you are serious about these scenarios, drop all 
optimization work immediately, and start working on packages :-)






Re: [fpc-devel] inline... and philosophy

2019-11-10 Thread Marco van de Voort


On 10/11/2019 at 11:17, Marģers . via fpc-devel wrote:
> > Most processors have a fairly large uop cache (up to 2048 for the newest
> > generations iirc), so this would only be for the first iteration? Do you
> > have a reference (agner fog page or so) or more explanation for this
> > that describes this?)
>
> I have to revoke my statement. Don't have evidence to back up. Code, that lead
> me to thous conclusions, has been discarded.
> I have read most whats published in agner's fog page. There nothing to
> pinpoint as reference.

No prob. Was just interested, I had to do some sse/avx code the last
years, and hadn't heard of this.



Re: [fpc-devel] inline... and philosophy

2019-11-10 Thread Mattias Gaertner via fpc-devel
On Sun, 10 Nov 2019 02:23:03 +
"J. Gareth Moreton"  wrote:

> Does the smart linker strip out LCL components that are not used, or 
> must everything that's registered in a package or unit be included? 

If by "registered" you mean the RegisterClass or RegisterComponents
functions:
If it is registered, the compiler must include it.
Usually the LCL components are only registered by design-time code.
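
A small sketch of why registration keeps a class alive: once a class is registered, it can be looked up by a string at run time, so the linker has to assume it may be needed. The component type TSketchComponent is hypothetical; RegisterClass and GetClass are the routines from the Classes unit.

```pascal
program RegisterSketch;
{$mode objfpc}{$H+}

uses
  Classes;

type
  { Hypothetical component, referenced only by the RegisterClass call. }
  TSketchComponent = class(TComponent)
  end;

var
  Found: TPersistentClass;
begin
  RegisterClass(TSketchComponent);
  { Lookup by a run-time string: the linker cannot know in advance
    which registered class a name like this will resolve to. }
  Found := GetClass('TSketchComponent');
  if Found <> nil then
    WriteLn('found ', Found.ClassName)
  else
    WriteLn('not registered');
end.
```

In a real application the name would typically come from streamed form data rather than a literal, which is what makes the reference invisible to smart linking.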


> Granted, since forms are being read from a resource file, I doubt it
> can really be tied into the compiler that closely.


Mattias


> 
> Gareth aka. Kit
> 
> On 09/11/2019 15:50, Sven Barth via fpc-devel wrote:
> > J. Gareth Moreton wrote on Sat, 9 Nov 2019, 16:20:
> >
> >
> > On 09/11/2019 15:14, Michael Van Canneyt wrote:  
> > >
> > >
> > > On Sat, 9 Nov 2019, J. Gareth Moreton wrote:
> > >  
> > >> Competitions aside, there are times where space is a
> > >> premium,  
> > whether  
> > >> it be from distributing an application on a DVD, bandwidth
> > >> or data limits (even some first world countries are still on
> > >> dial-up in places, or are otherwise monopolised by a single,
> > >> bad-quality provider), the smaller capacity of solid-state
> > >> hard drives (especially on some laptops) and can otherwise
> > >> be a money saver sometimes.  
> > >
> > > I tend to think more size gains can be obtained from more  
> > aggressive  
> > > smartlinking.
> > > The smartlinking is sometimes disabled by the way code is
> > > written.
> > >
> > > To give an example, pas2js has a switch to convert published
> > > to  
> > public  
> > > sections. As a result, the published sections are suddenly  
> > reduced to  
> > > what is actually used in code. This produces significant size
> > > gains.
> > >
> > > Michael.
> > > ___
> > > fpc-devel maillist  - fpc-devel@lists.freepascal.org  
> >   
> > > https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
> > >  
> > That's true.  That's mentioned in the "size matters" article. I
> > didn't
> > know about 'published' until then.  Presumably, if that switch
> > doesn't
> > exist (like with most of the LCL), I gather the only way to
> > strip out those unused published sections is some very intelligent
> > whole-program
> > optimisation, and even then it may not work if a string (to
> > access a property name) is not deterministic.
> >
> >
> > For the LCL it's simply not possible, because it relies heavily on
> > the RTTI. And in the future that will only increase with extended
> > RTTI.
> >
> > Regards,
> > Sven
> >
> >
> > ___
> > fpc-devel maillist  -  fpc-devel@lists.freepascal.org
> > https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel  



Re: [fpc-devel] inline... and philosophy

2019-11-10 Thread Marģers . via fpc-devel
> Op 2019-11-09 om 02:24 schreef Marģers . via fpc-devel:
> >
> > 3) it changes code location (code cross page boundaries). For my particular 
> > cpu there are 64 byte code page. If loop can fit in it, speed is twice as 
> > it overlaps even one byte over page boundary. Jumping forward is ok (as 
> > expected code flow is always forward). And there is lager page few kb - 
> > calling outside - small penalty.

> Most processors have a fairly large uop cache (up to 2048 for the newest
> generations iirc), so this would only be for the first iteration? Do you
> have a reference (agner fog page or so) or more explanation for this
> that describes this?)

I have to retract my statement. I don't have evidence to back it up. The code 
that led me to those conclusions has been discarded. 
I have read most of what's published on Agner Fog's page. There is nothing there to 
pinpoint as a reference.



Re: [fpc-devel] inline... and philosophy

2019-11-10 Thread Sven Barth via fpc-devel
J. Gareth Moreton wrote on Sun, 10 Nov 2019, 03:23:

> Does the smart linker strip out LCL components that are not used, or must
> everything that's registered in a package or unit be included? Granted,
> since forms are being read from a resource file, I doubt it can really be
> tied into the compiler that closely
>

As long as one doesn't include the function that does the registration,
components that are not used should be smart linked away. Even if
you're reading from a resource file, there is still the use of the component
inside the form's declaration, and thus the compiler has its usage reference.
The only problematic part might be the widgetset backend itself, where
things might not be able to be smart linked away that nicely.

Regards,
Sven

>


Re: [fpc-devel] inline... and philosophy

2019-11-09 Thread J. Gareth Moreton
Does the smart linker strip out LCL components that are not used, or 
must everything that's registered in a package or unit be included? 
Granted, since forms are being read from a resource file, I doubt it can 
really be tied into the compiler that closely.


Gareth aka. Kit

On 09/11/2019 15:50, Sven Barth via fpc-devel wrote:
J. Gareth Moreton wrote on Sat, 9 Nov 2019, 16:20:



On 09/11/2019 15:14, Michael Van Canneyt wrote:
>
>
> On Sat, 9 Nov 2019, J. Gareth Moreton wrote:
>
>> Competitions aside, there are times where space is a premium,
whether
>> it be from distributing an application on a DVD, bandwidth or data
>> limits (even some first world countries are still on dial-up in
>> places, or are otherwise monopolised by a single, bad-quality
>> provider), the smaller capacity of solid-state hard drives
>> (especially on some laptops) and can otherwise be a money saver
>> sometimes.
>
> I tend to think more size gains can be obtained from more
aggressive
> smartlinking.
> The smartlinking is sometimes disabled by the way code is written.
>
> To give an example, pas2js has a switch to convert published to
public
> sections. As a result, the published sections are suddenly
reduced to
> what is actually used in code. This produces significant size gains.
>
> Michael.
> ___
> fpc-devel maillist  - fpc-devel@lists.freepascal.org

> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
>
That's true.  That's mentioned in the "size matters" article. I
didn't
know about 'published' until then.  Presumably, if that switch
doesn't
exist (like with most of the LCL), I gather the only way to strip out
those unused published sections is some very intelligent
whole-program
optimisation, and even then it may not work if a string (to access a
property name) is not deterministic.


For the LCL it's simply not possible, because it relies heavily on the 
RTTI. And in the future that will only increase with extended RTTI.


Regards,
Sven




Re: [fpc-devel] inline... and philosophy

2019-11-09 Thread Sven Barth via fpc-devel
J. Gareth Moreton wrote on Sat, 9 Nov 2019, 16:20:

>
> On 09/11/2019 15:14, Michael Van Canneyt wrote:
> >
> >
> > On Sat, 9 Nov 2019, J. Gareth Moreton wrote:
> >
> >> Competitions aside, there are times where space is a premium, whether
> >> it be from distributing an application on a DVD, bandwidth or data
> >> limits (even some first world countries are still on dial-up in
> >> places, or are otherwise monopolised by a single, bad-quality
> >> provider), the smaller capacity of solid-state hard drives
> >> (especially on some laptops) and can otherwise be a money saver
> >> sometimes.
> >
> > I tend to think more size gains can be obtained from more aggressive
> > smartlinking.
> > The smartlinking is sometimes disabled by the way code is written.
> >
> > To give an example, pas2js has a switch to convert published to public
> > sections. As a result, the published sections are suddenly reduced to
> > what is actually used in code. This produces significant size gains.
> >
> > Michael.
> > ___
> > fpc-devel maillist  -  fpc-devel@lists.freepascal.org
> > https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
> >
> That's true.  That's mentioned in the "size matters" article. I didn't
> know about 'published' until then.  Presumably, if that switch doesn't
> exist (like with most of the LCL), I gather the only way to strip out
> those unused published sections is some very intelligent whole-program
> optimisation, and even then it may not work if a string (to access a
> property name) is not deterministic.
>

For the LCL it's simply not possible, because it relies heavily on the
RTTI. And in the future that will only increase with extended RTTI.

Regards,
Sven

>


Re: [fpc-devel] inline... and philosophy

2019-11-09 Thread Michael Van Canneyt



On Sat, 9 Nov 2019, J. Gareth Moreton wrote:

I tend to think more size gains can be obtained from more aggressive 
smartlinking.

The smartlinking is sometimes disabled by the way code is written.

To give an example, pas2js has a switch to convert published to public 
sections. As a result, the published sections are suddenly reduced to 
what is actually used in code. This produces significant size gains.


Michael.

That's true.  That's mentioned in the "size matters" article. I didn't 
know about 'published' until then.  Presumably, if that switch doesn't 
exist (like with most of the LCL), I gather the only way to strip out 
those unused published sections is some very intelligent whole-program 
optimisation, and even then it may not work if a string (to access a 
property name) is not deterministic.


WPO will not cut it, since the properties are usually loaded from a stream
which can be an external file or a resource. You simply do not know and have
no way to know. That is why all published variables are always kept.
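
The point can be illustrated with a loose Python analogy. This is not FPC's streaming code; the class, its properties and the "name = value" stream format are made up for the example. The property names only exist as strings in the stream, so nothing in the program text says which ones will be touched:

```python
import io

class Button:
    """Stand-in for a component with 'published' properties (hypothetical)."""
    def __init__(self):
        self.caption = ""
        self.width = 0
        self.hint = ""

def load_properties(component, stream):
    """Set properties by name, as read from a stream of 'name = value' lines."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        name, _, value = line.partition("=")
        name, value = name.strip(), value.strip()
        current = getattr(component, name)               # looked up by string at run time
        setattr(component, name, type(current)(value))   # keep the property's type
    return component

# The stream could just as well be an external file that is edited after the build.
form_data = io.StringIO("caption = OK\nwidth = 75\n")
button = load_properties(Button(), form_data)
print(button.caption, button.width)
```

Here "hint" is never named in the data, but a different stream could name it tomorrow, which is why a build-time analysis cannot safely drop it.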

Michael.


Re: [fpc-devel] inline... and philosophy

2019-11-09 Thread J. Gareth Moreton


On 09/11/2019 15:14, Michael Van Canneyt wrote:



On Sat, 9 Nov 2019, J. Gareth Moreton wrote:

Competitions aside, there are times where space is a premium, whether 
it be from distributing an application on a DVD, bandwidth or data 
limits (even some first world countries are still on dial-up in 
places, or are otherwise monopolised by a single, bad-quality 
provider), the smaller capacity of solid-state hard drives 
(especially on some laptops) and can otherwise be a money saver 
sometimes.


I tend to think more size gains can be obtained from more aggressive 
smartlinking.

The smartlinking is sometimes disabled by the way code is written.

To give an example, pas2js has a switch to convert published to public 
sections. As a result, the published sections are suddenly reduced to 
what is actually used in code. This produces significant size gains.


Michael.

That's true.  That's mentioned in the "size matters" article. I didn't 
know about 'published' until then.  Presumably, if that switch doesn't 
exist (like with most of the LCL), I gather the only way to strip out 
those unused published sections is some very intelligent whole-program 
optimisation, and even then it may not work if a string (to access a 
property name) is not deterministic.


Gareth aka. Kit



Re: [fpc-devel] inline... and philosophy

2019-11-09 Thread J. Gareth Moreton

On 09/11/2019 13:46, Michael Van Canneyt wrote:

It's never enough:
http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html

Michael.


That level of byte-counting is just plain insanity, even for me! Still, 
when it comes to byte-counting, I did do something similar in my first 
ever patch for FPC, which was to look for things like "MOV RAX, 
40" and replace them with "MOV EAX, 40" to shave off 5 
bytes each time (4 bytes if the register is R8-R15 due to the REX 
prefix).  It actually cut the FPC executable down by about 50 kilobytes 
overall.  This optimisation works because if you set a 32-bit register 
to a particular value, the upper 32 bits are guaranteed to be set to 
zero (obviously this only works with values that are less than 2^32).
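
The byte counts can be checked with a small sketch. It assumes the original instruction was emitted in the 10-byte "REX.W B8+rd imm64" form; an assembler that picks the 7-byte sign-extended imm32 form would save less. The helper names are made up for illustration:

```python
import struct

def mov_r64_imm64(reg: int, value: int) -> bytes:
    """MOV r64, imm64: REX.W prefix (plus REX.B for R8-R15), opcode B8+rd, 8-byte immediate."""
    rex = 0x48 | (0x01 if reg >= 8 else 0x00)
    return bytes([rex, 0xB8 + (reg & 7)]) + struct.pack("<Q", value)

def mov_r32_imm32(reg: int, value: int) -> bytes:
    """MOV r32, imm32: opcode B8+rd, 4-byte immediate; a REX.B prefix only for R8D-R15D."""
    prefix = bytes([0x41]) if reg >= 8 else b""
    return prefix + bytes([0xB8 + (reg & 7)]) + struct.pack("<I", value)

RAX, R8 = 0, 8  # hardware register numbers
for name, reg in (("RAX", RAX), ("R8", R8)):
    wide = mov_r64_imm64(reg, 40)
    narrow = mov_r32_imm32(reg, 40)
    print(name, len(wide), "->", len(narrow), "bytes, saved", len(wide) - len(narrow))
```

The 32-bit form is only a valid replacement when the value fits in 32 bits unsigned, because the write to the 32-bit register zeroes the upper half.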


In regards to small executables though, I remember back in the mid-90s, 
I came across a file on a floppy disk named "RESTART.COM".  Now, COM 
files (don't confuse with the Component Object Model) are super-simple 
executables, pretty much containing only machine code with no sections 
or headers beyond a signature or two.  It was designed to be run from a 
batch file and its function was to restart the computer (I think it 
called a particular interrupt).  The file was something like 28 bytes in 
size, if that.


One thing to note though is that going THAT small with an executable is 
wasted effort unless you're just trying to show off.  Why?  The cluster 
size on a 1.44 MB floppy disk is 512 bytes, so even if RESTART.COM was 
only 28 bytes in size, it took 512 bytes of disk space.  Similar issues 
occur with hard drive partitions, although the cluster size I believe is 
4 KB in most cases.
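
The arithmetic behind that, as a minimal sketch (the function name is made up; it ignores file-system details such as very small files being stored inside the directory or MFT entry):

```python
def size_on_disk(file_size: int, cluster_size: int) -> int:
    """Round a file size up to a whole number of clusters."""
    if file_size <= 0:
        return 0
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size

print(size_on_disk(28, 512))    # floppy with 512-byte clusters
print(size_on_disk(28, 4096))   # partition with 4 KB clusters
```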


Gareth aka. Kit



Re: [fpc-devel] inline... and philosophy

2019-11-09 Thread Michael Van Canneyt



On Sat, 9 Nov 2019, J. Gareth Moreton wrote:

Competitions aside, there are times where space is a premium, whether it be 
from distributing an application on a DVD, bandwidth or data limits (even 
some first world countries are still on dial-up in places, or are otherwise 
monopolised by a single, bad-quality provider), the smaller capacity of 
solid-state hard drives (especially on some laptops) and can otherwise be a 
money saver sometimes.


I tend to think more size gains can be obtained from more aggressive 
smartlinking.
The smartlinking is sometimes disabled by the way code is written.

To give an example, pas2js has a switch to convert published to public sections. 
As a result, the published sections are suddenly reduced to what is actually used in code. 
This produces significant size gains.


Michael.


Re: [fpc-devel] inline... and philosophy

2019-11-09 Thread J. Gareth Moreton
Competitions aside, there are times when space is at a premium, whether 
because of distributing an application on a DVD, bandwidth or data limits 
(even some first world countries are still on dial-up in places, or are 
otherwise monopolised by a single, bad-quality provider), or the smaller 
capacity of solid-state drives (especially on some laptops), and a smaller 
binary can otherwise be a money saver sometimes.


Now, I don't condone EXE compression except maybe for setup programs, 
and I don't condone making a binary smaller at the expense of 
performance (hence why you should never use the "LOOP" opcode).


It is mentioned on the Wiki article that those who program on embedded 
systems use their own customised libraries for smaller sizes and higher 
speed.  Nothing wrong with that, but I don't think RTL patches should be 
ignored straight-up.  I hoped to update uComplex to be friendlier to 
newer processors (one problem that Delphi had) by aligning the type and 
utilising vectorcall on win64 (aligning the type is enough on Unix-like 
systems for the compiler to take full advantage of the vector 
registers).  This can make a significant improvement in performance AND 
code size once the compiler gets smarter with vectorisation.  Given it's 
a very old unit, I've done my best to ensure backwards compatibility is 
maintained in Pascal-only code, although I think there will always be 
breaks in edge cases.


Maybe it is unrealistic to compare FPC to GCC and some commercial 
compilers, but I personally like to aim high anyway.


In truth, I think the FPC test suite could use some more benchmark tests 
to a) see if a proposed performance buff actually improves compiler 
speed (e.g. what I did for improving case blocks), and b) to see if a 
size reduction optimisation doesn't impact speed too badly.  Ideas that 
spring to mind... integer and floating-point division operations (recent 
versions of the compiler convert them to multiplications when possible), 
and maybe a use-case for uComplex.  Granted, I could make my own 
highly-optimised mathematical unit, but how might one recommend people 
use it over what's supplied with FPC?
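
For reference, the division-to-multiplication idea can be sketched like this. It is the textbook "multiply by a rounded-up reciprocal, then shift" construction for unsigned division by a constant, not a copy of what FPC actually emits; the function names are made up:

```python
def magic_for_unsigned_div(d: int, bits: int = 32):
    """Return (multiplier, shift) with x // d == (x * multiplier) >> shift
    for every 0 <= x < 2**bits."""
    if d < 1:
        raise ValueError("divisor must be positive")
    shift = bits + (d - 1).bit_length()   # bits + ceil(log2(d))
    multiplier = -(-(1 << shift) // d)    # ceil(2**shift / d)
    return multiplier, shift

def div_by_mul(x: int, d: int, bits: int = 32) -> int:
    """Divide using only a multiplication and a right shift."""
    multiplier, shift = magic_for_unsigned_div(d, bits)
    return (x * multiplier) >> shift

for x in (0, 9, 10, 99, 4294967295):
    assert div_by_mul(x, 10) == x // 10
print(magic_for_unsigned_div(10))
```

A benchmark in the test suite would then compare this kind of sequence against a plain DIV for both speed and code size.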


Gareth aka. Kit


On 09/11/2019 13:46, Michael Van Canneyt wrote:



On Sat, 9 Nov 2019, Marco van de Voort wrote:

Seeking to reduce binary size (without sacrificing speed) and make 
as many optimisations as possible may be a fool's errand because the 
returns don't justify the costs, but I personally enjoy the 
challenge and puzzle-solving element of it.  I just hope I haven't 
scared off the administrators when I argued with Florian on my jump 
optimisations (the aforementioned inline/blobing issue).


Start with asking why you feel the need to do it, and how much would 
be enough.


It's never enough:
http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html

Michael.



Re: [fpc-devel] inline... and philosophy

2019-11-09 Thread Michael Van Canneyt



On Sat, 9 Nov 2019, Marco van de Voort wrote:

Seeking to reduce binary size (without sacrificing speed) and make as 
many optimisations as possible may be a fool's errand because the 
returns don't justify the costs, but I personally enjoy the challenge 
and puzzle-solving element of it.  I just hope I haven't scared off 
the administrators when I argued with Florian on my jump optimisations 
(the aforementioned inline/blobing issue).


Start with asking why you feel the need to do it, and how much would be 
enough.


It's never enough:
http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html

Michael.


Re: [fpc-devel] inline... and philosophy

2019-11-09 Thread Marco van de Voort


On 2019-11-09 at 02:24, Marģers . via fpc-devel wrote:

3) it changes code location (code crosses page boundaries). On my particular CPU 
there is a 64-byte code page. If a loop fits within it, it runs twice as fast as when it 
overlaps the page boundary by even one byte. Jumping forward is OK (as the expected 
code flow is always forward). And there is a larger page of a few KB; calling outside it 
carries a small penalty.


Most processors have a fairly large uop cache (up to 2048 for the newest 
generations iirc), so this would only be for the first iteration? Do you 
have a reference (Agner Fog's pages or so) or a more detailed explanation 
of this?





Re: [fpc-devel] inline... and philosophy

2019-11-09 Thread Marco van de Voort


On 2019-11-08 at 23:37, J. Gareth Moreton wrote:
It is a good point.  With my C++ programs, I tend to compile with 
everything statically linked and self-contained, since it tends to be 
smaller than a dynamically-linked program plus the redistributable 
combined (and the risk of "DLL Hell" means you can't just install the 
redistributable in the System directory). Granted, it's useful if you 
are shipping a lot of small utilities together.  Admittedly, FPC might 
benefit from the dynamically-linked redistributable.  A workplace that 
I was once doing a contract for turned down my request to have FPC 
installed because there were too many EXE files to add to their 
exceptions list (the company dealt with financial data, so they were 
extra-paranoid about what gets installed on their workstations).


Did you, by the way, read the "Size Matters" wiki entry? It has some hints 
about common pitfalls (like measuring binaries that are too small, or comparing 
pure WinAPI binaries with ones that have a framework behind them).


https://wiki.freepascal.org/Size_Matters

Does smart linking strip out elements of the RTL that aren't used? 
Granted, I'm trying to think of an example off the top of my head - I 
was going to say "WriteLn", but I can see the internal functions using 
them for stack traces and exception handling.


Yes. It simply starts with the entrypoint(s) and starts marking 
reachable symbols. I tried to summarize this a bit in this post:


https://stackoverflow.com/questions/4519726/delphi-which-are-the-downsides-of-having-unused-units-listed-in-the-uses-clause/4519894#4519894
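
As a rough sketch of that marking pass (not the actual linker code; the symbol names and the reference table are invented for the example):

```python
from collections import deque

def reachable_symbols(references, entry_points):
    """Mark every symbol reachable from the entry points (breadth-first)."""
    marked = set(entry_points)
    queue = deque(entry_points)
    while queue:
        symbol = queue.popleft()
        for dep in references.get(symbol, ()):
            if dep not in marked:
                marked.add(dep)
                queue.append(dep)
    return marked

# symbol -> symbols it references (hypothetical program)
references = {
    "main": ["WriteLn", "ParseArgs"],
    "ParseArgs": ["StrToInt"],
    "WriteLn": [],
    "StrToInt": [],
    "UnusedHelper": ["StrToInt"],
}

kept = reachable_symbols(references, ["main"])
dropped = set(references) - kept
print(sorted(kept), sorted(dropped))
```

Anything left unmarked is dropped. A reference from an unreachable symbol ("UnusedHelper" here) does not keep its target alive; only references from marked symbols count.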



I do hope I'm not a person who people wish to avoid... I've gotten too 
passionate for my own good a couple of times.  I do want to learn 
everything I can when it comes to the inner workings of a binary 
(admittedly I'm currently locked to Intel platforms, but that may 
change in future).  The warning of 'blobing' as a reason against 
inlining single-use functions was never something I was introduced to 
or was really documented anywhere, so I didn't really know any 
better.  I'm still guessing what is meant by 'blobing', but hopefully 
I can learn soon enough.


Don't worry too much. Just remain constructive and it will all sort out.



Seeking to reduce binary size (without sacrificing speed) and make as 
many optimisations as possible may be a fool's errand because the 
returns don't justify the costs, but I personally enjoy the challenge 
and puzzle-solving element of it.  I just hope I haven't scared off 
the administrators when I argued with Florian on my jump optimisations 
(the aforementioned inline/blobing issue).


Start with asking why you feel the need to do it, and how much would be 
enough.



Re: [fpc-devel] inline... and philosophy

2019-11-09 Thread Marģers . via fpc-devel
 

> By the way, what is your 'particular CPU'? If it's not Intel-based,
AMD Zen, 1st gen, so the same x86_64. Not much help for testing on other platforms.



Re: [fpc-devel] inline... and philosophy

2019-11-08 Thread J. Gareth Moreton

Thanks Marģers,

That makes a lot of sense.  The jump sizes and indices for local 
variables is a big one.  On Intel processors, when generating addresses, 
if the byte displacement fits into a signed byte (-128 to 127), then 
said displacement only takes a byte to store... outside of that range it 
uses 4 bytes.  Similarly with jumps.
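
A small sketch of those size rules (x86 encodings only; the function names are made up, and the zero-displacement case, which can often be omitted entirely, is left out):

```python
def fits_signed_byte(value: int) -> bool:
    return -128 <= value <= 127

def displacement_bytes(disp: int) -> int:
    """Bytes used by a memory operand's displacement: disp8 or disp32."""
    return 1 if fits_signed_byte(disp) else 4

def jmp_bytes(rel: int) -> int:
    """JMP rel8 is EB cb (2 bytes); JMP rel32 is E9 cd (5 bytes)."""
    return 2 if fits_signed_byte(rel) else 5

def jcc_bytes(rel: int) -> int:
    """Jcc rel8 is 7x cb (2 bytes); Jcc rel32 is 0F 8x cd (6 bytes)."""
    return 2 if fits_signed_byte(rel) else 6

# A local at [rbp-120] versus one at [rbp-136], e.g. after inlining grew the frame
print(displacement_bytes(-120), displacement_bytes(-136))
print(jmp_bytes(100), jmp_bytes(1000), jcc_bytes(100), jcc_bytes(1000))
```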


I'm glad you like the jump optimisations.  I just released version 3 of 
my patches a few hours ago - hopefully all is okay now.  There seems to 
be a significant improvement in the compiler speed as a result, which I 
honestly didn't expect.


By the way, what is your 'particular CPU'? If it's not Intel-based, 
would you be willing to test the patches on other platforms? I'm only 
able to run the test suite on a handful of i386 and x86_64 platforms, so 
I'm not certain how the optimisations perform.  Also, I can't guarantee 
that my 'condition_in' functions are optimal on non-Intel platforms.  
I'm fairly sure they're not incorrect, but they still need testing and 
confirmation, and in the case of PowerPC, need expanding since I don't 
know how the condition flags work on that architecture.


Gareth aka. Kit

P.S. If something doesn't require philosophy and can be theoretically 
calculated, like inlining and outlining, I want to work it out! 
(Although some algorithms take far too long to be practical, hence why I 
don't plan to implement an 'auto-pure' feature)


On 09/11/2019 01:24, Marģers . via fpc-devel wrote:

By "blobing" I meant an unnecessary increase in size, so that the function loses its 
good shape. There is no such word as "blobing" in English. My bad.
Let me rephrase 'just wrong' as 'questionably right'. Currently inlining is 
left in the hands of programmers, and it is abused as a magical performance booster. 
For small functions that is most likely true; for larger functions it is questionable:
1) it might increase the index size for accessing local variables on the stack.
2) it might increase jump instruction size.
3) it changes code location (code crosses page boundaries). On my particular CPU 
there is a 64-byte code page. If a loop fits within it, it runs twice as fast as when it 
overlaps the page boundary by even one byte. Jumping forward is OK (as the expected 
code flow is always forward). And there is a larger page of a few KB; calling outside it 
carries a small penalty. As FPC does not manage this in any way, it's just pure luck, and 
it just might get unlucky. Code alignment generally does not solve those things.
Conclusion: by the naked eye one cannot tell whether an inline is any good or not. To 
inline or not to inline has nothing to do with philosophy; it has to be calculated (as 
clang does and FPC doesn't).

I'm looking forward to the jump optimisation being accepted.



Re: [fpc-devel] inline... and philosophy

2019-11-08 Thread Marģers . via fpc-devel
By "blobing" I meant an unnecessary increase in size, so that the function loses its 
good shape. There is no such word as "blobing" in English. My bad.
Let me rephrase 'just wrong' as 'questionably right'. Currently inlining is 
left in the hands of programmers, and it is abused as a magical performance booster. 
For small functions that is most likely true; for larger functions it is 
questionable:
1) it might increase the index size for accessing local variables on the stack.
2) it might increase jump instruction size.
3) it changes code location (code crosses page boundaries). On my particular CPU 
there is a 64-byte code page. If a loop fits within it, it runs twice as fast as when it 
overlaps the page boundary by even one byte. Jumping forward is OK (as the expected 
code flow is always forward). And there is a larger page of a few KB; calling outside it 
carries a small penalty. As FPC does not manage this in any way, it's just pure luck, and 
it just might get unlucky. Code alignment generally does not solve those things.
Conclusion: by the naked eye one cannot tell whether an inline is any good or not. To 
inline or not to inline has nothing to do with philosophy; it has to be calculated (as 
clang does and FPC doesn't).
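
Point 3 comes down to simple address arithmetic. A minimal sketch, assuming a 64-byte block size (the addresses and function names are made up, and whether crossing such a boundary costs anything on a given CPU is a separate question that this does not answer):

```python
BLOCK = 64  # assumed block size in bytes

def crosses_boundary(start: int, size: int, block: int = BLOCK) -> bool:
    """True if the code at [start, start + size) spans more than one block."""
    if size <= 0:
        return False
    return start // block != (start + size - 1) // block

def padding_to_align(start: int, block: int = BLOCK) -> int:
    """Bytes of padding needed to move 'start' up to the next block boundary."""
    return -start % block

# The same 64-byte loop, placed on a boundary and one byte past it
print(crosses_boundary(0x1000, 64), crosses_boundary(0x1001, 64))
print(padding_to_align(0x1001))
```

Inlining changes 'start' for everything that follows the call site, which is why the effect looks like luck unless the compiler tracks it.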

I'm looking forward to the jump optimisation being accepted.



Re: [fpc-devel] inline... and philosophy

2019-11-08 Thread J. Gareth Moreton
It is a good point.  With my C++ programs, I tend to compile with 
everything statically linked and self-contained, since it tends to be 
smaller than a dynamically-linked program plus the redistributable 
combined (and the risk of "DLL Hell" means you can't just install the 
redistributable in the System directory). Granted, it's useful if you 
are shipping a lot of small utilities together.  Admittedly, FPC might 
benefit from the dynamically-linked redistributable.  A workplace that I 
was once doing a contract for turned down my request to have FPC 
installed because there were too many EXE files to add to their 
exceptions list (the company dealt with financial data, so they were 
extra-paranoid about what gets installed on their workstations).


Does smart linking strip out elements of the RTL that aren't used?  
Granted, I'm trying to think of an example off the top of my head - I 
was going to say "WriteLn", but I can see the internal functions using 
them for stack traces and exception handling.


I do hope I'm not a person who people wish to avoid... I've gotten too 
passionate for my own good a couple of times.  I do want to learn 
everything I can when it comes to the inner workings of a binary 
(admittedly I'm currently locked to Intel platforms, but that may change 
in future).  The warning of 'blobing' as a reason against inlining 
single-use functions was never something I was introduced to or was 
really documented anywhere, so I didn't really know any better.  I'm 
still guessing what is meant by 'blobing', but hopefully I can learn 
soon enough.


Seeking to reduce binary size (without sacrificing speed) and make as 
many optimisations as possible may be a fool's errand because the 
returns don't justify the costs, but I personally enjoy the challenge 
and puzzle-solving element of it.  I just hope I haven't scared off the 
administrators when I argued with Florian on my jump optimisations (the 
aforementioned inline/blobing issue).


Gareth aka. Kit


On 08/11/2019 22:22, Nikolai Zhubr via fpc-devel wrote:

08.11.2019 16:28, J. Gareth Moreton:
[...]

No gain? Wow, is whole-program optimisation that underperforming? Given
the bloated size of FPC's binaries compared to, say, what a mainstream
C++ compiler can do, I would have thought that there could be a lot


Keep in mind that pretty much any tiny MSVC application would these 
days push a (few-megabytes-sized) vcredist package in front of it. 
Similarly, gcc would typically dynamically link against 
some-megabytes-sized libc and other system libraries. On the other 
hand, FPC typically produces self-contained binaries with all required 
RTL code built-in. Whether it is good or not depends on your usage 
context, but application binary size comparison should at least take 
this into account to be of some use.


--
Regards,
Nikolai


that could be stripped out in regards to unused functions and the like.
Or am I missing something?  The large binary sizes feel like an elephant
in the room that no-one talks about.  What causes them?

Gareth aka. Kit



Re: [fpc-devel] inline... and philosophy

2019-11-08 Thread Nikolai Zhubr via fpc-devel

08.11.2019 16:28, J. Gareth Moreton:
[...]

No gain? Wow, is whole-program optimisation that underperforming? Given
the bloated size of FPC's binaries compared to, say, what a mainstream
C++ compiler can do, I would have thought that there could be a lot


Keep in mind that pretty much any tiny MSVC application would these days 
push a (few-megabytes-sized) vcredist package in front of it. Similarly, 
gcc would typically dynamically link against some-megabytes-sized libc 
and other system libraries. On the other hand, FPC typically produces 
self-contained binaries with all required RTL code built-in. Whether it 
is good or not depends on your usage context, but application binary 
size comparison should at least take this into account to be of some use.


--
Regards,
Nikolai


that could be stripped out in regards to unused functions and the like.
Or am I missing something?  The large binary sizes feel like an elephant
in the room that no-one talks about.  What causes them?

Gareth aka. Kit



Re: [fpc-devel] inline... and philosophy

2019-11-08 Thread J. Gareth Moreton
I'm willing to accept I have a lot to learn if you can educate me on 
little intricacies like that, so I may better submit code to the 
compiler.  I don't want to say it's 'just wrong' though as there might 
be gains in some situations.  I guess it has something to do with memory 
page sizes, right?


Gareth aka. Kit

On 08/11/2019 21:31, J. Gareth Moreton wrote:

Can you explain what you mean by 'blobbing'?

On 08/11/2019 19:36, Marģers . via fpc-devel wrote:
- Identifying functions that are only used once.  This became a 
slight point of contention between Florian and myself, because I 
inlined a couple of functions
Inlining every function that is used only once is just wrong. The gain from 
eliminating the call and the function prologue and epilogue might not be 
sufficient to outweigh the "blobing" of the caller function. One of clang's 
optimisations is to "outline" some parts of larger functions (like an else 
statement).




Re: [fpc-devel] inline... and philosophy

2019-11-08 Thread J. Gareth Moreton

Can you explain what you mean by 'blobbing'?

On 08/11/2019 19:36, Marģers . via fpc-devel wrote:
> > - Identifying functions that are only used once.  This became a slight
> > point of contention between Florian and myself, because I inlined a
> > couple of functions
>
> Inlining every function that is used only once is just wrong. The gain
> from eliminating the call and the function prologue and epilogue might
> not be sufficient to outweigh "blobbing" the caller function. One of
> clang's optimizations is to "outline" some parts of larger functions
> (like an else branch).

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] inline... and philosophy

2019-11-08 Thread Marģers . via fpc-devel
> - Identifying functions that are only used once.  This became a slight point 
> of contention between Florian and myself, because I inlined a couple of 
> functions

Inlining every function that is used only once is just wrong. The gain from
eliminating the call and the function prologue and epilogue might not be
sufficient to outweigh "blobbing" the caller function. One of clang's
optimizations is to "outline" some parts of larger functions (like an else
branch).

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] inline... and philosophy

2019-11-08 Thread J. Gareth Moreton
Thanks for the explanation.  I still have a lot to learn with some 
things.  I guess when you compare yourself to the behemoths, you're 
always going to look sloppy.


Things that come to mind that could be possible when I think about 
whole-program optimisation and smart linking:


- Identification of duplicate functions.  This is not always as obvious 
as code duplication, but when I've compiled the compiler before, I've 
noticed that a number of leaf functions compile into the same machine 
code, which may be due to types that are identical on a particular 
platform, or the elimination of code due to preprocessor directives, for 
example.  I imagine such identification could be done via a hash table, 
and then doing a more thorough check to see if there is an actual match 
or an unfortunate collision.  Admittedly a number of these routines are 
inlined so it might not produce much of a saving in the real world.


- Identifying functions that are only used once.  This became a slight 
point of contention between Florian and myself, because I inlined a 
couple of functions in my jump optimisations that I was absolutely 
certain were only called once elsewhere.  When a function is only called 
once, theoretically there's a slight speed and size saving if the 
function is inlined at the call.  I figure it would require 
whole-program optimisation though because the function call opcodes have 
already been implemented, while inserting the raw nodes would yield more 
optimal code (better register usage and cancelling out actual parameter 
set-up). Theorising an implementation, calls that are 'noinline' or have 
something that the compiler flags as 'cannot inline' would not be 
optimised in this way, and assembler routines are intrinsically 
'noinline' as well, so it covers that use case.
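
The hash-table idea from the first point above can be sketched roughly as follows. This is only an illustration in Python, not anything taken from the compiler: the symbol names and byte strings are made up, and a real implementation would also have to treat relocations and alignment padding carefully before calling two routines identical.

```python
import hashlib
from collections import defaultdict


def find_duplicate_functions(functions):
    """Group function names whose machine code is byte-for-byte identical.

    `functions` maps a (hypothetical) symbol name to its compiled bytes.
    """
    buckets = defaultdict(list)
    for name, code in functions.items():
        # Cheap first pass: bucket by a hash of the code bytes.
        buckets[hashlib.sha256(code).digest()].append(name)

    groups = []
    for names in buckets.values():
        # Thorough second pass: compare the bytes themselves, so that a
        # hash collision can never merge two different functions.
        remaining = list(names)
        while remaining:
            first = remaining.pop(0)
            same = [n for n in remaining if functions[n] == functions[first]]
            if same:
                groups.append(sorted([first] + same))
            remaining = [n for n in remaining if n not in same]
    return sorted(groups)


# Made-up example: two leaf functions that compiled to the same bytes.
funcs = {
    "GetByteSize": b"\x89\xf8\xc3",
    "GetWordSize": b"\x89\xf8\xc3",
    "IsEmpty": b"\x31\xc0\xc3",
}
duplicates = find_duplicate_functions(funcs)
```

Each group returned is a set of symbols that could, in principle, share one body.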


Thanks again for educating me on the things I don't yet know!

Gareth aka. Kit

On 08/11/2019 16:15, Sven Barth via fpc-devel wrote:
> J. Gareth Moreton wrote on Fri, 8 Nov 2019, 14:28:
>
> > On 08/11/2019 13:14, Sven Barth via fpc-devel wrote:
> > > ...
> > > What's stopping that? Simple: no driving need. It's just work for
> > > something that has essentially no gain.
> >
> > No gain? Wow, is whole-program optimisation that underperforming?
> > Given the bloated size of FPC's binaries compared to, say, what a
> > mainstream C++ compiler can do, I would have thought that there could
> > be a lot that could be stripped out in regards to unused functions
> > and the like.
>
> Unused functions are handled by smart linking. No need for WPO here.
> WPO is needed for devirtualisation, for example, for which the compiler
> itself is a very good use case due to the architecture of the backend.
> For other real-world applications your mileage may vary.
> One possible further WPO task would be deduplication of generic
> specializations for the same types (at least unless the target also
> supports comdat sections).
>
> But all in all WPO isn't used that much in the real world.
>
> > Or am I missing something?  The large binary sizes feel like an
> > elephant in the room that no-one talks about.  What causes them?
>
> Mainly RTTI and the fact that FPC provides a statically linked RTL.
> Change MSVC to static linking and suddenly you get 300 KB executables
> as well.
>
> Back when I did the first tests with dynamic packages the chmcmd
> binary only had 20 KB or so, but the necessary package libraries were
> much bigger (and there smart linking and WPO are both much less usable
> as they can only strip stuff that is not exported).
>
> Regards,
> Sven


___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] inline... and philosophy

2019-11-08 Thread Sven Barth via fpc-devel
J. Gareth Moreton wrote on Fri, 8 Nov 2019, 14:28:

>
> On 08/11/2019 13:14, Sven Barth via fpc-devel wrote:
> > ...
> > What's stopping that? Simple: no driving need. It's just work for
> > something that has essentially no gain.
>
> No gain? Wow, is whole-program optimisation that underperforming? Given
> the bloated size of FPC's binaries compared to, say, what a mainstream
> C++ compiler can do, I would have thought that there could be a lot
> that could be stripped out in regards to unused functions and the like.
>

Unused functions are handled by smart linking. No need for WPO here. WPO is
needed for devirtualisation, for example, for which the compiler itself is a
very good use case due to the architecture of the backend. For other
real-world applications your mileage may vary.
One possible further WPO task would be deduplication of generic
specializations for the same types (at least unless the target also
supports comdat sections).
But all in all WPO isn't used that much in the real world.

> Or am I missing something?  The large binary sizes feel like an elephant
> in the room that no-one talks about.  What causes them?
>

Mainly RTTI and the fact that FPC provides a statically linked RTL. Change
MSVC to static linking and suddenly you get 300 KB executables as well.

Back when I did the first tests with dynamic packages the chmcmd binary
only had 20 KB or so, but the necessary package libraries were much bigger
(and there smart linking and WPO are both much less usable as they can only
strip stuff that is not exported).
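
Why exported symbols limit what can be stripped can be illustrated with a small reachability sketch. It is written in Python purely for illustration and is not how FPC's linker is implemented; the symbol graph and names are invented. The assumption is that smart linking keeps whatever is reachable from a set of roots and drops the rest.

```python
def smart_link(references, roots):
    """Return the set of symbols reachable from `roots`.

    `references` maps each symbol to the symbols it uses; anything not
    reached is what a smart linker would be free to drop.
    """
    kept = set()
    pending = list(roots)
    while pending:
        symbol = pending.pop()
        if symbol in kept:
            continue
        kept.add(symbol)
        pending.extend(references.get(symbol, ()))
    return kept


# Hypothetical symbol graph for a tiny library.
references = {
    "main": ["WriteLn"],
    "WriteLn": ["FormatInt"],
    "FormatInt": [],
    "ReadLn": ["ParseInt"],
    "ParseInt": [],
}

# Statically linked program: only the entry point is a root.
program_kept = smart_link(references, ["main"])

# Dynamic package: every exported symbol is a root, because the linker
# cannot know which exports a future client will call.
package_kept = smart_link(references, ["WriteLn", "ReadLn"])
```

In the program case `ReadLn` and `ParseInt` are dropped; in the package case they have to stay, which is why the package libraries end up much bigger than the sum of what any single program uses.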

Regards,
Sven

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] inline... and philosophy

2019-11-08 Thread Ben Grasset via fpc-devel
On Fri, Nov 8, 2019 at 10:02 AM J. Gareth Moreton 
wrote:

> I guess that's the consequence of Microsoft Visual C++ having such a large
> market share.
>
I mean, GCC is far more widely used than MSVC (and actually generates
rather smaller binaries usually.)

Also the Clang version of the MS compiler (clang-cl) is objectively better,
honestly.
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] inline... and philosophy

2019-11-08 Thread J. Gareth Moreton
I guess that's the consequence of Microsoft Visual C++ having such a 
large market share.


Gareth aka. Kit

On 08/11/2019 14:55, Ben Grasset via fpc-devel wrote:
> On Fri, Nov 8, 2019 at 8:28 AM J. Gareth Moreton wrote:
>
> > The large binary sizes feel like an elephant
> > in the room that no-one talks about.
>
> Relatively speaking, FPC actually does very well as far as binary size
> for a language that specifically aims to have robust RTTI
> functionality. C++ binaries are often small simply because it's quite
> normal in C++ to build with both exception handling and RTTI disabled
> entirely, for example.
>
> Against basically anything else FPC generally comes out significantly
> smaller, though. Ever seen the size of a Go binary? Or a Rust binary?
> Even their Hello Worlds are non-trivially larger than FPC's.
>
> So I think no one talks about it essentially because FPC binaries are
> already exactly the size they logically should be, given the general
> goals of the language / compiler.


___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] inline... and philosophy

2019-11-08 Thread Ben Grasset via fpc-devel
On Fri, Nov 8, 2019 at 8:28 AM J. Gareth Moreton 
wrote:

> The large binary sizes feel like an elephant
> in the room that no-one talks about.
>

Relatively speaking, FPC actually does very well as far as binary size for
a language that specifically aims to have robust RTTI functionality. C++
binaries are often small simply because it's quite normal in C++ to
build with both exception handling and RTTI disabled entirely, for example.

Against basically anything else FPC generally comes out significantly
smaller, though. Ever seen the size of a Go binary? Or a Rust binary? Even
their Hello Worlds are non-trivially larger than FPC's.

So I think no one talks about it essentially because FPC binaries are
already exactly the size they logically should be, given the general goals
of the language / compiler.
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] inline... and philosophy

2019-11-08 Thread J. Gareth Moreton


On 08/11/2019 13:14, Sven Barth via fpc-devel wrote:
> ...
> What's stopping that? Simple: no driving need. It's just work for
> something that has essentially no gain.


No gain? Wow, is whole-program optimisation that underperforming? Given 
the bloated size of FPC's binaries compared to, say, what a mainstream 
C++ compiler can do, I would have thought that there could be a lot 
that could be stripped out in regards to unused functions and the like.  
Or am I missing something?  The large binary sizes feel like an elephant 
in the room that no-one talks about.  What causes them?


Gareth aka. Kit

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] inline... and philosophy

2019-11-08 Thread Sven Barth via fpc-devel
J. Gareth Moreton wrote on Fri, 8 Nov 2019, 11:20:

>
> On 08/11/2019 09:49, Sven Barth via fpc-devel wrote:
> > ...
> > The compiler is compiled multiple times anyway when bootstrapping and
> > the need for the scripting there is not only the WPO, but the repeated
> > compilation of RTL and compiler.
> >
> > Also the compiler's infrastructure is currently not geared towards
> > repeated compilations in the same process. Yes, the textmode IDE
> > essentially does that, but all it does is call the compile() function.
> > For WPO that would need to happen *inside* that function.
> >
> > Regards,
> > Sven
>
> What's stopping that? It seems like a relatively straightforward
> implementation if possibly a little clumsy with the WPO files. Granted,
> something cleaner (e.g. not linking on the first pass) might take a
> minor overhaul.
>

What's stopping that? Simple: no driving need. It's just work for something
that has essentially no gain.

Regards,
Sven

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] inline... and philosophy

2019-11-08 Thread J. Gareth Moreton


On 08/11/2019 09:49, Sven Barth via fpc-devel wrote:
> ...
> The compiler is compiled multiple times anyway when bootstrapping and
> the need for the scripting there is not only the WPO, but the repeated
> compilation of RTL and compiler.
>
> Also the compiler's infrastructure is currently not geared towards
> repeated compilations in the same process. Yes, the textmode IDE
> essentially does that, but all it does is call the compile() function.
> For WPO that would need to happen *inside* that function.
>
> Regards,
> Sven


What's stopping that? It seems like a relatively straightforward 
implementation if possibly a little clumsy with the WPO files. Granted, 
something cleaner (e.g. not linking on the first pass) might take a 
minor overhaul.


Gareth aka. Kit

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] inline... and philosophy

2019-11-08 Thread Sven Barth via fpc-devel
J. Gareth Moreton wrote on Fri, 8 Nov 2019, 04:01:

> Speaking of whole program optimisation, it always seems very fiddly to
> set up, to the point that the FPC bootstrapper needs a script to get it
> working.  Not exactly user-friendly and practically demands learning a
> separate skill to get working (at least I've struggled). Shouldn't the
> compiler have an option to do the two stages of whole program
> optimisation (generate the information files, then use the information
> files) in one sitting?
>

The compiler is compiled multiple times anyway when bootstrapping and the
need for the scripting there is not only the WPO, but the repeated
compilation of RTL and compiler.

Also the compiler's infrastructure is currently not geared towards repeated
compilations in the same process. Yes, the textmode IDE essentially does
that, but all it does is call the compile() function. For WPO that would
need to happen *inside* that function.

Regards,
Sven

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


[fpc-devel] inline... and philosophy

2019-11-07 Thread J. Gareth Moreton

Hi everyone,

This is probably more rant-like than it's supposed to be, and maybe a 
bit philosophical, especially after I ran my mouth in the jump 
optimisation issue.


So I'm wondering what the future will be for the inline directive, since it 
seems to be very divisive.  On one side, its use is questionable with 
proposals for 'auto-inline', while on the other side some live by it for 
minor performance gains.  There's always going to be unexpected uses for 
things, and one point of contention that was recently raised is the way 
I used it on a function that I knew was only called in one particular 
place and would stay that way for the foreseeable future.  Though I wrote 
a comment to explain what I did, apparently that's not a valid reason or 
use for it.


I guess that's also an interesting argument in terms of purity of 
purpose... do you go by what something was designed to do or what it 
actually can do?  There's an argument for both - the approach of purity 
yields cleaner code but it might suffer with performance, and the 
approach of practicality is potentially more efficient, but may be much 
more difficult to maintain.


Granted, regardless of one's stance, one does have to abide by a 
project's coding standards, even if they are a little nebulous sometimes 
and one doesn't often know what they are until they're violated.  I 
apologise for getting so upset over the presence of a handful of 
directives in my patch - to explain, there were a couple of new methods 
related to my jump optimisations that were marked as inline (so long as 
DEBUG_JUMP wasn't defined) - each function was only called in exactly 
one location with no intention of calling them elsewhere.  It was set up 
this way to compartmentalise the code, separating the jump optimisations 
from the main loop of the Peephole Optimizer.  Florian took issue with 
the inline directives despite the comments explaining why they were 
there, since it wasn't exactly to the spirit of what inline is used for 
(leaf functions that compile into only a few machine code 
instructions).  I wasn't too happy about the argument that one of the 
functions MAY get called a second time in the future - possibly, but you 
can remove the inline directive when that time comes (they'd be looking 
up the function header at the very least to see what the parameters are).


I guess sometimes, especially when you've worked on some closed-source 
applications, you come across the occasional 'black magic' (for a good 
real-world example, look up "fast inverse square root") that takes a 
university thesis to explain! Depending on where you work, some are 
quite strict with coding standards and how many changes you squeeze into 
a single commit, while others are a bit more cavalier... which leads to 
the minefield that's legacy code!  For me personally, I've always pushed for 
performance, hence why I have no problem dropping into assembly language 
on some critical leaf functions, but I do want to go out of my way to 
explain what I'm doing so another programmer can follow what's happening 
and improve on it if needs be.


To go back to the original subject, what are the intentions with inline? 
I've gotten the impression from Florian that he doesn't like the 
directive and would much rather leave it to the compiler to auto-inline 
short functions, which I guess is fair, and the compiler can potentially 
make better judgement calls on a per-platform basis (e.g. vector 
addition on AMD64 platforms collapses into a single machine code 
instruction (not counting memory moving), whereas a platform that 
doesn't have vector registers may end up with something much longer).  
As for my example, 
auto-inlining a long function that only gets called from one location, 
that is certainly possible if functions are reference-counted, but 
something tells me it would require whole-program optimisation, 
especially if said function is public (or protected).
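
The "reference-counted" idea can be sketched as a simple count over call sites. The Python below is only an illustration under stated assumptions: the list of (caller, callee) pairs is assumed to cover the whole program, which is exactly why a public or protected function needs whole-program information, and all names are made up. It also ignores functions whose address is taken.

```python
from collections import Counter


def single_call_inline_candidates(call_sites, not_inlinable=frozenset()):
    """Return functions that are referenced from exactly one call site.

    `call_sites` is a list of (caller, callee) pairs, one per call in the
    whole program. `not_inlinable` holds callees that must be skipped,
    e.g. assembler routines or ones marked 'noinline'.
    """
    counts = Counter(callee for _caller, callee in call_sites)
    return sorted(
        callee
        for callee, count in counts.items()
        if count == 1 and callee not in not_inlinable
    )


# Made-up call sites loosely modelled on the situation described above.
calls = [
    ("PeepholeMain", "OptimizeJumps"),
    ("PeepholeMain", "UpdateUsedRegs"),
    ("OptimizeJumps", "UpdateUsedRegs"),
    ("PeepholeMain", "AsmHelper"),
]
candidates = single_call_inline_candidates(calls, {"AsmHelper"})
```

Here only `OptimizeJumps` qualifies: `UpdateUsedRegs` is called twice and `AsmHelper` is excluded up front.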


I don't trust a compiler to produce the most optimal code, whether it be 
Free Pascal, Delphi or a C++ compiler, and if I am greatly concerned 
about execution speed, I look at the disassembly (I am aware that most 
mainstream programmers don't do this).  In most cases, it's inefficient 
high-level code that simple refactoring can fix (to use an exaggerated 
example, changing a sorting algorithm to use quicksort instead of 
bubblesort), while other times there is little you can do from the code 
alone.  I do try to help the compiler though by giving hints like the 
inline directive.  To branch into a semi-related issue of the compiler 
deciding what's best... when I get pure functions working, one could 
argue that the compiler should be smart enough to determine if a 
function is pure or not, and hence would have no need for a 'pure' 
directive - this is true, but would also be prohibitively slow, since 
the compiler would be analysing every node in every function with a 
fine-tooth comb.  That is partly my concern with auto-inline as well...