Re: [fpc-devel] Kit's ambitions!

2018-06-15 Thread J. Gareth Moreton
 Sorry, I just realised that was unfairly impatient of me.  I've still got
little things I can work on, but I'm worried about creating a large
backlog.

 Gareth

 On Fri 15/06/18 21:11 , "J. Gareth Moreton" gar...@moreton-family.com sent:
  Something tells me that we should write our own patch.exe at some point
to alleviate these shortcomings!  Thanks for the patch again.

 Any word on what I've submitted so far? I ask because I found some new
peephole optimisations that can make some good speed and size savings, but
one of them requires a new Pass 1 function and will either have to be
merged into the binary search list, or the large case block, so I can't
submit it yet until I know which way the source tree will go.
 Gareth aka. Kit

 On Fri 15/06/18 20:03 , Florian Klämpfl flor...@freepascal.org sent:
 On 15.06.2018 at 18:17, J. Gareth Moreton wrote:
 > Not much luck for me - the file won't patch without options or
 > modifications, and using -p 1 to remove the "a/" and "b/" from the
 > starts of the files causes an assertion in patch.exe.

 Sorry, my bad. The patch has Unix line feeds, which crashes patch.exe for
 Windows. Try again with the attached one or convert the line endings of
 the first one to Windows ones.

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] Kit's ambitions!

2018-06-15 Thread J. Gareth Moreton
 Something tells me that we should write our own patch.exe at some point to
alleviate these shortcomings!  Thanks for the patch again.

 Any word on what I've submitted so far? I ask because I found some new
peephole optimisations that can make some good speed and size savings, but
one of them requires a new Pass 1 function and will have to be merged into
either the binary search list or the large case block, so I can't submit
it until I know which way the source tree will go.
 Gareth aka. Kit

 On Fri 15/06/18 20:03 , Florian Klämpfl flor...@freepascal.org sent:
 On 15.06.2018 at 18:17, J. Gareth Moreton wrote:
 > Not much luck for me - the file won't patch without options or
 > modifications, and using -p 1 to remove the "a/" and "b/" from the
 > starts of the files causes an assertion in patch.exe.

 Sorry, my bad. The patch has Unix line feeds, which crashes patch.exe for
 Windows. Try again with the attached one or convert the line endings of
 the first one to Windows ones.



Re: [fpc-devel] Kit's ambitions!

2018-06-15 Thread Florian Klämpfl

On 15.06.2018 at 18:17, J. Gareth Moreton wrote:
> Not much luck for me - the file won't patch without options or modifications, and using -p 1 to remove the "a/" and "b/"
> from the starts of the files causes an assertion in patch.exe.

Sorry, my bad. The patch has Unix line feeds, which crashes patch.exe for Windows. Try again with the attached one or
convert the line endings of the first one to Windows ones.
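
For anyone hitting the same crash, the line-ending conversion can be scripted. The following is only a minimal sketch in Python; the helper names `to_crlf` and `convert_file` are made up for illustration and are not part of FPC or patch.exe.

```python
from pathlib import Path


def to_crlf(data: bytes) -> bytes:
    """Return data with every line ending normalised to CRLF."""
    # Collapse existing CRLF to LF first, so lines that are already
    # converted do not end up as CR CR LF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def convert_file(src: str, dst: str) -> None:
    """Write a CRLF copy of src to dst, working on raw bytes only."""
    Path(dst).write_bytes(to_crlf(Path(src).read_bytes()))


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python to_crlf.py in.patch out.patch
    if len(sys.argv) == 3:
        convert_file(sys.argv[1], sys.argv[2])
```

Working on bytes rather than text avoids any guess about the patch's character encoding. The converted file could then be tried with `patch -p1` from the root of the source tree, which is the invocation discussed above.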
From 7c679b8365e7819f1c39e57c2c222b08bc935f72 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Florian=20Kl=C3=A4mpfl?= 
Date: Sat, 11 Jun 2016 13:15:55 +0200
Subject: [PATCH] unfinished: + new pass to carry out peep hole optimizations
 before register allocation, this might reduce register pressure + x86:
 optmize LeaMov2Mov

---
 compiler/aopt.pas| 32 
 compiler/aoptobj.pas | 25 +
 compiler/psub.pas|  4 
 compiler/x86/aoptx86.pas | 47 ++-
 4 files changed, 103 insertions(+), 5 deletions(-)

diff --git a/compiler/aopt.pas b/compiler/aopt.pas
index fde868713e..ad3590e1a1 100644
--- a/compiler/aopt.pas
+++ b/compiler/aopt.pas
@@ -39,9 +39,11 @@ Unit aopt;
 { _AsmL is the PAasmOutpout list that has to be optimized }
 Constructor create(_AsmL: TAsmList); virtual; reintroduce;

+Destructor destroy;override;
+
 { call the necessary optimizer procedures }
 Procedure Optimize;virtual;
-Destructor destroy;override;
+procedure PreRegallocOptimize;virtual;

   private
 procedure FindLoHiLabels;
@@ -68,6 +70,7 @@ Unit aopt;
   casmoptimizer : TAsmOptimizerClass;
   cpreregallocscheduler : TAsmSchedulerClass;

+procedure PreRegallocOptimize(AsmL:TAsmList);
 procedure Optimize(AsmL:TAsmList);
 procedure PreRegallocSchedule(AsmL:TAsmList);

@@ -86,6 +89,7 @@ Unit aopt;
 inherited create(_asml,nil,nil,nil);
 { setup labeltable, always necessary }
 New(LabelInfo);
+LabelInfo^.LabelTable:=nil;
   End;

 procedure TAsmOptimizer.FindLoHiLabels;
@@ -315,11 +319,21 @@ Unit aopt;
   End;


+Procedure TAsmOptimizer.PreRegallocOptimize;
+  begin
+BlockStart := tai(AsmL.First);
+PreRegAllocPeepHoleOptPass;
+  end;
+
+
 Destructor TAsmOptimizer.Destroy;
   Begin
-if assigned(LabelInfo^.LabelTable) then
-  Freemem(LabelInfo^.LabelTable);
-Dispose(LabelInfo);
+if assigned(LabelInfo) then
+  begin
+if assigned(LabelInfo^.LabelTable) then
+  Freemem(LabelInfo^.LabelTable);
+Dispose(LabelInfo);
+  end;
 inherited Destroy;
   End;

@@ -375,6 +389,16 @@ Unit aopt;
   End;


+procedure PreRegallocOptimize(AsmL:TAsmList);
+  var
+p : TAsmOptimizer;
+  begin
+p:=casmoptimizer.Create(AsmL);
+p.PreRegallocOptimize;
+p.free;
+  end;
+
+
 procedure Optimize(AsmL:TAsmList);
   var
 p : TAsmOptimizer;
diff --git a/compiler/aoptobj.pas b/compiler/aoptobj.pas
index 02fc330167..985c03ec2b 100644
--- a/compiler/aoptobj.pas
+++ b/compiler/aoptobj.pas
@@ -344,12 +344,16 @@ Unit AoptObj;
 procedure PeepHoleOptPass2; virtual;
 procedure PostPeepHoleOpts; virtual;

+procedure PreRegAllocPeepHoleOptPass; virtual;
+
 { processor dependent methods }
 // if it returns true, perform a "continue"
 function PrePeepHoleOptsCpu(var p: tai): boolean; virtual;
 function PeepHoleOptPass1Cpu(var p: tai): boolean; virtual;
 function PeepHoleOptPass2Cpu(var p: tai): boolean; virtual;
 function PostPeepHoleOptsCpu(var p: tai): boolean; virtual;
+
+function PreRegAllocPeepHoleOptPassCpu(var p : tai) : boolean; virtual;
   End;

Function ArrayRefsEq(const r1, r2: TReference): Boolean;
@@ -1572,4 +1576,25 @@ Unit AoptObj;
 result := false;
   end;

+
+procedure TAOptObj.PreRegAllocPeepHoleOptPass;
+  var
+p: tai;
+  begin
+p := BlockStart;
+while (p <> BlockEnd) Do
+  begin
+if PreRegAllocPeepHoleOptPassCPU(p) then
+  continue;
+p:=tai(p.next);
+  end;
+  end;
+
+
+function TAOptObj.PreRegAllocPeepHoleOptPassCpu(var p: tai): boolean;
+  begin
+result := false;
+  end;
+
+
 End.
diff --git a/compiler/psub.pas b/compiler/psub.pas
index 95e5030631..b070145d77 100644
--- a/compiler/psub.pas
+++ b/compiler/psub.pas
@@ -1585,6 +1585,10 @@ implementation
 }


+if (cs_opt_level1 in current_settings.optimizerswitches) and
+   { do not optimize pure assembler procedures }
+   not(pi_is_assembler in flags)  then
+  PreRegallocOptimize(aktproccode);
 {$ifndef NoOpt}
 {$ifndef i386}
 if (cs_opt_scheduler in current_settings.optimizerswitches) and
diff --git a

Re: [fpc-devel] Kit's ambitions!

2018-06-15 Thread J. Gareth Moreton
 Not much luck for me - the file won't patch without options or
modifications, and using -p 1 to remove the "a/" and "b/" from the starts
of the files causes an assertion in patch.exe.  Back to doing it manually
for now!

 Gareth

 On Fri 15/06/18 16:23 , Florian Klämpfl flor...@freepascal.org sent:
 On 14.06.2018 at 23:49, J. Gareth Moreton wrote:
 > Hi Florian,
 >
 > I don't know if you have any answers, but I'm unable to apply any
 > patches I receive. I can view them and see the changes, and manually
 > apply them via copy+paste if I have to, but using the "Apply Patch"
 > option ends up not doing anything.  Is there a fix to this, or does it
 > error out because I only have read access to SVN (even though the patch
 > should only modify my local files)?

 Did you try to use the patch.exe from the command line which comes with
 FPC?


Re: [fpc-devel] Kit's ambitions!

2018-06-15 Thread J. Gareth Moreton
Oh! I'm still a beginner with version control, it seems!

On Fri 15/06/18 16:23 , Florian Klämpfl flor...@freepascal.org sent:
> On 14.06.2018 at 23:49, J. Gareth Moreton wrote:
> > Hi Florian,
> >
> > I don't know if you have any answers, but I'm unable to apply any
> > patches I receive. I can view them and see the changes, and manually
> > apply them via copy+paste if I have to, but using the "Apply Patch"
> > option ends up not doing anything.  Is there a fix to this, or does it
> > error out because I only have read access to SVN (even though the patch
> > should only modify my local files)?
>
> Did you try to use the patch.exe from the command line which comes with
> FPC?


Re: [fpc-devel] Kit's ambitions!

2018-06-15 Thread Florian Klämpfl

On 14.06.2018 at 23:49, J. Gareth Moreton wrote:

> Hi Florian,
>
> I don't know if you have any answers, but I'm unable to apply any patches I receive. I can view them and see the
> changes, and manually apply them via copy+paste if I have to, but using the "Apply Patch" option ends up not doing
> anything.  Is there a fix to this, or does it error out because I only have read access to SVN (even though the patch
> should only modify my local files)?


Did you try to use the patch.exe from the command line which comes with FPC?


Re: [fpc-devel] Kit's ambitions!

2018-06-14 Thread J. Gareth Moreton
 Hi Florian,
 I don't know if you have any answers, but I'm unable to apply any patches
I receive. I can view them and see the changes, and manually apply them via
copy+paste if I have to, but using the "Apply Patch" option ends up not
doing anything.  Is there a fix to this, or does it error out because I
only have read access to SVN (even though the patch should only modify my
local files)?

 Looking at the design though, I can definitely experiment to see how the
deep optimiser performs in the preallocation block.  It will certainly
have the advantage of being able to handle registers that may end up being
stored on the stack due to the lack of free actual registers.  If needs
be, I'll submit the current deep optimiser that does all of its work after
the peephole optimisation, and can change it to pre register allocation
later on.  I will need to see whether it performs better or worse at the
earlier stage, and whether it causes other optimisations to be missed
because of MOVs being changed or removed.

 Fun times ahead!  Thanks for the patch.
 Gareth

 On Thu 14/06/18 21:58 , Florian Klämpfl flor...@freepascal.org sent:
 On 13.06.2018 at 20:50, J. Gareth Moreton wrote:
 > I haven't fully uncovered the secrets of 
 > the compiler yet, but I did notice "pre- 
 > peephole pass" under x86, but I think the 
 > only functions it touched was one of the 
 > bit shifts. Does this occur before 
 > register allocation or was it just 
 > something that had to be done before Pass 
 > 1? 

 It is only before pass 1. 

 I attached a patch I once started which shows the idea. 



Re: [fpc-devel] Kit's ambitions!

2018-06-14 Thread J. Gareth Moreton
 Thanks. I'll have a study of this and potentially move my initial deep
optimisation component to this stage.

 I've made some more peephole optimisations in the meantime, but I'm going
to hold off on posting them because they're starting to conflict with my
other submissions.  Besides, I've given you far too many patches already!

 Gareth

 On Thu 14/06/18 21:58 , Florian Klämpfl flor...@freepascal.org sent:
 On 13.06.2018 at 20:50, J. Gareth Moreton wrote:
 > I haven't fully uncovered the secrets of 
 > the compiler yet, but I did notice "pre- 
 > peephole pass" under x86, but I think the 
 > only functions it touched was one of the 
 > bit shifts. Does this occur before 
 > register allocation or was it just 
 > something that had to be done before Pass 
 > 1? 

 It is only before pass 1. 

 I attached a patch I once started which shows the idea. 



Re: [fpc-devel] Kit's ambitions!

2018-06-14 Thread Florian Klämpfl

On 13.06.2018 at 20:50, J. Gareth Moreton wrote:

> I haven't fully uncovered the secrets of
> the compiler yet, but I did notice "pre-
> peephole pass" under x86, but I think the
> only functions it touched was one of the
> bit shifts. Does this occur before
> register allocation or was it just
> something that had to be done before Pass
> 1?


It is only before pass 1.

I attached a patch I once started which shows the idea.
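
The core of the attached patch is a small driver loop: walk the instruction list and give a CPU-specific hook the chance to rewrite the current node; when the hook returns true, the loop looks at the current position again instead of advancing. What follows is a rough Python analogue, not the compiler's code. A plain list stands in for the linked `tai` list, and the hook `drop_self_moves` is a hypothetical stand-in for an override of `PreRegAllocPeepHoleOptPassCpu`.

```python
from typing import Callable, List, Tuple

Instr = Tuple[str, str, str]  # (opcode, source, destination)
# A hook receives the list and an index, may edit the list in place,
# and returns True if the driver should look at the same index again.
Hook = Callable[[List[Instr], int], bool]


def pre_regalloc_pass(code: List[Instr], hook: Hook) -> None:
    """Driver loop: only advance when the hook made no change."""
    i = 0
    while i < len(code):
        if hook(code, i):
            continue  # same role as "continue" in the Pascal loop
        i += 1


def drop_self_moves(code: List[Instr], i: int) -> bool:
    """Hypothetical hook: delete a full-width 'mov x,x', which does nothing."""
    op, src, dst = code[i]
    if op == "mov" and src == dst:
        del code[i]
        return True  # index i now holds the next instruction
    return False


# v1, v2 are made-up virtual register names.
code = [("mov", "v1", "v1"), ("mov", "v1", "v2"), ("mov", "v2", "v2")]
pre_regalloc_pass(code, drop_self_moves)
print(code)
```

In the patch itself the base-class hook simply returns false, so without a CPU-specific override the new pass leaves the code untouched.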
From 7c679b8365e7819f1c39e57c2c222b08bc935f72 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Florian=20Kl=C3=A4mpfl?= 
Date: Sat, 11 Jun 2016 13:15:55 +0200
Subject: [PATCH] unfinished: + new pass to carry out peep hole optimizations
 before register allocation, this might reduce register pressure + x86:
 optmize LeaMov2Mov

---
 compiler/aopt.pas| 32 
 compiler/aoptobj.pas | 25 +
 compiler/psub.pas|  4 
 compiler/x86/aoptx86.pas | 47 ++-
 4 files changed, 103 insertions(+), 5 deletions(-)

diff --git a/compiler/aopt.pas b/compiler/aopt.pas
index fde868713e..ad3590e1a1 100644
--- a/compiler/aopt.pas
+++ b/compiler/aopt.pas
@@ -39,9 +39,11 @@ Unit aopt;
 { _AsmL is the PAasmOutpout list that has to be optimized }
 Constructor create(_AsmL: TAsmList); virtual; reintroduce;
 
+Destructor destroy;override;
+
 { call the necessary optimizer procedures }
 Procedure Optimize;virtual;
-Destructor destroy;override;
+procedure PreRegallocOptimize;virtual;
 
   private
 procedure FindLoHiLabels;
@@ -68,6 +70,7 @@ Unit aopt;
   casmoptimizer : TAsmOptimizerClass;
   cpreregallocscheduler : TAsmSchedulerClass;
 
+procedure PreRegallocOptimize(AsmL:TAsmList);
 procedure Optimize(AsmL:TAsmList);
 procedure PreRegallocSchedule(AsmL:TAsmList);
 
@@ -86,6 +89,7 @@ Unit aopt;
 inherited create(_asml,nil,nil,nil);
 { setup labeltable, always necessary }
 New(LabelInfo);
+LabelInfo^.LabelTable:=nil;
   End;
 
 procedure TAsmOptimizer.FindLoHiLabels;
@@ -315,11 +319,21 @@ Unit aopt;
   End;
 
 
+Procedure TAsmOptimizer.PreRegallocOptimize;
+  begin
+BlockStart := tai(AsmL.First);
+PreRegAllocPeepHoleOptPass;
+  end;
+
+
 Destructor TAsmOptimizer.Destroy;
   Begin
-if assigned(LabelInfo^.LabelTable) then
-  Freemem(LabelInfo^.LabelTable);
-Dispose(LabelInfo);
+if assigned(LabelInfo) then
+  begin
+if assigned(LabelInfo^.LabelTable) then
+  Freemem(LabelInfo^.LabelTable);
+Dispose(LabelInfo);
+  end;
 inherited Destroy;
   End;
 
@@ -375,6 +389,16 @@ Unit aopt;
   End;
 
 
+procedure PreRegallocOptimize(AsmL:TAsmList);
+  var
+p : TAsmOptimizer;
+  begin
+p:=casmoptimizer.Create(AsmL);
+p.PreRegallocOptimize;
+p.free;
+  end;
+
+
 procedure Optimize(AsmL:TAsmList);
   var
 p : TAsmOptimizer;
diff --git a/compiler/aoptobj.pas b/compiler/aoptobj.pas
index 02fc330167..985c03ec2b 100644
--- a/compiler/aoptobj.pas
+++ b/compiler/aoptobj.pas
@@ -344,12 +344,16 @@ Unit AoptObj;
 procedure PeepHoleOptPass2; virtual;
 procedure PostPeepHoleOpts; virtual;
 
+procedure PreRegAllocPeepHoleOptPass; virtual;
+
 { processor dependent methods }
 // if it returns true, perform a "continue"
 function PrePeepHoleOptsCpu(var p: tai): boolean; virtual;
 function PeepHoleOptPass1Cpu(var p: tai): boolean; virtual;
 function PeepHoleOptPass2Cpu(var p: tai): boolean; virtual;
 function PostPeepHoleOptsCpu(var p: tai): boolean; virtual;
+
+function PreRegAllocPeepHoleOptPassCpu(var p : tai) : boolean; virtual;
   End;
 
Function ArrayRefsEq(const r1, r2: TReference): Boolean;
@@ -1572,4 +1576,25 @@ Unit AoptObj;
 result := false;
   end;
 
+
+procedure TAOptObj.PreRegAllocPeepHoleOptPass;
+  var
+p: tai;
+  begin
+p := BlockStart;
+while (p <> BlockEnd) Do
+  begin
+if PreRegAllocPeepHoleOptPassCPU(p) then
+  continue;
+p:=tai(p.next);
+  end;
+  end;
+
+
+function TAOptObj.PreRegAllocPeepHoleOptPassCpu(var p: tai): boolean;
+  begin
+result := false;
+  end;
+
+
 End.
diff --git a/compiler/psub.pas b/compiler/psub.pas
index 95e5030631..b070145d77 100644
--- a/compiler/psub.pas
+++ b/compiler/psub.pas
@@ -1585,6 +1585,10 @@ implementation
 }
 
 
+if (cs_opt_level1 in current_settings.optimizerswitches) and
+   { do not optimize pure assembler procedures }
+   not(pi_is_assembler in flags)  then
+  PreRegallocOptimize(aktproccode);
 {$ifndef NoOpt}
 {$ifndef i386}
 if (cs_opt_scheduler in current_settings.optimizerswitches)

Re: [fpc-devel] Kit's ambitions!

2018-06-13 Thread J. Gareth Moreton
I haven't fully uncovered the secrets of 
the compiler yet, but I did notice "pre-
peephole pass" under x86, but I think the 
only functions it touched was one of the 
bit shifts. Does this occur before 
register allocation or was it just 
something that had to be done before Pass 
1?

Gareth

On Wed 13/06/18 20:29 , Florian Klämpfl flor...@freepascal.org sent:
> On 12.06.2018 at 23:27, J. Gareth Moreton wrote:
> > Ideally yes, but this occurs after peephole optimisations where all of
> > the register allocations have already been made.  Doing the peephole
> > and deep optimisations while the registers are still in a virtual state
> > would be better overall, but may require a huge overhaul of the
> > compiler that might be asking for too much trouble.  There's also the
> > issue that some commands only work with certain registers, and
> > optimisations have to be careful of that fact.
>
> This is not that hard actually. The only difference is how register
> allocations are handled. Just look at the scheduler pass of arm, it
> works also before register allocation (and afterwards).


Re: [fpc-devel] Kit's ambitions!

2018-06-13 Thread Florian Klämpfl

On 12.06.2018 at 23:27, J. Gareth Moreton wrote:
> Ideally yes, but this occurs after peephole optimisations where all of the register allocations have already been made.
> Doing the peephole and deep optimisations while the registers are still in a virtual state would be better overall, but
> may require a huge overhaul of the compiler that might be asking for too much trouble.  There's also the issue that some
> commands only work with certain registers, and optimisations have to be careful of that fact.


This is not that hard actually. The only difference is how register allocations are handled. Just look at the scheduler 
pass of arm, it works also before register allocation (and afterwards).



Re: [fpc-devel] Kit's ambitions!

2018-06-13 Thread Florian Klämpfl

On 12.06.2018 at 23:45, nick...@gmail.com wrote:

> On Mon, 2018-06-11 at 21:07 +0100, J. Gareth Moreton wrote:
>
> > Thanks David,
> >
> > I'm still learning some of the nuances of the Intel and AMD
> > processors, but most of it is just logical analysis.  Admittedly my
> > main drive has been to shrink down the size of the binary, since
> > Delphi and Free Pascal have always been a little bit bloated in
> > comparison.  Not that it is necessarily a bad thing, but saving space
> > without sacrificing performance can only be a good thing, especially
> > for those with limited bandwidth or for saving those few precious
> > bytes when burning files to a CD or DVD.
> >
> > There have been a few instances in the compiled compiler (my main
> > test case) where an entire register is freed up due to my deep
> > optimisation, and that means the corresponding "push" and "pop" at
> > either end of the procedure can be removed (along with the
> > corresponding stack unwinding information), although I haven't
> > started programming that yet.
>
> Isn't it better to perform this optimization before register
> allocation? Then, when this happens, the corresponding "push" and "pop"
> wouldn't even be put by the compiler, because the register wouldn't
> have to be spilled.


Yes, this is what I already started once: a peephole optimizer pass that can be run before register allocation and
that carries out, in particular, optimizations which reduce register usage.
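
To make the register-usage argument concrete, here is a toy model in Python. It is not how FPC's register allocator works, only a sketch under stated assumptions: straight-line code, each virtual register written once, and made-up names (`v0` to `v4`, `peak_pressure`). It counts how many virtual registers are live at once before and after a redundant copy is removed.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, List[str]]  # (register written, registers read)


def peak_pressure(code: List[Instr]) -> int:
    """Largest number of virtual registers live on entry to any instruction."""
    start: Dict[str, int] = {}
    end: Dict[str, int] = {}
    for i, (dst, srcs) in enumerate(code):
        for r in srcs:
            start.setdefault(r, -1)  # read before any write: live on entry
            end[r] = i               # remember the last read
        start.setdefault(dst, i)
        end.setdefault(dst, i)
    return max(
        (sum(1 for r in start if start[r] < i <= end[r])
         for i in range(len(code))),
        default=0,
    )


# v1 is only a copy of the parameter v0, yet it stays live to the end.
before = [("v1", ["v0"]), ("v2", ["v1"]),
          ("v3", ["v0", "v2"]), ("v4", ["v1", "v3"])]
# Same computation with every use of v1 replaced by v0 and the copy dropped.
after = [("v2", ["v0"]), ("v3", ["v0", "v2"]), ("v4", ["v0", "v3"])]

print(peak_pressure(before), peak_pressure(after))
```

If only two registers were free to use without saving, the first version would need a third one, and with it a save and restore at either end of the procedure, while the second version would not. That is the effect being discussed, here in a deliberately simplified form.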



Re: [fpc-devel] Kit's ambitions!

2018-06-12 Thread J. Gareth Moreton
 Ideally yes, but this occurs after peephole optimisations where all of the
register allocations have already been made.  Doing the peephole and deep
optimisations while the registers are still in a virtual state would be
better overall, but may require a huge overhaul of the compiler that might
be asking for too much trouble.  There's also the issue that some commands
only work with certain registers, and optimisations have to be careful of
that fact.

 Gareth

 On Tue 12/06/18 22:45 , nick...@gmail.com sent:
 On Mon, 2018-06-11 at 21:07 +0100, J. Gareth Moreton wrote: 
 > Thanks David, 
 > 
 > I'm still learning some of the nuances of the Intel and AMD 
 > processors, but most of it is just logical analysis. Admittedly my 
 > main drive has been to shrink down the size of the binary, since 
 > Delphi and Free Pascal have always been a little bit bloated in 
 > comparison. Not that it is necessarily a bad thing, but saving space 
 > without sacrificing performance can only be a good thing, especially 
 > for those with limited bandwidth or for saving those few precious 
 > bytes when burning files to a CD or DVD. 
 > 
 > There have been a few instances in the compiled compiler (my main 
 > test case) where an entire register is freed up due to my deep 
 > optimisation, and that means the corresponding "push" and "pop" at 
 > either end of the procedure can be removed (along with the 
 > corresponding stack unwinding information), although I haven't 
 > started programming that yet. 

 Isn't it better to perform this optimization before register 
 allocation. Then, when this happens, the corresponding "push" and "pop" 
 wouldn't even be put by the compiler, because the register wouldn't 
 have to be spilled. 

 Nikolay 


Re: [fpc-devel] Kit's ambitions!

2018-06-12 Thread nickysn
On Mon, 2018-06-11 at 21:07 +0100, J. Gareth Moreton wrote:
> Thanks David,
> 
> I'm still learning some of the nuances of the Intel and AMD
> processors, but most of it is just logical analysis.  Admittedly my
> main drive has been to shrink down the size of the binary, since
> Delphi and Free Pascal have always been a little bit bloated in
> comparison.  Not that it is necessarily a bad thing, but saving space
> without sacrificing performance can only be a good thing, especially
> for those with limited bandwidth or for saving those few precious
> bytes when burning files to a CD or DVD.
> 
> There have been a few instances in the compiled compiler (my main
> test case) where an entire register is freed up due to my deep
> optimisation, and that means the corresponding "push" and "pop" at
> either end of the procedure can be removed (along with the
> corresponding stack unwinding information), although I haven't
> started programming that yet.

Isn't it better to perform this optimization before register
allocation? Then, when this happens, the corresponding "push" and "pop"
wouldn't even be put by the compiler, because the register wouldn't
have to be spilled.

Nikolay

> 
> I am ready to submit this part of my deep optimiser as a patch.  I'm
> just waiting for Florian's acceptance or rejection of my debug strip
> patch - https://bugs.freepascal.org/view.php?id=33798 (the 3rd
> attempt!) - only because it shares some debugging code with said
> patch (it was useful to monitor how the registers inside references
> were changed).  If it's rejected, it just means I'll have to change
> some of that debugging code a bit.
> 
> Gareth aka. Kit
> 
> 
> On Mon 11/06/18 20:27 , David Pethes pub...@satd.sk sent:
> > Hi, 
> > nice work. 
> > 
> > On 8. 6. 2018 0:46, J. Gareth Moreton wrote: 
> > 
> > > The deep optimiser changes this to: 
> > > 
> > > movq %rcx,%rax 
> > > movq %rdx,%rsi 
> > > movq %rcx,%rbx 
> > > 
> > > It determines, for the third MOV, it can 
> > > change %rax for %rcx to minimise a 
> > > pipeline stall, and then knows that %rbx 
> > > and %rcx contain the same value, so can 
> > > remove the 4th MOV completely. Given that 
> > > modern processors usually have at least 3 
> > > ALUs and the interdependencies have been 
> > > removed, this will likely give a speed 
> > > increase of one cycle over these few 
> > > commands. 
> > 
> > Note that modern cpu-s can use move elimination for reg to reg moves,
> > so it doesn't cost any execution resources (it's "free"). Despite that
> > it's still a win, because it spares both bytes in I-cache and decoder
> > bandwidth (which can indirectly lead to some spared cycle(s) at other
> > places).
> > David 


Re: [fpc-devel] Kit's ambitions!

2018-06-11 Thread J. Gareth Moreton
 Thanks David,

 I'm still learning some of the nuances of the Intel and AMD processors,
but most of it is just logical analysis.  Admittedly my main drive has
been to shrink down the size of the binary, since Delphi and Free Pascal
have always been a little bit bloated in comparison.  Not that it is
necessarily a bad thing, but saving space without sacrificing performance
can only be a good thing, especially for those with limited bandwidth or
for saving those few precious bytes when burning files to a CD or DVD.

 There have been a few instances in the compiled compiler (my main test
case) where an entire register is freed up due to my deep optimisation, and
that means the corresponding "push" and "pop" at either end of the
procedure can be removed (along with the corresponding stack unwinding
information), although I haven't started programming that yet.

 I am ready to submit this part of my deep optimiser as a patch.  I'm just
waiting for Florian's acceptance or rejection of my debug strip patch -
https://bugs.freepascal.org/view.php?id=33798 (the 3rd attempt!) - only
because it shares some debugging code with said patch (it was useful to
monitor how the registers inside references were changed).  If it's
rejected, it just means I'll have to change some of that debugging code a
bit.

 Gareth aka. Kit 

 On Mon 11/06/18 20:27 , David Pethes pub...@satd.sk sent:
 Hi, 
 nice work. 

 On 8. 6. 2018 0:46, J. Gareth Moreton wrote: 

 > The deep optimiser changes this to: 
 > 
 > movq %rcx,%rax 
 > movq %rdx,%rsi 
 > movq %rcx,%rbx 
 > 
 > It determines, for the third MOV, it can 
 > change %rax for %rcx to minimise a 
 > pipeline stall, and then knows that %rbx 
 > and %rcx contain the same value, so can 
 > remove the 4th MOV completely. Given that 
 > modern processors usually have at least 3 
 > ALUs and the interdependencies have been 
 > removed, this will likely give a speed 
 > increase of one cycle over these few 
 > commands. 

 Note that modern cpu-s can use move elimination for reg to reg moves, so 
 it doesn't cost any execution resources (it's "free"). Despite that it's 
 still a win, because it spares both bytes in I-cache and decoder 
 bandwidth (which can indirectly lead to some spared cycle(s) at other 
 places). 

 David 


Re: [fpc-devel] Kit's ambitions!

2018-06-11 Thread David Pethes
Hi,
nice work.

On 8. 6. 2018 0:46, J. Gareth Moreton wrote:

> The deep optimiser changes this to:
> 
> movq %rcx,%rax
> movq %rdx,%rsi
> movq %rcx,%rbx
> 
> It determines, for the third MOV, it can 
> change %rax for %rcx to minimise a 
> pipeline stall, and then knows that %rbx 
> and %rcx contain the same value, so can 
> remove the 4th MOV completely. Given that 
> modern processors usually have at least 3 
> ALUs and the interdependencies have been 
> removed, this will likely give a speed 
> increase of one cycle over these few 
> commands.

Note that modern cpu-s can use move elimination for reg to reg moves, so
it doesn't cost any execution resources (it's "free"). Despite that it's
still a win, because it spares both bytes in I-cache and decoder
bandwidth (which can indirectly lead to some spared cycle(s) at other
places).

David


Re: [fpc-devel] Kit's ambitions!

2018-06-07 Thread J. Gareth Moreton
So a progress update.

I've tied in part of my deep optimiser 
into the peephole optimiser, specifically 
PostPeepholeOptMov, and it's had some 
unexpected benefits. One of the things it 
does is start with a MOV command that 
copies a register's contents into another, 
then looks at subsequent reference 
addresses to see if it can swap out one 
register for another, to reduce the chance 
of a pipeline stall. There are cases where 
it's noticed that all such registers have 
been switched in a certain block and hence 
safely removes the original MOV command.

What this means is that as well as 
reducing the chances of a pipeline stall, 
it's removing unnecessary assignments.

My main test case has been compiling the 
compiler, since it's sufficiently complex 
and easy to crash if incorrect machine 
code is produced, and it also gives plenty 
of examples of optimisation. As a very 
brief example, in 
compiler/x86_64/symcpu.pas in 
TCPUProcDef.ppuload_platform, the first 
four lines are:

movq %rcx,%rax
movq %rdx,%rsi
movq %rax,%rbx
movq %rbx,%rcx

The deep optimiser changes this to:

movq %rcx,%rax
movq %rdx,%rsi
movq %rcx,%rbx

It determines, for the third MOV, it can 
change %rax for %rcx to minimise a 
pipeline stall, and then knows that %rbx 
and %rcx contain the same value, so can 
remove the 4th MOV completely. Given that 
modern processors usually have at least 3 
ALUs and the interdependencies have been 
removed, this will likely give a speed 
increase of one cycle over these few 
commands.
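
The substitution described above can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not the FPC code: the name forward_copies and the (src, dst) tuple form are hypothetical, and it handles nothing but straight-line register-to-register MOVs.

```python
def forward_copies(moves):
    """Rewrite straight-line register-to-register moves.

    `moves` is a list of (src, dst) pairs, one per `movq src,dst`.
    Hypothetical helper for illustration only.
    """
    origin = {}  # register -> register its value was originally copied from
    out = []
    for src, dst in moves:
        root = origin.get(src, src)      # read from the earliest holder of the value
        if origin.get(dst, dst) == root:
            continue                     # dst already holds this value: MOV is redundant
        # dst is about to change, so forget every copy that involved it
        origin = {r: o for r, o in origin.items() if o != dst and r != dst}
        origin[dst] = root
        out.append((root, dst))
    return out


example = [("%rcx", "%rax"), ("%rdx", "%rsi"), ("%rax", "%rbx"), ("%rbx", "%rcx")]
for src, dst in forward_copies(example):
    print(f"movq {src},{dst}")
```

Fed the four MOVs from ppuload_platform, the sketch emits the same three-instruction sequence shown above. A real pass would also have to account for every other instruction that reads or writes these registers.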

Before I go submitting patches though, I 
still need to test it under Linux and 
i386.

Kit


Re: [fpc-devel] Kit's ambitions!

2018-06-03 Thread J. Gareth Moreton
 So far, I'm researching the optimisation as listed below... tracking
registers with identical values and changing them to minimise pipeline
stalls.  Because I don't need to keep track of their actual values, just
whether they've changed since a particular MOV instruction, I've managed to
move this into the peephole optimiser as an extension to
TX86AsmOptimizer.PostPeepholeOptMov().

 It's a bit more difficult than it looks though - I've had a lot of crashes
so far when it changes a register when it shouldn't do, but I'm ironing out
the bugs one by one.  To truly see the gains though, one would need to
perform some kind of intense timing comparison.

 This would be the first step in the step-by-step implementation.  More
in-depth deep data-flow optimisation, like successfully merging div and mod
instructions of the same numerator and denominator will require some more
care and thought, especially as the two division operations may not use the
same registers (if successful though, it will improve the compiler itself,
since it has "x div 1000" and "x mod 1000" side-by-side in a couple of
places, a common pair of expressions to produce a human-readable time
metric, e.g. seconds and milliseconds).
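
 The div/mod merge works because the x86 DIV instruction already leaves
both the quotient and the remainder in registers. As a rough illustration
only (Python rather than compiler code, with a hypothetical split_ms
helper), the "x div 1000" / "x mod 1000" pair collapses into one combined
operation:

```python
def split_ms(total_ms):
    """Split a millisecond count into (seconds, milliseconds).

    Hypothetical helper: divmod computes quotient and remainder together,
    which is the effect the merged "div" is meant to achieve.
    """
    seconds, millis = divmod(total_ms, 1000)
    return seconds, millis


print(split_ms(12345))
```
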
 Gareth aka. Kit

 On Sun 03/06/18 14:12 , Florian Klämpfl flor...@freepascal.org sent:
 On 21.05.2018 at 21:05, J. Gareth Moreton wrote:
 > Would you object to me trying anyway, Florian?

 No, feel free to go ahead, but it needs to be done step by step.

 > It might be that I run into the same problems you had and it's too
 > unsafe, but I'm going by a conservative philosophy in that if it spots
 > something that it can't work out (e.g. an instruction that it's not
 > programmed to handle) or is potentially unsafe (e.g. reading and writing
 > to a block of memory that it doesn't have control over, due to
 > multi-threading issues), then it just stops optimising and drops all
 > assumptions that it has made at that point.
 >
 > As a small test case, I'm attempting to see if I can spot and optimise,
 > for example, "mov %rax, %rbx; lea %rcx, -8(%rsp); mov %rbx, 8(%rsp)",
 > where a pipeline stall occurs due to a read-after-write penalty (with
 > %rbx in this case).

 Things like this are fine, it gets hairy though as soon as memory
 locations are involved.


Re: [fpc-devel] Kit's ambitions!

2018-06-03 Thread Florian Klämpfl

On 21.05.2018 at 21:05, J. Gareth Moreton wrote:
> Would you object to me trying anyway, Florian?

No, feel free to go ahead, but it needs to be done step by step.

> It might be that I run into the same problems you had and it's too
> unsafe, but I'm going by a conservative philosophy in that if it spots
> something that it can't work out (e.g. an instruction that it's not
> programmed to handle) or is potentially unsafe (e.g. reading and writing
> to a block of memory that it doesn't have control over, due to
> multi-threading issues), then it just stops optimising and drops all
> assumptions that it has made at that point.
>
> As a small test case, I'm attempting to see if I can spot and optimise,
> for example, "mov %rax, %rbx; lea %rcx, -8(%rsp); mov %rbx, 8(%rsp)",
> where a pipeline stall occurs due to a read-after-write penalty (with
> %rbx in this case).

Things like this are fine, it gets hairy though as soon as memory locations are
involved.


Re: [fpc-devel] Kit's ambitions!

2018-05-21 Thread J. Gareth Moreton
 Would you object to me trying anyway, Florian? It might be that I run into
the same problems you had and it's too unsafe, but I'm going by a
conservative philosophy in that if it spots something that it can't work
out (e.g. an instruction that it's not programmed to handle) or is
potentially unsafe (e.g. reading and writing to a block of memory that it
doesn't have control over, due to multi-threading issues), then it just
stops optimising and drops all assumptions that it has made at that point.
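
 A minimal Python sketch of that "give up and forget everything" rule,
assuming a made-up (opcode, src, dst) tuple form and a hypothetical
surviving_facts helper; the whitelist of understood opcodes is likewise an
assumption for illustration:

```python
def surviving_facts(instructions, understood=("mov",)):
    """Return the {dst: src} register copies still trusted at the end.

    Hypothetical model: any opcode outside `understood`, or any operand
    that looks like a memory reference, wipes all accumulated assumptions.
    """
    facts = {}
    for opcode, src, dst in instructions:
        if opcode not in understood or "(" in src or "(" in dst:
            facts.clear()   # unknown or memory-touching: drop every assumption
            continue
        # dst changes, so any fact that mentions it is no longer valid
        facts = {d: s for d, s in facts.items() if d != dst and s != dst}
        facts[dst] = src
    return facts


print(surviving_facts([("mov", "%rax", "%rbx")]))
print(surviving_facts([("mov", "%rax", "%rbx"), ("cpuid", "", "")]))
```

 Treating every memory operand as a reason to reset is deliberately
stricter than necessary; it simply mirrors the conservative stance taken
above.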

 As a small test case, I'm attempting to see if I can spot and optimise,
for example, "mov %rax, %rbx; lea %rcx, -8(%rsp); mov %rbx, 8(%rsp)", where
a pipeline stall occurs due to a read-after-write penalty (with %rbx in
this case).  Normally the peephole optimiser will sort out such a problem,
but because LEA appears in between the MOV's, it won't catch it.  The
other thing is attempting to merge unsigned division and modulus into a
single "div" operation - that was actually my intention behind
"Minor_div_improvement" in issue #32984 - by changing SBB to SETAE, it
preserves the flags register so that I can make better use of it in such a
deep optimiser, as I can be certain as to what its value is at that point
(or at the very least know it hasn't changed).
 More than anything, it's a personal research project - I'd be over the
moon if it works and it holds up to every test case thrown at it, but if it
doesn't, well, at least I know more about assembly language and have gained
experience to improve the peephole optimiser and other elements of FPC.

 Gareth aka. Kit

 On Mon 21/05/18 20:44 , Florian Klämpfl flor...@freepascal.org sent:
 On 13.05.2018 at 21:02, Christo wrote:
 > On Sun, 2018-05-13 at 03:28 +0100, J. Gareth Moreton wrote:
 >>  Expand on Data Flow Analysis in the compiler.
 >>
 >> What I personally call the "Deep Optimizer", I'm proposing an
 >> assembler-level optimisation system (although it won't touch pure
 >> assembler routines) that rearranges commands and changes registers in
 >> order to minimise pipeline stalls and to also collapse a "div" and "mod"
 >> operation into a single instruction where possible.
 >
 > I would also like the data flow analyzer to look at inline assembler and
 > emit hints and warnings if it picks up something incoherent.

 We already had a data flow analyzer for assembler in 1.0.x and early 2.x
 times; however, it was disabled after several years as it was too hard to
 make it work safely.


Re: [fpc-devel] Kit's ambitions!

2018-05-21 Thread Florian Klämpfl
On 13.05.2018 at 21:02, Christo wrote:
> On Sun, 2018-05-13 at 03:28 +0100, J. Gareth Moreton wrote:
>>  Expand on Data Flow Analysis in the compiler.
>>
>> What I personally call the "Deep Optimizer", I'm proposing an
>> assembler-level optimisation system (although it won't touch pure
>> assembler routines) that rearranges commands and changes registers in
>> order to minimise pipeline stalls and to also collapse a "div" and "mod"
>> operation into a single instruction where possible.
>
> I would also like the data flow analyzer to look at inline assembler and
> emit hints and warnings if it picks up something incoherent.

We already had a data flow analyzer for assembler in 1.0.x and early 2.x
times; however, it was disabled after several years as it was too hard to
make it work safely.


Re: [fpc-devel] Kit's ambitions!

2018-05-18 Thread J. Gareth Moreton
What you've shown suggests that the 
routine is NOT inlined, as it's building a 
stack frame and the 'add' operations at 
the very top look like padding (they're 
all zeroes in machine code) to align the 
procedure to a 16 byte boundary, and would 
crash the program if directly executed.

Look at the disassembly where your 
function is called - the presence of CALL 
will tell you it is not inlined.

Gareth aka. Kit


Re: [fpc-devel] Kit's ambitions!

2018-05-18 Thread Thorsten Engler
From: fpc-devel  On Behalf Of Wolf
Sent: Friday, 18 May 2018 07:27

> This is the disassembly of function GetProcessorUsed: longint; inline;
> Unless you advise me otherwise, I take the absence of function
> GetProcessorUsed: longint; inline; mentioned anywhere in this screen print
> that GetProcessorUsed is indeed inlined. And in the face of your incredulity,
> I need to remind you that I get all the compiler complaints about inlining
> unless I restrict memory operations to the local stack, as outlined in my
> original message.

The line numbers mentioned in your disassembly screen shot don’t line up with 
the source code you’ve previously posted, and the screenshot doesn’t show 
enough context to say anything about this being either simply the disassembly 
of the GetProcessorUsed function or really a place where you called 
GetProcessorUsed in source code and the compiler inlined the call.

 

But given the fact that you can see it building a stack frame, I would strongly 
suspect that you are simply showing the disassembly of GetProcessorUsed here 
instead of a call site which is calling GetProcessorUsed and has the call 
inlined.



Re: [fpc-devel] Kit's ambitions!

2018-05-18 Thread Wolf

Hi Gareth,



This is the disassembly of function GetProcessorUsed: longint; inline;
Unless you advise me otherwise, I take the absence of function
GetProcessorUsed: longint; inline; mentioned anywhere in this screen
print that GetProcessorUsed is indeed inlined. And in the face of your
incredulity, I need to remind you that I get all the compiler complaints
about inlining unless I restrict memory operations to the local stack,
as outlined in my original message.


Wolf



On 17/05/2018 10:42, J. Gareth Moreton wrote:
> Unless I'm mistaken, Wolf, you cannot inline procedures that have asm
> blocks appearing anywhere (this includes the entire procedure).
> Nevertheless, does the disassembly of your program show it to be inlined?
>
> Gareth aka. Kit




Re: [fpc-devel] Kit's ambitions!

2018-05-17 Thread David Pethes
Hi,

On 17. 5. 2018 0:56, Wolf wrote:
>
>
> On 14/05/2018 04:30, David Pethes wrote:
>> Hi,
>> I would welcome inlining of (simple) asm routines.
>>
> I do not know what you consider to be the existing obstacles to inlining
> assembler routines.

That this doesn't get inlined :) (FPC trunk, few weeks old, Win64)

function do_bextr(bits, extr: integer): integer; assembler;
nostackframe; inline;
asm
  bextr eax, ecx, edx
end;

As per your advice (if I get it correctly) this doesn't either:

function do_bextr(bits, extr: integer): integer; inline;
var foo:integer;
begin
  asm
    mov ecx, bits
    mov edx, extr
    bextr eax, ecx, edx
    mov foo, eax
  end['rax', 'rcx', 'rdx'];
  result := foo;
end;


Regarding your measurement function - it looks complicated, but I don't know
what the scenario is. Though if you're already at the level of measuring
ticks and the function doesn't take more than a few hundred thousand
ticks, rdtsc is ok. Also, be sure to measure in the real program as well
- measuring and optimizing in isolation under different conditions can
be misleading.

FFmpeg used something like this:
https://github.com/dpethes/fevh264/blob/master/core/bench.inc - place
start/stop_timer around your measured function and run at least a few
hundred runs, until it stabilizes. It skips some outliers, which
your code seems to do as well, but under different conditions.
Advantages: dead simple, easy to integrate
Disadvantages: no instruction serialization (cpuid)
So it's good for relative comparisons (did I make the function
faster/slower?), but not for absolute tick count (how many ticks does
this _precisely_ take?).
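
The start/stop-timer idea with outlier skipping can be sketched roughly as
follows. This is Python rather than the linked bench.inc, and the names
running_average and measure, the factor of 8 and the use of
time.perf_counter_ns in place of rdtsc are all assumptions made for
illustration:

```python
import time


def running_average(samples, factor=8):
    """Average the samples, skipping any that exceed `factor` times the
    running mean. The first two samples are always accepted."""
    total = count = skipped = 0
    for s in samples:
        if count < 2 or s < factor * total / count:
            total += s
            count += 1
        else:
            skipped += 1          # outlier, e.g. an interrupt or a cache miss
    if count == 0:
        return 0.0, skipped
    return total / count, skipped


def measure(func, runs=500):
    """Time `func` over many runs; returns (average_ns, skipped_runs)."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        func()
        samples.append(time.perf_counter_ns() - start)
    return running_average(samples)


print(running_average([100, 100, 100, 10000, 100]))
```

As with the original, this only supports relative comparisons between two
versions of a function measured under the same conditions.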

David



Re: [fpc-devel] Kit's ambitions!

2018-05-16 Thread J. Gareth Moreton
 Unless I'm mistaken, Wolf, you cannot inline procedures that have asm
blocks appearing anywhere (this includes the entire procedure). 
Nevertheless, does the disassembly of your program show it to be inlined?

 Gareth aka. Kit


Re: [fpc-devel] Kit's ambitions!

2018-05-16 Thread Wolf



On 14/05/2018 04:30, David Pethes wrote:

Hi,
I would welcome inlining of (simple) asm routines.

I do not know what you consider to be the existing obstacles to inlining 
assembler routines. What I do know is that in the attached program, 
inlining does work. It summarises my (current) understanding of how to 
measure time with nanosecond reliability
(asking for time via the Linux function "if 
clock_gettime(CLOCK_MONOTONIC, @ts)=0 then" does indeed return 
nanoseconds, but takes some 270 ns (or about 1000 clock ticks) to 
execute and thus does not produce nanosecond reliability)
but repeated measurements do not produce the same output, and therefore 
my little program does not have the reliability I want. Statistical 
processing does something to improve the situation, but not quite what I 
want.


What I can say about inlining assembler routines is this: if the 
variables onto which registers are to be saved are on the stack, they 
can be inlined. Never mind the hints in Lazarus' message pane. Take the

function GetProcessorUsed: longint;    inline;
var
  ProcUsed: longint;
begin
  asm
    CPUID
    .byte 0x0F, 0x01, 0xF9  // read the Time-Stamp Counter rdtscp (as op-code format),
    movl %ecx, ProcUsed     // This is the processor on which measurements take place. Measurements on other processors are discarded.
  end  ['eax','ebx','ecx','edx'];
  GetProcessorUsed:=ProcUsed;
end;

Because ProcUsed is on the stack, I can move %ecx into it. But I
cannot get %ecx directly into GetProcessorUsed. That requires a
separate line of code.


wolf

Here is the full code, as promised. If anybody has a suggestion on how 
to improve it, please let me know, in a separate thread.


program Speed_Test;
{$ASMMODE att}

uses sysutils, Linux, math;
type
  TtscCount = record
  Group: longint;
  Count: longint;
  CumFreq: Int64;
  end;
type
  TCumFreq = record
  Group: longint;
  CumFreq: real;
  end;
  TCumFrequency= array of TCumFreq;
  TTimeSpec = record
    tv_sec: int64;  //time_t;    //Seconds
    tv_nsec: int64; //clong; //Nanoseconds
  end;
var
  TscCount: array of TtscCount;
  Measured: TCumFrequency;
  MeasurementsToDo: int64=100;
  ProcessorUsed: LongInt;
  Range: array[0..] of longint;
  ValidMeasurements: Int64;

function Get_ClockFreq(CPU: Char): real;
{Since there is no way I can find to extract actual clock frequency, I 
read it from /proc/cpuinfo }

var
  FileHandle: LongInt;
  i: integer;
  Data: ansistring;
  rc:real;
  NumRead: int64;
  Buffer : packed array[0..4095] of char;
  SourceFile: AnsiString= '/proc/cpuinfo';
begin
  if not FileExists(SourceFile) then
  begin
    writeln('Error: Input file "',SourceFile,'" has not been found');
    halt;
  end;
  FileHandle:=FileOpen('/proc/cpuinfo',fmOpenRead);
  NumRead:=FileRead(FileHandle, Buffer,SizeOf(Buffer));
  Data:=Buffer[0..NumRead];
  i:=0;
  while i<=NumRead do
  begin
    inc(i);
    if CompareText(Data[i..i+8],'Processor')=0 then
    begin
      if char(Data[i+12])=CPU then
      begin
        i:=i+12;
        repeat inc(i); until CompareText(Data[i..i+6],'cpu MHz')=0 ;
        try
          rc:=StrToFloat(Data[i+11..i+18]);
        except
          on E : exception do
          begin
            writeln('Data read error: cannot convert ',Data[i+11..i+18],' into number');
            writeln('Program aborted');
            halt;
          end;
        end;
        break;
      end;
    end;
  end;
  FileClose(FileHandle);
  Get_ClockFreq:=rc;
end;

procedure ReadProcessorFrequencyInformationLeaf;  inline;
var
  CPUID_16H_AX: Word;  // Processor Base Frequency (in MHz)
  CPUID_16H_BX: Word;  // Maximum Frequency (in MHz)
  CPUID_16H_CX: Word;  // Bus (Reference) frequency (in MHz)
  CPUID_16H_DX: Word;  // Reserved = 0
begin
  CPUID_16H_AX:=0;
  CPUID_16H_BX:=0;
  CPUID_16H_CX:=0;
  asm
    mov $0x16, %eax   // select Processor Frequency Information Leaf 0x16
    cpuid // access it
    mov %ax, CPUID_16H_AX // Processor Base Frequency (in MHz)
    mov %bx, CPUID_16H_BX // Maximum Frequency (in MHz)
    mov %cx, CPUID_16H_CX // Bus (Reference) frequency (in MHz)
    mov %dx, CPUID_16H_DX  // Reserved = 0
  end  ['ax','bx','cx','dx'];
end;

function GetProcessorUsed: longint;    inline;
var
  ProcUsed: longint;
begin
  asm
    CPUID
    .byte 0x0F, 0x01, 0xF9  // read the Time-Stamp Counter rdtscp (as op-code format),
    movl %ecx, ProcUsed    // This is the processor on which measurements take place. Measurements on other processors are discarded.
  end  ['eax','ebx','ecx','edx'];
  GetProcessorUsed:=ProcUsed;
end;

procedure MeasureCode;
var
  ts: TTimeSpec;
  MilliSecondTime: extended;
  AX, BX, CX: Word;
  Start,Stop,i,k,l: int64;   // saves starting value from the Time Stamp counter
  Hi: int64;
  x:real;
  y: real=2;
  ProcessorUsed_Start, ProcessorUsed_Sto

Re: [fpc-devel] Kit's ambitions!

2018-05-16 Thread David Pethes
Hi,
I would welcome inlining of (simple) asm routines. Lately I wanted to
use the BEXTR instruction to speed up some inlined bit reading
functions. As there's no intrinsic for it and including even a simple
assembly method disables inlining, it didn't go well.

As for using a BEXTR intrinsic instead: I'd like to try to add it, if
it's welcomed. Judging by searching for POPCNT it shouldn't be that much
work, but I'm likely to miss something - any advice is welcome.
There's at least one catch that I know of - there's no CPU target that
supports BMI1 but not BMI2 (there are several such AMD cpu-s), so it
should be added as well.


David

On 13. 5. 2018 4:28, J. Gareth Moreton wrote:

> - Research possibility for 'inline' support for certain assembler routines.
> 
> For situations where speed is of the highest priority, there are some
> internal functions such as Int and Frac that can theoretically be
> inlined (a procedure call is quite expensive, around 50 cycles), but
> because they are written in pure assembly language, the compiler will
> never inline them.  I'm still working out quite a bit of theory, but I
> believe I will be able to allow the inlining of routines that are leaf
> functions (don't have CALLs of their own) and declared as
> 'nostackframe'.  Such a system would allow the support of 'intrinsics'
> that can be composed programmatically rather than as internal routines,
> though it's not exactly what Florian is planning. Even if Florian does
> go for a different approach for intrinsics, I like to think that such
> inline support will have uses elsewhere, especially some of the routines
> like "GetStackFrame" (I think) that simply return the value of RSP (if
> it's 'inline', which it is actually declared as in the unit, the return
> value will be far more accurate).


Re: [fpc-devel] Kit's ambitions!

2018-05-15 Thread J. Gareth Moreton
 Thanks for your kind words, Christo. A lot of this is personal research,
but I would like to make some elements work.  For inline support on
assembler routines, I'm going to be rather conservative about what will be
successful - for example, if there exists a CALL operation inside the
routine, I will stop trying to inline it because there's no telling what it
will do to affect the registers.
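
 As a rough sketch of that conservative check (Python, with a hypothetical
may_inline helper; the 'nostackframe' condition comes from the original
proposal in this thread, and the mnemonic matching is deliberately naive):

```python
def may_inline(asm_lines, nostackframe):
    """Decide whether a pure-assembler routine is a candidate for inlining.

    Hypothetical rule set: the routine must be declared 'nostackframe'
    and must be a leaf, i.e. contain no CALL instruction.
    """
    if not nostackframe:
        return False
    for line in asm_lines:
        parts = line.split()
        # matches "call", "callq", "CALL", ... as the first token on the line
        if parts and parts[0].lower().startswith("call"):
            return False    # no telling what the callee does to the registers
    return True


print(may_inline(["bextr eax, ecx, edx"], nostackframe=True))
print(may_inline(["mov eax, ecx", "call Foo"], nostackframe=True))
```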

 For the Deep Optimizer, I've programmed a coloured text output of an
assembled procedure that I will probably make a feature that can be
included with a $DEFINE in the Free Pascal source code, as the intention of
it is to show what the assembled procedure is before it is optimised with
data flow analysis, and then what it is after optimisation.  Alongside
standard hints and warnings, it can easily be adapted to show things that
look suspicious such as an uninitialised register. We'll see.

 Gareth aka. Kit


Re: [fpc-devel] Kit's ambitions!

2018-05-15 Thread Christo
On Sun, 2018-05-13 at 03:28 +0100, J. Gareth Moreton wrote:
>  Expand on Data Flow Analysis in the compiler.
>
> What I personally call the "Deep Optimizer", I'm proposing an
> assembler-level optimisation system (although it won't touch pure
> assembler routines) that rearranges commands and changes registers in
> order to minimise pipeline stalls and to also collapse a "div" and "mod"
> operation into a single instruction where possible.

I would also like the data flow analyzer to look at inline assembler and
emit hints and warnings if it picks up something incoherent.

> - Research possibility for 'inline' support for certain assembler routines.

This has been requested before and the argument against this was that it is
difficult to predict side effects of assembler routines.  I suppose if data
flow analysis doesn't pick up side effects then it would be a useful feature.

I'm not a developer, but want to encourage your efforts.


Re: [fpc-devel] Kit's ambitions!

2018-05-15 Thread J. Gareth Moreton
 I don't know much about AVR yet, but who knows.  At the moment I'm
researching how inline Pascal procedures are compiled and seeing how it can
be translated into assembler routines. It might end up that I end up
writing a kind of mid-level assembler within the Pascal compiler that
allows the registers to be changed where possible.

 Unless its stability is proven, the Deep Optimizer and the assembly
inliner will be -O4 options, although I would like to get them down to -O3
ideally, especially the Deep Optimizer because that is purely at assembler
level where the registers are no longer virtual, and in most cases, lines
of code are being removed or rearranged rather than added.

 Gareth aka. Kit

 On Mon 14/05/18 18:07 , "Christo Crause" christo.cra...@gmail.com sent:
 This sounds great! Perhaps some of your work will trickle down to the AVR
target someday. 
 Best regards, Christo

