Re: [fpc-devel] Kit's ambitions!
Sorry, I just realised that was unfairly impatient of me. I've still got little things I can work on, but I'm worried about creating a large backlog.

Gareth

On Fri 15/06/18 21:11 , "J. Gareth Moreton" gar...@moreton-family.com sent:
> Something tells me that we should write our own patch.exe at some point to alleviate these shortcomings!
>
> Thanks for the patch again. Any word on what I've submitted so far? I ask because I found some new peephole optimisations that can make some good speed and size savings, but one of them requires a new Pass 1 function and will have to be merged into either the binary search list or the large case block, so I can't submit it until I know which way the source tree will go.
>
> Gareth aka. Kit
>
> On Fri 15/06/18 20:03 , Florian Klämpfl flor...@freepascal.org sent:
>> On 15.06.2018 at 18:17, J. Gareth Moreton wrote:
>>> Not much luck for me - the file won't patch without options or modifications, and using -p 1 to remove the "a/" and "b/" from the starts of the files causes an assertion in patch.exe.
>>
>> Sorry, my bad. The patch has Unix line feeds, which crash patch.exe for Windows. Try again with the attached one, or convert the line endings of the first one to Windows ones.

___
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] Kit's ambitions!
Something tells me that we should write our own patch.exe at some point to alleviate these shortcomings!

Thanks for the patch again. Any word on what I've submitted so far? I ask because I found some new peephole optimisations that can make some good speed and size savings, but one of them requires a new Pass 1 function and will have to be merged into either the binary search list or the large case block, so I can't submit it until I know which way the source tree will go.

Gareth aka. Kit

On Fri 15/06/18 20:03 , Florian Klämpfl flor...@freepascal.org sent:
> On 15.06.2018 at 18:17, J. Gareth Moreton wrote:
>> Not much luck for me - the file won't patch without options or modifications, and using -p 1 to remove the "a/" and "b/" from the starts of the files causes an assertion in patch.exe.
>
> Sorry, my bad. The patch has Unix line feeds, which crash patch.exe for Windows. Try again with the attached one, or convert the line endings of the first one to Windows ones.
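
For anyone hitting the same problem, the line-ending conversion Florian suggests can be done with a few lines of Free Pascal. This is only a minimal sketch: the program name lf2crlf and its two command-line arguments are made up for illustration, and it assumes the patch is small enough to read into memory in one go. It relies on AdjustLineBreaks from the SysUtils unit to do the actual conversion.

```pascal
program lf2crlf;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

{ Read a whole file into a string without touching its line endings }
function ReadFileToString(const FileName: string): string;
var
  F: TFileStream;
begin
  Result := '';
  F := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Result, F.Size);
    if F.Size > 0 then
      F.ReadBuffer(Result[1], F.Size);
  finally
    F.Free;
  end;
end;

{ Write a string to a file byte for byte }
procedure WriteStringToFile(const FileName, Data: string);
var
  F: TFileStream;
begin
  F := TFileStream.Create(FileName, fmCreate);
  try
    if Data <> '' then
      F.WriteBuffer(Data[1], Length(Data));
  finally
    F.Free;
  end;
end;

begin
  if ParamCount <> 2 then
  begin
    WriteLn('Usage: lf2crlf <input.patch> <output.patch>');
    Halt(1);
  end;
  { tlbsCRLF asks for Windows-style CR+LF line endings }
  WriteStringToFile(ParamStr(2),
    AdjustLineBreaks(ReadFileToString(ParamStr(1)), tlbsCRLF));
end.
```

The converted copy can then be fed to patch.exe in place of the original attachment.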
Re: [fpc-devel] Kit's ambitions!
On 15.06.2018 at 18:17, J. Gareth Moreton wrote:
> Not much luck for me - the file won't patch without options or modifications, and using -p 1 to remove the "a/" and "b/" from the starts of the files causes an assertion in patch.exe.

Sorry, my bad. The patch has Unix line feeds, which crash patch.exe for Windows. Try again with the attached one, or convert the line endings of the first one to Windows ones.

From 7c679b8365e7819f1c39e57c2c222b08bc935f72 Mon Sep 17 00:00:00 2001
From: Florian Klämpfl
Date: Sat, 11 Jun 2016 13:15:55 +0200
Subject: [PATCH] unfinished: + new pass to carry out peep hole optimizations
 before register allocation, this might reduce register pressure
 + x86: optmize LeaMov2Mov

---
 compiler/aopt.pas        | 32
 compiler/aoptobj.pas     | 25 +
 compiler/psub.pas        |  4
 compiler/x86/aoptx86.pas | 47 ++-
 4 files changed, 103 insertions(+), 5 deletions(-)

diff --git a/compiler/aopt.pas b/compiler/aopt.pas
index fde868713e..ad3590e1a1 100644
--- a/compiler/aopt.pas
+++ b/compiler/aopt.pas
@@ -39,9 +39,11 @@ Unit aopt;
         { _AsmL is the PAasmOutpout list that has to be optimized }
         Constructor create(_AsmL: TAsmList); virtual; reintroduce;

+        Destructor destroy;override;
+
         { call the necessary optimizer procedures }
         Procedure Optimize;virtual;
-        Destructor destroy;override;
+        procedure PreRegallocOptimize;virtual;

       private
         procedure FindLoHiLabels;
@@ -68,6 +70,7 @@ Unit aopt;
     casmoptimizer : TAsmOptimizerClass;
     cpreregallocscheduler : TAsmSchedulerClass;

+    procedure PreRegallocOptimize(AsmL:TAsmList);
     procedure Optimize(AsmL:TAsmList);
     procedure PreRegallocSchedule(AsmL:TAsmList);

@@ -86,6 +89,7 @@ Unit aopt;
         inherited create(_asml,nil,nil,nil);
         { setup labeltable, always necessary }
         New(LabelInfo);
+        LabelInfo^.LabelTable:=nil;
       End;

     procedure TAsmOptimizer.FindLoHiLabels;
@@ -315,11 +319,21 @@ Unit aopt;
       End;


+    Procedure TAsmOptimizer.PreRegallocOptimize;
+      begin
+        BlockStart := tai(AsmL.First);
+        PreRegAllocPeepHoleOptPass;
+      end;
+
+
     Destructor TAsmOptimizer.Destroy;
       Begin
-        if assigned(LabelInfo^.LabelTable) then
-          Freemem(LabelInfo^.LabelTable);
-        Dispose(LabelInfo);
+        if assigned(LabelInfo) then
+          begin
+            if assigned(LabelInfo^.LabelTable) then
+              Freemem(LabelInfo^.LabelTable);
+            Dispose(LabelInfo);
+          end;
         inherited Destroy;
       End;

@@ -375,6 +389,16 @@ Unit aopt;
       End;


+    procedure PreRegallocOptimize(AsmL:TAsmList);
+      var
+        p : TAsmOptimizer;
+      begin
+        p:=casmoptimizer.Create(AsmL);
+        p.PreRegallocOptimize;
+        p.free;
+      end;
+
+
     procedure Optimize(AsmL:TAsmList);
       var
         p : TAsmOptimizer;
diff --git a/compiler/aoptobj.pas b/compiler/aoptobj.pas
index 02fc330167..985c03ec2b 100644
--- a/compiler/aoptobj.pas
+++ b/compiler/aoptobj.pas
@@ -344,12 +344,16 @@ Unit AoptObj;
         procedure PeepHoleOptPass2; virtual;
         procedure PostPeepHoleOpts; virtual;

+        procedure PreRegAllocPeepHoleOptPass; virtual;
+
         { processor dependent methods }
         // if it returns true, perform a "continue"
         function PrePeepHoleOptsCpu(var p: tai): boolean; virtual;
         function PeepHoleOptPass1Cpu(var p: tai): boolean; virtual;
         function PeepHoleOptPass2Cpu(var p: tai): boolean; virtual;
         function PostPeepHoleOptsCpu(var p: tai): boolean; virtual;
+
+        function PreRegAllocPeepHoleOptPassCpu(var p : tai) : boolean; virtual;
       End;

     Function ArrayRefsEq(const r1, r2: TReference): Boolean;
@@ -1572,4 +1576,25 @@ Unit AoptObj;
         result := false;
       end;

+
+    procedure TAOptObj.PreRegAllocPeepHoleOptPass;
+      var
+        p: tai;
+      begin
+        p := BlockStart;
+        while (p <> BlockEnd) Do
+          begin
+            if PreRegAllocPeepHoleOptPassCPU(p) then
+              continue;
+            p:=tai(p.next);
+          end;
+      end;
+
+
+    function TAOptObj.PreRegAllocPeepHoleOptPassCpu(var p: tai): boolean;
+      begin
+        result := false;
+      end;
+
+
 End.
diff --git a/compiler/psub.pas b/compiler/psub.pas
index 95e5030631..b070145d77 100644
--- a/compiler/psub.pas
+++ b/compiler/psub.pas
@@ -1585,6 +1585,10 @@ implementation
             }

+            if (cs_opt_level1 in current_settings.optimizerswitches) and
+              { do not optimize pure assembler procedures }
+              not(pi_is_assembler in flags) then
+              PreRegallocOptimize(aktproccode);
 {$ifndef NoOpt}
 {$ifndef i386}
             if (cs_opt_scheduler in current_settings.optimizerswitches) and
diff --git a
Re: [fpc-devel] Kit's ambitions!
Not much luck for me - the file won't patch without options or modifications, and using -p 1 to remove the "a/" and "b/" from the starts of the files causes an assertion in patch.exe. Back to doing it manually for now!

Gareth

On Fri 15/06/18 16:23 , Florian Klämpfl flor...@freepascal.org sent:
> On 14.06.2018 at 23:49, J. Gareth Moreton wrote:
>> Hi Florian,
>>
>> I don't know if you have any answers, but I'm unable to apply any patches I receive. I can view them and see the changes, and manually apply them via copy+paste if I have to, but using the "Apply Patch" option ends up not doing anything. Is there a fix to this, or does it error out because I only have read access to SVN (even though the patch should only modify my local files)?
>
> Did you try using the patch.exe that comes with FPC from the command line?
Re: [fpc-devel] Kit's ambitions!
Oh! I'm still a beginner with version control, it seems!

On Fri 15/06/18 16:23 , Florian Klämpfl flor...@freepascal.org sent:
> On 14.06.2018 at 23:49, J. Gareth Moreton wrote:
>> Hi Florian,
>>
>> I don't know if you have any answers, but I'm unable to apply any patches I receive. I can view them and see the changes, and manually apply them via copy+paste if I have to, but using the "Apply Patch" option ends up not doing anything. Is there a fix to this, or does it error out because I only have read access to SVN (even though the patch should only modify my local files)?
>
> Did you try using the patch.exe that comes with FPC from the command line?
Re: [fpc-devel] Kit's ambitions!
On 14.06.2018 at 23:49, J. Gareth Moreton wrote:
> Hi Florian,
>
> I don't know if you have any answers, but I'm unable to apply any patches I receive. I can view them and see the changes, and manually apply them via copy+paste if I have to, but using the "Apply Patch" option ends up not doing anything. Is there a fix to this, or does it error out because I only have read access to SVN (even though the patch should only modify my local files)?

Did you try using the patch.exe that comes with FPC from the command line?
Re: [fpc-devel] Kit's ambitions!
Hi Florian,

I don't know if you have any answers, but I'm unable to apply any patches I receive. I can view them and see the changes, and manually apply them via copy+paste if I have to, but using the "Apply Patch" option ends up not doing anything. Is there a fix to this, or does it error out because I only have read access to SVN (even though the patch should only modify my local files)?

Looking at the design though, I can definitely experiment to see how the deep optimiser performs in the pre-register-allocation block. It will certainly have the advantage of being able to handle registers that may otherwise end up being stored on the stack due to the lack of free physical registers. If need be, I'll submit the current deep optimiser, which does all of its work after the peephole optimisation, and change it to run before register allocation later on. I will need to see whether it performs better or worse at the earlier stage, and whether it causes other optimisations to be missed because MOVs have been changed or removed. Fun times ahead!

Thanks for the patch.

Gareth

On Thu 14/06/18 21:58 , Florian Klämpfl flor...@freepascal.org sent:
> On 13.06.2018 at 20:50, J. Gareth Moreton wrote:
>> I haven't fully uncovered the secrets of the compiler yet, but I did notice "pre-peephole pass" under x86, but I think the only function it touched was one of the bit shifts. Does this occur before register allocation or was it just something that had to be done before Pass 1?
>
> It is only before pass 1. I attached a patch I once started which shows the idea.
Re: [fpc-devel] Kit's ambitions!
Thanks. I'll have a study of this and potentially move my initial deep optimisation component to this stage. I've made some more peephole optimisations in the meantime, but I'm going to hold off on posting them because they're starting to conflict with my other submissions. Besides, I've given you far too many patches already!

Gareth

On Thu 14/06/18 21:58 , Florian Klämpfl flor...@freepascal.org sent:
> On 13.06.2018 at 20:50, J. Gareth Moreton wrote:
>> I haven't fully uncovered the secrets of the compiler yet, but I did notice "pre-peephole pass" under x86, but I think the only function it touched was one of the bit shifts. Does this occur before register allocation or was it just something that had to be done before Pass 1?
>
> It is only before pass 1. I attached a patch I once started which shows the idea.
Re: [fpc-devel] Kit's ambitions!
On 13.06.2018 at 20:50, J. Gareth Moreton wrote:
> I haven't fully uncovered the secrets of the compiler yet, but I did notice "pre-peephole pass" under x86, but I think the only function it touched was one of the bit shifts. Does this occur before register allocation or was it just something that had to be done before Pass 1?

It is only before pass 1. I attached a patch I once started which shows the idea.

From 7c679b8365e7819f1c39e57c2c222b08bc935f72 Mon Sep 17 00:00:00 2001
From: Florian Klämpfl
Date: Sat, 11 Jun 2016 13:15:55 +0200
Subject: [PATCH] unfinished: + new pass to carry out peep hole optimizations
 before register allocation, this might reduce register pressure
 + x86: optmize LeaMov2Mov

---
 compiler/aopt.pas        | 32
 compiler/aoptobj.pas     | 25 +
 compiler/psub.pas        |  4
 compiler/x86/aoptx86.pas | 47 ++-
 4 files changed, 103 insertions(+), 5 deletions(-)

diff --git a/compiler/aopt.pas b/compiler/aopt.pas
index fde868713e..ad3590e1a1 100644
--- a/compiler/aopt.pas
+++ b/compiler/aopt.pas
@@ -39,9 +39,11 @@ Unit aopt;
         { _AsmL is the PAasmOutpout list that has to be optimized }
         Constructor create(_AsmL: TAsmList); virtual; reintroduce;

+        Destructor destroy;override;
+
         { call the necessary optimizer procedures }
         Procedure Optimize;virtual;
-        Destructor destroy;override;
+        procedure PreRegallocOptimize;virtual;

       private
         procedure FindLoHiLabels;
@@ -68,6 +70,7 @@ Unit aopt;
     casmoptimizer : TAsmOptimizerClass;
     cpreregallocscheduler : TAsmSchedulerClass;

+    procedure PreRegallocOptimize(AsmL:TAsmList);
     procedure Optimize(AsmL:TAsmList);
     procedure PreRegallocSchedule(AsmL:TAsmList);

@@ -86,6 +89,7 @@ Unit aopt;
         inherited create(_asml,nil,nil,nil);
         { setup labeltable, always necessary }
         New(LabelInfo);
+        LabelInfo^.LabelTable:=nil;
       End;

     procedure TAsmOptimizer.FindLoHiLabels;
@@ -315,11 +319,21 @@ Unit aopt;
       End;


+    Procedure TAsmOptimizer.PreRegallocOptimize;
+      begin
+        BlockStart := tai(AsmL.First);
+        PreRegAllocPeepHoleOptPass;
+      end;
+
+
     Destructor TAsmOptimizer.Destroy;
       Begin
-        if assigned(LabelInfo^.LabelTable) then
-          Freemem(LabelInfo^.LabelTable);
-        Dispose(LabelInfo);
+        if assigned(LabelInfo) then
+          begin
+            if assigned(LabelInfo^.LabelTable) then
+              Freemem(LabelInfo^.LabelTable);
+            Dispose(LabelInfo);
+          end;
         inherited Destroy;
       End;

@@ -375,6 +389,16 @@ Unit aopt;
       End;


+    procedure PreRegallocOptimize(AsmL:TAsmList);
+      var
+        p : TAsmOptimizer;
+      begin
+        p:=casmoptimizer.Create(AsmL);
+        p.PreRegallocOptimize;
+        p.free;
+      end;
+
+
     procedure Optimize(AsmL:TAsmList);
       var
         p : TAsmOptimizer;
diff --git a/compiler/aoptobj.pas b/compiler/aoptobj.pas
index 02fc330167..985c03ec2b 100644
--- a/compiler/aoptobj.pas
+++ b/compiler/aoptobj.pas
@@ -344,12 +344,16 @@ Unit AoptObj;
         procedure PeepHoleOptPass2; virtual;
         procedure PostPeepHoleOpts; virtual;

+        procedure PreRegAllocPeepHoleOptPass; virtual;
+
         { processor dependent methods }
         // if it returns true, perform a "continue"
         function PrePeepHoleOptsCpu(var p: tai): boolean; virtual;
         function PeepHoleOptPass1Cpu(var p: tai): boolean; virtual;
         function PeepHoleOptPass2Cpu(var p: tai): boolean; virtual;
         function PostPeepHoleOptsCpu(var p: tai): boolean; virtual;
+
+        function PreRegAllocPeepHoleOptPassCpu(var p : tai) : boolean; virtual;
       End;

     Function ArrayRefsEq(const r1, r2: TReference): Boolean;
@@ -1572,4 +1576,25 @@ Unit AoptObj;
         result := false;
       end;

+
+    procedure TAOptObj.PreRegAllocPeepHoleOptPass;
+      var
+        p: tai;
+      begin
+        p := BlockStart;
+        while (p <> BlockEnd) Do
+          begin
+            if PreRegAllocPeepHoleOptPassCPU(p) then
+              continue;
+            p:=tai(p.next);
+          end;
+      end;
+
+
+    function TAOptObj.PreRegAllocPeepHoleOptPassCpu(var p: tai): boolean;
+      begin
+        result := false;
+      end;
+
+
 End.
diff --git a/compiler/psub.pas b/compiler/psub.pas
index 95e5030631..b070145d77 100644
--- a/compiler/psub.pas
+++ b/compiler/psub.pas
@@ -1585,6 +1585,10 @@ implementation
             }

+            if (cs_opt_level1 in current_settings.optimizerswitches) and
+              { do not optimize pure assembler procedures }
+              not(pi_is_assembler in flags) then
+              PreRegallocOptimize(aktproccode);
 {$ifndef NoOpt}
 {$ifndef i386}
             if (cs_opt_scheduler in current_settings.optimizerswitches)
Re: [fpc-devel] Kit's ambitions!
I haven't fully uncovered the secrets of the compiler yet, but I did notice "pre-peephole pass" under x86, but I think the only function it touched was one of the bit shifts. Does this occur before register allocation or was it just something that had to be done before Pass 1?

Gareth

On Wed 13/06/18 20:29 , Florian Klämpfl flor...@freepascal.org sent:
> On 12.06.2018 at 23:27, J. Gareth Moreton wrote:
>> Ideally yes, but this occurs after peephole optimisations where all of the register allocations have already been made. Doing the peephole and deep optimisations while the registers are still in a virtual state would be better overall, but may require a huge overhaul of the compiler that might be asking for too much trouble. There's also the issue that some commands only work with certain registers, and optimisations have to be careful of that fact.
>
> This is not that hard actually. The only difference is how register allocations are handled. Just look at the scheduler pass of ARM: it also works before register allocation (and afterwards).
Re: [fpc-devel] Kit's ambitions!
On 12.06.2018 at 23:27, J. Gareth Moreton wrote:
> Ideally yes, but this occurs after peephole optimisations where all of the register allocations have already been made. Doing the peephole and deep optimisations while the registers are still in a virtual state would be better overall, but may require a huge overhaul of the compiler that might be asking for too much trouble. There's also the issue that some commands only work with certain registers, and optimisations have to be careful of that fact.

This is not that hard actually. The only difference is how register allocations are handled. Just look at the scheduler pass of ARM: it also works before register allocation (and afterwards).
Re: [fpc-devel] Kit's ambitions!
On 12.06.2018 at 23:45, nick...@gmail.com wrote:
> On Mon, 2018-06-11 at 21:07 +0100, J. Gareth Moreton wrote:
>> Thanks David,
>>
>> I'm still learning some of the nuances of the Intel and AMD processors, but most of it is just logical analysis. Admittedly my main drive has been to shrink down the size of the binary, since Delphi and Free Pascal have always been a little bit bloated in comparison. Not that it is necessarily a bad thing, but saving space without sacrificing performance can only be a good thing, especially for those with limited bandwidth or for saving those few precious bytes when burning files to a CD or DVD.
>>
>> There have been a few instances in the compiled compiler (my main test case) where an entire register is freed up due to my deep optimisation, and that means the corresponding "push" and "pop" at either end of the procedure can be removed (along with the corresponding stack unwinding information), although I haven't started programming that yet.
>
> Isn't it better to perform this optimization before register allocation? Then, when this happens, the corresponding "push" and "pop" wouldn't even be emitted by the compiler, because the register wouldn't have to be spilled.

Yes, this is what I already started once: a peephole optimizer pass that can be run before register allocation and that carries out, in particular, optimizations which reduce register usage.
Re: [fpc-devel] Kit's ambitions!
Ideally yes, but this occurs after peephole optimisations where all of the register allocations have already been made. Doing the peephole and deep optimisations while the registers are still in a virtual state would be better overall, but may require a huge overhaul of the compiler that might be asking for too much trouble. There's also the issue that some commands only work with certain registers, and optimisations have to be careful of that fact.

Gareth

On Tue 12/06/18 22:45 , nick...@gmail.com sent:
> On Mon, 2018-06-11 at 21:07 +0100, J. Gareth Moreton wrote:
>> Thanks David,
>>
>> I'm still learning some of the nuances of the Intel and AMD processors, but most of it is just logical analysis. Admittedly my main drive has been to shrink down the size of the binary, since Delphi and Free Pascal have always been a little bit bloated in comparison. Not that it is necessarily a bad thing, but saving space without sacrificing performance can only be a good thing, especially for those with limited bandwidth or for saving those few precious bytes when burning files to a CD or DVD.
>>
>> There have been a few instances in the compiled compiler (my main test case) where an entire register is freed up due to my deep optimisation, and that means the corresponding "push" and "pop" at either end of the procedure can be removed (along with the corresponding stack unwinding information), although I haven't started programming that yet.
>
> Isn't it better to perform this optimization before register allocation? Then, when this happens, the corresponding "push" and "pop" wouldn't even be emitted by the compiler, because the register wouldn't have to be spilled.
>
> Nikolay
Re: [fpc-devel] Kit's ambitions!
On Mon, 2018-06-11 at 21:07 +0100, J. Gareth Moreton wrote:
> Thanks David,
>
> I'm still learning some of the nuances of the Intel and AMD processors, but most of it is just logical analysis. Admittedly my main drive has been to shrink down the size of the binary, since Delphi and Free Pascal have always been a little bit bloated in comparison. Not that it is necessarily a bad thing, but saving space without sacrificing performance can only be a good thing, especially for those with limited bandwidth or for saving those few precious bytes when burning files to a CD or DVD.
>
> There have been a few instances in the compiled compiler (my main test case) where an entire register is freed up due to my deep optimisation, and that means the corresponding "push" and "pop" at either end of the procedure can be removed (along with the corresponding stack unwinding information), although I haven't started programming that yet.

Isn't it better to perform this optimization before register allocation? Then, when this happens, the corresponding "push" and "pop" wouldn't even be emitted by the compiler, because the register wouldn't have to be spilled.

Nikolay

> I am ready to submit this part of my deep optimiser as a patch. I'm just waiting for Florian's acceptance or rejection of my debug strip patch - https://bugs.freepascal.org/view.php?id=33798 (the 3rd attempt!) - only because it shares some debugging code with said patch (it was useful to monitor how the registers inside references were changed). If it's rejected, it just means I'll have to change some of that debugging code a bit.
>
> Gareth aka. Kit
>
> On Mon 11/06/18 20:27 , David Pethes pub...@satd.sk sent:
>> Hi,
>> nice work.
>>
>> On 8. 6. 2018 0:46, J. Gareth Moreton wrote:
>>> The deep optimiser changes this to:
>>>
>>>     movq %rcx,%rax
>>>     movq %rdx,%rsi
>>>     movq %rcx,%rbx
>>>
>>> It determines that, for the third MOV, it can change %rax for %rcx to minimise a pipeline stall, and then knows that %rbx and %rcx contain the same value, so it can remove the 4th MOV completely. Given that modern processors usually have at least 3 ALUs and the interdependencies have been removed, this will likely give a speed increase of one cycle over these few commands.
>>
>> Note that modern CPUs can use move elimination for reg-to-reg moves, so it doesn't cost any execution resources (it's "free"). Despite that, it's still a win, because it spares both bytes in the I-cache and decoder bandwidth (which can indirectly lead to some spared cycle(s) in other places).
>>
>> David
Re: [fpc-devel] Kit's ambitions!
Thanks David,

I'm still learning some of the nuances of the Intel and AMD processors, but most of it is just logical analysis. Admittedly my main drive has been to shrink down the size of the binary, since Delphi and Free Pascal have always been a little bit bloated in comparison. Not that it is necessarily a bad thing, but saving space without sacrificing performance can only be a good thing, especially for those with limited bandwidth or for saving those few precious bytes when burning files to a CD or DVD.

There have been a few instances in the compiled compiler (my main test case) where an entire register is freed up due to my deep optimisation, and that means the corresponding "push" and "pop" at either end of the procedure can be removed (along with the corresponding stack unwinding information), although I haven't started programming that yet.

I am ready to submit this part of my deep optimiser as a patch. I'm just waiting for Florian's acceptance or rejection of my debug strip patch - https://bugs.freepascal.org/view.php?id=33798 (the 3rd attempt!) - only because it shares some debugging code with said patch (it was useful to monitor how the registers inside references were changed). If it's rejected, it just means I'll have to change some of that debugging code a bit.

Gareth aka. Kit

On Mon 11/06/18 20:27 , David Pethes pub...@satd.sk sent:
> Hi,
> nice work.
>
> On 8. 6. 2018 0:46, J. Gareth Moreton wrote:
>> The deep optimiser changes this to:
>>
>>     movq %rcx,%rax
>>     movq %rdx,%rsi
>>     movq %rcx,%rbx
>>
>> It determines that, for the third MOV, it can change %rax for %rcx to minimise a pipeline stall, and then knows that %rbx and %rcx contain the same value, so it can remove the 4th MOV completely. Given that modern processors usually have at least 3 ALUs and the interdependencies have been removed, this will likely give a speed increase of one cycle over these few commands.
>
> Note that modern CPUs can use move elimination for reg-to-reg moves, so it doesn't cost any execution resources (it's "free"). Despite that, it's still a win, because it spares both bytes in the I-cache and decoder bandwidth (which can indirectly lead to some spared cycle(s) in other places).
>
> David
Re: [fpc-devel] Kit's ambitions!
Hi,
nice work.

On 8. 6. 2018 0:46, J. Gareth Moreton wrote:
> The deep optimiser changes this to:
>
>     movq %rcx,%rax
>     movq %rdx,%rsi
>     movq %rcx,%rbx
>
> It determines that, for the third MOV, it can change %rax for %rcx to minimise a pipeline stall, and then knows that %rbx and %rcx contain the same value, so it can remove the 4th MOV completely. Given that modern processors usually have at least 3 ALUs and the interdependencies have been removed, this will likely give a speed increase of one cycle over these few commands.

Note that modern CPUs can use move elimination for reg-to-reg moves, so it doesn't cost any execution resources (it's "free"). Despite that, it's still a win, because it spares both bytes in the I-cache and decoder bandwidth (which can indirectly lead to some spared cycle(s) in other places).

David
Re: [fpc-devel] Kit's ambitions!
So a progress update. I've tied part of my deep optimiser into the peephole optimiser, specifically PostPeepholeOptMov, and it's had some unexpected benefits.

One of the things it does is start with a MOV command that copies a register's contents into another, then looks at subsequent reference addresses to see if it can swap out one register for another, to reduce the chance of a pipeline stall. There are cases where it notices that all such registers have been switched in a certain block, and hence safely removes the original MOV command. What this means is that, as well as reducing the chances of a pipeline stall, it's removing unnecessary assignments.

My main test case has been compiling the compiler, since it's sufficiently complex and easy to crash if incorrect machine code is produced, and it also gives plenty of examples of optimisation.

As a very brief example, in compiler/x86_64/symcpu.pas in TCPUProcDef.ppuload_platform, the first four lines are:

    movq %rcx,%rax
    movq %rdx,%rsi
    movq %rax,%rbx
    movq %rbx,%rcx

The deep optimiser changes this to:

    movq %rcx,%rax
    movq %rdx,%rsi
    movq %rcx,%rbx

It determines that, for the third MOV, it can change %rax for %rcx to minimise a pipeline stall, and then knows that %rbx and %rcx contain the same value, so it can remove the 4th MOV completely. Given that modern processors usually have at least 3 ALUs and the interdependencies have been removed, this will likely give a speed increase of one cycle over these few commands.

Before I go submitting patches though, I still need to test it under Linux and i386.

Kit
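
To make the register tracking described in this message a little more concrete, here is a toy model of it on the four quoted MOVs. It is a sketch under heavy assumptions, not the code in PostPeepholeOptMov: the types TReg and TMov, the CopyOf table, and the program itself are invented for illustration, it only understands register-to-register MOVs, and it ignores everything a real pass must check (register liveness, partial registers, instructions between the MOVs).

```pascal
program MovCopySketch;
{$mode objfpc}{$H+}

type
  { Hypothetical register set, just large enough for the example }
  TReg = (regRAX, regRBX, regRCX, regRDX, regRSI);

  TMov = record
    Src, Dst: TReg;
  end;

const
  RegName: array[TReg] of string = ('%rax', '%rbx', '%rcx', '%rdx', '%rsi');

var
  { The four MOVs quoted from TCPUProcDef.ppuload_platform }
  Code: array[0..3] of TMov = (
    (Src: regRCX; Dst: regRAX),
    (Src: regRDX; Dst: regRSI),
    (Src: regRAX; Dst: regRBX),
    (Src: regRBX; Dst: regRCX)
  );
  Removed: array[0..3] of Boolean;
  { CopyOf[r] is the register whose value r is known to duplicate;
    CopyOf[r] = r means nothing is known about r }
  CopyOf: array[TReg] of TReg;
  r: TReg;
  i: Integer;

begin
  for r := Low(TReg) to High(TReg) do
    CopyOf[r] := r;

  for i := Low(Code) to High(Code) do
  begin
    { Read from the original holder of the value rather than from a copy,
      so this MOV no longer depends on the MOV that made the copy }
    Code[i].Src := CopyOf[Code[i].Src];

    { The destination already holds this value: the MOV is redundant }
    Removed[i] := CopyOf[Code[i].Dst] = Code[i].Src;
    if Removed[i] then
      Continue;

    { The destination is overwritten, so conservatively forget every
      register that was recorded as a copy of it }
    for r := Low(TReg) to High(TReg) do
      if CopyOf[r] = Code[i].Dst then
        CopyOf[r] := r;
    CopyOf[Code[i].Dst] := Code[i].Src;
  end;

  { Print the surviving instructions }
  for i := Low(Code) to High(Code) do
    if not Removed[i] then
      WriteLn('movq ', RegName[Code[i].Src], ',', RegName[Code[i].Dst]);
end.
```

On this input the model prints the same three instructions as the optimised listing above. Forgetting all copies of an overwritten register, rather than trying to keep them linked to each other, is a deliberately conservative choice: it can miss an optimisation but should never invent one.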
Re: [fpc-devel] Kit's ambitions!
So far, I'm researching the optimisation as listed below: tracking registers with identical values and changing them to minimise pipeline stalls. Because I don't need to keep track of their actual values, just whether they've changed since a particular MOV instruction, I've managed to move this into the peephole optimiser as an extension to TX86AsmOptimizer.PostPeepholeOptMov(). It's a bit more difficult than it looks though - I've had a lot of crashes so far when it changes a register when it shouldn't, but I'm ironing out the bugs one by one. To truly see the gains though, one would need to perform some kind of intense timing comparison.

This would be the first step in the step-by-step implementation. More in-depth data-flow optimisation, like successfully merging div and mod instructions with the same numerator and denominator, will require more care and thought, especially as the two division operations may not use the same registers (if successful though, it will improve the compiler itself, since it has "x div 1000" and "x mod 1000" side-by-side in a couple of places, a common pair of expressions to produce a human-readable time metric, e.g. seconds and milliseconds).

Gareth aka. Kit

On Sun 03/06/18 14:12, Florian Klämpfl flor...@freepascal.org sent:

On 21.05.2018 at 21:05, J. Gareth Moreton wrote:
> Would you object to me trying anyway, Florian?

No, feel free to go ahead, but it needs to be done step by step.

> It might be that I run into the same problems you had and it's too
> unsafe, but I'm going by a conservative philosophy in that if it spots
> something that it can't work out (e.g. an instruction that it's not
> programmed to handle) or is potentially unsafe (e.g. reading and writing
> to a block of memory that it doesn't have control over, due to
> multi-threading issues), then it just stops optimising and drops all
> assumptions that it has made at that point.
>
> As a small test case, I'm attempting to see if I can spot and optimise,
> for example, "mov %rax, %rbx; lea %rcx, -8(%rsp); mov %rbx, 8(%rsp)",
> where a pipeline stall occurs due to a read-after-write penalty (with
> %rbx in this case).

Things like this are fine; it gets hairy, though, as soon as memory locations are involved.
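For readers unfamiliar with the div/mod pair mentioned in this message, here is a minimal source-level sketch. The procedure name QuotRem and the sample values are hypothetical. The x86 DIV instruction leaves the quotient and the remainder in two different registers after a single division, which is why the pair is a candidate for merging. Whether a given compiler version actually emits one division or two for this code is not something the sketch claims.

```pascal
program DivModPair;
{$mode objfpc}

{ Both results come from the same numerator and denominator, so in
  principle one hardware division is enough to produce them. }
procedure QuotRem(Numerator, Denominator: Cardinal; out Q, R: Cardinal);
begin
  Q := Numerator div Denominator;  // quotient
  R := Numerator mod Denominator;  // remainder of the same division
end;

var
  Seconds, Millis: Cardinal;
begin
  // Split a millisecond count into seconds and leftover milliseconds.
  QuotRem(123456, 1000, Seconds, Millis);
  WriteLn(Seconds, ' s ', Millis, ' ms');
  // The two parts always recombine to the original value.
  if Seconds * 1000 + Millis <> 123456 then
    Halt(1);
end.
```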
Re: [fpc-devel] Kit's ambitions!
On 21.05.2018 at 21:05, J. Gareth Moreton wrote:
> Would you object to me trying anyway, Florian?

No, feel free to go ahead, but it needs to be done step by step.

> It might be that I run into the same problems you had and it's too
> unsafe, but I'm going by a conservative philosophy in that if it spots
> something that it can't work out (e.g. an instruction that it's not
> programmed to handle) or is potentially unsafe (e.g. reading and writing
> to a block of memory that it doesn't have control over, due to
> multi-threading issues), then it just stops optimising and drops all
> assumptions that it has made at that point.
>
> As a small test case, I'm attempting to see if I can spot and optimise,
> for example, "mov %rax, %rbx; lea %rcx, -8(%rsp); mov %rbx, 8(%rsp)",
> where a pipeline stall occurs due to a read-after-write penalty (with
> %rbx in this case).

Things like this are fine; it gets hairy, though, as soon as memory locations are involved.
Re: [fpc-devel] Kit's ambitions!
Would you object to me trying anyway, Florian? It might be that I run into the same problems you had and it's too unsafe, but I'm going by a conservative philosophy in that if it spots something that it can't work out (e.g. an instruction that it's not programmed to handle) or is potentially unsafe (e.g. reading and writing to a block of memory that it doesn't have control over, due to multi-threading issues), then it just stops optimising and drops all assumptions that it has made at that point.

As a small test case, I'm attempting to see if I can spot and optimise, for example, "mov %rax, %rbx; lea %rcx, -8(%rsp); mov %rbx, 8(%rsp)", where a pipeline stall occurs due to a read-after-write penalty (with %rbx in this case). Normally the peephole optimiser will sort out such a problem, but because LEA appears in between the MOVs, it won't catch it.

The other thing is attempting to merge unsigned division and modulus into a single "div" operation. That was actually my intention behind "Minor_div_improvement" in issue #32984: changing SBB to SETAE preserves the flags register, so I can make better use of it in such a deep optimiser, as I can be certain what its value is at that point (or at the very least know it hasn't changed).

More than anything, it's a personal research project - I'd be over the moon if it works and holds up to every test case thrown at it, but if it doesn't, well, at least I know more about assembly language and have gained experience to improve the peephole optimiser and other elements of FPC.

Gareth aka. Kit

On Mon 21/05/18 20:44, Florian Klämpfl flor...@freepascal.org sent:

On 13.05.2018 at 21:02, Christo wrote:
> On Sun, 2018-05-13 at 03:28 +0100, J. Gareth Moreton wrote:
>> Expand on Data Flow Analysis in the compiler.
>>
>> What I personally call the "Deep Optimizer", I'm proposing an
>> assembler-level optimisation system (although it won't touch pure
>> assembler routines) that rearranges commands and changes registers in
>> order to minimise pipeline stalls and to also collapse a "div" and
>> "mod" operation into a single instruction where possible.
>
> I would also like the data flow analyzer to look at inline assembler and
> emit hints and warnings if it picks up something incoherent.

We already had a data flow analyzer for assembler in 1.0.x and early 2.x times; however, it was disabled after several years as it was too hard to make it work safely.
Re: [fpc-devel] Kit's ambitions!
On 13.05.2018 at 21:02, Christo wrote:
> On Sun, 2018-05-13 at 03:28 +0100, J. Gareth Moreton wrote:
>> Expand on Data Flow Analysis in the compiler.
>>
>> What I personally call the "Deep Optimizer", I'm proposing an
>> assembler-level optimisation system (although it won't touch pure
>> assembler routines) that rearranges commands and changes registers in
>> order to minimise pipeline stalls and to also collapse a "div" and
>> "mod" operation into a single instruction where possible.
>
> I would also like the data flow analyzer to look at inline assembler and
> emit hints and warnings if it picks up something incoherent.

We already had a data flow analyzer for assembler in 1.0.x and early 2.x times; however, it was disabled after several years as it was too hard to make it work safely.
Re: [fpc-devel] Kit's ambitions!
What you've shown suggests that the routine is NOT inlined, as it's building a stack frame. The 'add' operations at the very top look like padding (they're all zeroes in machine code) to align the procedure to a 16-byte boundary, and would crash the program if directly executed. Look at the disassembly where your function is called - the presence of CALL will tell you it is not inlined.

Gareth aka. Kit
Re: [fpc-devel] Kit's ambitions!
From: fpc-devel On Behalf Of Wolf
Sent: Friday, 18 May 2018 07:27

> This is the disassembly of function GetProcessorUsed: longint; inline;
>
> Unless you advise me otherwise, I take the absence of function
> GetProcessorUsed: longint; inline; mentioned anywhere in this screen
> print that GetProcessorUsed is indeed inlined. And in the face of your
> incredulity, I need to remind you that I get all the compiler complaints
> about inlining unless I restrict memory operations to the local stack,
> as outlined in my original message.

The line numbers mentioned in your disassembly screenshot don’t line up with the source code you’ve previously posted, and the screenshot doesn’t show enough context to tell whether this is simply the disassembly of the GetProcessorUsed function itself or a place where you called GetProcessorUsed in source code and the compiler inlined the call. But given that you can see it building a stack frame, I strongly suspect that you are simply showing the disassembly of GetProcessorUsed here, rather than a call site where the call to GetProcessorUsed has been inlined.
Re: [fpc-devel] Kit's ambitions!
Hi Gareth,

This is the disassembly of function GetProcessorUsed: longint; inline;

Unless you advise me otherwise, I take the absence of any mention of function GetProcessorUsed: longint; inline; in this screen print to mean that GetProcessorUsed is indeed inlined. And in the face of your incredulity, I need to remind you that I get all the compiler complaints about inlining unless I restrict memory operations to the local stack, as outlined in my original message.

Wolf

On 17/05/2018 10:42, J. Gareth Moreton wrote:
> Unless I'm mistaken, Wolf, you cannot inline procedures that have asm
> blocks appearing anywhere (this includes the entire procedure).
> Nevertheless, does the disassembly of your program show it to be inlined?
>
> Gareth aka. Kit
Re: [fpc-devel] Kit's ambitions!
Hi,

On 17. 5. 2018 0:56, Wolf wrote:
> On 14/05/2018 04:30, David Pethes wrote:
>> Hi,
>> I would welcome inlining of (simple) asm routines.
>
> I do not know what you consider to be the existing obstacles to inlining
> assembler routines.

That this doesn't get inlined :) (FPC trunk, a few weeks old, Win64)

    function do_bextr(bits, extr: integer): integer; assembler; nostackframe; inline;
    asm
      bextr eax, ecx, edx
    end;

As per your advice (if I understand it correctly), this doesn't either:

    function do_bextr(bits, extr: integer): integer; inline;
    var
      foo: integer;
    begin
      asm
        mov ecx, bits
        mov edx, extr
        bextr eax, ecx, edx
        mov foo, eax
      end['rax', 'rcx', 'rdx'];
      result := foo;
    end;

As for your measurement function: it looks complicated, but I don't know what the scenario is. If you're already at the level of measuring ticks, and the function doesn't take more than a few hundred thousand ticks, rdtsc is OK. Also, be sure to measure in the real program as well - measuring and optimizing in isolation under different conditions can be misleading.

FFmpeg used something like this: https://github.com/dpethes/fevh264/blob/master/core/bench.inc - place start/stop_timer around your measured function and do at least a few hundred runs, until it stabilizes. It skips some outliers, which your code seems to do as well, but under different conditions.

Advantages: dead simple, easy to integrate
Disadvantages: no instruction serialization (cpuid)

So it's good for relative comparisons (did I make the function faster/slower?), but not for absolute tick counts (how many ticks does this _precisely_ take?).

David
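The measuring advice in this message (many runs, outliers skipped, relative rather than absolute numbers) can be sketched as follows. This is not the content of bench.inc. The record TTimerStats, the procedure AddSample and the threshold of 8 times the running mean are assumptions made for illustration, and the sample values are invented inputs, not measurements.

```pascal
program TimerStatsSketch;
{$mode objfpc}

type
  TTimerStats = record
    Sum: Int64;      // total ticks of the accepted samples
    Count: Int64;    // number of accepted samples
    Skipped: Int64;  // number of rejected outliers
  end;

{ Accept the first two samples unconditionally; afterwards reject any
  sample that is at least 8 times the running mean (assumed threshold). }
procedure AddSample(var Stats: TTimerStats; Ticks: Int64);
begin
  if (Stats.Count < 2) or (Ticks * Stats.Count < 8 * Stats.Sum) then
  begin
    Inc(Stats.Sum, Ticks);
    Inc(Stats.Count);
  end
  else
    Inc(Stats.Skipped);
end;

const
  // Invented tick counts; 5000 stands in for a run hit by an interrupt.
  Samples: array[0..5] of Int64 = (100, 104, 98, 5000, 102, 96);

var
  Stats: TTimerStats;
  i: Integer;
begin
  Stats.Sum := 0;
  Stats.Count := 0;
  Stats.Skipped := 0;
  for i := Low(Samples) to High(Samples) do
    AddSample(Stats, Samples[i]);
  WriteLn('mean ticks: ', Stats.Sum div Stats.Count,
          ', skipped: ', Stats.Skipped);
end.
```

In a real benchmark, each sample would be the difference between two time-stamp counter readings taken around the measured function. The resulting mean is only meaningful when compared with another mean obtained the same way.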
Re: [fpc-devel] Kit's ambitions!
Unless I'm mistaken, Wolf, you cannot inline procedures that have asm blocks appearing anywhere (this includes the entire procedure). Nevertheless, does the disassembly of your program show it to be inlined?

Gareth aka. Kit
Re: [fpc-devel] Kit's ambitions!
On 14/05/2018 04:30, David Pethes wrote:
> Hi,
> I would welcome inlining of (simple) asm routines.

I do not know what you consider to be the existing obstacles to inlining assembler routines. What I do know is that in the attached program, inlining does work. The program summarises my (current) understanding of how to measure time with nanosecond reliability. Asking for the time via the Linux function "if clock_gettime(CLOCK_MONOTONIC, @ts)=0 then" does indeed return nanoseconds, but the call takes some 270 ns (or about 1000 clock ticks) to execute and thus does not produce nanosecond reliability. Repeated measurements with my program do not produce the same output either, so it does not have the reliability I want. Statistical processing does something to improve the situation, but not quite what I want.

What I can say about inlining assembler routines is this: if the variables into which registers are to be saved are on the stack, the routines can be inlined. Never mind the hints in Lazarus' message pane. Take the

    function GetProcessorUsed: longint; inline;
    var
      ProcUsed: longint;
    begin
      asm
        CPUID
        .byte 0x0F, 0x01, 0xF9  // read the Time-Stamp Counter rdtscp (as op-code format),
        movl %ecx, ProcUsed     // This is the processor on which measurements take place. Measurements on other processors are discarded.
      end ['eax','ebx','ecx','edx'];
      GetProcessorUsed:=ProcUsed;
    end;

Because ProcUsed is on the stack, I can move %ecx into it. But I cannot get %ecx directly into GetProcessorUsed. That requires a separate line of code.

wolf

Here is the full code, as promised. If anybody has a suggestion on how to improve it, please let me know, in a separate thread.
    program Speed_Test;
    {$ASMMODE att}

    uses sysutils, Linux, math;

    type
      TtscCount = record
        Group: longint;
        Count: longint;
        CumFreq: Int64;
      end;

    type
      TCumFreq = record
        Group: longint;
        CumFreq: real;
      end;
      TCumFrequency= array of TCumFreq;
      TTimeSpec = record
        tv_sec: int64;   //time_t; //Seconds
        tv_nsec: int64;  //clong; //Nanoseconds
      end;

    var
      TscCount: array of TtscCount;
      Measured: TCumFrequency;
      MeasurementsToDo: int64=100;
      ProcessorUsed: LongInt;
      Range: array[0..] of longint;
      ValidMeasurements: Int64;

    function Get_ClockFreq(CPU: Char): real;
    {Since there is no way I can find to extract actual clock frequency, I read it from /proc/cpuinfo }
    var
      FileHandle: LongInt;
      i: integer;
      Data: ansistring;
      rc:real;
      NumRead: int64;
      Buffer : packed array[0..4095] of char;
      SourceFile: AnsiString= '/proc/cpuinfo';
    begin
      if not FileExists(SourceFile) then
      begin
        writeln('Error: Input file "',SourceFile,'" has not been found');
        halt;
      end;
      FileHandle:=FileOpen('/proc/cpuinfo',fmOpenRead);
      NumRead:=FileRead(FileHandle, Buffer,SizeOf(Buffer));
      Data:=Buffer[0..NumRead];
      i:=0;
      while i<=NumRead do
      begin
        inc(i);
        if CompareText(Data[i..i+8],'Processor')=0 then
        begin
          if char(Data[i+12])=CPU then
          begin
            i:=i+12;
            repeat inc(i); until CompareText(Data[i..i+6],'cpu MHz')=0 ;
            try
              rc:=StrToFloat(Data[i+11..i+18]);
            except
              on E : exception do
              begin
                writeln('Data read error: cannot convert ',Data[i+11..i+18],' into number');
                writeln('Program aborted');
                halt;
              end;
            end;
            break;
          end;
        end;
      end;
      FileClose(FileHandle);
      Get_ClockFreq:=rc;
    end;

    procedure ReadProcessorFrequencyInformationLeaf; inline;
    var
      CPUID_16H_AX: Word;  // Processor Base Frequency (in MHz)
      CPUID_16H_BX: Word;  // Maximum Frequency (in MHz)
      CPUID_16H_CX: Word;  // Bus (Reference) frequency (in MHz)
      CPUID_16H_DX: Word;  // Reserved = 0
    begin
      CPUID_16H_AX:=0;
      CPUID_16H_BX:=0;
      CPUID_16H_CX:=0;
      asm
        mov $0x16, %eax        // select Processor Frequency Information Leaf 0x16
        cpuid                  // access it
        mov %ax, CPUID_16H_AX  // Processor Base Frequency (in MHz)
        mov %bx, CPUID_16H_BX  // Maximum Frequency (in MHz)
        mov %cx, CPUID_16H_CX  // Bus (Reference) frequency (in MHz)
        mov %dx, CPUID_16H_DX  // Reserved = 0
      end ['ax','bx','cx','dx'];
    end;

    function GetProcessorUsed: longint; inline;
    var
      ProcUsed: longint;
    begin
      asm
        CPUID
        .byte 0x0F, 0x01, 0xF9  // read the Time-Stamp Counter rdtscp (as op-code format),
        movl %ecx, ProcUsed     // This is the processor on which measurements take place. Measurements on other processors are discarded.
      end ['eax','ebx','ecx','edx'];
      GetProcessorUsed:=ProcUsed;
    end;

    procedure MeasureCode;
    var
      ts: TTimeSpec;
      MilliSecondTime: extended;
      AX, BX, CX: Word;
      Start,Stop,i,k,l: int64;  // saves starting value from the Time Stamp counter
      Hi: int64;
      x:real;
      y: real=2;
      ProcessorUsed_Start, ProcessorUsed_Sto
Re: [fpc-devel] Kit's ambitions!
Hi,
I would welcome inlining of (simple) asm routines. Lately I wanted to use the BEXTR instruction to speed up some inlined bit reading functions. As there's no intrinsic for it, and including even a simple assembly method disables inlining, it didn't go well.

As for using a BEXTR intrinsic instead: I'd like to try to add it, if it's welcome. Judging by searching for POPCNT, it shouldn't be that much work, but I'm likely to miss something - any advice is welcome. There's at least one catch that I know of: there's no CPU target that supports BMI1 but not BMI2 (there are several such AMD CPUs), so one should be added as well.

David

On 13. 5. 2018 4:28, J. Gareth Moreton wrote:
> - Research possibility for 'inline' support for certain assembler routines.
>
> For situations where speed is of the highest priority, there are some
> internal functions such as Int and Frac that can theoretically be
> inlined (a procedure call is quite expensive, around 50 cycles), but
> because they are written in pure assembly language, the compiler will
> never inline them. I'm still working out quite a bit of theory, but I
> believe I will be able to allow the inlining of routines that are leaf
> functions (don't have CALLs of their own) and declared as
> 'nostackframe'. Such a system would allow the support of 'intrinsics'
> that can be composed programmatically rather than as internal routines,
> though it's not exactly what Florian is planning. Even if Florian does
> go for a different approach for intrinsics, I like to think that such
> inline support will have uses elsewhere, especially some of the routines
> like "GetStackFrame" (I think) that simply return the value of RSP (if
> it's 'inline', which it is actually declared as in the unit, the return
> value will be far more accurate).
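Until an intrinsic exists, the bit-field extraction that BEXTR performs can be written in plain Pascal, which the compiler is free to inline. The sketch below is an assumption-laden illustration, not an FPC intrinsic: the name ExtractBits and its separate Start and Len parameters are hypothetical (the instruction itself packs both into one control operand, start in bits 0-7 and length in bits 8-15), and it will generally compile to a shift and a mask rather than to a single BEXTR.

```pascal
program BextrFallback;
{$mode objfpc}

{ Returns Len bits of Value, starting at bit position Start (bit 0 is
  the least significant bit). Out-of-range requests yield 0 or are
  clamped, which mirrors the 32-bit form of the instruction. }
function ExtractBits(Value: Cardinal; Start, Len: Byte): Cardinal; inline;
begin
  if (Start > 31) or (Len = 0) then
    Exit(0);
  Value := Value shr Start;
  if Len < 32 then
    Value := Value and ((Cardinal(1) shl Len) - 1);
  Result := Value;
end;

begin
  // $12345678 shr 8 = $123456; the low 8 bits of that are $56 = 86.
  WriteLn(ExtractBits($12345678, 8, 8));
  // The lowest 4 bits of $12345678 are $8.
  WriteLn(ExtractBits($12345678, 0, 4));
end.
```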
Re: [fpc-devel] Kit's ambitions!
Thanks for your kind words, Christo. A lot of this is personal research, but I would like to make some elements work.

For inline support on assembler routines, I'm going to be rather conservative about what will be successful - for example, if there is a CALL operation inside the routine, I will stop trying to inline it, because there's no telling what it will do to the registers.

For the Deep Optimizer, I've programmed a coloured text output of an assembled procedure, which I will probably turn into a feature that can be enabled with a $DEFINE in the Free Pascal source code. Its intention is to show what the assembled procedure looks like before it is optimised with data flow analysis, and then what it looks like after optimisation. Alongside standard hints and warnings, it can easily be adapted to show things that look suspicious, such as an uninitialised register. We'll see.

Gareth aka. Kit
Re: [fpc-devel] Kit's ambitions!
On Sun, 2018-05-13 at 03:28 +0100, J. Gareth Moreton wrote:
> Expand on Data Flow Analysis in the compiler.
>
> What I personally call the "Deep Optimizer", I'm proposing an
> assembler-level optimisation system (although it won't touch pure
> assembler routines) that rearranges commands and changes registers in
> order to minimise pipeline stalls and to also collapse a "div" and "mod"
> operation into a single instruction where possible.

I would also like the data flow analyzer to look at inline assembler and emit hints and warnings if it picks up something incoherent.

> - Research possibility for 'inline' support for certain assembler routines.

This has been requested before, and the argument against it was that it is difficult to predict the side effects of assembler routines. I suppose if data flow analysis doesn't pick up side effects then it would be a useful feature.

I'm not a developer, but want to encourage your efforts.
Re: [fpc-devel] Kit's ambitions!
I don't know much about AVR yet, but who knows. At the moment I'm researching how inline Pascal procedures are compiled and seeing how that can be translated to assembler routines. I might end up writing a kind of mid-level assembler within the Pascal compiler that allows the registers to be changed where possible.

Unless its stability is proven, the Deep Optimizer and the assembly inliner will be -O4 options, although I would ideally like to get them down to -O3, especially the Deep Optimizer, because that works purely at assembler level where the registers are no longer virtual, and in most cases lines of code are being removed or rearranged rather than added.

Gareth aka. Kit

On Mon 14/05/18 18:07, "Christo Crause" christo.cra...@gmail.com sent:

This sounds great! Perhaps some of your work will trickle down to the AVR target someday.

Best regards,
Christo

On 13 May 2018 10:31 pm, "J. Gareth Moreton" wrote:
> Thanks for your kind words, Christo. A lot of this is personal research,
> but I would like to make some elements work. For inline support on
> assembler routines, I'm going to be rather conservative about what will
> be successful - for example, if there exists a CALL operation inside the
> routine, I will stop trying to inline it because there's no telling what
> it will do to affect the registers. For the Deep Optimizer, I've
> programmed a coloured text output of an assembled procedure that I will
> probably make a feature that can be included with a $DEFINE in the Free
> Pascal source code, as the intention of it is to show what the assembled
> procedure is before it is optimised with data flow analysis, and then
> what it is after optimisation. Alongside standard hints and warnings, it
> can easily be adapted to show things that look suspicious such as an
> uninitialised register. We'll see.
>
> Gareth aka. Kit