Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-28 Thread J. Gareth Moreton
I figured so, although when I implemented such a pooled object, it unintentionally fixed a couple of memory leaks!  Of course, it might just be a lazy workaround instead of putting in the missing "ReleaseUsedRegs" commands. Gareth aka. Kit On Fri 28/12/18 17:41 , Florian Klämpfl

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-28 Thread Florian Klämpfl
On 15.12.2018 at 16:18, J. Gareth Moreton wrote: > Ah right, so things like "TmpUsedRegs" (an array of TUsedRegs) constantly > being created and destroyed in the peephole > optimizer is actually not that much of a penalty hit, and creating a pooled > object for continuous use doesn't give that

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-27 Thread J. Gareth Moreton
Is it possible to still consider these changes? They do give big time savings when compiling large projects under x86_64 and a couple of the rewritten functions perform better in finding optimisations with jumps. I'm holding off doing additional peephole additions until I know whether this

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-23 Thread J. Gareth Moreton
Updated https://bugs.freepascal.org/view.php?id=34628 - new "overhaul-mov-refactor.patch" that now changes "movl addl movq" to "addl" (and equivalently with "incl").  Currently the patches only apply to x86_64, but i386 is ready for upload if approved... much less splitting involved! Gareth

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-22 Thread J. Gareth Moreton
Saying that, it might not be a bug but a design choice.  If the compiler is able to extend the variable to 64 bits on the stack, it will do, including the use of "incq" over "incl" - whenever I try to exploit the upper 32 bits, the compiler is too smart for that! The only problem is that it can

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-22 Thread J. Gareth Moreton
Have to apologise again for my web client making life difficult for the mail archive system. Currently I'm a little reluctant to put in the "incq" fix because the code isn't equivalent.  More than anything, it's a very minor bug with the node system in that it writes the full 64-bit register

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-17 Thread Marco van de Voort
On 12/17/2018 at 8:23 AM, Ryan Joseph wrote: On Dec 16, 2018, at 10:57 PM, Marco van de Voort wrote: I'm no expert, but afaik creating an object involves an exception frame, which is afaik cheaper in Delphi with SEH than in FPC with setjmp. Even if there is no try..except block? Try..

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-16 Thread Ryan Joseph
> On Dec 16, 2018, at 10:57 PM, Marco van de Voort > wrote: > > I'm no expert, but afaik creating an object involves an exception frame, > which is afaik cheaper in Delphi with SEH than in FPC with setjmp. Even if there is no try..except block? I don’t use exceptions in FPC so shouldn’t this

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-16 Thread Marco van de Voort
On 2018-12-15 at 19:01, Martok wrote: memory manager in daily use. Doing that is a C++-ism that shouldn't exist in a sane environment ;-) I just tested something, and I'm surprised by how big the difference is. This simple test is 1.5 times slower in FPC/trunk/win32 than Delphi 2007 and

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-15 Thread Jonas Maebe
On 15/12/18 21:04, Ben Grasset wrote: On Sat, Dec 15, 2018 at 2:43 PM Jonas Maebe > wrote: That is incorrect. I didn't mean that it doesn't *care* about being fast, but more that it will not necessarily use more memory in all cases that it might result in a

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-15 Thread J. Gareth Moreton
Apologies for my webmail client problems.  There's little I can do about that. This was the object pooling that I did - https://bugs.freepascal.org/view.php?id=34679 - though there's some cycle counting involved (e.g. in OptPass1MOV, only the integer registers are updated instead of all 8

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-15 Thread Ben Grasset
On Sat, Dec 15, 2018 at 2:37 PM Martok wrote: > In any case, FPC's cmem on Win32 calls into msvcrt, and that is so slow > that I > killed the test code after a couple of minutes, where even FPC-builtin was > done > after 10 seconds. > Interesting. On Win64 I've found it to be consistently

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-15 Thread Jonas Maebe
On 15/12/18 19:37, Ben Grasset wrote: Should this really be surprising at all though? To me it seems obvious why that would be the case. Delphi the compiler (not the IDE) is not written in Pascal. It's written in a combination of C and C++. Thus, I would imagine that Delphi's *default*

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-15 Thread Martok
Sorry for hijacking the thread. Your mail client issue makes the conversation really hard to follow, so I have literally no idea what the current subtopic of a reply chain is, and there's little point in properly detaching a thread. On 15.12.2018 at 18:13, J. Gareth Moreton wrote: > I dare ask,

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-15 Thread J. Gareth Moreton
Well, Florian did say he was concerned about the increased maintenance costs, given how complex the compiler is already.  Granted, it's one of the few surefire ways that I've sped up the compiler quite significantly.  Other speed-ups like other case block algorithms may also help. Though the

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-15 Thread Ben Grasset
On Sat, Dec 15, 2018 at 1:14 PM J. Gareth Moreton wrote: > P.S. This thread is supposed to be for the x86_64 optimizer overhaul that > I presented! > Despite the other reply I just sent about the memory management stuff I also agree here! Your changes look very beneficial and it would be nice

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-15 Thread Ben Grasset
On Sat, Dec 15, 2018 at 1:01 PM Martok wrote: > I just tested something, and I'm surprised by how big the difference is. > Should this really be surprising at all though? To me it seems obvious why that would be the case. Delphi the compiler (not the IDE) is not written in Pascal. It's

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-15 Thread J. Gareth Moreton
I dare ask, does that mean we should avoid workarounds in the compiler (and our own programs) that aim to avoid constant construction and destruction of objects, and instead try to improve the memory manager? So many discoveries! Gareth aka. Kit P.S. This thread is supposed to be for the

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-15 Thread Martok
On 15.12.2018 at 17:12, Florian Klämpfl wrote: > The memory manager itself pools already, so no need for the compiler. If > somebody wants to improve the heap manager: > implement OS supported re-allocations (OS can move memory by just shuffling > pages). Very much agree, it's not a user

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-15 Thread J. Gareth Moreton
Ah right, so things like "TmpUsedRegs" (an array of TUsedRegs) constantly being created and destroyed in the peephole optimizer is actually not that much of a penalty hit, and creating a pooled object for continuous use doesn't give that much of a performance gain? Gareth On Sat 15/12/18
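The pooled-object idea raised in this message can be sketched roughly as follows. This is only an illustration in Python, not the patch attached to the bug report: the class names `UsedRegs` and `UsedRegsPool` are hypothetical stand-ins and do not mirror FPC's TUsedRegs. It shows the general technique of handing released objects back out instead of constructing new ones.

```python
class UsedRegs:
    """Hypothetical stand-in for one register-usage record (not FPC's TUsedRegs)."""

    def __init__(self):
        self.in_use = set()

    def reset(self):
        self.in_use.clear()


class UsedRegsPool:
    """Keeps released objects on a free list so later requests can reuse them."""

    def __init__(self):
        self._free = []
        self.created = 0  # counts real constructions, for illustration only

    def acquire(self):
        if self._free:
            return self._free.pop()
        self.created += 1
        return UsedRegs()

    def release(self, obj):
        obj.reset()  # hand back a clean object
        self._free.append(obj)


pool = UsedRegsPool()
for _ in range(1000):
    regs = pool.acquire()
    regs.in_use.add("rax")
    pool.release(regs)
print(pool.created)  # prints 1: the same object is reused on every iteration
```

Whether this pays off depends on how cheap allocation already is. As noted elsewhere in the thread, the memory manager itself pools, so the gain from an extra pool on top may be small.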

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-15 Thread Florian Klämpfl
On 12.12.2018 at 13:49, Ryan Joseph wrote: > > >> On Dec 12, 2018, at 7:20 PM, Martok wrote: >> >> Checking out the memory manager(s) could be useful as well - there are a lot >> of >> small allocations, that generally tends to put much stress on it. >> And any improvement there would also

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-12 Thread Marģers . via fpc-devel
- Reply to message - Subject: Re: [fpc-devel] x86_64 Optimizer Overhaul Date: 12 December 2018 17:02:02 From: J. Gareth Moreton To: FPC developers' list > By the way, what generates that set of > operations? I'm curious because I want to > see what's going on in the

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-12 Thread J. Gareth Moreton
By the way, what generates that set of operations? I'm curious because I want to see what's going on in the compiler. You see, "incq" and that "mov, add, mov" set aren't equivalent; anything over $1 gets truncated with the set, but not with "incq", although it's not a concern if
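The non-equivalence described here can be modelled with plain integer arithmetic. The sketch below is an assumption-laden illustration, not compiler output: it takes the three-instruction set to be a 32-bit load, a 32-bit add and a 64-bit store (the "movl addl movq" pattern named in the 2018-12-23 message), and the function names are made up for the example.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def inc_via_movl_addl_movq(slot):
    """Model: 32-bit load, 32-bit add, 64-bit store back to the same slot."""
    reg = slot & MASK32       # movl: load the low 32 bits, upper half of the register is zeroed
    reg = (reg + 1) & MASK32  # addl $1: the 32-bit add wraps, the result is zero-extended
    return reg                # movq: the whole 64-bit register overwrites the slot


def inc_via_incq(slot):
    """Model: a single 64-bit increment performed on the slot."""
    return (slot + 1) & MASK64


for value in (41, 0xFFFFFFFF, 0x1_0000_0005):
    print(hex(value), hex(inc_via_movl_addl_movq(value)), hex(inc_via_incq(value)))
```

Under this model the two forms agree while the value stays within 32 bits and never wraps. They diverge once a carry out of bit 31 occurs or the upper half of the slot holds data.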

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-12 Thread Ryan Joseph
> On Dec 12, 2018, at 7:59 PM, Ryan Joseph wrote: > > For example every time it parses “1 + 1” a large code block is entered Correction, 1+1 doesn’t enter a large code block unless there’s an overload present. Once you add overloads however that’s when a caching solution would start

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-12 Thread Marģers . via fpc-devel
  > Nice spot with the "incq" command there.  It wasn't intentional for that to be split into 3 commands, but is likely just a side-effect of pass 1 not being run twice now... granted, since one of my criteria was that the code should not be less optimal, I'll see if I can watch out for that one.

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-12 Thread J. Gareth Moreton
is written. Gareth aka. Kit On Wed 12/12/18 13:08 , "Marģers ." margers.ro...@inbox.lv sent: - Reply to message - Subject: Re: [fpc-devel] x86_64 Optimizer Overhaul Date: 6 December 2018 18:57:29 From: J. Gareth Moreton To: FPC developers' list > I beli

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-12 Thread Marģers . via fpc-devel
- Reply to message - Subject: Re: [fpc-devel] x86_64 Optimizer Overhaul Date: 6 December 2018 18:57:29 From: J. Gareth Moreton To: FPC developers' list > I believe I've fixed the bug.  Thanks for your help. Now it's way better. -O3 and -O4 work fine. Speed test for

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-12 Thread Ryan Joseph
> On Dec 12, 2018, at 7:20 PM, Martok wrote: > > Checking out the memory manager(s) could be useful as well - there are a lot > of > small allocations, that generally tends to put much stress on it. > And any improvement there would also directly benefit user applications. I noticed today

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-12 Thread Marco van de Voort
On 12/12/2018 at 1:49 PM, Ryan Joseph wrote: This is especially a good idea because the compiler is a one pass program so leaks over the long term aren’t a problem. (well, unless it is integrated in the textmode IDE "fp" of course)

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-12 Thread Ryan Joseph
> On Dec 12, 2018, at 7:20 PM, Martok wrote: > > Checking out the memory manager(s) could be useful as well - there are a lot > of > small allocations, that generally tends to put much stress on it. > And any improvement there would also directly benefit user applications. I was going to say

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-12 Thread Martok
On 12.12.2018 at 04:51, Ryan Joseph wrote: > I’ve spent some time in the compiler sources now and I’m curious just where > people think the bottlenecks for performance actually are. It’s such a > complicated system for any one person to have a good understanding of so > it’s not clear

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-11 Thread J. Gareth Moreton
It is indeed such a complex system.  I wouldn't say that I've identified a bottleneck per se, but I've chosen to focus my improvements there.  The idea behind the overhaul is that it attempts to reduce the number of passes during the peephole optimizer stage - given that I've managed to shave off

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-11 Thread Ryan Joseph
I’ve spent some time in the compiler sources now and I’m curious just where people think the bottlenecks for performance actually are. It’s such a complicated system for any one person to have a good understanding of so it’s not clear where to begin looking. > On Dec 12, 2018, at 9:42 AM,

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-11 Thread J. Gareth Moreton
I think this was intended for the mailing list - I'm looking forward to it too.  Depends on what Florian says though. The overhaul primarily increases the speed of compilation, but it makes some minor improvements to conditional branches here and there.  Nevertheless, I'm always happy to find a

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-09 Thread J. Gareth Moreton
Because of how intertwined my work is, I can't easily work on something else until I know if this overhaul is accepted or rejected.  However, in the meantime, would anyone object if I start porting it to i386, so I can get rid of all those horrible $ifdef's more than anything? From the little

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-09 Thread J. Gareth Moreton
I think patch.exe gets a bit confused if the changes have to be offset - in my case, there are 6 patches that should be applied together, and some of them modify the same file, hence causing movement of procedures in the source file. Sorry if I'm getting pushy and impatient with this change - I

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-09 Thread Marco van de Voort
On 2018-12-09 at 02:05, J. Gareth Moreton wrote: I'm not sure. I've always had problems with patch.exe. I personally use "svn patch", which works for me both under Windows and Linux. Cygwin patch afaik also works fine.

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-08 Thread J. Gareth Moreton
Saying that though, despite the near-identical time, what's the size of the binary like?  It should be the same or slightly smaller, but (hopefully) never larger. Gareth aka. Kit On Sun 09/12/18 03:32 , "Ryan Joseph" r...@thealchemistguild.com sent: > On Dec 9, 2018, at 9:15 AM, J. Gareth

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-08 Thread J. Gareth Moreton
Hmmm, it might imply that the overhaul isn't worth it except for the largest projects.  I guess we'll have to let Florian make that call. I'm not sure how to time the optimisation stage separately, unless you're able to pass in the PPU files directly.  Other factors like reading from the disk

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-08 Thread Ryan Joseph
> On Dec 9, 2018, at 9:15 AM, J. Gareth Moreton > wrote: > > Hmmm, that's a shame if the time difference is so small. Up to you if it's > worth it or not. I hoped it would be slightly better than that, although if > it's consistently faster, especially with large projects, then it's a

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-08 Thread Ryan Joseph
Got it compiling but I need a better way to specify the changed rtl/package units. I just did a “make clean all” but I need to specify the new location of the rtl/packages units which are no longer in the default locations. Is there a better way than adding tons of -Fu’s in the command line?

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-08 Thread J. Gareth Moreton
Hmmm, that's a shame if the time difference is so small.  Up to you if it's worth it or not.  I hoped it would be slightly better than that, although if it's consistently faster, especially with large projects, then it's a winner in my eyes.  Fingers crossed! Gareth aka. Kit On Sun 09/12/18

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-08 Thread Ryan Joseph
> On Dec 9, 2018, at 9:02 AM, J. Gareth Moreton > wrote: > > (This should probably be on the mailing list because it's helpful to everyone) > > Hmmm, I'm not sure about that one - those shouldn't be affected. Just the > standard "make clean all" should work. > > However, the document here

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-08 Thread Ryan Joseph
Got everything building finally but the time difference is so small I'll need to make a script to compile multiple times and average all the runs. Is it even worth the time doing that? Regards, Ryan Joseph
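A script of the kind mentioned here might look something like the sketch below. It is a generic timing harness, not anything posted to the list: the function names are invented for the example, and the command it runs is a harmless placeholder that would need replacing with the actual compiler invocation being compared.

```python
import statistics
import subprocess
import sys
import time


def time_runs(cmd, runs=5):
    """Run cmd `runs` times and return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarise(durations):
    """Return (mean, standard deviation); the deviation is 0.0 for a single run."""
    mean = statistics.mean(durations)
    spread = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return mean, spread


if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere; substitute the real
    # compiler invocation for the project being measured.
    cmd = [sys.executable, "-c", "pass"]
    mean, spread = summarise(time_runs(cmd, runs=5))
    print(f"mean {mean:.3f}s, stdev {spread:.3f}s over 5 runs")
```

Reporting the standard deviation alongside the mean helps judge whether a small difference between two compiler builds is larger than the run-to-run noise from disk caching and other system activity.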

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-08 Thread J. Gareth Moreton
(This should probably be on the mailing list because it's helpful to everyone) Hmmm, I'm not sure about that one - those shouldn't be affected.  Just the standard "make clean all" should work. However, the document here contains everything about building FPC:

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-08 Thread J. Gareth Moreton
I'm not sure. I've always had problems with patch.exe. I personally use "svn patch", which works for me both under Windows and Linux. I hope this works better. Gareth aka. Kit On Sun 09/12/18 01:56 , "Ryan Joseph" r...@thealchemistguild.com sent: > I was stupid and didn’t use the right

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-08 Thread Ryan Joseph
Couldn’t figure out the patching. I tried a dry run but it doesn’t seem to find the file. Downloaded from svn cd trunk patch < /Users/ryanjoseph/Downloads/overhaul-64-32-split.patch --dry-run (Stripping trailing CRs from patch.) can't find file to patch at input line 5 Perhaps you should have

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-08 Thread Ryan Joseph
I was stupid and didn’t use the right options. They work now except this one: sudo patch -p0 < /Users/ryanjoseph/Downloads/overhaul-base.patch (Stripping trailing CRs from patch.) patching file compiler/aopt.pas patch: malformed patch at line 15: Index: compiler/aoptbase.pas did it fail?

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-08 Thread J. Gareth Moreton
Had any luck with this? Gareth aka. Kit On Fri 07/12/18 01:26 , Ryan Joseph r...@thealchemistguild.com sent: > > > > > > On Dec 7, 2018, at 5:11 AM, J. Gareth Moreton > e...@moreton-family.com> wrote: > > > > > Does anyone have other test projects to compile > that would give more

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-06 Thread J. Gareth Moreton
The patches in question can be found here: https://bugs.freepascal.org/view.php?id=34628 - just get the latest source files from the SVN trunk and apply the patches - the order shouldn't matter, but be careful you don't accidentally apply the same patch twice.  After that, you just "make clean

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-06 Thread Ryan Joseph
> On Dec 7, 2018, at 5:11 AM, J. Gareth Moreton > wrote: > > Does anyone have other test projects to compile that would give more coverage > for the timing metrics? Sure. How do I download and build? Are you just relying on the FPC standard output for timing or are there special switches

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-06 Thread J. Gareth Moreton
Reply to message ----- Subject: Re: [fpc-devel] x86_64 Optimizer Overhaul Date: 2 December 2018 23:32:36 From: J. Gareth Moreton To: FPC developers' list Thanks for the feedback.  Do you have a reproducible case, and does it fail on Linux or Windows?  I'll have a look for the infi

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-06 Thread J. Gareth Moreton
binarySearchLong:=mid;     exit; end;           end;     if (sortedArray^[low] = toFind) then     begin binarySearchLong:=low;     end else     binarySearchLong := -1; { Not found} end;       - Reply to message - Subject: Re: [fpc-devel] x86_64 Opti

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-02 Thread J. Gareth Moreton
; end;           end;     if (sortedArray^[low] = toFind) then     begin binarySearchLong:=low;     end else     binarySearchLong := -1; { Not found} end;       ----- Reply to message ----- Subject: Re: [fpc-devel] x86_64 Optimizer Overhaul Date: 2018.

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-02 Thread J. Gareth Moreton
Thanks for the feedback.  Do you have a reproducible case, and does it fail on Linux or Windows?  I'll have a look for the infinite loops in the meantime. Gareth aka. Kit On Sun 02/12/18 20:54 , "Marģers ." margers.ro...@inbox.lv sent: > I've had problems testing it under Linux due to

Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-01 Thread J. Gareth Moreton
Following advice from Florian, I've split my submission into five separate patches so they are easier to test.  It also now compiles under x86_64-linux.  It seems that there's an apparent fault with one of the MOV optimisations that was causing incorrect code to be generated in some instances.  I