Re: [fpc-devel] Optimizer Overhaul Take 2... Jump Optimizations
A partial example of where "condition_in" shows its power (compiled code from fpc\packages\fcl-base\src\advancedipc.pp):

BEFORE:

    ...
    jb   .Lj178
    subl $17,%eax
    cmpl $26,%eax
    jb   .Lj178
    cmpl $30,%eax
    stc
    je   .Lj178
    subl $32,%eax
    cmpl $26,%eax
.Lj178:
    jc   .Lj180    (appears later in the code)
    ...

AFTER:

    ...
    jb   .Lj180    <-- label changed
    subl $17,%eax
    cmpl $26,%eax
    jb   .Lj180    <-- label changed
    cmpl $30,%eax
    stc
    je   .Lj178
    subl $32,%eax
    cmpl $26,%eax
.Lj178:
    jc   .Lj180    (appears later in the code)
    ...

In this case, 'condition_in' detects that C_C is a subset of C_B (they are in fact equivalent here: both branch if the Carry Flag is set). This particular optimisation means the code only has to branch once when the JB conditions are met, rather than twice.

Gareth aka. Kit

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
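For readers unfamiliar with the x86 condition codes, the subset test behind this can be modelled as plain set inclusion. The following is an illustrative Python sketch, not FPC compiler code - the dictionary of conditions and the restriction to the CF/ZF flags are simplifying assumptions made for the example:

```python
from itertools import product

# Each condition maps to the set of (CF, ZF) flag states in which it branches.
# On x86, B (below) and C (carry set) both test CF=1, so they are equivalent.
FLAG_STATES = list(product((0, 1), repeat=2))  # all (CF, ZF) combinations

CONDITIONS = {
    "C_B":    {(cf, zf) for cf, zf in FLAG_STATES if cf == 1},            # CF=1
    "C_C":    {(cf, zf) for cf, zf in FLAG_STATES if cf == 1},            # CF=1
    "C_E":    {(cf, zf) for cf, zf in FLAG_STATES if zf == 1},            # ZF=1
    "C_BE":   {(cf, zf) for cf, zf in FLAG_STATES if cf == 1 or zf == 1}, # CF=1 or ZF=1
    "C_None": set(FLAG_STATES),                                           # unconditional
}

def condition_in(c1, c2):
    """True if every flag state that makes c1 branch also makes c2 branch."""
    return CONDITIONS[c1] <= CONDITIONS[c2]

print(condition_in("C_C", "C_B"))   # True  (equivalent: both test CF=1)
print(condition_in("C_E", "C_BE"))  # True  (E is a subset of BE)
print(condition_in("C_BE", "C_E"))  # False
```

Because C_C and C_B denote the same flag state, the `jc` at `.Lj178` is taken exactly when the earlier `jb` instructions are, which is why their targets can be rewritten straight to `.Lj180`.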
[fpc-devel] Optimizer Overhaul Take 2... Jump Optimizations
Hi everyone,

So after my x86_64 optimizer overhaul was rejected in its current form, I decided to set to work again in cleaner steps. The first thing I've started to address is jump optimisations, since they gave notable improvements that were independent of things like the reduction in the number of passes. There is also a bonus... a lot of it is cross-platform too!

One thing I do need to do is cross-compile to all known platforms... I can only test a handful of the i386 and x86_64 platforms, but I want to at least see if it compiles before I submit the patch for more thorough testing. Does anyone have any tips, or a sample script (preferably a .BAT file), that cross-compiles from x86_64-win64 or i386-win32 to a reasonable set of platforms that at least gives me a good smoke test? (Obviously, every combination is a little overkill, if not impractical.)

I'm trying to keep things as simple as possible, but for the best optimisations I had to introduce a new function for all platforms:

    function condition_in(const c1, c2: TAsmCond): Boolean;

It is similar to "conditions_equal", but it returns True if c1 is a subset of c2 (e.g. "C_E" and "C_L" are both subsets of "C_LE", but "C_LE" is not a subset of "C_L"). It always returns True if c2 is "C_None" (i.e. an unconditional jump) or if the conditions are equal, but analysing the subsets is platform-specific. I've done the best I can for the platform-specific assemblers that I don't know too well, but I'll need someone to check them over and maybe improve them, since I don't know what some of the conditions are. The function is mostly failsafe, in that if it returns False, an optimisation won't be performed.

It's there to help track final jump destinations more accurately, as well as to detect what I call 'dominated jumps' - for example:

    jbe @lbl1
    je  @lbl2

The je instruction will never branch, because the jbe instruction above it will branch first whenever the equality condition is met, so it can be safely removed.
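The dominated-jump idea can be illustrated with a small, self-contained model. This is a hypothetical Python sketch, not the actual FPC peephole pass - the flag sets and the `remove_dominated_jumps` helper are invented for the example:

```python
# Model a jump as (condition, target). A jump is "dominated" when earlier,
# adjacent jumps already branch on every flag state it tests: it can never fire.
BRANCH_STATES = {          # (CF, ZF) states in which each condition branches
    "jbe": {(1, 0), (0, 1), (1, 1)},  # below or equal: CF=1 or ZF=1
    "je":  {(0, 1), (1, 1)},          # equal: ZF=1
    "jb":  {(1, 0), (1, 1)},          # below: CF=1
}

def remove_dominated_jumps(jumps):
    """Drop any jump whose branch states are covered by earlier jumps."""
    kept, covered = [], set()
    for cond, target in jumps:
        if BRANCH_STATES[cond] <= covered:
            continue  # dominated: an earlier jump always branches first
        kept.append((cond, target))
        covered |= BRANCH_STATES[cond]
    return kept

code = [("jbe", "@lbl1"), ("je", "@lbl2")]
print(remove_dominated_jumps(code))  # [('jbe', '@lbl1')]
```

The key assumption (true for a run of adjacent conditional jumps with no intervening flag-setting instructions) is that the flags do not change between the jumps, so coverage accumulates as a simple set union.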
I'll run the win32 and win64 tests over Thursday and will submit the patches once everything passes, along with a PDF design and implementation spec to explain what everything is, as well as concepts such as dominated jumps and the like.

Gareth aka. Kit
Re: [fpc-devel] Question on updating FPC packages
Well, when it comes to the specific changes I made to uComplex... the compiler might be able to detect a kind of 'auto-const' system, but actually inserting 'const' into the formal parameters helps with syntax checking - namely, catching code that modifies a parameter when it perhaps shouldn't - as well as with generating more efficient code.

For vectorcall, I don't think the compiler can correctly guess when and when not to use the calling convention, and there are times when you may not want to use vectorcall, usually when interfacing with third-party programs or libraries. In that case, the programmer is more likely to stumble upon unintended behaviour if the compiler enables vectorcall for something that is meant to use the default Microsoft ABI instead.

As for using assembly language to directly call the uComplex routines, I don't think that is a realistic real-world example, considering that's a situation where you're more likely to be using the XMM registers directly to do such mathematics. Besides, I think all bets are off when it comes to assembly language - in this instance I tried to make sure that Pascal code didn't have to change, though (other than perhaps a recompilation).

I could just say 'screw it' and write my own complex number library, but then that would just add to the growing collection of third-party libraries, instead of fixing a standard set of libraries that are antiquated and potentially sluggish on modern systems.

Gareth aka. Kit

On 30/10/2019 22:02, Florian Klämpfl wrote:
> On 29.10.19 at 14:06, Marco van de Voort wrote:
>> On 2019-10-27 at 10:46, Florian Klämpfl wrote:
>>> On 27.10.19 at 10:27, Michael Van Canneyt wrote:
>>>> If you genuinely believe that micro-optimization changes can make a difference: Submit patches.
>>> As said: I am against applying them.
>> Why?
> They clutter code and, after all, they make assumptions about the current target which might not always be valid. And time testing them is much better spent in improving the compiler, and then all code benefits.
>>> Another point: for example, explicit inline normally increases code size (not always, but often), so it works against the use of -Os. Applying inline manually on umpteen subroutines makes no sense. Better to improve auto-inlining.
>> Auto-inlining is also no panacea. It only works with heuristics, and is thus only as good as the heuristic's formula.
> Yes. And manually adding inline is only as good as the knowledge of the user doing so. If somebody implements it right (I did not; I used the easiest approach and an existing function to estimate the complexity of a subroutine), the compiler can just count the number of generated instructions, or even calculate the length of the procedure, and then decide to keep the node tree for inlining.
>> Changing calling conventions, vectorizing, loops - all of that complicates it, and it will never be perfect; a change here will lead to a problem there, etc.
> See above.
>> If you know a routine can evaluate to one instruction in most cases, I don't see anything wrong with just marking it as such.
> The compiler knows this as well, as the compiler generated the code. Why should I guess when the compiler knows?
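The size-based decision described in the quoted exchange - have the compiler count the generated instructions, or measure the procedure's length, and keep the node tree for inlining only when it is small enough - can be sketched roughly as follows. This is a hypothetical Python model, not compiler code; the `INLINE_LIMITS` table and its thresholds are made-up values for illustration:

```python
# Hypothetical model of a size-based auto-inline decision: after code
# generation, keep a routine's node tree for inlining only if its measured
# size is below a threshold that depends on the optimisation goal.
INLINE_LIMITS = {"speed": 32, "balanced": 16, "size": 4}  # max instructions (invented numbers)

def keep_node_tree_for_inlining(instruction_count, goal="balanced"):
    """Decide from the *generated* code size, not from a guess made in source."""
    return instruction_count <= INLINE_LIMITS[goal]

# A one-instruction accessor is inlined even when optimising for size...
print(keep_node_tree_for_inlining(1, "size"))    # True
# ...while a large routine is rejected under -Os-style goals but may
# still qualify when optimising for speed.
print(keep_node_tree_for_inlining(40, "size"))   # False
print(keep_node_tree_for_inlining(30, "speed"))  # True
```

The point of the design is the one made above: the decision is taken where the real instruction count is known, rather than by the programmer annotating umpteen subroutines by hand.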
Re: [fpc-devel] Question on updating FPC packages
On 29.10.19 at 14:06, Marco van de Voort wrote:
> On 2019-10-27 at 10:46, Florian Klämpfl wrote:
>> On 27.10.19 at 10:27, Michael Van Canneyt wrote:
>>> If you genuinely believe that micro-optimization changes can make a difference: Submit patches.
>> As said: I am against applying them.
> Why?

They clutter code and, after all, they make assumptions about the current target which might not always be valid. And time testing them is much better spent in improving the compiler, and then all code benefits.

>> Another point: for example, explicit inline normally increases code size (not always, but often), so it works against the use of -Os. Applying inline manually on umpteen subroutines makes no sense. Better to improve auto-inlining.
> Auto-inlining is also no panacea. It only works with heuristics, and is thus only as good as the heuristic's formula.

Yes. And manually adding inline is only as good as the knowledge of the user doing so. If somebody implements it right (I did not; I used the easiest approach and an existing function to estimate the complexity of a subroutine), the compiler can just count the number of generated instructions, or even calculate the length of the procedure, and then decide to keep the node tree for inlining.

> Changing calling conventions, vectorizing, loops - all of that complicates it, and it will never be perfect; a change here will lead to a problem there, etc.

See above.

> If you know a routine can evaluate to one instruction in most cases, I don't see anything wrong with just marking it as such.

The compiler knows this as well, as the compiler generated the code. Why should I guess when the compiler knows?
Re: [fpc-devel] Progress on reviewing x86_64 optimizer overhaul and node semantic pass
> For example, if I have changes in separate files that I want to split up, how
> might one go about it without manually modifying the patch files? (As an easy
> example, I split up the uComplex patches into two... one with the alignment
> and vectorcall changes, and the other with the "const" modifier in the
> parameters.)

As an option, you can apply the patches and then create new ones with, e.g., VCS -> Create Patch in IDEA, where you can select which files, and which changes within each file, should go into the patch.

---
Best regards,
George
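For those without IDEA at hand, the per-file part of the split can also be done mechanically, since a unified diff is just a sequence of per-file sections. The following is a hedged Python sketch - the `split_patch` helper is an invented example, and it assumes git-style `diff --git` section headers:

```python
import re

def split_patch(patch_text):
    """Split a unified diff into {filename: patch_chunk}, one per 'diff' section."""
    parts = {}
    current_name, current_lines = None, []
    for line in patch_text.splitlines(keepends=True):
        m = re.match(r"diff .* b/(\S+)", line)
        if m:
            if current_name:                       # flush the previous file's section
                parts[current_name] = "".join(current_lines)
            current_name, current_lines = m.group(1), []
        if current_name:                           # skip any preamble before the first section
            current_lines.append(line)
    if current_name:
        parts[current_name] = "".join(current_lines)
    return parts
```

Each returned chunk can be written out as its own patch file; splitting individual hunks *within* a file still needs a hunk-aware tool or an interactive VCS front end.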