Re: [fpc-devel] Question on updating FPC packages
To get back on track with uComplex, I didn't change any routines to make them inline - they were that way already. All I did was change the parameters to 'const', align the complex type so it is equivalent to __m128d so the System V ABI can pass it all in one register, and enable vectorcall on Win64 so the same thing can happen on that platform. Is that really too much?

Changing the Win64 build of FPC to default to vectorcall is an option, although the option to fall back to the fastcall-based convention needs to exist for the sake of interfacing with third-party libraries, and it doesn't change the fact that the complex type still needs to be aligned. Either way, it might break assembler code that calls the uComplex functions, but my argument still stands that I don't think this is a realistic set-up in the grand scheme of things.

Gareth aka. Kit

On 31/10/2019 21:13, Florian Klämpfl wrote:
> On 31.10.19 at 20:11, Marco van de Voort wrote:
>> On 2019-10-30 at 23:02, Florian Klämpfl wrote:
>>> Yes. And manually adding inline is only as good as the knowledge of the user doing so. If somebody implements it right (I did not; I used the easiest approach and used an existing function to estimate the complexity of a subroutine), the compiler can just count the number of generated instructions, or even calculate the length of the procedure, and then decide to keep the node tree for inlining.
>> Well, it depends of course on what happens when. Would you really count final instructions or cycles after all optimization and peephole passes?
> This is not really an issue: for inlining, mainly instruction count/code length matters, and e.g. the ARM compiler even does this (actually something more complex), as it has to insert the constant tables at the right locations into the code because the relative offsets are limited.
___ fpc-devel maillist - fpc-devel@lists.freepascal.org https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] Question on updating FPC packages
On 31.10.19 at 20:11, Marco van de Voort wrote:
> On 2019-10-30 at 23:02, Florian Klämpfl wrote:
>> Yes. And manually adding inline is only as good as the knowledge of the user doing so. If somebody implements it right (I did not; I used the easiest approach and used an existing function to estimate the complexity of a subroutine), the compiler can just count the number of generated instructions, or even calculate the length of the procedure, and then decide to keep the node tree for inlining.
> Well, it depends of course on what happens when. Would you really count final instructions or cycles after all optimization and peephole passes?

This is not really an issue: for inlining, mainly instruction count/code length matters, and e.g. the ARM compiler even does this (actually something more complex), as it has to insert the constant tables at the right locations into the code because the relative offsets are limited.
Re: [fpc-devel] Question on updating FPC packages
On 2019-10-30 at 23:02, Florian Klämpfl wrote:
> Yes. And manually adding inline is only as good as the knowledge of the user doing so. If somebody implements it right (I did not; I used the easiest approach and used an existing function to estimate the complexity of a subroutine), the compiler can just count the number of generated instructions, or even calculate the length of the procedure, and then decide to keep the node tree for inlining.

Well, it depends of course on what happens when. Would you really count final instructions or cycles after all optimization and peephole passes?
Re: [fpc-devel] Question on updating FPC packages
Well, when it comes to the specific changes I made to uComplex... the compiler might be able to implement a kind of 'auto-const' system, but actually inserting 'const' into the formal parameters helps with syntax checking (catching the case where you modify a parameter you're perhaps not supposed to) as well as with generating more efficient code.

For vectorcall, I don't think the compiler will correctly guess when and when not to use the calling convention, and there are times where you may not want to use vectorcall, usually when interfacing with third-party programs or libraries. In this case, it's more likely that the programmer may stumble upon unintended behaviour if they try to enable vectorcall for something that is meant to use the default Microsoft ABI instead.

And using assembly language to directly call the uComplex routines I don't think is a realistic real-world example, considering that's a situation where you're more likely to be using the XMM registers directly to do such mathematics. Besides, I think all bets are off when it comes to assembly language - in this instance I tried to make sure that Pascal code didn't have to change though (other than a recompilation maybe).

I could just say 'screw it' and write my own complex number library, but then that would just add to the growing collection of third-party libraries instead of a standard set of libraries that are antiquated and potentially sluggish on modern systems.

Gareth aka. Kit

On 30/10/2019 22:02, Florian Klämpfl wrote:
> On 29.10.19 at 14:06, Marco van de Voort wrote:
>> On 2019-10-27 at 10:46, Florian Klämpfl wrote:
>>> On 27.10.19 at 10:27, Michael Van Canneyt wrote:
>>>> If you genuinely believe that micro-optimization changes can make a difference: Submit patches.
>>> As said: I am against applying them. Why? They clutter code and, after all, they make assumptions about the current target which might not always be valid. And the time spent testing them is much better spent improving the compiler, so that all code benefits. Another point: explicit inline, for example, normally increases code size (not always, but often), so it works against -Os. Applying inline manually on umpteen subroutines makes no sense. Better to improve auto inlining.
>> Auto inlining is also no panacea. It only works with heuristics, and is thus only as good as the formula of the heuristic.
> Yes. And manually adding inline is only as good as the knowledge of the user doing so. If somebody implements it right (I did not; I used the easiest approach and used an existing function to estimate the complexity of a subroutine), the compiler can just count the number of generated instructions, or even calculate the length of the procedure, and then decide to keep the node tree for inlining.
>> Changing calling conventions, vectorizing, loops - all of that complicates things; it will never be perfect, and a change here will lead to a problem there, etc.
> See above.
>> If you know a routine can evaluate to one instruction in most cases, I don't see anything wrong with just marking it as such.
> The compiler knows this as well, as the compiler generated the code. Why should I guess if the compiler knows?
Re: [fpc-devel] Question on updating FPC packages
On 29.10.19 at 14:06, Marco van de Voort wrote:
> On 2019-10-27 at 10:46, Florian Klämpfl wrote:
>> On 27.10.19 at 10:27, Michael Van Canneyt wrote:
>>> If you genuinely believe that micro-optimization changes can make a difference: Submit patches.
>> As said: I am against applying them. Why? They clutter code and, after all, they make assumptions about the current target which might not always be valid. And the time spent testing them is much better spent improving the compiler, so that all code benefits. Another point: explicit inline, for example, normally increases code size (not always, but often), so it works against -Os. Applying inline manually on umpteen subroutines makes no sense. Better to improve auto inlining.
> Auto inlining is also no panacea. It only works with heuristics, and is thus only as good as the formula of the heuristic.

Yes. And manually adding inline is only as good as the knowledge of the user doing so. If somebody implements it right (I did not; I used the easiest approach and used an existing function to estimate the complexity of a subroutine), the compiler can just count the number of generated instructions, or even calculate the length of the procedure, and then decide to keep the node tree for inlining.

> Changing calling conventions, vectorizing, loops - all of that complicates things; it will never be perfect, and a change here will lead to a problem there, etc.

See above.

> If you know a routine can evaluate to one instruction in most cases, I don't see anything wrong with just marking it as such.

The compiler knows this as well, as the compiler generated the code. Why should I guess if the compiler knows?
Re: [fpc-devel] Question on updating FPC packages
On Tue, 29 Oct 2019, Ben Grasset wrote:
> On Sun, Oct 27, 2019 at 5:27 AM Michael Van Canneyt wrote:
>> Saying that the code is 'almost unusably slow' is the kind of statement that does not help. I use the code almost daily in production, no complaints about performance, so clearly it is usable. Instead, demonstrate your claim with facts, for example by creating a patch that demonstrably increases performance.
> I was perhaps slightly exaggerating there. I use it as well in real life, but in many cases have found myself altering the sources to perform more optimally (some of which I could submit as patches, I suppose).

Please do. As said, I rarely refuse patches for optimization of code I maintain, exactly because I know I pay little attention to it.

Michael.
Re: [fpc-devel] Question on updating FPC packages
On Sun, Oct 27, 2019 at 5:27 AM Michael Van Canneyt wrote:
> Saying that the code is 'almost unusably slow' is the kind of statement that does not help. I use the code almost daily in production, no complaints about performance, so clearly it is usable.
>
> Instead, demonstrate your claim with facts, for example by creating a patch that demonstrably increases performance.

I was perhaps slightly exaggerating there. I use it as well in real life, but in many cases have found myself altering the sources to perform more optimally (some of which I could submit as patches, I suppose).

On Sun, Oct 27, 2019 at 5:27 AM Michael Van Canneyt wrote:
> If you genuinely believe that micro-optimization changes can make a difference:
>
> Submit patches. When focused and well explained, I doubt they will be refused.

The stuff that I'm particularly concerned about is usually more along the lines of "small things that add up in significant ways in the context of long-running programs", so while they might be "micro" on their own, I wouldn't necessarily call them that in the context of larger overall situations.

On Sun, Oct 27, 2019 at 5:46 AM Florian Klämpfl wrote:
> Another point: for example explicit inline increases normally code size (not always but often)

I've had the opposite experience in most cases. The code FPC generates for something like four un-inlined functions, in a situation where each one calls the next, is generally significantly bigger due to the setup for the parameters being passed in, etc. Whereas if it's inlining all of them, it seems to be able to do a much better job of combining "redundant" things and optimizing based on that, which tends to give a much smaller result.

Again, in a world where robust auto-inlining was the default I'd happily rely on it exclusively, as it's not as though I specifically *want* to have to add the "inline" modifier in particular places.
Re: [fpc-devel] Question on updating FPC packages
On 29/10/2019 14:24, Michael Van Canneyt wrote:
> On Tue, 29 Oct 2019, J. Gareth Moreton wrote:
>> Please note that only Marco's e-mails are making the list. I don't see Michael's responses.
> That's probably because I am not responding ;-)
>
> Michael.

Yep, just noticed that Marco was responding to your messages from a few days ago! Perception fail!

In regards to passing everything into XMM0, try running "tests/test/cg/tvectorcall1.pp" on Linux. It's a bit of a weird test because there's a lot of Win64 stuff that's not compiled, since it tests aggregates, something that only vectorcall takes advantage of. Nevertheless, if you get an error such as 'FAIL: HorizontalAddSingle(HVA) has the vector in the wrong register.', then the System V ABI is not passing the __m128 type properly.

The way it tests this is via a pair of functions, one in Pascal and one in assembler:

function HorizontalAddSingle(V: TM128): Single; vectorcall;
begin
  HorizontalAddSingle := V.M128_F32[0] + V.M128_F32[1] + V.M128_F32[2] + V.M128_F32[3];
end;

function HorizontalAddSingle_ASM(V: TM128): Single; vectorcall; assembler; nostackframe;
asm
  HADDPS XMM0, XMM0
  HADDPS XMM0, XMM0
end;

If the results are not equal, then the entire vector isn't in XMM0. I haven't tested it on Linux as much as I would like because I have to boot into a virtual machine to do so, and I'm still a bit of a Linux novice. I'm curious to know what the assembler dump is, though.

Gareth aka. Kit
Re: [fpc-devel] Question on updating FPC packages
Oh, I just noticed you're replying to messages from a few days ago. Oops!

There is no one right way of going about optimisation. I'm of the school that if you can give the compiler a helpful hint, without complicating the code, then do it.

In one way I compare it to the id Tech (Quake) and Unreal engines back in the 90s and early 2000s. When making maps, the id Tech engines attempted to compute everything themselves when it came to determining what was visible and what should be culled - as a result, the compilation process would take a long time, and there were some situations where it could easily fall apart due to rounding errors or just some glitch in the tree. The Unreal engine, on the other hand, had /you/, the map designer, decide what was visible and what wasn't, and had you decide where to place portals and other hints to the engine. This was useful because it was much easier to subdivide areas if you were sensible about it, and hence the Unreal engine could handle much more complex outdoor scenes, for example. The cost though, especially with later versions of the Unreal engine that added more features, is that it was very hard for a novice to get started - for example, the 'terrain' feature didn't do any automatic visibility culling, so if you had a large hill, say, you would have to insert an 'anti-portal' underneath it to give a hint to the engine that if it is within the viewport, any polygons behind it are invisible (which causes very weird artefacts if you place one in the middle of an open room).

I like to take a middle ground, especially as the Pascal compiler has a reputation for being fast. A smart compiler is a good compiler, but expecting it to know which procedures should be auto-vectorised, especially with old source code and no rules on memory alignment, is either impossible or will take a disproportionately long time. Other times it's an excuse for lazy programming!

As for the vectorcall tests, they should vectorise the entire argument on both x86_64-win64 and x86_64-linux. If not, there's a bug somewhere. I'll have a look.

Gareth aka. Kit
Re: [fpc-devel] Question on updating FPC packages
On Tue, 29 Oct 2019, J. Gareth Moreton wrote:
> Please note that only Marco's e-mails are making the list. I don't see Michael's responses.

That's probably because I am not responding ;-)

Michael.
Re: [fpc-devel] Question on updating FPC packages
Please note that only Marco's e-mails are making the list. I don't see Michael's responses.

Gareth aka. Kit

On 29/10/2019 13:41, Marco van de Voort wrote:
> On 2019-10-27 at 10:27, Michael Van Canneyt wrote:
>> Absolutely. Personally, I don't have any concern for performance in this sense. Almost zero. I invariably favour code simplicity over performance, for the sake of maintenance.
> But there is another kick-in-the-open-door statement about performance: that most performance is gained in a relatively small part of the code. To tackle that, you need tools to force the compiler to behave a certain way, which might not (yet?) be doable on the compiler side. IMHO it is unfair to deem this all micro-optimization just because it doesn't hurt you.
>> For good reason: for the kind of code which I create daily, the kind of micro-optimizations that you seem to refer to are utterly insignificant, and I expect the compiler to handle them. If it currently does not, then I think the compiler, rather than the code, must be improved.
> Just the vectorizing will probably more than double the performance. Just look at the asm that I posted and imagine reducing it to one instruction. And while the FFT unit is not yet a performance bottleneck for us now, it has been marked as a relatively large factor of the measurement time. (IIRC it is about 1ms for a 400-sample array on somewhat older hardware.)
>
> And what is exactly needed might change at any given moment. If a new camera comes out and processing can keep up, you can process more samples, which in turn reduces errors and improves the measurement nearly automatically. Doing the same purely algorithmically usually means weeks to months of hard maths trying to improve signal quality, and after that validating it for umpteen products and customers, etc. Believe me, "micro-optimization" then sounds very tempting.
>
> If Gareth can get this running enough to show that the FFT reduces instructions, I can just stuff it in a DLL and have it lying on a shelf to insert into the Delphi app when needed. Which would be great.
>
>> Code should not entirely disregard optimization, but then it should be on a higher level: don't use bubble sort when you can use a better sort. No amount of micro-optimization will make bubble sort outperform quicksort.
> (Interesting example. I'm not really a hardcore algorithms man, but I can think of some potential problems with that statement:
>
> 1. That only holds for N -> infinity, and computers don't have infinite resources. If quicksort uses more memory (e.g. to track state), it might not apply in certain circumstances.
> 2. If your swap() function is extremely expensive, sorting an already-sorted array is more expensive with quicksort, because it is a non-stable sort.
> 3. The non-recursive bubble sort might be easier to unroll and then optimize by the compiler in cases of sorting a fixed number of items (e.g. ordering the elements of a short vector).)
>
> Anyway, besides the fun, the "algorithms" mantra is only a first-order guideline, not an absolute truth.
>> Saying that the code is 'almost unusably slow' is the kind of statement that does not help. I use the code almost daily in production, no complaints about performance, so clearly it is usable.
> True. Claims should be proven, and with code that does something (not with simply a loop around a single operation). But that is why I brought up the FFT unit. It is possible that that is such a case.
Re: [fpc-devel] Question on updating FPC packages
On 2019-10-27 at 10:27, Michael Van Canneyt wrote:
> Absolutely. Personally, I don't have any concern for performance in this sense. Almost zero. I invariably favour code simplicity over performance, for the sake of maintenance.

But there is another kick-in-the-open-door statement about performance: that most performance is gained in a relatively small part of the code. To tackle that, you need tools to force the compiler to behave a certain way, which might not (yet?) be doable on the compiler side. IMHO it is unfair to deem this all micro-optimization just because it doesn't hurt you.

> For good reason: for the kind of code which I create daily, the kind of micro-optimizations that you seem to refer to are utterly insignificant, and I expect the compiler to handle them. If it currently does not, then I think the compiler, rather than the code, must be improved.

Just the vectorizing will probably more than double the performance. Just look at the asm that I posted and imagine reducing it to one instruction. And while the FFT unit is not yet a performance bottleneck for us now, it has been marked as a relatively large factor of the measurement time. (IIRC it is about 1ms for a 400-sample array on somewhat older hardware.)

And what is exactly needed might change at any given moment. If a new camera comes out and processing can keep up, you can process more samples, which in turn reduces errors and improves the measurement nearly automatically. Doing the same purely algorithmically usually means weeks to months of hard maths trying to improve signal quality, and after that validating it for umpteen products and customers, etc. Believe me, "micro-optimization" then sounds very tempting.

If Gareth can get this running enough to show that the FFT reduces instructions, I can just stuff it in a DLL and have it lying on a shelf to insert into the Delphi app when needed. Which would be great.

> Code should not entirely disregard optimization, but then it should be on a higher level: don't use bubble sort when you can use a better sort. No amount of micro-optimization will make bubble sort outperform quicksort.

(Interesting example. I'm not really a hardcore algorithms man, but I can think of some potential problems with that statement:

1. That only holds for N -> infinity, and computers don't have infinite resources. If quicksort uses more memory (e.g. to track state), it might not apply in certain circumstances.
2. If your swap() function is extremely expensive, sorting an already-sorted array is more expensive with quicksort, because it is a non-stable sort.
3. The non-recursive bubble sort might be easier to unroll and then optimize by the compiler in cases of sorting a fixed number of items (e.g. ordering the elements of a short vector).)

Anyway, besides the fun, the "algorithms" mantra is only a first-order guideline, not an absolute truth.

> Saying that the code is 'almost unusably slow' is the kind of statement that does not help. I use the code almost daily in production, no complaints about performance, so clearly it is usable.

True. Claims should be proven, and with code that does something (not with simply a loop around a single operation). But that is why I brought up the FFT unit. It is possible that that is such a case.
Re: [fpc-devel] Question on updating FPC packages
On 2019-10-27 at 10:46, Florian Klämpfl wrote:
> On 27.10.19 at 10:27, Michael Van Canneyt wrote:
>> If you genuinely believe that micro-optimization changes can make a difference: Submit patches.
> As said: I am against applying them. Why? They clutter code and, after all, they make assumptions about the current target which might not always be valid. And the time spent testing them is much better spent improving the compiler, so that all code benefits. Another point: explicit inline, for example, normally increases code size (not always, but often), so it works against -Os. Applying inline manually on umpteen subroutines makes no sense. Better to improve auto inlining.

Auto inlining is also no panacea. It only works with heuristics, and is thus only as good as the formula of the heuristic. Changing calling conventions, vectorizing, loops - all of that complicates things; it will never be perfect, and a change here will lead to a problem there, etc.

If you know a routine can evaluate to one instruction in most cases, I don't see anything wrong with just marking it as such.
Re: [fpc-devel] Question on updating FPC packages
On 2019-10-29 at 12:23, J. Gareth Moreton wrote:
> When it comes to testing vectorcall, uComplex isn't the best example actually, because most of the operators are inlined. There are a number of tests under "tests/test/cg" that test vectorcall and the System V ABI using a Pascal implementation of the opaque __m128 type (the two ABIs should behave exactly the same when dealing with simple vectors).

The last time I checked, it didn't vectorise anything at all. So only the native vectorising of the record of two singles would be nice. Last time I checked, in 2017, ComplexAdd inlined looked something like this:

leal 32(%eax),%edx
leal 8(%eax),%ecx
vmovss (%ecx),%xmm0
vaddss (%edx),%xmm0,%xmm0
vmovss %xmm0,-8(%ebp)
vmovss 4(%ecx),%xmm0
vaddss 4(%edx),%xmm0,%xmm0
vmovss %xmm0,-4(%ebp)

And I realize quite some rearrangements must be done.

> If anything though, the example function you gave (I'll need to double-check what ComplexScl does though, if it isn't a simple multiplication)

It is a simple multiplication of both the real and imaginary parts by a scalar (as opposed to complex*complex, which has more terms).

> would be a pretty solid and heavy-duty test of the compiler attempting to vectorise the code - in an ideal world, individual calls to ComplexAdd and ComplexSub (which are simple + and - operations in uComplex) will compile into a single line of assembly language (ADDPD and SUBPD respectively). Nevertheless, one could disable the inlining to see how well the compiler handles the function chaining, since with aligned data, the result in XMM0 should be easily transposed in one go to another XMM register, if not just left alone as parameter data for the next function.

Yes, it is just a somewhat real-world codebase to play with. It is MPL, even.
Re: [fpc-devel] Question on updating FPC packages
When it comes to testing vectorcall, uComplex isn't the best example actually, because most of the operators are inlined. There are a number of tests under "tests/test/cg" that test vectorcall and the System V ABI using a Pascal implementation of the opaque __m128 type (the two ABIs should behave exactly the same when dealing with simple vectors).

If anything though, the example function you gave (I'll need to double-check what ComplexScl does though, if it isn't a simple multiplication) would be a pretty solid and heavy-duty test of the compiler attempting to vectorise the code - in an ideal world, individual calls to ComplexAdd and ComplexSub (which are simple + and - operations in uComplex) will compile into a single line of assembly language (ADDPD and SUBPD respectively). Nevertheless, one could disable the inlining to see how well the compiler handles the function chaining, since with aligned data, the result in XMM0 should be easily transposed in one go to another XMM register, if not just left alone as parameter data for the next function.

Gareth aka. Kit

On 29/10/2019 11:06, Marco van de Voort wrote:
> On 2019-10-27 at 09:02, Florian Klämpfl wrote:
>>> I guess you're right. It just seems weird because the System V ABI was designed from the start to use the MM registers fully, so long as the data is aligned. In effect, it had vectorcall wrapped into its design from the start. Granted, vectorcall has some advantages and can deal with relatively complex aggregates that the System V ABI cannot handle (for example, a record type that contains a normal vector and information relating to bump mapping). I just hoped that making updates to uComplex, while ensuring existing Pascal code still compiles, would help take advantage of modern ABI designs.
>> Is there currently any example which shows that vectorcall has any advantage with FPC? Otherwise I would propose first making FPC able to take advantage of it, and then talking about whether we really add vectorcall. Currently I fear FPC only gets into trouble when using vectorcall, as it first tries to push everything into one XMM register and then splits this again in the callee.
> Nils Haeck's FFT unit might be interesting. (Same guy as the NativeJpg unit, IIRC; http://www.simdesign.nl.) It is a D7-language-level unit that uses a complex record and simple procedures as options. It should be easy to transpose to uComplex. It is quite HLL and switchable between single and double. (I use it in single mode, but to test vectorcall, obviously double mode would be best?) And it has routines that do a variety of complex operations.
>
> procedure FFT_5(var Z: array of TComplex); // usage of open array is to make things generic. Could be solved differently.
> var
>   T1, T2, T3, T4, T5: TComplex;
>   M1, M2, M3, M4, M5: TComplex;
>   S1, S2, S3, S4, S5: TComplex;
> begin
>   T1 := ComplexAdd(Z[1], Z[4]);
>   T2 := ComplexAdd(Z[2], Z[3]);
>   T3 := ComplexSub(Z[1], Z[4]);
>   T4 := ComplexSub(Z[3], Z[2]);
>   T5 := ComplexAdd(T1, T2);
>   Z[0] := ComplexAdd(Z[0], T5);
>   M1 := ComplexScl(c51, T5);
>   M2 := ComplexScl(c52, ComplexSub(T1, T2));
>   M3.Re := -c53 * (T3.Im + T4.Im); // replace by i*add(t3,t4).scale(c53-i*c53) ?
>   M3.Im := c53 * (T3.Re + T4.Re);
>   M4.Re := -c54 * T4.Im;
>   M4.Im := c54 * T4.Re;
>   M5.Re := -c55 * T3.Im;
>   M5.Im := c55 * T3.Re;
>   S3 := ComplexSub(M3, M4);
>   S5 := ComplexAdd(M3, M5);
>   S1 := ComplexAdd(Z[0], M1);
>   S2 := ComplexAdd(S1, M2);
>   S4 := ComplexSub(S1, M2);
>   Z[1] := ComplexAdd(S2, S3);
>   Z[2] := ComplexAdd(S4, S5);
>   Z[3] := ComplexSub(S4, S5);
>   Z[4] := ComplexSub(S2, S3);
> end;
Re: [fpc-devel] Question on updating FPC packages
On 2019-10-27 at 09:02, Florian Klämpfl wrote:
>> I guess you're right. It just seems weird because the System V ABI was designed from the start to use the MM registers fully, so long as the data is aligned. In effect, it had vectorcall wrapped into its design from the start. Granted, vectorcall has some advantages and can deal with relatively complex aggregates that the System V ABI cannot handle (for example, a record type that contains a normal vector and information relating to bump mapping). I just hoped that making updates to uComplex, while ensuring existing Pascal code still compiles, would help take advantage of modern ABI designs.
> Is there currently any example which shows that vectorcall has any advantage with FPC? Otherwise I would propose first making FPC able to take advantage of it, and then talking about whether we really add vectorcall. Currently I fear FPC only gets into trouble when using vectorcall, as it first tries to push everything into one XMM register and then splits this again in the callee.

Nils Haeck's FFT unit might be interesting. (Same guy as the NativeJpg unit, IIRC; http://www.simdesign.nl.) It is a D7-language-level unit that uses a complex record and simple procedures as options. It should be easy to transpose to uComplex. It is quite HLL and switchable between single and double. (I use it in single mode, but to test vectorcall, obviously double mode would be best?) And it has routines that do a variety of complex operations.

procedure FFT_5(var Z: array of TComplex); // usage of open array is to make things generic. Could be solved differently.
var
  T1, T2, T3, T4, T5: TComplex;
  M1, M2, M3, M4, M5: TComplex;
  S1, S2, S3, S4, S5: TComplex;
begin
  T1 := ComplexAdd(Z[1], Z[4]);
  T2 := ComplexAdd(Z[2], Z[3]);
  T3 := ComplexSub(Z[1], Z[4]);
  T4 := ComplexSub(Z[3], Z[2]);
  T5 := ComplexAdd(T1, T2);
  Z[0] := ComplexAdd(Z[0], T5);
  M1 := ComplexScl(c51, T5);
  M2 := ComplexScl(c52, ComplexSub(T1, T2));
  M3.Re := -c53 * (T3.Im + T4.Im); // replace by i*add(t3,t4).scale(c53-i*c53) ?
  M3.Im := c53 * (T3.Re + T4.Re);
  M4.Re := -c54 * T4.Im;
  M4.Im := c54 * T4.Re;
  M5.Re := -c55 * T3.Im;
  M5.Im := c55 * T3.Re;
  S3 := ComplexSub(M3, M4);
  S5 := ComplexAdd(M3, M5);
  S1 := ComplexAdd(Z[0], M1);
  S2 := ComplexAdd(S1, M2);
  S4 := ComplexSub(S1, M2);
  Z[1] := ComplexAdd(S2, S3);
  Z[2] := ComplexAdd(S4, S5);
  Z[3] := ComplexSub(S4, S5);
  Z[4] := ComplexSub(S2, S3);
end;
Re: [fpc-devel] Question on updating FPC packages
Another point to bring up... I could easily write a cross-platform complex number library that is designed to take advantage of vector registers whenever possible for the absolute best performance, but then there's the problem of having multiple libraries that do the same thing and not really sticking to any standard. People tend to stick to what they're familiar with as well, and if a tool already exists, no matter how inefficient it is, people will use that instead. That's why I opted to update an existing library while doing my best to ensure Pascal code isn't broken. When it comes to assembly language, all bets tend to be off anyway, although once again, I argue that using assembly language to directly interface with the complex number routines is not a realistic situation, since if you're writing things in assembly language, complex numbers are one of those constructs that you would write in assembler as well for the sake of speed and efficiency. Long story short... why would people use or update their code to use a new complex number library when one that's been tried and tested (albeit out of date) already exists? Gareth aka. Kit -- This email has been checked for viruses by Avast antivirus software. https://www.avast.com/antivirus ___ fpc-devel maillist - fpc-devel@lists.freepascal.org https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] Question on updating FPC packages
On Sun, 27 Oct 2019, Sven Barth via fpc-devel wrote: Michael Van Canneyt schrieb am So., 27. Okt. 2019, 10:58: Best of all would IMHO be to abolish or even totally ignore 'inline'. It is a hint, after all. The compiler is not forced to inline, even when the modifier is there. That would be a bit problematic: auto inlining needs to first parse the routine to determine whether it can be inlined at all which would then change the checksum of the interface section as now the routine would carry the node information required for inlining which it didn't before thus leading to the requirement of an additional compilation pass of dependent units. How does $autoinline then work ? Doesn't it have to do the same ? Michael. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] Question on updating FPC packages
Michael Van Canneyt schrieb am So., 27. Okt. 2019, 10:58: > Best of all would IMHO be to abolish or even totally ignore 'inline'. > It is a hint, after all. The compiler is not forced to inline, even > when the modifier is there. > That would be a bit problematic: auto inlining needs to first parse the routine to determine whether it can be inlined at all which would then change the checksum of the interface section as now the routine would carry the node information required for inlining which it didn't before thus leading to the requirement of an additional compilation pass of dependent units. Regards, Sven > ___ fpc-devel maillist - fpc-devel@lists.freepascal.org https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] Question on updating FPC packages
Originally my patch just added the "vectorcall" calling convention to the functions; the "const" modifier was suggested by a third party and seemed sensible enough. I weighed up the fact that it wouldn't change how you call the function in Pascal code and accepted it. The patch should be easy enough to split though, especially as the vectorcall thing is now just "{$calling vectorcall}" at the top of the file. Gareth aka. Kit On 27/10/2019 11:15, Michael Van Canneyt wrote: On Sun, 27 Oct 2019, J. Gareth Moreton wrote: I was more referring to the use of correct types, use const when possible etc. Change classes to advanced records where appropriate, that kind of thing. Michael. Which is why I hoped my patches for uComplex were permissible, since it adds 'const' to make the compilation more efficient and sets the calling convention to 'vectorcall' for Win64, something that the compiler won't think to do unless explicitly told so, and maybe a slight incentive to improve the compiler as far as vectorisation is concerned (and complex numbers are a good candidate since for most basic operations, the components are modified in tandem). Well, I can't comment on this, in such matters I trust Florian knows what he is talking about. I guess adding 'vectorcall' and 'const' are micro-optimisations, but I see it more as refactoring and good coding practice in the case of 'const', while 'vectorcall' is more about knowing what kind of data you're dealing with. I would not argue in the case of const and apply where appropriate, I don't know enough about vectorcall to comment. Maybe the patch can be split into parts so const can already be applied. It's not the first time we must advise to keep patches small and focused. Although I am also often a sinner when it comes to mixing things in a patch. When you're in the flow of things, that's the last thing on your mind :/ Michael.
___ fpc-devel maillist - fpc-devel@lists.freepascal.org https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
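For reference, the "{$calling vectorcall}" approach from the message above would look roughly like this in a uComplex-style unit (a sketch only: the $IF guards and the cmod declaration are illustrative, not the actual patch):

```pascal
unit uComplexSketch;

{$mode objfpc}

{ Only x86_64 Windows understands vectorcall; guard it so every other
  target keeps its default calling convention. }
{$if defined(CPUX86_64) and defined(WIN64)}
  {$calling vectorcall}
{$endif}

interface

type
  complex = record
    re, im : real;
  end;

function cmod(const z : complex) : real;

implementation

function cmod(const z : complex) : real;
begin
  cmod := Sqrt(z.re * z.re + z.im * z.im);
end;

end.
```

Pascal call sites are unchanged either way; only the generated calling sequence differs on x86_64-win64.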
Re: [fpc-devel] Question on updating FPC packages
On Sun, 27 Oct 2019, J. Gareth Moreton wrote: I was more referring to the use of correct types, use const when possible etc. Change classes to advanced records where appropriate, that kind of thing. Michael. Which is why I hoped my patches for uComplex were permissible, since it adds 'const' to make the compilation more efficient and sets the calling convention to 'vectorcall' for Win64, something that the compiler won't think to do unless explicitly told so, and maybe a slight incentive to improve the compiler as far as vectorisation is concerned (and complex numbers are a good candidate since for most basic operations, the components are modified in tandem). Well, I can't comment on this, in such matters I trust Florian knows what he is talking about. I guess adding 'vectorcall' and 'const' are micro-optimisations, but I see it more as refactoring and good coding practice in the case of 'const', while 'vectorcall' is more about knowing what kind of data you're dealing with. I would not argue in the case of const and apply where appropriate, I don't know enough about vectorcall to comment. Maybe the patch can be split into parts so const can already be applied. It's not the first time we must advise to keep patches small and focused. Although I am also often a sinner when it comes to mixing things in a patch. When you're in the flow of things, that's the last thing on your mind :/ Michael. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] Question on updating FPC packages
I was more referring to the use of correct types, use const when possible etc. Change classes to advanced records where appropriate, that kind of thing. Michael. Which is why I hoped my patches for uComplex were permissible, since it adds 'const' to make the compilation more efficient and sets the calling convention to 'vectorcall' for Win64, something that the compiler won't think to do unless explicitly told so, and maybe a slight incentive to improve the compiler as far as vectorisation is concerned (and complex numbers are a good candidate since for most basic operations, the components are modified in tandem). I guess adding 'vectorcall' and 'const' are micro-optimisations, but I see it more as refactoring and good coding practice in the case of 'const', while 'vectorcall' is more about knowing what kind of data you're dealing with. Gareth aka. Kit ___ fpc-devel maillist - fpc-devel@lists.freepascal.org https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
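Concretely, the 'const' half of the change amounts to something like this (a sketch with uComplex-style declarations; the real unit declares many more routines and operators):

```pascal
program ConstParamDemo;

type
  complex = record
    re, im : real;
  end;

{ Without 'const': on targets that pass records by hidden reference,
  the prologue may still make a defensive local copy of z. }
function cmod_copy(z : complex) : real;
begin
  cmod_copy := Sqrt(z.re * z.re + z.im * z.im);
end;

{ With 'const': the compiler is free to pass a reference and skip the
  copy; Pascal call sites are written exactly the same way. }
function cmod_const(const z : complex) : real;
begin
  cmod_const := Sqrt(z.re * z.re + z.im * z.im);
end;

var
  z : complex;
begin
  z.re := 3.0;
  z.im := 4.0;
  WriteLn(cmod_copy(z) : 0 : 1);  { 5.0 }
  WriteLn(cmod_const(z) : 0 : 1); { 5.0 }
end.
```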
Re: [fpc-devel] Question on updating FPC packages
Ideally, you should specify 'vectorcall' either when interfacing with third-party libraries, when the code can be vectorised by the compiler, or when doing it yourself in assembly language. For example, if I wanted to write the cmod function in x86_64 assembler (Intel notation):

function cmod(z: Complex): Double; vectorcall; assembler; nostackframe;
asm
  MULPD  XMM0, XMM0
  HADDPD XMM0, XMM0
  SQRTSD XMM0, XMM0
end;

Without vectorcall (or an unaligned type), where each field would be in a separate register, the code would instead be:

function cmod(z: Complex): Double; assembler; nostackframe;
asm
  MULSD  XMM0, XMM0
  MULSD  XMM1, XMM1
  ADDSD  XMM0, XMM1
  SQRTSD XMM0, XMM0
end;

Admittedly the advantages are more obvious when using arrays of Singles. I guess a good example would be a 4-component dot product (I know there's a dot product instruction in SSE4, but I'm ignoring it for now):

type
  TVector4 = record
    x, y, z, w: Single;
  end align 16; { hey, I can dream! }

function DotProduct(V: TVector4): Single; vectorcall; assembler; nostackframe;
asm
  MULPS  XMM0, XMM0
  HADDPS XMM0, XMM0
  HADDPS XMM0, XMM0
  { Only the first component of XMM0 is considered for the result }
end;

And without vectorcall (or an unaligned type):

function DotProduct(V: TVector4): Single; assembler; nostackframe;
asm
  MULSS XMM0, XMM0
  MULSS XMM1, XMM1
  MULSS XMM2, XMM2
  MULSS XMM3, XMM3
  ADDSS XMM0, XMM1
  ADDSS XMM0, XMM2
  ADDSS XMM0, XMM3
end;

It's hard to say which function is more efficient here due to the latency of HADDPS and the multiple logic ports available (usually you can do at least two independent vector multiplications simultaneously), but the overhead of moving each field to a separate register will definitely add up. At the very least though, for the first dot product example, if the compiler was able to produce such assembler from Pascal source, it would be much more efficient to inline because it only uses a single register throughout.
I'm not sure how the compiler would know to inline a function when it's reached the assembler stage though, even if the registers are still virtual. To get back to the subject at hand... the advantages of vectorcall. Microsoft Visual C++ does have a compiler option where it automatically sets the calling convention to "vectorcall" rather than the default Microsoft calling convention (which is based off "fastcall"), since in most cases with integers, pointers and individual floating-point parameters, vectorcall doesn't behave any differently. FPC would only be able to take full advantage of vectorcall and aligned types under Linux if the compiler was made better with vectorising instructions. As a side-note, I would like to propose adding the "fastcall" calling convention for i386-win32 and x86_64-win64 (and maybe other i386 and x86_64 platforms). Under Win32, fastcall uses ECX and EDX for its first two parameters and EAX for the result (it's a worse form of Pascal's default 'register' convention, but this was designed in the days when C++ functions pushed all their parameters to the stack), while under Win64 it would be equivalent to 'ms_abi_default' and force the default Microsoft calling convention regardless of whether there was a setting to default to vectorcall (I consider the default calling convention to be based off fastcall because it uses RCX and RDX for its first two parameters, then adds R8 and R9 for the next two, and the XMM registers for floating-point arguments). More than anything it would just help to interface with third-party libraries again. Gareth aka. Kit On 27/10/2019 08:02, Florian Klämpfl wrote: Am 27.10.19 um 07:32 schrieb J. Gareth Moreton: I guess you're right. It just seems weird because the System V ABI was designed from the start to use the MM registers fully, so long as the data is aligned. In effect, it had vectorcall wrapped into its design from the start. 
Granted, vectorcall has some advantages and can deal with relatively complex aggregates that the System V ABI cannot handle (for example, a record type that contains a normal vector and information relating to bump mapping). I just hoped that making updates to uComplex, while ensuring existing Pascal code still compiles, would help take advantage of modern ABI designs. Is there currently any example which shows that vectorcall has any advantage with FPC? Else I would propose first to make FPC able to take advantage of it and then talk about if we really add vectorcall. Currently I fear, FPC gets only into trouble when using vectorcall as it tries first to push everything into one xmm register and then splits this again in the callee. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] Question on updating FPC packages
On Sun, 27 Oct 2019, Florian Klämpfl wrote: Am 27.10.19 um 10:27 schrieb Michael Van Canneyt: If you genuinely believe that micro-optimization changes can make a difference: Submit patches. As said: I am against applying them. Why? They clutter code and after all, they make assumptions about the current target which not might be always valid. And time testing them is much better spent in improving the compiler and then all code benefits. Another point: for example explicit inline increases normally code size (not always but often), so it is against the use of -Os. Applying inline manually on umpteen subroutines makes no sense. Better improve auto inlining. I am aware of your point of view, and I agree. Because, as I wrote: As a rule, the programmer should not have to care about such things. The compiler must handle that. It knows better (well, it should :)). Best of all would IMHO be to abolish or even totally ignore 'inline'. It is a hint, after all. The compiler is not forced to inline, even when the modifier is there. I was more referring to the use of correct types, use const when possible etc. Change classes to advanced records where appropriate, that kind of thing. Michael. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] Question on updating FPC packages
Am 27.10.19 um 10:27 schrieb Michael Van Canneyt: If you genuinely believe that micro-optimization changes can make a difference: Submit patches. As said: I am against applying them. Why? They clutter code and after all, they make assumptions about the current target which not might be always valid. And time testing them is much better spent in improving the compiler and then all code benefits. Another point: for example explicit inline increases normally code size (not always but often), so it is against the use of -Os. Applying inline manually on umpteen subroutines makes no sense. Better improve auto inlining. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] Question on updating FPC packages
On Sat, 26 Oct 2019, Ben Grasset wrote: On Sat, Oct 26, 2019 at 1:31 PM Florian Klämpfl wrote: This is imo a waste of time and clutters only code. It is much more beneficial to improve the compiler to avoid a copying of the variable if it can prove that it is not needed (or to improve auto inlining.) While I absolutely agree that it would be nice if FPC auto-inlined *by default*, as most compilers do (*without* the {$AUTOINLINE} optimization directive that essentially nobody knows exists and thus never uses anyways), FPC doesn't do so currently, and as far as I can tell probably won't in the foreseeable future. Clairvoyance is a rare gift. At risk of sounding overly abrasive or rude, there is *enormous* amounts of code in both the RTL and packages that is almost unusably slow due to what seems like a general lack of *any kind* of concern for performance. Absolutely. Personally, I don't have any concern for performance in this sense. Almost zero. I invariably favour code simplicity over performance, for the sake of maintenance. For good reason: for the kind of code which I create daily, the kind of micro-optimizations that you seem to refer to are utterly insignificant, and I expect the compiler to handle them. If it currently does not, then I think the compiler, rather than the code, must be improved. Code should not entirely disregard optimization, but then it should be on a higher level: don't use bubble sort when you can use a better sort. No amount of micro-optimization will make bubble sort outperform quicksort. Saying that the code is 'almost unusably slow' is the kind of statement that does not help. I use the code almost daily in production, no complaints about performance, so clearly it is usable. Instead, demonstrate your claim with facts, for example by creating a patch that demonstrably increases performance.
Far too much of it is just un-inlined heap allocation on top of un-inlined heap allocation on top of un-inlined heap-allocation on top of for-loop that uses "Integer" when it should really use "SizeInt" on top of utter avoidance of pointer arithmetic even though it's always faster on top of methods that have no reason to be marked "virtual" but are anyways on top of blah blah blah... I'm sure you get the point. These are the kind of micro-optimizations that are irrelevant for me. About virtual: in general, don't condemn the use of virtual unless you know why it was put there. Extensibility & compatibility with Delphi are 2 important reasons. SizeInt vs. Integer. 2 points: 1. A programmer should not have to care. The programmer must care about 'what does the logic require', not 'what does the CPU require'. It's the job of the compiler to make sure it creates the most suitable code for a given type. 2. The current amount of integer types is a historical mess. Many/most of these types did not exist when the RTL code was written. So if today, with the whole zoo of integers we have (it's like elementary particle physics quadrupled), there is still a lot of code that uses suboptimal integer types: it is only to be expected. I certainly don't go over the codebase whenever a new integer type is invented. Can this be improved? Certainly. Do I want to do this? No, I think it is more important for me to add new functionality. And of course I haven't even mentioned the fact that in reality, *anywhere* that an advanced record (or even object) can be used instead of a class, it should be, because it means you're avoiding an unnecessary allocation, but good luck convincing anyone who matters of that! Several points here. Most of the code was written before advanced records existed. There is backwards and/or Delphi compatibility to be considered. Advanced records also have a disadvantage: copying them is expensive.
So when advocating this change: make sure a record is not being passed around and/or copied a lot. That said, I haven't seen a single proposal where you personally would change a class to an advanced record. But maybe I missed such cases? I'm sure you get my point. I think I do. I don't necessarily agree with all of what you say. If you genuinely believe that micro-optimization changes can make a difference: Submit patches. When focused and well explained, I doubt they will be refused. When such patches appear for code that I wrote/maintain, I almost invariably apply them. For most, I didn't even require explicit proof that they improve speed. It's not because I don't care about optimization that I deny someone else the right to care and to submit patches. Michael.___ fpc-devel maillist - fpc-devel@lists.freepascal.org https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
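The Integer-versus-SizeInt point above can be made concrete with a small sketch (FillZero is a hypothetical helper, not RTL code; SizeInt is FPC's pointer-sized signed integer):

```pascal
program SizeIntDemo;

{$mode objfpc}
{$pointermath on}

procedure FillZero(P: PByte; Len: SizeInt);
var
  I: SizeInt; { pointer width on every target; a 32-bit Integer here would
                cap Len and can cost sign-extensions on 64-bit CPUs }
begin
  for I := 0 to Len - 1 do
    P[I] := 0;
end;

var
  Buf: array[0..7] of Byte;
  I: Integer;
begin
  for I := Low(Buf) to High(Buf) do
    Buf[I] := $FF;
  FillZero(@Buf[0], Length(Buf));
  WriteLn(Buf[0], ' ', Buf[7]); { 0 0 }
end.
```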
Re: [fpc-devel] Question on updating FPC packages
Am 27.10.19 um 01:07 schrieb Ben Grasset: FPC doesn't do so currently, and as far as I can tell probably won't in the foreseeable future. Yes, people write only lengthy mails on fpc-devel instead of writing code. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] Question on updating FPC packages
Am 27.10.19 um 07:32 schrieb J. Gareth Moreton: I guess you're right. It just seems weird because the System V ABI was designed from the start to use the MM registers fully, so long as the data is aligned. In effect, it had vectorcall wrapped into its design from the start. Granted, vectorcall has some advantages and can deal with relatively complex aggregates that the System V ABI cannot handle (for example, a record type that contains a normal vector and information relating to bump mapping). I just hoped that making updates to uComplex, while ensuring existing Pascal code still compiles, would help take advantage of modern ABI designs. Is there currently any example which shows that vectorcall has any advantage with FPC? Else I would propose first to make FPC able to take advantage of it and then talk about if we really add vectorcall. Currently I fear, FPC gets only into trouble when using vectorcall as it tries first to push everything into one xmm register and then splits this again in the callee. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] Question on updating FPC packages
I guess you're right. It just seems weird because the System V ABI was designed from the start to use the MM registers fully, so long as the data is aligned. In effect, it had vectorcall wrapped into its design from the start. Granted, vectorcall has some advantages and can deal with relatively complex aggregates that the System V ABI cannot handle (for example, a record type that contains a normal vector and information relating to bump mapping). I just hoped that making updates to uComplex, while ensuring existing Pascal code still compiles, would help take advantage of modern ABI designs. Gareth aka. Kit On 27/10/2019 01:12, Sven Barth via fpc-devel wrote: I don't think the compiler can be made smart and safe enough to auto-align something like the complex type to take full advantage of the System V ABI, and vectorcall is not the default Win64 calling convention (and the default convention is a little badly-designed if I'm allowed to say, since it doesn't vectorise anything at all). It's not badly designed, it's a child of its time. Back when Win64 was conceived it wasn't expected that the use of SSE would become as widespread as it is now. And one doesn't simply change a platform ABI on a whim. That's why Microsoft introduced vectorcall after all... Regards, Sven ___ fpc-devel maillist - fpc-devel@lists.freepascal.org https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] Question on updating FPC packages
> > I don't think the compiler can be made smart and safe enough to > auto-align something like the complex type to take full advantage of the > System V ABI, and vectorcall is not the default Win64 calling convention > (and the default convention is a little badly-designed if I'm allowed to > say, since it doesn't vectorise anything at all). > It's not badly designed, it's a child of its time. Back when Win64 was conceived it wasn't expected that the use of SSE would become as widespread as it is now. And one doesn't simply change a platform ABI on a whim. That's why Microsoft introduced vectorcall after all... Regards, Sven > ___ fpc-devel maillist - fpc-devel@lists.freepascal.org https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] Question on updating FPC packages
On Sat, Oct 26, 2019 at 1:31 PM Florian Klämpfl wrote: > This is imo a waste of time and clutters only code. It is much more > beneficial to > improve the compiler to avoid a copying of the variable if it can prove > that it is not needed (or to improve auto inlining.) > While I absolutely agree that it would be nice if FPC auto-inlined *by default*, as most compilers do (*without* the {$AUTOINLINE} optimization directive that essentially nobody knows exists and thus never uses anyways), FPC doesn't do so currently, and as far as I can tell probably won't in the foreseeable future. At risk of sounding overly abrasive or rude, there is *enormous* amounts of code in both the RTL and packages that is almost unusably slow due to what seems like a general lack of *any kind* of concern for performance. Far too much of it is just un-inlined heap allocation on top of un-inlined heap allocation on top of un-inlined heap-allocation on top of for-loop that uses "Integer" when it should really use "SizeInt" on top of utter avoidance of pointer arithmetic even though it's always faster on top of methods that have no reason to be marked "virtual" but are anyways on top of blah blah blah... I'm sure you get the point. And of course I haven't even mentioned the fact that in reality, *anywhere* that an advanced record (or even object) can be used instead of a class, it should be, because it means you're avoiding an unnecessary allocation, but good luck convincing anyone who matters of that! I'm sure you get my point. And no, I'm not advocating for "micro-optimization", or as I constantly hear "stuff that doesn't matter except in contrived benchmarks", I'm advocating for the bare minimum standards that average people would and do expect from the "standard" library and packages of a modern programming language. 
People are of course free to pretend like it doesn't matter that *each and every* use of the "inline" modifier in the Classes unit is hidden behind a "CLASSESINLINE" define never set to true in any makefile (which does indeed mean that absolutely nothing in Classes is inlined, under any circumstances, ever!), but I at the same time am free to realize that incurring the cost of *two* function calls for every single indexed access to a TFPList, instead of zero via inlining, is utterly insane, and to modify my local makefiles to define CLASSESINLINE. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
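To illustrate the TFPList cost being described: the default Items property is backed by TFPList.Get in the Classes unit, so each indexed read is a range-checked method call rather than a direct array read unless Classes was built with CLASSESINLINE. A minimal sketch:

```pascal
program FPListDemo;

{$mode objfpc}

uses
  Classes;

var
  L: TFPList;
begin
  L := TFPList.Create;
  try
    L.Add(Pointer(PtrUInt(42)));
    { L[0] expands to L.Items[0], i.e. a call to TFPList.Get rather
      than a direct read of the underlying pointer array. }
    WriteLn(PtrUInt(L[0])); { 42 }
  finally
    L.Free;
  end;
end.
```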
Re: [fpc-devel] Question on updating FPC packages
With my experiments on i386 and x86_64 (without the alignment changes), the complex record is always passed by reference, but without const, the function prologue then makes a copy of it on the function's local stack, which is then referenced in the rest of the function. Whether const is present or not, the same reference is passed into the function unmodified (the compiled assembly language is no different). I think in a way, Florian and I have slightly different views. I don't trust the compiler to make the most optimal code (i.e. a lazy compiler... I didn't want to say that Florian's compiler was inefficient until he himself said it!), so I try to give it hints where I can, and inserting "const" modifiers seems harmless enough since this has been a documented Pascal feature for decades, and most of the functions don't modify the parameter, so adding "const" just enforces it on the compiler's side. Granted, I do seek to make improvements to the compiler where possible, and it's something I enjoy doing. In the case of 'auto-const', I imagine it could be done at the node level, detecting that a parameter is only read from and never written to, but there may still be traps where you modify it without meaning to, causing inefficiencies. Case in point, I had to make one small change to the "cth" function because it reused the parameter as a temporary variable.
Originally, it was this:

function cth (z : complex) : complex;
  { hyperbolic complex tangent }
  { th(x) = sinh(x) / cosh(x) }
  { cosh(x) > 1 qq x }
var
  temp : complex;
begin
  temp := cch(z);
  z := csh(z);
  cth := z / temp;
end;

I changed it to the following because specifying "const" caused a compiler error:

function cth (const z : complex) : complex;
  { hyperbolic complex tangent }
  { th(x) = sinh(x) / cosh(x) }
  { cosh(x) > 1 qq x }
var
  temp, hsinz : complex;
begin
  temp := cch(z);
  hsinz := csh(z);
  cth := hsinz / temp;
end;

I'm assuming there's a good reason as to why it can't simply be written as "cth := csh(z) / cch(z);" (and it looks easier to auto-inline), although currently that reason eludes me. I don't think the compiler can be made smart and safe enough to auto-align something like the complex type to take full advantage of the System V ABI, and vectorcall is not the default Win64 calling convention (and the default convention is a little badly-designed if I'm allowed to say, since it doesn't vectorise anything at all). Plus other platforms may have more restrictive memory availability, and coarse alignment is not desired since it causes wastage. Granted, when it comes to increased maintainability, the little tricks required to align the complex type while keeping the same field names are very tricky to understand and get correct (hence my suggestion of a distinct "align ##" modifier at the end of the type declaration, but that's another story). I think the question of whether a micro-optimisation increases maintainability is fairly subjective and can only be determined on a case-by-case basis. In my mind, if someone has done the optimisation and the code is still relatively clean, then it's okay to merge so long as everyone accepts it and it's fully tested. Gareth aka. Kit On 26/10/2019 18:02, Sven Barth via fpc-devel wrote: Am 26.10.2019 um 18:51 schrieb J.
Gareth Moreton: The "const" suggestion was made by a third party, and while I went out of my way to ensure the functions aren't changed in Pascal code, Florian pointed out that it could break existing assembler code. Maybe I'm being a bit stubborn or unreasonable, I'm not sure, but in my eyes, using assembly language to directly call the uComplex functions and operators seems rather unrealistic. I figured if you're writing in assembly language, especially if you're using vector registers, you'd be using your own code to play around with complex numbers. Plus I figured that if you're developing on a non-x86_64 platform, the only thing that's different are the 'const' modifiers, which I don't think changes the way you actually call the function, regardless of platform. Am I right in this? It totally depends on how "const" is implemented for the particular target. On some there might not be any difference; on others there might be a similar difference as for x86, namely that something is passed as a reference instead of a copy. I guess a more fundamental question I should ask, and this might be terribly naïve of me, is this: when you call some function F(x: TType), is there a situation where calling F(const x: TType) produces different machine code or where a particular actual parameter becomes illegal? Note I'm talking about how you call the function, not how the function itself is compiled. Didn't you provide the example yourself with your changes to the uComplex unit? There are cases (especially with records) where "x" is passed as a copy on the stack and "const x" is passed as a reference.
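Incidentally, the single-expression cth wondered about above would read as follows, assuming uComplex's overloaded '/' operator for complex operands (which is what the three-statement body ultimately calls anyway):

```pascal
function cth (const z : complex) : complex;
  { hyperbolic complex tangent: th(x) = sinh(x) / cosh(x) }
begin
  cth := csh(z) / cch(z);
end;
```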
Re: [fpc-devel] Question on updating FPC packages
Am 26.10.19 um 18:51 schrieb J. Gareth Moreton:

The "const" suggestion was made by a third party, and while I went out of my way to ensure the functions aren't changed in Pascal code, Florian pointed out that it could break existing assembler code. Maybe I'm being a bit stubborn or unreasonable, I'm not sure, but in my eyes, using assembly language to directly call the uComplex functions and operators seems rather unrealistic. I figured if you're writing in assembly language, especially if you're using vector registers, you'd be using your own code to play around with complex numbers. Plus I figured that if you're developing on a non-x86_64 platform, the only thing that's different is the 'const' modifiers, which I don't think changes the way you actually call the function, regardless of platform. Am I right in this? The intention was to make the lightweight unit even more lightweight and optimal, without breaking backwards compatibility.

I do not like such (micro-)optimizations that work around a lazy compiler. I saw similar patches in Lazarus recently (adding inline). This is IMO a waste of time and only clutters the code. It is much more beneficial to improve the compiler so that it avoids copying the variable if it can prove the copy is not needed (or to improve auto-inlining where it does not work in certain cases). And in this case it would probably be possible to find out that a copy is not needed.

___ fpc-devel maillist - fpc-devel@lists.freepascal.org https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] Question on updating FPC packages
Am 26.10.2019 um 18:51 schrieb J. Gareth Moreton:

The "const" suggestion was made by a third party, and while I went out of my way to ensure the functions aren't changed in Pascal code, Florian pointed out that it could break existing assembler code. Maybe I'm being a bit stubborn or unreasonable, I'm not sure, but in my eyes, using assembly language to directly call the uComplex functions and operators seems rather unrealistic. I figured if you're writing in assembly language, especially if you're using vector registers, you'd be using your own code to play around with complex numbers. Plus I figured that if you're developing on a non-x86_64 platform, the only thing that's different is the 'const' modifiers, which I don't think changes the way you actually call the function, regardless of platform. Am I right in this?

It totally depends on how "const" is implemented for the particular target. On some there might not be any difference; on others there might be a similar difference as for x86, namely that something is passed as a reference instead of a copy.

I guess a more fundamental question I should ask, and this might be terribly naïve of me, is this: when you call some function F(x: TType), is there a situation where calling F(const x: TType) produces different machine code or where a particular actual parameter becomes illegal? Note I'm talking about how you call the function, not how the function itself is compiled.

Didn't you provide the example yourself with your changes to the uComplex unit? There are cases (especially with records) where "x" is passed as a copy on the stack and "const x" is passed as a reference.

Regards, Sven
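[Editor's note: Sven's distinction (a record passed as a copy on the stack versus "const" passed as a reference) can be made concrete with a small C analogue. This is only a sketch under the assumption that FPC's "const" on a record behaves roughly like C's pointer-to-const; the names `Complex`, `sees_callers_storage_byval` and `sees_callers_storage_byref` are hypothetical, not from uComplex.]

```c
#include <stdbool.h>

typedef struct { double re, im; } Complex;

/* By value: the callee receives a fresh copy (for larger records,
   typically materialised on the stack), so the parameter never
   occupies the caller's storage. */
static bool sees_callers_storage_byval(Complex c, const Complex *caller)
{
    return (const Complex *)&c == caller;   /* always false: c is a copy */
}

/* By reference (roughly what "const" on a record gives you in FPC):
   no copy is made and the callee sees the caller's object directly. */
static bool sees_callers_storage_byref(const Complex *c, const Complex *caller)
{
    return c == caller;   /* true when passed the same variable */
}
```

This is also exactly why call sites can differ: the by-value variant pushes the record's contents, while the by-reference variant passes only its address.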
[fpc-devel] Question on updating FPC packages
Hi everyone,

So recently I took it upon myself to make some minor changes to uComplex in order to produce more optimal code, especially on x86_64 platforms. The changes included adding "const" to most of the function and operator parameters, so the records are passed by reference and aren't needlessly copied in the prologues, and also aligning the complex type and utilising the vectorcall calling convention under Win64 (and the System V ABI on non-Windows platforms) so the compiler makes better use of the XMM registers (it can pass the entire complex type by value this way through a single register).

The "const" suggestion was made by a third party, and while I went out of my way to ensure the functions aren't changed in Pascal code, Florian pointed out that it could break existing assembler code. Maybe I'm being a bit stubborn or unreasonable, I'm not sure, but in my eyes, using assembly language to directly call the uComplex functions and operators seems rather unrealistic. I figured if you're writing in assembly language, especially if you're using vector registers, you'd be using your own code to play around with complex numbers. Plus I figured that if you're developing on a non-x86_64 platform, the only thing that's different is the 'const' modifiers, which I don't think changes the way you actually call the function, regardless of platform. Am I right in this?

The intention was to make the lightweight unit even more lightweight and optimal, without breaking backwards compatibility. Are there any known examples out there that could break or would otherwise need testing? I figured uComplex was a good place to start in optimising/refactoring some of the existing units, mainly because I'm a mathematician and hence know how complex numbers work, and the individual functions are simple enough that you can easily see how efficient they are in a disassembler (so they make a good test case for new compiler optimisations).
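[Editor's note: the alignment half of the change can be sketched in C. This is an illustrative analogue, not the Pascal patch itself: under the System V AMD64 ABI a 16-byte vector type such as __m128d travels in a single SSE register, whereas an ordinary pair of doubles has only natural 8-byte alignment, and aligning the record like __m128d is what permits single 128-bit loads and stores. The type names are hypothetical.]

```c
#include <stdalign.h>

/* Unaligned layout: natural alignment of 8, so the compiler cannot
   assume 128-bit-aligned access to the pair as a whole. */
typedef struct { double re, im; } complex_plain;

/* uComplex-style layout: same fields, but aligned like __m128d
   (16 bytes), enabling aligned 128-bit SSE moves and, with
   vectorcall on Win64, passing the whole value in one XMM register. */
typedef struct { alignas(16) double re; double im; } complex_aligned;
```

Note that both layouts keep the same field names and 16-byte size, which is what preserves source compatibility for Pascal callers.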
I guess a more fundamental question I should ask, and this might be terribly naïve of me, is this: when you call some function F(x: TType), is there a situation where declaring it as F(const x: TType) produces different machine code at the call site, or where a particular actual parameter becomes illegal? Note I'm talking about how you call the function, not how the function itself is compiled.

Gareth aka. Kit