Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!
In the meantime, if everything seems present and correct, https://bugs.freepascal.org/view.php?id=36202 contains the alignment and vectorcall modifiers for uComplex. It shouldn't affect anything outside of x86_64 but should still keep the unit very lightweight, which I believe was the original intent. Gareth aka. Kit ___ fpc-devel maillist - fpc-devel@lists.freepascal.org https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!
Hmmm, that is unfortunate if the horizontal operations are inefficient. I had a look at them at https://www.agner.org/optimize/instruction_tables.pdf - you are right that HADDPS has a surprisingly high latency (approximately how many cycles it takes to execute), although HADDPD isn't as bad, probably because it's only dealing with 2 Doubles instead of 4 Singles, and it seems mostly equivalent in speed to the multiplication instructions.

Using just SSE2:

    mulpd  %xmm0,%xmm0
    shufpd %xmm0,%xmm1,1
    addsd  %xmm1,%xmm0
    sqrtsd %xmm0,%xmm0

Ultimately it's not much better than what you have:

    shufpd %xmm0,%xmm1,1 { Only needed if both fields are in %xmm0 }
    mulsd  %xmm0,%xmm0
    mulsd  %xmm1,%xmm1
    addsd  %xmm1,%xmm0
    sqrtsd %xmm0,%xmm0

If you follow the dependencies between the instructions (shufpd and the first mulsd can run simultaneously, or equivalently, the two mulsd instructions), the critical path is still 4 instructions long, i.e. 4 cycles if each instruction takes an equal amount of time to execute (which they don't, but it's a reasonable approximation). The subroutines are also probably too small to get accurate timing metrics on them. It might be something to experiment on though - I would hope at the very least that the horizontal operations have improved in later years.

I know though that vectorising instructions is, by and large, a net gain. For example, let's go to a simpler example of adding two complex numbers together:

    operator + (z1, z2 : complex) z : complex; vectorcall;
    {$ifdef TEST_INLINE}
    inline;
    {$endif TEST_INLINE}
    { addition : z := z1 + z2 }
    begin
      z.re := z1.re + z2.re;
      z.im := z1.im + z2.im;
    end;

No horizontal adds here, just a simple packed addition and storing the result into %xmm0, as opposed to two scalar additions and then combining the result in whatever way is demanded (if aligned, it's all in %xmm0; if unaligned, I think %xmm0 and %xmm1 are supposed to be used). Mind you, in this case the function is inlined, so the parameter passing doesn't always apply.

Once again though, I was surprised at how inefficient HADDPS is once you pointed it out. The double-precision versions aren't nearly as bad though, so maybe they can still be used.

Gareth aka. Kit

P.S. As far as 128-bit aligned vector types are concerned, vectorcall and the System V ABI can be considered equivalent. Vectorcall can use more MM registers for return values and more complex aggregates as parameters, but in our examples, we don't have to worry about that yet.

On 23/10/2019 21:20, Florian Klämpfl wrote:
> On 22.10.19 at 05:01, J. Gareth Moreton wrote:
> > mulpd %xmm0, %xmm0 { Calculates "re * re" and "im * im" simultaneously }
> > haddpd %xmm0, %xmm0 { Adds the above multiplications together (horizontal add) }
>
> Unfortunately, those horizontal operations are normally not very efficient IIRC.
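For readers who want to experiment with the two ideas in this message, here is a minimal sketch in C with SSE2 intrinsics rather than Pascal. It shows the packed addition and the mulpd/shufpd/addsd sum of squares. The names complex_d, cadd and norm_sq are hypothetical and not part of uComplex, the shuffle uses the same register for both operands (an assumption of this sketch), and the final square root step is left out. A plain scalar fallback is used where SSE2 is not available.

```c
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical stand-in for a complex type made of two Doubles. */
typedef struct { double re, im; } complex_d;

/* z := z1 + z2 using one packed add (addpd) where SSE2 is available. */
static complex_d cadd(complex_d a, complex_d b) {
    complex_d r;
#if defined(__SSE2__)
    __m128d va = _mm_set_pd(a.im, a.re);   /* low lane = re, high lane = im */
    __m128d vb = _mm_set_pd(b.im, b.re);
    double out[2];
    _mm_storeu_pd(out, _mm_add_pd(va, vb));
    r.re = out[0];
    r.im = out[1];
#else
    r.re = a.re + b.re;                    /* scalar fallback */
    r.im = a.im + b.im;
#endif
    return r;
}

/* re*re + im*im without a horizontal add: multiply, swap lanes, scalar add. */
static double norm_sq(complex_d z) {
#if defined(__SSE2__)
    __m128d v  = _mm_set_pd(z.im, z.re);
    __m128d sq = _mm_mul_pd(v, v);              /* mulpd:  [re^2, im^2] */
    __m128d sw = _mm_shuffle_pd(sq, sq, 1);     /* shufpd: [im^2, re^2] */
    return _mm_cvtsd_f64(_mm_add_sd(sq, sw));   /* addsd:  low lane = re^2 + im^2 */
#else
    return z.re * z.re + z.im * z.im;           /* scalar fallback */
#endif
}
```

Calling norm_sq on the value (3, 4) gives 25. Whether the packed form is actually faster than the scalar one is exactly the open question in this thread, so it would need measuring.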
Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!
So I did a bit of reading after finding the "mpx-linux64-abi.pdf" document. As I suspected, the System V ABI is like vectorcall when it comes to using the XMM registers... only the types __m128, __float128 and __Decimal128 use the "SSEUP" class and hence use the entire register. The types are opaque, but both their size and alignment are 16 bytes, so I think anything that abides by those rules can be considered equivalent.

If the complex type is unaligned, the two fields get their own XMM register. If aligned, they both go into %xmm0. At least that is what I gathered from reading the document - it's a little unclear sometimes.

Gareth aka. Kit

On 23/10/2019 06:59, Florian Klämpfl wrote:
> On 23 October 2019 01:14:03, "J. Gareth Moreton" wrote:
> > That's definitely a marked improvement. Under the System V ABI and
> > vectorcall, both fields of a complex type would be passed through xmm0.
> > Splitting it up into two separate registers would require something like:
> >
> > shufpd %xmm0,%xmm1,3 { Copy the high-order Double into the low-order
> > position - an immediate operand of "1" will also work, since we're not
> > concerned with the upper 64 bits of %xmm1 }
> >
> > After which your compiled code will work correctly (since it looks like
> > %xmm1 was undefined before):
>
> The code is correct, on x86_64-linux vectorcall is ignored. Supporting vectorcall with my approach would be more difficult.
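The "size and alignment are both 16 bytes" rule of thumb from this message can be checked for a given type with a few lines of C. This is only a sketch of that rule as stated above, not the ABI's classification algorithm, and it says nothing about which registers a particular compiler really uses. The type names and the helper size_and_align_are_16 are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Two Doubles with natural alignment (stand-in for an unaligned complex). */
typedef struct { double re, im; } complex_plain;

/* Same fields, with 16-byte alignment requested on the first member. */
typedef struct { _Alignas(16) double re; double im; } complex_aligned16;

/* Rule of thumb from the message: 16 bytes in size and 16-byte aligned. */
static bool size_and_align_are_16(size_t size, size_t align) {
    return size == 16 && align == 16;
}
```

With these definitions, size_and_align_are_16(sizeof(complex_aligned16), _Alignof(complex_aligned16)) holds, while the plain record has the same size but a smaller alignment and so fails the check.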
Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!
On 22.10.19 at 05:01, J. Gareth Moreton wrote:
> mulpd %xmm0, %xmm0 { Calculates "re * re" and "im * im" simultaneously }
> haddpd %xmm0, %xmm0 { Adds the above multiplications together (horizontal add) }

Unfortunately, those horizontal operations are normally not very efficient IIRC.
Re: [fpc-devel] 0036144: Wrong Dwarf2/3/4 info for array (all array, but affects bitpacked) / incorrect use of DW_AT_bit_stride
On 23/10/19 11:34, Joost van der Sluis wrote:
> On 13-10-2019 at 00:22, Martin wrote:
> > - ShortString are encoded exactly as
> >     record
> >       len: int;
> >       st: array of char;
> >     end;
> > - And in dwarf 3, ansistring are encoded as array.
>
> Well. If someone creates a record called 'ShortString', (s)he should not be surprised that the debugger thinks that it is actually a shortstring? I do not see the issue here. The compiler generates debug-information that makes it possible for any debugger to show the data correctly. For shortstrings it reports a structure with a length and the actual characters. This is what a shortstring is.

A user may copy a watch expression from the source (when using the mouse hint, but also by copying it to the watch window instead of typing it). That may be
    FooString[5]
If shortstring is a record (gdb without help from the IDE) then the watch fails, because it must be
    FooString.st[5]

Of course this is not a problem in the IDE, since the IDE can change it, and may even do so for a user-defined record. Though FooString[0] should only work for shortstrings (to get the len).

The problem is more severe in cases (dwarf2) where ansistring and pchar are indistinguishable, because s[1] can be the 1st or the 2nd char. And that means WRONG data can be displayed.

In dwarf3 it is array of char vs ansistring. And while ansistring should usually display utf8 as text, array of char is one "8-bit" char at a time. (IMHO)
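The "s[1] can be the 1st or 2nd char" problem can be made concrete with a small C sketch. It assumes that the debugger sees the same character data either way and only the indexing base differs: 1-based for an ansistring, 0-based for a pchar. The enum and the function watch_index are hypothetical and do not come from gdb or FpDebug.

```c
#include <stddef.h>

/* Hypothetical tag for what the debug info claims the variable is. */
typedef enum { KIND_PCHAR, KIND_ANSISTRING } str_kind;

/* Evaluate the watch expression s[i] over the same character data.
   The ansistring case assumes i >= 1. */
static char watch_index(const char *data, str_kind kind, size_t i) {
    if (kind == KIND_ANSISTRING) {
        return data[i - 1];   /* Pascal strings are indexed from 1 */
    }
    return data[i];           /* pchar is indexed from 0 */
}
```

For the data "hello", the watch s[1] yields 'h' when the variable is treated as an ansistring and 'e' when it is treated as a pchar, so a debugger that cannot tell the two apart has to guess.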
Re: [fpc-devel] 0036144: Wrong Dwarf2/3/4 info for array (all array, but affects bitpacked) / incorrect use of DW_AT_bit_stride
On 23/10/19 11:34, Joost van der Sluis wrote:
> On 13-10-2019 at 00:22, Martin wrote:
> > I have a few wishes with regards to: https://bugs.freepascal.org/view.php?id=36144
> >
> > 1) FpDebug detects fpc as dwarf provider, and checks the fpc version. Based on this it can interpret the misplaced tags, and work around the issue. I have now configured 3.3.0 as the cut-off for the workaround (since fpc now puts the tags in the correct location). Should this patch get merged to 3.2 then please alert me of this. So I can adjust the check in FpDebug.
>
> About the patch: I made the original change within the period that I tried to get multiple-dimensional arrays to be displayed correctly in gdb. Especially arrays of ansistrings. In the end this worked, does this still work? We need some kind of debug-tests. I know you have some, and that there are huge differences between gdb versions, but still...

I am away this week, I will check next week.

> > 2) "I shot myself in the foot"
> > Having reported this issue, and it now being fixed, I realized that I (ab)used the presence of this issue.
> > - ShortString are encoded exactly as
> >     record
> >       len: int;
> >       st: array of char;
> >     end;
> > - And in dwarf 3, ansistring are encoded as array.
> > With the only difference that they always had the stride in the array, and not in the range. FpDebug used the knowledge of this implementation detail (in the hope that it would not change) to detect the difference between a user-defined record (with the exact same field names), and an actual shortstring. That no longer works.
>
> Well. If someone creates a record called 'ShortString', (s)he should not be surprised that the debugger thinks that it is actually a shortstring? I do not see the issue here. The compiler generates debug-information that makes it possible for any debugger to show the data correctly. For shortstrings it reports a structure with a length and the actual characters. This is what a shortstring is.

Yes, it is correct right now. But it might be possible to improve, since dwarf has a string type, and all we (or I) need to test is whether gdb (nowadays) can display it. shortstring might still be a record, though...

Btw shortstring has 2 lengths...
    type s = string[20];
can hold 20 chars (important if the debugger wants to change the value), but
    s:='abc'
sets the length (s[0]) to 3.

> > So I need a new difference, please.
>
> Adding an artificial difference (some sort of implementation-detail) on which some debuggers depend, does not seem to be a good idea, imho.

True, I have one for now...

> > Ideally using DW_TAG_string_type (available since dwarf 2). I have currently no idea what gdb will do with that. For FpDebug I will have to implement it, but that is no problem.
>
> DW_TAG_string_type is deliberately not used, as it describes a string-type that fpc does not use.

Could you explain?

> If the stride is optional, the compiler should always omit it when not necessary, to decrease the executable (debuginfo) size. What you ask is to add a bug deliberately, which you can use to detect whether something is a string or not.

True. I am not fond of implementation detail stuff. But I would very much like a definite specification.

fpc has -godwarfcpp; it could have godwarffpd and then use vendor tags to describe any pascal type that has no exact dwarf spec.

fpdebug knows the fpc version that wrote the dwarf, so for older fpc it can use the implementation details (they do not change). And if we can start and work out proper details for the future then that will solve it.
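The two lengths of a shortstring mentioned above (declared capacity versus current length) can be modelled in a few lines of C. This is a sketch under the assumption that string[20] is laid out as one length byte followed by 20 character bytes. The field names len and st are taken from the record quoted in this thread, while short_string20 and the helper functions are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define S_CAPACITY 20   /* declared maximum, as in "string[20]" */

/* Hypothetical C model of "type s = string[20]". */
typedef struct {
    unsigned char len;        /* current length, s[0] in Pascal */
    char st[S_CAPACITY];      /* character storage */
} short_string20;

/* Model of s := src, cut off at the declared capacity. */
static void ss_assign(short_string20 *s, const char *src) {
    size_t n = strlen(src);
    if (n > S_CAPACITY) {
        n = S_CAPACITY;
    }
    memcpy(s->st, src, n);
    s->len = (unsigned char)n;
}

/* 1-based access as in s[i]; returns 0 outside the current length. */
static char ss_char_at(const short_string20 *s, unsigned i) {
    return (i >= 1 && i <= s->len) ? s->st[i - 1] : '\0';
}
```

After ss_assign(&s, "abc") the current length is 3 while the capacity stays 20. A debugger that wants to write a new value needs the capacity, and one that only displays the value needs the current length.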
Re: [fpc-devel] 0036144: Wrong Dwarf2/3/4 info for array (all array, but affects bitpacked) / incorrect use of DW_AT_bit_stride
On 13-10-2019 at 00:22, Martin wrote:
> I have a few wishes with regards to: https://bugs.freepascal.org/view.php?id=36144
>
> 1) FpDebug detects fpc as dwarf provider, and checks the fpc version. Based on this it can interpret the misplaced tags, and work around the issue. I have now configured 3.3.0 as the cut-off for the workaround (since fpc now puts the tags in the correct location). Should this patch get merged to 3.2 then please alert me of this. So I can adjust the check in FpDebug.

About the patch: I made the original change within the period that I tried to get multiple-dimensional arrays to be displayed correctly in gdb. Especially arrays of ansistrings. In the end this worked, does this still work? We need some kind of debug-tests. I know you have some, and that there are huge differences between gdb versions, but still...

> 2) "I shot myself in the foot"
> Having reported this issue, and it now being fixed, I realized that I (ab)used the presence of this issue.
> - ShortString are encoded exactly as
>     record
>       len: int;
>       st: array of char;
>     end;
> - And in dwarf 3, ansistring are encoded as array.
> With the only difference that they always had the stride in the array, and not in the range. FpDebug used the knowledge of this implementation detail (in the hope that it would not change) to detect the difference between a user-defined record (with the exact same field names), and an actual shortstring. That no longer works.

Well. If someone creates a record called 'ShortString', (s)he should not be surprised that the debugger thinks that it is actually a shortstring? I do not see the issue here. The compiler generates debug-information that makes it possible for any debugger to show the data correctly. For shortstrings it reports a structure with a length and the actual characters. This is what a shortstring is.

That some debuggers, specially made for fpc (like fpdebug, but in some regard this holds for gdb too), show a more convenient format is nice. But I think this is not relevant for the compiler.

> So I need a new difference, please.

Adding an artificial difference (some sort of implementation-detail) on which some debuggers depend, does not seem to be a good idea, imho.

> Ideally using DW_TAG_string_type (available since dwarf 2). I have currently no idea what gdb will do with that. For FpDebug I will have to implement it, but that is no problem.

DW_TAG_string_type is deliberately not used, as it describes a string-type that fpc does not use.

> If that is not an option, can we go for a simpler (implementation detail (yes again)) workaround (that then goes into trunk, and/or if the original fix is merged, can be merged too):
> - The stride is optional. If absent it is equal to the element size (shortstring = char = byte)
> - Arrays always have a stride
> - Drop it from the strings array
> And I can then detect that.

If the stride is optional, the compiler should always omit it when not necessary, to decrease the executable (debuginfo) size. What you ask is to add a bug deliberately, which you can use to detect whether something is a string or not.

Regards,

Joost.
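To make the proposed workaround concrete (the reply above argues against adopting it), here is a small C sketch of how a debugger could apply the "stride is optional" rule and the detection built on it. The struct and both functions are hypothetical simplifications. They do not reflect FpDebug's code or any DWARF library API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of what a debugger has read for an array. */
typedef struct {
    bool   has_stride;     /* was a stride attribute present at all? */
    size_t stride_bytes;   /* only meaningful when has_stride is true */
    size_t elem_size;      /* size of the element type in bytes */
} array_info;

/* If the stride is absent, it is equal to the element size. */
static size_t effective_stride(const array_info *a) {
    return a->has_stride ? a->stride_bytes : a->elem_size;
}

/* Proposed detection: ordinary arrays always carry a stride, and the
   character array inside a string omits it (char = byte, so size 1). */
static bool looks_like_string_storage(const array_info *a) {
    return !a->has_stride && a->elem_size == 1;
}
```

Under this scheme the address calculation is the same either way, since the effective stride of a char array is 1 byte with or without the attribute. Only the presence of the attribute would carry the extra meaning, which is the point the objection in the reply turns on.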
Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!
On 23 October 2019 01:14:03, "J. Gareth Moreton" wrote:
> That's definitely a marked improvement. Under the System V ABI and
> vectorcall, both fields of a complex type would be passed through xmm0.
> Splitting it up into two separate registers would require something like:
>
> shufpd %xmm0,%xmm1,3 { Copy the high-order Double into the low-order
> position - an immediate operand of "1" will also work, since we're not
> concerned with the upper 64 bits of %xmm1 }
>
> After which your compiled code will work correctly (since it looks like
> %xmm1 was undefined before):

The code is correct, on x86_64-linux vectorcall is ignored. Supporting vectorcall with my approach would be more difficult.