Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

2019-10-27 Thread J. Gareth Moreton
The following passes everything through XMM0:
  #include #include
  double Mod(__m128d z) { return sqrt((z[0]*z[0])+(z[1]*z[1])); }
  int main() { __m128d z; z[0] = 0; z[1] = 1; double d = Mod(z); }
I will admit that it's very fiddly to get right.  All of my attempts to map an anonymous struct to

Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

2019-10-27 Thread Florian Klämpfl
On 23.10.19 at 22:36, J. Gareth Moreton wrote: So I did a bit of reading after finding the "mpx-linux64-abi.pdf" document.  As I suspected, the System V ABI is like vectorcall when it comes to using the XMM registers... only the types __m128, __float128 and _Decimal128 use the "SSEUP" class

Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

2019-10-23 Thread J. Gareth Moreton
In the meantime, if everything seems present and correct, https://bugs.freepascal.org/view.php?id=36202 contains the alignment and vectorcall modifiers for uComplex.  It shouldn't affect anything outside of x86_64 but should still keep the unit very lightweight, which I believe was the

Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

2019-10-23 Thread J. Gareth Moreton
Hmmm, that is unfortunate if the horizontal operations are inefficient.  I had a look at them at https://www.agner.org/optimize/instruction_tables.pdf - you are right in that HADDPS has a surprisingly high latency (approximately how many cycles it takes to execute), although HADDPD isn't as

Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

2019-10-23 Thread J. Gareth Moreton
So I did a bit of reading after finding the "mpx-linux64-abi.pdf" document.  As I suspected, the System V ABI is like vectorcall when it comes to using the XMM registers... only the types __m128, __float128 and _Decimal128 use the "SSEUP" class and hence use the entire register.  The types

Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

2019-10-23 Thread Florian Klämpfl
On 22.10.19 at 05:01, J. Gareth Moreton wrote:
  mulpd    %xmm0, %xmm0 { Calculates "re * re" and "im * im" simultaneously }
  haddpd   %xmm0, %xmm0 { Adds the above multiplications together (horizontal add) }
Unfortunately, those horizontal operations are normally not very efficient, IIRC.
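A minimal C sketch of one way to compute the modulus without the horizontal add, assuming an x86_64 target where SSE2 is available: after the packed multiply, the high lane is copied down with an unpack and added with a scalar add. The names complex_pack and complex_mod are hypothetical, and this is not the code Free Pascal generates; whether it actually beats haddpd depends on the microarchitecture.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; part of the x86_64 baseline */

/* Hypothetical helper: re goes in the low lane, im in the high lane. */
static __m128d complex_pack(double re, double im)
{
    return _mm_set_pd(im, re);  /* _mm_set_pd takes (high, low) */
}

/* Hypothetical modulus routine: |z| = sqrt(re*re + im*im). */
static double complex_mod(__m128d z)
{
    __m128d sq   = _mm_mul_pd(z, z);         /* { re*re, im*im }            */
    __m128d hi   = _mm_unpackhi_pd(sq, sq);  /* { im*im, im*im }            */
    __m128d sum  = _mm_add_sd(sq, hi);       /* low lane: re*re + im*im     */
    __m128d root = _mm_sqrt_sd(sum, sum);    /* low lane: sqrt of the sum   */
    return _mm_cvtsd_f64(root);              /* extract the low lane        */
}
```

For example, complex_mod(complex_pack(3.0, 4.0)) evaluates to 5.0, since both 9 + 16 and its square root are exactly representable.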

Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

2019-10-23 Thread Florian Klämpfl
On 23 October 2019 at 01:14:03, "J. Gareth Moreton" wrote:
> That's definitely a marked improvement. Under the System V ABI and
> vectorcall, both fields of a complex type would be passed through xmm0.
> Splitting it up into two separate registers would require something like:
>
>
> shufpd

Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

2019-10-22 Thread J. Gareth Moreton
That's definitely a marked improvement.  Under the System V ABI and vectorcall, both fields of a complex type would be passed through xmm0.  Splitting it up into two separate registers would require something like: shufpd    %xmm0,%xmm1,3 { Copy the high-order Double into the low-order

Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

2019-10-22 Thread Florian Klämpfl
On 22.10.19 at 05:01, J. Gareth Moreton wrote: Bigger challenges would be optimising the modulus of a complex number:
  function cmod (z : complex): real; vectorcall;
  { module : r = |z| }
  begin
    with z do
      cmod := sqrt((re * re) + (im * im));
  end;
A perfect

Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

2019-10-21 Thread J. Gareth Moreton
This is a long read, so strap in! Well, I finally got it to work - the required type definition was as follows:
{$push}
{$codealign RECORDMIN=16}
{$PACKRECORDS C}
  { This record forces "complex" to be aligned to a 16-byte boundary }
  type align_dummy = record
    filler: array[0..1] of real;