Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

2019-10-27 Thread J. Gareth Moreton

The following passes everything through XMM0: #include #include double Mod(__m128d z) { return sqrt((z[0]*z[0])+(z[1]*z[1])); } int main() { __m128d z; z[0] = 0; z[1] = 1; double d = Mod(z); } I will admit that it's very fiddly to get right. All of my attempts to map an anonymous struct to

Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

2019-10-27 Thread Florian Klämpfl

On 23.10.19 at 22:36, J. Gareth Moreton wrote: So I did a bit of reading after finding the "mpx-linux64-abi.pdf" document. As I suspected, the System V ABI is like vectorcall when it comes to using the XMM registers... only the types __m128, __float128 and _Decimal128 use the "SSEUP" class

Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

2019-10-23 Thread J. Gareth Moreton

In the meantime, if everything seems present and correct, https://bugs.freepascal.org/view.php?id=36202 contains the alignment and vectorcall modifiers for uComplex. It shouldn't affect anything outside of x86_64 but should still keep the unit very lightweight, which I believe was the

Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

2019-10-23 Thread J. Gareth Moreton

Hmmm, that is unfortunate if the horizontal operations are inefficient. I had a look at them at https://www.agner.org/optimize/instruction_tables.pdf - you are right in that HADDPS has a surprisingly high latency (approximately how many cycles it takes to execute), although HADDPD isn't as

Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

2019-10-23 Thread J. Gareth Moreton

So I did a bit of reading after finding the "mpx-linux64-abi.pdf" document. As I suspected, the System V ABI is like vectorcall when it comes to using the XMM registers... only the types __m128, __float128 and _Decimal128 use the "SSEUP" class and hence use the entire register. The types

Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

2019-10-23 Thread Florian Klämpfl

On 22.10.19 at 05:01, J. Gareth Moreton wrote: mulpd %xmm0, %xmm0 { Calculates "re * re" and "im * im" simultaneously } haddpd %xmm0, %xmm0 { Adds the above multiplications together (horizontal add) } Unfortunately, those horizontal operations are normally not very efficient IIRC.
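For illustration only, here is a minimal C sketch of the sum of squares computed without a horizontal add. The excerpts above do not spell this out: replacing haddpd with an unpack followed by a scalar add is an assumption about one possible alternative, and whether it is actually faster depends on the microarchitecture. The names complex_t and norm_sq are hypothetical, and the square root step is left out to keep the sketch short.

```c
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics; SSE2 is baseline on x86_64 */
#endif

/* Hypothetical stand-in for uComplex's "complex" record. */
typedef struct { double re, im; } complex_t;

/* Returns re*re + im*im without using a horizontal add. */
static double norm_sq(complex_t z)
{
#if defined(__SSE2__)
    __m128d v  = _mm_set_pd(z.im, z.re);      /* low lane = re, high lane = im */
    __m128d sq = _mm_mul_pd(v, v);            /* mulpd: [re*re, im*im] */
    __m128d hi = _mm_unpackhi_pd(sq, sq);     /* unpckhpd: im*im copied to the low lane */
    return _mm_cvtsd_f64(_mm_add_sd(sq, hi)); /* addsd: re*re + im*im in the low lane */
#else
    /* Scalar fallback for targets without SSE2. */
    return z.re * z.re + z.im * z.im;
#endif
}
```

Calling norm_sq on a value with re = 3 and im = 4 yields 25; taking the square root of that result gives the modulus discussed in the thread.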

Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

2019-10-23 Thread Florian Klämpfl

On 23 October 2019 at 01:14:03, "J. Gareth Moreton" wrote: > That's definitely a marked improvement. Under the System V ABI and > vectorcall, both fields of a complex type would be passed through xmm0. > Splitting it up into two separate registers would require something like: > > > shufpd

Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

2019-10-22 Thread J. Gareth Moreton

That's definitely a marked improvement. Under the System V ABI and vectorcall, both fields of a complex type would be passed through xmm0. Splitting it up into two separate registers would require something like: shufpd %xmm0,%xmm1,3 { Copy the high-order Double into the low-order

Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

2019-10-22 Thread Florian Klämpfl

On 22.10.19 at 05:01, J. Gareth Moreton wrote: Bigger challenges would be optimising the modulus of a complex number: function cmod (z : complex): real; vectorcall; { module : r = |z| } begin with z do cmod := sqrt((re * re) + (im * im)); end; A perfect

Re: [fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

2019-10-21 Thread J. Gareth Moreton

This is a long read, so strap in! Well, I finally got it to work - the required type definition was as follows: {$push} {$codealign RECORDMIN=16} {$PACKRECORDS C} { This record forces "complex" to be aligned to a 16-byte boundary } type align_dummy = record filler: array[0..1] of real;
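As a point of comparison only, the same 16-byte alignment goal can be sketched in C11. This is not the Free Pascal technique from the message above, and the type name complex16_t is hypothetical; it merely shows what a two-double record aligned to a 16-byte boundary looks like in a language with a direct alignment specifier.

```c
#include <stdalign.h>  /* alignas / alignof (C11) */

/* Hypothetical C counterpart of a 16-byte-aligned "complex" record. */
typedef struct {
    alignas(16) double re;  /* aligning the first member raises the alignment of the whole struct */
    double im;
} complex16_t;
```

With this declaration, both the size and the alignment of complex16_t are 16 bytes, so a value occupies exactly one naturally aligned 128-bit slot.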
