The following passes everything through XMM0:
#include <math.h>
#include <emmintrin.h>

double Mod(__m128d z)
{
  return sqrt((z[0] * z[0]) + (z[1] * z[1]));
}

int main()
{
  __m128d z;
  z[0] = 0; z[1] = 1;
  double d = Mod(z);
}
I will admit that it's very fiddly to get right. All of my attempts to
map an anonymous struct to
On 23.10.19 at 22:36, J. Gareth Moreton wrote:
So I did a bit of reading after finding the "mpx-linux64-abi.pdf"
document. As I suspected, the System V ABI is like vectorcall when it
comes to using the XMM registers... only the types __m128, __float128
and _Decimal128 use the "SSEUP" class
In the meantime, if everything seems present and correct,
https://bugs.freepascal.org/view.php?id=36202 contains the alignment and
vectorcall modifiers for uComplex. It shouldn't affect anything outside
of x86_64 but should still keep the unit very lightweight, which I
believe was the
Hmmm, that is unfortunate if the horizontal operations are inefficient.
I had a look at them at
https://www.agner.org/optimize/instruction_tables.pdf - you are right in
that HADDPS has a surprisingly high latency (approximately how many
cycles it takes to execute), although HADDPD isn't as
So I did a bit of reading after finding the "mpx-linux64-abi.pdf"
document. As I suspected, the System V ABI is like vectorcall when it
comes to using the XMM registers... only the types __m128, __float128
and _Decimal128 use the "SSEUP" class and hence use the entire
register. The types
On 22.10.19 at 05:01, J. Gareth Moreton wrote:
mulpd %xmm0, %xmm0 { Calculates "re * re" and "im * im" simultaneously }
haddpd %xmm0, %xmm0 { Adds the above multiplications together (horizontal add) }
Unfortunately, those horizontal operations are normally not very
efficient, IIRC.
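For illustration only, here is a rough sketch of one way to get the same sum of squares without HADDPD, using an unpack followed by a scalar add. It is not code from the thread: the name cmod_no_hadd is hypothetical, the function takes two plain doubles rather than a vectorcall complex record, and whether it actually beats HADDPD depends on the CPU, so check the instruction tables before relying on it. On targets without SSE2 it falls back to plain scalar code.

```c
#include <math.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d, _mm_mul_pd, ... */
#endif

/* Hypothetical helper (not from the thread): returns |z| for z = re + i*im. */
double cmod_no_hadd(double re, double im)
{
#if defined(__SSE2__)
    __m128d z  = _mm_set_pd(im, re);       /* low lane = re, high lane = im */
    __m128d sq = _mm_mul_pd(z, z);         /* { re*re, im*im }, like MULPD */
    __m128d hi = _mm_unpackhi_pd(sq, sq);  /* { im*im, im*im } */
    __m128d s  = _mm_add_sd(sq, hi);       /* low lane = re*re + im*im */
    return _mm_cvtsd_f64(_mm_sqrt_sd(s, s)); /* square root of the low lane */
#else
    /* Scalar fallback for targets without SSE2. */
    return sqrt(re * re + im * im);
#endif
}
```

Calling cmod_no_hadd(3.0, 4.0) should give 5.0. The unpack and scalar add here stand in for the single HADDPD in the quoted assembly, trading one instruction for two simpler ones.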
On 23 October 2019 at 01:14:03, "J. Gareth Moreton" wrote:
> That's definitely a marked improvement. Under the System V ABI and
> vectorcall, both fields of a complex type would be passed through xmm0.
> Splitting it up into two separate registers would require something like:
>
>
> shufpd
That's definitely a marked improvement. Under the System V ABI and
vectorcall, both fields of a complex type would be passed through xmm0.
Splitting it up into two separate registers would require something like:
shufpd %xmm0,%xmm1,3 { Copy the high-order Double into the low-order
On 22.10.19 at 05:01, J. Gareth Moreton wrote:
Bigger challenges would be optimising the modulus of a complex number:
function cmod (z : complex): real; vectorcall;
  { module : r = |z| }
  begin
    with z do
      cmod := sqrt((re * re) + (im * im));
  end;
A perfect
This is a long read, so strap in!
Well, I finally got it to work - the required type definition was as follows:
{$push}
{$codealign RECORDMIN=16}
{$PACKRECORDS C}
{ This record forces "complex" to be aligned to a 16-byte boundary }
type align_dummy = record
filler: array[0..1] of real;