Sure, I can send you something. It might have to be to a personal
e-mail though depending on how big the attachments are. Watch this space.
I may be a bit of a mad scientist when it comes to my testing and
research (and sometimes I make a stupid mistake, like with the recent
nested function
On 10/2/20 2:13 PM, J. Gareth Moreton via fpc-devel wrote:
Confirmed my suspicions. If I zero the upper bits of the register (I
used something akin to "AND RCX, $F"), there is no speed loss.
Therefore, I can hypothesise that, on my Intel(R) Core(TM) i7-10750H,
using TEST on a sub-register causes a false dependency if the bits
outside of the
So... I've done some tests, replacing TEST RCX, $4 with TEST CL, $4 and
the like in a number-crunching function, and it seems to cause a notable
penalty, even though none of the instructions are in my critical loop.
So I think it's something that needs to be avoided in most cases. I
think
In the meantime, I've uploaded the patch to the bug report after
confirming that all tests on x86_64-win64 have passed with no
regressions: https://bugs.freepascal.org/view.php?id=37785
Other platforms and AVX-512-specific code still need testing though.
Gareth aka. Kit
On 02/10/2020 07:59,
Ah brilliant, thank you.
I have used Agner Fog's material before for cycle counting. When I
implemented my 3 MOV -> XCHG optimisation
(https://bugs.freepascal.org/view.php?id=36511), I used Agner Fog's
empirical results to determine when it's best to apply this optimisation
where speed is
Ah crumbs, I thought it was too easy! I can't believe I missed the
obvious there! Not much of a saving if it has to store the return
address somewhere (mov @return(%rip),%rcx; mov %rcx, (somewhere on the
stack)).
The advantage would be reducing the chance of additional memory caching
since
Hi Torsten,
The reason it's not compiling correctly with -a is that the operand
size is being set to S_XMM rather than S_YMM (because it's going by the
size of the source operand), so when writing the .s files, it appends an
'x' suffix to the opcode.
I know there's a high risk of it