Re: [fpc-devel] x86_64 question

2020-10-02 Thread J. Gareth Moreton via fpc-devel
Sure, I can send you something.  It might have to be to a personal e-mail, though, depending on how big the attachments are. Watch this space. I may be a bit of a mad scientist when it comes to my testing and research (and sometimes I make a stupid mistake like with the recent nested function

Re: [fpc-devel] x86_64 question

2020-10-02 Thread Nikolay Nikolov via fpc-devel
On 10/2/20 2:13 PM, J. Gareth Moreton via fpc-devel wrote: Confirmed my suspicions.  If I zero the upper bits of the register (I used something akin to "AND RCX, $F"), there is no speed loss. Therefore, I can make the hypothesis, on my Intel(R) Core(TM) i7-10750H, that using TEST on a

Re: [fpc-devel] x86_64 question

2020-10-02 Thread J. Gareth Moreton via fpc-devel
Confirmed my suspicions.  If I zero the upper bits of the register (I used something akin to "AND RCX, $F"), there is no speed loss. Therefore, I can make the hypothesis, on my Intel(R) Core(TM) i7-10750H, that using TEST on a sub-register causes a false dependency if the bits outside of the

Re: [fpc-devel] x86_64 question

2020-10-02 Thread J. Gareth Moreton via fpc-devel
So... I've done some tests, replacing TEST RCX, $4 with TEST CL, $4 and the like in a number-crunching function, and it seems to cause a notable penalty, even though none of the instructions are in my critical loop.  So I think it's something that needs to be avoided in most cases.  I think

Re: [fpc-devel] SSE/AVX instruction encodings

2020-10-02 Thread J. Gareth Moreton via fpc-devel
In the meantime, I've uploaded the patch to the bug report after confirming that all tests on x86_64-win64 have passed with no regressions: https://bugs.freepascal.org/view.php?id=37785 Other platforms and AVX-512-specific code still need testing though. Gareth aka. Kit On 02/10/2020 07:59,

Re: [fpc-devel] x86_64 question

2020-10-02 Thread J. Gareth Moreton via fpc-devel
Ah brilliant, thank you. I have used Agner Fog's material before for cycle counting.  When I implemented my 3 MOV -> XCHG optimisation (https://bugs.freepascal.org/view.php?id=36511), I used Agner Fog's empirical results to determine when it's best to apply this optimisation where speed is

Re: [fpc-devel] Proposal/discussion: Simple nested functions and 'outlining'

2020-10-02 Thread J. Gareth Moreton via fpc-devel
Ah crumbs, I thought it was too easy!  I can't believe I missed the obvious there!  Not much of a saving if it has to store the return address somewhere (mov @return(%rip),%rcx; mov %rcx, (somewhere on the stack)). The advantage would be reducing the chance of additional memory caching since

Re: [fpc-devel] SSE/AVX instruction encodings

2020-10-02 Thread J. Gareth Moreton via fpc-devel
Hi Torsten, The reason why it's not compiling correctly with -a is because the operand size is being set to S_XMM, not S_YMM (because it's going by the size of the source operand), so when writing the .s files, it adds an 'x' suffix to the end of the opcode. I know there's a high risk of it