Re: [fpc-devel] x86_64 question

2020-10-01 Thread Nikolay Nikolov via fpc-devel


On 10/1/20 11:36 PM, J. Gareth Moreton via fpc-devel wrote:
> I thought that might be the case - thanks Nikolay.  And I meant to say
> lower bits of a REGISTER, not an instruction!
>
> Admittedly I'm cycle-counting and byte-counting again!  I was looking
> for ways to reduce 13 bytes of padding in one of my pure assembly
> language routines and realised I could make a saving there.  The only
> thing I can think of that I have to watch out for logically is that if I
> change, say, TEST EAX, $80 to TEST AL, $80, the latter will set the
> sign flag (if the most-significant bit is 1 after the 'and' operation)
> while the former always clears the sign flag.
>
> I have used such subregisters before in the FPC RTL, in fpc_int_real
> and fpc_frac_real in rtl/x86_64/math.inc, where I read AX instead of
> the larger RAX, but that's only after a "SHR RAX, 48" that guarantees
> that everything above the 16th bit is zero, and after testing other
> implementation candidates in a kind of informal competition.
> (Surprisingly, I think "shr $48, %rax; and $0x7ff0,%ax;
> cmp $0x4330,%ax" runs faster than moving 64-bit constants into
> temporary registers (since 64-bit immediates aren't supported outside
> of MOV) and using 'and' and 'cmp' on %rax directly.)
>
> I think you always get a read penalty when using the high-byte
> registers because the processor has to do an implicit shift operation.


I don't remember the reason, but I recall reading in Agner Fog's
optimization manual that they are less efficient. Here's the relevant quote:


"Any use of the high 8-bit registers AH, BH, CH, DH should be avoided 
because it can cause false dependences and less efficient code."


It's from the chapter "Partial registers" (page 61) of this document:

https://www.agner.org/optimize/optimizing_assembly.pdf

Highly recommended reading, as it addresses exactly the topic of partial 
registers. In general, it is the partial register writes of 16-bit or 
8-bit subregisters that cause problems - either false read dependencies 
(usually on AMD) or extra penalties for joining/splitting registers (on 
Intel, at least in the P6 era).


Best regards,

Nikolay

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] Proposal/discussion: Simple nested functions and 'outlining'

2020-10-01 Thread Martin Frb via fpc-devel

On 01/10/2020 23:22, J. Gareth Moreton via fpc-devel wrote:
> In a way, yes, but not quite the same, since multiple calls to the
> nested function would still redirect to the same block of code rather
> than being inlined at each call.  I suppose it is more similar to the
> old GOSUB/RETURN combination in old versions of BASIC, with the nested
> routine slotted either at the end of the parent function or, if the
> compiler is intelligent enough, right after one of the function calls
> (in effect, inlining it at this point) so the peephole optimizer can
> then remove a zero-distance jump.


And how do you get back?

When you talk of jump instead of call, I imagine something like:
nop
nop
jmp + 0 // to nested
mov // in nested
jmp ??? // back to outer
nop  // continue outer
nop
jmp -5 // to nested
nop // continue outer

What is the advantage over "call"?


Maybe the following (not sure if beneficial) goes in the same direction:

If a "nested proc" (which can have its own locals and params) does NOT
recurse, then instead of generating a separate stack frame (enter/leave),
space for its locals could be allocated at the end of the outer
function's frame. That means that the base pointer (rbp) can be kept as
it is. It also means that the frame pointer for the outer function
(for access to outer variables) does not need to be passed, as it is the
base pointer.
(This kind of extends the recent "do not pass the outer-fp" optimization
for when there is no access to outer vars, only this time outer vars may
be accessed.)



Re: [fpc-devel] SSE/AVX instruction encodings

2020-10-01 Thread avx512--- via fpc-devel
Hi Gareth,

in my opinion it is not a good idea to introduce a new function to calculate
the operand size.

The risk of breaking existing code (both FPC and user code) is very high.

I introduced the MemRefInfo system for SSE and AVX opcodes to protect
existing user code. The basis of this concept is the opcode definition in
x86ins.dat.

In trunk, the definition for opcode VCVTPD2PS is:

; VCVTPD2PS xmmreg_mz,mem256 must come first - map MemRefSize 256bits correct
;  map all other MemrefSize (without broasdcast MemRef) to xmmreg, xmmrm
[VCVTPD2PS,vcvtpd2psM]
(Ch_Wop2, Ch_Rop1)
xmmreg_mz,mem256      \350\352\361\362\364\370\1\x5A\110   AVX,SANDYBRIDGE,TFV
xmmreg_mz,ymmreg      \350\352\361\362\364\370\1\x5A\110   AVX,SANDYBRIDGE
xmmreg_mz,xmmrm       \350\352\361\362\370\1\x5A\110       AVX,SANDYBRIDGE,TFV

// AVX512
xmmreg_mz,bmem64      \350\352\361\370\1\x5A\110           AVX512,BCST2,TFV
xmmreg_mz,bmem64      \350\352\361\364\370\1\x5A\110       AVX512,BCST4,TFV
ymmreg_mz,mem512      \350\351\352\361\370\1\x5A\110       AVX512,TFV
ymmreg_mz,bmem64      \350\351\352\361\370\1\x5A\110       AVX512,BCST8,TFV
ymmreg_mz,zmmreg_er   \350\351\352\361\370\1\x5A\110       AVX512


In trunk this compiles correctly without the compiler option -a; with -a the
result is not correct. I will check this.
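
The "must come first" comment in the x86ins.dat excerpt suggests that the order of the operand templates matters. As a rough illustration only, here is a minimal Free Pascal sketch of a first-match lookup. The types TOperandClass and TTemplate and the functions Fits and FirstMatch are hypothetical and much simpler than what the assembler really does; they are not taken from the compiler sources.

```pascal
program FirstMatchDemo;
{$mode objfpc}

type
  { Hypothetical operand classes, far simpler than the real x86ins.dat ones }
  TOperandClass = (ocXmmReg, ocYmmReg, ocMem256, ocXmmOrMem);

  TTemplate = record
    Dest, Source: TOperandClass;
    Name: string;
  end;

const
  { Order matters: the specific mem256 form is listed before the catch-all }
  Templates: array[0..2] of TTemplate = (
    (Dest: ocXmmReg; Source: ocMem256;   Name: 'xmmreg,mem256'),
    (Dest: ocXmmReg; Source: ocYmmReg;   Name: 'xmmreg,ymmreg'),
    (Dest: ocXmmReg; Source: ocXmmOrMem; Name: 'xmmreg,xmmrm')
  );

{ True when an actual operand fits a template slot; the catch-all class
  accepts a memory operand as well as an XMM register }
function Fits(Actual, Slot: TOperandClass): Boolean;
begin
  if Slot = ocXmmOrMem then
    Result := Actual in [ocXmmReg, ocMem256, ocXmmOrMem]
  else
    Result := Actual = Slot;
end;

{ Walks the table top to bottom and returns the first template that fits }
function FirstMatch(Dest, Source: TOperandClass): string;
var
  I: Integer;
begin
  Result := '(no match)';
  for I := Low(Templates) to High(Templates) do
    if Fits(Dest, Templates[I].Dest) and Fits(Source, Templates[I].Source) then
      Exit(Templates[I].Name);
end;

begin
  { The 256-bit memory operand hits the specific template first }
  WriteLn(FirstMatch(ocXmmReg, ocMem256));
end.
```

In this toy table, moving the 'xmmreg,xmmrm' entry to the top would make it swallow the 256-bit memory operand, which is the kind of mismatch the ordering comment appears to guard against.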

Torsten



-----Original Message-----
Subject: Re: [fpc-devel] SSE/AVX instruction encodings
Date: 2020-10-01T18:04:26+0200
From: "J. Gareth Moreton via fpc-devel"
To: "fpc-devel@lists.freepascal.org"

Hi Torsten,

I've done that already actually, although only to grab the value of the 
ExistsSSEAVX field.  I'm currently testing a new nested function in 
Tx86Instruction.SetInstructionOpsize:

   function CheckSSEAVX: Boolean;
     begin
       Result := False;

       if not MemRefInfo(opcode).ExistsSSEAVX then
         Exit;

       { This check also covers MMX instructions that move data to and from
         32-bit and 64-bit registers or memory, since such instructions are
         replicated in SSE2 for use with XMM registers }
       if tx86operand(operands[1]).opsize in [S_B,S_W,S_L,S_Q] then
         begin
           opsize := S_NO;
           Exit(True);
         end;

       if (tx86operand(operands[1]).opsize <> S_NO) and
          (operands[1].opr.typ = OPR_REFERENCE) then
         begin
           { Memory sizes of 64 bits and under are handled above }
           opsize:=tx86operand(operands[1]).opsize;
           Exit(True);
         end;

       { If the source operand is larger than the destination (e.g.
         "VCVTTPD2DQ XMM0, YMM1" in Intel notation), use the source operand }
       if ((tx86operand(operands[1]).opsize = S_YMM) and
           (tx86operand(operands[2]).opsize = S_XMM)) or
          ((tx86operand(operands[1]).opsize = S_ZMM) and
           (tx86operand(operands[2]).opsize = S_XMM)) or
          ((tx86operand(operands[1]).opsize = S_ZMM) and
           (tx86operand(operands[2]).opsize = S_YMM)) then
         begin
           opsize:=tx86operand(operands[1]).opsize;
           Exit(True);
         end;

       { If none of the conditions are met, this function returns False and
         the opsize is set to the last operand's opsize }
     end;

I've also commented out the individual checks for MOVD, MOVQ, VMOVQ etc 
to see how it handles itself and to simplify the code. "make all" at 
least works successfully and it fixes the bug listed in #37785, but it 
will need extensive testing, lest I break someone's assembly language.

Note that the reason why I've done "(tx86operand(operands[1]).opsize = 
S_YMM) and (tx86operand(operands[2]).opsize = S_XMM)" etc. and not 
something like "(tx86operand(operands[1]).opsize >= S_YMM) and 
(tx86operand(operands[1]).opsize > tx86operand(operands[2]).opsize)" is 
for future safety, since the opsize field doesn't have items in size 
order (plus some entries, like S_BL, don't have a distinct size because 
it's a size conversion) and it's to prevent an unintended side-effect if 
a new entry is added after S_ZMM in the future.

One thing that makes it difficult is that I don't have a processor that 
supports the AVX-512 instruction set, at least I don't think it does 
(Intel Core i7-10750H).

Gareth aka. Kit

P.S. If anyone can see a way to break the above code (before I submit a 
patch), please tell me!


On 01/10/2020 15:52, avx512--- via fpc-devel wrote:
> Hi,
>
> look at the function "MemRefInfo(aAsmop: TAsmOp)" in 
> "compiler/x86/aasmcpu.pas".
>
>
> Torsten
>
>
>
> -----Original Message-----
> Subject: [fpc-devel] SSE/AVX instruction encodings
> Date: 2020-10-01T13:57:05+0200
> From: "J. Gareth Moreton via fpc-devel"
> To: "FPC developers' list"
>
> Hi everyone,
>
> I've decided to take on https://bugs.freepascal.org/view.php?id=37785 -
> I've 

Re: [fpc-devel] Proposal/discussion: Simple nested functions and 'outlining'

2020-10-01 Thread J. Gareth Moreton via fpc-devel
In a way, yes, but not quite the same, since multiple calls to the
nested function would still redirect to the same block of code rather
than being inlined at each call.  I suppose it is more similar to the
old GOSUB/RETURN combination in old versions of BASIC, with the nested
routine slotted either at the end of the parent function or, if the
compiler is intelligent enough, right after one of the function calls
(in effect, inlining it at this point) so the peephole optimizer can
then remove a zero-distance jump.


Gareth aka. Kit

On 01/10/2020 22:10, Ryan Joseph via fpc-devel wrote:
>> On Oct 1, 2020, at 10:37 AM, J. Gareth Moreton via fpc-devel wrote:
>>
>> In situations where a nested function has no parameters, is it feasible and
>> beneficial to programmatically merge it into the main procedure
>
> What do you mean by "merge"? Like inlining?
>
> Regards,
> Ryan Joseph



Re: [fpc-devel] Proposal/discussion: Simple nested functions and 'outlining'

2020-10-01 Thread Ryan Joseph via fpc-devel


> On Oct 1, 2020, at 10:37 AM, J. Gareth Moreton via fpc-devel wrote:
> 
> In situations where a nested function has no parameters, is it feasible and 
> beneficial to programmatically merge it into the main procedure

What do you mean by "merge"? Like inlining?

Regards,
Ryan Joseph



Re: [fpc-devel] x86_64 question

2020-10-01 Thread J. Gareth Moreton via fpc-devel
I thought that might be the case - thanks Nikolay.  And I meant to say 
lower bits of a REGISTER, not an instruction!


Admittedly I'm cycle-counting and byte-counting again!  I was looking
for ways to reduce 13 bytes of padding in one of my pure assembly
language routines and realised I could make a saving there.  The only
thing I can think of that I have to watch out for logically is that if I
change, say, TEST EAX, $80 to TEST AL, $80, the latter will set the sign
flag (if the most-significant bit is 1 after the 'and' operation) while
the former always clears the sign flag.
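
To make the flag difference concrete, here is a minimal Free Pascal sketch that models only the sign flag left by TEST: SF is a copy of the most significant bit of the temporary AND result at the operand size. The function names SignFlagAfterTest8 and SignFlagAfterTest32 are made up for this illustration and do not exist in the RTL.

```pascal
program TestSignFlagDemo;
{$mode objfpc}

{ Models SF after an 8-bit TEST: bit 7 is the top bit of the result }
function SignFlagAfterTest8(Value, Mask: Byte): Boolean;
begin
  Result := ((Value and Mask) and $80) <> 0;
end;

{ Models SF after a 32-bit TEST: bit 31 is the top bit of the result }
function SignFlagAfterTest32(Value, Mask: LongWord): Boolean;
begin
  Result := ((Value and Mask) and $80000000) <> 0;
end;

var
  EAXValue: LongWord;
begin
  EAXValue := $000000FF;
  { TEST AL, $80: bit 7 survives the AND, so SF ends up set }
  WriteLn('TEST AL, $80  -> SF = ',
    SignFlagAfterTest8(Byte(EAXValue and $FF), $80));
  { TEST EAX, $80: bit 31 of the result is always 0, so SF stays clear }
  WriteLn('TEST EAX, $80 -> SF = ',
    SignFlagAfterTest32(EAXValue, $80));
end.
```

The zero flag is the same in both forms, so code that only branches on ZF after the TEST is unaffected by the change; only a following JS/JNS or similar would notice.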


I have used such subregisters before in the FPC RTL, in fpc_int_real and
fpc_frac_real in rtl/x86_64/math.inc, where I read AX instead of the
larger RAX, but that's only after a "SHR RAX, 48" that guarantees that
everything above the 16th bit is zero, and after testing other
implementation candidates in a kind of informal competition.
(Surprisingly, I think "shr $48, %rax; and $0x7ff0,%ax; cmp $0x4330,%ax"
runs faster than moving 64-bit constants into temporary registers (since
64-bit immediates aren't supported outside of MOV) and using 'and' and
'cmp' on %rax directly.)
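
For readers who want to see what that three-instruction sequence computes, here is a minimal Free Pascal sketch of the same bit manipulation on an IEEE 754 double. The names ExponentWord and HasNoFractionBits are hypothetical, and reading the comparison as "no fractional bits left" is an assumption based on the constant; how the RTL routines use the result is not shown in this thread.

```pascal
program ExponentCheckDemo;
{$mode objfpc}

{ Mirrors "shr $48,%rax; and $0x7ff0,%ax" in plain Pascal. After the shift
  the low 16 bits hold the sign (1 bit), the biased exponent (11 bits) and
  the top 4 mantissa bits. The mask $7FF0 keeps only the exponent, still
  shifted left by 4. }
function ExponentWord(D: Double): Word;
var
  Bits: QWord;
begin
  Bits := 0;
  Move(D, Bits, SizeOf(Bits));
  Result := Word(Bits shr 48) and $7FF0;
end;

{ The constant $4330 is the biased exponent $433 = 1023 + 52 shifted left
  by 4, so the test is true for magnitudes of 2^52 and above (and for
  infinities and NaNs), where a double has no fractional bits. }
function HasNoFractionBits(D: Double): Boolean;
begin
  Result := ExponentWord(D) >= $4330;
end;

begin
  WriteLn(HasNoFractionBits(1.5));                { exponent $3FF, below $433 }
  WriteLn(HasNoFractionBits(4503599627370496.0)); { 2^52, exponent $433 }
end.
```

The 16-bit AND and CMP each take a 16-bit immediate, whereas the same test on the full register would need 64-bit masks loaded through MOV first, which is the trade-off described above.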


I think you always get a read penalty when using the high-byte registers 
because the processor has to do an implicit shift operation.


Thanks again for the answer.

Gareth aka. Kit

On 01/10/2020 19:43, Nikolay Nikolov via fpc-devel wrote:
>
> On 10/1/20 8:17 PM, J. Gareth Moreton via fpc-devel wrote:
>> Hi everyone,
>>
>> I have a small question with assembler size optimisation that maybe
>> one of you guys can give me a second opinion on:
>>
>> If you are using the "test" instruction to test some of the lower
>> bits of an instruction, e.g. TEST RCX, $2, is there a penalty with
>> using TEST CL, $2 instead? The instruction is a lot smaller on
>> account of the immediate only being 1 byte long instead of 4 bytes,
>> and the two are mathematically equivalent.  I know you have to be
>> careful with partial write penalties, but partial reads seem to be a
>> bit more nebulous (the register is not modified by TEST).
>
> Yes, I think the shorter TEST CL, $2 is preferred over TEST RCX, $2 on
> every x86_64 CPU. AFAIK, there's no penalty for using 8-bit
> subregisters (except perhaps AH, BH, CH and DH, but the FPC code
> generator doesn't use them). Others can correct me if I'm wrong.
>
> Nikolay



Re: [fpc-devel] r63899 breaks build of FpDebug

2020-10-01 Thread Pascal Riekenberg via fpc-devel
Sorry, wrong list.


Pascal

> On 01.10.2020 22:17, Pascal Riekenberg via fpc-devel wrote:
> 
> 
> C:\Users\public\freepascal\laz\components\fpdebug\fpdbgdwarf.pas(265,15) 
> Error: (3058) There is no method in an ancestor class to be overridden: 
> "SetAsInteger(Int64);"
> C:\Users\public\freepascal\laz\components\fpdebug\fpdbgdwarf.pas(275,15) 
> Error: (3058) There is no method in an ancestor class to be overridden: 
> "SetAsCardinal(QWord);"
> C:\Users\public\freepascal\laz\components\fpdebug\fpdbgdwarf.pas(296,15) 
> Error: (3058) There is no method in an ancestor class to be overridden: 
> "SetAsBool(Boolean);"
> C:\Users\public\freepascal\laz\components\fpdebug\fpdbgdwarf.pas(317,15) 
> Error: (3058) There is no method in an ancestor class to be overridden: 
> "SetAsCardinal(QWord);"
> C:\Users\public\freepascal\laz\components\fpdebug\fpdbgdwarf.pas(338,15) 
> Error: (3058) There is no method in an ancestor class to be overridden: 
> "SetAsCardinal(QWord);"
> C:\Users\public\freepascal\laz\components\fpdebug\fpdbgdwarf.pas(340,15) 
> Error: (3058) There is no method in an ancestor class to be overridden: 
> "SetAsString(AnsiString);"
> C:\Users\public\freepascal\laz\components\fpdebug\fpdbgdwarf.pas(406,15) 
> Error: (3058) There is no method in an ancestor class to be overridden: 
> "SetAsCardinal(QWord);"
> C:\Users\public\freepascal\laz\components\fpdebug\fpdbgdwarf.pas(1038,1) 
> Fatal: (10026) There were 7 errors compiling module, stopping
> Fatal: (1018) Compilation aborted
> 
> What am I missing? This commit is some days old now.
> 
> 
> Pascal
> 
> 


[fpc-devel] r63899 breaks build of FpDebug

2020-10-01 Thread Pascal Riekenberg via fpc-devel
C:\Users\public\freepascal\laz\components\fpdebug\fpdbgdwarf.pas(265,15) Error: 
(3058) There is no method in an ancestor class to be overridden: 
"SetAsInteger(Int64);"
C:\Users\public\freepascal\laz\components\fpdebug\fpdbgdwarf.pas(275,15) Error: 
(3058) There is no method in an ancestor class to be overridden: 
"SetAsCardinal(QWord);"
C:\Users\public\freepascal\laz\components\fpdebug\fpdbgdwarf.pas(296,15) Error: 
(3058) There is no method in an ancestor class to be overridden: 
"SetAsBool(Boolean);"
C:\Users\public\freepascal\laz\components\fpdebug\fpdbgdwarf.pas(317,15) Error: 
(3058) There is no method in an ancestor class to be overridden: 
"SetAsCardinal(QWord);"
C:\Users\public\freepascal\laz\components\fpdebug\fpdbgdwarf.pas(338,15) Error: 
(3058) There is no method in an ancestor class to be overridden: 
"SetAsCardinal(QWord);"
C:\Users\public\freepascal\laz\components\fpdebug\fpdbgdwarf.pas(340,15) Error: 
(3058) There is no method in an ancestor class to be overridden: 
"SetAsString(AnsiString);"
C:\Users\public\freepascal\laz\components\fpdebug\fpdbgdwarf.pas(406,15) Error: 
(3058) There is no method in an ancestor class to be overridden: 
"SetAsCardinal(QWord);"
C:\Users\public\freepascal\laz\components\fpdebug\fpdbgdwarf.pas(1038,1) Fatal: 
(10026) There were 7 errors compiling module, stopping
Fatal: (1018) Compilation aborted

What am I missing? This commit is some days old now.


Pascal


Re: [fpc-devel] x86_64 question

2020-10-01 Thread Nikolay Nikolov via fpc-devel


On 10/1/20 8:17 PM, J. Gareth Moreton via fpc-devel wrote:

> Hi everyone,
>
> I have a small question with assembler size optimisation that maybe
> one of you guys can give me a second opinion on:
>
> If you are using the "test" instruction to test some of the lower bits
> of an instruction, e.g. TEST RCX, $2, is there a penalty with using
> TEST CL, $2 instead? The instruction is a lot smaller on account
> of the immediate only being 1 byte long instead of 4 bytes, and the
> two are mathematically equivalent.  I know you have to be careful with
> partial write penalties, but partial reads seem to be a bit more
> nebulous (the register is not modified by TEST).


Yes, I think the shorter TEST CL, $2 is preferred over TEST RCX, $2 on 
every x86_64 CPU. AFAIK, there's no penalty for using 8-bit subregisters 
(except perhaps AH, BH, CH and DH, but the FPC code generator doesn't 
use them). Others can correct me if I'm wrong.


Nikolay



[fpc-devel] x86_64 question

2020-10-01 Thread J. Gareth Moreton via fpc-devel

Hi everyone,

I have a small question with assembler size optimisation that maybe one 
of you guys can give me a second opinion on:


If you are using the "test" instruction to test some of the lower bits
of an instruction, e.g. TEST RCX, $2, is there a penalty with using
TEST CL, $2 instead? The instruction is a lot smaller on account of
the immediate only being 1 byte long instead of 4 bytes, and the two are
mathematically equivalent.  I know you have to be careful with partial
write penalties, but partial reads seem to be a bit more nebulous (the
register is not modified by TEST).
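
To put numbers on the size difference, here is a small Free Pascal sketch that lists the two encodings as byte arrays and checks that the zero flag would agree for a range of values. The byte sequences are the standard x86-64 encodings of the two forms; the program name, constants and helper functions are made up for this illustration.

```pascal
program TestEncodingSizeDemo;
{$mode objfpc}

const
  { TEST RCX, $2 : REX.W prefix, opcode F7 /0, ModRM, 32-bit immediate }
  TestRcxImm32: array[0..6] of Byte = ($48, $F7, $C1, $02, $00, $00, $00);
  { TEST CL, $2  : opcode F6 /0, ModRM, 8-bit immediate }
  TestClImm8: array[0..2] of Byte = ($F6, $C1, $02);

{ Zero flag as TEST would leave it: set when the AND result is zero }
function ZeroFlag64(Value, Mask: QWord): Boolean;
begin
  Result := (Value and Mask) = 0;
end;

function ZeroFlag8(Value, Mask: Byte): Boolean;
begin
  Result := (Value and Mask) = 0;
end;

var
  I: Integer;
  Agree: Boolean;
begin
  WriteLn('TEST RCX, $2: ', Length(TestRcxImm32), ' bytes');
  WriteLn('TEST CL, $2:  ', Length(TestClImm8), ' bytes');

  { The mask only touches bit 1, so looking at the low byte is enough }
  Agree := True;
  for I := 0 to 1023 do
    if ZeroFlag64(QWord(I), 2) <> ZeroFlag8(Byte(I and $FF), 2) then
      Agree := False;
  WriteLn('ZF agrees for all sampled values: ', Agree);
end.
```

The 64-bit form is 7 bytes and the 8-bit form is 3, so the saving comes from both the shorter immediate and the dropped REX prefix.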


Gareth aka. Kit




[fpc-devel] Proposal/discussion: Simple nested functions and 'outlining'

2020-10-01 Thread J. Gareth Moreton via fpc-devel

Hi everyone,

This is an idea that sprang to mind while looking at fixing an unrelated
bug, and it has to do with nested functions.


In situations where a nested function has no parameters, is it feasible 
and beneficial to programmatically merge it into the main procedure in 
some circumstances (it wouldn't be possible if the nested routine has 
inline assembly language because RET won't behave the same, for 
example), using jumps to navigate to, from and around it? I know for one 
thing it will possibly free up a register in some calculations because 
it doesn't have to pass the base pointer (e.g. RBP) as a hidden parameter.
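
As a concrete picture of the kind of routine meant here, this is a minimal Free Pascal sketch of a nested procedure with no parameters that only touches its parent's locals. The names SumOfSquares and Accumulate are made up for illustration; the merging itself would be a compiler transformation and is not shown.

```pascal
program NestedNoParamsDemo;
{$mode objfpc}

function SumOfSquares(A, B: Integer): Integer;
var
  Current, Total: Integer;

  { No parameters of its own: everything it touches lives in the parent's
    frame, which it normally reaches through a hidden frame pointer }
  procedure Accumulate;
  begin
    Total := Total + Current * Current;
  end;

begin
  Total := 0;
  Current := A;
  Accumulate;   { first call site }
  Current := B;
  Accumulate;   { second call site, same block of code }
  Result := Total;
end;

begin
  WriteLn(SumOfSquares(3, 4));
end.
```

Both call sites reach the same body, so a merged version would still need some way to return to the right place after each use.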


On a similar topic, one person mentioned that GCC and other compilers
sometimes 'outline' conditional branches by effectively moving the
branch into a nested procedure in order to help with caching by giving
the main procedure a smaller memory footprint.
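
To illustrate what outlining means at source level, here is a minimal Free Pascal sketch in which a rarely taken branch has been moved into its own procedure by hand. The names ReportNegative and SumNonNegative are hypothetical; a compiler that outlines would perform this split itself on the generated code.

```pascal
program OutlineDemo;
{$mode objfpc}

{ Cold path, moved out of line so the loop body below stays small }
procedure ReportNegative(Index, Value: Integer);
begin
  WriteLn('negative value ', Value, ' at index ', Index);
end;

function SumNonNegative(const Values: array of Integer): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := Low(Values) to High(Values) do
    if Values[I] < 0 then
      ReportNegative(I, Values[I])   { rarely taken branch }
    else
      Result := Result + Values[I];  { hot path stays compact }
end;

begin
  WriteLn(SumNonNegative([1, 2, -3, 4]));
end.
```

Whether such a split pays off depends on how rarely the branch is taken, since every visit to the cold path now costs a call.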


Might this be something worth researching?

Gareth aka. Kit




Re: [fpc-devel] SSE/AVX instruction encodings

2020-10-01 Thread J. Gareth Moreton via fpc-devel

Hi Torsten,

I've done that already actually, although only to grab the value of the 
ExistsSSEAVX field.  I'm currently testing a new nested function in 
Tx86Instruction.SetInstructionOpsize:


  function CheckSSEAVX: Boolean;
    begin
      Result := False;

      if not MemRefInfo(opcode).ExistsSSEAVX then
        Exit;

      { This check also covers MMX instructions that move data to and from
        32-bit and 64-bit registers or memory, since such instructions are
        replicated in SSE2 for use with XMM registers }
      if tx86operand(operands[1]).opsize in [S_B,S_W,S_L,S_Q] then
        begin
          opsize := S_NO;
          Exit(True);
        end;

      if (tx86operand(operands[1]).opsize <> S_NO) and
         (operands[1].opr.typ = OPR_REFERENCE) then
        begin
          { Memory sizes of 64 bits and under are handled above }
          opsize:=tx86operand(operands[1]).opsize;
          Exit(True);
        end;

      { If the source operand is larger than the destination (e.g.
        "VCVTTPD2DQ XMM0, YMM1" in Intel notation), use the source operand }
      if ((tx86operand(operands[1]).opsize = S_YMM) and
          (tx86operand(operands[2]).opsize = S_XMM)) or
         ((tx86operand(operands[1]).opsize = S_ZMM) and
          (tx86operand(operands[2]).opsize = S_XMM)) or
         ((tx86operand(operands[1]).opsize = S_ZMM) and
          (tx86operand(operands[2]).opsize = S_YMM)) then
        begin
          opsize:=tx86operand(operands[1]).opsize;
          Exit(True);
        end;

      { If none of the conditions are met, this function returns False and
        the opsize is set to the last operand's opsize }
    end;

I've also commented out the individual checks for MOVD, MOVQ, VMOVQ etc 
to see how it handles itself and to simplify the code. "make all" at 
least works successfully and it fixes the bug listed in #37785, but it 
will need extensive testing, lest I break someone's assembly language.


Note that the reason why I've done "(tx86operand(operands[1]).opsize = 
S_YMM) and (tx86operand(operands[2]).opsize = S_XMM)" etc. and not 
something like "(tx86operand(operands[1]).opsize >= S_YMM) and 
(tx86operand(operands[1]).opsize > tx86operand(operands[2]).opsize)" is 
for future safety, since the opsize field doesn't have items in size 
order (plus some entries, like S_BL, don't have a distinct size because 
it's a size conversion) and it's to prevent an unintended side-effect if 
a new entry is added after S_ZMM in the future.
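
The difference between the two styles of check can be shown with a small Free Pascal sketch. TDemoSize is a hypothetical, shortened stand-in for the compiler's opsize enumeration (its members and their order are invented here), and D_FUTURE models an entry added after the ZMM one at some later date.

```pascal
program OpsizeOrderDemo;
{$mode objfpc}

type
  { Hypothetical stand-in for the real enumeration. D_BL plays the role of
    a size conversion, D_FUTURE of an entry appended later on. }
  TDemoSize = (D_NO, D_B, D_W, D_L, D_Q, D_BL,
               D_XMM, D_YMM, D_ZMM, D_FUTURE);

{ Fragile: relies on declaration order matching size order }
function SourceIsWiderByOrder(Src, Dst: TDemoSize): Boolean;
begin
  Result := (Src >= D_YMM) and (Src > Dst);
end;

{ Explicit: lists exactly the pairs that are meant }
function SourceIsWiderExplicit(Src, Dst: TDemoSize): Boolean;
begin
  Result := ((Src = D_YMM) and (Dst = D_XMM)) or
            ((Src = D_ZMM) and (Dst in [D_XMM, D_YMM]));
end;

begin
  { Both checks agree on the intended case }
  WriteLn(SourceIsWiderByOrder(D_YMM, D_XMM), ' ',
          SourceIsWiderExplicit(D_YMM, D_XMM));
  { Only the ordinal comparison accepts the unrelated future entry }
  WriteLn(SourceIsWiderByOrder(D_FUTURE, D_XMM), ' ',
          SourceIsWiderExplicit(D_FUTURE, D_XMM));
end.
```

The explicit form is longer, but its behaviour does not change when the enumeration grows.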


One thing that makes it difficult is that I don't have a processor that 
supports the AVX-512 instruction set, at least I don't think it does 
(Intel Core i7-10750H).


Gareth aka. Kit

P.S. If anyone can see a way to break the above code (before I submit a 
patch), please tell me!



On 01/10/2020 15:52, avx512--- via fpc-devel wrote:

Hi,

look at the function "MemRefInfo(aAsmop: TAsmOp)" in "compiler/x86/aasmcpu.pas".


Torsten



-----Original Message-----
Subject: [fpc-devel] SSE/AVX instruction encodings
Date: 2020-10-01T13:57:05+0200
From: "J. Gareth Moreton via fpc-devel"
To: "FPC developers' list"

Hi everyone,

I've decided to take on https://bugs.freepascal.org/view.php?id=37785 -
I've noticed that the compiler isn't too good at working out the sizes
of SSE and AVX instructions.  If you look at
Tx86Instruction.SetInstructionOpsize in compiler/x86/rax86.pas, it
checks for individual problematic instructions rather than any logical
flags.  I feel this isn't viable in the long-term (i.e. I really don't
want to continually add exceptional instructions) and has the code smell
of something being fundamentally wrong or incomplete with how
instruction sizes and encodings are determined.

I'm looking to see if there's a way I can detect the correct size
logically given the flags.  I figure I'll need to learn a few things
about AVX512 as well so I don't mess anything up (I've noticed a few
AVX512 flags to indicate if scalars rather than vectors are being used,
and am wondering whether they can be incorporated into the older SSE and
AVX instructions in x86ins.dat).

Long story short, I'm going to experiment a bit to see if I can develop
an algorithm that works and is correct.

Gareth aka. Kit






Re: [fpc-devel] SSE/AVX instruction encodings

2020-10-01 Thread avx512--- via fpc-devel
Hi,

look at the function "MemRefInfo(aAsmop: TAsmOp)" in "compiler/x86/aasmcpu.pas".


Torsten



-----Original Message-----
Subject: [fpc-devel] SSE/AVX instruction encodings
Date: 2020-10-01T13:57:05+0200
From: "J. Gareth Moreton via fpc-devel"
To: "FPC developers' list"

Hi everyone,

I've decided to take on https://bugs.freepascal.org/view.php?id=37785 - 
I've noticed that the compiler isn't too good at working out the sizes 
of SSE and AVX instructions.  If you look at 
Tx86Instruction.SetInstructionOpsize in compiler/x86/rax86.pas, it 
checks for individual problematic instructions rather than any logical 
flags.  I feel this isn't viable in the long-term (i.e. I really don't 
want to continually add exceptional instructions) and has the code smell 
of something being fundamentally wrong or incomplete with how 
instruction sizes and encodings are determined.

I'm looking to see if there's a way I can detect the correct size
logically given the flags.  I figure I'll need to learn a few things
about AVX512 as well so I don't mess anything up (I've noticed a few
AVX512 flags to indicate if scalars rather than vectors are being used,
and am wondering whether they can be incorporated into the older SSE and
AVX instructions in x86ins.dat).

Long story short, I'm going to experiment a bit to see if I can develop 
an algorithm that works and is correct.

Gareth aka. Kit




[fpc-devel] SSE/AVX instruction encodings

2020-10-01 Thread J. Gareth Moreton via fpc-devel

Hi everyone,

I've decided to take on https://bugs.freepascal.org/view.php?id=37785 - 
I've noticed that the compiler isn't too good at working out the sizes 
of SSE and AVX instructions.  If you look at 
Tx86Instruction.SetInstructionOpsize in compiler/x86/rax86.pas, it 
checks for individual problematic instructions rather than any logical 
flags.  I feel this isn't viable in the long-term (i.e. I really don't 
want to continually add exceptional instructions) and has the code smell 
of something being fundamentally wrong or incomplete with how 
instruction sizes and encodings are determined.


I'm looking to see if there's a way I can detect the correct size
logically given the flags.  I figure I'll need to learn a few things
about AVX512 as well so I don't mess anything up (I've noticed a few
AVX512 flags to indicate if scalars rather than vectors are being used,
and am wondering whether they can be incorporated into the older SSE and
AVX instructions in x86ins.dat).


Long story short, I'm going to experiment a bit to see if I can develop 
an algorithm that works and is correct.


Gareth aka. Kit

