Re: [fpc-devel] x86_64 question
On 10/1/20 11:36 PM, J. Gareth Moreton via fpc-devel wrote:
> I thought that might be the case - thanks Nikolay. And I meant to say lower bits of a REGISTER, not an instruction!
>
> Admittedly I'm cycle-counting and byte-counting again! I was looking for ways to reduce 13 bytes of padding in one of my pure assembly language routines and realised I could make a saving there. The only thing I can think of that I have to watch out for logically is that if I change, say, TEST EAX, $80 to TEST AL, $80, the latter will set the sign flag if the most significant bit is 1 after the 'and' operation, while the former always clears the sign flag.
>
> I have used such subregisters before in the FPC RTL, in fpc_int_real and fpc_frac_real in rtl/x86_64/math.inc, where I read AX instead of the larger RAX, but that's only after a "SHR RAX, 48" that guarantees that everything above the 16th bit is zero, and after testing other implementation candidates in a kind of informal competition. (Surprisingly, I think "shr $48,%rax; and $0x7ff0,%ax; cmp $0x4330,%ax" runs faster than moving 64-bit constants into temporary registers (since 64-bit immediates aren't supported outside of MOV) and using 'and' and 'cmp' on %rax directly.)
>
> I think you always get a read penalty when using the high-byte registers because the processor has to do an implicit shift operation.

I don't remember the reason, but I recall reading in Agner Fog's optimization manual that they are less efficient. Here's the relevant quote:

"Any use of the high 8-bit registers AH, BH, CH, DH should be avoided because it can cause false dependences and less efficient code."

It's from the chapter "Partial registers" (page 61) of this document:

https://www.agner.org/optimize/optimizing_assembly.pdf

Highly recommended reading, as it addresses exactly the topic of partial registers.

In general, it is partial register writes to 16-bit or 8-bit subregisters that cause problems: either false read dependencies (usually on AMD) or extra penalties for joining/splitting registers (on Intel, at least in the P6 era).

Best regards,
Nikolay
___
fpc-devel maillist - fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
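The sign-flag caveat quoted above can be modelled in plain Pascal. This is only a sketch of the flag arithmetic, not code from the RTL: the two helper functions are hypothetical names introduced for illustration, and they model nothing but SF, which TEST sets to the most significant bit of the temporary AND result at the operand size.

```pascal
program TestSignFlag;
{$mode objfpc}

{ Hypothetical helper: SF after an 8-bit TEST is bit 7 of (Value and Mask). }
function SignFlagAfterTest8(Value, Mask: Byte): Boolean;
begin
  Result := ((Value and Mask) and $80) <> 0;
end;

{ Hypothetical helper: SF after a 32-bit TEST is bit 31 of (Value and Mask). }
function SignFlagAfterTest32(Value, Mask: LongWord): Boolean;
begin
  Result := ((Value and Mask) and $80000000) <> 0;
end;

begin
  { Models TEST AL, $80 with bit 7 of AL set: SF is set }
  WriteLn(SignFlagAfterTest8($80, $80));    { prints TRUE }
  { Models TEST EAX, $80 with the same low byte: bit 31 of the result is 0 }
  WriteLn(SignFlagAfterTest32($80, $80));   { prints FALSE }
end.
```

The zero flag is the same for both forms, so code that only branches on JZ/JNZ after the TEST is unaffected; only code that reads the sign flag afterwards needs checking.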
Re: [fpc-devel] Proposal/discussion: Simple nested functions and 'outlining'
On 01/10/2020 23:22, J. Gareth Moreton via fpc-devel wrote:
> In a way, yes, but not quite the same, since multiple calls to the nested function would still redirect to the same block of code rather than being inlined at each call. I suppose it is more similar to the old GOSUB/RETURN combination in old versions of Basic, with the nested routine slotted either at the end of the parent function or, if the compiler is intelligent enough, right after one of the function calls (in effect, inlining it at this point) so the peephole optimizer can then remove a zero-distance jump.

And how do you get back?

When you talk of a jump instead of a call, I imagine something like:

  nop
  nop
  jmp +0     // to nested
  mov        // in nested
  jmp ???    // back to outer
  nop        // continue outer
  nop
  jmp -5     // to nested
  nop        // continue outer

What is the advantage over "call"?

Maybe the following (not sure if it is beneficial) goes in the same direction:

If a "nested proc" (which can have its own locals and params) does NOT recurse, then instead of generating a separate stack frame (enter/leave), space for its locals could be allocated at the end of the outer function's stack frame.

That means that the base pointer (rbp) can be kept as it is. It also means that the frame pointer of the outer function (used to access outer variables) does not need to be passed, as it is the base pointer. (This extends the recent "do not pass the outer fp if there is no access to outer vars" change, only this time outer vars may be accessed.)
Re: [fpc-devel] SSE/AVX instruction encodings
Hi Gareth,

in my opinion it is not a good idea to introduce a new function to calculate the operand size. The risk of breaking existing code (FPC and user code) is very high.

I introduced the memrefinfo system for SSE and AVX opcodes to protect existing user code. The basis of this concept is the opcode definition in x86ins.dat.

In trunk, the definition for the opcode VCVTPD2PS is:

; VCVTPD2PS xmmreg_mz,mem256 must come first - map MemRefSize 256bits correct
; map all other MemrefSize (without broadcast MemRef) to xmmreg, xmmrm
[VCVTPD2PS,vcvtpd2psM]
(Ch_Wop2, Ch_Rop1)
xmmreg_mz,mem256     \350\352\361\362\364\370\1\x5A\110   AVX,SANDYBRIDGE,TFV
xmmreg_mz,ymmreg     \350\352\361\362\364\370\1\x5A\110   AVX,SANDYBRIDGE
xmmreg_mz,xmmrm      \350\352\361\362\370\1\x5A\110       AVX,SANDYBRIDGE,TFV
// AVX512
xmmreg_mz,bmem64     \350\352\361\370\1\x5A\110           AVX512,BCST2,TFV
xmmreg_mz,bmem64     \350\352\361\364\370\1\x5A\110       AVX512,BCST4,TFV
ymmreg_mz,mem512     \350\351\352\361\370\1\x5A\110       AVX512,TFV
ymmreg_mz,bmem64     \350\351\352\361\370\1\x5A\110       AVX512,BCST8,TFV
ymmreg_mz,zmmreg_er  \350\351\352\361\370\1\x5A\110       AVX512

In trunk this compiles correctly without the compiler option -a; with -a the result is not correct. I will check this.

Torsten
Re: [fpc-devel] Proposal/discussion: Simple nested functions and 'outlining'
In a way, yes, but not quite the same, since multiple calls to the nested function would still redirect to the same block of code rather than being inlined at each call. I suppose it is more similar to the old GOSUB/RETURN combination in old versions of Basic, with the nested routine slotted either at the end of the parent function or, if the compiler is intelligent enough, right after one of the function calls (in effect, inlining it at this point) so the peephole optimizer can then remove a zero-distance jump.

Gareth aka. Kit
Re: [fpc-devel] Proposal/discussion: Simple nested functions and 'outlining'
> On Oct 1, 2020, at 10:37 AM, J. Gareth Moreton via fpc-devel wrote:
>
> In situations where a nested function has no parameters, is it feasible and beneficial to programmatically merge it into the main procedure

What do you mean by "merge"? Like inlining?

Regards,
Ryan Joseph
Re: [fpc-devel] x86_64 question
I thought that might be the case - thanks Nikolay. And I meant to say lower bits of a REGISTER, not an instruction!

Admittedly I'm cycle-counting and byte-counting again! I was looking for ways to reduce 13 bytes of padding in one of my pure assembly language routines and realised I could make a saving there. The only thing I can think of that I have to watch out for logically is that if I change, say, TEST EAX, $80 to TEST AL, $80, the latter will set the sign flag if the most significant bit is 1 after the 'and' operation, while the former always clears the sign flag.

I have used such subregisters before in the FPC RTL, in fpc_int_real and fpc_frac_real in rtl/x86_64/math.inc, where I read AX instead of the larger RAX, but that's only after a "SHR RAX, 48" that guarantees that everything above the 16th bit is zero, and after testing other implementation candidates in a kind of informal competition. (Surprisingly, I think "shr $48,%rax; and $0x7ff0,%ax; cmp $0x4330,%ax" runs faster than moving 64-bit constants into temporary registers (since 64-bit immediates aren't supported outside of MOV) and using 'and' and 'cmp' on %rax directly.)

I think you always get a read penalty when using the high-byte registers because the processor has to do an implicit shift operation.

Thanks again for the answer.

Gareth aka. Kit

--
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus
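The "shr $48,%rax; and $0x7ff0,%ax; cmp $0x4330,%ax" sequence mentioned in this message can be read as a test on the exponent of an IEEE 754 double: the top 16 bits hold the sign, the 11 exponent bits and the top 4 mantissa bits, and $4330 shr 4 = $433 = 1023 + 52. The sketch below only reproduces that arithmetic in Pascal. MaskedTopWord is a hypothetical name, and how the RTL routines branch on the comparison is not shown in the thread, so the `>=` used here is an assumption for illustration.

```pascal
program ExponentCheck;
{$mode objfpc}

{ Hypothetical helper: the value the quoted sequence would leave in AX
  before the CMP, i.e. the top 16 bits of the double with the sign bit
  and the four highest mantissa bits masked off. }
function MaskedTopWord(D: Double): Word;
var
  Bits: QWord;
begin
  Bits := 0;
  Move(D, Bits, SizeOf(Bits));            { reinterpret the 8 bytes }
  Result := Word(Bits shr 48) and $7FF0;  { shr $48 ; and $0x7ff0 }
end;

begin
  { 2^52 has the biased exponent 1023 + 52 = $433, so the masked word is $4330 }
  WriteLn(MaskedTopWord(4503599627370496.0) >= $4330);  { prints TRUE }
  { 1.5 has the biased exponent $3FF }
  WriteLn(MaskedTopWord(1.5) >= $4330);                 { prints FALSE }
  { the sign bit is masked off, so large negative values also pass }
  WriteLn(MaskedTopWord(-1e300) >= $4330);              { prints TRUE }
end.
```

Any double whose magnitude is at least 2^52 has no fractional bits left, which is presumably why a threshold of this kind is useful in routines that split a value into integer and fractional parts.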
Re: [fpc-devel] r63899 breaks build of FpDebug
Sorry, wrong list.

Pascal
[fpc-devel] r63899 breaks build of FpDebug
C:\Users\public\freepascal\laz\components\fpdebug\fpdbgdwarf.pas(265,15) Error: (3058) There is no method in an ancestor class to be overridden: "SetAsInteger(Int64);"
C:\Users\public\freepascal\laz\components\fpdebug\fpdbgdwarf.pas(275,15) Error: (3058) There is no method in an ancestor class to be overridden: "SetAsCardinal(QWord);"
C:\Users\public\freepascal\laz\components\fpdebug\fpdbgdwarf.pas(296,15) Error: (3058) There is no method in an ancestor class to be overridden: "SetAsBool(Boolean);"
C:\Users\public\freepascal\laz\components\fpdebug\fpdbgdwarf.pas(317,15) Error: (3058) There is no method in an ancestor class to be overridden: "SetAsCardinal(QWord);"
C:\Users\public\freepascal\laz\components\fpdebug\fpdbgdwarf.pas(338,15) Error: (3058) There is no method in an ancestor class to be overridden: "SetAsCardinal(QWord);"
C:\Users\public\freepascal\laz\components\fpdebug\fpdbgdwarf.pas(340,15) Error: (3058) There is no method in an ancestor class to be overridden: "SetAsString(AnsiString);"
C:\Users\public\freepascal\laz\components\fpdebug\fpdbgdwarf.pas(406,15) Error: (3058) There is no method in an ancestor class to be overridden: "SetAsCardinal(QWord);"
C:\Users\public\freepascal\laz\components\fpdebug\fpdbgdwarf.pas(1038,1) Fatal: (10026) There were 7 errors compiling module, stopping
Fatal: (1018) Compilation aborted

What am I missing? This commit is some days old now.

Pascal
Re: [fpc-devel] x86_64 question
On 10/1/20 8:17 PM, J. Gareth Moreton via fpc-devel wrote:
> Hi everyone,
>
> I have a small question about assembler size optimisation that maybe one of you guys can give me a second opinion on:
>
> If you are using the "test" instruction to test some of the lower bits of an instruction, e.g. TEST RCX, $2, is there a penalty for using TEST CL, $2 instead? The instruction is a lot smaller on account of the immediate being only 1 byte long instead of 4 bytes, and the two are mathematically equivalent. I know you have to be careful with partial write penalties, but partial reads seem to be a bit more nebulous (the register is not modified by TEST).

Yes, I think the shorter TEST CL, $2 is preferred over TEST RCX, $2 on every x86_64 CPU. AFAIK, there's no penalty for using 8-bit subregisters (except perhaps AH, BH, CH and DH, but the FPC code generator doesn't use them). Others can correct me if I'm wrong.

Nikolay
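As a small illustration of why the two forms are interchangeable when the mask fits in the low byte, the sketch below compares the zero flag each form would produce for a few sample values. It models only ZF in Pascal. The sample values and variable names are made up for the example, and the encodings in the comments are the standard ones for these two instructions.

```pascal
program TestLowBits;
{$mode objfpc}

const
  Mask = $02;
  { Arbitrary sample register contents, chosen for illustration only }
  Samples: array[0..3] of QWord = ($0, $2, $FD, $1234567890ABCDF2);
var
  I: Integer;
  ZF64, ZF8: Boolean;
begin
  for I := Low(Samples) to High(Samples) do
  begin
    { TEST RCX, $2 encodes as 48 F7 C1 02 00 00 00 (7 bytes) }
    ZF64 := (Samples[I] and Mask) = 0;
    { TEST CL, $2 encodes as F6 C1 02 (3 bytes) }
    ZF8 := (Byte(Samples[I] and $FF) and Mask) = 0;
    WriteLn(ZF64 = ZF8);   { prints TRUE for every sample }
  end;
end.
```

The equivalence holds because the mask has no bits above bit 7, so the upper 56 bits of RCX never contribute to the AND result.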
[fpc-devel] x86_64 question
Hi everyone,

I have a small question about assembler size optimisation that maybe one of you guys can give me a second opinion on:

If you are using the "test" instruction to test some of the lower bits of an instruction, e.g. TEST RCX, $2, is there a penalty for using TEST CL, $2 instead? The instruction is a lot smaller on account of the immediate being only 1 byte long instead of 4 bytes, and the two are mathematically equivalent. I know you have to be careful with partial write penalties, but partial reads seem to be a bit more nebulous (the register is not modified by TEST).

Gareth aka. Kit
[fpc-devel] Proposal/discussion: Simple nested functions and 'outlining'
Hi everyone,

This is an idea that sprang to mind while looking at fixing an unrelated bug, and it has to do with nested functions.

In situations where a nested function has no parameters, is it feasible and beneficial to programmatically merge it into the main procedure in some circumstances, using jumps to navigate to, from and around it? (It wouldn't be possible if the nested routine contains inline assembly language, for example, because RET won't behave the same.) I know for one thing that it could free up a register in some calculations, because the base pointer (e.g. RBP) doesn't have to be passed as a hidden parameter.

On a similar topic, one person mentioned that GCC and other compilers sometimes 'outline' conditional branches by effectively moving the branch into a nested procedure in order to help with caching by giving the main procedure a smaller memory footprint. Might this be something worth researching?

Gareth aka. Kit
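For readers who want a concrete picture of the kind of routine the proposal is about, here is a minimal sketch. SumOfSquares and Accumulate are hypothetical names, not code from the compiler or RTL. The nested procedure has no parameters of its own and only touches the parent's locals, which is the situation in which the parent's frame has to be made available to it.

```pascal
program NestedDemo;
{$mode objfpc}

function SumOfSquares(N: Integer): Integer;
var
  Total, Current, I: Integer;

  { No declared parameters: it works purely on the parent's locals,
    so the only thing it needs from the caller is access to that frame. }
  procedure Accumulate;
  begin
    Total := Total + Current * Current;
  end;

begin
  Total := 0;
  for I := 1 to N do
  begin
    Current := I;
    Accumulate;   { each call goes to the same block of code }
  end;
  Result := Total;
end;

begin
  WriteLn(SumOfSquares(4));   { prints 30 }
end.
```

Whether replacing the calls to Accumulate with jumps inside SumOfSquares would actually pay off is exactly the open question of the thread; the sketch only shows the source-level pattern.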
Re: [fpc-devel] SSE/AVX instruction encodings
Hi Torsten,

I've done that already actually, although only to grab the value of the ExistsSSEAVX field. I'm currently testing a new nested function in Tx86Instruction.SetInstructionOpsize:

    function CheckSSEAVX: Boolean;
    begin
      Result := False;
      if not MemRefInfo(opcode).ExistsSSEAVX then
        Exit;

      { This check also covers MMX instructions that move data to and from
        32-bit and 64-bit registers or memory, since such instructions are
        replicated in SSE2 for use with XMM registers }
      if tx86operand(operands[1]).opsize in [S_B,S_W,S_L,S_Q] then
        begin
          opsize := S_NO;
          Exit(True);
        end;

      if (tx86operand(operands[1]).opsize <> S_NO) and
         (operands[1].opr.typ = OPR_REFERENCE) then
        begin
          { Memory sizes of 64 bits and under are handled above }
          opsize := tx86operand(operands[1]).opsize;
          Exit(True);
        end;

      { If the source operand is larger than the destination (e.g.
        "VCVTTPD2DQ XMM0, YMM1" in Intel notation), use the source operand }
      if ((tx86operand(operands[1]).opsize = S_YMM) and (tx86operand(operands[2]).opsize = S_XMM)) or
         ((tx86operand(operands[1]).opsize = S_ZMM) and (tx86operand(operands[2]).opsize = S_XMM)) or
         ((tx86operand(operands[1]).opsize = S_ZMM) and (tx86operand(operands[2]).opsize = S_YMM)) then
        begin
          opsize := tx86operand(operands[1]).opsize;
          Exit(True);
        end;

      { If none of the conditions are met, this function returns False and
        the opsize is set to the last operand's opsize }
    end;

I've also commented out the individual checks for MOVD, MOVQ, VMOVQ etc. to see how it handles itself and to simplify the code. "make all" at least works successfully and it fixes the bug listed in #37785, but it will need extensive testing, lest I break someone's assembly language.

Note that the reason why I've written "(tx86operand(operands[1]).opsize = S_YMM) and (tx86operand(operands[2]).opsize = S_XMM)" etc. and not something like "(tx86operand(operands[1]).opsize >= S_YMM) and (tx86operand(operands[1]).opsize > tx86operand(operands[2]).opsize)" is future safety: the opsize type doesn't list its items in size order (plus some entries, like S_BL, don't have a distinct size because they denote a size conversion), and the explicit form prevents an unintended side-effect if a new entry is added after S_ZMM in the future.

One thing that makes it difficult is that I don't have a processor that supports the AVX-512 instruction set, at least I don't think it does (Intel Core i7-10750H).

Gareth aka. Kit

P.S. If anyone can see a way to break the above code (before I submit a patch), please tell me!
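The "future safety" argument in this message, about comparing opsize values with = rather than >=, can be shown with a toy enumeration. TDemoOpSize and both helper functions below are hypothetical stand-ins invented for this sketch. They are not the compiler's real opsize type, whose members and order are not reproduced here.

```pascal
program OpSizeOrder;
{$mode objfpc}

type
  { Hypothetical stand-in, NOT the compiler's real opsize enumeration.
    D_FUTURE plays the role of an entry added after the ZMM entry later on. }
  TDemoOpSize = (D_NO, D_XMM, D_YMM, D_ZMM, D_FUTURE);

{ Ordinal comparison: silently accepts anything declared after D_YMM }
function IsWideByOrdinal(S: TDemoOpSize): Boolean;
begin
  Result := S >= D_YMM;
end;

{ Explicit membership: only the listed entries qualify }
function IsWideExplicit(S: TDemoOpSize): Boolean;
begin
  Result := S in [D_YMM, D_ZMM];
end;

begin
  WriteLn(IsWideByOrdinal(D_FUTURE));  { prints TRUE, probably unintended }
  WriteLn(IsWideExplicit(D_FUTURE));   { prints FALSE }
end.
```

Relational operators on an enumeration compare declaration order, not any notion of operand width, so the explicit form keeps its meaning when the type grows.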
Re: [fpc-devel] SSE/AVX instruction encodings
Hi,

look at the function "MemRefInfo(aAsmop: TAsmOp)" in "compiler/x86/aasmcpu.pas".

Torsten
[fpc-devel] SSE/AVX instruction encodings
Hi everyone,

I've decided to take on https://bugs.freepascal.org/view.php?id=37785 - I've noticed that the compiler isn't too good at working out the sizes of SSE and AVX instructions. If you look at Tx86Instruction.SetInstructionOpsize in compiler/x86/rax86.pas, it checks for individual problematic instructions rather than any logical flags. I feel this isn't viable in the long term (i.e. I really don't want to continually add exceptional instructions) and it has the code smell of something being fundamentally wrong or incomplete with how instruction sizes and encodings are determined.

I'm looking to see if there's a way I can detect the correct size logically given the flags. I figure I'll need to learn a few things about AVX512 as well so I don't mess anything up (I've noticed a few AVX512 flags that indicate whether scalars rather than vectors are being used, and I'm wondering whether they can be incorporated into the older SSE and AVX instructions in x86ins.dat).

Long story short, I'm going to experiment a bit to see if I can develop an algorithm that works and is correct.

Gareth aka. Kit