Re: [fpc-devel] Textmode ide, patch
> 5) make Ctrl+Enter shortcut key possible in unix/linux xterm:
> ctrl_enter_xterm_unix_keyboard.diff

I found out that this patch also eliminates the Ctrl+J shortcut key and turns it into Ctrl+Enter. But that is what already happens with Ctrl+I, Ctrl+M, Ctrl+[, Ctrl+H... Further investigation led me to the conclusion that there is no truly fully working keyboard for unix/linux unless every "Ctrl+Key" combination is mapped to its own escape string. That is possible only with particular configuration settings which no one has.
___
fpc-devel maillist - fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
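The collision described above follows from how a classic terminal encodes Ctrl combinations: it sends the key's ASCII code with the upper three bits cleared, so several distinct key presses arrive as the same byte. A minimal sketch of that rule (the function name CtrlCode is made up for illustration and is not part of keyboard.pp):

```pascal
program ctrlcodes;
{$mode objfpc}

{ Byte a classic terminal sends for Ctrl+<key>: the ASCII code of the
  key with the upper three bits cleared. CtrlCode is a hypothetical
  helper for this illustration only. }
function CtrlCode(Key: Char): Byte;
begin
  Result := Ord(UpCase(Key)) and $1F;
end;

begin
  WriteLn('Ctrl+H -> ', CtrlCode('H'), '  (same byte as Backspace, #8)');
  WriteLn('Ctrl+I -> ', CtrlCode('I'), '  (same byte as Tab, #9)');
  WriteLn('Ctrl+J -> ', CtrlCode('J'), ' (line feed, #10, the byte string(0x0a) sends)');
  WriteLn('Ctrl+M -> ', CtrlCode('M'), ' (same byte as Enter, #13)');
  WriteLn('Ctrl+[ -> ', CtrlCode('['), ' (same byte as Esc, #27)');
end.
```

Because the .Xresources entry makes Ctrl+Enter send byte 10, the program receiving it cannot tell it apart from Ctrl+J without a dedicated escape string.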
[fpc-devel] Textmode ide, patch
Improvements for the textmode IDE

1) The current IDE (of fpc 3.3.1) has problems with second compilations. This patch resolves the issue:
   tdef_nil_for_ide_fpc331.patch
2) Add missing xterm escape string sequences:
   keyboard_add_escape_keys_unix_fpc331.patch
3) For fpc 3.2.3, add missing xterm escape string sequences:
   keyboard_add_escape_keys_unix_fixes32.patch
   It also makes sure the shift state of escaped keys is transferred to TKeyEvent.
4) kb_scancode_updates.patch
   Safe to apply after keyboard_add_escape_keys_unix_xxx.patch.
   Some scancodes have been bogus for a good reason. I suggest fixing some of them and being more consistent among all platforms. The patched file keyscan.inc is in active use only by unix/keyboard.pp. The Drivers unit in FV has internal fix-ups, so it is fine to make those changes.
5) Make the Ctrl+Enter shortcut key possible in unix/linux xterm:
   ctrl_enter_xterm_unix_keyboard.diff
   The patch is as good for fpc 3.3.1 as for fixes 3.2.
   Apply the patch and add an entry to the xterm configuration file .Xresources:

   xterm*VT100.Translations: #override \
       Ctrl <Key>Return: string(0x0a)

diff -ru a/packages/rtl-console/src/inc/keyscan.inc b/packages/rtl-console/src/inc/keyscan.inc
--- a/packages/rtl-console/src/inc/keyscan.inc 2024-05-09 00:13:05.0 +
+++ b/packages/rtl-console/src/inc/keyscan.inc 2024-05-28 07:03:45.682436000 +
@@ -4,12 +4,8 @@
 kbAltEsc = $01; {Alt+Esc = scancode 01, ascii code 0.}
 kbEsc = $01; {Esc = scancode 01, ascii code 27.}
 kbAltSpace= $02;
- kbCtrlIns = $04;
- kbShiftIns= $05;
- kbCtrlDel = $06;
- kbShiftDel= $07;
- kbAltBack = $08;
 kbAltShiftBack= $09;
+ kbAltBack = $0E;
 kbShiftTab= $0F;
 kbAltQ= $10;
 kbAltW= $11;
@@ -68,7 +64,9 @@
 kbDown= $50;
 kbPgDn= $51;
 kbIns = $52;
+ kbShiftIns= $52; { Differs from kbIns only by shift state }
 kbDel = $53;
+ kbShiftDel= $53; { Differs from kbDel only by shift state }
 kbShiftF1 = $54;
 kbShiftF2 = $55;
 kbShiftF3 = $56;
@@ -131,6 +129,8 @@
 kbCtrlCenter = $8F;
 kbCtrlGreyPlus= $90;
 kbCtrlDown= $91;
+ kbCtrlIns = $92;
+ kbCtrlDel = $93;
 kbCtrlTab = $94;
 kbAltHome = $97;
 kbAltUp = $98;
diff -ru a/packages/rtl-console/src/unix/keyboard.pp b/packages/rtl-console/src/unix/keyboard.pp
--- a/packages/rtl-console/src/unix/keyboard.pp 2023-01-19 15:42:50.438798000 +
+++ b/packages/rtl-console/src/unix/keyboard.pp 2024-05-27 23:54:54.113784131 +
@@ -39,6 +39,7 @@
 char : byte;
 ScanValue : byte;
 CharValue : byte;
+ShiftValue : byte;
 SpecialHandler : Tprocedure;
 end;
@@ -76,7 +77,7 @@
 const
 KeyBufferSize = 20;
 var
- KeyBuffer : Array[0..KeyBufferSize-1] of Char;
+ KeyBuffer : Array[0..KeyBufferSize-1] of record ch: Char; sh :byte end;
 KeyPut,
 KeySend : longint;
@@ -92,18 +93,18 @@
 {$i keyscan.inc}
 {Some internal only scancodes}
-const KbShiftUp= $f0;
- KbShiftLeft = $f1;
- KbShiftRight = $f2;
- KbShiftDown = $f3;
- KbShiftHome = $f4;
- KbShiftEnd = $f5;
- KbCtrlShiftUp= $f6;
- KbCtrlShiftDown = $f7;
- KbCtrlShiftRight = $f8;
- KbCtrlShiftLeft = $f9;
- KbCtrlShiftHome = $fa;
- KbCtrlShiftEnd = $fb;
+const kbAltCenter = kbCenter;
+ kbShiftCenter = kbCenter;
+
+ KbShiftUp= KbUp;
+ KbShiftLeft = KbLeft;
+ KbShiftRight = KbRight;
+ KbShiftDown = KbDown;
+ KbShiftHome = KbHome;
+ KbShiftEnd = kbEnd;
+ kbShiftPgUp = kbPgUp;
+ kbShiftPgDn = kbPgDn;
+
 double_esc_hack_enabled : boolean = false;
@@ -412,7 +413,7 @@
 InTail:=0;
 end;
-procedure PushKey(Ch:char);
+procedure PushKey(Ch:char;aShift:byte);
 var
 Tmp : Longint;
 begin
@@ -421,17 +422,22 @@
 If KeyPut>=KeyBufferSize Then
 KeyPut:=0;
 If KeyPut<>KeySend Then
- KeyBuffer[Tmp]:=Ch
+ begin
+ KeyBuffer[Tmp].ch:=Ch;
+ KeyBuffer[Tmp].sh:=aShift;
+ end
 Else
 KeyPut:=Tmp;
 End;
-function PopKey:char;
+function PopKey(var aShift:byte):char;
 begin
+ aShift:=0;
 If KeyPut<>KeySend Then
 begin
- PopKey:=KeyBuffer[KeySend];
+ PopKey:=KeyBuffer[KeySend].ch;
+ aShift:=KeyBuffer[KeySend].sh;
 Inc(KeySend);
 If KeySend>=KeyBufferSize Then
 KeySend:=0;
@@ -441,10 +447,10 @@
 End;
-procedure PushExt(b:byte);
+procedure PushExt(b:byte;sh:byte);
 begin
- PushKey(#0);
- PushKey(chr(b));
+ PushKey(#0,sh);
+ PushKey(chr(b),0);
 end;
@@ -742,7 +748,7 @@
 Pa^.Child:=newPtree;
 end;
-function DoAddSequence(const St : String; AChar,AScan :byte) : PTreeElement;
+function DoAddSequence(const St : String; AChar,AScan, AShift :byte) : PTreeElement;
 var
 CurPTree,NPT : PTreeElement;
 c : byte;
@@ -803,9 +809,12 @@
 Writeln(system.stderr,'Scan was ',ScanValue,' now ',AScan);
[fpc-devel] x86 assembler improvements, patch
Some compiler x86 assembler improvements 1) patch for fpc 3.3.1 (attachment: mkx86ins_version_bump.patch) compiler/utils/mkx86ins.pp Version bumped from 1.6.1 to 1.6.2 There has been changes to code, so version has to represent that. 2) Patch to enable ENTER asm instruction (attachment: enable_asm_instr_enter.patch) same for fpc 3.3.1 and fixes 3.2 3) patch for fpc 3.3.1 compiler/x86/x86ins.dat (attachment: x86ins_4_fpc331.patch) 3.1) Rename 3DNow instruction (fixed long lasting typo in mnemonic). PMULHRWA --> PMULHRW 3.2) Add vpclmullqlqdq, vpclmulhqlqdq, vpclmullqhqdq, vpclmulhqhqdq. 3.3) Fix "typo" for SHA1MSG2 4) patch asm instructions for fixes 3.2 (attachment: x86ins_4_fixes32.patch) add missing instructions of BMI1, BMI2, ADX, CMUL, SHA, XSAVE, MOVBE no "code" changes, only x86ins.dat and generated files with mkx86ins Some instructions deliberately have wrong tags in order to make no changes beside x86ins.dat. 5) patch prof of concept back port asm instructions to fpc 3.0.4 (attachment: x86ins_4_fpc304.patch) add missing instructions of BMI1, BMI2, ADX, CMUL, SHA, XSAVE, MOVBE, RAND no "code" changes, but const maxinfolen = 8; to maxinfolen = 9; x86ins.dat and generated files with mkx86ins I did this to make an argument that it's safe to add asm instructions to fpc 3.2.3 Engine, that supports those instruction, is in production for a while now. diff -ru a/compiler/x86/aasmcpu.pas b/compiler/x86/aasmcpu.pas --- a/compiler/x86/aasmcpu.pas 2024-05-09 00:13:05.0 + +++ b/compiler/x86/aasmcpu.pas 2024-05-16 22:49:29.290239056 + @@ -1664,8 +1664,9 @@ else begin { allow 2nd, 3rd or 4th operand being a constant and expect no size for shuf* etc. } -{ further, allow AAD and AAM with imm. operand } +{ further, allow ENTER, AAD and AAM with imm. 
operand } if (opsize=S_NO) and not((i in [1,2,3]) + or ((i=0) and (opcode in [A_ENTER])) {$ifndef x86_64} or ((i=0) and (opcode in [A_AAD,A_AAM])) {$endif x86_64} diff -ru u/compiler/i386/i386att.inc v/compiler/i386/i386att.inc --- u/compiler/i386/i386att.inc 2021-07-18 13:32:23.955521000 + +++ v/compiler/i386/i386att.inc 2024-05-16 15:20:21.135532388 + @@ -254,7 +254,7 @@ 'pmaddwd', 'pmagw', 'pmulhriw', -'pmulhrwa', +'pmulhrw', 'pmulhrwc', 'pmulhw', 'pmullw', @@ -797,6 +797,10 @@ 'vpblendvb', 'vpblendw', 'vpclmulqdq', +'vpclmullqlqdq', +'vpclmulhqlqdq', +'vpclmullqhqdq', +'vpclmulhqhqdq', 'vpcmpeqb', 'vpcmpeqd', 'vpcmpeqq', @@ -931,11 +935,26 @@ 'vzeroupper', 'andn', 'bextr', +'blsi', +'blsmsk', +'blsr', 'tzcnt', +'bzhi', +'mulx', +'pdep', +'pext', 'rorx', 'sarx', 'shlx', 'shrx', +'movbe', +'pclmulqdq', +'pclmullqlqdq', +'pclmulhqlqdq', +'pclmullqhqdq', +'pclmulhqhqdq', +'adcx', +'adox', 'vbroadcasti128', 'vextracti128', 'vinserti128', @@ -1016,5 +1035,23 @@ 'vfnmsub231sd', 'vfnmsub132ss', 'vfnmsub213ss', -'vfnmsub231ss' +'vfnmsub231ss', +'rdrand', +'rdseed', +'xgetbv', +'xsetbv', +'xsave', +'xsave64', +'xrstor', +'xrstor64', +'xsaveopt', +'xsaveopt64', +'prefetchwt1', +'sha1rnds4', +'sha1nexte', +'sha1msg1', +'sha1msg2', +'sha256rnds2', +'sha256msg1', +'sha256msg2' ); diff -ru u/compiler/i386/i386atts.inc v/compiler/i386/i386atts.inc --- u/compiler/i386/i386atts.inc 2021-07-18 13:32:23.951521000 + +++ v/compiler/i386/i386atts.inc 2024-05-16 15:20:21.135532388 + @@ -947,11 +947,14 @@ attsufNONE, attsufNONE, attsufNONE, +attsufINT, attsufNONE, attsufNONE, attsufNONE, attsufNONE, attsufNONE, +attsufINT, +attsufINT, attsufNONE, attsufNONE, attsufNONE, @@ -1013,6 +1016,40 @@ attsufNONE, attsufNONE, attsufNONE, +attsufNONE, +attsufNONE, +attsufNONE, +attsufNONE, +attsufNONE, +attsufNONE, +attsufNONE, +attsufNONE, +attsufNONE, +attsufNONE, +attsufNONE, +attsufNONE, +attsufNONE, +attsufNONE, +attsufNONE, +attsufNONE, +attsufNONE, +attsufNONE, +attsufNONE, +attsufNONE, 
+attsufNONE, +attsufNONE, +attsufNONE, +attsufNONE, +attsufNONE, +attsufNONE, +attsufNONE, +attsufNONE, +attsufNONE, +attsufNONE, +attsufINT, +attsufNONE, +attsufNONE, +attsufNONE, attsufNONE, attsufNONE, attsufNONE, diff -ru u/compiler/i386/i386int.inc v/compiler/i386/i386int.inc --- u/compiler/i386/i386int.inc 2021-07-18 13:32:23.955521000 + +++ v/compiler/i386/i386int.inc 2024-05-16 15:20:21.135532388 + @@ -254,7 +254,7 @@ 'pmaddwd', 'pmagw', 'pmulhriw', -'pmulhrwa', +'pmulhrw', 'pmulhrwc', 'pmulhw', 'pmullw', @@ -797,6 +797,10 @@ 'vpblendvb', 'vpblendw', 'vpclmulqdq', +'vpclmullqlqdq', +'vpclmulhqlqdq', +'vpclmullqhqdq', +'vpclmulhqhqdq', 'vpcmpeqb', 'vpcmpeqd', 'vpcmpeqq', @@ -931,11 +935,26 @@ 'vzeroupper', 'andn', 'bextr', +'blsi', +'blsmsk', +'blsr', 'tzcnt', +'bzhi', +'mulx', +'pdep', +'pext', 'rorx', 'sarx', 'shlx',
Re: [fpc-devel] download or compile documentation
>> Is there a way to download the documentation in a human-readable format?
> Please explain?

html, pdf, chm format...

> The documentation is available as PDF. That's human readable?

>> Looking at the documentation of 3.2.2 and it is bad. Errors and errors. Wanted to cross-check, maybe it is all fixed meanwhile.
> You can check the actual state of the documentation on the daily build:
> https://www.freepascal.org/daily/

Ok. That will do.
Re: [fpc-devel] download or compile documentation
On 2024-05-09 09:17, Marģers . via fpc-devel wrote:

Hi,

> Is there a way to download the documentation in a human-readable format?
>
> Looking at the documentation of 3.2.2 and it is bad. Errors and errors.

What do you mean with "errors and errors"?

Wrong word usage; the description of a function does not match the function.
[fpc-devel] download or compile documentation
Is there a way to download the documentation in a human-readable format? I am looking at the documentation of 3.2.2 and it is bad. Errors and errors. I wanted to cross-check, maybe it has all been fixed in the meantime.

I am failing to compile the documentation myself. Where can I download it, or how do I compile it? A simple "make" does not work. It always says "Target not supported, run fpcmake". And that is not a valid solution.
[fpc-devel] error target i386 -Cp80486
1) This does not work:

make clean singlezipinstall OS_TARGET=win32 CPU_TARGET=i386 ALLOW_WARNINGS=1 OPT=" -O2 -vxitl -Cp80486 -Op80486"

It hangs on:

system.inc(421,2) Start reading includefile C:\Users\Lietotajs\Downloads\fora\a\486\gh\rtl\inc\generic.inc
100 5.174/5.888 Kb Used
900 5.307/6.336 Kb Used
1000 5.326/6.336 Kb Used

This works fine:

make clean singlezipinstall OS_TARGET=win32 CPU_TARGET=i386 ALLOW_WARNINGS=1 OPT=" -O2 -vxitl -CpPENTIUM2 -OpPENTIUM2"

Some other "-Cp" values also fail...

2) If libgdb.a is not found, the compilation ends with an error (for 3.2.2 it was optional).
Re: [fpc-devel] dos go32v2 compile target on target?
On 2024-02-27 13:08, Marģers . wrote:
>>> Should I be able to compile DOS go32v2 target from DOS itself?
>>>
>>> compiling trunk using "make" falls into an infinite loop on this command
>>>
>>> t:\sv\fpc331\compiler\ppc1.exe -Ur -Xs -O2 -n -Fi../inc -Fi../i386 -FE. -FUt:\sv\fpc331\rtl\units\go32v2 -vx -di386 -dRELEASE -Us -Sg system.pp
>>>
>>> Even though executing it separately there is no problem.
>>
>> Building the compiler under GO32v2 is not guaranteed. It might work if all the prerequisites are fulfilled, but it may not. What's the environment you used (operating system, etc.)?
>
> FreeDOS on real hardware and DOSBox-X with long filename support.
> For FreeDOS you need to clear environment variables to proceed
> set dosdir=
> set cfgfile=
>
> I came to find out that both failed the same way.
> DOSBox-X has problems deleting some temp files. I got the impression that this is a known problem and not a resolved one.

I wouldn't spend too much time on DOSBox-X, there's simply too much outside of your control. Regarding FreeDOS - if it works with individual compilation, but it fails if running under make, you might:

1) Make sure that there are no remaining files from previous compilation attempts (do not rely on "make clean", but really check if there are no .ppu and/or .o and/or .a files).
2) Check memory and disk conditions - how much memory is there, how much free disk space?
3) Check whether CWSDPMI is used, or whether there is some other DPMI provider in use (I don't remember if there's something else included in FreeDOS). Note that some DPMI providers ignore the SIGSEGV condition (unlike CWSDPMI and unlike the DPMI requirements) and that may easily result in an endless loop if such a condition occurs due to some error.
4) Increase the verbosity level - I'd try at least "OPT=-vvlx" to see if you find something interesting (either directly in the output on screen, or in the generated fpcdebug.txt - however, you need to be prepared for the fact that fpcdebug.txt may not be complete if you interrupt the compiler run forcibly).
5) Disable optimizations ("OPT=-O-") and check whether it makes any difference.

Thank you for the advice.

3) CWSDPMI - from the fpc 3.2.2 official release. Should be fine.
4) I used -vxit and redirected the output to a file.
5) -O- (no optimization) failed as well.

I will try a little bit more. It just takes a lot of time to test every possibility.

Margers
[fpc-devel] dos go32v2 compile target on target?
Should I be able to compile the DOS go32v2 target from DOS itself?

After overcoming some challenges it was possible to compile fpc 3.2.2 with starting compiler version 3.2.2. Version 3.2.0 does not work as a starting compiler.

Compiling trunk using "make" falls into an infinite loop on this command:

t:\sv\fpc331\compiler\ppc1.exe -Ur -Xs -O2 -n -Fi../inc -Fi../i386 -FE. -FUt:\sv\fpc331\rtl\units\go32v2 -vx -di386 -dRELEASE -Us -Sg system.pp

Even though executing it separately causes no problem.
Re: [fpc-devel] LEA instruction speed
1. Why do you leave "time:=..." in the benchmark loop? It adds 50% to the execution time per call.

2. The Pascal version does not match the assembler version. I had to fix it:

  //Result := X + Counter + $87654321;
  Result:=Result + X + $87654321;
  Result:=Result xor y;

3. The assembler functions can be unified to work under win64, win32, linux 64 and linux 32:

function Checksum_LEA(const Input, X, Y: LongWord): LongWord; assembler; nostackframe;
asm
@Loop2:
  LEA Input, [Input + X + $87654321]
  XOR Input, y
  DEC y
  JNZ @Loop2
  MOV EAX, Input
end;

4. My results on a Ryzen 2700X:

  Pascal control case:    0.7 ns/call  0.0710
  Using LEA instruction:  0.7 ns/call  0.0700
  Using ADD instructions: 0.7 ns/call  0.0710

Even though the results are equal, I was able to add 4 independent ADD instructions around LEA without the results changing, but only 2 around ADD.
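Points 1 and 2 above can be sketched as a small harness. This is only an illustration under assumptions: the original benchmark program is not shown in this message, so the names ChecksumPascal and Calls, the arguments passed to the function and the use of GetTickCount64 are all hypothetical. The Pascal loop mirrors the assembler routine (add, xor, decrement, repeat until zero), and the clock is read once before and once after the loop instead of inside it.

```pascal
program leabench;
{$mode objfpc}
{$Q-}{$R-}  { wrap-around arithmetic, like the assembler version }
uses
  SysUtils;

const
  Calls = 10000000;  // hypothetical number of calls

{ Pascal counterpart of Checksum_LEA: same add/xor/dec sequence. }
function ChecksumPascal(Input, X, Y: LongWord): LongWord;
begin
  Result := Input;
  repeat
    Result := LongWord(Result + X + $87654321);
    Result := Result xor Y;
    Dec(Y);
  until Y = 0;
end;

var
  i, Sum: LongWord;
  T0, T1: QWord;
begin
  Sum := 0;
  T0 := GetTickCount64;              // read the clock once, before the loop
  for i := 1 to Calls do
    Sum := Sum xor ChecksumPascal(i, 3, 16);
  T1 := GetTickCount64;              // and once after it
  WriteLn('checksum ', Sum);         // use the result so the work is observable
  WriteLn(((T1 - T0) * 1.0e6 / Calls):0:1, ' ns/call');
end.
```

GetTickCount64 has millisecond resolution, so the number of calls has to be large enough for the total run time to dominate the timer granularity.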
[fpc-devel] qoi image file format support
There is this super new, super fast and super what-not image format. I have added support for it to fcl-image.

The attached zip file contains:
1) read and write support in the files
   qoif/qoicomn.pas
   qoif/fpwriteqoi.pas
   qoif/fpreadqoi.pas
2) examples of reading and writing a QOI file
   qoif/wrqoif.pas
   qoif/wrpngf.pas

More information about QOI: https://qoiformat.org
Example images: https://qoiformat.org/qoi_test_images.zip
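For readers who have not looked at the format: a QOI file starts with a 14-byte header, and the encoder keeps a 64-entry table of recently seen pixels indexed by a simple hash. The sketch below follows the published QOI specification; the type and function names are made up for this illustration and are not taken from the attached units.

```pascal
program qoiheader;
{$mode objfpc}

type
  { Hypothetical record mirroring the 14-byte QOI file header. }
  TQoiHeader = packed record
    Magic: array[0..3] of AnsiChar;  // the characters 'qoif'
    Width: LongWord;                 // stored big-endian in the file
    Height: LongWord;                // stored big-endian in the file
    Channels: Byte;                  // 3 = RGB, 4 = RGBA
    Colorspace: Byte;                // 0 = sRGB with linear alpha, 1 = all linear
  end;

{ Index into the 64-entry table of previously seen pixels. }
function QoiColorHash(R, G, B, A: Byte): Byte;
begin
  Result := (R * 3 + G * 5 + B * 7 + A * 11) mod 64;
end;

var
  H: TQoiHeader;
begin
  H.Magic := 'qoif';
  H.Width := NtoBE(LongWord(640));   // native to big-endian
  H.Height := NtoBE(LongWord(480));
  H.Channels := 4;
  H.Colorspace := 0;
  WriteLn('header size in bytes: ', SizeOf(H));
  WriteLn('width read back: ', BEtoN(H.Width));
  WriteLn('table index of opaque black: ', QoiColorHash(0, 0, 0, 255));
end.
```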
Re: [fpc-devel] The "magic div" algorithm
Hi,

Thank you for the implementation. Is it true that you cannot detect word and byte size on division by a constant? Squeezing it into a 32-bit register would make the byte code shorter, not necessarily faster.

- Reply to message -
Subject: Re: [fpc-devel] The "magic div" algorithm
From: J. Gareth Moreton via fpc-devel
To:

This one is for Marģers especially: https://gitlab.com/freepascal.org/fpc/source/-/issues/39355

Gareth aka. Kit
Re: [fpc-devel] The "magic div" algorithm
I came up with an even shorter variant of the div example:

function teDWordDivBy7_v4( divided : dword):dword; assembler; nostackframe;
asm
  mov ecx,divided
  mov rax,2635249153693862181
  mul rcx
  mov eax,edx
end;

The current version, for comparison:

function teDWordDivBy7_v0( divided : dword):dword; assembler; nostackframe;
asm
  mov ecx,divided
  mov eax,613566757
  mul ecx
  add edx,ecx
  rcr edx,1
  shr edx,2
  mov eax,edx
end;
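The 64-bit constant can be checked without assembler. Seven times 2635249153693862181 equals 2^64 + 2^31 + 3, so the upper half of the 128-bit product is x div 7 for every 32-bit x. The sketch below emulates the upper half of "mul rcx" with two 64-bit multiplications and compares it against div on a sample of dividends; the helper names MulHigh64 and Check are hypothetical.

```pascal
program magicdiv7;
{$mode objfpc}
{$Q-}{$R-}

const
  Magic: QWord = 2635249153693862181;  // constant from teDWordDivBy7_v4

{ Upper 64 bits of Magic * X for a 32-bit X, i.e. what "mul rcx"
  leaves in RDX. Hypothetical helper for this check only. }
function MulHigh64(X: LongWord): QWord;
var
  HiPart, LoPart: QWord;
begin
  HiPart := (Magic shr 32) * QWord(X);
  LoPart := (Magic and QWord($FFFFFFFF)) * QWord(X);
  Result := (HiPart + (LoPart shr 32)) shr 32;
end;

procedure Check(X: LongWord);
begin
  if MulHigh64(X) <> X div 7 then
  begin
    WriteLn('mismatch at ', X);
    Halt(1);
  end;
end;

var
  X: QWord;
begin
  { Walk the 32-bit range in steps of a prime, then test the top value. }
  X := 0;
  while X <= $FFFFFFFF do
  begin
    Check(LongWord(X));
    Inc(X, 65521);
  end;
  Check($FFFFFFFF);
  WriteLn('all sampled dividends match');
end.
```

Changing the step to 1 turns this into an exhaustive test over all 2^32 dividends, at the cost of a longer run.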
Re: [fpc-devel] The "magic div" algorithm
I was overexcited about making a working example for the byte div case. For the word case it can be done. In the case of dword div it's not so nice. It's theoretically possible to shave off 1 CPU clock cycle, and I have no working example yet. Sorry for spreading false hope.

Example for the byte case:

function teByteDivBy7( divided : dword):dword; assembler; nostackframe;
asm
  mov ecx,divided
  mov eax,293
  mul ecx
  shr eax, 11
end;

- Reply to message -
Subject: Re: [fpc-devel] The "magic div" algorithm
From: Marģers . via fpc-devel
To: FPC developers' list

For unsigned byte, word and dword, divisions by a constant on a 64-bit CPU can be converted into good cases.
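The byte case is small enough to verify exhaustively. A minimal check, written in plain Pascal rather than assembler, that multiplying by 293 and shifting right by 11 gives the same result as dividing by 7 for every byte value:

```pascal
program bytediv7;
{$mode objfpc}

var
  X: LongWord;
begin
  for X := 0 to 255 do
    if (X * 293) shr 11 <> X div 7 then
    begin
      WriteLn('mismatch at ', X);
      Halt(1);
    end;
  WriteLn('x*293 shr 11 equals x div 7 for all byte values');
end.
```

The multiplier works because 293/2048 exceeds 1/7 by only 3/14336, which is too small to push any dividend below 683 over the next multiple of 7.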
Re: [fpc-devel] The "magic div" algorithm
If you have not bought the book you can always try searching the internet.

I did some research on magic div constants. As already known, there are good cases where you can replace the division by

  mov magic
  mul
  shr

and bad cases

  mov magic
  mul
  add
  rcr
  shr

Bad cases are approximately 1/3 of all cases. For unsigned byte, word and dword, divisions by a constant on a 64-bit CPU can be converted into good cases. Here is a possibility for improvement.

- Reply to message -
Subject: Re: [fpc-devel] The "magic div" algorithm
From: J. Gareth Moreton via fpc-devel
To:

Something tells me I should purchase that book - I sense it could reveal some interesting insights. Note that while I understand the concept of turning integer division into multiplication (indeed, I implemented the first version into x86 before it was improved with the "calc_divconst_magic_unsigned" routine, and then implemented it for AArch64), the algorithm that is used in "calc_divconst_magic_unsigned" I don't quite get, if only for the lack of comments and references, although I am studying it more closely.

Still, I figure I'll put down Hacker's Delight as a future purchase so I have a reputable source rather than just an online one (I know there's one somewhere that isn't behind a paywall).

Gareth aka. Kit

On 20/08/2021 18:46, Marģers . via fpc-devel wrote:

> is there a reference to the algorithm that's used to calculate the reciprocal constants used in the integer division optimisations for x86 and AArch64?

Hacker’s Delight, Second Edition
Henry S. Warren, Jr.
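The split between good and bad cases can be illustrated with a simplified test. The sketch below is not the compiler's calc_divconst_magic_unsigned routine and not the exact algorithm from the book; it only applies a well-known sufficient condition: a 32-bit multiplier M with shift S works for every 32-bit dividend if 2^(32+S) <= M*D <= 2^(32+S) + 2^S. When the first multiplier satisfying it no longer fits in 32 bits, the longer add/rcr sequence is needed. The name FindShortMagic and the list of divisors are hypothetical.

```pascal
program magickind;
{$mode objfpc}
{$Q-}{$R-}

{ Looks for a multiplier M < 2^32 and a shift S such that
  (X * M) shr (32 + S) = X div D for every 32-bit X, using the
  sufficient condition 2^(32+S) <= M*D <= 2^(32+S) + 2^S.
  D must be at least 2. Hypothetical helper, not compiler code. }
function FindShortMagic(D: LongWord; out M: QWord; out S: Integer): Boolean;
var
  Shift: Integer;
  P: QWord;
begin
  Result := False;
  M := 0;
  S := 0;
  for Shift := 0 to 31 do
  begin
    P := QWord(1) shl (32 + Shift);
    M := (P + D - 1) div D;            // ceil(2^(32+Shift) / D)
    if M > QWord($FFFFFFFF) then
      Exit;                            // multiplier needs 33 bits: bad case
    if M * D - P <= QWord(1) shl Shift then
    begin
      S := Shift;
      Exit(True);                      // good case: mov magic / mul / shr
    end;
  end;
end;

const
  Divisors: array[0..3] of LongWord = (3, 5, 7, 641);
var
  i, S: Integer;
  M: QWord;
begin
  for i := Low(Divisors) to High(Divisors) do
    if FindShortMagic(Divisors[i], M, S) then
      WriteLn(Divisors[i], ': short form, magic=', M, ' total shift=', 32 + S)
    else
      WriteLn(Divisors[i], ': needs the longer sequence under this test');
end.
```

Under this test 3 and 5 come out as good cases and 7 as a bad case, which matches the 613566757 plus add/rcr sequence used for division by 7 elsewhere in this thread. Even divisors can look worse here than they are in practice, because the dividend can be pre-shifted before the multiplication.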
Re: [fpc-devel] The "magic div" algorithm
> is there a reference to the algorithm that's used to calculate the reciprocal constants used in the integer division optimisations for x86 and AArch64?

Hacker’s Delight, Second Edition
Henry S. Warren, Jr.
Re: [fpc-devel] duplicate internal error numbers
- Reply to message -
Subject: Re: [fpc-devel] duplicate internal error numbers
Date: Thu, 8 Oct 2020, 19:39
From: Sven Barth via fpc-devel
To: FPC developers' list

> Marģers . via fpc-devel wrote on Thu, 8 Oct 2020, 12:39:
> > > I would advise against an automated change in case it changes too many internal error numbers and when a third party raises an issue where the compiler has triggered one, we can no longer identify where in the code base that the state has gone bad because none of the numbers match any more.
> > It is not that much of a problem. Most internal errors would not be triggered by end users.
> > Opened a new ticket in the bug tracker: https://bugs.freepascal.org/view.php?id=37888
> That is kinda the point. They *should* not be triggered, but when they are triggered they should be found. This is especially important as most users are using release versions while we are working on trunk.

There are 555 internal error numbers changed out of 4300+. That's about 13%. What's the probability that one of the changed numbers will be triggered before the next release? One or two in a year (or none).

As I see it, it's just a decision to make: accept or reject the patch. I was curious how many duplicate internal errors there are. A lot. Now you know that as well.
Re: [fpc-devel] duplicate internal error numbers
- Reply to message -
Subject: Re: [fpc-devel] duplicate internal error numbers
Date: Wed, 7 Oct 2020, 16:40
From: J. Gareth Moreton via fpc-devel
To:

> When two different programmers write code on the same day in different parts of the compiler, there's bound to be a clash eventually.

A good example was z80 and xtensa.

> I would advise against an automated change in case it changes too many internal error numbers and when a third party raises an issue where the compiler has triggered one, we can no longer identify where in the code base that the state has gone bad because none of the numbers match any more.

It is not that much of a problem. Most internal errors would not be triggered by end users.

Opened a new ticket in the bug tracker: https://bugs.freepascal.org/view.php?id=37888
Re: [fpc-devel] duplicate internal error numbers
- Reply to message -
Subject: Re: [fpc-devel] duplicate internal error numbers
Date: Wed, 7 Oct 2020, 14:16
From: Jonas Maebe via fpc-devel
To:

> On 07/10/2020 13:02, Marģers . via fpc-devel wrote:
> > found total 4300+
> > 1001 error number has to be changed to make all error number unique
> >
> > as there are so huge number of changes to make i have question
> > 1) would it be desirable to change all (or most) duplicate error numbers in one single big patch?
> > 2) selective amount of changes in multiple patches?
> > 3) leave it as is? not broken don't fix or do nothing is a choice.
>
> How many of these are part of different code generators? We only support one architecture per compiler binary _and_ code generators for new architectures often start as (partial) copies of existing code generators, so it's normal that they have a lot of duplicate internal errors, but it doesn't matter.
>
> OTOH, I'm sure there are also still duplicates in generic code and within single architectures, but to find those you have to look exclusively at the generic files in combination with those for a single architecture (which may be in multiple directories, e.g. compiler/x86 and compiler/i386 for the i386 target).

That still leaves plenty of potential duplicates: I got 339 unique cases to investigate. That is a lot of manual labor, as I understand that automated changes are not welcome.
[fpc-devel] duplicate internal error numbers
I checked for duplicate internal error numbers. I found 4300+ internal errors in total; 1001 error numbers have to be changed to make all error numbers unique.

As there is such a huge number of changes to make, I have a question:
1) Would it be desirable to change all (or most) duplicate error numbers in one single big patch?
2) A selective amount of changes in multiple patches?
3) Leave it as is? "Not broken, don't fix" or doing nothing is a choice too.
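A check of this kind can be sketched in a few lines. The program below is not the tool that produced the numbers above (that tool is not shown); it is a hypothetical scanner that looks for "internalerror(" followed by digits in the files named on the command line and reports every number used more than once.

```pascal
program dupierr;
{$mode objfpc}{$H+}
uses
  SysUtils, Classes;

const
  Marker = 'internalerror(';

{ Collects every number that follows "internalerror(" in Line.
  First sightings go to Seen, repeats go to Dups (once each). }
procedure ScanLine(const Line: string; Seen, Dups: TStrings);
var
  Rest, Num: string;
  P, Q: Integer;
begin
  Rest := LowerCase(Line);
  P := Pos(Marker, Rest);
  while P > 0 do
  begin
    Q := P + Length(Marker);
    Num := '';
    while (Q <= Length(Rest)) and (Rest[Q] in ['0'..'9']) do
    begin
      Num := Num + Rest[Q];
      Inc(Q);
    end;
    if Num <> '' then
    begin
      if Seen.IndexOf(Num) < 0 then
        Seen.Add(Num)
      else if Dups.IndexOf(Num) < 0 then
        Dups.Add(Num);
    end;
    Delete(Rest, 1, Q - 1);          // continue after this occurrence
    P := Pos(Marker, Rest);
  end;
end;

var
  Seen, Dups, Lines: TStringList;
  i, j: Integer;
begin
  Seen := TStringList.Create;
  Dups := TStringList.Create;
  Lines := TStringList.Create;
  try
    for i := 1 to ParamCount do
    begin
      Lines.LoadFromFile(ParamStr(i));
      for j := 0 to Lines.Count - 1 do
        ScanLine(Lines[j], Seen, Dups);
    end;
    for i := 0 to Dups.Count - 1 do
      WriteLn('duplicate internal error number: ', Dups[i]);
    WriteLn(Seen.Count, ' distinct numbers, ', Dups.Count, ' used more than once');
  finally
    Lines.Free;
    Dups.Free;
    Seen.Free;
  end;
end.
```

Passing it only the generic compiler sources together with the directories of a single target limits the report to duplicates that can end up in the same compiler binary.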
Re: [fpc-devel] Producing assembly with less branches?
- Reply to message -
Subject: [fpc-devel] Producing assembly with less branches?
From: Stefan Glienke
To:

> Hi,
> not sure if anything significantly changed in trunk compared to 3.2 with respect to optimized code being generated but I am quite disappointed that fpc (checked win64 with -O3 and -O4) does not use cmovxx instructions and alike for the most basic things and produces terrible code like this:
>
> unit1.pas:49 if left < right then
> 00010002E3C0 39ca cmp edx,ecx
> 00010002E3C2 7e06 jle 0x10002e3ca
> unit1.pas:50 Result := -1
> 00010002E3C4 b8 mov eax,0x
> 00010002E3C9 c3 ret
> unit1.pas:51 else if left > right then
> 00010002E3CA 39ca cmp edx,ecx
> 00010002E3CC 7d06 jge 0x10002e3d4
> unit1.pas:52 Result := 1
> 00010002E3CE b80100c3 mov eax,0x1
> unit1.pas:54 Result := 0;
> 00010002E3D4 31c0 xor eax,eax
> unit1.pas:55 end;
> 00010002E3D6 c3 ret
>
> Similar for even simpler things:
>
> unit1.pas:43 if i < 0 then
> 00010002E3A1 85c0 test eax,eax
> 00010002E3A3 7d03 jge 0x10002e3a8
> unit1.pas:44 i := 0;
> 00010002E3A5 31c0 xor eax,eax
> 00010002E3A7 90 nop
>
> Imo someone should work at that and make the compiler produce less branches. Not sure if that is on your list but it should be looked at.

It's already done in trunk (sadly not in 3.2.0). To get a cmov instruction emitted, two conditions have to be met:
1) the if statement has no else part
2) the assigned value is a variable (not a constant).

Your code has to look like this to benefit from cmov:

function cmov2(left, right : longint):longint;
var l1,lf: longint;
    r : longint;
begin
  l1:=1;
  lf:=-1;
  r:=0;
  if left > right then
  begin
    r:=lf;
  end;// else
  if left < right then
  begin
    r:=l1;
  end;// else r:=0;
  cmov2:=r;
end;

00400370  b9 01 00 00 00  mov ecx,0001h
00400375  ba ff ff ff ff  mov edx,0FFFFFFFFh
0040037a  31 c0           xor eax,eax
0040037c  39 fe           cmp esi,edi
0040037e  0f 4c c2        cmovl eax,edx
00400381  39 fe           cmp esi,edi
00400383  0f 4f c1        cmovnle eax,ecx
00400386  c3              ret
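The same two conditions can be applied to the simpler "if i < 0 then i := 0" case from the quoted message. The sketch below only shows the source shape: an if without an else branch that assigns from a local variable instead of a constant. Whether a cmov is actually emitted depends on the compiler version, target and optimization settings, so the generated assembler should be inspected (for example with -al) rather than assumed. The function name ClampToZero is hypothetical.

```pascal
program clampdemo;
{$mode objfpc}

{ Clamps negative values to zero. The replacement value is taken from a
  local variable and the if has no else part, the two conditions named
  in the message above. }
function ClampToZero(i: LongInt): LongInt;
var
  zero: LongInt;
begin
  zero := 0;
  if i < 0 then
    i := zero;
  Result := i;
end;

begin
  WriteLn(ClampToZero(-5));
  WriteLn(ClampToZero(7));
end.
```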
[fpc-devel] AArch64 r44737 and r44738
Can someone check whether the removal of r44737 in r44738 was intended?

https://svn.freepascal.org/cgi-bin/viewvc.cgi/trunk/compiler/aarch64/aoptcpu.pas?r1=44738=44737=44738
Re: [fpc-devel] generate assembler with no clear purpose MOV
> > From: J. Gareth Moreton
> > To:
> >> Are you able to dump the nodes as well with -an? (You'll need to define -dEXTDEBUG though) That might give some clues behind the presence of that movslq instruction.
>
> You will need to also add ALLOW_WARNINGS=1 to the make call.

Ok, ALLOW_WARNINGS=1 did the job.

Attached are testsh.pas and testsh.s:

program defuu;

function roo(lk:dword):byte;
var
  k : dword;
  bit : dword;
  num : byte;
  one : dword;
begin
  num:=0;
  one:=1;
  for k:=0 to 25 do
  begin
    bit:= one shl k; //- this is the line -
    if (lk and bit) <> 0 then
    begin
      lk:=lk xor bit;
      inc(num);
    end;
  end;
  roo:=num;
end;

function sh ( a, b: dword):longint;
var z : dword;
begin
  z:=a shl b;
  sh:= z+1;
end;

begin
end.

.file "testsh.pas"
# Begin asmlist al_begin
.section .debug_line
.type .Ldebug_linesection0,@object
.Ldebug_linesection0:
.type .Ldebug_line0,@object
.Ldebug_line0:
.section .debug_abbrev
.type .Ldebug_abbrevsection0,@object
.Ldebug_abbrevsection0:
.type .Ldebug_abbrev0,@object
.Ldebug_abbrev0:
.section .text.b_DEBUGSTART_$P$DEFUU,"ax"
.globl DEBUGSTART_$P$DEFUU
.type DEBUGSTART_$P$DEFUU,@object
DEBUGSTART_$P$DEFUU:
# End asmlist al_begin
# Begin asmlist al_procedures
.section .text.n_p$defuu_$$_roo$longword$$byte,"ax"
.globl P$DEFUU_$$_ROO$LONGWORD$$BYTE
.type P$DEFUU_$$_ROO$LONGWORD$$BYTE,@function
P$DEFUU_$$_ROO$LONGWORD$$BYTE:
.Lc2:
# Register rsp allocated
# Var $result located in register al
# Var k located in register eax
# Var bit located in register esi
# Var num located in register al
# Var one located in register eax
# second blockn (entry)
# second nothing-nothg (entry)
# second nothing-nothg (exit)
# second blockn (entry)
#second nothing-nothg (entry)
#second nothing-nothg (exit)
#second asm (entry)
# Register edi,edi allocated
# Var lk located in register edi
#second asm (exit)
#second asm (entry)
#second asm (exit)
#second asm (entry)
# [testsh.pas]
# [9] begin
#second asm (exit)
#second asm (entry)
#second asm (exit)
#second blockn (entry)
#second blockn (exit)
#second blockn (entry)
# second assignment (entry)
# second load (entry)
# second load (exit)
# second ordconst (entry)
# second ordconst (exit)
# Var num located in register al
# Register al allocated
.Ll1:
# [10] num:=0;
xorb %al,%al
# second assignment (exit)
# second assignment (entry)
# second load (entry)
# second load (exit)
# second ordconst (entry)
# second ordconst (exit)
# Var one located in register r8d
# Register r8d allocated
.Ll2:
# [11] one:=1;
movl $1,%r8d
# second assignment (exit)
# second blockn (entry)
# second nothing-nothg (entry)
# second nothing-nothg (exit)
# second blockn (entry)
# second nothing-nothg (entry)
# second nothing-nothg (exit)
# second assignment (entry)
#second load (entry)
#second load (exit)
#second ordconst (entry)
#second ordconst (exit)
# Var k located in register ecx
# Register ecx allocated
.Ll3:
# [12] for k:=0 to 25 do
xorl %ecx,%ecx
# second assignment (exit)
# second while_repeat (entry)
# Register esi allocated
.Lj5:
#second blockn (entry)
# second nothing-nothg (entry)
# second nothing-nothg (exit)
# second blockn (entry)
# second assignment (entry)
# second shlshr-shln (entry)
#second load (entry)
#second load (exit)
#second typeconv (entry)
# second load (entry)
# second load (exit)
# Register rdx allocated
# Peephole Optimization: MovAnd2Mov 1 done
.Ll4:
# [14] bit:= one shl k; //- this is the line -
movl %ecx,%edx
#second typeconv (exit)
# Register edx,edx allocated
# Peephole Optimization: %edx = %ecx; changed to minimise pipeline stall (MovXXX2MovXXX)
shlx %ecx,%r8d,%edx
# second shlshr-shln (exit)
# second load (entry)
# second load (exit)
movl %edx,%esi
# second assignment (exit)
# second ifn (entry)
# second add-unequaln (entry)
#second add-andn (entry)
# second load (entry)
# second load (exit)
# second load (entry)
# second load (exit)
# Peephole Optimization: %esi = %edx; changed to minimise pipeline stall (MovXXX2MovXXX)
# Peephole Optimization: Mov2Nop 4 done
.Ll5:
# [15] if (lk and bit) <> 0 then
andl %edi,%edx
#second add-andn (exit)
#second ordconst (entry)
#second ordconst (exit)
# Register rflags allocated
# Register edx released
# second add-unequaln (exit)
je .Lj9
# Register rflags released
# second blockn (entry)
#second inline (entry)
#
Re: [fpc-devel] generate assembler with no clear purpose MOV
From: J. Gareth Moreton

> Are you able to dump the nodes as well with -an? (You'll need to define
> -dEXTDEBUG though) That might give some clues behind the presence of
> that movslq instruction.

Building the compiler with -dEXTDEBUG does not work for me:

make singlezipinstall OS_TARGET=linux CPU_TARGET=x86_64 OPT="-dEXTDEBUG -CpCOREAVX2 -OpCOREAVX2 -Fu/home/user/fpc304/lib/fpc/3.0.4/units/x86_64-linux/rtl/"

constexp.pas(125,13) Warning: Location (LOC_CSSETREG) not equal to expectloc (LOC_REG): typeconvn
constexp.pas(594) Fatal: There were 1 errors compiling module, stopping
Re: [fpc-devel] generate assembler with no clear purpose MOV
- Reply to message -
Subject: Re: [fpc-devel] generate assembler with no clear purpose MOV
Date: Tue, 4 Feb 2020, 22:24
From: J. Gareth Moreton

> To hazard a guess, it's sign-extending to the CPU word size as an
> intermediate step. It's not something the peephole optimizer can easily
> eliminate. Do the register allocations give any clues before that
> instruction?

# Var k located in register ecx
# Var bit located in register esi

It does seem to be a sign extension, but if I change the variables "k" and "bit" to dword, there is just a simple movl %ecx,%edx.

The SHLX instruction (as well as SHRX) is treated as if its operands were always memory variables, so the value is first read into a temporary register and written back afterwards. Also, SHL and SHR are logical operations, so there is no need for sign extension.

While those MOV instructions do not hurt much, there is a benefit to resolving this issue: one or two more registers become free for other purposes.
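For readers who want to reproduce the dword observation from the message above, here is a minimal sketch of the same loop with the counter and the mask declared unsigned. The name roo_u and the test value $5 are made up for illustration; whether a given compiler version then emits a plain movl instead of movslq is exactly what the message reports, not something this sketch guarantees.

```pascal
program roo_dword;

{ Same bit-counting loop as roo in the original post, but with an
  unsigned loop counter and mask (hypothetical variant for testing). }
function roo_u(lk: dword): byte;
var
  k, bit: dword;
  num: byte;
begin
  num := 0;
  for k := 0 to 25 do
  begin
    bit := dword(1) shl k;      { logical shift, no sign involved }
    if (lk and bit) <> 0 then
    begin
      lk := lk xor bit;
      inc(num);
    end;
  end;
  roo_u := num;
end;

begin
  { $5 is binary 101, so two of the low 26 bits are set }
  writeln(roo_u($5));
end.
```

Compiling both variants with the same options and -al, then comparing the lines around the shl statement, should show whether the extra extension disappears.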
Re: [fpc-devel] generate assembler with no clear purpose MOV
p.s. I tested execution speed and there is no measurable difference.
[fpc-devel] generate assembler with no clear purpose MOV
hi

example code:

function roo(lk:longint):byte;
var k : longint;
    bit : longint;
    num : byte;
begin
  num:=0;
  for k:=0 to 25 do
  begin
    bit:= longint(1) shl k;
    if (lk and bit) <> 0 then
    begin
      lk:=lk xor bit;
      inc(num);
    end;
  end;
  roo:=num;
end;

begin
end.

asm code

# [109] bit:= longint(1) shl k;
    movslq %ecx,%rdx
# Register r8d allocated
    movl $1,%r8d
# Register edx,edx allocated
    shlx %edx,%r8d,%edx
# Register r8d released
# Register edx allocated
    movl %edx,%esi
# Peephole Optimization: %esi = %edx; changed to minimise pipeline stall (MovXXX2MovXXX)
# Peephole Optimization: Mov2Nop 4 done

What purpose does movslq %ecx,%rdx serve?
movl %edx,%esi seems unnecessary, when this would be enough:

    movl $1,%esi
    shlx %ecx,%esi,%esi
Re: [fpc-devel] inline... and philosophy
> Does that mean in some situations, if you have a small, tight loop, it
> might be better to optimise over speed in some very rare cases? For
> example, turning MOV EAX, $ into OR EAX, $FF to squeeze out a
> few extra bytes, even though the instruction introduces a false dependency.

A latency of 4 clock cycles is a lot. As long as the dependency can be resolved in less time than that, there will be some performance gain.

The performance penalty is not a fixed 20%. It depends on the code that comes before: long-latency instructions have time to catch up with the rest of the code. The penalty can be cancelled out completely by placing a call so that the ret falls into the next 64-byte line. This is a place where tricky optimizations can be done.
Re: [fpc-devel] inline... and philosophy
> On 10/11/2019 at 11:17, Marģers . via fpc-devel wrote:
> >> Most processors have a fairly large uop cache (up to 2048 for the newest
> >> generations iirc), so this would only be for the first iteration? Do you
> >> have a reference (agner fog page or so) or more explanation for this
> >> that describes this?)
> > I have to revoke my statement. Don't have evidence to back up. Code, that
> > lead me to thous conclusions, has been discarded.
> > I have read most whats published in agner's fog page. There nothing to
> > pinpoint as reference.
> No prob. Was just interested, I had to do some sse/avx code the last
> years, and hadn't heard of this.

I did some research. From the manual on Agner Fog's page, "The microarchitecture of Intel, AMD and VIA CPUs", section 20.17 "Cache and memory access":

Level 1 code: 64 kB, 4 way, 256 sets, 64 B line size, per core. Latency 4 clocks.

I also created some performance tests and found that when a loop crosses a 64 B line boundary, it loses 20% performance, while the measurement error was 2%.
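To make "a loop crosses a 64 B line" concrete, the check can be sketched as below. This is only an illustration: CrossesLine and the sample addresses are hypothetical, the 64-byte line size is the figure quoted from the manual, and real code addresses would have to come from a disassembly or map file.

```pascal
program linecross;

const
  LineSize = 64; { L1 code cache line size quoted above }

{ True if the byte range [start, start+len-1] touches more than one
  cache line. Assumes len >= 1. }
function CrossesLine(start, len: qword): boolean;
begin
  CrossesLine := (start div LineSize) <> ((start + len - 1) div LineSize);
end;

begin
  writeln(CrossesLine(0, 64));  { FALSE: bytes 0..63 fill exactly one line }
  writeln(CrossesLine(60, 8));  { TRUE: bytes 60..67 straddle the boundary at 64 }
end.
```

The same figures also explain the "256 sets" in the quoted line: 64 kB divided by (4 ways times 64 B per line) is 256.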
Re: [fpc-devel] inline... and philosophy
> On 2019-11-09 at 02:24, Marģers . via fpc-devel wrote:
> >
> > 3) it changes code location (code cross page boundaries). For my particular
> > cpu there are 64 byte code page. If loop can fit in it, speed is twice as
> > it overlaps even one byte over page boundary. Jumping forward is ok (as
> > expected code flow is always forward). And there is lager page few kb -
> > calling outside - small penalty.
> Most processors have a fairly large uop cache (up to 2048 for the newest
> generations iirc), so this would only be for the first iteration? Do you
> have a reference (agner fog page or so) or more explanation for this
> that describes this?)

I have to retract my statement. I don't have evidence to back it up; the code that led me to those conclusions has been discarded. I have read most of what is published on Agner Fog's page, and there is nothing there I can pinpoint as a reference.
Re: [fpc-devel] inline... and philosophy
> By the way, what is your 'particular CPU'? If it's not Intel-based,

AMD Zen, 1st generation. It is the same x86_64, so not much help for testing on other platforms.
Re: [fpc-devel] inline... and philosophy
By "blobing" I meant an unnecessary increase in size, so that the function loses its good shape. There is no such word as "blobing" in English. My bad.

Let me rephrase 'just wrong' as 'questionably right'.

Currently inlining is left in the hands of programmers, and it is abused as a magical performance booster. For small functions that is most likely true; for larger functions it is questionable:

1) it might increase the index size for accessing local variables on the stack.
2) it might increase the jump instruction size.
3) it changes code location (code crosses page boundaries). On my particular CPU there is a 64-byte code page. If a loop fits in it, it runs twice as fast as when it overlaps the page boundary by even one byte. Jumping forward is ok (as the expected code flow is always forward). There is also a larger page of a few kB; calling outside it carries a small penalty.

As fpc does not manage this in any way, it is pure luck, and one might just get unlucky. Code alignment generally does not solve those things.

Conclusion: one cannot tell by the naked eye whether inlining is any good or not. To inline or not to inline has nothing to do with philosophy; it has to be calculated (as clang does and fpc doesn't).

I'm looking forward to the jump optimization being accepted.
Re: [fpc-devel] inline... and philosophy
> - Identifying functions that are only used once. This became a slight point
> of contention between Florian and myself, because I inlined a couple of
> functions

Inlining every function that is used only once is just wrong. The gain from eliminating the call and the function prologue and epilogue might not be sufficient to outweigh "blobing" the caller function. One of clang's optimizations is to "outline" some parts of larger functions (such as an else branch).
[fpc-devel] multi-line strings
hi,

Java is going to have multi-line strings:
https://www.youtube.com/watch?v=J1YKAFtNz70

I'm posting this because they handle indentation differently from the currently proposed patch. Maybe it is worth considering. If not, that's ok.
Re: [fpc-devel] problem compiling compiler
> make info PP=/home/user/fpc304/lib/fpc/3.0.4/ppcx64
> If the compiler is found, it should be reported as the first line printed after == Configuration info ==

== Configuration info ==
FPC.. /home/user/fpc304/lib/fpc/3.0.4/ppcx64

"make info" shows the correct location of the starting compiler, but during the actual compilation the makefile is unable to locate it.

I can successfully compile 3.0.4, but I am unable to compile 3.3.1 (a few months ago I was able to do so).
[fpc-devel] problem compiling compiler
Hi,

I used a simple script for compiling the compiler:

export PP=/home/user/fpc304/lib/fpc/3.0.4/ppcx64
make singlezipinstall OS_TARGET=linux CPU_TARGET=x86_64 OPT=" -Fu/home/user/fpc304/lib/fpc/3.0.4/units/x86_64-linux/rtl/"

The reason for "export PP=" is that I don't have fpc installed, so I have to specify the path where it can be found. The script worked fine some time ago, but now it cannot find the compiler:

Makefile:135: *** Compiler /home/user/where/is/my/copy/of/fpc/compiler/ppcx64 not found. Stop.

Can I improve my script so that it actually works, or is it some sort of makefile bug?
Re: [fpc-devel] Some thoughts on multi-line string support, and a possible syntax that I think is perfectly clean and Pascal-ish.
- Reply to message -

> On 7/6/19 4:50 PM, wkitt...@windstream.net wrote:
> > writeln("MultiLine1= '",MultiLine1,"'");
> > writeln("MultiLine2= '",MultiLine2,"'");
> (* i forgot to do the line for MultiLine3 *)
> writeln("MultiLine3= '",MultiLine3,"'");
> (* that's what happens when you write directly
> in the email editor without testing *)

Well, the trunk I use still has no " string quote character. Maybe it's time to add that too, while Ben Grasset is implementing multi-line strings?

Just kidding...
Re: [fpc-devel] Some thoughts on multi-line string support, and a possible syntax that I think is perfectly clean and Pascal-ish.
- Reply to message -
Subject: Re: [fpc-devel] Some thoughts on multi-line string support, and a possible syntax that I think is perfectly clean and Pascal-ish.
Date: Wed, 3 Jul, 23:20
From: Ben Grasset
To: FPC developers' list

> program Example;
> (*
>   This is a perfectly
>   normal multi-line
>   Pascal comment.
> *)
> const SA = `
>   This is a multiline
>   string using hypothetical backticks.
>   Imagine it was fully syntax-highlighted
>   like normal strings and the comment
>   above are.
> `;
> begin
> end.

Why introduce ` if there already is ' ? Just use ' for multi-line strings as well. For people with a more conservative viewpoint, put multi-line strings behind a mode switch.
Re: [fpc-devel] XML node dump feature
- Reply to message -
Subject: Re: [fpc-devel] XML node dump feature
Date: Tue, 25 Jun, 03:16
From: Ben Grasset
To: FPC developers' list

> const
>   A: TVec3F = (X: 1.2; Y: 2.4; Z: 3.8);
>   B: TVec3F = (X: 2.1; Y: 4.2; Z: 8.3);
>   // You can't do the next part currently, obviously
>   C: TVec3F = A + B;
>   D: TVec3F = A - B;
>   E: TVec3F = A * B;
>   F: TVec3F = A / B;

Sorry to say, but this should not work even with a *pure* function. Typed constants are not truly constants.
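The remark that typed constants are not truly constants can be seen in a small sketch. With writeable typed constants enabled ({$J+}, spelled out here so the example does not depend on the mode default), a typed constant behaves like an initialized variable. The name Counter is made up for the illustration.

```pascal
program typedconst;
{$J+} { writeable typed constants, stated explicitly for the example }

const
  Limit = 10;             { true constant: folded at compile time }
  Counter: longint = 0;   { typed constant: storage with an initial value }

begin
  Counter := Counter + 1; { legal under $J+; "Limit := 11" would not compile }
  writeln(Counter);       { prints 1 }
  writeln(Limit);         { prints 10 }
end.
```

Because Counter can change at run time, the compiler cannot in general treat A + B on typed constants as a compile-time value, which seems to be the point of the reply above.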
Re: [fpc-devel] Possible idea... "safe" subroutines/methods
> As mentioned in a previous message, fpc trunk supports a volatile
> intrinsic:
> http://wiki.freepascal.org/FPC_New_Features_Trunk#Support_for_.22volatile.22_intrinsic
>
> Jonas

My bad, I didn't know about the volatile intrinsic. So, does it mean that the compiler is allowed to optimize any variable, even promote global variables to registers, as long as they are not wrapped in volatile(...)?
Re: [fpc-devel] Possible idea... "safe" subroutines/methods
As I understand it, this idea is another way around the keyword "volatile", in order to allow the compiler to perform more optimizations. Then why go halfway by introducing "safe", when it would be better to introduce "volatile"? Not too long ago there was a discussion about it here, but it was strongly rejected by the core team.

I do not support the idea of "safe", as it is a partial solution to a more complex problem. It is better to introduce "volatile" together with data flow analysis.
Re: [fpc-devel] Successful implementation of inline support for pure assembler routines on x86
- Reply to message -
Subject: Re: [fpc-devel] Successful implementation of inline support for pure assembler routines on x86
Date: 17 March 2019, 19:38:03
From: Florian Klämpfl

> On 15.03.19 at 11:32, J. Gareth Moreton wrote:
> * using inline assembler is always the worst solution to do something,
> it normally means either:
> - the compiler misses a certain feature

The question is: who is going to implement every imaginable feature in a reasonable time frame? For example, intrinsics for BMI.

> - the compiler is generating bad code

The generated code for the x86_64 instruction set is pretty good if the Pascal code is tuned to take advantage of the optimizations the compiler currently does (the LLVM target does not do any better). But assembler still gives a 10-20% boost, so there is still room for improvement.

> * intrinsics provide a much better way to achieve exactly what this
> approach aims at:
> - they enable the use of all registers as the compiler does register
> allocation

Sadly, this does not apply to bsf.

> Is there any advantage over intrinsics, I missed so far?

Taking advantage of the flags changed by an instruction.

I am looking forward to the inline assembler functionality being accepted. I would benefit from it.
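As an illustration of the point about flags, here is a minimal sketch using the RTL intrinsic BsfDWord. The BSF instruction itself sets the zero flag when its source operand is zero, but through the intrinsic that information is not visible, so the zero case needs a separate comparison in Pascal. The value $28 is an arbitrary example; how the compiler actually schedules the test and the bsf is not claimed here.

```pascal
program bsf_demo;

var
  v: dword;

begin
  v := $28; { binary 101000: lowest set bit is bit 3 }

  { The intrinsic returns only the bit index, so the "was it zero?"
    question is asked separately instead of reusing the flag that
    the BSF instruction already produced. }
  if v <> 0 then
    writeln(BsfDWord(v)) { prints 3 }
  else
    writeln('no bit set');
end.
```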
Re: [fpc-devel] Successful implementation of inline support for pure assembler routines on x86
- Reply to message -
Subject: Re: [fpc-devel] Successful implementation of inline support for pure assembler routines on x86
Date: 18 March 2019, 00:28:10
From: J. Gareth Moreton
To: FPC developers' list

> To use the integer clamp function as an example (if x < 0 then x := 0):
> { Microsoft x64 calling convention... X is in ECX }
> function ClampInt(X: LongInt): LongInt; assembler; nostackframe; inline;
> asm
>   XOR EAX, EAX
>   TEST ECX, ECX
>   CMOVG EAX, ECX
> end;

Try this code:

y:=0;
if x < 0 then x:=y;
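The two-line suggestion above can be wrapped into a complete function for testing. This is a sketch: the wrapper and the test values are additions, and whether the compiler turns the branch into a CMOV depends on the version, target CPU settings and optimization level, so the generated assembler (-al) should be checked rather than assumed.

```pascal
program clamp_demo;

{ Pure Pascal counterpart of the quoted assembler ClampInt:
  negative inputs become 0, everything else passes through. }
function ClampInt(x: longint): longint; inline;
var
  y: longint;
begin
  y := 0;
  if x < 0 then
    x := y;
  ClampInt := x;
end;

begin
  writeln(ClampInt(-5)); { prints 0 }
  writeln(ClampInt(7));  { prints 7 }
end.
```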
Re: [fpc-devel] x86-64: MOVZX peephole optimisations
> I'm a tad confused in regards to the best course of action regarding MOVZX. Many of the peephole optimisations seek to change them to MOV followed by AND (e.g. "movzbl (mem), %eax" to "mov (mem), %al; and $0xff, %eax"). Does MOVZX have a well-documented performance penalty in modern processors that favours the MOV/AND combination? It seems odd because the combination implies a pipeline stall, which becomes more pronounced if the MOV instruction is reading from memory.

For the Intel Pentium and earlier processors the MOV, AND combination was better, but nowadays CPUs handle MOVZX as well as MOV. The question is just which CPU to optimize for.
Re: [fpc-devel] I'll be straight
fillchar is a useful function, but it should be avoided in time-critical code at any cost. The only reasonable way to improve fillchar is to make it an internal function, so that fpc can decide, depending on the parameters, what the best solution is for filling memory with the specified value. But that would also slow down compilation. I am not sure the fpc core team will accept that kind of improvement.

Code optimized for speed most likely becomes less readable and less maintainable. That is the misfortune of optimization.

- Reply to message -
Subject: Re: [fpc-devel] I'll be straight
Date: 9 February 2019, 10:35:51
From: J. Gareth Moreton
To: FPC developers' list

> Thanks Michael,
> In the last patch that Jonas almost immediately closed, the speed savings were inconclusive because the number of cycles saved is probably only a few dozen, but I would argue it makes the code a bit more reasonable too because it replaces things like loc[low(loc)] with loc[0] and fillchar with a for-loop that initialises each element of an array to zero (it's slightly faster because the element size is a multiple of 16, while fillchar is general-purpose and spends a lot of time jumping around and even performing a multiplication before it starts filling things up). I guess seeing it marked as "won't fix" within 20 minutes was a moment of horror.
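The two ways of zeroing an array that the quoted message compares can be put side by side in a small sketch. The record TElem and the array size are hypothetical stand-ins (chosen so that the element size is 16 bytes, as in the quote); no timing claim is made here, only the shape of the code.

```pascal
program fill_demo;

type
  TElem = record
    a, b: int64; { 2 x 8 bytes = 16-byte element }
  end;

var
  loc: array[0..7] of TElem;
  i: longint;

begin
  { General-purpose version: one call, size computed from the array. }
  FillChar(loc, SizeOf(loc), 0);

  { Explicit loop: each element is cleared field by field, which the
    compiler can turn into fixed-size stores. }
  for i := low(loc) to high(loc) do
  begin
    loc[i].a := 0;
    loc[i].b := 0;
  end;

  writeln(SizeOf(TElem)); { prints 16 }
end.
```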
Re: [fpc-devel] x86_64 Optimizer Overhaul
- Reply to message -
Subject: Re: [fpc-devel] x86_64 Optimizer Overhaul
Date: 12 December 2018, 17:02:02
From: J. Gareth Moreton
To: FPC developers' list

> By the way, what generates that set of
> operations? I'm curious because I want to
> see what's going on in the compiler. You
> see, "incq" and that "mov, add, mov" set
> aren't equivalent; anything over
> $1 gets truncated with the set,
> but not with "incq", although it's not a
> concern if only the lower 32 bits are
> used.

I have to agree, it's not equivalent. I added an example program for you to examine this situation. It might or might not be an error.
Note: I use the compiler parameter -O4.

> If both combinations run at about the same
> speed, then "incq" is better just on
> account of code size.

I spent some time examining "incq mem" and "mov add mov".
On my particular CPU, if "incq" is an independent instruction, the actual performance is 1 clock cycle. The "mov add mov" combination ended up at about 1 - 1.2 clock cycles. A chain of "mov add mov" always took a few clocks more than a chain of "incq" of the same length. But if "incq" falls into a severe dependency chain, then "incq" executes 25% worse than "mov add mov":

"incq"         4.5 clock cycles
"mov add mov"  3.8 clock cycles

I vote for shorter code and prefer "incq".

margers

program overhaul_incq;

var globalQ : longint;

function dummycall(a,b: longint):longint;
begin
  dummycall:=a+b;
end;

procedure fuu;
var k : longint;            { rbx for loop counter }
    a,b,c,m,z,q : longint;  { no real use, just to occupy r12-r15 }
    sk : longint;           { no free real registers - so to be temp on stack }
begin
  sk:=0;
  q:=0;
  a:=0;
  for k:=0 to 100 do { k takes rbx }
  begin
    { dummy math to keep busy registers r12 - r15 }
    c:=q+a;
    m:=k+1;
    { call discards r8 - r11, rax, rdx, rcx, rdi, rsi - no use of them }
    z:=dummycall(k,c);
    q:=c+z;
    { as fpc don't use rbp for variable, }
    { we don't have left any usable register }
    { incq [mem] }
    inc(sk);
    {writeln(k,' ',q);}
  end;
  globalQ:=q;
end;

begin
  fuu;
  writeln(globalQ);
end.
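The non-equivalence agreed on in this message can be modelled in plain Pascal. This is a sketch of the arithmetic only, not of the generated code: the stack slot is modelled as a qword, and the starting value $FFFFFFFF is an assumed example (the value in the quote is truncated in the archive and is left as it is).

```pascal
program incq_vs_movaddmov;

var
  slot, viaInc, viaSeq: qword;
  lo: dword;

begin
  slot := $FFFFFFFF; { assumed example value: low 32 bits all set }

  { incq 272(%rsp): one 64-bit increment, the carry reaches bit 32 }
  viaInc := slot + 1;

  { movl / addl / movq: load 32 bits, add with 32-bit wrap-around,
    then store the zero-extended register as 64 bits }
  lo := dword(slot and $FFFFFFFF);
  lo := dword((qword(lo) + 1) and $FFFFFFFF);
  viaSeq := lo;

  writeln(viaInc = viaSeq);                    { FALSE: the full 64-bit slots differ }
  writeln(dword(viaInc and $FFFFFFFF) = lo);   { TRUE: the low 32 bits agree }
end.
```

This matches the remark in the quote that the difference is not a concern as long as only the lower 32 bits are used, which is the case for a longint such as sk.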
Re: [fpc-devel] x86_64 Optimizer Overhaul
> Nice spot with the "incq" command there. It wasn't intentional for that to be split into 3 commands, but is likely just a side-effect of pass 1 not being run twice now... granted, since one of my criteria was that the code should not be less optimal, I'll see if I can watch out for that one.

Both versions are more or less equivalent in execution speed.

> One interesting thing to note though is that the read and add work on the 32-bit register, but then the full 64-bit register is written.

Local variables are meant to be allocated in registers, but because the procedure calls other procedures, they are stored "temporarily" on the stack as 64-bit registers. It is not an error, or at least not an error for the program logic in this case.
Re: [fpc-devel] x86_64 Optimizer Overhaul
- Reply to message -
Subject: Re: [fpc-devel] x86_64 Optimizer Overhaul
Date: 6 December 2018, 18:57:29
From: J. Gareth Moreton
To: FPC developers' list

> I believed I've fixed the bug. Thanks for your help.

Now it's much better: -O3 and -O4 work fine.
The speed test for my programs shows no measurable difference.

# [468] inc(sk);
-- trunk --
    incq 272(%rsp)
-- overhaul --
    movl 272(%rsp),%eax
    addl $1,%eax
    movq %rax,272(%rsp)

Did you mean it to be so?

margers
[fpc-devel] x86_64 Optimizer Overhaul
I ran it on Linux.

The problematic code:

type
  PLongData = ^TLongData;
  TLongData = array [0..100] of longint;

function binarySearchLong(sortedArray:PLongData; nLen, toFind:longint):longint;
var low, high, mid, l, h, m : longint;
begin
  { Returns index of toFind in sortedArray, or -1 if not found }
  low := 0;
  high := nLen - 1;
  l := sortedArray^[low];
  h := sortedArray^[high];
  while ((l <= toFind) and (h >= toFind)) do
  begin
    mid := (low + high) shr 1;
    { var "low" in register r8d }
    m := sortedArray^[mid];
    if (m < toFind) then
    begin
      low := mid + 1;
      l := sortedArray^[low];
      { asm code generated
        -- with trunk
        lea r8d, [r11d+1H]
        mov esi, r8d
        -- end trunk
        -- with overhaul it never sets r8d to the new value, but should
        lea esi, [r11d+1H]
        -- end overhaul
        mov r10d, dword [rdi+rsi*4]
        jmp ?_00144
      }
    end
    else if (m > toFind) then
    begin
      high := mid - 1;
      h := sortedArray^[high];
    end
    else
    begin
      binarySearchLong:=mid;
      exit;
    end;
  end;
  if (sortedArray^[low] = toFind) then
  begin
    binarySearchLong:=low;
  end
  else
    binarySearchLong := -1; { Not found }
end;

- Reply to message -
Subject: Re: [fpc-devel] x86_64 Optimizer Overhaul
Date: 2 December 2018, 23:32:36
From: J. Gareth Moreton
To: FPC developers' list

> Thanks for the feedback. Do you have a reproducible case, and does it fail on Linux or Windows? I'll have a look for the infinite loops in the meantime.
>
> Gareth aka. Kit
[fpc-devel] x86_64 Optimizer Overhaul
> I've had problems testing it under Linux due to configuration difficulties, so if anyone is willing to try out "make all", I'll be most grateful.

"make all" works well on Linux.

The compiler options -O3 and -O4 are broken. It was possible to compile my program, but at some point the program went into a never-ending loop: 100% CPU usage and no response.

Compiling my speed test program with -O2, the optimizations made by the Overhaul resulted in a 2% speed loss compared to current trunk. I guess the optimizations are good for the compiler itself, but not so much for user programs.

margers
[fpc-devel] LLVM code generator
> The support is currently only on the
> https://svn.freepascal.org/FPC/svn/fpc/branches/debug_eh branch.

I got the sources from
https://svn.freepascal.org/svn/fpc/branches/debug_eh

> ** Linux: you may also have to specify the library path to libgcc_s.
> E.g. on Ubuntu 16.04:
> make LOCALOPT="-dllvm -Fullvm -Fl/usr/lib/gcc/x86_64-linux-gnu/5"
> OPT="-Fullvm -Fl/usr/lib/gcc/x86_64-linux-gnu/5" all -j 4 FPMAKEOPT="-T
> 4"

Compilation does not work for me with those options. I keep getting the following error:

Fatal: Cannot open whole program optimization feedback file "/home/blabla/src/llvm/compiler/pp1.wpo"
Fatal: Compilation aborted
Makefile:3912: recipe for target 'system.ppu' failed

The problem is LOCALOPT. As soon as it is passed as a parameter to make, the wpo files are not created.

margers