Re: [fpc-devel] Textmode ide, patch

2024-05-31 Thread Marģers . via fpc-devel

 
 
   > 5) make Ctrl+Enter shortcut key possible in unix/linux xterm: ctrl_enter_xterm_unix_keyboard.diff 
   
   I found out that this patch also eliminates the Ctrl+J shortcut key and turns it into Ctrl+Enter. 
   But that is what already happens with Ctrl+I, Ctrl+M, Ctrl+[, Ctrl+H... 
   Further investigation led me to the conclusion that there is no truly fully working keyboard for unix/linux unless all "Ctrl+Key" combinations are hooked to escape strings.  That is possible only with particular configuration settings, which no one has. 
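
   As a small illustration of why these keys collide: a terminal traditionally sends Ctrl+<key> as the ASCII code of the key with the upper bits cleared, so Ctrl+J and a Ctrl+Enter that is mapped to string(0x0a) both arrive as the same byte. The program below is only a sketch; CtrlCode is a hypothetical helper, not something from the patch.

```pascal
program CtrlCodes;
{$mode objfpc}

const
  Keys = 'IJM[H';   { the Ctrl combinations mentioned in the mail }

{ Hypothetical helper: the byte a terminal traditionally sends for Ctrl+Key. }
function CtrlCode(Key: Char): Byte;
begin
  Result := Ord(UpCase(Key)) and $1F;
end;

var
  i: Integer;
begin
  { Ctrl+I is Tab (9), Ctrl+J is line feed (10), Ctrl+M is carriage return (13),
    Ctrl+[ is Esc (27), Ctrl+H is backspace (8). }
  for i := 1 to Length(Keys) do
    WriteLn('Ctrl+', Keys[i], ' = ', CtrlCode(Keys[i]));
end.
```

   Since the byte alone carries no information about which physical key was pressed, the only way to tell them apart is to make the terminal send a distinct escape string for each combination.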
     
 
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


[fpc-devel] Textmode ide, patch

2024-05-29 Thread Marģers . via fpc-devel

 
 
  Improvements for the textmode IDE
   1) The current IDE (of fpc 3.3.1) has problems with a second compilation. This patch resolves the issue: tdef_nil_for_ide_fpc331.patch
   2) Add missing xterm escape string sequences: keyboard_add_escape_keys_unix_fpc331.patch
   3) fpc 3.2.3: add missing xterm escape string sequences: keyboard_add_escape_keys_unix_fixes32.patch
   Also make sure the shift state of escaped keys is transferred to TKeyEvent.
   4) kb_scancode_updates.patch is safe to apply after keyboard_add_escape_keys_unix_xxx.patch
   Some scancodes have been bogus for a good reason. I suggest fixing some of them and being more consistent across all platforms.
   The patched file keyscan.inc is in active use only by unix/keyboard.pp
   The Drivers unit in FV has internal fix-ups, so it is fine to make those changes.
   5) Make the Ctrl+Enter shortcut key possible in unix/linux xterm: ctrl_enter_xterm_unix_keyboard.diff
   The patch is as good for fpc 3.3.1 as for fixes 3.2
   Apply the patch and make an entry in the xterm configuration file .Xresources
   
   xterm*VT100.Translations: #override \
       Ctrl <Key>Return: string(0x0a)
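
   To picture the shift-state change mentioned under 3), here is a stand-alone sketch of a key ring buffer whose entries carry a shift byte next to the character, modelled on the PushKey/PopKey/PushExt changes in the keyboard.pp diff. The first two statements of PushKey and the DemoShift value are assumptions of mine, since the diff context does not show them.

```pascal
program KeyRing;
{$mode objfpc}

const
  KeyBufferSize = 20;
  DemoShift = 1;     { illustrative shift-state value, an assumption }

type
  TKeyEntry = record
    ch: Char;
    sh: Byte;
  end;

var
  KeyBuffer: array[0..KeyBufferSize - 1] of TKeyEntry;
  KeyPut: LongInt = 0;
  KeySend: LongInt = 0;

procedure PushKey(Ch: Char; aShift: Byte);
var
  Tmp: LongInt;
begin
  Tmp := KeyPut;               { assumed: remember the slot, then advance }
  Inc(KeyPut);
  if KeyPut >= KeyBufferSize then
    KeyPut := 0;
  if KeyPut <> KeySend then
  begin
    KeyBuffer[Tmp].ch := Ch;
    KeyBuffer[Tmp].sh := aShift;
  end
  else
    KeyPut := Tmp;             { buffer full: the key is dropped }
end;

function PopKey(var aShift: Byte): Char;
begin
  aShift := 0;
  Result := #0;
  if KeyPut <> KeySend then
  begin
    Result := KeyBuffer[KeySend].ch;
    aShift := KeyBuffer[KeySend].sh;
    Inc(KeySend);
    if KeySend >= KeyBufferSize then
      KeySend := 0;
  end;
end;

{ An extended key is a #0 prefix carrying the shift state, then the scancode. }
procedure PushExt(b: Byte; sh: Byte);
begin
  PushKey(#0, sh);
  PushKey(Chr(b), 0);
end;

var
  c: Char;
  s: Byte;
begin
  PushExt($52, DemoShift);     { $52 is kbIns in keyscan.inc }
  c := PopKey(s);
  WriteLn(Ord(c), ' ', s);     { prefix byte and its shift state }
  c := PopKey(s);
  WriteLn(Ord(c), ' ', s);     { scancode, shift state 0 }
end.
```

   With the shift state stored on the #0 prefix, a scancode such as $52 can stand for both Ins and Shift+Ins, which is what the keyscan.inc changes rely on.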
   
    
 
diff -ru a/packages/rtl-console/src/inc/keyscan.inc b/packages/rtl-console/src/inc/keyscan.inc
--- a/packages/rtl-console/src/inc/keyscan.inc	2024-05-09 00:13:05.0 +
+++ b/packages/rtl-console/src/inc/keyscan.inc	2024-05-28 07:03:45.682436000 +
@@ -4,12 +4,8 @@
kbAltEsc  = $01;  {Alt+Esc = scancode 01, ascii code 0.}
kbEsc = $01;  {Esc = scancode 01, ascii code 27.}
kbAltSpace= $02;
-   kbCtrlIns = $04;
-   kbShiftIns= $05;
-   kbCtrlDel = $06;
-   kbShiftDel= $07;
-   kbAltBack = $08;
kbAltShiftBack= $09;
+   kbAltBack = $0E;
kbShiftTab= $0F;
kbAltQ= $10;
kbAltW= $11;
@@ -68,7 +64,9 @@
kbDown= $50;
kbPgDn= $51;
kbIns = $52;
+   kbShiftIns= $52;  { Differs from kbIns only by shift state }
kbDel = $53;
+   kbShiftDel= $53;  { Differs from kbDel only by shift state }
kbShiftF1 = $54;
kbShiftF2 = $55;
kbShiftF3 = $56;
@@ -131,6 +129,8 @@
kbCtrlCenter  = $8F;
kbCtrlGreyPlus= $90;
kbCtrlDown= $91;
+   kbCtrlIns = $92;
+   kbCtrlDel = $93;
kbCtrlTab = $94;
kbAltHome = $97;
kbAltUp   = $98;
diff -ru a/packages/rtl-console/src/unix/keyboard.pp b/packages/rtl-console/src/unix/keyboard.pp
--- a/packages/rtl-console/src/unix/keyboard.pp	2023-01-19 15:42:50.438798000 +
+++ b/packages/rtl-console/src/unix/keyboard.pp	2024-05-27 23:54:54.113784131 +
@@ -39,6 +39,7 @@
 char : byte;
 ScanValue : byte;
 CharValue : byte;
+ShiftValue : byte;
 SpecialHandler : Tprocedure;
   end;
 
@@ -76,7 +77,7 @@
 const
   KeyBufferSize = 20;
 var
-  KeyBuffer : Array[0..KeyBufferSize-1] of Char;
+  KeyBuffer : Array[0..KeyBufferSize-1] of record ch: Char; sh :byte end;
   KeyPut,
   KeySend   : longint;
 
@@ -92,18 +93,18 @@
 {$i keyscan.inc}
 
 {Some internal only scancodes}
-const KbShiftUp= $f0;
-  KbShiftLeft  = $f1;
-  KbShiftRight = $f2;
-  KbShiftDown  = $f3;
-  KbShiftHome  = $f4;
-  KbShiftEnd   = $f5;
-  KbCtrlShiftUp= $f6;
-  KbCtrlShiftDown  = $f7;
-  KbCtrlShiftRight = $f8;
-  KbCtrlShiftLeft  = $f9;
-  KbCtrlShiftHome  = $fa;
-  KbCtrlShiftEnd   = $fb;
+const kbAltCenter = kbCenter;
+  kbShiftCenter = kbCenter;
+
+  KbShiftUp= KbUp;
+  KbShiftLeft  = KbLeft;
+  KbShiftRight = KbRight;
+  KbShiftDown  = KbDown;
+  KbShiftHome  = KbHome;
+  KbShiftEnd   = kbEnd;
+  kbShiftPgUp = kbPgUp;
+  kbShiftPgDn = kbPgDn;
+
 
   double_esc_hack_enabled : boolean = false;
 
@@ -412,7 +413,7 @@
 InTail:=0;
 end;
 
-procedure PushKey(Ch:char);
+procedure PushKey(Ch:char;aShift:byte);
 var
   Tmp : Longint;
 begin
@@ -421,17 +422,22 @@
   If KeyPut>=KeyBufferSize Then
KeyPut:=0;
   If KeyPut<>KeySend Then
-   KeyBuffer[Tmp]:=Ch
+  begin
+   KeyBuffer[Tmp].ch:=Ch;
+   KeyBuffer[Tmp].sh:=aShift;
+  end
   Else
KeyPut:=Tmp;
 End;
 
 
-function PopKey:char;
+function PopKey(var aShift:byte):char;
 begin
+  aShift:=0;
   If KeyPut<>KeySend Then
begin
- PopKey:=KeyBuffer[KeySend];
+ PopKey:=KeyBuffer[KeySend].ch;
+ aShift:=KeyBuffer[KeySend].sh;
  Inc(KeySend);
  If KeySend>=KeyBufferSize Then
   KeySend:=0;
@@ -441,10 +447,10 @@
 End;
 
 
-procedure PushExt(b:byte);
+procedure PushExt(b:byte;sh:byte);
 begin
-  PushKey(#0);
-  PushKey(chr(b));
+  PushKey(#0,sh);
+  PushKey(chr(b),0);
 end;
 
 
@@ -742,7 +748,7 @@
 Pa^.Child:=newPtree;
 end;
 
-function DoAddSequence(const St : String; AChar,AScan :byte) : PTreeElement;
+function DoAddSequence(const St : String; AChar,AScan, AShift :byte) : PTreeElement;
 var
   CurPTree,NPT : PTreeElement;
   c : byte;
@@ -803,9 +809,12 @@
 Writeln(system.stderr,'Scan was ',ScanValue,' now ',AScan);

[fpc-devel] x86 assembler improvements, patch

2024-05-29 Thread Marģers . via fpc-devel

 
 
  Some compiler x86 assembler improvements
   
   1) Patch for fpc 3.3.1 (attachment: mkx86ins_version_bump.patch)
   compiler/utils/mkx86ins.pp
   Version bumped from 1.6.1 to 1.6.2
   There have been changes to the code, so the version has to reflect that.
   
   2) Patch to enable the ENTER asm instruction (attachment: enable_asm_instr_enter.patch)
   Same for fpc 3.3.1 and fixes 3.2
   
   3) Patch for fpc 3.3.1 compiler/x86/x86ins.dat (attachment: x86ins_4_fpc331.patch)
   3.1)
   Rename a 3DNow! instruction (fixes a long-standing typo in the mnemonic).
   PMULHRWA  --> PMULHRW
   3.2)
   Add vpclmullqlqdq, vpclmulhqlqdq, vpclmullqhqdq, vpclmulhqhqdq.
   3.3)
   Fix "typo" for SHA1MSG2
   
   
   4) Patch adding asm instructions to fixes 3.2 (attachment: x86ins_4_fixes32.patch)
   Adds the missing instructions of BMI1, BMI2, ADX, CLMUL, SHA, XSAVE, MOVBE
   No "code" changes, only x86ins.dat and the files generated with mkx86ins
   Some instructions deliberately have wrong tags so that nothing besides x86ins.dat needs to change.
   
   
   5) Proof-of-concept patch: backport of asm instructions to fpc 3.0.4 (attachment: x86ins_4_fpc304.patch)
   Adds the missing instructions of BMI1, BMI2, ADX, CLMUL, SHA, XSAVE, MOVBE, RAND
   No "code" changes, except const maxinfolen = 8; changed to maxinfolen = 9;
   x86ins.dat and the files generated with mkx86ins
   I did this to make the argument that it is safe to add asm instructions to fpc 3.2.3
   The engine that supports those instructions has been in production for a while now.
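
   For readers unfamiliar with the four pseudo-ops added under 3.2): they are shorthand for (v)pclmulqdq with a fixed immediate ($00, $01, $10, $11), where bit 0 of the immediate picks the low or high qword of the first source and bit 4 that of the second. The following is a software sketch of that behaviour only; TU128, Clmul64 and Pclmulqdq are names I made up for the illustration.

```pascal
program ClmulSketch;
{$mode objfpc}
{$Q-}{$R-}

type
  TU128 = record
    Lo, Hi: QWord;
  end;

const
  { lqlq, hqlq, lqhq, hqhq in that order }
  Imm: array[0..3] of Byte = ($00, $01, $10, $11);

{ Carry-less multiplication of two 64-bit values (XOR instead of ADD). }
function Clmul64(a, b: QWord): TU128;
var
  i: Integer;
begin
  Result.Lo := 0;
  Result.Hi := 0;
  for i := 0 to 63 do
    if ((b shr i) and 1) <> 0 then
    begin
      Result.Lo := Result.Lo xor (a shl i);
      if i > 0 then
        Result.Hi := Result.Hi xor (a shr (64 - i));
    end;
end;

{ imm8 bit 0 selects the qword of x, imm8 bit 4 selects the qword of y. }
function Pclmulqdq(const x, y: TU128; imm8: Byte): TU128;
var
  a, b: QWord;
begin
  if (imm8 and $01) <> 0 then a := x.Hi else a := x.Lo;
  if (imm8 and $10) <> 0 then b := y.Hi else b := y.Lo;
  Result := Clmul64(a, b);
end;

var
  x, y, r: TU128;
  k: Integer;
begin
  x.Lo := 3; x.Hi := 5;
  y.Lo := 3; y.Hi := 7;
  for k := 0 to 3 do
  begin
    r := Pclmulqdq(x, y, Imm[k]);
    WriteLn('imm8=', Imm[k], ' -> ', r.Lo);
  end;
end.
```

   With these inputs the four selections give 5, 15, 9 and 27, for example 3 clmul 3 = 5 because (x+1)(x+1) = x^2+1 over GF(2).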
   
    
 
diff -ru a/compiler/x86/aasmcpu.pas b/compiler/x86/aasmcpu.pas
--- a/compiler/x86/aasmcpu.pas	2024-05-09 00:13:05.0 +
+++ b/compiler/x86/aasmcpu.pas	2024-05-16 22:49:29.290239056 +
@@ -1664,8 +1664,9 @@
   else
   begin
 { allow 2nd, 3rd or 4th operand being a constant and expect no size for shuf* etc. }
-{ further, allow AAD and AAM with imm. operand }
+{ further, allow ENTER, AAD and AAM with imm. operand }
 if (opsize=S_NO) and not((i in [1,2,3])
+  or ((i=0) and (opcode in [A_ENTER]))
 {$ifndef x86_64}
   or ((i=0) and (opcode in [A_AAD,A_AAM]))
 {$endif x86_64}
diff -ru u/compiler/i386/i386att.inc v/compiler/i386/i386att.inc
--- u/compiler/i386/i386att.inc	2021-07-18 13:32:23.955521000 +
+++ v/compiler/i386/i386att.inc	2024-05-16 15:20:21.135532388 +
@@ -254,7 +254,7 @@
 'pmaddwd',
 'pmagw',
 'pmulhriw',
-'pmulhrwa',
+'pmulhrw',
 'pmulhrwc',
 'pmulhw',
 'pmullw',
@@ -797,6 +797,10 @@
 'vpblendvb',
 'vpblendw',
 'vpclmulqdq',
+'vpclmullqlqdq',
+'vpclmulhqlqdq',
+'vpclmullqhqdq',
+'vpclmulhqhqdq',
 'vpcmpeqb',
 'vpcmpeqd',
 'vpcmpeqq',
@@ -931,11 +935,26 @@
 'vzeroupper',
 'andn',
 'bextr',
+'blsi',
+'blsmsk',
+'blsr',
 'tzcnt',
+'bzhi',
+'mulx',
+'pdep',
+'pext',
 'rorx',
 'sarx',
 'shlx',
 'shrx',
+'movbe',
+'pclmulqdq',
+'pclmullqlqdq',
+'pclmulhqlqdq',
+'pclmullqhqdq',
+'pclmulhqhqdq',
+'adcx',
+'adox',
 'vbroadcasti128',
 'vextracti128',
 'vinserti128',
@@ -1016,5 +1035,23 @@
 'vfnmsub231sd',
 'vfnmsub132ss',
 'vfnmsub213ss',
-'vfnmsub231ss'
+'vfnmsub231ss',
+'rdrand',
+'rdseed',
+'xgetbv',
+'xsetbv',
+'xsave',
+'xsave64',
+'xrstor',
+'xrstor64',
+'xsaveopt',
+'xsaveopt64',
+'prefetchwt1',
+'sha1rnds4',
+'sha1nexte',
+'sha1msg1',
+'sha1msg2',
+'sha256rnds2',
+'sha256msg1',
+'sha256msg2'
 );
diff -ru u/compiler/i386/i386atts.inc v/compiler/i386/i386atts.inc
--- u/compiler/i386/i386atts.inc	2021-07-18 13:32:23.951521000 +
+++ v/compiler/i386/i386atts.inc	2024-05-16 15:20:21.135532388 +
@@ -947,11 +947,14 @@
 attsufNONE,
 attsufNONE,
 attsufNONE,
+attsufINT,
 attsufNONE,
 attsufNONE,
 attsufNONE,
 attsufNONE,
 attsufNONE,
+attsufINT,
+attsufINT,
 attsufNONE,
 attsufNONE,
 attsufNONE,
@@ -1013,6 +1016,40 @@
 attsufNONE,
 attsufNONE,
 attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufNONE,
+attsufINT,
+attsufNONE,
+attsufNONE,
+attsufNONE,
 attsufNONE,
 attsufNONE,
 attsufNONE,
diff -ru u/compiler/i386/i386int.inc v/compiler/i386/i386int.inc
--- u/compiler/i386/i386int.inc	2021-07-18 13:32:23.955521000 +
+++ v/compiler/i386/i386int.inc	2024-05-16 15:20:21.135532388 +
@@ -254,7 +254,7 @@
 'pmaddwd',
 'pmagw',
 'pmulhriw',
-'pmulhrwa',
+'pmulhrw',
 'pmulhrwc',
 'pmulhw',
 'pmullw',
@@ -797,6 +797,10 @@
 'vpblendvb',
 'vpblendw',
 'vpclmulqdq',
+'vpclmullqlqdq',
+'vpclmulhqlqdq',
+'vpclmullqhqdq',
+'vpclmulhqhqdq',
 'vpcmpeqb',
 'vpcmpeqd',
 'vpcmpeqq',
@@ -931,11 +935,26 @@
 'vzeroupper',
 'andn',
 'bextr',
+'blsi',
+'blsmsk',
+'blsr',
 'tzcnt',
+'bzhi',
+'mulx',
+'pdep',
+'pext',
 'rorx',
 'sarx',
 'shlx',
 

Re: [fpc-devel] download or compile documentation

2024-05-09 Thread Marģers . via fpc-devel

>> Is there a way to download the documentation in a human-readable format?
>
>Please explain ?

html, pdf, chm format...

 


The documentation is available as PDF. That's human readable ?

> I am looking at the documentation of 3.2.2 and it is bad. Errors and errors. I wanted to cross-check; maybe it has all been fixed in the meantime.

You can check the actual state of documentation on the daily build:
https://www.freepascal.org/daily/
 


Ok. That will do.


 



Re: [fpc-devel] download or compile documentation

2024-05-09 Thread Marģers . via fpc-devel
 
On 2024-05-09 09:17, Marģers . via fpc-devel wrote:


Hi,

> Is there a way to download the documentation in a human-readable format?
>
> I am looking at the documentation of 3.2.2 and it is bad. Errors and errors.

What do you mean with "errors and errors"?
 

Wrong word usage; the description of a function does not match the function.


 



[fpc-devel] download or compile documentation

2024-05-09 Thread Marģers . via fpc-devel
Is there a way to download the documentation in a human-readable format?

I am looking at the documentation of 3.2.2 and it is bad. Errors and errors. I wanted to cross-check; maybe it has all been fixed in the meantime.

I am failing to compile the documentation myself. Where can I download it, or how do I compile it?
A simple "make" does not work. It is always "Target not supported, run fpcmake", and that is not a valid solution.
 



[fpc-devel] error target i386 -Cp80486

2024-04-23 Thread Marģers . via fpc-devel
1) This does not work:
make clean singlezipinstall OS_TARGET=win32 CPU_TARGET=i386  ALLOW_WARNINGS=1 OPT="  -O2 -vxitl -Cp80486 -Op80486"

It hangs on:
system.inc(421,2) Start reading includefile C:\Users\Lietotajs\Downloads\fora\a\486\gh\rtl\inc\generic.inc
100 5.174/5.888 Kb Used

900 5.307/6.336 Kb Used
1000 5.326/6.336 Kb Used

This works fine:
make clean singlezipinstall OS_TARGET=win32 CPU_TARGET=i386  ALLOW_WARNINGS=1 OPT="  -O2 -vxitl -CpPENTIUM2 -OpPENTIUM2"

Some other "-Cp" values also fail ...

2) If libgdb.a is not found, compilation ends with an error (for 3.2.2 it was optional).



Re: [fpc-devel] dos go32v2 compile target on target?

2024-02-27 Thread Marģers . via fpc-devel
 
On 2024-02-27 13:08, Marģers . wrote:
>>> Should I be able to compile DOS go32v2 target from DOS itself?
>>>
>>> Compiling trunk using "make" falls into an infinite loop on this
>>> command:
>>>
>>> t:\sv\fpc331\compiler\ppc1.exe -Ur -Xs -O2 -n -Fi../inc -Fi../i386
>>> -FE. -FUt:\sv\fpc331\rtl\units\go32v2 -vx -di386 -dRELEASE -Us -Sg
>>> system.pp
>>>
>>> Even though executing it separately causes no problem.
>>
>> Building the compiler under GO32v2 is not guaranteed. It might work if
>> all the prerequisites are fulfilled, but it may not. What's the
>> environment you used (operating system, etc.)?
>
> FreeDOS on real hardware and DOSBox-X with long filename support.
> For FreeDOS you need to clear these environment variables to proceed:
> set dosdir=
> set cfgfile=
>
> I found out that both failed the same way.
> DOSBox-X has problems deleting some temp files. I got the impression that
> this is a known and unresolved problem.

I wouldn't spend too much time on DosBox-x, there's simply too much
outside of your control.

Regarding FreeDos - if it works with individual compilation, but it
fails if running under make, you might:

1) Make sure that there are no remaining files from previous compilation
attempts (do not rely on "make clean", but really check if there are no
.ppu and/or .o and/or .a files).

2) Check memory and disk conditions - how much memory is there, how much
free disk space?

3) Check whether CWSDPMI is used, or whether there is some other DPMI
provider in use (I don't remember if there's something else included in
FreeDos). Note that some DPMI providers ignore the SIGSEGV condition
(unlike CWSDPMI and unlike the DPMI requirements) and that may easily
result in an endless loop if such a condition occurs due to some error.

4) Increase the verbosity level - I'd try at least "OPT=-vvlx" to see if
you find something interesting (either directly in the output on screen,
or in the generated fpcdebug.txt - however, you need to be prepared for
the fact that fpcdebug.txt may not be complete if you interrupt the
compiler run forcibly).

5) Disable optimizations ("OPT=-O-") and check whether it makes any
difference.

Thank you for the advice.

3) CWSDPMI is from the official fpc 3.2.2 release, so it should be fine.
4) I used -vxit and redirected the output to a file.
5) Disabling optimizations (-O-) failed as well.

I will try a little bit more.  It just takes a lot of time to test every possibility.

Margers

 



[fpc-devel] dos go32v2 compile target on target?

2024-02-27 Thread Marģers . via fpc-devel
Should I be able to compile the DOS go32v2 target from DOS itself?

After overcoming some challenges, it was possible to compile fpc 3.2.2 with starting compiler version 3.2.2. Version 3.2.0 does not work.

Compiling trunk using "make" falls into an infinite loop on this command:

t:\sv\fpc331\compiler\ppc1.exe -Ur -Xs -O2 -n -Fi../inc -Fi../i386 -FE. -FUt:\sv\fpc331\rtl\units\go32v2 -vx -di386 -dRELEASE -Us -Sg system.pp

Even though executing it separately causes no problem. 
 



Re: [fpc-devel] LEA instruction speed

2023-10-08 Thread Marģers . via fpc-devel
1. Why do you leave "time:=..." in the benchmark loop? It adds 50% of execution time per call.
2. The Pascal version does not match the assembler version. I had to fix it:
  //Result := X + Counter + $87654321;
  Result:=Result + X + $87654321;
  Result:=Result xor y;
3. The assembler functions can be unified to work under win64, win32, linux 64 and linux 32:
function Checksum_LEA(const Input, X, Y: LongWord): LongWord; assembler; nostackframe;
asm
@Loop2:
  LEA Input, [Input + X + $87654321]
  XOR Input, y
  DEC y
  JNZ @Loop2
  MOV EAX, Input
end;

4. My results. Ryzen 2700x

   Pascal control case: 0.7 ns/call  0.0710
 Using LEA instruction: 0.7 ns/call  0.0700
Using ADD instructions: 0.7 ns/call  0.0710

Even though the results are equal, I was able to add 4 independent ADD instructions around the LEA without the results changing, but only 2 around the ADD.
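
For reference, here is a plain Pascal loop that mirrors what the assembler loop in point 3 computes. It is only a sketch to make the comparison concrete; the name ChecksumPascal is mine, and the 64-bit widening is just one way to get the 32-bit wrap-around regardless of compiler switches. Like the assembler version, it must not be called with Y = 0.

```pascal
program ChecksumSketch;
{$mode objfpc}

{ Mirrors the assembler loop: add X and the constant, xor with Y, count Y down. }
function ChecksumPascal(Input, X, Y: LongWord): LongWord;
begin
  Result := Input;
  repeat
    { widen to 64 bits, add, keep the low 32 bits (same wrap-around as LEA/ADD) }
    Result := LongWord((QWord(Result) + X + $87654321) and $FFFFFFFF);
    Result := Result xor Y;
    Dec(Y);
  until Y = 0;
end;

begin
  WriteLn(HexStr(Int64(ChecksumPascal(0, 0, 1)), 8));  { 87654320 }
  WriteLn(HexStr(Int64(ChecksumPascal(0, 0, 2)), 8));  { 0ECA8645 }
end.
```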



[fpc-devel] qoi image file format support

2022-05-06 Thread Marģers . via fpc-devel
There is this super new, super fast and super what-not image format. I have added support for it to fcl-image.
The attached zip file contains:

1) read and write support in files
qoif/qoicomn.pas
qoif/fpwriteqoi.pas
qoif/fpreadqoi.pas

2) example of read and write QOI file
qoif/wrqoif.pas
qoif/wrpngf.pas

more information about QOI https://qoiformat.org

example images https://qoiformat.org/qoi_test_images.zip



Re: [fpc-devel] The "magic div" algorithm

2021-09-15 Thread Marģers . via fpc-devel
Hi,
Thank you for the implementation.

Is it true that you cannot detect word and byte sizes on division by a constant? Squeezing them into a 32-bit register would make the byte code shorter, though not necessarily faster.
 
- Reply to message -
Subject: Re: [fpc-devel] The "magic div" algorithm
From:  J. Gareth Moreton via fpc-devel 
To:  

This one is for Marģers especially:

https://gitlab.com/freepascal.org/fpc/source/-/issues/39355

Gareth aka. Kit


 




Re: [fpc-devel] The "magic div" algorithm

2021-08-24 Thread Marģers . via fpc-devel
I came up with an even shorter variant of div.
Example:
function teDWordDivBy7_v4( divided : dword):dword; assembler; nostackframe;
asm
 mov ecx,divided
 mov rax,2635249153693862181
 mul rcx
 mov eax,edx
end;

Current version for comparison:

function teDWordDivBy7_v0( divided : dword):dword; assembler; nostackframe;
asm
 mov ecx,divided
 mov eax,613566757
 mul ecx
 add edx,ecx
 rcr edx,1
 shr edx,2
 mov eax,edx
end;
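
To check both sequences without assembler, here is a portable sketch that emulates them with 64-bit arithmetic. The two constants are taken from the listings above; the function names and the sampling stride are my own choices, and the high half of the 64x64 product is built from two 32x32 products because Pascal has no 128-bit integer.

```pascal
program DivBy7Sketch;
{$mode objfpc}
{$Q-}{$R-}

const
  Magic64: QWord = 2635249153693862181;  { constant from the v4 listing }
  Magic32: QWord = 613566757;            { constant from the v0 listing }

{ v4 idea: the result is the high 64 bits of x * Magic64. }
function DivBy7Wide(x: LongWord): LongWord;
var
  hi, lo: QWord;
begin
  hi := QWord(x) * (Magic64 shr 32);
  lo := QWord(x) * (Magic64 and $FFFFFFFF);
  Result := LongWord((hi + (lo shr 32)) shr 32);
end;

{ v0 idea: t = high 32 bits of x * Magic32, then the 33-bit sum t + x
  is shifted right by 3 (add, rcr 1, shr 2 in the assembler version). }
function DivBy7AddShift(x: LongWord): LongWord;
var
  t: QWord;
begin
  t := (QWord(x) * Magic32) shr 32;
  Result := LongWord((t + x) shr 3);
end;

var
  x: QWord;
  ok: Boolean;
begin
  ok := (DivBy7Wide($FFFFFFFF) = $FFFFFFFF div 7) and
        (DivBy7AddShift($FFFFFFFF) = $FFFFFFFF div 7);
  x := 0;
  while x <= $FFFFFFFF do       { sample the 32-bit range with a prime stride }
  begin
    if (DivBy7Wide(LongWord(x)) <> LongWord(x) div 7) or
       (DivBy7AddShift(LongWord(x)) <> LongWord(x) div 7) then
      ok := False;
    Inc(x, 65521);
  end;
  WriteLn('all match: ', ok);
end.
```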

 





	
		
			 
			 
		
	









Re: [fpc-devel] The "magic div" algorithm

2021-08-24 Thread Marģers . via fpc-devel
I was overexcited about making a working example for the byte div case. For the word case it can be done as well.
In the case of dword div it's not so nice. It's theoretically possible to shave off 1 CPU clock cycle, but I have no working example yet. Sorry for spreading false hope.

Example for the byte case:

function teByteDivBy7( divided : dword):dword; assembler; nostackframe;
asm
 mov ecx,divided
 mov eax,293
 mul ecx
 shr eax, 11
end;
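
The byte case is small enough to check exhaustively. A minimal sketch in plain Pascal, assuming the input really is in the range 0..255 (293 is 2^11 / 7 rounded up); the function name is mine.

```pascal
program ByteDivBy7;
{$mode objfpc}

{ Multiply by 293, then shift right by 11, as in the assembler example. }
function ByteDivBy7(x: Byte): Byte;
begin
  Result := (LongWord(x) * 293) shr 11;
end;

var
  x: Integer;
  mismatches: Integer = 0;
begin
  for x := 0 to 255 do
    if ByteDivBy7(Byte(x)) <> x div 7 then
      Inc(mismatches);
  WriteLn('mismatches: ', mismatches);   { prints 0 }
end.
```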


 
- Reply to message -
Subject: Re: [fpc-devel] The "magic div" algorithm
From:  Marģers . via fpc-devel 
To:  FPC developers' list 


For unsigned byte, word and dword divisions by constant on 64 bit cpu can be converted as good cases.



	
		
			 
			 
		
	








Re: [fpc-devel] The "magic div" algorithm

2021-08-24 Thread Marģers . via fpc-devel

If you have not bought the book, you can always try searching the internet.

I did some research into magic div constants. As is already known, there are good cases where you can replace the division by

mov magic
mul
shr

and bad cases:

mov magic
mul
add
rcr
shr


Bad cases are approximately 1/3 of all cases.
For unsigned byte, word and dword divisions by a constant on a 64-bit CPU, the bad cases can be converted into good cases.

This is a possibility for improvement.
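
To make the good/bad distinction concrete, here is a sketch that classifies a 32-bit divisor. It is not the compiler's calc_divconst_magic_unsigned routine, only the usual textbook condition as I understand it: take the smallest shift s for which m = ceil(2^(32+s) / d) overshoots by at most 2^s; the case is good when m still fits in 32 bits. The function name and the restriction to 2 <= d <= 2^31 are assumptions of the sketch.

```pascal
program MagicClass;
{$mode objfpc}
{$Q-}{$R-}

const
  Divisors: array[0..3] of LongWord = (3, 5, 7, 10);

{ Returns True for a "good" divisor (mov/mul/shr is enough).
  m is the multiplier, s the extra shift beyond the 32-bit high half.
  Intended for 2 <= d <= 2^31. }
function IsGoodCase(d: LongWord; out m: QWord; out s: Integer): Boolean;
var
  k: Integer;
  p, e: QWord;
begin
  Result := False;
  m := 0;
  s := 0;
  for k := 0 to 31 do
  begin
    p := QWord(1) shl (32 + k);
    e := (d - (p mod d)) mod d;        { how far m*d overshoots 2^(32+k) }
    if e <= (QWord(1) shl k) then
    begin
      m := (p + e) div d;              { ceil(2^(32+k) / d) }
      s := k;
      Result := m <= $FFFFFFFF;        { bad case: m needs a 33rd bit }
      Exit;
    end;
  end;
end;

var
  i, s: Integer;
  m: QWord;
  good: Boolean;
begin
  for i := Low(Divisors) to High(Divisors) do
  begin
    good := IsGoodCase(Divisors[i], m, s);
    WriteLn('d=', Divisors[i], ' m=', m, ' s=', s, ' good=', good);
  end;
end.
```

For d = 7 this gives m = 4908534053, which is 2^32 + 613566757: the 33rd bit is what the extra add/rcr pair handles. With 64-bit registers that multiplier fits without any trick.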
 
- Reply to message -
Subject: Re: [fpc-devel] The "magic div" algorithm
From:  J. Gareth Moreton via fpc-devel 
To:  


Something tells me I should purchase that book - I sense it could reveal some interesting insights.

Note that while I understand the concept of turning integer division into multiplication (indeed, I implemented the first version into x86 before it was improved with the "calc_divconst_magic_unsigned" routine, and then implemented it for AArch64), the algorithm that is used in "calc_divconst_magic_unsigned" I don't quite get if just for the lack of comments and references, although I am studying it more closely.

Still, I figure I'll put down Hacker's Delight as a future purchase so I have a reputable source rather than just an online one (I know there's one somewhere that isn't behind a paywall).

Gareth aka. Kit

On 20/08/2021 18:46, Marģers . via fpc-devel wrote:

 
 is there a reference to the algorithm that's used to calculate the
reciprocal constants used in the integer division optimisations for x86 and AArch64?

Hacker’s Delight
Second Edition
Henry S. Warren, Jr.

 
 





 

	
		
			


Re: [fpc-devel] The "magic div" algorithm

2021-08-20 Thread Marģers . via fpc-devel
 
 is there a reference to the algorithm that's used to calculate the
reciprocal constants used in the integer division optimisations for x86 and AArch64?

Hacker’s Delight
Second Edition
Henry S. Warren, Jr.

 



Re: [fpc-devel] duplicate internal error numbers

2020-10-08 Thread Marģers . via fpc-devel
- Reply to message -
Subject: Re: [fpc-devel] duplicate internal error numbers
Date: ceturtd., 8 okt. 2020, 19:39
From:  Sven Barth via fpc-devel 
To:  FPC developers' list 
> Marģers . via fpc-devel  schrieb am Do., 8. 
> Okt. 2020, 12:39:
> > > I would advise against an automated change in case it changes too many
> > > internal error numbers and when a third party raises an issue where the
> > > compiler has triggered one, we can no longer identify where in the code
> > > base that the state has gone bad because none of the numbers match any 
> > > more.

> > It is not that much of a problem. Most internal errors would not be triggered 
> > by end users.
> Opened a new ticket in the bug tracker: https://bugs.freepascal.org/view.php?id=37888

> That is kinda the point. They *should* not be triggered, but when they are 
> triggered they should be found. This is 
> especially important as most users are using release versions while we are 
> working on trunk. 

555 of the 4300+ internal error numbers are changed. That is about 13%. What is 
the probability that one of the changed numbers will be triggered before the next 
release? One or two in a year (or none). As I see it, it is just a decision to 
accept or reject the patch. 
I was curious how many duplicate internal errors there are. A lot. Now you know 
that as well.



Re: [fpc-devel] duplicate internal error numbers

2020-10-08 Thread Marģers . via fpc-devel
- Reply to message -
Subject: Re: [fpc-devel] duplicate internal error numbers
Date: trešd., 7 okt. 2020, 16:40
From:  J. Gareth Moreton via fpc-devel 
To:  

> When two different programmers write code on the same day in different
> parts of the compiler, there's bound to be a clash eventually.  

good example was z80 and xtensa

> I would advise against an automated change in case it changes too many
> internal error numbers and when a third party raises an issue where the
> compiler has triggered one, we can no longer identify where in the code
> base that the state has gone bad because none of the numbers match any more.

It is not that much of a problem. Most internal errors would not be triggered by 
end users.
I opened a new ticket in the bug tracker: https://bugs.freepascal.org/view.php?id=37888



Re: [fpc-devel] duplicate internal error numbers

2020-10-07 Thread Marģers . via fpc-devel
- Reply to message -
Subject: Re: [fpc-devel] duplicate internal error numbers
Date: trešd., 7 okt. 2020, 14:16
From:  Jonas Maebe via fpc-devel 
To:  
> On 07/10/2020 13:02, Marģers . via fpc-devel wrote:
> > found total 4300+
> > 1001 error number has to be changed to make all error number unique
> >
> > as there are so huge number of changes to make i have question
> > 1) would it be desirable to change all (or most) duplicate error 
> > numbers in one single big patch?
> > 2) selective amount of changes in multiple patches?
> > 3) leave it as is? not broken don't fix or do nothing is a choice.

> How many of these are part of different code generators? We only support
> one architecture per compiler binary _and_ code generators for new
> architectures often start as (partial) copies of existing code
> generators, so it's normal that they have a lot of duplicate internal
> errors, but it doesn't matter.

> OTOH, I'm sure there are also still duplicates in generic code and
> within single architectures, but to find those you have to look
> exclusively at the generic files in combination with those for a single
> architecture (which may be in multiple directories, e.g. compiler/x86
> and compiler/i386 for the i386 target).

That still leaves plenty of potential duplicates. I got 339 unique cases to investigate. That is a 
lot of manual labor, as I understand that automated changes are not welcome.



[fpc-devel] duplicate internal error numbers

2020-10-07 Thread Marģers . via fpc-devel
I checked for duplicate internal error numbers.

Found 4300+ in total.
1001 error numbers have to be changed to make all error numbers unique.

As there is such a huge number of changes to make, I have a question:
1) Would it be desirable to change all (or most) duplicate error numbers in one single big patch?
2) A selective amount of changes in multiple patches?
3) Leave it as is? "If it's not broken, don't fix it": doing nothing is a choice too.
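
For anyone who wants to repeat such a check, here is one way it could be sketched. This is not the tool I used; the helper names and the sample lines (including the error numbers in them) are made up for the illustration, and a real run would feed in the lines of the compiler sources instead.

```pascal
program DupInternalErrors;
{$mode objfpc}{$H+}

uses
  SysUtils, Classes;

{ Return the digits following "internalerror(" in Line, or '' if there are none. }
function ExtractErrorNumber(const Line: string): string;
const
  Marker = 'internalerror(';
var
  p: Integer;
begin
  Result := '';
  p := Pos(Marker, LowerCase(Line));
  if p = 0 then
    Exit;
  Inc(p, Length(Marker));
  while (p <= Length(Line)) and (Line[p] in ['0'..'9']) do
  begin
    Result := Result + Line[p];
    Inc(p);
  end;
end;

{ Count how many numbers occur again after their first appearance. }
function CountDuplicates(const Lines: array of string): Integer;
var
  Seen: TStringList;
  i: Integer;
  Num: string;
begin
  Result := 0;
  Seen := TStringList.Create;
  try
    Seen.Sorted := True;
    for i := 0 to High(Lines) do
    begin
      Num := ExtractErrorNumber(Lines[i]);
      if Num = '' then
        Continue;
      if Seen.IndexOf(Num) >= 0 then
        Inc(Result)
      else
        Seen.Add(Num);
    end;
  finally
    Seen.Free;
  end;
end;

begin
  { hypothetical sample input: one number is used twice }
  WriteLn(CountDuplicates([
    '  internalerror(2020100701);',
    '  else InternalError(2020100702);',
    '  internalerror(2020100701);',
    '  writeln(''no error here'');']));
end.
```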
 



Re: [fpc-devel] Producing assembly with less branches?

2020-07-19 Thread Marģers . via fpc-devel
- Reply to message -
Subject: [fpc-devel] Producing assembly with less branches?
From:  Stefan Glienke 
To:  
> Hi,

> not sure if anything significantly changed in trunk compared to 3.2 wrt
> to optimized code being generated but I am quite disappointed that fpc
> (checked win64 with -O3 and -O4) does not use cmovxx instructions and
> alike for the most basic things and produces terrible code like this:

> unit1.pas:49  if left < right then
> 00010002E3C0 39ca cmp    edx,ecx
> 00010002E3C2 7e06 jle    0x10002e3ca 
> unit1.pas:50  Result := -1
> 00010002E3C4 b8   mov    eax,0x
> 00010002E3C9 c3   ret
> unit1.pas:51  else if left > right then
> 00010002E3CA 39ca cmp    edx,ecx
> 00010002E3CC 7d06 jge    0x10002e3d4 
> unit1.pas:52  Result := 1
> 00010002E3CE b80100c3 mov    eax,0x1
> unit1.pas:54  Result := 0;
> 00010002E3D4 31c0 xor    eax,eax
> unit1.pas:55  end;
> 00010002E3D6 c3   ret

> Similar for even simpler things:

> unit1.pas:43  if i < 0 then
> 00010002E3A1 85c0 test   eax,eax
> 00010002E3A3 7d03 jge    0x10002e3a8
> 
> unit1.pas:44  i := 0;
> 00010002E3A5 31c0 xor    eax,eax
> 00010002E3A7 90   nop

> Imo someone should work at that and make the compiler produce less
> branches. Not sure if that is on your list but it should be looked at.

This is already done in trunk (sadly not in 3.2.0).
For a cmov instruction to be emitted, two conditions have to be met:
1) the if statement has no else part
2) the assigned value is a variable (not a constant).

To benefit from cmov, your code has to look like this:

function cmov2(left, right : longint):longint;
var l1,lf: longint;
 r : longint;
begin
 l1:=1;
 lf:=-1;
 r:=0;
 if left > right then
 begin
  r:=lf;
 end;// else
 if left < right then
 begin
  r:=l1;
 end;// else  r:=0;
 cmov2:=r;
end;
  

00400370 b9 01 00 00 00    mov ecx,0001h
00400375 ba ff ff ff ff    mov edx,0h
0040037a 31 c0             xor eax,eax
0040037c 39 fe             cmp esi,edi
0040037e 0f 4c c2          cmovl eax,edx
00400381 39 fe             cmp esi,edi
00400383 0f 4f c1          cmovnle eax,ecx
00400386 c3                ret
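
The same two conditions can be applied to the simpler "if i < 0 then i := 0" example from the original mail. This is only a sketch of the source pattern; whether a cmov is actually emitted depends on the compiler version, target and optimization switches, so please check the generated assembler yourself. The function names are mine.

```pascal
program CmovPattern;
{$mode objfpc}

{ Original shape: assigns a constant inside the if. }
function ClampBranchy(i: LongInt): LongInt;
begin
  if i < 0 then
    i := 0;
  Result := i;
end;

{ Rewritten shape: no else part, and the assigned value is a variable. }
function ClampCmovFriendly(i: LongInt): LongInt;
var
  zero: LongInt;
begin
  zero := 0;
  if i < 0 then
    i := zero;
  Result := i;
end;

begin
  { both shapes must of course give the same results }
  WriteLn(ClampBranchy(-5), ' ', ClampCmovFriendly(-5));   { 0 0 }
  WriteLn(ClampBranchy(7), ' ', ClampCmovFriendly(7));     { 7 7 }
end.
```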



[fpc-devel] AArch64 r44737 and r44738

2020-04-16 Thread Marģers . via fpc-devel
Can someone check whether the removal of r44737 in r44738 was intended?


https://svn.freepascal.org/cgi-bin/viewvc.cgi/trunk/compiler/aarch64/aoptcpu.pas?r1=44738=44737=44738



Re: [fpc-devel] generate assembler with no clear purpose MOV

2020-02-05 Thread Marģers . via fpc-devel
> > From: J. Gareth Moreton 
> > To: 
> >> Are you able to dump the nodes as well with -an? (You'll need to define
> >> -dEXTDEBUG though) That might give some clues behind the presence of
> >> that movslq instruction.
> >

> You will need to also add ALLOW_WARNINGS=1 to the make call.

Ok, ALLOW_WARNINGS=1 did the job.

Attached are testsh.pas and testsh.s:

program defuu;


function roo(lk:dword):byte;
var k : dword;
bit : dword;
num : byte;
one : dword;
begin
 num:=0;
 one:=1;
 for k:=0 to 25 do
 begin
  bit:= one shl k; //- this is the line -
  if (lk and bit) <> 0 then
  begin
   lk:=lk xor bit;
   inc(num);
  end;
 end;
 roo:=num;
end;

function sh ( a, b: dword):longint;
var z  : dword;
begin
 z:=a  shl b;
 sh:= z+1;
end;

begin
end.
	.file "testsh.pas"
# Begin asmlist al_begin

.section .debug_line
	.type	.Ldebug_linesection0,@object
.Ldebug_linesection0:
	.type	.Ldebug_line0,@object
.Ldebug_line0:

.section .debug_abbrev
	.type	.Ldebug_abbrevsection0,@object
.Ldebug_abbrevsection0:
	.type	.Ldebug_abbrev0,@object
.Ldebug_abbrev0:

.section .text.b_DEBUGSTART_$P$DEFUU,"ax"
.globl	DEBUGSTART_$P$DEFUU
	.type	DEBUGSTART_$P$DEFUU,@object
DEBUGSTART_$P$DEFUU:
# End asmlist al_begin
# Begin asmlist al_procedures

.section .text.n_p$defuu_$$_roo$longword$$byte,"ax"
.globl	P$DEFUU_$$_ROO$LONGWORD$$BYTE
	.type	P$DEFUU_$$_ROO$LONGWORD$$BYTE,@function
P$DEFUU_$$_ROO$LONGWORD$$BYTE:
.Lc2:
	# Register rsp allocated
# Var $result located in register al
# Var k located in register eax
# Var bit located in register esi
# Var num located in register al
# Var one located in register eax
#  second blockn (entry)
#   second nothing-nothg (entry)
#   second nothing-nothg (exit)
#   second blockn (entry)
#second nothing-nothg (entry)
#second nothing-nothg (exit)
#second asm (entry)
	# Register edi,edi allocated
# Var lk located in register edi
#second asm (exit)
#second asm (entry)
#second asm (exit)
#second asm (entry)
# [testsh.pas]
# [9] begin
#second asm (exit)
#second asm (entry)
#second asm (exit)
#second blockn (entry)
#second blockn (exit)
#second blockn (entry)
# second assignment (entry)
#  second load (entry)
#  second load (exit)
#  second ordconst (entry)
#  second ordconst (exit)
# Var num located in register al
	# Register al allocated
.Ll1:
# [10] num:=0;
	xorb	%al,%al
# second assignment (exit)
# second assignment (entry)
#  second load (entry)
#  second load (exit)
#  second ordconst (entry)
#  second ordconst (exit)
# Var one located in register r8d
	# Register r8d allocated
.Ll2:
# [11] one:=1;
	movl	$1,%r8d
# second assignment (exit)
# second blockn (entry)
#  second nothing-nothg (entry)
#  second nothing-nothg (exit)
#  second blockn (entry)
#   second nothing-nothg (entry)
#   second nothing-nothg (exit)
#   second assignment (entry)
#second load (entry)
#second load (exit)
#second ordconst (entry)
#second ordconst (exit)
# Var k located in register ecx
	# Register ecx allocated
.Ll3:
# [12] for k:=0 to 25 do
	xorl	%ecx,%ecx
#   second assignment (exit)
#   second while_repeat (entry)
	# Register esi allocated
.Lj5:
#second blockn (entry)
# second nothing-nothg (entry)
# second nothing-nothg (exit)
# second blockn (entry)
#  second assignment (entry)
#   second shlshr-shln (entry)
#second load (entry)
#second load (exit)
#second typeconv (entry)
# second load (entry)
# second load (exit)
	# Register rdx allocated
# Peephole Optimization: MovAnd2Mov 1 done
.Ll4:
# [14] bit:= one shl k; //- this is the line -
	movl	%ecx,%edx
#second typeconv (exit)
	# Register edx,edx allocated
# Peephole Optimization: %edx = %ecx; changed to minimise pipeline stall (MovXXX2MovXXX)
	shlx	%ecx,%r8d,%edx
#   second shlshr-shln (exit)
#   second load (entry)
#   second load (exit)
	movl	%edx,%esi
#  second assignment (exit)
#  second ifn (entry)
#   second add-unequaln (entry)
#second add-andn (entry)
# second load (entry)
# second load (exit)
# second load (entry)
# second load (exit)
# Peephole Optimization: %esi = %edx; changed to minimise pipeline stall (MovXXX2MovXXX)
# Peephole Optimization: Mov2Nop 4 done
.Ll5:
# [15] if (lk and bit) <> 0 then
	andl	%edi,%edx
#second add-andn (exit)
#second ordconst (entry)
#second ordconst (exit)
	# Register rflags allocated
	# Register edx released
#   second add-unequaln (exit)
	je	.Lj9
	# Register rflags released
#   second blockn (entry)
#second inline (entry)
# 

Re: [fpc-devel] generate assembler with no clear purpose MOV

2020-02-04 Thread Marģers . via fpc-devel
 

From:  J. Gareth Moreton 
To:  
> Are you able to dump the nodes as well with -an? (You'll need to define
> -dEXTDEBUG though) That might give some clues behind the presence of
> that movslq instruction.

Building the compiler with -dEXTDEBUG does not work for me:
make singlezipinstall OS_TARGET=linux CPU_TARGET=x86_64  OPT="-dEXTDEBUG 
-CpCOREAVX2 -OpCOREAVX2 
-Fu/home/user/fpc304/lib/fpc/3.0.4/units/x86_64-linux/rtl/"

constexp.pas(125,13) Warning: Location (LOC_CSSETREG) not equal to expectloc 
(LOC_REG): typeconvn
constexp.pas(594) Fatal: There were 1 errors compiling module, stopping



Re: [fpc-devel] generate assembler with no clear purpose MOV

2020-02-04 Thread Marģers . via fpc-devel
 

- Reply to message -
Subject: Re: [fpc-devel] generate assembler with no clear purpose MOV
Date: Tue, 4 Feb 2020, 22:24
From:  J. Gareth Moreton 
To:  
> To hazard a guess, it's sign-extending to the CPU word size as an
> intermediate step.  It's not something the peephole optimizer can easily
> eliminate.  Do the register allocations give any clues before that
> instruction?


# Var k located in register ecx
# Var bit located in register esi

It does seem to be a sign extension, but if I change the variables "k" and "bit"
to dword, there is a simple movl %ecx,%edx instead.
SHLX (as well as SHRX) seems to be treated as if its operands were always memory
variables: the value is first read into a temporary register and the result is
copied back afterwards. Also, SHL and SHR are logical operations, so there is no
need for sign extension.
While those MOV instructions do not hurt much, there is a benefit to resolving
this issue: one or two more registers would be free for other purposes.
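
A small sketch of the point that the shift in this loop does not depend on signedness: the two functions below are the longint and dword variants of roo from this thread, and the driver loop (added here for illustration only, it is not part of the original test files) checks that both return the same count for a range of non-negative inputs. It says nothing about which instructions a given compiler version emits.

```pascal
program shlcheck;
{$mode objfpc}

{ Signed variant, as in the original report. }
function RooSigned(lk: LongInt): Byte;
var
  k, bit: LongInt;
  num: Byte;
begin
  num := 0;
  for k := 0 to 25 do
  begin
    bit := LongInt(1) shl k;
    if (lk and bit) <> 0 then
    begin
      lk := lk xor bit;
      Inc(num);
    end;
  end;
  Result := num;
end;

{ Unsigned variant, as in testsh.pas. }
function RooUnsigned(lk: DWord): Byte;
var
  k, bit, one: DWord;
  num: Byte;
begin
  num := 0;
  one := 1;
  for k := 0 to 25 do
  begin
    bit := one shl k;
    if (lk and bit) <> 0 then
    begin
      lk := lk xor bit;
      Inc(num);
    end;
  end;
  Result := num;
end;

var
  v: LongInt;
begin
  { Hypothetical check range, chosen only to keep the run short. }
  for v := 0 to 100000 do
    if RooSigned(v) <> RooUnsigned(DWord(v)) then
    begin
      WriteLn('mismatch at ', v);
      Halt(1);
    end;
  WriteLn('signed and unsigned variants agree');
end.
```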


> On 04/02/2020 18:50, Marģers . via fpc-devel wrote:
> >  p.s. tested execution speed and there is no measurable difference.
> >
> >
> >> asm code
> >> # [109] bit:= longint(1) shl k;
> >>     movslq    %ecx,%rdx
> >>     # Register r8d allocated
> >>     movl    $1,%r8d
> >>     # Register edx,edx allocated
> >>     shlx    %edx,%r8d,%edx
> >>     # Register r8d released
> >>     # Register edx allocated
> >>     movl    %edx,%esi
> >> # Peephole Optimization: %esi = %edx; changed to minimise pipeline stall 
> >> (MovXXX2MovXXX)
> >> # Peephole Optimization: Mov2Nop 4 done
> >
> >> what purpose serve: movslq    %ecx,%rdx   ?
> >> movl    %edx,%esi seems unnecessary,
> >> when just enough would be
> >> movl    $1,%esi
> >> shlx    %ecx,%esi,%esi



Re: [fpc-devel] generate assembler with no clear purpose MOV

2020-02-04 Thread Marģers . via fpc-devel
P.S. I tested execution speed and there is no measurable difference.


> asm code
> # [109] bit:= longint(1) shl k;
>     movslq    %ecx,%rdx
>     # Register r8d allocated
>     movl    $1,%r8d
>     # Register edx,edx allocated
>     shlx    %edx,%r8d,%edx
>     # Register r8d released
>     # Register edx allocated
>     movl    %edx,%esi
> # Peephole Optimization: %esi = %edx; changed to minimise pipeline stall 
> (MovXXX2MovXXX)
> # Peephole Optimization: Mov2Nop 4 done


> what purpose serve: movslq    %ecx,%rdx   ?

> movl    %edx,%esi seems unnecessary,
> when just enough would be
> movl    $1,%esi
> shlx    %ecx,%esi,%esi



[fpc-devel] generate assembler with no clear purpose MOV

2020-02-04 Thread Marģers . via fpc-devel
 hi 
example code:
function roo(lk:longint):byte;
var k : longint;
    bit : longint;
    num : byte;
begin
 num:=0;
 for k:=0 to 25 do
 begin
  bit:= longint(1) shl k;
  if (lk and bit) <> 0 then
  begin
   lk:=lk xor bit;
   inc(num);
  end;
 end;
 roo:=num;
end;
begin
end.

asm code 
# [109] bit:= longint(1) shl k;
    movslq    %ecx,%rdx
    # Register r8d allocated
    movl    $1,%r8d
    # Register edx,edx allocated
    shlx    %edx,%r8d,%edx
    # Register r8d released
    # Register edx allocated
    movl    %edx,%esi
# Peephole Optimization: %esi = %edx; changed to minimise pipeline stall 
(MovXXX2MovXXX)
# Peephole Optimization: Mov2Nop 4 done


What purpose does movslq    %ecx,%rdx serve?

movl    %edx,%esi also seems unnecessary,
when this would be enough:
movl    $1,%esi
shlx    %ecx,%esi,%esi



Re: [fpc-devel] inline... and philosophy

2019-11-22 Thread Marģers . via fpc-devel
> Does that mean in some situations, if you have a small, tight loop, it
> might be better to optimise over speed in some very rare cases? For
> example, turning MOV EAX, $ into OR EAX, $FF to squeeze out a
> few extra bytes, even though the instruction introduces a false dependency.

A latency of 4 clock cycles is a lot. As long as the dependency can be resolved
in less time than that, there will be some performance gain.
The performance penalty is not a fixed 20%; it depends on the code that comes
before. Long-latency instructions then have time to catch up with the rest of
the code. It is even possible to cancel the penalty out completely by placing a
call so that the ret falls into the next 64-byte line.
This is a place where tricky optimizations can be done.



Re: [fpc-devel] inline... and philosophy

2019-11-22 Thread Marģers . via fpc-devel
> On 10/11/2019 at 11:17, Marģers . via fpc-devel wrote:
> >  Most processors have a fairly large uop cache (up to 2048 for the newest
> >> generations iirc), so this would only be for the first iteration? Do you
> >> have a reference (agner fog page or so) or more explanation for this
> >> that describes this?)
> > I have to revoke my statement. Don't have evidence to back up. Code, that 
> > lead me to thous conclusions, has been discarded.
> > I have read most whats published in agner's fog page. There nothing to 
> > pinpoint as reference.
> No prob. Was just interested, I had to do some sse/avx code the last
> years, and hadn't heard of this.

I did some research.

From the manual on Agner Fog's page,
"The microarchitecture of Intel, AMD and VIA CPUs":

20.17 Cache and memory access
Level 1 code  64 kB, 4 way, 256 sets, 64 B line size, per core. Latency 4
clocks

I also created some performance tests and found that if a loop crossed a 64 B
line boundary, it lost 20% performance, while the measurement error was 2%.
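
To make "crossing a 64 B line" concrete, here is a minimal arithmetic sketch. The function name CrossesLine and the example addresses are made up for illustration; only the 64-byte line size comes from the manual excerpt above. A code range crosses a line when its first and last byte fall into different 64-byte blocks.

```pascal
program linecross;
{$mode objfpc}

const
  LineSize = 64;  { line size taken from the excerpt above }

{ Hypothetical helper: True if the byte range
  [StartAddr, StartAddr + CodeSize - 1] spans more than one line. }
function CrossesLine(StartAddr, CodeSize: PtrUInt): Boolean;
begin
  Result := (CodeSize > 0) and
    ((StartAddr div LineSize) <> ((StartAddr + CodeSize - 1) div LineSize));
end;

begin
  { A 64-byte loop starting on a line boundary fits exactly. }
  WriteLn(CrossesLine($1000, 64));
  { The same loop shifted by one byte spills into the next line. }
  WriteLn(CrossesLine($1001, 64));
  { Even a 5-byte sequence can straddle a boundary. }
  WriteLn(CrossesLine($103C, 5));
end.
```

Whether a particular loop ends up in one line or two depends on where the linker places the surrounding code, so a check like this is only useful on addresses taken from an actual disassembly.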



Re: [fpc-devel] inline... and philosophy

2019-11-10 Thread Marģers . via fpc-devel
> On 2019-11-09 at 02:24, Marģers . via fpc-devel wrote:
> >
> > 3) it changes code location (code cross page boundaries). For my particular 
> > cpu there are 64 byte code page. If loop can fit in it, speed is twice as 
> > it overlaps even one byte over page boundary. Jumping forward is ok (as 
> > expected code flow is always forward). And there is lager page few kb - 
> > calling outside - small penalty.

> Most processors have a fairly large uop cache (up to 2048 for the newest
> generations iirc), so this would only be for the first iteration? Do you
> have a reference (agner fog page or so) or more explanation for this
> that describes this?)

I have to retract my statement. I don't have evidence to back it up; the code
that led me to those conclusions has been discarded.
I have read most of what is published on Agner Fog's page, and there is nothing
there that I can pinpoint as a reference.



Re: [fpc-devel] inline... and philosophy

2019-11-09 Thread Marģers . via fpc-devel
 

> By the way, what is your 'particular CPU'? If it's not Intel-based,
AMD Zen, 1st gen, so the same x86_64. Not much help for testing on other platforms.



Re: [fpc-devel] inline... and philosophy

2019-11-08 Thread Marģers . via fpc-devel
By "blobing" I meant an unnecessary increase in size, so that the function loses
its good shape. There is no such word as "blobing" in English. My bad.
Let me rephrase 'just wrong' as 'questionably right'. Currently inlining is
left in the hands of programmers, and it is abused as a magical performance
booster. For small functions that is most likely true; for larger functions it
is questionable.
1) it might increase the offset size for accessing local variables on the stack.
2) it might increase jump instruction size.
3) it changes code location (code crosses page boundaries). On my particular cpu
there are 64-byte code pages. If a loop fits in one, it is twice as fast as when
it overlaps the page boundary by even one byte. Jumping forward is OK (as the
expected code flow is always forward). There is also a larger page of a few kB;
calling outside it carries a small penalty. As fpc does not manage this in any
way, it is pure luck, and the code just might get unlucky. Code alignment
generally does not solve those things.
Conclusion: one cannot tell by the naked eye whether inlining is any good or
not. To inline or not to inline has nothing to do with philosophy; it has to be
calculated (as clang does and fpc doesn't).

I'm looking forward to the jump optimization being accepted.



Re: [fpc-devel] inline... and philosophy

2019-11-08 Thread Marģers . via fpc-devel
> - Identifying functions that are only used once.  This became a slight point 
> of contention between Florian and myself, because I inlined a couple of 
> functions

Inlining every function that is used only once is just wrong. The gain from
eliminating the call and the function prologue and epilogue might not be
sufficient to outweigh "blobing" the caller function. One of clang's
optimizations is to "outline" some parts of larger functions (like an else
branch).



[fpc-devel] multi-line strings

2019-10-05 Thread Marģers . via fpc-devel
 hi,

Java is going to have multi-line strings https://www.youtube.com/watch?v=J1YKAFtNz70
I'm posting this because they handle indentation differently from the currently proposed patch.
Maybe it is worth considering. If not, that's OK.



Re: [fpc-devel] problem compiling compiler

2019-08-21 Thread Marģers . via fpc-devel
 

> make info PP=/home/user/fpc304/lib/fpc/3.0.4/ppcx64

> If the compiler is found, it should be reported
as the first line printed after == Configuration
info ==

== Configuration info ==

FPC.. /home/user/fpc304/lib/fpc/3.0.4/ppcx64

"make info" shows the correct location of the starting
compiler, but during the actual compilation the makefile
is unable to locate it.
I can successfully compile 3.0.4, but I am unable to
compile 3.3.1 (a few months ago I was able to do so).



[fpc-devel] problem compiling compiler

2019-08-21 Thread Marģers . via fpc-devel
Hi,
 I used a simple script for compiling the compiler:

export PP=/home/user/fpc304/lib/fpc/3.0.4/ppcx64
make singlezipinstall OS_TARGET=linux CPU_TARGET=x86_64 \
  OPT="-Fu/home/user/fpc304/lib/fpc/3.0.4/units/x86_64-linux/rtl/"

The reason for "export PP=" is that I don't have
fpc installed, so I have to specify the path where
to find it.
The script was fine some time ago, but now it cannot
find the compiler:

Makefile:135: *** Compiler /home/user/where/is/my/copy/of/fpc/compiler/ppcx64 not found.  Stop.


Can I improve my script so that it actually works,
or is this some sort of makefile bug?



Re: [fpc-devel] Some thoughts on multi-line string support, and a possible syntax that I think is perfectly clean and Pascal-ish.

2019-07-07 Thread Marģers . via fpc-devel
 

- Reply to message -
From:  
To:  
> On 7/6/19 4:50 PM, wkitt...@windstream.net wrote:

> >   writeln("MultiLine1= '",MultiLine1,"'");
> >   writeln("MultiLine2= '",MultiLine2,"'");

> (* i forgot to do the line for MultiLine3 *)
> writeln("MultiLine3= '",MultiLine3,"'");
> (* that's what happens when you write directly
> in the email editor without testing *)

Well, the trunk I use still has no " string quote
character.
Maybe it's time to add that too, while Ben Grasset is
implementing multi-line strings? Just kidding...



Re: [fpc-devel] Some thoughts on multi-line string support, and a possible syntax that I think is perfectly clean and Pascal-ish.

2019-07-04 Thread Marģers . via fpc-devel
 

- Reply to message -
Subject: Re: [fpc-devel] Some thoughts on
multi-line string support, and a possible syntax
that I think is perfectly clean and Pascal-ish.
Date: Wed, 3 Jul, 23:20
From:  Ben Grasset 
To:  FPC developers' list

> program Example;

> (*
>   This is a perfectly
>   normal multi-line
>   Pascal comment.
> *)

> const SA = `
>   This is a multiline
>   string using hypothetical backticks.
>   Imagine it was fully syntax-highlighted
>   like normal strings and the comment
>   above are.
> `;

> begin
> end. 

Why introduce ` if there already is ' ? Just use '
for multi-line strings as well. For people with a more
conservative viewpoint, put multi-line strings
behind a mode switch.
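
For comparison, a minimal sketch of what can already be written without any new syntax: a multi-line constant built from ordinary '...' pieces joined with the System unit's LineEnding constant. The text of the constant is made up for illustration; this is the existing workaround, not the proposed feature.

```pascal
program multiline;
{$mode objfpc}

const
  { Each source line is an ordinary quoted string; the line breaks
    are added explicitly with LineEnding. }
  SA = 'This is a multiline' + LineEnding +
       'string built from ordinary quoted pieces.' + LineEnding;

begin
  Write(SA);
end.
```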



Re: [fpc-devel] XML node dump feature

2019-06-24 Thread Marģers . via fpc-devel
 

- Reply to message -
Subject: Re: [fpc-devel] XML node dump feature
Date: Tue, 25 Jun, 03:16
From:  Ben Grasset 
To:  FPC developers' list


> const 
>   A: TVec3F = (X: 1.2; Y: 2.4; Z: 3.8);
>   B: TVec3F = (X: 2.1; Y: 4.2; Z: 8.3);
>   // You can't do the next part currently, obviously
>   C: TVec3F = A + B;
>   D: TVec3F = A - B;
>   E: TVec3F = A * B;
>   F: TVec3F = A / B;
>

Sorry to say, but this should not work even with
*pure* functions.  Typed constants are not truly
constants.
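
A minimal sketch of why typed constants are not true constants. The record layout of TVec3F is assumed here (the quoted message does not show its declaration), and {$J+} is set explicitly so that the example does not depend on the default of the selected mode.

```pascal
program typedconst;
{$mode objfpc}
{$J+}  { writable typed constants, stated explicitly }

type
  { Assumed layout; the original declaration is not shown in the thread. }
  TVec3F = record
    X, Y, Z: Single;
  end;

const
  A: TVec3F = (X: 1.2; Y: 2.4; Z: 3.8);

begin
  { This compiles and runs: A behaves like an initialised variable. }
  A.X := 10.0;
  WriteLn(A.X:0:1);
end.
```

With {$J-} the assignment is rejected at compile time, but A is still stored like initialised data rather than folded into expressions the way an untyped constant is.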



Re: [fpc-devel] Possible idea... "safe" subroutines/methods

2019-05-05 Thread Marģers . via fpc-devel
> As mentioned in a previous message, fpc trunk
supports a volatile
> intrinsic:
>
http://wiki.freepascal.org/FPC_New_Features_Trunk#Support_for_.22volatile.22_intrinsic

My bad, I didn't know about the volatile intrinsic.
So, does it mean that the compiler is allowed to
optimize any variable, even promote global
variables to registers, as long as they are not in
volatile(...)?


> Jonas


Re: [fpc-devel] Possible idea... "safe" subroutines/methods

2019-05-05 Thread Marģers . via fpc-devel
As I understand it, this idea is a way around the keyword "volatile", in order to allow the compiler to perform more optimizations.
Then why go halfway by introducing "safe", when it would be better to introduce "volatile"? Not too long ago there was a discussion about it here, but it was strongly rejected by the core team.

I do not support the idea of "safe", as it is a partial solution to a more complex problem. It would be better to introduce "volatile" together with data flow analysis.
 



Re: [fpc-devel] Successful implementation of inline support for pure assembler routines on x86

2019-03-17 Thread Marģers . via fpc-devel
 

- Reply to message -
Subject: Re: [fpc-devel] Successful implementation
of inline support for pure assembler routines on x86
Date: 17 March 2019 19:38:03
From:  Florian Klämpfl 
To:  
> On 15.03.19 at 11:32, J. Gareth Moreton wrote:
> * using inline assembler is always the worst
> solution to do something,
> it normally means either:
> - the compiler misses a certain feature

The question is, who is going to implement every
imaginable feature in a reasonable time frame? For
example, intrinsics for BMI.

> - the compiler is generating bad code

Generated code for the x86_64 instruction set is
pretty good if the Pascal code is tuned to take
advantage of the optimizations currently done by the
compiler (the LLVM target does not do any better). But
assembler still gives a 10-20% boost, so there is
still room for improvement.

> * intrinsics provide a much better way to achieve
> exactly what this
> approach aims at:
> - they enable the use of all registers as the
> compiler does register
> allocation

Sadly, that does not apply to bsf.

> Is there any advantage over intrinsics, I missed
> so far?

To take advantage of the flags changed by an instruction.

I'm looking forward to the assembler inline
functionality being accepted. I would benefit from it.



Re: [fpc-devel] Successful implementation of inline supportforpureassembler routines on x86

2019-03-17 Thread Marģers . via fpc-devel
 

- Reply to message -
Subject: Re: [fpc-devel] Successful implementation
of inline supportforpureassembler routines on x86
Date: 18 March 2019 00:28:10
From:  J. Gareth Moreton 
To:  FPC developers' list


>   To use the integer clamp function as an
example (if x < 0 then x := 0):

> { Microsoft x64 calling convention... X is in ECX }
> function ClampInt(X: LongInt): LongInt;
assembler; nostackframe; inline;
> asm
>   XOR EAX, EAX
>   TEST ECX, ECX
>   CMOVG EAX, ECX
> end;

Try this code:
y:=0;
if x < 0 then x:=y;
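
Wrapped into a complete function, the suggestion could look like the sketch below. The name ClampInt is taken from the quoted assembler routine; the local variable Y and the small driver are additions for illustration. Whether the compiler turns the conditional assignment into a CMOV depends on the compiler version and the optimization settings, so that should be checked in the generated assembler rather than assumed.

```pascal
program clampdemo;
{$mode objfpc}

{ Pure Pascal variant of the quoted clamp: negative values become 0. }
function ClampInt(X: LongInt): LongInt;
var
  Y: LongInt;
begin
  Y := 0;
  { Assigning from a variable instead of a literal, as suggested above. }
  if X < 0 then
    X := Y;
  Result := X;
end;

begin
  WriteLn(ClampInt(-5));
  WriteLn(ClampInt(7));
end.
```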



Re: [fpc-devel] x86-64: MOVZX peephole optimisations

2019-03-08 Thread Marģers . via fpc-devel
> I'm a tad confused in regards to the best course
of action regarding MOVZX.  Many of the peephole
optimisations seek to change them to MOV followed
by AND (e.g. "movzbl (mem), %eax" to "mov (mem),
%al; and $0xff, %eax").  Does MOVZX have a
well-documented performance penalty in modern
processors that favours the MOV/AND combination? 
It seems odd because the combination implies a
pipeline stall, which becomes more pronounced if
the MOV instruction is reading from memory.

On the Intel Pentium and earlier processors the
MOV, AND combination was better, but nowadays CPUs
handle MOVZX as well as MOV.  The question is just
which CPU to optimize for.



Re: [fpc-devel] I'll be straight

2019-02-09 Thread Marģers . via fpc-devel
  
fillchar is a useful function, but it should be
avoided in time-critical code at any cost. The only
reasonable way to improve fillchar is to make it an
internal function, so that fpc can decide, depending
on the parameters, what the best solution is for
filling memory with the specified value. But that
would also slow down compilation. I'm not sure the
fpc core team will accept that kind of improvement.

Code optimized for speed most likely becomes less
readable and less maintainable. That is the
misfortune of optimization.

- Reply to message -
Subject: Re: [fpc-devel] I'll be straight
Date: 9 February 2019 10:35:51
From:  J. Gareth Moreton 
To:  FPC developers' list

> Thanks Michael,

> In the last patch that Jonas almost immediately
closed, the speed savings were inconclusive
because the number of cycles saved is probably
only a few dozen, but I would argue it makes the
code a bit more reasonable too because it replaces
things like loc[low(loc)] with loc[0] and fillchar
with a for-loop that initialises each element of
an array to zero (it's slightly faster because the
element size is a multiple of 16, while fillchar
is general-purpose and spends a lot of time
jumping around and even performing a
multiplication before it starts filling things
up).  I guess seeing it marked as "won't fix"
within 20 minutes was a moment of horror.
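
The two approaches being compared can be sketched as follows. The element type TElem and the array length are hypothetical (the patch itself is not shown in this thread); the only property carried over from the description is an element size that is a multiple of 16. The program only checks that both approaches leave the same bytes in memory and makes no claim about speed.

```pascal
program zerofill;
{$mode objfpc}

type
  { Hypothetical 16-byte element. }
  TElem = record
    Lo, Hi: QWord;
  end;

var
  ViaFillChar, ViaLoop: array[0..7] of TElem;
  I: Integer;
begin
  { Pre-fill with a non-zero pattern so that the zeroing is observable. }
  FillChar(ViaFillChar, SizeOf(ViaFillChar), $FF);
  FillChar(ViaLoop, SizeOf(ViaLoop), $FF);

  { Approach 1: general-purpose fillchar over the whole array. }
  FillChar(ViaFillChar, SizeOf(ViaFillChar), 0);

  { Approach 2: for-loop that initialises each element. }
  for I := Low(ViaLoop) to High(ViaLoop) do
  begin
    ViaLoop[I].Lo := 0;
    ViaLoop[I].Hi := 0;
  end;

  if CompareByte(ViaFillChar, ViaLoop, SizeOf(ViaLoop)) = 0 then
    WriteLn('both arrays are zeroed identically')
  else
    Halt(1);
end.
```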



Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-12 Thread Marģers . via fpc-devel
 

- Reply to message -
Subject: Re: [fpc-devel] x86_64 Optimizer Overhaul
Date: 12 December 2018 17:02:02
From:  J. Gareth Moreton 
To:  FPC developers' list

> By the way, what generates that set of
> operations? I'm curious because I want to
> see what's going on in the compiler. You
> see, "incq" and that "mov, add, mov" set
> aren't equivalent; anything over
> $1 gets truncated with the set,
> but not with "incq", although it's not a
> concern if only the lower 32 bits are
> used.

I have to agree, it's not equivalent. I have added an
example program so you can examine this situation.
It might or might not be an error.
Note: I use the compiler parameter -O4.
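
The difference between the two sequences can be modelled in plain Pascal. This is only a sketch of the arithmetic, not of the generated code: Inc on a QWord stands in for "incq" on the 64-bit stack slot, and the nested DWord casts stand in for the 32-bit "movl / addl" followed by a 64-bit store of the zero-extended register. The start value $FFFFFFFF is chosen here purely for illustration.

```pascal
program inctrunc;
{$mode objfpc}
{$R-}{$Q-}  { no range or overflow checks, so the casts simply truncate }

var
  Slot64, ViaInc, ViaMovAddMov: QWord;
begin
  Slot64 := $FFFFFFFF;  { illustrative value with all low 32 bits set }

  { Model of "incq mem": a full 64-bit increment. }
  ViaInc := Slot64;
  Inc(ViaInc);

  { Model of "movl mem,%eax; addl $1,%eax; movq %rax,mem":
    32-bit load, 32-bit add with wrap-around, zero-extended store. }
  ViaMovAddMov := QWord(DWord(DWord(Slot64) + 1));

  WriteLn(HexStr(ViaInc, 16));
  WriteLn(HexStr(ViaMovAddMov, 16));
end.
```

As long as the slot only ever holds a 32-bit longint, as in the example program, the two sequences are indistinguishable to the program logic.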

> If both combinations run at about the same
> speed, then "incq" is better just on
> account of code size.
I spent some time examining "incq mem" and "mov
add mov".
On my particular cpu, if "incq" is an independent
instruction, the actual cost is 1 clock cycle.
The "mov add mov" combination ended up at about
1 - 1.2 clock cycles. A chain of "mov add mov" was
always a few clocks slower than a chain of "incq"
of the same length.
But if "incq" falls into a severe dependency
chain, then "incq" executes 25% worse than "mov add
mov":
"incq" 4.5 clock cycles
"mov add mov" 3.8 clock cycles

I vote for shorter code and prefer "incq".

margers

program overhaul_incq;

var globalQ : longint;

function dummycall(a,b: longint):longint;
begin
 dummycall:=a+b;
end;

procedure fuu;
var k : longint;  { rbx for loop counter }
a,b,c,m,z,q  : longint; {no real use, just to occupy r12-r15}
sk : longint;  {no free registers left, so this becomes a temp on the stack}
begin
 sk:=0;
 q:=0; a:=0;
 for k:=0 to 100 do  { k takes rbx }
 begin
  { dummy math to keep busy registers r12 - r15 }
  c:=q+a;
  m:=k+1;
  { call discards  r8 - r11, rax, rdx, rcx, rdi, rsi - no use of them}
  z:=dummycall(k,c);
  q:=c+z;

  { as fpc doesn't use rbp for variables, }
  { we have no usable register left }
  { incq [mem] }
  inc(sk);
  {writeln(k,' ',q);}
 end;
 globalQ:=q;
end;

begin
 fuu;
 writeln(globalQ);
end.


Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-12 Thread Marģers . via fpc-devel
 

> Nice spot with the "incq" command there.  It
wasn't intentional for that to be split into 3
commands, but is likely just a side-effect of pass
1 not being run twice now... granted, since one of
my criteria was that the code should not be less
optimal, I'll see if I can watch out for that one.

Both versions are roughly equivalent in execution
speed.

> One interesting thing to note though is that the
read and add work on the 32-bit register, but then
the full 64-bit register is written.

Local variables are meant to be allocated in
registers, but since the procedure calls other
procedures, they are stored "temporarily" on the stack
as 64-bit registers.
It's not an error, or at least not an error in the
program logic in this case.


> > # [468] inc(sk);
> > --trunk -
> > incq 272(%rsp)

> > -- overhaul ---
> > movl 272(%rsp),%eax
> > addl $1,%eax
> > movq %rax,272(%rsp)

> > did you mean to be so?

> > margers



Re: [fpc-devel] x86_64 Optimizer Overhaul

2018-12-12 Thread Marģers . via fpc-devel
 

- Reply to message -
Subject: Re: [fpc-devel] x86_64 Optimizer Overhaul
Date: 6 December 2018 18:57:29
From:  J. Gareth Moreton 
To:  FPC developers' list

> I believed I've fixed the bug.  Thanks for your
help.

Now it's much better. -O3 and -O4 work fine.
Speed tests of my programs show no measurable
difference.


# [468] inc(sk);
--trunk  - 
incq 272(%rsp)

-- overhaul --- 
movl 272(%rsp),%eax
addl $1,%eax
movq %rax,272(%rsp)

Did you mean it to be so?

margers



[fpc-devel] x86_64 Optimizer Overhaul

2018-12-03 Thread Marģers . via fpc-devel
I ran it on Linux. Here is the problematic part of the code.

type PLongData = ^TLongData;
  TLongData = array [0..100] of longint;

function binarySearchLong ( sortedArray:PLongData; nLen, toFind:longint):longint;
var low, high, mid, l, h, m : longint;
begin
    { Returns index of toFind in sortedArray, or -1 if not found}
    low := 0;
    high := nLen - 1;

    l := sortedArray^[low];
    h := sortedArray^[high];

    while ((l <= toFind) and (h >= toFind)) do
    begin
        mid := (low + high) shr 1;   { var "low" in register r8d }
        m := sortedArray^[mid];

        if (m < toFind) then
        begin
            low := mid + 1;
            l := sortedArray^[low];

        { asm code generated
-- with trunk
    lea r8d, [r11d+1H]
    mov  esi, r8d
--end trunk
-- with overhaul   it never set r8d to new value, but should
    lea esi, [r11d+1H]
-- end  overhaul

    mov r10d, dword [rdi+rsi*4]
    jmp ?_00144

        }
        end else
        if (m > toFind) then
        begin
            high := mid - 1;
            h := sortedArray^[high];
        end else
        begin
            binarySearchLong:=mid;
            exit;
        end;

    end;

    if (sortedArray^[low] = toFind) then
    begin
        binarySearchLong:=low;
    end else
        binarySearchLong := -1; { Not found}
end;

- Reply to message -
Subject: Re: [fpc-devel] x86_64 Optimizer Overhaul
Date: 2 December 2018 23:32:36
From:  J. Gareth Moreton 
To:  FPC developers' list 

Thanks for the feedback.  Do you have a reproducible case, and does it fail on Linux or Windows?  I'll have a look for the infinite loops in the meantime.

Gareth aka. Kit



[fpc-devel] x86_64 Optimizer Overhaul

2018-12-03 Thread Marģers . via fpc-devel
> I've had problems testing it under Linux due to configuration difficulties, so if anyone is willing to try out "make all", I'll be most grateful. 

"make all" works well on Linux.

Compiler options -O3 and -O4 are broken.
I was able to compile my program, but at some point the program went into a never-ending loop: 100% CPU usage and no response.

My speed test program, compiled with -O2 and the optimizations made by the overhaul, was 2% slower than with current trunk.  I guess the optimizations are good for the compiler itself, but not so much for user programs.

margers



[fpc-devel] LLVM code generator

2018-12-03 Thread Marģers . via fpc-devel
> The support is currently only on the
> https://svn.freepascal.org/FPC/svn/fpc/branches/debug_eh branch.

I got sources from
https://svn.freepascal.org/svn/fpc/branches/debug_eh

> ** Linux: you may also have to specify the library path to libgcc_s.
> E.g. on Ubuntu 16.04:
> make LOCALOPT="-dllvm -Fullvm -Fl/usr/lib/gcc/x86_64-linux-gnu/5"
> OPT="-Fullvm -Fl/usr/lib/gcc/x86_64-linux-gnu/5" all -j 4 FPMAKEOPT="-T
> 4"

Compilation does not work for me with those options. I keep getting the following error:

Fatal: Cannot open whole program optimization feedback file "/home/blabla/src/llvm/compiler/pp1.wpo"
Fatal: Compilation aborted
Makefile:3912: recipe for target 'system.ppu' failed

The problem is LOCALOPT. As soon as it is passed as a parameter to make, the wpo files are not created.

margers
