Re: [fpc-devel] FPC 3.2.0RC1 released!

2020-04-01 Thread J. Gareth Moreton
I'm having problems connecting to the FTP server to download the release 
candidate - it keeps timing out.


Gareth aka. Kit

On 29/03/2020 19:18, Marco van de Voort wrote:

Hello,

We have placed the first release candidate of the Free Pascal Compiler
version 3.2.0 on our ftp servers.

You can help improve the upcoming 3.2.0 release by downloading and
testing this release candidate. If you want, you can report what you have
tested here: http://wiki.freepascal.org/Testers_3.2.0 or on the mailing list.

Changes that may break backwards compatibility will be documented at:
http://wiki.freepascal.org/User_Changes_3.2.0

Downloads are available at the main FTP server,

ftp://ftp.freepascal.org/pub/fpc/beta/3.2.0-rc1/

Enjoy!

The Free Pascal Compiler Team

For an overview of what is new see

https://wiki.freepascal.org/FPC_New_Features_3.2

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel



--
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus



Re: [fpc-devel] FPC 3.2.0RC1 released!

2020-04-01 Thread Joao Schuler
I regret to say that I can't reproduce my initial result showing a 9%
improvement with 3.2.0rc1 over 3.0.4. Both versions now show the same
speed.

I also compared 3.0.4 against trunk in another environment:
Ubuntu 18.04.2 LTS (GNU/Linux 4.15.0-1014-gcp x86_64)
cpu model name: Intel(R) Xeon(R) CPU @ 2.00GHz

This is the raw result from 3.0.4:
640 Examples seen. Accuracy:0.1006 Error:   1.79914 Loss:2.31176 Threads: 4 Forward time:  0.99s Backward time:  0.77s Step time:  1.51s
1280 Examples seen. Accuracy:0.1025 Error:   1.78724 Loss:2.26048 Threads: 4 Forward time:  0.99s Backward time:  0.75s Step time:  1.49s
1920 Examples seen. Accuracy:0.1087 Error:   1.78000 Loss:2.26476 Threads: 4 Forward time:  0.99s Backward time:  0.77s Step time:  1.49s

This is the raw result from trunk:
640 Examples seen. Accuracy:0.1175 Error:   1.79696 Loss:2.30112 Threads: 4 Forward time:  0.94s Backward time:  0.72s Step time:  1.46s
1280 Examples seen. Accuracy:0.1203 Error:   1.79009 Loss:2.27688 Threads: 4 Forward time:  0.94s Backward time:  0.73s Step time:  1.44s
1920 Examples seen. Accuracy:0.1226 Error:   1.76832 Loss:2.20816 Threads: 4 Forward time:  0.93s Backward time:  0.74s Step time:  1.44s

I usually look at the "Step time" for comparisons.
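
For reference, the "Step time" values quoted above can be reduced to one figure. The following is only a rough sketch in Python, not part of the original test program; the helper names `mean` and `relative_gain` are made up for illustration, and three samples per compiler is too few to treat the result as more than an indication.

```python
# Step times in seconds, copied from the raw results quoted in this message.
step_304 = [1.51, 1.49, 1.49]    # FPC 3.0.4
step_trunk = [1.46, 1.44, 1.44]  # FPC trunk


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def relative_gain(baseline, candidate):
    """Fractional reduction in mean step time of candidate versus baseline."""
    return (mean(baseline) - mean(candidate)) / mean(baseline)


gain = relative_gain(step_304, step_trunk)
print(f"mean 3.0.4: {mean(step_304):.3f}s, mean trunk: {mean(step_trunk):.3f}s")
print(f"trunk is about {gain * 100:.1f}% faster per step")
```

On these numbers the gain for trunk comes out at roughly 3%, well below the 9% reported initially.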

Tested with:
https://github.com/joaopauloschuler/neural-api/blob/master/examples/SimpleImageClassifier/SimpleImageClassifier.lpr


Re: [fpc-devel] FPC 3.2.0RC1 released!

2020-04-01 Thread J. Gareth Moreton
With your permission I'd like to use this project as part of my work to 
improve the peephole optimiser, among other things.  Any improvements 
probably won't happen until the version after 3.2.0, but I like a challenge!


J. Gareth Moreton, aka. Kit

P.S. One of the things I introduced into FPC 3.2.0 is the "vectorcall" 
calling convention for x86_64-win64.  It's currently mostly there for 
pure assembler routines and for interfacing with third-party libraries, 
since automatic vectorisation of Pascal code is still rather limited.  
It might be something to experiment with though.



On 01/04/2020 17:39, Kostas Michalopoulos via fpc-devel wrote:
Hm, for me the new compiler produces slightly slower results. The 
difference is tiny, but consistent. I use my raytracing benchmark from 
here:


http://runtimeterror.com/tools/raybench/

The results on my AMD Ryzen 3700X are:

FPC 3.0.4: 3.984 seconds
FPC 3.2.0RC1: 4.047 seconds

As I wrote, the difference is tiny, but over several runs it is pretty 
much consistent.

Note that these are for 32-bit Windows executables. Also, I'm using the 
Lazarus-bundled build mentioned by Martin Frb above.



On Tue, Mar 31, 2020 at 11:59 PM Florian Klämpfl wrote:

On 31.03.20 at 05:55, Joao Schuler wrote:
> Just tested with my own neural networks API and I can confirm
> that it works!
> Environment: WIN10 64bits AVX
>
> Tested with:
> https://github.com/joaopauloschuler/neural-api/blob/master/examples/SimpleImageClassifier/SimpleImageClassifier.lpr
>
> In this test, there is a performance gain (speed) against 3.0.4
> of about 9%.

Do you have numbers in comparison with trunk?



Re: [fpc-devel] Error building xtensa rtl

2020-04-01 Thread Christo Crause via fpc-devel
On Wed, Apr 1, 2020 at 6:58 PM Christo Crause wrote:

> Good idea, the alternative instructions are part of the core ISA so they
> should always be supported.
>

So the updated cgcpu patch is attached:

diff --git a/compiler/xtensa/cgcpu.pas b/compiler/xtensa/cgcpu.pas
index a1fdbede87..25b6cb40b9 100644
--- a/compiler/xtensa/cgcpu.pas
+++ b/compiler/xtensa/cgcpu.pas
@@ -167,14 +167,26 @@ implementation
               list.concat(taicpu.op_reg_reg_const_const(A_EXTUI,reg2,reg1,0,8));
             OS_S8:
               begin
-                list.concat(taicpu.op_reg_reg_const(A_SEXT,reg2,reg1,7));
+                if CPUXTENSA_HAS_SEXT in cpu_capabilities[current_settings.cputype] then
+                  list.concat(taicpu.op_reg_reg_const(A_SEXT,reg2,reg1,7))
+                else
+                  begin
+                    list.concat(taicpu.op_reg_reg_const(A_SLLI,reg2,reg1,24));
+                    list.concat(taicpu.op_reg_reg_const(A_SRAI,reg2,reg2,24));
+                  end;
                 if tosize=OS_16 then
                   list.concat(taicpu.op_reg_reg_const_const(A_EXTUI,reg2,reg2,0,16));
               end;
             OS_16:
               list.concat(taicpu.op_reg_reg_const_const(A_EXTUI,reg2,reg1,0,16));
             OS_S16:
-              list.concat(taicpu.op_reg_reg_const(A_SEXT,reg2,reg1,15));
+              if CPUXTENSA_HAS_SEXT in cpu_capabilities[current_settings.cputype] then
+                list.concat(taicpu.op_reg_reg_const(A_SEXT,reg2,reg1,15))
+              else
+                begin
+                  list.concat(taicpu.op_reg_reg_const(A_SLLI,reg2,reg1,16));
+                  list.concat(taicpu.op_reg_reg_const(A_SRAI,reg2,reg2,16));
+                end;
             else
               conv_done:=false;
           end;
@@ -258,7 +270,13 @@ implementation
         list.concat(taicpu.op_reg_ref(op,reg,href));
 
         if (fromsize=OS_S8) and not(tosize in [OS_S8,OS_8]) then
-          list.concat(taicpu.op_reg_reg_const(A_SEXT,reg,reg,7));
+          if CPUXTENSA_HAS_SEXT in cpu_capabilities[current_settings.cputype] then
+            list.concat(taicpu.op_reg_reg_const(A_SEXT,reg,reg,7))
+          else
+            begin
+              list.concat(taicpu.op_reg_reg_const(A_SLLI,reg,reg,24));
+              list.concat(taicpu.op_reg_reg_const(A_SRAI,reg,reg,24));
+            end;
         if (fromsize<>tosize) and (not (tosize in [OS_SINT,OS_INT])) then
           a_load_reg_reg(list,fromsize,tosize,reg,reg);
       end;
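
The fallback in the patch above replaces SEXT with a shift left followed by an arithmetic shift right by the same amount (24 for an 8-bit value, 16 for a 16-bit value). The following is a minimal Python model of that idea on 32-bit registers, not compiler code. The function names `slli`, `srai`, `sext` and `sext_via_shifts` are illustrative only, and the reading of the SEXT immediate as the index of the sign bit (7 or 15) is an assumption taken from how the patch pairs it with the shift counts.

```python
MASK32 = 0xFFFFFFFF


def slli(value, shift):
    """Shift a 32-bit register value left, discarding bits above bit 31."""
    return (value << shift) & MASK32


def srai(value, shift):
    """Arithmetic shift right of a 32-bit value: the sign bit is replicated."""
    if value & 0x80000000:
        value -= 1 << 32  # reinterpret the bit pattern as a negative integer
    return (value >> shift) & MASK32


def sext(value, sign_bit):
    """Reference sign extension from bit `sign_bit` (7 for 8-bit, 15 for 16-bit)."""
    width = sign_bit + 1
    low = value & ((1 << width) - 1)
    if low & (1 << sign_bit):
        low -= 1 << width
    return low & MASK32


def sext_via_shifts(value, sign_bit):
    """Sign extension built from the two shifts, as in the patch's else branch."""
    shift = 31 - sign_bit  # 24 for bit 7, 16 for bit 15
    return srai(slli(value, shift), shift)


# Both approaches should agree on every input, including values with
# unrelated upper bits set.
for sign_bit in (7, 15):
    for v in (0x00, 0x7F, 0x80, 0xFF, 0x1234, 0x8000, 0xFFFF, 0xDEADBEEF):
        assert sext_via_shifts(v, sign_bit) == sext(v, sign_bit)

print(hex(sext_via_shifts(0x80, 7)))  # 0xffffff80
```

The shift count is 31 minus the sign-bit index, which moves the sign bit into bit 31 so that the arithmetic shift replicates it on the way back.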


Re: [fpc-devel] Error building xtensa rtl

2020-04-01 Thread Christo Crause via fpc-devel
On Wed, Apr 1, 2020 at 12:06 AM Sven Barth wrote:

> Christo Crause wrote on Tue, 31 March 2020, 19:45:
>
>> On Tue, Mar 31, 2020 at 7:39 AM Sven Barth via fpc-devel <
>> fpc-devel@lists.freepascal.org> wrote:
>>
>>> On 30.03.2020 at 22:07, Christo Crause via fpc-devel wrote:
>>>
>>> I've noticed GCC uses the SLLI + SRAI instructions to perform sign
>>> extension on ESP8266.
>>>
>>> Since different CPUs can support different subsets of the Xtensa
>>> instructions do you think a finalizecode type function can be used as a
>>> post code generation step to map unsupported instructions to alternative
>>> sequences?
>>>
>>>
>>> These are simply different CPU types (-CpXXX or selected by the
>>> controller type) which the code generator will handle accordingly. Just
>>> like it's done with ARM, AVR and all other platforms.
>>>
>>
>> Please find attached a patch to rtl/embedded/Makefile* to handle subarch
>> similar to avr and others.
>>
>
> Did you manually edit the Makefile or regenerate it from the Makefile.fpc?
> If the former then your changes at the top will be overwritten by the next
> makefile regeneration.
>

I edited the Makefile directly to demonstrate the principle. In addition to
a change to Makefile.fpc, an update to fpcmake.ini is also required; see the
attached fpcmake.patch.


>> Also attached is a patch that checks whether the SEXT instruction is
>> available for the current subarchitecture; otherwise it generates the
>> SLLI + SRAI combination.
>>
>
> If SLLI and SRAI are supported by the other processors supported by FPC
> then you don't need to check for the processor type, checking against the
> capability for SEXT is enough. If some processor does not support SLLI or
> SRAI either then this would need to be a capability as well.
>

Good idea, the alternative instructions are part of the core ISA so they
should always be supported.

diff --git a/utils/fpcm/fpcmake.ini b/utils/fpcm/fpcmake.ini
index 42cd924023..1b3b179cfb 100644
--- a/utils/fpcm/fpcmake.ini
+++ b/utils/fpcm/fpcmake.ini
@@ -288,6 +288,13 @@ endif
 override FPCOPT+=-Cp$(SUBARCH)
 endif
 
+ifeq ($(FULL_TARGET),xtensa-embedded)
+ifeq ($(SUBARCH),)
+$(error When compiling for xtensa-embedded, a sub-architecture (e.g. SUBARCH=lx106 or SUBARCH=lx6) must be defined)
+endif
+override FPCOPT+=-Cp$(SUBARCH)
+endif
+
 # Full name of the target, including CPU and OS. For OSs limited
 # to 8.3 we only use the target OS
 ifneq ($(findstring $(OS_SOURCE),$(LIMIT83fs)),)


Re: [fpc-devel] FPC 3.2.0RC1 released!

2020-04-01 Thread Kostas Michalopoulos via fpc-devel
Hm, for me the new compiler produces slightly slower results. The
difference is tiny, but consistent. I use my raytracing benchmark from here:

http://runtimeterror.com/tools/raybench/

The results on my AMD Ryzen 3700X are:

FPC 3.0.4: 3.984 seconds
FPC 3.2.0RC1: 4.047 seconds

As I wrote, the difference is tiny, but over several runs it is pretty much
consistent.

Note that these are for 32-bit Windows executables. Also, I'm using the
Lazarus-bundled build mentioned by Martin Frb above.
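
To put a number on "tiny", the two timings above can be compared directly. This is just a quick Python sketch using the single pair of figures quoted in this message; the variable names are illustrative, and run-to-run variance is not captured.

```python
# Raybench timings in seconds, as quoted in this message.
t_304 = 3.984      # FPC 3.0.4
t_320rc1 = 4.047   # FPC 3.2.0RC1

# Fractional slowdown of 3.2.0RC1 relative to 3.0.4.
slowdown = (t_320rc1 - t_304) / t_304
print(f"3.2.0RC1 is about {slowdown * 100:.1f}% slower")
```

That works out to a difference of roughly 1.6% on this benchmark.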


On Tue, Mar 31, 2020 at 11:59 PM Florian Klämpfl wrote:

> On 31.03.20 at 05:55, Joao Schuler wrote:
> > Just tested with my own neural networks API and I can confirm that it
> > works!
> > Environment: WIN10 64bits AVX
> >
> > Tested with:
> > https://github.com/joaopauloschuler/neural-api/blob/master/examples/SimpleImageClassifier/SimpleImageClassifier.lpr
> >
> > In this test, there is a performance gain (speed) against 3.0.4 of about
> > 9%.
>
> Do you have numbers in comparison with trunk?