Re: [fpc-devel] An interesting thought... AI

2022-11-10 Thread Joao Schuler via fpc-devel
This is an interesting idea indeed.

https://www.researchgate.net/publication/343322212_Automatic_Code_Optimization_With_Machine_Learning_And_Combinatorial_Optimization

https://www.ijresm.com/Vol.2_2019/Vol2_Iss4_April19/IJRESM_V2_I4_149.pdf  -
Compiler Optimization using Artificial Intelligence

https://arxiv.org/pdf/2110.09610.pdf -  A Survey on Machine Learning
Techniques for Source Code Analysis

https://openreview.net/forum?id=SKat5ZX5RET - Self-Programming Artificial
Intelligence Using Code-Generating Language Models

The above links might give ideas for some googling.

On Thu, Nov 10, 2022 at 3:10 PM J. Gareth Moreton via fpc-devel <
fpc-devel@lists.freepascal.org> wrote:

> Hi everyone,
>
> This has been something that has been on my mind for a while now, but
> with my increasingly more complex optimisations being developed for the
> Free Pascal Compiler and the code becoming an ever bigger spiderweb of
> conditions, it got me to start wondering... might compiler optimisation
> be a candidate for AI? Often I try to hand-optimise assembly language to
> get the same output in fewer cycles (and fewer bytes too if possible),
> and then see if I can program the compiler to match it.  I can't hope to
> catch every possible optimisation though, and I wonder if using an AI in
> some way to develop more efficient machine code has ever been a serious
> contender for research.  I have heard of stories like the Deepmind AI
> finding a faster way to multiply matrices, so it seems logical that it
> can improve instruction processes.
>
> This is probably a lazy question, but what would be a good set of
> resources when it comes to beginning machine learning, or at the very
> least building simple models?  When it comes to hardware, I have a
> couple of 3060 Tis at my disposal for some parallel computation.
>
> Kit
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] State of SSE/AVX intrinsics

2020-04-21 Thread Joao Schuler
Just a point for consideration: I'm not sure whether data alignment will
improve speed on future processors. See:
https://lemire.me/blog/2012/05/31/data-alignment-for-speed-myth-or-reality/

Food for thought: imagine we had three dynamic arrays of single (32-bit
floating point) values, a, b and c, with 1 million values each. I would love
to have something like this:
a := b + c;
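
The syntax wished for above can be approximated with a user-defined operator. What follows is only a minimal sketch, assuming `{$mode objfpc}` without the ArrayOperators modeswitch; the type name `TSingleArray` and the overload itself are hypothetical and not part of the RTL. The loop is plain scalar code, so it illustrates the notation only and says nothing about SSE/AVX code generation.

```pascal
program ArrayAddSketch;
{$mode objfpc}
{$assertions on}

type
  TSingleArray = array of Single;

// Hypothetical overload: element-wise sum of two equally sized arrays.
operator + (const B, C: TSingleArray) R: TSingleArray;
var
  I: Integer;
begin
  Assert(Length(B) = Length(C), 'arrays must have the same length');
  SetLength(R, Length(B));
  for I := 0 to High(B) do
    R[I] := B[I] + C[I];
end;

var
  A, B, C: TSingleArray;
  I: Integer;
begin
  // One million values each, as in the message above.
  SetLength(B, 1000000);
  SetLength(C, 1000000);
  for I := 0 to High(B) do
  begin
    B[I] := I;
    C[I] := 2 * I;
  end;

  A := B + C;  // calls the overload defined above
  WriteLn('A[3] = ', A[3]:0:1);
end.
```

Whether the compiler vectorises the inner loop is a separate question from the syntax; an overload like this could equally wrap a hand-written assembler routine.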


Re: [fpc-devel] FPC and Z80

2020-04-19 Thread Joao Schuler
I think that you'll find some interesting links on this thread:
https://forum.lazarus.freepascal.org/index.php/topic,38569.msg262288.html#msg262288


Re: [fpc-devel] FPC 3.2.0RC1 released!

2020-04-01 Thread Joao Schuler
I regret to say that I can't reproduce my initial result showing a 9%
improvement of 3.2.0rc1 over 3.0.4. Both versions show the same speed
now.

I also compared 3.0.4 against trunk in another environment:
Ubuntu 18.04.2 LTS (GNU/Linux 4.15.0-1014-gcp x86_64)
cpu model name: Intel(R) Xeon(R) CPU @ 2.00GHz

This is the raw result from 3.0.4:
640 Examples seen. Accuracy:0.1006 Error:   1.79914 Loss:2.31176 Threads: 4 Forward time:  0.99s Backward time:  0.77s Step time:  1.51s
1280 Examples seen. Accuracy:0.1025 Error:   1.78724 Loss:2.26048 Threads: 4 Forward time:  0.99s Backward time:  0.75s Step time:  1.49s
1920 Examples seen. Accuracy:0.1087 Error:   1.78000 Loss:2.26476 Threads: 4 Forward time:  0.99s Backward time:  0.77s Step time:  1.49s

This is the raw result from trunk:
640 Examples seen. Accuracy:0.1175 Error:   1.79696 Loss:2.30112 Threads: 4 Forward time:  0.94s Backward time:  0.72s Step time:  1.46s
1280 Examples seen. Accuracy:0.1203 Error:   1.79009 Loss:2.27688 Threads: 4 Forward time:  0.94s Backward time:  0.73s Step time:  1.44s
1920 Examples seen. Accuracy:0.1226 Error:   1.76832 Loss:2.20816 Threads: 4 Forward time:  0.93s Backward time:  0.74s Step time:  1.44s

I usually look at the "Step time" for comparisons.
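
For anyone who wants to turn the "Step time" columns into a single number, here is a small sketch that averages the three step times from each log and prints the relative reduction. The figures are copied from the logs in this message; the program and its identifiers are only illustrative. Three samples from a single run are a thin basis, so the result (roughly a 3% lower step time on trunk) should be read as indicative at best.

```pascal
program StepTimeCompare;
{$mode objfpc}

const
  // Step times in seconds, copied from the two raw logs in this message.
  Fpc304: array[0..2] of Double = (1.51, 1.49, 1.49);
  Trunk:  array[0..2] of Double = (1.46, 1.44, 1.44);

// Arithmetic mean of the given samples.
function Mean(const Values: array of Double): Double;
var
  I: Integer;
begin
  Result := 0.0;
  for I := Low(Values) to High(Values) do
    Result := Result + Values[I];
  Result := Result / Length(Values);
end;

var
  MeanOld, MeanNew, ReductionPct: Double;
begin
  MeanOld := Mean(Fpc304);
  MeanNew := Mean(Trunk);
  // Relative reduction of the step time, as a percentage of the 3.0.4 value.
  ReductionPct := 100.0 * (MeanOld - MeanNew) / MeanOld;

  WriteLn('3.0.4 mean step time: ', MeanOld:0:3, ' s');
  WriteLn('trunk mean step time: ', MeanNew:0:3, ' s');
  WriteLn('reduction:            ', ReductionPct:0:1, ' %');
end.
```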

Tested with:
https://github.com/joaopauloschuler/neural-api/blob/master/examples/SimpleImageClassifier/SimpleImageClassifier.lpr


Re: [fpc-devel] FPC 3.2.0RC1 released!

2020-03-30 Thread Joao Schuler
Just tested with my own neural networks API and I can confirm that it works!
Environment: WIN10 64bits AVX

Tested with:
https://github.com/joaopauloschuler/neural-api/blob/master/examples/SimpleImageClassifier/SimpleImageClassifier.lpr


In this test, there is a speed gain of about 9% over 3.0.4.

Congrats!


[fpc-devel] marked as inline is not inlined

2019-12-27 Thread Joao Schuler
Hello,
I'm not sure if it's happening only to me, but I have a feeling that trunk
produces more "marked as inline is not inlined" warnings than FPC 3.0.4.

Here is an example, in case anyone wants to build it and see:
https://github.com/joaopauloschuler/neural-api/tree/master/examples/XorAndOr


BTW, trunk is faster than FPC 3.0.4 on my end.

Long live Pascal.


Re: [fpc-devel] Simplicity vs. Complexity

2019-03-26 Thread Joao Schuler
Dear Moreton,
I think that you might have touched on the most important question of all.
I'll express my own professional opinion on this (not wishing to
convince others, just expressing my own view).

I've been thinking about this question for more than 20 years. If you own a
company and your developers optimize code for speed to the point that it is
too hard to find new employees able to understand it, you will be in
a dangerous and costly zone. Firstly, because you have to spend too much on
finding brave, qualified developers. Secondly, the harder the code is to
understand, the harder future improvements and eventual bug fixes will
be. Imagine that you own a company and your developers can't fix a bug
introduced 10 years earlier... Worse, some companies that I worked for in
the past had contractual requirements to fix some problems within just 2
hours... It's a horrible place to be. In my mind, when you start modifying
code with only speed in mind, your code optimization reaches a local
maximum (https://en.wikipedia.org/wiki/Maxima_and_minima). No one can safely
optimize code that they don't fully understand. So, there is a paradox: by
optimizing, you may prevent future optimizations, because the human energy
needed to understand the code might deter future optimization attempts.
The harder the code gets, the more voices will say "we need to trash
and recode this".

There are ways to deal with extreme optimizations. One way is extreme
documentation. As an example, there are 12 pages of documentation about
one page of code here: https://cnugteren.github.io/tutorial/pages/page1.html .

Just as this email will be, our code will be read by others. I don't care if
my mind understands it. Will others understand it?

Food for thought I hope.

Cheers!

On Tue, Mar 26, 2019 at 5:20 PM J. Gareth Moreton wrote:

> This is a question regarding the compiler
> in general, and I sense there is no single
> correct answer.
>
> As you may already know, FPC compiles
> source code into intermediate nodes. Most
> of these are quite straightforward, like
> addition and a procedure call, but then
> you get quite a few that map onto internal
> functions and intrinsics like "abs" and
> are otherwise handled directly by the
> compiler rather than calling a function in
> the System unit, say.
>
> In your experience, and through theory,
> where should the line be drawn with
> internal routines and explicitly writing a
> function? I can see advantages in both
> approaches, like it's easier to assemble a
> node into a specific instruction set, but
> it can cause a lot of bloat in the
> compiler, while having an explicit
> function reduces this compiler complexity
> and allows for internal code improvements
> and better acceptance of features like
> pure functions, but may increase
> compilation time and make optimisation
> more difficult, depending on how it is
> implemented.
>
> Just looking for discussion.
>
> Gareth aka. Kit


Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

2018-10-25 Thread Joao Schuler
Hello Simon, I'm wondering whether you have code examples that provoke the
problems you are experiencing. It will be easier to measure and test
improvements with test cases. Solutions might not come from a single person
or team, so I'm not sure how to apply the bounty in the most effective and
fair way.


Re: [fpc-devel] Attn. Florian, r39759

2018-09-17 Thread Joao Schuler
Assuming:  v=1; x=10; y=3

(v-x) < (y-x) == (1-10) < (3-10) == -9 < -7 == true

(v>=x) and (v<=y) == (1>=10) and (1<=3) == false and true == false
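
The counterexample can also be checked mechanically. The sketch below only evaluates the two expressions exactly as written in this message, using signed 32-bit integers; the function names are hypothetical and nothing here reproduces the actual compiler code in r39759.

```pascal
program RangeCheckCounterexample;
{$mode objfpc}

// Straightforward form: is V inside the closed range [X, Y]?
function InRangeDirect(V, X, Y: LongInt): Boolean;
begin
  Result := (V >= X) and (V <= Y);
end;

// Rewritten form from the message, evaluated with signed subtraction.
function InRangeRewritten(V, X, Y: LongInt): Boolean;
begin
  Result := (V - X) < (Y - X);
end;

begin
  // v=1; x=10; y=3 as assumed in the message.
  WriteLn('direct:    ', InRangeDirect(1, 10, 3));
  WriteLn('rewritten: ', InRangeRewritten(1, 10, 3));
end.
```

With these inputs the direct form yields FALSE and the rewritten form yields TRUE, so the two expressions are not equivalent when x is greater than y.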


Re: [fpc-devel] AVX 512 - Can't compile vaddps zmm1, zmm2, zmm3

2018-09-07 Thread Joao Schuler
I can confirm that this works:

  VEXTRACTF32x4 xmm2, zmm0, 1
  VEXTRACTF32x4 xmm3, zmm0, 2
  VEXTRACTF32x4 xmm4, zmm0, 3

Well done!

I have more good news: I've just finished coding support for AVX512 in my
own project: https://www.youtube.com/watch?v=qGnfwpKUTIQ

I'm getting loads of warnings "marked as inline is not inlined". Is there
anything I can do to be able to properly compile with inlines? I can't tell
users to use your branch as of now as the lack of inline decreases speed.

Anyway, thank you!


Re: [fpc-devel] AVX 512 - Can't compile vaddps zmm1, zmm2, zmm3

2018-08-26 Thread Joao Schuler
Quick update in reply to my own question: VEXTRACTF128 should not support
zmm registers. Therefore, the current behavior is correct. This is the
reference:
https://www.felixcloutier.com/x86/VEXTRACTF128:VEXTRACTF32x4:VEXTRACTF64x2:VEXTRACTF32x8:VEXTRACTF64x4.html

Anyway, supporting VEXTRACTF32x4 would help me.


Re: [fpc-devel] AVX 512 - Can't compile vaddps zmm1, zmm2, zmm3

2018-08-25 Thread Joao Schuler
Hello,
Almost everything I tested works perfectly.

This is what I tested so far:

zmm registers are properly recognized:
  end [
    'RAX', 'RCX', 'RDX',
    'ymm2', 'ymm3', 'ymm4', 'ymm5', 'ymm0'
    {$IFDEF AVX512}, 'zmm2', 'zmm3', 'zmm0'{$ENDIF}
  ];

*These commands work:*

  VBROADCASTSS zmm0, [rdx]
  vmulps  zmm2, zmm0, [rax]
  vmulps  zmm3, zmm0, [rax+64]
  vmulps  zmm2, zmm5, [rdx]
  vmulps  zmm3, zmm5, [rdx+64]
  vmovups [rax],zmm2
  vmovups [rax+64], zmm3
  vaddps  zmm2, zmm2, [rdx]
  vaddps  zmm3, zmm3, [rdx+64]
  vsubps  zmm2, zmm2, [rdx]
  vsubps  zmm3, zmm3, [rdx+64]

I'm getting more "inline" warnings than usual. Unfortunately, SourceForge
is offline now and I can't share the code.

Question: should the following 2 commands be supported?

   - vfmadd231ps zmm0, zmm5, [rax]
   - VEXTRACTF128 xmm3, zmm0, 2

Congrats on the work,
JP


Re: [fpc-devel] AVX 512 - Can't compile vaddps zmm1, zmm2, zmm3

2018-08-22 Thread Joao Schuler
THANK YOU SO MUCH!!! I intend to test over the weekend.


Re: [fpc-devel] AVX 512 - Can't compile vaddps zmm1, zmm2, zmm3

2018-06-17 Thread Joao Schuler
Thank you Kit.

[VADDPS]
(Ch_Wop3, Ch_Rop2, Ch_Rop1)
xmmreg,xmmreg,xmmrm  \362\370\1\x58\75\120
AVX,SANDYBRIDGE
ymmreg,ymmreg,ymmrm  \362\364\370\1\x58\75\120
AVX,SANDYBRIDGE

Regarding the opcode, what base are these numbers in? They don't look
hexadecimal. Example:

\362\364\370\  - are these 16-bit numbers (they look too big for bytes)?

x58 - is this 58 hexadecimal?
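
One possible reading, offered as an assumption rather than something confirmed in this thread: if the table follows the NASM-style convention for instruction tables, the digits after a backslash are octal and `\x58` is a hexadecimal byte. Under that assumption every value fits in a single byte, which the sketch below checks by converting the codes quoted above. The helper `OctalToInt` is hypothetical; what each code means to the assembler is not covered here.

```pascal
program OctalCodes;
{$mode objfpc}
uses
  SysUtils;

// Interpret a string of octal digits, e.g. '362', as a number.
function OctalToInt(const Digits: string): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(Digits) do
    Result := Result * 8 + (Ord(Digits[I]) - Ord('0'));
end;

const
  // Backslash codes taken from the VADDPS entry quoted above.
  Codes: array[0..4] of string = ('362', '364', '370', '75', '120');
var
  I, Value: Integer;
begin
  for I := Low(Codes) to High(Codes) do
  begin
    Value := OctalToInt(Codes[I]);
    WriteLn('\', Codes[I], ' = ', Value, ' decimal = $', IntToHex(Value, 2));
  end;
end.
```

Read as octal, `\362` is 242 decimal ($F2), which is below 256 and therefore a byte rather than a 16-bit number.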

As an example, for

vaddps ymm0, ymm1, ymm3

I was expecting:

C5F458C3
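
The expected bytes can be reproduced by hand from the two-byte VEX prefix layout (C5, then one byte holding inverted R, inverted vvvv, L and pp), followed by the opcode and a ModRM byte. The sketch below does that arithmetic for registers ymm0 to ymm7 only. `EncodeVaddpsYmm` is a hypothetical helper written for this illustration; it is not how the FPC assembler derives the encoding from x86ins.dat.

```pascal
program VexEncodeSketch;
{$mode objfpc}
uses
  SysUtils;

// Encode "vaddps ymmDest, ymmSrc1, ymmSrc2" for register numbers 0..7,
// using the two-byte VEX prefix. Returns the bytes as a hex string.
function EncodeVaddpsYmm(Dest, Src1, Src2: Byte): string;
var
  VexByte, ModRM: Byte;
begin
  // Bit 7: inverted R, which is 1 for registers 0..7.
  // Bits 6..3: inverted number of the first source register (vvvv).
  // Bit 2: L = 1 selects 256-bit operands. Bits 1..0: pp = 00 (no prefix).
  VexByte := $80 or (((not Src1) and $0F) shl 3) or $04;
  // mod = 11 (register direct), reg = destination, rm = second source.
  ModRM := $C0 or ((Dest and 7) shl 3) or (Src2 and 7);
  // $58 is the ADDPS opcode in the 0F opcode map.
  Result := 'C5' + IntToHex(VexByte, 2) + '58' + IntToHex(ModRM, 2);
end;

begin
  WriteLn(EncodeVaddpsYmm(0, 1, 3));
end.
```

For ymm0, ymm1, ymm3 this prints C5F458C3, matching the encoding expected above.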

On Mon, Jun 18, 2018 at 5:26 AM, J. Gareth Moreton <
gar...@moreton-family.com> wrote:

> The file you want is compiler/x86/x86ins.dat, which contains the syntax
> information for all of the x86-64 assembler commands.
>
> A tool that's run by "make" will then generate a number of .inc files that
> are then referenced by the source code.
>
> Gareth aka. Kit
>
>
>
> On Sun 17/06/18 20:59 , Joao Schuler j...@schulers.com sent:
>
> I can try to add support for vaddps and the other instructions I need most
> in AVX512. Which file contains the code for the above, please?
>
> On Sun, Jun 17, 2018 at 6:30 PM, Florian Klämpfl wrote:
>
>> On 17.06.2018 at 06:37, Joao Schuler wrote:
>>
>>> Hi,
>>> I started testing the AVX512 branch:
>>> https://svn.freepascal.org/svn/fpc/branches/tg74/avx512/
>>>
>>> This is the code:
>>>
>>> {$ASMMODE intel}
>>> asm
>>>  vaddps  zmm1, zmm2, zmm3
>>> end;
>>>
>>> The error message is: invalid combination of opcode and operands.
>>>
>>> The assembly code looks correct to me:
>>> http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=22E
>>> 1CA82C5506AA7E639CACEB96C72AB?doi=10.1.1.697.2949=rep1=pdf
>>>
>>> (look at page 19 above).
>>>
>>> Am I doing something very wrong?
>>>
>>
>> No, this is a feature branch and a work in progress. It is only useful to
>> check out if you want to contribute to it.
>>
>>> Should I submit a bug report?
>>
>> Only if you submit a patch with it :)


Re: [fpc-devel] AVX 512 - Can't compile vaddps zmm1, zmm2, zmm3

2018-06-17 Thread Joao Schuler
I can try to add support for vaddps and the other instructions I need most
in AVX512. Which file contains the code for the above, please?

On Sun, Jun 17, 2018 at 6:30 PM, Florian Klämpfl wrote:

> On 17.06.2018 at 06:37, Joao Schuler wrote:
>
>> Hi,
>> I started testing the AVX512 branch:
>> https://svn.freepascal.org/svn/fpc/branches/tg74/avx512/
>>
>> This is the code:
>>
>> {$ASMMODE intel}
>> asm
>>  vaddps  zmm1, zmm2, zmm3
>> end;
>>
>> The error message is: invalid combination of opcode and operands.
>>
>> The assembly code looks correct to me:
>> http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=22E
>> 1CA82C5506AA7E639CACEB96C72AB?doi=10.1.1.697.2949=rep1=pdf
>>
>> (look at page 19 above).
>>
>> Am I doing something very wrong?
>>
>
> No, this is a feature branch and a work in progress. It is only useful to
> check out if you want to contribute to it.
>
>> Should I submit a bug report?
>
> Only if you submit a patch with it :)


[fpc-devel] AVX 512 - Can't compile vaddps zmm1, zmm2, zmm3

2018-06-17 Thread Joao Schuler
Hi,
I started testing the AVX512 branch:
https://svn.freepascal.org/svn/fpc/branches/tg74/avx512/

This is the code:

{$ASMMODE intel}
asm
vaddps  zmm1, zmm2, zmm3
end;

The error message is: invalid combination of opcode and operands.

The assembly code looks correct to me:
http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=22E1CA82C5506AA7E639CACEB96C72AB?doi=10.1.1.697.2949=rep1=pdf

(look at page 19 above).

Am I doing something very wrong? Should I submit a bug report?

Kind regards,
JP.