Re: [fpc-devel] An interesting thought... AI
This is an interesting idea indeed.

https://www.researchgate.net/publication/343322212_Automatic_Code_Optimization_With_Machine_Learning_And_Combinatorial_Optimization
https://www.ijresm.com/Vol.2_2019/Vol2_Iss4_April19/IJRESM_V2_I4_149.pdf - Compiler Optimization using Artificial Intelligence
https://arxiv.org/pdf/2110.09610.pdf - A Survey on Machine Learning Techniques for Source Code Analysis
https://openreview.net/forum?id=SKat5ZX5RET - Self-Programming Artificial Intelligence Using Code-Generating Language Models

The above links might give ideas for some googling.

On Thu, Nov 10, 2022 at 3:10 PM J. Gareth Moreton via fpc-devel <fpc-devel@lists.freepascal.org> wrote:
> Hi everyone,
>
> This has been something that has been on my mind for a while now, but with my increasingly more complex optimisations being developed for the Free Pascal Compiler and the code becoming an ever bigger spiderweb of conditions, it got me to start wondering... might compiler optimisation be a candidate for AI? Often I try to hand-optimise assembly language to get the same output in fewer cycles (and fewer bytes too if possible), and then see if I can program the compiler to match it. I can't hope to catch every possible optimisation though, and I wonder if using an AI in some way to develop more efficient machine code has ever been a serious contender for research. I have heard of stories like the Deepmind AI finding a faster way to multiply matrices, so it seems logical that it can improve instruction processes.
>
> This is probably a lazy question, but what would be a good set of resources when it comes to beginning machine learning, or at the very least building simple models? When it comes to hardware, I have a couple of 3060 Tis at my disposal for some parallel computation.
>
> Kit

___
fpc-devel maillist - fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] State of SSE/AVX intrinsics
Just as a point for consideration, I'm not sure that data alignment will improve speed on future processors: https://lemire.me/blog/2012/05/31/data-alignment-for-speed-myth-or-reality/

Food for thought: imagine we had three dynamic arrays of single (32-bit floating point) values, a, b and c, with 1 million values each. I would love to be able to write something like this:

a := b + c;
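To make the wished-for semantics concrete: `a := b + c` would presumably mean element-wise addition, which a compiler could then map onto packed SSE/AVX adds. The sketch below only illustrates that meaning in Python; it is not FPC code, and the helper name `add_singles` is made up for this example.

```python
from array import array


def add_singles(b, c):
    """Element-wise b + c for two equal-length arrays of 32-bit floats."""
    if len(b) != len(c):
        raise ValueError("arrays must have the same length")
    # Type code "f" stores each element as a 32-bit float, like Pascal's single.
    return array("f", (x + y for x, y in zip(b, c)))


b = array("f", [1.0, 2.5, -3.0])
c = array("f", [0.5, 0.5, 3.0])
a = add_singles(b, c)
print(list(a))
```

Each output element depends only on the matching input elements, which is what makes this kind of loop a natural fit for SIMD instructions.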
Re: [fpc-devel] FPC and Z80
I think that you'll find some interesting links on this thread: https://forum.lazarus.freepascal.org/index.php/topic,38569.msg262288.html#msg262288
Re: [fpc-devel] FPC 3.2.0RC1 released!
I regret to say that I can't reproduce my initial result showing a 9% improvement on 3.2.0rc1 against 3.0.4. Both versions show the same speed now.

I also compared 3.0.4 against trunk in another environment:
Ubuntu 18.04.2 LTS (GNU/Linux 4.15.0-1014-gcp x86_64)
cpu model name: Intel(R) Xeon(R) CPU @ 2.00GHz

This is the raw result from 3.0.4:
640 Examples seen. Accuracy:0.1006 Error: 1.79914 Loss:2.31176 Threads: 4 Forward time: 0.99s Backward time: 0.77s Step time: 1.51s
1280 Examples seen. Accuracy:0.1025 Error: 1.78724 Loss:2.26048 Threads: 4 Forward time: 0.99s Backward time: 0.75s Step time: 1.49s
1920 Examples seen. Accuracy:0.1087 Error: 1.78000 Loss:2.26476 Threads: 4 Forward time: 0.99s Backward time: 0.77s Step time: 1.49s

This is the raw result from trunk:
640 Examples seen. Accuracy:0.1175 Error: 1.79696 Loss:2.30112 Threads: 4 Forward time: 0.94s Backward time: 0.72s Step time: 1.46s
1280 Examples seen. Accuracy:0.1203 Error: 1.79009 Loss:2.27688 Threads: 4 Forward time: 0.94s Backward time: 0.73s Step time: 1.44s
1920 Examples seen. Accuracy:0.1226 Error: 1.76832 Loss:2.20816 Threads: 4 Forward time: 0.93s Backward time: 0.74s Step time: 1.44s

I usually look at the "Step time" for comparisons.

Tested with: https://github.com/joaopauloschuler/neural-api/blob/master/examples/SimpleImageClassifier/SimpleImageClassifier.lpr
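To put a number on the 3.0.4 versus trunk comparison, the "Step time" values quoted above can be averaged. This is only a rough sketch in Python: it uses the three samples per compiler shown in this message, so it indicates a trend and is not a proper benchmark. The variable and function names are made up for the example.

```python
from statistics import mean

# "Step time" values in seconds, copied from the raw results in this message.
step_304 = [1.51, 1.49, 1.49]
step_trunk = [1.46, 1.44, 1.44]


def relative_reduction(baseline, candidate):
    """Fraction by which the mean of candidate is lower than the mean of baseline."""
    return (mean(baseline) - mean(candidate)) / mean(baseline)


reduction = relative_reduction(step_304, step_trunk)
print(f"mean 3.0.4: {mean(step_304):.3f}s, mean trunk: {mean(step_trunk):.3f}s")
print(f"trunk step time is {reduction:.1%} lower")
```

With these samples the mean step time drops from about 1.497s to about 1.447s, roughly 3.3% lower on trunk.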
Re: [fpc-devel] FPC 3.2.0RC1 released!
Just tested with my own neural networks API and I can confirm that it works!

Environment: WIN10 64 bits AVX
Tested with: https://github.com/joaopauloschuler/neural-api/blob/master/examples/SimpleImageClassifier/SimpleImageClassifier.lpr

In this test, there is a speed gain of about 9% over 3.0.4.

Congrats!
[fpc-devel] marked as inline is not inlined
Hello,

I'm not sure if it's happening only to me, but I have a feeling that trunk produces more "marked as inline is not inlined" warnings than FPC 3.0.4.

This is an example in case anyone wants to build it and see: https://github.com/joaopauloschuler/neural-api/tree/master/examples/XorAndOr

BTW, trunk is faster than FPC 3.0.4 at my end.

Long live Pascal.
Re: [fpc-devel] Simplicity vs. Complexity
Dear Moreton,

I think that you might have touched on the most important question of all. I'll express my own professional opinion on this (not wishing to convince others, just expressing my own). I've been thinking about this question for more than 20 years.

If you own a company and your developers optimize code for speed to the point that it is too hard to find new employees able to understand it, you will be in a dangerous and costly zone. Firstly, because you have to spend too much finding brave and qualified developers. Secondly, the harder the code is to understand, the harder future improvements and eventual bug fixes will be. Imagine that you own a company and your developers can't fix a bug introduced 10 years before... Worse, some companies that I worked for in the past had contractual requirements to fix some problems in just 2 hours... It's a horrible place to be.

In my mind, when you start modifying code with only speed in mind, your code optimization reaches a local maximum (https://en.wikipedia.org/wiki/Maxima_and_minima). No one can safely optimize code that they don't fully understand. So, there is a paradox: by optimizing, you may prevent future optimizations, because the human energy needed to understand the code might deter future optimization attempts. The harder the code gets, the more voices will say "we need to trash and recode this".

There are ways to deal with extreme optimizations. One way is extreme documentation. As an example, there are 12 pages of documentation about one page of code here: https://cnugteren.github.io/tutorial/pages/page1.html .

Like this email, our code will be read by others. I don't care if my mind understands it. Will others understand it?

Food for thought I hope.

Cheers!

On Tue, Mar 26, 2019 at 5:20 PM J. Gareth Moreton wrote:
> This is a question regarding the compiler in general, and I sense there is no single correct answer.
>
> As you may already know, FPC compiles source code into intermediate nodes. Most of these are quite straightforward, like addition and a procedure call, but then you get quite a few that map onto internal functions and intrinsics like "abs" and are otherwise handled directly by the compiler rather than calling a function in the System unit, say.
>
> In your experience, and through theory, where should the line be drawn with internal routines and explicitly writing a function? I can see advantages in both approaches, like it's easier to assemble a node into a specific instruction set, but it can cause a lot of bloat in the compiler, while having an explicit function reduces this compiler complexity and allows for internal code improvements and better acceptance of features like pure functions, but may increase compilation time and make optimisation more difficult, depending on how it is implemented.
>
> Just looking for discussion.
>
> Gareth aka. Kit
Re: [fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM
Hello Simon,

I'm wondering if you have code examples that provoke the problems you are experiencing. It will be easier to measure and test improvements with test cases.

Solutions might not come from a single person or team, so I'm not sure how to apply the bounty in the most effective and fair way.
Re: [fpc-devel] Attn. Florian, r39759
Assuming: v=1; x=10; y=3

(v-x) < (y-x) == (1-10) < (3-10) == -9 < -7 == *true*
(v>=x) and (v<=y) == (1>=10) and (1<=3) == false and true == *false*
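The counterexample above can be checked mechanically. The sketch below is only an illustration in Python, not FPC code, and r39759 itself is not reproduced here; the function names and the 5-bit width are assumptions chosen so that an exhaustive check runs quickly. It also shows, as a general identity and not as a statement about the commit, that a subtraction form using `<=` agrees with the two-comparison form when the values are unsigned with wraparound and x <= y.

```python
BITS = 5                      # small width so the exhaustive check is fast
MASK = (1 << BITS) - 1


def range_form(v, x, y):
    """(v >= x) and (v <= y)"""
    return v >= x and v <= y


def signed_subtract_form(v, x, y):
    """(v - x) < (y - x) with ordinary signed integers, as in the email."""
    return (v - x) < (y - x)


def unsigned_subtract_form(v, x, y):
    """(v - x) <= (y - x) with both sides wrapped to BITS-bit unsigned values."""
    return ((v - x) & MASK) <= ((y - x) & MASK)


def mismatches(candidate, only_ordered_bounds):
    """Collect every (v, x, y) where candidate disagrees with range_form."""
    bad = []
    for x in range(MASK + 1):
        for y in range(MASK + 1):
            if only_ordered_bounds and x > y:
                continue
            for v in range(MASK + 1):
                if candidate(v, x, y) != range_form(v, x, y):
                    bad.append((v, x, y))
    return bad


# The email's counterexample: the two forms disagree.
print(signed_subtract_form(1, 10, 3), range_form(1, 10, 3))   # True False

# Unsigned wraparound with x <= y: no disagreement for any v.
print(len(mismatches(unsigned_subtract_form, only_ordered_bounds=True)))  # 0
```

When the bounds are not ordered (x > y), even the unsigned form disagrees with the two comparisons, and v=1, x=10, y=3 is one such case. A rewrite of this kind therefore needs to know something about x and y before it is safe.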
Re: [fpc-devel] AVX 512 - Can't compile vaddps zmm1, zmm2, zmm3
I can confirm that this works:

VEXTRACTF32x4 xmm2, zmm0, 1
VEXTRACTF32x4 xmm3, zmm0, 2
VEXTRACTF32x4 xmm4, zmm0, 3

Well done!

I have more good news: I've just finished coding support for AVX512 in my own project: https://www.youtube.com/watch?v=qGnfwpKUTIQ

I'm getting loads of "marked as inline is not inlined" warnings. Is there anything I can do to compile properly with inlines? I can't tell users to use your branch as of now, as the lack of inlining decreases speed.

Anyway, thank you!
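For readers unfamiliar with the instruction: with a zmm source, VEXTRACTF32x4 copies one of the four 128-bit lanes (four singles each) into an xmm register, and the low two bits of the immediate select the lane. The sketch below only models that behaviour in Python, with a 16-element list standing in for zmm0. The function name mirrors the mnemonic, but it is an illustration and not an FPC or intrinsic API.

```python
def vextractf32x4(src, imm):
    """Return the 4-float lane of a 16-float 'zmm' value selected by imm[1:0]."""
    if len(src) != 16:
        raise ValueError("a zmm register holds 16 single-precision values")
    lane = imm & 0b11            # only the two low bits of the immediate are used
    return src[lane * 4:lane * 4 + 4]


zmm0 = [float(i) for i in range(16)]   # element i holds the value i

xmm2 = vextractf32x4(zmm0, 1)
xmm3 = vextractf32x4(zmm0, 2)
xmm4 = vextractf32x4(zmm0, 3)
print(xmm2, xmm3, xmm4)
```

Extracting lanes 1 to 3 this way, with lane 0 already available as the low part of the register, is a common first step for a horizontal sum across a zmm register.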
Re: [fpc-devel] AVX 512 - Can't compile vaddps zmm1, zmm2, zmm3
Quick update in reply to my own question: VEXTRACTF128 should not support zmm registers. Therefore, the current behavior is correct. This is the reference: https://www.felixcloutier.com/x86/VEXTRACTF128:VEXTRACTF32x4:VEXTRACTF64x2:VEXTRACTF32x8:VEXTRACTF64x4.html Anyway, supporting VEXTRACTF32x4 would help me.
Re: [fpc-devel] AVX 512 - Can't compile vaddps zmm1, zmm2, zmm3
Hello,

Almost everything I tested works perfectly. This is what I tested so far:

zmm registers are properly recognized:

end [ 'RAX', 'RCX', 'RDX', 'ymm2', 'ymm3', 'ymm4', 'ymm5', 'ymm0' {$IFDEF AVX512},'zmm2', 'zmm3', 'zmm0'{$ENDIF} ];

*These commands work:*

VBROADCASTSS zmm0, [rdx]
vmulps zmm2, zmm0, [rax]
vmulps zmm3, zmm0, [rax+64]
vmulps zmm2, zmm5, [rdx]
vmulps zmm3, zmm5, [rdx+64]
vmovups [rax], zmm2
vmovups [rax+64], zmm3
vaddps zmm2, zmm2, [rdx]
vaddps zmm3, zmm3, [rdx+64]
vsubps zmm2, zmm2, [rdx]
vsubps zmm3, zmm3, [rdx+64]

I'm getting more "inline" warnings than usual. Unfortunately, SourceForge is offline now and I can't share the code.

Question: should the following 2 commands be supported?
- vfmadd231ps zmm0, zmm5, [rax]
- VEXTRACTF128 xmm3, zmm0, 2

Congrats on the work,
JP
Re: [fpc-devel] AVX 512 - Can't compile vaddps zmm1, zmm2, zmm3
THANK YOU SO MUCH!!! I intend to test over the weekend.
Re: [fpc-devel] AVX 512 - Can't compile vaddps zmm1, zmm2, zmm3
Thank you Kit.

[VADDPS]
(Ch_Wop3, Ch_Rop2, Ch_Rop1)
xmmreg,xmmreg,xmmrm \362\370\1\x58\75\120 AVX,SANDYBRIDGE
ymmreg,ymmreg,ymmrm \362\364\370\1\x58\75\120 AVX,SANDYBRIDGE

Regarding the opcode, what is the base of these numbers (it doesn't look like hexadecimal)? For example:
\362\364\370\ - are these 16-bit numbers (they look too big for bytes)?
x58 - is this 58 hexadecimal?

As an example, for *vaddps ymm0, ymm1, ymm3* I was expecting: *C5F458C3*

On Mon, Jun 18, 2018 at 5:26 AM, J. Gareth Moreton <gar...@moreton-family.com> wrote:
> The file you want is compiler/x86/x86ins.dat, which contains the syntax information for all of the x86-64 assembler commands.
>
> A tool that's run by "make" will then generate a number of .inc files that are then referenced by the source code.
>
> Gareth aka. Kit
>
> On Sun 17/06/18 20:59 , Joao Schuler j...@schulers.com sent:
>> I can give a try to support vaddps and other instructions I need the most in AVX512. Where is the code (what file) for the above please?
>>
>> On Sun, Jun 17, 2018 at 6:30 PM, Florian Klämpfl wrote:
>>> On 17.06.2018 at 06:37, Joao Schuler wrote:
>>>> Hi,
>>>> I started testing the AVX512 branch:
>>>> https://svn.freepascal.org/svn/fpc/branches/tg74/avx512/
>>>>
>>>> This is the code:
>>>>
>>>> {$ASMMODE intel}
>>>> asm
>>>> vaddps zmm1, zmm2, zmm3
>>>> end;
>>>>
>>>> The error message is: invalid combination of opcode and operands.
>>>>
>>>> The assembly code looks correct to me:
>>>> http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=22E1CA82C5506AA7E639CACEB96C72AB?doi=10.1.1.697.2949=rep1=pdf
>>>>
>>>> (look at page 19 above).
>>>>
>>>> Am I doing something very wrong?
>>>
>>> No, this is a feature branch and work in progress. It is only useful to check out if you want to contribute to it.
>>>
>>>> Should I submit a bug report?
>>>
>>> Only if you submit a patch with it :)
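On the opcode question in this message: the backslash-escaped numbers never contain a digit above 7, which suggests they are octal, in which case each one fits in a single byte. That reading is an assumption here, since the x86ins.dat format is not explained in this thread. Separately, the expected bytes C5F458C3 can be reproduced from the standard 2-byte VEX prefix layout. The sketch below does both in Python; the helper name `encode_vaddps_reg` is made up, and it only handles register operands 0..7 at 128 or 256 bits (zmm operands need the longer EVEX prefix, which is not covered).

```python
# Reading the escaped numbers from the x86ins.dat entry as octal (assumption).
for text in ["362", "364", "370", "1", "75", "120"]:
    print(f"\\{text} -> 0x{int(text, 8):02X}")


def encode_vaddps_reg(dst, src1, src2, width_bits):
    """Encode VADDPS dst, src1, src2 (registers 0..7) with the 2-byte VEX prefix."""
    if width_bits not in (128, 256):
        raise ValueError("only xmm (128) and ymm (256) operands are handled")
    for reg in (dst, src1, src2):
        if not 0 <= reg <= 7:
            raise ValueError("this sketch only handles registers 0..7")
    l_bit = 1 if width_bits == 256 else 0
    # Second VEX byte: inverted R (1 here), inverted vvvv = src1, L, pp = 00.
    vex_byte = (1 << 7) | ((~src1 & 0xF) << 3) | (l_bit << 2) | 0b00
    # ModRM: mod = 11 (register direct), reg = dst, rm = src2.
    modrm = (0b11 << 6) | (dst << 3) | src2
    return bytes([0xC5, vex_byte, 0x58, modrm])   # 0x58 is the ADDPS opcode


encoded = encode_vaddps_reg(0, 1, 3, 256)          # vaddps ymm0, ymm1, ymm3
print(encoded.hex().upper())                       # C5F458C3
```

Under the octal reading, \362, \364 and \370 come out as 0xF2, 0xF4 and 0xF8, so they are byte-sized codes and not 16-bit numbers, while x58 is the hexadecimal opcode byte 58. What each code means to the FPC assembler is not something this sketch tries to answer.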
Re: [fpc-devel] AVX 512 - Can't compile vaddps zmm1, zmm2, zmm3
I can try to add support for vaddps and the other instructions I need most in AVX512. Which file contains the code for the above, please?

On Sun, Jun 17, 2018 at 6:30 PM, Florian Klämpfl wrote:
> On 17.06.2018 at 06:37, Joao Schuler wrote:
>> Hi,
>> I started testing the AVX512 branch:
>> https://svn.freepascal.org/svn/fpc/branches/tg74/avx512/
>>
>> This is the code:
>>
>> {$ASMMODE intel}
>> asm
>> vaddps zmm1, zmm2, zmm3
>> end;
>>
>> The error message is: invalid combination of opcode and operands.
>>
>> The assembly code looks correct to me:
>> http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=22E1CA82C5506AA7E639CACEB96C72AB?doi=10.1.1.697.2949=rep1=pdf
>>
>> (look at page 19 above).
>>
>> Am I doing something very wrong?
>
> No, this is a feature branch and work in progress. It is only useful to check out if you want to contribute to it.
>
>> Should I submit a bug report?
>
> Only if you submit a patch with it :)
[fpc-devel] AVX 512 - Can't compile vaddps zmm1, zmm2, zmm3
Hi,

I started testing the AVX512 branch: https://svn.freepascal.org/svn/fpc/branches/tg74/avx512/

This is the code:

{$ASMMODE intel}
asm
vaddps zmm1, zmm2, zmm3
end;

The error message is: invalid combination of opcode and operands.

The assembly code looks correct to me: http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=22E1CA82C5506AA7E639CACEB96C72AB?doi=10.1.1.697.2949=rep1=pdf (look at page 19 above).

Am I doing something very wrong? Should I submit a bug report?

Kind regards,
JP.