Re: [Tinycc-devel] Optimizing for avx512
Thank you for the feedback. I understand the limits of tcc.

In my specific problem, I am trying to speed up a user-provided expression in a simulation of 100 paths. Can I use the AVX-512 built-ins, e.g. work on 8 double-precision values with one operation, practically reducing the 100 evaluations to 13 (100/8, rounded up)? User expressions are all in a form that can be handled by AVX SIMD instructions: add, multiply, …

Thanks, yair.

Sent from my iPad
___
Tinycc-devel mailing list
Tinycc-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/tinycc-devel
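The arithmetic in this message (100 paths, 8 doubles per 512-bit register, 13 blocks with a partial last one) can be sketched as below. This is only an illustration: the function name `eval_paths` and the expression `a*b + c` are made up for the example, and the intrinsics path is compiled only when the compiler defines `__AVX512F__` (e.g. gcc or clang with `-mavx512f`). tcc does not define that macro, so with tcc the plain scalar fallback is what gets built.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

#define LANES 8  /* doubles per 512-bit register */

/* Hypothetical user expression: out[i] = a[i] * b[i] + c[i] for n paths.
   Returns the number of 8-wide blocks processed (13 for n == 100). */
size_t eval_paths(const double *a, const double *b, const double *c,
                  double *out, size_t n)
{
    size_t blocks = 0;

    assert(a != NULL && b != NULL && c != NULL && out != NULL);

    for (size_t i = 0; i < n; i += LANES, blocks++) {
        size_t left = n - i;
        size_t width = left < LANES ? left : LANES;
#ifdef __AVX512F__
        /* The mask enables only the valid lanes of a partial last block. */
        __mmask8 m = (__mmask8)((1u << width) - 1u);
        __m512d va = _mm512_maskz_loadu_pd(m, a + i);
        __m512d vb = _mm512_maskz_loadu_pd(m, b + i);
        __m512d vc = _mm512_maskz_loadu_pd(m, c + i);
        __m512d vr = _mm512_add_pd(_mm512_mul_pd(va, vb), vc);
        _mm512_mask_storeu_pd(out + i, m, vr);
#else
        /* Portable fallback: same blocking, one lane at a time. */
        for (size_t k = 0; k < width; k++)
            out[i + k] = a[i + k] * b[i + k] + c[i + k];
#endif
    }
    return blocks;
}
```

Calling `eval_paths(a, b, c, out, 100)` walks 12 full blocks plus one block of 4 lanes. Whether that turns into a real speedup depends on the compiler and the CPU, not on the blocking itself.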
Re: [Tinycc-devel] Optimizing for avx512
On Sat, 5 Feb 2022, Samir Ribić via Tinycc-devel wrote:

> However, it is possible at library level. Simply write matrix manipulations functions in assembly language. Inside functions load data into AVX registers, do calculations in registers and at exit write the result into memory.

I think tcc's assembler does not support avx512 either :)

-E
Re: [Tinycc-devel] Optimizing for avx512
Single-pass compilers can still do peephole optimization. I once demonstrated this for the case where tcc generates

    MOV EAX, const
    MOV [location], EAX

and caught it to generate

    MOV DWORD [location], const

instead.

Back to the topic: AVX-512 is difficult to support at the language level. In the case of tcc it is impossible, because to convert array operations into AVX-512 instructions you need to see a global picture of the program.

However, it is possible at the library level. Simply write the matrix manipulation functions in assembly language. Inside the functions, load the data into AVX registers, do the calculations in registers, and on exit write the result back to memory.

On Sat, Feb 5, 2022 at 4:55 PM Christian Jullien wrote:

> An optimizing compiler needs several passes to operate:
> - constant folding
> - register allocation
> - peephole optimization
> - branch prediction
> ...
>
> When it knows the target, it can reorganize code to keep as much data as
> possible in L1 cache and to have the longest series of instructions that
> can be executed without breaking the pipeline, i.e. instructions run
> nearly in parallel.
>
> Tcc, which is a one-pass compiler, definitely loses on this point. On the
> other hand, one pass makes it damn fast and that's why we love it.
>
> We can't have the butter and the money for the butter.
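The peephole case described in this message can be illustrated with a toy pass. This is a minimal sketch over a made-up instruction representation (`struct insn`, `enum op` and `peephole` are hypothetical names and have nothing to do with tcc's internals). It also assumes the register is not read again after the store, which a real compiler would have to check.

```c
#include <assert.h>
#include <stddef.h>

/* Toy instruction set, hypothetical and unrelated to tcc's code generator. */
enum op { MOV_REG_IMM, MOV_MEM_REG, MOV_MEM_IMM, OTHER };

struct insn {
    enum op op;
    int reg;  /* register number, used by MOV_REG_IMM and MOV_MEM_REG */
    int loc;  /* memory location, used by MOV_MEM_REG and MOV_MEM_IMM */
    int imm;  /* constant, used by MOV_REG_IMM and MOV_MEM_IMM */
};

/* Fuse "MOV reg, imm; MOV [loc], reg" into "MOV DWORD [loc], imm".
   Rewrites the array in place and returns the new instruction count.
   Assumption: reg is dead after the store. */
size_t peephole(struct insn *code, size_t n)
{
    size_t out = 0;

    assert(code != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n &&
            code[i].op == MOV_REG_IMM &&
            code[i + 1].op == MOV_MEM_REG &&
            code[i + 1].reg == code[i].reg) {
            struct insn fused = { MOV_MEM_IMM, -1, code[i + 1].loc, code[i].imm };
            code[out++] = fused;
            i++;  /* skip the store that was just absorbed */
        } else {
            code[out++] = code[i];
        }
    }
    return out;
}
```

The pass only ever looks at two adjacent instructions, which is why this kind of optimization fits a single-pass compiler: no global view of the program is needed.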
Re: [Tinycc-devel] Optimizing for avx512
An optimizing compiler needs several passes to operate:
- constant folding
- register allocation
- peephole optimization
- branch prediction
...

When it knows the target, it can reorganize code to keep as much data as possible in L1 cache and to have the longest series of instructions that can be executed without breaking the pipeline, i.e. instructions run nearly in parallel.

Tcc, which is a one-pass compiler, definitely loses on this point. On the other hand, one pass makes it damn fast and that's why we love it.

We can't have the butter and the money for the butter.

-----Original Message-----
From: rem...@tutanota.com [mailto:rem...@tutanota.com]
Sent: Saturday, February 05, 2022 16:10
To: Jullien; Tinycc Devel
Cc: Tinycc Devel
Subject: Re: [Tinycc-devel] Optimizing for avx512

5 Feb 2022, 11:01 from eli...@orange.fr:

> The price to pay for its really fast compilation is that the generated code is
> poor compared to gcc, clang or vc++ (among others). Depending on your
> program, consider it is roughly 2 to 4x slower.

I would say that this is not always the case. And correct me if I'm wrong, but aren't optimizations (except a few of them) mostly a matter of the programmer writing bad code and the compiler finding better instructions to do the same thing? Inline assembly exists in the end, so if you really, really care about performance, you should probably use inline assembly in the most critical algorithms/functions. I've seen some code running the same on TCC and GCC, so I suppose optimization doesn't always work magic. Or you may get a 5% increase or even less. In any case, I would suggest using both: TCC, and then GCC/Clang for the critical parts that will be hugely favored by the optimizations these compilers can do.

But of course, just my opinion on the topic.
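As an illustration of the first item in the list above, constant folding over an expression tree might look like the sketch below. The types and names (`struct expr`, `enum kind`, `fold`) are invented for the example and are not taken from tcc, gcc or any other compiler. The point is that the pass needs the whole expression tree in hand before it can rewrite it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression tree: numbers, variables, and two binary ops. */
enum kind { NUM, VAR, ADD, MUL };

struct expr {
    enum kind kind;
    double value;            /* valid when kind == NUM */
    struct expr *lhs, *rhs;  /* valid when kind is ADD or MUL */
};

/* Fold constant subtrees bottom-up, in place. Returns e for convenience. */
struct expr *fold(struct expr *e)
{
    if (e == NULL || e->kind == NUM || e->kind == VAR)
        return e;

    assert(e->lhs != NULL && e->rhs != NULL);
    fold(e->lhs);
    fold(e->rhs);

    /* Both operands are now constants: replace the node by its value. */
    if (e->lhs->kind == NUM && e->rhs->kind == NUM) {
        double l = e->lhs->value, r = e->rhs->value;
        e->value = (e->kind == ADD) ? l + r : l * r;
        e->kind = NUM;
        e->lhs = e->rhs = NULL;
    }
    return e;
}
```

For `(2 * 3) + x`, the multiplication node collapses to the constant 6 while the addition stays, since `x` is only known at run time.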
Re: [Tinycc-devel] Optimizing for avx512
Hi,

I speak only for myself. I'm not sure tcc is the right target for you. We would all love tcc to generate fast code, but the two main goals of tcc are C compliance and FAST compilation. The price to pay for its really fast compilation is that the generated code is poor compared to gcc, clang or vc++ (among others). Depending on your program, consider it is roughly 2 to 4x slower.

If you want a fast executable, use a C compiler that generates fast code (read: gcc/clang).

You can combine tcc and gcc as I do: compile and test your code with tcc and, when it works, switch to gcc/clang with all possible optimizations.

Hint: PGO is a valuable way to make your gcc-optimized code even faster. I would say between 10 and 20%.

C.

-----Original Message-----
From: Tinycc-devel [mailto:tinycc-devel-bounces+eligis=orange...@nongnu.org] On Behalf Of Yair Lenga
Sent: Saturday, February 05, 2022 09:52
To: tinycc-devel@nongnu.org
Subject: [Tinycc-devel] Optimizing for avx512

I have a project which is running a user simulation - time consuming - user defined code. I hope that performance can be improved using SIMD instructions. What is the optimization level supported by libtcc? Can it generate optimized code for AVX512? See 4.x. The documentation indicates optimization is done at statement level - is it possible to use SIMD functions directly?

Thanks, yair
[Tinycc-devel] Optimizing for avx512
I have a project which is running a user simulation - time consuming - user defined code. I hope that performance can be improved using SIMD instructions.

What is the optimization level supported by libtcc? Can it generate optimized code for AVX512? See 4.x. The documentation indicates optimization is done at statement level - is it possible to use SIMD functions directly?

Thanks, yair