Re: [Tinycc-devel] Optimizing for avx512

2022-02-05 Thread Yair Lenga
Thank you for the feedback. I understand the limits of tcc. In my specific 
problem, I am trying to speed up a user-provided expression in a simulation of 
100 paths. Can I use the AVX-512 built-ins, e.g. to work on 8 double-precision 
values with one operation, practically reducing the 100 evaluations to 13 
(100/8, rounded up)?
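
The batch arithmetic in the question can be sketched as follows. This is only an illustration of the counting, assuming 8 double-precision lanes per 512-bit register and n > 0; the function names are hypothetical and not part of tcc or any library.

```c
#include <stddef.h>

/* Number of SIMD operations needed to cover n values, `lanes` at a time.
   Ceiling division: 100 values with 8 lanes gives 13 batches. */
size_t simd_batches(size_t n, size_t lanes)
{
    return (n + lanes - 1) / lanes;
}

/* Number of lanes that carry real data in the final batch (assumes n > 0).
   For 100 values and 8 lanes the last batch uses only 4 lanes. */
size_t last_batch_lanes(size_t n, size_t lanes)
{
    size_t rem = n % lanes;
    return rem ? rem : lanes;
}
```

So 12 batches are full and the 13th is partial; the partial batch needs either masking or a scalar tail loop.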

User expressions are all in a form that can be handled by AVX SIMD 
instructions: add, multiply, … 

Thanks, yair. 

Sent from my iPad
___
Tinycc-devel mailing list
Tinycc-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/tinycc-devel


Re: [Tinycc-devel] Optimizing for avx512

2022-02-05 Thread Elijah Stone

On Sat, 5 Feb 2022, Samir Ribić via Tinycc-devel wrote:

> However, it is possible at library level.  Simply write matrix 
> manipulations functions in assembly language.  Inside functions load 
> data into AVX registers, do calculations in registers and at exit write 
> the result into memory.


I think tcc's assembler does not support avx512 either :)

 -E


Re: [Tinycc-devel] Optimizing for avx512

2022-02-05 Thread Samir Ribić via Tinycc-devel
Single-pass compilers can still do peephole optimization. I once
demonstrated this with the case where tcc generates
MOV EAX,const
MOV [location],EAX
and caught that case to generate instead
MOV DWORD [location],const
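
A minimal sketch of such a peephole rule, written over a hypothetical mini instruction list rather than tcc's real code generator (all type and function names here are assumptions for illustration). It also assumes the register is not read again after the store, which a real implementation would have to check.

```c
#include <stddef.h>

/* Hypothetical mini-IR; this is NOT tcc's internal representation. */
enum insn_kind {
    MOV_REG_IMM,   /* MOV reg, imm           */
    MOV_MEM_REG,   /* MOV [addr], reg        */
    MOV_MEM_IMM,   /* MOV DWORD [addr], imm  */
    OTHER          /* anything else          */
};

struct insn {
    enum insn_kind kind;
    int reg;   /* register number, -1 if unused */
    int addr;  /* memory location, if any       */
    int imm;   /* immediate value, if any       */
};

/* Rewrites code[0..n) in place and returns the new length.
   Pattern: MOV reg,imm ; MOV [addr],reg  ->  MOV DWORD [addr],imm
   Assumption: reg is dead after the store. */
size_t peephole_fuse_store(struct insn *code, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n &&
            code[i].kind == MOV_REG_IMM &&
            code[i + 1].kind == MOV_MEM_REG &&
            code[i].reg == code[i + 1].reg) {
            struct insn fused = { MOV_MEM_IMM, -1, code[i + 1].addr, code[i].imm };
            code[out++] = fused;
            i++;  /* skip the store that was just absorbed */
        } else {
            code[out++] = code[i];
        }
    }
    return out;
}
```

The rule only looks at a two-instruction window, which is why it fits a single-pass compiler: no whole-program analysis is needed beyond the liveness assumption stated above.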

Back to the topic: AVX-512 is difficult to support at the language level.
In the case of tcc it is impossible, because converting array operations
into AVX-512 instructions requires seeing the global picture of the program.

However, it is possible at the library level. Simply write the matrix
manipulation functions in assembly language. Inside each function, load the
data into AVX registers, do the calculations in registers, and write the
result back to memory on exit.
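
A rough sketch of the library-level idea, using compiler intrinsics instead of hand-written assembly (a substitution made here for brevity, not what the message proposes). It assumes the library is built separately with gcc or clang and `-mavx512f`; without that flag the guarded block is skipped and the scalar loop does all the work. The function name `mul_add_f64` is hypothetical.

```c
#include <stddef.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* out[i] = a[i] * b[i] + c[i] for i in [0, n). */
void mul_add_f64(const double *a, const double *b, const double *c,
                 double *out, size_t n)
{
    size_t i = 0;
#ifdef __AVX512F__
    /* Load 8 doubles per operand, compute in registers, store 8 results. */
    for (; i + 8 <= n; i += 8) {
        __m512d va = _mm512_loadu_pd(a + i);
        __m512d vb = _mm512_loadu_pd(b + i);
        __m512d vc = _mm512_loadu_pd(c + i);
        __m512d vr = _mm512_add_pd(_mm512_mul_pd(va, vb), vc);
        _mm512_storeu_pd(out + i, vr);
    }
#endif
    /* Scalar tail (or the whole range when AVX-512 is not enabled). */
    for (; i < n; i++)
        out[i] = a[i] * b[i] + c[i];
}
```

Code compiled by tcc would then call this function through an ordinary C prototype and link against the prebuilt object, so tcc itself never has to emit an AVX-512 instruction.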




Re: [Tinycc-devel] Optimizing for avx512

2022-02-05 Thread Christian Jullien
An optimizing compiler needs several passes to operate:
- constant folding
- register allocation
- peephole optimization
- branch prediction
...
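
As an illustration of the first item, constant folding can be sketched as a pass over an expression tree. The node layout and function name below are assumptions made for this example and do not come from any of the compilers mentioned.

```c
#include <stddef.h>

/* Hypothetical expression tree for illustration only. */
enum node_kind { NODE_CONST, NODE_VAR, NODE_ADD, NODE_MUL };

struct node {
    enum node_kind kind;
    double value;            /* valid when kind == NODE_CONST    */
    struct node *lhs, *rhs;  /* valid for NODE_ADD and NODE_MUL  */
};

/* Folds constant subtrees in place.
   Returns 1 if `n` is a constant after folding, 0 otherwise. */
int fold_constants(struct node *n)
{
    if (n->kind == NODE_CONST) return 1;
    if (n->kind == NODE_VAR)   return 0;

    /* Fold both children first, even if one turns out not to be constant. */
    int l = fold_constants(n->lhs);
    int r = fold_constants(n->rhs);
    if (l && r) {
        double a = n->lhs->value, b = n->rhs->value;
        n->value = (n->kind == NODE_ADD) ? a + b : a * b;
        n->kind = NODE_CONST;
        n->lhs = n->rhs = NULL;
        return 1;
    }
    return 0;
}
```

For an expression such as (2 * 3) + x, the multiplication collapses to the constant 6 while the addition stays, because x is only known at run time. This needs the whole expression tree in memory, which is exactly what a strict one-pass code generator avoids building.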

When it knows the target, it can reorganize code to keep as much data as 
possible in L1 cache and to have the longest series of instructions that can be 
executed without breaking the pipeline, i.e. instructions run nearly in parallel.

Tcc, which is a one-pass compiler, definitely loses on this point. On the other 
hand, one pass makes it damn fast and that's why we love it.

We can't have the butter and the money for the butter.

-----Original Message-----
From: rem...@tutanota.com [mailto:rem...@tutanota.com] 
Sent: Saturday, February 05, 2022 16:10
To: Jullien; Tinycc Devel
Cc: Tinycc Devel
Subject: Re: [Tinycc-devel] Optimizing for avx512

5 Feb 2022, 11:01 by eli...@orange.fr:

> The price to pay for its really fast compilation is that the generated code is 
> poor compared to gcc, clang or vc++ (among others). Depending on your 
> program, expect it to be roughly 2 to 4x slower.
>
I would say that this is not always the case. And correct me if I'm wrong, but 
aren't optimizations (except for a few of them) mostly a matter of the 
programmer having written bad code and the compiler finding better instructions 
to do the same thing? Inline assembly exists, after all, so if you really, 
really care about performance, you should probably use inline assembly in the 
most critical algorithms/functions. I've seen some code run equally fast with 
TCC and GCC, so I suppose optimization doesn't always work magic. Or you may 
get a 5% improvement or even less. In any case, I would suggest using both: TCC 
first, and then GCC/Clang for the critical parts that will benefit hugely from 
the optimizations these compilers can do.

But of course just my opinion on the topic.




Re: [Tinycc-devel] Optimizing for avx512

2022-02-05 Thread Christian Jullien
Hi, I speak only for myself.
I'm not sure tcc is the right target for you.
We would all love to have tcc generate fast code, but the two main goals of tcc 
are C compliance and FAST compilation.
The price to pay for its really fast compilation is that the generated code is 
poor compared to gcc, clang or vc++ (among others). Depending on your program, 
expect it to be roughly 2 to 4x slower.

If you want a fast executable, use a C compiler that generates fast code (read: 
gcc/clang).

You can combine tcc and gcc as I do: compile and test your code with tcc and, 
when it works, switch to gcc/clang with all possible optimizations. Hint: PGO is 
a valuable way to make your gcc-optimized code even faster, I would say by 
between 10 and 20%.
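
One way to picture this two-compiler workflow is sketched below. The file names and the exact optimization level in the comment are assumptions for illustration; `-run`, `-fprofile-generate` and `-fprofile-use` are existing tcc and gcc options, and `__TINYC__` is the macro tcc predefines. The helper `build_flavor` is hypothetical and only reports which compiler built the binary.

```c
/*
 * Hypothetical build steps (sim.c is an assumed file name):
 *
 *   tcc -run sim.c                            quick edit/test cycle
 *   gcc -O2 -fprofile-generate sim.c -o sim   instrumented build
 *   ./sim                                     run on typical input
 *   gcc -O2 -fprofile-use sim.c -o sim        PGO-optimized build
 */

/* Reports which compiler produced this translation unit. */
const char *build_flavor(void)
{
#if defined(__TINYC__)
    return "tcc";
#elif defined(__clang__)
    return "clang";
#elif defined(__GNUC__)
    return "gcc";
#else
    return "other";
#endif
}
```

Printing this string at startup is a cheap way to avoid benchmarking the tcc build by accident.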

C.



[Tinycc-devel] Optimizing for avx512

2022-02-05 Thread Yair Lenga
I have a project that runs a time-consuming user simulation built from 
user-defined code. I hope that performance can be improved using SIMD 
instructions.

What optimization level does libtcc support? Can it generate optimized code 
for AVX512? See 4.x. The documentation indicates that optimization is done at 
statement level; is it possible to use SIMD functions directly?

Thanks, yair

Sent from my iPad