Re: gcc -S vs clang -S
On Wed, May 13, 2015 at 08:41:39AM -0600, Martin Sebor wrote:
> On 05/12/2015 07:40 PM, Andrew Pinski wrote:
> > On Tue, May 12, 2015 at 6:36 PM, Fei Ding fding...@gmail.com wrote:
> > > I think Thiago and Eric just want to know which code-gen is better
> > > and why...
> >
> > You need to understand that for a complex processor (CISC ISA) like
> > x86, there is sometimes no one right answer. You need to look at each
> > micro-arch and understand the pipeline. Sometimes different code
> > streams will perform the same, but it also depends on the code size.
>
> A good place to start is the Intel 64 and IA-32 Architectures
> Optimization Reference Manual. It lists the throughput and latencies
> of x86 instructions and gives guidance for which ones might be more
> efficient on which processors. For example, in the section titled
> Using LEA it discusses why the three operand form of the instruction
> is slower on the Sandy Bridge microarchitecture than on others:
>
> http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf

But leal (%rdi,%rsi), %eax is not the slower case it talks about.
Furthermore, supposedly the generic tuning is used, in which case it
really doesn't matter that much whether it is slower or faster on a
particular CPU, but whether it is in general slower or faster on the
whole basket of CPUs that the generic tuning is based on.

        Jakub
Re: gcc -S vs clang -S
On 05/12/2015 07:40 PM, Andrew Pinski wrote:
> On Tue, May 12, 2015 at 6:36 PM, Fei Ding fding...@gmail.com wrote:
> > I think Thiago and Eric just want to know which code-gen is better
> > and why...
>
> You need to understand that for a complex processor (CISC ISA) like
> x86, there is sometimes no one right answer. You need to look at each
> micro-arch and understand the pipeline. Sometimes different code
> streams will perform the same, but it also depends on the code size.

A good place to start is the Intel 64 and IA-32 Architectures
Optimization Reference Manual. It lists the throughput and latencies
of x86 instructions and gives guidance for which ones might be more
efficient on which processors. For example, in the section titled
Using LEA it discusses why the three operand form of the instruction
is slower on the Sandy Bridge microarchitecture than on others:

http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf

Martin

> Thanks,
> Andrew Pinski
>
> > 2015-05-12 23:29 GMT+08:00 Eric Botcazou ebotca...@libertysurf.fr:
> > > > Note that at -O3 there is a difference still:
> > > >
> > > > clang (3.6.0):
> > > >         addl    %esi, %edi
> > > >         movl    %edi, %eax
> > > >         retq
> > > >
> > > > gcc (4.9.2):
> > > >         leal    (%rdi,%rsi), %eax
> > > >         ret
> > > >
> > > > Can't tell which is best, if any.
> > >
> > > But what's your point exactly here? You cannot expect different
> > > compilers to generate exactly the same code on a given testcase
> > > for non-toy architectures.
> > >
> > > Note that this kind of discussion is more appropriate for
> > > gcc-h...@gcc.gnu.org
> > >
> > > --
> > > Eric Botcazou
Re: gcc -S vs clang -S
On Tue, May 12, 2015 at 6:36 PM, Fei Ding fding...@gmail.com wrote:
> I think Thiago and Eric just want to know which code-gen is better
> and why...

You need to understand that for a complex processor (CISC ISA) like
x86, there is sometimes no one right answer. You need to look at each
micro-arch and understand the pipeline. Sometimes different code
streams will perform the same, but it also depends on the code size.

Thanks,
Andrew Pinski

> 2015-05-12 23:29 GMT+08:00 Eric Botcazou ebotca...@libertysurf.fr:
> > > Note that at -O3 there is a difference still:
> > >
> > > clang (3.6.0):
> > >         addl    %esi, %edi
> > >         movl    %edi, %eax
> > >         retq
> > >
> > > gcc (4.9.2):
> > >         leal    (%rdi,%rsi), %eax
> > >         ret
> > >
> > > Can't tell which is best, if any.
> >
> > But what's your point exactly here? You cannot expect different
> > compilers to generate exactly the same code on a given testcase
> > for non-toy architectures.
> >
> > Note that this kind of discussion is more appropriate for
> > gcc-h...@gcc.gnu.org
> >
> > --
> > Eric Botcazou
Re: gcc -S vs clang -S
I think Thiago and Eric just want to know which code-gen is better
and why...

2015-05-12 23:29 GMT+08:00 Eric Botcazou ebotca...@libertysurf.fr:
> > Note that at -O3 there is a difference still:
> >
> > clang (3.6.0):
> >         addl    %esi, %edi
> >         movl    %edi, %eax
> >         retq
> >
> > gcc (4.9.2):
> >         leal    (%rdi,%rsi), %eax
> >         ret
> >
> > Can't tell which is best, if any.
>
> But what's your point exactly here? You cannot expect different
> compilers to generate exactly the same code on a given testcase
> for non-toy architectures.
>
> Note that this kind of discussion is more appropriate for
> gcc-h...@gcc.gnu.org
>
> --
> Eric Botcazou
Re: gcc -S vs clang -S
Note that at -O3 there is a difference still:

clang (3.6.0):
        addl    %esi, %edi
        movl    %edi, %eax
        retq

gcc (4.9.2):
        leal    (%rdi,%rsi), %eax
        ret

Can't tell which is best, if any.

OG.

On Tue, May 12, 2015 at 4:06 AM, pins...@gmail.com wrote:
> On May 11, 2015, at 6:16 PM, Thiago Farina tfrans...@gmail.com wrote:
> > Hi,
> >
> > Clang 3.7 generated the following code:
> >
> > $ clang -S -O0 -fno-unwind-tables -fno-asynchronous-unwind-tables add.c -o add_att_x64.s
> >
> > add:
> >         pushq   %rbp
> >         movq    %rsp, %rbp
> >         movl    %edi, -4(%rbp)
> >         movl    %esi, -8(%rbp)
> >         movl    -4(%rbp), %esi
> >         addl    -8(%rbp), %esi
> >         movl    %esi, %eax
> >         popq    %rbp
> >         retq
> >
> > While gcc 4.8 generated the following:
> >
> > $ gcc -S -O0 -fno-unwind-tables -fno-asynchronous-unwind-tables add.c -o add_att_x64.s
> >
> > add:
> >         pushq   %rbp
> >         movq    %rsp, %rbp
> >         movl    %edi, -4(%rbp)
> >         movl    %esi, -8(%rbp)
> >         movl    -8(%rbp), %eax
> >         movl    -4(%rbp), %edx
> >         addl    %edx, %eax
> >         popq    %rbp
> >         ret
> >
> > $ cat add.c
> > int add(int a, int b) {
> >   return a + b;
> > }
> >
> > Is the clang version better?
>
> Neither is better or worse, since this is at -O0.
>
> Thanks,
> Andrew
>
> > --
> > Thiago Farina
Re: gcc -S vs clang -S
> Note that at -O3 there is a difference still:
>
> clang (3.6.0):
>         addl    %esi, %edi
>         movl    %edi, %eax
>         retq
>
> gcc (4.9.2):
>         leal    (%rdi,%rsi), %eax
>         ret
>
> Can't tell which is best, if any.

But what's your point exactly here? You cannot expect different
compilers to generate exactly the same code on a given testcase
for non-toy architectures.

Note that this kind of discussion is more appropriate for
gcc-h...@gcc.gnu.org

--
Eric Botcazou
gcc -S vs clang -S
Hi,

Clang 3.7 generated the following code:

$ clang -S -O0 -fno-unwind-tables -fno-asynchronous-unwind-tables add.c -o add_att_x64.s

add:
        pushq   %rbp
        movq    %rsp, %rbp
        movl    %edi, -4(%rbp)
        movl    %esi, -8(%rbp)
        movl    -4(%rbp), %esi
        addl    -8(%rbp), %esi
        movl    %esi, %eax
        popq    %rbp
        retq

While gcc 4.8 generated the following:

$ gcc -S -O0 -fno-unwind-tables -fno-asynchronous-unwind-tables add.c -o add_att_x64.s

add:
        pushq   %rbp
        movq    %rsp, %rbp
        movl    %edi, -4(%rbp)
        movl    %esi, -8(%rbp)
        movl    -8(%rbp), %eax
        movl    -4(%rbp), %edx
        addl    %edx, %eax
        popq    %rbp
        ret

$ cat add.c
int add(int a, int b) {
  return a + b;
}

Is the clang version better?

--
Thiago Farina
Re: gcc -S vs clang -S
On May 11, 2015, at 6:16 PM, Thiago Farina tfrans...@gmail.com wrote:
> Hi,
>
> Clang 3.7 generated the following code:
>
> $ clang -S -O0 -fno-unwind-tables -fno-asynchronous-unwind-tables add.c -o add_att_x64.s
>
> add:
>         pushq   %rbp
>         movq    %rsp, %rbp
>         movl    %edi, -4(%rbp)
>         movl    %esi, -8(%rbp)
>         movl    -4(%rbp), %esi
>         addl    -8(%rbp), %esi
>         movl    %esi, %eax
>         popq    %rbp
>         retq
>
> While gcc 4.8 generated the following:
>
> $ gcc -S -O0 -fno-unwind-tables -fno-asynchronous-unwind-tables add.c -o add_att_x64.s
>
> add:
>         pushq   %rbp
>         movq    %rsp, %rbp
>         movl    %edi, -4(%rbp)
>         movl    %esi, -8(%rbp)
>         movl    -8(%rbp), %eax
>         movl    -4(%rbp), %edx
>         addl    %edx, %eax
>         popq    %rbp
>         ret
>
> $ cat add.c
> int add(int a, int b) {
>   return a + b;
> }
>
> Is the clang version better?

Neither is better or worse, since this is at -O0.

Thanks,
Andrew

> --
> Thiago Farina