Re: gcc -S vs clang -S

2015-05-13 Thread Jakub Jelinek
On Wed, May 13, 2015 at 08:41:39AM -0600, Martin Sebor wrote:
 On 05/12/2015 07:40 PM, Andrew Pinski wrote:
 On Tue, May 12, 2015 at 6:36 PM, Fei Ding fding...@gmail.com wrote:
 I think Thiago and Eric just want to know which code-gen is better and 
 why...
 
 
 You need to understand that for a complex processor (CISC ISAs) like x86,
 there is sometimes no single right answer.  You need to look at each
 micro-arch and understand the pipeline.  Sometimes different code
 streams will perform the same, but it also depends on the code size.
 
 A good place to start is the Intel 64 and IA-32 Architectures
 Optimization Reference Manual. It lists the throughput and
 latencies of x86 instructions and gives guidance for which
 ones might be more efficient on which processors. For example,
 the section titled "Using LEA" discusses why the three-operand
 form of the instruction is slower on the Sandy Bridge
 microarchitecture than on others:
 
 http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf

But leal (%rdi,%rsi), %eax is not the slower case it talks about.
Furthermore, generic tuning is presumably in use, in which case what
matters is not whether it is slower or faster on one particular CPU,
but whether it is slower or faster overall across the whole basket of
CPUs that the generic tuning is based on.

Jakub
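
For reference, the distinction drawn above can be sketched in C. This is
only an illustration under assumptions: add_with_offset and the constant 4
are hypothetical and do not appear in the thread, and the instructions named
in the comments are what a compiler might plausibly emit on x86-64, not
guaranteed output. As far as I can tell from the "Using LEA" section, the
slower Sandy Bridge case is an LEA with all three components (base, index
and displacement), whereas the form GCC emitted for add() has only base and
index.

```c
/* Two-component address: base + index.
 * GCC 4.9.2 at -O3 emitted: leal (%rdi,%rsi), %eax */
int add(int a, int b) {
    return a + b;
}

/* Hypothetical variant, not from the thread: base + index + displacement.
 * Assumption: a compiler may fold this into one three-component LEA such
 * as leal 4(%rdi,%rsi), %eax. Check the actual output with -S. */
int add_with_offset(int a, int b) {
    return a + b + 4;
}
```

Comparing the -S output with -mtune=generic against a CPU-specific -mtune
value is one way to see whether tuning changes the instruction choice.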


Re: gcc -S vs clang -S

2015-05-13 Thread Martin Sebor

On 05/12/2015 07:40 PM, Andrew Pinski wrote:

On Tue, May 12, 2015 at 6:36 PM, Fei Ding fding...@gmail.com wrote:

I think Thiago and Eric just want to know which code-gen is better and why...



You need to understand that for a complex processor (CISC ISAs) like x86,
there is sometimes no single right answer.  You need to look at each
micro-arch and understand the pipeline.  Sometimes different code
streams will perform the same, but it also depends on the code size.


A good place to start is the Intel 64 and IA-32 Architectures
Optimization Reference Manual. It lists the throughput and
latencies of x86 instructions and gives guidance for which
ones might be more efficient on which processors. For example,
the section titled "Using LEA" discusses why the three-operand
form of the instruction is slower on the Sandy Bridge
microarchitecture than on others:

http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf

Martin



Thanks,
Andrew Pinski



2015-05-12 23:29 GMT+08:00 Eric Botcazou ebotca...@libertysurf.fr:

Note that at -O3 there is a difference still:
clang (3.6.0):
 addl    %esi, %edi
 movl    %edi, %eax
 retq

gcc (4.9.2)
 leal    (%rdi,%rsi), %eax
 ret

Can't tell which is best, if any.


But what's your point exactly here?  You cannot expect different compilers to
generate exactly the same code on a given testcase for non-toy architectures.

Note that this kind of discussion is more appropriate for gcc-h...@gcc.gnu.org

--
Eric Botcazou




Re: gcc -S vs clang -S

2015-05-12 Thread Andrew Pinski
On Tue, May 12, 2015 at 6:36 PM, Fei Ding fding...@gmail.com wrote:
 I think Thiago and Eric just want to know which code-gen is better and why...


You need to understand that for a complex processor (CISC ISAs) like x86,
there is sometimes no single right answer.  You need to look at each
micro-arch and understand the pipeline.  Sometimes different code
streams will perform the same, but it also depends on the code size.

Thanks,
Andrew Pinski


 2015-05-12 23:29 GMT+08:00 Eric Botcazou ebotca...@libertysurf.fr:
 Note that at -O3 there is a difference still:
 clang (3.6.0):
 addl    %esi, %edi
 movl    %edi, %eax
 retq

 gcc (4.9.2)
 leal    (%rdi,%rsi), %eax
 ret

 Can't tell which is best, if any.

 But what's your point exactly here?  You cannot expect different compilers to
 generate exactly the same code on a given testcase for non-toy architectures.

 Note that this kind of discussion is more appropriate for 
 gcc-h...@gcc.gnu.org

 --
 Eric Botcazou
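
To put a number on the code-size remark above, here is a small sketch that
records the machine-code bytes of the two -O3 sequences quoted in this
thread. The byte values were assembled by hand for illustration and are an
assumption on my part; verify them with objdump -d on real output. The array
and function names are hypothetical.

```c
#include <stddef.h>

/* clang 3.6.0 sequence from the thread (hand-assembled bytes, assumption) */
static const unsigned char clang_seq[] = {
    0x01, 0xf7,  /* addl %esi, %edi */
    0x89, 0xf8,  /* movl %edi, %eax */
    0xc3         /* retq */
};

/* gcc 4.9.2 sequence from the thread (hand-assembled bytes, assumption) */
static const unsigned char gcc_seq[] = {
    0x8d, 0x04, 0x37,  /* leal (%rdi,%rsi), %eax */
    0xc3               /* ret */
};

/* Size in bytes of each sequence, including the return instruction. */
size_t clang_seq_size(void) { return sizeof clang_seq; }
size_t gcc_seq_size(void)   { return sizeof gcc_seq; }
```

Under these encodings the two sequences differ by a single byte, so code
size alone is unlikely to settle which one is better.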


Re: gcc -S vs clang -S

2015-05-12 Thread Fei Ding
I think Thiago and Eric just want to know which code-gen is better and why...

2015-05-12 23:29 GMT+08:00 Eric Botcazou ebotca...@libertysurf.fr:
 Note that at -O3 there is a difference still:
 clang (3.6.0):
 addl    %esi, %edi
 movl    %edi, %eax
 retq

 gcc (4.9.2)
 leal    (%rdi,%rsi), %eax
 ret

 Can't tell which is best, if any.

 But what's your point exactly here?  You cannot expect different compilers to
 generate exactly the same code on a given testcase for non-toy architectures.

 Note that this kind of discussion is more appropriate for gcc-h...@gcc.gnu.org

 --
 Eric Botcazou


Re: gcc -S vs clang -S

2015-05-12 Thread Olivier Galibert
Note that at -O3 there is a difference still:
clang (3.6.0):
addl    %esi, %edi
movl    %edi, %eax
retq

gcc (4.9.2)
leal    (%rdi,%rsi), %eax
ret

Can't tell which is best, if any.

  OG.


On Tue, May 12, 2015 at 4:06 AM,  pins...@gmail.com wrote:




 On May 11, 2015, at 6:16 PM, Thiago Farina tfrans...@gmail.com wrote:

 Hi,

 Clang 3.7 generated the following code:

 $ clang -S -O0 -fno-unwind-tables -fno-asynchronous-unwind-tables
 add.c -o add_att_x64.s

 add:
 pushq   %rbp
 movq    %rsp, %rbp
 movl    %edi, -4(%rbp)
 movl    %esi, -8(%rbp)
 movl    -4(%rbp), %esi
 addl    -8(%rbp), %esi
 movl    %esi, %eax
 popq    %rbp
 retq

 While gcc 4.8 generated the following:

 $ gcc -S -O0 -fno-unwind-tables -fno-asynchronous-unwind-tables add.c
 -o add_att_x64.s

 add:
 pushq   %rbp
 movq    %rsp, %rbp
 movl    %edi, -4(%rbp)
 movl    %esi, -8(%rbp)
 movl    -8(%rbp), %eax
 movl    -4(%rbp), %edx
 addl    %edx, %eax
 popq    %rbp
 ret

 $ cat add.c
 int add(int a, int b) {
     return a + b;
 }

 Is the clang version better?

 Neither is better or worse, since this is at -O0.

 Thanks,
 Andrew


 --
 Thiago Farina


Re: gcc -S vs clang -S

2015-05-12 Thread Eric Botcazou
 Note that at -O3 there is a difference still:
 clang (3.6.0):
 addl    %esi, %edi
 movl    %edi, %eax
 retq
 
 gcc (4.9.2)
 leal    (%rdi,%rsi), %eax
 ret
 
 Can't tell which is best, if any.

But what's your point exactly here?  You cannot expect different compilers to 
generate exactly the same code on a given testcase for non-toy architectures.

Note that this kind of discussion is more appropriate for gcc-h...@gcc.gnu.org

-- 
Eric Botcazou


gcc -S vs clang -S

2015-05-11 Thread Thiago Farina
Hi,

Clang 3.7 generated the following code:

$ clang -S -O0 -fno-unwind-tables -fno-asynchronous-unwind-tables
add.c -o add_att_x64.s

add:
 pushq   %rbp
 movq    %rsp, %rbp
 movl    %edi, -4(%rbp)
 movl    %esi, -8(%rbp)
 movl    -4(%rbp), %esi
 addl    -8(%rbp), %esi
 movl    %esi, %eax
 popq    %rbp
 retq

While gcc 4.8 generated the following:

$ gcc -S -O0 -fno-unwind-tables -fno-asynchronous-unwind-tables add.c
-o add_att_x64.s

add:
 pushq   %rbp
 movq    %rsp, %rbp
 movl    %edi, -4(%rbp)
 movl    %esi, -8(%rbp)
 movl    -8(%rbp), %eax
 movl    -4(%rbp), %edx
 addl    %edx, %eax
 popq    %rbp
 ret

$ cat add.c
int add(int a, int b) {
    return a + b;
}

Is the clang version better?

-- 
Thiago Farina


Re: gcc -S vs clang -S

2015-05-11 Thread pinskia




 On May 11, 2015, at 6:16 PM, Thiago Farina tfrans...@gmail.com wrote:
 
 Hi,
 
 Clang 3.7 generated the following code:
 
 $ clang -S -O0 -fno-unwind-tables -fno-asynchronous-unwind-tables
 add.c -o add_att_x64.s
 
 add:
 pushq   %rbp
 movq    %rsp, %rbp
 movl    %edi, -4(%rbp)
 movl    %esi, -8(%rbp)
 movl    -4(%rbp), %esi
 addl    -8(%rbp), %esi
 movl    %esi, %eax
 popq    %rbp
 retq
 
 While gcc 4.8 generated the following:
 
 $ gcc -S -O0 -fno-unwind-tables -fno-asynchronous-unwind-tables add.c
 -o add_att_x64.s
 
 add:
 pushq   %rbp
 movq    %rsp, %rbp
 movl    %edi, -4(%rbp)
 movl    %esi, -8(%rbp)
 movl    -8(%rbp), %eax
 movl    -4(%rbp), %edx
 addl    %edx, %eax
 popq    %rbp
 ret
 
 $ cat add.c
 int add(int a, int b) {
     return a + b;
 }
 
 Is the clang version better?

Neither is better or worse, since this is at -O0.

Thanks,
Andrew

 
 -- 
 Thiago Farina
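
One way to see why the -O0 listings say little about code quality: at -O0
both compilers store each argument to the stack and reload it before use.
The sketch below imitates that behaviour with volatile locals, so the same
store and reload pattern should survive even with optimization enabled.
add_spilled is a hypothetical helper, not something from the thread, and the
instructions in the comments are an assumption about typical output.

```c
/* Hypothetical helper: the volatile locals force the compiler to write
 * each argument to memory and read it back, which resembles what both
 * compilers do for every variable at -O0. */
int add_spilled(int a, int b) {
    volatile int x = a;  /* store, similar to movl %edi, -4(%rbp) */
    volatile int y = b;  /* store, similar to movl %esi, -8(%rbp) */
    return x + y;        /* two loads, then the add */
}

/* Plain version from the thread: with optimization enabled the arguments
 * typically stay in registers. */
int add(int a, int b) {
    return a + b;
}
```

Both functions return the same value; only the amount of memory traffic
differs, which is the part of the -O0 listings that optimization removes.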