[Bug target/113236] WebP benchmark is 20% slower vs. Clang on AMD Zen 4

2024-04-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113236

--- Comment #3 from Jan Hubicka  ---
Seems this performance difference is still there on zen4
https://www.phoronix.com/review/gcc14-clang18-amd-zen4/3

[Bug target/113236] WebP benchmark is 20% slower vs. Clang on AMD Zen 4

2024-01-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113236

Jan Hubicka  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-01-05
                 CC|                            |hubicka at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Jan Hubicka  ---
On zen3 I get 0.75 MP/s for GCC and 0.80 MP/s for clang, so the gap is only
about 6.6%, but it seems reproducible.

Profile looks comparable:

gcc:

  30.96%  cwebp  libwebp.so.7.1.5  [.] GetCombinedEntropyUnre
  26.19%  cwebp  libwebp.so.7.1.5  [.] VP8LHashChainFill
   3.34%  cwebp  libwebp.so.7.1.5  [.] CalculateBestCacheSize
   3.30%  cwebp  libwebp.so.7.1.5  [.] CombinedShannonEntropy
   3.21%  cwebp  libwebp.so.7.1.5  [.] CollectColorBlueTransf

clang:

  34.06%  cwebp  libwebp.so.7.1.5  [.] GetCombinedEntropy
  28.95%  cwebp  libwebp.so.7.1.5  [.] VP8LHashChainFill
   5.37%  cwebp  libwebp.so.7.1.5  [.] VP8LGetBackwardReferences
   4.39%  cwebp  libwebp.so.7.1.5  [.] CombinedShannonEntropy_SS
   4.28%  cwebp  libwebp.so.7.1.5  [.] CollectColorBlueTransform


In the first loop clang seems to ifconvert while GCC doesn't:
  0.59 │       lea          kSLog2Table,%rdi
  3.69 │       vmovss       (%rdi,%rax,4),%xmm0
  0.98 │ 6f:   vcvtsi2ss    %edx,%xmm2,%xmm1
  0.63 │       vfnmadd213ss 0x0(%r13),%xmm0,%xmm1
 38.16 │       vmovss       %xmm1,0x0(%r13)
  5.48 │       cmp          %r12d,0xc(%r13)
  0.06 │     ↓ jae          89
       │       mov          %r12d,0xc(%r13)
  0.99 │ 89:   mov          0x4(%r13),%edi
  0.96 │ 8d:   xor          %eax,%eax
  0.40 │       test         %r12d,%r12d
  0.60 │       setne        %al



       │       vcvtsd2ss    %xmm0,%xmm0,%xmm1
  0.02 │362:   mov          %r15d,%eax
  0.57 │       imul         %r12d,%eax
  0.00 │       cmp          %r12d,%r9d
  0.03 │       cmovbe       %r12d,%r9d
  0.02 │       vmovd        %eax,%xmm0
  0.08 │       vpinsrd      $0x1,%r15d,%xmm0,%xmm0
  1.50 │       vpaddd       %xmm0,%xmm4,%xmm4
  1.08 │       vcvtsi2ss    %r15d,%xmm5,%xmm0
  0.87 │       vfnmadd231ss %xmm0,%xmm1,%xmm3
  5.40 │       vmovaps      %xmm3,%xmm0
  0.02 │38c:   xor          %eax,%eax
  0.16 │       cmp          $0x4,%r15d
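
For illustration, the two listings differ in how a running unsigned maximum is
updated: the first compares against memory and jumps over a store (cmp/jae/mov
to 0xc(%r13)), the second keeps the value in a register and uses cmovbe. A
minimal C sketch of the two shapes follows. It is not the libwebp source; the
struct and function names are hypothetical, and whether a given compiler emits
a branch or a conditional move for either form depends on version and flags.

```c
#include <stdint.h>

/* Hypothetical stand-in for the accumulator updated in the hot loop. */
struct stats {
    float    entropy;
    uint32_t max_val;
};

/* Branchy shape: the maximum lives in memory and is stored only when the
 * new value is larger, matching the cmp / jae / mov sequence. */
static void update_max_branch(struct stats *s, uint32_t v)
{
    if (s->max_val < v)
        s->max_val = v;
}

/* Select shape: the maximum is carried in a local (register) value and
 * chosen unconditionally, which is the form a cmovbe can implement. */
static uint32_t update_max_select(uint32_t cur, uint32_t v)
{
    return (cur <= v) ? v : cur;
}
```

Both functions compute the same maximum; only the control-flow shape presented
to the compiler differs.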

[Bug target/113236] WebP benchmark is 20% slower vs. Clang on AMD Zen 4

2024-01-04 Thread aros at gmx dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113236

--- Comment #1 from Artem S. Tashkinov  ---
That's WebP image encode, Quality 100, highest compression.

Also applies to Meteor Lake (MTL):
https://www.phoronix.com/review/intel-meteorlake-gcc-clang/3