https://bugs.llvm.org/show_bug.cgi?id=40984

            Bug ID: 40984
           Summary: @llvm.maxnum creates very inefficient code for
                    skylake-avx512
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: X86
          Assignee: unassignedb...@nondot.org
          Reporter: schnet...@gmail.com
                CC: craig.top...@gmail.com, llvm-bugs@lists.llvm.org,
                    llvm-...@redking.me.uk, spatel+l...@rotateright.com

@llvm.maxnum with vector arguments generates very inefficient code on AVX512
targets. It generates good code for AVX2, using the ymm registers. On AVX512,
however, it falls back to xmm registers instead of using zmm registers.

This is an example (see also <https://godbolt.org/z/oGnYqO>):



declare <16 x float> @llvm.maxnum.v16f32(<16 x float>, <16 x float>)

;  @ /home/eschnetter/.julia/packages/SIMD/5ugK9/src/SIMD.jl:1012 within `max'
define void @julia_max_12430({ <16 x float> }* noalias nocapture sret,
                             { <16 x float> } addrspace(11)* nocapture nonnull readonly dereferenceable(64),
                             { <16 x float> } addrspace(11)* nocapture nonnull readonly dereferenceable(64)) {
top:
; ┌ @ /home/eschnetter/.julia/packages/SIMD/5ugK9/src/SIMD.jl:538 within `llvmwrap' @ /home/eschnetter/.julia/packages/SIMD/5ugK9/src/SIMD.jl:538
; │┌ @ /home/eschnetter/.julia/packages/SIMD/5ugK9/src/SIMD.jl:557 within `macro expansion'
; ││┌ @ sysimg.jl:18 within `getproperty'
     %3 = getelementptr inbounds { <16 x float> }, { <16 x float> } addrspace(11)* %1, i64 0, i32 0
     %4 = getelementptr inbounds { <16 x float> }, { <16 x float> } addrspace(11)* %2, i64 0, i32 0
; ││└
    %5 = load <16 x float>, <16 x float> addrspace(11)* %3, align 16
    %6 = load <16 x float>, <16 x float> addrspace(11)* %4, align 16
    %res.i = call <16 x float> @llvm.maxnum.v16f32(<16 x float> %5, <16 x float> %6)
; └└
  %.sroa.0.0..sroa_idx = getelementptr inbounds { <16 x float> }, { <16 x float> }* %0, i64 0, i32 0
  store <16 x float> %res.i, <16 x float>* %.sroa.0.0..sroa_idx, align 64
  ret void
}
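
A reduced form of the example above may be easier to work with, since it drops
the Julia-specific struct wrappers, address spaces, and debug comments. This is
only a sketch: the function names @max_v16f32 and @max_v8f32 and the file name
reduced.ll are made up for illustration, and I have not confirmed that this
reduced form produces the same code as the full example. The 8-lane function is
included only as the AVX2-sized case for comparison.

```llvm
; Hypothetical reduced test case (names are illustrative, not from the report).
; Possible invocations, assuming the file is saved as reduced.ll:
;   llc -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake-avx512 reduced.ll -o -
;   llc -mtriple=x86_64-unknown-linux-gnu -mcpu=haswell reduced.ll -o -

declare <16 x float> @llvm.maxnum.v16f32(<16 x float>, <16 x float>)
declare <8 x float> @llvm.maxnum.v8f32(<8 x float>, <8 x float>)

; 512-bit case: 16 float lanes, the width of one zmm register.
define <16 x float> @max_v16f32(<16 x float> %a, <16 x float> %b) {
  %r = call <16 x float> @llvm.maxnum.v16f32(<16 x float> %a, <16 x float> %b)
  ret <16 x float> %r
}

; 256-bit case: 8 float lanes, the width of one ymm register.
define <8 x float> @max_v8f32(<8 x float> %a, <8 x float> %b) {
  %r = call <8 x float> @llvm.maxnum.v8f32(<8 x float> %a, <8 x float> %b)
  ret <8 x float> %r
}
```

Comparing the assembly for the two functions under each -mcpu setting should
show which register width the backend selects for each vector type.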

