Re: r276102 - [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR

2016-07-25 Thread Andrea Di Biagio via cfe-commits
Hi Simon (and all),

I noticed that this patch changes the definition of intrinsic _mm_cvtsd2_ss
in emmintrin.h. Is that intentional? My understanding is that your patch
should have only addressed float-to-integer conversions.

Was this change to _mm_cvtsd_ss motivated by the fact that (V)CVTSD2SS
depends on the rounding mode (control bits in the MXCSR register) for
inexact conversions? That would explain why the LLVM part (r275981) also
added a codegen test for that double-to-float conversion (I guess that none
of the reviewer spotted that extra test).

The problem I am seeing is that your change is causing a crash in the
backend (at -O1 and above). See the reproducible below:

/
#include 

__m128 test(__m128 a, const __m128d ) {
  return _mm_cvtsd_ss(a, b);
}
/

Alternatively, here is the LLVM IR:

define <4 x float> @test(<4 x float> %a, <2 x double>* nocapture readonly
%b) {
entry:
  %0 = load <2 x double>, <2 x double>* %b, align 16
  %1 = tail call <4 x float> @llvm.x86.sse2.cvtsd2ss(<4 x float> %a, <2 x
double> %0)
  ret <4 x float> %1
}

; Function Attrs: nounwind readnone
declare <4 x float> @llvm.x86.sse2.cvtsd2ss(<4 x float>, <2 x double>)



With your patch, we now always generate a call to @llvm.x86.sse2.cvtsd2ss
when expanding the builtin call from _mm_cvtsd_ss.

ISel would then select `Int_CVTSD2SSrm` instruction, which however is
`CodeGenOnly`. Unfortunately that pseudo is never further expanded/replaced
before compilation reaches the object emitter stage. So, before we execute
Pass 'X86 Assembly/Object Emitter' we see machine code like this:

BB#0: derived from LLVM BB %entry
Live Ins: %RDI %XMM0
%XMM0 = Int_VCVTSD2SSrm %XMM0, %RDI, 1, %noreg, 0,
%noreg; mem:LD16[%b]
RETQ %XMM0


.. which then causes  the following assertion failure:

[2016-07-25 13:25:11.830351700] 0x7bad5f87e0 Executing Pass 'X86
Assembly / Object Emitter' on Function 'test'...
Cannot encode all operands of:   >


Overall, I agree that the compiler shouldn't make assumptions on the
rounding mode when coverting from double-to-float. However, the RM variant
of Int_VCVTSD2SS doesn't seem to be correctly handled/expanded by the
backend.

Can we revert the change to the double-to-single convert? Alternatively,
can we fix the codegen issue exposed by this change (and add extra test
coverage)?
My opinion is that the change to the double-to-float conversion intrinsic
should have been part of a separate patch.


Thanks,
Andrea

On Fri, Jul 22, 2016 at 2:18 PM, Hans Wennborg via cfe-commits <
cfe-commits@lists.llvm.org> wrote:

> On Thu, Jul 21, 2016 at 6:34 PM, Robinson, Paul via cfe-commits
>  wrote:
> >
> >
> >> -Original Message-
> >> From: cfe-commits [mailto:cfe-commits-boun...@lists.llvm.org] On
> Behalf Of
> >> Simon Pilgrim via cfe-commits
> >> Sent: Wednesday, July 20, 2016 3:18 AM
> >> To: cfe-commits@lists.llvm.org
> >> Subject: r276102 - [X86][SSE] Reimplement SSE fp2si conversion
> intrinsics
> >> instead of using generic IR
> >>
> >> Author: rksimon
> >> Date: Wed Jul 20 05:18:01 2016
> >> New Revision: 276102
> >>
> >> URL: http://llvm.org/viewvc/llvm-project?rev=276102=rev
> >> Log:
> >> [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using
> >> generic IR
> >>
> >> D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and
> VCVTTPD2DQ
> >> truncating conversions with generic IR instead.
> >>
> >> It turns out that the behaviour of these intrinsics is different enough
> >> from generic IR that this will cause problems, INF/NAN/out of range
> values
> >> are guaranteed to result in a 0x8000 value - which plays havoc with
> >> constant folding which converts them to either zero or UNDEF. This is
> also
> >> an issue with the scalar implementations (which were already generic IR
> >> and what I was trying to match).
> >
> > Are the problems enough that this should be merged to the 3.9 release
> branch?
> > --paulr
>
> IIUC, this is the Clang-side of r275981, and if we merge that this
> should probably be merged too.
>
> Thanks,
> Hans
> ___
> cfe-commits mailing list
> cfe-commits@lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
>
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: r276102 - [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR

2016-07-22 Thread Hans Wennborg via cfe-commits
On Thu, Jul 21, 2016 at 6:34 PM, Robinson, Paul via cfe-commits
 wrote:
>
>
>> -Original Message-
>> From: cfe-commits [mailto:cfe-commits-boun...@lists.llvm.org] On Behalf Of
>> Simon Pilgrim via cfe-commits
>> Sent: Wednesday, July 20, 2016 3:18 AM
>> To: cfe-commits@lists.llvm.org
>> Subject: r276102 - [X86][SSE] Reimplement SSE fp2si conversion intrinsics
>> instead of using generic IR
>>
>> Author: rksimon
>> Date: Wed Jul 20 05:18:01 2016
>> New Revision: 276102
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=276102=rev
>> Log:
>> [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using
>> generic IR
>>
>> D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ
>> truncating conversions with generic IR instead.
>>
>> It turns out that the behaviour of these intrinsics is different enough
>> from generic IR that this will cause problems, INF/NAN/out of range values
>> are guaranteed to result in a 0x8000 value - which plays havoc with
>> constant folding which converts them to either zero or UNDEF. This is also
>> an issue with the scalar implementations (which were already generic IR
>> and what I was trying to match).
>
> Are the problems enough that this should be merged to the 3.9 release branch?
> --paulr

IIUC, this is the Clang-side of r275981, and if we merge that this
should probably be merged too.

Thanks,
Hans
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


RE: r276102 - [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR

2016-07-21 Thread Robinson, Paul via cfe-commits


> -Original Message-
> From: cfe-commits [mailto:cfe-commits-boun...@lists.llvm.org] On Behalf Of
> Simon Pilgrim via cfe-commits
> Sent: Wednesday, July 20, 2016 3:18 AM
> To: cfe-commits@lists.llvm.org
> Subject: r276102 - [X86][SSE] Reimplement SSE fp2si conversion intrinsics
> instead of using generic IR
> 
> Author: rksimon
> Date: Wed Jul 20 05:18:01 2016
> New Revision: 276102
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=276102=rev
> Log:
> [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using
> generic IR
> 
> D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ
> truncating conversions with generic IR instead.
> 
> It turns out that the behaviour of these intrinsics is different enough
> from generic IR that this will cause problems, INF/NAN/out of range values
> are guaranteed to result in a 0x8000 value - which plays havoc with
> constant folding which converts them to either zero or UNDEF. This is also
> an issue with the scalar implementations (which were already generic IR
> and what I was trying to match).

Are the problems enough that this should be merged to the 3.9 release branch?
--paulr

> 
> This patch changes both scalar and packed versions back to using x86-
> specific builtins.
> 
> It also deals with the other scalar conversion cases that are runtime
> rounding mode dependent and can have similar issues with constant folding.
> 
> Differential Revision: https://reviews.llvm.org/D22105
> 
> Modified:
> cfe/trunk/include/clang/Basic/BuiltinsX86.def
> cfe/trunk/lib/Headers/avxintrin.h
> cfe/trunk/lib/Headers/emmintrin.h
> cfe/trunk/lib/Headers/xmmintrin.h
> cfe/trunk/test/CodeGen/avx-builtins.c
> cfe/trunk/test/CodeGen/builtins-x86.c
> cfe/trunk/test/CodeGen/sse-builtins.c
> cfe/trunk/test/CodeGen/sse2-builtins.c
> 
> Modified: cfe/trunk/include/clang/Basic/BuiltinsX86.def
> URL: http://llvm.org/viewvc/llvm-
> project/cfe/trunk/include/clang/Basic/BuiltinsX86.def?rev=276102=276101
> =276102=diff
> ==
> 
> --- cfe/trunk/include/clang/Basic/BuiltinsX86.def (original)
> +++ cfe/trunk/include/clang/Basic/BuiltinsX86.def Wed Jul 20 05:18:01 2016
> @@ -303,7 +303,9 @@ TARGET_BUILTIN(__builtin_ia32_pabsd128,
>  TARGET_BUILTIN(__builtin_ia32_ldmxcsr, "vUi", "", "sse")
>  TARGET_BUILTIN(__builtin_ia32_stmxcsr, "Ui", "", "sse")
>  TARGET_BUILTIN(__builtin_ia32_cvtss2si, "iV4f", "", "sse")
> +TARGET_BUILTIN(__builtin_ia32_cvttss2si, "iV4f", "", "sse")
>  TARGET_BUILTIN(__builtin_ia32_cvtss2si64, "LLiV4f", "", "sse")
> +TARGET_BUILTIN(__builtin_ia32_cvttss2si64, "LLiV4f", "", "sse")
>  TARGET_BUILTIN(__builtin_ia32_storehps, "vV2i*V4f", "", "sse")
>  TARGET_BUILTIN(__builtin_ia32_storelps, "vV2i*V4f", "", "sse")
>  TARGET_BUILTIN(__builtin_ia32_movmskps, "iV4f", "", "sse")
> @@ -328,8 +330,12 @@ TARGET_BUILTIN(__builtin_ia32_cvtpd2dq,
>  TARGET_BUILTIN(__builtin_ia32_cvtpd2ps, "V4fV2d", "", "sse2")
>  TARGET_BUILTIN(__builtin_ia32_cvttpd2dq, "V4iV2d", "", "sse2")
>  TARGET_BUILTIN(__builtin_ia32_cvtsd2si, "iV2d", "", "sse2")
> +TARGET_BUILTIN(__builtin_ia32_cvttsd2si, "iV2d", "", "sse2")
>  TARGET_BUILTIN(__builtin_ia32_cvtsd2si64, "LLiV2d", "", "sse2")
> +TARGET_BUILTIN(__builtin_ia32_cvttsd2si64, "LLiV2d", "", "sse2")
> +TARGET_BUILTIN(__builtin_ia32_cvtsd2ss, "V4fV4fV2d", "", "sse2")
>  TARGET_BUILTIN(__builtin_ia32_cvtps2dq, "V4iV4f", "", "sse2")
> +TARGET_BUILTIN(__builtin_ia32_cvttps2dq, "V4iV4f", "", "sse2")
>  TARGET_BUILTIN(__builtin_ia32_clflush, "vvC*", "", "sse2")
>  TARGET_BUILTIN(__builtin_ia32_lfence, "v", "", "sse2")
>  TARGET_BUILTIN(__builtin_ia32_mfence, "v", "", "sse2")
> @@ -455,7 +461,9 @@ TARGET_BUILTIN(__builtin_ia32_cmpss, "V4
>  TARGET_BUILTIN(__builtin_ia32_cvtdq2ps256, "V8fV8i", "", "avx")
>  TARGET_BUILTIN(__builtin_ia32_cvtpd2ps256, "V4fV4d", "", "avx")
>  TARGET_BUILTIN(__builtin_ia32_cvtps2dq256, "V8iV8f", "", "avx")
> +TARGET_BUILTIN(__builtin_ia32_cvttpd2dq256, "V4iV4d", "", "avx")
>  TARGET_BUILTIN(__builtin_ia32_cvtpd2dq256, "V4iV4d", "", "avx")
> +TARGET_BUILTIN(__builtin_ia32_cvttps2dq256, "V8iV8f", "", "avx")
>  TARGET_BUILTIN(__builtin_ia32_vperm2f128_pd256, "V4dV4dV4dIc", "", "avx")
>  TARGET_BUILTIN(__builtin_ia32_vperm2f128_ps256, "V8fV8fV8fIc", "", "avx")
>  TARGET_BUILTIN(__builtin_ia32_vperm2f128_si256, "V8iV8iV8iIc", "", "avx")
> 
> Modified: cfe/trunk/lib/Headers/avxintrin.h
> URL: http://llvm.org/viewvc/llvm-
> project/cfe/trunk/lib/Headers/avxintrin.h?rev=276102=276101=276102
> iew=diff
> ==
> 
> --- cfe/trunk/lib/Headers/avxintrin.h (original)
> +++ cfe/trunk/lib/Headers/avxintrin.h Wed Jul 20 05:18:01 2016
> @@ -2117,7 +2117,7 @@ _mm256_cvtps_pd(__m128 __a)
>  static __inline __m128i __DEFAULT_FN_ATTRS
>  _mm256_cvttpd_epi32(__m256d __a)
>  {
> -  return (__m128i)__builtin_convertvector((__v4df) __a,