Re: [PATCH, x86, testsuite, AVX-512] Fix initialization in 4 tests for shuffles.
On Thu, Mar 27, 2014 at 10:18 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
Straightforward patch at the bottom fixes a copy-and-paste problem in the initialization part of the tests. Updated tests pass on the simulator. Is it OK for trunk?

gcc/testsuite:
	* gcc.target/i386/avx512f-vshuff32x4-2.c: Fix initialization of second source operand.
	* gcc.target/i386/avx512f-vshuff64x2-2.c: Ditto.
	* gcc.target/i386/avx512f-vshufi32x4-2.c: Ditto.
	* gcc.target/i386/avx512f-vshufi64x2-2.c: Ditto.

OK.

Thanks,
Uros.
Re: [PATCH] Allow VOIDmode argument to ix86_copy_addr_to_reg (PR target/60693)
On Fri, Mar 28, 2014 at 4:19 PM, Jakub Jelinek ja...@redhat.com wrote:
Before ix86_copy_addr_to_reg was added, we used copy_addr_to_reg, which handles VOIDmode values just fine, but the new function ICEs on them. As the function was added for adding SUBREGs to TLS addresses, and those will never be CONST_INTs, just using copy_addr_to_reg for VOIDmode addresses is IMHO the right thing and restores the previous behavior. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-03-28  Jakub Jelinek  ja...@redhat.com

	PR target/60693
	* config/i386/i386.c (ix86_copy_addr_to_reg): Call copy_addr_to_reg
	also if addr has VOIDmode.
	* gcc.target/i386/pr60693.c: New test.

OK.

Thanks,
Uros.
Re: various _mm512_set* intrinsics
Hello!

Here are more intrinsics that are missing. I know that gcc currently generates horrible code for most of them, but I think it's more important to have the API in place, albeit non-optimal. Maybe this entices someone to add the necessary optimizations.

I agree that having a non-optimal implementation is better than nothing.

The code is self-contained and shouldn't interfere with any correct code. Should this also go into 4.9?

2014-03-27  Ulrich Drepper  drep...@gmail.com

	* config/i386/avx512fintrin.h (__v32hi): Define type.
	(__v64qi): Likewise.
	(_mm512_set1_epi8): Define.
	(_mm512_set1_epi16): Define.
	(_mm512_set4_epi32): Define.
	(_mm512_set4_epi64): Define.
	(_mm512_set4_pd): Define.
	(_mm512_set4_ps): Define.
	(_mm512_setr4_epi64): Define.
	(_mm512_setr4_epi32): Define.
	(_mm512_setr4_pd): Define.
	(_mm512_setr4_ps): Define.
	(_mm512_setzero_epi32): Define.

This is OK for mainline, but please wait for Kirill's review of the intrinsics.

Thanks,
Uros.
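To make the set4/setr4 element order concrete, here is a plain-C scalar model (no AVX-512 hardware needed). It assumes the new macros follow the existing _mm_set_epi32/_mm_setr_epi32 convention: set4 takes the highest element of each 128-bit group first, setr4 takes them in memory order. The type and function names are hypothetical; this is not the avx512fintrin.h implementation.

```c
/* Scalar model of a 512-bit vector of sixteen 32-bit lanes;
   lane[0] is the lowest element.  */
typedef struct { int lane[16]; } v16si_model;

/* Hypothetical model of _mm512_set4_epi32 (e3, e2, e1, e0).  As with
   _mm_set_epi32, the first argument is the highest element of each
   128-bit group, so lane i receives the element numbered i % 4.  */
static v16si_model
model_set4_epi32 (int e3, int e2, int e1, int e0)
{
  const int pattern[4] = { e0, e1, e2, e3 };
  v16si_model v;
  for (int i = 0; i < 16; i++)
    v.lane[i] = pattern[i % 4];
  return v;
}

/* Hypothetical model of _mm512_setr4_epi32 (e0, e1, e2, e3): the same
   layout, with the arguments given in memory (lowest-first) order.  */
static v16si_model
model_setr4_epi32 (int e0, int e1, int e2, int e3)
{
  return model_set4_epi32 (e3, e2, e1, e0);
}
```

So model_set4_epi32 (3, 2, 1, 0) and model_setr4_epi32 (0, 1, 2, 3) describe the same vector, with lanes 0..15 holding 0 1 2 3 repeated four times.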
Re: Fix various x86 tests for --with-arch=bdver3
On Fri, Mar 28, 2014 at 10:46 PM, Joseph S. Myers jos...@codesourcery.com wrote:
If you build an x86_64 toolchain with --with-arch enabling various instruction set extensions by default, this causes some tests to fail that aren't expecting those extensions to be enabled. This patch fixes various tests failing like that for an x86_64-linux-gnu toolchain configured --with-arch=bdver3, generally by using appropriate -mno-* options in the tests, or in the case of gcc.dg/pr45416.c by adjusting the scan-assembler to allow the alternative instruction that gets used in this case. It's quite likely other such failures appear for other --with-arch choices.

Tested x86_64-linux-gnu. OK to commit?

In addition to the failures fixed by this patch, there are many gcc.dg/vect tests where having additional vector extensions enabled breaks their expectations; I'm not sure of the best way to handle those. And you get

FAIL: gcc.target/i386/avx512f-vfmaddXXXpd-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfmaddXXXps-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfmaddsubXXXpd-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfmaddsubXXXps-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfmsubXXXpd-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfmsubXXXps-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfmsubaddXXXpd-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfmsubaddXXXps-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfnmaddXXXpd-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfnmaddXXXps-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfnmsubXXXpd-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfnmsubXXXps-2.c (test for excess errors)

which are assembler errors such as operand type mismatch for `vfmaddpd' - it looks like the compiler isn't really prepared for the -mavx512f -mfma4 combination, but I'm not sure what the best way to handle it is (producing invalid output doesn't seem right, however).

I will look into these.

If you test with -march=bdver3 in the multilib options (runtest --target_board=unix/-march=bdver3) rather than as the configured default, you get extra failures for the usual reason of multilib options going after the options from dg-options (which I propose to address in the usual way using dg-skip-if for -march= options different from the one present in dg-options).

2014-03-28  Joseph Myers  jos...@codesourcery.com

	* gcc.dg/pr45416.c: Allow bextr on x86.
	* gcc.target/i386/fma4-builtin.c, gcc.target/i386/fma4-fma-2.c,
	gcc.target/i386/fma4-fma.c, gcc.target/i386/fma4-vector-2.c,
	gcc.target/i386/fma4-vector.c: Use -mno-fma.
	* gcc.target/i386/l_fma_double_1.c, gcc.target/i386/l_fma_double_2.c,
	gcc.target/i386/l_fma_double_3.c, gcc.target/i386/l_fma_double_4.c,
	gcc.target/i386/l_fma_double_5.c, gcc.target/i386/l_fma_double_6.c,
	gcc.target/i386/l_fma_float_1.c, gcc.target/i386/l_fma_float_2.c,
	gcc.target/i386/l_fma_float_3.c, gcc.target/i386/l_fma_float_4.c,
	gcc.target/i386/l_fma_float_5.c, gcc.target/i386/l_fma_float_6.c:
	Use -mno-fma4.
	* gcc.target/i386/pr27971.c: Use -mno-tbm.
	* gcc.target/i386/pr42542-4a.c: Use -mno-avx.
	* gcc.target/i386/pr59390.c: Use -mno-fma -mno-fma4.

OK.

Thanks,
Uros.
Re: Fix various x86 tests for --with-arch=bdver3
On Fri, Mar 28, 2014 at 10:46 PM, Joseph S. Myers jos...@codesourcery.com wrote:
In addition to the failures fixed by this patch, there are many gcc.dg/vect tests where having additional vector extensions enabled breaks their expectations; I'm not sure of the best way to handle those. And you get

FAIL: gcc.target/i386/avx512f-vfmaddXXXpd-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfmaddXXXps-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfmaddsubXXXpd-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfmaddsubXXXps-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfmsubXXXpd-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfmsubXXXps-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfmsubaddXXXpd-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfmsubaddXXXps-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfnmaddXXXpd-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfnmaddXXXps-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfnmsubXXXpd-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfnmsubXXXps-2.c (test for excess errors)

which are assembler errors such as operand type mismatch for `vfmaddpd' - it looks like the compiler isn't really prepared for the -mavx512f -mfma4 combination, but I'm not sure what the best way to handle it is (producing invalid output doesn't seem right, however).

Attached patch splits AVX512F modes out of existing FMA patterns. These modes are not supported by patterns that also support FMA4 insns.

2014-03-30  Uros Bizjak  ubiz...@gmail.com

	* config/i386/sse.md (FMAMODE_NOVF512): New mode iterator.
	(<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name><round_name>): Split out
	<sd_mask_codefor>fma_fmadd_<VF_512:mode><sd_maskz_name><round_name>.
	Use FMAMODE_NOVF512 mode iterator.
	(<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name><round_name>): Ditto.
	(<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name><round_name>): Ditto.
	(<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name><round_name>): Ditto.
	(<sd_mask_codefor>fma_fmaddsub_<mode><sd_maskz_name><round_name>): Split out
	<sd_mask_codefor>fma_fmaddsub_<VF_512:mode><sd_maskz_name><round_name>.
	Use VF_128_256 mode iterator.
	(<sd_mask_codefor>fma_fmsubadd_<mode><sd_maskz_name><round_name>): Ditto.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}. I have also eyeballed the asm - no FMA4 insn was emitted for AVX512F modes.

Committed to mainline SVN.

Uros.

Index: config/i386/sse.md
===================================================================
--- config/i386/sse.md	(revision 208944)
+++ config/i386/sse.md	(working copy)
@@ -2712,8 +2712,7 @@
 	(fma:FMAMODEM
 	  (match_operand:FMAMODEM 1 "nonimmediate_operand")
 	  (match_operand:FMAMODEM 2 "nonimmediate_operand")
-	  (match_operand:FMAMODEM 3 "nonimmediate_operand")))]
-  "")
+	  (match_operand:FMAMODEM 3 "nonimmediate_operand")))])
 
 (define_expand "fms<mode>4"
   [(set (match_operand:FMAMODEM 0 "register_operand")
@@ -2720,8 +2719,7 @@
 	(fma:FMAMODEM
 	  (match_operand:FMAMODEM 1 "nonimmediate_operand")
 	  (match_operand:FMAMODEM 2 "nonimmediate_operand")
-	  (neg:FMAMODEM (match_operand:FMAMODEM 3 "nonimmediate_operand"))))]
-  "")
+	  (neg:FMAMODEM (match_operand:FMAMODEM 3 "nonimmediate_operand"))))])
 
 (define_expand "fnma<mode>4"
   [(set (match_operand:FMAMODEM 0 "register_operand")
@@ -2728,8 +2726,7 @@
 	(fma:FMAMODEM
 	  (neg:FMAMODEM (match_operand:FMAMODEM 1 "nonimmediate_operand"))
 	  (match_operand:FMAMODEM 2 "nonimmediate_operand")
-	  (match_operand:FMAMODEM 3 "nonimmediate_operand")))]
-  "")
+	  (match_operand:FMAMODEM 3 "nonimmediate_operand")))])
 
 (define_expand "fnms<mode>4"
   [(set (match_operand:FMAMODEM 0 "register_operand")
@@ -2736,18 +2733,18 @@
 	(fma:FMAMODEM
 	  (neg:FMAMODEM (match_operand:FMAMODEM 1 "nonimmediate_operand"))
 	  (match_operand:FMAMODEM 2 "nonimmediate_operand")
-	  (neg:FMAMODEM (match_operand:FMAMODEM 3 "nonimmediate_operand"))))]
-  "")
+	  (neg:FMAMODEM (match_operand:FMAMODEM 3 "nonimmediate_operand"))))])
 
 ;; The builtins for intrinsics are not constrained by SSE math enabled.
-(define_mode_iterator FMAMODE [(SF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
-			       (DF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
-			       (V4SF "TARGET_FMA || TARGET_FMA4")
-			       (V2DF "TARGET_FMA || TARGET_FMA4")
-			       (V8SF "TARGET_FMA || TARGET_FMA4")
-			       (V4DF "TARGET_FMA || TARGET_FMA4")
-			       (V16SF "TARGET_AVX512F")
-			       (V8DF "TARGET_AVX512F")])
+(define_mode_iterator FMAMODE
+  [(SF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
+   (DF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
+   (V4SF "TARGET_FMA || TARGET_FMA4")
+   (V2DF "TARGET_FMA || TARGET_FMA4
Re: Fix various x86 tests for --with-arch=bdver3 --with-cpu=bdver3
On Wed, Apr 2, 2014 at 12:27 AM, Joseph S. Myers jos...@codesourcery.com wrote:
When I fixed various tests in http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01662.html for failures with --with-arch=bdver3, I missed that a so-configured compiler still defaults to -mtune=generic. If you override that as well with --with-cpu=bdver3, further failures appear, and this patch fixes some of them. Most of these changes add -mno-prefer-avx128 to AVX tests not expecting a -mprefer-avx128 default. In addition, some tests have -mtune=generic added where the behavior tested for depends on some tuning parameter that I identified: X86_TUNE_EXT_80387_CONSTANTS or X86_TUNE_SSE_LOAD0_BY_PXOR.

Tested x86_64-linux-gnu. OK to commit?

There are other failures this patch does not resolve in a --with-arch=bdver3 --with-cpu=bdver3 configuration. Some of these are AVX tests whose failures are not resolved by adding -mno-prefer-avx128 (and so this patch does not add -mno-prefer-avx128 to those tests); others may be cases where -mtune=generic is appropriate but I haven't identified the specific tuning parameter, or cases showing code generation differences that are correct for the tuning, so that a -mtune= option should be used.

FAIL: gcc.target/i386/avx2-vpand-1.c scan-assembler vpand[ \\t]+[^\n]*%ymm[0-9]
FAIL: gcc.target/i386/avx2-vpand-3.c scan-assembler-times vpand[ \\t]+[^\n]*%ymm[0-9] 1
FAIL: gcc.target/i386/avx2-vpandn-1.c scan-assembler vpandn[ \\t]+[^\n]*%ymm[0-9]
FAIL: gcc.target/i386/avx2-vpor-1.c scan-assembler vpor[ \\t]+[^\n]*%ymm[0-9]
FAIL: gcc.target/i386/avx2-vpxor-1.c scan-assembler vpxor[ \\t]+[^\n]*%ymm[0-9]
FAIL: gcc.target/i386/avx256-unaligned-load-2.c scan-assembler (sse2_loaddqu|vmovdqu[^\n\r]*movv16qi_internal)
FAIL: gcc.target/i386/avx256-unaligned-load-2.c scan-assembler vinsert.128
FAIL: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vmovdqa64[ \\t]+%zmm 2
FAIL: gcc.target/i386/avx512f-vmovdqu32-1.c scan-assembler-times vmovdqu[36][24][ \\t]+[^\n]*\\)[^\n]*%zmm[0-9][^{] 1
FAIL: gcc.target/i386/avx512f-vmovupd-1.c scan-assembler-times vmovupd[ \\t]+[^\n]*\\)[^\n]*%zmm[0-9][^{] 1
FAIL: gcc.target/i386/avx512f-vpandd-1.c scan-assembler-times vpandd[ \\t]+[^\n]*%zmm[0-9][^{] 4
FAIL: gcc.target/i386/avx512f-vpandd-1.c scan-assembler-times vpandd[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
FAIL: gcc.target/i386/avx512f-vpandd-1.c scan-assembler-times vpandd[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
FAIL: gcc.target/i386/avx512f-vpandnd-1.c scan-assembler-times vpandnd[ \\t]+[^\n]*%zmm[0-9][^{] 4
FAIL: gcc.target/i386/avx512f-vpandnd-1.c scan-assembler-times vpandnd[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
FAIL: gcc.target/i386/avx512f-vpandnd-1.c scan-assembler-times vpandnd[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
FAIL: gcc.target/i386/avx512f-vpandnq-1.c scan-assembler-times vpandnq[ \\t]+[^\n]*%zmm[0-9][^{] 3
FAIL: gcc.target/i386/avx512f-vpandnq-1.c scan-assembler-times vpandnq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
FAIL: gcc.target/i386/avx512f-vpandnq-1.c scan-assembler-times vpandnq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
FAIL: gcc.target/i386/avx512f-vpandq-1.c scan-assembler-times vpandq[ \\t]+[^\n]*%zmm[0-9][^{] 3
FAIL: gcc.target/i386/avx512f-vpandq-1.c scan-assembler-times vpandq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
FAIL: gcc.target/i386/avx512f-vpandq-1.c scan-assembler-times vpandq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
FAIL: gcc.target/i386/avx512f-vpord-1.c scan-assembler-times vpord[ \\t]+[^\n]*%zmm[0-9][^{] 4
FAIL: gcc.target/i386/avx512f-vpord-1.c scan-assembler-times vpord[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
FAIL: gcc.target/i386/avx512f-vpord-1.c scan-assembler-times vpord[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
FAIL: gcc.target/i386/avx512f-vporq-1.c scan-assembler-times vporq[ \\t]+[^\n]*%zmm[0-9][^{] 3
FAIL: gcc.target/i386/avx512f-vporq-1.c scan-assembler-times vporq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
FAIL: gcc.target/i386/avx512f-vporq-1.c scan-assembler-times vporq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
FAIL: gcc.target/i386/avx512f-vpxord-1.c scan-assembler-times vpxord[ \\t]+[^\n]*%zmm[0-9][^{] 4
FAIL: gcc.target/i386/avx512f-vpxord-1.c scan-assembler-times vpxord[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
FAIL: gcc.target/i386/avx512f-vpxord-1.c scan-assembler-times vpxord[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
FAIL: gcc.target/i386/avx512f-vpxorq-1.c scan-assembler-times vpxorq[ \\t]+[^\n]*%zmm[0-9][^{] 3
FAIL: gcc.target/i386/avx512f-vpxorq-1.c scan-assembler-times vpxorq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
FAIL: gcc.target/i386/avx512f-vpxorq-1.c scan-assembler-times vpxorq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
FAIL: gcc.target/i386/pr49002-1.c scan-assembler vmovapd[\t ]*[^,]*,[\t ]*%xmm
FAIL: gcc.target/i386/pr53712.c scan-assembler-times movdqu 1
FAIL: gcc.target/i386/pr53907.c scan-assembler movdqa
FAIL: gcc.target/i386/pr59539-1.c scan-assembler-times vmovdqu 1
FAIL:
Re: Fix various x86 tests for --with-arch=bdver3 --with-cpu=bdver3
On Wed, Apr 2, 2014 at 12:27 AM, Joseph S. Myers jos...@codesourcery.com wrote:
There are other failures this patch does not resolve in a --with-arch=bdver3 --with-cpu=bdver3 configuration. Some of these are AVX tests whose failures are not resolved by adding -mno-prefer-avx128 (and so this patch does not add -mno-prefer-avx128 to those tests); others may be cases where -mtune=generic is appropriate but I haven't identified the specific tuning parameter, or cases showing code generation differences that are correct for the tuning, so that a -mtune= option should be used.

FAIL: gcc.target/i386/avx2-vpand-1.c scan-assembler vpand[ \\t]+[^\n]*%ymm[0-9]
FAIL: gcc.target/i386/avx2-vpand-3.c scan-assembler-times vpand[ \\t]+[^\n]*%ymm[0-9] 1
FAIL: gcc.target/i386/avx2-vpandn-1.c scan-assembler vpandn[ \\t]+[^\n]*%ymm[0-9]
FAIL: gcc.target/i386/avx2-vpor-1.c scan-assembler vpor[ \\t]+[^\n]*%ymm[0-9]
FAIL: gcc.target/i386/avx2-vpxor-1.c scan-assembler vpxor[ \\t]+[^\n]*%ymm[0-9]
FAIL: gcc.target/i386/avx256-unaligned-load-2.c scan-assembler (sse2_loaddqu|vmovdqu[^\n\r]*movv16qi_internal)
FAIL: gcc.target/i386/avx256-unaligned-load-2.c scan-assembler vinsert.128
FAIL: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vmovdqa64[ \\t]+%zmm 2
FAIL: gcc.target/i386/avx512f-vmovdqu32-1.c scan-assembler-times vmovdqu[36][24][ \\t]+[^\n]*\\)[^\n]*%zmm[0-9][^{] 1
FAIL: gcc.target/i386/avx512f-vmovupd-1.c scan-assembler-times vmovupd[ \\t]+[^\n]*\\)[^\n]*%zmm[0-9][^{] 1
FAIL: gcc.target/i386/avx512f-vpandd-1.c scan-assembler-times vpandd[ \\t]+[^\n]*%zmm[0-9][^{] 4
FAIL: gcc.target/i386/avx512f-vpandd-1.c scan-assembler-times vpandd[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
FAIL: gcc.target/i386/avx512f-vpandd-1.c scan-assembler-times vpandd[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
FAIL: gcc.target/i386/avx512f-vpandnd-1.c scan-assembler-times vpandnd[ \\t]+[^\n]*%zmm[0-9][^{] 4
FAIL: gcc.target/i386/avx512f-vpandnd-1.c scan-assembler-times vpandnd[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
FAIL: gcc.target/i386/avx512f-vpandnd-1.c scan-assembler-times vpandnd[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
FAIL: gcc.target/i386/avx512f-vpandnq-1.c scan-assembler-times vpandnq[ \\t]+[^\n]*%zmm[0-9][^{] 3
FAIL: gcc.target/i386/avx512f-vpandnq-1.c scan-assembler-times vpandnq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
FAIL: gcc.target/i386/avx512f-vpandnq-1.c scan-assembler-times vpandnq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
FAIL: gcc.target/i386/avx512f-vpandq-1.c scan-assembler-times vpandq[ \\t]+[^\n]*%zmm[0-9][^{] 3
FAIL: gcc.target/i386/avx512f-vpandq-1.c scan-assembler-times vpandq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
FAIL: gcc.target/i386/avx512f-vpandq-1.c scan-assembler-times vpandq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
FAIL: gcc.target/i386/avx512f-vpord-1.c scan-assembler-times vpord[ \\t]+[^\n]*%zmm[0-9][^{] 4
FAIL: gcc.target/i386/avx512f-vpord-1.c scan-assembler-times vpord[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
FAIL: gcc.target/i386/avx512f-vpord-1.c scan-assembler-times vpord[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
FAIL: gcc.target/i386/avx512f-vporq-1.c scan-assembler-times vporq[ \\t]+[^\n]*%zmm[0-9][^{] 3
FAIL: gcc.target/i386/avx512f-vporq-1.c scan-assembler-times vporq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
FAIL: gcc.target/i386/avx512f-vporq-1.c scan-assembler-times vporq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
FAIL: gcc.target/i386/avx512f-vpxord-1.c scan-assembler-times vpxord[ \\t]+[^\n]*%zmm[0-9][^{] 4
FAIL: gcc.target/i386/avx512f-vpxord-1.c scan-assembler-times vpxord[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
FAIL: gcc.target/i386/avx512f-vpxord-1.c scan-assembler-times vpxord[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
FAIL: gcc.target/i386/avx512f-vpxorq-1.c scan-assembler-times vpxorq[ \\t]+[^\n]*%zmm[0-9][^{] 3
FAIL: gcc.target/i386/avx512f-vpxorq-1.c scan-assembler-times vpxorq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
FAIL: gcc.target/i386/avx512f-vpxorq-1.c scan-assembler-times vpxorq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
FAIL: gcc.target/i386/pr49002-1.c scan-assembler vmovapd[\t ]*[^,]*,[\t ]*%xmm
FAIL: gcc.target/i386/pr53712.c scan-assembler-times movdqu 1
FAIL: gcc.target/i386/pr53907.c scan-assembler movdqa
FAIL: gcc.target/i386/pr59539-1.c scan-assembler-times vmovdqu 1
FAIL: gcc.target/i386/pr59539-2.c scan-assembler-times vmovdqu 1

These are due to the TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL tuning flag. Currently, this flag applies to all vector sizes (128, 256 and 512 bits), but I guess it is effective only for 128-bit sizes. Can you please review the usage of this flag in i386/sse.md?

Thanks,
Uros.
Re: Skip some gcc.target/i386 tests for conflicting -march= options
On Wed, Apr 2, 2014 at 6:36 PM, Joseph S. Myers jos...@codesourcery.com wrote:
If you test an x86_64 toolchain with -march=bdver3 in the multilib options, as noted in http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01662.html, various test failures arise from tests whose own -march= in dg-options is overridden. This patch adds dg-skip-if to those tests to skip them for conflicting -march= options, as has been done before for other tests (obviously, if the option ordering is changed in future in DejaGnu, such skips may become obsolete or could be conditioned on DejaGnu version). (No doubt other -march= options would show up further tests needing such changes.)

Tested x86_64-linux-gnu. OK to commit?

2014-04-02  Joseph Myers  jos...@codesourcery.com

	* gcc.target/i386/funcspec-2.c, gcc.target/i386/funcspec-3.c,
	gcc.target/i386/funcspec-9.c, gcc.target/i386/isa-1.c,
	gcc.target/i386/memcpy-strategy-1.c,
	gcc.target/i386/memcpy-strategy-2.c,
	gcc.target/i386/memcpy-vector_loop-1.c,
	gcc.target/i386/memcpy-vector_loop-2.c,
	gcc.target/i386/memset-vector_loop-1.c,
	gcc.target/i386/memset-vector_loop-2.c,
	gcc.target/i386/sse2-init-v2di-2.c, gcc.target/i386/ssetype-1.c,
	gcc.target/i386/ssetype-2.c, gcc.target/i386/ssetype-5.c: Skip for
	-march= options different from those in dg-options.

OK.

Thanks,
Uros.
Re: Use -mno-prefer-avx128 in two more tests
On Wed, Apr 2, 2014 at 10:09 PM, Joseph S. Myers jos...@codesourcery.com wrote:
Two of the tests I noted in http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00036.html as not being fixed for --with-arch=bdver3 --with-cpu=bdver3 by adding -mno-prefer-avx128 in fact also show failures for --with-arch=btver2 --with-tune=btver2, and in that case *are* fixed by adding -mno-prefer-avx128. Thus, while in those cases there may still be other tuning issues as noted in http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00052.html (btver2 doesn't enable the flag in question), I think it *is* correct to use -mno-prefer-avx128 for these two tests, and this patch adds it.

Tested x86_64-linux-gnu. OK to commit?

2014-04-02  Joseph Myers  jos...@codesourcery.com

	* gcc.target/i386/avx2-vpand-3.c,
	gcc.target/i386/avx256-unaligned-load-2.c: Use -mno-prefer-avx128.

OK.

Thanks,
Uros.
Re: PATCH: PR target/60827: Inconsistent optimize_function_for_speed_p in in *fixuns_truncmode_1
On Fri, Apr 11, 2014 at 10:16 PM, H.J. Lu hongjiu...@intel.com wrote:
Since the fixuns_trunc<mode>si2 expander checks optimize_insn_for_size_p before generating *fixuns_trunc<mode>_1, we should use optimize_insn_for_speed_p in *fixuns_trunc<mode>_1 for consistency. OK for trunk?

Thanks.

H.J.
---
2014-04-11  H.J. Lu  hongjiu...@intel.com

	PR target/60827
	* config/i386/i386.md (*fixuns_trunc<mode>_1): Check
	optimize_insn_for_speed_p instead of optimize_function_for_speed_p.

It looks to me that many, if not all, optimize_function_for_{speed,size}_p predicates in .md files should be converted to the corresponding optimize_insn_for_*_p predicates. The latter predicates apply to BBs, so IMO insn sequences should be handled according to BB frequencies, not function frequencies.

The patch is OK for mainline.

Thanks,
Uros.
Re: PATCH: PR target/60827: Inconsistent optimize_function_for_speed_p in in *fixuns_truncmode_1
On Mon, Apr 14, 2014 at 6:49 PM, Jan Hubicka hubi...@ucw.cz wrote:
On Fri, Apr 11, 2014 at 10:16 PM, H.J. Lu hongjiu...@intel.com wrote:
Since the fixuns_trunc<mode>si2 expander checks optimize_insn_for_size_p before generating *fixuns_trunc<mode>_1, we should use optimize_insn_for_speed_p in *fixuns_trunc<mode>_1 for consistency. OK for trunk?

Thanks.

H.J.
---
2014-04-11  H.J. Lu  hongjiu...@intel.com

	PR target/60827
	* config/i386/i386.md (*fixuns_trunc<mode>_1): Check
	optimize_insn_for_speed_p instead of optimize_function_for_speed_p.

It looks to me that many, if not all, optimize_function_for_{speed,size}_p predicates in .md files should be converted to the corresponding optimize_insn_for_*_p predicates. The latter predicates apply to BBs, so IMO insn sequences should be handled according to BB frequencies, not function frequencies.

You cannot convert all predicates, only those in expanders. The predicates in insn templates must be consistent throughout the compilation, since the insn may move from a hot BB to a cold BB and you do not want it to become unrecognizable.

Oops, thanks for sharing this. Based on this explanation, the patch isn't correct.

H.J., please revert it.

Thanks,
Uros.
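Jan's argument can be illustrated with a toy model in plain C. Everything below is hypothetical and greatly simplified (these are not GCC's data structures or predicates): an insn is "recognized" only while its pattern condition holds. A function-level condition gives the same answer in every block, while a block-level condition can stop matching once the insn moves from a hot block to a cold one.

```c
#include <stdbool.h>

/* Toy stand-ins for a function and a basic block (hypothetical).  */
struct toy_function { bool optimize_for_speed; };
struct toy_bb { const struct toy_function *fn; bool hot; };

/* Function-level condition, in the spirit of
   optimize_function_for_speed_p: ignores which block the insn is in.  */
static bool
cond_function_level (const struct toy_bb *bb)
{
  return bb->fn->optimize_for_speed;
}

/* Block-level condition, in the spirit of an optimize_insn_for_speed_p
   check evaluated against the insn's current block.  */
static bool
cond_bb_level (const struct toy_bb *bb)
{
  return bb->fn->optimize_for_speed && bb->hot;
}

/* Returns true if an insn whose pattern condition is COND, recognized
   in a hot block, is still recognized after being moved to a cold
   block of the same function.  */
static bool
survives_hot_to_cold_move (bool (*cond) (const struct toy_bb *))
{
  const struct toy_function fn = { true };
  const struct toy_bb hot = { &fn, true };
  const struct toy_bb cold = { &fn, false };
  return cond (&hot) && cond (&cold);
}
```

In this model only the function-level condition survives the move, which is why block-sensitive predicates belong in expanders rather than in insn templates.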
[PATCH, i386]: Some classify_argument and return_in_memory cleanups
Hello!

Attached patch changes the return type of examine_argument to bool and merges a couple of called-once functions into their call site. The latter change removes a bunch of functions declared with ATTRIBUTE_UNUSED.

2014-04-14  Uros Bizjak  ubiz...@gmail.com

	* config/i386/i386.c (examine_argument): Return bool.  Return true if
	parameter should be passed in memory.
	<case X86_64_COMPLEX_X87_CLASS>: Adjust.
	(construct_container): Update calls to examine_argument.
	(function_arg_advance_64): Ditto.
	(return_in_memory_32): Merge with ix86_return_in_memory.
	(return_in_memory_64): Ditto.
	(return_in_memory_ms_64): Ditto.

Bootstrapped and regtested on x86_64-pc-linux-gnu {,-m32} and committed to mainline SVN.

Uros.

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 209348)
+++ config/i386/i386.c	(working copy)
@@ -6806,8 +6806,9 @@ classify_argument (enum machine_mode mode, const_t
 }
 
 /* Examine the argument and return set number of register required in each
-   class.  Return 0 iff parameter should be passed in memory.  */
-static int
+   class.  Return true iff parameter should be passed in memory.  */
+
+static bool
 examine_argument (enum machine_mode mode, const_tree type, int in_return,
 		  int *int_nregs, int *sse_nregs)
 {
@@ -6816,8 +6817,9 @@ examine_argument (enum machine_mode mode, const_tr
 
   *int_nregs = 0;
   *sse_nregs = 0;
+
   if (!n)
-    return 0;
+    return true;
   for (n--; n >= 0; n--)
     switch (regclass[n])
       {
@@ -6835,15 +6837,15 @@ examine_argument (enum machine_mode mode, const_tr
 	break;
       case X86_64_X87_CLASS:
       case X86_64_X87UP_CLASS:
+      case X86_64_COMPLEX_X87_CLASS:
 	if (!in_return)
-	  return 0;
+	  return true;
 	break;
-      case X86_64_COMPLEX_X87_CLASS:
-	return in_return ? 2 : 0;
       case X86_64_MEMORY_CLASS:
 	gcc_unreachable ();
       }
-  return 1;
+
+  return false;
 }
 
 /* Construct container for the argument used by GCC interface.  See
@@ -6873,8 +6875,8 @@ construct_container (enum machine_mode mode, enum
   n = classify_argument (mode, type, regclass, 0);
   if (!n)
     return NULL;
-  if (!examine_argument (mode, type, in_return, &needed_intregs,
-			 &needed_sseregs))
+  if (examine_argument (mode, type, in_return, &needed_intregs,
+			&needed_sseregs))
     return NULL;
   if (needed_intregs > nintregs || needed_sseregs > nsseregs)
     return NULL;
@@ -7193,7 +7195,7 @@ function_arg_advance_64 (CUMULATIVE_ARGS *cum, enu
 	  || VALID_AVX256_REG_MODE (mode)))
     return;
 
-  if (examine_argument (mode, type, 0, &int_nregs, &sse_nregs)
+  if (!examine_argument (mode, type, 0, &int_nregs, &sse_nregs)
       && sse_nregs <= cum->sse_nregs && int_nregs <= cum->nregs)
     {
       cum->nregs -= int_nregs;
@@ -7988,95 +7990,87 @@ ix86_libcall_value (enum machine_mode mode)
 
 /* Return true iff type is returned in memory.  */
 
-static bool ATTRIBUTE_UNUSED
-return_in_memory_32 (const_tree type, enum machine_mode mode)
+static bool
+ix86_return_in_memory (const_tree type, const_tree fntype ATTRIBUTE_UNUSED)
 {
+#ifdef SUBTARGET_RETURN_IN_MEMORY
+  return SUBTARGET_RETURN_IN_MEMORY (type, fntype);
+#else
+  const enum machine_mode mode = type_natural_mode (type, NULL, true);
   HOST_WIDE_INT size;
 
-  if (mode == BLKmode)
-    return true;
+  if (TARGET_64BIT)
+    {
+      if (ix86_function_type_abi (fntype) == MS_ABI)
+	{
+	  size = int_size_in_bytes (type);
 
-  size = int_size_in_bytes (type);
+	  /* __m128 is returned in xmm0.  */
+	  if ((!type || VECTOR_INTEGER_TYPE_P (type)
+	       || INTEGRAL_TYPE_P (type)
+	       || VECTOR_FLOAT_TYPE_P (type))
+	      && (SCALAR_INT_MODE_P (mode) || VECTOR_MODE_P (mode))
+	      && !COMPLEX_MODE_P (mode)
+	      && (GET_MODE_SIZE (mode) == 16 || size == 16))
+	    return false;
 
-  if (MS_AGGREGATE_RETURN && AGGREGATE_TYPE_P (type) && size <= 8)
-    return false;
+	  /* Otherwise, the size must be exactly in [1248]. */
+	  return size != 1 && size != 2 && size != 4 && size != 8;
+	}
+      else
+	{
+	  int needed_intregs, needed_sseregs;
 
-  if (VECTOR_MODE_P (mode) || mode == TImode)
+	  return examine_argument (mode, type, 1,
+				   &needed_intregs, &needed_sseregs);
+	}
+    }
+  else
     {
-      /* User-created vectors small enough to fit in EAX.  */
-      if (size < 8)
-	return false;
+      if (mode == BLKmode)
+	return true;
 
-      /* MMX/3dNow values are returned in MM0,
-	 except when it doesn't exits or the ABI prescribes otherwise.  */
-      if (size == 8)
-	return !TARGET_MMX || TARGET_VECT8_RETURNS;
+      size = int_size_in_bytes (type);
 
-      /* SSE values are returned in XMM0, except when it doesn't exist.  */
-      if (size
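The size rule in the MS_ABI branch of the merged ix86_return_in_memory can be modelled as a small standalone predicate. This is a hypothetical simplification: `size` stands in for int_size_in_bytes (type), and `xmm0_candidate` stands in for the type and mode checks in the diff (integral or vector type, scalar-int or vector mode, not complex). It follows the patch's new convention that true means "returned in memory".

```c
#include <stdbool.h>

/* Hypothetical model of the MS_ABI branch of ix86_return_in_memory.
   SIZE models int_size_in_bytes (type); XMM0_CANDIDATE models the
   type/mode predicates.  Returns true if the value goes to memory.  */
static bool
ms_abi_return_in_memory (long size, bool xmm0_candidate)
{
  /* __m128-like 16-byte values are returned in xmm0.  */
  if (xmm0_candidate && size == 16)
    return false;

  /* Otherwise the size must be exactly 1, 2, 4 or 8 bytes to be
     returned in a register.  */
  return size != 1 && size != 2 && size != 4 && size != 8;
}
```

For example, an 8-byte struct comes back in a register, while a 12-byte struct or a 16-byte aggregate that fails the vector/integer checks is returned in memory.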
Re: [build] Correctly detect native TLS support with 64-bit gas on Solaris/x86 (PR target/60817)
On Tue, Apr 15, 2014 at 5:21 PM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote:
As reported in the PR, gcc/configure currently fails to detect native TLS support on x86_64-*-solaris2* with a 64-bit gas, since it feeds it 32-bit TLS code. I hadn't noticed this so far since I've been using a 32-bit gas here (no idea why).

The following patch fixes this by making sure that 64-bit code is used for 64-bit-default configurations and that the necessary assembler flags are passed. I've chosen to merge the i?86 and x86_64 cases to avoid duplicating considerable amounts of code. When using the native Solaris assembler, the relocs need to be in lower case, as is already done for 32-bit.

Tested by configuring for x86_64-pc-solaris2.11 with 32-bit gas, 64-bit gas and /bin/as, i386-pc-solaris2.11 with 32-bit gas and /bin/as, x86_64-unknown-linux-gnu, and i686-unknown-linux-gnu, and checking that native TLS support is detected correctly.

Ok for mainline, or should I rather bootstrap the change on a couple of those configurations?

Thanks.
Rainer

2014-04-15  Rainer Orth  r...@cebitec.uni-bielefeld.de

	PR target/60817
	* configure.ac (set_have_as_tls): Merge i[34567]86-*-* and
	x86_64-*-* cases.
	Pass necessary as flags on 64-bit Solaris/x86.
	Use lowercase relocs for x86_64-*-*.
	* configure: Regenerate.

OK.

Thanks,
Uros.
Re: [PATCH 1/3, x86] X86 Silvermont vector cost model tune
On Tue, Apr 15, 2014 at 6:06 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
I've separated the patch into 3. The patch passes x86 bootstrap.

1st part:

2014-04-15  Evgeny Stupachenko  evstu...@gmail.com

	* config/i386/i386.c (slm_cost): Fixing vec_to_scalar_cost for
	Silvermont according latency table.

... : Adjust vec_to_scalar_cost.

	(intel_cost): Ditto.

OK for mainline with the above ChangeLog fix.

Thanks,
Uros.
Re: [PATCH 2/3, x86] X86 Silvermont vector cost model tune
On Tue, Apr 15, 2014 at 6:08 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
2nd part:

2014-04-15  Evgeny Stupachenko  evstu...@gmail.com

	* config/i386/x86-tune.def (TARGET_SLOW_PHUFFB): Target for slow byte
	shuffle on some x86 architectures.

... (X86_TUNE_SLOW_PSHUFB): New tune definition.

Typo: TARGET_SLOW_PHUFFB -> TARGET_SLOW_PSHUFB.

	* config/i386/i386.h (TARGET_SLOW_PHUFFB): Ditto.

... : New tune flag.

	* config/i386/i386.c (expand_vec_perm_even_odd_1): Avoid byte shuffles
	in architectures where they are slow (TARGET_SLOW_PHUFFB).

...: Avoid byte shuffles for TARGET_SLOW_PSHUFB.

OK for mainline with the above ChangeLog modifications.

Thanks,
Uros.
Re: [PATCH 3/3, x86] X86 Silvermont vector cost model tune
On Tue, Apr 15, 2014 at 6:12 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
3rd part:

2014-04-15  Evgeny Stupachenko  evstu...@gmail.com

	* config/i386/i386.c (x86_add_stmt_cost): Fixing vector cost model for
	Silvermont.

... : Fix vector cost ...

OK for mainline with the above ChangeLog fix.

Thanks,
Uros.
Re: [PATCH 3/3, x86] X86 Silvermont vector cost model tune
On Wed, Apr 16, 2014 at 4:31 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
For the 3rd part of the patch there was a misprint in the estimated constant. It should be 1.7 instead of 1.8.

-      retval = (retval * 18) / 10;
+      retval = (retval * 17) / 10;

Bootstrap passed.

The change is also OK.

BTW: trivial patch adjustments like this do not need re-approval. A message to the ML should be enough.

Uros.
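The corrected line is plain integer fixed-point scaling: multiplying by 17 and then dividing by 10 scales the cost by 1.7, with C integer division truncating the result for non-negative costs. A minimal sketch, in which only the expression comes from the patch and the wrapper function name is hypothetical:

```c
/* Scale a non-negative cost by 1.7 using integer arithmetic, as in
   retval = (retval * 17) / 10.  Division truncates, so for example
   3 * 1.7 = 5.1 becomes 5.  */
static int
scale_cost_by_1_7 (int retval)
{
  return (retval * 17) / 10;
}
```

Multiplying before dividing keeps the precision; writing retval / 10 * 17 would truncate first and return 0 for any cost below 10.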
Re: Remove obsolete Solaris 9 support
On Wed, Apr 16, 2014 at 1:16 PM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote:
Now that 4.9 has branched, it's time to actually remove the obsolete Solaris 9 configuration. Most of this is just legwork and falls under my Solaris maintainership. A couple of questions, though:

* Uros: I'm removing all sse_os_support() checks from the testsuite. Solaris 9 was the only consumer, so it seems best to do away with it.

This is OK, but please leave sse-os-check.h (and the corresponding sse_os_support calls) in the testsuite. Just remove the Solaris 9 specific code from sse-os-check.h and always return 1, perhaps with a comment that all currently supported OSes support SSE instructions.

Uros.
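A minimal sketch of what the suggested sse-os-check.h could reduce to. It assumes the header keeps its existing sse_os_support entry point with this signature; the previous file contents are not shown in this thread.

```c
/* All currently supported OSes support SSE instructions, so no
   OS-specific probing is needed any more.  */
static int
sse_os_support (void)
{
  return 1;
}
```

Keeping the function, rather than deleting the calls, leaves the existing test harnesses unchanged and preserves a hook in case a future OS needs a check again.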
Re: Patch ping
On Wed, Apr 16, 2014 at 11:35 PM, Jeff Law l...@redhat.com wrote: I'd like to ping 2 patches: http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00140.html - Ensure GET_MODE_{SIZE,INNER,NUNITS} (const) is constant rather than memory load after optimization (I'd like to keep the current MODE_SIZE patch for the reasons mentioned there, but also add this patch) This is fine. Per the follow-up discussion, I think you can mark it as resolving 36109 as well. http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00131.html - PR target/59617 handle gather loads for AVX512 (at least non-masked ones, masked ones will need to wait for 5.0 and we need to find how to represent it in GIMPLE) I'll leave this to Uros :-) IIRC, this patch was already committed to 4.9 some time ago. Uros.
Re: [PATCH, x86] merge movsd/movhpd pair in peephole
On Mon, Apr 21, 2014 at 8:00 PM, Wei Mi w...@google.com wrote: llvm will merge movsd/movhpd to movupd while gcc will not. The merge is beneficial on x86 machines starting from Nehalem. The patch is to add the merging in peephole. bootstrap and regression pass. Is it ok for stage1? Let's wait for a generic pass, as proposed by Bin. I think that this pass will render peephole2 approach obsolete. Uros.
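To illustrate the two forms under discussion, here is a sketch using SSE2 intrinsics from `<emmintrin.h>`: `_mm_load_sd` followed by `_mm_loadh_pd` corresponds to the movsd/movhpd pair, and `_mm_loadu_pd` to a single movupd. Both produce the same vector. The instruction comments describe the typical mapping only; the compiler is free to emit something else, and the function names are hypothetical.

```c
#include <emmintrin.h>

/* Split form: load the low double (upper lane zeroed), then fill the
   upper lane from the next element.  Typically movsd + movhpd.  */
static __m128d
load_pair_split (const double *p)
{
  __m128d v = _mm_load_sd (p);
  return _mm_loadh_pd (v, p + 1);
}

/* Merged form: one unaligned 128-bit load.  Typically movupd.  */
static __m128d
load_pair_merged (const double *p)
{
  return _mm_loadu_pd (p);
}
```

Since the two functions are value-equivalent for any `p`, the question in the thread is only which encoding is faster on a given microarchitecture.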
[PATCH, i386]: Fix PR/60909, ICE with -mrdrnd and __builtin_ia32_rdrand32_step
Hello! Attached patch fixes PR 60909, where a memory operand was used as the target RTX of a CMOVE insn, leading to an unrecognized insn. A similar problem was found with the rdseed insn, where a memory operand was used as an invalid target of a ZERO_EXTEND insn. Attached patch fixes both occurrences. 2014-04-21 Uros Bizjak ubiz...@gmail.com PR target/60909 * config/i386/i386.c (ix86_expand_builtin) <case IX86_BUILTIN_RDRAND{16,32,64}_STEP>: Use temporary register for target RTX. <case IX86_BUILTIN_RDSEED{16,32,64}_STEP>: Ditto. Testsuite/ChangeLog: 2014-04-21 Uros Bizjak ubiz...@gmail.com PR target/60909 * gcc.target/i386/pr60909-1.c: New test. * gcc.target/i386/pr60909-2.c: Ditto. Patch was committed to mainline and will be committed to other release branches after 4.9 is released. Uros. Index: config/i386/i386.c === --- config/i386/i386.c (revision 209544) +++ config/i386/i386.c (working copy) @@ -35400,7 +35400,8 @@ rdrand_step: else op2 = gen_rtx_SUBREG (SImode, op0, 0); - if (target == 0) + if (target == 0 + || !register_operand (target, SImode)) target = gen_reg_rtx (SImode); pat = gen_rtx_GEU (VOIDmode, gen_rtx_REG (CCCmode, FLAGS_REG), @@ -35442,7 +35443,8 @@ rdseed_step: const0_rtx); emit_insn (gen_rtx_SET (VOIDmode, op2, pat)); - if (target == 0) + if (target == 0 + || !register_operand (target, SImode)) target = gen_reg_rtx (SImode); emit_insn (gen_zero_extendqisi2 (target, op2)); Index: testsuite/gcc.target/i386/pr60909-1.c === --- testsuite/gcc.target/i386/pr60909-1.c (revision 0) +++ testsuite/gcc.target/i386/pr60909-1.c (working copy) @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-mrdrnd" } */ + +extern void bar (int); + +void +foo (unsigned *u) +{ + int i = __builtin_ia32_rdrand32_step (u); + bar (i); +} Index: testsuite/gcc.target/i386/pr60909-2.c === --- testsuite/gcc.target/i386/pr60909-2.c (revision 0) +++ testsuite/gcc.target/i386/pr60909-2.c (working copy) @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-mrdseed" } */ + +extern void bar (int); + +void +foo (unsigned *u) +{ + int i = __builtin_ia32_rdseed_si_step (u); + bar (i); +}
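For context, user code normally reaches these builtins through the `_rdrand32_step` intrinsic, which stores a random value and returns 1 when the hardware delivered one (carry flag set), or 0 otherwise; that 0/1 result is the value the rdrand_step expansion computes from the flags. A hedged usage sketch, assuming an x86 target and a compiler that accepts `__attribute__ ((target ("rdrnd")))`. The function names and the retry count of 10 are illustrative, not taken from the patch.

```c
#include <cpuid.h>
#include <immintrin.h>

/* CPUID.01H:ECX bit 30 reports RDRAND support.  */
static int
cpu_has_rdrand (void)
{
  unsigned int eax, ebx, ecx, edx;

  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return 0;
  return (ecx >> 30) & 1;
}

/* RDRAND may fail transiently (carry flag clear), in which case
   _rdrand32_step returns 0, so retry a bounded number of times.  */
static int __attribute__ ((target ("rdrnd")))
rdrand32_retry (unsigned int *out, int retries)
{
  while (retries-- > 0)
    if (_rdrand32_step (out))
      return 1;
  return 0;
}

/* Returns 1 and stores a value on success; 0 if RDRAND is missing
   or kept failing.  */
static int
get_random_u32 (unsigned int *out)
{
  return cpu_has_rdrand () && rdrand32_retry (out, 10);
}
```

The run-time CPUID check keeps the RDRAND instruction from executing on processors that lack it, even though the function itself is compiled with the `rdrnd` target enabled.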
Re: Remove obsolete Solaris 9 support
On Tue, Apr 22, 2014 at 2:35 PM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote: Uros Bizjak ubiz...@gmail.com writes: On Wed, Apr 16, 2014 at 1:16 PM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote: Now that 4.9 has branched, it's time to actually remove the obsolete Solaris 9 configuration. Most of this is just legwork and falls under my Solaris maintainership. A couple of questions, though: * Uros: I'm removing all sse_os_support() checks from the testsuite. Solaris 9 was the only consumer, so it seems best to do away with it. This is OK, but please leave sse-os-check.h (and corresponding sse_os_support calls) in the testsuite. Just remove the Solaris 9 specific code from sse-os-check.h and always return 1, perhaps with the comment that all currently supported OSes support SSE instructions. Here's the final patch I've checked in, incorporating all review comments. I've left out the libgo (already checked in by Ian) and classpath parts. It looks to me like one part was left in libgcc/config/i386/crtfastmath.c: #if !defined __x86_64__ && defined __sun__ && defined __svr4__ #include <signal.h> #include <ucontext.h> ... #endif
Re: [i386] define __SIZEOF_FLOAT128__
On Thu, Apr 24, 2014 at 7:35 AM, Marc Glisse marc.gli...@inria.fr wrote: (Adding an i386 maintainer in Cc) http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00620.html On Sun, 13 Apr 2014, Marc Glisse wrote: Hello, some people like having a macro to test if a type is available (__SIZEOF_INT128__ for instance). This adds macros for __float80 and __float128. The types seem to be always available, so I didn't add any condition. If you think this is a bad idea, please close the PR. Bootstrap+testsuite on x86_64-linux-gnu. 2014-04-13 Marc Glisse marc.gli...@inria.fr PR preprocessor/56540 * config/i386/i386-c.c (ix86_target_macros): Define __SIZEOF_FLOAT80__ and __SIZEOF_FLOAT128__. For __SIZEOF_FLOAT80__, you should check TARGET_128BIT_LONG_DOUBLE instead of TARGET_64BIT. Good point, thanks! It now matches i386-modes.def. Is this version (same changelog) ok? A couple of extra defines won't hurt, and maybe they will be useful to someone. So, if there are no objections in the next 24h, the patch is OK for mainline. Thanks, Uros.
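A hedged sketch of how such feature-test macros are typically consumed: code checks for the macro instead of hard-coding a target list. The function names are hypothetical. With GCC on x86 both macros are expected once this patch is in, while other compilers may define only one or neither, hence the fallbacks.

```c
#include <stddef.h>

/* Size of __float128 if the compiler advertises the type, else 0.  */
static size_t
float128_size (void)
{
#ifdef __SIZEOF_FLOAT128__
  _Static_assert (sizeof (__float128) == __SIZEOF_FLOAT128__,
                  "__SIZEOF_FLOAT128__ must match the type");
  return sizeof (__float128);
#else
  return 0;
#endif
}

/* Size of __float80 if advertised, else 0.  Per the review, this
   follows TARGET_128BIT_LONG_DOUBLE: typically 16 bytes on x86-64
   and 12 on ia32.  */
static size_t
float80_size (void)
{
#ifdef __SIZEOF_FLOAT80__
  return sizeof (__float80);
#else
  return 0;
#endif
}
```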
[PATCH, testsuite]: Require vect_simd_clones effective target for c-c++-common/gomp/pr60823-2.c
Hello! This is a runtime test, so check that we are at least able to compile the source. 2014-04-25 Uros Bizjak ubiz...@gmail.com * c-c++-common/gomp/pr60823-2.c: Require effective target vect_simd_clones. Tested on x86_64 CentOS 5.10. OK for mainline? Uros. Index: c-c++-common/gomp/pr60823-2.c === --- c-c++-common/gomp/pr60823-2.c (revision 209778) +++ c-c++-common/gomp/pr60823-2.c (working copy) @@ -1,5 +1,6 @@ /* PR tree-optimization/60823 */ /* { dg-do run } */ +/* { dg-require-effective-target vect_simd_clones } */ /* { dg-options "-O2 -fopenmp-simd" } */ #pragma omp declare simd simdlen(4) notinbranch
Re: [COMMITTED] Fix debug/60438 -- i686 stack vs fp operations
On Sat, Apr 26, 2014 at 11:27 AM, Tom de Vries tom_devr...@mentor.com wrote: On 13-03-14 21:49, Richard Henderson wrote: (define_expand "ldexpxf3" - [(set (match_dup 3) - (float:XF (match_operand:SI 2 "register_operand"))) - (parallel [(set (match_operand:XF 0 "register_operand") - (unspec:XF [(match_operand:XF 1 "register_operand") - (match_dup 3)] - UNSPEC_FSCALE_FRACT)) - (set (match_dup 4) - (unspec:XF [(match_dup 1) (match_dup 3)] - UNSPEC_FSCALE_EXP))])] + [(match_operand:XF 0 "register_operand") + (match_operand:XF 1 "register_operand") + (match_operand:SI 2 "register_operand")] "TARGET_USE_FANCY_MATH_387 && flag_unsafe_math_optimizations" { @@ -14808,6 +14633,11 @@ operands[3] = gen_reg_rtx (XFmode); operands[4] = gen_reg_rtx (XFmode); + + emit_insn (gen_floatsixf2 (operands[3], operands[2])); + emit_insn (gen_fscalexf4_i387 (operands[0], operands[4], + operands[1], operands[3])); + DONE; }) Richard, For a non-bootstrap x86_64 build, gcc.dg/builtins-34.c fails for me with a sigsegv. I've traced it back to this code in insn-emit.c: ... rtx gen_ldexpxf3 (rtx operand0, rtx operand1, rtx operand2) { rtx _val = 0; start_sequence (); { rtx operands[3]; operands[0] = operand0; operands[1] = operand1; operands[2] = operand2; { if (optimize_insn_for_size_p ()) FAIL; operands[3] = gen_reg_rtx (XFmode); operands[4] = gen_reg_rtx (XFmode); ... operands is declared with size 3, and operands[3,4] accesses are out of bounds. I've done a minimal build with attached patch, and reran the test-case, which passes now. OK if bootstrap succeeds? 2014-04-26 Tom de Vries t...@codesourcery.com * config/i386/i386.md (define_expand "ldexpxf3"): Fix out-of-bounds array accesses. OK for mainline and 4.9 branch. Thanks, Uros.
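For context, `ldexpxf3` is the expander behind `ldexpl` on XFmode (the x87 80-bit `long double`): it converts the integer exponent to XFmode and uses fscale to compute x * 2^n, and it is only enabled under `TARGET_USE_FANCY_MATH_387 && flag_unsafe_math_optimizations`. A small sketch of the semantics the expansion must preserve, using the standard C `ldexpl` (the wrapper name is hypothetical):

```c
#include <math.h>

/* x * 2^n; exact whenever the result is representable, because only
   the exponent changes.  */
static long double
scale_by_pow2 (long double x, int n)
{
  return ldexpl (x, n);
}
```

For example, `scale_by_pow2 (1.5L, 3)` is 12.0L and `scale_by_pow2 (-0.75L, -2)` is -0.1875L.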
[PATCH, testsuite]: Clean dump files
Hello! 2014-04-26 Uros Bizjak ubiz...@gmail.com * gcc.dg/tree-ssa/alias-30.c (dg-options): Dump only fre1 details. * gcc.dg/vect/pr60505.c: Cleanup vect tree dump. * g++.dg/ipa/devirt-27.C (dg-options): Remove -fdump-ipa-devirt. Committed to mainline and 4.9 branch. Uros. Index: gcc.dg/tree-ssa/alias-30.c === --- gcc.dg/tree-ssa/alias-30.c (revision 209806) +++ gcc.dg/tree-ssa/alias-30.c (working copy) @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O -fdump-tree-fre-details" } */ +/* { dg-options "-O -fdump-tree-fre1-details" } */ extern int posix_memalign(void **memptr, __SIZE_TYPE__ alignment, __SIZE_TYPE__ size); Index: gcc.dg/vect/pr60505.c === --- gcc.dg/vect/pr60505.c (revision 209806) +++ gcc.dg/vect/pr60505.c (working copy) @@ -10,3 +10,5 @@ out[i] = (ovec[i] = in[i]); out[num] = ovec[num/2]; } + +/* { dg-final { cleanup-tree-dump "vect" } } */ Index: g++.dg/ipa/devirt-27.C === --- g++.dg/ipa/devirt-27.C (revision 209806) +++ g++.dg/ipa/devirt-27.C (working copy) @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -fdump-ipa-devirt -fdump-tree-optimized" } */ +/* { dg-options "-O3 -fdump-tree-optimized" } */ struct A { int a;
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
On Thu, May 1, 2014 at 6:42 AM, Wei Mi w...@google.com wrote: Ping. Is pr58066-3.patch or pr58066-4.patch ok for trunk? None of these patches have correct ChangeLog entries. Please follow the rules, outlined in http://gcc.gnu.org/contribute.html (Submitting Patches section), otherwise your patches will be simply ignored. I attached the patch which combined your two patches and the fix in legitimize_tls_address. I tried pr58066.c and c.i in ia32/x32/x86_64, the code looked fine. Do you think it is ok? Thanks, Wei. Either pr58066-3.patch or pr58066-4.patch looks good to me. pr58066-4 patch is definitely not OK. I wonder how it works at all, since you can't split the insn to the same pattern. The generic code detects this condition and forces an ICE (IIRC: this is the reason for the UNSPEC_DIV_ALREADY_SPLIT tag in divmod<mode>4_1). From pr58066-3 patch: -;; Local dynamic of a single variable is a lose. Show combine how -;; to convert that back to global dynamic. - -(define_insn_and_split "*tls_local_dynamic_32_once" - [(set (match_operand:SI 0 "register_operand" "=a") -(plus:SI - (unspec:SI [(match_operand:SI 1 "register_operand" "b") - (match_operand 2 "constant_call_address_operand" "z")] -UNSPEC_TLS_LD_BASE) - (const:SI (unspec:SI -[(match_operand 3 "tls_symbolic_operand")] -UNSPEC_DTPOFF)))) - (clobber (match_scratch:SI 4 "=d")) - (clobber (match_scratch:SI 5 "=c")) - (clobber (reg:CC FLAGS_REG))] - "" - "#" - "" - [(parallel - [(set (match_dup 0) - (unspec:SI [(match_dup 1) (match_dup 3) (match_dup 2)] - UNSPEC_TLS_GD)) - (clobber (match_dup 4)) - (clobber (match_dup 5)) - (clobber (reg:CC FLAGS_REG))])]) Why did you remove this splitter? Please do not write: +{ + ix86_tls_descriptor_calls_expanded_in_cfun = true; +}) but use a short form: + "ix86_tls_descriptor_calls_expanded_in_cfun = true;") Please also add a testcase (from one of the previous mails): --- testsuite/gcc.dg/pr58066.c (revision 0) +++ testsuite/gcc.dg/pr58066.c (revision 0) Put this test to gcc.target/i386 directory ...
@@ -0,0 +1,18 @@ +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && { ! ia32 } } } } */ ... to avoid target selector. +/* { dg-options "-fPIC -O2" } */ + +/* Check whether the stack frame starting addresses of tls expanded calls + in foo and goo are 16bytes aligned. */ +static __thread char ccc1; +void* foo() +{ + return &ccc1; +} + +__thread char ccc2; +void* goo() +{ + return &ccc2; +} + +/* { dg-final { scan-assembler-times ".cfi_def_cfa_offset 16" 2 } } */ Please repost the complete patch with a proper ChangeLog. Uros.
Re: Fix various x86 tests for --with-arch=bdver3 --with-cpu=bdver3
On Mon, May 5, 2014 at 6:44 PM, Joseph S. Myers jos...@codesourcery.com wrote: These are due to TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL tuning flag. Currently, this flag applies to all vector sizes (128, 256 and 512 bits), but I guess it is effective only for 128 bit sizes. Can you please review usage of this flag in i386/sse.md? Indeed, the optimization as described in http://gcc.gnu.org/ml/gcc-patches/2010-04/msg01464.html is purely about reducing code size, and is irrelevant in VEX-prefixed cases. Thus, this patch adds <MODE_SIZE> == 16 conditionals in relevant cases (some cases already had such conditionals or otherwise wouldn't be used for larger vectors). Tested with no regressions for x86_64-linux-gnu (--with-arch=bdver3 --with-cpu=bdver3, where it fixes most of the remaining scan-assembler test failures). OK to commit? 2014-05-05 Joseph Myers jos...@codesourcery.com * config/i386/sse.md (*mov<mode>_internal) (*<sse>_loadu<ssemodesuffix><avxsizesuffix><mask_name>) (*<sse2_avx_avx512f>_loaddqu<mode><mask_name>) (<sse>_andnot<mode>3, <code><mode>3, *andnot<mode>3) (*<code><mode>3, *andnot<mode>3<mask_name>) (<mask_codefor><code><mode>3<mask_name>): Only consider TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL for modes of size 16. This is OK for mainline with a slight change below. Index: gcc/config/i386/sse.md === --- gcc/config/i386/sse.md (revision 209980) +++ gcc/config/i386/sse.md (working copy) @@ -758,7 +758,8 @@ [(set_attr "type" "sselog1,ssemov,ssemov") (set_attr "prefix" "maybe_vex") (set (attr "mode") - (cond [(match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL") + (cond [(and (match_test "<MODE_SIZE> == 16") + (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")) (const_string "<ssePSmode>") (and (match_test "<MODE_SIZE> == 16") (and (eq_attr "alternative" "2") Please merge the changed first and the second conditional to: (cond [(and (match_test "<MODE_SIZE> == 16") (ior (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL") (and (eq_attr "alternative" "2") (match_test "TARGET_SSE_TYPELESS_STORES")))) (const_string "<ssePSmode>") Thanks, Uros.
Fwd: [PATCH, alpha]: Fix PR61092, wide-int merge broke alpha bootstrap
Hello! Wide-int merge triggered following ICE: In file included from ../../gcc-svn/trunk/gcc/wide-int.cc:37:0: ../../gcc-svn/trunk/gcc/wide-int.cc: In function ‘unsigned int wi::mul_internal(long int*, const long int*, unsigned int, const long int*, unsigned int, unsigned int, signop, bool*, bool)’: ../../gcc-svn/trunk/gcc/../include/longlong.h:145:10: sorry, unimplemented: unexpected AST of kind mult_highpart_expr (ph) = __builtin_alpha_umulh (__m0, __m1);\ ^ ../../gcc-svn/trunk/gcc/wide-int.cc:1269:4: note: in expansion of macro ‘umul_ppmm’ umul_ppmm (val[1], val[0], op1.ulow (), op2.ulow ()); ^ ../../gcc-svn/trunk/gcc/../include/longlong.h:145:10: internal compiler error: in potential_constant_expression_1, at cp/semantics.c:10575 (ph) = __builtin_alpha_umulh (__m0, __m1);\ ^ ../../gcc-svn/trunk/gcc/wide-int.cc:1269:4: note: in expansion of macro ‘umul_ppmm’ umul_ppmm (val[1], val[0], op1.ulow (), op2.ulow ()); ^ As instructed by Jakub, target builtins should be folded during gimplification. 2014-05-08 Uros Bizjak ubiz...@gmail.com PR target/61092 * config/alpha/alpha.c: Include gimple-iterator.h. (alpha_gimple_fold_builtin): New function. Move ALPHA_BUILTIN_UMULH folding from ... (alpha_fold_builtin): ... here. (TARGET_GIMPLE_FOLD_BUILTIN): New define. Patch was bootstrapped and regression tested on alphaev68-pc-linux-gnu. If there are no objections, I will commit the patch to mainline and 4.9. Uros. Index: config/alpha/alpha.c === --- config/alpha/alpha.c(revision 210120) +++ config/alpha/alpha.c(working copy) @@ -62,6 +62,7 @@ along with GCC; see the file COPYING3. 
If not see #include "gimple-expr.h" #include "is-a.h" #include "gimple.h" +#include "gimple-iterator.h" #include "gimplify.h" #include "gimple-ssa.h" #include "stringpool.h" @@ -7042,9 +7043,6 @@ alpha_fold_builtin (tree fndecl, int n_args, tree case ALPHA_BUILTIN_MSKQH: return alpha_fold_builtin_mskxx (op, opint, op_const, 0xff, true); -case ALPHA_BUILTIN_UMULH: - return fold_build2 (MULT_HIGHPART_EXPR, alpha_dimode_u, op[0], op[1]); - case ALPHA_BUILTIN_ZAP: opint[1] ^= 0xff; /* FALLTHRU */ @@ -7094,6 +7092,49 @@ alpha_fold_builtin (tree fndecl, int n_args, tree return NULL; } } + +bool +alpha_gimple_fold_builtin (gimple_stmt_iterator *gsi) +{ + bool changed = false; + gimple stmt = gsi_stmt (*gsi); + tree call = gimple_call_fn (stmt); + gimple new_stmt = NULL; + + if (call) +{ + tree fndecl = gimple_call_fndecl (stmt); + + if (fndecl) + { + tree arg0, arg1; + + switch (DECL_FUNCTION_CODE (fndecl)) + { + case ALPHA_BUILTIN_UMULH: + arg0 = gimple_call_arg (stmt, 0); + arg1 = gimple_call_arg (stmt, 1); + + new_stmt + = gimple_build_assign_with_ops (MULT_HIGHPART_EXPR, + gimple_call_lhs (stmt), + arg0, + arg1); + break; + default: + break; + } + } +} + + if (new_stmt) +{ + gsi_replace (gsi, new_stmt, true); + changed = true; +} + + return changed; +} /* This page contains routines that are used to determine what the function prologue and epilogue code will do and write them out. */ @@ -9790,6 +9831,8 @@ alpha_canonicalize_comparison (int *code, rtx *op0 #define TARGET_EXPAND_BUILTIN alpha_expand_builtin #undef TARGET_FOLD_BUILTIN #define TARGET_FOLD_BUILTIN alpha_fold_builtin +#undef TARGET_GIMPLE_FOLD_BUILTIN +#define TARGET_GIMPLE_FOLD_BUILTIN alpha_gimple_fold_builtin #undef TARGET_FUNCTION_OK_FOR_SIBCALL #define TARGET_FUNCTION_OK_FOR_SIBCALL alpha_function_ok_for_sibcall
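As background, `__builtin_alpha_umulh` returns the high 64 bits of the full 128-bit unsigned product. That is exactly what `MULT_HIGHPART_EXPR` models and what `umul_ppmm` in longlong.h needs. A hedged C sketch of that value, once with the GCC/Clang `unsigned __int128` extension (64-bit hosts only) and once with 32-bit halves; the function names are hypothetical:

```c
#include <stdint.h>

/* High half of the 128-bit product via the __int128 extension.  */
static uint64_t
umulh_int128 (uint64_t a, uint64_t b)
{
  return (uint64_t) (((unsigned __int128) a * b) >> 64);
}

/* Portable variant: schoolbook multiplication on 32-bit halves.  */
static uint64_t
umulh_portable (uint64_t a, uint64_t b)
{
  uint64_t a_lo = (uint32_t) a, a_hi = a >> 32;
  uint64_t b_lo = (uint32_t) b, b_hi = b >> 32;

  uint64_t lo_lo = a_lo * b_lo;
  uint64_t hi_lo = a_hi * b_lo;
  uint64_t lo_hi = a_lo * b_hi;
  uint64_t hi_hi = a_hi * b_hi;

  /* Middle column: carry out of the low product plus the low half of
     one cross term plus the other cross term.  This cannot overflow
     64 bits.  */
  uint64_t cross = (lo_lo >> 32) + (uint32_t) hi_lo + lo_hi;
  return hi_hi + (hi_lo >> 32) + (cross >> 32);
}
```

For instance, (2^64 - 1)^2 = 2^128 - 2^65 + 1, whose high half is 2^64 - 2, i.e. 0xFFFFFFFFFFFFFFFE.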
Re: Fix some tests for TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL
On Thu, May 8, 2014 at 3:10 AM, Joseph S. Myers jos...@codesourcery.com wrote: Having fixed TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL to apply only to 128-bit vectors, some --with-arch=bdver3 --with-cpu=bdver3 scan-assembler failures relating to that tuning remain, because of different choices of instructions for 128-bit vectors from the choices expected by the tests. This patch fixes affected tests to allow the different instruction choices seen in this case. Tested for x86_64-linux-gnu (--with-arch=bdver3 --with-cpu=bdver3). OK to commit? OK. Thanks, Uros.
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
On Thu, May 8, 2014 at 12:59 AM, Wei Mi w...@google.com wrote: The calls added in the templates of tls_local_dynamic_base_32 and tls_global_dynamic_32 in pr58066-3.patch are used to prevent sched2 from moving sp setting across implicit tls calls, but those calls make the combine of UNSPEC_TLS_LD_BASE and UNSPEC_DTPOFF difficult, so that the optimization in tls_local_dynamic_32_once to convert local_dynamic to global_dynamic mode for single tls reference cannot take effect. In the updated patch, I remove those calls from insn templates and add reg:SI SP_REG explicitly in the templates of UNSPEC_TLS_GD and UNSPEC_TLS_LD_BASE. It solves the sched2 and combine problems above, and now the optimization in tls_local_dynamic_32_once works. Bootstrapped OK on x86_64-linux-gnu; regression testing is in progress. Is it OK if regression passes? Please update ChangeLog with all changes, see below: ChangeLog: gcc/ 2014-05-07 Wei Mi w...@google.com * config/i386/i386.c (ix86_compute_frame_layout): preferred_stack_boundary updated for tls expanded call. (...): Update preferred_stack_boundary for call, expanded from tls descriptor. * config/i386/i386.md: Set ix86_tls_descriptor_calls_expanded_in_cfun. * config/i386/i386.md (*tls_global_dynamic_32_gnu): Depend on SP register. (*tls_local_dynamic_base_32_gnu): Ditto. ... (tls_global_dynamic_32): Set ix86_tls_descriptor_calls_expanded_in_cfun. Update RTX to depend on SP register. (tls_local_dynamic_base_32): Ditto. ... The patch is OK for mainline with updated and complete ChangeLog entry. Thanks, Uros.
[PATCH, i386]: Fix PR59952, -march=core-avx2 should not enable RTM
Hello! Apparently, not all Haswell processors have TSX. Attached patch removes PTA_RTM from default Haswell flags. PTA_HLE still makes sense for Haswell processors, since the prefix is ignored on non-TSX processors. 2014-05-08 Uros Bizjak ubiz...@gmail.com PR target/59952 * config/i386/i386.c (PTA_HASWELL): Remove PTA_RTM. Bootstrapped and regression tested on x86_64-pc-linux-gnu. The patch is committed to mainline and will be backported to all relevant release branches. Uros. Index: config/i386/i386.c === --- config/i386/i386.c (revision 210231) +++ config/i386/i386.c (working copy) @@ -3130,7 +3130,7 @@ ix86_option_override_internal (bool main_args_p, (PTA_SANDYBRIDGE | PTA_FSGSBASE | PTA_RDRND | PTA_F16C) #define PTA_HASWELL \ (PTA_IVYBRIDGE | PTA_AVX2 | PTA_BMI | PTA_BMI2 | PTA_LZCNT \ - | PTA_FMA | PTA_MOVBE | PTA_RTM | PTA_HLE) + | PTA_FMA | PTA_MOVBE | PTA_HLE) #define PTA_BROADWELL \ (PTA_HASWELL | PTA_ADX | PTA_PRFCHW | PTA_RDSEED) #define PTA_BONNELL \
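Because RTM can no longer be assumed from -march=haswell, code that wants transactional memory should check at run time. A minimal sketch using GCC's `<cpuid.h>` helpers; the bit positions (CPUID leaf 7, subleaf 0, EBX bit 11 for RTM and bit 4 for HLE) are from Intel's CPUID documentation, and the function names are hypothetical:

```c
#include <cpuid.h>

/* Reads CPUID.(EAX=07H,ECX=0):EBX, or returns 0 if leaf 7 is absent.  */
static unsigned int
cpuid7_ebx (void)
{
  unsigned int eax, ebx, ecx, edx;

  if (__get_cpuid_max (0, 0) < 7)
    return 0;
  __cpuid_count (7, 0, eax, ebx, ecx, edx);
  return ebx;
}

/* RTM (XBEGIN/XEND/XABORT) is EBX bit 11.  */
static int
cpu_has_rtm (void)
{
  return (cpuid7_ebx () >> 11) & 1;
}

/* HLE (XACQUIRE/XRELEASE prefixes) is EBX bit 4.  */
static int
cpu_has_hle (void)
{
  return (cpuid7_ebx () >> 4) & 1;
}
```

The asymmetry matches the patch: HLE prefixes degrade to plain locking on processors without TSX, whereas executing XBEGIN without RTM support raises #UD, so only RTM needs the run-time guard.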
Re: [PATCH][x86] Support clflushopt, xsaves, xsavec.
On Mon, May 12, 2014 at 3:25 PM, Ilya Tocar tocarip.in...@gmail.com wrote: This patch adds support for xsavec, xsaves ISA extensions, introduced in [1], and clflushopt introduced in [2]. [1]http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html [2]http://software.intel.com/en-us/file/319433-018pdf Bootstraps, passes make-check. Please also add new options to g++.dg/other/i386-{2,3}.C and gcc.target/i386/sse-{14,15,22,23}.c. Uros.
Re: [PATCH][x86] Support clflushopt, xsaves, xsavec.
On Tue, May 13, 2014 at 11:18 AM, Ilya Tocar tocarip.in...@gmail.com wrote: This patch adds support for xsavec, xsaves ISA extensions, introduced in [1], and clflushopt introduced in [2]. [1]http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html [2]http://software.intel.com/en-us/file/319433-018pdf Bootstraps, passes make-check. Please also add new options to g++.dg/other/i386-{2,3}.C and gcc.target/i386/sse-{14,15,22,23}.c. Uros. Done. Looks like sse-15 doesn't need new options, I've assumed sse-12/13. Yes, you are right. ChangeLog: 2014-05-12 Ilya Tocar ilya.to...@intel.com * common/config/i386/i386-common.c (OPTION_MASK_ISA_CLFLUSHOPT_SET): Define. (OPTION_MASK_ISA_XSAVES_SET): Ditto. (OPTION_MASK_ISA_XSAVEC_SET): Ditto. (OPTION_MASK_ISA_CLFLUSHOPT_UNSET): Ditto. (OPTION_MASK_ISA_XSAVES_UNSET): Ditto. (OPTION_MASK_ISA_XSAVEC_UNSET): Ditto. (ix86_handle_option): Handle OPT_mxsavec, OPT_mxsaves, OPT_mclflushopt. * config.gcc (i[34567]86-*-*): Add clflushoptintrin.h, xsavecintrin.h, xsavesintrin.h. (x86_64-*-*): Ditto. * config/i386/clflushoptintrin.h: New. * config/i386/xsavecintrin.h: Ditto. * config/i386/xsavesintrin.h: Ditto. * config/i386/cpuid.h (bit_CLFLUSHOPT): Define. (bit_XSAVES): Ditto. (bit_XSAVEC): Ditto. * config/i386/driver-i386.c (host_detect_local_cpu): Handle -mclflushopt, -mxsavec, -mxsaves, -mno-xsaves, -mno-xsavec, -mno-clflushopt. * config/i386/i386-c.c (ix86_target_macros_internal): Handle OPTION_MASK_ISA_CLFLUSHOPT, OPTION_MASK_ISA_XSAVEC, OPTION_MASK_ISA_XSAVES. * config/i386/i386.c (ix86_target_string): Handle -mclflushopt, -mxsavec, -mxsaves. (PTA_CLFLUSHOPT): Define. (PTA_XSAVEC): Ditto. (PTA_XSAVES): Ditto. (ix86_option_override_internal): Handle new options. (ix86_valid_target_attribute_inner_p): Ditto. (ix86_builtins): Add IX86_BUILTIN_XSAVEC, IX86_BUILTIN_XSAVEC64, IX86_BUILTIN_XSAVES, IX86_BUILTIN_XRSTORS, IX86_BUILTIN_XSAVES64, IX86_BUILTIN_XRSTORS64, IX86_BUILTIN_CLFLUSHOPT.
(bdesc_special_args): Add __builtin_ia32_xsaves, __builtin_ia32_xrstors, __builtin_ia32_xsavec, __builtin_ia32_xsaves64, __builtin_ia32_xrstors64, __builtin_ia32_xsavec64. (ix86_init_mmx_sse_builtins): Add __builtin_ia32_clflushopt. (ix86_expand_builtin): Handle new builtins. * config/i386/i386.h (TARGET_CLFLUSHOPT): Define. (TARGET_CLFLUSHOPT_P): Ditto. (TARGET_XSAVEC): Ditto. (TARGET_XSAVEC_P): Ditto. (TARGET_XSAVES): Ditto. (TARGET_XSAVES_P): Ditto. * config/i386/i386.md (ANY_XSAVE): Add UNSPECV_XSAVEC, UNSPECV_XSAVES. (ANY_XSAVE64): Add UNSPECV_XSAVEC64, UNSPECV_XSAVES64. (attr xsave): Add xsavec, xsavec64, xsaves, xsaves64. (ANY_XRSTOR): New. (ANY_XRSTOR64): Ditto. (xrstor): Ditto. (xrstor): Change into <xrstor>. (xrstor_rex64): Change into <xrstor>_rex64. (xrstor64): Change into <xrstor>64. (clflushopt): New. * config/i386/i386.opt (mclflushopt): New. (mxsavec): Ditto. (mxsaves): Ditto. * config/i386/x86intrin.h: Add clflushoptintrin.h, xsavesintrin.h, xsavecintrin.h. * doc/invoke.texi: Document new options. And for tests: 2014-05-12 Ilya Tocar ilya.to...@intel.com * gcc.target/i386/clflushopt-1.c: New. * gcc.target/i386/xsavec-1.c: Ditto. * gcc.target/i386/xsavec64-1.c: Ditto. * gcc.target/i386/xsaves-1.c: Ditto. * gcc.target/i386/xsaves64-1.c: Ditto. * gcc.target/i386/sse-12.c: Test new options. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * g++.dg/other/i386-2.C: Ditto. * g++.dg/other/i386-3.C: Ditto. This is OK for mainline. Thanks, Uros.
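The cpuid.h part of this patch adds `bit_*` macros for the three new features. A hedged sketch of the corresponding run-time detection, spelling the bit positions out from Intel's CPUID documentation instead of relying on the macro names (CLFLUSHOPT is leaf 7, subleaf 0, EBX bit 23; XSAVEC and XSAVES are leaf 0xD, subleaf 1, EAX bits 1 and 3). The struct and function names are hypothetical. Note that XSAVES/XRSTORS are supervisor-mode instructions, so user-space code can detect them but not execute them.

```c
#include <cpuid.h>

struct xsave_features
{
  int clflushopt;
  int xsavec;
  int xsaves;
};

static struct xsave_features
detect_xsave_features (void)
{
  struct xsave_features f = { 0, 0, 0 };
  unsigned int eax, ebx, ecx, edx;
  unsigned int max_leaf = __get_cpuid_max (0, 0);

  if (max_leaf >= 7)
    {
      /* CPUID.(EAX=07H,ECX=0):EBX[23] = CLFLUSHOPT.  */
      __cpuid_count (7, 0, eax, ebx, ecx, edx);
      f.clflushopt = (ebx >> 23) & 1;
    }
  if (max_leaf >= 0xd)
    {
      /* CPUID.(EAX=0DH,ECX=1):EAX[1] = XSAVEC, EAX[3] = XSAVES.  */
      __cpuid_count (0xd, 1, eax, ebx, ecx, edx);
      f.xsavec = (eax >> 1) & 1;
      f.xsaves = (eax >> 3) & 1;
    }
  return f;
}
```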
Re: [PATCH] Fix PR 60901
On Wed, May 14, 2014 at 10:57 AM, Andrey Belevantsev a...@ispras.ru wrote: This ICE comes from the ix86_dependencies_evaluation_hook code assumption that any scheduling region will be connected. This assumption is not correct in case of the outer loops pipelining of the selective scheduler as explained in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60901#c3. Trying to add dependencies between insns from the different scheduling regions results in a segfault within the dependency analyzer code. The fix is to adjust the code to account for the situation when basic block's predecessors do not belong to the same scheduling region. Bootstrapped and tested on x86-64, OK for trunk? Branches? The fix is low risk as the additional test should always be true for the regular scheduler. I don't know all scheduler details, so your opinion counts there. Let's put this fix into mainline first and after a week without problems, backport it to all release branches. gcc/ 2014-05-14 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/60901 * config/i386/i386.c (ix86_dependencies_evaluation_hook): Check that bb predecessor belongs to the same scheduling region. Adjust comment. testsuite/ 2014-05-14 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/60901 * gcc.dg/pr60901.c: New test. +/* { dg-do compile { target powerpc*-*-* ia64-*-* x86_64-*-* } } */ +/* { dg-options "-O -fselective-scheduling -fschedule-insns -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops -fno-tree-dominator-opts" } */ + As this is clearly a target bug, let's put the test in the gcc.target/i386 directory. You can remove the target selector, and the test will run for 64bit and 32bit targets automatically. Thanks, Uros.
Re: Make check_effective_target_vect_sizes_32B_16B handle -mprefer-avx128
On Sat, May 10, 2014 at 6:00 PM, Joseph S. Myers jos...@codesourcery.com wrote: In general, vectorization tests whose expectations on x86 depend on whether AVX is available should only consider AVX available if -mprefer-avx128 is not enabled. Some of the effective-target functions in target-supports.exp handle this properly, but check_effective_target_vect_sizes_32B_16B does not do so, resulting in various test failures in configurations with -mprefer-avx128. This patch makes check_effective_target_vect_sizes_32B_16B follow other functions in checking check_prefer_avx128. It fixes the following failures for x86_64-linux-gnu --with-arch=bdver3 --with-tune=bdver3. OK to commit? FAIL: gcc.dg/vect/vect-over-widen-1.c scan-tree-dump-times vect "vect_recog_over_widening_pattern: detected" 8 FAIL: gcc.dg/vect/vect-over-widen-4.c scan-tree-dump-times vect "vect_recog_over_widening_pattern: detected" 8 FAIL: gcc.dg/vect/slp-perm-9.c scan-tree-dump-times vect "vectorized 1 loops" 2 FAIL: gcc.dg/vect/slp-perm-9.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1 FAIL: gcc.dg/vect/vect-over-widen-1.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_over_widening_pattern: detected" 8 FAIL: gcc.dg/vect/vect-over-widen-4.c -flto -ffat-lto-objects scan-tree-dump-times vect "vect_recog_over_widening_pattern: detected" 8 FAIL: gcc.dg/vect/slp-perm-9.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops" 2 FAIL: gcc.dg/vect/slp-perm-9.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 1 (I still see some gcc.dg/vect/costmodel/ failures that are unchanged by this patch: costmodel-vect-31.c, costmodel-vect-68.c and costmodel-fast-math-vect-pr29925.c.) 2014-05-10 Joseph Myers jos...@codesourcery.com * lib/target-supports.exp (check_effective_target_vect_sizes_32B_16B): Return false if 128-bit AVX vectors preferred. This is OK. Thanks, Uros.
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
On Mon, May 12, 2014 at 7:38 PM, Wei Mi w...@google.com wrote: Here is a patch for the test. It contains two changes: 1. For emutls, there will be an explicit call generated at expand pass, and no stack adjustment is needed. So add /* { dg-require-effective-target tls_native } */ in the test. 2. Replace cfi_def_cfa_offset with insn sequence check. Is it ok? No, the test FAILs for 32-bit i386-pc-solaris2.11 with Sun as/ld: FAIL: gcc.target/i386/pr58066.c scan-assembler sub[^\r\n]*8[^\r\n]*sp.*call[^\r\n]*__tls_get_addr.*sub[^\r\n]*8[^\r\n]*sp.*call[^\r\n]*__tls_get_addr The TLS code sequence is different here: subl $8, %esp leal ccc1@tlsgd(,%ebx,1), %eax call ccc1@tlsgdplt I fear this insn scanning is going to be extremely fragile. Rainer Thanks for trying the testcase. rtl scanning will be slightly better than assembly scanning. So how about this one? This is OK, with a small effective-target addition, as shown below. Thanks, Uros. Index: testsuite/gcc.target/i386/pr58066.c === --- testsuite/gcc.target/i386/pr58066.c (revision 210222) +++ testsuite/gcc.target/i386/pr58066.c (working copy) @@ -1,5 +1,6 @@ /* { dg-do compile } */ -/* { dg-options "-fPIC -O2" } */ +/* { dg-require-effective-target tls_native } */ Please also add /* { dg-require-effective-target fpic } */ +/* { dg-options "-fPIC -fomit-frame-pointer -O2 -fdump-rtl-final" } */ /* Check whether the stack frame starting addresses of tls expanded calls in foo and goo are 16bytes aligned. */ @@ -15,4 +16,6 @@ void* goo() return &ccc2; } -/* { dg-final { scan-assembler-times ".cfi_def_cfa_offset 16" 2 } } */ +/* { dg-final { scan-rtl-dump "Function foo.*set\[^\r\n\]*sp\\)\[\r\n\]\[^\r\n\]*plus\[^\r\n\]*sp\\)\[\r\n\]\[^\r\n\]*const_int -8.*UNSPEC_TLS.*Function goo" "final" } } */ +/* { dg-final { scan-rtl-dump "Function goo.*set\[^\r\n\]*sp\\)\[\r\n\]\[^\r\n\]*plus\[^\r\n\]*sp\\)\[\r\n\]\[^\r\n\]*const_int -8.*UNSPEC_TLS" "final" } } */ +/* { dg-final { cleanup-rtl-dump "final" } } */
Re: patch to fix PR60969
Hello! Attached patch enhances the testcase to also check for presence of MMX registers on all 32bit x86 targets. 2014-05-17 Uros Bizjak ubiz...@gmail.com * g++.dg/pr60969.C: Compile for all ilp32 x86 targets. (dg-options): Add -mfpmath=387. (dg-final): Check that no MMX registers are used. Tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline and 4.9 branch. Uros. Index: pr60969.C === --- pr60969.C (revision 210549) +++ pr60969.C (working copy) @@ -1,5 +1,5 @@ -/* { dg-do compile { target i?86-*-* } } */ -/* { dg-options "-O2 -ftree-vectorize -march=pentium4" } */ +/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && ilp32 } } } */ +/* { dg-options "-O2 -ftree-vectorize -march=pentium4 -mfpmath=387" } */ struct A { @@ -28,3 +28,5 @@ } return x; } + +/* { dg-final { scan-assembler-not "%mm" } } */
[PATCH, doc]: Improve -free description
Hello! -free is enabled by default for Alpha as well. The option is also enabled at -Os on all targets. 2014-05-17 Uros Bizjak ubiz...@gmail.com * doc/invoke.texi (free): Mention Alpha. Also enabled at -Os. Committed to 4.9 and mainline SVN. Uros. Index: invoke.texi === --- invoke.texi (revision 210549) +++ invoke.texi (working copy) @@ -7463,7 +7463,8 @@ helpful for the x86-64 architecture, which implicitly zero-extends in 64-bit registers after writing to their lower 32-bit half. -Enabled for AArch64 and x86 at levels @option{-O2}, @option{-O3}. +Enabled for Alpha, AArch64 and x86 at levels @option{-O2}, +@option{-O3}, @option{-Os}. @item -flive-range-shrinkage @opindex flive-range-shrinkage
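The documented option is -free (redundant extension elimination). A small C sketch of the property it relies on for x86-64: a 32-bit operation already leaves the upper half of the 64-bit register zero, so a following zero extension of that value adds nothing. In a function this small the combiner may already avoid the extension on its own, so treat this as an illustration of the semantics rather than a test case for the pass; the function name is hypothetical.

```c
#include <stdint.h>

/* The addition happens in 32 bits (wrapping modulo 2^32); the cast
   then zero-extends.  On x86-64 the 32-bit add has already cleared
   the upper half of the destination register, so the extension is
   redundant at the instruction level.  */
uint64_t
add_then_widen (uint32_t a, uint32_t b)
{
  return (uint64_t) (a + b);
}
```

For example, `add_then_widen (0xFFFFFFFFu, 1u)` is 0, not 0x100000000, because the wrap happens before the widening.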
[PATCH, doc]: Fix a bunch of warnings in *.texi files
Hello! Attached patch fixes: md.texi:1057: warning: node next `Constraints' in menu `Asm Labels' and in sectioning `Size of an asm' differ extend.texi:7175: warning: node `Asm Labels' is next for `Size of an asm' in sectioning but not in menu extend.texi:7175: warning: node prev `Size of an asm' in menu `Explicit Reg Vars' and in sectioning `Constraints' differ extend.texi:7197: warning: node prev `Asm Labels' in menu `Constraints' and in sectioning `Size of an asm' differ extend.texi:7245: warning: node `Size of an asm' is next for `Explicit Reg Vars' in menu but not in sectioning as seen when compiling on Fedora 20. 2014-05-17 Uros Bizjak ubiz...@gmail.com * doc/extend.texi (Size of an asm): Move node text according to its @menu entry position. Tested on x86_64-pc-linux-gnu, committed to mainline SVN. Uros. Index: doc/extend.texi === --- doc/extend.texi (revision 210549) +++ doc/extend.texi (working copy) @@ -7172,28 +7172,6 @@ @include md.texi @raisesections -@node Size of an asm -@subsection Size of an @code{asm} - -Some targets require that GCC track the size of each instruction used -in order to generate correct code. Because the final length of the -code produced by an @code{asm} statement is only known by the -assembler, GCC must make an estimate as to how big it will be. It -does this by counting the number of instructions in the pattern of the -@code{asm} and multiplying that by the length of the longest -instruction supported by that processor. (When working out the number -of instructions, it assumes that any occurrence of a newline or of -whatever statement separator character is supported by the assembler -- -typically @samp{;} --- indicates the end of an instruction.) 
- -Normally, GCC's estimate is adequate to ensure that correct -code is generated, but it is possible to confuse the compiler if you use -pseudo instructions or assembler macros that expand into multiple real -instructions, or if you use assembler directives that expand to more -space in the object file than is needed for a single instruction. -If this happens then the assembler may produce a diagnostic saying that -a label is unreachable. - @node Asm Labels @subsection Controlling Names Used in Assembler Code @cindex assembler names for identifiers @@ -7277,6 +7255,28 @@ specified for that operand in the @code{asm}.) @end itemize +@node Size of an asm +@subsection Size of an @code{asm} + +Some targets require that GCC track the size of each instruction used +in order to generate correct code. Because the final length of the +code produced by an @code{asm} statement is only known by the +assembler, GCC must make an estimate as to how big it will be. It +does this by counting the number of instructions in the pattern of the +@code{asm} and multiplying that by the length of the longest +instruction supported by that processor. (When working out the number +of instructions, it assumes that any occurrence of a newline or of +whatever statement separator character is supported by the assembler -- +typically @samp{;} --- indicates the end of an instruction.) + +Normally, GCC's estimate is adequate to ensure that correct +code is generated, but it is possible to confuse the compiler if you use +pseudo instructions or assembler macros that expand into multiple real +instructions, or if you use assembler directives that expand to more +space in the object file than is needed for a single instruction. +If this happens then the assembler may produce a diagnostic saying that +a label is unreachable. + @menu * Global Reg Vars:: * Local Reg Vars::
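The estimate described in the moved node text can be modeled in a few lines. What follows is a simplified, hypothetical sketch (asm_size_estimate is an illustrative name, not GCC's internal helper): it counts one instruction, adds one for every newline or statement-separator character, and multiplies by a target-supplied maximum instruction length. The separator and maximum length are parameters here, which is an assumption. On x86, 15 bytes is the architectural maximum instruction length.

```c
#include <stddef.h>

/* Hypothetical model of the asm size estimate: one instruction, plus
   one more for every newline or separator character in the template,
   times the length of the longest instruction on the target.  */
size_t
asm_size_estimate (const char *templ, char separator, size_t max_insn_len)
{
  size_t count;

  if (*templ == '\0')
    return 0;                 /* an empty template emits nothing */

  count = 1;
  for (const char *p = templ; *p != '\0'; p++)
    if (*p == '\n' || *p == separator)
      count++;

  return count * max_insn_len;
}
```

The model also shows why a single assembler macro that expands to many real instructions can be underestimated: it still counts as one line.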
[PATCH, libgomp doc]: Fix all libgomp.texi warnings
Hello! Attached patch fixes following libgomp.texi warnings: libgomp.texi:169: warning: multiple @menu libgomp.texi:184: warning: multiple @menu libgomp.texi:914: warning: node `omp_init_lock' is next for `omp_set_schedule' in sectioning but not in menu libgomp.texi:947: warning: node `omp_set_schedule' is prev for `omp_init_lock' in sectioning but not in menu libgomp.texi:1206: warning: node `omp_get_wtick' is next for `omp_destroy_nest_lock' in sectioning but not in menu libgomp.texi:1233: warning: node `omp_destroy_nest_lock' is prev for `omp_get_wtick' in sectioning but not in menu libgomp.texi:1431: warning: node next `OMP_NUM_THREADS' in menu `OMP_PROC_BIND' and in sectioning `OMP_PLACES' differ libgomp.texi:1451: warning: node next `OMP_PLACES' in menu `OMP_STACKSIZE' and in sectioning `OMP_PROC_BIND' differ libgomp.texi:1451: warning: node prev `OMP_PLACES' in menu `OMP_PROC_BIND' and in sectioning `OMP_NUM_THREADS' differ libgomp.texi:1493: warning: node next `OMP_PROC_BIND' in menu `OMP_PLACES' and in sectioning `OMP_SCHEDULE' differ libgomp.texi:1493: warning: node prev `OMP_PROC_BIND' in menu `OMP_NUM_THREADS' and in sectioning `OMP_PLACES' differ libgomp.texi:1520: warning: node next `OMP_SCHEDULE' in menu `OMP_THREAD_LIMIT' and in sectioning `OMP_STACKSIZE' differ libgomp.texi:1520: warning: node prev `OMP_SCHEDULE' in menu `OMP_STACKSIZE' and in sectioning `OMP_PROC_BIND' differ libgomp.texi:1541: warning: node next `OMP_STACKSIZE' in menu `OMP_SCHEDULE' and in sectioning `OMP_THREAD_LIMIT' differ libgomp.texi:1541: warning: node prev `OMP_STACKSIZE' in menu `OMP_PLACES' and in sectioning `OMP_SCHEDULE' differ libgomp.texi:1561: warning: node prev `OMP_THREAD_LIMIT' in menu `OMP_SCHEDULE' and in sectioning `OMP_STACKSIZE' differ these are seen when compiling libgomp on Fedora20. 
The menu in Runtime Library Routines now looks like this: --cut here-- 2 Runtime Library Routines ** The runtime routines described here are defined by Section 3 of the OpenMP specification in version 4.0. The routines are structured in following three parts: * Menu: Control threads, processors and the parallel environment. They have C linkage, and do not throw exceptions. * omp_get_active_level::Number of active parallel regions * omp_get_ancestor_thread_num:: Ancestor thread ID * omp_get_cancellation::Whether cancellation support is enabled ... * omp_set_nested:: Enable/disable nested parallel regions * omp_set_num_threads:: Set upper team size limit * omp_set_schedule::Set the runtime scheduling method Initialize, set, test, unset and destroy simple and nested locks. * omp_init_lock::Initialize simple lock * omp_set_lock:: Wait for and set simple lock ... * omp_unset_nest_lock:: Unset nested lock * omp_destroy_nest_lock::Destroy nested lock Portable, thread-based, wall clock timer. * omp_get_wtick::Get timer precision. * omp_get_wtime::Elapsed wall clock time. --cut here-- i.e., without the extra * Menu: entries, which is probably what was intended from the beginning. 2014-05-17 Uros Bizjak ubiz...@gmail.com * libgomp.texi (Runtime Library Routines): Remove multiple @menu. (Environment Variables): Move OMP_PROC_BIND and OMP_STACKSIZE node texts according to their @menu entry positions. Tested with x86_64-pc-linux-gnu bootstrap. OK for mainline and 4.9? Uros. Index: libgomp.texi === --- libgomp.texi(revision 210549) +++ libgomp.texi(working copy) @@ -130,10 +130,10 @@ specification in version 4.0. The routines are structured in following three parts: +@menu Control threads, processors and the parallel environment. They have C linkage, and do not throw exceptions.
-@menu * omp_get_active_level::Number of active parallel regions * omp_get_ancestor_thread_num:: Ancestor thread ID * omp_get_cancellation::Whether cancellation support is enabled @@ -162,11 +162,9 @@ * omp_set_nested:: Enable/disable nested parallel regions * omp_set_num_threads:: Set upper team size limit * omp_set_schedule::Set the runtime scheduling method -@end menu Initialize, set, test, unset and destroy simple and nested locks. -@menu * omp_init_lock::Initialize simple lock * omp_set_lock:: Wait for and set simple lock * omp_test_lock::Test and set simple lock if available @@ -177,11 +175,9 @@ * omp_test_nest_lock:: Test and set nested lock if available * omp_unset_nest_lock:: Unset nested lock * omp_destroy_nest_lock::Destroy nested lock -@end menu Portable, thread-based, wall clock timer. -@menu * omp_get_wtick::Get timer precision. * omp_get_wtime::Elapsed wall clock time. @end menu @@ -1448,6 +1444,33 @@ +@node OMP_PROC_BIND +@section @env{OMP_PROC_BIND
Re: [PATCH, doc]: Fix a bunch of warnings in *.texi files
On Sun, May 18, 2014 at 7:17 AM, David Wohlferd d...@limegreensocks.com wrote: My bad. My version of makeinfo wasn't reporting these errors. However, this isn't right either. There are two subsections that are now under Size of an asm that should be under Variables in Specified Registers. How about this (attached)? Oh, I was not aware that this is a nested @menu with its own sections. Sure, your patch is OK. I went ahead and installed it on mainline after bootstrapping it on x86_64-linux-gnu. Thanks, Uros.
[PATCH, doc]: Fix POD document had syntax errors at /usr/bin/pod2man line 69. error
Hello! Attached patch fixes the following errors in .pod document sources: gfdl.pod around line 53: Expected text after =item, not a number gfdl.pod around line 147: Expected text after =item, not a number gfdl.pod around line 165: Expected text after =item, not a number gfdl.pod around line 205: Expected text after =item, not a number gfdl.pod around line 357: Expected text after =item, not a number gfdl.pod around line 384: Expected text after =item, not a number gfdl.pod around line 400: Expected text after =item, not a number gfdl.pod around line 422: Expected text after =item, not a number gfdl.pod around line 445: Expected text after =item, not a number gfdl.pod around line 475: Expected text after =item, not a number gfdl.pod around line 499: Expected text after =item, not a number POD document had syntax errors at /usr/bin/pod2man line 69. gmake[3]: [doc/gfdl.7] Error 1 (ignored) As suggested in the fix for a similar problem [1], the solution is to put Z<> in the =item argument string, in front of the item number. 2014-05-18 Uros Bizjak ubiz...@gmail.com * texi2pod.pl: Force .pod file to not be a numbered list. The fix was tested by bootstrapping on Fedora 20 x86_64-pc-linux-gnu, and also by comparing the previous .man and .html files with the new ones. They were bit-exact. OK for mainline and 4.9? [1] http://comments.gmane.org/gmane.network.inn/9841 Uros. Index: texi2pod.pl === --- texi2pod.pl (revision 210579) +++ texi2pod.pl (working copy) @@ -1,6 +1,6 @@ #! /usr/bin/perl -w -# Copyright (C) 1999, 2000, 2001, 2003, 2010 Free Software Foundation, Inc. +# Copyright (C) 1999-2014 Free Software Foundation, Inc. # This file is part of GCC. @@ -337,7 +337,7 @@ $_ = \n=item $1\n; } } else { - $_ = \n=item $ic\n; + $_ = \n=item Z\LT;\GT;$ic\n; $ic =~ y/A-Ya-y/B-Zb-z/; $ic =~ s/(\d+)/$1 + 1/eg; }
Re: [PATCH i386 5/8] [AVX-512] Extend vectorizer hooks.
On Mon, May 19, 2014 at 6:48 AM, Jan Hubicka hubi...@ucw.cz wrote: Thanks for the pointer, there is indeed the recommendation in optimization manual [1], section 3.6.4, where it is said: --quote-- Misaligned data access can incur significant performance penalties. This is particularly true for cache line splits. The size of a cache line is 64 bytes in the Pentium 4 and other recent Intel processors, including processors based on Intel Core microarchitecture. An access to data unaligned on 64-byte boundary leads to two memory accesses and requires several µops to be executed (instead of one). Accesses that span 64-byte boundaries are likely to incur a large performance penalty, the cost of each stall generally are greater on machines with longer pipelines. ... A 64-byte or greater data structure or array should be aligned so that its base address is a multiple of 64. Sorting data in decreasing size order is one heuristic for assisting with natural alignment. As long as 16-byte boundaries (and cache lines) are never crossed, natural alignment is not strictly necessary (though it is an easy way to enforce this). --/quote-- So, this part has nothing to do with AVX512, but with cache line width. And we do have a --param l1-cache-line-size=64, detected with -march=native that could come in handy here. This part should be rewritten (and commented) with the information above in mind. Like in the patch below. Please note that the block_tune setting for the nocona is wrong, -march=native on my trusty old P4 returns: --param l1-cache-size=16 --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=nocona which is consistent with the above quote from the manual. 2014-01-02 Uros Bizjak ubiz...@gmail.com * config/i386/i386.c (ix86_data_alignment): Calculate max_align from prefetch_block tune setting. (nocona_cost): Correct size of prefetch block to 64. Uros, I am looking into libreoffice size and the data alignment seems to make huge difference.
Data section has grown from 5.8MB to 6.3MB in between GCC 4.8 and 4.9, while clang produces 5.2MB. The two patches I posted to not align vtables and RTTI reduces it to 5.7MB, but But perhaps we want to revisit the alignment rules. The optimization manuals usually care only about performance critical loops. Perhaps we can make the rules to align only bigger datastructures, or so at least for -O2. Based on the above quote, Misaligned data access can incur significant performance penalties. and the fact that this particular alignment rule has some compatibility issues with previous versions of gcc (these were later fixed by Jakub), I'd rather leave this rule as is. However, if the access is from the cold section, we can perhaps avoid extra alignment, while avoiding those compatibility issues. Uros.
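The trade-off being debated (large aggregates promoted to cache-line alignment, at the cost of data-section growth) can be illustrated with a simplified model. The sketch below is hypothetical: data_alignment and COMPAT_ALIGN are illustrative names, sizes are in bytes, and it assumes the rule raises alignment to 32 bytes for aggregates of at least 32 bytes and to the prefetch block size (64 bytes for the processors quoted) for aggregates of at least that size. The real ix86_data_alignment works in bits and has additional conditions.

```c
#include <stddef.h>

/* Hypothetical, simplified model of the data alignment rule under
   discussion.  All quantities are in bytes.  COMPAT_ALIGN models the
   older 32-byte rule; PREFETCH_BLOCK models the tuning's prefetch block
   (cache line) size.  */
enum { COMPAT_ALIGN = 32 };

size_t
data_alignment (size_t size, size_t natural_align, size_t prefetch_block)
{
  size_t align = natural_align;

  if (size >= COMPAT_ALIGN && align < COMPAT_ALIGN)
    align = COMPAT_ALIGN;
  if (size >= prefetch_block && align < prefetch_block)
    align = prefetch_block;
  return align;
}
```

Under this model, a 40-byte object with 4-byte natural alignment is placed on a 32-byte boundary, and a 64-byte one on a 64-byte boundary. Padding between many such objects is where data-section growth of the kind reported above could come from.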
Re: [PATCH i386 5/8] [AVX-512] Extend vectorizer hooks.
On Mon, May 19, 2014 at 6:42 PM, H.J. Lu hjl.to...@gmail.com wrote: Uros, I am looking into libreoffice size and the data alignment seems to make huge difference. Data section has grown from 5.8MB to 6.3MB in between GCC 4.8 and 4.9, while clang produces 5.2MB. The two patches I posted to not align vtables and RTTI reduces it to 5.7MB, but But perhaps we want to revisit the alignment rules. The optimization manuals usually care only about performance critical loops. Perhaps we can make the rules to align only bigger datastructures, or so at least for -O2. Based on the above quote, Misaligned data access can incur significant performance penalties. and the fact that this particular alignment rule has some compatibility issues with previous versions of gcc (these were later fixed by Jakub), I'd rather leave this rule as is. However, if the access is from the cold section, we can perhaps avoid extra alignment, while avoiding those compatibility issues. It is excessive to align struct foo { int x1; int x2; char x3; int x4; int x5; char x6; int x7; int x8; }; to 32 bytes and align struct foo { int x1; int x2; char x3; int x4; int x5; char x6; int x7[9]; int x8; }; to 64 bytes. What performance gain does it provide? Avoids significant performance penalties, perhaps? Uros.
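The two structures in the message sit exactly on the two thresholds, which explains why they receive 32-byte and 64-byte alignment. The following layout check assumes a typical x86 ABI where int is 4 bytes with 4-byte alignment. The structures are renamed (foo_small and foo_large are hypothetical names) so both fit in one translation unit. Their natural alignment is only 4 bytes.

```c
#include <stddef.h>

/* Layouts from the message, assuming 4-byte int with 4-byte alignment.
   Each char member is followed by 3 bytes of padding.  */
struct foo_small
{
  int x1; int x2; char x3;   /* 4 + 4 + 1 (+3 padding) */
  int x4; int x5; char x6;   /* 4 + 4 + 1 (+3 padding) */
  int x7; int x8;            /* 4 + 4 */
};                           /* 32 bytes in total */

struct foo_large
{
  int x1; int x2; char x3;
  int x4; int x5; char x6;
  int x7[9];                 /* 36 bytes instead of 4 */
  int x8;
};                           /* 64 bytes in total */

/* These fail to compile on an ABI with a different int layout, which
   is the stated assumption of this sketch.  */
_Static_assert (sizeof (struct foo_small) == 32, "expected 32 bytes");
_Static_assert (sizeof (struct foo_large) == 64, "expected 64 bytes");
```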
Re: [PATCH, doc]: Fix POD document had syntax errors at /usr/bin/pod2man line 69. error
On Sun, May 18, 2014 at 11:10 AM, Uros Bizjak ubiz...@gmail.com wrote: Attached patch fixes following errors in .pod document sources: gfdl.pod around line 53: Expected text after =item, not a number gfdl.pod around line 147: Expected text after =item, not a number gfdl.pod around line 165: Expected text after =item, not a number gfdl.pod around line 205: Expected text after =item, not a number gfdl.pod around line 357: Expected text after =item, not a number gfdl.pod around line 384: Expected text after =item, not a number gfdl.pod around line 400: Expected text after =item, not a number gfdl.pod around line 422: Expected text after =item, not a number gfdl.pod around line 445: Expected text after =item, not a number gfdl.pod around line 475: Expected text after =item, not a number gfdl.pod around line 499: Expected text after =item, not a number POD document had syntax errors at /usr/bin/pod2man line 69. gmake[3]: [doc/gfdl.7] Error 1 (ignored) As suggested in the fix for a similar problem [1], the solution is to put Z<> in the =item argument string. 2014-05-18 Uros Bizjak ubiz...@gmail.com * texi2pod.pl: Force .pod file to not be a numbered list. The fix was tested by bootstrapping on Fedora 20 x86_64-pc-linux-gnu, and also comparing previous .man and .html files with new ones. They were bit-exact. I went ahead and installed the patch on mainline. It is a trivial one-liner, and can be easily reverted if it causes trouble. Uros.
Re: [PATCH] PR 61249: fix documentation of __builtin_ia32_{vfrczss,vfrczsd,mpsadbw256}
Hello! 2014-05-26 Michael Tautschnig m...@debian.org PR target/61249 * doc/extend.texi: Fix parameter lists of __builtin_ia32_vfrczs[sd], __builtin_ia32_mpsadbw256. Thanks, I have committed the patch with a slightly changed ChangeLog to all branches. Uros.
[PATCH, libbid]: Fix variable ‘Ql’ set but not used warnings
Hello! Attached patch fixes several variable ‘Ql’ set but not used warnings in bid128_div.c and bid64_div.c libbid sources. We can simply use __mul_128x128_high functions when lowpart is not needed. 2014-05-26 Uros Bizjak ubiz...@gmail.com * bid128_div.c (BID128_FUNCTION_ARG2): Remove unused variable 'Ql'. Call __mul_128x128_high instead of __mul_128x128_full. (TYPE0_FUNCTION_ARGTYPE1_ARGTYPE2): Ditto. (BID128_FUNCTION_ARGTYPE1_ARG128): Ditto. (BID128_FUNCTION_ARG128_ARGTYPE2): Ditto. * bid64_div.c (TYPE0_FUNCTION_ARGTYPE1_ARG128): Ditto. (TYPE0_FUNCTION_ARG128_ARGTYPE2): Ditto. (TYPE0_FUNCTION_ARG128_ARG128): Ditto. Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}. HJ, should this be fixed upstream first? Uros. Index: bid128_div.c === --- bid128_div.c(revision 210927) +++ bid128_div.c(working copy) @@ -36,7 +36,7 @@ extern UINT8 packed_1_zeros[]; BID128_FUNCTION_ARG2 (bid128_div, x, y) UINT256 CA4, CA4r, P256; - UINT128 CX, CY, T128, CQ, CR, CA, TP128, Qh, Ql, res; + UINT128 CX, CY, T128, CQ, CR, CA, TP128, Qh, res; UINT64 sign_x, sign_y, T, carry64, D, Q_high, Q_low, QX, PD, valid_y; int_float fx, fy, f64; @@ -239,7 +239,7 @@ if (!CA4.w[0] !CA4.w[1]) if (d5 nzeros) nzeros = d5; // get P*(2^M[extra_digits])/10^extra_digits -__mul_128x128_full (Qh, Ql, CQ, reciprocals10_128[nzeros]); +__mul_128x128_high (Qh, CQ, reciprocals10_128[nzeros]); // now get P/10^extra_digits: shift Q_high right by M[extra_digits]-128 amount = recip_scale[nzeros]; @@ -365,7 +365,7 @@ if (!CA4.w[0] !CA4.w[1]) if (nzeros) { // get P*(2^M[extra_digits])/10^extra_digits - __mul_128x128_full (Qh, Ql, CQ, reciprocals10_128[nzeros]); + __mul_128x128_high (Qh, CQ, reciprocals10_128[nzeros]); //now get P/10^extra_digits: shift Q_high right by M[extra_digits]-128 amount = recip_scale[nzeros]; @@ -487,7 +487,7 @@ TYPE0_FUNCTION_ARGTYPE1_ARGTYPE2 (UINT128, bid128d UINT64, y) UINT256 CA4, CA4r, P256; - UINT128 CX, CY, T128, CQ, CR, CA, TP128, Qh, Ql, res; + UINT128 CX, CY, T128, CQ, 
CR, CA, TP128, Qh, res; UINT64 sign_x, sign_y, T, carry64, D, Q_high, Q_low, QX, PD, valid_y; int_float fx, fy, f64; @@ -701,7 +701,7 @@ __div_256_by_128 (CQ, CA4, CY); if (d5 nzeros) nzeros = d5; // get P*(2^M[extra_digits])/10^extra_digits - __mul_128x128_full (Qh, Ql, CQ, reciprocals10_128[nzeros]); + __mul_128x128_high (Qh, CQ, reciprocals10_128[nzeros]); //__mul_128x128_to_256(P256, CQ, reciprocals10_128[nzeros]);Qh.w[1]=P256.w[3];Qh.w[0]=P256.w[2]; // now get P/10^extra_digits: shift Q_high right by M[extra_digits]-128 @@ -829,7 +829,7 @@ __div_256_by_128 (CQ, CA4, CY); if (nzeros) { // get P*(2^M[extra_digits])/10^extra_digits - __mul_128x128_full (Qh, Ql, CQ, reciprocals10_128[nzeros]); + __mul_128x128_high (Qh, CQ, reciprocals10_128[nzeros]); // now get P/10^extra_digits: shift Q_high right by M[extra_digits]-128 amount = recip_scale[nzeros]; @@ -946,7 +946,7 @@ BID_RETURN (res); BID128_FUNCTION_ARGTYPE1_ARG128 (bid128dq_div, UINT64, x, y) UINT256 CA4, CA4r, P256; - UINT128 CX, CY, T128, CQ, CR, CA, TP128, Qh, Ql, res; + UINT128 CX, CY, T128, CQ, CR, CA, TP128, Qh, res; UINT64 sign_x, sign_y, T, carry64, D, Q_high, Q_low, QX, valid_y, PD; int_float fx, fy, f64; @@ -1155,7 +1155,7 @@ __div_256_by_128 (CQ, CA4, CY); if (d5 nzeros) nzeros = d5; // get P*(2^M[extra_digits])/10^extra_digits - __mul_128x128_full (Qh, Ql, CQ, reciprocals10_128[nzeros]); + __mul_128x128_high (Qh, CQ, reciprocals10_128[nzeros]); //__mul_128x128_to_256(P256, CQ, reciprocals10_128[nzeros]);Qh.w[1]=P256.w[3];Qh.w[0]=P256.w[2]; // now get P/10^extra_digits: shift Q_high right by M[extra_digits]-128 @@ -1285,7 +1285,7 @@ __div_256_by_128 (CQ, CA4, CY); if (nzeros) { // get P*(2^M[extra_digits])/10^extra_digits - __mul_128x128_full (Qh, Ql, CQ, reciprocals10_128[nzeros]); + __mul_128x128_high (Qh, CQ, reciprocals10_128[nzeros]); // now get P/10^extra_digits: shift Q_high right by M[extra_digits]-128 amount = recip_scale[nzeros]; @@ -1403,7 +1403,7 @@ BID_RETURN (res); 
BID128_FUNCTION_ARG128_ARGTYPE2 (bid128qd_div, x, UINT64, y) UINT256 CA4, CA4r, P256; - UINT128 CX, CY, T128, CQ, CR, CA, TP128, Qh, Ql, res; + UINT128 CX, CY, T128, CQ, CR, CA, TP128, Qh, res; UINT64 sign_x, sign_y, T, carry64, D, Q_high, Q_low, QX, PD, valid_y; int_float fx, fy, f64; @@ -1607,7 +1607,7 @@ __div_256_by_128 (CQ, CA4, CY); if (d5 nzeros) nzeros = d5; // get P*(2^M
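The patch works because the callers only use the high half of the 256-bit product, so a "high" multiply is enough and the low half (Ql) never needs to be stored. As an illustration of what such a routine computes, here is a hypothetical sketch. It is not libbid's __mul_128x128_high, whose exact definition is not shown here. u128_words and mul_128x128_high_sketch are illustrative names, and the code relies on GCC's unsigned __int128, which is available on 64-bit targets.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit value: w[0] is the low word.  */
typedef struct { uint64_t w[2]; } u128_words;

/* Return only the high 128 bits of the 256-bit product A * B.  */
u128_words
mul_128x128_high_sketch (u128_words a, u128_words b)
{
  typedef unsigned __int128 u128;
  u128 ll = (u128) a.w[0] * b.w[0];
  u128 lh = (u128) a.w[0] * b.w[1];
  u128 hl = (u128) a.w[1] * b.w[0];
  u128 hh = (u128) a.w[1] * b.w[1];

  /* Column for bits 64..127 of the product; its upper part is the
     carry into bit 128.  At most 3 * (2^64 - 1), so it cannot overflow. */
  u128 mid = (ll >> 64) + (uint64_t) lh + (uint64_t) hl;

  /* True high half is < 2^128, so this sum cannot overflow either.  */
  u128 high = hh + (lh >> 64) + (hl >> 64) + (mid >> 64);

  u128_words r = { { (uint64_t) high, (uint64_t) (high >> 64) } };
  return r;
}
```

Only the low 64 bits of ll are discarded without being stored. A "full" variant would keep them together with the low word of mid as the low 128-bit result.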
[PATCH, i386]: Fix logical 'not' error in x86_rtx_costs (PR 61271)
Hello! There is a stray ! in ix86_rtx_costs which results in an invalid bypass for LABEL_REFs. After some simplifications, the fixed condition should read: else if (flag_pic && SYMBOLIC_CONST (x) && !(TARGET_64BIT && (GET_CODE (x) == LABEL_REF || (GET_CODE (x) == SYMBOL_REF && SYMBOL_REF_LOCAL_P (x))))) *total = 1; The patch fixes the condition, but I don't think that handling of LABEL_REFs and SYMBOL_REFs is correct in the cost function at all. E.g. in x86_64_immediate_operand predicate, LABEL_REFs (and non-TLS SYMBOL_REFs) are rejected for all PIC code models, so they get cost of 3 and don't even reach this part of the function. Honza, can you perhaps check if x86_64{,_zext}_immediate operand handles PIC code models in a correct way? The trivial patch is bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline SVN. Uros. Index: i386.c === --- i386.c (revision 210937) +++ i386.c (working copy) @@ -37903,10 +37903,10 @@ ix86_rtx_costs (rtx x, int code_i, int outer_code_ else if (TARGET_64BIT && !x86_64_zext_immediate_operand (x, VOIDmode)) *total = 2; else if (flag_pic && SYMBOLIC_CONST (x) - && !(TARGET_64BIT - && (GET_CODE (x) == LABEL_REF - || (GET_CODE (x) == SYMBOL_REF - && SYMBOL_REF_LOCAL_P (x))))) + && (!TARGET_64BIT + || (!GET_CODE (x) != LABEL_REF + && (GET_CODE (x) != SYMBOL_REF + || !SYMBOL_REF_LOCAL_P (x))))) *total = 1; else *total = 0;
Re: [PATCH, i386]: Fix logical 'not' error in x86_rtx_costs (PR 61271)
On Mon, May 26, 2014 at 7:48 PM, Uros Bizjak ubiz...@gmail.com wrote: Hello! There is a stray ! in ix86_rtx_costs which results in an invalid bypass for LABEL_REFs. After some simplifications, the fixed condition should read: else if (flag_pic && SYMBOLIC_CONST (x) && !(TARGET_64BIT && (GET_CODE (x) == LABEL_REF || (GET_CODE (x) == SYMBOL_REF && SYMBOL_REF_LOCAL_P (x))))) *total = 1; The patch fixes the condition, but I don't think that handling of LABEL_REFs and SYMBOL_REFs is correct in the cost function at all. E.g. in x86_64_immediate_operand predicate, LABEL_REFs (and non-TLS SYMBOL_REFs) are rejected for all PIC code models, so they get cost of 3 and don't even reach this part of the function. Honza, can you perhaps check if x86_64{,_zext}_immediate operand handles PIC code models in a correct way? The trivial patch is bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline SVN. Eh, wrong patch was attached. And ChangeLog was missing, too. 2014-05-26 Uros Bizjak ubiz...@gmail.com PR target/61271 * config/i386/i386.c (ix86_rtx_costs) <case CONST_INT, case CONST, case LABEL_REF, case SYMBOL_REF>: Fix condition. Uros. Index: i386.c === --- i386.c (revision 210889) +++ i386.c (working copy) @@ -37903,10 +37903,10 @@ else if (TARGET_64BIT && !x86_64_zext_immediate_operand (x, VOIDmode)) *total = 2; else if (flag_pic && SYMBOLIC_CONST (x) - && (!TARGET_64BIT - || (!GET_CODE (x) != LABEL_REF - && (GET_CODE (x) != SYMBOL_REF - || !SYMBOL_REF_LOCAL_P (x))))) + && !(TARGET_64BIT + && (GET_CODE (x) == LABEL_REF + || (GET_CODE (x) == SYMBOL_REF + && SYMBOL_REF_LOCAL_P (x))))) *total = 1; else *total = 0;
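The stray ! is easy to miss because !GET_CODE (x) != LABEL_REF parses as (!GET_CODE (x)) != LABEL_REF. Since the rtx code is nonzero, !GET_CODE (x) is 0 and the comparison is always true. The stand-alone model below compares the old and fixed conditions. The enum values and function names are hypothetical stand-ins for the RTL macros, and both functions assume flag_pic and SYMBOLIC_CONST (x) already hold. A true result means the cost is 1.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for rtx codes; like real codes, all nonzero.  */
enum code { CODE_OTHER = 1, LABEL_REF_CODE, SYMBOL_REF_CODE };

/* Old condition.  (!c) is always 0 for a nonzero code, so the first
   test below is always true, mirroring !GET_CODE (x) != LABEL_REF.  */
bool
buggy_costs_1 (bool target_64bit, enum code c, bool local)
{
  return !target_64bit
         || ((!c) != LABEL_REF_CODE
             && (c != SYMBOL_REF_CODE || !local));
}

/* Fixed condition: cost 1 unless compiling for 64-bit and the operand
   is a label or a local symbol.  */
bool
fixed_costs_1 (bool target_64bit, enum code c, bool local)
{
  return !(target_64bit
           && (c == LABEL_REF_CODE
               || (c == SYMBOL_REF_CODE && local)));
}
```

Enumerating the inputs shows the two conditions disagree only for a LABEL_REF on a 64-bit target. There, the old code returned cost 1 instead of 0.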
[PATCH, testsuite]: Fix lto.exp does not support ... in secondary source files warnings
Hello! 2014-05-26 Uros Bizjak ubiz...@gmail.com * gcc.dg/lto/pr61278_1.c: Remove dg directives. Tested on x86_64-pc-linux-gnu and committed. Uros. Index: ChangeLog === --- ChangeLog (revision 210936) +++ ChangeLog (working copy) @@ -1,3 +1,7 @@ +2014-05-26 Uros Bizjak ubiz...@gmail.com + + * gcc.dg/lto/pr61278_1.c: Remove dg directives. + 2014-05-26 Jerry DeLisle jvdeli...@gcc.gnu.org PR libgfortran/55117 Index: gcc.dg/lto/pr61278_1.c === --- gcc.dg/lto/pr61278_1.c (revision 210936) +++ gcc.dg/lto/pr61278_1.c (working copy) @@ -1,6 +1,3 @@ -/* { dg-lto-do link } */ -/* { dg-lto-options { { -flto -O1 } } } */ - extern char foo (char *); char d;
[PATCH, testsuite]: Fix c-c++-common/cilk-plus/AN/pr61191.c dg-error directives.
Hello! 2014-05-26 Uros Bizjak ubiz...@gmail.com * c-c++-common/cilk-plus/AN/pr61191.c: Fix dg-error directives. Tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline SVN. Uros. Index: c-c++-common/cilk-plus/AN/pr61191.c === --- c-c++-common/cilk-plus/AN/pr61191.c (revision 210936) +++ c-c++-common/cilk-plus/AN/pr61191.c (working copy) @@ -4,7 +4,7 @@ double f(double * A, double * B) { - return __sec_reduce_add((B[0:500])(; -/* { dg-error expected expression before ';' token {target *-*-*} 7 } */ -/* { dg-error called object {target *-*-*} 7 } */ -} /* { dg-error expected } */ + return __sec_reduce_add((B[0:500])(; /* { dg-error called object { target c } } */ +/* { dg-error expected expression before ';' token { target c } 7 } */ +/* { dg-error expected primary-expression before ';' token { target c++ } 7 } */ +} /* { dg-error expected { target c } } */
[PATCH, libgomp]: Require vect_simd_clones effective target for libgomp.fortran/declare-simd-[12].f90
Hello! These tests require vect_simd_clones effective target, as the target should be able to compile AVX clones. 2014-05-27 Uros Bizjak ubiz...@gmail.com * testsuite/libgomp.fortran/declare-simd-1.f90: Require vect_simd_clones effective target. Remove dg-additional-options directives. * testsuite/libgomp.fortran/declare-simd-2.f90: Ditto. Tested on x86_64-linux-gnu CentOS 5. Uros. Index: testsuite/libgomp.fortran/declare-simd-1.f90 === --- testsuite/libgomp.fortran/declare-simd-1.f90(revision 210956) +++ testsuite/libgomp.fortran/declare-simd-1.f90(working copy) @@ -1,6 +1,5 @@ ! { dg-options -fno-inline } -! { dg-additional-options -msse2 { target sse2_runtime } } -! { dg-additional-options -mavx { target avx_runtime } } +! { dg-require-effective-target vect_simd_clones } module declare_simd_1_mod contains Index: testsuite/libgomp.fortran/declare-simd-2.f90 === --- testsuite/libgomp.fortran/declare-simd-2.f90(revision 210956) +++ testsuite/libgomp.fortran/declare-simd-2.f90(working copy) @@ -1,8 +1,7 @@ ! { dg-do run } ! { dg-options -fno-inline } - ! { dg-additional-sources declare-simd-3.f90 } -! { dg-additional-options -msse2 { target sse2_runtime } } -! { dg-additional-options -mavx { target avx_runtime } } +! { dg-additional-sources declare-simd-3.f90 } +! { dg-require-effective-target vect_simd_clones } module declare_simd_2_mod contains
Re: [PATCH, libgomp]: Require vect_simd_clones effective target for libgomp.fortran/declare-simd-[12].f90
On Tue, May 27, 2014 at 9:18 AM, Jakub Jelinek ja...@redhat.com wrote: Please don't remove the dg-additional-options there, that is completely intentional there, only the simd clones are built for SSE2/AVX/AVX2, the simd loops are built with whatever options the loop is compiled with, and for the common case (AVX or later HW, but compiler not configured to support only AVX or later) I want to test as much vectorization as possible. Requiring vect_simd_clone or the whitespace change is fine, though I'd just use ! { dg-do run { target vect_simd_clones } } instead of dg-require-effective-target. Thanks for the explanation. Following is the v2 patch that I plan to commit after testing: 2014-05-27 Uros Bizjak ubiz...@gmail.com * testsuite/libgomp.fortran/declare-simd-1.f90: Require vect_simd_clones effective target. * testsuite/libgomp.fortran/declare-simd-2.f90: Ditto. Uros. Index: testsuite/libgomp.fortran/declare-simd-1.f90 === --- testsuite/libgomp.fortran/declare-simd-1.f90(revision 210956) +++ testsuite/libgomp.fortran/declare-simd-1.f90(working copy) @@ -1,3 +1,4 @@ +! { dg-do run { target vect_simd_clones } } ! { dg-options -fno-inline } ! { dg-additional-options -msse2 { target sse2_runtime } } ! { dg-additional-options -mavx { target avx_runtime } } Index: testsuite/libgomp.fortran/declare-simd-2.f90 === --- testsuite/libgomp.fortran/declare-simd-2.f90(revision 210956) +++ testsuite/libgomp.fortran/declare-simd-2.f90(working copy) @@ -1,6 +1,6 @@ -! { dg-do run } +! { dg-do run { target vect_simd_clones } } ! { dg-options -fno-inline } - ! { dg-additional-sources declare-simd-3.f90 } +! { dg-additional-sources declare-simd-3.f90 } ! { dg-additional-options -msse2 { target sse2_runtime } } ! { dg-additional-options -mavx { target avx_runtime } }
[PATCH, fortran]: Include stdlib.h in intrinsics/getcwd.c
... to avoid the implicit declaration of function ‘free’ warning. 2014-05-27 Uros Bizjak ubiz...@gmail.com * intrinsics/getcwd.c: Include stdlib.h. Tested on x86_64-pc-linux-gnu. OK for mainline? Uros. Index: intrinsics/getcwd.c === --- intrinsics/getcwd.c (revision 210956) +++ intrinsics/getcwd.c (working copy) @@ -25,6 +25,7 @@ #include "libgfortran.h" +#include <stdlib.h> #include <string.h> #include <errno.h>
Re: [PATCH, fortran]: Include stdlib.h in intrinsics/getcwd.c
On Tue, May 27, 2014 at 1:37 PM, Steve Kargl s...@troutmask.apl.washington.edu wrote: ... to avoid implicit declaration of function ‘free’ warning. 2014-05-27 Uros Bizjak ubiz...@gmail.com * intrinsics/getcwd.c: Include stdlib.h. It can also be committed to the 4.9 branch if you have the time. There is no need for the stdlib.h include on the 4.9 branch; the call to free was introduced in 4.10. Uros.
Re: [Patch] Minor fixes for regtesting gfortran with -flto
Hello! With the following patch, gfortran can be regtested with -flto with no failure, but pr54852 and pr60061. -! { dg-final { scan-assembler-times myBindC 1 { target { ! { hppa*-*-hpux* } } } } } -! { dg-final { scan-assembler-times myBindC,%r2 1 { target { hppa*-*-hpux* } } } } +! { dg-final { scan-assembler-times call\[^\n\r\]*myBindC 1 { target { ! { hppa*-*-hpux* } } } } } +! { dg-final { scan-assembler-times call\[^\n\r\]*myBindC,%r2 1 { target { hppa*-*-hpux* } } } } The change above fails on alpha, which doesn't emit a call instruction in the assembly, but rather: $ grep myBindC bind_c_array_params_2.s jsr $26,myBindC Probably, alpha is not the only one that fails this assumption. Uros.
Re: [Patch] Minor fixes for regtesting gfortran with -flto
On Thu, May 29, 2014 at 11:38 AM, Dominique Dhumieres domi...@lps.ens.fr wrote: Probably, alpha is not the only one that fails this assumption. Indeed! see the thread starting at https://gcc.gnu.org/ml/fortran/2014-05/msg00127.html Could you test the following patch --- ../_clean/gcc/testsuite/gfortran.dg/bind_c_array_params_2.f90 2014-05-24 16:17:53.0 +0200 +++ gcc/testsuite/gfortran.dg/bind_c_array_params_2.f90 2014-05-29 11:34:40.0 +0200 @@ -16,7 +16,7 @@ integer :: aa(4,4) call test(aa) end -! { dg-final { scan-assembler-times call\[^\n\r\]*myBindC 1 { target { ! { hppa*-*-hpux* } } } } } -! { dg-final { scan-assembler-times call\[^\n\r\]*myBindC,%r2 1 { target { hppa*-*-hpux* } } } } +! { dg-final { scan-assembler-times \[ \t\]\[$,_0-9\]*myBindC 1 { target { ! { hppa*-*-hpux* } } } } } +! { dg-final { scan-assembler-times \[ \t\]\[$,_0-9\]*myBindC,%r2 1 { target { hppa*-*-hpux* } } } } ! { dg-final { scan-tree-dump-times test \\\(parm\\. 1 original } } ! { dg-final { cleanup-tree-dump original } } with make -k check-gfortran RUNTESTFLAGS=dg.exp=bind_c_array_params_2.f90 --target_board=unix'{-m32,-m64,-m32/-flto,-m64/-flto}' This works on alpha with --target_board=unix'{,-flto}' and x86_64, so I guess it is OK. Can you pre-approved it? I'm not a testsuite maintainer (one is CC'd for a final approval), but the situation is definitely better with the patched regexp. Uros.
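The proposed non-hppa pattern can be checked against sample assembly lines outside DejaGnu. This sketch uses POSIX regcomp/regexec. DejaGnu actually evaluates Tcl regular expressions, and the assumption here is that this simple pattern behaves the same in POSIX extended syntax, where $ inside a bracket expression is a literal. The alpha line comes from the thread. The x86-style call line is an assumed typical form. matches_bindc_call is a hypothetical name.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if LINE matches the non-hppa scan-assembler pattern
   proposed above: a space or tab, then any run of $ , _ or digits
   (covering register operands such as alpha's $26,), then myBindC.  */
bool
matches_bindc_call (const char *line)
{
  regex_t re;
  bool ok;

  if (regcomp (&re, "[ \t][$,_0-9]*myBindC", REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  ok = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}
```

Because the pattern requires whitespace before the operand, a label definition such as myBindC: at the start of a line is not counted.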
Re: Patch RFA: Move x86 _mm_pause out of pragma target(sse) scope
Hello!

This error is because _mm_pause is defined in the scope of #pragma GCC target("sse"). But _mm_pause, which simply generates the pause instruction, does not require SSE support. The pause instruction has nothing really to do with SSE, and it works on all x86 processors (on processors that do not explicitly recognize it, it is a nop).

I propose the following patch, which moves _mm_pause out of the pragma target scope.

I know that x86intrin.h provides a similar intrinsic, __pause, but I think it's worth making _mm_pause work reasonably as well.

I'm running a full testsuite run. OK for mainline if it passes?

gcc/ChangeLog:

2014-05-29  Ian Lance Taylor  i...@google.com

	* config/i386/xmmintrin.h (_mm_pause): Move out of scope of pragma
	target("sse").

gcc/testsuite/ChangeLog:

2014-05-29  Ian Lance Taylor  i...@google.com

	* gcc.target/i386/pause-2.c: New test.

The patch looks OK to me, but please wait a day or two for possible comments (compatibility, etc.) from Kirill.

Thanks,
Uros.
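The typical consumer of `_mm_pause` is a spin-wait loop that wants the pause hint without enabling SSE for the whole translation unit. A minimal sketch, assuming C11 `<stdatomic.h>` and a GCC whose `xmmintrin.h` carries this patch (an unpatched compiler may reject the call under `-m32 -mno-sse`); the `demo_spinlock` type, its functions and the `CPU_RELAX` macro are hypothetical names, and the non-x86 fallback is simply an empty statement.

```c
#include <assert.h>
#include <stdatomic.h>

#if defined(__i386__) || defined(__x86_64__)
#include <xmmintrin.h>
#define CPU_RELAX() _mm_pause()   /* PAUSE hint; a plain NOP on older CPUs */
#else
#define CPU_RELAX() ((void) 0)    /* hypothetical fallback for other targets */
#endif

/* Hypothetical test-and-set spinlock.  */
typedef struct
{
  atomic_flag locked;
} demo_spinlock;

static void demo_spin_lock(demo_spinlock *l)
{
  /* Spin until the flag was previously clear; relax the CPU meanwhile.  */
  while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
    CPU_RELAX();
}

static void demo_spin_unlock(demo_spinlock *l)
{
  atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Usage is the usual pairing: initialize with `demo_spinlock l = { ATOMIC_FLAG_INIT };`, then bracket the critical section with `demo_spin_lock(&l)` and `demo_spin_unlock(&l)`.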
Re: [PATCH, i386] Enable fuse-caller-save for i386
On Fri, May 30, 2014 at 11:45 AM, Tom de Vries tom_devr...@mentor.com wrote:

This patch enables the fuse-caller-save optimization for i386. It sets the hook TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS to true.

The definition of the hook is:
...
set to true if all the calls in the current function contain clobbers in CALL_INSN_FUNCTION_USAGE for the registers that are clobbered by the call rather than by the callee, and are not already set or clobbered in the call pattern. Examples of such registers are registers used in PLTs and stubs, and temporary registers used in the call instruction but not present in the rtl pattern. Another way to formulate it is the registers not present in the rtl pattern that are clobbered by the call assuming the callee does not clobber any register. The default version of this hook is set to false.
...

Bootstrapped and reg-tested this patch on x86_64, no issues found.

Is it in fact safe to set this hook to true for i386? Are there clobbers which need to be added? If it's safe to set this hook to true, OK for trunk?

AFAIK, this is true for all targets, including cross-calls between MS and SYSV ABIs, so I'd say OK.

Uros.
Re: [patch i386]: Fix sibcall failures caused by allowing constant memories
On Sat, May 31, 2014 at 2:27 PM, Kai Tietz ktiet...@googlemail.com wrote:

I resend the patch in a new thread.

The recent sibcall fallout was caused by using the 'm' constraint for sibcalls. Because of this, wrong combinations happened in the reload pass. This patch introduces a new constraint 'B' for sibcall_memory_operand.

ChangeLog

2014-05-31  Kai Tietz  kti...@redhat.com

	* constraints.md (define_constraint): New 'B' constraint.

Please make this a two-letter constraint (perhaps Bs). We are already short of single-letter constraints.

I plan to change the z and w @internal constraints to Bz and Bw to free these two letters.

Uros.
[PATCH, testsuite]: Properly escape brackets in gcc.target/i386/sibcall-X.c scan strings
Hello!

2014-06-01  Uros Bizjak  ubiz...@gmail.com

	* gcc.target/i386/sibcall-2.c (dg-final): Properly escape '[' and ']'
	in scan-assembler-not string.
	* gcc.target/i386/sibcall-3.c (dg-final): Ditto.
	* gcc.target/i386/sibcall-4.c (dg-final): Ditto.
	* gcc.target/i386/sibcall-6.c (dg-final): Ditto.

Tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline SVN.

Uros.

Index: gcc.target/i386/sibcall-2.c
===
--- gcc.target/i386/sibcall-2.c	(revision 22)
+++ gcc.target/i386/sibcall-2.c	(working copy)
@@ -13,4 +13,4 @@
   return (a 0 ? doo1 : doo2) (a);
 }
-/* { dg-final { scan-assembler-not call[ \t]*.%eax } } */
+/* { dg-final { scan-assembler-not call\[ \t\]*.%eax } } */
Index: gcc.target/i386/sibcall-3.c
===
--- gcc.target/i386/sibcall-3.c	(revision 22)
+++ gcc.target/i386/sibcall-3.c	(working copy)
@@ -13,4 +13,4 @@
   return foo (a);
 }
-/* { dg-final { scan-assembler-not jmp[ \t]*.%eax } } */
+/* { dg-final { scan-assembler-not jmp\[ \t\]*.%eax } } */
Index: gcc.target/i386/sibcall-4.c
===
--- gcc.target/i386/sibcall-4.c	(revision 22)
+++ gcc.target/i386/sibcall-4.c	(working copy)
@@ -12,4 +12,4 @@
   dispatch[offset](offset);
 }
-/* { dg-final { scan-assembler-not jmp[ \t]*.%eax } } */
+/* { dg-final { scan-assembler-not jmp\[ \t\]*.%eax } } */
Index: gcc.target/i386/sibcall-6.c
===
--- gcc.target/i386/sibcall-6.c	(revision 22)
+++ gcc.target/i386/sibcall-6.c	(working copy)
@@ -34,4 +34,4 @@
   if (postorder_func)
     (*postorder_func) (loop_node);
 }
-/* { dg-final { scan-assembler jmp[ \t]*.%eax } } */
+/* { dg-final { scan-assembler jmp\[ \t\]*.%eax } } */
[PATCH, i386]: Rename two @internal constraints
Hello!

This change renames two @internal constraints to free two letters.

2014-06-01  Uros Bizjak  ubiz...@gmail.com

	* config/i386/constraints.md (Bw): Rename from 'w'.
	(Bz): Rename from 'z'.
	* config/i386/i386.md: Change 'w' to 'Bw' and 'z' to 'Bz' globally.

Tested on x86_64-linux-gnu {,-m32} and committed to mainline.

Uros.

Index: config/i386/constraints.md
===
--- config/i386/constraints.md	(revision 22)
+++ config/i386/constraints.md	(working copy)
@@ -19,7 +19,7 @@
 ;;; Unused letters:
 ;;; H
-;;; h j
+;;; h jw z
 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
@@ -91,6 +91,9 @@
 (define_register_constraint x TARGET_SSE ? SSE_REGS : NO_REGS
  Any SSE register.)
 
+(define_register_constraint v TARGET_SSE ? ALL_SSE_REGS : NO_REGS
+ Any EVEX encodable SSE register (@code{%xmm0-%xmm31}).)
+
 ;; We use the Y prefix to denote any number of conditional register sets:
 ;;  z	First SSE register.
 ;;  i	SSE2 inter-unit moves to SSE register enabled
@@ -144,8 +147,10 @@
  (ix86_fpmath & FPMATH_387) ? FLOAT_REGS : NO_REGS
  @internal Any x87 register when 80387 FP arithmetic is enabled.)
-;; We use the B prefix to denote any number of internal memory operands:
-;;  s  Sibling memory operand.
+;; We use the B prefix to denote any number of internal operands:
+;;  s  Sibcall memory operand, not valid for TARGET_X32
+;;  w  Call memory operand, not valid for TARGET_X32
+;;  z  Constant call address operand.
 (define_constraint Bs
   @internal Sibcall memory operand.
@@ -152,18 +157,15 @@
   (and (not (match_test TARGET_X32))
        (match_operand 0 sibcall_memory_operand)))
-(define_register_constraint v TARGET_SSE ? ALL_SSE_REGS : NO_REGS
- Any EVEX encodable SSE register (@code{%xmm0-%xmm31}).)
+(define_constraint Bw
+  @internal Call memory operand.
+  (and (not (match_test TARGET_X32))
+       (match_operand 0 memory_operand)))
-(define_constraint z
+(define_constraint Bz
   @internal Constant call address operand.
   (match_operand 0 constant_call_address_operand))
-(define_constraint w
-  @internal Call memory operand.
-  (and (not (match_test TARGET_X32))
-       (match_operand 0 memory_operand)))
-
 ;; Integer constant constraints.
 (define_constraint I
   Integer constant in the range 0 @dots{} 31, for 32-bit shifts.
Index: config/i386/i386.md
===
--- config/i386/i386.md	(revision 22)
+++ config/i386/i386.md	(working copy)
@@ -11182,7 +11182,7 @@
 })
 (define_insn *indirect_jump
-  [(set (pc) (match_operand:W 0 indirect_branch_operand rw))]
+  [(set (pc) (match_operand:W 0 indirect_branch_operand rBw))]
   jmp\t%A0
   [(set_attr type ibr)
@@ -11230,7 +11230,7 @@
 })
 (define_insn *tablejump_1
-  [(set (pc) (match_operand:W 0 indirect_branch_operand rw))
+  [(set (pc) (match_operand:W 0 indirect_branch_operand rBw))
    (use (label_ref (match_operand 1)))]
   jmp\t%A0
@@ -11360,7 +11360,7 @@
 })
 (define_insn *call
-  [(call (mem:QI (match_operand:W 0 call_insn_operand czw))
+  [(call (mem:QI (match_operand:W 0 call_insn_operand cBwBz))
          (match_operand 1))]
   !SIBLING_CALL_P (insn)
   * return ix86_output_call_insn (insn, operands[0]);
@@ -11368,7 +11368,7 @@
 (define_insn *call_rex64_ms_sysv
   [(match_parallel 2 call_rex64_ms_sysv_operation
-    [(call (mem:QI (match_operand:DI 0 call_insn_operand rzw))
+    [(call (mem:QI (match_operand:DI 0 call_insn_operand rBwBz))
            (match_operand 1))
      (unspec [(const_int 0)] UNSPEC_MS_TO_SYSV_CALL)])]
   TARGET_64BIT && !SIBLING_CALL_P (insn)
@@ -11376,7 +11376,7 @@
   [(set_attr type call)])
 (define_insn *sibcall
-  [(call (mem:QI (match_operand:W 0 sibcall_insn_operand UzBs))
+  [(call (mem:QI (match_operand:W 0 sibcall_insn_operand UBsBz))
          (match_operand 1))]
   SIBLING_CALL_P (insn)
   * return ix86_output_call_insn (insn, operands[0]);
@@ -11396,7 +11396,7 @@
 })
 (define_insn *call_pop
-  [(call (mem:QI (match_operand:SI 0 call_insn_operand lzm))
+  [(call (mem:QI (match_operand:SI 0 call_insn_operand lmBz))
          (match_operand 1))
    (set (reg:SI SP_REG)
         (plus:SI (reg:SI SP_REG)
@@ -11406,7 +11406,7 @@
   [(set_attr type call)])
 (define_insn *sibcall_pop
-  [(call (mem:QI (match_operand:SI 0 sibcall_insn_operand UzBs))
+  [(call (mem:QI (match_operand:SI 0 sibcall_insn_operand UBsBz))
          (match_operand 1))
    (set (reg:SI SP_REG)
         (plus:SI (reg:SI SP_REG)
@@ -11443,7 +11443,7 @@
 (define_insn *call_value
   [(set (match_operand 0)
-        (call (mem:QI (match_operand:W 1 call_insn_operand czw))
+        (call (mem:QI (match_operand:W 1 call_insn_operand cBwBz))
               (match_operand 2)))]
   !SIBLING_CALL_P (insn)
   * return ix86_output_call_insn (insn, operands[1]);
@@ -11451,7 +11451,7 @@
 (define_insn *sibcall_value
   [(set
[PATCH, testsuite]: Fixes for recent ia32 testsuite failures
Hello!

Plus a more modern dg-do target selector instead of dg-require-effective-target.

2014-06-01  Uros Bizjak  ubiz...@gmail.com

	* gcc.target/i386/sibcall-2.c: Xfail dg-final scan-assembler-not,
	not compilation.
	* gcc.target/i386/sibcall-4.c: Ditto.
	* gcc.target/i386/fuse-caller-save.c: Add -mregparm=1 for ia32 target.

Tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline SVN.

Uros.

Index: fuse-caller-save.c
===
--- fuse-caller-save.c	(revision 22)
+++ fuse-caller-save.c	(working copy)
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options -O2 -fuse-caller-save } */
+/* { dg-additional-options -mregparm=1 { target ia32 } } */
+
 /* Testing -fuse-caller-save optimization option.  */
 static int __attribute__((noinline))
Index: sibcall-1.c
===
--- sibcall-1.c	(revision 22)
+++ sibcall-1.c	(working copy)
@@ -1,5 +1,4 @@
-/* { dg-do compile } */
-/* { dg-require-effective-target ia32 } */
+/* { dg-do compile { target ia32 } } */
 /* { dg-options -O2 } */
 extern int (*foo)(int);
Index: sibcall-2.c
===
--- sibcall-2.c	(revision 28)
+++ sibcall-2.c	(working copy)
@@ -1,5 +1,4 @@
-/* { dg-do compile { xfail { *-*-* } } } */
-/* { dg-require-effective-target ia32 } */
+/* { dg-do compile { target ia32 } } */
 /* { dg-options -O2 } */
 extern int doo1 (int);
@@ -13,4 +12,4 @@
   return (a 0 ? doo1 : doo2) (a);
 }
-/* { dg-final { scan-assembler-not call\[ \t\]*.%eax } } */
+/* { dg-final { scan-assembler-not call\[ \t\]*.%eax { xfail *-*-* } } } */
Index: sibcall-3.c
===
--- sibcall-3.c	(revision 28)
+++ sibcall-3.c	(working copy)
@@ -1,5 +1,4 @@
-/* { dg-do compile } */
-/* { dg-require-effective-target ia32 } */
+/* { dg-do compile { target ia32 } } */
 /* { dg-options -O2 } */
 extern
Index: sibcall-4.c
===
--- sibcall-4.c	(revision 28)
+++ sibcall-4.c	(working copy)
@@ -1,6 +1,5 @@
 /* Testcase for PR target/46219.  */
-/* { dg-do compile { xfail { *-*-* } } } */
-/* { dg-require-effective-target ia32 } */
+/* { dg-do compile { target ia32 } } */
 /* { dg-options -O2 } */
 typedef void (*dispatch_t)(long offset);
@@ -12,4 +11,4 @@
   dispatch[offset](offset);
 }
-/* { dg-final { scan-assembler-not jmp\[ \t\]*.%eax } } */
+/* { dg-final { scan-assembler-not jmp\[ \t\]*.%eax { xfail *-*-* } } } */
Index: sibcall-5.c
===
--- sibcall-5.c	(revision 22)
+++ sibcall-5.c	(working copy)
@@ -1,6 +1,5 @@
 /* Check that indirect sibcalls understand regparm.  */
-/* { dg-do run } */
-/* { dg-require-effective-target ia32 } */
+/* { dg-do run { target ia32 } } */
 /* { dg-options -O2 } */
 extern void abort (void);
Index: sibcall-6.c
===
--- sibcall-6.c	(revision 28)
+++ sibcall-6.c	(working copy)
@@ -1,5 +1,4 @@
-/* { dg-do compile } */
-/* { dg-require-effective-target ia32 } */
+/* { dg-do compile { target ia32 } } */
 /* { dg-options -O2 } */
 typedef void *ira_loop_tree_node_t;
[PATCH, i386]: Fix PR 61239, ICE in decompose, at rtl.h when compiling vshuf-v16hi.c using -mavx2
Hello!

2014-06-02  Uros Bizjak  ubiz...@gmail.com

	PR target/61239
	* config/i386/i386.c (ix86_expand_vec_perm) [case V32QImode]: Use
	GEN_INT (-128) instead of GEN_INT (128) to set MSB of QImode constant.

Tested on x86_64-pc-linux-gnu with

make check-gcc RUNTESTFLAGS='--target_board=unix\{-msse2,-msse4,-mavx,-mavx2\} dg-torture.exp=vshuf*.c'

and committed to mainline SVN.

Uros.

Index: config/i386/i386.c
===
--- config/i386/i386.c	(revision 211125)
+++ config/i386/i386.c	(working copy)
@@ -21541,7 +21541,7 @@ ix86_expand_vec_perm (rtx operands[])
 	  t1 = gen_reg_rtx (V32QImode);
 	  t2 = gen_reg_rtx (V32QImode);
 	  t3 = gen_reg_rtx (V32QImode);
-	  vt2 = GEN_INT (128);
+	  vt2 = GEN_INT (-128);
 	  for (i = 0; i < 32; i++)
 	    vec[i] = vt2;
 	  vt = gen_rtx_CONST_VECTOR (V32QImode, gen_rtvec_v (32, vec));
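The fix relies on how GCC stores integer constants: a CONST_INT is kept sign-extended from the width of its mode, so a QImode byte with only the MSB set has to be spelled -128 rather than 128. The ICE in decompose reported in the PR is consistent with that rule being violated. The sketch below models the canonical-form rule in plain C; `sext_low_bits` and `is_canonical_for_bits` are hypothetical helpers that mimic, but are not, GCC's `trunc_int_for_mode`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low BITS bits of VALUE (1 <= BITS <= 64), in the
   spirit of GCC's trunc_int_for_mode.  */
static int64_t sext_low_bits(int64_t value, unsigned bits)
{
  uint64_t mask = (bits >= 64) ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
  uint64_t sign = UINT64_C(1) << (bits - 1);
  uint64_t v = (uint64_t) value & mask;

  /* Flip the sign bit and subtract it back: this propagates it upward.  */
  return (int64_t) ((v ^ sign) - sign);
}

/* A constant is canonical for a BITS-wide mode if sign extension of its
   low bits reproduces it exactly.  */
static bool is_canonical_for_bits(int64_t value, unsigned bits)
{
  return sext_low_bits(value, bits) == value;
}
```

Both 128 and -128 share the low byte 0x80, but only -128 survives the round trip through sign extension, which is why `GEN_INT (-128)` is the right spelling for the V32QImode mask element.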
[PATCH, testsuite]: Add -mno-avx2 to some i386 XOP tests
Hello!

With targets that default to AVX2, these tests vectorize via 256-bit paths, where different insns are emitted.

2014-06-02  Uros Bizjak  ubiz...@gmail.com

	* gcc.target/i386/xop-rotate1-vector.c (dg-options): Add -mno-avx2.
	* gcc.target/i386/xop-rotate2-vector.c (dg-options): Ditto.
	* gcc.target/i386/xop-rotate3-vector.c (dg-options): Ditto.
	* gcc.target/i386/xop-imul32widen-vector.c (dg-options): Ditto.
	* gcc.target/i386/xop-imul64-vector.c (dg-options): Ditto.
	* gcc.target/i386/xop-shift1-vector.c (dg-options): Ditto.
	* gcc.target/i386/xop-shift2-vector.c (dg-options): Ditto.
	* gcc.target/i386/xop-shift3-vector.c (dg-options): Ditto.

Tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline SVN.

Uros.

Index: gcc.target/i386/xop-rotate1-vector.c
===
--- gcc.target/i386/xop-rotate1-vector.c	(revision 211125)
+++ gcc.target/i386/xop-rotate1-vector.c	(working copy)
@@ -2,7 +2,7 @@
    into prot on XOP systems.  */
 /* { dg-do compile { target { ! { ia32 } } } } */
-/* { dg-options -O2 -mxop -ftree-vectorize } */
+/* { dg-options -O2 -mxop -mno-avx2 -ftree-vectorize } */
 extern void exit (int);
Index: gcc.target/i386/xop-rotate2-vector.c
===
--- gcc.target/i386/xop-rotate2-vector.c	(revision 211125)
+++ gcc.target/i386/xop-rotate2-vector.c	(working copy)
@@ -2,7 +2,7 @@
    into prot on XOP systems.  */
 /* { dg-do compile { target { ! { ia32 } } } } */
-/* { dg-options -O2 -mxop -ftree-vectorize } */
+/* { dg-options -O2 -mxop -mno-avx2 -ftree-vectorize } */
 extern void exit (int);
Index: gcc.target/i386/xop-imul32widen-vector.c
===
--- gcc.target/i386/xop-imul32widen-vector.c	(revision 211125)
+++ gcc.target/i386/xop-imul32widen-vector.c	(working copy)
@@ -3,7 +3,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target lp64 } */
-/* { dg-options -O2 -mxop -ftree-vectorize } */
+/* { dg-options -O2 -mxop -mno-avx2 -ftree-vectorize } */
 extern void exit (int);
Index: gcc.target/i386/xop-rotate3-vector.c
===
--- gcc.target/i386/xop-rotate3-vector.c	(revision 211125)
+++ gcc.target/i386/xop-rotate3-vector.c	(working copy)
@@ -2,7 +2,7 @@
    into prot on XOP systems.  */
 /* { dg-do compile { target { ! { ia32 } } } } */
-/* { dg-options -O2 -mxop -ftree-vectorize } */
+/* { dg-options -O2 -mxop -mno-avx2 -ftree-vectorize } */
 extern void exit (int);
Index: gcc.target/i386/xop-imul64-vector.c
===
--- gcc.target/i386/xop-imul64-vector.c	(revision 211125)
+++ gcc.target/i386/xop-imul64-vector.c	(working copy)
@@ -3,7 +3,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target lp64 } */
-/* { dg-options -O2 -mxop -ftree-vectorize } */
+/* { dg-options -O2 -mxop -mno-avx2 -ftree-vectorize } */
 extern void exit (int);
Index: gcc.target/i386/xop-shift1-vector.c
===
--- gcc.target/i386/xop-shift1-vector.c	(revision 211125)
+++ gcc.target/i386/xop-shift1-vector.c	(working copy)
@@ -2,7 +2,7 @@
    psha/pshl on XOP systems.  */
 /* { dg-do compile { target { ! { ia32 } } } } */
-/* { dg-options -O2 -mxop -ftree-vectorize } */
+/* { dg-options -O2 -mxop -mno-avx2 -ftree-vectorize } */
 extern void exit (int);
Index: gcc.target/i386/xop-shift2-vector.c
===
--- gcc.target/i386/xop-shift2-vector.c	(revision 211125)
+++ gcc.target/i386/xop-shift2-vector.c	(working copy)
@@ -2,7 +2,7 @@
    psha/pshl on XOP systems.  */
 /* { dg-do compile { target { ! { ia32 } } } } */
-/* { dg-options -O2 -mxop -ftree-vectorize } */
+/* { dg-options -O2 -mxop -mno-avx2 -ftree-vectorize } */
 extern void exit (int);
Index: gcc.target/i386/xop-shift3-vector.c
===
--- gcc.target/i386/xop-shift3-vector.c	(revision 211125)
+++ gcc.target/i386/xop-shift3-vector.c	(working copy)
@@ -2,7 +2,7 @@
    psha/pshl on XOP systems.  */
 /* { dg-do compile { target { ! { ia32 } } } } */
-/* { dg-options -O2 -mxop -ftree-vectorize } */
+/* { dg-options -O2 -mxop -mno-avx2 -ftree-vectorize } */
 extern void exit (int);
[PATCH, i386]: Correctly handle maximum size of stringop algorithm in decide_alg
Hello!

A problem was uncovered by -march=corei7 -mtune=intel -m32 with the i386/memcpy-[23] testcases in the decide_alg subroutine [1]. Although the max size of the transfer was known, the memcpy was not inlined as the testcase expects.

The core of the problem can be seen in the definition of the 32-bit intel_memcpy stringop alg:

  {libcall, {{11, loop, false}, {-1, rep_prefix_4_byte, false}}},

Please note that the last algorithm sets its maximum size to -1, unlimited. However, in decide_alg, the same number also signals that no algorithm sets its size, so expected_size is never calculated.

In the loop that sets the maximal size for a user-defined algorithm, it is assumed that -1 belongs exclusively to libcall, which is not the case in the above intel_memcpy definition:

  if (candidate != libcall && candidate usable)
    max = algs->size[i].max;

When the last non-libcall algorithm sets its maximum to -1 (aka unlimited), this value fails the following test:

  if (max > 1 && (unsigned HOST_WIDE_INT) max >= max_size

and expected_size is never calculated.

The attached patch fixes this oversight, so -1 means unlimited size and 0 means that the size was never set. The patch also considers these two special values when choosing a maximum size for the dynamic check.

2014-06-02  Uros Bizjak  ubiz...@gmail.com

	* config/i386/i386.c (decide_alg): Correctly handle maximum size of
	stringop algorithm.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}, also with RUNTESTFLAGS=--target_board=unix/-march=corei7/-mtune=intel\{,-m32\}, where it fixes both memcpy failures from [1].

[1] https://gcc.gnu.org/ml/gcc-testresults/2014-06/msg00127.html

Jan, can you please review the patch, to check if the logic is OK?

Uros.
Re: [PATCH, i386]: Correctly handle maximum size of stringop algorithm in decide_alg
On Mon, Jun 2, 2014 at 11:12 PM, Uros Bizjak ubiz...@gmail.com wrote:

Jan, can you please review the patch, to check if the logic is OK?

Whoops, wrong patch was attached. Now with the correct attachment.

Uros.
Index: ChangeLog
===
--- ChangeLog	(revision 211140)
+++ ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2014-06-02  Uros Bizjak  ubiz...@gmail.com
+
+	* config/i386/i386.c (decide_alg): Correctly handle maximum size of
+	stringop algorithm.
+
 2014-06-02  Marcus Shawcroft  marcus.shawcr...@arm.com
 	* config/aarch64/aarch64.md (set_fpcr): Drop ISB after FPCR write.
Index: config/i386/i386.c
===
--- config/i386/i386.c	(revision 211140)
+++ config/i386/i386.c	(working copy)
@@ -23828,7 +23828,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT exp
 {
   const struct stringop_algs * algs;
   bool optimize_for_speed;
-  int max = -1;
+  int max = 0;
   const struct processor_costs *cost;
   int i;
   bool any_alg_usable_p = false;
@@ -23866,7 +23866,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT exp
   /* If expected size is not known but max size is small enough
      so inline version is a win, set expected size into
      the range.  */
-  if (max > 1 && (unsigned HOST_WIDE_INT) max >= max_size
+  if (((max > 1 && (unsigned HOST_WIDE_INT) max >= max_size) || max == -1)
       && expected_size == -1)
     expected_size = min_size / 2 + max_size / 2;
@@ -23955,7 +23955,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT exp
           *dynamic_check = 128;
           return loop_1_byte;
         }
-      if (max == -1)
+      if (max <= 0)
         max = 4096;
       alg = decide_alg (count, max / 2, min_size, max_size, memset,
                         zero_memset, dynamic_check, noalign);
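The sentinel clash is easy to reproduce in a tiny model. The sketch below assumes a heavily simplified table in which every algorithm counts as usable, and reads the size test as `max >= max_size` (the transfer fits within the inline algorithm's limit). The enum, struct and function names are hypothetical and do not match the real `stringop_algs` layout in i386.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of an i386 stringop table.  */
enum alg_kind { ALG_LIBCALL, ALG_LOOP, ALG_REP_PREFIX_4_BYTE };

struct size_alg
{
  long max;            /* -1 means "unlimited" */
  enum alg_kind alg;
};

/* 32-bit intel_memcpy: {libcall, {{11, loop, false},
   {-1, rep_prefix_4_byte, false}}}.  */
static const struct size_alg intel_memcpy_32[] = {
  { 11, ALG_LOOP },
  { -1, ALG_REP_PREFIX_4_BYTE },
};

/* Record the max of the last non-libcall entry, as the decide_alg loop
   does.  INIT is the "never set" value: -1 before the patch, 0 after.  */
static long last_inline_max(const struct size_alg *t, int n, long init)
{
  long max = init;

  for (int i = 0; i < n; i++)
    if (t[i].alg != ALG_LIBCALL)
      max = t[i].max;
  return max;
}

/* Pre-patch test: an "unlimited" -1 can never satisfy max > 1.  */
static bool old_guess_expected(long max, unsigned long max_size)
{
  return max > 1 && (unsigned long) max >= max_size;
}

/* Post-patch test: -1 now means "unlimited" and always qualifies.  */
static bool new_guess_expected(long max, unsigned long max_size)
{
  return (max > 1 && (unsigned long) max >= max_size) || max == -1;
}

/* Post-patch fallback for the dynamic-check size: both 0 (never set)
   and -1 (unlimited) fall back to 4096.  */
static long dynamic_check_max(long max)
{
  return max <= 0 ? 4096 : max;
}
```

With the patched convention, the intel_memcpy table yields -1, which the new test accepts as "unlimited", while 0 is reserved for tables whose only entries are libcalls.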
[PATCH, testsuite]: Fix g++.dg/ext/mv[14,15].C spurious failure on corei7
Hello!

When configured with --with-arch=core-avx-i --with-cpu=core-avx-i, the g++.dg/ext/mv[14,15].C tests fail on corei7 [1], since the default CPU is the same as the CPU checked in the test. The patch compiles the testcases with -march=x86-64 as the generic CPU.

2014-06-03  Uros Bizjak  ubiz...@gmail.com

	* g++.dg/ext/mv14.C (dg-options): Add -march=x86-64.
	* g++.dg/ext/mv15.C (dg-options): Ditto.

Tested on x86_64-pc-linux-gnu {,-m32} corei7 CPU and committed to mainline SVN.

[1] https://gcc.gnu.org/ml/gcc-testresults/2014-06/msg00243.html

Uros.

Index: g++.dg/ext/mv14.C
===
--- g++.dg/ext/mv14.C	(revision 211188)
+++ g++.dg/ext/mv14.C	(working copy)
@@ -1,7 +1,7 @@
 /* Test case to check if Multiversioning works.  */
 /* { dg-do run { target i?86-*-* x86_64-*-* } } */
 /* { dg-require-ifunc } */
-/* { dg-options -O2 -fPIC } */
+/* { dg-options -O2 -fPIC -march=x86-64 } */
 #include <assert.h>
Index: g++.dg/ext/mv15.C
===
--- g++.dg/ext/mv15.C	(revision 211188)
+++ g++.dg/ext/mv15.C	(working copy)
@@ -1,7 +1,7 @@
 /* Test case to check if Multiversioning works.  */
 /* { dg-do run { target i?86-*-* x86_64-*-* } } */
 /* { dg-require-ifunc } */
-/* { dg-options -O2 -fPIC } */
+/* { dg-options -O2 -fPIC -march=x86-64 } */
 #include <assert.h>
Re: [fortran, patch] IEEE intrinsic modules
Hello!

+int get_fpu_except_flags (void)
 {
   unsigned short cw;
   int excepts;
   int result = 0;
-  __asm__ __volatile__ ("fnstsw\t%0" : "=a" (cw));
+  __asm__ __volatile__ ("fnstsw\t%0" : "=m" (cw));
   excepts = cw;
   if (has_sse())

You can use the "=am" constraint here, and the compiler will be free to choose the most appropriate form.

Also, you should use __asm__ __volatile__ consistently in the headers.

Uros.
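A minimal sketch of the suggestion, assuming GCC-style extended inline asm on x86. With `"=am"` the compiler may have FNSTSW write the status word either to %ax or directly to memory, whichever suits the surrounding code. `read_x87_status_word` is a hypothetical name, not the libgfortran function, and the code is compiled only on x86 targets.

```c
#include <assert.h>

#if defined(__i386__) || defined(__x86_64__)
/* Read the x87 FPU status word.  "=am" lets the compiler choose %ax or
   a memory slot as the FNSTSW destination.  */
static unsigned short read_x87_status_word(void)
{
  unsigned short sw;

  __asm__ __volatile__ ("fnstsw\t%0" : "=am" (sw));
  return sw;
}
#endif
```

Right after an `fninit`, which clears the status word, the function returns 0; the sticky exception flags occupy its low six bits.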
Re: [fortran, patch] IEEE intrinsic modules
Hello!

0. Gradual underflow control is implemented as not supported by the processor (its SUPPORT function returns false, and the GET and SET procedures abort if you call them). That’s explicitly allowed by the standard, so it’s not actually “missing”. We can improve on this in the future, if people can help.

Please look at libgcc/config/i386/crtfastmath.c for how to set MXCSR_FTZ from mxcsr. You already have all necessary bits in place, the function is basically only:

+  if (has_sse())
+    {
+      unsigned int cw_sse;
+
+      __asm__ __volatile__ ("%vstmxcsr\t%0" : "=m" (cw_sse));
+      cw_sse |= MXCSR_DAZ;
+      __asm__ __volatile__ ("%vldmxcsr\t%0" : : "m" (cw_sse));
+    }

Please note that FTZ applies only to SSE math. x87 and (IIRC) soft-FP don't handle this setting.

Uros.
Re: [fortran, patch] IEEE intrinsic modules
On Thu, Jun 5, 2014 at 11:35 AM, FX fxcoud...@gmail.com wrote:

Please look at libgcc/config/i386/crtfastmath.c for how to set MXCSR_FTZ from mxcsr. You already have all necessary bits in place, the function is basically only:

+  if (has_sse())
+    {
+      unsigned int cw_sse;
+
+      __asm__ __volatile__ ("%vstmxcsr\t%0" : "=m" (cw_sse));
+      cw_sse |= MXCSR_DAZ;
+      __asm__ __volatile__ ("%vldmxcsr\t%0" : : "m" (cw_sse));
+    }

Oops, the above should read MXCSR_FTZ.

Thanks for the suggestion!

Please note that FTZ applies only to SSE math. x87 and (IIRC) soft-FP don't handle this setting.

Yeah, that’s also why I prefer for now to have it declared as unsupported: the Fortran standard doesn’t really allow for partial support such as this, so I’m still trying to figure out what The Right Thing To Do is.

Referring to some older mails [1], this looks like a performance-only setting (a sort of fast-math). So we can perhaps just set this bit, regardless of the details. Maybe soft-fp will grow support for FTZ sometime; it looks like a useful addition from the performance POV.

[1] https://gcc.gnu.org/ml/fortran/2013-11/msg00133.html

Uros.
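The same FTZ toggle can also be written with the SSE control-register intrinsics `_mm_getcsr`/`_mm_setcsr` from `<xmmintrin.h>` instead of the `%vstmxcsr`/`%vldmxcsr` inline asm used in libgcc's crtfastmath.c; this is a swapped-in technique, shown as a sketch. It assumes the standard MXCSR layout (DAZ at bit 6, FTZ at bit 15) and, like the snippet in the thread, only affects SSE arithmetic. `set_sse_flush_to_zero` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__SSE__)
#include <xmmintrin.h>

#define MXCSR_DAZ (1u << 6)    /* denormal inputs are treated as zero */
#define MXCSR_FTZ (1u << 15)   /* denormal results are flushed to zero */

/* Set or clear FTZ in MXCSR and return the previous MXCSR value.
   Only SSE arithmetic honours this bit; x87 code ignores it.  */
static unsigned int set_sse_flush_to_zero(bool enable)
{
  unsigned int old = _mm_getcsr();
  unsigned int cw = enable ? (old | MXCSR_FTZ) : (old & ~MXCSR_FTZ);

  _mm_setcsr(cw);
  return old;
}
#endif
```

Passing the returned value back to `_mm_setcsr` restores the rounding mode and exception masks exactly as they were.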
Re: libgo patch committed: Merge from revision 18783 of master
Hello!

I have committed a patch to libgo to merge from revision 18783:00cce3a34d7e of the master library. This revision was committed January 7. I picked this revision to merge to because the next revision deleted a file that is explicitly merged in by the libgo/merge.sh script.

crypto/x509 fails on x86 Fedora20 with:

--- FAIL: TestImports (0.00 seconds)
    testing.go:228: failed to run x509_test_import.go: exec: go: executable file not found in $PATH
FAIL
FAIL: crypto/x509

Uros.
Re: [PATCH] Fix PR61335
Hello!

2014-05-28  Richard Biener  rguent...@suse.de

	PR tree-optimization/61335
	* tree-vrp.c (vrp_visit_phi_node): If the compare of old and new
	range fails, drop to varying.

	* gfortran.dg/pr61335.f90: New testcase.

This testcase triggers SIGFPE on alpha due to the use of a denormal operand. Maybe an uninitialized value is used in line 48?

Reading symbols from ./pr61335.exe...done.
(gdb) r
Starting program: /space/homedirs/uros/test/pr61335.exe

Program received signal SIGFPE, Arithmetic exception.
0x00012b54 in cp_units::cp_unit_create (string=<error reading variable: Cannot access memory at address 0x120004000>, _string=5) at /home/uros/gcc-svn/trunk/gcc/testsuite/gfortran.dg/pr61335.f90:48
48	    unit_id=cp_units_none
(gdb) list
43	         len_string, next_power
44	    INTEGER, DIMENSION(cp_unit_max_kinds):: kind_id, power, unit_id
45	    LOGICAL :: failure
46
47	    failure=.FALSE.
48	    unit_id=cp_units_none
49	    kind_id=cp_ukind_none
50	    power=0
51	    i_low=1
52	    i_high=1

The exception is triggered at 0x12b50, but reported on the next FP insn.

   0x00012b4c <+76>:	lds $f10,48(fp)
   0x00012b50 <+80>:	cvttq/c $f10,$f10
=> 0x00012b54 <+84>:	ftoit $f10,t0

(gdb) b *0x12b50
Breakpoint 1 at 0x12b50: file /home/uros/gcc-svn/trunk/gcc/testsuite/gfortran.dg/pr61335.f90, line 48.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /space/homedirs/uros/test/pr61335.exe

Breakpoint 1, 0x00012b50 in cp_units::cp_unit_create (string=<error reading variable: Cannot access memory at address 0x120004000>, _string=5) at /home/uros/gcc-svn/trunk/gcc/testsuite/gfortran.dg/pr61335.f90:48
48	    unit_id=cp_units_none
(gdb) i r $f10
f10            8.0173244974249919e-310	(raw 0x9396)

The test passes with -mieee, which allows denormals.

Uros.
Re: [PATCH] Fix PR61335
On Fri, Jun 6, 2014 at 9:47 AM, Uros Bizjak ubiz...@gmail.com wrote:

2014-05-28  Richard Biener  rguent...@suse.de

	PR tree-optimization/61335
	* tree-vrp.c (vrp_visit_phi_node): If the compare of old and new
	range fails, drop to varying.

	* gfortran.dg/pr61335.f90: New testcase.

This testcase triggers SIGFPE on alpha due to the use of a denormal operand. Maybe an uninitialized value is used in line 48?

SIGFPE also triggers at the same place on x86_64 with unmasked FPE exceptions (compile with -O0).

(gdb) b main
Breakpoint 1 at 0x401602: file /home/uros/gcc-svn/trunk/gcc/testsuite/gfortran.dg/pr61335.f90, line 115.
(gdb) r
Starting program: /home/uros/test/pr61335.exe
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x2aaab000

Breakpoint 1, main (argc=1, argv=0x7fffd88e) at /home/uros/gcc-svn/trunk/gcc/testsuite/gfortran.dg/pr61335.f90:115
115	  USE cp_units
(gdb) i r mxcsr
mxcsr          0x1f80	[ IM DM ZM OM UM PM ]
(gdb) set $mxcsr=0x1000
(gdb) i r mxcsr
mxcsr          0x1000	[ PM ]
(gdb) c
Continuing.

Program received signal SIGFPE, Arithmetic exception.
0x00400b60 in cp_units::cp_unit_create (string=<error reading variable: Cannot access memory at address 0x401c47>, _string=5) at /home/uros/gcc-svn/trunk/gcc/testsuite/gfortran.dg/pr61335.f90:49
49	    kind_id=cp_ukind_none

   0x00400b57 <+95>:	mov    -0x28(%rbp),%eax
   0x00400b5a <+98>:	mov    %eax,-0x280(%rbp)
=> 0x00400b60 <+104>:	cvttss2si -0x280(%rbp),%ecx

(gdb) i r mxcsr
mxcsr          0x1021	[ IE PE PM ]

Uros.
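The mxcsr values in the session decode mechanically: bits 0-5 are the sticky exception flags, bit 6 is DAZ, and bits 7-12 are the matching exception masks. The small portable decoder below is a hypothetical helper written only to illustrate the layout. It shows why 0x1f80 masks every exception, why 0x1000 leaves only the precision exception masked, and why 0x1021 records an invalid-operation (IE) and a precision (PE) exception. Rounding control and FZ (bits 13-15) are ignored in this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Decode MXCSR bits 0-12 into space-separated mnemonics, in bit order
   (the same order gdb prints them in the session above).  */
static void decode_mxcsr(unsigned int mxcsr, char *buf, size_t len)
{
  static const char *const names[13] = {
    "IE", "DE", "ZE", "OE", "UE", "PE",  /* bits 0-5: sticky flags */
    "DAZ",                               /* bit 6 */
    "IM", "DM", "ZM", "OM", "UM", "PM"   /* bits 7-12: exception masks */
  };
  size_t used = 0;

  if (len == 0)
    return;
  buf[0] = '\0';
  for (unsigned bit = 0; bit < 13; bit++)
    {
      if (!(mxcsr & (1u << bit)))
        continue;
      int n = snprintf(buf + used, len - used, "%s%s",
                       used ? " " : "", names[bit]);
      if (n < 0 || (size_t) n >= len - used)
        break;                           /* buffer full: keep what fits */
      used += (size_t) n;
    }
}
```

Setting `$mxcsr=0x1000` in gdb therefore clears IM, DM, ZM, OM and UM, so the invalid conversion in `cvttss2si` raises SIGFPE instead of silently setting IE.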
Re: [patch] Update catch(...) handlers to deal with __forced_unwind
Hello!

Failing to rethrow a __forced_unwind exception is very bad. This patch ensures we rethrow them in async tasks, and makes the shared state ready with a broken_promise so that waiting threads don't block forever. That seems reasonable to me; does anyone have any better ideas?

Tested x86_64-linux, will wait for feedback before committing.

Committed to trunk.

* testsuite/30_threads/async/forced_unwind.cc: New.
* testsuite/30_threads/packaged_task/forced_unwind.cc: New.

These two tests time out on alpha-linux-gnu:

FAIL: 30_threads/async/forced_unwind.cc execution test
WARNING: program timed out.
FAIL: 30_threads/packaged_task/forced_unwind.cc execution test
WARNING: program timed out.

strace -f of 30_threads/async/forced_unwind.cc execution test:

...
open(/lib/libpthread.so.0, O_RDONLY|O_CLOEXEC) = 3
read(3, \177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\220\1\0\0\0\320r\0\0\0\0\0\0..., 832) = 832
fstat64(3, {st_mode=S_IFREG|0755, st_size=141449, ...}) = 0
mmap(NULL, 189528, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x227c000
mprotect(0x2296000, 57344, PROT_NONE) = 0
mmap(0x22a4000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18000) = 0x22a4000
mmap(0x22a8000, 9304, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x22a8000
close(3) = 0
open(/lib/libc.so.6.1, O_RDONLY|O_CLOEXEC) = 3
read(3, \177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\220\1\0\0\0`_\2\0\0\0\0\0..., 832) = 832
fstat64(3, {st_mode=S_IFREG|0755, st_size=1646104, ...}) = 0
mmap(NULL, 1719888, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x22ac000
mprotect(0x2438000, 57344, PROT_NONE) = 0
mmap(0x2446000, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18a000) = 0x2446000
mmap(0x244e000, 7760, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x244e000
close(3) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x245
mprotect(0x2446000, 16384, PROT_READ) = 0
mprotect(0x22a4000, 8192, PROT_READ) = 0
mprotect(0x2278000, 8192, PROT_READ) = 0
mprotect(0x2254000, 8192, PROT_READ) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2452000
mprotect(0x217a000, 24576, PROT_READ) = 0
mprotect(0x120016000, 8192, PROT_READ) = 0
mprotect(0x2032000, 8192, PROT_READ) = 0
munmap(0x2024000, 41134) = 0
set_tid_address(0x2450e50) = 18325
set_robust_list(0x2450e60, 24) = 0
futex(0x11f813260, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x11f813260, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL, 22b0248) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigaction(SIGRT_0, {0x2282db0, [], SA_SIGINFO}, NULL, 8, 0) = 0
rt_sigaction(SIGRT_1, {0x2282c70, [], SA_RESTART|SA_SIGINFO}, NULL, 8, 0) = 0
rt_sigprocmask(SIG_UNBLOCK, [RT_0 RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=9223372036854775807}) = 0
brk(0) = 0x12001a000
brk(0x12003c000) = 0x12003c000
mmap(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x2454000
mprotect(0x2454000, 8192, PROT_NONE) = 0
clone(Process 18326 attached child_stack=0x2c52ae0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x2c532c0, tls=0x2c538e0, child_tidptr=0x2c532c0) = 18326
[pid 18326] set_robust_list(0x2c532d0, 24 <unfinished ...>
[pid 18325] futex(0x2c532c0, FUTEX_WAIT, 18326, NULL <unfinished ...>
[pid 18326] <... set_robust_list resumed> ) = 0
[pid 18326] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x2c54000
[pid 18326] munmap(0x2c54000, 54181888) = 0
[pid 18326] munmap(0x2000800, 12926976) = 0
[pid 18326] mprotect(0x2000400, 139264, PROT_READ|PROT_WRITE) = 0
[pid 18326] futex(0x227a1f4, FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid 18326] futex(0x12001a08c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid 18326] madvise(0x2454000, 8355840, MADV_DONTNEED) = 0
[pid 18326] exit(0) = ?
[pid 18326] +++ exited with 0 +++
<... futex resumed> ) = 0
futex(0x12001a098, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x12001a05c, FUTEX_WAIT_PRIVATE, 1, NULL

... the test hangs here ...

Uros.
Re: [patch] Update catch(...) handlers to deal with __forced_unwind
On Fri, Jun 6, 2014 at 11:19 AM, Jonathan Wakely jwak...@redhat.com wrote:
On 06/06/14 10:27 +0200, Uros Bizjak wrote:

These two tests time out on alpha-linux-gnu:

FAIL: 30_threads/async/forced_unwind.cc execution test
WARNING: program timed out.
FAIL: 30_threads/packaged_task/forced_unwind.cc execution test
WARNING: program timed out.

Sorry about that, I don't know why. Does pthread_exit(0) use a __forced_unwind exception on alpha-linux-gnu? This should tell you ...

#include <bits/cxxabi_forced.h>
#include <pthread.h>

void* f(void*)
{
  try
  {
    pthread_exit(0);
  }
  catch (__cxxabiv1::__forced_unwind const&)
  {
    __builtin_puts("unwind");
    throw;
  }
  catch (...)
  {
    __builtin_puts("something else");
    throw;
  }
}

int main()
{
  pthread_t t;
  pthread_create(&t, 0, f, 0);
  pthread_join(t, 0);
}

Strange, I don't get anything ...

$ g++ -lpthread pt.C
$ ./a.out
$
$ g++ --version
g++ (Gentoo 4.8.2 p1.3r1, pie-0.5.8r1) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Maybe Richard knows why...

[pid 18326] futex(0x227a1f4, FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid 18326] futex(0x12001a08c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid 18326] madvise(0x2454000, 8355840, MADV_DONTNEED) = 0
[pid 18326] exit(0) = ?
[pid 18326] +++ exited with 0 +++
<... futex resumed> ) = 0
futex(0x12001a098, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x12001a05c, FUTEX_WAIT_PRIVATE, 1, NULL

... the test hangs here ...

Could I get a stack trace of the remaining thread at that point?

Reading symbols from ./forced_unwind.exe...done.
(gdb) r
Starting program: /space/homedirs/uros/test/forced_unwind.exe
[Thread debugging using libthread_db enabled]
Using host libthread_db library /lib/libthread_db.so.1.
[New Thread 0x2c531f0 (LWP 22587)]
[Thread 0x2c531f0 (LWP 22587) exited]
^C
Program received signal SIGINT, Interrupt.
0x02289ca4 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
(gdb) bt
#0  0x02289ca4 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x021279ec in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /usr/lib/gcc/alpha-unknown-linux-gnu/4.8.2/libstdc++.so.6
#2  0x000120001a80 in wait<std::__future_base::_State_baseV2::wait()::<lambda()> > (__p=..., __lock=..., this=0x12001a058) at /home/uros/gcc-build/alphaev68-unknown-linux-gnu/libstdc++-v3/include/condition_variable:98
#3  wait (this=0x12001a020) at /home/uros/gcc-build/alphaev68-unknown-linux-gnu/libstdc++-v3/include/future:323
#4  _M_get_result (this=0x11fc8f190) at /home/uros/gcc-build/alphaev68-unknown-linux-gnu/libstdc++-v3/include/future:618
#5  get (this=0x11fc8f190) at /home/uros/gcc-build/alphaev68-unknown-linux-gnu/libstdc++-v3/include/future:783
#6  main () at /home/uros/gcc-svn/trunk/libstdc++-v3/testsuite/30_threads/async/forced_unwind.cc:38
(gdb)

Uros.
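The fix described at the top of this thread makes the shared state ready with a broken promise when the task thread is torn down by forced unwinding. That way, a waiter like the one blocked in future::get in the backtrace above cannot wait forever. The idea can be sketched in plain C with POSIX cleanup handlers, which pthread_exit is required to run.

This is a hypothetical analogue, not the libstdc++ implementation. The names shared_state, make_ready, break_promise, task, wait_ready and run_task_and_wait are invented for illustration.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical C analogue of a future's shared state.  */
enum status { PENDING, VALUE, BROKEN };

struct shared_state {
  pthread_mutex_t lock;
  pthread_cond_t ready_cv;
  enum status status;
  int value;
};

/* Make the state ready exactly once and wake every waiter.  */
static void
make_ready (struct shared_state *s, enum status st, int value)
{
  pthread_mutex_lock (&s->lock);
  if (s->status == PENDING)     /* only the first completion counts */
    {
      s->status = st;
      s->value = value;
      pthread_cond_broadcast (&s->ready_cv);
    }
  pthread_mutex_unlock (&s->lock);
}

/* Cleanup handler: runs while pthread_exit unwinds the task thread.  */
static void
break_promise (void *arg)
{
  make_ready (arg, BROKEN, 0);
}

static void *
task (void *arg)
{
  pthread_cleanup_push (break_promise, arg);
  pthread_exit (NULL);          /* never returns; runs break_promise */
  pthread_cleanup_pop (0);      /* required to close the push's block */
  return NULL;
}

/* Blocks until the state is ready, like future::get.  */
static enum status
wait_ready (struct shared_state *s)
{
  pthread_mutex_lock (&s->lock);
  while (s->status == PENDING)  /* loop guards against spurious wakeups */
    pthread_cond_wait (&s->ready_cv, &s->lock);
  enum status st = s->status;
  pthread_mutex_unlock (&s->lock);
  return st;
}

static enum status
run_task_and_wait (void)
{
  struct shared_state s = { PTHREAD_MUTEX_INITIALIZER,
                            PTHREAD_COND_INITIALIZER, PENDING, 0 };
  pthread_t t;
  if (pthread_create (&t, NULL, task, &s) != 0)
    return PENDING;             /* could not start the task */
  enum status st = wait_ready (&s);
  pthread_join (t, NULL);
  return st;
}
```

Without the cleanup handler, wait_ready would block forever, which is the same symptom as the hang in the strace and gdb output. The PENDING check in make_ready keeps a normally stored value from being overwritten by the handler.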
[PATCH, i386]: Fix PR 61423, incorrect conversion from unsigned int to floating point
Hello!

Attached patch fixes PR 61423. The problem was that the splitters omitted a necessary zero extension and left garbage in the high part of the register.

2014-06-06  Uros Bizjak  <ubiz...@gmail.com>

	PR target/61423
	* config/i386/i386.md (*floatunssi<mode>2_i387_with_xmm): New
	define_insn_and_split pattern, merged from *floatunssi<mode>2_1
	and corresponding splitters.  Zero extend general register
	or memory input operand to XMM temporary.  Enable for
	TARGET_SSE2 and TARGET_INTER_UNIT_MOVES_TO_VEC only.
	(floatunssi<mode>2): Update expander predicate.

testsuite/ChangeLog:

2014-06-06  Uros Bizjak  <ubiz...@gmail.com>

	PR target/61423
	* gcc.target/i386/pr61423.c: New test.

The patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline SVN.

Please note that the patch breaks bootstrap when gcc is configured with --with-arch=core-avx-i --with-cpu=core-avx-i, due to an unrelated problem in the REE pass. The failing preprocessed source from libgcc is attached to the PR.

Uros.

Index: config/i386/i386.md
===
--- config/i386/i386.md	(revision 211316)
+++ config/i386/i386.md	(working copy)
@@ -4943,66 +4943,37 @@
 
 ;; Avoid store forwarding (partial memory) stall penalty by extending
 ;; SImode value to DImode through XMM register instead of pushing two
-;; SImode values to stack. Note that even !TARGET_INTER_UNIT_MOVES_TO_VEC
-;; targets benefit from this optimization. Also note that fild
-;; loads from memory only.
+;; SImode values to stack. Also note that fild loads from memory only.
-(define_insn "*floatunssi<mode>2_1"
-  [(set (match_operand:X87MODEF 0 "register_operand" "=f,f")
+(define_insn_and_split "*floatunssi<mode>2_i387_with_xmm"
+  [(set (match_operand:X87MODEF 0 "register_operand" "=f")
        (unsigned_float:X87MODEF
-         (match_operand:SI 1 "nonimmediate_operand" "x,m")))
-   (clobber (match_operand:DI 2 "memory_operand" "=m,m"))
-   (clobber (match_scratch:SI 3 "=X,x"))]
+         (match_operand:SI 1 "nonimmediate_operand" "rm")))
+   (clobber (match_scratch:DI 3 "=x"))
+   (clobber (match_operand:DI 2 "memory_operand" "=m"))]
   "!TARGET_64BIT
    && TARGET_80387 && X87_ENABLE_FLOAT (<X87MODEF:MODE>mode, DImode)
-   && TARGET_SSE"
+   && TARGET_SSE2 && TARGET_INTER_UNIT_MOVES_TO_VEC"
   "#"
+  "&& reload_completed"
+  [(set (match_dup 3) (zero_extend:DI (match_dup 1)))
+   (set (match_dup 2) (match_dup 3))
+   (set (match_dup 0)
+       (float:X87MODEF (match_dup 2)))]
+  ""
   [(set_attr "type" "multi")
    (set_attr "mode" "<MODE>")])
 
-(define_split
-  [(set (match_operand:X87MODEF 0 "register_operand")
-       (unsigned_float:X87MODEF
-         (match_operand:SI 1 "register_operand")))
-   (clobber (match_operand:DI 2 "memory_operand"))
-   (clobber (match_scratch:SI 3))]
-  "!TARGET_64BIT
-   && TARGET_80387 && X87_ENABLE_FLOAT (<X87MODEF:MODE>mode, DImode)
-   && TARGET_SSE
-   && reload_completed"
-  [(set (match_dup 2) (match_dup 1))
-   (set (match_dup 0)
-       (float:X87MODEF (match_dup 2)))]
-  "operands[1] = simplify_gen_subreg (DImode, operands[1], SImode, 0);")
-
-(define_split
-  [(set (match_operand:X87MODEF 0 "register_operand")
-       (unsigned_float:X87MODEF
-         (match_operand:SI 1 "memory_operand")))
-   (clobber (match_operand:DI 2 "memory_operand"))
-   (clobber (match_scratch:SI 3))]
-  "!TARGET_64BIT
-   && TARGET_80387 && X87_ENABLE_FLOAT (<X87MODEF:MODE>mode, DImode)
-   && TARGET_SSE
-   && reload_completed"
-  [(set (match_dup 2) (match_dup 3))
-   (set (match_dup 0)
-       (float:X87MODEF (match_dup 2)))]
-{
-  emit_move_insn (operands[3], operands[1]);
-  operands[3] = simplify_gen_subreg (DImode, operands[3], SImode, 0);
-})
-
 (define_expand "floatunssi<mode>2"
   [(parallel
      [(set (match_operand:X87MODEF 0 "register_operand")
           (unsigned_float:X87MODEF
             (match_operand:SI 1 "nonimmediate_operand")))
-      (clobber (match_dup 2))
-      (clobber (match_scratch:SI 3))])]
+      (clobber (match_scratch:DI 3))
+      (clobber (match_dup 2))])]
   "!TARGET_64BIT
    && ((TARGET_80387 && X87_ENABLE_FLOAT (<X87MODEF:MODE>mode, DImode)
-       && TARGET_SSE)
+       && TARGET_SSE2 && TARGET_INTER_UNIT_MOVES_TO_VEC)
        || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH))"
 {
   if (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)
Index: testsuite/gcc.target/i386/pr61423.c
===
--- testsuite/gcc.target/i386/pr61423.c	(revision 0)
+++ testsuite/gcc.target/i386/pr61423.c	(working copy)
@@ -0,0 +1,38 @@
+/* PR target/61423 */
+/* { dg-do run { target ia32 } } */
+/* { dg-options "-O1 -ftree-vectorize -msse2 -mfpmath=387 -mtune=core2" } */
+
+#define N 1024
+static unsigned int A[N];
+
+double
+__attribute__((noinline))
+func (void)
+{
+  unsigned int sum = 0;
+  unsigned i;
+  double t;
+
+  for (i = 0; i < N; i++)
+    sum += A[i];
+
+  t = sum;
+  return t;
+}
+
+int
+main ()
+{
+  unsigned i;
+  double d;
+
+  for(i = 0; i < N; i++)
+    A[i] = 1;
+
+  d
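The x87 fild instruction only loads signed integers, and only from memory. The pattern above therefore converts an unsigned SImode value by widening it to a 64-bit signed integer and converting that. This is correct only if the upper 32 bits are zero.

A minimal C sketch of the arithmetic follows. It models the values only, not the generated code, and both function names are hypothetical.

```c
#include <stdint.h>

/* Correct: zero-extend to 64 bits, then convert as a signed 64-bit
   integer (the fixed pattern does this through an XMM temporary).  */
static double
u32_to_double_zext (uint32_t x)
{
  int64_t wide = (int64_t) (uint64_t) x;   /* high 32 bits are zero */
  return (double) wide;
}

/* Wrong: the high half of the 64-bit temporary holds leftover bits
   (HIGH), as happened when the splitters skipped the zero extension.  */
static double
u32_to_double_garbage_high (uint32_t x, uint32_t high)
{
  int64_t wide = (int64_t) (((uint64_t) high << 32) | x);
  return (double) wide;
}
```

With a zero high half both functions agree. Any other high half changes the result, or even the sign.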
Re: [patch] fix tests for AVX512
On Tue, May 27, 2014 at 12:28 PM, Petr Murzin petrmurz...@gmail.com wrote:
Hi, I've fixed tests for AVX512, so they could be compiled with -Werror -Wall. Please have a look.

2014-05-19  Petr Murzin  <petr.mur...@intel.com>

	* gcc.target/i386/avx512f-vaddpd-2.c: Add static void for CALC, void for TEST instead of static void.
	* gcc.target/i386/avx512f-vaddps-2.c: Ditto.
	* gcc.target/i386/avx512f-vblendmpd-2.c: Ditto.
	* gcc.target/i386/avx512f-vblendmps-2.c: Ditto.
	* gcc.target/i386/avx512f-vbroadcastf32x4-2.c: Ditto.
	* gcc.target/i386/avx512f-vbroadcastf64x4-2.c: Ditto.
	* gcc.target/i386/avx512f-vbroadcasti32x4-2.c: Ditto.
	* gcc.target/i386/avx512f-vbroadcasti64x4-2.c: Ditto.
	* gcc.target/i386/avx512f-vbroadcastsd-2.c: Ditto.
	* gcc.target/i386/avx512f-vbroadcastss-2.c: Ditto.
	* gcc.target/i386/avx512f-vcvtps2dq-2.c: Ditto.
	* gcc.target/i386/avx512f-vcvttps2dq-2.c: Ditto.
	* gcc.target/i386/avx512f-vdivpd-2.c: Ditto.
	* gcc.target/i386/avx512f-vdivps-2.c: Ditto.
	* gcc.target/i386/avx512f-vextractf32x4-2.c: Ditto.
	* gcc.target/i386/avx512f-vextracti32x4-2.c: Ditto.
	* gcc.target/i386/avx512f-vmaxpd-2.c: Ditto.
	* gcc.target/i386/avx512f-vmaxps-2.c: Ditto.
	* gcc.target/i386/avx512f-vminpd-2.c: Ditto.
	* gcc.target/i386/avx512f-vminps-2.c: Ditto.
	* gcc.target/i386/avx512f-vmulpd-2.c: Ditto.
	* gcc.target/i386/avx512f-vmulps-2.c: Ditto.
	* gcc.target/i386/avx512f-vpaddd-2.c: Ditto.
	* gcc.target/i386/avx512f-vpaddq-2.c: Ditto.
	* gcc.target/i386/avx512f-vpblendmd-2.c: Ditto.
	* gcc.target/i386/avx512f-vpblendmq-2.c: Ditto.
	* gcc.target/i386/avx512f-vpbroadcastd-2.c: Ditto.
	* gcc.target/i386/avx512f-vpbroadcastq-2.c: Ditto.
	* gcc.target/i386/avx512f-vpcmpeqd-2.c: Ditto.
	* gcc.target/i386/avx512f-vpcmpeqq-2.c: Ditto.
	* gcc.target/i386/avx512f-vpcmpgtd-2.c: Ditto.
	* gcc.target/i386/avx512f-vpcmpgtq-2.c: Ditto.
	* gcc.target/i386/avx512f-vpmovdb-2.c: Ditto.
	* gcc.target/i386/avx512f-vpmovdw-2.c: Ditto.
	* gcc.target/i386/avx512f-vpmovqb-2.c: Ditto.
	* gcc.target/i386/avx512f-vpmovqw-2.c: Ditto.
	* gcc.target/i386/avx512f-vpmovsdb-2.c: Ditto.
	* gcc.target/i386/avx512f-vpmovsdw-2.c: Ditto.
	* gcc.target/i386/avx512f-vpmovsqb-2.c: Ditto.
	* gcc.target/i386/avx512f-vpmovsqd-2.c: Ditto.
	* gcc.target/i386/avx512f-vpmovsqw-2.c: Ditto.
	* gcc.target/i386/avx512f-vpslld-2.c: Ditto.
	* gcc.target/i386/avx512f-vpslldi-2.c: Ditto.
	* gcc.target/i386/avx512f-vpsllq-2.c: Ditto.
	* gcc.target/i386/avx512f-vpsllqi-2.c: Ditto.
	* gcc.target/i386/avx512f-vpsrad-2.c: Ditto.
	* gcc.target/i386/avx512f-vpsradi-2.c: Ditto.
	* gcc.target/i386/avx512f-vpsraq-2.c: Ditto.
	* gcc.target/i386/avx512f-vpsraqi-2.c: Ditto.
	* gcc.target/i386/avx512f-vpsravd-2.c: Ditto.
	* gcc.target/i386/avx512f-vpsravq-2.c: Ditto.
	* gcc.target/i386/avx512f-vpsubd-2.c: Ditto.
	* gcc.target/i386/avx512f-vpsubq-2.c: Ditto.
	* gcc.target/i386/avx512f-vptestmd-2.c: Ditto.
	* gcc.target/i386/avx512f-vptestmq-2.c: Ditto.
	* gcc.target/i386/avx512f-vptestnmd-2.c: Ditto.
	* gcc.target/i386/avx512f-vptestnmq-2.c: Ditto.
	* gcc.target/i386/avx512f-vpunpckhdq-2.c: Ditto.
	* gcc.target/i386/avx512f-vpunpckhqdq-2.c: Ditto.
	* gcc.target/i386/avx512f-vpunpckldq-2.c: Ditto.
	* gcc.target/i386/avx512f-vpunpcklqdq-2.c: Ditto.
	* gcc.target/i386/avx512f-vscalefpd-2.c: Ditto.
	* gcc.target/i386/avx512f-vscalefps-2.c: Ditto.
	* gcc.target/i386/avx512f-vshuff32x4-2.c: Ditto.
	* gcc.target/i386/avx512f-vshuff64x2-2.c: Ditto.
	* gcc.target/i386/avx512f-vshufi32x4-2.c: Ditto.
	* gcc.target/i386/avx512f-vshufi64x2-2.c: Ditto.
	* gcc.target/i386/avx512f-vsubpd-2.c: Ditto.
	* gcc.target/i386/avx512f-vsubps-2.c: Ditto.
	* gcc.target/i386/avx512f-vpmovdb-2.c: Ditto.
	* gcc.target/i386/avx512f-vpmovdw-2.c: Ditto.
	* gcc.target/i386/avx512f-vpmovqb-2.c: Ditto.
	* gcc.target/i386/avx512f-vpmovqw-2.c: Ditto.
	* gcc.target/i386/avx512f-vpmovsdb-2.c: Ditto.
	* gcc.target/i386/avx512f-vpmovsdw-2.c: Ditto.
	* gcc.target/i386/avx512f-vpmovsqb-2.c: Ditto.
	* gcc.target/i386/avx512f-vpmovsqd-2.c: Ditto.
	* gcc.target/i386/avx512f-vpmovsqw-2.c: Ditto.
	* gcc.target/i386/avx512f-vpsllvd-2.c: Ditto.
	* gcc.target/i386/avx512f-vpsllvq-2.c: Ditto.
	* gcc.target/i386/avx512f-vpsrld-2.c: Ditto.
	* gcc.target/i386/avx512f-vpsrldi-2.c: Ditto.
	* gcc.target/i386/avx512f-vpsrlq-2.c: Ditto.
	* gcc.target/i386/avx512f-vpsrlqi-2.c: Ditto.
	* gcc.target/i386/avx512f-vpsrlvd-2.c: Ditto.
	* gcc.target/i386/avx512f-vpsrlvq-2.c: Ditto.
	* gcc.target/i386/avx512f-vpshufd-2.c: Delete variables, void for TEST instead of static void.
	* gcc.target/i386/avx512f-vpcmpged-2.c: Add static void for CALC, delete unused variables.
	* gcc.target/i386/avx512f-vpcmpgeq-2.c: Ditto.
	* gcc.target/i386/avx512f-vpcmpgeud-2.c: Ditto.
	* gcc.target/i386/avx512f-vpcmpgeuq-2.c: Ditto.
	* gcc.target/i386/avx512f-vpcmpled-2.c: Add static void for CALC, delete unused variables, void for TEST instead of static void.
	* gcc.target/i386/avx512f-vpcmpleq-2.c: Ditto.
	* gcc.target/i386/avx512f-vpcmpleud-2.c: Ditto.
Re: [patch] fix tests for AVX512
On Mon, Jun 9, 2014 at 1:34 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
Hello Uroš,
On 08 Jun 11:26, Uros Bizjak wrote:
On Tue, May 27, 2014 at 12:28 PM, Petr Murzin petrmurz...@gmail.com wrote:
Hi, I've fixed tests for AVX512, so they could be compiled with -Werror -Wall. Please have a look.

From a quick look, this looks OK.

Thanks, checked into trunk. Could we apply that to the 4.9 branch?

OK, but please wait a couple of days to check that everything is OK in mainline, and also to give the Release Managers a chance to reject the patch.

Thanks,
Uros.
Re: [PATCH, i386]: Correctly handle maximum size of stringop algorithm in decide_alg
Ping.

On Mon, Jun 2, 2014 at 11:12 PM, Uros Bizjak ubiz...@gmail.com wrote:
Hello!

A problem was uncovered by -march=corei7 -mtune=intel -m32 with the i386/memcpy-[23] testcases in the decide_alg subroutine [1]. Although the maximum size of the transfer was known, the memcpy was not inlined, contrary to what the testcases expect.

The core of the problem can be seen in the definition of the 32-bit intel_memcpy stringop alg:

  {libcall, {{11, loop, false}, {-1, rep_prefix_4_byte, false}}},

Please note that the last algorithm sets its maximum size to -1, i.e. unlimited. However, in decide_alg, the same number also signals that no algorithm set its size, so expected_size is never calculated.

The loop that sets the maximal size for a user-defined algorithm assumes that -1 belongs exclusively to libcall. This is not the case in the above intel_memcpy definition:

  if (candidate != libcall && candidate && usable)
    max = algs->size[i].max;

When the last non-libcall algorithm sets its maximum to -1 (aka unlimited), this value fails the following test:

  if (max > 1 && (unsigned HOST_WIDE_INT) max >= max_size

and expected_size is never calculated.

The attached patch fixes this oversight, so -1 means unlimited size and 0 means that the size was never set. The patch also considers these two special values when choosing a maximum size for the dynamic check.

2014-06-02  Uros Bizjak  <ubiz...@gmail.com>

	* config/i386/i386.c (decide_alg): Correctly handle maximum size of
	stringop algorithm.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}, also with RUNTESTFLAGS=--target_board=unix/-march=corei7/-mtune=intel\{,-m32\}, where it fixes both memcpy failures from [1].

[1] https://gcc.gnu.org/ml/gcc-testresults/2014-06/msg00127.html

Jan, can you please review the patch to check that the logic is OK?

Uros.
Index: ChangeLog
===
--- ChangeLog	(revision 211140)
+++ ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2014-06-02  Uros Bizjak  <ubiz...@gmail.com>
+
+	* config/i386/i386.c (decide_alg): Correctly handle maximum size of
+	stringop algorithm.
+
 2014-06-02  Marcus Shawcroft  <marcus.shawcr...@arm.com>
 
 	* config/aarch64/aarch64.md (set_fpcr): Drop ISB after FPCR write.
Index: config/i386/i386.c
===
--- config/i386/i386.c	(revision 211140)
+++ config/i386/i386.c	(working copy)
@@ -23828,7 +23828,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT exp
 {
   const struct stringop_algs * algs;
   bool optimize_for_speed;
-  int max = -1;
+  int max = 0;
   const struct processor_costs *cost;
   int i;
   bool any_alg_usable_p = false;
@@ -23866,7 +23866,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT exp
   /* If expected size is not known but max size is small enough
      so inline version is a win, set expected size into
      the range.  */
-  if (max > 1 && (unsigned HOST_WIDE_INT) max >= max_size
+  if (((max > 1 && (unsigned HOST_WIDE_INT) max >= max_size) || max == -1)
       && expected_size == -1)
     expected_size = min_size / 2 + max_size / 2;
 
@@ -23955,7 +23955,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT exp
           *dynamic_check = 128;
           return loop_1_byte;
         }
-      if (max == -1)
+      if (max <= 0)
         max = 4096;
       alg = decide_alg (count, max / 2, min_size, max_size, memset,
                         zero_memset, dynamic_check, noalign);
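The sentinel convention the patch introduces can be modelled in a few lines of C: 0 means "no inline algorithm set a size" and -1 means "unlimited". The sketch below is a hypothetical, simplified model, not the real decide_alg. The enum, struct entry and all function names are invented, and the bool field models the usability check rather than the third field of the real table.

```c
#include <stdbool.h>

/* Hypothetical model of one stringop table: entries sorted by size,
   max == -1 means "no upper limit".  */
enum alg { NO_STRINGOP, LIBCALL, LOOP, REP_PREFIX_4_BYTE };

struct entry { long max; enum alg alg; bool usable; };

/* Size covered by the last usable inline algorithm: 0 if none set a
   size ("never set"), -1 if that algorithm is unlimited.  */
static long
max_inline_size (const struct entry *e, int n)
{
  long max = 0;
  for (int i = 0; i < n; i++)
    if (e[i].alg != LIBCALL && e[i].alg != NO_STRINGOP && e[i].usable)
      max = e[i].max;
  return max;
}

/* Patched test: guess an expected size when the known size range fits
   the inline algorithms, now including the unlimited (-1) case.  */
static long
guess_expected_size (long max, unsigned long min_size,
                     unsigned long max_size, long expected_size)
{
  if (((max > 1 && (unsigned long) max >= max_size) || max == -1)
      && expected_size == -1)
    expected_size = min_size / 2 + max_size / 2;
  return expected_size;
}

/* Limit for the dynamic check: both sentinels fall back to 4096.  */
static long
dynamic_check_limit (long max)
{
  return max <= 0 ? 4096 : max;
}
```

For a table shaped like the 32-bit intel_memcpy entry quoted above, the model yields max == -1, so an expected size is now guessed. Before the patch, the same -1 blocked that path.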
Re: [PATCH, i386] Remove use of vpmacsdql instruction from multiplication.
On Tue, Jun 10, 2014 at 12:30 PM, Gopalasubramanian, Ganesh ganesh.gopalasubraman...@amd.com wrote:
Hi,

The patch below fixes the issue with 64-bit multiplication. The instruction vpmacsdql does signed 32-bit multiplication, but for V2DImode we require a widening unsigned multiplication. This patch therefore replaces the vpmacsdql instruction with vpmuludq and vpaddq.

This patch has already been discussed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52908

With the required change in the test xop-imul64-vector.c, make check passes. Is it OK for upstream?

Regards
Ganesh

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index d0a1253..c158612 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2014-06-10  Ganesh Gopalasubramanian  <ganesh.gopalasubraman...@amd.com>
+
+	* config/i386/i386.c (ix86_expand_sse2_mulvxdi3): Issue instructions
+	vpmuludq and vpaddq instead of vpmacsdql for handling 32-bit
+	multiplication.

OK for mainline and release branches.

Thanks,
Uros.
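Per lane, a 64x64->64 multiply can be assembled from 32-bit halves. The product of the two low halves needs a widening unsigned multiply, which is what vpmuludq provides. The cross terms contribute only their low 32 bits, so their signedness does not matter.

The scalar C sketch below shows why a signed widening product of the low halves goes wrong once a low half has its top bit set. This is the signed behaviour the message attributes to vpmacsdql. The function names are hypothetical, and the sketch models the arithmetic only, not the exact instruction sequence GCC emits.

```c
#include <stdint.h>

/* Scalar model of one V2DImode lane: a * b mod 2^64 from 32-bit halves. */
static uint64_t
mul64_via_unsigned_lo (uint64_t a, uint64_t b)
{
  uint32_t al = (uint32_t) a, ah = (uint32_t) (a >> 32);
  uint32_t bl = (uint32_t) b, bh = (uint32_t) (b >> 32);

  uint64_t lo = (uint64_t) al * bl;      /* unsigned 32x32->64, pmuludq-like */
  uint32_t cross = ah * bl + al * bh;    /* only the low 32 bits are needed */
  return lo + ((uint64_t) cross << 32);  /* 64-bit add, paddq-like */
}

/* Same decomposition, but the low product is a signed widening multiply. */
static uint64_t
mul64_via_signed_lo (uint64_t a, uint64_t b)
{
  uint32_t al = (uint32_t) a, ah = (uint32_t) (a >> 32);
  uint32_t bl = (uint32_t) b, bh = (uint32_t) (b >> 32);

  uint64_t lo = (uint64_t) ((int64_t) (int32_t) al * (int32_t) bl);
  uint32_t cross = ah * bl + al * bh;
  return lo + ((uint64_t) cross << 32);
}
```

For example, with a = 0xFFFFFFFF and b = 2 the unsigned version gives 0x1FFFFFFFE. The signed version treats the low half of a as -1 and gives 0xFFFFFFFFFFFFFFFE.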
Re: [PATCH, libbid]: Fix variable ‘Ql’ set but not used warnings
On Mon, May 26, 2014 at 6:52 PM, Uros Bizjak ubiz...@gmail.com wrote:
Attached patch fixes several "variable ‘Ql’ set but not used" warnings in the bid128_div.c and bid64_div.c libbid sources. We can simply use the __mul_128x128_high functions when the low part is not needed.

2014-05-26  Uros Bizjak  <ubiz...@gmail.com>

	* bid128_div.c (BID128_FUNCTION_ARG2): Remove unused variable 'Ql'.
	Call __mul_128x128_high instead of __mul_128x128_full.
	(TYPE0_FUNCTION_ARGTYPE1_ARGTYPE2): Ditto.
	(BID128_FUNCTION_ARGTYPE1_ARG128): Ditto.
	(BID128_FUNCTION_ARG128_ARGTYPE2): Ditto.
	* bid64_div.c (TYPE0_FUNCTION_ARGTYPE1_ARG128): Ditto.
	(TYPE0_FUNCTION_ARG128_ARGTYPE2): Ditto.
	(TYPE0_FUNCTION_ARG128_ARG128): Ditto.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.

The patch was OK'd offline by H.J. Committed to mainline SVN.

Uros.
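A "high part only" widening multiply can skip storing the low half of the product, but it must still propagate the carry out of that low half. The sketch below is a scaled-down 64x64 analogue in C. The function mul_64x64_high is hypothetical: it is not the libbid macro, which works on 128-bit operands.

```c
#include <stdint.h>

/* Upper 64 bits of the 128-bit product a * b, computed from 32-bit
   halves without ever materializing the low 64 bits.  */
static uint64_t
mul_64x64_high (uint64_t a, uint64_t b)
{
  uint64_t al = a & 0xffffffffu, ah = a >> 32;
  uint64_t bl = b & 0xffffffffu, bh = b >> 32;

  uint64_t ll = al * bl;                 /* partial products, each < 2^64 */
  uint64_t lh = al * bh;
  uint64_t hl = ah * bl;
  uint64_t hh = ah * bh;

  /* Bits 32..63 of the full product plus their carry into bit 64.  */
  uint64_t mid = (ll >> 32) + (lh & 0xffffffffu) + (hl & 0xffffffffu);
  return hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}
```

The low 64 bits are never formed, so there is nothing left over to trigger a "set but not used" warning.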
Re: [PATCH, PR61446] Fix mode for register copy in REE pass
On Tue, Jun 10, 2014 at 3:45 PM, Dominique Dhumieres domi...@lps.ens.fr wrote:
This patch fixes PR61446. ...
Confirmed, it also allows Core* targets to bootstrap. Could it be reviewed and committed ASAP?

2014-06-09  Ilya Enkovich  <ilya.enkov...@intel.com>

	PR 61446
	* ree.c (find_and_remove_re): Narrow mode for register copy if
	required.

Please also add the testcase from the PR.

(I am not an RTL reviewer, so I can't approve the patch.)

Uros.
Re: [PATCH, PR61446] Fix mode for register copy in REE pass
On Wed, Jun 11, 2014 at 3:19 PM, Dominique Dhumieres domi...@lps.ens.fr wrote:
(I am not an RTL reviewer, so I can't approve the patch.)
Is https://gcc.gnu.org/ml/gcc-regression/2014-06/ acceptable?

Yes, these are bootstraps with non-default configurations.

Uros.
Re: [PATCH, PR61446] Fix mode for register copy in REE pass
On Wed, Jun 11, 2014 at 6:11 PM, Ilya Enkovich enkovich@gmail.com wrote:
On 11 Jun 14:59, Uros Bizjak wrote:
On Tue, Jun 10, 2014 at 3:45 PM, Dominique Dhumieres domi...@lps.ens.fr wrote:
This patch fixes PR61446. ...
Confirmed, it also allows Core* targets to bootstrap. Could it be reviewed and committed ASAP?

2014-06-09  Ilya Enkovich  <ilya.enkov...@intel.com>

	PR 61446
	* ree.c (find_and_remove_re): Narrow mode for register copy if
	required.

Please also add the testcase from the PR.

(I am not an RTL reviewer, so I can't approve the patch.)

Uros.

Hi,

This one is the same but with the testcase added. Bootstrapped and tested on linux-x86_64.

Thanks,
Ilya
--
gcc/

2014-06-11  Ilya Enkovich  <ilya.enkov...@intel.com>

	PR 61446
	* ree.c (find_and_remove_re): Narrow mode for register copy
	if required.

gcc/testsuite/

2014-06-11  Ilya Enkovich  <ilya.enkov...@intel.com>

	* gcc.target/i386/pr61446.c: New.

diff --git a/gcc/ree.c b/gcc/ree.c
index ade413e..6d34764 100644
--- a/gcc/ree.c
+++ b/gcc/ree.c
@@ -1088,14 +1088,24 @@ find_and_remove_re (void)
          /* Use the mode of the destination of the defining insn
             for the mode of the copy.  This is necessary if the
             defining insn was used to eliminate a second extension
-            that was wider than the first.  */
+            that was wider than the first.  Truncate mode if it is
+            too wide for destination reg.  */
          rtx sub_rtx = *get_sub_rtx (def_insn);
          rtx pat = PATTERN (curr_insn);
-         rtx new_dst = gen_rtx_REG (GET_MODE (SET_DEST (sub_rtx)),
-                                    REGNO (XEXP (SET_SRC (pat), 0)));
-         rtx new_src = gen_rtx_REG (GET_MODE (SET_DEST (sub_rtx)),
-                                    REGNO (SET_DEST (pat)));
-         rtx set = gen_rtx_SET (VOIDmode, new_dst, new_src);
+         unsigned int regno = REGNO (XEXP (SET_SRC (pat), 0));
+         enum machine_mode mode = GET_MODE (SET_DEST (sub_rtx));
+         rtx new_dst, new_src, set;
+
+         if (HARD_REGNO_NREGS (regno, mode) != 1)
+           {
+             mode = GET_CLASS_NARROWEST_MODE (GET_MODE_CLASS (mode));
+             while (HARD_REGNO_NREGS (regno, GET_MODE_WIDER_MODE (mode)) == 1)
+               mode = GET_MODE_WIDER_MODE (mode);
+           }
+
+         new_dst = gen_rtx_REG (mode, REGNO (XEXP (SET_SRC (pat), 0)));
+         new_src = gen_rtx_REG (mode, REGNO (SET_DEST (pat)));
+         set = gen_rtx_SET (VOIDmode, new_dst, new_src);
          emit_insn_after (set, def_insn);
        }
diff --git a/gcc/testsuite/gcc.target/i386/pr61446.c b/gcc/testsuite/gcc.target/i386/pr61446.c
new file mode 100644
index 000..8537cdb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr61446.c
@@ -0,0 +1,14 @@
+/* PR rtl-optimization/61446 */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -m32 -march=corei7" } */

This should read:

/* { dg-do compile { target { ia32 } } } */
/* { dg-options "-O2 -march=corei7 -mfpmath=387" } */

The x86 part is OK with this change.

Uros.
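The new code in find_and_remove_re keeps the defining insn's mode if it fits in one hard register. Otherwise it walks the mode class from the narrowest mode upward and keeps the widest mode that still fits in a single register.

Below is a self-contained C model of that search. It assumes a hypothetical table of integer mode sizes and a simplified, HARD_REGNO_NREGS-like rounding rule; real targets define HARD_REGNO_NREGS per register. Unlike the patch, the model also stops explicitly at the widest mode in its table.

```c
#include <stddef.h>

/* Hypothetical integer mode class, narrowest first (sizes in bytes),
   standing in for QImode, HImode, SImode, DImode, TImode.  */
static const unsigned int mode_size[] = { 1, 2, 4, 8, 16 };
#define NUM_MODES (sizeof mode_size / sizeof mode_size[0])

/* Registers of REG_SIZE bytes needed for a SIZE-byte value.  */
static unsigned int
nregs (unsigned int size, unsigned int reg_size)
{
  return (size + reg_size - 1) / reg_size;
}

/* Index of the mode used for the register copy: MODE itself if it fits
   in one register, else the widest single-register mode of the class.  */
static size_t
copy_mode (size_t mode, unsigned int reg_size)
{
  if (nregs (mode_size[mode], reg_size) != 1)
    {
      mode = 0;                                   /* narrowest mode */
      while (mode + 1 < NUM_MODES
             && nregs (mode_size[mode + 1], reg_size) == 1)
        mode++;                                   /* next wider mode fits */
    }
  return mode;
}
```

For instance, an 8-byte mode copied through a 4-byte register is narrowed to the 4-byte mode, while an 8-byte register keeps it unchanged.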
Re: [PATCH] Fix PR61335
On Fri, Jun 6, 2014 at 10:07 AM, Uros Bizjak ubiz...@gmail.com wrote:
On Fri, Jun 6, 2014 at 9:47 AM, Uros Bizjak ubiz...@gmail.com wrote:

2014-05-28  Richard Biener  <rguent...@suse.de>

	PR tree-optimization/61335
	* tree-vrp.c (vrp_visit_phi_node): If the compare of old and new
	range fails, drop to varying.

	* gfortran.dg/pr61335.f90: New testcase.

This testcase triggers SIGFPE on alpha due to the use of a denormal operand. Maybe an uninitialized value is used in line 48?

SIGFPE also triggers at the same place on x86_64 with unmasked FPE exceptions (compile with -O0).

The attached patch initializes the problematic array to zero instead of an uninitialized value.

2014-06-17  Uros Bizjak  <ubiz...@gmail.com>

	* gfortran.dg/pr61335.f90 (cp_unit_create): Initialize unit_id
	and kind_id to zero.

Tested on alphaev68-linux-gnu and x86_64-linux-gnu.

OK for mainline?

Uros.

Index: gfortran.dg/pr61335.f90
===
--- gfortran.dg/pr61335.f90	(revision 211723)
+++ gfortran.dg/pr61335.f90	(working copy)
@@ -45,8 +45,8 @@
     LOGICAL :: failure
 
     failure=.FALSE.
-    unit_id=cp_units_none
-    kind_id=cp_ukind_none
+    unit_id=0
+    kind_id=0
     power=0
     i_low=1
     i_high=1
Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.
On Tue, Jun 17, 2014 at 2:33 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
Are the i386 changes OK? Patches with corresponding changes and new tests are attached.

Please remove all target selectors from dg-options and dg-final testcase directives; they are not needed inside the gcc.dg/i386 directory.

The patch is OK with this change.

Thanks,
Uros.
[PATCH, i386]: Fix recently added sibcall insns and peephole2 patterns
Hello!

Attached patch fixes recently added sibcall insns and their corresponding peephole2 patterns:

- There is no need for the new memory_nox32_operand. A generic memory_operand can be used, since the new insns and peephole2 patterns should be disabled for TARGET_X32 entirely.
- Adds the missing "m" constraint in insn patterns.
- Macroizes peephole2 patterns.
- Adds a check that the eliminated register is really dead after the call (maybe overkill, but some hard-to-debug problems surfaced due to missing liveness checks in the past).
- Fixes call RTXes in sibcall_pop related patterns (and fixes two newly introduced warnings in i386.md).

2014-06-18  Uros Bizjak  <ubiz...@gmail.com>

	* config/i386/i386.md (*sibcall_memory): Rename from *sibcall_intern.
	Do not use unspec as call operand.  Use memory_operand instead of
	memory_nox32_operand and add "m" operand constraint.  Disable
	pattern for TARGET_X32.
	(*sibcall_pop_memory): Ditto.
	(*sibcall_value_memory): Ditto.
	(*sibcall_value_pop_memory): Ditto.
	(sibcall peepholes): Merge SImode and DImode patterns using
	W mode iterator.  Use memory_operand instead of memory_nox32_operand.
	Disable pattern for TARGET_X32.  Check if eliminated register is
	really dead after call insn.  Generate call RTX without unspec operand.
	(sibcall_value peepholes): Ditto.
	(sibcall_pop peepholes): Fix call insn RTXes.  Use memory_operand
	instead of memory_nox32_operand.  Check if eliminated register is
	really dead after call insn.  Generate call RTX without unspec operand.
	(sibcall_value_pop peepholes): Ditto.
	* config/i386/predicates.md (memory_nox32_operand): Remove predicate.

The patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32} and was committed to mainline SVN.

Uros.
Index: i386.md
===
--- i386.md	(revision 211725)
+++ i386.md	(working copy)
@@ -11354,53 +11354,38 @@
   "* return ix86_output_call_insn (insn, operands[0]);"
   [(set_attr "type" "call")])
 
-(define_insn "*sibcall_intern"
-  [(call (unspec [(mem:QI (match_operand:W 0 "memory_nox32_operand"))]
-          UNSPEC_PEEPSIB)
-        (match_operand 1))]
-  ""
+(define_insn "*sibcall_memory"
+  [(call (mem:QI (match_operand:W 0 "memory_operand" "m"))
+        (match_operand 1))
+   (unspec [(const_int 0)] UNSPEC_PEEPSIB)]
+  "!TARGET_X32"
   "* return ix86_output_call_insn (insn, operands[0]);"
   [(set_attr "type" "call")])
 
 (define_peephole2
-  [(set (match_operand:DI 0 "register_operand")
-        (match_operand:DI 1 "memory_nox32_operand"))
+  [(set (match_operand:W 0 "register_operand")
+       (match_operand:W 1 "memory_operand"))
    (call (mem:QI (match_dup 0))
         (match_operand 3))]
-  "TARGET_64BIT && SIBLING_CALL_P (peep2_next_insn (1))"
-  [(call (unspec [(mem:QI (match_dup 1))] UNSPEC_PEEPSIB)
-        (match_dup 3))])
+  "!TARGET_X32 && SIBLING_CALL_P (peep2_next_insn (1))
+   && peep2_reg_dead_p (2, operands[0])"
+  [(parallel [(call (mem:QI (match_dup 1))
+                   (match_dup 3))
+             (unspec [(const_int 0)] UNSPEC_PEEPSIB)])])
 
 (define_peephole2
-  [(set (match_operand:DI 0 "register_operand")
-        (match_operand:DI 1 "memory_nox32_operand"))
+  [(set (match_operand:W 0 "register_operand")
+       (match_operand:W 1 "memory_operand"))
    (unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)
    (call (mem:QI (match_dup 0))
         (match_operand 3))]
-  "TARGET_64BIT && SIBLING_CALL_P (peep2_next_insn (2))"
+  "!TARGET_X32 && SIBLING_CALL_P (peep2_next_insn (2))
+   && peep2_reg_dead_p (3, operands[0])"
   [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)
-   (call (unspec [(mem:QI (match_dup 1))] UNSPEC_PEEPSIB)
-        (match_dup 3))])
+   (parallel [(call (mem:QI (match_dup 1))
+                   (match_dup 3))
+             (unspec [(const_int 0)] UNSPEC_PEEPSIB)])])
 
-(define_peephole2
-  [(set (match_operand:SI 0 "register_operand")
-        (match_operand:SI 1 "memory_nox32_operand"))
-   (call (mem:QI (match_dup 0))
-        (match_operand 3))]
-  "!TARGET_64BIT && SIBLING_CALL_P (peep2_next_insn (1))"
-  [(call (unspec [(mem:QI (match_dup 1))] UNSPEC_PEEPSIB)
-        (match_dup 3))])
-
-(define_peephole2
-  [(set (match_operand:SI 0 "register_operand")
-        (match_operand:SI 1 "memory_nox32_operand"))
-   (unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)
-   (call (mem:QI (match_dup 0))
-        (match_operand 3))]
-  "!TARGET_64BIT && SIBLING_CALL_P (peep2_next_insn (2))"
-  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)
-   (call (unspec [(mem:QI (match_dup 1))] UNSPEC_PEEPSIB) (match_dup 3))])
-
 (define_expand "call_pop"
   [(parallel [(call (match_operand:QI 0)
                    (match_operand:SI 1))
@@ -11434,42 +11419,52 @@
   "* return ix86_output_call_insn (insn, operands[0]);"
   [(set_attr "type" "call")])
 
-(define_insn "*sibcall_pop_intern"
-  [(call (unspec [(mem:QI (match_operand:SI 0 "memory_nox32_operand"))]
-          UNSPEC_PEEPSIB)
+(define_insn "*sibcall_pop_memory"
Re: [PATCH, i386]: Fix recently added sibcall insns and peephole2 patterns
On Wed, Jun 18, 2014 at 2:24 PM, Kai Tietz ktiet...@googlemail.com wrote:
The following change in predicates.md seems to be a bit premature. The Darwin PIC issue with unspec-gotpcrel is still open:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61387

   return ANY_QI_REG_P (op);
 })
 
+;; Return true if OP is a memory operands that can be used in sibcalls.
 (define_predicate "sibcall_memory_operand"
-  (match_operand 0 "memory_operand")
-{
-  return CONSTANT_P (XEXP (op, 0));
-})
+  (and (match_operand 0 "memory_operand")
+       (match_test "CONSTANT_P (XEXP (op, 0))")))

as we might pessimize for Darwin UNSPEC_GOTPCREL at that point. In general, there is still the question of why this issue happens only for Darwin and not for Linux. For Linux, that gotpcrel code path seems not to be hit at all (at least, that is what Ians told me).

Oh, this part doesn't change any functionality at all. The predicate is just written in a different way.

Uros.
Re: [Patch, i386] Separate Intel processor with expanded ISA
On Mon, Jan 27, 2014 at 10:15 AM, Uros Bizjak ubiz...@gmail.com wrote:

+2013-12-29 Allan Sandfeld Jensen sandf...@kde.org

Missing space in ChangeLog entry.

+	* config/i386/i386.c (get_builtin_code_for_version): Separate
+	Westmere from Nehalem, Ivy Bridge from Sandy Bridge and
+	Broadwell from Haswell.

--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -31298,18 +31298,27 @@ get_builtin_code_for_version (tree decl, tree *predicate_list)
          priority = P_PROC_SSSE3;
          break;
        case PROCESSOR_NEHALEM:
-         /* We translate "arch=corei7" and "arch=nehelam" to
-            "corei7" so that it will be mapped to M_INTEL_COREI7
-            as cpu type to cover all M_INTEL_COREI7_XXXs.  */
-         arg_str = "corei7";
+         if (new_target->x_ix86_isa_flags & OPTION_MASK_ISA_AES)
+           arg_str = "westmere";
+         else
+           /* We translate "arch=corei7" and "arch=nehelam" to

Trivial typo above: arch=nehalem.

OK for mainline with these changes.

I have committed slightly reformatted patches with the following ChangeLog to mainline SVN.

2014-01-27  Allan Sandfeld Jensen  <sandf...@kde.org>

	* config/i386/i386.c (get_builtin_code_for_version): Separate
	Westmere from Nehalem, Ivy Bridge from Sandy Bridge and
	Broadwell from Haswell.

testsuite/ChangeLog:

2014-01-27  Allan Sandfeld Jensen  <sandf...@kde.org>

	* g++.dg/ext/mv16.C: New tests.

Uros.
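The Nehalem hunk quoted above picks the dispatch CPU name from the AES ISA flag. Below is a hypothetical C model of just that decision; the other two splits in the ChangeLog (Ivy Bridge vs Sandy Bridge, Broadwell vs Haswell) are not modelled. ISA_AES and nehalem_dispatch_name are invented. The real code tests new_target->x_ix86_isa_flags against OPTION_MASK_ISA_AES and stores the result in arg_str.

```c
/* Hypothetical stand-in for OPTION_MASK_ISA_AES.  */
#define ISA_AES (1u << 0)

/* CPU name chosen for a Nehalem-class target in the model.  */
static const char *
nehalem_dispatch_name (unsigned int isa_flags)
{
  if (isa_flags & ISA_AES)
    return "westmere";          /* AES present: treat as Westmere */
  /* arch=corei7 and arch=nehalem both map to "corei7".  */
  return "corei7";
}
```

This presumes AES is what distinguishes Westmere here, as the quoted hunk suggests.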
Re: PATCH: PR target/59672: Add -m16 support for x86
On Mon, Jan 27, 2014 at 8:44 PM, H.J. Lu hongjiu...@intel.com wrote: The .code16gcc directive was added to binutils back in 1999: --- '.code16gcc' provides experimental support for generating 16-bit code from gcc, and differs from '.code16' in that 'call', 'ret', 'enter', 'leave', 'push', 'pop', 'pusha', 'popa', 'pushf', and 'popf' instructions default to 32-bit size. This is so that the stack pointer is manipulated in the same way over function calls, allowing access to function parameters at the same stack offsets as in 32-bit mode. '.code16gcc' also automatically adds address size prefixes where necessary to use the 32-bit addressing modes that gcc generates. --- It encodes 32-bit assembly instructions generated by GCC in 16-bit format so that GCC can be used to generate 16-bit instructions. To do that, the .code16gcc directive may be placed at the very beginning of the assembly code. This patch adds -m16 to the x86 backend by: 1. Add -m16 and make it mutually exclusive with -m32, -m64 and -mx32. 2. Treat -m16 like -m32 so that --32 is passed to the assembler. 3. Output .code16gcc at the very beginning of the assembly code. 4. Turn off 64-bit ISA when -m16 is used. Tested on Linux/x86 and Linux/x86-64. OK for trunk? Thanks. H.J. --- PR target/59672 * config/i386/gnu-user64.h (SPEC_32): Add m16| to m32. (SPEC_X32): Likewise. (SPEC_64): Likewise. * config/i386/i386.c (ix86_option_override_internal): Turn off OPTION_MASK_ISA_64BIT, OPTION_MASK_ABI_X32 and OPTION_MASK_ABI_64 for TARGET_16BIT. (x86_file_start): Output .code16gcc for TARGET_16BIT. * config/i386/i386.h (TARGET_16BIT): New macro. (TARGET_16BIT_P): Likewise. * config/i386/i386.opt: Add m16. * doc/invoke.texi: Document -m16. OK for mainline, needs OK from RMs for a backport. Please also add the entry to Changes.html; this is a user-visible change. Thanks, Uros.
Re: PATCH: PR target/59672: Add -m16 support for x86
On Tue, Jan 28, 2014 at 5:35 PM, H.J. Lu hjl.to...@gmail.com wrote: The .code16gcc directive was added to binutils back in 1999: [...] A scan-asm testcase doesn't do anything useful. The only difference in assembly code between -m16 and -m32 is the .code16gcc directive. All the magic is done in the assembler.

The test would just pass -m16 in dg-options and scan for the above directive. It is a simple test that -m16 works as expected. Uros.
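A minimal sketch of the testcase Uros describes, assuming the usual gcc.target/i386 DejaGnu conventions. The function body is arbitrary and only gives the compiler something to emit; the file name and whether a target selector is needed for some multilib configurations are assumptions not settled in this thread.

```c
/* Sketch of a -m16 scan-asm test; target selector may need adjusting.  */
/* { dg-do compile } */
/* { dg-options "-m16" } */

int
add_one (int x)
{
  return x + 1;
}

/* The only expected difference from -m32 output is this directive.  */
/* { dg-final { scan-assembler "\\.code16gcc" } } */
```

Since all the 16-bit encoding work happens in the assembler, scanning for the directive is the whole check; nothing in the generated instructions themselves needs to be matched.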
Re: PATCH: PR target/59672: Add -m16 support for x86
On Tue, Jan 28, 2014 at 5:01 PM, Uros Bizjak ubiz...@gmail.com wrote: On Mon, Jan 27, 2014 at 8:44 PM, H.J. Lu hongjiu...@intel.com wrote: The .code16gcc directive was added to binutils back in 1999: --- '.code16gcc' provides experimental support for generating 16-bit code from gcc, and differs from '.code16' in that 'call', 'ret', 'enter', 'leave', 'push', 'pop', 'pusha', 'popa', 'pushf', and 'popf' instructions default to 32-bit size. This is so that the stack pointer is manipulated in the same way over function calls, allowing access to function parameters at the same stack offsets as in 32-bit mode. '.code16gcc' also automatically adds address size prefixes where necessary to use the 32-bit addressing modes that gcc generates. --- It encodes 32-bit assembly instructions generated by GCC in 16-bit format so that GCC can be used to generate 16-bit instructions. To do that, the .code16gcc directive may be placed at the very beginning of the assembly code. This patch adds -m16 to the x86 backend by: 1. Add -m16 and make it mutually exclusive with -m32, -m64 and -mx32. 2. Treat -m16 like -m32 so that --32 is passed to the assembler. 3. Output .code16gcc at the very beginning of the assembly code. 4. Turn off 64-bit ISA when -m16 is used. Tested on Linux/x86 and Linux/x86-64. OK for trunk? Thanks. H.J. --- PR target/59672 * config/i386/gnu-user64.h (SPEC_32): Add m16| to m32. (SPEC_X32): Likewise. (SPEC_64): Likewise. * config/i386/i386.c (ix86_option_override_internal): Turn off OPTION_MASK_ISA_64BIT, OPTION_MASK_ABI_X32 and OPTION_MASK_ABI_64 for TARGET_16BIT. (x86_file_start): Output .code16gcc for TARGET_16BIT. * config/i386/i386.h (TARGET_16BIT): New macro. (TARGET_16BIT_P): Likewise. * config/i386/i386.opt: Add m16. * doc/invoke.texi: Document -m16. OK for mainline, needs OK from RMs for a backport. Please also add the entry to Changes.html; this is a user-visible change. Oh, a short scan-asm testcase would be nice, too. Thanks, Uros.
Re: [PATCH][AVX512] Swap Yk and k constraints.
On Thu, Jan 30, 2014 at 11:54 AM, Ilya Tocar tocarip.in...@gmail.com wrote: It turns out that for Icc, the meaning of the Yk and k constraints (exposed through inline asm) is the opposite of the current GCC implementation. Since an Icc with this behavior has already been released and GCC hasn't, I propose to swap the meaning of the Yk and k constraints. The changes are pretty mechanical. Bootstraps/passes make check/SPEC2006. Ok for trunk? Here is the ChangeLog: 2014-01-30 Ilya Tocar ilya.to...@intel.com * config/i386/constraints.md (Yk): Swap meaning with k. * config/i386/i386.md (movhi_internal): Change Yk to k. (movqi_internal): Ditto. (*klogicmode): Ditto. (*andhi_1): Ditto. (*andqi_1): Ditto. (kandnmode): Ditto. (*codehi_1): Ditto. (*codeqi_1): Ditto. (kxnormode): Ditto. (kortestzhi): Ditto. (kortestchi): Ditto. (kunpckhi): Ditto. (*one_cmplhi2_1): Ditto. (*one_cmplqi2_1): Ditto. * config/i386/sse.md (): Change k to Yk. (avx512f_loadmode_mask): Ditto. (avx512f_blendmmode): Ditto. (avx512f_storemode_mask): Ditto. (avx512f_storeussemodesuffix512_mask): Ditto. (avx512f_storedqumode_mask): Ditto. (avx512f_cmpmode3mask_scalar_merge_nameround_saeonly_name): Ditto. (avx512f_ucmpmode3mask_scalar_merge_name): Ditto. (avx512f_vmcmpmode3round_saeonly_name): Ditto. (avx512f_vmcmpmode3_maskround_saeonly_name): Ditto. (avx512f_maskcmpmode3): Ditto. (avx512f_fmadd_mode_maskround_name): Ditto. (avx512f_fmadd_mode_mask3round_name): Ditto. (avx512f_fmsub_mode_maskround_name): Ditto. (avx512f_fmsub_mode_mask3round_name): Ditto. (avx512f_fnmadd_mode_maskround_name): Ditto. (avx512f_fnmadd_mode_mask3round_name): Ditto. (avx512f_fnmsub_mode_maskround_name): Ditto. (avx512f_fnmsub_mode_mask3round_name): Ditto. (avx512f_fmaddsub_mode_maskround_name): Ditto. (avx512f_fmaddsub_mode_mask3round_name): Ditto. (avx512f_fmsubadd_mode_maskround_name): Ditto. (avx512f_fmsubadd_mode_mask3round_name): Ditto. (avx512f_vextractshuffletype32x4_1_maskm): Ditto. 
(vec_extract_lo_mode_maskm): Ditto. (vec_extract_hi_mode_maskm): Ditto. 
(avx512f_vternlogmode_mask): Ditto. (avx512f_fixupimmmode_maskround_saeonly_name): Ditto. (avx512f_sfixupimmmode_maskround_saeonly_name): Ditto. (avx512f_codepmov_src_lowermode2_mask): Ditto. (avx512f_codev8div16qi2_mask): Ditto. (avx512f_codev8div16qi2_mask_store): Ditto. (avx512f_eqmode3mask_scalar_merge_name_1): Ditto. (avx512f_gtmode3mask_scalar_merge_name): Ditto. (avx512f_testmmode3mask_scalar_merge_name): Ditto. (avx512f_testnmmode3mask_scalar_merge_name): Ditto. (*avx512pf_gatherpfmodesf_mask): Ditto. (*avx512pf_gatherpfmodedf_mask): Ditto. (*avx512pf_scatterpfmodesf_mask): Ditto. (*avx512pf_scatterpfmodedf_mask): Ditto. (avx512cd_maskb_vec_dupv8di): Ditto. (avx512cd_maskw_vec_dupv16si): Ditto. (avx512f_vpermi2varmode3_maskz): Ditto. (avx512f_vpermi2varmode3_mask): Ditto. (avx512f_vpermi2varmode3_mask): Ditto. (avx512f_vpermt2varmode3_maskz): Ditto. (*avx512f_gathersimode): Ditto. (*avx512f_gathersimode_2): Ditto. (*avx512f_gatherdimode): Ditto. (*avx512f_gatherdimode_2): Ditto. (*avx512f_scattersimode): Ditto. (*avx512f_scatterdimode): Ditto. (avx512f_compressmode_mask): Ditto. (avx512f_compressstoremode_mask): Ditto. (avx512f_expandmode_mask): Ditto. * config/i386/subst.md (mask): Change k to Yk. (mask_scalar_merge): Ditto. (sd): Ditto. And for tests: 2014-01-30 Ilya Tocar ilya.to...@intel.com * gcc.target/i386/avx512f-inline-asm.c: Swap Yk and k. * gcc.target/i386/avx512f-kmovw-1.c: Also allow k0. OK. Thanks, Uros.
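For inline-asm users, the practical effect of the swap is which mask registers each constraint may pick. As I understand the result (it matches the ChangeLog above: kmov and mask-logic patterns move to k, masked SSE patterns move to Yk, and the kmovw test now also accepts k0), k means any of k0-k7 while Yk means k1-k7, the registers usable as a write mask, because a zero mask-register field in the EVEX encoding means "no masking". The model below is purely illustrative and its function names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Illustrative model only: bit i set means mask register k<i> is allowed.  */
static unsigned
model_mask_regs_for_constraint (const char *constraint)
{
  if (strcmp (constraint, "k") == 0)
    return 0xFFu; /* any mask register, k0-k7 */
  if (strcmp (constraint, "Yk") == 0)
    return 0xFEu; /* k1-k7: usable as a write-mask predicate */
  return 0u;      /* not a mask-register constraint */
}

static bool
model_constraint_allows (const char *constraint, unsigned regno)
{
  return regno < 8
         && ((model_mask_regs_for_constraint (constraint) >> regno) & 1u);
}
```

So an asm operand that only feeds a kmovw can use "k", while an operand used as the {%kN} predicate of a masked instruction should use "Yk" so the allocator never hands it k0.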
Re: [PATCH][AVX512] Fix rounding operand.
On Thu, Jan 30, 2014 at 1:50 PM, Ilya Tocar tocarip.in...@gmail.com wrote: I've found some problems with the embedded rounding implementation. First, the constants are already defined in smmintrin.h, so we shouldn't redefine them. The second problem is bigger: currently the rounding argument to an intrinsic is one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_ZERO, _MM_FROUND_CUR_DIRECTION, _MM_FROUND_NO_EXC, but actually it should be _MM_FROUND_NO_EXC or _MM_FROUND_CUR_DIRECTION for SAE, and _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC (or another rounding mode OR'ed with _MM_FROUND_NO_EXC) for rounding. That's how Icc does it, because while rounding currently implies SAE, that may not be true in the future. I've split rounding and SAE in print_operand into 'R' and 'r', because we can't distinguish between 8 and 8 | 0. I've also run sed on the tests to correct the rounding arguments. While the patch is huge, most of it is the result of sed. It bootstraps and passes make check/SPEC2006. Ok for trunk? Here is the ChangeLog. 2014-01-30 Ilya Tocar ilya.to...@intel.com * config/i386/avx512fintrin.h (_MM_FROUND_TO_NEAREST_INT), (_MM_FROUND_TO_NEG_INF), (_MM_FROUND_TO_POS_INF), (_MM_FROUND_TO_ZERO), (_MM_FROUND_CUR_DIRECTION): Are already defined in smmintrin.h, remove them. (_MM_FROUND_NO_EXC): Same as above, but also wrong value. * config/i386/i386.c (ix86_print_operand): Split sae and rounding. * config/i386/i386.md (ROUND_SAE): Fix value. * config/i386/predicates.md (const_4_or_8_to_11_operand): New. (const48_operand): New. * config/i386/subst.md (round), (round_expand): Use const_4_or_8_to_11_operand. (round_saeonly), (round_saeonly_expand): Use const48_operand. 2014-01-30 Ilya Tocar ilya.to...@intel.com * gcc.target/i386/avx-1.c: Use correct rounding values. * gcc.target/i386/avx512f-vaddpd-1.c: Ditto. * gcc.target/i386/avx512f-vaddps-1.c: Ditto. * gcc.target/i386/avx512f-vaddsd-1.c: Ditto. * gcc.target/i386/avx512f-vaddss-1.c: Ditto. 
* gcc.target/i386/avx512f-vcvtdq2ps-1.c: Ditto. 
* gcc.target/i386/avx512f-vcvtpd2dq-1.c: Ditto. * gcc.target/i386/avx512f-vcvtpd2ps-1.c: Ditto. * gcc.target/i386/avx512f-vcvtpd2udq-1.c: Ditto. * gcc.target/i386/avx512f-vcvtps2dq-1.c: Ditto. * gcc.target/i386/avx512f-vcvtps2udq-1.c: Ditto. * gcc.target/i386/avx512f-vcvtsd2si-1.c: Ditto. * gcc.target/i386/avx512f-vcvtsd2si64-1.c: Ditto. * gcc.target/i386/avx512f-vcvtsd2ss-1.c: Ditto. * gcc.target/i386/avx512f-vcvtsd2usi-1.c: Ditto. * gcc.target/i386/avx512f-vcvtsd2usi64-1.c: Ditto. * gcc.target/i386/avx512f-vcvtsi2sd64-1.c: Ditto. * gcc.target/i386/avx512f-vcvtsi2ss-1.c: Ditto. * gcc.target/i386/avx512f-vcvtsi2ss64-1.c: Ditto. * gcc.target/i386/avx512f-vcvtss2si-1.c: Ditto. * gcc.target/i386/avx512f-vcvtss2si64-1.c: Ditto. * gcc.target/i386/avx512f-vcvtss2usi-1.c: Ditto. * gcc.target/i386/avx512f-vcvtss2usi64-1.c: Ditto. * gcc.target/i386/avx512f-vcvtudq2ps-1.c: Ditto. * gcc.target/i386/avx512f-vcvtusi2sd64-1.c: Ditto. * gcc.target/i386/avx512f-vcvtusi2ss-1.c: Ditto. * gcc.target/i386/avx512f-vcvtusi2ss64-1.c: Ditto. * gcc.target/i386/avx512f-vdivpd-1.c: Ditto. * gcc.target/i386/avx512f-vdivps-1.c: Ditto. * gcc.target/i386/avx512f-vdivsd-1.c: Ditto. * gcc.target/i386/avx512f-vdivss-1.c: Ditto. * gcc.target/i386/avx512f-vfmaddXXXpd-1.c: Ditto. * gcc.target/i386/avx512f-vfmaddXXXps-1.c: Ditto. * gcc.target/i386/avx512f-vfmaddXXXsd-1.c: Ditto. * gcc.target/i386/avx512f-vfmaddXXXss-1.c: Ditto. * gcc.target/i386/avx512f-vfmaddsubXXXpd-1.c: Ditto. * gcc.target/i386/avx512f-vfmaddsubXXXps-1.c: Ditto. * gcc.target/i386/avx512f-vfmsubXXXpd-1.c: Ditto. * gcc.target/i386/avx512f-vfmsubXXXps-1.c: Ditto. * gcc.target/i386/avx512f-vfmsubXXXsd-1.c: Ditto. * gcc.target/i386/avx512f-vfmsubXXXss-1.c: Ditto. * gcc.target/i386/avx512f-vfmsubaddXXXpd-1.c: Ditto. * gcc.target/i386/avx512f-vfmsubaddXXXps-1.c: Ditto. * gcc.target/i386/avx512f-vfnmaddXXXpd-1.c: Ditto. * gcc.target/i386/avx512f-vfnmaddXXXps-1.c: Ditto. * gcc.target/i386/avx512f-vfnmaddXXXsd-1.c: Ditto. 
* gcc.target/i386/avx512f-vfnmaddXXXss-1.c: Ditto. * gcc.target/i386/avx512f-vfnmsubXXXpd-1.c: Ditto. * gcc.target/i386/avx512f-vfnmsubXXXps-1.c: Ditto. * gcc.target/i386/avx512f-vfnmsubXXXsd-1.c: Ditto. * gcc.target/i386/avx512f-vfnmsubXXXss-1.c: Ditto. * gcc.target/i386/avx512f-vmulpd-1.c: Ditto. * gcc.target/i386/avx512f-vmulps-1.c: Ditto. * gcc.target/i386/avx512f-vmulsd-1.c: Ditto. * gcc.target/i386/avx512f-vmulss-1.c:
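The argument rules Ilya describes can be written down as two small checks. The constant values below are the ones <smmintrin.h> uses for the _MM_FROUND_* macros, repeated under local names so the sketch builds without x86 headers. The exact sets accepted by const_4_or_8_to_11_operand and const48_operand are inferred from their names in the ChangeLog, not copied from predicates.md.

```c
#include <stdbool.h>

/* Same values as the _MM_FROUND_* macros in <smmintrin.h>.  */
#define FROUND_TO_NEAREST_INT 0x00
#define FROUND_TO_NEG_INF     0x01
#define FROUND_TO_POS_INF     0x02
#define FROUND_TO_ZERO        0x03
#define FROUND_CUR_DIRECTION  0x04
#define FROUND_NO_EXC         0x08

/* Embedded rounding: "use MXCSR" (4) or an explicit mode OR'ed with
   NO_EXC (8..11); presumably what const_4_or_8_to_11_operand accepts.  */
static bool
valid_rounding_arg (int imm)
{
  return imm == FROUND_CUR_DIRECTION
         || (imm >= (FROUND_NO_EXC | FROUND_TO_NEAREST_INT)
             && imm <= (FROUND_NO_EXC | FROUND_TO_ZERO));
}

/* SAE-only instructions: "use MXCSR" (4) or suppress exceptions (8);
   presumably what const48_operand accepts.  */
static bool
valid_sae_arg (int imm)
{
  return imm == FROUND_CUR_DIRECTION || imm == FROUND_NO_EXC;
}

/* For an explicit rounding argument, the low two bits hold the mode.  */
static int
rounding_mode_bits (int imm)
{
  return imm & 0x3;
}
```

Note that FROUND_TO_NEAREST_INT | FROUND_NO_EXC equals FROUND_NO_EXC (both are 8). In a rounding context that value means "round to nearest, no exceptions", in an SAE-only context just "no exceptions", so the operand value alone cannot tell print_operand which syntax to emit; hence the separate 'R' and 'r' codes. With the real intrinsics, a call such as _mm512_add_round_pd (a, b, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) passes 11.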
Re: [PATCH] Two small i?86 *intrin* warning fixes
On Thu, Jan 30, 2014 at 6:52 PM, Jakub Jelinek ja...@redhat.com wrote: While looking at some other PR, I've stripped line notes and got

    pr59947.ii.bak:26330:74: error: ISO C++ forbids declaration of '_mm512_mask_cvtusepi64_storeu_epi32' with no type [-fpermissive]
     _mm512_mask_cvtusepi64_storeu_epi32 (void* __P, __mmask8 __M, __m512i __A)
                                                                              ^
    pr59947.ii.bak: In function 'float _cvtsh_ss(short unsigned int)':
    pr59947.ii.bak:30674:65: warning: narrowing conversion of '__S' from 'short unsigned int' to 'short int' inside { } [-Wnarrowing]
       __v8hi __H = __extension__ (__v8hi){ __S, 0, 0, 0, 0, 0, 0, 0 };
                                                                 ^

warnings that would normally only show up with -Wsystem-headers. Especially the second one looks like one worth fixing. Ok for trunk?

    2014-01-30 Jakub Jelinek ja...@redhat.com

            * config/i386/f16cintrin.h (_cvtsh_ss): Avoid -Wnarrowing warning.
            * config/i386/avx512fintrin.h (_mm512_mask_cvtusepi64_storeu_epi32): Add missing return type - void.

OK. Should the _cvtsh_ss fix be backported to other release branches? Thanks, Uros.
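The ChangeLog only says "Avoid -Wnarrowing warning". One straightforward way, assumed here rather than taken from the patch, is an explicit cast in the vector initializer, which states that the unsigned-to-signed conversion of the raw half-precision bits is intended. The sketch uses the GCC/Clang vector extension with a hypothetical local type standing in for __v8hi.

```c
/* GCC/Clang vector extension: eight signed 16-bit lanes, like __v8hi.  */
typedef short v8hi_model __attribute__ ((vector_size (16)));

/* Place a raw half-precision bit pattern in lane 0.  The explicit cast
   makes the unsigned short -> short conversion intentional; in C++ the
   implicit version inside braces is what -Wnarrowing reports.  */
static v8hi_model
low_lane_from_bits (unsigned short bits)
{
  return (v8hi_model){ (short) bits, 0, 0, 0, 0, 0, 0, 0 };
}
```

For bit patterns above 0x7FFF the conversion to short is implementation-defined in C; GCC defines it as modulo 2^16, so the 16-bit pattern reaches the conversion instruction unchanged.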
Re: [PATCH][testsuite] Avoid division by zero.
On Thu, Jan 30, 2014 at 5:41 PM, Ilya Tocar tocarip.in...@gmail.com wrote: This patch removes a possible division by zero. Make check passes. Ok for trunk?

    2014-01-30 Ilya Tocar ilya.to...@intel.com

            * gcc.target/i386/m512-check.h: Use correct rounding values.
    ---
     gcc/testsuite/gcc.target/i386/m512-check.h | 3 ++-
     1 file changed, 2 insertions(+), 1 deletion(-)

    diff --git a/gcc/testsuite/gcc.target/i386/m512-check.h b/gcc/testsuite/gcc.target/i386/m512-check.h
    index 3209039..8441784 100644
    --- a/gcc/testsuite/gcc.target/i386/m512-check.h
    +++ b/gcc/testsuite/gcc.target/i386/m512-check.h
    @@ -58,7 +58,8 @@ check_rough_##UINON_TYPE (UINON_TYPE u, const VALUE_TYPE *v, \
                                                                  \
       for (i = 0; i < ARRAY_SIZE (u.a); i++) \
         { \
    -      VALUE_TYPE rel_err = (u.a[i] - v[i]) / v[i]; \
    +      VALUE_TYPE rel_err; \
    +      rel_err = v[i] != 0 ? (u.a[i] - v[i]) / v[i] : u.a[i]; \
           if (((rel_err < 0) ? -rel_err : rel_err) > eps) \
             { \
               err++;\

We won't get zero from the exponential function, so expecting a zero result is flawed anyway. If we would like to introduce universal epsilon comparisons into the testsuite, then please read [1]. Being overly pedantic, the definition should be |(v[i] - u.a[i]) / v[i]|, as stated in [2].

[1] http://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
[2] http://en.wikipedia.org/wiki/Relative_error

Uros.
Re: [PATCH][testsuite] Avoid division by zero.
On Fri, Jan 31, 2014 at 11:00 AM, Ilya Tocar tocarip.in...@gmail.com wrote: This patch removes a possible division by zero. Make check passes. Ok for trunk?

    2014-01-30 Ilya Tocar ilya.to...@intel.com

            * gcc.target/i386/m512-check.h: Use correct rounding values.
    ---
     gcc/testsuite/gcc.target/i386/m512-check.h | 3 ++-
     1 file changed, 2 insertions(+), 1 deletion(-)

    diff --git a/gcc/testsuite/gcc.target/i386/m512-check.h b/gcc/testsuite/gcc.target/i386/m512-check.h
    index 3209039..8441784 100644
    --- a/gcc/testsuite/gcc.target/i386/m512-check.h
    +++ b/gcc/testsuite/gcc.target/i386/m512-check.h
    @@ -58,7 +58,8 @@ check_rough_##UINON_TYPE (UINON_TYPE u, const VALUE_TYPE *v, \
                                                                  \
       for (i = 0; i < ARRAY_SIZE (u.a); i++) \
         { \
    -      VALUE_TYPE rel_err = (u.a[i] - v[i]) / v[i]; \
    +      VALUE_TYPE rel_err; \
    +      rel_err = v[i] != 0 ? (u.a[i] - v[i]) / v[i] : u.a[i]; \
           if (((rel_err < 0) ? -rel_err : rel_err) > eps) \
             { \
               err++;\

We won't get zero from the exponential function, so expecting a zero result is flawed anyway. If we would like to introduce universal epsilon comparisons into the testsuite, then please read [1]. Being overly pedantic, the definition should be |(v[i] - u.a[i]) / v[i]|, as stated in [2]. [1] http://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/ [2] http://en.wikipedia.org/wiki/Relative_error

We get zero from testing zero-masking. Currently we produce 0/0 = NaN. Comparison with NaN is always false, so the tests pass. But I think that this should be fixed to avoid division by zero. As for being more pedantic about the comparison, I doubt that it's useful when we use 0.0001 as eps.

In this case, please add a simple check for zero, with the above comment. We don't test the exp function, but masking. Uros.
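Ilya's observation that the zero-masked lanes pass only by accident can be reproduced in isolation. The helper below is a hypothetical stand-alone version of the unguarded relative-error test from the original macro, on doubles instead of the macro-generated union, and assumes IEEE 754 arithmetic as on x86.

```c
#include <stdbool.h>

/* Unguarded relative-error test, written like the original
   m512-check.h macro: no special case for expected == 0.  */
static bool
rel_err_exceeds (double actual, double expected, double eps)
{
  double rel_err = (actual - expected) / expected;
  /* If rel_err is NaN, both comparisons below are false.  */
  return ((rel_err < 0) ? -rel_err : rel_err) > eps;
}
```

For a zero-masked lane both values are 0.0, so rel_err is 0/0 = NaN and the function reports "within eps" without ever comparing anything. A nonzero result against an expected zero still gives an infinity and is caught, which is why the bug went unnoticed.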
Re: [PATCH][testsuite] Avoid division by zero.
On Fri, Jan 31, 2014 at 1:32 PM, Ilya Tocar tocarip.in...@gmail.com wrote: We won't get zero from the exponential function, so expecting a zero result is flawed anyway. If we would like to introduce universal epsilon comparisons into the testsuite, then please read [1]. Being overly pedantic, the definition should be |(v[i] - u.a[i]) / v[i]|, as stated in [2]. [1] http://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/ [2] http://en.wikipedia.org/wiki/Relative_error

We get zero from testing zero-masking. Currently we produce 0/0 = NaN. Comparison with NaN is always false, so the tests pass. But I think that this should be fixed to avoid division by zero. As for being more pedantic about the comparison, I doubt that it's useful when we use 0.0001 as eps.

In this case, please add a simple check for zero, with the above comment. We don't test the exp function, but masking.

Something like this?

Yes, this is OK, with a small comment fix.

     gcc/testsuite/gcc.target/i386/m512-check.h | 10 ++
     1 file changed, 10 insertions(+)

    diff --git a/gcc/testsuite/gcc.target/i386/m512-check.h b/gcc/testsuite/gcc.target/i386/m512-check.h
    index 3209039..a96a103 100644
    --- a/gcc/testsuite/gcc.target/i386/m512-check.h
    +++ b/gcc/testsuite/gcc.target/i386/m512-check.h
    @@ -58,6 +58,16 @@ check_rough_##UINON_TYPE (UINON_TYPE u, const VALUE_TYPE *v, \
                                                                  \
       for (i = 0; i < ARRAY_SIZE (u.a); i++) \
         { \
    +      /* We will always have v[i] == 0 == u.a[i] for some i, \

We can have ...

Thanks, Uros.
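The final hunk is cut off after the new comment, so the exact check that was committed is not visible here. Assuming a guard along the lines of that comment (skip lanes where both values are zero), a stand-alone version of the loop could look like the following. It uses plain float arrays instead of the macro-generated UINON_TYPE union, the function name is hypothetical, and it computes |(v[i] - u.a[i]) / v[i]| as Uros suggests, which after taking the absolute value equals the original |(u.a[i] - v[i]) / v[i]|.

```c
#include <stddef.h>

/* Count lanes whose relative error |(expected - actual) / expected|
   exceeds eps.  Lanes where both values are exactly zero (as produced
   by zero-masking) are skipped instead of computing 0/0.  */
static int
count_rough_mismatches (const float *actual, const float *expected,
                        size_t n, float eps)
{
  int err = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (expected[i] == 0.0f && actual[i] == 0.0f)
        continue; /* zero-masked lane: nothing to compare */
      float rel_err = (expected[i] - actual[i]) / expected[i];
      if ((rel_err < 0 ? -rel_err : rel_err) > eps)
        err++;
    }
  return err;
}
```

A lane where only the expected value is zero is deliberately not skipped: the division then yields an infinity, so a wrongly unmasked lane is still counted as an error.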