Re: [PATCH, x86, testsuite, AVX-512] Fix initialization in 4 tests for shuffles.

2014-03-27 Thread Uros Bizjak
On Thu, Mar 27, 2014 at 10:18 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote:

 Straightforward patch in the bottom fixes
 copy-and-paste problem in initialization part
 of tests.

 Updated tests pass on simulator.

 Is it ok for trunk?

 gcc/testsuite:
 * gcc.target/i386/avx512f-vshuff32x4-2.c: Fix initialization
 of second source operand.
 * gcc.target/i386/avx512f-vshuff64x2-2.c: Ditto.
 * gcc.target/i386/avx512f-vshufi32x4-2.c: Ditto.
 * gcc.target/i386/avx512f-vshufi64x2-2.c: Ditto.

OK.

Thanks,
Uros.


Re: [PATCH] Allow VOIDmode argument to ix86_copy_addr_to_reg (PR target/60693)

2014-03-28 Thread Uros Bizjak
On Fri, Mar 28, 2014 at 4:19 PM, Jakub Jelinek ja...@redhat.com wrote:

 Before ix86_copy_addr_to_reg has been added, we've been using
 copy_addr_to_reg, which handles VOIDmode values just fine.
 But this new function just ICEs on those.  As the function
 has been added for adding SUBREGs to TLS addresses, those will
 never be CONST_INTs, so just using copy_addr_to_reg
 is IMHO the right thing and restores previous behavior.

 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

 2014-03-28  Jakub Jelinek  ja...@redhat.com

 PR target/60693
 * config/i386/i386.c (ix86_copy_addr_to_reg): Call copy_addr_to_reg
 also if addr has VOIDmode.

 * gcc.target/i386/pr60693.c: New test.
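
The dispatch described above can be pictured with a toy model.  This is
only a sketch and not the GCC source: the enums and the function
toy_copy_addr_to_reg are invented for illustration, and real RTL is
replaced by a plain mode tag.  It shows a mode-less constant address
(VOIDmode) taking the plain-copy path instead of reaching the assertion.

```c
#include <assert.h>

/* Hypothetical stand-ins for GCC's machine modes; not the real enum.  */
enum toy_mode { TOY_VOIDmode, TOY_SImode, TOY_DImode };

/* Which path the copy took, so that a caller can observe the decision.  */
enum toy_path { TOY_PLAIN_COPY, TOY_SUBREG_COPY };

/* Model of the fixed dispatch: addresses already in the pointer mode,
   and mode-less constants (VOIDmode), go through the plain copy.  Only
   a narrower SImode address with a DImode pointer mode takes the
   SUBREG path.  */
static enum toy_path
toy_copy_addr_to_reg (enum toy_mode addr_mode, enum toy_mode pmode)
{
  if (addr_mode == pmode || addr_mode == TOY_VOIDmode)
    return TOY_PLAIN_COPY;

  /* Without the VOIDmode check above, a constant address would have
     reached this assertion and aborted.  */
  assert (addr_mode == TOY_SImode && pmode == TOY_DImode);
  return TOY_SUBREG_COPY;
}
```

Calling toy_copy_addr_to_reg (TOY_VOIDmode, TOY_DImode) yields
TOY_PLAIN_COPY in this model.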

OK.

Thanks,
Uros.


Re: various _mm512_set* intrinsics

2014-03-28 Thread Uros Bizjak
Hello!

 Here are more intrinsics that are missing.  I know that gcc currently
 generates horrible code for most of them but I think it's more important
 to have the API in place, albeit non-optimal.  Maybe this entices
 someone to add the necessary optimizations.

I agree that having non-optimal implementation is better than nothing.

 The code is self-contained and shouldn't interfere with any correct
 code.  Should this also go into 4.9?

 2014-03-27  Ulrich Drepper  drep...@gmail.com

 * config/i386/avx512fintrin.h (__v32hi): Define type.
 (__v64qi): Likewise.
 (_mm512_set1_epi8): Define.
 (_mm512_set1_epi16): Define.
 (_mm512_set4_epi32): Define.
 (_mm512_set4_epi64): Define.
 (_mm512_set4_pd): Define.
 (_mm512_set4_ps): Define.
 (_mm512_setr4_epi64): Define.
 (_mm512_setr4_epi32): Define.
 (_mm512_setr4_pd): Define.
 (_mm512_setr4_ps): Define.
 (_mm512_setzero_epi32): Define.

This is OK for mainline, but please wait for Kirill's review of the intrinsics.

Thanks,
Uros.


Re: Fix various x86 tests for --with-arch=bdver3

2014-03-29 Thread Uros Bizjak
On Fri, Mar 28, 2014 at 10:46 PM, Joseph S. Myers
jos...@codesourcery.com wrote:
 If you build an x86_64 toolchain with --with-arch enabling various
 instruction set extensions by default, this causes some tests to fail
 that aren't expecting those extensions to be enabled.  This patch
 fixes various tests failing like that for an x86_64-linux-gnu
 toolchain configured --with-arch=bdver3, generally by using
 appropriate -mno-* options in the tests, or in the case of
 gcc.dg/pr45416.c by adjusting the scan-assembler to allow the
 alternative instruction that gets used in this case.  It's quite
 likely other such failures appear for other --with-arch choices.
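
 As an illustration of the -mno-* approach, a test header might look
 like the sketch below.  The file and the function muladd are
 hypothetical and not part of the patch; only the dg-do and dg-options
 directive forms are the usual testsuite ones.

```c
/* Hypothetical test file, not one of the files touched by the patch.  */
/* { dg-do compile } */
/* { dg-options "-O2 -mfma4 -mno-fma" } */

/* With FMA4 requested and FMA3 explicitly disabled, a contracted
   multiply-add here cannot be emitted as an FMA3 instruction, whatever
   extensions the compiler was configured to enable by default.  */
double
muladd (double a, double b, double c)
{
  return a * b + c;
}
```

 The explicit -mno-fma is what makes the test independent of a
 --with-arch default that turns FMA3 on.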

 Tested x86_64-linux-gnu.  OK to commit?

 In addition to the failures fixed by this patch, there are many
 gcc.dg/vect tests where having additional vector extensions enabled
 breaks their expectations; I'm not sure of the best way to handle
 those.  And you get

 FAIL: gcc.target/i386/avx512f-vfmaddXXXpd-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfmaddXXXps-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfmaddsubXXXpd-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfmaddsubXXXps-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfmsubXXXpd-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfmsubXXXps-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfmsubaddXXXpd-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfmsubaddXXXps-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfnmaddXXXpd-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfnmaddXXXps-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfnmsubXXXpd-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfnmsubXXXps-2.c (test for excess errors)

 which are assembler errors such as operand type mismatch for
 `vfmaddpd' - it looks like the compiler isn't really prepared for the
 -mavx512f -mfma4 combination, but I'm not sure what the best way to
 handle it is (producing invalid output doesn't seem right, however).

I will look into these.

 If you test with -march=bdver3 in the multilib options (runtest
 --target_board=unix/-march=bdver3) rather than as the configured
 default, you get extra failures for the usual reason of multilib
 options going after the options from dg-options (which I propose to
 address in the usual way using dg-skip-if for -march= options
 different from the one present in dg-options).

 2014-03-28  Joseph Myers  jos...@codesourcery.com

 * gcc.dg/pr45416.c: Allow bextr on x86.
 * gcc.target/i386/fma4-builtin.c, gcc.target/i386/fma4-fma-2.c,
 gcc.target/i386/fma4-fma.c, gcc.target/i386/fma4-vector-2.c,
 gcc.target/i386/fma4-vector.c: Use -mno-fma.
 * gcc.target/i386/l_fma_double_1.c,
 gcc.target/i386/l_fma_double_2.c,
 gcc.target/i386/l_fma_double_3.c,
 gcc.target/i386/l_fma_double_4.c,
 gcc.target/i386/l_fma_double_5.c,
 gcc.target/i386/l_fma_double_6.c, gcc.target/i386/l_fma_float_1.c,
 gcc.target/i386/l_fma_float_2.c, gcc.target/i386/l_fma_float_3.c,
 gcc.target/i386/l_fma_float_4.c, gcc.target/i386/l_fma_float_5.c,
 gcc.target/i386/l_fma_float_6.c: Use -mno-fma4.
 * gcc.target/i386/pr27971.c: Use -mno-tbm.
 * gcc.target/i386/pr42542-4a.c: Use -mno-avx.
 * gcc.target/i386/pr59390.c: Use -mno-fma -mno-fma4.

OK.

Thanks,
Uros.


Re: Fix various x86 tests for --with-arch=bdver3

2014-03-30 Thread Uros Bizjak
On Fri, Mar 28, 2014 at 10:46 PM, Joseph S. Myers
jos...@codesourcery.com wrote:

 In addition to the failures fixed by this patch, there are many
 gcc.dg/vect tests where having additional vector extensions enabled
 breaks their expectations; I'm not sure of the best way to handle
 those.  And you get

 FAIL: gcc.target/i386/avx512f-vfmaddXXXpd-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfmaddXXXps-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfmaddsubXXXpd-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfmaddsubXXXps-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfmsubXXXpd-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfmsubXXXps-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfmsubaddXXXpd-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfmsubaddXXXps-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfnmaddXXXpd-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfnmaddXXXps-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfnmsubXXXpd-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfnmsubXXXps-2.c (test for excess errors)

 which are assembler errors such as operand type mismatch for
 `vfmaddpd' - it looks like the compiler isn't really prepared for the
 -mavx512f -mfma4 combination, but I'm not sure what the best way to
 handle it is (producing invalid output doesn't seem right, however).

Attached patch splits AVX512F modes out of existing FMA patterns.
These modes are not supported by patterns that also support FMA4
insns.

2014-03-30  Uros Bizjak  ubiz...@gmail.com

* config/i386/sse.md (FMAMODE_NOVF512): New mode iterator.
(<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name><round_name>):
Split out
<sd_mask_codefor>fma_fmadd_<VF_512:mode><sd_maskz_name><round_name>.
Use FMAMODE_NOVF512 mode iterator.
(<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name><round_name>): Ditto.
(<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name><round_name>): Ditto.
(<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name><round_name>): Ditto.
(<sd_mask_codefor>fma_fmaddsub_<mode><sd_maskz_name><round_name>):
Split out
<sd_mask_codefor>fma_fmaddsub_<VF_512:mode><sd_maskz_name><round_name>.
Use VF_128_256 mode iterator.
(<sd_mask_codefor>fma_fmsubadd_<mode><sd_maskz_name><round_name>):
Ditto.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}. I
have also eyeballed the asm - no FMA4 insn was emitted for AVX512F
modes.

Committed to mainline SVN.

Uros.
Index: config/i386/sse.md
===================================================================
--- config/i386/sse.md  (revision 208944)
+++ config/i386/sse.md  (working copy)
@@ -2712,8 +2712,7 @@
        (fma:FMAMODEM
          (match_operand:FMAMODEM 1 "nonimmediate_operand")
          (match_operand:FMAMODEM 2 "nonimmediate_operand")
-         (match_operand:FMAMODEM 3 "nonimmediate_operand")))]
-  "")
+         (match_operand:FMAMODEM 3 "nonimmediate_operand")))])
 
 (define_expand "fms<mode>4"
   [(set (match_operand:FMAMODEM 0 "register_operand")
@@ -2720,8 +2719,7 @@
        (fma:FMAMODEM
          (match_operand:FMAMODEM 1 "nonimmediate_operand")
          (match_operand:FMAMODEM 2 "nonimmediate_operand")
-         (neg:FMAMODEM (match_operand:FMAMODEM 3 "nonimmediate_operand"))))]
-  "")
+         (neg:FMAMODEM (match_operand:FMAMODEM 3 "nonimmediate_operand"))))])
 
 (define_expand "fnma<mode>4"
   [(set (match_operand:FMAMODEM 0 "register_operand")
@@ -2728,8 +2726,7 @@
        (fma:FMAMODEM
          (neg:FMAMODEM (match_operand:FMAMODEM 1 "nonimmediate_operand"))
          (match_operand:FMAMODEM 2 "nonimmediate_operand")
-         (match_operand:FMAMODEM 3 "nonimmediate_operand")))]
-  "")
+         (match_operand:FMAMODEM 3 "nonimmediate_operand")))])
 
 (define_expand "fnms<mode>4"
   [(set (match_operand:FMAMODEM 0 "register_operand")
@@ -2736,18 +2733,18 @@
        (fma:FMAMODEM
          (neg:FMAMODEM (match_operand:FMAMODEM 1 "nonimmediate_operand"))
          (match_operand:FMAMODEM 2 "nonimmediate_operand")
-         (neg:FMAMODEM (match_operand:FMAMODEM 3 "nonimmediate_operand"))))]
-  "")
+         (neg:FMAMODEM (match_operand:FMAMODEM 3 "nonimmediate_operand"))))])
 
 ;; The builtins for intrinsics are not constrained by SSE math enabled.
-(define_mode_iterator FMAMODE [(SF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
-                              (DF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
-                              (V4SF "TARGET_FMA || TARGET_FMA4")
-                              (V2DF "TARGET_FMA || TARGET_FMA4")
-                              (V8SF "TARGET_FMA || TARGET_FMA4")
-                              (V4DF "TARGET_FMA || TARGET_FMA4")
-                              (V16SF "TARGET_AVX512F")
-                              (V8DF "TARGET_AVX512F")])
+(define_mode_iterator FMAMODE
+  [(SF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
+   (DF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
+   (V4SF "TARGET_FMA || TARGET_FMA4")
+   (V2DF TARGET_FMA || TARGET_FMA4

Re: Fix various x86 tests for --with-arch=bdver3 --with-cpu=bdver3

2014-04-02 Thread Uros Bizjak
On Wed, Apr 2, 2014 at 12:27 AM, Joseph S. Myers
jos...@codesourcery.com wrote:
 When I fixed various tests in
 http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01662.html for failures
 with --with-arch=bdver3, I missed that a so-configured compiler still
 defaults to -mtune=generic.  If you override that as well with
 --with-cpu=bdver3, further failures appear, and this patch fixes some
 of them.

 Most of these changes add -mno-prefer-avx128 to AVX tests not
 expecting a -mprefer-avx128 default.  In addition, some tests have
 -mtune=generic added where the behavior tested for depends on some
 tuning parameter that I identified: X86_TUNE_EXT_80387_CONSTANTS or
 X86_TUNE_SSE_LOAD0_BY_PXOR.

 Tested x86_64-linux-gnu.  OK to commit?

 There are other failures this patch does not resolve in a
 --with-arch=bdver3 --with-cpu=bdver3 configuration.  Some of these are
 AVX tests whose failures are not resolved by adding -mno-prefer-avx128
 (and so this patch does not add -mno-prefer-avx128 to those tests);
 others may be cases where -mtune=generic is appropriate but I haven't
 identified the specific tuning parameter that shows code generation
 differences depending on tuning are correct and so a -mtune= option
 should be used.

 FAIL: gcc.target/i386/avx2-vpand-1.c scan-assembler vpand[ \\t]+[^\n]*%ymm[0-9]
 FAIL: gcc.target/i386/avx2-vpand-3.c scan-assembler-times vpand[ \\t]+[^\n]*%ymm[0-9] 1
 FAIL: gcc.target/i386/avx2-vpandn-1.c scan-assembler vpandn[ \\t]+[^\n]*%ymm[0-9]
 FAIL: gcc.target/i386/avx2-vpor-1.c scan-assembler vpor[ \\t]+[^\n]*%ymm[0-9]
 FAIL: gcc.target/i386/avx2-vpxor-1.c scan-assembler vpxor[ \\t]+[^\n]*%ymm[0-9]
 FAIL: gcc.target/i386/avx256-unaligned-load-2.c scan-assembler (sse2_loaddqu|vmovdqu[^\n\r]*movv16qi_internal)
 FAIL: gcc.target/i386/avx256-unaligned-load-2.c scan-assembler vinsert.128
 FAIL: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vmovdqa64[ \\t]+%zmm 2
 FAIL: gcc.target/i386/avx512f-vmovdqu32-1.c scan-assembler-times vmovdqu[36][24][ \\t]+[^\n]*\\)[^\n]*%zmm[0-9][^{] 1
 FAIL: gcc.target/i386/avx512f-vmovupd-1.c scan-assembler-times vmovupd[ \\t]+[^\n]*\\)[^\n]*%zmm[0-9][^{] 1
 FAIL: gcc.target/i386/avx512f-vpandd-1.c scan-assembler-times vpandd[ \\t]+[^\n]*%zmm[0-9][^{] 4
 FAIL: gcc.target/i386/avx512f-vpandd-1.c scan-assembler-times vpandd[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
 FAIL: gcc.target/i386/avx512f-vpandd-1.c scan-assembler-times vpandd[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
 FAIL: gcc.target/i386/avx512f-vpandnd-1.c scan-assembler-times vpandnd[ \\t]+[^\n]*%zmm[0-9][^{] 4
 FAIL: gcc.target/i386/avx512f-vpandnd-1.c scan-assembler-times vpandnd[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
 FAIL: gcc.target/i386/avx512f-vpandnd-1.c scan-assembler-times vpandnd[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
 FAIL: gcc.target/i386/avx512f-vpandnq-1.c scan-assembler-times vpandnq[ \\t]+[^\n]*%zmm[0-9][^{] 3
 FAIL: gcc.target/i386/avx512f-vpandnq-1.c scan-assembler-times vpandnq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
 FAIL: gcc.target/i386/avx512f-vpandnq-1.c scan-assembler-times vpandnq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
 FAIL: gcc.target/i386/avx512f-vpandq-1.c scan-assembler-times vpandq[ \\t]+[^\n]*%zmm[0-9][^{] 3
 FAIL: gcc.target/i386/avx512f-vpandq-1.c scan-assembler-times vpandq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
 FAIL: gcc.target/i386/avx512f-vpandq-1.c scan-assembler-times vpandq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
 FAIL: gcc.target/i386/avx512f-vpord-1.c scan-assembler-times vpord[ \\t]+[^\n]*%zmm[0-9][^{] 4
 FAIL: gcc.target/i386/avx512f-vpord-1.c scan-assembler-times vpord[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
 FAIL: gcc.target/i386/avx512f-vpord-1.c scan-assembler-times vpord[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
 FAIL: gcc.target/i386/avx512f-vporq-1.c scan-assembler-times vporq[ \\t]+[^\n]*%zmm[0-9][^{] 3
 FAIL: gcc.target/i386/avx512f-vporq-1.c scan-assembler-times vporq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
 FAIL: gcc.target/i386/avx512f-vporq-1.c scan-assembler-times vporq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
 FAIL: gcc.target/i386/avx512f-vpxord-1.c scan-assembler-times vpxord[ \\t]+[^\n]*%zmm[0-9][^{] 4
 FAIL: gcc.target/i386/avx512f-vpxord-1.c scan-assembler-times vpxord[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
 FAIL: gcc.target/i386/avx512f-vpxord-1.c scan-assembler-times vpxord[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
 FAIL: gcc.target/i386/avx512f-vpxorq-1.c scan-assembler-times vpxorq[ \\t]+[^\n]*%zmm[0-9][^{] 3
 FAIL: gcc.target/i386/avx512f-vpxorq-1.c scan-assembler-times vpxorq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
 FAIL: gcc.target/i386/avx512f-vpxorq-1.c scan-assembler-times vpxorq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
 FAIL: gcc.target/i386/pr49002-1.c scan-assembler vmovapd[\t ]*[^,]*,[\t ]*%xmm
 FAIL: gcc.target/i386/pr53712.c scan-assembler-times movdqu 1
 FAIL: gcc.target/i386/pr53907.c scan-assembler movdqa
 FAIL: gcc.target/i386/pr59539-1.c scan-assembler-times vmovdqu 1
 FAIL: 

Re: Fix various x86 tests for --with-arch=bdver3 --with-cpu=bdver3

2014-04-02 Thread Uros Bizjak
On Wed, Apr 2, 2014 at 12:27 AM, Joseph S. Myers
jos...@codesourcery.com wrote:

 There are other failures this patch does not resolve in a
 --with-arch=bdver3 --with-cpu=bdver3 configuration.  Some of these are
 AVX tests whose failures are not resolved by adding -mno-prefer-avx128
 (and so this patch does not add -mno-prefer-avx128 to those tests);
 others may be cases where -mtune=generic is appropriate but I haven't
 identified the specific tuning parameter that shows code generation
 differences depending on tuning are correct and so a -mtune= option
 should be used.

 FAIL: gcc.target/i386/avx2-vpand-1.c scan-assembler vpand[ \\t]+[^\n]*%ymm[0-9]
 FAIL: gcc.target/i386/avx2-vpand-3.c scan-assembler-times vpand[ \\t]+[^\n]*%ymm[0-9] 1
 FAIL: gcc.target/i386/avx2-vpandn-1.c scan-assembler vpandn[ \\t]+[^\n]*%ymm[0-9]
 FAIL: gcc.target/i386/avx2-vpor-1.c scan-assembler vpor[ \\t]+[^\n]*%ymm[0-9]
 FAIL: gcc.target/i386/avx2-vpxor-1.c scan-assembler vpxor[ \\t]+[^\n]*%ymm[0-9]
 FAIL: gcc.target/i386/avx256-unaligned-load-2.c scan-assembler (sse2_loaddqu|vmovdqu[^\n\r]*movv16qi_internal)
 FAIL: gcc.target/i386/avx256-unaligned-load-2.c scan-assembler vinsert.128
 FAIL: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vmovdqa64[ \\t]+%zmm 2
 FAIL: gcc.target/i386/avx512f-vmovdqu32-1.c scan-assembler-times vmovdqu[36][24][ \\t]+[^\n]*\\)[^\n]*%zmm[0-9][^{] 1
 FAIL: gcc.target/i386/avx512f-vmovupd-1.c scan-assembler-times vmovupd[ \\t]+[^\n]*\\)[^\n]*%zmm[0-9][^{] 1
 FAIL: gcc.target/i386/avx512f-vpandd-1.c scan-assembler-times vpandd[ \\t]+[^\n]*%zmm[0-9][^{] 4
 FAIL: gcc.target/i386/avx512f-vpandd-1.c scan-assembler-times vpandd[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
 FAIL: gcc.target/i386/avx512f-vpandd-1.c scan-assembler-times vpandd[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
 FAIL: gcc.target/i386/avx512f-vpandnd-1.c scan-assembler-times vpandnd[ \\t]+[^\n]*%zmm[0-9][^{] 4
 FAIL: gcc.target/i386/avx512f-vpandnd-1.c scan-assembler-times vpandnd[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
 FAIL: gcc.target/i386/avx512f-vpandnd-1.c scan-assembler-times vpandnd[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
 FAIL: gcc.target/i386/avx512f-vpandnq-1.c scan-assembler-times vpandnq[ \\t]+[^\n]*%zmm[0-9][^{] 3
 FAIL: gcc.target/i386/avx512f-vpandnq-1.c scan-assembler-times vpandnq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
 FAIL: gcc.target/i386/avx512f-vpandnq-1.c scan-assembler-times vpandnq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
 FAIL: gcc.target/i386/avx512f-vpandq-1.c scan-assembler-times vpandq[ \\t]+[^\n]*%zmm[0-9][^{] 3
 FAIL: gcc.target/i386/avx512f-vpandq-1.c scan-assembler-times vpandq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
 FAIL: gcc.target/i386/avx512f-vpandq-1.c scan-assembler-times vpandq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
 FAIL: gcc.target/i386/avx512f-vpord-1.c scan-assembler-times vpord[ \\t]+[^\n]*%zmm[0-9][^{] 4
 FAIL: gcc.target/i386/avx512f-vpord-1.c scan-assembler-times vpord[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
 FAIL: gcc.target/i386/avx512f-vpord-1.c scan-assembler-times vpord[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
 FAIL: gcc.target/i386/avx512f-vporq-1.c scan-assembler-times vporq[ \\t]+[^\n]*%zmm[0-9][^{] 3
 FAIL: gcc.target/i386/avx512f-vporq-1.c scan-assembler-times vporq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
 FAIL: gcc.target/i386/avx512f-vporq-1.c scan-assembler-times vporq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
 FAIL: gcc.target/i386/avx512f-vpxord-1.c scan-assembler-times vpxord[ \\t]+[^\n]*%zmm[0-9][^{] 4
 FAIL: gcc.target/i386/avx512f-vpxord-1.c scan-assembler-times vpxord[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
 FAIL: gcc.target/i386/avx512f-vpxord-1.c scan-assembler-times vpxord[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
 FAIL: gcc.target/i386/avx512f-vpxorq-1.c scan-assembler-times vpxorq[ \\t]+[^\n]*%zmm[0-9][^{] 3
 FAIL: gcc.target/i386/avx512f-vpxorq-1.c scan-assembler-times vpxorq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}[^{] 1
 FAIL: gcc.target/i386/avx512f-vpxorq-1.c scan-assembler-times vpxorq[ \\t]+[^\n]*%zmm[0-9]{%k[1-7]}{z} 1
 FAIL: gcc.target/i386/pr49002-1.c scan-assembler vmovapd[\t ]*[^,]*,[\t ]*%xmm
 FAIL: gcc.target/i386/pr53712.c scan-assembler-times movdqu 1
 FAIL: gcc.target/i386/pr53907.c scan-assembler movdqa
 FAIL: gcc.target/i386/pr59539-1.c scan-assembler-times vmovdqu 1
 FAIL: gcc.target/i386/pr59539-2.c scan-assembler-times vmovdqu 1

These are due to TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL tuning flag.
Currently, this flag applies to all vector sizes (128, 256 and 512
bits), but I guess it is effective only for 128 bit sizes. Can you
please review usage of this flag in i386/sse.md?

Thanks,
Uros.


Re: Skip some gcc.target/i386 tests for conflicting -march= options

2014-04-02 Thread Uros Bizjak
On Wed, Apr 2, 2014 at 6:36 PM, Joseph S. Myers jos...@codesourcery.com wrote:

 If you test an x86_64 toolchain with -march=bdver3 in the multilib
 options, as noted in
 http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01662.html various test
 failures arise from tests whose own -march= in dg-options is
 overridden.  This patch adds dg-skip-if to those tests to skip them
 for conflicting -march= options, as has been done before for other
 tests (obviously, if the option ordering is changed in future in
 DejaGnu, such skips may become obsolete or could be conditioned on
 DejaGnu version).  (No doubt other -march= options would show up
 further tests needing such changes.)

 Tested x86_64-linux-gnu.  OK to commit?

 2014-04-02  Joseph Myers  jos...@codesourcery.com

 * gcc.target/i386/funcspec-2.c, gcc.target/i386/funcspec-3.c,
 gcc.target/i386/funcspec-9.c, gcc.target/i386/isa-1.c,
 gcc.target/i386/memcpy-strategy-1.c,
 gcc.target/i386/memcpy-strategy-2.c,
 gcc.target/i386/memcpy-vector_loop-1.c,
 gcc.target/i386/memcpy-vector_loop-2.c,
 gcc.target/i386/memset-vector_loop-1.c,
 gcc.target/i386/memset-vector_loop-2.c,
 gcc.target/i386/sse2-init-v2di-2.c, gcc.target/i386/ssetype-1.c,
 gcc.target/i386/ssetype-2.c, gcc.target/i386/ssetype-5.c: Skip for
 -march= options different from those in dg-options.
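
 For reference, the skip idiom looks roughly like the sketch below.
 The file and the function add_one are hypothetical, and -march=core2
 is only an example value; the dg-skip-if argument order (comment,
 target selector, options that trigger the skip, options that exempt
 from it) is the standard one.

```c
/* Hypothetical test file illustrating the directive.  */
/* { dg-do compile } */
/* Skip when the test run passes any -march= option, unless it is the
   same one this test asks for in dg-options.  */
/* { dg-skip-if "" { *-*-* } { "-march=*" } { "-march=core2" } } */
/* { dg-options "-O2 -march=core2" } */

int
add_one (int x)
{
  return x + 1;
}
```

 A run with --target_board=unix/-march=bdver3 would then report the
 test as unsupported instead of failing it.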

OK.

Thanks,
Uros.


Re: Use -mno-prefer-avx128 in two more tests

2014-04-02 Thread Uros Bizjak
On Wed, Apr 2, 2014 at 10:09 PM, Joseph S. Myers
jos...@codesourcery.com wrote:

 Two of the tests I noted in
 http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00036.html as not getting
 fixed for --with-arch=bdver3 --with-cpu=bdver3 by adding
 -mno-prefer-avx128 in fact also show failures for --with-arch=btver2
 --with-tune=btver2, and in that case *are* fixed by adding
 -mno-prefer-avx128.  Thus, while in those cases there may still be
 other tuning issues as noted in
 http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00052.html (btver2
 doesn't enable the flag in question) I think it *is* correct to use
 -mno-prefer-avx128 for these two tests, and this patch adds it.

 Tested x86_64-linux-gnu.  OK to commit?

 2014-04-02  Joseph Myers  jos...@codesourcery.com

 * gcc.target/i386/avx2-vpand-3.c,
 gcc.target/i386/avx256-unaligned-load-2.c: Use -mno-prefer-avx128.

OK.

Thanks,
Uros.


Re: PATCH: PR target/60827: Inconsistent optimize_function_for_speed_p in *fixuns_trunc<mode>_1

2014-04-14 Thread Uros Bizjak
On Fri, Apr 11, 2014 at 10:16 PM, H.J. Lu hongjiu...@intel.com wrote:
 Since fixuns_trunc<mode>si2 expander checks optimize_insn_for_size_p
 before generating *fixuns_trunc<mode>_1,  we should use
 optimize_insn_for_speed_p in *fixuns_trunc<mode>_1 for consistency.
 OK for trunk?

 Thanks.


 H.J.
 ---
 2014-04-11  H.J. Lu  hongjiu...@intel.com

 PR target/60827
 * config/i386/i386.md (*fixuns_trunc<mode>_1): Check
 optimize_insn_for_speed_p instead of
 optimize_function_for_speed_p.

It looks to me that many, if not all
optimize_function_for_{speed,size}_p predicates in .md files should be
converted to corresponding optimize_insn_for_*_p predicates. The latter
predicates apply to BBs, so IMO insn sequences should be handled
according to BB frequencies, not function frequencies.

The patch is OK for mainline.

Thanks,
Uros.


Re: PATCH: PR target/60827: Inconsistent optimize_function_for_speed_p in *fixuns_trunc<mode>_1

2014-04-14 Thread Uros Bizjak
On Mon, Apr 14, 2014 at 6:49 PM, Jan Hubicka hubi...@ucw.cz wrote:
 On Fri, Apr 11, 2014 at 10:16 PM, H.J. Lu hongjiu...@intel.com wrote:
  Since fixuns_trunc<mode>si2 expander checks optimize_insn_for_size_p
  before generating *fixuns_trunc<mode>_1,  we should use
  optimize_insn_for_speed_p in *fixuns_trunc<mode>_1 for consistency.
  OK for trunk?
 
  Thanks.
 
 
  H.J.
  ---
  2014-04-11  H.J. Lu  hongjiu...@intel.com
 
  PR target/60827
  * config/i386/i386.md (*fixuns_trunc<mode>_1): Check
  optimize_insn_for_speed_p instead of
  optimize_function_for_speed_p.

 It looks to me that many, if not all
 optimize_function_for_{speed,size}_p predicates in .md files should be
 converted to corresponding optimize_insn_for_*_p predicates. The latter
 predicates apply to BBs, so IMO insn sequences should be handled
 according to BB frequencies, not function frequencies.

 You cannot convert all predicates, only those in expanders.
 The predicates in insn templates must be consistent throughout the
 compilation, since the insn may move from a hot BB to a cold BB and you
 do not want it to become unrecognizable.
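
 The point can be shown with a toy model.  Everything here is invented
 for illustration (struct toy_bb, struct toy_function and the cond_*
 helpers are not GCC interfaces): an insn condition keyed on the
 current block stops matching once the insn is moved into a cold block,
 while one keyed on the whole function keeps matching.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a basic block is either hot or cold, and the
   enclosing function has one fixed profile.  */
struct toy_bb { bool hot; };
struct toy_function { bool optimize_for_speed; };

typedef bool (*toy_cond) (const struct toy_function *,
                          const struct toy_bb *);

/* An insn condition keyed on the block the insn currently sits in.  */
static bool
cond_per_bb (const struct toy_function *fn, const struct toy_bb *bb)
{
  (void) fn;
  return bb->hot;
}

/* An insn condition keyed on the whole function.  */
static bool
cond_per_function (const struct toy_function *fn, const struct toy_bb *bb)
{
  (void) bb;
  return fn->optimize_for_speed;
}

/* The insn was recognized in FROM; report whether its condition still
   holds after it has been moved to TO.  */
static bool
still_matches_after_move (toy_cond cond, const struct toy_function *fn,
                          const struct toy_bb *from,
                          const struct toy_bb *to)
{
  assert (cond (fn, from));
  return cond (fn, to);
}
```

 In this model a hot-to-cold move keeps the per-function condition true
 and turns the per-block one false, which is the unrecognizable-insn
 case described above.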

Oops, thanks for sharing this. Based on this explanation, the patch
isn't correct. H.J., please revert it.

Thanks,
Uros.


[PATCH, i386]: Some classify_argument and return_in_memory cleanups

2014-04-14 Thread Uros Bizjak
Hello!

Attached patch changes return type of examine_argument to bool and
merges a couple of called-once functions to their call sites. The
latter change removes a bunch of functions, declared with
ATTRIBUTE_UNUSED.
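
The inverted return convention is easy to get wrong at call sites, so
here is a toy model of it.  The toy_* functions are made up for
illustration and take a plain register count in place of the real
classification; only the polarity of the result mirrors the change.

```c
#include <stdbool.h>

/* Old convention (modelled): nonzero means "fits in registers", 0 means
   "pass in memory".  NREGS_NEEDED == 0 stands in for a classification
   that found no register class.  */
static int
toy_examine_old (int nregs_needed)
{
  return nregs_needed != 0;
}

/* New convention (modelled): true means "pass in memory".  */
static bool
toy_examine_new (int nregs_needed)
{
  return nregs_needed == 0;
}

/* Call sites flip their test so that the decision stays the same.  */
static bool
toy_passed_in_regs_old (int nregs_needed)
{
  return toy_examine_old (nregs_needed) != 0;
}

static bool
toy_passed_in_regs_new (int nregs_needed)
{
  return !toy_examine_new (nregs_needed);
}
```

Both toy_passed_in_regs_* variants agree for every input, which is the
property each updated call site has to preserve.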

2014-04-14  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.c (examine_argument): Return bool.  Return true if
parameter should be passed in memory.
case X86_64_COMPLEX_X87_CLASS: Adjust.
(construct_container): Update calls to examine_argument.
(function_arg_advance_64): Ditto.
(return_in_memory_32): Merge with ix86_return_in_memory.
(return_in_memory_64): Ditto.
(return_in_memory_ms_64): Ditto.

Bootstrapped and regtested on x86_64-pc-linux-gnu {,-m32} and
committed to mainline SVN.

Uros.
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c  (revision 209348)
+++ config/i386/i386.c  (working copy)
@@ -6806,8 +6806,9 @@ classify_argument (enum machine_mode mode, const_t
 }
 
 /* Examine the argument and return set number of register required in each
-   class.  Return 0 iff parameter should be passed in memory.  */
-static int
+   class.  Return true iff parameter should be passed in memory.  */
+
+static bool
 examine_argument (enum machine_mode mode, const_tree type, int in_return,
  int *int_nregs, int *sse_nregs)
 {
@@ -6816,8 +6817,9 @@ examine_argument (enum machine_mode mode, const_tr
 
   *int_nregs = 0;
   *sse_nregs = 0;
+
   if (!n)
-return 0;
+return true;
   for (n--; n >= 0; n--)
 switch (regclass[n])
   {
@@ -6835,15 +6837,15 @@ examine_argument (enum machine_mode mode, const_tr
break;
   case X86_64_X87_CLASS:
   case X86_64_X87UP_CLASS:
+  case X86_64_COMPLEX_X87_CLASS:
if (!in_return)
- return 0;
+ return true;
break;
-  case X86_64_COMPLEX_X87_CLASS:
-   return in_return ? 2 : 0;
   case X86_64_MEMORY_CLASS:
gcc_unreachable ();
   }
-  return 1;
+
+  return false;
 }
 
 /* Construct container for the argument used by GCC interface.  See
@@ -6873,8 +6875,8 @@ construct_container (enum machine_mode mode, enum
   n = classify_argument (mode, type, regclass, 0);
   if (!n)
 return NULL;
-  if (!examine_argument (mode, type, in_return, &needed_intregs,
-                        &needed_sseregs))
+  if (examine_argument (mode, type, in_return, &needed_intregs,
+                       &needed_sseregs))
     return NULL;
   if (needed_intregs > nintregs || needed_sseregs > nsseregs)
 return NULL;
@@ -7193,7 +7195,7 @@ function_arg_advance_64 (CUMULATIVE_ARGS *cum, enu
 || VALID_AVX256_REG_MODE (mode)))
 return;
 
-  if (examine_argument (mode, type, 0, &int_nregs, &sse_nregs)
+  if (!examine_argument (mode, type, 0, &int_nregs, &sse_nregs)
       && sse_nregs <= cum->sse_nregs && int_nregs <= cum->nregs)
     {
       cum->nregs -= int_nregs;
@@ -7988,95 +7990,87 @@ ix86_libcall_value (enum machine_mode mode)
 
 /* Return true iff type is returned in memory.  */
 
-static bool ATTRIBUTE_UNUSED
-return_in_memory_32 (const_tree type, enum machine_mode mode)
+static bool
+ix86_return_in_memory (const_tree type, const_tree fntype ATTRIBUTE_UNUSED)
 {
+#ifdef SUBTARGET_RETURN_IN_MEMORY
+  return SUBTARGET_RETURN_IN_MEMORY (type, fntype);
+#else
+  const enum machine_mode mode = type_natural_mode (type, NULL, true);
   HOST_WIDE_INT size;
 
-  if (mode == BLKmode)
-return true;
+  if (TARGET_64BIT)
+{
+  if (ix86_function_type_abi (fntype) == MS_ABI)
+   {
+ size = int_size_in_bytes (type);
 
-  size = int_size_in_bytes (type);
+ /* __m128 is returned in xmm0.  */
+         /* __m128 is returned in xmm0.  */
+         if ((!type || VECTOR_INTEGER_TYPE_P (type)
+              || INTEGRAL_TYPE_P (type)
+              || VECTOR_FLOAT_TYPE_P (type))
+             && (SCALAR_INT_MODE_P (mode) || VECTOR_MODE_P (mode))
+             && !COMPLEX_MODE_P (mode)
+             && (GET_MODE_SIZE (mode) == 16 || size == 16))
+           return false;
 
-  if (MS_AGGREGATE_RETURN && AGGREGATE_TYPE_P (type) && size <= 8)
-    return false;
+         /* Otherwise, the size must be exactly in [1248]. */
+         return size != 1 && size != 2 && size != 4 && size != 8;
+   }
+  else
+   {
+ int needed_intregs, needed_sseregs;
 
-  if (VECTOR_MODE_P (mode) || mode == TImode)
+         return examine_argument (mode, type, 1,
+                                  &needed_intregs, &needed_sseregs);
+   }
+}
+  else
 {
-  /* User-created vectors small enough to fit in EAX.  */
-      if (size < 8)
-   return false;
+  if (mode == BLKmode)
+   return true;
 
-  /* MMX/3dNow values are returned in MM0,
-except when it doesn't exits or the ABI prescribes otherwise.  */
-  if (size == 8)
-   return !TARGET_MMX || TARGET_VECT8_RETURNS;
+  size = int_size_in_bytes (type);
 
-  /* SSE values are returned in XMM0, except when it doesn't exist.  */
-  if (size

Re: [build] Correctly detect native TLS support with 64-bit gas on Solaris/x86 (PR target/60817)

2014-04-15 Thread Uros Bizjak
On Tue, Apr 15, 2014 at 5:21 PM, Rainer Orth
r...@cebitec.uni-bielefeld.de wrote:
 As reported in the PR, gcc/configure currently fails to detect native
 TLS support on x86_64-*-solaris2* with a 64-bit gas since it feeds it
 32-bit TLS code.  I hadn't noticed this so far since I've been using a
 32-bit gas here (no idea why).

 The following patch fixes this by making sure 64-bit code is both used
 for 64-bit-default configurations and the necessary assembler flags
 passed.  I've chosen to merge the i?86 and x86_64 cases to avoid
 duplicating considerable amounts of code.  When using the native Solaris
 assembler, the relocs need to be in lower case as already done for
 32-bit.

 Tested by configuring for x86_64-pc-solaris2.11 with 32-bit gas, 64-bit
 gas, /bin/as, i386-pc-solaris2.11 with 32-bit gas and /bin/as,
 x86_64-unknown-linux-gnu, and i686-unknown-linux-gnu and checking that
 native TLS support is detected correctly.

 Ok for mainline or should I rather bootstrap the change on a couple of
 those configurations?

 Thanks.
 Rainer


 2014-04-15  Rainer Orth  r...@cebitec.uni-bielefeld.de

 PR target/60817
 * configure.ac (set_have_as_tls): Merge i[34567]86-*-* and
 x86_64-*-* cases.
 Pass necessary as flags on 64-bit Solaris/x86.
 Use lowercase relocs for x86_64-*-*.
 * configure: Regenerate.

OK.

Thanks,
Uros.


Re: [PATCH 1/3, x86] X86 Silvermont vector cost model tune

2014-04-15 Thread Uros Bizjak
On Tue, Apr 15, 2014 at 6:06 PM, Evgeny Stupachenko evstu...@gmail.com wrote:

 I've separated the patch into 3.
 The patch passes x86 bootstrap.

 1st part:

 2014-04-15  Evgeny Stupachenko  evstu...@gmail.com

* config/i386/i386.c (slm_cost): Fixing vec_to_scalar_cost for
Silvermont according latency table.

... : Adjust vec_to_scalar_cost.

(intel_cost): Ditto.

OK for mainline with the above ChangeLog fix.

Thanks,
Uros.


Re: [PATCH 2/3, x86] X86 Silvermont vector cost model tune

2014-04-16 Thread Uros Bizjak
On Tue, Apr 15, 2014 at 6:08 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
 2nd part:

 2014-04-15  Evgeny Stupachenko  evstu...@gmail.com

* config/i386/x86-tune.def (TARGET_SLOW_PHUFFB): Target for slow byte
shuffle on some x86 architectures.

... (X86_TUNE_SLOW_PSHUFB): New tune definition.

Typo: TARGET_SLOW_PHUFFB -> TARGET_SLOW_PSHUFB.

* config/i386/i386.h (TARGET_SLOW_PHUFFB): Ditto.

... : New tune flag.

* config/i386/i386.c (expand_vec_perm_even_odd_1): Avoid byte shuffles
in architectures where they are slow (TARGET_SLOW_PHUFFB).

...: Avoid byte shuffles for TARGET_SLOW_PSHUFB.

OK for mainline with the above ChangeLog modifications.

Thanks,
Uros.


Re: [PATCH 3/3, x86] X86 Silvermont vector cost model tune

2014-04-16 Thread Uros Bizjak
On Tue, Apr 15, 2014 at 6:12 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
 3d part:

 2014-04-15  Evgeny Stupachenko  evstu...@gmail.com

 * config/i386/i386.c (x86_add_stmt_cost): Fixing vector cost model for
 Silvermont.

... : Fix vector cost ...

OK for mainline with the above ChangeLog fix.

Thanks,
Uros.


Re: [PATCH 3/3, x86] X86 Silvermont vector cost model tune

2014-04-16 Thread Uros Bizjak
On Wed, Apr 16, 2014 at 4:31 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
 For the 3d part of the patch there was a misprint in estimated
 constant. It should be 1.7 instead of 1.8.
 - retval = (retval * 18) / 10;
 + retval = (retval * 17) / 10;

 Bootstrap passed.

The change is also OK.

BTW: trivial patch adjustments like this do not need re-approvals. The
message to the ML should be enough.

Uros.
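
A side note on the adjustment quoted above: the factor 1.7 is applied in integer arithmetic, as a multiplication by 17 followed by a division by 10, so the result truncates toward zero. A minimal sketch of that arithmetic follows; the helper name scale_vector_cost is hypothetical and is not taken from i386.c.

```c
/* Hypothetical helper, not from i386.c: scale a cost by 1.7 using
   integer arithmetic only.  The division truncates toward zero.  */
int
scale_vector_cost (int retval)
{
  return (retval * 17) / 10;
}
```

With this form a cost of 10 becomes 17, while a cost of 3 becomes 5 rather than 5.1, since the fractional part is dropped.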


Re: Remove obsolete Solaris 9 support

2014-04-16 Thread Uros Bizjak
On Wed, Apr 16, 2014 at 1:16 PM, Rainer Orth
r...@cebitec.uni-bielefeld.de wrote:
 Now that 4.9 has branched, it's time to actually remove the obsolete
 Solaris 9 configuration.  Most of this is just legwork and falls under
 my Solaris maintainership.

 A couple of questions, though:

 * Uros: I'm removing all sse_os_support() checks from the testsuite.
   Solaris 9 was the only consumer, so it seems best to do away with it.

This is OK, but please leave sse-os-check.h (and corresponding
sse_os_support calls) in the testsuite. Just remove the Solaris 9
specific code from sse-os-check.h and always return 1, perhaps with
the comment that all currently supported OSes support SSE
instructions.

Uros.
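
For illustration, the simplified check described above could look roughly like the following. This is only a sketch under the assumption that the function keeps its name and an int return value; the actual contents of sse-os-check.h may differ.

```c
/* Sketch of a simplified sse_os_support after the Solaris 9 specific
   code is removed.  All currently supported OSes support SSE
   instructions, so the check always succeeds.  */
int
sse_os_support (void)
{
  return 1;
}
```

Keeping the function in place means existing callers in the testsuite do not have to change.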


Re: Patch ping

2014-04-17 Thread Uros Bizjak
On Wed, Apr 16, 2014 at 11:35 PM, Jeff Law l...@redhat.com wrote:

 I'd like to ping 2 patches:

 http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00140.html
 - Ensure GET_MODE_{SIZE,INNER,NUNITS} (const) is constant rather than
   memory load after optimization (I'd like to keep the current MODE_SIZE
   patch for the reasons mentioned there, but also add this patch)

 This is fine.  Per the follow-up discussion, I think you can mark it as
 resolving 36109 as well.



 http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00131.html
 - PR target/59617
handle gather loads for AVX512 (at least non-masked ones, masked ones
will need to wait for 5.0 and we need to find how to represent it in
GIMPLE)

 I'll leave this to Uros :-)

IIRC, this patch was already committed to 4.9 some time ago.

Uros.


Re: [PATCH, x86] merge movsd/movhpd pair in peephole

2014-04-21 Thread Uros Bizjak
On Mon, Apr 21, 2014 at 8:00 PM, Wei Mi w...@google.com wrote:

 llvm will merge movsd/movhpd to movupd while gcc will not. The merge
 is beneficial on x86 machines starting from Nehalem.

 The patch is to add the merging in peephole.
 bootstrap and regression pass. Is it ok for stage1?

Let's wait for a generic pass, as proposed by Bin. I think that this
pass will render peephole2 approach obsolete.

Uros.


[PATCH, i386]: Fix PR/60909, ICE with -mrdrnd and __builtin_ia32_rdrand32_step

2014-04-21 Thread Uros Bizjak
Hello!

Attached patch fixes PR 60909, where memory operand was used as a
target RTX of a CMOVE insn, leading to unrecognized insn. Similar
problem was found with rdseed insn, where memory operand was used as
an invalid target of a ZERO_EXTEND insn.

Attached patch fixes both occurrences.

2014-04-21  Uros Bizjak  ubiz...@gmail.com

PR target/60909
* config/i386/i386.c (ix86_expand_builtin)
case IX86_BUILTIN_RDRAND{16,32,64}_STEP: Use temporary
register for target RTX.
case IX86_BUILTIN_RDSEED{16,32,64}_STEP: Ditto.

Testsuite/ChangeLog:

2014-04-21  Uros Bizjak  ubiz...@gmail.com

PR target/60909
* gcc.target/i386/pr60909-1.c: New test.
* gcc.target/i386/pr60909-2.c: Ditto.

Patch was committed to mainline and will be committed to other release
branches after 4.9 is released.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 209544)
+++ config/i386/i386.c  (working copy)
@@ -35400,7 +35400,8 @@ rdrand_step:
   else
op2 = gen_rtx_SUBREG (SImode, op0, 0);
 
-  if (target == 0)
+  if (target == 0
+ || !register_operand (target, SImode))
target = gen_reg_rtx (SImode);
 
   pat = gen_rtx_GEU (VOIDmode, gen_rtx_REG (CCCmode, FLAGS_REG),
@@ -35442,7 +35443,8 @@ rdseed_step:
  const0_rtx);
   emit_insn (gen_rtx_SET (VOIDmode, op2, pat));
 
-  if (target == 0)
+  if (target == 0
+ || !register_operand (target, SImode))
 target = gen_reg_rtx (SImode);
 
   emit_insn (gen_zero_extendqisi2 (target, op2));
Index: testsuite/gcc.target/i386/pr60909-1.c
===
--- testsuite/gcc.target/i386/pr60909-1.c   (revision 0)
+++ testsuite/gcc.target/i386/pr60909-1.c   (working copy)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options -mrdrnd } */
+
+extern void bar (int);
+
+void
+foo (unsigned *u)
+{
+  int i = __builtin_ia32_rdrand32_step (u);
+  bar (i);
+}
Index: testsuite/gcc.target/i386/pr60909-2.c
===
--- testsuite/gcc.target/i386/pr60909-2.c   (revision 0)
+++ testsuite/gcc.target/i386/pr60909-2.c   (working copy)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options -mrdseed } */
+
+extern void bar (int);
+
+void
+foo (unsigned *u)
+{
+  int i = __builtin_ia32_rdseed_si_step (u);
+  bar (i);
+}


Re: Remove obsolete Solaris 9 support

2014-04-23 Thread Uros Bizjak
On Tue, Apr 22, 2014 at 2:35 PM, Rainer Orth
r...@cebitec.uni-bielefeld.de wrote:
 Uros Bizjak ubiz...@gmail.com writes:

 On Wed, Apr 16, 2014 at 1:16 PM, Rainer Orth
 r...@cebitec.uni-bielefeld.de wrote:
 Now that 4.9 has branched, it's time to actually remove the obsolete
 Solaris 9 configuration.  Most of this is just legwork and falls under
 my Solaris maintainership.

 A couple of questions, though:

 * Uros: I'm removing all sse_os_support() checks from the testsuite.
   Solaris 9 was the only consumer, so it seems best to do away with it.

 This is OK, but please leave sse-os-check.h (and corresponding
 sse_os_support calls) in the testsuite. Just remove the Solaris 9
 specific code from sse-os-check.h and always return 1, perhaps with
 the comment that all currently supported OSes support SSE
 instructions.

 Here's the final patch I've checked in, incorporating all review
 comments.  I've left out the libgo (already checked in by Ian) and
 classpath parts.

It looks to me that one part was left in libgcc/config/i386/crtfastmath.c:

#if !defined __x86_64__ && defined __sun__ && defined __svr4__
#include <signal.h>
#include <ucontext.h>
...
#endif


Re: [i386] define __SIZEOF_FLOAT128__

2014-04-24 Thread Uros Bizjak
On Thu, Apr 24, 2014 at 7:35 AM, Marc Glisse marc.gli...@inria.fr wrote:

 (Adding an i386 maintainer in Cc)
 http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00620.html


 On Sun, 13 Apr 2014, Marc Glisse wrote:

 Hello,

 some people like having a macro to test if a type is available
 (__SIZEOF_INT128__ for instance). This adds macros for __float80 and
 __float128. The types seem to be always available, so I didn't add any
 condition.

 If you think this is a bad idea, please close the PR.

 Bootstrap+testsuite on x86_64-linux-gnu.

 2014-04-13  Marc Glisse  marc.gli...@inria.fr

 PR preprocessor/56540
 * config/i386/i386-c.c (ix86_target_macros): Define
 __SIZEOF_FLOAT80__ and __SIZEOF_FLOAT128__.



 For __SIZEOF_FLOAT80__, you should check TARGET_128BIT_LONG_DOUBLE
 instead of TARGET_64BIT.


 Good point, thanks! It now matches i386-modes.def. Is this version (same
 changelog) ok?

A couple of extra defines won't hurt, and maybe they will be useful to someone.

So, if there are no objections in the next 24h, the patch is OK for mainline.

Thanks,
Uros.
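
As a rough illustration of how user code might consume such a macro, here is a minimal sketch. The function name float128_size_or_zero is hypothetical; only the __SIZEOF_FLOAT128__ macro comes from the patch under discussion.

```c
#include <stddef.h>

/* Hypothetical helper: report the size advertised by
   __SIZEOF_FLOAT128__, or 0 when the compiler does not define it.  */
size_t
float128_size_or_zero (void)
{
#ifdef __SIZEOF_FLOAT128__
  return __SIZEOF_FLOAT128__;
#else
  return 0;
#endif
}
```

The same #ifdef pattern can guard declarations that use __float128, in the way __SIZEOF_INT128__ is commonly used for __int128.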


[PATCH, testsuite]: Require vect_simd_clones effective target for c-c++-common/gomp/pr60823-2.c

2014-04-25 Thread Uros Bizjak
Hello!

This is a runtime test, so check if we are able to at least compile the source.

2014-04-25  Uros Bizjak  ubiz...@gmail.com

* c-c++-common/gomp/pr60823-2.c: Require effective target
vect_simd_clones.

Tested on x86_64 CentOS 5.10.

OK for mainline?

Uros.

Index: c-c++-common/gomp/pr60823-2.c
===
--- c-c++-common/gomp/pr60823-2.c   (revision 209778)
+++ c-c++-common/gomp/pr60823-2.c   (working copy)
@@ -1,5 +1,6 @@
 /* PR tree-optimization/60823 */
 /* { dg-do run } */
+/* { dg-require-effective-target vect_simd_clones } */
 /* { dg-options -O2 -fopenmp-simd } */

 #pragma omp declare simd simdlen(4) notinbranch


Re: [COMMITTED] Fix debug/60438 -- i686 stack vs fp operations

2014-04-26 Thread Uros Bizjak
On Sat, Apr 26, 2014 at 11:27 AM, Tom de Vries tom_devr...@mentor.com wrote:
 On 13-03-14 21:49, Richard Henderson wrote:

   (define_expand ldexpxf3
 -  [(set (match_dup 3)
 -   (float:XF (match_operand:SI 2 register_operand)))
 -   (parallel [(set (match_operand:XF 0  register_operand)
 -  (unspec:XF [(match_operand:XF 1 register_operand)
 -  (match_dup 3)]
 - UNSPEC_FSCALE_FRACT))
 - (set (match_dup 4)
 -  (unspec:XF [(match_dup 1) (match_dup 3)]
 - UNSPEC_FSCALE_EXP))])]
 +  [(match_operand:XF 0 register_operand)
 +   (match_operand:XF 1 register_operand)
 +   (match_operand:SI 2 register_operand)]
 TARGET_USE_FANCY_MATH_387
   flag_unsafe_math_optimizations
   {
 @@ -14808,6 +14633,11 @@

 operands[3] = gen_reg_rtx (XFmode);
 operands[4] = gen_reg_rtx (XFmode);
 +
 +  emit_insn (gen_floatsixf2 (operands[3], operands[2]));
 +  emit_insn (gen_fscalexf4_i387 (operands[0], operands[4],
 + operands[1], operands[3]));
 +  DONE;
   })


 Richard,

 For a non-bootstrap x86_64 build, gcc.dg/builtins-34.c fails for me with a
 sigsegv.

 I've traced it back to this code in insn-emit.c:
 ...
 rtx
 gen_ldexpxf3 (rtx operand0,
 rtx operand1,
 rtx operand2)
 {
   rtx _val = 0;
   start_sequence ();
   {
 rtx operands[3];
 operands[0] = operand0;
 operands[1] = operand1;
 operands[2] = operand2;

 {
   if (optimize_insn_for_size_p ())
 FAIL;

   operands[3] = gen_reg_rtx (XFmode);
   operands[4] = gen_reg_rtx (XFmode);
 ...

 operands is declared with size 3, and operands[3,4] accesses are out of
 bounds.

 I've done a minimal build with attached patch, and reran the test-case,
 which passes now.

 OK if bootstrap succeeds?

 2014-04-26  Tom de Vries  t...@codesourcery.com

 * config/i386/i386.md (define_expand ldexpxf3): Fix out-of-bounds
 array accesses.

OK for mainline and 4.9 branch.

Thanks,
Uros.
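
To illustrate the class of bug reported above (the attached patch is not reproduced in this message, so this is not the actual fix): an array declared with three elements cannot hold indices 3 and 4. The sketch below is a hypothetical model in plain C, with ints standing in for rtx values and made-up names, where the array is sized for the named operands plus the two scratch slots.

```c
enum { N_NAMED_OPERANDS = 3, N_SCRATCH_OPERANDS = 2 };

/* Hypothetical model of the generated expander: the array has room for
   the three named operands plus two scratch slots, so indices 3 and 4
   are in bounds.  Returns the number of elements in the array.  */
int
fill_operands (int op0, int op1, int op2)
{
  int operands[N_NAMED_OPERANDS + N_SCRATCH_OPERANDS];

  operands[0] = op0;
  operands[1] = op1;
  operands[2] = op2;
  operands[3] = 30;  /* stands in for gen_reg_rtx (XFmode) */
  operands[4] = 40;  /* stands in for gen_reg_rtx (XFmode) */

  return (int) (sizeof operands / sizeof operands[0]);
}
```

An alternative with the same effect is to keep the scratch values in separate local variables instead of extra array slots.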


[PATCH, testsuite]: Clean dump files

2014-04-26 Thread Uros Bizjak
Hello!

2014-04-26  Uros Bizjak  ubiz...@gmail.com

* gcc.dg/tree-ssa/alias-30.c (dg-options): Dump only fre1 details.
* gcc.dg/vect/pr60505.c: Cleanup vect tree dump.
* g++.dg/ipa/devirt-27.C (dg-options): Remove -fdump-ipa-devirt.

Committed to mainline and 4.9 branch.

Uros.
Index: gcc.dg/tree-ssa/alias-30.c
===
--- gcc.dg/tree-ssa/alias-30.c  (revision 209806)
+++ gcc.dg/tree-ssa/alias-30.c  (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options -O -fdump-tree-fre-details } */
+/* { dg-options -O -fdump-tree-fre1-details } */
 
 extern int posix_memalign(void **memptr,
  __SIZE_TYPE__ alignment, __SIZE_TYPE__ size);
Index: gcc.dg/vect/pr60505.c
===
--- gcc.dg/vect/pr60505.c   (revision 209806)
+++ gcc.dg/vect/pr60505.c   (working copy)
@@ -10,3 +10,5 @@
 out[i] = (ovec[i] = in[i]);
   out[num] = ovec[num/2];
 }
+
+/* { dg-final { cleanup-tree-dump vect } } */
Index: g++.dg/ipa/devirt-27.C
===
--- g++.dg/ipa/devirt-27.C  (revision 209806)
+++ g++.dg/ipa/devirt-27.C  (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options -O3 -fdump-ipa-devirt -fdump-tree-optimized  } */
+/* { dg-options -O3 -fdump-tree-optimized  } */
 struct A
  {
int a;


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-01 Thread Uros Bizjak
On Thu, May 1, 2014 at 6:42 AM, Wei Mi w...@google.com wrote:
 Ping. Is pr58066-3.patch or pr58066-4.patch ok for trunk?

None of these patches have correct ChangeLog entries. Please follow
the rules, outlined in http://gcc.gnu.org/contribute.html (Submitting
Patches section), otherwise your patches will be simply ignored.

 I attached the patch which combined your two patches and the fix in
 legitimize_tls_address. I tried pr58066.c and c.i in ia32/x32/x86_64,
 the code looked fine. Do you think it is ok?

 Thanks,
 Wei.

 Either pr58066-3.patch or pr58066-4.patch looks good to me.

pr58066-4 patch is definitely not OK. I wonder how it works at all,
since you can't split the insn to the same pattern. The generic code
detects this condition and forces ICE (IIRC: this is the reason for
UNSPEC_DIV_ALREADY_SPLIT tag in divmodmode4_1).

From pr58066-3 patch:

-;; Local dynamic of a single variable is a lose.  Show combine how
-;; to convert that back to global dynamic.
-
-(define_insn_and_split *tls_local_dynamic_32_once
-  [(set (match_operand:SI 0 register_operand =a)
-(plus:SI
- (unspec:SI [(match_operand:SI 1 register_operand b)
- (match_operand 2 constant_call_address_operand z)]
-UNSPEC_TLS_LD_BASE)
- (const:SI (unspec:SI
-[(match_operand 3 tls_symbolic_operand)]
-UNSPEC_DTPOFF
-   (clobber (match_scratch:SI 4 =d))
-   (clobber (match_scratch:SI 5 =c))
-   (clobber (reg:CC FLAGS_REG))]
-  
-  #
-  
-  [(parallel
- [(set (match_dup 0)
-   (unspec:SI [(match_dup 1) (match_dup 3) (match_dup 2)]
-  UNSPEC_TLS_GD))
-  (clobber (match_dup 4))
-  (clobber (match_dup 5))
-  (clobber (reg:CC FLAGS_REG))])])

Why did you remove this splitter?

Please do not write:

+{
+  ix86_tls_descriptor_calls_expanded_in_cfun = true;
+})

but use a short form:

+  ix86_tls_descriptor_calls_expanded_in_cfun = true;)

Please also add a testcase (from one of the previous mails):

--- testsuite/gcc.dg/pr58066.c (revision 0)
+++ testsuite/gcc.dg/pr58066.c (revision 0)

Put this test to gcc.target/i386 directory ...
@@ -0,0 +1,18 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && { ! ia32 } } } } */

... to avoid target selector.

+/* { dg-options -fPIC -O2 } */
+
+/* Check whether the stack frame starting addresses of tls expanded calls
+   in foo and goo are 16bytes aligned.  */
+static __thread char ccc1;
+void* foo()
+{
+ return &ccc1;
+}
+
+__thread char ccc2;
+void* goo()
+{
+ return &ccc2;
+}
+
+/* { dg-final { scan-assembler-times .cfi_def_cfa_offset 16 2 } } */

Please repost the complete patch with a proper ChangeLog.

Uros.


Re: Fix various x86 tests for --with-arch=bdver3 --with-cpu=bdver3

2014-05-05 Thread Uros Bizjak
On Mon, May 5, 2014 at 6:44 PM, Joseph S. Myers jos...@codesourcery.com wrote:

 These are due to TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL tuning flag.
 Currently, this flag applies to all vector sizes (128, 256 and 512
 bits), but I guess it is effective only for 128 bit sizes. Can you
 please review usage of this flag in i386/sse.md?

 Indeed, the optimization as described in
 http://gcc.gnu.org/ml/gcc-patches/2010-04/msg01464.html is purely
 about reducing code size, and is irrelevant in VEX-prefixed cases.
 Thus, this patch adds MODE_SIZE == 16 conditionals in relevant cases
 (some cases already had such conditionals or otherwise wouldn't be
 used for larger vectors).

 Tested with no regressions for x86_64-linux-gnu (--with-arch=bdver3
 --with-cpu=bdver3, where it fixes most of the remaining scan-assembler
 test failures).  OK to commit?

 2014-05-05  Joseph Myers  jos...@codesourcery.com

 * config/i386/sse.md (*movmode_internal)
 (*sse_loadussemodesuffixavxsizesuffixmask_name)
 (*sse2_avx_avx512f_loaddqumodemask_name)
 (sse_andnotmode3, codemode3, *andnotmode3)
 (*codemode3, *andnotmode3mask_name)
 (mask_codeforcodemode3mask_name): Only consider
 TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL for modes of size 16.

This is OK for mainline with a slight change below.

 Index: gcc/config/i386/sse.md
 ===
 --- gcc/config/i386/sse.md  (revision 209980)
 +++ gcc/config/i386/sse.md  (working copy)
 @@ -758,7 +758,8 @@
[(set_attr type sselog1,ssemov,ssemov)
 (set_attr prefix maybe_vex)
 (set (attr mode)
 -   (cond [(match_test TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
 +   (cond [(and (match_test MODE_SIZE == 16)
 +   (match_test TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL))
  (const_string ssePSmode)
(and (match_test MODE_SIZE == 16)
 (and (eq_attr alternative 2)

Please merge the changed first and the second conditional to:

(cond [(and (match_test MODE_SIZE == 16)
   (ior (match_test TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
 (and (eq_attr alternative 2)
  (match_test TARGET_SSE_TYPELESS_STORES
 (const_string ssePSmode)

Thanks,
Uros.
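
The requested merge relies on both conditions selecting the same result, so the common MODE_SIZE == 16 test can be factored out. A small plain-C sketch of that boolean identity follows; the function and parameter names are hypothetical stand-ins for the match_test and eq_attr expressions, not GCC code.

```c
/* Two separate conditions that select the same mode.  */
int
separate_conditions (int size16, int packed_single, int alt2, int typeless)
{
  return (size16 && packed_single) || (size16 && alt2 && typeless);
}

/* Merged form: the common size test is factored out.  */
int
merged_condition (int size16, int packed_single, int alt2, int typeless)
{
  return size16 && (packed_single || (alt2 && typeless));
}

/* Check all 16 input combinations; returns 1 if the two forms agree.  */
int
forms_agree (void)
{
  for (int bits = 0; bits < 16; bits++)
    {
      int a = (bits >> 0) & 1;
      int b = (bits >> 1) & 1;
      int c = (bits >> 2) & 1;
      int d = (bits >> 3) & 1;

      if (separate_conditions (a, b, c, d) != merged_condition (a, b, c, d))
        return 0;
    }
  return 1;
}
```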


Fwd: [PATCH, alpha]: Fix PR61092, wide-int merge broke alpha bootstrap

2014-05-08 Thread Uros Bizjak
Hello!

Wide-int merge triggered the following ICE:

In file included from ../../gcc-svn/trunk/gcc/wide-int.cc:37:0:
../../gcc-svn/trunk/gcc/wide-int.cc: In function ‘unsigned int
wi::mul_internal(long int*, const long int*, unsigned int, const long
int*, unsigned int, unsigned int, signop, bool*, bool)’:
../../gcc-svn/trunk/gcc/../include/longlong.h:145:10: sorry,
unimplemented: unexpected AST of kind mult_highpart_expr
 (ph) = __builtin_alpha_umulh (__m0, __m1);\
  ^
../../gcc-svn/trunk/gcc/wide-int.cc:1269:4: note: in expansion of
macro ‘umul_ppmm’
umul_ppmm (val[1], val[0], op1.ulow (), op2.ulow ());
^
../../gcc-svn/trunk/gcc/../include/longlong.h:145:10: internal
compiler error: in potential_constant_expression_1, at
cp/semantics.c:10575
 (ph) = __builtin_alpha_umulh (__m0, __m1);\
  ^
../../gcc-svn/trunk/gcc/wide-int.cc:1269:4: note: in expansion of
macro ‘umul_ppmm’
umul_ppmm (val[1], val[0], op1.ulow (), op2.ulow ());
^

As instructed by Jakub, target builtins should be folded during gimplification.

2014-05-08  Uros Bizjak  ubiz...@gmail.com

PR target/61092
* config/alpha/alpha.c: Include gimple-iterator.h.
(alpha_gimple_fold_builtin): New function.  Move
ALPHA_BUILTIN_UMULH folding from ...
(alpha_fold_builtin): ... here.
(TARGET_GIMPLE_FOLD_BUILTIN): New define.

Patch was bootstrapped and regression tested on
alphaev68-pc-linux-gnu. If there are no objections, I will commit the
patch to mainline and 4.9.

Uros.
Index: config/alpha/alpha.c
===
--- config/alpha/alpha.c (revision 210120)
+++ config/alpha/alpha.c (working copy)
@@ -62,6 +62,7 @@ along with GCC; see the file COPYING3.  If not see
 #include gimple-expr.h
 #include is-a.h
 #include gimple.h
+#include gimple-iterator.h
 #include gimplify.h
 #include gimple-ssa.h
 #include stringpool.h
@@ -7042,9 +7043,6 @@ alpha_fold_builtin (tree fndecl, int n_args, tree
 case ALPHA_BUILTIN_MSKQH:
   return alpha_fold_builtin_mskxx (op, opint, op_const, 0xff, true);
 
-case ALPHA_BUILTIN_UMULH:
-  return fold_build2 (MULT_HIGHPART_EXPR, alpha_dimode_u, op[0], op[1]);
-
 case ALPHA_BUILTIN_ZAP:
   opint[1] ^= 0xff;
   /* FALLTHRU */
@@ -7094,6 +7092,49 @@ alpha_fold_builtin (tree fndecl, int n_args, tree
   return NULL;
 }
 }
+
+bool
+alpha_gimple_fold_builtin (gimple_stmt_iterator *gsi)
+{
+  bool changed = false;
+  gimple stmt = gsi_stmt (*gsi);
+  tree call = gimple_call_fn (stmt);
+  gimple new_stmt = NULL;
+
+  if (call)
+{
+  tree fndecl = gimple_call_fndecl (stmt);
+
+  if (fndecl)
+   {
+ tree arg0, arg1;
+
+ switch (DECL_FUNCTION_CODE (fndecl))
+   {
+   case ALPHA_BUILTIN_UMULH:
+ arg0 = gimple_call_arg (stmt, 0);
+ arg1 = gimple_call_arg (stmt, 1);
+
+ new_stmt
+   = gimple_build_assign_with_ops (MULT_HIGHPART_EXPR,
+   gimple_call_lhs (stmt),
+   arg0,
+   arg1);
+ break;
+   default:
+ break;
+   }
+   }
+}
+
+  if (new_stmt)
+{
+  gsi_replace (gsi, new_stmt, true);
+  changed = true;
+}
+
+  return changed;
+}
 
 /* This page contains routines that are used to determine what the function
prologue and epilogue code will do and write them out.  */
@@ -9790,6 +9831,8 @@ alpha_canonicalize_comparison (int *code, rtx *op0
 #define TARGET_EXPAND_BUILTIN alpha_expand_builtin
 #undef  TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN alpha_fold_builtin
+#undef  TARGET_GIMPLE_FOLD_BUILTIN
+#define TARGET_GIMPLE_FOLD_BUILTIN alpha_gimple_fold_builtin
 
 #undef TARGET_FUNCTION_OK_FOR_SIBCALL
 #define TARGET_FUNCTION_OK_FOR_SIBCALL alpha_function_ok_for_sibcall
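
For readers unfamiliar with the operation being folded: a multiply-highpart of two unsigned 64-bit values yields the upper 64 bits of their 128-bit product. The following is a portable sketch of that computation in plain C; the function name umulh_portable is hypothetical, and this is neither the Alpha builtin nor GCC's implementation.

```c
#include <stdint.h>

/* Hypothetical portable model of an unsigned multiply-highpart:
   return the upper 64 bits of the 128-bit product a * b, built from
   four 32x32->64 partial products.  */
uint64_t
umulh_portable (uint64_t a, uint64_t b)
{
  uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
  uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

  uint64_t lo_lo = a_lo * b_lo;
  uint64_t hi_lo = a_hi * b_lo;
  uint64_t lo_hi = a_lo * b_hi;
  uint64_t hi_hi = a_hi * b_hi;

  /* Sum the middle terms; this cannot overflow 64 bits.  */
  uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFu) + lo_hi;

  return hi_hi + (hi_lo >> 32) + (cross >> 32);
}
```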


Re: Fix some tests for TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL

2014-05-08 Thread Uros Bizjak
On Thu, May 8, 2014 at 3:10 AM, Joseph S. Myers jos...@codesourcery.com wrote:
 Having fixed TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL to apply only to
 128-bit vectors, some --with-arch=bdver3 --with-cpu=bdver3
 scan-assembler failures relating to that tuning remain, because of
 different choices of instructions for 128-bit vectors from the choices
 expected by the tests.

 This patch fixes affected tests to allow the different instruction
 choices seen in this case.  Tested for x86_64-linux-gnu
 (--with-arch=bdver3 --with-cpu=bdver3).  OK to commit?

OK.

Thanks,
Uros.


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-08 Thread Uros Bizjak
On Thu, May 8, 2014 at 12:59 AM, Wei Mi w...@google.com wrote:

 The calls added in the templates of tls_local_dynamic_base_32 and
 tls_global_dynamic_32 in pr58066-3.patch are used to prevent sched2
 from moving sp setting across implicit tls calls, but those calls make
 the combine of UNSPEC_TLS_LD_BASE and UNSPEC_DTPOFF difficult, so that
 the optimization in tls_local_dynamic_32_once to convert local_dynamic
 to global_dynamic mode for single tls reference cannot take effect. In
 the updated patch, I remove those calls from insn templates and add
 reg:SI SP_REG explicitly in the templates of UNSPEC_TLS_GD and
 UNSPEC_TLS_LD_BASE. It solves the sched2 and combine problems above,
 and now the optimization in tls_local_dynamic_32_once works.

 bootstrapped ok on x86_64-linux-gnu. regression is going on. Is it OK
 if regression passes?

Please update ChangeLog with all changes, see below:

 ChangeLog:

 gcc/
 2014-05-07  Wei Mi  w...@google.com

 * config/i386/i386.c (ix86_compute_frame_layout):
 preferred_stack_boundary updated for tls expanded call.

(...): Update preferred_stack_boundary for call, expanded from tls descriptor.

 * config/i386/i386.md: Set ix86_tls_descriptor_calls_expanded_in_cfun.

* config/i386/i386.md (*tls_global_dynamic_32_gnu): Depend on SP register.
(*tls_local_dynamic_base_32_gnu): Ditto.
...
(tls_global_dynamic_32): Set
ix86_tls_descriptor_calls_expanded_in_cfun.  Update RTX to depend on
SP register.
(tls_local_dynamic_base_32): Ditto.
...

The patch is OK for mainline with updated and complete ChangeLog entry.

Thanks,
Uros.


[PATCH, i386]: Fix PR59952, -march=core-avx2 should not enable RTM

2014-05-08 Thread Uros Bizjak
Hello!

Apparently, not all Haswell processors have TSX. Attached patch
removes PTA_RTM from default Haswell flags. PTA_HLE still makes sense
for Haswell processors, since the prefix is ignored on non-TSX
processors.

2014-05-08  Uros Bizjak  ubiz...@gmail.com

PR target/59952
* config/i386/i386.c (PTA_HASWELL): Remove PTA_RTM.

Bootstrapped and regression tested on x86_64-pc-linux-gnu. The patch
is committed on mainline, will be backported to all relevant release
branches.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 210231)
+++ config/i386/i386.c  (working copy)
@@ -3130,7 +3130,7 @@ ix86_option_override_internal (bool main_args_p,
   (PTA_SANDYBRIDGE | PTA_FSGSBASE | PTA_RDRND | PTA_F16C)
 #define PTA_HASWELL \
   (PTA_IVYBRIDGE | PTA_AVX2 | PTA_BMI | PTA_BMI2 | PTA_LZCNT \
-   | PTA_FMA | PTA_MOVBE | PTA_RTM | PTA_HLE)
+   | PTA_FMA | PTA_MOVBE | PTA_HLE)
 #define PTA_BROADWELL \
   (PTA_HASWELL | PTA_ADX | PTA_PRFCHW | PTA_RDSEED)
 #define PTA_BONNELL \
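
The effect of dropping one flag from a composed mask can be sketched in plain C as follows. The FEAT_* and CPU_* names and their bit positions are hypothetical; the real PTA_* values in i386.c differ.

```c
/* Hypothetical bit assignments, for illustration only.  */
#define FEAT_FMA   (1u << 0)
#define FEAT_MOVBE (1u << 1)
#define FEAT_RTM   (1u << 2)
#define FEAT_HLE   (1u << 3)

/* Default feature set before and after removing the RTM bit.  */
#define CPU_HASWELL_OLD (FEAT_FMA | FEAT_MOVBE | FEAT_RTM | FEAT_HLE)
#define CPU_HASWELL_NEW (FEAT_FMA | FEAT_MOVBE | FEAT_HLE)

/* Return 1 if FEATURE is set in MASK, 0 otherwise.  */
int
has_feature (unsigned mask, unsigned feature)
{
  return (mask & feature) != 0;
}
```

In this model the new default no longer implies RTM, while HLE remains part of the set.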


Re: [PATCH][x86] Support clflushopt, xsaves, xsavec.

2014-05-12 Thread Uros Bizjak
On Mon, May 12, 2014 at 3:25 PM, Ilya Tocar tocarip.in...@gmail.com wrote:

 This patch adds support for xsavec, xsaves ISA extensions, introduced in
 [1], and clflushopt introduced in [2].

 [1]http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
 [2]http://software.intel.com/en-us/file/319433-018pdf

 Bootstraps, passes make-check.

Please also add new options to g++.dg/other/i386-{2,3}.C and
gcc.target/i386/sse-{14,15,22,23}.c.

Uros.


Re: [PATCH][x86] Support clflushopt, xsaves, xsavec.

2014-05-13 Thread Uros Bizjak
On Tue, May 13, 2014 at 11:18 AM, Ilya Tocar tocarip.in...@gmail.com wrote:

  This patch adds support for xsavec, xsaves ISA extensions, introduced in
  [1], and clflushopt introduced in [2].
 
  [1]http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
  [2]http://software.intel.com/en-us/file/319433-018pdf
 
  Bootstraps, passes make-check.

 Please also add new options to g++.dg/other/i386-{2,3}.C and
 gcc.target/i386/sse-{14,15,22,23}.c.

 Uros.

 Done.
 Looks like sse-15 doesn't need new options, I've assumed sse-12/13.

Yes, you are right.

 Changelog:

 2014-05-12  Ilya Tocar  ilya.to...@intel.com

 * common/config/i386/i386-common.c
 (OPTION_MASK_ISA_CLFLUSHOPT_SET): Define.
 (OPTION_MASK_ISA_XSAVES_SET): Ditto.
 (OPTION_MASK_ISA_XSAVEC_SET): Ditto.
 (OPTION_MASK_ISA_CLFLUSHOPT_UNSET): Ditto.
 (OPTION_MASK_ISA_XSAVES_UNSET): Ditto.
 (OPTION_MASK_ISA_XSAVEC_UNSET): Ditto.
 (ix86_handle_option): Handle OPT_mxsavec, OPT_mxsaves,
 OPT_mclflushopt.
 * config.gcc (i[34567]86-*-*): Add clflushoptintrin.h,
 xsavecintrin.h, xsavesintrin.h.
 (x86_64-*-*): Ditto.
 * config/i386/clflushoptintrin.h: New.
 * config/i386/xsavecintrin.h: Ditto.
 * config/i386/xsavesintrin.h: Ditto.
 * config/i386/cpuid.h (bit_CLFLUSHOPT): Define.
 (bit_XSAVES): Ditto.
 (bit_XSAVEC): Ditto.
 * config/i386/driver-i386.c (host_detect_local_cpu): Handle
 -mclflushopt, -mxsavec, -mxsaves, -mno-xsaves, -mno-xsavec,
 -mno-clflushopt.
 * config/i386/i386-c.c (ix86_target_macros_internal): Handle
 OPTION_MASK_ISA_CLFLUSHOPT, OPTION_MASK_ISA_XSAVEC,
 OPTION_MASK_ISA_XSAVES.
 * config/i386/i386.c (ix86_target_string): Handle -mclflushopt,
 -mxsavec, -mxsaves.
 (PTA_CLFLUSHOPT): Define.
 (PTA_XSAVEC): Ditto.
 (PTA_XSAVES): Ditto.
 (ix86_option_override_internal): Handle new options.
 (ix86_valid_target_attribute_inner_p): Ditto.
 (ix86_builtins): Add IX86_BUILTIN_XSAVEC, IX86_BUILTIN_XSAVEC64,
 IX86_BUILTIN_XSAVES, IX86_BUILTIN_XRSTORS, IX86_BUILTIN_XSAVES64,
 IX86_BUILTIN_XRSTORS64, IX86_BUILTIN_CLFLUSHOPT.
 (bdesc_special_args): Add __builtin_ia32_xsaves, 
 __builtin_ia32_xrstors,
 __builtin_ia32_xsavec, __builtin_ia32_xsaves64, 
 __builtin_ia32_xrstors64,
 __builtin_ia32_xsavec64.
 (ix86_init_mmx_sse_builtins): Add __builtin_ia32_clflushopt.
 (ix86_expand_builtin): Handle new builtins.
 * config/i386/i386.h (TARGET_CLFLUSHOPT): Define.
 (TARGET_CLFLUSHOPT_P): Ditto.
 (TARGET_XSAVEC): Ditto.
 (TARGET_XSAVEC_P): Ditto.
 (TARGET_XSAVES): Ditto.
 (TARGET_XSAVES_P): Ditto.
 * config/i386/i386.md (ANY_XSAVE): Add UNSPECV_XSAVEC, UNSPECV_XSAVES.
 (ANY_XSAVE64): Add UNSPECV_XSAVEC64, UNSPECV_XSAVES64.
 (attr xsave): Add xsavec, xsavec64, xsaves, xsaves64.
 (ANY_XRSTOR): New.
 (ANY_XRSTOR64): Ditto.
 (xrstor): Ditto.
 (xrstor): Change into xrstor.
 (xrstor_rex64): Change into xrstor_rex64.
 (xrstor64): Change into xrstor64.
 (clflushopt): New.
 * config/i386/i386.opt (mclflushopt): New.
 (mxsavec): Ditto.
 (mxsaves): Ditto.
 * config/i386/x86intrin.h: Add clflushoptintrin.h, xsavesintrin.h,
 xsavecintrin.h.
 * doc/invoke.texi: Document new options.

 And for tests:

 2014-05-12  Ilya Tocar  ilya.to...@intel.com
 * gcc.target/i386/clflushopt-1.c: New.
 * gcc.target/i386/xsavec-1.c: Ditto.
 * gcc.target/i386/xsavec64-1.c: Ditto.
 * gcc.target/i386/xsaves-1.c: Ditto.
 * gcc.target/i386/xsaves64-1.c: Ditto.
 * gcc.target/i386/sse-12.c: Test new options.
 * gcc.target/i386/sse-13.c: Ditto.
 * gcc.target/i386/sse-14.c: Ditto.
 * gcc.target/i386/sse-22.c: Ditto.
 * gcc.target/i386/sse-23.c: Ditto.
 * g++.dg/other/i386-2.C: Ditto.
 * g++.dg/other/i386-3.C: Ditto.

This is OK for mainline.

Thanks,
Uros.


Re: [PATCH] Fix PR 60901

2014-05-14 Thread Uros Bizjak
On Wed, May 14, 2014 at 10:57 AM, Andrey Belevantsev a...@ispras.ru wrote:

 This ICE comes from the ix86_dependencies_evaluation_hook code assumption
 that any scheduling region will be connected.  This assumption is not
 correct in case of the outer loops pipelining of the selective scheduler as
 explained in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60901#c3.
 Trying to add dependencies between insns from the different scheduling
 regions results in a segfault within the dependency analyzer code.

 The fix is to adjust the code to account for the situation when basic
 block's predecessors do not belong to the same scheduling region.

 Bootstrapped and tested on x86-64, OK for trunk?  Branches?  The fix is low
 risk as the additional test should always be true for the regular scheduler.

I don't know all scheduler details, so your opinion counts there.
Let's put this fix to mainline first and after a week without
problems, backport it to all release branches.

 gcc/
 2014-05-14  Andrey Belevantsev  a...@ispras.ru

 PR rtl-optimization/60901

 * config/i386/i386.c (ix86_dependencies_evaluation_hook): Check that
 bb predecessor belongs to the same scheduling region.  Adjust comment.

 testsuite/
 2014-05-14  Andrey Belevantsev  a...@ispras.ru

 PR rtl-optimization/60901
 * gcc.dg/pr60901.c: New test.

+/* { dg-do compile { target powerpc*-*-* ia64-*-* x86_64-*-* } } */
+/* { dg-options -O -fselective-scheduling -fschedule-insns
-fsel-sched-pipelining -fsel-sched-pipelining-outer-loops
-fno-tree-dominator-opts  } */
+

As this is clearly a target bug, let's put the test in gcc.target/i386
directory. You can remove target selector, and the test will run for
64bit and 32bit targets automatically.

Thanks,
Uros.


Re: Make check_effective_target_vect_sizes_32B_16B handle -mprefer-avx128

2014-05-17 Thread Uros Bizjak
On Sat, May 10, 2014 at 6:00 PM, Joseph S. Myers
jos...@codesourcery.com wrote:
 In general, vectorization tests whose expectations on x86 depend on
 whether AVX is available should only consider AVX available if
 -mprefer-avx128 is not enabled.  Some of the effective-target
 functions in target-supports.exp handle this properly, but
 check_effective_target_vect_sizes_32B_16B does not do so, resulting in
 various test failures in configurations with -mprefer-avx128.

 This patch makes check_effective_target_vect_sizes_32B_16B follow
 other functions in checking check_prefer_avx128.  It fixes the
 following failures for x86_64-linux-gnu --with-arch=bdver3
 --with-tune=bdver3.  OK to commit?

 FAIL: gcc.dg/vect/vect-over-widen-1.c scan-tree-dump-times vect 
 vect_recog_over_widening_pattern: detected 8
 FAIL: gcc.dg/vect/vect-over-widen-4.c scan-tree-dump-times vect 
 vect_recog_over_widening_pattern: detected 8
 FAIL: gcc.dg/vect/slp-perm-9.c scan-tree-dump-times vect vectorized 1 loops 
 2
 FAIL: gcc.dg/vect/slp-perm-9.c scan-tree-dump-times vect vectorizing stmts 
 using SLP 1
 FAIL: gcc.dg/vect/vect-over-widen-1.c -flto -ffat-lto-objects  
 scan-tree-dump-times vect vect_recog_over_widening_pattern: detected 8
 FAIL: gcc.dg/vect/vect-over-widen-4.c -flto -ffat-lto-objects  
 scan-tree-dump-times vect vect_recog_over_widening_pattern: detected 8
 FAIL: gcc.dg/vect/slp-perm-9.c -flto -ffat-lto-objects  scan-tree-dump-times 
 vect vectorized 1 loops 2
 FAIL: gcc.dg/vect/slp-perm-9.c -flto -ffat-lto-objects  scan-tree-dump-times 
 vect vectorizing stmts using SLP 1

 (I still see some gcc.dg/vect/costmodel/ failures that are unchanged
 by this patch: costmodel-vect-31.c, costmodel-vect-68.c and
 costmodel-fast-math-vect-pr29925.c.)

 2014-05-10  Joseph Myers  jos...@codesourcery.com

 * lib/target-supports.exp
 (check_effective_target_vect_sizes_32B_16B): Return false if
 128-bit AVX vectors preferred.

This is OK.

Thanks,
Uros.


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-17 Thread Uros Bizjak
On Mon, May 12, 2014 at 7:38 PM, Wei Mi w...@google.com wrote:
 Here is a patch for the test. It contains two changes:
 1. For emutls, there will be an explicit call generated at expand
 pass, and no stack adjustment is needed. So add /* {
 dg-require-effective-target tls_native } */ in the test.
 2. Replace cfi_def_cfa_offset with insn sequence check.

 Is it ok?

 No, the test FAILs for 32-bit i386-pc-solaris2.11 with Sun as/ld:

 FAIL: gcc.target/i386/pr58066.c scan-assembler 
 sub[^\r\n]*8[^\r\n]*sp.*call[^\r\n]*__tls_get_addr.*sub[^\r\n]*8[^\r\n]*sp.*call[^\r\n]*__tls_get_addr

 The TLS code sequence is different here:

 subl$8, %esp
 lealccc1@tlsgd(,%ebx,1), %eax
 callccc1@tlsgdplt

 I fear this insn scanning is going to be extremely fragile.

 Rainer

 Thanks for trying the testcase. rtl scanning will be slightly better
 than assembly scanning. So how about this one?

This is OK, with a small effective-target addition, as shown below.

Thanks,
Uros.


 Index: testsuite/gcc.target/i386/pr58066.c
 ===
 --- testsuite/gcc.target/i386/pr58066.c (revision 210222)
 +++ testsuite/gcc.target/i386/pr58066.c (working copy)
 @@ -1,5 +1,6 @@
  /* { dg-do compile } */
 -/* { dg-options -fPIC -O2 } */
 +/* { dg-require-effective-target tls_native } */

Please also add

/* { dg-require-effective-target fpic } */

 +/* { dg-options -fPIC -fomit-frame-pointer -O2 -fdump-rtl-final } */

  /* Check whether the stack frame starting addresses of tls expanded calls
 in foo and goo are 16bytes aligned.  */
 @@ -15,4 +16,6 @@ void* goo()
   return ccc2;
  }

 -/* { dg-final { scan-assembler-times .cfi_def_cfa_offset 16 2 } } */
 +/* { dg-final { scan-rtl-dump Function
 foo.*set\[^\r\n\]*sp\\)\[\r\n\]\[^\r\n\]*plus\[^\r\n\]*sp\\)\[\r\n\]\[^\r\n\]*const_int
 -8.*UNSPEC_TLS.*Function goo final } } */
 +/* { dg-final { scan-rtl-dump Function
 goo.*set\[^\r\n\]*sp\\)\[\r\n\]\[^\r\n\]*plus\[^\r\n\]*sp\\)\[\r\n\]\[^\r\n\]*const_int
 -8.*UNSPEC_TLS final } } */
 +/* { dg-final { cleanup-rtl-dump final } } */


Re: patch to fix PR60969

2014-05-17 Thread Uros Bizjak
Hello!

Attached patch enhances the testcase to also check for presence of MMX
registers on all 32bit x86 targets.

2014-05-17  Uros Bizjak  ubiz...@gmail.com

* g++.dg/pr60969.C: Compile for all ilp32 x86 targets.
(dg-options): Add -mfpmath=387.
(dg-final): Check that no MMX registers are used.

Tested on x86-64-pc-linux-gnu {,-m32} and committed to mainline and 4.9 branch.

Uros.
Index: pr60969.C
===
--- pr60969.C   (revision 210549)
+++ pr60969.C   (working copy)
@@ -1,5 +1,5 @@
-/* { dg-do compile { target i?86-*-* } } */
-/* { dg-options "-O2 -ftree-vectorize -march=pentium4" } */
+/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && ilp32 } } } */
+/* { dg-options "-O2 -ftree-vectorize -march=pentium4 -mfpmath=387" } */
 
 struct A
 {
@@ -28,3 +28,5 @@
 }
   return x;
 }
+
+/* { dg-final { scan-assembler-not "%mm" } } */


[PATCH, doc]: Improve -free description

2014-05-17 Thread Uros Bizjak
Hello!

-free defaults to enabled also for Alpha.  The option is also enabled
for -Os on all targets.

2014-05-17  Uros Bizjak  ubiz...@gmail.com

* doc/invoke.texi (free): Mention Alpha.  Also enabled at -Os.

Committed to 4.9 and mainline SVN.

Uros.
Index: invoke.texi
===
--- invoke.texi (revision 210549)
+++ invoke.texi (working copy)
@@ -7463,7 +7463,8 @@
 helpful for the x86-64 architecture, which implicitly zero-extends in 64-bit
 registers after writing to their lower 32-bit half.
 
-Enabled for AArch64 and x86 at levels @option{-O2}, @option{-O3}.
+Enabled for Alpha, AArch64 and x86 at levels @option{-O2},
+@option{-O3}, @option{-Os}.
 
 @item -flive-range-shrinkage
 @opindex flive-range-shrinkage


[PATCH, doc]: Fix a bunch of warnings in *.texi files

2014-05-17 Thread Uros Bizjak
Hello!

Attached patch fixes:

md.texi:1057: warning: node next `Constraints' in menu `Asm Labels'
and in sectioning `Size of an asm' differ
extend.texi:7175: warning: node `Asm Labels' is next for `Size of an
asm' in sectioning but not in menu
extend.texi:7175: warning: node prev `Size of an asm' in menu
`Explicit Reg Vars' and in sectioning `Constraints' differ
extend.texi:7197: warning: node prev `Asm Labels' in menu
`Constraints' and in sectioning `Size of an asm' differ
extend.texi:7245: warning: node `Size of an asm' is next for `Explicit
Reg Vars' in menu but not in sectioning

as seen when compiling on Fedora 20.

2014-05-17  Uros Bizjak  ubiz...@gmail.com

* doc/extend.texi (Size of an asm): Move node text according
to its @menu entry position.

Tested on x86_64-pc-linux-gnu, committed to mainline SVN.

Uros.
Index: doc/extend.texi
===
--- doc/extend.texi (revision 210549)
+++ doc/extend.texi (working copy)
@@ -7172,28 +7172,6 @@
 @include md.texi
 @raisesections
 
-@node Size of an asm
-@subsection Size of an @code{asm}
-
-Some targets require that GCC track the size of each instruction used
-in order to generate correct code.  Because the final length of the
-code produced by an @code{asm} statement is only known by the
-assembler, GCC must make an estimate as to how big it will be.  It
-does this by counting the number of instructions in the pattern of the
-@code{asm} and multiplying that by the length of the longest
-instruction supported by that processor.  (When working out the number
-of instructions, it assumes that any occurrence of a newline or of
-whatever statement separator character is supported by the assembler --
-typically @samp{;} --- indicates the end of an instruction.)
-
-Normally, GCC's estimate is adequate to ensure that correct
-code is generated, but it is possible to confuse the compiler if you use
-pseudo instructions or assembler macros that expand into multiple real
-instructions, or if you use assembler directives that expand to more
-space in the object file than is needed for a single instruction.
-If this happens then the assembler may produce a diagnostic saying that
-a label is unreachable.
-
 @node Asm Labels
 @subsection Controlling Names Used in Assembler Code
 @cindex assembler names for identifiers
@@ -7277,6 +7255,28 @@
 specified for that operand in the @code{asm}.)
 @end itemize
 
+@node Size of an asm
+@subsection Size of an @code{asm}
+
+Some targets require that GCC track the size of each instruction used
+in order to generate correct code.  Because the final length of the
+code produced by an @code{asm} statement is only known by the
+assembler, GCC must make an estimate as to how big it will be.  It
+does this by counting the number of instructions in the pattern of the
+@code{asm} and multiplying that by the length of the longest
+instruction supported by that processor.  (When working out the number
+of instructions, it assumes that any occurrence of a newline or of
+whatever statement separator character is supported by the assembler --
+typically @samp{;} --- indicates the end of an instruction.)
+
+Normally, GCC's estimate is adequate to ensure that correct
+code is generated, but it is possible to confuse the compiler if you use
+pseudo instructions or assembler macros that expand into multiple real
+instructions, or if you use assembler directives that expand to more
+space in the object file than is needed for a single instruction.
+If this happens then the assembler may produce a diagnostic saying that
+a label is unreachable.
+
 @menu
 * Global Reg Vars::
 * Local Reg Vars::


[PATCH, libgomp doc]: Fix all libgomp.texi warnings

2014-05-17 Thread Uros Bizjak
Hello!

Attached patch fixes following libgomp.texi warnings:

libgomp.texi:169: warning: multiple @menu
libgomp.texi:184: warning: multiple @menu
libgomp.texi:914: warning: node `omp_init_lock' is next for
`omp_set_schedule' in sectioning but not in menu
libgomp.texi:947: warning: node `omp_set_schedule' is prev for
`omp_init_lock' in sectioning but not in menu
libgomp.texi:1206: warning: node `omp_get_wtick' is next for
`omp_destroy_nest_lock' in sectioning but not in menu
libgomp.texi:1233: warning: node `omp_destroy_nest_lock' is prev for
`omp_get_wtick' in sectioning but not in menu
libgomp.texi:1431: warning: node next `OMP_NUM_THREADS' in menu
`OMP_PROC_BIND' and in sectioning `OMP_PLACES' differ
libgomp.texi:1451: warning: node next `OMP_PLACES' in menu
`OMP_STACKSIZE' and in sectioning `OMP_PROC_BIND' differ
libgomp.texi:1451: warning: node prev `OMP_PLACES' in menu
`OMP_PROC_BIND' and in sectioning `OMP_NUM_THREADS' differ
libgomp.texi:1493: warning: node next `OMP_PROC_BIND' in menu
`OMP_PLACES' and in sectioning `OMP_SCHEDULE' differ
libgomp.texi:1493: warning: node prev `OMP_PROC_BIND' in menu
`OMP_NUM_THREADS' and in sectioning `OMP_PLACES' differ
libgomp.texi:1520: warning: node next `OMP_SCHEDULE' in menu
`OMP_THREAD_LIMIT' and in sectioning `OMP_STACKSIZE' differ
libgomp.texi:1520: warning: node prev `OMP_SCHEDULE' in menu
`OMP_STACKSIZE' and in sectioning `OMP_PROC_BIND' differ
libgomp.texi:1541: warning: node next `OMP_STACKSIZE' in menu
`OMP_SCHEDULE' and in sectioning `OMP_THREAD_LIMIT' differ
libgomp.texi:1541: warning: node prev `OMP_STACKSIZE' in menu
`OMP_PLACES' and in sectioning `OMP_SCHEDULE' differ
libgomp.texi:1561: warning: node prev `OMP_THREAD_LIMIT' in menu
`OMP_SCHEDULE' and in sectioning `OMP_STACKSIZE' differ

these are seen when compiling libgomp on Fedora20.

The menu in Runtime Library Routines now looks this way:

--cut here--
2 Runtime Library Routines
**

The runtime routines described here are defined by Section 3 of the
OpenMP specification in version 4.0.  The routines are structured in
following three parts:

* Menu:

Control threads, processors and the parallel environment.  They have C
linkage, and do not throw exceptions.

* omp_get_active_level::Number of active parallel regions
* omp_get_ancestor_thread_num:: Ancestor thread ID
* omp_get_cancellation::Whether cancellation support is enabled
...
* omp_set_nested::  Enable/disable nested parallel regions
* omp_set_num_threads:: Set upper team size limit
* omp_set_schedule::Set the runtime scheduling method

Initialize, set, test, unset and destroy simple and nested locks.

* omp_init_lock::Initialize simple lock
* omp_set_lock:: Wait for and set simple lock
...
* omp_unset_nest_lock::  Unset nested lock
* omp_destroy_nest_lock::Destroy nested lock

Portable, thread-based, wall clock timer.

* omp_get_wtick::Get timer precision.
* omp_get_wtime::Elapsed wall clock time.
--cut here--

i.e., without extra * Menu entries, which is probably what was meant from the beginning.

2014-05-17  Uros Bizjak  ubiz...@gmail.com

* libgomp.texi (Runtime Library Routines): Remove multiple @menu.
(Environment Variables): Move OMP_PROC_BIND and OMP_STACKSIZE node
texts according to their @menu entry positions.

Tested with x86_64-pc-linux-gnu bootstrap.

OK for mainline and 4.9?

Uros.
Index: libgomp.texi
===
--- libgomp.texi(revision 210549)
+++ libgomp.texi(working copy)
@@ -130,10 +130,10 @@
 specification in version 4.0.  The routines are structured in following
 three parts:
 
+@menu
 Control threads, processors and the parallel environment.  They have C
 linkage, and do not throw exceptions.
 
-@menu
 * omp_get_active_level::Number of active parallel regions
 * omp_get_ancestor_thread_num:: Ancestor thread ID
 * omp_get_cancellation::Whether cancellation support is enabled
@@ -162,11 +162,9 @@
 * omp_set_nested::  Enable/disable nested parallel regions
 * omp_set_num_threads:: Set upper team size limit
 * omp_set_schedule::Set the runtime scheduling method
-@end menu
 
 Initialize, set, test, unset and destroy simple and nested locks.
 
-@menu
 * omp_init_lock::Initialize simple lock
 * omp_set_lock:: Wait for and set simple lock
 * omp_test_lock::Test and set simple lock if available
@@ -177,11 +175,9 @@
 * omp_test_nest_lock::   Test and set nested lock if available
 * omp_unset_nest_lock::  Unset nested lock
 * omp_destroy_nest_lock::Destroy nested lock
-@end menu
 
 Portable, thread-based, wall clock timer.
 
-@menu
 * omp_get_wtick::Get timer precision.
 * omp_get_wtime::Elapsed wall clock time.
 @end menu
@@ -1448,6 +1444,33 @@
 
 
 
+@node OMP_PROC_BIND
+@section @env{OMP_PROC_BIND

Re: [PATCH, doc]: Fix a bunch of warnings in *.texi files

2014-05-18 Thread Uros Bizjak
On Sun, May 18, 2014 at 7:17 AM, David Wohlferd d...@limegreensocks.com wrote:
 My bad.  My version of makeinfo wasn't reporting these errors.

 However, this isn't right either.  There are two subsections that are now
 under Size of an asm that should be under Variables in Specified
 Registers.  How about this (attached)?

Oh, I was not aware that this is a nested @menu with its own sections.

Sure, your patch is OK. I went ahead and installed it on mainline,
after bootstrapping it on x86_64-linux-gnu.

Thanks,
Uros.


[PATCH, doc]: Fix POD document had syntax errors at /usr/bin/pod2man line 69. error

2014-05-18 Thread Uros Bizjak
Hello!

Attached patch fixes following errors in .pod document sources:

gfdl.pod around line 53: Expected text after =item, not a number
gfdl.pod around line 147: Expected text after =item, not a number
gfdl.pod around line 165: Expected text after =item, not a number
gfdl.pod around line 205: Expected text after =item, not a number
gfdl.pod around line 357: Expected text after =item, not a number
gfdl.pod around line 384: Expected text after =item, not a number
gfdl.pod around line 400: Expected text after =item, not a number
gfdl.pod around line 422: Expected text after =item, not a number
gfdl.pod around line 445: Expected text after =item, not a number
gfdl.pod around line 475: Expected text after =item, not a number
gfdl.pod around line 499: Expected text after =item, not a number
POD document had syntax errors at /usr/bin/pod2man line 69.
gmake[3]: [doc/gfdl.7] Error 1 (ignored)

As suggested in the fix for a similar problem [1], the solution is to
put Z<> in the =item argument string.

2014-05-18  Uros Bizjak  ubiz...@gmail.com

* texi2pod.pl: Force .pod file to not be a numbered list.

The fix was tested by bootstrapping on Fedora20 x86_64-pc-linux-gnu,
and also comparing previous .man and .html files with new ones. They
were bit-exact.

OK for mainline and 4.9?

[1] http://comments.gmane.org/gmane.network.inn/9841

Uros.
Index: texi2pod.pl
===
--- texi2pod.pl (revision 210579)
+++ texi2pod.pl (working copy)
@@ -1,6 +1,6 @@
 #! /usr/bin/perl -w
 
-#   Copyright (C) 1999, 2000, 2001, 2003, 2010 Free Software Foundation, Inc.
+#   Copyright (C) 1999-2014 Free Software Foundation, Inc.
 
 # This file is part of GCC.
 
@@ -337,7 +337,7 @@
 $_ = "\n=item $1\n";
 }
} else {
-   $_ = "\n=item $ic\n";
+   $_ = "\n=item Z<>$ic\n";
$ic =~ y/A-Ya-y/B-Zb-z/;
$ic =~ s/(\d+)/$1 + 1/eg;
}


Re: [PATCH i386 5/8] [AVX-512] Extend vectorizer hooks.

2014-05-19 Thread Uros Bizjak
On Mon, May 19, 2014 at 6:48 AM, Jan Hubicka hubi...@ucw.cz wrote:
  Thanks for the pointer, there is indeed the recommendation in
  optimization manual [1], section 3.6.4, where it is said:
 
  --quote--
  Misaligned data access can incur significant performance penalties.
  This is particularly true for cache line
  splits. The size of a cache line is 64 bytes in the Pentium 4 and
  other recent Intel processors, including
  processors based on Intel Core microarchitecture.
  An access to data unaligned on 64-byte boundary leads to two memory
  accesses and requires several
  µops to be executed (instead of one). Accesses that span 64-byte
  boundaries are likely to incur a large
  performance penalty, the cost of each stall generally are greater on
  machines with longer pipelines.
 
  ...
 
  A 64-byte or greater data structure or array should be aligned so that
  its base address is a multiple of 64.
  Sorting data in decreasing size order is one heuristic for assisting
  with natural alignment. As long as 16-
  byte boundaries (and cache lines) are never crossed, natural alignment
  is not strictly necessary (though
  it is an easy way to enforce this).
  --/quote--
 
  So, this part has nothing to do with AVX512, but with cache line
  width. And we do have a --param l1-cache-line-size=64, detected with
  -march=native that could come handy here.
 
  This part should be rewritten (and commented) with the information
  above in mind.

 Like in the patch below. Please note, that the block_tune setting for
 the nocona is wrong, -march=native on my trusted old P4 returns:

 --param l1-cache-size=16 --param l1-cache-line-size=64 --param
 l2-cache-size=2048 -mtune=nocona

 which is consistent with the above quote from manual.

 2014-01-02  Uros Bizjak  ubiz...@gmail.com

 * config/i386/i386.c (ix86_data_alignment): Calculate max_align
 from prefetch_block tune setting.
 (nocona_cost): Correct size of prefetch block to 64.

 Uros,
 I am looking into libreoffice size and the data alignment seems to make huge
 difference. Data section has grown from 5.8MB to 6.3MB in between GCC 4.8
 and 4.9, while clang produces 5.2MB.

 The two patches I posted to not align vtables and RTTI reduces it to 5.7MB,
 but perhaps we want to revisit the alignment rules.  The optimization manuals
 usually care only about performance critical loops.  Perhaps we can make the
 rules to align only bigger datastructures, or so at least for -O2.

Based on the above quote, "Misaligned data access can incur
significant performance penalties", and the fact that this particular
alignment rule has some compatibility issues with previous versions of
gcc (these were later fixed by Jakub), I'd rather leave this rule as
is. However, if the access is from the cold section, we can perhaps
avoid extra alignment, while avoiding those compatibility issues.

Uros.
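
For readers following the cache-line argument, here is a minimal sketch of
the condition the quoted manual text describes: an access is penalized when
it touches more than one 64-byte line. The helper name and the fixed line
size are assumptions made for illustration (the size mirrors the
--param l1-cache-line-size=64 value mentioned above); nothing here is taken
from ix86_data_alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line size, matching --param l1-cache-line-size=64 above.  */
#define CACHE_LINE_SIZE 64u

/* Hypothetical helper, not part of GCC: return true if an access of
   SIZE bytes starting at ADDR touches more than one cache line.  */
bool
spans_cache_line (uintptr_t addr, size_t size)
{
  assert (size > 0);
  uintptr_t first_line = addr / CACHE_LINE_SIZE;
  uintptr_t last_line = (addr + size - 1) / CACHE_LINE_SIZE;
  return first_line != last_line;
}
```

An 8-byte access at offset 60 crosses a line boundary, while a 64-byte
object whose base address is a multiple of 64 never does, which is the
case the alignment rule is meant to guarantee.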


Re: [PATCH i386 5/8] [AVX-512] Extend vectorizer hooks.

2014-05-19 Thread Uros Bizjak
On Mon, May 19, 2014 at 6:42 PM, H.J. Lu hjl.to...@gmail.com wrote:

 Uros,
 I am looking into libreoffice size and the data alignment seems to make huge
 difference. Data section has grown from 5.8MB to 6.3MB in between GCC 4.8
 and 4.9, while clang produces 5.2MB.

 The two patches I posted to not align vtables and RTTI reduces it to 5.7MB,
 but perhaps we want to revisit the alignment rules.  The optimization
 manuals usually care only about performance critical loops.  Perhaps we can
 make the rules to align only bigger datastructures, or so at least for -O2.

 Based on the above quote, "Misaligned data access can incur
 significant performance penalties", and the fact that this particular
 alignment rule has some compatibility issues with previous versions of
 gcc (these were later fixed by Jakub), I'd rather leave this rule as
 is. However, if the access is from the cold section, we can perhaps
 avoid extra alignment, while avoiding those compatibility issues.


 It is excessive to align

 struct foo
 {
   int x1;
   int x2;
   char x3;
   int x4;
   int x5;
   char x6;
   int x7;
   int x8;
 };

 to 32 bytes and align

 struct foo
 {
   int x1;
   int x2;
   char x3;
   int x4;
   int x5;
   char x6;
   int x7[9];
   int x8;
 };

 to 64 bytes.  What performance gain does it provide?

Avoids significant performance penalties, perhaps?

Uros.
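
To put numbers on the two structures quoted above, the sketch below simply
reports their size and natural alignment. It assumes a 4-byte int and the
usual x86 padding rules; the names foo_small and foo_large are made up here
because both originals are called foo, and padding_before is a hypothetical
helper, not a GCC function.

```c
#include <stddef.h>

/* First structure from the message: 8 int-sized slots after padding.  */
struct foo_small
{
  int x1; int x2; char x3; int x4; int x5; char x6; int x7; int x8;
};

/* Second structure: x7 becomes an array of 9 ints.  */
struct foo_large
{
  int x1; int x2; char x3; int x4; int x5; char x6; int x7[9]; int x8;
};

size_t foo_small_size (void) { return sizeof (struct foo_small); }
size_t foo_large_size (void) { return sizeof (struct foo_large); }
size_t foo_small_align (void) { return _Alignof (struct foo_small); }
size_t foo_large_align (void) { return _Alignof (struct foo_large); }

/* Bytes skipped so that an object placed at OFFSET starts on an
   ALIGN-byte boundary (ALIGN must be nonzero).  */
size_t
padding_before (size_t offset, size_t align)
{
  return (align - offset % align) % align;
}
```

Under those assumptions the sizes come out as 32 and 64 bytes with a
natural alignment of 4, so raising the alignment to 32 or 64 can cost up to
28 or 60 bytes of padding in front of each object, depending on what
precedes it in the section.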


Re: [PATCH, doc]: Fix POD document had syntax errors at /usr/bin/pod2man line 69. error

2014-05-24 Thread Uros Bizjak
On Sun, May 18, 2014 at 11:10 AM, Uros Bizjak ubiz...@gmail.com wrote:

 Attached patch fixes following errors in .pod document sources:

 gfdl.pod around line 53: Expected text after =item, not a number
 gfdl.pod around line 147: Expected text after =item, not a number
 gfdl.pod around line 165: Expected text after =item, not a number
 gfdl.pod around line 205: Expected text after =item, not a number
 gfdl.pod around line 357: Expected text after =item, not a number
 gfdl.pod around line 384: Expected text after =item, not a number
 gfdl.pod around line 400: Expected text after =item, not a number
 gfdl.pod around line 422: Expected text after =item, not a number
 gfdl.pod around line 445: Expected text after =item, not a number
 gfdl.pod around line 475: Expected text after =item, not a number
 gfdl.pod around line 499: Expected text after =item, not a number
 POD document had syntax errors at /usr/bin/pod2man line 69.
 gmake[3]: [doc/gfdl.7] Error 1 (ignored)

 As suggested in the fix for a similar problem [1], the solution is to
 put Z<> in the =item argument string.

 2014-05-18  Uros Bizjak  ubiz...@gmail.com

 * texi2pod.pl: Force .pod file to not be a numbered list.

 The fix was tested by bootstrapping on Fedora20 x86_64-pc-linux-gnu,
 and also comparing previous .man and .html files with new ones. They
 were bit-exact.

I went ahead and installed the patch in the mainline. It is a trivial
one-liner, and can be easily reverted if it causes trouble.

Uros.


Re: [PATCH] PR 61249: fix documentation of __builtin_ia32_{vfrczss,vfrczsd,mpsadbw256}

2014-05-26 Thread Uros Bizjak
Hello!

 2014-05-26  Michael Tautschnig  m...@debian.org

 PR target/61249
 * doc/extend.texi: Fix parameter lists of __builtin_ia32_vfrczs[sd],
 __builtin_ia32_mpsadbw256.

Thanks, I have committed the patch with slightly changed ChangeLog to
all branches.

Uros.


[PATCH, libbid]: Fix variable ‘Ql’ set but not used warnings

2014-05-26 Thread Uros Bizjak
Hello!

Attached patch fixes several variable ‘Ql’ set but not used warnings
in bid128_div.c and bid64_div.c libbid sources. We can simply use
__mul_128x128_high functions when lowpart is not needed.
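
The idea can be illustrated one size down, with a 64x64-bit multiply. This
is only an analogy and not the libbid macros: the function names are
hypothetical, and the code relies on the unsigned __int128 extension
available in GCC and Clang on 64-bit targets.

```c
#include <stdint.h>

/* Full product: both halves are computed and stored.  */
void
mul_64x64_full (uint64_t *hi, uint64_t *lo, uint64_t a, uint64_t b)
{
  unsigned __int128 p = (unsigned __int128) a * b;
  *hi = (uint64_t) (p >> 64);
  *lo = (uint64_t) p;
}

/* High half only: no variable is needed for the low part, so there is
   nothing left to trigger a "set but not used" warning.  */
uint64_t
mul_64x64_high (uint64_t a, uint64_t b)
{
  return (uint64_t) (((unsigned __int128) a * b) >> 64);
}
```

A caller that only shifts the high half afterwards, as the division code
does with Qh, gets the same value from either routine.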

2014-05-26  Uros Bizjak  ubiz...@gmail.com

* bid128_div.c (BID128_FUNCTION_ARG2): Remove unused variable 'Ql'.
Call __mul_128x128_high instead of __mul_128x128_full.
(TYPE0_FUNCTION_ARGTYPE1_ARGTYPE2): Ditto.
(BID128_FUNCTION_ARGTYPE1_ARG128): Ditto.
(BID128_FUNCTION_ARG128_ARGTYPE2): Ditto.
* bid64_div.c (TYPE0_FUNCTION_ARGTYPE1_ARG128): Ditto.
(TYPE0_FUNCTION_ARG128_ARGTYPE2): Ditto.
(TYPE0_FUNCTION_ARG128_ARG128): Ditto.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.

HJ, should this be fixed upstream first?

Uros.
Index: bid128_div.c
===
--- bid128_div.c(revision 210927)
+++ bid128_div.c(working copy)
@@ -36,7 +36,7 @@ extern UINT8 packed_1_zeros[];
 BID128_FUNCTION_ARG2 (bid128_div, x, y)
 
  UINT256 CA4, CA4r, P256;
- UINT128 CX, CY, T128, CQ, CR, CA, TP128, Qh, Ql, res;
+ UINT128 CX, CY, T128, CQ, CR, CA, TP128, Qh, res;
  UINT64 sign_x, sign_y, T, carry64, D, Q_high, Q_low, QX, PD,
valid_y;
  int_float fx, fy, f64;
@@ -239,7 +239,7 @@ if (!CA4.w[0]  !CA4.w[1])
 if (d5  nzeros)
   nzeros = d5;
 // get P*(2^M[extra_digits])/10^extra_digits
-__mul_128x128_full (Qh, Ql, CQ, reciprocals10_128[nzeros]);
+__mul_128x128_high (Qh, CQ, reciprocals10_128[nzeros]);
 
 // now get P/10^extra_digits: shift Q_high right by M[extra_digits]-128
 amount = recip_scale[nzeros];
@@ -365,7 +365,7 @@ if (!CA4.w[0]  !CA4.w[1])
 
   if (nzeros) {
// get P*(2^M[extra_digits])/10^extra_digits
-   __mul_128x128_full (Qh, Ql, CQ, reciprocals10_128[nzeros]);
+   __mul_128x128_high (Qh, CQ, reciprocals10_128[nzeros]);
 
//now get P/10^extra_digits: shift Q_high right by M[extra_digits]-128
amount = recip_scale[nzeros];
@@ -487,7 +487,7 @@ TYPE0_FUNCTION_ARGTYPE1_ARGTYPE2 (UINT128, bid128d
  UINT64, y)
 
  UINT256 CA4, CA4r, P256;
- UINT128 CX, CY, T128, CQ, CR, CA, TP128, Qh, Ql, res;
+ UINT128 CX, CY, T128, CQ, CR, CA, TP128, Qh, res;
  UINT64 sign_x, sign_y, T, carry64, D, Q_high, Q_low, QX, PD,
valid_y;
  int_float fx, fy, f64;
@@ -701,7 +701,7 @@ __div_256_by_128 (CQ, CA4, CY);
   if (d5  nzeros)
nzeros = d5;
   // get P*(2^M[extra_digits])/10^extra_digits
-  __mul_128x128_full (Qh, Ql, CQ, reciprocals10_128[nzeros]);
+  __mul_128x128_high (Qh, CQ, reciprocals10_128[nzeros]);
   //__mul_128x128_to_256(P256, CQ, 
reciprocals10_128[nzeros]);Qh.w[1]=P256.w[3];Qh.w[0]=P256.w[2];
 
   // now get P/10^extra_digits: shift Q_high right by M[extra_digits]-128
@@ -829,7 +829,7 @@ __div_256_by_128 (CQ, CA4, CY);
 
if (nzeros) {
  // get P*(2^M[extra_digits])/10^extra_digits
- __mul_128x128_full (Qh, Ql, CQ, reciprocals10_128[nzeros]);
+ __mul_128x128_high (Qh, CQ, reciprocals10_128[nzeros]);
 
  // now get P/10^extra_digits: shift Q_high right by 
M[extra_digits]-128
  amount = recip_scale[nzeros];
@@ -946,7 +946,7 @@ BID_RETURN (res);
 
 BID128_FUNCTION_ARGTYPE1_ARG128 (bid128dq_div, UINT64, x, y)
  UINT256 CA4, CA4r, P256;
- UINT128 CX, CY, T128, CQ, CR, CA, TP128, Qh, Ql, res;
+ UINT128 CX, CY, T128, CQ, CR, CA, TP128, Qh, res;
  UINT64 sign_x, sign_y, T, carry64, D, Q_high, Q_low, QX, valid_y,
PD;
  int_float fx, fy, f64;
@@ -1155,7 +1155,7 @@ __div_256_by_128 (CQ, CA4, CY);
   if (d5  nzeros)
nzeros = d5;
   // get P*(2^M[extra_digits])/10^extra_digits
-  __mul_128x128_full (Qh, Ql, CQ, reciprocals10_128[nzeros]);
+  __mul_128x128_high (Qh, CQ, reciprocals10_128[nzeros]);
   //__mul_128x128_to_256(P256, CQ, 
reciprocals10_128[nzeros]);Qh.w[1]=P256.w[3];Qh.w[0]=P256.w[2];
 
   // now get P/10^extra_digits: shift Q_high right by M[extra_digits]-128
@@ -1285,7 +1285,7 @@ __div_256_by_128 (CQ, CA4, CY);
 
if (nzeros) {
  // get P*(2^M[extra_digits])/10^extra_digits
- __mul_128x128_full (Qh, Ql, CQ, reciprocals10_128[nzeros]);
+ __mul_128x128_high (Qh, CQ, reciprocals10_128[nzeros]);
 
  // now get P/10^extra_digits: shift Q_high right by 
M[extra_digits]-128
  amount = recip_scale[nzeros];
@@ -1403,7 +1403,7 @@ BID_RETURN (res);
 
 BID128_FUNCTION_ARG128_ARGTYPE2 (bid128qd_div, x, UINT64, y)
  UINT256 CA4, CA4r, P256;
- UINT128 CX, CY, T128, CQ, CR, CA, TP128, Qh, Ql, res;
+ UINT128 CX, CY, T128, CQ, CR, CA, TP128, Qh, res;
  UINT64 sign_x, sign_y, T, carry64, D, Q_high, Q_low, QX, PD,
valid_y;
  int_float fx, fy, f64;
@@ -1607,7 +1607,7 @@ __div_256_by_128 (CQ, CA4, CY);
   if (d5  nzeros)
nzeros = d5;
   // get P*(2^M

[PATCH, i386]: Fix logical 'not' error in x86_rtx_costs (PR 61271)

2014-05-26 Thread Uros Bizjak
Hello!

There is a stray ! in ix86_rtx_costs which results in an invalid
bypass for LABEL_REFs. After some simplifications, the fixed condition
should read:

  else if (flag_pic && SYMBOLIC_CONST (x)
           && !(TARGET_64BIT
                && (GET_CODE (x) == LABEL_REF
                    || (GET_CODE (x) == SYMBOL_REF
                        && SYMBOL_REF_LOCAL_P (x)))))
    *total = 1;

The patch fixes the condition, but I don't think that handling of
LABEL_REFs and SYMBOL_REFs is correct in the cost function at all.
E.g. in x86_64_immediate_operand predicate, LABEL_REFs (and non-TLS
SYMBOL_REFs) are rejected for all PIC code models, so they get cost of
3 and don't even reach this part of the function.
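
A small standalone sketch of why the stray ! matters. The enum values are
stand-ins (the real codes come from rtl.def); the only property relied on
is that the LABEL_REF code is neither 0 nor 1. The flag_pic and
SYMBOLIC_CONST parts of the condition are left out, and both function names
are hypothetical.

```c
#include <stdbool.h>

/* Stand-in RTX codes; values are assumptions, not those of rtl.def.  */
enum code { CODE_CONST = 2, CODE_LABEL_REF = 3, CODE_SYMBOL_REF = 4 };

/* Simplified, fixed form of the inner condition.  */
bool
cost_one_fixed (bool target_64bit, enum code c, bool local)
{
  return !(target_64bit
           && (c == CODE_LABEL_REF
               || (c == CODE_SYMBOL_REF && local)));
}

/* Expanded form with the stray '!': (!c) is 0 or 1, so comparing it
   with CODE_LABEL_REF is always "not equal" and the LABEL_REF test
   silently disappears.  */
bool
cost_one_stray_not (bool target_64bit, enum code c, bool local)
{
  return (!target_64bit
          || ((!c) != CODE_LABEL_REF
              && (c != CODE_SYMBOL_REF || !local)));
}
```

The two functions agree everywhere except for a LABEL_REF on a 64-bit
target, where the version with the stray ! returns true (cost 1) and the
fixed version returns false (cost 0).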

Honza, can you perhaps check if x86_64{,_zext}_immediate operand
handles PIC code models in a correct way?

The trivial patch is bootstrapped and regression tested on
x86_64-pc-linux-gnu {,-m32} and committed to mainline SVN.

Uros.
Index: i386.c
===
--- i386.c  (revision 210937)
+++ i386.c  (working copy)
@@ -37903,10 +37903,10 @@ ix86_rtx_costs (rtx x, int code_i, int outer_code_
   else if (TARGET_64BIT  !x86_64_zext_immediate_operand (x, VOIDmode))
*total = 2;
   else if (flag_pic  SYMBOLIC_CONST (x)
-   !(TARGET_64BIT
-(GET_CODE (x) == LABEL_REF
-   || (GET_CODE (x) == SYMBOL_REF
-SYMBOL_REF_LOCAL_P (x)
+   (!TARGET_64BIT
+  || (!GET_CODE (x) != LABEL_REF
+   (GET_CODE (x) != SYMBOL_REF
+  || !SYMBOL_REF_LOCAL_P (x)
*total = 1;
   else
*total = 0;


Re: [PATCH, i386]: Fix logical 'not' error in x86_rtx_costs (PR 61271)

2014-05-26 Thread Uros Bizjak
On Mon, May 26, 2014 at 7:48 PM, Uros Bizjak ubiz...@gmail.com wrote:
 Hello!

 There is a stray ! in ix86_rtx_costs which results in an invalid
 bypass for LABEL_REFs. After some simplifications, the fixed condition
 should read:

   else if (flag_pic && SYMBOLIC_CONST (x)
            && !(TARGET_64BIT
                 && (GET_CODE (x) == LABEL_REF
                     || (GET_CODE (x) == SYMBOL_REF
                         && SYMBOL_REF_LOCAL_P (x)))))
     *total = 1;

 The patch fixes the condition, but I don't think that handling of
 LABEL_REFs and SYMBOL_REFs is correct in the cost function at all.
 E.g. in x86_64_immediate_operand predicate, LABEL_REFs (and non-TLS
 SYMBOL_REFs) are rejected for all PIC code models, so they get cost of
 3 and don't even reach this part of the function.

 Honza, can you perhaps check if x86_64{,_zext}_immediate operand
 handles PIC code models in a correct way?

 The trivial patch is bootstrapped and regression tested on
 x86_64-pc-linux-gnu {,-m32} and committed to mainline SVN.

Eh, wrong patch was attached. And ChangeLog was missing, too.

2014-05-26  Uros Bizjak  ubiz...@gmail.com

PR target/61271
* config/i386/i386.c (ix86_rtx_costs)
case CONST_INT, case CONST, case LABEL_REF, case SYMBOL_REF:
Fix condition.

Uros.
Index: i386.c
===
--- i386.c  (revision 210889)
+++ i386.c  (working copy)
@@ -37903,10 +37903,10 @@
   else if (TARGET_64BIT  !x86_64_zext_immediate_operand (x, VOIDmode))
*total = 2;
   else if (flag_pic  SYMBOLIC_CONST (x)
-   (!TARGET_64BIT
-  || (!GET_CODE (x) != LABEL_REF
-   (GET_CODE (x) != SYMBOL_REF
-  || !SYMBOL_REF_LOCAL_P (x)
+   !(TARGET_64BIT
+(GET_CODE (x) == LABEL_REF
+   || (GET_CODE (x) == SYMBOL_REF
+SYMBOL_REF_LOCAL_P (x)
*total = 1;
   else
*total = 0;


[PATCH, testsuite]: Fix lto.exp does not support ... in secondary source files warnings

2014-05-26 Thread Uros Bizjak
Hello!

2014-05-26  Uros Bizjak  ubiz...@gmail.com

* gcc.dg/lto/pr61278_1.c: Remove dg directives.

Tested on x86_64-pc-linux-gnu and committed.

Uros.
Index: ChangeLog
===
--- ChangeLog   (revision 210936)
+++ ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2014-05-26  Uros Bizjak  ubiz...@gmail.com
+
+   * gcc.dg/lto/pr61278_1.c: Remove dg directives.
+
 2014-05-26  Jerry DeLisle  jvdeli...@gcc.gnu.org
 
PR libgfortran/55117
Index: gcc.dg/lto/pr61278_1.c
===
--- gcc.dg/lto/pr61278_1.c  (revision 210936)
+++ gcc.dg/lto/pr61278_1.c  (working copy)
@@ -1,6 +1,3 @@
-/* { dg-lto-do link } */
-/* { dg-lto-options { { -flto -O1 } } } */
-
 extern char foo (char *);
 
 char d;


[PATCH, testsuite]: Fix c-c++-common/cilk-plus/AN/pr61191.c dg-error directives.

2014-05-26 Thread Uros Bizjak
Hello!

2014-05-26  Uros Bizjak  ubiz...@gmail.com

* c-c++-common/cilk-plus/AN/pr61191.c: Fix dg-error directives.

Tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline SVN.

Uros.
Index: c-c++-common/cilk-plus/AN/pr61191.c
===
--- c-c++-common/cilk-plus/AN/pr61191.c (revision 210936)
+++ c-c++-common/cilk-plus/AN/pr61191.c (working copy)
@@ -4,7 +4,7 @@
 
 double f(double * A, double * B)
 {
-  return __sec_reduce_add((B[0:500])(;
-/* { dg-error expected expression before ';' token  {target *-*-*} 7 } */
-/* { dg-error called object  {target *-*-*} 7 } */
-} /* { dg-error expected } */
+  return __sec_reduce_add((B[0:500])(; /* { dg-error called object  { 
target c } } */
+/* { dg-error expected expression before ';' token  { target c } 7 } */
+/* { dg-error expected primary-expression before ';' token  { target c++ } 
7 } */
+} /* { dg-error expected  { target c } } */


[PATCH, libgomp]: Require vect_simd_clones effective target for libgomp.fortran/declare-simd-[12].f90

2014-05-27 Thread Uros Bizjak
Hello!

These tests require vect_simd_clones effective target, as the target
should be able to compile AVX clones.

2014-05-27  Uros Bizjak  ubiz...@gmail.com

* testsuite/libgomp.fortran/declare-simd-1.f90: Require
vect_simd_clones effective target.  Remove
dg-additional-options directives.
* testsuite/libgomp.fortran/declare-simd-2.f90: Ditto.

Tested on x86_64-linux-gnu CentOS 5.

Uros.
Index: testsuite/libgomp.fortran/declare-simd-1.f90
===
--- testsuite/libgomp.fortran/declare-simd-1.f90(revision 210956)
+++ testsuite/libgomp.fortran/declare-simd-1.f90(working copy)
@@ -1,6 +1,5 @@
 ! { dg-options -fno-inline }
-! { dg-additional-options -msse2 { target sse2_runtime } }
-! { dg-additional-options -mavx { target avx_runtime } }
+! { dg-require-effective-target vect_simd_clones }
 
 module declare_simd_1_mod
   contains
Index: testsuite/libgomp.fortran/declare-simd-2.f90
===
--- testsuite/libgomp.fortran/declare-simd-2.f90(revision 210956)
+++ testsuite/libgomp.fortran/declare-simd-2.f90(working copy)
@@ -1,8 +1,7 @@
 ! { dg-do run }
 ! { dg-options -fno-inline }
-  ! { dg-additional-sources declare-simd-3.f90 }
-! { dg-additional-options -msse2 { target sse2_runtime } }
-! { dg-additional-options -mavx { target avx_runtime } }
+! { dg-additional-sources declare-simd-3.f90 }
+! { dg-require-effective-target vect_simd_clones }
 
 module declare_simd_2_mod
   contains


Re: [PATCH, libgomp]: Require vect_simd_clones effective target for libgomp.fortran/declare-simd-[12].f90

2014-05-27 Thread Uros Bizjak
On Tue, May 27, 2014 at 9:18 AM, Jakub Jelinek ja...@redhat.com wrote:

 Please don't remove the dg-additional-options there, that is completely
 intentional there, only the simd clones are built for SSE2/AVX/AVX2,
 the simd loops are built with whatever options the loop is compiled with,
 and for the common case (AVX or later HW, but compiler not configured to
 support only AVX or later) I want to test as much vectorization as possible.
 Requiring vect_simd_clones or the whitespace change is fine, though I'd
 just use ! { dg-do run { target vect_simd_clones } } instead of
 dg-require-effective-target.

Thanks for the explanation.

Following is the v2 patch that I plan to commit after testing:

2014-05-27  Uros Bizjak  ubiz...@gmail.com

* testsuite/libgomp.fortran/declare-simd-1.f90: Require
vect_simd_clones effective target.
* testsuite/libgomp.fortran/declare-simd-2.f90: Ditto.

Uros.
Index: testsuite/libgomp.fortran/declare-simd-1.f90
===
--- testsuite/libgomp.fortran/declare-simd-1.f90(revision 210956)
+++ testsuite/libgomp.fortran/declare-simd-1.f90(working copy)
@@ -1,3 +1,4 @@
+! { dg-do run { target vect_simd_clones } }
 ! { dg-options -fno-inline }
 ! { dg-additional-options -msse2 { target sse2_runtime } }
 ! { dg-additional-options -mavx { target avx_runtime } }
Index: testsuite/libgomp.fortran/declare-simd-2.f90
===
--- testsuite/libgomp.fortran/declare-simd-2.f90(revision 210956)
+++ testsuite/libgomp.fortran/declare-simd-2.f90(working copy)
@@ -1,6 +1,6 @@
-! { dg-do run }
+! { dg-do run { target vect_simd_clones } }
 ! { dg-options -fno-inline }
-  ! { dg-additional-sources declare-simd-3.f90 }
+! { dg-additional-sources declare-simd-3.f90 }
 ! { dg-additional-options -msse2 { target sse2_runtime } }
 ! { dg-additional-options -mavx { target avx_runtime } }
 


[PATCH, fortran]: Include stdlib.h in intrinsics/getcwd.c

2014-05-27 Thread Uros Bizjak
... to avoid implicit declaration of function ‘free’ warning.

2014-05-27  Uros Bizjak  ubiz...@gmail.com

* intrinsics/getcwd.c: Include stdlib.h.

Tested on x86_64-pc-linux-gnu.

OK for mainline?

Uros.

Index: intrinsics/getcwd.c
===
--- intrinsics/getcwd.c (revision 210956)
+++ intrinsics/getcwd.c (working copy)
@@ -25,6 +25,7 @@

 #include "libgfortran.h"

+#include <stdlib.h>
 #include <string.h>
 #include <errno.h>


Re: [PATCH, fortran]: Include stdlib.h in intrinsics/getcwd.c

2014-05-27 Thread Uros Bizjak
On Tue, May 27, 2014 at 1:37 PM, Steve Kargl
s...@troutmask.apl.washington.edu wrote:

 ... to avoid implicit declaration of function ‘free’ warning.

 2014-05-27  Uros Bizjak  ubiz...@gmail.com

 * intrinsics/getcwd.c: Include stdlib.h.

 It can also be committed to the 4.9 branch if you have the time.

There is no need for the stdlib.h include in the 4.9 branch; the call to
free was introduced in 4.10.
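
As an illustration only: a minimal sketch of why the include matters. Any
file that calls free (or malloc) should see the prototype from stdlib.h,
otherwise the compiler falls back to an implicit declaration and warns.
The helper name cwd_length is hypothetical and this is not the
libgfortran code; it assumes nothing beyond POSIX getcwd.

```c
#include <errno.h>
#include <stdlib.h>   /* malloc, free: without this, free is implicitly declared */
#include <string.h>
#include <unistd.h>   /* getcwd (POSIX) */

/* Hypothetical helper: return the length of the current working
   directory name, or 0 on failure.  */
size_t
cwd_length (void)
{
  size_t size = 256;

  for (;;)
    {
      char *buf = malloc (size);
      if (buf == NULL)
        return 0;

      if (getcwd (buf, size) != NULL)
        {
          size_t len = strlen (buf);
          free (buf);
          return len;
        }

      /* Save errno before free, then retry with a larger buffer only
         when the name did not fit.  */
      int err = errno;
      free (buf);
      if (err != ERANGE || size > ((size_t) 1 << 20))
        return 0;
      size *= 2;
    }
}
```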

Uros.


Re: [Patch] Minor fixes for regtesting gfortran with -flto

2014-05-29 Thread Uros Bizjak
Hello!

 With the following patch, gfortran can be regtested with -flto
 with no failures except pr54852 and pr60061.

-! { dg-final { scan-assembler-times myBindC 1 { target { ! {
hppa*-*-hpux* } } } } }
-! { dg-final { scan-assembler-times myBindC,%r2 1 { target {
hppa*-*-hpux* } } } }
+! { dg-final { scan-assembler-times call\[^\n\r\]*myBindC 1 {
target { ! { hppa*-*-hpux* } } } } }
+! { dg-final { scan-assembler-times call\[^\n\r\]*myBindC,%r2 1 {
target { hppa*-*-hpux* } } } }

The change above fails on alpha, which emits jsr instead of call in the assembly:

$ grep myBindC bind_c_array_params_2.s
jsr $26,myBindC

Probably, alpha is not the only one that fails this assumption.

Uros.


Re: [Patch] Minor fixes for regtesting gfortran with -flto

2014-05-29 Thread Uros Bizjak
On Thu, May 29, 2014 at 11:38 AM, Dominique Dhumieres
domi...@lps.ens.fr wrote:
 Probably, alpha is not the only one that fails this assumption.

 Indeed! see the thread starting at
 https://gcc.gnu.org/ml/fortran/2014-05/msg00127.html

 Could you test the following patch

 --- ../_clean/gcc/testsuite/gfortran.dg/bind_c_array_params_2.f90   
 2014-05-24 16:17:53.0 +0200
 +++ gcc/testsuite/gfortran.dg/bind_c_array_params_2.f90 2014-05-29 
 11:34:40.0 +0200
 @@ -16,7 +16,7 @@ integer :: aa(4,4)
  call test(aa)
  end

 -! { dg-final { scan-assembler-times call\[^\n\r\]*myBindC 1 { target { ! { 
 hppa*-*-hpux* } } } } }
 -! { dg-final { scan-assembler-times call\[^\n\r\]*myBindC,%r2 1 { target { 
 hppa*-*-hpux* } } } }
 +! { dg-final { scan-assembler-times \[ \t\]\[$,_0-9\]*myBindC 1 { target { 
 ! { hppa*-*-hpux* } } } } }
 +! { dg-final { scan-assembler-times \[ \t\]\[$,_0-9\]*myBindC,%r2 1 { 
 target { hppa*-*-hpux* } } } }
  ! { dg-final { scan-tree-dump-times test \\\(parm\\. 1 original } }
  ! { dg-final { cleanup-tree-dump original } }

 with

 make -k check-gfortran RUNTESTFLAGS=dg.exp=bind_c_array_params_2.f90 
 --target_board=unix'{-m32,-m64,-m32/-flto,-m64/-flto}'

This works on alpha with --target_board=unix'{,-flto}' and x86_64, so
I guess it is OK.

 Can you pre-approve it?

I'm not a testsuite maintainer (one is CC'd for a final approval), but
the situation is definitely better with the patched regexp.
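
To make the difference between the two scan patterns concrete, here is a
small sketch. It uses POSIX extended regexps from regex.h as a stand-in
for the Tcl regexps that the dg framework actually uses, so it is only an
approximation; the helper name asm_line_matches is made up for this
example.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if the POSIX extended regexp PATTERN
   matches anywhere in LINE, 0 otherwise (also 0 if PATTERN does not
   compile).  */
int
asm_line_matches (const char *pattern, const char *line)
{
  regex_t re;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return 0;

  int found = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}
```

With this helper, "call[^\n\r]*myBindC" matches the x86 line
"\tcall\tmyBindC" but not the alpha line "\tjsr $26,myBindC", while
"[ \t][$,_0-9]*myBindC" matches both.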

Uros.


Re: Patch RFA: Move x86 _mm_pause out of pragma target(sse) scope

2014-05-30 Thread Uros Bizjak
Hello!

 This error is because _mm_pause is defined in the scope of #pragma GCC
 target(sse).  But _mm_pause, which simply generates the pause
 instruction, does not require SSE support.  The pause instruction has
 nothing really to do with SSE, and it works on all x86 processors (on
 processors that do not explicitly recognize it, it is a nop).

 I propose the following patch, which moves _mm_pause out of the pragma
 target scope.

 I know that x86intrin.h provides a similar intrinsic, __pause, but I
 think it's worth making _mm_pause work reasonably as well.

 I'm running a full testsuite run.  OK for mainline if it passes?

 gcc/ChangeLog:

 2014-05-29  Ian Lance Taylor  i...@google.com

 * config/i386/xmmintrin.h (_mm_pause): Move out of scope of pragma
 target(sse).

 gcc/testsuite/ChangeLog:

 2014-05-29  Ian Lance Taylor  i...@google.com

 * gcc.target/i386/pause-2.c: New test.

The patch looks OK to me, but please wait a day or two for possible
comments (compatibility, etc) from Kirill.
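
For context, a sketch of the typical use of _mm_pause: a bounded
spin-wait loop. The names cpu_relax and spin_until_set are hypothetical,
and the fallback for non-x86 targets is an assumption made only so the
example stays portable.

```c
#include <stdatomic.h>

#if defined(__i386__) || defined(__x86_64__)
#include <xmmintrin.h>            /* _mm_pause */
#define cpu_relax() _mm_pause ()  /* emits the pause instruction */
#else
#define cpu_relax() ((void) 0)    /* assumption: no spin hint elsewhere */
#endif

/* Hypothetical helper: poll *FLAG up to MAX_SPINS times, hinting to the
   CPU between polls that this is a spin loop.  Return 1 if the flag was
   seen set, 0 if the spin budget ran out.  */
int
spin_until_set (atomic_int *flag, int max_spins)
{
  for (int i = 0; i < max_spins; i++)
    {
      if (atomic_load_explicit (flag, memory_order_acquire))
        return 1;
      cpu_relax ();
    }
  return 0;
}
```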

Thanks,
Uros.


Re: [PATCH, i386] Enable fuse-caller-save for i386

2014-05-30 Thread Uros Bizjak
On Fri, May 30, 2014 at 11:45 AM, Tom de Vries tom_devr...@mentor.com wrote:

 This patch enables the fuse-caller-save optimization for i386.

 It sets the hook TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS to true.

 The definition of the hook is:
 ...
 set to true if all the calls in the current function contain clobbers in
 CALL_INSN_FUNCTION_USAGE for the registers that are clobbered by the call
 rather than by the callee, and are not already set or clobbered in the call
 pattern. Examples of such registers are registers used in PLTs and stubs,
 and temporary registers used in the call instruction but not present in the
 rtl pattern. Another way to formulate it is the registers not present in the
 rtl pattern that are clobbered by the call assuming the callee does not
 clobber any register. The default version of this hook is set to false.
 ...

 Bootstrapped and reg-tested this patch on x86_64, no issues found.

 Is it in fact safe to set this hook to true for i386? Are there clobbers
 which need to be added?

 If it's safe to set this hook to true, OK for trunk?

AFAIK, this is true for all targets, including cross-calls between MS
and SYSV ABIs, so I'd say OK.

Uros.


Re: [patch i386]: Fix sibcall failures caused by allowing constant memories

2014-06-01 Thread Uros Bizjak
On Sat, May 31, 2014 at 2:27 PM, Kai Tietz ktiet...@googlemail.com wrote:

 I am resending the patch in a new thread.
 The recent sibcall fallout was caused by using the 'm' constraint for
 sibcalls, which led to wrong combinations during the reload pass.  This
 patch introduces a new constraint 'B' for sibcall_memory_operand.

 ChangeLog

 2014-05-31  Kai Tietz  kti...@redhat.com

 * constraints.md (define_constraint): New 'B' constraint.

Please make this a two-letter constraint (perhaps Bs). We are
already short of single-letter constraints.

I plan to change the z and w @internal constraints to Bz and Bw to
free up these two letters.

Uros.


[PATCH, testsuite]: Properly escape brackets in gcc.target/i386/sibcall-X.c scan strings

2014-06-01 Thread Uros Bizjak
Hello!

2014-06-01  Uros Bizjak  ubiz...@gmail.com

* gcc.target/i386/sibcall-2.c (dg-final): Properly escape '[' and ']'
in scan-assembler-not string.
* gcc.target/i386/sibcall-3.c (dg-final): Ditto.
* gcc.target/i386/sibcall-4.c (dg-final): Ditto.
* gcc.target/i386/sibcall-6.c (dg-final): Ditto.

Tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline SVN.

Uros.
Index: gcc.target/i386/sibcall-2.c
===
--- gcc.target/i386/sibcall-2.c (revision 22)
+++ gcc.target/i386/sibcall-2.c (working copy)
@@ -13,4 +13,4 @@
   return (a  0 ? doo1 : doo2) (a);
 }
 
-/* { dg-final { scan-assembler-not call[ \t]*.%eax } } */
+/* { dg-final { scan-assembler-not call\[ \t\]*.%eax } } */
Index: gcc.target/i386/sibcall-3.c
===
--- gcc.target/i386/sibcall-3.c (revision 22)
+++ gcc.target/i386/sibcall-3.c (working copy)
@@ -13,4 +13,4 @@
   return foo (a);
 }
 
-/* { dg-final { scan-assembler-not jmp[ \t]*.%eax } } */
+/* { dg-final { scan-assembler-not jmp\[ \t\]*.%eax } } */
Index: gcc.target/i386/sibcall-4.c
===
--- gcc.target/i386/sibcall-4.c (revision 22)
+++ gcc.target/i386/sibcall-4.c (working copy)
@@ -12,4 +12,4 @@
   dispatch[offset](offset);
 }
 
-/* { dg-final { scan-assembler-not jmp[ \t]*.%eax } } */
+/* { dg-final { scan-assembler-not jmp\[ \t\]*.%eax } } */
Index: gcc.target/i386/sibcall-6.c
===
--- gcc.target/i386/sibcall-6.c (revision 22)
+++ gcc.target/i386/sibcall-6.c (working copy)
@@ -34,4 +34,4 @@
   if (postorder_func)
 (*postorder_func) (loop_node);
 }
-/* { dg-final { scan-assembler jmp[ \t]*.%eax } } */
+/* { dg-final { scan-assembler jmp\[ \t\]*.%eax } } */


[PATCH, i386]: Rename two @internal constraints

2014-06-01 Thread Uros Bizjak
Hello!

This change renames two @internal constraints to free two letters.

2014-06-01  Uros Bizjak  ubiz...@gmail.com

* config/i386/constraints.md (Bw): Rename from 'w'.
(Bz): Rename from 'z'.
* config/i386/i386.md: Change 'w' to 'Bw' and 'z' to 'Bz' globally.

Tested on x86_64-linux-gnu {,-m32}  and committed to mainline.

Uros.
Index: config/i386/constraints.md
===
--- config/i386/constraints.md  (revision 22)
+++ config/i386/constraints.md  (working copy)
@@ -19,7 +19,7 @@
 
 ;;; Unused letters:
 ;;;   H
-;;;   h j
+;;;   h jw  z
 
 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
@@ -91,6 +91,9 @@
 (define_register_constraint x TARGET_SSE ? SSE_REGS : NO_REGS
  Any SSE register.)
 
+(define_register_constraint v TARGET_SSE ? ALL_SSE_REGS : NO_REGS
+ Any EVEX encodable SSE register (@code{%xmm0-%xmm31}).)
+
 ;; We use the Y prefix to denote any number of conditional register sets:
 ;;  z  First SSE register.
 ;;  i  SSE2 inter-unit moves to SSE register enabled
@@ -144,8 +147,10 @@
  (ix86_fpmath  FPMATH_387) ? FLOAT_REGS : NO_REGS
  @internal Any x87 register when 80387 FP arithmetic is enabled.)
 
-;; We use the B prefix to denote any number of internal memory operands:
-;;  s  Sibling memory operand.
+;; We use the B prefix to denote any number of internal operands:
+;;  s  Sibcall memory operand, not valid for TARGET_X32
+;;  w  Call memory operand, not valid for TARGET_X32
+;;  z  Constant call address operand.
 
 (define_constraint Bs
   @internal Sibcall memory operand.
@@ -152,18 +157,15 @@
   (and (not (match_test TARGET_X32))
(match_operand 0 sibcall_memory_operand)))
 
-(define_register_constraint v TARGET_SSE ? ALL_SSE_REGS : NO_REGS
- Any EVEX encodable SSE register (@code{%xmm0-%xmm31}).)
+(define_constraint Bw
+  @internal Call memory operand.
+  (and (not (match_test TARGET_X32))
+   (match_operand 0 memory_operand)))
 
-(define_constraint z
+(define_constraint Bz
   @internal Constant call address operand.
   (match_operand 0 constant_call_address_operand))
 
-(define_constraint w
-  @internal Call memory operand.
-  (and (not (match_test TARGET_X32))
-   (match_operand 0 memory_operand)))
-
 ;; Integer constant constraints.
 (define_constraint I
   Integer constant in the range 0 @dots{} 31, for 32-bit shifts.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 22)
+++ config/i386/i386.md (working copy)
@@ -11182,7 +11182,7 @@
 })
 
 (define_insn *indirect_jump
-  [(set (pc) (match_operand:W 0 indirect_branch_operand rw))]
+  [(set (pc) (match_operand:W 0 indirect_branch_operand rBw))]
   
   jmp\t%A0
   [(set_attr type ibr)
@@ -11230,7 +11230,7 @@
 })
 
 (define_insn *tablejump_1
-  [(set (pc) (match_operand:W 0 indirect_branch_operand rw))
+  [(set (pc) (match_operand:W 0 indirect_branch_operand rBw))
(use (label_ref (match_operand 1)))]
   
   jmp\t%A0
@@ -11360,7 +11360,7 @@
 })
 
 (define_insn *call
-  [(call (mem:QI (match_operand:W 0 call_insn_operand czw))
+  [(call (mem:QI (match_operand:W 0 call_insn_operand cBwBz))
 (match_operand 1))]
   !SIBLING_CALL_P (insn)
   * return ix86_output_call_insn (insn, operands[0]);
@@ -11368,7 +11368,7 @@
 
 (define_insn *call_rex64_ms_sysv
   [(match_parallel 2 call_rex64_ms_sysv_operation
-[(call (mem:QI (match_operand:DI 0 call_insn_operand rzw))
+[(call (mem:QI (match_operand:DI 0 call_insn_operand rBwBz))
   (match_operand 1))
  (unspec [(const_int 0)] UNSPEC_MS_TO_SYSV_CALL)])]
   TARGET_64BIT  !SIBLING_CALL_P (insn)
@@ -11376,7 +11376,7 @@
   [(set_attr type call)])
 
 (define_insn *sibcall
-  [(call (mem:QI (match_operand:W 0 sibcall_insn_operand UzBs))
+  [(call (mem:QI (match_operand:W 0 sibcall_insn_operand UBsBz))
 (match_operand 1))]
   SIBLING_CALL_P (insn)
   * return ix86_output_call_insn (insn, operands[0]);
@@ -11396,7 +11396,7 @@
 })
 
 (define_insn *call_pop
-  [(call (mem:QI (match_operand:SI 0 call_insn_operand lzm))
+  [(call (mem:QI (match_operand:SI 0 call_insn_operand lmBz))
 (match_operand 1))
(set (reg:SI SP_REG)
(plus:SI (reg:SI SP_REG)
@@ -11406,7 +11406,7 @@
   [(set_attr type call)])
 
 (define_insn *sibcall_pop
-  [(call (mem:QI (match_operand:SI 0 sibcall_insn_operand UzBs))
+  [(call (mem:QI (match_operand:SI 0 sibcall_insn_operand UBsBz))
 (match_operand 1))
(set (reg:SI SP_REG)
(plus:SI (reg:SI SP_REG)
@@ -11443,7 +11443,7 @@
 
 (define_insn *call_value
   [(set (match_operand 0)
-   (call (mem:QI (match_operand:W 1 call_insn_operand czw))
+   (call (mem:QI (match_operand:W 1 call_insn_operand cBwBz))
  (match_operand 2)))]
   !SIBLING_CALL_P (insn)
   * return ix86_output_call_insn (insn, operands[1]);
@@ -11451,7 +11451,7 @@
 
 (define_insn *sibcall_value
   [(set

[PATCH, testsuite]: Fixes for recent ia32 testsuite failures

2014-06-01 Thread Uros Bizjak
Hello!

Plus a more modern dg-do target selector instead of dg-require-effective-target.

2014-06-01  Uros Bizjak  ubiz...@gmail.com

* gcc.target/i386/sibcall-2.c: Xfail dg-final scan-assembler-not,
not compilation.
* gcc.target/i386/sibcall-4.c: Ditto.
* gcc.target/i386/fuse-caller-save.c: Add -mregparm=1 for ia32 target.

Tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline SVN.

Uros.
Index: fuse-caller-save.c
===
--- fuse-caller-save.c  (revision 22)
+++ fuse-caller-save.c  (working copy)
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options -O2 -fuse-caller-save } */
+/* { dg-additional-options -mregparm=1 { target ia32 } } */
+
 /* Testing -fuse-caller-save optimization option.  */
 
 static int __attribute__((noinline))
Index: sibcall-1.c
===
--- sibcall-1.c (revision 22)
+++ sibcall-1.c (working copy)
@@ -1,5 +1,4 @@
-/* { dg-do compile } */
-/* { dg-require-effective-target ia32 } */
+/* { dg-do compile { target ia32 } } */
 /* { dg-options -O2 } */
 
 extern int (*foo)(int);
Index: sibcall-2.c
===
--- sibcall-2.c (revision 28)
+++ sibcall-2.c (working copy)
@@ -1,5 +1,4 @@
-/* { dg-do compile { xfail { *-*-* } } } */
-/* { dg-require-effective-target ia32 } */
+/* { dg-do compile { target ia32 } } */
 /* { dg-options -O2 } */
 
 extern int doo1 (int);
@@ -13,4 +12,4 @@
   return (a  0 ? doo1 : doo2) (a);
 }
 
-/* { dg-final { scan-assembler-not call\[ \t\]*.%eax } } */
+/* { dg-final { scan-assembler-not call\[ \t\]*.%eax { xfail *-*-* } } } */
Index: sibcall-3.c
===
--- sibcall-3.c (revision 28)
+++ sibcall-3.c (working copy)
@@ -1,5 +1,4 @@
-/* { dg-do compile } */
-/* { dg-require-effective-target ia32 } */
+/* { dg-do compile { target ia32 } } */
 /* { dg-options -O2 } */
 
 extern 
Index: sibcall-4.c
===
--- sibcall-4.c (revision 28)
+++ sibcall-4.c (working copy)
@@ -1,6 +1,5 @@
 /* Testcase for PR target/46219.  */
-/* { dg-do compile { xfail { *-*-* } } } */
-/* { dg-require-effective-target ia32 } */
+/* { dg-do compile  { target ia32 } } */
 /* { dg-options -O2 } */
 
 typedef void (*dispatch_t)(long offset);
@@ -12,4 +11,4 @@
   dispatch[offset](offset);
 }
 
-/* { dg-final { scan-assembler-not jmp\[ \t\]*.%eax } } */
+/* { dg-final { scan-assembler-not jmp\[ \t\]*.%eax { xfail *-*-* } } } */
Index: sibcall-5.c
===
--- sibcall-5.c (revision 22)
+++ sibcall-5.c (working copy)
@@ -1,6 +1,5 @@
 /* Check that indirect sibcalls understand regparm.  */
-/* { dg-do run } */
-/* { dg-require-effective-target ia32 } */
+/* { dg-do run { target ia32 } } */
 /* { dg-options -O2 } */
 
 extern void abort (void);
Index: sibcall-6.c
===
--- sibcall-6.c (revision 28)
+++ sibcall-6.c (working copy)
@@ -1,5 +1,4 @@
-/* { dg-do compile } */
-/* { dg-require-effective-target ia32 } */
+/* { dg-do compile { target ia32 } } */
 /* { dg-options -O2 } */
 
 typedef void *ira_loop_tree_node_t;


[PATCH, i386]: Fix PR 61239, ICE in decompose, at rtl.h when compiling vshuf-v16hi.c using -mavx2

2014-06-02 Thread Uros Bizjak
Hello!

2014-06-02  Uros Bizjak  ubiz...@gmail.com

PR target/61239
* config/i386/i386.c (ix86_expand_vec_perm) [case V32QImode]: Use
GEN_INT (-128) instead of GEN_INT (128) to set MSB of QImode constant.

Tested on x86_64-pc-linux-gnu with make check-gcc
RUNTESTFLAGS='--target_board=unix\{-msse2,-msse4,-mavx,-mavx2\}
dg-torture.exp=vshuf*.c' and committed to mainline SVN.
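
A short sketch of the arithmetic behind the change, assuming (as I
understand the RTL convention) that a CONST_INT is kept sign-extended
from the width of its mode: the only 8-bit value with just the MSB set
is then written -128, and 128 is not a canonical QImode constant. The
function name canonical_qi is hypothetical and is not a GCC interface.

```c
/* Hypothetical model: sign-extend the low 8 bits of V, i.e. the value a
   QImode constant with those bits would have in canonical form.  */
long long
canonical_qi (long long v)
{
  /* Keep the low 8 bits, then flip and subtract the sign bit; this is
     the usual branch-free sign extension from 8 bits.  */
  return ((v & 0xff) ^ 0x80) - 0x80;
}
```

Here canonical_qi (128) and canonical_qi (-128) both give -128, so -128
is the value to pass when the intent is "MSB set" in an 8-bit element.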

Uros.

Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 211125)
+++ config/i386/i386.c  (working copy)
@@ -21541,7 +21541,7 @@ ix86_expand_vec_perm (rtx operands[])
  t1 = gen_reg_rtx (V32QImode);
  t2 = gen_reg_rtx (V32QImode);
  t3 = gen_reg_rtx (V32QImode);
- vt2 = GEN_INT (128);
+ vt2 = GEN_INT (-128);
  for (i = 0; i < 32; i++)
vec[i] = vt2;
  vt = gen_rtx_CONST_VECTOR (V32QImode, gen_rtvec_v (32, vec));


[PATCH, testsuite]: Add -mno-avx2 to some i386 XOP tests

2014-06-02 Thread Uros Bizjak
Hello!

With targets that default to AVX2, these tests vectorize via 256bit
paths, where different insns are emitted.

2014-06-02  Uros Bizjak  ubiz...@gmail.com

* gcc.target/i386/xop-rotate1-vector.c (dg-options): Add -mno-avx2.
* gcc.target/i386/xop-rotate2-vector.c (dg-options): Ditto.
* gcc.target/i386/xop-rotate3-vector.c (dg-options): Ditto.
* gcc.target/i386/xop-imul32widen-vector.c (dg-options): Ditto.
* gcc.target/i386/xop-imul64-vector.c (dg-options): Ditto.
* gcc.target/i386/xop-shift1-vector.c (dg-options): Ditto.
* gcc.target/i386/xop-shift2-vector.c (dg-options): Ditto.
* gcc.target/i386/xop-shift3-vector.c (dg-options): Ditto.

Tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline SVN.

Uros.
Index: gcc.target/i386/xop-rotate1-vector.c
===
--- gcc.target/i386/xop-rotate1-vector.c(revision 211125)
+++ gcc.target/i386/xop-rotate1-vector.c(working copy)
@@ -2,7 +2,7 @@
into prot on XOP systems.  */
 
 /* { dg-do compile { target { ! { ia32 } } } } */
-/* { dg-options -O2 -mxop -ftree-vectorize } */
+/* { dg-options -O2 -mxop -mno-avx2 -ftree-vectorize } */
 
 extern void exit (int);
 
Index: gcc.target/i386/xop-rotate2-vector.c
===
--- gcc.target/i386/xop-rotate2-vector.c(revision 211125)
+++ gcc.target/i386/xop-rotate2-vector.c(working copy)
@@ -2,7 +2,7 @@
into prot on XOP systems.  */
 
 /* { dg-do compile { target { ! { ia32 } } } } */
-/* { dg-options -O2 -mxop -ftree-vectorize } */
+/* { dg-options -O2 -mxop -mno-avx2 -ftree-vectorize } */
 
 extern void exit (int);
 
Index: gcc.target/i386/xop-imul32widen-vector.c
===
--- gcc.target/i386/xop-imul32widen-vector.c(revision 211125)
+++ gcc.target/i386/xop-imul32widen-vector.c(working copy)
@@ -3,7 +3,7 @@
 
 /* { dg-do compile } */
 /* { dg-require-effective-target lp64 } */
-/* { dg-options -O2 -mxop -ftree-vectorize } */
+/* { dg-options -O2 -mxop -mno-avx2 -ftree-vectorize } */
 
 extern void exit (int);
 
Index: gcc.target/i386/xop-rotate3-vector.c
===
--- gcc.target/i386/xop-rotate3-vector.c(revision 211125)
+++ gcc.target/i386/xop-rotate3-vector.c(working copy)
@@ -2,7 +2,7 @@
into prot on XOP systems.  */
 
 /* { dg-do compile { target { ! { ia32 } } } } */
-/* { dg-options -O2 -mxop -ftree-vectorize } */
+/* { dg-options -O2 -mxop -mno-avx2 -ftree-vectorize } */
 
 extern void exit (int);
 
Index: gcc.target/i386/xop-imul64-vector.c
===
--- gcc.target/i386/xop-imul64-vector.c (revision 211125)
+++ gcc.target/i386/xop-imul64-vector.c (working copy)
@@ -3,7 +3,7 @@
 
 /* { dg-do compile } */
 /* { dg-require-effective-target lp64 } */
-/* { dg-options -O2 -mxop -ftree-vectorize } */
+/* { dg-options -O2 -mxop -mno-avx2 -ftree-vectorize } */
 
 extern void exit (int);
 
Index: gcc.target/i386/xop-shift1-vector.c
===
--- gcc.target/i386/xop-shift1-vector.c (revision 211125)
+++ gcc.target/i386/xop-shift1-vector.c (working copy)
@@ -2,7 +2,7 @@
psha/pshl on XOP systems.  */
 
 /* { dg-do compile { target { ! { ia32 } } } } */
-/* { dg-options -O2 -mxop -ftree-vectorize } */
+/* { dg-options -O2 -mxop -mno-avx2 -ftree-vectorize } */
 
 extern void exit (int);
 
Index: gcc.target/i386/xop-shift2-vector.c
===
--- gcc.target/i386/xop-shift2-vector.c (revision 211125)
+++ gcc.target/i386/xop-shift2-vector.c (working copy)
@@ -2,7 +2,7 @@
psha/pshl on XOP systems.  */
 
 /* { dg-do compile { target { ! { ia32 } } } } */
-/* { dg-options -O2 -mxop -ftree-vectorize } */
+/* { dg-options -O2 -mxop -mno-avx2 -ftree-vectorize } */
 
 extern void exit (int);
 
Index: gcc.target/i386/xop-shift3-vector.c
===
--- gcc.target/i386/xop-shift3-vector.c (revision 211125)
+++ gcc.target/i386/xop-shift3-vector.c (working copy)
@@ -2,7 +2,7 @@
psha/pshl on XOP systems.  */
 
 /* { dg-do compile { target { ! { ia32 } } } } */
-/* { dg-options -O2 -mxop -ftree-vectorize } */
+/* { dg-options -O2 -mxop -mno-avx2 -ftree-vectorize } */
 
 extern void exit (int);
 


[PATCH, i386]: Correctly handle maximum size of stringop algorithm in decide_alg

2014-06-02 Thread Uros Bizjak
Hello!

A problem was uncovered by -march=corei7 -mtune=intel -m32 with
i386/memcpy-[23] testcase in decide_alg subroutine [1]. Although the
max size of the transfer was known, the memcpy was not inlined, as
expected by the testcase.

The core of the problem can be seen in the definition of 32bit
intel_memcpy stringop alg:

  {libcall, {{11, loop, false}, {-1, rep_prefix_4_byte, false}}},

Please note that the last algorithm sets its maximum size to -1,
unlimited. However, in decide_alg, the same number also signals that
no algorithm sets its size, so expected_size is never calculated. In
the loop that sets maximal size for user defined algorithm, it is
assumed that -1 belongs exclusively to libcall, which is not the
case in the above intel_memcpy definition:

  if (candidate != libcall && candidate && usable)
  max = algs->size[i].max;

When the last non-libcall algorithm sets its maximum to -1 (aka
unlimited), this value fails the following test:

  if (max > 1 && (unsigned HOST_WIDE_INT) max >= max_size

and expected_size is never calculated.

Attached patch fixes this oversight, so -1 means unlimited size and
0 means that size was never set. The patch also considers these two
special values when choosing a maximum size for dynamic check.
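
The sentinel clash can be modelled in a few lines. This is a simplified
sketch, not the code in i386.c: struct alg_entry and both function names
are made up for illustration, and only the two special values (0 for
"never set", -1 for "unlimited") and the shape of the test follow the
description above.

```c
/* Hypothetical, simplified stand-in for one stringop table entry.  */
struct alg_entry
{
  int max;         /* largest size handled; -1 means unlimited */
  int is_libcall;  /* nonzero for the libcall entry */
  int usable;      /* nonzero if the algorithm may be used */
};

/* Return the maximum of the last usable non-libcall entry: 0 if no
   entry qualifies, -1 if that entry is unlimited.  */
int
last_inline_max (const struct alg_entry *algs, int n)
{
  int max = 0;   /* 0 = never set; no longer shares a value with "unlimited" */

  for (int i = 0; i < n; i++)
    if (!algs[i].is_libcall && algs[i].usable)
      max = algs[i].max;
  return max;
}

/* Return nonzero when an expected size may be derived from the known
   size range: either the inline maximum covers MAX_SIZE, or the inline
   algorithm is unlimited.  */
int
can_estimate_expected_size (int max, unsigned long long max_size)
{
  return (max > 1 && (unsigned long long) max >= max_size) || max == -1;
}
```

For a table shaped like the intel_memcpy entry ({11, loop} followed by
{-1, rep_prefix_4_byte}), last_inline_max returns -1 and the estimate is
allowed; with no usable inline algorithm it returns 0 and the estimate
is skipped.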

2014-06-02  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.c (decide_alg): Correctly handle maximum size of
stringop algorithm.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
{,-m32}, also with
RUNTESTFLAGS=--target_board=unix/-march=corei7/-mtune=intel\{,-m32\},
where it fixes both memcpy failures from [1].

[1] https://gcc.gnu.org/ml/gcc-testresults/2014-06/msg00127.html

Jan, can you please review the patch, to check if the logic is OK?

Uros.
Index: fuse-caller-save.c
===
--- fuse-caller-save.c  (revision 22)
+++ fuse-caller-save.c  (working copy)
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options -O2 -fuse-caller-save } */
+/* { dg-additional-options -mregparm=1 { target ia32 } } */
+
 /* Testing -fuse-caller-save optimization option.  */
 
 static int __attribute__((noinline))
Index: sibcall-1.c
===
--- sibcall-1.c (revision 22)
+++ sibcall-1.c (working copy)
@@ -1,5 +1,4 @@
-/* { dg-do compile } */
-/* { dg-require-effective-target ia32 } */
+/* { dg-do compile { target ia32 } } */
 /* { dg-options -O2 } */
 
 extern int (*foo)(int);
Index: sibcall-2.c
===
--- sibcall-2.c (revision 28)
+++ sibcall-2.c (working copy)
@@ -1,5 +1,4 @@
-/* { dg-do compile { xfail { *-*-* } } } */
-/* { dg-require-effective-target ia32 } */
+/* { dg-do compile { target ia32 } } */
 /* { dg-options -O2 } */
 
 extern int doo1 (int);
@@ -13,4 +12,4 @@
   return (a  0 ? doo1 : doo2) (a);
 }
 
-/* { dg-final { scan-assembler-not call\[ \t\]*.%eax } } */
+/* { dg-final { scan-assembler-not call\[ \t\]*.%eax { xfail *-*-* } } } */
Index: sibcall-3.c
===
--- sibcall-3.c (revision 28)
+++ sibcall-3.c (working copy)
@@ -1,5 +1,4 @@
-/* { dg-do compile } */
-/* { dg-require-effective-target ia32 } */
+/* { dg-do compile { target ia32 } } */
 /* { dg-options -O2 } */
 
 extern 
Index: sibcall-4.c
===
--- sibcall-4.c (revision 28)
+++ sibcall-4.c (working copy)
@@ -1,6 +1,5 @@
 /* Testcase for PR target/46219.  */
-/* { dg-do compile { xfail { *-*-* } } } */
-/* { dg-require-effective-target ia32 } */
+/* { dg-do compile  { target ia32 } } */
 /* { dg-options -O2 } */
 
 typedef void (*dispatch_t)(long offset);
@@ -12,4 +11,4 @@
   dispatch[offset](offset);
 }
 
-/* { dg-final { scan-assembler-not jmp\[ \t\]*.%eax } } */
+/* { dg-final { scan-assembler-not jmp\[ \t\]*.%eax { xfail *-*-* } } } */
Index: sibcall-5.c
===
--- sibcall-5.c (revision 22)
+++ sibcall-5.c (working copy)
@@ -1,6 +1,5 @@
 /* Check that indirect sibcalls understand regparm.  */
-/* { dg-do run } */
-/* { dg-require-effective-target ia32 } */
+/* { dg-do run { target ia32 } } */
 /* { dg-options -O2 } */
 
 extern void abort (void);
Index: sibcall-6.c
===
--- sibcall-6.c (revision 28)
+++ sibcall-6.c (working copy)
@@ -1,5 +1,4 @@
-/* { dg-do compile } */
-/* { dg-require-effective-target ia32 } */
+/* { dg-do compile { target ia32 } } */
 /* { dg-options -O2 } */
 
 typedef void *ira_loop_tree_node_t;


Re: [PATCH, i386]: Correctly handle maximum size of stringop algorithm in decide_alg

2014-06-02 Thread Uros Bizjak
On Mon, Jun 2, 2014 at 11:12 PM, Uros Bizjak ubiz...@gmail.com wrote:

 A problem was uncovered by -march=corei7 -mtune=intel -m32 with
 i386/memcpy-[23] testcase in decide_alg subroutine [1]. Although the
 max size of the transfer was known, the memcpy was not inlined, as
 expected by the testcase.

 The core of the problem can be seen in the definition of 32bit
 intel_memcpy stringop alg:

   {libcall, {{11, loop, false}, {-1, rep_prefix_4_byte, false}}},

 Please note that the last algorithm sets its maximum size to -1,
 unlimited. However, in decide_alg, the same number also signals that
 no algorithm sets its size, so expected_size is never calculated. In
 the loop that sets maximal size for user defined algorithm, it is
 assumed that -1 belongs exclusively to libcall, which is not the
 case in the above intel_memcpy definition:

   if (candidate != libcall && candidate && usable)
   max = algs->size[i].max;

 When the last non-libcall algorithm sets its maximum to -1 (aka
 unlimited), this value fails the following test:

   if (max > 1 && (unsigned HOST_WIDE_INT) max >= max_size

 and expected_size is never calculated.

 Attached patch fixes this oversight, so -1 means unlimited size and
 0 means that size was never set. The patch also considers these two
 special values when choosing a maximum size for dynamic check.

 2014-06-02  Uros Bizjak  ubiz...@gmail.com

 * config/i386/i386.c (decide_alg): Correctly handle maximum size of
 stringop algorithm.

 Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
 {,-m32}, also with
 RUNTESTFLAGS=--target_board=unix/-march=corei7/-mtune=intel\{,-m32\},
 where it fixes both memcpy failures from [1].

 [1] https://gcc.gnu.org/ml/gcc-testresults/2014-06/msg00127.html

 Jan, can you please review the patch, to check if the logic is OK?

Whoops, wrong patch was attached. Now with the correct attachment.

Uros.
Index: ChangeLog
===
--- ChangeLog   (revision 211140)
+++ ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2014-06-02  Uros Bizjak  ubiz...@gmail.com
+
+   * config/i386/i386.c (decide_alg): Correctly handle maximum size of
+   stringop algorithm.
+
 2014-06-02  Marcus Shawcroft  marcus.shawcr...@arm.com
 
* config/aarch64/aarch64.md (set_fpcr): Drop ISB after FPCR write.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 211140)
+++ config/i386/i386.c  (working copy)
@@ -23828,7 +23828,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT exp
 {
   const struct stringop_algs * algs;
   bool optimize_for_speed;
-  int max = -1;
+  int max = 0;
   const struct processor_costs *cost;
   int i;
   bool any_alg_usable_p = false;
@@ -23866,7 +23866,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT exp
   /* If expected size is not known but max size is small enough
  so inline version is a win, set expected size into
  the range.  */
-  if (max > 1 && (unsigned HOST_WIDE_INT) max >= max_size
+  if (((max > 1 && (unsigned HOST_WIDE_INT) max >= max_size) || max == -1)
       && expected_size == -1)
 expected_size = min_size / 2 + max_size / 2;
 
@@ -23955,7 +23955,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT exp
 *dynamic_check = 128;
   return loop_1_byte;
 }
-  if (max == -1)
+  if (max <= 0)
max = 4096;
   alg = decide_alg (count, max / 2, min_size, max_size, memset,
zero_memset, dynamic_check, noalign);


[PATCH, testsuite]: Fix g++.dg/ext/mv[14,15].C spurious failure on corei7

2014-06-03 Thread Uros Bizjak
Hello!

When configured with --with-arch=core-avx-i --with-cpu=core-avx-i,
g++.dg/ext/mv[14,15].C tests fail on corei7 [1] since the default CPU
is the same as the checked cpu in the test. The patch compiles the
testcase with -march=x86-64 as the generic CPU

2014-06-03  Uros Bizjak  ubiz...@gmail.com

* g++.dg/ext/mv14.C (dg-options): Add -march=x86-64.
* g++.dg/ext/mv15.C (dg-options): Ditto.

Tested on x86_64-pc-linux-gnu {,-m32} corei7 CPU and committed to mainline SVN.

[1] https://gcc.gnu.org/ml/gcc-testresults/2014-06/msg00243.html

Uros.
Index: g++.dg/ext/mv14.C
===
--- g++.dg/ext/mv14.C   (revision 211188)
+++ g++.dg/ext/mv14.C   (working copy)
@@ -1,7 +1,7 @@
 /* Test case to check if Multiversioning works.  */
 /* { dg-do run { target i?86-*-* x86_64-*-* } } */
 /* { dg-require-ifunc  }  */
-/* { dg-options -O2 -fPIC } */
+/* { dg-options -O2 -fPIC -march=x86-64 } */
 
 #include assert.h
 
Index: g++.dg/ext/mv15.C
===
--- g++.dg/ext/mv15.C   (revision 211188)
+++ g++.dg/ext/mv15.C   (working copy)
@@ -1,7 +1,7 @@
 /* Test case to check if Multiversioning works.  */
 /* { dg-do run { target i?86-*-* x86_64-*-* } } */
 /* { dg-require-ifunc  }  */
-/* { dg-options "-O2 -fPIC" } */
+/* { dg-options "-O2 -fPIC -march=x86-64" } */
 
 #include <assert.h>
 


Re: [fortran, patch] IEEE intrinsic modules

2014-06-05 Thread Uros Bizjak
Hello!

 +int
  get_fpu_except_flags (void)
  {
unsigned short cw;
int excepts;
int result = 0;

 -  __asm__ __volatile__ ("fnstsw\t%0" : "=a" (cw));
 +  __asm__ __volatile__ ("fnstsw\t%0" : "=m" (cw));
excepts = cw;

if (has_sse())

You can use the "=am" constraint here, and the compiler will be free to
choose the most appropriate form.

Also, you should use __asm__ __volatile__ consistently in the headers.
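
A minimal sketch of the "=am" suggestion, assuming a GCC-compatible
compiler. The helper names are hypothetical and not taken from the patch;
on non-x86 targets the asm is compiled out and the function returns 0.

```c
#include <stdio.h>

/* Hypothetical helper (the name is an assumption, not from the patch).
   Reads the x87 status word.  The "=am" constraint lets the compiler
   choose either %ax or a memory slot as the destination of fnstsw.  */
unsigned short
read_x87_status_word (void)
{
  unsigned short sw = 0;
#if defined(__i386__) || defined(__x86_64__)
  __asm__ __volatile__ ("fnstsw\t%0" : "=am" (sw));
#endif
  return sw;
}

/* Print the six exception flag bits (bits 0..5 of the status word).  */
void
print_x87_exception_flags (void)
{
  printf ("x87 exception flags: 0x%02x\n",
          (unsigned int) (read_x87_status_word () & 0x3f));
}
```

Both alternatives are valid operand forms for fnstsw, so listing them
together leaves the choice to the register allocator.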

Uros.


Re: [fortran, patch] IEEE intrinsic modules

2014-06-05 Thread Uros Bizjak
Hello!

 0. Gradual underflow control is implemented as not supported by the 
 processor (its SUPPORT
 function returns false, and the GET and SET procedures abort if you call 
 them). That’s explicitly
 allowed by the standard, so it’s not actually “missing”. We can improve on 
 this in the future, if
 people can help.

Please look at libgcc/config/i386/crtfastmath.c for how to set
MXCSR_FTZ from mxcsr. You already have all necessary bits in place,
the function is basically only:

+  if (has_sse())
+  {
+    unsigned int cw_sse;
+
+    __asm__ __volatile__ ("%vstmxcsr\t%0" : "=m" (cw_sse));
+    cw_sse |= MXCSR_DAZ;
+    __asm__ __volatile__ ("%vldmxcsr\t%0" : : "m" (cw_sse));
+  }

Please note, that FTZ applies only to SSE math. x87 and (IIRC) soft-FP
don't handle this setting.
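
For illustration only, the same read-modify-write of MXCSR can be sketched
with the _mm_getcsr/_mm_setcsr intrinsics from <xmmintrin.h> instead of
inline asm. The function name is hypothetical; the bit position used for
flush-to-zero (bit 15) is the architectural one. Without SSE the function
does nothing and returns 0.

```c
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Flush-to-zero control bit of the SSE control/status register.  */
#define MXCSR_FTZ (1u << 15)

/* Hypothetical helper: set FTZ for SSE math in the calling thread.
   Returns 1 if the bit reads back as set, 0 if SSE is unavailable.
   x87 arithmetic is not affected by this setting.  */
int
set_flush_to_zero (void)
{
#if defined(__SSE__)
  unsigned int csr = _mm_getcsr ();   /* read current MXCSR */
  _mm_setcsr (csr | MXCSR_FTZ);       /* write it back with FTZ set */
  return (_mm_getcsr () & MXCSR_FTZ) != 0;
#else
  return 0;
#endif
}
```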

Uros.


Re: [fortran, patch] IEEE intrinsic modules

2014-06-05 Thread Uros Bizjak
On Thu, Jun 5, 2014 at 11:35 AM, FX fxcoud...@gmail.com wrote:
 Please look at libgcc/config/i386/crtfastmath.c for how to set
 MXCSR_FTZ from mxcsr. You already have all necessary bits in place,
 the function is basically only:

 +  if (has_sse())
 +  {
 +    unsigned int cw_sse;
 +
 +    __asm__ __volatile__ ("%vstmxcsr\t%0" : "=m" (cw_sse));
 +    cw_sse |= MXCSR_DAZ;
 +    __asm__ __volatile__ ("%vldmxcsr\t%0" : : "m" (cw_sse));
 +  }

Oops, the above should read MXCSR_FTZ.

 Thanks for the suggestion!


 Please note, that FTZ applies only to SSE math. x87 and (IIRC) soft-FP
 don't handle this setting.

 Yeah, that’s also why I prefer for now to have it declared as unsupported: 
 the Fortran standard doesn’t really allow for partial support such as this, 
 so I’m still trying to figure out what The Right Thing To Do is.

Referring to some older mails [1], this looks like a performance-only
setting (sort of fast-math). So, we can perhaps just set this bit,
regardless of the details. Maybe soft-fp will grow support for FTZ
sometime, it looks like a useful addition from the performance POV.

[1] https://gcc.gnu.org/ml/fortran/2013-11/msg00133.html

Uros.


Re: libgo patch committed: Merge from revision 18783 of master

2014-06-05 Thread Uros Bizjak
Hello!

 I have committed a patch to libgo to merge from revision
 18783:00cce3a34d7e of the master library.  This revision was committed
 January 7.  I picked this revision to merge to because the next revision
 deleted a file that is explicitly merged in by the libgo/merge.sh
 script.

crypto/x509 fails on x86 Fedora20 with:

--- FAIL: TestImports (0.00 seconds)
testing.go:228: failed to run x509_test_import.go: exec: go:
executable file not found in $PATH
FAIL
FAIL: crypto/x509

Uros.


Re: [PATCH] Fix PR61335

2014-06-06 Thread Uros Bizjak
Hello!

 2014-05-28  Richard Biener  rguent...@suse.de

 PR tree-optimization/61335
 * tree-vrp.c (vrp_visit_phi_node): If the compare of old and
 new range fails, drop to varying.

 * gfortran.dg/pr61335.f90: New testcase.

This testcase triggers SIGFPE on alpha due to the use of a denormal
operand. Maybe an uninitialized value is used in line 48?

Reading symbols from ./pr61335.exe...done.
(gdb) r
Starting program: /space/homedirs/uros/test/pr61335.exe

Program received signal SIGFPE, Arithmetic exception.
0x00012b54 in cp_units::cp_unit_create (string=error reading
variable: Cannot access memory at address 0x120004000, _string=5)
at /home/uros/gcc-svn/trunk/gcc/testsuite/gfortran.dg/pr61335.f90:48
48  unit_id=cp_units_none
(gdb) list
43  len_string, next_power
44  INTEGER, DIMENSION(cp_unit_max_kinds):: kind_id, power, unit_id
45  LOGICAL  :: failure
46
47  failure=.FALSE.
48  unit_id=cp_units_none
49  kind_id=cp_ukind_none
50  power=0
51  i_low=1
52  i_high=1

The exception is triggered in 0x12b50, but emitted on the next FP insn.

   0x00012b4c <+76>:    lds     $f10,48(fp)
   0x00012b50 <+80>:    cvttq/c $f10,$f10
=> 0x00012b54 <+84>:    ftoit   $f10,t0

(gdb) b *0x12b50
Breakpoint 1 at 0x12b50: file
/home/uros/gcc-svn/trunk/gcc/testsuite/gfortran.dg/pr61335.f90, line
48.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /space/homedirs/uros/test/pr61335.exe

Breakpoint 1, 0x00012b50 in cp_units::cp_unit_create
(string=error reading variable: Cannot access memory at address
0x120004000,
_string=5) at
/home/uros/gcc-svn/trunk/gcc/testsuite/gfortran.dg/pr61335.f90:48
48  unit_id=cp_units_none

(gdb) i r $f10
f10            8.0173244974249919e-310  (raw 0x9396)

The test passes with -mieee that allows denormals.
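
As a small check of the "denormal" claim, the value gdb printed for $f10
can be classified with the standard fpclassify macro. This is only a
sketch; the helper name is hypothetical.

```c
#include <float.h>
#include <math.h>

/* Returns 1 if x is a denormal (subnormal) double, 0 otherwise.
   Subnormals are nonzero values smaller in magnitude than DBL_MIN.  */
int
is_denormal (double x)
{
  return fpclassify (x) == FP_SUBNORMAL;
}
```

Here is_denormal (8.0173244974249919e-310) is true, because the value is
nonzero and below DBL_MIN (about 2.2e-308).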

Uros.


Re: [PATCH] Fix PR61335

2014-06-06 Thread Uros Bizjak
On Fri, Jun 6, 2014 at 9:47 AM, Uros Bizjak ubiz...@gmail.com wrote:

 2014-05-28  Richard Biener  rguent...@suse.de

 PR tree-optimization/61335
 * tree-vrp.c (vrp_visit_phi_node): If the compare of old and
 new range fails, drop to varying.

 * gfortran.dg/pr61335.f90: New testcase.

 This testcase triggers SIGFPE on alpha due to the use of a denormal
 operand. Maybe an uninitialized value is used in line 48?

SIGFPE also triggers at the same place on x86_64 with unmasked FPE
exceptions (compile with -O0).

(gdb) b main
Breakpoint 1 at 0x401602: file
/home/uros/gcc-svn/trunk/gcc/testsuite/gfortran.dg/pr61335.f90, line
115.
(gdb) r
Starting program: /home/uros/test/pr61335.exe
warning: no loadable sections found in added symbol-file
system-supplied DSO at 0x2aaab000

Breakpoint 1, main (argc=1, argv=0x7fffd88e) at
/home/uros/gcc-svn/trunk/gcc/testsuite/gfortran.dg/pr61335.f90:115
115 USE cp_units
(gdb) i r mxcsr
mxcsr  0x1f80   [ IM DM ZM OM UM PM ]
(gdb) set $mxcsr=0x1000
(gdb) i r mxcsr
mxcsr  0x1000   [ PM ]
(gdb) c
Continuing.

Program received signal SIGFPE, Arithmetic exception.
0x00400b60 in cp_units::cp_unit_create (string=error reading
variable: Cannot access memory at address 0x401c47, _string=5)
at /home/uros/gcc-svn/trunk/gcc/testsuite/gfortran.dg/pr61335.f90:49
49  kind_id=cp_ukind_none

   0x00400b57 <+95>:    mov    -0x28(%rbp),%eax
   0x00400b5a <+98>:    mov    %eax,-0x280(%rbp)
=> 0x00400b60 <+104>:   cvttss2si -0x280(%rbp),%ecx

(gdb) i r mxcsr
mxcsr  0x1021   [ IE PE PM ]

Uros.


Re: [patch] Update catch(...) handlers to deal with __forced_unwind

2014-06-06 Thread Uros Bizjak
Hello!

 Failing to rethrow a __forced_unwind exception is very bad.

 This patch ensures we rethrow them in async tasks, and makes the
 shared state ready with a broken_promise so that waiting threads
 don't block forever. That seems reasonable to me, does anyone have any
 better ideas?

 Tested x86_64-linux, will wait for feedback before committing.

 Committed to trunk.

* testsuite/30_threads/async/forced_unwind.cc: New.
* testsuite/30_threads/packaged_task/forced_unwind.cc: New.

These two tests timeout on alpha-linux-gnu:

FAIL: 30_threads/async/forced_unwind.cc execution test
WARNING: program timed out.
FAIL: 30_threads/packaged_task/forced_unwind.cc execution test
WARNING: program timed out.

strace -f of 30_threads/async/forced_unwind.cc execution test:

...
open(/lib/libpthread.so.0, O_RDONLY|O_CLOEXEC) = 3
read(3, \177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\220\1\0\0\0\320r\0\0\0\0\0\0...,
832) = 832
fstat64(3, {st_mode=S_IFREG|0755, st_size=141449, ...}) = 0
mmap(NULL, 189528, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x227c000
mprotect(0x2296000, 57344, PROT_NONE) = 0
mmap(0x22a4000, 16384, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18000) = 0x22a4000
mmap(0x22a8000, 9304, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x22a8000
close(3)= 0
open(/lib/libc.so.6.1, O_RDONLY|O_CLOEXEC) = 3
read(3, \177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\220\1\0\0\0`_\2\0\0\0\0\0...,
832) = 832
fstat64(3, {st_mode=S_IFREG|0755, st_size=1646104, ...}) = 0
mmap(NULL, 1719888, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x22ac000
mprotect(0x2438000, 57344, PROT_NONE) = 0
mmap(0x2446000, 32768, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18a000) = 0x2446000
mmap(0x244e000, 7760, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x244e000
close(3)= 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x245
mprotect(0x2446000, 16384, PROT_READ) = 0
mprotect(0x22a4000, 8192, PROT_READ) = 0
mprotect(0x2278000, 8192, PROT_READ) = 0
mprotect(0x2254000, 8192, PROT_READ) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x2452000
mprotect(0x217a000, 24576, PROT_READ) = 0
mprotect(0x120016000, 8192, PROT_READ)  = 0
mprotect(0x2032000, 8192, PROT_READ) = 0
munmap(0x2024000, 41134)= 0
set_tid_address(0x2450e50)  = 18325
set_robust_list(0x2450e60, 24)  = 0
futex(0x11f813260, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x11f813260, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1,
NULL, 22b0248) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigaction(SIGRT_0, {0x2282db0, [], SA_SIGINFO}, NULL, 8, 0) = 0
rt_sigaction(SIGRT_1, {0x2282c70, [], SA_RESTART|SA_SIGINFO},
NULL, 8, 0) = 0
rt_sigprocmask(SIG_UNBLOCK, [RT_0 RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=9223372036854775807}) = 0
brk(0)  = 0x12001a000
brk(0x12003c000)= 0x12003c000
mmap(NULL, 8388608, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x2454000
mprotect(0x2454000, 8192, PROT_NONE) = 0
clone(Process 18326 attached
child_stack=0x2c52ae0,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
parent_tidptr=0x2c532c0, tls=0x2c538e0,
child_tidptr=0x2c532c0) = 18326
[pid 18326] set_robust_list(0x2c532d0, 24 unfinished ...
[pid 18325] futex(0x2c532c0, FUTEX_WAIT, 18326, NULL unfinished ...
[pid 18326] ... set_robust_list resumed ) = 0
[pid 18326] mmap(NULL, 134217728, PROT_NONE,
MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x2c54000
[pid 18326] munmap(0x2c54000, 54181888) = 0
[pid 18326] munmap(0x2000800, 12926976) = 0
[pid 18326] mprotect(0x2000400, 139264, PROT_READ|PROT_WRITE) = 0
[pid 18326] futex(0x227a1f4, FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid 18326] futex(0x12001a08c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid 18326] madvise(0x2454000, 8355840, MADV_DONTNEED) = 0
[pid 18326] exit(0) = ?
[pid 18326] +++ exited with 0 +++
... futex resumed )   = 0
futex(0x12001a098, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x12001a05c, FUTEX_WAIT_PRIVATE, 1, NULL

... the test hangs here ...

Uros.


Re: [patch] Update catch(...) handlers to deal with __forced_unwind

2014-06-06 Thread Uros Bizjak
On Fri, Jun 6, 2014 at 11:19 AM, Jonathan Wakely jwak...@redhat.com wrote:
 On 06/06/14 10:27 +0200, Uros Bizjak wrote:

 These two tests timeout on alpha-linux-gnu:

 FAIL: 30_threads/async/forced_unwind.cc execution test
 WARNING: program timed out.
 FAIL: 30_threads/packaged_task/forced_unwind.cc execution test
 WARNING: program timed out.


 Sorry about that, I don't know why.

 Does pthread_exit(0) use a __forced_unwind exception on
 alpha-linux-gnu? This should tell you ...


 #include <bits/cxxabi_forced.h>
 #include <pthread.h>

 void* f(void*) {
  try
  {
    pthread_exit(0);
  }
  catch (__cxxabiv1::__forced_unwind const&)
  {
    __builtin_puts("unwind");
    throw;
  }
  catch (...)
  {
    __builtin_puts("something else");
    throw;
  }
 }

 int main()
 {
  pthread_t t;
  pthread_create(&t, 0, f, 0);
  pthread_join(t, 0);

 }

Strange, I don't get anything ...

$ g++ -lpthread pt.C
$ ./a.out
$
$ g++ --version
g++ (Gentoo 4.8.2 p1.3r1, pie-0.5.8r1) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Maybe Richard knows why...

 [pid 18326] futex(0x227a1f4, FUTEX_WAKE_PRIVATE, 2147483647) = 0
 [pid 18326] futex(0x12001a08c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
 [pid 18326] madvise(0x2454000, 8355840, MADV_DONTNEED) = 0
 [pid 18326] exit(0) = ?
 [pid 18326] +++ exited with 0 +++
 ... futex resumed )   = 0
 futex(0x12001a098, FUTEX_WAKE_PRIVATE, 2147483647) = 0
 futex(0x12001a05c, FUTEX_WAIT_PRIVATE, 1, NULL

 ... the test hangs here ...


 Could I get a stack trace of the remaining thread at that point?

Reading symbols from ./forced_unwind.exe...done.
(gdb) r
Starting program: /space/homedirs/uros/test/forced_unwind.exe
[Thread debugging using libthread_db enabled]
Using host libthread_db library /lib/libthread_db.so.1.
[New Thread 0x2c531f0 (LWP 22587)]
[Thread 0x2c531f0 (LWP 22587) exited]
^C
Program received signal SIGINT, Interrupt.
0x02289ca4 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib/libpthread.so.0
(gdb) bt
#0  0x02289ca4 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib/libpthread.so.0
#1  0x021279ec in
std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
   from /usr/lib/gcc/alpha-unknown-linux-gnu/4.8.2/libstdc++.so.6
#2  0x000120001a80 in
waitstd::__future_base::_State_baseV2::wait()::lambda()  (__p=...,
__lock=..., this=0x12001a058)
at 
/home/uros/gcc-build/alphaev68-unknown-linux-gnu/libstdc++-v3/include/condition_variable:98
#3  wait (this=0x12001a020) at
/home/uros/gcc-build/alphaev68-unknown-linux-gnu/libstdc++-v3/include/future:323
#4  _M_get_result (this=0x11fc8f190) at
/home/uros/gcc-build/alphaev68-unknown-linux-gnu/libstdc++-v3/include/future:618
#5  get (this=0x11fc8f190) at
/home/uros/gcc-build/alphaev68-unknown-linux-gnu/libstdc++-v3/include/future:783
#6  main () at 
/home/uros/gcc-svn/trunk/libstdc++-v3/testsuite/30_threads/async/forced_unwind.cc:38
(gdb)

Uros.


[PATCH, i386]: Fix PR 61423, incorrect conversion from unsigned int to floating point

2014-06-06 Thread Uros Bizjak
Hello!

Attached patch fixes PR 61423. The problem was that splitters omitted
apparently necessary zero extension, and left garbage in the highpart
of the register.
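
To illustrate why the zero extension matters, here is a simplified C model
(not the generated code). The conversion goes through a signed 64-bit
integer, so whatever sits in the upper 32 bits becomes part of the result.
Function names and the "garbage" parameter are hypothetical.

```c
#include <stdint.h>

/* Correct path: zero extend the 32-bit value to 64 bits, then do a
   signed 64-bit to double conversion.  */
double
u32_to_double_zero_extended (uint32_t x)
{
  int64_t wide = (int64_t) (uint64_t) x;   /* upper 32 bits are zero */
  return (double) wide;
}

/* Model of the bug: the upper 32 bits hold leftover data instead of
   zero before the signed 64-bit conversion.  */
double
u32_to_double_with_garbage (uint32_t x, uint32_t garbage_high)
{
  uint64_t bits = ((uint64_t) garbage_high << 32) | x;
  return (double) (int64_t) bits;
}
```

For example, with x = 1024 a single stray bit in the high part turns the
result into 4294968320.0 instead of 1024.0.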

2014-06-06  Uros Bizjak  ubiz...@gmail.com

PR target/61423
* config/i386/i386.md (*floatunssi<mode>2_i387_with_xmm): New
define_insn_and_split pattern, merged from *floatunssi<mode>2_1
and corresponding splitters.  Zero extend general register
or memory input operand to XMM temporary.  Enable for
TARGET_SSE2 and TARGET_INTER_UNIT_MOVES_TO_VEC only.
(floatunssi<mode>2): Update expander predicate.

testsuite/ChangeLog:

2014-06-06  Uros Bizjak  ubiz...@gmail.com

PR target/61423
* gcc.target/i386/pr61423.c: New test.

The patch was bootstrapped and regression tested on
x86_64-pc-linux-gnu {,-m32} and committed to mainline SVN.

Please note that the patch breaks bootstrap when gcc is configured
with --with-arch=core-avx-i --with-cpu=core-avx-i due to an
unrelated problem in REE pass. The failing preprocessed source from
the libgcc is attached to the PR.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 211316)
+++ config/i386/i386.md (working copy)
@@ -4943,66 +4943,37 @@
 
 ;; Avoid store forwarding (partial memory) stall penalty by extending
 ;; SImode value to DImode through XMM register instead of pushing two
-;; SImode values to stack. Note that even !TARGET_INTER_UNIT_MOVES_TO_VEC
-;; targets benefit from this optimization. Also note that fild
-;; loads from memory only.
+;; SImode values to stack. Also note that fild loads from memory only.
 
-(define_insn "*floatunssi<mode>2_1"
-  [(set (match_operand:X87MODEF 0 "register_operand" "=f,f")
+(define_insn_and_split "*floatunssi<mode>2_i387_with_xmm"
+  [(set (match_operand:X87MODEF 0 "register_operand" "=f")
        (unsigned_float:X87MODEF
-         (match_operand:SI 1 "nonimmediate_operand" "x,m")))
-   (clobber (match_operand:DI 2 "memory_operand" "=m,m"))
-   (clobber (match_scratch:SI 3 "=X,x"))]
+         (match_operand:SI 1 "nonimmediate_operand" "rm")))
+   (clobber (match_scratch:DI 3 "=x"))
+   (clobber (match_operand:DI 2 "memory_operand" "=m"))]
   "!TARGET_64BIT
    && TARGET_80387 && X87_ENABLE_FLOAT (<X87MODEF:MODE>mode, DImode)
-   && TARGET_SSE"
+   && TARGET_SSE2 && TARGET_INTER_UNIT_MOVES_TO_VEC"
   "#"
+  "&& reload_completed"
+  [(set (match_dup 3) (zero_extend:DI (match_dup 1)))
+   (set (match_dup 2) (match_dup 3))
+   (set (match_dup 0)
+       (float:X87MODEF (match_dup 2)))]
+  ""
   [(set_attr "type" "multi")
    (set_attr "mode" "<MODE>")])
 
-(define_split
-  [(set (match_operand:X87MODEF 0 "register_operand")
-       (unsigned_float:X87MODEF
-         (match_operand:SI 1 "register_operand")))
-   (clobber (match_operand:DI 2 "memory_operand"))
-   (clobber (match_scratch:SI 3))]
-  "!TARGET_64BIT
-   && TARGET_80387 && X87_ENABLE_FLOAT (<X87MODEF:MODE>mode, DImode)
-   && TARGET_SSE
-   && reload_completed"
-  [(set (match_dup 2) (match_dup 1))
-   (set (match_dup 0)
-       (float:X87MODEF (match_dup 2)))]
-  "operands[1] = simplify_gen_subreg (DImode, operands[1], SImode, 0);")
-
-(define_split
-  [(set (match_operand:X87MODEF 0 "register_operand")
-       (unsigned_float:X87MODEF
-         (match_operand:SI 1 "memory_operand")))
-   (clobber (match_operand:DI 2 "memory_operand"))
-   (clobber (match_scratch:SI 3))]
-  "!TARGET_64BIT
-   && TARGET_80387 && X87_ENABLE_FLOAT (<X87MODEF:MODE>mode, DImode)
-   && TARGET_SSE
-   && reload_completed"
-  [(set (match_dup 2) (match_dup 3))
-   (set (match_dup 0)
-       (float:X87MODEF (match_dup 2)))]
-{
-  emit_move_insn (operands[3], operands[1]);
-  operands[3] = simplify_gen_subreg (DImode, operands[3], SImode, 0);
-})
-
 (define_expand "floatunssi<mode>2"
   [(parallel
      [(set (match_operand:X87MODEF 0 "register_operand")
           (unsigned_float:X87MODEF
             (match_operand:SI 1 "nonimmediate_operand")))
-      (clobber (match_dup 2))
-      (clobber (match_scratch:SI 3))])]
+      (clobber (match_scratch:DI 3))
+      (clobber (match_dup 2))])]
   "!TARGET_64BIT
    && ((TARGET_80387 && X87_ENABLE_FLOAT (<X87MODEF:MODE>mode, DImode)
-       && TARGET_SSE)
+       && TARGET_SSE2 && TARGET_INTER_UNIT_MOVES_TO_VEC)
        || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH))"
 {
   if (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)
Index: testsuite/gcc.target/i386/pr61423.c
===
--- testsuite/gcc.target/i386/pr61423.c (revision 0)
+++ testsuite/gcc.target/i386/pr61423.c (working copy)
@@ -0,0 +1,38 @@
+/* PR target/61423 */
+/* { dg-do run { target ia32 } } */
+/* { dg-options "-O1 -ftree-vectorize -msse2 -mfpmath=387 -mtune=core2" } */
+
+#define N 1024
+static unsigned int A[N];
+
+double
+__attribute__((noinline))
+func (void)
+{
+  unsigned int sum = 0;
+  unsigned i;
+  double t;
+
+  for (i = 0; i < N; i++)
+sum += A[i];
+
+  t = sum;
+  return t;
+}
+
+int
+main ()
+{
+  unsigned i;
+  double d;
+
+  for(i = 0; i < N; i++)
+A[i] = 1;
+
+  d

Re: [patch] fix tests for AVX512

2014-06-08 Thread Uros Bizjak
On Tue, May 27, 2014 at 12:28 PM, Petr Murzin petrmurz...@gmail.com wrote:
 Hi,
 I've fixed tests for AVX512, so they could be compiled with -Werror
 -Wall. Please have a look.



 2014-05-19  Petr Murzin  petr.mur...@intel.com

 * gcc.target/i386/avx512f-vaddpd-2.c:  Add static void for CALC,
 void for TEST instead of static void.
 * gcc.target/i386/avx512f-vaddps-2.c: Ditto.
 * gcc.target/i386/avx512f-vblendmpd-2.c: Ditto.
 * gcc.target/i386/avx512f-vblendmps-2.c: Ditto.
 * gcc.target/i386/avx512f-vbroadcastf32x4-2.c: Ditto.
 * gcc.target/i386/avx512f-vbroadcastf64x4-2.c: Ditto.
 * gcc.target/i386/avx512f-vbroadcasti32x4-2.c: Ditto.
 * gcc.target/i386/avx512f-vbroadcasti64x4-2.c: Ditto.
 * gcc.target/i386/avx512f-vbroadcastsd-2.c: Ditto.
 * gcc.target/i386/avx512f-vbroadcastss-2.c: Ditto.
 * gcc.target/i386/avx512f-vcvtps2dq-2.c: Ditto.
 * gcc.target/i386/avx512f-vcvttps2dq-2.c: Ditto.
 * gcc.target/i386/avx512f-vdivpd-2.c: Ditto.
 * gcc.target/i386/avx512f-vdivps-2.c: Ditto.
 * gcc.target/i386/avx512f-vextractf32x4-2.c: Ditto.
 * gcc.target/i386/avx512f-vextracti32x4-2.c: Ditto.
 * gcc.target/i386/avx512f-vmaxpd-2.c: Ditto.
 * gcc.target/i386/avx512f-vmaxps-2.c: Ditto.
 * gcc.target/i386/avx512f-vminpd-2.c: Ditto.
 * gcc.target/i386/avx512f-vminps-2.c: Ditto.
 * gcc.target/i386/avx512f-vmulpd-2.c: Ditto.
 * gcc.target/i386/avx512f-vmulps-2.c: Ditto.
 * gcc.target/i386/avx512f-vpaddd-2.c: Ditto.
 * gcc.target/i386/avx512f-vpaddq-2.c: Ditto.
 * gcc.target/i386/avx512f-vpblendmd-2.c: Ditto.
 * gcc.target/i386/avx512f-vpblendmq-2.c: Ditto.
 * gcc.target/i386/avx512f-vpbroadcastd-2.c: Ditto.
 * gcc.target/i386/avx512f-vpbroadcastq-2.c: Ditto.
 * gcc.target/i386/avx512f-vpcmpeqd-2.c: Ditto.
 * gcc.target/i386/avx512f-vpcmpeqq-2.c: Ditto.
 * gcc.target/i386/avx512f-vpcmpgtd-2.c: Ditto.
 * gcc.target/i386/avx512f-vpcmpgtq-2.c: Ditto.
 * gcc.target/i386/avx512f-vpmovdb-2.c: Ditto.
 * gcc.target/i386/avx512f-vpmovdw-2.c: Ditto.
 * gcc.target/i386/avx512f-vpmovqb-2.c: Ditto.
 * gcc.target/i386/avx512f-vpmovqw-2.c: Ditto.
 * gcc.target/i386/avx512f-vpmovsdb-2.c: Ditto.
 * gcc.target/i386/avx512f-vpmovsdw-2.c: Ditto.
 * gcc.target/i386/avx512f-vpmovsqb-2.c: Ditto.
 * gcc.target/i386/avx512f-vpmovsqd-2.c: Ditto.
 * gcc.target/i386/avx512f-vpmovsqw-2.c: Ditto.
 * gcc.target/i386/avx512f-vpslld-2.c: Ditto.
 * gcc.target/i386/avx512f-vpslldi-2.c: Ditto.
 * gcc.target/i386/avx512f-vpsllq-2.c: Ditto.
 * gcc.target/i386/avx512f-vpsllqi-2.c: Ditto.
 * gcc.target/i386/avx512f-vpsrad-2.c: Ditto.
 * gcc.target/i386/avx512f-vpsradi-2.c: Ditto.
 * gcc.target/i386/avx512f-vpsraq-2.c: Ditto.
 * gcc.target/i386/avx512f-vpsraqi-2.c: Ditto.
 * gcc.target/i386/avx512f-vpsravd-2.c: Ditto.
 * gcc.target/i386/avx512f-vpsravq-2.c: Ditto.
 * gcc.target/i386/avx512f-vpsubd-2.c: Ditto.
 * gcc.target/i386/avx512f-vpsubq-2.c: Ditto.
 * gcc.target/i386/avx512f-vptestmd-2.c: Ditto.
 * gcc.target/i386/avx512f-vptestmq-2.c: Ditto.
 * gcc.target/i386/avx512f-vptestnmd-2.c: Ditto.
 * gcc.target/i386/avx512f-vptestnmq-2.c: Ditto.
 * gcc.target/i386/avx512f-vpunpckhdq-2.c: Ditto.
 * gcc.target/i386/avx512f-vpunpckhqdq-2.c: Ditto.
 * gcc.target/i386/avx512f-vpunpckldq-2.c: Ditto.
 * gcc.target/i386/avx512f-vpunpcklqdq-2.c: Ditto.
 * gcc.target/i386/avx512f-vscalefpd-2.c: Ditto.
 * gcc.target/i386/avx512f-vscalefps-2.c: Ditto.
 * gcc.target/i386/avx512f-vshuff32x4-2.c: Ditto.
 * gcc.target/i386/avx512f-vshuff64x2-2.c: Ditto.
 * gcc.target/i386/avx512f-vshufi32x4-2.c: Ditto.
 * gcc.target/i386/avx512f-vshufi64x2-2.c: Ditto.
 * gcc.target/i386/avx512f-vsubpd-2.c: Ditto.
 * gcc.target/i386/avx512f-vsubps-2.c: Ditto.
 * gcc.target/i386/avx512f-vpmovdb-2.c: Ditto.
 * gcc.target/i386/avx512f-vpmovdw-2.c: Ditto.
 * gcc.target/i386/avx512f-vpmovqb-2.c: Ditto.
 * gcc.target/i386/avx512f-vpmovqw-2.c: Ditto.
 * gcc.target/i386/avx512f-vpmovsdb-2.c: Ditto.
 * gcc.target/i386/avx512f-vpmovsdw-2.c: Ditto.
 * gcc.target/i386/avx512f-vpmovsqb-2.c: Ditto.
 * gcc.target/i386/avx512f-vpmovsqd-2.c: Ditto.
 * gcc.target/i386/avx512f-vpmovsqw-2.c: Ditto.
 * gcc.target/i386/avx512f-vpsllvd-2.c: Ditto.
 * gcc.target/i386/avx512f-vpsllvq-2.c: Ditto.
 * gcc.target/i386/avx512f-vpsrld-2.c: Ditto.
 * gcc.target/i386/avx512f-vpsrldi-2.c: Ditto.
 * gcc.target/i386/avx512f-vpsrlq-2.c: Ditto.
 * gcc.target/i386/avx512f-vpsrlqi-2.c: Ditto.
 * gcc.target/i386/avx512f-vpsrlvd-2.c: Ditto.
 * gcc.target/i386/avx512f-vpsrlvq-2.c: Ditto.
 * gcc.target/i386/avx512f-vpshufd-2.c: Delete variables, void for TEST
 instead of static void.
 * gcc.target/i386/avx512f-vpcmpged-2.c: Add static void for CALC,
 delete unused variables.
 * gcc.target/i386/avx512f-vpcmpgeq-2.c: Ditto.
 * gcc.target/i386/avx512f-vpcmpgeud-2.c: Ditto.
 * gcc.target/i386/avx512f-vpcmpgeuq-2.c: Ditto.
 * gcc.target/i386/avx512f-vpcmpled-2.c: Add static void for CALC,
 delete unused variables, void for TEST instead of static void.
 * gcc.target/i386/avx512f-vpcmpleq-2.c: Ditto.
 * gcc.target/i386/avx512f-vpcmpleud-2.c: Ditto.
 

Re: [patch] fix tests for AVX512

2014-06-09 Thread Uros Bizjak
On Mon, Jun 9, 2014 at 1:34 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello Uroš,
 On 08 Jun 11:26, Uros Bizjak wrote:
 On Tue, May 27, 2014 at 12:28 PM, Petr Murzin petrmurz...@gmail.com wrote:
  Hi,
  I've fixed tests for AVX512, so they could be compiled with -Werror
  -Wall. Please have a look.
 From a quick look, this looks OK.
 Thanks, checked into trunk.
 Could we apply that to 4.9 branch?

OK, but please wait a couple of days to check that everything is OK in
mainline, and to give the Release Manager a chance to reject the patch.

Thanks,
Uros.


Re: [PATCH, i386]: Correctly handle maximum size of stringop algorithm in decide_alg

2014-06-09 Thread Uros Bizjak
Ping.

On Mon, Jun 2, 2014 at 11:12 PM, Uros Bizjak ubiz...@gmail.com wrote:
 Hello!

 A problem was uncovered by -march=corei7 -mtune=intel -m32 with
 i386/memcpy-[23] testcase in decide_alg subroutine [1]. Although the
 max size of the transfer was known, the memcpy was not inlined, as
 expected by the testcase.

 The core of the problem can be seen in the definition of 32bit
 intel_memcpy stringop alg:

   {libcall, {{11, loop, false}, {-1, rep_prefix_4_byte, false}}},

 Please note that the last algorithm sets its maximum size to -1,
 unlimited. However, in decide_alg, the same number also signals that
 no algorithm sets its size, so expected_size is never calculated. In
 the loop that sets maximal size for user defined algorithm, it is
 assumed that -1 belongs exclusively to libcall, which is not the
 case in the above intel_memcpy definition:

   if (candidate != libcall && candidate && usable)
     max = algs->size[i].max;

 When the last non-libcall algorithm sets its maximum to -1 (aka
 unlimited), this value fails following test:

   if (max > 1 && (unsigned HOST_WIDE_INT) max >= max_size

 and expected_size is never calculated.

 Attached patch fixes this oversight, so -1 means unlimited size and
 0 means that size was never set. The patch also considers these two
 special values when choosing a maximum size for dynamic check.
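
The two sentinel values described above can be modelled in a few lines of
C. This is a simplified sketch, not the GCC code: the function names and
the plain long/int types are assumptions, and only the conditions touched
by the patch are reproduced.

```c
#include <stdbool.h>

/* After the patch: 0 means "no algorithm set a size",
   -1 means "the last usable algorithm is unlimited".  */
enum { MAX_UNSET = 0, MAX_UNLIMITED = -1 };

/* Old test: an unlimited max (-1) fails "max > 1", so the expected
   size is never estimated.  */
bool
should_estimate_old (int max, unsigned long max_size, long expected_size)
{
  return max > 1 && (unsigned long) max >= max_size
         && expected_size == -1;
}

/* New test: an unlimited max also qualifies.  */
bool
should_estimate_new (int max, unsigned long max_size, long expected_size)
{
  return ((max > 1 && (unsigned long) max >= max_size)
          || max == MAX_UNLIMITED)
         && expected_size == -1;
}

/* Midpoint of the known size range, as in the patched code.  */
long
estimate_expected_size (long min_size, long max_size)
{
  return min_size / 2 + max_size / 2;
}

/* Limit for the dynamic check: both special values fall back to 4096.  */
int
dynamic_check_limit (int max)
{
  return max <= 0 ? 4096 : max;
}
```

With max == -1 and an unknown expected size, should_estimate_old returns
false while should_estimate_new returns true, which is the case the
intel_memcpy table runs into.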

 2014-06-02  Uros Bizjak  ubiz...@gmail.com

 * config/i386/i386.c (decide_alg): Correctly handle maximum size of
 stringop algorithm.

 Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
 {,-m32}, also with
 RUNTESTFLAGS=--target_board=unix/-march=corei7/-mtune=intel\{,-m32\},
 where it fixes both memcpy failures from [1].

 [1] https://gcc.gnu.org/ml/gcc-testresults/2014-06/msg00127.html

 Jan, can you please review the patch, to check if the logic is OK?

 Uros.
Index: ChangeLog
===
--- ChangeLog   (revision 211140)
+++ ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2014-06-02  Uros Bizjak  ubiz...@gmail.com
+
+   * config/i386/i386.c (decide_alg): Correctly handle maximum size of
+   stringop algorithm.
+
 2014-06-02  Marcus Shawcroft  marcus.shawcr...@arm.com
 
* config/aarch64/aarch64.md (set_fpcr): Drop ISB after FPCR write.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 211140)
+++ config/i386/i386.c  (working copy)
@@ -23828,7 +23828,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT exp
 {
   const struct stringop_algs * algs;
   bool optimize_for_speed;
-  int max = -1;
+  int max = 0;
   const struct processor_costs *cost;
   int i;
   bool any_alg_usable_p = false;
@@ -23866,7 +23866,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT exp
   /* If expected size is not known but max size is small enough
  so inline version is a win, set expected size into
  the range.  */
-  if (max > 1 && (unsigned HOST_WIDE_INT) max >= max_size
+  if (((max > 1 && (unsigned HOST_WIDE_INT) max >= max_size) || max == -1)
       && expected_size == -1)
 expected_size = min_size / 2 + max_size / 2;
 
@@ -23955,7 +23955,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT exp
 *dynamic_check = 128;
   return loop_1_byte;
 }
-  if (max == -1)
+  if (max <= 0)
max = 4096;
   alg = decide_alg (count, max / 2, min_size, max_size, memset,
zero_memset, dynamic_check, noalign);


Re: [PATCH, i386] Remove use of vpmacsdql instruction from multiplication.

2014-06-10 Thread Uros Bizjak
On Tue, Jun 10, 2014 at 12:30 PM, Gopalasubramanian, Ganesh
ganesh.gopalasubraman...@amd.com wrote:
 Hi,

 The below patch fixes the issue with 64-bit multiplication.
 The instruction vpmacsdql does signed 32-bit multiplication.
 For V2DImode, we require widened unsigned multiplication.
 So, replacing the vpmacsdql instruction with vpmuludq and vpaddq.
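
A scalar C sketch of the arithmetic involved (not the vector code the
patch emits; all function names are hypothetical). It shows that signed
and unsigned widening 32-bit multiplies disagree once the top bit is set,
and that the low 64 bits of a 64x64 product can be built from unsigned
32-bit partial products plus additions.

```c
#include <stdint.h>

/* Widening unsigned 32x32 -> 64 multiply.  */
uint64_t
mul_u32_widen (uint32_t a, uint32_t b)
{
  return (uint64_t) a * b;
}

/* Widening signed 32x32 -> 64 multiply.  */
int64_t
mul_s32_widen (int32_t a, int32_t b)
{
  return (int64_t) a * b;
}

/* Low 64 bits of a * b, assembled from unsigned 32-bit partial
   products.  The a_hi * b_hi term only affects bits 64 and up,
   so it is dropped.  */
uint64_t
mul64_from_u32_products (uint64_t a, uint64_t b)
{
  uint32_t a_lo = (uint32_t) a, a_hi = (uint32_t) (a >> 32);
  uint32_t b_lo = (uint32_t) b, b_hi = (uint32_t) (b >> 32);

  uint64_t low = mul_u32_widen (a_lo, b_lo);
  uint64_t cross = mul_u32_widen (a_lo, b_hi) + mul_u32_widen (a_hi, b_lo);

  return low + (cross << 32);
}
```

For instance, the bit pattern 0xFFFFFFFF times 2 is 0x1FFFFFFFE when
treated as unsigned but -2 when treated as signed, so a signed multiply
cannot stand in for the unsigned partial products.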

 This patch had been already discussed in 
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52908

 With required change in the test xop-imul64-vector.c,  make check passes. Is 
 it OK for upstream?

 Regards
 Ganesh

 diff --git a/gcc/ChangeLog b/gcc/ChangeLog
 index d0a1253..c158612 100644
 --- a/gcc/ChangeLog
 +++ b/gcc/ChangeLog
 @@ -1,3 +1,9 @@
 +2014-06-10  Ganesh Gopalasubramanian ganesh.gopalasubraman...@amd.com
 +
 +   * config/i386/i386.c (ix86_expand_sse2_mulvxdi3): Issue instructions
 +vpmuludq and vpaddq instead of vpmacsdql for handling 32-bit
 +multiplication.


OK for mainline and release branches.

Thanks,
Uros.


Re: [PATCH, libbid]: Fix variable ‘Ql’ set but not used warnings

2014-06-10 Thread Uros Bizjak
On Mon, May 26, 2014 at 6:52 PM, Uros Bizjak ubiz...@gmail.com wrote:

 Attached patch fixes several variable ‘Ql’ set but not used warnings
 in bid128_div.c and bid64_div.c libbid sources. We can simply use
 __mul_128x128_high functions when lowpart is not needed.

 2014-05-26  Uros Bizjak  ubiz...@gmail.com

 * bid128_div.c (BID128_FUNCTION_ARG2): Remove unused variable 'Ql'.
 Call __mul_128x128_high instead of __mul_128x128_full.
 (TYPE0_FUNCTION_ARGTYPE1_ARGTYPE2): Ditto.
 (BID128_FUNCTION_ARGTYPE1_ARG128): Ditto.
 (BID128_FUNCTION_ARG128_ARGTYPE2): Ditto.
 * bid64_div.c (TYPE0_FUNCTION_ARGTYPE1_ARG128): Ditto.
 (TYPE0_FUNCTION_ARG128_ARGTYPE2): Ditto.
 (TYPE0_FUNCTION_ARG128_ARG128): Ditto.

 Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.

The patch was OK'd offline by H.J.

Committed to mainline SVN.

Uros.


Re: [PATCH, PR61446] Fix mode for register copy in REE pass

2014-06-11 Thread Uros Bizjak
On Tue, Jun 10, 2014 at 3:45 PM, Dominique Dhumieres domi...@lps.ens.fr wrote:
 This patch fixes PR61446. ...

 Confirmed, it also allows to bootstrap Core* targets.
 Could it be reviewed and committed ASAP?

 2014-06-09  Ilya Enkovich  ilya.enkov...@intel.com

 PR 61446
 * ree.c (find_and_remove_re): Narrow mode for register copy
 if required.

Please also add the testcase from the PR.

(I am not RTL reviewer, so I can't approve the patch).

Uros.


Re: [PATCH, PR61446] Fix mode for register copy in REE pass

2014-06-11 Thread Uros Bizjak
On Wed, Jun 11, 2014 at 3:19 PM, Dominique Dhumieres domi...@lps.ens.fr wrote:
 (I am not RTL reviewer, so I can't approve the patch).

 Is https://gcc.gnu.org/ml/gcc-regression/2014-06/ acceptable?

Yes, these are bootstraps with non-default configurations.

Uros.


Re: [PATCH, PR61446] Fix mode for register copy in REE pass

2014-06-11 Thread Uros Bizjak
On Wed, Jun 11, 2014 at 6:11 PM, Ilya Enkovich enkovich@gmail.com wrote:
 On 11 Jun 14:59, Uros Bizjak wrote:
 On Tue, Jun 10, 2014 at 3:45 PM, Dominique Dhumieres domi...@lps.ens.fr 
 wrote:
  This patch fixes PR61446. ...
 
  Confirmed, it also allows to bootstrap Core* targets.
  Could it be reviewed and committed ASAP?

  2014-06-09  Ilya Enkovich  ilya.enkov...@intel.com
 
  PR 61446
  * ree.c (find_and_remove_re): Narrow mode for register copy
  if required.

  Please also add the testcase from the PR.

 (I am not RTL reviewer, so I can't approve the patch).

 Uros.

 Hi,

 This one is the same but with the testcase added.

 Bootstrapped and tested on linux-x86_64.

 Thanks,
 Ilya
 --
 gcc/

 2014-06-11  Ilya Enkovich  ilya.enkov...@intel.com

 PR 61446
 * ree.c (find_and_remove_re): Narrow mode for register copy
 if required.

 gcc/testsuite/

 2014-06-11  Ilya Enkovich  ilya.enkov...@intel.com

  * gcc.target/i386/pr61446.c : New.


 diff --git a/gcc/ree.c b/gcc/ree.c
 index ade413e..6d34764 100644
 --- a/gcc/ree.c
 +++ b/gcc/ree.c
 @@ -1088,14 +1088,24 @@ find_and_remove_re (void)
/* Use the mode of the destination of the defining insn
  for the mode of the copy.  This is necessary if the
  defining insn was used to eliminate a second extension
 -that was wider than the first.  */
 +that was wider than the first.  Truncate mode if it is
 +too wide for destination reg.  */
rtx sub_rtx = *get_sub_rtx (def_insn);
rtx pat = PATTERN (curr_insn);
 -  rtx new_dst = gen_rtx_REG (GET_MODE (SET_DEST (sub_rtx)),
 -REGNO (XEXP (SET_SRC (pat), 0)));
 -  rtx new_src = gen_rtx_REG (GET_MODE (SET_DEST (sub_rtx)),
 -REGNO (SET_DEST (pat)));
 -  rtx set = gen_rtx_SET (VOIDmode, new_dst, new_src);
 +  unsigned int regno = REGNO (XEXP (SET_SRC (pat), 0));
 +  enum machine_mode mode = GET_MODE (SET_DEST (sub_rtx));
 +  rtx new_dst, new_src, set;
 +
 +  if (HARD_REGNO_NREGS (regno, mode) != 1)
 +   {
 + mode = GET_CLASS_NARROWEST_MODE (GET_MODE_CLASS (mode));
 + while (HARD_REGNO_NREGS (regno, GET_MODE_WIDER_MODE (mode)) == 1)
 +   mode = GET_MODE_WIDER_MODE (mode);
 +   }
 +
 +  new_dst = gen_rtx_REG (mode, REGNO (XEXP (SET_SRC (pat), 0)));
 +  new_src = gen_rtx_REG (mode, REGNO (SET_DEST (pat)));
 +  set = gen_rtx_SET (VOIDmode, new_dst, new_src);
emit_insn_after (set, def_insn);
  }

 diff --git a/gcc/testsuite/gcc.target/i386/pr61446.c 
 b/gcc/testsuite/gcc.target/i386/pr61446.c
 new file mode 100644
 index 000..8537cdb
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/i386/pr61446.c
 @@ -0,0 +1,14 @@
 +/* PR rtl-optimization/61446 */
 +
 +/* { dg-do compile } */
 +/* { dg-options "-O2 -m32 -march=corei7" } */

This should read:

/* { dg-do compile { target { ia32 } } } */
/* { dg-options "-O2 -march=corei7 -mfpmath=387" } */

The x86 part is OK with this change.

Uros.
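
The narrowing loop in the quoted ree.c hunk can be modelled outside GCC. The following is only a toy sketch: the mode table, toy_nregs and toy_narrow_mode are hypothetical stand-ins for GCC's machine modes, HARD_REGNO_NREGS and the GET_CLASS_NARROWEST_MODE / GET_MODE_WIDER_MODE walk, and the bounds check on the table is an addition that the quoted hunk does not contain.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a class of machine modes, listed from
   narrowest to widest, with their sizes in bytes.  */
static const unsigned mode_bytes[] = { 1, 2, 4, 8, 16 };
#define N_MODES (sizeof mode_bytes / sizeof mode_bytes[0])

/* Toy version of HARD_REGNO_NREGS: number of registers of REG_BYTES
   bytes needed to hold a value of mode MODE.  */
static unsigned
toy_nregs (unsigned reg_bytes, size_t mode)
{
  return (mode_bytes[mode] + reg_bytes - 1) / reg_bytes;
}

/* If MODE already fits in one register, keep it.  Otherwise restart
   from the narrowest mode and widen while the next wider mode still
   fits in a single register.  */
static size_t
toy_narrow_mode (unsigned reg_bytes, size_t mode)
{
  if (toy_nregs (reg_bytes, mode) == 1)
    return mode;

  mode = 0;
  while (mode + 1 < N_MODES && toy_nregs (reg_bytes, mode + 1) == 1)
    mode++;
  return mode;
}
```

With 4-byte registers, an 8-byte mode (index 3) is narrowed to the 4-byte mode (index 2), so the copy touches exactly one register.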


Re: [PATCH] Fix PR61335

2014-06-17 Thread Uros Bizjak
On Fri, Jun 6, 2014 at 10:07 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Fri, Jun 6, 2014 at 9:47 AM, Uros Bizjak ubiz...@gmail.com wrote:

 2014-05-28  Richard Biener  rguent...@suse.de

 PR tree-optimization/61335
 * tree-vrp.c (vrp_visit_phi_node): If the compare of old and
 new range fails, drop to varying.

 * gfortran.dg/pr61335.f90: New testcase.

 This testcase triggers SIGFPE on alpha due to the use of denormal
 operand. Maybe uninitialized value is used in line 48?

 SIGFPE also triggers at the same place on x86_64 with unmasked FPE
 exceptions (compile with -O0).

Attached patch initializes problematic array to zero instead of
uninitialized value.

2014-06-17  Uros Bizjak  ubiz...@gmail.com

* gfortran.dg/pr61335.f90 (cp_unit_create): Initialize
unit_id and kind_id to zero.

Tested on alphaev68-linux-gnu and x86_64-linux-gnu.

OK for mainline?

Uros.

Index: gfortran.dg/pr61335.f90
===
--- gfortran.dg/pr61335.f90 (revision 211723)
+++ gfortran.dg/pr61335.f90 (working copy)
@@ -45,8 +45,8 @@
 LOGICAL  :: failure

 failure=.FALSE.
-unit_id=cp_units_none
-kind_id=cp_ukind_none
+unit_id=0
+kind_id=0
 power=0
 i_low=1
 i_high=1


Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.

2014-06-17 Thread Uros Bizjak
On Tue, Jun 17, 2014 at 2:33 PM, Evgeny Stupachenko evstu...@gmail.com wrote:

 Are i386 changes ok?
 Patches with corresponding changes and new tests are attached.

Please remove all target selectors from dg-options and dg-final
testcase directives, they are not needed inside gcc.dg/i386 directory.

The patch is OK with this change.

Thanks,
Uros.


[PATCH, i386]: Fix recently added sibcall insns and peephole2 patterns

2014-06-18 Thread Uros Bizjak
Hello!

Attached patch fixes recently added sibcall insns and their
corresponding peephole2 patterns:

- There is no need for new memory_nox32_operand. A generic
memory_operand can be used, since new insns and peephole2 patterns
should be disabled for TARGET_X32 entirely.
- Adds missing m constraint in insn patterns.
- Macroizes peephole2 patterns
- Adds check that eliminated register is really dead after the call
(maybe an overkill, but some hard-to-debug problems surfaced due to
missing liveness checks in the past)
- Fixes call RTXes in sibcall_pop related patterns (and fixes two
newly introduced warnings in i386.md)

2014-06-18  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.md (*sibcall_memory): Rename from *sibcall_intern.
Do not use unspec as call operand.  Use memory_operand instead of
memory_nox32_operand and add m operand constraint.  Disable
pattern for TARGET_X32.
(*sibcall_pop_memory): Ditto.
(*sibcall_value_memory): Ditto.
(*sibcall_value_pop_memory): Ditto.
(sibcall peepholes): Merge SImode and DImode patterns using
W mode iterator.  Use memory_operand instead of memory_nox32_operand.
Disable pattern for TARGET_X32.  Check if eliminated register is
really dead after call insn.  Generate call RTX without unspec operand.
(sibcall_value peepholes): Ditto.
(sibcall_pop peepholes): Fix call insn RTXes.  Use memory_operand
instead of memory_nox32_operand.  Check if eliminated register is
really dead after call insn. Generate call RTX without unspec operand.
(sibcall_value_pop peepholes): Ditto.
* config/i386/predicates.md (memory_nox32_operand): Remove predicate.

The patch was bootstrapped and regression tested on
x86_64-pc-linux-gnu {,-m32} and was committed to mainline SVN.

Uros.
Index: i386.md
===
--- i386.md (revision 211725)
+++ i386.md (working copy)
@@ -11354,53 +11354,38 @@
   * return ix86_output_call_insn (insn, operands[0]);
   [(set_attr type call)])
 
-(define_insn *sibcall_intern
-  [(call (unspec [(mem:QI (match_operand:W 0 memory_nox32_operand))]
-  UNSPEC_PEEPSIB)
-(match_operand 1))]
-  
+(define_insn *sibcall_memory
+  [(call (mem:QI (match_operand:W 0 memory_operand m))
+(match_operand 1))
+   (unspec [(const_int 0)] UNSPEC_PEEPSIB)]
+  !TARGET_X32
   * return ix86_output_call_insn (insn, operands[0]);
   [(set_attr type call)])
 
 (define_peephole2
-  [(set (match_operand:DI 0 register_operand)
-(match_operand:DI 1 memory_nox32_operand))
+  [(set (match_operand:W 0 register_operand)
+   (match_operand:W 1 memory_operand))
(call (mem:QI (match_dup 0))
  (match_operand 3))]
-  TARGET_64BIT && SIBLING_CALL_P (peep2_next_insn (1))
-  [(call (unspec [(mem:QI (match_dup 1))] UNSPEC_PEEPSIB)
- (match_dup 3))])
+  !TARGET_X32 && SIBLING_CALL_P (peep2_next_insn (1))
+   && peep2_reg_dead_p (2, operands[0])
+  [(parallel [(call (mem:QI (match_dup 1))
+   (match_dup 3))
+ (unspec [(const_int 0)] UNSPEC_PEEPSIB)])])
 
 (define_peephole2
-  [(set (match_operand:DI 0 register_operand)
-(match_operand:DI 1 memory_nox32_operand))
+  [(set (match_operand:W 0 register_operand)
+   (match_operand:W 1 memory_operand))
(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)
(call (mem:QI (match_dup 0))
  (match_operand 3))]
-  TARGET_64BIT && SIBLING_CALL_P (peep2_next_insn (2))
+  !TARGET_X32 && SIBLING_CALL_P (peep2_next_insn (2))
+   && peep2_reg_dead_p (3, operands[0])
   [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)
-   (call (unspec [(mem:QI (match_dup 1))] UNSPEC_PEEPSIB)
- (match_dup 3))])
+   (parallel [(call (mem:QI (match_dup 1))
+   (match_dup 3))
+ (unspec [(const_int 0)] UNSPEC_PEEPSIB)])])
 
-(define_peephole2
-  [(set (match_operand:SI 0 register_operand)
-(match_operand:SI 1 memory_nox32_operand))
-   (call (mem:QI (match_dup 0))
- (match_operand 3))]
-  !TARGET_64BIT && SIBLING_CALL_P (peep2_next_insn (1))
-  [(call (unspec [(mem:QI (match_dup 1))] UNSPEC_PEEPSIB)
- (match_dup 3))])
-
-(define_peephole2
-  [(set (match_operand:SI 0 register_operand)
-(match_operand:SI 1 memory_nox32_operand))
-   (unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)
-   (call (mem:QI (match_dup 0))
- (match_operand 3))]
-  !TARGET_64BIT && SIBLING_CALL_P (peep2_next_insn (2))
-  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)
-   (call (unspec [(mem:QI (match_dup 1))] UNSPEC_PEEPSIB) (match_dup 3))])
-
 (define_expand call_pop
   [(parallel [(call (match_operand:QI 0)
(match_operand:SI 1))
@@ -11434,42 +11419,52 @@
   * return ix86_output_call_insn (insn, operands[0]);
   [(set_attr type call)])
 
-(define_insn *sibcall_pop_intern
-  [(call (unspec [(mem:QI (match_operand:SI 0 memory_nox32_operand))]
-   UNSPEC_PEEPSIB)
+(define_insn *sibcall_pop_memory

Re: [PATCH, i386]: Fix recently added sibcall insns and peephole2 patterns

2014-06-18 Thread Uros Bizjak
On Wed, Jun 18, 2014 at 2:24 PM, Kai Tietz ktiet...@googlemail.com wrote:

 The following change in predicates.md seems to be a bit premature.
 There is still the point about Darwin's PIC issue for unspec-gotpcrel.

 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61387

   return ANY_QI_REG_P (op);
 })

 +;; Return true if OP is a memory operands that can be used in sibcalls.
 (define_predicate sibcall_memory_operand
 -  (match_operand 0 memory_operand)
 -{
 -  return CONSTANT_P (XEXP (op, 0));
 -})
 +  (and (match_operand 0 memory_operand)
 +   (match_test CONSTANT_P (XEXP (op, 0)

 as we might pessimize for Darwin UNSPEC_GOTPCREL at that point.
  In general there is still the question why this issue just happens
 for Darwin, but not for linux.  For linux that gotpcrel-code path
 seems not to be hit at all (at least that is what Ians told).

Oh, this part doesn't change any functionality at all. The predicate
is just written in a different way.

Uros.


Re: [Patch, i386] Separate Intel processor with expanded ISA

2014-01-27 Thread Uros Bizjak
On Mon, Jan 27, 2014 at 10:15 AM, Uros Bizjak ubiz...@gmail.com wrote:

 +2013-12-29  Allan Sandfeld Jensen  sandf...@kde.org

 Missing space in ChangeLog entry.

 + * config/i386/i386.c (get_builtin_code_for_version): Separate
 + Westmere from Nehalem, Ivy Bridge from Sandy Bridge and
 + Broadwell from Haswell.

 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -31298,18 +31298,27 @@ get_builtin_code_for_version (tree decl,
 tree *predicate_list)
priority = P_PROC_SSSE3;
break;
  case PROCESSOR_NEHALEM:
 -  /* We translate arch=corei7 and arch=nehelam to
 - corei7 so that it will be mapped to M_INTEL_COREI7
 - as cpu type to cover all M_INTEL_COREI7_XXXs.  */
 -  arg_str = corei7;
 +  if (new_target->x_ix86_isa_flags & OPTION_MASK_ISA_AES)
 + arg_str = westmere;
 +  else
 + /* We translate arch=corei7 and arch=nehelam to

 Trivial typo above: arch=nehalem.

 OK for mainline with these changes.

I have committed slightly reformatted patches with the following ChangeLog
to mainline SVN.

2014-01-27  Allan Sandfeld Jensen  sandf...@kde.org

* config/i386/i386.c (get_builtin_code_for_version): Separate
Westmere from Nehalem, Ivy Bridge from Sandy Bridge and
Broadwell from Haswell.

testsuite/ChangeLog:

2014-01-27  Allan Sandfeld Jensen  sandf...@kde.org

* g++.dg/ext/mv16.C: New tests.

Uros.


Re: PATCH: PR target/59672: Add -m16 support for x86

2014-01-28 Thread Uros Bizjak
On Mon, Jan 27, 2014 at 8:44 PM, H.J. Lu hongjiu...@intel.com wrote:

 The .code16gcc directive was added to binutils back in 1999:

 ---
'.code16gcc' provides experimental support for generating 16-bit code
 from gcc, and differs from '.code16' in that 'call', 'ret', 'enter',
 'leave', 'push', 'pop', 'pusha', 'popa', 'pushf', and 'popf'
 instructions default to 32-bit size.  This is so that the stack pointer
 is manipulated in the same way over function calls, allowing access to
 function parameters at the same stack offsets as in 32-bit mode.
 '.code16gcc' also automatically adds address size prefixes where
 necessary to use the 32-bit addressing modes that gcc generates.
 ---

 It encodes 32-bit assembly instructions generated by GCC in 16-bit format
  so that GCC can be used to generate 16-bit instructions.  To do that, the
  .code16gcc directive may be placed at the very beginning of the assembly
  code.  This patch adds -m16 to x86 backend by:

 1. Add -m16 and make it mutually exclusive with -m32, -m64 and -mx32.
 2. Treat -m16 like -m32 so that --32 is passed to assembler.
 3. Output .code16gcc at the very beginning of the assembly code.
 4. Turn off 64-bit ISA when -m16 is used.

 Tested on Linux/x86 and Linux/x86-64.  OK for trunk?

 Thanks.

 H.J.
 ---
 PR target/59672
 * config/i386/gnu-user64.h (SPEC_32): Add m16| to m32.
 (SPEC_X32): Likewise.
 (SPEC_64): Likewise.
 * config/i386/i386.c (ix86_option_override_internal): Turn off
 OPTION_MASK_ISA_64BIT, OPTION_MASK_ABI_X32 and OPTION_MASK_ABI_64
 for TARGET_16BIT.
 (x86_file_start): Output .code16gcc for TARGET_16BIT.
 * config/i386/i386.h (TARGET_16BIT): New macro.
 (TARGET_16BIT_P): Likewise.
 * config/i386/i386.opt: Add m16.
 * doc/invoke.texi: Document -m16.

OK for mainline, needs OK from RMs for a backport.

Please also add the entry to Changes.html, this is a user-visible change.

Thanks,
Uros.


Re: PATCH: PR target/59672: Add -m16 support for x86

2014-01-28 Thread Uros Bizjak
On Tue, Jan 28, 2014 at 5:35 PM, H.J. Lu hjl.to...@gmail.com wrote:

 The .code16gcc directive was added to binutils back in 1999:

 scan-asm testcase doesn't do anything useful.  The only
 difference in assembly code between -m16 and -m32 is the
 .code16gcc directive.  All magic is done in assembler.

The test would just pass -m16 in dg-options and scan for the above
directive. It is a simple test that -m16 works as expected.

Uros.
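
A scan-asm test of the kind described above could look roughly like the sketch below. It is a hypothetical testcase, not the one that was committed: the function name add_one is made up, and the directive spelling follows the usual gcc.target/i386 conventions, which is an assumption here.

```c
/* Hypothetical testcase sketch: compile with -m16 and check that the
   assembler output contains the .code16gcc directive.  */
/* { dg-do compile } */
/* { dg-options "-m16" } */

/* Any small function will do; the test only inspects the directive
   emitted at the start of the assembly file.  */
int
add_one (int i)
{
  return i + 1;
}

/* { dg-final { scan-assembler "\\.code16gcc" } } */
```

Since the only visible difference between -m16 and -m32 output is that directive, scanning for it is enough to show that the option reaches x86_file_start.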


Re: PATCH: PR target/59672: Add -m16 support for x86

2014-01-28 Thread Uros Bizjak
On Tue, Jan 28, 2014 at 5:01 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Mon, Jan 27, 2014 at 8:44 PM, H.J. Lu hongjiu...@intel.com wrote:

 The .code16gcc directive was added to binutils back in 1999:

 ---
'.code16gcc' provides experimental support for generating 16-bit code
 from gcc, and differs from '.code16' in that 'call', 'ret', 'enter',
 'leave', 'push', 'pop', 'pusha', 'popa', 'pushf', and 'popf'
 instructions default to 32-bit size.  This is so that the stack pointer
 is manipulated in the same way over function calls, allowing access to
 function parameters at the same stack offsets as in 32-bit mode.
 '.code16gcc' also automatically adds address size prefixes where
 necessary to use the 32-bit addressing modes that gcc generates.
 ---

 It encodes 32-bit assembly instructions generated by GCC in 16-bit format
  so that GCC can be used to generate 16-bit instructions.  To do that, the
  .code16gcc directive may be placed at the very beginning of the assembly
  code.  This patch adds -m16 to x86 backend by:

 1. Add -m16 and make it mutually exclusive with -m32, -m64 and -mx32.
 2. Treat -m16 like -m32 so that --32 is passed to assembler.
 3. Output .code16gcc at the very beginning of the assembly code.
 4. Turn off 64-bit ISA when -m16 is used.

 Tested on Linux/x86 and Linux/x86-64.  OK for trunk?

 Thanks.

 H.J.
 ---
 PR target/59672
 * config/i386/gnu-user64.h (SPEC_32): Add m16| to m32.
 (SPEC_X32): Likewise.
 (SPEC_64): Likewise.
 * config/i386/i386.c (ix86_option_override_internal): Turn off
 OPTION_MASK_ISA_64BIT, OPTION_MASK_ABI_X32 and OPTION_MASK_ABI_64
 for TARGET_16BIT.
 (x86_file_start): Output .code16gcc for TARGET_16BIT.
 * config/i386/i386.h (TARGET_16BIT): New macro.
 (TARGET_16BIT_P): Likewise.
 * config/i386/i386.opt: Add m16.
 * doc/invoke.texi: Document -m16.

 OK for mainline, needs OK from RMs for a backport.

 Please also add the entry to Changes.html, this is a user-visible change.

Oh, a short scan-asm testcase would be nice, too.

Thanks,
Uros.


Re: [PATCH][AVX512] Swap Yk and k constraints.

2014-01-30 Thread Uros Bizjak
On Thu, Jan 30, 2014 at 11:54 AM, Ilya Tocar tocarip.in...@gmail.com wrote:

 Turns out that for Icc meaning of Yk and k constraints
 (exposed through inline asm) is opposite to current GCC implementation.
 As Icc with such behavior was already released and GCC wasn't, I propose
 to swap meaning of Yk and k constraints. Changes are pretty mechanical.
 Bootstraps/passes make check/SPEC2006. Ok for trunk?

 Here is ChangeLog:

 2014-01-30  Ilya Tocar  ilya.to...@intel.com

 * config/i386/constraints.md (Yk): Swap meaning with k.
 * config/i386/i386.md (movhi_internal): Change Yk to k.
 (movqi_internal): Ditto.
 (*klogicmode): Ditto.
 (*andhi_1): Ditto.
 (*andqi_1): Ditto.
 (kandnmode): Ditto.
 (*codehi_1): Ditto.
 (*codeqi_1): Ditto.
 (kxnormode): Ditto.
 (kortestzhi): Ditto.
 (kortestchi): Ditto.
 (kunpckhi): Ditto.
 (*one_cmplhi2_1): Ditto.
 (*one_cmplqi2_1): Ditto.
 * config/i386/sse.md (): Change k to Yk.
 (avx512f_loadmode_mask): Ditto.
 (avx512f_blendmmode): Ditto.
 (avx512f_storemode_mask): Ditto.
 (avx512f_storeussemodesuffix512_mask): Ditto.
 (avx512f_storedqumode_mask): Ditto.
 (avx512f_cmpmode3mask_scalar_merge_nameround_saeonly_name): 
 Ditto.
 (avx512f_ucmpmode3mask_scalar_merge_name): Ditto.
 (avx512f_vmcmpmode3round_saeonly_name): Ditto.
 (avx512f_vmcmpmode3_maskround_saeonly_name): Ditto.
 (avx512f_maskcmpmode3): Ditto.
 (avx512f_fmadd_mode_maskround_name): Ditto.
 (avx512f_fmadd_mode_mask3round_name): Ditto.
 (avx512f_fmsub_mode_maskround_name): Ditto.
 (avx512f_fmsub_mode_mask3round_name): Ditto.
 (avx512f_fnmadd_mode_maskround_name): Ditto.
 (avx512f_fnmadd_mode_mask3round_name): Ditto.
 (avx512f_fnmsub_mode_maskround_name): Ditto.
 (avx512f_fnmsub_mode_mask3round_name): Ditto.
 (avx512f_fmaddsub_mode_maskround_name): Ditto.
 (avx512f_fmaddsub_mode_mask3round_name): Ditto.
 (avx512f_fmsubadd_mode_maskround_name): Ditto.
 (avx512f_fmsubadd_mode_mask3round_name): Ditto.
 (avx512f_vextractshuffletype32x4_1_maskm): Ditto.
 (vec_extract_lo_mode_maskm): Ditto.
 (vec_extract_hi_mode_maskm): Ditto.
 (avx512f_vternlogmode_mask): Ditto.
 (avx512f_fixupimmmode_maskround_saeonly_name): Ditto.
 (avx512f_sfixupimmmode_maskround_saeonly_name): Ditto.
 (avx512f_codepmov_src_lowermode2_mask): Ditto.
 (avx512f_codev8div16qi2_mask): Ditto.
 (avx512f_codev8div16qi2_mask_store): Ditto.
 (avx512f_eqmode3mask_scalar_merge_name_1): Ditto.
 (avx512f_gtmode3mask_scalar_merge_name): Ditto.
 (avx512f_testmmode3mask_scalar_merge_name): Ditto.
 (avx512f_testnmmode3mask_scalar_merge_name): Ditto.
 (*avx512pf_gatherpfmodesf_mask): Ditto.
 (*avx512pf_gatherpfmodedf_mask): Ditto.
 (*avx512pf_scatterpfmodesf_mask): Ditto.
 (*avx512pf_scatterpfmodedf_mask): Ditto.
 (avx512cd_maskb_vec_dupv8di): Ditto.
 (avx512cd_maskw_vec_dupv16si): Ditto.
 (avx512f_vpermi2varmode3_maskz): Ditto.
 (avx512f_vpermi2varmode3_mask): Ditto.
 (avx512f_vpermi2varmode3_mask): Ditto.
 (avx512f_vpermt2varmode3_maskz): Ditto.
 (*avx512f_gathersimode): Ditto.
 (*avx512f_gathersimode_2): Ditto.
 (*avx512f_gatherdimode): Ditto.
 (*avx512f_gatherdimode_2): Ditto.
 (*avx512f_scattersimode): Ditto.
 (*avx512f_scatterdimode): Ditto.
 (avx512f_compressmode_mask): Ditto.
 (avx512f_compressstoremode_mask): Ditto.
 (avx512f_expandmode_mask): Ditto.
 * config/i386/subst.md (mask): Change k to Yk.
 (mask_scalar_merge): Ditto.
 (sd): Ditto.

 And for tests:

 2014-01-30  Ilya Tocar  ilya.to...@intel.com

 * gcc.target/i386/avx512f-inline-asm.c: Swap Yk and k.
 * gcc.target/i386/avx512f-kmovw-1.c: Also allow k0.

OK.

Thanks,
Uros.


Re: [PATCH][AVX512] Fix rounding operand.

2014-01-30 Thread Uros Bizjak
On Thu, Jan 30, 2014 at 1:50 PM, Ilya Tocar tocarip.in...@gmail.com wrote:

 I've found some problems with embedded rounding implementation.
 First constants are already defined in smmintrin.h, so we shouldn't
 redefine them.
 Second problem is bigger: currently rounding argument to intrinsic
 is one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF,
 _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_ZERO, _MM_FROUND_CUR_DIRECTION,
 _MM_FROUND_NO_EXC, but actually it should be
 _MM_FROUND_NO_EXC or _MM_FROUND_CUR_DIRECTION for SAE and
 _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC for rounding. That's
 how Icc does it, because while currently rounding implies sae it may be
 not true in future. I've split rounding and sae in print_operand into
 'R' and 'r' because we can't distinguish between 8 and 8 | 0. I've
 also run sed on tests to correct rounding arguments. While patch is huge
 most of it is result of sed. It bootstraps, passes make check/SPEC2006.
 Ok for trunk?

 Here is ChangeLog.

 2014-01-30  Ilya Tocar  ilya.to...@intel.com

 * config/i386/avx512fintrin.h (_MM_FROUND_TO_NEAREST_INT),
 (_MM_FROUND_TO_NEG_INF), (_MM_FROUND_TO_POS_INF),
 (_MM_FROUND_TO_ZERO), (_MM_FROUND_CUR_DIRECTION): Are already defined
 in smmintrin.h, remove them.
 (_MM_FROUND_NO_EXC): Same as above, but also wrong value.
 * config/i386/i386.c (ix86_print_operand): Split sae and rounding.
 * config/i386/i386.md (ROUND_SAE): Fix value.
 * config/i386/predicates.md (const_4_or_8_to_11_operand): New.
 (const48_operand): New.
 * config/i386/subst.md (round), (round_expand): Use
 const_4_or_8_to_11_operand.
 (round_saeonly), (round_saeonly_expand): Use const48_operand.

 2014-01-30  Ilya Tocar  ilya.to...@intel.com

 * gcc.target/i386/avx-1.c: Use correct rounding values.
 * gcc.target/i386/avx512f-vaddpd-1.c: Ditto.
 * gcc.target/i386/avx512f-vaddps-1.c: Ditto.
 * gcc.target/i386/avx512f-vaddsd-1.c: Ditto.
 * gcc.target/i386/avx512f-vaddss-1.c: Ditto.
 * gcc.target/i386/avx512f-vcvtdq2ps-1.c: Ditto.
 * gcc.target/i386/avx512f-vcvtpd2dq-1.c: Ditto.
 * gcc.target/i386/avx512f-vcvtpd2ps-1.c: Ditto.
 * gcc.target/i386/avx512f-vcvtpd2udq-1.c: Ditto.
 * gcc.target/i386/avx512f-vcvtps2dq-1.c: Ditto.
 * gcc.target/i386/avx512f-vcvtps2udq-1.c: Ditto.
 * gcc.target/i386/avx512f-vcvtsd2si-1.c: Ditto.
 * gcc.target/i386/avx512f-vcvtsd2si64-1.c: Ditto.
 * gcc.target/i386/avx512f-vcvtsd2ss-1.c: Ditto.
 * gcc.target/i386/avx512f-vcvtsd2usi-1.c: Ditto.
 * gcc.target/i386/avx512f-vcvtsd2usi64-1.c: Ditto.
 * gcc.target/i386/avx512f-vcvtsi2sd64-1.c: Ditto.
 * gcc.target/i386/avx512f-vcvtsi2ss-1.c: Ditto.
 * gcc.target/i386/avx512f-vcvtsi2ss64-1.c: Ditto.
 * gcc.target/i386/avx512f-vcvtss2si-1.c: Ditto.
 * gcc.target/i386/avx512f-vcvtss2si64-1.c: Ditto.
 * gcc.target/i386/avx512f-vcvtss2usi-1.c: Ditto.
 * gcc.target/i386/avx512f-vcvtss2usi64-1.c: Ditto.
 * gcc.target/i386/avx512f-vcvtudq2ps-1.c: Ditto.
 * gcc.target/i386/avx512f-vcvtusi2sd64-1.c: Ditto.
 * gcc.target/i386/avx512f-vcvtusi2ss-1.c: Ditto.
 * gcc.target/i386/avx512f-vcvtusi2ss64-1.c: Ditto.
 * gcc.target/i386/avx512f-vdivpd-1.c: Ditto.
 * gcc.target/i386/avx512f-vdivps-1.c: Ditto.
 * gcc.target/i386/avx512f-vdivsd-1.c: Ditto.
 * gcc.target/i386/avx512f-vdivss-1.c: Ditto.
 * gcc.target/i386/avx512f-vfmaddXXXpd-1.c: Ditto.
 * gcc.target/i386/avx512f-vfmaddXXXps-1.c: Ditto.
 * gcc.target/i386/avx512f-vfmaddXXXsd-1.c: Ditto.
 * gcc.target/i386/avx512f-vfmaddXXXss-1.c: Ditto.
 * gcc.target/i386/avx512f-vfmaddsubXXXpd-1.c: Ditto.
 * gcc.target/i386/avx512f-vfmaddsubXXXps-1.c: Ditto.
 * gcc.target/i386/avx512f-vfmsubXXXpd-1.c: Ditto.
 * gcc.target/i386/avx512f-vfmsubXXXps-1.c: Ditto.
 * gcc.target/i386/avx512f-vfmsubXXXsd-1.c: Ditto.
 * gcc.target/i386/avx512f-vfmsubXXXss-1.c: Ditto.
 * gcc.target/i386/avx512f-vfmsubaddXXXpd-1.c: Ditto.
 * gcc.target/i386/avx512f-vfmsubaddXXXps-1.c: Ditto.
 * gcc.target/i386/avx512f-vfnmaddXXXpd-1.c: Ditto.
 * gcc.target/i386/avx512f-vfnmaddXXXps-1.c: Ditto.
 * gcc.target/i386/avx512f-vfnmaddXXXsd-1.c: Ditto.
 * gcc.target/i386/avx512f-vfnmaddXXXss-1.c: Ditto.
 * gcc.target/i386/avx512f-vfnmsubXXXpd-1.c: Ditto.
 * gcc.target/i386/avx512f-vfnmsubXXXps-1.c: Ditto.
 * gcc.target/i386/avx512f-vfnmsubXXXsd-1.c: Ditto.
 * gcc.target/i386/avx512f-vfnmsubXXXss-1.c: Ditto.
 * gcc.target/i386/avx512f-vmulpd-1.c: Ditto.
 * gcc.target/i386/avx512f-vmulps-1.c: Ditto.
 * gcc.target/i386/avx512f-vmulsd-1.c: Ditto.
 * gcc.target/i386/avx512f-vmulss-1.c: 
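
The split between rounding and SAE-only arguments described in the message above can be sketched in plain C. The constant values mirror those in smmintrin.h, but the TOY_ names and the two checker functions are hypothetical; reading const_4_or_8_to_11_operand as "4 or 8..11" and const48_operand as "4 or 8" is an assumption based on the predicate names in the ChangeLog.

```c
/* Values as in smmintrin.h; local names are used so the sketch builds
   without any x86 headers.  */
enum
{
  TOY_FROUND_TO_NEAREST_INT = 0x00,
  TOY_FROUND_TO_NEG_INF = 0x01,
  TOY_FROUND_TO_POS_INF = 0x02,
  TOY_FROUND_TO_ZERO = 0x03,
  TOY_FROUND_CUR_DIRECTION = 0x04,
  TOY_FROUND_NO_EXC = 0x08
};

/* SAE-only argument: either the current direction (4) or NO_EXC (8).  */
static int
toy_valid_sae (int v)
{
  return v == 4 || v == 8;
}

/* Rounding argument: the current direction (4), or one of the four
   rounding modes combined with NO_EXC (8 to 11).  */
static int
toy_valid_round (int v)
{
  return v == 4 || (v >= 8 && v <= 11);
}
```

Note that TOY_FROUND_TO_NEAREST_INT | TOY_FROUND_NO_EXC has the same value as TOY_FROUND_NO_EXC alone, which is the "8 and 8 | 0" ambiguity mentioned in the message.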

Re: [PATCH] Two small i?86 *intrin* warning fixes

2014-01-30 Thread Uros Bizjak
On Thu, Jan 30, 2014 at 6:52 PM, Jakub Jelinek ja...@redhat.com wrote:

 While looking at some other PR, I've stripped line notes and got
 pr59947.ii.bak:26330:74: error: ISO C++ forbids declaration of 
 '_mm512_mask_cvtusepi64_storeu_epi32' with no type [-fpermissive]
  _mm512_mask_cvtusepi64_storeu_epi32 (void* __P, __mmask8 __M, __m512i __A)
   ^
 pr59947.ii.bak: In function 'float _cvtsh_ss(short unsigned int)':
 pr59947.ii.bak:30674:65: warning: narrowing conversion of '__S' from 'short 
 unsigned int' to 'short int' inside { } [-Wnarrowing]
__v8hi __H = __extension__ (__v8hi){ __S, 0, 0, 0, 0, 0, 0, 0 };
  ^
 warnings that would normally only show up with -Wsystem-headers.
 Especially the second one looks like one worth fixing.

 Ok for trunk?

 2014-01-30  Jakub Jelinek  ja...@redhat.com

 * config/i386/f16cintrin.h (_cvtsh_ss): Avoid -Wnarrowing
 warning.
 * config/i386/avx512fintrin.h (_mm512_mask_cvtusepi64_storeu_epi32):
 Add missing return type - void.

OK.

Should _cvtsh_ss fix be backported to other release branches?

Thanks,
Uros.


Re: [PATCH][testsuite] Avoid division by zero.

2014-01-30 Thread Uros Bizjak
On Thu, Jan 30, 2014 at 5:41 PM, Ilya Tocar tocarip.in...@gmail.com wrote:

 This patch removes possible division by zero.
 Make check passes. Ok for trunk?

 2014-01-30  Ilya Tocar  ilya.to...@intel.com

 * gcc.target/i386/m512-check.h: Use correct rounding values.

 ---
  gcc/testsuite/gcc.target/i386/m512-check.h | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

 diff --git a/gcc/testsuite/gcc.target/i386/m512-check.h 
 b/gcc/testsuite/gcc.target/i386/m512-check.h
 index 3209039..8441784 100644
 --- a/gcc/testsuite/gcc.target/i386/m512-check.h
 +++ b/gcc/testsuite/gcc.target/i386/m512-check.h
 @@ -58,7 +58,8 @@ check_rough_##UINON_TYPE (UINON_TYPE u, const VALUE_TYPE 
 *v,  \
 \
for (i = 0; i < ARRAY_SIZE (u.a); i++)   \
  {  \
 -  VALUE_TYPE rel_err = (u.a[i] - v[i]) / v[i]; \
 +  VALUE_TYPE rel_err;  \
 +  rel_err = v[i] != 0 ? (u.a[i] - v[i]) / v[i] : u.a[i];   \
if (((rel_err < 0) ? -rel_err : rel_err) > eps)  \
 {   \
   err++;\

We won't get zero from exponential function, so expecting zero result
is flawed anyway.

If we would like to introduce universal epsilon comparisons into the
testsuite, then please read [1]. Being overly pedantic, the definition
should be |(v[i] - u.a[i]) / v[i]|, as stated in [2].

[1] 
http://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
[2] http://en.wikipedia.org/wiki/Relative_error

Uros.


Re: [PATCH][testsuite] Avoid division by zero.

2014-01-31 Thread Uros Bizjak
On Fri, Jan 31, 2014 at 11:00 AM, Ilya Tocar tocarip.in...@gmail.com wrote:

  This patch removes possible division by zero.
  Make check passes. Ok for trunk?
 
  2014-01-30  Ilya Tocar  ilya.to...@intel.com
 
  * gcc.target/i386/m512-check.h: Use correct rounding values.
 
  ---
   gcc/testsuite/gcc.target/i386/m512-check.h | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)
 
  diff --git a/gcc/testsuite/gcc.target/i386/m512-check.h 
  b/gcc/testsuite/gcc.target/i386/m512-check.h
  index 3209039..8441784 100644
  --- a/gcc/testsuite/gcc.target/i386/m512-check.h
  +++ b/gcc/testsuite/gcc.target/i386/m512-check.h
  @@ -58,7 +58,8 @@ check_rough_##UINON_TYPE (UINON_TYPE u, const VALUE_TYPE 
  *v,  \
  \
 for (i = 0; i < ARRAY_SIZE (u.a); i++)   \
   {  \
  -  VALUE_TYPE rel_err = (u.a[i] - v[i]) / v[i]; \
  +  VALUE_TYPE rel_err;  \
  +  rel_err = v[i] != 0 ? (u.a[i] - v[i]) / v[i] : u.a[i];   \
 if (((rel_err < 0) ? -rel_err : rel_err) > eps)  \
  {   \
err++;\

 We won't get zero from exponential function, so expecting zero result
 is flawed anyway.

 If we would like to introduce universal epsilon comparisons into the
 testsuite, then please read [1]. Being overly pedantic, the definition
 should be |(v[i] - u.a[i]) / v[i]|, as stated in [2].

 [1] 
 http://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
 [2] http://en.wikipedia.org/wiki/Relative_error


 We get zero from testing zero-masking. Currently we produce 0/0 = NaN.
 Comparison with NaN is always false, so tests pass. But I think that
 this should be fixed to avoid division by zero. As for being more
 pedantic about comparison, I doubt that it's useful, when we use
 0.0001 as eps.

In this case, please add simple check for zero, with the above
comment. We don't test exp function, but masking.

Uros.
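
The zero check requested above can be sketched as a plain function. This is a minimal illustration, not the m512-check.h macro: toy_check_rough and its parameter names are hypothetical, and treating a non-zero result against an expected zero as a mismatch is an assumption.

```c
#include <stddef.h>

/* Count the elements of ACTUAL that differ from EXPECTED by more than
   EPS in relative terms, without ever dividing by zero.  */
static int
toy_check_rough (const double *actual, const double *expected,
                 size_t n, double eps)
{
  int err = 0;

  for (size_t i = 0; i < n; i++)
    {
      double rel_err;

      /* Zero-masked elements are expected to be exactly zero, so they
         are compared directly instead of computing 0/0.  */
      if (expected[i] == 0.0)
        {
          if (actual[i] != 0.0)
            err++;
          continue;
        }

      /* Relative error |(v - u) / v| with v the expected value.  */
      rel_err = (expected[i] - actual[i]) / expected[i];
      if (rel_err < 0)
        rel_err = -rel_err;

      /* Written as a negated <= so that a NaN counts as a mismatch
         instead of passing silently.  */
      if (!(rel_err <= eps))
        err++;
    }

  return err;
}
```

The negated comparison addresses the point raised in the thread that a comparison with NaN is always false, which otherwise lets a 0/0 result pass the check.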


Re: [PATCH][testsuite] Avoid division by zero.

2014-01-31 Thread Uros Bizjak
On Fri, Jan 31, 2014 at 1:32 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
  We won't get zero from exponential function, so expecting zero result
  is flawed anyway.
 
  If we would like to introduce universal epsilon comparisons into the
  testsuite, then please read [1]. Being overly pedantic, the definition
  should be |(v[i] - u.a[i]) / v[i]|, as stated in [2].
 
  [1] 
  http://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
  [2] http://en.wikipedia.org/wiki/Relative_error
 
 
  We get zero from testing zero-masking. Currently we produce 0/0 = NaN.
  Comparison with NaN is always false, so tests pass. But I think that
  this should be fixed to avoid division by zero. As for being more
  pedantic about comparison, I doubt that it's useful, when we use
  0.0001 as eps.

 In this case, please add simple check for zero, with the above
 comment. We don't test exp function, but masking.


 Something like this?

Yes, this is OK, with a small comment fix.

  gcc/testsuite/gcc.target/i386/m512-check.h | 10 ++
  1 file changed, 10 insertions(+)

 diff --git a/gcc/testsuite/gcc.target/i386/m512-check.h 
 b/gcc/testsuite/gcc.target/i386/m512-check.h
 index 3209039..a96a103 100644
 --- a/gcc/testsuite/gcc.target/i386/m512-check.h
 +++ b/gcc/testsuite/gcc.target/i386/m512-check.h
 @@ -58,6 +58,16 @@ check_rough_##UINON_TYPE (UINON_TYPE u, const VALUE_TYPE 
 *v, \
 \
for (i = 0; i < ARRAY_SIZE (u.a); i++)   \
  {  \
 +  /* We will always have v[i] == 0 == u.a[i]  for some i,  \

We can have ...

Thanks,
Uros.

