Re: RFA: Use precision in get_mode_bounds()

2013-12-31 Thread nick clifton

Hi Volker,


apparently you added the wrong PR number to the patch. You probably
meant PR 59613. Would you mind fixing that in the ChangeLog?


Doh!  Sorry - fixed.

Cheers
  Nick



Re: [PATCH] Fix devirtualization ICE (PR tree-optimization/59622)

2013-12-31 Thread Jakub Jelinek
On Tue, Dec 31, 2013 at 08:39:02AM +0100, Richard Biener wrote:
 That said, fold_stmt callers have to be prepared to handle say a normal
 call becoming a noreturn call, consider say:
 
 struct A { virtual int foo (); };
 struct B : public A { int foo () __attribute__((noreturn)); };
 int B::foo () { __builtin_exit (0); }
 int bar ()
 {
   B b;
   B *p = &b;
   return p->foo ();
 }
 
 Is that a valid specialization though?

I think so; after all, don't we set the noreturn attribute automatically
even if it is missing when IPA can prove the function never returns?
If we fold_stmt after that, the above testcase even without an explicit
noreturn attribute would need cfg cleanup.

Perhaps gimple_fold_call should punt and not change fndecl if !inplace
if some call flags have changed that would require cfg cleanup, so that
at least fold_stmt_inplace callers don't have to deal with it, and make
sure fold_stmt callers schedule cleanup_cfg when fold_stmt returns true?
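
For concreteness, a rough sketch of the guard I have in mind (hand-written
pseudocode only, reusing the existing gimple_call_flags and
flags_from_decl_or_type helpers; new_decl stands for the devirtualized
target):

  /* In gimple_fold_call, before replacing the callee: punt when the new
     decl would change call flags whose update requires cfg cleanup.  */
  int old_flags = gimple_call_flags (stmt);
  int new_flags = flags_from_decl_or_type (new_decl);
  if ((old_flags ^ new_flags) & ECF_NORETURN)
    return false;  /* the caller would otherwise have to run cleanup_cfg */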

Jakub


[Patch, Fortran] PR 59023: [4.9 regression] ICE in gfc_search_interface with BIND(C)

2013-12-31 Thread Janus Weil
Hi all,

... and the reg-bashing continues: Here is a small patch to fix a
bind(c) problem. It essentially prevents 'resolve_global_procedure' from
being applied to bind(c) procedures.

Regtested on x86_64-unknown-linux-gnu. Ok for trunk?

Cheers,
Janus



2013-12-31  Janus Weil  ja...@gcc.gnu.org

PR fortran/59023
* resolve.c (resolve_global_procedure): Don't apply to c-binding
procedures.
(gfc_verify_binding_labels): Remove duplicate line.

2013-12-31  Janus Weil  ja...@gcc.gnu.org

PR fortran/59023
* gfortran.dg/bind_c_procs_2.f90: New.
Index: gcc/fortran/resolve.c
===
--- gcc/fortran/resolve.c   (revision 206252)
+++ gcc/fortran/resolve.c   (working copy)
@@ -2351,6 +2351,7 @@ resolve_global_procedure (gfc_symbol *sym, locus *
   if ((sym->attr.if_source == IFSRC_UNKNOWN
	|| sym->attr.if_source == IFSRC_IFBODY)
       && gsym->type != GSYM_UNKNOWN
+      && !gsym->binding_label
       && gsym->ns
       && gsym->ns->resolved != -1
       && gsym->ns->proc_name
@@ -10163,7 +10164,6 @@ gfc_verify_binding_labels (gfc_symbol *sym)
   gsym->where = sym->declared_at;
   gsym->sym_name = sym->name;
   gsym->binding_label = sym->binding_label;
-  gsym->binding_label = sym->binding_label;
   gsym->ns = sym->ns;
   gsym->mod_name = module;
   if (sym->attr.function)
! { dg-do compile }
!
! PR 59023: [4.9 regression] ICE in gfc_search_interface with BIND(C)
!
! Contributed by Francois-Xavier Coudert fxcoud...@gcc.gnu.org

  type t
integer hidden
  end type
  
contains

  subroutine bar
type(t) :: toto
interface
  integer function helper() bind(c)
  end function
end interface
toto = t(helper())
  end subroutine

end


Re: [PATCH i386 9/8] [AVX512] Add forgotten kmovw insn, built-in and test.

2013-12-31 Thread Ilya Tocar
 RA figured out that operation with general registers results in fewer
 moves (you already have x1 in a general reg). This is exactly the reason
 why I think unspecs are not needed. It is the job of the compiler to
 choose the most appropriate approach, and its behavior should be adjusted
 with appropriate cost functions.
 
 I guess that if you load x from memory, RA will allocate mask
 registers all the way to add insn.

I tried loading from memory and the result is the same. Without the unspec
this intrinsic is just return __A and is completely useless. As for RA
choosing the best approach, the big concern was generating klogic for
normal instructions, so the current implementation of masks is conservative
and RA chooses the gpr alternatives. So I think that a kmov intrinsic with
an unspec has its uses as a hint to the compiler. If you are against this
approach, here is a version without the unspec.

---
 gcc/config/i386/avx512fintrin.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index 0a43b1e..e0e74cf 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -14826,6 +14826,13 @@ _mm_maskz_fnmsub_ss (__mmask8 __U, __m128 __W, __m128 __A, __m128 __B)
  _MM_FROUND_CUR_DIRECTION);
 }
 
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_kmov (__mmask16 __A)
+{
+  return __A;
+}
+
 #ifdef __DISABLE_AVX512F__
 #undef __DISABLE_AVX512F__
 #pragma GCC pop_options
-- 
1.8.3.1



Re: [PATCH] Fix devirtualization ICE (PR tree-optimization/59622)

2013-12-31 Thread Richard Biener
Jakub Jelinek ja...@redhat.com wrote:
On Tue, Dec 31, 2013 at 08:39:02AM +0100, Richard Biener wrote:
 That said, fold_stmt callers have to be prepared to handle say a normal
 call becoming a noreturn call, consider say:
 
 struct A { virtual int foo (); };
 struct B : public A { int foo () __attribute__((noreturn)); };
 int B::foo () { __builtin_exit (0); }
 int bar ()
 {
   B b;
   B *p = &b;
   return p->foo ();
 }
 
 Is that a valid specialization though?

I think so; after all, don't we set the noreturn attribute automatically
even if it is missing when IPA can prove the function never returns?
If we fold_stmt after that, the above testcase even without an explicit
noreturn attribute would need cfg cleanup.

Perhaps gimple_fold_call should punt and not change fndecl if !inplace
if some call flags have changed that would require cfg cleanup, so that
at least fold_stmt_inplace callers don't have to deal with it, and make
sure fold_stmt callers schedule cleanup_cfg when fold_stmt returns true?

It would be nice to audit the callers and finally document what callers are 
required to do ... I can look at this next week.  I agree that the in-place 
variant should avoid this kind of side-effect.

Meanwhile your patch is ok.

Thanks,
Richard.

   Jakub




Re: [PATCH i386 9/8] [AVX512] Add forgotten kmovw insn, built-in and test.

2013-12-31 Thread Uros Bizjak
On Tue, Dec 31, 2013 at 11:22 AM, Ilya Tocar tocarip.in...@gmail.com wrote:
 RA figured out that operation with general registers results in fewer
 moves (you already have x1 in a general reg). This is exactly the reason
 why I think unspecs are not needed. It is the job of the compiler to
 choose the most appropriate approach, and its behavior should be adjusted
 with appropriate cost functions.

 I guess that if you load x from memory, RA will allocate mask
 registers all the way to add insn.

 I tried loading from memory and the result is the same. Without the unspec
 this intrinsic is just return __A and is completely useless. As for RA
 choosing the best approach, the big concern was generating klogic for
 normal instructions, so the current implementation of masks is conservative
 and RA chooses the gpr alternatives. So I think that a kmov intrinsic with
 an unspec has its uses as a hint to the compiler. If you are against this
 approach, here is a version without the unspec.

No, this explanation sounds reasonable.

I was trying to draw parallels with MMX insns, where we were able to
avoid DImode MMX moves by penalizing various moves between register
sets. However, apart from DImode shifts, MMX didn't have equivalent
logical or arithmetic DImode operations.

When _mm512_kmov is used, the user clearly wants to use mask registers and
operations on mask registers.

Based on these facts, I think that the lesser evil is to use UNSPEC
moves. So, your original patch is OK.

Thanks,
Uros.


Re: [PATCH] Fix devirtualization ICE (PR tree-optimization/59622)

2013-12-31 Thread Jan Hubicka
 I think so; after all, don't we set the noreturn attribute automatically
 even if it is missing when IPA can prove the function never returns?
 If we fold_stmt after that, the above testcase even without an explicit
 noreturn attribute would need cfg cleanup.
 
 Perhaps gimple_fold_call should punt and not change fndecl if !inplace
 if some call flags have changed that would require cfg cleanup, so that
 at least fold_stmt_inplace callers don't have to deal with it, and make
 sure fold_stmt callers schedule cleanup_cfg when fold_stmt returns true?
 
 It would be nice to audit the callers and finally document what callers are
 required to do ... I can look at this next week.  I agree that the in-place
 variant should avoid this kind of side-effect.

Yep, I think with in-place fold we really can't do this.
This makes me wonder: how do we arrange for the statement to be folded
later if we don't fold statements that a given pass did not touch?
Note that with LTO and an invalid C++ program, I think devirtualization can
return a function of a different type just because the slots in virtual
tables are used differently in each unit.  We should not ICE here.

Honza
 
 Meanwhile your patch is ok.
 
 Thanks,
 Richard.
 
  Jakub
 


Re: [PATCH] Fix devirtualization ICE (PR tree-optimization/59622)

2013-12-31 Thread Richard Biener
Jan Hubicka hubi...@ucw.cz wrote:
I think so; after all, don't we set the noreturn attribute automatically
even if it is missing when IPA can prove the function never returns?
If we fold_stmt after that, the above testcase even without an explicit
noreturn attribute would need cfg cleanup.

Perhaps gimple_fold_call should punt and not change fndecl if !inplace
if some call flags have changed that would require cfg cleanup, so that
at least fold_stmt_inplace callers don't have to deal with it, and make
sure fold_stmt callers schedule cleanup_cfg when fold_stmt returns true?

It would be nice to audit the callers and finally document what callers
are required to do ... I can look at this next week.  I agree that the
in-place variant should avoid this kind of side-effect.

Yep, I think with in-place fold we really can't do this.
This makes me wonder: how do we arrange for the statement to be folded
later if we don't fold statements that a given pass did not touch?

We don't fold all statements, that is a general issue. I have old patches that 
ensure we at least fold all statements after gimplifying. Propagators should 
fold what they propagate into. That leaves us with LTO, which could fold each 
statement after reading it. I've not done this to avoid differences with 
regard to the gimplification issue.

Richard.

Note that with LTO and an invalid C++ program, I think devirtualization
can return a function of a different type just because the slots in
virtual tables are used differently in each unit.  We should not ICE here.

Honza
 
 Meanwhile your patch is ok.
 
 Thanks,
 Richard.
 
 Jakub
 




Re: RFA: Use precision in get_mode_bounds()

2013-12-31 Thread Andreas Schwab
Richard Biener richard.guent...@gmail.com writes:

 Nick Clifton ni...@redhat.com wrote:
Hi Guys,

  I have tracked down a bug reported against the MSP430 (PR
  target/59613) which turns out to be a generic problem.  The function
  get_mode_bounds() in stor-layout.c computes the minimum and maximum
  values for a given mode, but it uses the bitsize of the mode not the
  precision.  This is incorrect if the two values are different, (which
  can happen when the target is using partial integer modes), since
  values outside of the precision cannot be stored in the mode, no
  matter how many bits the mode uses.

  The result, for the MSP430 in LARGE mode, is that calling
  get_mode_bounds() on PSImode returns a maximum value of -1 instead of
  1 << 20.  Note - what happens is that get_mode_bounds computes the
  maximum as (1 << 31) - 1 and then calls gen_int_mode to create an RTX
  for this value.  But gen_int_mode calls trunc_int_for_mode which uses
  GET_MODE_PRECISION to determine the sign bits, and these bits overwrite
  some of the zero bits in (1 << 31) - 1, changing it into -1.
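
  To make the truncation concrete (my own arithmetic, for the 20-bit
  precision of PSImode):

    (1 << 31) - 1                 = 0x7fffffff
    keep the low 20 bits          = 0xfffff
    bit 19 (the PSImode sign bit) = 1, so sign-extend
    trunc_int_for_mode result     = 0xffffffff = -1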

  The result of this behaviour is the bug that I was tracking down.
  simplify_const_relational_operation() was erroneously discarding a
  comparison of a pointer against zero, because get_mode_bounds told it
  that the pointer could only have values between 0 and -1...

  The proposed patch is simple - use the mode's precision and not its
  bitsize to compute the bounds.  This seems like an obvious fix, but
  I am not familiar with all of the uses of get_mode_bounds() so maybe
  there is a situation where using the bitsize is correct.

  There were no regressions and several fixed test cases with the
  msp430-elf toolchain that I was using, and no regressions with an
  i686-pc-linux-gnu toolchain.

  OK to apply ?

 Ok without the comment (this is obvious)

This completely breaks ia64, see PR59649.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Re: [C++,doc] vector conditional expression

2013-12-31 Thread Marc Glisse

On Mon, 2 Dec 2013, Gerald Pfeifer wrote:


On Mon, 2 Dec 2013, Marc Glisse wrote:

Index: doc/extend.texi
===
+In C++, the ternary operator @code{?:} is available. @code{a?b:c}, where
+@code{b} and @code{c} are vectors of the same type and @code{a} is an
+integer vector of the same size and number of elements as @code{b} and
+@code{c}

Why "same size and number of elements" in the above?  What is the
difference between these two?

(on x86_64)
A vector of 4 int and a vector of 4 long have the same number of elements but
not the same size.
A vector of 8 int and a vector of 4 long have the same size but not the same
number of elements.

For semantics, we want the same number of elements. To match the
hardware, we want the same size.


Ah, so it was good I asked. :-)  Thanks for your explanation.

It seems the way this is intended is
 "integer vector of the (same size and number of elements) as"
whereas I parsed it as
 "(integer vector of the same size) and (number of elements) as"
hence wondering what the difference between the size of the vector and
the number of elements was.


I think you had parsed it ok. In code terms:
size: sizeof(vec)
number of elements: sizeof(vec)/sizeof(vec[0])

Now when the number of elements is fixed, saying that vectors have the 
same (total) size or that they have elements of the same size is 
equivalent, so any interpretation is fine.
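
In GNU C vector terms, a quick sketch (my example, assuming 4-byte int and
8-byte long on x86_64):

typedef int  v8si __attribute__ ((vector_size (32))); /* 8 elements, 32 bytes */
typedef long v4di __attribute__ ((vector_size (32))); /* 4 elements, 32 bytes */
typedef long v8di __attribute__ ((vector_size (64))); /* 8 elements, 64 bytes */
/* v8si vs. v4di: same size, different number of elements.
   v8si vs. v8di: same number of elements, different size.  */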



Rephrasing this as "the same number and size of elements as" or, better,
"the same number of elements of the same size as" may help avoid this.


Ok. Like this then?

+In C++, the ternary operator @code{?:} is available. @code{a?b:c}, where
+@code{b} and @code{c} are vectors of the same type and @code{a} is an
+integer vector with the same number of elements of the same size as @code{b}
+and @code{c}, computes all three arguments and creates a vector
+@code{@{a[0]?b[0]:c[0], a[1]?b[1]:c[1], @dots{}@}}.  Note that unlike in
+OpenCL, @code{a} is thus interpreted as @code{a != 0} and not @code{a < 0}.
+As in the case of binary operations, this syntax is also accepted when
+one of @code{b} or @code{c} is a scalar that is then transformed into a
+vector. If both @code{b} and @code{c} are scalars and the type of
+@code{true?b:c} has the same size as the element type of @code{a}, then
+@code{b} and @code{c} are converted to a vector type whose elements have
+this type and with the same number of elements as @code{a}.

(though arguably one could parse this to mean that the elements of a
have the same size as the whole vector b, but I am fine with ignoring
this)
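
A short sketch of the documented syntax (my example, compiled as C++, with
the elementwise semantics described above):

typedef int v4si __attribute__ ((vector_size (16)));

v4si
blend (v4si a, v4si b, v4si c)
{
  /* { a[0]?b[0]:c[0], a[1]?b[1]:c[1], ... }; each a[i] is tested as
     a[i] != 0, not as a[i] < 0.  */
  return a ? b : c;
}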

--
Marc Glisse


Re: [C,C++] integer constants in attribute arguments

2013-12-31 Thread Marc Glisse

Ping http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03822.html

On Sat, 30 Nov 2013, Marc Glisse wrote:


Hello,

we currently reject:

constexpr int s = 32;
typedef double VEC __attribute__ ((__vector_size__ (s)));

and similarly for other attributes, while we accept s+0 or (int)s, etc. The 
code is basically copied from the constructor attribute. The C front-end is 
much less forgiving than the C++ one, so we need to protect the call to 
default_conversion (as in PR c/59280), and for some reason one of the 
attributes can see a FUNCTION_DECL where others see an IDENTIFIER_NODE; I 
didn't try to understand why and just added that check to the code.
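
To make the before/after concrete (my summary of the cases mentioned above):

constexpr int s = 32;
typedef double V1 __attribute__ ((__vector_size__ (s)));       // rejected before this patch
typedef double V2 __attribute__ ((__vector_size__ (s + 0)));   // already accepted
typedef double V3 __attribute__ ((__vector_size__ ((int) s))); // already accepted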


Bootstrap and testsuite on x86_64-linux-gnu.

2013-11-30  Marc Glisse  marc.gli...@inria.fr

PR c++/53017
PR c++/59211
gcc/c-family/
* c-common.c (handle_aligned_attribute, handle_alloc_size_attribute,
handle_vector_size_attribute, handle_nonnull_attribute): Call
default_conversion on the attribute argument.
gcc/cp/
* tree.c (handle_init_priority_attribute): Likewise.
gcc/
* doc/extend.texi (Function Attributes): Typo.
gcc/testsuite/
* c-c++-common/attributes-1.c: New testcase.
* g++.dg/cpp0x/constexpr-attribute2.C: Likewise.


--
Marc Glisse


[C++] PR59641: error recovery in vector condition

2013-12-31 Thread Marc Glisse

Hello,

here is a simple patch for error recovery. We already check the arguments 
earlier, but force_rvalue can replace them with errors.


Bootstrap+testsuite on x86_64-unknown-linux-gnu.

2014-01-01  Marc Glisse  marc.gli...@inria.fr

PR c++/59641
gcc/cp/
* call.c (build_conditional_expr_1): Check the return value of
force_rvalue.
gcc/testsuite/
* g++.dg/cpp0x/pr59641.C: New file.

--
Marc Glisse

Index: gcc/cp/call.c
===
--- gcc/cp/call.c   (revision 206265)
+++ gcc/cp/call.c   (working copy)
@@ -4395,20 +4395,26 @@ build_conditional_expr_1 (location_t loc
 
   orig_arg2 = arg2;
   orig_arg3 = arg3;
 
   if (VECTOR_INTEGER_TYPE_P (TREE_TYPE (arg1)))
 {
   arg1 = force_rvalue (arg1, complain);
   arg2 = force_rvalue (arg2, complain);
   arg3 = force_rvalue (arg3, complain);
 
+  /* force_rvalue can return error_mark on valid arguments.  */
+  if (error_operand_p (arg1)
+ || error_operand_p (arg2)
+ || error_operand_p (arg3))
+   return error_mark_node;
+
   tree arg1_type = TREE_TYPE (arg1);
   arg2_type = TREE_TYPE (arg2);
   arg3_type = TREE_TYPE (arg3);
 
   if (TREE_CODE (arg2_type) != VECTOR_TYPE
	  && TREE_CODE (arg3_type) != VECTOR_TYPE)
{
  /* Rely on the error messages of the scalar version.  */
  tree scal = build_conditional_expr_1 (loc, integer_one_node,
orig_arg2, orig_arg3, complain);
Index: gcc/testsuite/g++.dg/cpp0x/pr59641.C
===
--- gcc/testsuite/g++.dg/cpp0x/pr59641.C(revision 0)
+++ gcc/testsuite/g++.dg/cpp0x/pr59641.C(working copy)
@@ -0,0 +1,8 @@
+// { dg-options "-std=gnu++11" }
+typedef int T __attribute__((vector_size(2*sizeof(int))));
+
+void foo(T& r, const T& a, const T& b)
+{
+  constexpr T c = a < b; // { dg-error "constant" }
+  r = c ? a : b;
+}

Property changes on: gcc/testsuite/g++.dg/cpp0x/pr59641.C
___
Added: svn:keywords
## -0,0 +1 ##
+Author Date Id Revision URL
\ No newline at end of property
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property


Re: [PATCH] Vectorization for store with negative step

2013-12-31 Thread Tejas Belagod

Bingfeng Mei wrote:

Hi,
I created PR59544 and here is the patch. OK to commit? 


Thanks,
Bingfeng


2013-12-18  Bingfeng Mei  b...@broadcom.com

PR tree-optimization/59544
	* tree-vect-stmts.c (perm_mask_for_reverse): Move before
	vectorizable_store.
	(vectorizable_store): Handle negative step.

2013-12-18  Bingfeng Mei  b...@broadcom.com

PR tree-optimization/59544
	* gcc.target/i386/pr59544.c: New test.



Hi Bingfeng,

Your patch seems to have a dependence calculation bug (I think) due to which 
gcc.dg/torture/pr52943.c regresses on aarch64. I've raised 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59651.


Do you think you could have a look?

Thanks,
Tejas.


-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Richard Biener
Sent: 18 December 2013 11:47
To: Bingfeng Mei
Cc: gcc-patches@gcc.gnu.org
Subject: Re: Vectorization for store with negative step

On Wed, Dec 18, 2013 at 12:34 PM, Bingfeng Mei b...@broadcom.com wrote:

Thanks, Richard. I will file a bug report and prepare a complete patch. For 
the perm_mask_for_reverse function, should I move it before vectorizable_store 
or add a declaration?


Move it.

Richard.


Bingfeng
-Original Message-
From: Richard Biener [mailto:richard.guent...@gmail.com]
Sent: 18 December 2013 11:26
To: Bingfeng Mei
Cc: gcc-patches@gcc.gnu.org
Subject: Re: Vectorization for store with negative step

On Mon, Dec 16, 2013 at 5:54 PM, Bingfeng Mei b...@broadcom.com wrote:

Hi,
I was looking at some loops that can be vectorized by LLVM, but not by GCC. One 
type of loop is a store with negative step.

void test1(short * __restrict__ x, short * __restrict__ y, short * __restrict__ z)
{
int i;
for (i=127; i>=0; i--) {
x[i] = y[127-i] + z[127-i];
}
}

I don't know why GCC only implements negative step for loads, but not stores. I 
implemented a patch, very similar to the code in vectorizable_load.

~/scratch/install-x86/bin/gcc ghs-dec.c -ftree-vectorize -S -O2 -mavx

Without patch:
test1:
.LFB0:
addq$254, %rdi
xorl%eax, %eax
.p2align 4,,10
.p2align 3
.L2:
movzwl  (%rsi,%rax), %ecx
subq$2, %rdi
addw(%rdx,%rax), %cx
addq$2, %rax
movw%cx, 2(%rdi)
cmpq$256, %rax
jne .L2
rep; ret

With patch:
test1:
.LFB0:
vmovdqa .LC0(%rip), %xmm1
xorl%eax, %eax
.p2align 4,,10
.p2align 3
.L2:
vmovdqu (%rsi,%rax), %xmm0
movq%rax, %rcx
negq%rcx
vpaddw  (%rdx,%rax), %xmm0, %xmm0
vpshufb %xmm1, %xmm0, %xmm0
addq$16, %rax
cmpq$256, %rax
vmovups %xmm0, 240(%rdi,%rcx)
jne .L2
rep; ret

Performance is definitely improved here. It is bootstrapped for 
x86_64-unknown-linux-gnu, and has no additional regressions on my machine.

For reference, LLVM seems to use different instructions and slightly worse 
code. I am not so familiar with x86 assembly code. The patch was originally for 
our private port.
test1:  # @test1
.cfi_startproc
# BB#0: # %entry
addq$240, %rdi
xorl%eax, %eax
.align  16, 0x90
.LBB0_1:# %vector.body
# =>This Inner Loop Header: Depth=1
movdqu  (%rsi,%rax,2), %xmm0
movdqu  (%rdx,%rax,2), %xmm1
paddw   %xmm0, %xmm1
shufpd  $1, %xmm1, %xmm1# xmm1 = xmm1[1,0]
pshuflw $27, %xmm1, %xmm0   # xmm0 = xmm1[3,2,1,0,4,5,6,7]
pshufhw $27, %xmm0, %xmm0   # xmm0 = xmm0[0,1,2,3,7,6,5,4]
movdqu  %xmm0, (%rdi)
addq$8, %rax
addq$-16, %rdi
cmpq$128, %rax
jne .LBB0_1
# BB#2: # %for.end
ret

Any comment?

Looks good to me.  One of the various TODOs in vectorizable_store I presume.

Needs a testcase and at this stage a bugreport that is fixed by it.

Thanks,
Richard.


Bingfeng Mei
Broadcom UK








[PATCH] Tiny predcom improvement (PR tree-optimization/59643)

2013-12-31 Thread Jakub Jelinek
Hi!

As written in the PR, I've been looking at why llvm 3.[34] is so much faster
on the Scimark2 SOR benchmark, and the reason is that its predictive commoning
or whatever it uses doesn't give up on the inner loop, while our predcom
unnecessarily gives up, because there are reads that could alias the write.

This simple patch improves the benchmark by 42%.  We already ignore
unsuitable dependencies for read/read; the patch extends that to unsuitable
dependencies for read/write by just putting the read (and anything in its
component) into the bad component which is ignored.  pcom doesn't optimize
away the writes and will keep the potentially aliasing reads unmodified as
well.  Without the patch we'd merge the two components, and as
!determine_offset between the two DRs, it would mean the whole merged
component would be always unsuitable and thus ignored.  With the patch
we'll hopefully have some other reads with known offset to the write
and can optimize that, so the patch should always either handle what
it did before or handle perhaps some more cases.
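
For intuition, a hand-written sketch of the reuse this enables on the
testcase loop below (not compiler output, and assuming n >= 2):

void
sor_before (double *a, double *b, double *c, double d, double e, int n)
{
  /* Every iteration re-loads a[i - 1], which predcom used to give up on.  */
  for (int i = 1; i < n - 1; i++)
    a[i] = d * (b[i] + c[i] + a[i - 1] + a[i + 1]) + e * a[i];
}

void
sor_after (double *a, double *b, double *c, double d, double e, int n)
{
  /* Roughly what pcom does now: the a[i - 1] read is carried over from
     the previous iteration's store.  */
  double prev = a[0];
  for (int i = 1; i < n - 1; i++)
    {
      double cur = d * (b[i] + c[i] + prev + a[i + 1]) + e * a[i];
      a[i] = cur;
      prev = cur;
    }
}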

The inner loop from the (public domain) benchmark is added in the two tests,
one runtime test and one test looking whether pcom actually optimized it.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2013-12-31  Jakub Jelinek  ja...@redhat.com

PR tree-optimization/59643
* tree-predcom.c (split_data_refs_to_components): If one dr is
read and one write, determine_offset fails and the write isn't
in the bad component, just put the read into the bad component.

* gcc.dg/pr59643.c: New test.
* gcc.c-torture/execute/pr59643.c: New test.

--- gcc/tree-predcom.c.jj   2013-12-31 12:50:47.169748385 +0100
+++ gcc/tree-predcom.c  2013-12-31 15:39:44.422297404 +0100
@@ -772,10 +772,37 @@ split_data_refs_to_components (struct lo
   bad = component_of (comp_father, n);
 
   /* If both A and B are reads, we may ignore unsuitable dependences.  */
-      if (DR_IS_READ (dra) && DR_IS_READ (drb)
-	  && (ia == bad || ib == bad
-	      || !determine_offset (dra, drb, &dummy_off)))
-	continue;
+      if (DR_IS_READ (dra) && DR_IS_READ (drb))
+	{
+	  if (ia == bad || ib == bad
+	      || !determine_offset (dra, drb, &dummy_off))
+	    continue;
+	}
+      /* If A is read and B write or vice versa and there is unsuitable
+	 dependence, instead of merging both components into a component
+	 that will certainly not pass suitable_component_p, just put the
+	 read into the bad component, perhaps at least the write together with
+	 all the other data refs in its component will be optimizable.  */
+      else if (DR_IS_READ (dra) && ib != bad)
+	{
+	  if (ia == bad)
+	    continue;
+	  else if (!determine_offset (dra, drb, &dummy_off))
+	    {
+	      merge_comps (comp_father, comp_size, bad, ia);
+	      continue;
+	    }
+	}
+      else if (DR_IS_READ (drb) && ia != bad)
+	{
+	  if (ib == bad)
+	    continue;
+	  else if (!determine_offset (dra, drb, &dummy_off))
+	    {
+	      merge_comps (comp_father, comp_size, bad, ib);
+	      continue;
+	    }
+	}
 
       merge_comps (comp_father, comp_size, ia, ib);
     }
--- gcc/testsuite/gcc.dg/pr59643.c.jj   2013-12-31 15:34:48.584810067 +0100
+++ gcc/testsuite/gcc.dg/pr59643.c  2013-12-31 15:34:48.584810067 +0100
@@ -0,0 +1,15 @@
+/* PR tree-optimization/59643 */
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-pcom-details" } */
+
+void
+foo (double *a, double *b, double *c, double d, double e, int n)
+{
+  int i;
+  for (i = 1; i < n - 1; i++)
+a[i] = d * (b[i] + c[i] + a[i - 1] + a[i + 1]) + e * a[i];
+}
+
+/* { dg-final { scan-tree-dump-times "Before commoning:" 1 "pcom" } } */
+/* { dg-final { scan-tree-dump-times "Unrolling 2 times" 1 "pcom" } } */
+/* { dg-final { cleanup-tree-dump "pcom" } } */
--- gcc/testsuite/gcc.c-torture/execute/pr59643.c.jj2013-12-31 
15:34:48.584810067 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr59643.c   2013-12-31 
15:34:48.584810067 +0100
@@ -0,0 +1,39 @@
+/* PR tree-optimization/59643 */
+
+#define N 32
+
+__attribute__((noinline, noclone)) void
+foo (double *a, double *b, double *c, double d, double e, int n)
+{
+  int i;
+  for (i = 1; i < n - 1; i++)
+a[i] = d * (b[i] + c[i] + a[i - 1] + a[i + 1]) + e * a[i];
+}
+
+double expected[] = {
+  0.0, 10.0, 44.0, 110.0, 232.0, 490.0, 1020.0, 2078.0, 4152.0, 8314.0,
+  16652.0, 33326.0, 4.0, 133354.0, 266748.0, 533534.0, 1067064.0,
+  2134138.0, 4268300.0, 8536622.0, 17073256.0, 34146538.0, 68293116.0,
+  136586270.0, 273172536.0, 546345082.0, 1092690188.0, 2185380398.0,
+  4370760808.0, 8741521642.0, 17483043324.0, 6.0
+};
+
+int
+main ()
+{
+  int i;
+  double a[N], b[N], c[N];
+  if (__DBL_MANT_DIG__ <= 35)
+return 0;
+  for (i = 0; i  N; i++)
+{
+  a[i] = (i & 3) * 2.0;
+  b[i] 

[PATCH] CSE fix for UNSIGNED_FLOAT (PR rtl-optimization/59647)

2013-12-31 Thread Jakub Jelinek
Hi!

This patch fixes a 4.8 ICE (on trunk the bug went latent due to GIMPLE
optimization changes).  The problem is we have a REG_EQUAL
(unsigned_float:DF (reg:SI ...)) and cse_process_notes_1 attempts
to replace the reg with a CONST_INT that has the topmost bit set.
As CONST_INTs are canonicalized by sign extension, simplify_unary*
ICEs on it because it doesn't know the original mode of the argument
and thus whether the e.g. -1 in the CONST_INT is 0xff, 0xffff,
0xffffffff, or 0xffffffffffffffffULL.  cse_process_notes_1 already punts
similarly on SIGN_EXTEND/ZERO_EXTEND/SUBREG; this just handles
UNSIGNED_FLOAT similarly, but allows unsigned values that aren't negative
when canonicalized; one doesn't need the mode in that case, since if the
CONST_INT was valid for the original mode, it shouldn't have any bits set
beyond its precision.
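
A hand-worked illustration of the ambiguity (my example, assuming a 64-bit
HOST_WIDE_INT):

  QImode 0xff        -> canonical CONST_INT -1
  HImode 0xffff      -> canonical CONST_INT -1
  SImode 0xffffffff  -> canonical CONST_INT -1
  so (unsigned_float:DF (const_int -1)) has no unique meaning, while a
  non-negative CONST_INT folds the same way for every plausible mode.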

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8?

2013-12-31  Jakub Jelinek  ja...@redhat.com

PR rtl-optimization/59647
* cse.c (cse_process_notes_1): Don't substitute negative VOIDmode
new_rtx into UNSIGNED_FLOAT rtxes.

* g++.dg/opt/pr59647.C: New test.

--- gcc/cse.c.jj2013-03-16 08:30:37.0 +0100
+++ gcc/cse.c   2013-12-31 15:31:16.469885722 +0100
@@ -6082,6 +6082,18 @@ cse_process_notes_1 (rtx x, rtx object,
return x;
   }
 
+case UNSIGNED_FLOAT:
+  {
+   rtx new_rtx = cse_process_notes (XEXP (x, 0), object, changed);
+   /* We don't substitute negative VOIDmode constants into these rtx,
+  since they would impede folding.  */
+	if (GET_MODE (new_rtx) != VOIDmode
+	    || (CONST_INT_P (new_rtx) && INTVAL (new_rtx) >= 0)
+	    || (CONST_DOUBLE_P (new_rtx) && CONST_DOUBLE_HIGH (new_rtx) >= 0))
+	  validate_change (object, &XEXP (x, 0), new_rtx, 0);
+   return x;
+  }
+
 case REG:
   i = REG_QTY (REGNO (x));
 
--- gcc/testsuite/g++.dg/opt/pr59647.C.jj   2013-12-31 15:16:16.259454995 
+0100
+++ gcc/testsuite/g++.dg/opt/pr59647.C  2013-12-31 15:15:51.0 +0100
@@ -0,0 +1,32 @@
+// PR rtl-optimization/59647
+// { dg-do compile }
+// { dg-options "-O2 -fno-tree-vrp" }
+// { dg-additional-options "-msse2 -mfpmath=sse" { target { { i?86-*-* x86_64-*-* } && ia32 } } }
+
+void f1 (int);
+void f2 ();
+double f3 (int);
+
+struct A
+{
+  int f4 () const
+  {
+if (a == 0)
+  return 1;
+return 0;
+  }
+  unsigned f5 ()
+  {
+if (!f4 ())
+  f2 ();
+return a;
+  }
+  int a;
+};
+
+void
+f6 (A *x)
+{
+  unsigned b = x->f5 ();
+  f1 (b - 1 - f3 (x->f5 () - 1U));
+}

Jakub


Re: [PATCH] CSE fix for UNSIGNED_FLOAT (PR rtl-optimization/59647)

2013-12-31 Thread Eric Botcazou
 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8?
 
 2013-12-31  Jakub Jelinek  ja...@redhat.com
 
   PR rtl-optimization/59647
   * cse.c (cse_process_notes_1): Don't substitute negative VOIDmode
   new_rtx into UNSIGNED_FLOAT rtxes.
 
   * g++.dg/opt/pr59647.C: New test.

OK, thanks.

-- 
Eric Botcazou


Re: [PATCH] Tiny predcom improvement (PR tree-optimization/59643)

2013-12-31 Thread Vladimir Makarov

On 12/31/2013, 2:04 PM, Jakub Jelinek wrote:

Hi!

As written in the PR, I've been looking at why llvm 3.[34] is so much faster
on the Scimark2 SOR benchmark, and the reason is that its predictive commoning
or whatever it uses doesn't give up on the inner loop, while our predcom
unnecessarily gives up, because there are reads that could alias the write.

This simple patch improves the benchmark by 42%.  We already ignore
unsuitable dependencies for read/read; the patch extends that to unsuitable
dependencies for read/write by just putting the read (and anything in its
component) into the bad component which is ignored.  pcom doesn't optimize
away the writes and will keep the potentially aliasing reads unmodified as
well.  Without the patch we'd merge the two components, and as
!determine_offset between the two DRs, it would mean the whole merged
component would be always unsuitable and thus ignored.  With the patch
we'll hopefully have some other reads with known offset to the write
and can optimize that, so the patch should always either handle what
it did before or handle perhaps some more cases.


Congratulation, Jakub!

Scimark2 is always used by Phoronix to show how bad GCC is in comparison 
with LLVM.  It is understandable: for some reason Phoronix is very 
biased toward LLVM and is, I'd say, a marketing machine for LLVM.


They use very small selected benchmarks.  Benchmarking is evil, but I 
trust SPEC2000/SPEC2006 more, which use bigger real programs.  With 
their approach, they could use whetstone instead of Scimark, but the 
results would be very bad for LLVM (e.g. 64-bit code for -O3 on Haswell):


dfa:MWIPS  3705.311  -- LLVM3.3
nor:MWIPS  4587.555  -- GCC4.8

It would be nice to also fix the Scimark2 LU degradation in GCC 4.8, to 
finally stop this nonsense from Phoronix.




[buildrobot] [PATCH] Fix redefinition of BITS_PER_UNIT (was: nios2 port committed)

2013-12-31 Thread Jan-Benedict Glaw
On Tue, 2013-12-31 15:24:52 +0800, Chung-Lin Tang clt...@codesourcery.com 
wrote:
 The nios2 port was just committed. Thanks to all that gave time and
 effort to review this.

Just a heads-up: I see a lot of warnings about BITS_PER_UNIT being
redefined, see eg.
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=74923
as an example.


2013-12-31  Jan-Benedict Glaw  jbg...@lug-owl.de

* config/nios2/nios2.h (BITS_PER_UNIT): Don't define it.


diff --git a/gcc/config/nios2/nios2.h b/gcc/config/nios2/nios2.h
index 8e6941b..f333be3 100644
--- a/gcc/config/nios2/nios2.h
+++ b/gcc/config/nios2/nios2.h
@@ -73,7 +73,6 @@
 #define BITS_BIG_ENDIAN 0
 #define BYTES_BIG_ENDIAN (TARGET_BIG_ENDIAN != 0)
 #define WORDS_BIG_ENDIAN (TARGET_BIG_ENDIAN != 0)
-#define BITS_PER_UNIT 8
 #define BITS_PER_WORD 32
 #define UNITS_PER_WORD 4
 #define POINTER_SIZE 32


Ok?

MfG, JBG

-- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
Signature of: They that give up essential liberty to obtain temporary safety,
the second  : deserve neither liberty nor safety.  (Ben Franklin)




[PATCH] Fix up --enable-checking=rtl SSE/AVX regressions

2013-12-31 Thread Jakub Jelinek
Hi!

In an --enable-checking=rtl bootstrap most of the SSE/AVX tests fail, because
*mov<mode>_internal uses REGNO on operands that are e.g. MEM or
CONST_INT.  Fixed thusly, bootstrapped/regtested on x86_64-linux and
i686-linux, committed to trunk as obvious.

2014-01-01  Jakub Jelinek  ja...@redhat.com

	* config/i386/sse.md (*mov<mode>_internal): Guard
	EXT_REX_SSE_REGNO_P (REGNO ()) uses with REG_P.

--- gcc/config/i386/sse.md.jj   2013-12-31 13:52:45.0 +0100
+++ gcc/config/i386/sse.md  2013-12-31 20:13:32.673520479 +0100
@@ -670,8 +670,10 @@ (define_insn "*mov<mode>_internal"
	 in avx512f, so we need to use workarounds, to access sse registers
	 16-31, which are evex-only.  */
       if (TARGET_AVX512F && GET_MODE_SIZE (<MODE>mode) < 64
-	  && (EXT_REX_SSE_REGNO_P (REGNO (operands[0]))
-	      || EXT_REX_SSE_REGNO_P (REGNO (operands[1]))))
+	  && ((REG_P (operands[0])
+	       && EXT_REX_SSE_REGNO_P (REGNO (operands[0])))
+	      || (REG_P (operands[1])
+		  && EXT_REX_SSE_REGNO_P (REGNO (operands[1])))))
	{
	  if (memory_operand (operands[0], <MODE>mode))
	    {

Jakub


[PATCH] Fix check_effective_target_avx512f

2013-12-31 Thread Jakub Jelinek
Hi!

I've noticed that all abi-avx512f.exp tests fail when the assembler doesn't
support avx512f.  The problem is that the builtin has been optimized away
as unused, and thus the effective-target check passed even with an assembler
that doesn't support EVEX.
Fixed thusly, tested on x86_64-linux, committed to trunk as obvious.

Happy New Year to everyone!

2014-01-01  Jakub Jelinek  ja...@redhat.com

* lib/target-supports.exp (check_effective_target_avx512f): Make sure
the builtin isn't optimized away as unused.

--- gcc/testsuite/lib/target-supports.exp.jj2013-12-31 12:51:09.0 
+0100
+++ gcc/testsuite/lib/target-supports.exp   2014-01-01 01:00:28.086093259 
+0100
@@ -5176,9 +5176,9 @@ proc check_effective_target_avx512f { }
 return [check_no_compiler_messages avx512f object {
typedef double __m512d __attribute__ ((__vector_size__ (64)));
 
-   void _mm512_add (__m512d a)
+   __m512d _mm512_add (__m512d a)
{
-  __builtin_ia32_addpd512_mask (a, a, a, 1, 4);
+	  return __builtin_ia32_addpd512_mask (a, a, a, 1, 4);
}
    } "-O2 -mavx512f" ]
 }

Jakub


pch bug fix

2013-12-31 Thread Mike Stump
In testing for wide-int, we discovered that someone seems to have blown pch.  
The problem is that the optabs field is not a string: its size is not 
determined by strlen, but strlen is the semantics gty attaches to unsigned char * 
data.  By changing the type of optabs to any other type, the length 
is instead determined by the ggc subsystem, which just knows the length of the 
data (2K on my system).

I don't really like the type lying going on.  I'd prefer if a tree-core person 
would get honest with the types.

Another style of fix would be to decide that atomic in this case means 
something else; then we'd have to have a gty person implement the new 
semantics.  Until that is done, this problem is so bad that the below should 
go in until someone can fix gty, if someone doesn't want to be honest with the 
type system.

This bug comes across as a random memory smasher and is incredibly annoying to 
track down.  There is non-determinism in the process and it causes 
non-deterministic memory crashes, but only on pch use.  To track down why, you 
have to trace back through the pch writing process and realize the data in 
memory is completely valid; it is only the meta information about that data 
that pch uses from the GTY world that causes any problem.

Ok?

Index: optabs.c
===
--- optabs.c(revision 206086)
+++ optabs.c(working copy)
@@ -6242,7 +6242,7 @@
 
   /* If the optabs changed, record it.  */
   if (memcmp (tmp_optabs, this_target_optabs, sizeof (struct target_optabs)))
-TREE_OPTIMIZATION_OPTABS (optnode) = (unsigned char *) tmp_optabs;
+TREE_OPTIMIZATION_OPTABS (optnode) = (struct target_optabs_S *) tmp_optabs;
   else
 {
   TREE_OPTIMIZATION_OPTABS (optnode) = NULL;
Index: tree-core.h
===
--- tree-core.h (revision 206086)
+++ tree-core.h (working copy)
@@ -1559,6 +1559,9 @@
   struct tree_statement_list_node *tail;
 };
 
+struct GTY(()) target_optabs_S {
+  unsigned char e[1];
+};
 
 /* Optimization options used by a function.  */
 
@@ -1570,7 +1573,7 @@
 
   /* Target optabs for this set of optimization options.  This is of
  type `struct target_optabs *'.  */
-  unsigned char *GTY ((atomic)) optabs;
+  struct target_optabs_S *GTY((atomic)) optabs;
 
   /* The value of this_target_optabs against which the optabs above were
  generated.  */



Re: [buildrobot] [PATCH] Fix redefinition of BITS_PER_UNIT (was: nios2 port committed)

2013-12-31 Thread Mike Stump
On Dec 31, 2013, at 12:26 PM, Jan-Benedict Glaw jbg...@lug-owl.de wrote:
 On Tue, 2013-12-31 15:24:52 +0800, Chung-Lin Tang clt...@codesourcery.com 
 wrote:
 The nios2 port was just committed. Thanks to all that gave time and
 effort to review this.
 
 Just a heads-up: I see a lot of warnings about BITS_PER_UNIT being
 redefined, see eg.
 http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=74923
 as an example.
 
 
 2013-12-31  Jan-Benedict Glaw  jbg...@lug-owl.de
 
   * config/nios2/nios2.h (BITS_PER_UNIT): Don't define it.
 
 diff --git a/gcc/config/nios2/nios2.h b/gcc/config/nios2/nios2.h
 index 8e6941b..f333be3 100644
 --- a/gcc/config/nios2/nios2.h
 +++ b/gcc/config/nios2/nios2.h
 @@ -73,7 +73,6 @@
 #define BITS_BIG_ENDIAN 0
 #define BYTES_BIG_ENDIAN (TARGET_BIG_ENDIAN != 0)
 #define WORDS_BIG_ENDIAN (TARGET_BIG_ENDIAN != 0)
 -#define BITS_PER_UNIT 8
 #define BITS_PER_WORD 32
 #define UNITS_PER_WORD 4
 #define POINTER_SIZE 32
 
 
 Ok?

Ok.

Re: pch bug fix

2013-12-31 Thread Jakub Jelinek
On Tue, Dec 31, 2013 at 10:39:58PM -0800, Mike Stump wrote:
 In testing for wide-int, we discovered that someone seems to have blown
 pch.  The problem is that the optabs field is not a string: its size is
 not determined by strlen, but strlen is the semantics gty attaches to
 unsigned char * data.  By changing the type of optabs to any other
 type, the length is instead determined by the ggc subsystem, which just
 knows the length of the data (2K on my system).
 
 I don't really like the type lying going on.  I'd prefer if a tree-core
 person would get honest with the types.
 
 Another style of fix would be to decide that atomic in this case means
 something else; then we'd have to have a gty person implement the new
 semantics.  Until that is done, this problem is so bad that the below
 should go in until someone can fix gty, if someone doesn't want to be
 honest with the type system.
 
 This bug comes across as a random memory smasher and is incredibly
 annoying to track down.  There is non-determinism in the process and it
 causes non-deterministic memory crashes, but only on pch use.  To track
 down why, you have to trace back through the pch writing process and
 realize the data in memory is completely valid; it is only the meta
 information about that data that pch uses from the GTY world that causes
 any problem.

Thanks for tracking this down, this sounds like PR59436.  How did you
manage to track it down?  I also wonder why it doesn't seem to affect 4.8
when it also has the same change.

Based on the comments in gengtype.c, I'd expect
  unsigned char *GTY ((atomic, length ("sizeof (struct target_optabs)"))) optabs;
to work (or, perhaps instead of sizeof directly call a function that is
defined in optabs.c and returns that value, perhaps optabs.h isn't included
where it should be), but unfortunately it seems to be rejected right now.

So, the question is how hard would it be to support that.

Jakub