Remove obsolete comment

2013-09-26 Thread Eric Botcazou
The first part has been obsolete since r91570 at least; the second part, about
MEM_KEEP_ALIAS_SET_P, became obsolete more recently.

Tested on x86_64-suse-linux, applied on the mainline.


2013-09-26  Eric Botcazou  ebotca...@adacore.com

* expr.c (expand_assignment): Remove obsolete comment.


-- 
Eric Botcazou

Index: expr.c
===
--- expr.c	(revision 202912)
+++ expr.c	(working copy)
@@ -4836,11 +4836,7 @@ expand_assignment (tree to, tree from, b
 	  /* If the field is at offset zero, we could have been given the
 		 DECL_RTX of the parent struct.  Don't munge it.  */
 	  to_rtx = shallow_copy_rtx (to_rtx);
-
 	  set_mem_attributes_minus_bitpos (to_rtx, to, 0, bitpos);
-
-	  /* Deal with volatile and readonly fields.  The former is only
-		 done for MEM.  Also set MEM_KEEP_ALIAS_SET_P if needed.  */
 	  if (volatilep)
 		MEM_VOLATILE_P (to_rtx) = 1;
 	}

Re: [PATCH][2 of 2] RTL expansion for zero sign extension elimination with VRP

2013-09-26 Thread Kugan Vivekanandarajah
Hi,

This is the updated patch for expanding gimple stmts without zero/sign
extensions when it is safe to do so. It is based on the latest changes to
propagating value range information to SSA_NAMEs and addresses review
comments from Eric.

Bootstrapped and regtested on x86_64-unknown-linux-gnu and
arm-none-linux-gnueabi. Is this OK?

Thanks,
Kugan
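
The patch itself is attached as binary data; as a rough illustration of the
kind of redundancy being targeted (a sketch, not code from the patch),
consider:

/* On a target with 32-bit registers, value-range information proves that
   t already lies in [0, 254], so widening it for the return value does
   not need an explicit zero-extension instruction.  */
unsigned int
sum_bytes (unsigned char a, unsigned char b)
{
  unsigned char t = (a >> 1) + (b >> 1);  /* range [0, 254] by VRP  */
  return t;                               /* zero extension is redundant  */
}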


ChangeLog
Description: Binary data


patch.diff
Description: Binary data


RE: [PATCH, PR 57748] Check for out of bounds access, Part 2

2013-09-26 Thread Bernd Edlinger
Hi,

On Wed, 25 Sep 2013 16:01:02, Martin Jambor wrote:
 Hi,

 On Wed, Sep 25, 2013 at 11:05:21AM +0200, Richard Biener wrote:
 On Tue, Sep 24, 2013 at 8:06 PM, Martin Jambor mjam...@suse.cz wrote:
 Hi,

 On Tue, Sep 24, 2013 at 12:02:17PM +0200, Richard Biener wrote:
 On Tue, Sep 24, 2013 at 4:52 AM, Bernd Edlinger
 bernd.edlin...@hotmail.de wrote:
 Hi,

 with the attached patch the read side of the out-of-bounds accesses is
 fixed.
 There is a single new test case, pr57748-3.c, that is derived from Martin's
 test case.
 The test case only tests the read access and does not depend on part 1 of
 the patch.

 This patch was bootstrapped and regression tested on
 x86_64-unknown-linux-gnu.

 Additionally I generated eCos and an eCos-application (on ARMv5 using 
 packed
 structures) with an arm-eabi cross compiler, and looked for differences 
 in the
 disassembled code with and without this patch, but there were none.

 OK for trunk?

 Index: gcc/expr.c
 ===
 --- gcc/expr.c (revision 202778)
 +++ gcc/expr.c (working copy)
 @@ -9878,7 +9878,7 @@
  && modifier != EXPAND_STACK_PARM
 ? target : NULL_RTX),
 VOIDmode,
 - modifier == EXPAND_SUM ? EXPAND_NORMAL : modifier);
 + EXPAND_MEMORY);

 /* If the bitfield is volatile, we want to access it in the
 field's mode, not the computed mode.

 context suggests that we may arrive with EXPAND_STACK_PARM here
 which is a correctness modifier (see its docs). But I'm not too familiar
 with the details of the various expand modifiers, Eric may be though.

 That said, I still believe that fixing the misalign path in 
 expand_assignment
 would be better than trying to avoid it. For this testcase the issue is
 again that expand_assignment passes the wrong mode/target to the
 movmisalign optab.

 Perhaps I'm stating the obvious, but this is supposed to address a
 separate bug in the expansion of the RHS (as opposed to the first bug
  which was in the expansion of the LHS), and so we would have to make
  expand_assignment also examine potential misalignments in the RHS,
  which it so far does not do, despite its complexity.

 Having said that, I also dislike the patch, but I am quite convinced
 that if we allow non-BLK structures with zero sized arrays, the fix
 will be ugly - but I'll be glad to be shown I've been wrong.

 I don't think the issues have anything to do with zero sized arrays
  or non-BLK structures. The issue is just that we are feeding movmisalign
 with the wrong mode (the mode of the base object) if we are expanding
 a base object which is unaligned and has non-BLK mode.

 I disagree. Consider a slightly modified testcase:

 #include <stdlib.h>
 typedef long long V
 __attribute__ ((vector_size (2 * sizeof (long long)), may_alias));

 #if 1
 typedef struct S { V a; V b[0]; } P __attribute__((aligned (1)));
 struct __attribute__((packed)) T { char c; P s; };
 #else
 typedef struct S { V a; V b[0]; } P;
 struct T { char c; P s; };
 #endif

 void __attribute__((noinline, noclone))
 good_value (V b)
 {
 if (b[0] != 3 || b[1] != 4)
 __builtin_abort ();
 }

 void __attribute__((noinline, noclone))
 check (P *p)
 {
 good_value (p->b[0]);
 }

 int
 main ()
 {
 struct T *t = (struct T *) calloc (128, 1);
 V a = { 3, 4 };
 t->s.b[0] = a;
 check (&t->s);
 free (t);
 return 0;
 }

 The problem is the expansion of the memory load in function check.
 All involved modes, the one of the base structure and of the loaded
 component, are V2DI so even if we somehow mixed them up, in this
 example it should not matter, yet the unaligned load gives wrong
 results.

 I'm still convinced that the problem is that we have a V2DImode
 structure which is larger than its mode says and we want to load the
 data from outside of its mode. That is only happening because of
 zero-sized arrays.

Yes, this is another good example, and it passes vector values on the
stack to a function. Again my patch produces working code, while
the current trunk produces just completely _insane_ code.

And again, this is not only a problem of structures with zero-sized
arrays at the end. Remember my previous example code:
On ARM (or anything with STRICT_ALIGNMENT) this union has the
same problems:

/* PR middle-end/57748 */
/* arm-eabi-gcc -mcpu=cortex-a9 -O3 */
#include <stdlib.h>

union  x
{
  short a[2];
  char x[4];
} __attribute__((packed, aligned(4))) ;
typedef volatile union  x *s;

void __attribute__((noinline, noclone))
check (void)
{
  s xx=(s)(0x8002);
  /* although volatile xx->x[3] reads 4 bytes here */
  if (xx->x[3] != 3)
    abort ();
}

void __attribute__((noinline, noclone))
foo (void)
{
  s xx=(s)(0x8002);
  xx->x[3] = 3;
}

int
main ()
{
  foo ();
  check ();
  return 0;
}



 So what we maybe need is another expand modifier that tells us
 to not use movmisalign when expanding the base object.

 In that case we can just as well use EXPAND_MEMORY. If so, I'd do
 that only when there is a zero sized array involved in order not to
 avoid 

Re: [PATCH, LRA, AARCH64] Switching LRA on for AArch64

2013-09-26 Thread Richard Earnshaw
On 24/09/13 10:03, Yvan Roux wrote:
 Hi,
 
 The following patch switches LRA on for AArch64.  The patch introduces
 an undocumented option -mlra to use LRA instead of reload, for testing
 purposes.  Please note that this patch depends on the
 one submitted in the thread below:
 
 http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00805.html
 
 Thanks,
 Yvan
 
 2013-09-24  Yvan Roux  yvan.r...@linaro.org
 
 * config/aarch64/aarch64.opt (mlra): New option.
 * config/aarch64/aarch64.c (aarch64_lra_p): New function.
 (TARGET_LRA_P): Define.
 
 

+mlra
+Target Report Var(aarch64_lra_flag) Init(1) Save
+Use LRA instead of reload

Please mark this (transitional) in the doc string.

Otherwise OK.

R.




Re: [PATCH][RFC] Remove quadratic loop with component_uses_parent_alias_set

2013-09-26 Thread Eric Botcazou
 Eric, the current way component_uses_parent_alias_set is called from
 get_alias_set causes the reference tree chain to be walked O(n^2)
 in case there is any DECL_NONADDRESSABLE_P or TYPE_NONALIASED_COMPONENT
 reference in the tree chain.  Also the comment above
 component_uses_parent_alias_set doesn't seem to match behavior
 in get_alias_set.  get_alias_set ends up using exactly the type of
 the parent
 of the DECL_NONADDRESSABLE_P / TYPE_NONALIASED_COMPONENT reference
 instead of the alias set provided by the object at the heart of T

That's what "the object at the heart of T" means I think: given the loop in 
get_alias_set (or reference_alias_ptr_type_1 now), we end up retrieving the 
parent of the outermost non-addressable component in the component chain 
(outermost in the type layout sense), which is what is wanted: when you go 
down the component chain from the base object to find the alias set, you need 
to stop at the first non-addressable component.  That's what is achieved in 
the RTL expander by means of MEM_KEEP_ALIAS_SET_P: you expand first the base 
object, then set MEM_KEEP_ALIAS_SET_P for the first non-addressable component, 
so that the alias set isn't touched any more after that.

But I agree that the head comment and the interface of c_u_p_a_s are awkward, 
to say the least, and the quadratic aspect isn't very nice either.
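
To make the stop-at-the-first-non-addressable-component rule concrete, here
is a minimal C sketch (a bit-field being the familiar DECL_NONADDRESSABLE_P
case; this is only an illustration, not code from the patch):

struct inner
{
  unsigned int flag : 1;   /* bit-field: DECL_NONADDRESSABLE_P is set  */
  unsigned int rest : 31;
};

struct outer
{
  int id;
  struct inner in;
};

int
get_flag (struct outer *o)
{
  /* When walking o->in.flag, the alias-set lookup stops at the parent of
     the non-addressable component, i.e. it uses the alias set of
     struct inner rather than inventing one for the bit-field itself;
     the RTL expander achieves the same effect by setting
     MEM_KEEP_ALIAS_SET_P so the alias set already on the MEM is not
     replaced later.  */
  return o->in.flag;
}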

 I also fail to see why component_uses_parent_alias_set invokes
 get_alias_set (is that to make 'a.c' in struct { char c; } a;
 use the alias set of 'a' instead of the alias set of 'char'?

Yes, I think it's intended to stop the walk as soon as you have a type with 
alias set 0: if 'a' has alias set 0, then 'a.c' will have it. 

 Or is it for correctness reasons in case there is a ref-all
 indirect ref at the base?  For the former I believe it doesn't
 work because it only checks the non-outermost type).

This code predates ref-all.  

 So, the following patch removes the quadraticness by returning
 the parent of the innermost non-addressable (or alias-set zero)
 reference component instead of just a flag.  That changes behavior
 for the alias-set zero case but still doesn't match the overall
 function comment.

The change is fine, but let's rename the function and reword the comment (and 
agree on the terminology, I think it's better to use outermost here as already 
used in get_alias_set/reference_alias_ptr_type_1):

/* Return the outermost parent of component present in the chain of component
   references handled by get_inner_reference in T with the following property:
 - the component is non-addressable, or
 - the parent has alias set zero,
   or NULL_TREE if no such parent exists.  In the former cases, the alias set
   of this parent is the alias set that must be used for T itself.  */

tree
component_uses_parent_alias_set_from (const_tree t)

However, I think the handling of "the parent has alias set zero" is wrong in 
your patch, you should test

  if (get_alias_set (TREE_TYPE (TREE_OPERAND (t, 0))) == 0)

 The only other use besides from get_alias_set is to set
 MEM_KEEP_ALIAS_SET_P -

It means that the alias set already on the MEM may not be changed afterwards, 
even if you look into its sub-components later.

 whatever that is exactly for, I can
 see a single non-obvious use in store_constructor_field
 (didn't bother to lookup how callers specify the alias_set argument).
 In
 
  if (MEM_P (to_rtx) && !MEM_KEEP_ALIAS_SET_P (to_rtx)
      && DECL_NONADDRESSABLE_P (field))
   {
 to_rtx = copy_rtx (to_rtx);
 MEM_KEEP_ALIAS_SET_P (to_rtx) = 1;
   }
 
 we can just drop the MEM_KEEP_ALIAS_SET_P check (well, in case
 MEM_KEEP_ALIAS_SET_P is dropped).

Do you mean dropped in set_mem_attributes_minus_bitpos?  No, I don't think we 
can do that.

 Btw, checks like
 
  if (!MEM_KEEP_ALIAS_SET_P (to_rtx) && MEM_ALIAS_SET (to_rtx) != 0)
 set_mem_alias_set (to_rtx, alias_set);
 
 seem to miss MEM_ALIAS_SET (to_rtx) being ALIAS_SET_MEMORY_BARRIER?
 Or alias_set being zero / ALIAS_SET_MEMORY_BARRIER?

This code predates ALIAS_SET_MEMORY_BARRIER.

-- 
Eric Botcazou


RE: [PATCH] fortran/PR58113

2013-09-26 Thread Bernd Edlinger
On Wed, 25 Sep 2013 21:00:33, Tobias Burnus wrote:

 Bernd Edlinger wrote:
 this test case fails very often, and the reason is not in GCC but
 in missing glibc rounding support for strtod.

 This patch fixes the test case to first determine whether the
 rounding support is available. This is often the case for real(16)
 through libquadmath. So even in cases where the test case usually
 fails, it still tests something with this patch.

 Ok for trunk?


 First, for Fortran patches, it helps if you CC fortran@ as otherwise the
 email might be missed.

 Your change doesn't really directly check whether strtod handles
 rounding but whether libgfortran (invoking strtod) supports up/down
 rounding.

 Hence, after your patch, one effectively checks - given that up/down
 rounding works (or at least produces different values) - that the
 nearest/zero/up/down give the correct result.

 As only few strtod implementations currently support rounding, it is
 probably the best approach. However, I think it merits a comment making
 clear what is now checked (and what isn't). Maybe something along the
 lines of my paragraph (but polished) - and removing the then-obsolete
 parts of the existing comment.

 Except for the comment update, the patch looks fine to me.


OK, I used some of your wording to update the comment.

I assume it's OK now.

 Tobias

 PS: I wonder whether there is a good way to do rounded strtod without
 relying on the system's libc to handle it.


Apparently real(16), aka libquadmath, has its own strtod implementation
that handles all the rounding correctly.

It should probably be possible to locate that code and rework it for the
other real precisions, which currently rely on glibc's strtod.

However I will likely not have time for that :(
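
The underlying libc property being discussed can be illustrated with a small,
self-contained C probe (a sketch only, not part of the patch or the Fortran
test case):

#include <fenv.h>
#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
  /* 0.1 is not exactly representable.  An strtod that honours the current
     rounding mode returns different values under FE_UPWARD and FE_DOWNWARD;
     most implementations simply return the round-to-nearest result in both
     cases.  */
  fesetround (FE_UPWARD);
  double up = strtod ("0.1", NULL);
  fesetround (FE_DOWNWARD);
  double down = strtod ("0.1", NULL);
  printf (up != down ? "strtod honours rounding\n" : "no rounding support\n");
  return 0;
}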

Bernd.

2013-09-25  Bernd Edlinger  bernd.edlin...@hotmail.de

PR fortran/58113
* gfortran.dg/round_4.f90: Check for rounding support.



patch-round4.diff
Description: Binary data


Re: [PATCH, PR 57748] Check for out of bounds access, Part 2

2013-09-26 Thread Eric Botcazou
 So I still think my patch does the right thing.
 
 The rationale is:
 
   = expand_expr (tem,
  (TREE_CODE (TREE_TYPE (tem)) == UNION_TYPE
   && COMPLETE_TYPE_P (TREE_TYPE (tem))
   && (TREE_CODE (TYPE_SIZE (TREE_TYPE (tem)))
      != INTEGER_CST)
   && modifier != EXPAND_STACK_PARM
   ? target : NULL_RTX),
  VOIDmode,
  EXPAND_MEMORY);
 
 returns the address of the structure in question;
 we can add the offset and bit offset and access the memory
 in the right mode, and the alignment information is
 passed to the backend via MEM_ALIGN (op0).

But there are conceptually no reasons to require a MEM here.  Look at all the 
code just below the block.  Given how hard it is to eliminate spills to memory 
in RTL once they are generated, this shouldn't be taken lightly.

-- 
Eric Botcazou


Commit: MSP430: Code gen improvements

2013-09-26 Thread Nick Clifton
Hi Guys,

  I am checking in the attached patch in order to fix some code
  generation problems exposed by running the GCC testsuite for the
  MSP430.

  The patch basically adds a definition of TARGET_PRINT_OPERAND_ADDRESS
  as some asm() statements need this.  It also adds a sign-extending
  PSI to SI conversion pattern for when signed pointer values need to be
  stored in an SImode pair of registers.

Cheers
  Nick

gcc/ChangeLog
2013-09-26  Nick Clifton  ni...@redhat.com

* config/msp430/msp430.c (msp430_expand_epilogue): Fix compile
time warning message.
(msp430_print_operand_raw): Delete unused letter parameter.
(TARGET_PRINT_OPERAND_ADDRESS): Define.
(msp430_print_operand_address): New function.
(msp430_print_operand): Move address printing code from here to
new function.
* config/msp430/msp430.md (movsipsi2): Add comment in generated
assembler.
(zero_extendpsisi2): Likewise.
(extendpsisi2): New pattern.
(andneghi3): New pattern.



msp430.patch.xz
Description: application/xz


Re: [PATCH][RFC] Remove quadratic loop with component_uses_parent_alias_set

2013-09-26 Thread Richard Biener
On Thu, 26 Sep 2013, Eric Botcazou wrote:

  Eric, the current way component_uses_parent_alias_set is called from
  get_alias_set causes the reference tree chain to be walked O(n^2)
  in case there is any DECL_NONADDRESSABLE_P or TYPE_NONALIASED_COMPONENT
  reference in the tree chain.  Also the comment above
  component_uses_parent_alias_set doesn't seem to match behavior
  in get_alias_set.  get_alias_set ends up using exactly the type of
  the parent
  of the DECL_NONADDRESSABLE_P / TYPE_NONALIASED_COMPONENT reference
  instead of the alias set provided by the object at the heart of T
 
 That's what the object at the heart of T means I think: given the loop in 
 get_alias_set (or reference_alias_ptr_type_1 now), we end up retrieving the 
 parent of the outermost non-addressable component in the component chain 
 (outermost in the type layout sense), which is what is wanted: when you go 
 down the component chain from the base object to find the alias set, you need 
 to stop at the first non-addressable component.  That's what is achieved in 
 the RTL expander by means of MEM_KEEP_ALIAS_SET_P: you expand first the base 
 object, then set MEM_KEEP_ALIAS_SET_P for the first non-addressable 
 component, 
 so that the alias set isn't touched any more after that.
 
 But I agree that the head comment and the interface of c_u_p_a_s are awkward, 
 to say the least, and the quadratic aspect isn't very nice either.

Ok.

  I also fail to see why component_uses_parent_alias_set invokes
  get_alias_set (is that to make 'a.c' in struct { char c; } a;
  use the alias set of 'a' instead of the alias set of 'char'?
 
 Yes, I think it's intended to stop the walk as soon as you have a type with 
 alias set 0: if 'a' has alias set 0, then 'a.c' will have it. 

That wasn't the question - I was asking if we have a struct type
with non-zero alias set, like struct { char c; int i } a; then
if we have an access like a.c does the code want to use the
alias set of 'a' (non-zero) for the access for optimization purposes?
It seems not, given your comment below...

[...]

 However, I think the handling of the parent has alias set zero is wrong in 
 your patch, you should test
 
   if (get_alias_set (TREE_TYPE (TREE_OPERAND (t, 0))) == 0)

but I fail to see how this can happen in practice?  It can happen
for ref-all bases but that case is handled separately.

But ok, I'll leave the code as-is functionality wise, we should change
semantics as followup if at all.

  The only other use besides from get_alias_set is to set
  MEM_KEEP_ALIAS_SET_P -
 
 It means that the alias set already on the MEM may not be changed afterwards, 
 even if you look into its sub-components later.
 
  whatever that is exactly for, I can
  see a single non-obvious use in store_constructor_field
  (didn't bother to lookup how callers specify the alias_set argument).
  In
  
   if (MEM_P (to_rtx) && !MEM_KEEP_ALIAS_SET_P (to_rtx)
       && DECL_NONADDRESSABLE_P (field))
{
  to_rtx = copy_rtx (to_rtx);
  MEM_KEEP_ALIAS_SET_P (to_rtx) = 1;
}
  
  we can just drop the MEM_KEEP_ALIAS_SET_P check (well, in case
  MEM_KEEP_ALIAS_SET_P is dropped).
 
 Do you mean dropped in set_mem_attributes_minus_bitpos?  No, I don't think we 
 can do that.

Yeah, I thought of dropping it completely - we know the effective
alias-set of the load/store stmt we are expanding via get_alias_set
of the expression.  I don't see why we need to invent new alias sets
in any place down the road when creating sub-accesses (either from
storing constructor components or from storing bitfield pieces).

Thanks for the comments, I'll prepare a patch to only remove the
quadraticness without changing anything else.

Richard.


Re: [PATCH, RTL] Prepare ARM build with LRA

2013-09-26 Thread Iain Sandoe
Hi Yvan,

On 24 Sep 2013, at 09:29, Yvan Roux wrote:

 On 11 September 2013 21:08, Yvan Roux yvan.r...@linaro.org wrote:
 Here is the new patch discussed in the other thread.
 
 Thanks
 Yvan
 
 2013-09-11  Yvan Roux  yvan.r...@linaro.org
Vladimir Makarov  vmaka...@redhat.com
 
* rtlanal.c (lsb_bitfield_op_p): New predicate for bitfield 
 operations
from the least significant bit.
(strip_address_mutations): Add bitfield operations handling.
(shift_code_p): New predicate for shifting operations.
(must_be_index_p): Add shifting operations handling.
(set_address_index): Likewise.

as discussed on irc, this part (applied as r202914) breaks bootstrap on 
powerpc-darwin9 (and, presumably, other BE targets) with a signed/unsigned 
compare error at rtlanal.c:5482

below is a trivial patch, which makes both parts of the test signed.  
With this, bootstrap completes on powerpc-darwin9 - however, you might want to 
check that it still does what you intended.

thanks
Iain

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 24cbcd2..0349bcc 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -5476,7 +5476,7 @@ lsb_bitfield_op_p (rtx x)
   if (GET_RTX_CLASS (GET_CODE (x)) == RTX_BITFIELD_OPS)
 {
   enum machine_mode mode = GET_MODE (XEXP (x, 0));
-  unsigned HOST_WIDE_INT len = INTVAL (XEXP (x, 1));
+  HOST_WIDE_INT len = INTVAL (XEXP (x, 1));
   HOST_WIDE_INT pos = INTVAL (XEXP (x, 2));
 
   return (pos == (BITS_BIG_ENDIAN ? GET_MODE_PRECISION (mode) - len : 0));



[PATCH] Fix PR58539

2013-09-26 Thread Richard Biener

The vectorizer does not honor the fact that debug statements
do not participate in loop-closed-SSA construction and thus
a SSA name can have outside loop uses that are not in the
loop-closed PHI node but in a debug statement.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2013-09-26  Richard Biener  rguent...@suse.de

PR tree-optimization/58539
* tree-vect-loop.c (vect_create_epilog_for_reduction): Honor
the fact that debug statements are not taking part in loop-closed
SSA construction.

* gcc.dg/torture/pr58539.c: New testcase.

Index: gcc/tree-vect-loop.c
===
*** gcc/tree-vect-loop.c(revision 202883)
--- gcc/tree-vect-loop.c(working copy)
*** vect_finalize_reduction:
*** 4411,4417 
   result.  (The reduction result is expected to have two immediate 
uses -
   one at the latch block, and one at the loop exit).  */
FOR_EACH_IMM_USE_FAST (use_p, imm_iter, scalar_dest)
! if (!flow_bb_inside_loop_p (loop, gimple_bb (USE_STMT (use_p))))
phis.safe_push (USE_STMT (use_p));
  
/* While we expect to have found an exit_phi because of loop-closed-ssa
--- 4411,4418 
   result.  (The reduction result is expected to have two immediate 
uses -
   one at the latch block, and one at the loop exit).  */
FOR_EACH_IMM_USE_FAST (use_p, imm_iter, scalar_dest)
! if (!flow_bb_inside_loop_p (loop, gimple_bb (USE_STMT (use_p)))
!     && !is_gimple_debug (USE_STMT (use_p)))
phis.safe_push (USE_STMT (use_p));
  
/* While we expect to have found an exit_phi because of loop-closed-ssa
*** vect_finalize_reduction:
*** 4541,4547 
FOR_EACH_IMM_USE_FAST (use_p, imm_iter, scalar_dest)
  {
if (!flow_bb_inside_loop_p (loop, gimple_bb (USE_STMT (use_p))))
! phis.safe_push (USE_STMT (use_p));
else
  {
if (double_reduc  gimple_code (USE_STMT (use_p)) == 
GIMPLE_PHI)
--- 4542,4551 
FOR_EACH_IMM_USE_FAST (use_p, imm_iter, scalar_dest)
  {
if (!flow_bb_inside_loop_p (loop, gimple_bb (USE_STMT (use_p))))
!   {
! if (!is_gimple_debug (USE_STMT (use_p)))
!   phis.safe_push (USE_STMT (use_p));
!   }
else
  {
if (double_reduc  gimple_code (USE_STMT (use_p)) == 
GIMPLE_PHI)
*** vect_finalize_reduction:
*** 4551,4557 
FOR_EACH_IMM_USE_FAST (phi_use_p, phi_imm_iter, phi_res)
  {
if (!flow_bb_inside_loop_p (loop,
!  gimple_bb (USE_STMT (phi_use_p))))
  phis.safe_push (USE_STMT (phi_use_p));
  }
  }
--- 4555,4562 
FOR_EACH_IMM_USE_FAST (phi_use_p, phi_imm_iter, phi_res)
  {
if (!flow_bb_inside_loop_p (loop,
!  gimple_bb (USE_STMT (phi_use_p)))
!   && !is_gimple_debug (USE_STMT (phi_use_p)))
  phis.safe_push (USE_STMT (phi_use_p));
  }
  }
Index: gcc/testsuite/gcc.dg/torture/pr58539.c
===
*** gcc/testsuite/gcc.dg/torture/pr58539.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr58539.c  (working copy)
***
*** 0 
--- 1,20 
+ /* { dg-do compile } */
+ /* { dg-options "-g" } */
+ 
+ int a, b;
+ 
+ extern void baz (int);
+ 
+ int foo (int p)
+ {
+   return p ? p : 1;
+ }
+ 
+ void bar ()
+ {
+   int *c = &a, *d = &a;
+   for (b = 0; b < 12; b++)
+ *d |= 1;
+   foo (*c);
+   baz (*c & 1);
+ }


Re: [PATCH] Improve out-of-SSA coalescing

2013-09-26 Thread Richard Biener
On Wed, 25 Sep 2013, Steven Bosscher wrote:

 On Wednesday, September 25, 2013, Richard Biener rguent...@suse.de wrote:
  On Wed, 25 Sep 2013, Richard Biener wrote:
 
 
  This loosens the restriction of only coalescing SSA names with
  the same base variable by ignoring that restriction for DECL_IGNORED_P
  base variables (ok, all of them can and should be anonymous SSA names
  now, but the code obviously hasn't caught up 100%).
 
  This improves the code generated for the loop in the testcase to
 
  fallthru
  .p2align 4,,10
  .p2align 3
  .L4:
  xorps   %xmm1, %xmm1
  cvtsi2ss%eax, %xmm1
  addl$1, %eax
  cmpl%edi, %eax
  addss   %xmm1, %xmm0
  jne .L4
 
  from
 
  jmp .L4
  .p2align 4,,10
  .p2align 3
  .L6:
  movaps  %xmm0, %xmm1
  .L4:
  xorps   %xmm0, %xmm0
  cvtsi2ss%eax, %xmm0
  addl$1, %eax
  cmpl%edi, %eax
  addss   %xmm1, %xmm0
  jne .L6
 
  avoiding the copy on the backedge and the loop entry jump.  Overall
  this is similar to what Jeff was after with his latest adjustment
  of this code.
 
  Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.
 
  For some reason it miscompiles GCC itself.  Hmm.  Cannot spot the
  obvious error yet.
 
 Try reverting the gcc_assert change. With the checking_assert there will be
 different code for checking enabled or disabled.

Nah, it was us coalescing PARM_DECLs and VAR_DECLs (and RESULT_DECLs).
Fixed by restricting this handling to VAR_DECLs only.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2013-09-25  Richard Biener  rguent...@suse.de

* tree-ssa-live.c (var_map_base_init): Handle SSA names with
DECL_IGNORED_P base VAR_DECLs like anonymous SSA names.
(loe_visit_block): Use gcc_checking_assert.
* tree-ssa-coalesce.c (create_outofssa_var_map): Use
gimple_assign_ssa_name_copy_p.
(gimple_can_coalesce_p): Adjust according to the var_map_base_init
change.

* gcc.dg/tree-ssa/coalesce-2.c: New testcase.

Index: gcc/tree-ssa-live.c
===
*** gcc/tree-ssa-live.c.orig2013-09-26 11:50:32.0 +0200
--- gcc/tree-ssa-live.c 2013-09-26 12:11:49.412951758 +0200
*** var_map_base_init (var_map map)
*** 104,110 
struct tree_int_map **slot;
unsigned baseindex;
var = partition_to_var (map, x);
!   if (SSA_NAME_VAR (var))
m->base.from = SSA_NAME_VAR (var);
else
/* This restricts what anonymous SSA names we can coalesce
--- 104,112 
struct tree_int_map **slot;
unsigned baseindex;
var = partition_to_var (map, x);
!   if (SSA_NAME_VAR (var)
!  && (!VAR_P (SSA_NAME_VAR (var))
! || !DECL_IGNORED_P (SSA_NAME_VAR (var))))
m->base.from = SSA_NAME_VAR (var);
else
/* This restricts what anonymous SSA names we can coalesce
*** loe_visit_block (tree_live_info_p live,
*** 992,1000 
edge_iterator ei;
basic_block pred_bb;
bitmap loe;
-   gcc_assert (!bitmap_bit_p (visited, bb->index));
  
bitmap_set_bit (visited, bb->index);
loe = live_on_entry (live, bb);
  
FOR_EACH_EDGE (e, ei, bb->preds)
--- 994,1003 
edge_iterator ei;
basic_block pred_bb;
bitmap loe;
  
+   gcc_checking_assert (!bitmap_bit_p (visited, bb->index));
bitmap_set_bit (visited, bb->index);
+ 
loe = live_on_entry (live, bb);
  
FOR_EACH_EDGE (e, ei, bb->preds)
Index: gcc/tree-ssa-coalesce.c
===
*** gcc/tree-ssa-coalesce.c.orig2013-09-26 11:50:32.0 +0200
--- gcc/tree-ssa-coalesce.c 2013-09-26 13:12:48.848382555 +0200
*** create_outofssa_var_map (coalesce_list_p
*** 982,991 
  {
tree lhs = gimple_assign_lhs (stmt);
tree rhs1 = gimple_assign_rhs1 (stmt);
! 
!   if (gimple_assign_copy_p (stmt)
!  && TREE_CODE (lhs) == SSA_NAME
!  && TREE_CODE (rhs1) == SSA_NAME
 && gimple_can_coalesce_p (lhs, rhs1))
  {
v1 = SSA_NAME_VERSION (lhs);
--- 982,988 
  {
tree lhs = gimple_assign_lhs (stmt);
tree rhs1 = gimple_assign_rhs1 (stmt);
!   if (gimple_assign_ssa_name_copy_p (stmt)
  && gimple_can_coalesce_p (lhs, rhs1))
  {
v1 = SSA_NAME_VERSION (lhs);
*** gimple_can_coalesce_p (tree name1, tree
*** 1349,1355 
  {
/* First check the SSA_NAME's associated DECL.  We only want to
   coalesce if they have the same DECL or both have no associated DECL.  */
!   if (SSA_NAME_VAR (name1) != SSA_NAME_VAR (name2))
  return 

Re: [PATCH, RTL] Prepare ARM build with LRA

2013-09-26 Thread Yvan Roux
(Added Eric and Richard)

Sorry for the inconvenience Iain, It's ok for my side.

Thanks,
Yvan

On 26 September 2013 13:18, Iain Sandoe i...@codesourcery.com wrote:
 Hi Yvan,

 On 24 Sep 2013, at 09:29, Yvan Roux wrote:

 On 11 September 2013 21:08, Yvan Roux yvan.r...@linaro.org wrote:
 Here is the new patch discussed in the other thread.

 Thanks
 Yvan

 2013-09-11  Yvan Roux  yvan.r...@linaro.org
Vladimir Makarov  vmaka...@redhat.com

* rtlanal.c (lsb_bitfield_op_p): New predicate for bitfield 
 operations
from the least significant bit.
(strip_address_mutations): Add bitfield operations handling.
(shift_code_p): New predicate for shifting operations.
(must_be_index_p): Add shifting operations handling.
(set_address_index): Likewise.

 as discussed on irc, this part (applied as r202914) breaks bootstrap on 
 powerpc-darwin9 (and, presumably, other BE targets) with a signed/unsigned 
 compare error at rtlanal.c:5482

 below is a trivial patch, which makes both parts of test signed.
 With this, bootstrap completes on powerpc-darwin9 - however, you might want 
 to check that it still does what you intended.

 thanks
 Iain

 diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
 index 24cbcd2..0349bcc 100644
 --- a/gcc/rtlanal.c
 +++ b/gcc/rtlanal.c
 @@ -5476,7 +5476,7 @@ lsb_bitfield_op_p (rtx x)
if (GET_RTX_CLASS (GET_CODE (x)) == RTX_BITFIELD_OPS)
  {
enum machine_mode mode = GET_MODE (XEXP (x, 0));
 -  unsigned HOST_WIDE_INT len = INTVAL (XEXP (x, 1));
 +  HOST_WIDE_INT len = INTVAL (XEXP (x, 1));
HOST_WIDE_INT pos = INTVAL (XEXP (x, 2));

return (pos == (BITS_BIG_ENDIAN ? GET_MODE_PRECISION (mode) - len : 
 0));



RE: [PATCH, PR 57748] Check for out of bounds access, Part 2

2013-09-26 Thread Bernd Edlinger
Hi,

On Thu, 26 Sep 2013 11:34:02, Eric Botcazou wrote:

 So I still think my patch does the right thing.

 The rationale is:

 = expand_expr (tem,
 (TREE_CODE (TREE_TYPE (tem)) == UNION_TYPE
  && COMPLETE_TYPE_P (TREE_TYPE (tem))
  && (TREE_CODE (TYPE_SIZE (TREE_TYPE (tem)))
 != INTEGER_CST)
  && modifier != EXPAND_STACK_PARM
 ? target : NULL_RTX),
 VOIDmode,
 EXPAND_MEMORY);

 returns the address of the structure in question,
 we can add offset, bitoffset, and access the memory
 in the right mode and alignment information is
 passed to the backend via MEM_ALIGN (op0).

 But there are conceptually no reasons to require a MEM here. Look at all the
 code just below the block. Given how hard it is to eliminate spills to memory
 in RTL once they are generated, this shouldn't be taken lightly.

 --
 Eric Botcazou

Sure, but the modifier is not meant to force something into memory,
especially when it is already in a register. Remember, we are only
talking about structures here, and we only want to access one member.

It is more the other way round:
it says: "You do not have to load the value into a register; if it is
already in memory, I'm happy."

At least, that is my understanding.

Bernd.

RE: [PATCH]Construct canonical scaled address expression in IVOPT

2013-09-26 Thread bin.cheng


 -Original Message-
 From: Richard Biener [mailto:richard.guent...@gmail.com]
 Sent: Tuesday, September 24, 2013 7:58 PM
 To: Bin.Cheng
 Cc: Bin Cheng; GCC Patches; Richard Earnshaw
 Subject: Re: [PATCH]Construct canonical scaled address expression in IVOPT
 
 On Tue, Sep 24, 2013 at 1:40 PM, Bin.Cheng amker.ch...@gmail.com
 wrote:
  On Tue, Sep 24, 2013 at 6:12 PM, Richard Biener
  richard.guent...@gmail.com wrote:
  On Tue, Sep 24, 2013 at 8:20 AM, bin.cheng bin.ch...@arm.com wrote:
 
 
  -Original Message-
 
  Or even [reg*scale] (not sure about that).  But yes, at least
  reg*scale + offset and reg*scale + reg.
 
Apparently it's infeasible to check every possibility for each
  architecture, is it ok we at least check index, then addr if
  index is failed?  By any kind of addressing modes, I mean modes
  supported by function get_address_cost,  i.e., in form of [base] +
  [off] + [var] + (reg|reg*scale).
 
  I suppose so, but it of course depends on what IVOPTs uses the answer
   for in the end.  Apparently it doesn't distinguish between the
  various cases even though TARGET_MEM_REF does support all the
  variants in question (reg * scale, reg * scale + reg, reg * scale +
  const, reg * scale + reg + const).
 
  So the better answer may be to teach the costs about the differences?
  Ideally, IVOPT should be aware whether scaling is allowed in every
  kind of addressing modes and account cost of multiplier accordingly.
  For current code, there are two scenarios here
  1) If target supports reg*scale+reg, but not reg*scale, in this case,
  IVOPT considers multiplier is not allowed in any addressing mode and
  account multiplier with high cost.  This is the problem arm having.
  2) If target supports reg*scale, but not some kind of addressing mode
  (saying reg*scale+reg), in this case, IVOPT still constructs various
  scaled addressing mode in get_address_cost and depends on address_cost
  to compute correct cost for that addressing expression.  I think this
  happens to work even IVOPT doesn't know reg*scale+reg is actually
  not supported.
 
 
  The above also builds more RTX waste which you can fix by re-using
  the
  PLUS
  by building it up-front similar to the multiplication.  You also
  miss the
  Yes, this can be fixed.
 
  opportunity to have scale == 1 denote as to whether reg1 + reg2 is
valid.
  I
  would expect that many targets support reg1 * scale +
  constant-offset but not many reg1 * scale + reg2.
  I thought scale==1 is unnecessary because the addressing mode
  degrades into reg or reg+reg.  Moreover, calls of
  multiplier_allowed_in_address_p in both get_address_cost and
 get_computation_cost_at have scale other than 1.
 
  Ok.
 
 
  So no, the helper now checks sth completely different.  What's the
  problem with arm supporting reg1 * scale?  Why shouldn't it being
  able to handle
  the
  implicit zero offset?
 
  As Richard clarified, ARM does not support scaled addressing mode
  without base register.
 
  I see.
 
  Also from the newer comments:
 
  Btw, it should be reasonably possible to compute the whole
  multiplier_allowed_in_address_p table for all primary and secondary
  archs (simply build cross-cc1) and compare the results before / after
  a patch candidate.  Querying both reg * scale and reg + reg * scale
  if the first fails sounds like a good solution to me.
  I take this as we should do minimal change by checking reg + reg *
  scale if reg * scale is failed,  right?
 
 Yes, you can share a single RTL expression for all this and I think
querying reg
 + reg * scale first makes sense (then fallback to reg * scale for
compatibility).
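
As a sketch of that querying order (illustrative only, not the patch under
review; the helper name below is hypothetical), the check could share one
RTL expression and fall back from reg + reg * scale to plain reg * scale:

static bool
scaled_address_supported_p (enum machine_mode mem_mode,
                            enum machine_mode addr_mode,
                            HOST_WIDE_INT scale, addr_space_t as)
{
  rtx reg1 = gen_raw_REG (addr_mode, LAST_VIRTUAL_REGISTER + 1);
  rtx reg2 = gen_raw_REG (addr_mode, LAST_VIRTUAL_REGISTER + 2);
  /* Build the scaled index once and reuse it for both queries.  */
  rtx mult = gen_rtx_MULT (addr_mode, reg1, gen_int_mode (scale, addr_mode));

  /* Prefer the form with a base register, which targets such as ARM
     require.  */
  if (memory_address_addr_space_p (mem_mode,
                                   gen_rtx_PLUS (addr_mode, mult, reg2), as))
    return true;

  /* Fall back to a bare scaled index for targets that accept it.  */
  return memory_address_addr_space_p (mem_mode, mult, as);
}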
 
I updated the patch as discussed.  Please review.
It bootstraps and passes regression testing on x86/x86_64, but fails
tree-ssa/scev-4.c on arm cortex-a15.
Hi Richard, I investigated the failure and found that it reveals two other
problems in IVOPT that we have discussed.
For scev-4.c like:

typedef struct {
int x;
int y;
} S;
int *a_p;
S a[1000];
f(int k)
{
int i;

for (i=k; i<1000; i+=k) {
a_p = &a[i].y;
*a_p = 100;
}
}

The iv candidates and uses are like:

use 0
  generic
  in statement a_p.0_5 = a[k_11].y;

  at position 
  type int *
  base (int *) a + ((sizetype) k_3(D) * 8 + 4)
  step (sizetype) k_3(D) * 8
  base object (void *) a
  related candidates 
use 1
  address
  in statement MEM[(int *)a][k_11].y = 100;

  at position MEM[(int *)a][k_11].y
  type int *
  base MEM[(int *)a][k_3(D)].y
  step (sizetype) k_3(D) * 8
  base object (void *) a
  related candidates

candidate 4 (important)
  depends on 1
  original biv
  type int
  base k_3(D)
  step k_3(D)

candidate 7
  depends on 1
  var_before ivtmp.12
  var_after ivtmp.12
  incremented before exit test
  type unsigned int
  base (unsigned int) ((int *) a + (sizetype) k_3(D) * 8)
  step (unsigned int) ((sizetype) k_3(D) * 8)
  base object (void *) a
Candidate 7 is related to use 0

Problem 1) When 

RE: [PATCH]Construct canonical scaled address expression in IVOPT

2013-09-26 Thread bin.cheng
Sorry for missing the patch.

Thanks.
bin

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of bin.cheng
 Sent: Thursday, September 26, 2013 8:05 PM
 To: 'Richard Biener'; Bin.Cheng
 Cc: GCC Patches; Richard Earnshaw
 Subject: RE: [PATCH]Construct canonical scaled address expression in IVOPT
 
 
 
  -Original Message-
  From: Richard Biener [mailto:richard.guent...@gmail.com]
  Sent: Tuesday, September 24, 2013 7:58 PM
  To: Bin.Cheng
  Cc: Bin Cheng; GCC Patches; Richard Earnshaw
  Subject: Re: [PATCH]Construct canonical scaled address expression in
  IVOPT
 
  On Tue, Sep 24, 2013 at 1:40 PM, Bin.Cheng amker.ch...@gmail.com
  wrote:
   On Tue, Sep 24, 2013 at 6:12 PM, Richard Biener
   richard.guent...@gmail.com wrote:
   On Tue, Sep 24, 2013 at 8:20 AM, bin.cheng bin.ch...@arm.com
 wrote:
  
  
   -Original Message-
  
   Or even [reg*scale] (not sure about that).  But yes, at least
   reg*scale + offset and reg*scale + reg.
  
 Apparently it's infeasible to check every possibility for each
   architecture, is it ok we at least check index, then addr if
   index is failed?  By any kind of addressing modes, I mean
   modes supported by function get_address_cost,  i.e., in form of
   [base] + [off] + [var] + (reg|reg*scale).
  
   I suppose so, but it of course depends on what IVOPTs uses the
    answer for in the end.  Apparently it doesn't distinguish between
   the various cases even though TARGET_MEM_REF does support all the
   variants in question (reg * scale, reg * scale + reg, reg * scale +
   const, reg * scale + reg + const).
  
   So the better answer may be to teach the costs about the differences?
   Ideally, IVOPT should be aware whether scaling is allowed in every
   kind of addressing modes and account cost of multiplier accordingly.
   For current code, there are two scenarios here
   1) If target supports reg*scale+reg, but not reg*scale, in this
   case, IVOPT considers multiplier is not allowed in any addressing
   mode and account multiplier with high cost.  This is the problem arm
 having.
   2) If target supports reg*scale, but not some kind of addressing
   mode (saying reg*scale+reg), in this case, IVOPT still constructs
   various scaled addressing mode in get_address_cost and depends on
   address_cost to compute correct cost for that addressing expression.
   I think this happens to work even IVOPT doesn't know reg*scale+reg
   is actually not supported.
  
  
   The above also builds more RTX waste which you can fix by
   re-using the
   PLUS
   by building it up-front similar to the multiplication.  You also
   miss the
   Yes, this can be fixed.
  
   opportunity to have scale == 1 denote as to whether reg1 + reg2
   is
 valid.
   I
   would expect that many targets support reg1 * scale +
   constant-offset but not many reg1 * scale + reg2.
   I thought scale==1 is unnecessary because the addressing mode
   degrades into reg or reg+reg.  Moreover, calls of
   multiplier_allowed_in_address_p in both get_address_cost and
  get_computation_cost_at have scale other than 1.
  
   Ok.
  
  
   So no, the helper now checks sth completely different.  What's
   the problem with arm supporting reg1 * scale?  Why shouldn't it
   being able to handle
   the
   implicit zero offset?
  
   As Richard clarified, ARM does not support scaled addressing mode
   without base register.
  
   I see.
  
   Also from the newer comments:
  
   Btw, it should be reasonably possible to compute the whole
   multiplier_allowed_in_address_p table for all primary and secondary
   archs (simply build cross-cc1) and compare the results before /
   after a patch candidate.  Querying both reg * scale and reg + reg *
   scale if the first fails sounds like a good solution to me.
   I take this as we should do minimal change by checking reg + reg *
   scale if reg * scale is failed,  right?
 
  Yes, you can share a single RTL expression for all this and I think
 querying reg
  + reg * scale first makes sense (then fallback to reg * scale for
 compatibility).
 
 I updated the patch as discussed.  Please review.
 It bootstraps and passes regression test on x86/x86_64, but fails tree-
 ssa/scev-4.c on arm cortex-a15.
 Hi Richard, I investigated the failure and found out it reveals two other
 problems in IVOPT we have discussed.
 For scev-4.c like:
 
 typedef struct {
   int x;
   int y;
 } S;
 int *a_p;
 S a[1000];
 f(int k)
 {
   int i;
 
    for (i=k; i<1000; i+=k) {
    a_p = &a[i].y;
   *a_p = 100;
 }
 }
 
 The iv candidates and uses are like:
 
 use 0
   generic
   in statement a_p.0_5 = a[k_11].y;
 
   at position
   type int *
   base (int *) a + ((sizetype) k_3(D) * 8 + 4)
   step (sizetype) k_3(D) * 8
   base object (void *) a
   related candidates
 use 1
   address
   in statement MEM[(int *)a][k_11].y = 100;
 
   at position MEM[(int *)a][k_11].y
   type int *
   base 

Re: [PATCH, ARM] Fix PR target/58423

2013-09-26 Thread Richard Earnshaw
On 26/09/13 09:50, Zhenqiang Chen wrote:
 
 
 -Original Message-
 From: Richard Earnshaw
 Sent: Monday, September 23, 2013 11:11 PM
 To: Zhenqiang Chen
 Cc: Yufeng Zhang; gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH, ARM] Fix PR target/58423

 On 23/09/13 09:11, Zhenqiang Chen wrote:


 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Yufeng Zhang
 Sent: Monday, September 23, 2013 3:16 PM
 To: gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH, ARM] Fix PR target/58423

 On 09/23/13 07:58, Zhenqiang Chen wrote:
 --- clean-trunk/gcc/config/arm/arm.c2013-09-17 14:29:45.632457018
 +0800
 +++ pr58423/gcc/config/arm/arm.c2013-09-18 14:34:24.708892318
 +0800
 @@ -17645,8 +17645,8 @@
 mem = gen_frame_mem (DImode, stack_pointer_rtx);

   tmp = gen_rtx_SET (DImode, gen_rtx_REG (DImode, j), mem);
 -RTX_FRAME_RELATED_P (tmp) = 1;
   tmp = emit_insn (tmp);
 +RTX_FRAME_RELATED_P (tmp) = 1;

 The indent doesn't seem right.

 Thanks for the comments. My gmail server changes tab to 4 spaces.

  I recreated the patch to remove the tabs and used an attachment to make
  sure there are no other automatic changes.


 Please fix your mailer.  The GNU coding standard uses hard tabs and
 gratuitous changes to the white space are a pain to deal with.
 
 Thanks for the comments. Patch is updated to use hard tabs. 
  
 -Zhenqiang
 

OK.

R.




Re: [PATCH][RFC] Remove quadratic loop with component_uses_parent_alias_set

2013-09-26 Thread Richard Biener
On Thu, 26 Sep 2013, Richard Biener wrote:

 On Thu, 26 Sep 2013, Eric Botcazou wrote:
 
   Eric, the current way component_uses_parent_alias_set is called from
   get_alias_set causes the reference tree chain to be walked O(n^2)
   in case there is any DECL_NONADDRESSABLE_P or TYPE_NONALIASED_COMPONENT
   reference in the tree chain.  Also the comment above
   component_uses_parent_alias_set doesn't seem to match behavior
   in get_alias_set.  get_alias_set ends up using exactly the type of
   the parent
   of the DECL_NONADDRESSABLE_P / TYPE_NONALIASED_COMPONENT reference
   instead of the alias set provided by the object at the heart of T
  
  That's what the object at the heart of T means I think: given the loop in 
  get_alias_set (or reference_alias_ptr_type_1 now), we end up retrieving the 
  parent of the outermost non-addressable component in the component chain 
  (outermost in the type layout sense), which is what is wanted: when you go 
  down the component chain from the base object to find the alias set, you 
  need 
  to stop at the first non-addressable component.  That's what is achieved in 
  the RTL expander by means of MEM_KEEP_ALIAS_SET_P: you expand first the 
  base 
  object, then set MEM_KEEP_ALIAS_SET_P for the first non-addressable 
  component, 
  so that the alias set isn't touched any more after that.
  
  But I agree that the head comment and the interface of c_u_p_a_s are 
  awkward, 
  to say the least, and the quadratic aspect isn't very nice either.
 
 Ok.
 
   I also fail to see why component_uses_parent_alias_set invokes
   get_alias_set (is that to make 'a.c' in struct { char c; } a;
   use the alias set of 'a' instead of the alias set of 'char'?
  
  Yes, I think it's intended to stop the walk as soon as you have a type with 
  alias set 0: if 'a' has alias set 0, then 'a.c' will have it. 
 
 That wasn't the question - I was asking if we have a struct type
 with non-zero alias set, like struct { char c; int i } a; then
 if we have an access like a.c does the code want to use the
 alias set of 'a' (non-zero) for the access for optimization purposes?
 It seems not, given your comment below...
 
 [...]
 
  However, I think the handling of the parent has alias set zero is wrong 
  in 
  your patch, you should test
  
if (get_alias_set (TREE_TYPE (TREE_OPERAND (t, 0))) == 0)
 
 but I fail to see how this can happen in practice?  It can happen
 for ref-all bases but that case is handled separately.
 
 But ok, I'll leave the code as-is functionality wise, we should change
 semantics as followup if at all.
 
   The only other use besides from get_alias_set is to set
   MEM_KEEP_ALIAS_SET_P -
  
  It means that the alias set already on the MEM may not be changed 
  afterwards, 
  even if you look into its sub-components later.
  
   whatever that is exactly for, I can
   see a single non-obvious use in store_constructor_field
   (didn't bother to lookup how callers specify the alias_set argument).
   In
   
   if (MEM_P (to_rtx)  !MEM_KEEP_ALIAS_SET_P (to_rtx)
DECL_NONADDRESSABLE_P (field))
 {
   to_rtx = copy_rtx (to_rtx);
   MEM_KEEP_ALIAS_SET_P (to_rtx) = 1;
 }
   
   we can just drop the MEM_KEEP_ALIAS_SET_P check (well, in case
   MEM_KEEP_ALIAS_SET_P is dropped).
  
  Do you mean dropped in set_mem_attributes_minus_bitpos?  No, I don't think 
  we 
  can do that.
 
 Yeah, I thought of dropping it completely - we know the effective
 alias-set of the load/store stmt we are expanding via get_alias_set
 of the expression.  I don't see why we need to invent new alias sets
 in any place down the road when creating sub-accesses (either from
 storing constructor components or from storing bitfield pieces).
 
 Thanks for the comments, I'll prepare a patch to only remove the
 quadraticness without changing anything else.

Like the following.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2013-09-26  Richard Biener  rguent...@suse.de

* alias.h (component_uses_parent_alias_set): Rename to ...
(component_uses_parent_alias_set_from): ... this.
* alias.c (component_uses_parent_alias_set): Rename to ...
(component_uses_parent_alias_set_from): ... this and return
the desired parent.
(reference_alias_ptr_type_1): Use the result from
component_uses_parent_alias_set_from instead of stripping
components one at a time.
* emit-rtl.c (set_mem_attributes_minus_bitpos): Adjust.

Index: gcc/alias.c
===
--- gcc/alias.c (revision 202941)
+++ gcc/alias.c (working copy)
@@ -500,51 +500,58 @@ objects_must_conflict_p (tree t1, tree t
   return alias_sets_must_conflict_p (set1, set2);
 }
 
-/* Return true if all nested component references handled by
-   get_inner_reference in T are such that we should use the alias set
-   provided by the object 

[PATCH, committed] Update t-rs6000 for automatic dependencies

2013-09-26 Thread David Edelsohn
All of the rs6000-specific dependencies are included and discovered
automatically.

Bootstrapped on powerpc-ibm-aix7.1.0.0.

David

* config/rs6000/t-rs6000 (rs6000.o): Remove.
(rs6000-c.o): Use COMPILE and POSTCOMPILE.

Index: config/rs6000/t-rs6000
===
--- config/rs6000/t-rs6000  (revision 202942)
+++ config/rs6000/t-rs6000  (working copy)
@@ -20,22 +20,10 @@

 TM_H += $(srcdir)/config/rs6000/rs6000-builtin.def

-rs6000.o: $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
-  $(RTL_H) $(REGS_H) hard-reg-set.h \
-  real.h insn-config.h conditions.h insn-attr.h flags.h $(RECOG_H) \
-  $(OBSTACK_H) $(TREE_H) $(EXPR_H) $(OPTABS_H) except.h function.h \
-  output.h dbxout.h $(BASIC_BLOCK_H) toplev.h $(GGC_H) $(HASHTAB_H) \
-  $(TM_P_H) $(TARGET_H) $(TARGET_DEF_H) langhooks.h reload.h gt-rs6000.h \
-  cfgloop.h $(OPTS_H) $(COMMON_TARGET_H) dumpfile.h \
-  $(srcdir)/config/rs6000/rs6000-cpus.def
+rs6000-c.o: $(srcdir)/config/rs6000/rs6000-c.c
+   $(COMPILE) $<
+   $(POSTCOMPILE)

-rs6000-c.o: $(srcdir)/config/rs6000/rs6000-c.c \
-$(srcdir)/config/rs6000/rs6000-protos.h \
-$(CONFIG_H) $(SYSTEM_H) $(TREE_H) $(CPPLIB_H) \
-$(TM_P_H) $(C_PRAGMA_H) errors.h coretypes.h $(TM_H)
-   $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
-   $(srcdir)/config/rs6000/rs6000-c.c
-
 $(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh \
   $(srcdir)/config/rs6000/rs6000-cpus.def
$(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000  \


Re: [PATCH]Construct canonical scaled address expression in IVOPT

2013-09-26 Thread Richard Biener
On Thu, Sep 26, 2013 at 2:10 PM, bin.cheng bin.ch...@arm.com wrote:
 Sorry for missing the patch.

 Thanks.
 bin

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of bin.cheng
 Sent: Thursday, September 26, 2013 8:05 PM
 To: 'Richard Biener'; Bin.Cheng
 Cc: GCC Patches; Richard Earnshaw
 Subject: RE: [PATCH]Construct canonical scaled address expression in IVOPT



  -Original Message-
  From: Richard Biener [mailto:richard.guent...@gmail.com]
  Sent: Tuesday, September 24, 2013 7:58 PM
  To: Bin.Cheng
  Cc: Bin Cheng; GCC Patches; Richard Earnshaw
  Subject: Re: [PATCH]Construct canonical scaled address expression in
  IVOPT
 
  On Tue, Sep 24, 2013 at 1:40 PM, Bin.Cheng amker.ch...@gmail.com
  wrote:
   On Tue, Sep 24, 2013 at 6:12 PM, Richard Biener
   richard.guent...@gmail.com wrote:
   On Tue, Sep 24, 2013 at 8:20 AM, bin.cheng bin.ch...@arm.com
 wrote:
  
  
   -Original Message-
  
   Or even [reg*scale] (not sure about that).  But yes, at least
   reg*scale + offset and reg*scale + reg.
  
 Apparently it's infeasible to check every possibility for each
   architecture, is it ok we at least check index, then addr if
   index is failed?  By any kind of addressing modes, I mean
   modes supported by function get_address_cost,  i.e., in form of
   [base] + [off] + [var] + (reg|reg*scale).
  
   I suppose so, but it of course depends on what IVOPTs uses the
    answer for in the end.  Apparently it doesn't distinguish between
   the various cases even though TARGET_MEM_REF does support all the
   variants in question (reg * scale, reg * scale + reg, reg * scale +
   const, reg * scale + reg + const).
  
   So the better answer may be to teach the costs about the differences?
   Ideally, IVOPT should be aware whether scaling is allowed in every
   kind of addressing modes and account cost of multiplier accordingly.
   For current code, there are two scenarios here
   1) If target supports reg*scale+reg, but not reg*scale, in this
   case, IVOPT considers multiplier is not allowed in any addressing
   mode and account multiplier with high cost.  This is the problem arm
 having.
   2) If target supports reg*scale, but not some kind of addressing
   mode (saying reg*scale+reg), in this case, IVOPT still constructs
   various scaled addressing mode in get_address_cost and depends on
   address_cost to compute correct cost for that addressing expression.
   I think this happens to work even IVOPT doesn't know reg*scale+reg
   is actually not supported.
  
  
   The above also builds more RTX waste which you can fix by
   re-using the
   PLUS
   by building it up-front similar to the multiplication.  You also
   miss the
   Yes, this can be fixed.
  
   opportunity to have scale == 1 denote as to whether reg1 + reg2
   is
 valid.
   I
   would expect that many targets support reg1 * scale +
   constant-offset but not many reg1 * scale + reg2.
   I thought scale==1 is unnecessary because the addressing mode
   degrades into reg or reg+reg.  Moreover, calls of
   multiplier_allowed_in_address_p in both get_address_cost and
  get_computation_cost_at have scale other than 1.
  
   Ok.
  
  
   So no, the helper now checks sth completely different.  What's
   the problem with arm supporting reg1 * scale?  Why shouldn't it
   being able to handle
   the
   implicit zero offset?
  
   As Richard clarified, ARM does not support scaled addressing mode
   without base register.
  
   I see.
  
   Also from the newer comments:
  
   Btw, it should be reasonably possible to compute the whole
   multiplier_allowed_in_address_p table for all primary and secondary
   archs (simply build cross-cc1) and compare the results before /
   after a patch candidate.  Querying both reg * scale and reg + reg *
   scale if the first fails sounds like a good solution to me.
   I take this as we should do minimal change by checking reg + reg *
   scale if reg * scale is failed,  right?
 
  Yes, you can share a single RTL expression for all this and I think
 querying reg
  + reg * scale first makes sense (then fallback to reg * scale for
 compatibility).
 
 I updated the patch as discussed.  Please review.
 It bootstraps and passes regression test on x86/x86_64, but fails tree-
 ssa/scev-4.c on arm cortex-a15.
 Hi Richard, I investigated the failure and found out it reveals two other
 problems in IVOPT we have discussed.
 For scev-4.c like:

 typedef struct {
   int x;
   int y;
 } S;
 int *a_p;
 S a[1000];
 f(int k)
 {
   int i;

    for (i=k; i<1000; i+=k) {
    a_p = &a[i].y;
   *a_p = 100;
 }
 }

 The iv candidates and uses are like:

 use 0
   generic
   in statement a_p.0_5 = a[k_11].y;

   at position
   type int *
   base (int *) a + ((sizetype) k_3(D) * 8 + 4)
   step (sizetype) k_3(D) * 8
   base object (void *) a
   related candidates
 use 1
   address
   in statement MEM[(int *)a][k_11].y = 100;

  

[PATCH] Fix libgfortran cross compile configury w.r.t newlib

2013-09-26 Thread Marcus Shawcroft

This patch:

http://gcc.gnu.org/ml/fortran/2013-06/msg00038.html

... breaks libgfortran configure against newlib.

The solution implemented hard-wires an assumption in 
libgfortran/configure.ac that newlib provides strtold().  This 
assumption is not correct: newlib only provides an implementation of 
strtold on systems where sizeof(long double) == sizeof(double).  This 
manifests as a regression when trying to build a cross aarch64 bare 
metal toolchain with Fortran enabled.


The attached patch tightens the condition introduced in the earlier 
patch such that we continue to call AC_CHECK_FUNCS_ONCE unless we know 
that link tests are not possible, in which case we fall back to the 
existing broken assumption.


I'm in two minds about whether further sticky tape of this form is the 
right approach or whether the original patch should be reverted until a 
proper fix that does not regress the tree can be found.


Thoughts?

2013-09-26  Marcus Shawcroft  marcus.shawcr...@arm.com

* configure.ac (AC_CHECK_FUNCS_ONCE): Make if statement
dependent on gcc_no_link.

Cheers
/Marcus

diff --git a/libgfortran/configure.ac b/libgfortran/configure.ac
index 4609eba..411ab38 100644
--- a/libgfortran/configure.ac
+++ b/libgfortran/configure.ac
@@ -261,7 +261,7 @@ GCC_HEADER_STDINT(gstdint.h)
 AC_CHECK_MEMBERS([struct stat.st_blksize, struct stat.st_blocks, struct 
stat.st_rdev])
 
 # Check for library functions.
-if test x${with_newlib} = xyes; then
+if test x${with_newlib} = xyes -a x${gcc_no_link} = xyes ; then
# We are being configured with a cross compiler.  AC_REPLACE_FUNCS
# may not work correctly, because the compiler may not be able to
# link executables.

[PATCH] Fix PR58459

2013-09-26 Thread Richard Biener

After much thinking I settled on removing the restriction from
forwprop that avoids propagating non-invariant addresses into
loops.  As forwprop is mainly seen as a way to canonicalize the IL
and simplify it for further passes this seems like the correct thing
to do.  LIM and IVOPTs should undo any harm this causes, otherwise
I'll find another way to fix the fallout.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2013-09-26  Richard Biener  rguent...@suse.de

PR tree-optimization/58459
* tree-ssa-forwprop.c (forward_propagate_addr_expr): Remove
restriction not propagating into loops.

* gcc.dg/tree-ssa/ssa-pre-31.c: New testcase.

Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c
===
*** gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c  (revision 0)
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c  (working copy)
***
*** 0 
--- 1,47 
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -fdump-tree-pre" } */
+ 
+ typedef struct {
+ unsigned int key;
+ } S;
+ typedef struct s1  {
+ unsigned int key;
+ unsigned int bits;
+ struct s1 *left, *right;
+ }S1;
+ extern S a[1024];
+ static inline int bar( S* p, S1* n )
+ {
+   S1 *curr;
+   S1 *next;
+ 
+   if ( n->left == n )
+ return (int)(p->key == n->key);
+ 
+   curr = n;
+   next = n->left;
+ 
+   while (curr->bits < next->bits ) {
+   curr = next;
+   if (p->key & (1 << curr->bits))
+   next = curr->right;
+   else
+   next = curr->left;
+   }
+ 
+   return (int)(p->key == next->key);
+ 
+ }
+ 
+ int foo (S1 *root, int N)
+ {
+   volatile int r;
+   int i,j;
+   for (i=0; i<N; i++)
+ for (j=0;j<1024; j++)
+   r = bar(&a[j], root);
+   return 0;
+ } 
+ 
+ /* { dg-final { scan-tree-dump-times "key" 4 "pre" } } */
+ /* { dg-final { cleanup-tree-dump "pre" } } */
Index: gcc/tree-ssa-forwprop.c
===
*** gcc/tree-ssa-forwprop.c (revision 202941)
--- gcc/tree-ssa-forwprop.c (working copy)
*** forward_propagate_addr_expr_1 (tree name
*** 1001,1007 
  static bool
  forward_propagate_addr_expr (tree name, tree rhs)
  {
-   int stmt_loop_depth = bb_loop_depth (gimple_bb (SSA_NAME_DEF_STMT (name)));
imm_use_iterator iter;
gimple use_stmt;
bool all = true;
--- 1001,1006 
*** forward_propagate_addr_expr (tree name,
*** 1014,1050 
  
/* If the use is not in a simple assignment statement, then
 there is nothing we can do.  */
!   if (gimple_code (use_stmt) != GIMPLE_ASSIGN)
{
  if (!is_gimple_debug (use_stmt))
all = false;
  continue;
}
  
!   /* If the use is in a deeper loop nest, then we do not want
!to propagate non-invariant ADDR_EXPRs into the loop as that
!is likely adding expression evaluations into the loop.  */
!   if (bb_loop_depth (gimple_bb (use_stmt)) > stmt_loop_depth
!	  && !is_gimple_min_invariant (rhs))
{
! all = false;
! continue;
}
! 
!   {
!   gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
!   result = forward_propagate_addr_expr_1 (name, rhs, gsi,
!   single_use_p);
!   /* If the use has moved to a different statement adjust
!  the update machinery for the old statement too.  */
!   if (use_stmt != gsi_stmt (gsi))
! {
!   update_stmt (use_stmt);
!   use_stmt = gsi_stmt (gsi);
! }
! 
!   update_stmt (use_stmt);
!   }
all = result;
  
/* Remove intermediate now unused copy and conversion chains.  */
--- 1013,1036 
  
/* If the use is not in a simple assignment statement, then
 there is nothing we can do.  */
!   if (!is_gimple_assign (use_stmt))
{
  if (!is_gimple_debug (use_stmt))
all = false;
  continue;
}
  
!   gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
!   result = forward_propagate_addr_expr_1 (name, rhs, gsi,
! single_use_p);
!   /* If the use has moved to a different statement adjust
!the update machinery for the old statement too.  */
!   if (use_stmt != gsi_stmt (gsi))
{
! update_stmt (use_stmt);
! use_stmt = gsi_stmt (gsi);
}
!   update_stmt (use_stmt);
all = result;
  
/* Remove intermediate now unused copy and conversion chains.  */


Re: [PATCH] Refactor type handling in get_alias_set, fix PR58513

2013-09-26 Thread James Greenhalgh

On Tue, Sep 24, 2013 at 12:00:48PM +0100, Richard Biener wrote:
 2013-09-24  Richard Biener  rguent...@suse.de

 * g++.dg/vect/pr58513.cc: New testcase.


Hi,

This testcase fails for arm and aarch64 targets when using -fPIC.
As discussed on IRC this can be fixed by making op static.

After asking Richard Earnshaw to explain -fPIC and dynamic linking to
me, I've committed this (now obvious) patch as r202947.

Thanks,
James

---
gcc/testsuite/

2013-09-26  James Greenhalgh  james.greenha...@arm.com

* g++.dg/vect/pr58513.cc (op): Make static.

diff --git a/gcc/testsuite/g++.dg/vect/pr58513.cc b/gcc/testsuite/g++.dg/vect/pr58513.cc
index 2563047..08a175c 100644
--- a/gcc/testsuite/g++.dg/vect/pr58513.cc
+++ b/gcc/testsuite/g++.dg/vect/pr58513.cc
@@ -1,7 +1,7 @@
 // { dg-do compile }
 // { dg-require-effective-target vect_int }
 
-int op (const int x, const int y) { return x + y; }
+static int op (const int x, const int y) { return x + y; }
 
 void foo(int* a)
 {

Re: [PATCH] Trivial cleanup

2013-09-26 Thread Michael Matz
Hi,

On Wed, 25 Sep 2013, Jeff Law wrote:

   I was going to bring it up at some point too.  My preference is
   strongly to simply eliminate the space on methods...
  Which wouldn't be so weird: in the libstdc++-v3 code we do it all the time.
 Yea.  I actually reviewed the libstdc++ guidelines to see where they differed
 from GNU's C guidelines.
 
 I'm strongly in favor of dropping the horizontal whitespace between the 
 method name and its open paren when the result is then dereferenced.  
 ie foo.last()->e rather than foo.last ()->e.

I'd prefer to not write in this style at all, like Jakub.  If we must 
absolutely have it, then I agree that the space before _empty_ parentheses 
are ugly if followed by references.  I.e. I'd like to see spaces before 
parens as is customary, except in one case: empty parens in the middle of 
expressions (which don't happen very often right now in GCC, and hence 
wouldn't introduce a coding style confusion):

do.this ();
give.that()->flag;
get.list (one)->clear ();

I'd prefer to not have further references to return values be applied, 
though (as in, the parentheses should be the end of statement), which 
would avoid the topic (at the expense of having to invent names for 
those temporaries, or to write trivial wrapper methods contracting several 
method calls).


Ciao,
Michael.


[PATCH] Make more SSA names anonymous after into-SSA

2013-09-26 Thread Richard Biener

It turns out we are pretty conservative on the set of SSA names
we make anonymous.  In particular all named temporaries
created by gimplification and lowering passes were left alone.
The following patch makes all SSA names anonymous that we will
not generate debug information for and transitions the name
over to it.

Bootstrap and regtest running on x86_64_unknown-linux-gnu.

Richard.

2013-09-26  Richard Biener  rguent...@suse.de

* tree-into-ssa.c (rewrite_into_ssa): Make more SSA names
to anonymous.

Index: gcc/tree-into-ssa.c
===
*** gcc/tree-into-ssa.c (revision 202941)
--- gcc/tree-into-ssa.c (working copy)
*** rewrite_into_ssa (void)
*** 2366,2375 
if (decl
    && TREE_CODE (decl) == VAR_DECL
    && !VAR_DECL_IS_VIRTUAL_OPERAND (decl)
!   && DECL_ARTIFICIAL (decl)
!   && DECL_IGNORED_P (decl)
!   && !DECL_NAME (decl))
!   SET_SSA_NAME_VAR_OR_IDENTIFIER (name, NULL_TREE);
  }
  
return 0;
--- 2366,2373 
if (decl
    && TREE_CODE (decl) == VAR_DECL
    && !VAR_DECL_IS_VIRTUAL_OPERAND (decl)
!   && DECL_IGNORED_P (decl))
!   SET_SSA_NAME_VAR_OR_IDENTIFIER (name, DECL_NAME (decl));
  }
  
return 0;


Re: [PATCH] Trivial cleanup

2013-09-26 Thread Richard Biener
On Thu, Sep 26, 2013 at 4:15 PM, Michael Matz m...@suse.de wrote:
 Hi,

 On Wed, 25 Sep 2013, Jeff Law wrote:

   I was going to bring it up at some point too.  My preference is
   strongly to simply eliminate the space on methods...
  Which wouldn't be so weird: in the libstdc++-v3 code we do it all the time.
 Yea.  I actually reviewed the libstdc++ guidelines to see where they differed
 from GNU's C guidelines.

 I'm strongly in favor of dropping the horizontal whitespace between the
 method name and its open paren when the result is then dereferenced.
  ie foo.last()->e rather than foo.last ()->e.

 I'd prefer to not write in this style at all, like Jakub.  If we must
 absolutely have it, then I agree that the space before _empty_ parentheses
 are ugly if followed by references.  I.e. I'd like to see spaces before
 parens as is customary, except in one case: empty parens in the middle of
 expressions (which don't happen very often right now in GCC, and hence
 wouldn't introduce a coding style confusion):

 do.this ();
 give.that()->flag;
 get.list (one)->clear ();

 I'd prefer to not have further references to return values be applied,
 though (as in, the parentheses should be the end of statement), which
 would avoid the topic (at the expense of having to invent names for
 those temporaries, or to write trivial wrapper methods contracting several
 method calls).

Seconded, even give.that()->flag; is ugly.

Richard.


Re: cost model patch

2013-09-26 Thread Richard Biener
On Thu, Sep 26, 2013 at 1:10 AM, Xinliang David Li davi...@google.com wrote:
 I took the liberty to pick up Richard's original fvect-cost-model
 patch and made some modification.

 What has not changed:
 1) option -ftree-vect-loop-version is removed;
 2) three cost models are introduced: cheap, dynamic, and unlimited;
 3) unless explicitly specified, cheap model is the default at O2 (e.g.
 when -ftree-loop-vectorize is used with -O2), and dynamic mode is the
 default for O3 and FDO
 4) alignment based versioning is disabled with cheap model.

 What has changed:
 1) peeling is also disabled with cheap model;
 2) alias check condition limit is reduced with cheap model, but not
 completely suppressed. Runtime alias check is a pretty important
 enabler.
 3) tree if conversion changes are not included.

 Does this patch look reasonable?

In principle yes.  Note that it changes the behavior of -O2 -ftree-vectorize
as -ftree-vectorize does not imply changing the default cost model.  I am
fine with that, but eventually this will have some testsuite fallout.  This
reorg would also need documenting in changes.html to make people
aware of this.

With completely disabling alignment peeling and alignment versioning
you cut out targets that have no way of performing unaligned accesses.
From looking at vect_no_align these are mips, sparc, ia64 and some arm.
A compromise for them would be to allow peeling a single iteration
and some alignment checks (like up to two?).
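
(For the record, the kind of guard being discussed amounts to roughly the
following; the names and the 16-byte width are made up for illustration,
this is not the vectorizer's actual output:

#include <stdint.h>

void
f (float *a, float *b, int n)
{
  /* Versioning: take the fast path only when both pointers turn out to
     be 16-byte aligned at run time; in the real transformation the fast
     path would use aligned vector loads/stores.  */
  if ((((uintptr_t) a | (uintptr_t) b) & 15) == 0)
    for (int i = 0; i < n; i++)
      a[i] += b[i];
  else
    for (int i = 0; i < n; i++)
      a[i] += b[i];
}

so allowing one or two such checks keeps those targets vectorizable at a
small code-size cost.)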

Reducing the number of allowed alias-checks is ok, but I'd reduce it
more than to 6 (was that an arbitrary number or is that the result of
some benchmarking?)

I suppose all of the params could use some benchmarking to select
a sweet spot in code size vs. runtime.

I suppose the patch is ok as-is (if it actually works) if you provide
a changelog and propose an entry for changes.html.  We can
tune the params for the cheap model as followup.

Thanks for picking this up,
Richard.

 thanks,

 David


Re: [PATCH] Fix PR58459

2013-09-26 Thread Jakub Jelinek
On Thu, Sep 26, 2013 at 03:54:19PM +0200, Richard Biener wrote:
 
 After much thinking I settled on removing the restriction from
 forwprop that avoids propagating non-invariant addresses into
 loops.  As forwprop is mainly seen as a way to canonicalize the IL
 and simplify it for further passes this seems like the correct thing
 to do.  LIM and IVOPTs should undo any harm this causes, otherwise
 I'll find another way to fix the fallout.

But, aren't some forwprop passes run after LIM and IVOPTs, so while
LIM and IVOPTs undo that harm, forwprop4 pass adds it there again?

I guess it is just fine to remove the restriction for forwprop{1,2,3},
but for forwprop4 I'm not that sure.  You are then hoping RTL optimizations
will undo the harm, but if they don't, ...

Jakub


Re: [PATCH] Fix PR58459

2013-09-26 Thread Richard Biener
On Thu, 26 Sep 2013, Jakub Jelinek wrote:

 On Thu, Sep 26, 2013 at 03:54:19PM +0200, Richard Biener wrote:
  
  After much thinking I settled on removing the restriction from
  forwprop that avoids propagating non-invariant addresses into
  loops.  As forwprop is mainly seen as a way to canonicalize the IL
  and simplify it for further passes this seems like the correct thing
  to do.  LIM and IVOPTs should undo any harm this causes, otherwise
  I'll find another way to fix the fallout.
 
 But, aren't some forwprop passes run after LIM and IVOPTs, so while
 LIM and IVOPTs undo that harm, forwprop4 pass adds it there again?

LIM would move a memory reference, forwprop nowadays hopefully only
propagates these beasts into dereferences.  IVOPTs only works on
memory references as well, exposing lowered address computation,
thus a different kind of addresses.

 I guess it is just fine to remove the restriction for forwprop{1,2,3},
 but for forwprop4 I'm not that sure.  You are then hoping RTL optimizations
 will undo the harm, but if they don't, ...

... then we find a solution to address this.  No, I don't have my
bets on RTL optimizers fixing this but instead on no fallout actually
happening.

Richard.


Re: [PATCH] Fix libgfortran cross compile configury w.r.t newlib

2013-09-26 Thread Steve Ellcey
On Thu, 2013-09-26 at 14:47 +0100, Marcus Shawcroft wrote:

 I'm in two minds about whether further sticky tape of this form is the 
 right approach or whether the original patch should be reverted until a 
 proper fix that does not regress the tree can be found.
 
 Thoughts?
 
 2013-09-26  Marcus Shawcroft  marcus.shawcr...@arm.com
 
  * configure.ac (AC_CHECK_FUNCS_ONCE): Make if statement
  dependent on gcc_no_link.
 
 Cheers
 /Marcus

I think this patch is a good fix.  I (obviously) don't favor reverting
the previous patch because that would re-break the Fortran build on MIPS
bare-metal cross compilers (or any compiler where a linker script is
needed).  Any 'proper' fix should address libstdc++, libjava, and other
libraries as well as libgfortran and I don't know what a cleaner fix
would be.  In fact I would say the other libraries should consider using
this fix.  The only reason they don't run into this problem too is that
they don't depend on any long double functions or any other functions
that are optionally built by newlib.

I will test this patch on my targets and make sure it works for me, but
I don't see why it would not.

Steve Ellcey
sell...@mips.com




[GOOGLE] max-lipo-group parameter for AutoFDO

2013-09-26 Thread Dehao Chen
This patch fix the bug when setting max-lipo-group in AutoFDO:

Bootstrapped and passed regression test.

OK for google branches?

Thanks,
Dehao

Index: gcc/auto-profile.c
===
--- gcc/auto-profile.c (revision 202926)
+++ gcc/auto-profile.c (working copy)
@@ -746,7 +746,7 @@ read_aux_modules (void)
 assembler statements, *iter);
   continue;
  }
-  if (max_group != 0 && curr_module == max_group)
+  if (max_group != 0 && curr_module >= max_group)
  {
   if (flag_opt_info)
inform (0, "Not importing %s: maximum group size reached", *iter);


Re: [GOOGLE] max-lipo-group parameter for AutoFDO

2013-09-26 Thread Xinliang David Li
Looks ok.

David

On Thu, Sep 26, 2013 at 9:00 AM, Dehao Chen de...@google.com wrote:
 This patch fix the bug when setting max-lipo-group in AutoFDO:

 Bootstrapped and passed regression test.

 OK for google branches?

 Thanks,
 Dehao

 Index: gcc/auto-profile.c
 ===
 --- gcc/auto-profile.c (revision 202926)
 +++ gcc/auto-profile.c (working copy)
 @@ -746,7 +746,7 @@ read_aux_modules (void)
  assembler statements, *iter);
continue;
   }
 -  if (max_group != 0 && curr_module == max_group)
 +  if (max_group != 0 && curr_module >= max_group)
   {
 if (flag_opt_info)
   inform (0, "Not importing %s: maximum group size reached", *iter);


Re: [PATCH, ARM, LRA] Prepare ARM build with LRA

2013-09-26 Thread Richard Sandiford
Eric Botcazou ebotca...@adacore.com writes:
 So in the set_* routines it isn't about whether the value is definitely
 a base or a definitely an index.  It's just about drilling down through
 what we've already decided is a base or index to get the inner reg or mem,
 and knowing which XEXPs to look at.  We could instead have used a
 for_each_rtx, or something like that, without any code checks.  But I
 wanted to be precise about the types of address we allow, so that we can
 assert for things we don't understand.  In other words, it was designed
 to require the kind of extension Yvan is adding here.

 Does this mean that the design is to require a parallel implementation in the 
 predicates and in the set routines, i.e. each time you add a new case to the 
 predicates, you need to add it (or do something) to the set routines as well? 
  
 If so, that's a little weird, but OK, feel free to revert the de-duplication 
 part, but add comments saying that the functions must be kept synchronized.

They don't need to be kept synchronised as such.  It's fine for the index
to allow more than must_be_index_p.  But if you're not keen on the current
structure, does the following look better?  Tested on x86_64-linux-gnu.

Thanks,
Richard


gcc/
* rtlanal.c (must_be_base_p, must_be_index_p): Delete.
(binary_scale_code_p, get_base_term, get_index_term): New functions.
(set_address_segment, set_address_base, set_address_index)
(set_address_disp): Accept the argument unconditionally.
(baseness): Remove must_be_base_p and must_be_index_p checks.
(decompose_normal_address): Classify as much as possible in the
main loop.

Index: gcc/rtlanal.c
===
--- gcc/rtlanal.c   2013-09-25 22:42:16.870221821 +0100
+++ gcc/rtlanal.c   2013-09-26 09:11:41.273778750 +0100
@@ -5521,26 +5521,50 @@ strip_address_mutations (rtx *loc, enum
 }
 }
 
-/* Return true if X must be a base rather than an index.  */
+/* Return true if CODE applies some kind of scale.  The scaled value
+   is the first operand and the scale is the second.  */
 
 static bool
-must_be_base_p (rtx x)
+binary_scale_code_p (enum rtx_code code)
 {
-  return GET_CODE (x) == LO_SUM;
+  return (code == MULT
+  || code == ASHIFT
+  /* Needed by ARM targets.  */
+  || code == ASHIFTRT
+  || code == LSHIFTRT
+  || code == ROTATE
+  || code == ROTATERT);
 }
 
-/* Return true if X must be an index rather than a base.  */
+/* If *INNER can be interpreted as a base, return a pointer to the inner term
+   (see address_info).  Return null otherwise.  */
 
-static bool
-must_be_index_p (rtx x)
+static rtx *
+get_base_term (rtx *inner)
 {
-  return (GET_CODE (x) == MULT
-  || GET_CODE (x) == ASHIFT
-  /* Needed by ARM targets.  */
-  || GET_CODE (x) == ASHIFTRT
-  || GET_CODE (x) == LSHIFTRT
-  || GET_CODE (x) == ROTATE
-  || GET_CODE (x) == ROTATERT);
+  if (GET_CODE (*inner) == LO_SUM)
+inner = strip_address_mutations (XEXP (*inner, 0));
+  if (REG_P (*inner)
+  || MEM_P (*inner)
+  || GET_CODE (*inner) == SUBREG)
+return inner;
+  return 0;
+}
+
+/* If *INNER can be interpreted as an index, return a pointer to the inner term
+   (see address_info).  Return null otherwise.  */
+
+static rtx *
+get_index_term (rtx *inner)
+{
+  /* At present, only constant scales are allowed.  */
+  if (binary_scale_code_p (GET_CODE (*inner)) && CONSTANT_P (XEXP (*inner, 1)))
+inner = strip_address_mutations (XEXP (*inner, 0));
+  if (REG_P (*inner)
+  || MEM_P (*inner)
+  || GET_CODE (*inner) == SUBREG)
+return inner;
+  return 0;
 }
 
 /* Set the segment part of address INFO to LOC, given that INNER is the
@@ -5549,8 +5573,6 @@ must_be_index_p (rtx x)
 static void
 set_address_segment (struct address_info *info, rtx *loc, rtx *inner)
 {
-  gcc_checking_assert (GET_CODE (*inner) == UNSPEC);
-
   gcc_assert (!info->segment);
   info->segment = loc;
   info->segment_term = inner;
@@ -5562,12 +5584,6 @@ set_address_segment (struct address_info
 static void
 set_address_base (struct address_info *info, rtx *loc, rtx *inner)
 {
-  if (must_be_base_p (*inner))
-inner = strip_address_mutations (XEXP (*inner, 0));
-  gcc_checking_assert (REG_P (*inner)
-  || MEM_P (*inner)
-  || GET_CODE (*inner) == SUBREG);
-
   gcc_assert (!info->base);
   info->base = loc;
   info->base_term = inner;
@@ -5579,12 +5595,6 @@ set_address_base (struct address_info *i
 static void
 set_address_index (struct address_info *info, rtx *loc, rtx *inner)
 {
-  if (must_be_index_p (*inner) && CONSTANT_P (XEXP (*inner, 1)))
-inner = strip_address_mutations (XEXP (*inner, 0));
-  gcc_checking_assert (REG_P (*inner)
-  || MEM_P (*inner)
-  || GET_CODE (*inner) == SUBREG);
-
   gcc_assert (!info->index);
   

Re: RFA: Store the REG_BR_PROB probability directly as an int

2013-09-26 Thread Richard Sandiford
Steve Ellcey sell...@mips.com writes:
 On Tue, 2013-09-24 at 21:07 +0200, Andreas Schwab wrote:
 Richard Sandiford rdsandif...@googlemail.com writes:
 
  Sorry for the breakage.  I think we need to handle INT_LIST in the same way
  as INSN_LIST though, and eliminate in XEXP (x, 1).
 
  How about the attached?  Testing in progress...
 
 Works for me as well.
 
 Andreas.

 This patch worked for me as well on MIPS.  I did a complete build and
 test overnight.

Thanks for the testing.  It also passes bootstrap on x86_64-linux-gnu.
OK to install?

Thanks,
Richard



[GOOGLE] Fix an ICE in lipo_cmp_type

2013-09-26 Thread Dehao Chen
This fixes an ICE when lipo_cmp_type handles NULLPTR_TYPE.

Bootstrapped and regression test ongoing.

OK for google branches?

Thanks,
Dehao

Index: gcc/l-ipo.c
===
--- gcc/l-ipo.c (revision 202926)
+++ gcc/l-ipo.c (working copy)
@@ -713,6 +713,7 @@ lipo_cmp_type (tree t1, tree t2)
lipo_cmp_type (TREE_TYPE (t1), TREE_TYPE (t2)));
 case VOID_TYPE:
 case BOOLEAN_TYPE:
+case NULLPTR_TYPE:
   return 1;
 case TEMPLATE_TYPE_PARM:
   return 1;


Re: [GOOGLE] Fix an ICE in lipo_cmp_type

2013-09-26 Thread Xinliang David Li
yes.

David

On Thu, Sep 26, 2013 at 9:26 AM, Dehao Chen de...@google.com wrote:
 This fixes an ICE when lipo_cmp_type handles NULLPTR_TYPE.

 Bootstrapped and regression test ongoing.

 OK for google branches?

 Thanks,
 Dehao

 Index: gcc/l-ipo.c
 ===
 --- gcc/l-ipo.c (revision 202926)
 +++ gcc/l-ipo.c (working copy)
 @@ -713,6 +713,7 @@ lipo_cmp_type (tree t1, tree t2)
 lipo_cmp_type (TREE_TYPE (t1), TREE_TYPE (t2)));
  case VOID_TYPE:
  case BOOLEAN_TYPE:
 +case NULLPTR_TYPE:
return 1;
  case TEMPLATE_TYPE_PARM:
return 1;


[patch, committed] Remove walk_use_def_chains

2013-09-26 Thread Florian Weimer
As preapproved by Richard Biener.  Bootstrapped on
x86_64-debian-linux-gnu.

2013-09-26  Florian Weimer  f...@deneb.enyo.de

* tree-ssa.h (walk_use_def_chains_fn, walk_use_def_chains): Delete.
* tree-ssa.c (walk_use_def_chains_1, walk_use_def_chains): Delete.
* doc/tree-ssa.texi (Walking use-def chains): Delete.

Index: gcc/tree-ssa.c
===
--- gcc/tree-ssa.c  (revision 202947)
+++ gcc/tree-ssa.c  (working copy)
@@ -1347,115 +1347,6 @@
 }
 
 
-/* Internal helper for walk_use_def_chains.  VAR, FN and DATA are as
-   described in walk_use_def_chains.
-
-   VISITED is a pointer set used to mark visited SSA_NAMEs to avoid
-  infinite loops.  We used to have a bitmap for this to just mark
-  SSA versions we had visited.  But non-sparse bitmaps are way too
-  expensive, while sparse bitmaps may cause quadratic behavior.
-
-   IS_DFS is true if the caller wants to perform a depth-first search
-  when visiting PHI nodes.  A DFS will visit each PHI argument and
-  call FN after each one.  Otherwise, all the arguments are
-  visited first and then FN is called with each of the visited
-  arguments in a separate pass.  */
-
-static bool
-walk_use_def_chains_1 (tree var, walk_use_def_chains_fn fn, void *data,
-  struct pointer_set_t *visited, bool is_dfs)
-{
-  gimple def_stmt;
-
-  if (pointer_set_insert (visited, var))
-return false;
-
-  def_stmt = SSA_NAME_DEF_STMT (var);
-
-  if (gimple_code (def_stmt) != GIMPLE_PHI)
-{
-  /* If we reached the end of the use-def chain, call FN.  */
-  return fn (var, def_stmt, data);
-}
-  else
-{
-  size_t i;
-
-  /* When doing a breadth-first search, call FN before following the
-use-def links for each argument.  */
-  if (!is_dfs)
-   for (i = 0; i < gimple_phi_num_args (def_stmt); i++)
- if (fn (gimple_phi_arg_def (def_stmt, i), def_stmt, data))
-   return true;
-
-  /* Follow use-def links out of each PHI argument.  */
-  for (i = 0; i < gimple_phi_num_args (def_stmt); i++)
-   {
- tree arg = gimple_phi_arg_def (def_stmt, i);
-
- /* ARG may be NULL for newly introduced PHI nodes.  */
- if (arg
-      && TREE_CODE (arg) == SSA_NAME
-      && walk_use_def_chains_1 (arg, fn, data, visited, is_dfs))
-   return true;
-   }
-
-  /* When doing a depth-first search, call FN after following the
-use-def links for each argument.  */
-  if (is_dfs)
-   for (i = 0; i < gimple_phi_num_args (def_stmt); i++)
- if (fn (gimple_phi_arg_def (def_stmt, i), def_stmt, data))
-   return true;
-}
-
-  return false;
-}
-
-
-
-/* Walk use-def chains starting at the SSA variable VAR.  Call
-   function FN at each reaching definition found.  FN takes three
-   arguments: VAR, its defining statement (DEF_STMT) and a generic
-   pointer to whatever state information that FN may want to maintain
-   (DATA).  FN is able to stop the walk by returning true, otherwise
-   in order to continue the walk, FN should return false.
-
-   Note, that if DEF_STMT is a PHI node, the semantics are slightly
-   different.  The first argument to FN is no longer the original
-   variable VAR, but the PHI argument currently being examined.  If FN
-   wants to get at VAR, it should call PHI_RESULT (PHI).
-
-   If IS_DFS is true, this function will:
-
-   1- walk the use-def chains for all the PHI arguments, and,
-   2- call (*FN) (ARG, PHI, DATA) on all the PHI arguments.
-
-   If IS_DFS is false, the two steps above are done in reverse order
-   (i.e., a breadth-first search).  */
-
-void
-walk_use_def_chains (tree var, walk_use_def_chains_fn fn, void *data,
- bool is_dfs)
-{
-  gimple def_stmt;
-
-  gcc_assert (TREE_CODE (var) == SSA_NAME);
-
-  def_stmt = SSA_NAME_DEF_STMT (var);
-
-  /* We only need to recurse if the reaching definition comes from a PHI
- node.  */
-  if (gimple_code (def_stmt) != GIMPLE_PHI)
-(*fn) (var, def_stmt, data);
-  else
-{
-  struct pointer_set_t *visited = pointer_set_create ();
-  walk_use_def_chains_1 (var, fn, data, visited, is_dfs);
-  pointer_set_destroy (visited);
-}
-}
-
-
 /* If necessary, rewrite the base of the reference tree *TP from
a MEM_REF to a plain or converted symbol.  */
 
Index: gcc/tree-ssa.h
===
--- gcc/tree-ssa.h  (revision 202947)
+++ gcc/tree-ssa.h  (working copy)
@@ -56,11 +56,6 @@
 extern bool tree_ssa_useless_type_conversion (tree);
 extern tree tree_ssa_strip_useless_type_conversions (tree);
 
-/* Call-back function for walk_use_def_chains().  At each reaching
-   definition, a function with this prototype is called.  */
-typedef bool (*walk_use_def_chains_fn) (tree, gimple, void *);
-extern void walk_use_def_chains (tree, 

Re: [gomp4] Dumping gimple for offload.

2013-09-26 Thread Ilya Tocar
On 25 Sep 15:48, Richard Biener wrote:
 On Wed, Sep 25, 2013 at 3:29 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
  On 24 Sep 11:02, Richard Biener wrote:
  On Mon, Sep 23, 2013 at 3:29 PM, Ilya Tocar tocarip.in...@gmail.com 
  wrote:
   thus consider assigning the section
  name in a different place.
 
  Richard.
 
  What do you mean  by different place?
  I can add global dumping_omp_target variable to choose correct name,
  depending on it's value (patch below). Is it better?
 
 More like passing down a different abstraction, like for
 
  @@ -907,9 +907,15 @@ output_symtab (void)
   {
 symtab_node node = lto_symtab_encoder_deref (encoder, i);
 if (cgraph_node *cnode = dyn_cast <cgraph_node> (node))
  -lto_output_node (ob, cnode, encoder);
  +   {
  + if (!dumping_omp_target || lookup_attribute ("omp declare target",
  +      DECL_ATTRIBUTES (node->symbol.decl)))
  +   lto_output_node (ob, cnode, encoder);
  +   }
 else
  -lto_output_varpool_node (ob, varpool (node), encoder);
  + if (!dumping_omp_target || lookup_attribute ("omp declare target",
  +      DECL_ATTRIBUTES (node->symbol.decl)))
  +   lto_output_varpool_node (ob, varpool (node), encoder);
  +   lto_output_varpool_node (ob, varpool (node), encoder);
 
   }
 
 have the symtab encoder already not contain the varpool nodes you
 don't need.
 
 And instead of looking up attributes, mark the symtab node with a flag.

Good idea!
I've tried creating 2 encoders, and adding only nodes with
omp declare target attribute in omp case. There is still some is_omp
passing to control  lto_set_symtab_encoder_in_partition behaivor, 
because i think it's better than global var.
What do you think?

---
 gcc/cgraphunit.c| 10 +-
 gcc/lto-cgraph.c| 15 ++-
 gcc/lto-streamer.c  |  3 ++-
 gcc/lto-streamer.h  | 10 --
 gcc/lto/lto-partition.c |  4 ++--
 gcc/passes.c| 10 +-
 gcc/tree-pass.h |  2 +-
 7 files changed, 37 insertions(+), 17 deletions(-)

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 1644ca9..9e0fc77 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2016,7 +2016,15 @@ ipa_passes (void)
 passes->all_lto_gen_passes);
 
   if (!in_lto_p)
-ipa_write_summaries ();
+{
+  if (flag_openmp)
+   {
+ section_name_prefix = OMP_SECTION_NAME_PREFIX;
+ ipa_write_summaries (true);
+   }
+  section_name_prefix = LTO_SECTION_NAME_PREFIX;
+  ipa_write_summaries (false);
+}
 
   if (flag_generate_lto)
 targetm.asm_out.lto_end ();
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 952588d..4a7d179 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -236,8 +236,13 @@ lto_symtab_encoder_in_partition_p (lto_symtab_encoder_t 
encoder,
 
 void
 lto_set_symtab_encoder_in_partition (lto_symtab_encoder_t encoder,
-symtab_node node)
+symtab_node node, bool is_omp)
 {
+  /* Ignore non omp target nodes for omp case.  */
+  if (is_omp && !lookup_attribute ("omp declare target",
+   DECL_ATTRIBUTES (node->symbol.decl)))
+return;
+
   int index = lto_symtab_encoder_encode (encoder, (symtab_node)node);
   encoder-nodes[index].in_partition = true;
 }
@@ -760,7 +765,7 @@ add_references (lto_symtab_encoder_t encoder,
ignored by the partitioning logic earlier.  */
 
 lto_symtab_encoder_t 
-compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
+compute_ltrans_boundary (lto_symtab_encoder_t in_encoder, bool is_omp)
 {
   struct cgraph_node *node;
   struct cgraph_edge *edge;
@@ -779,7 +784,7 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
 {
   node = lsei_cgraph_node (lsei);
   add_node_to (encoder, node, true);
-  lto_set_symtab_encoder_in_partition (encoder, (symtab_node)node);
+  lto_set_symtab_encoder_in_partition (encoder, (symtab_node)node, is_omp);
   add_references (encoder, node-symbol.ref_list);
   /* For proper debug info, we need to ship the origins, too.  */
   if (DECL_ABSTRACT_ORIGIN (node->symbol.decl))
@@ -794,7 +799,7 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
 {
   struct varpool_node *vnode = lsei_varpool_node (lsei);
 
-  lto_set_symtab_encoder_in_partition (encoder, (symtab_node)vnode);
+  lto_set_symtab_encoder_in_partition (encoder, (symtab_node)vnode, 
is_omp);
   lto_set_symtab_encoder_encode_initializer (encoder, vnode);
   add_references (encoder, vnode-symbol.ref_list);
   /* For proper debug info, we need to ship the origins, too.  */
@@ -802,7 +807,7 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
{
  struct varpool_node *origin_node
   = varpool_get_node (DECL_ABSTRACT_ORIGIN (node->symbol.decl));
- lto_set_symtab_encoder_in_partition (encoder, 

Re: [gomp4] Tweak GOMP_target{,_data,_update} arguments

2013-09-26 Thread Ilya Verbin
On 19 Sep 11:23, Jakub Jelinek wrote:
 that.  Another complication is dependent shared libraries.
 Consider
 liba.c:
 #pragma omp declare target
 int i;
 int foo (void)
 {
   return ++i;
 }
 #pragma omp end declare target
 main.c:
 #pragma omp declare target
 extern int i;
 extern int foo (void);
 #pragma omp end declare target
 int main ()
 {
   int j;
   #pragma omp target
 {
   j = i;
   j += foo ();
 }
   if (j != 1)
 abort ();
   return 0;
 }
 gcc -shared -O2 -fpic -fopenmp -o liba.so -Wl,-soname,liba.so liba.c
 gcc -O2 -fopenmp -o main main.c -L. -la
 ./main
 
 Perhaps the linker plugin can extract the target shared libraries from
 the embedded sections of dependent shared libraries (if any), and link the
 main shared library against that, but GOMP_target will need to know that
 it can't just offload main.so, but also has to offload the dependent
 liba.so (and of course libgomp.so.1 from the libgomp plugin).
 What does ICC do in this case?
 
   Jakub

Hi Jakub,

Here's what ICC does.
Suppose we have liba.c and main.c, both with target regions:

1. Building liba.c - liba.so.
A call to offload-runtime library is inserted into _init of liba.so.
Target region is compiled into liba_target.so, and placed into .rodata of
liba.so.

2. Building main.c - main.exe.
Similarly, a call to offload-runtime library is inserted into _init of main.exe.
Target region is compiled into main_target.so, and placed into .rodata of
main.exe.

3. Runtime.
So, when liba.so and main.exe are loaded at host-side, the runtime library
knows, that it should transfer liba_target.so and main_target.so to the
target-side.  Then, main.exe starts execution.  At every entry point to the
target region, runtime library checks whether it should perform an
initialization.  If target is not initialized, runtime library calls
COIProcessCreateFromMemory(main_target.exe), that transfers some standard
main_target.exe to the target and starts it.  Then, runtime library calls
COIProcessLoadLibraryFromMemory(liba_target.so, main_target.so), that transfers
these libraries to the target and loads them into the main_target.exe.
The target-side functions are called from host through
COIProcessGetFunctionHandles(f_name) and COIPipelineRunFunction(handle). The
addresses of target-side functions are obtained from *_target.so by dlsym().
So, the host-side knows nothing about target addresses.

What do you think, how will such an approach work with other target
architectures, and with current implementation of GOMP_target{,_data,_update}?

Thanks,
  -- Ilya


Re: cost model patch

2013-09-26 Thread Xinliang David Li
On Thu, Sep 26, 2013 at 7:37 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Thu, Sep 26, 2013 at 1:10 AM, Xinliang David Li davi...@google.com wrote:
 I took the liberty to pick up Richard's original fvect-cost-model
 patch and made some modification.

 What has not changed:
 1) option -ftree-vect-loop-version is removed;
 2) three cost models are introduced: cheap, dynamic, and unlimited;
 3) unless explicitly specified, cheap model is the default at O2 (e.g.
 when -ftree-loop-vectorize is used with -O2), and dynamic mode is the
 default for O3 and FDO
 4) alignment based versioning is disabled with cheap model.

 What has changed:
 1) peeling is also disabled with cheap model;
 2) alias check condition limit is reduced with cheap model, but not
 completely suppressed. Runtime alias check is a pretty important
 enabler.
 3) tree if conversion changes are not included.

 Does this patch look reasonable?

 In principle yes.  Note that it changes the behavior of -O2 -ftree-vectorize
 as -ftree-vectorize does not imply changing the default cost model.  I am
 fine with that, but eventually this will have some testsuite fallout.  This
 reorg would also need documenting in changes.html to make people
 aware of this.


Here is the proposed change:


Index: htdocs/gcc-4.9/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v
retrieving revision 1.26
diff -u -r1.26 changes.html
--- htdocs/gcc-4.9/changes.html 26 Aug 2013 14:16:31 - 1.26
+++ htdocs/gcc-4.9/changes.html 26 Sep 2013 18:02:33 -
@@ -37,6 +37,7 @@
   <ul>
 <li>AddressSanitizer, a fast memory error detector, is now
available on ARM.
 </li>
+<li>GCC introduces a new cost model for the vectorizer, called the
'cheap' model. The new cost model is intended to minimize compile
time, code size, and potential negative runtime impact introduced when
the vectorizer is turned on, at the expense of not getting the maximum
potential runtime speedup. The 'cheap' model will be the default when
the vectorizer is turned on at <code>-O2</code>. To override this, use
option <code>-fvect-cost-model=[cheap|dynamic|unlimited]</code>.
   </ul>

 h2New Languages and Language specific improvements/h2



 With completely disabling alignment peeling and alignment versioning
 you cut out targets that have no way of performing unaligned accesses.
 From looking at vect_no_align these are mips, sparc, ia64 and some arm.
 A compromise for them would be to allow peeling a single iteration
 and some alignment checks (like up to two?).


Possibly. I think target owners can choose to do target specific
tunings as follow up.


 Reducing the number of allowed alias-checks is ok, but I'd reduce it
 more than to 6 (was that an arbitrary number or is that the result of
 some benchmarking?)


yes -- we found that it is not uncommon to have a loop with 2 or 3
distinct source addresses and 1 or 2 target addresses.

There are also tuning opportunities. For instance, in cases where
source addresses are derived from the same base, a consolidated alias
check (against the whole access range instead of just checking the
cross-iteration dependence of one unrolled iteration) can be done.
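
To make that concrete, a rough sketch (invented names, not the
vectorizer's actual output) of the difference between today's per-pair
checks and a consolidated whole-range check when the loads share a base:

#include <stddef.h>

/* One check per (store, load) pair -- this is what counts against the
   alias-check limit today.  */
static int
no_overlap_pairwise (float *dst, float *src1, float *src2, size_t n)
{
  return (dst + n <= src1 || src1 + n <= dst)
         && (dst + n <= src2 || src2 + n <= dst);
}

/* A single consolidated check when src1/src2 are derived from the same
   base: compare the store against the combined access range.  */
static int
no_overlap_consolidated (float *dst, float *base, size_t span, size_t n)
{
  return dst + n <= base || base + span <= dst;
}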

 I suppose all of the params could use some benchmarking to select
 a sweet spot in code size vs. runtime.

Agree.



 I suppose the patch is ok as-is (if it actually works) if you provide
 a changelog and propose an entry for changes.html.  We can
 tune the params for the cheap model as followup.

Ok. I will do more testing and check in the patch with proper
ChangeLog. The changes.html change will be done separately.

thanks,

David



 Thanks for picking this up,
 Richard.

 thanks,

 David


[gomp4] Library side of depend clause support

2013-09-26 Thread Jakub Jelinek
Hi!

This patch adds depend clause support.
In GOMP_task, before queueing the task, if task has any depend clauses
we look up the addresses in a hash table (in the parent task, because
only sibling tasks are ordered through depend clause), and if there
are any dependencies on the earlier started tasks, the new task
isn't added to team, parent and taskgroup task queues, but instead
just added into the earlier task's depender vectors.  Each task
has also an integer number of how many other tasks it depends on.
When a task on which something depends finishes, if the parent exists,
its records are removed from the parent's depend address hash table,
and even if parent doesn't exist anymore, we decrement num_dependees
of every task mentioned in the dependers vector.  If any of those
counters go to zero, we insert them into all the relevant task queues.
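
For illustration, a small self-contained example (not one of the new
tests) of the sibling-task ordering this machinery enforces:

#include <assert.h>

int
main ()
{
  int x = 0;
  #pragma omp parallel
  #pragma omp single
  {
    #pragma omp task depend(out: x) shared(x)
    x = 1;
    #pragma omp task depend(in: x) shared(x)
    assert (x == 1);  /* queued only after the writer task finishes */
    #pragma omp taskwait
  }
  return 0;
}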

Tested on x86_64-linux.  Will commit tomorrow unless somebody complains,
but in any case would appreciate review of the changes.

2013-09-26  Jakub Jelinek  ja...@redhat.com

* libgomp.h: Include stdlib.h.
(struct gomp_task_depend_entry): New type.
(struct gomp_task): Add dependers, depend_hash, depend_count,
num_dependees and depend fields.
(struct gomp_taskgroup): Add num_children field.
(gomp_finish_task): Free depend_hash if non-NULL.
* libgomp_g.h (GOMP_task): Add depend argument.
* hashtab.h: New file.
* task.c: Include hashtab.h.
(hash_entry_type): New typedef.
(htab_alloc, htab_free, htab_hash, htab_eq): New inlines.
(gomp_init_task): Clear dependers, depend_hash and depend_count
fields.
(GOMP_task): Add depend argument, handle depend clauses.  Increment
num_children field in taskgroup.
(gomp_task_run_pre): Don't increment task_running_count here,
nor clear task_pending bit.
(gomp_task_run_post_handle_depend_hash,
gomp_task_run_post_handle_dependers,
gomp_task_run_post_handle_depend): New functions.
(gomp_task_run_post_remove_parent): Clear in_taskwait before
signalling corresponding semaphore.
(gomp_task_run_post_remove_taskgroup): Decrement num_children
field and make the decrement to 0 MEMMODEL_RELEASE operation,
rather than storing NULL to taskgroup-children.  Clear
in_taskgroup_wait before signalling corresponding semaphore.
(gomp_barrier_handle_tasks): Move task_running_count increment
and task_pending bit clearing here.  Call
gomp_task_run_post_handle_depend.  If more than one new tasks
have been queued, wake other threads if needed.
(GOMP_taskwait): Call gomp_task_run_post_handle_depend.  If more
than one new tasks have been queued, wake other threads if needed.
After waiting on taskwait_sem, enter critical section again.
(GOMP_taskgroup_start): Initialize num_children field.
(GOMP_taskgroup_end): Check num_children instead of children
before critical section.  If children is NULL, but num_children
is non-zero, wait on taskgroup_sem.  Call
gomp_task_run_post_handle_depend.  If more than one new tasks have
been queued, wake other threads if needed.  After waiting on
taskgroup_sem, enter critical section again.
* testsuite/libgomp.c/depend-1.c: New test.
* testsuite/libgomp.c/depend-2.c: New test.

--- libgomp/libgomp.h.jj2013-09-26 09:43:10.903930832 +0200
+++ libgomp/libgomp.h   2013-09-26 17:17:28.267001263 +0200
@@ -39,6 +39,7 @@
 
 #include <pthread.h>
 #include <stdbool.h>
+#include <stdlib.h>
 
 #ifdef HAVE_ATTRIBUTE_VISIBILITY
 # pragma GCC visibility push(hidden)
@@ -253,7 +254,19 @@ enum gomp_task_kind
   GOMP_TASK_TIED
 };
 
+struct gomp_task;
 struct gomp_taskgroup;
+struct htab;
+
+struct gomp_task_depend_entry
+{
+  void *addr;
+  struct gomp_task_depend_entry *next;
+  struct gomp_task_depend_entry *prev;
+  struct gomp_task *task;
+  bool is_in;
+  bool redundant;
+};
 
 /* This structure describes a task to be run by a thread.  */
 
@@ -268,6 +281,10 @@ struct gomp_task
   struct gomp_task *next_taskgroup;
   struct gomp_task *prev_taskgroup;
   struct gomp_taskgroup *taskgroup;
+  struct gomp_task **dependers;
+  struct htab *depend_hash;
+  size_t depend_count;
+  size_t num_dependees;
   struct gomp_task_icv icv;
   void (*fn) (void *);
   void *fn_data;
@@ -277,6 +294,7 @@ struct gomp_task
   bool final_task;
   bool copy_ctors_done;
   gomp_sem_t taskwait_sem;
+  struct gomp_task_depend_entry depend[];
 };
 
 struct gomp_taskgroup
@@ -286,6 +304,7 @@ struct gomp_taskgroup
   bool in_taskgroup_wait;
   bool cancelled;
   gomp_sem_t taskgroup_sem;
+  size_t num_children;
 };
 
 /* This structure describes a team of threads.  These are the threads
@@ -525,6 +544,8 @@ extern void gomp_barrier_handle_tasks (g
 static void inline
 gomp_finish_task (struct gomp_task *task)
 {
+  if (__builtin_expect 

Re: [patch] fix libstdc++/55861

2013-09-26 Thread Jonathan Wakely
On 19 January 2013 23:43, Jonathan Wakely wrote:
 PR libstdc++/55861
 * include/std/future (_State_base::_S_check(const shared_ptr<_Tp>&)):
 Fix return type.
 (__basic_future::_M_get_result()): Const qualify.
 (shared_future::get()): Likewise.
 * testsuite/30_threads/shared_future/members/get.cc: Use const
 objects.

 Tested x86_64-linux, committed to trunk.

I've backported the first part of this, fixing the return type, to the
4.7 branch.

Unfortunately it makes std::future unusable with clang, see
http://llvm.org/bugs/show_bug.cgi?id=17375

Tested x86_64-linux, committed to the 4.7 branch.
commit e2829e7c39c153cfc0d09cd8a8be14c5467c7730
Author: Jonathan Wakely jwakely@gmail.com
Date:   Thu Sep 26 10:11:29 2013 +0100

Backport from mainline

2013-01-19  Jonathan Wakely  jwakely@gmail.com

PR libstdc++/55861
* include/std/future (_State_base::_S_check(const shared_ptr<_Tp>&)):
Fix return type.

diff --git a/libstdc++-v3/include/std/future b/libstdc++-v3/include/std/future
index 98c7b84..150c1af 100644
--- a/libstdc++-v3/include/std/future
+++ b/libstdc++-v3/include/std/future
@@ -1,6 +1,6 @@
 // future -*- C++ -*-
 
-// Copyright (C) 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
+// Copyright (C) 2009, 2010, 2011, 2012, 2013 Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
 // software; you can redistribute it and/or modify it under the
@@ -456,7 +456,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __setter(promise<void>* __prom);
 
  template<typename _Tp>
-static bool
+static void
  _S_check(const shared_ptr<_Tp>& __p)
 {
    if (!static_cast<bool>(__p))


Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-09-26 Thread Rong Xu
On Tue, Sep 24, 2013 at 5:31 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Hi Honza,

 I am finally getting back to working on this after a few weeks of
 working on some other priorities.

 I am also trying to return to this, so good timing ;)
 Martin has got smaller C++ programs (Inkscape) to not touch cold segment
 during the startup with FDO (w/o partitioning). Firefox still does, I think
 the problem are lost samples due to different linker decisions even with LTO.
 (i.e. linker pick an object from .a libraryat profile-generate time that i 
 never
 passes later.).

 I plan to look into that today.

 Did you mean to commit the above change? I see that it went in as part
 of r202258 but doesn't show up in the ChangeLog entry for that
 revision.

 Yes, I meant to check it in, but did not mean to do so w/o a ChangeLog.  I will
 fix that.

 
  In other cases it was mostly loop unrolling in combination with jump 
  threading. So
  I modified my script to separately report when failure happens for test 
  trained
  once and test trained hundred times.

 Thanks for the linker script. I reproduced your results. I looked at a
 couple cases. The first was one that failed after 1 training run only
 (2910-2.c). It was due to jump threading, which you noted was a
 problem. For this one I think we can handle it in the partitioning,
 since there is an FDO insanity that we could probably treat more
 conservatively when splitting.

 We should fix the roundoff issues - when I was introducing the
 frequency/probability/count system I made an assumption that parts of
 programs
 with very low counts do not matter, since they are not part of hot spot (and I
 cared only about the hot spot).  Now we care about identifying unlikely
 executed spots and we need to fix this.

 I looked at one that failed after 100 as well (20031204-1.c). In this
 case, it was due to expansion which was creating multiple branches/bbs
 from a logical OR and guessing incorrectly on how to assign the
 counts:

  if (octets == 4 && (*cp == ':' || *cp == '\0')) {

 The (*cp == ':' || *cp == '\0') part looked like the following going
 into RTL expansion:

   [20031204-1.c : 31:33] _29 = _28 == 58;
   [20031204-1.c : 31:33] _30 = _28 == 0;
   [20031204-1.c : 31:33] _31 = _29 | _30;
   [20031204-1.c : 31:18] if (_31 != 0)
 goto bb 16;
   else
 goto bb 19;

 where the result of the OR was always true, so bb 16 had a count of
 100 and bb 19 a count of 0. When it was expanded, the expanded version
 of the above turned into 2 bbs with a branch in between. Both
 comparisons were done in the first bb, but the first bb checked
 whether the result of the *cp == '\0' compare was true, and if not
 branched to the check for whether the *cp == ':' compare was true. It
 gave the branch to the second check against ':' a count of 0, so that
 bb got a count of 0 and was split out, and put the count of 100 on the
 fall through assuming the compare with '\0' always evaluated to true.
 In reality, this OR condition was always true because *cp was ':', not
 '\0'. Therefore, the count of 0 on the second block with the check for
 ':' was incorrect, we ended up trying to execute it, and failed.

 Presumably we had the correct profile data for both blocks, but the
 accuracy was reduced when the OR was represented as a logical
 computation with a single branch. We could change the expansion code
 to do something different, e.g. treat as a 50-50 branch. But we would
 still end up with integer truncation issues when there was a single
 training run. But that could be dealt with conservatively in the
 bbpart code as I suggested for the jump threading issue above. I.e. a
 cold block with incoming non-cold edges conservatively not marked cold
 for splitting.

 
  FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/2422-1.c
  FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/2910-2.c
  FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20020413-1.c
  FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20030903-1.c
  FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20031204-1.c
  FAIL /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20031204-1.c
  FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20060420-1.c
  FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20060905-1.c
  FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20120427-1.c
  FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20120427-2.c
  FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20120808-1.c
  FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20121108-1.c
  FAIL /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20121108-1.c
  FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/920501-6.c
  FAIL /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/920501-6.c
  FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/920726-1.c
  FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/981001-1.c
  FAIL 

OMP4/cilkplus: simd clone function mangling

2013-09-26 Thread Aldy Hernandez

Hi folks.

Both OMP4 and Cilk Plus provide mechanisms for simd function cloning. 
In OMP4 it's #pragma omp declare simd and in Cilk Plus they are 
elemental functions (or simd-enabled functions in Intel's 
latest nomenclature).  For lack of a better term, I'll call them simd 
clones.


We need a generic way of representing the clone information all the way 
through to the vectorizer, as well as provide unique mangling for the 
simd functions.  The current patch does both.  Currently nothing is done 
with the information, but I wanted to get this out and get feedback as 
this will affect not only OMP4 and Cilk Plus, but possibly OpenACC and 
others-- and later, the vectorizer.


Intel's Vector Function ABI v0.9.5 document explains how their 
mangling and ABI works for Cilk Plus, so this seems like a good start. 
I have adapted the algorithm to use a more generic interface for either 
non-x86* or non-CilkPlus.
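
As a concrete reference point for the mangling discussion, this is the
sort of declaration both front ends hand to the new pass (a toy example
of my own, not taken from the patch):

/* Each generated clone is a vector version of this scalar routine:
   'base' and 'scale' are broadcast (uniform) and 'i' advances by one
   per lane (linear), so the base[i] loads become consecutive vector
   accesses in the clones.  */
#pragma omp declare simd uniform(base, scale) linear(i:1) notinbranch
float
scale_elem (float *base, int i, float scale)
{
  return base[i] * scale;
}

The Cilk Plus spelling attaches an equivalent vector attribute to the
declaration instead, which is why the clone information needs a
front-end-neutral representation in the cgraph.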


I'd like to keep/commit this in the gomp4 branch, since it has a 
complete parsing implementation of #pragma omp declare simd.


Tested on x86-64 Linux on the gomp-4_0-branch.

OK for branch?
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 1a12eda..ff1533c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,45 @@
+2013-09-24  Aldy Hernandez  al...@redhat.com
+
+   * Makefile.in (omp-low.o): Depend on PRETTY_PRINT_H.
+   * ipa-cp.c (determine_versionability): Nodes with SIMD clones are
+   not versionable.
+   * ggc.h (ggc_alloc_cleared_simd_clone_stat): New.
+   * cgraph.h (enum linear_stride_type): New.
+   (struct simd_clone_arg): New.
+   (struct simd_clone): New.
+   (struct cgraph_node): Add `simdclone' field.
+   Add `has_simd_clones' field.
+   * omp-low.c: Add new pass_omp_simd_clone support code.
+   (vecsize_mangle): New.
+   (ipa_omp_simd_clone): New.
+   (simd_clone_clauses_extract): New.
+   (simd_clone_compute_base_data_type): New.
+   (simd_clone_compute_isa_and_simdlen): New.
+   (simd_clone_create): New.
+   (simd_clone_mangle): New.
+   (simd_clone_struct_allow): New.
+   (simd_clone_struct_copy): New.
+   (class argno_map): New.
+   (argno_map::argno_map(tree)): New.
+   (argno_map::~argno_map): New.
+   (argno_map::to_tree): New.
+   * tree.h (OMP_CLAUSE_LINEAR_VARIABLE_STRIDE): New.
+   * tree-core.h (OMP_CLAUSE_LINEAR_VARIABLE_STRIDE): Document.
+   * tree-pass.h (make_pass_omp_simd_clone): New.
+   * passes.def (pass_omp_simd_clone): New.
+   * target.def: Define new hook prefix TARGET_CILKPLUS_.
+   (default_vector_mangling_isa_code): New.
+   (max_vector_size_for_isa): New.
+   * doc/tm.texi.in: Add placeholder for
+   TARGET_CILKPLUS_DEFAULT_DEFAULT_VECTOR_MANGLING_ISA_CODE,
+   TARGET_CILKPLUS_MAX_VECTOR_SIZE_FOR_ISA.
+   * doc/tm.texi: Regenerate.
+   * config/i386/i386.c (ix86_cilkplus_default_vector_mangling_isa_code):
+   New.
+   (ix86_cilkplus_max_vector_size_for_isa): New.
+   (TARGET_CILKPLUS_DEFAULT_DEFAULT_VECTOR_MANGLING_ISA_CODE): Define.
+   (TARGET_CILKPLUS_MAX_VECTOR_SIZE_FOR_ISA): Define.
+
 2013-09-19  Jakub Jelinek  ja...@redhat.com
 
PR tree-optimization/58472
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index c006711..4fc7e48 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2573,6 +2573,7 @@ omp-low.o : omp-low.c $(CONFIG_H) $(SYSTEM_H) coretypes.h 
$(TM_H) $(TREE_H) \
$(RTL_H) $(GIMPLE_H) $(TREE_INLINE_H) langhooks.h $(DIAGNOSTIC_CORE_H) \
$(TREE_SSA_H) $(FLAGS_H) $(EXPR_H) $(DIAGNOSTIC_CORE_H) \
$(TREE_PASS_H) $(GGC_H) $(EXCEPT_H) $(SPLAY_TREE_H) $(OPTABS_H) \
+   $(PRETTY_PRINT_H) \
$(CFGLOOP_H) tree-iterator.h $(TARGET_H) gt-omp-low.h
 tree-browser.o : tree-browser.c tree-browser.def $(CONFIG_H) $(SYSTEM_H) \
coretypes.h $(HASH_TABLE_H) $(TREE_H) $(TREE_PRETTY_PRINT_H)
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 50e8743..3db29cd 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -248,6 +248,68 @@ struct GTY(()) cgraph_clone_info
   bitmap combined_args_to_skip;
 };
 
+enum linear_stride_type {
+  LINEAR_STRIDE_NO,
+  LINEAR_STRIDE_YES_CONSTANT,
+  LINEAR_STRIDE_YES_VARIABLE
+};
+
+/* Function arguments in the original function of a SIMD clone.
+   Supplementary data for `struct simd_clone'.  */
+
+struct GTY(()) simd_clone_arg {
+  /* A SIMD clone's argument can be either linear (constant or
+ variable), uniform, or vector.  If the argument is neither linear
+ or uniform, the default is vector.  */
+
+  /* If the linear stride is a constant, `linear_stride' is
+ LINEAR_STRIDE_YES_CONSTANT, and `linear_stride_num' holds
+ the numeric stride.
+
+ If the linear stride is variable, `linear_stride' is
+ LINEAR_STRIDE_YES_VARIABLE, and `linear_stride_num' contains
+ the function argument containing the stride (as an index into the
+ function arguments starting at 0).
+
+ Otherwise, `linear_stride' is LINEAR_STRIDE_NO 

Re: OMP4/cilkplus: simd clone function mangling

2013-09-26 Thread Aldy Hernandez

+  /* To distinguish from an OpenMP simd clone, Cilk Plus functions to
+ be cloned have a distinctive artificial label in addition to omp
+ declare simd.  */
+  bool cilk_clone = flag_enable_cilkplus
+     && lookup_attribute ("cilk plus elemental",
+			  DECL_ATTRIBUTES (new_node->symbol.decl));
+  if (cilk_clone)
+    remove_attribute ("cilk plus elemental",
+		      DECL_ATTRIBUTES (new_node->symbol.decl));


Oh yeah, rth had asked me why I remove the attribute.  My initial 
thoughts were that whether or not a function is a simd clone can be 
accessed through the cgraph bits (node->simdclone != NULL for the 
clone, and node->has_simd_clones for the parent).  No sense keeping 
the attribute.  But I can leave it if you think it's better.


Aldy


Re: [PATCH, powerpc] Rework#2 VSX scalar floating point support, patch #3

2013-09-26 Thread Michael Meissner
I discovered that I was setting the wv/wu constraints incorrectly to
ALTIVEC_REGS, which leads to reload failures in some cases.

Is this patch ok to apply along with the previous patch assuming it bootstraps
and has no regressions with make check?  It builds the programs that had
failures with the previous patch.

2013-09-26  Michael Meissner  meiss...@linux.vnet.ibm.com

* config/rs6000/rs6000.c (rs6000_init_hard_regno_mode_ok): Don't
allow wv/wu constraints to be ALTIVEC_REGISTERS unless DF/SF can
occupy the Altivec registers.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000-builtin.def
===
--- gcc/config/rs6000/rs6000-builtin.def(revision 202846)
+++ gcc/config/rs6000/rs6000-builtin.def(working copy)
@@ -1209,9 +1209,9 @@ BU_VSX_1 (XVRSPIZ,  xvrspiz,CONS
 
 BU_VSX_1 (XSRDPI,	"xsrdpi",	CONST,	vsx_xsrdpi)
 BU_VSX_1 (XSRDPIC,	"xsrdpic",	CONST,	vsx_xsrdpic)
-BU_VSX_1 (XSRDPIM,	"xsrdpim",	CONST,	vsx_floordf2)
-BU_VSX_1 (XSRDPIP,	"xsrdpip",	CONST,	vsx_ceildf2)
-BU_VSX_1 (XSRDPIZ,	"xsrdpiz",	CONST,	vsx_btruncdf2)
+BU_VSX_1 (XSRDPIM,	"xsrdpim",	CONST,	floordf2)
+BU_VSX_1 (XSRDPIP,	"xsrdpip",	CONST,	ceildf2)
+BU_VSX_1 (XSRDPIZ,	"xsrdpiz",	CONST,	btruncdf2)
 
 /* VSX predicate functions.  */
 BU_VSX_P (XVCMPEQSP_P,   xvcmpeqsp_p,CONST,  vector_eq_v4sf_p)
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 202874)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -2394,13 +2394,17 @@ rs6000_init_hard_regno_mode_ok (bool glo
   rs6000_constraints[RS6000_CONSTRAINT_wa] = VSX_REGS;
   rs6000_constraints[RS6000_CONSTRAINT_wd] = VSX_REGS;
   rs6000_constraints[RS6000_CONSTRAINT_wf] = VSX_REGS;
-  rs6000_constraints[RS6000_CONSTRAINT_wv] = ALTIVEC_REGS;
 
   if (TARGET_VSX_TIMODE)
rs6000_constraints[RS6000_CONSTRAINT_wt] = VSX_REGS;
 
-  rs6000_constraints[RS6000_CONSTRAINT_ws]
-   = (TARGET_UPPER_REGS_DF) ? VSX_REGS : FLOAT_REGS;
+  if (TARGET_UPPER_REGS_DF)
+   {
+ rs6000_constraints[RS6000_CONSTRAINT_ws] = VSX_REGS;
+ rs6000_constraints[RS6000_CONSTRAINT_wv] = ALTIVEC_REGS;
+   }
+  else
+   rs6000_constraints[RS6000_CONSTRAINT_ws] = FLOAT_REGS;
 }
 
   /* Add conditional constraints based on various options, to allow us to
@@ -2420,12 +2424,16 @@ rs6000_init_hard_regno_mode_ok (bool glo
   if (TARGET_POWERPC64)
 rs6000_constraints[RS6000_CONSTRAINT_wr] = GENERAL_REGS;
 
-  if (TARGET_P8_VECTOR)
+  if (TARGET_P8_VECTOR && TARGET_UPPER_REGS_SF)
 {
   rs6000_constraints[RS6000_CONSTRAINT_wu] = ALTIVEC_REGS;
-  rs6000_constraints[RS6000_CONSTRAINT_wy]
-   = rs6000_constraints[RS6000_CONSTRAINT_ww]
-   = (TARGET_UPPER_REGS_SF) ? VSX_REGS : FLOAT_REGS;
+  rs6000_constraints[RS6000_CONSTRAINT_wy] = VSX_REGS;
+  rs6000_constraints[RS6000_CONSTRAINT_ww] = VSX_REGS;
+}
+  else if (TARGET_P8_VECTOR)
+{
+  rs6000_constraints[RS6000_CONSTRAINT_wy] = FLOAT_REGS;
+  rs6000_constraints[RS6000_CONSTRAINT_ww] = FLOAT_REGS;
 }
   else if (TARGET_VSX)
 rs6000_constraints[RS6000_CONSTRAINT_ww] = FLOAT_REGS;
Index: gcc/config/rs6000/rs6000.h
===
--- gcc/config/rs6000/rs6000.h  (revision 202846)
+++ gcc/config/rs6000/rs6000.h  (working copy)
@@ -617,6 +617,25 @@ extern int rs6000_vector_align[];
  || rs6000_cpu == PROCESSOR_PPC8548)
 
 
+/* Whether SF/DF operations are supported on the E500.  */
+#define TARGET_SF_SPE	(TARGET_HARD_FLOAT && TARGET_SINGLE_FLOAT	\
+			 && !TARGET_FPRS)
+
+#define TARGET_DF_SPE	(TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT	\
+			 && !TARGET_FPRS && TARGET_E500_DOUBLE)
+
+/* Whether SF/DF operations are supported by the normal floating point unit
+   (or the vector/scalar unit).  */
+#define TARGET_SF_FPR	(TARGET_HARD_FLOAT && TARGET_FPRS		\
+			 && TARGET_SINGLE_FLOAT)
+
+#define TARGET_DF_FPR	(TARGET_HARD_FLOAT && TARGET_FPRS		\
+			 && TARGET_DOUBLE_FLOAT)
+
+/* Whether SF/DF operations are supported by any hardware.  */
+#define TARGET_SF_INSN (TARGET_SF_FPR || TARGET_SF_SPE)
+#define TARGET_DF_INSN (TARGET_DF_FPR || TARGET_DF_SPE)
+
 /* Which machine supports the various reciprocal estimate instructions.  */
 #define TARGET_FRES	(TARGET_HARD_FLOAT && TARGET_PPC_GFXOPT \
			 && TARGET_FPRS && TARGET_SINGLE_FLOAT)


Re: [PATCH, powerpc] Rework#2 VSX scalar floating point support, patch #3

2013-09-26 Thread Michael Meissner
Just to be clear, I was only asking about the change in rs6000.c.  The other
two changes (rs6000-builtins.def, rs6000.h) will be part of the next patch
set.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH, i386, MPX 1/X] Support of Intel MPX ISA. 2/2 New registers and instructions

2013-09-26 Thread Uros Bizjak
On Tue, Sep 17, 2013 at 10:41 AM, Ilya Enkovich enkovich@gmail.com wrote:

  The x86 part looks mostly OK (I have a couple of comments bellow), but
  please first get target-independent changes reviewed and committed.
 
  Do you mean I should move bound type and mode declaration into a separate 
  patch?

 Yes, target-independent part (middle end) has to go through the
 separate review to check if this part is OK. The target-dependent part
 uses the infrastructure from the middle end, so it can go into the
 code base only after target-independent parts are committed.

 I sent a separate patch for bound type and mode class 
 (http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01268.html). Here is target 
 part of the patch with fixes you mentioned. Does it look OK?

 Bootstrapped and checked on linux-x86_64. Still shows incorrect length 
 attribute computation (described here 
 http://gcc.gnu.org/ml/gcc/2013-07/msg00311.html).

Please look at the attached patch that solves length computation
problem. The patch also implements length calculation in a generic
way, as proposed earlier.

The idea is to calculate the total insn length via the generic length
attribute calculation from the length_nobnd attribute, but only if
the length_nobnd attribute is nonzero. This way, we are able to decorate
bnd-prefixed instructions with the length_nobnd attribute, and the generic
part will automatically call the ix86_bnd_prefixed_insn_p predicate with
the current insn pattern. I also believe that this approach is the most
flexible one for decorating future patterns.

The patch adds new attribute to a couple of patterns to illustrate its usage.

Please test this approach. Modulo length calculations, improved by the
patch in this message, I have no further comments, but please repost
complete (target part) of your patch.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 202953)
+++ config/i386/i386.md (working copy)
@@ -562,12 +562,19 @@
 ]
 (const_int 1)))
 
+;; When this attribute is set, calculate total insn length from
+;; length_nobnd attribute, prefixed with eventual bnd prefix byte
+(define_attr "length_nobnd" "" (const_int 0))
+
 ;; The (bounding maximum) length of an instruction in bytes.
 ;; ??? fistp and frndint are in fact fldcw/{fistp,frndint}/fldcw sequences.
 ;; Later we may want to split them and compute proper length as for
 ;; other insns.
(define_attr "length" ""
-  (cond [(eq_attr "type" "other,multi,fistp,frndint")
+  (cond [(eq_attr "length_nobnd" "!0")
+	   (plus (symbol_ref ("ix86_bnd_prefixed_insn_p (insn)"))
+		 (attr "length_nobnd"))
+	 (eq_attr "type" "other,multi,fistp,frndint")
	   (const_int 16)
	 (eq_attr "type" "fcmp")
	   (const_int 4)
@@ -10683,7 +10690,7 @@
  "%+j%C1\t%l0"
  [(set_attr "type" "ibr")
   (set_attr "modrm" "0")
-   (set (attr "length")
+   (set (attr "length_nobnd")
   (if_then_else (and (ge (minus (match_dup 0) (pc))
  (const_int -126))
  (lt (minus (match_dup 0) (pc))
@@ -10701,7 +10708,7 @@
  "%+j%c1\t%l0"
  [(set_attr "type" "ibr")
   (set_attr "modrm" "0")
-   (set (attr "length")
+   (set (attr "length_nobnd")
   (if_then_else (and (ge (minus (match_dup 0) (pc))
  (const_int -126))
  (lt (minus (match_dup 0) (pc))
@@ -11623,7 +11630,7 @@
   [(simple_return)]
  "reload_completed"
  "ret"
-  [(set_attr "length" "1")
+  [(set_attr "length_nobnd" "1")
   (set_attr "atom_unit" "jeu")
   (set_attr "length_immediate" "0")
   (set_attr "modrm" "0")])


[Patch] Regex Fixes on constants and no-backref in extended

2013-09-26 Thread Tim Shen
Just follow the standard :)

Booted and tested under -m32, -m64.

Thank you!


-- 
Tim Shen


a.patch
Description: Binary data


Re: [Patch] Regex Fixes on constants and no-backref in extended

2013-09-26 Thread Jonathan Wakely
On 26 September 2013 22:28, Tim Shen wrote:
 Just follow the standard :)

 Booted and tested under -m32, -m64.

The ChangeLog entry says "stanard"; with that fixed this is OK to
commit, thanks!


Re: [Patch] Regex Fixes on constants and no-backref in extended

2013-09-26 Thread Tim Shen
On Thu, Sep 26, 2013 at 5:37 PM, Jonathan Wakely jwakely@gmail.com wrote:
 The ChangeLog entry says stanard, with that fixed this is OK to
 commit, thanks!

Committed.


-- 
Tim Shen


Re: [PATCH] Trivial cleanup

2013-09-26 Thread Andrew MacLeod

On 09/26/2013 10:21 AM, Richard Biener wrote:

On Thu, Sep 26, 2013 at 4:15 PM, Michael Matz m...@suse.de wrote:

Hi,

On Wed, 25 Sep 2013, Jeff Law wrote:


I was going to bring it up at some point too.  My preference is
strongly to simply eliminate the space on methods...

Which wouldn't be so weird: in the libstdc++-v3 code we do it all the time.

Yea.  I actually reviewed the libstdc++ guidelines to see where they differed
from GNU's C guidelines.

I'm strongly in favor of dropping the horizontal whitespace between the
method name and its open paren when the result is then dereferenced.
i.e. foo.last()->e rather than foo.last ()->e.

I'd prefer to not write in this style at all, like Jakub.  If we must
absolutely have it, then I agree that the space before _empty_ parentheses
are ugly if followed by references.  I.e. I'd like to see spaces before
parens as is customary, except in one case: empty parens in the middle of
expressions (which don't happen very often right now in GCC, and hence
wouldn't introduce a coding style confusion):

do.this ();
give.that()->flag;
get.list (one)->clear ();

I'd prefer to not have further references to return values be applied,
though (as in, the parentheses should be the end of statement), which
would avoid the topic (at the expensive of having to invent names for
those temporaries, or to write trivial wrapper methods contracting several
method calls).

Seconded, even give.that()->flag; is ugly.


I don't find it ugly :-)

so my example would end up being
   int unsignedsrcp = ptrvar.type().type().type_unsigned ();  ?
I can live with this suggestion...  it's the sequence of 2 or 3 or 4
empty parens with spaces that I really found distasteful.


Andrew


Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-09-26 Thread Jan Hubicka
  As for COMDAT merging, i would like to see the patch.  I am experimenting
  now with a patch to also privatize COMDATs during -fprofile-generate to
  avoid problems with lost profiles mentioned above.
 
 
 Do you mean you privatize every COMDAT function in the profile-generate?
 We discussed this idea internally and we thought it would not work for
 large applications (like in google) due to size.

Yes, Martin and I plan to test this on Firefox.  In a way you already have all
the COMDAT functions unshared in the object files, so the resulting binary
should not be completely off the limits.  But I do not have any quantitative
data yet, since we hit a bug in constant folding and devirtualization that I
fixed in the meantime, but we did not re-run the tests yet.

 
  As for context sensitivity, one approach would be to have two sets of
  counters for every comdat - one merged globally and one counting local
  instances.  We can then privatize always and at profile read in stage
  just clone every comdat and have two instances - one for offline copy
  and one for inlining.
 
 
 In my implementation, I also allow multiple sets of COMDAT profile
 co-existing in one compilation.
 Due to the auxiliary modules in LIPO, I actually have more than two.

How do auxiliary modules work?
 
 But I'm wondering how do you determine which profile to use for each
 call-site -- the inline decision may not
 be the same for profile-generate and profile-use compilation.

My suggestion was to simply use the module local profile for all inline sites
within the given module and the global profile for the offline copy of the
function (that one will, in the case it survives linking, be shared across
all the modules anyway).

I think this may work in the cases where, e.g., the use of hash templates in
one module is very different (in average size) from another module.
I did not really put much effort into it - I currently worry primarily about
the cases where the profile is lost completely because it gets attached to a
function not surviving final linking (or because we inline something we did
not inline at profile time).

As for context sensitivity, we may try to consider developing a more
consistent solution for this.  COMDAT functions are definitely not the only
ones that may exhibit context-sensitive behaviour.
One approach would be to always have multiple counters for each function and
hash based on backtraces collected by the indirect call profiling
instrumentation.  In a way this is the same as path profiling, but that would
definitely add quite some overhead, and we will need to think of a reasonable
way to represent this within the compiler.

How do you decide what functions you want to have multiple profiles for?

Honza


Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-09-26 Thread Jan Hubicka
 
 Why not just have probably_never_executed_bb_p return simply return
 false bb-frequency is non-zero (right now it does the opposite -

We want to have frequencies guessed for functions that were not trained
in the profiling run (that was a patch I posted earlier that I think did not
go in yet).

Currently I return true when the frequency indicates that the BB is executed
at least in 1/4th of all executions.  With the cases discussed I see we may
need to reduce this threshold.  In general I do not much like hard tests for
0, because the meaning of 0 depends on REG_BR_FREQ_BASE, which is supposed to
be changeable, and we may want to make frequencies sreal, too.

I suppose we may introduce a --param for this.  You are also right that I
should update probably_never_executed_edge_p (I intended to, but obviously
the code ended up in mainline accidentally).

I however saw at least one case of jump threading where this trick did not
help: the jump threading update confused itself by scaling via counts rather
than frequencies and ended up dropping everything to 0.  This makes it
more tempting to try to go with sreals for those.

Honza

 returns true when bb-frequency is 0)? Making this change removed a
 bunch of other failures. With this change as well, there are only 3
 cases that still fail with 1 train run that pass with 100. Need to
 look at those.
 
 
  Will you look into logic of do_jump or shall I try to dive in?
 
 I can take a look, but probably won't have a chance until late this
 week. If you don't get to it before then I will see if I can figure
 out why it is applying the branch probabilities this way.
 
 Teresa
 
 
  Honza
 
 
 
 -- 
 Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[google gcc-4_8] fix size_estimation for builtin_expect

2013-09-26 Thread Rong Xu
Hi,

builtin_expect should be a NOP in size estimation. Indeed, the call
stmt itself has 0 weight in size and time. But it may introduce
an extra relational expr which has non-zero size/time. The end result
is: with and without builtin_expect, we get different size/time estimates
for early inlining.

This patch fixes this problem.
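
To illustrate, here is a minimal C sketch of the pattern (do_work/do_rare are
made-up helpers); the comparison feeding __builtin_expect is gimplified into a
separate statement that only exists to feed the hint, yet it was still being
charged in the size/time estimate, unlike the plain "if (x > 0)" form:

/* Minimal sketch; do_work/do_rare are hypothetical.  */
extern int do_work (int);
extern int do_rare (int);

int
f (int x)
{
  if (__builtin_expect (x > 0, 1))   /* becomes: t = x > 0; if (t != 0) ...  */
    return do_work (x);
  return do_rare (x);
}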

-Rong
2013-09-26  Rong Xu  x...@google.com

* ipa-inline-analysis.c (estimate_function_body_sizes): fix
the size estimation for builtin_expect.

Index: ipa-inline-analysis.c
===
--- ipa-inline-analysis.c   (revision 202638)
+++ ipa-inline-analysis.c   (working copy)
@@ -2266,6 +2266,8 @@ estimate_function_body_sizes (struct cgraph_node *
   /* Estimate static overhead for function prologue/epilogue and alignment. */
   int overhead = PARAM_VALUE (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE);
   int size = overhead;
+  gimple fix_expect_builtin;
+
   /* Benefits are scaled by probability of elimination that is in range
  0,2.  */
   basic_block bb;
@@ -2359,14 +2361,73 @@ estimate_function_body_sizes (struct cgraph_node *
}
}
 
+  fix_expect_builtin = NULL;
   for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi))
{
  gimple stmt = gsi_stmt (bsi);
+ if (gimple_call_builtin_p (stmt, BUILT_IN_EXPECT))
+{
+  tree var = gimple_call_lhs (stmt);
+  tree arg = gimple_call_arg (stmt, 0);
+  use_operand_p use_p;
+  gimple use_stmt;
+  bool match = false;
+  bool done = false;
+  gcc_assert (var && arg);
+  gcc_assert (TREE_CODE (var) == SSA_NAME);
+
+  while (TREE_CODE (arg) == SSA_NAME)
+{
+  gimple stmt_tmp = SSA_NAME_DEF_STMT (arg);
+  switch (gimple_assign_rhs_code (stmt_tmp))
+{
+  case LT_EXPR:
+  case LE_EXPR:
+  case GT_EXPR:
+  case GE_EXPR:
+  case EQ_EXPR:
+  case NE_EXPR:
+match = true;
+done = true;
+break;
+  case NOP_EXPR:
+break;
+  default:
+done = true;
+break;
+}
+  if (done)
+break;
+  arg = gimple_assign_rhs1 (stmt_tmp);
+}
+
+  if (match && single_imm_use (var, &use_p, &use_stmt))
+{
+  if (gimple_code (use_stmt) == GIMPLE_COND)
+{
+  fix_expect_builtin = use_stmt;
+}
+}
+
+  /* We should see one builtin_expect call in one bb.  */
+  break;
+}
+}
+
+  for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi))
+   {
+ gimple stmt = gsi_stmt (bsi);
  int this_size = estimate_num_insns (stmt, eni_size_weights);
  int this_time = estimate_num_insns (stmt, eni_time_weights);
  int prob;
  struct predicate will_be_nonconstant;
 
+ if (stmt == fix_expect_builtin)
+{
+  this_size--;
+  this_time--;
+}
+
  if (dump_file  (dump_flags  TDF_DETAILS))
{
  fprintf (dump_file,   );


[google gcc-4_8] alternate hirate for builtin_expert

2013-09-26 Thread Rong Xu
Hi,

The current default probability for builtin_expect is 0.9996.
This makes the freq of the unlikely bb very low (4), which
suppresses the inlining of any calls within those bb.

We used FDO data to measure the branch probability for
the branches annotated with builtin_expect.  For google
internal benchmarks, the weighted average
(the profile count value as the weight) is 0.9081.

The Linux kernel is another program that is heavily annotated
with builtin_expect.  We measured its weighted average as 0.8717,
using google search as the workload.


This patch sets the alternate hirate probability for builtin_expect
to 90%. With the alternate hirate, we measured performance
improvements for google benchmarks and the Linux kernel.


  -Rong
2013-09-26  Rong Xu  x...@google.com

* params.def (BUILTIN_EXPECT_PROBABILITY_RELAXED): New parameter.
* predict.c (tree_predict_by_opcode): Use alternate hirate
probability for builtin_expect when the parameter is set.

Index: params.def
===
--- params.def  (revision 202638)
+++ params.def  (working copy)
@@ -483,6 +483,10 @@ DEFPARAM(TRACER_MIN_BRANCH_PROBABILITY,
 "tracer-min-branch-probability",
 "Stop forward growth if the probability of best edge is less than this threshold (in percent). Used when profile feedback is not available",
 50, 0, 100)
+DEFPARAM(BUILTIN_EXPECT_PROBABILITY_RELAXED,
+"builtin-expect-probability-relaxed",
+"Set the estimated probability for builtin expect. By default using PROB_VERY_LIKELY, while value of 1 using HIRATE(90) probability.",
+0, 0, 1)
 
 /* The maximum number of incoming edges to consider for crossjumping.  */
 DEFPARAM(PARAM_MAX_CROSSJUMP_EDGES,
Index: predict.c
===
--- predict.c   (revision 202638)
+++ predict.c   (working copy)
@@ -1950,10 +1950,17 @@ tree_predict_by_opcode (basic_block bb)
   BITMAP_FREE (visited);
   if (val)
 {
+  enum br_predictor predictor;
+
+  if (PARAM_VALUE (BUILTIN_EXPECT_PROBABILITY_RELAXED) == 0)
+predictor = PRED_BUILTIN_EXPECT;
+  else
+predictor = PRED_BUILTIN_EXPECT_RELAXED;
+
   if (integer_zerop (val))
-   predict_edge_def (then_edge, PRED_BUILTIN_EXPECT, NOT_TAKEN);
+   predict_edge_def (then_edge, predictor, NOT_TAKEN);
   else
-   predict_edge_def (then_edge, PRED_BUILTIN_EXPECT, TAKEN);
+   predict_edge_def (then_edge, predictor, TAKEN);
   return;
 }
   /* Try pointer heuristic.


Re: [google gcc-4_8] fix size_estimation for builtin_expect

2013-09-26 Thread Jan Hubicka
 Hi,
 
 builtin_expect should be a NOP in size_estimation. Indeed, the call
 stmt itself is 0 weight in size and time. But it may introduce
 an extra relation expr which has non-zero size/time. The end result
 is: for w/ and w/o builtin_expect, we have different size/time estimation
 for early inlining.
 
 This patch fixes this problem.
 
 -Rong

 2013-09-26  Rong Xu  x...@google.com
 
   * ipa-inline-analysis.c (estimate_function_body_sizes): fix
 the size estimation for builtin_expect.

This seems fine with a comment in the code explaining what it is about.
I also think we want to support multiple builtin_expects in a BB, so perhaps
we want to have a pointer set of statements to ignore?
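For illustration, an untested sketch against the pointer-set.h API (the
function names here are only placeholders, and the surrounding scan logic is
elided):

/* Untested sketch: collect every GIMPLE_COND that merely consumes a
   __builtin_expect result into a pointer set, so several expects per BB
   can be discounted in the costing loop.  */
#include "pointer-set.h"

static void
record_expect_cond (struct pointer_set_t *expect_conds, gimple use_stmt)
{
  if (use_stmt && gimple_code (use_stmt) == GIMPLE_COND)
    pointer_set_insert (expect_conds, use_stmt);
}

static bool
discount_stmt_p (struct pointer_set_t *expect_conds, gimple stmt)
{
  /* In the size/time loop: when true, do this_size--; this_time--;  */
  return expect_conds && pointer_set_contains (expect_conds, stmt);
}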

To avoid spaghetti code, please just move the new logic into separate functions.

Honza
 
 Index: ipa-inline-analysis.c
 ===
 --- ipa-inline-analysis.c (revision 202638)
 +++ ipa-inline-analysis.c (working copy)
 @@ -2266,6 +2266,8 @@ estimate_function_body_sizes (struct cgraph_node *
/* Estimate static overhead for function prologue/epilogue and alignment. 
 */
int overhead = PARAM_VALUE (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE);
int size = overhead;
 +  gimple fix_expect_builtin;
 +
/* Benefits are scaled by probability of elimination that is in range
   0,2.  */
basic_block bb;
 @@ -2359,14 +2361,73 @@ estimate_function_body_sizes (struct cgraph_node *
   }
   }
  
 +  fix_expect_builtin = NULL;
for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (bsi))
   {
 gimple stmt = gsi_stmt (bsi);
 +   if (gimple_call_builtin_p (stmt, BUILT_IN_EXPECT))
 +{
 +  tree var = gimple_call_lhs (stmt);
 +  tree arg = gimple_call_arg (stmt, 0);
 +  use_operand_p use_p;
 +  gimple use_stmt;
 +  bool match = false;
 +  bool done = false;
 +  gcc_assert (var  arg);
 +  gcc_assert (TREE_CODE (var) == SSA_NAME);
 +
 +  while (TREE_CODE (arg) == SSA_NAME)
 +{
 +  gimple stmt_tmp = SSA_NAME_DEF_STMT (arg);
 +  switch (gimple_assign_rhs_code (stmt_tmp))
 +{
 +  case LT_EXPR:
 +  case LE_EXPR:
 +  case GT_EXPR:
 +  case GE_EXPR:
 +  case EQ_EXPR:
 +  case NE_EXPR:
 +match = true;
 +done = true;
 +break;
 +  case NOP_EXPR:
 +break;
 +  default:
 +done = true;
 +break;
 +}
 +  if (done)
 +break;
 +  arg = gimple_assign_rhs1 (stmt_tmp);
 +}
 +
 +  if (match  single_imm_use (var, use_p, use_stmt))
 +{
 +  if (gimple_code (use_stmt) == GIMPLE_COND)
 +{
 +  fix_expect_builtin = use_stmt;
 +}
 +}
 +
 +  /* we should see one builtin_expert call in one bb.  */
 +  break;
 +}
 +}
 +
 +  for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (bsi))
 + {
 +   gimple stmt = gsi_stmt (bsi);
 int this_size = estimate_num_insns (stmt, eni_size_weights);
 int this_time = estimate_num_insns (stmt, eni_time_weights);
 int prob;
 struct predicate will_be_nonconstant;
  
 +   if (stmt == fix_expect_builtin)
 +{
 +  this_size--;
 +  this_time--;
 +}
 +
 if (dump_file  (dump_flags  TDF_DETAILS))
   {
 fprintf (dump_file,   );



Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-09-26 Thread Rong Xu
On Thu, Sep 26, 2013 at 2:54 PM, Jan Hubicka hubi...@ucw.cz wrote:
  As for COMDAT merging, i would like to see the patch.  I am experimenting
  now with a patch to also privatize COMDATs during -fprofile-generate to
  avoid problems with lost profiles mentioned above.
 

 Do you mean you privatize every COMDAT function in the profile-generate?
 We discussed this idea internally and we thought it would not work for
 large applications (like in google) due to size.

 Yes, Martin and I plan to test this on firefox.  In a way you already have all
 the COMDAT functions unshared in the object files, so the resulting binary
 should not be completely off the limits.  But I do not have any quantitative
 data, yet, since we hit bug in constant folding and devirtualization I fixed 
 in
 meantime but we did not re-run the tests yet.

The linker removes a great number of duplicated copies, especially for those
template functions.
We don't have quantitative numbers either. But I'll collect some soon.


  As for context sensitivity, one approach would be to have two sets of
  counters for every comdat - one merged globally and one counting local
  instances.  We can then privatize always and at profile read in stage
  just clone every comdat and have two instances - one for offline copy
  and one for inlining.
 

 In my implementation, I also allow multiple sets of COMDAT profile
 co-existing in one compilation.
 Due to the auxiliary modules in LIPO, I actually have more than two.

 How does auxiliary modules work?

It pulls in multiple profiles from other compilation. So there might be multiple
inlined profiles.


 But I'm wondering how do you determine which profile to use for each
 call-site -- the inline decision may not
 be the same for profile-generate and profile-use compilation.

 My suggestion was to simply use the module local profile for all inline sites
 within the given module and the global profile for the offline copy of the
 function (that one will, in the case it survives linking, be shared across
 all the modules anyway).

For a simple example like:
callsite1 --> comdat_function_foo
callsite2 --> comdat_function_foo

callsite1 is inlined in profile-generate, so it has its own inlined
profile counters.
callsite2 is not inlined and its profile goes to the offline copy.
Say callsite1 is cold (0 counter) and callsite2 is hot.  Using the
local profile (the cold one) for callsite2 will not be correct.
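
A minimal C++ sketch of that situation (all names made up): both call sites
share one COMDAT instantiation, so a single merged profile for the offline
copy cannot reflect that one caller is cold and the other hot.

// Sketch only; lookup<int> is emitted as a COMDAT and may stay inlined at
// one call site while the other resolves to the offline copy.
#include <vector>

template <typename T>
int lookup (const std::vector<T> &v, T key)
{
  for (unsigned i = 0; i < v.size (); i++)
    if (v[i] == key)
      return i;
  return -1;
}

int callsite1 (const std::vector<int> &v)   // cold caller
{
  return lookup (v, 42);
}

int callsite2 (const std::vector<int> &v)   // hot caller, called in a loop
{
  int sum = 0;
  for (int k = 0; k < 1000; k++)
    sum += lookup (v, k);
  return sum;
}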


 I think this may work in the cases where i.e. use of hash templates in one
 module is very different (in average size) from other module.
 I did not really put much effort into it - I currently worry primarily about
 the cases where profile is lost completely since it gets attached to a 
 function
 not surviving final linking (or because we inline something we did not inlined
 at profile time).

 As for context sensitivity, we may try to consider developing more consistent
 solution for this.  COMDAT functions are definitely not only that may exhibit
 context sensitive behaviour.
 One approach would be to always have multiple counters for each function and
 hash based on cbacktraces collected by indirect call profiling 
 instrumentation.
 In a way this is same path profiling, but that would definitely add quite some
 overhead + we will need to think of resonable way to represent this within
 compiler.

 How do you decide what functions you want to have multiple profiles for?

I do the instrumentation after ipa-inline for comdat functions, so I know
whether a callsite is inlined or not. In the profile-use phase, I also need
to provide the context (which module this is from) to pick the right profile.


 Honza


Re: [google gcc-4_8] alternate hirate for builtin_expert

2013-09-26 Thread Jan Hubicka
 Hi,
 
 Current default probably for builtin_expect is 0.9996.
 This makes the freq of unlikely bb very low (4), which
 suppresses the inlining of any calls within those bb.
 
 We used FDO data to measure the branch probably for
 the branch annotated with builtin_expert.
  For google
 internal benchmarks, the weight average
 (the profile count value as the weight) is 0.9081.
 
 Linux kernel is another program that is heavily annotated
 with builtin-expert. We measured its weight average as 0.8717,
   using google search as
 the workload.

This is interesting.  I was always unsure whether programmers use builtin_expect
more often to mark impossible paths (such as those leading to a crash) or just
unlikely paths.  Your data seems to suggest the latter.

We probably also ought to get analyze_brprob working again. It was always
useful for getting such data.
 
 
 This patch sets the alternate hirate probability for
 builtin_expert
 to 90%. With the alternate hirate, we measured performance
   improvement for google
 benchmarks and Linux kernel.
 
 
   -Rong
 2013-09-26  Rong Xu  x...@google.com
 
   * params.def (DEFPARAM): New.
   * params.def: New.
   * predict.c (tree_predict_by_opcode): Alternate 
 probablity hirate for builtin_expect.

This also seems reasonable for mainline.  Please add a comment
to the parameter explaining how the value was determined.
Please also add invoke.texi documentation.

For patches that seem reasonable for mainline in the FDO/IPA area,
I would appreciate it if you just added me to CC, so I do not lose
track of them.
Honza


Re: [google gcc-4_8] alternate hirate for builtin_expert

2013-09-26 Thread Xinliang David Li
This patch improves linux kernel performance with a large workload, so
it is good to first submit this to trunk and backport it.

thanks,

David

On Thu, Sep 26, 2013 at 3:27 PM, Jan Hubicka hubi...@ucw.cz wrote:
 Hi,

 Current default probably for builtin_expect is 0.9996.
 This makes the freq of unlikely bb very low (4), which
 suppresses the inlining of any calls within those bb.

 We used FDO data to measure the branch probably for
 the branch annotated with builtin_expert.
  For google
 internal benchmarks, the weight average
 (the profile count value as the weight) is 0.9081.

 Linux kernel is another program that is heavily annotated
 with builtin-expert. We measured its weight average as 0.8717,
   using google search as
 the workload.

 This is interesting.  I was always unsure if programmers use builtin_expect
 more often to mark an impossible paths (as those leading to crash) or just
 unlikely paths.  Your data seems to suggest the second.

 We probably also ought to get analyze_brprob working again. It was always
 useful to get such a data.


 This patch sets the alternate hirate probability for
 builtin_expert
 to 90%. With the alternate hirate, we measured performance
   improvement for google
 benchmarks and Linux kernel.


   -Rong
 2013-09-26  Rong Xu  x...@google.com

   * params.def (DEFPARAM): New.
   * params.def: New.
   * predict.c (tree_predict_by_opcode): Alternate
 probablity hirate for builtin_expect.

 This also seems resonable for mainline.  Please add a comment
 to the parameter explaining how the value was determined.
 Please also add invoke.texi documentation.

 For patches that seems resonable for mainline in FDO/IPA area,
 i would apprechiate if you just added me to CC, so I do not lose
 track of them.
 Honza


Re: [google gcc-4_8] alternate hirate for builtin_expert

2013-09-26 Thread Xinliang David Li
It might be worth extending __builtin_expect to take more parameters
(a sketch of the proposed forms is given below):
1) an argument to specify the actual probability: __builtin_expect (x, 10, 0.6)
2) a more general way of doing this is to allow specifying multiple
values, with the probability determined by the number of occurrences:
__builtin_expect (x, 10, 10, 20) -- tells the compiler x is expected to
be 10 66% of the time, and 20 33% of the time.
3) a special value could be reserved to indicate whether the branch is
predictable or not.
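
A sketch of how these forms might look at a call site; only the two-argument
form exists in GCC today, the extended variants are purely hypothetical:

extern void fast_path (void);
extern void slow_path (void);

void
dispatch (int x)
{
  /* Existing two-argument form.  */
  if (__builtin_expect (x == 10, 1))
    fast_path ();
  else
    slow_path ();

  /* Hypothetical extensions sketched above (not implemented):
       __builtin_expect (x, 10, 0.6);     -- expected value plus probability
       __builtin_expect (x, 10, 10, 20);  -- x is 10 about 66% of the time,
                                             20 about 33% of the time
     plus a reserved value to mark the branch as unpredictable.  */
}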

David

On Thu, Sep 26, 2013 at 3:27 PM, Jan Hubicka hubi...@ucw.cz wrote:
 Hi,

 Current default probably for builtin_expect is 0.9996.
 This makes the freq of unlikely bb very low (4), which
 suppresses the inlining of any calls within those bb.

 We used FDO data to measure the branch probably for
 the branch annotated with builtin_expert.
  For google
 internal benchmarks, the weight average
 (the profile count value as the weight) is 0.9081.

 Linux kernel is another program that is heavily annotated
 with builtin-expert. We measured its weight average as 0.8717,
   using google search as
 the workload.

 This is interesting.  I was always unsure if programmers use builtin_expect
 more often to mark an impossible paths (as those leading to crash) or just
 unlikely paths.  Your data seems to suggest the second.

 We probably also ought to get analyze_brprob working again. It was always
 useful to get such a data.


 This patch sets the alternate hirate probability for
 builtin_expert
 to 90%. With the alternate hirate, we measured performance
   improvement for google
 benchmarks and Linux kernel.


   -Rong
 2013-09-26  Rong Xu  x...@google.com

   * params.def (DEFPARAM): New.
   * params.def: New.
   * predict.c (tree_predict_by_opcode): Alternate
 probablity hirate for builtin_expect.

 This also seems resonable for mainline.  Please add a comment
 to the parameter explaining how the value was determined.
 Please also add invoke.texi documentation.

 For patches that seems resonable for mainline in FDO/IPA area,
 i would apprechiate if you just added me to CC, so I do not lose
 track of them.
 Honza


Re: [gomp4] Library side of depend clause support

2013-09-26 Thread Richard Henderson
On 09/26/2013 11:36 AM, Jakub Jelinek wrote:
 +struct gomp_task;
  struct gomp_taskgroup;
 +struct htab;
 +
 +struct gomp_task_depend_entry
 +{
 +  void *addr;
 +  struct gomp_task_depend_entry *next;
 +  struct gomp_task_depend_entry *prev;
 +  struct gomp_task *task;
 +  bool is_in;
 +  bool redundant;
 +};

I'm a bit confused about the combination of linked lists and reallocated
arrays.  When did you decide to use what?

 +  if ((flags & 8) && thr->task && thr->task->depend_hash)
 +	{
 +	  struct gomp_task *parent = thr->task;
 +	  struct gomp_task_depend_entry elem, *ent = NULL;
 +	  size_t ndepend = (uintptr_t) depend[0];
 +	  size_t nout = (uintptr_t) depend[1];
 +	  size_t i;
 +	  gomp_mutex_lock (&team->task_lock);
 +	  for (i = 0; i < ndepend; i++)
 +	    {
 +	      elem.addr = depend[i + 2];
 +	      ent = htab_find (parent->depend_hash, &elem);
 +	      for (; ent; ent = ent->next)
 +		if (i >= nout && ent->is_in)
 +		  continue;
 +		else
 +		  break;

I wonder if we ought always defer tasks with dependencies and skip this lock
and search?  Unless the taskgroup is truly weird, we *are* going to have
dependencies between the tasks.  Dunno what exactly to do with final_tasks
that have unfulfilled dependencies...

I also think it would significantly clean up the code to declare a struct with
a variable tail for the depend argument.  That would eliminate all of the
casting to uintptr_t and give names to the first two entries.

 +	      if (tsk->dependers == NULL)
 +		{
 +		  tsk->dependers
 +		    = gomp_malloc (8 * sizeof (struct gomp_task *));
 +		  tsk->dependers[0]
 +		    = (struct gomp_task *) (uintptr_t) 1;
 +		  tsk->dependers[1]
 +		    = (struct gomp_task *) (uintptr_t) (8 - 2);
 +		  tsk->dependers[2] = task;
 +		  task->num_dependees++;
 +		  continue;

Another place for which a struct with variable tail would significantly clean
up the code.  And here's where I wonder why you're using realloc'd arrays here
as opposed to another linked list?

Perhaps what we need are smaller linked list entries like

  struct gomp_task_depend_node {
 struct gomp_task *task;
 struct gomp_task_depend_node *next;
 struct gomp_task_depend_node *prev;
  };

and a different hash table entry like

  struct gomp_task_depend_head {
void *addr;
struct gomp_task_depend_node *outs;
struct gomp_task_depend_node *ins;
size_t n_ins;
  };

If we scan the ndepend entries twice, we can find out how many nodes we need,
and allocate them with the task like you do now.  Scanning the ndepend array
twice can be sped up by only looking up the hash table entries once -- allocate
a local array of ndepend entries and cache the lookups.

I'd say we don't need a count of the n_outs because all of them on the list
must be sequentially dependent.  Thus any new task simply depends on the
previous task in the outs list.  Thus imo it makes sense to have ins/outs point
to the tail of the list as opposed to the head.

Is it really worthwhile to detect redundant dependencies?  It seems just as
easy to add multiple dependencies and let them just fall out naturally.

OTOH, perhaps you should just go ahead with this patch and we can evolve it
incrementally.  I don't see anything technically wrong with it.


r~


Re: [PATCH, powerpc] Rework#2 VSX scalar floating point support, patch #3

2013-09-26 Thread David Edelsohn
On Thu, Sep 26, 2013 at 4:51 PM, Michael Meissner
meiss...@linux.vnet.ibm.com wrote:
 I discovered that I was setting the wv/wu constraints incorrectly to
 ALTIVEC_REGS, which leads to reload failures in some cases.

 Is this patch ok to apply along with the previous patch assuming it bootstraps
 and has no regressions with make check?  It builds the programs that had
 failures with the previous patch.

 2013-09-26  Michael Meissner  meiss...@linux.vnet.ibm.com

 * config/rs6000/rs6000.c (rs6000_init_hard_regno_mode_ok): Don't
 allow wv/wu constraints to be ALTIVEC_REGISTERS unless DF/SF can
 occupy the Altivec registers.

Okay.

Can you add a testcase to catch this in the future?

Thanks, David


Re: [PATCH, powerpc] Rework#2 VSX scalar floating point support, patch #3

2013-09-26 Thread Michael Meissner
On Thu, Sep 26, 2013 at 06:56:37PM -0400, David Edelsohn wrote:
 On Thu, Sep 26, 2013 at 4:51 PM, Michael Meissner
 meiss...@linux.vnet.ibm.com wrote:
  I discovered that I was setting the wv/wu constraints incorrectly to
  ALTIVEC_REGS, which leads to reload failures in some cases.
 
  Is this patch ok to apply along with the previous patch assuming it 
  bootstraps
  and has no regressions with make check?  It builds the programs that had
  failures with the previous patch.
 
  2013-09-26  Michael Meissner  meiss...@linux.vnet.ibm.com
 
  * config/rs6000/rs6000.c (rs6000_init_hard_regno_mode_ok): Don't
  allow wv/wu constraints to be ALTIVEC_REGISTERS unless DF/SF can
  occupy the Altivec registers.
 
 Okay.
 
 Can you add a testcase to catch this in the future?

You only see it in big programs with aggressive optimizations.  I did not see it
during the normal testing (bootstrap, etc.).

The failure is reload complaining it can't find an Altivec register to spill if
the move pattern has an option for only Altivec registers.  It isn't like bad
code is silently generated.  I will check the 5 SPEC benchmarks that failed
with it to see if I can extract one module that shows it off.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [gomp4] Library side of depend clause support

2013-09-26 Thread Jakub Jelinek
On Thu, Sep 26, 2013 at 03:54:09PM -0700, Richard Henderson wrote:
 On 09/26/2013 11:36 AM, Jakub Jelinek wrote:
  +struct gomp_task;
   struct gomp_taskgroup;
  +struct htab;
  +
  +struct gomp_task_depend_entry
  +{
  +  void *addr;
  +  struct gomp_task_depend_entry *next;
  +  struct gomp_task_depend_entry *prev;
  +  struct gomp_task *task;
  +  bool is_in;
  +  bool redundant;
  +};
 
 I'm a bit confused about the combination of linked lists and reallocated
 arrays.  When did you decide to use what?

I initially wanted to use linked lists only, but while I can statically
preallocate the chains for the hash table, for the depender -> dependee
chains, where a task may depend on many other tasks, that would mean having to
allocate small structures (or pool allocate them, per team?).

 I wonder if we ought always defer tasks with dependencies and skip this lock
 and search?  Unless the taskgroup is truly weird, we *are* going to have
 dependencies between the tasks.  Dunno what exactly to do with final_tasks
 that have unfulfilled dependencies...

I think final tasks aren't a problem: if the parent is a final task, then
all the children are non-deferred, thus we never record any dependencies
and the test for that will be cheap too (because parent->depend_hash will be
NULL).  The problem is if (0) tasks; the spec says that they must be
non-deferred unless they depend on some earlier non-finished task.  But
the cost in that case is primarily in taking the lock/unlock; the search
will stop on the first dependency found.  If there aren't any, nothing will
be recorded and we don't jump to the defer label; if there are some, as soon
as we discover the first one we jump there.
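
For reference, a minimal example of the if (0) case under discussion; the
undeferred task still has to wait for the earlier out dependence:

#include <stdio.h>

int
main (void)
{
  int x = 0;

#pragma omp parallel
#pragma omp single
  {
#pragma omp task shared(x) depend(out: x)
    x = 42;

    /* Non-deferred because of if (0), but it must still wait for the
       depend(out: x) task above to finish before reading x.  */
#pragma omp task if(0) shared(x) depend(in: x)
    printf ("%d\n", x);
  }
  return 0;
}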

 I also think it would significantly clean up the code to declare a struct with
 a variable tail for the depend argument.  That would eliminate all of the
 casting to uintptr_t and give names to the first two entries.

I agree if we keep using realloced vectors that flexible array would make it
cleaner.

 Perhaps what we need are smaller linked list entries like
 
   struct gomp_task_depend_node {
  struct gomp_task *task;
  struct gomp_task_depend_node *next;
  struct gomp_task_depend_node *prev;
   };

The dependee -> depender vectors resp. linked lists are just pushed to first
(the only thing needed during insertion is to have a cheap check whether the
last inserted task is the current one, to avoid having the same task multiple
times in the vector/linked list), and then just walked once when the
dependee finishes, so no removal is needed there and it can be freed at once;
thus, for a linked list, it would be enough to use a singly linked list for
that.  For the hash table chains, unless we want to always look up the hash
table and walk the chains for removal, we need a doubly linked list.
 
 and a different hash table entry like
 
   struct gomp_task_depend_head {
 void *addr;
 struct gomp_task_depend_node *outs;
 struct gomp_task_depend_node *ins;
 size_t n_ins;
   };

You mean that the hash table instead would contain the structures, or
pointers to these structures?  If the latter (not sure what n_ins would be
for), then we'd again need to pool alloc them.
 
 If we scan the ndepend entries twice, we can find out how many nodes we need,
 and allocate them with the task like you do now.  Scanning the ndepends array
 twice can be sped by only looking up the hash table entries once -- allocate a
 local array of size ndepend entries and cache the lookups.
 
 I'd say we don't need a count of the n_outs because all of them on the list
 must be sequentially dependent.  Thus any new task simply depends on the
 previous task in the outs list.  Thus imo it makes sense to have ins/outs 
 point
 to the tail of the list as opposed to the head.

Ah, haven't thought about it this way, yes, you're right that for out/inout
dependencies it is enough to remember in the hash table the last one,
because the dependencies will form a chain on the same address and serialize
the tasks.

 Is is really worthwhile to detect redundant dependencies?  It seems just as
 easy to add multiple dependencies and let them just fall out naturally.

I just didn't want to have duplicates in the hash table chains, the
redundant flag is just a sign that the entry doesn't need to be removed
from the hash table chains.

 OTOH, perhaps you should just go ahead with this patch and we can evolve it
 incrementally.  I don't see anything technically wrong with it.

Perhaps.  What if I do just minor cleanup (use flexible array members for
the reallocated vectors, and perhaps keep only the last out/inout task
in the hash table chains rather than all of them), retest, commit and then
we can discuss/incrementally improve it?

Jakub


[ping] [PATCH] Silence an unused variable warning

2013-09-26 Thread Jan-Benedict Glaw
On Fri, 2013-09-20 20:51:37 +0200, Jan-Benedict Glaw jbg...@lug-owl.de wrote:
 Hi!
 
 With the VAX target, I see this warning:
 
 g++ -c   -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE  -fno-exceptions 
 -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing 
 -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -pedantic 
 -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  
 -DHAVE_CONFIG_H -I. -I. -I../../../../gcc/gcc -I../../../../gcc/gcc/. 
 -I../../../../gcc/gcc/../include -I../../../../gcc/gcc/../libcpp/include  
 -I../../../../gcc/gcc/../libdecnumber 
 -I../../../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
 -I../../../../gcc/gcc/../libbacktrace
 ../../../../gcc/gcc/lra-eliminations.c -o lra-eliminations.o
 ../../../../gcc/gcc/lra-eliminations.c: In function ‘void init_elim_table()’:
 ../../../../gcc/gcc/lra-eliminations.c:1162:8: warning: unused variable 
 ‘value_p’ [-Wunused-variable]
bool value_p;
 ^
[...]

Ping:

http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01568.html
`-- http://gcc.gnu.org/ml/gcc-patches/2013-09/txtnrNwaGiD3x.txt

MfG, JBG

-- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
Signature of: Gib Dein Bestes. Dann übertriff Dich selbst!
the second  :


signature.asc
Description: Digital signature


[Patch] Let ordinary escaping in POSIX regex be valid

2013-09-26 Thread Tim Shen
POSIX ERE says that escaping an ordinary char, say R"\n", is not
permitted, because 'n' is not a special char. However, it also says:
"Implementations are permitted to extend the language to allow
these. Conforming applications cannot use such constructs."

So let's support it, so as not to surprise users.
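
A small usage sketch of what this makes valid (assuming the escaped ordinary
character is simply treated as that character):

#include <iostream>
#include <regex>

int main()
{
  // "\y" escapes the ordinary character 'y'; strict POSIX ERE would
  // reject it, with this change it just matches 'y'.
  std::regex re("a\\yb", std::regex::extended);
  std::cout << std::regex_search("xaybz", re) << '\n';   // prints 1
}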

Booted and tested under -m32 and -m64

Thanks!


-- 
Tim Shen


a.patch
Description: Binary data


User-define literals for std::complex.

2013-09-26 Thread Ed Smith-Rowland

Greetings,

The complex user-defined literals finally passed (n3779) with the 
resolution to DR1473 allowing the suffix id to touch the quotes (Can't 
find it but I put it in not too long ago).

(http://wiki.edg.com/twiki/pub/Wg21chicago2013/LibraryWorkingGroup/N3779-complex_literals.pdf)

Actually, I think allowing space between quotes and suffix ID was a mistake.

Also it looks like they are *removing* inline from the 'namespace 
literals' so that 'using std;' brings in the literals but that will be a 
later patch for all literals at once.


This has been bootstrapped and regtested on x86_64-linux.

As a general stylistic guide for the library I think I'll put
  operator""abc(...)
with no spaces.  Later.

OK?


2013-09-27  Ed Smith-Rowland  3dw...@verizon.net

Implement N3779 - User-defined Literals for std::complex,
part 2 of UDL for Standard Library Types
* include/std/complex: Add complex literal operators.
* testsuite/26_numerics/complex/literals/types.cc: New.
* testsuite/26_numerics/complex/literals/values.cc: New.

Index: include/std/complex
===
--- include/std/complex (revision 202928)
+++ include/std/complex (working copy)
@@ -1924,6 +1924,40 @@
 conj(_Tp __x)
 { return __x; }
 
+#if __cplusplus > 201103L
+
+inline namespace literals {
+inline namespace complex_literals {
+
+  inline constexpr std::complex<float>
+  operator""if(long double __num)
+  { return std::complex<float>{0.0F, static_cast<float>(__num)}; }
+
+  inline constexpr std::complex<float>
+  operator""if(unsigned long long __num)
+  { return std::complex<float>{0.0F, static_cast<float>(__num)}; }
+
+  inline constexpr std::complex<double>
+  operator""i(long double __num)
+  { return std::complex<double>{0.0, static_cast<double>(__num)}; }
+
+  inline constexpr std::complex<double>
+  operator""i(unsigned long long __num)
+  { return std::complex<double>{0.0, static_cast<double>(__num)}; }
+
+  inline constexpr std::complex<long double>
+  operator""il(long double __num)
+  { return std::complex<long double>{0.0L, __num}; }
+
+  inline constexpr std::complex<long double>
+  operator""il(unsigned long long __num)
+  { return std::complex<long double>{0.0L, static_cast<long double>(__num)}; }
+
+} // inline namespace complex_literals
+} // inline namespace literals
+
+#endif // C++14
+
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
 
Index: testsuite/26_numerics/complex/literals/types.cc
===
--- testsuite/26_numerics/complex/literals/types.cc (revision 0)
+++ testsuite/26_numerics/complex/literals/types.cc (working copy)
@@ -0,0 +1,46 @@
+// { dg-options "-std=c++1y" }
+// { dg-do compile }
+
+// Copyright (C) 2013 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// http://www.gnu.org/licenses/.
+
+#include <complex>
+#include <type_traits>
+
+void
+test02()
+{
+  using namespace std::literals::complex_literals;
+
+  static_assert(std::is_same<decltype(1.0if), std::complex<float>>::value,
+		"1.0if is std::complex<float>");
+
+  static_assert(std::is_same<decltype(1if), std::complex<float>>::value,
+		"1if is std::complex<float>");
+
+  static_assert(std::is_same<decltype(1.0i), std::complex<double>>::value,
+		"1.0i is std::complex<double>");
+
+  static_assert(std::is_same<decltype(1i), std::complex<double>>::value,
+		"1i is std::complex<double>");
+
+  static_assert(std::is_same<decltype(1.0il), std::complex<long double>>::value,
+		"1.0il is std::complex<long double>");
+
+  static_assert(std::is_same<decltype(1il), std::complex<long double>>::value,
+		"1il is std::complex<long double>");
+}
Index: testsuite/26_numerics/complex/literals/values.cc
===
--- testsuite/26_numerics/complex/literals/values.cc(revision 0)
+++ testsuite/26_numerics/complex/literals/values.cc(working copy)
@@ -0,0 +1,48 @@
+// { dg-options "-std=gnu++1y" }
+// { dg-do run }
+
+// Copyright (C) 2013 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either 

Re: [PATCH] Trivial cleanup

2013-09-26 Thread Jeff Law

On 09/26/2013 08:15 AM, Michael Matz wrote:

Hi,

On Wed, 25 Sep 2013, Jeff Law wrote:


I was going to bring it up at some point too.  My preference is
strongly to simply eliminate the space on methods...

Which wouldn't be so weird: in the libstdc++-v3 code we do it all the time.

Yea.  I actually reviewed the libstdc++ guidelines to see where they differed
from GNU's C guidelines.

I'm strongly in favor of dropping the horizontal whitespace between the
method name and its open paren when the result is then dereferenced.
i.e. foo.last()->e rather than foo.last ()->e.


I'd prefer to not write in this style at all, like Jakub.  If we must
absolutely have it, then I agree that the space before _empty_ parentheses
are ugly if followed by references.  I.e. I'd like to see spaces before
parens as is customary, except in one case: empty parens in the middle of
expressions (which don't happen very often right now in GCC, and hence
wouldn't introduce a coding style confusion):

do.this ();
give.that()->flag;
get.list (one)->clear ();

I'd prefer to not have further references to return values be applied,
though (as in, the parentheses should be the end of statement), which
would avoid the topic (at the expensive of having to invent names for
those temporaries, or to write trivial wrapper methods contracting several
method calls).
Should we consider banning dereferencing the result of a method call and
instead prefer to use a more functional interface such as Jakub has
suggested, or have the result of the method call put into a temporary
and dereference the temporary?


I considered suggesting the latter.  I wouldn't be a huge fan of the
unnecessary temporaries, but they may be better than the horrid
foo.last()->argh()->e->src or whatever.


Stuffing the result into a temporary does have one advantage: it
encourages us to CSE across the method calls in cases where the compiler
might not be able to do so.  Of course, being humans, we'll probably
mess it up.
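
For what it's worth, a tiny sketch (with made-up types, purely illustrative)
of the two styles being weighed:

// Made-up types; not GCC code.
struct stmt { int lhs; int rhs; };

struct block
{
  stmt *last_stmt;
  stmt *last () { return last_stmt; }
};

int
sum_of_last (block &bb)
{
  // Temporary form: the common subexpression is explicit and only
  // dereferenced through one name.
  stmt *last = bb.last ();
  return last->lhs + last->rhs;

  // Chained form being debated:
  //   return bb.last()->lhs + bb.last()->rhs;
}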


jeff


RE: [PATCH]Fix computation of offset in ivopt

2013-09-26 Thread bin.cheng


 -Original Message-
 From: Richard Biener [mailto:richard.guent...@gmail.com]
 Sent: Tuesday, September 24, 2013 6:31 PM
 To: Bin Cheng
 Cc: GCC Patches
 Subject: Re: [PATCH]Fix computation of offset in ivopt
 
 On Tue, Sep 24, 2013 at 11:13 AM, bin.cheng bin.ch...@arm.com wrote:
 
 +   field = TREE_OPERAND (expr, 1);
 +   if (DECL_FIELD_BIT_OFFSET (field)
 +cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field)))
 + boffset = int_cst_value (DECL_FIELD_BIT_OFFSET (field));
 +
 +   tmp = component_ref_field_offset (expr);
 +   if (top_compref
 +cst_and_fits_in_hwi (tmp))
 + {
 +   /* Strip the component reference completely.  */
 +   op0 = TREE_OPERAND (expr, 0);
 +   op0 = strip_offset_1 (op0, inside_addr, top_compref, off0);
 +   *offset = off0 + int_cst_value (tmp) + boffset /
BITS_PER_UNIT;
 +   return op0;
 + }
 
 the failure paths seem mangled, that is, if cst_and_fits_in_hwi is false
for
 either offset part you may end up doing half accounting and not stripping.
 
 Btw, DECL_FIELD_BIT_OFFSET is always non-NULL.  I suggest to rewrite to
 
  if (!inside_addr)
return orig_expr;
 
  tmp = component_ref_field_offset (expr);
  field = TREE_OPERAND (expr, 1);
  if (top_compref
   cst_and_fits_in_hwi (tmp)
   cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field)))
 {
   ...
 }
Will be refined.

 
 note that this doesn't really handle overflows correctly as
 
 +   *offset = off0 + int_cst_value (tmp) + boffset /
 + BITS_PER_UNIT;
 
 may still overflow.
Since it's unsigned + signed + signed, according to the implicit conversions
the signed operands will be converted to unsigned, so the overflow would only
happen when off0 is a huge number and tmp/boffset is a large positive number,
right?  Do I need to check whether off0 is larger than the overflowed
result?  Also there is a signed/unsigned problem here, see below.

 
 @@ -4133,6 +4142,9 @@ get_computation_cost_at (struct ivopts_data *data,
  bitmap_clear (*depends_on);
  }
 
 +  /* Sign-extend offset if utype has lower precision than
 + HOST_WIDE_INT.  */  offset = sext_hwi (offset, TYPE_PRECISION
 + (utype));
 +
 
 offset is computed elsewhere in difference_cost and the bug to me seems
 that it is unsigned.  sign-extending it here is odd at least (and the
extension
 should probably happen at sizetype precision, not that of utype).
I agree.  The root cause is in strip_offset_1, in which offset is computed.
Every time offset is computed in this function with a signed operand (like
int_cst_value (tmp) above), we need to take care of the possible negative
number problem.  Taking this case as an example, we need to make the changes below:

  case INTEGER_CST:
  //...
  *offset = int_cst_value (expr); 
change to 
  case INTEGER_CST:
  //...
  *offset = sext_hwi (int_cst_value (expr), type);

and
  case MULT_EXPR: 
  //...
  *offset = sext_hwi (int_cst_value (expr), type);
to
  case MULT_EXPR: 
  //...
 HOST_WIDE_INT xxx = (HOST_WIDE_INT)off0 * int_cst_value (op1);
  *offset = sext_hwi (xxx, type);

Any comments?

Thanks.
bin