Re: [PATCH][PR67666] Handle single restrict pointer in struct in create_variable_info_for_1

2015-09-28 Thread Tom de Vries

On 22/09/15 09:49, Richard Biener wrote:

On Tue, 22 Sep 2015, Tom de Vries wrote:


Hi,

Consider this test-case:

struct ps
{
   int *__restrict__ p;
};

void
f (struct ps &__restrict__ ps1)
{
   *(ps1.p) = 1;
}


Atm, the restrict on p has no effect. Now, say we add a field to the struct:

struct ps
{
   int *__restrict__ p;
   int a;
};


Then the restrict on p does have the desired effect.
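
As a rough illustration of what the qualifier is meant to permit, here is a minimal C sketch (the test-case above is C++; the struct name is reused, while store_and_reload and q are made up for this example). Assuming ps1->p and q never alias, a compiler that honours the restrict on p may fold the final load to the constant 1. Whether GCC actually does so depends on the alias analysis discussed in this thread.

```c
#include <assert.h>

/* Hypothetical illustration, not part of the patch or its testsuite.  */
struct ps
{
  int *restrict p;
};

/* If the restrict on p is honoured, the store through q cannot modify
   *ps1->p, so the value returned may be folded to 1.  */
int
store_and_reload (struct ps *restrict ps1, int *q)
{
  /* The restrict contract: callers must not pass aliasing pointers.  */
  assert (ps1->p != q);

  *ps1->p = 1;
  *q = 2;
  return *ps1->p;
}
```

Called with two distinct int objects, the function returns 1 whether or not the fold happens; the restrict only changes what the compiler is allowed to assume.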


This patch fixes the handling of structs with a single field in alias
analysis.

Bootstrapped and reg-tested on x86_64.

OK for trunk?


Ok.



Hi,

I wonder if this follow-up patch is necessary.

Now that we handle structs with one field in the final loop of 
create_variable_info_for_1, should we set the is_full_var field as well? 
It used to be set for such structs before I committed the "Handle single 
restrict pointer in struct in create_variable_info_for_1" patch.


Thanks,
- Tom

diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index 8d86dcb..26d97a3 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -5720,6 +5720,8 @@ create_variable_info_for_1 (tree decl, const char *name)

   newvi->offset = fo->offset;
   newvi->size = fo->size;
   newvi->fullsize = vi->fullsize;
+  if (fieldstack.length () == 1)
+   newvi->is_full_var = true;
   newvi->may_have_pointers = fo->may_have_pointers;
   newvi->only_restrict_pointers = fo->only_restrict_pointers;
   if (i + 1 < fieldstack.length ())



Re: [Patch, testsuite] Skip addr_equal-1 if target keeps null pointer checks

2015-09-28 Thread Senthil Kumar Selvaraj
On Mon, Sep 28, 2015 at 01:38:18PM -0600, Jeff Law wrote:
> On 09/28/2015 02:15 AM, Senthil Kumar Selvaraj wrote:
> >Hi,
> >
> >   The below patch skips gcc.dg/addr_equal-1.c if the target keeps null
> >   pointer checks.
> >
> >   The test fails for such targets (avr, in my case) because the address
> >   comparison in the below code does not resolve to a constant, causing
> >   builtin_constant_p to return false and fail the test.
> >
> >   /* Variables and functions do not share same memory locations otherwise.  */
> >   if (!__builtin_constant_p ((void *)undef_fn0 == (void *)&undef_var0))
> >     abort ();
> >
> >   For targets that delete null pointer checks, the equality comparison
> >   expression is optimized away to 0, as the code in match.pd knows they
> >   can only be equal if they are both NULL, which cannot be true since
> >   flag-delete-null-pointer-checks is on.
> >
> >   For targets that keep null pointer checks, 0 is a valid address and the
> >   comparison expression is left as is, and that causes a later pass to
> >   fold the builtin_constant_p to a false value, resulting in the test
> >   failure.
> This sounds like a failing in the compiler itself, not a testsuite issue.
> 
> Even on a target where objects can be at address 0, you can't have a
> variable and a function at the same address.

Hmm, symtab_node::equal_address_to, which is where the address equality
check happens, has a comment that contradicts your statement, and the
function/variable overlap check is done after the NULL possibility check.
The current code looks like this:

  /* If both symbols may resolve to NULL, we can not really prove them
     different.  */
  if (!nonzero_address () && !s2->nonzero_address ())
    return 2;

  /* Except for NULL, functions and variables never overlap.  */
  if (TREE_CODE (decl) != TREE_CODE (s2->decl))
    return 0;

Does anyone know why?

Regards
Senthil
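
The ordering issue described above can be reproduced with a small stand-alone model. This is only a sketch: struct sym, compare_addresses and the enum names are hypothetical, and the meaning of the results (0 for "different", 2 for "cannot tell") is inferred from the quoted comments rather than taken from the full function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for symtab nodes; only the two properties the
   quoted checks look at are modelled.  */
enum sym_kind { SYM_FUNCTION, SYM_VARIABLE };

struct sym
{
  enum sym_kind kind;
  bool nonzero_address;  /* True if the symbol can never resolve to NULL.  */
};

/* Assumed meaning of the results, inferred from the quoted comments.  */
enum { ADDR_DIFFERENT = 0, ADDR_UNKNOWN = 2 };

/* Mirrors the order of the two quoted checks for two distinct symbols.  */
int
compare_addresses (const struct sym *s1, const struct sym *s2)
{
  assert (s1 != s2);

  /* Both may be NULL: give up before looking at the kinds.  */
  if (!s1->nonzero_address && !s2->nonzero_address)
    return ADDR_UNKNOWN;

  /* Except for NULL, functions and variables never overlap.  */
  if (s1->kind != s2->kind)
    return ADDR_DIFFERENT;

  /* Same-kind symbols need further checks that are not modelled here.  */
  return ADDR_UNKNOWN;
}
```

With both symbols marked as possibly NULL (the situation on a target that keeps null pointer checks), a function and a variable compare as "unknown"; as soon as one of them is known to be nonzero, the kind check is reached and they compare as different.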


RE: [Patch,optimization]: Optimized changes in the estimate register pressure cost.

2015-09-28 Thread Ajit Kumar Agarwal


-----Original Message-----
From: Aaron Sawdey [mailto:acsaw...@linux.vnet.ibm.com] 
Sent: Monday, September 28, 2015 11:55 PM
To: Ajit Kumar Agarwal
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [Patch,optimization]: Optimized changes in the estimate register 
pressure cost.

On Sat, 2015-09-26 at 04:51 +, Ajit Kumar Agarwal wrote:
> I have made the following changes in the estimate_reg_pressure_cost 
> function used by the loop invariant and IVOPTS.
> 
> Earlier the estimate_reg_pressure cost uses the cost of n_new 
> variables that are generated by the Loop Invariant  and IVOPTS. These 
> are not sufficient for register pressure calculation. The register 
> pressure cost calculation should use the n_new + n_old (numbers) to 
> consider the cost. n_old is the register  used inside the loops and 
> the effect of  n_new new variables generated by loop invariant and 
> IVOPTS on register pressure is based on how the new variables impact 
> on register used inside the loops. The increase or decrease in register 
> pressure is due to the impact of new variables on the register used  inside 
> the loops. The register-register move cost or the spill cost should consider 
> the cost associated with register used and the new variables generated. The 
> movement  of new variables increases or decreases the register pressure, 
> which is based on  overall cost of n_new + n_old variables.
> 
> The increase and decrease in register pressure is based on the overall 
> cost of n_new + n_old as the changes in the register pressure caused 
> due to new variables is based on how the changes behave with respect to the 
> register used in the loops.
> 
> Thus the register pressure caused to new variables is based on the new 
> variables and its impact on register used inside  the loops and thus consider 
> the overall  cost of n_new + n_old.
> 
> Bootstrap for i386 and reg tested on i386 with the change is fine.
> 
> SPEC CPU 2000 benchmarks are run and there is following impact on the 
> performance and code size.
> 
> ratio with the optimization vs ratio without optimization for INT 
> benchmarks
> (3807.632 vs 3804.661)
> 
> ratio with the optimization vs ratio without optimization for FP 
> benchmarks ( 4668.743 vs 4778.741)
> 
> Code size reduction with respect to FP SPEC CPU 2000 benchmarks
> 
> Number of instruction with optimization = 1094117 Number of 
> instruction without optimization = 1094659
> 
> Reduction in number of instruction with the optimization = 542 instruction.
> 
> [Patch,optimization]: Optimized changes in the estimate  register 
> pressure cost.
> 
> Earlier the estimate_reg_pressure cost uses the cost of n_new 
> variables that are generated by the Loop Invariant and IVOPTS. These 
> are not sufficient for register pressure calculation. The register 
> pressure cost calculation should use the n_new + n_old (numbers) to 
> consider the cost. n_old is the register used inside the loops and the 
> affect of n_new new variables generated by loop invariant and IVOPTS 
> on register pressure is based on how the new variables impact on register 
> used inside the loops.
> 
> ChangeLog:
> 2015-09-26  Ajit Agarwal  
> 
>   * cfgloopanal.c (estimate_reg_pressure_cost) : Add changes
>   to consider the n_new plus n_old in the register pressure
>   cost.
> 
> Signed-off-by:Ajit Agarwal ajit...@xilinx.com

>>Ajit,

>>It looks to me like your change doesn't do anything at all inside the
>>loop-invariant.c code. There it's doing a difference between two
>>estimate_reg_pressure_cost calls so adding n_old (regs_used) to both is
>>canceled out.

>>  size_cost = (estimate_reg_pressure_cost (new_regs[0] + regs_needed[0],
>>                                           regs_used, speed, call_p)
>>               - estimate_reg_pressure_cost (new_regs[0],
>>                                             regs_used, speed, call_p));

>>I'm not quite sure I understand the "why" of the heuristic you've added here
>>-- can you explain your reasoning further?

Aaron:

Extract from function estimate_reg_pressure_cost() where the changes are made.

  if (regs_needed <= available_regs)
    /* If we are close to running out of registers, try to preserve
       them.  */
    /* Case 1 */
    cost = target_reg_cost [speed] * regs_needed;
  else
    /* If we run out of registers, it is very expensive to add another
       one.  */
    /* Case 2 */
    cost = target_spill_cost [speed] * regs_needed;

If both estimate_reg_pressure_cost calls fall into the same category (both
Case 1 or both Case 2), then n_old is cancelled out. If the two calls fall
into different categories, for example the first one falls into Case 2 and
the second one falls into Case 1, then it is not cancelled out, as
target_reg_cost[speed] and target_spill_cost[speed] are different.
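
A small worked model makes the cancellation argument concrete. It is a sketch under stated assumptions: the three constants are invented for illustration, the function names are hypothetical, and only the two branches shown in the extract are modelled (anything else the real function does is left out).

```c
#include <assert.h>

/* Hypothetical target parameters, chosen only for illustration.  */
enum { AVAILABLE_REGS = 8, REG_COST = 2, SPILL_COST = 10 };

/* Simplified model of the quoted extract: the cost is scaled by
   regs_needed = n_new + n_old.  */
unsigned
model_pressure_cost (unsigned n_new, unsigned n_old)
{
  unsigned regs_needed = n_new + n_old;

  if (regs_needed <= AVAILABLE_REGS)
    return REG_COST * regs_needed;    /* Case 1 */
  return SPILL_COST * regs_needed;    /* Case 2 */
}

/* The difference computed in loop-invariant.c, with scalars standing in
   for new_regs[0], regs_needed[0] and regs_used.  */
unsigned
model_size_cost (unsigned new_regs, unsigned regs_needed, unsigned regs_used)
{
  unsigned with = model_pressure_cost (new_regs + regs_needed, regs_used);
  unsigned without = model_pressure_cost (new_regs, regs_used);

  /* The model cost never decreases as registers are added.  */
  assert (with >= without);
  return with - without;
}
```

With new_regs = 1 and regs_needed = 2, the difference is 4 for regs_used = 3, 4 or 5 (both calls in Case 1, so regs_used cancels) and 20 for regs_used = 8 (both in Case 2). For regs_used = 6 the first call lands in Case 2 and the second in Case 1, giving 90 - 14 = 76, so regs_used no longer cancels.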

The changes works out si

[patch, committed] Dump function attributes

2015-09-28 Thread Tom de Vries

[ was: Re: [RFC] Dump function attributes ]

On 28/09/15 17:17, Bernd Schmidt wrote:

On 09/28/2015 04:32 PM, Tom de Vries wrote:

patch below prints the function attributes in the dump file.



foo ()
[ noclone , noinline ]
{
...

Good idea?

If so, do we want one attribute per line?


Only for really long ones I'd think. Patch is ok for now.




Reposting patch with ChangeLog entry added.

Bootstrapped and reg-tested on x86_64.

Committed to trunk.

Thanks,
- Tom
Dump function attributes

2015-09-29  Tom de Vries  

	* tree-cfg.c (dump_function_to_file): Dump function attributes.
---
 gcc/tree-cfg.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 807d96f..08935ac 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -7369,6 +7369,13 @@ dump_function_to_file (tree fndecl, FILE *file, int flags)
 }
   fprintf (file, ")\n");
 
+  if (DECL_ATTRIBUTES (fndecl) != NULL_TREE)
+{
+  fprintf (file, "[ ");
+  print_generic_expr (file, DECL_ATTRIBUTES (fndecl), dump_flags);
+  fprintf (file, "]\n");
+}
+
   if (flags & TDF_VERBOSE)
 print_node (file, "", fndecl, 2);
 
-- 
1.9.1
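
For readers who want to see the dump layout in isolation, the following is a minimal sketch that builds a line in the same shape as the example earlier in this message. It is not the patch's code: the patch prints a tree through print_generic_expr, whereas the hypothetical format_attribute_line below works on plain strings, and the spacing is copied from the example output rather than derived from GCC.

```c
#include <assert.h>
#include <string.h>

/* Append TEXT to BUF (capacity SIZE, USED bytes already filled).
   Text that does not fit is dropped.  Returns the new length.  */
static size_t
append_text (char *buf, size_t size, size_t used, const char *text)
{
  size_t len = strlen (text);

  if (used + len < size)
    {
      memcpy (buf + used, text, len + 1);
      used += len;
    }
  return used;
}

/* Hypothetical helper: format COUNT attribute names as "[ a , b ]\n".  */
size_t
format_attribute_line (char *buf, size_t size,
                       const char *const *names, size_t count)
{
  size_t used = 0;

  assert (buf != NULL && size > 0);
  buf[0] = '\0';

  used = append_text (buf, size, used, "[ ");
  for (size_t i = 0; i < count; i++)
    {
      if (i > 0)
        used = append_text (buf, size, used, ", ");
      used = append_text (buf, size, used, names[i]);
      used = append_text (buf, size, used, " ");
    }
  return append_text (buf, size, used, "]\n");
}
```

Printing one attribute per line, as floated above, would only change the separator passed between names.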



[patch committed SH] Fix PR target/67716

2015-09-28 Thread Kaz Kojima
I've committed the attached patch to fix PR target/67716.  It
implements targetm.override_options_after_change for SH.  Tested
on sh4-unknown-linux-gnu.

Regards,
kaz
--
2015-09-29  Kaz Kojima  

PR target/67716
* config/sh/sh.c (sh_override_options_after_change): New.
(TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE): Define.
(sh_option_override): Move align_loops, align_jumps and
align_functions handling into sh_override_options_after_change.

diff --git a/config/sh/sh.c b/config/sh/sh.c
index b203258..16fb575 100644
--- a/config/sh/sh.c
+++ b/config/sh/sh.c
@@ -202,6 +202,7 @@ static bool noncall_uses_reg (rtx, rtx_insn *, rtx *);
 static rtx_insn *gen_block_redirect (rtx_insn *, int, int);
 static void sh_reorg (void);
 static void sh_option_override (void);
+static void sh_override_options_after_change (void);
 static void output_stack_adjust (int, rtx, int, HARD_REG_SET *, bool);
 static rtx_insn *frame_insn (rtx);
 static rtx push (int);
@@ -392,6 +393,10 @@ static const struct attribute_spec sh_attribute_table[] =
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE sh_option_override
 
+#undef TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE
+#define TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE \
+  sh_override_options_after_change
+
 #undef TARGET_PRINT_OPERAND
 #define TARGET_PRINT_OPERAND sh_print_operand
 #undef TARGET_PRINT_OPERAND_ADDRESS
@@ -1044,6 +1049,50 @@ sh_option_override (void)
   TARGET_ACCUMULATE_OUTGOING_ARGS = 1;
 }
 
+  if (flag_unsafe_math_optimizations)
+{
+  /* Enable fsca insn for SH4A if not otherwise specified by the user.  */
+  if (global_options_set.x_TARGET_FSCA == 0 && TARGET_SH4A_FP)
+   TARGET_FSCA = 1;
+
+  /* Enable fsrra insn for SH4A if not otherwise specified by the user.  */
+  if (global_options_set.x_TARGET_FSRRA == 0 && TARGET_SH4A_FP)
+   TARGET_FSRRA = 1;
+}
+
+  /*  Allow fsrra insn only if -funsafe-math-optimizations and
+  -ffinite-math-only is enabled.  */
+  TARGET_FSRRA = TARGET_FSRRA
+&& flag_unsafe_math_optimizations
+&& flag_finite_math_only;
+
+  /* If the -mieee option was not explicitly set by the user, turn it on
+ unless -ffinite-math-only was specified.  See also PR 33135.  */
+  if (! global_options_set.x_TARGET_IEEE)
+TARGET_IEEE = ! flag_finite_math_only;
+
+  if (sh_fixed_range_str)
+sh_fix_range (sh_fixed_range_str);
+
+  /* This target defaults to strict volatile bitfields.  */
+  if (flag_strict_volatile_bitfields < 0 && abi_version_at_least(2))
+flag_strict_volatile_bitfields = 1;
+
+  sh_override_options_after_change ();
+
+  /* Parse atomic model option and make sure it is valid for the current
+ target CPU.  */
+  selected_atomic_model_
+= parse_validate_atomic_model_option (sh_atomic_model_str);
+
+  register_sh_passes ();
+}
+
+/* Implement targetm.override_options_after_change.  */
+
+static void
+sh_override_options_after_change (void)
+{
   /*  Adjust loop, jump and function alignment values (in bytes), if those
   were not specified by the user using -falign-loops, -falign-jumps
   and -falign-functions options.
@@ -1093,42 +1142,6 @@ sh_option_override (void)
   if (align_functions < min_align)
align_functions = min_align;
 }
-
-  if (flag_unsafe_math_optimizations)
-{
-  /* Enable fsca insn for SH4A if not otherwise specified by the user.  */
-  if (global_options_set.x_TARGET_FSCA == 0 && TARGET_SH4A_FP)
-   TARGET_FSCA = 1;
-
-  /* Enable fsrra insn for SH4A if not otherwise specified by the user.  */
-  if (global_options_set.x_TARGET_FSRRA == 0 && TARGET_SH4A_FP)
-   TARGET_FSRRA = 1;
-}
-
-  /*  Allow fsrra insn only if -funsafe-math-optimizations and
-  -ffinite-math-only is enabled.  */
-  TARGET_FSRRA = TARGET_FSRRA
-&& flag_unsafe_math_optimizations
-&& flag_finite_math_only;
-
-  /* If the -mieee option was not explicitly set by the user, turn it on
- unless -ffinite-math-only was specified.  See also PR 33135.  */
-  if (! global_options_set.x_TARGET_IEEE)
-TARGET_IEEE = ! flag_finite_math_only;
-
-  if (sh_fixed_range_str)
-sh_fix_range (sh_fixed_range_str);
-
-  /* This target defaults to strict volatile bitfields.  */
-  if (flag_strict_volatile_bitfields < 0 && abi_version_at_least(2))
-flag_strict_volatile_bitfields = 1;
-
-  /* Parse atomic model option and make sure it is valid for the current
- target CPU.  */
-  selected_atomic_model_
-= parse_validate_atomic_model_option (sh_atomic_model_str);
-
-  register_sh_passes ();
 }
 
 /* Print the operand address in x to the stream.  */
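
The shape of this change can be sketched in plain C. The sketch assumes, as the GCC internals documentation describes, that the after-change hook is called again when the optimization level is changed through an attribute or pragma, which is why the alignment defaults have to live in a function that is safe to re-run. All names below are hypothetical and the default value is invented; it is not the SH alignment logic.

```c
#include <assert.h>

/* Hypothetical option state; the field names echo the ChangeLog but the
   struct itself is made up for this sketch.  */
struct model_options
{
  int align_loops;        /* 0 means "not set by the user".  */
  int align_functions;    /* 0 means "not set by the user".  */
  int passes_registered;  /* Counts one-time initialisation.  */
};

enum { MODEL_DEFAULT_ALIGN = 4 };  /* Assumed default, not the SH value.  */

/* Safe to run repeatedly: only fills in values that are still unset.  */
void
model_override_options_after_change (struct model_options *opts)
{
  assert (opts != 0);

  if (opts->align_loops == 0)
    opts->align_loops = MODEL_DEFAULT_ALIGN;
  if (opts->align_functions == 0)
    opts->align_functions = MODEL_DEFAULT_ALIGN;
}

/* Runs once at start-up: one-time work first, then the shared defaults.  */
void
model_option_override (struct model_options *opts)
{
  opts->passes_registered++;
  model_override_options_after_change (opts);
}
```

If the alignment values are later reset, calling only the after-change function restores the defaults without repeating the one-time work, which mirrors the split between sh_option_override and sh_override_options_after_change in the patch.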


Re: [Patch,optimization]: Optimized changes in the estimate register pressure cost.

2015-09-28 Thread Bin.Cheng
On Tue, Sep 29, 2015 at 2:25 AM, Aaron Sawdey wrote:
> On Sat, 2015-09-26 at 04:51 +, Ajit Kumar Agarwal wrote:
>> I have made the following changes in the estimate_reg_pressure_cost function
>> used by the loop invariant and IVOPTS.
>>
>> Earlier the estimate_reg_pressure cost uses the cost of n_new variables that
>> are generated by the Loop Invariant and IVOPTS. These are not sufficient for
>> register pressure calculation. The register pressure cost calculation should
>> use the n_new + n_old (numbers) to consider the cost. n_old is the register
>> used inside the loops and the effect of n_new new variables generated by loop
>> invariant and IVOPTS on register pressure is based on how the new variables
>> impact on register used inside the loops. The increase or decrease in register
>> pressure is due to the impact of new variables on the register used inside the
>> loops. The register-register move cost or the spill cost should consider the
>> cost associated with register used and the new variables generated. The
>> movement of new variables increases or decreases the register pressure, which
>> is based on overall cost of n_new + n_old variables.
>>
>> The increase and decrease in register pressure is based on the overall cost
>> of n_new + n_old as the changes in the register pressure caused due to new
>> variables is based on how the changes behave with respect to the register used
>> in the loops.
>>
>> Thus the register pressure caused to new variables is based on the new
>> variables and its impact on register used inside the loops and thus consider
>> the overall cost of n_new + n_old.
>>
>> Bootstrap for i386 and reg tested on i386 with the change is fine.
>>
>> SPEC CPU 2000 benchmarks are run and there is following impact on the
>> performance and code size.
>>
>> ratio with the optimization vs ratio without optimization for INT benchmarks
>> (3807.632 vs 3804.661)
>>
>> ratio with the optimization vs ratio without optimization for FP benchmarks
>> ( 4668.743 vs 4778.741)
>>
>> Code size reduction with respect to FP SPEC CPU 2000 benchmarks
>>
>> Number of instruction with optimization = 1094117
>> Number of instruction without optimization = 1094659
>>
>> Reduction in number of instruction with the optimization = 542 instruction.
>>
>> [Patch,optimization]: Optimized changes in the estimate
>>  register pressure cost.
>>
>> Earlier the estimate_reg_pressure cost uses the cost of n_new variables that
>> are generated by the Loop Invariant and IVOPTS. These are not sufficient for
>> register pressure calculation. The register pressure cost calculation should
>> use the n_new + n_old (numbers) to consider the cost. n_old is the register
>> used inside the loops and the affect of n_new new variables generated by
>> loop invariant and IVOPTS on register pressure is based on how the new
>> variables impact on register used inside the loops.
>>
>> ChangeLog:
>> 2015-09-26  Ajit Agarwal  
>>
>>   * cfgloopanal.c (estimate_reg_pressure_cost) : Add changes
>>   to consider the n_new plus n_old in the register pressure
>>   cost.
>>
>> Signed-off-by:Ajit Agarwal ajit...@xilinx.com
>
> Ajit,
>   It looks to me like your change doesn't do anything at all inside the
> loop-invariant.c code. There it's doing a difference between two
> estimate_reg_pressure_cost calls so adding n_old (regs_used) to both is
> canceled out.
>
>   size_cost = (estimate_reg_pressure_cost (new_regs[0] + regs_needed[0],
>                                            regs_used, speed, call_p)
>                - estimate_reg_pressure_cost (new_regs[0],
>                                              regs_used, speed, call_p));
>
> I'm not quite sure I understand the "why" of the heuristic you've added
> here -- can you explain your reasoning further?

With this, I think the only change would be in GIMPLE IVOPT?  The
patch increases register pressure if it exceeds the available register
number when choosing iv candidates.  As I mentioned, it may only have
an impact on scenarios that are on the verge of the available register
number; otherwise reg_old is added (and thus cancelled) on both ends of
the comparison.
The result isn't clear enough even for boundary cases because IVO now
has issues in computing the "starting register pressure".  I also
planned to visit the pressure model in IVO later.

Thanks,
bin
>
>>
>> Thanks & Regards
>> Ajit
>>
>
> Thanks,
> Aaron
>
> --
> Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
> 050-2/C113  (507) 253-7520 home: 507/263-0782
> IBM Linux Technology Center - PPC Toolchain
>


Re: [PATCH][AArch64] Add separate insn sched class for vector LDP & STP

2015-09-28 Thread Andrew Pinski
On Mon, Sep 28, 2015 at 4:52 PM, Evandro Menezes  wrote:
> In some micro-architectures the insns to load or store pairs of vector
> registers are implemented rather differently from those affecting lanes in
> vector registers.  Then, it's important that such insns be described
> likewise differently in the scheduling model.
>
> This patch adds the insn types neon_ldp{,_q} and neon_stp{,_q} apart from
> the current neon_load2_2reg_q and neon_store2_2reg_q types, respectively.

This is a very useful patch for the ThunderX core also.  I will update
the config/aarch64/thunderx.md file if this patch gets approved.

Thanks,
Andrew

>
> Thank you,
>
> --
> Evandro Menezes
>


Re: [PATCH] fix PR67700

2015-09-28 Thread H.J. Lu
On Mon, Sep 28, 2015 at 3:48 PM, Sebastian Paul Pop  wrote:
> I fixed this in a follow-up patch.
>

Now I got

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67754

FAIL: gcc.dg/graphite/interchange-10.c execution test
FAIL: gcc.dg/graphite/interchange-11.c execution test
FAIL: gcc.dg/graphite/interchange-1.c execution test
FAIL: gcc.dg/graphite/interchange-3.c execution test
FAIL: gcc.dg/graphite/interchange-4.c execution test
FAIL: gcc.dg/graphite/interchange-7.c execution test
FAIL: gcc.dg/graphite/pr46185.c execution test
FAIL: gcc.dg/graphite/uns-block-1.c execution test
FAIL: gcc.dg/graphite/uns-interchange-12.c execution test
FAIL: gcc.dg/graphite/uns-interchange-14.c execution test
FAIL: gcc.dg/graphite/uns-interchange-15.c execution test
FAIL: gcc.dg/graphite/uns-interchange-9.c execution test
FAIL: gcc.dg/graphite/uns-interchange-mvt.c execution test
FAIL: gfortran.dg/graphite/block-1.f90   -O  (internal compiler error)
FAIL: gfortran.dg/graphite/block-1.f90   -O  (test for excess errors)

on Linux/x86 with ISL 0.14.

-- 
H.J.


[PATCH][AArch64] Add separate insn sched class for vector LDP & STP

2015-09-28 Thread Evandro Menezes
In some micro-architectures the insns to load or store pairs of vector 
registers are implemented rather differently from those affecting lanes 
in vector registers.  Then, it's important that such insns be described 
likewise differently in the scheduling model.


This patch adds the insn types neon_ldp{,_q} and neon_stp{,_q} apart 
from the current neon_load2_2reg_q and neon_store2_2reg_q types, 
respectively.


Thank you,

--
Evandro Menezes

From 340249dcd2af8dfce486cb4f62d4eaf285c6a799 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Mon, 28 Sep 2015 15:00:00 -0500
Subject: [PATCH] [AArch64] Add separate insn sched class for vector LDP & STP

2015-09-28  Evandro Menezes  

	gcc/
	* config/arm/types.md (neon_ldp, neon_ldp_q, neon_stp, neon_stp_q):
	add new insn types for vector load and store pairs.
	* config/arm/cortex-a53.md (cortex_a53_f_load_2reg): add insn
	types "neon_ldp{,_q}".
	* config/arm/cortex-a57.md (neon_load_c): add insn types
	"neon_ldp{,_q}".
	(neon_store_complex): add insn types "neon_stp{,_q}".
	* config/aarch64/aarch64-simd.md (aarch64_be_movoi): add insn types
	"neon_{ldp,stp}_q".
---
 gcc/config/aarch64/aarch64-simd.md |  2 +-
 gcc/config/arm/cortex-a53.md   |  2 +-
 gcc/config/arm/cortex-a57.md   |  6 --
 gcc/config/arm/types.md| 14 --
 4 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 5ab2f2b..541faf9 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4327,7 +4327,7 @@
#
stp\\t%q1, %R1, %0
ldp\\t%q0, %R0, %1"
-  [(set_attr "type" "multiple,neon_store2_2reg_q,neon_load2_2reg_q")
+  [(set_attr "type" "multiple,neon_stp_q,neon_ldp_q")
(set (attr "length") (symbol_ref "aarch64_simd_attr_length_move (insn)"))]
 )
 
diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md
index 3fa0625..032d5eb 100644
--- a/gcc/config/arm/cortex-a53.md
+++ b/gcc/config/arm/cortex-a53.md
@@ -325,7 +325,7 @@
 
 (define_insn_reservation "cortex_a53_f_load_2reg" 5
   (and (eq_attr "tune" "cortexa53")
-   (eq_attr "type" "neon_load2_2reg_q"))
+   (eq_attr "type" "neon_ldp, neon_ldp_q, neon_load2_2reg_q"))
   "(cortex_a53_slot_any+cortex_a53_ls)*2")
 
 (define_insn_reservation "cortex_a53_f_loadq" 5
diff --git a/gcc/config/arm/cortex-a57.md b/gcc/config/arm/cortex-a57.md
index d6ce440..c751dd4 100644
--- a/gcc/config/arm/cortex-a57.md
+++ b/gcc/config/arm/cortex-a57.md
@@ -202,7 +202,8 @@
 	  (eq_attr "type" "neon_load1_3reg, neon_load1_3reg_q,\
 			   neon_load1_4reg, neon_load1_4reg_q")
 	(const_string "neon_load_b")
-	  (eq_attr "type" "neon_load1_one_lane, neon_load1_one_lane_q,\
+	  (eq_attr "type" "neon_ldp, neon_ldp_q,\
+			   neon_load1_one_lane, neon_load1_one_lane_q,\
 			   neon_load1_all_lanes, neon_load1_all_lanes_q,\
 			   neon_load2_2reg, neon_load2_2reg_q,\
 			   neon_load2_all_lanes, neon_load2_all_lanes_q")
@@ -224,7 +225,8 @@
 	(const_string "neon_store_a")
 	  (eq_attr "type" "neon_store1_2reg, neon_store1_1reg_q")
 	(const_string "neon_store_b")
-	  (eq_attr "type" "neon_store1_3reg, neon_store1_3reg_q,\
+	  (eq_attr "type" "neon_stp, neon_stp_q,\
+			   neon_store1_3reg, neon_store1_3reg_q,\
 			   neon_store3_3reg, neon_store3_3reg_q,\
 			   neon_store2_4reg, neon_store2_4reg_q,\
 			   neon_store4_4reg, neon_store4_4reg_q,\
diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
index 534be74..73f482d 100644
--- a/gcc/config/arm/types.md
+++ b/gcc/config/arm/types.md
@@ -376,6 +376,8 @@
 ; neon_from_gp
 ; neon_from_gp_q
 ; neon_ldr
+; neon_ldp
+; neon_ldp_q
 ; neon_load1_1reg
 ; neon_load1_1reg_q
 ; neon_load1_2reg
@@ -409,6 +411,8 @@
 ; neon_load4_one_lane
 ; neon_load4_one_lane_q
 ; neon_str
+; neon_stp
+; neon_stp_q
 ; neon_store1_1reg
 ; neon_store1_1reg_q
 ; neon_store1_2reg
@@ -889,6 +893,8 @@
   neon_from_gp_q,\
 \
   neon_ldr,\
+  neon_ldp,\
+  neon_ldp_q,\
   neon_load1_1reg,\
   neon_load1_1reg_q,\
   neon_load1_2reg,\
@@ -926,6 +932,8 @@
   neon_load4_one_lane_q,\
 \
   neon_str,\
+  neon_stp,\
+  neon_stp_q,\
   neon_store1_1reg,\
   neon_store1_1reg_q,\
   neon_store1_2reg,\
@@ -1128,7 +1136,8 @@
   neon_sat_mla_s_long, neon_sat_mla_h_scalar_long,\
   neon_sat_mla_s_scalar_long,\
   neon_to_gp, neon_to_gp_q, neon_from_gp, neon_from_gp_q,\
-  neon_ldr, neon_load1_1reg, neon_load1_1reg_q, neon_load1_2reg,\
+	   neon_ldr, neon_ldp, neon_ldp_q,\
+	   neon_load1_1reg, neon_load1_1reg_q, neon_load1_2reg,\
   neon_load1_2reg_q, neon_load1_3reg, neon_load1_3reg_q,\
   neon_load1_4reg, neon_load1_4reg_q, neon_load1_all_lanes,\
   neon_load1_all_lanes_q, neon_load1_one_lane, neon_load1_one_lane_q,\
@@ -1139,7 +1148,8 @@
   neon_load3_all_lanes_q, neon_load3_one_lane, neon_load3_one_lane_q,\
   neon_load4_4reg, neon_load4_4reg_q, neon_load4_all_lanes,\
   neon_load4_all

Re: [PATCH] Fix gcc.dg/asm-4.c

2015-09-28 Thread Mike Stump
On Sep 28, 2015, at 2:43 PM, Segher Boessenkool wrote:
> Double-quoted words in Tcl have substitutions performed on them, including
> backslash substitutions.  That isn't terribly nice for regular expressions,
> so use braced words instead.
> 
> Tested on powerpc64-linux.  Okay for mainline?

Ok.

RE: [PATCH] fix PR67700

2015-09-28 Thread Sebastian Paul Pop
I fixed this in a follow-up patch.

Sebastian

-----Original Message-----
From: H.J. Lu [mailto:hjl.to...@gmail.com] 
Sent: Monday, September 28, 2015 2:39 PM
To: Tobias Grosser
Cc: Sebastian Pop; GCC Patches; Sebastian Pop; aditya...@samsung.com; Richard 
Biener
Subject: Re: [PATCH] fix PR67700

On Sat, Sep 26, 2015 at 3:34 AM, Tobias Grosser  wrote:
> On 09/25/2015 10:39 PM, Sebastian Pop wrote:
>>
>> The patch makes the detection of scop parameters in
>> parameter_index_in_region a bit more conservative by discarding scalar
>> variables defined in function of data references defined in the scop.
>>
>> 2015-09-25  Aditya Kumar
>>  Sebastian Pop
>>
>>  PR tree-optimization/67700
>>  * graphite-sese-to-poly.c (parameter_index_in_region): Call
>>  invariant_in_sese_p_rec.
>>  (extract_affine): Same.
>>  (rewrite_cross_bb_scalar_deps): Call update_ssa.
>>  * sese.c (invariant_in_sese_p_rec): Export.  Handle vdefs and vuses.
>>  * sese.h (invariant_in_sese_p_rec): Declare.
>>
>>  * testsuite/gcc.dg/graphite/run-id-pr67700.c: New.

It breaks bootstrap on x86:

https://gcc.gnu.org/ml/gcc-regression/2015-09/msg00382.html

../../src-trunk/gcc/sese.c: In function ‘bool invariant_in_sese_p_rec(tree, sese)’:
../../src-trunk/gcc/sese.c:781:12: error: unused variable ‘vdef’ [-Werror=unused-variable]
   if (tree vdef = gimple_vdef (stmt))
            ^

-- 
H.J.



RE: [PATCH, MIPS] Frame header optimization for MIPS O32 ABI

2015-09-28 Thread Moore, Catherine


> -----Original Message-----
> From: Steve Ellcey [mailto:sell...@imgtec.com]
> Sent: Friday, September 11, 2015 2:06 PM
> To: Matthew Fortune
> Cc: GCC Patches; Moore, Catherine
> Subject: RE: [PATCH, MIPS] Frame header optimization for MIPS O32 ABI
> 
> On Fri, 2015-09-04 at 01:40 -0700, Matthew Fortune wrote:
> 
> > A few comments below. I found some of the comments a bit hard to parse but
> > have not attempted any rewording. I'd like Catherine to comment too as I
> > have barely any experience at the gimple level to know if this accounts for
> > any necessary subtleties.
> 
> Catherine said she would look at this next week but I have updated the
> patch in the mean time to address your comments and give Catherine a
> more up-to-date patch to look over.
> 

Hi Steve, I'm sorry for the delay in reviewing this patch. 
Some changes have been committed upstream (see revision #227941) that will 
require updates to this patch.
Please post the update for review.  Other comments are embedded.

> 
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 5712547..eea97de 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -420,6 +420,7 @@ microblaze*-*-*)
>  mips*-*-*)
>   cpu_type=mips
>   extra_headers="loongson.h"
> + extra_objs="frame-header-opt.o"
>   extra_options="${extra_options} g.opt fused-madd.opt mips/mips-tables.opt"
>   ;;
>  nds32*)
> diff --git a/gcc/config/mips/frame-header-opt.c b/gcc/config/mips/frame-header-opt.c
> index e69de29..5c4e93c 100644
> --- a/gcc/config/mips/frame-header-opt.c
> +++ b/gcc/config/mips/frame-header-opt.c
> @@ -0,0 +1,221 @@
> +/* Analyze functions to determine if calling functions need to allocate
> +   stack space (a frame header) for its called functions to write out their
> +   arguments on to the stack.  This optimization is only applicable to
> +   TARGET_OLDABI targets because calling functions on TARGET_NEWABI targets
> +   do not allocate any stack space for arguments (the called function does it
> +   if needed).
> +

Overall, I agree with Matthew regarding the comments being a little hard to
parse.
How about:

/* Analyze functions to determine if callers need to allocate a frame header on
   the stack.  The frame header is used by callees to save their arguments.
   This optimization is specific to TARGET_OLDABI targets.  For TARGET_NEWABI
   targets, if a frame header is required, it is allocated by the callee.  */


> +
> +/* Look at all the functions this function calls and return true if none of
> +   them need the argument stack space that this function would normally
> +   allocate.  Return false if one or more functions does need this space
> +   or if we cannot determine that all called functions do not need the
> +   space.  */

/* Returns TRUE if the argument stack space allocated by function FN is unused.
   Returns FALSE if the space is needed or if the need for the space cannot
   be determined.  */
> +
> +static bool
> +}
> +
> +/* This optimization scans all the functions in the compilation unit to find
> +   out which ones do not need the frame header that their caller normally
> +   allocates.  Then it does a second scan of all the functions to determine
> +   which functions can skip the allocation because none of the functions it
> +   calls need the frame header.  */
> +

/* Scan each function to determine those that need their frame headers.
   Perform a second scan to determine if the allocation can be skipped because
   none of a function's callees require the frame header.  */
> +}
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index c0ec0fd..8152645 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -17940,6 +17941,18 @@ if @var{ra-address} is nonnull.
> 
>  The default is @option{-mno-mcount-ra-address}.
> 
> +@item -mframe-header-opt
> +@itemx -mno-frame-header-opt
> +@opindex mframe-header-opt
> +Enable (disable) frame header optimization in the o32 ABI.  When using
> +the o32 ABI, calling functions allocate 16 bytes on the stack in case
> +the called function needs to write out register arguments to memory so
> +that their address can be taken.  When enabled, this optimization will
> +cause the calling function to not allocate that space if it can determine
> +that none of its called functions use it.
> +
> +This optimization is off by default at all optimization levels.
> +
>  @end table
> 
>  @node MMIX Options

How about this instead:
Enable (disable) frame header optimization in the o32 ABI.  When using the o32
ABI, calling functions will allocate 16 bytes on the stack for the called
function to write out register arguments.  When enabled, this optimization will
suppress the allocation of the frame header if it can be determined that it is
unused.

This optimization is off by default at all optimization levels.

Catherine


Update soft-fp from glibc

2015-09-28 Thread Joseph Myers
This patch updates the soft-fp code in libgcc from glibc.  There are
no changes here of significance to the use of soft-fp in GCC (and so
no testsuite additions); it's simply an update to bring in the latest
soft-fp version (which will also hopefully go into Linux 4.4 to
replace the 15-year-old copy currently in Linux).

Bootstrapped with no regressions on x86_64-pc-linux-gnu.  Applied to mainline.

2015-09-28  Joseph Myers  

* soft-fp/adddf3.c: Update from glibc.
* soft-fp/addsf3.c: Likewise.
* soft-fp/addtf3.c: Likewise.
* soft-fp/divdf3.c: Likewise.
* soft-fp/divsf3.c: Likewise.
* soft-fp/divtf3.c: Likewise.
* soft-fp/double.h: Likewise.
* soft-fp/eqdf2.c: Likewise.
* soft-fp/eqsf2.c: Likewise.
* soft-fp/eqtf2.c: Likewise.
* soft-fp/extenddftf2.c: Likewise.
* soft-fp/extended.h: Likewise.
* soft-fp/extendsfdf2.c: Likewise.
* soft-fp/extendsftf2.c: Likewise.
* soft-fp/extendxftf2.c: Likewise.
* soft-fp/fixdfdi.c: Likewise.
* soft-fp/fixdfsi.c: Likewise.
* soft-fp/fixdfti.c: Likewise.
* soft-fp/fixsfdi.c: Likewise.
* soft-fp/fixsfsi.c: Likewise.
* soft-fp/fixsfti.c: Likewise.
* soft-fp/fixtfdi.c: Likewise.
* soft-fp/fixtfsi.c: Likewise.
* soft-fp/fixtfti.c: Likewise.
* soft-fp/fixunsdfdi.c: Likewise.
* soft-fp/fixunsdfsi.c: Likewise.
* soft-fp/fixunsdfti.c: Likewise.
* soft-fp/fixunssfdi.c: Likewise.
* soft-fp/fixunssfsi.c: Likewise.
* soft-fp/fixunssfti.c: Likewise.
* soft-fp/fixunstfdi.c: Likewise.
* soft-fp/fixunstfsi.c: Likewise.
* soft-fp/fixunstfti.c: Likewise.
* soft-fp/floatdidf.c: Likewise.
* soft-fp/floatdisf.c: Likewise.
* soft-fp/floatditf.c: Likewise.
* soft-fp/floatsidf.c: Likewise.
* soft-fp/floatsisf.c: Likewise.
* soft-fp/floatsitf.c: Likewise.
* soft-fp/floattidf.c: Likewise.
* soft-fp/floattisf.c: Likewise.
* soft-fp/floattitf.c: Likewise.
* soft-fp/floatundidf.c: Likewise.
* soft-fp/floatundisf.c: Likewise.
* soft-fp/floatunditf.c: Likewise.
* soft-fp/floatunsidf.c: Likewise.
* soft-fp/floatunsisf.c: Likewise.
* soft-fp/floatunsitf.c: Likewise.
* soft-fp/floatuntidf.c: Likewise.
* soft-fp/floatuntisf.c: Likewise.
* soft-fp/floatuntitf.c: Likewise.
* soft-fp/gedf2.c: Likewise.
* soft-fp/gesf2.c: Likewise.
* soft-fp/getf2.c: Likewise.
* soft-fp/ledf2.c: Likewise.
* soft-fp/lesf2.c: Likewise.
* soft-fp/letf2.c: Likewise.
* soft-fp/muldf3.c: Likewise.
* soft-fp/mulsf3.c: Likewise.
* soft-fp/multf3.c: Likewise.
* soft-fp/negdf2.c: Likewise.
* soft-fp/negsf2.c: Likewise.
* soft-fp/negtf2.c: Likewise.
* soft-fp/op-1.h: Likewise.
* soft-fp/op-2.h: Likewise.
* soft-fp/op-4.h: Likewise.
* soft-fp/op-8.h: Likewise.
* soft-fp/op-common.h: Likewise.
* soft-fp/quad.h: Likewise.
* soft-fp/single.h: Likewise.
* soft-fp/soft-fp.h: Likewise.
* soft-fp/subdf3.c: Likewise.
* soft-fp/subsf3.c: Likewise.
* soft-fp/subtf3.c: Likewise.
* soft-fp/truncdfsf2.c: Likewise.
* soft-fp/trunctfdf2.c: Likewise.
* soft-fp/trunctfsf2.c: Likewise.
* soft-fp/trunctfxf2.c: Likewise.
* soft-fp/unorddf2.c: Likewise.
* soft-fp/unordsf2.c: Likewise.
* soft-fp/unordtf2.c: Likewise.

Index: libgcc/soft-fp/adddf3.c
===
--- libgcc/soft-fp/adddf3.c (revision 228199)
+++ libgcc/soft-fp/adddf3.c (working copy)
@@ -1,6 +1,6 @@
 /* Software floating-point emulation.
Return a + b
-   Copyright (C) 1997-2014 Free Software Foundation, Inc.
+   Copyright (C) 1997-2015 Free Software Foundation, Inc.
This file is part of the GNU C Library.
Contributed by Richard Henderson (r...@cygnus.com) and
  Jakub Jelinek (j...@ultra.linux.cz).
Index: libgcc/soft-fp/addsf3.c
===
--- libgcc/soft-fp/addsf3.c (revision 228199)
+++ libgcc/soft-fp/addsf3.c (working copy)
@@ -1,6 +1,6 @@
 /* Software floating-point emulation.
Return a + b
-   Copyright (C) 1997-2014 Free Software Foundation, Inc.
+   Copyright (C) 1997-2015 Free Software Foundation, Inc.
This file is part of the GNU C Library.
Contributed by Richard Henderson (r...@cygnus.com) and
  Jakub Jelinek (j...@ultra.linux.cz).
Index: libgcc/soft-fp/addtf3.c
===
--- libgcc/soft-fp/addtf3.c (revision 228199)
+++ libgcc/soft-fp/addtf3.c (working copy)
@@ -1,6 +1,6 @@
 /* Software floating-point emula

[PATCH] Fix gcc.dg/asm-4.c

2015-09-28 Thread Segher Boessenkool
Double-quoted words in Tcl have substitutions performed on them, including
backslash substitutions.  That isn't terribly nice for regular expressions,
so use braced words instead.

Tested on powerpc64-linux.  Okay for mainline?


Segher


2015-09-28  Segher Boessenkool  

gcc/testsuite/
* gcc.dg/asm-4.c: Use braced words for the regular expressions.

---
 gcc/testsuite/gcc.dg/asm-4.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/asm-4.c b/gcc/testsuite/gcc.dg/asm-4.c
index 0067598..1e6a538 100644
--- a/gcc/testsuite/gcc.dg/asm-4.c
+++ b/gcc/testsuite/gcc.dg/asm-4.c
@@ -11,7 +11,6 @@ int main()
   asm volatile ("test3 %[in]" : [inout] "=g"(x) : "[inout]" (x), [in] "g" (y));
 }
 
-/* ??? Someone explain why the back reference dosn't work.  */
-/* { dontdg-final { scan-assembler "test0 X(.*)Y\1Z" } } */
-/* { dontdg-final { scan-assembler "test1 X(.*)Y\1Z" } } */
-/* { dontdg-final { scan-assembler "test2 X(.*)Y\1Z" } } */
+/* { dg-final { scan-assembler {test0 X(.*)Y\1Z} } } */
+/* { dg-final { scan-assembler {test1 X(.*)Y\1Z} } } */
+/* { dg-final { scan-assembler {test2 X(.*)Y\1Z} } } */
-- 
1.8.1.4



Re: [Patch, fortran] PR40054 and PR63921 - Implement pointer function assignment - redux

2015-09-28 Thread Paul Richard Thomas
Committed as revision 228222. Thanks for all the help.

I'll update the fortran documentation tomorrow.

Cheers

Paul

On 28 September 2015 at 20:22, Paul Richard Thomas
 wrote:
> Dear Mikael,
>
> snip...
>
>>>  * io.c (next_char_not_space): Change tab warning to warning now
>>>  to prevent locus being lost.
>>
>> This has disappeared?
>
> duuh! Thanks
>
>
> snip
>
>> I think that for better error reporting (avoid unclassifiable statement),
>> the gfc_notification_std can be dropped, as there is a specific
>> gfc_notify_std guarding resolution.
>
> That's true - I'll check it out right now.
>
>>
>> Same for the rest of the condition.  gfc_match_ptr_fcn_assign carefully
>> restores existing errors upon failure, so I would rather use it more often.
>>
>> So, can you try removing the condition completely (and use the match macro
>> above again)?  that should improve errors in ptr_func_assign_2, and
>> hopefully not regress.
>> If it does regress, let's keep it as is.
>
> It does regress - that's why it is the way it is. Fortunately,
> MATCH_ERROR for statement functions would produce pretty much the same
> result in pointer function assignments. The regression is in
> recursive_statement_functions.f90, which just gets hopelessly tangled
> up in error recovery.
>
> snip
>
>> Nit: Usually, we don't put the 'F2008:' prefix.
>> Also may be explicit a bit more: "function result as assigned-to variable"
>> or something alike.
>
> Nits or not, they are good points :-)
>
>>
>> Anyway, those are nits, and the rest looks good to me.
>> So, with the above comments, the patch is OK as far as I'm concerned.
>> Thanks
>
> OK - I'll try to do the honours tonight.
>
> Thanks for the reviews.
>
> Paul



-- 
Outside of a dog, a book is a man's best friend. Inside of a dog it's
too dark to read.

Groucho Marx


Re: [PATCH] Convert SPARC to LRA

2015-09-28 Thread Segher Boessenkool
On Mon, Sep 28, 2015 at 03:23:37PM -0400, Vladimir Makarov wrote:
> There are more ports using reload than LRA now.  Even some major ports 
> (e.g. ppc64) did not switch to LRA.

There still are some failures in the testsuite (ICEs even) so we're
not there yet.

> I usually tell target maintainers that if they don't switch to LRA they
> will probably have problems with maintenance and development in the long
> run.  New things are easier to implement in LRA.

It is also true that new *ports* are easier to do with LRA than with
reload :-)

> >It *may* be time to decree that any new ports must use the LRA path 
> >rather than reload.  I'm still on the fence with that.
> 
> That is probably a good policy I see now.  Porting LRA might be not an 
> easy task as a lot of target hooks (and even insn definitions, e.g. 
> hints *?!) were written taking reload algorithms into account. LRA uses 
> different ones and many hook implementations are misleading.  Many 
> target ports are just in a maintenance mode and simply there are no 
> resources to do LRA port for this targets.  So I believe reload will 
> stay for a long time.

We can at least change the default to LRA, so new ports get it unless
they like to hurt themselves.

I don't think it makes sense to keep reload around *just* for the ports
that are in "maintenance mode": by the time we are down to *just* those
ports, it makes more sense to relabel them as "unmaintained".


Segher


Re: [PATCH] hurd: align -p and -pg behavior on Linux

2015-09-28 Thread Samuel Thibault
Ping?

Samuel Thibault, le Sat 19 Sep 2015 14:00:23 +0200, a écrit :
> On Linux, -p and -pg do not make gcc link against libc_p.a, only
> -profile does (as documented in r11246), and thus people expect -p
> and -pg to work without libc_p.a installed (it is actually even not
> available any more in Debian).  We should thus rather make the Hurd port
> do the same to avoid build failures.
> 
> Samuel
> 
>   * gcc/config/gnu.h (LIB_SPEC) [-p|-pg]: Link with -lc instead of -lc_p.
> * gcc/config/i386/gnu.h (STARTFILE_SPEC) [-p|-pg]: Use gcrt1.o
> instead of gcrt0.o.
> 
> --- gcc/config/gnu.h.orig 2015-09-16 00:43:09.785570853 +0200
> +++ gcc/config/gnu.h  2015-09-16 00:43:12.513550418 +0200
> @@ -25,7 +25,7 @@
>  
>  /* Default C library spec.  */
>  #undef LIB_SPEC
> -#define LIB_SPEC "%{pthread:-lpthread} %{pg|p|profile:-lc_p;:-lc}"
> +#define LIB_SPEC "%{pthread:-lpthread} %{profile:-lc_p;:-lc}"
>  
>  #undef GNU_USER_TARGET_OS_CPP_BUILTINS
>  #define GNU_USER_TARGET_OS_CPP_BUILTINS()\
> --- gcc/config/i386/gnu.h.orig2015-09-17 21:41:13.0 +
> +++ gcc/config/i386/gnu.h 2015-09-17 23:03:57.0 +
> @@ -27,11 +27,11 @@
>  #undef   STARTFILE_SPEC
>  #if defined HAVE_LD_PIE
>  #define STARTFILE_SPEC \
> -  "%{!shared: %{pg|p|profile:gcrt0.o%s;pie:Scrt1.o%s;static:crt0.o%s;:crt1.o%s}} \
> +  "%{!shared: %{pg|p:gcrt1.o%s;profile:gcrt0.o%s;pie:Scrt1.o%s;static:crt0.o%s;:crt1.o%s}} \
> crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s}"
>  #else
>  #define STARTFILE_SPEC \
> -  "%{!shared: %{pg|p|profile:gcrt0.o%s;static:crt0.o%s;:crt1.o%s}} \
> +  "%{!shared: %{pg|p:gcrt1.o%s;profile:gcrt0.o%s;static:crt0.o%s;:crt1.o%s}} \
> crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s}"
>  #endif
>  

-- 
Samuel
Hi ! I'm a .signature virus ! Copy me into your ~/.signature, please !


Re: [PATCH] fix PR67700

2015-09-28 Thread H.J. Lu
On Sat, Sep 26, 2015 at 3:34 AM, Tobias Grosser  wrote:
> On 09/25/2015 10:39 PM, Sebastian Pop wrote:
>>
>> The patch makes the detection of scop parameters in
>> parameter_index_in_region a
>> bit more conservative by discarding scalar variables defined in function
>> of data
>> references defined in the scop.
>>
>> 2015-09-25  Aditya Kumar  
>>  Sebastian Pop  
>>
>>  PR tree-optimization/67700
>>  * graphite-sese-to-poly.c (parameter_index_in_region):
>> Call
>>  invariant_in_sese_p_rec.
>>  (extract_affine): Same.
>>  (rewrite_cross_bb_scalar_deps): Call update_ssa.
>>  * sese.c (invariant_in_sese_p_rec): Export.  Handle vdefs
>> and vuses.
>>  * sese.h (invariant_in_sese_p_rec): Declare.
>>
>>  * testsuite/gcc.dg/graphite/run-id-pr67700.c: New.

It breaks bootstrap on x86:

https://gcc.gnu.org/ml/gcc-regression/2015-09/msg00382.html

../../src-trunk/gcc/sese.c: In function ‘bool invariant_in_sese_p_rec(tree, sese)’:
../../src-trunk/gcc/sese.c:781:12: error: unused variable ‘vdef’ [-Werror=unused-variable]
   if (tree vdef = gimple_vdef (stmt))
^

-- 
H.J.


Re: Openacc launch API

2015-09-28 Thread Nathan Sidwell

On 09/24/15 04:40, Jakub Jelinek wrote:


Iff GCC 5 compiled offloaded OpenACC/PTX code will always do host fallback
anyway because of the incompatible PTX version, then why don't you just
do
   goacc_save_and_set_bind (acc_device_host);
   fn (hostaddrs);
   goacc_restore_bind ();


Committed the  attached.  Thanks for the review.

nathan

2015-09-28  Nathan Sidwell  

	include/
	* gomp-constants.h (GOMP_VERSION_NVIDIA_PTX): Increment.
	(GOMP_DIM_GANG, GOMP_DIM_WORKER, GOMP_DIM_VECTOR, GOMP_DIM_MAX,
	GOMP_DIM_MASK): New.
	(GOMP_LAUNCH_DIM, GOMP_LAUNCH_ASYNC, GOMP_LAUNCH_WAIT): New.
	(GOMP_LAUNCH_CODE_SHIFT, GOMP_LAUNCH_DEVICE_SHIFT,
	GOMP_LAUNCH_OP_SHIFT): New.
	(GOMP_LAUNCH_PACK, GOMP_LAUNCH_CODE, GOMP_LAUNCH_DEVICE,
	GOMP_LAUNCH_OP): New.
	(GOMP_LAUNCH_OP_MAX): New.

	libgomp/
	* libgomp.h (acc_dispatch_t): Replace separate geometry args with
	array.
	* libgomp.map (GOACC_parallel_keyed): New.
	* oacc-parallel.c (goacc_wait): Take pointer to va_list.  Adjust
	all callers.
	(GOACC_parallel_keyed): New interface.  Lose geometry arguments
	and take keyed varargs list.  Adjust call to exec_func.
	(GOACC_parallel): Force host fallback.
	* libgomp_g.h (GOACC_parallel): Remove.
	(GOACC_parallel_keyed): Declare.
	* plugin/plugin-nvptx.c (struct targ_fn_launch): New struct.
	(struct targ_fn_descriptor): Replace name field with launch field.
	(nvptx_exec): Lose separate geometry args, take array.  Process
	dynamic dimensions and adjust.
	(struct nvptx_tdata): Replace fn_names field with fn_descs.
	(GOMP_OFFLOAD_load_image): Adjust for change in function table
	data.
	(GOMP_OFFLOAD_openacc_parallel): Adjust for change in dimension
	passing.
	* oacc-host.c (host_openacc_exec): Adjust for change in dimension
	passing.

	gcc/
	* config/nvptx/nvptx.c: Include omp-low.h and gomp-constants.h.
	(nvptx_record_offload_symbol): Record function execution geometry.
	* config/nvptx/mkoffload.c (process): Include launch geometry in
	function data.
	* omp-low.c (oacc_launch_pack): New.
	(replace_oacc_fn_attrib): New.
	(set_oacc_fn_attrib): New.
	(get_oacc_fn_attrib): New.
	(expand_omp_target): Create keyed varargs for GOACC_parallel call
	generation.
	* omp-low.h (get_oacc_fn_attrib): Declare.
	* builtin-types.def (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.
	* tree.h (OMP_CLAUSE_EXPR): New.
	* omp-builtins.def (BUILT_IN_GOACC_PARALLEL): Change target fn name.

	gcc/lto/
	* lto-lang.c (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.

	gcc/c-family/
	* c-common.c (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.

	gcc/fortran/
	* f95-lang.c (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.
	* types.def (DEF_FUNCTION_TYPE_VAR_6): New.
	(DEF_FUNCTION_TYPE_VAR_11): Delete.

Index: include/gomp-constants.h
===
--- include/gomp-constants.h	(revision 228086)
+++ include/gomp-constants.h	(working copy)
@@ -115,11 +115,33 @@ enum gomp_map_kind
 
 /* Versions of libgomp and device-specific plugins.  */
 #define GOMP_VERSION	0
-#define GOMP_VERSION_NVIDIA_PTX 0
+#define GOMP_VERSION_NVIDIA_PTX 1
 #define GOMP_VERSION_INTEL_MIC 0
 
 #define GOMP_VERSION_PACK(LIB, DEV) (((LIB) << 16) | (DEV))
 #define GOMP_VERSION_LIB(PACK) (((PACK) >> 16) & 0xffff)
 #define GOMP_VERSION_DEV(PACK) ((PACK) & 0xffff)
 
+#define GOMP_DIM_GANG	0
+#define GOMP_DIM_WORKER	1
+#define GOMP_DIM_VECTOR	2
+#define GOMP_DIM_MAX	3
+#define GOMP_DIM_MASK(X) (1u << (X))
+
+/* Varadic launch arguments.  End of list is marked by a zero.  */
+#define GOMP_LAUNCH_DIM		1  /* Launch dimensions, op = mask */
+#define GOMP_LAUNCH_ASYNC	2  /* Async, op = cst val if not MAX  */
+#define GOMP_LAUNCH_WAIT	3  /* Waits, op = num waits.  */
+#define GOMP_LAUNCH_CODE_SHIFT	28
+#define GOMP_LAUNCH_DEVICE_SHIFT 16
+#define GOMP_LAUNCH_OP_SHIFT 0
+#define GOMP_LAUNCH_PACK(CODE,DEVICE,OP)	\
+  (((CODE) << GOMP_LAUNCH_CODE_SHIFT)		\
+   | ((DEVICE) << GOMP_LAUNCH_DEVICE_SHIFT)	\
+   | ((OP) << GOMP_LAUNCH_OP_SHIFT))
+#define GOMP_LAUNCH_CODE(X) (((X) >> GOMP_LAUNCH_CODE_SHIFT) & 0xf)
+#define GOMP_LAUNCH_DEVICE(X) (((X) >> GOMP_LAUNCH_DEVICE_SHIFT) & 0xfff)
+#define GOMP_LAUNCH_OP(X) (((X) >> GOMP_LAUNCH_OP_SHIFT) & 0xffff)
+#define GOMP_LAUNCH_OP_MAX 0xffff
+
 #endif
Index: gcc/lto/lto-lang.c
===
--- gcc/lto/lto-lang.c	(revision 228086)
+++ gcc/lto/lto-lang.c	(working copy)
@@ -160,10 +160,10 @@ enum lto_builtin_type
 #define DEF_FUNCTION_TYPE_VAR_4(NAME, RETURN, ARG1, ARG2, ARG3, ARG4) NAME,
 #define DEF_FUNCTION_TYPE_VAR_5(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG6) \
 NAME,
+#define DEF_FUNCTION_TYPE_VAR_6(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+ ARG6) NAME,
 #define DEF_FUNCTION_TYPE_VAR_7(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
 ARG6, ARG7) NAME,
-#define DEF_FUNCTION_TYPE_VAR_11(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
- ARG6, ARG7, ARG8, ARG9, ARG
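
[Editorial illustration, not part of the original message.]  The
GOMP_LAUNCH_* macros in the gomp-constants.h hunk above pack a code, a device
and an operand into one word.  Below is a minimal sketch of that packing and
unpacking.  The function and constant names are hypothetical; the shifts and
the 0xf and 0xfff masks are copied from the patch, while the 16-bit operand
mask is an assumption inferred from the 16-bit gap below the device field.

```cpp
#include <cstdint>

// Shifts copied from the GOMP_LAUNCH_*_SHIFT macros in the patch.
constexpr unsigned kCodeShift = 28;
constexpr unsigned kDeviceShift = 16;
constexpr unsigned kOpShift = 0;
// Assumption: the operand occupies the low 16 bits.
constexpr std::uint32_t kOpMask = 0xffff;

// Mirrors GOMP_LAUNCH_PACK (CODE, DEVICE, OP).
constexpr std::uint32_t launch_pack(std::uint32_t code, std::uint32_t device,
                                    std::uint32_t op)
{
  return (code << kCodeShift) | (device << kDeviceShift) | (op << kOpShift);
}

// Mirror GOMP_LAUNCH_CODE, GOMP_LAUNCH_DEVICE and GOMP_LAUNCH_OP.
constexpr std::uint32_t launch_code(std::uint32_t x)
{ return (x >> kCodeShift) & 0xf; }

constexpr std::uint32_t launch_device(std::uint32_t x)
{ return (x >> kDeviceShift) & 0xfff; }

constexpr std::uint32_t launch_op(std::uint32_t x)
{ return (x >> kOpShift) & kOpMask; }

// Example: a "dimensions" tag (code 1) whose operand is a mask selecting the
// gang (bit 0) and vector (bit 2) dimensions.
static_assert(launch_code(launch_pack(1, 0, 5)) == 1, "code round-trips");
static_assert(launch_op(launch_pack(1, 0, 5)) == 5, "operand round-trips");
```

Because each field has a fixed width, a value that does not fit in its field
would spill into its neighbour; the sketch does no range checking.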

Re: [Patch, testsuite] Skip addr_equal-1 if target keeps null pointer checks

2015-09-28 Thread Jeff Law

On 09/28/2015 02:15 AM, Senthil Kumar Selvaraj wrote:

Hi,

   The below patch skips gcc.dg/addr_equal-1.c if the target keeps null
   pointer checks.

   The test fails for such targets (avr, in my case) because the address
   comparison in the below code does not resolve to a constant, causing
   builtin_constant_p to return false and fail the test.

   /* Variables and functions do not share same memory locations otherwise.  */
   if (!__builtin_constant_p ((void *)undef_fn0 == (void *)&undef_var0))
 abort ();

   For targets that delete null pointer checks, the equality comparison
   expression is optimized away to 0, as the code in match.pd knows they can
   only be equal if they are both NULL, which cannot be true since
   flag-delete-null-pointer-checks is on.

   For targets that keep null pointer checks, 0 is a valid address and the
   comparison expression is left as is, and that causes a later pass to
   fold the builtin_constant_p to a false value, resulting in the test
   failure.

This sounds like a failing in the compiler itself, not a testsuite issue.

Even on a target where objects can be at address 0, you can't have a 
variable and a function at the same address.


Jeff


Re: [PATCH] Fix undefined behaviour in arc port

2015-09-28 Thread Jeff Law

On 09/28/2015 11:12 AM, Jeff Law wrote:

On 09/26/2015 03:05 AM, Andreas Schwab wrote:

Jeff Law  writes:


@@ -9320,7 +9320,9 @@ arc_legitimize_reload_address (rtx *p,
machine_mode mode, int opnum,
if ((scale-1) & offset)
  scale = 1;
shift = scale >> 1;
-  offset_base = (offset + (256 << shift)) & (-512 << shift);
+  offset_base
+= ((offset + (256 << shift))
+   & ((HOST_WIDE_INT)(-512U << shift)));


If HOST_WIDE_INT is bigger than int then this is not the same.

I'll fix this too.


Fixed thusly.

Rebuilt arceb-linux-uclibc for testing purposes.

Jeff
commit 0cd75b5dc1dd8ed3f60d0b12d5cc43cc52d213aa
Author: Jeff Law 
Date:   Mon Sep 28 13:23:43 2015 -0400

Re: [PATCH] Fix undefined behaviour in arc port
* config/arc/arc.c (arc_legitimize_reload_address): Fix stupid
thinko in last change.
* config/arc/constraints.md (C2a): Fix typos in last change.

diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 4d731b5..a6a1921 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -9322,7 +9322,7 @@ arc_legitimize_reload_address (rtx *p, machine_mode mode, 
int opnum,
   shift = scale >> 1;
   offset_base
= ((offset + (256 << shift))
-  & ((HOST_WIDE_INT)(-512U << shift)));
+  & ((HOST_WIDE_INT)((unsigned HOST_WIDE_INT) -512 << shift)));
   /* Sometimes the normal form does not suit DImode.  We
 could avoid that by using smaller ranges, but that
 would give less optimized code when SImode is
diff --git a/gcc/config/arc/constraints.md b/gcc/config/arc/constraints.md
index b3ea115..3d0db36 100644
--- a/gcc/config/arc/constraints.md
+++ b/gcc/config/arc/constraints.md
@@ -195,7 +195,7 @@
   "@internal
Unconditional two-address add / sub constant"
   (and (match_code "const_int")
-   (match_test "ival == HOST_WIDE_INT (HOST_WIDE_INT_M1U << 31)
+   (match_test "ival == (HOST_WIDE_INT) (HOST_WIDE_INT_M1U << 31)
|| (ival >= -0x4000 && ival <= 0x4000
&& ((ival >= 0 ? ival : -ival)
<= 0x7ff * (ival & -ival)))")))
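
[Editorial illustration, not part of the original message.]  The point of the
follow-up fix above is that -512U has the width of unsigned int, so converting
the shifted result to a wider HOST_WIDE_INT leaves the high bits clear.  A
minimal sketch of the difference, assuming a 64-bit HOST_WIDE_INT and an
unsigned int narrower than that; all names here are hypothetical stand-ins,
not GCC code:

```cpp
#include <cstdint>

using host_wide_int = std::int64_t;            // stand-in for HOST_WIDE_INT
using unsigned_host_wide_int = std::uint64_t;  // stand-in for unsigned HOST_WIDE_INT

// Mask as in the first attempt: the shift happens at the width of unsigned
// int, so the upper bits of the widened result are zero.
constexpr host_wide_int narrow_mask(int shift)
{
  return static_cast<host_wide_int>(-512U << shift);
}

// Mask as in the follow-up: -512 is widened first, then shifted as an
// unsigned value (well defined), then converted back to the signed type.
constexpr host_wide_int wide_mask(int shift)
{
  return static_cast<host_wide_int>(
      static_cast<unsigned_host_wide_int>(-512) << shift);
}

// Rounds OFFSET to the nearest multiple of (512 << SHIFT), following the
// expression in arc_legitimize_reload_address.
constexpr host_wide_int offset_base(host_wide_int offset, int shift)
{
  return (offset + (host_wide_int{256} << shift)) & wide_mask(shift);
}
```

With the wide mask a negative offset keeps its sign after masking; with the
narrow mask the sign bits are cleared and the result becomes a large positive
number.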


Re: [PATCH] Convert SPARC to LRA

2015-09-28 Thread Vladimir Makarov

On 09/27/2015 09:29 PM, Jeff Law wrote:

On 09/27/2015 01:57 PM, Hans-Peter Nilsson wrote:

On Wed, 9 Sep 2015, Mike Stump wrote:


On Sep 8, 2015, at 9:41 PM, David Miller  wrote:

+#define TARGET_LRA_P hook_bool_void_true


Are we at the point where this should be the default, and old
ports should just define to false, if they really need to?


I think no.  For one, we don't have proper target documentation
updates for LRA.  What does it need?  What is outdated?

Also, give ample time for gcc releases of odd ports with LRA to
get into the public and cover most of the inevitable remaining
bugs.  Not even sh has moved over due to remaining issues.  Let
the reports come in - and be fixed.  Let's revisit in a year or
two.
I don't think we're there yet either -- many ports still require some 
guidance from Vlad to get working with LRA.


There are more ports using reload than LRA now.  Even some major ports 
(e.g. ppc64) did not switch to LRA.


I usually tell target maintainers that if they don't switch to LRA they
will probably have problems with maintenance and development in the long
run.  New things are easier to implement in LRA.  Intel
developers recognized this long ago and implemented some new
optimizations in RA (the last biggest one was pic hard register reuse).
According to them, it would be much harder to implement this in reload.


On the other hand, a lot of work was done in reload over many years to
accommodate unique target requirements such as those of SH.  A lot of effort
is needed to implement this in LRA to achieve the same performance as reload.


It *may* be time to decree that any new ports must use the LRA path 
rather than reload.  I'm still on the fence with that.


That is probably a good policy I see now.  Porting LRA might be not an 
easy task as a lot of target hooks (and even insn definitions, e.g. 
hints *?!) were written taking reload algorithms into account. LRA uses 
different ones and many hook implementations are misleading.  Many 
target ports are just in a maintenance mode and simply there are no 
resources to do LRA port for this targets.  So I believe reload will 
stay for a long time.




Re: New power of 2 hash policy

2015-09-28 Thread François Dumont
On 25/09/2015 15:28, Jonathan Wakely wrote:
> @@ -501,6 +503,129 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> mutable std::size_t_M_next_resize;
>>   };
>>
>> +  /// Range hashing function considering that second args is a power
>> of 2.
>
> Does this mean "assuming" not "considering"?

I assume yes.

>
>> +  struct _Mask_range_hashing
>> +  {
>> +typedef std::size_t first_argument_type;
>> +typedef std::size_t second_argument_type;
>> +typedef std::size_t result_type;
>> +
>> +result_type
>> +operator()(first_argument_type __num,
>> +   second_argument_type __den) const noexcept
>> +{ return __num & (__den - 1); }
>> +  };
>> +
>> +
>> +  /// Helper type to compute next power of 2.
>> +  template
>> +struct _NextPower2
>> +{
>> +  static std::size_t
>> +  _Get(std::size_t __n)
>> +  {
>> +std::size_t __next = _NextPower2<(_N >> 1)>::_Get(__n);
>> +return __next |= __next >> _N;
>> +  }
>> +};
>> +
>> +  template<>
>> +struct _NextPower2<1>
>> +{
>> +  static std::size_t
>> +  _Get(std::size_t __n)
>> +  { return __n |= __n >> 1; }
>> +};
>
> This doesn't seem to return the next power of 2, it returns one less.
>
> _NextPower2<32>::_Get(2) returns 3, but 2 is already a power of 2.
> _NextPower2<32>::_Get(3) returns 3, but the next power of 2 is 4.


Yes, the name is bad; that is just part of the algorithm you copy/pasted
below. I reworked the implementation so that _NextPower2 does the whole
algorithm.

>
>
> I don't think this needs to be a recursive template, it can simply be
> a function, can't it?

I wanted the code to adapt to any sizeof(std::size_t) without relying on
preprocessor checks. As you pointed out, an additional >> 32 on 32 bits
or >> 64 on 64 bits wouldn't hurt, but the recursive template just makes
sure that we don't do useless operations.

>
>
>> +  /// Rehash policy providing power of 2 bucket numbers. Ease modulo
>> +  /// operations.
>> +  struct _Power2_rehash_policy
>> +  {
>> +using __has_load_factor = std::true_type;
>> +
>> +_Power2_rehash_policy(float __z = 1.0) noexcept
>> +: _M_max_load_factor(__z), _M_next_resize(0) { }
>> +
>> +float
>> +max_load_factor() const noexcept
>> +{ return _M_max_load_factor; }
>> +
>> +// Return a bucket size no smaller than n (as long as n is not
>> above the
>> +// highest power of 2).
>
> This says "no smaller than n" but it actually seems to guarantee
> "greater than n" because _NextPower2<>::_Get(n)+1 is 2n when n is a
> power of two.

Yes, but this function calls _NextPower2<>::_Get(n - 1) + 1; the minus
one makes this comment valid, as shown by the newly introduced test.

>
>> +std::size_t
>> +_M_next_bkt(std::size_t __n) const
>> +{
>> +  constexpr auto __max_bkt
>> += (std::size_t(1) << (sizeof(std::size_t) * 8 - 1));
>> +
>> +  std::size_t __res
>> += _NextPower2<((sizeof(std::size_t) * 8) >> 1)>::_Get(--__n) + 1;
>
> You wouldn't need to add one to the result if the template actually
> returned a power of two!
>
>> +  if (__res == 0)
>> +__res = __max_bkt;
>> +
>> +  if (__res == __max_bkt)
>> +// Set next resize to the max value so that we never try to
>> rehash again
>> +// as we already reach the biggest possible bucket number.
>> +// Note that it might result in max_load_factor not being
>> respected.
>> +_M_next_resize = std::size_t(0) - 1;
>> +  else
>> +_M_next_resize
>> +  = __builtin_floor(__res * (long double)_M_max_load_factor);
>> +
>> +  return __res;
>> +}
>
> What are the requirements for this function, "no smaller than n" or
> "greater than n"?

'No smaller than n', as stated in the comment. However, for big n it is
not possible, even in the prime number based implementation. So I played
with _M_next_resize to make sure that _M_next_bkt won't be called again
once the max bucket number has been reached.


>
> If "no smaller than n" is correct then the algorithm you want is
> "round up to nearest power of 2", which you can find here (I wrote
> this earlier this year for some reason I can't remember now):
>
> https://gitlab.com/redistd/redistd/blob/master/include/redi/bits.h
>
> The non-recursive version is only a valid constexpr function in C++14,
> but since you don't need a constexpr function you could just use that,
> extended to handle 64-bit:
>
>  std::size_t
>  clp2(std::size_t n)
>  {
>std::uint_least64_t x = n;
>// Algorithm from Hacker's Delight, Figure 3-3.
>x = x - 1;
>x = x | (x >> 1);
>x = x | (x >> 2);
>x = x | (x >> 4);
>x = x | (x >> 8);
>x = x | (x >>16);
>x = x | (x >>32);
>return x + 1;
>  }
>
> We could avoid the last shift when sizeof(size_t) == 4, I don't know
> if the optimisers will take care of that anyway.

This is indeed the algo I found by myself and that I adapted to work
with any sizeof(size_t).

Do you prefer the new version or do you want to stick a more explicit
version 
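
[Editorial illustration, not part of the original message.]  The two pieces
discussed in this thread can be sketched together: rounding a requested bucket
count up to a power of two, and using a mask instead of a modulo to pick a
bucket.  This is only a sketch under stated assumptions: the function names
are hypothetical, the rounding follows the clp2 routine quoted above, and it
is not the code that was committed to libstdc++.

```cpp
#include <cstddef>
#include <cstdint>

// Round n up to the nearest power of two ("no smaller than n").
// A value that is already a power of two is returned unchanged.
// n == 0 wraps around and yields 0, so callers must handle it separately.
inline std::size_t round_up_pow2(std::size_t n)
{
  std::uint_least64_t x = n;
  x = x - 1;       // so that exact powers of two map to themselves
  x |= x >> 1;     // smear the highest set bit into every lower position
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  x |= x >> 32;
  return static_cast<std::size_t>(x + 1);
}

// Range hashing for a power-of-two bucket count: equivalent to
// hash % bucket_count, but only when bucket_count is a power of two.
inline std::size_t mask_bucket(std::size_t hash, std::size_t bucket_count)
{
  return hash & (bucket_count - 1);
}
```

For example, a request for 1000 buckets rounds up to 1024, and a hash of 13
with 8 buckets lands in bucket 5, the same result as 13 % 8.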

Re: [Patch, testsuite] Skip addr_equal-1 if target keeps null pointer checks

2015-09-28 Thread Mike Stump
On Sep 28, 2015, at 1:15 AM, Senthil Kumar Selvaraj 
 wrote:
>  The below patch skips gcc.dg/addr_equal-1.c if the target keeps null
>  pointer checks.
> 
>  The test fails for such targets (avr, in my case) because the address
>  comparison in the below code does not resolve to a constant, causing
>  builtin_constant_p to return false and fail the test.
> 
>  /* Variables and functions do not share same memory locations otherwise.  */
>  if (!__builtin_constant_p ((void *)undef_fn0 == (void *)&undef_var0))
>abort ();
> 
>  For targets that delete null pointer checks, the equality comparison
>  expression is optimized away to 0, as the code in match.pd knows they can
>  only be equal if they are both NULL, which cannot be true since
>  flag-delete-null-pointer-checks is on.
> 
>  For targets that keep null pointer checks, 0 is a valid address and the
>  comparison expression is left as is, and that causes a later pass to
>  fold the builtin_constant_p to a false value, resulting in the test
>  failure.
> 
>  If this is ok, could someone commit please? I don't have commit
>  access.

So, my preference would be for the target maintainer (or a
keeps_null_pointer_checks person) to review, if less than trivial.  Seems fine
to me, but they should get first crack at the review.


[patch] libstdc++/67726 LWG 2135: terminate() in condition_variable::wait()

2015-09-28 Thread Jonathan Wakely

This was a change between C++11 and C++14.

Tested powerpc64le-linux, committed to trunk.


commit 023e16117005d8ca7dbb0e2e61059b59d7cc0e40
Author: Jonathan Wakely 
Date:   Mon Sep 28 17:47:35 2015 +0100

LWG 2135: terminate() in condition_variable::wait()

	* include/std/condition_variable (condition_variable::wait): Add
	noexcept.
	* src/c++11/condition_variable.cc (condition_variable::wait): Call
	std::terminate on error (DR 2135).

diff --git a/libstdc++-v3/include/std/condition_variable b/libstdc++-v3/include/std/condition_variable
index 4714774..f5f7734 100644
--- a/libstdc++-v3/include/std/condition_variable
+++ b/libstdc++-v3/include/std/condition_variable
@@ -89,7 +89,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 notify_all() noexcept;
 
 void
-    wait(unique_lock<mutex>& __lock);
+    wait(unique_lock<mutex>& __lock) noexcept;
 
 template
   void
diff --git a/libstdc++-v3/src/c++11/condition_variable.cc b/libstdc++-v3/src/c++11/condition_variable.cc
index cc0f6e4..fd850cb 100644
--- a/libstdc++-v3/src/c++11/condition_variable.cc
+++ b/libstdc++-v3/src/c++11/condition_variable.cc
@@ -48,12 +48,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
   void
-  condition_variable::wait(unique_lock<mutex>& __lock)
+  condition_variable::wait(unique_lock<mutex>& __lock) noexcept
   {
 int __e = __gthread_cond_wait(&_M_cond, __lock.mutex()->native_handle());
 
 if (__e)
-  __throw_system_error(__e);
+  std::terminate();
   }
 
   void


Re: [PATCH] Convert SPARC to LRA

2015-09-28 Thread David Miller
From: Oleg Endo 
Date: Mon, 28 Sep 2015 21:26:14 +0900

> LRA on SH seems to work without GCC test suite failures.  However, I'd
> expect that there are still hidden bugs not covered by the test suite.  SH's
> R0 spill failures are greatly reduced with LRA, although some hacks had
> to be added to the SH backend to make it work at all.  Despite that, we
> see quite some significant code size increases compared to reload.  If
> the difference wasn't that big, we'd probably turn LRA on by default for
> SH in GCC 6...

One weakness I noticed while working on the sparc conversion is that
the bootstrap of the compiler tests reload/LRA better than the
testsuite does.

Which is unfortunate, because bootstrap failures are significantly harder
to analyze and debug than individual testsuite cases.


Re: [Patch,optimization]: Optimized changes in the estimate register pressure cost.

2015-09-28 Thread Aaron Sawdey
On Sat, 2015-09-26 at 04:51 +, Ajit Kumar Agarwal wrote:
> I have made the following changes in the estimate_reg_pressure_cost function
> used by the loop invariant and IVOPTS.
> 
> Earlier the estimate_reg_pressure cost uses the cost of n_new variables that
> are generated by the Loop Invariant and IVOPTS. These are not sufficient for
> register pressure calculation. The register pressure cost calculation should
> use the n_new + n_old (numbers) to consider the cost. n_old is the register
> used inside the loops and the effect of n_new new variables generated by loop
> invariant and IVOPTS on register pressure is based on how the new variables
> impact on register used inside the loops. The increase or decrease in register
> pressure is due to the impact of new variables on the register used inside the
> loops. The register-register move cost or the spill cost should consider the
> cost associated with register used and the new variables generated. The
> movement of new variables increases or decreases the register pressure, which
> is based on overall cost of n_new + n_old variables.
> 
> The increase and decrease in register pressure is based on the overall cost 
> of n_new + n_old as the changes in the 
> register pressure caused due to new variables is based on how the changes 
> behave with respect to the register used 
> in the loops.
> 
> Thus the register pressure caused to new variables is based on the new 
> variables and its impact on register used inside
>  the loops and thus consider the overall  cost of n_new + n_old.
> 
> Bootstrap for i386 and reg tested on i386 with the change is fine.
> 
> SPEC CPU 2000 benchmarks are run and there is following impact on the 
> performance
> and code size.
> 
> ratio with the optimization vs ratio without optimization for INT benchmarks
> (3807.632 vs 3804.661)
> 
> ratio with the optimization vs ratio without optimization for FP benchmarks
> ( 4668.743 vs 4778.741)
> 
> Code size reduction with respect to FP SPEC CPU 2000 benchmarks
> 
> Number of instructions with optimization = 1094117
> Number of instructions without optimization = 1094659
> 
> Reduction in number of instructions with the optimization = 542 instructions.
> 
> [Patch,optimization]: Optimized changes in the estimate
>  register pressure cost.
> 
> Earlier the estimate_reg_pressure cost uses the cost of n_new variables that
> are generated by the Loop Invariant and IVOPTS. These are not sufficient for
> register pressure calculation. The register pressure cost calculation should
> use the n_new + n_old (numbers) to consider the cost. n_old is the register
> used inside the loops and the effect of n_new new variables generated by
> loop invariant and IVOPTS on register pressure is based on how the new
> variables impact on register used inside the loops.
> 
> ChangeLog:
> 2015-09-26  Ajit Agarwal  
> 
>   * cfgloopanal.c (estimate_reg_pressure_cost) : Add changes
>   to consider the n_new plus n_old in the register pressure
>   cost.
> 
> Signed-off-by:Ajit Agarwal ajit...@xilinx.com

Ajit,
  It looks to me like your change doesn't do anything at all inside the
loop-invariant.c code. There it's doing a difference between two
estimate_reg_pressure_cost calls so adding n_old (regs_used) to both is
canceled out.

  size_cost = (estimate_reg_pressure_cost (new_regs[0] + regs_needed[0],
                                           regs_used, speed, call_p)
               - estimate_reg_pressure_cost (new_regs[0],
                                             regs_used, speed, call_p));

I'm not quite sure I understand the "why" of the heuristic you've added
here -- can you explain your reasoning further?
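
A minimal sketch of the cancellation, under a deliberately simplified
assumption: the cost is purely linear in the register count.  The names
toy_cost_old, toy_cost_new, toy_size_cost and COST_PER_REG are hypothetical
and are not GCC's real estimate_reg_pressure_cost, which is not this simple.
The sketch only covers the case where the same per-register cost applies to
both calls in the subtraction:

```c
/* Toy linear model; COST_PER_REG is an arbitrary illustrative constant.  */
#define COST_PER_REG 3

/* Cost charged for the new registers only (the behaviour before the
   proposed change, in this simplified model).  */
static unsigned
toy_cost_old (unsigned n_new, unsigned n_old)
{
  (void) n_old;
  return COST_PER_REG * n_new;
}

/* Cost charged for new plus already-used registers (the proposed change,
   in this simplified model).  */
static unsigned
toy_cost_new (unsigned n_new, unsigned n_old)
{
  return COST_PER_REG * (n_new + n_old);
}

/* Same shape as the size_cost computation quoted above: the cost with
   the invariant's registers minus the cost without them.  */
static unsigned
toy_size_cost (unsigned (*cost) (unsigned, unsigned),
               unsigned new_regs, unsigned regs_needed, unsigned regs_used)
{
  return cost (new_regs + regs_needed, regs_used)
         - cost (new_regs, regs_used);
}
```

In this model both variants give the same size_cost for any regs_used,
because the regs_used term appears once with each sign in the subtraction.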

> 
> Thanks & Regards
> Ajit
> 

Thanks,
Aaron

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



Re: [Patch, fortran] PR40054 and PR63921 - Implement pointer function assignment - redux

2015-09-28 Thread Paul Richard Thomas
Dear Mikael,

snip...

>>  * io.c (next_char_not_space): Change tab warning to warning now
>>  to prevent locus being lost.
>
> This has disappeared?

duuh! Thanks


snip

> I think that for better error reporting (avoid unclassifiable statement),
> the gfc_notification_std can be dropped, as there is a specific
> gfc_notify_std guarding resolution.

That's true - I'll check it out right now.

>
> Same for the rest of the condition.  gfc_match_ptr_fcn_assign carefully
> restores existing errors upon failure, so I would rather use it more often.
>
> So, can you try removing the condition completely (and use the match macro
> above again)?  that should improve errors in ptr_func_assign_2, and
> hopefully not regress.
> If it does regress, let's keep it as is.

It does regress - that's why it is the way it is. Fortunately,
MATCH_ERROR for statement functions would produce pretty much the same
result in pointer function assignments. The regression is in
recursive_statement_functions.f90, which just gets hopelessly tangled
up in error recovery.

snip

> Nit: Usually, we don't put the 'F2008:' prefix.
> Also may be explicit a bit more: "function result as assigned-to variable"
> or something alike.

Nits or not, they are good points :-)

>
> Anyway, those are nits, and the rest looks good to me.
> So, with the above comments, the patch is OK as far as I'm concerned.
> Thanks

OK - I'll try to do the honours tonight.

Thanks for the reviews.

Paul


Re: [PATCH] Convert SPARC to LRA

2015-09-28 Thread Mike Stump
On Sep 27, 2015, at 6:29 PM, Jeff Law  wrote:
> I don't think we're there yet either -- many ports still require some 
> guidance from Vlad to get working with LRA.
> 
> It *may* be time to decree that any new ports must use the LRA path rather 
> than reload.  I'm still on the fence with that.

So, I think it makes sense to change the default for LRA to on now.  Port 
maintainers for which this isn’t the right choice (sh, because of code size 
regressions), can then default it to off in the port.  We can then use the 
#define LRA in the port as an indicator that things are not fine on that port 
and ideally LRA should be enhanced to make the port work better.  When no port 
has a #define left, we can then safely rm -rf all the non-LRA code.

Re: [PATCH GCC]Improve rtl loop inv cost by checking if the inv can be propagated to address uses

2015-09-28 Thread Jeff Law

On 09/28/2015 05:28 AM, Bernd Schmidt wrote:

On 09/28/2015 11:43 AM, Bin Cheng wrote:

Bootstrap and test on x86_64 and x86_32.  Will test it on aarch64.  So
any
comments?

Thanks,
bin

2015-09-28  Bin Cheng  

* loop-invariant.c (struct def): New field cant_fwprop_to_addr_uses.
(inv_cant_fwprop_to_addr_use): New function.
(record_use): Call inv_cant_fwprop_to_addr_use, set the new field.
(get_inv_cost): Count cost if inv can't be propagated into its
address uses.


It looks at least plausible.
Definitely plausible.  Many targets have restrictions on the immediate 
offsets, so this potentially affects many targets (in a good way).




Another option which I think has had some
discussion recently would be to just move everything, and leave it to
cprop to put things back together if the costs allow it.

I go back and forth on this kind of approach.

jeff


Re: [PATCH] Fix undefined behaviour in arc port

2015-09-28 Thread Jeff Law

On 09/26/2015 03:05 AM, Andreas Schwab wrote:

Jeff Law  writes:


@@ -9320,7 +9320,9 @@ arc_legitimize_reload_address (rtx *p, machine_mode mode, 
int opnum,
if ((scale-1) & offset)
scale = 1;
shift = scale >> 1;
-  offset_base = (offset + (256 << shift)) & (-512 << shift);
+  offset_base
+   = ((offset + (256 << shift))
+  & ((HOST_WIDE_INT)(-512U << shift)));


If HOST_WIDE_INT is bigger than int then this is not the same.

I'll fix this too.
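
A small sketch of the difference being pointed out, assuming a 32-bit
unsigned int and using int64_t as a stand-in for a 64-bit HOST_WIDE_INT.
The function names are hypothetical and this is not the arc port's code.
When the shift is done in 32-bit unsigned arithmetic, the result is
zero-extended on widening, so the upper 32 bits of the mask are lost:

```c
#include <stdint.h>

/* Mask as written in the hunk above: shifted as a 32-bit unsigned value,
   then widened, which zero-extends (assumes 32-bit unsigned int).  */
static int64_t
mask_zero_extended (int shift)
{
  return (int64_t) (-512U << shift);
}

/* Mask computed in 64-bit unsigned arithmetic before converting, so the
   upper bits stay set and no negative value is ever shifted.  */
static int64_t
mask_wide (int shift)
{
  return (int64_t) (-(uint64_t) 512 << shift);
}

/* Same shape as the offset_base computation in the hunk.  */
static int64_t
offset_base (int64_t offset, int64_t mask, int shift)
{
  return (offset + ((int64_t) 256 << shift)) & mask;
}
```

With shift == 1 the first mask is 0xFFFFFC00 and the second is -1024; the
two agree for small non-negative offsets but differ for negative ones.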

jeff


Re: [PATCH] Fix undefined behaviour in arc port

2015-09-28 Thread Jeff Law

On 09/26/2015 03:06 AM, Andreas Schwab wrote:

Jeff Law  writes:


@@ -195,7 +195,7 @@
"@internal
 Unconditional two-address add / sub constant"
(and (match_code "const_int")
-   (match_test "ival == -1 << 31
+   (match_test "ival == HOST_WIDE_INT (HOST_WIDE_INT_M1U << 31)


Syntax error?

Yes, though strangely it's not causing the port to fail to build.  Weird.

I'll take care of it.

Thanks,
Jeff


[gomp4] error on acc loops not associated with offloaded acc regions

2015-09-28 Thread Cesar Philippidis
I've applied this patch to gomp-4_0-branch which teaches omplower how to
error when it detects acc loops which aren't nested inside an acc
parallel or kernels region or located within a function marked as an acc
routine. A couple of test cases needed to be updated.

The error message is kind of long. Let me know if it should be revised.
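
For illustration, a sketch of the two accepted placements and the one that
is now diagnosed.  The function names are made up for this example; when
compiled without -fopenacc the pragmas are ignored and the functions simply
run sequentially:

```c
/* Accepted: the loop is nested inside an acc parallel region.  */
int
sum_in_region (const int *a, int n)
{
  int sum = 0;
#pragma acc parallel copyin (a[0:n]) reduction (+:sum)
  {
#pragma acc loop reduction (+:sum)
    for (int i = 0; i < n; i++)
      sum += a[i];
  }
  return sum;
}

/* Accepted: the enclosing function is marked as an acc routine.  */
#pragma acc routine
void
scale_in_routine (int *a, int n, int k)
{
#pragma acc loop
  for (int i = 0; i < n; i++)
    a[i] *= k;
}

/* Diagnosed by the new check, since neither a region nor a routine
   encloses the loop:

   void orphan (int *a, int n)
   {
   #pragma acc loop
     for (int i = 0; i < n; i++)
       a[i] = 0;
   }  */
```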

Cesar
2015-09-28  Cesar Philippidis  

	gcc/
	* omp-low.c (check_omp_nesting_restrictions): Check for acc loops not
	associated with acc regions or routines.

	gcc/testsuite/
	* c-c++-common/goacc/non-routine.c: New test.
	* c-c++-common/goacc-gomp/nesting-1.c: Add checks for invalid loop
	nesting.
	* c-c++-common/goacc-gomp/nesting-fail-1.c: Likewise.
	* c-c++-common/goacc/clauses-fail.c: Likewise.
	* c-c++-common/goacc/sb-1.c: Likewise.
	* c-c++-common/goacc/sb-3.c: Likewise.
	* gcc.dg/goacc/sb-1.c: Likewise.
	* gcc.dg/goacc/sb-3.c: Likewise.


diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 99b3939..2329a71 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2901,6 +2901,14 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
 	}
 	  return true;
 	}
+  if (is_gimple_omp_oacc (stmt) && ctx == NULL
+	  && get_oacc_fn_attrib (current_function_decl) == NULL)
+	{
+	  error_at (gimple_location (stmt),
+		"acc loops must be associated with an acc region or "
+		"routine");
+	  return false;
+	}
   /* FALLTHRU */
 case GIMPLE_CALL:
   if (is_gimple_call (stmt)
diff --git a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
index b38e181..75d6a1d 100644
--- a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
+++ b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
@@ -20,6 +20,7 @@ f_acc_kernels (void)
   }
 }
 
+#pragma acc routine
 void
 f_acc_loop (void)
 {
diff --git a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
index 14c6aa6..6d91484 100644
--- a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
+++ b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
@@ -361,72 +361,72 @@ f_acc_data (void)
 void
 f_acc_loop (void)
 {
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp parallel /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp parallel
   ;
 }
 
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp for /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp for
   for (i = 0; i < 3; i++)
 	;
 }
 
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp sections /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp sections
   {
 	;
   }
 }
 
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp single /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp single
   ;
 }
 
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp task /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp task
   ;
 }
 
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp master /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp master
   ;
 }
 
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp critical /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp critical
   ;
 }
 
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp ordered /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp ordered
   ;
 }
 
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp target /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp target
   ;
-#pragma omp target data /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp target data
   ;
-#pragma omp target update to(i) /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp target update to(i)
 }
 }
diff --git a/gcc/testsuite/c-c++-common/goacc/clauses-f

[doc, committed] clean up Asm Labels documentation

2015-09-28 Thread Sandra Loosemore
I've committed the attached patch on behalf of David Wohlferd, who 
doesn't have SVN write access.  Jeff Law already approved the technical 
content off-list, and I'm not sure the final version of the patch was 
ever posted here previously.


-Sandra

2015-09-28  David Wohlferd  

 * doc/extend.texi (Asm Labels): Break out text for data vs
	 functions.
Index: extend.texi
===
--- extend.texi	(revision 226751)
+++ extend.texi	(working copy)
@@ -8367,8 +8367,14 @@
 
 You can specify the name to be used in the assembler code for a C
 function or variable by writing the @code{asm} (or @code{__asm__})
-keyword after the declarator as follows:
+keyword after the declarator.
+It is up to you to make sure that the assembler names you choose do not
+conflict with any other assembler symbols, or reference registers.
 
+@subsubheading Assembler names for data:
+
+This sample shows how to specify the assembler name for data:
+
 @smallexample
 int foo asm ("myfoo") = 2;
 @end smallexample
@@ -8379,33 +8385,30 @@
 @samp{_foo}.
 
 On systems where an underscore is normally prepended to the name of a C
-function or variable, this feature allows you to define names for the
+variable, this feature allows you to define names for the
 linker that do not start with an underscore.
 
-It does not make sense to use this feature with a non-static local
-variable since such variables do not have assembler names.  If you are
-trying to put the variable in a particular register, see @ref{Explicit
-Reg Vars}.  GCC presently accepts such code with a warning, but will
-probably be changed to issue an error, rather than a warning, in the
-future.
+GCC does not support using this feature with a non-static local variable 
+since such variables do not have assembler names.  If you are
+trying to put the variable in a particular register, see 
+@ref{Explicit Reg Vars}.
 
-You cannot use @code{asm} in this way in a function @emph{definition}; but
-you can get the same effect by writing a declaration for the function
-before its definition and putting @code{asm} there, like this:
+@subsubheading Assembler names for functions:
 
+To specify the assembler name for functions, write a declaration for the 
+function before its definition and put @code{asm} there, like this:
+
 @smallexample
-extern func () asm ("FUNC");
-
-func (x, y)
- int x, y;
-/* @r{@dots{}} */
+int func (int x, int y) asm ("MYFUNC");
+ 
+int func (int x, int y)
+@{
+   /* @r{@dots{}} */
 @end smallexample
 
-It is up to you to make sure that the assembler names you choose do not
-conflict with any other assembler symbols.  Also, you must not use a
-register name; that would produce completely invalid assembler code.  GCC
-does not as yet have the ability to store static variables in registers.
-Perhaps that will be added.
+@noindent
+This specifies that the name to be used for the function @code{func} in
+the assembler code should be @code{MYFUNC}.
 
 @node Explicit Reg Vars
 @subsection Variables in Specified Registers


RE: [Graphite] Redesign Graphite scop detection

2015-09-28 Thread Sebastian Paul Pop
Hi Tobi,

we do not cache SCEV information as it depends on the region boundaries,
so I think we are safe when we extend scops.

On handling non-affine regions/loops, you are right, we would need to
first teach scop detection about how to handle them, and then teach it
to the SESE-to-poly pass as well.

Thanks for your review.

Sebastian

-Original Message-
From: Tobias Grosser [mailto:tob...@grosser.es] 
Sent: Monday, September 28, 2015 3:19 AM
To: Aditya Kumar; gcc-patches@gcc.gnu.org
Cc: richard.guent...@gmail.com; aditya...@samsung.com; s@samsung.com;
seb...@gmail.com
Subject: Re: [Graphite] Redesign Graphite scop detection

On 09/28/2015 03:30 AM, Aditya Kumar wrote:
> From: hiraditya 
>
> Redesign Graphite scop detection for faster compile time and detecting
> more SCoPs.
>
> The existing algorithm for SCoP detection in Graphite was based on the
> dominator tree, where a tree (CFG) traversal was required for analyzing an
> SESE.  The tree traversal is linear in the number of basic blocks and SCoP
> detection is (probably) linear in the number of instructions.  That
> algorithm utilized a generic infrastructure of SESE which does not
> directly represent loops.  With regard to the Graphite framework, we are
> only interested in subtrees with loops.  The new algorithm is geared
> towards tree traversal on the loop structure.  The algorithm is linear in
> the number of loops, which is faster than the previous algorithm.
>
> Briefly, we start the traversal at a loop nest and analyze it recursively
> for validity.  Once a valid loop is found we find a valid adjacent loop.
> If an adjacent loop is found and is valid, we merge both loop nests;
> otherwise we form a SCoP from the previous loop nest and resume the
> algorithm from the adjacent loop nest.  The data structure to represent an
> SESE is an ordered pair of edges (entry, exit).  The new algorithm can
> extend a SCoP in both directions.  With this approach, the number of
> instructions to be analyzed for validity reduces to a minimal set.  We
> start by analyzing those statements which are inside a loop, because
> validity of those statements is necessary for the validity of the loop.
> The statements outside the loop nest can simply be excluded from the SESE
> if they are not valid.
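
The merging step described above can be sketched roughly as follows.  This
is a toy model only: sibling loop nests are a flat array with a precomputed
validity flag, and the names toy_loop, toy_scop and toy_detect_scops are
hypothetical.  It ignores the recursive analysis inside each nest and the
handling of statements outside loops, and it is not Graphite's actual code:

```c
#include <stddef.h>

/* Hypothetical stand-in for a sibling loop nest: only records whether
   the nest passed validity analysis.  */
struct toy_loop { int valid; };

/* Hypothetical SCoP record: indices of the first and last merged nest.  */
struct toy_scop { size_t first, last; };

/* Walk sibling loop nests in order, merging each run of adjacent valid
   nests into one SCoP.  Returns the number of SCoPs written to OUT,
   which must have room for N entries.  */
static size_t
toy_detect_scops (const struct toy_loop *loops, size_t n,
                  struct toy_scop *out)
{
  size_t count = 0;
  size_t i = 0;
  while (i < n)
    {
      if (!loops[i].valid)
        {
          /* An invalid nest ends any SCoP and is skipped.  */
          i++;
          continue;
        }
      size_t first = i;
      /* Extend the SCoP while the adjacent nest is also valid.  */
      while (i + 1 < n && loops[i + 1].valid)
        i++;
      out[count].first = first;
      out[count].last = i;
      count++;
      i++;
    }
  return count;
}
```

The single pass visits each loop nest once, which matches the claim that
the work is linear in the number of loops.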

I am generally fine with this, but please consider that when growing a SCoP
certain previous analysis may become invalid (an affine expression may
suddenly become non-affine, as parameters that were previously
scop-invariant may now be part of the scop).  Also, how are you planning to
handle non-affine regions/loops?  In Polly we can encapsulate non-affine
loops and regions in bigger scops.  To handle this, I assume you would need
to teach your patch to start growing regions even though the innermost
loops cannot be modeled precisely.

Best,
Tobias



Re: [PATCH][RTL-ifcvt] PR rtl-optimization/67465: Handle pairs of complex+simple blocks and empty blocks more gracefully

2015-09-28 Thread H.J. Lu
On Mon, Sep 28, 2015 at 1:26 AM, Kyrill Tkachov  wrote:
>
> On 25/09/15 21:03, Jeff Law wrote:
>>
>> On 09/25/2015 05:06 AM, Kyrill Tkachov wrote:
>>>
>>> Hi Rainer,
>>>
>>> On 25/09/15 11:57, Rainer Orth wrote:

 Hi Kyrill,

> Bootstrapped and tested on aarch64 and x86_64.
> Rainer, could you please try this patch in combination with the one I
> sent
> earlier at:
> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00815.html

 it took me quite a bit, but I've now regtested those two patches: with
 them both applied, the sparc-sun-solaris2.12 build succeeds and the two
 gcc.c-torture/execute/20071216-1.c failures are gone.

 So, from a SPARC POV the patches are good to go.
>>>
>>> Phew, thanks a lot!
>>>
>>> So, in conclusion the patches I'd like approval for are:
>>> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01306.html
>>> and
>>> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00815.html
>>
>> These are OK.  Thanks for taking the time to work with Rainer and sort
>> out the sparc issues.  It's greatly appreciated.
>
>
> No problem, they were my regressions to fix after all, and it's easier to
> fix now rather than in stage3/4.
>
> I've committed them with r228194 and r228195.
>

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67749


-- 
H.J.


Re: [AArch64] Improve TLS Descriptor pattern to release RTL loop IV opt

2015-09-28 Thread Jiong Wang

Jiong Wang writes:

> Andrew Pinski writes:
>
>> On Tue, Jul 28, 2015 at 6:12 AM, Jiong Wang  wrote:
>>>
>>> The instruction sequences for preparing argument for TLS descriptor
>>> runtime resolver and the later function call to resolver can actually be
>>> hoisted out of the loop.
>>>
>>> Currently we can't, because we have exposed the hard register X0 as the
>>> destination of the "set", and GCC's RTL data flow infrastructure skips
>>> or makes very conservative assumptions when a hard register is involved,
>>> so some loop IV opportunities are missed.
>>>
>>> This patch adds another "tlsdesc_small_pseudo_" pattern and avoids
>>> exposing x0 to GCC's generic code.
>>>
>>> Generally, we define a new register class FIXED_R0 which only contains
>>> register 0, so the instruction sequence generated from the newly added
>>> pattern is the same as for tlsdesc_small_, while operand 0 is wrapped as
>>> a pseudo register that RTL IV opt can handle.
>>>
>>> Ideally, we should allow operand 0 to be any pseudo register, but then
>>> we can't model the overwriting of x0 caused by the function call, which
>>> is hidden by the UNSPEC.
>>>
>>> So here we restrict operand 0 to be x0, so that the overwriting of x0
>>> is visible to GCC.
>>>
>>> OK for trunk?
>>
>>
>> This patch broke ILP32 because we used mode rather than ptr_mode for
>> the pseudo.  I have an idea on how to fix it (like the tlsie_small_sidi
>> case) but I still need to test it fully.
>
> I have quickly revisited the code, and the use of "mode" instead of
> "ptr_mode" looks OK to me.
>
> What looks strange to me is that under ILP32 the symbol_ref is DImode.
>
>(symbol_ref:DI ("*.LANCHOR0")
>
> My understanding is that the symbol_ref should be SImode under ILP32
> instead of DImode.
>
> So it's better to fix "create_block_symbol" in varasm, and we should let
> it use ptr_mode instead of Pmode, as Pmode describes the underlying
> hardware mode rather than the mode as viewed at the C language level.

Meanwhile, I have reverted this patch (r228211), as there is an unresolved
ILP32 issue.

-- 
Regards,
Jiong



Re: [PATCH][committed] Fix PR67652: wrong sizeof calculation in liboffloadmic

2015-09-28 Thread Ilya Verbin
On Mon, Sep 28, 2015 at 18:15:14 +0200, Jakub Jelinek wrote:
> On Mon, Sep 28, 2015 at 07:10:13PM +0300, Ilya Verbin wrote:
> > Committed to trunk as obvious.
> > 
> > PR other/67652
> > * runtime/offload_engine.cpp (Engine::init_process): Fix sizeof.
> > 
> > diff --git a/liboffloadmic/runtime/offload_engine.cpp 
> > b/liboffloadmic/runtime/offload_engine.cpp
> > index 16b440d..00b673a 100644
> > --- a/liboffloadmic/runtime/offload_engine.cpp
> > +++ b/liboffloadmic/runtime/offload_engine.cpp
> > @@ -173,7 +173,7 @@ void Engine::init_process(void)
> >  // use putenv instead of setenv as Windows has no setenv.
> >  // Note: putenv requires its argument can't be freed or 
> > modified.
> >  // So no free after call to putenv or elsewhere.
> > -char * env_var = (char*) 
> > malloc(sizeof("COI_DMA_CHANNEL_COUNT=2" + 1));
> > +char * env_var = (char*) 
> > malloc(sizeof("COI_DMA_CHANNEL_COUNT=2"));
> >  sprintf(env_var, "COI_DMA_CHANNEL_COUNT=2");
> >  putenv(env_var);  
> 
> Missing error handling if malloc returns NULL?

Yes :(
I will grep all mallocs/reallocs one more time.

  -- Ilya


Re: [PATCH][committed] Fix PR67652: wrong sizeof calculation in liboffloadmic

2015-09-28 Thread Andrew Pinski
On Mon, Sep 28, 2015 at 9:15 AM, Jakub Jelinek  wrote:
> On Mon, Sep 28, 2015 at 07:10:13PM +0300, Ilya Verbin wrote:
>> Committed to trunk as obvious.
>>
>>   PR other/67652
>>   * runtime/offload_engine.cpp (Engine::init_process): Fix sizeof.
>>
>> diff --git a/liboffloadmic/runtime/offload_engine.cpp 
>> b/liboffloadmic/runtime/offload_engine.cpp
>> index 16b440d..00b673a 100644
>> --- a/liboffloadmic/runtime/offload_engine.cpp
>> +++ b/liboffloadmic/runtime/offload_engine.cpp
>> @@ -173,7 +173,7 @@ void Engine::init_process(void)
>>  // use putenv instead of setenv as Windows has no setenv.
>>  // Note: putenv requires its argument can't be freed or 
>> modified.
>>  // So no free after call to putenv or elsewhere.
>> -char * env_var = (char*) 
>> malloc(sizeof("COI_DMA_CHANNEL_COUNT=2" + 1));
>> +char * env_var = (char*) 
>> malloc(sizeof("COI_DMA_CHANNEL_COUNT=2"));
>>  sprintf(env_var, "COI_DMA_CHANNEL_COUNT=2");
>>  putenv(env_var);
>
> Missing error handling if malloc returns NULL?


Also why not just use strdup here? instead of malloc/sizeof/sprintf ?

Thanks,
Andrew Pinski
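
A sketch of what the two review suggestions could look like together,
assuming a POSIX environment where strdup and putenv are available.  The
helper name set_dma_channel_count and its return convention are made up for
this example and are not liboffloadmic's code:

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for strdup and putenv */
#endif
#include <stdlib.h>
#include <string.h>

/* Returns 0 on success, -1 if the copy could not be allocated or putenv
   failed.  On success the copy is intentionally never freed, because
   putenv keeps using the string it was given.  */
static int
set_dma_channel_count (void)
{
  /* strdup sizes the buffer itself, so no sizeof arithmetic is needed.  */
  char *env_var = strdup ("COI_DMA_CHANNEL_COUNT=2");
  if (env_var == NULL)
    return -1;
  if (putenv (env_var) != 0)
    {
      /* putenv did not take the string, so it is safe to release it.  */
      free (env_var);
      return -1;
    }
  return 0;
}
```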

>
> Jakub


Re: [PATCH][committed] Fix PR67652: wrong sizeof calculation in liboffloadmic

2015-09-28 Thread Jakub Jelinek
On Mon, Sep 28, 2015 at 07:10:13PM +0300, Ilya Verbin wrote:
> Committed to trunk as obvious.
> 
>   PR other/67652
>   * runtime/offload_engine.cpp (Engine::init_process): Fix sizeof.
> 
> diff --git a/liboffloadmic/runtime/offload_engine.cpp 
> b/liboffloadmic/runtime/offload_engine.cpp
> index 16b440d..00b673a 100644
> --- a/liboffloadmic/runtime/offload_engine.cpp
> +++ b/liboffloadmic/runtime/offload_engine.cpp
> @@ -173,7 +173,7 @@ void Engine::init_process(void)
>  // use putenv instead of setenv as Windows has no setenv.
>  // Note: putenv requires its argument can't be freed or modified.
>  // So no free after call to putenv or elsewhere.
> -char * env_var = (char*) malloc(sizeof("COI_DMA_CHANNEL_COUNT=2" 
> + 1));
> +char * env_var = (char*) 
> malloc(sizeof("COI_DMA_CHANNEL_COUNT=2"));
>  sprintf(env_var, "COI_DMA_CHANNEL_COUNT=2");
>  putenv(env_var);  

Missing error handling if malloc returns NULL?

Jakub


[PATCH][committed] Fix PR67652: wrong sizeof calculation in liboffloadmic

2015-09-28 Thread Ilya Verbin
Committed to trunk as obvious.

PR other/67652
* runtime/offload_engine.cpp (Engine::init_process): Fix sizeof.

diff --git a/liboffloadmic/runtime/offload_engine.cpp 
b/liboffloadmic/runtime/offload_engine.cpp
index 16b440d..00b673a 100644
--- a/liboffloadmic/runtime/offload_engine.cpp
+++ b/liboffloadmic/runtime/offload_engine.cpp
@@ -173,7 +173,7 @@ void Engine::init_process(void)
 // use putenv instead of setenv as Windows has no setenv.
 // Note: putenv requires its argument can't be freed or modified.
 // So no free after call to putenv or elsewhere.
-char * env_var = (char*) malloc(sizeof("COI_DMA_CHANNEL_COUNT=2" + 
1));
+char * env_var = (char*) malloc(sizeof("COI_DMA_CHANNEL_COUNT=2"));
 sprintf(env_var, "COI_DMA_CHANNEL_COUNT=2");
 putenv(env_var);  
 }

  -- Ilya
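
For reference, a short sketch of why the old expression was wrong.  In
sizeof("..." + 1) the string literal decays to a pointer before the
addition, so the operand is a pointer and sizeof yields the pointer size
rather than the string size.  The macro and function names below are made
up for this example:

```c
#include <stddef.h>
#include <string.h>

#define ENV_ENTRY "COI_DMA_CHANNEL_COUNT=2"

/* Before the fix: the operand is a pointer expression, so this is the
   size of a char pointer, unrelated to the string length.  */
static size_t
size_before_fix (void)
{
  return sizeof (ENV_ENTRY + 1);
}

/* After the fix: the operand is the array itself, so this is the string
   length plus one for the terminating NUL.  */
static size_t
size_after_fix (void)
{
  return sizeof (ENV_ENTRY);
}
```

On a typical 64-bit host the first value is 8, which is too small for the
sprintf that follows the allocation in the patched code.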


[gomp4] remove goacc locking

2015-09-28 Thread Nathan Sidwell
I've committed this to remove the now no longer needed lock and unlock builtins 
and related infrastructure.


nathan
2015-09-28  Nathan Sidwell  

	* target.def (GOACC_LOCK): Delete hook.
	* doc/tm.texi.in (TARGET_GOACC_LOCK): Delete.
	* doc/tm.texi: Rebuilt.
	* targhooks.h (default_goacc_lock): Delete.
	* internal-fn.def (GOACC_LOCK,  GOACC_UNLOCK, GOACC_LOCK_INIT): Delete.
	* internal-fn.c (expand_GOACC_LOCK, expand_GOACC_UNLOCK,
	expand_GOACC_LOCK_INIT): Delete.
	* omp-low.c (lower_oacc_reductions): Remove locking.
	(execute_oacc_transform): Remove lock transforming.
	(default_goacc_lock): Delete.
	
	* config/nvptx/nvptx-protos.h (nvptx_expand_oacc_lock): Delete.
	* config/nvptx/nvptx.md (oacc_lock, oacc_unlock, oacc_lock_init):
	Delete.
	(nvptx_spin_lock, nvptx_spin_reset): Delete.
	* config/nvptx/nvptx.c (LOCK_GLOBAL, LOCK_SHARED, LOCK_MAX): Delete.
	(lock_names, lock_space, lock_level, lock_used): Delete.
	(force_global_locks): Delete.
	(nvptx_option_override): Do not initialize lock syms.
	(nvptx_expand_oacc_lock): Delete.
	(nvptx_file_end): Do not finalize locks.
	(TARGET_GOACC_LOCK): Delete.

Index: internal-fn.def
===
--- internal-fn.def	(revision 228200)
+++ internal-fn.def	(working copy)
@@ -81,13 +81,6 @@ DEF_INTERNAL_FN (UNIQUE, ECF_NOTHROW | E
 DEF_INTERNAL_FN (GOACC_DIM_SIZE, ECF_CONST | ECF_NOTHROW | ECF_LEAF, ".")
 DEF_INTERNAL_FN (GOACC_DIM_POS, ECF_PURE | ECF_NOTHROW | ECF_LEAF, ".")
 
-/* LOCK, UNLOCK & LOCK_INIT operate a mutex used for reductions.  The first
-   argument is the compute dimension of the reduction and the second
-   argument is a loop identifer.  */
-DEF_INTERNAL_FN (GOACC_LOCK, ECF_NOTHROW | ECF_LEAF, "..")
-DEF_INTERNAL_FN (GOACC_UNLOCK, ECF_NOTHROW | ECF_LEAF, "..")
-DEF_INTERNAL_FN (GOACC_LOCK_INIT, ECF_NOTHROW | ECF_LEAF, "..")
-
 /* REDUCTION_SETUP, REDUCTION_INIT, REDUCTION_FINI and REDUCTION_TEARDOWN
together define a generic interface to support gang, worker and vector
reductions. All of the functions take the following form
Index: config/nvptx/nvptx-protos.h
===
--- config/nvptx/nvptx-protos.h	(revision 228200)
+++ config/nvptx/nvptx-protos.h	(working copy)
@@ -34,7 +34,6 @@ extern const char *nvptx_section_for_dec
 #ifdef RTX_CODE
 extern void nvptx_expand_oacc_fork (unsigned);
 extern void nvptx_expand_oacc_join (unsigned);
-extern void nvptx_expand_oacc_lock (rtx, int);
 extern void nvptx_expand_call (rtx, rtx);
 extern rtx nvptx_expand_compare (rtx);
 extern const char *nvptx_ptx_type_from_mode (machine_mode, bool);
Index: config/nvptx/nvptx.md
===
--- config/nvptx/nvptx.md	(revision 228200)
+++ config/nvptx/nvptx.md	(working copy)
@@ -1371,36 +1371,6 @@
   return asms[INTVAL (operands[1])];
 })
 
-(define_expand "oacc_lock"
-  [(unspec_volatile:SI [(match_operand:SI 0 "const_int_operand" "")
-		(match_operand:SI 1 "const_int_operand" "")]
-		   UNSPECV_LOCK)]
-  ""
-{
-  nvptx_expand_oacc_lock (operands[0], 0);
-  DONE;
-})
-
-(define_expand "oacc_unlock"
-  [(unspec_volatile:SI [(match_operand:SI 0 "const_int_operand" "")
-		(match_operand:SI 1 "const_int_operand" "")]
-		   UNSPECV_LOCK)]
-  ""
-{
-  nvptx_expand_oacc_lock (operands[0], +1);
-  DONE;
-})
-
-(define_expand "oacc_lock_init"
-  [(unspec_volatile:SI [(match_operand:SI 0 "const_int_operand" "")
-		(match_operand:SI 1 "const_int_operand" "")]
-		   UNSPECV_LOCK)]
-  ""
-{
-  nvptx_expand_oacc_lock (operands[0], -1);
-  DONE;
-})
-
 (define_insn "nvptx_fork"
   [(unspec_volatile:SI [(match_operand:SI 0 "const_int_operand" "")]
 		   UNSPECV_FORK)]
@@ -1588,23 +1558,3 @@
 		UNSPECV_MEMBAR)]
   ""
   "%.\\tmembar%B0;")
-
-;; spin lock and reset
-(define_insn "nvptx_spin_lock"
-   [(parallel
- [(set (match_operand:SI 2 "register_operand" "=R")
-	   (unspec_volatile:SI [(match_operand:SI 0 "memory_operand" "m")
-			(match_operand:SI 1 "const_int_operand" "i")]
-			   UNSPECV_LOCK))
-  (set (match_operand:BI 3 "register_operand" "=R") (const_int 0))
-  (label_ref (match_operand 4 "" ""))])]
-   ""
-   "%4:\\tatom%R1.cas.b32\\t%2, %0, 0, 1;\\n\\t\\tsetp.ne.u32\\t%3, %2, 0;\\n\\t@%3\\tbra.uni\\t%4;")
-
-(define_insn "nvptx_spin_reset"
-   [(set (match_operand:SI 2 "register_operand" "=R")
-	(unspec_volatile:SI [(match_operand:SI 0 "memory_operand" "m")
- (match_operand:SI 1 "const_int_operand" "i")]
-UNSPECV_LOCK))]
-   ""
-   "%.\\tatom%R1.exch.b32\\t%2, %0, 0;")
Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 228200)
+++ config/nvptx/nvptx.c	(working copy)
@@ -122,21 +122,6 @@ static unsigned worker_bcast_align;
 #define worker_bcast_name "__worker_bcast"
 static GTY(()) rtx worker_bcast_sym;
 
-/* Global and shared lock variables.  Allocat

[gomp4] Remove erroneous test and unreachable situation.

2015-09-28 Thread James Norris

Hi,

The attached patch removes an erroneous attribute test and
an unreachable situation. Both showed up when dealing with
the routine directive and the name option where the name
was identical to the name of the function / subroutine.

Committed after regtest on x86_64 and powerpc64le.

Thanks!
Jim
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 55eed48..44cbec1 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -1755,14 +1755,6 @@ gfc_match_oacc_routine (void)
 	  return MATCH_ERROR;
 	}
 
-  if (!sym->attr.external && !sym->attr.function && !sym->attr.subroutine)
-	{
-	  gfc_error ("Syntax error in !$ACC ROUTINE ( NAME ) at %C, invalid"
-		 " function name %qs", sym->name);
-	  gfc_current_locus = old_loc;
-	  return MATCH_ERROR;
-	}
-
   if (gfc_match_char (')') != MATCH_YES)
 	{
 	  gfc_error ("Syntax error in !$ACC ROUTINE ( NAME ) at %C, expecting"
@@ -1798,8 +1790,6 @@ gfc_match_oacc_routine (void)
   gfc_current_ns->proc_name->attr.oacc_function
 	= gfc_oacc_routine_dims (c) + 1;
 }
-  else
-gcc_unreachable ();
 
   if (n)
 n->clauses = c;


[gomp4, WIP] Implement -foffload-alias

2015-09-28 Thread Tom de Vries

Hi,

this work-in-progress patch implements a new option 
-foffload-alias=.



The option -foffload-alias=none instructs the compiler to assume that
object references and pointer dereferences in an offload region do not 
alias.


The option -foffload-alias=pointer instructs the compiler to assume that 
object references in an offload region do not alias.


The option -foffload-alias=all instructs the compiler to make no 
assumptions about aliasing in offload regions.


The default value is -foffload-alias=none.


The patch works by adding restrict to the types of the fields used to 
pass data to an offloading region.


Atm, the kernels-loop-offload-alias-ptr.c test-case passes, but the 
kernels-loop-offload-alias-none.c test-case fails.  For the latter, the 
required amount of restrict is added, but it has no effect. I've 
reported this in a more basic form in PR67742: "3rd-level restrict ignored".
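
As a rough illustration of the idea (this is not the code the patch generates), the struct below stands in for the record used to pass data to an offload region. The names offload_data, offload_body and run_offload_sketch are made up for this sketch. With restrict on the fields, the compiler may assume that the objects reached through them do not overlap:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the record that passes data to an offload
   region.  The restrict qualifiers promise that, within the region,
   the objects reached through a and b do not overlap.  */
struct offload_data
{
  int *restrict a;
  const int *restrict b;
  size_t n;
};

/* Hypothetical stand-in for the outlined body of the offload region.  */
static void
offload_body (struct offload_data *restrict data)
{
  assert (data != NULL);
  for (size_t i = 0; i < data->n; i++)
    data->a[i] = data->b[i] + 1;
}

/* Helper for trying the sketch: runs the body on two distinct arrays
   and returns the sum of the results.  */
static int
run_offload_sketch (void)
{
  int a[4] = { 0, 0, 0, 0 };
  int b[4] = { 0, 1, 2, 3 };
  struct offload_data data = { a, b, 4 };

  offload_body (&data);
  return a[0] + a[1] + a[2] + a[3];
}
```

Passing overlapping arrays through a and b would break the promise made by restrict, which is why -foffload-alias=all exists for code that cannot guarantee it.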


Thanks,
- Tom
Implement -foffload-alias

2015-09-28  Tom de Vries  

	* common.opt (foffload-alias): New option.
	* flag-types.h (enum offload_alias): New enum.
	* omp-low.c (is_gimple_oacc_offload): New function.
	(install_var_field): Handle flag_offload_alias.
	* doc/invoke.texi (@item Code Generation Options): Add -foffload-alias.
	(@item -foffload-alias): New item.

	* c-c++-common/goacc/kernels-loop-offload-alias-none.c: New test.
	* c-c++-common/goacc/kernels-loop-offload-alias-ptr.c: New test.
---
 gcc/common.opt | 16 ++
 gcc/doc/invoke.texi| 11 
 gcc/flag-types.h   |  7 +++
 gcc/omp-low.c  | 24 -
 .../goacc/kernels-loop-offload-alias-none.c| 63 ++
 .../goacc/kernels-loop-offload-alias-ptr.c | 52 ++
 6 files changed, 172 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-none.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-ptr.c

diff --git a/gcc/common.opt b/gcc/common.opt
index 290b6b3..28977a4 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1730,6 +1730,22 @@ Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32)
 EnumValue
 Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64)
 
+foffload-alias=
+Common Joined RejectNegative Enum(offload_alias) Var(flag_offload_alias) Init(OFFLOAD_ALIAS_NONE)
+-foffload-alias=[all|pointer|none] Assume non-aliasing in an offload region
+
+Enum
+Name(offload_alias) Type(enum offload_alias) UnknownError(unknown offload aliasing %qs)
+
+EnumValue
+Enum(offload_alias) String(all) Value(OFFLOAD_ALIAS_ALL)
+
+EnumValue
+Enum(offload_alias) String(pointer) Value(OFFLOAD_ALIAS_POINTER)
+
+EnumValue
+Enum(offload_alias) String(none) Value(OFFLOAD_ALIAS_NONE)
+
 fomit-frame-pointer
 Common Report Var(flag_omit_frame_pointer) Optimization
 When possible do not generate stack frames
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 909a453..a5ab785 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1136,6 +1136,7 @@ See S/390 and zSeries Options.
 -finstrument-functions-exclude-function-list=@var{sym},@var{sym},@dots{} @gol
 -finstrument-functions-exclude-file-list=@var{file},@var{file},@dots{} @gol
 -fno-common  -fno-ident @gol
+-foffload-alias=@r{[}none@r{|}pointer@r{|}all@r{]} @gol
 -fpcc-struct-return  -fpic  -fPIC -fpie -fPIE -fno-plt @gol
 -fno-jump-tables @gol
 -frecord-gcc-switches @gol
@@ -23695,6 +23696,16 @@ The options @option{-ftrapv} and @option{-fwrapv} override each other, so using
 using @option{-ftrapv} @option{-fwrapv} @option{-fno-wrapv} on the command-line
 results in @option{-ftrapv} being effective.
 
+@item -foffload-alias=@r{[}none@r{|}pointer@r{|}all@r{]}
+@opindex -foffload-alias
+The option @option{-foffload-alias=none} instructs the compiler to assume that
+objects references and pointer dereferences in an offload region do not alias.
+The option @option{-foffload-alias=pointer} instruct the compiler to assume that
+objects references in an offload region are presumed unaliased.  The option
+@option{-foffload-alias=all} instructs the compiler to make no assumtions about
+aliasing in offload regions.  The default value is
+@option{-foffload-alias=none}.
+
 @item -fexceptions
 @opindex fexceptions
 Enable exception handling.  Generates extra code needed to propagate
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index ac9ca0b..e8e672d 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -286,5 +286,12 @@ enum gfc_convert
   GFC_FLAG_CONVERT_LITTLE
 };
 
+enum offload_alias
+{
+  OFFLOAD_ALIAS_ALL,
+  OFFLOAD_ALIAS_POINTER,
+  OFFLOAD_ALIAS_NONE
+};
+
 
 #endif /* ! GCC_FLAG_TYPES_H */
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index a5904eb..9cbba1f 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1140,6 +1140,16 @@ use_pointer_for_field (tree decl, omp_context *shared_ctx)
   return false;
 }
 
+static bool
+is_gimpl

Re: [AArch64] Improve TLS Descriptor pattern to release RTL loop IV opt

2015-09-28 Thread Marcus Shawcroft
On 28 July 2015 at 14:12, Jiong Wang  wrote:
>
> The instruction sequences for preparing argument for TLS descriptor
> runtime resolver and the later function call to resolver can actually be
> hoisted out of the loop.
>
> Currently we can't because we have exposed the hard register X0 as
> destination of "set".  While GCC's RTL data flow infrastructure will
> skip or do very conservative assumption when hard register involved in
> and thus some loop IV opportunities are missed.

This patch feels like we are botching the back end to overcome a
limitation in RTL IV opt.  Isn't the real solution in RTL IV opt?

/Marcus

>
> This patch add another "tlsdesc_small_pseudo_" pattern, and avoid
> expose x0 to gcc generic code.
>
> Generally, we define a new register class FIXED_R0 which only contains 
> register
> 0, so the instruction sequences generated from the new add pattern is the same
> as tlsdesc_small_, while the operand 0 is wrapped as pseudo register 
> that
> RTL IV opt can handle it.
>
> Ideally, we should allow operand 0 to be any pseudo register, but then
> we can't model the override of x0 caused by the function call which is
> hidden by the UNSPEC.
>
> So here, we restricting operand 0 to be x0, the override of x0 can be
> reflected to the gcc.
>
> OK for trunk?
>
> 2015-07-28  Ramana Radhakrishnan  
> Jiong Wang  
>
> gcc/
>   * config/aarch64/aarch64.d (tlsdesc_small_pseudo_): New pattern.
>   * config/aarch64/aarch64.h (reg_class): New enumeration FIXED_REG0.
>   (REG_CLASS_NAMES): Likewise.
>   (REG_CLASS_CONTENTS): Likewise.
>   * config/aarch64/aarch64.c (aarch64_class_max_nregs): Likewise.
>   (aarch64_register_move_cost): Likewise.
>   (aarch64_load_symref_appropriately): Invoke the new added pattern if
>   possible.
>   * config/aarch64/constraints.md (Uc0): New constraint.
>
> gcc/testsuite.
>   * gcc.target/aarch64/tlsdesc_hoist.c: New testcase.
>
> --
> Regards,
> Jiong
>


Re: [RFC] Dump function attributes

2015-09-28 Thread Bernd Schmidt

On 09/28/2015 04:32 PM, Tom de Vries wrote:

patch below prints the function attributes in the dump file.



foo ()
[ noclone , noinline ]
{
...

Good idea?

If so, do we want one attribute per line?


Only for really long ones I'd think. Patch is ok for now.


Bernd



Re: [PATCH] Update RTEMS multilib for SPARC

2015-09-28 Thread Daniel Hellstrom

On 09/28/2015 03:49 PM, Sebastian Huber wrote:



On 28/09/15 15:39, Daniel Hellstrom wrote:

On 09/28/2015 03:37 PM, Sebastian Huber wrote:



On 28/09/15 15:20, Daniel Hellstrom wrote:

Which multilibs do we have after this change?


.;
soft;@msoft-float
v8;@mcpu=v8
leon3;@mcpu=leon3
leon3v7;@mcpu=leon3v7
leon;@mcpu=leon
leon/ut699;@mcpu=leon@mfix-ut699
leon/at697f;@mcpu=leon@mfix-at697f
soft/v8;@msoft-float@mcpu=v8
soft/leon3;@msoft-float@mcpu=leon3
soft/leon3v7;@msoft-float@mcpu=leon3v7
soft/leon;@msoft-float@mcpu=leon
soft/leon/ut699;@msoft-float@mcpu=leon@mfix-ut699
soft/leon/at697f;@msoft-float@mcpu=leon@mfix-at697f 


Ok, looks good. The change log entry should mention that you add -mcpu=leon 
multilibs as well.


Ooh I forgot to mention that, I will add it to the comment. Otherwise is this okay to commit for 4.9, 5 and master now? 


Since this is a RTEMS only change, I think this is all right. Just make sure 
the change log entry format is all right. The statements end all with a '.' for 
example.


Thanks for the review, updated the comments and committed it.

Daniel



Re: [PATCH] Remove restriction for remote testing

2015-09-28 Thread Bernd Schmidt

On 09/28/2015 02:35 PM, James Norris wrote:

The attached patch fixes a problem when doing remote testing.
Specifically, testing of the atomic tests found in gcc/atomic.
The code in atomic_init precludes the setting of the variable
'link_flags' when doing remote testing. The conditional test
can be safely removed as get_multilibs will return "", and
atomic_link_flags will return the necessary '-latomic' that
will allow the atomic tests to successfully link.


I guess remote host is sufficiently exotic that no one has noticed until 
now?


It looks like this pattern occurs across many .exp files, at least:
  asan-dg.exp
  cilk-plus-dg.exp
  mpx-dg.exp
  tsan-dg.exp
  ubsan-dg.exp
We should check that these don't need fixing (did you only see 
unexpected atomic failures?) It looks like they may be ok; for example 
-mmpx seems to imply -lmpx at the specs level.


I'm slightly worried about whether the code in atomic_link_flags will 
reliably find the right -L options and LD_LIBRARY_PATH on all possible 
setups. It looks like it's searching for things on the build machine, 
and I expect you have a setup where the host has the build directory 
mounted in the same location.


Maybe it's better to just append "-latomic" unconditionally, and remove 
it from atomic_link_flags, so that the latter function only sets up 
LD_LIBRARY_PATH etc. That would make atomic-dg closer in behaviour to 
e.g. mpx-dg. Mike, any thoughts?



Bernd


[PATCH] liboffloadmic emulation mode: make it asynchronous

2015-09-28 Thread Ilya Verbin
Hi!

Currently the COI emulator is single-threaded, i.e. it is able to run only one
target function at a time, e.g. the following testcase:

  #pragma omp parallel sections num_threads(2)
{
  #pragma omp section
  #pragma omp target
  while (1)
putchar ('.');

  #pragma omp section
  #pragma omp target
  while (1)
putchar ('o');
}

prints only dots using emul, while using real libcoi it prints:
...o.o.o.o...o...o.oo.o.o.ooo.oo...o.o.o...o.ooo
Of course, it's not possible to test new OpenMP 4.1's async features using such
an emulator.

The patch below makes it asynchronous: it creates an auxiliary thread for each
COIPipeline in host and in target processes.  In general, a new COIPipeline is
created by liboffloadmic for each host thread with offload, i.e. the example
above has:
4 threads in the host process (2 OpenMP threads + 2 auxiliary threads) and
3 threads in the target process (1 main thread + 2 auxiliary threads).
An auxiliary host thread runs a target function in the new thread in target
process and waits for its completion.  When the function is finished, the host
thread signals an event and can run a callback, if it is registered.
liboffloadmic waits for signalled events by calling COIEventWait.
This is identical to how real libcoi works.
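
The scheme can be sketched with plain pthreads as follows.  This is only a minimal model under my own assumptions, not the emulator's code: the types event and work and the functions pipeline_thread, event_wait and run_async_sketch are hypothetical names, and a real pipeline would keep its thread alive and take work from a queue instead of running one function.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical completion event, loosely modelled on the description
   above; this is not the emulator's actual data structure.  */
struct event
{
  pthread_mutex_t lock;
  pthread_cond_t cond;
  bool signalled;
};

/* One unit of work handed to the auxiliary thread.  */
struct work
{
  int (*fn) (int);
  int arg;
  int result;
  void (*callback) (struct work *);	/* Optional, may be NULL.  */
  struct event done;
};

/* Auxiliary thread: run the function, then signal the event and
   invoke the callback if one is registered.  */
static void *
pipeline_thread (void *p)
{
  struct work *w = p;

  w->result = w->fn (w->arg);

  pthread_mutex_lock (&w->done.lock);
  w->done.signalled = true;
  pthread_cond_broadcast (&w->done.cond);
  pthread_mutex_unlock (&w->done.lock);

  if (w->callback)
    w->callback (w);
  return NULL;
}

/* Block until the event is signalled (the infinite-timeout case).  */
static void
event_wait (struct event *e)
{
  pthread_mutex_lock (&e->lock);
  while (!e->signalled)
    pthread_cond_wait (&e->cond, &e->lock);
  pthread_mutex_unlock (&e->lock);
}

static int
twice (int x)
{
  return 2 * x;
}

/* Run one function asynchronously and wait for its completion.  */
static int
run_async_sketch (int arg)
{
  struct work w = { .fn = twice, .arg = arg, .callback = NULL };
  pthread_t thread;

  pthread_mutex_init (&w.done.lock, NULL);
  pthread_cond_init (&w.done.cond, NULL);
  w.done.signalled = false;

  int err = pthread_create (&thread, NULL, pipeline_thread, &w);
  assert (err == 0);

  event_wait (&w.done);
  pthread_join (thread, NULL);

  pthread_cond_destroy (&w.done.cond);
  pthread_mutex_destroy (&w.done.lock);
  return w.result;
}
```

The caller is free to do other work between pthread_create and event_wait, which is the property the single-threaded emulator lacked.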

make check-target-libgomp and some internal tests did not show any regression.
TSan report is clean.  Is it OK for trunk?


liboffloadmic/
* plugin/libgomp-plugin-intelmic.cpp (OFFLOAD_ACTIVE_WAIT_ENV): New
define.
(init): Set OFFLOAD_ACTIVE_WAIT env var to 0, if it is not set.
* runtime/emulator/coi_common.h (PIPE_HOST_PATH): Replace with ...
(PIPE_HOST2TGT_NAME): ... this.
(PIPE_TARGET_PATH): Replace with ...
(PIPE_TGT2HOST_NAME): ... this.
(MALLOCN): New define.
(READN): Likewise.
(WRITEN): Likewise.
(enum cmd_t): Replace CMD_RUN_FUNCTION with CMD_PIPELINE_RUN_FUNCTION.
Add CMD_PIPELINE_CREATE, CMD_PIPELINE_DESTROY.
* runtime/emulator/coi_device.cpp (engine_dir): New static variable.
(pipeline_thread_routine): New static function.
(COIProcessWaitForShutdown): Use global engine_dir instead of mic_dir.
Rename pipe_host and pipe_target to pipe_host2tgt and pipe_tgt2host.
If cmd is CMD_PIPELINE_CREATE, create a new thread for the pipeline.
Remove cmd == CMD_RUN_FUNCTION case.
* runtime/emulator/coi_device.h (COIERRORN): New define.
* runtime/emulator/coi_host.cpp: Include set, map, queue.
Replace typedefs with enums and structs.
(struct Function): Remove name, add num_buffers, bufs_size,
bufs_data_target, misc_data_len, misc_data, return_value_len,
return_value, completion_event.
(struct Callback): New.
(struct Process): Remove pipeline.  Add pipe_host2tgt and pipe_tgt2host.
(struct Pipeline): Remove pipe_host and pipe_target.  Add thread,
destroy, is_destroyed, pipe_host2tgt_path, pipe_tgt2host_path,
pipe_host2tgt, pipe_tgt2host, queue, process.
(max_pipeline_num): New static variable.
(pipelines): Likewise.
(max_event_num): Likewise.
(non_signalled_events): Likewise.
(errored_events): Likewise.
(callbacks): Likewise.
(cleanup): Do not check tmp_dirs before free.
(start_critical_section): New static function.
(finish_critical_section): Likewise.
(pipeline_is_destroyed): Likewise.
(maybe_invoke_callback): Likewise.
(signal_event): Likewise.
(get_event_result): Likewise.
(COIBufferCopy): Rename arguments according to headers.  Add asserts.
Use process' main pipes, instead of pipeline's pipes.  Signal completion
event.
(COIBufferCreate): Rename arguments according to headers.  Add asserts.
Use process' main pipes, instead of pipeline's pipes.
(COIBufferCreateFromMemory): Rename arguments according to headers.
Add asserts.
(COIBufferDestroy): Rename arguments according to headers.  Add asserts.
Use process' main pipes, instead of pipeline's pipes.
(COIBufferGetSinkAddress): Rename arguments according to headers.
Add asserts.
(COIBufferMap): Rename arguments according to headers.  Add asserts.
Signal completion event.
(COIBufferRead): Likewise.
(COIBufferSetState): Likewise.
(COIBufferUnmap): Likewise.
(COIBufferWrite): Likewise.
(COIEngineGetCount): Add assert.
(COIEngineGetHandle): Rename arguments according to headers.
Add assert.
(COIEventWait): Rename arguments according to headers.  Add asserts.
Implement waiting for events with zero or infinite timeout.
(COIEventRegisterCallback): New function.
(pipeline_thread_routine): New static function.
(COIPipelineCr

Re: [AArch64] Improve TLS Descriptor pattern to release RTL loop IV opt

2015-09-28 Thread Jiong Wang

Andrew Pinski writes:

> On Tue, Jul 28, 2015 at 6:12 AM, Jiong Wang  wrote:
>>
>> The instruction sequences for preparing argument for TLS descriptor
>> runtime resolver and the later function call to resolver can actually be
>> hoisted out of the loop.
>>
>> Currently we can't because we have exposed the hard register X0 as
>> destination of "set".  While GCC's RTL data flow infrastructure will
>> skip or do very conservative assumption when hard register involved in
>> and thus some loop IV opportunities are missed.
>>
>> This patch add another "tlsdesc_small_pseudo_" pattern, and avoid
>> expose x0 to gcc generic code.
>>
>> Generally, we define a new register class FIXED_R0 which only contains 
>> register
>> 0, so the instruction sequences generated from the new add pattern is the 
>> same
>> as tlsdesc_small_, while the operand 0 is wrapped as pseudo register 
>> that
>> RTL IV opt can handle it.
>>
>> Ideally, we should allow operand 0 to be any pseudo register, but then
>> we can't model the override of x0 caused by the function call which is
>> hidden by the UNSPEC.
>>
>> So here, we restricting operand 0 to be x0, the override of x0 can be
>> reflected to the gcc.
>>
>> OK for trunk?
>
>
> This patch broke ILP32 because we used mode rather than ptr_mode for
> the pseudo.  I have an idea on how to fix it (like tlsie_small_sidi
> case) but I still need to test it fully.

I have quickly revisited the code, and the use of "mode" instead of
"ptr_mode" looks OK to me.

What looks strange to me is that under ILP32 the symbol_ref is DImode.

   (symbol_ref:DI ("*.LANCHOR0")

My understanding is that the symbol_ref should be SImode under ILP32
instead of DImode.

So it's better to fix "create_block_symbol" in varasm and let
it use ptr_mode instead of Pmode, as Pmode describes the
underlying hardware mode rather than the mode seen at the C language level.

>
> This is the smallest testcase where the problem is:
> struct dtor_list
> {
>   struct dtor_list *next;
> };
> static __thread struct dtor_list *tls_dtor_list;
> __cxa_thread_atexit_impl ( struct dtor_list *new)
> {
>   new->next = tls_dtor_list;
>   tls_dtor_list = new;
> }
>
>
> Thanks,
> Andrew
>
>>
>> 2015-07-28  Ramana Radhakrishnan  
>> Jiong Wang  
>>
>> gcc/
>>   * config/aarch64/aarch64.d (tlsdesc_small_pseudo_): New pattern.
>>   * config/aarch64/aarch64.h (reg_class): New enumeration FIXED_REG0.
>>   (REG_CLASS_NAMES): Likewise.
>>   (REG_CLASS_CONTENTS): Likewise.
>>   * config/aarch64/aarch64.c (aarch64_class_max_nregs): Likewise.
>>   (aarch64_register_move_cost): Likewise.
>>   (aarch64_load_symref_appropriately): Invoke the new added pattern if
>>   possible.
>>   * config/aarch64/constraints.md (Uc0): New constraint.
>>
>> gcc/testsuite.
>>   * gcc.target/aarch64/tlsdesc_hoist.c: New testcase.
>>
>> --
>> Regards,
>> Jiong
>>

-- 
Regards,
Jiong



[PATCH] AIX SECTION_EXCLUDE

2015-09-28 Thread David Edelsohn
The appended patch allows GCC to emit LTO information to a reasonable
location in XCOFF objects.

Thanks, David

* config/rs6000/rs6000.c (rs6000_xcoff_asm_named_section): Place
SECTION_EXCLUDE in XO mapping class.

Index: rs6000.c
===
--- rs6000.c(revision 228202)
+++ rs6000.c(working copy)
@@ -30845,14 +30845,16 @@
tree decl ATTRIBUTE_UNUSED)
 {
   int smclass;
-  static const char * const suffix[4] = { "PR", "RO", "RW", "TL" };
+  static const char * const suffix[5] = { "PR", "RO", "RW", "TL", "XO" };

-  if (flags & SECTION_DEBUG)
+  if (flags & SECTION_EXCLUDE)
+smclass = 4;
+  else if (flags & SECTION_DEBUG)
 {
   fprintf (asm_out_file, "\t.dwsect %s\n", name);
   return;
 }
-  if (flags & SECTION_CODE)
+  else if (flags & SECTION_CODE)
 smclass = 0;
   else if (flags & SECTION_TLS)
 smclass = 3;
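
The selection order after the patch can be sketched as a small standalone function.  The SK_SECTION_* values below are made up for this sketch (GCC defines the real SECTION_* flags elsewhere), and the last two branches (read-write and read-only) are my assumption about the part of the function that the diff context does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values, for this sketch only.  */
enum
{
  SK_SECTION_CODE = 1 << 0,
  SK_SECTION_WRITE = 1 << 1,
  SK_SECTION_DEBUG = 1 << 2,
  SK_SECTION_TLS = 1 << 3,
  SK_SECTION_EXCLUDE = 1 << 4
};

/* Return the XCOFF storage mapping class suffix for FLAGS, or NULL for
   debug sections, which are emitted with .dwsect and get no suffix.  */
static const char *
xcoff_mapping_class (unsigned int flags)
{
  static const char *const suffix[5] = { "PR", "RO", "RW", "TL", "XO" };
  int smclass;

  if (flags & SK_SECTION_EXCLUDE)
    smclass = 4;		/* Checked first, so it wins over DEBUG.  */
  else if (flags & SK_SECTION_DEBUG)
    return NULL;
  else if (flags & SK_SECTION_CODE)
    smclass = 0;
  else if (flags & SK_SECTION_TLS)
    smclass = 3;
  else if (flags & SK_SECTION_WRITE)
    smclass = 2;		/* Assumed, not shown in the diff.  */
  else
    smclass = 1;		/* Assumed, not shown in the diff.  */

  return suffix[smclass];
}

/* Convenience check used when trying the sketch.  */
static int
has_mapping_class (unsigned int flags, const char *name)
{
  const char *s = xcoff_mapping_class (flags);
  return s != NULL && strcmp (s, name) == 0;
}
```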


[RFC] Dump function attributes

2015-09-28 Thread Tom de Vries

Hi,

patch below prints the function attributes in the dump file.

Say we mark a function foo with attributes noinline and noclone like this:
...
int __attribute__((noinline, noclone))
foo (void)
...

Then using this patch, we find in the dump file:
...
;; Function foo (foo, funcdef_no=10, decl_uid=2455, cgraph_uid=10, 
symbol_order=10)


Pass statistics of : 

foo ()
[ noclone , noinline ]
{
...

Good idea?

If so, do we want one attribute per line?

Thanks,
- Tom

diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index cd7a4b4..c724fde 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -7384,6 +7384,13 @@ dump_function_to_file (tree fndecl, FILE *file, int flags)

 }
   fprintf (file, ")\n");

+  if (DECL_ATTRIBUTES (fndecl) != NULL_TREE)
+{
+  fprintf (file, "[ ");
+  print_generic_expr (file, DECL_ATTRIBUTES (fndecl), dump_flags);
+  fprintf (file, "]\n");
+}
+
   if (flags & TDF_VERBOSE)
 print_node (file, "", fndecl, 2);



Re: [patch] Enable lightweight checks with _GLIBCXX_ASSERTIONS.

2015-09-28 Thread Florian Weimer
On 09/27/2015 12:24 PM, Jonathan Wakely wrote:
> Doh, sorry, I meant this instead i.e. the non-recursive mutex.

> +# if _GLIBCXX_ASSERTIONS && defined(PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP)
> +// Use an error-checking mutex type when assertions are enabled.
> +__native_type  _M_mutex = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP;
> +# else
>  __native_type  _M_mutex = __GTHREAD_MUTEX_INIT;
> +# endif

I think we should abort on a self-deadlock or an invalid unlock instead.
 If this change is valid for libstdc++, it should probably happen at the
glibc level (perhaps guarded by _FORTIFY_SOURCE) because C programs
would benefit as well.
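
For reference, here is a minimal sketch of what an error-checking mutex reports at the POSIX level, using the portable PTHREAD_MUTEX_ERRORCHECK type instead of the glibc-specific static initializer from the patch.  The helper names are made up for this sketch.  The two error codes are the ones POSIX specifies for this mutex type; a caller that wants to abort would check for them.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Initialize M as an error-checking mutex.  */
static void
init_errorcheck_mutex (pthread_mutex_t *m)
{
  pthread_mutexattr_t attr;
  int err;

  err = pthread_mutexattr_init (&attr);
  assert (err == 0);
  err = pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_ERRORCHECK);
  assert (err == 0);
  err = pthread_mutex_init (m, &attr);
  assert (err == 0);
  pthread_mutexattr_destroy (&attr);
}

/* Lock a mutex twice from the same thread and return the result of
   the second lock.  A default mutex could deadlock here.  */
static int
relock_errorcheck_mutex (void)
{
  pthread_mutex_t m;
  int err, relock;

  init_errorcheck_mutex (&m);
  err = pthread_mutex_lock (&m);
  assert (err == 0);
  relock = pthread_mutex_lock (&m);	/* Self-deadlock is reported.  */
  pthread_mutex_unlock (&m);
  pthread_mutex_destroy (&m);
  return relock;
}

/* Unlock a mutex that was never locked and return the result.  */
static int
unlock_unowned_errorcheck_mutex (void)
{
  pthread_mutex_t m;
  int err;

  init_errorcheck_mutex (&m);
  err = pthread_mutex_unlock (&m);	/* Invalid unlock is reported.  */
  pthread_mutex_destroy (&m);
  return err;
}
```

The extra ownership check on every lock and unlock is the cost that the benchmarks would have to measure.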

We'd need performance numbers to justify the change.  Any ideas how to
get them?  Benchmark MariaDB?  What's another large multi-threaded
application?  Qpid perhaps?

-- 
Florian Weimer / Red Hat Product Security


[SH][committed] Improve treg_set_expr matching

2015-09-28 Thread Oleg Endo
Hi,

This patch has been hanging around in my queue for a while.  Basically,
it uses reverse_condition to get better matching for treg_set_expr.
Tested on sh-elf with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
and no new failures.
Committed as r228202.
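
The idea of matching through the reversed condition can be shown with a toy model.  This is not the code from sh.c: enum cond and the functions below are hypothetical, the set of "directly supported" comparisons is simply assumed for the sketch, and it covers integer comparisons only.  When a comparison has no direct form, the reversed one is computed and the result is negated, which plays the role of the trailing nott.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy comparison codes, for this sketch only.  */
enum cond { COND_EQ, COND_NE, COND_GT, COND_LE, COND_GE, COND_LT };

/* Return the comparison that is true exactly when C is false
   (valid for integers).  */
static enum cond
reverse_cond (enum cond c)
{
  switch (c)
    {
    case COND_EQ: return COND_NE;
    case COND_NE: return COND_EQ;
    case COND_GT: return COND_LE;
    case COND_LE: return COND_GT;
    case COND_GE: return COND_LT;
    case COND_LT: return COND_GE;
    }
  assert (0);
  return c;
}

/* Assumption for this sketch: only these have a direct form.  */
static bool
directly_supported (enum cond c)
{
  return c == COND_EQ || c == COND_GT || c == COND_GE;
}

/* Evaluate a directly supported comparison.  */
static bool
eval_direct (enum cond c, int a, int b)
{
  assert (directly_supported (c));
  switch (c)
    {
    case COND_EQ: return a == b;
    case COND_GT: return a > b;
    case COND_GE: return a >= b;
    default: return false;
    }
}

/* Compute the T bit for any comparison: use the direct form if there
   is one, otherwise the reversed form followed by a negation.  */
static bool
compute_t (enum cond c, int a, int b)
{
  if (directly_supported (c))
    return eval_direct (c, a, b);
  return !eval_direct (reverse_cond (c), a, b);
}
```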

Cheers,
Oleg

gcc/ChangeLog:
PR target/54236
* config/sh/predicates.md (t_reg_operand, negt_reg_operand): Allow
and handle ne and eq codes.
* config/sh/sh.c (sh_rtx_costs): Adjust matching of tst #imm,r0 insn.
(sh_recog_treg_set_expr): Early accept negt_reg_operand.  Early reject
CONST_INT_P.  Use reverse_condition.
(sh_split_treg_set_expr): Likewise.

gcc/testsuite/ChangeLog:
PR target/54236
* gcc.target/sh/pr54236-1.c (test_09, test_10, test_11): New.
* gcc.target/sh/pr59533-1.c (test_23, test_24, test_25, test_26,
test_27): New.
* gcc.target/sh/pr54236-5.c: New.
* gcc.target/sh/pr54236-6.c: New.
Index: gcc/config/sh/predicates.md
===
--- gcc/config/sh/predicates.md	(revision 228175)
+++ gcc/config/sh/predicates.md	(working copy)
@@ -1158,10 +1158,18 @@
 
 ;; A predicate describing the T bit register in any form.
 (define_predicate "t_reg_operand"
-  (match_code "reg,subreg,sign_extend,zero_extend")
+  (match_code "reg,subreg,sign_extend,zero_extend,ne,eq")
 {
   switch (GET_CODE (op))
 {
+  case EQ:
+	return t_reg_operand (XEXP (op, 0), GET_MODE (XEXP (op, 0)))
+	   && XEXP (op, 1) == const1_rtx;
+
+  case NE:
+	return t_reg_operand (XEXP (op, 0), GET_MODE (XEXP (op, 0)))
+	   && XEXP (op, 1) == const0_rtx;
+
   case REG:
 	return REGNO (op) == T_REG;
 
@@ -1183,13 +1191,21 @@
 
 ;; A predicate describing a negated T bit register.
 (define_predicate "negt_reg_operand"
-  (match_code "subreg,xor")
+  (match_code "subreg,xor,ne,eq")
 {
   switch (GET_CODE (op))
 {
+  case EQ:
+	return t_reg_operand (XEXP (op, 0), GET_MODE (XEXP (op, 0)))
+	   && XEXP (op, 1) == const0_rtx;
+
+  case NE:
+	return t_reg_operand (XEXP (op, 0), GET_MODE (XEXP (op, 0)))
+	   && XEXP (op, 1) == const1_rtx;
+
   case XOR:
 	return t_reg_operand (XEXP (op, 0), GET_MODE (XEXP (op, 0)))
-	   && satisfies_constraint_M (XEXP (op, 1));
+	   && XEXP (op, 1) == const1_rtx;
 
   case SUBREG:
 	return negt_reg_operand (XEXP (op, 0), GET_MODE (XEXP (op, 0)));
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 228176)
+++ gcc/config/sh/sh.c	(working copy)
@@ -3592,13 +3592,12 @@
 
 case EQ:
   /* An and with a constant compared against zero is
-	 most likely going to be a TST #imm, R0 instruction.
-	 Notice that this does not catch the zero_extract variants from
-	 the md file.  */
+	 most likely going to be a TST #imm, R0 instruction.  */
   if (XEXP (x, 1) == const0_rtx
-  && (GET_CODE (XEXP (x, 0)) == AND
-  || (SUBREG_P (XEXP (x, 0))
-		  && GET_CODE (SUBREG_REG (XEXP (x, 0))) == AND)))
+  && ((GET_CODE (XEXP (x, 0)) == AND
+   || (SUBREG_P (XEXP (x, 0))
+		   && GET_CODE (SUBREG_REG (XEXP (x, 0))) == AND))
+	  || GET_CODE (XEXP (x, 0)) == ZERO_EXTRACT))
 	{
 	  *total = 1;
 	  return true;
@@ -14200,7 +14199,8 @@
 return false;
 
   /* Early accept known possible operands before doing recog.  */
-  if (op == const0_rtx || op == const1_rtx || t_reg_operand (op, mode))
+  if (op == const0_rtx || op == const1_rtx || t_reg_operand (op, mode)
+  || negt_reg_operand (op, mode))
 return true;
 
   /* Early reject impossible operands before doing recog.
@@ -14209,8 +14209,8 @@
  such as lower-subreg will bail out.  Some insns such as SH4A movua are
  done with UNSPEC, so must reject those, too, or else it would result
  in an invalid reg -> treg move.  */
-  if (register_operand (op, mode) || memory_operand (op, mode)
-  || sh_unspec_insn_p (op))
+  if (CONST_INT_P (op) || register_operand (op, mode)
+  || memory_operand (op, mode) || sh_unspec_insn_p (op))
 return false;
 
   if (!can_create_pseudo_p ())
@@ -14230,26 +14230,30 @@
   SET_PREV_INSN (i) = NULL;
   SET_NEXT_INSN (i) = NULL;
 
+  /* If the comparison op doesn't have a result mode, set it to SImode.  */
+  machine_mode prev_op_mode = GET_MODE (op);
+  if (COMPARISON_P (op) && prev_op_mode == VOIDmode)
+PUT_MODE (op, SImode);
+
   int result = recog (PATTERN (i), i, 0);
 
-  /* It seems there is no insn like that.  Create a simple negated
- version and try again.  If we hit a negated form, we'll allow that
- and append a nott sequence when splitting out the insns.  Insns that
- do the split can then remove the trailing nott if they know how to
- deal with it.  */
-  if (result < 0 && GET_CODE (op) == EQ)
+  /* It seems there is no insn like that.  Create a

Re: [PATCH] Update RTEMS multilib for SPARC

2015-09-28 Thread Sebastian Huber



On 28/09/15 15:39, Daniel Hellstrom wrote:

On 09/28/2015 03:37 PM, Sebastian Huber wrote:



On 28/09/15 15:20, Daniel Hellstrom wrote:

Which multilibs do we have after this change?


.;
soft;@msoft-float
v8;@mcpu=v8
leon3;@mcpu=leon3
leon3v7;@mcpu=leon3v7
leon;@mcpu=leon
leon/ut699;@mcpu=leon@mfix-ut699
leon/at697f;@mcpu=leon@mfix-at697f
soft/v8;@msoft-float@mcpu=v8
soft/leon3;@msoft-float@mcpu=leon3
soft/leon3v7;@msoft-float@mcpu=leon3v7
soft/leon;@msoft-float@mcpu=leon
soft/leon/ut699;@msoft-float@mcpu=leon@mfix-ut699
soft/leon/at697f;@msoft-float@mcpu=leon@mfix-at697f 


Ok, looks good. The change log entry should mention that you add 
-mcpu=leon multilibs as well.


Ooh I forgot to mention that, I will add it to the comment. Otherwise 
is this okay to commit for 4.9, 5 and master now? 


Since this is a RTEMS only change, I think this is all right. Just make 
sure the change log entry format is all right. The statements end all 
with a '.' for example.


--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.



Re: [SH][committed] Fix PR 67391

2015-09-28 Thread Oleg Endo
On Sun, 2015-09-27 at 21:03 +0900, Oleg Endo wrote:
> On Wed, 2015-09-23 at 21:04 +0900, Oleg Endo wrote:
> > Hi,
> > 
> > The attached patch fixes PR 67391.  Some additional reg overlapping were
> > added to the addsi3 patterns while making LRA on SH work, but not all of
> > them seem to be good.  Removing them, seems to be working just fine.
> > Tested on sh-elf (LRA enabled) with make -k check
> > RUNTESTFLAGS="--target_board=sh-sim
> > \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
> > and by Kaz on sh4-linux.
> > 
> > Committed to trunk as r228046 and to the GCC 5 branch as r228047.
> 
> This has opened a small can of worms.  The follow up patch is attached
> and has been committed to trunk as r228176.  Tested by me on sh-elf and
> by Kaz on sh4-linux with LRA enabled/disabled.
> 
> As a positive side effect, we get some code size reduction on the CSiBE
> set: 3345527 bytes -> 3334351 bytes (-11176 bytes / -0.334058 %)
> 
> However, this is only when LRA is disabled because of some problems with
> LRA and its usage/handling of addsi3 insns.
> 
> Backport to GCC 5 branch will follow.

Now also committed to GCC 5 branch as r228201.  Tested on the branch as
above.

Cheers,
Oleg



Re: [PATCH] Update RTEMS multilib for SPARC

2015-09-28 Thread Daniel Hellstrom

On 09/28/2015 03:37 PM, Sebastian Huber wrote:



On 28/09/15 15:20, Daniel Hellstrom wrote:

Which multilibs do we have after this change?


.;
soft;@msoft-float
v8;@mcpu=v8
leon3;@mcpu=leon3
leon3v7;@mcpu=leon3v7
leon;@mcpu=leon
leon/ut699;@mcpu=leon@mfix-ut699
leon/at697f;@mcpu=leon@mfix-at697f
soft/v8;@msoft-float@mcpu=v8
soft/leon3;@msoft-float@mcpu=leon3
soft/leon3v7;@msoft-float@mcpu=leon3v7
soft/leon;@msoft-float@mcpu=leon
soft/leon/ut699;@msoft-float@mcpu=leon@mfix-ut699
soft/leon/at697f;@msoft-float@mcpu=leon@mfix-at697f 


Ok, looks good. The change log entry should mention that you add -mcpu=leon 
multilibs as well.


Ooh I forgot to mention that, I will add it to the comment. Otherwise is this 
okay to commit for 4.9, 5 and master now?



It would be nice if you can update the RTEMS documentation accordingly similar 
to ARM and PowerPC, e.g.

https://docs.rtems.org/doc-current/share/rtems/html/cpu_supplement/ARM-Specific-Information-Multilibs.html#ARM-Specific-Information-Multilibs


I agree. I was just thinking where to document this, thanks!

Daniel



Re: [PATCH] Update RTEMS multilib for SPARC

2015-09-28 Thread Sebastian Huber



On 28/09/15 15:20, Daniel Hellstrom wrote:

Which multilibs do we have after this change?


.;
soft;@msoft-float
v8;@mcpu=v8
leon3;@mcpu=leon3
leon3v7;@mcpu=leon3v7
leon;@mcpu=leon
leon/ut699;@mcpu=leon@mfix-ut699
leon/at697f;@mcpu=leon@mfix-at697f
soft/v8;@msoft-float@mcpu=v8
soft/leon3;@msoft-float@mcpu=leon3
soft/leon3v7;@msoft-float@mcpu=leon3v7
soft/leon;@msoft-float@mcpu=leon
soft/leon/ut699;@msoft-float@mcpu=leon@mfix-ut699
soft/leon/at697f;@msoft-float@mcpu=leon@mfix-at697f 


Ok, looks good. The change log entry should mention that you add 
-mcpu=leon multilibs as well.


It would be nice if you can update the RTEMS documentation accordingly 
similar to ARM and PowerPC, e.g.


https://docs.rtems.org/doc-current/share/rtems/html/cpu_supplement/ARM-Specific-Information-Multilibs.html#ARM-Specific-Information-Multilibs

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.



Re: [PATCH] Update RTEMS multilib for SPARC

2015-09-28 Thread Daniel Hellstrom

On 09/28/2015 02:36 PM, Sebastian Huber wrote:



On 28/09/15 14:33, Sebastian Huber wrote:

On 28/09/15 14:13, Daniel Hellstrom wrote:

Now that muser-mode is the default, the multilib definitions no longer need to
specify that switch. Add UT699 to the multilibs after recent patches. Add an
AT697F multilib since there are many LEON2 users running RTEMS.

To gcc/ChangeLog:

gcc/
* config/sparc/t-rtems: Remove -muser-mode, add ut699 and at697f
---
  gcc/config/sparc/t-rtems |   25 +++--
  1 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/gcc/config/sparc/t-rtems b/gcc/config/sparc/t-rtems
index adb6dcb..6f7cc6f 100644
--- a/gcc/config/sparc/t-rtems
+++ b/gcc/config/sparc/t-rtems
@@ -17,15 +17,20 @@
  # .
  #
  -MULTILIB_OPTIONS = msoft-float mcpu=v8/mcpu=leon3/mcpu=leon3v7 muser-mode
-MULTILIB_DIRNAMES = soft v8 leon3 leon3v7 user-mode
+MULTILIB_OPTIONS = msoft-float mcpu=v8/mcpu=leon3/mcpu=leon3v7/mcpu=leon \
+   mfix-ut699/mfix-at697f
+MULTILIB_DIRNAMES = soft v8 leon3 leon3v7 leon ut699 at697f
  MULTILIB_MATCHES = msoft-float=mno-fpu
  -MULTILIB_EXCEPTIONS = muser-mode
-MULTILIB_EXCEPTIONS += mcpu=leon3
-MULTILIB_EXCEPTIONS += mcpu=leon3v7
-MULTILIB_EXCEPTIONS += msoft-float/mcpu=leon3
-MULTILIB_EXCEPTIONS += msoft-float/mcpu=leon3v7
-MULTILIB_EXCEPTIONS += msoft-float/muser-mode
-MULTILIB_EXCEPTIONS += msoft-float/mcpu=v8/muser-mode
-MULTILIB_EXCEPTIONS += mcpu=v8/muser-mode
+MULTILIB_EXCEPTIONS = mfix-ut699
+MULTILIB_EXCEPTIONS += msoft-float/mfix-ut699
+MULTILIB_EXCEPTIONS += msoft-float/mcpu=v8/mfix-ut699
+MULTILIB_EXCEPTIONS += msoft-float/mcpu=leon3*/mfix-ut699
+MULTILIB_EXCEPTIONS += mcpu=v8/mfix-ut699
+MULTILIB_EXCEPTIONS += mcpu=leon3*/mfix-ut699
+MULTILIB_EXCEPTIONS += mfix-at697f
+MULTILIB_EXCEPTIONS += msoft-float/mfix-at697f
+MULTILIB_EXCEPTIONS += msoft-float/mcpu=v8/mfix-at697f
+MULTILIB_EXCEPTIONS += msoft-float/mcpu=leon3*/mfix-at697f
+MULTILIB_EXCEPTIONS += mcpu=v8/mfix-at697f
+MULTILIB_EXCEPTIONS += mcpu=leon3*/mfix-at697f


Why do we need so many variants with fixes for the UT699 (32-bit Fault-Tolerant LEON3FT SPARC V8 Processor) and the AT697F (LEON2-FT)? Why do we have no leon3 variant without fixes for the UT699 
and AT697F? Which multilib do you suggest now for the NGMP/GR740?


NGMP/GR740 is still the same ("mcpu=leon3" or "mcpu=leon3 msoft-float"), but muser-mode is no longer needed since it is the default. This is easier than having to deal with muser-mode all the 
time; if it is forgotten at link time, the v7 multilib is wrongly selected.


Two variants for UT699 (hard-float and soft-float) and for AT697F (hard-float and soft-float). We could remove AT697 soft-float, since mfix-at697f only fixes float instructions at the moment, but I 
thought keeping it would be easier. MULTILIB_REUSE didn't work, I'm guessing because there is an = sign in the target switch.




Sorry for these stupid questions. Maybe we should change this file to use 
MULTILIB_REQUIRED, like on rs6000 and arm.

Which multilibs do we have after this change?


.;
soft;@msoft-float
v8;@mcpu=v8
leon3;@mcpu=leon3
leon3v7;@mcpu=leon3v7
leon;@mcpu=leon
leon/ut699;@mcpu=leon@mfix-ut699
leon/at697f;@mcpu=leon@mfix-at697f
soft/v8;@msoft-float@mcpu=v8
soft/leon3;@msoft-float@mcpu=leon3
soft/leon3v7;@msoft-float@mcpu=leon3v7
soft/leon;@msoft-float@mcpu=leon
soft/leon/ut699;@msoft-float@mcpu=leon@mfix-ut699
soft/leon/at697f;@msoft-float@mcpu=leon@mfix-at697f


Regards,
Daniel


[gomp4] lockless reductions

2015-09-28 Thread Nathan Sidwell
I've committed this patch, which implements a lockless update scheme.  This 
fixes the deadlock (hypothesized to come from resource starvation)  observed 
with worker-level locks in shared memory.  It also will make it easier to 
replace the lockless updating of a particular variable with an atomic, should 
such an atomic exist. (We won't end up with orphaned lock/unlock calls).


The scheme relies on cmp&swap, which uses a new nvptx builtin.   The generic 
atomic_compare_exchange variants all take an address for the variable holding 
the value to be stored, rather than a plain rvalue. The underlying PTX 
instruction doesn't need that.  I wanted to avoid unnecessary lvalue forcing 
early in the compiler.


The algorithm is essentially:

T actual = initval(OP); // [*]
T guess;
do {
  guess = actual;
  T write = guess OP myval;
  actual = cmp&swap (ptr, guess, write);
} while (actual != guess);

In a race one thread engine is guaranteed to complete.
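
For readers who want to try the loop on a host, here is a minimal sketch using C11 atomics rather than the nvptx builtin. The names cmp_swap and lockless_add are hypothetical, OP is fixed to integer addition, and atomic_compare_exchange_strong is only a stand-in for the PTX instruction; this is not the code in the patch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper: compare-and-swap that returns the value found
   in *ptr, mimicking the value-returning cmp&swap described above.  */
static int
cmp_swap (_Atomic int *ptr, int expected, int desired)
{
  /* On failure, EXPECTED is overwritten with the current contents of
     *ptr; on success it is left alone and equals the old contents.  */
  atomic_compare_exchange_strong (ptr, &expected, desired);
  return expected;
}

/* Lockless update for OP == '+'.  The initial guess is the identity
   value of the operator (0 for addition), not a read of *ptr.  */
static void
lockless_add (_Atomic int *ptr, int myval)
{
  int actual = 0;	/* initval ('+') */
  int guess;

  do
    {
      guess = actual;
      int write = guess + myval;
      actual = cmp_swap (ptr, guess, write);
    }
  while (actual != guess);
}
```

When the first guess is wrong, the failed cmp&swap hands back the current value, so the second iteration normally succeeds unless another thread has intervened.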

I'd forgotten that the barriers needed when entering and leaving worker 
partitioned execution are sufficient memory barriers, so the setup and 
teardown of the worker reduction buffer do not need to be atomic.  Hence I 
deleted the SWAP builtin I had just added.


This change permits us to remove all the locking pieces, which I'll progress to.

nathan

[*] I thought about having this read *ptr for the initial guess, but decided 
against it for simplicity.  That read would need to be atomic (or we would insert 
memory barriers, which would be worse).  Why not just use the atomic cmp&swap 
later to get an initial value?  initval(OP) is more than likely to be a correct 
guess for the first thread reaching here, so we save one memory access.
2015-09-28  Nathan Sidwell  

	* config/nvptx/nvptx.md (atomic_compare_and_swap_1,
	atomic_exchange): Incoming values can be constants.
	* config/nvptx/nvptx.c (nvptx_expand_swap): Delete.
	(nvptx_expand_cmp_swap): Force operands into regs.
	(enum nvptx_builtins): Delete SWAP and SWAPLL.
	(nvptx_init_builtins): Likewise.
	(nvptx_expand_builtin): Likewise.
	(nvptx_xform_lock): Always return true.
	(nvptx_get_worker_red_addr): No need to deal with QI and HI modes.
	(nvptx_generate_vector_shuffle): Tweak.
	(nvptx_lockless_update): New.
	(nvptx_goacc_reduction_setup): Remove bitrotted description, add
	comments.
	(nvptx_goacc_reduction_teardown, nvptx_goacc_reduction_init): Likewise.
	(nvptx_goacc_reduction_fini): Likewise.  Call lockless update.

Index: config/nvptx/nvptx.md
===
--- config/nvptx/nvptx.md	(revision 228103)
+++ config/nvptx/nvptx.md	(working copy)
@@ -1517,8 +1517,8 @@
   [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
 	(unspec_volatile:SDIM
 	  [(match_operand:SDIM 1 "memory_operand" "+m")
-	   (match_operand:SDIM 2 "nvptx_register_operand" "R")
-	   (match_operand:SDIM 3 "nvptx_register_operand" "R")
+	   (match_operand:SDIM 2 "nvptx_nonmemory_operand" "Ri")
+	   (match_operand:SDIM 3 "nvptx_nonmemory_operand" "Ri")
 	   (match_operand:SI 4 "const_int_operand")]
 	  UNSPECV_CAS))
(set (match_dup 1)
@@ -1533,7 +1533,7 @@
 	   (match_operand:SI 3 "const_int_operand")]		;; model
 	  UNSPECV_XCHG))
(set (match_dup 1)
-	(match_operand:SDIM 2 "nvptx_register_operand" "R"))]	;; input
+	(match_operand:SDIM 2 "nvptx_nonmemory_operand" "Ri"))]	;; input
   ""
   "%.\\tatom%A1.exch.b%T0\\t%0, %1, %2;")
 
Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 228103)
+++ config/nvptx/nvptx.c	(working copy)
@@ -4198,33 +4198,11 @@ nvptx_expand_worker_addr (tree exp, rtx
 }
 
 static rtx
-nvptx_expand_swap (tree exp, rtx target,
-		   machine_mode mode, int ARG_UNUSED (ignore))
-{
-  if (!target)
-target = gen_reg_rtx (mode);
-
-  rtx mem = expand_expr  (CALL_EXPR_ARG (exp, 0),
-			  NULL_RTX, Pmode, EXPAND_NORMAL);
-  rtx src = expand_expr (CALL_EXPR_ARG (exp, 1),
-			 NULL_RTX, mode, EXPAND_NORMAL);
-
-  rtx pat;
-  
-  if (mode == SImode)
-pat = gen_atomic_exchangesi (target, mem, src, const0_rtx);
-  else
-pat = gen_atomic_exchangedi (target, mem, src, const0_rtx);
-
-  emit_insn (pat);
-
-  return target;
-}
-
-static rtx
 nvptx_expand_cmp_swap (tree exp, rtx target,
-		   machine_mode mode, int ARG_UNUSED (ignore))
+		   machine_mode ARG_UNUSED (m), int ARG_UNUSED (ignore))
 {
+  machine_mode mode = TYPE_MODE (TREE_TYPE (exp));
+  
   if (!target)
 target = gen_reg_rtx (mode);
 
@@ -4237,6 +4215,10 @@ nvptx_expand_cmp_swap (tree exp, rtx tar
   rtx pat;
 
   mem = gen_rtx_MEM (mode, mem);
+  if (!REG_P (cmp))
+cmp = copy_to_mode_reg (mode, cmp);
+  if (!REG_P (src))
+src = copy_to_mode_reg (mode, src);
   
   if (mode == SImode)
 pat = gen_atomic_compare_and_swapsi_1 (target, mem, cmp, src, const0_rtx);
@@ -4255,8 +4237,6 @@ enum nvptx_builtins
   NVPTX_BUILTIN_SHUFFLE,
   NVPTX_BUILTIN_SHUFFLELL,
   NVPTX_BUILTIN_WORKER_ADDR,
-  NVPTX_BUILTIN_SWAP,
-  

Re: [PATCH] Update RTEMS multilib for SPARC

2015-09-28 Thread Sebastian Huber



On 28/09/15 14:33, Sebastian Huber wrote:

On 28/09/15 14:13, Daniel Hellstrom wrote:
Now that muser-mode is the default, the multilib definitions no longer need to
specify that switch. Add UT699 to the multilibs after recent patches. Add
an AT697F multilib since there are many LEON2 users running RTEMS.

To gcc/ChangeLog:

gcc/
* config/sparc/t-rtems: Remove -muser-mode, add ut699 and at697f
---
  gcc/config/sparc/t-rtems |   25 +++--
  1 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/gcc/config/sparc/t-rtems b/gcc/config/sparc/t-rtems
index adb6dcb..6f7cc6f 100644
--- a/gcc/config/sparc/t-rtems
+++ b/gcc/config/sparc/t-rtems
@@ -17,15 +17,20 @@
  # .
  #
  -MULTILIB_OPTIONS = msoft-float mcpu=v8/mcpu=leon3/mcpu=leon3v7 
muser-mode

-MULTILIB_DIRNAMES = soft v8 leon3 leon3v7 user-mode
+MULTILIB_OPTIONS = msoft-float 
mcpu=v8/mcpu=leon3/mcpu=leon3v7/mcpu=leon \

+   mfix-ut699/mfix-at697f
+MULTILIB_DIRNAMES = soft v8 leon3 leon3v7 leon ut699 at697f
  MULTILIB_MATCHES = msoft-float=mno-fpu
  -MULTILIB_EXCEPTIONS = muser-mode
-MULTILIB_EXCEPTIONS += mcpu=leon3
-MULTILIB_EXCEPTIONS += mcpu=leon3v7
-MULTILIB_EXCEPTIONS += msoft-float/mcpu=leon3
-MULTILIB_EXCEPTIONS += msoft-float/mcpu=leon3v7
-MULTILIB_EXCEPTIONS += msoft-float/muser-mode
-MULTILIB_EXCEPTIONS += msoft-float/mcpu=v8/muser-mode
-MULTILIB_EXCEPTIONS += mcpu=v8/muser-mode
+MULTILIB_EXCEPTIONS = mfix-ut699
+MULTILIB_EXCEPTIONS += msoft-float/mfix-ut699
+MULTILIB_EXCEPTIONS += msoft-float/mcpu=v8/mfix-ut699
+MULTILIB_EXCEPTIONS += msoft-float/mcpu=leon3*/mfix-ut699
+MULTILIB_EXCEPTIONS += mcpu=v8/mfix-ut699
+MULTILIB_EXCEPTIONS += mcpu=leon3*/mfix-ut699
+MULTILIB_EXCEPTIONS += mfix-at697f
+MULTILIB_EXCEPTIONS += msoft-float/mfix-at697f
+MULTILIB_EXCEPTIONS += msoft-float/mcpu=v8/mfix-at697f
+MULTILIB_EXCEPTIONS += msoft-float/mcpu=leon3*/mfix-at697f
+MULTILIB_EXCEPTIONS += mcpu=v8/mfix-at697f
+MULTILIB_EXCEPTIONS += mcpu=leon3*/mfix-at697f


Why do we need so many variants with fixes for the UT699 (32-bit 
Fault-Tolerant LEON3FT SPARC V8 Processor) and the AT697F (LEON2-FT)? 
Why do we have no leon3 variant without fixes for the UT699 and 
AT697F? Which multilib do you suggest now for the NGMP/GR740?




Sorry for these stupid questions. Maybe we should change this file to 
use MULTILIB_REQUIRED, like on rs6000 and arm.


Which multilibs do we have after this change?

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.



[PATCH] Remove restriction for remote testing

2015-09-28 Thread James Norris

Hi,

The attached patch fixes a problem when doing remote testing.
Specifically, testing of the atomic tests found in gcc/atomic.
The code in atomic_init precludes the setting of the variable
'link_flags' when doing remote testing. The conditional test
can be safely removed as get_multilibs will return "", and
atomic_link_flags will return the necessary '-latomic' that
will allow the atomic tests to successfully link.

OK for trunk?

Thanks,
Jim

Index: gcc/testsuite/lib/atomic-dg.exp
===
--- gcc/testsuite/lib/atomic-dg.exp	(revision 227981)
+++ gcc/testsuite/lib/atomic-dg.exp	(working copy)
@@ -63,12 +63,10 @@ proc atomic_init { args } {
 global atomic_saved_TEST_ALWAYS_FLAGS
 
 set link_flags ""
-if ![is_remote host] {
-	if [info exists TOOL_OPTIONS] {
-	set link_flags "[atomic_link_flags [get_multilibs ${TOOL_OPTIONS}]]"
-	} else {
-	set link_flags "[atomic_link_flags [get_multilibs]]"
-	}
+if [info exists TOOL_OPTIONS] {
+	set link_flags "[atomic_link_flags [get_multilibs ${TOOL_OPTIONS}]]"
+} else {
+	set link_flags "[atomic_link_flags [get_multilibs]]"
 }
 
 if [info exists TEST_ALWAYS_FLAGS] {


Re: New post-LTO OpenACC pass

2015-09-28 Thread Nathan Sidwell

On 09/25/15 09:19, Bernd Schmidt wrote:

On 09/25/2015 03:03 PM, Bernd Schmidt wrote:

182  else if (acc_device_type (acc_dev->type) == acc_device_host)
(gdb) p acc_dev->type
$1 = OFFLOAD_TARGET_TYPE_HOST
(gdb) next
184  fn (hostaddrs);

It's not running the offloaded version, so the testcase I think should
fail.


... and that's because my system was no longer set up to run CUDA binaries,
after I fixed that the testcase passes.

So as far as I can tell almost everything here works as expected?


hm strange.  will take another look this week.  Thanks for looking.

nathan



Re: [PATCH] Update RTEMS multilib for SPARC

2015-09-28 Thread Sebastian Huber

On 28/09/15 14:13, Daniel Hellstrom wrote:

Now that muser-mode is the default, the multilib definitions no longer need to
specify that switch. Add UT699 to the multilibs after recent patches. Add
an AT697F multilib since there are many LEON2 users running RTEMS.

To gcc/ChangeLog:

gcc/
* config/sparc/t-rtems: Remove -muser-mode, add ut699 and at697f
---
  gcc/config/sparc/t-rtems |   25 +++--
  1 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/gcc/config/sparc/t-rtems b/gcc/config/sparc/t-rtems
index adb6dcb..6f7cc6f 100644
--- a/gcc/config/sparc/t-rtems
+++ b/gcc/config/sparc/t-rtems
@@ -17,15 +17,20 @@
  # .
  #
  
-MULTILIB_OPTIONS = msoft-float mcpu=v8/mcpu=leon3/mcpu=leon3v7 muser-mode

-MULTILIB_DIRNAMES = soft v8 leon3 leon3v7 user-mode
+MULTILIB_OPTIONS = msoft-float mcpu=v8/mcpu=leon3/mcpu=leon3v7/mcpu=leon \
+  mfix-ut699/mfix-at697f
+MULTILIB_DIRNAMES = soft v8 leon3 leon3v7 leon ut699 at697f
  MULTILIB_MATCHES = msoft-float=mno-fpu
  
-MULTILIB_EXCEPTIONS = muser-mode

-MULTILIB_EXCEPTIONS += mcpu=leon3
-MULTILIB_EXCEPTIONS += mcpu=leon3v7
-MULTILIB_EXCEPTIONS += msoft-float/mcpu=leon3
-MULTILIB_EXCEPTIONS += msoft-float/mcpu=leon3v7
-MULTILIB_EXCEPTIONS += msoft-float/muser-mode
-MULTILIB_EXCEPTIONS += msoft-float/mcpu=v8/muser-mode
-MULTILIB_EXCEPTIONS += mcpu=v8/muser-mode
+MULTILIB_EXCEPTIONS = mfix-ut699
+MULTILIB_EXCEPTIONS += msoft-float/mfix-ut699
+MULTILIB_EXCEPTIONS += msoft-float/mcpu=v8/mfix-ut699
+MULTILIB_EXCEPTIONS += msoft-float/mcpu=leon3*/mfix-ut699
+MULTILIB_EXCEPTIONS += mcpu=v8/mfix-ut699
+MULTILIB_EXCEPTIONS += mcpu=leon3*/mfix-ut699
+MULTILIB_EXCEPTIONS += mfix-at697f
+MULTILIB_EXCEPTIONS += msoft-float/mfix-at697f
+MULTILIB_EXCEPTIONS += msoft-float/mcpu=v8/mfix-at697f
+MULTILIB_EXCEPTIONS += msoft-float/mcpu=leon3*/mfix-at697f
+MULTILIB_EXCEPTIONS += mcpu=v8/mfix-at697f
+MULTILIB_EXCEPTIONS += mcpu=leon3*/mfix-at697f


Why do we need so many variants with fixes for the UT699 (32-bit 
Fault-Tolerant LEON3FT SPARC V8 Processor) and the AT697F (LEON2-FT)? 
Why do we have no leon3 variant without fixes for the UT699 and AT697F? 
Which multilib do you suggest now for the NGMP/GR740?


--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.



Re: [PATCH] Convert SPARC to LRA

2015-09-28 Thread Oleg Endo
On Sun, 2015-09-27 at 19:29 -0600, Jeff Law wrote:
> On 09/27/2015 01:57 PM, Hans-Peter Nilsson wrote:
> > On Wed, 9 Sep 2015, Mike Stump wrote:
> >
> >> On Sep 8, 2015, at 9:41 PM, David Miller  wrote:
> >>> +#define TARGET_LRA_P hook_bool_void_true
> >>
> >> Are we at the point where this should be the default, and old
> >> ports should just define to false, if they really need to?
> >
> > I think no.  For one, we don't have proper target documentation
> > updates for LRA.  What does it need?  What is outdated?
> >
> > Also, give ample time for gcc releases of odd ports with LRA to
> > get into the public and cover most of the inevitable remaining
> > bugs.  Not even sh has moved over due to remaining issues.  Let
> > the reports come in - and be fixed.  Let's revisit in a year or
> > two.
> I don't think we're there yet either -- many ports still require some 
> guidance from Vlad to get working with LRA.
> 
> It *may* be time to decree that any new ports must use the LRA path 
> rather than reload.  I'm still on the fence with that.

LRA on SH seems to work without GCC test suite failures.  However, I'd
expect that there are still hidden bugs not covered by the test suite.  SH's
R0 spill failures are greatly reduced with LRA, although some hacks had
to be added to the SH backend to make it work at all.  Despite that, we
see significant code size increases compared to reload.  If
the difference weren't that big, we'd probably turn LRA on by default for
SH in GCC 6...

Cheers,
Oleg



[PATCH] Update RTEMS multilib for SPARC

2015-09-28 Thread Daniel Hellstrom
Now that muser-mode is the default, the multilib definitions no longer need to
specify that switch. Add UT699 to the multilibs after recent patches. Add
an AT697F multilib since there are many LEON2 users running RTEMS.

To gcc/ChangeLog:

gcc/
* config/sparc/t-rtems: Remove -muser-mode, add ut699 and at697f
---
 gcc/config/sparc/t-rtems |   25 +++--
 1 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/gcc/config/sparc/t-rtems b/gcc/config/sparc/t-rtems
index adb6dcb..6f7cc6f 100644
--- a/gcc/config/sparc/t-rtems
+++ b/gcc/config/sparc/t-rtems
@@ -17,15 +17,20 @@
 # .
 #
 
-MULTILIB_OPTIONS = msoft-float mcpu=v8/mcpu=leon3/mcpu=leon3v7 muser-mode
-MULTILIB_DIRNAMES = soft v8 leon3 leon3v7 user-mode
+MULTILIB_OPTIONS = msoft-float mcpu=v8/mcpu=leon3/mcpu=leon3v7/mcpu=leon \
+  mfix-ut699/mfix-at697f
+MULTILIB_DIRNAMES = soft v8 leon3 leon3v7 leon ut699 at697f
 MULTILIB_MATCHES = msoft-float=mno-fpu
 
-MULTILIB_EXCEPTIONS = muser-mode
-MULTILIB_EXCEPTIONS += mcpu=leon3
-MULTILIB_EXCEPTIONS += mcpu=leon3v7
-MULTILIB_EXCEPTIONS += msoft-float/mcpu=leon3
-MULTILIB_EXCEPTIONS += msoft-float/mcpu=leon3v7
-MULTILIB_EXCEPTIONS += msoft-float/muser-mode
-MULTILIB_EXCEPTIONS += msoft-float/mcpu=v8/muser-mode
-MULTILIB_EXCEPTIONS += mcpu=v8/muser-mode
+MULTILIB_EXCEPTIONS = mfix-ut699
+MULTILIB_EXCEPTIONS += msoft-float/mfix-ut699
+MULTILIB_EXCEPTIONS += msoft-float/mcpu=v8/mfix-ut699
+MULTILIB_EXCEPTIONS += msoft-float/mcpu=leon3*/mfix-ut699
+MULTILIB_EXCEPTIONS += mcpu=v8/mfix-ut699
+MULTILIB_EXCEPTIONS += mcpu=leon3*/mfix-ut699
+MULTILIB_EXCEPTIONS += mfix-at697f
+MULTILIB_EXCEPTIONS += msoft-float/mfix-at697f
+MULTILIB_EXCEPTIONS += msoft-float/mcpu=v8/mfix-at697f
+MULTILIB_EXCEPTIONS += msoft-float/mcpu=leon3*/mfix-at697f
+MULTILIB_EXCEPTIONS += mcpu=v8/mfix-at697f
+MULTILIB_EXCEPTIONS += mcpu=leon3*/mfix-at697f
-- 
1.7.0.4



Re: [PATCH 1/4] Add mkoffload for Intel MIC

2015-09-28 Thread Bernd Schmidt

On 09/28/2015 02:00 PM, Jakub Jelinek wrote:

On Mon, Sep 28, 2015 at 01:27:32PM +0200, Bernd Schmidt wrote:

I've removed obstack_ptr_grow for arrays with known sizes after this review:
https://gcc.gnu.org/ml/gcc-patches/2014-10/msg02210.html


That's unfortunate, I think that made the code less future-proof. IMO we
should revert to the obstack method especially if Thomas's -v patch goes in.


Why?  If the number of arguments is bound by a small constant, using
automatic fixed size array is certainly more efficient, and I really don't
see it as less readable or maintainable.


The code becomes harder to modify, with more room for error, and you no 
longer have consistency in how you build argv arrays within the same 
file. The obstack method is pretty much foolproof and doesn't even 
remotely allow for the possibility of a buffer overflow, and adding new 
arguments, even conditionally, is entirely trivial. Efficiency is really 
not an issue for building arguments compared to the cost of executing 
another binary.
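
To make the trade-off concrete, here is a minimal sketch of a growable argv builder. It deliberately uses plain realloc as a stand-in for the obstack_ptr_grow idiom, so it is not the mkoffload code; struct argv_builder, argv_push and build_objcopy_argv are hypothetical names. The point is only that a conditional argument such as "-v" needs no index bookkeeping and no fixed upper bound.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable argv builder; a plain-C stand-in for the
   obstack-based approach, not taken from any GCC source file.  */
struct argv_builder
{
  const char **items;
  size_t count;
  size_t capacity;
};

/* Append ARG, growing the array when it is full.  */
static void
argv_push (struct argv_builder *b, const char *arg)
{
  if (b->count == b->capacity)
    {
      size_t new_cap = b->capacity ? b->capacity * 2 : 8;
      const char **p = realloc (b->items, new_cap * sizeof *p);
      if (!p)
	abort ();
      b->items = p;
      b->capacity = new_cap;
    }
  b->items[b->count++] = arg;
}

/* Build a shortened objcopy-like command line.  *ARGC_OUT receives the
   number of arguments, not counting the terminating NULL.  The caller
   frees the returned array.  */
static const char **
build_objcopy_argv (int verbose, size_t *argc_out)
{
  struct argv_builder b = { NULL, 0, 0 };

  argv_push (&b, "objcopy");
  if (verbose)
    argv_push (&b, "-v");
  argv_push (&b, "-B");
  argv_push (&b, "i386");
  *argc_out = b.count;
  argv_push (&b, NULL);
  return b.items;
}
```

With a fixed-size array, adding the conditional "-v" means revisiting the array bound; here it is a single extra call.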



Bernd


Re: [PATCH 1/4] Add mkoffload for Intel MIC

2015-09-28 Thread Jakub Jelinek
On Mon, Sep 28, 2015 at 01:27:32PM +0200, Bernd Schmidt wrote:
> >I've removed obstack_ptr_grow for arrays with known sizes after this review:
> >https://gcc.gnu.org/ml/gcc-patches/2014-10/msg02210.html
> 
> That's unfortunate, I think that made the code less future-proof. IMO we
> should revert to the obstack method especially if Thomas's -v patch goes in.

Why?  If the number of arguments is bound by a small constant, using
automatic fixed size array is certainly more efficient, and I really don't
see it as less readable or maintainable.

Jakub


Re: [PATCH] Add new hooks ASM_OUTPUT_START_FUNCTION_HEADER ...

2015-09-28 Thread Bernd Schmidt

On 09/28/2015 11:44 AM, Dominik Vogt wrote:

On Fri, Sep 25, 2015 at 03:33:56PM +0200, Bernd Schmidt wrote:

On 09/24/2015 03:48 PM, Dominik Vogt wrote:

Hm, I wonder whether wrapping all these section switches in
assemble_start/end_function in ".machine" pseudoops (that's what
we need the hooks for; similar to .arch for ix86) has any real
effect.


I don't think I follow what you're trying to say here?


I mean, it's more or less random whether switching to and from the
function's section ends up inside the new .machine and
.machinemode directives (if the section needs to be switched for
this function) or outside (if the assembler code had already
switched to the correct section earlier).  I assume that .machine
and .machinemode have no effect on the section switching, but I'm
not completely sure (alignment?).

(@Andreas + Uli: Do you know of any effect this would have on
s390?)


Still not really following since I don't know anything about s390 and 
its directives. In case you're trying to figure out whether it's 
possible to use the existing macros, please continue doing so. If you 
reach the conclusion that you really do need the new hooks, your patch 
is ok. However, you probably should add a sentence or two to the 
documentation to specify ordering wrt other parts of the header of a 
function.



Bernd


Re: [PATCH GCC]Improve rtl loop inv cost by checking if the inv can be propagated to address uses

2015-09-28 Thread Bernd Schmidt

On 09/28/2015 11:43 AM, Bin Cheng wrote:

Bootstrapped and tested on x86_64 and x86_32.  Will test it on aarch64.  So any
comments?

Thanks,
bin

2015-09-28  Bin Cheng  

* loop-invariant.c (struct def): New field cant_fwprop_to_addr_uses.
(inv_cant_fwprop_to_addr_use): New function.
(record_use): Call inv_cant_fwprop_to_addr_use, set the new field.
(get_inv_cost): Count cost if inv can't be propagated into its
address uses.


It looks at least plausible. Another option which I think has had some 
discussion recently would be to just move everything, and leave it to 
cprop to put things back together if the costs allow it.



Bernd


Re: [PATCH 1/4] Add mkoffload for Intel MIC

2015-09-28 Thread Bernd Schmidt

On 09/28/2015 01:25 PM, Ilya Verbin wrote:

On Mon, Sep 28, 2015 at 12:09:19 +0200, Bernd Schmidt wrote:

On 09/28/2015 12:03 PM, Bernd Schmidt wrote:

On 09/28/2015 10:26 AM, Thomas Schwinge wrote:

-  objcopy_argv[8] = NULL;
+  objcopy_argv[objcopy_argc++] = NULL;
+  gcc_checking_assert (objcopy_argc <= OBJCOPY_ARGC_MAX);


On its own this is not an improvement - you're trading a compile time
error for a runtime error. So, what is the other change this is
preparing for?


Ok, I now see the other patch. But I also see that other code in the same
file and in the nvptx mkoffload is using the obstack_ptr_grow method to
build argv arrays, I think that would be preferable to this.


I've removed obstack_ptr_grow for arrays with known sizes after this review:
https://gcc.gnu.org/ml/gcc-patches/2014-10/msg02210.html


That's unfortunate, I think that made the code less future-proof. IMO we 
should revert to the obstack method especially if Thomas's -v patch goes in.



Bernd


Re: [PATCH 1/4] Add mkoffload for Intel MIC

2015-09-28 Thread Ilya Verbin
On Mon, Sep 28, 2015 at 12:09:19 +0200, Bernd Schmidt wrote:
> On 09/28/2015 12:03 PM, Bernd Schmidt wrote:
> >On 09/28/2015 10:26 AM, Thomas Schwinge wrote:
> >>-  objcopy_argv[8] = NULL;
> >>+  objcopy_argv[objcopy_argc++] = NULL;
> >>+  gcc_checking_assert (objcopy_argc <= OBJCOPY_ARGC_MAX);
> >
> >On its own this is not an improvement - you're trading a compile time
> >error for a runtime error. So, what is the other change this is
> >preparing for?
> 
> Ok, I now see the other patch. But I also see that other code in the same
> file and in the nvptx mkoffload is using the obstack_ptr_grow method to
> build argv arrays, I think that would be preferable to this.

I've removed obstack_ptr_grow for arrays with known sizes after this review:
https://gcc.gnu.org/ml/gcc-patches/2014-10/msg02210.html

  -- Ilya


Re: lto wrapper verboseness

2015-09-28 Thread Bernd Schmidt


We need to pass on the verbose flag.  I once came up with the following
patch (depends on "Refactor intelmic-mkoffload.c argv building",
);
OK for trunk?

/* Run objcopy.  */
@@ -457,6 +465,8 @@ prepare_target_image (const char *target_compiler, int argc, char **argv)
sprintf (rename_section_opt, ".data=%s", image_section_name);
objcopy_argc = 0;
objcopy_argv[objcopy_argc++] = "objcopy";
+  if (verbose)
+objcopy_argv[objcopy_argc++] = "-v";
objcopy_argv[objcopy_argc++] = "-B";
objcopy_argv[objcopy_argc++] = "i386";
objcopy_argv[objcopy_argc++] = "-I";


I'm not convinced we gain much by passing "-v" to objcopy, but I'll 
leave that for the Intel folks to decide.


Other than that, ok if all argv arrays are constructed using obstacks.


Bernd


Re: Use gcc/coretypes.h:enum offload_abi in mkoffloads

2015-09-28 Thread Bernd Schmidt

Hi Thomas,

Your patch submissions are sometimes very verbose which makes them hard 
to follow.



commit de4d7cbcf979edc095a48dff5b38d12846bdab6f
Author: Thomas Schwinge 
Date:   Tue Aug 4 13:12:36 2015 +0200


Cut unnecessary information such as this. git headers are uninteresting.


 Use gcc/coretypes.h:enum offload_abi in mkoffloads


That one included unfortunate yet popular ;-) strcmp "typos":


+#define STR "-foffload-abi="
+  if (strncmp (argv[i], STR, strlen (STR)) == 0)
+   {
+ if (strcmp (argv[i] + strlen (STR), "lp64"))
+   offload_abi = OFFLOAD_ABI_LP64;
+ else if (strcmp (argv[i] + strlen (STR), "ilp32"))
+   offload_abi = OFFLOAD_ABI_ILP32;


..., so with these fixed up:

--- gcc/config/i386/intelmic-mkoffload.c
+++ gcc/config/i386/intelmic-mkoffload.c
@@ -544,9 +544,9 @@ main (int argc, char **argv)
  #define STR "-foffload-abi="
if (strncmp (argv[i], STR, strlen (STR)) == 0)
{
- if (strcmp (argv[i] + strlen (STR), "lp64"))
+ if (strcmp (argv[i] + strlen (STR), "lp64") == 0)
offload_abi = OFFLOAD_ABI_LP64;
- else if (strcmp (argv[i] + strlen (STR), "ilp32"))
+ else if (strcmp (argv[i] + strlen (STR), "ilp32") == 0)
offload_abi = OFFLOAD_ABI_ILP32;
  else
fatal_error (input_location,
--- gcc/config/nvptx/mkoffload.c
+++ gcc/config/nvptx/mkoffload.c
@@ -1013,9 +1013,9 @@ main (int argc, char **argv)
  #define STR "-foffload-abi="
if (strncmp (argv[i], STR, strlen (STR)) == 0)
{
- if (strcmp (argv[i] + strlen (STR), "lp64"))
+ if (strcmp (argv[i] + strlen (STR), "lp64") == 0)
offload_abi = OFFLOAD_ABI_LP64;
- else if (strcmp (argv[i] + strlen (STR), "ilp32"))
+ else if (strcmp (argv[i] + strlen (STR), "ilp32") == 0)
offload_abi = OFFLOAD_ABI_ILP32;
  else
fatal_error (input_location,


This confused me for a while because I thought you were proposing the 
above patch. A single line "I fixed that in the following version" would 
have been a clearer way to communicate that doesn't take up a page of space.
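
For anyone skimming the thread, the strcmp "typo" quoted above comes down to strcmp returning 0 on a match, so a bare `if (strcmp (...))` takes the branch when the strings differ. A minimal sketch of the corrected check follows; enum offload_abi_sketch and parse_offload_abi are hypothetical names for illustration, not the mkoffload code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for enum offload_abi.  */
enum offload_abi_sketch
{
  ABI_UNSET,
  ABI_LP64,
  ABI_ILP32
};

/* Parse a "-foffload-abi=..." argument.  Returns ABI_UNSET when ARG
   is a different option or names an unknown ABI.  */
static enum offload_abi_sketch
parse_offload_abi (const char *arg)
{
  static const char prefix[] = "-foffload-abi=";
  size_t len = strlen (prefix);

  if (strncmp (arg, prefix, len) != 0)
    return ABI_UNSET;

  /* strcmp returns 0 when the strings are equal, hence the
     explicit "== 0".  */
  if (strcmp (arg + len, "lp64") == 0)
    return ABI_LP64;
  if (strcmp (arg + len, "ilp32") == 0)
    return ABI_ILP32;
  return ABI_UNSET;
}
```

Without the "== 0", "-foffload-abi=lp64" would fail the first test and then be classified as ILP32 by the second.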



..., I'll again propose the following patch for trunk:

commit 922278239a9d346ebde99e616185a91fbfaf
Author: Thomas Schwinge 
Date:   Tue Aug 4 13:12:36 2015 +0200

 Use gcc/coretypes.h:enum offload_abi in mkoffloads

gcc/
* config/i386/intelmic-mkoffload.c (target_ilp32): Remove
variable, replacing it with...
(offload_abi): ... this new variable.  Adjust all users.
* config/nvptx/mkoffload.c (target_ilp32, offload_abi): Likewise.
---
  gcc/config/i386/intelmic-mkoffload.c |   90 +++---
  gcc/config/nvptx/mkoffload.c |   56 +++--
  2 files changed, 101 insertions(+), 45 deletions(-)


Just the ChangeLog please, not the other noise.


+  abort ();


Can we have gcc_unreachable() in these tools?

Other than that, it looks ok but it also doesn't seem to do anything. 
Are you intending to add more ABIs?



Bernd


Re: [PATCH 1/4] Add mkoffload for Intel MIC

2015-09-28 Thread Bernd Schmidt

On 09/28/2015 12:03 PM, Bernd Schmidt wrote:

On 09/28/2015 10:26 AM, Thomas Schwinge wrote:

-  objcopy_argv[8] = NULL;
+  objcopy_argv[objcopy_argc++] = NULL;
+  gcc_checking_assert (objcopy_argc <= OBJCOPY_ARGC_MAX);


On its own this is not an improvement - you're trading a compile time
error for a runtime error. So, what is the other change this is
preparing for?


Ok, I now see the other patch. But I also see that other code in the 
same file and in the nvptx mkoffload is using the obstack_ptr_grow 
method to build argv arrays, I think that would be preferable to this.



Bernd



Re: [PATCH 1/4] Add mkoffload for Intel MIC

2015-09-28 Thread Bernd Schmidt

On 09/28/2015 10:26 AM, Thomas Schwinge wrote:

-  objcopy_argv[8] = NULL;
+  objcopy_argv[objcopy_argc++] = NULL;
+  gcc_checking_assert (objcopy_argc <= OBJCOPY_ARGC_MAX);


On its own this is not an improvement - you're trading a compile time 
error for a runtime error. So, what is the other change this is 
preparing for?



Bernd



Re: [PATCH] Add new hooks ASM_OUTPUT_START_FUNCTION_HEADER ...

2015-09-28 Thread Dominik Vogt
On Fri, Sep 25, 2015 at 03:33:56PM +0200, Bernd Schmidt wrote:
> On 09/24/2015 03:48 PM, Dominik Vogt wrote:
> >Hm, I wonder whether wrapping all these section switches in
> >assemble_start/end_function in ".machine" pseudoops (that's what
> >we need the hooks for; similar to .arch for ix86) has any real
> >effect.
> 
> I don't think I follow what you're trying to say here?

I mean, it's more or less random whether switching to and from the
function's section ends up inside the new .machine and
.machinemode directives (if the section needs to be switched for
this function) or outside (if the assembler code had already
switched to the correct section earlier).  I assume that .machine
and .machinemode have no effect on the section switching, but I'm
not completely sure (alignment?).

(@Andreas + Uli: Do you know of any effect this would have on
s390?)

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



[PATCH GCC]Improve rtl loop inv cost by checking if the inv can be propagated to address uses

2015-09-28 Thread Bin Cheng
Hi,
For below rtl dump before loop invariant pass:

LOOP:
 1482: r1838:DI=0xa880
 1483: r1837:DI=sfp:DI+r1838:DI
  REG_DEAD r1838:DI
  REG_EQUAL sfp:DI-0x5780
 1484: r1839:V4SI=r910:V4SI>>const_vector
 1485: r1840:V4SI=r1067:V4SI+r910:V4SI
  REG_DEAD r910:V4SI
 1486: r1841:V4SI=r1840:V4SI>>const_vector
  REG_DEAD r1840:V4SI
 1487: r1842:V8HI=vec_concat(trunc(r1839:V4SI),trunc(r1841:V4SI))
  REG_DEAD r1841:V4SI
  REG_DEAD r1839:V4SI
 1488: [r870:DI+r1837:DI]=r1842:V8HI
 ...

While the dump for loop invariant pass is as below:

;;Set in insn 1471 is invariant (0), cost 4, depends on 
;;Set in insn 1482 is invariant (1), cost 4, depends on 
;;Set in insn 1483 is invariant (2), cost 4, depends on 1
;;Decided to move invariant 0 -- gain 4
;;Decided to move invariant 1 -- gain 4
 
 2034: r2163:DI=0xa880
LOOP:
 1483: r1837:DI=sfp:DI+r2163:DI
  REG_DEAD r1838:DI
  REG_EQUAL sfp:DI-0x5780
 1484: r1839:V4SI=r910:V4SI>>const_vector
 1485: r1840:V4SI=r1067:V4SI+r910:V4SI
  REG_DEAD r910:V4SI
 1486: r1841:V4SI=r1840:V4SI>>const_vector
  REG_DEAD r1840:V4SI
 1487: r1842:V8HI=vec_concat(trunc(r1839:V4SI),trunc(r1841:V4SI))
  REG_DEAD r1841:V4SI
  REG_DEAD r1839:V4SI
 1488: [r870:DI+r1837:DI]=r1842:V8HI
 ...

Note instructions 1482 and 1483 both compute loop invariant values, but only
1482 is hoisted out of loop. Since computation in 1483 uses sfp, the final
assembly code is even worse because we need another one or two instructions
to compute sfp, depending on the immediate constant value.

The direct reason that r1837 isn't hoisted is its cost is computed as 0,
rather than 4.  I believe this is a known issue and we have tried more than
once by tuning the famous magic number "3" in loop-invariant.c.  After
investigation, I believe the problem lies in cost computation, rather than
the magic number itself.  Maybe that's one reason those experiments didn't
end with good results.

The check below counts an invariant expression's cost only if the expression
is used outside of an address expression, or if the address expression is too
expensive. There is an implicit assumption in it: if the invariant
expression is not referenced outside of address expressions, it can be forward
propagated into those address expressions. But this assumption is not always
true, especially on targets with limited addressing modes.

  if (!inv->cheap_address
      || inv->def->n_uses == 0
      || inv->def->n_addr_uses < inv->def->n_uses)
    (*comp_cost) += inv->cost * inv->eqno;

Look at the example: r1837, computed in insn 1483, is used in the address
expression in insn 1488, but it can't be forward propagated into it because
"r870 + sfp + 0xa880" isn't a valid address expression on aarch64.
This means r1837 has to be computed by an independent instruction, and its
cost should be counted.

IMHO, we need to track whether a loop invariant expression can or can't be
propagated into address expressions and use that information to compute the
cost, as below:

  if (!inv->cheap_address
      || inv->def->n_uses == 0
      || inv->def->n_addr_uses < inv->def->n_uses
      || inv->def->cant_prop_to_addr_uses)
    (*comp_cost) += inv->cost * inv->eqno;
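
For illustration only, the decision above can be written as standalone C.
This is a sketch, not code from loop-invariant.c: struct def_sketch, struct
inv_sketch and inv_cost_is_counted are made-up names that keep just the
fields the condition reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct def; NOT the real definition.  */
struct def_sketch
{
  unsigned n_uses;              /* Number of uses.  */
  unsigned n_addr_uses;         /* Number of uses inside addresses.  */
  bool cant_prop_to_addr_uses;  /* True if some address use cannot absorb
                                   the invariant expression.  */
};

/* Simplified stand-in for struct invariant; NOT the real definition.  */
struct inv_sketch
{
  struct def_sketch *def;
  bool cheap_address;
};

/* Return true if the invariant's cost should be added to the total,
   following the four-way condition proposed above.  */
bool
inv_cost_is_counted (const struct inv_sketch *inv)
{
  assert (inv != NULL && inv->def != NULL);
  return (!inv->cheap_address
          || inv->def->n_uses == 0
          || inv->def->n_addr_uses < inv->def->n_uses
          || inv->def->cant_prop_to_addr_uses);
}
```

With one use that is also an address use and a cheap address (the r1837
situation), the first three tests are all false, so only the new flag makes
the cost count.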

Though this patch can be improved by analyzing address expression propagation
more precisely, experiments show spec2k/fp is already improved on aarch64.
I will collect data for spec2k6 later but would like to start the discussion
before my holiday.  I also collected spec2k6 data on x86_64: no regression.

Bootstrapped and tested on x86_64 and x86_32.  Will test it on aarch64.  Any
comments?

Thanks,
bin

2015-09-28  Bin Cheng  

* loop-invariant.c (struct def): New field cant_prop_to_addr_uses.
(inv_cant_prop_to_addr_use): New function.
(record_use): Call inv_cant_prop_to_addr_use, set the new field.
(get_inv_cost): Count cost if inv can't be propagated into its
address uses.
diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index 52c8ae8..3c2395c 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -99,6 +99,8 @@ struct def
   unsigned n_uses; /* Number of such uses.  */
   unsigned n_addr_uses;/* Number of uses in addresses.  */
   unsigned invno;  /* The corresponding invariant.  */
+  bool cant_prop_to_addr_uses; /* True if the corresponding inv can't be
+  propagated into its address uses.  */
 };
 
 /* The data stored for each invariant.  */
@@ -762,6 +764,34 @@ create_new_invariant (struct def *def, rtx_insn *insn, bitmap depends_on,
   return inv;
 }
 
+/* Given invariant DEF and its address USE, check if the corresponding
+   invariant expr can be propagated into the use or not.  */
+
+static bool
+inv_cant_prop_to_addr_use (struct def *def, df_ref use)
+{
+  struct invariant *inv;
+  rtx *pos = DF_REF_REAL_LOC (use), def_set;
+  rtx_insn *use_insn = DF_REF_INSN (use);
+  rtx_insn *def_insn;
+  bool ok;
+
+  inv = invariants[def->invno];
+

Re: Ping^2 Re: Pass -foffload targets from driver to libgomp at link time

2015-09-28 Thread Thomas Schwinge
Hi!

On Fri, 11 Sep 2015 17:43:49 +0200, Jakub Jelinek  wrote:
> So, do I understand well that you'll call GOMP_set_offload_targets from
> construct[ors] of all shared libraries (and the binary) that contain offloaded
> code?  If yes, that is surely going to fail the assertions in there.

Indeed.  My original plan was to generate/invoke this constructor
only for/from the final executable and not for any shared libraries, but
it seems I didn't implement this correctly.

> You can dlopen such libraries etc.  What if you link one library with
> -fopenmp=nvptx-none and another one with -fopenmp=x86_64-intelmicemul-linux?

So, the first question to answer is: what do we expect to happen in this
case, or similarly, if the executable and any shared libraries are
compiled with different/incompatible -foffload options?

OpenMP's default-device-var ICV is per process (that is, not separate
for the executable and any shared library).  Thus, once libgomp has
settled on this ICV (by first execution of an offloading construct, for
example), any offloading attempt of code compiled with incompatible
-foffload options will have to fail, because the corresponding
offloading device's code just isn't available.  We can't avoid this
situation, as it is not possible for libgomp to simply switch to a
different offloading device (or host fallback, for that matter):
libgomp doesn't have any knowledge of the current state of data regions
set up between the host and device(s), for instance.
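
To picture the constraint, here is a minimal sketch (not libgomp code;
record_offload_targets and recorded_targets are hypothetical names) of a
per-process record that accepts the first target list it sees and reports
any later list that differs:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* First target list seen in this process, or NULL if none yet.  The
   pointer is stored as is, so the caller must keep the string alive.  */
static const char *recorded_targets;

/* Hypothetical helper: record TARGETS on the first call; on later calls,
   report whether TARGETS matches what was recorded.  Returns 1 if
   consistent, 0 on mismatch.  */
int
record_offload_targets (const char *targets)
{
  assert (targets != NULL);
  if (recorded_targets == NULL)
    {
      recorded_targets = targets;
      return 1;
    }
  return strcmp (recorded_targets, targets) == 0;
}
```

An executable and a shared library that both pass "nvptx-none" would be
accepted; a library passing "x86_64-intelmicemul-linux" afterwards would be
reported as a mismatch.  A plain string comparison is the simplest possible
notion of "agree"; a real check might want to compare the lists as sets.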

For this, I propose that the only mode of operation that we currently can
support is that all of the executable and any shared libraries agree on
the offload targets specified by -foffload, and I thus propose the
following patch on top of what Joseph has posted before (passes the
testsuite, but not yet tested otherwise):

 libgomp/libgomp-plugin.h |3 +-
 libgomp/target.c |  157 +-
 2 files changed, 130 insertions(+), 30 deletions(-)

diff --git libgomp/libgomp-plugin.h libgomp/libgomp-plugin.h
index 24fbb94..5da4fa7 100644
--- libgomp/libgomp-plugin.h
+++ libgomp/libgomp-plugin.h
@@ -48,7 +48,8 @@ enum offload_target_type
   OFFLOAD_TARGET_TYPE_HOST = 2,
   /* OFFLOAD_TARGET_TYPE_HOST_NONSHM = 3 removed.  */
   OFFLOAD_TARGET_TYPE_NVIDIA_PTX = 5,
-  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
+  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6,
+  OFFLOAD_TARGET_TYPE_HWM
 };
 
 /* Auxiliary struct, used for transferring pairs of addresses from plugin
diff --git libgomp/target.c libgomp/target.c
index 4dd5913..d1e794a 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -68,6 +68,9 @@ static struct offload_image_descr *offload_images;
 /* Total number of offload images.  */
 static int num_offload_images;
 
+/* List of offload targets, separated by colon.  */
+static const char *gomp_offload_targets;
+
 /* Array of descriptors for all available devices.  */
 static struct gomp_device_descr *devices;
 
@@ -1121,6 +1124,8 @@ static bool
 gomp_load_plugin_for_device (struct gomp_device_descr *device,
 const char *plugin_name)
 {
+  gomp_debug (0, "%s (\"%s\")\n", __FUNCTION__, plugin_name);
+
   const char *err = NULL, *last_missing = NULL;
 
   void *plugin_handle = dlopen (plugin_name, RTLD_LAZY);
@@ -1216,39 +1221,120 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   return 0;
 }
 
-/* Return the corresponding plugin name for the offload target name
-   OFFLOAD_TARGET.  */
+/* Return the corresponding offload target type for the offload target name
+   OFFLOAD_TARGET, or 0 if unknown.  */
 
-static const char *
-offload_target_to_plugin_name (const char *offload_target)
+static enum offload_target_type
+offload_target_to_type (const char *offload_target)
 {
   if (strstr (offload_target, "-intelmic") != NULL)
-return "intelmic";
-  if (strncmp (offload_target, "nvptx", 5) == 0)
-return "nvptx";
-  gomp_fatal ("Unknown offload target: %s", offload_target);
+return OFFLOAD_TARGET_TYPE_INTEL_MIC;
+  else if (strncmp (offload_target, "nvptx", 5) == 0)
+return OFFLOAD_TARGET_TYPE_NVIDIA_PTX;
+  else
+return 0;
 }
 
-/* List of offload targets, separated by colon.  Defaults to the list
-   determined when configuring libgomp.  */
-static const char *gomp_offload_targets = OFFLOAD_TARGETS;
-static bool gomp_offload_targets_init = false;
+/* Return the corresponding plugin name for the offload target type TYPE, or
+   NULL if unknown.  */
+
+static const char *
+offload_target_type_to_plugin_name (enum offload_target_type type)
+{
+  switch (type)
+{
+case OFFLOAD_TARGET_TYPE_INTEL_MIC:
+  return "intelmic";
+case OFFLOAD_TARGET_TYPE_NVIDIA_PTX:
+  return "nvptx";
+default:
+  return NULL;
+}
+}
 
 /* Override the list of offload targets with OFFLOAD_TARGETS, the set
-   passed to the compiler at link time.  This must be called early,
-   and only once.  */
+   passed to the compiler at link time.  */
 
 vo

libgomp: Guard all devices/num_devices/num_devices_openmp access by register_lock (was: libgomp: Guard all offload_images/num_offload_images access by register_lock)

2015-09-28 Thread Thomas Schwinge
Hi!

On Fri, 25 Sep 2015 19:49:50 +0300, Ilya Verbin  wrote:
> On Fri, Sep 25, 2015 at 18:21:27 +0200, Thomas Schwinge wrote:
> > On Thu, 26 Mar 2015 23:41:30 +0300, Ilya Verbin  wrote:
> > > On Thu, Mar 26, 2015 at 13:09:19 +0100, Jakub Jelinek wrote:
> > > > the current code is majorly broken.  As I've said earlier, e.g. the lack
> > > > of mutex guarding gomp_target_init (which is using pthread_once guaranteed
> > > > to be run just once) vs. concurrent GOMP_offload_register calls
> > > > (if those are run from ctors, then I guess something like dl_load_lock
> > > > ensures at least on glibc that multiple GOMP_offload_register calls aren't
> > > > performed at the same time) in accessing/reallocating offload_images
> > > > and num_offload_images and the lack of support to register further
> > > > images after the gomp_target_init call (if you dlopen further shared
> > > > libraries) is really bad.  And it would be really nice to support the
> > > > unloading.
> > 
> > > Here is the latest patch for libgomp and mic plugin.
> > 
> > What about the scenario where one thread is inside
> > GOMP_offload_register_ver/GOMP_offload_register (say, due to opening a
> > shared library with such a mkoffload-generated constructor) and is
> > modifying offload_images with register_lock held, and another thread is
> > inside a GOMP_target* construct -> gomp_init_device and is accessing
> > offload_images without register_lock held?  Or, why isn't that a
> > reachable scenario?
> > 
> > Would the following patch (untested) do the right thing (locking added to
> > gomp_init_device and gomp_unload_device)?  We can then also remove the
> > is_register_lock parameter from gomp_load_image_to_device, and simplify
> > the code.
> 
> Looks like you're right, and this scenario is possible.

Thanks for your review!  Jakub, OK to commit the patch I had posted?


Then, in context of a similar scenario, I think we'll also want the
following.  Please confirm that my reasoning in gomp_get_num_devices and
resolve_device is correct.  OK for trunk?
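
As a reminder of the pattern under discussion, a minimal sketch with plain
pthreads (libgomp uses its own gomp_mutex_t; register_image, count_images,
registry_lock and MAX_IMAGES are made-up names for this example): every
access to the mutable table, reads included, takes the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_IMAGES 16

/* One lock guards both the table and its element count.  */
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static const void *images[MAX_IMAGES];
static int num_images;

/* Writer: add IMAGE to the table.  Returns 0 on success, -1 if full.  */
int
register_image (const void *image)
{
  int ret = -1;

  assert (image != NULL);
  pthread_mutex_lock (&registry_lock);
  if (num_images < MAX_IMAGES)
    {
      images[num_images++] = image;
      ret = 0;
    }
  pthread_mutex_unlock (&registry_lock);
  return ret;
}

/* Reader: return the number of registered images.  It takes the same
   lock, so it never observes a half-updated table.  */
int
count_images (void)
{
  pthread_mutex_lock (&registry_lock);
  int n = num_images;
  pthread_mutex_unlock (&registry_lock);
  return n;
}
```

Data that is written once during initialization and never changed afterwards
is the exception: it can be read without the lock, provided initialization
is known to have completed before the read.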

commit b0cf4dcc588e432c0a0d19d85727a20210b4d837
Author: Thomas Schwinge 
Date:   Sat Sep 26 15:48:09 2015 +0200

libgomp: Guard all devices/num_devices/num_devices_openmp access by register_lock
---
 libgomp/target.c |   10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git libgomp/target.c libgomp/target.c
index 1fbbe31..6f0a339 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -49,7 +49,7 @@ static void gomp_target_init (void);
 /* The whole initialization code for offloading plugins is only run one.  */
 static pthread_once_t gomp_is_initialized = PTHREAD_ONCE_INIT;
 
-/* Mutex for offload image registration.  */
+/* Mutex for offload targets setup and image registration.  */
 static gomp_mutex_t register_lock;
 
 /* This structure describes an offload image.
@@ -118,6 +118,8 @@ attribute_hidden int
 gomp_get_num_devices (void)
 {
   gomp_init_targets_once ();
+  /* As it is immutable once it has been initialized, it's safe to access
+ num_devices_openmp without register_lock held.  */
   return num_devices_openmp;
 }
 
@@ -133,6 +135,8 @@ resolve_device (int device_id)
   if (device_id < 0 || device_id >= gomp_get_num_devices ())
 return NULL;
 
+  /* As it is immutable once it has been initialized, it's safe to access
+ devices without register_lock held.  */
   return &devices[device_id];
 }
 
@@ -1228,6 +1232,8 @@ gomp_target_init (void)
   char *plugin_name;
   int i, new_num_devices;
 
+  gomp_mutex_lock (&register_lock);
+
   num_devices = 0;
   devices = NULL;
 
@@ -1317,6 +1323,8 @@ gomp_target_init (void)
   if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
goacc_register (&devices[i]);
 }
+
+  gomp_mutex_unlock (&register_lock);
 }
 
 #else /* PLUGIN_SUPPORT */


Grüße,
 Thomas



Re: lto wrapper verboseness

2015-09-28 Thread Thomas Schwinge
Hi!

On Thu, 30 Jul 2015 10:09:05 +0200, Richard Biener wrote:
> On Thu, Jul 30, 2015 at 1:05 AM, Nathan Sidwell  wrote:
> > Jakub,
> > this patch augments the lto wrapper to print out the arguments to spawned
> > commands when verbose.  I found this useful in debugging recent development.
> >
> > ok for trunk?
> 
> Err - fork_execute through collect_execute already does this if
> verbose || debug.
> 
> So better figure out why this doesn't work.

We need to pass on the verbose flag.  I once came up with the following
patch (depends on "Refactor intelmic-mkoffload.c argv building");
OK for trunk?
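
The idea in isolation, as a minimal sketch (build_child_argv and
CHILD_ARGC_MAX are hypothetical names, not taken from the mkoffload
sources): the optional "-v" takes one extra slot, so the array bound has to
grow with it, and an assertion checks that the bound still holds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Worst case: compiler, "-v", "-c", terminating NULL.  */
#define CHILD_ARGC_MAX 4

/* Fill ARGV, which must have room for CHILD_ARGC_MAX entries, with the
   command line for a child compiler.  "-v" is forwarded only if VERBOSE.
   Returns the number of entries written, including the final NULL.  */
unsigned
build_child_argv (const char *compiler, bool verbose, const char **argv)
{
  unsigned argc = 0;

  argv[argc++] = compiler;
  if (verbose)
    argv[argc++] = "-v";
  argv[argc++] = "-c";
  argv[argc++] = NULL;
  assert (argc <= CHILD_ARGC_MAX);
  return argc;
}
```

This is why the patch below bumps NEW_ARGC_MAX from 9 to 10 and
OBJCOPY_ARGC_MAX from 11 to 12 wherever a "-v" may be added.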

commit ad0b6608cff22b62b73016c91c74b21a168acb46
Author: Thomas Schwinge 
Date:   Tue Aug 4 14:08:23 2015 +0200

Pass on the verbose flag "-v" to/in the mkoffloads

gcc/
* config/i386/intelmic-mkoffload.c (main): Parse "-v" flag.
(generate_target_descr_file, generate_target_offloadend_file)
(generate_host_descr_file, prepare_target_image, main): Pass it
on.
* config/nvptx/mkoffload.c (main): Parse "-v" flag.
(compile_native, main): Pass it on.
* lto-wrapper.c (compile_offload_image): Likewise.
---
 gcc/config/i386/intelmic-mkoffload.c |   24 +---
 gcc/config/nvptx/mkoffload.c |6 ++
 gcc/lto-wrapper.c|2 ++
 3 files changed, 29 insertions(+), 3 deletions(-)

diff --git gcc/config/i386/intelmic-mkoffload.c gcc/config/i386/intelmic-mkoffload.c
index 8d5af0d..ae1bde0 100644
--- gcc/config/i386/intelmic-mkoffload.c
+++ gcc/config/i386/intelmic-mkoffload.c
@@ -281,6 +281,8 @@ generate_target_descr_file (const char *target_compiler)
   struct obstack argv_obstack;
   obstack_init (&argv_obstack);
   obstack_ptr_grow (&argv_obstack, target_compiler);
+  if (verbose)
+obstack_ptr_grow (&argv_obstack, "-v");
   obstack_ptr_grow (&argv_obstack, "-c");
   obstack_ptr_grow (&argv_obstack, "-shared");
   obstack_ptr_grow (&argv_obstack, "-fPIC");
@@ -319,6 +321,8 @@ generate_target_offloadend_file (const char *target_compiler)
   struct obstack argv_obstack;
   obstack_init (&argv_obstack);
   obstack_ptr_grow (&argv_obstack, target_compiler);
+  if (verbose)
+obstack_ptr_grow (&argv_obstack, "-v");
   obstack_ptr_grow (&argv_obstack, "-c");
   obstack_ptr_grow (&argv_obstack, "-shared");
   obstack_ptr_grow (&argv_obstack, "-fPIC");
@@ -380,9 +384,11 @@ generate_host_descr_file (const char *host_compiler)
   fclose (src_file);
 
   unsigned new_argc = 0;
-#define NEW_ARGC_MAX 9
+#define NEW_ARGC_MAX 10
   const char *new_argv[NEW_ARGC_MAX];
   new_argv[new_argc++] = host_compiler;
+  if (verbose)
+new_argv[new_argc++] = "-v";
   new_argv[new_argc++] = "-c";
   new_argv[new_argc++] = "-fPIC";
   new_argv[new_argc++] = "-shared";
@@ -429,6 +435,8 @@ prepare_target_image (const char *target_compiler, int argc, char **argv)
   struct obstack argv_obstack;
   obstack_init (&argv_obstack);
   obstack_ptr_grow (&argv_obstack, target_compiler);
+  if (verbose)
+obstack_ptr_grow (&argv_obstack, "-v");
   obstack_ptr_grow (&argv_obstack, "-xlto");
   obstack_ptr_grow (&argv_obstack, "-shared");
   obstack_ptr_grow (&argv_obstack, "-fPIC");
@@ -448,7 +456,7 @@ prepare_target_image (const char *target_compiler, int argc, char **argv)
   compile_for_target (&argv_obstack);
 
   unsigned objcopy_argc;
-#define OBJCOPY_ARGC_MAX 11
+#define OBJCOPY_ARGC_MAX 12
   const char *objcopy_argv[OBJCOPY_ARGC_MAX];
 
   /* Run objcopy.  */
@@ -457,6 +465,8 @@ prepare_target_image (const char *target_compiler, int argc, char **argv)
   sprintf (rename_section_opt, ".data=%s", image_section_name);
   objcopy_argc = 0;
   objcopy_argv[objcopy_argc++] = "objcopy";
+  if (verbose)
+objcopy_argv[objcopy_argc++] = "-v";
   objcopy_argv[objcopy_argc++] = "-B";
   objcopy_argv[objcopy_argc++] = "i386";
   objcopy_argv[objcopy_argc++] = "-I";
@@ -510,6 +520,8 @@ prepare_target_image (const char *target_compiler, int argc, char **argv)
 
   objcopy_argc = 0;
   objcopy_argv[objcopy_argc++] = "objcopy";
+  if (verbose)
+objcopy_argv[objcopy_argc++] = "-v";
   objcopy_argv[objcopy_argc++] = target_so_filename;
   objcopy_argv[objcopy_argc++] = "--redefine-sym";
   objcopy_argv[objcopy_argc++] = opt_for_objcopy[0];
@@ -565,6 +577,8 @@ main (int argc, char **argv)
 "unrecognizable argument of option " STR);
}
 #undef STR
+  else if (strcmp (argv[i], "-v") == 0)
+   verbose = true;
 }
 
   const char *target_so_filename
@@ -573,13 +587,15 @@ main (int argc, char **argv)
   const char *host_descr_filename = generate_host_descr_file (host_compiler);
 
   unsigned new_argc;
-#define NEW_ARGC_MAX 9
+#define NEW_ARGC_MAX 10
   const char *new_argv[NEW_ARGC_MAX];
 
   /* Perform partial linking for the target image and host side descriptor.
  As a result we'll get a f

Re: [PATCH 1/4] Add mkoffload for Intel MIC

2015-09-28 Thread Thomas Schwinge
Hi!

On Wed, 22 Oct 2014 22:57:01 +0400, Ilya Verbin  wrote:
> On 22 Oct 09:57, Jakub Jelinek wrote:
> > On Wed, Oct 22, 2014 at 02:30:44AM +0400, Ilya Verbin wrote:
> > > +  obstack_init (&argv_obstack);
> > > +  obstack_ptr_grow (&argv_obstack, "objcopy");
> > > +  obstack_ptr_grow (&argv_obstack, target_so_filename);
> > > +  obstack_ptr_grow (&argv_obstack, "--redefine-sym");
> > > +  obstack_ptr_grow (&argv_obstack, opt_for_objcopy[0]);
> > > +  obstack_ptr_grow (&argv_obstack, "--redefine-sym");
> > > +  obstack_ptr_grow (&argv_obstack, opt_for_objcopy[1]);
> > > +  obstack_ptr_grow (&argv_obstack, "--redefine-sym");
> > > +  obstack_ptr_grow (&argv_obstack, opt_for_objcopy[2]);
> > > +  obstack_ptr_grow (&argv_obstack, NULL);
> > > +  new_argv = XOBFINISH (&argv_obstack, char **);
> > 
> > Why do you use an obstack for an array of pointers where you know
> > you have exactly 9 pointers?  Wouldn't
> >   char *new_argv[9];
> > and just pointer assignments be better?
> 
> Yes, done.
> 
> > > +  /* Perform partial linking for the target image and host side descriptor.
> > > + As a result we'll get a finalized object file with all offload data.  */
> > > +  struct obstack argv_obstack;
> > > +  obstack_init (&argv_obstack);
> > > +  obstack_ptr_grow (&argv_obstack, "ld");
> > > +  if (target_ilp32)
> > > +{
> > > +  obstack_ptr_grow (&argv_obstack, "-m");
> > > +  obstack_ptr_grow (&argv_obstack, "elf_i386");
> > > +}
> > > +  obstack_ptr_grow (&argv_obstack, "-r");
> > > +  obstack_ptr_grow (&argv_obstack, host_descr_filename);
> > > +  obstack_ptr_grow (&argv_obstack, target_so_filename);
> > > +  obstack_ptr_grow (&argv_obstack, "-o");
> > > +  obstack_ptr_grow (&argv_obstack, out_obj_filename);
> > > +  obstack_ptr_grow (&argv_obstack, NULL);
> > > +  char **new_argv = XOBFINISH (&argv_obstack, char **);
> > 
> > Similarly (well, here it is not constant, still, you know small upper bound
> > and can just use some int index you ++ in each assignment.
> 
> Done.
> 
> > > +  /* Run objcopy on the resultant object file to localize generated symbols
> > > + to avoid conflicting between different DSO and an executable.  */
> > > +  obstack_init (&argv_obstack);
> > > +  obstack_ptr_grow (&argv_obstack, "objcopy");
> > > +  obstack_ptr_grow (&argv_obstack, "-L");
> > > +  obstack_ptr_grow (&argv_obstack, symbols[0]);
> > > +  obstack_ptr_grow (&argv_obstack, "-L");
> > > +  obstack_ptr_grow (&argv_obstack, symbols[1]);
> > > +  obstack_ptr_grow (&argv_obstack, "-L");
> > > +  obstack_ptr_grow (&argv_obstack, symbols[2]);
> > > +  obstack_ptr_grow (&argv_obstack, out_obj_filename);
> > > +  obstack_ptr_grow (&argv_obstack, NULL);
> > > +  new_argv = XOBFINISH (&argv_obstack, char **);
> > > +  fork_execute (new_argv[0], new_argv, false);
> > > +  obstack_free (&argv_obstack, NULL);
> > 
> > Likewise.
> 
> Done.

After approval for "Use gcc/coretypes.h:enum offload_abi in mkoffloads",
I'd like to commit the following refactoring patch to trunk, in
preparation for another change:

commit 91fbe15ce2e539a4017f65cc167b362a4b4e4553
Author: Thomas Schwinge 
Date:   Tue Aug 4 14:06:39 2015 +0200

Refactor intelmic-mkoffload.c argv building

gcc/
* config/i386/intelmic-mkoffload.c (generate_host_descr_file)
(prepare_target_image, main): Refactor argv building.
---
 gcc/config/i386/intelmic-mkoffload.c |   88 +-
 1 file changed, 54 insertions(+), 34 deletions(-)

diff --git gcc/config/i386/intelmic-mkoffload.c gcc/config/i386/intelmic-mkoffload.c
index 8028584..8d5af0d 100644
--- gcc/config/i386/intelmic-mkoffload.c
+++ gcc/config/i386/intelmic-mkoffload.c
@@ -380,7 +380,8 @@ generate_host_descr_file (const char *host_compiler)
   fclose (src_file);
 
   unsigned new_argc = 0;
-  const char *new_argv[9];
+#define NEW_ARGC_MAX 9
+  const char *new_argv[NEW_ARGC_MAX];
   new_argv[new_argc++] = host_compiler;
   new_argv[new_argc++] = "-c";
   new_argv[new_argc++] = "-fPIC";
@@ -400,6 +401,8 @@ generate_host_descr_file (const char *host_compiler)
   new_argv[new_argc++] = "-o";
   new_argv[new_argc++] = obj_filename;
   new_argv[new_argc++] = NULL;
+  gcc_checking_assert (new_argc <= NEW_ARGC_MAX);
+#undef NEW_ARGC_MAX
 
   fork_execute (new_argv[0], CONST_CAST (char **, new_argv), false);
 
@@ -444,32 +447,37 @@ prepare_target_image (const char *target_compiler, int argc, char **argv)
   obstack_ptr_grow (&argv_obstack, target_so_filename);
   compile_for_target (&argv_obstack);
 
+  unsigned objcopy_argc;
+#define OBJCOPY_ARGC_MAX 11
+  const char *objcopy_argv[OBJCOPY_ARGC_MAX];
+
   /* Run objcopy.  */
   char *rename_section_opt
 = XALLOCAVEC (char, sizeof (".data=") + strlen (image_section_name));
   sprintf (rename_section_opt, ".data=%s", image_section_name);
-  const char *objcopy_argv[11];
-  objcopy

Re: [PATCH][RTL-ifcvt] PR rtl-optimization/67465: Handle pairs of complex+simple blocks and empty blocks more gracefully

2015-09-28 Thread Kyrill Tkachov


On 25/09/15 21:03, Jeff Law wrote:

On 09/25/2015 05:06 AM, Kyrill Tkachov wrote:

Hi Rainer,

On 25/09/15 11:57, Rainer Orth wrote:

Hi Kyrill,


Bootstrapped and tested on aarch64 and x86_64.
Rainer, could you please try this patch in combination with the one I
sent
earlier at:
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00815.html

it took me quite a bit, but I've now regtested those two patches: with
them both applied, the sparc-sun-solaris2.12 build succeeds and the two
gcc.c-torture/execute/20071216-1.c failures are gone.

So, from a SPARC POV the patches are good to go.

Phew, thanks a lot!

So, in conclusion the patches I'd like approval for are:
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01306.html
and
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00815.html

These are OK.  Thanks for taking the time to work with Rainer and sort
out the sparc issues.  It's greatly appreciated.


No problem, they were my regressions to fix after all, and it's easier to
fix now rather than in stage3/4.

I've committed them with r228194 and r228195.

Thanks,
Kyrill



Jeff





[Committed] Add gcc.dg/vect/pr62171.c

2015-09-28 Thread Tom de Vries

Hi,

this patch adds testcase gcc.dg/vect/pr62171.c.

The testcase passes thanks to the fix for PR67673.

Committed to trunk.

Thanks,
- Tom
Add gcc.dg/vect/pr62171.c

2015-09-28  Tom de Vries  

	* gcc.dg/vect/pr62171.c: New test.
---
 gcc/testsuite/gcc.dg/vect/pr62171.c | 27 +++
 1 file changed, 27 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr62171.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr62171.c b/gcc/testsuite/gcc.dg/vect/pr62171.c
new file mode 100644
index 0000000..18517b3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr62171.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
+/* { dg-require-effective-target vect_double } */
+
+struct omp_data_i
+{
+  double *__restrict__ results;
+  double *__restrict__ pData;
+  double *__restrict__ coeff;
+};
+
+#define nEvents 100
+
+double __attribute__((noinline, noclone))
+f (struct omp_data_i *__restrict__ p, int argc)
+{
+
+  int idx;
+
+  for (idx = 0; idx < nEvents; idx++)
+((p->results))[idx] = (*(p->coeff)) * ((p->pData))[idx];
+
+  return ((p->results))[argc];
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "versioned" "vect" } } */
-- 
1.9.1



Re: [Graphite] Redesign Graphite scop detection

2015-09-28 Thread Tobias Grosser

On 09/28/2015 03:30 AM, Aditya Kumar wrote:

From: hiraditya 

Redesign Graphite scop detection for faster compile time and detecting more
SCoPs.

The existing algorithm for SCoP detection in Graphite was based on the
dominator tree, where a tree (CFG) traversal was required for analyzing an
SESE. The tree traversal is linear in the number of basic blocks, and SCoP
detection is (probably) linear in the number of instructions. That algorithm
utilized a generic SESE infrastructure which does not directly represent
loops.  With regard to the Graphite framework, we are only interested in
subtrees with loops. The new algorithm is geared towards tree traversal on
the loop structure. It is linear in the number of loops, which is faster than
the previous algorithm.

Briefly, we start the traversal at a loop nest and analyze it recursively for
validity. Once a valid loop is found, we look for a valid adjacent loop. If an
adjacent loop is found and is valid, we merge both loop nests; otherwise we
form a SCoP from the previous loop nest and resume the algorithm from the
adjacent loop nest. The data structure to represent an SESE is an ordered pair
of edges (entry, exit). The new algorithm can extend a SCoP in both
directions. With this approach, the number of instructions to be analyzed for
validity reduces to a minimal set.  We start by analyzing those statements
which are inside a loop, because validity of those statements is necessary for
the validity of the loop. The statements outside the loop nest can simply be
excluded from the SESE if they are not valid.
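
A heavily simplified sketch of the merging step, assuming the adjacent loop
nests are laid out in a flat array with one validity flag each (the real
detection walks the loop tree and represents an SESE as an (entry, exit)
edge pair; struct region and merge_adjacent_valid are made-up names):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A candidate SCoP covering loop nests FIRST..LAST, both inclusive.  */
struct region
{
  size_t first;
  size_t last;
};

/* VALID[i] tells whether the i-th of N adjacent loop nests passed the
   validity check.  Merge each maximal run of valid neighbours into one
   region, write the regions to OUT (room for N entries is always enough)
   and return how many were written.  */
size_t
merge_adjacent_valid (const bool *valid, size_t n, struct region *out)
{
  size_t count = 0;
  size_t i = 0;

  while (i < n)
    {
      if (!valid[i])
        {
          i++;
          continue;
        }
      size_t start = i;
      /* Keep extending while the next neighbour is valid as well.  */
      while (i + 1 < n && valid[i + 1])
        i++;
      out[count].first = start;
      out[count].last = i;
      count++;
      i++;
    }
  return count;
}
```

For the flags {valid, valid, invalid, valid} this yields two regions, 0..1
and 3..3.  Each loop nest is visited once, which matches the "linear in the
number of loops" claim for this part.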


I am generally fine with this, but please consider that when growing a SCoP,
certain previous analyses may become invalid (an affine expression may
suddenly become non-affine, as parameters that were previously scop-invariant
may now be part of the scop). Also, how are you planning to handle non-affine
regions/loops? In Polly we can encapsulate non-affine loops and regions in
bigger scops. To handle this, I assume you would need to teach your patch to
start growing regions even though the innermost loops cannot be modeled
precisely.

Best,
Tobias


[Patch, testsuite] Skip addr_equal-1 if target keeps null pointer checks

2015-09-28 Thread Senthil Kumar Selvaraj
Hi,

  The below patch skips gcc.dg/addr_equal-1.c if the target keeps null
  pointer checks.

  The test fails for such targets (avr, in my case) because the address
  comparison in the below code does not resolve to a constant, causing
  builtin_constant_p to return false and fail the test.

  /* Variables and functions do not share same memory locations otherwise.  */
  if (!__builtin_constant_p ((void *)undef_fn0 == (void *)&undef_var0))
abort ();

  For targets that delete null pointer checks, the equality comparison
  expression is optimized away to 0, as the code in match.pd knows they can
  only be equal if they are both NULL, which cannot be true since
  flag-delete-null-pointer-checks is on.

  For targets that keep null pointer checks, 0 is a valid address and the
  comparison expression is left as is, and that causes a later pass to
  fold the builtin_constant_p to a false value, resulting in the test
  failure.
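
  As a reminder of how the builtin behaves in general, a small sketch
  (unrelated to the testcase itself; the function and variable names are
  made up): __builtin_constant_p yields 1 only when the compiler can fold
  its argument to a compile-time constant, and 0 otherwise.

```c
#include <assert.h>

/* A volatile object must be read at run time, so the compiler can
   never treat its value as a known constant.  */
volatile int runtime_value = 5;

/* A literal is trivially a compile-time constant.  */
int
literal_is_constant (void)
{
  return __builtin_constant_p (42);
}

/* A volatile read has side effects, so the builtin folds to 0.  */
int
volatile_is_constant (void)
{
  return __builtin_constant_p (runtime_value);
}
```

  The address comparison in the test sits between these two extremes:
  whether it folds depends on what the target lets the compiler assume
  about address 0.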

  If this is ok, could someone commit please? I don't have commit
  access.

Regards
Senthil

gcc/testsuite/ChangeLog

2015-09-28  Senthil Kumar Selvaraj  

* gcc.dg/addr_equal-1.c: Skip test if target keeps
null pointer checks.

diff --git gcc/testsuite/gcc.dg/addr_equal-1.c gcc/testsuite/gcc.dg/addr_equal-1.c
index 94499f0..957b03a 100644
--- gcc/testsuite/gcc.dg/addr_equal-1.c
+++ gcc/testsuite/gcc.dg/addr_equal-1.c
@@ -3,6 +3,7 @@
 /* { dg-require-weak "" } */
 /* { dg-require-alias "" } */
 /* { dg-options "-O2" } */
+/* { dg-skip-if "" keeps_null_pointer_checks } */
 void abort (void);
 extern int undef_var0, undef_var1;
 extern __attribute__ ((weak)) int weak_undef_var0;


Re: Use gcc/coretypes.h:enum offload_abi in mkoffloads

2015-09-28 Thread Thomas Schwinge
Hi!

On Tue, 4 Aug 2015 13:20:12 +0200, I wrote:
> On Thu, 8 Jan 2015 07:02:19 -0800, "H.J. Lu"  wrote:
> > On Thu, Jan 8, 2015 at 6:59 AM, Thomas Schwinge wrote:
> > > On Mon, 22 Dec 2014 12:28:20 +0100, Jakub Jelinek wrote:
> > >> On Mon, Dec 22, 2014 at 12:25:32PM +0100, Thomas Schwinge wrote:
> > >> > On Wed, 22 Oct 2014 22:57:01 +0400, Ilya Verbin wrote:
> > >> > > --- /dev/null
> > >> > > +++ b/gcc/config/i386/intelmic-mkoffload.c
> > >> > > @@ -0,0 +1,541 @@
> > >> > > +/* Offload image generation tool for Intel MIC devices.
> > >> >
> > >> > > +/* Shows if we should compile binaries for i386 instead of x86-64.  */
> > >> > > +bool target_ilp32 = false;
> 
> Once the following refactoring to use gcc/coretypes.h:enum offload_abi in
> mkoffloads gets approved...
> 
> > Should we also handle x32?
> 
> ..., that should be more easy to do.  OK for trunk, once testing
> succeeds?
> 
> commit de4d7cbcf979edc095a48dff5b38d12846bdab6f
> Author: Thomas Schwinge 
> Date:   Tue Aug 4 13:12:36 2015 +0200
> 
> Use gcc/coretypes.h:enum offload_abi in mkoffloads

That one included unfortunate yet popular ;-) strcmp "typos":

> +#define STR "-foffload-abi="
> +  if (strncmp (argv[i], STR, strlen (STR)) == 0)
> + {
> +   if (strcmp (argv[i] + strlen (STR), "lp64"))
> + offload_abi = OFFLOAD_ABI_LP64;
> +   else if (strcmp (argv[i] + strlen (STR), "ilp32"))
> + offload_abi = OFFLOAD_ABI_ILP32;

..., so with these fixed up:

--- gcc/config/i386/intelmic-mkoffload.c
+++ gcc/config/i386/intelmic-mkoffload.c
@@ -544,9 +544,9 @@ main (int argc, char **argv)
 #define STR "-foffload-abi="
   if (strncmp (argv[i], STR, strlen (STR)) == 0)
{
- if (strcmp (argv[i] + strlen (STR), "lp64"))
+ if (strcmp (argv[i] + strlen (STR), "lp64") == 0)
offload_abi = OFFLOAD_ABI_LP64;
- else if (strcmp (argv[i] + strlen (STR), "ilp32"))
+ else if (strcmp (argv[i] + strlen (STR), "ilp32") == 0)
offload_abi = OFFLOAD_ABI_ILP32;
  else
fatal_error (input_location,
--- gcc/config/nvptx/mkoffload.c
+++ gcc/config/nvptx/mkoffload.c
@@ -1013,9 +1013,9 @@ main (int argc, char **argv)
 #define STR "-foffload-abi="
   if (strncmp (argv[i], STR, strlen (STR)) == 0)
{
- if (strcmp (argv[i] + strlen (STR), "lp64"))
+ if (strcmp (argv[i] + strlen (STR), "lp64") == 0)
offload_abi = OFFLOAD_ABI_LP64;
- else if (strcmp (argv[i] + strlen (STR), "ilp32"))
+ else if (strcmp (argv[i] + strlen (STR), "ilp32") == 0)
offload_abi = OFFLOAD_ABI_ILP32;
  else
fatal_error (input_location,
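
For reference, the corrected comparison in isolation, as a minimal sketch
(enum abi_sketch, ABI_PREFIX and parse_offload_abi are made-up names, not
the mkoffload code): strcmp returns 0 when the strings are equal, so a
match has to be tested with "== 0".

```c
#include <assert.h>
#include <string.h>

enum abi_sketch { ABI_UNSET, ABI_LP64, ABI_ILP32 };

#define ABI_PREFIX "-foffload-abi="

/* Map a command-line argument such as "-foffload-abi=lp64" to an ABI.
   Returns ABI_UNSET for anything that is not recognized.  */
enum abi_sketch
parse_offload_abi (const char *arg)
{
  assert (arg != NULL);
  size_t prefix_len = strlen (ABI_PREFIX);

  if (strncmp (arg, ABI_PREFIX, prefix_len) != 0)
    return ABI_UNSET;

  const char *value = arg + prefix_len;
  if (strcmp (value, "lp64") == 0)
    return ABI_LP64;
  if (strcmp (value, "ilp32") == 0)
    return ABI_ILP32;
  return ABI_UNSET;
}
```

Without the "== 0", the two branches swap: "lp64" fails the first test
(strcmp gives 0) and then passes the second, selecting ILP32, while "ilp32"
passes the first and selects LP64.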

..., I'll again propose the following patch for trunk:

commit 922278239a9d346ebde99e616185a91fbfaf
Author: Thomas Schwinge 
Date:   Tue Aug 4 13:12:36 2015 +0200

Use gcc/coretypes.h:enum offload_abi in mkoffloads

gcc/
* config/i386/intelmic-mkoffload.c (target_ilp32): Remove
variable, replacing it with...
(offload_abi): ... this new variable.  Adjust all users.
* config/nvptx/mkoffload.c (target_ilp32, offload_abi): Likewise.
---
 gcc/config/i386/intelmic-mkoffload.c |   90 +++---
 gcc/config/nvptx/mkoffload.c |   56 +++--
 2 files changed, 101 insertions(+), 45 deletions(-)

diff --git gcc/config/i386/intelmic-mkoffload.c gcc/config/i386/intelmic-mkoffload.c
index 4a7812c..8028584 100644
--- gcc/config/i386/intelmic-mkoffload.c
+++ gcc/config/i386/intelmic-mkoffload.c
@@ -42,8 +42,7 @@ int num_temps = 0;
 const int MAX_NUM_TEMPS = 10;
 const char *temp_files[MAX_NUM_TEMPS];
 
-/* Shows if we should compile binaries for i386 instead of x86-64.  */
-bool target_ilp32 = false;
+enum offload_abi offload_abi = OFFLOAD_ABI_UNSET;
 
 /* Delete tempfiles and exit function.  */
 void
@@ -200,10 +199,17 @@ out:
 static void
 compile_for_target (struct obstack *argv_obstack)
 {
-  if (target_ilp32)
-obstack_ptr_grow (argv_obstack, "-m32");
-  else
-obstack_ptr_grow (argv_obstack, "-m64");
+  switch (offload_abi)
+{
+case OFFLOAD_ABI_LP64:
+  obstack_ptr_grow (argv_obstack, "-m64");
+  break;
+case OFFLOAD_ABI_ILP32:
+  obstack_ptr_grow (argv_obstack, "-m32");
+  break;
+default:
+  abort ();
+}
   obstack_ptr_grow (argv_obstack, NULL);
   char **argv = XOBFINISH (argv_obstack, char **);
 
@@ -379,10 +385,17 @@ generate_host_descr_file (const char *host_compiler)
   new_argv[new_argc++] = "-c";
   new_argv[new_argc++] = "-fPIC";
   new_argv[new_argc++] = "-shared";
-  if (target_ilp32)
-new_argv[new_argc++] = "-m32";
-  else
-new_argv[new_argc++] = "-m64";
+  switch (offload_abi)
+{
+case OFFLOAD_ABI_LP64:
+  new_argv[new_argc++] = "-m64";
+  break;
+case OFFLOAD_ABI_ILP32:
+  new_argv[new_argc++] = "-m32";
+  break;
+default:
+  abort ();
+}
   new_