Re: [PATCH] Fix cond-expr handling in genmatch

2016-10-18 Thread Richard Biener
On Mon, 17 Oct 2016, Richard Biener wrote:

> 
> This fixes matching of toplevel (cond (lt @1 @2) ...) as reported by
> Bin to me privately.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

This is what I applied.

Richard.

2016-10-18  Richard Biener  

* genmatch.c (dt_operand::gen_gimple_expr): Use get_name to
get at the operand to look at with TREE_OPERAND for generic
sub-nodes.

Index: gcc/genmatch.c
===
--- gcc/genmatch.c  (revision 241228)
+++ gcc/genmatch.c  (working copy)
@@ -2644,9 +2644,19 @@ dt_operand::gen_gimple_expr (FILE *f, in
  /* ???  If this is a memory operation we can't (and should not)
 match this.  The only sensible operand types are
 SSA names and invariants.  */
- fprintf_indent (f, indent,
- "tree %s = TREE_OPERAND (gimple_assign_rhs1 
(def), %i);\n",
- child_opname, i);
+ if (e->is_generic)
+   {
+ char opname[20];
+ get_name (opname);
+ fprintf_indent (f, indent,
+ "tree %s = TREE_OPERAND (%s, %i);\n",
+ child_opname, opname, i);
+   }
+ else
+   fprintf_indent (f, indent,
+   "tree %s = TREE_OPERAND "
+   "(gimple_assign_rhs1 (def), %i);\n",
+   child_opname, i);
  fprintf_indent (f, indent,
  "if ((TREE_CODE (%s) == SSA_NAME\n",
  child_opname);


Re: [PATCH] (PR 65950) Improve predicate for exit(0);

2016-10-18 Thread Jan Hubicka
> > Ah, so you have
> >
> > foo () { loop }
> > main()
> > {
> >   if ()
> >{
> >   foo ();
> >   exit (0);
> >}
> > ...
> >   return 0;
> > }
> >
> > and foo is marked cold because its only call is on the path to exit (0)?
> 
> 
> Actually the case I have here is just:
> foo () { loop }
> int main(void)
> {
> .
> foo();
> ...
> exit (0);
> }
> 
> Just an exit at the end of main.
> Basically if we treat exit(0) as a normal function, main becomes hot again.

Yep, it is old noreturn predicate lazynes.  Path is OK

Honza

> 
> Thanks,
> Andrew
> 
> >
> > noreturn prediction is quite aggressive but it works also quite well.  Given
> > you can only detect a very minor fraction of cases (consider exit (0) being
> > in foo itself) I'm not sure that your fix is good progress.  IMHO we might
> > want to switch from 'noreturn' to 'noreturn' + likely error which needs
> > another attribute / auto-discovery and IPA propagation to handle this case
> > better.
> >
> > Anyway, I'll leave the patch to Honza.
> >
> > Richard.
> >
> >> Thanks,
> >> Andrew
> >>
> >>>
> >>> Richard.
> >>>
>  OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>  Also tested it with SPEC CPU 2006 with no performance regressions.
> 
>  Thanks,
>  Andrew Pinski
> 
>  ChangeLog:
>  * predict.c (is_exit_with_zero_arg): New function.
>  (tree_bb_level_predictions): Don't consider paths leading to exit(0)
>  as nottaken.


Re: [PATCH] Simplify conditions in EVRP, handle taken edge

2016-10-18 Thread Richard Biener
On Mon, 17 Oct 2016, Richard Biener wrote:

> 
> This refactors propagation vs. substitution and handles condition
> simplification properly as well as passing a known taken edge down
> to the DOM walker (avoiding useless work and properly handling PHIs).
> 
> If we do all the work it's stupid to not fold away dead code...
> 
> Bootstrap and regtest pending on x86_64-unknown-linux-gnu.

The following is what I applied, also fixing a spelling mistake noticed
by Bernhard.

Richard.

2016-10-18  Richard Biener  

* tree-vrp.c (evrp_dom_walker::before_dom_children): Handle
not visited but non-executable predecessors.  Return taken edge.
Simplify conditions and refactor propagation vs. folding step.

* gcc.dg/tree-ssa/pr20318.c: Disable EVRP.
* gcc.dg/tree-ssa/pr21001.c: Likewise.
* gcc.dg/tree-ssa/pr21090.c: Likewise.
* gcc.dg/tree-ssa/pr21294.c: Likewise.
* gcc.dg/tree-ssa/pr21563.c: Likewise.
* gcc.dg/tree-ssa/pr23744.c: Likewise.
* gcc.dg/tree-ssa/pr25382.c: Likewise.
* gcc.dg/tree-ssa/pr68431.c: Likewise.
* gcc.dg/tree-ssa/vrp03.c: Likewise.
* gcc.dg/tree-ssa/vrp06.c: Likewise.
* gcc.dg/tree-ssa/vrp07.c: Likewise.
* gcc.dg/tree-ssa/vrp09.c: Likewise.
* gcc.dg/tree-ssa/vrp19.c: Likewise.
* gcc.dg/tree-ssa/vrp20.c: Likewise.
* gcc.dg/tree-ssa/vrp92.c: Likewise.
* gcc.dg/pr68217.c: Likewise.
* gcc.dg/predict-9.c: Likewise.
* gcc.dg/tree-prof/val-prof-5.c: Adjust.
* gcc.dg/predict-1.c: Likewise.



Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 241242)
+++ gcc/tree-vrp.c  (working copy)
@@ -10741,12 +10741,13 @@ evrp_dom_walker::before_dom_children (ba
   gimple_stmt_iterator gsi;
   edge e;
   edge_iterator ei;
-  bool has_unvisived_preds = false;
+  bool has_unvisited_preds = false;
 
   FOR_EACH_EDGE (e, ei, bb->preds)
-if (!(e->src->flags & BB_VISITED))
+if (e->flags & EDGE_EXECUTABLE
+   && !(e->src->flags & BB_VISITED))
   {
-   has_unvisived_preds = true;
+   has_unvisited_preds = true;
break;
   }
 
@@ -10756,7 +10757,7 @@ evrp_dom_walker::before_dom_children (ba
   gphi *phi = gpi.phi ();
   tree lhs = PHI_RESULT (phi);
   value_range vr_result = VR_INITIALIZER;
-  if (!has_unvisived_preds
+  if (!has_unvisited_preds
  && stmt_interesting_for_vrp (phi))
extract_range_from_phi_node (phi, &vr_result);
   else
@@ -10764,81 +10765,90 @@ evrp_dom_walker::before_dom_children (ba
   update_value_range (lhs, &vr_result);
 }
 
+  edge taken_edge = NULL;
+
   /* Visit all other stmts and discover any new VRs possible.  */
   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
 {
   gimple *stmt = gsi_stmt (gsi);
-  edge taken_edge;
   tree output = NULL_TREE;
   gimple *old_stmt = stmt;
   bool was_noreturn = (is_gimple_call (stmt)
   && gimple_call_noreturn_p (stmt));
 
-  /* TODO, if found taken_edge, we should visit (return it) and travel
-again to improve VR as done in DOM/SCCVN optimizations.  It should
-be done carefully as stmts might prematurely leave a BB like
-in EH.  */
-  if (stmt_interesting_for_vrp (stmt))
+  if (gcond *cond = dyn_cast  (stmt))
+   {
+ vrp_visit_cond_stmt (cond, &taken_edge);
+ if (taken_edge)
+   {
+ if (taken_edge->flags & EDGE_TRUE_VALUE)
+   gimple_cond_make_true (cond);
+ else if (taken_edge->flags & EDGE_FALSE_VALUE)
+   gimple_cond_make_false (cond);
+ else
+   gcc_unreachable ();
+   }
+   }
+  else if (stmt_interesting_for_vrp (stmt))
{
+ edge taken_edge;
  value_range vr = VR_INITIALIZER;
  extract_range_from_stmt (stmt, &taken_edge, &output, &vr);
  if (output
  && (vr.type == VR_RANGE || vr.type == VR_ANTI_RANGE))
-   update_value_range (output, &vr);
- else
-   set_defs_to_varying (stmt);
-
- /* Try folding stmts with the VR discovered.  */
- bool did_replace
-   = replace_uses_in (stmt,
-  op_with_constant_singleton_value_range);
- if (fold_stmt (&gsi, follow_single_use_edges)
- || did_replace)
-   update_stmt (gsi_stmt (gsi));
-
- if (did_replace)
{
- /* If we cleaned up EH information from the statement,
-remove EH edges.  */
- if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt))
-   bitmap_set_bit (need_eh_cleanup, bb->index);
-
- /* If we turned a not noreturn call into a noreturn one
-schedule it for fixup.  */
- if (!was_noreturn
- && is_g

Re: [PATCH] Clear BB_VISITED in bb-reorder

2016-10-18 Thread Richard Biener
On Mon, 17 Oct 2016, Andrew Pinski wrote:

> On Mon, Oct 17, 2016 at 5:26 AM, Richard Biener  wrote:
> >
> > $subject, applied as obvious.
> 
> I think you should do the same for the vectorizer too.  I noticed that
> when testing the patch for loop splitting.

Can't see where BB_VISITED is used by the vectorizer - can you point
me to that?

Thanks,
Richard.
 
> Thanks,
> Andrew
> 
> >
> > Richard.
> >
> > 2016-10-17  Richard Biener  
> >
> > * bb-reorder.c (reorder_basic_blocks_simple): Clear BB_VISITED
> > before using it.
> >
> > Index: gcc/bb-reorder.c
> > ===
> > --- gcc/bb-reorder.c(revision 241228)
> > +++ gcc/bb-reorder.c(working copy)
> > @@ -2355,7 +2355,10 @@ reorder_basic_blocks_simple (void)
> >   To start with, everything points to itself, nothing is assigned yet.  
> > */
> >
> >FOR_ALL_BB_FN (bb, cfun)
> > -bb->aux = bb;
> > +{
> > +  bb->aux = bb;
> > +  bb->flags &= ~BB_VISITED;
> > +}
> >
> >EXIT_BLOCK_PTR_FOR_FN (cfun)->aux = 0;
> >


[PATCH] Fix BB_VISITED clearing in IRA, remove substitue-and-fold dce flag

2016-10-18 Thread Richard Biener

This fixes the BB_VISITED bug in IRA I ran into earlier this year, 
removing the superfluous clearing in VRP and the SSA propagator as well
as removing the now always true do_dce flag from substitute-and-fold.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2016-10-18  Richard Biener  

* tree-ssa-propagate.h (substitute_and_fold): Adjust prototype.
* tree-ssa-propagate.c (ssa_prop_fini): Remove final BB_VISITED
clearing.
(substitute_and_fold_dom_walker): Adjust constructor.
(substitute_and_fold_dom_walker::before_dom_children): Remove
do_dce flag and handling (always true).
(substitute_and_fold): Likewise.
* tree-vrp.c (vrp_finalize): Adjust.
(execute_early_vrp): Remove final BB_VISITED clearing.
* tree-ssa-ccp.c (ccp_finalize): Adjust.
* tree-ssa-copy.c (fini_copy_prop): Likewise.
* ira.c (ira): Call clear_bb_flags.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 241294)
+++ gcc/tree-vrp.c  (working copy)
@@ -10622,8 +10622,7 @@ vrp_finalize (bool warn_array_bounds_p)
  vr_value[i]->max);
   }
 
-  substitute_and_fold (op_with_constant_singleton_value_range,
-  vrp_fold_stmt, true);
+  substitute_and_fold (op_with_constant_singleton_value_range, vrp_fold_stmt);
 
   if (warn_array_bounds && warn_array_bounds_p)
 check_all_array_refs ();
@@ -10954,8 +10953,6 @@ execute_early_vrp ()
   vrp_free_lattice ();
   scev_finalize ();
   loop_optimizer_finalize ();
-  FOR_EACH_BB_FN (bb, cfun)
-bb->flags &= ~BB_VISITED;
   return 0;
 }
 
Index: gcc/tree-ssa-ccp.c
===
--- gcc/tree-ssa-ccp.c  (revision 241294)
+++ gcc/tree-ssa-ccp.c  (working copy)
@@ -953,8 +953,7 @@ ccp_finalize (bool nonzero_p)
 }
 
   /* Perform substitutions based on the known constant values.  */
-  something_changed = substitute_and_fold (get_constant_value,
-  ccp_fold_stmt, true);
+  something_changed = substitute_and_fold (get_constant_value, ccp_fold_stmt);
 
   free (const_val);
   const_val = NULL;
Index: gcc/tree-ssa-propagate.c
===
--- gcc/tree-ssa-propagate.c(revision 241294)
+++ gcc/tree-ssa-propagate.c(working copy)
@@ -479,9 +479,6 @@ ssa_prop_fini (void)
   free (cfg_order_to_bb);
   BITMAP_FREE (ssa_edge_worklist);
   uid_to_stmt.release ();
-  basic_block bb;
-  FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb)
-bb->flags &= ~BB_VISITED;
 }
 
 
@@ -972,10 +969,9 @@ class substitute_and_fold_dom_walker : p
 public:
 substitute_and_fold_dom_walker (cdi_direction direction,
ssa_prop_get_value_fn get_value_fn_,
-   ssa_prop_fold_stmt_fn fold_fn_,
-   bool do_dce_)
+   ssa_prop_fold_stmt_fn fold_fn_)
: dom_walker (direction), get_value_fn (get_value_fn_),
-  fold_fn (fold_fn_), do_dce (do_dce_), something_changed (false)
+  fold_fn (fold_fn_), something_changed (false)
 {
   stmts_to_remove.create (0);
   stmts_to_fixup.create (0);
@@ -993,7 +989,6 @@ public:
 
 ssa_prop_get_value_fn get_value_fn;
 ssa_prop_fold_stmt_fn fold_fn;
-bool do_dce;
 bool something_changed;
 vec stmts_to_remove;
 vec stmts_to_fixup;
@@ -1012,8 +1007,7 @@ substitute_and_fold_dom_walker::before_d
   tree res = gimple_phi_result (phi);
   if (virtual_operand_p (res))
continue;
-  if (do_dce
- && res && TREE_CODE (res) == SSA_NAME)
+  if (res && TREE_CODE (res) == SSA_NAME)
{
  tree sprime = get_value_fn (res);
  if (sprime
@@ -1039,8 +1033,7 @@ substitute_and_fold_dom_walker::before_d
   /* No point propagating into a stmt we have a value for we
  can propagate into all uses.  Mark it for removal instead.  */
   tree lhs = gimple_get_lhs (stmt);
-  if (do_dce
- && lhs && TREE_CODE (lhs) == SSA_NAME)
+  if (lhs && TREE_CODE (lhs) == SSA_NAME)
{
  tree sprime = get_value_fn (lhs);
  if (sprime
@@ -1180,8 +1173,7 @@ substitute_and_fold_dom_walker::before_d
 
 bool
 substitute_and_fold (ssa_prop_get_value_fn get_value_fn,
-ssa_prop_fold_stmt_fn fold_fn,
-bool do_dce)
+ssa_prop_fold_stmt_fn fold_fn)
 {
   gcc_assert (get_value_fn);
 
@@ -1192,7 +1184,7 @@ substitute_and_fold (ssa_prop_get_value_
 
   calculate_dominance_info (CDI_DOMINATORS);
   substitute_and_fold_dom_walker walker(CDI_DOMINATORS,
-   get_value_fn, fold_fn, do_dce);
+   get_value_fn, fold_fn);
   walker.walk (ENT

Re: [PATCH] rs6000: Fix separate shrink-wrapping for TARGET_MULTIPLE

2016-10-18 Thread Segher Boessenkool
On Tue, Oct 18, 2016 at 12:17:32AM +0100, Iain Sandoe wrote:
> > Bootstrapped and tested on powerpc64-linux {-m64,-m32}.  I'll commit it
> > if Iain's testing (on darwin) also succeeds.
> 
> thanks!
> 
> All-langs bootstrap was restored with the patch (and others in progress for 
> existing known issues); 
> 
> I can’t see any evidence of the assert firing in the test-suite, so it all 
> looks good to me (there’s quite a bit of stage-1-ish testsuite noise at 
> present, however).

Thanks for testing!  I committed the slightly simpler patch below.


Segher


2016-10-18  Segher Boessenkool  

* config/rs6000/rs6000.c (rs6000_savres_strategy): Do not select
{SAVE,REST}_MULTIPLE if shrink-wrapping separate components.
(rs6000_get_separate_components): Assert we do not have those
strategies selected.

---
 gcc/config/rs6000/rs6000.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 613af48..1b67592 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -25511,7 +25511,10 @@ rs6000_savres_strategy (rs6000_stack_t *info,
   if (TARGET_MULTIPLE
   && !TARGET_POWERPC64
   && !(TARGET_SPE_ABI && info->spe_64bit_regs_used)
-  && info->first_gp_reg_save < 31)
+  && info->first_gp_reg_save < 31
+  && !(flag_shrink_wrap
+  && flag_shrink_wrap_separate
+  && optimize_function_for_speed_p (cfun)))
 {
   /* Prefer store multiple for saves over out-of-line routines,
 since the store-multiple instruction will always be smaller.  */
@@ -27445,6 +27448,9 @@ rs6000_get_separate_components (void)
   sbitmap components = sbitmap_alloc (32);
   bitmap_clear (components);
 
+  gcc_assert (!(info->savres_strategy & SAVE_MULTIPLE)
+ && !(info->savres_strategy & REST_MULTIPLE));
+
   /* The GPRs we need saved to the frame.  */
   if ((info->savres_strategy & SAVE_INLINE_GPRS)
   && (info->savres_strategy & REST_INLINE_GPRS))
-- 
1.9.3



Re: Go patch committed: copy print code from Go 1.7 runtime

2016-10-18 Thread Uros Bizjak
Hello!

> This patch copies the code that implements the print and println
> predeclared functions from the Go 1.7 runtime.  The compiler is
> changed to use the new names, and to call the printlock and
> printunlock functions around a sequence of print calls.  The writebuf
> field in the g struct changes to a slice.  Bootstrapped and ran Go
> testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

This patch probably introduced recent regression on 32bit x86 multilib:

Running target unix/-m32
FAIL: go.test/test/fixedbugs/bug114.go compilation,  -O2 -g
FAIL: go.test/test/printbig.go -O (test for excess errors)
FAIL: go.test/test/printbig.go execution

=== go Summary for unix/-m32 ===

# of expected passes6875
# of unexpected failures3
# of expected failures  1
# of untested testcases 12
# of unsupported tests  2

e.g.:

/home/uros/git/gcc/gcc/testsuite/go.test/test/fixedbugs/bug114.go:15:27:
error: integer constant overflow
/home/uros/git/gcc/gcc/testsuite/go.test/test/fixedbugs/bug114.go:15:45:
error: integer constant overflow
/home/uros/git/gcc/gcc/testsuite/go.test/test/fixedbugs/bug114.go:19:38:
error: integer constant overflow
/home/uros/git/gcc/gcc/testsuite/go.test/test/fixedbugs/bug114.go:19:56:
error: integer constant overflow

FAIL: go.test/test/fixedbugs/bug114.go compilation,  -O2 -g
UNTESTED: go.test/test/fixedbugs/bug114.go execution,  -O2 -g

FAIL: go.test/test/printbig.go -O (test for excess errors)
Excess errors:
/home/uros/git/gcc/gcc/testsuite/go.test/test/printbig.go:12:8: error:
integer constant overflow
/home/uros/git/gcc/gcc/testsuite/go.test/test/printbig.go:13:15:
error: integer constant overflow

./printbig.exe >printbig.p 2>&1
couldn't execute "./printbig.exe": no such file or directory
FAIL: go.test/test/printbig.go execution
UNTESTED: go.test/test/printbig.go compare

Uros.


Re: [Patch] Backport fix for PR 52085 to gcc-5-branch?

2016-10-18 Thread Richard Biener
On Mon, Oct 17, 2016 at 6:57 PM, Senthil Kumar Selvaraj
 wrote:
>
> Richard Biener writes:
>
>> On Mon, Oct 17, 2016 at 12:21 PM, Senthil Kumar Selvaraj
>>  wrote:
>>> Hi,
>>>
>>>   The fix for PR 52085 went into trunk when trunk was 6.0. I ran into the
>>>   same issue on a gcc 5.x and found that the fix didn't get backported.
>>>
>>>   Bootstrapped and reg tested below patch with x86-64-pc-linux. Ok to
>>>   backport to gcc-5-branch?
>>
>> Ok with me but please double-check there was no fallout.
>
> I boostrapped and ran against x86_64-pc-linux again, just to be sure.
> No regressions.

I meant fallout only fixed with followup patches.  ISTR some in that area
but I might confuse it with another patch.  Marek might remember.

Richard.

> I'll run the reg tests against arm-none-eabi. Can I commit it if that
> passes?
>
> Regards
> Senthil
>>
>> Richard.
>>
>>> Regards
>>> Senthil
>>>
>>> gcc/c/ChangeLog
>>>
>>> 2016-10-17  Senthil Kumar Selvaraj  
>>>
>>>   Backport from mainline
>>> 2015-04-25  Marek Polacek  
>>> PR c/52085
>>> * c-decl.c (finish_enum): Copy over TYPE_ALIGN.  Also check for 
>>> "mode"
>>> attribute.
>>>
>>> gcc/testsuite/ChangeLog
>>> 2016-10-17  Senthil Kumar Selvaraj  
>>>
>>> Backport from mainline
>>> 2015-04-25  Marek Polacek  
>>> PR c/52085
>>> * gcc.dg/enum-incomplete-2.c: New test.
>>> * gcc.dg/enum-mode-1.c: New test.
>>>
>>>
>>> diff --git gcc/c/c-decl.c gcc/c/c-decl.c
>>> index d1e7444..c508e7f 100644
>>> --- gcc/c/c-decl.c
>>> +++ gcc/c/c-decl.c
>>> @@ -8050,7 +8050,7 @@ finish_enum (tree enumtype, tree values, tree 
>>> attributes)
>>>
>>>/* If the precision of the type was specified with an attribute and it
>>>   was too small, give an error.  Otherwise, use it.  */
>>> -  if (TYPE_PRECISION (enumtype))
>>> +  if (TYPE_PRECISION (enumtype) && lookup_attribute ("mode", attributes))
>>>  {
>>>if (precision > TYPE_PRECISION (enumtype))
>>> {
>>> @@ -8078,6 +8078,7 @@ finish_enum (tree enumtype, tree values, tree 
>>> attributes)
>>>TYPE_MIN_VALUE (enumtype) = TYPE_MIN_VALUE (tem);
>>>TYPE_MAX_VALUE (enumtype) = TYPE_MAX_VALUE (tem);
>>>TYPE_UNSIGNED (enumtype) = TYPE_UNSIGNED (tem);
>>> +  TYPE_ALIGN (enumtype) = TYPE_ALIGN (tem);
>>>TYPE_SIZE (enumtype) = 0;
>>>TYPE_PRECISION (enumtype) = TYPE_PRECISION (tem);
>>>
>>> diff --git gcc/testsuite/gcc.dg/enum-incomplete-2.c 
>>> gcc/testsuite/gcc.dg/enum-incomplete-2.c
>>> new file mode 100644
>>> index 000..5970551
>>> --- /dev/null
>>> +++ gcc/testsuite/gcc.dg/enum-incomplete-2.c
>>> @@ -0,0 +1,41 @@
>>> +/* PR c/52085 */
>>> +/* { dg-do compile } */
>>> +/* { dg-options "" } */
>>> +
>>> +#define SA(X) _Static_assert((X),#X)
>>> +
>>> +enum e1;
>>> +enum e1 { A } __attribute__ ((__packed__));
>>> +enum e2 { B } __attribute__ ((__packed__));
>>> +SA (sizeof (enum e1) == sizeof (enum e2));
>>> +SA (_Alignof (enum e1) == _Alignof (enum e2));
>>> +
>>> +enum e3;
>>> +enum e3 { C = 256 } __attribute__ ((__packed__));
>>> +enum e4 { D = 256 } __attribute__ ((__packed__));
>>> +SA (sizeof (enum e3) == sizeof (enum e4));
>>> +SA (_Alignof (enum e3) == _Alignof (enum e4));
>>> +
>>> +enum e5;
>>> +enum e5 { E = __INT_MAX__ } __attribute__ ((__packed__));
>>> +enum e6 { F = __INT_MAX__ } __attribute__ ((__packed__));
>>> +SA (sizeof (enum e5) == sizeof (enum e6));
>>> +SA (_Alignof (enum e5) == _Alignof (enum e6));
>>> +
>>> +enum e7;
>>> +enum e7 { G } __attribute__ ((__mode__(__byte__)));
>>> +enum e8 { H } __attribute__ ((__mode__(__byte__)));
>>> +SA (sizeof (enum e7) == sizeof (enum e8));
>>> +SA (_Alignof (enum e7) == _Alignof (enum e8));
>>> +
>>> +enum e9;
>>> +enum e9 { I } __attribute__ ((__packed__, __mode__(__byte__)));
>>> +enum e10 { J } __attribute__ ((__packed__, __mode__(__byte__)));
>>> +SA (sizeof (enum e9) == sizeof (enum e10));
>>> +SA (_Alignof (enum e9) == _Alignof (enum e10));
>>> +
>>> +enum e11;
>>> +enum e11 { K } __attribute__ ((__mode__(__word__)));
>>> +enum e12 { L } __attribute__ ((__mode__(__word__)));
>>> +SA (sizeof (enum e11) == sizeof (enum e12));
>>> +SA (_Alignof (enum e11) == _Alignof (enum e12));
>>> diff --git gcc/testsuite/gcc.dg/enum-mode-1.c 
>>> gcc/testsuite/gcc.dg/enum-mode-1.c
>>> new file mode 100644
>>> index 000..a701123
>>> --- /dev/null
>>> +++ gcc/testsuite/gcc.dg/enum-mode-1.c
>>> @@ -0,0 +1,10 @@
>>> +/* { dg-do compile } */
>>> +
>>> +enum e1 { A = 256 } __attribute__((__mode__(__byte__))); /* { dg-error 
>>> "specified mode too small for enumeral values" } */
>>> +enum e2 { B = 256 } __attribute__((__packed__, __mode__(__byte__))); /* { 
>>> dg-error "specified mode too small for enumeral values" } */
>>> +
>>> +enum e3 { C = __INT_MAX__ } __attribute__((__mode__(__QI__))); /* { 
>>> dg-error "specified mode too small for enumeral values" } */
>>> +enum e4 { D = __INT_MAX__ } __attribute__((__packed__, __mode__(__QI__))); 
>>> /* { dg-error "specified 

Re: [Patch] Backport fix for PR 52085 to gcc-5-branch?

2016-10-18 Thread Marek Polacek
On Tue, Oct 18, 2016 at 10:12:24AM +0200, Richard Biener wrote:
> On Mon, Oct 17, 2016 at 6:57 PM, Senthil Kumar Selvaraj
>  wrote:
> >
> > Richard Biener writes:
> >
> >> On Mon, Oct 17, 2016 at 12:21 PM, Senthil Kumar Selvaraj
> >>  wrote:
> >>> Hi,
> >>>
> >>>   The fix for PR 52085 went into trunk when trunk was 6.0. I ran into the
> >>>   same issue on a gcc 5.x and found that the fix didn't get backported.
> >>>
> >>>   Bootstrapped and reg tested below patch with x86-64-pc-linux. Ok to
> >>>   backport to gcc-5-branch?
> >>
> >> Ok with me but please double-check there was no fallout.
> >
> > I boostrapped and ran against x86_64-pc-linux again, just to be sure.
> > No regressions.
> 
> I meant fallout only fixed with followup patches.  ISTR some in that area
> but I might confuse it with another patch.  Marek might remember.

I don't remember any fallout here (and a quick look at the ML around that
time doesn't reveal any).

Marek


Re: [Patch] Backport fix for PR 52085 to gcc-5-branch?

2016-10-18 Thread Jakub Jelinek
On Tue, Oct 18, 2016 at 10:12:24AM +0200, Richard Biener wrote:
> On Mon, Oct 17, 2016 at 6:57 PM, Senthil Kumar Selvaraj
>  wrote:
> >
> > Richard Biener writes:
> >
> >> On Mon, Oct 17, 2016 at 12:21 PM, Senthil Kumar Selvaraj
> >>  wrote:
> >>> Hi,
> >>>
> >>>   The fix for PR 52085 went into trunk when trunk was 6.0. I ran into the
> >>>   same issue on a gcc 5.x and found that the fix didn't get backported.
> >>>
> >>>   Bootstrapped and reg tested below patch with x86-64-pc-linux. Ok to
> >>>   backport to gcc-5-branch?
> >>
> >> Ok with me but please double-check there was no fallout.
> >
> > I boostrapped and ran against x86_64-pc-linux again, just to be sure.
> > No regressions.
> 
> I meant fallout only fixed with followup patches.  ISTR some in that area
> but I might confuse it with another patch.  Marek might remember.

I'm not convinced it is desirable to backport such changes, it affects ABI,
people are used to deal with minor ABI changes in between major GCC
releases, but we'd need a strong reason to change it between minor releases.

> >>> 2016-10-17  Senthil Kumar Selvaraj  
> >>>
> >>>   Backport from mainline
> >>> 2015-04-25  Marek Polacek  
> >>> PR c/52085
> >>> * c-decl.c (finish_enum): Copy over TYPE_ALIGN.  Also check for 
> >>> "mode"
> >>> attribute.
> >>>
> >>> gcc/testsuite/ChangeLog
> >>> 2016-10-17  Senthil Kumar Selvaraj  
> >>>
> >>> Backport from mainline
> >>> 2015-04-25  Marek Polacek  
> >>> PR c/52085
> >>> * gcc.dg/enum-incomplete-2.c: New test.
> >>> * gcc.dg/enum-mode-1.c: New test.

Jakub


Re: [PATCH] PR77895: DWARF: Emit DW_AT_comp_dir in all cases, even if source is an absolute path

2016-10-18 Thread Richard Biener
On Mon, Oct 17, 2016 at 11:44 PM, Mike Stump  wrote:
> On Oct 17, 2016, at 2:38 PM, Ximin Luo  wrote:
>>
>> Mike Stump:
>>> On Oct 17, 2016, at 11:00 AM, Ximin Luo  wrote:
 Therefore, it is better to emit it in all circumstances, in case the 
 reader needs to know what the working
 directory was at compile-time.
>>>
>>> I can't help but wonder if this would break ccache some?
>>>
>>
>> Could you explain this in some more detail? At the moment, GCC will already 
>> emit DW_AT_name with an absolute path (in the scenario that this patch is 
>> relevant to). How does ccache work around this at the moment? (Does it use 
>> debug-prefix-map? In which case, this also affects DW_AT_comp_dir, so my 
>> patch should be fine.)
>
> If you compile the same file, but in a different directory, I wonder if cwd 
> will cause the cache entry to not be reused.
>
> I expect one of the ccache people that are around would just know if it will 
> care at all.

I believe ccache compares preprocessed source, definitely _not_ DWARF
output, so this shouldn't break anything there.
It might result in different object file output but as the reporter
figured due to a bug in dwarf2out.c we end up generating
DW_AT_comp_dir in almost all cases already.

I think the patch is ok but it misses a ChangeLog entry.  How did you
test the patch? (bootstrapped and tested on ...?)

Thanks,
Richard.


Re: RFC [1/3] divmod transform v2

2016-10-18 Thread Richard Biener
On Tue, 18 Oct 2016, Prathamesh Kulkarni wrote:

> On 18 October 2016 at 02:46, Jeff Law  wrote:
> > On 10/15/2016 11:59 PM, Prathamesh Kulkarni wrote:
> >>
> >> This patch is mostly the same as previous one, except it drops
> >> targeting __udivmoddi4() because it gave undefined reference link
> >> error for calling __udivmoddi4() on aarch64-linux-gnu. It appears
> >> aarch64 has hardware insn for DImode div, so __udivmoddi4() isn't
> >> needed for the target (it was a bug in my patch that called
> >> __udivmoddi4() even though aarch64 supported hardware div).
> >
> > This touches on the one high level question I had.  Namely what is the code
> > generation strategy if the hardware has a div, but not divmod.
> The divmod transform isn't enabled if target supports hardware div in the same
> or wider mode even if divmod libfunc is available for the given mode.
> >
> > ISTM in that case I think we want to use the div instruction and synthesize
> > mod from that result rather than relying on a software divmod.  So it looks
> > like you ought to be doing the right thing for that case now based on your
> > comment above.
> >>
> >>
> >> However this makes me wonder if it's guaranteed that __udivmoddi4()
> >> will be available for a target if it doesn't have hardware div and
> >> divmod insn and doesn't have target-specific libfunc for DImode
> >> divmod ? To be conservative, the attached patch doesn't generate call
> >> to __udivmoddi4.
> >
> > I don't think that's a safe assumption.  Addition of the divmod routines
> > into libgcc is controlled by the target and have to be explicitly added
> > AFAICT.
> >
> > So on a target like the fr30 which has no div or mod insn and doesn't define
> > the right bits in libgcc, there is no divmod libcall available. (On these
> > targets there's a div libcall and a mod libcall, but not a combined one).
> Thanks. I had erroneously  assumed __udivimoddi4() was available for all 
> targets
> because it was defined in libgcc2.c and generated call to it as "last
> resort" for unsigned
> DImode case if target didn't support hardware div and divmod insn and
> didn't have
> target-specific divmod libfunc for DImode.
> The reason why it generated undefined reference on aarch64-linux-gnu
> was because I
> didn't properly check in the patch if target supported hardware div,
> and ended up generating call to
> __udivmoddi4() even though aarch64 has hardware div. I rectified that
> before posting the
> patch.
> >
> > I don't even think we have a way of knowing in the compiler if the target
> > has enabled divmod support in libgcc.

Yeah, that's what bothers me with the current optab libfunc query
setup -- it isn't reliable.

> Well the arm and c6x backends register target-specific divmod libfunc via
> set_optab_libfunc(). I suppose that's sufficient to know if target has
> divmod enabled
> in libgcc ?
> >
> > ISTM that for now we have to limit to cases where we have a divmod
> > insn/libcall defined.
> Indeed. The transform is enabled only if the target has hardware divmod insn
> or divmod libfunc (in the libfunc case, no hardware div insn).
> Please see divmod_candidate_p() in the patch for cases when the
> transform gets enabled.

But after your patch the divmod libfunc is never available?

Richard.

> Thanks,
> Prathamesh
> >
> > jeff
> >
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: RFC [1/3] divmod transform v2

2016-10-18 Thread Prathamesh Kulkarni
On 18 October 2016 at 13:55, Richard Biener  wrote:
> On Tue, 18 Oct 2016, Prathamesh Kulkarni wrote:
>
>> On 18 October 2016 at 02:46, Jeff Law  wrote:
>> > On 10/15/2016 11:59 PM, Prathamesh Kulkarni wrote:
>> >>
>> >> This patch is mostly the same as previous one, except it drops
>> >> targeting __udivmoddi4() because it gave undefined reference link
>> >> error for calling __udivmoddi4() on aarch64-linux-gnu. It appears
>> >> aarch64 has hardware insn for DImode div, so __udivmoddi4() isn't
>> >> needed for the target (it was a bug in my patch that called
>> >> __udivmoddi4() even though aarch64 supported hardware div).
>> >
>> > This touches on the one high level question I had.  Namely what is the code
>> > generation strategy if the hardware has a div, but not divmod.
>> The divmod transform isn't enabled if target supports hardware div in the 
>> same
>> or wider mode even if divmod libfunc is available for the given mode.
>> >
>> > ISTM in that case I think we want to use the div instruction and synthesize
>> > mod from that result rather than relying on a software divmod.  So it looks
>> > like you ought to be doing the right thing for that case now based on your
>> > comment above.
>> >>
>> >>
>> >> However this makes me wonder if it's guaranteed that __udivmoddi4()
>> >> will be available for a target if it doesn't have hardware div and
>> >> divmod insn and doesn't have target-specific libfunc for DImode
>> >> divmod ? To be conservative, the attached patch doesn't generate call
>> >> to __udivmoddi4.
>> >
>> > I don't think that's a safe assumption.  Addition of the divmod routines
>> > into libgcc is controlled by the target and have to be explicitly added
>> > AFAICT.
>> >
>> > So on a target like the fr30 which has no div or mod insn and doesn't 
>> > define
>> > the right bits in libgcc, there is no divmod libcall available. (On these
>> > targets there's a div libcall and a mod libcall, but not a combined one).
>> Thanks. I had erroneously  assumed __udivimoddi4() was available for all 
>> targets
>> because it was defined in libgcc2.c and generated call to it as "last
>> resort" for unsigned
>> DImode case if target didn't support hardware div and divmod insn and
>> didn't have
>> target-specific divmod libfunc for DImode.
>> The reason why it generated undefined reference on aarch64-linux-gnu
>> was because I
>> didn't properly check in the patch if target supported hardware div,
>> and ended up generating call to
>> __udivmoddi4() even though aarch64 has hardware div. I rectified that
>> before posting the
>> patch.
>> >
>> > I don't even think we have a way of knowing in the compiler if the target
>> > has enabled divmod support in libgcc.
>
> Yeah, that's what bothers me with the current optab libfunc query
> setup -- it isn't reliable.
>
>> Well the arm and c6x backends register target-specific divmod libfunc via
>> set_optab_libfunc(). I suppose that's sufficient to know if target has
>> divmod enabled
>> in libgcc ?
>> >
>> > ISTM that for now we have to limit to cases where we have a divmod
>> > insn/libcall defined.
>> Indeed. The transform is enabled only if the target has hardware divmod insn
>> or divmod libfunc (in the libfunc case, no hardware div insn).
>> Please see divmod_candidate_p() in the patch for cases when the
>> transform gets enabled.
>
> But after your patch the divmod libfunc is never available?
After the patch, the divmod libfunc is not available if the target
doesn't explicitly set it.
So by default optab_libfunc ([us]divmod_optab, mode) would return NULL
instead of bogus libfunc
constructed with gen_int_libfunc, but if the target has registered it
with set_optab_libfunc, then
the target-specific libfunc name would be returned.

Thanks,
Prathamesh
>
> Richard.
>
>> Thanks,
>> Prathamesh
>> >
>> > jeff
>> >
>>
>>
>
> --
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)


Re: [PATCH] Implement P0084R2, Emplace return type, for C++17

2016-10-18 Thread Christophe Lyon
Hi Jonathan,


On 17 October 2016 at 13:56, Jonathan Wakely  wrote:
> In C++17 the emplace_front and emplace_back members return a
> reference. There isn't a very neat way to implement this, so it's just
> lots of conditional compilation.
>
> This isn't an ABI break, because these member functions are templates
> and so the return type is part of the mangled name. We can't apply it
> retroactively to older standards though, because it breaks code that
> is valid in C++14, such as:
>
>  void f(std::vector& v) { return v.emplace_back(1); }
>
>
> * doc/xml/manual/status_cxx2017.xml: Update status.
> * doc/html/*: Regenerate.
> * include/bits/deque.tcc (deque::emplace_front,
> deque::emplace_back):
> Return a reference in C++17 mode.
> * include/bits/forward_list.h (forward_list::emplace_front):
> Likewise.
> * include/bits/stl_bvector.h (vector::emplace_back): Likewise.
> * include/bits/stl_deque.h (deque::emplace_front,
> deque::emplace_back):
> Likewise.
> * include/bits/stl_list.h (list::emplace_front, list::emplace_back):
> Likewise.
> * include/bits/stl_queue.h (queue::emplace): Likewise.
> * include/bits/stl_stack.h (stack::emplace): Likewise.
> * include/bits/stl_vector.h (vector::emplace_back): Likewise.
> * include/bits/vector.tcc (vector::emplace_back): Likewise.
> * include/debug/deque (__gnu_debug::deque::emplace_front)
> (__gnu_debug::deque::emplace_back): Likewise.
> * include/debug/vector (__gnu_debug::vector::emplace_back):
> Likewise.
> * testsuite/23_containers/deque/modifiers/emplace/cxx17_return.cc:
> New.
> * testsuite/23_containers/forward_list/modifiers/
> emplace_cxx17_return.cc: New.
> * testsuite/23_containers/list/modifiers/emplace/cxx17_return.cc:
> New.
> * testsuite/23_containers/queue/members/emplace_cxx17_return.cc:
> New.
> * testsuite/23_containers/stack/members/emplace_cxx17_return.cc:
> New.
> * testsuite/23_containers/vector/bool/emplace_cxx17_return.cc: New.
> * testsuite/23_containers/vector/modifiers/emplace/cxx17_return.cc:
> New.
>
> Tested powerpc64le-linux, committed to trunk.
>

The new tests (except for
23_containers/forward_list/modifiers/emplace_cxx17_return.cc) fail on
arm-none-eabi using default cpu/fpu/arch/mode:
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/src/.libs/libstdc++.a(future.o):
In function `__future_category_instance':
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++11/future.cc:64:
undefined reference to `__sync_synchronize'
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/src/.libs/libstdc++.a(locale.o):
In function `get_locale_cache_mutex':
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++98/locale.cc:36:
undefined reference to `__sync_synchronize'
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/src/.libs/libstdc++.a(locale_init.o):
In function `get_locale_mutex':
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++98/locale_init.cc:66:
undefined reference to `__sync_synchronize'

Christophe


Re: [patch] Fix PHI optimization issue with boolean types

2016-10-18 Thread Richard Biener
On Tue, Oct 18, 2016 at 8:35 AM, Eric Botcazou  wrote:
> Hi,
>
> this is a regression present on the mainline and 6 branch: the compiler now
> generates wrong code for the attached testcase at -O because of an internal
> conflict about boolean types.  The sequence is as follows.  In .mergephi3:
>
>   boolean _22;
>   p__enum res;
>
>   :
>   if (_22 != 0)
> goto ;
>   else
> goto ;
>
>   :
>
>   :
>   # res_17 = PHI <2(8), 0(9), 1(10)>
>
> is turned into:
>
> COND_EXPR in block 9 and PHI in block 11 converted to straightline code.
> PHI res_17 changed to factor conversion out from COND_EXPR.
> New stmt with CAST that defines res_17.
>
>   boolean _33;
>
>   :
>   # _33 = PHI <2(8), _22(9)>
>   res_17 = (p__enum) _33;
>
> [...]
>
>   :
>   if (res_17 != 0)
> goto ;
>   else
> goto ;
>
>   :
>   _12 = res_17 == 2;
>   _13 = (integer) _12
>
> in .phiopt1.  So boolean _33 can have value 2.  Later forwprop3 propagates _33
> into the uses of res_17:
>
>   :
>   if (_33 != 0)
> goto ;
>   else
> goto ;
>
>   :
>   _12 = _33 == 2;
>   _13 = (integer) _12;
>
> and DOM3 deduces:
>
>   :
>   _12 = 0;
>   _13 = 0;
>
> because it computes that _33 has value 1 in BB 13 since it's a boolean.
>
> The problem was introduced by the new factor_out_conditional_conversion:
>
>   /* If arg1 is an INTEGER_CST, fold it to new type.  */
>   if (INTEGRAL_TYPE_P (TREE_TYPE (new_arg0))
>   && int_fits_type_p (arg1, TREE_TYPE (new_arg0)))
> {
>   if (gimple_assign_cast_p (arg0_def_stmt))
> new_arg1 = fold_convert (TREE_TYPE (new_arg0), arg1);
>   else
> return NULL;
> }
>   else
> return NULL
>
> int_fits_type_p is documented as taking only INTEGER_TYPE, but is invoked
> on constant 2 and a BOOLEAN_TYPE and returns true.
>
> BOOLEAN_TYPE is special in Ada: it has precision 8 and range [0;255] so the
> outcome of int_fits_type_p is not unreasonable.  But this goes against the
> various transformations applied to boolean types in the compiler, which all
> assume that they can only take values 0 or 1.
>
> Hence the attached fix (which should be a no-op except for Ada), tested on
> x86_64-suse-linux, OK for mainline and 6 branch?

Hmm, I wonder if the patch is a good approach as you are likely only papering
over a single of possibly very many affected places (wi::fits_to_tree_p comes
immediately to my mind).  I suppose a better way would be for Ada to not
lie about those types and not use BOOLEAN_TYPE but INTEGER_TYPE.
Because BOOLEAN_TYPE types only have two values as documented in
tree.def:

/* Boolean type (true or false are the only values).  Looks like an
   INTEGRAL_TYPE.  */
DEFTREECODE (BOOLEAN_TYPE, "boolean_type", tcc_type, 0)

There are not many references to BOOLEAN_TYPE in ada/gcc-interface
thus it shouldn't be hard to change ... (looks like Ada might even prefer
ENUMERAL_TYPE here).

Thanks,
Richard.

>
> 2016-10-18  Eric Botcazou  
>
> * tree.c (int_fits_type_p): Accept only 0 and 1 for boolean types.
>
>
> 2016-10-18  Eric Botcazou  
>
> * gnat.dg/opt59.adb: New test.
> * gnat.dg/opt59_pkg.ad[sb]: New helper.
>
> --
> Eric Botcazou


Re: [PATCH] Reduce stack usage in sha512 (PR target/77308)

2016-10-18 Thread Christophe Lyon
Hi,


On 17 October 2016 at 18:47, Kyrill Tkachov  wrote:
>
> On 30/09/16 14:34, Bernd Edlinger wrote:
>>
>> On 09/30/16 12:14, Bernd Edlinger wrote:
>>>
>>> Eric Botcazou wrote:
>
> A comment before the SETs and a testcase would be nice.  IIRC
> we do have stack size testcases via using -fstack-usage.

 Or -Wstack-usage, which might be more appropriate here.
>>>
>>> Yes.  good idea.  I was not aware that we already have that kind of
>>> tests.
>>>
>>> When trying to write this test. I noticed, that I did not try -Os so far.
>>> But for -Os the stack is still the unchanged 3500 bytes.
>>>
>>> However for embedded targets I am often inclined to use -Os, and
>>> would certainly not expect the stack to explode...
>>>
>>> I see in arm.md there are places like
>>>
>>> /* If we're optimizing for size, we prefer the libgcc calls.  */
>>> if (optimize_function_for_size_p (cfun))
>>>   FAIL;
>>>
>> Oh, yeah.  The comment is completely misleading.
>>
>> If this pattern fails, expmed.c simply expands some
>> less efficient rtl, which also results in two shifts
>> and one or-op.  No libgcc calls at all.
>>
>> So in simple cases without spilling the resulting
>> assembler is the same, regardless if this pattern
>> fails or not.  But the half-defined out registers
>> make a big difference when it has to be spilled.
>>
>>> /* Expand operation using core-registers.
>>>'FAIL' would achieve the same thing, but this is a bit
>>> smarter.  */
>>> scratch1 = gen_reg_rtx (SImode);
>>> scratch2 = gen_reg_rtx (SImode);
>>> arm_emit_coreregs_64bit_shift (LSHIFTRT, operands[0],
>>> operands[1],
>>>operands[2], scratch1, scratch2);
>>>
>>>
>>> .. that explains why this happens.  I think it would be better to
>>> use the emit_coreregs for shift count >= 32, because these are
>>> effectively 32-bit shifts.
>>>
>>> Will try if that can be improved, and come back with the
>>> results.
>>>
>> The test case with -Os has 3520 bytes stack usage.
>> When only shift count >= 32 are handled we
>> have still 3000 bytes stack usage.
>> And when arm_emit_coreregs_64bit_shift is always
>> allowed to run, we have 2360 bytes stack usage.
>>
>> Also for the code size it is better not to fail this
>> pattern.  So I propose to remove this exception in all
>> three expansions.
>>
>> Here is an improved patch with the test case from the PR.
>> And a comment on the redundant SET why it is better to clear
>> the out register first.
>>
>>
>> Bootstrap and reg-testing on arm-linux-gnueabihf.
>> Is it OK for trunk?
>
>
> This looks ok to me.
> Thanks,
> Kyrill
>

I am seeing a lot of regressions since this patch was committed:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241273/report-build-info.html

(you can click on "REGRESSED" to see the list of regressions, "sum"
and "log" to download
the corresponding .sum/.log)

Thanks,

Christophe

>>
>> Thanks
>> Bernd.
>
>


Re: [PATCH, libfortran] PR 48587 Newunit allocator

2016-10-18 Thread Steven Bosscher
On Thu, Oct 13, 2016 at 5:16 PM, Janne Blomqvist wrote:
> +static bool *newunits;

You could make this a bitmap (like sbitmap). A bit more code but makes
a potentially quadratic search (when opening many units) less time
consuming.

Ciao!
Steven


Re: [Patch] Backport fix for PR 52085 to gcc-5-branch?

2016-10-18 Thread Senthil Kumar Selvaraj

Jakub Jelinek writes:

> On Tue, Oct 18, 2016 at 10:12:24AM +0200, Richard Biener wrote:
>> On Mon, Oct 17, 2016 at 6:57 PM, Senthil Kumar Selvaraj
>>  wrote:
>> >
>> > Richard Biener writes:
>> >
>> >> On Mon, Oct 17, 2016 at 12:21 PM, Senthil Kumar Selvaraj
>> >>  wrote:
>> >>> Hi,
>> >>>
>> >>>   The fix for PR 52085 went into trunk when trunk was 6.0. I ran into the
>> >>>   same issue on a gcc 5.x and found that the fix didn't get backported.
>> >>>
>> >>>   Bootstrapped and reg tested below patch with x86-64-pc-linux. Ok to
>> >>>   backport to gcc-5-branch?
>> >>
>> >> Ok with me but please double-check there was no fallout.
>> >
>> > I boostrapped and ran against x86_64-pc-linux again, just to be sure.
>> > No regressions.
>> 
>> I meant fallout only fixed with followup patches.  ISTR some in that area
>> but I might confuse it with another patch.  Marek might remember.
>
> I'm not convinced it is desirable to backport such changes, it affects ABI,
> people are used to deal with minor ABI changes in between major GCC
> releases, but we'd need a strong reason to change it between minor releases.

Hmm, I tracked this down from a (internal) bug reported on arm-none-eabi, where 
the
inconsistent enum size (used in sizeof to malloc) was eventually causing
heap corruption. 

When debugging the issue, I noticed you'd already backported
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69669, so I thought this
should be good.

Regards
Senthil

>
>> >>> 2016-10-17  Senthil Kumar Selvaraj  
>> >>>
>> >>>   Backport from mainline
>> >>> 2015-04-25  Marek Polacek  
>> >>> PR c/52085
>> >>> * c-decl.c (finish_enum): Copy over TYPE_ALIGN.  Also check for 
>> >>> "mode"
>> >>> attribute.
>> >>>
>> >>> gcc/testsuite/ChangeLog
>> >>> 2016-10-17  Senthil Kumar Selvaraj  
>> >>>
>> >>> Backport from mainline
>> >>> 2015-04-25  Marek Polacek  
>> >>> PR c/52085
>> >>> * gcc.dg/enum-incomplete-2.c: New test.
>> >>> * gcc.dg/enum-mode-1.c: New test.
>
>   Jakub



Re: [Patch, reload, tentative, PR 71627] Tweak conditions in find_valid_class_1

2016-10-18 Thread Senthil Kumar Selvaraj
Ping!

Regards
Senthil

Senthil Kumar Selvaraj writes:

> Bernd Schmidt writes:
>
>> On 09/16/2016 09:02 PM, Senthil Kumar Selvaraj wrote:
>>>   Does this make sense? I ran a reg test for the avr target with a
>>>   slightly older version of this patch, it did not show any regressions.
>>>   If this is the right fix, I'll make sure to run reg tests on x86_64
>>>   after backporting to a gcc version where that target used reload.
>>
>> It's hard to say, and could have different effects on different targets.
>> One thing though, at the very least the reg_class_size test would have 
>> to be adapted - the idea is to find the largest class, and there's a 
>> risk here of ending up with a large class that only has one valid register.
>
> Agreed - I've updated the patch to compute rclass sizes based on regno
> availability i.e., only if in_hard_reg_set_p and HARD_REGNO_MODE_OK, and
> then use the computed sizes when calculating best_size.
>
>>
>> You'll also want to verify this against
>>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54814
>
> Yes, this patch doesn't break the fix for PR54814. The change to
> in_hard_reg_set_p was what fixed that, and that remains unmodified.
>
> Reg tested this on top of trunk@190252 with the in_hard_reg_set_p
> backport. x86_64-pc-linux bootstrapped and regtested ok. avr showed
> no regressions either.
>
> Ok for trunk?
>
> Regards
> Senthil
>
> gcc/ChangeLog:
>
> 2016-10-13  Senthil Kumar Selvaraj  
>
>   * reload.c (find_valid_class_1): Allow regclass if atleast one
>   regno in class is ok. Compute and use rclass size based on
>   actually available regnos for mode in rclass.
>
> gcc/testsuite/ChangeLog:
>
> 2016-10-13  Senthil Kumar Selvaraj  
>   
>   * gcc.target/avr/pr71627.c: New.
>
>
> Index: gcc/reload.c
> ===
> --- gcc/reload.c  (revision 240989)
> +++ gcc/reload.c  (working copy)
> @@ -711,31 +711,36 @@
>enum reg_class best_class = NO_REGS;
>unsigned int best_size = 0;
>int cost;
> +  unsigned int computed_rclass_sizes[N_REG_CLASSES] = { 0 };
>  
>for (rclass = 1; rclass < N_REG_CLASSES; rclass++)
>  {
> -  int bad = 0;
> -  for (regno = 0; regno < FIRST_PSEUDO_REGISTER && !bad; regno++)
> - {
> -   if (in_hard_reg_set_p (reg_class_contents[rclass], mode, regno)
> -   && !HARD_REGNO_MODE_OK (regno, mode))
> - bad = 1;
> - }
> -  
> -  if (bad)
> - continue;
> +  int atleast_one_regno_ok = 0;
>  
> +  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +{
> +  if (in_hard_reg_set_p (reg_class_contents[rclass], mode, regno))
> +{
> +  atleast_one_regno_ok = 1;
> +  if (HARD_REGNO_MODE_OK (regno, mode))
> +computed_rclass_sizes[rclass]++;
> +}
> +}
> +
> +  if (!atleast_one_regno_ok)
> +continue;
> +
>cost = register_move_cost (outer, (enum reg_class) rclass, dest_class);
>  
> -  if ((reg_class_size[rclass] > best_size
> -&& (best_cost < 0 || best_cost >= cost))
> -   || best_cost > cost)
> - {
> -   best_class = (enum reg_class) rclass;
> -   best_size = reg_class_size[rclass];
> -   best_cost = register_move_cost (outer, (enum reg_class) rclass,
> -   dest_class);
> - }
> +  if ((computed_rclass_sizes[rclass] > best_size
> + && (best_cost < 0 || best_cost >= cost))
> +|| best_cost > cost)
> +  {
> +best_class = (enum reg_class) rclass;
> +best_size = computed_rclass_sizes[rclass];
> +best_cost = register_move_cost (outer, (enum reg_class) rclass,
> +dest_class);
> +  }
>  }
>  
>gcc_assert (best_size != 0);
>
> Index: gcc/testsuite/gcc.target/avr/pr71627.c
> ===
> --- gcc/testsuite/gcc.target/avr/pr71627.c  (nonexistent)
> +++ gcc/testsuite/gcc.target/avr/pr71627.c  (working copy)
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1" } */
> +
> +
> +extern volatile __memx const long  a, b, c, d, e, f;
> +extern volatile long result;
> +
> +extern void vfunc (const char*, ...);
> +
> +void foo (void)
> +{
> +   result = a + b + c + d + e + f;
> +   vfunc ("text", a, b, c, d, e, f, result);
> +}



[PATCH, libgo]: Fix FAIL: time testsuite failure

2016-10-18 Thread Uros Bizjak
The name of Etc/GMT+1 timezone is "-01", as evident from:

$ TZ=Etc/GMT+1 date +%Z
-01

Attached patch fixes the testsuite failure.

Uros.
diff --git a/libgo/go/time/time_test.go b/libgo/go/time/time_test.go
index b7ebb37..694e311 100644
--- a/libgo/go/time/time_test.go
+++ b/libgo/go/time/time_test.go
@@ -939,8 +939,8 @@ func TestLoadFixed(t *testing.T) {
// but Go and most other systems use "east is positive".
// So GMT+1 corresponds to -3600 in the Go zone, not +3600.
name, offset := Now().In(loc).Zone()
-   if name != "GMT+1" || offset != -1*60*60 {
-   t.Errorf("Now().In(loc).Zone() = %q, %d, want %q, %d", name, 
offset, "GMT+1", -1*60*60)
+   if name != "-01" || offset != -1*60*60 {
+   t.Errorf("Now().In(loc).Zone() = %q, %d, want %q, %d", name, 
offset, "-01", -1*60*60)
}
 }
 


Re: [PATCH] Fix PR77916

2016-10-18 Thread Christophe Lyon
On 18 October 2016 at 05:18, Markus Trippelsdorf  wrote:
> On 2016.10.18 at 05:13 +0200, Markus Trippelsdorf wrote:
>> On 2016.10.17 at 17:23 -0500, Bill Schmidt wrote:
>> > Hi,
>> >
>> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77916 identifies a situation
>> > where SLSR will ICE when exposed to a cast from integer to pointer.  This
>> > is because we try to convert a PLUS_EXPR with an addend of -1 * S into a
>> > MINUS_EXPR with a subtrahend of S, but the base operand is unexpectedly
>> > of pointer type.  This patch recognizes when pointer arithmetic is taking
>> > place and ensures that we use a POINTER_PLUS_EXPR at all such times.  In
>> > the case of the PR, this occurs in the logic where the stride S is a known
>> > constant value, but the same problem could occur when it is an SSA_NAME
>> > without known value.  Both possibilities are handled here.
>> >
>> > Fixing the code to ensure that the unknown stride case always uses an
>> > initializer for a negative increment allows us to remove the stopgap fix
>> > added for PR77937 as well.
>> >
>> > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
>> > regressions, committed.
>>
>> Perhaps you should consider building ffmpeg with -O3 -march=amdfam10 on
>> X86_64 before committing these patches, because you broke it for the
>> third time in the last couple of days.
>>
>> markus@x4 ffmpeg % cat h264dsp.i
>> extern int fn2(int);
>> extern int fn3(int);
>> int a, b, c;
>> void fn1(long p1) {
>>   char *d;
>>   for (;; d += p1) {
>> d[0] = fn2(1 >> c >> 1);
>> fn2(c >> a);
>> d[1] = fn3(d[1]) >> 1;
>> d[6] = fn3(d[6] * b + 1) >> 1;
>> d[7] = fn3(d[7] * b + 1) >> 1;
>> d[8] = fn3(d[8] * b + 1) >> 1;
>>   }
>> }
>>
>> markus@x4 ffmpeg % gcc -O3 -march=amdfam10 -c h264dsp.i
>> h264dsp.i: In function ‘fn1’:
>> h264dsp.i:4:6: internal compiler error: in replace_one_candidate, at 
>> gimple-ssa-strength-reduction.c:3375
>>  void fn1(long p1) {
>>   ^~~
>> 0x12773a9 replace_one_candidate
>> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3375
>> 0x127af77 replace_profitable_candidates
>> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3486
>> 0x127aeeb replace_profitable_candidates
>> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3495
>> 0x127f3ee analyze_candidates_and_replace
>> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3574
>> 0x127f3ee execute
>> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3648
>
> Just figured out that this testcase is identical to:
> gcc/testsuite/gcc.dg/torture/pr77937-2.c
>
> So please run the testsuite on X86_64 in the future.
>


I'm not sure whether Markus means that pr77937-2 fails since this commit?

I'm seeing ICEs on pr77937-2 on some arm targets:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241285/report-build-info.html

but I'm not 100% sure it was caused by this patch (the regression
happened between
r241273 and r241285), I haven't looked in details yet.

Christophe

> --
> Markus


Re: [PATCH] Fix PR77916

2016-10-18 Thread Markus Trippelsdorf
On 2016.10.18 at 11:19 +0200, Christophe Lyon wrote:
> On 18 October 2016 at 05:18, Markus Trippelsdorf  
> wrote:
> > On 2016.10.18 at 05:13 +0200, Markus Trippelsdorf wrote:
> >> On 2016.10.17 at 17:23 -0500, Bill Schmidt wrote:
> >> > Hi,
> >> >
> >> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77916 identifies a situation
> >> > where SLSR will ICE when exposed to a cast from integer to pointer.  This
> >> > is because we try to convert a PLUS_EXPR with an addend of -1 * S into a
> >> > MINUS_EXPR with a subtrahend of S, but the base operand is unexpectedly
> >> > of pointer type.  This patch recognizes when pointer arithmetic is taking
> >> > place and ensures that we use a POINTER_PLUS_EXPR at all such times.  In
> >> > the case of the PR, this occurs in the logic where the stride S is a 
> >> > known
> >> > constant value, but the same problem could occur when it is an SSA_NAME
> >> > without known value.  Both possibilities are handled here.
> >> >
> >> > Fixing the code to ensure that the unknown stride case always uses an
> >> > initializer for a negative increment allows us to remove the stopgap fix
> >> > added for PR77937 as well.
> >> >
> >> > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> >> > regressions, committed.
> >>
> >> Perhaps you should consider building ffmpeg with -O3 -march=amdfam10 on
> >> X86_64 before committing these patches, because you broke it for the
> >> third time in the last couple of days.
> >>
> >> markus@x4 ffmpeg % cat h264dsp.i
> >> extern int fn2(int);
> >> extern int fn3(int);
> >> int a, b, c;
> >> void fn1(long p1) {
> >>   char *d;
> >>   for (;; d += p1) {
> >> d[0] = fn2(1 >> c >> 1);
> >> fn2(c >> a);
> >> d[1] = fn3(d[1]) >> 1;
> >> d[6] = fn3(d[6] * b + 1) >> 1;
> >> d[7] = fn3(d[7] * b + 1) >> 1;
> >> d[8] = fn3(d[8] * b + 1) >> 1;
> >>   }
> >> }
> >>
> >> markus@x4 ffmpeg % gcc -O3 -march=amdfam10 -c h264dsp.i
> >> h264dsp.i: In function ‘fn1’:
> >> h264dsp.i:4:6: internal compiler error: in replace_one_candidate, at 
> >> gimple-ssa-strength-reduction.c:3375
> >>  void fn1(long p1) {
> >>   ^~~
> >> 0x12773a9 replace_one_candidate
> >> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3375
> >> 0x127af77 replace_profitable_candidates
> >> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3486
> >> 0x127aeeb replace_profitable_candidates
> >> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3495
> >> 0x127f3ee analyze_candidates_and_replace
> >> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3574
> >> 0x127f3ee execute
> >> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3648
> >
> > Just figured out that this testcase is identical to:
> > gcc/testsuite/gcc.dg/torture/pr77937-2.c
> >
> > So please run the testsuite on X86_64 in the future.
> >
> 
> 
> I'm not sure whether Markus means that pr77937-2 fails since this
> commit?
> 
> I'm seeing ICEs on pr77937-2 on some arm targets:
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241285/report-build-info.html
> 
> but I'm not 100% sure it was caused by this patch (the regression
> happened between r241273 and r241285), I haven't looked in details
> yet.

Yes, this is caused by this patch.

-- 
Markus


Re: [Patch] Backport fix for PR 52085 to gcc-5-branch?

2016-10-18 Thread Jakub Jelinek
On Tue, Oct 18, 2016 at 02:46:29PM +0530, Senthil Kumar Selvaraj wrote:
> > I'm not convinced it is desirable to backport such changes, it affects ABI,
> > people are used to deal with minor ABI changes in between major GCC
> > releases, but we'd need a strong reason to change it between minor releases.
> 
> Hmm, I tracked this down from a (internal) bug reported on arm-none-eabi, 
> where the
> inconsistent enum size (used in sizeof to malloc) was eventually causing
> heap corruption. 
> 
> When debugging the issue, I noticed you'd already backported
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69669, so I thought this
> should be good.

That one has been a regression, older GCCs handled it the same as does the 5
branch now.  Is that the case here?

Jakub


Re: [Patch] Backport fix for PR 52085 to gcc-5-branch?

2016-10-18 Thread Senthil Kumar Selvaraj

Jakub Jelinek writes:

> On Tue, Oct 18, 2016 at 02:46:29PM +0530, Senthil Kumar Selvaraj wrote:
>> > I'm not convinced it is desirable to backport such changes, it affects ABI,
>> > people are used to deal with minor ABI changes in between major GCC
>> > releases, but we'd need a strong reason to change it between minor 
>> > releases.
>> 
>> Hmm, I tracked this down from a (internal) bug reported on arm-none-eabi, 
>> where the
>> inconsistent enum size (used in sizeof to malloc) was eventually causing
>> heap corruption. 
>> 
>> When debugging the issue, I noticed you'd already backported
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69669, so I thought this
>> should be good.
>
> That one has been a regression, older GCCs handled it the same as does the 5
> branch now.  Is that the case here?

No, I can reproduce the bug on 4.9 as well. So not ok to backport then, I
guess?

Regards
Senthil


Re: [PATCH, libfortran] PR 48587 Newunit allocator

2016-10-18 Thread Janne Blomqvist
On Tue, Oct 18, 2016 at 12:09 PM, Steven Bosscher  wrote:
> On Thu, Oct 13, 2016 at 5:16 PM, Janne Blomqvist wrote:
>> +static bool *newunits;
>
> You could make this a bitmap (like sbitmap). A bit more code but makes
> a potentially quadratic search (when opening many units) less time
> consuming.

I did think about that, yes, but decided that it wasn't worth the
extra complexity since

a) The OS typically limits the number of fd's per process to a
relatively small number (typically 1024 by default).

b) For better or worse, in libgfortran a unit is a quite big
structure, not to mention the 8 kB buffer. So obsessing over wasting
an extra 7 bits per unit seemed pointless.

c) Due to the newunit_lwi, in many scenarios it should be able to skip
scanning over, if not all then at least most of, the in-use units. Of
course, it's possible to design a scenario which defeats the lwi, but,
is that something real software does? And even if it does, due to a)
above I think the effect would be quite modest anyway.



-- 
Janne Blomqvist


Re: [PATCH 1/7] make LABEL_REF_LABEL a rtx_insn *

2016-10-18 Thread Bernd Schmidt

On 10/17/2016 09:46 PM, tbsaunde+...@tbsaunde.org wrote:


+static inline void
+set_label_ref_label (rtx ref, rtx_insn *label)
+{
+  XCEXP (ref, 0, LABEL_REF) = label;
+}


I guess I have to ask for a brief function comment for this. Otherwise OK.


Bernd



Re: [PATCH] Fix computation of register limit for -fsched-pressure

2016-10-18 Thread Maxim Kuvyrkov

> On Oct 17, 2016, at 7:21 PM, Pat Haugen  wrote:
> 
> On 10/17/2016 08:17 AM, Maxim Kuvyrkov wrote:
>>> The patch here, https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01872.html, 
>>> attempted to scale down the register limit used by -fsched-pressure for the 
>>> case where the block in question executes as frequently as the entry block 
>>> to just the call_clobbered (i.e. call_used) regs. But the code is actually 
>>> scaling toward call_saved registers. The following patch corrects that by 
>>> computing call_saved regs per class and subtracting out some scaled portion 
>>> of that.
 
 Bootstrap/regtest on powerpc64le with no new failures. Ok for trunk?
>> Hi Pat,
>> 
>> I stared at your patch and current code for good 30 minutes, and I still 
>> don't see what is wrong with the current code.
>> 
>> With your patch the number of registers from class CL that scheduler has at 
>> its disposal for a single-basic-block function will be:
>> 
>> sched_call_regs_num[CL] = ira_class_hard_regs_num[CL] - 
>> call_saved_regs_num[CL];
>> 
>> where call_saved_regs_num is number of registers in class CL that need to be 
>> saved in the prologue (i.e., "free" registers).  I can see some logic in 
>> setting
>> 
>> sched_call_regs_num[CL] = call_saved_regs_num[CL];
>> 
>> but not in subtracting number of such registers from the number of total 
>> available hard registers.
>> 
>> Am I missing something?
>> 
> 
> Your original patch gave the following reasoning:
> 
> "At the moment the scheduler does not account for spills in the prologues and 
> restores in the epilogue, which occur from use of call-used registers.  The 
> current state is, essentially, optimized for case when there is a hot loop 
> inside the function, and the loop executes significantly more often than the 
> prologue/epilogue.  However, on the opposite end, we have a case when the 
> function is just a single non-cyclic basic block, which executes just as 
> often as prologue / epilogue, so spills in the prologue hurt performance as 
> much as spills in the basic block itself.  In such a case the scheduler 
> should throttle-down on the number of available registers and try to not go 
> beyond call-clobbered registers."
> 
> But the misunderstanding is that call-used registers do NOT cause any 
> save/restore. That is to say, call-used == call-clobbered. Your last sentence 
> explains the goal for a single block function, to not go beyond 
> call-clobbered (i.e. call-used) registers, which makes perfect sense. My 
> patch implements that goal by subtracting out call_saved_regs_num (those that 
> require prolog/epilog save/restore) from the total regs, and using that as 
> the target # of registers to be used for the block.

I see your point and agree that current code isn't optimal.  However, I don't 
think your patch is accurate either.  Consider 
https://gcc.gnu.org/onlinedocs/gccint/Register-Basics.html and let's assume 
that FIXED_REGISTERS in class CL is set for a third of the registers, and 
CALL_USED_REGISTERS is set to "1" for another third of registers.  So we have a 
third available for zero-cost allocation (CALL_USED_REGISTERS-FIXED_REGISTERS), 
a third available for spill-cost allocation (ALL_REGISTERS-CALL_USED_REGISTERS) 
and a third non-available (FIXED_REGISTERS).

For a non-loop-single-basic-block function we should be targeting only the 
third of register available at zero-cost -- correct?  This is what is done by 
the current code, but, apparently, by accident.  It seems that the right 
register count can be obtained with:

  for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i)
-   if (call_used_regs[ira_class_hard_regs[cl][i]])
- ++call_used_regs_num[cl];
+   if (!call_used_regs[ira_class_hard_regs[cl][i]]
+   || fixed_regs[ira_class_hard_regs[cl][i]])
+ ++call_saved_regs_num[cl];

Does this look correct to you?

--
Maxim Kuvyrkov
www.linaro.org




Re: [PATCH 2/7] make tablejump_p return the label as a rtx_insn *

2016-10-18 Thread Bernd Schmidt

On 10/17/2016 09:46 PM, tbsaunde+...@tbsaunde.org wrote:


* cfgcleanup.c (merge_blocks_move_successor_nojumps): Adjust.
(outgoing_edges_match): Likewise.
(try_crossjump_to_edge): Likewise.
* cfgrtl.c (try_redirect_by_replacing_jump): Likewise.
(rtl_tidy_fallthru_edge): Likewise.
* rtl.h (tablejump_p): Adjust prototype.
* rtlanal.c (tablejump_p): Return the label as a rtx_insn *.


Ok.


Bernd


Re: [PATCH] Fix computation of register limit for -fsched-pressure

2016-10-18 Thread Maxim Kuvyrkov
> On Oct 18, 2016, at 1:27 PM, Maxim Kuvyrkov  wrote:
> 
>> 
>> On Oct 17, 2016, at 7:21 PM, Pat Haugen  wrote:
>> 
>> On 10/17/2016 08:17 AM, Maxim Kuvyrkov wrote:
 The patch here, https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01872.html, 
 attempted to scale down the register limit used by -fsched-pressure for 
 the case where the block in question executes as frequently as the entry 
 block to just the call_clobbered (i.e. call_used) regs. But the code is 
 actually scaling toward call_saved registers. The following patch corrects 
 that by computing call_saved regs per class and subtracting out some 
 scaled portion of that.
> 
> Bootstrap/regtest on powerpc64le with no new failures. Ok for trunk?
>>> Hi Pat,
>>> 
>>> I stared at your patch and current code for good 30 minutes, and I still 
>>> don't see what is wrong with the current code.
>>> 
>>> With your patch the number of registers from class CL that scheduler has at 
>>> its disposal for a single-basic-block function will be:
>>> 
>>> sched_call_regs_num[CL] = ira_class_hard_regs_num[CL] - 
>>> call_saved_regs_num[CL];
>>> 
>>> where call_saved_regs_num is number of registers in class CL that need to 
>>> be saved in the prologue (i.e., "free" registers).  I can see some logic in 
>>> setting
>>> 
>>> sched_call_regs_num[CL] = call_saved_regs_num[CL];
>>> 
>>> but not in subtracting number of such registers from the number of total 
>>> available hard registers.
>>> 
>>> Am I missing something?
>>> 
>> 
>> Your original patch gave the following reasoning:
>> 
>> "At the moment the scheduler does not account for spills in the prologues 
>> and restores in the epilogue, which occur from use of call-used registers.  
>> The current state is, essentially, optimized for case when there is a hot 
>> loop inside the function, and the loop executes significantly more often 
>> than the prologue/epilogue.  However, on the opposite end, we have a case 
>> when the function is just a single non-cyclic basic block, which executes 
>> just as often as prologue / epilogue, so spills in the prologue hurt 
>> performance as much as spills in the basic block itself.  In such a case the 
>> scheduler should throttle-down on the number of available registers and try 
>> to not go beyond call-clobbered registers."
>> 
>> But the misunderstanding is that call-used registers do NOT cause any 
>> save/restore. That is to say, call-used == call-clobbered. Your last 
>> sentence explains the goal for a single block function, to not go beyond 
>> call-clobbered (i.e. call-used) registers, which makes perfect sense. My 
>> patch implements that goal by subtracting out call_saved_regs_num (those 
>> that require prolog/epilog save/restore) from the total regs, and using that 
>> as the target # of registers to be used for the block.
> 
> I see your point and agree that current code isn't optimal.  However, I don't 
> think your patch is accurate either.  Consider 
> https://gcc.gnu.org/onlinedocs/gccint/Register-Basics.html and let's assume 
> that FIXED_REGISTERS in class CL is set for a third of the registers, and 
> CALL_USED_REGISTERS is set to "1" for another third of registers.  So we have 
> a third available for zero-cost allocation 
> (CALL_USED_REGISTERS-FIXED_REGISTERS), a third available for spill-cost 
> allocation (ALL_REGISTERS-CALL_USED_REGISTERS) and a third non-available 
> (FIXED_REGISTERS).
> 
> For a non-loop-single-basic-block function we should be targeting only the 
> third of register available at zero-cost -- correct?  This is what is done by 
> the current code, but, apparently, by accident.  It seems that the right 
> register count can be obtained with:
> 
> for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i)
> - if (call_used_regs[ira_class_hard_regs[cl][i]])
> -   ++call_used_regs_num[cl];
> + if (!call_used_regs[ira_class_hard_regs[cl][i]]
> +   || fixed_regs[ira_class_hard_regs[cl][i]])
> +   ++call_saved_regs_num[cl];
> 
> Does this look correct to you?

Thinking some more, it seems like fixed_regs should not be available to the 
scheduler no matter what.  Therefore, the number of fixed registers should be 
subtracted from ira_class_hard_regs_num[cl] without any scaling (entry_freq / 
bb_freq).

--
Maxim Kuvyrkov
www.linaro.org



Re: [Patch, reload, tentative, PR 71627] Tweak conditions in find_valid_class_1

2016-10-18 Thread Bernd Schmidt

On 10/13/2016 08:57 AM, Senthil Kumar Selvaraj wrote:


2016-10-13  Senthil Kumar Selvaraj  

* reload.c (find_valid_class_1): Allow regclass if atleast one
regno in class is ok. Compute and use rclass size based on
actually available regnos for mode in rclass.

gcc/testsuite/ChangeLog:

2016-10-13  Senthil Kumar Selvaraj  

* gcc.target/avr/pr71627.c: New.


Index: gcc/reload.c
===
--- gcc/reload.c(revision 240989)
+++ gcc/reload.c(working copy)
@@ -711,31 +711,36 @@
   enum reg_class best_class = NO_REGS;
   unsigned int best_size = 0;
   int cost;
+  unsigned int computed_rclass_sizes[N_REG_CLASSES] = { 0 };


As far as I can tell you're only accessing this as 
computed_rclass_size[rclass], i.e. with the current class in the loop. 
So I don't think you need the array at all, just a computed_size 
variable in the loop?



+  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+{
+  if (in_hard_reg_set_p (reg_class_contents[rclass], mode, regno))
+{
+  atleast_one_regno_ok = 1;
+  if (HARD_REGNO_MODE_OK (regno, mode))
+computed_rclass_sizes[rclass]++;
+}
+}


Don't you want to also ensure HARD_REGNO_MODE_OK before claiming that 
atleast_one_regno_ok? Maybe I'm forgetting the motivation but this seems 
odd. If so, the variable becomes unnecessary, just check the computed size.



Bernd


Re: [rs6000] Fix reload failures in 64-bit mode with no special constant pool

2016-10-18 Thread Segher Boessenkool
[ sorry for losing track of this patch ]

On Sun, Oct 09, 2016 at 10:32:51AM +0200, Eric Botcazou wrote:
> > Use "mode" instead of "Pmode" here?
> 
> No, "mode" is the mode of the MEM, not that of the SYMBOL_REF.

I still don't see it, could you explain a bit more?


Segher


Re: [PATCH 0/8] NVPTX offloading to NVPTX: backend patches

2016-10-18 Thread Bernd Schmidt

On 10/17/2016 07:06 PM, Alexander Monakov wrote:


I've just pushed two commits to the branch to fix this issue.  Before those, the
last commit left the branch in a state where an incremental build seemed ok
(because libgcc/libgomp weren't rebuilt with the new cc1), but a from-scratch
build was broken like you've shown.  LULESH is known to work.  I also intend to
perform a trunk merge soon.


Ok that did work, however...


I think before merging this work we'll need to have some idea of how well it
works on real-world code.


This patchset and the branch lay the foundation, there's more work to be
done, in particular on the performance improvements side. There should be
an agreement on these fundamental bits first, before moving on to fine-tuning.


The performance I saw was lower by a factor of 80 or so compared to 
their CUDA version, and even lower than OpenMP on the host. Does this 
match what you are seeing? Do you have a clear plan how this can be 
improved?


To me this kind of performance doesn't look like something that will be 
fixed by fine-tuning; it leaves me undecided whether the chosen approach 
(what you call the fundamentals) is viable at all. Performance is still 
better than the OpenACC version of the benchmark, but then I think we 
shouldn't repeat the mistakes we made with OpenACC and avoid merging 
something until we're sure it's ready and of benefit to users.



Bernd


Re: [rs6000] Fix reload failures in 64-bit mode with no special constant pool

2016-10-18 Thread Eric Botcazou
> > No, "mode" is the mode of the MEM, not that of the SYMBOL_REF.
> 
> I still don't see it, could you explain a bit more?

MODE is the mode of operands[1] before:

  operands[1] = force_const_mem (mode, operands[1]);

and after.  But the test is on the address of the MEM, not on the MEM itself:

  if (TARGET_TOC
  && GET_CODE (XEXP (operands[1], 0)) == SYMBOL_REF
  && use_toc_relative_ref (XEXP (operands[1], 0), Pmode))

because it's the mode of SYMBOL_REF we are interesting in (and force_const_mem 
guarantees that it's Pmode).  IOW you could theoretically have mode == SImode 
and we would still need to pass Pmode to use_toc_relative_ref (of course the 
whole thing is guarded with mode == Pmode so that's a little artificial).

-- 
Eric Botcazou


Re: [PATCH 3/7] use rtx_insn * more

2016-10-18 Thread Bernd Schmidt

On 10/17/2016 09:46 PM, tbsaunde+...@tbsaunde.org wrote:

 {
-  rtx r0, r16, eqv, tga, tp, insn, dest, seq;
+  rtx r0, r16, eqv, tga, tp, dest, seq;
+  rtx_insn *insn;

   switch (tls_symbolic_operand_type (x))
{
@@ -1025,66 +1026,70 @@ alpha_legitimize_address_1 (rtx x, rtx scratch, 
machine_mode mode)
  break;

case TLS_MODEL_GLOBAL_DYNAMIC:
- start_sequence ();
+ {
+   start_sequence ();

- r0 = gen_rtx_REG (Pmode, 0);
- r16 = gen_rtx_REG (Pmode, 16);
- tga = get_tls_get_addr ();
- dest = gen_reg_rtx (Pmode);
- seq = GEN_INT (alpha_next_sequence_number++);
+   r0 = gen_rtx_REG (Pmode, 0);
+   r16 = gen_rtx_REG (Pmode, 16);
+   tga = get_tls_get_addr ();
+   dest = gen_reg_rtx (Pmode);
+   seq = GEN_INT (alpha_next_sequence_number++);

- emit_insn (gen_movdi_er_tlsgd (r16, pic_offset_table_rtx, x, seq));
- insn = gen_call_value_osf_tlsgd (r0, tga, seq);
- insn = emit_call_insn (insn);
- RTL_CONST_CALL_P (insn) = 1;
- use_reg (&CALL_INSN_FUNCTION_USAGE (insn), r16);
+   emit_insn (gen_movdi_er_tlsgd (r16, pic_offset_table_rtx, x, seq));
+   rtx val = gen_call_value_osf_tlsgd (r0, tga, seq);


Since this doesn't consistently declare variables at the point of 
initialization, might as well put val into the list of variables at the 
top, and avoid reindentation that way. There are several such reindented 
blocks, and the patch would be a lot easier to review without this.


Alternatively, split it up a bit more into obvious/nonobvious parts.


diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 21bba0c..8e8fff4 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -4829,7 +4829,6 @@ static rtx
 arc_emit_call_tls_get_addr (rtx sym, int reloc, rtx eqv)
 {
   rtx r0 = gen_rtx_REG (Pmode, R0_REG);
-  rtx insns;
   rtx call_fusage = NULL_RTX;

   start_sequence ();
@@ -4846,7 +4845,7 @@ arc_emit_call_tls_get_addr (rtx sym, int reloc, rtx eqv)
   RTL_PURE_CALL_P (call_insn) = 1;
   add_function_usage_to (call_insn, call_fusage);

-  insns = get_insns ();
+  rtx_insn *insns = get_insns ();
   end_sequence ();


For example, stuff like this looks obvious enough that it can go in.


Bernd


Re: [PATCH 4/7] remove cast to rtx_insn * in remove_note

2016-10-18 Thread Bernd Schmidt

On 10/17/2016 09:46 PM, tbsaunde+...@tbsaunde.org wrote:


2016-10-17  Trevor Saunders  

* config/rl78/rl78.c (gen-and_emit_move): Change argument type
to rtx_insn *.
(transcode_memory_rtx): Likewise.
(move_to_acc): Likewise.
(move_from_acc): Likewise.
(move_acc_to_reg): Likewise.
(move_to_x): Likewise.
(move_to_hl): Likewise.
(move_to_de): Likewise.
* config/rs6000/rs6000.c (emit_frame_save): Likewise.
(rs6000_emit_savres_rtx): Likewise.
(rs6000_emit_prologue): Likewise.
* reorg.c (update_reg_unused_notes): Likewise.
* rtl.h (remove_note): Adjust prototype.
* rtlanal.c (remove_note): Make argument type rtx_insn *.


Ok.


Bernd



Re: [PATCH 6/7] remove cast from prev_nonnote_insn_bb

2016-10-18 Thread Bernd Schmidt

On 10/17/2016 09:46 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/ChangeLog:

2016-10-17  Trevor Saunders  

* emit-rtl.c (prev_nonnote_insn_bb): Change argument type to
rtx_insn *.
* rtl.h (prev_nonnote_insn_bb): Adjust prototype.


Ok.


Bernd



Re: [PATCH 5/7] remove cast in delete_insn_chain

2016-10-18 Thread Bernd Schmidt

On 10/17/2016 09:46 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/ChangeLog:

2016-10-17  Trevor Saunders  

* cfgrtl.c (delete_insn_chain): Change argument type to rtx_insn *
and adjust for that.
* cfgrtl.h (delete_insn_chain): Adjust prototype.


Ok.


Bernd



Re: [PATCH 7/7] make targetm.gen_ccmp{first,next} take rtx_insn **

2016-10-18 Thread Bernd Schmidt

On 10/17/2016 09:46 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/ChangeLog:

2016-10-17  Trevor Saunders  

* ccmp.c (expand_ccmp_expr_1): Adjust.
(expand_ccmp_expr): Likewise.
(expand_ccmp_next): Likewise.
* config/aarch64/aarch64.c (aarch64_gen_ccmp_next): Likewise.
(aarch64_gen_ccmp_first): Likewise.
* doc/tm.texi: Regenerate.
* target.def (gen_ccmp_first): Change argument types to rtx_insn *.
(gen_ccmp_next): Likewise.


Looks reasonable, but has this been tested on aarch64? I think that's a 
prerequisite for this patch.



Bernd


[PATCH] Don't define uses-allocator variable templates in C++11

2016-10-18 Thread Jonathan Wakely

These variable templates give warnings in C++11 mode when
-Wsystem-headers is used:

In file included from /home/jwakely/gcc/7/include/c++/7.0.0/memory:77:0,
from vt.cc:1:
/home/jwakely/gcc/7/include/c++/7.0.0/bits/uses_allocator.h:130:20: warning: 
variable templates only available with -std=c++14 or -std=gnu++14
constexpr bool __is_uses_allocator_constructible_v =
   ^~~
/home/jwakely/gcc/7/include/c++/7.0.0/bits/uses_allocator.h:141:20: warning: 
variable templates only available with -std=c++14 or -std=gnu++14
constexpr bool __is_nothrow_uses_allocator_constructible_v =
   ^~~

We don't need them, so let's only define them for C++14 and up.

* include/bits/uses_allocator.h (__is_uses_allocator_constructible_v)
(__is_nothrow_uses_allocator_constructible_v): Only define for C++14
and later.

Tested powerpc64le-linux, committed to trunk.

commit b19fd14727318d5d6f3a411a2a600f89d07ab28a
Author: Jonathan Wakely 
Date:   Tue Oct 18 12:31:11 2016 +0100

Don't define uses-allocator variable templates in C++11

* include/bits/uses_allocator.h (__is_uses_allocator_constructible_v)
(__is_nothrow_uses_allocator_constructible_v): Only define for C++14
and later.

diff --git a/libstdc++-v3/include/bits/uses_allocator.h 
b/libstdc++-v3/include/bits/uses_allocator.h
index c7d14f3..612c53c 100644
--- a/libstdc++-v3/include/bits/uses_allocator.h
+++ b/libstdc++-v3/include/bits/uses_allocator.h
@@ -126,9 +126,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : __is_uses_allocator_predicate
 { };
 
+#if __cplusplus >= 201402L
   template
 constexpr bool __is_uses_allocator_constructible_v =
   __is_uses_allocator_constructible<_Tp, _Alloc, _Args...>::value;
+#endif // C++14
 
   template
 struct __is_nothrow_uses_allocator_constructible
@@ -137,9 +139,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { };
 
 
+#if __cplusplus >= 201402L
   template
 constexpr bool __is_nothrow_uses_allocator_constructible_v =
   __is_nothrow_uses_allocator_constructible<_Tp, _Alloc, _Args...>::value;
+#endif // C++14
 
   template
 void __uses_allocator_construct_impl(__uses_alloc0 __a, _Tp* __ptr,


Re: [rs6000] Fix reload failures in 64-bit mode with no special constant pool

2016-10-18 Thread Segher Boessenkool
On Tue, Oct 18, 2016 at 01:09:24PM +0200, Eric Botcazou wrote:
> > > No, "mode" is the mode of the MEM, not that of the SYMBOL_REF.
> > 
> > I still don't see it, could you explain a bit more?
> 
> MODE is the mode of operands[1] before:
> 
> operands[1] = force_const_mem (mode, operands[1]);
> 
> and after.  But the test is on the address of the MEM, not on the MEM itself:
> 
> if (TARGET_TOC
> && GET_CODE (XEXP (operands[1], 0)) == SYMBOL_REF
> && use_toc_relative_ref (XEXP (operands[1], 0), Pmode))
> 
> because it's the mode of SYMBOL_REF we are interesting in (and 
> force_const_mem 
> guarantees that it's Pmode).

We need to pass the mode of the actual datum we would put in the TOC to
the use_toc_relative_ref function, not the mode of its address.

I must be missing something...


Segher


Re: [PATCH] PR77990 refactor unique_ptr to encapsulate tuple

2016-10-18 Thread Jonathan Wakely

On 17/10/16 14:37 +0100, Jonathan Wakely wrote:

We are incorrectly requiring unique_ptr deleters to be copyable here:

explicit
unique_ptr(pointer __p) noexcept
: _M_t(__p, deleter_type())
{ }

We could just do:

explicit
unique_ptr(pointer __p) noexcept
: _M_t()
{ std::get<0>(_M_t) = __p; }

But having to deal directly with the std::tuple inside unique_ptr has
been bothering me for some time. The tuple is used so we get the empty
base-class optimisation for the deleter, but that implementation
detail


ops, a dangling sentence. I meant to say that the tuple implementation
details leaks into the definition of lots of members, which have to
say std::get<0>(_M_t) or std::get<1>(_M_t) instead of using more
natural member names.


This patch refactors unique_ptr to put the std::tuple member into a
new type which provides named accessors for the tuple elements, so we
can stop using get<0> and get<1>. That new type can also provide a
single-argument constructor to fix the copyable requirement for
deleters. This also removes the code for deducing the pointer type
which is duplciated in unique_ptr and unique_ptr, and while in
the neighbourhood I changed it from old-school SFINAE using overloaded
functions to the new hotness with __void_t<>.

I intend to commit this to trunk, but on the branches I'll just fix
the constructor as shown above, as it's a smaller change.


I'll wait a bit longer for any objections, as the refactoring could be
seen as unnecessary churn, but I think it's valuable housekeeping.




[Patch,testsuite] Fix sso.exp not calling torture-finish for avr

2016-10-18 Thread Senthil Kumar Selvaraj
Hi,

  When analyzing reg test failures for the avr target, I noticed that the
  torture options were different when running dg-torture.exp compared to
  x86_64-pc-linux-gnu, resulting in additional failures. I also found
  that  a bunch of "torture-without-loops not empty as expected" errors
  show up for a few .exp files.

  I found that these did not occur when the exp files were run in
  isolation. On further debugging, I found that sso.exp calls dg-init and
  torture-init, and returns if !effective_target_int32. It does
  not call the corresponding finish functions for targets like the avr
  for which the effective target condition is true, and this leaves
  torture-options set, which causes the errors and differing options.

  The below patch makes the return occur earlier - before calling the
  init functions.

  Committed to trunk.

Regards
Senthil

2016-10-18  Senthil Kumar Selvaraj  

* gcc.dg/sso/sso.exp: Return early if not
effective_target_int32.


Index: gcc.dg/sso/sso.exp
===
--- gcc.dg/sso/sso.exp  (revision 241299)
+++ gcc.dg/sso/sso.exp  (working copy)
@@ -18,6 +18,10 @@
 load_lib gcc-dg.exp
 load_lib torture-options.exp
 
+if { ![check_effective_target_int32] } {
+return
+}
+
 # Initialize `dg'.
 torture-init
 dg-init
@@ -32,10 +36,6 @@
 
 set-torture-options $SSO_TORTURE_OPTIONS
 
-if { ![check_effective_target_int32] } {
-return
-}
-
 # Main loop.
 gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.c]] "" ""
 


Re: [PATCH, libgo]: Fix FAIL: time testsuite failure

2016-10-18 Thread Uros Bizjak
On Tue, Oct 18, 2016 at 11:19 AM, Uros Bizjak  wrote:
> The name of Etc/GMT+1 timezone is "-01", as evident from:
>
> $ TZ=Etc/GMT+1 date +%Z
> -01
>
> Attached patch fixes the testsuite failure.

Forgot to say that the patch was tested with tzdata2016g on Fedora 24
and CentOS 5.11.

Uros.


Re: [PATCH, libgo]: Fix FAIL: time testsuite failure

2016-10-18 Thread Uros Bizjak
On Tue, Oct 18, 2016 at 2:10 PM, Uros Bizjak  wrote:
> On Tue, Oct 18, 2016 at 11:19 AM, Uros Bizjak  wrote:
>> The name of Etc/GMT+1 timezone is "-01", as evident from:
>>
>> $ TZ=Etc/GMT+1 date +%Z
>> -01
>>
>> Attached patch fixes the testsuite failure.
>
> Forgot to say that the patch was tested with tzdata2016g on Fedora 24
> and CentOS 5.11.

FYI, tzdata2016g ChangLog says:

  Changes to past and future time zone abbreviations

The Factory zone now uses the time zone abbreviation -00 instead
of a long English-language string, as -00 is now the normal way to
represent an undefined time zone.

Several zones in Antarctica and the former Soviet Union, along
with zones intended for ships at sea that cannot use POSIX TZ
strings, now use numeric time zone abbreviations instead of
invented or obsolete alphanumeric abbreviations.  The affected
zones are Antarctica/Casey, Antarctica/Davis,
Antarctica/DumontDUrville, Antarctica/Mawson, Antarctica/Rothera,
Antarctica/Syowa, Antarctica/Troll, Antarctica/Vostok,
Asia/Anadyr, Asia/Ashgabat, Asia/Baku, Asia/Bishkek, Asia/Chita,
Asia/Dushanbe, Asia/Irkutsk, Asia/Kamchatka, Asia/Khandyga,
Asia/Krasnoyarsk, Asia/Magadan, Asia/Omsk, Asia/Sakhalin,
Asia/Samarkand, Asia/Srednekolymsk, Asia/Tashkent, Asia/Tbilisi,
Asia/Ust-Nera, Asia/Vladivostok, Asia/Yakutsk, Asia/Yekaterinburg,
Asia/Yerevan, Etc/GMT-14, Etc/GMT-13, Etc/GMT-12, Etc/GMT-11,
Etc/GMT-10, Etc/GMT-9, Etc/GMT-8, Etc/GMT-7, Etc/GMT-6, Etc/GMT-5,
Etc/GMT-4, Etc/GMT-3, Etc/GMT-2, Etc/GMT-1, Etc/GMT+1, Etc/GMT+2,
Etc/GMT+3, Etc/GMT+4, Etc/GMT+5, Etc/GMT+6, Etc/GMT+7, Etc/GMT+8,
Etc/GMT+9, Etc/GMT+10, Etc/GMT+11, Etc/GMT+12, Europe/Kaliningrad,
Europe/Minsk, Europe/Samara, Europe/Volgograd, and
Indian/Kerguelen.  For Europe/Moscow the invented abbreviation MSM
was replaced by +05, whereas MSK and MSD were kept as they are not
our invention and are widely used.

Uros.


Re: [Patch, reload, tentative, PR 71627] Tweak conditions in find_valid_class_1

2016-10-18 Thread Senthil Kumar Selvaraj

Bernd Schmidt writes:

> On 10/13/2016 08:57 AM, Senthil Kumar Selvaraj wrote:
>>
>> 2016-10-13  Senthil Kumar Selvaraj  
>>
>>  * reload.c (find_valid_class_1): Allow regclass if atleast one
>>  regno in class is ok. Compute and use rclass size based on
>>  actually available regnos for mode in rclass.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2016-10-13  Senthil Kumar Selvaraj  
>>  
>>  * gcc.target/avr/pr71627.c: New.
>>
>>
>> Index: gcc/reload.c
>> ===
>> --- gcc/reload.c (revision 240989)
>> +++ gcc/reload.c (working copy)
>> @@ -711,31 +711,36 @@
>>enum reg_class best_class = NO_REGS;
>>unsigned int best_size = 0;
>>int cost;
>> +  unsigned int computed_rclass_sizes[N_REG_CLASSES] = { 0 };
>
> As far as I can tell you're only accessing this as 
> computed_rclass_size[rclass], i.e. with the current class in the loop. 
> So I don't think you need the array at all, just a computed_size 
> variable in the loop?

Yes - I mechanically replaced the original array with the computed one.
A variable would suffice.
>
>> +  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> +{
>> +  if (in_hard_reg_set_p (reg_class_contents[rclass], mode, regno))
>> +{
>> +  atleast_one_regno_ok = 1;
>> +  if (HARD_REGNO_MODE_OK (regno, mode))
>> +computed_rclass_sizes[rclass]++;
>> +}
>> +}
>
> Don't you want to also ensure HARD_REGNO_MODE_OK before claiming that 
> atleast_one_regno_ok? Maybe I'm forgetting the motivation but this seems 
> odd. If so, the variable becomes unnecessary, just check the computed size.

True again - the original intention was to prevent the best_xxx
variables from getting set if no regno was in_hard_reg_set. Now the
computed class size would be zero, so the variable is unnecessary.

Will do both the changes and re-run the reg tests. Ok for trunk if the
tests pass for x86_64-pc-linux and avr?

Regards
Senthil


Re: [Patch, reload, tentative, PR 71627] Tweak conditions in find_valid_class_1

2016-10-18 Thread Bernd Schmidt

On 10/18/2016 02:15 PM, Senthil Kumar Selvaraj wrote:

Will do both the changes and re-run the reg tests. Ok for trunk if the
tests pass for x86_64-pc-linux and avr?


Probably but let's see the patch first.


Bernd



[PATCH] Make EVRP propagate into PHIs and remove dead stmts

2016-10-18 Thread Richard Biener

The following patch makes EVRP remove stmts that will become dead
after propagation.  For this to work we have to propagate into PHIs
(sth we missed as well).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-10-18  Richard Biener  

* tree-vrp.c (evrp_dom_walker::evrp_dom_walker): Initialize
stmts_to_remove.
(evrp_dom_walker::~evrp_dom_walker): Free it.
(evrp_dom_walker::stmts_to_remove): Add.
(evrp_dom_walker::before_dom_children): Mark PHIs and stmts
whose output we fully propagate for removal.  Propagate
into BB destination PHI arguments.
(execute_early_vrp): Remove queued stmts.  Dump value ranges
before stmt removal.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 241302)
+++ gcc/tree-vrp.c  (working copy)
@@ -10643,11 +10643,13 @@ public:
 : dom_walker (CDI_DOMINATORS), stack (10)
 {
   stmts_to_fixup.create (0);
+  stmts_to_remove.create (0);
   need_eh_cleanup = BITMAP_ALLOC (NULL);
 }
   ~evrp_dom_walker ()
 {
   stmts_to_fixup.release ();
+  stmts_to_remove.release ();
   BITMAP_FREE (need_eh_cleanup);
 }
   virtual edge before_dom_children (basic_block);
@@ -10660,6 +10662,7 @@ public:
   auto_vec > stack;
   bitmap need_eh_cleanup;
   vec stmts_to_fixup;
+  vec stmts_to_remove;
 };
 
 
@@ -10769,6 +10772,15 @@ evrp_dom_walker::before_dom_children (ba
   else
set_value_range_to_varying (&vr_result);
   update_value_range (lhs, &vr_result);
+
+  /* Mark PHIs whose lhs we fully propagate for removal.  */
+  tree val;
+  if ((val = op_with_constant_singleton_value_range (lhs))
+ && may_propagate_copy (lhs, val))
+   {
+ stmts_to_remove.safe_push (phi);
+ continue;
+   }
 }
 
   edge taken_edge = NULL;
@@ -10806,7 +10818,6 @@ evrp_dom_walker::before_dom_children (ba
  update_value_range (output, &vr);
  vr = *get_value_range (output);
 
-
  /* Set the SSA with the value range.  */
  if (INTEGRAL_TYPE_P (TREE_TYPE (output)))
{
@@ -10824,6 +10835,17 @@ evrp_dom_walker::before_dom_children (ba
   && range_includes_zero_p (vr.min,
 vr.max) == 1)))
set_ptr_nonnull (output);
+
+ /* Mark stmts whose output we fully propagate for removal.  */
+ tree val;
+ if ((val = op_with_constant_singleton_value_range (output))
+ && may_propagate_copy (output, val)
+ && !stmt_could_throw_p (stmt)
+ && !gimple_has_side_effects (stmt))
+   {
+ stmts_to_remove.safe_push (stmt);
+ continue;
+   }
}
  else
set_defs_to_varying (stmt);
@@ -10860,6 +10882,24 @@ evrp_dom_walker::before_dom_children (ba
}
}
 }
+
+  /* Visit BB successor PHI nodes and replace PHI args.  */
+  FOR_EACH_EDGE (e, ei, bb->succs)
+{
+  for (gphi_iterator gpi = gsi_start_phis (e->dest);
+  !gsi_end_p (gpi); gsi_next (&gpi))
+   {
+ gphi *phi = gpi.phi ();
+ use_operand_p use_p = PHI_ARG_DEF_PTR_FROM_EDGE (phi, e);
+ tree arg = USE_FROM_PTR (use_p);
+ if (TREE_CODE (arg) != SSA_NAME
+ || virtual_operand_p (arg))
+   continue;
+ if (tree val = op_with_constant_singleton_value_range (arg))
+   propagate_value (use_p, val);
+   }
+}
+ 
   bb->flags |= BB_VISITED;
 
   return taken_edge;
@@ -10941,6 +10981,34 @@ execute_early_vrp ()
   evrp_dom_walker walker;
   walker.walk (ENTRY_BLOCK_PTR_FOR_FN (cfun));
 
+  if (dump_file)
+{
+  fprintf (dump_file, "\nValue ranges after Early VRP:\n\n");
+  dump_all_value_ranges (dump_file);
+  fprintf (dump_file, "\n");
+}
+
+  /* Remove stmts in reverse order to make debug stmt creation possible.  */
+  while (! walker.stmts_to_remove.is_empty ())
+{
+  gimple *stmt = walker.stmts_to_remove.pop ();
+  if (dump_file && dump_flags & TDF_DETAILS)
+   {
+ fprintf (dump_file, "Removing dead stmt ");
+ print_gimple_stmt (dump_file, stmt, 0, 0);
+ fprintf (dump_file, "\n");
+   }
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  if (gimple_code (stmt) == GIMPLE_PHI)
+   remove_phi_node (&gsi, true);
+  else
+   {
+ unlink_stmt_vdef (stmt);
+ gsi_remove (&gsi, true);
+ release_defs (stmt);
+   }
+}
+
   if (!bitmap_empty_p (walker.need_eh_cleanup))
 gimple_purge_all_dead_eh_edges (walker.need_eh_cleanup);
 
@@ -10954,12 +11022,6 @@ execute_early_vrp ()
   fixup_noreturn_call (stmt);
 }
 
-  if (dump_file)
-{
-  fprintf (dump_file, "\nValue ranges after Early V

Re: [PATCH] PR77895: DWARF: Emit DW_AT_comp_dir in all cases, even if source is an absolute path

2016-10-18 Thread Ximin Luo
Richard Biener:
> On Mon, Oct 17, 2016 at 11:44 PM, Mike Stump  wrote:
>> On Oct 17, 2016, at 2:38 PM, Ximin Luo  wrote:
>>>
>>> Mike Stump:
 On Oct 17, 2016, at 11:00 AM, Ximin Luo  wrote:
> Therefore, it is better to emit it in all circumstances, in case the 
> reader needs to know what the working
> directory was at compile-time.

 I can't help but wonder if this would break ccache some?

>>>
>>> Could you explain this in some more detail? At the moment, GCC will already 
>>> emit DW_AT_name with an absolute path (in the scenario that this patch is 
>>> relevant to). How does ccache work around this at the moment? (Does it use 
>>> debug-prefix-map? In which case, this also affects DW_AT_comp_dir, so my 
>>> patch should be fine.)
>>
>> If you compile the same file, but in a different directory, I wonder if cwd 
>> will cause the cache entry to not be reused.
>>
>> I expect one of the ccache people that are around would just know if it will 
>> care at all.
> 
> I believe ccache compares preprocessed source, definitely _not_ DWARF
> output, so this shouldn't break anything there.
> It might result in different object file output but as the reporter
> figured due to a bug in dwarf2out.c we end up generating
> DW_AT_comp_dir in almost all cases already.
> 
> I think the patch is ok but it misses a ChangeLog entry.  How did you
> test the patch? (bootstrapped and tested on ...?)
> 

Thanks, I'll add the Changelog entry. My computer isn't very powerful, so I 
didn't bootstrap it yet, I only tested it on a stage1 compiler, on Debian 
testing/unstable. I'll find some time to bootstrap it and test it fully over 
the next few days.

Shall I also get rid of the Darwin force_at_comp_dir stuff? Looking into it a 
bit more, my patch basically obsoletes the need for this so I can delete that 
as well.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


Re: [PATCH, libgo]: Fix FAIL: time testsuite failure

2016-10-18 Thread Rainer Orth
Hi Uros,

> On Tue, Oct 18, 2016 at 11:19 AM, Uros Bizjak  wrote:
>> The name of Etc/GMT+1 timezone is "-01", as evident from:
>>
>> $ TZ=Etc/GMT+1 date +%Z
>> -01
>>
>> Attached patch fixes the testsuite failure.
>
> Forgot to say that the patch was tested with tzdata2016g on Fedora 24
> and CentOS 5.11.

but Fedora 20 still returns GMT+1 here, and Solaris 10 to 12 even
Etc/GMT (where Solaris 12 also has 2016g).

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Fix PR77916

2016-10-18 Thread Bill Schmidt
On Tue, 2016-10-18 at 05:13 +0200, Markus Trippelsdorf wrote:
> On 2016.10.17 at 17:23 -0500, Bill Schmidt wrote:
> > Hi,
> > 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77916 identifies a situation
> > where SLSR will ICE when exposed to a cast from integer to pointer.  This
> > is because we try to convert a PLUS_EXPR with an addend of -1 * S into a
> > MINUS_EXPR with a subtrahend of S, but the base operand is unexpectedly
> > of pointer type.  This patch recognizes when pointer arithmetic is taking
> > place and ensures that we use a POINTER_PLUS_EXPR at all such times.  In
> > the case of the PR, this occurs in the logic where the stride S is a known
> > constant value, but the same problem could occur when it is an SSA_NAME
> > without known value.  Both possibilities are handled here.
> > 
> > Fixing the code to ensure that the unknown stride case always uses an 
> > initializer for a negative increment allows us to remove the stopgap fix
> > added for PR77937 as well.
> > 
> > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> > regressions, committed.
> 
> Perhaps you should consider building ffmpeg with -O3 -march=amdfam10 on
> X86_64 before committing these patches, because you broke it for the
> third time in the last couple of days.

Sorry, sorry.  I did intend to build another stage1 cross and try this,
but it just slipped my mind.  Looks like I'll need to put the stopgap
fix back in.  I'll do that shortly.

Meantime, -fno-slsr is a workaround.  If I have trouble building ffmpeg
with a cross, I'll check with you on testing a future patch.

Bill

> 
> markus@x4 ffmpeg % cat h264dsp.i
> extern int fn2(int);
> extern int fn3(int);
> int a, b, c;
> void fn1(long p1) {
>   char *d;
>   for (;; d += p1) {
> d[0] = fn2(1 >> c >> 1);
> fn2(c >> a);
> d[1] = fn3(d[1]) >> 1;
> d[6] = fn3(d[6] * b + 1) >> 1;
> d[7] = fn3(d[7] * b + 1) >> 1;
> d[8] = fn3(d[8] * b + 1) >> 1;
>   }
> }
> 
> markus@x4 ffmpeg % gcc -O3 -march=amdfam10 -c h264dsp.i
> h264dsp.i: In function ‘fn1’:
> h264dsp.i:4:6: internal compiler error: in replace_one_candidate, at 
> gimple-ssa-strength-reduction.c:3375
>  void fn1(long p1) {
>   ^~~
> 0x12773a9 replace_one_candidate
> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3375
> 0x127af77 replace_profitable_candidates
> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3486
> 0x127aeeb replace_profitable_candidates
> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3495
> 0x127f3ee analyze_candidates_and_replace
> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3574
> 0x127f3ee execute
> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3648
> 
> 
> 
> 




Re: [PATCH] PR77895: DWARF: Emit DW_AT_comp_dir in all cases, even if source is an absolute path

2016-10-18 Thread Richard Biener
On Tue, Oct 18, 2016 at 2:35 PM, Ximin Luo  wrote:
> Richard Biener:
>> On Mon, Oct 17, 2016 at 11:44 PM, Mike Stump  wrote:
>>> On Oct 17, 2016, at 2:38 PM, Ximin Luo  wrote:

 Mike Stump:
> On Oct 17, 2016, at 11:00 AM, Ximin Luo  wrote:
>> Therefore, it is better to emit it in all circumstances, in case the 
>> reader needs to know what the working
>> directory was at compile-time.
>
> I can't help but wonder if this would break ccache some?
>

 Could you explain this in some more detail? At the moment, GCC will 
 already emit DW_AT_name with an absolute path (in the scenario that this 
 patch is relevant to). How does ccache work around this at the moment? 
 (Does it use debug-prefix-map? In which case, this also affects 
 DW_AT_comp_dir, so my patch should be fine.)
>>>
>>> If you compile the same file, but in a different directory, I wonder if cwd 
>>> will cause the cache entry to not be reused.
>>>
>>> I expect one of the ccache people that are around would just know if it 
>>> will care at all.
>>
>> I believe ccache compares preprocessed source, definitely _not_ DWARF
>> output, so this shouldn't break anything there.
>> It might result in different object file output but as the reporter
>> figured due to a bug in dwarf2out.c we end up generating
>> DW_AT_comp_dir in almost all cases already.
>>
>> I think the patch is ok but it misses a ChangeLog entry.  How did you
>> test the patch? (bootstrapped and tested on ...?)
>>
>
> Thanks, I'll add the Changelog entry. My computer isn't very powerful, so I 
> didn't bootstrap it yet, I only tested it on a stage1 compiler, on Debian 
> testing/unstable. I'll find some time to bootstrap it and test it fully over 
> the next few days.
>
> Shall I also get rid of the Darwin force_at_comp_dir stuff? Looking into it a 
> bit more, my patch basically obsoletes the need for this so I can delete that 
> as well.

That would be nice.

Richard.

> X
>
> --
> GPG: ed25519/56034877E1F87C35
> GPG: rsa4096/1318EFAC5FBBDBCE
> https://github.com/infinity0/pubkeys.git


Re: [PATCH] Make EVRP propagate into PHIs and remove dead stmts

2016-10-18 Thread Trevor Saunders
On Tue, Oct 18, 2016 at 02:34:58PM +0200, Richard Biener wrote:
> 
> The following patch makes EVRP remove stmts that will become dead
> after propagation.  For this to work we have to propagate into PHIs
> (sth we missed as well).
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> Richard.
> 
> 2016-10-18  Richard Biener  
> 
>   * tree-vrp.c (evrp_dom_walker::evrp_dom_walker): Initialize
>   stmts_to_remove.
>   (evrp_dom_walker::~evrp_dom_walker): Free it.
>   (evrp_dom_walker::stmts_to_remove): Add.
>   (evrp_dom_walker::before_dom_children): Mark PHIs and stmts
>   whose output we fully propagate for removal.  Propagate
>   into BB destination PHI arguments.
>   (execute_early_vrp): Remove queued stmts.  Dump value ranges
>   before stmt removal.
> 
> Index: gcc/tree-vrp.c
> ===
> --- gcc/tree-vrp.c(revision 241302)
> +++ gcc/tree-vrp.c(working copy)
> @@ -10643,11 +10643,13 @@ public:
>  : dom_walker (CDI_DOMINATORS), stack (10)
>  {
>stmts_to_fixup.create (0);
> +  stmts_to_remove.create (0);
>need_eh_cleanup = BITMAP_ALLOC (NULL);
>  }
>~evrp_dom_walker ()
>  {
>stmts_to_fixup.release ();
> +  stmts_to_remove.release ();
>BITMAP_FREE (need_eh_cleanup);
>  }
>virtual edge before_dom_children (basic_block);
> @@ -10660,6 +10662,7 @@ public:
>auto_vec > stack;
>bitmap need_eh_cleanup;
>vec stmts_to_fixup;
> +  vec stmts_to_remove;

That might as well be an auto_vec right?

>  };
>  
>  
> @@ -10769,6 +10772,15 @@ evrp_dom_walker::before_dom_children (ba
>else
>   set_value_range_to_varying (&vr_result);
>update_value_range (lhs, &vr_result);
> +
> +  /* Mark PHIs whose lhs we fully propagate for removal.  */
> +  tree val;
> +  if ((val = op_with_constant_singleton_value_range (lhs))
> +   && may_propagate_copy (lhs, val))

wouldn't it be clearer to write that as

tree val = op_with_constant_singleton_value_range (lhs);
if (val && may_propagate_copy (lhs, val))

> + {
> +   stmts_to_remove.safe_push (phi);
> +   continue;
> + }
>  }
>  
>edge taken_edge = NULL;
> @@ -10806,7 +10818,6 @@ evrp_dom_walker::before_dom_children (ba
> update_value_range (output, &vr);
> vr = *get_value_range (output);
>  
> -
> /* Set the SSA with the value range.  */
> if (INTEGRAL_TYPE_P (TREE_TYPE (output)))
>   {
> @@ -10824,6 +10835,17 @@ evrp_dom_walker::before_dom_children (ba
>  && range_includes_zero_p (vr.min,
>vr.max) == 1)))
>   set_ptr_nonnull (output);
> +
> +   /* Mark stmts whose output we fully propagate for removal.  */
> +   tree val;
> +   if ((val = op_with_constant_singleton_value_range (output))
> +   && may_propagate_copy (output, val)
> +   && !stmt_could_throw_p (stmt)
> +   && !gimple_has_side_effects (stmt))

similar.

Thanks!

Trev



libgo patch committed: scan caller-saved regs for non-split-stack

2016-10-18 Thread Ian Lance Taylor
While testing a libgo patch on Solaris, which does not support
split-stack, I ran across a bug in the handling of caller-saved
registers for the garbage collector.  For non-split-stack systems,
runtime_mcall is responsible for saving all caller-saved registers on
the stack so that the GC stack scan will see them.  It does this by
calling __builtin_unwind_init and setting the g's gcnextsp field to
point to the current stack.  The garbage collector then scans the
stack from gcnextsp to the top of stack.

Unfortunately, the code was setting gcnextsp to point to
runtime_mcall's argument, which meant that even though runtime_mcall
was careful to store all caller-saved registers on the stack, the GC
never saw them.  This is, of course, only a problem if a value lives
only in a caller-saved register, and not anywhere else on the stack or
heap.  And it is only a problem if that caller-saved register manages
to make it all the way down to runtime_mcall without being saved by
any function on the way.  This is moderately unlikely but it turns out
that the recent changes to keep values on the stack when compiling the
runtime package caused it to happen for the local variable `s` in
`notifyListWait` in runtime/sema.go.  That function calls goparkunlock
which is simple enough to not require all registers, and itself calls
runtime_mcall.  So it was possible for `s` to be released by the GC
before the goroutine returned from goparkunlock, which eventually
caused a dangling pointerto be passed to releaseSudog.

This is not a problem on split-stack systems, which use
__splitstack_get_context, which saves a stack pointer low enough on
the stack to scan the registers saved by runtime_mcall.

This patch fixes the problem by introducing a local variable which
should be on the stack below the saved registers.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu and
i386-sun-solaris.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 241261)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-314ba28067383516c213ba84c931f93325a48c39
+0a49b1dadd862215bdd38b9725a6e193b0d8fd0b
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/runtime/proc.c
===
--- libgo/runtime/proc.c(revision 241261)
+++ libgo/runtime/proc.c(working copy)
@@ -283,6 +283,9 @@ runtime_mcall(void (*pfn)(G*))
 {
M *mp;
G *gp;
+#ifndef USING_SPLIT_STACK
+   void *afterregs;
+#endif
 
// Ensure that all registers are on the stack for the garbage
// collector.
@@ -298,7 +301,9 @@ runtime_mcall(void (*pfn)(G*))
 #ifdef USING_SPLIT_STACK
__splitstack_getcontext(&g->stackcontext[0]);
 #else
-   gp->gcnextsp = &pfn;
+   // We have to point to an address on the stack that is
+   // below the saved registers.
+   gp->gcnextsp = &afterregs;
 #endif
gp->fromgogo = false;
getcontext(ucontext_arg(&gp->context[0]));


Re: [PATCH 7/7] make targetm.gen_ccmp{first,next} take rtx_insn **

2016-10-18 Thread Trevor Saunders
On Tue, Oct 18, 2016 at 01:25:55PM +0200, Bernd Schmidt wrote:
> On 10/17/2016 09:46 PM, tbsaunde+...@tbsaunde.org wrote:
> > From: Trevor Saunders 
> > 
> > gcc/ChangeLog:
> > 
> > 2016-10-17  Trevor Saunders  
> > 
> > * ccmp.c (expand_ccmp_expr_1): Adjust.
> > (expand_ccmp_expr): Likewise.
> > (expand_ccmp_next): Likewise.
> > * config/aarch64/aarch64.c (aarch64_gen_ccmp_next): Likewise.
> > (aarch64_gen_ccmp_first): Likewise.
> > * doc/tm.texi: Regenerate.
> > * target.def (gen_ccmp_first): Change argument types to rtx_insn *.
> > (gen_ccmp_next): Likewise.
> 
> Looks reasonable, but has this been tested on aarch64? I think that's a
> prerequisite for this patch.

So far I've only checked that I can build a compiler targeting aarch64,
which given the changes in the patch theoretically should be enough.
However it shouldn't be hard to actually test it on an aarch64 machine
so I'll do that.

Thanks!

Trev

> 
> 
> Bernd


Re: [PATCH] Fix PR77916

2016-10-18 Thread Markus Trippelsdorf
On 2016.10.18 at 08:15 -0500, Bill Schmidt wrote:
> On Tue, 2016-10-18 at 05:13 +0200, Markus Trippelsdorf wrote:
> > On 2016.10.17 at 17:23 -0500, Bill Schmidt wrote:
> > > Hi,
> > > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77916 identifies a situation
> > > where SLSR will ICE when exposed to a cast from integer to pointer.  This
> > > is because we try to convert a PLUS_EXPR with an addend of -1 * S into a
> > > MINUS_EXPR with a subtrahend of S, but the base operand is unexpectedly
> > > of pointer type.  This patch recognizes when pointer arithmetic is taking
> > > place and ensures that we use a POINTER_PLUS_EXPR at all such times.  In
> > > the case of the PR, this occurs in the logic where the stride S is a known
> > > constant value, but the same problem could occur when it is an SSA_NAME
> > > without known value.  Both possibilities are handled here.
> > > 
> > > Fixing the code to ensure that the unknown stride case always uses an 
> > > initializer for a negative increment allows us to remove the stopgap fix
> > > added for PR77937 as well.
> > > 
> > > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> > > regressions, committed.
> > 
> > Perhaps you should consider building ffmpeg with -O3 -march=amdfam10 on
> > X86_64 before committing these patches, because you broke it for the
> > third time in the last couple of days.
> 
> Sorry, sorry.  I did intend to build another stage1 cross and try this,
> but it just slipped my mind.  Looks like I'll need to put the stopgap
> fix back in.  I'll do that shortly.
> 
> Meantime, -fno-slsr is a workaround.  If I have trouble building ffmpeg
> with a cross, I'll check with you on testing a future patch.

I you wish I can send you a tarball with the preprocessed *.i files from
ffmpeg, so that you can use a stage1 cross on them.

-- 
Markus


Re: [PATCH] Fix PR77916

2016-10-18 Thread Bill Schmidt
Hi,

The previous solution for PR77916 was inadequately tested, for which I
sincerely apologize.  I've reinstated the stopgap fix previously
reverted, as follows.

Thanks for your patience,
Bill


2016-10-18  Bill Schmidt  

PR tree-optimization/77916
* gimple-ssa-strength-reduction.c (analyze_increments): Reinstate
stopgap fix, as pointers with -1 increment are still broken.

Index: gcc/gimple-ssa-strength-reduction.c
===
--- gcc/gimple-ssa-strength-reduction.c (revision 241302)
+++ gcc/gimple-ssa-strength-reduction.c (working copy)
@@ -2825,6 +2825,10 @@ analyze_increments (slsr_cand_t first_dep, machine
   && !POINTER_TYPE_P (first_dep->cand_type)))
incr_vec[i].cost = COST_NEUTRAL;
 
+  /* FIXME: Still having trouble with pointers with a -1 increment.  */
+  else if (incr == -1 && POINTER_TYPE_P (first_dep->cand_type))
+   incr_vec[i].cost = COST_INFINITE;
+
   /* FORNOW: If we need to add an initializer, give up if a cast from
 the candidate's type to its stride's type can lose precision.
 This could eventually be handled better by expressly retaining the




Re: [PATCH 3/7] use rtx_insn * more

2016-10-18 Thread Trevor Saunders
On Tue, Oct 18, 2016 at 01:18:42PM +0200, Bernd Schmidt wrote:
> On 10/17/2016 09:46 PM, tbsaunde+...@tbsaunde.org wrote:
> >  {
> > -  rtx r0, r16, eqv, tga, tp, insn, dest, seq;
> > +  rtx r0, r16, eqv, tga, tp, dest, seq;
> > +  rtx_insn *insn;
> > 
> >switch (tls_symbolic_operand_type (x))
> > {
> > @@ -1025,66 +1026,70 @@ alpha_legitimize_address_1 (rtx x, rtx scratch, 
> > machine_mode mode)
> >   break;
> > 
> > case TLS_MODEL_GLOBAL_DYNAMIC:
> > - start_sequence ();
> > + {
> > +   start_sequence ();
> > 
> > - r0 = gen_rtx_REG (Pmode, 0);
> > - r16 = gen_rtx_REG (Pmode, 16);
> > - tga = get_tls_get_addr ();
> > - dest = gen_reg_rtx (Pmode);
> > - seq = GEN_INT (alpha_next_sequence_number++);
> > +   r0 = gen_rtx_REG (Pmode, 0);
> > +   r16 = gen_rtx_REG (Pmode, 16);
> > +   tga = get_tls_get_addr ();
> > +   dest = gen_reg_rtx (Pmode);
> > +   seq = GEN_INT (alpha_next_sequence_number++);
> > 
> > - emit_insn (gen_movdi_er_tlsgd (r16, pic_offset_table_rtx, x, seq));
> > - insn = gen_call_value_osf_tlsgd (r0, tga, seq);
> > - insn = emit_call_insn (insn);
> > - RTL_CONST_CALL_P (insn) = 1;
> > - use_reg (&CALL_INSN_FUNCTION_USAGE (insn), r16);
> > +   emit_insn (gen_movdi_er_tlsgd (r16, pic_offset_table_rtx, x, seq));
> > +   rtx val = gen_call_value_osf_tlsgd (r0, tga, seq);
> 
> Since this doesn't consistently declare variables at the point of
> initialization, might as well put val into the list of variables at the top,
> and avoid reindentation that way. There are several such reindented blocks,
> and the patch would be a lot easier to review without this.

I do really prefer reading code where variables are declared at first
use, but I'll agree with the tools we are using this can be hard to
review, sorry about that.

> Alternatively, split it up a bit more into obvious/nonobvious parts.

yeah, I'll try to get that done soon.  fwiw a -b diff is below if you
find that better.

> > diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> > index 21bba0c..8e8fff4 100644
> > --- a/gcc/config/arc/arc.c
> > +++ b/gcc/config/arc/arc.c
> > @@ -4829,7 +4829,6 @@ static rtx
> >  arc_emit_call_tls_get_addr (rtx sym, int reloc, rtx eqv)
> >  {
> >rtx r0 = gen_rtx_REG (Pmode, R0_REG);
> > -  rtx insns;
> >rtx call_fusage = NULL_RTX;
> > 
> >start_sequence ();
> > @@ -4846,7 +4845,7 @@ arc_emit_call_tls_get_addr (rtx sym, int reloc, rtx 
> > eqv)
> >RTL_PURE_CALL_P (call_insn) = 1;
> >add_function_usage_to (call_insn, call_fusage);
> > 
> > -  insns = get_insns ();
> > +  rtx_insn *insns = get_insns ();
> >end_sequence ();
> 
> For example, stuff like this looks obvious enough that it can go in.

yeah, I think Jeff preapproved stuff like that a while back, but I just
lumped it in though really that wasn't very nice to review, and there
isn't really a reason for a human to review what a compiler can check
for us.

Thanks!

Trev

> 
> 
> Bernd


Re: [PATCH] Fix PR77916

2016-10-18 Thread Bill Schmidt
On Tue, 2016-10-18 at 15:30 +0200, Markus Trippelsdorf wrote:

> I you wish I can send you a tarball with the preprocessed *.i files from
> ffmpeg, so that you can use a stage1 cross on them.
> 

That would be very helpful, thanks!

Bill



Re: [PATCH 3/7] use rtx_insn * more

2016-10-18 Thread Bernd Schmidt

On 10/18/2016 03:54 PM, Trevor Saunders wrote:


I do really prefer reading code where variables are declared at first
use


In general, so do I, but in this case it's one variable out of a whole 
bunch, which makes the entire thing look a little inconsistent.



Bernd


Re: [PATCH] Make EVRP propagate into PHIs and remove dead stmts

2016-10-18 Thread Richard Biener
On Tue, 18 Oct 2016, Trevor Saunders wrote:

> On Tue, Oct 18, 2016 at 02:34:58PM +0200, Richard Biener wrote:
> > 
> > The following patch makes EVRP remove stmts that will become dead
> > after propagation.  For this to work we have to propagate into PHIs
> > (sth we missed as well).
> > 
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> > 
> > Richard.
> > 
> > 2016-10-18  Richard Biener  
> > 
> > * tree-vrp.c (evrp_dom_walker::evrp_dom_walker): Initialize
> > stmts_to_remove.
> > (evrp_dom_walker::~evrp_dom_walker): Free it.
> > (evrp_dom_walker::stmts_to_remove): Add.
> > (evrp_dom_walker::before_dom_children): Mark PHIs and stmts
> > whose output we fully propagate for removal.  Propagate
> > into BB destination PHI arguments.
> > (execute_early_vrp): Remove queued stmts.  Dump value ranges
> > before stmt removal.
> > 
> > Index: gcc/tree-vrp.c
> > ===
> > --- gcc/tree-vrp.c  (revision 241302)
> > +++ gcc/tree-vrp.c  (working copy)
> > @@ -10643,11 +10643,13 @@ public:
> >  : dom_walker (CDI_DOMINATORS), stack (10)
> >  {
> >stmts_to_fixup.create (0);
> > +  stmts_to_remove.create (0);
> >need_eh_cleanup = BITMAP_ALLOC (NULL);
> >  }
> >~evrp_dom_walker ()
> >  {
> >stmts_to_fixup.release ();
> > +  stmts_to_remove.release ();
> >BITMAP_FREE (need_eh_cleanup);
> >  }
> >virtual edge before_dom_children (basic_block);
> > @@ -10660,6 +10662,7 @@ public:
> >auto_vec > stack;
> >bitmap need_eh_cleanup;
> >vec stmts_to_fixup;
> > +  vec stmts_to_remove;
> 
> That might as well be an auto_vec right?
> 
> >  };
> >  
> >  
> > @@ -10769,6 +10772,15 @@ evrp_dom_walker::before_dom_children (ba
> >else
> > set_value_range_to_varying (&vr_result);
> >update_value_range (lhs, &vr_result);
> > +
> > +  /* Mark PHIs whose lhs we fully propagate for removal.  */
> > +  tree val;
> > +  if ((val = op_with_constant_singleton_value_range (lhs))
> > + && may_propagate_copy (lhs, val))
> 
> wouldn't it be clearer to write that as
> 
> tree val = op_with_constant_singleton_value_range (lhs);
> if (val && may_propagate_copy (lhs, val))
> 
> > +   {
> > + stmts_to_remove.safe_push (phi);
> > + continue;
> > +   }
> >  }
> >  
> >edge taken_edge = NULL;
> > @@ -10806,7 +10818,6 @@ evrp_dom_walker::before_dom_children (ba
> >   update_value_range (output, &vr);
> >   vr = *get_value_range (output);
> >  
> > -
> >   /* Set the SSA with the value range.  */
> >   if (INTEGRAL_TYPE_P (TREE_TYPE (output)))
> > {
> > @@ -10824,6 +10835,17 @@ evrp_dom_walker::before_dom_children (ba
> >&& range_includes_zero_p (vr.min,
> >  vr.max) == 1)))
> > set_ptr_nonnull (output);
> > +
> > + /* Mark stmts whose output we fully propagate for removal.  */
> > + tree val;
> > + if ((val = op_with_constant_singleton_value_range (output))
> > + && may_propagate_copy (output, val)
> > + && !stmt_could_throw_p (stmt)
> > + && !gimple_has_side_effects (stmt))
> 
> similar.

Fixed.  Testing the following.

Richard.

2016-10-18  Richard Biener  

* tree-vrp.c (evrp_dom_walker::evrp_dom_walker): Initialize
stmts_to_remove.
(evrp_dom_walker::~evrp_dom_walker): Free it.
(evrp_dom_walker::stmts_to_remove): Add.
(evrp_dom_walker::before_dom_children): Mark PHIs and stmts
whose output we fully propagate for removal.  Propagate
into BB destination PHI arguments.
(execute_early_vrp): Remove queued stmts.  Dump value ranges
before stmt removal.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 241302)
+++ gcc/tree-vrp.c  (working copy)
@@ -10642,12 +10642,10 @@ public:
   evrp_dom_walker ()
 : dom_walker (CDI_DOMINATORS), stack (10)
 {
-  stmts_to_fixup.create (0);
   need_eh_cleanup = BITMAP_ALLOC (NULL);
 }
   ~evrp_dom_walker ()
 {
-  stmts_to_fixup.release ();
   BITMAP_FREE (need_eh_cleanup);
 }
   virtual edge before_dom_children (basic_block);
@@ -10659,7 +10657,8 @@ public:
   /* Cond_stack holds the old VR.  */
   auto_vec > stack;
   bitmap need_eh_cleanup;
-  vec stmts_to_fixup;
+  auto_vec stmts_to_fixup;
+  auto_vec stmts_to_remove;
 };
 
 
@@ -10769,6 +10768,11 @@ evrp_dom_walker::before_dom_children (ba
   else
set_value_range_to_varying (&vr_result);
   update_value_range (lhs, &vr_result);
+
+  /* Mark PHIs whose lhs we fully propagate for removal.  */
+  tree val = op_with_constant_singleton_value_range (lhs);
+  if (val && may_propagate_copy (lhs, val))
+   stmts_to_remove.safe_push (phi)

[PATCH] Use RPO order for domwalk dominator children sort

2016-10-18 Thread Richard Biener

For

extern void baz ();
extern void boo ();
extern void bla ();
int a[100];
void foo (int n)
{
  for (int j = 0; j < n; ++j)
{
  if (a[j+5])
{
  if (a[j])
break;
  baz ();
}
  else
bla ();
  boo ();
}
}

we happen to visit BBs in an unfortunate order so that we do not
have all predecessors visited when visiting the BB of boo().  This
is because domwalk uses a postorder on the inverted graph to
order dominator children -- that doesn't play well with loops
(as we've figured elsewhere before).  The following makes us use
RPO order instead.

This should help EVRP and DOM.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-10-18  Richard Biener  

* domwalk.c (dom_walker::walk): Use RPO order.

Index: gcc/domwalk.c
===
--- gcc/domwalk.c   (revision 241300)
+++ gcc/domwalk.c   (working copy)
@@ -243,7 +243,7 @@ dom_walker::walk (basic_block bb)
   if (m_dom_direction == CDI_DOMINATORS)
 {
   postorder = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
-  postorder_num = inverted_post_order_compute (postorder);
+  postorder_num = pre_and_rev_post_order_compute (NULL, postorder, true);
   bb_postorder = XNEWVEC (int, last_basic_block_for_fn (cfun));
   for (int i = 0; i < postorder_num; ++i)
bb_postorder[postorder[i]] = i;


Re: [patch] Fix PHI optimization issue with boolean types

2016-10-18 Thread Jeff Law

On 10/18/2016 02:35 AM, Richard Biener wrote:

On Tue, Oct 18, 2016 at 8:35 AM, Eric Botcazou  wrote:

Hi,

this is a regression present on the mainline and 6 branch: the compiler now
generates wrong code for the attached testcase at -O because of an internal
conflict about boolean types.  The sequence is as follows.  In .mergephi3:

  boolean _22;
  p__enum res;

  :
  if (_22 != 0)
goto ;
  else
goto ;

  :

  :
  # res_17 = PHI <2(8), 0(9), 1(10)>

is turned into:

COND_EXPR in block 9 and PHI in block 11 converted to straightline code.
PHI res_17 changed to factor conversion out from COND_EXPR.
New stmt with CAST that defines res_17.

  boolean _33;

  :
  # _33 = PHI <2(8), _22(9)>
  res_17 = (p__enum) _33;

[...]

  :
  if (res_17 != 0)
goto ;
  else
goto ;

  :
  _12 = res_17 == 2;
  _13 = (integer) _12

in .phiopt1.  So boolean _33 can have value 2.  Later forwprop3 propagates _33
into the uses of res_17:

  :
  if (_33 != 0)
goto ;
  else
goto ;

  :
  _12 = _33 == 2;
  _13 = (integer) _12;

and DOM3 deduces:

  :
  _12 = 0;
  _13 = 0;

because it computes that _33 has value 1 in BB 13 since it's a boolean.

The problem was introduced by the new factor_out_conditional_conversion:

  /* If arg1 is an INTEGER_CST, fold it to new type.  */
  if (INTEGRAL_TYPE_P (TREE_TYPE (new_arg0))
  && int_fits_type_p (arg1, TREE_TYPE (new_arg0)))
{
  if (gimple_assign_cast_p (arg0_def_stmt))
new_arg1 = fold_convert (TREE_TYPE (new_arg0), arg1);
  else
return NULL;
}
  else
return NULL

int_fits_type_p is documented as taking only INTEGER_TYPE, but is invoked
on constant 2 and a BOOLEAN_TYPE and returns true.

BOOLEAN_TYPE is special in Ada: it has precision 8 and range [0;255] so the
outcome of int_fits_type_p is not unreasonable.  But this goes against the
various transformations applied to boolean types in the compiler, which all
assume that they can only take values 0 or 1.

Hence the attached fix (which should be a no-op except for Ada), tested on
x86_64-suse-linux, OK for mainline and 6 branch?


Hmm, I wonder if the patch is a good approach as you are likely only papering
over a single of possibly very many affected places (wi::fits_to_tree_p comes
immediately to my mind).  I suppose a better way would be for Ada to not
lie about those types and not use BOOLEAN_TYPE but INTEGER_TYPE.
Because BOOLEAN_TYPE types only have two values as documented in
tree.def:

/* Boolean type (true or false are the only values).  Looks like an
   INTEGRAL_TYPE.  */
DEFTREECODE (BOOLEAN_TYPE, "boolean_type", tcc_type, 0)

There are not many references to BOOLEAN_TYPE in ada/gcc-interface
thus it shouldn't be hard to change ... (looks like Ada might even prefer
ENUMERAL_TYPE here).

Thanks,
Richard.



2016-10-18  Eric Botcazou  

* tree.c (int_fits_type_p): Accept only 0 and 1 for boolean types.
Alternately we could check the precision/range in the optimizers.  We do 
that in various places because of this exact issue, I could well have 
missed one or more.


Though I would have a preference for fixing int_fits_type_p which seems 
like it'll catch the cases we care about without having to twiddle every 
optimizer.


jeff



RE: [RFC,PATCH,testsuite]

2016-10-18 Thread Matthew Fortune
 wrote:
> Richard Biener  writes:
> > On Sun, Nov 23, 2014 at 10:15 AM, Matthew Fortune
> >  wrote:
> > > I'd therefore like to apply the following. Any suggestions on the
> > > testing that this needs? Would a build + regression run of GCC with
> > > binutils configured --disable-plugins be sufficient?
> >
> > Sure, that sounds reasonable.  Note that I think the patch is ok
> > anyway - passing a "redundant" -ffat-lto-objects won't break anything.
> >
> > Thus, ok.
> 
> OK, I wasn't sure if there were any scenarios where an explicit -ffat-
> lto-objects would make a difference. I keep getting lost when trawling
> through the LTO option handling logic.
> 
> I'll run a quick test anyway and commit.

It seems I never committed this. Now committed! r241306

Matthew

> 
> Thanks,
> Matthew
> 
> >
> > > Matthew
> > >
> > > gcc/testsuite/
> > >
> > > * lib/gcc-dg.exp: Set gcc_force_conventional_output for
> whenever
> > > LTO is used.
> > >
> > > diff --git a/gcc/testsuite/lib/gcc-dg.exp
> > > b/gcc/testsuite/lib/gcc-dg.exp index 6df8ae1..8d5bf9b 100644
> > > --- a/gcc/testsuite/lib/gcc-dg.exp
> > > +++ b/gcc/testsuite/lib/gcc-dg.exp
> > > @@ -88,13 +88,13 @@ if [check_effective_target_lto] {
> > >   { -O2 -flto -fno-use-linker-plugin -flto-partition=none }
> \
> > >   { -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects }
> > >]
> > > -  set gcc_force_conventional_output "-ffat-lto-objects"
> > >  } else {
> > >set LTO_TORTURE_OPTIONS [list \
> > >   { -O2 -flto -flto-partition=none } \
> > >   { -O2 -flto }
> > >]
> > >  }
> > > +set gcc_force_conventional_output "-ffat-lto-objects"
> > >  }
> > >
> > >  global orig_environment_saved


[Patch, fortran] PR69566 - Failure of SELECT TYPE with unlimited polymorphic function result

2016-10-18 Thread Paul Richard Thomas
Dear All,

This bug was caused by 'associate name' and 'associate entity'
expressions being incomplete when the 'selector' was an intrinsic
function result. I tried to fix this at source, in match_select _type
and gfc_get_variable_expr, but caused a vast number of breakages.
Undoubtedly, this would be the ecologically sound way to proceed but
applying fixups in resolve_select_type, gfc_conv_class_to_class and
trans_associate_var proved to be the path of least resistance.

Depending on which form of select type is used, the defective
expressions either lacked the correct value for rank, a full array
reference, a symbol with an array spec or had type BT_DERIVED for a
polymorphic symbol; these either singly or in combination depending of
the form of select type.

The patch, taken with the ChangeLogs, is fairly self-explanatory.
Please note that I have suppressed whitespace changes (suppression of
trailing blanks), so use patch with the -l option.

Bootstraps and regtests on FC21/x86_64 - OK for trunk?

Best regards

Paul

2016-10-18  Paul Thomas  

PR fortran/69566
* resolve.c (fixup_array_ref): New function.
(resolve_select_type): Gather up the rank and array reference,
if any, from the selector. Fix up the 'associate name' and the
'associate entities' as necessary.
* trans-expr.c (gfc_conv_class_to_class): If the symbol backend
decl is a FUNCTION_DECL, use the 'fake_result_decl' instead.
* trans-stmt.c (trans_associate_var): Extend 'unlimited' to
include expressions which are themsleves not unlimited
polymorphic but the symbol is.

2016-10-18  Paul Thomas  

PR fortran/69566
* gfortran.dg/select_type_37.f03: New test.
Index: gcc/fortran/resolve.c
===
*** gcc/fortran/resolve.c   (revision 241274)
--- gcc/fortran/resolve.c   (working copy)
*** resolve_assoc_var (gfc_symbol* sym, bool
*** 8327,8332 
--- 8327,8368 
  }


+ /* Ensure that SELECT TYPE expressions have the correct rank and a full
+array reference, where necessary.  The symbols are artificial and so
+the dimension attribute and arrayspec are also set.  */
+ static void
+ fixup_array_ref (gfc_expr **expr1, gfc_expr *expr2,
+int rank, gfc_ref *ref)
+ {
+   gfc_ref *nref = (*expr1)->ref;
+   gfc_symbol *sym1 = (*expr1)->symtree->n.sym;
+   gfc_symbol *sym2 = expr2 ? expr2->symtree->n.sym : NULL;
+   (*expr1)->rank = rank;
+   if (sym1->ts.type == BT_CLASS)
+ {
+   CLASS_DATA (sym1)->attr.dimension = 1;
+   if (CLASS_DATA (sym1)->as == NULL && sym2)
+   CLASS_DATA (sym1)->as
+   = gfc_copy_array_spec (CLASS_DATA (sym2)->as);
+ }
+   else
+ {
+   sym1->attr.dimension = 1;
+   if (sym1->as == NULL && sym2)
+   sym1->as = gfc_copy_array_spec (sym2->as);
+ }
+
+   for (; nref; nref = nref->next)
+ if (nref->next == NULL)
+   break;
+
+   if (ref && nref && nref->type != REF_ARRAY)
+ nref->next = gfc_copy_ref (ref);
+   else if (ref && !nref)
+ (*expr1)->ref = gfc_copy_ref (ref);
+ }
+
+
  /* Resolve a SELECT TYPE statement.  */

  static void
*** resolve_select_type (gfc_code *code, gfc
*** 8341,8346 
--- 8377,8384 
gfc_namespace *ns;
int error = 0;
int charlen = 0;
+   int rank = 0;
+   gfc_ref* ref = NULL;

ns = code->ext.block.ns;
gfc_resolve (ns);
*** resolve_select_type (gfc_code *code, gfc
*** 8468,8473 
--- 8506,8536 
else
  code->ext.block.assoc = NULL;

+   /* Ensure that the selector rank and arrayspec are available to
+  correct expressions in which they might be missing.  */
+   if (code->expr2 && code->expr2->rank)
+ {
+   rank = code->expr2->rank;
+   for (ref = code->expr2->ref; ref; ref = ref->next)
+   if (ref->next == NULL)
+ break;
+   if (ref && ref->type == REF_ARRAY)
+   ref = gfc_copy_ref (ref);
+
+   /* Fixup expr1 if necessary.  */
+   if (rank)
+   fixup_array_ref (&code->expr1, code->expr2, rank, ref);
+ }
+   else if (code->expr1->rank)
+ {
+   rank = code->expr1->rank;
+   for (ref = code->expr1->ref; ref; ref = ref->next)
+   if (ref->next == NULL)
+ break;
+   if (ref && ref->type == REF_ARRAY)
+   ref = gfc_copy_ref (ref);
+ }
+
/* Add EXEC_SELECT to switch on type.  */
new_st = gfc_get_code (code->op);
new_st->expr1 = code->expr1;
*** resolve_select_type (gfc_code *code, gfc
*** 8533,8539 
--- 8596,8607 
st->n.sym->assoc->target = gfc_get_variable_expr (code->expr1->symtree);
st->n.sym->assoc->target->where = code->expr1->where;
if (c->ts.type != BT_CLASS && c->ts.type != BT_UNKNOWN)
+   {
  gfc_add_data_component (st->n.sym->assoc->target);
+ /* Fixup the target expression if necessary.  */
+ if (rank)
+   fixup_array_ref (&st->n.sym->assoc->target, NU

Re: [PATCH] Reduce stack usage in sha512 (PR target/77308)

2016-10-18 Thread Bernd Edlinger
On 10/18/16 10:36, Christophe Lyon wrote:
>
> I am seeing a lot of regressions since this patch was committed:
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241273/report-build-info.html
>
> (you can click on "REGRESSED" to see the list of regressions, "sum"
> and "log" to download
> the corresponding .sum/.log)
>
> Thanks,
>
> Christophe
>

Oh, sorry, I have completely missed that.
Unfortunately I have tested one of the good combinations.

I have not considered the case that in and out could be
the same register.  But that happens here.


This should solve it.

Can you give it a try?



Thanks
Bernd.
2016-10-18  Bernd Edlinger  

	* config/arm/arm.c (arm_emit_coreregs_64bit_shift): Clear the result
	register only if "in" and "out" are different registers.

--- gcc/config/arm/arm.c.orig	2016-10-17 11:00:34.045673223 +0200
+++ gcc/config/arm/arm.c	2016-10-18 14:53:06.710101327 +0200
@@ -29218,8 +29218,10 @@ arm_emit_coreregs_64bit_shift (enum rtx_
 
 	  /* Clearing the out register in DImode first avoids lots
 	 of spilling and results in less stack usage.
-	 Later this redundant insn is completely removed.  */
-	  emit_insn (SET (out, const0_rtx));
+	 Later this redundant insn is completely removed.
+	 Do that only if "in" and "out" are different registers.  */
+	  if (REG_P (out) && REG_P (in) && REGNO (out) != REGNO (in))
+	emit_insn (SET (out, const0_rtx));
 	  emit_insn (SET (out_down, LSHIFT (code, in_down, amount)));
 	  emit_insn (SET (out_down,
 			  ORR (REV_LSHIFT (code, in_up, reverse_amount),
@@ -29231,11 +29233,14 @@ arm_emit_coreregs_64bit_shift (enum rtx_
 	  /* Shifts by a constant greater than 31.  */
 	  rtx adj_amount = GEN_INT (INTVAL (amount) - 32);
 
-	  emit_insn (SET (out, const0_rtx));
+	  if (REG_P (out) && REG_P (in) && REGNO (out) != REGNO (in))
+	emit_insn (SET (out, const0_rtx));
 	  emit_insn (SET (out_down, SHIFT (code, in_up, adj_amount)));
 	  if (code == ASHIFTRT)
 	emit_insn (gen_ashrsi3 (out_up, in_up,
 GEN_INT (31)));
+	  else
+	emit_insn (SET (out_up, const0_rtx));
 	}
 }
   else


[PATCH, i386]: Fix PR 77991, ICE on x32 in plus_constant, at explow.c

2016-10-18 Thread Uros Bizjak
Hello!

We have to return Pmode RTX from legitimize_tls_address. There was a
path that returned DImode, when SImode was expected. Since the code
deals with various linker bugs, let's leave the generated sequence
as-is and just convert it to Pmode before return.

2016-10-18  Uros Bizjak  

PR target/77991
* config/i386/i386.c (legitimize_tls_address)
: For TARGET_64BIT || TARGET_ANY_GNU_TLS
convert dest to Pmode if different than Pmode.

testsuite/ChangeLog:

2016-10-18  Uros Bizjak  

PR target/77991
* gcc.target/i386/pr77991.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Patch was committed to mainline SVN and will be backported to other
release branches.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 241216)
+++ config/i386/i386.c  (working copy)
@@ -16357,7 +16357,9 @@ legitimize_tls_address (rtx x, enum tls_model mode
  base = get_thread_pointer (tp_mode,
 for_mov || !TARGET_TLS_DIRECT_SEG_REFS);
  off = force_reg (tp_mode, off);
- return gen_rtx_PLUS (tp_mode, base, off);
+ dest = gen_rtx_PLUS (tp_mode, base, off);
+ if (tp_mode != Pmode)
+   dest = convert_to_mode (Pmode, dest, 1);
}
   else
{
Index: testsuite/gcc.target/i386/pr77991.c
===
--- testsuite/gcc.target/i386/pr77991.c (nonexistent)
+++ testsuite/gcc.target/i386/pr77991.c (working copy)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-require-effective-target maybe_x32 } */
+/* { dg-options "-O2 -mx32 -maddress-mode=short" } */
+
+struct rcu_reader_data
+{
+  unsigned ctr;
+  _Bool waiting;
+}
+
+extern __thread rcu_reader;
+
+void rcu_read_lock()
+{
+  struct rcu_reader_data *x = &rcu_reader;
+  _Bool val = 0;
+
+  __atomic_store(&x->waiting, &val, 0);
+}


Re: [PATCH] Fix computation of register limit for -fsched-pressure

2016-10-18 Thread Pat Haugen
On 10/18/2016 05:31 AM, Maxim Kuvyrkov wrote:
>> > I see your point and agree that current code isn't optimal.  However, I 
>> > don't think your patch is accurate either.  Consider 
>> > https://gcc.gnu.org/onlinedocs/gccint/Register-Basics.html and let's 
>> > assume that FIXED_REGISTERS in class CL is set for a third of the 
>> > registers, and CALL_USED_REGISTERS is set to "1" for another third of 
>> > registers.  So we have a third available for zero-cost allocation 
>> > (CALL_USED_REGISTERS-FIXED_REGISTERS), a third available for spill-cost 
>> > allocation (ALL_REGISTERS-CALL_USED_REGISTERS) and a third non-available 
>> > (FIXED_REGISTERS).
>> > 
>> > For a non-loop-single-basic-block function we should be targeting only the 
>> > third of register available at zero-cost -- correct?
Yes.

  This is what is done by the current code, but, apparently, by accident.  It 
seems that the right register count can be obtained with:
>> > 
>> >  for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i)
>> > -  if (call_used_regs[ira_class_hard_regs[cl][i]])
>> > -++call_used_regs_num[cl];
>> > +  if (!call_used_regs[ira_class_hard_regs[cl][i]]
>> > +   || fixed_regs[ira_class_hard_regs[cl][i]])
>> > +++call_saved_regs_num[cl];
>> > 
>> > Does this look correct to you?
> Thinking some more, it seems like fixed_regs should not be available to the 
> scheduler no matter what.  Therefore, the number of fixed registers should be 
> subtracted from ira_class_hard_regs_num[cl] without any scaling (entry_freq / 
> bb_freq).

Ahh, yes, I forgot about FIXED_REGISTERS being included in CALL_USED_REGISTERS. 
I agree they should be totally removed from the register limit calculation. 
I'll rework the patch.

Thanks,
Pat



Re: [Patch, fortran] PR69566 - Failure of SELECT TYPE with unlimited polymorphic function result

2016-10-18 Thread Andre Vehreschild
Hi Paul,

> Index: gcc/fortran/trans-stmt.c
> ===
> *** gcc/fortran/trans-stmt.c  (revision 241273)
> --- gcc/fortran/trans-stmt.c  (working copy)
> *** trans_associate_var (gfc_symbol *sym, gf
> *** 1517,1523 
>   && (gfc_is_class_scalar_expr (e)
>   || gfc_is_class_array_ref (e, NULL));
> 
> !   unlimited = UNLIMITED_POLY (e);
> 
> /* Assignments to the string length need to be generated, when
>( sym is a char array or
> --- 1517,1526 
>   && (gfc_is_class_scalar_expr (e)
>   || gfc_is_class_array_ref (e, NULL));
> 
> !   unlimited = UNLIMITED_POLY (e)
> ! || (e->symtree && e->symtree->n.sym
> ! && e->symtree->n.sym->ts.type == BT_CLASS
> ! && UNLIMITED_POLY (e->symtree->n.sym));> 

You can use 

   || (e->symtree && UNLIMITED_POLY (e->symtree->n.sym));

here. UNLIMITED_POLY does all the checks. I am still wondering whether this is
necessary? The symtree is set for expr_type == { EXPR_VARIABLE, EXPR_FUNCTION }
both of those should correctly resolve to an unlimited poly ts. Is this a
left-over? 

I am regression testing my polymorhpic class patch at the moment, therefore I
can't test.

> 
> /* Assignments to the string length need to be generated, when
>( sym is a char array or

- Andre

On Tue, 18 Oct 2016 16:29:54 +0200
Paul Richard Thomas  wrote:


> Dear All,
> 
> This bug was caused by 'associate name' and 'associate entity'
> expressions being incomplete when the 'selector' was an intrinsic
> function result. I tried to fix this at source, in match_select _type
> and gfc_get_variable_expr, but caused a vast number of breakages.
> Undoubtedly, this would be the ecologically sound way to proceed but
> applying fixups in resolve_select_type, gfc_conv_class_to_class and
> trans_associate_var proved to be the path of least resistance.
> 
> Depending on which form of select type is used, the defective
> expressions either lacked the correct value for rank, a full array
> reference, a symbol with an array spec or had type BT_DERIVED for a
> polymorphic symbol; these either singly or in combination depending of
> the form of select type.
> 
> The patch, taken with the ChangeLogs, is fairly self-explanatory.
> Please note that I have suppressed whitespace changes (suppression of
> trailing blanks), so use patch with the -l option.
> 
> Bootstraps and regtests on FC21/x86_64 - OK for trunk?
> 
> Best regards
> 
> Paul
> 
> 2016-10-18  Paul Thomas  
> 
> PR fortran/69566
> * resolve.c (fixup_array_ref): New function.
> (resolve_select_type): Gather up the rank and array reference,
> if any, from the selector. Fix up the 'associate name' and the
> 'associate entities' as necessary.
> * trans-expr.c (gfc_conv_class_to_class): If the symbol backend
> decl is a FUNCTION_DECL, use the 'fake_result_decl' instead.
> * trans-stmt.c (trans_associate_var): Extend 'unlimited' to
> include expressions which are themsleves not unlimited
> polymorphic but the symbol is.
> 
> 2016-10-18  Paul Thomas  
> 
> PR fortran/69566
> * gfortran.dg/select_type_37.f03: New test.


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: [PATCH] Reduce stack usage in sha512 (PR target/77308)

2016-10-18 Thread Christophe Lyon
On 18 October 2016 at 16:45, Bernd Edlinger  wrote:
> On 10/18/16 10:36, Christophe Lyon wrote:
>>
>> I am seeing a lot of regressions since this patch was committed:
>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241273/report-build-info.html
>>
>> (you can click on "REGRESSED" to see the list of regressions, "sum"
>> and "log" to download
>> the corresponding .sum/.log)
>>
>> Thanks,
>>
>> Christophe
>>
>
> Oh, sorry, I have completely missed that.
> Unfortunately I have tested one of the good combinations.
>
> I have not considered the case that in and out could be
> the same register.  But that happens here.
>
>
> This should solve it.
>
> Can you give it a try?
>

Sure, I have started the validations, it will take a few hours.

>
>
> Thanks
> Bernd.


Re: Early jump threading

2016-10-18 Thread James Greenhalgh
On Mon, Sep 19, 2016 at 11:22:27AM +0200, Jan Hubicka wrote:
> > On Mon, Sep 19, 2016 at 2:48 AM, Jan Hubicka  wrote:
> > > Hi,
> > > this is the patch compensating testsuite I commited after re-testing
> > > on x86_64-linux.
> > >
> > > Other placements of early_thread_jumps does not work veyr well (at least 
> > > in
> > > current implementation). Putting it before forwprop disables about 15% of
> > > threadings. Placing it after DCE makes inliner to not see much of benefits
> > > because threading requires a cleanup propagation+DCE after itself.
> > > So unless we extend threader to be smarter or add extra DCE cleanup, i 
> > > think
> > > this is best placement.
> > 
> > 
> > This caused (another) 3-4% degradation in coremarks on ThunderX.
> 
> Hmm, this is interesting. The patch should have "fixed" the previous
> degradation by making the profile correct (backward threader still doe not
> update it, but because most threading now happens early and profile is built
> afterwards this should be less of issue).  I am now looking into the profile
> update issues and will try to check why coremarks degrade again.
 
But, the early threader is running with speed_p set to false (second parameter
to find_jump_threads_backwards)

  unsigned int
  pass_early_thread_jumps::execute (function *fun)
  {
/* Try to thread each block with more than one successor.  */
basic_block bb;
FOR_EACH_BB_FN (bb, fun)
  {
if (EDGE_COUNT (bb->succs) > 1)
find_jump_threads_backwards (bb, false);
  }
thread_through_all_blocks (true);
return 0;
  }

So even though profile information is ignored, we think we are compiling
for size and won't thread. The relevant check in profitable_jump_thread_path
is:

  if (speed_p && optimize_edge_for_speed_p (taken_edge))
{
  
}
  else if (n_insns > 1)
{
  if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "FSM jump-thread path not considered: "
 "duplication of %i insns is needed and optimizing for size.\n",
 n_insns);
  path->pop ();
  return NULL;
}

Changing false to true in the above hunk looks like it enables some of
the threading we're relying on here.

Thanks,
James



Re: [Patch, fortran] PR69566 - Failure of SELECT TYPE with unlimited polymorphic function result

2016-10-18 Thread Paul Richard Thomas
Hi Andre,

Thanks for a quick response:

> You can use
>
>|| (e->symtree && UNLIMITED_POLY (e->symtree->n.sym));

Ah yes, you are quite right.

> here. UNLIMITED_POLY does all the checks. I am still wondering whether this is
> necessary? The symtree is set for expr_type == { EXPR_VARIABLE, EXPR_FUNCTION 
> }
> both of those should correctly resolve to an unlimited poly ts. Is this a
> left-over?

For reasons I don't understand, sometimes the expression type comes
through as BT_DERIVED, whilst the symbol is BT_CLASS. I could repair
this in resolve.c(fixup_array_ref) if you think that would be cleaner.

>
> I am regression testing my polymorhpic class patch at the moment, therefore I
> can't test.

OK - I can wait. I have quite a few other things to do :-(

Cheers

Paul


-- 
The difference between genius and stupidity is; genius has its limits.

Albert Einstein


Re: [PATCH] PR77990 refactor unique_ptr to encapsulate tuple

2016-10-18 Thread Pedro Alves
On 10/18/2016 12:54 PM, Jonathan Wakely wrote:

> I'll wait a bit longer for any objections, as the refactoring could be
> seen as unnecessary churn, but I think it's valuable housekeeping.

Having stared at std::unique_ptr a lot recently, I like this, FWIW.

Thanks,
Pedro Alves



Re: [Patch, fortran] PR69566 - Failure of SELECT TYPE with unlimited polymorphic function result

2016-10-18 Thread Andre Vehreschild
Hi Paul,

> For reasons I don't understand, sometimes the expression type comes
> through as BT_DERIVED, whilst the symbol is BT_CLASS. I could repair
> this in resolve.c(fixup_array_ref) if you think that would be cleaner.

I think that I figured the rule:

- when no _class-ref is present, then the type is BT_CLASS,
- as soon as a _class-ref is present the type is BT_DERIVED. 

There is an attr.is_class. Would that be an alternative? I don't know how
reliable it is set.

> > I am regression testing my polymorhpic class patch at the moment, therefore
> > I can't test.  
> 
> OK - I can wait. I have quite a few other things to do :-(

I found an error in my patch that only manifests itself with an optimization
level great than 0. Now I am searching, never having done anything there.

- Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


[C++ Patch/RFC] PR 67980 ("left shift count is negative [-Wshift-count-negative] generated for unreachable code")

2016-10-18 Thread Paolo Carlini

Hi,

in the language of our implementations details, submitter noticed that 
in terms of warnings we handle in a different way COND_EXPRs in 
tsubst_copy_and_build - we use fold_non_dependent_expr and integer_zerop 
to suppress undesired warnings by bumping c_inhibit_evaluation_warnings 
- and IF_STMTs in tsubst_expr, where we don't. My patch below, which 
passes testing, tries in a rather straightforward way to adopt the same 
mechanisms in the latter. There are quite a few details I'm not sure 
about: whether we should only use fold_non_dependent_expr for the 
purpose of suppressing the warnings -  thus never touching 'tmp' in the 
pt.c code handling IF_STMTs - which would be completely conservative in 
terms of code generation; whether there are subtle interactions with the 
new if constexpr, which I'm missing at the moment.


Thanks!

Paolo.

//

Index: cp/pt.c
===
--- cp/pt.c (revision 241297)
+++ cp/pt.c (working copy)
@@ -15403,26 +15403,46 @@
   break;
 
 case IF_STMT:
-  stmt = begin_if_stmt ();
-  IF_STMT_CONSTEXPR_P (stmt) = IF_STMT_CONSTEXPR_P (t);
-  tmp = RECUR (IF_COND (t));
-  tmp = finish_if_stmt_cond (tmp, stmt);
-  if (IF_STMT_CONSTEXPR_P (t) && integer_zerop (tmp))
-   /* Don't instantiate the THEN_CLAUSE. */;
-  else
-   RECUR (THEN_CLAUSE (t));
-  finish_then_clause (stmt);
+  {
+   tree folded_tmp;
+   bool zerop, nonzerop;
 
-  if (IF_STMT_CONSTEXPR_P (t) && integer_nonzerop (tmp))
-   /* Don't instantiate the ELSE_CLAUSE. */;
-  else if (ELSE_CLAUSE (t))
-   {
- begin_else_clause (stmt);
- RECUR (ELSE_CLAUSE (t));
- finish_else_clause (stmt);
-   }
+   stmt = begin_if_stmt ();
+   IF_STMT_CONSTEXPR_P (stmt) = IF_STMT_CONSTEXPR_P (t);
+   tmp = RECUR (IF_COND (t));
+   folded_tmp = fold_non_dependent_expr (tmp);
+   if (TREE_CODE (folded_tmp) == INTEGER_CST)
+ tmp = folded_tmp;
+   tmp = finish_if_stmt_cond (tmp, stmt);
+   zerop = integer_zerop (tmp);
+   nonzerop = integer_nonzerop (tmp);
+   if (IF_STMT_CONSTEXPR_P (t) && zerop)
+ /* Don't instantiate the THEN_CLAUSE. */;
+   else
+ {
+   if (zerop)
+ ++c_inhibit_evaluation_warnings;
+   RECUR (THEN_CLAUSE (t));
+   if (zerop)
+ --c_inhibit_evaluation_warnings;
+ }
+   finish_then_clause (stmt);
 
-  finish_if_stmt (stmt);
+   if (IF_STMT_CONSTEXPR_P (t) && nonzerop)
+ /* Don't instantiate the ELSE_CLAUSE. */;
+   else if (ELSE_CLAUSE (t))
+ {
+   begin_else_clause (stmt);
+   if (nonzerop)
+ ++c_inhibit_evaluation_warnings;
+   RECUR (ELSE_CLAUSE (t));
+   if (nonzerop)
+ --c_inhibit_evaluation_warnings;
+   finish_else_clause (stmt);
+ }
+
+   finish_if_stmt (stmt);
+  }
   break;
 
 case BIND_EXPR:
Index: testsuite/g++.dg/cpp1y/pr67980.C
===
--- testsuite/g++.dg/cpp1y/pr67980.C(revision 0)
+++ testsuite/g++.dg/cpp1y/pr67980.C(working copy)
@@ -0,0 +1,23 @@
+// { dg-do compile { target c++14 } }
+
+template 
+constexpr T cpp14_constexpr_then(T value) {
+  if (Y < 0)
+return (value << -Y);
+  else
+return 0;
+}
+
+template 
+constexpr T cpp14_constexpr_else(T value) {
+  if (Y > 0)
+return 0;
+  else
+return (value << -Y);
+}
+
+int main()
+{
+  cpp14_constexpr_then<1>(0);
+  cpp14_constexpr_else<1>(0);
+}


[PATCH][v6] GIMPLE store merging pass

2016-10-18 Thread Kyrill Tkachov

Hi Richard,

This patch is a merge of [1] and [2] and implements the manual merging of 
bitfields
as outlined in [1] but actually makes it work on BYTES_BIG_ENDIAN too.
It caused me a lot of headeache because the bit offset is counted from the most 
significant bit
in the byte, even though BITS_BIG_ENDIAN was 0 (BITS_BIG_ENDIAN looks 
irrelevant for store merging
anyway as it's just used to described RTL extract operations).
I've included ASCII diagrams of the steps in the merging algorithm.

Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu, 
x86_64-unknown-linux-gnu.
Also tested on aarch64_be-none-elf.

How does this version look now?

Thanks,
Kyrill

[1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00573.html
[2] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00572.html

2016-10-18  Kyrylo Tkachov  

PR middle-end/22141
* Makefile.in (OBJS): Add gimple-ssa-store-merging.o.
* common.opt (fstore-merging): New Optimization option.
* opts.c (default_options_table): Add entry for
OPT_ftree_store_merging.
* fold-const.h (can_native_encode_type_p): Declare prototype.
* fold-const.c (can_native_encode_type_p): Define.
* params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define.
* passes.def: Insert pass_tree_store_merging.
* tree-pass.h (make_pass_store_merging): Declare extern
prototype.
* gimple-ssa-store-merging.c: New file.
* doc/invoke.texi (Optimization Options): Document
-fstore-merging.

2016-10-18  Kyrylo Tkachov  
Jakub Jelinek  
Andrew Pinski  

PR middle-end/22141
PR rtl-optimization/23684
* gcc.c-torture/execute/pr22141-1.c: New test.
* gcc.c-torture/execute/pr22141-2.c: Likewise.
* gcc.target/aarch64/ldp_stp_1.c: Adjust for -fstore-merging.
* gcc.target/aarch64/ldp_stp_4.c: Likewise.
* gcc.dg/store_merging_1.c: New test.
* gcc.dg/store_merging_2.c: Likewise.
* gcc.dg/store_merging_3.c: Likewise.
* gcc.dg/store_merging_4.c: Likewise.
* gcc.dg/store_merging_5.c: Likewise.
* gcc.dg/store_merging_6.c: Likewise.
* gcc.dg/store_merging_7.c: Likewise.
* gcc.target/i386/pr22141.c: Likewise.
* gcc.target/i386/pr34012.c: Add -fno-store-merging to dg-options.
* g++.dg/init/new17.C: Likewise.
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index d6e44e4b77990b9cd48ce31f657e48c44191ade1..02632ea70483224623ac7cb2b97d18686d3a4a3c 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1296,6 +1296,7 @@ OBJS = \
 	gimple-ssa-isolate-paths.o \
 	gimple-ssa-nonnull-compare.o \
 	gimple-ssa-split-paths.o \
+	gimple-ssa-store-merging.o \
 	gimple-ssa-strength-reduction.o \
 	gimple-ssa-sprintf.o \
 	gimple-streamer-in.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 6f24f568d313bbf8f35e8809639ba4d976b57765..f5dc86b6376ded8f2f77313d75f41572eb0e77ad 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1463,6 +1463,10 @@ fstrict-volatile-bitfields
 Common Report Var(flag_strict_volatile_bitfields) Init(-1) Optimization
 Force bitfield accesses to match their type width.
 
+fstore-merging
+Common Report Var(flag_store_merging) Optimization
+Merge adjacent stores.
+
 fguess-branch-probability
 Common Report Var(flag_guess_branch_prob) Optimization
 Enable guessing of branch probabilities.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 795ad1b63a914de50113cd83a1eb7094306f0634..85cedf87cf135e5eb7d53a6ebfd308e19dec7830 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -404,7 +404,7 @@ Objective-C and Objective-C++ Dialects}.
 -fsingle-precision-constant -fsplit-ivs-in-unroller @gol
 -fsplit-paths @gol
 -fsplit-wide-types -fssa-backprop -fssa-phiopt @gol
--fstdarg-opt -fstrict-aliasing @gol
+-fstdarg-opt -fstore-merging -fstrict-aliasing @gol
 -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
 -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
 -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts @gol
@@ -415,8 +415,8 @@ Objective-C and Objective-C++ Dialects}.
 -ftree-loop-vectorize @gol
 -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-partial-pre -ftree-pta @gol
 -ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra @gol
--ftree-switch-conversion -ftree-tail-merge -ftree-ter @gol
--ftree-vectorize -ftree-vrp -funconstrained-commons @gol
+-ftree-switch-conversion -ftree-tail-merge @gol
+-ftree-ter -ftree-vectorize -ftree-vrp -funconstrained-commons @gol
 -funit-at-a-time -funroll-all-loops -funroll-loops @gol
 -funsafe-math-optimizations -funswitch-loops @gol
 -fipa-ra -fvariable-expansion-in-unroller -fvect-cost-model -fvpt @gol
@@ -6668,6 +6668,7 @@ compilation time.
 -fsplit-wide-types @gol
 -fssa-backprop @gol
 -fssa-phiopt @gol
+-fstore-merging @gol
 -ftree-bit-ccp @gol
 -ftree-ccp @gol
 -ftree-ch @gol
@@ -8010,6 +8011,13 @@ Perform scalar replacement of aggregates.  This pass replaces structure
 references with scalars to prevent committing structures to memory too
 early.  This flag is enabled 

Re: [PATCH][AArch64] Align FP callee-saves

2016-10-18 Thread James Greenhalgh
On Mon, Oct 17, 2016 at 12:40:18PM +, Wilco Dijkstra wrote:
> 
> ping
>
> If the number of integer callee-saves is odd, the FP callee-saves use 8-byte
> aligned LDP/STP.  Since 16-byte alignment may be faster on some CPUs, align
> the FP callee-saves to 16 bytes and use the alignment gap for the last FP
> callee-save when possible. Besides slightly different offsets for FP
> callee-saves, the generated code doesn't change.
> 
> Bootstrap and regression pass, OK for commit?

This looks OK to me.

Thanks for the patch.

James

> ChangeLog:
> 2016-09-08  Wilco Dijkstra  
> 
>     * config/aarch64/aarch64.c (aarch64_layout_frame):
>     Align FP callee-saves.
> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> fed3b6e803821392194dc34a6c3df5f653d2e33e..075b3802c72a68f63b47574e19186e7ce3440b28
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -2735,7 +2735,7 @@ static void
>  aarch64_layout_frame (void)
>  {
>    HOST_WIDE_INT offset = 0;
> -  int regno;
> +  int regno, last_fp_reg = INVALID_REGNUM;
>  
>    if (reload_completed && cfun->machine->frame.laid_out)
>  return;
> @@ -2781,7 +2781,10 @@ aarch64_layout_frame (void)
>    for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
>  if (df_regs_ever_live_p (regno)
>  && !call_used_regs[regno])
> -  cfun->machine->frame.reg_offset[regno] = SLOT_REQUIRED;
> +  {
> +   cfun->machine->frame.reg_offset[regno] = SLOT_REQUIRED;
> +   last_fp_reg = regno;
> +  }
>  
>    if (cfun->machine->frame.emit_frame_chain)
>  {
> @@ -2805,9 +2808,21 @@ aarch64_layout_frame (void)
>  offset += UNITS_PER_WORD;
>    }
>  
> +  HOST_WIDE_INT max_int_offset = offset;
> +  offset = ROUND_UP (offset, STACK_BOUNDARY / BITS_PER_UNIT);
> +  bool has_align_gap = offset != max_int_offset;
> +
>    for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
>  if (cfun->machine->frame.reg_offset[regno] == SLOT_REQUIRED)
>    {
> +   /* If there is an alignment gap between integer and fp callee-saves,
> +  allocate the last fp register to it if possible.  */
> +   if (regno == last_fp_reg && has_align_gap && (offset & 8) == 0)
> + {
> +   cfun->machine->frame.reg_offset[regno] = max_int_offset;
> +   break;
> + }
> +
>  cfun->machine->frame.reg_offset[regno] = offset;
>  if (cfun->machine->frame.wb_candidate1 == INVALID_REGNUM)
>    cfun->machine->frame.wb_candidate1 = regno;
> 



Re: [PATCH 3/7] use rtx_insn * more

2016-10-18 Thread Trevor Saunders
On Tue, Oct 18, 2016 at 03:54:43PM +0200, Bernd Schmidt wrote:
> On 10/18/2016 03:54 PM, Trevor Saunders wrote:
> > 
> > I do really prefer reading code where variables are declared at first
> > use
> 
> In general, so do I, but in this case it's one variable out of a whole
> bunch, which makes the entire thing look a little inconsistent.

fair, though you've got to start somewhere unless you want to change all
the variables in a function, which seems really excessive.

Trev

> 
> 
> Bernd


Re: [PATCH 0/8] NVPTX offloading to NVPTX: backend patches

2016-10-18 Thread Alexander Monakov
On Tue, 18 Oct 2016, Bernd Schmidt wrote:
> The performance I saw was lower by a factor of 80 or so compared to their CUDA
> version, and even lower than OpenMP on the host.

The currently published OpenMP version of LULESH simply doesn't use openmp-simd
anywhere. This should make it obvious that it won't be anywhere near any
reasonable CUDA implementation, and also bound to be below host performance.
Besides, it's common for such benchmark suites to have very different levels of
hand tuning for the native-CUDA implementation vs OpenMP implementation,
sometimes to the point of significant algorithmic differences. So you're
making an invalid comparison here.

Internally at ISP RAS we used a small set of microbenchmarks implemented in
CUDA/OpenACC/OpenMP specifically for the purpose of evaluating the exact same
computations implemented in terms of different APIs. We got close performance in
all three. The biggest issue is visible on short-running OpenMP target regions:
the startup cost (going through libgomp) is non-trivial. That can be improved
with further changes in libgomp port, notably avoiding malloc, shaving off more
code, perhaps inlining more code (e.g. via LTO eventually). There's also
avoidable cuMemAlloc/cuMemFree on the libgomp plugin side.

For example, there's this patch on the branch:

libgomp: avoid malloc calls in gomp_nvptx_main

Avoid calling malloc where it's easy to use stack storage instead: device
malloc is very slow in CUDA.  This cuts about 60-80 microseconds from target
region entry/exit time, slimming down empty target regions from ~95 to ~17
microseconds (as measured on a GTX Titan).

(empty CUDA kernel is ~5 microseconds; all figures are taken via nvprof)

> To me this kind of performance doesn't look like something that will be fixed
> by fine-tuning; it leaves me undecided whether the chosen approach (what you
> call the fundamentals) is viable at all.

If you try to draw conclusions just from comparing the performance you got on
LULESH, without looking at benchmark's source (otherwise you should have
acknowledged the lack of openmp-simd and significant source-level differences
between CUDA and OpenMP implementations, like the use of __shared__ in CUDA
algorithms), I am sorry to say, but that is just ridiculous. The implementation
on the branch is far from ideal, but your method of evaluation is nonsensical.

> Performance is still better than the OpenACC version of the benchmark, but
> then I think we shouldn't repeat the mistakes we made with OpenACC and avoid
> merging something until we're sure it's ready and of benefit to users.

Would you kindly try and keep your commentary constructive. It's frustrating to
me to have to tolerate hostilities like an ad hominem attack, ignored
nvptx-backend-related questions, etc. How can the work get ready if all you do
is passively push back?  Please trust me, I have experience with GPUs and GCC.
There should be a process for getting this gradually reviewed, with fundamental
design decisions acked and patches reviewed before all tweaks and optimizations
are in place. If you suggest that the work needs to proceed on the branch
without any kind of interim review, and then reviewed in one go after satisfying
some unspecified criteria of being "ready and of benefit", that doesn't sound
right to me.

Alexander


Re: [PATCH] Make -Wint-in-bool-context warn on suspicious shift ops

2016-10-18 Thread Bernd Edlinger
Hi,

this restricts the -Wint-in-bool-context warning to signed shifts,
to reduce the number of false positives Markus reported yesterday.

Bootstrap and reg-testing on x86_64-pc-linux-gnu was fine.
Is it OK for trunk?


Thanks
Bernd.
2016-10-17  Bernd Edlinger  

	* c-common.c (c_common_truthvalue_conversion): Warn only for signed
	integer shift ops in boolean context.

Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c	(revision 241270)
+++ gcc/c-family/c-common.c	(working copy)
@@ -3328,8 +3328,10 @@
 	   TREE_OPERAND (expr, 0));
 
 case LSHIFT_EXPR:
-  warning_at (EXPR_LOCATION (expr), OPT_Wint_in_bool_context,
-		  "<< in boolean context, did you mean '<' ?");
+  if (TREE_CODE (TREE_TYPE (expr)) == INTEGER_TYPE
+	  && !TYPE_UNSIGNED (TREE_TYPE (expr)))
+	warning_at (EXPR_LOCATION (expr), OPT_Wint_in_bool_context,
+		"<< in boolean context, did you mean '<' ?");
   break;
 
 case COND_EXPR:


Re: [PATCH] Make -Wint-in-bool-context warn on suspicious shift ops

2016-10-18 Thread Joseph Myers
On Tue, 18 Oct 2016, Bernd Edlinger wrote:

> Hi,
> 
> this restricts the -Wint-in-bool-context warning to signed shifts,
> to reduce the number of false positives Markus reported yesterday.

This patch seems to be missing testcases (that warned before the patch 
and don't warn after it).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][AArch64] Improve stack adjustment

2016-10-18 Thread James Greenhalgh
On Mon, Oct 17, 2016 at 12:38:36PM +, Wilco Dijkstra wrote:
> 
> ping
> 
> 
> From: Wilco Dijkstra
> Sent: 10 August 2016 17:20
> To: Richard Earnshaw; GCC Patches
> Cc: nd
> Subject: Re: [PATCH][AArch64] Improve stack adjustment
>     
> Richard Earnshaw wrote:
> > I see you've added a default argument for your new parameter.  I think
> > doing that is fine, but I have two comments about how we might use that
> > in this case.
> > 
> > Firstly, if this parameter is suitable for having a default value, then
> > I think the preceding one should also be treated in the same way.
> > 
> > Secondly, I think (due to these parameters being BOOL with no useful
> > context to make it clear which is which) that having wrapper functions
> > (inlined, of course) that describe the intended behaviour more clearly
> > would be useful.  So, defining
> > 
> > static inline void
> > aarch64_add_frame_constant (mode, regnum, scratchreg, delta)
> > {
> >    aarch64_add_frame_constant (mode, regnum, scratchreg, delta, true);
> > }
> > 
> > would make it clearer at the call point than having a lot of true and
> > false parameters scattered round the code.
> > 
> > Alternatively we might remove all the default parameters and require
> > wrappers like the above to make it more explicit which API is intended -
> > this might make more sense if not all combinations make sense.
> 
> OK here is the new version using simple wrapper functions and no
> default parameters:

I've got a simple suggestion for one of your comments, and I'd like
clarification on something I'm not understanding...

> gcc/
>     * config/aarch64/aarch64.c (aarch64_add_constant_internal):
>     Add extra argument to allow emitting the move immediate.
>     Use add/sub with positive immediate.
>     (aarch64_add_constant): Add inline function.
>     (aarch64_add_sp): Likewise.
>     (aarch64_sub_sp): Likewise.
>     (aarch64_expand_prologue): Call aarch64_sub_sp.
>     (aarch64_expand_epilogue): Call aarch64_add_sp.
>     Decide when to leave out move.
>     (aarch64_output_mi_thunk): Call aarch64_add_constant.
> @@ -1967,11 +1973,11 @@ aarch64_add_constant (machine_mode mode, int regnum, 
> int scratchreg,
>    return;
>  }
>  
> -  /* We need two add/sub instructions, each one performing part of the
> - calculation.  Don't do this if the addend can be loaded into register 
> with
> - a single instruction, in that case we prefer a move to a scratch 
> register
> - following by an addition.  */
> -  if (mdelta < 0x100 && !aarch64_move_imm (delta, mode))
> +  /* We need two add/sub instructions, each one perform part of the
> + addition/subtraction, but don't this if the addend can be loaded into
> + register by single instruction, in that case we prefer a move to scratch
> + register following by addition.  */

This sentence is missing some words.

> +  if (mdelta < 0x100 && !aarch64_move_imm (mdelta, mode))

Could you explain this change? The comment makes it seem like delta would
still be correct. Probably the comment needs to say "followed by
addition/subtraction" rather than what is currently written?

>  {
>    HOST_WIDE_INT low_off = mdelta & 0xfff;
>  
> @@ -1985,8 +1991,10 @@ aarch64_add_constant (machine_mode mode, int regnum, 
> int scratchreg,
>  
>    /* Otherwise use generic function to handle all other situations.  */
>    rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
> -  aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode);
> -  insn = emit_insn (gen_add2_insn (this_rtx, scratch_rtx));
> +  if (emit_move_imm)
> +    aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (mdelta), true, 
> mode);
> +  insn = emit_insn (delta < 0 ? gen_sub2_insn (this_rtx, scratch_rtx)
> + : gen_add2_insn (this_rtx, scratch_rtx));

What is contained in scratch_rtx here if we didn't set it up with
aarch64_internal_move_immediate? Are you not adding junk values in the
!emit_move_imm case?

Thanks,
James

>    if (frame_related_p)
>  {
>    RTX_FRAME_RELATED_P (insn) = frame_related_p;



Re: RFC [1/3] divmod transform v2

2016-10-18 Thread Jeff Law

On 10/18/2016 02:25 AM, Richard Biener wrote:


I don't even think we have a way of knowing in the compiler if the target
has enabled divmod support in libgcc.


Yeah, that's what bothers me with the current optab libfunc query
setup -- it isn't reliable.
I wonder if we ought to just have them all available in libgcc.  If 
they're one-function-per-file in the .a, then we're not going to get 
significant codesize bloat in the resulting executables.


jeff



Re: [PATCH][AArch64] Improve stack adjustment

2016-10-18 Thread Wilco Dijkstra
James Greenhalgh wrote:
On Mon, Oct 17, 2016 at 12:38:36PM +, Wilco Dijkstra wrote:

>> +  /* We need two add/sub instructions, each one perform part of the
>> + addition/subtraction, but don't this if the addend can be loaded into
>> + register by single instruction, in that case we prefer a move to 
>> scratch
>> + register following by addition.  */

> This sentence is missing some words.

Sorry, badly edited old comment. I decided to just rewrite it, so here is the 
new version:

+  /* Emit 2 additions/subtractions if the adjustment is less than 24 bits.
+ Only do this if mdelta is not a 16-bit move as adjusting using a move
+ is better.  */


> > +  if (mdelta < 0x100 && !aarch64_move_imm (mdelta, mode))

> Could you explain this change? The comment makes it seem like delta would
> still be correct. Probably the comment needs to say "followed by
> addition/subtraction" rather than what is currently written?

aarch64_move_imm (mdelta, mode) is not always the same as aarch64_move_imm
(delta, mode). We later emit a move using mdelta (so that both prolog and 
epilog use
positive adjustments). Therefore we must check mdelta here, not delta.

> > +  if (emit_move_imm)
> > +    aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (mdelta), true, 
> > mode);
> > +  insn = emit_insn (delta < 0 ? gen_sub2_insn (this_rtx, scratch_rtx)
> > + : gen_add2_insn (this_rtx, scratch_rtx));

> What is contained in scratch_rtx here if we didn't set it up with
> aarch64_internal_move_immediate? Are you not adding junk values in the
> !emit_move_imm case?

This function should only be called with !emit_move_imm if scratchreg is known 
to contain
mdelta. The prolog initializes the scratch register, the liveness check is done 
in the epilog:

+  aarch64_add_sp (IP0_REGNUM, initial_adjust, df_regs_ever_live_p 
(IP0_REGNUM));

Wilco


ChangeLog:
2016-10-18  Wilco Dijkstra  

gcc/
* config/aarch64/aarch64.c (aarch64_add_constant_internal):
Add extra argument to allow emitting the move immediate.
Use add/sub with positive immediate.
(aarch64_add_constant): Add inline function.
(aarch64_add_sp): Likewise.
(aarch64_sub_sp): Likewise.
(aarch64_expand_prologue): Call aarch64_sub_sp.
(aarch64_expand_epilogue): Call aarch64_add_sp.
Decide when to leave out move.
(aarch64_output_mi_thunk): Call aarch64_add_constant.

testsuite/
* gcc.target/aarch64/test_frame_17.c: New test.
--

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
899ce219763a1b62e20de49e279b3fe26ee4bc05..582a9a752f788b57c6e762005c35b460eb45bec8
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1956,25 +1956,30 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 }
 
 /* Add DELTA to REGNUM in mode MODE.  SCRATCHREG can be used to held
-   intermediate value if necessary.
+   intermediate value if necessary.  FRAME_RELATED_P should be true if
+   the RTX_FRAME_RELATED flag should be set and CFA adjustments added
+   to the generated instructions.  If SCRATCHREG is known to hold
+   abs (delta), EMIT_MOVE_IMM can be set to false to avoid emitting the
+   immediate again.
 
-   This function is sometimes used to adjust the stack pointer, so we must
-   ensure that it can never cause transient stack deallocation by writing an
-   invalid value into REGNUM.  */
+   Since this function may be used to adjust the stack pointer, we must
+   ensure that it cannot cause transient stack deallocation (for example
+   by first incrementing SP and then decrementing when adjusting by a
+   large immediate).  */
 
 static void
-aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
- HOST_WIDE_INT delta, bool frame_related_p)
+aarch64_add_constant_internal (machine_mode mode, int regnum, int scratchreg,
+  HOST_WIDE_INT delta, bool frame_related_p,
+  bool emit_move_imm)
 {
   HOST_WIDE_INT mdelta = abs_hwi (delta);
   rtx this_rtx = gen_rtx_REG (mode, regnum);
   rtx_insn *insn;
 
-  /* Do nothing if mdelta is zero.  */
   if (!mdelta)
 return;
 
-  /* We only need single instruction if the offset fit into add/sub.  */
+  /* Single instruction adjustment.  */
   if (aarch64_uimm12_shift (mdelta))
 {
   insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta)));
@@ -1982,11 +1987,10 @@ aarch64_add_constant (machine_mode mode, int regnum, 
int scratchreg,
   return;
 }
 
-  /* We need two add/sub instructions, each one performing part of the
- calculation.  Don't do this if the addend can be loaded into register with
- a single instruction, in that case we prefer a move to a scratch register
- following by an addition.  */
-  if (mdelta < 0x100 && !aarch64_move_imm (delta, mode))
+  /* Emit 2 additions/subtractions if the adjustment is less than 24 bits.
+ O

Re: [PATCH] Make -Wint-in-bool-context warn on suspicious shift ops

2016-10-18 Thread Bernd Edlinger
On 10/18/16 19:05, Joseph Myers wrote:
> On Tue, 18 Oct 2016, Bernd Edlinger wrote:
>
>> Hi,
>>
>> this restricts the -Wint-in-bool-context warning to signed shifts,
>> to reduce the number of false positives Markus reported yesterday.
>
> This patch seems to be missing testcases (that warned before the patch
> and don't warn after it).
>

Yes of course.

New patch, this time with a test case, compiled from the linux sample.

Bootstrapped and reg-tested as usual.
Is it OK for trunk?


Bernd.
c-family:
2016-10-17  Bernd Edlinger  

	* c-common.c (c_common_truthvalue_conversion): Warn only for signed
	integer shift ops in boolean context.

testsuite:
2016-10-17  Bernd Edlinger  

	* c-c++-common/Wint-in-bool-context-2.c: New test.

Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c	(revision 241270)
+++ gcc/c-family/c-common.c	(working copy)
@@ -3328,8 +3328,10 @@
 	   TREE_OPERAND (expr, 0));
 
 case LSHIFT_EXPR:
-  warning_at (EXPR_LOCATION (expr), OPT_Wint_in_bool_context,
-		  "<< in boolean context, did you mean '<' ?");
+  if (TREE_CODE (TREE_TYPE (expr)) == INTEGER_TYPE
+	  && !TYPE_UNSIGNED (TREE_TYPE (expr)))
+	warning_at (EXPR_LOCATION (expr), OPT_Wint_in_bool_context,
+		"<< in boolean context, did you mean '<' ?");
   break;
 
 case COND_EXPR:
Index: gcc/testsuite/c-c++-common/Wint-in-bool-context-2.c
===
--- gcc/testsuite/c-c++-common/Wint-in-bool-context-2.c	(revision 0)
+++ gcc/testsuite/c-c++-common/Wint-in-bool-context-2.c	(working copy)
@@ -0,0 +1,17 @@
+/* { dg-options "-Wint-in-bool-context" } */
+/* { dg-do compile } */
+
+typedef unsigned u32;
+typedef unsigned char u8;
+#define KEYLENGTH 8
+
+int foo (u8 plen, u32 key)
+{
+  if ((plen < KEYLENGTH) && (key << plen)) /* { dg-bogus "boolean context" } */
+return -1;
+
+  if ((plen << KEYLENGTH) && (key < plen)) /* { dg-warning "boolean context" } */
+return -2;
+
+  return 0;
+}


Re: [PATCH] xtensa: add HW FPU sequences for DIV/SQRT/RECIP/RSQRT

2016-10-18 Thread augustine.sterl...@gmail.com
On Fri, Oct 14, 2016 at 12:14 PM, Max Filippov  wrote:
>
> Use new FPU instruction sequences documented in the ISA book to
> implement __divsf3, __divdf3, __recipsf2, __recipdf2, __rsqrtsf2,
> __rsqrtdf2 and __ieee754_sqrtf and __ieee754_sqrt.
>
> 2013-02-12  Ding-Kai Chen  
> libgcc/
> * config/xtensa/ieee754-df.S (__recipdf2, __rsqrtdf2,
> __ieee754_sqrt): New functions.
> (__divdf3): Add implementation with new FPU instructions under
> #if XCHAL_HAVE_DFP_DIV.
> * config/xtensa/ieee754-sf.S (__recipsf2, __rsqrtsf2,
> __ieee754_sqrtf): New functions.
> (__divsf3): Add implementation with new FPU instructions under
> #if XCHAL_HAVE_FP_DIV.
> * config/xtensa/t-xtensa (LIB1ASMFUNCS): Add _sqrtf, _recipsf2
> _rsqrtsf2, _sqrt, _recipdf2 and _rsqrtdf2


Approved, please apply.


Re: [PATCH] xtensa: don't use unwind-dw2-fde-dip with elf targets

2016-10-18 Thread augustine.sterl...@gmail.com
On Mon, Oct 17, 2016 at 4:23 PM, Max Filippov  wrote:
> Define LIB2ADDEH_XTENSA_UNWIND_DW2_FDE to unwind-dw2-fde.c in
> xtensa/t-elf and to unwind-dw2-fde-dip.c in xtensa/t-linux and use
> LIB2ADDEH_XTENSA_UNWIND_DW2_FDE in LIB2ADDEH definition.
>
> 2016-10-17  Max Filippov  
> libgcc/
> * config/xtensa/t-elf (LIB2ADDEH_XTENSA_UNWIND_DW2_FDE): New
> definition.
> * config/xtensa/t-linux (LIB2ADDEH_XTENSA_UNWIND_DW2_FDE): New
> definition.
> * config/xtensa/t-windowed (LIB2ADDEH): Use
> LIB2ADDEH_XTENSA_UNWIND_DW2_FDE defined by either xtensa/t-elf
> or xtensa/t-linux.
>
> Signed-off-by: Max Filippov 

Approved, please apply.


[PATCH] Fix typos in experimental::shared_ptr

2016-10-18 Thread Jonathan Wakely

A couple of minor things I found whlie reviewing this code.

* include/experimental/bits/shared_ptr.h (shared_ptr(shared_ptr&&)):
Remove const from parameter.
(operator<(const shared_ptr&, nullptr_t)): Use correct
specialization of std::less.
* testsuite/experimental/memory/shared_ptr/comparison/comparison.cc:
Test comparison with nullptr and actually call test functions.

Tested x86_64-linux, comitted to trunk.


commit dd9bc04a6df29881ccbf2e68ae21b6f23cc71dd7
Author: Jonathan Wakely 
Date:   Tue Oct 18 18:31:48 2016 +0100

Fix typos in experimental::shared_ptr

* include/experimental/bits/shared_ptr.h (shared_ptr(shared_ptr&&)):
Remove const from parameter.
(operator<(const shared_ptr&, nullptr_t)): Use correct
specialization of std::less.
* testsuite/experimental/memory/shared_ptr/comparison/comparison.cc:
Test comparison with nullptr and actually call test functions.

diff --git a/libstdc++-v3/include/experimental/bits/shared_ptr.h 
b/libstdc++-v3/include/experimental/bits/shared_ptr.h
index d61789a..7a232f4 100644
--- a/libstdc++-v3/include/experimental/bits/shared_ptr.h
+++ b/libstdc++-v3/include/experimental/bits/shared_ptr.h
@@ -672,7 +672,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
shared_ptr(const shared_ptr<_Tp1>& __r) noexcept
: _Base_type(__r) { }
 
-  shared_ptr(const shared_ptr<_Tp>&& __r) noexcept
+  shared_ptr(shared_ptr&& __r) noexcept
   : _Base_type(std::move(__r)) { }
 
   template>
@@ -815,7 +815,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  operator<(const shared_ptr<_Tp>& __a, nullptr_t) noexcept
  {
using __elem_t = typename shared_ptr<_Tp>::element_type;
-   return std::less<__elem_t>()(__a.get(), nullptr);
+   return std::less<__elem_t*>()(__a.get(), nullptr);
  }
 
template
diff --git 
a/libstdc++-v3/testsuite/experimental/memory/shared_ptr/comparison/comparison.cc
 
b/libstdc++-v3/testsuite/experimental/memory/shared_ptr/comparison/comparison.cc
index fafa6eb..d733811 100644
--- 
a/libstdc++-v3/testsuite/experimental/memory/shared_ptr/comparison/comparison.cc
+++ 
b/libstdc++-v3/testsuite/experimental/memory/shared_ptr/comparison/comparison.cc
@@ -73,8 +73,18 @@ test02()
   return 0;
 }
 
+void
+test03()
+{
+  std::experimental::shared_ptr a(new A[5]);
+  VERIFY( nullptr < a );
+}
+
 int
 main()
 {
+  test01();
+  test02();
+  test03();
   return 0;
 }


Re: [rs6000] Fix reload failures in 64-bit mode with no special constant pool

2016-10-18 Thread Eric Botcazou
> We need to pass the mode of the actual datum we would put in the TOC to
> the use_toc_relative_ref function, not the mode of its address.

Right, but this mode is not "mode", the TOC contains only Pmode entries if the 
special constant pool is excluded.

-- 
Eric Botcazou


Re: [PATCH] Fix typos in experimental::shared_ptr

2016-10-18 Thread Jonathan Wakely

On 18/10/16 19:30 +0100, Jonathan Wakely wrote:

A couple of minor things I found whlie reviewing this code.

* include/experimental/bits/shared_ptr.h (shared_ptr(shared_ptr&&)):
Remove const from parameter.
(operator<(const shared_ptr&, nullptr_t)): Use correct
specialization of std::less.
* testsuite/experimental/memory/shared_ptr/comparison/comparison.cc:
Test comparison with nullptr and actually call test functions.

Tested x86_64-linux, comitted to trunk.


I'm also committing this, which only changes whitespace.


commit f9faa4b10d1039df4dd8837149703eac1866c923
Author: Jonathan Wakely 
Date:   Tue Oct 18 19:39:11 2016 +0100

Fix indentation of experimental::shared_ptr code

	* include/experimental/bits/shared_ptr.h: Fix indentation.

diff --git a/libstdc++-v3/include/experimental/bits/shared_ptr.h b/libstdc++-v3/include/experimental/bits/shared_ptr.h
index 7a232f4..e0ec00c 100644
--- a/libstdc++-v3/include/experimental/bits/shared_ptr.h
+++ b/libstdc++-v3/include/experimental/bits/shared_ptr.h
@@ -768,170 +768,170 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 };
 
   // C++14 ??20.8.2.2.7 //DOING
-   template
- bool operator==(const shared_ptr<_Tp1>& __a,
-		 const shared_ptr<_Tp2>& __b) noexcept
- { return __a.get() == __b.get(); }
+  template
+bool operator==(const shared_ptr<_Tp1>& __a,
+		const shared_ptr<_Tp2>& __b) noexcept
+{ return __a.get() == __b.get(); }
 
-   template
- inline bool
- operator==(const shared_ptr<_Tp>& __a, nullptr_t) noexcept
- { return !__a; }
+  template
+inline bool
+operator==(const shared_ptr<_Tp>& __a, nullptr_t) noexcept
+{ return !__a; }
 
-   template
- inline bool
- operator==(nullptr_t, const shared_ptr<_Tp>& __a) noexcept
- { return !__a; }
+  template
+inline bool
+operator==(nullptr_t, const shared_ptr<_Tp>& __a) noexcept
+{ return !__a; }
 
-   template
- inline bool
- operator!=(const shared_ptr<_Tp1>& __a,
-		const shared_ptr<_Tp2>& __b) noexcept
- { return __a.get() != __b.get(); }
-
-   template
- inline bool
- operator!=(const shared_ptr<_Tp>& __a, nullptr_t) noexcept
- { return (bool)__a; }
-
-   template
- inline bool
- operator!=(nullptr_t, const shared_ptr<_Tp>& __a) noexcept
- { return (bool)__a; }
-
-   template
- inline bool
- operator<(const shared_ptr<_Tp1>& __a,
+  template
+inline bool
+operator!=(const shared_ptr<_Tp1>& __a,
 	   const shared_ptr<_Tp2>& __b) noexcept
- {
-   using __elem_t1 = typename shared_ptr<_Tp1>::element_type;
-   using __elem_t2 = typename shared_ptr<_Tp2>::element_type;
-   using _CT = common_type_t<__elem_t1*, __elem_t2*>;
-   return std::less<_CT>()(__a.get(), __b.get());
- }
+{ return __a.get() != __b.get(); }
 
-   template
- inline bool
- operator<(const shared_ptr<_Tp>& __a, nullptr_t) noexcept
- {
-   using __elem_t = typename shared_ptr<_Tp>::element_type;
-   return std::less<__elem_t*>()(__a.get(), nullptr);
- }
+  template
+inline bool
+operator!=(const shared_ptr<_Tp>& __a, nullptr_t) noexcept
+{ return (bool)__a; }
 
-   template
- inline bool
- operator<(nullptr_t, const shared_ptr<_Tp>& __a) noexcept
- {
-   using __elem_t = typename shared_ptr<_Tp>::element_type;
-   return std::less<__elem_t*>()(nullptr, __a.get());
- }
+  template
+inline bool
+operator!=(nullptr_t, const shared_ptr<_Tp>& __a) noexcept
+{ return (bool)__a; }
 
-   template
- inline bool
- operator<=(const shared_ptr<_Tp1>& __a,
-		const shared_ptr<_Tp2>& __b) noexcept
- { return !(__b < __a); }
+  template
+inline bool
+operator<(const shared_ptr<_Tp1>& __a,
+	  const shared_ptr<_Tp2>& __b) noexcept
+{
+  using __elem_t1 = typename shared_ptr<_Tp1>::element_type;
+  using __elem_t2 = typename shared_ptr<_Tp2>::element_type;
+  using _CT = common_type_t<__elem_t1*, __elem_t2*>;
+  return std::less<_CT>()(__a.get(), __b.get());
+}
 
-   template
- inline bool
- operator<=(const shared_ptr<_Tp>& __a, nullptr_t) noexcept
- { return !(nullptr < __a); }
+  template
+inline bool
+operator<(const shared_ptr<_Tp>& __a, nullptr_t) noexcept
+{
+  using __elem_t = typename shared_ptr<_Tp>::element_type;
+  return std::less<__elem_t*>()(__a.get(), nullptr);
+}
 
-   template
- inline bool
- operator<=(nullptr_t, const shared_ptr<_Tp>& __a) noexcept
- { return !(__a < nullptr); }
+  template
+inline bool
+operator<(nullptr_t, const shared_ptr<_Tp>& __a) noexcept
+{
+  using __elem_t = typename shared_ptr<_Tp>::element_type;
+  return std::less<__elem_t*>()(nullptr, __a.get());
+}
 
-   template
- inline bool
- operator>(const shared_ptr<_Tp1>& __a,
+  template
+inline bool
+operator<=(const shared_ptr<_Tp1>& __a,
 	   const shared

Re: [PATCH] xtensa: add HW FPU sequences for DIV/SQRT/RECIP/RSQRT

2016-10-18 Thread Max Filippov
On Tue, Oct 18, 2016 at 11:22 AM, augustine.sterl...@gmail.com
 wrote:
> On Fri, Oct 14, 2016 at 12:14 PM, Max Filippov  wrote:
>>
>> Use new FPU instruction sequences documented in the ISA book to
>> implement __divsf3, __divdf3, __recipsf2, __recipdf2, __rsqrtsf2,
>> __rsqrtdf2 and __ieee754_sqrtf and __ieee754_sqrt.
>>
>> 2013-02-12  Ding-Kai Chen  
>> libgcc/
>> * config/xtensa/ieee754-df.S (__recipdf2, __rsqrtdf2,
>> __ieee754_sqrt): New functions.
>> (__divdf3): Add implementation with new FPU instructions under
>> #if XCHAL_HAVE_DFP_DIV.
>> * config/xtensa/ieee754-sf.S (__recipsf2, __rsqrtsf2,
>> __ieee754_sqrtf): New functions.
>> (__divsf3): Add implementation with new FPU instructions under
>> #if XCHAL_HAVE_FP_DIV.
>> * config/xtensa/t-xtensa (LIB1ASMFUNCS): Add _sqrtf, _recipsf2
>> _rsqrtsf2, _sqrt, _recipdf2 and _rsqrtdf2
>
>
> Approved, please apply.

Applied to trunk. Thank you!

-- Max


Re: [PATCH] xtensa: don't use unwind-dw2-fde-dip with elf targets

2016-10-18 Thread Max Filippov
On Tue, Oct 18, 2016 at 11:22 AM, augustine.sterl...@gmail.com
 wrote:
> On Mon, Oct 17, 2016 at 4:23 PM, Max Filippov  wrote:
>> Define LIB2ADDEH_XTENSA_UNWIND_DW2_FDE to unwind-dw2-fde.c in
>> xtensa/t-elf and to unwind-dw2-fde-dip.c in xtensa/t-linux and use
>> LIB2ADDEH_XTENSA_UNWIND_DW2_FDE in LIB2ADDEH definition.
>>
>> 2016-10-17  Max Filippov  
>> libgcc/
>> * config/xtensa/t-elf (LIB2ADDEH_XTENSA_UNWIND_DW2_FDE): New
>> definition.
>> * config/xtensa/t-linux (LIB2ADDEH_XTENSA_UNWIND_DW2_FDE): New
>> definition.
>> * config/xtensa/t-windowed (LIB2ADDEH): Use
>> LIB2ADDEH_XTENSA_UNWIND_DW2_FDE defined by either xtensa/t-elf
>> or xtensa/t-linux.
>>
>> Signed-off-by: Max Filippov 
>
> Approved, please apply.

Applied to trunk. Thank you!

-- Max


Re: Use FOR_ALL_BB_FN in a few more places

2016-10-18 Thread Thomas Schwinge
Hi!

On Tue, 18 Oct 2016 07:38:17 +0200, Richard Biener  wrote:
> On October 17, 2016 6:09:02 PM GMT+02:00, Thomas Schwinge 
>  wrote:
> >[FOR_ALL_BB_FN]
> >
> >We could use the former in a few more places; OK for trunk once tested?
> 
> OK.

As posted, committed to trunk in r241315:

commit 195917446a84dd216ec19c6657b298dcb57e039d
Author: tschwinge 
Date:   Tue Oct 18 19:36:45 2016 +

Use FOR_ALL_BB_FN in a few more places

gcc/
* cfg.c (clear_bb_flags): Use FOR_ALL_BB_FN.
* config/nvptx/nvptx.c (nvptx_find_sese): Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@241315 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog| 5 +
 gcc/cfg.c| 2 +-
 gcc/config/nvptx/nvptx.c | 8 +---
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index 4922809..6002dde 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,8 @@
+2016-10-18  Thomas Schwinge  
+
+   * cfg.c (clear_bb_flags): Use FOR_ALL_BB_FN.
+   * config/nvptx/nvptx.c (nvptx_find_sese): Likewise.
+
 2016-10-18  Kelvin Nilsen  
 
* config/rs6000/altivec.h (vec_xl_len): New macro.
diff --git gcc/cfg.c gcc/cfg.c
index ee2e42c..6604b02 100644
--- gcc/cfg.c
+++ gcc/cfg.c
@@ -386,7 +386,7 @@ clear_bb_flags (void)
 {
   basic_block bb;
 
-  FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb)
+  FOR_ALL_BB_FN (bb, cfun)
 bb->flags &= BB_FLAGS_TO_PRESERVE;
 }
 
diff --git gcc/config/nvptx/nvptx.c gcc/config/nvptx/nvptx.c
index 6ec8eb4..80fa9ae 100644
--- gcc/config/nvptx/nvptx.c
+++ gcc/config/nvptx/nvptx.c
@@ -3091,17 +3091,11 @@ nvptx_find_sese (auto_vec &blocks, 
bb_pair_vec_t ®ions)
   int ix;
 
   /* First clear each BB of the whole function.  */ 
-  FOR_EACH_BB_FN (block, cfun)
+  FOR_ALL_BB_FN (block, cfun)
 {
   block->flags &= ~BB_VISITED;
   BB_SET_SESE (block, 0);
 }
-  block = EXIT_BLOCK_PTR_FOR_FN (cfun);
-  block->flags &= ~BB_VISITED;
-  BB_SET_SESE (block, 0);
-  block = ENTRY_BLOCK_PTR_FOR_FN (cfun);
-  block->flags &= ~BB_VISITED;
-  BB_SET_SESE (block, 0);
 
   /* Mark blocks in the function that are in this graph.  */
   for (ix = 0; blocks.iterate (ix, &block); ix++)


Grüße
 Thomas


[PATCH, v2] Fix computation of register limit for -fsched-pressure

2016-10-18 Thread Pat Haugen
The patch here, https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01872.html, 
attempted to scale down the register limit used by -fsched-pressure for the 
case where the block in question executes as frequently as the entry block to 
just the call_clobbered (i.e. call_used) regs. But the code is actually scaling 
toward call_saved registers. The following patch corrects the following two 
issues:

1) Computes and then removes the FIXED_REGS for the pressure class.
2) Computes CALL_SAVED regs per class and subtracts out some scaled portion of 
that.

Bootstrap/regtest on powerpc64le in progress. Ok for trunk if no new failures?

-Pat


2016-10-18  Pat Haugen  

* haifa-sched.c (call_used_regs_num): Rename to...
(call_saved_regs_num): ...this.
(fixed_regs_num): New variable.
(sched_pressure_start_bb): Subtract out fixed_regs. Scale call_saved 
regs not call_used.
(alloc_global_sched_pressure_data): Compute call_saved and fixed regs.


Index: gcc/haifa-sched.c
===
--- gcc/haifa-sched.c	(revision 240569)
+++ gcc/haifa-sched.c	(working copy)
@@ -932,9 +932,10 @@ static bitmap region_ref_regs;
 /* Effective number of available registers of a given class (see comment
in sched_pressure_start_bb).  */
 static int sched_class_regs_num[N_REG_CLASSES];
-/* Number of call_used_regs.  This is a helper for calculating of
+/* Number of call_saved_regs and fixed_regs.  Helpers for calculating of
sched_class_regs_num.  */
-static int call_used_regs_num[N_REG_CLASSES];
+static int call_saved_regs_num[N_REG_CLASSES];
+static int fixed_regs_num[N_REG_CLASSES];
 
 /* Initiate register pressure relative info for scheduling the current
region.  Currently it is only clearing register mentioned in the
@@ -3896,17 +3897,19 @@ sched_pressure_start_bb (basic_block bb)
  * If the basic block executes much more often than the prologue/epilogue
  (e.g., inside a hot loop), then cost of spill in the prologue is close to
  nil, so the effective number of available registers is
- (ira_class_hard_regs_num[cl] - 0).
+ (ira_class_hard_regs_num[cl] - fixed_regs_num[cl] - 0).
  * If the basic block executes as often as the prologue/epilogue,
  then spill in the block is as costly as in the prologue, so the effective
  number of available registers is
- (ira_class_hard_regs_num[cl] - call_used_regs_num[cl]).
+ (ira_class_hard_regs_num[cl] - fixed_regs_num[cl]
+  - call_saved_regs_num[cl]).
  Note that all-else-equal, we prefer to spill in the prologue, since that
  allows "extra" registers for other basic blocks of the function.
  * If the basic block is on the cold path of the function and executes
  rarely, then we should always prefer to spill in the block, rather than
  in the prologue/epilogue.  The effective number of available register is
- (ira_class_hard_regs_num[cl] - call_used_regs_num[cl]).  */
+ (ira_class_hard_regs_num[cl] - fixed_regs_num[cl]
+  - call_saved_regs_num[cl]).  */
   {
 int i;
 int entry_freq = ENTRY_BLOCK_PTR_FOR_FN (cfun)->frequency;
@@ -3923,9 +3926,10 @@ sched_pressure_start_bb (basic_block bb)
 for (i = 0; i < ira_pressure_classes_num; ++i)
   {
 	enum reg_class cl = ira_pressure_classes[i];
-	sched_class_regs_num[cl] = ira_class_hard_regs_num[cl];
+	sched_class_regs_num[cl] = ira_class_hard_regs_num[cl]
+   - fixed_regs_num[cl];
 	sched_class_regs_num[cl]
-	  -= (call_used_regs_num[cl] * entry_freq) / bb_freq;
+	  -= (call_saved_regs_num[cl] * entry_freq) / bb_freq;
   }
   }
 
@@ -7237,17 +7241,20 @@ alloc_global_sched_pressure_data (void)
 	  region_ref_regs = BITMAP_ALLOC (NULL);
 	}
 
-  /* Calculate number of CALL_USED_REGS in register classes that
-	 we calculate register pressure for.  */
+  /* Calculate number of CALL_SAVED_REGS and FIXED_REGS in register classes
+	 that we calculate register pressure for.  */
   for (int c = 0; c < ira_pressure_classes_num; ++c)
 	{
 	  enum reg_class cl = ira_pressure_classes[c];
 
-	  call_used_regs_num[cl] = 0;
+	  call_saved_regs_num[cl] = 0;
+	  fixed_regs_num[cl] = 0;
 
 	  for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i)
-	if (call_used_regs[ira_class_hard_regs[cl][i]])
-	  ++call_used_regs_num[cl];
+	if (!call_used_regs[ira_class_hard_regs[cl][i]])
+	  ++call_saved_regs_num[cl];
+	else if (fixed_regs[ira_class_hard_regs[cl][i]])
+	  ++fixed_regs_num[cl];
 	}
 }
 }


Re: Clear basic block flags before using BB_VISITED for OpenACC loops processing

2016-10-18 Thread Thomas Schwinge
Hi!

On Mon, 17 Oct 2016 15:38:50 +0200, I wrote:
> On Mon, 17 Oct 2016 14:08:44 +0200, Richard Biener 
>  wrote:
> > On Mon, Oct 17, 2016 at 1:47 PM, Thomas Schwinge
> >  wrote:
> > > On Mon, 17 Oct 2016 13:22:17 +0200, Richard Biener 
> > >  wrote:
> > >> On Mon, Oct 17, 2016 at 11:38 AM, Thomas Schwinge
> > >>  wrote:
> > >> > On Fri, 14 Oct 2016 13:06:59 +0200, Richard Biener 
> > >> >  wrote:
> > >> >> On Fri, Oct 14, 2016 at 1:00 PM, Nathan Sidwell  
> > >> >> wrote:
> > >> >> > On 10/14/16 05:28, Richard Biener wrote:
> > >> >> >
> > >> >> >> The BB_VISITED flag has indetermined state at the beginning of a 
> > >> >> >> pass.
> > >> >> >> You have to ensure it is cleared yourself.
> > >> >> >
> > >> >> >
> > >> >> > In that case the openacc (&nvptx?) passes should be modified to 
> > >> >> > clear the
> > >> >> > flags at their start, rather than at their end.

This already is a "conceptual acknowledgement" of my patch, so...

> > >> > OK to commit the following?  Is such a test case appropriate (which 
> > >> > would
> > >> > have caught this issue right away), in particular the dg-final
> > >> > scan-tree-dump line?
> > >>
> > >> Ugh.  Not worse to what we do in various dwarf scanning I guess.
> > >
> > > ;-|
> > >
> > >> Doesn't failure lead to a miscompile eventually?  So you could formulate
> > >> this as a dg-do run test with a check for the desired outcome?
> > >
> > > No, unfortunately.  In this case the error is "benign" such that the
> > > OpenACC loop processing machinery will decide to not parallelize loops
> > > that ought to be parallelized.
> > 
> > So you can scan for "loop parallelized" instead?
> 
> The dump would still contain the outer loop's "Loop 0(0)" marker, so I'd
> have to scan for "Head"/"Tail"/"UNIQUE" or similar instead -- but that
> seems likewise fragile (for false negatives), and less useful than
> scanning for the complete pattern.
> 
> > I fear your pattern
> > is quite fragile
> > to maintain over time.
> 
> Agreed -- but then, that's intentional: my idea for this new test case
> has been to have it actually verify the expected OpenACC loop processing,
> so it's clear that this pattern will need to be adjusted if changing the
> OpenACC loop processing.
> 
> > >  This won't generally cause any problem
> > > (apart from performance regression, obviously); it just caused problems
> > > in a few libgomp test cases that actually at run time test for
> > > parallelized execution -- which will/did trigger only with nvptx
> > > offloading enabled, which not too many people are testing.  The test case
> > > I propose below will trigger also for non-offloading configurations.
> 
> On IRC, Segher suggested to 'use {} instead of "" to avoid [all those
> backslashes]' -- thanks, done.

If you don't like the test case as-is (do we need multi-line tree dump
scanning, just like we recently got for compiler diagnostics?), can I at
least commit the OpenACC loops processing fix?  Here is the latest
version, simplified after your r241296 IRA vs. BB_VISITED fixes:

commit 766cf9959b15a17e17e89a50e905b4c546893823
Author: Thomas Schwinge 
Date:   Mon Oct 17 15:33:09 2016 +0200

Clear basic block flags before using BB_VISITED for OpenACC loops processing

gcc/
* omp-low.c (oacc_loop_discovery): Call clear_bb_flags before, and
don't clear BB_VISITED after processing.

gcc/testsuite/
* gcc.dg/goacc/loop-processing-1.c: New file.
---
 gcc/omp-low.c  |  8 +++-
 gcc/testsuite/gcc.dg/goacc/loop-processing-1.c | 18 ++
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git gcc/omp-low.c gcc/omp-low.c
index 77f89d5..3ef796f 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -19236,7 +19236,9 @@ oacc_loop_sibling_nreverse (oacc_loop *loop)
 static oacc_loop *
 oacc_loop_discovery ()
 {
-  basic_block bb;
+  /* Clear basic block flags, in particular BB_VISITED which we're going to use
+ in the following.  */
+  clear_bb_flags ();
   
   oacc_loop *top = new_oacc_loop_outer (current_function_decl);
   oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun));
@@ -19245,10 +19247,6 @@ oacc_loop_discovery ()
  that diagnostics come out in an unsurprising order.  */
   top = oacc_loop_sibling_nreverse (top);
 
-  /* Reset the visited flags.  */
-  FOR_ALL_BB_FN (bb, cfun)
-bb->flags &= ~BB_VISITED;
-
   return top;
 }
 
diff --git gcc/testsuite/gcc.dg/goacc/loop-processing-1.c 
gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
new file mode 100644
index 000..619576a
--- /dev/null
+++ gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
@@ -0,0 +1,18 @@
+/* Make sure that OpenACC loop processing happens.  */
+/* { dg-additional-options "-O2 -fdump-tree-oaccdevlow" } */
+
+extern int place ();
+
+void vector_1 (int *ary, int size)
+{
+#pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) 
firstprivate (size)
+  {
+#pragma acc loop gang
+for (int jx = 0; jx < 1; jx++)

Re: [rs6000] Fix reload failures in 64-bit mode with no special constant pool

2016-10-18 Thread Segher Boessenkool
On Tue, Oct 18, 2016 at 08:37:47PM +0200, Eric Botcazou wrote:
> > We need to pass the mode of the actual datum we would put in the TOC to
> > the use_toc_relative_ref function, not the mode of its address.
> 
> Right, but this mode is not "mode", the TOC contains only Pmode entries if 
> the 
> special constant pool is excluded.

I don't fully understand what you mean.  This code was created for
PR65810, if that helps?


Segher


Re: [C++ Patch/RFC] PR 67980 ("left shift count is negative [-Wshift-count-negative] generated for unreachable code")

2016-10-18 Thread Paolo Carlini
... sorry, what I sent earlier in fact causes a regression in the 
libstdc++-v3 testsuite: 23_containers/list/61347.cc.


Thus, I'm back to one of my first tries earlier today: a much more 
conservative change which uses fold_non_dependent_expr only for the 
purpose of suppressing the unwanted warnings, see the below.


Thanks,
Paolo.

///
Index: cp/pt.c
===
--- cp/pt.c (revision 241313)
+++ cp/pt.c (working copy)
@@ -15410,7 +15410,14 @@
   if (IF_STMT_CONSTEXPR_P (t) && integer_zerop (tmp))
/* Don't instantiate the THEN_CLAUSE. */;
   else
-   RECUR (THEN_CLAUSE (t));
+   {
+ bool inhibit = integer_zerop (fold_non_dependent_expr (tmp));
+ if (inhibit)
+   ++c_inhibit_evaluation_warnings;
+ RECUR (THEN_CLAUSE (t));
+ if (inhibit)
+   --c_inhibit_evaluation_warnings;
+   }
   finish_then_clause (stmt);
 
   if (IF_STMT_CONSTEXPR_P (t) && integer_nonzerop (tmp))
@@ -15417,8 +15424,13 @@
/* Don't instantiate the ELSE_CLAUSE. */;
   else if (ELSE_CLAUSE (t))
{
+ bool inhibit = integer_nonzerop (fold_non_dependent_expr (tmp));
  begin_else_clause (stmt);
+ if (inhibit)
+   ++c_inhibit_evaluation_warnings;
  RECUR (ELSE_CLAUSE (t));
+ if (inhibit)
+   --c_inhibit_evaluation_warnings;
  finish_else_clause (stmt);
}
 
Index: testsuite/g++.dg/cpp1y/pr67980.C
===
--- testsuite/g++.dg/cpp1y/pr67980.C(revision 0)
+++ testsuite/g++.dg/cpp1y/pr67980.C(working copy)
@@ -0,0 +1,23 @@
+// { dg-do compile { target c++14 } }
+
+template 
+constexpr T cpp14_constexpr_then(T value) {
+  if (Y < 0)
+return (value << -Y);
+  else
+return 0;
+}
+
+template 
+constexpr T cpp14_constexpr_else(T value) {
+  if (Y > 0)
+return 0;
+  else
+return (value << -Y);
+}
+
+int main()
+{
+  cpp14_constexpr_then<1>(0);
+  cpp14_constexpr_else<1>(0);
+}


Re: [PATCH 09/16] Split class rtx_reader into base_rtx_reader vs rtx_reader

2016-10-18 Thread David Malcolm
[CCing Richard; this is re:
  https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00273.html
]

Essentially I want to split class rtx_reader into two parts: a base class 
covering the things implemented in read-md.o, and a subclass implemented in 
read-rtl.o.

The motivation is that I want to make some of the rtx-reading functions  in 
read-rtl.c virtual in a followup, so that they can be overridden by the 
function_reader subclass in my RTL frontend.  Hence the ctor needs access to 
these functions in order to access the vtable - or the link fails.

It appears that I can't simply use read-rtl.o everywhere due to 
read-rtl.c needing insn-constants.h, which is built by genconstants, which 
would be a circular dependency, hence the split between BUILD_MD vs BUILD_RTL 
in Makefile.in to break this cycle.

On Tue, 2016-10-11 at 17:53 +0200, Bernd Schmidt wrote:
> On 10/05/2016 06:15 PM, David Malcolm wrote:
> > - rtx_reader_ptr->get_top_level_filename ());
> > + base_rtx_reader_ptr->get_top_level_filename ());
> 
> I wonder if the number of changes could be minimized by retaining the
> name rtx_reader for the base class, and using something more specific
> for the derived ones. Or would that require renaming a bunch of other
> locations?

The number of changes got rather larger with r241293
("[PATCH] read-md.c: Move various state to within class rtx_reader
(v3)"; https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01323.html
), so I'd rather just "bite the bullet" and do the renamings.

I'm not in love with the names I chose in this patch.  It does seem odd
having an "rtx_reader" class that can't actually read hierarchical rtx.

How about "md_reader" as the base class (with responsibility for the
things in read-md.o), and "rtx_reader" for the subclass (adding the
things in read-rtl.o)?

> > -static void
> > -read_rtx_operand (rtx return_rtx, int idx)
> > +rtx
> > +rtx_reader::read_rtx_operand (rtx return_rtx, int idx)
> >  {
> >RTX_CODE code = GET_CODE (return_rtx);
> >const char *format_ptr = GET_RTX_FORMAT (code);
> 
>  > +
>  > +  return return_rtx;
>  >  }
>  >
> 
> This appears to be an unrelated change. The return value needs to be 
> documented.

Indeed.  I'll split that change out into a followup.

Thanks
Dave


Re: [Ada] Set Always_Compatible_Rep to False everywhere

2016-10-18 Thread Eric Botcazou
> Agreed, let's do that for starters.

Here it is, applied on the mainline.


2016-10-18  Eric Botcazou  

* gcc-interface/Makefile.in (EXTRA_GNATRTL_NONTASKING_OBJS): Define.
(EXTRA_GNATRTL_TASKING_OBJS): Likewise.
(ARM/Android): Add atomic support.
(SPARC/Solaris): Simplify.
(x86/Solaris): Likewise.
(x86/Linux): Likewise.
(x86-64/kFreeBDS): Adjust and use system-freebsd-x86.ads
(x86/FreeBSD): Add s-mudido-affinity.adb.
(x86-64/FreeBSD): Likewise and use system-freebsd-x86.ads.
(s390/Linux): Simplify.
(PowerPC/AIX): Likewise.
(Cygwin/Mingw): Likewise.
(MIPSel/Linux): Likewise.
(ARM/Linux): Add atomic support.
(Aarch64/Linux): Use system-linux-armel.ads.
(SPARC/Linux): Simplify.
(IA-64/Linux): Minor tweak.
(IA-64/HP-UX): Likewise.
(Alpha/Linux): Likewise.
(x86-64/Linux): Use system-linux-x86.ads.
(x86/Darwin): Simplify.
(PowerPC/Darwin): Likewise.
(ARM/Darwin): Use system-darwin-arm.ads.
(ADA_EXCLUDE_SRCS): Minor reformatting.
* system-aix.ads (Word_Size): Change to Standard'Word_Size.
(Memory_Size): Change to 2 ** Word_Size.
(Support_Atomic_Primitives): Set to True.
* system-aix64.ads: Delete.
* system-darwin-arm.ads: New.
* system-darwin-ppc.ads (Word_Size): Change to Standard'Word_Size.
(Memory_Size): Change to 2 ** Word_Size.
(Support_Atomic_Primitives): Set to True.
* system-darwin-ppc64.ads: New.
* system-darwin-x86.ads (Word_Size): Change to Standard'Word_Size.
(Memory_Size): Change to 2 ** Word_Size.
* system-darwin-x86_64.ads: Delete.
* system-freebsd-x86.ads (Word_Size): Change to Standard'Word_Size.
(Memory_Size): Change to 2 ** Word_Size.
* system-freebsd-x86_64.ads: Delete.
* system-linux-alpha.ads (Support_Atomic_Primitives): Set to True.
* system-linux-armeb.ads (Word_Size): Change to Standard'Word_Size.
(Memory_Size): Change to 2 ** Word_Size.
(Support_Atomic_Primitives): Set to True.
* system-linux-armel.ads (Word_Size): Change to Standard'Word_Size.
(Memory_Size): Change to 2 ** Word_Size.
(Support_Atomic_Primitives): Set to True.
* system-linux-mips.ads: (Word_Size): Change to Standard'Word_Size.
(Memory_Size): Change to 2 ** Word_Size.
* system-linux-mipsel.ads (Word_Size): Change to Standard'Word_Size.
(Memory_Size): Change to 2 ** Word_Size.
* system-linux-s390.ads (Word_Size): Change to Standard'Word_Size.
(Memory_Size): Change to 2 ** Word_Size.
(Stack_Check_Probes): Set to True.
* system-linux-s390x.ads: Delete.
* system-linux-sparc.ads (Word_Size): Change to Standard'Word_Size.
(Memory_Size): Change to 2 ** Word_Size.
* system-linux-sparcv9.ads: Delete.
* system-linux-x86.ads (Word_Size): Change to Standard'Word_Size.
(Memory_Size): Change to 2 ** Word_Size.
* system-linux-x86_64.ads: Delete.
* system-mingw-x86_64.ads: Delete.
* system-mingw.ads (Word_Size): Change to Standard'Word_Size.
(Memory_Size): Change to 2 ** Word_Size.
* system-solaris-sparc.ads (Word_Size): Change to Standard'Word_Size.
(Memory_Size): Change to 2 ** Word_Size.
(Support_Atomic_Primitives): Set to True.
* system-solaris-sparcv9.ads: Delete.
* system-solaris-x86.ads (Word_Size): Change to Standard'Word_Size.
(Memory_Size): Change to 2 ** Word_Size.
* system-solaris-x86_64.ads: Delete.

-- 
Eric BotcazouIndex: gcc-interface/Makefile.in
===
--- gcc-interface/Makefile.in	(revision 241294)
+++ gcc-interface/Makefile.in	(working copy)
@@ -444,6 +444,12 @@ EXTRA_LIBGNAT_OBJS=
 # specific header files required to rebuild the runtime library from sources.
 EXTRA_LIBGNAT_SRCS=
 
+# Additionnal object files from Ada sources to be added in libgnat
+EXTRA_GNATRTL_NONTASKING_OBJS=
+
+# Additionnal object files from Ada sources to be added in libgnarl
+EXTRA_GNATRTL_TASKING_OBJS=
+
 # Subsets of extra libgnat sources that always go together
 VX_SIGTRAMP_EXTRA_SRCS=sigtramp.h sigtramp-vxworks-target.inc
 
@@ -454,13 +460,7 @@ EXTRA_ADALIB_OBJS=
 VX_CRTBE_EXTRA_ADALIB_OBJS=vx_crtbegin.o vx_crtbegin_auto.o vx_crtend.o
 
 # GCC spec files to be installed in $(libsubdir), so --specs=
-# finds them at runtime. Sequences of alphanum characters prefixed with '_' in
-# the filename are stripped off at installation time. This is used to strip
-# the architecture indications in vxsim spec filenames, installing e.g.
-# vxsim_ppc.spec as vxsim.spec. This allows setting up pretty general self
-# specs to perform -vxsim -> --specs=<...> translations without causing
-# conflicts since the specs are installed in a target spec

Re: [rs6000] Fix reload failures in 64-bit mode with no special constant pool

2016-10-18 Thread Eric Botcazou
> I don't fully understand what you mean.  This code was created for
> PR65810, if that helps?

OK, let's turn it into "mode" then, this doesn't change anything.

-- 
Eric Botcazou


Re: [PATCH 0/8] NVPTX offloading to NVPTX: backend patches

2016-10-18 Thread Jakub Jelinek
On Tue, Oct 18, 2016 at 07:58:49PM +0300, Alexander Monakov wrote:
> On Tue, 18 Oct 2016, Bernd Schmidt wrote:
> > The performance I saw was lower by a factor of 80 or so compared to their 
> > CUDA
> > version, and even lower than OpenMP on the host.
> 
> The currently published OpenMP version of LULESH simply doesn't use 
> openmp-simd
> anywhere. This should make it obvious that it won't be anywhere near any
> reasonable CUDA implementation, and also bound to be below host performance.

Yeah, perhaps just changing some or all #pragma omp distribute parallel for
into #pragma omp distribute parallel for simd could do something (of course,
one should actually analyze what it does, but if it is valid for distribute
without dist_schedule clause, then the loops ought to be without forward or
backward lexical dependencies (teams can't really synchronize, though they
can use some atomics).
That said, the OpenMP port of LULESH doesn't seem to be done very carefully,
e.g. in CalcHourglassControlForElems I see:
  /* Do a check for negative volumes */
  if ( v[i] <= Real_t(0.0) ) {
vol_error = i;
  }
There is not any kind of explicit mapping of vol_error nor reduction of it,
so while in OpenMP 4.0 it would be just a possible data race (the var would
be map(tofrom: vol_error) by default and shared between teams/threads, so if
more than one thread decides to write it, it is a data race, in OpenMP 4.5
it is implicitly firstprivate(vol_error) and thus the changes to the var
(still racy) would just never be propagated back to the caller.

For the missing simd regions, it might be helpful if we were able to
"autovectorize" into the SIMT, but I guess that might be quite a lot of
work.

Jakub


  1   2   >