Re: [ARM] fix for PR49423

2012-10-13 Thread Ramana Radhakrishnan
On Tue, Sep 25, 2012 at 5:32 PM, Dinar Temirbulatov
dtemirbula...@gmail.com wrote:
 Hi Ramana,
 Here is obvious fix for PR49423, I just added pool range for

Sorry for the late response - I've been on vacation.

No it's not ok. These were removed deliberately  and subsequent
efforts to put these back on have been rejected as being incomplete.
Please investigate why reload wants to reload the load of a constant 0
which is a valid constant for HImode values into the constant pool .
In addition it might be that movhi_insn needs to grow support for movw
at the right architecture level.

With your patch and for example 920928-2.c you'll start seeing 0 going
into the constant pool and an ldrh out of that !

regards
Ramana


Re: [ARM] fix for PR49423

2012-10-13 Thread Andrew Pinski
On Fri, Oct 12, 2012 at 11:13 PM, Ramana Radhakrishnan
ramana@googlemail.com wrote:
 On Tue, Sep 25, 2012 at 5:32 PM, Dinar Temirbulatov
 dtemirbula...@gmail.com wrote:
 Hi Ramana,
 Here is obvious fix for PR49423, I just added pool range for

 Sorry for the late response - I've been on vacation.

 No it's not ok. These were removed deliberately  and subsequent
 efforts to put these back on have been rejected as being incomplete.
 Please investigate why reload wants to reload the load of a constant 0
 which is a valid constant for HImode values into the constant pool .
 In addition it might be that movhi_insn needs to grow support for movw
 at the right architecture level.

 With your patch and for example 920928-2.c you'll start seeing 0 going
 into the constant pool and an ldrh out of that !


In my case (scal-to-vec2.c at -O1 -msoft-float), the register pressure
is huge and ira (not reload as far as I can tell) was trying to reduce
it slightly by generating the constant (0x3002) which was pulled out
of the loop.

The register allocator (IRA or reload) got so confused it changed:
(insn 357 355 358 7 (set (reg:SI 372 [ D.4448 ])
(zero_extend:SI (subreg:HI (reg:SI 745 [ D.4141+4 ]) 0)))
src/gcc/testsuite/gcc.c-torture/execute/scal-to-vec2.c:49 161
{*arm_zero_extendhisi2_v6}
 (nil))
Into:
(insn 357 1072 1075 7 (set (reg:SI 8 r8)
(zero_extend:SI (mem/u/c:HI (symbol_ref/u:SI (*.LC0) [flags
0x2]) [0 S2 A16])))
src/gcc/testsuite/gcc.c-torture/execute/scal-to-vec2.c:49 161
{*arm_zero_extendhisi2_v6}
 (nil))

(insn 1075 357 1077 7 (set (mem/c:SI (plus:SI (reg/f:SI 11 fp)
(const_int -60 [0xffc4])) [0 %sfp+-24 S4 A32])
(reg:SI 8 r8))
src/gcc/testsuite/gcc.c-torture/execute/scal-to-vec2.c:49 181
{*arm_movsi_insn}
 (nil))

Thanks,
Andrew Pinski


 regards
 Ramana


Re: RFC: LRA for x86/x86-64 [7/9] -- continuation

2012-10-13 Thread Richard Sandiford
I'm having to correct my own comments again, sorry.

Richard Sandiford rdsandif...@googlemail.com writes:
 +  /* If this is post-increment, first copy the location to the reload reg.  
 */
 +  if (post  real_in != result)
 +emit_insn (gen_move_insn (result, real_in));

 Nit, but real_in != result can never be true AIUI, and I was confused how
 the code could be correct in that case.  Maybe just remove it, or make
 it an assert?

Probably obvious, but I meant real_in == result can never be true.
real_in != result could be removed or turned into an assert.

 +if (GET_CODE (op) == PLUS)
 +  {
 +plus = op;
 +op = XEXP (op, 1);
 +  }

 Sorry, I'm complaining about old reload code again, but: does this
 actually happen in LRA?  In reload, a register operand could become a
 PLUS because of elimination, but I thought LRA did things differently.
 Besides, this is only needed for:

 +if (CONST_POOL_OK_P (mode, op)
 + ((targetm.preferred_reload_class
 + (op, (enum reg_class) goal_alt[i]) == NO_REGS)
 +|| no_input_reloads_p)
 + mode != VOIDmode)
 +  {
 +rtx tem = force_const_mem (mode, op);
 +
 +change_p = true;
 +/* If we stripped a SUBREG or a PLUS above add it back.  */
 +if (plus != NULL_RTX)
 +  tem = gen_rtx_PLUS (mode, XEXP (plus, 0), tem);

 and we shouldn't have (plus (constant ...) ...) after elimination
 (or at all outside of a CONST).  I don't understand why the code is
 needed even in reload.

Scratch the thing about needing it for reload.  It's obviously the
second second operand we're reloading, not the first, and it could
well be that an elimination displacement needs to be reloaded via
the constant pool.

The question for LRA still stands though.

Richard


[i386] scalar ops that preserve the high part of a vector

2012-10-13 Thread Marc Glisse

Hello,

this patch provides an alternate pattern to let combine recognize scalar 
operations that preserve the high part of a vector. If the strategy is all 
right, I could do the same for more operations (mul, div, ...). Something 
similar is also possible for V4SF (different pattern though), but probably 
not as useful.


bootstrap+testsuite ok.

2012-10-13  Marc Glisse  marc.gli...@inria.fr

PR target/54855

gcc/
* config/i386/sse.md (*sse2_vmplusminus_insnv2df3): New define_insn.

gcc/testsuite/
* gcc.target/i386/pr54855.c: New testcase.

--
Marc GlisseIndex: config/i386/sse.md
===
--- config/i386/sse.md  (revision 192420)
+++ config/i386/sse.md  (working copy)
@@ -812,20 +812,38 @@
  (const_int 1)))]
   TARGET_SSE
   @
plusminus_mnemonicssescalarmodesuffix\t{%2, %0|%0, %2}
vplusminus_mnemonicssescalarmodesuffix\t{%2, %1, %0|%0, %1, %2}
   [(set_attr isa noavx,avx)
(set_attr type sseadd)
(set_attr prefix orig,vex)
(set_attr mode ssescalarmode)])
 
+(define_insn *sse2_vmplusminus_insnv2df3
+  [(set (match_operand:V2DF 0 register_operand =x,x)
+   (vec_concat:V2DF
+ (plusminus:DF
+   (vec_select:DF 
+ (match_operand:V2DF 1 register_operand 0,x)
+ (parallel [(const_int 0)]))
+   (match_operand:DF 2 nonimmediate_operand xm,xm))
+ (vec_select:DF (match_dup 1) (parallel [(const_int 1)]]
+  TARGET_SSE2
+  @
+   plusminus_mnemonicsd\t{%2, %0|%0, %2}
+   vplusminus_mnemonicsd\t{%2, %1, %0|%0, %1, %2}
+  [(set_attr isa noavx,avx)
+   (set_attr type sseadd)
+   (set_attr prefix orig,vex)
+   (set_attr mode DF)])
+
 (define_expand mulmode3
   [(set (match_operand:VF 0 register_operand)
(mult:VF
  (match_operand:VF 1 nonimmediate_operand)
  (match_operand:VF 2 nonimmediate_operand)))]
   TARGET_SSE
   ix86_fixup_binary_operands_no_copy (MULT, MODEmode, operands);)
 
 (define_insn *mulmode3
   [(set (match_operand:VF 0 register_operand =x,x)
Index: testsuite/gcc.target/i386/pr54855.c
===
--- testsuite/gcc.target/i386/pr54855.c (revision 0)
+++ testsuite/gcc.target/i386/pr54855.c (revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options -O -msse2 } */
+
+typedef double vec __attribute__((vector_size(16)));
+
+vec f (vec x)
+{
+  x[0] += 2;
+  return x;
+}
+
+vec g (vec x)
+{
+  x[0] -= 1;
+  return x;
+}
+
+/* { dg-final { scan-assembler-not mov } } */

Property changes on: testsuite/gcc.target/i386/pr54855.c
___
Added: svn:keywords
   + Author Date Id Revision URL
Added: svn:eol-style
   + native



Re: PR54915 (ssa-forwprop, vec_perm_expr)

2012-10-13 Thread Marc Glisse

On Fri, 12 Oct 2012, Marc Glisse wrote:


Hello,

apparently, in the optimization that recognizes that {v[1],v[0]} is a 
VEC_PERM_EXPR, I forgot to check that v is a 2-element vector... (not that 
there aren't things that could be done if v has a different size, just not 
directly a VEC_PERM_EXPR, and not right now, priority is to fix the bug)


Checking that v has the same type as the result seemed like the easiest way, 
but there are many variations that could be slightly better or worse.


bootstrap+testsuite ok.

2012-10-02  Marc Glisse  marc.gli...@inria.fr

PR tree-optimization/54915

gcc/
* tree-ssa-forwprop.c (simplify_vector_constructor): Check
argument's type.

gcc/testsuite/
* gcc.dg/tree-ssa/pr54915.c: New testcase.


This new version, with a slightly relaxed test, seems preferable and also 
passes testing.


--
Marc GlisseIndex: testsuite/gcc.dg/tree-ssa/pr54915.c
===
--- testsuite/gcc.dg/tree-ssa/pr54915.c (revision 0)
+++ testsuite/gcc.dg/tree-ssa/pr54915.c (revision 0)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options -O2 } */
+
+typedef double v2df __attribute__ ((__vector_size__ (16)));
+typedef double v4df __attribute__ ((__vector_size__ (32)));
+
+void f (v2df *ret, v4df* xp)
+{
+  v4df x = *xp;
+  v2df xx = { x[2], x[3] };
+  *ret = xx;
+}

Property changes on: testsuite/gcc.dg/tree-ssa/pr54915.c
___
Added: svn:eol-style
   + native
Added: svn:keywords
   + Author Date Id Revision URL

Index: tree-ssa-forwprop.c
===
--- tree-ssa-forwprop.c (revision 192420)
+++ tree-ssa-forwprop.c (working copy)
@@ -2833,20 +2833,22 @@ simplify_vector_constructor (gimple_stmt
   ref = TREE_OPERAND (op1, 0);
   if (orig)
{
  if (ref != orig)
return false;
}
   else
{
  if (TREE_CODE (ref) != SSA_NAME)
return false;
+ if (!useless_type_conversion_p (type, TREE_TYPE (ref)))
+   return false;
  orig = ref;
}
   if (TREE_INT_CST_LOW (TREE_OPERAND (op1, 1)) != elem_size)
return false;
   sel[i] = TREE_INT_CST_LOW (TREE_OPERAND (op1, 2)) / elem_size;
   if (sel[i] != i) maybe_ident = false;
 }
   if (i  nelts)
 return false;
 


Re: [PATCH 0/6] Thread pointer built-in functions / [SH] PR 54760

2012-10-13 Thread Chung-Lin Tang
On 2012/10/12 06:55 AM, Oleg Endo wrote:
 This broke the recently added thread pointer built-ins on SH, but I was
 prepared for that, so no problem here.  The attached patch is a straight
 forward fix.
 
 However, with the patch applied I get an ICE on one of the SH thread
 pointer tests:  gcc/testsuite/gcc.target/sh/pr54760-3.c, function
 test04:
 
 internal compiler error: in expand_insn, at optabs.c:8208
 __builtin_set_thread_pointer (xx[i]);

Looks like I was supposed to use create_input_operand() there instead.
I've committed the attached patch as obvious. This should be fixed now.

Thanks,
Chung-Lin

* builtins.c (expand_builtin_set_thread_pointer): Use
create_input_operand() instead of create_fixed_operand().

Index: builtins.c
===
--- builtins.c  (revision 192421)
+++ builtins.c  (revision 192422)
@@ -5776,7 +5776,7 @@
   struct expand_operand op;
   rtx val = expand_expr (CALL_EXPR_ARG (exp, 0), NULL_RTX,
 Pmode, EXPAND_NORMAL);  
-  create_fixed_operand (op, val);
+  create_input_operand (op, val, Pmode);
   expand_insn (icode, 1, op);
   return;
 }


Re: [PATCH 0/6] Thread pointer built-in functions / [SH] PR 54760

2012-10-13 Thread Oleg Endo
On Sat, 2012-10-13 at 17:33 +0800, Chung-Lin Tang wrote:
 On 2012/10/12 06:55 AM, Oleg Endo wrote:
  This broke the recently added thread pointer built-ins on SH, but I was
  prepared for that, so no problem here.  The attached patch is a straight
  forward fix.
  
  However, with the patch applied I get an ICE on one of the SH thread
  pointer tests:  gcc/testsuite/gcc.target/sh/pr54760-3.c, function
  test04:
  
  internal compiler error: in expand_insn, at optabs.c:8208
  __builtin_set_thread_pointer (xx[i]);
 
 Looks like I was supposed to use create_input_operand() there instead.
 I've committed the attached patch as obvious. This should be fixed now.

Yep, confirmed.  Thanks!

Cheers,
Oleg



[patch] PR54885

2012-10-13 Thread Steven Bosscher
Hello,

I hadn't realized that dead registers can still reach EQ-notes. Fixed
with the attached patch.

Bootstrappedtested with normal and release checking on
x86_64-unknown-linux-gnu and powerpc64-unknown-linux-gnu. OK for
trunk?

Ciao!
Steven


df_rd_lr_eq.diff
Description: Binary data


PR fortran/51727: make module files reproducible, question on C++ in gcc

2012-10-13 Thread Tobias Schlüter


Hi,

first a question also to non-gfortraners: if I want to use std::map, 
where do I #include map?  In system.h?


Now to the patch-specific part: in this PR, module files are produced 
with random changes because the order in which symbols are written can 
depend on the memory layout.  This patch fixes this by recording which 
symbols need to be written and then processing them in order.  The patch 
doesn't make the more involved effort of putting all symbols into the 
module in an easily predicted order, instead it only makes sure that the 
order remains fixed across the compiler invocations.  The reason why the 
former is difficult is that during the process of writing a symbol, it 
can turn out that other symbols will have to be written as well (say, 
because they appear in array specifications).  Since the module-writing 
code determines which symbols to output while actually writing the file, 
recording all the symbols that need to be written before writing to the 
file would mean a lot of surgery.


I'm putting forward two patches.  One uses a C++ map to very concisely 
build up and handle the ordered list of symbols.  This has three problems:

1) gfortran maintainers may not want C++isms (even though in this case
   it's very localized, and in my opinion very transparent), and
2) it can't be backported to old release branches which are still
   compiled as C.  Joost expressed interested in a backport.
3) I don't know where to #include map (see above)
Therefore I also propose a patch where I added the necessary ~50 lines 
of boilerplate code and added the necessary traversal function to use 
gfortran's GFC_BBT to maintain the ordered tree of symbols.


Both patches pass the testsuite and Joost confirms that they fix the 
problem with CP2K.  I also verified with a few examples that they both 
produce identical .mod files as they should.


Is the C++ patch, modified to do the #include correctly, ok for the 
trunk?  If not, the C-only patch?  Can I put the C-only patch on the 
release branches?  And which?


Cheers,
- Tobi
2012-10-13  Tobias Schlüter  t...@gcc.gnu.org

PR fortran/51727
* module.c (sorted_pointer_info): New.
(gfc_get_sorted_pointer_info): New.
(free_sorted_pointer_info_tree): New.
(compare_sorted_pointer_info): New.
(find_symbols_to_write): New.
(write_symbol1_recursion): New.
(write_symbol1): Collect symbols that need writing, output in order.
(write_generic): Traverse tree in order.

diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c
index 5cfc335..4cfcae4 100644
--- a/gcc/fortran/module.c
+++ b/gcc/fortran/module.c
@@ -5150,32 +5150,122 @@ write_symbol0 (gfc_symtree *st)
 }
 
 
-/* Recursive traversal function to write the secondary set of symbols
-   to the module file.  These are symbols that were not public yet are
-   needed by the public symbols or another dependent symbol.  The act
-   of writing a symbol can modify the pointer_info tree, so we cease
-   traversal if we find a symbol to write.  We return nonzero if a
-   symbol was written and pass that information upwards.  */
+/* Type for the temporary tree used when writing secondary symbols.  */
+
+struct sorted_pointer_info
+{
+  BBT_HEADER (sorted_pointer_info);
+
+  pointer_info *p;
+};
+
+#define gfc_get_sorted_pointer_info() XCNEW (sorted_pointer_info)
+
+/* Recursively traverse the temporary tree, free its contents.  */
+
+static void
+free_sorted_pointer_info_tree (sorted_pointer_info *p)
+{
+  if (!p)
+return;
+
+  free_sorted_pointer_info_tree (p-left);
+  free_sorted_pointer_info_tree (p-right);
+
+  free (p);
+}
+
+/* Comparison function for the temporary tree.  */
 
 static int
-write_symbol1 (pointer_info *p)
+compare_sorted_pointer_info (void *_spi1, void *_spi2)
 {
-  int result;
+  sorted_pointer_info *spi1, *spi2;
+  spi1 = (sorted_pointer_info *)_spi1;
+  spi2 = (sorted_pointer_info *)_spi2;
 
+  if (spi1-p-integer  spi2-p-integer)
+return -1;
+  if (spi1-p-integer  spi2-p-integer)
+return 1;
+  return 0;
+}
+
+
+/* Finds the symbols that need to be written and collects them in the
+   sorted_pi tree so that they can be traversed in an order
+   independent of memory addresses.  */
+
+static void
+find_symbols_to_write(sorted_pointer_info **tree, pointer_info *p)
+{
+  if (!p)
+return;
+
+  if (p-type == P_SYMBOL  p-u.wsym.state == NEEDS_WRITE)
+{
+  sorted_pointer_info *sp = gfc_get_sorted_pointer_info();
+  sp-p = p; 
+ 
+  gfc_insert_bbt (tree, sp, compare_sorted_pointer_info);
+   }
+
+  find_symbols_to_write (tree, p-left);
+  find_symbols_to_write (tree, p-right);
+}
+
+
+/* Recursive function that traverses the tree of symbols that need to be
+   written and writes them in order.  */
+
+static void
+write_symbol1_recursion (sorted_pointer_info *sp)
+{
+  if (!sp)
+return;
+
+  write_symbol1_recursion (sp-left);
+
+  pointer_info *p1 = sp-p;
+  gcc_assert (p1-type == P_SYMBOL  

Re: PR fortran/51727: make module files reproducible, question on C++ in gcc

2012-10-13 Thread Tobias Schlüter


ps I forgot to mention that I also changed write_generic to traverse the 
tree in defined order, this is the same in the C++ and the C-only patch.


Cheers,
- Tobi

On 2012-10-13 15:26, Tobias Schlüter wrote:


Hi,

first a question also to non-gfortraners: if I want to use std::map,
where do I #include map?  In system.h?

Now to the patch-specific part: in this PR, module files are produced
with random changes because the order in which symbols are written can
depend on the memory layout.  This patch fixes this by recording which
symbols need to be written and then processing them in order.  The patch
doesn't make the more involved effort of putting all symbols into the
module in an easily predicted order, instead it only makes sure that the
order remains fixed across the compiler invocations.  The reason why the
former is difficult is that during the process of writing a symbol, it
can turn out that other symbols will have to be written as well (say,
because they appear in array specifications).  Since the module-writing
code determines which symbols to output while actually writing the file,
recording all the symbols that need to be written before writing to the
file would mean a lot of surgery.

I'm putting forward two patches.  One uses a C++ map to very concisely
build up and handle the ordered list of symbols.  This has three problems:
1) gfortran maintainers may not want C++isms (even though in this case
it's very localized, and in my opinion very transparent), and
2) it can't be backported to old release branches which are still
compiled as C.  Joost expressed interested in a backport.
3) I don't know where to #include map (see above)
Therefore I also propose a patch where I added the necessary ~50 lines
of boilerplate code and added the necessary traversal function to use
gfortran's GFC_BBT to maintain the ordered tree of symbols.

Both patches pass the testsuite and Joost confirms that they fix the
problem with CP2K.  I also verified with a few examples that they both
produce identical .mod files as they should.

Is the C++ patch, modified to do the #include correctly, ok for the
trunk?  If not, the C-only patch?  Can I put the C-only patch on the
release branches?  And which?

Cheers,
- Tobi




[PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)

2012-10-13 Thread Steven Bosscher
Hello,

This fixes the long-standing enhancement request to use DF_LIVE in
IRA. To quote the first comment in the PR:

IRA should be using the DF-LIVE sets, which are smaller than the DF-LR
sets when they are available (typically at O2 and above). The proper
sets can be conveniently accessed using the df_get_live_[in,out]
functions which use DF-LIVE if it is available and fall back to DF-LR
if it is not.

I thought that DF_LIVE wasn't available at -O1 in IRA, but
interestingly ira.c:ira() adds it to the DF-problems list in that case
already:

$ grep -C3 -n df_live_add ira.c
4160-
4161-  if (optimize == 1)
4162-{
4163:  df_live_add_problem ();
4164-  df_live_set_all_dirty ();
4165-}
4166-#ifdef ENABLE_CHECKING

So DF_LIVE is already being computed for IRA. It's just not being used
because the DF_LR_{IN,OUT} macros are used instead of the
df_get_live_{in,out} functions. With my patch, the DF_LIVE sets are
used, which results in marginally faster compile times for my set of
cc1-i files on power64 and on x86_64, as well as smaller code. There's
also a good speedup for the PR54146 test case.

Only reload.c still uses the DF_LR_OUT. The comment in
find_dummy_reload() suggests this is intentional.

I also fond a bug in IRA:

4392- touch the artificial uses and defs.  */
4393-  df_finish_pass (true);
4394-  if (optimize  1)
4395:df_live_add_problem ();
4396-  df_scan_alloc (NULL);
4397-  df_scan_blocks ();
4398-

At optimize1 the DF_LIVE problem is already always computed. I think
the idea here was to *remove* the DF_LIVE problem at -O1, which is
also part of the attached patch.
(Perhaps we should even simply not add the DF_LIVE problem at -O1, but
that can be done in a follow-up patch if necessary.)

Bootstrapped and tested (with default checking and with release
checking) on powerpc64-unknown-linux-gnu and x86_64-unknown-linux-gnu.
OK for trunk?

Ciao!
Steven

PR rtl-optimization/38711
* df.h (df_get_live_out, df_get_live_in): Make static inline functions.
* df-problems.c (df_get_live_out, df_get_live_in): Moved to df.h.
* ira-lives.c (process_bb_node_lives): Use df_get_live_out instead of
DF_LR_OUT.
* ira-build.c (create_bb_allocnos): Likewise.
(create_loop_allocnos): Likewise, and use df_get_live_in instead of
DF_LR_IN.
* ira-emit.c (generate_edge_moves): Likewise.
(add_ranges_and_copies): Likewise.
* ira.c (mark_elimination): Update DF_LR and DF_LIVE.
(build_insn_chain): Use df_get_live_out instead of DF_LR_OUT.
(do_reload): Remove the DF_LIVE problem for -O1.
* ira-color.c (ira_loop_edge_freq): Use df_get_live_out instead of
DF_LR_OUT, and df_get_live_in instead of DF_LR_IN.


ira-speedup-3.diff
Description: Binary data


Re: [RFC PATCH] Add support for sparc compare-and-branch.

2012-10-13 Thread Eric Botcazou
 The trouble area, and where I need help from either Rainer or Eric,
 is the Solaris2 bits.
 
 I think we need to move the Solaris assembler stuff over to a model
 where it passes:
 
   -m{32,64} -xarch=sparcFOO
 
 instead of using the v8plusX stuff to indicate 32bit.  And that's
 the direction I tried to move in here.

The only versions of the Solaris assembler I have access to only support 
v8plusX according to the man page.  Has that changed recently?

 Could one of you help me get the solaris side correct?  I made sure
 that binutils accepts the same options for this stuff, that's why
 I can unconditionally use '-xarch=sparc4' in the configure test.

sparc4 sounds very 1990-ish...

   * config/sparc/sparc.opt (mvis4): New option.
   * config/sparc/sparc-c.c (sparc_target_macros): When TARGET_VIS4,
   define __VIS__ to 0x400.

What's the relationship between VIS4 and SPARC-T4 exactly?  The above manual 
only speaks of VIS3 as far as I can see.  And the CBcond instructions are not 
marked as belonging to VIS (3 or 4), so using -mvis4 for them seems strange.
Why not make them depend on -mcpu=niagara4 instead?

-- 
Eric Botcazou


Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)

2012-10-13 Thread Vladimir Makarov

On 12-10-13 9:40 AM, Steven Bosscher wrote:

Hello,

This fixes the long-standing enhancement request to use DF_LIVE in
IRA. To quote the first comment in the PR:

IRA should be using the DF-LIVE sets, which are smaller than the DF-LR
sets when they are available (typically at O2 and above). The proper
sets can be conveniently accessed using the df_get_live_[in,out]
functions which use DF-LIVE if it is available and fall back to DF-LR
if it is not.

I thought that DF_LIVE wasn't available at -O1 in IRA, but
interestingly ira.c:ira() adds it to the DF-problems list in that case
already:

$ grep -C3 -n df_live_add ira.c
4160-
4161-  if (optimize == 1)
4162-{
4163:  df_live_add_problem ();
4164-  df_live_set_all_dirty ();
4165-}
4166-#ifdef ENABLE_CHECKING

So DF_LIVE is already being computed for IRA. It's just not being used
because the DF_LR_{IN,OUT} macros are used instead of the
df_get_live_{in,out} functions. With my patch, the DF_LIVE sets are
used, which results in marginally faster compile times for my set of
cc1-i files on power64 and on x86_64, as well as smaller code. There's
also a good speedup for the PR54146 test case.

Only reload.c still uses the DF_LR_OUT. The comment in
find_dummy_reload() suggests this is intentional.

I also fond a bug in IRA:

4392- touch the artificial uses and defs.  */
4393-  df_finish_pass (true);
4394-  if (optimize  1)
4395:df_live_add_problem ();
4396-  df_scan_alloc (NULL);
4397-  df_scan_blocks ();
4398-

At optimize1 the DF_LIVE problem is already always computed. I think
the idea here was to *remove* the DF_LIVE problem at -O1, which is
also part of the attached patch.
(Perhaps we should even simply not add the DF_LIVE problem at -O1, but
that can be done in a follow-up patch if necessary.)

Bootstrapped and tested (with default checking and with release
checking) on powerpc64-unknown-linux-gnu and x86_64-unknown-linux-gnu.
OK for trunk?


LIVE is not used on purpose.  I added LIVE usage to the old RA about 10 
years ago (it was before DF-infrastructure).  Everything was ok until, 
somebody reported compiler crash on semantically wrong program 
(something with usage of uninitialized variable).  After that I removed 
this code.


Probably, that is also a reason why reload uses LR not LIVE.  So I'd 
like to wait a few weeks.  When I have more time, I am going to 
reproduce this test suite and look how IRA behaves on it.


Thanks, Steven.


Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)

2012-10-13 Thread Steven Bosscher
On Sat, Oct 13, 2012 at 5:12 PM, Vladimir Makarov wrote:
 On 12-10-13 9:40 AM, Steven Bosscher wrote:

 Hello,

 This fixes the long-standing enhancement request to use DF_LIVE in
 IRA. To quote the first comment in the PR:

 IRA should be using the DF-LIVE sets, which are smaller than the DF-LR
 sets when they are available (typically at O2 and above). The proper
 sets can be conveniently accessed using the df_get_live_[in,out]
 functions which use DF-LIVE if it is available and fall back to DF-LR
 if it is not.

 I thought that DF_LIVE wasn't available at -O1 in IRA, but
 interestingly ira.c:ira() adds it to the DF-problems list in that case
 already:

 $ grep -C3 -n df_live_add ira.c
 4160-
 4161-  if (optimize == 1)
 4162-{
 4163:  df_live_add_problem ();
 4164-  df_live_set_all_dirty ();
 4165-}
 4166-#ifdef ENABLE_CHECKING

 So DF_LIVE is already being computed for IRA. It's just not being used
 because the DF_LR_{IN,OUT} macros are used instead of the
 df_get_live_{in,out} functions. With my patch, the DF_LIVE sets are
 used, which results in marginally faster compile times for my set of
 cc1-i files on power64 and on x86_64, as well as smaller code. There's
 also a good speedup for the PR54146 test case.

 Only reload.c still uses the DF_LR_OUT. The comment in
 find_dummy_reload() suggests this is intentional.

 I also fond a bug in IRA:

 4392- touch the artificial uses and defs.  */
 4393-  df_finish_pass (true);
 4394-  if (optimize  1)
 4395:df_live_add_problem ();
 4396-  df_scan_alloc (NULL);
 4397-  df_scan_blocks ();
 4398-

 At optimize1 the DF_LIVE problem is already always computed. I think
 the idea here was to *remove* the DF_LIVE problem at -O1, which is
 also part of the attached patch.
 (Perhaps we should even simply not add the DF_LIVE problem at -O1, but
 that can be done in a follow-up patch if necessary.)

 Bootstrapped and tested (with default checking and with release
 checking) on powerpc64-unknown-linux-gnu and x86_64-unknown-linux-gnu.
 OK for trunk?


 LIVE is not used on purpose.  I added LIVE usage to the old RA about 10
 years ago (it was before DF-infrastructure).  Everything was ok until,
 somebody reported compiler crash on semantically wrong program (something
 with usage of uninitialized variable).  After that I removed this code.

GCC was a completely different compiler 10 years ago. Handling of
uninitialized variables is radically different since tree-ssa, and
also the resource allocation side of the compiler has essentially been
rewritten (by you :-). Whatever broke in the compiler 10 years ago may
not be relevant anymore today.

I can't find your work, or the reported problem, in the gcc mailing
list archives. Did you add a test case for the problem, and if so,
where can I find it?


 Probably, that is also a reason why reload uses LR not LIVE.

The reason why reload uses DF_LR is PR20973, as documented in the code.
(And FWIW, that bug doesn't reproduce even with GCC 4.0 if you add the
transformations of pass_initialize_regs to it, that pass exists to
avoid uninitialized regs hazards by making them initialized.)


  So I'd like to
 wait a few weeks.  When I have more time, I am going to reproduce this test
 suite and look how IRA behaves on it.

I don't think that's giving this patch a fair chance. In a few week's
time, stage1 will be over. You may not have time, but that doesn't
mean other developers should wait. If you have a test case that fails
with my patch, I'll gladly spend my own time on it to see if it can be
fixed.

My patch was properly tested on two primary targets and passes all
tests in the test suite, it improves the compiler now, and it
addresses a PR that you've basically ignored for 4 years. I kindly ask
you to reconsider your position :-)

Ciao!
Steven


Re: [lra] patch to fix GCC crash on a SPEC2006 test

2012-10-13 Thread Peter Bergner
On Thu, 2012-10-11 at 23:53 -0400, Vladimir Makarov wrote:
 My biggest problem is ESL.  I should use simpler phrases.

Heh, I'm not sure compiler speak qualifies as English. :)


 Is the following comment better?
 
 Presence of any pseudo in CALL_INSN_FUNCTION_USAGE does not affect value 
 of insn_bitmap of the corresponding lra_reg_info.  That is because we 
 don't need to reload pseudos in CALL_INSN_FUNCTION_USAGEs.  So if we 
 process only insns in the insn_bitmap of given pseudo here, we can miss 
 the pseudo in some CALL_INSN_FUNCTION_USAGEs.

Sure, that's better.  Thanks.

Peter





[PATCH] Fix gcov handling directories with periods

2012-10-13 Thread Andreas Schwab
PR gcov-profile/44728
* gcov.c (create_file_names): When stripping extension only look
at base name.
---
 gcc/gcov.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gcov.c b/gcc/gcov.c
index cf26ce1..09831c2 100644
--- a/gcc/gcov.c
+++ b/gcc/gcov.c
@@ -842,7 +842,7 @@ create_file_names (const char *file_name)
 }
 
   /* Remove the extension.  */
-  cptr = strrchr (name, '.');
+  cptr = strrchr (CONST_CAST (char *, lbasename (name)), '.');
   if (cptr)
 *cptr = 0;
 
-- 
1.7.12.3

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Two obvious loop-iv fixes

2012-10-13 Thread Jan Hubicka
Hi,
while proofreading loop-iv for possible cause of the miscompilation that turned
out to be webizer bug I noticed two minor problems.
1) determine_max_iter has path dealing with AND that is bogus, because
   constant argument is always canonized to be second. Enabling the path
   however somethimes leads to worse results, so I improved it a bit
   by combining both estimates.
2) bounds are recorded as signed values intead of unsigned.  This means
   that values with upper bit set gets promoted to almost infinity for
   double int.

Bootstrappedregtested x86_64-linux, comitted as obvious.

Honza

* loop-iv.c (determine_max_iter): Fix handling of AND.
(iv_number_of_iterations): Record upper bounds as unsigned
values.
Index: loop-iv.c
===
--- loop-iv.c   (revision 192409)
+++ loop-iv.c   (working copy)
@@ -2224,13 +2224,18 @@ determine_max_iter (struct loop *loop, s
   rtx niter = desc-niter_expr;
   rtx mmin, mmax, cmp;
   unsigned HOST_WIDEST_INT nmax, inc;
+  unsigned HOST_WIDEST_INT andmax = 0;
+
+  /* We used to look for constant operand 0 of AND,
+ but canonicalization should always make this impossible.  */
+  gcc_checking_assert (GET_CODE (niter) != AND
+  || !CONST_INT_P (XEXP (niter, 0)));
 
   if (GET_CODE (niter) == AND
-   CONST_INT_P (XEXP (niter, 0)))
+   CONST_INT_P (XEXP (niter, 1)))
 {
-  nmax = INTVAL (XEXP (niter, 0));
-  if (!(nmax  (nmax + 1)))
-   return nmax;
+  andmax = UINTVAL (XEXP (niter, 1));
+  niter = XEXP (niter, 0);
 }
 
   get_mode_bounds (desc-mode, desc-signed_p, desc-mode, mmin, mmax);
@@ -2258,7 +2263,13 @@ determine_max_iter (struct loop *loop, s
   if (dump_file)
fprintf (dump_file, ;; improved upper bound by one.\n);
 }
-  return nmax / inc;
+  nmax /= inc;
+  if (andmax)
+nmax = MIN (nmax, andmax);
+  if (dump_file)
+fprintf (dump_file, ;; Determined upper bound 
HOST_WIDEST_INT_PRINT_DEC.\n,
+nmax);
+  return nmax;
 }
 
 /* Computes number of iterations of the CONDITION in INSN in LOOP and stores
@@ -2563,7 +2574,7 @@ iv_number_of_iterations (struct loop *lo
 ? iv0.base
 : mode_mmin);
  max = (up - down) / inc + 1;
- record_niter_bound (loop, double_int::from_shwi (max),
+ record_niter_bound (loop, double_int::from_uhwi (max),
  false, true);
 
  if (iv0.step == const0_rtx)
@@ -2776,14 +2787,14 @@ iv_number_of_iterations (struct loop *lo
 
   desc-const_iter = true;
   desc-niter = val  GET_MODE_MASK (desc-mode);
-  record_niter_bound (loop, double_int::from_shwi (desc-niter),
+  record_niter_bound (loop, double_int::from_uhwi (desc-niter),
  false, true);
 }
   else
 {
   max = determine_max_iter (loop, desc, old_niter);
   gcc_assert (max);
-  record_niter_bound (loop, double_int::from_shwi (max),
+  record_niter_bound (loop, double_int::from_uhwi (max),
  false, true);
 
   /* simplify_using_initial_values does a copy propagation on the registers


Re: Two obvious loop-iv fixes

2012-10-13 Thread Steven Bosscher
On Sat, Oct 13, 2012 at 6:54 PM, Jan Hubicka wrote:
 2) bounds are recorded as signed values intead of unsigned.  This means
that values with upper bit set gets promoted to almost infinity for
double int.

Could you check and see if this change fixes PR54919?

Ciao!
Steven


Re: Use conditional casting with symtab_node

2012-10-13 Thread Lawrence Crowl
On 10/12/12, Richard Biener richard.guent...@gmail.com wrote:
 On Oct 11, 2012 Xinliang David Li davi...@google.com wrote:
 On Oct 11, 2012 Lawrence Crowl cr...@googlers.com wrote:
 On 10/10/12, Xinliang David Li davi...@google.com wrote:
 In a different thread, I proposed the following alternative to
 'try_xxx':

 templatetypename T T* symbol::cast_to(symbol* p) {
if (p-isT())
   return static_castT*(p);
return 0;
  }

 cast:

 templatetypename T T symbol:as(symbol* p) {
assert(p-isT())
return static_castT(*p);

  }

 My concern here was never the technical feasibility, nor the
 desirability of having the facility for generic code, but the amount
 of the amount of typing in non-generic contexts.  That is

   if (cgraph_node *q = p-try_function ())

 versus

   if (cgraph_node *q = p-cast_to cgraph_node * ())

 I thought the latter would generate objections.  Does anyone object
 to the extra typing?

 If not, I can make that change.

 Think about a more complex class hierarchy and interface consistency.
 I believe you don't want this:

 tree::try_decl () { .. }
 tree::try_ssa_name () { ..}
 tree::try_type() {...}
 tree::try_int_const () {..}
 tree::try_exp () { ...}
  ..

 Rather one (or two with the const_cast version).

 template T T *tree::cast_to ()
 {
if (kind_ == kind_traitsT::value)
 return static_castT* (this);

  return 0;
 }

 I also think that instead of

   if (cgraph_node *q = p-cast_to cgraph_node * ())

 we want

   if ((q = cast_to cgraph_node * (p))

 I see absolutely no good reason to make cast_to a member, given
 that the language has static_cast, const_cast and stuff.  cast_to
 would simply be our equivalent to dynamic_cast within our OO model.

Sorry, that was a think-o.  Agreed there is no point in it being a
member.

 Then I'd call it *_cast instead of cast_*, so, why not gcc_cast  ?
 Or dyn_cast  ().  That way

   if ((q = dyn_cast function * (p))

 would be how to use it.

Which function?  The point with my original proposal is that the
kind of function was derivable from context, and thus can be shorter.

 Small enough, compared to

   if ((q = p-try_function ()))

 and a lot more familiar to folks knowing C++ (and probably doesn't
 make a difference to others).

 Thus, please re-use or follow existing concepts.

Both are existing concepts.  What I proposed is a common technique
for avoiding the cost of dynamic_cast when down casting in a class
hierarchy.  Both concepts will work.  I proposed what I thought
would be most convenient to programmers.  However, I will change
to the other form unless someone objects.

 As for the example with tree we'd then have

   if ((q = dyn_cast INTEGER_CST (p)))

 if we can overload on the template parameter kind (can we?
 typename vs.  enum?)

There are two template parameters, one for the argument type and
one for the return type.  We can overload on the argument type but
not on the return type.  We can, however, (indirectly) partially
specialize on the return type.  We need to do that anyway to
represent the fact that not all type conversions are legal.

However, I recommend against specializing on the enum, as it would
become somewhat obscure when the hierarchy is discriminated on more
than one enum.

 or use tree_cast  (no I don't want dyn_cast tree_common
 all around the code).

Once we have proper class hierarchies, derived to base conversions
are implicit.  In the meantime, we need something else.

-- 
Lawrence Crowl


Re: [PATCH] Fix gcov handling directories with periods

2012-10-13 Thread Ian Lance Taylor
On Sat, Oct 13, 2012 at 8:51 AM, Andreas Schwab sch...@linux-m68k.org wrote:
 PR gcov-profile/44728
 * gcov.c (create_file_names): When stripping extension only look
 at base name.

 diff --git a/gcc/gcov.c b/gcc/gcov.c
 index cf26ce1..09831c2 100644
 --- a/gcc/gcov.c
 +++ b/gcc/gcov.c
 @@ -842,7 +842,7 @@ create_file_names (const char *file_name)
  }

/* Remove the extension.  */
 -  cptr = strrchr (name, '.');
 +  cptr = strrchr (CONST_CAST (char *, lbasename (name)), '.');
if (cptr)
  *cptr = 0;

Why do you need the CONST_CAST?  strrchr is a standard function and it
takes const char * as the first argument.  There is other code in gcc
that calls strrchr with a const char * argument.

This patch is OK without the CONST_CAST.

Thanks.

Ian


Re: Use conditional casting with symtab_node

2012-10-13 Thread Gabriel Dos Reis
On Sat, Oct 13, 2012 at 12:44 PM, Lawrence Crowl cr...@googlers.com wrote:

 Thus, please re-use or follow existing concepts.

 Both are existing concepts.  What I proposed is a common technique
 for avoiding the cost of dynamic_cast when down casting in a class
 hierarchy.  Both concepts will work.  I proposed what I thought
 would be most convenient to programmers.  However, I will change
 to the other form unless someone objects.

Let me elaborate on your point.

The concept you are trying to implement is that of:
   (a) check whether a tree is of a certain kind;
   (b) if so return a pointer to the (sub-)object of that kind;
 otherwise a null pointer.

This is a standard idiom -- at least when using C++.

It can be implemented in various ways.  Your earlier attempt
try_xxx is one example.  Another common example, built into
the language is dynamic_cast.  The latter requires that classes
involved in this test be polymorphic (because the builtin implementation
needs to consult metadata that are already present with virtual
functions.)  Yet, another implementation is what is currently in GCC
more-or-less.

I think we should name the operation based on the abstract concept
we are implementing as opposed the builtin language implementation.
That is why I recommend is over dyn_cast or variations of it.

We should be able to understand its uses without having to know
the implementation details.


 However, I recommend against specializing on the enum, as it would
 become somewhat obscure when the hierarchy is discriminated on more
 than one enum.

100% agreed.

-- Gaby


Re: PR fortran/51727: make module files reproducible, question on C++ in gcc

2012-10-13 Thread Diego Novillo

On 2012-10-13 09:26 , Tobias Schlüter wrote:


Hi,

first a question also to non-gfortraners: if I want to use std::map,
where do I #include map?  In system.h?


No.  Ada includes system.h in pure C code.  Why not include it exactly 
where you need it?



Diego.


Re: PR fortran/51727: make module files reproducible, question on C++ in gcc

2012-10-13 Thread Tobias Schlüter

On 2012-10-13 20:00, Diego Novillo wrote:

On 2012-10-13 09:26 , Tobias Schlüter wrote:

first a question also to non-gfortraners: if I want to use std::map,
where do I #include map?  In system.h?


No.  Ada includes system.h in pure C code.  Why not include it exactly
where you need it?


Ok, I was just wondering if there's a rule because a quick grep revealed 
no non-header source file that includes system headers.


Thanks,
- Tobi



Re: PR fortran/51727: make module files reproducible, question on C++ in gcc

2012-10-13 Thread Diego Novillo

On 2012-10-13 14:04 , Tobias Schlüter wrote:

On 2012-10-13 20:00, Diego Novillo wrote:

On 2012-10-13 09:26 , Tobias Schlüter wrote:

first a question also to non-gfortraners: if I want to use std::map,
where do I #include map?  In system.h?


No.  Ada includes system.h in pure C code.  Why not include it exactly
where you need it?


Ok, I was just wondering if there's a rule because a quick grep revealed
no non-header source file that includes system headers.


Joseph or others may have reason to create a system.hxx file or some 
such, but in principle I don't see why we shouldn't include std library 
headers as we need them.


Joseph?


Diego.



Re: [PATCH] Fix gcov handling directories with periods

2012-10-13 Thread Andreas Schwab
Ian Lance Taylor i...@google.com writes:

 Why do you need the CONST_CAST?  strrchr is a standard function and it
 takes const char * as the first argument.  There is other code in gcc
 that calls strrchr with a const char * argument.

strrchr is overloaded as const and non-const in C++.  We need the
non-const version.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Re: [RFC] find_reloads_subreg_address rework triggers i386 back-end issue

2012-10-13 Thread Uros Bizjak
On Fri, Oct 12, 2012 at 7:57 PM, Ulrich Weigand uweig...@de.ibm.com wrote:
 Hello,

 I was running a couple of tests on various platforms in preparation
 of getting the find_reload_subreg_address patch needed by aarch64 upstream:
 http://gcc.gnu.org/ml/gcc-patches/2012-07/msg01421.html

 This unfortunately uncovered a regression in vect-98-big-array.c on i386.
 It seems to me that this is due to a pre-existing problem in the back-end.

[...]

 But in principle the problem is latent even without the patch.  The back-end
 should not permit reload to make changes to the pattern that fundamentally
 change the semantics of the resulting instruction ...

 I was wondering if the i386 port maintainers could have a look at this
 pattern.  Shouldn't we really have two patterns, one to *load* an unaligned
 value and one to *store* and unaligned value, and not permit that memory
 access to get reloaded?

Please find attached a fairly mechanical patch that splits
move_unaligned pattern into load_unaligned and store_unaligned
patterns. We've had some problems with this pattern, and finally we
found the reason to make unaligned moves more robust.

I will wait for the confirmation that attached patch avoids the
failure you are seeing with your reload patch.

 [I guess a more fundamental change might also be to not have an unspec
 in the first place, but only plain moves, and check MEM_ALIGN in the
 move insn emitter to see which variant of the instruction is required ...]

We already do this for AVX moves, but SSE aligned moves should ICE on
unaligned access. This is handy to find errors in user programs (or
... well ... in the compiler itself).

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 192420)
+++ config/i386/i386.c  (working copy)
@@ -16059,7 +16059,8 @@ ix86_avx256_split_vector_move_misalign (rtx op0, r
 {
   rtx m;
   rtx (*extract) (rtx, rtx, rtx);
-  rtx (*move_unaligned) (rtx, rtx);
+  rtx (*load_unaligned) (rtx, rtx);
+  rtx (*store_unaligned) (rtx, rtx);
   enum machine_mode mode;
 
   switch (GET_MODE (op0))
@@ -16068,39 +16069,52 @@ ix86_avx256_split_vector_move_misalign (rtx op0, r
   gcc_unreachable ();
 case V32QImode:
   extract = gen_avx_vextractf128v32qi;
-  move_unaligned = gen_avx_movdqu256;
+  load_unaligned = gen_avx_loaddqu256;
+  store_unaligned = gen_avx_storedqu256;
   mode = V16QImode;
   break;
 case V8SFmode:
   extract = gen_avx_vextractf128v8sf;
-  move_unaligned = gen_avx_movups256;
+  load_unaligned = gen_avx_loadups256;
+  store_unaligned = gen_avx_storeups256;
   mode = V4SFmode;
   break;
 case V4DFmode:
   extract = gen_avx_vextractf128v4df;
-  move_unaligned = gen_avx_movupd256;
+  load_unaligned = gen_avx_loadupd256;
+  store_unaligned = gen_avx_storeupd256;
   mode = V2DFmode;
   break;
 }
 
-  if (MEM_P (op1)  TARGET_AVX256_SPLIT_UNALIGNED_LOAD)
+  if (MEM_P (op1))
 {
-  rtx r = gen_reg_rtx (mode);
-  m = adjust_address (op1, mode, 0);
-  emit_move_insn (r, m);
-  m = adjust_address (op1, mode, 16);
-  r = gen_rtx_VEC_CONCAT (GET_MODE (op0), r, m);
-  emit_move_insn (op0, r);
+  if (TARGET_AVX256_SPLIT_UNALIGNED_LOAD)
+   {
+ rtx r = gen_reg_rtx (mode);
+ m = adjust_address (op1, mode, 0);
+ emit_move_insn (r, m);
+ m = adjust_address (op1, mode, 16);
+ r = gen_rtx_VEC_CONCAT (GET_MODE (op0), r, m);
+ emit_move_insn (op0, r);
+   }
+  else
+   emit_insn (load_unaligned (op0, op1));
 }
-  else if (MEM_P (op0)  TARGET_AVX256_SPLIT_UNALIGNED_STORE)
+  else if (MEM_P (op0))
 {
-  m = adjust_address (op0, mode, 0);
-  emit_insn (extract (m, op1, const0_rtx));
-  m = adjust_address (op0, mode, 16);
-  emit_insn (extract (m, op1, const1_rtx));
+  if (TARGET_AVX256_SPLIT_UNALIGNED_STORE)
+   {
+ m = adjust_address (op0, mode, 0);
+ emit_insn (extract (m, op1, const0_rtx));
+ m = adjust_address (op0, mode, 16);
+ emit_insn (extract (m, op1, const1_rtx));
+   }
+  else
+   emit_insn (store_unaligned (op0, op1));
 }
   else
-emit_insn (move_unaligned (op0, op1));
+gcc_unreachable ();
 }
 
 /* Implement the movmisalign patterns for SSE.  Non-SSE modes go
@@ -16195,7 +16209,7 @@ ix86_expand_vector_move_misalign (enum machine_mod
  op0 = gen_lowpart (V16QImode, op0);
  op1 = gen_lowpart (V16QImode, op1);
  /* We will eventually emit movups based on insn attributes.  */
- emit_insn (gen_sse2_movdqu (op0, op1));
+ emit_insn (gen_sse2_loaddqu (op0, op1));
}
   else if (TARGET_SSE2  mode == V2DFmode)
 {
@@ -16207,7 +16221,7 @@ ix86_expand_vector_move_misalign (enum machine_mod
  || optimize_function_for_size_p (cfun))
{
  /* We will eventually 

Re: encoding all aliases options in .opt files

2012-10-13 Thread Manuel López-Ibáñez
On 12 October 2012 17:18, Joseph S. Myers jos...@codesourcery.com wrote:
 On Fri, 12 Oct 2012, Manuel López-Ibáñez wrote:

 I am trying to encode the relationship between Wstrict-aliasing and
 Wstrict-aliasing= in the .opt files, and the same for Wstrict-overflow
 and Wstrict-overflow=. However, the parameters of Alias() are taken as
 strings, so we get 3 and WARN_STRICT_OVERFLOW_CONDITIONAL. Do you
 have a suggestion how to handle this?

 I don't see what the problem is supposed to be.  Yes, the arguments are
 strings - they describe how one option that the user might pass on the
 command line is exactly equivalent to another that they might pass (up to
 and including the latter form of the option being the one that is used
 when matching specs and multilibs, because the driver does the
 translation, including working out the canonical textual form of the
 option, before processing specs against the command line).

 If -Wstrict-overflow on the command line is equivalent to
 -Wstrict-overflow=WARN_STRICT_OVERFLOW_CONDITIONAL on the command line,
 then specify WARN_STRICT_OVERFLOW_CONDITIONAL in the parameters.  If it's
 equivalent to something else, specify something else.

 My next step would be to handle aliases also when internally
 generating options, so I can add EnabledBy(Wall) to Wstrict-aliasing.
 Does this sound ok to you?

 That seems reasonable.

OK. The attached patch implements this. Does the approach look ok? I
will write changelog and more comments if it seems reasonable. One
thing I don't like is that now we print [-Werror=strict-aliasing=] in
diagnostics.

Two questions:

1) Are references allowed now that C++ is the default?

2) While fixing this, I was thinking: Why the difference between
Joined and Separate? Why not make every function that takes an
argument work with and without '=' and with separate argument. It
seems we could remove a lot of options this way. What I am proposing
is, instead of:

aux-info
Common Separate Var(aux_info_file_name)
-aux-info fileEmit declaration information into file

aux-info=
Common Joined Alias(aux-info)

simply

aux-info
Common RequiredArgument Var(aux_info_file_name)
-aux-info fileEmit declaration information into file

and all of these would be accepted:

aux-infoFILE /* we could accept this to be compatible with some
options like -B */
aux-info=FILE
aux-info FILE

We have dozens of options like these.

Moreover, we could replace:

Wstrict-aliasing
Common Alias(Wstrict-aliasing=, 3, 0) Warning
Warn about code which might break strict aliasing rules

Wstrict-aliasing=
Common Joined RejectNegative UInteger Var(warn_strict_aliasing) Warning
Warn about code which might break strict aliasing rules

with:

Wstrict-aliasing
Common MissingArgument(3) UInteger Var(warn_strict_aliasing) Warning
Warn about code which might break strict aliasing rules

and all of these will be accepted:

Wstrict-aliasing
Wno-strict-aliasing
Wstrict-aliasing2
Wstrict-aliasing=2
Wstrict-aliasing 2
Werror=strict-aliasing
Wno-error=strict-aliasing

Cheers,

Manuel.


lang-enabled-by-with-args.diff
Description: Binary data


Re: Add usage documentation for hash_table

2012-10-13 Thread Lawrence Crowl
On 10/12/12, Diego Novillo dnovi...@google.com wrote:
 Add usage documentation for hash_table.

 Andrew, does this help?

 Lawrence, I think I've gotten the details right, but please confirm.

The patch merges the descriptor class with the element class,
which we do not currently do and which I don't think we should do.
If that class is used in a tree, it would be illegal.

I'll work with Diego to address the issue.


 Tested by re-building stage 1.


   * hash-table.h: Add usage documentation.
   Tidy formatting.

 diff --git a/gcc/hash-table.h b/gcc/hash-table.h
 index 3aa66a7..0c51be6 100644
 --- a/gcc/hash-table.h
 +++ b/gcc/hash-table.h
 @@ -21,8 +21,121 @@ along with GCC; see the file COPYING3.  If not see


  /* This file implements a typed hash table.
 -   The implementation borrows from libiberty's hashtab.  */
 +   The implementation borrows from libiberty's htab_t.

 +  Hash tables are instantiated with two type arguments:
 +
 +  1- Element: A type describing the elements in the table.
 +  This type must provide 3 declarations:
 +
 + - A typedef to create the type Element::T.  This is the type
 +   of the elements stored in the table.  So, if you want the
 +   hash table of MyType elements, declare 'typedef MyType T;'.
 +
 + - A function named 'hash' that takes a pointer to Element::T
 +   and returns a hashval_t value.  This is the hashing
 +   function.
 +
 + - A function named 'equal' that takes two pointers to
 +   Element::T and returns an int.  This is the comparison
 +   function.  It should return nonzero if the elements are
 +   equal, 0 otherwise.
 +
 + - A static function named 'remove' that takes an Element::T
 +   pointer and frees the memory allocated by it.  This is used
 +   when individual elements of the table need to be disposed of
 +   (e.g., when deleting a hash table, removing elements from
 +   the table, etc).
 +
 +  2- Allocator: A template type implementing allocation and deallocation
 for
 + the table itself.
 +
 + This type takes as argument a type passed on by the hash table
 + allocation and deallocation functions.  It must provide four
 + static functions:
 +
 + - Allocator::control_alloc: This allocates the control data
 +   blocks for the table.
 +
 +- Allocator::control_free: This frees the control data blocks
 +   for the table.
 +
 +- Allocator::data_alloc: This allocates the data elements in
 +   the table.
 +
 + - Allocator::data_free: This deallocates the data elements in
 +   the table.
 +
 +  In general, you will not need to provide your own Allocator type.
 +  By default, hash tables will use the class xcallocator, which uses
 +  malloc/free for allocation.
 +
 +  Additionally, this file provides two common types for implementing
 +  element removal:
 +
 + - typed_free_remove: it implements the method 'remove' to call
 +   free().
 +
 + - typed_noop_remove: it implements the method 'remove' to do
 +   nothing.
 +
 +  To use either of these removal strategies, simply make your type a
 +  derived class of one of these two.
 +
 +  Example usage:
 +
 +  1- A hash table for MyType:
 +
 + // Derive from typed_noop_remove so element removal does nothing.
 + class MyType : typed_noop_removeMyType
 + {
 +   int f1;
 +   OtherType f2;
 +
 +   // Hash table support.  Need a typedef and 2 static functions.
 +
 +   // 'T' is the type used in all the hash table functions.
 +   typedef MyType T;
 +
 +   // The hashing function.  'T' and 'MyType' are equivalent here.
 +   static inline hashval_t hash (const MyType *);
 +
 +   // The equality function.  'T' and 'MyType' are equivalent here.
 +   static inline int equal (const MyType *, const MyType *);
 + };
 +
 + inline hashval_t
 + MyType::hash (const MyType *e)
 + { ... compute and return a hash value for E ... }
 +
 + inline int
 + MyType::equal (const MyType *p1, const MyType *p2)
 + { ... compare P1 vs P2.  Return 1 if they are the same ... }
 +
 +
 +  Note that since MyType derives from typed_noop_removeMyType, it does
 not
 +  need to provide a 'remove' function.  It inherits it from
 +  typed_noop_remove.
 +
 +  To instantiate a hash table for MyType:
 +
 + hash_tableMyType mytype_hash;
 +
 +  You can then used any of the functions in hash_table's public interface.
 +  See hash_table for details.  The interface is very similar to
 libiberty's
 +  htab_t.
 +
 +
 +  2- A hash table of pointers.
 +
 +  This file provides the template type 'pointer_hash' which can be
 +  used to create hash tables of pointers to any type.  This class uses
 +  the same hashing function used by libiberty's hash_pointer.
 +
 +  To create a hash table of pointers to MyType:
 +
 + hash_table pointer_hash MyType ptr_htab;
 +*/

  #ifndef TYPED_HASHTAB_H
  #define TYPED_HASHTAB_H
 @@ -71,7 +184,7 @@ xcallocator 

Re: [PATCH, libstdc++] Fix missing gthr-default.h issue on libstdc++ configure

2012-10-13 Thread David Edelsohn
On Fri, Oct 12, 2012 at 10:59 AM, Paolo Carlini
paolo.carl...@oracle.com wrote:
 On 10/12/2012 04:20 PM, Pavel Chupin wrote:

 Please see attached patch (applicable after revert).
 I've moved libgcc libstdc++ common configure thread header chunk into
 separate gthr.m4.
 Could you please try it on AIX?

 Is it OK for trunk?

 Looks Ok. If David can test is successfully on AIX I can approve it.

I was able to bootstrap successfully with the patch.

Thanks, David


Re: [PATCH] Fix gcov handling directories with periods

2012-10-13 Thread Ian Lance Taylor
On Sat, Oct 13, 2012 at 11:23 AM, Andreas Schwab sch...@linux-m68k.org wrote:
 Ian Lance Taylor i...@google.com writes:

 Why do you need the CONST_CAST?  strrchr is a standard function and it
 takes const char * as the first argument.  There is other code in gcc
 that calls strrchr with a const char * argument.

 strrchr is overloaded as const and non-const in C++.  We need the
 non-const version.

Oh yeah.  Really we should overload lbasename the same way.

Suppose you drop this into include/libiberty.h:

#ifdef __cplusplus
inline char *lbasename(char *s) { return const_castchar*(lbasename (s)); }
#endif

I'll preapprove that if it works.

Ian


Re: encoding all aliases options in .opt files

2012-10-13 Thread Ian Lance Taylor
On Sat, Oct 13, 2012 at 11:33 AM, Manuel López-Ibáñez
lopeziba...@gmail.com wrote:

 1) Are references allowed now that C++ is the default?

I'm not sure we addressed those in the coding conventions.  I would like to say:

1) const references are always fine;
2) non-const references as local variables are always fine;
3) non-const references as function parameters should never be used
without a really excellent reason.

The reason is simply that using non-const references as function
parameters means that arguments passed to a function can change
without any indication, and that is potentially quite confusing.
Adding an  before an output argument is easy enough, and we're all
used to it.

Ian


Re: Add usage documentation for hash_table

2012-10-13 Thread Diego Novillo

On 2012-10-13 14:40 , Lawrence Crowl wrote:

On 10/12/12, Diego Novillo dnovi...@google.com wrote:

Add usage documentation for hash_table.

Andrew, does this help?

Lawrence, I think I've gotten the details right, but please confirm.


The patch merges the descriptor class with the element class,
which we do not currently do and which I don't think we should do.
If that class is used in a tree, it would be illegal.


You mean I should s/Element/Descriptor/?  Done.

I included the formatting changes in the doc patch, but I can easily 
split the patch if you prefer.


OK with those changes?


Diego.


Re: [PATCH] Fix gcov handling directories with periods

2012-10-13 Thread Andreas Schwab
Ian Lance Taylor i...@google.com writes:

 Suppose you drop this into include/libiberty.h:

 #ifdef __cplusplus
 inline char *lbasename(char *s) { return const_castchar*(lbasename (s)); }
 #endif

That doesn't work:

../../gcc/libcpp/../include/libiberty.h: In function ‘char* lbasename(char*)’:
../../gcc/libcpp/../include/libiberty.h:123:31: error: declaration of C 
function ‘char* lbasename(char*)’ conflicts with
../../gcc/libcpp/../include/libiberty.h:121:20: error: previous declaration 
‘const char* lbasename(const char*)’ here

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Fix PR rtl-optimization/54871

2012-10-13 Thread Eric Botcazou
This is the execution failure of gfortran.dg/vector_subscript_1.f90 on
SPARC/Solaris at -O3 -funroll-loops.  The loop2_unroll dump of the reduced
testcase reads:

Loop 3 is simple:
  simple exit 10 - 12
  number of iterations: (const_int -1 [0x])
  upper bound: 1073741823
  realistic bound: -1

The number of iterations is of course bogus.

The problem is in simplify_using_initial_values: it uses the condition in:

(insn 105 104 106 6 (set (reg:CC 100 %icc)
(compare:CC (reg:SI 153 [ D.1086 ])
(reg:SI 229 [ D.1086 ]))) vector_subscript_1.f90:21 1 
{*cmpsi_insn}
 (expr_list:REG_DEAD (reg:SI 229 [ D.1086 ])
(nil)))

to record an equivalence between (reg:SI 153) and (reg:SI 229) and then:

(insn 179 101 103 6 (set (reg:SI 229 [ D.1086 ])
(reg:SI 245 [ D.1086 ])) vector_subscript_1.f90:21 -1
 (nil))

plus the equivalence to simplify the niter expression.  But there is:

(insn 104 103 105 6 (set (reg:SI 229 [ D.1086 ])
(if_then_else:SI (ge:CC (reg:CC 100 %icc)
(const_int 0 [0]))
(reg:SI 132 [ D.1086 ])
(reg:SI 245 [ D.1086 ]))) vector_subscript_1.f90:21 105 
{*movsi_cc_v9}
 (expr_list:REG_EQUAL (if_then_else:SI (ge:CC (reg:CC 100 %icc)
(const_int 0 [0]))
(reg:SI 132 [ D.1086 ])
(const_int 0 [0]))
(expr_list:REG_DEAD (reg:SI 132 [ D.1086 ])
(expr_list:REG_DEAD (reg:CC 100 %icc)
(nil)

between the two above instructions, which invalidates the equivalence.
So the equivalence should have been pruned when insn 104 was processed.

The patch also does a bit of cosmetic cleanup in loop-unroll.c, for
example in the dump file.  We have in the loop2_unroll dump of the
original testcase:

;; *** Considering loop 3 ***

;; Considering unrolling loop with constant number of iterations
;; max_unroll 5 (6 copies, initial 5).
;; Decided to unroll the constant times rolling loop, 4 times.

Question for the reader: how many times is the loop unrolled? 4, 5 or 6? :-)

In fact, the max_unroll is always equal to the times + 1 and initial
is an internal initial value for max_unroll.  Morever, this max_unroll
is _not_ equal to the max_unroll in unroll_loop_constant_iterations:

  unsigned max_unroll = loop-lpt_decision.times;

which is equal to the times instead.  And we also have comments like:

/* Unroll LOOP with constant number of iterations LOOP-LPT_DECISION.TIMES + 1
  times.

so LOOP-LPT_DECISION.TIMES (the times above) isn't really what one might
think it is.  In the end, this is mightily confusing.

Tested on x86_64-suse-linux and SPARC/Solaris, applied on the mainline.


2012-10-13  Eric Botcazou  ebotca...@adacore.com

PR rtl-optimization/54871
* loop-iv.c (simplify_using_initial_values): When scanning previous
basic blocks, prune the recorded conditions if the current insn was
not used to make a replacement.

* loop-unroll.c (decide_unroll_constant_iterations): Clean up message.
(unroll_loop_constant_iterations): Clarify head comment.
(decide_unroll_runtime_iterations): Clean up message.
(unroll_loop_runtime_iterations): Clarify head comment.
(decide_peel_simple): Clean up message.
(peel_loop_simple): Clarify head comment.
(decide_unroll_stupid): Clean up message.
(unroll_loop_stupid): Clarify head comment.


-- 
Eric BotcazouIndex: loop-iv.c
===
--- loop-iv.c	(revision 192353)
+++ loop-iv.c	(working copy)
@@ -2004,11 +2004,30 @@ simplify_using_initial_values (struct lo
 		}
 	}
 	  else
-	/* If we did not use this insn to make a replacement, any overlap
-	   between stores in this insn and our expression will cause the
-	   expression to become invalid.  */
-	if (for_each_rtx (expr, altered_reg_used, this_altered))
-	  goto out;
+	{
+	  rtx *pnote, *pnote_next;
+
+	  /* If we did not use this insn to make a replacement, any overlap
+		 between stores in this insn and our expression will cause the
+		 expression to become invalid.  */
+	  if (for_each_rtx (expr, altered_reg_used, this_altered))
+		goto out;
+
+	  /* Likewise for the conditions.  */
+	  for (pnote = cond_list; *pnote; pnote = pnote_next)
+		{
+		  rtx note = *pnote;
+		  rtx old_cond = XEXP (note, 0);
+
+		  pnote_next = XEXP (note, 1);
+		  if (for_each_rtx (old_cond, altered_reg_used, this_altered))
+		{
+		  *pnote = *pnote_next;
+		  pnote_next = pnote;
+		  free_EXPR_LIST_node (note);
+		}
+		}
+	}
 
 	  if (CONSTANT_P (*expr))
 	goto out;
Index: loop-unroll.c
===
--- loop-unroll.c	(revision 192353)
+++ loop-unroll.c	(working copy)
@@ -602,26 +602,21 @@ decide_unroll_constant_iterations (struc
 	}
 }
 
-  if (dump_file)
-fprintf (dump_file, ;; max_unroll %d (%d 

Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)

2012-10-13 Thread Eric Botcazou
 IRA should be using the DF-LIVE sets, which are smaller than the DF-LR
 sets when they are available (typically at O2 and above). The proper
 sets can be conveniently accessed using the df_get_live_[in,out]
 functions which use DF-LIVE if it is available and fall back to DF-LR
 if it is not.
 [...]
 At optimize1 the DF_LIVE problem is already always computed. I think
 the idea here was to *remove* the DF_LIVE problem at -O1, which is
 also part of the attached patch.
 (Perhaps we should even simply not add the DF_LIVE problem at -O1, but
 that can be done in a follow-up patch if necessary.)

So, in the end, does the patch enable DF_LIVE at -O1 or not?  There seems to 
be a contradiction between the subject and the body of the message.  If yes, 
perhaps an acceptable compromise would be to keep things unchanged at -O1.

-- 
Eric Botcazou


Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)

2012-10-13 Thread Steven Bosscher
On Sat, Oct 13, 2012 at 10:38 PM, Eric Botcazou ebotca...@adacore.com wrote:
 So, in the end, does the patch enable DF_LIVE at -O1 or not?  There seems to
 be a contradiction between the subject and the body of the message.  If yes,
 perhaps an acceptable compromise would be to keep things unchanged at -O1.

Eh, right. When I wrote the e-mail, I was still under the impression
that IRA only has DF_LIVE available at -O2 and higher, and I only
discovered that DF_LIVE is added if optimize==1 just before submitting
the patch. So my patch as submitted should have said ... (for -O1 and
higher).

I think it would be a good idea to keep things unchanged at -O1. For
that, the patch needs a few minor modifications (remove calls to
df_live_add_problem and make some code to update DF_LIVE_{IN,OUT}
conditional). I can prepare an updated patch for that, if you think
that's best.

Ciao!
Steven


Re: Make try_unroll_loop_completely to use loop bounds recorded

2012-10-13 Thread Jan Hubicka
 On Fri, 12 Oct 2012, Jan Hubicka wrote:
 
* f95-lang.c (gfc_init_builtin_functions): Build 
__builtin_unreachable.
   
   I wonder if other languages need similar adjustment?
  
  I also wondered ;) Only Fortran triggered, I will take a look.
   
   +  /* Now destroy the loop.  First try to do so by cancelling the
   + patch from exit conditional if we identified good candidate.
   +
   
   you mean 'path' here, right?  Similar in other places.
  
  Yeah.
   
   +  /* Unloop destroys the latch edge.  */
   +  unloop (loop, irred_invalidated);
   +  if (irred_invalidated)
   +   mark_irreducible_loops ();
   
   this makes the whole thing quadratic in the number of loops.
   Please pass down a flag and handle this in the place we
   update SSA form (thus, once per function / unroll iteration).
  
  This was in my first version of the patch. Then I noticed that remove_path
  also relies on irreducible loops to be computed and does the exactly same
  logic as I do.
  So we would need to extend the irred_invalidated to remove_path as well.
 
 Well, then do that.  Or at least constrain the number of loops
 that are updated in mark_irreducible_loops (though that would
 still be quadratic in the loop depth then?).  We solved scalability
 issues of cunroll for 4.8, so adding new ones isn't going to fly.

Well, limiting number of mark_irreducible_loops updates is probably quite easy
to do.  unloop is returning true just in quite corner cases and actually
we do have this quadratic complexity issue in the RTL unroller forever.
 
   
   Or provide a more optimial implementation that only considers
   parent loops of 'loop' (even that is excessive though, quadratic
   in the loop depth).
   
   -  FOR_EACH_LOOP (li, loop, 0)
   +  for (fel_init (li, next, LI_ONLY_INNERMOST); next;)
   
   FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
   
   but I'm sure that IV canonicalization should happen for all loops.
   So, why do you need this change?  Esp. we should get rid of
   single-iteration outer loops here, too.
  
  FOR_EACH_LOOP seems to assume that loops are not cancelled under its hand.  
  I
  was running into case where we cancelled inner loop but did not update SSA 
  and
  tried to estimate number of iterations in the outer loop (now also 
  innermost)
  that went into infinite loop.
 
 Well, that's not true (it uses fel_* stuff), FOR_EACH_LOOP constructs
 a vector of loops in the desired order it walks over.  As for estimation

I go for next loop before possibly cancelling it, while FOR_EACH_LOOP will
go from possibly cancelled loop to next.

 you probably need to invalidate the scev cache?  The current logic

I do not think the _by_eval function uses much of scev cache.  Here it was
simply walking PHI from infinite loop that was artefact of earlier update.

 explicitely walks all innermost loops and then updates, you cannot
 walk an outer loop of an inner loop that changed without updating SSA
 form for example.

I think the fel terators will do it when you are cancelling the loops on the
go.  I will dig out into this more.

Honza
 
 Richard.
 
  Honza
   
   -  FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
   +  for (fel_init (li, next, 0); next;)
   
   see above - FOR_EACH_LOOP (li, loop, 0)
   
   Otherwise looks ok.  Please update and re-post.
   
   Thanks,
   Richard.
   
   
Index: testsuite/gcc.dg/tree-ssa/cunroll-1.c
===
--- testsuite/gcc.dg/tree-ssa/cunroll-1.c   (revision 0)
+++ testsuite/gcc.dg/tree-ssa/cunroll-1.c   (revision 0)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options -O3 -fdump-tree-cunroll-details } */
+int a[2];
+test(int c)
+{ 
+  int i;
+  for (i=0;ic;i++)
+a[i]=5;
+}
+/* Array bounds says the loop will not roll much.  */
+/* { dg-final { scan-tree-dump-not Unrolled loop 1 completely 
(duplicated 1 times). Last iteration exit edge was proved true. 
cunroll} } */
+/* { dg-final { cleanup-tree-dump cunroll } } */
Index: testsuite/gcc.dg/tree-ssa/cunroll-2.c
===
--- testsuite/gcc.dg/tree-ssa/cunroll-2.c   (revision 0)
+++ testsuite/gcc.dg/tree-ssa/cunroll-2.c   (revision 0)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options -O3 -fdump-tree-cunroll-details } */
+int a[2];
+test(int c)
+{ 
+  int i;
+  for (i=0;ic;i++)
+{
+  a[i]=5;
+  if (test2())
+   return;
+}
+}
+/* We are not able to get rid of the final conditional because the 
loop has two exits.  */
+/* { dg-final { scan-tree-dump-not Unrolled loop 1 completely 
(duplicated 2 times). cunroll} } */
+/* { dg-final { cleanup-tree-dump cunroll } } */
Index: testsuite/gcc.dg/tree-ssa/cunroll-3.c

Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)

2012-10-13 Thread Eric Botcazou
 I think it would be a good idea to keep things unchanged at -O1. For
 that, the patch needs a few minor modifications (remove calls to
 df_live_add_problem and make some code to update DF_LIVE_{IN,OUT}
 conditional). I can prepare an updated patch for that, if you think
 that's best.

That was just a proposal to be put on the table. :-)  Others, especially Vlad 
of course, have the final say.  And I somewhat sympathize with the cautious 
approach here, because we were forced to downgrade from DF_LIVE to DF_LR for 
resource.c at some point (but of course it's resource.c so...).

-- 
Eric Botcazou


Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)

2012-10-13 Thread Vladimir Makarov

On 12-10-13 11:37 AM, Steven Bosscher wrote:

On Sat, Oct 13, 2012 at 5:12 PM, Vladimir Makarov wrote:

On 12-10-13 9:40 AM, Steven Bosscher wrote:

Hello,

This fixes the long-standing enhancement request to use DF_LIVE in
IRA. To quote the first comment in the PR:

IRA should be using the DF-LIVE sets, which are smaller than the DF-LR
sets when they are available (typically at O2 and above). The proper
sets can be conveniently accessed using the df_get_live_[in,out]
functions which use DF-LIVE if it is available and fall back to DF-LR
if it is not.

I thought that DF_LIVE wasn't available at -O1 in IRA, but
interestingly ira.c:ira() adds it to the DF-problems list in that case
already:

$ grep -C3 -n df_live_add ira.c
4160-
4161-  if (optimize == 1)
4162-{
4163:  df_live_add_problem ();
4164-  df_live_set_all_dirty ();
4165-}
4166-#ifdef ENABLE_CHECKING

So DF_LIVE is already being computed for IRA. It's just not being used
because the DF_LR_{IN,OUT} macros are used instead of the
df_get_live_{in,out} functions. With my patch, the DF_LIVE sets are
used, which results in marginally faster compile times for my set of
cc1-i files on power64 and on x86_64, as well as smaller code. There's
also a good speedup for the PR54146 test case.

Only reload.c still uses the DF_LR_OUT. The comment in
find_dummy_reload() suggests this is intentional.

I also fond a bug in IRA:

4392- touch the artificial uses and defs.  */
4393-  df_finish_pass (true);
4394-  if (optimize  1)
4395:df_live_add_problem ();
4396-  df_scan_alloc (NULL);
4397-  df_scan_blocks ();
4398-

At optimize1 the DF_LIVE problem is already always computed. I think
the idea here was to *remove* the DF_LIVE problem at -O1, which is
also part of the attached patch.
(Perhaps we should even simply not add the DF_LIVE problem at -O1, but
that can be done in a follow-up patch if necessary.)

Bootstrapped and tested (with default checking and with release
checking) on powerpc64-unknown-linux-gnu and x86_64-unknown-linux-gnu.
OK for trunk?



LIVE is not used on purpose.  I added LIVE usage to the old RA about 10
years ago (it was before DF-infrastructure).  Everything was ok until,
somebody reported compiler crash on semantically wrong program (something
with usage of uninitialized variable).  After that I removed this code.

GCC was a completely different compiler 10 years ago. Handling of
uninitialized variables is radically different since tree-ssa, and
also the resource allocation side of the compiler has essentially been
rewritten (by you :-). Whatever broke in the compiler 10 years ago may
not be relevant anymore today.

I can't find your work, or the reported problem, in the gcc mailing
list archives. Did you add a test case for the problem, and if so,
where can I find it?



Probably, that is also a reason why reload uses LR not LIVE.

The reason why reload uses DF_LR is PR20973, as documented in the code.
(And FWIW, that bug doesn't reproduce even with GCC 4.0 if you add the
transformations of pass_initialize_regs to it, that pass exists to
avoid uninitialized regs hazards by making them initialized.)



  So I'd like to
wait a few weeks.  When I have more time, I am going to reproduce this test
suite and look how IRA behaves on it.

I don't think that's giving this patch a fair chance. In a few week's
time, stage1 will be over. You may not have time, but that doesn't
mean other developers should wait. If you have a test case that fails
with my patch, I'll gladly spend my own time on it to see if it can be
fixed.

My patch was properly tested on two primary targets and passes all
tests in the test suite, it improves the compiler now, and it
addresses a PR that you've basically ignored for 4 years. I kindly ask
you to reconsider your position :-)

I guess you are right.  A lot of things are changed.  DF is also 
different from what I used.  Other passes also uses LIVE sets.


Ok for the idea.  If we have a problem later, we could fix it.  I'll 
look at the next version of the patch when you send it to give your the 
final approval.


Thanks for your attention to IRA.

By the way, I found the reported problem:

http://gcc.gnu.org/ml/gcc-patches/2004-06/msg00533.html

But it occurred on s390.



Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)

2012-10-13 Thread Steven Bosscher
On Sat, Oct 13, 2012 at 11:05 PM, Eric Botcazou wrote:
 I think it would be a good idea to keep things unchanged at -O1. For
 that, the patch needs a few minor modifications (remove calls to
 df_live_add_problem and make some code to update DF_LIVE_{IN,OUT}
 conditional). I can prepare an updated patch for that, if you think
 that's best.

 That was just a proposal to be put on the table. :-)  Others, especially Vlad
 of course, have the final say.  And I somewhat sympathize with the cautious
 approach here, because we were forced to downgrade from DF_LIVE to DF_LR for
 resource.c at some point (but of course it's resource.c so...).

Right, that was for PR40710
(http://gcc.gnu.org/ml/gcc/2009-07/msg00241.html) because resource.c
computes its own idea of liveness and it uses a similar definition as
the DF_LR problem. Using incompatible views of liveness is not a good
idea...

For IRA it doesn't matter whether you use DF_LR or DF_LIVE, as long as
you use one or the other consistently.

Ciao!
Steven


[PATCH, alpha]: Trivial alpha.md macroizations, part 5

2012-10-13 Thread Uros Bizjak
Hello!

2012-10-13  Uros Bizjak  ubiz...@gmail.com

* config/alpha/alpha.md (I24MODE): New mode iterator.
(any_divmod): New code iterator.
(codesi3): Macroize expander from {div,mod,udiv,umod}si3 using
any_divmod code iterator.
(codesi3): Macroize expander from {div,mod,udiv,umod}di3 using
any_divmod code iterator.
(extendqimode2): Macroize insn from extendqi{hi,si}2 using
I24MODE mode iterator.
(unaligned_storemode): Macroize expander from unaligned_store{qi,hi}
using I12MODE mode iterator.
(movmode): Macroize expander from mov{qi,hi} using
I12MODE mode iterator.

Tested on alphaev68-linux-gnu, committed to mainline SVN.

Uros.
Index: alpha.md
===
--- alpha.md(revision 192421)
+++ alpha.md(working copy)
@@ -93,6 +93,7 @@
 (define_mode_iterator IMODE [QI HI SI DI])
 (define_mode_iterator I12MODE [QI HI])
 (define_mode_iterator I124MODE [QI HI SI])
+(define_mode_iterator I24MODE [HI SI])
 (define_mode_iterator I248MODE [HI SI DI])
 (define_mode_iterator I48MODE [SI DI])
 
@@ -734,31 +735,16 @@
 ;; problem.  Is it worth the complication here to eliminate the sign
 ;; extension?
 
-(define_expand divsi3
-  [(set (match_dup 3)
-   (sign_extend:DI (match_operand:SI 1 nonimmediate_operand )))
-   (set (match_dup 4)
-   (sign_extend:DI (match_operand:SI 2 nonimmediate_operand )))
-   (parallel [(set (match_dup 5)
-  (sign_extend:DI (div:SI (match_dup 3) (match_dup 4
- (clobber (reg:DI 23))
- (clobber (reg:DI 28))])
-   (set (match_operand:SI 0 nonimmediate_operand )
-   (subreg:SI (match_dup 5) 0))]
-  TARGET_ABI_OSF
-{
-  operands[3] = gen_reg_rtx (DImode);
-  operands[4] = gen_reg_rtx (DImode);
-  operands[5] = gen_reg_rtx (DImode);
-})
+(define_code_iterator any_divmod [div mod udiv umod])
 
-(define_expand udivsi3
+(define_expand codesi3
   [(set (match_dup 3)
(sign_extend:DI (match_operand:SI 1 nonimmediate_operand )))
(set (match_dup 4)
(sign_extend:DI (match_operand:SI 2 nonimmediate_operand )))
(parallel [(set (match_dup 5)
-  (sign_extend:DI (udiv:SI (match_dup 3) (match_dup 4
+  (sign_extend:DI
+   (any_divmod:SI (match_dup 3) (match_dup 4
  (clobber (reg:DI 23))
  (clobber (reg:DI 28))])
(set (match_operand:SI 0 nonimmediate_operand )
@@ -770,78 +756,16 @@
   operands[5] = gen_reg_rtx (DImode);
 })
 
-(define_expand modsi3
-  [(set (match_dup 3)
-   (sign_extend:DI (match_operand:SI 1 nonimmediate_operand )))
-   (set (match_dup 4)
-   (sign_extend:DI (match_operand:SI 2 nonimmediate_operand )))
-   (parallel [(set (match_dup 5)
-  (sign_extend:DI (mod:SI (match_dup 3) (match_dup 4
- (clobber (reg:DI 23))
- (clobber (reg:DI 28))])
-   (set (match_operand:SI 0 nonimmediate_operand )
-   (subreg:SI (match_dup 5) 0))]
-  TARGET_ABI_OSF
-{
-  operands[3] = gen_reg_rtx (DImode);
-  operands[4] = gen_reg_rtx (DImode);
-  operands[5] = gen_reg_rtx (DImode);
-})
-
-(define_expand umodsi3
-  [(set (match_dup 3)
-   (sign_extend:DI (match_operand:SI 1 nonimmediate_operand )))
-   (set (match_dup 4)
-   (sign_extend:DI (match_operand:SI 2 nonimmediate_operand )))
-   (parallel [(set (match_dup 5)
-  (sign_extend:DI (umod:SI (match_dup 3) (match_dup 4
- (clobber (reg:DI 23))
- (clobber (reg:DI 28))])
-   (set (match_operand:SI 0 nonimmediate_operand )
-   (subreg:SI (match_dup 5) 0))]
-  TARGET_ABI_OSF
-{
-  operands[3] = gen_reg_rtx (DImode);
-  operands[4] = gen_reg_rtx (DImode);
-  operands[5] = gen_reg_rtx (DImode);
-})
-
-(define_expand divdi3
+(define_expand codedi3
   [(parallel [(set (match_operand:DI 0 register_operand )
-  (div:DI (match_operand:DI 1 register_operand )
-  (match_operand:DI 2 register_operand )))
+  (any_divmod:DI
+   (match_operand:DI 1 register_operand )
+   (match_operand:DI 2 register_operand )))
  (clobber (reg:DI 23))
  (clobber (reg:DI 28))])]
   TARGET_ABI_OSF
   )
 
-(define_expand udivdi3
-  [(parallel [(set (match_operand:DI 0 register_operand )
-  (udiv:DI (match_operand:DI 1 register_operand )
-   (match_operand:DI 2 register_operand )))
- (clobber (reg:DI 23))
- (clobber (reg:DI 28))])]
-  TARGET_ABI_OSF
-  )
-
-(define_expand moddi3
-  [(parallel [(set (match_operand:DI 0 register_operand )
-  (mod:DI (match_operand:DI 1 register_operand )
-  (match_operand:DI 2 register_operand )))
- (clobber (reg:DI 23))
- (clobber (reg:DI 28))])]
-  TARGET_ABI_OSF
-  )
-
-(define_expand umoddi3
-  [(parallel [(set 

Re: [RFC PATCH] Add support for sparc compare-and-branch.

2012-10-13 Thread David Miller
From: Eric Botcazou ebotca...@adacore.com
Date: Sat, 13 Oct 2012 15:47:11 +0200

 The trouble area, and where I need help from either Rainer or Eric,
 is the Solaris2 bits.
 
 I think we need to move the Solaris assembler stuff over to a model
 where it passes:
 
  -m{32,64} -xarch=sparcFOO
 
 instead of using the v8plusX stuff to indicate 32bit.  And that's
 the direction I tried to move in here.
 
 The only versions of the Solaris assembler I have access to only support 
 v8plusX according to the man page.  Has that changed recently?

For the older stuff I mean doing something like -m32 -xarch=v9X

 
  * config/sparc/sparc.opt (mvis4): New option.
  * config/sparc/sparc-c.c (sparc_target_macros): When TARGET_VIS4,
  define __VIS__ to 0x400.
 
 What's the relationship between VIS4 and SPARC-T4 exactly?  The above manual 
 only speaks of VIS3 as far as I can see.  And the CBcond instructions are not 
 marked as belonging to VIS (3 or 4), so using -mvis4 for them seems strange.
 Why not make them depend on -mcpu=niagara4 instead?

The current assembler in Solaris Studio (called 'fbe') calls this
stuff sparc4 which I guess means SPARC-T4 and later.

I'm just calling it VIS4 in GCC so that we can export intrinsics of,
for example, the cryptographic instructions at some point using the
__VIS__ version CPP tests.


[patch] PR54919 - fix variable expansion in RTL loop unrolling

2012-10-13 Thread Steven Bosscher
Hello,

Today appears to be RTL loop optimizer patch day, because here's
another patch...

The problem here is that variable expansion does not update REG_EQUAL
notes when it performs replacement of the renamed register.

I fixed this by using validate_replace_rtx_group(). There is already
code in analyze_insn_to_expand_var() to make sure that the
to-be-replaced register is only used to accumulate into, so I think
that using validate_replace_rtx_group is safe. Could use a 2nd pair of
eyes to make sure, though.

Tested with a bootstrapped compiler. Test coverage isn't great,
because variable expansion is not enabled by default.

OK for trunk?

Ciao!
Steven


pr54919.diff
Description: Binary data


Re: PR fortran/51727: make module files reproducible, question on C++ in gcc

2012-10-13 Thread Joseph S. Myers
On Sat, 13 Oct 2012, Diego Novillo wrote:

 On 2012-10-13 14:04 , Tobias Schlüter wrote:
  On 2012-10-13 20:00, Diego Novillo wrote:
   On 2012-10-13 09:26 , Tobias Schlüter wrote:
first a question also to non-gfortraners: if I want to use std::map,
where do I #include map?  In system.h?
   
   No.  Ada includes system.h in pure C code.  Why not include it exactly
   where you need it?
  
  Ok, I was just wondering if there's a rule because a quick grep revealed
  no non-header source file that includes system headers.
 
 Joseph or others may have reason to create a system.hxx file or some such, but
 in principle I don't see why we shouldn't include std library headers as we
 need them.
 
 Joseph?

The poisoning of various standard library functions that should not be 
used directly in GCC can be problematic if system headers, using those 
functions, are included after that poisoning.  Thus in general it's better 
for system header includes to go in system.h, before the poisoning.

In addition, where there is some autoconf configuration about whether to 
include a header, it's generally preferred to keep down the number of 
places with such a conditional on host features (encapsulating whatever 
features the header provides in some way so that other code can use them 
unconditionally).

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: encoding all aliases options in .opt files

2012-10-13 Thread Joseph S. Myers
On Sat, 13 Oct 2012, Manuel L?pez-Ib??ez wrote:

 OK. The attached patch implements this. Does the approach look ok? I
 will write changelog and more comments if it seems reasonable. One

Without the comments explaining the semantics of the new functions and 
their parameters, I'm not going to attempt to reverse-engineer what the 
approach in question is intended to be and so whether it's reasonable or 
not.

 2) While fixing this, I was thinking: Why the difference between
 Joined and Separate? Why not make every function that takes an
 argument work with and without '=' and with separate argument. It
 seems we could remove a lot of options this way. What I am proposing
 is, instead of:

I see no significant advantage to users in having all these different ways 
to pass options, and multiple disadvantages:

* If you add yet more ways of passing options, then any future overhaul of 
option handling needs to preserve those in addition to all the others.  
The fewer ways there are, the fewer constraints on the implementation.

* Options like this

 aux-infoFILE /* we could accept this to be compatible with some
 options like -B */

are liable to be a pain for anyone, seeing such an option, to work out 
what is the option name that they might search for to get more 
information, and what is the argument.

* In general, if all users are using an option in the same form it's 
easier for them to tell that another person really is using the same 
option rather than something that might be different from the option they 
are used to.

* This also may not work so well with JoinedOrMissing options.

 Wstrict-aliasing2

Shades of the old -gdwarf2 that once meant DWARF 1, debug level 2, not 
to be confused with -gdwarf-2.  I don't think such cryptic forms should be 
encouraged.

 Wstrict-aliasing 2

Ambiguous with the existing -Wstrict-aliasing option followed by a linker 
input file called 2.

-- 
Joseph S. Myers
jos...@codesourcery.com

[patch][wwwdocs] gcc 4.8 changes - mention scalability improvements

2012-10-13 Thread Steven Bosscher
Hello,

This patch adds a short notice about some speed-ups in GCC 4.8 for
extremely large functions (coming from the work done on PR54146 by
several people).
OK for the wwwdocs?

Ciao!
Steven



Index: htdocs/gcc-4.8/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
retrieving revision 1.44
diff -u -r1.44 changes.html
--- htdocs/gcc-4.8/changes.html 9 Oct 2012 18:44:55 -   1.44
+++ htdocs/gcc-4.8/changes.html 13 Oct 2012 22:45:59 -
@@ -65,10 +65,17 @@
   level, and it makes PRE more aggressive.
 /li
 liThe struct reorg and matrix reorg optimizations (command-line
-options code-fipa-struct-reorg/code and
-code-fipa-matrix-reorg/code) have been removed.  They did not
-work correctly nor with link-time optimization (LTO), hence were only
-applicable to programs consisting of a single translation unit./li
+ options code-fipa-struct-reorg/code and
+ code-fipa-matrix-reorg/code) have been removed.  They did not
+ work correctly nor with link-time optimization (LTO), hence were only
+ applicable to programs consisting of a single translation unit.
+/li
+liSeveral scalability bottle-necks have been removed from GCC's
+ optimization passes.  Compilation of extremely large functions,
+ e.g. due to the use of the codeflatten/code attribute in the
+ Eigen C++ linear algebra templates library, is significantly
+ faster than previous releases of GCC.
+/li
   /ul


Re: Propagate profile counts during switch expansion

2012-10-13 Thread Easwaran Raman
Ping.


On Mon, Oct 8, 2012 at 5:46 PM, Easwaran Raman era...@google.com wrote:
 I have attached a revised patch. The updated ChangeLog is given below
 and I have responded to your comments inline:

 2012-10-08   Easwaran Raman  era...@google.com
 * optabs.c (emit_cmp_and_jump_insn_1): Add a new parameter to
 specificy the probability of taking the jump.
 (emit_cmp_and_jump_insns): Likewise.
 (expand_compare_and_swap_loop): Make the jump predicted not taken.
 * dojump.c (do_compare_rtx_and_jump): Remove the code attaching
 REG_BR_PROB note and pass probability to emit_cmp_and_jump_insns.
 * cfgbuild.c (compute_outgoing_frequencies): Do not guess outgoing
 probabilities for branches with more than two successors.
 * expr.c (emit_block_move_via_loop): Predict the loop backedge loop
 to be highly taken.
 (try_casesi): Pass the probability of jumping to the default label.
 (try_tablejump): Likewise.
 (do_tablejump): Likewise.
 * expr.h (try_tablejump): Add a new parameter.
 (try_casesi): Likewise.
 (emit_cmp_and_jump_insns): Add probability as default parameter with a
 default value of -1.
 * except.c (sjlj_emit_function_enter): Pass probability to
 emit_cmp_and_jump_insns.
 * stmt.c (case_node): Add new fields PROB and SUBTREE_PROB.
 (do_jump_if_equal): Pass probability for REG_BR_PROB note.
 (add_case_node): Pass estimated probability of jumping to the case
 label.
 (emit_case_decision_tree): Pass default_prob to emit_case_nodes.
 (get_outgoing_edge_probs): New function.
 (conditional_probability): Likewise.
 (reset_out_edges_aux): Likewise.
 (compute_cases_per_edge): Likewise.
 (emit_case_dispatch_table): Update probabilities of edges coming out
 of the switch statement.
 (expand_case): Compute and propagate default edge probability to
 emit_case_dispatch_table.
 (expand_sjlj_dispatch_table): Update calls to add_case_node and
 emit_case_dispatch_table.
 (balance_case_nodes): Update subtree_prob values.
 (emit_case_nodes): Compute edge probabilities and add pass them to
 emit_cmp_and_jump_insns.

 gcc/testsuite/ChangeLog:
 2012-10-02   Easwaran Raman  era...@google.com
 * gcc.dg/tree-prof/switch-case-1.c: New test case.
 * gcc.dg/tree-prof/switch-case-2.c: New test case.




 On Thu, Oct 4, 2012 at 6:19 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Hi,
  This patch propagates the profile counts during RTL expansion. In
 many cases, there is no way to determine the exact count of an edge
 generated during the expansion. So this patch uses some simple
 heuristics to estimate the edge counts but ensures that the counts of
 the basic blocks corresponding to the cases are (nearly the) same as
 at the gimple level.

 Bootstrapped and profile-bootstrapped on an x86_64/linux machine. OK for 
 trunk?
 Index: gcc/expr.c
 ===
 --- gcc/expr.c (revision 191879)
 +++ gcc/expr.c (working copy)
 @@ -154,7 +154,7 @@ static rtx do_store_flag (sepops, rtx, enum machin
  #ifdef PUSH_ROUNDING
  static void emit_single_push_insn (enum machine_mode, rtx, tree);
  #endif
 -static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx);
 +static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int);
  static rtx const_vector_from_tree (tree);
  static void write_complex_part (rtx, rtx, bool);

 @@ -10894,7 +10894,7 @@ try_casesi (tree index_type, tree index_expr, tree

  static void
  do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx 
 table_label,
 -  rtx default_label)
 +  rtx default_label, int default_probability)

 Please document default_probability.
 Done.

  {
rtx temp, vector;

 @@ -10910,9 +10910,17 @@ do_tablejump (rtx index, enum machine_mode mode, r
   the maximum value of the range.  */

if (default_label)
 -emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
 - default_label);
 +{
 +  emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
 +   default_label);
 +  if (default_probability != -1)
 +{
 +  rtx jump_insn = get_last_insn();
 +  add_reg_note (jump_insn, REG_BR_PROB, GEN_INT 
 (default_probability));
 +}
 +}

 dojump already does this kind of logic, but it is bit more cureful:
   emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp,
if_true_label);
   if (prob != -1  profile_status != PROFILE_ABSENT)
 {
   for (last = NEXT_INSN (last);
last  NEXT_INSN (last);
last = NEXT_INSN (last))
 if (JUMP_P (last))
   break;
   if (last
JUMP_P (last)
! NEXT_INSN (last)
any_condjump_p (last))
 {
   gcc_assert (!find_reg_note (last, REG_BR_PROB, 0));
   add_reg_note (last, REG_BR_PROB, GEN_INT (prob));
 }
 }

 What about making emit_cmp_and_jump_insns taking the probability argument
 and moving the code above 

[C++ Patch] PR 17805

2012-10-13 Thread Paolo Carlini

Hi,

back in 2005 Alexandre worked on this issue up to the point that Mark 
approved a patch conditional to a couple of straightforward changes (see 
audit trail). Then nothing happened ;)


Today I resurrected the patch, implemented the requests and in the 
process noticed that it wasn't really handling the single argument case, 
thus I also extended a bit the testcase.


Tested x86_64-linux. (Finally) Ok?

Thanks,
Paolo.

///
/cp
2012-10-14  Alexandre Oliva  aol...@redhat.com
Paolo Carlini  paolo.carl...@oracle.com

PR c++/17805
* call.c (build_new_op): Filter out operator functions that don't
satisfy enum-conversion match requirements.

/testsuite
2012-10-14  Alexandre Oliva  aol...@redhat.com
Paolo Carlini  paolo.carl...@oracle.com

PR c++/17805
* g++.dg/overload/operator6.C: New.
Index: cp/call.c
===
--- cp/call.c   (revision 192425)
+++ cp/call.c   (working copy)
@@ -5043,6 +5043,11 @@ build_new_op_1 (location_t loc, enum tree_code cod
  NULL_TREE, arglist, NULL_TREE,
  NULL_TREE, false, NULL_TREE, NULL_TREE,
  flags, candidates, complain);
+
+  args[0] = arg1;
+  args[1] = arg2;
+  args[2] = NULL_TREE;
+
   /* Add class-member operators to the candidate set.  */
   if (CLASS_TYPE_P (TREE_TYPE (arg1)))
 {
@@ -5062,11 +5067,50 @@ build_new_op_1 (location_t loc, enum tree_code cod
BASELINK_ACCESS_BINFO (fns),
flags, candidates, complain);
 }
+  /* Per 13.3.1.2/3, 2nd bullet, if no operand has a class type, then
+ only non-member functions that have type T1 or reference to
+ cv-qualified-opt T1 for the first argument, if the first argument
+ has an enumeration type, or T2 or reference to cv-qualified-opt
+ T2 for the second argument, if the the second argument has an
+ enumeration type.  Filter out those that don't match.  */
+  else if (! arg2 || ! CLASS_TYPE_P (TREE_TYPE (arg2)))
+{
+  struct z_candidate **candp, **next;
 
-  args[0] = arg1;
-  args[1] = arg2;
-  args[2] = NULL_TREE;
+  for (candp = candidates; *candp; candp = next)
+   {
+ tree parmlist, parmtype;
+ int i, nargs = (arg2 ? 2 : 1);
 
+ cand = *candp;
+ next = cand-next;
+
+ parmlist = TYPE_ARG_TYPES (TREE_TYPE (cand-fn));
+
+ for (i = 0; i  nargs; ++i)
+   {
+ parmtype = TREE_VALUE (parmlist);
+
+ if (TREE_CODE (parmtype) == REFERENCE_TYPE)
+   parmtype = TREE_TYPE (parmtype);
+ if (TREE_CODE (TREE_TYPE (args[i])) == ENUMERAL_TYPE
+  (same_type_ignoring_top_level_qualifiers_p
+ (TREE_TYPE (args[i]), parmtype)))
+   break;
+
+ parmlist = TREE_CHAIN (parmlist);
+   }
+
+ /* No argument has an appropriate type, so remove this
+candidate function from the list.  */
+ if (i == nargs)
+   {
+ *candp = cand-next;
+ next = candp;
+   }
+   }
+}
+
   add_builtin_candidates (candidates, code, code2, fnname, args,
  flags, complain);
 
Index: testsuite/g++.dg/overload/operator6.C
===
--- testsuite/g++.dg/overload/operator6.C   (revision 0)
+++ testsuite/g++.dg/overload/operator6.C   (working copy)
@@ -0,0 +1,27 @@
+// PR c++/17805
+
+// Per 13.3.1.2/3 bullet 2, an operator function is not a candidate
+// for overload resolution if neither argument is of class type and
+// neither enumerator-typed argument gets an exact match, with or
+// without reference binding, for the corresponding parameter.
+
+struct A
+{
+  A(int);
+  A(const char*);
+};
+
+bool operator==(const A, const A);
+const A operator*(const A);
+
+enum E { e };
+
+bool b1 = (e == ); // { dg-error no match }
+
+bool b2 = (A(1) == );
+
+bool b3 = (e == A(1));
+
+const A a1 = *e;// { dg-error no match }
+
+const A a2 = *A(1);