date:20131030

Re: [PATCH] Keep REG_INC note in subreg2 pass

2013-10-30 Thread Zhenqiang Chen

On 30 October 2013 02:47, Jeff Law l...@redhat.com wrote:
 On 10/24/13 02:20, Zhenqiang Chen wrote:

 Hi,

 REG_INC note is lost in subreg2 pass when resolve_simple_move, which
 might lead to wrong dependence for ira. e.g. In function
 validate_equiv_mem of ira.c, it checks REG_INC note:

   for (note = REG_NOTES (insn); note; note = XEXP (note, 1))
  if ((REG_NOTE_KIND (note) == REG_INC
   || REG_NOTE_KIND (note) == REG_DEAD)
   REG_P (XEXP (note, 0))
   reg_overlap_mentioned_p (XEXP (note, 0), memref))
return 0;

 Without REG_INC note, validate_equiv_mem will return a wrong result.

 Referhttps://bugs.launchpad.net/gcc-linaro/+bug/1243022  for more

 detail about a real case in kernel.

 Bootstrap and no make check regression on X86-64 and ARM.

 Is it OK for trunk and 4.8?

 Thanks!
 -Zhenqiang

 ChangeLog:
 2013-10-24  Zhenqiang Chenzhenqiang.c...@linaro.org

  * lower-subreg.c (resolve_simple_move): Copy REG_INC note.

 testsuite/ChangeLog:
 2013-10-24  Zhenqiang Chenzhenqiang.c...@linaro.org

  * gcc.target/arm/lp1243022.c: New test.

 This clearly handles adding a note when the destination is a MEM with a side
 effect.  What about cases where the side effect is associated with a load
 from memory rather than a store to memory?

Yes. We should handle load from memory.



 lp1243022.patch


 diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c
 index 57b4b3c..e710fa5 100644
 --- a/gcc/lower-subreg.c
 +++ b/gcc/lower-subreg.c
 @@ -1056,6 +1056,22 @@ resolve_simple_move (rtx set, rtx insn)
 mdest = simplify_gen_subreg (orig_mode, dest, GET_MODE (dest), 0);
 minsn = emit_move_insn (real_dest, mdest);

 +#ifdef AUTO_INC_DEC
 +  /* Copy the REG_INC notes.  */
 +  if (MEM_P (real_dest)  !(resolve_reg_p (real_dest)
 +|| resolve_subreg_p (real_dest)))
 +   {
 + rtx note = find_reg_note (insn, REG_INC, NULL_RTX);
 + if (note)
 +   {
 + if (!REG_NOTES (minsn))
 +   REG_NOTES (minsn) = note;
 + else
 +   add_reg_note (minsn, REG_INC, note);
 +   }
 +   }
 +#endif

 If MINSN does not have any notes, then this results in MINSN and INSN
 sharing the note.  Note carefully that notes are chained (see implementation
 of add_reg_note).  Thus the sharing would result in MINSN and INSN actually
 sharing a chain of notes.  I'm pretty sure that's not what you intended.  I
 think you need to always use add_reg_note.

Yes. I should use add_reg_note.

Here is the updated patch:

diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c
index ebf364f..16dfa62 100644
--- a/gcc/lower-subreg.c
+++ b/gcc/lower-subreg.c
@@ -967,7 +967,20 @@ resolve_simple_move (rtx set, rtx insn)
   rtx reg;

   reg = gen_reg_rtx (orig_mode);
+
+#ifdef AUTO_INC_DEC
+  {
+   rtx move = emit_move_insn (reg, src);
+   if (MEM_P (src))
+ {
+   rtx note = find_reg_note (insn, REG_INC, NULL_RTX);
+   if (note)
+ add_reg_note (move, REG_INC, XEXP (note, 0));
+ }
+  }
+#else
   emit_move_insn (reg, src);
+#endif
   src = reg;
 }

@@ -1057,6 +1070,16 @@ resolve_simple_move (rtx set, rtx insn)
mdest = simplify_gen_subreg (orig_mode, dest, GET_MODE (dest), 0);
   minsn = emit_move_insn (real_dest, mdest);

+#ifdef AUTO_INC_DEC
+  if (MEM_P (real_dest)  !(resolve_reg_p (real_dest)
+|| resolve_subreg_p (real_dest)))
+{
+  rtx note = find_reg_note (insn, REG_INC, NULL_RTX);
+  if (note)
+   add_reg_note (minsn, REG_INC, XEXP (note, 0));
+}
+#endif
+
   smove = single_set (minsn);
   gcc_assert (smove != NULL_RTX);

Re: libsanitizer merge from upstream r191666

2013-10-30 Thread Jakub Jelinek

On Tue, Oct 29, 2013 at 05:15:24PM -0700, Konstantin Serebryany wrote:
 Actually, I guessed the flags:

You don't have to guess them, if you look into
testsuite/g++/g++.log (or testsuite/gcc/gcc.log etc.), you can find them
there.

  % ../gcc-inst/bin/g++ -g -fsanitize=address -static-libasan -O2 -flto
 -fno-use-linker-plugin -flto-partition=none
 ../gcc/gcc/testsuite/c-c++-common/asan/stack-overflow-1.c; ./a.out
 21
 
 /tmp/ccgSw6NI.lto.o: In function `main':
 ../gcc/gcc/testsuite/c-c++-common/asan/stack-overflow-1.c:13:
 undefined reference to `.LASANPC0.2585'
 collect2: error: ld returned 1 exit status
 
 Looks like this patch is not friendly to -flto

I guess if you do:
...
-  tree str_cst;
+  tree str_cst, decl, id;
...
+  ASM_GENERATE_INTERNAL_LABEL (buf, LASANPC, current_function_funcdef_no);
+  id = get_identifier (buf);
+  decl = build_decl (DECL_SOURCE_LOCATION (current_function_decl),
+VAR_DECL, id, char_type_node);
+  SET_DECL_ASSEMBLER_NAME (decl, id);
+  TREE_ADDRESSABLE (decl) = 1;
...

it might work even for -flto.  The problem with -flto is that the default
set_decl_assembler_name langhook for -flto appends dot and some number to
the non-exported names, which is undesirable here, because we already make
sure those first numbers are unique for the whole compilation unit.

Jakub

Re: [PATCH] Invalid unpoisoning of stack redzones on ARM

2013-10-30 Thread Yury Gribov


 so in the end I guess I have nothing against the original patch
 (with whatever form of the copy to reg call is desirable).

So ok to commit?

-Y

Re: [PATCH] Invalid unpoisoning of stack redzones on ARM

2013-10-30 Thread Jakub Jelinek

On Wed, Oct 30, 2013 at 11:32:14AM +0400, Yury Gribov wrote:
  so in the end I guess I have nothing against the original patch
  (with whatever form of the copy to reg call is desirable).
 
 So ok to commit?

Ok with the change suggested by Richard, I think it was:
  addr = copy_to_mode_reg (Pmode, XEXP (shadow_mem, 0));

Jakub

Re: [Patch, C, C++] Accept GCC ivdep for 'do' and 'while', and for C++11's range-based loops

2013-10-30 Thread Tobias Burnus


Jason Merrill wrote:

On 10/28/2013 05:48 PM, Tobias Burnus wrote:

   DEFTREECODE (RANGE_FOR_STMT, range_for_stmt, tcc_statement, 4)
has now a 5th operator (RANGE_FOR_IVDEP), which has the value
boolean_true_node - or NULL_TREE (via begin_range_for_stmt).


You don't need to add an operand, you can use one of the 
TREE_LANG_FLAGs for a boolean flag.


Yes, of course. I somehow thought that the TREE_LANG_FLAG had to be 
attached to RANGE_FOR_EXPR (because the annotate expression will be 
placed around the condition). Trying to do so, I had problems to find a 
free TREE_LANG_FLAG.


But it makes much more sense to add the flag to RANGE_FOR_STMT; in that 
case, one can freely choose a flag.


Build and regtested on x86-64-gnu-linux.
OK?

Tobias
2013-10-30  Tobias Burnus  bur...@net-b.de

gcc/cp/
	PR other/33426
	* cp-tree.h (RANGE_FOR_IVDEP): Define.
	(cp_convert_range_for, finish_while_stmt_cond, finish_do_stmt,
	finish_for_cond): Take 'bool ivdep' parameter.
	* cp-array-notation.c (create_an_loop): Update call.
	* init.c (build_vec_init): Ditto.
	* pt.c (tsubst_expr): Ditto.
	* parser.c (cp_parser_iteration_statement, cp_parser_for,
	cp_parser_range_for, cp_convert_range_for): Update calls.
	(cp_parser_pragma): Accept GCC ivdep for 'while' and 'do'.
	* semantics.c (finish_while_stmt_cond, finish_do_stmt,
	finish_for_cond): Optionally build ivdep annotation.

gcc/testsuite/
	PR other/33426
	* g++.dg/vect/pr33426-ivdep-2.cc: New.
	* g++.dg/vect/pr33426-ivdep-3.cc: New.
	* g++.dg/vect/pr33426-ivdep-4.cc: New.

gcc/
	PR other/33426
	* gcc/tree-cfg.c (replace_loop_annotate): Replace warning by
	warning_at.

diff --git a/gcc/cp/cp-array-notation.c b/gcc/cp/cp-array-notation.c
index c700f58..e1fb0ee 100644
--- a/gcc/cp/cp-array-notation.c
+++ b/gcc/cp/cp-array-notation.c
@@ -71,7 +71,7 @@ create_an_loop (tree init, tree cond, tree incr, tree body)
   finish_expr_stmt (init);
   for_stmt = begin_for_stmt (NULL_TREE, NULL_TREE);
   finish_for_init_stmt (for_stmt);
-  finish_for_cond (cond, for_stmt);
+  finish_for_cond (cond, for_stmt, false);
   finish_for_expr (incr, for_stmt);
   finish_expr_stmt (body);
   finish_for_stmt (for_stmt);
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 507b389..fd79adb 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -116,6 +116,7 @@ c-common.h, not after.
6: IDENTIFIER_REPO_CHOSEN (in IDENTIFIER_NODE)
   DECL_CONSTRUCTION_VTABLE_P (in VAR_DECL)
   TYPE_MARKED_P (in _TYPE)
+  RANGE_FOR_IVDEP (in RANGE_FOR_STMT)
 
Usage of TYPE_LANG_FLAG_?:
0: TYPE_DEPENDENT_P
@@ -4088,6 +4089,7 @@ more_aggr_init_expr_args_p (const aggr_init_expr_arg_iterator *iter)
 #define RANGE_FOR_EXPR(NODE)	TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 1)
 #define RANGE_FOR_BODY(NODE)	TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 2)
 #define RANGE_FOR_SCOPE(NODE)	TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 3)
+#define RANGE_FOR_IVDEP(NODE)	TREE_LANG_FLAG_6 (RANGE_FOR_STMT_CHECK (NODE))
 
 #define SWITCH_STMT_COND(NODE)	TREE_OPERAND (SWITCH_STMT_CHECK (NODE), 0)
 #define SWITCH_STMT_BODY(NODE)	TREE_OPERAND (SWITCH_STMT_CHECK (NODE), 1)
@@ -4321,7 +4323,7 @@ extern int comparing_specializations;
sizeof can be nested.  */
 
 extern int cp_unevaluated_operand;
-extern tree cp_convert_range_for (tree, tree, tree);
+extern tree cp_convert_range_for (tree, tree, tree, bool);
 extern bool parsing_nsdmi (void);
 
 /* in pt.c  */
@@ -5671,16 +5673,16 @@ extern void begin_else_clause			(tree);
 extern void finish_else_clause			(tree);
 extern void finish_if_stmt			(tree);
 extern tree begin_while_stmt			(void);
-extern void finish_while_stmt_cond		(tree, tree);
+extern void finish_while_stmt_cond		(tree, tree, bool);
 extern void finish_while_stmt			(tree);
 extern tree begin_do_stmt			(void);
 extern void finish_do_body			(tree);
-extern void finish_do_stmt			(tree, tree);
+extern void finish_do_stmt			(tree, tree, bool);
 extern tree finish_return_stmt			(tree);
 extern tree begin_for_scope			(tree *);
 extern tree begin_for_stmt			(tree, tree);
 extern void finish_for_init_stmt		(tree);
-extern void finish_for_cond			(tree, tree);
+extern void finish_for_cond			(tree, tree, bool);
 extern void finish_for_expr			(tree, tree);
 extern void finish_for_stmt			(tree);
 extern tree begin_range_for_stmt		(tree, tree);
diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 78ea986..bfd9152 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -3667,7 +3667,7 @@ build_vec_init (tree base, tree maxindex, tree init,
   finish_for_init_stmt (for_stmt);
   finish_for_cond (build2 (NE_EXPR, boolean_type_node, iterator,
 			   build_int_cst (TREE_TYPE (iterator), -1)),
-		   for_stmt);
+		   for_stmt, false);
   elt_init = cp_build_unary_op (PREDECREMENT_EXPR, iterator, 0,
 complain);
   if (elt_init == error_mark_node)
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 8deffc3..9e28ced 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -1978,7 +1978,7 @@ static tree

Re: [PATCH, MPX, 2/X] Pointers Checker [2/25] Builtins

2013-10-30 Thread Richard Biener

On Tue, Oct 29, 2013 at 8:48 PM, Ilya Enkovich enkovich@gmail.com wrote:
 2013/10/29 Jeff Law l...@redhat.com:
 On 10/29/13 07:52, Ilya Enkovich wrote:


 Yeah.  I'm working on it right now.  I've fixed known issues and now
 I'm looking for others.  Meanwhile here is a new patch version with
 required renames and without LTO restriction.

 I can't help but but curious, what turned out to be the root cause of those
 LTO problems?

 There were three different problems fixed.

 The first one was SSA_NAME in DECL_INITIAL of local var.
 Instrumentation used it to initialize var with input arg value
 (default SSA_NAME of PARM_DECL was used). LTO cannot handle it because
 when it reads symbols, it does not have SSA_NAMEs. It caused ICE.

Obviously putting things in trees is bad.

 Another problem was in LTO front-end. I did not realize it has own
 langhooks. It caused reset of flag_check_pointer_bounds in
 process_options by my own code.

 And the last one was in initialization of checker structures. Some
 structures were initialized during checker pass and then used in other
 passes (e.g. expand). With LTO checker pass is not executed after LTO
 front-end and following passes could work with uninitialized checker
 structures.

Looks badly designed then - any function related information should
be hooked off struct function and streamed by LTO.  Or the info
should be present in the IL.

Richard.




 2013-10-29  Ilya Enkovich  ilya.enkov...@intel.com

 * builtin-types.def (BT_FN_VOID_CONST_PTR): New.
 (BT_FN_PTR_CONST_PTR): New.
 (BT_FN_CONST_PTR_CONST_PTR): New.
 (BT_FN_PTR_CONST_PTR_SIZE): New.
 (BT_FN_PTR_CONST_PTR_CONST_PTR): New.
 (BT_FN_VOID_PTRPTR_CONST_PTR): New.
 (BT_FN_VOID_CONST_PTR_SIZE): New.
 (BT_FN_PTR_CONST_PTR_CONST_PTR_SIZE): New.
 * chkp-builtins.def: New.
 * builtins.def: include chkp-builtins.def.
 (DEF_CHKP_BUILTIN): New.
 * builtins.c (expand_builtin): Support
 BUILT_IN_CHKP_INIT_PTR_BOUNDS,
 BUILT_IN_CHKP_NULL_PTR_BOUNDS, BUILT_IN_CHKP_COPY_PTR_BOUNDS,
 BUILT_IN_CHKP_CHECK_PTR_LBOUNDS, BUILT_IN_CHKP_CHECK_PTR_UBOUNDS,
 BUILT_IN_CHKP_CHECK_PTR_BOUNDS, BUILT_IN_CHKP_SET_PTR_BOUNDS,
 BUILT_IN_CHKP_NARROW_PTR_BOUNDS, BUILT_IN_CHKP_STORE_PTR_BOUNDS,
 BUILT_IN_CHKP_GET_PTR_LBOUND, BUILT_IN_CHKP_GET_PTR_UBOUND,
 BUILT_IN_CHKP_BNDMK, BUILT_IN_CHKP_BNDSTX, BUILT_IN_CHKP_BNDCL,
 BUILT_IN_CHKP_BNDCU, BUILT_IN_CHKP_BNDLDX, BUILT_IN_CHKP_BNDRET,
 BUILT_IN_CHKP_INTERSECT, BUILT_IN_CHKP_ARG_BND,
 BUILT_IN_CHKP_NARROW,
 BUILT_IN_CHKP_EXTRACT_LOWER, BUILT_IN_CHKP_EXTRACT_UPPER.
 * common.opt (fcheck-pointer-bounds): New.
 * toplev.c (process_options): Check Pointer Bounds Checker is
 supported.
 * doc/extend.texi: Document Pointer Bounds Checker built-in
 functions.

 This is fine.  Please install.

 Thanks!

 Ilya

 Thanks,
 jeff

Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces

2013-10-30 Thread Richard Biener

On Tue, Oct 29, 2013 at 10:53 PM, Ilya Enkovich enkovich@gmail.com wrote:
 On 29 Oct 13:39, Jeff Law wrote:
 On 10/24/13 08:43, Ilya Enkovich wrote:
 
 +/* Return the number of arguments used by call statement GS.  */
 +
 +static inline unsigned
 +gimple_call_num_nobnd_args (const_gimple gs)
 +{
 +  unsigned num_args = gimple_call_num_args (gs);
 +  unsigned res = num_args;
 +  for (unsigned n = 0; n  num_args; n++)
 +if (POINTER_BOUNDS_P (gimple_call_arg (gs, n)))
 +  res--;
 +  return res;
 +}
 number of arguments seems wrong.  Aren't you counting the number
 of arguments without bounds?

 Damned copy-paste.


 +
 +/* Nonzero if this type supposes bounds existence.  */
 +#define BOUNDED_TYPE_P(type) \
 +  (TREE_CODE (type) == POINTER_TYPE \
 +|| TREE_CODE (type) == REFERENCE_TYPE)
 So how is this different than POINTER_TYPE_P?


 If you really want BOUNDED_TYPE_P, perhaps define it in terms of
 POINTER_TYPE_P?

 With that and the comment fix, this is fine.

 jeff

 I'd like to keep this macro.  Currently it is equal to POINTER_TYPE_P but 
 semantics is a little different.

 Below is a fixed version to be committed.

 Thanks!
 Ilya
 --

 diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
 index e4b0f81..248dfea 100644
 --- a/gcc/gimple-pretty-print.c
 +++ b/gcc/gimple-pretty-print.c
 @@ -539,11 +539,12 @@ dump_gimple_assign (pretty_printer *buffer, gimple gs, 
 int spc, int flags)
  static void
  dump_gimple_return (pretty_printer *buffer, gimple gs, int spc, int flags)
  {
 -  tree t;
 +  tree t, t2;

t = gimple_return_retval (gs);
 +  t2 = gimple_return_retbnd (gs);
if (flags  TDF_RAW)
 -dump_gimple_fmt (buffer, spc, flags, %G %T, gs, t);
 +dump_gimple_fmt (buffer, spc, flags, %G %T %T, gs, t, t2);
else
  {
pp_string (buffer, return);
 @@ -552,6 +553,11 @@ dump_gimple_return (pretty_printer *buffer, gimple gs, 
 int spc, int flags)
   pp_space (buffer);
   dump_generic_node (buffer, t, spc, flags, false);
 }
 +  if (t2)
 +   {
 + pp_string (buffer, , );
 + dump_generic_node (buffer, t2, spc, flags, false);
 +   }
pp_semicolon (buffer);
  }
  }
 diff --git a/gcc/gimple.c b/gcc/gimple.c
 index 3ddceb9..20f6010 100644
 --- a/gcc/gimple.c
 +++ b/gcc/gimple.c
 @@ -174,7 +174,7 @@ gimple_build_with_ops_stat (enum gimple_code code, 
 unsigned subcode,
  gimple
  gimple_build_return (tree retval)
  {
 -  gimple s = gimple_build_with_ops (GIMPLE_RETURN, ERROR_MARK, 1);
 +  gimple s = gimple_build_with_ops (GIMPLE_RETURN, ERROR_MARK, 2);

Ick - you enlarge all return statements?  But you don't set the actual value?
So why allocate it with 2 ops in the first place??

[Seems I completely missed that MPX changes gimple and the design
document that was posted somewhere??]

Bah.

Where is the update to gimple.texi and tree.texi?

Richard.

if (retval)
  gimple_return_set_retval (s, retval);
return s;
 @@ -366,6 +366,26 @@ gimple_build_call_from_tree (tree t)
  }


 +/* Return index of INDEX's non bound argument of the call.  */
 +
 +unsigned
 +gimple_call_get_nobnd_arg_index (const_gimple gs, unsigned index)
 +{
 +  unsigned num_args = gimple_call_num_args (gs);
 +  for (unsigned n = 0; n  num_args; n++)
 +{
 +  if (POINTER_BOUNDS_P (gimple_call_arg (gs, n)))
 +   continue;
 +  else if (index)
 +   index--;
 +  else
 +   return n;
 +}
 +
 +  gcc_unreachable ();
 +}
 +
 +
  /* Extract the operands and code for expression EXPR into *SUBCODE_P,
 *OP1_P, *OP2_P and *OP3_P respectively.  */

 diff --git a/gcc/gimple.h b/gcc/gimple.h
 index fef64cd..c7ce394 100644
 --- a/gcc/gimple.h
 +++ b/gcc/gimple.h
 @@ -919,6 +919,7 @@ extern tree get_initialized_tmp_var (tree, gimple_seq *, 
 gimple_seq *);
  extern tree get_formal_tmp_var (tree, gimple_seq *);
  extern void declare_vars (tree, gimple, bool);
  extern void annotate_all_with_location (gimple_seq, location_t);
 +extern unsigned gimple_call_get_nobnd_arg_index (const_gimple, unsigned);

  /* Validation of GIMPLE expressions.  Note that these predicates only check
 the basic form of the expression, they don't recurse to make sure that
 @@ -2414,6 +2415,32 @@ gimple_call_arg (const_gimple gs, unsigned index)
  }


 +/* Return the number of arguments used by call statement GS
 +   ignoring bound ones.  */
 +
 +static inline unsigned
 +gimple_call_num_nobnd_args (const_gimple gs)
 +{
 +  unsigned num_args = gimple_call_num_args (gs);
 +  unsigned res = num_args;
 +  for (unsigned n = 0; n  num_args; n++)
 +if (POINTER_BOUNDS_P (gimple_call_arg (gs, n)))
 +  res--;
 +  return res;
 +}
 +
 +
 +/* Return INDEX's call argument ignoring bound ones.  */
 +static inline tree
 +gimple_call_nobnd_arg (const_gimple gs, unsigned index)
 +{
 +  /* No bound args may exist if pointers checker is off.  */
 +  if (!flag_check_pointer_bounds)
 +return gimple_call_arg (gs, index);
 +  return

Re: [PATCH] Vectorizing abs(char/short/int) on x86.

2013-10-30 Thread Uros Bizjak

On Tue, Oct 29, 2013 at 6:18 PM, Cong Hou co...@google.com wrote:

 For the define_expand I added as below, the else body is there to
 avoid fall-through transformations to ABS operation in optabs.c.
 Otherwise ABS will be converted to other operations even that we have
 corresponding instructions from SSSE3.

 No, it wont be.

 Fallthrough will generate the pattern that will be matched by the insn
 pattern above, just like you are doing by hand below.


 I think the case is special for abs(). In optabs.c, there is a
 function expand_abs() in which the function expand_abs_nojump() is
 called. This function first tries the expand function defined for the
 target and if it fails it will try max(v, -v) then shift-xor-sub
 method. If I don't generate any instruction for SSSE3, the
 fall-through will be max(v, -v). I have tested it on my machine.

I have tested the usual approach in i386.md, shown exactly by the
patch below (and using your other changes to i386.c):

-- cut here --
Index: sse.md
===
--- sse.md  (revision 204149)
+++ sse.md  (working copy)
@@ -10270,7 +10270,7 @@
(set (attr prefix_rex) (symbol_ref x86_extended_reg_mentioned_p (insn)))
(set_attr mode DI)])

-(define_insn absmode2
+(define_insn *absmode2
   [(set (match_operand:VI124_AVX2_48_AVX512F 0 register_operand =v)
(abs:VI124_AVX2_48_AVX512F
  (match_operand:VI124_AVX2_48_AVX512F 1 nonimmediate_operand vm)))]
@@ -10282,6 +10282,19 @@
(set_attr prefix maybe_vex)
(set_attr mode sseinsnmode)])

+(define_expand absmode2
+  [(set (match_operand:VI124_AVX2_48_AVX512F 0 register_operand)
+   (abs:VI124_AVX2_48_AVX512F
+ (match_operand:VI124_AVX2_48_AVX512F 1 nonimmediate_operand)))]
+  TARGET_SSE2
+{
+  if (!TARGET_SSSE3)
+{
+  ix86_expand_sse2_abs (operands[0], operands[1]);
+  DONE;
+}
+})
+
 (define_insn absmode2
   [(set (match_operand:MMXMODEI 0 register_operand =y)
(abs:MMXMODEI
-- cut here --

using following testcase:

--cut here--
#define N 32

int ca[N];
int cb[N];

void test1 (void)
{
  int i;
  for (i = 0; i  N; ++i)
cb[i] = abs (ca[i]);
}
-- cut here --

Compiling on x86_64-pc-linux-gnu target (inherently -msse2):

~/gcc-build-fast/gcc/cc1 -O2 -ftree-vectorize -dp t.c

.L2:
movdqa  ca(%rax), %xmm0 # 25*movv4si_internal/2 [length = 8]
addq$16, %rax   # 30*adddi_1/1  [length = 4]
movdqa  ca-16(%rax), %xmm1  # 46*movv4si_internal/2
 [length = 8]
psrad   $31, %xmm0  # 26ashrv4si3/1 [length = 5]
pxor%xmm0, %xmm1# 47*xorv4si3/1 [length = 4]
psubd   %xmm0, %xmm1# 28*subv4si3/1 [length = 4]
movaps  %xmm1, cb-16(%rax)  # 29*movv4si_internal/3
 [length = 7]
cmpq$128, %rax  # 32*cmpdi_1/1  [length = 6]
jne .L2 # 33*jcc_1  [length = 2]

~/gcc-build-fast/gcc/cc1 -O2 -ftree-vectorize -mssse3 -dp t.c

.L2:
pabsd   ca(%rax), %xmm0 # 25*absv4si2   [length = 9]
addq$16, %rax   # 27*adddi_1/1  [length = 4]
movaps  %xmm0, cb-16(%rax)  # 26*movv4si_internal/3
 [length = 7]
cmpq$128, %rax  # 29*cmpdi_1/1  [length = 6]
jne .L2 # 30*jcc_1  [length = 2]

As shown above, it worked OK for both with -mssse3 (generating pabsd
insn) and without (generating your V4SI sequence).

Uros.

RE: [PATCH 1/n] Add conditional compare support

2013-10-30 Thread Zhenqiang Chen

 -Original Message-
 From: Richard Henderson [mailto:r...@redhat.com]
 Sent: Monday, October 28, 2013 11:07 PM
 To: Zhenqiang Chen; Richard Earnshaw; 'Richard Biener'
 Cc: GCC Patches
 Subject: Re: [PATCH 1/n] Add conditional compare support

 On 10/28/2013 01:32 AM, Zhenqiang Chen wrote:
  Patch is updated according to your comments. Main changes are:
  * Add two hooks: legitimize_cmp_combination and
  legitimize_ccmp_combination
  * Improve document.

 No, these are not the hooks I proposed.

 You should *not* have a ccmp_optab, because the middle-end has
 absolutely no idea what mode TARGET should be in.

 The hook needs to return an rtx expression appropriate for feeding to
 cbranch et al.  E.g. for arm,

   (ne (reg:CC_Z CC_REGNUM) (const_int 0))

 after having emitted the instruction that sets CC_REGNUM.

 We need to push this to the backend because for ia64 the expression needs
 to be of te form

   (ne (reg:BI new_pseudo) (const_int 0))

Thanks for the clarification.
Patch is updated:
* ccmp_optab is removed.
* Add two hooks gen_ccmp_with_cmp_cmp and gen_ccmp_with_ccmp_cmp for backends 
to generate the conditional compare instructions.

Thanks!
-Zhenqiang

ccmp-hook2.patch
Description: Binary data

Re: [RFA][PATCH] Minor fix to aliasing machinery

2013-10-30 Thread Richard Biener

On Tue, Oct 29, 2013 at 10:36 PM, Jeff Law l...@redhat.com wrote:

 Marc pointed out that the handling of various BUILT_IN_MEM* and
 BUILT_IN_STR* functions in tree-ssa-alias.c probably wasn't working as
 intended because the code wasn't prepared for a common return value from
 ao_ref_base, particularly returns of MEM_REFs.

 This patch fixes the code to handle the trivial case of returning a MEM_REF
 and adds a simple testcase.  There's probably a lot more that could be done
 here.

 Bootstrapped and regression tested on x86_64-unknown-linux-gnu.  Ok for the
 trunk?

 Thanks,
 Jeff

 * tree-ssa-alias.c (stmt_kills_ref_p_1): Handle case where
 ao_ref_base returns a MEM_REF.

 * gcc.dg/tree-ssa/alias-26.c: New test.

 diff --git a/gcc/testsuite/gcc.dg/tree-ssa/alias-26.c
 b/gcc/testsuite/gcc.dg/tree-ssa/alias-26.c
 new file mode 100644
 index 000..b5625b8
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/tree-ssa/alias-26.c
 @@ -0,0 +1,12 @@
 +/* { dg-do compile } */
 +/* { dg-options -O1 -fdump-tree-optimized } */
 +
 +void f (long *p) {
 +  *p = 42;
 +  p[4] = 42;
 +  __builtin_memset (p, 0, 100);
 +}
 +
 +/* { dg-final { scan-tree-dump-not = 42 optimized } } */
 +/* { dg-final { cleanup-tree-dump optimized } } */
 +
 diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
 index 4db83bd..5120e72 100644
 --- a/gcc/tree-ssa-alias.c
 +++ b/gcc/tree-ssa-alias.c
 @@ -2079,6 +2079,7 @@ stmt_kills_ref_p_1 (gimple stmt, ao_ref *ref)
   tree dest = gimple_call_arg (stmt, 0);
   tree len = gimple_call_arg (stmt, 2);
   tree base = NULL_TREE;
 + tree ref_base;
   HOST_WIDE_INT offset = 0;
   if (!host_integerp (len, 0))
 return false;
 @@ -2087,8 +2088,11 @@ stmt_kills_ref_p_1 (gimple stmt, ao_ref *ref)
   offset);
   else if (TREE_CODE (dest) == SSA_NAME)
 base = dest;
 + ref_base = ao_ref_base (ref);
   if (base
 -  base == ao_ref_base (ref))
 +  ((TREE_CODE (ref_base) == MEM_REF
 +   base == TREE_OPERAND (ref_base, 0))

That's not sufficient - ref_base may have an offset, so for correctness
you have to check that integer_zerop (TREE_OPERAND (ref_base, 0)).
But this now looks convoluted and somewhat backward, and still
does not catch all cases (including the def-stmt lookup recently
added to ao_ref_from_ptr_and_size).

Richard.

 + || ref_base == base))
 {
   HOST_WIDE_INT size = TREE_INT_CST_LOW (len);
   if (offset = ref-offset / BITS_PER_UNIT

Re: [RFA][PATCH] Minor fix to aliasing machinery

2013-10-30 Thread Richard Biener

On Tue, Oct 29, 2013 at 11:23 PM, Marc Glisse marc.gli...@inria.fr wrote:
 On Tue, 29 Oct 2013, Jeff Law wrote:

 Marc pointed out that the handling of various BUILT_IN_MEM* and
 BUILT_IN_STR* functions in tree-ssa-alias.c probably wasn't working as
 intended because the code wasn't prepared for a common return value from
 ao_ref_base, particularly returns of MEM_REFs.


 Hmm, ao_ref_base is never a pointer, so I'd say the issue is really with
 trying to use the SSA_NAME directly.

Yes.  Note that the code tries to relate two pointers but one is a
memory-reference (ao_ref_base) and one is either a SSA name pointer
or the result of get_addr_base_and_unit_offset (a memory reference as well).

 This patch fixes the code to handle the trivial case of returning a
 MEM_REF and adds a simple testcase.  There's probably a lot more that could
 be done here.


 Thanks.

 I am not sure we want to keep the variable base that is either a decl/ref
 (from get_addr_base_and_unit_offset) or a pointer (dest). We know which case
 is which, but then forget it by storing both into base. Maybe something like
 this would be more type-safe.

   bool same = false;
   if (TREE_CODE (dest) == ADDR_EXPR)
 same = (ref_base == get_addr_base_and_unit_offset
   (TREE_OPERAND (dest, 0), offset));
   else if (TREE_CODE (dest) == SSA_NAME
 TREE_CODE (ref_base) == MEM_REF)

integer_zerop (TREE_OPERAND (ref_base, 1))

 same = (TREE_OPERAND (ref_base, 0) == dest);
   if (same)
 ...

Btw, get_addr_base_and_unit_offset may also return an offsetted
MEM_REF (from MEM [p_3, 17] for example).  As we are interested in
pointers this could be handled by not requiring a memory reference
but extracting the base address and offset, covering more cases.

 By the way, I think the patch is fine as is, I am only discussing possible
 follow-ups.

Only slightly wrong-codish ;)

Richard.

 (see http://gcc.gnu.org/ml/gcc-patches/2013-10/txto0PQEYpiuz.txt for another
 approach using ao_ref_init_from_ptr_and_size)

 --
 Marc Glisse

[RFC/CFT] auto-wipe dump files [was: Re: [committed] Fix up bb-slp-31.c testcase]

2013-10-30 Thread Bernhard Reutner-Fischer

 Hi!

 I've noticed that this testcase doesn't clean up after itself.
 Fixed thusly, committed as obvious to trunk.

This was nagging me last weekend.. ;)
What about automating this?

Manual part is attached.
The Adjust all callers below is too big to send to the list:
git grep -l -E (cleanup-.*-dump|cleanup-saved-temps) | \
egrep -v (ChangeLog|/lib/) | sed -e s|[^/]*$|| | sort | uniq | \
while read d;
do
  find $d -type f \
-exec sed -i -e /cleanup-[^-]*[-]*dump/d;/cleanup-saved-temps/d {} +
done


Full regstrap on x86_64-unknown-linux-gnu with no regressions with
trunk@204119 for
configure \
--enable-bootstrap \
--with-system-zlib \
--without-included-gettext \
--disable-werror \
--enable-link-mutex \
--enable-nls \
--enable-plugin \
--enable-__cxa_atexit \
--enable-debug \
--enable-checking=yes,rtl \
--enable-gather-detailed-mem-stats \
--enable-multilib \
--enable-multiarch \
--with-linker-hash-style=both \
--with-as=$BINU/as \
--with-ld=$BINU/ld.gold \
--enable-languages=c,c++,fortran,lto,go,objc,obj-c++ \
 make -k check -j4

Ok for trunk?
Comments?

Given the Fix comment delimiter hunks in the manual patch, i'd suggest
to add -Wcomment as default flags where possible to catch these early on
in the future.

gcc/testsuite/ChangeLog

2013-10-12  Bernhard Reutner-Fischer  al...@gcc.gnu.org

* lib/gcc-dg.exp (cleanup-ipa-dump, cleanup-rtl-dump,
cleanup-tree-dump, cleanup-dump): Remove. Adjust all callers.
(schedule-cleanups): New proc.
(gcc-dg-test-1): Call it.
* lib/profopt.exp (profopt-execute): Likewise.
* g++.dg/cdce3.C: Adjust expected line numbers.
* gcc.dg/cdce1.c: Likewise.
* gcc.dg/cdce2.c: Likewise.
* gcc.dg/strlenopt-22.c: Fix comment delimiter.
* gcc.dg/strlenopt-24.c: Likewise.
* gcc.dg/tree-ssa/vrp26.c: Likewise.
* gcc.dg/tree-ssa/vrp28.c: Likewise.
* obj-c++.dg/encode-2.mm: Likewise.

libgomp/ChangeLog

2013-10-12  Bernhard Reutner-Fischer  al...@gcc.gnu.org

* testsuite/libgomp.graphite/bounds.c: Adjust for
cleanup-tree-dump removal.
* testsuite/libgomp.graphite/force-parallel-1.c: Likewise.
* testsuite/libgomp.graphite/force-parallel-2.c: Likewise.
* testsuite/libgomp.graphite/force-parallel-3.c: Likewise.
* testsuite/libgomp.graphite/force-parallel-4.c: Likewise.
* testsuite/libgomp.graphite/force-parallel-5.c: Likewise.
* testsuite/libgomp.graphite/force-parallel-6.c: Likewise.
* testsuite/libgomp.graphite/force-parallel-7.c: Likewise.
* testsuite/libgomp.graphite/force-parallel-8.c: Likewise.
* testsuite/libgomp.graphite/force-parallel-9.c: Likewise.
* testsuite/libgomp.graphite/pr41118.c: Likewise.


gcc/ChangeLog

2013-10-12  Bernhard Reutner-Fischer  al...@gcc.gnu.org

* config/arm/neon-testgen.ml (emit_epilogue): Remove manual call
to cleanup-saved-temps.

gcc/doc/ChangeLog

2013-10-12  Bernhard Reutner-Fischer  al...@gcc.gnu.org

* doc/sourcebuild.texi (Clean up generated test files): Expand
introduction.
(cleanup-ipa-dump, cleanup-rtl-dump, cleanup-tree-dump,
cleanup-saved-temps): Remove.
From dc181880947cbfb3d652c6d9530cea07cf8280d8 Mon Sep 17 00:00:00 2001
From: Bernhard Reutner-Fischer rep.dot@gmail.com
Date: Fri, 18 Oct 2013 21:08:49 +0200
Subject: [PATCH] auto-wipe dump files

---
 gcc/config/arm/neon-testgen.ml|   1 -
 gcc/doc/sourcebuild.texi  |  19 ++--
 gcc/testsuite/g++.dg/cdce3.C  |   5 +-
 gcc/testsuite/gcc.dg/cdce1.c  |   3 +-
 gcc/testsuite/gcc.dg/cdce2.c  |   3 +-
 gcc/testsuite/gcc.dg/strlenopt-22.c   |   3 +-
 gcc/testsuite/gcc.dg/strlenopt-24.c   |   3 +-
 gcc/testsuite/gcc.dg/tree-ssa/vrp26.c |   3 +-
 gcc/testsuite/gcc.dg/tree-ssa/vrp28.c |   3 +-
 gcc/testsuite/lib/dg-pch.exp  |  18 +++-
 gcc/testsuite/lib/gcc-dg.exp  | 160 --
 gcc/testsuite/lib/profopt.exp |   3 +
 gcc/testsuite/obj-c++.dg/encode-2.mm  |   3 +-
 13 files changed, 151 insertions(+), 76 deletions(-)

diff --git a/gcc/config/arm/neon-testgen.ml b/gcc/config/arm/neon-testgen.ml
index 543318b..4734ac0 100644
--- a/gcc/config/arm/neon-testgen.ml
+++ b/gcc/config/arm/neon-testgen.ml
@@ -139,7 +139,6 @@ let emit_epilogue chan features regexps =
  else
()
 );
-Printf.fprintf chan /* { dg-final { cleanup-saved-temps } } */\n
 
 (* Check a list of C types to determine which ones are pointers and which
ones are const.  *)
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 1a70916..7e0ebd9 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2145,13 +2145,17 @@ Check branch and/or call counts, in addition to line counts, in
 
 @subsubsection Clean up generated test files
 
+Usually the test-framework removes files that were generated during
+testing. If a testcase, for example, uses any

Re: Aliasing: look through pointer's def stmt

2013-10-30 Thread Richard Biener

On Wed, Oct 30, 2013 at 1:09 AM, Marc Glisse marc.gli...@inria.fr wrote:
 On Tue, 29 Oct 2013, Richard Biener wrote:

 For the POINTER_PLUS_EXPR offset argument you should use
 int_cst_value () to access it (it will unconditionally sign-extend)
 and use host_integerp (..., 0).  That leaves the overflow possibility
 in place (and you should multiply by BITS_PER_UNIT) which we
 ignore in enough other places similar to this to ignore ...


 Like this? (passes bootstrap+testsuite on x86_64-linux-gnu)

 2013-10-30  Marc Glisse  marc.gli...@inria.fr


 gcc/
 * tree-ssa-alias.c (ao_ref_init_from_ptr_and_size): Look for a
 POINTER_PLUS_EXPR in the defining statement.

 gcc/testsuite/
 * gcc.dg/tree-ssa/alias-24.c: New file.


 --
 Marc Glisse

 Index: gcc/testsuite/gcc.dg/tree-ssa/alias-24.c
 ===
 --- gcc/testsuite/gcc.dg/tree-ssa/alias-24.c(revision 0)
 +++ gcc/testsuite/gcc.dg/tree-ssa/alias-24.c(working copy)
 @@ -0,0 +1,22 @@
 +/* { dg-do compile } */
 +/* { dg-options -O2 -fdump-tree-optimized } */
 +
 +void f (const char *c, int *i)
 +{
 +  *i = 42;
 +  __builtin_memcpy (i + 1, c, sizeof (int));
 +  if (*i != 42) __builtin_abort();
 +}
 +
 +extern void keepit ();
 +void g (const char *c, int *i)
 +{
 +  *i = 33;
 +  __builtin_memcpy (i - 1, c, 3 * sizeof (int));
 +  if (*i != 33) keepit();
 +}
 +
 +/* { dg-final { scan-tree-dump-not abort optimized } } */
 +/* { dg-final { scan-tree-dump keepit optimized } } */
 +/* { dg-final { cleanup-tree-dump optimized } } */
 +

 Property changes on: gcc/testsuite/gcc.dg/tree-ssa/alias-24.c
 ___
 Added: svn:keywords
 ## -0,0 +1 ##
 +Author Date Id Revision URL
 \ No newline at end of property
 Added: svn:eol-style
 ## -0,0 +1 ##
 +native
 \ No newline at end of property
 Index: gcc/tree-ssa-alias.c
 ===
 --- gcc/tree-ssa-alias.c(revision 204188)
 +++ gcc/tree-ssa-alias.c(working copy)
 @@ -567,20 +567,29 @@ void
  ao_ref_init_from_ptr_and_size (ao_ref *ref, tree ptr, tree size)
  {
HOST_WIDE_INT t1, t2;
ref-ref = NULL_TREE;
if (TREE_CODE (ptr) == SSA_NAME)
  {
gimple stmt = SSA_NAME_DEF_STMT (ptr);
if (gimple_assign_single_p (stmt)
gimple_assign_rhs_code (stmt) == ADDR_EXPR)
 ptr = gimple_assign_rhs1 (stmt);
 +  else if (is_gimple_assign (stmt)
 +   gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR
 +   host_integerp (gimple_assign_rhs2 (stmt), 0)
 +   (t1 = int_cst_value (gimple_assign_rhs2 (stmt))) = 0)

No need to restrict this to positive offsets I think.

 +   {
 + ao_ref_init_from_ptr_and_size (ref, gimple_assign_rhs1 (stmt),
 size);

Please don't recurse - forwprop combines series of POINTER_PLUS_EXPRs
and MEM[ptr, offset] so it shouldn't be necessary.

 + ref-offset += 8 * t1;

BITS_PER_UNIT instead of 8.  I'd say just have a 0-initialized
additional_offset var that you unconditionally add ...

 + return;
 +   }
  }

if (TREE_CODE (ptr) == ADDR_EXPR)
  ref-base = get_ref_base_and_extent (TREE_OPERAND (ptr, 0),
  ref-offset, t1, t2);
else
  {
ref-base = build2 (MEM_REF, char_type_node,
   ptr, null_pointer_node);
ref-offset = 0;

... here at the end.

Thanks,
Richard.

RE: [PATCH 1/n] Add conditional compare support

2013-10-30 Thread Zhenqiang Chen

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Hans-Peter Nilsson
 Sent: Tuesday, October 29, 2013 10:38 AM
 To: Zhenqiang Chen
 Cc: Richard Earnshaw; 'Richard Biener'; GCC Patches
 Subject: RE: [PATCH 1/n] Add conditional compare support

 On Tue, 22 Oct 2013, Zhenqiang Chen wrote:
  ChangeLog:
  2013-10-22  Zhenqiang Chen  zhenqiang.c...@linaro.org

  * config/arm/arm.c (arm_fixed_condition_code_regs,
 arm_ccmode_to_code,
  arm_select_dominance_ccmp_mode): New functions.
  (arm_select_dominance_cc_mode_1): New function extracted from
  arm_select_dominance_cc_mode.
  (arm_select_dominance_cc_mode): Call
 arm_select_dominance_cc_mode_1.
  * config/arm/arm.md (ccmp, cbranchcc4, ccmp_and, ccmp_ior,
  ccmp_ior_scc_scc, ccmp_ior_scc_scc_cmp, ccmp_and_scc_scc,
  ccmp_and_scc_scc_cmp): New.
  * config/arm/arm-protos.h (arm_select_dominance_ccmp_mode):
 New.
  * expr.c (ccmp_candidate_p, used_in_cond_stmt_p,
 expand_ccmp_expr_2,
  expand_ccmp_expr_3, expand_ccmp_expr_1, expand_ccmp_expr):
 New.
  (expand_expr_real_1): Handle ccmp.
  * optabs.c: Include gimple.h.
  (expand_ccmp_op): New.
  (get_rtx_code): Handle BIT_AND_EXPR and BIT_IOR_EXPR.
  * optabs.def (ccmp): New.
  * optabs.h (expand_ccmp_op): New.
  * doc/md.texi (ccmp): New index.

 One thing I don't see other people mentioning, is that this patch has just
too
 much code inside #ifdef HAVE_ccmp ... #endif.

 I couldn't actually find the part that *requires* that, i.e.
 code that does something like gen_ccmp (...) but maybe it's there.

In the arm.md, I define (define_expand ccmp  It will be transferred
to function gen_ccmp (...). And it will define HAVE_ccmp.

In the updated patch, I add two backends hooks to generate conditional
compare instruction.

 Where needed and where the conditioned code is more than a few lines,
 such code is preferably transformed into if (0):d code using constructs
like
 that at builtins.c:5354:

 #ifndef HAVE_atomic_clear
 # define HAVE_atomic_clear 0
 # define gen_atomic_clear(x,y) (gcc_unreachable (), NULL_RTX) #endif

 Right, this causes dead code, but for maintenance it's *much* better than
 when only a fraction of the code being compiled for other targets.  (Also,
the

HAVE_ccmp is automatically generated when the target has ccmp instruction.
The code segments in #ifdef HAVE_ccmp ... #endif will *not* be compiled for
other targets, which do not define ccmp instruction.

 dead code may be eliminated by gcc.) Unfortunately the number of
 examples (as above) are few compared to the pages of #if
 HAVE_thisorthat'd code. :(

 (And IMHO that whole construct should be the default implementation and
 shouldn't have to be written manually in the first place.  But that's
material
 for an invasive patch.)

After adding another two hooks, the code in middle-end is just a framework.
It fully depends on backend to generate the instruction.

Thanks!
-Zhenqiang

[RFC PATCH] For TARGET_AVX use *movmode_internal for misaligned loads

2013-10-30 Thread Jakub Jelinek

Hi!

Yesterday I've noticed that for AVX which allows unaligned operands in
AVX arithmetics instructions we still don't combine unaligned loads with the
AVX arithmetics instructions.  So say for -O2 -mavx -ftree-vectorize
void
f1 (int *__restrict e, int *__restrict f)
{
  int i;
  for (i = 0; i  1024; i++)
e[i] = f[i] * 7;
}

void
f2 (int *__restrict e, int *__restrict f)
{
  int i;
  for (i = 0; i  1024; i++)
e[i] = f[i];
}
we have:
vmovdqu (%rsi,%rax), %xmm0
vpmulld %xmm1, %xmm0, %xmm0
vmovups %xmm0, (%rdi,%rax)
in the first loop.  Apparently all the MODE_VECTOR_INT and MODE_VECTOR_FLOAT
*movmode_internal patterns (and various others) use misaligned_operand
to see if they should emit vmovaps or vmovups (etc.), so as suggested by
Richard on IRC it isn't necessary to either allow UNSPEC_LOADU in memory
operands of all the various non-move AVX instructions for TARGET_AVX, or
add extra patterns to help combine, this patch instead just uses the
*movmode_internal in that case (assuming initially misaligned_operand
doesn't become !misaligned_operand through RTL optimizations).  Additionally
the patch attempts to avoid gen_lowpart on the non-MEM lhs of the unaligned
loads, which usually means combine will fail, by doing the load into a
temporary pseudo in that case and then doing a pseudo to pseudo move with
gen_lowpart on the rhs (which will be merged soon after into following
instructions).

I'll bootstrap/regtest this on x86_64-linux and i686-linux, unfortunately my
bootstrap/regtest server isn't AVX capable.

2013-10-30  Jakub Jelinek  ja...@redhat.com

* config/i386/i386.c (ix86_avx256_split_vector_move_misalign): If
op1 is misaligned_operand, just use *movmode_internal insn
rather than UNSPEC_LOADU load.
(ix86_expand_vector_move_misalign): Likewise (for TARGET_AVX only).
Avoid gen_lowpart on op0 if it isn't MEM.

--- gcc/config/i386/i386.c.jj   2013-10-30 08:15:38.0 +0100
+++ gcc/config/i386/i386.c  2013-10-30 10:20:22.684708729 +0100
@@ -16560,6 +16560,12 @@ ix86_avx256_split_vector_move_misalign (
  r = gen_rtx_VEC_CONCAT (GET_MODE (op0), r, m);
  emit_move_insn (op0, r);
}
+  /* Normal *movmode_internal pattern will handle
+unaligned loads just fine if misaligned_operand
+is true, and without the UNSPEC it can be combined
+with arithmetic instructions.  */
+  else if (misaligned_operand (op1, GET_MODE (op1)))
+   emit_insn (gen_rtx_SET (VOIDmode, op0, op1));
   else
emit_insn (load_unaligned (op0, op1));
 }
@@ -16634,7 +16640,7 @@ ix86_avx256_split_vector_move_misalign (
 void
 ix86_expand_vector_move_misalign (enum machine_mode mode, rtx operands[])
 {
-  rtx op0, op1, m;
+  rtx op0, op1, orig_op0 = NULL_RTX, m;
   rtx (*load_unaligned) (rtx, rtx);
   rtx (*store_unaligned) (rtx, rtx);
 
@@ -16647,7 +16653,16 @@ ix86_expand_vector_move_misalign (enum m
{
case MODE_VECTOR_INT:
case MODE_INT:
- op0 = gen_lowpart (V16SImode, op0);
+ if (GET_MODE (op0) != V16SImode)
+   {
+ if (!MEM_P (op0))
+   {
+ orig_op0 = op0;
+ op0 = gen_reg_rtx (V16SImode);
+   }
+ else
+   op0 = gen_lowpart (V16SImode, op0);
+   }
  op1 = gen_lowpart (V16SImode, op1);
  /* FALLTHRU */
 
@@ -16676,6 +16691,8 @@ ix86_expand_vector_move_misalign (enum m
emit_insn (store_unaligned (op0, op1));
  else
gcc_unreachable ();
+ if (orig_op0)
+   emit_move_insn (orig_op0, gen_lowpart (GET_MODE (orig_op0), op0));
  break;
 
default:
@@ -16692,12 +16709,23 @@ ix86_expand_vector_move_misalign (enum m
{
case MODE_VECTOR_INT:
case MODE_INT:
- op0 = gen_lowpart (V32QImode, op0);
+ if (GET_MODE (op0) != V32QImode)
+   {
+ if (!MEM_P (op0))
+   {
+ orig_op0 = op0;
+ op0 = gen_reg_rtx (V32QImode);
+   }
+ else
+   op0 = gen_lowpart (V32QImode, op0);
+   }
  op1 = gen_lowpart (V32QImode, op1);
  /* FALLTHRU */
 
case MODE_VECTOR_FLOAT:
  ix86_avx256_split_vector_move_misalign (op0, op1);
+ if (orig_op0)
+   emit_move_insn (orig_op0, gen_lowpart (GET_MODE (orig_op0), op0));
  break;
 
default:
@@ -16709,15 +16737,30 @@ ix86_expand_vector_move_misalign (enum m
 
   if (MEM_P (op1))
 {
+  /* Normal *movmode_internal pattern will handle
+unaligned loads just fine if misaligned_operand
+is true, and without the UNSPEC it can be combined
+with arithmetic instructions.  */
+  if (TARGET_AVX
+  (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
+ || GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
+  misaligned_operand

Re: [PATCH] Fix PR middle-end/58134

2013-10-30 Thread Richard Biener

On Wed, Oct 30, 2013 at 1:55 AM, Sharad Singhai sing...@google.com wrote:
 This patch removes a deprecated option -ftree-vectorizer-verbose. It
 was already implemented in terms of -fopt-info and makes little sense
 to keep it around longer. It causes needless confusion as reported in
 PR middle-end/58134.

 I noticed that several gettext related gcc/po files contain help
 translation for the option being removed. However, I haven't updated
 them as I wasn't sure if these files should be regenerated using a
 separate mechanism. If needed, I can include manual updates to those
 .po files as well.

 Bootstrapped and regression tested on x86_64-unknown-linux-gnu. Okay
 for the trunk?

Please keep the common.opt entry as ignored for backward compatibility
(see several examples in that file).

Ok with that change.

Thanks,
Richard.

 Thanks,
 Sharad

 2013-10-29  Sharad Singhai  sing...@google.com

 PR middle-end/58134
 * opts.c (common_handle_option): Remove deprecated option
 -ftree-vectorizer-verbose.
 * doc/invoke.texi (Debugging Options): Ditto.
 * common.opt: Ditto.
 * opts-global.c (handle_common_deferred_options): Ditto.
 (dump_remap_tree_vectorizer_verbose): Delete.

Re: [PATCH, MPX, 2/X] Pointers Checker [2/25] Builtins

2013-10-30 Thread Ilya Enkovich

2013/10/30 Richard Biener richard.guent...@gmail.com:
 On Tue, Oct 29, 2013 at 8:48 PM, Ilya Enkovich enkovich@gmail.com wrote:
 2013/10/29 Jeff Law l...@redhat.com:
 On 10/29/13 07:52, Ilya Enkovich wrote:


 Yeah.  I'm working on it right now.  I've fixed known issues and now
 I'm looking for others.  Meanwhile here is a new patch version with
 required renames and without LTO restriction.

 I can't help but but curious, what turned out to be the root cause of those
 LTO problems?

 There were three different problems fixed.

 The first one was SSA_NAME in DECL_INITIAL of local var.
 Instrumentation used it to initialize var with input arg value
 (default SSA_NAME of PARM_DECL was used). LTO cannot handle it because
 when it reads symbols, it does not have SSA_NAMEs. It caused ICE.

 Obviously putting things in trees is bad.

Yes, it is fixed.

 Another problem was in LTO front-end. I did not realize it has own
 langhooks. It caused reset of flag_check_pointer_bounds in
 process_options by my own code.

 And the last one was in initialization of checker structures. Some
 structures were initialized during checker pass and then used in other
 passes (e.g. expand). With LTO checker pass is not executed after LTO
 front-end and following passes could work with uninitialized checker
 structures.

 Looks badly designed then - any function related information should
 be hooked off struct function and streamed by LTO.  Or the info
 should be present in the IL.

Info is local to pass and is not required to be streamed by LTO. It is
just stored
using checker interfaces in its structures.

Ilya


 Richard.




 2013-10-29  Ilya Enkovich  ilya.enkov...@intel.com

 * builtin-types.def (BT_FN_VOID_CONST_PTR): New.
 (BT_FN_PTR_CONST_PTR): New.
 (BT_FN_CONST_PTR_CONST_PTR): New.
 (BT_FN_PTR_CONST_PTR_SIZE): New.
 (BT_FN_PTR_CONST_PTR_CONST_PTR): New.
 (BT_FN_VOID_PTRPTR_CONST_PTR): New.
 (BT_FN_VOID_CONST_PTR_SIZE): New.
 (BT_FN_PTR_CONST_PTR_CONST_PTR_SIZE): New.
 * chkp-builtins.def: New.
 * builtins.def: include chkp-builtins.def.
 (DEF_CHKP_BUILTIN): New.
 * builtins.c (expand_builtin): Support
 BUILT_IN_CHKP_INIT_PTR_BOUNDS,
 BUILT_IN_CHKP_NULL_PTR_BOUNDS, BUILT_IN_CHKP_COPY_PTR_BOUNDS,
 BUILT_IN_CHKP_CHECK_PTR_LBOUNDS, BUILT_IN_CHKP_CHECK_PTR_UBOUNDS,
 BUILT_IN_CHKP_CHECK_PTR_BOUNDS, BUILT_IN_CHKP_SET_PTR_BOUNDS,
 BUILT_IN_CHKP_NARROW_PTR_BOUNDS, BUILT_IN_CHKP_STORE_PTR_BOUNDS,
 BUILT_IN_CHKP_GET_PTR_LBOUND, BUILT_IN_CHKP_GET_PTR_UBOUND,
 BUILT_IN_CHKP_BNDMK, BUILT_IN_CHKP_BNDSTX, BUILT_IN_CHKP_BNDCL,
 BUILT_IN_CHKP_BNDCU, BUILT_IN_CHKP_BNDLDX, BUILT_IN_CHKP_BNDRET,
 BUILT_IN_CHKP_INTERSECT, BUILT_IN_CHKP_ARG_BND,
 BUILT_IN_CHKP_NARROW,
 BUILT_IN_CHKP_EXTRACT_LOWER, BUILT_IN_CHKP_EXTRACT_UPPER.
 * common.opt (fcheck-pointer-bounds): New.
 * toplev.c (process_options): Check Pointer Bounds Checker is
 supported.
 * doc/extend.texi: Document Pointer Bounds Checker built-in
 functions.

 This is fine.  Please install.

 Thanks!

 Ilya

 Thanks,
 jeff

Re: [RFC PATCH] For TARGET_AVX use *movmode_internal for misaligned loads

2013-10-30 Thread Ondřej Bílka

On Wed, Oct 30, 2013 at 10:47:13AM +0100, Jakub Jelinek wrote:
 Hi!
 
 Yesterday I've noticed that for AVX which allows unaligned operands in
 AVX arithmetics instructions we still don't combine unaligned loads with the
 AVX arithmetics instructions.  So say for -O2 -mavx -ftree-vectorize
 void
 f1 (int *__restrict e, int *__restrict f)
 {
   int i;
   for (i = 0; i  1024; i++)
 e[i] = f[i] * 7;
 }
 
 void
 f2 (int *__restrict e, int *__restrict f)
 {
   int i;
   for (i = 0; i  1024; i++)
 e[i] = f[i];
 }
 we have:
 vmovdqu (%rsi,%rax), %xmm0
 vpmulld %xmm1, %xmm0, %xmm0
 vmovups %xmm0, (%rdi,%rax)
 in the first loop.  Apparently all the MODE_VECTOR_INT and MODE_VECTOR_FLOAT
 *movmode_internal patterns (and various others) use misaligned_operand
 to see if they should emit vmovaps or vmovups (etc.), so as suggested by

That is intentional. In pre-haswell architectures splitting load is
faster than 32 byte load. 

See Intel® 64 and IA-32 Architectures Optimization Reference Manual for
details.

[PATCH C++/testsuite] Remove pchtest check objects and compile with current tool

2013-10-30 Thread Bernhard Reutner-Fischer

Hi,

The pch-init leaves check objects lying around, remove them.

While at it, i noticed that i was getting warnings from the check since
it was invoced with xg++ -nostdinc++ on C source (in one of the two
iterations the check is run -- once per tool).
My suggestion is to use the correct tool to perform the pch check. Does
that make sense to you?

Regstrapped (together with the auto-removal patch just sent) on
x86_64-unknown-linux-gnu with trunk@204119 without regressions.

Ok for trunk?

gcc/testsuite/ChangeLog

2013-10-12  Bernhard Reutner-Fischer  al...@gcc.gnu.org

* lib/dg-pch.exp (pch-init): Remove pchtest check objects.
Compile pchtest with current tool.
diff --git a/gcc/testsuite/lib/dg-pch.exp b/gcc/testsuite/lib/dg-pch.exp
index d82c669..41de454 100644
--- a/gcc/testsuite/lib/dg-pch.exp
+++ b/gcc/testsuite/lib/dg-pch.exp
@@ -18,6 +18,18 @@ load_lib copy-file.exp
 
 proc pch-init { args } {
 global pch_unsupported_debug pch_unsupported
+global tool
+
+set chk_content 
+set chk_lang c-header
+switch -- $tool {
+	g++ {
+		set chk_content // C++
+		set chk_lang c++-header
+	  }
+}
+append chk_content \nint i;
+set chk_lang -x $chk_lang
 
 if [info exists pch_unsupported_debug] {
 	error pch-init: pch_unsupported_debug is not empty as expected
@@ -26,20 +38,22 @@ proc pch-init { args } {
 	error pch-init: pch_unsupported is not empty as expected
 }
 
-set result [check_compile pchtest object int i; -g -x c-header]
+set result [check_compile pchtest object $chk_content -g $chk_lang]
 set pch_unsupported_debug \
 	[regexp debug format cannot be used with pre-compiled headers \
 		[lindex $result 0]]
+remove-build-file [lindex $result 1]
 
 set pch_unsupported 0
 if { $pch_unsupported_debug } {
 	verbose -log pch is unsupported with the debug info format
 
-	set result [check_compile pchtest object int i; -x c-header]
+	set result [check_compile pchtest object $chk_type $chk_lang]
 	set pch_unsupported \
 		[regexp debug format cannot be used with pre-compiled headers \
 			[lindex $result 0]]
 }
+remove-build-file [lindex $result 1]
 }
 
 proc pch-finish { args } {

Re: [wide-int] More optimisations

2013-10-30 Thread Richard Biener

On Tue, 29 Oct 2013, Mike Stump wrote:

 On Oct 29, 2013, at 5:43 AM, Richard Sandiford rsand...@linux.vnet.ibm.com 
 wrote:
  Richard Biener rguent...@suse.de writes:
  
  I think the cases Mike added should only be enabled when we can figure
  them out at compile-time, too.
  
  Well, the idea with most of these functions was to handle the simple
  everything is one HWI cases inline.
 
 Yes.  The idea is that 99.99% of all cases are 1 HWI or less, 
 dynamically.  By inlining and doing those cases inline, provided the 
 code is relatively small, seemed like a win.  This win comes at the cost 
 of duplicative code in the wider path, as it checks the precision/length 
 as well, but slowing down those cases seemed reasonable with respect to 
 how infrequent we expect them.  Now, if the slow path code is the same 
 speed as the inline code would have been, certainly the duplication just 
 hurts.  This is why I was posting the code fragments for the fast path 
 case.  I was aiming for the fast path to be really nice to help ensure 
 that we don't don't slow down compiles on narrow code any.

Handling the len == 1 case inline is fine, just don't add too many
fast special cases (mixed len == 1, len  1, special known extended
cases, etc.) that are not decidable at compile-time as branchy code
is both hard to optimize, and puts load on I-cache and the branch
predictor.

Richard.

Re: [RFC PATCH] For TARGET_AVX use *movmode_internal for misaligned loads

2013-10-30 Thread Jakub Jelinek

On Wed, Oct 30, 2013 at 10:53:58AM +0100, Ondřej Bílka wrote:
  Yesterday I've noticed that for AVX which allows unaligned operands in
  AVX arithmetics instructions we still don't combine unaligned loads with the
  AVX arithmetics instructions.  So say for -O2 -mavx -ftree-vectorize
  void
  f1 (int *__restrict e, int *__restrict f)
  {
int i;
for (i = 0; i  1024; i++)
  e[i] = f[i] * 7;
  }
  
  void
  f2 (int *__restrict e, int *__restrict f)
  {
int i;
for (i = 0; i  1024; i++)
  e[i] = f[i];
  }
  we have:
  vmovdqu (%rsi,%rax), %xmm0
  vpmulld %xmm1, %xmm0, %xmm0
  vmovups %xmm0, (%rdi,%rax)
  in the first loop.  Apparently all the MODE_VECTOR_INT and MODE_VECTOR_FLOAT
  *movmode_internal patterns (and various others) use misaligned_operand
  to see if they should emit vmovaps or vmovups (etc.), so as suggested by
 
 That is intentional. In pre-haswell architectures splitting load is
 faster than 32 byte load. 

But the above is 16 byte unaligned load.  Furthermore, GCC supports
-mavx256-split-unaligned-load and can emit 32 byte loads either as an
unaligned 32 byte load, or merge of 16 byte unaligned loads.  The patch
affects only the cases where we were already emitting 16 byte or 32 byte
unaligned loads rather than split loads.

Jakub

Re: [RFC PATCH] For TARGET_AVX use *movmode_internal for misaligned loads

2013-10-30 Thread Jakub Jelinek

On Wed, Oct 30, 2013 at 11:00:13AM +0100, Jakub Jelinek wrote:
 But the above is 16 byte unaligned load.  Furthermore, GCC supports
 -mavx256-split-unaligned-load and can emit 32 byte loads either as an
 unaligned 32 byte load, or merge of 16 byte unaligned loads.  The patch
 affects only the cases where we were already emitting 16 byte or 32 byte
 unaligned loads rather than split loads.

With my patch, the differences (in all cases only on f1) for
-O2 -mavx -ftree-vectorize with the patch is (16 byte unaligned load, not 
split):
-   vmovdqu (%rsi,%rax), %xmm0
-   vpmulld %xmm1, %xmm0, %xmm0
+   vpmulld (%rsi,%rax), %xmm1, %xmm0
vmovups %xmm0, (%rdi,%rax)
with -O2 -mavx2 -ftree-vectorize (again, load wasn't split):
-   vmovdqu (%rsi,%rax), %ymm0
-   vpmulld %ymm1, %ymm0, %ymm0
+   vpmulld (%rsi,%rax), %ymm1, %ymm0
vmovups %ymm0, (%rdi,%rax)
and with -O2 -mavx2 -mavx256-split-unaligned-load:
vmovdqu (%rsi,%rax), %xmm0
vinserti128 $0x1, 16(%rsi,%rax), %ymm0, %ymm0
-   vpmulld %ymm1, %ymm0, %ymm0
+   vpmulld %ymm0, %ymm1, %ymm0
vmovups %ymm0, (%rdi,%rax)
(the last change is just giving RTL optimizers more freedom by not
doing the SUBREG on the lhs).

Jakub

Re: C++ PATCH to deal with trivial but non-callable [cd]tors

2013-10-30 Thread Eric Botcazou

 In C++ all classes have destructors, but we try to defer building the
 implicit declaration.  My patch causes us to build those implicit
 declarations more often, which is probably a bit of a memory regression,
 but it would be good for your code to handle the dtor being declared.

OK, patch attached.  It's large because of some internal shuffling, but it 
just implements that.  Tested on x86-64/Linux, any objections?


2013-10-31  Eric Botcazou  ebotca...@adacore.com

c-family/
* c-ada-spec.h (cpp_operation): Add HAS_TRIVIAL_DESTRUCTOR.
(dump_ada_specs): Adjust prototype of second callback.
* c-ada-spec.c (cpp_check): New global variable.
(dump_ada_nodes): Remove cpp_check parameter and do not pass it down.
(print_generic_ada_decl): Likewise.
(has_static_fields): Change return type to bool and add guard.
(is_trivial_method): New predicate.
(has_nontrivial_methods): Likewise.
(is_tagged_type): Change return type to bool.
(separate_class_package): Call has_nontrivial_methods.
(pp_ada_tree_identifier): Minor tweaks.
(dump_ada_function_declaration): Adjust calls to dump_generic_ada_node.
(dump_ada_array_domains): Likewise.
(dump_ada_array_type): Likewise.
(dump_template_types): Remove cpp_check parameter and do not pass it to
dump_generic_ada_node.
(dump_ada_template): Likewise.
(dump_generic_ada_node): Remove cpp_check parameter and do not pass it
recursively.
(print_ada_methods): Change return type to integer.  Remove cpp_check
parameter and do not pass it down.
(dump_nested_types): Remove cpp_check parameter and do not pass it to
dump_generic_ada_node.
(print_ada_declaration): Likewise.  Do not print trivial methods.  Test
RECORD_OR_UNION_TYPE_P predicate before accessing methods.
(print_ada_struct_decl): Remove cpp_check parameter and do not pass it
down.  Use has_nontrivial_methods to recognize C++ classes.  Use return
value of print_ada_methods.
(dump_ads): Rename cpp_check parameter to check and adjust prototype.
Set cpp_check to it before invoking dump_ada_nodes.
(dump_ada_specs): Likewise.
cp/
* decl2.c (cpp_check): Change type of first parameter and deal with
HAS_TRIVIAL_DESTRUCTOR.


-- 
Eric BotcazouIndex: c-family/c-ada-spec.h
===
--- c-family/c-ada-spec.h	(revision 204152)
+++ c-family/c-ada-spec.h	(working copy)
@@ -29,13 +29,14 @@ typedef enum {
   IS_CONSTRUCTOR,
   IS_DESTRUCTOR,
   IS_COPY_CONSTRUCTOR,
-  IS_TEMPLATE
+  IS_TEMPLATE,
+  HAS_TRIVIAL_DESTRUCTOR
 } cpp_operation;
 
 extern location_t decl_sloc (const_tree, bool);
 extern void collect_ada_nodes (tree, const char *);
 extern void collect_source_ref (const char *);
 extern void dump_ada_specs (void (*)(const char *),
-			int (*)(tree, cpp_operation));
+			int (*)(const_tree, cpp_operation));
 
 #endif /* ! C_ADA_SPEC_H */
Index: c-family/c-ada-spec.c
===
--- c-family/c-ada-spec.c	(revision 204152)
+++ c-family/c-ada-spec.c	(working copy)
@@ -46,32 +46,31 @@ along with GCC; see the file COPYING3.
 #endif /* HOST_BITS_PER_WIDE_INT == HOST_BITS_PER_LONG */
 
 /* Local functions, macros and variables.  */
-static int dump_generic_ada_node (pretty_printer *, tree, tree,
-  int (*)(tree, cpp_operation), int, int, bool);
-static int print_ada_declaration (pretty_printer *, tree, tree,
-  int (*cpp_check)(tree, cpp_operation), int);
-static void print_ada_struct_decl (pretty_printer *, tree, tree,
-   int (*cpp_check)(tree, cpp_operation), int,
-   bool);
+static int dump_generic_ada_node (pretty_printer *, tree, tree, int, int,
+  bool);
+static int print_ada_declaration (pretty_printer *, tree, tree, int);
+static void print_ada_struct_decl (pretty_printer *, tree, tree, int, bool);
 static void dump_sloc (pretty_printer *buffer, tree node);
 static void print_comment (pretty_printer *, const char *);
-static void print_generic_ada_decl (pretty_printer *, tree,
-int (*)(tree, cpp_operation), const char *);
+static void print_generic_ada_decl (pretty_printer *, tree, const char *);
 static char *get_ada_package (const char *);
-static void dump_ada_nodes (pretty_printer *, const char *,
-			int (*)(tree, cpp_operation));
+static void dump_ada_nodes (pretty_printer *, const char *);
 static void reset_ada_withs (void);
 static void dump_ada_withs (FILE *);
 static void dump_ads (const char *, void (*)(const char *),
-		  int (*)(tree, cpp_operation));
+		  int (*)(const_tree, cpp_operation));
 static char *to_ada_name (const char *, int *);
 static bool separate_class_package (tree);
 
-#define INDENT(SPACE) do { \
-  int i; for (i = 0; iSPACE; i++) pp_space (buffer); } while (0)
+#define INDENT(SPACE) \

Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces

2013-10-30 Thread Ilya Enkovich

On 30 Oct 10:26, Richard Biener wrote:
 
 Ick - you enlarge all return statements?  But you don't set the actual value?
 So why allocate it with 2 ops in the first place??

When return does not return bounds it has operand with zero value similar to 
case when it does not return value. What is the difference then?

 
 [Seems I completely missed that MPX changes gimple and the design
 document that was posted somewhere??]
 

Design is on GCC Wiki and link was posted few times: 
http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler. Here 
is quote about return statements: 

Returns instrumentation. We add new operand to return statement to hold 
returned bounds and instrumentation pass is responsible to fill this operand 
with correct bounds

 Bah.
 
 Where is the update to gimple.texi and tree.texi?
 
 Richard.
 

Unfortunately patch has been already installed.  Should we uninstall it?  If 
not, then here is patch for documentation.

Thanks,
Ilya
--

gcc/

2013-10-30  Ilya Enkovich  ilya.enkov...@intel.com

* doc/gimple.texi (gimple_call_num_nobnd_args): New.
(gimple_call_nobnd_arg): New.
(gimple_return_retbnd): New.
(gimple_return_set_retbnd): New.
(gimple_call_get_nobnd_arg_index): New.


diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
index 7bd9fd5..be74170 100644
--- a/gcc/doc/gimple.texi
+++ b/gcc/doc/gimple.texi
@@ -1240,11 +1240,25 @@ Set @code{CHAIN} to be the static chain for call 
statement @code{G}.
 Return the number of arguments used by call statement @code{G}.
 @end deftypefn
 
+@deftypefn {GIMPLE function} unsigned gimple_call_num_nobnd_args (gimple g)
+Return the number of arguments used by call statement @code{G}
+ignoring bound ones.
+@end deftypefn
+
 @deftypefn {GIMPLE function} tree gimple_call_arg (gimple g, unsigned index)
 Return the argument at position @code{INDEX} for call statement @code{G}.  The
 first argument is 0.
 @end deftypefn
 
+@deftypefn {GIMPLE function} tree gimple_call_nobnd_arg (gimple g, unsigned 
index)
+Return the argument at position @code{INDEX} for call statement @code{G}
+ignoring bound ones.  The first argument is 0.
+@end deftypefn
+
+@deftypefn {GIMPLE function} unsigned gimple_call_get_nobnd_arg_index (gimple 
g, unsigned index)
+Return index of @code{INDEX}'s non bound argument of the call statement 
@code{G}
+@end deftypefn
+
 @deftypefn {GIMPLE function} {tree *} gimple_call_arg_ptr (gimple g, unsigned 
index)
 Return a pointer to the argument at position @code{INDEX} for call
 statement @code{G}.
@@ -2029,6 +2043,15 @@ Return the return value for @code{GIMPLE_RETURN} 
@code{G}.
 Set @code{RETVAL} to be the return value for @code{GIMPLE_RETURN} @code{G}.
 @end deftypefn
 
+@deftypefn {GIMPLE function} tree gimple_return_retbnd (gimple g)
+Return the bounds of return value for @code{GIMPLE_RETURN} @code{G}.
+@end deftypefn
+
+@deftypefn {GIMPLE function} void gimple_return_set_retbnd (gimple g, tree 
retbnd)
+Set @code{RETBND} to be the bounds of return value for @code{GIMPLE_RETURN}
+@code{G}.
+@end deftypefn
+
 @node @code{GIMPLE_SWITCH}
 @subsection @code{GIMPLE_SWITCH}
 @cindex @code{GIMPLE_SWITCH}

Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp

2013-10-30 Thread Tobias Burnus

Without that patch, which I have copied from asan-dg.exp, I get tons of 
failures because ld cannot find libcilkrts.


OK for committal?

Tobias
2013-10-30  Tobias Burnus  bur...@net-b.de

	* gcc.dg/cilk-plus/cilk-plus.exp: Add the libcilkrts library
	path to the compile flags.

diff --git a/gcc/testsuite/gcc.dg/cilk-plus/cilk-plus.exp b/gcc/testsuite/gcc.dg/cilk-plus/cilk-plus.exp
index a8f00d4..0a9d19b 100644
--- a/gcc/testsuite/gcc.dg/cilk-plus/cilk-plus.exp
+++ b/gcc/testsuite/gcc.dg/cilk-plus/cilk-plus.exp
@@ -26,7 +26,24 @@ if { ![check_effective_target_cilkplus] } {
 verbose $tool $libdir 1
 set library_var [get_multilibs]
 # Pointing the ld_library_path to the Cilk Runtime library binaries. 
-set ld_library_path ${library_var}/libcilkrts/.libs
+if { $gccpath !=  } {
+  if { [file exists ${gccpath}/libcilkrts/.libs/libcilkrts.a]
+   || [file exists ${gccpath}/libcilkrts/.libs/libcilkrts.${shlib_ext}] } {
+  append flags  -B${gccpath}/libcilkrts/ 
+  append flags  -L${gccpath}/libcilkrts/.libs 
+  append ld_library_path :${gccpath}/libcilkrts/.libs
+  }
+} else {
+  global tool_root_dir
+
+  set libcilkrts [lookfor_file ${tool_root_dir} libcilkrts]
+  if { $libcilkrts !=  } {
+  append flags -L${libcilkrts} 
+  append ld_library_path :${libcilkrts}
+  }
+}
+
+set_ld_library_path_env_vars
 
 dg-init
 dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/AN/*.c]]  -fcilkplus

Re: [PATCH, MPX, 2/X] Pointers Checker [3/25] Attributes

2013-10-30 Thread Ilya Enkovich

On 24 Oct 23:34, Jeff Law wrote:
 On 10/21/13 05:59, Ilya Enkovich wrote:
 Hi,
 
 This patch adds attributes 'bnd_variable_size' and 'bnd_legacy' used by 
 Pointers Checker.
 
 Bootstrapped and tested on linux-x86_64.
 
 Thanks,
 Ilya
 --
 
 gcc/
 
 2013-10-21  Ilya Enkovich  ilya.enkov...@intel.com
 
  * c-family/c-common.c (handle_bnd_variable_size_attribute): New.
  (handle_bnd_legacy): New.
  (c_common_attribute_table): Add bnd_variable_size and bnd_legacy.
  * doc/extend.texi: Document bnd_variable_size and bnd_legacy
  attributes.
 OK with the same terminology changes from the 2/25 Builtins patch.
 
 jeff
 
Thanks! Below is the installed version.

Ilya
--

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index b20fdd6..f519489 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -375,6 +375,8 @@ static tree handle_omp_declare_simd_attribute (tree *, 
tree, tree, int,
   bool *);
 static tree handle_omp_declare_target_attribute (tree *, tree, tree, int,
 bool *);
+static tree handle_bnd_variable_size_attribute (tree *, tree, tree, int, bool 
*);
+static tree handle_bnd_legacy (tree *, tree, tree, int, bool *);
 
 static void check_function_nonnull (tree, int, tree *);
 static void check_nonnull_arg (void *, tree, unsigned HOST_WIDE_INT);
@@ -757,6 +759,10 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_omp_declare_simd_attribute, false },
   { omp declare target, 0, 0, true, false, false,
  handle_omp_declare_target_attribute, false },
+  { bnd_variable_size,  0, 0, true,  false, false,
+ handle_bnd_variable_size_attribute, false },
+  { bnd_legacy, 0, 0, true, false, false,
+ handle_bnd_legacy, false },
   { NULL, 0, 0, false, false, false, NULL, false }
 };
 
@@ -8013,6 +8019,38 @@ handle_fnspec_attribute (tree *node ATTRIBUTE_UNUSED, 
tree ARG_UNUSED (name),
   return NULL_TREE;
 }
 
+/* Handle a bnd_variable_size attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_bnd_variable_size_attribute (tree *node, tree name, tree ARG_UNUSED 
(args),
+   int ARG_UNUSED (flags), bool *no_add_attrs)
+{
+  if (TREE_CODE (*node) != FIELD_DECL)
+{
+  warning (OPT_Wattributes, %qE attribute ignored, name);
+  *no_add_attrs = true;
+}
+
+  return NULL_TREE;
+}
+
+/* Handle a bnd_legacy attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_bnd_legacy (tree *node, tree name, tree ARG_UNUSED (args),
+  int ARG_UNUSED (flags), bool *no_add_attrs)
+{
+  if (TREE_CODE (*node) != FUNCTION_DECL)
+{
+  warning (OPT_Wattributes, %qE attribute ignored, name);
+  *no_add_attrs = true;
+}
+
+  return NULL_TREE;
+}
+
 /* Handle a warn_unused attribute; arguments as in
struct attribute_spec.handler.  */
 
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 8ca3137..1d52e42 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -2138,7 +2138,7 @@ attributes are currently defined for functions on all 
targets:
 @code{returns_nonnull}, @code{gnu_inline},
 @code{externally_visible}, @code{hot}, @code{cold}, @code{artificial},
 @code{no_sanitize_address}, @code{no_address_safety_analysis},
-@code{no_sanitize_undefined},
+@code{no_sanitize_undefined}, @code{bnd_legacy},
 @code{error} and @code{warning}.
 Several other attributes are defined for functions on particular
 target systems.  Other attributes, including @code{section} are
@@ -3549,6 +3549,12 @@ The @code{no_sanitize_undefined} attribute on functions 
is used
 to inform the compiler that it should not check for undefined behavior
 in the function when compiling with the @option{-fsanitize=undefined} option.
 
+@item bnd_legacy
+@cindex @code{bnd_legacy} function attribute
+The @code{bnd_legacy} attribute on functions is used to inform
+compiler that function should not be instrumented when compiled
+with @option{-fcheck-pointers} option.
+
 @item regparm (@var{number})
 @cindex @code{regparm} attribute
 @cindex functions that are passed arguments in registers on the 386
@@ -5321,12 +5327,12 @@ placed in either the @code{.bss_below100} section or the
 The keyword @code{__attribute__} allows you to specify special
 attributes of @code{struct} and @code{union} types when you define
 such types.  This keyword is followed by an attribute specification
-inside double parentheses.  Seven attributes are currently defined for
+inside double parentheses.  Eight attributes are currently defined for
 types: @code{aligned}, @code{packed}, @code{transparent_union},
-@code{unused}, @code{deprecated}, @code{visibility}, and
-@code{may_alias}.  Other attributes are defined for functions
-(@pxref{Function

Re: [PATCH, MPX, 2/X] Pointers Checker [4/25] Constructors

2013-10-30 Thread Ilya Enkovich

On 24 Oct 23:55, Jeff Law wrote:
 On 10/21/13 06:10, Ilya Enkovich wrote:
 Hi,
 
 This patch introduces two new contructor types supported by 
 cgraph_build_static_cdtor.
 
 'B' type is used to initialize static objects (bounds) created by Pointers 
 Checker. The difference of this type from the regular constructor is that 
 'B' constructor is never instrumented by Pointers Checker.
 
 'P' type is used by Pointers Checker to generate constructors to initialize 
 bounds of statically initialized pointers. Pointers Checker remove all 
 stores from such constructors after instrumentation.
 
 Since 'P' type constructors are created for statically initialized objects, 
 we need to avoid creation of such objects during its gimplification. New 
 restriction was added to gimplify_init_constructor.
 
 Bootstrapped and checked on linux-x86_64.
 
 Thanks,
 Ilya
 --
 
 gcc/
 
 2013-10-21  Ilya Enkovich  ilya.enkov...@intel.com
 
  * ipa.c (cgraph_build_static_cdtor_1): Support contructors
  with chkp ctor and bnd_legacy attributes.
  * gimplify.c (gimplify_init_constructor): Avoid infinite
  loop during gimplification of bounds initializer.
 This is OK.
 
 As a side note, it seems awfully strange to be passing in the type
 of ctor/dtor in a char varaible.  I'd look favorably upon changing
 that to an enum where the enum values describe the cases they
 handle.  The existing code seems so, umm, 80s/90s style.  Obviously
 not something that's required of you to move this patch forward.

Thanks! Installed to trunk.

Ilya
 
 jeff

Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces

2013-10-30 Thread Richard Biener

On Wed, Oct 30, 2013 at 11:34 AM, Ilya Enkovich enkovich@gmail.com wrote:
 On 30 Oct 10:26, Richard Biener wrote:

 Ick - you enlarge all return statements?  But you don't set the actual value?
 So why allocate it with 2 ops in the first place??

 When return does not return bounds it has operand with zero value similar to 
 case when it does not return value. What is the difference then?


 [Seems I completely missed that MPX changes gimple and the design
 document that was posted somewhere??]


 Design is on GCC Wiki and link was posted few times: 
 http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler. 
 Here is quote about return statements:

 Returns instrumentation. We add new operand to return statement to hold 
 returned bounds and instrumentation pass is responsible to fill this operand 
 with correct bounds

foo (int * p, unsigned int size)
{
  unnamed type __bound_tmp.0;
  long unsigned int D.2239;
  long unsigned int _2;
  sizetype _6;
  int * _7;

  bb 3:
  __bound_tmp.0_4 = __builtin_ia32_arg_bnd (p_3(D));

  bb 2:
  _2 = (long unsigned int) size_1(D);
  __builtin_ia32_bndcl (__bound_tmp.0_4, p_3(D));
  _6 = _2 + 18446744073709551615;
  _7 = p_3(D) + _6;
  __builtin_ia32_bndcu (__bound_tmp.0_4, _7);
  access_and_store (p_3(D), __bound_tmp.0_4, size_1(D));

so it seems there is now a mismatch between DECL_ARGUMENTS
and the GIMPLE call stmt arguments.  How (if) did you amend
the GIMPLE stmt verifier for this?

How does regular code deal with this which may expect matching
to DECL_ARGUMENTS?  In fact interleaving the additional
arguments sounds very error-prone for existing code - I'd have
appended all bound args at the end.  Also you unconditionally
claim all pointer arguments have a bound - that looks like bad
design as well.  Why didn't you add a flag to the relevant
PARM_DECL (and then, what do you do for indirect calls?).

/* Return the number of arguments used by call statement GS
   ignoring bound ones.  */

static inline unsigned
gimple_call_num_nobnd_args (const_gimple gs)
{
  unsigned num_args = gimple_call_num_args (gs);
  unsigned res = num_args;
  for (unsigned n = 0; n  num_args; n++)
if (POINTER_BOUNDS_P (gimple_call_arg (gs, n)))
  res--;
  return res;
}

the choice means that gimple_call_num_nobnd_args is not O(1).

/* Return INDEX's call argument ignoring bound ones.  */
static inline tree
gimple_call_nobnd_arg (const_gimple gs, unsigned index)
{
  /* No bound args may exist if pointers checker is off.  */
  if (!flag_check_pointer_bounds)
return gimple_call_arg (gs, index);
  return gimple_call_arg (gs, gimple_call_get_nobnd_arg_index (gs, index));
}

GIMPLE layout depending on flag_check_pointer_bounds sounds
like a recipie for desaster if you consider TUs compiled with and
TUs compiled without and LTO.  Or if you consider using
optimized attribute with that flag.

I wish I had seen all this before.

 Bah.

 Where is the update to gimple.texi and tree.texi?

 Richard.


 Unfortunately patch has been already installed.  Should we uninstall it?  If 
 not, then here is patch for documentation.

I hope the reviewers that approved the patch will work with you to
address the above issues.  I can't be everywhere.

Richard.

 Thanks,
 Ilya
 --

 gcc/

 2013-10-30  Ilya Enkovich  ilya.enkov...@intel.com

 * doc/gimple.texi (gimple_call_num_nobnd_args): New.
 (gimple_call_nobnd_arg): New.
 (gimple_return_retbnd): New.
 (gimple_return_set_retbnd): New.
 (gimple_call_get_nobnd_arg_index): New.


 diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
 index 7bd9fd5..be74170 100644
 --- a/gcc/doc/gimple.texi
 +++ b/gcc/doc/gimple.texi
 @@ -1240,11 +1240,25 @@ Set @code{CHAIN} to be the static chain for call 
 statement @code{G}.
  Return the number of arguments used by call statement @code{G}.
  @end deftypefn

 +@deftypefn {GIMPLE function} unsigned gimple_call_num_nobnd_args (gimple g)
 +Return the number of arguments used by call statement @code{G}
 +ignoring bound ones.
 +@end deftypefn
 +
  @deftypefn {GIMPLE function} tree gimple_call_arg (gimple g, unsigned index)
  Return the argument at position @code{INDEX} for call statement @code{G}.  
 The
  first argument is 0.
  @end deftypefn

 +@deftypefn {GIMPLE function} tree gimple_call_nobnd_arg (gimple g, unsigned 
 index)
 +Return the argument at position @code{INDEX} for call statement @code{G}
 +ignoring bound ones.  The first argument is 0.
 +@end deftypefn
 +
 +@deftypefn {GIMPLE function} unsigned gimple_call_get_nobnd_arg_index 
 (gimple g, unsigned index)
 +Return index of @code{INDEX}'s non bound argument of the call statement 
 @code{G}
 +@end deftypefn
 +
  @deftypefn {GIMPLE function} {tree *} gimple_call_arg_ptr (gimple g, 
 unsigned index)
  Return a pointer to the argument at position @code{INDEX} for call
  statement @code{G}.
 @@ -2029,6 +2043,15 @@ Return the return value for @code{GIMPLE_RETURN} 
 @code{G}.
  Set @code{RETVAL} to be the

Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces

2013-10-30 Thread Richard Biener

On Wed, Oct 30, 2013 at 11:48 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Wed, Oct 30, 2013 at 11:34 AM, Ilya Enkovich enkovich@gmail.com 
 wrote:
 On 30 Oct 10:26, Richard Biener wrote:

 Ick - you enlarge all return statements?  But you don't set the actual 
 value?
 So why allocate it with 2 ops in the first place??

 When return does not return bounds it has operand with zero value similar to 
 case when it does not return value. What is the difference then?


 [Seems I completely missed that MPX changes gimple and the design
 document that was posted somewhere??]


 Design is on GCC Wiki and link was posted few times: 
 http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler. 
 Here is quote about return statements:

 Returns instrumentation. We add new operand to return statement to hold 
 returned bounds and instrumentation pass is responsible to fill this operand 
 with correct bounds

 foo (int * p, unsigned int size)
 {
   unnamed type __bound_tmp.0;
   long unsigned int D.2239;
   long unsigned int _2;
   sizetype _6;
   int * _7;

   bb 3:
   __bound_tmp.0_4 = __builtin_ia32_arg_bnd (p_3(D));

   bb 2:
   _2 = (long unsigned int) size_1(D);
   __builtin_ia32_bndcl (__bound_tmp.0_4, p_3(D));
   _6 = _2 + 18446744073709551615;
   _7 = p_3(D) + _6;
   __builtin_ia32_bndcu (__bound_tmp.0_4, _7);
   access_and_store (p_3(D), __bound_tmp.0_4, size_1(D));

 so it seems there is now a mismatch between DECL_ARGUMENTS
 and the GIMPLE call stmt arguments.  How (if) did you amend
 the GIMPLE stmt verifier for this?

 How does regular code deal with this which may expect matching
 to DECL_ARGUMENTS?  In fact interleaving the additional
 arguments sounds very error-prone for existing code - I'd have
 appended all bound args at the end.  Also you unconditionally
 claim all pointer arguments have a bound - that looks like bad
 design as well.  Why didn't you add a flag to the relevant
 PARM_DECL (and then, what do you do for indirect calls?).

 /* Return the number of arguments used by call statement GS
ignoring bound ones.  */

 static inline unsigned
 gimple_call_num_nobnd_args (const_gimple gs)
 {
   unsigned num_args = gimple_call_num_args (gs);
   unsigned res = num_args;
   for (unsigned n = 0; n  num_args; n++)
 if (POINTER_BOUNDS_P (gimple_call_arg (gs, n)))
   res--;
   return res;
 }

 the choice means that gimple_call_num_nobnd_args is not O(1).

 /* Return INDEX's call argument ignoring bound ones.  */
 static inline tree
 gimple_call_nobnd_arg (const_gimple gs, unsigned index)
 {
   /* No bound args may exist if pointers checker is off.  */
   if (!flag_check_pointer_bounds)
 return gimple_call_arg (gs, index);
   return gimple_call_arg (gs, gimple_call_get_nobnd_arg_index (gs, index));
 }

 GIMPLE layout depending on flag_check_pointer_bounds sounds
 like a recipie for desaster if you consider TUs compiled with and
 TUs compiled without and LTO.  Or if you consider using
 optimized attribute with that flag.

Btw, we have this kind of mess pre-existing with stuff like flag_wrapv
and flag_trapv, adding a new case should be a 100% no-go.  If you
really need to have this conditional per call then add a flag on
CALL_EXPR and GIMPLE_CALL saying this call has bound arguments
and return value.  And append the bound arguments at the end.

Richard.


 I wish I had seen all this before.

 Bah.

 Where is the update to gimple.texi and tree.texi?

 Richard.


 Unfortunately patch has been already installed.  Should we uninstall it?  If 
 not, then here is patch for documentation.

 I hope the reviewers that approved the patch will work with you to
 address the above issues.  I can't be everywhere.

 Richard.

 Thanks,
 Ilya
 --

 gcc/

 2013-10-30  Ilya Enkovich  ilya.enkov...@intel.com

 * doc/gimple.texi (gimple_call_num_nobnd_args): New.
 (gimple_call_nobnd_arg): New.
 (gimple_return_retbnd): New.
 (gimple_return_set_retbnd): New.
 (gimple_call_get_nobnd_arg_index): New.


 diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
 index 7bd9fd5..be74170 100644
 --- a/gcc/doc/gimple.texi
 +++ b/gcc/doc/gimple.texi
 @@ -1240,11 +1240,25 @@ Set @code{CHAIN} to be the static chain for call 
 statement @code{G}.
  Return the number of arguments used by call statement @code{G}.
  @end deftypefn

 +@deftypefn {GIMPLE function} unsigned gimple_call_num_nobnd_args (gimple g)
 +Return the number of arguments used by call statement @code{G}
 +ignoring bound ones.
 +@end deftypefn
 +
  @deftypefn {GIMPLE function} tree gimple_call_arg (gimple g, unsigned index)
  Return the argument at position @code{INDEX} for call statement @code{G}.  
 The
  first argument is 0.
  @end deftypefn

 +@deftypefn {GIMPLE function} tree gimple_call_nobnd_arg (gimple g, unsigned 
 index)
 +Return the argument at position @code{INDEX} for call statement @code{G}
 +ignoring bound ones.  The first argument is 0.
 +@end deftypefn
 +
 +@deftypefn {GIMPLE

Re: [RFC PATCH] For TARGET_AVX use *movmode_internal for misaligned loads

2013-10-30 Thread Ondřej Bílka

On Wed, Oct 30, 2013 at 11:05:58AM +0100, Jakub Jelinek wrote:
 On Wed, Oct 30, 2013 at 11:00:13AM +0100, Jakub Jelinek wrote:
  But the above is 16 byte unaligned load.  Furthermore, GCC supports
  -mavx256-split-unaligned-load and can emit 32 byte loads either as an
  unaligned 32 byte load, or merge of 16 byte unaligned loads.  The patch
  affects only the cases where we were already emitting 16 byte or 32 byte
  unaligned loads rather than split loads.
 
 With my patch, the differences (in all cases only on f1) for
 -O2 -mavx -ftree-vectorize with the patch is (16 byte unaligned load, not 
 split):

My point was that this could mask split loads, thank for clarifying

Re: [RFC PATCH] For TARGET_AVX use *movmode_internal for misaligned loads

2013-10-30 Thread Uros Bizjak

On Wed, Oct 30, 2013 at 10:47 AM, Jakub Jelinek ja...@redhat.com wrote:

 Yesterday I've noticed that for AVX which allows unaligned operands in
 AVX arithmetics instructions we still don't combine unaligned loads with the
 AVX arithmetics instructions.  So say for -O2 -mavx -ftree-vectorize

This is actually PR 47754 that fell below radar for some reason...

 we have:
 vmovdqu (%rsi,%rax), %xmm0
 vpmulld %xmm1, %xmm0, %xmm0
 vmovups %xmm0, (%rdi,%rax)
 in the first loop.  Apparently all the MODE_VECTOR_INT and MODE_VECTOR_FLOAT
 *movmode_internal patterns (and various others) use misaligned_operand
 to see if they should emit vmovaps or vmovups (etc.), so as suggested by
 Richard on IRC it isn't necessary to either allow UNSPEC_LOADU in memory
 operands of all the various non-move AVX instructions for TARGET_AVX, or
 add extra patterns to help combine, this patch instead just uses the
 *movmode_internal in that case (assuming initially misaligned_operand
 doesn't become !misaligned_operand through RTL optimizations).  Additionally

No worries here. We will generate movdqa, and it is definitely a gcc
bug if RTL optimizations change misaligned operand to aligned.

 the patch attempts to avoid gen_lowpart on the non-MEM lhs of the unaligned
 loads, which usually means combine will fail, by doing the load into a
 temporary pseudo in that case and then doing a pseudo to pseudo move with
 gen_lowpart on the rhs (which will be merged soon after into following
 instructions).

Is this similar to PR44141? There were similar problems with V4SFmode
subregs, so combine was not able to merge load to the arithemtic insn.

 I'll bootstrap/regtest this on x86_64-linux and i686-linux, unfortunately my
 bootstrap/regtest server isn't AVX capable.

I can bootstrap the patch later today on IvyBridge with
--with-arch=core-avx-i --with-cpu=core-avx-i --with-fpmath=avx.

Uros.

[Patch ARM] Fix PR target/58854

2013-10-30 Thread Ramana Radhakrishnan

PR58854 is another manifestation of the usual issue with the scheduler 
not understanding sp fp overlap issues and hoisting such adjustments 
(see PR38644 for a long history on an earlier one) over accesses from 
the frame. The usual bodge in the backend is attached.


Tested arm-none-eabi cross - tested that the testcase works just fine.

Applied to trunk and will backport to the 4.8 branch in a day or so 
after the auto-testers have had a chance to play with this.


regards
Ramana

2013-10-30  Ramana Radhakrishnan  ramana.radhakrish...@arm.com

PR target/58854
* config/arm/arm.c (arm_expand_epilogue_apcs_frame): Emit blockage.diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 212a4bc..23dfc0e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -26547,6 +26547,7 @@ arm_expand_epilogue_apcs_frame (bool really_return)
   num_regs = bit_count (saved_regs_mask);
   if ((offsets-outgoing_args != (1 + num_regs)) || cfun-calls_alloca)
 {
+  emit_insn (gen_blockage ());
   /* Unwind the stack to just below the saved registers.  */
   emit_insn (gen_addsi3 (stack_pointer_rtx,
  hard_frame_pointer_rtx,

Re: [PATCH] Introduce [sg]et_nonzero_bits

2013-10-30 Thread Richard Biener

On Tue, 29 Oct 2013, Jakub Jelinek wrote:

 On Tue, Oct 29, 2013 at 12:55:04PM +0100, Richard Biener wrote:
  Surely you can't rely on CCP and VRP compute exactly the same
  nonzero_bits.  As you don't record/compute zero_bits you can't
  tell whether a not set bit in nonzer_bits is don't know or
  if it is zero.  And you cannot do an assert during iteration
  (during iteration optimistically 'wrong' values can disappear).
  
  During CCP iteration the rule is that bits may only be added to mask
  (and obviously the constant itself on a CONSTANT lattice value may
  not change).
 
  Factor this out to commonize with code in evaluate_stmt
 
 Ok.
 
   +   tem = val-mask.low;
   +   align = (tem  -tem);
   +   if (align  1)
   + set_ptr_info_alignment (get_ptr_info (name), align,
   + (TREE_INT_CST_LOW (val-value)
   +   (align - 1)));
   + }
   +  else
   + {
   +   double_int nonzero_bits = val-mask;
   +   nonzero_bits = nonzero_bits | tree_to_double_int (val-value);
   +   nonzero_bits = get_nonzero_bits (name);
  
  This looks odd to me - shouldn't it be simply
  
 nonzero_bits = ~val-mask  tree_to_double_int (val-value);
 
 Bits set in the mask are for the bits where the value is unknown,
 so potentially nonzero bits.  Where bits are clear in the mask,
 but set in the value, those are surely nonzero bits.
 The only known zero bits are where both mask and value are zero,
 and as nonzero_bits is defined to be 1 where the bit may be non-zero,
 0 where it must be zero (like it is for nonzero_bits* in RTL),
 I think val-mask | tree_to_double_int (val-value); is exactly
 what we want.

Ah, I think I missed that detail (1 in nonzero_bits means
maybe non-zero and 0 means must-be zero) - a bit odd, but yeah,
making it consistent with RTL sounds good.

  ?  = of get_nonzero_bits either is not necessary or shows the
  lattice is unnecessarily conservative because you apply it too early
  during propagation?
 
   +   set_nonzero_bits (name, nonzero_bits);
   + }
}

  /* Perform substitutions based on the known constant values.  */
   @@ -1671,6 +1700,25 @@ evaluate_stmt (gimple stmt)
  is_constant = (val.lattice_val == CONSTANT);
}

   +  if (flag_tree_bit_ccp
   +   !is_constant
  ^^^
  
  ah, so if the bit-CCP part figured out a set of known bits
  then you don't do anything here while in fact you can
  still remove bits from mask with get_nonzero_bits info.
 
 Yeah, that was not to lose info for the case where the lattice
 is already CONSTANT.

CONSTANT and with mask == 0, sure.  Remember the lattice is
also CONSTANT if only some bits are known.

 Originally, I had code like:
 
 --- tree-ssa-ccp.c.xx 2013-10-29 15:31:03.0 +0100
 +++ tree-ssa-ccp.c2013-10-29 15:34:44.257977817 +0100
 @@ -1701,8 +1701,7 @@ evaluate_stmt (gimple stmt)
  }
  
if (flag_tree_bit_ccp
 -   !is_constant
 -   likelyvalue != UNDEFINED
 +   (is_constant || likelyvalue != UNDEFINED)
 gimple_get_lhs (stmt)
 TREE_CODE (gimple_get_lhs (stmt)) == SSA_NAME)
  {
 @@ -1711,11 +1710,27 @@ evaluate_stmt (gimple stmt)
double_int mask = double_int::mask (TYPE_PRECISION (TREE_TYPE (lhs)));
if (nonzero_bits != double_int_minus_one  nonzero_bits != mask)
   {
 -   val.lattice_val = CONSTANT;
 -   val.value = build_zero_cst (TREE_TYPE (lhs));
 -   /* CCP wants the bits above precision set.  */
 -   val.mask = nonzero_bits | ~mask;
 -   is_constant = true;
 +   if (!is_constant)
 + {
 +   val.lattice_val = CONSTANT;
 +   val.value = build_zero_cst (TREE_TYPE (lhs));
 +   /* CCP wants the bits above precision set.  */
 +   val.mask = nonzero_bits | ~mask;
 +   is_constant = true;
 + }
 +   else
 + {
 +   double_int valv = tree_to_double_int (val.value);
 +   gcc_assert ((valv  ~val.mask
 +~nonzero_bits  mask).is_zero ());
 +   if (!(valv  ~nonzero_bits  mask).is_zero ())
 + val.value = double_int_to_tree (TREE_TYPE (lhs),
 + valv  nonzero_bits);
 +   if (nonzero_bits.is_zero ())
 + val.mask = double_int_zero;
 +   else
 + val.mask = val.mask  (nonzero_bits | ~mask);
 + }
   }
  }
  
 in the patch, but that breaks e.g. on
 void
 foo (unsigned int a, unsigned b, unsigned **c, void *d[8], int e)
 {
   int i;
   for (i = 0; i != e; i++);
   if (a)
 {
   for (i = 0;;)
   {
 i--;
 if (bar ())
   goto lab2;
   }
 lab1:;
   __builtin_printf (%d\n, i  0 ? -1 - i : i);
   return;
 }
 lab2:;
   if (!d[b]  *c)
 goto lab1;
 }
 at -O2 - VRP1 figures out:
   if (i_3  0)
 goto bb 10;
   else
 goto bb 11;
 
   bb 10:
   # RANGE [0, 2147483647] NONZERO 0x07fff
   iftmp.0_23

Re: [wide-int] Update main comment

2013-10-30 Thread Richard Sandiford

Kenneth Zadeck zad...@naturalbridge.com writes:
 On 10/29/2013 06:37 PM, Richard Sandiford wrote:
 This patch tries to update the main wide_int comment to reflect the current
 implementation.

 - bitsizetype is TImode on x86_64 and others, so I don't think it's
necessarily true that all offset_ints are signed.  (widest_int are
though.)
 i am wondering if this is too conservative an interpretation.I 
 believe that they are ti mode because that is the next thing after di 
 mode and so they wanted to accommodate the 3 extra bits. Certainly there 
 is no x86 that is able to address more than 64 bits.

Right, but my point is that it's a different case from widest_int.
It'd be just as valid to do bitsizetype arithmetic using wide_int
rather than offset_int, and those wide_ints would have precision 128,
just like the offset_ints.  And I wouldn't really say that those wide_ints
were fundamentally signed in any way.  Although the tree layer might know
that X upper bits of the bitsizetype are always signs, the tree-wide_int
interface treats them in the same way as any other 128-bit type.

Maybe I'm just being pedantic, but I think offset_int would only be like
widest_int if bitsizetype had precision 67 or whatever.  Then we could
say that both offset_int and widest_int must be wider than any inputs,
meaning that there's at least one leading sign bit.

This is related to the way that we have to assert:

template int N
inline wi::extended_tree N::extended_tree (const_tree t)
  : m_t (t)
{
  gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) = N);
}

rather than:

template int N
inline wi::extended_tree N::extended_tree (const_tree t)
  : m_t (t)
{
  gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t))  N);
}

(which would give slightly better offset_int code, because we could then
always use TREE_INT_CST_EXT_NUNITS.)

Thanks,
Richard

Re: [PATCH] Introduce [sg]et_nonzero_bits

2013-10-30 Thread Richard Biener

On Tue, 29 Oct 2013, Jakub Jelinek wrote:

 Hi!
 
 On Tue, Oct 29, 2013 at 04:29:56PM +0100, Jakub Jelinek wrote:
   Surely you can't rely on CCP and VRP compute exactly the same
   nonzero_bits.  As you don't record/compute zero_bits you can't
   tell whether a not set bit in nonzer_bits is don't know or
   if it is zero.  And you cannot do an assert during iteration
 
 Unset bits in get_nonzero_bits are always known to be zero, set bits
 are don't know or known to be nonzero.
 
   (during iteration optimistically 'wrong' values can disappear).
   
   During CCP iteration the rule is that bits may only be added to mask
   (and obviously the constant itself on a CONSTANT lattice value may
   not change).
 
 Here is an untested safer variant of the original patch that attempts
 to honor the constraints you've mentioned by looking at get_value
 and either making nonzero_bits more conservative or not applying it at
 all if the above rules would be violated.  Still, not doing anything
 for the !is_constant case, because I don't know what would be the right
 thing.  Say, if for iteration we assume some constant value that has some
 bits unset in mask (known) and set in value (known non-zero), yet they
 are clear in get_nonzero_bits, shall we punt, or ignore what the iteration
 would try to do and adjust from get_nonzero_bits instead, something else?

Well, you'd unconditionally make the lattice value more constant
using get_nonzero_bits (which doesn't change during iteration).
If it is applied consistently then convergence shouldn't change.
So as how get_nonzero_bits works you can only force the kown
zero bits to the lattice.  You don't need to be concerned about
invalid lattice transitions as with always applying a constant
op you'll never get those.  [and if you'd ever arrive at
get_nonzero_bits saying that a bit is zero but the lattice
value is not zero in that bit you have the choice of dropping
to don't know for that bit or enforcing the zeroness of the
bit by changing the lattice value as well as its mask]

Richard.

  
   Factor this out to commonize with code in evaluate_stmt
  
  Ok.
 
 But in this more conservative patch I don't see enough code to commonize
 any more.
 
 The only change in the patch below since the previous patch is in the
 evaluate_stmt function:
 
 2013-10-29  Jakub Jelinek  ja...@redhat.com
 
   * gimple-pretty-print.c (dump_ssaname_info): Print newline also
   in case of VR_VARYING.  Print get_nonzero_bits if not all ones.
   * tree-ssanames.h (struct range_info_def): Add nonzero_bits field.
   (set_nonzero_bits, get_nonzero_bits): New prototypes.
   * tree-ssa-ccp.c (get_default_value): Use get_range_info to see if
   a default def isn't partially constant.
   (ccp_finalize): If after IPA, set_range_info if integral SSA_NAME
   is known to be partially zero.
   (evaluate_stmt): If we'd return otherwise VARYING, use get_range_info
   to see if a default def isn't partially constant.
   * tree-ssanames.c (set_range_info): Initialize nonzero_bits upon
   creation of a range, if VR_RANGE, try to improve nonzero_bits from
   the range.
   (set_nonzero_bits, get_nonzero_bits): New functions.
 
   * g++.dg/warn/pr33738.C (main): Initialize a2 again to make sure
   we warn about it already during VRP1 pass.
 
 --- gcc/gimple-pretty-print.c.jj  2013-10-23 14:43:12.0 +0200
 +++ gcc/gimple-pretty-print.c 2013-10-24 17:26:59.650945232 +0200
 @@ -1731,7 +1731,7 @@ dump_ssaname_info (pretty_printer *buffe
if (!POINTER_TYPE_P (TREE_TYPE (node))
 SSA_NAME_RANGE_INFO (node))
  {
 -  double_int min, max;
 +  double_int min, max, nonzero_bits;
value_range_type range_type = get_range_info (node, min, max);
  
if (range_type == VR_VARYING)
 @@ -1744,8 +1744,20 @@ dump_ssaname_info (pretty_printer *buffe
 pp_printf (buffer, , );
 pp_double_int (buffer, max, TYPE_UNSIGNED (TREE_TYPE (node)));
 pp_printf (buffer, ]);
 -   newline_and_indent (buffer, spc);
   }
 +  nonzero_bits = get_nonzero_bits (node);
 +  if (nonzero_bits != double_int_minus_one
 +(nonzero_bits
 +   != double_int::mask (TYPE_PRECISION (TREE_TYPE (node)
 + {
 +   pp_string (buffer,  NONZERO );
 +   sprintf (pp_buffer (buffer)-digit_buffer,
 +HOST_WIDE_INT_PRINT_DOUBLE_HEX,
 +(unsigned HOST_WIDE_INT) nonzero_bits.high,
 +nonzero_bits.low);
 +   pp_string (buffer, pp_buffer (buffer)-digit_buffer);
 + }
 +  newline_and_indent (buffer, spc);
  }
  }
  
 --- gcc/tree-ssanames.h.jj2013-09-27 15:42:44.0 +0200
 +++ gcc/tree-ssanames.h   2013-10-24 15:52:53.765955605 +0200
 @@ -52,6 +52,8 @@ struct GTY (()) range_info_def {
double_int min;
/* Maximum for value range.  */
double_int max;
 +  /* Non-zero bits - bits not set are guaranteed to be always zero.  */
 +

Re: [PATCH] Use get_nonzero_bits to improve vectorization

2013-10-30 Thread Richard Biener

On Tue, 29 Oct 2013, Jakub Jelinek wrote:

 On Tue, Oct 29, 2013 at 01:11:53PM +0100, Richard Biener wrote:
   +/* Return number of known trailing zero bits in EXPR, or, if the value of
   +   EXPR is known to be zero, the precision of it's type.  */
   +
   +int
  
  unsigned int?
 
 Ok.
 
   +case PLUS_EXPR:
   +case MINUS_EXPR:
   +case BIT_IOR_EXPR:
   +case BIT_XOR_EXPR:
   +case MIN_EXPR:
   +case MAX_EXPR:
   +  ret1 = tree_ctz (TREE_OPERAND (expr, 0));
   +  ret2 = tree_ctz (TREE_OPERAND (expr, 1));
  
  This first recurses but if either one returns 0 you don't have
  to (that cuts down the recursion from exponential to linear in
  the common case?).  Thus, early out on ret == 0?
 
 Yeah, that is reasonable.  Usually we use this during expansion etc.,
 trees won't be arbitrarily large and for SSA_NAMEs we don't recurse
 on definitions, so I think we are unlikely to ever see very large
 expressions there though.  I've added similar early out to the COND_EXPR
 case.
 
   +  return MIN (ret1, ret2);
   +case POINTER_PLUS_EXPR:
   +  ret1 = tree_ctz (TREE_OPERAND (expr, 0));
   +  ret2 = tree_ctz (TREE_OPERAND (expr, 1));
   +  ret2 = MIN (ret2, prec);
  
  Why do you need that here but not elsewhere when processing
  binary ops?
 
 Because POINTER_PLUS_EXPR (well, also shifts, but for those we
 don't call tree_ctz on the second argument) is the only
 binop we handle that uses different types for the operands,
 for the first argument we know it has the same precision as the result, but
 what if sizetype has bigger precision than TYPE_PRECISION of pointers?
 I know it typically doesn't, just wanted to make sure we never return
 an out of range return value, tree_ctz result should be in [0, prec]
 always.

Ah, indeed.  Maybe worth a comment.

   +  return MIN (ret1, ret2);
   +case BIT_AND_EXPR:
   +  ret1 = tree_ctz (TREE_OPERAND (expr, 0));
   +  ret2 = tree_ctz (TREE_OPERAND (expr, 1));
   +  return MAX (ret1, ret2);
   +case MULT_EXPR:
   +  ret1 = tree_ctz (TREE_OPERAND (expr, 0));
   +  ret2 = tree_ctz (TREE_OPERAND (expr, 1));
   +  return MIN (ret1 + ret2, prec);
   +case LSHIFT_EXPR:
   +  ret1 = tree_ctz (TREE_OPERAND (expr, 0));
   +  if (host_integerp (TREE_OPERAND (expr, 1), 1)
  
  check that first before recursing for op0 - if op1 is negative
  you simply return ret1 which looks wrong, too.
 
 If op1 is negative, then it is undefined behavior, so assuming
 in that case the same thing as when op1 is not constant (i.e.
 that we worst case shift left by 0 and thus not increase number
 of trailing zeros, or shift left by  0 and increase it) is IMHO
 correct.  I don't think we treat left shift by negative value as
 right shift anywhere, do we?

We do during constant folding I think.

   +((unsigned HOST_WIDE_INT) tree_low_cst (TREE_OPERAND (expr, 1), 1)
   +(unsigned HOST_WIDE_INT) prec))
  
  This check is to avoid overflowing ret1 + ret2?  If so, why not add
  
   + {
   +   ret2 = tree_low_cst (TREE_OPERAND (expr, 1), 1);
  
ret2 = MIN (ret2, prec);
  
  instead?
 
 Because ret{1,2} are int (well, with the change you've suggested unsigned
 int), but tree_low_cst is UHWI, so if the shift count is say UINT_MAX + 1
 (yeah, surely undefined behavior), I just didn't want to lose any of the upper
 bits.  Sure, alternatively I could have an UHWI temporary variable and
 assign tree_low_cst to it and do the MIN on that temporary and prec, but
 still I think it is better to handle out of range constants as variable
 shift count rather than say shift count up by prec - 1.

Ok.

Thanks,
Richard.

   +case TRUNC_DIV_EXPR:
   +case CEIL_DIV_EXPR:
   +case FLOOR_DIV_EXPR:
   +case ROUND_DIV_EXPR:
   +case EXACT_DIV_EXPR:
   +  if (TREE_CODE (TREE_OPERAND (expr, 1)) == INTEGER_CST)
   + {
   +   ret2 = tree_log2 (TREE_OPERAND (expr, 1));
   +   if (ret2 = 0  tree_int_cst_sgn (TREE_OPERAND (expr, 1)) == 1)
  
  cheaper to test the sign first.
 
 Ok.
  
   + {
   +   ret1 = tree_ctz (TREE_OPERAND (expr, 0));
   +   if (ret1  ret2)
   + return ret1 - ret2;
   + }
   + }
   +  return 0;
   +CASE_CONVERT:
   +  ret1 = tree_ctz (TREE_OPERAND (expr, 0));
   +  if (ret1  ret1 == TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (expr, 
   0
   + ret1 = prec;
   +  return MIN (ret1, prec);
   +case SAVE_EXPR:
   +  return tree_ctz (TREE_OPERAND (expr, 0));
   +case COND_EXPR:
   +  ret1 = tree_ctz (TREE_OPERAND (expr, 1));
   +  ret2 = tree_ctz (TREE_OPERAND (expr, 2));
   +  return MIN (ret1, ret2);
   +case COMPOUND_EXPR:
   +  return tree_ctz (TREE_OPERAND (expr, 1));
   +case ADDR_EXPR:
   +  ret1 = get_object_alignment (TREE_OPERAND (expr, 0));
  
  Use get_pointer_alignment, this isn't a memory reference so type
  alignment rules don't apply.
 
 Ok.

Re: [RFC PATCH] For TARGET_AVX use *movmode_internal for misaligned loads

2013-10-30 Thread Jakub Jelinek

On Wed, Oct 30, 2013 at 11:55:44AM +0100, Uros Bizjak wrote:
 On Wed, Oct 30, 2013 at 10:47 AM, Jakub Jelinek ja...@redhat.com wrote:
 
  Yesterday I've noticed that for AVX which allows unaligned operands in
  AVX arithmetics instructions we still don't combine unaligned loads with the
  AVX arithmetics instructions.  So say for -O2 -mavx -ftree-vectorize
 
 This is actually PR 47754 that fell below radar for some reason...

Apparently yes.

  the patch attempts to avoid gen_lowpart on the non-MEM lhs of the unaligned
  loads, which usually means combine will fail, by doing the load into a
  temporary pseudo in that case and then doing a pseudo to pseudo move with
  gen_lowpart on the rhs (which will be merged soon after into following
  instructions).
 
 Is this similar to PR44141? There were similar problems with V4SFmode
 subregs, so combine was not able to merge load to the arithemtic insn.

From the work on the vectorization last year I remember many cases where
subregs (even equal size) on the LHS of instructions prevented combiner or
other RTL optimizations from doing it's job.  I believe I've changed some
easy places that did that completely unnecessarily, but certainly have not
went through all the code to look for other places where this is done.

Perhaps let's hack up a checking pass that will after expansion walk the
whole IL and complain about same sized subregs on the LHS of insns, then do make
check with it for a couple of ISAs (-msse2,-msse4,-mavx,-mavx2 e.g.?

  I'll bootstrap/regtest this on x86_64-linux and i686-linux, unfortunately my
  bootstrap/regtest server isn't AVX capable.
 
 I can bootstrap the patch later today on IvyBridge with
 --with-arch=core-avx-i --with-cpu=core-avx-i --with-fpmath=avx.

That would be greatly appreciated, thanks.

Jakub

Re: [PATCH] Introducing SAD (Sum of Absolute Differences) operation to GCC vectorizer.

2013-10-30 Thread Richard Biener

On Tue, 29 Oct 2013, Cong Hou wrote:

 Hi
 
 SAD (Sum of Absolute Differences) is a common and important algorithm
 in image processing and other areas. SSE2 even introduced a new
 instruction PSADBW for it. A SAD loop can be greatly accelerated by
 this instruction after being vectorized. This patch introduced a new
 operation SAD_EXPR and a SAD pattern recognizer in vectorizer.
 
 The pattern of SAD is shown below:
 
  unsigned type x_t, y_t;
  signed TYPE1 diff, abs_diff;
  TYPE2 sum = init;
loop:
  sum_0 = phi init, sum_1
  S1  x_t = ...
  S2  y_t = ...
  S3  x_T = (TYPE1) x_t;
  S4  y_T = (TYPE1) y_t;
  S5  diff = x_T - y_T;
  S6  abs_diff = ABS_EXPR diff;
  [S7  abs_diff = (TYPE2) abs_diff;  #optional]
  S8  sum_1 = abs_diff + sum_0;
 
where 'TYPE1' is at least double the size of type 'type', and 'TYPE2' is 
 the
same size of 'TYPE1' or bigger. This is a special case of a reduction
computation.
 
 For SSE2, type is char, and TYPE1 and TYPE2 are int.
 
 
 In order to express this new operation, a new expression SAD_EXPR is
 introduced in tree.def, and the corresponding entry in optabs is
 added. The patch also added the define_expand for SSE2 and AVX2
 platforms for i386.
 
 The patch is pasted below and also attached as a text file (in which
 you can see tabs). Bootstrap and make check got passed on x86. Please
 give me your comments.

Apart from the testcase comment made earlier

+++ b/gcc/tree-cfg.c
@@ -3797,6 +3797,7 @@ verify_gimple_assign_ternary (gimple stmt)
   return false;

 case DOT_PROD_EXPR:
+case SAD_EXPR:
 case REALIGN_LOAD_EXPR:
   /* FIXME.  */
   return false;

please add proper verification of the operand types.

+/* Widening sad (sum of absolute differences).
+   The first two arguments are of type t1 which should be unsigned 
integer.
+   The third argument and the result are of type t2, such that t2 is at 
least
+   twice the size of t1. SAD_EXPR(arg1,arg2,arg3) is equivalent to:
+   tmp1 = WIDEN_MINUS_EXPR (arg1, arg2);
+   tmp2 = ABS_EXPR (tmp1);
+   arg3 = PLUS_EXPR (tmp2, arg3);   */
+DEFTREECODE (SAD_EXPR, sad_expr, tcc_expression, 3)

WIDEN_MINUS_EXPR doesn't exist so you have to explain on its
operation (it returns a signed wide difference?).  Why should
the first two arguments be unsigned?  I cannot see a good reason
to require that (other than that maybe the x86 target only has
support for widened unsigned difference?).  So if you want to
make that restriction maybe change the name to SADU_EXPR
(sum of absolute differences of unsigned)?

I suppose you tried introducing WIDEN_MINUS_EXPR instead and
letting combine do it's work, avoiding the very special optab?

Thanks,
Richard.

 
 
 thanks,
 Cong
 
 
 
 diff --git a/gcc/ChangeLog b/gcc/ChangeLog
 index 8a38316..d528307 100644
 --- a/gcc/ChangeLog
 +++ b/gcc/ChangeLog
 @@ -1,3 +1,23 @@
 +2013-10-29  Cong Hou  co...@google.com
 +
 + * tree-vect-patterns.c (vect_recog_sad_pattern): New function for SAD
 + pattern recognition.
 + (type_conversion_p): PROMOTION is true if it's a type promotion
 + conversion, and false otherwise.  Return true if the given expression
 + is a type conversion one.
 + * tree-vectorizer.h: Adjust the number of patterns.
 + * tree.def: Add SAD_EXPR.
 + * optabs.def: Add sad_optab.
 + * cfgexpand.c (expand_debug_expr): Add SAD_EXPR case.
 + * expr.c (expand_expr_real_2): Likewise.
 + * gimple-pretty-print.c (dump_ternary_rhs): Likewise.
 + * gimple.c (get_gimple_rhs_num_ops): Likewise.
 + * optabs.c (optab_for_tree_code): Likewise.
 + * tree-cfg.c (estimate_operator_cost): Likewise.
 + * tree-ssa-operands.c (get_expr_operands): Likewise.
 + * tree-vect-loop.c (get_initial_def_for_reduction): Likewise.
 + * config/i386/sse.md: Add SSE2 and AVX2 expand for SAD.
 +
  2013-10-14  David Malcolm  dmalc...@redhat.com
 
   * dumpfile.h (gcc::dump_manager): New class, to hold state
 diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
 index 7ed29f5..9ec761a 100644
 --- a/gcc/cfgexpand.c
 +++ b/gcc/cfgexpand.c
 @@ -2730,6 +2730,7 @@ expand_debug_expr (tree exp)
   {
   case COND_EXPR:
   case DOT_PROD_EXPR:
 + case SAD_EXPR:
   case WIDEN_MULT_PLUS_EXPR:
   case WIDEN_MULT_MINUS_EXPR:
   case FMA_EXPR:
 diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
 index c3f6c94..ca1ab70 100644
 --- a/gcc/config/i386/sse.md
 +++ b/gcc/config/i386/sse.md
 @@ -6052,6 +6052,40 @@
DONE;
  })
 
 +(define_expand sadv16qi
 +  [(match_operand:V4SI 0 register_operand)
 +   (match_operand:V16QI 1 register_operand)
 +   (match_operand:V16QI 2 register_operand)
 +   (match_operand:V4SI 3 register_operand)]
 +  TARGET_SSE2
 +{
 +  rtx t1 = gen_reg_rtx (V2DImode);
 +  rtx t2 = gen_reg_rtx (V4SImode);
 +  emit_insn (gen_sse2_psadbw (t1, operands[1], operands[2]));
 +  convert_move (t2, t1, 0);
 +  emit_insn (gen_rtx_SET (VOIDmode, operands[0],
 +  gen_rtx_PLUS (V4SImode,
 + operands[3], t2)));
 +  DONE;
 +})
 +

RE: [PATCH 1/n] Add conditional compare support

2013-10-30 Thread Hans-Peter Nilsson

On Wed, 30 Oct 2013, Zhenqiang Chen wrote:
  -Original Message-
  From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
  ow...@gcc.gnu.org] On Behalf Of Hans-Peter Nilsson
  One thing I don't see other people mentioning, is that this patch has just
 too
  much code inside #ifdef HAVE_ccmp ... #endif.
 
  I couldn't actually find the part that *requires* that, i.e.
  code that does something like gen_ccmp (...) but maybe it's there.

 In the arm.md, I define (define_expand ccmp  It will be transferred
 to function gen_ccmp (...). And it will define HAVE_ccmp.

I know.  I have not misunderstood that machinery.  The point
again, is that inside the #ifdef HAVE_ccmp there should
contain as few lines as possible.

 In the updated patch, I add two backends hooks to generate conditional
 compare instruction.

And that is good.  Where you use hooks, we don't need the #ifdef
HAVE_ccmp.  That is one of the major points with hooks.

  Where needed and where the conditioned code is more than a few lines,
  such code is preferably transformed into if (0):d code using constructs
 like
  that at builtins.c:5354:
 
  #ifndef HAVE_atomic_clear
  # define HAVE_atomic_clear 0
  # define gen_atomic_clear(x,y) (gcc_unreachable (), NULL_RTX) #endif
 
  Right, this causes dead code, but for maintenance it's *much* better than
  when only a fraction of the code being compiled for other targets.  (Also,
 the

 HAVE_ccmp is automatically generated when the target has ccmp instruction.

Exactly like with my example; atomic_clear is generated from
an insn pattern.  ...or actually, it appears it's *supposed* to
be, but I don't see any target defining it and it's not special
in genconfig.c and not even documented in md.texi so that might
be code dead everywhere!  We have atomic_store but no target
defines atomic_clear; certainly not without a mode.  My bad for
using a bad example.  For a better, live example, see
function.c:4665 same construct as above but with
HAVE_stack_protect_set.

 The code segments in #ifdef HAVE_ccmp ... #endif will *not* be compiled for
 other targets, which do not define ccmp instruction.

I think you misunderstood: *that is the problem*.  We want as
little as possible code to be conditional and inside #if...
preprocessor macros; we want it to use hooks and if (0) code.

brgds, H-P

Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces

2013-10-30 Thread Ilya Enkovich

2013/10/30 Richard Biener richard.guent...@gmail.com:
 On Wed, Oct 30, 2013 at 11:34 AM, Ilya Enkovich enkovich@gmail.com 
 wrote:
 On 30 Oct 10:26, Richard Biener wrote:

 Ick - you enlarge all return statements?  But you don't set the actual 
 value?
 So why allocate it with 2 ops in the first place??

 When return does not return bounds it has operand with zero value similar to 
 case when it does not return value. What is the difference then?


 [Seems I completely missed that MPX changes gimple and the design
 document that was posted somewhere??]


 Design is on GCC Wiki and link was posted few times: 
 http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler. 
 Here is quote about return statements:

 Returns instrumentation. We add new operand to return statement to hold 
 returned bounds and instrumentation pass is responsible to fill this operand 
 with correct bounds

 foo (int * p, unsigned int size)
 {
   unnamed type __bound_tmp.0;
   long unsigned int D.2239;
   long unsigned int _2;
   sizetype _6;
   int * _7;

   bb 3:
   __bound_tmp.0_4 = __builtin_ia32_arg_bnd (p_3(D));

   bb 2:
   _2 = (long unsigned int) size_1(D);
   __builtin_ia32_bndcl (__bound_tmp.0_4, p_3(D));
   _6 = _2 + 18446744073709551615;
   _7 = p_3(D) + _6;
   __builtin_ia32_bndcu (__bound_tmp.0_4, _7);
   access_and_store (p_3(D), __bound_tmp.0_4, size_1(D));

 so it seems there is now a mismatch between DECL_ARGUMENTS
 and the GIMPLE call stmt arguments.  How (if) did you amend
 the GIMPLE stmt verifier for this?

Verifier just ignores bound arguments while iterating through them.


 How does regular code deal with this which may expect matching
 to DECL_ARGUMENTS?  In fact interleaving the additional
 arguments sounds very error-prone for existing code - I'd have
 appended all bound args at the end.  Also you unconditionally
 claim all pointer arguments have a bound - that looks like bad
 design as well.  Why didn't you add a flag to the relevant
 PARM_DECL (and then, what do you do for indirect calls?).

I'll consider using another layout for bound args. But why should we
have any PARM_DECL or other pointer not having bounds?

 /* Return the number of arguments used by call statement GS
ignoring bound ones.  */

 static inline unsigned
 gimple_call_num_nobnd_args (const_gimple gs)
 {
   unsigned num_args = gimple_call_num_args (gs);
   unsigned res = num_args;
   for (unsigned n = 0; n  num_args; n++)
 if (POINTER_BOUNDS_P (gimple_call_arg (gs, n)))
   res--;
   return res;
 }

 the choice means that gimple_call_num_nobnd_args is not O(1).

Yes. And even having all bound args at the end would not fix it. We
have to additionally keep number of bounds if want to fix it. But I do
not see the strong reason for that. Currently there are just three
calls to this function in whole GCC.


 /* Return INDEX's call argument ignoring bound ones.  */
 static inline tree
 gimple_call_nobnd_arg (const_gimple gs, unsigned index)
 {
   /* No bound args may exist if pointers checker is off.  */
   if (!flag_check_pointer_bounds)
 return gimple_call_arg (gs, index);
   return gimple_call_arg (gs, gimple_call_get_nobnd_arg_index (gs, index));
 }

 GIMPLE layout depending on flag_check_pointer_bounds sounds
 like a recipie for desaster if you consider TUs compiled with and
 TUs compiled without and LTO.  Or if you consider using
 optimized attribute with that flag.

Yep. Marking instrumented calls and functions would be useful in LTO case.


 I wish I had seen all this before.

 Bah.

 Where is the update to gimple.texi and tree.texi?

 Richard.


 Unfortunately patch has been already installed.  Should we uninstall it?  If 
 not, then here is patch for documentation.

 I hope the reviewers that approved the patch will work with you to
 address the above issues.  I can't be everywhere.

Thanks for valuable input!

Ilya


 Richard.

 Thanks,
 Ilya
 --

 gcc/

 2013-10-30  Ilya Enkovich  ilya.enkov...@intel.com

 * doc/gimple.texi (gimple_call_num_nobnd_args): New.
 (gimple_call_nobnd_arg): New.
 (gimple_return_retbnd): New.
 (gimple_return_set_retbnd): New.
 (gimple_call_get_nobnd_arg_index): New.


 diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
 index 7bd9fd5..be74170 100644
 --- a/gcc/doc/gimple.texi
 +++ b/gcc/doc/gimple.texi
 @@ -1240,11 +1240,25 @@ Set @code{CHAIN} to be the static chain for call 
 statement @code{G}.
  Return the number of arguments used by call statement @code{G}.
  @end deftypefn

 +@deftypefn {GIMPLE function} unsigned gimple_call_num_nobnd_args (gimple g)
 +Return the number of arguments used by call statement @code{G}
 +ignoring bound ones.
 +@end deftypefn
 +
  @deftypefn {GIMPLE function} tree gimple_call_arg (gimple g, unsigned index)
  Return the argument at position @code{INDEX} for call statement @code{G}.  
 The
  first argument is 0.
  @end deftypefn

 +@deftypefn {GIMPLE function} tree gimple_call_nobnd_arg (gimple

[PATCH] Fix PR57100, add pre_and_rev_post_order_compute_fn

2013-10-30 Thread Richard Biener


This adds a struct function taking variant of 
pre_and_rev_post_order_compute that also drops the assert that
all blocks were visited (consistent with the function comment).
This makes graph.c not ICE on CFGs with unreachable blocks
which it tries to handle already, but it didn't notice that
RPO order is filled from the end and that pre_and_rev_post_order_compute
ICEs anyway for n != n_basic_blocks_for_function ...

The reverse filling is a good reason to keep the assert in
the pre_and_rev_post_order_compute variant that now wraps
pre_and_rev_post_order_compute_fn.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2013-10-30  Richard Biener  rguent...@suse.de

PR middle-end/57100
* basic-block.h (pre_and_rev_post_order_compute_fn): New function.
* cfganal.c (pre_and_rev_post_order_compute_fn): New function
as worker for ...
(pre_and_rev_post_order_compute): ... which now wraps it.
* graph.c (draw_cfg_nodes_no_loops): Use
pre_and_rev_post_order_compute_fn to avoid ICEing and dependence
on cfun.

Index: gcc/basic-block.h
===
*** gcc/basic-block.h   (revision 204202)
--- gcc/basic-block.h   (working copy)
*** extern void connect_infinite_loops_to_ex
*** 795,800 
--- 795,802 
  extern int post_order_compute (int *, bool, bool);
  extern basic_block dfs_find_deadend (basic_block);
  extern int inverted_post_order_compute (int *);
+ extern int pre_and_rev_post_order_compute_fn (struct function *,
+ int *, int *, bool);
  extern int pre_and_rev_post_order_compute (int *, int *, bool);
  extern int dfs_enumerate_from (basic_block, int,
   bool (*)(const_basic_block, const void *),
Index: gcc/cfganal.c
===
*** gcc/cfganal.c   (revision 204202)
--- gcc/cfganal.c   (working copy)
*** inverted_post_order_compute (int *post_o
*** 878,897 
return post_order_num;
  }
  
! /* Compute the depth first search order and store in the array
!   PRE_ORDER if nonzero, marking the nodes visited in VISITED.  If
!   REV_POST_ORDER is nonzero, return the reverse completion number for each
!   node.  Returns the number of nodes visited.  A depth first search
!   tries to get as far away from the starting point as quickly as
!   possible.
! 
!   pre_order is a really a preorder numbering of the graph.
!   rev_post_order is really a reverse postorder numbering of the graph.
!  */
  
  int
! pre_and_rev_post_order_compute (int *pre_order, int *rev_post_order,
!   bool include_entry_exit)
  {
edge_iterator *stack;
int sp;
--- 878,899 
return post_order_num;
  }
  
! /* Compute the depth first search order of FN and store in the array
!PRE_ORDER if nonzero.  If REV_POST_ORDER is nonzero, return the
!reverse completion number for each node.  Returns the number of nodes
!visited.  A depth first search tries to get as far away from the starting
!point as quickly as possible.
! 
!In case the function has unreachable blocks the number of nodes
!visited does not include them.
! 
!pre_order is a really a preorder numbering of the graph.
!rev_post_order is really a reverse postorder numbering of the graph.  */
  
  int
! pre_and_rev_post_order_compute_fn (struct function *fn,
!  int *pre_order, int *rev_post_order,
!  bool include_entry_exit)
  {
edge_iterator *stack;
int sp;
*** pre_and_rev_post_order_compute (int *pre
*** 921,927 
bitmap_clear (visited);
  
/* Push the first edge on to the stack.  */
!   stack[sp++] = ei_start (ENTRY_BLOCK_PTR-succs);
  
while (sp)
  {
--- 923,929 
bitmap_clear (visited);
  
/* Push the first edge on to the stack.  */
!   stack[sp++] = ei_start (ENTRY_BLOCK_PTR_FOR_FUNCTION (fn)-succs);
  
while (sp)
  {
*** pre_and_rev_post_order_compute (int *pre
*** 935,941 
dest = ei_edge (ei)-dest;
  
/* Check if the edge destination has been visited yet.  */
!   if (dest != EXIT_BLOCK_PTR  ! bitmap_bit_p (visited, dest-index))
{
  /* Mark that we have visited the destination.  */
  bitmap_set_bit (visited, dest-index);
--- 937,944 
dest = ei_edge (ei)-dest;
  
/* Check if the edge destination has been visited yet.  */
!   if (dest != EXIT_BLOCK_PTR_FOR_FUNCTION (fn)
!  ! bitmap_bit_p (visited, dest-index))
{
  /* Mark that we have visited the destination.  */
  bitmap_set_bit (visited, dest-index);
*** pre_and_rev_post_order_compute (int *pre
*** 956,962 
}
else
{
! if (ei_one_before_end_p (ei)  src != ENTRY_BLOCK_PTR

[C++ Patch / RFC] PR 29234

2013-10-30 Thread Paolo Carlini


Hi,

I'm having a look at this very old parsing issue. We reject things like:

struct S { void operator () (); };

void foo ()
{
  ( S()() );
}

because the parenthesized S()() triggers an hard error from groktypename 
as called at the end of cp_parser_type_id (called by 
cp_parser_cast_expression), about a function returning a function. Of 
course we have to parse it as an unary-expression.


I scratched my head for a while, but then I noticed that in 
cp_parser_cast_expression, a bit later the cp_parser_type_id call at 
issue, we are *already* using a cp_parser_tokens_start_cast_expression 
which helps use figuring out whether the tokens following the outer ')' 
make sense as a cast-expression. Using it a bit earlier to completely 
avoid calling cp_parser_type_id in the cases at issue, appears to work 
well, for a - negligeable in my opinion - additional computational cost.


Then, cp_parser_unary_expression is actually used. As part of that, 
cp_parser_postfix_expression looks for a compound-literal, but not in 
the robust way we are using elsewhere involving a 
cp_parser_skip_to_closing_parenthesis, but just using tentative parsing 
with cp_parser_type_id. Thus we are incurring again in the same issue. 
Parsing the compound-literal in the same way used elesewhere - thus 
looking for the '{' after the ')', solves the problem. In my personal 
opinion, again, the computational cost shouldn't be too high, because if 
we don't find '{' we completely avoid cp_parser_type_id.


Is this the kind of approach we want to pursue for this issue?

Tested x86_64-linux.

Thanks!
Paolo.

/


Index: cp/parser.c
===
--- cp/parser.c (revision 204202)
+++ cp/parser.c (working copy)
@@ -5800,31 +5800,45 @@ cp_parser_postfix_expression (cp_parser *parser, b
 cp_lexer_next_token_is (parser-lexer, CPP_OPEN_PAREN))
  {
tree initializer = NULL_TREE;
-   bool saved_in_type_id_in_expr_p;
+   bool compound_literal_p;
 
cp_parser_parse_tentatively (parser);
/* Consume the `('.  */
cp_lexer_consume_token (parser-lexer);
-   /* Parse the type.  */
-   saved_in_type_id_in_expr_p = parser-in_type_id_in_expr_p;
-   parser-in_type_id_in_expr_p = true;
-   type = cp_parser_type_id (parser);
-   parser-in_type_id_in_expr_p = saved_in_type_id_in_expr_p;
-   /* Look for the `)'.  */
-   cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN);
+
+   /* Avoid calling cp_parser_type_id pointlessly, see comment
+  in cp_parser_cast_expression about c++/29234.  */
+   cp_lexer_save_tokens (parser-lexer);
+
+   compound_literal_p
+ = (cp_parser_skip_to_closing_parenthesis (parser, false, false,
+   /*consume_paren=*/true)
+ cp_lexer_next_token_is (parser-lexer, CPP_OPEN_BRACE));
+
+   /* Roll back the tokens we skipped.  */
+   cp_lexer_rollback_tokens (parser-lexer);
+
+   if (!compound_literal_p)
+ cp_parser_simulate_error (parser);
+   else
+ {
+   /* Parse the type.  */
+   bool saved_in_type_id_in_expr_p = parser-in_type_id_in_expr_p;
+   parser-in_type_id_in_expr_p = true;
+   type = cp_parser_type_id (parser);
+   parser-in_type_id_in_expr_p = saved_in_type_id_in_expr_p;
+   /* Look for the `)'.  */
+   cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN);
+ }
+
/* If things aren't going well, there's no need to
   keep going.  */
if (!cp_parser_error_occurred (parser))
  {
-   if (cp_lexer_next_token_is (parser-lexer, CPP_OPEN_BRACE))
- {
-   bool non_constant_p;
-   /* Parse the brace-enclosed initializer list.  */
-   initializer = cp_parser_braced_list (parser,
-non_constant_p);
- }
-   else
- cp_parser_simulate_error (parser);
+   bool non_constant_p;
+   /* Parse the brace-enclosed initializer list.  */
+   initializer = cp_parser_braced_list (parser,
+non_constant_p);
  }
/* If that worked, we're definitely looking at a
   compound-literal expression.  */
@@ -7559,7 +7573,9 @@ cp_parser_cast_expression (cp_parser *parser, bool
 {
   tree type = NULL_TREE;
   tree expr = NULL_TREE;
+  bool skip_to_closing_ok_p;
   bool compound_literal_p;
+  bool cast_expression_p;
   const char *saved_message;
 
   /* There's no way to know yet whether or not this is a cast.

Re: [PATCH 1/n] Add conditional compare support

2013-10-30 Thread Paolo Bonzini

Il 18/09/2013 11:45, Zhenqiang Chen ha scritto:
 +;; Expand conditional compare according to the instruction patterns.
 +(define_expand conditional_compare
 +  [(set (match_operand 0 s_register_operand )
 + (match_operator 1 
 +  [(match_operator:SI 2 arm_comparison_operator
 +   [(match_operand:SI 3 s_register_operand )
 +(match_operand:SI 4 arm_add_operand )])

My first reaction to this is that this operand should only match
const0_rtx, and operands 2/3 should be the same mode as operand 0 rather
than SImode.  In particular, perhaps they could be CCmode or BImode on
some architectures.

But then Richard is right that this would look a lot like the old
compare + branch patterns.

So either you could do this at expansion time using hooks, like Richard
mentioned, or you could do a pattern matching pass that is specific to
this pattern and combines sets with MODE_CC destinations across multiple
basic blocks.

Paolo


 +  (match_operator:SI 5 arm_comparison_operator
 +   [(match_operand:SI 6 s_register_operand )
 +(match_operand:SI 7 arm_add_operand )])]))]
 +  TARGET_ARM || TARGET_THUMB2
 +  {
 + enum machine_mode mode;
 + HOST_WIDE_INT cond_or;
 + rtx par;
 + enum rtx_code code = GET_CODE (operands[1]);
 + cond_or = code == AND? DOM_CC_X_AND_Y : DOM_CC_X_OR_Y;
 +
 + mode = arm_select_dominance_cc_mode (operands[2],
 +   operands[5],
 +   cond_or);
 + /* To match and/ior_scc_scc, mode should not be CCmode.  */
 + if (mode == CCmode)
 +   FAIL;
 +
 + operands[2] = gen_rtx_fmt_ee (GET_CODE (operands[2]),
 +GET_MODE (operands[3]),
 +operands[3], operands[4]);
 + operands[5] = gen_rtx_fmt_ee (GET_CODE (operands[5]),
 +GET_MODE (operands[5]),
 +operands[6], operands[7]);
 + operands[1] = gen_rtx_fmt_ee (GET_CODE (operands[1]), SImode,
 +operands[2], operands[5]);
 +
 +  /* Generate insn to match and/ior_scc_scc.  */
 +  par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
 +  XVECEXP (par, 0, 0) = gen_rtx_SET (GET_MODE (operands[0]),
 + operands[0], operands[1]);
 +  XVECEXP (par, 0, 1) = gen_rtx_CLOBBER (CCmode,
 +  gen_rtx_REG (CCmode, CC_REGNUM));
 +
 +  emit_insn (par);
 +
 + DONE;
 +  }

Re: Aliasing: look through pointer's def stmt

2013-10-30 Thread Marc Glisse


On Wed, 30 Oct 2013, Richard Biener wrote:


On Wed, Oct 30, 2013 at 1:09 AM, Marc Glisse marc.gli...@inria.fr wrote:

On Tue, 29 Oct 2013, Richard Biener wrote:


For the POINTER_PLUS_EXPR offset argument you should use
int_cst_value () to access it (it will unconditionally sign-extend)
and use host_integerp (..., 0).  That leaves the overflow possibility
in place (and you should multiply by BITS_PER_UNIT) which we
ignore in enough other places similar to this to ignore ...



Like this? (passes bootstrap+testsuite on x86_64-linux-gnu)

2013-10-30  Marc Glisse  marc.gli...@inria.fr


gcc/
* tree-ssa-alias.c (ao_ref_init_from_ptr_and_size): Look for a
POINTER_PLUS_EXPR in the defining statement.

gcc/testsuite/
* gcc.dg/tree-ssa/alias-24.c: New file.


--
Marc Glisse

Index: gcc/testsuite/gcc.dg/tree-ssa/alias-24.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/alias-24.c(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/alias-24.c(working copy)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options -O2 -fdump-tree-optimized } */
+
+void f (const char *c, int *i)
+{
+  *i = 42;
+  __builtin_memcpy (i + 1, c, sizeof (int));
+  if (*i != 42) __builtin_abort();
+}
+
+extern void keepit ();
+void g (const char *c, int *i)
+{
+  *i = 33;
+  __builtin_memcpy (i - 1, c, 3 * sizeof (int));
+  if (*i != 33) keepit();
+}
+
+/* { dg-final { scan-tree-dump-not abort optimized } } */
+/* { dg-final { scan-tree-dump keepit optimized } } */
+/* { dg-final { cleanup-tree-dump optimized } } */
+

Property changes on: gcc/testsuite/gcc.dg/tree-ssa/alias-24.c
___
Added: svn:keywords
## -0,0 +1 ##
+Author Date Id Revision URL
\ No newline at end of property
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Index: gcc/tree-ssa-alias.c
===
--- gcc/tree-ssa-alias.c(revision 204188)
+++ gcc/tree-ssa-alias.c(working copy)
@@ -567,20 +567,29 @@ void
 ao_ref_init_from_ptr_and_size (ao_ref *ref, tree ptr, tree size)
 {
   HOST_WIDE_INT t1, t2;
   ref-ref = NULL_TREE;
   if (TREE_CODE (ptr) == SSA_NAME)
 {
   gimple stmt = SSA_NAME_DEF_STMT (ptr);
   if (gimple_assign_single_p (stmt)
   gimple_assign_rhs_code (stmt) == ADDR_EXPR)
ptr = gimple_assign_rhs1 (stmt);
+  else if (is_gimple_assign (stmt)
+   gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR
+   host_integerp (gimple_assign_rhs2 (stmt), 0)
+   (t1 = int_cst_value (gimple_assign_rhs2 (stmt))) = 0)


No need to restrict this to positive offsets I think.


Actually there is. The function documents that it only handles nonnegative 
offsets, and if we ignore that, the keepit part of the testcase fails 
because ranges_overlap_p works with unsigned offsets. I can try to change 
ranges_overlap_p and remove this restriction from 
ao_ref_init_from_ptr_and_size, but that's more risky so I'd prefer to do 
that as a separate follow-up patch if that's ok.



+   {
+ ao_ref_init_from_ptr_and_size (ref, gimple_assign_rhs1 (stmt),
size);


Please don't recurse - forwprop combines series of POINTER_PLUS_EXPRs
and MEM[ptr, offset] so it shouldn't be necessary.


Done. (note that if it isn't necessary, then it doesn't hurt either ;-)


+ ref-offset += 8 * t1;


BITS_PER_UNIT instead of 8.  I'd say just have a 0-initialized
additional_offset var that you unconditionally add ...


I also changed a few other 8s.


+ return;
+   }
 }

   if (TREE_CODE (ptr) == ADDR_EXPR)
 ref-base = get_ref_base_and_extent (TREE_OPERAND (ptr, 0),
 ref-offset, t1, t2);
   else
 {
   ref-base = build2 (MEM_REF, char_type_node,
  ptr, null_pointer_node);
   ref-offset = 0;


... here at the end.


Here is a new version, same testing, same ChangeLog.

--
Marc GlisseIndex: gcc/testsuite/gcc.dg/tree-ssa/alias-24.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/alias-24.c(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/alias-24.c(working copy)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options -O2 -fdump-tree-optimized } */
+
+void f (const char *c, int *i)
+{
+  *i = 42;
+  __builtin_memcpy (i + 1, c, sizeof (int));
+  if (*i != 42) __builtin_abort();
+}
+
+extern void keepit ();
+void g (const char *c, int *i)
+{
+  *i = 33;
+  __builtin_memcpy (i - 1, c, 3 * sizeof (int));
+  if (*i != 33) keepit();
+}
+
+/* { dg-final { scan-tree-dump-not abort optimized } } */
+/* { dg-final { scan-tree-dump keepit optimized } } */
+/* { dg-final { cleanup-tree-dump optimized } } */
+

Property changes on: gcc/testsuite/gcc.dg/tree-ssa/alias-24.c
___
Added: svn:keywords

Re: [wide-int] Update main comment

2013-10-30 Thread Kenneth Zadeck


On 10/30/2013 07:01 AM, Richard Sandiford wrote:

Kenneth Zadeck zad...@naturalbridge.com writes:

On 10/29/2013 06:37 PM, Richard Sandiford wrote:

This patch tries to update the main wide_int comment to reflect the current
implementation.

- bitsizetype is TImode on x86_64 and others, so I don't think it's
necessarily true that all offset_ints are signed.  (widest_int are
though.)

i am wondering if this is too conservative an interpretation.I
believe that they are ti mode because that is the next thing after di
mode and so they wanted to accommodate the 3 extra bits. Certainly there
is no x86 that is able to address more than 64 bits.

Right, but my point is that it's a different case from widest_int.
It'd be just as valid to do bitsizetype arithmetic using wide_int
rather than offset_int, and those wide_ints would have precision 128,
just like the offset_ints.  And I wouldn't really say that those wide_ints
were fundamentally signed in any way.  Although the tree layer might know
that X upper bits of the bitsizetype are always signs, the tree-wide_int
interface treats them in the same way as any other 128-bit type.

Maybe I'm just being pedantic, but I think offset_int would only be like
widest_int if bitsizetype had precision 67 or whatever.  Then we could
say that both offset_int and widest_int must be wider than any inputs,
meaning that there's at least one leading sign bit.
this was of course what mike and i wanted, but we could not really 
figure out how to pull it off.
in particular, we could not find any existing reliable marker in the 
targets to say what the width of the widest pointer on any 
implementation.   We actually used the number 68 rather than 67 because 
we assumed 64 for the widest pointer on any existing platform, 3 bits 
for the bits and 1 bit for the sign.

This is related to the way that we have to assert:

template int N
inline wi::extended_tree N::extended_tree (const_tree t)
   : m_t (t)
{
   gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) = N);
}

rather than:

template int N
inline wi::extended_tree N::extended_tree (const_tree t)
   : m_t (t)
{
   gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t))  N);
}

(which would give slightly better offset_int code, because we could then
always use TREE_INT_CST_EXT_NUNITS.)

Thanks,
Richard

[c++-concepts] Constrained scope bugfix

2013-10-30 Thread Andrew Sutton

Partially fixing a bug that caused lookup errors in valid programs. For example:

templateInt T, Float U
  pairT, U::type void f(T, U); // Error, no such pair

When entering a template scope, we tried to match the template to one
having the same constraints. Obviously pair doesn't have Int and Float
constraints, and it probably doesn't have a partial specialization
with those constraints either.

I relaxed the fixup_template_type function so that it would just
return the looked-up type without emitting a diagnostic. This fix
makes the following a legal, however:

templatetypename T
  struct S { void f(); }

templateInt T
  void ST::f() { } // Should be an error

The right solution seems to be to diagnose the error only when
defining an out-of-class member by verifying that each template scope
in the qualified name matches a declaration with the same constraints.

2013-10-30  Andrew Sutton  andrew.n.sut...@gmail.com
* gcc/cp/semantics.c (fixup_template_type): Don't emit errors when
no templates can be found with matching constraints.
Index: gcc/cp/semantics.c
===
--- gcc/cp/semantics.c	(revision 203626)
+++ gcc/cp/semantics.c	(working copy)
@@ -2847,8 +2847,35 @@ finish_template_decl (tree parms)
 
 // Returns the template type of the class scope being entered. If we're
 // entering a constrained class scope. TYPE is the class template
-// scope being entered. If TYPE is not a class-type (e.g. a typename type),
-// then no fixup is needed.
+// scope being entered and we may need to match the intended type with
+// a constrained specialization. For example:
+//
+//templateObject T
+//  struct S { void f(); }; #1
+//
+//templateObject T
+//  void ST::f() { }  #2
+//
+// We check, in #2, that ST refers precisely to the type declared by
+// #1 (i.e., that the constraints match). Note that the following should
+// be an error since there is no specialization of ST that is 
+// unconstrained, but this is not diagnosed here.
+//
+//templatetypename T
+//  void ST::f() { }
+//
+// We cannot diagnose this problem here since this function also matches
+// qualified template names that are not part of a definition. For example:
+//
+//templateIntegral T, Floating_point U
+//  typename pairT, U::first_type void f(T, U);
+//
+// Here, it is unlikely that there is a partial specialization of
+// pair constrained for for Integral and Floating_point arguments.
+//
+// The general rule is: if a constrained specialization with matching
+// constraints is found return that type. Alos note that if TYPE is not a 
+// class-type (e.g. a typename type), then no fixup is needed.
 static tree
 fixup_template_type (tree type)
 {
@@ -2866,14 +2893,8 @@ fixup_template_type (tree type)
 return type;
   tree cur_constr = TEMPLATE_PARMS_CONSTRAINTS (parms);
 
-  // Do the constraints match those of the most general template? 
-  // If the constraints are NULL_TREE, this will match the most general
-  // template iff it is unconstrained.
+  // Search for a specialization whose constraints match.
   tree tmpl = CLASSTYPE_TI_TEMPLATE (type);
-  if (equivalent_constraints (cur_constr, DECL_CONSTRAINTS (tmpl)))
-return type;
-
-  // Can we find a specialization that matches?
   tree specs = DECL_TEMPLATE_SPECIALIZATIONS (tmpl);
   while (specs)
 {
@@ -2883,10 +2904,8 @@ fixup_template_type (tree type)
   specs = TREE_CHAIN (specs);
 }
 
-  // Emit an error, but return the type to allow processing to continue.
-  // TODO: We should emit candidates since we've just scanned the 
-  // list of template constraints.
-  error (type %qT does not match any declarations, type);
+  // If no specialization matches, then must return the type
+  // previously found.
   return type;
 }
 
@@ -2904,7 +2923,7 @@ finish_template_type (tree name, tree ar
   type = lookup_template_class (name, args,
 NULL_TREE, NULL_TREE, entering_scope,
 tf_warning_or_error | tf_user);
-  
+
   // If entering a scope, correct the lookup to account for constraints.
   if (entering_scope)
 type = fixup_template_type (type);

Re: [PATCH] Cilk Keywords (_Cilk_spawn and _Cilk_sync) for C (and C++)

2013-10-30 Thread Jakub Jelinek

On Mon, Oct 28, 2013 at 09:01:45PM +, Iyer, Balaji V wrote:
 Thanks! I will extract and check in the Cilk_spawn and _Cilk_sync for C work.

This broke bootstrap on i686-linux, fixed thusly, committed as obvious:

2013-10-30  Jakub Jelinek  ja...@redhat.com

* cilk.c (create_cilk_helper_decl): Use HOST_WIDE_INT_PRINT_DEC.

--- gcc/c-family/cilk.c.jj  2013-10-30 13:53:52.0 +0100
+++ gcc/c-family/cilk.c 2013-10-30 14:44:49.358912539 +0100
@@ -287,9 +287,9 @@ create_cilk_helper_decl (struct wrapper_
 {
   char name[20];
   if (wd-type == CILK_BLOCK_FOR)
-sprintf (name, _cilk_for_%ld, cilk_wrapper_count++);
+sprintf (name, _cilk_for_ HOST_WIDE_INT_PRINT_DEC, cilk_wrapper_count++);
   else if (wd-type == CILK_BLOCK_SPAWN)
-sprintf (name, _cilk_spn_%ld, cilk_wrapper_count++);
+sprintf (name, _cilk_spn_ HOST_WIDE_INT_PRINT_DEC, cilk_wrapper_count++);
   else
 gcc_unreachable (); 
   


Jakub

[c++-concepts] Diagnostics patch

2013-10-30 Thread Andrew Sutton

Applying a patch from Ville that adds diagnostics for the concept
specifier. Thanks Ville!

2013-10-30  Ville Voutilainen  ville.voutilai...@gmail.com
* gcc/cp/decl.c (grokdeclarator): Reject concept keyword
in typedefs, function parameters, data members, non-static
member functions and variables. Allow static member functions
to be concepts.

Andrew Sutton
Index: gcc/cp/decl.c
===
--- gcc/cp/decl.c	(revision 204092)
+++ gcc/cp/decl.c	(working copy)
@@ -9074,6 +9074,12 @@
   if (name == NULL)
 name = decl_context == PARM ? parameter : type name;
 
+  if (concept_p  typedef_p)
+{
+  error (%concept% cannot appear in a typedef declaration);
+  return error_mark_node;
+}
+
   if (constexpr_p  typedef_p)
 {
   error (%constexpr% cannot appear in a typedef declaration);
@@ -9387,9 +9393,12 @@
 	   || thread_p)
 	error (storage class specifiers invalid in parameter declarations);
 
+  /* Function parameters cannot be concept. */
+  if (concept_p)
+  error (a parameter cannot be declared %concept%);
   /* Function parameters cannot be constexpr.  If we saw one, moan
  and pretend it wasn't there.  */
-  if (constexpr_p)
+  else if (constexpr_p)
 {
   error (a parameter cannot be declared %constexpr%);
   constexpr_p = 0;
@@ -10619,6 +10628,11 @@
 			   uqname, ctype);
 		return error_mark_node;
 		  }
+if (concept_p)
+  {
+error (a destructor cannot be %concept%);
+return error_mark_node;
+  }
 if (constexpr_p)
   {
 error (a destructor cannot be %constexpr%);
@@ -10632,6 +10646,12 @@
 		   id_declarator-u.id.unqualified_name);
 		return error_mark_node;
 	  }
+	if (sfk == sfk_constructor)
+if (concept_p)
+  {
+error (a constructor cannot be %concept%);
+return error_mark_node;
+  }
 
 	/* Tell grokfndecl if it needs to set TREE_PUBLIC on the node.  */
 	function_context = (ctype != NULL_TREE) ?
@@ -10645,7 +10665,7 @@
 			   unqualified_id,
 			   virtualp, flags, memfn_quals, rqual, raises,
 			   friendp ? -1 : 0, friendp, publicp,
-   inlinep | (2 * constexpr_p),
+   inlinep | (2 * constexpr_p) | (4 * concept_p),
 			   sfk,
 			   funcdef_flag, template_count, in_namespace,
 			   attrlist, declarator-id_loc);
@@ -10739,8 +10759,12 @@
 		if (declspecs-gnu_thread_keyword_p)
 		  DECL_GNU_TLS_P (decl) = true;
 		  }
-
-		if (constexpr_p  !initialized)
+		if (concept_p)
+		  // TODO: This needs to be revisited once variable
+		  // templates are supported
+		error (static data member %qE declared %concept%,
+			   unqualified_id);
+		else if (constexpr_p  !initialized)
 		  {
 		error (constexpr static data member %qD must have an 
 			   initializer, decl);
@@ -10749,7 +10773,10 @@
 	  }
 	else
 	  {
-if (constexpr_p)
+		if (concept_p)
+		  error (non-static data member %qE declared %concept%,
+			 unqualified_id);
+else if (constexpr_p)
 		  {
 		error (non-static data member %qE declared %constexpr%,
 			   unqualified_id);
@@ -10897,6 +10924,15 @@
   {
 	/* It's a variable.  */
 
+	// TODO: This needs to be revisited once variable
+	// templates are supported
+	if (concept_p)
+	  {
+	error (variable %qE declared %concept%,
+		   unqualified_id);
+	return error_mark_node;
+	  }
+	
 	/* An uninitialized decl with `extern' is a reference.  */
 	decl = grokvardecl (type, unqualified_id,
 			declspecs,
Index: gcc/testsuite/g++.dg/concepts/decl-diagnose.C
===
--- gcc/testsuite/g++.dg/concepts/decl-diagnose.C	(revision 0)
+++ gcc/testsuite/g++.dg/concepts/decl-diagnose.C	(working copy)
@@ -0,0 +1,20 @@
+// { dg-options -std=c++11 }
+typedef concept int CINT; // { dg-error 'concept' cannot appear in a typedef declaration }
+
+void f(concept int); // { dg-error a parameter cannot be declared 'concept' }
+
+concept int f2(); // { dg-error result must be bool }
+concept bool f3();
+
+struct X
+{
+  concept int f4(); // { dg-error result must be bool|declared with function parameters }
+  concept bool f5(); // { dg-error declared with function parameters }
+  static concept bool f6();
+  static concept bool x; // { dg-error declared 'concept' }
+  concept int x2; // { dg-error declared 'concept' }
+  concept ~X(); // { dg-error a destructor cannot be 'concept' }
+  concept X(); // { dg-error a constructor cannot be 'concept' }
+};
+
+concept bool X2; // { dg-error declared 'concept' }

Re: Aliasing: look through pointer's def stmt

2013-10-30 Thread Richard Biener

On Wed, Oct 30, 2013 at 1:43 PM, Marc Glisse marc.gli...@inria.fr wrote:
 On Wed, 30 Oct 2013, Richard Biener wrote:

 On Wed, Oct 30, 2013 at 1:09 AM, Marc Glisse marc.gli...@inria.fr wrote:

 On Tue, 29 Oct 2013, Richard Biener wrote:

 For the POINTER_PLUS_EXPR offset argument you should use
 int_cst_value () to access it (it will unconditionally sign-extend)
 and use host_integerp (..., 0).  That leaves the overflow possibility
 in place (and you should multiply by BITS_PER_UNIT) which we
 ignore in enough other places similar to this to ignore ...



 Like this? (passes bootstrap+testsuite on x86_64-linux-gnu)

 2013-10-30  Marc Glisse  marc.gli...@inria.fr


 gcc/
 * tree-ssa-alias.c (ao_ref_init_from_ptr_and_size): Look for a
 POINTER_PLUS_EXPR in the defining statement.

 gcc/testsuite/
 * gcc.dg/tree-ssa/alias-24.c: New file.


 --
 Marc Glisse

 Index: gcc/testsuite/gcc.dg/tree-ssa/alias-24.c
 ===
 --- gcc/testsuite/gcc.dg/tree-ssa/alias-24.c(revision 0)
 +++ gcc/testsuite/gcc.dg/tree-ssa/alias-24.c(working copy)
 @@ -0,0 +1,22 @@
 +/* { dg-do compile } */
 +/* { dg-options -O2 -fdump-tree-optimized } */
 +
 +void f (const char *c, int *i)
 +{
 +  *i = 42;
 +  __builtin_memcpy (i + 1, c, sizeof (int));
 +  if (*i != 42) __builtin_abort();
 +}
 +
 +extern void keepit ();
 +void g (const char *c, int *i)
 +{
 +  *i = 33;
 +  __builtin_memcpy (i - 1, c, 3 * sizeof (int));
 +  if (*i != 33) keepit();
 +}
 +
 +/* { dg-final { scan-tree-dump-not abort optimized } } */
 +/* { dg-final { scan-tree-dump keepit optimized } } */
 +/* { dg-final { cleanup-tree-dump optimized } } */
 +

 Property changes on: gcc/testsuite/gcc.dg/tree-ssa/alias-24.c
 ___
 Added: svn:keywords
 ## -0,0 +1 ##
 +Author Date Id Revision URL
 \ No newline at end of property
 Added: svn:eol-style
 ## -0,0 +1 ##
 +native
 \ No newline at end of property
 Index: gcc/tree-ssa-alias.c
 ===
 --- gcc/tree-ssa-alias.c(revision 204188)
 +++ gcc/tree-ssa-alias.c(working copy)
 @@ -567,20 +567,29 @@ void
  ao_ref_init_from_ptr_and_size (ao_ref *ref, tree ptr, tree size)
  {
HOST_WIDE_INT t1, t2;
ref-ref = NULL_TREE;
if (TREE_CODE (ptr) == SSA_NAME)
  {
gimple stmt = SSA_NAME_DEF_STMT (ptr);
if (gimple_assign_single_p (stmt)
gimple_assign_rhs_code (stmt) == ADDR_EXPR)
 ptr = gimple_assign_rhs1 (stmt);
 +  else if (is_gimple_assign (stmt)
 +   gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR
 +   host_integerp (gimple_assign_rhs2 (stmt), 0)
 +   (t1 = int_cst_value (gimple_assign_rhs2 (stmt))) = 0)


 No need to restrict this to positive offsets I think.


 Actually there is. The function documents that it only handles nonnegative
 offsets, and if we ignore that, the keepit part of the testcase fails
 because ranges_overlap_p works with unsigned offsets. I can try to change
 ranges_overlap_p and remove this restriction from
 ao_ref_init_from_ptr_and_size, but that's more risky so I'd prefer to do
 that as a separate follow-up patch if that's ok.


 +   {
 + ao_ref_init_from_ptr_and_size (ref, gimple_assign_rhs1 (stmt),
 size);


 Please don't recurse - forwprop combines series of POINTER_PLUS_EXPRs
 and MEM[ptr, offset] so it shouldn't be necessary.


 Done. (note that if it isn't necessary, then it doesn't hurt either ;-)


 + ref-offset += 8 * t1;


 BITS_PER_UNIT instead of 8.  I'd say just have a 0-initialized
 additional_offset var that you unconditionally add ...


 I also changed a few other 8s.


 + return;
 +   }
  }

if (TREE_CODE (ptr) == ADDR_EXPR)
  ref-base = get_ref_base_and_extent (TREE_OPERAND (ptr, 0),
  ref-offset, t1, t2);
else
  {
ref-base = build2 (MEM_REF, char_type_node,
   ptr, null_pointer_node);
ref-offset = 0;


 ... here at the end.


 Here is a new version, same testing, same ChangeLog.

Ok.

Thanks,
Richard.

 --
 Marc Glisse
 Index: gcc/testsuite/gcc.dg/tree-ssa/alias-24.c
 ===
 --- gcc/testsuite/gcc.dg/tree-ssa/alias-24.c(revision 0)
 +++ gcc/testsuite/gcc.dg/tree-ssa/alias-24.c(working copy)
 @@ -0,0 +1,22 @@
 +/* { dg-do compile } */
 +/* { dg-options -O2 -fdump-tree-optimized } */
 +
 +void f (const char *c, int *i)
 +{
 +  *i = 42;
 +  __builtin_memcpy (i + 1, c, sizeof (int));
 +  if (*i != 42) __builtin_abort();
 +}
 +
 +extern void keepit ();
 +void g (const char *c, int *i)
 +{
 +  *i = 33;
 +  __builtin_memcpy (i - 1, c, 3 * sizeof (int));
 +  if (*i != 33) keepit();
 +}
 +
 +/* { dg-final { scan-tree-dump-not abort optimized } } */
 +/* { dg-final { scan-tree-dump keepit

Re: [PATCH GCC]Simplify address expression in IVOPT

2013-10-30 Thread Richard Biener

On Tue, Oct 29, 2013 at 10:18 AM, bin.cheng bin.ch...@arm.com wrote:
 Hi,
 I noticed that IVOPT generates complex address expressions like below for iv
 base.
 arr_base[0].y
 arr[0]
 MEM[p+o]
 It's even worse for targets support auto-increment addressing mode because
 IVOPT adjusts such base expression with +/- step, then creates below:
 arr_base[0].y +/- step
 arr[0] +/- step
 MEM[p+o] +/- step
 It has two disadvantages:
 1) Cost computation in IVOPT can't handle complex address expression and
 general returns spill_cost for it, which is bad since address iv is
 important to IVOPT.
 2) IVOPT creates duplicate candidates for IVs which have same value in
 different forms, for example, two candidates are generated with each for
 a[0] and a.  Again, it's even worse for auto-increment addressing
 mode.

 This patch fixes the issue by simplifying address expression at the entry of
 allocating IV struct.  Maybe the simplification can be put in various fold*
 functions but I think it might be better in this way, because:
 1) fold* functions are used from front-end to various tree optimizations,
 the simplified address expressions may not be what each optimizer wanted.
 Think about parallelism related passes, they might want the array index
 information kept for further analysis.
 2) In some way, the simplification is conflict with current implementation
 of fold* function.  Take fold_binary_loc as an example, it tries to simplify
 a[i1] +p c* i2 into a[i1+i2].  Of course we can simplify in this way
 for IVOPT too, but that will cause new problems like: a) we have to add code
 in IVOPT to cope with complex ARRAY_REF which is the exactly thing we want
 to avoid; b) the simplification can't always be done because of the
 sign/unsigned offset problem (especially for auto-increment addressing
 mode).
 3) There are many entry point for fold* functions, the change will be
 non-trivial.
 4) The simplification is only done in alloc_iv for true (not duplicate ones)
 iv struct, the number of such iv should be moderate.

 With these points, I think it might be a win to do the simplification in
 IVOPT and create a kind of sand box to let IVOPT play.  Any suggestions?

 Bootstrap and tested on x86/x86_64/arm.
 The patch causes three cases failed on some target, but all of them are
 false alarm, which can be resolved by refining test cases to check more
 accurate information.

 Is it OK?

Hmm.  I think you want what get_inner_reference_aff does, using
the return value of get_inner_reference as starting point for
determine_base_object.  And you don't want to restrict yourselves
so much on what exprs to process, but only exclude DECL_Ps.
Just amend get_inner_reference_aff to return the tree base object.

Note that this isn't really simplifying but rather lowering all
addresses.

The other changes in this patch are unrelated, right?

Thanks,
Richard.

 Thanks.
 bin

 gcc/testsuite/ChangeLog
 2013-10-29  Bin Cheng  bin.ch...@arm.com

 * gcc.dg/tree-ssa/loop-2.c: Refine check condition.
 * gcc.dg/tree-ssa/ivopt_infer_2.c: Ditto.
 * gcc.dg/tree-ssa/ivopt_mult_3.c: Ditto.

 2013-10-29  Bin Cheng  bin.ch...@arm.com

 * tree-ssa-loop-ivopts.c (alloc_iv): Simplify base expression.
 (get_shiftadd_cost): Check equality using operand_equal_p.
 (force_expr_to_var_cost): Refactor the code.  Handle type
 conversion.
 (split_address_cost): Call force_expr_to_var_cost.

Re: [PATCH][ubsan] Add VLA bound instrumentation

2013-10-30 Thread Marek Polacek

On Fri, Oct 25, 2013 at 03:04:41PM -0400, Jason Merrill wrote:
 I'm sorry, you want me to move the c++1y VLA check into
 compute_array_index_type, or just do the ubsan instrumentation in
 there?  Thanks,
 
 Both.

Unfortunately, I'm having quite a lot of trouble with side-effects. :(
For e.g.
int x = 1;
int a[++x];

with the following hunk

--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -8394,6 +8382,18 @@ compute_array_index_type (tree name, tree size, 
tsubst_flags_t com
  if (found)
itype = variable_size (fold (newitype));
}
+
+ if ((flag_sanitize  SANITIZE_VLA)
+  !processing_template_decl
+ /* From C++1y onwards, we throw an exception on a negative
+length size of an array; see above  */
+  cxx_dialect  cxx1y)
+   {
+ tree x = cp_save_expr (size);
+ x = build2 (COMPOUND_EXPR, TREE_TYPE (x),
+ ubsan_instrument_vla (input_location, x), x);
+ finish_expr_stmt (x);
+   }
}
   /* Make sure that there was no overflow when creating to a signed
 index type.  (For example, on a 32-bit machine, an array with

we generate

  int x = 1;
  int a[0:(sizetype) SAVE_EXPR D.2143];

  cleanup_point   int x = 1;;
  cleanup_point  Unknown tree: expr_stmt
  if (SAVE_EXPR  ++x = 0)
{   
  __builtin___ubsan_handle_vla_bound_not_positive (*.Lubsan_data0, 
(unsigned long) SAVE_EXPR  ++x);
}   
  else
{   
  0   
}, (void) SAVE_EXPR  ++x; ;
ssizetype D.2143;
  cleanup_point  Unknown tree: expr_stmt
  (void) (D.2143 = (ssizetype)  ++x + -1) ;
  cleanup_point   int a[0:(sizetype) SAVE_EXPR D.2143];;

that is, x is incremented twice and that is wrong.

Is it possible to tell x has already been evaluated, don't evaluate
it again so that the x isn't incremented in the cleanup_point?

Or, would you, please, have some other advice?  I've been looking into this
for quite some time now, but haven't been able to come up with anything
better than moving the checks back to create_array_type_for_decl, where it
all started ;).

Marek

patch to fix PR58784 (ARM LRA crash)

2013-10-30 Thread Vladimir Makarov


The following patch fixes

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58784

LRA has an old check of legitimate addresses.  It was written before a 
newer address decomposition code which makes more correct checks of 
addresses.


So I removed the old check.

Committed as rev. 204215.

2013-10-30  Vladimir Makarov  vmaka...@redhat.com

PR target/58784
* lra.c (check_rtl): Remove address check before LRA work.

2013-10-30  Vladimir Makarov  vmaka...@redhat.com

PR target/58784
* gcc.target/arm/pr58784.c: New.

Index: lra.c
===
--- lra.c   (revision 204131)
+++ lra.c   (working copy)
@@ -2017,10 +2017,8 @@
 static void
 check_rtl (bool final_p)
 {
-  int i;
   basic_block bb;
   rtx insn;
-  lra_insn_recog_data_t id;
 
   lra_assert (! final_p || reload_completed);
   FOR_EACH_BB (bb)
@@ -2036,31 +2034,13 @@
lra_assert (constrain_operands (1));
continue;
  }
+   /* LRA code is based on assumption that all addresses can be
+  correctly decomposed.  LRA can generate reloads for
+  decomposable addresses.  The decomposition code checks the
+  correctness of the addresses.  So we don't need to check
+  the addresses here.  */
if (insn_invalid_p (insn, false))
  fatal_insn_not_found (insn);
-   if (asm_noperands (PATTERN (insn)) = 0)
- continue;
-   id = lra_get_insn_recog_data (insn);
-   /* The code is based on assumption that all addresses in
-  regular instruction are legitimate before LRA.  The code in
-  lra-constraints.c is based on assumption that there is no
-  subreg of memory as an insn operand.  */
-   for (i = 0; i  id-insn_static_data-n_operands; i++)
- {
-   rtx op = *id-operand_loc[i];
-
-   if (MEM_P (op)
-(GET_MODE (op) != BLKmode
-   || GET_CODE (XEXP (op, 0)) != SCRATCH)
-! memory_address_p (GET_MODE (op), XEXP (op, 0))
-   /* Some ports don't recognize the following addresses
-  as legitimate.  Although they are legitimate if
-  they satisfies the constraints and will be checked
-  by insn constraints which we ignore here.  */
-GET_CODE (XEXP (op, 0)) != UNSPEC
-GET_RTX_CLASS (GET_CODE (XEXP (op, 0))) != RTX_AUTOINC)
- fatal_insn_not_found (insn);
- }
   }
 }
 #endif /* #ifdef ENABLE_CHECKING */
Index: testsuite/gcc.target/arm/pr58784.c
===
--- testsuite/gcc.target/arm/pr58784.c  (revision 0)
+++ testsuite/gcc.target/arm/pr58784.c  (working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options -march=armv7-a -mfloat-abi=hard -mfpu=neon -marm -O2 } */
+
+typedef struct __attribute__ ((__packed__))
+{
+char valueField[2];
+} ptp_tlv_t;
+typedef struct __attribute__ ((__packed__))
+{
+char stepsRemoved;
+ptp_tlv_t tlv[1];
+} ptp_message_announce_t;
+int ptplib_send_announce(int sequenceId, int i)
+{
+ptp_message_announce_t tx_packet;
+((long long *)tx_packet.tlv[0].valueField)[sequenceId] = i;
+f(tx_packet);
+}

RE: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp

2013-10-30 Thread Iyer, Balaji V

Hello Everyone,
What I ideally wanted to do with my testsuite files was that I want all 
the Cilk keywords test to compile no matter what the architecture is, but it 
should only run in certain architectures where the runtime is enabled (this is 
known statically and thus the testsuite doesn't have to do anything to figure 
it out.). Can someone please tell me how do I do this?

Thanks,

Balaji V. Iyer.

 -Original Message-
 From: Tobias Burnus [mailto:bur...@net-b.de]
 Sent: Wednesday, October 30, 2013 6:35 AM
 To: gcc patches; Iyer, Balaji V
 Subject: Testsuite / Cilk Plus: Include library path in compile flags in 
 gcc.dg/cilk-
 plus/cilk-plus.exp
 
 Without that patch, which I have copied from asan-dg.exp, I get tons of 
 failures
 because ld cannot find libcilkrts.
 
 OK for committal?
 
 Tobias

RFA: patch to fix PR58785 (an ARM LRA crash)

2013-10-30 Thread Vladimir Makarov


The following patch fixes:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58785

LRA chooses constraint 'm' for const_int operand.  It means that the 
const_int should be placed in memory but it does not happen as preferred 
reload class hook returns LO_REGS for class NO_REGS which is result of 
LRA choosing 'm'.  I don't know why reload pass needs such value but it 
should be return NO_REGS IMHO as it results in much less reload insns.


Is this patch ok to commit to the trunk?

2013-10-30  Vladimir Makarov  vmaka...@redhat.com

PR target/58785
* config/arm/arm.c (arm_preferred_reload_class): Don't return
LO_REGS for NO_REGS for LRA.

2013-10-30  Vladimir Makarov  vmaka...@redhat.com

PR target/58785
* gcc.target/arm/pr58785.c: New.


Index: config/arm/arm.c
===
--- config/arm/arm.c(revision 204213)
+++ config/arm/arm.c(working copy)
@@ -6884,7 +6884,7 @@ arm_preferred_reload_class (rtx x ATTRIB
 {
   if (rclass == GENERAL_REGS
  || rclass == HI_REGS
- || rclass == NO_REGS
+ || (! lra_in_progress  rclass == NO_REGS)
  || rclass == STACK_REG)
return LO_REGS;
   else
Index: testsuite/gcc.target/arm/pr58785.c
===
--- testsuite/gcc.target/arm/pr58785.c  (revision 0)
+++ testsuite/gcc.target/arm/pr58785.c  (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options -O2 -mthumb } */
+
+typedef _Fract HQtype __attribute__ ((mode (HQ)));
+extern void *memcpy (void *,const void *,int);
+
+HQtype f()
+{
+  HQtype c;
+  int z = 0xfada;
+  memcpy (c, z, 2);
+  return c;
+}

[patch] fix libstdc++/58912

2013-10-30 Thread Jonathan Wakely

This isn't a regression but is tiny and removes an unnecessary
pessimisation, so I'm committing it to the 4.7 and 4.8 branches.  The
problem doesn't exist on the trunk.

Tested x86_64-linux.

2013-10-30  Chris Studholme  c...@cs.utoronto.ca

PR libstdc++/58912
* include/bits/shared_ptr_base.h (_Sp_counted_ptr_inplace): Remove
unnecessary initialization of storage buffer.
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -394,7 +394,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 public:
   templatetypename... _Args
_Sp_counted_ptr_inplace(_Alloc __a, _Args... __args)
-   : _M_impl(__a), _M_storage()
+   : _M_impl(__a)
{
  _M_impl._M_ptr = static_cast_Tp*(static_castvoid*(_M_storage));
  // _GLIBCXX_RESOLVE_LIB_DEFECTS

Re: [PATCH] Do not append INTERNAL to the decl name

2013-10-30 Thread Jason Merrill


On 10/29/2013 01:37 PM, Dehao Chen wrote:

If we're actually emitting the name now, we need to give it a name different
from the complete constructor.  I suppose it makes sense to go with C4/D4 as
in the decloning patch,


Shall we do it in a separate patch? And I suppose binutils also need
to be updated for C4/D4?


In the same patch, please.  And yes, the demangler will need to be updated.

Jason

PATCH to use -Wno-format during stage1

2013-10-30 Thread Jason Merrill

I find -Wformat warnings about unknown % codes when building with an 
older GCC to be mildly annoying noise; this patch avoids them by passing 
-Wno-format during stage 1.


Tested x86_64-pc-linux-gnu.  Is this OK for trunk?  Do you have another 
theory of how this should work?


commit c40b06619fc9ef74e4d4d8b299a6c77c6fb63df5
Author: Jason Merrill ja...@redhat.com
Date:   Mon Oct 28 16:45:05 2013 -0400

/
	* Makefile.tpl (STAGE1_CONFIGURE_FLAGS): Pass
	--disable-build-format-warnings.
gcc/
	* configure.ac (loose_warn): Add -Wno-format if
	--disable-build-format-warnings.

diff --git a/Makefile.in b/Makefile.in
index 572b3d0..e0ba784 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -498,8 +498,10 @@ STAGE1_LANGUAGES = @stage1_languages@
 #   the last argument when conflicting --enable arguments are passed.
 # * Likewise, we force-disable coverage flags, since the installed
 #   compiler probably has never heard of them.
+# * We also disable -Wformat, since older GCCs don't understand newer %s.
 STAGE1_CONFIGURE_FLAGS = --disable-intermodule $(STAGE1_CHECKING) \
-	  --disable-coverage --enable-languages=$(STAGE1_LANGUAGES)
+	  --disable-coverage --enable-languages=$(STAGE1_LANGUAGES) \
+	  --disable-build-format-warnings
 
 STAGEprofile_CFLAGS = $(STAGE2_CFLAGS) -fprofile-generate
 STAGEprofile_TFLAGS = $(STAGE2_TFLAGS)
diff --git a/Makefile.tpl b/Makefile.tpl
index 3e187e1..65d070b 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -451,8 +451,10 @@ STAGE1_LANGUAGES = @stage1_languages@
 #   the last argument when conflicting --enable arguments are passed.
 # * Likewise, we force-disable coverage flags, since the installed
 #   compiler probably has never heard of them.
+# * We also disable -Wformat, since older GCCs don't understand newer %s.
 STAGE1_CONFIGURE_FLAGS = --disable-intermodule $(STAGE1_CHECKING) \
-	  --disable-coverage --enable-languages=$(STAGE1_LANGUAGES)
+	  --disable-coverage --enable-languages=$(STAGE1_LANGUAGES) \
+	  --disable-build-format-warnings
 
 STAGEprofile_CFLAGS = $(STAGE2_CFLAGS) -fprofile-generate
 STAGEprofile_TFLAGS = $(STAGE2_TFLAGS)
diff --git a/gcc/configure b/gcc/configure
index 1e7bcb6..ea91906 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -875,6 +875,7 @@ with_demangler_in_ld
 with_gnu_as
 with_as
 enable_largefile
+enable_build_format_warnings
 enable_werror_always
 enable_checking
 enable_coverage
@@ -1569,6 +1570,8 @@ Optional Features:
   for creating source tarballs for users without
   texinfo bison or flex
   --disable-largefile omit support for large files
+  --disable-build-format-warnings
+  don't use -Wformat while building GCC
   --enable-werror-always  enable -Werror despite compiler version
   --enable-checking[=LIST]
   enable expensive run-time checks. With LIST, enable
@@ -6270,9 +6273,22 @@ fi
 # * C++11 narrowing conversions in { }
 # So, we only use -pedantic if we can disable those warnings.
 
+# In stage 1, disable -Wformat warnings from old GCCs about new % codes
+# Check whether --enable-build-format-warnings was given.
+if test ${enable_build_format_warnings+set} = set; then :
+  enableval=$enable_build_format_warnings;
+else
+  enable_build_format_warnings=yes
+fi
+
+if test $enable_build_format_warnings = no; then :
+  wf_opt=-Wno-format
+else
+  wf_opt=
+fi
 loose_warn=
 save_CFLAGS=$CFLAGS
-for real_option in -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual; do
+for real_option in -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual $wf_opt; do
   # Do the check with the no- prefix removed since gcc silently
   # accepts any -Wno-* option on purpose
   case $real_option in
@@ -17897,7 +17913,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat  conftest.$ac_ext _LT_EOF
-#line 17900 configure
+#line 17916 configure
 #include confdefs.h
 
 #if HAVE_DLFCN_H
@@ -18003,7 +18019,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat  conftest.$ac_ext _LT_EOF
-#line 18006 configure
+#line 18022 configure
 #include confdefs.h
 
 #if HAVE_DLFCN_H
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 5e686db..3d3b26b 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -326,8 +326,14 @@ GCC_STDINT_TYPES
 # * C++11 narrowing conversions in { }
 # So, we only use -pedantic if we can disable those warnings.
 
+# In stage 1, disable -Wformat warnings from old GCCs about new % codes
+AC_ARG_ENABLE(build-format-warnings,
+  AS_HELP_STRING([--disable-build-format-warnings],[don't use -Wformat while building GCC]),
+  [],[enable_build_format_warnings=yes])
+AS_IF([test $enable_build_format_warnings = no],
+  [wf_opt=-Wno-format],[wf_opt=])
 ACX_PROG_CC_WARNING_OPTS(
-	m4_quote(m4_do([-W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual])), [loose_warn])
+	m4_quote(m4_do([-W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual $wf_opt])),

Re: [PATCH][ubsan] Add VLA bound instrumentation

2013-10-30 Thread Jason Merrill


On 10/30/2013 10:52 AM, Marek Polacek wrote:

+ if ((flag_sanitize  SANITIZE_VLA)
+  !processing_template_decl


You don't need to check processing_template_decl; the template case was 
already handled above.



+ tree x = cp_save_expr (size);
+ x = build2 (COMPOUND_EXPR, TREE_TYPE (x),
+ ubsan_instrument_vla (input_location, x), x);
+ finish_expr_stmt (x);


Saving 'size' here doesn't help since it's already been used above. 
Could you use itype instead of size here?


Jason

Re: [RFA][PATCH] Minor fix to aliasing machinery

2013-10-30 Thread Jeff Law


On 10/30/13 03:34, Richard Biener wrote:


 * tree-ssa-alias.c (stmt_kills_ref_p_1): Handle case where
 ao_ref_base returns a MEM_REF.

 * gcc.dg/tree-ssa/alias-26.c: New test.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/alias-26.c
b/gcc/testsuite/gcc.dg/tree-ssa/alias-26.c
new file mode 100644
index 000..b5625b8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/alias-26.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options -O1 -fdump-tree-optimized } */
+
+void f (long *p) {
+  *p = 42;
+  p[4] = 42;
+  __builtin_memset (p, 0, 100);
+}
+
+/* { dg-final { scan-tree-dump-not = 42 optimized } } */
+/* { dg-final { cleanup-tree-dump optimized } } */
+
diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
index 4db83bd..5120e72 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -2079,6 +2079,7 @@ stmt_kills_ref_p_1 (gimple stmt, ao_ref *ref)
   tree dest = gimple_call_arg (stmt, 0);
   tree len = gimple_call_arg (stmt, 2);
   tree base = NULL_TREE;
+ tree ref_base;
   HOST_WIDE_INT offset = 0;
   if (!host_integerp (len, 0))
 return false;
@@ -2087,8 +2088,11 @@ stmt_kills_ref_p_1 (gimple stmt, ao_ref *ref)
   offset);
   else if (TREE_CODE (dest) == SSA_NAME)
 base = dest;
+ ref_base = ao_ref_base (ref);
   if (base
-  base == ao_ref_base (ref))
+  ((TREE_CODE (ref_base) == MEM_REF
+   base == TREE_OPERAND (ref_base, 0))


That's not sufficient - ref_base may have an offset, so for correctness
you have to check that integer_zerop (TREE_OPERAND (ref_base, 0)).
But this now looks convoluted and somewhat backward, and still
does not catch all cases (including the def-stmt lookup recently
added to ao_ref_from_ptr_and_size).
So how do you want to proceed?  I'm not really up for burning through 
this code right now and trying to sort out how it ought to work.


Perhaps checkin the test (xfailed) and wait for someone with the 
interest and time to push this through to completion?


jeff

Re: [PATCH v2 3/6] Split symtab_node declarations onto multiple lines

2013-10-30 Thread David Malcolm

On Tue, 2013-09-10 at 15:36 +0200, Jan Hubicka wrote:
  Amongst other things, the rename_symtab.py script converts
  symtab_node to symtab_node *.
  
  This will lead to broken code on declarations that declare
  more than one variable (only the first would get a *), so split
  up such declarations.
  
  gcc/
  * cgraphunit.c (analyze_functions): Split symtab_node
  declarations onto multiple lines to make things easier
  for rename_symtab.py.
  
  * symtab.c (symtab_dissolve_same_comdat_group_list): Likewise.
  (symtab_semantically_equivalent_p): Likewise.
  
  gcc/lto
  * lto-symtab.c (lto_symtab_merge_decls_2): Split symtab_node
  declarations onto multiple lines to make things easier for
  rename_symtab.py.
  (lto_symtab_merge_decls_1): Likewise.
  (lto_symtab_merge_symbols_1): Likewise.
 
 OK
Thanks; committed to trunk as r204216.

Re: [PATCH] Keep REG_INC note in subreg2 pass

2013-10-30 Thread Jeff Law


On 10/30/13 00:09, Zhenqiang Chen wrote:

On 30 October 2013 02:47, Jeff Law l...@redhat.com wrote:

On 10/24/13 02:20, Zhenqiang Chen wrote:


Hi,

REG_INC note is lost in subreg2 pass when resolve_simple_move, which
might lead to wrong dependence for ira. e.g. In function
validate_equiv_mem of ira.c, it checks REG_INC note:

   for (note = REG_NOTES (insn); note; note = XEXP (note, 1))
  if ((REG_NOTE_KIND (note) == REG_INC
   || REG_NOTE_KIND (note) == REG_DEAD)
   REG_P (XEXP (note, 0))
   reg_overlap_mentioned_p (XEXP (note, 0), memref))
return 0;

Without REG_INC note, validate_equiv_mem will return a wrong result.

Referhttps://bugs.launchpad.net/gcc-linaro/+bug/1243022  for more

detail about a real case in kernel.

Bootstrap and no make check regression on X86-64 and ARM.

Is it OK for trunk and 4.8?

Thanks!
-Zhenqiang

ChangeLog:
2013-10-24  Zhenqiang Chenzhenqiang.c...@linaro.org

  * lower-subreg.c (resolve_simple_move): Copy REG_INC note.

testsuite/ChangeLog:
2013-10-24  Zhenqiang Chenzhenqiang.c...@linaro.org

  * gcc.target/arm/lp1243022.c: New test.


This clearly handles adding a note when the destination is a MEM with a side
effect.  What about cases where the side effect is associated with a load
from memory rather than a store to memory?


Yes. We should handle load from memory.




lp1243022.patch


diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c
index 57b4b3c..e710fa5 100644
--- a/gcc/lower-subreg.c
+++ b/gcc/lower-subreg.c
@@ -1056,6 +1056,22 @@ resolve_simple_move (rtx set, rtx insn)
 mdest = simplify_gen_subreg (orig_mode, dest, GET_MODE (dest), 0);
 minsn = emit_move_insn (real_dest, mdest);

+#ifdef AUTO_INC_DEC
+  /* Copy the REG_INC notes.  */
+  if (MEM_P (real_dest)  !(resolve_reg_p (real_dest)
+|| resolve_subreg_p (real_dest)))
+   {
+ rtx note = find_reg_note (insn, REG_INC, NULL_RTX);
+ if (note)
+   {
+ if (!REG_NOTES (minsn))
+   REG_NOTES (minsn) = note;
+ else
+   add_reg_note (minsn, REG_INC, note);
+   }
+   }
+#endif


If MINSN does not have any notes, then this results in MINSN and INSN
sharing the note.  Note carefully that notes are chained (see implementation
of add_reg_note).  Thus the sharing would result in MINSN and INSN actually
sharing a chain of notes.  I'm pretty sure that's not what you intended.  I
think you need to always use add_reg_note.


Yes. I should use add_reg_note.

Here is the updated patch:

diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c
index ebf364f..16dfa62 100644
--- a/gcc/lower-subreg.c
+++ b/gcc/lower-subreg.c
@@ -967,7 +967,20 @@ resolve_simple_move (rtx set, rtx insn)
rtx reg;

reg = gen_reg_rtx (orig_mode);
+
+#ifdef AUTO_INC_DEC
+  {
+   rtx move = emit_move_insn (reg, src);
+   if (MEM_P (src))
+ {
+   rtx note = find_reg_note (insn, REG_INC, NULL_RTX);
+   if (note)
+ add_reg_note (move, REG_INC, XEXP (note, 0));
+ }
+  }
+#else
emit_move_insn (reg, src);
+#endif
src = reg;
  }

@@ -1057,6 +1070,16 @@ resolve_simple_move (rtx set, rtx insn)
 mdest = simplify_gen_subreg (orig_mode, dest, GET_MODE (dest), 0);
minsn = emit_move_insn (real_dest, mdest);

+#ifdef AUTO_INC_DEC
+  if (MEM_P (real_dest)  !(resolve_reg_p (real_dest)
+|| resolve_subreg_p (real_dest)))

Formatting nit.   This should be formatted as

if (MEM_P (real_dest)
 !(resolve_reg_p (real_dest) || resolve_subreg_p (real_dest)))

If that results in too long of a line, then it should wrap like this:

if (MEM_P (real_dest)
 !(resolve_reg_p (real_dest)
 || resolve_subreg_p (real_dest)))

OK with that change.  Please install on the trunk.  The 4.8 maintainers 
have the final call for the 4.8 release branch.


Thanks,
Jeff

Re: PATCH to use -Wno-format during stage1

2013-10-30 Thread Paolo Bonzini

Il 30/10/2013 16:47, Jason Merrill ha scritto:
 I find -Wformat warnings about unknown % codes when building with an
 older GCC to be mildly annoying noise; this patch avoids them by passing
 -Wno-format during stage 1.
 
 Tested x86_64-pc-linux-gnu.  Is this OK for trunk?

Ok.

Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces

2013-10-30 Thread Jeff Law


On 10/30/13 03:26, Richard Biener wrote:

diff --git a/gcc/gimple.c b/gcc/gimple.c
index 3ddceb9..20f6010 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -174,7 +174,7 @@ gimple_build_with_ops_stat (enum gimple_code code, unsigned 
subcode,
  gimple
  gimple_build_return (tree retval)
  {
-  gimple s = gimple_build_with_ops (GIMPLE_RETURN, ERROR_MARK, 1);
+  gimple s = gimple_build_with_ops (GIMPLE_RETURN, ERROR_MARK, 2);


Ick - you enlarge all return statements?  But you don't set the actual value?
So why allocate it with 2 ops in the first place??

[Seems I completely missed that MPX changes gimple and the design
document that was posted somewhere??]
I'd assumed that'd be in a follow-up patch and probably would be bounds 
on the return value or NULL if there were no bounds.


I'm terribly concerned about the enlarging of the return statements.  On 
the other hand, if something enlarged a GIMPLE_ASSIGN, then I would have 
called it out for further discussion.





Bah.

Where is the update to gimple.texi and tree.texi?

Good catch.  Ilya, please send a patch to update the docs.

Re: [PATCH][ubsan] Add VLA bound instrumentation

2013-10-30 Thread Marek Polacek

On Wed, Oct 30, 2013 at 11:56:25AM -0400, Jason Merrill wrote:
 On 10/30/2013 10:52 AM, Marek Polacek wrote:
 + if ((flag_sanitize  SANITIZE_VLA)
 +  !processing_template_decl
 
 You don't need to check processing_template_decl; the template case
 was already handled above.

Right, removed.
 
 + tree x = cp_save_expr (size);
 + x = build2 (COMPOUND_EXPR, TREE_TYPE (x),
 + ubsan_instrument_vla (input_location, x), x);
 + finish_expr_stmt (x);
 
 Saving 'size' here doesn't help since it's already been used above.
 Could you use itype instead of size here?

I already experimented with that and I think I can't, since we call
the finish_expr_stmt too soon, which results in:

int x = 1;
int a[0:(sizetype) SAVE_EXPR D.2143];
  
cleanup_point   int x = 1;;
cleanup_point  Unknown tree: expr_stmt
if (SAVE_EXPR D.2143 = 0)
  {   
__builtin___ubsan_handle_vla_bound_not_positive (*.Lubsan_data0, 
(unsigned long) SAVE_EXPR D.2143);
  }   
else
  {   
0   
  }, (void) SAVE_EXPR D.2143; ;
  ssizetype D.2143;
cleanup_point  Unknown tree: expr_stmt
(void) (D.2143 = (ssizetype)  ++x + -1) ;

and that ICEs in gimplify_var_or_parm_decl, presumably because the
if (SAVE_EXPR D.2143 = 0) { ... } should be emitted *after* that
cleanup_point.  When we generated the C++1y check in cp_finish_decl,
we emitted the check after the cleanup_point, and everything was OK.
I admit I don't understand the cleanup_points very much and I don't
know exactly where they are coming from, because normally I don't see
them coming out of C FE. :)  Thanks.

Marek

Re: [RFA][PATCH] Minor fix to aliasing machinery

2013-10-30 Thread Marc Glisse


On Wed, 30 Oct 2013, Richard Biener wrote:


Btw, get_addr_base_and_unit_offset may also return an offsetted
MEM_REF (from MEM [p_3, 17] for example).  As we are interested in
pointers this could be handled by not requiring a memory reference
but extracting the base address and offset, covering more cases.


I tried the attached patch, and it almost worked, except for one fortran 
testcase (widechar_intrinsics_10.f90):


! { dg-do run }
! { dg-options -fbackslash }

  implicit none
  character(kind=1,len=3) :: s1(3)

  s1 = [ abc, def, ghi ]

  if (any (cshift (s1, -1) /= [ s1(3), s1(1:2) ])) call abort

end


we end up with a double array_ref, one of variable index, and 
get_ref_base_and_extent signals that by returning as size the size of the 
whole declaration (the double-array). However, 
ao_ref_init_from_ptr_and_size happily ignores the last two parameters of 
get_ref_base_and_extent and continues as if nothing was wrong (I don't 
know how to easily check that something went wrong either). Whether we 
think this is a good/bad patch for memcpy, we have a latent bug that the 
recent patches are making more visible.


--
Marc GlisseIndex: testsuite/gcc.dg/tree-ssa/alias-26.c
===
--- testsuite/gcc.dg/tree-ssa/alias-26.c(revision 0)
+++ testsuite/gcc.dg/tree-ssa/alias-26.c(working copy)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options -O1 -fdump-tree-optimized } */
+
+void f (long *p) {
+  *p = 42;
+  p[4] = 42;
+  __builtin_memset (p, 0, 100);
+}
+
+/* { dg-final { scan-tree-dump-not = 42 optimized } } */
+/* { dg-final { cleanup-tree-dump optimized } } */

Property changes on: testsuite/gcc.dg/tree-ssa/alias-26.c
___
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+Author Date Id Revision URL
\ No newline at end of property
Index: tree-ssa-alias.c
===
--- tree-ssa-alias.c(revision 204199)
+++ tree-ssa-alias.c(working copy)
@@ -1982,23 +1982,24 @@ stmt_may_clobber_ref_p (gimple stmt, tre
   return stmt_may_clobber_ref_p_1 (stmt, r);
 }
 
 /* If STMT kills the memory reference REF return true, otherwise
return false.  */
 
 static bool
 stmt_kills_ref_p_1 (gimple stmt, ao_ref *ref)
 {
   /* For a must-alias check we need to be able to constrain
- the access properly.  */
-  ao_ref_base (ref);
-  if (ref-max_size == -1)
+ the access properly.
+ FIXME: except for BUILTIN_FREE.  */
+  if (!ao_ref_base (ref)
+  || ref-max_size == -1)
 return false;
 
   if (gimple_has_lhs (stmt)
TREE_CODE (gimple_get_lhs (stmt)) != SSA_NAME
   /* The assignment is not necessarily carried out if it can throw
 and we can catch it in the current function where we could inspect
 the previous value.
 ???  We only need to care about the RHS throwing.  For aggregate
 assignments or similar calls and non-call exceptions the LHS
 might throw as well.  */
@@ -2071,37 +2072,47 @@ stmt_kills_ref_p_1 (gimple stmt, ao_ref
  case BUILT_IN_MEMPCPY:
  case BUILT_IN_MEMMOVE:
  case BUILT_IN_MEMSET:
  case BUILT_IN_MEMCPY_CHK:
  case BUILT_IN_MEMPCPY_CHK:
  case BUILT_IN_MEMMOVE_CHK:
  case BUILT_IN_MEMSET_CHK:
{
  tree dest = gimple_call_arg (stmt, 0);
  tree len = gimple_call_arg (stmt, 2);
- tree base = NULL_TREE;
- HOST_WIDE_INT offset = 0;
+ tree rbase = ref-base;
+ HOST_WIDE_INT roffset = ref-offset;
  if (!host_integerp (len, 0))
return false;
- if (TREE_CODE (dest) == ADDR_EXPR)
-   base = get_addr_base_and_unit_offset (TREE_OPERAND (dest, 0),
- offset);
- else if (TREE_CODE (dest) == SSA_NAME)
-   base = dest;
- if (base
-  base == ao_ref_base (ref))
+ ao_ref dref;
+ ao_ref_init_from_ptr_and_size (dref, dest, len);
+ tree base = ao_ref_base (dref);
+ HOST_WIDE_INT offset = dref.offset;
+ if (!base)
+   return false;
+ if (TREE_CODE (base) == MEM_REF)
+   {
+ if (TREE_CODE (rbase) != MEM_REF)
+   return false;
+ // Compare pointers.
+ offset += BITS_PER_UNIT
+   * TREE_INT_CST_LOW (TREE_OPERAND (base, 1));
+ roffset += BITS_PER_UNIT
+* TREE_INT_CST_LOW (TREE_OPERAND (rbase, 1));
+ base = TREE_OPERAND (base, 0);
+ rbase = TREE_OPERAND (rbase, 0);
+   }
+ if (base == rbase)
{
-

Re: [RFC PATCH] For TARGET_AVX use *movmode_internal for misaligned loads

2013-10-30 Thread Richard Henderson

On 10/30/2013 02:47 AM, Jakub Jelinek wrote:
 2013-10-30  Jakub Jelinek  ja...@redhat.com
 
   * config/i386/i386.c (ix86_avx256_split_vector_move_misalign): If
   op1 is misaligned_operand, just use *movmode_internal insn
   rather than UNSPEC_LOADU load.
   (ix86_expand_vector_move_misalign): Likewise (for TARGET_AVX only).
   Avoid gen_lowpart on op0 if it isn't MEM.

Ok.


r~

Re: C++ PATCH to deal with trivial but non-callable [cd]tors

2013-10-30 Thread Jason Merrill


On 10/30/2013 06:14 AM, Eric Botcazou wrote:

+/* Return whether DECL, a method of a C++ TYPE, is trivial, that is to say
+   doesn't do anything for the objects of TYPE.  */
+
+static bool
+is_trivial_method (const_tree decl, const_tree type)
+{
+  if (cpp_check (decl, IS_CONSTRUCTOR)  !TYPE_NEEDS_CONSTRUCTING (type))
+return true;


This will tell you whether decl is a constructor for a type with some 
non-trivial constructor, but not whether decl itself is non-trivial.


I think a good way to check for any non-trivial methods would be to 
check trivial_type_p in the front end and then see if there are any 
!DECL_ARTIFICIAL decls in TYPE_METHODS.


Jason

Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces

2013-10-30 Thread Jeff Law


On 10/30/13 04:34, Ilya Enkovich wrote:

On 30 Oct 10:26, Richard Biener wrote:


Ick - you enlarge all return statements?  But you don't set the
actual value? So why allocate it with 2 ops in the first place??


When return does not return bounds it has operand with zero value
similar to case when it does not return value. What is the difference
then?
In general, when someone proposes a change in the size of tree, rtl or 
gimple nodes, it's a yellow flag that something may need further 
investigation.


In this specific instance, I could trivially predict how that additional 
field would be used and a GIMPLE_RETURN isn't terribly important from a 
size standpoint, so I didn't call it out.




Returns instrumentation. We add new operand to return statement to
hold returned bounds and instrumentation pass is responsible to fill
this operand with correct bounds

Exactly what I expected.



Unfortunately patch has been already installed.  Should we uninstall
it?  If not, then here is patch for documentation.

I think we're OK for now.  If Richi wants it out, he'll say so explicitly.




Thanks, Ilya --

gcc/

2013-10-30  Ilya Enkovich  ilya.enkov...@intel.com

* doc/gimple.texi (gimple_call_num_nobnd_args): New.
(gimple_call_nobnd_arg): New. (gimple_return_retbnd): New.
(gimple_return_set_retbnd): New. (gimple_call_get_nobnd_arg_index):
New.
Can you also fixup the GIMPLE_RETURN documentation in gimple.texi.  It 
needs a minor update after these changes.


jeff

Re: [PATCH] Vectorizing abs(char/short/int) on x86.

2013-10-30 Thread Cong Hou

I found my problem: I put DONE outside of if not inside. You are
right. I have updated my patch.

I appreciate your comment and test on it!


thanks,
Cong



diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 8a38316..84c7ab5 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2013-10-22  Cong Hou  co...@google.com
+
+ PR target/58762
+ * config/i386/i386-protos.h (ix86_expand_sse2_abs): New function.
+ * config/i386/i386.c (ix86_expand_sse2_abs): New function.
+ * config/i386/sse.md: Add SSE2 support to abs (8/16/32-bit-int).
+
 2013-10-14  David Malcolm  dmalc...@redhat.com

  * dumpfile.h (gcc::dump_manager): New class, to hold state
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 3ab2f3a..ca31224 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -238,6 +238,7 @@ extern void ix86_expand_mul_widen_evenodd (rtx,
rtx, rtx, bool, bool);
 extern void ix86_expand_mul_widen_hilo (rtx, rtx, rtx, bool, bool);
 extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx);
 extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx);
+extern void ix86_expand_sse2_abs (rtx, rtx);

 /* In i386-c.c  */
 extern void ix86_target_macros (void);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 02cbbbd..71905fc 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -41696,6 +41696,53 @@ ix86_expand_sse2_mulvxdi3 (rtx op0, rtx op1, rtx op2)
gen_rtx_MULT (mode, op1, op2));
 }

+void
+ix86_expand_sse2_abs (rtx op0, rtx op1)
+{
+  enum machine_mode mode = GET_MODE (op0);
+  rtx tmp0, tmp1;
+
+  switch (mode)
+{
+  /* For 32-bit signed integer X, the best way to calculate the absolute
+ value of X is (((signed) X  (W-1)) ^ X) - ((signed) X  (W-1)).  */
+  case V4SImode:
+ tmp0 = expand_simple_binop (mode, ASHIFTRT, op1,
+GEN_INT (GET_MODE_BITSIZE
+ (GET_MODE_INNER (mode)) - 1),
+NULL, 0, OPTAB_DIRECT);
+ if (tmp0)
+  tmp1 = expand_simple_binop (mode, XOR, op1, tmp0,
+  NULL, 0, OPTAB_DIRECT);
+ if (tmp0  tmp1)
+  expand_simple_binop (mode, MINUS, tmp1, tmp0,
+   op0, 0, OPTAB_DIRECT);
+ break;
+
+  /* For 16-bit signed integer X, the best way to calculate the absolute
+ value of X is max (X, -X), as SSE2 provides the PMAXSW insn.  */
+  case V8HImode:
+ tmp0 = expand_unop (mode, neg_optab, op1, NULL_RTX, 0);
+ if (tmp0)
+  expand_simple_binop (mode, SMAX, op1, tmp0, op0, 0,
+   OPTAB_DIRECT);
+ break;
+
+  /* For 8-bit signed integer X, the best way to calculate the absolute
+ value of X is min ((unsigned char) X, (unsigned char) (-X)),
+ as SSE2 provides the PMINUB insn.  */
+  case V16QImode:
+ tmp0 = expand_unop (mode, neg_optab, op1, NULL_RTX, 0);
+ if (tmp0)
+  expand_simple_binop (V16QImode, UMIN, op1, tmp0, op0, 0,
+   OPTAB_DIRECT);
+ break;
+
+  default:
+ break;
+}
+}
+
 /* Expand an insert into a vector register through pinsr insn.
Return true if successful.  */

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index c3f6c94..46e1df4 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -8721,7 +8721,7 @@
(set (attr prefix_rex) (symbol_ref x86_extended_reg_mentioned_p (insn)))
(set_attr mode DI)])

-(define_insn absmode2
+(define_insn *absmode2
   [(set (match_operand:VI124_AVX2_48_AVX512F 0 register_operand =v)
  (abs:VI124_AVX2_48_AVX512F
   (match_operand:VI124_AVX2_48_AVX512F 1 nonimmediate_operand vm)))]
@@ -8733,6 +8733,19 @@
(set_attr prefix maybe_vex)
(set_attr mode sseinsnmode)])

+(define_expand absmode2
+  [(set (match_operand:VI124_AVX2_48_AVX512F 0 register_operand)
+ (abs:VI124_AVX2_48_AVX512F
+  (match_operand:VI124_AVX2_48_AVX512F 1 nonimmediate_operand)))]
+  TARGET_SSE2
+{
+  if (!TARGET_SSSE3)
+{
+  ix86_expand_sse2_abs (operands[0], operands[1]);
+  DONE;
+}
+})
+
 (define_insn absmode2
   [(set (match_operand:MMXMODEI 0 register_operand =y)
  (abs:MMXMODEI
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 075d071..cf5b942 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2013-10-22  Cong Hou  co...@google.com
+
+ PR target/58762
+ * gcc.dg/vect/pr58762.c: New test.
+
 2013-10-14  Tobias Burnus  bur...@net-b.de

  PR fortran/58658
diff --git a/gcc/testsuite/gcc.dg/vect/pr58762.c
b/gcc/testsuite/gcc.dg/vect/pr58762.c
new file mode 100644
index 000..6468d0a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr58762.c
@@ -0,0 +1,28 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-do compile } */
+/* { dg-options -O2 -ftree-vectorize } */
+
+void test1 (char* a, char* b)
+{
+  int i;
+  for (i = 0; i  1; ++i)
+a[i] = abs (b[i]);
+}
+
+void test2 (short* a, short* b)
+{
+  int i;
+  for (i = 0; i  1; ++i)
+a[i] = abs (b[i]);
+}
+
+void test3 (int* a, int* b)
+{
+  int i;
+  for (i = 0; i  1; ++i)
+a[i] = abs (b[i]);
+}
+
+/* { dg-final { scan-tree-dump-times vectorized 1 loops 3 vect
+   {

Re: [PATCH] Vectorizing abs(char/short/int) on x86.

2013-10-30 Thread Cong Hou

Forget to attach the patch file.



thanks,
Cong


On Wed, Oct 30, 2013 at 10:01 AM, Cong Hou co...@google.com wrote:
 I found my problem: I put DONE outside of if not inside. You are
 right. I have updated my patch.

 I appreciate your comment and test on it!


 thanks,
 Cong



 diff --git a/gcc/ChangeLog b/gcc/ChangeLog
 index 8a38316..84c7ab5 100644
 --- a/gcc/ChangeLog
 +++ b/gcc/ChangeLog
 @@ -1,3 +1,10 @@
 +2013-10-22  Cong Hou  co...@google.com
 +
 + PR target/58762
 + * config/i386/i386-protos.h (ix86_expand_sse2_abs): New function.
 + * config/i386/i386.c (ix86_expand_sse2_abs): New function.
 + * config/i386/sse.md: Add SSE2 support to abs (8/16/32-bit-int).
 +
  2013-10-14  David Malcolm  dmalc...@redhat.com

   * dumpfile.h (gcc::dump_manager): New class, to hold state
 diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
 index 3ab2f3a..ca31224 100644
 --- a/gcc/config/i386/i386-protos.h
 +++ b/gcc/config/i386/i386-protos.h
 @@ -238,6 +238,7 @@ extern void ix86_expand_mul_widen_evenodd (rtx,
 rtx, rtx, bool, bool);
  extern void ix86_expand_mul_widen_hilo (rtx, rtx, rtx, bool, bool);
  extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx);
  extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx);
 +extern void ix86_expand_sse2_abs (rtx, rtx);

  /* In i386-c.c  */
  extern void ix86_target_macros (void);
 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index 02cbbbd..71905fc 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -41696,6 +41696,53 @@ ix86_expand_sse2_mulvxdi3 (rtx op0, rtx op1, rtx op2)
 gen_rtx_MULT (mode, op1, op2));
  }

 +void
 +ix86_expand_sse2_abs (rtx op0, rtx op1)
 +{
 +  enum machine_mode mode = GET_MODE (op0);
 +  rtx tmp0, tmp1;
 +
 +  switch (mode)
 +{
 +  /* For 32-bit signed integer X, the best way to calculate the absolute
 + value of X is (((signed) X  (W-1)) ^ X) - ((signed) X  (W-1)).  */
 +  case V4SImode:
 + tmp0 = expand_simple_binop (mode, ASHIFTRT, op1,
 +GEN_INT (GET_MODE_BITSIZE
 + (GET_MODE_INNER (mode)) - 1),
 +NULL, 0, OPTAB_DIRECT);
 + if (tmp0)
 +  tmp1 = expand_simple_binop (mode, XOR, op1, tmp0,
 +  NULL, 0, OPTAB_DIRECT);
 + if (tmp0  tmp1)
 +  expand_simple_binop (mode, MINUS, tmp1, tmp0,
 +   op0, 0, OPTAB_DIRECT);
 + break;
 +
 +  /* For 16-bit signed integer X, the best way to calculate the absolute
 + value of X is max (X, -X), as SSE2 provides the PMAXSW insn.  */
 +  case V8HImode:
 + tmp0 = expand_unop (mode, neg_optab, op1, NULL_RTX, 0);
 + if (tmp0)
 +  expand_simple_binop (mode, SMAX, op1, tmp0, op0, 0,
 +   OPTAB_DIRECT);
 + break;
 +
 +  /* For 8-bit signed integer X, the best way to calculate the absolute
 + value of X is min ((unsigned char) X, (unsigned char) (-X)),
 + as SSE2 provides the PMINUB insn.  */
 +  case V16QImode:
 + tmp0 = expand_unop (mode, neg_optab, op1, NULL_RTX, 0);
 + if (tmp0)
 +  expand_simple_binop (V16QImode, UMIN, op1, tmp0, op0, 0,
 +   OPTAB_DIRECT);
 + break;
 +
 +  default:
 + break;
 +}
 +}
 +
  /* Expand an insert into a vector register through pinsr insn.
 Return true if successful.  */

 diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
 index c3f6c94..46e1df4 100644
 --- a/gcc/config/i386/sse.md
 +++ b/gcc/config/i386/sse.md
 @@ -8721,7 +8721,7 @@
 (set (attr prefix_rex) (symbol_ref x86_extended_reg_mentioned_p 
 (insn)))
 (set_attr mode DI)])

 -(define_insn absmode2
 +(define_insn *absmode2
[(set (match_operand:VI124_AVX2_48_AVX512F 0 register_operand =v)
   (abs:VI124_AVX2_48_AVX512F
(match_operand:VI124_AVX2_48_AVX512F 1 nonimmediate_operand vm)))]
 @@ -8733,6 +8733,19 @@
 (set_attr prefix maybe_vex)
 (set_attr mode sseinsnmode)])

 +(define_expand absmode2
 +  [(set (match_operand:VI124_AVX2_48_AVX512F 0 register_operand)
 + (abs:VI124_AVX2_48_AVX512F
 +  (match_operand:VI124_AVX2_48_AVX512F 1 nonimmediate_operand)))]
 +  TARGET_SSE2
 +{
 +  if (!TARGET_SSSE3)
 +{
 +  ix86_expand_sse2_abs (operands[0], operands[1]);
 +  DONE;
 +}
 +})
 +
  (define_insn absmode2
[(set (match_operand:MMXMODEI 0 register_operand =y)
   (abs:MMXMODEI
 diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
 index 075d071..cf5b942 100644
 --- a/gcc/testsuite/ChangeLog
 +++ b/gcc/testsuite/ChangeLog
 @@ -1,3 +1,8 @@
 +2013-10-22  Cong Hou  co...@google.com
 +
 + PR target/58762
 + * gcc.dg/vect/pr58762.c: New test.
 +
  2013-10-14  Tobias Burnus  bur...@net-b.de

   PR fortran/58658
 diff --git a/gcc/testsuite/gcc.dg/vect/pr58762.c
 b/gcc/testsuite/gcc.dg/vect/pr58762.c
 new file mode 100644
 index 000..6468d0a
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/vect/pr58762.c
 @@ -0,0 +1,28 @@
 +/* { dg-require-effective-target vect_int } */
 +/* { dg-do compile } */
 +/* { dg-options -O2 -ftree-vectorize } */
 +
 +void test1 (char* a, char* b)
 +{
 +  int i;
 +  for (i = 0; i  1; ++i)
 +a[i] = abs (b[i]);
 +}
 +
 +void test2

Re: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp

2013-10-30 Thread Jeff Law


On 10/30/13 09:09, Iyer, Balaji V wrote:

Hello Everyone, What I ideally wanted to do with my testsuite files
was that I want all the Cilk keywords test to compile no matter what
the architecture is, but it should only run in certain architectures
where the runtime is enabled (this is known statically and thus the
testsuite doesn't have to do anything to figure it out.). Can someone
please tell me how do I do this?


I can't recall a similar situation off the top of my head.  Are you 
using the dg framework?  Can you have multiple dg-do directives?


ie,
/* { dg-do compile } */
/* { dg-do run { target i?86-*-* x86-64-*-* } } */

?

That'd be my best guess.

jeff

RE: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp

2013-10-30 Thread Iyer, Balaji V

 -Original Message-
 From: Jeff Law [mailto:l...@redhat.com]
 Sent: Wednesday, October 30, 2013 1:08 PM
 To: Iyer, Balaji V; Tobias Burnus; gcc patches
 Subject: Re: Testsuite / Cilk Plus: Include library path in compile flags in
 gcc.dg/cilk-plus/cilk-plus.exp

 On 10/30/13 09:09, Iyer, Balaji V wrote:
  Hello Everyone, What I ideally wanted to do with my testsuite files
  was that I want all the Cilk keywords test to compile no matter what
  the architecture is, but it should only run in certain architectures
  where the runtime is enabled (this is known statically and thus the
  testsuite doesn't have to do anything to figure it out.). Can someone
  please tell me how do I do this?

 I can't recall a similar situation off the top of my head.  Are you using the 
 dg
 framework?  Can you have multiple dg-do directives?

 ie,
 /* { dg-do compile } */
 /* { dg-do run { target i?86-*-* x86-64-*-* } } */

Yes, I am using the dg-framework.

I tried that couple months back, and from what it looked like, it was replacing 
the compile command with the run command, and then on non-x86 architectures, it 
was just ignoring the file...

But, will try it again to see if it works.

 ?

 That'd be my best guess.

 jeff

Re: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp

2013-10-30 Thread Joseph S. Myers

On Wed, 30 Oct 2013, Jeff Law wrote:

 /* { dg-do compile } */
 /* { dg-do run { target i?86-*-* x86-64-*-* } } */

But with an effective-target keyword cilkplusrts or similar, rather than 
hardcoding the same list of targets in lots of places, please.

-- 
Joseph S. Myers
jos...@codesourcery.com

RE: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp

2013-10-30 Thread Iyer, Balaji V

 -Original Message-
 From: Joseph Myers [mailto:jos...@codesourcery.com]
 Sent: Wednesday, October 30, 2013 1:15 PM
 To: Jeff Law
 Cc: Iyer, Balaji V; Tobias Burnus; gcc patches
 Subject: Re: Testsuite / Cilk Plus: Include library path in compile flags in
 gcc.dg/cilk-plus/cilk-plus.exp

 On Wed, 30 Oct 2013, Jeff Law wrote:

  /* { dg-do compile } */
  /* { dg-do run { target i?86-*-* x86-64-*-* } } */

 But with an effective-target keyword cilkplusrts or similar, rather than
 hardcoding the same list of targets in lots of places, please.

Wow, I didn't know you could do that. Do you have an example where it is done 
that I can model after?

 --
 Joseph S. Myers
 jos...@codesourcery.com

[Patch] Fix canadian cross build on systems with no fenv.h

2013-10-30 Thread Steve Ellcey


I ran into a build problem while doing a canadian cross build of GCC.
I was building on linux to create a Windows (mingw) GCC that generates
code for mips-mti-elf.

The mips-mti-elf target uses newlib for its system headers and libraries
and the headers do not include a fenv.h header file.  However, when doing
a canadian cross build libstdc++ is built with the C++ compiler that runs
on linux (the build system), not the C++ that was just built for Windows
(the host system).  The problem is that the libstdc++ configure script
(in GLIBCXX_CHECK_C99_TR1) is checking for the fenv.h using C++ (not C)
and the C++ compiler running on linux does have an fenv.h header because
the latest libstdc++ builds always create this header for C++ regardless
of whether there is one on the system or not.  This results in the newly
created libstdc++ library having a fenv.h header that tries to include the
system fenv.h header file which does not exist and the build fails.

My fix for this is to explicitly check for fenv.h and complex.h in the
libstdc++ configure.ac script before calling GLIBCXX_CHECK_C99_TR1.
This makes the configure script check for these headers using the C
compiler instead of the C++ compiler and when GLIBCXX_CHECK_C99_TR1
is run it uses that information (saved in a autoconf variable) to 
correctly ascertain that fenv.h does not exist.

I could put these checks in GLIBCXX_CHECK_C99_TR1 if that were considered
preferable, it would just have to be done before we call 'AC_LANG_CPLUSPLUS'.

Tested with both my canadian cross build and a standard cross build
targetting mips-mti-elf.

OK for checkin?

Steve Ellcey
sell...@mips.com


2013-10-30  Steve Ellcey  sell...@mips.com

* configure.ac: Add header checks for fenv.h and complex.h.
* configure: Regenerate.


diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac
index dd13b01..22fc840 100644
--- a/libstdc++-v3/configure.ac
+++ b/libstdc++-v3/configure.ac
@@ -195,6 +195,12 @@ GLIBCXX_CHECK_S_ISREG_OR_S_IFREG
 AC_CHECK_HEADERS(sys/uio.h)
 GLIBCXX_CHECK_WRITEV
 
+# Check for fenv.h and complex.h before GLIBCXX_CHECK_C99_TR1
+# so that the check is done with the C compiler (not C++).
+# Checking with C++ can break a canadian cross build if either
+# file does not exist in C but does in C++.
+AC_CHECK_HEADERS(fenv.h complex.h)
+
 # For C99 support to TR1.
 GLIBCXX_CHECK_C99_TR1

Re: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp

2013-10-30 Thread Jeff Law

On 10/30/13 11:12, Iyer, Balaji V wrote:

-Original Message-
From: Jeff Law [mailto:l...@redhat.com]
Sent: Wednesday, October 30, 2013 1:08 PM
To: Iyer, Balaji V; Tobias Burnus; gcc patches
Subject: Re: Testsuite / Cilk Plus: Include library path in compile flags in
gcc.dg/cilk-plus/cilk-plus.exp

On 10/30/13 09:09, Iyer, Balaji V wrote:

Hello Everyone, What I ideally wanted to do with my testsuite files
was that I want all the Cilk keywords test to compile no matter what
the architecture is, but it should only run in certain architectures
where the runtime is enabled (this is known statically and thus the
testsuite doesn't have to do anything to figure it out.). Can someone
please tell me how do I do this?

I can't recall a similar situation off the top of my head.  Are you using the dg
framework?  Can you have multiple dg-do directives?

ie,
/* { dg-do compile } */
/* { dg-do run { target i?86-*-* x86-64-*-* } } */

Yes, I am using the dg-framework.

I tried that couple months back, and from what it looked like, it was replacing 
the compile command with the run command, and then on non-x86 architectures, it 
was just ignoring the file...

But, will try it again to see if it works.
Hmm, if that's the case, you may need distinct tests.  Judicious use of 
#include files would avoid unnecessary duplication.

jeff

C++ PATCH to C++1y VLA of 0 length

2013-10-30 Thread Jason Merrill

At the Chicago meeting the EWG agreed that we don't need to throw on 
0-length VLAs.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 1e328cbd26bfb641db8e218e4a4c32fc1a9a8d9d
Author: Jason Merrill ja...@redhat.com
Date:   Fri Oct 25 06:15:01 2013 -0400

	* decl.c (cp_finish_decl): Never throw for VLA bound == 0.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 1e92f2a..476d559 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -6404,11 +6404,7 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p,
   /* If the VLA bound is larger than half the address space, or less
 	 than zero, throw std::bad_array_length.  */
   tree max = convert (ssizetype, TYPE_MAX_VALUE (TYPE_DOMAIN (type)));
-  /* C++1y says we should throw for length = 0, but we have
-	 historically supported zero-length arrays.  Let's treat that as an
-	 extension to be disabled by -std=c++NN.  */
-  int lower = flag_iso ? 0 : -1;
-  tree comp = build2 (LT_EXPR, boolean_type_node, max, ssize_int (lower));
+  tree comp = build2 (LT_EXPR, boolean_type_node, max, ssize_int (-1));
   comp = build3 (COND_EXPR, void_type_node, comp,
 		 throw_bad_array_length (), void_zero_node);
   finish_expr_stmt (comp);

Re: [PATCH] Vectorizing abs(char/short/int) on x86.

2013-10-30 Thread Uros Bizjak

On Wed, Oct 30, 2013 at 6:01 PM, Cong Hou co...@google.com wrote:
 I found my problem: I put DONE outside of if not inside. You are
 right. I have updated my patch.

OK, great that we put things in order ;)

Does this patch need some extra middle-end functionality? I was not
able to vectorize char and short part of your patch.

Regarding the testcase - please put it to gcc.target/i386/ directory.
There is nothing generic in the test, as confirmed by target-dependent
scan test. You will find plenty of examples in the mentioned
directory. I'd suggest to split the testcase in three files, and to
simplify it to something like the testcase with global variables I
used earlier.

Modulo testcase, the patch is OK otherwise, but middle-end parts
should be committed first.

Thanks,
Uros.

Re: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp

2013-10-30 Thread Jeff Law

On 10/30/13 11:16, Iyer, Balaji V wrote:

-Original Message-
From: Joseph Myers [mailto:jos...@codesourcery.com]
Sent: Wednesday, October 30, 2013 1:15 PM
To: Jeff Law
Cc: Iyer, Balaji V; Tobias Burnus; gcc patches
Subject: Re: Testsuite / Cilk Plus: Include library path in compile flags in
gcc.dg/cilk-plus/cilk-plus.exp

On Wed, 30 Oct 2013, Jeff Law wrote:

/* { dg-do compile } */
/* { dg-do run { target i?86-*-* x86-64-*-* } } */

But with an effective-target keyword cilkplusrts or similar, rather than
hardcoding the same list of targets in lots of places, please.

Wow, I didn't know you could do that. Do you have an example where it is done 
that I can model after?

Look at
testsuite/lib/target-support.exp for check_*.  Once you see one or two, 
you can search in testsuite/gcc.dg for examples.

jeff

Re: [Patch, C, C++] Accept GCC ivdep for 'do' and 'while', and for C++11's range-based loops

2013-10-30 Thread Jason Merrill


OK.

Jason

Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces

2013-10-30 Thread Jeff Law


On 10/30/13 04:48, Richard Biener wrote:

foo (int * p, unsigned int size)
{
   unnamed type __bound_tmp.0;
   long unsigned int D.2239;
   long unsigned int _2;
   sizetype _6;
   int * _7;

   bb 3:
   __bound_tmp.0_4 = __builtin_ia32_arg_bnd (p_3(D));

   bb 2:
   _2 = (long unsigned int) size_1(D);
   __builtin_ia32_bndcl (__bound_tmp.0_4, p_3(D));
   _6 = _2 + 18446744073709551615;
   _7 = p_3(D) + _6;
   __builtin_ia32_bndcu (__bound_tmp.0_4, _7);
   access_and_store (p_3(D), __bound_tmp.0_4, size_1(D));

so it seems there is now a mismatch between DECL_ARGUMENTS
and the GIMPLE call stmt arguments.  How (if) did you amend
the GIMPLE stmt verifier for this?

Effectively the bounds are passed on the side.



How does regular code deal with this which may expect matching
to DECL_ARGUMENTS?  In fact interleaving the additional
arguments sounds very error-prone for existing code - I'd have
appended all bound args at the end.  Also you unconditionally
claim all pointer arguments have a bound - that looks like bad
design as well.  Why didn't you add a flag to the relevant
PARM_DECL (and then, what do you do for indirect calls?).
You can't actually interleave them -- that results in MPX and normal 
code not being able to interact.   Passing the bound at the end doesn't 
really work either -- varargs and the desire to pass some of the bounds 
around in bound registers.





/* Return the number of arguments used by call statement GS
ignoring bound ones.  */

static inline unsigned
gimple_call_num_nobnd_args (const_gimple gs)
{
   unsigned num_args = gimple_call_num_args (gs);
   unsigned res = num_args;
   for (unsigned n = 0; n  num_args; n++)
 if (POINTER_BOUNDS_P (gimple_call_arg (gs, n)))
   res--;
   return res;
}

the choice means that gimple_call_num_nobnd_args is not O(1).

Yes, but I don't see that's terribly problematical.




/* Return INDEX's call argument ignoring bound ones.  */
static inline tree
gimple_call_nobnd_arg (const_gimple gs, unsigned index)
{
   /* No bound args may exist if pointers checker is off.  */
   if (!flag_check_pointer_bounds)
 return gimple_call_arg (gs, index);
   return gimple_call_arg (gs, gimple_call_get_nobnd_arg_index (gs, index));
}

GIMPLE layout depending on flag_check_pointer_bounds sounds
like a recipie for desaster if you consider TUs compiled with and
TUs compiled without and LTO.  Or if you consider using
optimized attribute with that flag.

Sorry, I don't follow.  Can you elaborate please.


I hope the reviewers that approved the patch will work with you to
address the above issues.  I can't be everywhere.

Obviously I will.

jeff

Re: [RFC PATCH] For TARGET_AVX use *movmode_internal for misaligned loads

2013-10-30 Thread Jakub Jelinek

On Wed, Oct 30, 2013 at 09:17:04AM -0700, Richard Henderson wrote:
 On 10/30/2013 02:47 AM, Jakub Jelinek wrote:
  2013-10-30  Jakub Jelinek  ja...@redhat.com
  
  * config/i386/i386.c (ix86_avx256_split_vector_move_misalign): If
  op1 is misaligned_operand, just use *movmode_internal insn
  rather than UNSPEC_LOADU load.
  (ix86_expand_vector_move_misalign): Likewise (for TARGET_AVX only).
  Avoid gen_lowpart on op0 if it isn't MEM.
 
 Ok.

Testing revealed some testsuite failures, due to either trying to match
insn names in -dp dump or counting specific FMA insns, where with the
patch there are changes like:
-   vmovupd 0(%r13,%rax), %ymm0
-   vfmadd231pd %ymm1, %ymm2, %ymm0
+   vmovapd %ymm2, %ymm0
+   vfmadd213pd 0(%r13,%rax), %ymm1, %ymm0

So, here is updated patch with those testsuite changes and added PR line
to ChangeLog.  I'll wait for Uros' testresults.

2013-10-30  Jakub Jelinek  ja...@redhat.com

PR target/47754
* config/i386/i386.c (ix86_avx256_split_vector_move_misalign): If
op1 is misaligned_operand, just use *movmode_internal insn
rather than UNSPEC_LOADU load.
(ix86_expand_vector_move_misalign): Likewise (for TARGET_AVX only).
Avoid gen_lowpart on op0 if it isn't MEM.

* gcc.target/i386/avx256-unaligned-load-1.c: Adjust scan-assembler
and scan-assembler-not regexps.
* gcc.target/i386/avx256-unaligned-load-2.c: Likewise.
* gcc.target/i386/avx256-unaligned-load-3.c: Likewise.
* gcc.target/i386/avx256-unaligned-load-4.c: Likewise.
* gcc.target/i386/l_fma_float_1.c: Expect vf{,n}m{add,sub}213*p*
instead of vf{,n}m{add,sub}231*p*.
* gcc.target/i386/l_fma_float_3.c: Likewise.
* gcc.target/i386/l_fma_double_1.c: Likewise.
* gcc.target/i386/l_fma_double_3.c: Likewise.

--- gcc/config/i386/i386.c.jj   2013-10-30 08:15:38.0 +0100
+++ gcc/config/i386/i386.c  2013-10-30 10:20:22.684708729 +0100
@@ -16560,6 +16560,12 @@ ix86_avx256_split_vector_move_misalign (
  r = gen_rtx_VEC_CONCAT (GET_MODE (op0), r, m);
  emit_move_insn (op0, r);
}
+  /* Normal *movmode_internal pattern will handle
+unaligned loads just fine if misaligned_operand
+is true, and without the UNSPEC it can be combined
+with arithmetic instructions.  */
+  else if (misaligned_operand (op1, GET_MODE (op1)))
+   emit_insn (gen_rtx_SET (VOIDmode, op0, op1));
   else
emit_insn (load_unaligned (op0, op1));
 }
@@ -16634,7 +16640,7 @@ ix86_avx256_split_vector_move_misalign (
 void
 ix86_expand_vector_move_misalign (enum machine_mode mode, rtx operands[])
 {
-  rtx op0, op1, m;
+  rtx op0, op1, orig_op0 = NULL_RTX, m;
   rtx (*load_unaligned) (rtx, rtx);
   rtx (*store_unaligned) (rtx, rtx);
 
@@ -16647,7 +16653,16 @@ ix86_expand_vector_move_misalign (enum m
{
case MODE_VECTOR_INT:
case MODE_INT:
- op0 = gen_lowpart (V16SImode, op0);
+ if (GET_MODE (op0) != V16SImode)
+   {
+ if (!MEM_P (op0))
+   {
+ orig_op0 = op0;
+ op0 = gen_reg_rtx (V16SImode);
+   }
+ else
+   op0 = gen_lowpart (V16SImode, op0);
+   }
  op1 = gen_lowpart (V16SImode, op1);
  /* FALLTHRU */
 
@@ -16676,6 +16691,8 @@ ix86_expand_vector_move_misalign (enum m
emit_insn (store_unaligned (op0, op1));
  else
gcc_unreachable ();
+ if (orig_op0)
+   emit_move_insn (orig_op0, gen_lowpart (GET_MODE (orig_op0), op0));
  break;
 
default:
@@ -16692,12 +16709,23 @@ ix86_expand_vector_move_misalign (enum m
{
case MODE_VECTOR_INT:
case MODE_INT:
- op0 = gen_lowpart (V32QImode, op0);
+ if (GET_MODE (op0) != V32QImode)
+   {
+ if (!MEM_P (op0))
+   {
+ orig_op0 = op0;
+ op0 = gen_reg_rtx (V32QImode);
+   }
+ else
+   op0 = gen_lowpart (V32QImode, op0);
+   }
  op1 = gen_lowpart (V32QImode, op1);
  /* FALLTHRU */
 
case MODE_VECTOR_FLOAT:
  ix86_avx256_split_vector_move_misalign (op0, op1);
+ if (orig_op0)
+   emit_move_insn (orig_op0, gen_lowpart (GET_MODE (orig_op0), op0));
  break;
 
default:
@@ -16709,15 +16737,30 @@ ix86_expand_vector_move_misalign (enum m
 
   if (MEM_P (op1))
 {
+  /* Normal *movmode_internal pattern will handle
+unaligned loads just fine if misaligned_operand
+is true, and without the UNSPEC it can be combined
+with arithmetic instructions.  */
+  if (TARGET_AVX
+  (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
+ || GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
+  misaligned_operand (op1, GET_MODE (op1)))
+   emit_insn

Re: [RFC PATCH] For TARGET_AVX use *movmode_internal for misaligned loads

2013-10-30 Thread Uros Bizjak

On Wed, Oct 30, 2013 at 12:11 PM, Jakub Jelinek ja...@redhat.com wrote:

  Yesterday I've noticed that for AVX which allows unaligned operands in
  AVX arithmetics instructions we still don't combine unaligned loads with 
  the
  AVX arithmetics instructions.  So say for -O2 -mavx -ftree-vectorize

 This is actually PR 47754 that fell below radar for some reason...

 Apparently yes.

  the patch attempts to avoid gen_lowpart on the non-MEM lhs of the unaligned
  loads, which usually means combine will fail, by doing the load into a
  temporary pseudo in that case and then doing a pseudo to pseudo move with
  gen_lowpart on the rhs (which will be merged soon after into following
  instructions).

 Is this similar to PR44141? There were similar problems with V4SFmode
 subregs, so combine was not able to merge load to the arithemtic insn.

 From the work on the vectorization last year I remember many cases where
 subregs (even equal size) on the LHS of instructions prevented combiner or
 other RTL optimizations from doing it's job.  I believe I've changed some
 easy places that did that completely unnecessarily, but certainly have not
 went through all the code to look for other places where this is done.

 Perhaps let's hack up a checking pass that will after expansion walk the
 whole IL and complain about same sized subregs on the LHS of insns, then do 
 make
 check with it for a couple of ISAs (-msse2,-msse4,-mavx,-mavx2 e.g.?

  I'll bootstrap/regtest this on x86_64-linux and i686-linux, unfortunately 
  my
  bootstrap/regtest server isn't AVX capable.

 I can bootstrap the patch later today on IvyBridge with
 --with-arch=core-avx-i --with-cpu=core-avx-i --with-fpmath=avx.

 That would be greatly appreciated, thanks.

The bootstrap and regression test was OK for x86_64-linux-gnu {,-m32}.

The failures in the attached report are either pre-existing or benign
due to core-avx-i default to AVX.

Please also mention PR44141 in the ChangeLog entry.

Uros.


mail-report.log.gz
Description: GNU Zip compressed data

Re: [RFC PATCH] For TARGET_AVX use *movmode_internal for misaligned loads

2013-10-30 Thread Uros Bizjak

On Wed, Oct 30, 2013 at 6:42 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Wed, Oct 30, 2013 at 12:11 PM, Jakub Jelinek ja...@redhat.com wrote:

  Yesterday I've noticed that for AVX which allows unaligned operands in
  AVX arithmetics instructions we still don't combine unaligned loads with 
  the
  AVX arithmetics instructions.  So say for -O2 -mavx -ftree-vectorize

 This is actually PR 47754 that fell below radar for some reason...

 Apparently yes.

  the patch attempts to avoid gen_lowpart on the non-MEM lhs of the 
  unaligned
  loads, which usually means combine will fail, by doing the load into a
  temporary pseudo in that case and then doing a pseudo to pseudo move with
  gen_lowpart on the rhs (which will be merged soon after into following
  instructions).

 Is this similar to PR44141? There were similar problems with V4SFmode
 subregs, so combine was not able to merge load to the arithemtic insn.

 From the work on the vectorization last year I remember many cases where
 subregs (even equal size) on the LHS of instructions prevented combiner or
 other RTL optimizations from doing it's job.  I believe I've changed some
 easy places that did that completely unnecessarily, but certainly have not
 went through all the code to look for other places where this is done.

 Perhaps let's hack up a checking pass that will after expansion walk the
 whole IL and complain about same sized subregs on the LHS of insns, then do 
 make
 check with it for a couple of ISAs (-msse2,-msse4,-mavx,-mavx2 e.g.?

  I'll bootstrap/regtest this on x86_64-linux and i686-linux, unfortunately 
  my
  bootstrap/regtest server isn't AVX capable.

 I can bootstrap the patch later today on IvyBridge with
 --with-arch=core-avx-i --with-cpu=core-avx-i --with-fpmath=avx.

 That would be greatly appreciated, thanks.

 The bootstrap and regression test was OK for x86_64-linux-gnu {,-m32}.

 The failures in the attached report are either pre-existing or benign
 due to core-avx-i default to AVX.

I was referring to the *runtime* failures here.

 Please also mention PR44141 in the ChangeLog entry.

Ops, this should be PR47754.

Thanks,
Uros.

Re: RFA: patch to fix PR58785 (an ARM LRA crash)

2013-10-30 Thread Richard Earnshaw (home)

On 30 Oct 2013, at 08:16, Vladimir Makarov vmaka...@redhat.com wrote:

 The following patch fixes:
 
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58785
 
 LRA chooses constraint 'm' for const_int operand.  It means that the 
 const_int should be placed in memory but it does not happen as preferred 
 reload class hook returns LO_REGS for class NO_REGS which is result of 
 LRA choosing 'm'.  I don't know why reload pass needs such value but it 
 should be return NO_REGS IMHO as it results in much less reload insns.
 
 Is this patch ok to commit to the trunk?
 
 2013-10-30  Vladimir Makarov  vmaka...@redhat.com
 
 PR target/58785
 * config/arm/arm.c (arm_preferred_reload_class): Don't return
 LO_REGS for NO_REGS for LRA.
 
 2013-10-30  Vladimir Makarov  vmaka...@redhat.com
 
 PR target/58785
 * gcc.target/arm/pr58785.c: New.
 
 pr58785.patch

We've been suspicious of this hunk of code for a while now.  One reading of the 
manual suggests that p_r_c can only return a subset of rclass, not a different 
class.  On that basis, lo_regs as a result should only be returned when rclass 
is general_regs, even for traditional reload.

R.

Re: [PATCH] Vectorizing abs(char/short/int) on x86.

2013-10-30 Thread Cong Hou

On Wed, Oct 30, 2013 at 10:22 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Wed, Oct 30, 2013 at 6:01 PM, Cong Hou co...@google.com wrote:
 I found my problem: I put DONE outside of if not inside. You are
 right. I have updated my patch.

 OK, great that we put things in order ;)

 Does this patch need some extra middle-end functionality? I was not
 able to vectorize char and short part of your patch.


In the original patch, I converted abs() on short and char values to
their own types by removing type casts. That is, originally char_val1
= abs(char_val2) will be converted to char_val1 = (char) abs((int)
char_val2) in the frontend, and I would like to convert it back to
char_val1 = abs(char_val2). But after several discussions, it seems
this conversion has some problems such as overflow converns, and I
thereby removed that part.

Now you should still be able to vectorize abs(char) and abs(short) but
with packing and unpacking. Later I will consider to write pattern
recognizer for abs(char) and abs(short) and then the expand on
abs(char)/abs(short) in this patch will be used during vectorization.



 Regarding the testcase - please put it to gcc.target/i386/ directory.
 There is nothing generic in the test, as confirmed by target-dependent
 scan test. You will find plenty of examples in the mentioned
 directory. I'd suggest to split the testcase in three files, and to
 simplify it to something like the testcase with global variables I
 used earlier.


I have done it. The test case is split into three for s8/s16/s32 in
gcc.target/i386.


Thank you!

Cong



diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 8a38316..84c7ab5 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2013-10-22  Cong Hou  co...@google.com
+
+ PR target/58762
+ * config/i386/i386-protos.h (ix86_expand_sse2_abs): New function.
+ * config/i386/i386.c (ix86_expand_sse2_abs): New function.
+ * config/i386/sse.md: Add SSE2 support to abs (8/16/32-bit-int).
+
 2013-10-14  David Malcolm  dmalc...@redhat.com

  * dumpfile.h (gcc::dump_manager): New class, to hold state
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 3ab2f3a..ca31224 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -238,6 +238,7 @@ extern void ix86_expand_mul_widen_evenodd (rtx,
rtx, rtx, bool, bool);
 extern void ix86_expand_mul_widen_hilo (rtx, rtx, rtx, bool, bool);
 extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx);
 extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx);
+extern void ix86_expand_sse2_abs (rtx, rtx);

 /* In i386-c.c  */
 extern void ix86_target_macros (void);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 02cbbbd..71905fc 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -41696,6 +41696,53 @@ ix86_expand_sse2_mulvxdi3 (rtx op0, rtx op1, rtx op2)
gen_rtx_MULT (mode, op1, op2));
 }

+void
+ix86_expand_sse2_abs (rtx op0, rtx op1)
+{
+  enum machine_mode mode = GET_MODE (op0);
+  rtx tmp0, tmp1;
+
+  switch (mode)
+{
+  /* For 32-bit signed integer X, the best way to calculate the absolute
+ value of X is (((signed) X  (W-1)) ^ X) - ((signed) X  (W-1)).  */
+  case V4SImode:
+ tmp0 = expand_simple_binop (mode, ASHIFTRT, op1,
+GEN_INT (GET_MODE_BITSIZE
+ (GET_MODE_INNER (mode)) - 1),
+NULL, 0, OPTAB_DIRECT);
+ if (tmp0)
+  tmp1 = expand_simple_binop (mode, XOR, op1, tmp0,
+  NULL, 0, OPTAB_DIRECT);
+ if (tmp0  tmp1)
+  expand_simple_binop (mode, MINUS, tmp1, tmp0,
+   op0, 0, OPTAB_DIRECT);
+ break;
+
+  /* For 16-bit signed integer X, the best way to calculate the absolute
+ value of X is max (X, -X), as SSE2 provides the PMAXSW insn.  */
+  case V8HImode:
+ tmp0 = expand_unop (mode, neg_optab, op1, NULL_RTX, 0);
+ if (tmp0)
+  expand_simple_binop (mode, SMAX, op1, tmp0, op0, 0,
+   OPTAB_DIRECT);
+ break;
+
+  /* For 8-bit signed integer X, the best way to calculate the absolute
+ value of X is min ((unsigned char) X, (unsigned char) (-X)),
+ as SSE2 provides the PMINUB insn.  */
+  case V16QImode:
+ tmp0 = expand_unop (mode, neg_optab, op1, NULL_RTX, 0);
+ if (tmp0)
+  expand_simple_binop (V16QImode, UMIN, op1, tmp0, op0, 0,
+   OPTAB_DIRECT);
+ break;
+
+  default:
+ break;
+}
+}
+
 /* Expand an insert into a vector register through pinsr insn.
Return true if successful.  */

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index c3f6c94..46e1df4 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -8721,7 +8721,7 @@
(set (attr prefix_rex) (symbol_ref x86_extended_reg_mentioned_p (insn)))
(set_attr mode DI)])

-(define_insn absmode2
+(define_insn *absmode2
   [(set (match_operand:VI124_AVX2_48_AVX512F 0 register_operand =v)
  (abs:VI124_AVX2_48_AVX512F
   (match_operand:VI124_AVX2_48_AVX512F 1 nonimmediate_operand vm)))]
@@ -8733,6 +8733,19 @@
(set_attr prefix maybe_vex)
(set_attr mode sseinsnmode)])

+(define_expand absmode2

Re: [PATCH] Vectorizing abs(char/short/int) on x86.

2013-10-30 Thread Cong Hou

Also, as the current expand for abs() on 8/16bit integer is not used
at all, should I comment them temporarily now? Later I can uncomment
them once I finished the pattern recognizer.



thanks,
Cong


On Wed, Oct 30, 2013 at 10:22 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Wed, Oct 30, 2013 at 6:01 PM, Cong Hou co...@google.com wrote:
 I found my problem: I put DONE outside of if not inside. You are
 right. I have updated my patch.

 OK, great that we put things in order ;)

 Does this patch need some extra middle-end functionality? I was not
 able to vectorize char and short part of your patch.

 Regarding the testcase - please put it to gcc.target/i386/ directory.
 There is nothing generic in the test, as confirmed by target-dependent
 scan test. You will find plenty of examples in the mentioned
 directory. I'd suggest to split the testcase in three files, and to
 simplify it to something like the testcase with global variables I
 used earlier.

 Modulo testcase, the patch is OK otherwise, but middle-end parts
 should be committed first.

 Thanks,
 Uros.

Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces

2013-10-30 Thread Ilya Enkovich

On 30 Oct 11:40, Jeff Law wrote:
 On 10/30/13 04:48, Richard Biener wrote:
 foo (int * p, unsigned int size)
 {
unnamed type __bound_tmp.0;
long unsigned int D.2239;
long unsigned int _2;
sizetype _6;
int * _7;
 
bb 3:
__bound_tmp.0_4 = __builtin_ia32_arg_bnd (p_3(D));
 
bb 2:
_2 = (long unsigned int) size_1(D);
__builtin_ia32_bndcl (__bound_tmp.0_4, p_3(D));
_6 = _2 + 18446744073709551615;
_7 = p_3(D) + _6;
__builtin_ia32_bndcu (__bound_tmp.0_4, _7);
access_and_store (p_3(D), __bound_tmp.0_4, size_1(D));
 
 so it seems there is now a mismatch between DECL_ARGUMENTS
 and the GIMPLE call stmt arguments.  How (if) did you amend
 the GIMPLE stmt verifier for this?
 Effectively the bounds are passed on the side.
 
 
 How does regular code deal with this which may expect matching
 to DECL_ARGUMENTS?  In fact interleaving the additional
 arguments sounds very error-prone for existing code - I'd have
 appended all bound args at the end.  Also you unconditionally
 claim all pointer arguments have a bound - that looks like bad
 design as well.  Why didn't you add a flag to the relevant
 PARM_DECL (and then, what do you do for indirect calls?).
 You can't actually interleave them -- that results in MPX and normal
 code not being able to interact.   Passing the bound at the end
 doesn't really work either -- varargs and the desire to pass some of
 the bounds around in bound registers.
 
 
 
 /* Return the number of arguments used by call statement GS
 ignoring bound ones.  */
 
 static inline unsigned
 gimple_call_num_nobnd_args (const_gimple gs)
 {
unsigned num_args = gimple_call_num_args (gs);
unsigned res = num_args;
for (unsigned n = 0; n  num_args; n++)
  if (POINTER_BOUNDS_P (gimple_call_arg (gs, n)))
res--;
return res;
 }
 
 the choice means that gimple_call_num_nobnd_args is not O(1).
 Yes, but I don't see that's terribly problematical.
 
 
 
 /* Return INDEX's call argument ignoring bound ones.  */
 static inline tree
 gimple_call_nobnd_arg (const_gimple gs, unsigned index)
 {
/* No bound args may exist if pointers checker is off.  */
if (!flag_check_pointer_bounds)
  return gimple_call_arg (gs, index);
return gimple_call_arg (gs, gimple_call_get_nobnd_arg_index (gs, index));
 }
 
 GIMPLE layout depending on flag_check_pointer_bounds sounds
 like a recipie for desaster if you consider TUs compiled with and
 TUs compiled without and LTO.  Or if you consider using
 optimized attribute with that flag.
 Sorry, I don't follow.  Can you elaborate please.

I suppose the possile problem here is when we run LTO compiler without 
-fcheck-pointer-bounds and give instrumented code as input. 
gimple_call_nobnd_arg would work wrong for instrumented code. Actually there 
are other places in subsequent patches wich assume that 
flag_check_pointer_bounds is 1 if we have instrumented code. 

Ilya

 
 I hope the reviewers that approved the patch will work with you to
 address the above issues.  I can't be everywhere.
 Obviously I will.
 
 jeff

Re: [PATCH] Vectorizing abs(char/short/int) on x86.

2013-10-30 Thread Uros Bizjak

On Wed, Oct 30, 2013 at 7:01 PM, Cong Hou co...@google.com wrote:

 I found my problem: I put DONE outside of if not inside. You are
 right. I have updated my patch.

 OK, great that we put things in order ;)

 Does this patch need some extra middle-end functionality? I was not
 able to vectorize char and short part of your patch.


 In the original patch, I converted abs() on short and char values to
 their own types by removing type casts. That is, originally char_val1
 = abs(char_val2) will be converted to char_val1 = (char) abs((int)
 char_val2) in the frontend, and I would like to convert it back to
 char_val1 = abs(char_val2). But after several discussions, it seems
 this conversion has some problems such as overflow converns, and I
 thereby removed that part.

 Now you should still be able to vectorize abs(char) and abs(short) but
 with packing and unpacking. Later I will consider to write pattern
 recognizer for abs(char) and abs(short) and then the expand on
 abs(char)/abs(short) in this patch will be used during vectorization.

OK, this seems reasonable. We already have unused SSSE3 8/16 bit abs
pattern, so I think we can commit SSE2 expanders, even if they will be
unused for now. The proposed recognizer will benefit SSE2 as well as
existing SSSE3 patterns.

 Regarding the testcase - please put it to gcc.target/i386/ directory.
 There is nothing generic in the test, as confirmed by target-dependent
 scan test. You will find plenty of examples in the mentioned
 directory. I'd suggest to split the testcase in three files, and to
 simplify it to something like the testcase with global variables I
 used earlier.


 I have done it. The test case is split into three for s8/s16/s32 in
 gcc.target/i386.

OK.

The patch is OK for mainline, but please check formatting and
whitespace before the patch is committed.

Thanks,
Uros.

Re: [wide-int] Update main comment

2013-10-30 Thread Richard Sandiford

Kenneth Zadeck zad...@naturalbridge.com writes:
 On 10/30/2013 07:01 AM, Richard Sandiford wrote:
 Kenneth Zadeck zad...@naturalbridge.com writes:
 On 10/29/2013 06:37 PM, Richard Sandiford wrote:
 This patch tries to update the main wide_int comment to reflect the current
 implementation.

 - bitsizetype is TImode on x86_64 and others, so I don't think it's
 necessarily true that all offset_ints are signed.  (widest_int are
 though.)
 i am wondering if this is too conservative an interpretation.I
 believe that they are ti mode because that is the next thing after di
 mode and so they wanted to accommodate the 3 extra bits. Certainly there
 is no x86 that is able to address more than 64 bits.
 Right, but my point is that it's a different case from widest_int.
 It'd be just as valid to do bitsizetype arithmetic using wide_int
 rather than offset_int, and those wide_ints would have precision 128,
 just like the offset_ints.  And I wouldn't really say that those wide_ints
 were fundamentally signed in any way.  Although the tree layer might know
 that X upper bits of the bitsizetype are always signs, the tree-wide_int
 interface treats them in the same way as any other 128-bit type.

 Maybe I'm just being pedantic, but I think offset_int would only be like
 widest_int if bitsizetype had precision 67 or whatever.  Then we could
 say that both offset_int and widest_int must be wider than any inputs,
 meaning that there's at least one leading sign bit.
 this was of course what mike and i wanted, but we could not really 
 figure out how to pull it off.
 in particular, we could not find any existing reliable marker in the 
 targets to say what the width of the widest pointer on any 
 implementation.   We actually used the number 68 rather than 67 because 
 we assumed 64 for the widest pointer on any existing platform, 3 bits 
 for the bits and 1 bit for the sign.

Ah yeah, 68 would be better for signed types.

Is the patch OK while we still have 128-bit bitsizetypes though?
I agree the current comment would be right if we ever did switch
to sub-128 bitsizes.

Thanks,
Richard

Re: [PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces

2013-10-30 Thread Ilya Enkovich

2013/10/30 Jeff Law l...@redhat.com:
 On 10/30/13 04:34, Ilya Enkovich wrote:

 On 30 Oct 10:26, Richard Biener wrote:


 Ick - you enlarge all return statements?  But you don't set the
 actual value? So why allocate it with 2 ops in the first place??


 When return does not return bounds it has operand with zero value
 similar to case when it does not return value. What is the difference
 then?

 In general, when someone proposes a change in the size of tree, rtl or
 gimple nodes, it's a yellow flag that something may need further
 investigation.

 In this specific instance, I could trivially predict how that additional
 field would be used and a GIMPLE_RETURN isn't terribly important from a size
 standpoint, so I didn't call it out.



 Returns instrumentation. We add new operand to return statement to
 hold returned bounds and instrumentation pass is responsible to fill
 this operand with correct bounds

 Exactly what I expected.



 Unfortunately patch has been already installed.  Should we uninstall
 it?  If not, then here is patch for documentation.

 I think we're OK for now.  If Richi wants it out, he'll say so explicitly.




 Thanks, Ilya --

 gcc/

 2013-10-30  Ilya Enkovich  ilya.enkov...@intel.com

 * doc/gimple.texi (gimple_call_num_nobnd_args): New.
 (gimple_call_nobnd_arg): New. (gimple_return_retbnd): New.
 (gimple_return_set_retbnd): New. (gimple_call_get_nobnd_arg_index):
 New.

 Can you also fixup the GIMPLE_RETURN documentation in gimple.texi.  It needs
 a minor update after these changes.

I could not find anything but accessors for GIMPLE_RETURN in
gimple.texi. And new accessors are in my doc patch already.

Ilya

 jeff

Re: C++ PATCH to deal with trivial but non-callable [cd]tors

2013-10-30 Thread Eric Botcazou

  +/* Return whether DECL, a method of a C++ TYPE, is trivial, that is to
  say +   doesn't do anything for the objects of TYPE.  */
  +
  +static bool
  +is_trivial_method (const_tree decl, const_tree type)
  +{
  +  if (cpp_check (decl, IS_CONSTRUCTOR)  !TYPE_NEEDS_CONSTRUCTING
  (type)) +return true;
 
 This will tell you whether decl is a constructor for a type with some
 non-trivial constructor, but not whether decl itself is non-trivial.

I think that's good enough though, in practice we only need to eliminate 
constructors/destructors for POD types.  As soon as there is one non-trivial 
method, the game is essentially over.

 I think a good way to check for any non-trivial methods would be to
 check trivial_type_p in the front end and then see if there are any
 !DECL_ARTIFICIAL decls in TYPE_METHODS.

That sounds interesting indeed, thanks for the tip.  I was initially reluctant 
to call into the front-end because of side-effects, but the various predicates 
in tree.c seem fine in this respect.

-- 
Eric Botcazou

[C++ Patch] PR 58581

2013-10-30 Thread Paolo Carlini


Hi,

to resolve this simple ICE we only have to check the return value of 
mark_used, like we do in a number of other places. Tested x86_64-linux.


Thanks!
Paolo.

/
/cp
2013-10-30  Paolo Carlini  paolo.carl...@oracle.com

PR c++/58581
* call.c (build_over_call): Check return value of mark_used.

/testsuite
2013-10-30  Paolo Carlini  paolo.carl...@oracle.com

PR c++/58581
* g++.dg/cpp0x/deleted1.C: New.
Index: cp/call.c
===
--- cp/call.c   (revision 204219)
+++ cp/call.c   (working copy)
@@ -7112,8 +7112,9 @@ build_over_call (struct z_candidate *cand, int fla
mark_versions_used (fn);
 }
 
-  if (!already_used)
-mark_used (fn);
+  if (!already_used
+   !mark_used (fn))
+return error_mark_node;
 
   if (DECL_VINDEX (fn)  (flags  LOOKUP_NONVIRTUAL) == 0
   /* Don't mess with virtual lookup in fold_non_dependent_expr; virtual
Index: testsuite/g++.dg/cpp0x/deleted1.C
===
--- testsuite/g++.dg/cpp0x/deleted1.C   (revision 0)
+++ testsuite/g++.dg/cpp0x/deleted1.C   (working copy)
@@ -0,0 +1,6 @@
+// PR c++/58581
+// { dg-do compile { target c++11 } }
+
+templatetypename T int foo(T) noexcept(T()) = delete;
+
+int i = foo(0);   // { dg-error deleted }

Re: [PATCH][RFC] fix reload causing ICE in subreg_get_info on m68k (PR58369)

2013-10-30 Thread Jeff Law


On 10/18/13 14:17, Mikael Pettersson wrote:

   Thanks Mikael.  My only concern is the lack of adjustment when the value
   found was already a SUBREG.
  
   ie, let's assume rld[r].in_reg was something like
   (subreg:XF (reg:DF) 0)
  
   and our target is (reg:DF)
  
   In this case it seems to me we still want to compute the subreg offset,
   right?
  
   jeff

Thanks Jeff.  Sorry about the delay, but it took me a while to work
out the details of the various cases (I was afraid of having to do
something like simplify_subreg, but the cases in reload are simpler),
and then to re-test on my various targets.
No worries.  This stuff is complex and taking the time to thoroughly 
analyze the cases and test is warranted and greatly appreciated.







Let the reloadee be rld[r].in_reg, outermode be its mode, and innermode
be the mode of the hard reg in last_reg.

The trivial cases are when the reloadee is a plain REG or a SUBREG of a
hard reg.  For these, reload virtually forms a normal lowpart subreg of
last_reg, and subreg_lowpart_offset (outermode, innermode) computes the
endian-correct offset for subreg_regno_offset.  This is exactly what my
previous patch did.

Right.




In remaining cases the reloadee is a SUBREG of a pseudo.  Let outer_offset
be its BYTE, and middlemode be the mode of its REG.

Another simple case is when the reloadee is paradoxical.  Then outer_offset
is zero (by convention), and reload should just form a normal lowpart
subreg as in the trivial cases.  Even though the reloadee is paradoxical,
this subreg will fit thanks to the mode size check on lines 6546-6547.

Agreed.





If the reloadee is a normal lowpart SUBREG, then again reload should just
form a normal lowpart subreg as in the trivial cases.  (But see below.)

The tricky case is when the reloadee is a normal but not lowpart SUBREG.
We get the correct offset for reload's virtual subreg by adding outer_offset
to the lowpart offset of middlemode and innermode.  This works for both
big-endian and little-endian.  It also handles normal lowpart SUBREGs,
so we don't need to check for lowpart vs non-lowpart normal SUBREGs.

Sounds right.




Tested with trunk and 4.8 on {m68k,sparc64,powerpc64,x86_64}-linux-gnu
and armv5tel-linux-gnueabi.  No regressions.

Ok for trunk?

gcc/

2013-10-18  Mikael Pettersson  mikpeli...@gmail.com

PR rtl-optimization/58369
* reload1.c (compute_reload_subreg_offset): Define.
(choose_reload_regs): Use it to pass endian-correct
offset to subreg_regno_offset.
I fixed a few trivial whitespace issues and added a testcase.  Installed 
onto the trunk on your behalf.


Thanks for your patience,

Jeff

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index ef40a43..e3d2abd 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2013-10-18  Mikael Pettersson  mikpeli...@gmail.com
+
+   PR rtl-optimization/58369
+   * reload1.c (compute_reload_subreg_offset): New function.
+   (choose_reload_regs): Use it to pass endian-correct
+   offset to subreg_regno_offset.
+
 2013-10-30  Tobias Burnus  bur...@net-b.de
 
PR other/33426
diff --git a/gcc/reload1.c b/gcc/reload1.c
index d56c554..b62b047 100644
--- a/gcc/reload1.c
+++ b/gcc/reload1.c
@@ -6371,6 +6371,37 @@ replaced_subreg (rtx x)
 }
 #endif
 
+/* Compute the offset to pass to subreg_regno_offset, for a pseudo of
+   mode OUTERMODE that is available in a hard reg of mode INNERMODE.
+   SUBREG is non-NULL if the pseudo is a subreg whose reg is a pseudo,
+   otherwise it is NULL.  */
+
+static int
+compute_reload_subreg_offset (enum machine_mode outermode,
+ rtx subreg,
+ enum machine_mode innermode)
+{
+  int outer_offset;
+  enum machine_mode middlemode;
+
+  if (!subreg)
+return subreg_lowpart_offset (outermode, innermode);
+
+  outer_offset = SUBREG_BYTE (subreg);
+  middlemode = GET_MODE (SUBREG_REG (subreg));
+
+  /* If SUBREG is paradoxical then return the normal lowpart offset
+ for OUTERMODE and INNERMODE.  Our caller has already checked
+ that OUTERMODE fits in INNERMODE.  */
+  if (outer_offset == 0
+   GET_MODE_SIZE (outermode)  GET_MODE_SIZE (middlemode))
+return subreg_lowpart_offset (outermode, innermode);
+
+  /* SUBREG is normal, but may not be lowpart; return OUTER_OFFSET
+ plus the normal lowpart offset for MIDDLEMODE and INNERMODE.  */
+  return outer_offset + subreg_lowpart_offset (middlemode, innermode);
+}
+
 /* Assign hard reg targets for the pseudo-registers we must reload
into hard regs for this insn.
Also output the instructions to copy them in and out of the hard regs.
@@ -6508,6 +6539,7 @@ choose_reload_regs (struct insn_chain *chain)
  int byte = 0;
  int regno = -1;
  enum machine_mode mode = VOIDmode;
+ rtx subreg = NULL_RTX;
 
  if (rld[r].in == 0)
;
@@ -6528,7 +6560,10 @@ choose_reload_regs (struct insn_chain

Re: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp

2013-10-30 Thread Tobias Burnus


Iyer, Balaji V wrote:
What I ideally wanted to do with my testsuite files was that I want 
all the Cilk keywords test to compile no matter what the architecture 
is, but it should only run in certain architectures where the runtime 
is enabled (this is known statically and thus the testsuite doesn't 
have to do anything to figure it out.). Can someone please tell me how 
do I do this?



Which is a bit orthogonal to my patch, which helps that the tests pass 
on systems which are supported. – Thus, let's start with the question 
what do you think of that patch?


On the other hand, one could use the existence of libcilkrts* as 
detected by the patch to decide whether to link or not: If the library 
is there, one can link – if not found, it is unlikely to work (unless it 
is, e.g. found in /usr/lib).


* * *

Regarding compile vs. run: I think one possibility would be to have no 
dg-do in the files and simply change the default to compile or run, 
depending whether the architecture is supported. On the other hand, that 
can be confusing as an explicit dg-do run will break it on some systems.


* * *

Actually, I was wondering whether -fcilkplus should always automatically 
link libcilkrts – akin to -fopenmp which links libgomp. Currently, one 
has to specify it manually.*


Or are the features which do not need libcilkplus common enough that one 
doesn't always want to link it?


[For OpenMP, GCC will have -fopenmp-simd [patch posted, 1], which 
doesn't link libgomp. I could imagine that one would like to have Cilk 
Plus' #pragma simd and the array syntax without enabling threads 
(cilk_sync, cilk_spawn, cilk_for, reducers – and thus libcilkrts linkage).]



Tobias

* libgomp is handled via spec files – including one which adds librt 
when libgomp is linked.


[1] -fopenmp-simd: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02275.html

RE: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp

2013-10-30 Thread Iyer, Balaji V

 * * *
 
 Actually, I was wondering whether -fcilkplus should always automatically link
 libcilkrts - akin to -fopenmp which links libgomp. Currently, one has to 
 specify it
 manually.*

Yes, I would like that to happen. Do you have any pointers about how to do that?

Re: [C++ Patch] PR 58581

2013-10-30 Thread Jason Merrill


OK.

Jason

Re: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp

2013-10-30 Thread Tobias Burnus


Iyer, Balaji V wrote:

* * *

Actually, I was wondering whether -fcilkplus should always automatically link
libcilkrts - akin to -fopenmp which links libgomp. Currently, one has to 
specify it
manually.*

Yes, I would like that to happen. Do you have any pointers about how to do that?


Well, if you need some special libraries like librt, see:
libgomp/libgomp.spec.in
libgomp/configure.ac (search for link_gomp).

Otherwise, simply have a look at gcc/gcc.c – search for fopenmp (which 
adds -pthread).


Tobias

Re: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp

2013-10-30 Thread Jeff Law


On 10/30/13 13:27, Iyer, Balaji V wrote:

* * *

Actually, I was wondering whether -fcilkplus should always automatically link
libcilkrts - akin to -fopenmp which links libgomp. Currently, one has to 
specify it
manually.*


Yes, I would like that to happen. Do you have any pointers about how to do that?

LINK_SPEC and its friends.

jeff

Re: Testsuite / Cilk Plus: Include library path in compile flags in gcc.dg/cilk-plus/cilk-plus.exp

2013-10-30 Thread Jeff Law


On 10/30/13 04:34, Tobias Burnus wrote:

Without that patch, which I have copied from asan-dg.exp, I get tons of
failures because ld cannot find libcilkrts.

OK for committal?

Tobias

cilk.diff


2013-10-30  Tobias Burnusbur...@net-b.de

* gcc.dg/cilk-plus/cilk-plus.exp: Add the libcilkrts library
path to the compile flags.
Yea, seems good to me.  g++.dg/cilk-plus/cilk-plus.exp may have the same 
problem (haven't looked).


jeff

1 2 >

1 - 100 of 152 matches

Mail list logo